Feedback on Classification Assignment
Adrian, Emil
- Well-constructed arguments, but lots of bullets; try to write full sentences.
- Some comments in code
- "Imbalance leads to a low overall accuracy and poor reliability for bearish predictions." Thus…?
- Clear observations
- kNN only with k=1
Jess, Janina
- Little analysis
- Formatting and layout need attention
Carlijn, Fridolin
- Make a new repo for each assignment please.
- Report well written and formatted
- Correlations well argued. Observation, hypothesis, test.
- Good use of LLMs, but advice not always followed
- Hints for transforming variables
- Handling nonpositive values
- Better model overview
- Pandas vs Polars
- "The model predicts Up most of the times, meaning we have a lot of false positives." Thus unbalanced data. Thus…?
- Good plot of error rate of kNN vs. logistic regression
- Use of LLMs
- Model transformation explanation
- Adding transformed columns
- Code error solving and Git problems
Nice — that’s a clean and systematic way to explore transformations for your predictors! Here’s what your code is doing, step by step: ... ⚠️ Potential Improvements / Notes ...
That error, ValueError: endog has evaluated to an array with multiple columns that has shape (605, 2), occurs when the variable passed as endog is non-numeric (e.g. bool or str). It means your response variable (Direction) is categorical (e.g. "Up"/"Down"), and statsmodels.ols is trying to treat it as numeric, which OLS can't handle.
TypeError: DataFrame object does not support Series assignment by index
ChatGPT said: Good catch, that error usually means you tried to assign a pandas Series to a Polars DataFrame using weekly_pd[...] = .... Your earlier code suggests you're using Polars, not pandas (weekly_pd looks like a Polars DataFrame), but weekly_pd[f"log_{col}"] = np.log(weekly_pd[col] + 1e-6) is pandas-style assignment and won't work with Polars. Polars uses expression syntax instead of direct column assignment. ... If you want, I can rewrite your full loop for all transformations (log, sqrt, x^2) and automatically print a confusion matrix for each, using only the libraries you listed. This will give you a clean, reproducible workflow. Do you want me to do that?
You said: yes please
I see exactly what's happening here: the error comes from the line X_train, X_test = X.loc[train], X.loc[~train]. The problem is that X is a NumPy array (because you used .to_numpy() earlier), and NumPy arrays don't have .loc (that's a pandas method).
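For NumPy arrays the equivalent is plain boolean indexing; a minimal sketch with made-up shapes:

```python
import numpy as np

# X as a NumPy array, e.g. the result of .to_numpy() on a DataFrame.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
train = np.arange(10) < 7        # boolean mask selecting the training rows

# NumPy arrays have no .loc; index with the boolean mask directly.
X_train, X_test = X[train], X[~train]
```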
Asmahene, Jagoda
Efraim, Lynn
- Good correlation plot.
- Give your plot files sensible names.
- Note how to do math in markdown using LaTeX notation.
- Better explanation and reasoning would be helpful.
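As an illustration of the LaTeX-in-markdown note, a minimal example (this `$...$`/`$$...$$` syntax is supported by GitHub-rendered markdown, among others):

```markdown
Inline math: the test error is $1 - \frac{TP + TN}{n}$.

Display math:

$$
\hat{y} = \arg\max_k P(Y = k \mid X = x)
$$
```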
Romane, Natalia
- Good explanation and reasoning about the parts that you did.
- Note how to do math in markdown using LaTeX notation.
- Good interpretation of kNN tuning plot
ChatGPT said
There’s no real problem here 🙂 — your commits and push worked perfectly.
- When I add .idea/ to my .gitignore, I don't have to commit, right?
ChatGPT said: Good question, but actually, yes, you do need to commit. Adding .idea/ to your .gitignore just changes a local text file called .gitignore. Git treats that file like any other tracked file, so if you want that ignore rule to apply for everyone who clones the repo (and to keep it saved permanently), you must commit the change.
What are these problems?
ChatGPT said: Excellent question — these warnings are actually quite common when you use LDA or QDA in scikit-learn, and they indicate numerical or collinearity problems in your data. Let’s break each part down clearly 👇
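A minimal sketch of where such a collinearity warning comes from, using toy data with one exactly collinear predictor (data and shapes are made up for illustration):

```python
import warnings
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy data: the third column is an exact multiple of the first,
# so the predictor matrix is rank-deficient (collinear).
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X = np.column_stack([base, base[:, 0] * 2.0])
y = (base[:, 1] + rng.normal(scale=0.5, size=100) > 0).astype(int)

clf = LinearDiscriminantAnalysis()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clf.fit(X, y)  # scikit-learn typically warns that variables are collinear

collinear_warned = any("collinear" in str(w.message).lower() for w in caught)
```

Dropping or combining the redundant predictors (or checking a correlation matrix first) removes the warning.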
- What's the function for kNN in Python?
- How do you put bold text in Markdown?
What's a k tuning plot?
ChatGPT said: Excellent question 👏 — you’re probably referring to a “k tuning plot” (also called a KNN tuning curve) used when training a K-Nearest Neighbors (KNN) model. Let’s go over it clearly 👇
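A k tuning plot just shows held-out error as a function of k. A minimal sketch with synthetic data (all names and data here are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data standing in for the real predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.8, size=300) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Held-out error rate for each candidate k; plotting ks vs errors
# (e.g. with matplotlib) gives the tuning curve.
ks = list(range(1, 21))
errors = [
    1 - KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).score(X_te, y_te)
    for k in ks
]
best_k = ks[int(np.argmin(errors))]
```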
- How do I cite a ChatGPT conversation in a README in Markdown?
Matúš, Madalina
- All commits by Matúš with input from Madalina
- Data in repo
- Good explanations of your understanding of what you are seeing.
- The volume data appear to follow an exponential/polynomial trend. Once we take the cube root, we can see the data shape changing/becoming more linear.
- The following table was generated using ChatGPT (link above), I struggled with splitting the data for training:
Accuracy: 0.994. Accuracy is fairly high => not trustworthy at all, perhaps due to the high number of predictors. Multicollinearity is highly unlikely since no correlation could be observed between the remaining variables.
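For reference, a minimal train/test split sketch (toy data, not the students' code; the point is that accuracy should be reported on held-out rows, which also helps diagnose a suspiciously high 0.994):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data standing in for the real predictors and response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

# Hold out a test set; accuracy on unseen data is the number to report.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)
acc = LogisticRegression().fit(X_train, y_train).score(X_test, y_test)
```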
Yaohong, Giorgia
- "Volume increases exponentially with year." If that is so then log(Volume) should be approximately linear. That is something you could test!
- Try a tuning plot for kNN
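The log-linearity check suggested above can be sketched like this (synthetic data with exponential growth, for illustration only):

```python
import numpy as np

# Synthetic weekly index and exponentially growing volume with noise.
rng = np.random.default_rng(0)
t = np.arange(500)
volume = np.exp(0.01 * t) * rng.lognormal(sigma=0.1, size=500)

# If Volume grows exponentially in time, log(Volume) is ~linear in time,
# so its correlation with t should be much closer to 1 than raw Volume's.
r_raw = np.corrcoef(t, volume)[0, 1]
r_log = np.corrcoef(t, np.log(volume))[0, 1]
```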
Nora, Mariana
- Your layout and formatting need attention
- Nice to see you citing resources other than LLMs!
Emilio, Louanne, Roman
- All Roman?
- Commit messages a bit monotonous.
- Good use of plots in the data exploration phase.
- Clear statement of observations and hypotheses/questions.
- Limit float precision.
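On float precision, a couple of common options (illustrative values):

```python
import numpy as np

acc = 0.5483870967741935

# Format individual numbers with f-strings:
report = f"Accuracy: {acc:.3f}"   # "Accuracy: 0.548"

# Or limit precision globally for printed NumPy arrays:
np.set_printoptions(precision=3)
```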
Does this mean that in 2009 and 2010 the predictions were very good, or not?
ChatGPT said: Excellent question — and it shows you’re thinking critically about what the results actually mean 👏 Let’s unpack this clearly: ...
- DeprecationWarning
ChatGPT said: Ah good catch 🔥 — you’re on Polars 1.0+, and both of those warnings are about API changes. Let me rewrite the snippet in the new Polars style so it’s clean and future-proof:
So I want the same plots, but in the year one the line is weird.
ChatGPT said: Excellent question — that “weird” line in your Volume vs Year plot happens because your data has many rows per year (e.g., weekly observations), so Matplotlib connects all those points in order — making the line zig-zag heavily. You’ve got two clean ways to fix that, depending on what you want to show 👇 ...
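The per-year aggregation fix can be sketched as follows (toy data; pandas groupby here, though the same idea works in Polars):

```python
import numpy as np
import pandas as pd

# Toy weekly data: many rows per year make a raw line plot zig-zag.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Year": np.repeat(np.arange(1990, 1995), 52),
    "Volume": rng.lognormal(size=5 * 52),
})

# One value per year gives a clean line; alternatively, plot the raw
# points with a scatter instead of connecting them with a line.
yearly = df.groupby("Year", as_index=False)["Volume"].mean()
```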