Feedback on Classification Assignment

Adrian, Emil

Well-constructed arguments, but lots of bullets. Try to write full sentences.
Some comments in code
"Imbalance leads to a low overall accuracy and poor reliability for bearish predictions." Thus…?
Clear observations
kNN only with k=1
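
A single k says little about how kNN behaves on these data. A rough sketch of comparing several values of k, assuming scikit-learn is available; the data here are synthetic and the names X_val, k_values and error_rates are made up for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data, invented for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Hold out the last 50 rows as a validation set.
X_train, X_val = X[:150], X[150:]
y_train, y_val = y[:150], y[150:]

k_values = list(range(1, 32, 2))  # odd k avoids ties between two classes
error_rates = []
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    # score() returns accuracy, so the error rate is its complement.
    error_rates.append(1.0 - model.score(X_val, y_val))

best_k = k_values[int(np.argmin(error_rates))]
```

Plotting k_values against error_rates gives the tuning plot. Choose k on a validation split or by cross-validation rather than on the final test set.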

Jess, Janina

Little analysis
Formatting and layout need attention

Carlijn, Fridolin

Make a new repo for each assignment please.
Report well written and formatted
Correlations well argued. Observation, hypothesis, test.
Good use of LLMs, but advice not always followed
- Hints for transforming variables
- Handling nonpositive values
- Better model overview
- Pandas vs Polars
"The model predicts Up most of the times, meaning we have a lot of false positives." Thus unbalanced data. Thus…?
Good plot of error rate of kNN vs Logistic Regression
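
One way to follow the "Thus…?" through is to compare accuracy with the majority-class baseline and look at recall per class. A minimal sketch with made-up labels (1 = Up, 0 = Down), not your data:

```python
import numpy as np

# Hypothetical labels, invented for illustration only (1 = "Up", 0 = "Down").
y_true = np.array([1, 1, 1, 0, 1, 0, 1, 1, 0, 1])
# A model that predicts "Up" almost every time.
y_pred = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 1])

# 2x2 confusion matrix: rows = actual class, columns = predicted class.
confusion = np.zeros((2, 2), dtype=int)
for actual, predicted in zip(y_true, y_pred):
    confusion[actual, predicted] += 1

accuracy = float(np.mean(y_true == y_pred))
# Accuracy obtained by always predicting the most frequent class.
baseline = float(max(np.mean(y_true == 1), np.mean(y_true == 0)))
# Recall per class: correct predictions divided by actual members of the class.
recall_down = confusion[0, 0] / confusion[0].sum()
recall_up = confusion[1, 1] / confusion[1].sum()
```

If accuracy is barely above the baseline and recall for Down is low, the model has learned little beyond the class frequencies.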

Use of LLMs

model transformation explanation
add column transformation

code error solving and git problems

Nice — that’s a clean and systematic way to explore transformations for your predictors!

Here’s what your code is doing, step by step:

...

⚠️ Potential Improvements / Notes

...

That error —

ValueError: endog has evaluated to an array with multiple columns that has shape (605, 2)
This occurs when the variable converted to endog is non-numeric (e.g., bool or str).

— means your response variable (Direction) is categorical (e.g. "Up"/"Down"), and
statsmodels.ols is trying to treat it as numeric, which OLS can’t handle.
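
A minimal sketch of the usual fix, assuming the response holds the strings "Up" and "Down": encode it as a single 0/1 column first. The array below is a hypothetical stand-in for the Direction column.

```python
import numpy as np

# Hypothetical stand-in for the Direction column.
direction = np.array(["Up", "Down", "Up", "Up", "Down"])

# Encode the two classes as 1/0 so the response is one numeric column.
y = (direction == "Up").astype(int)
```

For a binary response like this, a classifier such as logistic regression is a more natural model than OLS.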

TypeError: DataFrame object does not support Series assignment by index
ChatGPT said:

Good catch — that error usually means something like this happened:

    ❌ You tried to assign a Pandas Series to a Polars DataFrame using weekly_pd[...] = ....

🔍 What’s going on

Your earlier code suggests you’re using Polars, not Pandas — for example from
your earlier messages (weekly_pd looks like a Polars DataFrame).

But here:

weekly_pd[f"log_{col}"] = np.log(weekly_pd[col] + 1e-6)

that’s Pandas-style assignment — it won’t work with Polars.
Polars uses expression syntax instead of direct column assignment.

✅ How to fix it

...

If you want, I can rewrite your full loop for all transformations (log, sqrt, x^2)
and automatically print a confusion matrix for each, using only the libraries you
listed. This will give you a clean, reproducible workflow.

Do you want me to do that?

You said: yes please

I see exactly what’s happening here:

👉 The error comes from this line:

X_train, X_test = X.loc[train], X.loc[~train]

The problem is that X is a NumPy array (because you used .to_numpy() earlier).
NumPy arrays don’t have .loc (that’s a pandas method).
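
A minimal sketch of the NumPy equivalent, which indexes the array directly with a boolean mask. The values of X and year are made up for illustration:

```python
import numpy as np

# Hypothetical predictors and years, for illustration only.
X = np.array([[0.1, 1.0], [0.2, 0.9], [0.3, 1.1], [0.4, 1.2]])
year = np.array([2007, 2008, 2009, 2010])

# Boolean mask with one entry per row of X.
train = year < 2009
X_train, X_test = X[train], X[~train]
```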

Asmahene, Jagoda

Efraim, Lynn

Good correlation plot.
Give your plot files sensible names.
Note how to do math in markdown using LaTeX notation.
Better explanation and reasoning would be helpful.
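
As an example of the LaTeX notation, assuming a Markdown renderer with math support (GitHub's is one), the logistic regression model can be written between double dollar signs:

```latex
$$
P(Y = 1 \mid X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}
$$
```

Single dollar signs give inline math, for example $\beta_1$.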

Romane, Natalia

Good explanation and reasoning about the parts that you did.
Note how to do math in markdown using LaTeX notation.
Good interpretation of kNN tuning plot

ChatGPT said:

There’s no real problem here 🙂 — your commits and push worked perfectly.

when i add .idea/ to my gitigonre i dont have to commit right?

ChatGPT said:

Good question — but actually, yes, you do need to commit 😄

Here’s why:

Adding .idea/ to your .gitignore just changes a local text file called .gitignore.

Git treats that file like any other tracked file — so if you want that ignore
rule to apply for everyone who clones the repo (and to keep it saved permanently),
you must commit the change.

what are these problems

ChatGPT said:

Excellent question — these warnings are actually quite common when you use LDA
or QDA in scikit-learn, and they indicate numerical or collinearity problems in
your data. Let’s break each part down clearly 👇

whats the function for KNN in python
how do you put bold text in markdown

whats a k tunning plot

ChatGPT said:

Excellent question 👏 — you’re probably referring to a “k tuning plot” (also
called a KNN tuning curve) used when training a K-Nearest Neighbors (KNN)
model.

Let’s go over it clearly 👇

how do i cite a chat gpt conversation in a README in markdown language

Matúš, Madalina

All commits by Matúš with input from Madalina
Data in repo
Good explanations of your understanding of what you are seeing.
The volume data appear to follow exponential/ polynomial trend. Once we take the cube root, we can see the data shape changing/ becoming more linear.
The following table was generated using ChatGPT (link above), I struggled with splitting the data for training:

Accuracy: 0.994

Accuracy is fairly high => not trustworthy at all, perhaps due to the high number of predictors.
Multicolinearity is highly unlikely since no correlation could be observed between the
remaining variables.

Yaohong, Giorgia

"Volume increases exponentially with year." If that is so then log(Volume) should be approximately linear. That is something you could test!
Try a tuning plot for kNN
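
The log(Volume) check can be sketched as follows: fit a straight line to Volume and to log(Volume) against year and compare R². The data are synthetic and r_squared is a hypothetical helper, not a library function:

```python
import numpy as np

# Synthetic data, invented for illustration: roughly exponential growth.
rng = np.random.default_rng(0)
year = np.arange(1990, 2011)
t = year - year.min()  # years since the first observation
volume = 0.1 * np.exp(0.15 * t) * np.exp(rng.normal(0, 0.05, t.size))

def r_squared(x, y):
    """R^2 of a straight-line least-squares fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1.0 - residuals.var() / y.var()

r2_raw = r_squared(t, volume)
r2_log = r_squared(t, np.log(volume))
```

If growth really is exponential, r2_log should be clearly higher than r2_raw, and a plot of log(Volume) against year should look close to a straight line.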

Nora, Mariana

Your layout and formatting need attention
Nice to see you citing resources other than LLMs!

Emilio, Louanne, Roman

All commits by Roman?
Commit messages a bit monotonous.
Good use of plots in the data exploration phase.
Clear statement of observations and hypotheses/questions.
Limit float precision.
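
For instance, format numbers when printing instead of showing the full float. The accuracy value below is made up:

```python
accuracy = 0.5576923076923077  # hypothetical value, for illustration only

print(f"Accuracy: {accuracy:.3f}")  # fixed number of decimals
print(f"Accuracy: {accuracy:.1%}")  # as a percentage
rounded = round(accuracy, 3)        # rounded copy for tables
```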

does this mean that in 2009 and 2010 the predictions were very good or no

ChatGPT said:

Excellent question — and it shows you’re thinking critically about
what the results actually mean 👏

Let’s unpack this clearly:

...

DeprecationWarning

ChatGPT said:

Ah good catch 🔥 — you’re on Polars 1.0+, and both of those warnings are about API changes.
Let me rewrite the snippet in the new Polars style so it’s clean and future-proof:

so I want the same plots but the year one the line is weird

ChatGPT said:

Excellent question — that “weird” line in your Volume vs Year plot
happens because your data has many rows per year (e.g., weekly
observations), so Matplotlib connects all those points in order
— making the line zig-zag heavily.

You’ve got two clean ways to fix that, depending on what you want to show 👇

...
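
The two fixes are elided above. One common option, shown here as an assumption and not necessarily what the chat suggested, is to aggregate to one value per year before drawing the line. A sketch with pandas and made-up weekly rows:

```python
import pandas as pd

# Hypothetical weekly rows; the real data are not shown in these notes.
weekly = pd.DataFrame({
    "Year":   [2008, 2008, 2008, 2009, 2009, 2010],
    "Volume": [4.0, 5.0, 6.0, 5.5, 6.5, 3.0],
})

# One value per year, so a line plot has a single point at each x position.
yearly = weekly.groupby("Year", as_index=False)["Volume"].mean()
```

Plotting yearly instead of weekly removes the zig-zag. A scatter plot of the raw weekly points is the alternative if the within-year spread matters.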