Feedback on Classification Assignment

Adrian, Emil

  • Well-constructed arguments, but lots of bullets. Try to write full sentences.
  • Some comments in code
  • "Imbalance leads to a low overall accuracy and poor reliability for bearish predictions." Thus…?
  • Clear observations
  • kNN only with k=1
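One natural follow-up to the imbalance remark above is to compare the model's accuracy with that of the simplest possible classifier, one that always predicts the majority class, and to look at where the errors fall. A minimal NumPy sketch with made-up labels; `majority_baseline_accuracy` and `confusion_counts` are hypothetical helpers, and treating "Up" as the positive class is an assumption.

```python
import numpy as np

def majority_baseline_accuracy(y_true):
    """Accuracy of always predicting the most frequent class in y_true."""
    _, counts = np.unique(y_true, return_counts=True)
    return counts.max() / counts.sum()

def confusion_counts(y_true, y_pred, positive="Up"):
    """Return (TP, FP, FN, TN), treating `positive` as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    tn = int(np.sum((y_pred != positive) & (y_true != positive)))
    return tp, fp, fn, tn

# Made-up labels: 6 "Up" weeks and 4 "Down" weeks, and a model that mostly says "Up".
y_true = np.array(["Up", "Up", "Down", "Up", "Down", "Up", "Up", "Down", "Up", "Down"])
y_pred = np.array(["Up", "Up", "Up", "Up", "Down", "Up", "Up", "Up", "Up", "Up"])

accuracy = np.mean(y_true == y_pred)
print(f"model accuracy:    {accuracy:.2f}")
print(f"majority baseline: {majority_baseline_accuracy(y_true):.2f}")
print("TP, FP, FN, TN:", confusion_counts(y_true, y_pred))
```

Here the made-up model reaches 0.70 accuracy against a 0.60 baseline, and 3 of its 9 "Up" calls are false positives.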

Jess, Janina

  • Little analysis
  • Formatting and layout need attention

Carlijn, Fridolin

  • Make a new repo for each assignment please.
  • Report well written and formatted
  • Correlations well argued. Observation, hypothesis, test.
  • Good use of LLMs, but advice not always followed
    • Hints for transforming variables
    • Handling nonpositive values
    • Better model overview
    • Pandas vs Polars
  • "The model predicts Up most of the times, meaning we have a lot of false positives." Thus unbalanced data. Thus…?
  • Good plot of error rate of kNN v Logistic Regression
  • Use of LLMs
    • model transformation explanation
    • add column transformation
    • code error solving and git problems

      Nice — that’s a clean and systematic way to explore transformations for your predictors!
      
      Here’s what your code is doing, step by step:
      
      ...
      
      ⚠️ Potential Improvements / Notes
      
      ...
      
      That error —
      
      ValueError: endog has evaluated to an array with multiple columns that has shape (605, 2)
      This occurs when the variable converted to endog is non-numeric (e.g., bool or str).
      
      — means your response variable (Direction) is categorical (e.g. "Up"/"Down"), and
      statsmodels.ols is trying to treat it as numeric, which OLS can’t handle.
      
      
      TypeError: DataFrame object does not support Series assignment by index
      ChatGPT said:
      
      Good catch — that error usually means something like this happened:
      
          ❌ You tried to assign a Pandas Series to a Polars DataFrame using weekly_pd[...] = ....
      
      🔍 What’s going on
      
      Your earlier code suggests you’re using Polars, not Pandas — for example from
      your earlier messages (weekly_pd looks like a Polars DataFrame).
      
      But here:
      
      weekly_pd[f"log_{col}"] = np.log(weekly_pd[col] + 1e-6)
      
      that’s Pandas-style assignment — it won’t work with Polars.
      Polars uses expression syntax instead of direct column assignment.
      
      ✅ How to fix it
      
      ...
      
      If you want, I can rewrite your full loop for all transformations (log, sqrt, x^2)
      and automatically print a confusion matrix for each, using only the libraries you
      listed. This will give you a clean, reproducible workflow.
      
      Do you want me to do that?
      
      

      You said: yes please

      I see exactly what’s happening here:
      
      👉 The error comes from this line:
      
      X_train, X_test = X.loc[train], X.loc[~train]
      
      The problem is that X is a NumPy array (because you used .to_numpy() earlier).
      NumPy arrays don’t have .loc (that’s a pandas method).
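Both errors quoted above come down to data types. A minimal sketch with made-up numbers: code the "Up"/"Down" response as 0/1 before handing it to anything that expects a numeric response (for a classification task, a logistic model such as `statsmodels.formula.api.logit` is the usual choice rather than `ols`), and index NumPy arrays with the boolean mask directly instead of `.loc`. The years, values and the pre-2009 training cut-off are assumptions for illustration only.

```python
import numpy as np

# Made-up weekly data: year, two predictors, and an "Up"/"Down" response.
year = np.array([2005, 2006, 2007, 2008, 2009, 2010])
X = np.array([[0.8, 1.2],
              [-0.3, 0.5],
              [1.1, -0.7],
              [-1.4, 0.2],
              [0.4, -0.1],
              [-0.6, 0.9]])
direction = np.array(["Up", "Down", "Up", "Down", "Up", "Down"])

# Numeric 0/1 response: 1 for "Up", 0 for "Down".
y = (direction == "Up").astype(int)

# Boolean mask: True for training rows (assumed here: years before 2009).
train = year < 2009

# NumPy arrays are indexed with the mask directly; .loc is a pandas-only accessor.
X_train, X_test = X[train], X[~train]
y_train, y_test = y[train], y[~train]

print(X_train.shape, X_test.shape)  # (4, 2) (2, 2)
print(y_train, y_test)              # [1 0 1 0] [1 0]
```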
      
      

Asmahene, Jagoda

Efraim, Lynn

  • Good correlation plot.
  • Give your plot files sensible names.
  • Note how to do math in markdown using LaTeX notation.
  • Better explanation and reasoning would be helpful.
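For reference, a small example of math in a markdown README: renderers such as GitHub's treat LaTeX between single dollar signs as inline math and between double dollar signs as display math. The formula (the logistic function) is only an illustration.

```latex
The logistic model gives $p(X) = \Pr(Y = 1 \mid X)$ as

$$
p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}
$$
```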

Romane, Natalia

  • Good explanation and reasoning about the parts that you did.
  • Note how to do math in markdown using LaTeX notation.
  • Good interpretation of kNN tuning plot
  • ChatGPT said:

    There’s no real problem here 🙂 — your commits and push worked perfectly.
    
  • when i add .idea/ to my gitigonre i dont have to commit right?
    ChatGPT said:
    
    Good question — but actually, yes, you do need to commit 😄
    
    Here’s why:
    
    Adding .idea/ to your .gitignore just changes a local text file called .gitignore.
    
    Git treats that file like any other tracked file — so if you want that ignore
    rule to apply for everyone who clones the repo (and to keep it saved permanently),
    you must commit the change.
    
  • what are these problems

    ChatGPT said:
    
    Excellent question — these warnings are actually quite common when you use LDA
    or QDA in scikit-learn, and they indicate numerical or collinearity problems in
    your data. Let’s break each part down clearly 👇
    
    
  • whats the function for KNN in python
  • how do you put bold text in markdown
  • whats a k tunning plot

    ChatGPT said:
    
    Excellent question 👏 — you’re probably referring to a “k tuning plot” (also
    called a KNN tuning curve) used when training a K-Nearest Neighbors (KNN)
    model.
    
    Let’s go over it clearly 👇
    
  • how do i cite a chat gpt conversation in a README in markdown language
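The questions above ask for the kNN function and for what a k tuning plot is. The usual scikit-learn class is `sklearn.neighbors.KNeighborsClassifier`, whose `n_neighbors` parameter is k. A tuning plot shows an error rate on held-out data for a range of k. To keep every step visible, this sketch does the neighbour vote in plain NumPy on synthetic data; `knn_predict` and `error_rates` are hypothetical helper names, and breaking ties (possible only for even k) towards class 1 is an assumption.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Predict 0/1 labels by majority vote among the k nearest training points."""
    # Pairwise Euclidean distances, shape (n_test, n_train).
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k nearest training points for each test point.
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y_train[nearest]  # shape (n_test, k), values 0 or 1
    # Majority vote; ties (possible for even k) go to class 1.
    return (votes.mean(axis=1) >= 0.5).astype(int)

def error_rates(X_train, y_train, X_test, y_test, ks):
    """Test error rate for each k in ks, i.e. the points of a tuning curve."""
    return [(k, float(np.mean(knn_predict(X_train, y_train, X_test, k) != y_test)))
            for k in ks]

# Synthetic data: two overlapping Gaussian clouds, shuffled, then split 300/100.
rng = np.random.default_rng(0)
n = 200
X = np.vstack([rng.normal(0.0, 1.0, size=(n, 2)), rng.normal(1.0, 1.0, size=(n, 2))])
y = np.repeat([0, 1], n)
perm = rng.permutation(2 * n)
X, y = X[perm], y[perm]
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

curve = error_rates(X_train, y_train, X_test, y_test, ks=range(1, 31, 2))
for k, err in curve:
    print(f"k={k:2d}  test error={err:.3f}")
```

Plotting the `curve` pairs (k on the x-axis, error on the y-axis) gives the tuning plot. Very small k tends to overfit and very large k to underfit, so the curve is often roughly U-shaped.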

Matúš, Madalina

  • All commits by Matúš with input from Madalina
  • Data in repo
  • Good explanations of your understanding of what you are seeing.
  • The volume data appear to follow exponential/ polynomial trend. Once we take the cube root, we can see the data shape changing/ becoming more linear.
  • The following table was generated using ChatGPT (link above), I struggled with splitting the data for training:

    Accuracy: 0.994

    Accuracy is fairly high => not trustworthy at all, perhaps due to the high number of predictors.
    Multicolinearity is highly unlikely since no correlation could be observed between the
    remaining variables.
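On the multicollinearity remark quoted above: low pairwise correlations do not by themselves rule out multicollinearity, because one predictor can be (close to) a linear combination of several others. This is also the kind of near-dependence behind the LDA/QDA collinearity warnings quoted earlier in this feedback. A minimal NumPy sketch on synthetic data, comparing the largest pairwise correlation with the condition number of the standardized predictor matrix; rules of thumb vary, but condition numbers above about 30 are often flagged.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 1000, 8

# Eight independent predictors plus a ninth that is (almost) their sum.
Z = rng.normal(size=(n, p))
x_sum = Z.sum(axis=1) + rng.normal(scale=0.01, size=n)
X = np.column_stack([Z, x_sum])

# Pairwise correlations: the largest off-diagonal value is only moderate.
corr = np.corrcoef(X, rowvar=False)
max_abs_corr = np.max(np.abs(corr - np.eye(corr.shape[0])))

# Condition number of the standardized design: very large => near-collinear.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
cond = np.linalg.cond(X_std)

print(f"largest |pairwise correlation|: {max_abs_corr:.2f}")
print(f"condition number:               {cond:.0f}")
```

With this synthetic data the largest pairwise correlation stays well below 0.5, yet the condition number is in the hundreds, because the ninth column is almost exactly the sum of the other eight.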

Yaohong, Giorgia

  • "Volume increases exponentially with year." If that is so then log(Volume) should be approximately linear. That is something you could test!
  • Try a tuning plot for kNN
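One way to run the test suggested above: fit a straight line to log(Volume) against year and compare the fit with a straight line on the raw scale. The sketch uses synthetic, exponentially growing data in place of the real Volume column; the years, the growth rate and the `r_squared` helper are assumptions. With real data it is worth looking at the residuals too, not only at R².

```python
import numpy as np

def r_squared(x, y):
    """R^2 of a least-squares straight line y ~ a + b*x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (intercept + slope * x)
    return 1.0 - residuals.var() / y.var()

# Synthetic stand-in for Volume: ~30% growth per year with multiplicative noise.
rng = np.random.default_rng(2)
year = np.repeat(np.arange(1990, 2011), 52)  # 52 made-up "weeks" per year
volume = np.exp(0.3 * (year - 1990)) * rng.lognormal(sigma=0.2, size=year.size)

print(f"R^2, Volume      vs year: {r_squared(year, volume):.3f}")
print(f"R^2, log(Volume) vs year: {r_squared(year, np.log(volume)):.3f}")
```

On this synthetic series the line fits log(Volume) almost perfectly, while the fit on the raw scale is noticeably worse; if the real data behave the same way, that supports the "exponential" claim.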

Nora, Mariana

  • Your layout and formatting need attention
  • Nice to see you citing resources other than LLMs!

Emilio, Louanne, Roman

  • All commits by Roman?
  • Commit messages a bit monotonous.
  • Good use of plots in the data exploration phase.
  • Clear statement of observations and hypotheses/questions.
  • Limit float precision.
  • does this mean that in 2009 and 2010 the predictions were very good or no

    ChatGPT said:
    
    Excellent question — and it shows you’re thinking critically about
    what the results actually mean 👏
    
    Let’s unpack this clearly:
    
    ...
    
  • DeprecationWarning
    ChatGPT said:
    
    Ah good catch 🔥 — you’re on Polars 1.0+, and both of those warnings are about API changes.
    Let me rewrite the snippet in the new Polars style so it’s clean and future-proof:
    
  • so I want the same plots but the year one the line is weird

    ChatGPT said:
    
    Excellent question — that “weird” line in your Volume vs Year plot
    happens because your data has many rows per year (e.g., weekly
    observations), so Matplotlib connects all those points in order
    — making the line zig-zag heavily.
    
    You’ve got two clean ways to fix that, depending on what you want to show 👇
    
    ...
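A minimal sketch of one way to avoid the zig-zag described above: reduce the weekly rows to one value per year (here the mean) and draw the line through those points only. Plain NumPy is used for the grouping; a group-by in Polars or pandas does the same. The data and the `yearly_means` helper are made up for illustration, and the printout limits float precision as suggested above.

```python
import numpy as np

def yearly_means(year, values):
    """Return the sorted unique years and the mean of `values` within each year."""
    years, inverse = np.unique(year, return_inverse=True)
    sums = np.bincount(inverse, weights=values)
    counts = np.bincount(inverse)
    return years, sums / counts

# Made-up weekly observations: several rows per year.
year = np.array([2008, 2008, 2008, 2009, 2009, 2010, 2010, 2010, 2010])
volume = np.array([1.0, 2.0, 3.0, 2.5, 3.5, 4.0, 5.0, 6.0, 5.0])

years, mean_volume = yearly_means(year, volume)
for y, v in zip(years, mean_volume):
    print(f"{y}: {v:.2f}")  # two decimals are plenty for a summary
# 2008: 2.00
# 2009: 3.00
# 2010: 5.00
```

Passing `years` and `mean_volume` to Matplotlib's `plt.plot` then draws one point per year.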
    

Author: Breanndán Ó Nualláin <o@uva.nl>

Date: 2025-10-09 Thu 09:26