Data Frames

Breaking the ice

Get to know your partner. Here are some questions you might ask each other.

  1. What's your name? Where do you come from?
  2. Which AUC courses have we followed in common?
  3. Are you following the Information Track or have you another focus? How do you see ML fitting into your study programme?
  4. How are you finding this course so far? What do you like about it? Do you have any worries about it?
  5. Have you experimented with Polars yet? Could you figure it out?
  6. How would you like to plan our work together?

git and forgejo

Introduce yourself to git

  • Where are you working? Make a directory structure.
  • Tell git your full name and email

    git config --global user.name "Your Name"
    git config --global user.email "you@example.com"
    
  • Check if git knows your full name and email

    git config --get user.name
    git config --get user.email
    

Make a git repository

  • Only one member of your team should do this!
  • This is my directory structure. You may choose your own.

    cd
    pwd
    cd projects/auc/courses/ml/repos
    mkdir trial-repo
    cd trial-repo
    
  • Now make this a git repo

    git init
    
  • Use your code editor (Spyder, vim, Visual Studio, Emacs) to create a Python file

    def factorial(n):
        if n == 0:
            return 1
        else:
            return n * factorial(n - 1)
    
  • Save it to factorial.py
  • Check the git status and note the branch name

    git status
    
  • Add it to the git index

    git add factorial.py
    
  • Commit it and check the log

    git commit -m "Add factorial file"
    git log
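
As a possible next edit to commit, the function can be checked against Python's standard library. This is only a minimal sketch: check_factorial is a hypothetical helper name, not part of the worksheet, and the definition of factorial is repeated so that the snippet runs on its own.

```python
import math


def factorial(n):
    # Same recursive definition as in factorial.py above.
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)


def check_factorial(limit=10):
    # Hypothetical helper: compare with math.factorial for 0..limit.
    return all(factorial(n) == math.factorial(n) for n in range(limit + 1))


print(check_factorial())
```

Note that negative or non-integer arguments never reach the base case n == 0, so this definition only suits non-negative integers.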
    

Make a remote repo on Forgejo

  • Use the + at the top right to make a new repo.
  • Give it a sensible name, e.g. the same as the repo on your laptop, trial-repo.
  • Under the Advanced settings check that the name of the Default branch matches that on your laptop.
  • Click Create repository
  • The new page shows three headers
    • Under "Clone this repository" you will see HTTPS highlighted. Click on SSH beside it so that SSH is highlighted instead.
    • You can ignore "Creating a new repository on the command line". We have already done most of that.
    • Under "Pushing an existing repository from the command line" you will see two commands. Copy and paste these into your command line, in the directory where you made your repo on your laptop.
  • Reload the Forgejo page. You should see your Python file.

Get your partner on board

  • On the Forgejo page, click Settings (near the top right)
  • On the left click Collaborators
  • In the Search users.. box, type your partner's Forgejo userid.
  • Click "Add collaborator"
  • You should see a line with their name. On the right should be Write. This means that they can write to (and read from) the repo.
  • Now click on the repository name trial-repo near the top left.
  • Your partner(s) should go to this page too. They should see near the top on a red background either HTTPS or SSH. They should click on SSH so that it is red. Then copy the URL beside it. It should be something like

    ssh://forgejo@git.auc-computing.nl/bon/trial-repo.git
    
  • Now your partner(s) should cd to the place in their directory structure where they want to keep their repos for the ML course.
  • They should now clone the repo with this command on the command line

    git clone ssh://forgejo@git.auc-computing.nl/bon/trial-repo.git
    
  • They can cd to that directory, see the Python file and check the git log

    cd trial-repo
    ls -l
    git log
    

Collaborate!

  • Now you and your partner(s) should take turns editing the file(s) in the repo. Run git pull before you start each edit so that you have your partner's latest commits.
  • After each edit
    1. add
    2. commit
    3. push to Forgejo using

      git push
      

Markdown

  • Markdown is a "simple and easy-to-use markup language you can use to format virtually any document."
  • We will use it to write our reports.
  • In your repo, make a markdown file called README.md and push it. Notice how Forgejo formats it beautifully.

LLMs

Exploring data

Section 2.4 of the ISLP book has three exercises (8, 9 and 10), each of which explores a data set step by step. Carry out each of these steps, but use Polars instead of NumPy and pandas.

A zip file containing all data sets for ISLP can be found at datasets

When loading a .csv file using pl.read_csv, be sure to give the full path name of the file, or make sure that Python is working in the directory where the data file is found.
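
Both options can be sketched as follows. The helper names find_csv and load_csv, and the directory and file name in the usage comment, are assumptions for illustration only. Substitute wherever you unzipped the data sets and whatever the files are called on your machine.

```python
from pathlib import Path


def find_csv(name, data_dir="."):
    """Return the absolute path of a CSV file, or report where we looked."""
    path = (Path(data_dir).expanduser() / name).resolve()
    if not path.is_file():
        # Include the working directory, since relative paths depend on it.
        raise FileNotFoundError(
            f"{path} not found (Python is working in {Path.cwd()})"
        )
    return path


def load_csv(name, data_dir="."):
    """Read a CSV file into a Polars DataFrame."""
    # Imported here so that find_csv can be used without Polars installed.
    import polars as pl

    return pl.read_csv(find_csv(name, data_dir))


# Usage (hypothetical directory and file name):
# df = load_csv("College.csv", "~/data/islp")
# print(df.shape)
# print(df.head())
```

With the default data_dir of ".", the file is looked up in the directory where Python is working. If the file is missing, the error message prints that directory, which is usually enough to spot a wrong path.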

Author: Breanndán Ó Nualláin <o@uva.nl>

Date: 2025-09-11 Thu 07:57