Data Frames
Breaking the ice
Get to know your partner. Here are some questions you might ask each other.
- What's your name? Where do you come from?
- Which AUC courses have we followed in common?
- Are you following the Information Track or have you another focus? How do you see ML fitting into your study programme?
- How are you finding this course so far? What do you like about it? Do you have any worries about it?
- Have you experimented with Polars yet? Could you figure it out?
- How would you like to plan our work together?
git and forgejo
Introduce yourself to git
- Where are you working? Make a directory structure.
Tell git your full name and email
    git config --global user.name "Your Name"
    git config --global user.email "you@example.com"
Check if git knows your full name and email
    git config --get user.name
    git config --get user.email
Make a git repository
- Only one member of your team should do this!
This is my directory structure. You may choose your own.
    cd
    pwd
    cd projects/auc/courses/ml/repos
    mkdir trial-repo
    cd trial-repo
Now make this a git repo
git init
Use your code editor (Spyder, vim, Visual Studio, Emacs) to edit a Python file
    def factorial(n):
        if n == 0:
            return 1
        else:
            return n * factorial(n - 1)
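Before committing the file, it can help to check that the function does what you expect. The following is only a minimal sketch: it repeats the function so that it runs on its own, and compares it with `math.factorial` from the standard library. The test inputs are chosen purely for illustration.

```python
import math


def factorial(n):
    """Return n! for a non-negative integer n, computed recursively."""
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)


# Compare against the standard library for a few small inputs.
for n in range(6):
    assert factorial(n) == math.factorial(n)

print(factorial(5))  # 120
```

Note that this recursive version assumes `n` is a non-negative integer. A negative `n` never reaches the base case, so Python would stop with a `RecursionError`.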
- Save it to factorial.py
Check the git status and note the branch name
git status
Add it to the git index
git add factorial.py
Commit it and check the log
    git commit -m "Add factorial file"
    git log
Make a remote repo on Forgejo
- Use the + at the top right to make a new repo.
- Give it a sensible name, e.g. the same as the repo on your laptop: trial-repo
- Under Advanced settings, check that the name of the Default branch matches that on your laptop.
- Click Create repository
- There are three headers
  - Under "Clone this repository" you will see HTTPS highlighted. Click on SSH beside it so that SSH is highlighted.
  - You can ignore "Creating a new repository on the command line". We have done most of that.
  - Under "Pushing an existing repository from the command line" you will see two commands. Copy and paste these into your command line, in the directory where you made your repo on your laptop.
- Reload the Forgejo page. You should see your Python file.
Get your partner on board
- On the Forgejo page, click Settings (near the top right).
- On the left, click Collaborators.
- In the "Search users..." box, type your partner's Forgejo userid.
- Click "Add collaborator".
- You should see a line with their name. On the right should be Write. This means that they can write to (and read from) the repo.
- Now click on the repository name trial-repo near the top left. Your partner(s) should go to this page too. Near the top they should see either HTTPS or SSH on a red background. They should click on SSH so that it is red, then copy the URL beside it. It should be something like ssh://forgejo@git.auc-computing.nl/bon/trial-repo.git
- Now your partner(s) should cd to the place in their directory structure where they want to keep their repos for the ML course. They should now clone the repo with this command on the command line:
    git clone ssh://forgejo@git.auc-computing.nl/bon/trial-repo.git
They can cd to that directory, see the Python file and check the git log:
    cd trial-repo
    ls -l
    git log
Collaborate!
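- Now you and your partner(s) should take turns editing the file(s) in the repo. Run git pull before you start editing so that you have your partner's latest commits; otherwise your push may be rejected.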
- After each edit
  - add
  - commit
  - push to Forgejo using
    git push
Markdown
- Markdown is a "simple and easy-to-use markup language you can use to format virtually any document."
- We will use it to write our reports.
- In your repo, make a markdown file called README.md and push it. Notice how Forgejo formats it beautifully.
Exploring data
Section 2.4 of the ISLP book has three exercises (8, 9 and 10), each of which explores a data set step by step. Carry out each of these steps, but use Polars instead of NumPy and pandas.
A zip file containing all data sets for ISLP can be found at datasets
When loading a .csv file using pl.read_csv be sure to give the full path name
of the file or make sure that Python is working in the directory where the data
file is found.