Exam 1

Exam rules

This exam takes place during regular class time on Thursday 30th October 2025 from 09:00-10:45 in classroom 2.05.

This is an open exam: you may consult any resources during the exam, provided that you observe good citation practice. You must cite every source you use, such as documentation, Stack Overflow, or blog posts, by providing a link to its URL. Failure to cite your sources may be considered plagiarism.

If you use Large Language Models, such as ChatGPT, you must link to your entire dialogue or provide a copy of it in your repo.

During the exam you may not communicate with any other person.

Submission

You should make a git repository to hold your work and push it to the AUC Forgejo server in the same way as you have done for each of your weekly assignments.

Part of your grade will be for good git discipline. Each commit should be a single, logical unit of work. Commit often, commit small. Use clear and descriptive commit messages.

Push frequently to avoid losing work in the event of a problem with your laptop.

Your grade will be based on the work you push to the Forgejo server before the end of the exam.

Instructions

A Californian investor wishes to invest in housing. He has collected data about geographical areas in the state called districts. Each district typically has a population of 600 to 3,000 people. For each district, the investor has collected data such as the population, median income, and median housing price.

The housing dataset

The investor wants to know whether he can predict the median housing price for other districts for which he has all of the data except the median housing price. This is the task he has set you.

Your task is to write a report for the investor on the quality of the dataset and to construct a regression model which predicts the median housing price of a district from some or all of the other data provided about that district.

Your report should use data-driven arguments to convince the investor of the soundness and validity of the model.

You may choose to follow some or all of the following steps.

  1. Analyse the dataset, identifying characteristics that might be useful or indeed problematic for modelling.
  2. Examine each feature of the dataset, making observations about it which might have a bearing on your modelling.
  3. Examine the features pairwise for possible correlation. Make recommendations on feature selection.
  4. Select a performance metric and motivate your choice.
  5. Make explicit any assumptions you make about the data.
  6. Clean the data where necessary.
  7. Select and apply transformations of features to make them more amenable to modelling.
  8. Select a regression model and motivate your choice.
  9. Split the data into training and test sets.
  10. Analyse the performance of your model.
  11. Fine-tune the model.
  12. Deliver a final model together with its performance metric.
  13. If you have ideas for improving the model but do not have the time to carry them out, write them in a section on future work.

Author: Breanndán Ó Nualláin <o@uva.nl>

Date: 2025-10-30 Thu 08:59