
Each lab will be completed with a lab partner. The group decides which member is the recorder (copies the Colab notebook and takes notes) and which is the researcher (searches documentation, slide decks, and class materials). The recorder should submit a clickable link to the Moodle submission by the next lecture.
Lab: INTRO
In this lab, we explore how to build our own Python dictionaries and functions to store and summarize data. Using a national policing dataset, we examine the impact of policing on Black Americans. Click to view Colab Notebook
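As a warm-up for the dictionary-and-function pattern this lab practices, here is a minimal sketch. The records, the field names (`group`, `value`), and the helper `summarize_by_key` are all hypothetical placeholders, not the lab's actual dataset or solution.

```python
from collections import defaultdict

# Hypothetical records; the lab's real dataset and column names will differ.
records = [
    {"group": "A", "value": 12},
    {"group": "B", "value": 7},
    {"group": "A", "value": 5},
]


def summarize_by_key(rows, key, value):
    """Group rows by `key` and return count, total, and mean of `value`."""
    totals = defaultdict(lambda: {"count": 0, "total": 0})
    for row in rows:
        entry = totals[row[key]]
        entry["count"] += 1
        entry["total"] += row[value]
    # Add the mean once all rows have been accumulated.
    return {
        k: {"count": v["count"], "total": v["total"], "mean": v["total"] / v["count"]}
        for k, v in totals.items()
    }


summary = summarize_by_key(records, "group", "value")
print(summary["A"])
```

Keeping the summary in a dictionary keyed by group makes it easy to look up or compare groups later without re-scanning the raw rows.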
Lab: EDA
In this lab, we will explore how to critique and refine code produced by a language model, and practice a bit of prompt engineering. We will use the Titanic dataset to perform exploratory data analysis tasks, including calculating statistical measures, conducting statistical tests, and creating visualizations. Click to view Colab Notebook
Lab: EVL
In this lab, we will explore the Wisconsin Breast Cancer dataset and focus on key aspects of model evaluation and resampling techniques. We will comment on the complexity of the models chosen in a developed workflow and discuss resampling techniques and their implications for various evaluation metrics. The lab finishes with an audit of the algorithms using the ethical matrix framework discussed in class. Click to view Colab Notebook
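To make the resampling and evaluation-metric ideas concrete, the sketch below shows one way to split sample indices into k folds and to compute precision and recall from predictions. The function names `k_fold_indices` and `precision_recall` are illustrative assumptions, not part of the lab notebook, and the folds are contiguous and unshuffled, which is a simplification.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


folds = list(k_fold_indices(10, 3))
scores = precision_recall([1, 0, 1, 1], [1, 1, 0, 1])
```

In a medical setting such as tumor diagnosis, recall on the malignant class often matters more than overall accuracy, which is one reason to look at several metrics across folds rather than a single number.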
Lab: CLS
In this lab, we will work with image data and explore how decisions made in the modeling workflow affect the results of our clustering models. There is a selection of open-ended questions (give your best guess on these) along with a few questions that require you to implement your own clustering functions. Click to view Colab Notebook
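For a sense of what a hand-written clustering function can look like, here is a minimal sketch of a single k-means iteration (assign points, then update centroids). This is an assumed illustration using k-means on toy 2-D points; the lab may ask for a different algorithm, and the names `assign_clusters` and `update_centroids` are hypothetical.

```python
def assign_clusters(points, centroids):
    """Label each point with the index of its nearest centroid."""
    labels = []
    for p in points:
        # Squared Euclidean distance is enough for finding the nearest centroid.
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of its assigned points."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Keep the old centroid if no points were assigned to it.
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
    return new_centroids


# Toy data standing in for flattened image features.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids = [(0, 0), (10, 10)]
labels = assign_clusters(points, centroids)
centroids = update_centroids(points, labels, centroids)
```

Repeating these two steps until the labels stop changing gives the usual k-means loop; the choice of initial centroids and of k are examples of the workflow decisions that can change the results.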
Lab: DATA
In this lab, we will perform various data cleaning techniques and transformations on a rental housing dataset. The last task in this lab requires you to go out and view analyses that other coders have published using this dataset, bringing back at least one interesting code snippet you saw. Click to view Colab Notebook
Lab: TREE
In this lab, we will train a tree-based model that uses the measurements of a tumor to diagnose it as benign or malignant. Click to view Colab Notebook
Lab: Models
For this two-part lab, you will design (Part 1) and implement (Part 2) a workflow to determine the attributes that make a cookie delicious. The dataset we'd like to use was collected by the authors of Cookie Monster and contains historical data on 5K cookies, with attributes such as baked_temp and sugar_index, among others.