Name:
Who you worked with:
## Objectives

The goals of this project are to:

* Understand the process of exploratory data analysis (EDA), statistical tests, and visualizations.
* Develop the ability to critique and improve code generated by a language model (like ChatGPT or DeepSeek).
* Create two publication-style visualizations that tell a story.
## Overview

This project is designed to reinforce some of the concepts we have learned while exploring data. For Task 1, you will give a generative AI model a prompt to produce Python code that analyzes and summarizes the Titanic dataset. For Task 2, you will critique the generated program by identifying three areas of improvement. For Task 3, you will implement the improvements you identified. For Task 4, you will create two publication-style visualizations that tell a story. For Task 5, you will reflect on this assignment.
## Schedule

Here is the suggested schedule for working on this project:

* By Thursday, 2/13, read through the project instructions.
* By Sunday, 2/16, complete Tasks 1-2 of the project, and start Task 3 of the project.
* By Tuesday, 2/18, complete Task 3 of the project, and start Task 4.
* By Wednesday, 2/19, complete Tasks 4-5, check your solutions against the rubric (included at the end of these instructions), and submit your files through Moodle.
This project is due on Thursday, 2/20, by 11:59pm.
# Task 1: Generate an analysis
Use the following prompt in ChatGPT (or another language model, such as DeepSeek) to generate Python code for analyzing the Titanic dataset:
Write a Python script using pandas, matplotlib, and seaborn that loads the Titanic dataset using seaborn, performs exploratory data analysis (EDA) including basic statistics, visualizes relationships between features using seaborn (include three different chart types), and conducts and interprets three separate statistical tests to examine the relationship between survival and other features. Summarize the results of the analysis, and do not perform any data cleaning or preprocessing.
Model used:
Part 1: Generate Code
Add the output to a code chunk below.
If your code does not initially run, you may want to comment some of it out and return to it during Task 2.
Part 2: Outline Analysis
With the program that was generated, fill in the methods taken for each of the points below (keep it general). Some examples have been filled in for you.
Once you have listed these out, add a quick explanation of why each step was done (i.e., what did it accomplish? What do you know about the data now?)
Steps taken below:
- Basic statistics:
head, describe, isnull + sum
- Visualizations:
sns.boxplot
- Statistical tests:
stats.ttest_ind
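
As a rough illustration of what the example entries above refer to, the sketch below runs each call on a small hand-made table. The rows are invented for illustration and are not Titanic records; the column names `survived`, `age`, and `fare` are assumed to mirror the seaborn Titanic dataset. The `sns.boxplot` entry is left out to keep the sketch free of plotting.

```python
import pandas as pd
from scipy import stats

# Toy rows invented for illustration only (NOT real Titanic data).
# Column names are assumed to mirror seaborn's Titanic dataset.
toy = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0, 0, 1],
    "age": [22.0, 38.0, 26.0, 35.0, None, 54.0, 2.0, 27.0],
    "fare": [7.25, 71.28, 7.92, 8.05, 53.10, 51.86, 21.08, 11.13],
})

# Basic statistics
first_rows = toy.head()        # peek at the first five rows
summary = toy.describe()       # count, mean, std, and quartiles per numeric column
missing = toy.isnull().sum()   # number of missing values per column

# Statistical test: compare the mean fare of survivors and non-survivors
fare_survived = toy.loc[toy["survived"] == 1, "fare"]
fare_died = toy.loc[toy["survived"] == 0, "fare"]
t_stat, p_value = stats.ttest_ind(fare_survived, fare_died)

print(missing)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

In your own outline, these calls would run on the DataFrame that your generated script loads, and the explanation for each step should say what you learned from its output.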
# Task 2: Critique the analysis
Time to hack this code apart. The goal is to identify areas for improvement in the analysis that was generated.
Part 1: Questions to consider
Some initial questions to consider are listed below. Add three more questions you may want to consider when looking over EDA, visualizations, and code in general.
Example questions:

* Is the choice of statistical test appropriate for the analysis?
* Are there interpretations of statistical tests beyond p-value?
* Are the visualizations clear, informative, and easy to interpret?
* Are there any redundancies or areas where the code could be simplified?
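
One possible way to look beyond the p-value is to report an effect size alongside the test. The sketch below computes Cohen's d with a pooled standard deviation. The helper name `cohens_d` is hypothetical and the input values are made up for illustration; this is one option among several, not the required approach.

```python
import numpy as np

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    n_a, n_b = len(a), len(b)
    # Pooled variance weights each group's sample variance by its degrees of freedom
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Made-up values for illustration only (not Titanic data).
group_a = [2.0, 4.0, 6.0]
group_b = [1.0, 3.0, 5.0]
d = cohens_d(group_a, group_b)
print(f"Cohen's d = {d:.2f}")  # prints "Cohen's d = 0.50"
```

A p-value says whether a difference is distinguishable from noise; an effect size like this says how large the difference is relative to the spread of the data.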
Additional questions to consider:
Part 2: Improvements
Using all the questions above, list out three improvements that you would like to implement.
For each improvement, describe where in the analysis it would occur (e.g., visualizations, statistical tests), why you would like to implement it, and what additional information is gained by doing so (i.e., what was the purpose of doing it?)
Improvements:
# Task 3: Implement improvements
Now, time to implement your improvements! Your goal is to edit the original prompt to include the improvements you came up with, and then generate code with those improvements included.
Part 1: Updated Prompt
You may find that you will have to iteratively update the prompt to get the exact output you’re looking for. Once you have a final prompt that implements all of the improvements, add it to the text box below.
Part 2: Updated Code
Now add the code generated from the updated prompt below.
# Task 4: Visualizations that tell a story
For this task, you need to create two visualizations that could appear in a publication like The Pudding, The New York Times, or FiveThirtyEight. Each visualization should aim to tell a story about the Titanic dataset, focusing on the relationship between survival and key variables (e.g., class, sex, age).
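
As a starting point, the sketch below shows a few habits common in publication-style charts: a headline that states the takeaway, values labeled directly on the bars, and minimal axis chrome. The helper `plot_survival_rate` is hypothetical, the rows are invented for illustration (not real Titanic data), and the column names `pclass` and `survived` are assumed to match the seaborn Titanic dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

def plot_survival_rate(df, by, headline):
    """Bar chart of survival rate per group with direct labels and minimal chrome."""
    rates = df.groupby(by)["survived"].mean().sort_index()
    fig, ax = plt.subplots(figsize=(6, 4))
    bars = ax.bar(rates.index.astype(str), rates.values, color="#4c72b0")
    # Label each bar directly so readers do not need to read values off an axis
    for bar, rate in zip(bars, rates.values):
        ax.text(bar.get_x() + bar.get_width() / 2, rate + 0.02,
                f"{rate:.0%}", ha="center", va="bottom")
    ax.set_title(headline, loc="left", fontweight="bold")
    ax.set_xlabel(by)
    ax.set_ylim(0, 1.1)
    ax.set_yticks([])  # the direct labels replace the y-axis ticks
    for side in ("top", "right", "left"):
        ax.spines[side].set_visible(False)
    fig.tight_layout()
    return fig, rates

# Toy rows invented for illustration only (NOT real Titanic data).
toy = pd.DataFrame({
    "pclass": [1, 1, 2, 2, 3, 3, 3, 3],
    "survived": [1, 1, 1, 0, 0, 0, 1, 0],
})
fig, rates = plot_survival_rate(
    toy, by="pclass", headline="Placeholder headline: state the takeaway here"
)
```

Calling `fig.savefig(...)` on the returned figure would write it to disk. For your own visualizations, replace the toy table with the real dataset and write a headline that reflects what the data actually shows.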
Part 1: Viz 1
Part 2: Viz 2
# Task 5: Reflection
Take a moment to reflect on this assignment. What did you like about it? What could be improved? Your answers will not affect your overall grade. This feedback will be used to improve future programming assignments.