Group Members:

Task1: Understanding the data

Follow the link https://github.com/washingtonpost/data-police-shootings to read about the data. Write a three-sentence summary of what this dataset includes and where it comes from. This dataset allows only a small number of race categories and does not recognize that some people are multiracial; this is one weakness of the dataset. Include at least one other observation that may be a weakness of the data.

Written Answer

You can view the CSV by running the code chunk below.

import pandas as pd

# Load the v2 dataset directly from the Washington Post GitHub repository
fatal_police_shootings_data = pd.read_csv('https://raw.githubusercontent.com/washingtonpost/data-police-shootings/refs/heads/master/v2/fatal-police-shootings-data.csv')

# Display the first 10 rows
fatal_police_shootings_data.head(10)

	id	date	threat_type	flee_status	armed_with	city	county	state	latitude	longitude	location_precision	name	age	gender	race	race_source	was_mental_illness_related	body_camera	agency_ids
0	3	2015-01-02	point	not	gun	Shelton	Mason	WA	47.246826	-123.121592	not_available	Tim Elliot	53.0	male	A	not_available	True	False	73
1	4	2015-01-02	point	not	gun	Aloha	Washington	OR	45.487421	-122.891696	not_available	Lewis Lee Lembke	47.0	male	W	not_available	False	False	70
2	5	2015-01-03	move	not	unarmed	Wichita	Sedgwick	KS	37.694766	-97.280554	not_available	John Paul Quintero	23.0	male	H	not_available	False	False	238
3	8	2015-01-04	point	not	replica	San Francisco	San Francisco	CA	37.762910	-122.422001	not_available	Matthew Hoffman	32.0	male	W	not_available	True	False	196
4	9	2015-01-04	point	not	other	Evans	Weld	CO	40.383937	-104.692261	not_available	Michael Rodriguez	39.0	male	H	not_available	False	False	473
5	11	2015-01-04	attack	not	gun	Guthrie	Logan	OK	35.876991	-97.423454	not_available	Kenneth Joe Brown	18.0	male	W	not_available	False	False	101
6	13	2015-01-05	shoot	car	gun	Chandler	Maricopa	AZ	33.327887	-111.840959	not_available	Kenneth Arnold Buck	22.0	male	H	not_available	False	False	195
7	15	2015-01-06	point	not	gun	Assaria	Saline	KS	38.703755	-97.563904	not_available	Brock Nichols	35.0	male	W	not_available	False	False	490
8	16	2015-01-06	accident	not	unarmed	Burlington	Des Moines	IA	40.809250	-91.118875	not_available	Autumn Steele	34.0	female	W	not_available	False	True	287
9	17	2015-01-06	point	not	replica	Knoxville	Allegheny	PA	40.412936	-79.991408	not_available	Leslie Sapp III	47.0	male	B	not_available	False	False	26254

List some of the columns contained in this CSV.

Written Answer

For each of those columns, give some example values from that column and explain what they mean. For example, values in the first column, id, include 22, 325, and 140.

Written Answer

In order to compare the representation of black people among the subjects of fatal police shootings to their representation in the general population, we need to know the percentage of the United States population that is black. You can find this by searching for ‘racial demographics of the United States’. Include a citation with the organization and URL.

Written Answer

Task2: Reading data into a dict

Run the code chunk below. This selects some of the columns (features) from the dataset and puts them into a Python dictionary.

policing = fatal_police_shootings_data[['id', 'name', 'date', 'armed_with', 'age', 'gender', 'race', 'state']]
policing_dict = policing.set_index('id').to_dict(orient='index')

What information from the original CSV is stored in policing_dict? What information from the original CSV do we use as the keys in policing_dict? What is the type of the values in policing_dict?

Student Answer
Hint: create a code chunk with type(policing_dict[key]), where key is one of the IDs, to find the data type of the values in our dictionary.
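As a sketch of what to look for, the toy dictionary below mimics the shape that set_index('id').to_dict(orient='index') produces: each key is an id, and each value is an inner dictionary mapping column names to that row's values. The records and the name toy_dict are made up for illustration; they are not taken from the dataset.

```python
# Hypothetical records shaped like policing_dict: {id: {column: value, ...}}
toy_dict = {
    101: {'name': 'Person A', 'date': '2015-03-01', 'armed_with': 'gun',
          'age': 30.0, 'gender': 'male', 'race': 'W', 'state': 'WA'},
    102: {'name': 'Person B', 'date': '2016-07-12', 'armed_with': 'unarmed',
          'age': 25.0, 'gender': 'female', 'race': 'B', 'state': 'OR'},
}

print(type(toy_dict))       # type of the outer container
print(type(toy_dict[101]))  # type of the value stored under one key
print(list(toy_dict[101]))  # the column names inside that value
```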

Task3: Using the database

Finding a particular record: Find the record of the fatal police shooting with ID number 1694. If you’ve followed current events in the past few years, there should be a familiar name here.

Student Answer
Hint: run policing_dict[1694] in a code chunk.
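Looking up a record is ordinary dictionary indexing by key. A minimal sketch on made-up records (toy_dict and its IDs are hypothetical): square brackets raise a KeyError for an ID that isn't present, while get returns None instead. In this sketch the keys are integers, so the string '102' would not match.

```python
# Hypothetical records shaped like policing_dict
toy_dict = {
    101: {'name': 'Person A', 'state': 'WA'},
    102: {'name': 'Person B', 'state': 'OR'},
}

record = toy_dict[102]        # direct lookup; would raise KeyError if 102 were missing
print(record)

missing = toy_dict.get(999)   # safe lookup; returns None for an absent key
print(missing)

print(102 in toy_dict, 999 in toy_dict)  # membership tests check the keys
```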

Displaying data from a single record: Print the name and state of the individual with ID number 1694.
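Once you have a record, each field is another key lookup on the inner dictionary. A short sketch with a hypothetical record (toy_dict is illustrative only):

```python
# Hypothetical record shaped like one entry of policing_dict
toy_dict = {102: {'name': 'Person B', 'state': 'OR'}}

name = toy_dict[102]['name']    # first index picks the record, second picks the field
state = toy_dict[102]['state']
print(f"{name} ({state})")
```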

Student Answer

Displaying data from multiple records: The following code will display the entire dictionary using a for loop:

# Print every record in the dictionary, one per line
for person, data in policing_dict.items():
    print(policing_dict[person])

Add an if statement to the for loop so that it prints only individuals from Minnesota. This means you’ll need to check the state for each person and only print the data if the state is Minnesota (the state column stores two-letter abbreviations, as in the preview above).
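A minimal sketch of the filtering pattern, run on hypothetical records and filtering on 'WA' so the structure is visible: the if statement inside the loop decides which records get printed. The names toy_dict and printed are illustrative only.

```python
# Hypothetical records shaped like policing_dict
toy_dict = {
    101: {'name': 'Person A', 'state': 'WA'},
    102: {'name': 'Person B', 'state': 'OR'},
    103: {'name': 'Person C', 'state': 'WA'},
}

printed = []  # keep track of which IDs passed the filter
for person, data in toy_dict.items():
    if data['state'] == 'WA':   # only records whose state matches
        print(toy_dict[person])
        printed.append(person)
```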

Student Answer

Task4: Summarizing our data

Creating a new dictionary as a subset: Instead of displaying all the information, let’s create a new dictionary that contains only individuals from Minnesota. The syntax for this is {key: value for key, value in dict.items() if condition}, where condition is what you’re filtering on and dict is your dictionary. Name this new dictionary MN_selection.
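Here is the same kind of filter written as a dictionary comprehension, again on hypothetical records (toy_dict and WA_selection are illustrative names). The result keeps the original keys and inner dictionaries, so it has the same structure as the dictionary it was built from.

```python
# Hypothetical records shaped like policing_dict
toy_dict = {
    101: {'name': 'Person A', 'state': 'WA'},
    102: {'name': 'Person B', 'state': 'OR'},
    103: {'name': 'Person C', 'state': 'WA'},
}

# Keep each (key, value) pair only when the condition on the value is true
WA_selection = {key: value for key, value in toy_dict.items() if value['state'] == 'WA'}
print(WA_selection)
```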

Student Answer

Creating summarization functions: We’d like to create a function that takes in our database dictionary and returns a dictionary with counts of occurrences of each race among subjects of fatal police shootings. The keys in this dictionary should be races, and the corresponding values should be the number of subjects of that race.

Student Answer
The starter code for this function is below:

def get_race_counts(database: dict) -> dict:
    race_counts = {}
    for person, data in database.items():
        # add logic to populate race_counts
        pass  # placeholder so the starter code runs; replace with your logic

    return race_counts
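A common way to tally values is to read the current count with get(key, 0) and add one. The sketch below counts genders rather than races on made-up records (count_genders and toy_dict are hypothetical), so the same pattern can be adapted inside get_race_counts. The standard library's collections.Counter produces the same tallies.

```python
from collections import Counter

def count_genders(database: dict) -> dict:
    counts = {}
    for person, data in database.items():
        gender = data['gender']
        # get returns 0 the first time a value appears, then the running total
        counts[gender] = counts.get(gender, 0) + 1
    return counts

# Hypothetical records shaped like policing_dict
toy_dict = {
    101: {'gender': 'male'},
    102: {'gender': 'female'},
    103: {'gender': 'male'},
}

gender_counts = count_genders(toy_dict)
print(gender_counts)
print(Counter(d['gender'] for d in toy_dict.values()) == gender_counts)
```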

Using a summarization function: Print the fraction of fatal police shootings with a black subject. This should be a number between 0 and 1, and can be computed by dividing the number of fatal police shootings with a black subject, by the total number of fatal police shootings.

Student Answer
Note: the dictionary method get and the built-in functions len and round may be helpful.
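A sketch of the arithmetic with made-up data (toy_dict and toy_counts are hypothetical): get supplies 0 if a category never appears, len gives the total number of records, and round trims the result to a few decimal places. Note that the denominator counts every record, including any whose race is missing.

```python
# Hypothetical records and the race counts they would produce
toy_dict = {101: {'race': 'W'}, 102: {'race': 'B'}, 103: {'race': 'H'}, 104: {'race': 'W'}}
toy_counts = {'W': 2, 'B': 1, 'H': 1}

# (number of records in the category) / (total number of records)
fraction_b = round(toy_counts.get('B', 0) / len(toy_dict), 3)
print(fraction_b)  # 0.25
```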

How does the proportion of black subjects in fatal police shootings compare to the proportion of black people in the United States population?

Student Answer

Task5: Comparing Summarizations

Create a new dictionary called unarmed_selection, built using a dictionary comprehension on the original dictionary policing_dict. This dictionary should have the same structure as policing_dict, except that it contains only entries for fatal police shootings where the subject was unarmed.

Student Answer

Create a new dictionary, called unarmed_race_counts, by applying the get_race_counts function to the dictionary unarmed_selection. This dictionary counts the number of occurrences of each race among subjects of fatal police shootings, including only those where the subject was unarmed.

Student Answer

Print the fraction of unarmed fatal police shootings with a black subject. This should be a number between 0 and 1, and can be computed by dividing the number of unarmed fatal police shootings with a black subject (you can get this from the dictionary unarmed_race_counts) by the total number of unarmed fatal police shootings.

Student Answer

How does the proportion of black subjects in fatal police shootings where the subject was unarmed compare to the proportion of black people in the United States population? How does it compare to the proportion of black subjects in all fatal police shootings?

Student Answer

Task6: Trends over time

Creating a list of a single feature: Let’s now create a list from our data. Edit the code below so that the list contains only dates (hint: do something to data). Then use string slicing to remove everything except the year. Name this list year_col; it should contain only the year for each record in our database.

Student Answer
Code to edit: [data for person, data in policing_dict.items()]
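Dates in the CSV are read as strings in YYYY-MM-DD form (pd.read_csv leaves them as text unless asked to parse them), so the year is the first four characters. A sketch on hypothetical records (toy_dict, dates, and years are illustrative names):

```python
# Hypothetical records shaped like policing_dict
toy_dict = {
    101: {'date': '2015-03-01'},
    102: {'date': '2016-07-12'},
    103: {'date': '2015-11-30'},
}

dates = [data['date'] for person, data in toy_dict.items()]      # one date string per record
years = [data['date'][:4] for person, data in toy_dict.items()]  # keep only 'YYYY'
print(dates)
print(years)
```

The years stay strings here; wrap the slice in int(...) if you want numbers instead.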

Creating counts from a list: Summarize the list you just created into a dictionary where each key is a single year and the value is the number of records from that year. Note: if you do this with a dictionary comprehension, use the list method count and set(year_col).
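A sketch of that comprehension on a hypothetical list of years (toy_years and year_counts are illustrative names): set removes duplicates so each year is counted once, and count tallies its occurrences. Because sets are unordered, sorting the items makes any trend easier to read. list.count rescans the whole list for every distinct year, which is fine for a handful of years; collections.Counter does the same job in a single pass.

```python
from collections import Counter

toy_years = ['2015', '2016', '2015', '2017', '2016', '2015']

year_counts = {year: toy_years.count(year) for year in set(toy_years)}

# Print in chronological order; four-digit year strings sort correctly
for year, n in sorted(year_counts.items()):
    print(year, n)

print(year_counts == Counter(toy_years))  # same tallies either way
```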

Student Answer

How does the number of fatal police shootings change over time? Do you notice any patterns or trends?

Student Answer

Task7: More stories to be told

Sources:
* U.S. Crime and Arrest Data (FBI UCR)
* U.S. Demographic Data (Census Bureau)
* Police Departments’ Use of Force Data (BJS)
* Pew Research on Policing
* Other relevant sources (Reddit, social media, etc.)

List at least two sources you found informative. Include the organization, URL, and a brief description of the information the source contains. A source could be an additional dataset or the results of an analysis someone else has done. Our goal is to link the insights from the previous tasks to insights found elsewhere.

Student Answer

Come up with at least three questions/insights that could supplement the analysis you have done so far. These questions should address some aspects of the following:

validation: how can the data from these sources support or challenge the findings you’ve already made about racial disparities in fatal police shootings?
expansion: how can additional data be used to expand upon the analysis in new areas, or further explore systemic issues related to police shootings?
comparisons: how do the findings from these sources compare to the conclusions you’ve drawn in other tasks? Are there any contradictions or surprising findings?

Student Answer

Task8: Reflection

Write a reflection (at least five sentences) on what you learned from this project. This can include your reaction to the results of the project, as well as the process of working with the data.

Student Answer

When your group is done, each partner must submit a link to their workbook to Moodle. Make sure the link you share is public.