Follow the link https://github.com/washingtonpost/data-police-shootings to read about the data. Write a three-sentence summary of what this data includes and where it comes from. The dataset allows only a small number of categories for race and fails to recognize that some people are multiracial, which is a weakness. Include at least one other observation that may be a weakness of the data.
Written Answer
You can view the CSV by running the code chunk below.
import pandas as pd
fatal_police_shootings_data = pd.read_csv('https://raw.githubusercontent.com/washingtonpost/data-police-shootings/refs/heads/master/v2/fatal-police-shootings-data.csv')
fatal_police_shootings_data.head(10)
| | id | date | threat_type | flee_status | armed_with | city | county | state | latitude | longitude | location_precision | name | age | gender | race | race_source | was_mental_illness_related | body_camera | agency_ids |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 2015-01-02 | point | not | gun | Shelton | Mason | WA | 47.246826 | -123.121592 | not_available | Tim Elliot | 53.0 | male | A | not_available | True | False | 73 |
| 1 | 4 | 2015-01-02 | point | not | gun | Aloha | Washington | OR | 45.487421 | -122.891696 | not_available | Lewis Lee Lembke | 47.0 | male | W | not_available | False | False | 70 |
| 2 | 5 | 2015-01-03 | move | not | unarmed | Wichita | Sedgwick | KS | 37.694766 | -97.280554 | not_available | John Paul Quintero | 23.0 | male | H | not_available | False | False | 238 |
| 3 | 8 | 2015-01-04 | point | not | replica | San Francisco | San Francisco | CA | 37.762910 | -122.422001 | not_available | Matthew Hoffman | 32.0 | male | W | not_available | True | False | 196 |
| 4 | 9 | 2015-01-04 | point | not | other | Evans | Weld | CO | 40.383937 | -104.692261 | not_available | Michael Rodriguez | 39.0 | male | H | not_available | False | False | 473 |
| 5 | 11 | 2015-01-04 | attack | not | gun | Guthrie | Logan | OK | 35.876991 | -97.423454 | not_available | Kenneth Joe Brown | 18.0 | male | W | not_available | False | False | 101 |
| 6 | 13 | 2015-01-05 | shoot | car | gun | Chandler | Maricopa | AZ | 33.327887 | -111.840959 | not_available | Kenneth Arnold Buck | 22.0 | male | H | not_available | False | False | 195 |
| 7 | 15 | 2015-01-06 | point | not | gun | Assaria | Saline | KS | 38.703755 | -97.563904 | not_available | Brock Nichols | 35.0 | male | W | not_available | False | False | 490 |
| 8 | 16 | 2015-01-06 | accident | not | unarmed | Burlington | Des Moines | IA | 40.809250 | -91.118875 | not_available | Autumn Steele | 34.0 | female | W | not_available | False | True | 287 |
| 9 | 17 | 2015-01-06 | point | not | replica | Knoxville | Allegheny | PA | 40.412936 | -79.991408 | not_available | Leslie Sapp III | 47.0 | male | B | not_available | False | False | 26254 |
List some of the columns contained in this CSV.
Written Answer
For each of those columns, give some example values and explain what they mean. For example, some values in the first column, id, are 22, 325, and 140.
Written Answer
In order to compare the representation of black people among the subjects of fatal police shootings to their representation in the general population, we need to know the percentage of the United States population that is black. You can find this by googling ‘racial demographics of the United States’. Include a citation with the organization and url.
Written Answer
Task2: Reading data into a dict
Run the code chunk below. This will select some of the features from the dataset and put them into a Python dictionary.
What information from the original csv is stored in policing_dict? What information from the original csv do we use as the keys in policing_dict? What is the type of the values in policing_dict? Hint: use type(dict[key]).
Student Answer Hint: create a code chunk with type(dict[key]) to find the data type of the values in our dictionary
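If it helps to see the hint in action first, here is a minimal sketch on a tiny stand-in dictionary. The name sample_dict, the placeholder names, and the chosen fields are assumptions for illustration only; the real policing_dict may be structured differently, which is exactly what the type check is meant to reveal.

```python
# Hypothetical stand-in for policing_dict (an assumption, not the real data):
# keys are record ids, values are dictionaries of a few selected fields.
sample_dict = {
    3: {"name": "Person A", "state": "WA", "race": "A"},
    4: {"name": "Person B", "state": "OR", "race": "W"},
}

first_key = next(iter(sample_dict))      # grab any one key from the dictionary
key_type = type(first_key)               # type of the keys
value_type = type(sample_dict[first_key])  # type of the values

print(key_type)
print(value_type)
print(sample_dict[first_key]["state"])   # one field of one record
```

Running the same two type(...) calls on policing_dict should tell you how to reach an individual field of a record.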
Task3: Using the database
Finding a particular record: Find the record of the fatal police shooting with ID number 1694. If you’ve followed current events in the past few years, there should be a familiar name here.
Student Answer Add policing_dict[1694] to a code chunk
Displaying data from a single record: Print the name and state of the individual with ID number 1694.
Student Answer
Displaying data from multiple records: The following code will display the entire dictionary using a for loop:
for person, data in policing_dict.items():
    print(policing_dict[person])
Add an if statement to the for loop so that it prints only individuals from Minnesota. This means you’ll need to check the state for each person and only print the data if the state is Minnesota.
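As a rough illustration of the pattern, the sketch below filters a toy dictionary by a different state. The name sample_dict, the placeholder records, and the "state" field name are assumptions. In the CSV preview, states appear as two-letter abbreviations such as WA, so Minnesota is presumably stored as MN, but check this against your own dictionary.

```python
# Hypothetical stand-in records; the "state" field name is an assumption.
sample_dict = {
    3: {"name": "Person A", "state": "WA"},
    4: {"name": "Person B", "state": "OR"},
    5: {"name": "Person C", "state": "WA"},
}

matches = []  # ids of the records that pass the filter
for person, data in sample_dict.items():
    if data["state"] == "WA":   # only act on records from the chosen state
        print(person, data)
        matches.append(person)
```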
Student Answer
Task4: Summarizing our data
Creating a new dictionary as a subset: Instead of displaying all the information, let’s create a new dictionary that only contains individuals from Minnesota. The syntax for this is: {key: values for key, values in dict.items() if condition} where condition is what you’re filtering on and dict is your dictionary. Name this new dictionary MN_selection.
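Here is a minimal sketch of a dictionary comprehension on toy data, written with the surrounding curly braces that make the result a dictionary. The names sample_dict and wa_selection, and the "state" field, are hypothetical and only illustrate the shape of the expression.

```python
# Hypothetical stand-in records; the "state" field name is an assumption.
sample_dict = {
    3: {"name": "Person A", "state": "WA"},
    4: {"name": "Person B", "state": "OR"},
    5: {"name": "Person C", "state": "WA"},
}

# Keep only the key/value pairs whose record satisfies the condition.
wa_selection = {
    key: values
    for key, values in sample_dict.items()
    if values["state"] == "WA"
}

print(wa_selection)
```

The original dictionary is left unchanged; the comprehension builds a new one with the same structure and fewer entries.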
Student Answer
Creating summarization functions: We’d like to create a function that takes in our database dictionary and returns a dictionary with counts of occurrences of each race among subjects of fatal police shootings. The keys in this dictionary should be races, and the corresponding values should be the number of subjects of that race.
Student Answer The starter code for this function is below:
def get_race_counts(database: dict) -> dict:
    race_counts = {}
    for person, data in database.items():
        # add logic to populate race counts
        pass
    return race_counts
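One common way to build up counts in a dictionary is to use get with a default of 0, so that a key seen for the first time starts from zero. The sketch below shows that pattern on a plain list; count_values and the toy list are hypothetical and are not part of the starter code.

```python
def count_values(items: list) -> dict:
    """Count how many times each item appears in a list."""
    counts = {}
    for item in items:
        # get returns 0 when the item has not been seen yet
        counts[item] = counts.get(item, 0) + 1
    return counts


letter_counts = count_values(["W", "H", "W", "B", "W"])
print(letter_counts)
```

The same idea can be adapted to loop over the records of a database dictionary instead of a list.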
Using a summarization function: Print the fraction of fatal police shootings with a black subject. This should be a number between 0 and 1, and can be computed by dividing the number of fatal police shootings with a black subject, by the total number of fatal police shootings.
Student Answer Note: using get, len, and round may be helpful.
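To see how get, len, and round fit together, here is a small sketch on made-up values. The names toy_races and toy_counts and their contents are assumptions for illustration, not results from the dataset.

```python
# Made-up values for illustration only.
toy_races = ["W", "H", "W", "B", "W"]        # one entry per record
toy_counts = {"W": 3, "H": 1, "B": 1}        # counts per category

total = len(toy_races)                        # total number of records

# get avoids a KeyError if a category never appears in the counts
fraction_b = round(toy_counts.get("B", 0) / total, 2)
fraction_missing = toy_counts.get("N", 0) / total

print(fraction_b)
print(fraction_missing)
```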
How does the proportion of black subjects in fatal police shootings compare to the proportion of black people in the United States population?
Student Answer
Task5: Comparing Summarizations
Create a new dictionary called unarmed_selection, which is built using a dictionary comprehension on the original dictionary policing_dict. This dictionary should have the same structure as policing_dict, except it will only contain entries for fatal police shootings where the subject was unarmed.
Student Answer
Create a new dictionary called unarmed_race_counts by calling the get_race_counts function on the dictionary unarmed_selection. The purpose of this dictionary is to count the number of occurrences of each race among subjects of fatal police shootings, including only those where the subject was unarmed.
Student Answer
Print the fraction of unarmed fatal police shootings with a black subject. This should be a number between 0 and 1, and can be computed by dividing the number of unarmed fatal police shootings with a black subject (you can get this from the dictionary unarmed_race_counts) by the total number of unarmed fatal police shootings.
Student Answer
How does the proportion of black subjects in fatal police shootings where the subject is unarmed compare to the proportion of black people in the United States population? How does it compare to the proportion of black subjects in all fatal police shootings?
Student Answer
Task6: Trends over time
Creating a list of a single feature: Let’s now take a look at creating a list with our data. Edit the code below so that only dates are shown (hint: do something to data). Then get rid of anything that isn’t the year (use some string slicing). Name the resulting list year_col; it should contain only the year for each record in our database.
Student Answer Edit [data for person, data in policing_dict.items()]
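The dates in the CSV preview look like 2015-01-02, so the year is the first four characters. A minimal sketch of both steps on toy data follows; sample_dict, the "date" field name, and the assumption that dates are stored as strings are all hypothetical, so check them against your own dictionary.

```python
# Hypothetical stand-in records; dates are assumed to be strings
# in the same YYYY-MM-DD form as the CSV preview.
sample_dict = {
    3: {"date": "2015-01-02"},
    4: {"date": "2016-07-14"},
}

# Step 1: pull a single field out of every record.
date_sample = [data["date"] for person, data in sample_dict.items()]

# Step 2: slice each string to keep only the first four characters.
year_sample = [date[:4] for date in date_sample]

print(date_sample)
print(year_sample)
```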
Creating counts from a list: Summarize the list you just created into a dictionary where each key is a single year and the value is the count of that year in our database. Note: If you do this with a dictionary comprehension, use the list method count and set(year_col).
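As a sketch of how count and set combine, the example below uses a short made-up list; toy_years and toy_year_counts are hypothetical names. Here set removes the duplicates so each distinct value becomes one key, and count tallies how often that value occurs in the full list.

```python
# Made-up list of year strings for illustration only.
toy_years = ["2015", "2015", "2016", "2015", "2016"]

# One key per distinct year; the value is how often it appears in the list.
toy_year_counts = {year: toy_years.count(year) for year in set(toy_years)}

# Sets are unordered, so sort the keys for a stable display.
for year in sorted(toy_year_counts):
    print(year, toy_year_counts[year])
```

Calling count once per distinct year rescans the whole list each time, which is fine for a handful of years but slower than a single counting loop on very long lists.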
Student Answer
How does the number of fatal police shootings change over time? Do you notice any patterns or trends?
Student Answer
Task7: More stories to be told
Sources:
* U.S. Crime and Arrest Data (FBI UCR)
* U.S. Demographic Data (Census Bureau)
* Police Departments’ Use of Force Data (BJS)
* Pew Research on Policing
* Other relevant sources (Reddit, social media, etc.)
List at least two sources you found informative. Include the organization, url, and brief description of the information the source contains. This source could be an additional dataset, or it could be a result of an analysis done. Our goal is to try to link the insights from the previous tasks to other insights found elsewhere.
Student Answer
Come up with at least three questions/insights that could supplement the analysis you have done so far. These questions should address some aspects of the following:
validation: how can the data from these sources support or challenge the findings you’ve already made about racial disparities in fatal police shootings?
expansion: how can additional data be used to expand upon the analysis in new areas, or further explore systemic issues related to police shootings?
comparisons: how do the findings from these sources compare to the conclusions you’ve drawn from other tasks? Are there any contradictions or surprising findings?
Student Answer
Task8: Reflection
Write a reflection (at least five sentences) on what you learned from this project. This can include your reaction to the results of the project, as well as the process of working with the data.
Student Answer
When your group is done, each partner must submit a link to their workbook to Moodle. Make sure the link you share is public.