
Group Members:

Task1: Understanding the data

  1. Follow the link https://github.com/washingtonpost/data-police-shootings to read about the data. Write a three-sentence summary of what is included in this data and where it comes from. This dataset allows only a small number of categories for race and fails to recognize that some people are multiracial, which is a weakness of the dataset. Include at least one other observation that may be a weakness of the data.

Written Answer

  1. You can view the CSV by running the code chunk below.
import pandas as pd

fatal_police_shootings_data = pd.read_csv('https://raw.githubusercontent.com/washingtonpost/data-police-shootings/refs/heads/master/v2/fatal-police-shootings-data.csv')

fatal_police_shootings_data.head(10)
| | id | date | threat_type | flee_status | armed_with | city | county | state | latitude | longitude | location_precision | name | age | gender | race | race_source | was_mental_illness_related | body_camera | agency_ids |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 2015-01-02 | point | not | gun | Shelton | Mason | WA | 47.246826 | -123.121592 | not_available | Tim Elliot | 53.0 | male | A | not_available | True | False | 73 |
| 1 | 4 | 2015-01-02 | point | not | gun | Aloha | Washington | OR | 45.487421 | -122.891696 | not_available | Lewis Lee Lembke | 47.0 | male | W | not_available | False | False | 70 |
| 2 | 5 | 2015-01-03 | move | not | unarmed | Wichita | Sedgwick | KS | 37.694766 | -97.280554 | not_available | John Paul Quintero | 23.0 | male | H | not_available | False | False | 238 |
| 3 | 8 | 2015-01-04 | point | not | replica | San Francisco | San Francisco | CA | 37.762910 | -122.422001 | not_available | Matthew Hoffman | 32.0 | male | W | not_available | True | False | 196 |
| 4 | 9 | 2015-01-04 | point | not | other | Evans | Weld | CO | 40.383937 | -104.692261 | not_available | Michael Rodriguez | 39.0 | male | H | not_available | False | False | 473 |
| 5 | 11 | 2015-01-04 | attack | not | gun | Guthrie | Logan | OK | 35.876991 | -97.423454 | not_available | Kenneth Joe Brown | 18.0 | male | W | not_available | False | False | 101 |
| 6 | 13 | 2015-01-05 | shoot | car | gun | Chandler | Maricopa | AZ | 33.327887 | -111.840959 | not_available | Kenneth Arnold Buck | 22.0 | male | H | not_available | False | False | 195 |
| 7 | 15 | 2015-01-06 | point | not | gun | Assaria | Saline | KS | 38.703755 | -97.563904 | not_available | Brock Nichols | 35.0 | male | W | not_available | False | False | 490 |
| 8 | 16 | 2015-01-06 | accident | not | unarmed | Burlington | Des Moines | IA | 40.809250 | -91.118875 | not_available | Autumn Steele | 34.0 | female | W | not_available | False | True | 287 |
| 9 | 17 | 2015-01-06 | point | not | replica | Knoxville | Allegheny | PA | 40.412936 | -79.991408 | not_available | Leslie Sapp III | 47.0 | male | B | not_available | False | False | 26254 |
  1. List out some of the columns that are contained in this csv.

    Written Answer

  2. For each of those columns, give some example values from that column and explain what they mean. For example, some values in the first column, id, are 22, 325, and 140.

    Written Answer

  3. In order to compare the representation of black people among the subjects of fatal police shootings to their representation in the general population, we need to know the percent of the United States population which is black. You can find this by googling ‘racial demographics of the United States’. Include a citation with the organization and url.

    Written Answer

Task2: Reading data into a dict

Run the code chunk below. This will select some of the features from the dataset and put them into a Python dictionary.

policing = fatal_police_shootings_data[['id', 'name', 'date', 'armed_with', 'age', 'gender', 'race', 'state']]
policing_dict = policing.set_index('id').to_dict(orient='index')
  1. What information from the original csv is stored in policing_dict? What information from the original csv do we use as the keys in policing_dict? What is the type of the values in policing_dict? Hint: use type(dict[key]).

Student Answer
Hint: create a code chunk with type(dict[key]) to find the data type of the values in our dictionary
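
As a rough illustration of the structure involved, the sketch below builds a small hand-written dictionary in a nested shape: the outer keys play the role of id, and each value holds the remaining fields. The records and the name toy_records are hypothetical and are not taken from the dataset.

```python
# Hypothetical records, written by hand to mimic a dictionary keyed by id.
toy_records = {
    101: {"name": "Person A", "age": 30.0, "state": "WA"},
    102: {"name": "Person B", "age": 45.0, "state": "KS"},
}

# The keys are the id values; each value is looked up with dict[key].
first_key = list(toy_records.keys())[0]
value_type = type(toy_records[first_key])
print(first_key, value_type)
```

The same two steps (pick a key, then call type on the value stored under it) should carry over to any dictionary of this shape.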

Task3: Using the database

  1. Finding a particular record: Find the record of the fatal police shooting with ID number 1694. If you’ve followed current events in the past few years, there should be a familiar name here.

Student Answer
Add policing_dict[1694] to a code chunk

  1. Displaying data from a single record: Print the name and state of the individual with ID number 1694.

    Student Answer

  2. Displaying data from multiple records: The following code will display the entire dictionary using a for loop:
for person, data in policing_dict.items():
    print(policing_dict[person])
Add an if statement to the for loop so that it prints only individuals from Minnesota. This means you’ll need to check the state for each person and print the data only if the state is Minnesota. Note that the state column uses two-letter abbreviations such as WA and KS.
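
A minimal sketch of filtering inside a for loop, using a hypothetical toy_records dictionary rather than the real data. The record contents and the choice of the code WA are assumptions made only for illustration.

```python
# Hypothetical records in a nested shape: id -> dictionary of fields.
toy_records = {
    101: {"name": "Person A", "state": "WA"},
    102: {"name": "Person B", "state": "KS"},
    103: {"name": "Person C", "state": "WA"},
}

matches = []
for record_id, data in toy_records.items():
    # Only act on records whose state field equals the code we want.
    if data["state"] == "WA":
        print(record_id, data["name"], data["state"])
        matches.append(record_id)
```

Because data is already the inner dictionary for the current key, individual fields can be read directly with data["name"] or data["state"].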

Student Answer

Task4: Summarizing our data

  1. Creating a new dictionary as a subset: Instead of displaying all the information, let’s create a new dictionary that only contains individuals from Minnesota. The syntax for this is: {key: values for key, values in dict.items() if condition}, where condition is what you’re filtering on and dict is your dictionary. Name this new dictionary MN_selection.
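
The dictionary comprehension pattern can be sketched on a small hypothetical dictionary as follows. The names toy_records and ks_selection, and the records themselves, are illustrative assumptions and not part of the assignment data.

```python
# Hypothetical records in a nested shape: id -> dictionary of fields.
toy_records = {
    101: {"name": "Person A", "state": "WA"},
    102: {"name": "Person B", "state": "KS"},
    103: {"name": "Person C", "state": "WA"},
}

# Keep only the key/value pairs whose inner dictionary passes the condition.
ks_selection = {
    key: values
    for key, values in toy_records.items()
    if values["state"] == "KS"
}
print(ks_selection)
```

The result has the same structure as the original dictionary, and the original is left unchanged.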

Student Answer

  1. Creating summarization functions: We’d like to create a function that takes in our database dictionary and returns a dictionary with counts of occurrences of each race among subjects of fatal police shootings. The keys in this dictionary should be races, and the corresponding values should be the number of subjects of that race.

Student Answer
The starter code for this function is below:

def get_race_counts(database:dict)->dict:
  race_counts = {}
  for person, data in database.items():
    #add logic to populate race counts


  return race_counts
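
One common way to build a dictionary of counts is the get-with-default pattern. The sketch below applies it to a hypothetical list of color labels rather than to the dataset, so the helper name count_labels and its input are assumptions for illustration only.

```python
def count_labels(labels: list) -> dict:
    """Count how many times each label appears in a list."""
    counts = {}
    for label in labels:
        # get returns 0 the first time a label is seen, so no KeyError occurs.
        counts[label] = counts.get(label, 0) + 1
    return counts


color_counts = count_labels(["red", "blue", "red", "green", "red"])
print(color_counts)
```

The same idea may be adapted to a nested dictionary by reading the field of interest from each inner dictionary before updating the count.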
  1. Using a summarization function: Print the fraction of fatal police shootings with a black subject. This should be a number between 0 and 1, and can be computed by dividing the number of fatal police shootings with a black subject, by the total number of fatal police shootings.

Student Answer
Note: using get, len, and round may be helpful.
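
A small sketch of how get, len, and round can combine to produce a fraction, again on hypothetical color data instead of the dataset. The variable names and values are assumptions made for the example.

```python
# Hypothetical items and their counts.
items = ["red", "blue", "red", "green", "red"]
color_counts = {"red": 3, "blue": 1, "green": 1}

# len gives the total number of items being counted.
total = len(items)

# get with a default of 0 avoids a KeyError for categories that never appear.
red_fraction = round(color_counts.get("red", 0) / total, 3)
purple_fraction = round(color_counts.get("purple", 0) / total, 3)
print(red_fraction, purple_fraction)
```

Rounding only affects how the number is displayed or stored; the division itself already yields a value between 0 and 1.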

  1. How does the proportion of black subjects in fatal police shootings compare to the proportion of black people in the United States population?

Student Answer

Task5: Comparing Summarizations

  1. Create a new dictionary called unarmed_selection, which is built using dictionary comprehension on the original dictionary policing_dict. This dictionary should have the same structure as policing_dict, except it will only contain entries for fatal police shootings where the subject was unarmed.

Student Answer

  1. Create a new dictionary, called unarmed_race_counts, which is created using the get_race_counts function on the dictionary unarmed_selection. The purpose of this dictionary will be to count the number of occurrences of each race among subjects of fatal police shootings, including only those where the subject is unarmed.

Student Answer

  1. Print the fraction of unarmed fatal police shootings with a black subject. This should be a number between 0 and 1, and can be computed by dividing the number of unarmed fatal police shootings with a black subject (you can get this from the dictionary unarmed_race_counts) by the total number of unarmed fatal police shootings.

Student Answer

  1. How does the proportion of black subjects in fatal police shootings where the subject is unarmed compare to the proportion of black people in the United States population? How does it compare to the proportion of black subjects in all fatal police shootings?

Student Answer

Task8: Reflection

  1. Write a reflection (at least five sentences) on what you learned from this project. This can include your reaction to the results of the project, as well as the process of working with the data.

Student Answer

When your group is done, each partner must submit a link to their workbook to Moodle. Make sure the link you share is public.