Lonell Childred – Final Project
(Police Shootings Dataset)

My reason for choosing this topic comes from my personal experience in dealing with the police, and I will analyze the data set using machine learning models to further my study. This is the background for my data story.

This is an excerpt that I posted on Facebook right after the killing of George Floyd in May 2020. “I am extremely disturbed and saddened by what is happening in our nation tonight. In fact, I haven’t had much sleep in the last 2 nights. I do not endorse the violence and destruction of property, but at the same time as a black man in America I understand completely the anger and frustration that black and brown people in our nation have and are experiencing.

We just saw this morning Omar Jimenez from CNN get arrested for NO reason and was told NO reason for his arrest when he asked. Unfortunately, I am in the same club. I have never shared this, but in my lifetime I have been falsely arrested 3 times and had an Ohio State Trooper pull out a firearm on me once, and I have NO criminal record, nor have I committed any crimes. I used to drive a green BMW and was pulled over and harassed by the police, asking if this was my car, how did it get this expensive BMW and do I have any drugs or guns in my car. My response was that I actually am a IT Professional and a member of the Screen Actor’s Guild (SAG-AFTRA). The police said that they had probable cause to arrest me, my car was towed and I was taken to the local sheriff’s department, stripped down butt naked and then taken downtown Cincinnati where I was thrown in jail for the weekend. I was not told what crime I had committed or probable cause to be arrested. After having a client of mine bail be out of jail, I hired an attorney and threatened to sue the city. Then, magically I started getting all types of apologies from the police department and of course the arrest was cleared from my record.

In another incident, which was mistaken identity, which I proved to the police I was not the person that they were looking for; however, this time I was handcuffed and ruffed up by 2 police officers, slammed onto the police car and arrested. I filed a complaint against the police officers and had them internally investigated. This time I received a written apology from the police department and the charges were dropped.

The point is that I am very upset, because just like George Floyd, I complied 100% with the police and he is now dead. This could have easily happened to me. As an Orthodox Christian man, I am praying for peace, and positive changes in our nation, and I am asking everyone who sees this message to do the same. GOD Bless Us All.”

https://www.facebook.com/lonell/posts/10221100403238729

Today, 4.20.2021, we saw the conviction of Derek Chauvin on all 3 counts for the murder of George Floyd. Previously, a $27 million civil settlement was awarded to the Floyd family.

This alone is proof of the racism, bias and disparity against black and brown minorities in the United States.

Read and convert the data for statistical analysis.

I would primarily like to compare cause of death based on race and age. Another way to say this is white vs. non-white deaths by the police.

The Labels are as follows:

Gender
    M = Male
    F = Female
    None = Unknown
Race
    W = White, non-Hispanic
    B = Black, non-Hispanic
    A = Asian
    N = Native American
    H = Hispanic
    O = Other
    None = Unknown

#import vital packages for proper analysis
import time
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from pandas import Series, DataFrame
import pandas as pd
from scipy import stats
%matplotlib inline

# Read the Police Shooting data set
data_df=pd.read_csv('/home/mint/fatal-police-shootings-data.csv')

# Print the shape & Head of the data frame
print(data_df.shape)
print(data_df.head(6))

(5416, 14)
   id                name        date   manner_of_death       armed   age  \
0   3          Tim Elliot  2015-01-02              shot         gun  53.0   
1   4    Lewis Lee Lembke  2015-01-02              shot         gun  47.0   
2   5  John Paul Quintero  2015-01-03  shot and Tasered     unarmed  23.0   
3   8     Matthew Hoffman  2015-01-04              shot  toy weapon  32.0   
4   9   Michael Rodriguez  2015-01-04              shot    nail gun  39.0   
5  11   Kenneth Joe Brown  2015-01-04              shot         gun  18.0   

  gender race           city state  signs_of_mental_illness threat_level  \
0      M    A        Shelton    WA                     True       attack   
1      M    W          Aloha    OR                    False       attack   
2      M    H        Wichita    KS                    False        other   
3      M    W  San Francisco    CA                     True       attack   
4      M    H          Evans    CO                    False       attack   
5      M    W        Guthrie    OK                    False       attack   

          flee  body_camera  
0  Not fleeing        False  
1  Not fleeing        False  
2  Not fleeing        False  
3  Not fleeing        False  
4  Not fleeing        False  
5  Not fleeing        False

#Properly assigning categorical records as a category
data_df.id = data_df.id.astype('category')
data_df.armed = data_df.armed.astype('category')
data_df.gender = data_df.gender.astype('category')
data_df.city = data_df.city.astype('category')
data_df.state = data_df.state.astype('category')
data_df.race = data_df.race.astype('category')
data_df.threat_level = data_df.threat_level.astype('category')
data_df.flee = data_df.flee.astype('category')
data_df.manner_of_death = data_df.manner_of_death.astype('category')

#Properly naming each one of the races, to facilitate analysis and comprehension in visualizations
race_names = {'A': 'Asian', 'B': 'Black', 'H': 'Hispanic',
              'N': 'Native American', 'O': 'Other', 'W': 'White'}
# Restrict the replacement to the race column so that matching values in other columns are left untouched
data_df['race'] = data_df['race'].astype('object').replace(race_names)

data_df['month'] = pd.to_datetime(data_df['date']).dt.month
data_df['year'] = pd.to_datetime(data_df['date']).dt.year

data_df.head()

	id	name	date	manner_of_death	armed	age	gender	race	city	state	signs_of_mental_illness	threat_level	flee	body_camera	month	year
0	3	Tim Elliot	2015-01-02	shot	gun	53.0	M	Asian	Shelton	WA	True	attack	Not fleeing	False	1	2015
1	4	Lewis Lee Lembke	2015-01-02	shot	gun	47.0	M	White	Aloha	OR	False	attack	Not fleeing	False	1	2015
2	5	John Paul Quintero	2015-01-03	shot and Tasered	unarmed	23.0	M	Hispanic	Wichita	KS	False	other	Not fleeing	False	1	2015
3	8	Matthew Hoffman	2015-01-04	shot	toy weapon	32.0	M	White	San Francisco	CA	True	attack	Not fleeing	False	1	2015
4	9	Michael Rodriguez	2015-01-04	shot	nail gun	39.0	M	Hispanic	Evans	CO	False	attack	Not fleeing	False	1	2015

Data Visualizations based on the data set.

The Research Question is as follows:

Is there RACIAL BIAS that shows that Non-White Americans are shot and killed by the police more often than White Americans based on age?

# To facilitate our analysis and understand whether there is racial bias in shootings, we will create categories for the following:
# Armed = Will be categorized into Armed and Unarmed
# Fleeing = Will be categorized into Fleeing and Not Fleeing

# ARMED CATEGORY - BUCKET
UnavailableUndetermined = ['NaN','undetermined',]
Unarmed = ['unarmed']
Armed = ['gun',
 'toy weapon',
 'nail gun',
 'knife',
 'shovel',
 'hammer',
 'hatchet',
 'sword',
 'machete',
 'box cutter',
 'metal object',
 'screwdriver',
 'lawn mower blade',
 'flagpole',
 'guns and explosives',
 'cordless drill',
 'crossbow',
 'metal pole',
 'Taser',
 'metal pipe',
 'metal hand tool',
 'blunt object',
 'metal stick',
 'sharp object',
 'meat cleaver',
 'carjack',
 'chain',
 "contractor's level",
 'unknown weapon',
 'stapler',
 'beer bottle',
 'bean-bag gun',
 'baseball bat and fireplace poker',
 'straight edge razor',
 'gun and knife',
 'ax',
 'brick',
 'baseball bat',
 'hand torch',
 'chain saw',
 'garden tool',
 'scissors',
 'pole',
 'pick-axe',
 'flashlight',
 'vehicle',
 'baton',
 'spear',
 'chair',
 'pitchfork',
 'hatchet and gun',
 'rock',
 'piece of wood',
 'bayonet',
 'pipe',
 'glass shard',
 'motorcycle',
 'pepper spray',
 'metal rake',
 'crowbar',
 'oar',
 'machete and gun',
 'tire iron',
 'air conditioner',
 'pole and knife',
 'baseball bat and bottle',
 'fireworks',
 'pen',
 'chainsaw',
 'gun and sword',
 'gun and car',
 'pellet gun',
 'claimed to be armed',
 'BB gun',
 'incendiary device',
 'samurai sword',
 'bow and arrow',
 'gun and vehicle',
 'vehicle and gun',
 'wrench',
 'walking stick',
 'barstool',
 'grenade',
 'BB gun and vehicle',
 'wasp spray',
 'air pistol',
 'Airsoft pistol',
 'baseball bat and knife',
 'vehicle and machete',
 'ice pick',
 'car, knife and mace']

df_UnavailableUndetermined = pd.DataFrame({'armed': UnavailableUndetermined})
df_UnavailableUndetermined ['category'] = 'Unavailable_Undetermined'

df_Unarmed = pd.DataFrame({'armed': Unarmed})
df_Unarmed ['category'] = 'Unarmed'

df_Armed = pd.DataFrame({'armed': Armed})
df_Armed ['category'] = 'Armed'

# Stack the three lookup tables into one (DataFrame.append was removed in pandas 2.0, so pd.concat is used)
df_lookup = pd.concat([df_Armed, df_Unarmed, df_UnavailableUndetermined])
df2 = pd.merge(data_df, df_lookup, on = 'armed', how = 'outer' )
df2 = df2.rename({'category':'armed_category'}, axis = 1)
df2.head()
#df2.armed_category.value_counts(normalize = True)

# FLEE CATEGORY - BUCKET
Fleeing = ['Car', 'Foot', 'Other']
NotFleeing = ['Not fleeing']

FleeLookUp2 = pd.DataFrame({'flee': Fleeing})
FleeLookUp2['flee_category'] = "Fleeing"
FleeLookUp1 = pd.DataFrame({'flee': NotFleeing})
FleeLookUp1['flee_category'] = "Not_Fleeing"

# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
FleeLookUp = pd.concat([FleeLookUp1, FleeLookUp2])
#FleeLookUp.head()

df3 = pd.merge(df2,FleeLookUp,how='outer', on = 'flee')
#df3.head()
#df3.flee_category.value_counts(normalize=True)
#df3.race.value_counts(normalize=True)
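
The two lookup tables above can also be written as small rule-based functions. The sketch below is an alternative, not the merge used in this notebook. The names `armed_bucket` and `flee_bucket` are hypothetical, and unlike the explicit `Armed` list it assumes that any value other than `unarmed`, `undetermined`, or a missing value counts as armed.

```python
import math


def _is_missing(value):
    """True for None or a float NaN (how pandas represents a blank cell)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def armed_bucket(value):
    """Map a raw 'armed' value to Armed / Unarmed / Unavailable_Undetermined."""
    if _is_missing(value) or value == "undetermined":
        return "Unavailable_Undetermined"
    if value == "unarmed":
        return "Unarmed"
    # Assumption: every other value describes some kind of weapon.
    return "Armed"


def flee_bucket(value):
    """Map a raw 'flee' value to Fleeing / Not_Fleeing (None if missing)."""
    if _is_missing(value):
        return None
    if value == "Not fleeing":
        return "Not_Fleeing"
    # Assumption: the remaining values ('Car', 'Foot', 'Other') all mean fleeing.
    return "Fleeing"
```

Functions like these could be applied to a column with, for example, `Series.map`. The trade-off is that a new weapon description is silently counted as armed instead of being left uncategorized.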

# Most of the deaths in the data set fall into 3 racial groups: White, Black and Hispanic
df3.race.value_counts(normalize=True).plot(kind='pie', figsize = (8,8))
plt.title('Deaths by Race\nNormalized Data')

Text(0.5, 1.0, 'Deaths by Race\nNormalized Data')

Important Racial Statistics

White Americans make up 63.4% of the U.S. population, while Latinos and Black Americans make up 15% and 13.4%, respectively. Comparing these figures with each group's share of the deaths in the pie chart above, we can infer that Latinos and African Americans are killed at a higher rate relative to their own population.
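
One way to make this comparison explicit is to divide each group's share of the deaths by its share of the population. The sketch below is only an illustration: the function name `representation_ratio` is hypothetical, and the population shares are the three figures quoted above (they do not cover every group, so they do not sum to 1).

```python
# Population shares quoted in the text above (fractions of the U.S. population).
POPULATION_SHARE = {"White": 0.634, "Hispanic": 0.15, "Black": 0.134}


def representation_ratio(death_counts, population_share=POPULATION_SHARE):
    """Return {group: share of deaths / share of population}.

    A ratio above 1 means the group accounts for a larger share of the
    deaths than of the population; a ratio below 1 means a smaller share.
    """
    total = sum(death_counts.values())
    if total == 0:
        raise ValueError("death_counts must contain at least one death")
    return {
        group: (death_counts[group] / total) / share
        for group, share in population_share.items()
        if group in death_counts
    }
```

With the data frame from this notebook, the counts could be passed in as `representation_ratio(df3.race.value_counts().to_dict())`, assuming the race labels match the keys used here.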

# Top States
top_states = df3['state'].value_counts().head(10)
plt.barh(top_states.index.astype(str), top_states.values)
plt.xlabel("Frequency")
plt.ylabel("State")
plt.title("Top 10 States with Most Fatal Police Shootings")

# Sum by percentage
# we can see that the top 10 states in the US account for 53.32% 
# of all deaths in the US. Might be worth focusing on these states to look for trends
df3.state.value_counts(normalize=True)[:10].sum()

0.5332348596750369

# Listing Specifically Black vs. White
RaceList = ['White', 'Black']
df3_race = df3[df3.race.isin(RaceList)]
df3_race.race.unique()

CityList = ['Los Angeles','Phoenix','Houston','Las Vegas','San Antonio','Columbus','Chicago','Albuquerque','Kansas City','Jacksonville']
df3_race_city = df3_race[df3_race.city.isin(CityList)]
df3_race_city.city.unique()

df3_race_city.groupby('race').city.value_counts(normalize=True).unstack().plot(kind='bar', figsize=(18,8))
plt.title('Deaths Per Race and City')
plt.ylabel('% of Total Deaths per Race')

Text(0, 0.5, '% of Total Deaths per Race')

# Adding a correlation matrix (corrmat) of the numeric and boolean columns, as well as a heatmap
# numeric_only=True skips the text and category columns (requires pandas 1.5 or later)
corrmat = data_df.corr(numeric_only=True)
print(corrmat)

sns.set()
f, ax = plt.subplots(figsize=(12,8))
sns.heatmap(corrmat, vmax=.8, square=True, annot=True)

                              age  signs_of_mental_illness  body_camera  \
age                      1.000000                 0.105763    -0.040138   
signs_of_mental_illness  0.105763                 1.000000     0.051838   
body_camera             -0.040138                 0.051838     1.000000   
month                    0.011028                -0.027029     0.011036   
year                     0.035409                -0.079972     0.018592   

                            month      year  
age                      0.011028  0.035409  
signs_of_mental_illness -0.027029 -0.079972  
body_camera              0.011036  0.018592  
month                    1.000000 -0.144633  
year                    -0.144633  1.000000

<AxesSubplot:>

data_df.race.unique()

array(['Asian', 'White', 'Hispanic', 'Black', 'Other', nan,
       'Native American'], dtype=object)

# Here I add two columns to the df3 dataframe named white and non_white.
# I map race so that white is 1 for White and 0 otherwise, and non_white is the reverse.
# Finally, I drop rows with missing values and display the head of the df3 dataframe to confirm the mapping.

df3['white'] = df3.race.map({'Asian':0,'White':1,'Hispanic':0,'Black':0,'Other':0,'Native American':0})
df3['non_white'] = df3.race.map({'Asian':1,'White':0,'Hispanic':1,'Black':1,'Other':1,'Native American':1})

df3.dropna(inplace=True)
df3.head()

	id	name	date	manner_of_death	armed	age	gender	race	city	state	signs_of_mental_illness	threat_level	flee	body_camera	month	year	armed_category	flee_category	white	non_white
0	3	Tim Elliot	2015-01-02	shot	gun	53.0	M	Asian	Shelton	WA	True	attack	Not fleeing	False	1.0	2015.0	Armed	Not_Fleeing	0.0	1.0
1	4	Lewis Lee Lembke	2015-01-02	shot	gun	47.0	M	White	Aloha	OR	False	attack	Not fleeing	False	1.0	2015.0	Armed	Not_Fleeing	1.0	0.0
2	11	Kenneth Joe Brown	2015-01-04	shot	gun	18.0	M	White	Guthrie	OK	False	attack	Not fleeing	False	1.0	2015.0	Armed	Not_Fleeing	1.0	0.0
3	15	Brock Nichols	2015-01-06	shot	gun	35.0	M	White	Assaria	KS	False	attack	Not fleeing	False	1.0	2015.0	Armed	Not_Fleeing	1.0	0.0
4	21	Ron Sneed	2015-01-07	shot	gun	31.0	M	Black	Freeport	TX	False	attack	Not fleeing	False	1.0	2015.0	Armed	Not_Fleeing	0.0	1.0

# Display the scatter plot of age vs. non-white deaths from the updated df3 dataframe
x = df3['age']
y = df3['non_white']

plt.scatter(x,y, marker='+',color='red')
plt.xlabel("Age")
plt.ylabel("Non-White Deaths")

Text(0, 0.5, 'Non-White Deaths')

# A straight line is a poor fit for a 0/1 outcome, so this Linear Regression Model is shown only for comparison;
# I believe it should really be a Logistic Regression Model.
slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
  return slope * x + intercept

mymodel = list(map(myfunc, x))

plt.scatter(x, y, marker='+',color='red')
plt.xlabel("Age")
plt.ylabel("Non-White Deaths")
plt.plot(x, mymodel)

[<matplotlib.lines.Line2D at 0x7fb8934a0f40>]
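
To illustrate why a straight line is awkward for a 0/1 outcome, the sketch below fits ordinary least squares to a tiny made-up sample. The toy ages and labels are for illustration only and are not taken from the data set, and `fit_line` and `toy_predict` are hypothetical helper names. The point is that the fitted line is not confined to the range 0 to 1, so its values cannot be read as probabilities.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy data for illustration only (not taken from the data set).
toy_age = [20, 30, 40, 50, 60, 70]
toy_label = [1, 1, 1, 0, 0, 0]

toy_slope, toy_intercept = fit_line(toy_age, toy_label)


def toy_predict(age):
    """Value of the fitted straight line at the given age."""
    return toy_slope * age + toy_intercept


# Outside the observed ages the line leaves the 0..1 range:
print(toy_predict(10), toy_predict(90))
```

A logistic model avoids this by passing the same linear expression through a function that always returns a value between 0 and 1.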

Machine Learning Models.

# I am using a Logistic Regression Model, trained and tested against age.
# Note: test_size=0.7 holds out 70% of the rows for testing and trains on the remaining 30%.
from sklearn.model_selection import train_test_split 

X_train, X_test, y_train, y_test = train_test_split(df3[['age']],df3.non_white,test_size=0.7)

X_test

	age
1915	42.0
2816	17.0
4900	40.0
5093	43.0
4870	27.0
…	…
2783	20.0
3065	40.0
3134	48.0
932	52.0
3280	40.0

3080 rows × 1 columns

X_train

	age
1050	55.0
2433	23.0
2383	46.0
238	40.0
2299	35.0
…	…
2297	44.0
3529	40.0
1508	28.0
1699	27.0
747	57.0

1319 rows × 1 columns
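
The two row counts follow from `test_size=0.7`. The sketch below reproduces the arithmetic, assuming that scikit-learn rounds the number of test rows up and gives the remaining rows to the training set, which is consistent with the counts shown above. `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_test, n_train) for a fractional test_size.

    Assumption: the test count is rounded up and the training set
    receives whatever rows are left.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_test, n_train


# Total rows = the two row counts printed above.
n_test, n_train = split_sizes(3080 + 1319, 0.7)
print(n_test, n_train)
```

Because the model is trained on the smaller part of the data, a more common choice such as `test_size=0.3` would give it more rows to learn from.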

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()

model.fit(X_train,y_train)

LogisticRegression()

model.predict(X_test)

array([0., 1., 0., ..., 0., 0., 0.])

# Accuracy of my model's predictions on X_test
model.score(X_test,y_test)

0.6126623376623377
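
An accuracy figure is easier to judge next to the accuracy of always guessing the most common class. The sketch below computes that baseline for any list of labels. `majority_baseline_accuracy` is a hypothetical helper, and no baseline value for this data set is claimed here.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    labels = list(labels)
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)
```

In this notebook it could be called as `majority_baseline_accuracy(y_test)` and compared with the value returned by `model.score(X_test, y_test)`.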

model.predict_proba(X_test)

array([[0.57123091, 0.42876909],
       [0.31489565, 0.68510435],
       [0.5502643 , 0.4497357 ],
       ...,
       [0.63234362, 0.36765638],
       [0.67096591, 0.32903409],
       [0.5502643 , 0.4497357 ]])
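
With a single feature, a logistic regression model computes P(non_white = 1) = 1 / (1 + exp(-(intercept + slope * age))). The sketch below recovers an approximate slope and intercept from two of the printed probabilities. It assumes that the rows of `predict_proba` are in the same order as the rows of `X_test` shown earlier, and that the second column is the probability of class 1. The names `age_slope`, `age_intercept` and `prob_non_white` are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))


# (age, P(non_white = 1)) pairs read from the X_test and predict_proba output above.
age_a, p_a = 42.0, 0.42876909
age_b, p_b = 17.0, 0.68510435

# Two points determine the straight line in log-odds space.
age_slope = (logit(p_a) - logit(p_b)) / (age_a - age_b)
age_intercept = logit(p_a) - age_slope * age_a


def prob_non_white(age):
    """Predicted probability of a non-white death at the given age."""
    return 1.0 / (1.0 + math.exp(-(age_intercept + age_slope * age)))


# Age at which the predicted probability crosses 50%.
crossover_age = -age_intercept / age_slope
print(age_slope, age_intercept, crossover_age)
```

The recovered slope is negative, so the predicted probability of a non-white death falls as age increases. As a check, `prob_non_white(40.0)` can be compared with the 0.4497357 printed above for a 40-year-old.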

Lonell Childred – Results & Conclusions.

Based on the data from my research and my personal experience, it is clear that there is racial bias in police shootings and police brutality in the United States. Based on age, most deaths appear to fall between the twenties and thirties, and my logistic regression model's predicted probability of a non-white death declines with age: it is about 69% at age 17 and falls below 50% by age forty in the predict_proba output above. I did delete some NaN/blank data from my dataframe as needed, and I chose to leave most columns, such as name, intact for future ML projects. I added two columns for analysis, white and non_white, which I converted to numeric 0/1 to help with my Machine Learning analysis and Logistic Regression Modeling. My Machine Learning Model has an accuracy of 61.26% on the test set (X_test), and I also list the predicted probabilities for X_test. This class and final project have made it very clear how important Data Science is and that Machine Learning will become more important both now and in the future.

Code References:
https://www.facebook.com/lonell/posts/10221100403238729

https://www.kaggle.com/gusvalicente/is-the-police-killing-minorities

https://www.kaggle.com/andle1/kernel2cdb30105f

https://www.kaggle.com/sameensalam/police-shooting-analysis

https://www.kaggle.com/mrinaal007/police-shootouts

https://www.kaggle.com/umerkk12/police-shooting-analysis

https://www.newscientist.com/article/2099357-us-police-use-machine-learning-to-curb-their-own-violence

https://academic.oup.com/policing/advance-article/doi/10.1093/police/paz035/5518992

https://youtu.be/GdkUbZkF5bo

https://www.tutorialspoint.com/python_pandas/python_pandas_merging_joining.htm

https://youtu.be/RehA-5OjTN4

https://youtu.be/zM4VZR0px8E