
Predicting Crime Risk in New Zealand


Farhad Mehdipour
Principal Academic and Head of Department,
Otago Polytechnic – Auckland International Campus (OPAIC)
Postgraduate Lecturer

Analytics Friday, OPAIC, July 2022


Outline
• Predictive Analytics
• Application Areas
• Predicting Crime Risk
• Our Approach
• Results
• Demo
Source: https://demand-planning.com/2020/01/20/the-differences-between-descriptive-diagnostic-predictive-cognitive-analytics/

Descriptive, Diagnostic, Predictive, Prescriptive, ?


Predicting vs. Forecasting

Predicting
• Predicts using various features
• High accuracy and less uncertainty
• Estimates how likely an outcome is

Forecasting
• Sub-category of prediction
• Involves past and present time
• Uses time series
• High degree of uncertainty

Why Forecasting / Predicting?
• To plan for future capital expenditure

• Staffing and resource management

• Impacts of new products

• Expectations of future performance


Predicting with Machine Learning
Healthcare

1. Finding Major Trends in Life Expectancy
2. Analysing Cancer Trends
3. Predicting Emergency Medical Services Utilisation
Energy

Analysis of Energy Consumption Profile
• How does household energy usage vary over a given 2-day period?

Figure: power plants, renewable energy sources, control center, and consumers
Predictive Policing
Auror

https://www.auror.co/
Predictive Policing

Location-based algorithms
• Draw on links between places, events, and historical crime rates to predict where and when crimes are more likely to happen
• In certain weather conditions or at large sporting events.
• Identify hot spots, and the police plan patrols around these tip-offs.

People-based algorithms
• Draw on data about people (age, gender, marital status, history of substance abuse, and criminal record)
• Predicts who has a high chance of being involved in future criminal activity.
• Can be used by police or by courts.
PredPol
https://www.predpol.com/
• Used in many cities in the US
• Breaks locations up into 500-by-500 foot blocks
• Updates predictions throughout the day, a kind of crime weather forecast
COMPAS
Correctional Offender Management Profiling for Alternative Sanctions (COMPAS)
• Case Management and Decision Management System
• Used in jurisdictions
• Helps make decisions about pretrial release and sentencing
• Issues a statistical score between 1 and 10 on how likely a person is to be rearrested if released
Why Crime Risk Prediction?
• To forecast crime in certain areas and at certain times
• Improve resource management
• Reduce cost
Inequality and the misuses of police power

https://www.technologyreview.com/2020/07/17/1005396/predictive-policing-algorithms-racist-dismantled-machine-learning-bias-criminal-justice/
Our Approach
Dataset                                     #Records     #Attributes  #Crime Types  Has Date    Has Time  Has Location
Boston                                      319,074      17           66            Y           Y         Y
Denver                                      466,841      19           15            Y           Y         Y
London                                      13,490,605   7            9             year|month  N         Y
San Francisco                               150,501      13           39            Y           Y         Y
New Zealand (1) (16 csv files)              641,641      23           16            year|month  N         Y
New Zealand (2) (nz_victim_timeplace.csv)   1,194,764    17           6             Y           Y         Y
Crime Risk Modelling

Crime Factors
• Time: Month, Day, Hour
• Location: Territorial Authority, Area Unit, Meshblock
• Crime Type
MONTH_AREA_CRIME_COUNT

DAY_AREA_CRIME_COUNT

MONTH_AREA_CRIME_TYPE_COUNT
MONTH_AREA_CRIME_COUNT

df.groupby(['MONTH', 'AREA_1']).size().to_frame('MONTH_AREA_CRIME_COUNT')
DAY_AREA_CRIME_COUNT

df.groupby(['MONTH', 'DAY', 'AREA_0']).size().to_frame('DAY_AREA_CRIME_COUNT')


MONTH_AREA_CRIME_TYPE_COUNT

df.groupby(['MONTH', 'AREA_1', 'CRIME_TYPE']).size().to_frame('MONTH_AREA_CRIME_TYPE_COUNT')
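
The three aggregations above can be tried on a small toy table. This is a minimal sketch, not the project's code: the rows are invented, and reading AREA_1 as the territorial authority and AREA_0 as the area unit is an assumption based on the surrounding slides.

```python
import pandas as pd

# Toy incident records: column names follow the slides, values are invented.
df = pd.DataFrame({
    "MONTH":      [1, 1, 1, 1, 2, 2],
    "DAY":        [1, 1, 2, 2, 1, 1],
    "AREA_1":     ["Auckland"] * 6,                 # territorial authority (assumed meaning)
    "AREA_0":     ["A", "A", "A", "B", "A", "B"],   # area unit (assumed meaning)
    "CRIME_TYPE": ["Theft", "Assault", "Theft", "Theft", "Theft", "Assault"],
})

# One row per (month, territorial authority) with the number of incidents.
month_area = df.groupby(["MONTH", "AREA_1"]).size().to_frame("MONTH_AREA_CRIME_COUNT")

# One row per (month, day, area unit).
day_area = df.groupby(["MONTH", "DAY", "AREA_0"]).size().to_frame("DAY_AREA_CRIME_COUNT")

# One row per (month, territorial authority, crime type).
month_area_type = (
    df.groupby(["MONTH", "AREA_1", "CRIME_TYPE"])
    .size()
    .to_frame("MONTH_AREA_CRIME_TYPE_COUNT")
)

print(month_area)
print(day_area)
print(month_area_type)
```

Each result is indexed by its grouping keys, so a single count can be looked up with `.loc`, for example `month_area.loc[(1, "Auckland"), "MONTH_AREA_CRIME_COUNT"]`.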


Total Crime(M,T) = Crime Count Grouped by Month (M) and Territorial Authority (T)

Total Crime(M,T) = Sum of Daily Crime Counts over All Area Units

Total Crime(M,T) = Sum of Crime Counts by Crime Type
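
The three expressions for Total Crime(M,T) should agree, and that can be checked directly. The sketch below uses invented rows and assumes every area unit (AREA_0) belongs to exactly one territorial authority (AREA_1), which is why AREA_1 is carried along in the daily grouping.

```python
import pandas as pd

# Toy incident records: column names follow the slides, values are invented.
df = pd.DataFrame({
    "MONTH":      [1, 1, 1, 1, 2, 2],
    "DAY":        [1, 1, 2, 2, 1, 1],
    "AREA_1":     ["Auckland"] * 6,
    "AREA_0":     ["A", "A", "A", "B", "A", "B"],
    "CRIME_TYPE": ["Theft", "Assault", "Theft", "Theft", "Theft", "Assault"],
})

keys = ["MONTH", "AREA_1"]

# Definition 1: count grouped by month and territorial authority.
total = df.groupby(keys).size()

# Definition 2: daily counts per area unit, summed back up to (month, authority).
daily = df.groupby(keys + ["DAY", "AREA_0"]).size()
from_daily = daily.groupby(level=keys).sum()

# Definition 3: counts per crime type, summed back up to (month, authority).
by_type = df.groupby(keys + ["CRIME_TYPE"]).size()
from_type = by_type.groupby(level=keys).sum()

print((total == from_daily).all(), (total == from_type).all())
```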


DAY_AREA_CRIME_RATIO = DAY_AREA_CRIME_COUNT / MONTH_AREA_CRIME_COUNT

MONTH_AREA_CRIME_RATIO = MONTH_AREA_CRIME_TYPE_COUNT / MONTH_AREA_CRIME_COUNT

DAY_AREA_CRIME_RATIO x MONTH_AREA_CRIME_RATIO (in a specific Area Unit)
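
A row-level version of these ratios might look like the sketch below. The data is invented, `transform("size")` is just one way to attach a group count to every row, and storing the product under the name CRIME_RATIO is an assumption made here because that column is ranked in the next step.

```python
import pandas as pd

# Toy incident records: column names follow the slides, values are invented.
df = pd.DataFrame({
    "MONTH":      [1, 1, 1, 1, 2, 2],
    "DAY":        [1, 1, 2, 2, 1, 1],
    "AREA_1":     ["Auckland"] * 6,
    "AREA_0":     ["A", "A", "A", "B", "A", "B"],
    "CRIME_TYPE": ["Theft", "Assault", "Theft", "Theft", "Theft", "Assault"],
})

def group_size(frame, keys):
    """Return, for every row, the number of rows sharing the same key values."""
    return frame.groupby(keys)["CRIME_TYPE"].transform("size")

df["MONTH_AREA_CRIME_COUNT"] = group_size(df, ["MONTH", "AREA_1"])
df["DAY_AREA_CRIME_COUNT"] = group_size(df, ["MONTH", "DAY", "AREA_0"])
df["MONTH_AREA_CRIME_TYPE_COUNT"] = group_size(df, ["MONTH", "AREA_1", "CRIME_TYPE"])

df["DAY_AREA_CRIME_RATIO"] = df["DAY_AREA_CRIME_COUNT"] / df["MONTH_AREA_CRIME_COUNT"]
df["MONTH_AREA_CRIME_RATIO"] = df["MONTH_AREA_CRIME_TYPE_COUNT"] / df["MONTH_AREA_CRIME_COUNT"]

# Product of the two ratios; the column name CRIME_RATIO is assumed.
df["CRIME_RATIO"] = df["DAY_AREA_CRIME_RATIO"] * df["MONTH_AREA_CRIME_RATIO"]

print(df[["DAY_AREA_CRIME_RATIO", "MONTH_AREA_CRIME_RATIO", "CRIME_RATIO"]])
```

For the first toy row (month 1, day 1, area unit A, Theft) the counts are 2, 4 and 3, so the ratios are 2/4 and 3/4 and the product is 0.375.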

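Risk by percentile (0 to 1):
• Percentile below 0.3: Risk = 0
• Percentile from 0.3 to below 0.7: Risk = 1
• Percentile 0.7 and above: Risk = 2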
# Percentile rank of CRIME_RATIO within each area unit
df["CRIME_RANK"] = df.groupby(["AREA_0"])["CRIME_RATIO"].rank(method='max', pct=True)

# Map the percentile rank to a risk level: 2 (high), 1 (medium), 0 (low)
df.loc[df['CRIME_RANK'] >= 0.7, 'RISK'] = 2
df.loc[(df['CRIME_RANK'] >= 0.3) & (df['CRIME_RANK'] < 0.7), 'RISK'] = 1
df.loc[df['CRIME_RANK'] < 0.3, 'RISK'] = 0
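
To see how the percentile thresholds behave, the labelling can be run on a small invented table. This sketch uses `numpy.select` as a compact alternative to the three `df.loc` assignments; the thresholds 0.3 and 0.7 are the ones from the slide, and the CRIME_RATIO values are made up.

```python
import numpy as np
import pandas as pd

# Two area units with very different crime-ratio scales (invented values).
df = pd.DataFrame({
    "AREA_0": ["A"] * 8 + ["B"] * 4,
    "CRIME_RATIO": [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08,
                    0.50, 0.60, 0.70, 0.80],
})

# Percentile rank within each area unit, so risk is relative to the local history.
df["CRIME_RANK"] = df.groupby("AREA_0")["CRIME_RATIO"].rank(method="max", pct=True)

# Conditions are checked in order: >= 0.7 is high (2), >= 0.3 is medium (1), else low (0).
df["RISK"] = np.select(
    [df["CRIME_RANK"] >= 0.7, df["CRIME_RANK"] >= 0.3],
    [2, 1],
    default=0,
)

print(df)
```

Because the rank is computed per area unit, the lowest value in area B (0.50) is labelled low risk even though it is far larger than every value in area A.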


Ready for the predictive modelling!
Load dataset → Declare feature & target vars. → Encode categorical features → Oversampling data (SMOTE)

Feature Engineering

Features: MONTH, DAY, AREA_0, AREA_1, QUARTER, HOUR_PARTITION, WEAPON_TYPE, CRIME_TYPE, 3_DAY_AREA_CRIME_MEAN
Target: RISK
Load dataset → Declare feature & target vars. → Encode categorical features → Oversampling data (SMOTE)
→ Split train/test set → Train/test models → Cross-validation
→ Evaluate result: Classification report, Log Loss
→ Visualize result: Confusion matrix, ROC curve, PR curve
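
The workflow above can be sketched end to end with scikit-learn. This is an illustration under stated assumptions, not the project's pipeline: the data is synthetic, only four of the listed features are used, and plain random oversampling with `sklearn.utils.resample` stands in for SMOTE (which lives in the separate imbalanced-learn package). The step order follows the slide.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, log_loss
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import OrdinalEncoder
from sklearn.utils import resample

rng = np.random.default_rng(0)
n = 600

# 1. "Load" a synthetic stand-in dataset (invented values, slide column names).
df = pd.DataFrame({
    "MONTH": rng.integers(1, 13, n),
    "DAY": rng.integers(1, 29, n),
    "AREA_0": rng.choice(["A", "B", "C"], n),
    "CRIME_TYPE": rng.choice(["Theft", "Assault"], n),
    "RISK": rng.choice([0, 1, 2], n, p=[0.6, 0.3, 0.1]),
})

# 2. Declare feature and target variables.
features, target = ["MONTH", "DAY", "AREA_0", "CRIME_TYPE"], "RISK"

# 3. Encode categorical features as numeric codes.
cat_cols = ["AREA_0", "CRIME_TYPE"]
df[cat_cols] = OrdinalEncoder().fit_transform(df[cat_cols])

# 4. Oversample every class up to the size of the largest one
#    (random resampling with replacement, used here in place of SMOTE).
largest = df[target].value_counts().max()
df = pd.concat(
    [resample(group, replace=True, n_samples=largest, random_state=0)
     for _, group in df.groupby(target)],
    ignore_index=True,
)

# 5. Split 70/30, train a model, cross-validate, and evaluate.
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df[target], test_size=0.3, stratify=df[target], random_state=0
)
model = RandomForestClassifier(max_depth=20, random_state=0).fit(X_train, y_train)
cv_accuracy = cross_val_score(model, df[features], df[target], cv=5, scoring="accuracy")
test_log_loss = log_loss(y_test, model.predict_proba(X_test))

print(classification_report(y_test, model.predict(X_test)))
print("5-fold accuracy:", cv_accuracy.mean(), "test log loss:", test_log_loss)
```

The scores from this sketch say nothing about the real datasets, since the synthetic target is random; it only shows how the steps connect.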
                              train=70 / test=30    K-Fold CV (k=5)       K-Fold CV (k=10)
algorithm                     accuracy  log_loss    accuracy  log_loss    accuracy  log_loss
LogisticRegression            0.4       1.09        0.39      1.09        0.37      1.09
GaussianNB                    0.37      1.12        0.36      1.12        0.34      1.12
KNeighbors (k=12)             0.77      1.03        0.78      1.04        0.79      1.04
DecisionTree (max_depth=20)   0.76      1.81        0.75      1.81        0.75      1.7
XGBoost (max_depth=12)        0.84      0.49        0.84      0.49        0.83      0.5
RandomForest (max_depth=20)   0.84      0.51        0.83      0.52        0.83      0.52
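
A useful reference point when reading the log_loss column is the score of a model that assigns equal probability to each of the three risk levels. Assuming log loss here is the usual natural-log cross-entropy (as in scikit-learn), that baseline is ln 3:

```python
import numpy as np

n_classes = 3  # RISK takes the values 0, 1 and 2

# A "know-nothing" prediction: probability 1/3 for every class.
uniform = np.full(n_classes, 1 / n_classes)

# Per-sample log loss is -log(probability assigned to the true class),
# which is the same for every sample under a uniform prediction.
baseline = -np.log(uniform[0])

print(round(float(baseline), 4))  # 1.0986
```

On that reading, the 1.09 reported for LogisticRegression sits at the uniform-guess level, while XGBoost and RandomForest (about 0.5) are well below it.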

Deep Learning: TensorFlow (batch size: 64)

            epoch=32            epoch=64
            accuracy  loss      accuracy  loss
            0.77      0.5       0.79      0.46
Farhad Mehdipour (Project Leader)
• Declare the goal and objective
• Guide the project in the correct direction
• Review and feedback

April Love Naviza
Vimitaben Vaidya
Wisanu Boonrat
Acknowledgment
This project was funded by Otago Polytechnic – Auckland International Campus (OPAIC) Contestable Research Funding, 2020

Thank you
