Analytics Friday-22 July-OPAIC

Predicting Crime Risk in New Zealand

Farhad Mehdipour
Principal Academic and Head of Department,
Otago Polytechnic – Auckland International Campus (OPAIC)
Postgraduate Lecturer
Analytics Friday, OPAIC, July 2022

Outline
• Predictive Analytics
• Application Areas
• Predicting Crime Risk
• Our Approach
• Results
• Demo
Descriptive, Diagnostic, Predictive, Prescriptive, ?
Source: https://demand-planning.com/2020/01/20/the-differences-between-descriptive-diagnostic-predictive-cognitive-analytics/

Predicting vs. Forecasting
Predicting
• Predicts using various features
• High accuracy and less uncertainty
• Estimates how likely an outcome is
Forecasting
• Sub-category of prediction
• Involves past and present time
• Uses time series
• High degree of uncertainty
Why Forecasting / Predicting?
• To plan for future capital expenditure
• Staffing and resource management
• Impacts of new products
• Expectations of future performance

Predicting with Machine Learning
Healthcare
1. Finding major trends in life expectancy
2. Analysing cancer trends
3. Predicting emergency medical services utilisation
Energy
Analysis of Energy Consumption Profile
• How does household energy usage vary over a 2-day period at a certain time?
(Figure: power plants, renewable energy sources, control center and consumers)
Predictive Policing
Auror: https://www.auror.co/
Location-based algorithms
• Draw on links between places, events, and historical crime rates to predict where and when crimes are more likely to happen, e.g. in certain weather conditions or at large sporting events
• Identify hot spots, and the police plan patrols around these tip-offs
People-based algorithms
• Draw on data about people (age, gender, marital status, history of substance abuse, and criminal record)
• Predict who has a high chance of being involved in future criminal activity
• Can be used by police or by courts
PredPol (https://www.predpol.com/)
• Used in many cities in the US
• Breaks locations up into 500-by-500-foot blocks
• Updates predictions throughout the day, a kind of crime weather forecast
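To make the 500-by-500-foot grid concrete, here is a minimal Python sketch that maps a point to a grid cell. It assumes coordinates are already projected to planar (x, y) values in feet from some origin; the helper `grid_cell` and that coordinate convention are hypothetical, and this is not PredPol's actual implementation.

```python
import math

CELL_SIZE_FT = 500  # side length of each square block, as described for PredPol


def grid_cell(x_ft: float, y_ft: float, cell_size: float = CELL_SIZE_FT) -> tuple[int, int]:
    """Return the (column, row) index of the square cell containing (x_ft, y_ft).

    Hypothetical helper: assumes projected planar coordinates measured in feet.
    math.floor keeps negative coordinates in the correct cell as well.
    """
    return (math.floor(x_ft / cell_size), math.floor(y_ft / cell_size))


# Nearby incidents may share a block or fall into neighbouring blocks
print(grid_cell(120.0, 480.0))  # (0, 0)
print(grid_cell(420.0, 480.0))  # (0, 0)
print(grid_cell(620.0, 480.0))  # (1, 0)
```

Counting incidents per cell (for example with `collections.Counter` over the returned tuples) would then give the per-block history that a location-based model scores.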
COMPAS: Correctional Offender Management Profiling for Alternative Sanctions
• Case management and decision management system
• Used in a number of US jurisdictions
• Helps make decisions about pretrial release and sentencing
• Issues a statistical score between 1 and 10 indicating how likely a person is to be rearrested if released
Why Crime Risk Prediction?
• To forecast crime in certain areas and at certain times
• Improve resource management
• Reduce cost
• Concerns: inequality and the misuse of police power
Source: https://www.technologyreview.com/2020/07/17/1005396/predictive-policing-algorithms-racist-dismantled-machine-learning-bias-criminal-justice/
Our Approach
Dataset | #Records | #Attributes | #Crime Types | Has Date | Has Time | Has Location
Boston | 319,074 | 17 | 66 | Y | Y | Y
Denver | 466,841 | 19 | 15 | Y | Y | Y
London | 13,490,605 | 7 | 9 | Year and month only | N | Y
San Francisco | 150,501 | 13 | 39 | Y | Y | Y
New Zealand (1), 16 CSV files | 641,641 | 23 | 16 | Year and month only | N | Y
New Zealand (2), nz_victim_timeplace.csv | 1,194,764 | 17 | 6 | Y | Y | Y
Crime Risk Modelling
Crime Factors
• Crime Type
• Time: Month, Day, Hour
• Location: Territorial Authority, Area Unit, Meshblock
MONTH_AREA_CRIME_COUNT
DAY_AREA_CRIME_COUNT
MONTH_AREA_CRIME_TYPE_COUNT
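# df: one row per recorded offence
# AREA_1 = territorial authority, AREA_0 = area unit (assumed from the Total Crime definitions)
month_area = df.groupby(['MONTH', 'AREA_1']).size().to_frame('MONTH_AREA_CRIME_COUNT')
day_area = df.groupby(['MONTH', 'DAY', 'AREA_0']).size().to_frame('DAY_AREA_CRIME_COUNT')
month_area_type = df.groupby(['MONTH', 'AREA_1', 'CRIME_TYPE']).size().to_frame('MONTH_AREA_CRIME_TYPE_COUNT')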

Total Crime(M,T) = crime count grouped by month M and territorial authority T
Total Crime(M,T) = sum of the daily crime counts of all area units in T during M
Total Crime(M,T) = sum of the crime counts over all crime types in T during M
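The three expressions for Total Crime(M,T) should agree, which gives a handy consistency check on the aggregated tables. The sketch below builds a tiny made-up crime table and verifies this with pandas. It assumes AREA_0 holds area units and AREA_1 territorial authorities (implied by the definitions, not stated on the slides) and that every area unit lies in exactly one territorial authority; the variable names are illustrative.

```python
import pandas as pd

# Toy crime records (made up): one row per offence
df = pd.DataFrame({
    "MONTH": [1, 1, 1, 1, 2, 2],
    "DAY": [3, 3, 7, 7, 1, 1],
    "AREA_0": ["U1", "U2", "U1", "U3", "U2", "U3"],        # area unit (assumed)
    "AREA_1": ["TA1", "TA1", "TA1", "TA2", "TA1", "TA2"],  # territorial authority (assumed)
    "CRIME_TYPE": ["Theft", "Assault", "Theft", "Theft", "Burglary", "Theft"],
})

# 1) Count grouped directly by month and territorial authority
direct = df.groupby(["MONTH", "AREA_1"]).size()

# 2) Sum the daily area-unit counts, rolled up to each unit's territorial authority
#    (assumes every area unit lies in exactly one territorial authority)
daily = (df.groupby(["MONTH", "DAY", "AREA_0"]).size()
           .rename("DAY_AREA_CRIME_COUNT").reset_index())
unit_to_ta = df[["AREA_0", "AREA_1"]].drop_duplicates()
from_daily = (daily.merge(unit_to_ta, on="AREA_0")
                .groupby(["MONTH", "AREA_1"])["DAY_AREA_CRIME_COUNT"].sum())

# 3) Sum the per-crime-type counts over all crime types
by_type = df.groupby(["MONTH", "AREA_1", "CRIME_TYPE"]).size()
from_types = by_type.groupby(level=["MONTH", "AREA_1"]).sum()

print(direct.to_dict() == from_daily.to_dict() == from_types.to_dict())  # True
```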

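CRIME_RATIO = DAY_AREA_CRIME_RATIO × MONTH_AREA_CRIME_RATIO
Within a specific area unit, each record's CRIME_RATIO is converted to a percentile rank (0 to 1) and mapped to a risk level:
• Risk = 0: percentile below 0.3
• Risk = 1: percentile from 0.3 up to, but not including, 0.7
• Risk = 2: percentile of 0.7 or above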
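# Percentile rank of each record's CRIME_RATIO within its area unit (AREA_0)
df["CRIME_RANK"] = (
    df.groupby(["AREA_0"])["CRIME_RATIO"]
    .rank(method="max", pct=True)
)
# Map percentile bands to risk levels: < 0.3 -> 0, [0.3, 0.7) -> 1, >= 0.7 -> 2
df.loc[df["CRIME_RANK"] >= 0.7, "RISK"] = 2
df.loc[(df["CRIME_RANK"] >= 0.3) & (df["CRIME_RANK"] < 0.7), "RISK"] = 1
df.loc[df["CRIME_RANK"] < 0.3, "RISK"] = 0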
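The banding logic can be checked on a small made-up table. The sketch below applies the same per-area percentile rank and expresses the three bands with `np.select`, a common vectorised alternative to the three `.loc` assignments. The toy values are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data (made up): CRIME_RATIO values for records in two area units
df = pd.DataFrame({
    "AREA_0": ["U1"] * 5 + ["U2"] * 3,
    "CRIME_RATIO": [0.01, 0.02, 0.03, 0.04, 0.05, 0.2, 0.1, 0.3],
})

# Percentile rank of CRIME_RATIO within each area unit (same call as in the slide)
df["CRIME_RANK"] = df.groupby("AREA_0")["CRIME_RATIO"].rank(method="max", pct=True)

# The same three bands with np.select: the first matching condition wins
df["RISK"] = np.select(
    [df["CRIME_RANK"] >= 0.7, df["CRIME_RANK"] >= 0.3],
    [2, 1],
    default=0,
)
print(df["RISK"].tolist())  # [0, 1, 1, 2, 2, 1, 1, 2]
```

There is one behavioural difference. A row with a missing CRIME_RATIO gets a NaN rank, which falls through to `default=0` here, whereas the `.loc` version leaves its RISK value as NaN.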

Ready for the predictive modelling!
Load dataset → Declare feature & target vars. → Encode categorical features → Oversample data (SMOTE)
Features (from feature engineering): MONTH, DAY, AREA_0, AREA_1, QUARTER, HOUR_PARTITION, WEAPON_TYPE, CRIME_TYPE, 3_DAY_AREA_CRIME_MEAN
Target: RISK
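The slides do not show code for the encoding and oversampling steps. As a rough illustration, the sketch below label-encodes a categorical column with pandas, then implements the core idea of SMOTE (Synthetic Minority Over-sampling Technique) from scratch in NumPy. Each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours. The toy data and the helper `smote_oversample` are hypothetical. A real pipeline would more likely use a library implementation such as imbalanced-learn's `SMOTE`. Interpolating integer category codes produces fractional codes, which is why imbalanced-learn also offers `SMOTENC` for mixed categorical and continuous features.

```python
import numpy as np
import pandas as pd


def smote_oversample(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Create n_new synthetic rows from the minority-class rows X_min (core SMOTE idea)."""
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between minority samples
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dists, axis=1)[:, :k]  # k nearest neighbours per sample
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(len(X_min))        # random minority sample
        b = neighbours[a, rng.integers(k)]  # one of its nearest neighbours
        gap = rng.random()                  # random position along the segment a -> b
        synthetic[i] = X_min[a] + gap * (X_min[b] - X_min[a])
    return synthetic


# Toy data (made up): one categorical feature, one numeric feature, imbalanced RISK target
df = pd.DataFrame({
    "CRIME_TYPE": ["Theft", "Assault", "Theft", "Burglary", "Theft", "Assault", "Theft", "Burglary"],
    "HOUR": [1, 5, 9, 13, 17, 21, 23, 2],
    "RISK": [0, 0, 0, 0, 1, 1, 2, 2],
})
# Encode the categorical feature as integer category codes
df["CRIME_TYPE"] = df["CRIME_TYPE"].astype("category").cat.codes

X = df[["CRIME_TYPE", "HOUR"]].to_numpy(dtype=float)
y = df["RISK"].to_numpy()

# Oversample every class up to the size of the largest class
counts = np.bincount(y)
X_parts, y_parts = [X], [y]
for cls in np.unique(y):
    n_new = int(counts.max() - counts[cls])
    if n_new > 0:
        X_parts.append(smote_oversample(X[y == cls], n_new))
        y_parts.append(np.full(n_new, cls))
X_bal, y_bal = np.vstack(X_parts), np.concatenate(y_parts)
print(np.bincount(y_bal))  # [4 4 4]
```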
After oversampling:
Split train/test set → Train/test models → Cross-validation
Evaluate result: classification report, log loss
Visualize result: confusion matrix, ROC curve, PR curve
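The split and cross-validation steps can be sketched without any ML library. The helpers below are hypothetical and do not come from the slides. `train_test_indices` shuffles row indices and holds out 30% for testing, matching the 70/30 split in the results. `k_fold_indices` partitions the indices into k disjoint folds so that each row is used for testing exactly once. In practice scikit-learn provides `train_test_split` and `KFold`, and `cross_val_score` uses `StratifiedKFold` by default for classifiers when `cv` is an integer.

```python
import random


def train_test_indices(n, test_fraction=0.3, seed=42):
    """Shuffle 0..n-1 and split into (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold_indices(n, k=5, seed=42):
    """Yield (train, test) index lists for k roughly equal, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds get one extra row so that all n rows are covered
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


train, test = train_test_indices(100)
print(len(train), len(test))  # 70 30
for fold, (tr, te) in enumerate(k_fold_indices(103, k=5)):
    print(fold, len(tr), len(te))
```

A model is trained on each fold's `train` rows and scored on its `test` rows. The k scores are then averaged, which is how the k=5 and k=10 columns below are typically produced.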
Algorithm | Train 70 / Test 30: accuracy | log_loss | K-Fold CV (k=5): accuracy | log_loss | K-Fold CV (k=10): accuracy | log_loss
LogisticRegression | 0.4 | 1.09 | 0.39 | 1.09 | 0.37 | 1.09
GaussianNB | 0.37 | 1.12 | 0.36 | 1.12 | 0.34 | 1.12
KNeighbors (k=12) | 0.77 | 1.03 | 0.78 | 1.04 | 0.79 | 1.04
DecisionTree (max_depth=20) | 0.76 | 1.81 | 0.75 | 1.81 | 0.75 | 1.7
XGBoost (max_depth=12) | 0.84 | 0.49 | 0.84 | 0.49 | 0.83 | 0.5
RandomForest (max_depth=20) | 0.84 | 0.51 | 0.83 | 0.52 | 0.83 | 0.52
Deep Learning: TensorFlow (batch size: 64)
Epochs | accuracy | loss
32 | 0.77 | 0.5
64 | 0.79 | 0.46
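Log loss (cross-entropy) penalises confident wrong predictions. For N samples, where sample i has true class y_i and the model assigns probability p_{i,y_i} to that class, log loss = -(1/N) Σ_i log p_{i,y_i}. A minimal pure-Python version is sketched below. The function name is illustrative, and scikit-learn's `sklearn.metrics.log_loss` is the usual library implementation. As a reference point, a model that always predicts 1/3 for each of the three risk classes scores ln 3 ≈ 1.0986, close to the 1.09 reported above for LogisticRegression.

```python
import math


def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of class indices; y_prob: list of per-class probability lists.
    Probabilities are clipped to [eps, 1 - eps] so that log(0) never occurs.
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        p = min(max(probs[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


# Uniform guessing over the three risk levels (0, 1, 2)
uniform = [[1 / 3, 1 / 3, 1 / 3]] * 4
print(round(multiclass_log_loss([0, 1, 2, 1], uniform), 4))  # 1.0986

# Confident, correct predictions give a much lower loss
confident = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.05, 0.05, 0.9], [0.2, 0.7, 0.1]]
print(round(multiclass_log_loss([0, 1, 2, 1], confident), 4))  # 0.1976
```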
April Love Naviza
Farhad Mehdipour
Vimitaben Vaidya
Wisanu Boonrat
Project Leader
• Declare the goal and objective
• Guide the project in the correct direction
• Review and feedback
Acknowledgment
This project was funded by Otago Polytechnic – Auckland International Campus (OPAIC) Contestable Research Funding, 2020.
Thank you
