Analytics Friday-22 July-OPAIC
Source: https://demand-planning.com/2020/01/20/the-differences-between-descriptive-diagnostic-predictive-cognitive-analytics/
USES TIME SERIES
Forecasting
HIGH DEGREE OF UNCERTAINTY
Why Forecasting / Predicting?
• To plan for future capital expenditure
Healthcare
1. Finding Major Trends in Life Expectancy
2. Analysing Cancer Trends
3. Predicting Emergency Medical Services Utilisation
Energy
Power plants
Consumers
Control center
Predictive Policing
Auror
https://www.auror.co/
Location-based algorithms: update predictions throughout the day, a kind of crime weather forecast
People-based algorithms
COMPAS: Correctional Offender Management Profiling for Alternative Sanctions
• Used in jurisdictions
Reduce cost
Inequality and the misuses of police power
https://www.technologyreview.com/2020/07/17/1005396/predictive-policing-algorithms-racist-dismantled-machine-learning-bias-criminal-justice/
Our Approach
Dataset        #Records    #Attributes  #Crime Types  Has Date    Has Time  Has Location
Boston         319,074     17           66            Y           Y         Y
Denver         466,841     19           15            Y           Y         Y
London         13,490,605  7            9             year|month  N         Y
San Francisco  150,501     13           39            Y           Y         Y
Crime Factors
• Time: Month, Day, Hour
• Location: Territorial Authority, Area Unit, Meshblock
• Crime Type
MONTH_AREA_CRIME_COUNT
DAY_AREA_CRIME_COUNT
MONTH_AREA_CRIME_TYPE_COUNT
MONTH_AREA_CRIME_COUNT
df.groupby(['MONTH', 'AREA_1']).size().to_frame('MONTH_AREA_CRIME_COUNT')
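As a minimal sketch of how this count could become a per-record feature, the example below runs the same groupby on a few invented rows and merges the result back onto the records. The toy data and the merge step are assumptions for illustration; the slides show only the groupby itself.

```python
import pandas as pd

# Toy incident records. The column names MONTH and AREA_1 follow the slide;
# the values are invented for illustration.
df = pd.DataFrame({
    "MONTH":  [1, 1, 1, 2, 2, 2],
    "AREA_1": ["A", "A", "B", "A", "B", "B"],
})

# One row per (MONTH, AREA_1) pair, holding the number of incidents in it.
month_area = (
    df.groupby(["MONTH", "AREA_1"])
      .size()
      .to_frame("MONTH_AREA_CRIME_COUNT")
      .reset_index()
)

# Assumed next step: attach the count to every incident record as a feature.
df = df.merge(month_area, on=["MONTH", "AREA_1"], how="left")
print(df)
```

DAY_AREA_CRIME_COUNT and MONTH_AREA_CRIME_TYPE_COUNT could be built the same way by changing the grouping keys.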
DAY_AREA_CRIME_RATIO x MONTH_AREA_CRIME_RATIO
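The slides do not define the two ratios, so the sketch below rests on an assumption: each ratio is the group's crime count divided by the total count for that area, and the product is stored under the name CRIME_RATIO. The data, the DAY column, and that definition are all hypothetical.

```python
import pandas as pd

# Toy incident records for a single area; values are invented.
df = pd.DataFrame({
    "MONTH":  [1, 1, 1, 2],
    "DAY":    [5, 5, 6, 5],
    "AREA_1": ["A", "A", "A", "A"],
})

# Per-record group sizes: all incidents in the area, and incidents sharing
# the record's month or day within that area.
area_total = df.groupby("AREA_1")["AREA_1"].transform("size")
month_count = df.groupby(["MONTH", "AREA_1"])["AREA_1"].transform("size")
day_count = df.groupby(["DAY", "AREA_1"])["AREA_1"].transform("size")

# ASSUMPTION: ratio = group count / area total. Not confirmed by the slides.
df["MONTH_AREA_CRIME_RATIO"] = month_count / area_total
df["DAY_AREA_CRIME_RATIO"] = day_count / area_total

# Product of the two ratios, as on the slide (the column name is assumed).
df["CRIME_RATIO"] = df["DAY_AREA_CRIME_RATIO"] * df["MONTH_AREA_CRIME_RATIO"]
print(df)
```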
In a specific Area Unit
Risk level by percentile:
• Risk 0: percentile 0 to 0.3
• Risk 1: percentile 0.3 to 0.7
• Risk 2: percentile 0.7 to 1
df["CRIME_RANK"] = df.groupby(["AREA_0"])["CRIME_RATIO"].rank(method='max', pct=True)
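A minimal sketch of how the percentile rank could be turned into the three risk levels, using the 0.3 and 0.7 cut points from the slide. The toy data, the RISK column name, and the choice to put a boundary value in the lower bin are assumptions.

```python
import pandas as pd

# Toy CRIME_RATIO values for two areas; values are invented.
df = pd.DataFrame({
    "AREA_0": ["X"] * 5 + ["Y"] * 5,
    "CRIME_RATIO": [0.1, 0.2, 0.3, 0.4, 0.5, 0.05, 0.05, 0.2, 0.6, 0.9],
})

# Percentile rank of each record within its own area. With method="max",
# tied values all receive the highest rank of the tie.
df["CRIME_RANK"] = df.groupby(["AREA_0"])["CRIME_RATIO"].rank(method="max", pct=True)

# ASSUMPTION: bins are [0, 0.3], (0.3, 0.7], (0.7, 1.0] for risk 0, 1, 2.
df["RISK"] = pd.cut(
    df["CRIME_RANK"],
    bins=[0.0, 0.3, 0.7, 1.0],
    labels=[0, 1, 2],
    include_lowest=True,
).astype(int)
print(df)
```

Because the rank is computed per AREA_0 group, a record's risk level is relative to other records in the same area, not to the whole dataset.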
Load dataset
Declare feature & target vars.
Encode categorical features
Oversampling data (SMOTE)
Split train/test set
Train/test models
Cross-validation
Evaluate result: Classification report, Log Loss
Visualize result: Confusion matrix, ROC curve, PR curve
train=70 / test=30; K-Fold CV (k=5); K-Fold CV (k=10)
epoch=32; epoch=64
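The split, cross-validation, and evaluation steps can be sketched with scikit-learn as below. This is an illustration under stated assumptions, not the project's code: the data is synthetic, logistic regression stands in for the models (the slides do not name them, and the epoch settings suggest a neural network was among them), and the SMOTE step is left out because it lives in the separate imbalanced-learn package.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, log_loss
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic stand-in for the encoded crime features and a 3-level risk target.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4, n_classes=3, random_state=0
)

# Hold out 30% of the records for testing (train=70 / test=30).
# Stratifying by class is an added choice, not something the slides state.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Placeholder classifier (assumption).
model = LogisticRegression(max_iter=1000)

# K-fold cross-validation on the training split for k=5 and k=10.
cv_scores = {}
for k in (5, 10):
    folds = KFold(n_splits=k, shuffle=True, random_state=0)
    cv_scores[k] = cross_val_score(model, X_train, y_train, cv=folds)
    print(f"k={k}: mean accuracy {cv_scores[k].mean():.3f}")

# Fit on the full training split, then evaluate on the held-out test split.
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)
loss = log_loss(y_test, proba, labels=model.classes_)
report = classification_report(y_test, model.predict(X_test))
print(report)
print(f"log loss: {loss:.3f}")
```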
Acknowledgment
Farhad Mehdipour
Vimitaben Vaidya
Project Leader
Wisanu Boonrat
Thank you