Dhanacrime Prediction2

A Mini Project Report

On

DEEP LEARNING PROCESS IN ANALYZING CRIMES


Submitted in partial fulfillment of the

Requirements for the award of degree of


Bachelor of Technology

in

Computer Science and Engineering

by

Kammampati Dhanalakshmi
19H61A0580

Under the Guidance of

Mr. Madar Bandu


Assistant Professor

Department of Computer Science and Engineering


ANURAG GROUP OF INSTITUTIONS
(Formerly CVSR College of Engineering)
(An Autonomous Institution, Approved by AICTE and NBA Accredited)
Venkatapur (V), Ghatkesar (M), Medchal(D)., T.S-500088 (2019-2023)
CERTIFICATE

This is to certify that the project entitled “Deep Learning Process In Analyzing Crimes
using Machine Learning” being submitted by Kammampati Dhanalakshmi bearing the Hall
Ticket number 19H61A0580 in partial fulfillment of the requirements for the award of the degree
of the Bachelor of Technology in Computer Science and Engineering to Anurag Group of
Institutions (Formerly CVSR College of Engineering) is a record of bonafide work carried out
by her under my guidance and supervision from June 2022 to October 2022.

The results presented in this project have been verified and found to be satisfactory. The
results embodied in this project report have not been submitted to any other University for the
award of any other degree or diploma.

Internal Guide External Examiner


Mr. Madar Bandu
(Assistant Professor)

Dr. G. Vishnu Murthy,


Professor & Head, Dept. of CSE
ACKNOWLEDGEMENT

It is my privilege and pleasure to express a profound sense of respect, gratitude, and
indebtedness to our guide Mr. Madar Bandu, Assistant Professor, Dept. of Computer Science and
Engineering, Anurag Group of Institutions (Formerly CVSR College of Engineering), for his
indefatigable inspiration, guidance, cogent discussion, constructive criticism, and encouragement
throughout this dissertation work.

I express my sincere gratitude to Dr. G. Vishnu Murthy, Professor & Head, Department
of Computer Science and Engineering, Anurag Group of Institutions (Formerly CVSR College of
Engineering), for his suggestions, motivations, and co-operation for the successful completion of
the work.

I extend my sincere thanks to Dr. V. Vijaya Kumar, Dean, Research and Development, Anurag
Group of Institutions, for his encouragement and constant help.

Kammampati Dhanalakshmi
(19H61A0580)

DECLARATION

I hereby declare that the project work entitled “Deep Learning Process In Analyzing
Crimes using Machine Learning” submitted to the Anurag Group of Institutions (Formerly
CVSR College of Engineering) in partial fulfillment of the requirements for the award of the
degree of Bachelor of Technology (B. Tech) in Computer Science and Engineering is a record of
original work done by me under the guidance of Mr. Madar Bandu, Assistant Professor, and
that this project work has not been submitted to any other university for the award of any other
degree or diploma.

Kammampati Dhanalakshmi
(19H61A0580)

ABSTRACT

This project investigates machine-learning-based crime prediction. In this work, Vancouver
crime data for the last 15 years is analyzed using two different data-processing approaches.
Two machine learning predictive models, K-nearest neighbor and boosted decision tree, are
implemented, and a crime prediction accuracy between 39% and 44% is obtained when predicting
crime in Vancouver. Decision trees can deal better with large datasets that have many layers with
different nodes. Crime analysis is a way of observing the patterns of recent and past crime to get
a better estimate of future crimes. It is very important for law enforcement and also helps crime
detectives find criminals.

CONTENTS

4.1.1. Use Case
4.1.2. Class Diagram
4.1.3 Activity Diagram
4.1.4 Sequence Diagram
5. Implementation
5.1. Modules
5.2. Module Description
5.4 Sample Code
6. Test Cases
7. Screen Shots
8. Conclusion
9. Future Enhancement
10. Bibliography

1.INTRODUCTION
Crime is a socio-economic problem affecting quality of life and economic growth. The
specifics of how crime is conducted change depending on the type of society and community.
Previous research in crime prediction has found that factors like education, poverty,
employment, and climate affect the crime rate. Vancouver is one of the most populous,
ethnically diverse, and multicultural cities in Canada. The overall crime rate in
Vancouver dropped 1.5% in 2017, but the high number of vehicle break-ins and thefts is still an
issue. Recently, the Vancouver Police Department (VPD) introduced a crime predictive model to
predict crimes related to property break-ins and, once it was implemented, the city of Vancouver
witnessed a 27% drop in residential break-ins. Crime prediction is a law enforcement technique
that uses data and statistical analysis to identify the crimes most likely to occur. This
field has been subject to continued research in many parts of the world.

1.1 Motivation:
In this study, we aim to predict the number of crimes that will occur in the future based
on the number of crimes that have occurred in the past. The number of crimes committed is said
to be the product of the number of people committing crimes and the average frequency at which
they commit crimes. We want to know whether we can make accurate predictions of future crimes
using deep learning. Crime has been a constant and problematic issue that causes socioeconomic
disparity for society as a whole. When there are not enough police officers to enforce the law,
civilians risk getting injured and their properties can become easy targets for criminals. The
motivation for predicting future crimes is that it allows a city to better prepare for the future.
Preparing and planning for the amount of crime that will occur can be a large undertaking,
especially if a city does not have a well-funded police department or enough police officers.
The city can allocate resources and officers more effectively if it knows how much crime to expect
in future weeks. How and why crimes are committed has already been theorized by
criminologists. By studying these theories, we can better understand the behavioral patterns of
offenders and create a model that uses these patterns to predict crime at a certain location and time.

1.2 Problem Definition:
The challenge facing the crime analyst is how to extrapolate past crime data into the likelihoods
of future incidents occurring at specified locations in space and time. Ideally the analyst wants an
image map showing the intensities of future crime activities at each location within their
jurisdictional boundaries. Such maps would be useful in crime prevention, since they would allow
the police to allocate resources to the areas of higher risk.

1.3. Objective of the Project:


The primary objective of this work is to create a prediction model that can accurately predict
crime. In our research, two classification algorithms, K-Nearest Neighbor (KNN) and boosted
decision tree, were implemented to analyze the VPD crime dataset compiled between 2003 and
2018 with more than 560,000 records.

2.LITERATURE SURVEY

[1]. BURGLARY CRIME ANALYSIS USING LOGISTIC REGRESSION


AUTHORS: Daniel Antolos, Dahai Liu, Andrei Ludu, and Dennis Vincenzi
ABSTRACT:
This project used a logistic regression model to investigate the relationship between several
predicting factors and burglary occurrence probability with regard to the epicentre. These factors
include day of the week, time of the day, repeated victimization, connectors and barriers. Data was
collected from a local police report on 2010 burglary incidents. Results showed the model has
various degrees of significance in terms of predicting the occurrence within different ranges from
the epicentre. Follow-up refined multiple comparisons of different sizes were observed to further
discover the pattern of prediction strength of these factors. Results are discussed and further
research directions are given at the end of the project.

[2]. ONCE UPON A CRIME: TOWARDS CRIME PREDICTION FROM


DEMOGRAPHICS AND MOBILE DATA
AUTHORS: Andrey Bogomolov, Bruno Lepri, Jacopo Staiano.
ABSTRACT:
In this paper, we present a novel approach to predict crime in a geographic space from multiple
data sources, in particular mobile phone and demographic data. The main contribution of the
proposed approach lies in using aggregated and anonymized human behavioral data derived from
mobile network activity to tackle the crime prediction problem. While previous research efforts
have used either background historical knowledge or offenders' profiling, our findings support the
hypothesis that aggregated human behavioral data captured from the mobile network infrastructure,
in combination with basic demographic information, can be used to predict crime. In our
experimental results with real crime data from London we obtain an accuracy of almost 70% when
predicting whether a specific area in the city will be a crime hotspot or not. Moreover, we provide
a discussion of the implications of our findings for data-driven crime analysis.

[3]. CRIME IN RELATION TO URBAN DESIGN.
AUTHORS: Heba Adel, Mohamed Saleem, Randa Mahmoud
ABSTRACT:

Crime is a part of any social system and has been known to human communities since their origins.
It differs from one community to another; even within one community it does not occur equally in
all places, nor in the same way. It is also concentrated in some places more than others, sometimes
increases, sometimes decreases, etc. Previous research has shown that crime rate has a significant
correlation with different social factors: education levels, poverty rates and lack of social
organization, while others have drawn attention to its relation with the built environment. They
proposed that crime occurs in places where both opportunities and criminals are available. The
role of this paper is to identify urban circumstances related to crime occurrence within the Greater
Cairo Region, and to propose different ways to reduce these crimes. Consecutively, the agglomeration’s
main districts were scrutinized according to social analysis, street-network pattern and land-use.

[4]. CRIME DATA MINING: A GENERAL FRAMEWORK AND SOME EXAMPLES


AUTHORS: H. Chen; W. Chung; J.J. Xu; G. Wang.
ABSTRACT:
A major challenge facing all law-enforcement and intelligence-gathering organizations is
accurately and efficiently analyzing the growing volumes of crime data. Detecting cybercrime can
likewise be difficult because busy network traffic and frequent online transactions generate large
amounts of data, only a small portion of which relates to illegal activities. Data mining is a powerful
tool that enables criminal investigators who may lack extensive training as data analysts to explore
large databases quickly and efficiently. We present a general framework for crime data mining that
draws on experience gained with the Coplink project, which researchers at the University of
Arizona have been conducting in collaboration with the Tucson and Phoenix police departments
since 1997.

[5]. MINING ROAD TRAFFIC ACCIDENT DATA TO IMPROVE SAFETY: ROLE OF ROAD-RELATED
FACTORS ON ACCIDENT SEVERITY IN ETHIOPIA.

AUTHORS: Tibebe Beshah, Shawndra Hill.
ABSTRACT:
Road traffic accidents (RTAs) are a major public health concern, resulting in an estimated 1.2
million deaths and 50 million injuries worldwide each year. In the developing world, RTAs are
among the leading causes of death and injury; Ethiopia in particular experiences the highest rate of
such accidents. Thus, methods to reduce accident severity are of great interest to traffic agencies
and the public at large. In this work, we applied data mining technologies to link recorded road
characteristics to accident severity in Ethiopia, and developed a set of rules that could be used by
the Ethiopian Traffic Agency to improve safety.

[6]. AREA-SPECIFIC CRIME PREDICTION MODELS


AUTHORS: Mohammad Al Boni; Matthew S. Gerber
ABSTRACT:

The convergence of public data and statistical modeling has created opportunities for public
safety officials to prioritize the deployment of scarce resources on the basis of predicted crime
patterns. Current crime prediction methods are trained using observed crime and information
describing various criminogenic factors. Researchers have favored global models (e.g., of entire
cities) due to a lack of observations at finer resolutions (e.g., ZIP codes). These global models and
their assumptions are at odds with evidence that the relationship between crime and criminogenic
factors is not homogeneous across space. In response to this gap, we present area-specific crime
prediction models based on hierarchical and multi-task statistical learning. Our models mitigate
sparseness by sharing information across ZIP codes, yet they retain the advantages of localized
models in addressing non-homogeneous crime patterns. Out-of-sample testing on real crime data
indicates predictive advantages over multiple state-of-the-art global models.

3.ANALYSIS

3.1 Existing System:


Machine learning is the science of having computers make decisions without human
intervention. Recently, machine learning has been applied in self-driving cars, speech recognition,
web search, and an improved understanding of the human genome. It has also made predicting
crime based on referenced data feasible. Classification is a supervised prediction technique which
allows for nominal class labels. Classification has been used in many domains including weather
forecasting, medical care, finances and banking, homeland security, and business intelligence.

EXISTING TECHNIQUE: -
• Logistic Regression

TECHNIQUE DEFINITION: -
• It is used to estimate discrete values (binary values such as 0/1, yes/no, true/false) based on a given
set of independent variable(s).

• In simple words, it predicts the probability of occurrence of an event by fitting data to a logit function.

• Hence, it is also known as logit regression.
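
As a rough illustration of the logit idea described above, the following minimal sketch maps a weighted sum of features to a probability and thresholds it at 0.5. The weights, bias, and feature values are hypothetical placeholders, not values fitted in this project.

```python
import math

def logistic(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z)

# Hypothetical weights for two illustrative features (assumed, not fitted).
weights = [0.8, -0.5]
bias = 0.1

p = predict_probability([1.0, 2.0], weights, bias)
label = 1 if p >= 0.5 else 0  # binary decision from the probability
print(round(p, 3), label)
```

In practice the weights would be learned from data, for example by maximum likelihood, rather than set by hand.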

DRAWBACKS: -
• Overfitting the model

3.2 Proposed System:


• In the first approach, each neighbourhood and each crime category is given a unique number, which
records that a certain crime happened in a certain neighbourhood.

• In the second approach, the neighbourhood and the day of the week on which the crime was
committed are encoded as binary values: 1 when the crime happened on that day in that
neighbourhood, and 0 otherwise.
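
The two approaches above can be sketched roughly as follows. This is only an illustration under stated assumptions: the records are synthetic, the field names (`neighbourhood`, `category`, `weekday`) and the helper functions are hypothetical, and the report does not specify its exact column layout.

```python
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Synthetic records; the neighbourhood and category values are illustrative only.
records = [
    {"neighbourhood": "Kitsilano", "category": "Mischief", "weekday": "Mon"},
    {"neighbourhood": "Strathcona", "category": "Theft of Bicycle", "weekday": "Sat"},
    {"neighbourhood": "Kitsilano", "category": "Theft of Bicycle", "weekday": "Sat"},
]

def integer_codes(values):
    """Approach 1 helper: give every distinct value a unique integer."""
    return {value: code for code, value in enumerate(sorted(set(values)))}

def one_hot(value, vocabulary):
    """Approach 2 helper: 1 in the position of `value`, 0 elsewhere."""
    return [1 if value == item else 0 for item in vocabulary]

hood_codes = integer_codes(r["neighbourhood"] for r in records)
category_codes = integer_codes(r["category"] for r in records)

# Approach 1: one integer per neighbourhood and per crime category.
approach_1 = [
    [hood_codes[r["neighbourhood"]], category_codes[r["category"]]]
    for r in records
]

# Approach 2: binary indicator columns for neighbourhood and day of week.
hoods = sorted(hood_codes)
approach_2 = [
    one_hot(r["neighbourhood"], hoods) + one_hot(r["weekday"], WEEKDAYS)
    for r in records
]

print(approach_1)
print(approach_2)
```

The second representation is wider but avoids implying an ordering between neighbourhoods, which may matter for distance-based methods such as KNN.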

PROPOSED TECHNIQUE: -
• KNN and boosted decision tree

TECHNIQUE DEFINITION: -
• It is more widely used in classification problems in the industry.
• K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases by a
majority vote of its k neighbors.

ADVANTAGES: -
• Comprehensive Nature

PROPOSED SYSTEM ADVANTAGES:


• Decision Tree can be used for both classification and regression problems.
• Decision Tree can automatically handle missing values.
• KNN is a very simple algorithm used to solve classification problems.
• KNN is very easy to implement.
• There are only two parameters required to implement KNN, the value of K and the distance function

PROPOSED SYSTEM ARCHITECTURE:

3.3 Software Requirement Specification:

3.3.1. Purpose:
The purpose of this study is to identify, evaluate and understand the various machine learning
and statistical techniques suitable for solving crime related problems based on proactive detection and
prediction of crime and management.

3.3.2. Scope:
The proposed system will deal with crime detection, prediction, and management. Using clustering,
data mining, and time-series techniques, the system will offer predictions of future crime
incidence. This will be done through graphical representation of crime trends and the use of
geographical heat maps to represent the concentration of data and hotspots in real time.

3.3.3 Overall Description:


The vehicle detection module detects the presence of a vehicle by using inductive sensors, in which
a metal wire loop is placed beneath the road. When a vehicle crosses the loop, there is a change in
the induced current, which detects the presence of the vehicle. As a result, the DSP is interrupted
and it triggers the IR camera to capture the image.

HARDWARE REQUIREMENTS:
The hardware requirements may serve as the basis for a contract for the implementation of the
system and should therefore be a complete and consistent specification of the whole system. They
are used by software engineers as the starting point for the system design. They should state what
the system does and not how it should be implemented.

• PROCESSOR : DUAL CORE 2 DUOS.
• RAM : 4 GB DDR RAM
• HARD DISK : 250 GB

SOFTWARE REQUIREMENTS:
The software requirements document is the specification of the system. It should include both
a definition and a specification of requirements. It states what the system should do rather than
how it should do it. The software requirements provide a basis for creating the software requirements
specification. They are useful in estimating cost, planning team activities, performing tasks, and
tracking the team’s progress throughout the development activity.

• Operating System : Windows 7/8/10
• Platform : Spyder3
• Programming Language : Python, HTML
• Front End : Spyder3

4.DESIGN

Design Engineering deals with the various UML (Unified Modeling Language) diagrams for
the implementation of the project. Design is a meaningful engineering representation of a thing that
is to be built. Software design is a process through which the requirements are translated into a
representation of the software. Design is the place where quality is rendered in software engineering.
Design is the means to accurately translate customer requirements into a finished product.

4.1. UML Diagrams:

We prepare UML diagrams to understand the system in a better and simpler way. A single
diagram is not enough to cover all the aspects of the system. UML defines various kinds of diagrams
to cover most of the aspects of a system.

You can also create your own set of diagrams to meet your requirements. Diagrams are generally
made in an incremental and iterative way. There are two broad categories of diagrams, and
they are again divided into subcategories:

• Structural Diagrams
• Behavioral Diagrams

Structural Diagrams

The structural diagrams represent the static aspect of the system. These static aspects represent those
parts of a diagram which form the main structure and are therefore stable.

Behavioral Diagrams

Any system can have two aspects, static and dynamic. So, a model is considered complete when
both aspects are fully covered.

Behavioral diagrams capture the dynamic aspect of a system. The dynamic aspect can be
further described as the changing/moving parts of a system.

4.1.1. Use Case Diagram:


The main purpose of a use case diagram is to show what system functions are performed for
which actor. The roles of the actors in the system can be depicted. The diagram consists of the user
as the actor. Each actor plays a certain role in achieving the concept.

4.1.2. Class Diagram:
The class diagram represents how the classes, with their attributes and methods, are linked together
to perform the verification with security. The diagram shows the various classes involved
in our project.

4.1.3 Activity Diagram:
Activity diagrams are graphical representations of workflows of stepwise activities and actions
with support for choice, iteration and concurrency. In the Unified Modeling Language, activity
diagrams can be used to describe the business and operational step-by-step workflows of components
in a system. An activity diagram shows the overall flow of control.

4.1.4 Sequence Diagram:
Sequence diagrams are interaction diagrams that detail how operations are carried out. They
capture the interaction between objects in the context of a collaboration. Sequence diagrams are
time-focused: they show the order of the interaction visually by using the vertical axis of the
diagram to represent time, that is, what messages are sent and when.

5. Implementation

5.1 Modules:
• DATA SOURCE
• PREPROCESSING
• STATISTICAL ANALYSIS
• TREND ANALYSIS

5.2. Module Description :


DATA SOURCE:
The original datasets were obtained from the open data catalog of the city of Vancouver. Two
datasets are used for this project: crime and neighborhood. The crime dataset has been collected
by the VPD since 2003 and is updated every Sunday morning. It provides information on the type of
crime committed and the time and location of the offence. The neighborhood dataset contains the
boundaries for the city’s 22 local areas in the Geographic Information System (GIS).

PREPROCESSING:
The original dataset needs to be preprocessed to fill the empty cells, delete unnecessary columns, and
add several relevant features to the original and preprocessed datasets.

Preprocessed datasets

STATISTICAL ANALYSIS:

The distribution of the crime dataset is described by year, month, and day. In Vancouver,
the average number of crime incidents is around 31,624 per year, 2,720 per month, and 90 per day.
The data tends toward a normal distribution as the time interval lengthens. However, the per-day
graph has an abnormal maximum of 650 incidents, which was suspected to be an outlier and turns out
to correspond to the Stanley Cup riot on June 15, 2011.

(a) Distribution of crimes per year

(b) Distribution of crimes per day
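
A minimal sketch of how the per-year, per-month, and per-day counts described above could be computed, and how an unusually busy day could be flagged. The dates below are synthetic, and the "more than three times the mean daily count" rule is an assumption chosen for illustration; the report does not state how the outlier was identified.

```python
from collections import Counter
from datetime import date
from statistics import mean

def flag_outlier_days(counts, factor=3.0):
    """Return days whose count exceeds `factor` times the mean daily count.

    The threshold rule is an illustrative assumption, not the report's method.
    """
    average = mean(counts.values())
    return sorted(day for day, n in counts.items() if n > factor * average)

# Synthetic incident dates: one entry per recorded incident.
incidents = (
    [date(2011, 6, 12)] * 2
    + [date(2011, 6, 13)] * 3
    + [date(2011, 6, 14)] * 2
    + [date(2011, 6, 15)] * 30   # artificially busy day
    + [date(2011, 6, 16)] * 3
    + [date(2011, 6, 17)] * 2
)

per_day = Counter(incidents)
per_month = Counter((d.year, d.month) for d in incidents)
per_year = Counter(d.year for d in incidents)
outliers = flag_outlier_days(per_day)

print(per_year, per_month)
print(outliers)
```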

TREND ANALYSIS:
The overall trend shows that the average number of crimes per month decreased from 2003 to
2013 but increased in 2016, and again fell slightly to about 3000 incidents per month in 2018. After
the statistical analysis, we classify the prediction values of the crime rate in Vancouver.

Moving average of crimes per month
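
The moving average named in the caption above can be sketched as a trailing window over monthly counts. The counts and the window length of 3 are synthetic assumptions; the report does not state the window it used.

```python
def moving_average(values, window):
    """Trailing moving average; returns one value per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Synthetic monthly crime counts (illustrative only).
monthly_counts = [3000, 2900, 3100, 2800, 2700, 2900]
smoothed = moving_average(monthly_counts, window=3)
print(smoothed)
```

A longer window smooths seasonal noise more strongly but reacts more slowly to real changes in the trend.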

5.3 Introduction to Technologies Used:


PYTHON
Python is a high-level, interpreted, interactive, and object-oriented scripting language. Python
is designed to be highly readable. It frequently uses English keywords where other languages use
punctuation, and it has fewer syntactical constructions than other languages.

IMPORTANCE OF PYTHON:
• Python is Interpreted − Python is processed at runtime by the interpreter. You do not need to compile your
program before executing it. This is similar to PERL and PHP.

• Python is Interactive − You can actually sit at a Python prompt and interact with the interpreter directly to
write your programs.

• Python is Object-Oriented − Python supports Object-Oriented style or technique of programming that
encapsulates code within objects.

• Python is a Beginner's Language − Python is a great language for the beginner-level programmers and
supports the development of a wide range of applications from simple text processing to WWW browsers
to games.

LIBRARIES USED IN PYTHON:


• NumPy - mainly useful for its N-dimensional array objects.
• pandas - Python data analysis library, including structures such as data frames.
• matplotlib - 2D plotting library producing publication quality figures.
• scikit-learn - the machine learning algorithms used for data analysis and data mining tasks.

ALGORITHM USED:
K-NEAREST NEIGHBORS(KNN):
KNN was applied in both approaches with the same parameters, and the accuracies and training
times were compared. For approach 1, KNN’s accuracy was 40.1% and its training time was 2209
seconds, while for approach 2 it was 39.9% accurate and took 101.73 seconds to train. The KNN
algorithm is a simple, supervised machine learning algorithm that can be used to solve both
classification and regression problems. Its purpose is to use a database in which the data points are
separated into several classes to predict the classification of a new sample point.
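
To make the majority-vote idea concrete, here is a minimal from-scratch sketch of KNN classification. The training points and labels are synthetic, and the values k=3 and Euclidean distance are assumptions; the report does not list the parameters it used.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Classify `query` by majority vote among its k nearest training points."""
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: distance(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Synthetic two-feature training data with illustrative class labels.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["theft", "theft", "theft", "mischief", "mischief", "mischief"]

print(knn_predict(train_X, train_y, (0.5, 0.5)))
print(knn_predict(train_X, train_y, (5.5, 5.5)))
```

Because KNN stores all cases and compares each query against them, prediction cost grows with the size of the training set, which may partly explain long run times on large encodings.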

BOOSTED DECISION TREE:


In a boosted decision tree, each tree is dependent on prior trees. The algorithm learns by fitting
the residual of the trees that preceded it. A decision tree is a decision-making tool that uses a
flowchart-like tree structure, or a model of decisions and all of their possible results, including
outcomes, input costs, and utility. Decision trees can deal better with large datasets that have many
layers with different nodes, and they fall under the category of supervised learning algorithms. We
applied the boosted decision tree algorithm in both approaches and compared the results. For both
approaches, we used the Adaptive Boosting (AdaBoost) ensemble method with decision trees as the
learner type. AdaBoost is a meta-algorithm that combines several weak learners into a stronger
classifier. The maximum number of splits was 20. Accuracy and training time for approach 1 were
41.9% and 903.63 seconds, respectively, while approach 2 was 43.2% accurate with a training time of
459.26 seconds.
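
The way AdaBoost combines weak learners can be sketched as follows. This is a simplified illustration, not the configuration used in this project: it assumes binary labels (+1/-1) and one-split decision stumps as the weak learners, whereas the report describes trees with up to 20 splits on a multi-class problem. The data is synthetic.

```python
import math

def stump_predict(stump, x):
    """A stump predicts `polarity` when x[feature] <= threshold, else -polarity."""
    feature, threshold, polarity = stump
    return polarity if x[feature] <= threshold else -polarity

def best_stump(X, y, weights):
    """Pick the one-split stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(w for x, t, w in zip(X, y, weights)
                          if stump_predict(stump, x) != t)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def adaboost_fit(X, y, rounds=3):
    """Train `rounds` stumps, reweighting samples after each round."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight; correctly classified ones lose it.
        weights = [w * math.exp(-alpha * t * stump_predict(stump, x))
                   for x, t, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Synthetic 1-D data that no single stump can separate.
X = [(1,), (2,), (3,), (4,), (5,), (6,)]
y = [1, 1, -1, -1, 1, 1]

ensemble = adaboost_fit(X, y, rounds=3)
print([adaboost_predict(ensemble, x) for x in X])
```

On this toy data every single stump misclassifies at least one point, while the weighted vote of three stumps reproduces the training labels.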
5.4 Sample Code:
# -*- coding: utf-8 -*-
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
from subprocess import check_output

crimes1 = pd.read_csv('input/crime/42_District_wise_crimes_committed_against_women_2001_2012.csv')
crimes2 = pd.read_csv('input/crime/42_District_wise_crimes_committed_against_women_2013.csv')
crimes = pd.concat([crimes1, crimes2], ignore_index=False, axis=0)

# rename the STATE/UT column to STATE
crimes.rename(columns={'STATE/UT': 'STATE'}, inplace=True)

# delete data sets post concat
del crimes1
del crimes2
print('Dataset is ready....')

# know the shape of dataset
print(crimes.shape)

# collect the state names in a list and print
states = crimes.STATE.unique()
print(states)

# do some data cleansing on state names
for i in range(0, len(states)):
    states[i] = states[i].lower()
for j in range(0, len(states)):
    if states[j] == 'a & n islands':
        states[j] = 'a&n islands'
    if states[j] == 'd & n haveli':
        states[j] = 'd&n haveli'
print(states)

# remove duplicate state names from the list
states = np.unique(states).tolist()
print(states)

# convert the state names to lower
crimes['STATE'] = crimes['STATE'].str.lower()
crimes.head(3)

# filter out the Total crimes for each State & UT
# (.copy() avoids pandas' SettingWithCopyWarning on the in-place drop below)
crimes_total = crimes[crimes['DISTRICT'] == 'TOTAL'].copy()

# drop DISTRICT column as we do not intend to use it at this point
crimes_total.drop('DISTRICT', axis=1, inplace=True)

# filter out the Total crimes for each State & UT for the year 2001
crimes_total_2001 = crimes_total[crimes_total['Year'] == 2001].copy()
crimes_total_2001.drop('Year', axis=1, inplace=True)

# Data of Rape crime committed in the year 2001 per state
x = crimes_total_2001['STATE'].values
y = crimes_total_2001['Rape'].values

# plot the bar graph
fig, ax = plt.subplots()
crime_rape = crimes_total_2001['STATE'].values
y_pos = np.arange(len(crime_rape))
performance = crimes_total_2001['Rape'].values
ax.barh(y_pos, performance, align='center', color='green', ecolor='black')
ax.set_yticks(y_pos)
ax.set_yticklabels(crime_rape)
ax.invert_yaxis()  # labels read top-to-bottom
ax.set_xlabel('Rapes')
ax.set_title('RAPE VS STATE')
fig.set_size_inches(20, 18, forward=True)
plt.show()

# Any results you write to the current directory are saved as output.
# creating a new data set
crimes_total_women1 = pd.read_csv('input/crime/42_District_wise_crimes_committed_against_women_2001_2012.csv')
crimes_total_women2 = pd.read_csv('input/crime/42_District_wise_crimes_committed_against_women_2013.csv')
crimes_total_women = pd.concat([crimes_total_women1, crimes_total_women2], ignore_index=False, axis=0)
crimes_total_women.rename(columns={'STATE/UT': 'STATE'}, inplace=True)
del crimes_total_women1
del crimes_total_women2

# calculating total crimes of all kinds in each state from 2001 to 2013
# (.copy() avoids pandas' SettingWithCopyWarning on the edits below)
crimes_total_women = crimes_total_women[crimes_total_women['DISTRICT'] == 'TOTAL'].copy()
crimes_total_women.drop('DISTRICT', axis=1, inplace=True)
crimes_total_women['Total Crimes'] = crimes_total_women.iloc[:, -9:-1].sum(axis=1)
crimes_total_women = crimes_total_women.groupby(['STATE'])['Total Crimes'].sum()

# plot graph of crimes committed on women since 2001-2013 in each state/UT
fig1, ax1 = plt.subplots()
states = crimes_total_women.index.tolist()
y_pos = np.arange(len(states))
performance = crimes_total_women.tolist()
ax1.barh(y_pos, performance, align='center', color='green', ecolor='black')
ax1.set_yticks(y_pos)
ax1.set_yticklabels(states)
ax1.invert_yaxis()  # labels read top-to-bottom
ax1.set_xlabel('All Crimes Against Women')
ax1.set_title('Crime VS STATE')
fig1.set_size_inches(20, 18, forward=True)
plt.show()

# Import dependencies
import numpy as np
import pandas as pd
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.parse import urlencode
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

crime_df = pd.read_csv('input/2001_2012/42_District_wise_crimes_committed_against_women_2001_2012.csv')
print(crime_df.columns)
crime_df = crime_df[['STATE/UT', 'DISTRICT', 'Year', 'Rape']]
print(crime_df.head())
crime_df = crime_df.loc[crime_df['STATE/UT'] == "ANDHRA PRADESH"]
crime_df = crime_df.loc[crime_df['DISTRICT'] == "EAST GODAVARI"]

#ANDHRA PRADESH
#SECUNDERABAD RLY.
X = crime_df.Year.values.reshape(-1, 1)
y = crime_df.Rape.values.reshape(-1, 1)
print(y)
print("Shape: ", X.shape, y.shape)
plt.scatter(X, y)
plt.show()

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X, y)
print('Weight coefficients: ', model.coef_)
print('y-axis intercept: ', model.intercept_)
x_min = np.array([[X.min()]])
x_max = np.array([[X.max()]])
y_min = model.predict(x_min)
y_max = model.predict(x_max)
plt.scatter(X, y, c='blue')
plt.plot([x_min[0, 0], x_max[0, 0]], [y_min[0, 0], y_max[0, 0]], c='red')
# axis labels corrected to match the plotted data (yearly rape counts)
plt.ylabel('Rape cases')
plt.xlabel('Year')
plt.show()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

from sklearn.svm import SVC
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train.ravel())  # SVC expects a 1-D label array
y_pred = svclassifier.predict(X_test)

from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

6.TESTCASES

7.SCREENSHOTS

SAMPLE RESULTS:

The results from both methods (KNN and boosted decision tree) are shown in the following figures
for both approaches:

(a): KNN result for approach 1

(b): KNN result for approach 2

(c): Boosted Decision tree result for approach 1

(d): Boosted Decision tree result for approach 2

8.CONCLUSION

In this research, Vancouver crime data for the last 15 years was used in two different dataset
approaches. The machine learning predictive models KNN and boosted decision tree were used to
obtain a crime-prediction accuracy between 39% and 44%. The accuracy, complexity, and training
time differed slightly between approaches and algorithms. The prediction accuracy
can be improved by tuning both the algorithm and the data for specific applications. Although this
model has low accuracy as a prediction model, it provides a preliminary framework for further analyses.

9. FUTURE ENHANCEMENTS

Crime prediction is a law enforcement technique that uses data and statistical analysis for the
identification of crimes most likely to occur in the future. This field has been subject to continued
research in many parts of the world.

10.BIBLIOGRAPHY

[1] A. Bogomolov, B. Lepri, J. Staiano, N. Oliver, F. Pianesi, and A. Pentland, "Once upon a crime: towards
crime prediction from demographics and mobile data," Proc. of the 16th Intl. Conf. on Multimodal
Interaction, pp. 427-434, 2014.

[2] H. Adel, M. Saleem, and R. Mahmoud, "Crime in relation to urban design. Case study: the greater Cairo
region," Ain Shams Eng. J., vol. 7, no. 3, pp. 925-938, 2016.

[3] "Overall crime rate in Vancouver went down in 2017, VPD says," CBC News, Feb. 15, 2018.
[Online]. Available: https://www.cbc.ca/news/canada/british-columbia/crime-ratevancouver2017-1.4537831.
[Accessed: 09-Aug-2018].

[4] J. Kerr, "Vancouver police go high tech to predict and prevent crime before it happens," Vancouver
Courier, July 23, 2017. [Online]. Available:
https://www.vancourier.com/news/vancouverpolice-go-high-tech-topredict-and-prevent-crimebefore-it-happens-1.21295288.
[Accessed: 09-Aug-2018].

[5] J. Han, Data mining: concepts and techniques, Morgan Kaufmann, 2012.
[6] R. Iqbal, M. A. A. Murad, A. Mustapha, P. H. Shariat Panahi, and N. Khanahmadliravi, "An
experimental study of classification algorithms for crime prediction," Indian J. of Sci. and Technol., vol.
6, no. 3, pp. 4219-4225, Mar. 2013.

[7] H. Chen, W. Chung, J. J. Xu, G. Wang, Y. Qin, and M. Chau, "Crime data mining: a general framework
and some examples," IEEE Computer, vol. 37, no. 4, pp. 50-56, Apr. 2004.

[8] T. Beshah and S. Hill, "Mining road traffic accident data to improve safety: role of road-related factors
on accident severity in Ethiopia," Proc. of Artificial Intel. for Develop. (AID 2010), pp. 1419, 2010.

[9] M. Al Boni and M. S. Gerber, "Area-specific crime prediction models," 15th IEEE Intl. Conf. on Mach.
Learn. and Appl., Anaheim, CA, USA, Dec. 2016.
