


An experimental study of Crime Prediction using Machine Learning


Article in Test Engineering and Management · May 2022



5 authors, including:

Sikhinam Nagamani
Rajiv Gandhi University of Knowledge Technologies


All content following this page was uploaded by Sikhinam Nagamani on 30 May 2022.


May – June 2020
ISSN: 0193-4120 Page No. 17819 - 17825

An experimental study of Crime Prediction using Machine Learning Algorithms
Ms. Sikhinam Nagamani1, Ms. I. Bhavishya2, Mr. B. Vijay Kumar3, Ms. T. Geetha Sree4
Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India
bhavishyainuganti@gmail.com, vijaykumarbarige@gmail.com,4geethasree.tati@gmail.com

Article Info

Volume 83
Page Number: 17819 - 17825
Publication Issue: May - June 2020

Article History
Article Received: 1 May 2020
Revised: 11 May 2020
Accepted: 20 May 2020
Publication: 24 May 2020

Abstract

Crime is a troubling and prevalent issue in present society, and it is hard to prevent. Many cases are recorded daily in many places. Because so many cases are registered, it is necessary to maintain a database for future use. The present challenge is maintaining legitimate crime datasets and analysing the information to help anticipate and understand issues that may arise in future. The main purpose of this work is to predict crimes that might happen in the foreseeable future using available datasets that capture past and present crimes. We use machine learning algorithms to analyse and predict crimes from crime datasets. Websites like Kaggle provide the required datasets. The data is a mixture of type of crime, description, time and date, latitude and longitude. After the datasets are gathered, preprocessing is performed to remove noisy data and fill incomplete records, which leads to higher accuracy. Different algorithms such as LightGBM are applied for crime estimation, and only the algorithm that gives the highest accuracy is selected. Crimes are displayed in relation to the day, time and area of their occurrence. The sole purpose of this idea is to predict crimes using effective machine learning algorithms, which can reduce the crime rate by allowing precautions to be taken.

Published by: The Mattingley Publishing Co., Inc.

I. INTRODUCTION

Crimes are a noticeable danger to mankind. Numerous crimes happen very often [1], and the rate at which they occur has increased considerably over time. Violations occur in a wide range of communities and places. Crimes are categorised into types such as battery, robbery, homicide, murder, rape, assault, kidnapping and false imprisonment. Because of this sharp rise in crime, there is a need to find ways to stop it. The police need assistance to control these incidents. Crime prediction [6] and identification of the criminal are considered serious problems for officials because of the huge amount of information that must be checked to confirm and match the criminal. Finding a better way of solving this issue is therefore of great significance.

This problem made us search for a simple method of solving a crime case. After studying the literature and various cases, we concluded that using machine learning can make things work.

The datasets contain many features that help us to predict crimes. Data is gathered from many locations, so it is separated by the place of occurrence, the time of day, month and year, and the type of crime [2]. With Python at the centre, along with machine learning [1] algorithms, it is possible to predict the crimes that will happen in a particular area at a particular time.

To perform prediction, the first step is to train a model. Training is the process of fitting a model on the training dataset; the test dataset is then used to validate it. The
model is built by choosing the algorithms that give the best accuracy and precision. LightGBM classification and other algorithms will be used for crime forecasting. Visualising the datasets is necessary to examine the crimes that occurred in the country. This work reduces the complexity of crime prediction and eventually helps officials to reduce the rate of crimes committed.

II. LITERATURE SURVEY

Many researchers have confronted different problems in crime control and have proposed distinct crime prediction algorithms. Certain constraints must be satisfied before an algorithm can be declared successful. The accuracy of prediction depends heavily on the attributes chosen from the datasets.

Crime is a predominant problem across the world [1]. Tracking crimes needs a colossal framework, and activities should be designed to deal with the datasets. Data collected over 15 years for the city of Vancouver was taken as the dataset and analysed. When K-nearest neighbours and boosted decision trees were used, an accuracy of 39% to 44% was achieved.

Crime analysis [10] is used to recognize and examine trends and patterns in crimes. With the growing use of electronic systems, crime data analytics can help law enforcement officials to speed up the process of solving crimes. Using the idea of data mining [2], we can extract previously unknown, useful information from unstructured data. Predictive policing means using analytical and predictive techniques to identify criminals, and it has been found to be quite effective at doing so. Because the crime rate has increased over the years, we must deal with an enormous amount of crime information stored in warehouses, which would be hard to examine manually. Nowadays criminals are becoming technologically advanced, so there is a great need to use advanced technologies to keep the police ahead of them. The main focus is a survey of the algorithms and techniques used for identifying criminals.

Crime analysis [8] is described as a methodology for recognising crime regions [1]. The type of crime differs from region to region, and identifying each zone helps to reduce the crime rate. It is very difficult to differentiate the crime zones, but with the help of this procedure the crime rate can be studied. As the use of computers expands, data analysts have become a great help to police officials in tracing and analysing crimes. Clustering [3] and preprocessing techniques are applied to the data to extract crime areas from structured data [9]. In earlier days, the analysis of crime depended mainly on the details of the criminal and other factors, but the present system concentrates mainly on the regions in which the crimes took place. Naive Bayes classification was used in the existing system, and the fuzzy C-Means algorithm [7] is used in the present framework to cluster the crime data for all recognizable crimes, for example theft, burglary, kidnapping, murder, cheating, crimes against women and other crimes.

Security is considered an important concern. Many organisations and the governments of many countries are working hard to stop crime and provide safety to their people. Reducing crime is a huge challenge because it requires storing and using large sets of information. To access such large amounts of data a crime data system is needed, which makes it easier for analysts to find crime zones and crime patterns and to predict future events. The datasets are preprocessed, two methodologies are applied, and the two results are compared.

3.1 Predictive modelling:

Predictive modelling is defined as the method for building a model that is capable of making predictions. This procedure includes a technique of
machine learning that learns specific properties from a dataset in order to make those predictions.

Predictive modelling can be divided into two areas: regression [11] and pattern classification. Forecasting is done with regression models, which analyse the relationship between the factors and crime patterns in order to make a forecast.

Unlike regression models [5], the main task of pattern classification is to assign discrete class labels to particular observations as the result of a prediction. A real-world example of a classification model is weather forecasting, which predicts one of various types of weather conditions.

Pattern classification is further divided into two parts: supervised learning and unsupervised learning. In supervised learning, the class labels needed to build a classification model are known. In this type of learning we know the output for a given training dataset, which is used to train the model so that predictions can be made for unseen data.

Predictive model algorithm types:

Classification and Decision Trees are two types of predictive modelling classification algorithms. A decision tree algorithm uses a tree-structured graph or model of decisions and their possible consequences, including chance event outcomes, costs and utilities.

Naive Bayes classifiers are machine learning [12] classifiers: very straightforward probabilistic classifiers based on applying Bayes' theorem with independence assumptions. This technique constructs classifier models that assign discrete class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from a finite set.

Linear Regression is a type of analysis [4] based on statistical methods for examining relationships between variables. These relationships are mainly between at least one explanatory variable, denoted X, and a scalar dependent variable Y. The case of a single X is called simple linear regression.

Logistic Regression is a type of regression in which the dependent variable is either categorical or binary.

Data preprocessing:

This procedure incorporates strategies to omit any infinite or invalid terms that would affect the accuracy of the system. Formatting, cleaning and sampling are the primary steps. Cleaning is performed to omit or fill missing data.

To reduce the runtime of the algorithm, sampling is performed. This procedure produces the suitable information that is needed.

3.2 Functional Diagram of Proposed System

It is divided into 4 sections:

1. Descriptive analysis of the given data
2. Treatment of data
3. Data modelling
4. Prediction [4] of performance

Fig 1: Architecture for crime prediction

Prepare Data

1. The given information is shaped exactly for better analysis.
2. Data Cleaning

Data cleaning is the investigation and transformation of variables. We can use one of the following methods: standardization or normalization, and missing value treatment.

Random sampling

• Training sample: the model is built on this sample, which takes about 70% to 80% of the data.
• Test sample: the performance of the model is validated on this sample, which takes about 20% to 30% of the data.

Model Selection

Bearing in mind the defined goals, we need to select one of the modelling methods or a blend of them, such as:

• LightGBM
• Random Forest
• KNN Classification
• Logistic Regression
• Support Vector Machine
• Bayesian methods

Build/Train/Develop models

• Check the assumptions of the chosen algorithm.
• Build or train the model on the available sample data.
• Test the model for accuracy and error.

Validate or Test models

• Score the test sample and predict.
• Check model performance with accuracy and so on.

IV. IMPLEMENTATION

The datasets used are taken from the website Kaggle. These datasets are stored and updated by the police department.
Implementation consists of the following steps.

4.1 Data preprocessing

The dataset contains 10 thousand entries. The null values are removed using df=df.dropna(), where df is the data frame. The categorical attributes (location, street, type of crime and community area) are converted to numeric values using a label encoder. The date attribute is divided into new attributes such as month and hour that can be used as features of the model.

4.2 Feature selection

Features are selected that can be used to build the model. Block, Location, City, Community area, X coordinate, Y coordinate, Latitude, Longitude, Hour and Month are the attributes used for visualization.

4.3 Building and Training model

After feature selection, the location and month attributes are used for training. The dataset is divided into the pairs xtrain, ytrain and xtest, ytest. The algorithms are imported from sklearn. Model building is done using model.fit(xtrain, ytrain).

4.4 Prediction

Once the model is built using the method mentioned above, prediction is done using model.predict(xtest). The accuracy is calculated using accuracy_score imported from sklearn.metrics: metrics.accuracy_score(ytest, predicted).

4.5 Visualization

Using the matplotlib library, the crime dataset is analysed by plotting different graphs.

4.6 Results and Discussion

The results are obtained after applying the different machine learning procedures.
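
The preprocessing, training and prediction steps of Sections 4.1 to 4.4 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' code: the column names (date, location, community_area, crime_type), the toy_crime_frame helper and its made-up rows are hypothetical stand-ins for the Kaggle data, and RandomForestClassifier (one of the models compared in this study) is used so that the example needs only pandas and scikit-learn. A LightGBM classifier could be swapped in at the same place.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical column names; the real dataset's columns may differ.
CATEGORICAL = ["location", "community_area", "crime_type"]


def toy_crime_frame(n=200):
    """Build a made-up stand-in for the crime data (not the real dataset)."""
    rows = []
    for i in range(n):
        hour = i % 24
        rows.append({
            "date": f"2019-{(i % 12) + 1:02d}-15 {hour:02d}:00:00",
            "location": "STREET" if hour < 12 else "RESIDENCE",
            "community_area": f"AREA-{i % 5}",
            "crime_type": "THEFT" if hour < 12 else "BATTERY",
        })
    frame = pd.DataFrame(rows)
    frame.loc[3, "location"] = None  # one incomplete record for dropna() to remove
    return frame


def preprocess(df):
    """Section 4.1: drop null rows, derive month/hour, label-encode categories."""
    df = df.dropna().copy()
    date = pd.to_datetime(df.pop("date"))
    df["month"] = date.dt.month
    df["hour"] = date.dt.hour
    for column in CATEGORICAL:
        df[column] = LabelEncoder().fit_transform(df[column])
    return df


def train_and_score(df, target="crime_type", test_size=0.25, seed=0):
    """Sections 4.3 and 4.4: split, fit, predict and compute accuracy."""
    data = preprocess(df)
    x = data.drop(columns=[target])
    y = data[target]
    xtrain, xtest, ytrain, ytest = train_test_split(
        x, y, test_size=test_size, random_state=seed
    )
    # Stand-in model (assumption); the study reports LightGBM as the best one.
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    model.fit(xtrain, ytrain)
    predicted = model.predict(xtest)
    return accuracy_score(ytest, predicted), len(xtrain), len(xtest)


if __name__ == "__main__":
    accuracy, n_train, n_test = train_and_score(toy_crime_frame())
    print(f"train rows: {n_train}, test rows: {n_test}, accuracy: {accuracy:.4f}")
```

The 75/25 split used here falls within the 70% to 80% training range described under Random sampling. The accuracy printed for the toy rows says nothing about the accuracy on the real data.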

Fig 2: Comparison of Random Forest and LightGBM

Predictive Modelling

Preprocessing includes dropping rows with missing values and replacing any value that is infinity. String variables are converted to numeric variables so that further training can be performed.

The model is trained with the algorithms listed in the table after partitioning the dataset into a training set and a testing set. The accuracy is measured using the accuracy score function imported from sklearn metrics. The accuracy is given in the table below.

From the results in the table, the best algorithm for predictive modelling is LightGBM, which outperforms the other algorithms with 0.9688 accuracy.

The least suitable would be SVM. There is no need to use the other algorithms for further prediction on unseen data.

Algorithm        Accuracy
LightGBM         0.9688
Random Forest    0.7772
K Neighbours     0.7173
Gaussian NB      0.646
Multinomial NB   0.456
Bernoulli NB     0.313
SVC              0.313
Decision Tree    0.586

Crime visualization

This section deals with the analysis conducted on the dataset and plots it in different charts such as bar charts and pie charts.
The analyses performed were:

1. Number of criminal offences of each type in the country.
2. Ratio of arrests made.
3. Crimes committed across regions.
4. Information on major crimes in the area.

Fig 3: Types of crimes committed

The crimes that happened most in the city are shown in this graph. The x coordinate shows the crime types that have been committed and the y coordinate represents the number of crimes that have been committed.
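
A bar chart like Fig 3 could be produced along the following lines. This is only an illustrative sketch: the sample crime labels are made up, the helper names count_by_category and plot_counts are hypothetical, the counting uses Python's standard collections.Counter rather than whatever the authors used, and matplotlib is assumed to be installed.

```python
from collections import Counter


def count_by_category(values):
    """Return (category, count) pairs sorted from most to least frequent."""
    return Counter(values).most_common()


def plot_counts(pairs, xlabel, ylabel, path):
    """Save a bar chart of the counts; matplotlib is imported only when plotting."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    labels, counts = zip(*pairs)
    fig, ax = plt.subplots()
    ax.bar(labels, counts)
    ax.set_xlabel(xlabel)  # x coordinate: crime types
    ax.set_ylabel(ylabel)  # y coordinate: number of crimes
    fig.savefig(path)
    plt.close(fig)


if __name__ == "__main__":
    # Made-up sample values standing in for the crime type column.
    crimes = ["THEFT", "BATTERY", "THEFT", "ASSAULT", "THEFT", "BATTERY"]
    pairs = count_by_category(crimes)
    print(pairs)
    plot_counts(pairs, "Crime type", "Number of crimes", "crime_types.png")
```

The same two helpers would apply to the other charts by passing a different column, for example the day of the week or the location.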


Fig 4: Crimes occurred per day

The days on which more crimes occurred are shown in Fig 4. It is visible that Friday has a high rate of occurrence.

Fig 5: Action of crime

The number of convicted persons in the city is shown in the graph above. The x coordinate indicates whether an arrest was made or not. The y coordinate indicates the number of crimes.

Fig 6: Crimes occurred in location

Frequently occurring crimes are visualized in the above chart. The highest number of crimes occurred in Northern and the lowest in Richmond.

V. CONCLUSION AND FUTURE ENHANCEMENT

With the assistance of machine learning, it has become simple to discover relationships and patterns among different data. This work revolves around predicting the type of crime that may occur, given knowledge of where it has occurred. Using the principles of machine learning, we constructed a model using a training dataset that had been cleaned and transformed. The model predicts the type of crime with an accuracy of 0.9688. Data visualization assists in analysing the dataset. The graphs used are bar charts and pie charts, each of which has its own qualities. We generated several graphs and discovered interesting statistics that helped to explain the crime data and to identify factors that can help to protect society. A limitation of LightGBM is that it has a narrow user base and is changing fast. In future this can be overcome by using XGBoost.

REFERENCES

[1] Suhong Kim, Param Joshi, Paminder Singh Kalsi and Pooya Taheri, "Crime Analysis Through Machine Learning", Fraser International College, Simon Fraser University, British Columbia, Canada. IEEE 2018.
[2] Chhaya Chauhan, Smriti Sehgal, "A Review: Crime Analysis Using Data Mining Techniques and Algorithms", Amity University, Uttar Pradesh, India. IEEE 2018.
[3] S. Sivaranjani, Dr. S. Sivakumari, Aasha M., "Crime Prediction and Forecasting in Tamilnadu Using Clustering Approaches", Avinashilingam University, Coimbatore, India. IEEE 2016.
[4] Sunil Yadav, Meet Timbadia, Ajit Yadav, Rohit Vishwakarma and Nikhilesh Yadav, "Crime Pattern Detection, Analysis and Prediction", University of Mumbai, Shree L.R Tiwari College of Engineering, Thane, India. IEEE 2017.
[5] Romika Yadav, Savita Kumari Sheoron, "Crime Prediction Using Auto Regression Techniques for Time Series Data", Indira Gandhi University Meerpur, Rewari –
[6] Nafiz Mahmud, Khalid Ibn Zinnah, Yeasin Ar Rahman, Nasim Ahmed, "Crimecast: A Prediction and Strategy Direction Service", Chittagong University of Engineering & Technology, Chittagong-4349, Bangladesh. IEEE 2016.
[7] B. Sivanagaleela, S. Rajesh, "Crime Analysis and Prediction Using Fuzzy C-Means Algorithm", V. R. Siddhartha Engineering College, Vijayawada, Andhra Pradesh. IEEE 2016.
[8] Ayidh Alqahtani, Ajwani Garima, Ahmad Alaiad, "Crime Analysis in Chicago City", University of Maryland Baltimore County, Baltimore, United States. IEEE 2019.
[9] Dr. J. Kiran, Kaishveen K., "Prediction Analysis of Crime in India Using a Hybrid Clustering Approach", Guru Nanak Dev Engineering College, Ludhiana. IEEE 2018.
[10] Purushottama Rao K., Koneru A., Naga Raju D. (2019) OEFC Algorithm—Sentiment Analysis on Goods and Service Tax System in India. In: Mallick P., Balas V., Bhoi A., Zobaa A. (eds) Cognitive Informatics and Soft Computing. Advances in Intelligent Systems and Computing, vol 768. Springer, Singapore.
[11] K. Lavanya, L. S. S. Reddy and B. Eswara Reddy, "Modelling of Missing Data Imputation using Additive LASSO Regression Model in Microsoft Azure", Journal of Engineering and Applied Sciences, 2018, Vol 13, Special Issue 8, pp: 6324-6334. (SCOPUS)
[12] Rama Devi Burri, Ram Burri, Ramesh Reddy Bojja, Srinivasarao Buraga, "Insurance Claim Analysis using Machine Learning Algorithms", International Journal of Innovative Technology and Exploring Engineering (IJITEE), Volume-8, Issue-6S4, April-2019, ISSN: 2278-3075.
