Using Big Data Analytics For Developing Crime Predictive Model


See discussions, stats, and author profiles for this publication at:
https://www.researchgate.net/publication/302026832

Using Big Data Analytics for developing Crime Predictive Model

Conference Paper · January 2016


2 authors:

Tirthraj Chauhan, Darshan Institute of Engineering and Technology
Rajanikanth Aluvalu, Vardhaman College of Engineering



Proceedings of RK University’s First International Conference on Research & Entrepreneurship (Jan. 5th & Jan. 6th, 2016)

ISBN: 978-93-5254-061-7 (Proceedings available for download at rku.ac.in/icre)

RK University’s First International Conference on Research & Entrepreneurship (ICRE 2016)

Using Big Data Analytics for developing Crime Predictive Model

Tirthraj Chauhan 1,*, Rajanikanth Aluvalu 2

1 School of Science, RK University, Rajkot-Bhavnagar Highway, Rajkot-360020, Gujarat, India.
2 Associate Professor, Dept. of Computer Engineering, RK University, Rajkot, India.

*Corresponding author: Tirthraj Chauhan (tirthrajchauhan141290@gmail.com)

ABSTRACT

In this growing field of technology, the rate of cyber-crime is increasing and is challenging the
capabilities of investigators. The volume of crime-related data, most of it digital, has also grown,
and this data can no longer be handled efficiently with traditional analysis techniques. It is
therefore beneficial to apply Big Data Analytics to such large data sets. In the first phase, the
collected data is distributed over geographic locations and clusters are created from it. In the
second phase, the created clusters are analyzed using Big Data Analytics. Finally, the analyzed
clusters are given to an Artificial Neural Network, which produces a prediction pattern. Security
authorities can use that prediction pattern to allocate resources in a way that helps reduce crime.

SUMMARY

Using Big Data Analytics to analyze crime-related data and develop a crime predictive model.

Keywords: Crime Mapping, R tool, HDFS,

INTRODUCTION

Crime that increases day by day is a major problem facing human society. Crime occurs when the
personal space or work space of an offender and a target intersect at a single point (1). The target
may be a single person, a group of people or a territory. Crime may be accidental or planned.
Accidental crime is unfortunate, occurs unexpectedly and can happen anywhere; for example, a group of
people may fight with others over a small matter and harm people who have no connection with that
matter. Planned crime is committed intentionally: the person intending to commit the crime first
researches the target or target area and studies it in order to carry out the crime. Secluded places
where police patrolling is sparse have a higher chance of crime (1).

Crime mapping is used to analyze, map and visualize crime incidents or crime patterns in order to
predict where crime will occur. It thus helps security agencies and the police to allocate their
resources for preventing crime (2). Earlier, crime mapping could be done only by the few people who
had special tools. Nowadays both scholars and practitioners are able to map crime using available
spatial crime data and advanced technology. Crime mapping is mainly carried out to reduce crime in
society by identifying hotspots (places where crime occurs at a higher rate) (3).

Earlier, crime data consisted mostly of police complaints, newspaper reports and articles, available
in handwritten or printed form. With technological development, crime data is now available in both
hard copy and soft copy. In the past the crime rate was lower, so the volume of data generated about
criminal activity was also low, and traditional data analysis techniques were sufficient to analyze
that data and predict crime. Past data on criminal activity plays a vital role in mapping crime and
predicting the places where crime can occur (3). Even so, analyzing that data with traditional data
mining techniques was a tedious and time-consuming task, although the volume was small. Today the
data generated is vast because of the increased crime rate, and traditional data analysis techniques
cannot handle it. This vast data is Big Data, which can be handled with the help of Big Data
Analytics (4). Digital data may be structured, semi-structured or unstructured. Most of the digital
data analyzed so far for predicting crime has been structured (4), that is, data arranged in tabular
form with rows and columns. Historical data helps to predict volatile places, or hotspots: after
applying data mining techniques such as clustering and classification, the places with a higher
chance of crime are identified and police resources can be allocated there. The use of the internet
is increasing rapidly, and it also gives criminals a means of communicating to carry out their
plans. The resulting data is huge and mostly semi-structured or unstructured, and it can be analyzed
using clustering for Big Data (5). Traditional data mining techniques are not well suited to
analyzing such large volumes of semi-structured or unstructured data; Big Data Analytics is used for
that purpose.

Data generation is increasing exponentially, and traditional infrastructure is largely incapable of
handling such vast data. Big Data Analytics can handle this data, including unstructured and
semi-structured data (6). The input given to Hadoop may be semi-structured or unstructured, but the
output it generates is structured (9). The mapper and reducer will contain the prediction algorithm,
and MapReduce is used to process the data and produce results in half the time taken by traditional
data mining methodology.
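
The map and reduce roles described above can be illustrated with a minimal single-process sketch in
Python. It does not use Hadoop and does not contain the prediction algorithm; it only shows how
loosely formatted incident lines might be turned into a structured table of counts per area. The
record layout, the area names and the function names (map_record, shuffle, reduce_counts, run_job)
are assumptions made for illustration and do not come from the proposed system.

```python
from collections import defaultdict

# Hypothetical semi-structured input lines (illustrative only, not real data).
RAW_RECORDS = [
    "2015-03-01 | theft | area=Rajkot-East",
    "2015-03-01 | fraud | area=Rajkot-West",
    "2015-03-02 | theft | area=Rajkot-East",
    "malformed line without an area",
    "2015-03-04 | arson | area=Rajkot-East",
]


def map_record(line):
    """Map step: emit an (area, 1) pair for a parsable record, nothing otherwise."""
    for field in line.split("|"):
        field = field.strip()
        if field.startswith("area="):
            yield field[len("area="):], 1


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework does between the steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: collapse all values for one key into a single structured row."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on an in-memory list of records."""
    pairs = (pair for line in records for pair in map_record(line))
    return dict(reduce_counts(key, values) for key, values in shuffle(pairs).items())


incident_counts = run_job(RAW_RECORDS)
```

In a real Hadoop job the map and reduce functions would run on different nodes over blocks stored
in HDFS; here they run in sequence only to show the shape of the data at each step.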

The R tool is used to distribute the data geographically. It is capable of generating geospatial
representations of geographically distributed data. Different packages are available for the tool
and need to be installed in order to perform the data distribution. The tool also provides data
analysis and different visualizations of the distributed data.

An Artificial Neural Network is a collection of processing neurons or nodes (processing elements)
that produces predictions based on the available or clustered data. The prediction accuracy of an
Artificial Neural Network is normally very high compared with other systems such as Fuzzy Time
Series or Bayesian Networks (11). The main disadvantage of an Artificial Neural Network is that it
takes time to learn how to implement one.

RELATED STUDY

Crime can be considered as an “act against the law which harms the innocent peoples and results in
acquiring punishments from the legal authorities like law enforcement or judiciary authority of
government”. The main types of crime are traffic violations, fraud, sex crime, arson, drug offenses,
violent crime, murder, robbery, damage, theft and cyber-crime (1). Past data relevant to criminal
activity is helpful for predicting crime hotspots.

Crime data analysis can be done using data mining techniques with tools such as Weka, RapidMiner, R,
KNIME, Orange and Tanagra. Most crime data analysis is done using the k-means clustering technique.
With the development of technology, criminals use technological equipment to commit crime, and the
resulting digital data is used to analyze crime (3). The analysis is useful in predicting hotspots.
Again, the data used for analysis and prediction with data mining is structured; when the data is
unstructured or semi-structured, data mining techniques become time consuming. In (4), the obtained
crime data was prepared for the RapidMiner tool and k-means clustering was performed on it to obtain
clusters, which were then analyzed to predict crime.
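
For readers unfamiliar with k-means, the following is a minimal sketch of the standard (Lloyd's)
algorithm in plain Python. It is not the RapidMiner workflow of (4); the incident coordinates, the
initial centroids and the function name kmeans are assumptions chosen only to make the assignment
and update steps visible.

```python
import math


def kmeans(points, centroids, max_iterations=100):
    """Plain Lloyd's k-means on 2-D points with caller-supplied initial centroids."""
    centroids = list(centroids)
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, assignment


# Hypothetical incident locations on an arbitrary grid (illustrative only).
incident_points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
final_centroids, labels = kmeans(incident_points, centroids=[(0.0, 0.0), (6.0, 6.0)])
```

On this toy input the first three points fall into one cluster and the last three into the other;
on real data the result depends on the number of clusters and on the initial centroids.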

Other data mining techniques can also be applied to analyze crime data and predict hotspots. These
mainly include classification, the aK-means clustering algorithm and the expectation-maximization
algorithm. Applying the aK-means clustering algorithm may give better results than applying k-means
clustering alone (4). The k-means algorithm can also be implemented in Big Data Analytics (5) (7).
These are traditional and time-consuming techniques for mapping crime, as they mostly require
structured data.

Distributing crime data geographically is also a tedious task, but it can now be done using tools
such as R. Geospatial packages installed and run with R make it much easier to distribute the data
over geographic areas. The crime data can then be clustered using an appropriate technique. In Big
Data Analytics, GA (Genetic Algorithm) based clustering can also be used for this purpose (8).

The Apache Software Foundation provides a powerful tool, Hadoop, for storing and analyzing data
separately in different clusters. Each cluster's data is processed separately, with MapReduce used
to produce the results and HDFS (Hadoop Distributed File System) used for storage. Because the data
of different clusters is processed simultaneously, processing is very fast compared with traditional
storage and processing tools. HDFS is the distributed file system component of Hadoop (9). In HDFS,
metadata is handled by NameNode servers and application data is handled by DataNode servers (10).
MapReduce makes it possible to process semi-structured as well as unstructured data, so the Hadoop
platform is now used for large data sets that contain data in all of these formats. Big Data
Analytics is used to overcome problems such as growth in the size of the data to be stored and
analyzed, varied data recording methods and infrastructure, and the complexity and time consumption
of the analysis.

Artificial Neural Networks are used to produce a prediction pattern from analyzed or given data.
Other soft computing techniques such as Fuzzy Time Series and Bayesian Networks are also used for
prediction, but Artificial Neural Networks are more powerful because they provide higher accuracy
than these techniques. Bayesian Networks depend entirely on the selection of parameters, and in
Fuzzy Time Series the results are affected by various factors. The drawback of Artificial Neural
Networks is the effort needed to learn how to implement them (11) (12).

The field of digital forensics is also moving toward analyzing crime in order to predict it, which
helps in crime mapping and ultimately in identifying the places where crime can occur. Forensics
uses some data mining techniques and is trying to apply Big Data Analytics to crime mapping. Digital
forensics is the branch of computer science and engineering that deals mainly with collecting
digital evidence, which can be obtained from digital devices such as smartphones, computers,
laptops, tablets and palmtops. The major problem here too is the huge amount of data generated. It
cannot be handled with the existing infrastructure, and that is the reason for turning to Big Data
Analytics (13).

RESEARCH OBJECTIVE

Because of the increased crime rate, the vast amount of crime data generated cannot be analyzed
efficiently by traditional data analysis techniques. The objective of this research is to analyze
this data using Big Data Analytics and provide the analyzed clusters to an Artificial Neural
Network, which in turn produces the crime prediction pattern. The police department can use this
prediction pattern to allocate its resources in order to reduce the crime rate.

PROPOSED WORK

We propose to perform crime mapping using the R tool, Big Data Analytics and an Artificial Neural
Network. The work consists of three main phases: distributing the data geographically and creating
clusters, analyzing the created clusters, and predicting crime.

Distributing the data geographically is the first phase, in which the available crime data is
distributed over geographical areas. This can be implemented using the R tool with geospatial
packages. Clusters are then created after centroids are allocated. The KDE (Kernel Density
Estimation) technique will be used for estimating or creating clusters on the basis of the mapped
data. The created clusters are then used by the cluster analysis phase.
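
As an illustration of how kernel density estimation can highlight a hotspot, the sketch below
evaluates a two-dimensional Gaussian kernel density on an integer grid and returns the grid point
with the highest density. It is written in Python rather than R, and the incident coordinates, the
bandwidth, the grid size and the function names are assumptions for illustration; the kernel and
bandwidth to be used in the proposed work are not specified here.

```python
import math


def gaussian_kde_2d(incidents, x, y, bandwidth):
    """Estimate incident density at (x, y) with an isotropic Gaussian kernel."""
    norm = 1.0 / (len(incidents) * 2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for ix, iy in incidents:
        sq_dist = (x - ix) ** 2 + (y - iy) ** 2
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return norm * total


def densest_cell(incidents, grid_size, bandwidth):
    """Evaluate the density at integer grid points and return the densest one."""
    cells = [(gx, gy) for gx in range(grid_size) for gy in range(grid_size)]
    return max(cells, key=lambda c: gaussian_kde_2d(incidents, c[0], c[1], bandwidth))


# Hypothetical incident coordinates: four close together, one isolated.
incidents = [(2.0, 2.0), (2.2, 1.9), (1.8, 2.1), (2.1, 2.3), (7.0, 7.0)]
hotspot = densest_cell(incidents, grid_size=10, bandwidth=1.0)
```

A larger bandwidth smooths the surface and merges nearby groups of incidents, while a smaller one
produces sharper, more numerous peaks, so the choice of bandwidth strongly affects which areas are
reported as hotspots.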

The Hadoop platform is used for cluster analysis, which is the second phase. Clusters created in
the first phase are used as input to this phase, and a suitable clustering algorithm is applied for
the analysis. Because Hadoop can process different clusters in parallel, processing will be fast
compared with traditional processing, so the output is obtained sooner than with the usual data
mining cluster analysis process. The Gamma test is used for cluster analysis in this phase.

As shown in Fig. 1, in the third and final phase the analyzed data from cluster analysis, i.e., the
clusters identified by Hadoop, is used as input to the Artificial Neural Network for crime
forecasting. It also uses the regression tree prediction specification and classification. The
output of the Artificial Neural Network is a pattern that predicts the crime rate at different
places, or the places where the chance of crime is high. An Artificial Neural Network is selected to
predict the pattern because it performs better than other networks in producing the pattern and also
takes less time to produce it.
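
To make the role of the Artificial Neural Network concrete, the following is a minimal sketch of a
feed-forward network with one hidden layer, trained by backpropagation in plain Python. It is not
the network of the proposed model: the two input features (a normalized incident density and a
night-time share per cluster), the labels, the layer sizes, the learning rate and the class name
TinyNetwork are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNetwork:
    """Feed-forward network: 2 inputs -> `hidden` sigmoid units -> 1 sigmoid output."""

    def __init__(self, hidden=3, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, out

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr):
        """One stochastic gradient step on squared error via backpropagation."""
        h, out = self.forward(x)
        delta_out = (out - target) * out * (1.0 - out)
        for j, hj in enumerate(h):
            # Hidden-layer error is computed before w2[j] is updated.
            delta_h = delta_out * self.w2[j] * hj * (1.0 - hj)
            self.w2[j] -= lr * delta_out * hj
            self.w1[j][0] -= lr * delta_h * x[0]
            self.w1[j][1] -= lr * delta_h * x[1]
            self.b1[j] -= lr * delta_h
        self.b2 -= lr * delta_out


def mean_squared_error(net, samples):
    return sum((net.predict(x) - t) ** 2 for x, t in samples) / len(samples)


# Hypothetical per-cluster features with label 1.0 = high-risk, 0.0 = low-risk.
SAMPLES = [
    ((0.9, 0.8), 1.0), ((0.8, 0.9), 1.0), ((0.7, 0.7), 1.0),
    ((0.2, 0.1), 0.0), ((0.1, 0.3), 0.0), ((0.3, 0.2), 0.0),
]

net = TinyNetwork()
initial_loss = mean_squared_error(net, SAMPLES)
for _ in range(5000):
    for features, label in SAMPLES:
        net.train_step(features, label, lr=0.5)
final_loss = mean_squared_error(net, SAMPLES)
```

After training, net.predict((0.85, 0.85)) gives a higher score than net.predict((0.15, 0.15)), which
is the kind of ranking of places by risk that the prediction pattern is meant to provide. A
practical model would need real features, a held-out test set and a comparison against simpler
baselines.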

CONCLUSION

The proposed work focuses on crime prediction through crime mapping of recorded data using the
latest technology. The model helps security authorities to reduce crime and also helps them in the
investigation of crimes. Using Big Data Analytics with a clustering approach reduces investigation
time and helps in retrieving hidden information through correlation and categorization.

FIGURES

Fig. 1 Crime Prediction process Model

REFERENCES

[1]. Saoumya, Anurag Singh Baghel, A Predictive Model For Mapping Crime Using Big Data Analytics,
IJRET, eISSN: 2319-1163.

[2]. Vikas Grover, Richard Adderley, Max Bramer, Review of Current Crime Prediction Techniques.

[3]. Lenin Mookiah, William Eberle, Ambareen Siraj, Survey of Crime Analysis and Prediction,
Proceedings of the Twenty-Eighth International Florida Artificial Intelligence Research Society
Conference, 2015.

[4]. Renuka Nagpal, Rajni Sehgal, Crime Analysis using K-Means Clustering, International Journal of
Computer Applications (0975 – 8887), Volume 83, No. 4, December 2013.

[5]. Mugdha Jain, Chakradhar Varma, Adapting K-means for Clustering in Big Data, International
Journal of Computer Applications, Volume 101, No. 1, September 2014.

[6]. Dr. A. Bharthi, R. Shilpa, A Survey On Crime Data Analysis of Data Mining using Clustering
Techniques, International Journal of Advance Research in Computer Science and Management Studies,
Volume 2, Issue 8, August 2014.

[7]. Keshav Sanse, Meena Sharma, Clustering methods for Big data analysis, IJARCET, Volume 4, Issue 3,
March 2015.

[8]. Nivranshu Hans, Sana Mahajan, SN Omkar, Big Data Clustering Using Genetic Algorithm On
Hadoop Mapreduce, International Journal Of Scientific & Technology Research, Volume 4, Issue 4, April
2015.

[9]. Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler, The Hadoop Distributed File
System.

[10]. Shalini Jain, Satendra Sonare, Big Data Analysis Using HDFS, C-MEANS and Map reduce,
International Journal Of Advanced Research in Computer Science and Software Engineering, Volume 5,
Issue 4, 2015.

[11]. Setu Kumar Chaturvedi, Nikhil Dubey, A Survey Paper on Crime Prediction Technique Using Data
Mining, Int. Journal Of Engineering Research and Applications, Vol. 4, Issue 3 (Version 1), March
2014.

[12]. Ms. Sonali B. Maind, Ms. Priyanka Wankar, Research Paper on Basics Of Artificial Neural Network,
International Journal on Recent and Innovation Trends in Computing and Communication, Volume 2,
Issue 1.

[13]. Sindhu K. K., Dr. B. B. Meshram, A Digital Forensic Tool for Cyber-Crime Data Mining, An
International Journal (ESTIJ), Vol. 2, No. 1, 2012.
