
• Crimes are of different types – robbery, murder, rape, assault, battery, false imprisonment, kidnapping, and homicide. Since crime is increasing, cases need to be solved much faster. Crime activity has been increasing at a rapid rate, and it is the responsibility of the police department to control and reduce it.

• The aim of this project is to predict crime using the features present in the dataset, which is extracted from official sites. With the help of a machine learning algorithm, using Python as the core, we can predict the type of crime that will occur in a particular area.

Chapter 2
LITERATURE REVIEW
2.1 CRIME RATE PREDICTION
Abstract
Crime is one of the biggest and most dominant problems in our society, and a huge number of crimes are committed every day. Here the dataset consists of dates and the crime rate recorded in the corresponding years. In this project the crime rate is based on robbery alone. We use a linear regression algorithm to predict the percentage crime rate in the future from previous data. The date is given as input to the algorithm, and the output is the percentage crime rate for that particular year.
Keywords: Crime rate, number of crimes, regression algorithm, Machine learning.
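The abstract's approach, regressing the yearly crime-rate percentage on the date, can be sketched as below. The year/rate pairs are invented for illustration; real figures would come from the dataset described in this section.

```python
# Minimal sketch: fit crime-rate percentage against year with simple linear
# regression (ordinary least squares, closed form). The data is illustrative,
# not real crime statistics.

def fit_line(xs, ys):
    """Return slope and intercept minimising squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
            sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

years = [2014, 2015, 2016, 2017, 2018]     # input: date (year)
robbery_rate = [4.1, 4.4, 4.8, 5.1, 5.5]   # output: crime-rate percentage

slope, intercept = fit_line(years, robbery_rate)
predicted_2019 = slope * 2019 + intercept
print(round(predicted_2019, 2))
```

Given a new date, the fitted line extrapolates the expected crime-rate percentage for that year, exactly as the abstract describes.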
1. INTRODUCTION
Crime is a significant threat to humankind. Many crimes happen at regular intervals of time, and crime is increasing and spreading at a fast and vast rate, from small villages and towns to big cities. Crimes are of different types – robbery, murder, rape, assault, battery, false imprisonment, kidnapping, and homicide. Since crime is increasing, cases need to be solved much faster. Crime activity has been increasing at a rapid rate, and it is the responsibility of the police department to control and reduce it. Crime prediction and criminal identification are major problems for the police department because a tremendous amount of crime data exists, and technology is needed to make case solving faster. Many documented cases show that machine learning and data science can make this work easier and faster. The aim of this project is to predict crime using the features present in the dataset, which is extracted from official sites. With the help of a machine learning algorithm, using Python as the core, we can predict the type of crime that will occur in a particular area, together with its per-capita crime rate. The objective is to train a model for prediction: training is done on a training dataset, which is validated using a test dataset, and Multiple Linear Regression (MLR) is used for crime prediction. The dataset is visualized to analyze the crimes that occurred in a particular year, based on population and number of crimes. This work helps law enforcement agencies predict and detect the per-capita crime rate in an area and thus reduce the crime rate.

METHODOLOGY:

Initially, the dataset is prepared manually. After identifying the relationships and visualizing the data, we create a regression model for forecasting the per-capita crime rate. For this we use a Multiple Linear Regression model. Other models, such as simple Linear Regression and Logistic Regression, were also tested, but Multiple Linear Regression produced the minimal error while training. This regression model predicts the per-capita crime rate expected in the future by taking different parameters as input.

Logistic Regression: In statistics, the logistic model (or logit model) is used to model the probability of a certain class or event, such as pass/fail, win/lose, alive/dead, or healthy/sick. Logistic regression is a statistical model that in its basic form uses a logistic function to model a binary dependent variable, although many more complex extensions exist. In regression analysis, logistic regression (or logit regression) estimates the parameters of a logistic model (a form of binary regression). Mathematically, a binary logistic model has a dependent variable with two possible values, such as pass/fail, represented by an indicator variable whose two values are labeled "0" and "1".

KNN: In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a nonparametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space, and the output depends on whether k-NN is used for classification or regression. In k-NN classification, the output is a class membership: an object is classified by a plurality vote of its neighbors, being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, the object is simply assigned to the class of its single nearest neighbor. In k-NN regression, the output is the property value for the object, computed as the average of the values of its k nearest neighbors.

Multiple linear regression (MLR), also known simply as multiple regression, is a statistical technique that uses several explanatory variables to predict the outcome of a response variable. The goal of MLR is to model the linear relationship between the explanatory (independent) variables and the response (dependent) variable. So far, we have seen simple linear regression, where a single predictor variable X is used to model the response variable Y. In many applications, more than one factor influences the response; multiple regression models describe how a single response variable Y depends linearly on a number of predictor variables. In essence, multiple regression is the extension of ordinary least-squares (OLS) regression to more than one explanatory variable.
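As a hedged illustration of the Multiple Linear Regression step described above, the sketch below fits per-capita crime against two assumed predictors (population and number of crimes) by solving the OLS normal equations directly. All data values and the feature choice are placeholders, not figures from the project's dataset.

```python
# Sketch of MLR via the normal equations X'X b = X'y, with a small
# Gaussian-elimination solver so no external libraries are needed.

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_mlr(rows, ys):
    """Ordinary least squares with an intercept term."""
    X = [[1.0] + list(r) for r in rows]                # prepend intercept column
    n = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(len(X))) for b in range(n)]
           for a in range(n)]
    xty = [sum(X[i][a] * ys[i] for i in range(len(X))) for a in range(n)]
    return solve(xtx, xty)

# Illustrative features: (population in lakhs, number of reported crimes).
features = [(10, 420), (12, 510), (15, 640), (18, 700), (22, 880)]
per_capita = [4.2, 4.3, 4.3, 3.9, 4.0]
coeffs = fit_mlr(features, per_capita)
print(len(coeffs))  # intercept plus one weight per feature
```

A new (population, crimes) pair would then be predicted as `coeffs[0] + coeffs[1]*pop + coeffs[2]*crimes`, mirroring how the trained model forecasts the per-capita crime rate.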

RESULTS
The Support Vector Regression model classifies the test data and predicts the crime-rate value expected in the future. It uses the training set to train and fit the model; the test data is then transformed, and the model predicts the result. Using Django and AWS, we prepared the user interface. The figure below shows the home page, which takes the year, murders, attempts, cheating, etc. as input values in text boxes.

2.2 Crime forecasting: a machine learning and computer vision approach to crime prediction and prevention

Abstract

A crime is a deliberate act that can cause physical or psychological harm, as well as
property damage or loss, and can lead to punishment by a state or other authority
according to the severity of the crime. The number and forms of criminal activities are
increasing at an alarming rate, forcing agencies to develop efficient methods to take
preventive measures. In the current scenario of rapidly increasing crime, traditional
crime-solving techniques are unable to deliver results, being slow paced and less
efficient. Thus, if we can come up with ways to predict crime, in detail, before it
occurs, or come up with a "machine" that can assist police officers, it would lift the
burden of police and help in preventing crimes. To achieve this, we suggest including
machine learning (ML) and computer vision algorithms and techniques. In this paper,
we describe the results of certain cases where such approaches were used, and which
motivated us to pursue further research in this field. The main reason for the change in
crime detection and prevention lies in the before and after statistical observations of
the authorities using such techniques. The sole purpose of this study is to determine
how a combination of ML and computer vision can be used by law agencies or
authorities to detect, prevent, and solve crimes at a much more accurate and faster
rate. In summary, ML and computer vision techniques can bring about an evolution in
law agencies.
Introduction

Computer vision is a branch of artificial intelligence that trains the computer to


understand and comprehend the visual world, and by doing so, creates a sense of
understanding of a machine's surroundings. It mainly analyzes data of the
surroundings from a camera, and thus its applications are significant. It can be used
for face recognition, number plate recognition, augmented and mixed realities,
location determination, and identifying objects. Research is currently being conducted
on the formation of mathematical techniques to recover and make it possible for
computers to comprehend 3D images. Obtaining the 3D visuals of an object helps us
with object detection, pedestrian detection, face recognition, Eigenfaces active
appearance and 3D shape models, personal photo collections, instance recognition,
geometric alignment, large databases, location recognition, category recognition, bag
of words, part-based models, recognition with segmentation, intelligent photo editing,
context and scene understanding, and large image collection and learning, image
searches, recognition databases, and test sets. These are only basic applications, and
each category mentioned above can be further explored. In one work, VLFeat is introduced,
which is a library of computer vision algorithms that can be used to conduct fast
prototyping in computer vision research, thus enabling a tool to obtain computer
vision results much faster than anticipated. Considering face detection/human
recognition, human posture can also be recognized. Thus, computer vision is
extremely attractive for visualizing the world around us.

Machine learning (ML) is an application that provides a system with the ability to
learn and improve automatically from past experiences without being explicitly
programmed. After viewing the data, an exact pattern or information cannot always be determined. In such cases, ML is applied to interpret the exact pattern and information. ML pushes forward the idea that, by providing a machine with access to the right data, the machine can learn and solve both complex mathematical problems and some specific problems. In general, ML is categorized into two parts: (1) supervised ML and (2) unsupervised ML. In supervised learning, the machine is trained on the basis of a predefined set of training examples, which facilitates its capability to obtain precise and accurate conclusions when new data are given. In unsupervised learning, the machine is given a set of data and must find common patterns and relationships among the data on its own. Neural networks, which are important tools used in supervised learning, have been studied since the 1980s. One author suggested that different aspects are needed to obtain an exit from
nondeterministic polynomial (NP)-completeness, and architectural constraints are
insufficient. However, it was proved that NP-completeness problems can be extended
to neural networks using sigmoid functions. Although such research has attempted to
demonstrate the various aspects of new ML approaches, how accurate are the results?

Although various crimes and their underlying nature seem unpredictable, how unforeseeable are they really? The authors of one study pointed out that, as society and the economy evolve, new types of crime emerge, and the need for a prediction system has grown. Crime trend and prediction technologies, such as the Mahalanobis distance and a dynamic time warping technique, have been proposed, opening up the possibility of predicting crime and apprehending the actual culprit. As described in 1998, the United States National Institute of Justice
granted five grants for crime forecasting as an extension to crime mapping.
Applications of crime forecasting are currently being used by law enforcement in the
United States, the United Kingdom, the Netherlands, Germany, and Switzerland.
Nowadays, criminal intellect with the help of advances in technology is improving
with each passing year. Consequently, it has become necessary for us to provide the
police department and the government with the means of a new and powerful machine
(a set of programs) that can help them in their process of solving crimes. The main
aim of crime forecasting is to predict crimes before they occur, and thus, the
importance of using crime forecasting methods is extremely clear. Furthermore, the
prediction of crimes can sometimes be crucial because it may potentially save the life
of a victim, prevent lifelong trauma, and avoid damage to private property. It may
even be used to predict possible terrorist crimes and activities. Finally, if we
implement predictive policing with a considerable level of accuracy, governments can
apply other primary resources such as police manpower, detectives, and funds in other
fields of crime solving, thereby curbing the problem of crime with double the power.

In this paper, we aim to make an impact by using both ML algorithms and computer
vision methods to predict both the nature of a crime and possibly pinpoint a culprit.
Beforehand, we questioned whether the nature of the crime was predictable. Although
it might seem impossible from the outside, categorizing every aspect of a crime is
quite possible. We have all heard that every criminal has a motive. That is, if we use
motive as a judgment for the nature of a crime, we may be able to achieve a list of
ways in which crimes can be categorized. Herein, we discuss a theory where we
combine ML algorithms to act as a database for all recorded crimes in terms of
category, along with providing visual knowledge of the surroundings through
computer vision techniques, and using the knowledge of such data, we may predict a
crime before it occurs.
Present technologies used in crime detection and prediction

Crime forecasting refers to the basic process of predicting crimes before they occur.
Tools are needed to predict a crime before it occurs. Currently, there are tools used by
police to assist in specific tasks such as listening in on a suspect's phone call or using
a body cam to record some unusual illegal activity. Below we list some such tools to
better understand where they might stand with additional technological assistance.

One good way of tracking phones is the stingray, a new frontier in police surveillance that can pinpoint a cellphone's location by mimicking cellphone towers and broadcasting signals that trick cellphones within the vicinity into transmitting their location and other information. An argument against the use of stingrays in the United States is that it violates the Fourth Amendment. This technology is used in 23 states and in the District of Columbia. The authors of one study provide insight into how this is more than just a surveillance system, raising concerns about privacy violations. In addition, the Federal Communications Commission became involved and ultimately urged the manufacturer to meet two conditions in exchange for a grant: (1) "The marketing and sale of these devices shall be limited to federal, state, local public safety and law enforcement officials only" and (2) "State and local law enforcement agencies must advance coordinate with the FBI the acquisition and use of the equipment authorized under this authorization." Although its use is worthwhile, its implementation remains extremely controversial.

A very popular method that has been in practice since the inception of surveillance is "the stakeout". A stakeout is the most frequently practiced surveillance technique among police officers and is used to gather information on all types of suspects. The authors of one study discuss the importance of the stakeout by stating that police officers witness an extensive range of events about which they are required to write a report. Criminal acts are observed during stakeouts or patrols; weapons, drugs, and other evidence are observed during house searches; and officers describe their own behavior and that of the suspect during arrest. Stakeouts are extremely useful and are considered 100% reliable, with the police themselves observing the notable proceedings. However, are they actually 100% accurate? All officers are human, and all humans are subject to fatigue. The major objective of a stakeout is to observe wrongful activities. Is there a tool that can substitute for it? We will discuss this point herein.

Another way to conduct surveillance is by using drones, which help in various fields such as mapping cities, chasing suspects, investigating crime scenes and accidents, traffic management and flow, and search and rescue after a disaster. Legal issues regarding the use of drones and airspace distribution problems have been described. Legal issues include the privacy concerns raised by the public as the police gain increasing power and authority; airspace distribution raises concerns about how high a drone is allowed to go.

Other surveillance methods include face recognition, license plate recognition, and body cams. The authors of one study indicated that facial recognition can be used to obtain the profile of suspects and analyze it against different databases to obtain more information. Similarly, a license plate reader can be used to access data about a car possibly involved in a crime. Body cams may even see more than the human eye can, in the sense that they observe everything a police officer sees and record it; normally, when we see an object, we cannot recollect its complete image. The impact of body cams has been studied in terms of officer misconduct and domestic violence when the police are making an arrest, and body cams are now being worn by patrol officers. The authors also mentioned how protection against wrongful police practices is provided. The use of body cams does not stop there: another primary reason for having a body camera on at all times is to record the happenings in front of the wearer in the hope of capturing useful events during daily activities or important operations.

Although each of these methods is effective, one point they share in common is that
they all work individually, and while the police can use any of these approaches
individually or concurrently, having a machine that is able to incorporate the positive
aspects of all of these technologies would be highly beneficial.

ML techniques used in crime prediction

A comparative study was carried out between violent crime patterns from the Communities and Crime Unnormalized Dataset and actual crime statistical data using the open-source data mining software Waikato Environment for Knowledge Analysis (WEKA). Three algorithms, namely, linear regression, additive regression, and decision stump, were implemented using the same finite set of features on the communities and actual crime datasets. Test samples were randomly selected. The linear regression algorithm could handle randomness in the test samples to a certain extent and thus proved to be the best among the three selected algorithms. The scope of the project was to prove the efficiency and accuracy of ML algorithms in predicting violent crime patterns and in other applications, such as determining criminal hotspots, creating criminal profiles, and learning criminal trends.

WEKA also allows the integration of a new graphical interface called Knowledge Flow, which can be used as a substitute for the WEKA Explorer. It provides a more process-oriented view of data mining, in which individual learning components (represented by Java beans) are connected graphically to show a certain flow of information. The authors then describe another graphical interface called the Experimenter, which, as the name suggests, is designed to compare the performance of multiple learning schemes on multiple datasets.

The potential of applying predictive analysis to crime forecasting in an urban context has also been studied. Three types of crime, namely, home burglary, street robbery, and battery, were aggregated into grids of 200 m × 250 m and retrospectively analyzed. Based on the crime data of the previous 3 years, an ensemble model was applied to synthesize the results of logistic regression and neural network models in order to obtain fortnightly and monthly predictions for the year 2014. The predictions were evaluated based on the direct hit rate, precision, and prediction index. The results of the fortnightly predictions indicate that by applying a predictive analysis methodology to the data, it is possible to obtain accurate predictions. The authors concluded that the results can be improved remarkably by comparing the fortnightly predictions with the monthly predictions, with a separation between day and night.
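The direct hit rate and precision used to evaluate the fortnightly predictions above can be computed as set operations over grid cells. A minimal sketch, with invented flagged and actual cells:

```python
# Evaluating hotspot predictions over a grid: a cell is a (row, col) pair.
flagged = {(3, 4), (5, 1), (7, 2)}          # cells predicted as hotspots
actual = {(3, 4), (7, 2), (0, 0), (2, 2)}   # cells where crime occurred

hits = flagged & actual                      # correctly flagged cells
direct_hit_rate = len(hits) / len(actual)    # share of crime cells caught
precision = len(hits) / len(flagged)         # share of flags that were right
print(direct_hit_rate, precision)
```

With these toy sets, two of four crime cells are caught and two of three flags are correct, which is exactly the trade-off the study's evaluation measures.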

Crime prediction based on ML has also been investigated using crime data from Vancouver (Canada) covering the last 15 years. This machine-learning-based crime analysis involves the collection of data, data classification, identification of patterns, prediction, and visualization. K-nearest neighbor (KNN) and boosted decision tree algorithms were implemented to analyze the crime dataset. In that study, a total of 560,000 crime records from 2003 to 2018 were analyzed, and crime prediction accuracy between 39% and 44% was obtained. The accuracy was low for a prediction model, but the authors concluded that it can be improved by tuning both the algorithms and the crime data for specific applications.

An ML approach has been presented for the prediction of crime-related statistics in Philadelphia, United States. The problem was divided into three parts: determining whether a crime will occur, how many crimes will occur, and the most likely type of crime. Algorithms such as logistic regression, KNN, ordinal regression, and tree methods were used to train the datasets to obtain detailed quantitative crime predictions with greater significance. The authors also presented a crime prediction map showing different crime categories in different areas of Philadelphia for a particular time period, with different colors indicating each type of crime. Different types of crime, ranging from assault to cyber fraud, were included to match the general pattern of crime in Philadelphia over a particular interval of time. Their algorithm was able to predict whether a crime would occur with 69% accuracy, and the number of crimes (ranging from 1 to 32) with 47% accuracy.

Other authors analyzed a dataset consisting of several crimes and predicted the type of crime that may occur in the near future depending on various conditions. ML and data science techniques were used for crime prediction on a crime dataset from Chicago, United States. The dataset consists of information such as the crime location description, type of crime, date, time, and precise location coordinates. Different models, such as KNN classification, logistic regression, decision trees, random forests, a support vector machine (SVM), and Bayesian methods, were tested, and the most accurate model was used for training. KNN classification proved to be the best, with an accuracy of approximately 0.787. The authors also used different graphs to help in understanding the various characteristics of the Chicago crime dataset. The main purpose of that paper is to provide an idea of how ML can be used by law enforcement agencies to predict, detect, and solve crime at a much better rate, resulting in a reduction in crime.
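A minimal from-scratch version of the k-NN classification that performed best in the Chicago study might look as follows; the coordinates and labels are invented for illustration and are not drawn from the Chicago dataset.

```python
# k-NN classification: majority vote among the k nearest training points.
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify query by plurality vote of its k nearest neighbours."""
    neighbours = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# (latitude, longitude)-style points with a crime-type label (made up).
train = [
    ((41.88, -87.63), "THEFT"),
    ((41.87, -87.62), "THEFT"),
    ((41.95, -87.70), "ASSAULT"),
    ((41.96, -87.71), "ASSAULT"),
    ((41.89, -87.64), "THEFT"),
]
print(knn_predict(train, (41.885, -87.635), k=3))  # nearest three are thefts
```

Because the query point sits among the theft locations, the three nearest neighbours all vote "THEFT", which becomes the predicted crime type.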

A graphical-user-interface-based prediction of crime rates using an ML approach has also been presented. The main focus of that study was to investigate the machine-learning-based techniques with the best accuracy in predicting crime rates and to explore their applicability, with particular importance given to the dataset. Supervised ML techniques were used to analyze the dataset and carry out data validation, data cleaning, and data visualization. The results of the different supervised ML algorithms were compared. The proposed system consists of data collection, data preprocessing, construction of a predictive model, dataset training, dataset testing, and a comparison of algorithms, as shown in Fig. 1. The aim of that study was to prove the effectiveness and accuracy of an ML algorithm for predicting violent crimes.

Result:

After finding and understanding the various distinct methods used by the police for surveillance purposes, we determined the importance of each method. Each surveillance method can perform well on its own and produce satisfactory results, although only for one specific characteristic; that is, a stingray can help us only when the suspect is using a phone that is switched on, and it is only useful when the information regarding the stakeout location is correct. Based on this information, we can see how the ever-evolving technology has yet again produced a smart way to conduct surveillance. The introduction of deep learning, ML, and computer vision techniques has provided us with a new perspective on ways to conduct surveillance. This is an intelligent approach to surveillance because it tries to mimic a human approach, but it does so 24 h a day, 365 days a year, and once it has been taught how to do things, it does them in the same manner repeatedly.

Although we have discussed the aspects that ML and computer vision can achieve, what are these aspects essentially? This brings us to the main point of our paper, i.e., our proposed idea, which is to combine the strong points of stingrays, body cams, facial recognition, number plate recognition, and stakeouts. New features include core analytics, neural networks, heuristic engines, recursion processors, Bayesian networks, data acquisition, cryptographic algorithms, document processors, computational linguistics, voiceprint identification, natural language processing, gait analysis, biometric recognition, pattern mining, intel interpretation, threat detection, and threat classification. These new features are completely computer dependent and require human interaction only for development; once developed, they function without human interaction and free humans for other tasks. Let us understand the use of each function.

Chapter 3

EXISTING SYSTEM:

• After finding and understanding the various distinct methods used by the police for surveillance purposes, we determined the importance of each method.

• Each surveillance method can perform well on its own and produce satisfactory results, although only for one specific characteristic; that is, a stingray can help us only when the suspect is using a phone that is switched on.

• Thus, it is only useful when the information regarding the stakeout location is correct. Based on this information, we can see how the ever-evolving technology has yet again produced a smart way to conduct surveillance.

• The introduction of deep learning, ML, and computer vision techniques has provided us with a new perspective on ways to conduct surveillance.

PROBLEM STATEMENT

• Crimes are of different types – robbery, murder, rape, assault, battery, false imprisonment, kidnapping, and homicide. Since crime is increasing, there is a need to solve cases much faster. Crime activity has increased at a rapid rate, and it is the responsibility of the police department to control and reduce it.

• The aim of this project is to predict crime using the features present in the dataset, which is extracted from official sites. With the help of a machine learning algorithm, using Python as the core, we can predict the type of crime that will occur in a particular area.

Proposed system:

• Our proposed approach uses SVM filtering methods to detect crime-related posts in the social media dataset.

• There are four main phases. In the first phase, social media text posts that relate to crimes are extracted. In the second phase, we apply preprocessing techniques to clean the dataset.

• In the third phase, we calculate the TF-IDF values for each pre-processed post. Finally, an SVM-based filter is applied to remove unrelated data.

• A random forest classifier is then used to categorize the data.
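Phases two and three of the proposed pipeline (pre-processing and TF-IDF weighting) can be sketched as below. The posts are invented, and the SVM filter and random forest stages are omitted here; they would consume the TF-IDF vectors this step produces.

```python
# Phase 2 (cleaning) and phase 3 (TF-IDF) of the proposed pipeline,
# implemented from scratch on three invented social media posts.
import math
import re

posts = [
    "Robbery reported near the market last night",
    "Great food at the new market cafe",
    "Police investigating the robbery case downtown",
]

def preprocess(text):
    """Phase 2: lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

docs = [preprocess(p) for p in posts]

def tf_idf(term, doc, docs):
    """Phase 3: term frequency times inverse document frequency."""
    tf = doc.count(term) / len(doc)
    df = sum(term in d for d in docs)        # documents containing the term
    idf = math.log(len(docs) / df)
    return tf * idf

# A crime-related term outweighs a word that appears in every post.
print(tf_idf("robbery", docs[0], docs) > tf_idf("the", docs[0], docs))  # True
```

A word like "the" occurs in every post, so its IDF (and hence its weight) is zero, while distinctive crime terms such as "robbery" keep a positive weight, which is what lets the downstream SVM filter separate crime-related posts.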

Block diagram
Flow diagram
We can also use some free datasets available on the internet. Kaggle and the UCI Machine Learning Repository are the repositories used most often for building machine learning models. Kaggle is one of the most visited websites for practicing machine learning algorithms; it also hosts competitions in which people can participate and test their knowledge of machine learning.

2. Data pre-processing

Data pre-processing is one of the most important steps in machine learning and helps in building models more accurately. A common rule of thumb in machine learning is the 80/20 rule: a data scientist typically spends 80% of the time on data pre-processing and 20% on the actual analysis.

What is data pre-processing?

Data pre-processing is the process of cleaning raw data, i.e., data collected from the real world, and converting it into a clean dataset. In other words, whenever data is gathered from different sources, it is collected in a raw format that is not feasible for analysis. Therefore, certain steps are executed to convert the data into a clean dataset; this part of the process is called data pre-processing.

Why do we need it?

As we know, data pre-processing is the process of cleaning raw data into clean data so that it can be used to train a model. We therefore need data pre-processing to achieve good results from the applied model in machine learning and deep learning projects.

Most real-world data is messy; some of these types of data are:

1. Missing data: Missing data can be found when data is not created continuously or due to technical issues in the application (e.g., an IoT system).
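As a small illustration of handling the missing data described above, the sketch below fills gaps (represented as None) by mean imputation, one common pre-processing choice; the sensor-style readings are invented.

```python
# Mean imputation: replace missing entries with the mean of observed entries.
def impute_mean(values):
    """Return a copy of values with None replaced by the observed mean."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

readings = [12.0, None, 14.0, 13.0, None]      # gaps from a flaky IoT sensor
print(impute_mean(readings))                    # [12.0, 13.0, 14.0, 13.0, 13.0]
```

Mean imputation preserves the overall average of the column, though other strategies (dropping rows, median or interpolation fills) may suit a given crime dataset better.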
