Crime Analysis System

Crime Analysis System
1 INTRODUCTION
Crime is one of the biggest and dominating problem in our society and its prevention is an important.
task. Daily there are huge numbers of crimes committed frequently. This require keeping track of all
the crimes and maintaining a database for same which may be used for future reference. The current
problem faced are maintaining of proper dataset of crime and analyzing this data to help in predicting
and solving crimes in future. The objective of this project is to analyze dataset which consist of
numerous crimes and predicting the type of crime which may happen in future depending upon
various conditions. In this project, we will be using the technique of machine learning and data
science for crime prediction of India crime data set. The crime data is extracted from the official
portal of India police. It consists of crime information like location description, type of crime, date,
time, latitude, longitude. Before training of the model data preprocessing will be done following this
feature selection and scaling will be done so that accuracy obtain will be high. The K-Nearest
Neighbor (KNN) classification and various other algorithms will be tested for crime prediction and
one with better accuracy will be used for training. Visualization of dataset will be done in terms of
graphical representation of many cases for example at which time the criminal rates are high or at
which month the criminal activities are high. The soul purpose of this project is to give a jest idea of
how machine learning can be used by the law enforcement agencies to detect, predict and solve
crimes at a much faster rate and thus reduces the crime rate. It not restricted to India, this can be used
in other states or countries depending upon the availability of the dataset.
Abstract
Crime is one of the biggest and dominating problem in our society and its prevention is an
important. task. Daily there are huge numbers of crimes committed frequently. This require
keeping track of all the crimes and maintaining a database for same which may be used for
future reference. The current problem faced are maintaining of proper dataset of crime and
analyzing this data to help in predicting and solving crimes in future. The objective of this
project is to analyze dataset which consist of numerous crimes and predicting the type of
crime which may happen in future depending upon various conditions. In this project, we will
be using the technique of machine learning and data science for crime prediction of India
crime data set. The crime data is extracted from the official portal of India police. It consists
of crime information like location description, type of crime, date, time, latitude, longitude.
Before training of the model data pre-processing will be done following this feature selection
and scaling will be done so that accuracy obtain will be high. The K-Nearest Neighbor
(KNN) classification and various other algorithms will be tested for crime prediction and one
with better accuracy will be used for training.
Crimes are the significant threat to the humankind. There are many crimes that happens
regular interval of time. Perhaps it is increasing and spreading at a fast and vast rate. Crimes
happen from small village, town to big cities. Crimes are of different type – robbery, murder,
rape, assault, battery, false imprisonment, kidnapping, homicide. Since crimes are increasing
there is a need to solve the cases in a much faster way. The crime activities have been
increased at a faster rate and it is the responsibility of police department to control and reduce
the crime activities. Crime prediction and criminal identification are the major problems to
the police department as there are tremendous amount of crime data that exist. There is a
need of technology through which the case solving could be faster.
The above problem made me to go for a research about how can solving a crime case made
easier. Through many documentation and cases, it came out that machine learning and data
science can make the work easier and faster. The aim of this project is to make crime
prediction using the features present in the dataset. The dataset is extracted from the official
sites. With the help of machine learning algorithm, using python as core we can predict the
type of crime which will occur in a particular area. The objective would be to train a model
for prediction. The training would be done using the training data set which will be validated
using the test dataset. Building the model will be done using better algorithm depending upon
the accuracy. The Convolutional Neural Network (CNN) classification and other algorithm
will be used for crime prediction. Visualization of dataset is done to analyze the crimes which
may have occurred in the country. This work helps the law enforcement agencies to predict
and detect crimes in India with improved accuracy and thus reduces the crime rate.
Problem Statement
Crimes are the significant threat to the humankind. There are many crimes that happens
regular interval of time. Perhaps it is increasing and spreading at a fast and vast rate. Crimes
happen from small village, town to big cities. Crimes are of different type –robbery, murder,
rape, assault, battery, false imprisonment, kidnapping, homicide. Since crimes are increasing
there is a need to solve the cases in a much faster way. The crime activities have been
increased at a faster rate and it is the responsibility of police department to control and reduce
the crime activities. Crime prediction and criminal identification are the major problems to
the police department as there are tremendous amount of crime data that exist. There is a
need of technology through which the case solving could be faster.
Proposed Solution
Predictive modeling is the way of building a model that is capable of making predictions. The
process includes a machine learning algorithm that learns certain properties from a training
dataset in order to make those predictions. Predictive modeling can be divided further into
two areas: Regression and pattern classification. Regression models are based on the analysis
of relationships between variables and trends in order to make predictions about continuous
variables. In contrast to regression models, the task of pattern classification is to assign
discrete class labels to particular data value as output of a prediction. Example of a
classification model is -A pattern classification task in weather forecasting could be the
prediction of a sunny, rainy, or snowy day. In this project, we will be using the technique of
machine learning and data science for crime prediction of India crime data set. The crime
data is extracted from the official portal of India police. It consists of crime information like
location description, type of crime, date, time, latitude, longitude. Before training of the
model data pre-processing will be done following this feature selection and scaling will be
done so that accuracy obtain will be high. The Convolutional Neural Network (CNN)
classification and various other algorithms will be tested for crime prediction and one with
better accuracy will be used for training.
System Requirements:
Hardware Requirements:
 Processor - Pentium –IV
 Speed - 1.1 GHz

 Ram - 256 MB
 Hard Disk - 20 GB
 Key Board - Standard Windows Keyboard
 Mouse - Two or Three Button Mouse
 Monitor - SVGA
Software Requirements:
 Operating System - Windows XP
 Coding Language - Python

2 SYSTEM ANALYSIS
2.1 Predictive Modeling
Predictive modeling is the way of building a model that is capable of making predictions. The
process includes a machine learning algorithm that learns certain properties from a training
dataset in order to make those predictions. Predictive modeling can be divided further into
two areas: Regression and pattern classification. Regression models are based on the analysis
of relationships between variables and trends in order to make predictions about continuous
variables. In contrast to regression models, the task of pattern classification is to assign
discrete class labels to particular data value as output of a prediction.
Example of a classification model is - A pattern classification task in weather forecasting

could be the prediction of a sunny, rainy, or snowy day. Pattern classification tasks can be
divided into two parts, Supervised and unsupervised learning. In supervised learning, the
class labels in the dataset, which is used to build the classification model, are known. In a
supervised learning problem, we would know which training dataset has the particular output
which will be used to train so that prediction can be made for unseen data.
Types of Predictive Models Algorithms
Classification and Decision TreesA decision tree is an algorithm that uses a tree shaped graph
or model of decisions including chance event outcomes, costs, and utility. It is one way to
display an algorithm. Naive Bayes -In machine learning, naive Bayes classifiers are a family
of simple probabilistic classifiers based on applying Bayes theorem with independence
assumptions between the features. The technique constructs classifier models that assign class
labels to problem instances, represented as vectors of feature values, where the class labels
are drawn from some finite set.
Linear Regression – The analysis is a statistical process for estimating the relationships
among variables. Linear regression is an approach for modelling the relationship between a
scalar dependent variable Y and one or more explanatory variables denoted X. The case of
one explanatory variable is called simple linear regression. More than one variable is called
multivariate. Logistic Regression - In statistics, logistic regression, is a regression model
where the dependent variable is categorical or binary. Data Preprocessing This process
includes methods to remove any null values or infinite values which may affect the accuracy
of the system. The main steps includeFormatting, cleaning and sampling.Cleaning process is
used for removal or fixing of some missing data there may be data that are incomplete.
Sampling is the process where appropriate data are used which may reduce the running time
for the algorithm. Using python, the preprocessing is done.
2.2 Functional Diagram of Proposed Work
It can be divided into 4 parts:
1. Descriptive analysis on the Data
2. Data treatment (Missing value and outlier fixing)
3. Data Modelling
4. Estimation of performance
Prepare Data
1. In this step we need prepare data into right format for analysis
2.3 Data cleaning
Analyze and Transform Variables We may need to transform the variables using one of the
approaches
1. Normalization or standardization
2. Missing Value Treatment
Random Sampling (Train and Test)

• Training Sample: Model will be developed on this sample. 70% or 80% of the data goes
here.
• Test Sample: Model performances will be validated on this sample. 30% or 20% of the data
goes here
Model Selection
Based on the defined goal(s) (supervised or unsupervised) we have to select one of or
combinations of modeling techniques. Such as
• KNN Classification
• Logistic Regression
• Decision Trees
• Random Forest
• Support Vector Machine (SVM)
• Bayesian methods
Build/Develop/Train Models
 Validate the assumptions of the chosen algorithm
 Develop/Train Model on Training Sample, which is the available data(Population)
 Check Model performance - Error, Accuracy
Validate/Test Model
 Score and Predict using Test Sample
 Check Model Performance: Accuracy etc
Algorithm Explanation
CNNs enable analysis and extraction of the spatial information in images, building from fine
grained details into higher level structures. The temporal dimension can be added to these
networks by adding a third axis to the convolutional kernels. This work demonstrates how
CNNs can be used to interpret the output of Numerical Weather Prediction (NWP)
automatically to generate local forecasts.
The main outcomes of this work are:
• CNNs are able to provide a model to interpret numerical weather model fields directly and
to generate local weather forecasts.
• Class Activation Mapping (CAM) provides a valuable mechanism to assess the spatial and
temporal correlations of the different fields visually, helping to introspect and develop new
models.
• 3D convolutions can naturally incorporate the temporal component into neural networks,
significantly improving the accuracy of the results.
3 LITERATURE REVIEW
Complex relationships between these factors. With the development of high-performance
computers, the cost of these machine learning techniques was drastically reduced. They
achieved a great deal of success in many domains. In particular, deep learning, a relatively
recent development in artificial intelligence, has achieved impressive results with many types
of classification problems, ranging from speech to visual recognition. One domain that has
not received much attention within deep learning is crime prediction. Deep learning is well
suited to handle the temporal and spatial components of the problem as well as a large feature
set that can be constructed using relevant public datasets in addition to crime datasets. In
addition, the inherent complexity of the problem makes it suitable for deep learning.
Using neural networks reduces the need for extensive feature engineering found in some
previous work and allows for training over large datasets. In order to address the temporal
aspects of crime prediction, a Recurrent Neural Network (RNN) can be used. In a traditional
neural network, there is the assumption that all inputs and outputs are independent of one
another, but for crime this is clearly not the case. Crimes that happened yesterday will affect
what happens today. For example, if there is an increase in thefts in an area then it is possible
that the area is being targeted and will continue to have theft rates higher than normal for a
period of time.
RNNs apply the same weights to each element of a sequence while also having a memory of
past states of the sequence. This memory encodes the sequence up to the given point in time
(it is also called the hidden state). This means a given output at a time depends on the current
input at the time and the hidden state containing the history up until the time, allowing for
greater predictive power. For the spatial aspects of crime prediction, a Convolution Neural
Network (CNN) is fitted. CNNs are widely used and successful in image classification. For
this task, the image is input as a grid of pixels with three channels, corresponding to RGB
values.
For our purposes, we can think of a city grid with relevant data in each grid in terms of an
image. Each grid cell corresponds to a pixel, and each different feature in that cell
corresponds to a different channel. CNNs slide filters across an image to get local responses
from the image, and then use a max pooling over these responses to reduce the
dimensionality of the problem. As dimensionality is reduced, the features extracted go from
highly local to a global view built off the aggregate of the local features. Since crime is not
randomly distributed geographically, taking into account what is happening in neighbouring
cells is useful for predictions in the main cell. While RNNs and CNNs capture temporal and
spatial aspects of the problem separately, they can be combined into a single network that
captures both. Network architecture is flexible and so it is possible to include both types of
layers into one network. The most important spatial features at a time are placed into a grid
and passed through the CNN layers, and the output of the CNN is combined with the rest of
the features. The resulting vector is then treated as input at the time for the RNN.
Data mining is a new information technology as the development of database technology and
artificial intelligence technology. Data mining is the process of extracting the hidden
information and knowledge that people do not know in advance but potentially useful. The
aim is to discover unknown relationships and summarize data in the innovative way of
understanding by the data owner and value, and predict possible future behavior, so as to
provide stronger support for decision.
making. According to the different forms of the main data structure, data mining can be
divided into three categories in general, that is data mining, Web data mining and text data
mining. Data mining. The data mining is for structured data, such as SQL, Server, Oracle,
Informix and other data or data warehouse. At present, the following software DB2
Intelligent Miner for Data SAS Enterprise Miner of IBM can be used. Web mining. The
object of data mining is a traditional database or data warehousing, and Web data mining is
the various Web data including Web pages, structure between pages, the user access to
information, and business transaction information, which is to discover useful knowledge to
help people extract knowledge from the World Wide Web, improving site design and
ecommerce to better or improve services. Web data mining can be divided into Web content
mining, Web usage mining and Web structure mining. Text data mining. When the objects of
data mining is composed entirely of text type, the process of automated information
processing and analysis massive text information is called text data mining, which is in the
way of data mining algorithms and information retrieval algorithms. It includes feature
extraction, text summarization, text classification and clustering, concept operations and
exploratory data analysis. The techniques of text data mining contains the vector of word
frequency, the word string representation, Bayesian classifier, Bag of word, text clustering
algorithm based on the concept and classification algorithm of K - the nearest neighbor. Data
mining technology and its application is a hot topic in the international arena currently, and it
has been applied in many industries, which demonstrates its advantages and development
potential. In the area of information management, it is the only way to achieve development
of knowledge retrieval and knowledge management, as long as we integrate data mining
technology with artificial intelligence technology, access to user knowledge, literature and
other kinds of knowledge. Data mining is to discover and extract hidden information from
large databases and vast network information space in the digital library, and the purpose is to
help information workers search for the potential association between the data and find the
neglected elements, which is very useful to predict trends and make decision.
Inductive learning method. Inductive learning methods include information methods or
decision tree and set theory methods. Decision tree uses attribute structure to represent the
decision-making set, and these decision set products rules through classification of data sets.
Typical methods of decision tree have classification and regression tree and mining used in
classification rules. Decision tree classification method is faster and more easily converted
into simple understandable classification rules, etc., especially in problem areas of high
dimension, it can get a very good classification results. Set theory method is a new
mathematical tool dealing with vague and uncertain problems. It has some advantages of a
strong mathematical foundation, simple method, highly targeted and small computation.
Using this method it can handle the following problems, including data simplification, data
association discovery, data meaning evaluation and approximate analysis of data.
Imitation biotechnology method. Imitation biotechnology method includes neural networks
and genetic algorithms. Artificial neural networks imitating biological neural network in
structure, is a nonlinear prediction model through training, and can be used for classification,
clustering, feature extraction and other operations. Genetic algorithm is an optimization
technique, it uses a series of concepts of biological evolution to search issues, and ultimately
to optimize. It calculates the fitness of individuals after genetic, then carries out chromosome
replication, exchange and mutation, etc., produces new individuals, repeats this operation
until obtains the best or better individual. In data mining, it is often expressed as a search
problem, and uses powerful search capabilities of genetic algorithms to find the optimal
solution. Formula discovery. Formula discovery includes discovery system BACON of
physical laws and discovery system FDD of empirical formula. It identifies the new record
through combination of K similar historical records, and this technique can be used as data
mining tasks of clustering and deviation analysis. Statistical analysis. Statistical analysis is to
extract unknown mathematical model from samples analysis. In data mining, it often involves
a certain statistical procedure, judgment hypothesis and errors control.
3 SYSTEM DESIGN
System design is transition from a user oriented document to programmers or data base
personnel. The design is a solution, how to approach to the creation of a new system. This is
composed of several steps. It provides the understanding and procedural details necessary for
implementing the system recommended in the feasibility study. Designing goes through
logical and physical stages of development, logical design reviews the present physical
system, prepare input and output specification, details of implementation plan and prepare a
logical design walkthrough.
The database tables are designed by analyzing functions involved in the system and
format of the fields is also designed. The fields in the database tables should define their role
in the system. The unnecessary fields should be avoided because it affects the storage areas of
the system. Then in the input and output screen design, the design should be made user
friendly. The menu should be precise and compact.
Software Design:
In designing the software following principles are followed:
Modularity and partitioning:
Software is designed such that, each system should consists of hierarchy of modules and
serve to partition into separate function.
Coupling:
Modules should have little dependence on other modules of a system.
Cohesion:
Modules should carry out in a single processing function.
Shared use:
Avoid duplication by allowing a single module is called by other that needs the function it
provides.
Data Flow Diagrams:
A Data Flow Diagram (DFD) is also known as a Process Model. Process Modeling is
an analysis technique used to capture the flow of inputs through a system (or group of
processes) to their resulting output. The model is fairly simple in that there are only four
types of symbols.
Process, dataflow, external entity, data store. Process Modeling is used to visually
represent what a system is doing. It is much easier to look at a picture and understand the
essence than to read through verbiage describing the activities. System Analyst after talking
with various users will create DFD diagrams and then show them to users to verify that their
understanding is correct. The process models can be created to represent an existing system
as well as a proposed system.
DATA FLOW DIAGRAM
The data flow diagrams are very helpful in determining the flow of data in an application.
Figure 1: Crime Analysis - Detailed Data Flow diagram

Activity Diagrams:
Figure 2: Crime Analysis - Activities of user

Use-Case Diagram
Figure 3: Crime Analysis - Use Case

Sequence Diagram
Figure 4: Crime Analysis - Sequence of Actions

Object Diagram:
Figure 5: Crime Analysis - Objects

8. Deployment Diagram
Figure 5: Crime Analysis – Deploying Application

4. IMPLEMENTATION
Dataset:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
dataset=pd.read_csv('data.csv')
data=pd.read_csv('data.csv')
dataset.head()
for col in data:

print (type(data[col][1]))
data['timestamp'] = data['timestamp'].map(lambda x: pd.to_datetime(x,

errors='coerce'))#pd.to_datetime(data['timestamp'], error='ignore')
data['timestamp'] = pd.to_datetime(data['timestamp'], format = '%d/%m/%Y %H:%M:%S')
# DATE TIME STAMP FUNCTION

column_1 = data.ix[:,0]
db=pd.DataFrame({"year": column_1.dt.year,
"month": column_1.dt.month,
"day": column_1.dt.day,
"hour": column_1.dt.hour,
"dayofyear": column_1.dt.dayofyear,
"week": column_1.dt.week,
"weekofyear": column_1.dt.weekofyear,
"dayofweek": column_1.dt.dayofweek,
"weekday": column_1.dt.weekday,
"quarter": column_1.dt.quarter,
})
dataset1=dataset.drop('timestamp',axis=1)
data1=pd.concat([db,dataset1],axis=1)
data1.info()
data1.dropna(inplace=True)
data1.head()
sns.boxplot(x='act379' ,y='hour' ,data=data1, palette='winter_r')
plt.show()
data1.plot.hexbin(x='act13',y='hour',gridsize=25)
plt.show()
#Choose data
X=data1.iloc[:,[1,2,3,4,6]].values
y=data1.iloc[:,[10,11,12,13,14,15]].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=50)
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=10)
knn.fit(X_train,y_train)
print("Score: ",knn.score(X_test,y_test))
print("Score: ",knn.score(X_train,y_train))
data1.to_csv("result.csv",index=False)
Output:
<class 'str'>
<class 'numpy.int64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
E:\Academic Projects 2020\crime\Final\Phase-1.py:17: FutureWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing
See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-
deprecated
column_1 = data.ix[:,0]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2090 entries, 0 to 2089
Data columns (total 18 columns):
year 2068 non-null float64
month 2068 non-null float64
day 2068 non-null float64
hour 2068 non-null float64

dayofyear 2068 non-null float64
week 2068 non-null float64
weekofyear 2068 non-null float64
dayofweek 2068 non-null float64
weekday 2068 non-null float64
quarter 2068 non-null float64
act379 2090 non-null int64
latitude 2090 non-null float64
longitude 2090 non-null float64
dtypes: float64(12), int64(6)
memory usage: 294.0 KB

memory usage: 294.0 KB
Score: 0.5797101449275363
Score: 0.6463119709794438
CNN Implementation:
# first neural network with keras tutorial
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
# load the dataset
dataset = pd.read_csv('result.csv')
# split into input (X) and output (y) variables
#X =
dataset.drop(['year','month','day','hour','dayofyear','week','weekofyear','dayofweek','
weekday','quarter', 'act363','act302','latitude','longitude'],axis=1)
#X =
dataset.drop(['hour','weekofyear','dayofweek','weekday','quarter','act379','act13','act
279','act323','act363','act302','latitude','longitude'],axis=1)
#y = dataset[["hour","week"]]
X=dataset.iloc[:,[1,2,3,4,6]].values
y=dataset.iloc[:,[10,11,12,13,14,15]].values
# define the keras model
model = Sequential()
model.add(Dense(12, input_dim=5, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(6, activation='sigmoid'))
# compile the keras model

model.compile(loss='binary_crossentropy', optimizer='adam'
, metrics=['accuracy'])
# fit the keras model on the dataset
model.fit(X, y, epochs=1000, batch_size=50)
# evaluate the keras model
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))
Output:
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From C:\Program Files\Python37\lib\site-

packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from
tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2020-03-08 17:52:02.117894: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU

supports instructions that this TensorFlow binary was not compiled to use: AVX2
Epoch 1/1000
50/2068 [..............................] - ETA: 6s - loss: 3.9424 - accuracy: 0.5067
2068/2068 [==============================] - 0s 90us/step - loss: 3.3186 - accuracy: 0.6560
Epoch 2/1000
Epoch 3/1000
Epoch 4/1000

Epoch 5/1000
Epoch 6/1000
Epoch 7/1000
Epoch 8/1000
Epoch 9/1000
Epoch 10/1000
Epoch 11/1000
Epoch 12/1000
Epoch 13/1000

Epoch 14/1000
Epoch 15/1000
Epoch 16/1000
Epoch 17/1000
Epoch 18/1000
Epoch 19/1000
Epoch 20/1000
Epoch 21/1000
Epoch 22/1000

Epoch 23/1000
Epoch 24/1000
Epoch 25/1000
Epoch 26/1000
Epoch 27/1000
Epoch 28/1000
Epoch 29/1000
Epoch 30/1000
Epoch 31/1000

Epoch 32/1000
Epoch 701/1000
Epoch 702/1000
Epoch 703/1000
Epoch 704/1000
Epoch 705/1000
Epoch 706/1000
Epoch 707/1000
Epoch 708/1000

Epoch 709/1000
Epoch 710/1000
Epoch 711/1000
Epoch 712/1000
Epoch 713/1000
Epoch 714/1000
Epoch 715/1000
Epoch 716/1000
Epoch 717/1000

Epoch 718/1000
Epoch 719/1000
Epoch 720/1000
Epoch 721/1000
Epoch 722/1000
Epoch 723/1000
Epoch 724/1000
Epoch 725/1000
Epoch 726/1000

Epoch 727/1000
Epoch 728/1000
Epoch 729/1000
Epoch 730/1000
Epoch 731/1000
Epoch 732/1000
Epoch 733/1000
Epoch 734/1000
Epoch 735/1000

Epoch 736/1000
Epoch 737/1000
Epoch 738/1000
Epoch 739/1000
Epoch 740/1000
Epoch 741/1000
Epoch 742/1000
Epoch 743/1000
Epoch 744/1000

Epoch 745/1000
Epoch 746/1000
Epoch 747/1000
Epoch 748/1000
Epoch 749/1000
Epoch 750/1000
Epoch 751/1000
Epoch 752/1000
Epoch 753/1000

Epoch 754/1000
Epoch 755/1000
Epoch 756/1000
Epoch 757/1000
Epoch 758/1000
Epoch 759/1000
Epoch 760/1000
Epoch 761/1000
Epoch 762/1000

Epoch 763/1000
Epoch 764/1000
Epoch 765/1000
Epoch 766/1000
Epoch 767/1000
Epoch 768/1000
Epoch 769/1000
Epoch 770/1000
Epoch 771/1000

Epoch 772/1000
Epoch 773/1000
Epoch 774/1000
Epoch 775/1000
Epoch 776/1000
Epoch 777/1000
Epoch 778/1000
Epoch 779/1000
Epoch 780/1000

Epoch 781/1000
Epoch 782/1000
Epoch 783/1000
Epoch 784/1000
Epoch 785/1000
Epoch 786/1000
Epoch 787/1000
Epoch 788/1000
Epoch 789/1000

Epoch 790/1000
Epoch 791/1000
Epoch 792/1000
Epoch 793/1000
Epoch 794/1000
Epoch 795/1000
Epoch 796/1000
Epoch 797/1000
Epoch 798/1000

Epoch 799/1000
Epoch 800/1000
Epoch 801/1000
Epoch 802/1000
Epoch 803/1000
Epoch 804/1000
Epoch 805/1000
Epoch 806/1000
Epoch 807/1000

Epoch 808/1000
Epoch 809/1000
Epoch 810/1000
Epoch 811/1000
Epoch 812/1000
Epoch 813/1000
Epoch 814/1000
Epoch 815/1000
Epoch 816/1000

Epoch 817/1000
Epoch 818/1000
Epoch 819/1000
Epoch 820/1000
Epoch 821/1000
Epoch 822/1000
Epoch 823/1000
Epoch 824/1000
Epoch 825/1000

Epoch 826/1000
Epoch 827/1000
Epoch 828/1000
Epoch 829/1000
Epoch 830/1000
Epoch 831/1000
Epoch 832/1000
Epoch 833/1000
Epoch 834/1000

Epoch 835/1000
Epoch 836/1000
Epoch 837/1000
Epoch 838/1000
Epoch 839/1000
Epoch 840/1000
Epoch 841/1000
Epoch 842/1000
Epoch 843/1000

Epoch 844/1000
Epoch 845/1000
Epoch 846/1000
Epoch 847/1000
Epoch 848/1000
Epoch 849/1000
Epoch 850/1000
Epoch 851/1000
Epoch 852/1000

Epoch 853/1000
Epoch 854/1000
Epoch 855/1000
Epoch 856/1000
Epoch 857/1000
Epoch 858/1000
Epoch 859/1000
Epoch 860/1000
Epoch 861/1000

Epoch 862/1000
Epoch 863/1000
Epoch 864/1000
Epoch 865/1000
Epoch 866/1000
Epoch 867/1000
Epoch 868/1000
Epoch 869/1000
Epoch 870/1000

Epoch 871/1000
Epoch 872/1000
Epoch 873/1000
Epoch 874/1000
Epoch 875/1000
Epoch 876/1000
Epoch 877/1000
Epoch 878/1000
Epoch 879/1000

Epoch 880/1000
Epoch 881/1000
Epoch 882/1000
Epoch 883/1000
Epoch 884/1000
Epoch 885/1000
Epoch 886/1000
Epoch 887/1000
Epoch 888/1000

Epoch 889/1000
Epoch 890/1000
Epoch 891/1000
Epoch 892/1000
Epoch 893/1000
Epoch 894/1000
Epoch 895/1000
Epoch 896/1000
Epoch 897/1000

Epoch 898/1000
Epoch 899/1000
Epoch 900/1000
Epoch 901/1000
Epoch 902/1000
Epoch 903/1000
Epoch 904/1000
Epoch 905/1000
Epoch 906/1000

Epoch 907/1000
Epoch 908/1000
Epoch 909/1000
Epoch 910/1000
Epoch 911/1000
Epoch 912/1000
Epoch 913/1000
Epoch 914/1000
Epoch 915/1000

Epoch 916/1000
Epoch 917/1000
Epoch 918/1000
Epoch 919/1000
Epoch 920/1000
Epoch 921/1000
Epoch 922/1000
Epoch 923/1000
Epoch 924/1000

Epoch 925/1000
Epoch 926/1000
Epoch 927/1000
Epoch 928/1000
Epoch 929/1000
Epoch 930/1000
Epoch 931/1000
Epoch 932/1000
Epoch 933/1000

Epoch 934/1000
Epoch 935/1000
Epoch 936/1000
Epoch 937/1000
Epoch 938/1000
Epoch 939/1000
Epoch 940/1000
Epoch 941/1000
Epoch 942/1000

Epoch 943/1000
Epoch 944/1000
Epoch 945/1000
Epoch 946/1000
Epoch 947/1000
Epoch 948/1000
Epoch 949/1000
Epoch 950/1000
Epoch 951/1000

Epoch 952/1000
Epoch 953/1000
Epoch 954/1000
Epoch 955/1000
Epoch 956/1000
Epoch 957/1000
Epoch 958/1000
Epoch 959/1000
Epoch 960/1000

Epoch 961/1000
Epoch 962/1000
Epoch 963/1000
Epoch 964/1000
Epoch 965/1000
Epoch 966/1000
Epoch 967/1000
Epoch 968/1000
Epoch 969/1000

Epoch 970/1000
Epoch 971/1000
Epoch 972/1000
Epoch 973/1000
Epoch 974/1000
Epoch 975/1000
Epoch 976/1000
Epoch 977/1000
Epoch 978/1000

Epoch 979/1000
Epoch 980/1000
Epoch 981/1000
Epoch 982/1000
Epoch 983/1000
Epoch 984/1000
Epoch 985/1000
Epoch 986/1000
Epoch 987/1000

Epoch 988/1000
Epoch 989/1000
Epoch 990/1000
Epoch 991/1000
Epoch 992/1000
Epoch 993/1000
Epoch 994/1000
Epoch 995/1000
Epoch 996/1000

Epoch 997/1000
Epoch 998/1000
Epoch 999/1000
Epoch 1000/1000
32/2068 [..............................] - ETA: 2s
2068/2068 [==============================] - 0s 30us/step
Accuracy: 85.64
[Finished in 44.6s]
5. SYSTEM TESTING
Testing is a process, which reveals errors in the program. It is the major quality measure employed
during software development. During testing, the program is executed with a set of conditions
known as test cases and the output is evaluated to determine whether the program is performing as
expected.
Software testing is the process of testing the functionality and correctness of software by
running it. Process of executing a program with the intent of finding an error.
A good test case is one that has a high probability of finding an as yet undiscovered error. A
successful test is one that uncovers an as yet undiscovered error. Software testing is usually
performed for two reasons.
 Defect detection
 Reliability estimation
6.1 TESTING OBJECTIVES

 Testing is a process of executing a program with the intent of finding an error.
 A good test case is one that has a high probability of finding an as yet undiscovered.
 A successful test is one that uncovers an as yet undiscovered error.
6.2 TESTING PRINCIPLES:

 All tests should be traceable to customer requirements.
 Tests should be planned large before testing begins.
 Testing should begin “In the Small” and progress towards “In the Large”.
6.3 TYPES OF TESTING:
In order to make sure that the system does not have errors, the different levels of testing strategies
that are applied at differing phases of software development are:
Unit Testing:
Unit Testing is done on individual modules as they are completed and become executable. It is
confined only to the designer's requirements.
Each module can be tested using the following two strategies:
i) Black Box Testing:
In this strategy some test cases are generated as input conditions that fully execute all functional
requirements for the program. This testing has been uses to find errors in the following
categories:
a) Incorrect or missing functions

b) Interface errors
c) Errors in data structure or external database access
d) Performance errors
e) Initialization and termination errors.
In this testing only the output is checked for correctness. The logical flow of
the data is not checked.
ii) White Box testing:
In this the test cases are generated on the logic of each module by drawing flow graphs of that
module and logical decisions are tested on all the cases.
It has been uses to generate the test cases in the following cases:
a) Guarantee that all independent paths have been executed.

b) Execute all logical decisions on their true and false sides.
c) Execute all loops at their boundaries and within their operational bounds.
d) Execute internal data structures to ensure their valid
2. Integrating Testing:
Integration testing ensures that software and subsystems work together as a whole. It tests
the interface of all the modules to make sure that the modules behave properly when
integrated together.
3. System Testing:
Involves in-house testing of the entire system before delivery to the user. Its aim is to satisfy
the user the system meets all requirements of the client's specifications.
4. Acceptance Testing:
It is a pre-delivery testing in which entire system is tested at client's site on real world data to
find errors.
Validation:
The system has been tested and implemented successfully and thus ensured that all the
requirements as listed in the software requirements specification are completely fulfilled. In case of
erroneous input corresponding error messages are displayed.
COMPILING TEST
It was a good idea to do our stress testing early on, because it gave us time to fix some of
the unexpected deadlocks and stability problems that only occurred when components were
exposed to very high transaction volumes.
EXECUTION TEST
This program was successfully loaded and executed. Because of good

programming there were no execution errors.
OUTPUT TEST
The successful output screens are placed in the output screens section above.
6.4 TEST CASES:
Test case 1:Filter Test Priority (H, L):

High
Test Objective: To check whether the result value is accurate or not
Test Description: when a user selected dataset and applied algorithm for user based analysis and
the accuracy is good
Requirements Verified: Yes
Test Environment: Swing UI
Test Setup/Pre-Conditions: UI is up for execution
Actions Expected Results

The user selects input dataset and starts analysis Final calculated value is
displayed, and result is true
Pass: Yes Conditions pass: Yes Fail: No
Problems / Issues: NIL
Notes: Successfully Executed

High
Test Description: when a user selected dataset and applied regression and the accuracy is better
than the prior

The user selects input dataset and starts algorithm Final calculated value is

High
Test Description: when a user selected dataset and applied SVD and accuracy is good

The user selects input dataset and starts SVD Final calculated value is

6 Future work
With the help of machine learning technology, it has become easy to find out relation and
patterns among various data’s. The work in this project mainly revolves around predicting the
type of crime which may happen if we know the location of where it has occurred. Using the
concept of machine learning we have built a model using training data set that have
undergone data cleaning and data transformation. The model predicts the type of crime with
accuracy of 0.789. Data visualization helps in analysis of data set. The graphs include bar,
pie, line and scatter graphs each having its own characteristics. We generated many graphs
and found interesting statistics that helped in understanding India crimes datasets that can
help in capturing the factors that can help in keeping society safe.
REFERENCES
1. Yadav, S., Timbadia, M., Yadav, A., Vishwakarma, R., & Yadav, N. (2017, April). Crime
pattern detection, analysis & prediction. In Electronics, Communication and Aerospace
Technology (ICECA), 2017 International conference of (Vol. 1, pp. 225-230). IEEE.
2. Shamsuddin, N. H. M., Ali, N. A., & Alwee, R. (2017, May). An overview on crime
prediction methods. In Student Project Conference (ICT-ISPC), 2017 6th ICT International
(pp. 1-5). IEEE.
3. Sivaranjani, S., Sivakumari, S., & Aasha, M. (2016, October). Crime prediction and
forecasting in Tamilnadu using clustering approaches. In Emerging Technological Trends
(ICETT),
International Conference on (pp. 1-6). IEEE.
4. Sathyadevan, S., & Gangadharan, S. (2014, August). Crime analysis and prediction using
data mining. In Networks & Soft Computing (ICNSC), 2014 First International Conference
on (pp. 406-412). IEEE.
5. Nath, S. V. (2006, December). Crime pattern detection using data mining. In Web
intelligence and intelligent agent technology workshops, 2006. wi-iat 2006 workshops. 2006
ieee/wic/acm international conference on (pp. 41-44). IEEE.
6. Zhao, X., & Tang, J. (2017, November). Exploring Transfer Learning for Crime
Prediction. In Data Mining Workshops (ICDMW), 2017 IEEE International Conference on
(pp. 1158-1159). IEEE.
7. Al Boni, M., & Gerber, M. S. (2016, December). Area-Specific Crime Prediction Models.
In Machine Learning and Applications (ICMLA), 2016 15th IEEE International Conference
on (pp. 671-676). IEEE.
8. Tayebi, M. A., Gla, U., & Brantingham, P. L. (2015, May). Learning where to inspect:
location learning for crime prediction. In Intelligence and Security Informatics (ISI), 2015
IEEE International Conference on (pp. 25-30). IEEE.
Appendix

Crime Analysis System

Uploaded by

Copyright:

Available Formats

You might also like

Crime Analysis System

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Crime Analysis System

Uploaded by

Copyright:

Available Formats

Crime Analysis System

 Processor - Pentium –IV

 Speed - 1.1 GHz

 Operating System - Windows XP

 Coding Language - Python

2.1 Predictive Modeling

Example of a classification model is - A pattern classification task in weather forecasting

Types of Predictive Models Algorithms

2.2 Functional Diagram of Proposed Work

It can be divided into 4 parts:

1. Descriptive analysis on the Data

2. Data treatment (Missing value and outlier fixing)

2.3 Data cleaning

2. Missing Value Treatment

Random Sampling (Train and Test)

• Support Vector Machine (SVM)

 Check Model Performance: Accuracy etc

In designing the software following principles are followed:

Modularity and partitioning:

Modules should have little dependence on other modules of a system.

Modules should carry out in a single processing function.

DATA FLOW DIAGRAM

Figure 1: Crime Analysis - Detailed Data Flow diagram

Figure 2: Crime Analysis - Activities of user

Figure 3: Crime Analysis - Use Case

Figure 4: Crime Analysis - Sequence of Actions

Figure 5: Crime Analysis - Objects

Figure 5: Crime Analysis – Deploying Application

for col in data:

data['timestamp'] = data['timestamp'].map(lambda x: pd.to_datetime(x,

# DATE TIME STAMP FUNCTION

E:\Academic Projects 2020\crime\Final\Phase-1.py:17: FutureWarning:

.ix is deprecated. Please use

.loc for label based indexing or

.iloc for positional indexing

See the documentation here:

RangeIndex: 2090 entries, 0 to 2089

Data columns (total 18 columns):

year 2068 non-null float64

month 2068 non-null float64

day 2068 non-null float64

hour 2068 non-null float64

week 2068 non-null float64

weekofyear 2068 non-null float64

dayofweek 2068 non-null float64

weekday 2068 non-null float64

quarter 2068 non-null float64

act379 2090 non-null int64

act13 2090 non-null int64

act279 2090 non-null int64

act323 2090 non-null int64

act363 2090 non-null int64

act302 2090 non-null int64

latitude 2090 non-null float64

longitude 2090 non-null float64

dtypes: float64(12), int64(6)

memory usage: 294.0 KB