Dhanacrime Prediction2
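A Project Report
On
“Deep Learning Process In Analyzing Crimes using Machine Learning”
submitted in partial fulfillment of the requirements for the award of the degree of
Bachelor of Technology
in
Computer Science and Engineering
by
Kammampati Dhanalakshmi
19H61A0580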
This is to certify that the project entitled “Deep Learning Process In Analyzing Crimes
using Machine Learning” being submitted by Kammampati Dhanalakshmi bearing the Hall
Ticket number 19H61A0580 in partial fulfillment of the requirements for the award of the degree
of the Bachelor of Technology in Computer Science and Engineering to Anurag Group of
Institutions (Formerly CVSR College of Engineering) is a record of bonafide work carried out
by her under my guidance and supervision from June 2022 to October 2022.
The results presented in this project have been verified and found to be satisfactory. The
results embodied in this project report have not been submitted to any other University for the
award of any other degree or diploma.
It is my privilege and pleasure to express a profound sense of respect, gratitude and
indebtedness to our guide Mr. Madar Bandu, Assistant Professor, Dept. of Computer Science and
Engineering, Anurag Group of Institutions (Formerly CVSR College of Engineering), for his
indefatigable inspiration, guidance, cogent discussions, constructive criticism, and encouragement
throughout the project work.
I express my sincere gratitude to Dr. G. Vishnu Murthy, Professor & Head, Department
of Computer Science and Engineering, Anurag Group of Institutions (Formerly CVSR College of
Engineering), for his suggestions, motivations, and co-operation for the successful completion of
the work.
I extend my sincere thanks to Dr. V. Vijaya Kumar, Dean, Research and Development, Anurag
Group of Institutions, for his encouragement and constant help.
Kammampati Dhanalakshmi
(19H61A0580)
DECLARATION
I hereby declare that the project work entitled “Deep Learning Process In Analyzing
Crimes using Machine Learning” submitted to the Anurag Group of Institutions (Formerly
CVSR College of Engineering) in partial fulfillment of the requirements for the award of the
degree of Bachelor of Technology (B. Tech) in Computer Science and Engineering is a record of
an original work done by me under the guidance of Mr. Madar Bandu, Assistant Professor, and
that this project work has not been submitted to any other university for the award of any other
degree or diploma.
Kammampati Dhanalakshmi
(19H61A0580)
ABSTRACT
CONTENTS
5. Implementation
5.1. Modules
6. Test Cases
7. Screen Shots
8. Conclusion
9. Future Enhancement
10. Bibliography
1. INTRODUCTION
Crime is a socio-economic problem that affects quality of life and economic growth. The
specifics of how crime is committed change depending on the type of society and community.
Previous research in crime prediction has found that factors such as education, poverty,
employment, and climate affect the crime rate. Vancouver is one of the most populous,
ethnically diverse, and multicultural cities in Canada. The overall crime rate in
Vancouver dropped 1.5% in 2017, but the high number of vehicle break-ins and thefts is still an issue.
Recently, the Vancouver Police Department (VPD) introduced a predictive model for crimes
related to property break-ins, and once it was implemented, the city of Vancouver
witnessed a 27% drop in residential break-ins. Crime prediction is a law enforcement technique
that uses data and statistical analysis to identify the crimes most likely to occur. This
field has been the subject of continued research in many parts of the world.
1.1 Motivation:
In this study, we aim to predict the number of crimes that will occur in the future based
on the number of crimes that have occurred in the past. The number of crimes committed is said
to be the product of the number of people committing crimes and the average frequency at which
they commit them. We want to know whether we can make accurate predictions of future crimes using
deep learning. Crime has been a constant and problematic issue that causes socio-economic disparity
for society as a whole. When there are not enough police officers to enforce the law, civilians
risk getting injured and their properties can become easy targets for criminals. The motivation
and benefit behind predicting future crimes is that it allows a city to better prepare for the future. It
can be a large undertaking to prepare and plan for the amount of crime that will occur in the future,
especially if a city does not have a well-funded police department or enough police officers.
The city can allocate resources and officers more effectively if it knows how much crime to expect
in the coming weeks. How and why crimes are committed has already been theorized by
criminologists. By studying these theories, we can better understand the behavioral patterns of
offenders and create a model that uses these patterns to predict crime at a certain location and time.
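The multiplicative relationship described above can be written as a simple formula. In this sketch, $C$ is the number of crimes in a period, $N$ the number of active offenders, and $\bar{f}$ the average number of offences per offender in that period; the numbers in the worked example are purely illustrative and are not taken from any dataset.

```latex
C = N \times \bar{f}
\qquad \text{e.g. } N = 200,\ \bar{f} = 3 \text{ offences/year}
\;\Rightarrow\; C = 200 \times 3 = 600 \text{ crimes/year}
```

Under this model, halving either the number of offenders or their average frequency halves the expected crime count.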
1.2 Problem Definition:
The challenge facing the crime analyst is how to extrapolate past crime data into the likelihoods
of future incidents occurring at specified locations in space and time. Ideally, the analyst wants an
image map showing the intensities of future crime activity at each location within their
jurisdictional boundaries. Such maps are obviously useful in crime prevention,
since they would allow the police to allocate resources to the areas of higher risk.
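To make the idea of an intensity map concrete, the sketch below bins past incident coordinates into a regular grid and reports each cell's share of all incidents; the cell with the largest share is the current hotspot. It assumes NumPy, projected x/y coordinates and a hypothetical function name `crime_intensity_grid`; the coordinates are made up, and real systems typically use kernel density estimation or a trained model rather than raw counts.

```python
import numpy as np


def crime_intensity_grid(xs, ys, bounds, cells=4):
    """Bin incident coordinates into a cells x cells grid and return each
    cell's share of all incidents (a crude intensity map)."""
    (x_min, x_max), (y_min, y_max) = bounds
    counts, _, _ = np.histogram2d(
        xs, ys, bins=cells, range=[[x_min, x_max], [y_min, y_max]]
    )
    total = counts.sum()
    # Normalise so the grid sums to 1 (leave an all-zero grid unchanged).
    return counts / total if total else counts


# Hypothetical incident coordinates inside a 100 x 100 study area.
xs = np.array([10, 12, 15, 80, 82, 85, 88, 50])
ys = np.array([10, 11, 14, 90, 91, 85, 88, 50])
grid = crime_intensity_grid(xs, ys, bounds=((0, 100), (0, 100)), cells=4)
hotspot = np.unravel_index(grid.argmax(), grid.shape)
print(grid)
print("hotspot cell (x bin, y bin):", hotspot)
```

Rows of `grid` correspond to x bins and columns to y bins, which is the layout returned by `np.histogram2d`.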
2. LITERATURE SURVEY
[3]. CRIME IN RELATION TO URBAN DESIGN.
AUTHORS: Heba Adel, Mohamed Saleem, Randa Mahmoud
ABSTRACT:
Crime is a part of any social system and has been known to human communities since their origins. It
differs from one community to another, and even within one community it does not occur equally in all
places or in the same way. It is also concentrated in some places more than others, sometimes
increasing and sometimes decreasing. Previous research has shown that the crime rate has a significant
correlation with various social factors, such as education levels, poverty rates and lack of social
organization, while other studies have drawn attention to its relation with the built environment. They
proposed that crime occurs in places where both opportunities and criminals are available. The
aim of this paper is to identify urban circumstances related to crime occurrence within the Greater
Cairo Region, and to propose different ways to reduce these crimes. Accordingly, the agglomeration's
main districts were scrutinized in terms of social analysis, street-network pattern and land use.
[5]. AREA-SPECIFIC CRIME PREDICTION MODELS.
AUTHORS: M. Al Boni, M. S. Gerber
ABSTRACT:
The convergence of public data and statistical modeling has created opportunities for public
safety officials to prioritize the deployment of scarce resources on the basis of predicted crime
patterns. Current crime prediction methods are trained using observed crime and information
describing various criminogenic factors. Researchers have favored global models (e.g., of entire
cities) due to a lack of observations at finer resolutions (e.g., ZIP codes). These global models and
their assumptions are at odds with evidence that the relationship between crime and criminogenic
factors is not homogeneous across space. In response to this gap, we present area-specific crime
prediction models based on hierarchical and multi-task statistical learning. Our models mitigate
sparseness by sharing information across ZIP codes, yet they retain the advantages of localized
models in addressing non-homogeneous crime patterns. Out-of-sample testing on real crime data
indicates predictive advantages over multiple state-of-the-art global models.
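The idea of sharing information across sparse areas can be illustrated with a much simpler technique than the hierarchical and multi-task models the abstract describes: empirical-Bayes-style shrinkage, where each ZIP code's raw crime rate is pulled toward the city-wide rate in proportion to how little data it has. This is not the authors' method; the function name `pooled_rates`, the `prior_strength` parameter and the counts below are hypothetical.

```python
def pooled_rates(counts, exposures, prior_strength=50.0):
    """Shrink each area's raw crime rate toward the global rate.

    counts:         crimes observed per area
    exposures:      exposure per area (e.g. population in thousands)
    prior_strength: pseudo-exposure given to the global rate; larger values
                    pull sparse areas more strongly toward it
    """
    global_rate = sum(counts.values()) / sum(exposures.values())
    pooled = {}
    for area, crimes in counts.items():
        n = exposures[area]
        # Weighted average of the local rate (weight n) and the
        # global rate (weight prior_strength).
        pooled[area] = (crimes + prior_strength * global_rate) / (n + prior_strength)
    return pooled


# Hypothetical data: a sparsely populated ZIP code and a dense one.
counts = {"ZIP_A": 2, "ZIP_B": 300}
exposures = {"ZIP_A": 10, "ZIP_B": 1000}  # population in thousands
rates = pooled_rates(counts, exposures)
for area, rate in rates.items():
    raw = counts[area] / exposures[area]
    print(f"{area}: raw={raw:.3f}, pooled={rate:.3f}")
```

The sparse area moves noticeably toward the city-wide rate, while the well-observed area keeps almost exactly its own rate, which is the behaviour the abstract attributes to sharing information across ZIP codes.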
3. ANALYSIS
EXISTING TECHNIQUE: -
• Logistic Regression
TECHNIQUE DEFINITION: -
• It is used to estimate discrete values (binary values like 0/1, yes/no, true/false) based on a given set of
independent variable(s).
• In simple words, it predicts the probability of occurrence of an event by fitting data to a logit function.
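As a minimal sketch of what fitting data to a logit function means, the code below fits a one-feature logistic regression by gradient descent on the log-loss using only NumPy. The function names, the learning rate and the toy data (a hypothetical numeric feature with a binary event label) are illustrative assumptions; in practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used.

```python
import numpy as np


def sigmoid(z):
    """Logistic (inverse-logit) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def add_bias(X):
    """Prepend a column of ones so the first weight acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights by batch gradient descent on the average log-loss."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_proba(w, X):
    """Predicted probability that the event occurs for each row of X."""
    return sigmoid(add_bias(X) @ w)


# Hypothetical data: one numeric feature, label 1 = event occurred, 0 = not.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
w = fit_logistic(X, y)
probs = predict_proba(w, np.array([[0.5], [6.5]]))
print(probs)
```

Thresholding the probability at 0.5 turns it into the discrete 0/1 prediction described above.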
DRAWBACKS: -
• Overfitting of the model
• In the second approach, the neighbourhood and the day of the week on which the crime was
committed were encoded as binary values, marked 1 when the crime happened on that day in that
neighbourhood and 0 otherwise.
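The binary encoding described in the second approach corresponds to one-hot encoding. A minimal pandas sketch, assuming hypothetical column names `neighbourhood` and `day_of_week` and made-up records:

```python
import pandas as pd

# Hypothetical crime records: one row per incident.
records = pd.DataFrame({
    "neighbourhood": ["Downtown", "Kitsilano", "Downtown"],
    "day_of_week": ["Mon", "Fri", "Fri"],
})

# Each category becomes its own 0/1 column: 1 when the incident matches it.
encoded = pd.get_dummies(records, columns=["neighbourhood", "day_of_week"], dtype=int)
print(encoded)
```

Each row now has exactly one 1 among the neighbourhood columns and one 1 among the day-of-week columns.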
PROPOSED TECHNIQUE: -
• KNN and boosted decision tree
TECHNIQUE DEFINITION: -
• KNN is widely used for classification problems in industry.
• K nearest neighbors is a simple algorithm that stores all available cases and classifies a new case by a
majority vote of its k nearest neighbors.
ADVANTAGES: -
• Comprehensive Nature
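The majority-vote rule in the definition above can be sketched in a few lines of plain Python. This is an illustrative implementation with Euclidean distance and a hypothetical function name `knn_predict`; the two-dimensional points and crime labels are invented for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by a majority vote among its k nearest training points."""
    # Distance from the query to every stored case, nearest first.
    neighbours = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors with crime-type labels.
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_labels = ["theft", "theft", "theft", "break-in", "break-in", "break-in"]
print(knn_predict(train_points, train_labels, (2, 2)))
print(knn_predict(train_points, train_labels, (7, 8)))
```

Because every prediction scans all stored cases, the cost of KNN grows with the size of the training set, which is one reason training and prediction times are worth measuring.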
3.3 Software Requirement Specification:
3.3.1. Purpose:
The purpose of this study is to identify, evaluate and understand the various machine learning
and statistical techniques suitable for solving crime-related problems, with a focus on the proactive
detection, prediction and management of crime.
3.3.2. Scope:
The proposed system will deal with crime detection, prediction and management. Using clustering,
data mining and time-series techniques, the system will offer predictions of future crime
incidence. These will be presented through graphical representations of crime trends and
geographical heat maps that show the concentration of incidents and hotspots in real time.
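One of the simplest time-series baselines for predicting future crime incidence is a moving-average forecast, sketched below. The report does not state which time-series method the system uses, so this choice, the function name `moving_average_forecast`, the window size and the weekly counts are all hypothetical.

```python
def moving_average_forecast(weekly_counts, window=4, horizon=2):
    """Forecast the next `horizon` values as the mean of the last `window` values.

    Each forecast is appended to the history before the next one is made
    (a recursive, naive baseline).
    """
    if len(weekly_counts) < window:
        raise ValueError("need at least `window` observations")
    history = list(weekly_counts)
    forecasts = []
    for _ in range(horizon):
        next_value = sum(history[-window:]) / window
        forecasts.append(next_value)
        history.append(next_value)
    return forecasts


# Hypothetical weekly incident counts for one area.
weekly = [620, 640, 600, 660, 630, 650]
print(moving_average_forecast(weekly, window=4, horizon=2))
```

A baseline like this is mainly useful as a yardstick: a learned model should beat it before being trusted for resource planning.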
HARDWARE REQUIREMENTS:
The hardware requirements may serve as the basis for a contract for the implementation of the
system and should therefore be a complete and consistent specification of the whole system. They
are used by software engineers as the starting point for the system design. They should state what
the system does and not how it should be implemented.
SOFTWARE REQUIREMENTS:
The software requirements document is the specification of the system. It should include both
a definition and a specification of requirements. It states what the system should do rather than
how it should do it. The software requirements provide a basis for creating the software requirements
specification. It is useful in estimating cost, planning team activities, performing tasks, and
tracking the team's progress throughout the development activity.
4. DESIGN
Design engineering deals with the various UML (Unified Modeling Language) diagrams for
the implementation of the project. Design is a meaningful engineering representation of something that
is to be built. Software design is a process through which the requirements are translated into a
representation of the software. Design is the place where quality is fostered in software engineering.
Design is the means to accurately translate customer requirements into a finished product.
We prepare UML diagrams to understand the system in a better and simple way. A single
diagram is not enough to cover all the aspects of the system. UML defines various kinds of diagrams
to cover most of the aspects of a system.
You can also create your own set of diagrams to meet your requirements. Diagrams are generally
made in an incremental and iterative way. There are two broad categories of diagrams:
• Structural Diagrams
• Behavioral Diagrams
Structural Diagrams
The structural diagrams represent the static aspect of the system. These static aspects represent those
parts of a diagram which form the main structure and are therefore stable.
Behavioral Diagrams
Any system can have two aspects, static and dynamic. So, a model is considered as complete when
both the aspects are fully covered.
Behavioral diagrams basically capture the dynamic aspect of a system. Dynamic aspect can be
further described as the changing/moving parts of a system.
4.1.2. Class Diagram:
This class diagram represents how the classes, with their attributes and methods, are linked together
to perform verification with security. The diagram shows the various classes involved
in our project.
4.1.3 Activity Diagram:
Activity diagrams are graphical representations of workflows of stepwise activities and actions
with support for choice, iteration and concurrency. In the Unified Modeling Language, activity
diagrams can be used to describe the business and operational step-by-step workflows of components
in a system. An activity diagram shows the overall flow of control.
4.1.4 Sequence Diagram:
Sequence diagrams are interaction diagrams that detail how operations are carried out. They
capture the interaction between objects in the context of a collaboration. Sequence diagrams are
time-focused: they show the order of the interactions visually, using the vertical axis of the diagram
to represent time, i.e. which messages are sent and when.
5. Implementation
5.1 Modules:
• DATA SOURCE
• PREPROCESSING
• STATISTICAL ANALYSIS
• TREND ANALYSIS
boundaries for the city’s 22 local areas in the Geographic Information System (GIS).
PREPROCESSING:
The original dataset needs to be preprocessed to fill the empty cells, delete unnecessary columns, and
add several relevant features to the original and preprocessed datasets.
Preprocessed datasets
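A minimal pandas sketch of these three preprocessing steps (filling empty cells, deleting a column, adding features) is shown below. It assumes a table with VPD-style columns `YEAR`, `MONTH`, `DAY`, `NEIGHBOURHOOD` and `HUNDRED_BLOCK`; these column names, the choice of column to drop and the derived features are assumptions, not confirmed by this report.

```python
import pandas as pd


def preprocess(df):
    """Return a cleaned copy of a raw crime table (column names assumed)."""
    df = df.copy()
    # 1. Fill empty cells: unknown neighbourhoods get an explicit placeholder.
    df["NEIGHBOURHOOD"] = df["NEIGHBOURHOOD"].fillna("Unknown")
    # 2. Delete a column the models do not use.
    df = df.drop(columns=["HUNDRED_BLOCK"])
    # 3. Add features derived from the date of each incident.
    dates = pd.to_datetime(df[["YEAR", "MONTH", "DAY"]].rename(columns=str.lower))
    df["DAY_OF_WEEK"] = dates.dt.day_name()
    df["IS_WEEKEND"] = (dates.dt.dayofweek >= 5).astype(int)
    return df


# Hypothetical raw rows.
raw = pd.DataFrame({
    "YEAR": [2011, 2017],
    "MONTH": [6, 1],
    "DAY": [15, 1],
    "NEIGHBOURHOOD": ["Central Business District", None],
    "HUNDRED_BLOCK": ["100 BLOCK A", "200 BLOCK B"],
})
clean = preprocess(raw)
print(clean)
```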
STATISTICAL ANALYSIS:
The distribution of the crime dataset is described by year, month, and day. In Vancouver,
the average number of crime incidents is around 31,624 per year, 2,720 per month, and 90 per day. The
dataset tends to show a normal distribution as the time intervals lengthen. However, the daily graph
has an abnormal maximum of 650 incidents, which was suspected to be an outlier and turned out to
correspond to the Stanley Cup riot on June 15, 2011.
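The outlier check described above can be reproduced with the interquartile-range (IQR) rule; the report does not say which test was used, so this is an assumption. In the sketch, only the 650 incidents on June 15, 2011 come from the text; the other daily counts are invented values around the 90-per-day average, and `iqr_outliers` is a hypothetical name.

```python
import pandas as pd


def iqr_outliers(daily_counts, k=1.5):
    """Return the days whose count lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = daily_counts.quantile(0.25), daily_counts.quantile(0.75)
    iqr = q3 - q1
    mask = (daily_counts < q1 - k * iqr) | (daily_counts > q3 + k * iqr)
    return daily_counts[mask]


# Hypothetical daily incident counts, with one extreme day.
days = pd.date_range("2011-06-10", periods=10, freq="D")
counts = pd.Series([85, 90, 95, 88, 92, 650, 87, 91, 89, 93], index=days)
outliers = iqr_outliers(counts)
print(outliers)
```

The IQR rule is used here instead of a z-score because a single extreme day inflates the standard deviation and can hide itself from a z-score test on short series.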
TREND ANALYSIS:
The overall trend shows that the average number of crimes per month decreased from 2003 to
2013 but increased in 2016, and again fell slightly to about 3,000 incidents per month in 2018. After
the statistical analysis, we classify the predicted values of the crime rate in Vancouver.
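A sketch of how the average number of crimes per month can be computed for each year from incident dates, assuming pandas and a hypothetical helper `average_monthly_count_by_year`; the dates below are invented. Months with no incidents do not appear in the grouping, so this averages only over months that had at least one incident.

```python
import pandas as pd


def average_monthly_count_by_year(incident_dates):
    """Average number of incidents per month, for each year.

    Only months with at least one incident are included in the average.
    """
    dates = pd.to_datetime(pd.Series(incident_dates))
    # Count incidents per (year, month), then average those counts per year.
    monthly = dates.groupby([dates.dt.year, dates.dt.month]).size()
    return monthly.groupby(level=0).mean()


# Hypothetical incident dates.
incidents = [
    "2016-01-03", "2016-01-10", "2016-01-20",  # 3 incidents in Jan 2016
    "2016-02-14",                              # 1 incident in Feb 2016
    "2017-03-01", "2017-03-02", "2017-03-05",
    "2017-03-09", "2017-03-30",                # 5 incidents in Mar 2017
]
trend = average_monthly_count_by_year(incidents)
print(trend)
```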
IMPORTANCE OF PYTHON:
• Python is Interpreted − Python is processed at runtime by the interpreter. You do not need to compile your
program before executing it. This is similar to Perl and PHP.
• Python is Interactive − You can actually sit at a Python prompt and interact with the interpreter directly to
write your programs.
• Python is Object-Oriented − Python supports the object-oriented style or technique of programming that
encapsulates code within objects.
• Python is a Beginner's Language − Python is a great language for beginner-level programmers and
supports the development of a wide range of applications, from simple text processing to WWW browsers
to games.
ALGORITHM USED:
K-NEAREST NEIGHBORS (KNN):
KNN was applied in both approaches with the same parameters, and the accuracies and training
times were compared. For approach 1, KNN's accuracy was 40.1% and its training time was 2209 seconds,
while for approach 2 it was 39.9% accurate and took 101.73 seconds to train. The KNN
algorithm is a simple, supervised machine learning algorithm that can be used to solve both
classification and regression problems. It uses a dataset in which the data points are
separated into several classes to predict the classification of a new sample point.
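Accuracy and training time can be measured as sketched below with scikit-learn's `KNeighborsClassifier`. The synthetic two-cluster data, `n_neighbors=5`, the 75/25 split and the helper name `evaluate_knn` are assumptions for illustration. KNN's `fit` mostly stores the training data, so the bulk of its cost appears at prediction time; the sketch therefore times fitting and scoring separately.

```python
import time

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def evaluate_knn(X, y, n_neighbors=5, seed=0):
    """Return (test accuracy, fit seconds, predict seconds) for a KNN classifier."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = KNeighborsClassifier(n_neighbors=n_neighbors)

    start = time.perf_counter()
    model.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start

    start = time.perf_counter()
    accuracy = model.score(X_test, y_test)  # predicts X_test internally
    predict_seconds = time.perf_counter() - start
    return accuracy, fit_seconds, predict_seconds


# Hypothetical data: two well-separated clusters standing in for two crime types.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(10, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

accuracy, fit_s, predict_s = evaluate_knn(X, y)
print(f"accuracy={accuracy:.3f}, fit={fit_s:.4f}s, predict={predict_s:.4f}s")
```

Running the same helper on the feature matrices of two different dataset approaches, with the same `n_neighbors`, gives the kind of side-by-side comparison reported above.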
For the boosted decision tree, the accuracy and training time for approach 1 were 41.9% and 903.63 seconds,
respectively, while approach 2 was 43.2% accurate with a training time of 459.26 seconds.
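scikit-learn's `GradientBoostingClassifier` is one readily available implementation of boosted decision trees; the report does not name the library it used, so this choice, the hyperparameters and the synthetic data are assumptions. The sketch uses a non-linear (XOR-like) labelling, which a single linear boundary cannot separate but an ensemble of shallow trees can.

```python
import time

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Hypothetical data: label is 1 when the two features have the same sign (XOR-like).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(600, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 100 shallow trees; each new tree is fitted to the negative gradient of the
# log-loss (the residual errors) of the ensemble built so far.
model = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, learning_rate=0.1, random_state=0
)

start = time.perf_counter()
model.fit(X_train, y_train)
train_seconds = time.perf_counter() - start

accuracy = model.score(X_test, y_test)
print(f"accuracy={accuracy:.3f}, training time={train_seconds:.3f}s")
```

Unlike KNN, a boosted tree does most of its work during training, so its training time is the more meaningful cost to compare.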
5.4 Sample Code:
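# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the district-wise crimes against women for 2001-2012 and for 2013, then combine them
crimes1 = pd.read_csv('input/crime/42_District_wise_crimes_committed_against_women_2001_2012.csv')
crimes2 = pd.read_csv('input/crime/42_District_wise_crimes_committed_against_women_2013.csv')
crimes = pd.concat([crimes1, crimes2], ignore_index=True, axis=0)
crimes.rename(columns={'STATE/UT': 'STATE'}, inplace=True)
print('Dataset is ready....')

# Normalise the state names to lower case, then collect them in a list and print
crimes['STATE'] = crimes['STATE'].str.lower()
states = sorted(crimes['STATE'].unique())
for j in range(0, len(states)):
    print(j, states[j])
crimes.head(3)

# Filter out the Total crimes for each State & UT (rows whose DISTRICT is 'TOTAL')
crimes_total = crimes[crimes['DISTRICT'] == 'TOTAL']
# The Total crimes for each State & UT for the year 2001
crimes_total_2001 = crimes_total[crimes_total['Year'] == 2001]

# Rape crimes committed in the year 2001 per state, as a horizontal bar chart
x = crimes_total_2001['STATE'].values
y = crimes_total_2001['Rape'].values
fig, ax = plt.subplots()
y_pos = np.arange(len(x))
ax.barh(y_pos, y, align='center')
ax.set_yticks(y_pos)
ax.set_yticklabels(x)
ax.invert_yaxis()  # first state at the top
ax.set_xlabel('Rapes')
ax.set_title('RAPE ...')  # rest of the original title is truncated in the source
plt.show()

# Any results you write to the current directory are saved as output.
# Create a new data set with the state totals for 2001 to 2013
crimes_total_women1 = pd.read_csv('input/crime/42_District_wise_crimes_committed_against_women_2001_2012.csv')
crimes_total_women2 = pd.read_csv('input/crime/42_District_wise_crimes_committed_against_women_2013.csv')
crimes_total_women = pd.concat([crimes_total_women1, crimes_total_women2], ignore_index=True, axis=0)
crimes_total_women.rename(columns={'STATE/UT': 'STATE'}, inplace=True)

# Keep only the TOTAL rows: total crimes of all kinds in each state from 2001 to 2013
crimes_total_women = crimes_total_women[crimes_total_women['DISTRICT'] == 'TOTAL']

crime_df = pd.read_csv('input/2001_2012/42_District_wise_crimes_committed_against_women_2001_2012.csv')
# The original notes mention ANDHRA PRADESH / SECUNDERABAD RLY. To restrict the data to
# that district (the column values are assumed), uncomment the next line:
# crime_df = crime_df[(crime_df['STATE/UT'] == 'ANDHRA PRADESH') & (crime_df['DISTRICT'] == 'SECUNDERABAD RLY.')]

X = crime_df.Year.values.reshape(-1, 1)  # feature: year, as a column vector
y = crime_df.Rape.values                 # target: rape counts, as a 1-D array
print(y)

# The model definition is missing from the source; a KNN classifier is assumed here
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Prediction for the latest year in the data
x_max = np.array([[X.max()]])
print(model.predict(x_max))

# Evaluate on the held-out rows
y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))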
6. TEST CASES
7. SCREENSHOTS
SAMPLE RESULTS:
The results from both methods (KNN and boosted decision tree) are shown in the following figures for
both approaches:
(c): Boosted Decision tree result for approach 1
8. CONCLUSION
In this research, Telangana crime data for the last 15 years was used with two different dataset
approaches. The machine learning predictive models KNN and boosted decision tree were used to obtain a
crime-prediction accuracy of between 70% and 80%. The accuracy, complexity, and training time of the
algorithms differed slightly between approaches and algorithms. The prediction accuracy
can be improved by tuning both the algorithm and the data for specific applications. Although this
model has low accuracy as a prediction model, it provides a preliminary framework for further analyses.
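One way to tune the algorithm, as suggested above, is a cross-validated grid search over its hyperparameters. The sketch below searches over the number of neighbours for KNN with scikit-learn's `GridSearchCV`; the candidate values and the synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical, overlapping two-class data so that the choice of k matters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(2, 1, (60, 2))])
y = np.array([0] * 60 + [1] * 60)

# 5-fold cross-validation (stratified by default for classifiers) for each candidate k.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9, 11]},
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)
print("best k:", search.best_params_["n_neighbors"])
print("cross-validated accuracy:", round(search.best_score_, 3))
```

The same pattern applies to the boosted decision tree, for example by searching over `n_estimators` and `max_depth`.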
9. FUTURE ENHANCEMENTS
Crime prediction is a law enforcement technique that uses data and statistical analysis for the
identification of crimes most likely to occur in the future. This field has been subject to continued
research in many parts of the world.
10. BIBLIOGRAPHY
[1] A. Bogomolov, B. Lepri, J. Staiano, N. Oliver, F. Pianesi, and A. Pentland, "Once upon a crime: towards
crime prediction from demographics and mobile data," Proc. of the 16th Intl. Conf. on Multimodal
Interaction, pp. 427-434, 2014.
[2] H. Adel, M. Saleem, and R. Mahmoud, "Crime in relation to urban design. Case study: the greater Cairo
region," Ain Shams Eng. J., vol. 7, no. 3, pp. 925-938, 2016.
[3] "Overall crime rate in Vancouver went down in 2017, VPD says," CBC News, Feb. 15, 2018.
[Online]. Available: https://www.cbc.ca/news/canada/british-columbia/crime-ratevancouver2017-1.4537831.
[Accessed: 09-Aug-2018].
[4] J. Kerr, "Vancouver police go high tech to predict and prevent crime before it happens," Vancouver
Courier, July 23, 2017. [Online]. Available: https://www.vancourier.com/news/vancouverpolice-go-high-tech-topredict-and-prevent-crimebefore-it-happens-1.21295288.
[Accessed: 09-Aug-2018].
[5] J. Han, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2012.
[6] R. Iqbal, M. A. A. Murad, A. Mustapha, P. H. Shariat Panahi, and N. Khanahmadliravi, "An
experimental study of classification algorithms for crime prediction," Indian J. of Sci. and Technol.,
vol. 6, no. 3, pp. 4219-4225, Mar. 2013.
[7] H. Chen, W. Chung, J. J. Xu, G. Wang, Y. Qin, and M. Chau, "Crime data mining: a general framework
and some examples," IEEE Computer, vol. 37, no. 4, pp. 50-56, Apr. 2004.
[8] T. Beshah and S. Hill, "Mining road traffic accident data to improve safety: role of road-related factors
on accident severity in Ethiopia," Proc. of Artificial Intel. for Develop. (AID 2010), pp. 14-19, 2010.
[9] M. Al Boni and M. S. Gerber, "Area-specific crime prediction models," 15th IEEE Intl. Conf. on Mach.
Learn. and Appl., Anaheim, CA, USA, Dec. 2016.