CRIME DATA ANALYSIS USING MACHINE

LEARNING
A PROJECT REPORT

Submitted by

SURENDHAR A (310820104097)
RAHUL S (310820104076)
PRAVEEN G (310820104073)

In partial fulfillment for the award of the degree

Of

BACHELOR OF ENGINEERING

IN

COMPUTER SCIENCE AND ENGINEERING

JEPPIAAR ENGINEERING COLLEGE, CHENNAI 600 119.

ANNA UNIVERSITY: CHENNAI 600 025

MAY 2024

ANNA UNIVERSITY: CHENNAI 600 025

BONAFIDE CERTIFICATE

Certified that this project report “CRIME DATA ANALYSIS USING MACHINE
LEARNING” is a bonafide work of SURENDHAR A (310820104097), RAHUL S
(310820104076), PRAVEEN G (310820104073) who carried out the project under my
supervision.

HEAD OF THE DEPARTMENT
Dr. J. Arokia Renjith, M.E., Ph.D.,
Professor & Head / CSE,
Jeppiaar Engineering College,
Chennai - 600 119

SUPERVISOR
Mr. S. Insol Rajasekar, M.Tech.,
Assistant Professor / CSE,
Jeppiaar Engineering College,
Chennai - 600 119

CERTIFICATE OF EVALUATION

S.No.   Name of the Student(s) who have done the Project
1.      SURENDHAR A (310820104097)
2.      RAHUL S (310820104076)
3.      PRAVEEN G (310820104073)

Title of the Project: CRIME DATA ANALYSIS USING MACHINE LEARNING

Name of the Supervisor with Designation: Mr. S. Insol Rajasekar, M.Tech., Assistant Professor, CSE Department

The report of the project work submitted by the above students, in partial fulfilment of the requirements for the award of the degree of Bachelor of Engineering in Computer Science and Engineering of Anna University, Chennai, was examined, confirmed to be the work of the above students, and then evaluated.

Submitted for viva-voce held on.....................................

INTERNAL EXAMINER EXTERNAL EXAMINER

ACKNOWLEDGEMENT

We are deeply indebted to (Late) Hon'ble Colonel Dr. JEPPIAAR, M.A., B.L., Ph.D., and our Chairman and Managing Director Dr. M. REGEENA JEPPIAAR, B.Tech., M.B.A., Ph.D.

We express our sincere thanks to our Principal Dr. K. SENTHIL KULAR, M.E., Ph.D., and the Dean-Academics Dr. SHALEESHA A STANLEY, M.Sc., M.Phil., Ph.D., for the opportunity given to us.

We would like to express our deep sense of gratitude to Dr. J. AROKIA RENJIT, M.E., Ph.D., Head of the Department, who has always been a mentor and motivator in completing this project.

We acknowledge our indebtedness to our guide, Mr. S. INSOL RAJASEKAR, M.Tech., for his valuable guidance in completing the project.

We also thank the teaching and non-teaching staff members of the Department of Computer Science and Engineering for their constant support.

We also acknowledge, with a deep sense of reverence, our gratitude towards our parents and family members, who have always supported us morally as well as economically.

ABSTRACT

Crime data analysis using machine learning harnesses advanced algorithms to scrutinize extensive datasets, uncovering patterns and trends in criminal behavior. One primary application is predicting crime hotspots by analyzing historical data to identify spatial and temporal patterns. This enables proactive resource allocation, enhancing public safety and reducing crime rates in vulnerable areas.

Machine learning also aids in detecting suspicious behavior by flagging deviations from normal patterns within datasets. For instance, anomaly detection algorithms can identify unusual financial transactions or atypical social media activity, providing valuable leads for law enforcement. Automating this process improves the efficiency and effectiveness of crime prevention efforts.

Moreover, machine learning helps identify anomalies in criminal activity that may evade traditional methods. By analyzing diverse data sources, such as social media and surveillance footage, algorithms can detect irregularities indicative of emerging threats or unconventional tactics. This empowers law enforcement to stay ahead of evolving criminal trends and effectively combat new challenges.

However, successful implementation requires careful consideration of factors like data preprocessing, feature selection, and algorithm choice. Ensuring data quality and addressing privacy concerns are essential, as is mitigating algorithmic bias to ensure fair and ethical outcomes.

In summary, crime data analysis using machine learning offers significant potential for enhancing public safety and reducing crime rates. By leveraging advanced algorithms, law enforcement can gain valuable insights into criminal behavior, enabling proactive interventions and effective resource allocation. Adhering to best practices ensures fair and ethical outcomes while maximizing the benefits of this innovative approach to crime prevention and law enforcement.

சுருக்கம்

இயந்திர கற்காணிப்புகளை பயன்படுத்தி குறித்த விபரங்களை அதிகபட்சமாக

ஆய்வு செய்வதன் மூலம், குற்ற நடவடிக்கைகளின் முன்னிலைகளை அறியும்

முயற்சிகள் செய்யும். ஒரு முக்கிய பயன்பாடு, தொடர்ந்து நாடு மற்றும் நேரம்

பற்றிய பட்டியல்களை பகிர்ந்து, பணிகளின் பொருளாதாரங்களில் மாபெரும்

நடுவரையை கண்டறிய முன்னாள் விபரங்களை ஆய்வு செய்யும். இதன் மூலம்,

முன்னிலை பதிவுகளில் விழும் மோசங்களை நேர்ந்து விளக்கும் நிரம்பிக்

கண்டுபிடிப்பது முக்கியமாகும். மேலும், குற்ற நடவடிக்கைகளில் வழியாக மற்றும்

பொருளாதாரங்களில் சிக்கல்களை நேரடியாக கண்டறியும் பயன்படுத்தும்

சிறுநேரங்களில் அல்லது அரசாங்கங்கள் வழியாக உதவும். உதாரணமாக,

அசாதாரண நிதி பரிவர்த்தனைகளை அல்லது உச்சநிலை சமூக ஊடகங்களை

நேரடியாக அலர்மு படுத்தும் முறைகள் உள்ளன. இதன் மூலம், சட்டவிரோத

முறைகளை நெருங்கி குற்ற நோக்கம் பயன்படுத்தும் விருப்பங்களிலிருந்து நாவல்

நடவடிக்கைகளுக்கு மதிப்பு கொடுக்க உதவுகிறது.மேலும், மொத்தமைப்பு

முறைகளில் குற்ற நடவடிக்கைகளில் நேர்ந்த விசிகரிப்புகளை அறியும் மூலம்

படைப்புகளை நேரடியாக கண்டறிய உதவுகிறது. சமீபத்திய கட்டுப்பாடுகளை

விழுங்கி குற்ற நோக்கம் மூலம் மதிப்பீடுகளை தடுக்க உதவுகிறது. இதன் மூலம்,

அனைத்து குற்றங்களும் மொத்தமைப்பு முறைகளில் மதிப்பீடு செய்யப்படுகின்ற.

TABLE OF CONTENTS

CHAPTER TITLE PAGE

NO. NO.

ABSTRACT 5
ABSTRACT(TAMIL) 6
LIST OF FIGURES 9
LIST OF ABBREVIATIONS 10
1. INTRODUCTION 11
1.1 INTRODUCTION 11
1.2 BACKGROUND 12
1.3 SIGNIFICANCE OF CRIME RATE PREDICTION 12
1.4 ROLE OF MACHINE LEARNING 13
1.5 PREDICTIVE MODELING 13
1.6 FEATURE ENGINEERING 14
1.7 EXPLORATORY DATA ANALYSIS 14
1.8 EVALUATION METRICS 15
1.9 INTEGRATION OF MACHINE LEARNING IN LAW 15
ENFORCEMENT
1.10 CHALLENGES AND LIMITATIONS 16
2. LITERATURE SURVEY 17

3. SYSTEM ANALYSIS 19
3.1 EXISTING SYSTEM 19
3.2 PROPOSED SYSTEM 19
3.3 FEASIBILITY STUDY 22
3.4 REQUIREMENT SPECIFICATION 24
3.5 LANGUAGE SPECIFICATION - PYTHON 24

4. SYSTEM DESIGN 27
4.1 SYSTEM ARCHITECTURE 27
4.2 DATA FLOW DIAGRAM 27
4.3 USE CASE DIAGRAM 29
4.4 ACTIVITY DIAGRAM 30
4.5 SEQUENCE DIAGRAM 31
4.6 CLASS DIAGRAM 31
5. MODULE DESCRIPTION 33
5.1 MODULE 1 33
5.2 MODULE 2 35
5.3 MODULE 3 38

6. TESTING 42
6.1 TYPES OF TESTING 42
6.2 TESTING TECHNIQUES 43

6.2.1 BLACK BOX TESTING 45


6.2.2 WHITE BOX TESTING 46

7 CONCLUSION 48
8 APPENDIX - CODING 49

9 APPENDIX 2 – OUTPUT SCREEN 87


10 REFERENCE 89
11 CERTIFICATIONS 90

LIST OF FIGURES

SNO TITLE PG.NO

4.1 SYSTEM ARCHITECTURE 28

4.2 DATA FLOW DIAGRAM 29

4.3 USE CASE DIAGRAM 30

4.4 ACTIVITY DIAGRAM 31

4.5 SEQUENCE DIAGRAM 32

4.6 CLASS DIAGRAM 33

6.2.1 BLACK BOX TESTING 45

6.2.2 WHITE BOX TESTING 46

LIST OF ABBREVIATIONS

EDA     EXPLORATORY DATA ANALYSIS
ML      MACHINE LEARNING
AI      ARTIFICIAL INTELLIGENCE
NP      NUMPY
PD      PANDAS
DFD     DATA FLOW DIAGRAM
UML     UNIFIED MODELING LANGUAGE
DFM     DATA FLOW MODEL
SVM     SUPPORT VECTOR MACHINE
DTC     DECISION TREE CLASSIFIER
RF      RANDOM FOREST
KNN     K-NEAREST NEIGHBOUR
RNN     RECURRENT NEURAL NETWORK
MLP     MULTI LAYER PERCEPTRON

CHAPTER 1
INTRODUCTION
1.1 Introduction

Crime poses a significant and pervasive challenge to societies globally, necessitating accurate prediction and analysis of crime rates to inform effective strategies for prevention, resource allocation, and policy-making. Recent advancements in machine learning have propelled the field of crime rate prediction and analysis forward, offering promising avenues for enhancing our comprehension of criminal activities through data-driven approaches.

This study endeavors to delve into the application of machine learning techniques in crime
rate prediction and analysis, aiming to harness the potential of data-driven methodologies. The
primary goal is to develop robust predictive models capable of estimating crime rates by
considering diverse factors, including but not limited to historical crime records, socio-
economic indicators, and geographical data. Leveraging a repertoire of machine learning
algorithms such as decision trees, support vector machines, and neural networks, the research
seeks to dissect and unearth patterns embedded within the collected data. By discerning the
most influential factors shaping crime rates, these models promise to furnish invaluable
insights into the intricate dynamics of criminal behavior.

Furthermore, the study will direct its focus towards scrutinizing crime patterns and trends to
unearth spatial and temporal correlations, hotspots, and emerging patterns. Employing an
array of exploratory data analysis techniques like clustering, visualization, and association rule
mining, the research aims to unveil concealed patterns and relationships lurking within the
vast expanse of crime data. Such in-depth analysis holds the potential to augment our
comprehension of crime dynamics and facilitate the crafting of proactive strategies for crime
prevention.

In essence, this study endeavors to harness the power of machine learning to not only predict
crime rates accurately but also to unravel the underlying intricacies of criminal behavior.
Through comprehensive analysis and exploration of crime data, the research aims to furnish
stakeholders with actionable insights to bolster their efforts in combating crime and fostering
safer communities.

1.2 Background

Crime's global prevalence presents daunting challenges across societies. Precise
prediction and analysis of crime rates are pivotal for formulating impactful prevention
strategies, optimizing resource allocation, and shaping policies. In recent times, machine
learning methodologies have risen to prominence in crime rate prediction and analysis,
capitalizing on data-driven methodologies to deepen insights into criminal behaviors. This
technological leap enables a nuanced comprehension of crime dynamics, empowering
stakeholders to deploy targeted interventions and proactive measures.

By harnessing diverse data sources such as historical crime records, socio-economic indicators, and geographical data, machine learning algorithms unveil hidden patterns and
trends within the realm of criminal activities. Armed with this newfound understanding,
decision-makers can make informed choices to mitigate the societal impact of crime. Machine
learning's application facilitates the identification of high-risk areas, enabling law
enforcement agencies and policymakers to strategically allocate resources.

Furthermore, the adoption of machine learning fosters collaboration among multidisciplinary teams, including data scientists, law enforcement agencies, policymakers,
and community stakeholders. This interdisciplinary approach enhances the efficacy of crime
prevention efforts by capitalizing on diverse expertise and perspectives. As machine learning
continues to advance, its potential to revolutionize crime prediction and analysis remains
boundless, promising safer and more resilient societies globally.

1.3 Significance of Crime Rate Prediction and Analysis

Accurate crime rate prediction and analysis serve as crucial tools for understanding
and addressing criminal activities. Through sophisticated data analysis techniques, patterns
and trends in crime can be identified, allowing law enforcement agencies, policymakers, and
urban planners to anticipate and respond effectively to emerging threats. By pinpointing
hotspots of criminal activity, resources can be allocated strategically to deter crime and
enhance public safety. Moreover, analyzing the underlying factors contributing to crime
enables the implementation of targeted interventions, such as community outreach programs
or socioeconomic initiatives, to address root causes and prevent recidivism. Ultimately, the
insights gleaned from crime rate prediction and analysis empower stakeholders to make
informed decisions and collaborate in creating safer, more resilient communities.

1.4 Role of Machine Learning in Crime Analysis

Machine learning algorithms are revolutionizing the way we approach crime analysis
by harnessing the power of data. These algorithms can sift through vast amounts of
information, ranging from past crime records to socio-economic factors and geographical
data. By doing so, they uncover intricate patterns and correlations that might elude traditional
methods. This holistic approach enables more accurate predictions of crime rates and trends,
aiding law enforcement agencies in proactive measures and resource allocation. Moreover,
machine learning algorithms facilitate comprehensive analysis by considering multifaceted
aspects of crime, such as its underlying causes and spatial distribution. This nuanced
understanding allows for targeted interventions and tailored strategies to address specific
challenges within communities. As technology continues to advance, these algorithms hold
promise in not only predicting and analyzing crime but also in preventing it through early
detection and strategic planning.

1.5 Predictive Modeling for Crime Rate Prediction

Predictive modeling techniques, including decision trees, support vector machines (SVM), and neural networks, offer powerful tools for estimating crime rates. These methods
analyze historical crime data, incorporating various factors such as demographics, socio-
economic indicators, and geographic features to create predictive models. Decision trees
segment the data into hierarchical branches, identifying significant predictors of crime.
Support vector machines optimize the separation between crime and non-crime instances,
effectively delineating patterns within the data. Neural networks, inspired by the human brain,
can discern complex relationships among variables, enabling nuanced crime rate predictions.
By training these models on comprehensive datasets, they can anticipate future crime rates
and highlight high-risk areas for proactive intervention, aiding law enforcement agencies and
policymakers in deploying resources effectively for crime prevention and community safety.
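
A minimal sketch of how these three model families might be fitted side by side with scikit-learn is given below. The dataset file, the high_crime_area label, and the hyperparameter values are illustrative assumptions, not part of the project's actual pipeline.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

# Hypothetical preprocessed dataset; column names are placeholders.
df = pd.read_csv("crime_features.csv")
X = df.drop(columns=["high_crime_area"])      # engineered numeric features
y = df["high_crime_area"]                     # 1 = high-risk area, 0 = otherwise

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=6, random_state=42),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "neural_net": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=42)),
}

for name, model in models.items():
    model.fit(X_train, y_train)               # learn patterns from historical data
    print(name)
    print(classification_report(y_test, model.predict(X_test)))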

1.6 Feature Engineering in Crime Rate Prediction

Feature engineering is akin to sculpting raw data into a refined form, crucial for
understanding complex phenomena like crime rates. By carefully selecting and transforming
features, such as demographic profiles, weather patterns, or social media metrics, analysts can
unravel hidden insights. Demographics offer a window into the socioeconomic fabric of
communities, shedding light on vulnerability and behavioral trends. Weather conditions serve
as environmental factors influencing human behavior, affecting crime patterns in tangible
ways. Meanwhile, the pervasive influence of social media activity unveils subtle shifts in
societal dynamics, offering predictive cues. Through advanced techniques like dimensionality
reduction or polynomial transformations, feature engineering refines these variables,
extracting nuanced relationships. This refined data not only enhances the accuracy of
predictive models but also empowers policymakers with actionable insights for targeted
interventions. Thus, feature engineering stands as a cornerstone in deciphering the
multifaceted puzzle of crime dynamics, enabling proactive measures for safer, more resilient
communities.
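
A small sketch of the kind of feature construction described above follows; the raw column names (incident_count, population, median_income, median_rent, unemployment_rate, tweet_volume, temp_c) are purely illustrative stand-ins for the demographic, socio-economic, weather, and social-media signals mentioned.

import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn.decomposition import PCA

df = pd.read_csv("raw_area_data.csv")          # hypothetical area-level table

# Ratio/density style features often carry more signal than raw counts.
df["incidents_per_1k"] = 1000 * df["incident_count"] / df["population"]
df["income_to_rent"] = df["median_income"] / df["median_rent"]

# Low-order interaction terms between a few numeric drivers.
numeric = df[["median_income", "unemployment_rate", "tweet_volume", "temp_c"]]
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
interactions = poly.fit_transform(numeric)

# Optional dimensionality reduction to keep the feature set compact.
compact = PCA(n_components=5).fit_transform(interactions)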

1.7 Exploratory Data Analysis in Crime Analysis

Exploratory data analysis (EDA) techniques are essential tools for uncovering intricate
patterns within crime data. Through clustering methods, such as k-means or hierarchical
clustering, similar spatial or temporal crime occurrences can be grouped together, revealing
underlying structures and trends. Visualization plays a crucial role in EDA, allowing analysts
to represent complex data in intuitive formats, such as heatmaps, choropleth maps, or time
series plots. These visualizations facilitate the identification of hotspots, spatial distributions,
and temporal fluctuations in criminal activities. Association rule mining further enriches the
analysis by identifying frequent co-occurrences or correlations among different types of
crimes or variables, aiding in understanding underlying relationships and potential
contributing factors. By leveraging these techniques, law enforcement agencies and
policymakers can gain actionable insights into crime dynamics, enabling targeted resource
allocation, strategic planning, and proactive crime prevention efforts aimed at mitigating risks
and enhancing public safety.
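
The sketch below illustrates, under assumed column names (latitude, longitude, occurred_at), how k-means hotspot grouping and a weekday-by-hour heatmap could be produced for this kind of exploratory analysis.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans

incidents = pd.read_csv("incidents.csv", parse_dates=["occurred_at"])

# Spatial clusters of incident coordinates (rough hotspot candidates).
coords = incidents[["latitude", "longitude"]]
incidents["cluster"] = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(coords)

# Temporal heatmap: incident counts by weekday and hour of day.
incidents["hour"] = incidents["occurred_at"].dt.hour
incidents["weekday"] = incidents["occurred_at"].dt.day_name()
pivot = incidents.pivot_table(index="weekday", columns="hour",
                              values="cluster", aggfunc="count")
sns.heatmap(pivot, cmap="Reds")
plt.show()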

1.8 Evaluation Metrics for Predictive Models

Evaluation metrics such as accuracy, precision, recall, and F1 score are crucial for
assessing the performance of predictive models in predicting crime rates. Accuracy measures
the overall correctness of predictions, while precision quantifies the proportion of correctly
predicted positive cases out of all cases predicted as positive. Recall, also known as
sensitivity, gauges the proportion of true positive cases that were correctly identified. The F1
score balances precision and recall, providing a single metric that combines both. In the
context of predicting crime rates, these metrics help gauge how well the model identifies
actual instances of crime and minimizes false predictions. For example, a high precision score
indicates fewer false positives, which means fewer resources wasted on investigating non-
existent crimes. Conversely, a high recall score suggests the model effectively captures most
actual crime instances, aiding in crime prevention and law enforcement efforts. Overall, these
metrics provide a comprehensive assessment of the predictive model's performance and its
practical utility in addressing real-world challenges related to crime prediction and
prevention.
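
The snippet below shows how these four metrics could be computed with scikit-learn; a synthetic, imbalanced dataset stands in for real crime data, and the classifier choice is incidental.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for a preprocessed crime dataset.
X, y = make_classification(n_samples=2000, n_features=12,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

print("accuracy :", accuracy_score(y_te, y_pred))
print("precision:", precision_score(y_te, y_pred))  # share of flagged cases that are real
print("recall   :", recall_score(y_te, y_pred))     # share of real cases that are caught
print("F1 score :", f1_score(y_te, y_pred))         # balance of precision and recall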

1.9 Integration of Machine Learning in Law Enforcement

The integration of machine learning in law enforcement holds immense potential for
revolutionizing traditional approaches to crime prevention and response. By harnessing
advanced algorithms and real-time data analytics, law enforcement agencies can optimize
their decision-making processes. Machine learning techniques enable the analysis of vast
amounts of data to identify patterns, trends, and anomalies, thus facilitating proactive crime
prevention efforts. Predictive models can forecast crime hotspots, enabling law enforcement
to allocate resources strategically and deploy patrols to high-risk areas. Additionally, machine
learning can aid in the optimization of patrol routes, minimizing response times and
maximizing coverage. Moreover, by continuously learning from new data, these systems can
adapt and evolve to changing crime patterns, enhancing the overall effectiveness of law
enforcement efforts. However, it's crucial to address ethical considerations such as bias and
privacy concerns to ensure the responsible and equitable use of these technologies in law
enforcement practices.

1.10 Challenges and Limitations

Machine learning's potential in crime rate prediction and analysis is undeniable, yet
navigating its challenges is crucial. Data quality and availability often hinder accurate
predictions, as crime data can be incomplete or biased. Algorithm bias further complicates
matters, as models may perpetuate societal inequalities if not carefully constructed.
Interpretability poses a significant hurdle; understanding how models reach conclusions is
essential for trust and accountability. Moreover, ethical dilemmas arise from the use of
sensitive personal data, raising concerns about privacy and consent. Addressing these
challenges demands interdisciplinary collaboration, incorporating expertise from data science,
law, ethics, and social sciences. Only through rigorous attention to these issues can machine
learning truly realize its potential in crime prediction while upholding fairness, transparency,
and ethical principles.

CHAPTER 2

LITERATURE SURVEY

1. Mohler, G. O., Short, M. B., Brantingham, P. J., Schoenberg, F. P., & Tita, G. E. (2011). Self-exciting point process modeling of crime. Journal of the American Statistical Association, 106(493), 100-108.
   Techniques used: Self-exciting point process modeling of crime.
   Findings: The authors demonstrate the effectiveness of this modeling approach in capturing crime patterns and predicting future criminal activity.

2. Wang, P., Liu, W., Li, D., & Zhang, L. (2013). Crime prediction based on criminal behavior similarity. Expert Systems with Applications, 40(12), 4912-4919.
   Techniques used: Crime prediction based on criminal behavior similarity.
   Findings: The paper presents a crime prediction approach that leverages the similarity in criminal behaviors.

3. Ashfaq, R., Zhang, X., & Ahmad, M. O. (2018). Crime prediction using machine learning techniques. Expert Systems with Applications, 40(12), 4912-4919.
   Techniques used: Machine learning techniques for crime prediction.
   Findings: It analyzes the strengths and limitations of these techniques and discusses the key factors influencing their performance.

4. Wu, Z., Yin, H., & Zhu, Y. (2019). Crime rate prediction based on machine learning. In 2019 5th International Conference on Control, Automation and Robotics (ICCAR) (pp. 476-480). IEEE.
   Techniques used: Machine learning algorithms applied to historical crime data.
   Findings: The authors propose a predictive model that utilizes historical crime data and employs machine learning algorithms for accurate crime rate estimation.

5. Ribeiro, F. N., de Souza, R. C., & Pereira, C. A. (2019). Crime prediction using machine learning algorithms. In 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC) (pp. 1569-1574). IEEE.
   Techniques used: Decision trees, support vector machines, random forests, and artificial neural networks.
   Findings: The authors explore the application of various machine learning techniques for crime prediction tasks.

6. Yuan, Y., Wang, F., Zheng, Y., Xie, X., & Sun, G. (2020). Predicting criminal hotspots using deep learning: a spatiotemporal approach. Applied Geography, 121, 102233.
   Techniques used: Convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
   Findings: The study leverages CNNs and RNNs to capture spatial and temporal patterns in crime data.

CHAPTER 3
SYSTEM ANALYSIS

3.1 EXISTING SYSTEM

The existing system for crime rate prediction and analysis using machine learning
involves various approaches and methodologies. Traditionally, crime rate analysis relied on
statistical techniques and manual data processing, which limited the accuracy and efficiency
of predictions. However, with advancements in machine learning and data analytics, more
sophisticated systems have been developed.
The existing system typically starts with data collection, which involves gathering
crime-related data such as historical crime records, socio-economic indicators, demographic
information, and geographical data. These datasets are preprocessed to clean and transform
the data into a suitable format for analysis.
Machine learning algorithms are then employed to create predictive models. Various
techniques, including decision trees, support vector machines, random forests, and neural
networks, are used to train the models on the collected data. Feature engineering techniques
are also applied to identify the most relevant features affecting crime rates and enhance the
performance of the models.
Once the predictive models are developed, they are evaluated using appropriate
performance metrics such as accuracy, precision, recall, and F1 score. The models are then
used to predict future crime rates based on the identified patterns and trends in the data.

3.2 PROPOSED SYSTEM

The proposed system for crime rate prediction and analysis using machine learning is
poised to revolutionize traditional crime analysis methodologies. By harnessing cutting-edge
algorithms and techniques, it aims to elevate accuracy, efficiency, and effectiveness in
understanding and forecasting criminal activities. Through robust data processing and
predictive modeling, this system can identify patterns and trends within vast datasets,
enabling law enforcement agencies to proactively allocate resources and strategize crime
prevention measures. Furthermore, by leveraging machine learning, it can adapt and evolve
with dynamic shifts in criminal behavior, continuously refining its predictive capabilities.
Ultimately, this system holds the potential to not only optimize resource allocation but also
enhance community safety by empowering authorities with actionable insights to combat
crime more effectively. The system involves the following key components:

Data Collection and Preprocessing


Once gathered, the crime-related data undergoes rigorous preprocessing to ensure its
quality and reliability. Missing values are addressed through imputation techniques, outliers
are identified and treated using statistical methods, and inconsistencies are rectified to
maintain data integrity. This meticulous process enhances the accuracy and usefulness of the
dataset for subsequent analysis. Geospatial data is carefully integrated, providing valuable
insights into spatial patterns and correlations between crime and various socio-economic
factors. Through these steps, the system ensures that the data is well-prepared for in-depth
analysis and decision-making in crime prevention and law enforcement efforts.

Feature Engineering
Feature engineering is a critical step in data preprocessing where relevant features are
selected or created to enhance the performance of machine learning models. This process
involves analyzing the dataset to identify valuable insights and relationships, which may not
be immediately apparent. Techniques such as imputation, encoding categorical variables, and
scaling numerical features are commonly employed to ensure the data is suitable for
modeling. Additionally, feature extraction methods like Principal Component Analysis (PCA)
or dimensionality reduction techniques might be applied to reduce the complexity of the
dataset while preserving important information. The ultimate goal of feature engineering is to
improve the predictive power and generalization ability of the model by providing it with the
most informative and discriminative features.

Model Selection and Training


Machine learning algorithms, like decision trees, random forests, support vector
machines, and neural networks, are foundational tools for predictive modeling. They excel in
capturing complex patterns in data and making accurate predictions. Through preprocessing
techniques such as normalization and feature engineering, data is prepared for training. Cross-
validation ensures robustness by assessing model performance across multiple subsets of the
data. This iterative process allows for fine-tuning hyperparameters and optimizing the model's
performance. By leveraging these diverse algorithms and methodologies, practitioners can

develop predictive models that generalize well to new data and yield valuable insights across
various domains.
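
A brief sketch of such a cross-validated comparison is given below; a synthetic dataset stands in for the preprocessed crime data and the model settings are illustrative only.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the normalized, engineered crime dataset.
X, y = make_classification(n_samples=3000, n_features=15, random_state=1)

candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=1),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=1),
    "svm": make_pipeline(StandardScaler(), SVC()),
}

# 5-fold cross-validation assesses each model across multiple data subsets.
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X, y, cv=5, scoring="f1")
    print(f"{name:15s} F1 = {scores.mean():.3f} +/- {scores.std():.3f}")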

Model Evaluation and Validation


The trained models are evaluated using appropriate evaluation metrics to assess their
accuracy, precision, recall, and F1 score. Validation techniques, such as holdout or k-fold
cross-validation, are applied to ensure the robustness of the models.

Crime Rate Prediction
The trained models are then used to predict future crime rates based on new input data.
These predictions consider the identified patterns and relationships between crime rates and
various factors.

Crime Pattern Analysis


Exploratory data analysis (EDA) techniques play a crucial role in uncovering intricate
patterns within crime data. Clustering algorithms group similar crime occurrences, revealing
spatial concentrations or "hotspots." Visualization tools offer intuitive representations, aiding
in the identification of temporal trends and spatial correlations. Association rule mining
unveils hidden relationships between various crime types, shedding light on potential
contributing factors. By leveraging these techniques, analysts gain profound insights into
crime dynamics, facilitating a deeper understanding of the socio-economic, demographic, and
environmental factors influencing criminal behavior. This holistic understanding enables
more effective crime prevention and intervention strategies tailored to specific contexts and
communities.

Decision Support and Reporting


The system provides decision support tools and visualizations to aid law enforcement
agencies, policymakers, and urban planners in making informed decisions. The results and
findings are presented in comprehensive reports or interactive dashboards, facilitating
data-driven decision-making processes.
The proposed system aims to improve crime prediction accuracy, enhance resource
allocation, and support proactive crime prevention strategies. By leveraging machine learning
algorithms and data analysis techniques, it enables stakeholders to address crime-related
challenges and create safer communities.

3.3 FEASIBILITY STUDY

With an eye towards gauging the project's viability, a business proposal defining the project's primary goals and offering some preliminary cost estimates is presented here. The proposed system's viability can be assessed once a comprehensive study has been performed. It is essential to have a thorough understanding of the core requirements of the system at hand before beginning the feasibility study. The feasibility study mainly covers the following considerations:
• Economical feasibility

• Technical feasibility

• Operational feasibility

• Social feasibility

1.ECONOMICAL FEASIBILITY
The study's findings provide valuable insights for upper management to assess
potential cost savings from implementing the technology. With finite resources, the
corporation must carefully allocate funds, ensuring each dollar spent has a justified purpose.
Leveraging predominantly open-source and free technologies significantly reduced
infrastructure costs, aligning with budget constraints. Emphasizing customizable products
was pivotal, enabling tailored solutions without excessive spending. This strategic approach
not only optimizes resource utilization but also fosters agility and adaptability within the
system. By minimizing expenses without compromising quality or functionality, the
corporation maximizes ROI and ensures sustainable growth. Furthermore, investing in
adaptable infrastructure lays a foundation for future scalability and innovation, positioning the
company for long-term success in a competitive landscape. Effective cost management fosters
financial stability, empowering the corporation to navigate uncertainties and capitalize on
emerging opportunities. Thus, the study's findings not only inform decision-making but also
contribute to the organization's overall efficiency and resilience.

2.TECHNICAL FEASIBILITY
The research's primary objective is to ascertain the technical feasibility of the system
to facilitate its seamless development. The intention is to integrate additional systems without
overburdening the IT staff, thereby mitigating any undue anxiety for the buyer. Given the low
probability of requiring adjustments during installation, simplicity in design is paramount. By
prioritizing simplicity, the system can streamline processes and minimize potential
complications. This approach not only enhances efficiency but also reduces the risk of errors
and downtime. Moreover, a straightforward design fosters ease of maintenance and
troubleshooting, bolstering long-term sustainability. Through meticulous evaluation and
testing, the research endeavors to identify potential bottlenecks and address them proactively.
Ultimately, the aim is to ensure a robust and scalable system that can adapt to evolving needs
without causing undue strain on resources or stakeholders.

3.OPERATIONAL FEASIBILITY
Ensuring user engagement and satisfaction is paramount in technology adoption. A
crucial step involves guiding users through the optimal utilization of the resource, minimizing
any sense of intimidation or threat. By positioning the system as a necessary tool rather than
an adversary, users are more likely to embrace its functionalities. Effective training and
orientation sessions lay the groundwork for swift adoption, empowering users to navigate the
system confidently. Building trust in the system is key, fostering an environment where users
feel comfortable providing constructive feedback. As users gain faith in the system's
reliability and utility, they become more inclined to offer valuable insights, enriching the
development process. Ultimately, this iterative cycle of user empowerment and feedback
integration accelerates the system's evolution and enhances its effectiveness.

4.SOCIAL FEASIBILITY
In the social feasibility analysis, understanding how a project might impact the community is
paramount. This involves assessing potential shifts in demographics, employment
opportunities, and social dynamics. One critical aspect is recognizing how the project aligns
with existing cultural norms and institutional frameworks. These frameworks often dictate the
availability of certain types of workers within a community. If the project requires specialized
skills or expertise that are not prevalent in the community, it could face challenges in finding
qualified personnel. This scarcity might necessitate strategies for recruitment, training, or even
relocation incentives to attract the needed workforce. Additionally, it's essential to consider
the long-term effects on local employment patterns and the overall socio-economic fabric. By
addressing these factors early in the feasibility analysis, project planners can better anticipate

and mitigate potential hurdles, ensuring smoother integration and acceptance within the
community.

3.4 REQUIREMENT SPECIFICATION

HARDWARE REQUIREMENTS

Processor : Pentium Dual Core 2.00 GHz
Hard disk : 120 GB
RAM : 2 GB (minimum)
Keyboard : 110 keys enhanced

SOFTWARE REQUIREMENTS
Operating system : Windows 7 (with Service Pack 1), 8, 8.1 or 10
Language : Python

3.5 LANGUAGE SPECIFICATION– PYTHON

Python is a favourite among programmers because of its user-friendliness, rich feature set, and versatile applicability. Python is one of the most suitable programming languages for machine learning, since it is platform-independent and is extensively used by the programming community.
Machine learning is a branch of AI that aims to eliminate the need for explicit programming by allowing computers to learn from their own mistakes and perform routine tasks automatically. "Artificial intelligence" (AI) is the broader field that encompasses machine learning, covering methods by which computers are trained to recognize visual and auditory cues, understand spoken language, translate between languages, and ultimately make significant decisions on their own.
The demand for intelligent solutions to real-world problems has necessitated the further development of AI, in order to automate tasks that are arduous to programme without it. Python is a widely used programming language that is often considered the best choice for helping to automate such processes.

ADVANTAGES OF USING PYTHON

Following are the advantages of using Python.

Variety of Frameworks and Libraries
Python's expansive ecosystem of frameworks and libraries is indispensable for
creating robust and efficient software solutions. With libraries like NumPy, pandas, and
matplotlib, data manipulation, analysis, and visualization become streamlined tasks.
Frameworks such as Django and Flask offer powerful tools for building web applications with
ease, handling everything from routing to database management. Additionally, libraries like
TensorFlow and PyTorch empower developers to delve into advanced machine learning and
deep learning projects effortlessly. PyBrain, with its modular approach to machine learning
algorithms, further enhances Python's capabilities in this domain. These libraries and
frameworks not only accelerate development but also ensure code reliability through their
thoroughly tested and structured architectures. In essence, Python's rich ecosystem of libraries
and frameworks provides developers with the essential building blocks to tackle diverse
programming challenges efficiently and effectively.

Reliability
Python is a favorite among software developers for its ethos of simplicity and
consistency. Its concise syntax and readability make code presentation a breeze, fostering
clarity and maintainability. In comparison to its counterparts, Python allows developers to
write code swiftly, thanks to its straightforward syntax and extensive standard library.

Moreover, the vibrant Python community offers invaluable feedback, enabling developers to
refine their products and applications continuously. Its simplicity also renders it an ideal
choice for beginners, who can grasp its fundamentals relatively quickly, setting a solid
foundation for their programming journey.

For seasoned developers, Python's simplicity serves as a springboard for innovation. With a
robust ecosystem for machine learning and data science, they can focus on devising
groundbreaking solutions to real-world problems, leveraging Python's stability and reliability
to create trustworthy applications. This emphasis on innovation propels Python to the
forefront of technological advancement, driving progress across various domains.

Easily Executable

Python's cross-platform compatibility indeed stands as one of its most appealing features for developers. Its ability to seamlessly run on Windows, Linux, and macOS without
requiring any modifications simplifies the development process significantly. Moreover,
Python's syntax is straightforward and easy to understand, making it accessible to a wide
range of developers regardless of their expertise level. This universality ensures that teams
can collaborate effectively without the need for specialized Python experts, thereby
streamlining project workflows. Additionally, Python's extensive standard library and vast
ecosystem of third-party packages empower developers to build complex applications without
relying on external languages or frameworks. This self-contained nature of Python reduces
dependencies and accelerates development cycles. Consequently, Python emerges as an ideal
choice for projects where time and effort are critical factors, as its portability enables swift
deployment across diverse environments with minimal overhead. Ultimately, Python's
versatility and executability make it a go-to language for developers seeking efficiency and
flexibility in their software development endeavors.

CHAPTER 4
SYSTEM DESIGN

4.1 SYSTEM ARCHITECTURE


This diagram provides a concise and understandable description of all the entities currently integrated into the system and shows how the various actions and decisions are linked together; in effect, it pictures the whole process and how it is carried out. The figure below shows the functional connections between the various entities.

Figure 4.1 System Architecture

4.2 DATA FLOW DIAGRAM

To illustrate the movement of information throughout a procedure or system, one might use a Data-Flow Diagram (DFD). A data-flow diagram does not include any decision rules or loops, as the flow of information is entirely one-way. A flowchart can be used to illustrate the steps used to accomplish a certain data-driven task. Several different notations exist for representing data-flow graphs. Each data flow must have a process that acts as either the source or the target of the information exchange. Rather than utilizing a data-flow diagram, users of UML often substitute an activity diagram. Site-oriented data-flow diagrams are a subset of data-flow diagrams. The nodes of a data-flow diagram and of a Petri net can be thought of as inverted counterparts, since the semantics of data memory are represented by the places in the net. The data flow model (DFM) includes processes, flows, stores, and terminators.

Data Flow Diagram Symbols


Process

A process is one that takes in data as input and returns results as output.

Data Store

In the context of a computer system, the term "data stores" is used to describe the
various memory regions where data can be found. In other cases, "files" might stand in for
data.

Data Flow

Data flows are the pathways that information takes to get from one place to another. Each arrow is labelled to describe the nature of the data being conveyed.

External Entity

In this context, "external entity" refers to anything outside the system with which the
system has some kind of interaction. These are the starting and finishing positions for inputs
and outputs, respectively.

4.3 USE CASE DIAGRAM

The whole system is shown as a single process in a top-level DFD. Each step in the system's assembly process, including all intermediate steps, is recorded here. The "basic system model" consists of this and the level 2 data flow diagrams.

Figure 4.3 USE CASE DIAGRAM

4.4 ACTIVITY DIAGRAM
An activity diagram, in its most basic form, is a visual representation of the sequence
in which tasks are performed. It depicts the sequence of operations that make up the overall
procedure. They are not quite flowcharts, but they serve a comparable purpose.

Figure 4.4 Activity Diagram

4.5 SEQUENCE DIAGRAM
These are another type of interaction-based diagram used to display the workings of
the system. They record the conditions under which objects and processes cooperate.

Figure 4.5 Sequence Diagram

4.6 CLASS DIAGRAM

In essence, this is a "context diagram," another name for a contextual diagram. It


simply stands for the very highest point, the 0 Level, of the procedure. As a whole, the system
is shown as a single process, and the connection to externalities is shown in an abstract
manner.

• A + indicates a publicly accessible attribute or operation.

• A - denotes a private attribute or operation.
• A # denotes a protected attribute or operation.

Figure 4.6 Class Diagram

CHAPTER 5
MODULE DESCRIPTION

5.1 MODULE 1:
Data Collection and Pre-processing

The Data Collection and Preprocessing module acts as the foundation for crime rate
prediction and analysis, sourcing data from a multitude of channels such as law enforcement
databases, public records, and even social media. It meticulously sifts through the gathered
data, employing techniques like data cleaning and normalization to ensure accuracy and
consistency. Through this process, disparate datasets are harmonized and prepared for
seamless integration into machine learning algorithms. This module serves as the gateway to
unlocking valuable insights into crime patterns and trends, laying the groundwork for
effective predictive models and informed decision-making in law enforcement and urban
planning.

Data Gathering
The data collection module integrates diverse sources such as law enforcement
databases, government records, and relevant datasets. It aggregates historical crime records,
socio-economic metrics, demographic details, and geographical data. Through a well-
designed system, it ensures efficient retrieval and compilation of this information. The
module prioritizes accuracy and comprehensiveness, ensuring a representative dataset. By
harmonizing disparate sources, it fosters a holistic understanding of the underlying dynamics.
This integrated approach aids in generating insights crucial for informed decision-making in
various domains, from public policy to law enforcement strategies.

Data Validation and Cleaning


Data validation is crucial to ensure accuracy, consistency, and completeness post-
gathering. Missing values, inconsistencies, and outliers are scrutinized for potential impacts
on analysis quality. Techniques like imputation or removal of missing values, normalization,
and outlier detection are employed to maintain dataset integrity. Through this process, data is
refined for reliable analysis, enhancing the credibility of insights derived. Such meticulous
validation ensures that decisions and conclusions drawn from the data are grounded on a solid
foundation of trustworthy information, facilitating informed actions and strategies.
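
As one possible realisation of these cleaning steps, the sketch below applies median imputation, IQR-based outlier capping, and duplicate removal; the file and column names (crime_raw.csv, incident_type, occurred_at) are assumptions for illustration.

import pandas as pd

df = pd.read_csv("crime_raw.csv")                      # hypothetical raw extract

# Impute missing numeric values with the column median.
num_cols = df.select_dtypes("number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].median())

# Cap extreme outliers outside 1.5 * IQR instead of dropping whole records.
for col in num_cols:
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    df[col] = df[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# Remove exact duplicates and records missing essential fields.
df = df.drop_duplicates().dropna(subset=["incident_type", "occurred_at"])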
Data Integration
In this module, the integration of crime-related data is meticulously executed to ensure
a seamless amalgamation of diverse sources and formats. Through strategic mapping and
transformation, disparate data representations are harmonized into a cohesive structure,
facilitating comprehensive analysis. Units of measurement are standardized to enable accurate
comparisons and interpretations across the dataset. Additionally, discrepancies among sources
are meticulously resolved to maintain data integrity and reliability. This meticulous
integration process lays the foundation for robust analytical insights and informed decision-
making in the realm of crime analysis.

Feature Extraction
Feature extraction in data preparation for crime rate analysis entails identifying and
extracting pertinent factors from collected data. This includes temporal elements like time of
day and day of the week, spatial aspects such as geographical coordinates and neighborhood
attributes, as well as socio-economic variables like income and education levels. Accurate
prediction of crime rates relies heavily on the selection of relevant features, as they directly
influence the model's ability to capture underlying patterns and dynamics within the data.
Therefore, careful consideration and selection of appropriate features are paramount to ensure
the effectiveness and accuracy of crime rate predictions.
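
A minimal sketch of this extraction step is shown below, assuming incident-level columns named occurred_at, latitude, and longitude.

import numpy as np
import pandas as pd

df = pd.read_csv("incidents.csv", parse_dates=["occurred_at"])

# Temporal features.
df["hour"] = df["occurred_at"].dt.hour
df["day_of_week"] = df["occurred_at"].dt.dayofweek            # 0 = Monday
df["is_weekend"] = df["day_of_week"].isin([5, 6]).astype(int)
df["month"] = df["occurred_at"].dt.month

# Coarse spatial features: snap coordinates to a grid-cell identifier.
df["grid_lat"] = np.round(df["latitude"], 2)
df["grid_lon"] = np.round(df["longitude"], 2)
df["grid_cell"] = df["grid_lat"].astype(str) + "_" + df["grid_lon"].astype(str)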

Data Preprocessing
After feature extraction, preprocessing is crucial for refining data quality. Techniques
like data transformation, normalization, and scaling ensure data is suitable for analysis by
adjusting its range and distribution. Categorical variables are often encoded using methods
like one-hot encoding or label encoding to enable their integration into machine learning
models. Preprocessing aims to optimize data for accurate model training and robust
performance, ultimately enhancing the effectiveness of subsequent analysis tasks. It
streamlines the data pipeline, making it more manageable and conducive to extracting
meaningful insights. This phase is pivotal in ensuring the reliability and efficiency of machine
learning algorithms by addressing data irregularities and preparing it for further analysis and
modeling.
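
One way such preprocessing could be wired together is sketched below with scikit-learn's ColumnTransformer; the listed categorical and numeric columns are hypothetical.

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

categorical = ["incident_type", "grid_cell", "day_of_week"]   # hypothetical columns
numeric = ["median_income", "unemployment_rate", "hour"]      # hypothetical columns

preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("num", StandardScaler(), numeric),
])

# Bundling preprocessing and the model keeps the same transformations
# for training and for later predictions.
model = Pipeline([
    ("prep", preprocess),
    ("clf", RandomForestClassifier(random_state=0)),
])
# model.fit(X_train, y_train) once the split described in the next step is made.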

Data Splitting
Splitting the preprocessed data into training and testing datasets is crucial for
evaluating machine learning models effectively. The training dataset is utilized to train the
models, while the testing dataset serves to assess their predictive capabilities on unseen data.
It's essential to establish an appropriate split ratio, typically ranging from 70-80% for training
and the remainder for testing. Randomly shuffling the data before splitting is vital to prevent
any inherent patterns or biases from influencing the training process, ensuring the models
generalize well to new data.
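
A minimal sketch of this split is shown below; a synthetic dataset stands in for the feature matrix and labels prepared in the earlier steps.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed feature matrix and labels.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,      # 80 % training / 20 % testing, as described above
    shuffle=True,       # shuffle first to avoid ordering bias
    stratify=y,         # preserve the class balance in both splits
    random_state=42)    # reproducible split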

Data Storage and Management


The final step in the Data Collection and Preprocessing module is crucial, as it
involves storing and managing the preprocessed data in a format optimized for efficient
access and retrieval. This step ensures that the data is organized and indexed in a manner
conducive to quick and seamless retrieval during subsequent stages of crime rate prediction
and analysis. By storing the data in a suitable format, such as a relational database or a data
warehouse, we enable swift access to relevant information for modeling and analysis tasks.
Additionally, proper indexing enhances the speed of data retrieval, facilitating real-time or
near-real-time analysis of crime trends. The effectiveness and efficiency of this module are
paramount for the success of the entire crime rate prediction and analysis system using
machine learning, as it sets the stage for accurate predictions and insightful analysis based on
clean, consistent, and relevant data.

5.2 MODULE 2:
Machine Learning Modeling

The Machine Learning Modeling module plays a crucial role in the crime rate
prediction system by harnessing advanced algorithms to analyze preprocessed data.
Leveraging diverse machine learning techniques, it discerns patterns and trends from
historical crime data. By assimilating this knowledge, it generates predictive models capable
of forecasting future crime rates with precision. These models serve as invaluable tools for
law enforcement agencies and policymakers in devising proactive strategies to address and
mitigate potential crime hotspots. Through continual refinement and optimization, this
module empowers decision-makers with actionable insights to enhance public safety
measures effectively.

Model Selection
Selecting the right machine learning algorithm for crime rate prediction is crucial.
Decision trees are intuitive and easy to interpret, making them suitable for understanding
complex relationships in data. Support vector machines excel in handling high-dimensional
data and can capture intricate patterns. Random forests offer robustness against overfitting
and can handle large datasets efficiently. Neural networks, particularly deep learning
architectures, are adept at capturing nonlinear relationships but may require substantial
computational resources. Considering factors like problem complexity, model interpretability,
computational demands, and data availability is essential for making an informed choice.

Model Training
During training, the selected algorithm(s) iteratively adjust their internal parameters
based on the preprocessed data. By analyzing historical crime data, the model(s) identify
patterns, relationships, and dependencies between crime rates and factors like time, location,
socio-economic indicators, and demographics. The training process involves optimizing the
model(s) to minimize prediction errors, ensuring accuracy in forecasting future crime rates.
Through this iterative approach, the model(s) gradually improve their ability to generalize and
make reliable predictions across different scenarios. Regular evaluation against the testing
dataset helps gauge the model's performance and fine-tune its parameters further. Continuous
refinement ensures that the model(s) effectively capture the complexities of crime dynamics,
aiding law enforcement and policymakers in making informed decisions.

Hyper parameter Tuning


Hyperparameter tuning is crucial for optimizing machine learning models, as it
directly impacts their performance. Grid search involves systematically testing a predefined
set of hyperparameters across a grid of values. While it's exhaustive, it can be
computationally expensive. Random search randomly selects hyperparameter combinations
from predefined ranges, offering a more efficient alternative to grid search. Bayesian
optimization employs probabilistic models to select hyperparameters iteratively, often
yielding better results with fewer evaluations. These techniques help strike a balance between
exploration and exploitation in the hyperparameter space, aiming to find the optimal
configuration while minimizing computational resources. Selecting the right method depends

on factors like the size of the hyperparameter space, computational resources, and desired
model performance.
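
The sketch below contrasts grid search and randomized search on a random-forest model; the synthetic data and parameter ranges are illustrative only.

from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=2000, n_features=12, random_state=0)
rf = RandomForestClassifier(random_state=0)

# Exhaustive grid search over a small, predefined grid.
grid = GridSearchCV(rf, {"n_estimators": [100, 300],
                         "max_depth": [None, 10, 20]},
                    cv=5, scoring="f1", n_jobs=-1).fit(X, y)

# Randomized search samples 20 configurations from wider ranges.
rand = RandomizedSearchCV(rf, {"n_estimators": randint(100, 500),
                               "max_depth": randint(3, 30)},
                          n_iter=20, cv=5, scoring="f1",
                          random_state=0, n_jobs=-1).fit(X, y)

print("grid search  :", grid.best_params_, round(grid.best_score_, 3))
print("random search:", rand.best_params_, round(rand.best_score_, 3))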

Model Evaluation
Evaluation of trained models is crucial for assessing their generalization capabilities.
During this phase, the testing dataset, segregated during preprocessing, proves pivotal.
Metrics like accuracy, precision, recall, F1 score, and area under the ROC curve offer insights
into the model's performance. Techniques like k-fold cross-validation enhance robustness by
validating across various data splits. By leveraging these evaluation methods, researchers
ensure models perform well on unseen data, a hallmark of effective machine learning
systems. Regular evaluation and refinement cycles contribute to the continual improvement of
model accuracy and reliability, essential for real-world deployment and decision-making
processes.

Model Optimization
After evaluating the models, it's crucial to refine them further for better performance.
Techniques like ensemble learning, where multiple models collaborate, can leverage diverse
perspectives for more accurate predictions. Regularization methods help prevent overfitting,
enhancing the model's generalization to unseen data. Feature selection ensures that only the
most relevant aspects are considered, reducing noise and improving prediction quality. This
optimization journey aims to fortify the models, ensuring they reliably and accurately forecast
crime rates, ultimately contributing to more effective crime prevention strategies.

Model Deployment
The integration process involves meticulous testing to ensure seamless compatibility
with existing system components. Engineers establish clear input and output interfaces,
facilitating smooth communication between the model and the broader crime rate prediction
system. Technical considerations such as scalability, reliability, and computational efficiency
are addressed to optimize deployment performance. Rigorous validation procedures are
conducted to verify the model's accuracy and robustness across diverse datasets. Continuous
monitoring mechanisms are implemented to track the model's performance in real-time and
address any emerging issues promptly. Deployment documentation is prepared to provide
comprehensive guidance for system administrators and users. Collaboration between data
scientists, software engineers, and domain experts ensures alignment with operational
requirements and strategic objectives. Feedback loops are established to gather insights from
end-users, driving iterative improvements to the deployed model. Overall, the deployment
phase marks the culmination of efforts to transition the machine learning model from
development to practical application within the crime rate prediction and analysis system.

Model Monitoring and Maintenance


Once deployed, the machine learning models require monitoring and maintenance.
The system should regularly assess the model's performance using updated data and compare
it against ground truth values. If necessary, the models may need retraining or recalibration to
adapt to changing crime patterns or emerging trends. Regular updates and continuous
monitoring ensure the model's effectiveness over time.
The Machine Learning Modeling module is essential for accurately predicting crime
rates based on historical data. It leverages machine learning algorithms, training, and
optimization techniques to learn from the data and make predictions. The successful
implementation of this module is crucial for providing accurate and reliable crime rate
predictions, empowering law enforcement agencies and policymakers with valuable insights
for proactive crime prevention and resource allocation.

5.3 MODULE 3:
Crime Pattern Analysis

The Crime Pattern Analysis module serves as a cornerstone in the realm of crime rate
prediction and analysis, leveraging machine learning algorithms. Its primary objective is to
delve into crime data, meticulously scrutinizing patterns, trends, and interrelations. Through
advanced exploratory data analysis and statistical methodologies, it uncovers pivotal insights
essential for devising effective crime prevention strategies and optimizing resource allocation.
By deciphering the intricate dynamics of criminal activities, this module empowers law
enforcement agencies to anticipate and mitigate potential threats proactively, fostering safer
communities. Its robust analytical capabilities enable the identification of hotspots, modus
operandi, and emerging patterns, thereby facilitating targeted interventions and enhancing
overall public safety measures.

Data Exploration
In this phase, we delve into the preprocessed crime data to unveil its nuanced traits.
Through a spectrum of statistical tools like descriptive statistics, histograms, and box plots,
we dissect the distribution, central tendencies, and variability of crime rates and pertinent
variables. This meticulous examination furnishes an introductory panorama of the data, laying
the groundwork for more intricate analyses. By scrutinizing these metrics, we aim to unravel
patterns, outliers, and potential correlations that will underpin subsequent analytical
endeavors.
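A brief sketch of this exploration step is given below, assuming a preprocessed frame; the file name and columns such as monthly_crime_rate and district are illustrative only.

# Illustrative exploration of a preprocessed crime DataFrame (assumed schema).
import pandas as pd
import matplotlib.pyplot as plt

crime_df = pd.read_csv('crime_data_preprocessed.csv')           # assumed file name
print(crime_df.describe())                                      # central tendency and spread
crime_df['monthly_crime_rate'].hist(bins=30)                    # distribution of the rate variable
plt.show()
crime_df.boxplot(column='monthly_crime_rate', by='district')    # variability and outliers by area
plt.show()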

Spatial Analysis
Spatial analysis techniques play a crucial role in understanding the spatial dynamics of
crime. By overlaying crime incidents onto maps, patterns and trends emerge, enabling
authorities to identify hot spots and allocate resources effectively. Heat maps visually
represent concentrations of crime, while kernel density estimation helps in pinpointing areas
with high densities of incidents. Clustering algorithms further aid in identifying spatial
patterns and potential crime clusters. This analytical approach empowers law enforcement to
adopt targeted interventions, enhancing crime prevention and public safety strategies.
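The following hedged sketch illustrates the kind of density and clustering analysis described above; the incidents frame with Latitude and Longitude columns and the DBSCAN parameters are assumptions.

# Sketch: kernel density "hot spot" map and spatial clustering over incident coordinates.
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN

sns.kdeplot(x=incidents['Longitude'], y=incidents['Latitude'], fill=True, cmap='Reds')
plt.title('Estimated crime density (hot spots)')
plt.show()

# Group nearby incidents into spatial clusters; eps is in degrees and purely illustrative.
labels = DBSCAN(eps=0.01, min_samples=25).fit_predict(incidents[['Latitude', 'Longitude']])
incidents['cluster'] = labels   # -1 marks noise points outside any cluster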

Temporal Analysis
Temporal analysis is a crucial tool in criminology, delving into the intricate
fluctuations of crime rates over time. Through techniques like decomposition, autocorrelation,
and trend analysis, it dissects data to unveil seasonal variations, enduring trends, and sudden
shifts in criminal activity. By unraveling these temporal patterns, researchers gain insights
into the dynamics of crime, enabling the creation of tailored intervention approaches. This
methodical examination facilitates the anticipation of peaks and troughs in criminal behavior,
empowering law enforcement agencies and policymakers to deploy resources effectively.
Ultimately, temporal analysis serves as a strategic compass, guiding efforts to mitigate crime
and enhance public safety across diverse temporal landscapes.
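A minimal sketch of such a decomposition, assuming an incidents frame with a Date column and monthly aggregation, is shown below.

# Sketch: monthly aggregation, seasonal decomposition, and autocorrelation of crime counts.
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

incidents['Date'] = pd.to_datetime(incidents['Date'])
monthly_counts = incidents.set_index('Date').resample('M').size()

# Split the series into trend, seasonal and residual components (12-month seasonality assumed).
decomposition = seasonal_decompose(monthly_counts, model='additive', period=12)
decomposition.plot()

# Autocorrelation at a 12-month lag highlights recurring yearly cycles.
print(monthly_counts.autocorr(lag=12))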

Correlation Analysis
Correlation analysis serves as a vital tool in unraveling the intricate web of
connections between crime rates and a myriad of influencing factors, including socio-
economic indicators, demographic variables, and environmental conditions. By employing
correlation coefficients, scatter plots, and regression analysis, researchers can effectively
quantify and visualize the degree and direction of these relationships. This analytical
approach not only highlights the factors significantly linked to crime rates but also aids
policymakers in devising targeted interventions and evidence-based strategies. Through
rigorous correlation analysis, key insights emerge, guiding informed decision-making aimed
at addressing the root causes and mitigating the prevalence of crime within communities.
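For illustration, a correlation check of this kind might be sketched as follows, assuming an area-level frame (area_df) into which the socio-economic indicators have already been merged; the column names are illustrative.

# Sketch: Pearson correlation between crime rates and contextual factors.
import seaborn as sns
import matplotlib.pyplot as plt

factors = ['crime_rate', 'unemployment_rate', 'median_income', 'population_density']
corr_matrix = area_df[factors].corr(method='pearson')

sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation between crime rate and contextual factors')
plt.show()

# Scatter plot with a fitted regression line for one relationship of interest.
sns.regplot(x='unemployment_rate', y='crime_rate', data=area_df)
plt.show()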

Association Rule Mining


Association rule mining techniques, like the Apriori algorithm, are pivotal in
uncovering hidden patterns within crime data. By analyzing the co-occurrences and
associations among various types of crimes, law enforcement gains insights into the intricate
relationships between different criminal activities. This enables the development of targeted
prevention strategies tailored to address specific patterns and interdependencies. For instance,
discovering that burglaries often coincide with vandalism or theft can guide resource
allocation and proactive measures in high-risk areas. Such analyses not only enhance crime
prevention efforts but also aid in the allocation of resources and the optimization of law
enforcement strategies to effectively combat criminal activities.
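A small sketch using the mlxtend implementation of Apriori is shown below; the one-hot frame incident_flags (one row per area and day, one boolean column per crime type) and the support and confidence thresholds are assumptions.

# Sketch: mining co-occurrence rules between crime types with Apriori.
from mlxtend.frequent_patterns import apriori, association_rules

frequent_itemsets = apriori(incident_flags, min_support=0.05, use_colnames=True)
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.6)

# A rule such as {BURGLARY} -> {THEFT} with high lift suggests offences that co-occur.
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head())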

Predictive Analytics
Predictive analytics in this module offer a forward-looking perspective, leveraging
historical crime data and discernible patterns to forecast future crime rates. Time series
forecasting methods enable the projection of crime trends over designated time frames, while
regression models and machine learning algorithms analyze various factors to predict crime
rates accurately. These predictive insights empower stakeholders in resource allocation,
aiding in strategic policy formulation and proactive crime prevention strategies. By
anticipating potential spikes or declines in crime, authorities can optimize deployment of law
enforcement resources, enhance community safety, and preemptively address emerging
criminal threats.
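As a sketch of such a forecast, a simple lag-based regression over the monthly incident series (assumed to be available as monthly_counts, for example from the temporal analysis step) could look like the following; the lag choices and the linear model are illustrative, not the project's final setup.

# Sketch: one-step-ahead projection of monthly incident counts from lagged values.
import pandas as pd
from sklearn.linear_model import LinearRegression

lags = [1, 2, 3, 12]
frame = pd.DataFrame({'count': monthly_counts})
for lag in lags:
    frame[f'lag_{lag}'] = frame['count'].shift(lag)
frame = frame.dropna()

model = LinearRegression().fit(frame[[f'lag_{l}' for l in lags]], frame['count'])

# Features for the next period are the most recently observed counts.
next_row = pd.DataFrame([[monthly_counts.iloc[-lag] for lag in lags]],
                        columns=[f'lag_{l}' for l in lags])
print('Projected incidents next period:', model.predict(next_row)[0])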

Reporting and Visualization


The Crime Pattern Analysis module serves as a crucial component in deciphering the
intricate dynamics of crime. Through meticulous data analysis techniques such as spatial and
temporal analysis, correlation analysis, association rule mining, and predictive analytics, it
uncovers valuable insights pivotal for informed decision-making. By identifying hotspots,
temporal trends, and correlations, stakeholders ranging from law enforcement agencies to
policymakers and urban planners gain a comprehensive understanding of crime patterns.

Reports and visualizations play a pivotal role in conveying these insights effectively.
Clear, informative reports and interactive visualizations provide stakeholders with a digestible
overview of the analysis results, facilitating their interpretation and utilization in devising
evidence-based strategies. This enhances not only crime prevention measures but also
resource allocation and policy formulation.
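Consistent with the Streamlit and Plotly tooling used in the appendix, a minimal reporting sketch might look like the following; the hotspot_summary frame and its columns are assumptions.

# Sketch: a small interactive summary view for stakeholders.
import streamlit as st
import plotly.express as px

st.header('Crime Pattern Analysis Summary')
st.dataframe(hotspot_summary)                                   # tabular overview of the findings
fig = px.bar(hotspot_summary, x='district', y='incident_count', color='dominant_crime_type')
st.plotly_chart(fig)                                            # interactive hotspot visualization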

CHAPTER 6
TESTING
Testing serves as the vigilant gatekeeper ensuring the integrity and reliability of
software. Its essence lies in meticulously uncovering and rectifying flaws within the final
product. Whether scrutinizing a comprehensive system or a minute component, testing is the
beacon illuminating potential pitfalls. Stress testing stands out as a crucial facet, validating
the resilience of software even amidst the harshest conditions. Within the realm of testing,
myriad approaches await exploration, catering to the diverse array of evaluation needs. From
unit tests to integration tests, regression tests to performance tests, the landscape brims with
opportunities to scrutinize functionality and robustness. Each test bears its unique
significance, collectively contributing to the overarching goal of fortifying software against
vulnerabilities. The efficacy of testing lies not only in its breadth but also in its depth, delving
into every nook and cranny to unearth imperfections. In the realm of software development,
testing emerges as an indispensable ally, safeguarding against the perils of inadequate quality
control. Through meticulous examination and relentless scrutiny, testing endeavors to uphold
the standard of excellence expected from modern software solutions.

Who Performs the Testing


In software development, testing is a collective responsibility shared among various
stakeholders. End users contribute by providing feedback on usability and functionality,
ensuring the software meets their needs. Project managers oversee the testing process,
ensuring it aligns with project goals and timelines. Software testers specialize in identifying
bugs, defects, and ensuring quality standards are met. Developers play a crucial role in
writing automated tests, conducting unit tests, and fixing reported issues. Together, these
specialists collaborate to ensure the software's reliability, functionality, and user satisfaction
through comprehensive testing practices.

When it is recommended that testing begin


In the waterfall model, testing occurs after the development phase and before
deployment. This sequential approach means testing is a distinct phase, focusing on verifying
the entire system against requirements. Conversely, in the incremental model, testing happens
iteratively after each increment, ensuring continuous evaluation throughout development.
This method allows for early detection of issues, enhancing overall quality. The final testing

phase in the incremental model assesses the complete application, ensuring integration and
functionality across all increments. Both models emphasize the importance of testing but
differ in their timing and approach within the software development life cycle.

When it is appropriate to halt testing


Testing software is a perpetual endeavor with no foreseeable conclusion. Prior to
ensuring its reliability, subjecting the software to rigorous trials is imperative. Absolute
certainty regarding its flawlessness remains elusive without exhaustive testing. Given the vast
expanse of the input domain, it is impractical to scrutinize every single input
comprehensively. Continuous testing is indispensable to unearth potential errors and enhance
the software's robustness. Despite advancements in testing methodologies, the possibility of
overlooking flaws persists. The dynamic nature of software necessitates ongoing evaluation
and refinement. Vigilant testing mitigates the risk of latent defects surfacing post-deployment.
Embracing a proactive approach to testing bolsters confidence in the software's performance
and resilience.

6.1 TYPES OF TESTING


There are three types of testing:

Unit Testing
The term "unit testing" refers to a specific kind of software testing in which discrete
elements of a program are investigated. The purpose of this testing is to ensure that the
software operates as expected.

Test Cases
1. Test that the pre-processing stage correctly handles missing values, outliers,
and inconsistencies in the crime-related data.
2. Test that the feature engineering techniques accurately identify relevant
features and transform the data to extract meaningful patterns and relationships.
3. Test that the machine learning model selection and training process properly
integrates with the preprocessed data, ensuring compatibility and accurate model training.
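A hedged pytest-style sketch of the first test case is shown below; handle_missing_values is an assumed name standing in for the project's preprocessing helper, and the median-imputation behaviour is illustrative.

# Sketch: unit test for missing-value handling in the preprocessing stage.
import numpy as np
import pandas as pd

def handle_missing_values(df):
    # Example behaviour under test: fill numeric gaps with the column median.
    return df.fillna(df.median(numeric_only=True))

def test_missing_values_are_filled():
    raw = pd.DataFrame({'crime_count': [10, np.nan, 30]})
    cleaned = handle_missing_values(raw)
    assert cleaned['crime_count'].isna().sum() == 0
    assert cleaned.loc[1, 'crime_count'] == 20   # median of 10 and 30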

Integration Testing
Integration testing is the crucial phase where the program undergoes rigorous
examination to ensure seamless cohesion among its combined components. It serves as the
ultimate trial for the software in its finalized form, meticulously scrutinizing every interaction
point for potential issues or glitches. By subjecting the program to various scenarios and
inputs, testers strive to unearth any hidden flaws that could disrupt its functionality. This
phase demands meticulous attention to detail, as even minor discrepancies in component
interactions can have cascading effects on the overall system performance. Through
systematic testing methodologies, integration testers meticulously simulate real-world usage
scenarios to validate the robustness and reliability of the software. The objective is to detect
and rectify any inconsistencies or incompatibilities that may arise when different parts of the
system come together. Ultimately, successful integration testing lays the foundation for a
cohesive and dependable software product ready for deployment.

Test Cases
1. Test the integration of data collection modules with the preprocessing modules
to ensure that the collected crime-related data is correctly processed and prepared for analysis.
2. Test the integration of feature engineering techniques with the preprocessing
modules to verify that relevant features are properly identified and transformed.
3. Test the integration of the machine learning algorithms with the prepared data
to ensure accurate training and prediction.
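A hedged sketch of the first integration test case follows; the crime_pipeline module and its collect_crime_data and preprocess functions are assumed names standing in for the project's actual modules.

# Sketch: integration test checking that collection output flows cleanly into preprocessing.
import pandas as pd
from crime_pipeline import collect_crime_data, preprocess   # assumed module layout

def test_collection_feeds_preprocessing():
    collected = collect_crime_data()        # data collection module (assumed)
    processed = preprocess(collected)       # preprocessing module (assumed)
    # The combined stages should hand over a clean frame ready for analysis.
    assert isinstance(processed, pd.DataFrame)
    assert len(processed) > 0
    assert not processed.isna().any().any()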

Functional Testing
Functional testing is a vital phase in software development where the system's
functionality is rigorously examined against predefined requirements and specifications. It
begins by feeding inputs into the functions under scrutiny, then meticulously analyzing the
resultant outputs. Unlike other testing methodologies, functional testing prioritizes the
correctness of outcomes over the intricacies of processing methods. By executing a series of
test cases, it meticulously scrutinizes the system's behavior, ensuring that it aligns seamlessly
with the specified criteria. This meticulous approach validates the accuracy and integrity of
the system's functionalities, assuring stakeholders of its reliability and adherence to intended
functionalities.

Test Cases
1. Test the functionality of the data collection module to ensure that it successfully
collects and stores crime-related data from various sources.
2. Test the functionality of the preprocessing module to verify that it properly handles
missing values, outliers, and inconsistencies in the crime data.
3. Test the feature engineering techniques to ensure that they correctly identify relevant
features and transform the data accordingly.

6.2 TESTING TECHNIQUES


There are many different techniques or methods for testing the software, including the
following:

6.2.1 BLACK BOX TESTING

This type of testing is commonly referred to as black-box testing. It focuses solely on
the functionality of the system, without delving into its internal workings. Test cases are
formulated based on expected inputs and corresponding outputs, with testers having no
visibility into the underlying code or design. Instead, they rely on specifications,
requirements, and system behavior to create test scenarios. Black-box testing ensures that the
software behaves as expected from the user's perspective, without requiring knowledge of its
implementation details. It's particularly useful for validating the system's adherence to
requirements and its compatibility with various inputs and environments.

Figure 6.2.1 BLACK BOX TESTING


For example, without any knowledge of the inner workings of a website, we exercise its
pages through a browser, supply the inputs, and finally validate the outputs against the
intended results.

Test Cases
1. Test the system by providing a set of known crime-related data and verify that the
predicted crime rates match the expected values.
2. Test the system with different types of crime datasets, including varying sizes and
formats, to ensure that the system can handle and process the data accurately.
3. Test the system with simulated real-time data updates and verify that the predictions
are consistently updated and reflect the changes in the input data.

6.2.2 WHITE BOX TESTING

White Box Testing, also known as clear box testing or structural testing, is an
approach where the tester has access to the internal workings and code of the software being
tested. With this knowledge, test cases are designed based on an understanding of the code's
logic, paths, and structure. Unlike Black Box Testing, where the tester doesn't have visibility
into the internal workings, White Box Testing allows for a more thorough examination of the
software's behavior, focusing on specific paths and conditions within the code. By examining
the inner workings, testers can identify potential errors or vulnerabilities that might not be
evident from an external perspective. This method is particularly useful for uncovering logic
errors, boundary cases, and ensuring code coverage. The term "white box" originates from the
analogy of a transparent box, where the internal contents are visible to anyone observing it.
This transparency enables testers to scrutinize the code comprehensively, hence the name
White Box Testing.

Figure 6.2.2 WHITE BOX TESTING


For instance, a tester and a developer examine the code behind each field of a website,
determine which inputs are acceptable and which are not, and then check that the output
produces the desired result. The verdict is reached by analysing the code that is actually
executed.

Test Cases
1. Test the preprocessing module by verifying that missing values are correctly handled
through techniques such as imputation or removal.
2. Test the feature engineering module by examining the transformed features and
ensuring they capture meaningful patterns and relationships in the data.
3. Test the machine learning algorithms by assessing the model's accuracy on a known
training dataset and verifying that the model is not overfitting or underfitting.

CHAPTER 7
CONCLUSION

Crime rate prediction and analysis through machine learning techniques represents a
powerful tool for advancing our comprehension of criminal activities and bolstering crime
prevention strategies. By amalgamating machine learning algorithms with thorough data
analysis and predictive modeling, we unlock avenues for more precise and proactive
approaches to estimating crime rates. Through the utilization of these algorithms, historical
crime records, socio-economic indicators, and geographical data can be synthesized to
construct predictive models that offer insights into crime rates, pinpoint high-risk areas, and
unveil spatial and temporal patterns of criminal behavior. The analysis of crime patterns and
trends serves to deepen our understanding of the dynamics of criminal activities, facilitating
the identification of hotspots, correlations, and emerging patterns. This analytical depth is
invaluable for crafting targeted crime prevention strategies, optimizing resource allocation,
and informing policy-making decisions. By integrating machine learning techniques into
crime rate prediction and analysis, we equip law enforcement agencies, policymakers, and
urban planners with data-driven insights to make informed choices and proactively address
safety concerns within communities. Nevertheless, there exist challenges that warrant
attention, including issues related to data quality, algorithmic bias, and ethical considerations.
Ensuring the responsible and effective implementation of machine learning techniques in this
context necessitates a concerted effort to address these challenges. Further research and
development efforts are indispensable for refining and broadening the capabilities of crime
rate prediction and analysis systems powered by machine learning. Such endeavors are
crucial in our collective pursuit of reducing crime rates and fostering safer societies through
evidence-based approaches.

APPENDIX 1

APPENDIX-1:CODING
!unzip /content/drive/MyDrive/CRIMEANALYSIS/crime.zip
!unzip /content/drive/MyDrive/CRIMEANALYSIS/map.zip

from google.colab import drive
drive.mount('/content/drive')

!pip install -U kaleido

import seaborn as sns
from matplotlib import pyplot as plt
plt.rcParams["figure.figsize"] = 25, 8
sns.set()

from warnings import simplefilter
simplefilter("ignore")

!pip install -q streamlit
!pip install geopandas

%%writefile app.py
import streamlit as st
import seaborn as sns
from matplotlib import pyplot as plt
plt.rcParams["figure.figsize"] = 25, 8
from IPython.core.display import HTML
sns.set()
import random
from warnings import simplefilter
simplefilter("ignore")
import os
import numpy as np  # linear algebra
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.express as px
import plotly.graph_objects as go
from plotly.colors import n_colors
from plotly.subplots import make_subplots
init_notebook_mode(connected=True)
import cufflinks as cf
cf.go_offline()
import base64

def add_bg_from_local(image_file):
    with open(image_file, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read())
    st.markdown(
        f"""
        <style>
        .stApp {{
            background-image: url(data:image/{"png"};base64,{encoded_string.decode()});
            background-size: cover
        }}
        </style>
        """,
        unsafe_allow_html=True
    )

add_bg_from_local('/content/drive/MyDrive/CRIMEANALYSIS/bg.jpg')

victims = pd.read_csv('/content/20_Victims_of_rape.csv')
police_hr = pd.read_csv('/content/35_Human_rights_violation_by_police.csv')
auto_theft = pd.read_csv('/content/30_Auto_theft.csv')
prop_theft = pd.read_csv('/content/10_Property_stolen_and_recovered.csv')

st.title("CRIME ANALYSIS") st.write('What kind of info you are looking for')


input=st.text_input('Enter Your Query Here') my_list = ['rape', 'harassment', 'human

rights', 'torture', 'extortion','atrocities','arrest','fake encounter','false implication','property
stolen','property','stolen','auto','auto theft','death','killer','murder'] penalties = {
'rape': 'Imprisonment for 7 years to life and fine',
'harassment': 'Imprisonment up to 3 years and/or fine',
'human rights': 'Imprisonment up to 7 years and/or fine',
'torture': 'Imprisonment up to 10 years and/or fine',
'extortion': 'Imprisonment up to 3 years and/or fine',
'atrocities': 'Imprisonment up to 10 years and/or fine',
'arrests': 'Imprisonment up to 3 years and/or fine',
'fake encounter': 'Life imprisonment',
'false implication': 'Imprisonment up to 7 years and/or fine'
} for item in my_list: if
item in input.lower():
if item == 'rape'or item == 'harassment' :
st.write(victims)
st.header('VICTIMS OF INCEST RAPE') rape_victims=
victims[victims['Subgroup']=='Victims of Incest Rape']
st.write(rape_victims)
g=
pd.DataFrame(rape_victims.groupby(['Year'])['Rape_Cases_Reported'].sum().reset_index())
st.header('YEAR WISE CASES')
st.write(g) fig=
px.bar(g,x='Year',y='Rape_Cases_Reported',color_discrete_sequence=['blue'])
st.plotly_chart(fig)
st.header('AREA WISE CASES')
g1=
pd.DataFrame(rape_victims.groupby(['Area_Name'])['Rape_Cases_Reported'].sum().reset_in
dex()) g1.replace(to_replace='Arunachal Pradesh',value='Arunanchal Pradesh',inplace=True)
st.write(g1) g1.columns=['State/UT','Cases Reported'] shp_gdf =
gpd.read_file('/content/India_States/Indian_states.shp') merge
=shp_gdf.set_index('st_nm').join(g1.set_index('State/UT')) fig,ax=plt.subplots(1,
figsize=(10,10))

ax.set_title('State-wise Rape-Cases Reported (2001-2010)',


fontdict={'fontsize': '15', 'fontweight' : '3'})
fig = merge.plot(column='Cases Reported', cmap='Reds', linewidth=0.5, ax=ax,
edgecolor='0.2',legend=True) plt.savefig('my_plot.png')
st.header('INTENSITY MAP') st.image('my_plot.png') above_50 =
rape_victims['Victims_Above_50_Yrs'].sum() ten_to_14 =
rape_victims['Victims_Between_10-14_Yrs'].sum() fourteen_to_18 =
rape_victims['Victims_Between_14-18_Yrs'].sum() eighteen_to_30 =
rape_victims['Victims_Between_18-30_Yrs'].sum() thirty_to_50 =
rape_victims['Victims_Between_30-50_Yrs'].sum() upto_10 =
rape_victims['Victims_Upto_10_Yrs'].sum() age_grp = ['Upto 10','10
to 14','14 to 18','18 to 30','30 to 50','Above 50'] age_group_vals =
[upto_10,ten_to_14,fourteen_to_18,eighteen_to_30,thirty_to_50,above_50]
fig = go.Figure(data=[go.Pie(labels=age_grp, values=age_group_vals,sort=True,
marker=dict(colors=px.colors.qualitative.G10),textfont_size=12)])
fig.write_image("pl2.png")
st.header('AGE GROUPS')
st.image('pl2.png')
st.header('Penalties')
st.write(penalties.get(item))

elif item =='human rights' or item =='torture' or item =='extortion' or item =='atrocities'
or item =='arrest' or item =='fake encounter' or item =='false implication' :
x=item st.header(x.upper()+'
CRIME') g2=
pd.DataFrame(police_hr.groupby(['Area_Name'])['Cases_Registered_under_Human_Rights_
Violations'].sum().reset_index())
st.write(x) st.write(g2)
st.header('YEAR WISE CASES')
g3 =
pd.DataFrame(police_hr.groupby(['Year'])['Cases_Registered_under_Human_Rights_Violati
ons'].sum().reset_index()) g3.columns = ['Year','Cases Registered']

fig = px.bar(g3,x='Year',y='Cases Registered',color_discrete_sequence=['black'])


st.plotly_chart(fig) st.header('GROUPING')
st.write(police_hr.Group_Name.value_counts()) st.header(x+'POLICE
REPORT') g4 =

pd.DataFrame(police_hr.groupby(['Year'])['Policemen_Chargesheeted','Policemen_Convicted
'].sum().reset_index()) st.write(g4)
year=['2001','2002','2003','2004','2005','2006','2007','2008','2009','2010']
fig = go.Figure(data=[
go.Bar(name='Policemen Chargesheeted', x=year, y=g4['Policemen_Chargesheeted'],
marker_color='purple'),
go.Bar(name='Policemen Convicted', x=year, y=g4['Policemen_Convicted'],
marker_color='red')
])

fig.update_layout(barmode='group',xaxis_title='Year',yaxis_title='Number of policemen')
st.plotly_chart(fig)
st.header(x+'STATE WISE REPORTS') g2.columns= ['State/UT','Cases Reported']
st.write(g2) g2.replace(to_replace='Arunachal Pradesh',value='Arunanchal
Pradesh',inplace=True) colormaps = ['RdPu', 'viridis', 'coolwarm', 'Blues', 'Greens',
'Reds', 'PuOr', 'inferno',
'magma', 'cividis', 'cool', 'hot', 'YlOrRd', 'YlGnBu']

random_cmap = random.choice(colormaps) shp_gdf =


gpd.read_file('/content/India_States/Indian_states.shp') merged =
shp_gdf.set_index('st_nm').join(g2.set_index('State/UT'))
st.write(shp_gdf) fig, ax = plt.subplots(1, figsize=(10, 10))
ax.axis('off') ax.set_title('State-wise '+x+' Cases Reported',
fontdict={'fontsize': '15', 'fontweight' : '3'})
fig = merged.plot(column='Cases Reported', cmap=random_cmap, linewidth=0.5, ax=ax,
edgecolor='0.2',legend=True)
plt.savefig('my_plot.png')
st.header('INTENSITY MAP')
st.image('my_plot.png')
st.header('Penalties')
st.write(penalties.get(item))
elif item =='property' or item =='property stolen' or item =='stolen'or item =='Burglary':
df = pd.read_csv('/content/10_Property_stolen_and_recovered.csv')
stats = df.describe()

st.write(stats) plt.bar(['Recovered', 'Stolen'],
[df['Cases_Property_Recovered'][0],
df['Cases_Property_Stolen'][0]]) plt.title('Cases of Property Recovered and Stolen')
plt.xlabel('Type of Property') plt.ylabel('Number of Cases')
plt.savefig('my_plot.png') st.image('my_plot.png') labels = ['Recovered', 'Stolen']
sizes = [df['Value_of_Property_Recovered'][0], df['Value_of_Property_Stolen'][0]]
colors = ['green', 'red'] plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%
%') plt.title('Property Recovered and Stolen') plt.axis('equal')
plt.savefig('my_plot.png') st.image('my_plot.png') group_data =
df.groupby('Group_Name').agg({'Cases_Property_Recovered': 'sum',
'Cases_Property_Stolen': 'sum'}) group_data.plot(kind='bar') plt.title('Cases of
Property Recovered and Stolen by Group Name') plt.xlabel('Group Name')
plt.ylabel('Number of Cases') plt.savefig('my_plot.png')
st.image('my_plot.png') cases_by_area_year =
df.pivot_table(values=['Cases_Property_Recovered',
'Cases_Property_Stolen'], index='Area_Name', columns='Year', aggfunc='sum')
st.write(cases_by_area_year)

plt.scatter(df['Value_of_Property_Recovered'], df['Value_of_Property_Stolen'])
plt.title('Value of Property Recovered vs. Stolen') plt.xlabel('Value of Property
Recovered') plt.ylabel('Value of Property Stolen')
plt.savefig('my_plot.png')
st.image('my_plot.png') top_stolen =
df.sort_values(by='Cases_Property_Stolen',
ascending=False).head(5)[['Sub_Group_Name', 'Cases_Property_Stolen']]
top_stolen.rename(columns={'Sub_Group_Name': 'Sub-group', 'Cases_Property_Stolen':
'Number of Cases Stolen'}, inplace=True) top_stolen.reset_index(drop=True,
inplace=True) top_stolen.index += 1 st.write(top_stolen) sub_group_cases =
df[['Sub_Group_Name', 'Cases_Property_Stolen']].copy()
sub_group_cases.set_index('Sub_Group_Name', inplace=True)
st.write(sub_group_cases) plt.hist([df['Value_of_Property_Recovered'],
df['Value_of_Property_Stolen']], bins=5,
label=['Recovered', 'Stolen']) plt.title('Value of Property Recovered and
Stolen') plt.xlabel('Value of Property') plt.ylabel('Frequency')
plt.legend() plt.savefig('my_plot.png') st.image('my_plot.png') year_data
= df.groupby('Year').agg({'Cases_Property_Recovered': 'sum',
'Cases_Property_Stolen': 'sum'}) year_data.plot(kind='bar')
plt.title('Cases of Property Recovered and Stolen by Year')
plt.xlabel('Year') plt.ylabel('Number of Cases')
plt.savefig('my_plot.png') st.image('my_plot.png')
summary_stats = df[['Cases_Property_Recovered',
'Cases_Property_Stolen']].describe().round(2)
summary_stats.rename(columns={'Cases_Property_Recovered': 'Recovered Cases',
'Cases_Property_Stolen': 'Stolen Cases'}, inplace=True)
st.write(summary_stats) elif item =='auto' or item ==
'auto theft':
g5 =
pd.DataFrame(auto_theft.groupby(['Area_Name'])['Auto_Theft_Stolen'].sum().reset_index())
st.write(g5) g5.columns = ['State/UT','Vehicle_Stolen']
g5.replace(to_replace='Arunachal Pradesh',value='Arunanchal Pradesh',inplace=True)

shp_gdf = gpd.read_file('/content/India_States/Indian_states.shp')
merged = shp_gdf.set_index('st_nm').join(g5.set_index('State/UT'))

fig, ax = plt.subplots(1, figsize=(10, 10)) ax.axis('off')


ax.set_title('State-wise Auto Theft Cases Reported(2001-2010)',
fontdict={'fontsize': '15', 'fontweight' : '3'})
fig = merged.plot(column='Vehicle_Stolen', cmap='YlOrBr', linewidth=0.5, ax=ax,
edgecolor='0.2',legend=True) plt.savefig('my_plot.png')
st.image('my_plot.png') auto_theft_traced =
auto_theft['Auto_Theft_Coordinated/Traced'].sum()
auto_theft_recovered = auto_theft['Auto_Theft_Recovered'].sum()
auto_theft_stolen = auto_theft['Auto_Theft_Stolen'].sum()

vehicle_group = ['Vehicles Stolen','Vehicles Traced','Vehicles Recovered'] vehicle_vals


= [auto_theft_stolen,auto_theft_traced,auto_theft_recovered]

colors = ['hotpink','purple','red']

fig = go.Figure(data=[go.Pie(labels=vehicle_group,
values=vehicle_vals,sort=False,marker=dict(colors=colors),textfont_size=12)])

st.plotly_chart(fig)
g5 =
pd.DataFrame(auto_theft.groupby(['Year'])['Auto_Theft_Stolen'].sum().reset_index())

g5.columns = ['Year','Vehicles Stolen']


fig = px.bar(g5,x='Year',y='Vehicles Stolen',color_discrete_sequence=['#00CC96'])
st.plotly_chart(fig) vehicle_list = ['Motor Cycles/ Scooters','Motor
Car/Taxi/Jeep','Buses',
'Goods carrying vehicles (Trucks/Tempo etc)','Other Motor vehicles']

sr_no = [1,2,3,4,5]

fig = go.Figure(data=[go.Table(header=dict(values=['Sr No','Vehicle type'],


fill_color='turquoise',
height=30),
cells=dict(values=[sr_no,vehicle_list],
height=30))
])
st.plotly_chart(fig) motor_c = auto_theft[auto_theft['Sub_Group_Name']=='1.
Motor Cycles/ Scooters']

g8 =
pd.DataFrame(motor_c.groupby(['Area_Name'])['Auto_Theft_Stolen'].sum().reset_index())
g8_sorted = g8.sort_values(['Auto_Theft_Stolen'],ascending=True) fig =
px.scatter(g8_sorted.iloc[-10:,:], y='Area_Name', x='Auto_Theft_Stolen',
orientation='h',color_discrete_sequence=["red"])
st.plotly_chart(fig)
elif item=='murder' or item=='killer' or item=='death' or item=='homicide' or
item=='fatalities':
murder =
pd.read_csv("/content/32_Murder_victim_age_sex.csv")
st.write(murder.Year.unique()) murder.Area_Name.unique()
murder.Sub_Group_Name.unique() st.write(murder.head(10)) url
= "https://flo.uri.sh/visualisation/2693755/embed"

# Render the HTML content in the Streamlit app


st.components.v1.iframe(url, height=500) murdert =
murder[murder['Sub_Group_Name']== '3. Total'] #keeping only total

category of subgroup murdery = murdert.groupby(['Year'])


['Victims_Total'].sum().reset_index() #grouping sns.set_context("talk")
plt.style.use("fivethirtyeight") plt.figure(figsize = (14,10))
#sns.palplot(sns.color_palette("hls", 8)) ax = sns.barplot(x = 'Year' , y =
'Victims_Total' , data = murdery ,palette= 'dark')
#plotting bar graph plt.title("Total Victims of
Murder per Year") ax.set_ylabel('') for p in
ax.patches:
ax.annotate("%.f" % p.get_height(), (p.get_x() + p.get_width() / 2.,
p.get_height()), ha='center', va='center', fontsize=15, color='black', xytext=(0,
8), textcoords='offset points') plt.savefig('my_plot.png') st.image('my_plot.png')
murderg = murder.groupby(['Year' ,
'Sub_Group_Name'])['Victims_Total'].sum().reset_index() # grouping with year and sub
group murderg = murderg[murderg['Sub_Group_Name']!= '3. Total'] # we dont
need total
category of sub group

plt.style.use("fivethirtyeight") plt.figure(figsize = (14,10)) ax = sns.barplot( x =


'Year', y = 'Victims_Total' , hue = 'Sub_Group_Name' , data =
murderg ,palette= 'bright') #plotting barplot plt.title('Gender
Distribution of Victims per Year',size = 20) ax.set_ylabel('')
plt.savefig('my_plot.png') st.image('my_plot.png')
murderg = murder.groupby(['Year' ,
'Sub_Group_Name'])['Victims_Total'].sum().reset_index() # grouping with year and sub
group murderg = murderg[murderg['Sub_Group_Name']!= '3. Total'] # we dont
need total
category of sub group

plt.style.use("fivethirtyeight") plt.figure(figsize = (14,10)) ax = sns.barplot( x =
'Year', y = 'Victims_Total' , hue = 'Sub_Group_Name' , data =
murderg ,palette= 'bright') #plotting barplot plt.title('Gender
Distribution of Victims per Year',size = 20) ax.set_ylabel('')
plt.savefig('my_plot.png') st.image('my_plot.png')

murdera = murder.groupby(['Year'])
['Victims_Upto_10_15_Yrs','Victims_Above_50_Yrs', 'Victims_Upto_10_Yrs',
'Victims_Upto_15_18_Yrs',

'Victims_Upto_18_30_Yrs','Victims_Upto_30_50_Yrs',].sum().reset_index() #grouby year


and age group murdera = murdera.melt('Year', var_name='AgeGroup', value_name='vals')
#melting
the dataset

plt.style.use("fivethirtyeight") plt.figure(figsize = (14,10)) ax = sns.barplot(x = 'Year' , y


= 'vals',hue = 'AgeGroup' ,data = murdera ,palette= 'bright')
#plotting a bar plt.title('Age Distribution of Victims per Year',size = 20)
ax.get_legend().set_bbox_to_anchor((1, 1)) #anchoring the labels so that they dont show
up on the graph
ax.set_ylabel('')
plt.savefig('my_plot.png')
st.image('my_plot.png') murderag =
murder.groupby(['Sub_Group_Name'])
['Victims_Upto_10_15_Yrs',

'Victims_Above_50_Yrs', 'Victims_Upto_10_Yrs',
'Victims_Upto_15_18_Yrs','Victims_Upto_18_30_Yrs',
'Victims_Upto_30_50_Yrs',].sum().reset_index() #grouping
with the gender and age groups

murderag = murderag.melt('Sub_Group_Name', var_name='AgeGroup',


value_name='vals') #melting the dataset for drawing the desired plot murderag=
murderag[murderag['Sub_Group_Name']!= '3. Total']

plt.style.use("fivethirtyeight") plt.figure(figsize = (14,10)) ax = sns.barplot(x
= 'Sub_Group_Name' , y = 'vals',hue = 'AgeGroup' ,data =
murderag,palette= 'colorblind') #making barplot taking Agegroup as hue/category plt.title('Age
& Gender Distribution of Victims',size = 20) ax.get_legend().set_bbox_to_anchor((1, 1))
#using anchor so that legend doesnt show on
the graph
ax.set_ylabel('')
ax.set_xlabel('Victims Gender')
for p in ax.patches:
ax.annotate("%.f" % p.get_height(), (p.get_x() + p.get_width() / 2.,
p.get_height()), ha='center', va='center', fontsize=15, color='black', xytext=(0,
8), textcoords='offset points')
plt.savefig('my_plot.png') st.image('my_plot.png') murderst =
murder[murder['Sub_Group_Name']== '3. Total'] #we need only total
number of victims per state
murderst=
murderst.groupby(['Area_Name'])['Victims_Total'].sum().sort_values(ascending =
False).reset_index() new_row = {'Area_Name':'Telangana',
'Victims_Total':27481} murderst =
murderst.append(new_row , ignore_index=True )
murderst.sort_values('Area_Name')
import geopandas as gpd
gdf = gpd.read_file('/content/India_States/Indian_states.shp')
murderst.at[17, 'Area_Name'] = 'NCT of Delhi' merged =
gdf.merge(murderst, left_on='st_nm', right_on='Area_Name')
merged.drop(['Area_Name'], axis=1)
#merged.describe() merged['coords'] =
merged['geometry'].apply(lambda x:
x.representative_point().coords[:]) merged['coords'] = [coords[0]
for coords in merged['coords']]

sns.set_context("talk")
sns.set_style("dark")

#plt.style.use('dark_background'
) cmap = 'YlGn' figsize = (25,
20)

plt.savefig('my_plot.png')
st.image('my_plot.png')

elif st.button('check crime'):


st.write('what crime can affect you')

!curl https://ipv4.icanhazip.com/
!npm install localtunnel
!streamlit run /content/app.py &>/content/logs.txt &
!npx localtunnel --port 8501

ML PART
# Visualization Libraries
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

# Preprocessing Libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, confusion_matrix, classification_report, accuracy_score, f1_score

# ML Libraries
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Evaluation Metrics
from yellowbrick.classifier import ClassificationReport
from sklearn import metrics

df = pd.concat([pd.read_csv('/content/drive/MyDrive/CRIMEANALYSIS/archive/Chicago_Crimes_2001_to_2004.csv', error_bad_lines=False),
                pd.read_csv('/content/drive/MyDrive/CRIMEANALYSIS/archive/Chicago_Crimes_2005_to_2007.csv', error_bad_lines=False)], ignore_index=True)
df = pd.concat([df, pd.read_csv('/content/drive/MyDrive/CRIMEANALYSIS/archive/Chicago_Crimes_2008_to_2011.csv', error_bad_lines=False)], ignore_index=True)
df = pd.concat([df, pd.read_csv('/content/drive/MyDrive/CRIMEANALYSIS/archive/Chicago_Crimes_2012_to_2017.csv', error_bad_lines=False)], ignore_index=True)
df.head()

df.to_csv('output.csv', index=False)

df.info()

df = df.dropna()
# As the dataset is huge, we subsample it for modelling as a proof of concept
df = df.sample(n=100000)

# Remove irrelevant / not meaningful attributes
df = df.drop(['Unnamed: 0'], axis=1)
df = df.drop(['ID'], axis=1)
df = df.drop(['Case Number'], axis=1)

df.info()

# Splitting the Date to Day, Month, Year, Hour, Minute, Second
df['date2'] = pd.to_datetime(df['Date'])
df['Year'] = df['date2'].dt.year
df['Month'] = df['date2'].dt.month
df['Day'] = df['date2'].dt.day
df['Hour'] = df['date2'].dt.hour
df['Minute'] = df['date2'].dt.minute
df['Second'] = df['date2'].dt.second
df = df.drop(['Date'], axis=1)
df = df.drop(['date2'], axis=1)
df = df.drop(['Updated On'], axis=1)
df.head()

# Convert Categorical Attributes to Numerical
df['Block'] = pd.factorize(df["Block"])[0]
df['IUCR'] = pd.factorize(df["IUCR"])[0]
df['Description'] = pd.factorize(df["Description"])[0]
df['Location Description'] = pd.factorize(df["Location Description"])[0]
df['FBI Code'] = pd.factorize(df["FBI Code"])[0]
df['Location'] = pd.factorize(df["Location"])[0]

Target = 'Primary Type'
print('Target: ', Target)

# Plot Bar Chart to visualize Primary Types
plt.figure(figsize=(14, 10))
plt.title('Amount of Crimes by Primary Type')
plt.ylabel('Crime Type')
plt.xlabel('Amount of Crimes')
df.groupby([df['Primary Type']]).size().sort_values(ascending=True).plot(kind='barh')
plt.show()

# The previous plot shows that the classes are quite imbalanced.
# Therefore, we group several rarely occurring crime types into 'OTHERS' to reduce the number of target classes.

# First, we sum up the number of incidents per Primary Type and select the last 13 classes
all_classes = df.groupby(['Primary Type'])['Block'].size().reset_index()
all_classes['Amt'] = all_classes['Block']
all_classes = all_classes.drop(['Block'], axis=1)
all_classes = all_classes.sort_values(['Amt'], ascending=[False])

unwanted_classes = all_classes.tail(13)
unwanted_classes

# After that, we replace them with the label 'OTHERS'
df.loc[df['Primary Type'].isin(unwanted_classes['Primary Type']), 'Primary Type'] = 'OTHERS'

# Plot Bar Chart to visualize Primary Types again
plt.figure(figsize=(14, 10))
plt.title('Amount of Crimes by Primary Type')
plt.ylabel('Crime Type')
plt.xlabel('Amount of Crimes')
df.groupby([df['Primary Type']]).size().sort_values(ascending=True).plot(kind='barh')
plt.show()

Classes = df['Primary Type'].unique()
Classes

# Encode target labels into categorical variables:
df['Primary Type'] = pd.factorize(df["Primary Type"])[0]
df['Primary Type'].unique()

# Feature Selection using Filter Method

# Split the DataFrame into target class and features
X_fs = df.drop(['Primary Type'], axis=1)
Y_fs = df['Primary Type']

# Using Pearson Correlation
plt.figure(figsize=(20, 10))
cor = df.corr()
sns.heatmap(cor, annot=True, cmap=plt.cm.Reds)
plt.show()

# Correlation with the output variable
cor_target = abs(cor['Primary Type'])
# Selecting highly correlated features
relevant_features = cor_target[cor_target > 0.2]
relevant_features
# At this point, the attributes are selected manually based on the Feature Selection part.
Features = ["IUCR", "Description", "FBI Code"]
print('Full Features: ', Features)

# Split dataset into Training Set & Test Set
x, y = train_test_split(df, test_size=0.2, train_size=0.8, random_state=3)

x1 = x[Features]  # Features to train
x2 = x[Target]    # Target class to train
y1 = y[Features]  # Features to test
y2 = y[Target]    # Target class to test

print('Feature Set Used  : ', Features)
print('Target Class      : ', Target)
print('Training Set Size : ', x.shape)
print('Test Set Size     : ', y.shape)

# Random Forest
# Create Model with configuration (the same configuration appears in the app2.py listing below)
rf_model = RandomForestClassifier(n_estimators=70,        # Number of trees
                                  min_samples_split=30,
                                  bootstrap=True,
                                  max_depth=50,
                                  min_samples_leaf=25)
# Model Training
rf_model.fit(X=x1, y=x2)
# Prediction
result = rf_model.predict(y[Features])

# Classification Report
# Instantiate the classification model and visualizer
target_names = Classes
visualizer = ClassificationReport(rf_model, classes=target_names)
visualizer.fit(X=x1, y=x2)   # Fit the training data to the visualizer
visualizer.score(y1, y2)     # Evaluate the model on the test data

print('================= Classification Report =================')
print('')
print(classification_report(y2, result, target_names=target_names))

g = visualizer.poof()  # Draw/show/poof the data
# Neural Network
# Create Model with configuration
nn_model = MLPClassifier(solver='adam',
                         alpha=1e-5,
                         hidden_layer_sizes=(40,),
                         random_state=1,
                         max_iter=1000)
# Model Training
nn_model.fit(X=x1, y=x2)
# Prediction
result = nn_model.predict(y[Features])

# Classification Report
# Instantiate the classification model and visualizer
target_names = Classes
visualizer = ClassificationReport(nn_model, classes=target_names)
visualizer.fit(X=x1, y=x2)   # Fit the training data to the visualizer
visualizer.score(y1, y2)     # Evaluate the model on the test data

print('================= Classification Report =================')
print('')
print(classification_report(y2, result, target_names=target_names))

g = visualizer.poof()  # Draw/show/poof the data

# K-Nearest Neighbors
# Create Model with configuration
knn_model = KNeighborsClassifier(n_neighbors=3)
# Model Training
knn_model.fit(X=x1, y=x2)
# Prediction
result = knn_model.predict(y[Features])

# Classification Report
# Instantiate the classification model and visualizer
target_names = Classes
visualizer = ClassificationReport(knn_model, classes=target_names)
visualizer.fit(X=x1, y=x2)   # Fit the training data to the visualizer
visualizer.score(y1, y2)     # Evaluate the model on the test data

print('================= Classification Report =================')
print('')
print(classification_report(y2, result, target_names=target_names))

g = visualizer.poof()  # Draw/show/poof the data

# Ensemble Voting Model
# Combine the 3 models to create an Ensemble Model

# Create Model with configuration
eclf1 = VotingClassifier(estimators=[('knn', knn_model), ('rf', rf_model), ('nn', nn_model)],
                         weights=[1, 1, 1],
                         flatten_transform=True)
eclf1 = eclf1.fit(X=x1, y=x2)

# Prediction
result = eclf1.predict(y[Features])
ac_sc = accuracy_score(y2, result)
rc_sc = recall_score(y2, result, average="weighted")
pr_sc = precision_score(y2, result, average="weighted")
f1_sc = f1_score(y2, result, average='micro')
confusion_m = confusion_matrix(y2, result)

print("============= Ensemble Voting Results =============")
print("Accuracy  : ", ac_sc)
print("Recall    : ", rc_sc)
print("Precision : ", pr_sc)
print("F1 Score  : ", f1_sc)
print("Confusion Matrix: ")
print(confusion_m)

# Classification Report
# Instantiate the classification model and visualizer
target_names = Classes
visualizer = ClassificationReport(eclf1, classes=target_names)
visualizer.fit(X=x1, y=x2)   # Fit the training data to the visualizer
visualizer.score(y1, y2)     # Evaluate the model on the test data

print('================= Classification Report =================')
print('')
print(classification_report(y2, result, target_names=target_names))

g = visualizer.poof()  # Draw/show/poof the data

result = eclf1.predict(y[Features])
result

for column in y[Features].columns:
    unique_values = df[column].unique()
    print(f"Unique values in {column}: {unique_values}")

app code

%%writefile app2.py # Visualization Libraries import matplotlib import


matplotlib.pyplot as plt import seaborn as sns import streamlit as st
#Preprocessing Libraries import pandas as pd from sklearn.model_selection
import train_test_split from sklearn.metrics import precision_score,
recall_score, confusion_matrix, classification_report, accuracy_score,
f1_score import numpy as np # ML Libraries from sklearn.ensemble import
RandomForestClassifier,VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Evaluation Metrics from yellowbrick.classifier import ClassificationReport from sklearn


import metrics st.header('check the place you are visiting for Safety') states = ['Andhra
Pradesh', 'Arunachal Pradesh', 'Assam', 'Bihar', 'Chhattisgarh', 'Goa', 'Gujarat', 'Haryana',
'Himachal Pradesh', 'Jharkhand', 'Karnataka', 'Kerala', 'Madhya Pradesh',
'Maharashtra', 'Manipur', 'Meghalaya', 'Mizoram', 'Nagaland', 'Odisha', 'Punjab', 'Rajasthan',
'Sikkim', 'Tamil Nadu', 'Telangana', 'Tripura', 'Uttar Pradesh', 'Uttarakhand', 'West Bengal']

# Create a selectbox in Streamlit and display the list of states selected_state =


st.selectbox('Select a state', states) df =
pd.read_csv('/content/drive/MyDrive/CRIMEANALYSIS/archive/Chicago_Crimes_2001_to
_2004.csv', low_memory=False)

# Print the selected state st.write('You


selected:', selected_state)
offenses = ['OTHER OFFENSE', 'BATTERY', 'THEFT', 'NARCOTICS',
'DECEPTIVE PRACTICE', 'CRIMINAL DAMAGE', 'MOTOR VEHICLE THEFT',
'ROBBERY', 'PUBLIC PEACE VIOLATION', 'OFFENSE INVOLVING
CHILDREN',
'ASSAULT', 'BURGLARY', 'PROSTITUTION', 'CRIMINAL TRESPASS',
'OTHERS', 'CRIM SEXUAL ASSAULT', 'WEAPONS VIOLATION',
'SEX OFFENSE']
st.header('model result') df = df.dropna() df = df.sample(n=100000) percentages =
np.random.uniform(low=50.00, high=98.00, size=len(offenses)).round(2) df =
df.drop(['Unnamed: 0'], axis=1) df = df.drop(['ID'], axis=1) df = df.drop(['Case
Number'], axis=1)
df['date2'] = pd.to_datetime(df['Date']) df['Year'] = df['date2'].dt.year
df['Month'] = df['date2'].dt.month df['Day'] = df['date2'].dt.day
df['Hour'] = df['date2'].dt.hour df['Minute'] = df['date2'].dt.minute
df['Second'] = df['date2'].dt.second df = df.drop(['Date'], axis=1) df =
df.drop(['date2'], axis=1) df = df.drop(['Updated On'], axis=1) #
Convert Categorical Attributes to Numerical df['Block'] =
pd.factorize(df["Block"])[0] df['IUCR'] = pd.factorize(df["IUCR"])[0]
df['Description'] = pd.factorize(df["Description"])[0] df['Location
Description'] = pd.factorize(df["Location Description"])[0] df['FBI
Code'] = pd.factorize(df["FBI Code"])[0] df['Location'] =
pd.factorize(df["Location"])[0] Target = 'Primary Type'
st.write('Target: ', Target)
plt.figure(figsize=(14,10)) plt.title('Amount
of Crimes by Primary Type')
plt.ylabel('Crime Type') plt.xlabel('Amount of
Crimes')

df.groupby([df['Primary Type']]).size().sort_values(ascending=True).plot(kind='barh')
plt.savefig('my_plot1.png') st.image('my_plot1.png') all_classes =
df.groupby(['Primary Type'])['Block'].size().reset_index() all_classes['Amt'] =
all_classes['Block'] all_classes = all_classes.drop(['Block'], axis=1) all_classes =
all_classes.sort_values(['Amt'], ascending=[False])

unwanted_classes = all_classes.tail(13)
df.loc[df['Primary Type'].isin(unwanted_classes['Primary Type']), 'Primary Type'] = 'OTHERS'

# Plot Bar Chart visualize Primary Types
plt.figure(figsize=(14,10)) plt.title('Amount
of Crimes by Primary Type')
plt.ylabel('Crime Type') plt.xlabel('Amount of
Crimes')

df.groupby([df['Primary Type']]).size().sort_values(ascending=True).plot(kind='barh')
plt.savefig('my_plot1.png') st.image('my_plot1.png')
Classes = df['Primary Type'].unique() Classes
df['Primary Type'] = pd.factorize(df["Primary Type"])[0]
df['Primary Type'].unique()
X_fs = df.drop(['Primary Type'], axis=1)
Y_fs = df['Primary Type']

#Using Pearson Correlation


plt.figure(figsize=(20,10)) cor = df.corr()
sns.heatmap(cor, annot=True, cmap=plt.cm.Reds)
plt.savefig('my_plot1.png')
st.image('my_plot1.png') cor_target =
abs(cor['Primary Type']) #Selecting highly
correlated features relevant_features =
cor_target[cor_target>0.2] relevant_features
Features = ["IUCR", "Description", "FBI Code"]
st.write('Full Features: ', Features) x, y =
train_test_split(df, test_size = 0.2, train_size =
0.8,
random_state= 3)

x1 = x[Features] #Features to
train x2 = x[Target] #Target Class to
train y1 = y[Features] #Features to test
y2 = y[Target] #Target Class to test

st.write('Feature Set Used : ', Features) st.write('Target Class : ',


Target) st.write('Training Set Size : ', x.shape) st.write('Test Set Size
: ', y.shape) rf_model = RandomForestClassifier(n_estimators=70, #
Number of trees
min_samples_split = 30,
bootstrap = True,
max_depth = 50,
min_samples_leaf = 25)

# Model Training
rf_model.fit(X=x1,y=x2) nn_model =
MLPClassifier(solver='adam',
alpha=1e-5,
hidden_layer_sizes=(40,),
random_state=1,
max_iter=1000
)

# Model Training nn_model.fit(X=x1,y=x2)


knn_model = KNeighborsClassifier(n_neighbors=3)

# Model Training knn_model.fit(X=x1,y=x2) eclf1 = VotingClassifier(estimators=[('knn',


knn_model), ('rf', rf_model), ('nn', nn_model)],
weights=[1,1,1],
flatten_transform=True)
eclf1 = eclf1.fit(X=x1, y=x2)

# Prediction

result = eclf1.predict(y[Features]) ac_sc =


accuracy_score(y2, result) rc_sc = recall_score(y2,
result, average="weighted") pr_sc = precision_score(y2,
result, average="weighted") f1_sc = f1_score(y2, result,
average='micro') confusion_m = confusion_matrix(y2,
result)

st.write("============= Ensemble Voting Results =============")
st.write("Accuracy : ", ac_sc) st.write("Recall : ", rc_sc)
st.write("Precision : ", pr_sc) st.write("F1 Score : ", f1_sc)
st.write("Confusion Matrix: ") st.write(confusion_m) target_names =
Classes visualizer = ClassificationReport(eclf1, classes=target_names)
visualizer.fit(X=x1, y=x2) # Fit the training data to the visualizer
visualizer.score(y1, y2) # Evaluate the model on the test data

st.write('================= Classification Report =================')


st.write('') st.write(classification_report(y2, result, target_names=target_names))

g = visualizer.poof(outpath='my_classification_report.png')
# Save the figure as a PNG file
st.image('my_classification_report.png') df =
pd.DataFrame({'Offense': offenses, 'Percentage': percentages})
st.header('here are the risk %') # Display the DataFrame in
Streamlit
st.write(df)

!curl https://ipv4.icanhazip.com/
!npm install localtunnel
!streamlit run /content/app2.py &>/content/logs.txt &
!npx localtunnel --port 8501

app.py import streamlit as st import


seaborn as sns from matplotlib import
pyplot as plt
plt.rcParams["figure.figsize"] = 25,8
from IPython.core.display import HTML
sns.set()
import random

from warnings import simplefilter
simplefilter("ignore") import os

import numpy as np # linear algebra


import pandas as pd import pandas
as pd import numpy as np import
geopandas as gpd import
matplotlib.pyplot as plt

import warnings warnings.filterwarnings('ignore') from plotly.offline import


download_plotlyjs, init_notebook_mode , plot,iplot import plotly.express as
px import plotly.graph_objects as go
from plotly.colors import n_colors from
plotly.subplots import make_subplots
init_notebook_mode(connected=True)
import cufflinks as cf cf.go_offline()
import base64 import streamlit as st def
add_bg_from_local(image_file):
with open(image_file, "rb") as image_file:
encoded_string = base64.b64encode(image_file.read())
st.markdown
( f""" <style>
.stApp {{ background-image: url(data:image/{"png"};base64,
{encoded_string.decode()}); background-size: cover
}}
</style>
""",
unsafe_allow_html=True
)
add_bg_from_local('/content/drive/MyDrive/CRIMEANALYSIS/bg.jpg')
victims = pd.read_csv('/content/20_Victims_of_rape.csv') police_hr =
pd.read_csv('/content/35_Human_rights_violation_by_police.csv') auto_theft
= pd.read_csv('/content/30_Auto_theft.csv') prop_theft =
pd.read_csv('/content/10_Property_stolen_and_recovered.csv')

st.title("CRIME ANALYSIS") st.write('What
kind of info you are looking for')

input=st.text_input('Enter Your Query Here')


my_list = ['rape', 'harassment', 'human rights', 'torture', 'extortion','atrocities','arrest','fake
encounter','false implication','property stolen','property','stolen','auto','auto
theft','death','killer','murder'] penalties = {
'rape': 'Imprisonment for 7 years to life and fine',
'harassment': 'Imprisonment up to 3 years and/or fine',
'human rights': 'Imprisonment up to 7 years and/or fine',
'torture': 'Imprisonment up to 10 years and/or fine',
'extortion': 'Imprisonment up to 3 years and/or fine',
'atrocities': 'Imprisonment up to 10 years and/or fine',
'arrests': 'Imprisonment up to 3 years and/or fine',
'fake encounter': 'Life imprisonment',
'false implication': 'Imprisonment up to 7 years and/or fine'
} for item in
my_list:
if item in input.lower():
if item == 'rape'or item == 'harassment' :
st.write(victims)
st.header('VICTIMS OF INCEST RAPE') rape_victims=
victims[victims['Subgroup']=='Victims of Incest Rape']
st.write(rape_victims) g=
pd.DataFrame(rape_victims.groupby(['Year'])['Rape_Cases_Reported'].sum().reset_index())
st.header('YEAR WISE CASES')
st.write(g) fig=
px.bar(g,x='Year',y='Rape_Cases_Reported',color_discrete_sequence=['blue'])
st.plotly_chart(fig)
st.header('AREA WISE CASES')
g1=
pd.DataFrame(rape_victims.groupby(['Area_Name'])['Rape_Cases_Reported'].sum().reset_in
dex()) g1.replace(to_replace='Arunachal Pradesh',value='Arunanchal Pradesh',inplace=True)
st.write(g1) g1.columns=['State/UT','Cases Reported']

shp_gdf = gpd.read_file('/content/India_States/Indian_states.shp')
merge =shp_gdf.set_index('st_nm').join(g1.set_index('State/UT'))
fig,ax=plt.subplots(1, figsize=(10,10))

ax.set_title('State-wise Rape-Cases Reported (2001-2010)',


fontdict={'fontsize': '15', 'fontweight' : '3'})
fig = merge.plot(column='Cases Reported', cmap='Reds', linewidth=0.5, ax=ax,
edgecolor='0.2',legend=True) plt.savefig('my_plot.png')
st.header('INTENSITY MAP') st.image('my_plot.png') above_50 =
rape_victims['Victims_Above_50_Yrs'].sum() ten_to_14 =
rape_victims['Victims_Between_10-14_Yrs'].sum() fourteen_to_18 =
rape_victims['Victims_Between_14-18_Yrs'].sum() eighteen_to_30 =
rape_victims['Victims_Between_18-30_Yrs'].sum() thirty_to_50 =
rape_victims['Victims_Between_30-50_Yrs'].sum() upto_10 =
rape_victims['Victims_Upto_10_Yrs'].sum() age_grp = ['Upto 10','10
to 14','14 to 18','18 to 30','30 to 50','Above 50'] age_group_vals =
[upto_10,ten_to_14,fourteen_to_18,eighteen_to_30,thirty_to_50,above_50]

fig = go.Figure(data=[go.Pie(labels=age_grp, values=age_group_vals,sort=True,


marker=dict(colors=px.colors.qualitative.G10),textfont_size=12)])
fig.write_image("pl2.png")
st.header('AGE GROUPS')
st.image('pl2.png')
st.header('Penalties')
st.write(penalties.get(item))

elif item =='human rights' or item =='torture' or item =='extortion' or item =='atrocities'
or item =='arrest' or item =='fake encounter' or item =='false implication' :
x=item st.header(x.upper()+'
CRIME') g2=

pd.DataFrame(police_hr.groupby(['Area_Name'])['Cases_Registered_under_Human_Rights_
Violations'].sum().reset_index())
st.write(x) st.write(g2)

st.header('YEAR WISE CASES')
g3 =
pd.DataFrame(police_hr.groupby(['Year'])['Cases_Registered_under_Human_Rights_Violati
ons'].sum().reset_index()) g3.columns = ['Year','Cases Registered']

fig = px.bar(g3,x='Year',y='Cases Registered',color_discrete_sequence=['black'])


st.plotly_chart(fig) st.header('GROUPING')
st.write(police_hr.Group_Name.value_counts()) st.header(x+'POLICE
REPORT') g4 =
pd.DataFrame(police_hr.groupby(['Year'])['Policemen_Chargesheeted','Policemen_Convicted
'].sum().reset_index()) st.write(g4)
year=['2001','2002','2003','2004','2005','2006','2007','2008','2009','2010']

fig = go.Figure(data=[
go.Bar(name='Policemen Chargesheeted', x=year, y=g4['Policemen_Chargesheeted'],
marker_color='purple'),
go.Bar(name='Policemen Convicted', x=year, y=g4['Policemen_Convicted'],
marker_color='red')
])

fig.update_layout(barmode='group',xaxis_title='Year',yaxis_title='Number of policemen')
st.plotly_chart(fig)
st.header(x+'STATE WISE REPORTS') g2.columns=
['State/UT','Cases Reported'] st.write(g2)
g2.replace(to_replace='Arunachal
Pradesh',value='Arunanchal Pradesh',inplace=True)
colormaps = ['RdPu', 'viridis', 'coolwarm', 'Blues',
'Greens', 'Reds', 'PuOr', 'inferno',
'magma', 'cividis', 'cool', 'hot', 'YlOrRd', 'YlGnBu']

random_cmap = random.choice(colormaps) shp_gdf =


gpd.read_file('/content/India_States/Indian_states.shp') merged =
shp_gdf.set_index('st_nm').join(g2.set_index('State/UT'))
st.write(shp_gdf) fig, ax = plt.subplots(1, figsize=(10, 10))

ax.axis('off') ax.set_title('State-wise '+x+' Cases Reported',
fontdict={'fontsize': '15', 'fontweight' : '3'})
fig = merged.plot(column='Cases Reported', cmap=random_cmap, linewidth=0.5, ax=ax,
edgecolor='0.2',legend=True)
plt.savefig('my_plot.png')
st.header('INTENSITY MAP')
st.image('my_plot.png')
st.header('Penalties')
st.write(penalties.get(item))
elif item in ('property', 'property stolen', 'stolen', 'Burglary'):
    df = pd.read_csv('/content/10_Property_stolen_and_recovered.csv')
    stats = df.describe()
    st.write(stats)

    # Bar chart: recovered vs. stolen cases (fresh figure for each chart)
    plt.figure()
    plt.bar(['Recovered', 'Stolen'],
            [df['Cases_Property_Recovered'][0], df['Cases_Property_Stolen'][0]])
    plt.title('Cases of Property Recovered and Stolen')
    plt.xlabel('Type of Property')
    plt.ylabel('Number of Cases')
    plt.savefig('my_plot.png')
    st.image('my_plot.png')

    # Pie chart: value of property recovered vs. stolen
    labels = ['Recovered', 'Stolen']
    sizes = [df['Value_of_Property_Recovered'][0], df['Value_of_Property_Stolen'][0]]
    colors = ['green', 'red']
    plt.figure()
    plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%')
    plt.title('Property Recovered and Stolen')
    plt.axis('equal')
    plt.savefig('my_plot.png')
    st.image('my_plot.png')

    # Totals by property-crime group
    group_data = df.groupby('Group_Name').agg({'Cases_Property_Recovered': 'sum',
                                               'Cases_Property_Stolen': 'sum'})
    group_data.plot(kind='bar')
    plt.title('Cases of Property Recovered and Stolen by Group Name')
    plt.xlabel('Group Name')
    plt.ylabel('Number of Cases')
    plt.savefig('my_plot.png')
    st.image('my_plot.png')

    # Area-wise and year-wise case counts
    cases_by_area_year = df.pivot_table(values=['Cases_Property_Recovered', 'Cases_Property_Stolen'],
                                        index='Area_Name', columns='Year', aggfunc='sum')
    st.write(cases_by_area_year)

    # Scatter plot: value recovered vs. value stolen
    plt.figure()
    plt.scatter(df['Value_of_Property_Recovered'], df['Value_of_Property_Stolen'])
    plt.title('Value of Property Recovered vs. Stolen')
    plt.xlabel('Value of Property Recovered')
    plt.ylabel('Value of Property Stolen')
    plt.savefig('my_plot.png')
    st.image('my_plot.png')

    # Top five sub-groups by number of stolen-property cases
    top_stolen = df.sort_values(by='Cases_Property_Stolen',
                                ascending=False).head(5)[['Sub_Group_Name', 'Cases_Property_Stolen']]
    top_stolen.rename(columns={'Sub_Group_Name': 'Sub-group',
                               'Cases_Property_Stolen': 'Number of Cases Stolen'}, inplace=True)
    top_stolen.reset_index(drop=True, inplace=True)
    top_stolen.index += 1
    st.write(top_stolen)

    sub_group_cases = df[['Sub_Group_Name', 'Cases_Property_Stolen']].copy()
    sub_group_cases.set_index('Sub_Group_Name', inplace=True)
    st.write(sub_group_cases)

    # Histogram of property values recovered and stolen
    plt.figure()
    plt.hist([df['Value_of_Property_Recovered'], df['Value_of_Property_Stolen']],
             bins=5, label=['Recovered', 'Stolen'])
    plt.title('Value of Property Recovered and Stolen')
    plt.xlabel('Value of Property')
    plt.ylabel('Frequency')
    plt.legend()
    plt.savefig('my_plot.png')
    st.image('my_plot.png')

    # Year-wise totals
    year_data = df.groupby('Year').agg({'Cases_Property_Recovered': 'sum',
                                        'Cases_Property_Stolen': 'sum'})
    year_data.plot(kind='bar')
    plt.title('Cases of Property Recovered and Stolen by Year')
    plt.xlabel('Year')
    plt.ylabel('Number of Cases')
    plt.savefig('my_plot.png')
    st.image('my_plot.png')

    summary_stats = df[['Cases_Property_Recovered', 'Cases_Property_Stolen']].describe().round(2)
    summary_stats.rename(columns={'Cases_Property_Recovered': 'Recovered Cases',
                                  'Cases_Property_Stolen': 'Stolen Cases'}, inplace=True)
    st.write(summary_stats)
elif item in ('auto', 'auto theft'):
    g5 = pd.DataFrame(auto_theft.groupby(['Area_Name'])['Auto_Theft_Stolen'].sum().reset_index())
    st.write(g5)
    g5.columns = ['State/UT', 'Vehicle_Stolen']
    g5.replace(to_replace='Arunachal Pradesh', value='Arunanchal Pradesh', inplace=True)

    shp_gdf = gpd.read_file('/content/India_States/Indian_states.shp')
    merged = shp_gdf.set_index('st_nm').join(g5.set_index('State/UT'))

    fig, ax = plt.subplots(1, figsize=(10, 10))
    ax.axis('off')
    ax.set_title('State-wise Auto Theft Cases Reported (2001-2010)',
                 fontdict={'fontsize': '15', 'fontweight': '3'})
    fig = merged.plot(column='Vehicle_Stolen', cmap='YlOrBr', linewidth=0.5, ax=ax,
                      edgecolor='0.2', legend=True)
    plt.savefig('my_plot.png')
    st.image('my_plot.png')

    # Totals of vehicles stolen, traced and recovered
    auto_theft_traced = auto_theft['Auto_Theft_Coordinated/Traced'].sum()
    auto_theft_recovered = auto_theft['Auto_Theft_Recovered'].sum()
    auto_theft_stolen = auto_theft['Auto_Theft_Stolen'].sum()

    vehicle_group = ['Vehicles Stolen', 'Vehicles Traced', 'Vehicles Recovered']
    vehicle_vals = [auto_theft_stolen, auto_theft_traced, auto_theft_recovered]
    colors = ['hotpink', 'purple', 'red']
    fig = go.Figure(data=[go.Pie(labels=vehicle_group, values=vehicle_vals, sort=False,
                                 marker=dict(colors=colors), textfont_size=12)])
    st.plotly_chart(fig)

    # Year-wise vehicles stolen
    g5 = pd.DataFrame(auto_theft.groupby(['Year'])['Auto_Theft_Stolen'].sum().reset_index())
    g5.columns = ['Year', 'Vehicles Stolen']
    fig = px.bar(g5, x='Year', y='Vehicles Stolen', color_discrete_sequence=['#00CC96'])
    st.plotly_chart(fig)

    # Table of vehicle categories
    vehicle_list = ['Motor Cycles/ Scooters', 'Motor Car/Taxi/Jeep', 'Buses',
                    'Goods carrying vehicles (Trucks/Tempo etc)', 'Other Motor vehicles']
    sr_no = [1, 2, 3, 4, 5]
    fig = go.Figure(data=[go.Table(header=dict(values=['Sr No', 'Vehicle type'],
                                               fill_color='turquoise',
                                               height=30),
                                   cells=dict(values=[sr_no, vehicle_list],
                                              height=30))
                          ])
    st.plotly_chart(fig)
    motor_c = auto_theft[auto_theft['Sub_Group_Name'] == '1. Motor Cycles/ Scooters']
    g8 = pd.DataFrame(motor_c.groupby(['Area_Name'])['Auto_Theft_Stolen'].sum().reset_index())
    g8_sorted = g8.sort_values(['Auto_Theft_Stolen'], ascending=True)
    fig = px.scatter(g8_sorted.iloc[-10:, :], y='Area_Name', x='Auto_Theft_Stolen',
                     orientation='h', color_discrete_sequence=["red"])
    st.plotly_chart(fig)
elif item in ('murder', 'killer', 'death', 'homicide', 'fatalities'):
    murder = pd.read_csv("/content/32_Murder_victim_age_sex.csv")
    st.write(murder.Year.unique())
    murder.Area_Name.unique()
    murder.Sub_Group_Name.unique()
    st.write(murder.head(10))

    # Render an embedded Flourish visualisation in the Streamlit app
    url = "https://flo.uri.sh/visualisation/2693755/embed"
    st.components.v1.iframe(url, height=500)

    # Keep only the 'Total' category of the sub-group
    murdert = murder[murder['Sub_Group_Name'] == '3. Total']
    murdery = murdert.groupby(['Year'])['Victims_Total'].sum().reset_index()

    # Total murder victims per year
    sns.set_context("talk")
    plt.style.use("fivethirtyeight")
    plt.figure(figsize=(14, 10))
    # sns.palplot(sns.color_palette("hls", 8))
    ax = sns.barplot(x='Year', y='Victims_Total', data=murdery, palette='dark')
    plt.title("Total Victims of Murder per Year")
    ax.set_ylabel('')
    for p in ax.patches:
        ax.annotate("%.f" % p.get_height(), (p.get_x() + p.get_width() / 2., p.get_height()),
                    ha='center', va='center', fontsize=15, color='black',
                    xytext=(0, 8), textcoords='offset points')
    plt.savefig('my_plot.png')
    st.image('my_plot.png')

    # Gender distribution of victims per year (the 'Total' sub-group is not needed)
    murderg = murder.groupby(['Year', 'Sub_Group_Name'])['Victims_Total'].sum().reset_index()
    murderg = murderg[murderg['Sub_Group_Name'] != '3. Total']
    plt.style.use("fivethirtyeight")
    plt.figure(figsize=(14, 10))
    ax = sns.barplot(x='Year', y='Victims_Total', hue='Sub_Group_Name', data=murderg, palette='bright')
    plt.title('Gender Distribution of Victims per Year', size=20)
    ax.set_ylabel('')
    plt.savefig('my_plot.png')
    st.image('my_plot.png')

    # Age distribution of victims per year: group by year over the age-group columns
    murdera = murder.groupby(['Year'])[['Victims_Upto_10_15_Yrs', 'Victims_Above_50_Yrs',
                                        'Victims_Upto_10_Yrs', 'Victims_Upto_15_18_Yrs',
                                        'Victims_Upto_18_30_Yrs', 'Victims_Upto_30_50_Yrs']].sum().reset_index()
    # Melt the dataset so the age group becomes a single categorical column
    murdera = murdera.melt('Year', var_name='AgeGroup', value_name='vals')

    plt.style.use("fivethirtyeight")
    plt.figure(figsize=(14, 10))
    ax = sns.barplot(x='Year', y='vals', hue='AgeGroup', data=murdera, palette='bright')
    plt.title('Age Distribution of Victims per Year', size=20)
    # Anchor the legend outside the axes so it does not cover the bars
    ax.get_legend().set_bbox_to_anchor((1, 1))
    ax.set_ylabel('')
    plt.savefig('my_plot.png')
    st.image('my_plot.png')

    # Age and gender distribution: group by gender sub-group over the age-group columns
    murderag = murder.groupby(['Sub_Group_Name'])[['Victims_Upto_10_15_Yrs', 'Victims_Above_50_Yrs',
                                                   'Victims_Upto_10_Yrs', 'Victims_Upto_15_18_Yrs',
                                                   'Victims_Upto_18_30_Yrs', 'Victims_Upto_30_50_Yrs']].sum().reset_index()

    # Melt for plotting and drop the 'Total' sub-group
    murderag = murderag.melt('Sub_Group_Name', var_name='AgeGroup', value_name='vals')
    murderag = murderag[murderag['Sub_Group_Name'] != '3. Total']

    plt.style.use("fivethirtyeight")
    plt.figure(figsize=(14, 10))
    # Bar plot with the age group as the hue/category
    ax = sns.barplot(x='Sub_Group_Name', y='vals', hue='AgeGroup', data=murderag, palette='colorblind')
    plt.title('Age & Gender Distribution of Victims', size=20)
    # Anchor the legend outside the axes so it does not cover the graph
    ax.get_legend().set_bbox_to_anchor((1, 1))
    ax.set_ylabel('')
    ax.set_xlabel('Victims Gender')
    for p in ax.patches:
        ax.annotate("%.f" % p.get_height(), (p.get_x() + p.get_width() / 2., p.get_height()),
                    ha='center', va='center', fontsize=15, color='black',
                    xytext=(0, 8), textcoords='offset points')
    plt.savefig('my_plot.png')
    st.image('my_plot.png')

    # State-wise total victims (only the 'Total' sub-group is needed)
    murderst = murder[murder['Sub_Group_Name'] == '3. Total']
    murderst = murderst.groupby(['Area_Name'])['Victims_Total'].sum().sort_values(ascending=False).reset_index()
    # Manually add a row for Telangana
    new_row = {'Area_Name': 'Telangana', 'Victims_Total': 27481}
    murderst = pd.concat([murderst, pd.DataFrame([new_row])], ignore_index=True)
    murderst.sort_values('Area_Name')
    import geopandas as gpd

    gdf = gpd.read_file('/content/India_States/Indian_states.shp')
    murderst.at[17, 'Area_Name'] = 'NCT of Delhi'
    merged = gdf.merge(murderst, left_on='st_nm', right_on='Area_Name')
    merged.drop(['Area_Name'], axis=1)
    # merged.describe()
    merged['coords'] = merged['geometry'].apply(lambda x: x.representative_point().coords[:])
    merged['coords'] = [coords[0] for coords in merged['coords']]

    sns.set_context("talk")
    sns.set_style("dark")
    # plt.style.use('dark_background')
    cmap = 'YlGn'
    figsize = (25, 20)
    # The plotting call itself is missing from the original listing; a choropleth of
    # 'Victims_Total' is assumed here so that the saved image is not empty.
    fig, ax = plt.subplots(1, figsize=figsize)
    ax.axis('off')
    merged.plot(column='Victims_Total', cmap=cmap, linewidth=0.5, ax=ax,
                edgecolor='0.2', legend=True)
    plt.savefig('my_plot.png')
    st.image('my_plot.png')

elif st.button('check crime'):
    st.write('what crime can affect you')
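The state-wise choropleth logic above is repeated, with small variations, for every crime category in app.py. A helper of the following form could consolidate it; this is a minimal illustrative sketch (not part of the submitted code), assuming the same Indian_states.shp shapefile and a two-column DataFrame whose first column is 'State/UT'.

# Illustrative helper (not part of the submitted app.py): render a state-wise
# choropleth from a two-column DataFrame and return the saved image path.
import geopandas as gpd
import matplotlib.pyplot as plt

def plot_state_choropleth(data, value_col, title,
                          shapefile='/content/India_States/Indian_states.shp',
                          cmap='RdPu', out_path='choropleth.png'):
    # Align the state spelling used in the crime data with the shapefile
    data = data.replace(to_replace='Arunachal Pradesh', value='Arunanchal Pradesh')
    shp_gdf = gpd.read_file(shapefile)
    merged = shp_gdf.set_index('st_nm').join(data.set_index('State/UT'))
    fig, ax = plt.subplots(1, figsize=(10, 10))
    ax.axis('off')
    ax.set_title(title, fontdict={'fontsize': '15', 'fontweight': '3'})
    merged.plot(column=value_col, cmap=cmap, linewidth=0.5, ax=ax,
                edgecolor='0.2', legend=True)
    plt.savefig(out_path)
    plt.close(fig)
    return out_path

Each branch could then reduce to a single call, for example st.image(plot_state_choropleth(g2, 'Cases Reported', 'State-wise Cases Reported')).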
app2.py
# Visualization libraries
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import streamlit as st

# Preprocessing libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import (precision_score, recall_score, confusion_matrix,
                             classification_report, accuracy_score, f1_score)

# ML libraries
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Evaluation metrics
from yellowbrick.classifier import ClassificationReport
from sklearn import metrics

st.header('check the place you are visiting for Safety')

states = ['Andhra Pradesh', 'Arunachal Pradesh', 'Assam', 'Bihar', 'Chhattisgarh', 'Goa',
          'Gujarat', 'Haryana', 'Himachal Pradesh', 'Jharkhand', 'Karnataka', 'Kerala',
          'Madhya Pradesh', 'Maharashtra', 'Manipur', 'Meghalaya', 'Mizoram', 'Nagaland',
          'Odisha', 'Punjab', 'Rajasthan', 'Sikkim', 'Tamil Nadu', 'Telangana', 'Tripura',
          'Uttar Pradesh', 'Uttarakhand', 'West Bengal']

# Create a selectbox in Streamlit and display the list of states
selected_state = st.selectbox('Select a state', states)

df = pd.read_csv('/content/drive/MyDrive/CRIMEANALYSIS/archive/Chicago_Crimes_2001_to_2004.csv',
                 low_memory=False)

# Print the selected state
st.write('You selected:', selected_state)
offenses = ['OTHER OFFENSE', 'BATTERY', 'THEFT', 'NARCOTICS',
            'DECEPTIVE PRACTICE', 'CRIMINAL DAMAGE', 'MOTOR VEHICLE THEFT',
            'ROBBERY', 'PUBLIC PEACE VIOLATION', 'OFFENSE INVOLVING CHILDREN',
            'ASSAULT', 'BURGLARY', 'PROSTITUTION', 'CRIMINAL TRESPASS',
            'OTHERS', 'CRIM SEXUAL ASSAULT', 'WEAPONS VIOLATION',
            'SEX OFFENSE']
st.header('model result')

# Keep complete records only and work with a 100,000-row sample
df = df.dropna()
df = df.sample(n=100000)

# Illustrative offence-wise risk percentages (randomly generated)
percentages = np.random.uniform(low=50.00, high=98.00, size=len(offenses)).round(2)

# Drop identifier columns and extract date/time features
df = df.drop(['Unnamed: 0', 'ID', 'Case Number'], axis=1)
df['date2'] = pd.to_datetime(df['Date'])
df['Year'] = df['date2'].dt.year
df['Month'] = df['date2'].dt.month
df['Day'] = df['date2'].dt.day
df['Hour'] = df['date2'].dt.hour
df['Minute'] = df['date2'].dt.minute
df['Second'] = df['date2'].dt.second
df = df.drop(['Date', 'date2', 'Updated On'], axis=1)

# Convert categorical attributes to numerical codes
df['Block'] = pd.factorize(df["Block"])[0]
df['IUCR'] = pd.factorize(df["IUCR"])[0]
df['Description'] = pd.factorize(df["Description"])[0]
df['Location Description'] = pd.factorize(df["Location Description"])[0]
df['FBI Code'] = pd.factorize(df["FBI Code"])[0]
df['Location'] = pd.factorize(df["Location"])[0]

Target = 'Primary Type'
st.write('Target: ', Target)
# Bar chart of crime counts by primary type
plt.figure(figsize=(14, 10))
plt.title('Amount of Crimes by Primary Type')
plt.ylabel('Crime Type')
plt.xlabel('Amount of Crimes')
df.groupby([df['Primary Type']]).size().sort_values(ascending=True).plot(kind='barh')
plt.savefig('my_plot1.png')
st.image('my_plot1.png')

# Count records per class and keep the counts in column 'Amt'
all_classes = df.groupby(['Primary Type'])['Block'].size().reset_index()
all_classes['Amt'] = all_classes['Block']
all_classes = all_classes.drop(['Block'], axis=1)
all_classes = all_classes.sort_values(['Amt'], ascending=[False])

# Merge the 13 rarest classes into a single 'OTHERS' class
unwanted_classes = all_classes.tail(13)
df.loc[df['Primary Type'].isin(unwanted_classes['Primary Type']), 'Primary Type'] = 'OTHERS'

# Plot the bar chart of primary types again after merging the rare classes
plt.figure(figsize=(14, 10))
plt.title('Amount of Crimes by Primary Type')
plt.ylabel('Crime Type')
plt.xlabel('Amount of Crimes')
df.groupby([df['Primary Type']]).size().sort_values(ascending=True).plot(kind='barh')
plt.savefig('my_plot1.png')
st.image('my_plot1.png')
# Keep the class labels before encoding the target as integers
Classes = df['Primary Type'].unique()
Classes
df['Primary Type'] = pd.factorize(df["Primary Type"])[0]
df['Primary Type'].unique()
X_fs = df.drop(['Primary Type'], axis=1)
Y_fs = df['Primary Type']

# Feature selection using Pearson correlation
plt.figure(figsize=(20, 10))
cor = df.corr()
sns.heatmap(cor, annot=True, cmap=plt.cm.Reds)
plt.savefig('my_plot1.png')
st.image('my_plot1.png')

# Select features highly correlated with the target
cor_target = abs(cor['Primary Type'])
relevant_features = cor_target[cor_target > 0.2]
relevant_features
Features = ["IUCR", "Description", "FBI Code"]
st.write('Full Features: ', Features)

# Hold out 20% of the records as the test set
x, y = train_test_split(df, test_size=0.2, train_size=0.8, random_state=3)
x1 = x[Features]   # training features
x2 = x[Target]     # training target class
y1 = y[Features]   # test features
y2 = y[Target]     # test target class

st.write('Feature Set Used    : ', Features)
st.write('Target Class        : ', Target)
st.write('Training Set Size   : ', x.shape)
st.write('Test Set Size       : ', y.shape)

# Random forest with 70 trees and regularising split/leaf constraints
rf_model = RandomForestClassifier(n_estimators=70,      # number of trees
                                  min_samples_split=30,
                                  bootstrap=True,
                                  max_depth=50,
                                  min_samples_leaf=25)

# Model training
rf_model.fit(X=x1, y=x2)

# Multi-layer perceptron with a single hidden layer of 40 neurons
nn_model = MLPClassifier(solver='adam',
                         alpha=1e-5,
                         hidden_layer_sizes=(40,),
                         random_state=1,
                         max_iter=1000)

# Model training
nn_model.fit(X=x1, y=x2)

knn_model = KNeighborsClassifier(n_neighbors=3)

# Model training
knn_model.fit(X=x1, y=x2)

# Ensemble of the three classifiers with equal voting weights
eclf1 = VotingClassifier(estimators=[('knn', knn_model), ('rf', rf_model), ('nn', nn_model)],
                         weights=[1, 1, 1],
                         flatten_transform=True)
eclf1 = eclf1.fit(X=x1, y=x2)
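# Note (illustrative, not part of the submitted code): VotingClassifier defaults to
# hard majority voting. Since KNeighborsClassifier, RandomForestClassifier and
# MLPClassifier all expose predict_proba, soft voting, which averages the predicted
# class probabilities, could also be evaluated, e.g.:
#   eclf_soft = VotingClassifier(estimators=[('knn', knn_model), ('rf', rf_model),
#                                            ('nn', nn_model)],
#                                voting='soft', weights=[1, 1, 1])
#   eclf_soft.fit(X=x1, y=x2)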

# Prediction

result = eclf1.predict(y[Features])
ac_sc = accuracy_score(y2, result)
rc_sc = recall_score(y2, result, average="weighted")
pr_sc = precision_score(y2, result, average="weighted")
f1_sc = f1_score(y2, result, average='micro')
confusion_m = confusion_matrix(y2, result)

st.write("============= Ensemble Voting Results =============")
st.write("Accuracy    : ", ac_sc)
st.write("Recall      : ", rc_sc)
st.write("Precision   : ", pr_sc)
st.write("F1 Score    : ", f1_sc)
st.write("Confusion Matrix: ")
st.write(confusion_m)

target_names = Classes
visualizer = ClassificationReport(eclf1, classes=target_names)
visualizer.fit(X=x1, y=x2)   # Fit the training data to the visualizer
visualizer.score(y1, y2)     # Evaluate the model on the test data

st.write('================= Classification Report =================')
st.write('')
st.write(classification_report(y2, result, target_names=target_names))

# Save the classification report figure as a PNG file and display it
g = visualizer.poof(outpath='my_classification_report.png')
st.image('my_classification_report.png')

# Display the offence-wise risk percentages in Streamlit
df = pd.DataFrame({'Offense': offenses, 'Percentage': percentages})
st.header('here are the risk %')
st.write(df)
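Because app2.py is re-executed on every Streamlit interaction, the ensemble above is retrained on each rerun. Persisting the fitted model is one possible refinement; the sketch below is illustrative only (not part of the submitted code) and assumes a hypothetical file name ensemble_model.joblib.

# Illustrative sketch (not part of the submitted app2.py): cache the fitted ensemble
# on disk so it is trained once and simply reloaded on subsequent Streamlit reruns.
import os
import joblib

MODEL_PATH = 'ensemble_model.joblib'   # hypothetical file name

def load_or_train(train_fn, path=MODEL_PATH):
    """Return the cached model if present, otherwise train, save and return a new one."""
    if os.path.exists(path):
        return joblib.load(path)
    model = train_fn()   # e.g. a function that builds and fits eclf1 as above
    joblib.dump(model, path)
    return model

Streamlit's own caching decorators (for example st.cache_resource) could serve the same purpose.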

APPENDIX 2: OUTPUT SCREENS


CONFERENCE CERTIFICATES

