Crime Data Mining - Case Study
Amit kumar
Gokulahasan
Nishanthi
Rajkumar
What is a crime?
The breach of one or more rules or laws for which
governing authority via police power may ultimately
prescribe a conviction.
It is injurious to the general population or the state.
So crime prevention and the identification of criminals
are a necessity in today's society.
Major challenges
All law-enforcement and intelligence-
gathering organizations are currently facing
problems of accurately and efficiently
analyzing the growing volumes of crime data
Differing modes and patterns, cross-border
operations, and technologically advanced crimes
make cases difficult to track and solve.
Investigations take longer because of the
complexity of the issues.
What is data mining?
“Data mining is a collection of techniques for efficient
automated discovery of previously unknown, valid,
novel, useful and understandable patterns in large
databases. The patterns must be actionable so that
they may be used in an enterprise’s decision making
process.”
Advantages of data mining
Many permutations and combinations can be
incorporated in the software
Less time-consuming, with better accuracy
Installing and running the software costs much
less than hiring personnel
Different data mining techniques or combination
of some can be incorporated in one assignment
Advances in the data mining field are yielding
steadily better results
Which model suits the process of criminal
identification?
Different law-enforcement agencies handle investigations
of different kinds, depending on the severity and
jurisdiction of the crime.
Researchers have developed various automated data
mining techniques for both local law enforcement and
national security applications.
Objectives of Crime Data Mining:
Using data mining techniques to aid
analysis of data related to crimes
Extracting named entities from
narrative reports
Detecting deceptive criminal identities
Identifying criminal groups and key
members
Entity extraction
used to automatically identify persons, addresses,
vehicles, and personal characteristics from police
narrative reports
subsequently helps in grouping similar activities
by criminals and tracing their behavior
Its performance depends greatly on the availability
of extensive amounts of clean input data.
Clustering techniques
group data items into classes with similar characteristics to
maximize intraclass similarity and minimize interclass similarity
use the statistics-based concept space algorithm to
automatically associate different objects such as persons,
organizations, and vehicles in crime records
link analysis techniques to identify similar transactions
It can automate a major part of crime analysis but is limited
by the high computational intensity typically required
Association rule mining
discovers frequently occurring item sets in a database and
presents the patterns as rules
applied in network intrusion detection to derive
association rules from users’ interaction history; can also
be applied to intruders’ profiles to help detect potential
future network attacks.
Similarly, sequential pattern mining can be applied to
find patterns in sequences of events.
Performance of these techniques relies on the accuracy and
richness of available data.
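The idea of frequent item sets presented as rules can be sketched in a few lines. This is a minimal brute-force illustration, not the method used in any cited system; the session events, the `min_support` and `min_confidence` thresholds, and all function names are assumptions made for the example.

```python
from itertools import combinations

# Hypothetical transactions: each set lists events observed in one user session.
transactions = [
    {"failed_login", "port_scan", "root_access"},
    {"failed_login", "port_scan"},
    {"failed_login", "root_access"},
    {"port_scan", "root_access"},
    {"failed_login", "port_scan", "root_access"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Enumerate all item sets meeting min_support (brute force, small data only)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(frozenset(combo), transactions)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result

def rules(freq, min_confidence):
    """Derive rules A -> B with confidence = support(A and B) / support(A)."""
    out = []
    for itemset, s in freq.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                a = frozenset(antecedent)
                # Every subset of a frequent item set is itself frequent,
                # so freq[a] is always present.
                conf = s / freq[a]
                if conf >= min_confidence:
                    out.append((set(a), set(itemset - a), s, conf))
    return out

freq = frequent_itemsets(transactions, min_support=0.5)
for antecedent, consequent, s, conf in rules(freq, min_confidence=0.7):
    print(antecedent, "->", consequent, f"support={s:.2f} confidence={conf:.2f}")
```

Brute-force enumeration is exponential in the number of items; real systems prune candidates (for example with the Apriori principle noted in the comment) instead of testing every combination.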
Deviation detection
Studies data that deviate markedly from the rest of the
data; also called outlier detection.
Applicable in fraud detection, network intrusion detection,
and other crime analyses
However, identifying the deviant data is itself a tedious job.
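One simple way to flag values that differ markedly from the rest is a robust score based on the median and the median absolute deviation (MAD). This is a sketch under stated assumptions: the transaction amounts are invented, the function name is hypothetical, and the 0.6745 scaling with a 3.5 cutoff is a common rule of thumb rather than anything prescribed by the slides.

```python
import statistics

# Hypothetical transaction amounts; one value is far from the rest.
amounts = [120, 135, 128, 140, 132, 125, 138, 130, 2500, 127]

def find_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds threshold.

    Modified z-score = 0.6745 * |x - median| / MAD, which is less
    distorted by the outliers themselves than a mean/stdev z-score.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # More than half the values are identical; the score is undefined.
        return []
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]

print(find_outliers(amounts))
```

A score like this only finds values that look unusual; activity that has been made to look normal will pass unflagged.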
Classification
finds common properties among different crime entities
and organizes them into predefined classes
Applicable to identifying the source of e-mail spam based
on the sender’s linguistic patterns and structural features
Often used to predict crime trends; classification can reduce
the time required to identify crime entities
Performance is dependent on richness of data
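Assigning items to predefined classes can be illustrated with a small word-count classifier. The sketch below uses multinomial naive Bayes with add-one smoothing as a stand-in technique; the slides do not say which classifier was used, and the messages, sender labels, and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training messages, each labeled with a predefined class (sender).
examples = [
    ("cheap pills limited offer buy now", "sender_a"),
    ("buy cheap watches limited stock", "sender_a"),
    ("dear friend urgent transfer of funds", "sender_b"),
    ("urgent reply needed dear beneficiary funds", "sender_b"),
]

def train(examples):
    """Count words per class, messages per class, and the overall vocabulary."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab

def predict(model, text):
    """Return the class with the highest log-probability for the text."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # class prior
        for token in text.lower().split():
            if token in vocab:  # ignore words never seen in training
                count = word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(examples)
print(predict(model, "limited offer buy cheap"))
print(predict(model, "urgent funds transfer"))
```

With only four training messages the model is a toy; it shows why classification depends on a predefined set of classes and on reasonably rich training data.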
String comparator
compares the textual fields in pairs of database records and
computes the similarity between the records
applicable to detecting deceptive information such as names,
addresses, and Social Security numbers in criminal records
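A field-by-field comparison of two records can be sketched with edit distance. This is an illustrative assumption: the slides do not specify the comparator, so Levenshtein distance is used here as one common choice, and the records, field names, and equal weighting of fields are invented for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def similarity(a, b):
    """Normalize edit distance to a 0..1 similarity (1.0 means identical)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

def record_similarity(r1, r2, fields=("name", "address", "ssn")):
    """Average the per-field similarities (equal weights are an assumption)."""
    return sum(similarity(r1[f], r2[f]) for f in fields) / len(fields)

# Hypothetical records: r2 is a slightly altered version of r1.
r1 = {"name": "John Smith", "address": "12 Oak St", "ssn": "123-45-6789"}
r2 = {"name": "Jon Smyth", "address": "12 Oak Street", "ssn": "123-45-6798"}
r3 = {"name": "Maria Lopez", "address": "480 Pine Ave", "ssn": "987-65-4321"}

print(record_similarity(r1, r2), record_similarity(r1, r3))
```

Comparing every pair of records costs time proportional to the square of the number of records, which is one source of the high computational intensity noted for this technique.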
Social network analysis
Explains the roles of and interactions among nodes in a
conceptual network
Used to construct a network that illustrates criminals’
roles, the flow of tangible and intangible goods and
information, and associations among these entities
In-depth analysis can reveal critical roles and subgroups
and vulnerabilities inside the network
Caution
Entity Extraction & Sequential Pattern Mining –
require rich data for accuracy
Clustering Techniques & String Comparator – high
computational intensity
Deviation Detection – criminal activity can appear to be
normal
Classification – requires a predefined classification scheme
Social Network Analysis – key members may keep a low
profile
Crime data mining framework
Identifies relationships between techniques applied in
criminal and intelligence analysis at various levels
Case 1: Named Entity Extraction
36 narcotics-related cases analyzed with an AI entity extractor
3 steps:
-identifies noun phrases
-calculates a set of feature scores for phrases
-predicts the most likely entity type
Entities- names, addresses, vehicles, narcotics names,
physical characteristics
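The three steps can be mimicked with a toy pipeline. This is a heavily simplified sketch, not the AI entity extractor from the case study: splitting on stopwords stands in for noun-phrase identification, lexicon hit rates stand in for the feature scores, and picking the highest score stands in for the prediction step. The lexicons, stopword list, and sample sentence are all hypothetical.

```python
import re

# Hypothetical word lists; a real system would learn these from labeled reports.
STOPWORDS = {"suspect", "was", "seen", "near", "driving", "a", "the", "with", "and"}
LEXICONS = {
    "person": {"john", "maria", "smith", "lopez"},
    "address": {"street", "st", "avenue", "ave", "road", "rd"},
    "vehicle": {"honda", "ford", "sedan", "truck"},
    "narcotic": {"cocaine", "heroin", "marijuana"},
}

def candidate_phrases(text):
    """Step 1 (simplified): runs of non-stopword tokens act as noun phrases."""
    phrases, current = [], []
    for tok in re.findall(r"[A-Za-z0-9]+", text):
        if tok.lower() in STOPWORDS:
            if current:
                phrases.append(current)
                current = []
        else:
            current.append(tok)
    if current:
        phrases.append(current)
    return phrases

def feature_scores(phrase):
    """Step 2: score each entity type by the share of tokens in its lexicon."""
    scores = {}
    for etype, lexicon in LEXICONS.items():
        hits = sum(1 for tok in phrase if tok.lower() in lexicon)
        scores[etype] = hits / len(phrase)
    if phrase[0].isdigit():
        scores["address"] += 0.5  # a leading number hints at a street address
    return scores

def predict_type(phrase):
    """Step 3: pick the highest-scoring type, or None if nothing matched."""
    scores = feature_scores(phrase)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def extract_entities(text):
    return [(" ".join(p), predict_type(p)) for p in candidate_phrases(text)]

report = "Suspect John Smith was seen near 1420 Oak Street driving a red Honda sedan with cocaine"
print(extract_entities(report))
```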
Case 2: Deceptive entity detection
Criminals provide false information about themselves
This creates redundancy in the database
Makes it difficult to probe further into details about
them
An Alternative Analysis
Other techniques that can be utilized:
Entity extraction
Association rule mining combined with outlier detection
Criminal Network Analysis
Problems: drugs, cybercrime, terrorism, etc.
Clue: Criminals often develop networks in which they
form groups or teams to carry out various illegal
activities.
Objective
Primary: To identify subgroups and key members in
criminal networks and to study their interaction
patterns
Secondary: To develop effective strategies for
disrupting the networks
Data Collection
272 Tucson Police Department incident summaries
Involving 164 crimes
Committed from 1985 through May 2002
Methods and Techniques
Concept Space (Clustering)
To extract criminal relations and create a likely
network of suspects.
Co-occurrence Weight – To measure the strength of each relation
Hierarchical Clustering – To partition the network into
subgroups
Block Modeling – To identify interaction patterns
between these subgroups
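The pipeline of weighting relations, partitioning into subgroups, and examining links between subgroups can be sketched as follows. The assumptions are significant: co-occurrence weight is taken as a raw count of shared incidents (the concept space approach uses a more elaborate weighting), clustering is average-link agglomerative merging with a hypothetical `min_strength` cutoff, and the block-modeling step is reduced to a link density between two subgroups. The incidents and names are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical incident summaries: the people named in each incident.
incidents = [
    {"A", "B"}, {"A", "B", "C"}, {"B", "C"},
    {"D", "E"}, {"D", "E"}, {"C", "D"},
]

def cooccurrence_weights(incidents):
    """Count how many incidents each pair of people appears in together."""
    weights = Counter()
    for people in incidents:
        for pair in combinations(sorted(people), 2):
            weights[pair] += 1
    return weights

def link_strength(c1, c2, weights):
    """Average co-occurrence weight between members of two clusters."""
    total = sum(weights[tuple(sorted((a, b)))] for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def hierarchical_clusters(people, weights, min_strength):
    """Repeatedly merge the two most strongly linked clusters."""
    clusters = [frozenset([p]) for p in sorted(people)]
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: link_strength(pair[0], pair[1], weights))
        if link_strength(c1, c2, weights) < min_strength:
            break  # remaining clusters are only weakly related
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters

def link_density(c1, c2, weights):
    """Share of cross-group pairs that co-occur at least once."""
    linked = sum(1 for a in c1 for b in c2 if weights[tuple(sorted((a, b)))] > 0)
    return linked / (len(c1) * len(c2))

weights = cooccurrence_weights(incidents)
people = set().union(*incidents)
groups = hierarchical_clusters(people, weights, min_strength=1.0)
print([sorted(g) for g in groups])
print(link_density(groups[0], groups[1], weights))
```

Comparing the link density between two subgroups with a chosen threshold gives a yes or no answer on whether the subgroups interact.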
For Key Members…
Centrality measures
Degree
Betweenness
Closeness
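The three centrality measures can be computed on a small undirected network as shown below. This is a minimal sketch: the network is hypothetical, links are treated as unweighted, closeness assumes a connected network, and betweenness uses Brandes' algorithm with each unordered pair counted once.

```python
from collections import deque

# Hypothetical undirected network: who has been linked to whom.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

def degree(graph):
    """Number of direct links per member."""
    return {v: len(nbrs) for v, nbrs in graph.items()}

def bfs_distances(graph, source):
    """Shortest-path length (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness(graph):
    """(reachable nodes - 1) / sum of distances; higher means more central."""
    result = {}
    for v in graph:
        dist = bfs_distances(graph, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total else 0.0
    return result

def betweenness(graph):
    """Number of shortest paths between other members that pass through each node."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # count of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):  # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}  # undirected: halve

print(degree(graph))
print(closeness(graph))
print(betweenness(graph))
```

In this toy network, member C has the most direct links and also lies on the most shortest paths, while D is the only bridge to E; members like these are the usual candidates for key roles.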
Figure: network of 164 criminals and its subgroups
Validation
2-hour field study with 3 Tucson Police Department
domain experts
The experts evaluated the analysis’s validity
They judged the analysis to be valid
Advantages
Increase crime analysts’ work productivity
Visualize Criminal Networks
Risk is reduced
Time is saved; police can use it for other valuable tasks
Reduce error
Effective strategies can be formulated to disrupt
criminal networks