Demographics Crime Trends
Akhilesh Pandey
I. INTRODUCTION
Crime is defined as any action which is unlawful and
hinders the working of society [6]. Controlling crime is
therefore considered an important duty of the government,
and various methods are implemented to achieve this goal.
These have been broadly classified into two categories:
dissuasion techniques and policing techniques.
While the former focuses on a stricter punishment system, the
latter focuses on the prevention of crime. Crime Analysis,
the systematic study of crime in the field of criminal
justice [1], is a part of the latter category.
While this is a wide field of study, Crime Mapping focuses
on the demographics of crime incidents. With the availability
of strong analytic tools and access to large
amounts of data, it has become an important part of policing
in any city, as it can help security officials understand
crime trends at various locations. It can also help with
administrative issues such as staffing and deploying
countermeasures to tackle issues pre-emptively.
Criminal activities in a region also affect other aspects of
society, such as development activities and the
cost of property. Therefore, it is interesting from an
economic perspective to understand the relation between
different crimes in a region and the factors affecting them.
This paper looks into the data of crime incidents that have
been recorded in London. We will also look at the demographics of different regions and try to identify how different
factors can be associated with the number of incidents
occurring in them.
II. BACKGROUND
Crime Analysis has been explored to understand the crime
mapping of different regions. The different terms pertaining
to this paper are:
LSOA Codes- These are the Lower Layer Super Output
Area Codes, which are used for reporting statistics for
small areas in the United Kingdom. Each borough in
London has been divided into such subdivisions. The
London area has 4584 LSOA regions.
Crime Types- This covers the different types of criminal activities which have been recorded. Our data set has
categorized all the criminal activities into 11 different
categories.
Index of Deprivation- This is an index which is a
measure of the deprivation of an area. It was introduced
by the UK government in 2007. These rankings are
available for different boroughs as well as LSOA. It
is a cumulative ranking of a region based on various
factors of poverty.
Demographics- These are the sets of attributes which
we will be using to understand the socioeconomic
statistics of LSOAs. These attributes include various
properties, such as, Average Income, Unemployment
Rate, Education Distribution, Population distribution, etc.
III. THE DATA SETS
VI. TOOLS
The three data sets were received in Excel format and were
converted into .csv format for analysis. We will be using
the following tools for our analyses:
A. Weka
Weka is a Graphical User Interface tool for data
analysis developed at the University of Waikato. We will be using it for
handling outliers as well as for classification and regression.
B. R
R is an open-source language and environment for statistical
computing. It has a large number of packages
which can be used to implement different functionalities. We
will be using R for mapping our data, joining the different
data sets, as well as further analyses.
C. Microsoft Excel
Excel is a commonly used spreadsheet tool which is used
for storing data as well as calculating parameters of the
distribution. We will be using Excel for combining data from
different months for the Crime Data set. We will also be using
Excel for data formatting, data visualizations and basic data
manipulations.
D. SQL
SQL (Structured Query Language) is a query language for relational databases, originally developed at IBM. It is used for handling data which is structured and
highly organized. We will use it for data cleaning
as well as some basic querying of the data.
VII. DATA PRE-PROCESSING AND CLEANING
All the Excel files were converted to .csv format as
this is the format most widely supported by our tools. The two data
sets Census and Adjusted IMD scores were clean and
ready to be processed, but the Crime Records 2011 data had
issues and required some modifications.
The data cleaning and pre-processing procedure is
explained below:
A. The Crime Records 2011
We removed the entries which had no Longitude and
Latitude values, as these values are needed to plot the locations
of crime. This data set had some of the incidents reported
twice under the same category: some entries
had the same coordinates as well as the same Crime type and
Month. We can be sure that these are records of a single event
rather than of two events at the same location, as the coordinates of the incidents
were the same up to two decimal places and the other details
matched as well. It is possible that the duplication was caused by
the recording of incidents at different stations. Another possible
reason could be that the crime involved many people and
hence was recorded more than once. The duplicate data was removed
using SQL queries. The query used was
SELECT DISTINCT Longitude, Latitude,
    "Crime.Type", Date, "LSOA.Code", "LSOA.Name"
FROM CrimeData
GROUP BY Longitude, Latitude,
    "Crime.Type", Date, "LSOA.Code", "LSOA.Name"
This reduced the number of incidents significantly. The CrimeId, Context and Last Outcome columns were
removed as they do not provide any information relevant to our analysis.
We were then able to plot this data on the map of Greater
London using R.
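The de-duplication step can be illustrated with a minimal sketch, assuming Python's built-in sqlite3 module rather than the SQL environment actually used for this paper. The sample rows, the area codes ("X001", "X002") and the helper name deduplicate are hypothetical; only the table and column names follow the query above.

```python
import sqlite3

# Hypothetical sample rows (made up for illustration; not taken from the
# Crime Records 2011 data set).
rows = [
    (-0.12, 51.50, "Burglary", "2011-01", "X001", "Area A"),
    (-0.12, 51.50, "Burglary", "2011-01", "X001", "Area A"),  # exact duplicate
    (-0.12, 51.50, "Robbery", "2011-01", "X001", "Area A"),   # different crime type
    (-0.10, 51.52, "Burglary", "2011-02", "X002", "Area B"),
]

def deduplicate(records):
    """Return the distinct records using SELECT DISTINCT in an in-memory database."""
    con = sqlite3.connect(":memory:")
    # Column names containing a dot must be quoted, otherwise SQL would
    # read them as table.column references.
    con.execute(
        'CREATE TABLE CrimeData (Longitude REAL, Latitude REAL, '
        '"Crime.Type" TEXT, Date TEXT, "LSOA.Code" TEXT, "LSOA.Name" TEXT)'
    )
    con.executemany("INSERT INTO CrimeData VALUES (?, ?, ?, ?, ?, ?)", records)
    distinct = con.execute(
        'SELECT DISTINCT Longitude, Latitude, "Crime.Type", Date, '
        '"LSOA.Code", "LSOA.Name" FROM CrimeData'
    ).fetchall()
    con.close()
    return distinct

unique_rows = deduplicate(rows)
print(len(rows), "->", len(unique_rows))
```

Only rows that match on every selected column are collapsed, so two different crime types at the same coordinates are kept as separate incidents.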
We then removed the Crime Types in order to aggregate
the data for each LSOA. This gave us the total count of all
the criminal activities recorded in an LSOA. Fig. 1 shows
the number of different crime events recorded in London in
2011.
Primary Key for the data set. Hence, we can easily merge
the data sets based on this value. This operation is performed
using the plyr package in R. After removing the redundant
columns created by the merge, our data set is prepared.
The population and area of each LSOA differ, and this
can directly affect the number of crime incidents in the area.
So we will add a column, Crime Rate, which gives us the
number of crime incidents per person.
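As a rough illustration of the aggregation and the Crime Rate column, the following sketch counts incidents per LSOA and divides by the population of that LSOA. It is written in plain Python rather than the R workflow used in the paper; the area codes, population figures and the function name crime_rates are hypothetical.

```python
from collections import Counter

# Hypothetical inputs: one LSOA code per de-duplicated incident, and a
# population lookup keyed by the same code (all values are made up).
incident_lsoas = ["A1", "A1", "A2", "A1", "A3", "A2"]
population = {"A1": 1500, "A2": 1000, "A3": 2000, "A4": 1200}

def crime_rates(incident_codes, population_by_lsoa):
    """Return {LSOA code: incidents per person}, joining on the LSOA code."""
    counts = Counter(incident_codes)  # total incidents per LSOA
    return {
        code: counts[code] / people   # a Counter yields 0 for unseen codes
        for code, people in population_by_lsoa.items()
        if people > 0                 # assumption: skip areas with no residents
    }

rates = crime_rates(incident_lsoas, population)
for code, rate in sorted(rates.items()):
    print(code, rate)
```

Areas with no recorded incidents receive a rate of zero instead of being dropped, which keeps every LSOA in the merged data set.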
VIII. DATA ANALYSIS
A. Crime Data set
The Crime Data set gives us the locations of the recorded
incidents. Once the data cleaning
was performed, we plotted the data on a map of London
in order to visualize the distribution of crime incidents. The
data was plotted with the help of the spatstat package in R. Fig. 2 shows
the different crime incidents and their locations on the London
map.
A. Correlation
B. Classification
In order to create a classification model, we add an
additional Class attribute. This is a binary attribute which
is marked as 1 for a Crime Rate above the average
value and 0 for a Crime Rate below the average value. A
similar process was carried out for the other numeric attributes
as well, where values below the average
were given a Low value and those above the
average were given a High value.
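The thresholding described above can be sketched as follows. This is a minimal illustration in Python, not the procedure as carried out in Weka or Excel; the sample values and the function name binarise_by_mean are hypothetical. The text does not say how a value exactly equal to the average is treated, so the sketch assumes it falls into the lower class.

```python
from statistics import mean

def binarise_by_mean(values, low=0, high=1):
    """Label each value `high` if it exceeds the mean of `values`, else `low`.

    Assumption: a value exactly equal to the mean is labelled `low`.
    """
    avg = mean(values)
    return [high if v > avg else low for v in values]

# Hypothetical Crime Rate values (incidents per person).
crime_rate = [0.002, 0.010, 0.004, 0.020]
crime_class = binarise_by_mean(crime_rate)

# The same helper gives Low/High labels for another numeric attribute.
income = [20000, 35000, 50000]
income_level = binarise_by_mean(income, low="Low", high="High")

print(crime_class)
print(income_level)
```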
A K-Nearest Neighbor model assumes that instances which
lie close to each other behave in a similar way, and hence
assigns an instance the class most common among its K nearest
neighbors. K-nearest-neighbor models were built for different values of k. The highest
number of correctly classified instances was achieved for
k=11. This model was able to classify 71.13% of instances
correctly. The True Positive (TP) rate of a class is the
fraction of instances of that class which the model
predicted correctly, and the precision is the fraction of
predictions of that class which were correct. Our
model gave a TP Rate of 0.865 for class 0 with a precision of
0.719, and a TP Rate of 0.526 for class 1 with a precision of
0.656. However, the Kappa statistic (κ) of this model
was low, κ = 0.3888.
The κ value is a measure of the reliability of a model. It is a
comparison between the measured accuracy and the accuracy
expected by chance. A Kappa statistic value of 1 signifies
that the predicted classes and the actual classes
are in complete agreement, whereas a
value of 0 represents agreement no better than chance.
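To make the two ideas in this subsection concrete, the sketch below shows a majority-vote k-nearest-neighbor prediction and the Kappa statistic computed as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed accuracy and p_e the accuracy expected by chance. It is an illustration in plain Python under stated assumptions, not the Weka implementation used for the reported results: Euclidean distance is assumed, and the data points, labels and function names are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

def knn_predict(train_x, train_y, query, k):
    """Return the majority class among the k training points closest to `query`."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def cohen_kappa(actual, predicted):
    """kappa = (p_o - p_e) / (1 - p_e); undefined when p_e equals 1."""
    n = len(actual)
    p_o = sum(a == p for a, p in zip(actual, predicted)) / n  # observed accuracy
    actual_freq, pred_freq = Counter(actual), Counter(predicted)
    # Chance agreement: product of the marginal class frequencies.
    p_e = sum(actual_freq[c] * pred_freq[c] for c in actual_freq) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical two-attribute training data with binary classes.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
low_pred = knn_predict(train_x, train_y, (0.5, 0.5), k=3)
high_pred = knn_predict(train_x, train_y, (5.5, 5.5), k=3)

# Hypothetical actual and predicted labels for eight instances.
actual = [0, 0, 0, 0, 1, 1, 1, 1]
predicted = [0, 0, 0, 1, 1, 1, 0, 0]
kappa = cohen_kappa(actual, predicted)
print(low_pred, high_pred, kappa)
```

In this toy example five of eight predictions are correct (p_o = 0.625) while chance alone would give p_e = 0.5, so κ is well below the raw accuracy; this is why a model can classify most instances correctly and still have a low κ.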
X. CONCLUSION
The aim of the paper was to study the effects of Demographics on the Crime Rate/Number of Incidents for a
region. We developed a Linear model, which showed only a
weak relation with the Demographic patterns across various
regions. This suggests that the Crime Rate over a
region is affected by other factors beyond the demographics.
We plotted the number of Crime Incidents across different
boroughs and were able to determine regions with a high number of
events. We were also able to map the crime events over the region
of London and identify regions which have higher incidents
of crime as compared to other regions. The cumulative distribution of Crime Rate also shows that crime events
in certain areas were higher than in others.
Although the deprivation index is an indicator of poverty in a
location, it fails to capture the disparity in income
within an LSOA. Including the disparity in income would help us improve
the model, as it is one of the biggest factors
affecting crime. It is therefore a better indicator for understanding
poverty in a region, and would increase the correlation
between the attributes of a region and its crime rate.
The mapping does not take into account that the cost of living
differs between regions. This is important as it can
tell us whether the average income in a region is
financially adequate.
XI. EXTENSIONS
This model is a framework for an approach to Crime analysis. We have considered the Demographics of London.
The following extensions to the research are possible:
1) We can profile different localities by their commercial
significance, e.g., Industrial Areas such as IT parks,
factories, etc. are less susceptible to crime incidents.
However, these areas may have higher population density as people from different regions might temporarily
shift to such places.
2) We can also study the effect of the police profile associated
with different regions, for example how the number of police
officers per 1000 people affects the Crime rates for
different regions.
3) With the improvement in technology and availability
of the internet, a new crime trend has emerged.
Cyber Crime pertains to criminal activities which
are committed over the internet. Can this model be
extended to cover such activities?
REFERENCES
[1] R. B. Santos, Crime Analysis with Crime Mapping. Sage,
2012, p. 1.
[2] J. Eck et al., Mapping Crime: Understanding Hotspots, 2005,
pp. 1-71.
[3] J. Malczewski and A. Poetz, "Residential Burglaries and Neighborhood Socioeconomic Context in London, Ontario: Global and Local
Regression Analysis", The Professional Geographer, vol. 57, no. 4,
pp. 516-529, 2005.
[4] Ons.gov.uk, "Office for National Statistics (ONS) - ONS", 2015.
[Online]. Available: http://www.ons.gov.uk/ons/index.html. [Accessed:
13-Dec-2015].
[5] Data.police.uk, "Home | data.police.uk", 2015. [Online]. Available:
http://data.police.uk. [Accessed: 13-Dec-2015].
[6] C. Block and R. Block, "Crime Definition, Crime Measurement, and
Victim Surveys", Journal of Social Issues, vol. 40, no. 1, pp. 137-159,
1984.