T3 - A Comparative Study and Analysis On Crime Predictions Based On SVM and CNN For Smart Cities
Introduction
The 21st Century is frequently referenced as the “Century of the City”, reflecting the
unprecedented global migration into urban areas that is happening nowadays. Such a steadily
increasing urbanization is bringing huge social, economic and environmental transformations
and, at the same time, presenting challenges in city management issues, like resource
planning (water, electricity), traffic, air and water quality, public policy and public safety
services. Moreover, given that larger cities tend to have higher crime rates, crime spiking is
becoming one of the most important social problems in large urban areas, because it affects
public safety, children's development, and adult socio-economic status. With the ever-
increasing ability of public organizations and police departments to collect and store detailed
data tracking crime events, a significant amount of data with spatial and temporal information
is daily collected. This offers the opportunity to apply data analytics methodologies to extract
useful predictive models related to crime events, which can enable police departments to
better utilize their limited resources and develop effective strategies for crime prevention. In
particular, extensive criminal justice research studies show that the incidence of criminal
events is not equally distributed within a city. In fact, crime rates can change with respect to
the geographic location of the area (there are low-risk and high-risk areas) and crime trends
can vary (seasonal patterns, peaks, dips) with respect to the period of the year. For such a
reason, an accurate predictive model must be able to automatically detect both which areas in
the city are more affected by crime events and how the crime rate of each specific area varies
with respect to the temporal period. This knowledge can enable police departments to more
efficiently allocate their resources to specific crime hot spots, allowing for the effective
deployment of officers to areas of high risk or removal of officers from areas seeing
decreasing levels of crime, thus more efficiently preventing or quickly responding to criminal
activity.
Abstract
This work provides a clear structure of the criminal database, which enhances the complete system.
The goal of the analysis is to find models for reliably predicting the number and
location of crimes at a given timestamp.
DATA MINING:
Data mining (the analysis step of the "Knowledge Discovery in Databases" process,
or KDD), a field at the intersection of computer science and statistics, is the process that
attempts to discover patterns in large data sets. It utilizes methods at the intersection of
artificial intelligence, machine learning, statistics, and database systems. The overall goal of
the data mining process is to extract information from a data set and transform it into an
understandable structure for further use. Aside from the raw analysis step, it involves database
and data management aspects, data preprocessing, model and inference considerations,
interestingness metrics, complexity considerations, post-processing of discovered structures,
visualization, and online updating. Generally, data mining (sometimes called data or
knowledge discovery) is the process of analyzing data from different perspectives and
summarizing it into useful information - information that can be used to increase revenue,
cut costs, or both. Data mining software is one of a number of analytical tools for analyzing
data. It allows users to analyze data from many different dimensions or angles, categorize it,
and summarize the relationships identified. Technically, data mining is the process of finding
correlations or patterns among dozens of fields in large relational databases.
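As a small illustration of that idea, the following sketch computes the Pearson correlation between two numeric fields; the district-level figures are made up for demonstration purposes only.

<?php
// Illustrative Pearson correlation between two numeric fields.
function pearson(array $x, array $y): float {
    $n = count($x);
    $mx = array_sum($x) / $n;
    $my = array_sum($y) / $n;
    $num = $dx = $dy = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $num += ($x[$i] - $mx) * ($y[$i] - $my);
        $dx  += ($x[$i] - $mx) ** 2;
        $dy  += ($y[$i] - $my) ** 2;
    }
    return $num / sqrt($dx * $dy);
}

// Made-up fields: population density vs. monthly crime count per district.
$density = [1200, 3400, 560, 4100, 2300];
$crimes  = [30, 85, 12, 98, 55];
echo round(pearson($density, $crimes), 3);   // close to 1.0 -> strong correlation
?>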
[Figure: The data mining process — extraction of raw data from repositories, data cleansing and loading into the data mining database, data transformation, pattern discovery using clustering and classification algorithms, and visualization and interpretation of results.]
Data are any facts, numbers, or text that can be processed by a computer. Today,
organizations are accumulating vast and growing amounts of data in different formats and
different databases. This includes:
Operational or transactional data, such as sales, cost, inventory, payroll, and
accounting
Nonoperational data, such as industry sales, forecast data, and macroeconomic data
Meta data - data about the data itself, such as logical database design or data
dictionary definitions
Information
The patterns, associations, or relationships among all this data can provide
information. For example, analysis of retail point of sale transaction data can yield
information on which products are selling and when.
Knowledge
Information can be converted into knowledge about historical patterns and future
trends. For example, summary information on retail supermarket sales can be analyzed in
light of promotional efforts to provide knowledge of consumer buying behavior. Thus, a
manufacturer or retailer could determine which items are most susceptible to promotional
efforts.
Data Warehouses
In computing, a data warehouse (DW or DWH) is a database used for reporting and
data analysis. It is a central repository of data which is created by integrating data from
multiple disparate sources. Data warehouses store current as well as historical data and are
commonly used for creating trending reports for senior management reporting such as annual
and quarterly comparisons. The data stored in the warehouse are uploaded from the
operational systems (such as marketing and sales). The data
may pass through an operational data store for additional operations before they are used in
the DW for reporting. The typical ETL-based data warehouse uses staging, integration, and
access layers to house its key functions. The staging layer or staging database stores raw data
extracted from each of the disparate source data systems. The integration layer integrates the
disparate data sets by transforming the data from the staging layer, often storing this
transformed data in an operational data store (ODS) database.
A data warehouse constructed from integrated data source systems does not require
ETL, staging databases, or operational data store databases. The integrated data source
systems may be considered to be a part of a distributed operational data store layer. Data
federation methods or data virtualization methods may be used to access the distributed
integrated source data systems to consolidate and aggregate data directly into the data
warehouse database tables. Unlike the ETL-based data warehouse, the integrated source data
systems and the data warehouse are all integrated since there is no transformation of
dimensional or reference data. This integrated data warehouse architecture supports the drill
down from the aggregate data of the data warehouse to the transactional data of the integrated
source data systems.
Data warehouses can be subdivided into data marts. Data marts store subsets of data
from a warehouse. This definition of the data warehouse focuses on data storage. The main
source of the data is cleaned, transformed, cataloged and made available for use by managers
and other business professionals for data mining, online analytical processing, market
research and decision support. However, the means to retrieve and analyze data, to extract,
transform and load data, and to manage the data dictionary are also considered essential
components of a data warehousing system. Many references to data warehousing use this
broader context. Thus, an expanded definition for data warehousing includes business
intelligence tools, tools to extract, transform and load data into the repository, and tools to
manage and retrieve metadata. Dramatic advances in data capture, processing power, data
transmission, and storage capabilities are enabling organizations to integrate their various
databases into data warehouses. Data warehousing is defined as a process of centralized data
management and retrieval. Data warehousing, like data mining, is a relatively new term
although the concept itself has been around for years. Data warehousing represents an ideal
vision of maintaining a central repository of all organizational data. Centralization of data is
needed to maximize user access and analysis. Dramatic technological advances are making
this vision a reality for many companies. And, equally dramatic advances in data analysis
software are allowing users to access this data freely. The data analysis software is what
supports data mining. It enables these companies to determine relationships among "internal"
factors such as price, product positioning, or staff skills, and "external" factors such as
economic indicators, competition, and customer demographics. And, it enables them to
determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables
them to "drill down" into summary information to view detail transactional data.
Nearest neighbor method: A technique that classifies each record in a dataset based on
a combination of the classes of the k record(s) most similar to it in a historical dataset (where
k ≥ 1). Sometimes called the k-nearest neighbor technique.
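To make this concrete, here is a minimal sketch of the k-nearest neighbor idea in PHP; the points, labels, and the choice of Euclidean distance are illustrative assumptions, not part of the original method description.

<?php
// Minimal k-NN sketch: classify a query point by majority vote among its
// k nearest labeled neighbors, using Euclidean distance.
function knnClassify(array $train, array $labels, array $query, int $k): string {
    $dists = [];
    foreach ($train as $i => $point) {
        $sum = 0.0;
        foreach ($point as $j => $v) {
            $sum += ($v - $query[$j]) ** 2;
        }
        $dists[$i] = sqrt($sum);
    }
    asort($dists);                            // sort records by distance
    $votes = [];
    foreach (array_slice(array_keys($dists), 0, $k) as $i) {
        $votes[$labels[$i]] = ($votes[$labels[$i]] ?? 0) + 1;
    }
    arsort($votes);                           // majority class first
    return array_key_first($votes);
}

$train  = [[1, 1], [1, 2], [8, 8], [9, 8]];            // historical records
$labels = ['low-risk', 'low-risk', 'high-risk', 'high-risk'];
echo knnClassify($train, $labels, [8, 9], 3);          // prints "high-risk"
?>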
Rule induction: The extraction of useful if-then rules from data based on statistical
significance.
Association
Association is one of the best known data mining techniques. In association, a pattern
is discovered based on a relationship of a particular item to other items in the same
transaction. For example, the association technique is used in market basket analysis to
identify which products customers frequently purchase together. Based on this data,
businesses can run corresponding marketing campaigns to sell more products and make more
profit.
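A very small sketch of the underlying counting step is shown below; the transactions are invented, and a real association miner (e.g., Apriori) would also apply support and confidence thresholds, which this sketch omits.

<?php
// Illustrative market-basket step: count how often pairs of items
// appear together in the same transaction.
$transactions = [
    ['bread', 'milk'],
    ['bread', 'butter', 'milk'],
    ['beer', 'chips'],
    ['bread', 'milk'],
];
$pairCounts = [];
foreach ($transactions as $items) {
    sort($items);
    for ($i = 0; $i < count($items); $i++) {
        for ($j = $i + 1; $j < count($items); $j++) {
            $key = $items[$i] . '+' . $items[$j];
            $pairCounts[$key] = ($pairCounts[$key] ?? 0) + 1;
        }
    }
}
arsort($pairCounts);      // most frequent pair first
print_r($pairCounts);     // "bread+milk" appears in 3 of 4 transactions
?>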
[Figure: Data mining draws on database technology, statistics, information science, and other disciplines.]
Classification
Classification is a classic data mining technique based on machine learning. Basically,
classification is used to classify each item in a set of data into one of a predefined set of classes
or groups. Classification methods make use of mathematical techniques such as decision
trees, linear programming, neural networks, and statistics. In classification, we build software
that can learn how to classify data items into groups. For example, we can apply
classification in an application that, “given all past records of employees who left the company,
predicts which current employees are likely to leave in the future.” In this case, we divide the
employees' records into two groups: “leave” and “stay”.
Clustering
Clustering is a data mining technique that makes meaningful or useful clusters of
objects that have similar characteristics using an automatic technique. Different from
classification, the clustering technique also defines the classes and puts objects in them, while in
classification objects are assigned to predefined classes. To make the concept clearer, we can
take a library as an example. In a library, books on a wide range of topics are available. The
challenge is how to keep those books in a way that readers can take several books on a
specific topic without hassle.
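The sketch below shows one common automatic technique, k-means, in a deliberately simplified form; the 2-D points, the seeding of centroids from the first k points, and the fixed iteration count are all illustrative assumptions.

<?php
// Minimal k-means sketch: alternate between assigning each point to its
// nearest centroid and moving each centroid to the mean of its points.
function kMeans(array $points, int $k, int $iters = 10): array {
    $centroids = array_slice($points, 0, $k);    // naive seeding
    $assign = [];
    for ($t = 0; $t < $iters; $t++) {
        foreach ($points as $i => $p) {          // assignment step
            $best = 0;
            $bestDist = PHP_FLOAT_MAX;
            foreach ($centroids as $c => $centroid) {
                $d = ($p[0] - $centroid[0]) ** 2 + ($p[1] - $centroid[1]) ** 2;
                if ($d < $bestDist) { $bestDist = $d; $best = $c; }
            }
            $assign[$i] = $best;
        }
        foreach ($centroids as $c => $unused) {  // update step
            $sx = $sy = $n = 0;
            foreach ($points as $i => $p) {
                if ($assign[$i] === $c) { $sx += $p[0]; $sy += $p[1]; $n++; }
            }
            if ($n > 0) { $centroids[$c] = [$sx / $n, $sy / $n]; }
        }
    }
    return [$centroids, $assign];
}

$points = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]];
[$centroids, $assign] = kMeans($points, 2);
print_r($centroids);    // two centroids, near (1.3, 1.3) and (8.3, 8.3)
?>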
What is Web Mining
Web mining is the use of data mining techniques to automatically discover and extract
information from Web documents and services.
There are three general classes of information that can be discovered by web mining:
Web activity, from server logs and Web browser activity tracking.
Web graph, from links between pages, people and other data.
Web content, for the data found on Web pages and inside of documents.
At Scale Unlimited we focus on the last one – extracting value from web pages and
other documents found on the web.
Note that there’s no explicit reference to “search” in the above description. While
search is the biggest web miner by far, and generates the most revenue, there are many other
valuable end uses for web mining results. A partial list includes:
Business intelligence
Competitive intelligence
Pricing analysis
Events
Product data
Popularity
Reputation
When extracting Web content information using web mining, there are four typical steps.
When comparing web mining with traditional data mining, there are three main
differences to consider:
1. Scale – In traditional data mining, processing 1 million records from a database would
be a large job. In web mining, even 10 million pages wouldn’t be a big number.
2. Access – When doing data mining of corporate information, the data is private and
often requires access rights to read. For web mining, the data is public and rarely
requires access rights. But web mining has additional constraints, due to the implicit
agreement with webmasters regarding automated (non-user) access to this data. This
implicit agreement is that a webmaster allows crawlers access to useful data on the
website, and in return the crawler (a) promises not to overload the site, and (b) has the
potential to drive more traffic to the website once the search index is published. With
web mining, there often is no such index, which means the crawler has to be extra
careful/polite during the crawling process, to avoid causing any problems for the
webmaster.
3. Structure – A traditional data mining task gets information from a database, which
provides some level of explicit structure. A typical web mining task is processing
unstructured or semi-structured data from web pages. Even when the underlying
information for web pages comes from a database, this often is obscured by HTML
markup.
Note that by “traditional” data mining we mean the type of analysis supported by most
vendor tools, which assumes you’re processing table-oriented data that typically comes from
a database.
Sentiment Analysis
Sentiment analysis is a type of data mining that measures the inclination of people’s
opinions through natural language processing (NLP), computational linguistics and text
analysis, which are used to extract and analyze subjective information from the Web - mostly
social media and similar sources. The analyzed data quantifies the general public's sentiments
or reactions toward certain products, people or ideas and reveals the contextual polarity of the
information. Sentiment analysis is also known as opinion mining.
Sentiment analysis uses data mining processes and techniques to extract and capture data
for analysis in order to discern the subjective opinion of a document or collection of
documents, like blog posts, reviews, news articles and social media feeds like tweets and
status updates.
Sentiment analysis allows organizations to track public opinion across these sources.
With the recent advances in deep learning, the ability of algorithms to analyse text has
improved considerably. Creative use of advanced artificial intelligence techniques can be an
effective tool for doing in-depth research. We believe it is important to classify incoming
customer conversations about a brand along the following lines:
1. Key aspects of a brand’s product and service that customers care about.
These basic concepts when used in combination become a very important tool for
analyzing millions of brand conversations with human level accuracy. In the post, we take the
example of Uber and demonstrate how this works. Read On!
Text classifier - the basic building blocks
Sentiment Analysis
Sentiment Analysis is the most common text classification tool; it analyses an
incoming message and tells whether the underlying sentiment is positive, negative, or neutral.
You can input a sentence of your choice and gauge the underlying sentiment by playing with
the demo.
Intent Analysis
Intent analysis steps up the game by analyzing the user's intention behind a message
and identifying whether it relates to an opinion, news, marketing, a complaint, a suggestion,
appreciation, or a query.
Aspect Analysis
Now this is where things get really interesting. To derive actionable insights, it is
important to understand which aspect of the brand a user is discussing. For example,
Amazon would want to segregate messages related to late deliveries, billing issues,
promotion-related queries, product reviews, etc. On the other hand, Starbucks would want to
classify messages based on whether they relate to staff behaviour, new coffee flavours,
hygiene feedback, online orders, store name and location, etc. But how can one do that?
Literature Survey

Title: Forecasting crime using the ARIMA model
Author & Year: P. Chen, H. Yuan, and X. Shu; Fuzzy Systems and Knowledge Discovery (FSKD '08), Fifth International Conference on, vol. 5, 2008, pp. 627–630.
Description: In this paper, the ARIMA time series model is used to make short-term forecasts of property crime for one city of China.
Datasets: Crime prediction dataset in China.
Advantages: Reduces the crime rates.
Disadvantages: Maintenance is most difficult.
Application: Decreases the crime hotspot areas.
Future Scope: Accuracy level can be increased.
Title: Forecasting crimes using autoregressive models
Author & Year: E. Cesario, C. Catlett, and D. Talia; 2016 IEEE 2nd Int. Conf. on Big Data Intelligence and Computing and Cyber Science and Technology, 2016, pp. 795–802.
Description: Crime is undesired anti-social behavior and poses a serious threat to society. Civilized societies make everything possible to reduce crime within their regime of influence. Alerting crime-prone areas in advance is one of the best strategies for stopping crime before it happens.
Datasets: City crime datasets.
Advantages: Provides information about the city crimes.
Disadvantages: Information may vary.
Application: Used to reduce the crime presence in a specific area.
Future Scope: Areas can be covered with a perfect tracking objective.
Title: Predicting the future in time series using autoregressive linear regression modeling
Author & Year: S. Selva Priya, Lavanya Gupta; 9-11 Sept. 2015.
Description: In this paper, various prediction methods are compared based on their performance for an example time series, given the crime data and demographic features such as the sex ratio, population density, and religious composition of a region.
Datasets: City performance dataset.
Advantages: Maintains the entire information and data in a single click.
Disadvantages: Prediction level may be low.
Application: Helps make the city a crime-free one.
Future Scope: This paper delivers a method to predict the region-specific crime rate for the future.
Title: Forecasting Violent Extremist Cyber Recruitment
Author & Year: Jacob R. Scanlon, Matthew S. Gerber; IEEE Transactions on Information Forensics and Security, vol. 10, no. 11, Nov. 2015.
Description: This paper presents research on forecasting the daily level of cyber-recruitment activity of violent extremist (VE) groups, using a previously developed support vector machine model to identify recruitment posts within a Western jihadist discussion forum.
Dataset: The textual content of this data set was analyzed with latent Dirichlet allocation (LDA).
Advantages: Enhanced cyber security.
Disadvantages: Comparison level may be low.
Application: Builds in online crime predictions.
Future Scope: Can effectively analyze online crimes and social media frauds.
Title: Intelligent Crime Anomaly Detection in Smart Cities Using Deep Learning
Author & Year: Sharmila Chackravarthy, Steven Schmitt, Li Yang; 2018 IEEE 4th International Conference on Collaboration and Internet Computing (CIC), 18-20 Oct. 2018.
Description: The quick and accurate identification of criminal activity is paramount to securing any residence. With the rapid growth of smart cities, the integration of crime detection systems seeks to improve this security.
Dataset: Area and resident dataset.
Advantages: Secures residents with a crime-free environment.
Disadvantages: Unable to maintain large amounts of data.
Application: Can be applied as area-wise data.
Future Scope: Useful to maintain a metropolitan city with high security.
Title: Forecasting Cyberattacks as Time Series with Different Aggregation Granularity
Author & Year: Gordon Werner, Ahmet Okutan, Shanchieh Yang, Katie McConky; 2018 IEEE International Symposium on Technologies for Homeland Security (HST), 23-24 Oct. 2018.
Description: This paper investigates the use of Auto-Regressive Integrated Moving Average (ARIMA) models and Bayesian Networks (BN) to predict future cyber attack occurrences and intensities against target entities. In addition to incident count forecasting, categorical and binary occurrence metrics are proposed to represent volume forecasts to a victim.
Datasets: Bayesian Networks.
Advantages: Increases the efficiency of finding online fraudulent activities.
Disadvantages: Prediction and analysis may be slow and inefficient.
Application: Can be applied to entire online processes.
Future Scope: Cyber attacks can be minimized to an extent.
Title: Safe cities: a participatory sensing approach
Author & Year: Jaime Ballesteros, Mahmudur Rahman, Bogdan Carbunar, Naphtali Rishe; 22-25 Oct. 2012.
Description: This work takes steps toward implementing smart, safe cities by combining the use of personal devices and social networks to make users aware of the safety of their surroundings. Novel metrics are proposed to define location- and user-based safety values.
Dataset: Smart city crime prediction.
Advantages: Minimizes the crime activities in smart cities.
Disadvantages: Comparison and prediction are vague.
Application: Reduces the occurrence of crimes.
Future Scope: Evaluate the ability of forecasting techniques, including ARIMA and artificial neural networks (ANN), to predict future safety values.
Title: A Fast Density-Based Clustering Algorithm for Large Databases
Author & Year: Bing Liu; 2006 International Conference on Machine Learning and Cybernetics, 13-16 Aug. 2006.
Description: In this paper, a fast density-based clustering algorithm based on DBSCAN is presented. After sorting objects by a certain dimensional coordinate, the new algorithm orderly selects unlabelled points outside a core object's neighborhood as seeds to expand clusters, so that the execution frequency of region queries can be decreased.
Datasets: Density based on population.
Advantages: Increases the security measures in densely populated areas.
Disadvantages: Prediction level may be lowered due to the large amount of data.
Application: Can be applied in smart cities.
Future Scope: Performance and efficiency of smart cities can be improved.
Title: An improved sampling-based DBSCAN for large spatial databases
Author & Year: International Conference on Intelligent Sensing and Information Processing, 2004, Proceedings; 4-7 Jan. 2004.
Description: Spatial data clustering is one of the important data mining techniques for extracting knowledge from the large amounts of spatial data collected in various applications, such as remote sensing, GIS, computer cartography, environmental assessment and planning, etc.
Dataset: DBSCAN.
Advantages: Noise points can be reduced.
Disadvantages: Identification is a slow process.
Application: Can be applied to estimate remote sensing values in cyber networks and also in blockchain.
Future Scope: Increases security in networking and IoT.
Title: Internet of things technologies in smart cities
Author & Year: Nomusa Dlodlo, Oscar Gcaba, Andrew Smith; 2016 IST-Africa Week Conference, 11-13 May 2016.
Description: This paper is on potential smart city applications as applied to the domains of smart transport, smart tourism and recreation, smart health, ambient-assisted living, crime prevention and community safety, governance, monitoring and infrastructure, disaster management, environment management, refuse collection and sewer management, smart homes, and smart energy.
Datasets: JARVIS.
Advantages: Increases safety and environment management.
Disadvantages: Data collection is a difficult process.
Application: Provides detailed information and data about the environment and smart cities.
Future Scope: Efficiently improves the environment.
Title: A Data-Driven Approach for Spatio-Temporal Crime Predictions in Smart Cities
Author & Year: Charlie Catlett, Eugenio Cesario, Domenico Talia, Andrea Vinci; 2018 IEEE International Conference on Smart Computing (SMARTCOMP), 31 July 2018.
Description: This paper presents an approach based on spatial analysis and auto-regressive models to automatically detect high-risk crime regions in urban areas and reliably forecast crime trends in each region.
Datasets: Spatio-temporal.
Advantages: Prediction level is accurate and easy to obtain.
Disadvantages: Analyzing the dataset is a complicated task.
Application: Useful in predicting crime levels in smart cities in advance.
Future Scope: In future it can be extended to various fields like software, data markets, and so on.
Title: “How fear of crime affects needs for privacy & safety”: Acceptance of surveillance technologies in smart cities
Author & Year: Julia van Heek, Katrin Arning, Martina Ziefle; 2016 5th International Conference on Smart Cities and Green ICT Systems (SMARTGREENS), 23-25 April 2016.
Description: In this empirical study, users' perceptions of safety and privacy are explored in the context of surveillance systems in urban environments.
Dataset: Traffic analysis.
Advantages: Useful in analyzing and maintaining traffic and traffic-based data for smart cities.
Disadvantages: Chance of collapse in the data that has been provided.
Application: Traffic management.
Future Scope: Can be extensively implemented to cover entire smart cities and urban areas.
Title: Dynamic Network Model for Smart City Data-Loss Resilience, Case Study: City-to-City Network for Crime Analytics
Author & Year: Olivera Kotevska, A. Gilad Kusne, Daniel V. Samarov, Ahmed Lbath; IEEE Access, vol. 5, 12 October 2017.
Description: Today's cities generate tremendous amounts of data, thanks to a boom in affordable smart devices and sensors. The resulting big data creates opportunities to develop diverse sets of context-aware services and systems, ensuring smart city services are optimized to the dynamic city environment.
Dataset: Crime data analytics.
Advantages: Increases security and confidentiality.
Disadvantages: Chance of presence of irrelevant data.
Application: May be used to optimize the distribution of police, medical, and emergency services.
Future Scope: Data and information can be gained from various sources.
SYSTEM STUDY
Existing System
The design and implementation of an approach based on spatial analysis and auto-
regressive models to automatically detect high-risk crime regions in urban areas and to
reliably forecast crime trends in each region. The algorithm is composed of several steps.
First, high crime density areas (called crime dense regions or crime hotspots) are discovered
through a spatial analysis approach, where shapes of the detected regions are automatically
traced by the algorithm without any pre-fixed division in areas. Then, a specific crime
prediction model is discovered from each detected region, analyzing the partitions discovered
during the previous step. The final result of the algorithm is a spatiotemporal crime
forecasting model, composed of a set of crime dense regions and a set of associated crime
predictors, each one representing a predictive model to forecast the number of crimes that
will happen in its specific region. As a case study, we present here the analysis of crimes within
a big area of Chicago involving about two million crime events. Crime data has been
gathered from a Web framework that provides public access to more than one hundred urban
datasets. The results of the experimental evaluation show the effectiveness of the approach,
by achieving good accuracy in spatial and temporal crime forecasting over rolling time
horizons.
Disadvantages
Due to a lack of information gathering, taking specific action on occurring crime
sequences is a slow process.
Accuracy cannot be maintained due to extensive paper work and manual processing,
which creates an interrupted pattern in the system.
A major deficiency in the system is that targeted city crime predictions cannot be
maintained with better efficiency.
Proposed System
Crime has been prevalent in our society for a very long time and it continues to be so
even today. Currently, many cities have released crime-related data as part of an open data
initiative. Using this as input, we can apply analytics to be able to predict and hopefully
prevent crime in the future. In this work, we applied data analytics to the crime dataset, as
collected and available through the Open Data initiative. The main focus is to perform an in-
depth analysis of the major types of crimes that occurred in the city, observe the trend over
the years, and determine how various attributes contribute to specific crimes. Furthermore,
we leverage the results of the exploratory data analysis to inform the data preprocessing
process, prior to training various machine learning models for crime type prediction. More
specifically, the model predicts the type of crime that will occur in each district of the city.
We observe that the provided dataset is highly imbalanced; thus, metrics used in previous
research focus mainly on the majority class, disregarding the performance of the classifiers on
minority classes, and we propose a methodology to improve this issue. The proposed model finds
applications in resource allocation of law enforcement in a Smart City.
Advantages
Reduces the latency up to the core and increases the reliability of the system.
Hardware Requirements
The hardware requirements may serve as the basis for a contract for the
implementation of the system and should therefore be a complete and consistent specification
of the whole system. They are used by software engineers as the starting point for the system
design.
PHP - Overview
PHP is a recursive acronym for "PHP: Hypertext Preprocessor". PHP is a server side
scripting language that is embedded in HTML. It is used to manage dynamic content,
databases, session tracking, even build entire e-commerce sites. The PHP Hypertext
Preprocessor (PHP) is a programming language that allows web developers to create dynamic
content that interacts with databases. PHP is basically used for developing web based
software applications. This tutorial helps you to build your base with PHP.
PHP started out as a small open source project that evolved as more and more people
found out how useful it was. Rasmus Lerdorf unleashed the first version of PHP way back in
1994.
PHP is a must for students and working professionals to become great Software
Engineers, especially when they are working in the Web Development domain. I will list down
some of the key advantages of learning PHP:
Characteristics of PHP
Simplicity
Efficiency
Security
Flexibility
Familiarity
Just to give you a little excitement about PHP, I'm going to give you a small
conventional PHP Hello World program. You can try it using the demo link.
<html>
   <head>
      <title>Hello World</title>
   </head>
   <body>
      <?php echo "Hello, World!"; ?>
   </body>
</html>
Applications of PHP
As mentioned before, PHP is one of the most widely used languages on the web. I'm
going to list a few of its applications here:
PHP performs system functions, i.e. from files on a system it can create, open, read,
write, and close them.
PHP can handle forms, i.e. gather data from files, save data to a file, send data through
email, and return data to the user.
You can add, delete, and modify elements within your database through PHP, and
access and set cookie variables.
Using PHP, you can restrict users from accessing some pages of your website, and
encrypt data.
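As a small hedged illustration of the form-handling point, the page below reads one submitted field and appends it to a text file; the field name city and the file cities.log are invented for the example.

<?php
// Minimal form-handling sketch: read a submitted field and append it to a file.
if ($_SERVER["REQUEST_METHOD"] === "POST") {
    $city = htmlspecialchars($_POST["city"] ?? "");   // escape user input
    file_put_contents("cities.log", $city . PHP_EOL, FILE_APPEND);
    echo "Saved: " . $city;
}
?>
<form method="post">
   City: <input type="text" name="city">
   <input type="submit" value="Save">
</form>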
Architecture Overview
This section explains how all the different parts of the driver fit together. From the
different language runtimes, through the extension and to the PHP libraries on top. This new
architecture has replaced the old mongo extension. We refer to the new one as
the mongodb extension.
At the top of this stack sits a pure PHP library, which we will distribute as a
Composer package. This library will provide an API similar to what users have come to
expect from the old mongo driver (e.g. CRUD methods, database and collection objects,
command helpers) and we expect it to be a common dependency for most applications built
with MongoDB. This library will also implement common specifications, in the interest of
improving API consistency across all of the drivers maintained by MongoDB (and
hopefully some community drivers, too). Sitting below that library we have the lower level
driver. This extension will effectively form the glue between PHP and our system libraries.
This extension will expose an identical public API for the most essential and performance-
sensitive functionality:
Connection management
BSON encoding and decoding
Object document serialization (to support ODM libraries)
Executing commands and write operations
Handling queries and cursors
Prerequisites
Before proceeding with this tutorial you should have at least a basic understanding of
computer programming, the Internet, databases, and MySQL.
In order to develop and run PHP Web pages three vital components need to be
installed on your computer system.
Web Server − PHP will work with virtually all Web Server software, including
Microsoft's Internet Information Server (IIS), but the most often used is the freely available
Apache Server. Download Apache for free here − https://httpd.apache.org/download.cgi
Database − PHP will work with virtually all database software, including Oracle and
Sybase, but the most commonly used is the freely available MySQL database. Download MySQL for
free here − https://www.mysql.com/downloads/
PHP Parser − In order to process PHP script instructions, a parser must be installed to
generate HTML output that can be sent to the Web Browser. This tutorial will guide you on how
to install the PHP parser on your computer.
PHP Parser Installation
Before you proceed, it is important to make sure that you have a proper environment
setup on your machine to develop your web programs using PHP. Type the following address
into your browser's address box:
http://127.0.0.1/info.php
If this displays a page showing your PHP installation related information, then it
means you have PHP and the Web server installed properly. Otherwise, you have to follow the given
procedure to install PHP on your computer.
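For this check to work, a file named info.php must exist in the web server's document root; the conventional content, assumed here, is a single call to PHP's built-in phpinfo() function.

<?php
// info.php — renders the PHP configuration page used to verify the install.
phpinfo();
?>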
This section will guide you to install and configure PHP over the following four
platforms −
Apache Configuration
If you are using Apache as a Web Server then this section will guide you to edit
Apache Configuration Files.
The PHP configuration file, php.ini, is the final and most immediate way to affect
PHP's functionality.
To configure IIS on your Windows machine you can refer your IIS Reference Manual
shipped along with IIS.
The main way to store information in the middle of a PHP program is by using a
variable.
Here are the most important things to know about variables in PHP.
All variables in PHP are denoted with a leading dollar sign ($).
Variables are assigned with the = operator, with the variable on the left-hand side and
the expression to be evaluated on the right.
Variables in PHP do not have intrinsic types - a variable does not know in advance
whether it will be used to store a number or a string of characters.
PHP does a good job of automatically converting types from one to another when
necessary.
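The snippet below illustrates these points; the variable names and values are arbitrary examples.

<?php
// Variables start with $, are assigned with =, and have no intrinsic type.
$count = 42;                    // holds an integer
$count = "forty-two";           // the same variable may later hold a string
$total = $count . " items";     // "." concatenates; PHP converts types as needed
echo $total;                    // prints: forty-two items
?>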
PHP has a total of eight data types which we use to construct our variables −
Integers − whole numbers, without a decimal point.
Doubles − floating-point numbers.
Booleans − have only two possible values, true and false.
NULL − a special type that only has one value, NULL.
Strings − sequences of characters.
Arrays − named and indexed collections of other values.
Objects − instances of programmer-defined classes.
Resources − special variables that hold references to resources external to PHP
(such as database connections).
Conclusion
Learning a particular trick might have been a natural part of learning the whole subject,
requiring only five minutes of study in between learning many other tricks. In other words, a
developer who has to constantly seek out solutions to things he or she doesn't know will waste
far more time in aggregate than someone who mastered the subject as a whole and then went
on to apply it. You're just more relaxed and in a better learning mode when you're focused on
nothing but learning. But when you're focused on producing results and have to learn at the
same time, it can be stressful, and you can waste a lot of time going back and forth between
testing each of the tens of wrong solutions you're trying out and googling until you find the
right one.
MYSQL
MySQL is the most popular Open Source Relational SQL Database Management
System. MySQL is one of the best RDBMS being used for developing various web-based
software applications. MySQL is developed, marketed and supported by MySQL AB, which
is a Swedish company. This tutorial will give you a quick start to MySQL and make you
comfortable with MySQL programming.
MySQL is a fast, easy-to-use RDBMS used by many small and big businesses.
MySQL is becoming very popular for many good reasons −
MySQL is released under an open-source license. So you have nothing to pay to use
it.
MySQL is a very powerful program in its own right. It handles a large subset of the
functionality of the most expensive and powerful database packages.
MySQL works on many operating systems and with many languages including PHP,
PERL, C, C++, JAVA, etc.
MySQL works very quickly and works well even with large data sets.
MySQL is very friendly to PHP, the most appreciated language for web development.
MYSQL Functions
Here is the list of all the important MySQL functions. Each function has been explained
along with a suitable example.
MySQL Group By Clause − The MySQL GROUP BY statement is used along with
the SQL aggregate functions like SUM to provide means of grouping the result
dataset by certain database table column(s).
MySQL MAX Function − The MySQL MAX aggregate function allows us to select
the highest (maximum) value for a certain column.
MySQL MIN Function − The MySQL MIN aggregate function allows us to select the
lowest (minimum) value for a certain column.
MySQL SUM Function − The MySQL SUM aggregate function allows selecting the
total for a numeric column.
MySQL CONCAT Function − This is used to concatenate any string inside any
MySQL command.
MySQL DATE and Time Functions − Complete list of MySQL Date and Time
related functions.
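To show how these functions combine with PHP in practice, here is a hedged sketch that counts crimes per district with GROUP BY; the crimes table, its columns, and the connection credentials are invented for the example.

<?php
// Hypothetical "crimes" table with columns: district, crime_type, reported_on.
$db = new mysqli("localhost", "user", "password", "crime_db");

// GROUP BY with an aggregate function: count crimes per district.
$result = $db->query(
    "SELECT district, COUNT(*) AS total FROM crimes GROUP BY district"
);
while ($row = $result->fetch_assoc()) {
    echo $row["district"] . ": " . $row["total"] . "\n";
}
$db->close();
?>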
Angular JS - Overview
General Features
AngularJS is an efficient framework that can create Rich Internet Applications (RIA).
AngularJS provides developers the option to write client side applications using
JavaScript in a clean Model View Controller (MVC) way.
Core Features
Scope − These are objects that refer to the model. They act as the glue between
controller and view.
Services − AngularJS comes with several built-in services, such as $http to make
XMLHttpRequests. These are singleton objects which are instantiated only once in the
app.
Filters − These select a subset of items from an array and return a new array.
Templates − These are the rendered view with information from the controller and
model. These can be a single file (such as index.html) or multiple views in one page
using partials.
Deep Linking − Deep linking allows you to encode the state of the application in the URL so
that it can be bookmarked. The application can then be restored from the URL to the
same state.
Advantages of AngularJS
It provides the capability to create Single Page Application in a very clean and
maintainable way.
It provides data binding capability to HTML. Thus, it gives the user a rich and responsive
experience.
With AngularJS, developers can achieve more functionality with less code.
In AngularJS, views are pure html pages, and controllers written in JavaScript do the
business processing.
On the top of everything, AngularJS applications can run on all major browsers and
smart phones, including Android and iOS based phones/tablets.
Disadvantages of AngularJS
Though AngularJS comes with a lot of merits, here are some points of concern −
Not Secure − Being a JavaScript-only framework, applications written in AngularJS are
not safe. Server side authentication and authorization are a must to keep an application
secure.
Not degradable − If the user of your application disables JavaScript, then nothing
would be visible, except the basic page.
AngularJS Directives
ng-model − This directive binds the values of AngularJS application data to HTML
input controls.
ng-bind − This directive binds the AngularJS application data to HTML tags.
WAMP - Overview
WAMP is an acronym that stands for Windows, Apache, MySQL, and PHP. It’s a
software stack which means installing WAMP installs Apache, MySQL, and PHP on your
operating system (Windows in the case of WAMP). Even though you can install them
separately, they are usually bundled up, and for a good reason too.
What’s good to know is that WAMP derives from LAMP (the L stands for Linux).
The only difference between these two is that WAMP is used for Windows, while LAMP –
for Linux based operating systems.
Let’s quickly go over what each letter represents. “W” stands for Windows; there’s
also LAMP (for Linux) and MAMP (for Mac). “A” stands for Apache. Apache is the server
software that is responsible for serving web pages. When you request a page to be seen by
you, Apache grants your request over HTTP and shows you the site. “M” stands for MySQL.
MySQL’s job is to be the database management system for your server. It stores all of the
relevant information like your site’s content, user profiles, etc. “P” stands for PHP. It’s the
programming language that was used to write WordPress. It acts like glue for this whole
software stack. PHP is running in conjunction with Apache and communicating with
MySQL.
WAMP acts like a virtual server on your computer. It allows you to test all WordPress
features without any consequences since it’s localized on your machine and is not connected
to the web.
First of all, this means that you don’t need to wait until files are uploaded to your site,
and secondly – this makes creating backups much easier.
WAMP speeds up the work process for both developers and theme designers alike.
What is more, you also get the benefit of playing around with your site to your heart’s
content. However, to actually make the website go live, you need to get some form of hosting
service and a Domain. See our beginner-friendly article about web hosting for more
information. In essence, WAMP is used as a safe space to work on your website, without
needing to actually host it online. WAMP also has a control panel. Once you install the
software package, all of the services mentioned above (excluding the operating system that
is) will be installed on your local machine. Whether you use WAMP or software packages for
the other operating systems, it’s a great way to save time. You won’t have to upload files to a
site and will be able to learn how to develop in a safe and care-free environment.
Architecture Diagram
Feature Extraction
Initially we collect the crime dataset of a specific city and preprocess the data related
to the dataset. The data and information contained in the dataset are analyzed and
preprocessed with various algorithms. The preprocessed data are then extracted according to
the features to which they are related.
During the operation, the input is the dataset of crime data of a specific city; from that
dataset the data are preprocessed with the salient features of the algorithms. The preprocessed
data are extracted, and the current crime occurrences are compared with the crime
occurrences in the dataset, so that the details can be detected in an efficient way. The detected
result is analyzed for proper results, and in case of any errors or defects the same procedure is
repeated to attain perfect results and evaluation. The data are compared with the help of the
Support Vector Machine (SVM) algorithm, which classifies the given data and compares
them for proper predictions.
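As a hedged illustration of the SVM step, the sketch below trains a linear SVM with stochastic sub-gradient descent (a Pegasos-style update); the two features, the labels, the hyper-parameters, and the absence of a bias term are all simplifying assumptions, not the project's actual implementation.

<?php
// Linear SVM sketch trained with stochastic sub-gradient descent
// (Pegasos-style): shrink weights for regularization, then apply a
// hinge-loss update whenever the margin is violated.
function dot(array $a, array $b): float {
    $s = 0.0;
    foreach ($a as $j => $v) { $s += $v * $b[$j]; }
    return $s;
}

function trainLinearSvm(array $X, array $y, float $lambda = 0.01, int $epochs = 100): array {
    $w = array_fill(0, count($X[0]), 0.0);
    $t = 0;
    for ($e = 0; $e < $epochs; $e++) {
        foreach ($X as $i => $x) {
            $t++;
            $eta = 1.0 / ($lambda * $t);                 // decaying learning rate
            $margin = $y[$i] * dot($w, $x);
            foreach ($w as $j => $wj) {
                $w[$j] = (1 - $eta * $lambda) * $wj;     // regularization shrink
            }
            if ($margin < 1) {                           // hinge-loss update
                foreach ($x as $j => $xj) {
                    $w[$j] += $eta * $y[$i] * $xj;
                }
            }
        }
    }
    return $w;
}

// Toy features per area: [crimes last month, distance to nearest hotspot].
$X = [[9, 0.5], [8, 1.0], [1, 5.0], [2, 6.0]];
$y = [1, 1, -1, -1];                                     // +1 = high risk
$w = trainLinearSvm($X, $y);
echo dot($w, [7, 1.2]) > 0 ? "high risk\n" : "low risk\n";   // expected: high risk
?>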
After comparing the data with the help of SVM, the proper result is gathered, and the
gathered data are clustered using the K-Means clustering algorithm to obtain an accurate
result for the entire system. K-Means clustering provides the accurate values needed to attain
the results.
The data can be updated in case of newly occurring crimes, and those data are added
to the system, which enhances the efficiency of the system enormously. As a result, the new
crime data are added to the previous dataset containing the list of crime events of the cities,
and the entire data can be compared and clustered with the SVM and K-Means clustering
algorithms to retrieve better results. This process repeats continuously, so that every new
crime event and its related contents are added to the dataset for better future performance.
After analyzing and extracting the data, we can obtain clear information about the
crime events of the city. This also provides an efficient crime prediction result, and using that
information the occurrence of crime events can be detected or greatly reduced. In real life it
can be implemented in future smart cities, keeping crime rates under control or reducing
them.
Dataflow Diagram
[Level 0: Admin → Crime Prediction → Database.]
[Level 1: Admin → Acquire Dataset → Preprocess Data → Analyze Information → Data Segmentation → Database.]
[Level 2: SVM Classifier → Data Retrieval → Feature Extraction.]
UML Diagram
The Unified Modeling Language (UML) is a general-purpose, developmental,
modeling language in the field of software engineering that is intended to provide a standard
way to visualize the design of a system.
A use case diagram at its simplest is a representation of a user's interaction with the
system that shows the relationship between the user and the different use cases in which the
user is involved.
[Use case diagram: Acquire Dataset, Retrieve Data, SVM Classifier, Classification of Data, Feature Extraction.]
Class Diagram
The class diagram is the main building block of object-oriented modelling. It is used
for general conceptual modelling of the systematics of the application, and for detailed
modelling, translating the models into programming code. Class diagrams can also be used for
data modelling.
In the diagram, classes are represented with boxes that contain three compartments:
The top compartment contains the name of the class. It is printed in bold and centered,
and the first letter is capitalized.
The middle compartment contains the attributes of the class. They are left-aligned and
the first letter is lowercase.
The bottom compartment contains the operations the class can execute. They are also
left-aligned and the first letter is lowercase.
In the design of a system, a number of classes are identified and grouped together in a
class diagram that helps to determine the static relations between them. With detailed
modelling, the classes of the conceptual design are often split into a number of subclasses. In
order to further describe the behaviour of systems, these class diagrams can be complemented
by a state diagram or UML state machine.
[Class diagram: TrainingPhase (acquire, retrieve, preprocess, extraction), TestingPhase (acquire, retrieve, preprocess, segmentation, classify, predict, extraction), and Database Server (store data(), view data()).]
Activity Diagram
An activity diagram visually presents a series of actions or flow of control in a system
similar to a flowchart or a data flow diagram. Activity diagrams are often used in business
process modelling. They can also describe the steps in a use case diagram. Activities
modelled can be sequential and concurrent.
The purpose of activity diagrams: the basic purpose of activity diagrams is similar to
that of the other four diagrams. They capture the dynamic behaviour of the system. The other
four diagrams are used to show the message flow from one object to another, but the activity
diagram is used to show the message flow from one activity to another. Activity diagrams are
constructed from a limited number of shapes connected with arrows.
[Activity diagram, training and testing phases: acquire dataset → preprocess retrieved data → crime data segmentation → SVM classifier → classification of data → predict accuracy using K-Means → feature extraction.]
Sequence Diagram
The sequence diagram is a good diagram to use to document a system's requirements
and to flesh out a system's design. The sequence diagram is so useful because it shows the
interaction logic between the objects in the system in the time order in which the interactions
take place.
A sequence diagram shows object interactions arranged in time sequence. It depicts
the objects and classes involved in the scenario and the sequence of messages exchanged
between the objects needed to carry out the functionality of the scenario. A sequence diagram
shows, as parallel vertical lines (lifelines), different processes or objects that live
simultaneously, and, as horizontal arrows, the messages exchanged between them, in the
order in which they occur. This allows the specification of simple runtime scenarios in a
graphical manner.
[Sequence diagram messages: 1: Provide Dataset; 2: Fetch Data; 3: Acquire Data; 6: SVM Classifier; 7: Data Classification.]
Collaboration Diagram
Collaboration diagrams require use cases, system operation contracts, and a domain
model to already exist. The collaboration diagram illustrates messages being sent between
classes and objects (instances). A collaboration diagram shows the objects and relationships
involved in an interaction, and the sequence of messages exchanged among the objects during
the interaction. Communication diagrams model the interactions between objects in sequence.
They describe both the static structure and the dynamic behaviour of a system. In many ways,
a communication diagram is a simplified version of the collaboration diagram introduced in
UML 2.0.
[Collaboration diagram messages between the Training Phase, Data Server, and Testing Phase: 1: Provide Dataset; 2: Fetch Data; 3: Acquire Data; 4: Preprocess Acquired Data; 5: Crime Data Segmentation; 6: SVM Classifier; 7: Data Classification; 8: Predict Accuracy Using K-Means Clustering; 9: Predict Accuracy Using K-Means Clustering; 10: Feature Extraction.]
SYSTEM IMPLEMENTATION
Modules
Datasets acquisition
Preprocessing
Classification
Feature Extraction
Prediction
Evaluation criteria
Module Description
Data Acquisition
Initially we collect the crime dataset of a specific city and preprocess the data related
to the dataset. The data and information contained in the dataset are analyzed and
preprocessed with various algorithms. The preprocessed data are extracted according to the
features to which they are related.
During the operation, the input is the dataset of crime data of a specific city; from that
dataset the data are preprocessed with the salient features of the algorithms.
Preprocessing
The preprocessed data are extracted, and the current crime occurrences are compared
with the crime occurrences in the dataset, so that the details can be detected in an efficient
way. The detected result is analyzed for proper results, and in case of any errors or defects the
same procedure is repeated to attain perfect results and evaluation.
Classification
The data are compared with the help of the Support Vector Machine (SVM)
algorithm, which classifies the given data and compares them for proper predictions. After
comparing the data with the help of SVM, the proper result is gathered, and the gathered data
are clustered using the K-Means clustering algorithm to obtain an accurate result for the
entire system.
Segmentation
The data are compared with the help of the Support Vector Machine (SVM)
algorithm, which classifies the given data and compares them for proper predictions. K-Means
clustering provides the accurate values needed to attain the results. The data can be updated in
case of newly occurring crimes, and those data are added to the system, which enhances the
efficiency of the system enormously. As a result, the new crime data are added to the previous
dataset containing the list of crime events of the cities, and the entire data can be compared
and clustered with the SVM and K-Means clustering algorithms to retrieve better results.
Classification
This step performs a spatial clustering of the data set, where each cluster represents a dense
region of crimes. The density-based notion is a common approach to clustering, whose
inspiring idea is that objects forming a dense region should be grouped together into one
cluster. In our implementation, this step is performed by applying a popular density-based
clustering algorithm that finds clusters starting from the estimated density distribution of the
considered data. We have chosen this algorithm because it has the ability to discover
clusters with arbitrary shapes (linear, concave, oval, etc.) and, differently from other proposed
clustering algorithms, it does not require the predetermination of the number of clusters to be
discovered.
Feature Extraction
Given a specific dense region, the CRIMEPREDICTOR method discovers a
predictive model to forecast the number of crimes that will happen in its specific area. In our
implementation, this has been performed by the Seasonal Auto Regressive Integrated Moving
Average (SARIMA) model, which is defined as a combination of auto-regression, moving
average, and difference modeling. Briefly, given the time series $y_t$, which is the value of
the series at timestamp $t$, an ARIMA model can be written as
$y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t$
where the $\phi_i$ and $\theta_j$ are the regression coefficients of the auto-regressive and
moving average parts, $y_{t-i}$ and $\varepsilon_{t-j}$ are lagged values and lagged errors,
and $\varepsilon_t$ is the white noise that takes into account the forecast error.
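As a hedged, much-simplified illustration of this kind of model, the sketch below fits a plain AR(1) model (one lag, no differencing or seasonal terms) by least squares and makes a one-step forecast; the monthly crime counts are invented.

<?php
// Fit y_t = c + phi * y_{t-1} by ordinary least squares, then forecast one step.
function fitAr1(array $y): array {
    $n = count($y) - 1;                       // number of (y_{t-1}, y_t) pairs
    $mx = $my = 0.0;
    for ($t = 1; $t <= $n; $t++) { $mx += $y[$t - 1]; $my += $y[$t]; }
    $mx /= $n;
    $my /= $n;
    $num = $den = 0.0;
    for ($t = 1; $t <= $n; $t++) {
        $num += ($y[$t - 1] - $mx) * ($y[$t] - $my);
        $den += ($y[$t - 1] - $mx) ** 2;
    }
    $phi = $num / $den;                       // auto-regressive coefficient
    $c = $my - $phi * $mx;                    // intercept
    return [$c, $phi];
}

$counts = [120, 135, 128, 140, 150, 143, 155];    // monthly crime counts (made up)
[$c, $phi] = fitAr1($counts);
echo "next month forecast: " . round($c + $phi * end($counts)) . "\n";
?>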
Prediction
As described, crime dense regions are detected by applying our ad-hoc modified
version of SVM, which exploits a decay factor that gives a higher weight to recent crime
events than to older ones. Moreover, in order to detect high quality crime dense regions, it is
necessary to profitably tune the key parameters of the algorithm so as to improve the results'
performance. In particular, the values of the SVM's parameters determine the size of the
clusters, as they represent the minimum crime density required for an area to be part of a
cluster. On the one hand, lowering the values enlarges the extension of the dense regions
detected: this results in the discovery of large regions that actually are no longer dense. On
the other hand, raising the values shrinks the cluster sizes, resulting in a high number of dense
regions that could be insignificant for the analysis; growing the values further increases the
fragmentation of the produced clustering assignment. These values are therefore a key factor
for the accuracy of the dense region detection phase and for the right balance among
separability, compactness, and significance of clusters.
Evaluation Criteria
To evaluate the performance and the effectiveness of the approach that has been
described in the paper, we carried out an experimental evaluation by analyzing crimes
occurring in a big area of the city. The main goal consists in detecting the most significant
crime dense regions and discovering a predictive model for each one to forecast the number
of crimes that will happen in the future in each area. In the following sub-sections we
describe the main issues of our analysis: data description and gathering, crime dense region
detection, the regressive model training for each region, and the evaluation of the model on
the test set. We report the values of K-Means clustering for the whole area and the top three
largest crime dense regions, considering one-year-ahead, two-year-ahead, and three-year-ahead
prediction horizons.
SYSTEM TESTING
Test Case
File level deduplication saves a relatively large amount of storage space. In general, file
level deduplication inspects multiple copies of the same file: it stores the first file and then links
other references to that first file, so only one copy is stored. In testing, even when file names
are the same, the system is able to detect deduplication. If we upload the same file using
different names, it inspects only the content and not the names. Thus redundant data is avoided.
In the registration phase, a user may not have registered before and types in their
information. If the user is a new user, an alert message is displayed stating that the user has
not registered before.
System Testing
It is done by putting the software in different environments and checking that it still
works. System testing here is done by uploading the same file to the cloud and checking
whether any duplicate file exists.
Software Testing
It is the process of evaluating a software item to detect differences between given
input and expected output, and to assess the features of the software item. Testing assesses the
quality of the product. It is a process that should be done during the development process. In
other words, software testing is a verification and validation process.
Verification
Verification is the process to make sure the product satisfies the conditions imposed at
the start of the development phase. In other words, to make sure the product behaves the way
we want it to.
Validation
Validation is the process to make sure the product satisfies the specified requirements
at the end of the development phase. In other words, to make sure the product is built as per
customer requirements.
Black Box Testing
Black box testing is testing which ignores the internal mechanism of the system and
focuses on the output generated against any input and the execution of the system. It is done
for validation. Here, it is done to check encryption and decryption after uploading a file into
the cloud.
Future Work
In future work, other research issues may be investigated. First, we may further
explore the application of other spatial analysis approaches for the detection of crime dense
regions and for modeling and forecasting crime trends in such regions. Second, we may
perform an extended experimental evaluation on a wider urban territory, to assess the results
obtained in the case study reported here. Third, we may apply such an approach to the spatio-
temporal prediction of other kinds of events, different from crimes.
References
[1] United Nations Human Settlements Programme, The State of the World's Cities 2004/2005:
Globalization and Urban Culture. Earthscan, 2004.
[2] “Cities: The century of the city,” Nature, vol. 467, pp. 900–901, 2010.
[3] F. Cicirelli, A. Guerrieri, G. Spezzano, and A. Vinci, “An edge-based platform for
dynamic smart city applications,” Future Generation Comp. Syst., vol. 76, 2017.
[4] M. Tayebi, M. Ester, U. Glasser, and P. Brantingham, “Crimetracer: Activity space based
crime location prediction,” in Advances in Social Networks Analysis and Mining (ASONAM),
2014 IEEE/ACM International Conference on, 2014, pp. 472–480.
[5] H. Wang, D. Kifer, C. Graif, and Z. Li, “Crime rate inference with big data,” in
Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, ser. KDD ’16. ACM, 2016, pp. 635–644.
[7] T. Wang, C. Rudin, D. Wagner, and R. Sevieri, “Learning to detect patterns of crime,” in
Machine Learning and Knowledge Discovery in Databases - European Conference, ECML
PKDD 2013, 2013.
[8] D. E. Brown and S. Hagen, “Data association methods with applications to law
enforcement,” Decision Support Systems, vol. 34, no. 4, pp. 369– 378, 2003.
[11] B. Chandra, M. Gupta, and M. Gupta, “A multivariate time series clustering approach
for crime trends prediction,” in Systems, Man and Cybernetics, 2008. SMC 2008. IEEE
International Conference on, 2008, pp. 892–896.
[12] S. V. Nath, “Crime pattern detection using data mining,” in Web Intelligence and
Intelligent Agent Technology Workshops, 2006. WIIAT 2006 Workshops. 2006
IEEE/WIC/ACM International Conference on, 2006, pp. 41–44.
[13] H. Chen, W. Chung, J. Xu, G. Wang, Y. Qin, and M. Chau, “Crime data mining: a
general framework and some examples,” Computer, vol. 37, no. 4, pp. 50–56, 2004.
[14] C.-H. Yu, M. Ward, M. Morabito, and W. Ding, “Crime forecasting using data mining
techniques,” in Data Mining Workshops (ICDMW), 2011 IEEE 11th International
Conference on, 2011, pp. 779–786.
[15] Y. Zhuang, M. Almeida, M. Morabito, and W. Ding, “Crime hot spot forecasting: A
recurrent model with spatial and temporal information,” in 2017 IEEE International
Conference on Big Knowledge (ICBK), Aug 2017, pp. 143–150.
[16] H. Wang, D. Kifer, C. Graif, and Z. Li, “Crime rate inference with big data,” in
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining. ACM, 2016, pp. 635–644.
[18] P. Chen, H. Yuan, and X. Shu, “Forecasting crime using the arima model,” in Fuzzy
Systems and Knowledge Discovery, 2008. FSKD ’08. Fifth International Conference on, vol.
5, 2008, pp. 627–630.
[19] E. Cesario, C. Catlett, and D. Talia, “Forecasting crimes using autoregressive models,”
in 2016 IEEE 2nd Int. Conf. on Big Data Intelligence and Computing and Cyber Science and
Technology, 2016, pp. 795–802.
[20] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for
discovering clusters in large spatial databases with noise,” in Proceedings of the Second
International Conference on Knowledge Discovery and Data Mining, ser. KDD’96. AAAI
Press, 1996.
2. Studying the base paper and proposed features — 2nd and 3rd week of January 2019
Total Expected Budget: 15,100 (Fifteen Thousand and One Hundred only)