KNOWLEDGE DISCOVERY
IN CYBERSPACE
No part of this digital document may be reproduced, stored in a retrieval system or transmitted in any form or
by any means. The publisher has taken reasonable care in the preparation of this digital document, but makes no
expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No
liability is assumed for incidental or consequential damages in connection with or arising out of information
contained herein. This digital document is sold with the clear understanding that the publisher is not engaged in
rendering legal, medical or any other professional services.
CYBERCRIME AND
CYBERSECURITY RESEARCH
KNOWLEDGE DISCOVERY
IN CYBERSPACE
KRISTIJAN KUK
AND
DRAGAN RANĐELOVIĆ
EDITORS
New York
Copyright © 2017 by Nova Science Publishers, Inc.
All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted
in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying,
recording or otherwise without the written permission of the Publisher.
We have partnered with Copyright Clearance Center to make it easy for you to obtain permissions to
reuse content from this publication. Simply navigate to this publication’s page on Nova’s website and
locate the “Get Permission” button below the title description. This button is linked directly to the
title’s permission page on copyright.com. Alternatively, you can visit copyright.com and search by
title, ISBN, or ISSN.
For further questions about using the service on copyright.com, please contact:
Copyright Clearance Center
Phone: +1-(978) 750-8400 Fax: +1-(978) 750-4470 E-mail: info@copyright.com.
Independent verification should be sought for any data, advice or recommendations contained in this
book. In addition, no responsibility is assumed by the publisher for any injury and/or damage to
persons or property arising from any methods, products, instructions, ideas or otherwise contained in
this publication.
This publication is designed to provide accurate and authoritative information with regard to the subject
matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in
rendering legal or any other professional services. If legal or any other expert assistance is required, the
services of a competent person should be sought. FROM A DECLARATION OF PARTICIPANTS
JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A
COMMITTEE OF PUBLISHERS.
Additional color graphics may be available in the e-book version of this book.
Preface
Chapter 1 Computer-Based Data Analysis Techniques: The Potential Application to Crime Investigation in Cyber Space
D. Marinković and T. Civelek
Chapter 2 Spatial Data Visualization as a Tool for Analytical Support of Police Work
N. Milić, B. Popović, V. Ilijazi and E. Ilijazi
Chapter 3 Cybercrime Influence on Personal, National and International Security while Using the Internet
I. Cvetanoski, J. Achkoski, D. Rančić and R. Stainov
Chapter 4 Some Aspects of the Application of Benford’s Law in the Analysis of the Data Set Anomalies
D. Joksimović, G. Knežević, V. Pavlović, M. Ljubić and V. Surovy
Chapter 5 Behaviour and Attitudes vs. Privacy Concerns of Social Online Networks
G. Savić and M. Kuzmanović
The goal of the algorithms used is to reveal (often ‘hidden’) tendencies, as
well as internet identity, political opinion, music preferences, religion, etc.
However, the Web, especially social media, is an invaluable source of data
and often the first place crime investigators turn to in order to obtain
relevant information. Automatic data search and matching is a powerful tool
enabling fast and efficient searching of large databases by crime
investigators. In their experiment, the authors tried to establish how efficient
the aforementioned apriori algorithm is in blog analysis when association
rules are utilized.
Chapter 2, by Milić, N. et al., presents some of the capabilities of GIS
technology in the function of analytical support of police work at all levels of
police organization and management. In addition to the visualization of
geospatial data, GIS technology provides analytical capacity, primarily in
analyzing the geospatial distribution of crime incidents. Compared with textual
crime reports (bulletins), crime maps inform law enforcement officers much
faster and more easily about the spatial distribution of crime. The innovative
functionalities of predictive analytics solutions are briefly described by the authors.
The aim of Chapter 3, by Cvetanoski, I. et al., is to stress the danger of
cybercrime activities in cyberspace and their impact on personal, national and
international security. Today, modern technology offers ample opportunity to
use online tools for performing cybercrime activities, which means that
anyone can create malicious software for criminal activities in cyberspace. The
authors used simple linear regression in order to predict future computer
crime. The results of their research show that a simple linear
regression model can be used to predict computer crime a
year or two into the future, but it is not a good model for predictions very
far into the future.
In Chapter 4, Joksimović et al. present the generally accepted theoretical
analysis and assumptions regarding the implementation of Benford's Law.
The implementation of this law in the analysis of anomalies in
numerical data in various scientific disciplines is also part of this chapter. The
authors show how the combined use of Benford's Law and specific laws of
mathematical statistics successfully detects potential irregularities in
numerical data and leads the forensic analyst forward in the
detection of potential fraud.
The next chapter (Chap. 5, by Savić, G. and Kuzmanović, M.)
investigates the relationship between the concerns of Online Social
Network (OSN) users and their behavior and attitudes towards privacy.
The behavior is investigated through groups of questions
concerning the habits of using OSNs, leaving real personal data or connecting
with unknown people. In this chapter, the authors focus on the relationship
between privacy concerns and the actual behavior of online social network
users in Serbia.
Text analysis and classification techniques might be used to improve the
efficiency and effectiveness of e-government services, especially those
provided by law enforcement agencies. Using techniques for the automatic
analysis of text reports, Nikolić, V. et al. (Chap. 6) propose concepts for representing
documents via conceptual schemas. The authors present data mining and
the Lucene library architecture, as well as the core of Lucene, and then the
possibilities of its application. Their case study deals with the possibilities of
Lucene indexing and searching of data and documents within
unstructured crime text documents in the Serbian language.
The final chapter, by Jevremović, A. et al., presents a model for the
development of a custom system for secure communication based on
custom encryption algorithms. The authors discuss key issues related to the
development of mobile devices for secure communication based on the
Android platform. The chapter also analyzes the choice between processing
encryption systems and absolutely secured encryption systems. The
solution presented in the chapter proposes building a custom algorithm in
the form of Linux kernel modules.
We would like to express our special thanks to the eight reviewers who
participated in the peer review process:
Chapter 1
COMPUTER-BASED DATA
ANALYSIS TECHNIQUES:
THE POTENTIAL APPLICATION TO
CRIME INVESTIGATION IN CYBER SPACE
ABSTRACT
The collection of a wide variety of information about individuals and its
storage in different databases are a reality of contemporary
society. The growth in the quantity of information has exceeded man’s
power to process and analyze it in a traditional manner, making
computerized techniques and means, especially data mining techniques, a
necessity for these purposes. Although widely applied in the domains of public
administration and the economy, computerized data search and
comparison have so far not reached their full potential in the area of
crime investigation and forensics. Law enforcement agencies and forensic
laboratories collect large quantities of various data originating not only
* Corresponding author: D. Marinkovic, Email: darko.marinkovic@kpa.edu.rs.
2 Darko Marinković and Turhan Civelek
Keywords: data analysis, data surveillance, data mining, cyber space, crime
investigation
INTRODUCTION
It is a general view today that the effective organization of human society
inevitably relies on collecting and managing a wide variety of data about
its members. The efficient functioning of the governmental as well as the
non-governmental sector requires the existence of numerous information registers
about different entities (individuals, organizations, etc.), covering all aspects
of their activities. The use of computer-based information systems has
greatly increased the possibilities for collecting, processing and analyzing data
for different purposes, including surveillance of individuals and their
behaviour. The essential importance of computer storage and processing of
information lies not only in the speed of carrying out various operations, but
primarily in the possibility of accessing integrated, mutually linked data
coming from different sources. State-of-the-art information technology
makes it possible to retrieve these data from different networked databases in a
split second.
The collection and storage (naturally, legal and legitimate) of different kinds of
information about individuals is a reality of modern society, as is
the fact that the persons these data refer to cannot have absolute power
over them. However, they have the right to feel secure from possible misuse
of these data. This is why the issue of personal data protection is
emphasized even more today, being particularly prominent in the functioning and
activities of state administration institutions and the judiciary,
including the police. Accordingly, the rights of individuals over their personal data
must be subject to certain limitations in the general interest, in the same
or a similar way as their other freedoms and rights are limited. The task of
legal science, law-makers and legal practice is to define standard
foundations for the collection and management of a wide variety of data, i.e., the
conditions under which they can be used for socially justified purposes. On the
other hand, the actual (primarily technical) possibilities for more
comprehensive, complex and sophisticated exploitation of
personal data, including data on individual activities in all fields of life and
work, are increasing from day to day. Among other things, the exploitation of such
data can yield good results in fighting crime as well.
The explosive growth in the quantity of data and of the databases in which they are stored
has exceeded man’s power to process and analyze them by traditional means,
requiring new and different (naturally, computer-based) analysis techniques
and means. Regardless of the purpose they are used for, automatic data search
and comparison are based, on the one hand, on the databases where certain data are stored
and, on the other hand, on the application of computers
(understood as hardware) and related programs (software) used for the search,
comparison and analysis of these data.
Pointing to the hidden patterns is useful for crime analysis, but in order to
obtain meaningful results a rich and highly structured database is required.
Deviation detection uses certain measures for the study of data which
noticeably differ from other data. The researchers may use this technique to
detect frauds, hacking into network systems and for other crime analyses.
However, such activities may sometimes seem usual at first sight, which
makes identification of deviating data more difficult.
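As a rough illustration of deviation detection, the sketch below flags values whose z-score (distance from the sample mean in units of standard deviation) exceeds a threshold. The z-score is only one possible deviation measure and is chosen here as an assumption; the function name, the threshold of 2.0 and the login counts are hypothetical and do not come from the chapter.

```python
from statistics import mean, stdev

def find_deviations(values, threshold=2.0):
    """Return (index, value, z-score) for values far from the sample mean.

    The z-score rule and the default threshold are illustrative assumptions.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [
        (i, v, (v - mu) / sigma)
        for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]

# Hypothetical daily login counts for one account; the last day is unusual.
logins = [12, 15, 11, 14, 13, 12, 16, 14, 13, 95]
outliers = find_deviations(logins)
for index, value, z in outliers:
    print(f"day {index}: {value} logins (z = {z:.2f})")
```

A single extreme value inflates the mean and the standard deviation, which is one reason why, as noted above, deviating activities can be harder to identify than such a simple rule suggests.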
Classification finds common features among various criminal entities and
organizes them into previously defined classes. This technique is used for the
identification of so-called spam e-mail messages, based on linguistic patterns
and structural features of the sender. Often used for the prediction of crime trends,
classification may reduce the time required for the identification of criminal entities.
Association rule algorithms often generate many irrelevant rules that are
subsequently rejected during the validation process. A domain expert has to
specify constraints on the types of rules of interest before the rule discovery
stage in order to reduce the number of discovered rules that are irrelevant.
Data mining algorithms are finite sets of steps that discover patterns in large
data sets. Patterns may be described by IF ... THEN rules, decision tables,
neural networks, genetic algorithms, and linear and nonlinear models. There is no
generally accepted data mining algorithm that performs well in all
situations and decision problems [10].
The search should therefore not be performed with a single algorithm;
instead, various machine learning algorithms should be applied, since
each of them gives its best results for a specific (different) data type. To
determine the best learning algorithm, Kappa values and F-measure values can be
compared with the true learning ratio. However, knowing the success rates of
learning algorithms does not always bring us to the end. In this case, accuracy rate,
WEKA [12], a data mining tool, evaluates the data from the surveys by
using machine learning algorithms. It also produces a confusion matrix, which
is a numerical summary of the predictions made.
Accuracy rate is used to measure model performance. It is the ratio of the number of correctly
classified samples to the total number of samples.
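$$\text{Accuracy rate} = \frac{TP + TN}{TP + FP + FN + TN} \qquad (1)$$
$$\text{Error rate} = \frac{FP + FN}{TP + FP + FN + TN} \qquad (2)$$
$$\text{Precision } (P) = \frac{TP}{TP + FP} \qquad (3)$$
$$\text{Recall } (R) = \frac{TP}{TP + FN} \qquad (4)$$
$$\text{F-measure } (F) = \frac{2 \cdot P \cdot R}{P + R} \qquad (5)$$
$$\text{Kappa value } (K) = \frac{P_o - P_c}{1 - P_c} \qquad (6)$$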
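The measures listed above can be computed directly from the four cells of a binary confusion matrix. The following is a minimal sketch, assuming the usual reading of the symbols (TP, FP, FN and TN as true/false positives/negatives, Po as the observed agreement and Pc as the agreement expected by chance); the function name and the example counts are hypothetical, and all denominators are assumed to be non-zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the evaluation measures from a binary confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    error_rate = (fp + fn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    # Observed agreement is the accuracy; chance agreement is derived
    # from the row and column totals of the confusion matrix.
    p_o = accuracy
    p_c = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (p_o - p_c) / (1 - p_c)
    return {
        "accuracy": accuracy,
        "error_rate": error_rate,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kappa": kappa,
    }

# Hypothetical confusion matrix with 100 classified samples.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Comparing these values across several learning algorithms on the same data is one way to carry out the comparison of Kappa and F-measure values mentioned earlier.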
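The goal of the apriori algorithm is to extract rules of the form $X \Rightarrow Y$, where
X and Y are itemsets. Two main parameters are considered as evaluation
metrics: support [15] and confidence [16]:
Support: The rule holds with support supp in T (the transaction data
set) if supp % of transactions contain $X \cup Y$.
$$\mathrm{supp}(X \Rightarrow Y) = \frac{|\{t \in T : X \cup Y \subseteq t\}|}{|T|} \qquad (7)$$
Confidence: The rule holds in T with confidence conf if conf % of the
transactions that contain X also contain Y.
$$\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)} \qquad (8)$$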
To select interesting rules from the set of all possible rules generated,
constraints on various measures of significance and interest can be used. The
best-known constraints are minimum thresholds on support and confidence.
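To make the two thresholds concrete, the sketch below computes support (the fraction of transactions containing all items of a rule) and confidence (the fraction of transactions containing the antecedent that also contain the consequent) for one candidate rule and checks them against minimum values. The transactions, the function names and the item `topic=sport` are hypothetical; this is not the full apriori candidate generation, only the evaluation of a single rule.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of the whole rule divided by support of the antecedent.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical transactions, one set of attribute=value items per blogger.
transactions = [
    frozenset({"topic=political", "pb=yes", "lmt=yes"}),
    frozenset({"topic=political", "lmt=yes"}),
    frozenset({"topic=sport", "pb=yes"}),
    frozenset({"topic=political", "pb=yes", "lmt=yes"}),
]

x, y = {"topic=political"}, {"lmt=yes"}
rule_support = support(transactions, x | y)
rule_confidence = confidence(transactions, x, y)

# Keep the rule only if it passes both minimum thresholds.
is_interesting = rule_support >= 0.25 and rule_confidence >= 0.9
print(rule_support, rule_confidence, is_interesting)
```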
and follow blogging in discrete time periods.’ Finally, the following are
considered the main data fields: education, political caprice,
topics, local media turnover (LMT) and local, political and social space
(LPSS), as shown in Table 2.
WEKA has been used to explore the behaviour of the apriori algorithm
in extracting significant patterns for detecting the professional tendencies
of blog users. The data are initially stored in an MS Excel sheet and then
converted into the attribute-relation file format (ARFF), which is the
format accepted by the WEKA tool. The minimum support defined by the tool for the
generated rules is 0.25 (25 instances) and the minimum confidence is 0.9.
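The conversion to ARFF can be sketched as follows. An ARFF file consists of a `@relation` line, one `@attribute` line per field (nominal values are listed in braces) and a `@data` section with one comma-separated row per instance. The attribute names below follow the fields used in the rules of this chapter, but the relation name, the sets of allowed values and the two example rows are assumptions made for illustration; in practice the rows would first be exported from the spreadsheet.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text from nominal attributes and a list of row dicts.

    attributes: list of (name, list_of_allowed_values) pairs.
    rows: list of dicts mapping attribute name to one allowed value.
    """
    lines = [f"@relation {relation}", ""]
    for name, values in attributes:
        lines.append(f"@attribute {name} {{{','.join(values)}}}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(row[name] for name, _ in attributes))
    return "\n".join(lines) + "\n"

# Hypothetical value sets; only a few of these values appear in the chapter.
attributes = [
    ("degree", ["high", "medium", "low"]),
    ("caprice", ["left", "middle", "right"]),
    ("topic", ["political", "news", "tourism"]),
    ("lmt", ["yes", "no"]),
    ("lpss", ["yes", "no"]),
    ("pb", ["yes", "no"]),
]
rows = [
    {"degree": "medium", "caprice": "right", "topic": "news",
     "lmt": "no", "lpss": "yes", "pb": "no"},
    {"degree": "high", "caprice": "left", "topic": "political",
     "lmt": "yes", "lpss": "yes", "pb": "yes"},
]

arff_text = to_arff("blog_users", attributes, rows)
print(arff_text)
```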
The association rules obtained for detecting personal features in the analysis of public
behaviour on the Web are:
• Rule 1. topic = political pb = yes 28 ==> lmt = yes 28 <conf:(1)> lift:(1.16) lev:(0.04) [3] conv:(3.92)
• Rule 2. topic = political lpss = yes pb = yes 26 ==> lmt = yes 26 <conf:(1)> lift:(1.16) lev:(0.04) [3] conv:(3.64)
• Rule 3. degree = high lpss = yes 31 ==> lmt = yes 30 <conf:(0.97)> lift:(1.13) lev:(0.03) [3] conv:(2.17)
• Rule 4. degree = high topic = political 28 ==> lmt = yes 27 <conf:(0.96)> lift:(1.12) lev:(0.03) [2] conv:(1.96)
• Rule 5. lpss = yes pb = yes 48 ==> lmt = yes 46 <conf:(0.96)> lift:(1.11) lev:(0.05) [4] conv:(2.24)
• Rule 6. caprice = left lpss = yes 38 ==> lmt = yes 36 <conf:(0.95)> lift:(1.1) lev:(0.03) [3] conv:(1.77)
• Rule 7. topic = political 35 ==> lmt = yes 33 <conf:(0.94)> lift:(1.1) lev:(0.03) [2] conv:(1.63)
• Rule 8. topic = political lpss = yes 31 ==> lmt = yes 29 <conf:(0.94)> lift:(1.09) lev:(0.02) [2] conv:(1.45)
• Rule 9. degree = high pb = yes 30 ==> lmt = yes 28 <conf:(0.93)> lift:(1.09) lev:(0.02) [2] conv:(1.4)
• Rule 10. caprice = left lpss = yes pb = yes 30 ==> lmt = yes 28 <conf:(0.93)> lift:(1.09) lev:(0.02) [2] conv:(1.4)
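The numbers attached to each rule can be related to simple counts. The sketch below recomputes confidence, lift, leverage and conviction for Rule 3 and Rule 1 under two assumptions that are not stated in the text: that the data set holds 100 instances (suggested by "0.25 (25 instances)") and that 86 of them have lmt = yes (a value inferred from the reported lift figures). The `+ 1` in the conviction denominator is likewise an assumption, chosen because it reproduces the listed conv values; the function name is hypothetical.

```python
def rule_metrics(n_total, n_premise, n_consequent, n_both):
    """Return (confidence, lift, leverage, conviction) from raw counts."""
    conf = n_both / n_premise
    supp_y = n_consequent / n_total
    lift = conf / supp_y
    leverage = n_both / n_total - (n_premise / n_total) * supp_y
    # Assumed smoothing term (+1) so that rules with no counter-examples
    # still get a finite conviction value.
    conviction = n_premise * (1 - supp_y) / (n_premise - n_both + 1)
    return conf, lift, leverage, conviction

N_TOTAL = 100    # assumption: implied by "0.25 (25 instances)"
N_LMT_YES = 86   # assumption: inferred from the reported lift values

# Rule 3: degree = high lpss = yes 31 ==> lmt = yes 30
rule3 = rule_metrics(N_TOTAL, n_premise=31, n_consequent=N_LMT_YES, n_both=30)
# Rule 1: topic = political pb = yes 28 ==> lmt = yes 28
rule1 = rule_metrics(N_TOTAL, n_premise=28, n_consequent=N_LMT_YES, n_both=28)

for name, values in (("Rule 3", rule3), ("Rule 1", rule1)):
    print(name, [round(v, 2) for v in values])
```

Under these assumptions a lift only slightly above 1 indicates that the consequent lmt = yes is already very common in the data, so high confidence alone says little about the strength of the association.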
In our experiment using the same database set, and based on created
association rules, our findings are as follows:
CONCLUSION
The great challenge that all police and intelligence agencies face is the
accurate and efficient analysis of crime data, the volume of which is constantly
increasing. For instance, complex criminal conspiracies are often hard to
uncover because the information on suspects may be geographically scattered
and may involve a large number of people. Uncovering computer crimes can also
be difficult because extensive network traffic and frequent online
transactions create a huge quantity of data, of which only a small portion
relates to illegal actions. Police agencies and forensic laboratories collect large
quantities of various data as a result of processing criminal activities. It can be
said that automatic data searching and matching techniques have been
insufficiently used in this field so far, although they could contribute
significantly, particularly in discovering crimes that are difficult to
anticipate and prevent. A complicating circumstance in their application is, among
other things, the great variety of data that should be processed and considered.
Those involved in criminal investigations who have years of experience
can often analyze crime trends precisely, but with the increased frequency and
complexity of criminal acts, human errors also appear; consequently, the time
required for analysis increases as well, while offenders have more time to
destroy evidence and avoid arrest. Automatic data search and
matching is a powerful tool enabling fast and efficient searching of large
databases by crime investigators, who may not be skilled in analysis. In
addition, the use of special-purpose analysis software (such as
WEKA, SPSS, RapidMiner, etc.) often costs less than hiring or training
staff. Data mining techniques are generally considered less prone to errors
than people, which emphasizes the need for their application in different areas,
including security-related issues such as crime investigation. A sound
understanding of the relationship between the possibilities of the analysis and
the characteristics of a certain type of crime can help investigators apply
these techniques more efficiently in order to identify trends and patterns,
locate problem area(s), and even predict a crime.
REFERENCES
[1] Marinković, D. (2008). Tajni audio nadzor kao dokazna radnja - različiti
modaliteti i analiza rešenja u zakonodavstvu Srbije. Sprečavanje i
suzbijanje savremenih oblika kriminaliteta III (collected papers),
Beograd, pp. 228-256.
[2] Clarke, R. (1988). Information technology and dataveillance.
Communications of the ACM, 31(5), pp. 498-512.
[3] Terrettaz-Zufferey A. L. et al. (2006). Assesment of Data Mining
Methods for Forensic Case Data Analysis. Varstvoslovje, Fakulteta za
varnostne vede, Ljubljana, pp. 350-354.
[4] Kuk, K. (2015). Veštačka intelegencija u prikupljanju i analizi podataka
u policiji. Nauka, bezbednost, policija, 20(3), pp. 131-148.
[5] Clarke, R. (1994). Dataveillance by governments: The technique of
computer matching. Information Technology and People, 7(2), pp. 46-85.
[6] Peng, Yi, et al. (2008). A descriptive framework for the field of data
mining and knowledge discovery. International Journal of Information
Technology and Decision Making, 7(4), 639-682.
[7] Witten, I. H., Frank, E. (2005). Data Mining: Practical machine learning
tools and techniques. Morgan Kaufmann.
[8] Fayyad, U. M. et al. (1996). From Data Mining to Knowledge
Discovery: An Overview. Advances in Knowledge Discovery and Data
Mining, Cambridge, pp. 1-34.
[9] Berry, M., Linoff G. (2000). Mastering Data Mining, New York.
[10] Kuk, K., Mehic, A., Kartunov, S. (2015). The importance of data mining
technologies and the role of intelligent agents in cybercrime. Archibald
Reiss Days, Thematic Conference Proceedings of International
Significance, Volume III, Academy of Criminalistic and Police Studies,
Belgrade, pp. 223-232.
[11] Aydogan, E. K., Gencer, C., Akbulut, S. (2008). Churn Analysis and
Customer Segmentation of a Cosmetics Brand Using Data Mining
Techniques, Journal of Engineering and Natural Sciences, 26(1), pp. 42-56.
[12] Garner, S. R. (1995). Weka: The waikato environment for knowledge
analysis. Proc New Zealand Computer Science Research Students
Conference, University of Waikato, Hamilton, New Zealand, pp. 57-64.
[13] Han, J., Kamber, M. (2006). Data Mining Concepts and Techniques.
San Francisco, CA: Morgan Kaufmann, Elsevier Inc.
[14] Agrawal, R., Srikant, R. (1994). Fast Algorithms for Mining Association
Rules in Large Databases. Proceedings of the 20th International
Conference on Very Large Data Bases, VLDB, Santiago de Chile, 12-15
September 1994, pp. 487-499.
[15] Agrawal, R., Imielinski, T., Swami, A. (1993). Mining Association Rules
between Sets of Items in Large Databases. Proceedings of the 1993
ACM SIGMOD International Conference on Management of Data,
Washington DC, 26-28 May 1993, pp. 207-216.
[16] Hipp, J., Güntzer, U., Nakhaeizadeh, G. (2000). Algorithms for
Association Rule Mining - A General Survey and Comparison. ACM
SIGKDD Explorations Newsletter, 2(1), pp. 58-64.
[17] Wyld, D. (2007). The Blogging Revolution: Government in the Age of
Web 2.0, IBM Center for the Business of Government, Washington, DC.
[18] Rosanna, E., Cassie, A. Bradley, E., Okdie, M, (2010). Personal
Blogging Individual Differences and Motivations, IGI Global, pp. 292-301.
[19] UCI Machine Learning Repository (2013). Available from:
http://archive.ics.uci.edu/ml/datasets.htm.
[20] Gharehchopogh, F.S., Khaze, S.R. (2012). Data Mining Application for
Cyber Space Users Tendency in Blog Writing: A Case Study.
International Journal of Computer Applications, 47(18), pp. 40-46.
In: Knowledge Discovery in Cyberspace ISBN: 978-1-53610-566-7
Editors: K. Kuk and D. Ranđelović © 2017 Nova Science Publishers, Inc.
Chapter 2
ABSTRACT
The development of information technology during the 1980s
significantly improved the way data is collected, stored and processed. As
a consequence, the police function has become more data-driven than ever
before. Crime analysis units became a new element in police
organizations’ structures worldwide, and analytical information became an
important prerequisite for effective policing. Having in mind that
INTRODUCTION
Historically, the tasks and responsibilities of police agencies
have remained substantially the same. It is the amount and
complexity of police duties that have been magnified in modern society, but
the police have always been required to maintain permanent readiness and a high level of expertise in
solving the most complex problems in the field of security [32].
Police efficiency depends on many parameters, but
essentially it depends on data management (collecting, storing, processing, and
using data). In order to solve a security problem, a police officer must have reliable
information about its origin, manifestation forms and the consequences it
causes [14]. For making a good decision, it is of great importance that such
Spatial Data Visualization as a Tool for Analytical Support 21
1 Read more in: Dempsey, C. 2012. “Where is the phrase ‘80% of data is geographic’ from?”
GIS Lounge, https://www.gislounge.com/80-percent-data-is-geographic/
2 E.g., cellular phone geoposition utilization for surveillance, or crowd movement monitoring in
order to coordinate police resources during large public gathering events, etc.
22 Nenad Milić, Brankica Popović, Venezija Ilijazi et al.
CRIME ANALYSIS
The rapid development of ICT has significantly improved the technology of
data collection and processing, whose end result, analytical information,
has become an important factor in effective policing, providing a solid base for a new
discipline: crime analysis [1]. Crime analysis is required to enable police
officers to better identify problems, find solutions and the resources necessary
to address the problems, and assess the achieved results [15].
According to Boba, crime analysis is defined as ‘systematic study of crime
and disorder problems as well as other police–related issues - including
sociodemographic, spatial, and temporal factors - to assist the police in
criminal apprehension, crime and disorder reduction, crime prevention, and
evaluation’ [1]. Crime analysis involves the application of social science data
collection procedures, analytical methods, and statistical techniques,
employing both qualitative and quantitative techniques to analyze data
valuable to police agencies and their communities. Even though this discipline
is called crime analysis, it actually includes much more than just the
examination of crime incidents. As suggested by the International Association
of Crime Analysts (IACA), it includes ‘the analysis of crime and criminals,
crime victims, disorder, quality of life issues, traffic issues, and internal police
operations, and its results support criminal investigation and prosecution,
patrol activities, crime prevention and reduction strategies, problem solving,
and the evaluation of police efforts’ [19].
In order to avoid inconsistency and disagreement in both the definitions and the
typology of crime analysis, the IACA proposed professional standards and
definitions of the analytical methodologies, technologies, and core concepts
relevant to the profession of crime analysis. According to these standards, there are four
major categories of crime analysis; criminal investigative
analysis, sometimes also called “profiling”, is almost always part of
the tactical crime analysis process and therefore should not be considered
a separate type of crime analysis. The categories, ordered from specific to
general, are:
While crime intelligence analysis and tactical crime analysis products are
usually internal and kept confidential,3 the products of strategic and
administrative crime analysis are more likely to be distributed externally to
inform audiences outside the police agency. Tactical crime analysis and
administrative crime analysis can be performed largely from data that
come from internal sources (police databases and computer-aided dispatch).
In contrast, although they often start with data from police databases, both
crime intelligence analysis and strategic crime analysis depend on the
deliberate collection of additional data from a variety of other sources in order
to obtain a broader context of the analysed phenomenon [19]. Crime analysts
review all available data, both from police records and from other sources,
with the goal of identifying patterns as they emerge. Analyses of these patterns
and trends can provide information about the nature of crime (who, what,
when, where, how and why), helping in the development of effective tactics
and strategies for preventing victimization and reducing crime.
The three most important kinds of information that crime analysts use are
sociodemographic,4 temporal and spatial. Sociodemographic information can
be used to establish the identity of crime suspects or, on a broader level, to
determine the characteristics of groups and how they relate to crime. Temporal
analysis is conducted to examine short-term and mid-term patterns
(such as patterns by day of the week, time of day or time between incidents
within a particular crime series), as well as long-term patterns
in crime (such as patterns by month, the seasonal nature of crime and trends
over several years) [1, 16]. Nevertheless, it is the spatial nature of crime and
other police-related issues that is central to understanding the nature of a
problem, which gives spatial analysis a larger role in crime analysis.
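The temporal patterns mentioned above (by day of the week, time of day, month and time between incidents in a series) can be tabulated with a few counters. The following is a minimal sketch; the function names and the four incident timestamps are hypothetical and serve only to show the kind of summary a temporal analysis starts from.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def temporal_profile(timestamps):
    """Count incidents by weekday name, hour of day and month."""
    by_weekday = Counter(WEEKDAYS[ts.weekday()] for ts in timestamps)
    by_hour = Counter(ts.hour for ts in timestamps)
    by_month = Counter(ts.month for ts in timestamps)
    return by_weekday, by_hour, by_month

def days_between(timestamps):
    """Whole days elapsed between consecutive incidents of a series."""
    ordered = sorted(timestamps)
    return [(later - earlier).days
            for earlier, later in zip(ordered, ordered[1:])]

# Hypothetical incident times for one crime series.
incidents = [
    datetime(2010, 3, 5, 22, 15),
    datetime(2010, 3, 12, 23, 40),
    datetime(2010, 3, 14, 2, 5),
    datetime(2010, 3, 19, 22, 50),
]

by_weekday, by_hour, by_month = temporal_profile(incidents)
gaps = days_between(incidents)
print(by_weekday.most_common(1), by_hour.most_common(1), gaps)
```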
A number of sophisticated, specialized software packages aim to help
police conduct effective crime analysis.5 Typical crime analysis tools
include Statistical Analysis, Link Analysis, and Data Visualization and Crime
Mapping software.6 Police agencies are adding a new tool, Predictive
software, to assist their efforts [17]. Predictive analytics solutions apply
sophisticated statistical data exploration and machine-learning techniques to
historical information to help agencies uncover hidden patterns and trends,
even in large, complex datasets. In the context of its support to Predictive
3 In order to avoid compromising an investigative strategy.
4 Personal characteristics of individuals and groups, such as sex, race, income, age, education,
etc.
5 More on: http://www.iaca.net/resources.asp?Cat=Software.
6 More on http://www.it.ojp.gov/documents/analyst_toolbox.pdf.
As previously noted, one of the most important aspects of
crime is its location [3]. Practically everything the police do is related to an
address or location. Each call for police intervention and each visit to the scene
has corresponding geographical coordinates. In addition, considering
crime as a product of human behavior, it is understandable why the
geographical distribution of crime is not random [26].
Today, the concentrated nature of crime is accepted as a fact, allowing
policing strategies to shift from traditional reactive approaches towards cutting-edge
proactive (and/or location-based) approaches such as hotspots
policing, problem-oriented policing, intelligence-led policing,
community-oriented policing and Compstat management strategies [9]. All of them are
centered on directing crime prevention and crime reduction responses based on
crime analysis results [35].
Having that in mind, crime mapping takes a significant place in the context
of crime analysis, facilitating the understanding of the characteristics of
the spatial (geographical) distribution of crime and other events of importance
to police work in a given time [23].
Although crime mapping and spatial crime analysis are not new concepts,
it is the emerging GIS technology that significantly contributes to their wide
utilization in crime analysis. Three important roles of GIS and crime mapping
that are generally accepted are:
• Database management,
• Spatial analysis, and
• Data visualization.
Having in mind the saying that “A picture is worth a thousand words”, the
advantage of using GIS in crime mapping is evident. GIS has the unique
capacity to overlay different data sources (thematic layers) as digital map
layers in order to visualize them and use them for further analysis (as shown in
Figure 1a). In other words, GIS ingests all available data, such as historical
crime rates, police reports, department vehicle travel routes, traffic patterns,
camera footage, details of officer deployments, locations of critical
infrastructure or gang territories and other variables, and displays them on
maps [23]. For example, through a hyperlink, an analyst can access and
visualize documents relevant to the crime event (e.g., official reports, photos
from the crime scene, etc.) (see Figure 1b).
Spatial Data Visualization as a Tool for Analytical Support 27
Figure 2. Map showing: a) the spatial distribution of robberies (banks, post office,
pharmacy stores, currency exchange stores, casino and gambling facilities) in Belgrade
municipality Cukarica in 2008-2010 period. b) Result of the nearest neighbor
hierarchical spatial clustering technique (CrimeStat III), in order to find events closer
to each other than expected from the random distribution.
7 Displacement is said to occur if crime reductions in the target area lead to crime increases
elsewhere (in neighboring areas, or in the same area but at different times).
8 Opposite to crime displacement, diffusion of benefits entails the reduction of crime (or other
improvements) in the areas that are related to the targeted crime prevention efforts, but not
targeted by the response itself.
30 Nenad Milić, Brankica Popović, Venezija Ilijazi et al.
9 More on: http://desktop.arcgis.com/en/arcmap/latest/extensions/network-analyst/location-allocation.htm.
10 grid or right-angle distance metric.
11 straight-line distance metric.
12 So-called Weber problem (Klose and Drexl, 2005).
way that the sum of the (weighted) distances $w_k d_k(x, y)$ to given demand
points $k \in K$ located at $(a_k, b_k)$ is minimized:
$$\min_{(x, y)} \sum_{k \in K} w_k \, d_k(x, y) \tag{1}$$
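For the straight-line distance metric, the minimization in (1) can be approximated iteratively. The following is a minimal Python sketch that uses the Weiszfeld iteration, a textbook technique chosen here for illustration and not necessarily the method used by the authors or by any GIS package. The function names, demand points and weights are hypothetical.

```python
import math


def weighted_distance_sum(x, y, points, weights):
    """Objective of equation (1), assuming straight-line (Euclidean) distances."""
    return sum(w * math.hypot(x - a, y - b) for (a, b), w in zip(points, weights))


def weiszfeld(points, weights, iterations=200, eps=1e-9):
    """Approximate the location (x, y) that minimizes the weighted distance sum."""
    total_w = sum(weights)
    # Start from the weighted centroid of the demand points.
    x = sum(w * a for (a, b), w in zip(points, weights)) / total_w
    y = sum(w * b for (a, b), w in zip(points, weights)) / total_w
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for (a, b), w in zip(points, weights):
            d = math.hypot(x - a, y - b)
            if d < eps:
                # Simplification: stop if the estimate lands on a demand point,
                # which avoids a division by zero.
                return x, y
            num_x += w * a / d
            num_y += w * b / d
            denom += w / d
        new_x, new_y = num_x / denom, num_y / denom
        if math.hypot(new_x - x, new_y - y) < eps:
            return new_x, new_y
        x, y = new_x, new_y
    return x, y


# Hypothetical demand points (e.g., incident locations) with equal weights.
demand_points = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
demand_weights = [1.0, 1.0, 1.0, 1.0]
best_x, best_y = weiszfeld(demand_points, demand_weights)
```

For the grid (right-angle) metric the problem separates by coordinate and a weighted median could be used instead; that variant is not shown here.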
13 More on http://desktop.arcgis.com/en/arcmap/latest/extensions/network-analyst/location-allocation.htm.
Figure 4. Possible ways for visualization of area coverage in accordance with the given
constraints (distance/time).
coverage has not been significantly disrupted (Figure 6a). With an additional
correction of location 3, all the events will be covered with four locations
(including those that previously remained uncovered) (Figure 6b). In
other words, the corrections performed on the situation shown in Figure 5b further
optimize the coverage model, so that the number of sites (patrols)
decreases while the total coverage capacity remains almost unchanged [25].
Hotspot Policing
14 TETRA is a digital trunked mobile radio standard developed to meet the needs of traditional
Professional Mobile Radio (PMR) user organisations such as Public Safety, Transportation,
Government, Military, etc. More on: http://www.etsi.org/technologies-clusters/technologies/tetra.
Figure 7. Network analysis application in the function of the optimal route estimation
for a) single and b) multiple locations.
15 For example, placing the label “high crime area” on a safe area may cause a stigmatizing effect,
which may hinder the economic development of that neighborhood.
Hit rate (HR), defined as the proportion of new crimes that occur
within the areas where crimes were predicted to occur:
$$HR = \frac{n}{N} \tag{2}$$
where n is the number of crimes in the areas where crimes are predicted to
occur (hotspots) and N is the number of crimes in the whole study area;
Predictive accuracy index (PAI), described as the ratio of the hit rate
to the proportion of the study area that consists of hotspots,
$$PAI = \frac{HR}{\text{proportion of hot spot area}} = \frac{n/N}{a/A} \tag{3}$$
where a is the total area occupied by hotspots, and A is the size of entire
study area;
$$RRI = \frac{\text{hot spot crime ratio}}{\text{total crime ratio}} = \frac{n_2/n_1}{N_2/N_1} \tag{4}$$
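The measures in equations (2)-(4) are simple ratios, so they can be computed directly from counts and areas. The sketch below is illustrative only: the function names and all input numbers are hypothetical, and it assumes that the subscripts 1 and 2 in equation (4) denote an earlier and a later time period, which the surviving text does not state explicitly.

```python
def hit_rate(n, N):
    """Equation (2): share of all crimes that fall inside predicted hotspots."""
    return n / N


def predictive_accuracy_index(n, N, a, A):
    """Equation (3): hit rate divided by the share of the study area covered by hotspots."""
    return hit_rate(n, N) / (a / A)


def rri(n1, n2, N1, N2):
    """Equation (4): hotspot crime ratio divided by total crime ratio.

    Assumption: index 1 is the earlier period and index 2 the later period.
    """
    return (n2 / n1) / (N2 / N1)


# Hypothetical example: 30 of 120 crimes occur in hotspots covering 2 of 40 km^2.
example_hr = hit_rate(30, 120)
example_pai = predictive_accuracy_index(30, 120, 2.0, 40.0)
# Hypothetical counts for two periods: hotspot crimes 40 -> 30, all crimes 150 -> 120.
example_rri = rri(40, 30, 150, 120)
```

A PAI well above 1 indicates that the hotspots capture a much larger share of crime than their share of the area would suggest.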
of the spatial analysis can be done in GIS environment (e.g., ArcGIS Spatial
Analyst provides a range of spatial modeling and analysis tools),16 different
software applications (such as the CrimeStat software)17 have the ability to
perform many of these analyses. The main goal of this analysis is to assess
whether crime locations are randomly scattered across space, or instead show
systematic patterns in the form of clusters (more points are systematically
closer together than they would be in a purely random case) or dispersion
(more points are systematically further away from each other than under
randomness). Well-known hotspot analysis techniques include grid mapping,
covering ellipses, kernel density and heuristics [27].
An example of the covering ellipse methodology is nearest neighbor
hierarchical clustering (Nnh), which identifies groups of incidents that are
spatially close. It is a hierarchical clustering routine that groups points together
based on a given criterion. The CrimeStat Nnh routine defines a threshold
distance and compares the threshold to the distances for all pairs of points.
Only points that are closer to one or more other points than the threshold
distance are selected for clustering (see Figure 2b).
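The selection step described above can be sketched in a few lines of Python. This is only an illustration of the pairwise threshold comparison, not the CrimeStat implementation: the way CrimeStat derives the threshold and then builds the cluster hierarchy is not reproduced, and the function name, threshold value and coordinates are hypothetical.

```python
import math


def points_within_threshold(points, threshold):
    """Return indices of points that have at least one other point closer than the threshold."""
    selected = set()
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (x1, y1), (x2, y2) = points[i], points[j]
            if math.hypot(x1 - x2, y1 - y2) < threshold:
                # Both points of a close pair become candidates for clustering.
                selected.add(i)
                selected.add(j)
    return sorted(selected)


# Hypothetical incident coordinates; the point at index 3 is isolated.
incidents = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.4), (10.0, 10.0), (20.0, 0.0), (20.3, 0.1)]
candidate_indices = points_within_threshold(incidents, threshold=1.0)
```

The pairwise loop is quadratic in the number of incidents, which is acceptable for a sketch; a spatial index would be the usual choice for large datasets.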
One of the most popular techniques among both academics and crime
analysts is Kernel Density Estimation (KDE). The idea is to
spread out each crime’s expected contribution to the future crime risk over a
certain area using a mathematical function called a kernel. KDE is a statistical
analysis approach used to interpolate a continuous surface of crime data based
on initial crime data points from different locations. This is created by
‘overlaying a grid (with n equally sized cells) on top of the study area and
calculating a density estimates based on the center points of each grid cell.
Each distance between an incident and the center of a grid cell is then
weighted based on a specific method of interpolation (the kernel function) and
the bandwidth (search radius) [10].’ The approach produces a contour map, a
heat map, or a surface view map with the more heavily weighted areas of high
crime visually represented. The hot spots can then be defined as the areas
16 ArcGIS Spatial Analyst allows the user to create, query, map, and analyze cell-based raster
data, perform integrated raster/vector analysis, derive new information from existing data,
query information across multiple data layers and fully integrate cell-based raster data with
traditional vector data sources. More information is available at www.esri.com.
17 The purpose of CrimeStat is to provide supplemental statistical tools to aid law enforcement
agencies and criminal justice researchers in their crime mapping efforts. CrimeStat is
Windows-based and interfaces with most desktop GIS programs. It calculates various spatial
statistics and writes graphical objects to ArcGIS, MapInfo, Surfer for Windows and other GIS
packages. More information about CrimeStat is available at
http://nij.gov/topics/technology/maps/pages/crimestat.aspx.
above a certain threshold on each map. KDE has been reported to produce hotspot
maps with the highest PAI, with the result becoming stronger the longer the
period used as the prediction base [5]. An example of a kernel density estimation
map obtained in ESRI’s ArcGIS® Spatial Analyst is shown in Figure 8.
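The grid-based procedure described for KDE can be illustrated with a short Python sketch. It assumes a quartic kernel, which is one common choice and not necessarily the one used to produce Figure 8; the function names, grid layout, bandwidth, threshold and incident coordinates are all hypothetical.

```python
import math


def quartic_kernel(distance, bandwidth):
    """Quartic kernel: the weight falls to zero at the bandwidth (search radius)."""
    if distance >= bandwidth:
        return 0.0
    u = distance / bandwidth
    return (3.0 / (math.pi * bandwidth ** 2)) * (1.0 - u * u) ** 2


def kde_grid(points, x_min, y_min, cell_size, n_cols, n_rows, bandwidth):
    """Estimate density at the center of every grid cell (rows run along y)."""
    grid = []
    for row in range(n_rows):
        cy = y_min + (row + 0.5) * cell_size
        grid_row = []
        for col in range(n_cols):
            cx = x_min + (col + 0.5) * cell_size
            # Sum the kernel-weighted contribution of every incident to this cell.
            density = sum(
                quartic_kernel(math.hypot(cx - px, cy - py), bandwidth)
                for px, py in points
            )
            grid_row.append(density)
        grid.append(grid_row)
    return grid


# Hypothetical incidents: a small cluster near (1, 1) and one isolated event.
crime_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (4.5, 4.5)]
density_grid = kde_grid(crime_points, 0.0, 0.0, 1.0, 5, 5, bandwidth=1.5)

# Cells above an (arbitrary) threshold are treated as hotspot cells.
peak = max(max(r) for r in density_grid)
hot_cells = [
    (row, col)
    for row, values in enumerate(density_grid)
    for col, value in enumerate(values)
    if value >= 0.5 * peak
]
```

The bandwidth and cell size strongly influence the resulting surface, so in practice they are chosen with the study area and the crime type in mind.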
Figure 8. Kernel density estimation map of robberies in the urban part of the Cukarica
municipality (Belgrade).
Geographical Profiling
which victims and offenders come together in time and space. This process is
referred to as geographic profiling [16].
Geographic profiling, introduced in the early 1990s, represents a
geospatial crime analysis technique that attempts to determine where a serial
offender most likely resides. The predictions are based on the locations of
these crimes, other geographic information about the case and the suspect, and
certain assumptions about the distance offenders will travel to commit crimes
[30]. Different algorithms could be used to calculate the area boundaries. It
must be emphasized that this technique should not be used to pinpoint a
particular location or suspect, since being a statistical technique, it gives
results in terms of probability, not certainty.
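To make the probabilistic nature of the output concrete, the following Python sketch builds a normalized likelihood surface from crime locations under a simple distance-decay assumption (negative exponential). It is not Rossmo’s, Canter’s or Levine’s model; the decay function, its parameter, the grid and the crime sites are all hypothetical and serve only to show why the result is a probability surface rather than a single point.

```python
import math


def profile_surface(crime_sites, x_min, y_min, cell_size, n_cols, n_rows, decay=1.0):
    """Score each grid cell as a possible offender base and normalize the scores to sum to 1."""
    scores = []
    for row in range(n_rows):
        cy = y_min + (row + 0.5) * cell_size
        score_row = []
        for col in range(n_cols):
            cx = x_min + (col + 0.5) * cell_size
            # Assumption: the likelihood of travelling to a site decays
            # exponentially with its distance from the candidate base.
            score = sum(
                math.exp(-decay * math.hypot(cx - sx, cy - sy))
                for sx, sy in crime_sites
            )
            score_row.append(score)
        scores.append(score_row)
    total = sum(sum(r) for r in scores)
    return [[s / total for s in r] for r in scores]


# Hypothetical series of five linked crime sites.
sites = [(2.5, 2.5), (1.5, 2.5), (3.5, 2.5), (2.5, 1.5), (2.5, 3.5)]
surface = profile_surface(sites, 0.0, 0.0, 1.0, 5, 5)
most_likely_cell = max(
    ((row, col) for row in range(5) for col in range(5)),
    key=lambda rc: surface[rc[0]][rc[1]],
)
```

Each cell value is a relative likelihood, so the surface is suited to prioritizing search areas rather than identifying a particular address.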
Geographic profiling involves application of advanced spatial analysis
techniques for crime distribution under the auspices of the criminology
theoretical framework, and above all routine activities theory, rational choice
theory and crime pattern theory [16].
The three most popular models for geographic profiling of unknown
offenders are Rossmo’s model (the so-called “criminal geographic targeting”
algorithm), Canter’s model and Levine’s model (journey-to-crime analysis).
All three models are implemented in corresponding software solutions:
Rigel18 (Rossmo), Dragnet19 (Canter) and CrimeStat20 (Levine) [30]. Levine’s
model differs from the other two in that it is not a geographic profiling
model in the true sense of the word, as pointed out by its creator; rather, it is a
model which estimates the crime trip (i.e., the route the offender travels
to commit the crime) [22].
It is clearly convenient to display the output of geographic profiling
software on a Geographic Information System that also shows streets,
landmarks, political boundaries, and other geographic features of the areas
around the crimes. Output is in the form of color shadings (two-dimensional
map) and the height of the surface (in the case of three-dimensional diagram),
representing the offender’s likely base of operations.
Although geographic profiling is a relatively new discipline, it
is gaining in popularity after its successful use in resolving several
serial crimes in the United States and Canada. Today, there are a number of
software packages that can be freely downloaded from the Internet, as the
18 More on http://ecricanada.com/products/rigel-analyst/.
19 More on http://www.i-psy.com/publications/publications_dragnet.php.
20 More on http://www.icpsr.umich.edu/CrimeStat/.
21 More on http://ecricanada.com/technologies/.
There are some controversial issues in the field of GIS and crime
mapping. One is so-called ‘spatial labeling’, where labeling an area as
dangerous might have serious economic, sociological and criminological
consequences for the community of that area [9].
Another controversial issue is privacy (especially that of crime
victims), since GIS and crime mapping may violate the privacy
and confidentiality of people’s lives [33]. That is especially true for
Internet crime maps [21]. Detailed information such as the gender,
ethnicity and age of criminals or victims, and the time and place of the crime,
might be used to create web-based maps, where overlaying specific crimes with
them may inadvertently reveal the identity of a victim.22 In a public information system,
mechanisms must be devised to ensure privacy protection in order to balance
the public’s right to information and the privacy of the crime victim [9]. Therefore,
the most important thing is to ensure that individual identification is either
confidential or impossible (anonymous).
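One possible mechanism of this kind is to coarsen locations and drop personal attributes before records are published. The Python sketch below shows grid snapping as an illustration; it is not a method described in the cited sources, the field names and cell size are hypothetical, and coarsening alone reduces but does not guarantee protection against re-identification.

```python
import math


def snap_to_grid(x, y, cell_size):
    """Replace a coordinate with the center of the grid cell that contains it."""
    col = math.floor(x / cell_size)
    row = math.floor(y / cell_size)
    return ((col + 0.5) * cell_size, (row + 0.5) * cell_size)


def mask_records(records, cell_size):
    """Keep only the offense type and a coarsened location for public display."""
    masked = []
    for rec in records:
        mx, my = snap_to_grid(rec["x"], rec["y"], cell_size)
        # Personal attributes (e.g., age, gender) are deliberately not copied.
        masked.append({"offense": rec["offense"], "x": mx, "y": my})
    return masked


# Hypothetical internal records with projected coordinates in meters.
internal_records = [
    {"offense": "robbery", "x": 123.4, "y": 987.6, "victim_age": 34, "victim_gender": "F"},
    {"offense": "burglary", "x": 178.0, "y": 912.3, "victim_age": 61, "victim_gender": "M"},
]
public_records = mask_records(internal_records, cell_size=100.0)
```

Both hypothetical records fall in the same 100 m cell, so they are published with identical coordinates and can no longer be tied to an exact address.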
22 The victim is later often stigmatized on that basis (e.g., a rape victim). People therefore often hide
information about their victimization from law enforcement agencies, knowing that others could
recognize them on the Internet with little effort.
23 Source: https://preview.crimereports.com/#!/ accessed on 05/31/2016.
24 Compstat is a performance management system that is used to reduce crime and achieve other
police department goals. Compstat emphasizes information-sharing, responsibility and
accountability, and improves effectiveness.
25 The IBM-Cognos Crime Information Warehouse, more on
ftp://public.dhe.ibm.com/software/data/sw-library/cognos/demos/bp_od_blueprints/resources/br_ibm_crime_warehouse.pdf.
26 More on http://www.esri.com/partners/partners-alliance/ibm/solutions.
27 More on http://www.esri.com/~/media/Files/Pdfs/library/fliers/pdfs/esri-maps-ibm-cognos.pdf.
Predictive Policing
Predictive policing has become one of the hottest emerging areas in law
enforcement. It can be described as ‘the application of analytical techniques -
particularly quantitative techniques - to identify likely targets for police
intervention and prevent crime or solve past crimes by making statistical
predictions’ [27]. All types of data can be analyzed, both structured and
‘unstructured’ (such as emails, text messages, audio and video files, health
records, journals, etc.). Both the volume and the quality of these data will
determine the usefulness of any approach. Obtained information can serve for
28 Source: ftp://public.dhe.ibm.com/software/data/sw-library/cognos/demos/bp_od_blueprints/resources/br_ibm_crime_warehouse.pdf.
is one of the few in the USA using PredPol29 software that automatically
generates maps for police of where and when crimes may occur [12]. PredPol
is a cloud-based software-as-a-service (SaaS) whose crime
prediction methodology combines available crime data with advanced
mathematics, cloud computing and machine learning techniques (drawing on the
indispensable experience of veteran police). By analyzing input data (type, place,
and time of crime), it produces output in the form of red boxes (measuring 500 by 500
feet) that point to the district areas with the highest risk of criminal
activity for that shift (Figure 12). The results are more accurate and more
actionable recommendations for when and where crime is most likely to occur,
thus allowing police to show up before crime happens.30
Figure 12. Predictive Policing Screen Shot: PredPol™, a cloud-based SaaS for crime
prediction (adapted from [11]).
29 More on http://www.predpol.com/.
30 PredPol's Innovative Predictive Policing Software Results in Dramatic Crime Reduction,
available at http://www.prnewswire.com/news-releases/predpols-innovative-predictive-policing-software-results-in-dramatic-crime-reduction-227802601.html.
CONCLUSION
Police officers are engaged on a daily basis in the collection of data
necessary for fulfilling their responsibilities. The collected data are analyzed
and disseminated to users in the form of different analytical products.
Since most of the activities undertaken by police officers
have a spatial component (X and Y coordinates), cartographic visualization of
these data takes a significant place in the crime analysis process. The importance
of spatial data visualization in police practice was recognized more
than a century ago, when the first crime maps appeared on police station
walls [16]. Compared to textual crime reports (bulletins), crime maps
inform law enforcement officers about the spatial distribution of crime
much faster and more easily.
Intensive development of information and communication technologies
has extended the existing analytical methods and added some new ones, thus
forming new approaches to solving crime problems. Crime mapping and GIS
technology are such examples. Helping police officers to be better informed
about crime and other events of importance for their activities, crime maps
enable more effective identification of problems and their causes, which is the
prerequisite for efficient work aimed at their elimination. In this way, spatial
visualization becomes an important decision making support tool at all levels
of police organization - from a street police officer, to the top management of
the police organization.
Nowadays, when limited resources must be deployed effectively in
order to combat crime and respond to citizens' growing demands, focusing
them on the place and time where they are most needed becomes an
essential prerequisite for the effective performance of police functions. In
this regard, the identification of crime hotspots becomes a part of police
analysts’ everyday activities. Although the human eye and brain can be a good
‘tool’ for geospatial data processing, visual inspection alone is not sufficient
for drawing correct conclusions. This is particularly evident in cases where a
complex spatial distribution of crimes is analyzed. In this context GIS tools
have an important role: they enable analysts to ‘see’ what is invisible to
the human eye. Timely recognition of problematic locations (e.g., hotspots)
and placing them in the focus of police attention could reduce opportunities
for crime and yield clear crime prevention benefits.
In order to facilitate access to crime maps, police organizations use the
benefits of the Internet technology. Ensuring constant access to the current
crime data (24/7), the Internet crime maps enable citizens to get the data about
crime distribution and crime trends in a fast and easily accessible way. The
most popular are interactive crime maps that allow users to perform their own
queries (by the type, place or time of crime, etc.), and get answers to the
questions they are interested in. Specialized Internet-based software can
provide up-to-date information that creates the needed situational awareness
among officers, helping them to react in a timely manner.
The emerging predictive analytics capability, which combines real-time
information gathering with data mining techniques, helps law enforcement
officers to uncover hidden patterns, associations, correlations and trends in
large, complex datasets of structured and unstructured data. Moreover, it can
help to find not only where a crime will most likely occur, but also when and
who the suspect or victim is likely to be, helping police officers to react before
a crime is committed (to prevent it). With the development of machine learning
techniques, data mining solutions will produce results almost in real time, and
it will not be long before we start talking about ‘smart police’ as a part of the
future ‘smart cities’ in the ‘smart world’.
Finally, we would like to emphasize that the wide utilization of the
aforementioned crime analysis techniques is to a large extent the result of
the development of spatial visualization techniques, which make it possible for end
users (police officers) to understand and exploit the products of complicated
analytical methods when they are presented in visual form (a map).
REFERENCES
[1] Boba, Rachel. 2005. Crime analysis and crime mapping. SAGE
Publications.
[2] Braga, Anthony A., Papachristos, Andrew V., Hureau, David M. 2010.
“The concentration and stability of gun violence at micro places in
Boston, 1980–2008.” Journal of Quantitative Criminology 26:33–53.
[3] Braga, Anthony A., Weisburd, David L. 2010. Policing Problem Places:
Crime Hot Spots and Effective Prevention (Studies in Crime and Public
Policy) 1st Edition, Oxford University Press.
[4] Bureau of Justice Assistance. 2013. “Compstat: Its origins, evaluation
and future in law enforcement agencies.” Bureau of Justice
Assistance & Police Executive Research Forum. Washington DC.
https://www.ncjrs.gov/App/Publications/abstract.aspx?ID=265292.
[5] Chainey, Spencer; Tompson, Lisa and Uhlig Sebastian. 2008. “The
utility of hotspot mapping for predicting spatial patterns of crime.”
Security Journal, 21(1):4-28.
[6] Clarke, Ronald V., Weisburd, David. 1994. “Diffusion of crime control
benefits: Observations on the reverse of displacement.” In Crime
prevention studies, 2:165-184, edited by Ronald V. Clarke, Monsey,
NY: Criminal Justice Press.
[7] Cohen, Jacqueline and Wilpen L. Gorr. 2006. “Development of Crime
Forecasting and Mapping Systems for Use by Police in Pittsburgh,
Pennsylvania, and Rochester, New York, 1990-2001.” ICPSR04545-v1.
Ann Arbor, MI: Inter-university Consortium for Political and Social
Research, 2006-08-31. http://doi.org/10.3886/ICPSR04545.v1.
[8] Crime Tech Solutions. 2015. “What is Geospatial Crime Mapping?”
Crime Technology Weekly, October 20. Accessed January 28, 2016.
https://fightfinancialcrimes.com/2015/10/20/what-is-geospatial-crime-mapping/.
[9] Daglar, Murat and Argun, Ugur. 2016. “Crime Mapping and
Geographical Information Systems in Crime Analysis.” International
Journal of Human Sciences, 13(1):2208-2221.
doi:10.14687/ijhs.v13i1.3736.
[10] Eck, John E., Chainey, Spencer; Cameron, James G., Leitner, Michael
and Wilson, Ronald E. 2005. Mapping crime: Understanding hotspots.
Washington DC: National Institute of Justice.
https://www.ncjrs.gov/pdffiles1/nij/209393.pdf.
[11] Friend, Zach. 2013. “Predictive Policing: Using Technology to Reduce
Crime.” FBI Law Enforcement Bulletin, April, 2013. Accessed April 12, 2016.
https://leb.fbi.gov/2013/april/predictive-policing-using-technology-to-reduce-crime.
[12] GCN Staff. 2014. “Seattle police deploy SeaStat crime mapping tech.”
GCN, September 23. Accessed April 5, 2016.
https://gcn.com/articles/2014/09/23/seastat-seattle-crime-mapping.aspx.
[13] GIS for Crime Analysis, Law Enforcement, and Public Safety. 2014.
American Sentinel University. Accessed February 20, 2016.
http://www.americansentinel.edu/blog/wp-content/uploads/2014/06/AS_GIS-Crime-eBook-Final.pdf.
[14] Goldstein, Herman. 1990. Problem-oriented policing, McGraw Hill,
New York. Available at
http://www.popcenter.org/library/reading/pdfs/goldstein_book.pdf.
[15] Gottlieb, Steven; Arenberg, Sheldon and Singh, Raj. 1994. Crime
analysis: From first report to final arrest, CA: Alpha Publishing.
[16] Harries, Keith D. 1999. Mapping crime: Principle and practice, U.S.
Dept. of Justice, Office of Justice Programs, National Institute of Justice,
Washington DC. https://www.ncjrs.gov/pdffiles1/nij/178919.pdf.
[17] Hubler, David. 2013. “Predictive analysis grows as crime-prevention
tool.” GCN. January 15. Accessed March 10, 2016.
https://gcn.com/articles/2013/01/15/predictive-analysis-crime-prevention-tool.aspx.
[18] IBM. 2011. “Predictive Crime Fighting.” IBM’s 100 Icons of Progress,
March 17. Accessed March 7, 2016.
http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/crimefighting/.
[19] International Association of Crime Analysts. 2014. “Definition and types
of crime analysis.” Standards, Methods & Technology White Paper 2014-02,
Overland Park, KS.
http://www.iaca.net/Publications/Whitepapers/iacawp_2014_02_definition_types_crime_analysis.pdf.
[20] Klose, Andreas and Drexl, Andreas. 2005. “Facility location models for
distribution system design.” European Journal of Operational Research,
162(1):4-29. http://dx.doi.org/10.1016/j.ejor.2003.10.031.
[21] Kounadi, Ourania; Bowers, Kate and Leitner, Michael. 2015. “Crime
mapping on-line: Public perception of privacy issues.” European journal
on criminal policy and research, 21(1):167-190.
[22] Levine, Ned. 2015. CrimeStat: A Spatial Statistics Program for the
Analysis of Crime Incident Locations (v 4.02). Ned Levine and
Associates, Houston, TX and the National Institute of Justice,
Washington, DC.
[23] Milic, Nenad. 2012. “Crime mapping in a function of problem oriented
policing (in Serbian).” NBP - Journal of Criminalistics and Law,
Belgrade, 1:123-140.
[24] Milic, Nenad. 2012a. “Crime mapping in a function of improving
partnership between the police and the local community (in Serbian).”
Bezbednost, 3:138-159.
[25] Milic, Nenad and Subosic Dane. 2013. “Location problems solving in
the function of police resources engagement optimization (In Serbian).”
In thematic proceeding Structure and function of police organization -
tradition, status, perspective - II, Academy of criminalistic and police
studies, Belgrade, Serbia, pp. 239-251.
[26] Paulsen, Derek J., Robinson, Matthew B. 2004. Spatial aspects of crime:
Theory and Practice, Pearson Education.
[27] Perry, Walter L., McInnis, Brian; Price, Carter C., Smith, Susan C.,
Hollywood, John S. 2013. Predictive Policing: The Role of Crime
Forecasting in Law Enforcement Operations, Santa Monica, CA: RAND
Corporation, 2013. http://www.rand.org/pubs/research_reports/RR233.html.
[28] Popovic, Brankica. 2013. “Role of ICT in modern police organization
(In Serbian).” In thematic proceeding Structure and function of police
organization - tradition, status, perspective - II, Academy of
criminalistic and police studies, Belgrade, Serbia, pp. 251-270.
[29] Reppetto, Thomas A. 1976. “Crime prevention and the displacement
phenomenon.” Crime and Delinquency, 22(2):166-177.
[30] Rich, Tom and Shively, Michael. 2004. A Methodology for Evaluating
Geographic Profiling Software, National Institute of Justice’s Document
No.: 208993, Washington DC.
https://www.ncjrs.gov/pdffiles1/nij/grants/208993.pdf.
[31] Sherman, Lawrence W., Gartin, Patrick R., Buerger, Michael E. 1989.
“Hot spots of predatory crime: Routine activities and the criminology of
place.” Criminology, 27(1):27–56.
doi: 10.1111/j.1745-9125.1989.tb00862.x.
[32] Walker, Samuel and Katz, Charles. 2012. The police in America: An
introduction, 8th edition, McGraw-Hill Education.
[33] Wartell, Julie and McEwen, Thomas. 2001. Privacy in the Information
Age: A Guide for Sharing Crime Maps and Spatial Data, US
Department of Justice, Washington DC.
https://www.it.ojp.gov/documents/d/188739.pdf.
[34] Weisburd, David; Bushway, Shawn; Lum, Cynthia and Yang, Sue-Ming.
2004. “Trajectories of crime at places: a longitudinal study of street
segments in the city of Seattle.” Criminology, 42(2):283–322. doi:
10.1111/j.1745-9125.2004.tb00521.x.
[35] Wood, Tyler. 2015. “What is Crime Analysis?” Crime Technology
Weekly, December 11. Accessed March 28, 2016.
https://fightfinancialcrimes.com/tag/data-visualization/.
In: Knowledge Discovery in Cyberspace ISBN: 978-1-53610-566-7
Editors: K. Kuk and D. Ranđelović © 2017 Nova Science Publishers, Inc.
Chapter 3
ABSTRACT
The aim of this chapter is to stress the danger of cybercrime activities
in cyberspace and their impact on personal, national and international
security in the 21st century. Underestimating this
phenomenon may lead to unpredictable consequences, even for the state’s
security.
The new millennium brought growth of the information society, which
enabled nations to be linked in a global cyberspace and led to fast
data transfer throughout the world. Globalization of cyberspace
caused new risks and threats which are invisible to the eyes and inaudible
to the ears. Cyber-criminals act covertly through
cyberspace; they breach the privacy of systems and commit crimes in
such a manner that we are not even aware of being victimized. Cybercrime
starts as a personal threat, but it ends as an international security threat.
* Corresponding author: I. Cvetanoski, Email: igorcvetanoski@yahoo.com.
54 I. Cvetanoski, J. Achkoski, D. Rančić et al.
In this chapter we stress the motives which encourage
cyber-criminals to commit cybercrimes against individuals, private
sector/business companies or state institutions. Furthermore, we
define categories and types of cybercrime. We also present
the methods of cybercrime, such as hacking, social engineering,
phishing, pharming, denial of service attacks, distributed denial of
service attacks, malicious software usage, adware, steganography, etc.
Some examples of cybercrimes that occurred around the world are presented
in order to show that no state is immune to this threat in the 21st
century. Finally, some examples of cybercrime recorded
in the past few years in Macedonia are shown, accompanied by
some statistics. We also present simple linear regression as a model
for short-range predictions (a year or two into the future).
In the future, cybercrime will increase and cause
more significant damage due to the development of the
information-technology society. Today, modern technology provides ample opportunity to
use on-line tools for cybercrime activities, which means that
anyone can create malicious software for criminal activities in cyberspace.
INTRODUCTION
The modern IT society enables the global connection of people through
cyberspace. Communication through cyberspace enables rapid transfer of
information, but it also increases the risk of that information being compromised.
Cyberspace as a new battlespace creates new threats, new warriors and new
challenges in the 21st century.
Cyberspace consists of many interconnected computers, servers,
routers, switches and fiber optic cables. Proper use of cyberspace is the basis
for the economy and national security. Securing cyberspace is an extensive
undertaking that requires coordinated action and commitment from all
stakeholders in society: governments, states, local governments, the private sector
and citizens [4].
Nowadays, modern societies depend on cyberspace for normal
functioning. The threat of cyber war and its alleged effects are a source of
great concern for governments and armed forces around the world. The fact that
Cybercrime Influence … 55
several serious cyber attacks are being carried out at this very moment, while
the exact definition of cyber war is still being debated, illustrates what
can be expected if a real cyber war occurs in the future. It is genuinely
difficult to identify the perpetrators of a cyber attack, so they have plenty
of time to conceal their real identity [2].
Perhaps the movie “The Matrix”, starring Keanu Reeves, is one of the many
stories about the future and the evolution of cyberspace: about the
evolution of war, the changing perception of the relationship between man and
machine, technological development and the development of
artificial intelligence, the switching of roles between humans and
machines, a world in which machines manage people, and a
virtual world created by the progress of machinery using the possibilities of
cyberspace and smooth mutual communication through established
network connections.
The scope of this chapter is cybercrime. The basis for the definition of
cybercrime is that this type of crime includes any criminal act relating to
computers, computer networks and computer systems. The Council of Europe
Convention on Cybercrime (2001) in its preamble defines cybercrime
as “activities that are directed against the integrity, confidentiality and
availability of computer systems and data networks, as well as any misuse of
these system networks and computer data” [10].
Malicious hackers are responsible for cyber attacks. Their basic
objective is to penetrate a computer, data network or computer system
through cyberspace, with the ultimate objective being disruption of the
stability of the system, taking over control of the system (a so-called zombie
system), denial of service attacks, stealing personal data, stealing
monetary funds from victims' accounts, propaganda, spying, altering data,
abuse of critical infrastructure and many other criminal activities carried out
with the help of malicious software (viruses, worms, etc.).
It is difficult to understand the motives for committing cybercrime;
however, the following are very common:
Political/religious,
Financial benefit,
Idealistic (activities carried out only to prove one's capabilities, without
expectation of reward or financial benefit),
Curiosity, adventure (beginners who have not joined criminal
circles, but act for fame, without the knowledge and skills) [13].
Recent studies have shown that crime associated with computers is
increasing, primarily in the form of violations of intellectual property
(unauthorized copying and copyright theft) and software piracy. There are
many types of cybercrime; some of them are:
number of Internet users make it difficult, if not impossible, to search for violators
of the laws relating to the abuse of the Internet. The most common ways of
endangering computers over the Internet are:
Legal spyware programs are those that are installed on computers
by the owners of a company so that network administrators are able to
monitor the activities of employees. These programs are used for the protection of
intellectual property, data and computer networks, and for parental supervision of
children and juveniles (at the request of a parent). Another legal use of
these programs is by state authorities in order to
monitor terrorists, criminals and other law-breakers.
Commercial spyware programs are programs of the companies which are
created for collecting information of users’ habits when viewing Internet
content. These programs are illegal and they collect users’ information easily.
The greatest benefit of spywares has the marketing industry. The spywares, in
accordance to their purpose, can be divided into the following categories:
Rootkit Programs
Backdoor Programs
Adware Programs
agreement with the research teams of applications for smart phones – that can
even use GPS (Global Positioning System) to find our location [30].
Cookies, pop-ups and adware are tools for monitoring our behavior
when we are online and are used to promote various products.
Many cookies are safe tools whose sole purpose is to monitor and collect
information from the Internet. In most cases adware programs
consist of pop-up ads that cause nothing more than an unwanted nuisance. The
main problem with these tools is that malicious hackers and online criminals
widely use them to access our computers and collect our personal
information without our being aware of it. Some data about a user who
visits a website are recorded in log files. These files record the data
intended for the creator of the website [33].
Pharming Programs
Pharming, unlike phishing, directs users to fake websites without the user
being aware of it. Websites are usually addressed by a domain name,
while their exact location is determined by an IP address. The
user types the domain name into the web browser and presses Enter; the
domain name is converted into an IP address through DNS (Domain Name
System). The web browser then connects to the server with that IP address and
retrieves data from the website. Once the user has visited the website, the DNS
entry for that site is often stored in the DNS cache of the user's computer, so
the computer does not have to query the DNS server every time the user wants to
access the website. One method of pharming is an e-mail carrying virus code
that infects the user's local DNS cache. For example, the IP address
17.254.3.183, which is the address of www.apple.com, can be
replaced by hackers with the address of another website. Pharmers can also
infect DNS servers, which means that any user who uses such a server will be
redirected to the wrong website. Most DNS servers have protection measures
against these attacks. However, this does not mean that they are
100% immune to attacks by malicious hackers. Such attacks can affect
multiple users at once when a large DNS server is modified [6].
Pharming and phishing are the best-known methods of stealing a user's identity
and other personal data. Categories of websites that were probably
compromised with malware in 2013 are shown in Table 2 [15].
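The redirection described above can be illustrated with a minimal Python sketch. It compares the address that a resolver returns for a domain with a list of addresses the user already trusts. The function name `is_suspicious_resolution`, the `TRUSTED_ADDRESSES` table and the address 203.0.113.10 (taken from a range reserved for documentation) are assumptions made for illustration only; 17.254.3.183 is simply the example address used in the text. A real check would be more involved, because large websites legitimately change their addresses.

```python
import ipaddress

# Hypothetical allow-list (an assumption for this sketch):
# domain name -> set of addresses the user already trusts.
TRUSTED_ADDRESSES = {
    "www.apple.com": {"17.254.3.183"},
}


def is_suspicious_resolution(domain, resolved_ip, trusted=TRUSTED_ADDRESSES):
    """Return True if resolved_ip is not a trusted address for domain."""
    # ip_address() validates the string and raises ValueError if it is malformed.
    ip = str(ipaddress.ip_address(resolved_ip))
    known = trusted.get(domain.lower())
    if known is None:
        return False  # no trusted record to compare against
    return ip not in known


# The cached entry matches the trusted address: nothing unusual.
print(is_suspicious_resolution("www.apple.com", "17.254.3.183"))
# The cached entry points somewhere else: possible pharming.
print(is_suspicious_resolution("www.apple.com", "203.0.113.10"))
```

In practice the resolved address could be obtained with `socket.gethostbyname(domain)` from the standard library and passed to the same function.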
Scareware Programs
The term scareware covers several classes of fraudulent programs, often
of little or no benefit, that are sold to consumers through unethical
marketing. These programs are designed to cause shock or a sense of threat
among users. The most frequently used tactic is to convince the user that the
computer is infected with a virus and to recommend downloading an antivirus
program to remove it. The recommended antivirus is usually commercial,
and users must pay to use it. The term fraudulent program is also used to
describe a product that performs the desired operation but also produces
many warnings in order to promote a commercial firewall or
programs for cleaning the registry (registry cleaner software). These classes of
programs frequently display continuous warning messages to users. Moreover,
some websites display pop-up windows or banner
advertisements with text telling the user that the computer
is infected with malware and suggesting that the user scan the
computer by clicking on the offered window. These warnings are not related to
any malicious programs actually installed; they are false and are made to look
as if they come from the operating system. The user's computer can be infected
even if the user clicks the button to cancel or close the message. Some types of
programs that steal users’ data are also classed as scareware
because they change the desktop background,
install icons (on the Windows operating system), and continuously inform the
user that the computer is infected with some form of malicious software. One
example of this type of fraud is SpySheriff, a program for stealing user
data that poses as a program for removing malicious programs.
Cybercrime Influence … 65
Ransomware Programs
Steganography
The security risks and threats mentioned above are permanently present on
the Internet. For this reason everyone who uses the Internet should
be aware of the risks and threats that constantly lurk there. Furthermore,
there are a few examples of malicious programs that were popular in 2013.
problem they created a large expert team, which decided to publish this
information through the media in order to warn the thief of the possible
consequences (and so convince the thief to destroy the laptop). Meanwhile,
representatives of the IT sector of the insurance company had taken all
necessary measures to protect the data from possible abuse [31]. Fortunately,
there were ultimately no consequences for the pension fund or the
state, but because of one person's negligence and breach of security
procedures, contingency funds, time and extra work had to be spent on data
protection.
The 2011 report “The high-tech crime” by Norton, a company that
develops software solutions, estimated that consumers lost about 114
billion US dollars. Newspapers made a comparison and found that the profits
from cybercrime are equivalent to the profits from the global drug trade.
In June 2012 the FBI carried out operation “Card Shop”, in which 24
people from thirteen countries on four continents were arrested for stealing
and selling credit card data. The FBI succeeded in capturing them because
it had secretly set up an online carding forum called “Carder Profit”, which
operated on an invitation-only basis and was constantly monitored by members of
the FBI. Stolen data were returned to the banks; more than 400,000 victims of
cybercrime were protected and a loss of 205 million US dollars was avoided [14].
Janice K. Fedarcyk, Assistant Director of the FBI, said:
“From New York to Norway and Japan to Australia, Operation “Card Shop”
was directed against sophisticated, highly organized cyber criminals involved
in buying and selling stolen identities, used credit cards, forged documents and
sophisticated hacking tools. The two-year secret FBI investigation conducted
on 4 continents is proof of commitment to eradicate rampant criminal behavior
on the Internet” [32]. This action also involved the Computer Crime Unit of the
Ministry of Interior of Macedonia. According to the FBI, in Macedonia only
orders for the search and interrogation of two persons were executed, for
whom there were grounds for suspicion that they were involved in cybercrime.
However, no one from Macedonia was arrested in this action coordinated by the
FBI [27].
In 2012, the Computer Crime Unit of the Ministry of Interior (MI) of
Macedonia detected a cybercrime attack involving damage to and unauthorized
entry into a computer system during the public procurement of a hundred and
fifty police vehicles. Namely, on September 3, 2012 the Bureau for Public Safety
(BPS) filed a report about issues of a technical nature in
the operation of the electronic procurement system in Macedonia during
the electronic auction of the MI for the purchase of motor vehicles.
Furthermore, analysis of the log files for access to the site
established that increased traffic was coming from 119 different IP (Internet
Protocol) addresses in various countries of the world. Evidently, the goal
was an attack to prevent access to the electronic system of the BPS, which was
flooded with simultaneous requests from 119 different IP addresses that blocked
the electronic system [29].
In February 2014 the Dutch police arrested four Dutch nationals and one German
national and closed down the so-called dark web marketplace “Utopia”. These
people were suspected of being involved in trading illicit drugs, stolen credit
cards, weapons, etc. Two of those arrested were suspected of having established
another dark web site known as “Black Market Reloaded”.
During the operation the following items were found and seized: personal
computers, hard drives, USB sticks and 900 bitcoins with a
value between 400,000 and 600,000 euros [26].
The number of criminal activities in the area of cybercrime that have occurred
worldwide and at the domestic level is significantly higher than the cases
mentioned above. The aim here is to show that no country is immune to this
modern threat, which constantly changes in shape and capacity. Cybercrime, like
any other crime, knows no borders, nations or individuals, but its natural
environment is cyberspace.
X Y
1.00 1.00
2.00 2.00
3.00 1.30
4.00 3.75
5.00 2.25
Figure 1. A scatter plot of the regression line for example data in Table 3 (The black
line consists of the predictions, the points are the actual data, and the vertical lines
between the points and the black line represent errors of prediction) [9].
The error of prediction (Y − Y’) for a point is the value of the point minus
the predicted value (Y’, the value on the line). The most commonly used
criterion for the “best” fitting line is the line that minimizes
the sum of the squared errors of prediction. That criterion was used to find the
line in Figure 1. Even though the regression line is nowadays computed with
statistical software, the calculations are relatively easy and are given
further in the text. MX is the mean of X, MY is the mean of Y, SX is the standard
deviation of X, SY is the standard deviation of Y, and r is the correlation
between X and Y.
$$\beta = r \, \frac{S_Y}{S_X} \qquad (2)$$
where ZY’ is the predicted standard score for Y, r is the correlation, and ZX is
the standardized score for X. Note that the slope of the regression equation for
standardized variables is r [9].
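As a worked example of these formulas, the following Python sketch computes the regression line for the example data in Table 3. The slope follows equation (2). The intercept formula, MY minus the slope times MX, is the standard least-squares result and is stated here as an assumption, since the corresponding equation is not reproduced above. The function names `pearson_r` and `regression_line` are hypothetical.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den


def regression_line(xs, ys):
    """Return (slope, intercept) of the least-squares line Y' = slope * X + intercept."""
    r = pearson_r(xs, ys)
    slope = r * stdev(ys) / stdev(xs)        # equation (2): slope = r * SY / SX
    intercept = mean(ys) - slope * mean(xs)  # assumed standard formula: MY - slope * MX
    return slope, intercept


# Example data from Table 3.
X = [1.00, 2.00, 3.00, 4.00, 5.00]
Y = [1.00, 2.00, 1.30, 3.75, 2.25]

slope, intercept = regression_line(X, Y)
print(f"Y' = {slope:.3f} X + {intercept:.3f}")  # prints: Y' = 0.425 X + 0.785
```

The errors of prediction for each point can then be obtained as `y - (slope * x + intercept)`.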
Next, we present and analyse computer crime in
Macedonia in the period from 2012 to 2015. In 2012, there was 1 (one)
reported case of production and distribution of child pornography via a
computer system [35]. In 2013, there were 91 incidents: 74 cases of
unauthorized penetration into a computer system, 4 cases of computer fraud,
and 13 cases of credit card fraud. In 2014, there were 103 incidents of
computer crime: 76 cases of unauthorized penetration into a computer system,
4 cases of computer fraud, 4 cases of abuse of credit cards, 18 cases of
production and use of a fraudulent credit card, and 1 case of computer forgery
[36]. In 2015, there were 48 incidents of computer crime [37]. These data are
shown in Figure 2 and are used further for data analysis.
In this chapter we use linear regression to produce a scatter plot
together with a trendline, the regression line (Figure 3). Our data set
describes the situation of computer crime in Macedonia over several years (data
in Figure 2). The first column of Table 4 shows the years and the third column
shows the rate of computer crime during those years. In order to simplify
Table 4, a second column is added with the values 0, 1, 2,
3, which give the number of years since 2012.
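The procedure described above can be sketched in Python as follows. The yearly totals (1, 91, 103 and 48 incidents) are taken from the preceding paragraph; that Table 4 contains exactly these values is an assumption, so the fitted coefficients may differ from those in Figure 3. The function name `fit_line` and the one-year forecast are illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Years since 2012 and reported incidents (assumed to match Table 4).
years_since_2012 = [0, 1, 2, 3]
incidents = [1, 91, 103, 48]

slope, intercept = fit_line(years_since_2012, incidents)
forecast_2016 = slope * 4 + intercept  # x = 4 corresponds to the year 2016

print(f"incidents = {slope:.1f} * x + {intercept:.1f}")
print(f"forecast for 2016: {forecast_2016:.1f}")
```

With only four data points and a large jump between 2012 and 2013, the fitted line is very sensitive to each observation, which is consistent with the caution expressed later about predictions far into the future.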
Figure 2. Chart of detailed data for computer crime in Macedonia from 2012 to
2015.
Figure 3. Linear regression chart for computer crime over the years in Macedonia.
Figure 4. Share of Internet users who caught a virus or other computer infection [11].
Figure 4 shows the share of Internet users who caught a virus or other
computer infection in Macedonia, Turkey, Greece, Bulgaria, Slovenia and
Croatia. Compared with 2010, the share of Internet users who caught a virus or
other computer infection resulting in loss of information or time dropped in
all countries by 2015, except Macedonia and Croatia. As shown in Figure 4,
the most remarkable fall was recorded in Bulgaria (from 58% in 2010 to 28%
in 2015, a decrease of 30 percentage points), followed by Slovenia (-21 pp),
Turkey and Greece (both -9 pp). In contrast to these countries, Croatia (+8 pp)
and Macedonia (+3 pp) recorded growth in the loss of information or time due to
catching a virus or other computer infection through the Internet. Figure 4
shows that Macedonians were the nation most exposed to hackers’ attacks in
comparison with the other countries shown in Figure 4 (78% of Internet users in
Figure 5. Selected on-line activities not done because of security concerns, 2015 (% of
Internet users) [11].
1.1 Never open links in an e-mail message from an unknown
source;
1.2 Do not open an e-mail attachment if we do not expect it
or do not know the sender;
1.3 Scan e-mail attachments with antivirus software before opening them;
1.4 Always delete spam e-mail without opening it;
1.5 Do not give our e-mail address to people who do not
know us.
against unwanted e-mail messages or pop-up ads that claim to contain an
antivirus program. These messages are usually Trojan horses waiting to infect
our computer.
It is necessary to check the privacy and security settings of the web browsers
on our computers and smartphones, which are often bought with web
browsers already installed (Safari, Firefox, Chrome, Internet Explorer or
others). Web browsers often come with default settings that provide a balance
between the computer’s security and the functionality of web pages. These
settings limit the extent to which the computer will enable Internet
applications, such as cookies, ActiveX and Java, that help websites to perform
important functions. If our browser allows unlimited interaction with cookies
or other applications that monitor Internet activity, we can easily be
targeted; by contrast, if we completely block these applications, websites will
not function effectively. It is therefore necessary to find a balance; for more
detailed information it is best to visit the website of the producer of the
relevant browser, where we can inform ourselves about the privacy and security
settings [30].
As regards the necessary measures and actions that should be taken to
increase the security of information systems in state institutions, the following
steps should be considered:
CONCLUSION
The methods and forms of cybercrime are in constant evolution. They
require continuous monitoring and study by the authorities. Their ability to
adapt to new environments shows the necessity of preventive measures in
order to protect cyberspace. The new millennium, the Internet revolution, new
ways of warfare, new enemies, new tactics and techniques of warfare, new
leaders, a new world order and a new world security map only confirm the role
of security and intelligence in the modern world's fight against cybercrime.
Everyone who is connected to the Internet is constantly exposed to
security risks and threats from malicious software. Malicious software is
created, updated, upgraded, modified
and distributed over the Internet to target groups by malicious hackers. The
motives for creating malicious software vary (espionage, crime, entertainment,
etc.). One of the biggest dangers is disgruntled insiders within an
organization. Browsing the Internet with an exposed IP address is an additional
security risk. Open access to web pages is also a security risk, since all
our activities are recorded in the browser history, cookie store and so on.
Cybercrime is increasingly appearing in more complex forms that are difficult
to detect and prevent. Malicious software, as one of the methods of
cybercrime, is accessible in cyberspace. Nowadays, it is not necessary to be a
great specialist in computer equipment or a good programmer in order to
create malicious software for criminal activities, because much
malicious software already exists on the Internet. The code of malicious
software is built and placed on a website or forum for malicious hackers, where
it awaits use by any person who wants to mount a cyber attack. Social
engineering has always been a good tool for criminals to access personal
information about a potential target in order to carry out activities in the
area of cybercrime. In many cases, information is obtained through social
engineering because of negligence or by accident.
Cyber criminals usually use phishing because they know that there are
people who have the resources (computers) but lack knowledge, who are
reckless and curious, and who therefore often become victims of this method
of cybercrime.
Protection against malicious software on the Internet consists of constantly
updating the antivirus program, installing and enabling a firewall, checking the
privacy and security settings of web browsers, password protection, raising
awareness about the use of external memory devices (USB, CD, etc.), working
with a hidden IP address, using private (incognito) browsing and other
measures for computer protection.
In this chapter, simple linear regression was used to predict
future computer crime on the basis of data from previous years. The results of
this research showed that a simple linear regression model can be used to make a
prediction for computer crime a year or two into the future, but it is not a good
model for making predictions very far into the future.
The possibilities for action by cyber criminals are huge and they use all
the methods available to them. However, the biggest threat from the operation of
cyber criminals in cyberspace arises from employees' low awareness of the
threats and risks in this area, ignorance, negligence and violation of safety
rules and procedures. This means that most cybercrime will be committed
because of people as a security risk.
Prevention of the threat of cybercrime requires the establishment of a
separate institution or team to deal with the threats and challenges in
cyberspace, known globally as a Computer Emergency Response/Readiness Team
(CERT). Such a team has not yet been established in Macedonia, although its
formation was announced in 2013, with the task of protecting and providing
recommendations for the protection of the IT systems of government institutions
and the private sector. Its establishment was announced once again in “The
Program of the Government of Republic of Macedonia 2014 – 2018”. The
deadline for its constitution according to this Government Program was
June 2015, but this did not happen, probably due to the political crisis in
Macedonia. Debates on the rationale for establishing these teams
were also held on the forum of the Internet portal “IT” on the theme
“Developing CERT/CIRT team in Macedonia”. In response to the question
“Should we create CERT/CIRT team in Macedonia?” 78% of surveyed IT
members voted “yes”, which is strong support for establishing these teams.
The expectation is that in the future cybercrime is going to grow and
become more complex, more serious and more covert, and to cause major damage due
REFERENCES
[1] Kevin, Beaver. 2010. Hacking For Dummies, 3rd Edition. Wiley
Publishing, Inc. 111 River Street Hoboken, NJ, 386. Accessed
December 15, 2011. http://www.dummies.com/cheatsheet/hacking.
[2] Benjamin, S. Buckland, Fred, Schreier, and Theodor, H. Winkler.
Democratic governance challenges of cyber security. Accessed
December 15, 2013. http://www.fbd.org.rs/akcije/POJEDINACNE/
CYBER%20ZA%20WEBSITE.pdf.
[3] Barry, G.Buzzan (1983). People, states and fear. Skopje, 2010:
Academic press, 112.
[4] Dejan, Vuletic. Cyber warfare as a form of information warfare.
Accessed December 15, 2013. http://www.itvestak.org.rs/ziteh_04/
radovi/ziteh-32.pdf.
[5] CARNet Croatian Academic and Research Network. Phishing attacks.
CCERT-PUBDOC-2005-01-106. CARNetCERT in association with
LS&S. Accessed December 16, 2013. http://www.cert.hr/sites/default/
files/CCERT-PUBDOC-2005-01-106.pdf.
[6] CARNet Croatian Academic and Research Network. Online extortion.
CCERT-PUBDOC-2009-06-268. CARNetCERT in association with
LS&S. Accessed December 16, 2013. http://www.cert.hr/sites/default/
files/CCERT-PUBDOC-2009-06-268.pdf.
[7] CARNet Croatian Academic and Research Network. Steganography.
CCERT-PUBDOC-2006-04-154. CARNetCERT in association with
LS&S. Accessed December 16, 2013. http://www.cert.hr.
[8] CARNet Croatian Academic and Research Network. Spyware
programs. CCERT-PUBDOC-2009-10-280. CARNetCERT in
association with LS&S. Accessed December 16, 2013. http://www.
cert.hr/sites/default/files/CCERT-PUBDOC-2009-10-280.pdf.
[9] David, M.Lane. Introduction to Linear Regression. Accessed March 26,
2016. http://onlinestatbook.com/2/regression/intro.html.
[10] ETS 185 – Convention on Cybercrime, 23.XI.2001. Council of Europe.
Accessed December 15, 2013. http://www.europarl.europa.eu/
meetdocs/2014_2019/documents/libe/dv/7_conv_budapest_/7_conv_bu
dapest_en.pdf.
[21] New virus locks the data and required to pay $ 300. BusinessInsider.
Accessed January 4, 2014. http://brkajrabota.mk/tehnologija/internet/
30378-nov-virus-gi-zakluccua-komjuterite-i-bara-da-platite-300-dolari.
[22] Organized crime. Seminar work. Accessed February 13, 2014.
http://www.maturskiradovi.net/forum/attachment.php?aid=2015.
[23] PennState, Eberly College of Science. STAT 501. Lesson 1: Simple
Linear Regression. Accessed March 25, 2016. https://onlinecourses.
science.psu.edu/stat501/node/257.
[24] Protecting Yourself Online. What Everyone Needs to Know.
Commonwealth of Australia 2010. Australian Government. Copyright
Administration, Attorney General’s Department, National Circuit,
Barton ACT.
[25] 2600. Accessed May 21, 2014. http://www.ag.gov.au/cca.
[26] Dejan, Sokolovski. 2014. “Dutch authorities turned off the online black
market Utopia”. Internet portal IT. Accessed February 3, 2014.
http://it.com.mk/holandskite-vlasti-go-izgasija-tsrniot-onlajn-pazar-
utopia/
[27] Dejan, Sokolovski. 2012. “MI and the FBI together against cyber crime,
24 persons from 13 countries were arrested”. Internet portal IT. June 27.
Accessed February 15, 2014. http://it.com.mk/mvr-i-fbi-zaedno-protiv-
kompjuterski-kriminal-24-uapseni-od-13-drzhavi/
[28] StATS: What is a correlation? (Pearson correlation). Accessed
February 18, 2016. http://www.pmean.com/definitions/correlation.htm.
[29] “There was computer crime in bidding for police vehicles”. 2012.
Internet portal МКД. September 19. Accessed February 15, 2014.
http://www.mkd.mk/59923/crna-hronika/sepak-imalo-kompjuterski-
kriminal-pri-naddavanjeto-za-policiskite-vozila.
[30] The Cyber security handbook. A cyber security guide. Accessed August
6, 2014. www.NJConsumerAffairs.gov.
[31] USAID/Project eGovernment. Ministry of Information. Metamorphosis.
2010. Fundamentals and development of e - government. Accessed
February 13, 2014. http://www.mio.gov.mk/files/pdf/Osnovi%20i%
20razvoj%20na%20e-Vlada%202010%20-%20mk.pdf.
[32] U.S. Attorney’s Office. Southern District of New York. 2012.
“Manhattan U.S. Attorney and FBI Assistant Director in Charge
Announce 24 Arrests in Eight Countries as Part of International Cyber
Crime Takedown”. The Federal Bureau of Investigation (FBI). June 26.
Accessed February 17, 2014. http://www.fbi.gov/newyork/press-
releases/2012/manhattan-u.s.-attorney-and-fbi-assistant-director-in-
charge-announce-24-arrests-in-eight-countries-as-part-of-international-
cyber-crime-takedown.
[33] Understanding Internet Security. What you need to protect yourself
online. 2004 Big Planet, Inc. All Rights Reserved. Big Planet is a
registered trademark. Accessed August 26, 2014. http://www.
bigplanetusa.com/library/bp/pdf/bpis_understanding_security.pdf.
[34] United States Government Accountability Office, Information Security:
Cyber Threats and Vulnerabilities Place Federal Systems at Risk
(Washington DC: US GAO, 2009); William A. Wulf and Anita K.
Jones, “Reflections on Cybersecurity,” Science 326 (13 November
2009): 943-4; See Martin Charles Golumbic, Fighting Terror Online:
The Convergence of Security, Technology, and the Law (New York:
Springer, 2007).
[35] United States Department of State. OSAC, Bureau of Diplomatic
Security. Macedonia 2014 Crime and Safety Report. Accessed March
25, 2016. https://www.osac.gov/pages/ContentReportDetails.aspx?cid=
15074.
[36] United States Department of State. OSAC, Bureau of Diplomatic
Security. Macedonia 2015 Crime and Safety Report. Accessed March
25, 2016. https://www.osac.gov/pages/ContentReportDetails.aspx?cid=
17677.
[37] United States Department of State. OSAC, Bureau of Diplomatic
Security. Macedonia 2016 Crime and Safety Report. Accessed March
25, 2016. https://www.osac.gov/pages/ContentReportDetails.aspx?cid=
18939.
In: Knowledge Discovery in Cyberspace ISBN: 978-1-53610-566-7
Editors: K. Kuk and D. Ranđelović © 2017 Nova Science Publishers, Inc.
Chapter 4
ABSTRACT
In this chapter the contemporary generally accepted theoretical
analysis and assumptions regarding the implementation of Benford’s Law
are presented. It seems interesting to introduce the perspective on
Benford’s Law as a consequence of the universal law of nature stating
Corresponding author: D. Joksimovic, Email: dusan.joksimovic@kpa.edu.rs.
86 D. Joksimović, G. Knežević, V. Pavlović et al.
that nature strives for maximum entropy or disorder, as well as the
perspective in which Benford’s Law aspires to find its place in the
contemporary theory of everything in nature.
The application of this law to the analysis of anomalies in
numerical data in various scientific disciplines is also part of this
chapter. Incorrect numerical data describing a specific
occurrence can be the consequence of an unintentional error in the
formation of the numerical data (as a consequence of bad design of an
experiment, imperfections in the detection of the numerical data, a
badly set up model of some process that generates the set of numerical
data, etc.), but also the consequence of intentional abuse.
Using Monte Carlo simulation, we determine the average values
and standard deviation of the relative frequency of the type I error for
the Mean Absolute Deviation test and for the Pearson χ² test of
Benford’s Law, for the first digit and for the first two digits, and we
determine the acceptable length of series for the application of these tests
in the context of acceptable type I errors.
We apply Benford’s Law in practice by testing it
on data gathered from the International Monetary Fund World
Economic Outlook Database. We use two groups of data. The first group
is the data regarding the gross domestic product and the other group
comprises the current account balances for 184 countries in the
period from 1980 to 2016.
The specific perspective provided in this chapter concerns the
application of this law in the forensic analysis of fraud, especially in
the analysis of numerical data that describe various sociological,
econometric and financial irregularities. We show how the combined
use of Benford’s Law and specific laws of mathematical statistics
successfully detects potential irregularities in numerical data and
improves the forensic analyst’s ability to detect potential fraud in this area.
findings, without getting into the assumptions and theoretical analysis of the
phenomena. This article went unnoticed and was forgotten easily.
In 1938 the physicist Frank Benford came to the same
conclusion as Simon Newcomb. Benford tested this hypothesis using
20229 data items of big numbers from 20 different sources and 2968 data items
of small numbers from 10 different sources. The sources were natural and
social phenomena, such as numbers appearing in published journals, the lengths
of rivers, the values of physical constants, mortality rates, baseball
statistics, etc. Unlike Newcomb, Benford in his work [2] determined
the mathematical law of the frequency distribution of leading digits in
numbers, which became known as Benford’s Law.
Starting from the second half of the 1980s, this law
started to be used more often in the analysis of the consistency of
numerical data describing various social and natural phenomena. Nowadays,
the theoretical analysis of this law, aimed at finding a better mathematical
basis for it, and its application are still current issues in the contemporary
scientific community [3-4].
$$x = M_B(x) \cdot B^k$$
where $k \in \mathbb{Z}$ and $M_B(x) \in [1, B)$. The number $M_B(x)$ denotes the mantissa of
the number $x$. We can conclude that the following equation holds:
$$M_B(x) = \frac{x}{B^k} = \frac{x}{B^{[\log_B x]}} = \frac{B^{\log_B x}}{B^{[\log_B x]}} = B^{\log_B x - [\log_B x]}$$
where $[\log_B x]$ denotes the integer part of the number $\log_B x$, that is, the greatest
integer less than or equal to $\log_B x$.
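The mantissa defined above can be computed with a short Python sketch. It uses the form x / B^[log_B x] from the chain of equalities and adds a correction step, because floating-point logarithms can be off by one unit in the last place near exact powers of the base. The function name `mantissa` and the correction step are assumptions of this sketch, not part of the definition.

```python
import math


def mantissa(x, base=10):
    """Return M_B(x) in [1, B) for x > 0, computed as x / B**floor(log_B x)."""
    if x <= 0:
        raise ValueError("x must be positive")
    if base <= 1:
        raise ValueError("base must be greater than 1")
    k = math.floor(math.log(x, base))  # integer part [log_B x]
    m = x / base ** k
    # Correct for rounding of the logarithm near exact powers of the base.
    if m >= base:
        m /= base
    elif m < 1:
        m *= base
    return m


print(mantissa(314.0))      # approximately 3.14
print(mantissa(0.05))       # approximately 5.0
print(mantissa(81, base=9)) # 81 = 1 * 9**2, so the mantissa is 1.0
```

The leading significant digit in base B is then simply `int(mantissa(x, B))`.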
In scientific practice we can find two mutually equivalent definitions of
Benford’s Law: Benford’s Law as a probability distribution function of the
mantissa of numerical data, and Benford’s Law as the joint probability
distribution of the first k significant digits of numerical data.
Definition 2.1. (Benford’s Law for the probability distribution function of
the mantissa)
A random variable X, whose realizations are only positive values in the base
B > 1, obeys Benford’s Law if, and only if, the
probability distribution function of the mantissa of the
random variable X, M(X), in that base follows the
logarithmic law
$$P(C_1 = c_1, C_2 = c_2, \ldots, C_k = c_k) = \log_B\left(1 + \frac{1}{\sum_{i=1}^{k} B^{k-i} c_i}\right)$$
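The joint probability of the first k significant digits can be evaluated directly from the formula above. The following Python sketch is an illustration only; the function name `first_digits_probability` and the input checks are assumptions.

```python
import math


def first_digits_probability(digits, base=10):
    """P(C1=c1, ..., Ck=ck) = log_B(1 + 1 / sum(B**(k-i) * c_i))."""
    k = len(digits)
    if k == 0 or not 1 <= digits[0] < base:
        raise ValueError("the first digit must be in 1..B-1")
    if any(not 0 <= c < base for c in digits[1:]):
        raise ValueError("later digits must be in 0..B-1")
    # The sum is the integer whose base-B digits are c1 c2 ... ck.
    value = sum(base ** (k - i) * c for i, c in enumerate(digits, start=1))
    return math.log(1 + 1 / value, base)


print(first_digits_probability([1]))        # log10(2), about 0.301
print(first_digits_probability([3, 1, 4]))  # log10(1 + 1/314)
```

For k = 1 the probabilities over the digits 1 to B − 1 sum to one, which gives a quick sanity check of the implementation.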
𝑃(𝑀𝐵 (𝑥) ≤ 𝑚) = 𝑃(𝑀𝐵 (𝑥) < 𝑚) + 𝑃(𝑀𝐵 (𝑥) = 𝑚) = 𝑃(𝑀𝐵 (𝑥) < 𝑚),
This characteristic in the base 𝐵 = 10, was detected in the first works of
Newcomb [9] and Benford [3].
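1. For $k \geq 2$ ($k \in \mathbb{N}^*$)
$$P(C_k = c_k) = \sum_{i=B^{k-2}}^{B^{k-1}-1} \log_B\left(1 + \frac{1}{i \cdot B + c_k}\right), \quad c_k \in \{0, 1, 2, \ldots, B-1\}.$$
$$P(C_1 = c_1, C_2 = c_2) = \log_B\left(1 + \frac{1}{c_1 \cdot B + c_2}\right), \quad c_1 \in \{1, 2, \ldots, B-1\}, \; c_2 \in \{0, 1, 2, \ldots, B-1\}.$$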
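The marginal distribution of the k-th significant digit can be tabulated from the sum above. The Python sketch below is a minimal illustration for k ≥ 2; the function names `kth_digit_probability` and `kth_digit_distribution` are hypothetical.

```python
import math


def kth_digit_probability(k, digit, base=10):
    """P(C_k = digit) for k >= 2, summing over all possible leading k-1 digits."""
    if k < 2:
        raise ValueError("this formula applies for k >= 2")
    if not 0 <= digit < base:
        raise ValueError("digit must be in 0..B-1")
    lower = base ** (k - 2)      # smallest integer with k-1 digits in base B
    upper = base ** (k - 1) - 1  # largest integer with k-1 digits in base B
    return sum(math.log(1 + 1 / (i * base + digit), base)
               for i in range(lower, upper + 1))


def kth_digit_distribution(k, base=10):
    """Probabilities of the digits 0..B-1 in position k."""
    return [kth_digit_probability(k, c, base) for c in range(base)]


for position in (2, 3, 4):
    probabilities = kth_digit_distribution(position, base=10)
    print(position, [round(p, 4) for p in probabilities])
```

As k grows, the distribution approaches the uniform value 1/B, which is the flattening visible when the digit distributions are plotted for successive positions.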
Some Aspects of the Application of Benford’s Law … 91
Figure 1. The probability distributions of the first, second and third digits in the base B
= 10.
Figure 2. The probability distributions of the first, second and third digits in the base B
= 9.
Some Aspects of the Application of Benford’s Law … 93
0.35
0.3
0.25
Probability
0.2
0.15
0.1
0.05
0
1 2 3 4 5
The first significant digit
The probability distributions of the second digits in the base B=6
0.25
0.2
0.15
Probability
0.1
0.05
0
0 1 2 3 4 5
The second significant digit
Figure 3. (Continued)
94 D. Joksimović, G. Knežević, V. Pavlović et al.
0.16
0.14
0.12
Probability
0.1
0.08
0.06
0.04
0.02
0
0 1 2 3 4 5
The third significant digit
Figure 3. The probability distributions of the first, second and third digits in the base B
= 6.
The probability distributions of the fourth digits in the base B=10
0.12
0.1
0.08
Probability
0.06
0.04
0.02
0
0 1 2 3 4 5 6 7 8 9
The fourth significant digit
Figure 4. (Continued)
Some Aspects of the Application of Benford’s Law … 95
0.1
0.08
Probability
0.06
0.04
0.02
0
0 1 2 3 4 5 6 7 8
The fourth significant digit
The probability distributions of the fourth digits in the base B=6
0.18
0.16
0.14
0.12
Probability
0.1
0.08
0.06
0.04
0.02
0
0 1 2 3 4 5
The fourth significant digit
Figure 4. The probability distributions of the fourth digit in the bases B = 10,9,6.
\[ P(C_1 = c_1, C_2 = c_2, \ldots, C_k = c_k) \ne \prod_{i=1}^{k} P(C_i = c_i) \]
This property shows that the significant digits of data that satisfy Benford’s Law are not mutually independent. This fact opens many questions about these distributions and about the search for a universal law underlying the events that satisfy Benford’s Law. Moreover, these events come from a large number of scientific disciplines and areas that at first glance seem completely unrelated, and some researchers consider Benford’s Law to be one of the indicators of the existence of a Theory of Everything, which leading scientists have tried to approach in recent decades.
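The inequality can be checked numerically for the first two digits in the base B = 10. This is a small illustrative sketch; the helper names are hypothetical.

```python
import math

def p_first(c1, base=10):
    """P(C1 = c1)."""
    return math.log(1 + 1 / c1, base)

def p_first_two(c1, c2, base=10):
    """Joint probability P(C1 = c1, C2 = c2)."""
    return math.log(1 + 1 / (c1 * base + c2), base)

def p_second(c2, base=10):
    """Marginal P(C2 = c2), summing the joint law over the first digit."""
    return sum(p_first_two(c1, c2, base) for c1 in range(1, base))

joint = p_first_two(1, 0)
product = p_first(1) * p_second(0)
print(f"P(C1=1, C2=0)     = {joint:.5f}")
print(f"P(C1=1) * P(C2=0) = {product:.5f}")
```

The joint probability of the pair (1, 0) is larger than the product of the two marginal probabilities, so the first and second digits are dependent.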
Figure 5. The probability distributions of the first two digits in the base B = 10.
By the Kronecker-Weyl theorem, for each irrational number α ∈ R\Q the sequence z_n = n·α, n ∈ N*, is uniformly distributed modulo 1, that is, n·α mod 1 ~ 𝒰(0,1). Together with identity (2), this implies that for every irrational α ∈ R\Q the sequence B^(nα), n ∈ N*, satisfies Benford’s Law in the base B > 1.
The sequence {a^n} satisfies Benford’s Law in the base B if, and only if, log_B a is an irrational number.
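A quick empirical illustration of this statement, as a sketch only: the first-digit frequencies of the powers of 2 (log_10 2 is irrational) are compared with those of the powers of 10 (log_10 10 = 1 is rational). The sequence length N = 1000 is an arbitrary choice.

```python
import math
from collections import Counter

def first_digit_freq(seq):
    """Relative frequency of each first digit 1..9 in a sequence of positive integers."""
    counts = Counter(int(str(x)[0]) for x in seq)
    n = sum(counts.values())
    return {d: counts.get(d, 0) / n for d in range(1, 10)}

N = 1000
powers_of_2 = [2 ** n for n in range(1, N + 1)]     # log10(2) is irrational
powers_of_10 = [10 ** n for n in range(1, N + 1)]   # log10(10) = 1 is rational

benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
f2 = first_digit_freq(powers_of_2)
f10 = first_digit_freq(powers_of_10)
max_dev_2 = max(abs(f2[d] - benford[d]) for d in benford)

print(f"max deviation from Benford for 2^n: {max_dev_2:.4f}")
print(f"share of leading digit 1 for 10^n:  {f10[1]:.1f}")
```

The powers of 2 follow the first-digit law closely, while every power of 10 starts with the digit 1.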
5. The Hypothesis of the Scale and Base Invariance for the Data
That Satisfy Benford’s Law
The event space ℳ_B on which Definition 2.1 is stated, in the base B > 1, is called the mantissa algebra. It is a σ-algebra, a sub-σ-field of the Borel σ-field. For any set E the following holds:
\[ E \in \mathcal{M}_B \iff E = \bigcup_{k=-\infty}^{\infty} S \cdot B^k, \quad S \in \mathcal{B}([1, B)) \]
\[ m \in N, \; E \in \mathcal{M}_B \Rightarrow E^{1/m} \in \mathcal{M}_B \]
The hypotheses of scale and base invariance for numerical data can be formulated, though not with full mathematical rigor, in the following way:
Scale invariance hypothesis: If a random variable X satisfies Benford’s Law, then the random variable a·X, where a > 0, a ∈ R, also satisfies this law; that is, numerical data that have Benford’s Law properties retain them when multiplied by a positive scalar.
This hypothesis has been tested both theoretically and practically, and it has passed all of these tests. In 1995 Hill showed that, in the base B = 10, a probability measure on (R+, ℳ10) is scale invariant if, and only if, it satisfies Benford’s Law [5].
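Scale invariance can be illustrated by simulation. The sketch below is an assumption-laden example, not the authors’ code: the Benford sample is generated as 10^(6U) with U uniform on (0, 1), and the scale factors, sample size and seed are arbitrary.

```python
import math
import random

def first_digit(x):
    # In scientific notation the first significant digit is the first character.
    return int(f"{x:.15e}"[0])

def first_digit_freq(data):
    counts = [0] * 10
    for x in data:
        counts[first_digit(x)] += 1
    return [counts[d] / len(data) for d in range(1, 10)]

rng = random.Random(42)
# The fractional part of 6*U is uniform on [0, 1), so 10**(6*U) has a
# Benford-distributed mantissa in base 10 and spans six orders of magnitude.
sample = [10 ** (6 * rng.random()) for _ in range(50_000)]
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]

devs = {}
for a in (1.0, 2.54, 0.3048, 7.0):        # arbitrary positive scale factors
    freq = first_digit_freq([a * x for x in sample])
    devs[a] = max(abs(f - b) for f, b in zip(freq, benford))
    print(f"a = {a:<7} max |freq - Benford| = {devs[a]:.4f}")
```

For every scale factor the first-digit frequencies stay close to the Benford probabilities, up to sampling noise.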
The base invariance hypothesis is more delicate. It asks whether a Benford’s Law property detected in the base B > 1 for some random phenomenon or set of numerical data remains valid when the data are converted into another base. It was shown [5] that Benford’s Law properties transfer into the other base for the Borel sets, while for the Dot set this cannot be guaranteed. In that case a combination of the probability that satisfies Benford’s Law and the Dirac probability measure concentrated at the constant one preserves base invariance.
Moreover, it was shown that if a random variable satisfies the scale invariance hypothesis, then it also satisfies the base invariance hypothesis, but the converse does not hold.
A random variable is sum invariant if, for any natural number n ∈ N, the expected sum of the mantissas of all entries starting with a fixed n-tuple of significant digits is the same as that for any other n-tuple. It was shown that a random variable is sum invariant only if it satisfies Benford’s Law.
Sum invariance for Benford’s Law data is proven in the sense of expected sums; in real Benford’s Law data the sums are not exactly equal, as some variance exists. The analysis of this variance leads to results that support the practical use of this property on data that satisfy Benford’s Law.
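For n = 1 the property can be checked by simulation: for a Benford mantissa M in the base 10, the expected contribution of each first digit d to the sum of mantissas is the integral of m · 1/(m ln 10) over [d, d + 1), which equals 1/ln 10 for every d. The sketch below is illustrative only; the sample size and seed are arbitrary assumptions.

```python
import math
import random

rng = random.Random(7)
N = 200_000
sums = [0.0] * 10
for _ in range(N):
    m = 10 ** rng.random()      # Benford mantissa in [1, 10)
    d = min(int(m), 9)          # first digit; guards against rounding up to 10.0
    sums[d] += m

per_digit = [s / N for s in sums[1:]]   # average mantissa contribution per first digit
expected = 1 / math.log(10)             # theoretical value, the same for every digit

for d, v in enumerate(per_digit, start=1):
    print(f"d = {d}: {v:.4f}  (theory {expected:.4f})")
```

The nine averages differ only by sampling variance, which is the variance the text refers to.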
One of the universal laws is the law of maximum entropy, which states that all isolated systems in the universe tend towards maximum entropy, or disorder. This is the state in which all possibilities are equally likely. It has been shown [6-8] that Benford’s Law is a consequence of that universal law. That is why, even in the most contemporary work on a Theory of Everything, Benford’s Law is analyzed as one of the potential paths to that comprehensive solution. Moreover, the fact that Benford’s Law is derived from the maximum entropy of the system in which it is used explains the successful application of this law to a large spectrum of natural and social events.
not mean that the data set does not satisfy Benford’s Law; it can instead be a consequence of not having the adequate level of significance that the data need in order to satisfy Benford’s Law.
Many tests have been developed and used for testing numerical data against Benford’s Law (the Mean Absolute Deviation test, the Pearson χ² test, the Kuiper test, the Z-test, the test of sum invariance, the test of the distortion factors, the second level test, the test of doubled digits, the test of the last two digits, ...).
In this paper we describe the Mean Absolute Deviation test and the Pearson χ² test.
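Mean Absolute Deviation (MAD) is calculated for the first digit, the second digit and the first two digits:
\[ MAD(C_1) = \frac{1}{9} \sum_{c_i=1}^{9} \left| P(\bar{C}_1 = c_i) - P(C_1 = c_i) \right| \]
\[ MAD(C_2) = \frac{1}{10} \sum_{c_i=0}^{9} \left| P(\bar{C}_2 = c_i) - P(C_2 = c_i) \right| \]
\[ MAD(C_1 C_2) = \frac{1}{90} \sum_{c_i=1}^{9} \sum_{c_j=0}^{9} \left| P(\bar{C}_1 = c_i, \bar{C}_2 = c_j) - P(C_1 = c_i, C_2 = c_j) \right| \]
Here P(\bar{C} = c) denotes the relative frequency observed in the sample and P(C = c) the probability given by Benford’s Law.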
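The three MAD statistics above can be computed from a data set as follows. This is a minimal Python sketch in the base B = 10, not the authors’ MATLAB program; `first_two_digits` and `mad_tests` are hypothetical names, and the example data (powers of 2) are an illustrative assumption.

```python
import math
from collections import Counter

def first_two_digits(x):
    """Return (c1, c2), the first two significant digits of a nonzero number."""
    s = f"{abs(x):.15e}"        # e.g. '3.141590000000000e+00'
    return int(s[0]), int(s[2])

def mad_tests(data):
    """MAD(C1), MAD(C2) and MAD(C1C2) against the base-10 Benford probabilities."""
    n = len(data)
    pairs = Counter(first_two_digits(x) for x in data)
    p12 = {(c1, c2): math.log10(1 + 1 / (10 * c1 + c2))
           for c1 in range(1, 10) for c2 in range(10)}
    obs1 = [sum(pairs[(c1, c2)] for c2 in range(10)) / n for c1 in range(1, 10)]
    obs2 = [sum(pairs[(c1, c2)] for c1 in range(1, 10)) / n for c2 in range(10)]
    exp1 = [sum(p12[(c1, c2)] for c2 in range(10)) for c1 in range(1, 10)]
    exp2 = [sum(p12[(c1, c2)] for c1 in range(1, 10)) for c2 in range(10)]
    mad1 = sum(abs(o - e) for o, e in zip(obs1, exp1)) / 9
    mad2 = sum(abs(o - e) for o, e in zip(obs2, exp2)) / 10
    mad12 = sum(abs(pairs[k] / n - p) for k, p in p12.items()) / 90
    return mad1, mad2, mad12

data = [2.0 ** n for n in range(1, 501)]   # illustrative Benford-like data
print(mad_tests(data))
```

The resulting values are then compared with the conformity ranges of Table 1.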
We tested the relative frequency of the type I error in the base B = 10. We generated Benford’s Law data strings of different lengths (from 100 to 10,000 data) 100 times and, taking the values for “No agreement at all” from Table 1, determined the relative frequency of the type I error.
This testing was repeated an additional 100 times, and by the usual methods we obtained the average value and the standard deviation of the relative frequency of the type I error.
We performed the testing in MATLAB.
The results are shown in the following figures.
From Figures 6 and 7 we can conclude that the relative frequency of the type I error for the Mean Absolute Deviation calculated for the first digit is acceptable for data strings longer than 1,000 and, for the first two digits, for data sets longer than 3,000. For shorter data strings, the relative frequency of the type I error shows that the tests are not applicable.
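The simulation described above can be sketched in Python for the first digit. This is an illustration under stated assumptions, not a reproduction of the authors’ MATLAB experiment: the rejection threshold 0.015 is an assumed stand-in for the “No agreement at all” bound of Table 1, which is not reproduced here, and the data lengths, number of runs and seed are arbitrary.

```python
import math
import random

BENFORD_1 = [math.log10(1 + 1 / d) for d in range(1, 10)]

def mad_first_digit(sample):
    """MAD(C1) for a sample of mantissas in [1, 10)."""
    counts = [0] * 9
    for x in sample:
        counts[min(int(x), 9) - 1] += 1
    n = len(sample)
    return sum(abs(c / n - p) for c, p in zip(counts, BENFORD_1)) / 9

def type_one_rate(n, runs, threshold, rng):
    """Share of genuine Benford samples of size n that the MAD test rejects."""
    rejected = 0
    for _ in range(runs):
        sample = [10 ** rng.random() for _ in range(n)]   # Benford mantissas
        if mad_first_digit(sample) > threshold:
            rejected += 1
    return rejected / runs

rng = random.Random(2017)
THRESHOLD = 0.015   # assumption: stands in for the Table 1 nonconformity bound
rates = {n: type_one_rate(n, 100, THRESHOLD, rng) for n in (100, 500, 1000, 5000)}
for n, r in rates.items():
    print(f"N = {n:>5}: relative frequency of type I error = {r:.2f}")
```

Repeating the whole experiment several times and taking the mean and standard deviation of the rates gives quantities of the kind plotted in the figures.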
Pearson χ²
The Pearson χ² test statistic is
\[ \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_{k-1}, \]
where O_i is the sample frequency and E_i the expected frequency of the appearance of a characteristic of the set classified into k classes. In this case the number of classes is equal to the number of digit values under analysis (k = 9 for the analysis of the first digit, k = 10 for the analysis of the second digit, k = 90 for the analysis of the first two digits), and the test is then:
\[ N \cdot \sum_{c_i=1}^{9} \frac{\left( P(\bar{C}_1 = c_i) - P(C_1 = c_i) \right)^2}{P(C_1 = c_i)} \sim \chi^2_8 \]
\[ N \cdot \sum_{c_i=0}^{9} \frac{\left( P(\bar{C}_2 = c_i) - P(C_2 = c_i) \right)^2}{P(C_2 = c_i)} \sim \chi^2_9 \]
\[ N \cdot \sum_{c_i=1}^{9} \sum_{c_j=0}^{9} \frac{\left( P(\bar{C}_1 = c_i, \bar{C}_2 = c_j) - P(C_1 = c_i, C_2 = c_j) \right)^2}{P(C_1 = c_i, C_2 = c_j)} \sim \chi^2_{89} \]
[Figures 6 and 7: average (Av.rel.freq.) and standard deviation (St.dev.rel.freq.) of the relative frequency of the type I error for MAD(C1) and MAD(C1C2), plotted against the number of data in the set (ranges 0 to 1000 and 0 to 10000).]
Critical values are taken from the χ² tables for the level of significance α = 0.05, or for some other level of significance. This test is very sensitive to deviations from Benford’s Law, and it is sensitive to increases in the sample size N.
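A minimal Python sketch of the first-digit χ² test follows. The function name and the example counts are hypothetical; the critical value 15.507 is the tabulated upper 5% point of the χ² distribution with 8 degrees of freedom.

```python
import math

CHI2_CRIT_DF8 = 15.507     # upper 5% point of chi-square with 8 degrees of freedom

def chi2_first_digit(counts):
    """counts[d-1] = number of observations whose first digit is d (d = 1..9)."""
    n = sum(counts)
    stat = 0.0
    for d, observed in enumerate(counts, start=1):
        expected = n * math.log10(1 + 1 / d)
        stat += (observed - expected) ** 2 / expected
    return stat

benford_like = [301, 176, 125, 97, 79, 67, 58, 51, 46]        # N = 1000
uniform_like = [112, 111, 111, 111, 111, 111, 111, 111, 111]  # N = 1000

s_benford = chi2_first_digit(benford_like)
s_uniform = chi2_first_digit(uniform_like)
s_uniform_x10 = chi2_first_digit([10 * c for c in uniform_like])

print(f"Benford-like counts: {s_benford:.3f}  reject: {s_benford > CHI2_CRIT_DF8}")
print(f"Uniform counts:      {s_uniform:.3f}  reject: {s_uniform > CHI2_CRIT_DF8}")
print(f"Same proportions, N x 10: {s_uniform_x10:.3f}")
```

With fixed proportions the statistic grows linearly with N, which is the sensitivity to sample size noted above: for very large samples even small deviations lead to rejection.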
In the same way as for the MAD tests, we tested the relative frequency of the type I error for the χ² test, for the first digit and for the first two digits. Critical values are taken from the χ² tables for the level of significance α = 0.05.
The results are shown in the following figures:
[Figures 8 and 9: average (Av.rel.freq.) and standard deviation (St.dev.rel.freq.) of the relative frequency of the type I error for the χ² test of the first digit (C1) and of the first two digits (C1C2), plotted against the number of data in the set (ranges 0 to 1000 and 0 to 10000).]
Figure 10. Probability distribution of the first, the second and the first two digits: Gross domestic product, current prices.
Figures 8 and 9 show that the relative frequency of the type I error is acceptable over the whole range of observed data-string lengths.
We carry out a practical implementation of Benford’s Law by testing it on data from the International Monetary Fund World Economic Outlook Database. We use two groups of data: the first comprises the Gross domestic product and the second the Current account balances of 184 countries in the period from 1980 to 2016. The first group contains 6,266 usable data and the second 6,243. We use program code that we created in MATLAB. The results are shown in Table 2, together with graphs of the probabilities of the first, the second and the first two digits for both groups of data.
The tests show that, for the Gross domestic product data, the MAD and χ² tests for the first, the second and the first two digits do not indicate any irregularity in the data.
For the Current account balance data, the MAD and χ² tests for the first digit do not indicate irregularities and the MAD test for the second digit shows acceptable differences, while the χ² test for the second digit and the MAD and χ² tests for the first two digits indicate, with near certainty, that the data do not comply with Benford’s Law. This result may be a consequence of the way in which Current account balances are calculated, since each balance is the difference of two groups of data that both follow Benford’s Law. It could also be a consequence of irregular Current account balances reported by some countries.
Table 2. Benford’s MAD and χ2 test for the first, second and the first
two digits
Figure 11. Probability distribution of the first, the second and the first two digits: Current account balance.
The data must describe a similar phenomenon, i.e., they must have the same nature or come from the same set of sources (financial transactions, the results of various measurements of length, volume, etc.).
The data should have no built-in minimum or maximum values.
The data must arise incidentally rather than be generated by a pattern; pattern-generated data include serial numbers, phone numbers, personal identification numbers, social security numbers, tax numbers, car registrations and account numbers.
The data should comprise more small than large numbers, with the average value greater than the median (positive asymmetry); the higher the ratio of the mean to the median, the more suitable the data are for this analysis.
The data should be reported in the same units of measurement.
The data should span at least two orders of magnitude.
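Two of the conditions above, positive asymmetry and the span of at least two orders of magnitude, can be screened automatically. The following is a rough sketch only; the function name `benford_suitability` and the decision rule are illustrative assumptions, and the remaining conditions still require judgement about the source of the data.

```python
import math
import statistics

def benford_suitability(data):
    """Rough screening of two of the listed conditions; returns simple diagnostics."""
    positive = [abs(x) for x in data if x != 0]
    mean = statistics.fmean(positive)
    median = statistics.median(positive)
    orders = math.log10(max(positive) / min(positive))
    return {
        "mean_over_median": mean / median,   # > 1 indicates positive asymmetry
        "orders_of_magnitude": orders,       # the list asks for at least two
        "looks_suitable": mean > median and orders >= 2,
    }

growth = [1.5 ** n for n in range(40)]   # illustrative data spanning many magnitudes
narrow = list(range(100, 200))           # symmetric data within one order of magnitude
print(benford_suitability(growth))
print(benford_suitability(narrow))
```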
Examples of events that generate data that satisfy Benford’s Law are: the prices of securities on the stock exchange, financial transactions, bank cards, some processes in telecommunication and computer systems, processes that describe recurrent sequences (Fibonacci sequence, fractals), the natural demographic growth of plant and animal populations, and many others. The frequencies of categorical data can also be subject to Benford’s Law analysis.
Some examples of the usage of this law are:
> 106 prime numbers). Newton iterative procedure generates data that
are subject to this law, and so on.
In the area of economy [21-31] we can find some well-known examples of the application of Benford’s Law. Mark J. Nigrini analyzed tax returns, which marked the beginning of the use of this law for the detection of fraud. The basic assumption is that a frequency of significant digits that does not follow Benford’s Law suggests a possible irregularity in the transactions. This method was quickly adopted by some supervisory authorities and recognized as a valid audit procedure, so there are now standard software packages that use Benford’s Law. Fraud detection is the most widely used area of application of this law, in which it plays a highly regarded role. The method was soon applied to the detection of credit card fraud and other forms of electronic transaction fraud. Detection of fraud is not the only application of this law. Creative accounting causes many material misstatements in financial reports, and that is why Benford’s Law is needed. It is also applied in the analysis of structural deficiencies in macroeconomic data, the analysis of investment programs, accounting reports, traffic reports, forensic accounting, etc.
In informatics and computing science it has been shown, on the basis of Benford’s Law, that the computer design that minimizes storage space uses the base B = 8. The law is also used for the analysis of file sizes in folders and of the duration of various processes in multi-user environments, as well as in cyber security, neural networks, digital forensics [32-42], etc.
In cryptology Benford’s Law is used in steganography, stylometry (the analysis of the linguistic styles and writing habits of individuals) and image forensics [43-53].
CONCLUSION
Benford’s Law determines the probability of occurrence of significant digits in the realizations of random variables and in numerical data across a very wide range of events. Under certain conditions this law has a universal character. It is valid across various systems, and its application is found in almost all natural and social phenomena and in the analysis of events that have some system of measurement.
So far it has been used most in forensics, in the analysis of intentional fraud, especially in various financial statements, as well as in the analysis of interpolation processes (Newton’s iterative process), in the optimization of computer systems, in accelerating algorithms, in deciphering hidden messages, and in many other analyses. It has not yet been rigorously proven why numerical data under certain conditions follow the Benford’s Law distribution. For some specific events such a proof exists, but not in general. In a modeled problem of the existence of the Benford’s Law distribution, it is shown that it is a consequence of the universal tendency of a closed system to move to a state of maximum entropy, and it is present in all natural and social phenomena that in a given analysis can be considered closed. The mathematical implications of this law are defined on a specific sub-σ-field of the Borel σ-field, which is called the mantissa algebra.
Our Monte Carlo simulations of the relative frequency of the type I error point to the inapplicability of the Mean Absolute Deviation test of the first digit for data strings of no more than 1,000 data, as well as of the Mean Absolute Deviation test of the first two digits for data strings of no more than 3,000 data. They also confirm that the Pearson χ² test of the first digit and of the first two digits is applicable for all lengths of data sets. Applicable means that the type I error is acceptable.
We carried out a practical implementation of Benford’s Law by testing it on two groups of data. The first group comprises the Gross Domestic Product and the second the Current Account Balances of 184 countries in the period from 1980 to 2016. The tests show that, for the Gross Domestic Product data, the MAD and χ² tests for the first, the second and the first two digits do not indicate any irregularity in the data. For the Current Account Balances data, the MAD and χ² tests for the first digit do not indicate irregularities and the MAD test for the second digit shows acceptable differences, while the χ² test for the second digit and the MAD and χ² tests for the first two digits indicate, with near certainty, that the data do not comply with Benford’s Law. This result may be a consequence of the way in which Current Account Balances are calculated, since each balance is the difference of two groups of data that both follow Benford’s Law. It could also be a consequence of irregular Current Account Balances reported by some countries.
We believe that the application of this law will become more prominent, both in practice and in the scientific analysis of a multitude of phenomena in many different areas.
REFERENCES
[1] Newcomb, S. 1881. “Note on the frequency of use of the different digits
in natural numbers.” American Journal of Mathematics 4 (1): 39-40.
[2] Benford, F. 1938. “The Law of Anomalous Numbers.” Proceedings of the American Philosophical Society 78 (4): 551-572.
[3] Berger, A., and T. P. Hill. 2011. “A Basic Theory of Benford’s Law.” Probability Surveys 8: 1-126.
[4] Jamain, A. 2011. “Benford’s Law.” Dissertation Report, Department of Mathematics, Imperial College, London.
[5] Hill, T. P. 1995. “A Statistical Derivation of the Significant Digit Law.” Statistical Science 10 (4): 354-363.
[6] Ciofalo, M. 2009. “Entropy, Benford’s First Digit Law and the Distribution of Everything.” Palermo, Italy: Dipartimento di Ingegneria Nucleare, Università degli Studi di Palermo.
Chapter 5
ABSTRACT
Online Social Networks (OSNs) are widely used for commercial or personal purposes, including entertainment. The main value of OSNs is in their ability to facilitate and ease communication between end users. They also enable users who are physically remote to maintain relationships and stay up to date with current events.
Therefore, the popularity and market potential of different OSNs have been growing over time along with the growth of user engagement, and they continue to increase. The number of network users worldwide reached 2.2 billion as of 2016. The leading social network is Facebook, with more than 1.5 billion monthly active users worldwide as of 2016. According to Internet World Stat, 73.5% of European Union citizens use the Internet on a regular basis, while 51.24% of them use Facebook. In Serbia, 66.20% of the population uses the Internet, and 72.51% of them also use the most popular OSN, Facebook. These data motivated us to conduct a survey to determine whether OSN users in Serbia have concerns about their privacy.
We intend to investigate the relationship between the concerns of OSN users and their behaviour and attitudes towards privacy. For that purpose,
* Corresponding author: G. Savic, Email: gordana.savic@fon.bg.ac.rs.
122 Gordana Savić and Marija Kuzmanović
INTRODUCTION
Online Social Networks (OSNs) are widely used for commercial or personal purposes, including entertainment. The main value of OSNs is in their ability to facilitate and ease communication between end users. They also enable users who are physically remote to maintain relationships and stay up to date with current events. This helps to create social capital [33].
Therefore, the popularity and market potential of different OSNs have been growing over time along with the growth of user engagement, and they continue to increase. As of the 4th quarter of 2014, the average time per day spent on social networks by global Internet users was 101.4 minutes, while the number of network users worldwide reached 2.2 billion as of 2016. The leading social network is Facebook, with almost 1.5 billion monthly active users worldwide as of 2015, followed by communication networks such as WhatsApp and Facebook Messenger and the photo-sharing social network Instagram. Recently, social networking has demonstrated a clear shift towards mobile platforms. As of the first quarter of 2015, some 580 million Facebook users accessed the social network exclusively via a mobile device, up from 341 million users in the
Behaviour and Attitudes vs. Privacy Concerns … 123
RELATED WORKS
Westin studied how PSI has changed over time. He showed that during
1994-2003 the percentage of Fundamentalists in the public remained almost
the same – around 25%, but the number of Unconcerned decreased from 42%
in 1993 to 12% in 2000 and reduced further to 10% in 2003. However, the
Pragmatist group varied between 30% and 64% in 2003 [20]. Westin mentions
that a steady decrease of unconcerned consumers might be due to the fact that
people learned more about technology and also became aware of the various
means of protecting their privacy.
In their study, Krasnova, Hildebrand, and Guenther [19] show that 17.3%, 72.6% and 10.1% of respondents belonged to the privacy fundamentalist, pragmatist and unconcerned groups, respectively. Recent research suggests that approximately 49% of individuals are Fundamentalists, 40% are Pragmatists, and 10% are Unconcerned [42].
Wang and Petrison [39] demonstrated that certain consumers (particularly
in the older age groups) are more negative about potential threats to their
privacy than others. Sheehan and Hoy [30] showed that consumers who
believe they do not have control over their personal information are more
concerned about privacy. Malhotra et al. [24] have identified consumer
concern factors related to privacy practices and developed survey instruments
to measure user information privacy concerns such as data collection
procedures, control, and awareness of privacy practices.
Numerous other studies have analyzed privacy concern, and applied
diverse instruments for measuring it [29]. Recent studies have focused on
privacy in an online environment.
The findings indicate that consumers are willing to provide both online
and offline companies with basic information, but are more protective of
personal information and are less comfortable sharing more sensitive
information [15, 9]. Graeff and Harmon [11] pointed out that a vast majority
of consumers believed that the Internet has made it easier for someone to
obtain personal information about them. In his study, Ha [13] showed that
online users want highly visible privacy policies telling them precisely how a
company will use their personal information. Chen and Rea [6] reported more
specific protective behaviour suggesting that users will cease web site access if
too much personal information is requested when registering on the site.
Berendt et al. [3] argue that while many users have strong opinions on
privacy and state privacy preferences, they are unable to act accordingly. Once
they are in an online interaction, they often do not monitor and control their
actions sufficiently. They also state that online privacy statements seem to
have no impact on behaviour [9].
Boyd and Ellison [4] define OSNs as web-based services that allow
individuals to create a profile and connect to friends within a bounded system
[19]. Debatin [7] emphasizes that the main purpose of participating in social
networks is the exchange of information, most of it highly personal, and the
maintenance and expansion of one’s social relationships. Thus, privacy
protection in online social networks seems to be an oxymoron.
Hugl [17] highlights the necessity to focus on multidimensional and
multidisciplinary frameworks of privacy, considering a so-called “privacy
calculus paradigm” and rethinking “fair information practices” from an
increasingly ubiquitous environment of OSNs.
Barnes [2] refers to public versus private boundaries and a so-called
“paradoxical world of privacy”: while adults are concerned about potential
privacy threats, teenagers make personal data public.
Gross and Acquisti [12] analyzed the online behaviour of students at Carnegie Mellon University who used a popular OSN and highlighted diverse potential attacks on various facets of privacy. The authors stated that the informal character of online social networking and the possibility of communicating casually enable users to manage a large number of contacts with relatively little effort [7]. In addition, OSNs enable users to control the impression they make on others by allowing them to decide how much they are willing to self-disclose, as well as by offering privacy settings to strategically manage access to personal information. This is additional motivation for users to post frequently and to voluntarily share large amounts of personal information. Gross and Acquisti [12] conclude that only a minimal percentage of users change the highly permissive default privacy preferences, and personal data are therefore generously shared. On the other hand, OSNs pose many privacy risks for their users, ranging from unauthorized use of their information to harmful activities by other users, such as cyber-stalking, harassment, and reputation damage [6, 16, 26].
Young and Quan-Haase [43] also draw attention to Facebook use by
undergraduate students. They found that 99.35% of the respondents use their
actual first and last name in their profile; nearly two-thirds present their sexual
orientation and interests; 83.1% provide their e-mail address; 92.2% their date
of birth, 80.5% their current town of residence, 97.7% present an image of
themselves, and 96.1% photos of friends [17]. Similarly, Tufekci [37] found
that 94.9% of Facebook users reported using their actual names, while 75.6%
disclose their relationship status.
Staddon et al. [31] consider both overall social network privacy concern
and aspects of concern related to transparency and control, with special
attention on comprehension of information sharing in the network, control
over information sharing in the network, and sharing practices of users in
relation to their friends in the network. They found that each aspect of privacy
concern is strongly associated with self-reported engagement across several
measures; users who report higher concern are less engaged while those who
report more control and comprehension over sharing of their information in
the network are more engaged.
Pew Internet Report studies demonstrate that 58% of OSN users have
restricted access to their entire OSN profile or to parts of their profiles [21,
22].
Studies on online privacy behaviour have shown that OSN users tend to be
rather careless with their personal data [7]. Although most users have a general
awareness of possible privacy risks, they do not always act accordingly. For
instance, most Facebook users have hundreds of friends, and statistically,
about one-third of users will accept complete strangers as friends [8].
A broad range of privacy paradox (attitude-behaviour dichotomy) research finds OSN users’ actual behaviour during privacy transactions to be in contradiction with their concerns about privacy risks when sharing personal information [28]. Namely, OSN users showing high levels of general privacy or information sharing concerns are still willing to share higher amounts of personal information [27].
Acquisti and Gross [1] demonstrated a gap between the information
participants said they cared about protecting online, and what they were
showing publicly on Facebook. Madejski, Johnson, and Bellovin [23]
measured privacy attitudes and intentions and compared these against the
privacy settings on Facebook. They also found that there are inconsistencies
between users’ sharing intentions and their privacy settings.
Haddidi and Hui [14] compared individuals’ behaviour with regard to
friendship requests by using 40 fake identities of well-known film stars and
ordinary people on Facebook. They show that usually users do not accept
random friendship requests.
Keith et al. [18] compared OSN users who intend to share actual information with those who do not, and found no support for a relationship between privacy concerns and actual information sharing, but did find a weak relationship between sharing intentions and actual behaviour, suggesting disclosure behaviours are better predictors of actual behaviour [27].
Sutanto et al. [34] found that users with privacy concerns were more than
willing to share their information for personalization benefits on a
privacy-safe mobile platform. Taddicken [35] finds that social situations of
‘quid pro quo’ have a much higher impact than privacy concerns on willingness
to share personal information in social networks.
EMPIRICAL STUDY
Study Goals
Measurement Instrument
Q1: OSN users have lost all control over how personal information is
collected and used by companies (providers).
Q2: Most companies (providers) handle the personal information they
collect about OSN users in a proper and confidential way.
Q3: Existing laws and organizational practices provide a reasonable level
of protection for OSN user privacy today.
RESULTS
Invitations to participate in the study were distributed via OSNs and the
responses were collected from April to May 2016. The overall sample
consists of OSN users of all ages, genders and professional statuses. The final
sample comprises 641 respondents who fully completed the survey. Analyses
of their demographic structure and of their responses regarding privacy
behaviour and concerns are given in this section.
Demography
employment status, the employed respondents use OSNs for making new
relationships in the lowest percent (31.3%), while students are leaders in this
segment with as much as 57.9% of the whole segment.
When it comes to chatting, there is a significant difference between those
who are married, in a relationship and single, as well as across employment
statuses (p<0.01). Only 2.5% of singles report that they do not chat, while the
percentage of married respondents who do not chat is 15.5%. All high
school students (100%) and nearly all university students (97.1%) chat, while
the employed chat the least (12.5% never chat) in comparison to
the other groups such as the unemployed, students, etc.
and quizzes reveals that only 17.8% of users in the whole sample use them,
high school students being the most frequent users (24.7%, p<0.01).
Furthermore, the majority of respondents use Facebook as their primary
choice (69.6%). This result is expected since it is in line with Internet World
Stats findings [38]. Details on the use of other OSNs are given in Figure 2.
Facebook is the most popular OSN among users aged 35-44 (80.7%). In
the youngest group of respondents (16-24), in addition to Facebook (63.7%),
the top choices are Instagram (14.8%) and WhatsApp (14.6%). Viber is the
most frequently named (about 11%) among the other specified OSNs, with
significance level p<0.01. In terms of the frequency of use of particular
networks, three OSNs (Facebook, Instagram and YouTube) stand out
because a large number of respondents use them
every day (Figure 3). Only 1.6% of respondents have never used Facebook,
while it is used on a daily basis by 76.4% of all respondents. YouTube has
never been used by only 1.1%, and 61% of the respondents use it on a daily
basis. Instagram is used on a daily basis by 30.9% of the respondents (mostly
younger respondents aged up to 34), but the share of those who never use it is
quite large (41.3%). LinkedIn is mostly used by the employed aged 35-44, but
a large proportion of respondents (64.3%) have never used it. Finally, Pinterest
has never been used by 76.4% of the respondents and only 1.9% of them use it
on a daily basis. The average time per day spent on all social networks is 3.98
hours, with a standard deviation of 2.985 hours. No statistically significant
correlation between the time spent on OSNs and age category was found.
The first set of questions regarding privacy is related to personal data and
their public placement and online availability. For the majority of items,
respondents answered that it depended on the OSN. But there were those who
never reveal their actual personal information (such as name, birthday, etc.)
regardless of the OSN. However, there are those who always leave their actual
personal information. The detailed results are shown in Figure 4.
As for the sharing intentions, the majority of the respondents share
personal data, photos, videos and posts with all friends (Figure 5).
In the next set of questions, tagging behaviour is the most interesting.
Those who are married tag friends without asking considerably less
often than others (6%), while 18.75% of those who are in a relationship and
20.8% of those who are single tag friends without permission.
Data privacy and safety concerns are expressed through the Westin Privacy
Segmentation Index [41].
Figure 6 shows the distribution of responses to the three PSI questions. In
Q1, agreement means privacy-concerned, while in Q2 and Q3, disagreement
means privacy-concerned. Therefore, the majority of respondents think that
OSN users may lose control over private information distribution (50.39%
strongly agree and 33.95% somewhat agree). Only 2.03% of the respondents
disagree with this. Such responses also indicate privacy concern.
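The scoring direction described above can be turned into a segmentation rule. The following is a minimal sketch, assuming a four-point answer scale and the commonly cited Westin rule (fundamentalists give the privacy-concerned answer to all three questions, the unconcerned to none, and everyone else is a pragmatist). The numeric coding, the function names and the sample responses are illustrative assumptions, not the survey's actual coding or data.

```python
from collections import Counter

# Assumed coding of the answers (hypothetical, not taken from the study):
# 1 = strongly disagree, 2 = somewhat disagree,
# 3 = somewhat agree,    4 = strongly agree


def is_agree(answer: int) -> bool:
    """Return True for 'somewhat agree' or 'strongly agree'."""
    return answer >= 3


def westin_category(q1: int, q2: int, q3: int) -> str:
    """Classify one respondent from the answers to Q1, Q2 and Q3.

    Agreement with Q1 and disagreement with Q2 and Q3 each count as a
    privacy-concerned answer.
    """
    concerned = [is_agree(q1), not is_agree(q2), not is_agree(q3)]
    if all(concerned):
        return "fundamentalist"
    if not any(concerned):
        return "unconcerned"
    return "pragmatist"


# Hypothetical responses, for illustration only.
responses = [(4, 1, 2), (3, 3, 2), (1, 4, 4), (4, 2, 1)]
segments = Counter(westin_category(*r) for r in responses)
print(segments)
```

Counting the categories over a whole sample in this way gives the shares of fundamentalists, pragmatists and unconcerned users that the chapter compares across countries.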
The distribution of the responses to questions about concerns for friends’
data privacy and the concern that parents (superiors or colleagues) could have
access to their private information (Q2 and Q3) are very similar. The majority
of the respondents (around 45%) somewhat disagree with the statement that
providers behave properly (Q2) and that laws protect users (Q3). On the other
hand, only 6.24% agree with statement Q2 and 4.84% agree with statement
Q3. These statistics also indicate that a high level of privacy concern exists
among OSN users in Serbia. The obtained results are in line with the concerns
level in other countries, e.g., 92% of Americans and Brits are concerned to
some extent for their privacy online [36] which is 42% more than in the
previous year. Interestingly, the average share (around 8%) of the unconcerned
online users in those countries is still slightly higher than in Serbia.
With the third question from this group (“I’m worried that parents
(superiors or colleagues) have access to my information”), the situation is
similar. The most concerned are the youngest respondents, or one could say
that they are not worried because the average score of responses is only 2.11,
which indicates that most of them replied that they are slightly concerned.
A statistically significant correlation (p<0.01) is observed between PSI
and the level of concerns regarding personal information privacy (Figure 12).
None of the unconcerned assigned score 5 (extremely concerned), while a very
small number of them are very concerned (8.7%). The average score for this
CONCLUSION
The study focuses on understanding the relationship between the attitudes
of OSN users, their actual behaviour and their concerns regarding privacy. The
results reveal that the general behaviour of OSN users in Serbia mainly depends
on marital and employment status, i.e., on general occupation and
interests. Namely, singles use OSNs for establishing new relationships and
chatting, while university students most often use OSNs for sharing
information and photos, keeping informed about social events, playing games
and solving quizzes. Furthermore, the majority of the respondents (72.51%)
use Facebook, and for 69.6% it is their primary choice. The
results are in line with Internet World Stats findings [38], but the percentage is
considerably higher. According to them, 46.4% of Internet users
worldwide prefer and use Facebook. This means that the share of OSN
users in Serbia is above the world average, especially regarding Facebook. Having
that in mind, the main issue should be their online privacy protection and
concerns.
Meanwhile, as with other research in the literature [28, 27], our survey
discloses a privacy paradox (concerns-behaviour dichotomy). According to the
Behaviour and Attitudes vs. Privacy Concerns … 145
Privacy Segmentation Index [20, 42], Serbian OSN users are almost evenly
distributed into the groups of pragmatists and fundamentalists, and only about 4%
are unconcerned. The results from Serbia are in line with recent results
suggesting that approximately 49% of individuals are fundamentalists,
40% are pragmatists and around 10% are unconcerned [42].
There are more fundamentalists among older users (above 35 years of age) and
married people than in the other groups, as expected. A lower
percentage of Serbian OSN users are unconcerned. Therefore, we expected that
users in Serbia would be more cautious regarding data privacy and behaviour
than users in other parts of the world. However, their actual online behaviour
contradicts their stated concerns about sharing personal information. There is
no significant correlation between PSI categories and revealing the actual
name, phone number, e-mail, date of birth, residence or employment status.
Namely, the actual behaviour of OSN users revealed that most respondents share
actual personal data, photos, videos and posts, mainly with all friends.
The results of our study indicate that most users in Serbia either always share
actual personal data or vary their sharing behaviour depending on the OSN in use.
More precisely, 98% of the respondents use their actual first and last name and
98% post an actual photo in their profiles. These results are in line with the
findings of Young and Quan-Haase [43] and Tufekci [37]. Unlike Tufekci [37], who
demonstrates that as many as 75.6% of the respondents reveal their
relationship status, our survey showed that only 20% of the respondents always did
so, and as many as 40% of OSN users have never revealed their relationship
status.
However, the survey revealed that respondents were moderately
concerned about their own data privacy (mainly the youngest), slightly less concerned
about their friends’ data privacy and the least concerned that parents or a superior
could gain insight into their activities and posts on an OSN. Their
experience with the misuse of data was in line with the Westin category to
which they belong. The majority of those who have had such experiences were
among the fundamentalists, while the majority of those who have never had such
experiences fall into the group of pragmatists.
Limitations of our study are reflected in the predictive power of Westin’s
categories and the assumptions underlying his Privacy Segmentation Index.
Namely, our findings have failed to establish a significant correlation between
the Westin categories and actual privacy-related behaviours on OSNs. This is
because the Westin index captures broad, generic privacy attitudes. Moreover, the
instrument was created in 1995 for the American market and has not been
REFERENCES
[1] Acquisti, A., and R. Gross. “Imagined communities: Awareness,
information sharing, and privacy on the Facebook.” In Privacy
Enhancing Technologies. Springer, 2006.
[2] Barnes, S.B. “A privacy paradox: Social networking in the United
States.” First Monday 11, no. 9 (2006).
[3] Berendt, B., O. Gunther, and S. Spiekermann. “Privacy in e-commerce:
stated preferences vs. actual behavior.” Communications of the ACM 48,
no. 4 (2005): 101-106.
[4] Boyd, D., and N. Ellison. “Social Network Sites: Definition, History,
and Scholarship.” Journal of Computer-Mediated Communication 13,
no. 1 (2007).
[5] Chen, K., and A.I. Rea Jr. “Protecting personal information online: A
survey of user privacy concerns and control techniques.” The Journal of
Computer Information Systems 44, no. 4 (2004): 85.
[6] Clark, L.A., and S.J. Roberts. “Employer’s use of social networking
sites: a socially irresponsible practice.” Journal of Business Ethics 95
(2010): 507-525.
[7] Debatin, B. “Ethics, privacy, and self-restraint in social networking.” In
Privacy online, 47-60. Springer Berlin Heidelberg, 2011.
[8] Debatin, B., J.P. Lovejoy, A.K. Horn, and B.N. Hughes. “Facebook and
online privacy: Attitudes, behaviors, and unintended consequences.”
Journal of Computer-Mediated Communication 15, no. 1 (2009): 83-108.
[9] Dolnicar, S., and Y. Jordaan. “Protecting consumer privacy in the
company’s best interest.” Australasian Marketing Journal (AMJ) 14, no.
1 (2006): 39-61.
[10] Fire, M., R. Goldschmidt, and Y. Elovici. “Online Social Networks:
Threats and Solutions.” IEEE Communications Surveys and
Tutorials 16, no. 4 (2014): 2019-2036.
[11] Graeff, T.R., and S. Harmon. “Collecting and using personal data:
consumers’ awareness and concerns.” Journal of Consumer Behaviour
19, no. 4 (2002): 302-318.
[12] Gross, R., and A. Acquisti. “Information revelation and privacy in
online social networks.” Proceedings of the 2005 ACM workshop on
Privacy in the electronic society. 2005. 71-80.
[13] Ha, H. “Factors influencing consumer perceptions of brand trust
online.” Journal of Product and Brand Management 13, no. 5 (2004):
329-342.
[14] Haddidi, H., and P. Hui. “To add or not to add: privacy and social
honeypots.” Proceedings of the ICC 2010: IEEE International
Conference on Communications. Capetown, South Africa: IEEE, 2010.
[15] Harris Interactive. A survey of Consumer privacy attitudes and
behaviours. PLI/Harris, 2001.
[16] Hoy, M.G., and G. Milne. “Gender differences in privacy-related
measures for young adult Facebook users.” Journal of Interactive
Advertising 10, no. 2 (2010): 28-45.
[17] Hugl, U. “Reviewing person’s value of privacy of online social
networking.” Internet Research 21, no. 4 (2011): 384-407.
[18] Keith, M.J., S.C. Thompson, J. Hale, P.B. Lowry, and C. Greer.
“Information disclosure on mobile devices: Re-examining privacy
calculus with actual user behavior.” International Journal of
Human-Computer Studies 71, no. 12 (2013): 1163-1173.
Chapter 6
ABSTRACT
In the process of developing e-Government, the Serbian government has
implemented many e-Government services, which produce a large
amount of data and text documents and which citizens use more and
more in their everyday lives. Text documents are in the
Serbian language and commonly in HTML, PDF and Microsoft Word
format. Given the growing number of text documents, Serbian
e-Government has indicated the need to extract certain data and information
from the variety of existing text documents, which are usually
in a format prepared for print.
In order to offer a technical solution for this case, the authors have
developed a dedicated application that includes the Lucene library. Lucene is
a specialized library for implementing indexing and searching
over a large amount of data. The procedure of quick search within
*
Corresponding author: V. Nikolic, Email: vojkan.nikolic@mup.gov.rs.
152 Vojkan Nikolić, Predrag Đikanović and Slobodan Nedeljković
INTRODUCTION
The rapid expansion of the Internet as the main medium for sharing information,
together with its growing availability, has encouraged more and more people to
create and share data, information and knowledge. The
Government of the Republic of Serbia (RS) [1] has implemented [2, 3] a large
number of e-Government services during the process of e-Government
development, and the daily use of these services produces a large
amount of data and documents, mostly in the form of text. These
services in turn require data and information that
must be extracted from a variety of existing text documents, which are
usually in a format prepared for print [4, 5]. Given the number of
documents, no one has enough time to read them all and
“extract” the important information they contain. There is an obvious
need to select and separate the documents.
One approach to this problem is so-called text mining [6]. The aim of
in-depth text analysis is to find interesting and nontrivial information and
knowledge in unstructured text documents, and then to cluster and classify
them. The natural language common in such documents yields unstructured text
that is not suitable for analytical processing. Before they can be processed
on a computer, the documents must be adapted and prepared. This process
involves a series of activities and procedures.
The concepts and applications of Natural Language Processing (NLP)
represent a set of techniques and methods for automatic text generation in a
natural language. These techniques are applicable to, and support, many languages.
Information Retrieval and Development of Conceptual Schemas … 153
This group of crimes includes criminal offenses in which someone has been
killed, has suffered bodily injury, has had their life endangered, or has had
their health seriously impaired. Violence, abuse and
neglect are categorized by level in the Procedures for Handling Cases of
Violence, Abuse and Neglect as follows:
but serious bodily injury is the one that endangers someone's life or damages
someone’s health.
Forensic medicine classifies bodily injury in two levels:
EuroVoc Thesaurus
There are four types of semantic indexing resources (also called controlled
indexing languages): Controlled Vocabulary, Taxonomy, Thesaurus and
Ontology [12]. The thesaurus, as one kind of semantic resource [13], is a
network of controlled vocabularies. It sits at a higher level than
taxonomies: it is a data representation that includes associative relationships in
addition to hierarchical relationships. The structure of EuroVoc depends on
semantic relationships (at the specific level of descriptors and
non-descriptors): scope note, micro-thesaurus relationship, equivalence
relationship, hierarchical relationship and associative relationship [14]. The
thesaurus has equivalence (USE/UF), broader term (BT), narrower term (NT)
and related term (RT) relationships. These relationships define the
structure and scope of the thesaurus.
For instance, the broader term for “physical assault” is
“criminal law”, and the narrower terms are “criminal offense against a person”
and “criminal offense”; together they determine the scope of the set of data
relating to these terms. To achieve more granular and consistent indexing,
an expanded set of links based on semantic relationships can be used. This
makes searching very efficient from the user’s perspective.
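The way thesaurus relationships widen or narrow a search can be sketched in a few lines. The example below is a minimal illustration, not the EuroVoc data model. The `Term` class, the `expand_query` helper and the in-memory dictionary are hypothetical names introduced here; only the broader and narrower terms for “physical assault” come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One thesaurus entry with its relationships (hypothetical structure)."""
    label: str
    broader: list = field(default_factory=list)    # BT
    narrower: list = field(default_factory=list)   # NT
    related: list = field(default_factory=list)    # RT
    used_for: list = field(default_factory=list)   # UF (equivalence)


# A single entry built from the example in the text.
thesaurus = {
    "physical assault": Term(
        label="physical assault",
        broader=["criminal law"],
        narrower=["criminal offense against a person", "criminal offense"],
    ),
}


def expand_query(term, thesaurus, include_broader=False):
    """Return the term plus its equivalent, narrower and related terms.

    Broader terms are added only on request, because they widen the scope.
    """
    entry = thesaurus.get(term)
    if entry is None:
        return [term]
    expanded = [term] + entry.used_for + entry.narrower + entry.related
    if include_broader:
        expanded += entry.broader
    # Preserve order while dropping duplicates.
    return list(dict.fromkeys(expanded))


print(expand_query("physical assault", thesaurus))
```

Keeping the broader term optional reflects a design choice: narrower and equivalent terms keep the search within the original scope, while a broader term trades precision for recall.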
The approach to semantic annotation of texts developed in [15] moves
beyond the bag-of-words representation, using atoms of lexical knowledge to
represent the elementary word meanings (senses) and converting the text into
a graph linking senses rather than words. WordNet synsets are well suited for
that purpose: they group words into sets of synonyms related to word
definitions, provide sense identifiers and record semantic relations
between synsets. The sense clustering methods are referenced in WordNet
and EuroWordNet and ensure relations between the sets of synonyms
(synsets). Synsets represent the senses of the words, which are grouped into
clusters. Text annotated at a higher abstraction level can be clustered
better because similarities between texts become clearer.
Figure 1. EuroVoc: the term for a criminal offense in the dictionary of criminal law.
Translating the query and document from raw strings into something we
can compute with is the first hurdle in computing a similarity score. To
do so, we use “query models” and “document models.” “Models” here is
just a fancy way of saying that the query and document are represented in some
other way that makes computation possible.
The similarity between two documents is a function of the angle between
their vectors in the term vector space. The similarity between a query (q) and
a document (d) is expressed by the cosine of the angle between the two vectors (q
and d), as in the following formula:
$$\cos\theta = \frac{q \cdot d}{\lVert q \rVert \, \lVert d \rVert} \tag{1}$$
Figure 5 illustrates this process for the query “fizički napad” (en.
“physical assault”) and the document Crime court Republic of Serbia,
in accordance with the query and document models in [23].
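The cosine measure described above can be illustrated with a short sketch. It assumes the simplest possible query and document models, raw term counts over whitespace-separated tokens, which is a simplification and not the weighting used by Lucene. The function names and the sample document text are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a sparse vector of raw term counts."""
    return Counter(text.lower().split())


def cosine_similarity(q, d):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(q[t] * d[t] for t in q.keys() & d.keys())
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # an empty vector has no direction
    return dot / (norm_q * norm_d)


query = term_vector("fizički napad")
# Hypothetical document text, for illustration only.
document = term_vector("krivično delo fizički napad na lice")
score = cosine_similarity(query, document)
print(round(score, 3))
```

A score of 1 means the two vectors point in the same direction, and a score of 0 means the query and the document share no terms.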
The final step in computing the similarity score runs the query and
document representations through a scoring function.
(2)
Figure 5. The process for the query “fizički napad” and the document Crime court
Republic of Serbia.
(3)
(4)
Terms extraction.
Concepts extraction.
Concepts weighting.
Essentially, each of these groups is a quite different and relatively complex
operation. The first step in indexing is to extract the text from the original
document content. The extracted text is then used to create the document. The
resulting document is made up of fields. These text fields are
analyzed to form a set of tokens. The last step in the indexing of text
documents is to combine the tokens with the corresponding indices.
In order to index a document using Lucene, we have to convert it to
plain text format for Lucene processing and then create a Lucene document.
To create an index for a document in PDF format, it is first necessary to
extract the information from the PDF in the form of text; the extracted text
is then used to create documents. A similar approach is used
for indexing Word documents, or any other document that is not in plain text
format. Even XML or HTML documents, which use plain text characters,
need to be properly prepared for indexing. Once you have the text that
you want to index and have created a document with fields, the text needs to
undergo a process of analysis.
Analysis is the conversion of text data into basic units called tokens, that
is, the process of converting raw text into tokens. In Lucene, this is achieved by
using the Analyzer, Tokenizer and TokenFilter classes. The Tokenizer is responsible
for breaking the input into its component pieces, the tokens. Token filters can
further modify the tokens produced by the Tokenizer.
Once you have created the Lucene document fields, you can invoke the
IndexWriter. Lucene then first analyzes the text, divides the text data
into tokens, and performs a number of further operations on them. Using a Lucene
filter, you can search for a specific word or set of words regardless of
whether it is written in lowercase or capital letters.
During the analysis, text data passes through several operations: the
removal of common words, ignoring punctuation, stemming of words to
reduce them to the root-form, the changing of words to lowercase, etc.
Analysis takes place immediately prior to indexing and query. Analysis
converts text data into tokens, and these tokens are added as terms to
the Lucene index.
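The chain of operations described above (tokenizing, lowercasing, removing common words, stemming) can be sketched as a small pipeline. This is a plain Python illustration of the idea, not Lucene code. The stop-word list, the suffix-stripping rule and all function names are simplifying assumptions, and the example uses English text for readability.

```python
import re

# Illustrative stop-word list and suffixes (assumptions, not Lucene defaults).
STOP_WORDS = {"the", "a", "of", "and", "is", "to"}
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Split raw text into word tokens, ignoring punctuation."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    """Token filter: change every token to lowercase."""
    return [t.lower() for t in tokens]


def remove_stop_words(tokens):
    """Token filter: drop common words."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(tokens):
    """Token filter: a toy stemmer that strips one known suffix."""
    stemmed = []
    for token in tokens:
        for suffix in SUFFIXES:
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                token = token[: -len(suffix)]
                break
        stemmed.append(token)
    return stemmed


def analyze(text):
    """Run the tokenizer, then each token filter in order."""
    tokens = tokenize(text)
    for token_filter in (lowercase, remove_stop_words, stem):
        tokens = token_filter(tokens)
    return tokens


print(analyze("The Police searched the indexed documents."))
```

The same `analyze` function would be applied both to documents at indexing time and to queries at search time, so that both sides are reduced to comparable terms.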
The Lucene library contains a variety of built-in analyzers. Some of them are:
SimpleAnalyzer, StandardAnalyzer, StopAnalyzer and SnowballAnalyzer.
They differ in the way they treat text, in their mode of application and in the
type of filter used. Such analysis has both advantages and drawbacks: removing
words before indexing decreases the size of the index, but it can have a
negative impact on the processing of precise queries. With Lucene, more control
over the analysis process can be gained by using a custom analyzer.
After the input text has been analyzed and its representation created, the Lucene
index is updated. Lucene uses a data structure known as an inverted index.
The inverted index makes efficient use of disk space while enabling fast key lookups.
The structure is called inverted because the tokens extracted from the
input documents are used as the lookup keys. This mechanism ensures that
the document is not treated as the central entity: a concrete word is sought
directly in the index instead of by scanning entire documents.
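The inverted index idea can be sketched as a mapping from each token to the set of documents that contain it. This is a minimal in-memory illustration, not Lucene's on-disk format. The function names and the three sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map every token to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, *terms):
    """Return the ids of documents containing all of the given terms."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical documents, for illustration only.
docs = {
    1: "teška telesna povreda",
    2: "laka telesna povreda",
    3: "fizički napad",
}
index = build_inverted_index(docs)
print(search(index, "povreda"))
```

A lookup touches only the postings of the query terms, so its cost does not depend on the length of the documents themselves.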
Lucene’s default scoring system works very well for most cases. It uses
seven different variables to determine the final ranking of each document.
Along with the TF and IDF variables, there are (from lucenetutorial.com):
(5)
During the search phase (user querying) we want to find the most relevant
documents by applying TF-IDF weighting. The operative term will
be “povreda” (en. “injury”) or some root or variant thereof. We will search for
this term in a half dozen simple documents, as illustrated in the table below.
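Ranking by TF-IDF for a single term can be sketched as follows. The example assumes the textbook definitions (term frequency as the share of tokens equal to the term, inverse document frequency as the natural logarithm of the number of documents divided by the number containing the term), which differ in detail from Lucene's scoring formula. The six documents are invented for illustration and are not the documents used in the chapter.

```python
import math


def tf(term, tokens):
    """Term frequency: share of the document's tokens equal to the term."""
    return tokens.count(term) / len(tokens) if tokens else 0.0


def idf(term, corpus):
    """Inverse document frequency: log(N / number of docs with the term)."""
    df = sum(1 for tokens in corpus if term in tokens)
    return math.log(len(corpus) / df) if df else 0.0


def rank(term, docs):
    """Return (score, doc_index) pairs for matching docs, best first."""
    corpus = [doc.lower().split() for doc in docs]
    weight = idf(term, corpus)
    scores = [(tf(term, tokens) * weight, i) for i, tokens in enumerate(corpus)]
    return sorted(((s, i) for s, i in scores if s > 0), reverse=True)


# Hypothetical half dozen simple documents.
docs = [
    "telesna povreda",
    "teška telesna povreda u tuči",
    "fizički napad na lice",
    "krivično delo",
    "povreda nanesena oružjem povreda opasna po život",
    "sud republike srbije",
]
for score, doc_index in rank("povreda", docs):
    print(f"{score:.3f}  {docs[doc_index]}")
```

Short documents dominated by the term rank highest, while documents that do not contain it are left out of the result.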
CONCLUSION
The aim of this study is to present the possibilities of using Lucene
indexing and searching on data and unstructured text documents related to
crime in the Serbian language, in order to find elements of crime in cyberspace.
This research is based on the Vector Space Model, where the TF-IDF measure is applied
to the query and the Lucene index in order to improve the data search process for
creating a conceptual model. The emphasis is on three articles of the Criminal
Code of the Republic of Serbia relating to physical violation. Since
criminal activity is also present in cyberspace, fast search techniques are
important for detecting and processing criminal offenses in order to increase
the level of security in the Republic of Serbia.
REFERENCES
[1] The strategy and action plan for the development of electronic
administration until 2013 (“RS Official Gazette”, Nos. 55/05, 71/05-
correction, 101/07 and 65/08).
[2] Nikolić, V; Đikanović, P; Batoćanin, D. e-Government Republic of
Serbia: The registration of motor vehicles and trailers, YU INFO, 2013.
[3] Nikolić, V; Protić, J; Đikanović, P. G2G integration MOI of Republic of
Serbia with e-Government PORTAL, ETRAN, 2013.
[4] Randjelović, D; Popović, B; Nikolić, V; Nedeljković, S. Intelligent search
terms in the case of police services in eGovernment, New information
technology for analytical decision-making in the biological, economic
and social systems, State university in Novi Pazar, 2014.
[5] Dragović, R; Ivković, J; Dragović, D; Klipa, Đ; Radišić, D; Nikolić, V.
Decision support system to support the strategic management of the state
administration, YU INFO, 2015.
[6] Ning, Zhong; Yuefeng, Li; Sheng-Tang, Wu. “Effective Pattern
Discovery for Text Mining”, IEEE Transactions on Knowledge and
Data Engineering, vol. 24, no. 1, pp. 30-44, January 2012,
doi:10.1109/TKDE.2010.211.
[7] Peter, Teufl; Udo, Payer; Guenter, Lackner. From NLP (Natural
Language Processing) to MLP (Machine Language Processing),
Computer Network Security, (2010).
[8] Stevic, Z; Rajcic-Vujasinovic, M; Radovanovic, I; Nikolic, V.
Modeling and Sensing of Electrochemical Processes upon
Dirac Potentiostatic Excitation of Capacitive Charging/Discharging, Int. J.
Electrochem. Sci., 10 (2015) 6020-6029.
Chapter 7
ABSTRACT
The possibility of achieving protected communication has long been
a privilege reserved for professional services and systems that can afford
large investments in the development of specialized devices for this purpose.
Today, the popularization of the open-source development model has enabled
a significant reduction of development costs while maintaining high levels
of security. This development implies the inclusion of existing
components, whose implemented principles can be verified if
necessary. This paper discusses key issues related to the development of
mobile devices for secure communication based on the Android platform.
*
Corresponding author: A. Jevremovic, Email: ajevremovic@singidunum.ac.rs.
176 Aleksandar Jevremović, Mladen Veinović, Goran Šimić et al.
INTRODUCTION
The possibility of achieving protected communication has long been a
privilege reserved for professional services and systems that are able to make
large investments in the development of specialized devices. At this level,
communication can be considered secure only if it is based on a cipher
algorithm developed by the end user. In addition, the principles of operation of
such cipher algorithms must not be known to anyone other than the system’s end
user. Under this approach, the popular cipher algorithms (DES, AES, etc.)
cannot be considered secure [1].
Therefore, pre-built, out-of-the-box communication protection systems
are out of the question. Cryptographic solutions require a reliable (secure)
platform for implementation (hardware and system software) that can be
verified (open-source systems). The method of implementation and confidence in
the cryptographic synchronization and resynchronization procedures are key
factors for confidence in a custom cryptographic solution. Such an
implementation prevents the existence of back doors through which the
cipher keys can “leak”. Moreover, the procedures for handling cryptographic
keys (storage, selection, distribution, deletion, etc.) are essential for such
considerations.
Today, the popularization of the open-source development model has enabled
a significant reduction of development costs while maintaining high levels of
security. This development implies the use of pre-built components
whose implemented principles can be verified if necessary.
This paper discusses key issues related to the development of mobile
devices for secure communication based on the Android platform. This
research is based on experience collected [2-9] in the development of
top-level communication protection systems based on the Linux platform.
realization of this level will have a negative impact on the performance of the
encryption system.
Implementing a custom cipher algorithm at the hardware level may
be a good solution if high performance is required together with the use
of resources located outside the computer system. On the other hand,
such a realization is usually very expensive. In addition, any changes to the
system are much more difficult and expensive to implement than software
changes. Finally, the user must be able to design and produce the desired
hardware independently, or must be able to oversee the process thoroughly
if it is run by somebody else. At the hardware level, the maximum
speed of encryption and decryption can be achieved by means of a
dedicated crypto processor. Design techniques for dedicated processors are
widely known. However, only a few countries in the world have the technology
to manufacture these processors, and the rest of
the world does not trust their products.
Our experience indicates that the kernel of the operating system is the
optimal place for the implementation of a custom cipher algorithm (the full list of
references is cited in the introduction). Improved performance can be
achieved due to the reduced number of system calls. In addition, a
cryptosystem in the kernel of the operating system can easily be paired with the
protocols of the transport and network layers as well as the data link layer.
However, such an approach requires access to the source
code of the kernel, which is available for the Linux operating system.
etc.) there is no confidence that a complete search of the cipher key space is
necessary. It is suspected that shortcuts exist, and there is also reasonable
suspicion that these shortcuts are known to those who
designed the listed algorithms. This is based on some published information,
mainly via the Internet. A particular problem is the use of the asymmetric encryption
system (RSA), because it has not been scientifically (mathematically)
proven that no shortcut procedures for breaking the algorithm exist.
Once designed, cipher systems become obsolete quickly. The development of
processors, computers and networks makes key searches faster, so such
systems have to be examined and changed after a few years of use.
Computationally secure encryption systems are attacked by exploiting weaknesses of
computer protocols, built-in backdoors, or human mistakes.
Unlike safety processing ciphering algorithms, content encrypted with
absolutely safe encryption systems cannot be broken, regardless of the amount
of processing power engaged for this purpose. More precisely, this is achieved
by using a unique cipher key (the so-called one-time pad). This further
implies that the key length must be equal to or greater than the length of the
message to be encrypted. Moreover, a new key has to be used for each
message. Due to the continual improvement of storage media (more capacity at
reduced cost), it is becoming more realistic that absolutely safe encryption
systems can be successfully applied to the protection of real-time
communication. For instance, one-time pad systems can be used to protect
standard voice-coder systems instead of the ordinarily used safety processing
ciphering algorithms.
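The two requirements stated above (a key at least as long as the message, and
fresh key material for every message) can be illustrated with a minimal
sketch. It assumes that the pad is combined with the message by bitwise XOR.
The names OneTimePad and xor_bytes are hypothetical, and secrets.token_bytes
only stands in for a pad that would in practice come from a physical noise
source and be shared in advance over a safe channel.

```python
import secrets


class OneTimePad:
    """Hypothetical pad store that hands out every key byte at most once."""

    def __init__(self, pad: bytes):
        self._pad = pad
        self._offset = 0  # index of the first unused key byte

    def remaining(self) -> int:
        return len(self._pad) - self._offset

    def take(self, n: int) -> bytes:
        # Refuse to reuse or stretch key material: the key must cover the message.
        if n > self.remaining():
            raise ValueError("pad exhausted: not enough unused key bytes")
        chunk = self._pad[self._offset:self._offset + n]
        self._offset += n
        return chunk


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with key; the same call encrypts and decrypts."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


# Assumption: both sides hold an identical copy of the pad.
shared_pad = secrets.token_bytes(64)
sender = OneTimePad(shared_pad)
receiver = OneTimePad(shared_pad)

message = b"voice frame 0001"
cipher = xor_bytes(message, sender.take(len(message)))
plain = xor_bytes(cipher, receiver.take(len(message)))
```

Both sides consume the pad at the same rate as the traffic itself, which is
why the available storage capacity determines how long such a device can
operate before new key material has to be loaded.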
In both cases, regardless of which encryption system is used, a safe
communication channel has to be established for exchanging the keys between
the communicating sides. Special protocols that enable the exchange of secret
keys through an unsafe channel [11-13] can be used as an alternative.
However, according to published results, the performance of such protocols is
still unsatisfactory, as is confidence in them. These are the main reasons why
they are not used in absolutely safe encryption systems.
A simple and practical method for absolutely safe encryption is bitwise
processing of the message with the XOR logical function (exclusive
disjunction), in which a bit array produced by a random number generator is
used as the second operand [14] (so-called sequential or bit-for-bit
encryption). It is well known that such systems are the most resistant to
errors in the channel (one false bit in the cipher text causes only one
incorrectly decoded bit in the open message), which is important for mobile
Development of the Android-Based Secure Communication Device 181
of security systems the practice is that all components that are not required are
removed to simplify the system and to reduce the potential risk of installation
of back-doors.
Support for the IPsec protection system already exists in the Linux kernel, in
the form of the AH and ESP protocols. The cipher algorithms used in these
protocols are also implemented in the kernel, and they can be accessed via the
standardized Cryptographic API. This means that a user-defined cipher
algorithm can be implemented as a kernel module and used for communication
protection via the IPsec protection system.
Figure 5. Model of applying user defined cipher algorithm by using Cryptographic API
of Linux kernel.
Figure 9. Real numbers obtained by conversion of the hex string into double (left),
the resulting binary sequence (right).
less than probability, it could be assumed that the sequence came from a
random source, and the observed sum was acceptable because its value was
higher than 387,840 (which is applicable).
The performed test generated 264600 non-redundant bits (approximately 33 KB),
which takes at least 6 seconds. During this period the audio card actually
produced double that figure, 529200 bytes (through the stereo channels), from
the generated noise. For further analysis, the generated random bit array was
compared with comparable ones available on the Internet. The Web services
offered at random.org were used for this purpose.
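The figures quoted in this paragraph can be cross-checked with a few lines of
arithmetic. The sketch below uses only the numbers given in the text. The
helper seconds_to_fill and the 1 MiB pad size are hypothetical and serve only
to show how the measured rate translates into the time needed to prepare key
material. Since the text gives 6 seconds as a lower bound, the rate is an
upper bound.

```python
# Figures taken from the text above.
GENERATED_BITS = 264_600      # non-redundant bits produced by the test
DURATION_SECONDS = 6          # "at least 6 seconds"
RAW_AUDIO_BYTES = 529_200     # data produced by the audio card (stereo)

generated_bytes = GENERATED_BITS // 8               # size of the bit array in bytes
bits_per_second = GENERATED_BITS / DURATION_SECONDS  # upper bound on the output rate
raw_to_generated = RAW_AUDIO_BYTES / GENERATED_BITS  # the "double" mentioned in the text


def seconds_to_fill(pad_bytes: int, rate_bits_per_second: float) -> float:
    """Time needed to generate pad_bytes of key material at the given rate."""
    return pad_bytes * 8 / rate_bits_per_second


# Hypothetical example: time to generate a 1 MiB pad at the measured rate.
one_mib_seconds = seconds_to_fill(1024 * 1024, bits_per_second)

print(generated_bytes, bits_per_second, raw_to_generated, round(one_mib_seconds, 1))
```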
At the analog physical level, the footprints of the noise signals are
different (Figures 11, 12), which implies a difference in their randomness.
The noise signal produced by the constructed generator has a more uniform
distribution of bits than the one downloaded from the Web (random.org).
Further statistical analysis and comparison verify these observations.
Maurer’s test
Observed sum 456584.6075373749
Observed fn 5.21453411988779
Expected fn 5.2177052
Variance 2.954
P-Value 0.9985278848454993
P-Value (Decimal) 0.99852788484549925840383
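The relation between the tabulated values can be sketched as follows. The
expected value and variance are those listed for block length L = 6 in NIST
SP 800-22 [18]. The tabulated P-value is reproduced when the standard
deviation is taken as the plain square root of the variance, so that form is
assumed here. NIST SP 800-22 additionally scales the standard deviation by a
correction factor that depends on the number of blocks, and that correction
is not applied in this sketch. The function name p_value is hypothetical.

```python
import math

# Values copied from the table above.
OBSERVED_SUM = 456584.6075373749
OBSERVED_FN = 5.21453411988779
EXPECTED_FN = 5.2177052
VARIANCE = 2.954

# The test statistic fn is the observed sum divided by the number of test
# blocks, so the ratio recovers how many blocks were evaluated.
test_blocks = round(OBSERVED_SUM / OBSERVED_FN)


def p_value(observed: float, expected: float, variance: float) -> float:
    """P-value via the complementary error function.

    Assumption: sigma = sqrt(variance), without the block-count correction
    described in NIST SP 800-22.
    """
    sigma = math.sqrt(variance)
    return math.erfc(abs(observed - expected) / (math.sqrt(2) * sigma))


p = p_value(OBSERVED_FN, EXPECTED_FN, VARIANCE)
print(test_blocks, p)
```

A P-value this close to 1 indicates that the observed statistic lies very
near the expected value for a random sequence.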
Figures 11 and 12. Generated noise (constructed generator - left, random.org - right).
Entropy overlapping

Type of test   Constructed generator   Random.org generator   Constructed <
               result                  result                 random.org
Monobit        0.9999999970453606      0.9999999851224279     False
Bigram         0.9999938028778617      0.999998712659246      True
Trigram        0.9999776564462013      0.9999971314264103     True
4x4 Matrix     0.9999632491279191      0.9999918362253594     True
Entropy non-overlapping

Type of test   Constructed generator   Random.org generator   Constructed <
               result                  result                 random.org
Monobit        0.9999999970453606      0.9999999851224279     False
Bigram         0.9999850632203415      0.9999935149038766     True
Trigram        0.9999443542760401      0.9999798584606167     True
4x4 Matrix     0.9999632491279191      0.9999918362253594     True
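The tables do not state how the entropy values were computed. The sketch
below shows one common way to obtain comparable figures, assuming Shannon
entropy over n-bit patterns, normalized per bit so that 1.0 is the maximum.
The window advances by one bit in the overlapping case and by n bits in the
non-overlapping case. The function name ngram_entropy is hypothetical, the
4x4 matrix test is not covered, and the software generator used in the
example is only a placeholder for a recorded bit array.

```python
import math
import random
from collections import Counter


def ngram_entropy(bits, n, overlapping=True):
    """Normalized Shannon entropy (per bit) of the n-bit patterns in bits."""
    step = 1 if overlapping else n
    grams = [tuple(bits[i:i + n]) for i in range(0, len(bits) - n + 1, step)]
    total = len(grams)
    counts = Counter(grams)
    # H = sum over patterns of p * log2(1 / p), in bits per pattern.
    h = sum((c / total) * math.log2(total / c) for c in counts.values())
    return h / n  # divide by the pattern length to get bits per bit


# Placeholder data: a seeded software generator instead of recorded noise.
rng = random.Random(2017)
sample = [rng.getrandbits(1) for _ in range(10_000)]

for name, n in (("Monobit", 1), ("Bigram", 2), ("Trigram", 3)):
    print(name,
          ngram_entropy(sample, n, overlapping=True),
          ngram_entropy(sample, n, overlapping=False))
```

For n = 1 the two variants coincide, which is consistent with the identical
Monobit rows in both tables.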
190 Aleksandar Jevremović, Mladen Veinović, Goran Šimić et al.
CONCLUSION
Highly secure systems for communication protection cannot depend on
ready-made solutions, regardless of the fact that their producers have
published almost all details of implementation. This paper presents a model
for developing a custom system for secure communication based on custom
encryption algorithms. The basic requirement is that all system components
that can affect safety have to be developed, tested and approved by the
authors themselves. The Linux/Android platform is proposed as a solution
because it has most of the necessary functions and its complete source code
is publicly available.
Cipher algorithms will first be incorporated at the telecommunication level,
as part of the development of a safe network system. Such a solution replaces
separate implementations for each application protocol with a single one.
More precisely, the proposal relies on the IPsec security extensions, because
implementations below the network level would make use of the Internet
impossible.
On the computer system, a custom cipher algorithm can be implemented in user
(application) space, in the core of the operating system or at the hardware
level. Although implementation in user space is the simplest approach, such a
realization makes it difficult to protect the growing number of application
protocols, and lower performance is also expected. Implementation at the
hardware level would be appropriate with regard to performance and security
level, but such an approach is difficult to implement and rigid with respect
to modifications. Therefore, the solution presented in this paper proposes
building the custom algorithm in the form of Linux kernel modules.
This paper also analyzes the choice between processing encryption
systems and absolutely secured encryption systems. With regard to the
possibilities of modern devices based on Android, with the focus on the
capacity of modern storage media, absolutely secured encryption systems are
partially preferred.
The final problem in implementing a safe communication device is how to
obtain a “clean” platform on which the source code of the cipher algorithm,
the core of the operating system and other necessary software would be
compiled. In addition, the hardware platform on which such software would be
deployed should be “clean”; in other words, it has to be without “back doors”.
This is the only way for the user to have full control over the system.
For the solution proposed at the OS level, a specific hardware device was
designed to obtain the bit array used for XOR message encryption. This tool
acts as a
ACKNOWLEDGMENT
The authors of this paper participated in the scientific projects TR32054,
III44006, III44007 and ON174008, funded by the Ministry of Education, Science
and Technological Development of the Republic of Serbia.
REFERENCES
[1] Jevremović, Aleksandar et al. 2006. “IP Security under Linux OS”,
Proceedings of 50th ETRAN Conference, Belgrade, Serbia, pp. 114-117.
[2] Jevremović, Aleksandar et al. 2006. “IPsec – Analyzing Influence of
Cryptographic Algorithm on LAN Networks Traffic”, 14th
Telecommunications Forum Telfor, IEEE, Belgrade, Serbia.
[3] Jevremović, Aleksandar et al. 2008. “Custom Cipher Algorithm for
AJAX Requests Protection in Web applications”, Proceedings of 52nd
ETRAN Conference, Belgrade, Serbia.
[4] Jevremović, Aleksandar et al. 2009. “Zaštita bežičnih komunikacija
korišćenjem sopstvenog šifarskog algoritma”, 17th Telecommunications
Forum Telfor, Belgrade, Serbia.
[5] Jevremović, Aleksandar. 2011. “Integracija sopstvenih kriptoloških
sistema u standardnu računarsko-telekomunikacionu infrastrukturu”,
Univerzitet Singidunum, Belgrade, Serbia, pp. 1-122.
[18] Rukhin, Andrew et al. 2010. “A Statistical Test Suite for Random and
Pseudorandom Number Generators for Cryptographic Applications”,
NIST Special Publication 800-22, Rev. 1a, Computer Security, U.S.
Department of Commerce, 131 pages.
[19] Coron, Jean-Sébastien and Naccache, David. 2002. “An Accurate
Evaluation of Maurer’s Universal Test”, Lecture Notes in Computer
Science, Vol. 1556, pp. 57-71.