Using Global Terrorism Database (GTD) and Web Data Mining To Predict Terrorism and Threat in Social Media Texts


Ankit Mehta1, Sanskar2, Devyash Bordia3
Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai, India

Abstract-- After a recent increase in terrorist attacks, more sophisticated and advanced information technologies have been developed to counter acts of terrorism. Online terrorism has also been taking root on the internet, and such activity has grown along with internet technology. It includes social media threats such as hate speech and comments provoking terror on platforms such as Twitter and Facebook. These activities must be prevented before they make an impact. In this paper, we focus on predicting terrorist events from the Global Terrorism Database (GTD) with data mining techniques. The purpose of this project is to use social media texts, such as tweets and other posts from various organizations, as a dataset to detect terrorism. For the implementation, we use tweets scraped from the profiles of supporters of various organizations. We utilize the GTD, an open-source database containing data on terrorist events around the world from 1970 through 2017, to train a machine-learning-based intelligent system to predict future events that could pose a threat to society.

Keywords-- Global Terrorism Database (GTD); Social Media Threats; Data Mining

I. INTRODUCTION

There has been a huge growth in Internet users in the past decade. Technological advancement has not only benefited society but has also given rise to various problems. One such threat is the growth of cyberterrorism. Cyberterrorism can be characterized as the deliberate use of computers, networks, and the public internet to cause destruction and harm for personal objectives or revenge. Experienced cyberterrorists, who are highly skilled at hacking, can cause massive damage to government systems, hospital and clinic records, and national security programs, which may leave a nation, community or organization in turmoil and in fear of further attacks. The agenda of such terrorists may be political or ideological, since this can be viewed as a form of terror. There have been several major and minor incidents of cyberterrorism. Al-Qaeda used the web to communicate with supporters and even to recruit new members. Estonia, a Baltic nation that is continually developing in terms of technology, became a battleground for cyberterror in April 2007 after disputes over the removal of a WWII Soviet statue located in Estonia's capital, Tallinn. It is essential to recognize that cyberterrorism has had disastrous effects on our society. We intend to analyze, predict and categorize various terrorist activities using techniques such as Natural Language Processing (NLP), sentiment analysis, the kNN algorithm and the random forest algorithm, training a system on data from the Global Terrorism Database and on various threat-related texts from social media.

II. RELATED WORK

This paper intends to deal with the following:

• To study and investigate the various techniques and existing work used to identify and filter violent and harmful antisocial content online.

• To perform a comprehensive study of these systems and discover common patterns.

Studying antisocial behavior: Antisocial behavior includes activities such as "trolling, cyberbullying, griefing and criminal activities", terms that have been widely used and discussed. Research on online communities to recognize antisocial behavior has largely been conducted with qualitative rather than quantitative measures.

These studies mostly covered the various kinds of trolling that occur, the motivations for engaging in such activities, and the various strategies that others use in response to trolls.

Techniques to detect antisocial behavior: There is some closely related research on antisocial behavior. However, most of it has been carried out in the fields of social and behavioral sciences and psychiatry. Efforts have also been made towards the detection and prevention of physical manifestations associated with antisocial messages in networks.

1) Antisocial Content Identification

The fundamental goal is to detect dishonest, manipulative, bigoted and fake content in online communities. The primary intent of such content is to influence the attitude of other individuals or to spread false messages across the network. Analysis of such data can lead us to the source of the hate promoter or antisocial message.

2) Mining of Criminal Information

Mining and analysis of criminal data and records can reveal hidden patterns and the unknown terms and codes used by criminals. Such data, however, is recorded and analyzed by the government law enforcement agencies that deal with crimes and offenders, in order to prevent its misuse.

3) Cyber Infrastructure Shielding

This covers the methods for protecting cyber infrastructure in the event of cyber threats and cyber attacks. These attacks can be extremist antisocial activity or an attempt to endanger security.

There are a number of methods and techniques for studying human psychology; however, computational models and techniques have not been developed to automatically detect antisocial behavior in online communities. The messages and content circulating on social networks are either unstructured or unclassified data, which becomes the major obstacle to finding a suitable technique for automatically detecting antisocial behavior.

III. DESIGN AND METHODOLOGY

The following approach was designed to achieve the paper's objective:

A. Data Set

The GTD, which is supported by the National Consortium for the Study of Terrorism and Responses to Terrorism (START), is a database of terrorist incidents covering the period from 1970 onwards. It contains more than 170,000 terrorist attacks with at least 45 variables for each case, which makes it "currently the most comprehensive unclassified database on terrorist events in the world". In view of its effectiveness and comprehensiveness, GTD data has featured in numerous academic papers as empirical evidence for research on different aspects of terrorism, from current trends to types of terrorism.

B. Data Preprocessing

The GTD includes more than 170,000 instances of terrorist events around the world from 1970 to 2017. For every incident, data is available on the location and date of the corresponding event, among other details, totaling 132 attributes. However, the data was collected from various sources, which results in data inconsistency. To address this issue, we set a share threshold of 20%, meaning that only attributes recorded in more than 20% of all cases are considered. After statistical analysis, 59 attributes were selected, but missing data still exists in some of the records. To solve this problem, we use the Mean Imputation (MI) method, replacing the missing value of a given attribute with the mean of all known values of that attribute in the class to which the record with the missing value belongs.

The various categories of terrorism are:

• Political Terrorism: Left-wing terrorism, right-wing terrorism, or terrorist attacks that reveal a strong intent to overthrow the authorities.
• Protesting Terrorism: Carried out by a minority group whose members share a common regional and cultural identity. Their goal is to form a nation of their own through political demands, violence and even armed resistance.
• Religious Terrorism: A form of religious extremism that aims to restore religious doctrine and establish religious rule.
• Underworld Terrorism: Organized violence, threats or other illegal means are used to control or damage the economic and social order for financial gain.
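The two preprocessing steps of Section B, the 20% share threshold followed by class-wise mean imputation, can be illustrated with a minimal sketch. It is not the authors' implementation: the toy records, the attribute names (`nkill`, `nwound`, `ransom`) and the use of `attacktype` as the class column are assumptions made only for illustration, and `None` stands in for a missing value.

```python
from statistics import mean

def select_attributes(records, attributes, min_share=0.20):
    """Keep attributes that are recorded (not None) in more than min_share of the records."""
    total = len(records)
    kept = []
    for attr in attributes:
        recorded = sum(1 for r in records if r.get(attr) is not None)
        if recorded / total > min_share:
            kept.append(attr)
    return kept

def impute_class_means(records, attributes, class_key):
    """Fill each missing value with the attribute mean inside the record's own class."""
    known = {}  # (class, attribute) -> list of known values
    for r in records:
        for attr in attributes:
            if r.get(attr) is not None:
                known.setdefault((r[class_key], attr), []).append(r[attr])
    filled = []
    for r in records:
        row = dict(r)  # copy so the input records stay untouched
        for attr in attributes:
            if row.get(attr) is None:
                values = known.get((row[class_key], attr))
                if values:  # leave the gap if the class has no known value at all
                    row[attr] = mean(values)
        filled.append(row)
    return filled

# Hypothetical toy records; column names are illustrative assumptions.
records = [
    {"attacktype": "bombing", "nkill": 4,    "nwound": 10,   "ransom": None},
    {"attacktype": "bombing", "nkill": None, "nwound": 6,    "ransom": None},
    {"attacktype": "assault", "nkill": 2,    "nwound": None, "ransom": None},
    {"attacktype": "assault", "nkill": 6,    "nwound": 3,    "ransom": None},
    {"attacktype": "bombing", "nkill": 8,    "nwound": 2,    "ransom": 1},
]
kept = select_attributes(records, ["nkill", "nwound", "ransom"])
clean = impute_class_means(records, kept, class_key="attacktype")
print(kept)
print(clean[1]["nkill"], clean[2]["nwound"])
```

In this toy data `ransom` is recorded in exactly 20% of the cases, so it does not pass the "more than 20%" rule and is excluded before imputation. Computing the mean per class rather than over the whole column keeps, for example, a bombing record from being filled with values typical of armed assaults.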
C. Classification

The fundamental idea of classification is to assign items to one of several predefined classes. We consider two widely used classifiers, k-Nearest Neighbour (kNN) and the Random Forest algorithm.

K-Nearest Neighbour Classification Approach

kNN is an instance-based learning algorithm that works on the principle of lazy learning. It is one of the best-known approaches in the field of pattern recognition and is widely used in text categorization and text classification. It is among the top mining algorithms for text classification and adapts readily to large applications. In text mining and natural language processing, kNN is a preferred algorithm because it is well suited to multi-modal classes, i.e. it can handle many class labels.

Fig: k-Nearest Neighbors

The training data of the kNN algorithm is mapped into a multidimensional feature space, which is partitioned into regions according to the classes of the training samples.

The figure below shows a diagram of the region space of a kNN classifier with three distinct classes, namely ω1, ω2 and ω3. These are associated with the training data; dots, squares and triangles mark the data points of ω1, ω2 and ω3, respectively. In the figure, 'X' represents the input sample to be classified.

Fig: Region space of a kNN classifier with 3 dimensions

Fig: 1-Nearest Neighbor
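The kNN decision rule described above (find the k training points closest to the query and take the majority class) can be written in a few lines. The sketch below is illustrative only: the two-dimensional points, the class names `w1` to `w3` (standing in for ω1, ω2, ω3) and the choice of Euclidean distance are assumptions, not details given in the paper.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

def knn_predict(train_points, train_labels, query, k):
    """Label `query` with the majority class among its k nearest training points."""
    # Lazy learning: nothing is fitted in advance, all work happens at query time.
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    # On a tie, most_common keeps the class that was encountered first,
    # i.e. the one whose member is nearest to the query.
    return votes.most_common(1)[0][0]

# Hypothetical training data: three well-separated classes.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),   # w1
                (6.0, 6.0), (7.0, 6.5), (6.5, 7.0),   # w2
                (1.0, 7.0), (2.0, 6.5), (1.5, 8.0)]   # w3
train_labels = ["w1"] * 3 + ["w2"] * 3 + ["w3"] * 3

x = (1.8, 1.6)  # the unlabeled sample 'X'
label_k3 = knn_predict(train_points, train_labels, x, k=3)
label_k1 = knn_predict(train_points, train_labels, x, k=1)  # the 1-nearest-neighbour case
print(label_k3, label_k1)
```

With k=1 the rule reduces to the 1-Nearest Neighbor case, where the query simply takes the label of its single closest training point.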
Random Forest algorithm

RF merges the benefits of two machine learning techniques: bagging and random feature selection. Bagging makes predictions by a majority vote of trees, each of which is trained on a bootstrap sample of the training data. Random feature selection searches at every node for the best split over a random subset of the features. RF is a well-known ensemble learning algorithm that uses the decision tree as its base classifier. It has proven successful in applications such as email spam filtering, voice classification, image classification and text classification. To classify a new document, its input vector is passed through every tree in the forest, with each tree giving a result, i.e. a classification, which counts as a "vote" for that class. As in an election, the final result is the class that has the most votes. The main features of RF are:

• Random Forest performs faster than bagging and boosting.
• Random Forest is robust to noise and outliers.
• Random Forest is efficient for huge data sets.
• Random Forest is reported to be more accurate than many currently available classification algorithms.
• RF is an efficient method for estimating missing data.

IV. RESULTS

In this paper we have explored various classifiers on the basis of their accuracy and speed. We finally chose the kNN algorithm for the Weapon Classifier and the Random Forest algorithm for the Perpetrator Classifier. Across multiple attempts with various algorithms, kNN proved best suited for the Weapon Classifier and Random Forest stood out as the best for the Perpetrator Classifier. The whole data set was divided into a training set and a testing set at a ratio of 8:2 for training and testing our model.

A. Visualization

We have visualized various contents of the Global Terrorism Database to better understand the data we are working with. We visualized various attributes recorded in the GTD from 1996 to 2017. The following diagrams were generated from the GTD and show attacks by year, fatalities by year, countries by total attacks, and attacks by type. The diagrams below were plotted using "catplot()".

The above bar diagram shows the number of attacks carried out every year from 1996 to 2017. We can conclude that attacks increased from 2011 to 2014 and then started to drop gradually.
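The attacks-by-year diagram is essentially a count of events grouped by year. As a rough sketch of that aggregation step (not the authors' plotting code), the counts can be computed as follows; the `year` field name and the sample events are hypothetical.

```python
from collections import Counter

def attacks_per_year(events, first_year, last_year):
    """Count events per year inside [first_year, last_year], including years with zero events."""
    counts = Counter(e["year"] for e in events
                     if first_year <= e["year"] <= last_year)
    return {year: counts.get(year, 0) for year in range(first_year, last_year + 1)}

# Hypothetical sample; a real run would iterate over the GTD records instead.
events = [{"year": 1995}, {"year": 1996}, {"year": 2011}, {"year": 2011},
          {"year": 2014}, {"year": 2014}, {"year": 2014}, {"year": 2017}]

per_year = attacks_per_year(events, 1996, 2017)
peak_year = max(per_year, key=per_year.get)  # year with the most recorded attacks
print(per_year[2011], per_year[2014], peak_year)
```

If the records are held in a pandas DataFrame, seaborn's `catplot` with `kind="count"` draws this kind of per-category count directly; whether the authors used that exact option is not stated in the paper.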
The above bar diagram shows the number of fatalities that occurred every year from 1996 to 2017. It is evident that the most fatalities occurred between 2011 and 2017.

The above diagram ranks the countries by the number of terrorist attacks carried out in them. It is evident that Iran, Pakistan, Afghanistan and India are the top 4 countries where the most attacks happened from 1996 to 2017.

The above bar diagram depicts the total attacks by type. It is evident that bombing and explosion, followed by armed assault, are the most preferred attack types, with a total count of more than 90,000 attacks from 1996 to 2017.

Fig: Terrorism Hotspots

The above world map shows the hottest terrorist locations, showing active terrorist encounters in South and East Asia.
B. Weapon Classifier

The weapon classifier was built using the kNN algorithm and classifies attacks by type of weapon. We chose 'k' by trial and error, keeping the value that gave the best result. The neighbors were computed from the GTD with k=12.

The above graph was generated using the kNN algorithm by trial and error. The graph shows the accuracy for 12 different values of k; the best result was obtained when k was equal to 12. Thus, the weapon that could be used is predicted from the majority class among the 12 nearest neighbors. The accuracy that we obtained using the kNN algorithm was 88.74%.

The above graph classifies the attacks by weapon type from 1996 to 2017. It is evident that explosives and firearms are the most preferred weapon types.

C. Perpetrator Classifier

The perpetrator classifier classifies the various groups or organizations that carry out illegal acts and promote terrorism. We used the Random Forest algorithm to create the Perpetrator Classifier. We grouped and listed the various groups responsible for attacks from 1996 to 2017 using the random forest algorithm. A list of major perpetrator groups was displayed, along with the accuracy, precision, recall and F1 score, after training the model.

Performance Metrics

The following four performance metrics were computed for the Random Forest model. The formula for each metric, along with the result obtained from our model, is listed below.

• Accuracy_score

Accuracy: 0.9045279383429673
• Precision

Precision: 0.8995287972392408

• Recall

Recall: 0.9045279383429673

• F1

F1: 0.894189908871535

Hence, we obtained an accuracy of 90.45% and a precision of 89.95% from our model using the Random Forest algorithm for the Perpetrator Classifier.

V. Discussion and Conclusion

Terrorism continues to be a threat across the globe. Data analytics and machine learning offer investigators a promising way to rapidly determine the most probable perpetrator of a terrorist attack. In our project, we have demonstrated how methods like k-Nearest Neighbour and Random Forest can predict the perpetrator correctly eight out of ten times. This enables investigating organizations to narrow down the possibilities and act rapidly to get to the real perpetrators. We further intend to try different methods, such as bagging classifiers and deep learning models, to improve the accuracy of prediction.
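The four scores reported under Performance Metrics can be computed from true and predicted labels as in the sketch below. It assumes support-weighted averaging over classes, which the paper does not state; the assumption is consistent with the reported recall being identical to the reported accuracy, since support-weighted recall always equals accuracy. The labels `A`, `B`, `C` are placeholder group names, not GTD perpetrators.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 for multi-class labels."""
    total = len(y_true)
    support = Counter(y_true)    # number of true samples per class
    predicted = Counter(y_pred)  # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(correct.values()) / total
    precision = recall = f1 = 0.0
    for label, n in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / n
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = n / total       # each class counts in proportion to its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Placeholder labels for ten test cases.
y_true = ["A", "A", "A", "A", "B", "B", "B", "C", "C", "C"]
y_pred = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"]

scores = weighted_scores(y_true, y_pred)
print(scores)
```

Because the weighted F1 is an average of per-class F1 values rather than the harmonic mean of the two averaged scores, it can fall below both the weighted precision and the weighted recall, as it does in the figures reported above.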