
2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)

Bot Conversations are Different: Leveraging
Network Metrics for Bot Detection in Twitter
David M. Beskow and Kathleen M. Carley
School of Computer Science
Carnegie Mellon University
5000 Forbes Ave, Pittsburgh, PA 15213, USA
dbeskow@andrew.cmu.edu
kathleen.carley@cs.cmu.edu

IEEE/ACM ASONAM 2018, August 28-31, 2018, Barcelona, Spain
978-1-5386-6051-5/18/$31.00 © 2018 IEEE

Abstract—Automated social media bots have existed almost as long as the social media platforms they inhabit. Although efforts to detect and characterize these autonomous agents have long existed, these efforts have redoubled in recent months following the sophisticated deployment of bots by state and non-state actors. This research studies the differences between human and bot social communication networks by conducting an account snowball data collection, and then evaluates features derived from this communication network in several bot detection machine learning models.

I. INTRODUCTION

Automated and semi-automated social media accounts have been thrust into the forefront of daily news as they became associated with several publicized national and international events. These automated accounts, often simply called bots (though at times called sybils), have become agents within the increasingly global marketplace of beliefs and ideas. While their communication is often less sophisticated and nuanced than human dialogue, their advantage is the ability to conduct timely informational transactions effortlessly at the speed of algorithms. This advantage has led to a variety of creative automated agents deployed for beneficial as well as harmful effects. While their purpose, characteristics, and "puppet masters" vary widely, they are undeniably present and active. Their effect, while difficult if not impossible to measure, is tangible.

Automated and semi-automated accounts are used for a wide variety of reasons, creating effects that can be positive, nuisance, or malicious. Examples of positive bots include personal assistants and natural disaster notifications. Nuisance bots are typically involved in some type of 'spam' distribution or propagation. The spam content ranges from commercial advertising to the distribution of adult content. Malicious bots are involved in propaganda [31], suppression of dissent [37], and network infiltration/manipulation [5].

Malicious bots have recently gained widespread notoriety due to their use in several major international events, including the British referendum known as "Brexit" [23], the American 2016 presidential elections [9], the aftermath of the 2017 Charlottesville protests [19], the German presidential elections [33], the conflict in Yemen [30], and recently the Malaysian presidential elections [4]. These accounts attempt to propagate political and ideological messaging, and at times accomplish this through devious cyber maneuver.

As these bots are used as one line of effort in a larger operation to manipulate the marketplace of information, beliefs, and ideas, their detection and neutralization become one facet of what is becoming known as social cyber security. Carley et al. are the first to use this term, and define it as:

    Social Cyber-security is an emerging scientific area focused on the science to characterize, understand, and forecast cyber-mediated changes in human behavior, social, cultural and political outcomes, and to build the cyber-infrastructure needed for society to persist in its essential character in a cyber-mediated information environment under changing conditions, actual or imminent social cyber-threats. [11]

Within social cyber security, bot detection and neutralization are quickly becoming a cat and mouse cycle where detection algorithms continuously evolve, trying to keep up with ever-evolving bots. Early detection algorithms exploited the automated timing, artificial network structure, and unoriginal meta-data of automated accounts in order to identify them. These features are relatively easy for bot puppet-masters to manipulate, and we are now seeing automated accounts that have meaningful screen names, richer profile meta-data, and more reasonable content timing and network characteristics.

We are also seeing an increasing number of accounts that we call "bot assisted" or "hybrid" accounts. Although researchers often attempt a binary classification of bot or human, the reality is that there is a spectrum of automated involvement with an account. Many accounts are no longer strictly automated (all content and social transactions executed by a computer). These accounts will have human intervention to contribute nuanced messaging to two-way dialogue, but will have a computer executing a variety of tasks in the background. Grimme et al. [21] discuss this spectrum in detail, describing how 'social bots' are created and used, and how 'hybridization' can be used to bypass detection algorithms (in their case successfully bypassing the 'Botornot' algorithm discussed later in this paper).

We hypothesize that one of the strongest characteristics of bots is that they are not involved in social networks and social

communication in the same way that humans are. Like other complex systems (natural ecosystems, weather systems, etc.), social interaction and relationships are the result of myriad events and stimuli in both the real and virtual worlds. Like the weather, the resulting phenomena are difficult to perfectly replicate, even with the most sophisticated algorithms. Many bots are programmed to interact with each other as a bot network, and attempt to interact with humans, but many features of these interactions will be 'robotic'. Even 'hybrid' accounts will have some level of artificial and inorganic structure to their communication network. This area of bot detection in Twitter is largely unexplored, primarily because the rich network data (both the friends/followers network as well as the conversational network) are very time consuming to collect. We therefore set out to collect the data to characterize the social network(s) and social conversation(s) that a Twitter account participates in, describe these networks with various network metrics, leverage these rich network metrics in traditional machine learning models, and evaluate whether the time involved creates substantial value.

A. Research questions

1) Do bot Twitter accounts have fundamentally different conversational network structures than human managed accounts?
2) Can the measured differences between bot and human conversation networks lead to increased accuracy in bot detection?

This paper will begin by discussing past bot detection techniques, as well as summarizing historical techniques for extracting features from network structures. Next we discuss our data collection, data annotation, and methodology for creating ego-network metrics. Finally, we describe training and testing traditional machine learning algorithms and present our results.

II. RELATED WORK

Since the early efforts to conduct bot/spam detection, numerous teams have developed a variety of models to detect these accounts. While similar, these models differ based on the underlying data they were built on (for example, many community detection and clickstream models were developed for Facebook, while the overwhelming majority of models built on Twitter data use supervised and unsupervised machine learning [1]). Even in Twitter bot detection, these models can be grouped by either the models/methods or by the data that they use.

Adewole et al. [1] reviewed 65 bot detection articles (articles from 2006-2016) and found that 68% involved machine learning, 28% involved graph techniques (note that these include some machine learning algorithms that rely heavily on network metrics), and 4% involved crowd-sourcing. Below we summarize the salient works under each of these modeling techniques.

A. Machine Learning Techniques

As noted above, Twitter bot detection has primarily used machine learning models. The supervised machine learning models used for bot detection include Naive Bayes [13], meta-based [26], SVM [28], and neural network [25]. The unsupervised machine learning models used include hierarchical [28], partitional [18], PCA-based [38], stream-based [32], and correlated pairwise similarity [12]. Most of these efforts leverage data collected from the basic tweet object or user object (later in this paper we will define this as Tier 0 or Tier 1).

In 2014, Indiana University launched one of the more prominent supervised machine learning efforts with the Bot or Not online API service [14] (the service was recently rebranded to Botometer). This API uses 1,150 features with a random forest model trained on a collage of labeled data sets to evaluate whether or not an account is a bot. Botometer leverages network, user, friend, temporal, content, and sentiment features with random forest classification [17].

B. Network based techniques

Networks are an extremely important part of bots, bot behavior, and bot detection. Aiello et al. [2] discuss the impact of bots on influence, popularity, and network dynamics. Adewole et al. [1] highlight that network features are robust to criminal manipulation.

One approach to leveraging network structure involves community based bot/sybil detection. While community detection has been effectively implemented on Facebook [39] and Sina Weibo [29], it has only recently been used on Twitter data due to the strict friend/follower rate limiting discussed above. Only recently have Benigni et al. [6] used dense subgraph detection to find extremists and their supporting bots in Twitter.

Most research that uses networks for bot detection with Twitter data is in fact creating network based metrics and introducing these features in traditional machine learning models. As discussed below, the most challenging part of this type of research is how to build networks from limited data. The closest works to ours were performed by [10] in 2013 and [3] in 2016. Both research efforts used network features along with profile and temporal features from a Twitter sample stream, without any snowball sampling enrichment. They created egocentric networks that involved the ego and alters, with links between alters, for both following and mention ego-centric networks. Having done this, they calculated content, profile, and social interaction features. Their network features were restricted to centrality measures, density measures, and weakly and strongly connected components. A similar earlier work by [10] attempts to use community features (number of communities, core/periphery, foreign in/out degree, etc.). This was applied to both Facebook data and the Enron email data (not to Twitter).

Additionally, the Botometer algorithm leverages some network features extracted from the user timeline. This includes metrics on the retweet network, mention network, and hashtag co-occurrence network. The metrics include density, degree

distributions, clustering coefficient, and basic network characteristics. The Botometer algorithm does not conduct a snowball collection of friends or followers, but does appear to collect user objects for accounts found in the timeline as a retweet or mention [17].

To date our team has not found supervised learning bot detection research that leverages extensive snowball sampling to build ego networks.

C. Contributions of this work

While we discussed above several other research attempts to use network metrics in a bot detection feature space, these have largely relied on the mention network extracted from a given Twitter query/stream. Ego-centric networks built on a single stream/query arguably contain only a small subset of the overall account ego network. Researchers have not attempted to build this ego network based on snowball sampling [20] with a seed node, since this requires significant time given the extent of the data and the strict API rate limits that Twitter imposes on friend/follower data. Our research has taken the time to build this rich conversational network in a novel way, and then evaluates whether the time and effort render sufficient value.

This work additionally creates and explores bot detection metrics that require greater effort and sophistication to circumvent. Currently, bot-herders can circumvent existing algorithms by changing their screen name, adding account meta-data, spending additional time selecting a unique profile picture, and creating a more realistic tweet inter-arrival time. They can also deploy bots in bot networks, thereby artificially manipulating friend/follower values to appear popular. However, it will arguably require significantly more sophistication to change the centrality, components, or triadic relationships in the conversations that they participate in. By increasing the cost to deploy and operate bots, it may economically force "bot-herders" out of their devious market.

Finally, the bot-hunter framework builds on the multi-tiered bot detection approach that we introduced in [7]. This multi-tiered approach provides researchers and government or non-governmental agencies with a "tool-box" of models designed for different classes of bots as well as different scales of data (designed for either high volume or high accuracy). This multi-tiered approach acknowledges that there is not a one-size-fits-all model/approach that will work for all bot detection requirements. By merging and expanding on past bot detection research, we can create an easy to use "tool box" that can address several bot-detection requirements.

III. DATA

Our team used the Twitter REST and Streaming APIs to access the data used in this research effort. Details of this process are provided below.

A. Overview of Available Data

Research is loosely divided between account-focused data collection strategies and topical or stream based collection strategies. Account based approaches will only use data objects directly tied to the user (user JSON object, user time-line object, etc.). Stream-based approaches extract features from a given topical stream or Twitter stream sample. These stream based features are often network features, but represent a small fraction of the ego-centered network of a given account. Our research therefore pursues an account based approach to build a fuller representation of the account's ego network.

Researchers must find a balance between speed and richness of data. Past account focused research generally falls into four tiers. Table I provides a description for each tier of data collection, the estimated time it would require to collect this data for 250 accounts, and the amount of data that would be available for feature engineering per account.

In earlier research our team proposed a tiered approach to bot detection [7] that mirrors the data tiers introduced above. This tiered approach creates a flexible bot-detection "tool-box" with models designed for several scenarios. Some research requires bot detection at such a scale that models based on Tier 0 or Tier 1 data are the only feasible option. At other times, highly accurate classification of a few accounts is required. In these cases, models based on Tier 2 or Tier 3 data are preferred. This paper proposes an approach to Tier 2-3 bot detection that builds on the previous Tier 0 [8] and Tier 1 [7] research and relies heavily on network metrics collected through single seed snowball sampling.

TABLE I: Four tiers of Twitter data collection to support account classification (originally presented in [7])

Tier     Description          Focus              Collection Time     # of Data Entities
                                                 per 250 Accounts    (i.e. tweets)
Tier 0   Tweet text only      Semantics          N/A**               1
Tier 1   Account + 1 Tweet    Account Meta-data  ~1.9 sec            2
Tier 2   Account + Timeline   Temporal patterns  ~3.7 min            200+
Tier 3   Account + Timeline   Network patterns   ~20 hrs             50,000+
         + Friends Timeline

** This tier of data collection was presented by [25] and assumes the status text is acquired outside of the Twitter API.

B. Data required for account conversation networks

Detailed ego network modeling of a Twitter account's social interactions requires Tier 3 data collection, but to date our team has not found any research that has conducted that level of data collection to model the network structures and social conversations that an automated Twitter account interacts with. In fact, few teams go beyond the basic in-degree (follower count) and out-degree (friend count) network metrics found in Tier 1 meta-data. The closest effort to date is the Botometer model, which arguably operates at Tier 2. By adding the user timeline, Tier 2 provides limited network dynamics, to include being able to model hashtag and URL co-mentions in a

meta-network. The resulting timeline based network, however, lacks comprehensive links between alters. While the time-line can provide rich temporal patterns, we found that it lacked sufficient structure to model the ego network of an actor.

We set about to build the social network and social conversations that a Twitter account is interacting with. We also tried to do this in a way that would expedite the time it takes to collect the data and measure network metrics. Our initial goal was to collect data, build the feature space, and classify an account within 5 minutes. We selected the five minute limit in an attempt to process ~250 accounts per day with a single thread.

To collect the necessary data, we executed the following steps sequentially:
1) Collect the user data object
2) Collect the user timeline (last 200 tweets)
3) Collect the user followers (if more than 250, return a random sample of 250 followers)
4) Collect the follower timelines (last 200 tweets)

When complete, this data collection process (illustrated in Figure 1) creates up to 50,000 events (tweets) that represent the conversation and virtual social interaction that the user and their followers participate in.

Fig. 1: Illustration of 2-hop Snowball Sampling ("Conversation of Target Node and Followers": step 1, get user data and time-line; step 2, get followers (max 250); step 3, get alter time-lines; link types: following, retweet, mention, reply)
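The four sequential steps above amount to a single-seed, two-hop snowball routine. The sketch below is our own illustration, not the authors' collection code: the fetch functions are hypothetical stand-ins that read from a toy in-memory store, where in practice they would wrap rate-limited Twitter REST calls (users/show, statuses/user_timeline, followers/ids).

```python
import random

# Hypothetical stand-in for Twitter REST responses; in practice these
# would be rate-limited network requests.
FAKE_STORE = {
    "seed": {"followers": ["a", "b", "c"],
             "timeline": [{"id": 1, "text": "hello @a"}]},
    "a": {"followers": [], "timeline": [{"id": 2, "text": "hi"}]},
    "b": {"followers": [], "timeline": [{"id": 3, "text": "yo @seed"}]},
    "c": {"followers": [], "timeline": []},
}

def get_timeline(uid, count=200):
    """Step 2 and step 4: last `count` tweets of an account."""
    return FAKE_STORE[uid]["timeline"][:count]

def get_followers(uid, cap=250):
    """Step 3: follower list, randomly sampled down to `cap`."""
    followers = FAKE_STORE[uid]["followers"]
    if len(followers) > cap:
        followers = random.sample(followers, cap)
    return followers

def snowball(seed, follower_cap=250, timeline_size=200):
    """Single-seed, 2-hop snowball: user timeline, capped follower
    list, then each follower's timeline."""
    events = list(get_timeline(seed, timeline_size))   # steps 1-2
    alters = get_followers(seed, follower_cap)         # step 3
    for alter in alters:                               # step 4
        events.extend(get_timeline(alter, timeline_size))
    return alters, events

alters, events = snowball("seed")
```

With the toy store above, the snowball gathers the three followers of the seed and three tweet events in total.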
The resulting network, while partially built on social network structure (the initial following relationship), is primarily focused on the larger conversation the account participates in. We initiated the single seed snowball by querying followers rather than friends, since followers are much less controlled by the bot-herder and contain fewer news and celebrity accounts. We conducted a timeline rather than followers search for the 2nd hop of the snowball to overcome rate-limiting constraints and to model the conversation network rather than directly model the social network. This single seed snowball process conducts a limited breadth-first-search starting with a single seed and terminating at a depth of 2.

Artificially constraining the maximum number of alters to 250 was a modeling compromise that facilitates the self-imposed 5 minute collect/model time horizon. The choice of 250 allows our process to stay under 5 minutes, and also represents the upper bound of Dunbar's number (the number of individuals that one person could follow, based on extrapolations of neocortex size) [16]. Additionally, in evaluating a sample of 22 million Twitter accounts, we found that 46.6% had fewer than 250 followers. This means that approximately 50% of accounts will have their entire ego network modeled. Bots tend to have fewer followers than human accounts, and of the 297,061 annotated bot accounts that we had available for this research, 72.5% had fewer than 250 followers. Given that this compromise will only affect roughly a quarter of the bot accounts and 50% of all accounts, we felt that it was appropriate.

We used this data to create an agent to agent network where links represent one of the following relationships: mention, reply, retweet. These collectively represent the paths of information and dialogue in the Twitter "conversation". We intentionally did not add the follow/friend relationships to the network (collected in the first hop of the snowball), since follow/friend relationships are an easy metric for bot herders to simulate and manipulate with elaborate bot nets. Complex conversations, however, are much harder to simulate, even in a virtual world. Additionally, adding the following links between the ego and alters would have created a single large connected graph. By leaving them out, we were able to easily identify the natural fragmentation of the social interaction.

C. Visualizing conversations

During our initial exploration, we visualized these conversations for both human accounts and bot accounts. A comparison of these conversations is provided in Figure 2. Note that bots tend to get involved in isolated conversations, and the followers of the bot are very loosely connected. The network created from a human virtual interaction on Twitter is highly connected due to shared friendships, shared interests, and shared experiences in the real world.

Fig. 2: Differences between a human Twitter conversation(s) and a bot Twitter interaction(s): (a) human conversation, (b) bot conversation
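A minimal sketch of assembling such an agent-to-agent network with networkx follows. The simplified tweet dictionaries (author, mentions, reply_to, retweet_of) are illustrative stand-ins for the full Twitter JSON, not the paper's actual data model.

```python
import networkx as nx

def conversation_network(tweets):
    """Build a directed agent-to-agent network in which an edge
    author -> target is added for each mention, reply, or retweet.
    Follow/friend ties are deliberately excluded."""
    G = nx.DiGraph()
    for tw in tweets:
        author = tw["author"]
        G.add_node(author)
        for target in tw.get("mentions", []):
            G.add_edge(author, target, kind="mention")
        if tw.get("reply_to"):
            G.add_edge(author, tw["reply_to"], kind="reply")
        if tw.get("retweet_of"):
            G.add_edge(author, tw["retweet_of"], kind="retweet")
    return G

tweets = [
    {"author": "a", "mentions": ["b"]},
    {"author": "b", "reply_to": "a"},
    {"author": "c", "retweet_of": "a"},
    {"author": "d"},   # tweeted, but linked to no one: an isolate
]
G = conversation_network(tweets)
```

Leaving out the follow/friend links, as described above, is what lets isolates and small fragments (like node "d" here) surface as a signal of inorganic interaction.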

TABLE II: Features by data collection tier

Source: User Object (Tier 1)
  User Attributes: screen name length; default profile image?; entropy of screen name; has location?; source (binned)
  Network Attributes: number of friends; number of followers; number of favorites; total tweets
  Content: is last status a retweet?; same language?; hashtags in last status; mentions in last status; last status sensitive?; 'bot' reference?
  Timing: account age; avg tweets per day

Source: Timeline (Tier 2)
  Network Attributes: number of nodes; number of edges; density; components; largest component; degree/betweenness centrality; # of bot friends
  Content: mean/max mentions; mean/max hashtags; number of languages; fraction retweets
  Timing: entropy of inter-arrival times; max tweets M/H/D

Source: Snowball Sample (Tier 3)
  Network Attributes: number of nodes; number of links; density; number of isolates; number of dyad isolates; number of triad isolates; number of components > 4; clustering coefficient; transitivity; reciprocity; degree centrality; K-betweenness centrality; mean eigenvector centrality; number of Simmelian ties; number of Louvain groups; size of largest Louvain group; ego effective size; full triadic census
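Many of the Tier 3 snowball features in Table II map directly onto standard networkx calls. The sketch below is our own illustration of a subset of them, not the authors' feature-engineering package:

```python
import networkx as nx

def tier3_features(G):
    """Compute a subset of the Tier 3 snowball-network features
    on a directed conversation graph G."""
    und = G.to_undirected()
    comps = list(nx.connected_components(und))
    n = G.number_of_nodes()
    return {
        "n_nodes": n,
        "n_links": G.number_of_edges(),
        "density": nx.density(G),
        "n_isolates": nx.number_of_isolates(G),
        "n_dyad_isolates": sum(1 for c in comps if len(c) == 2),
        "n_triad_isolates": sum(1 for c in comps if len(c) == 3),
        "n_components_gt4": sum(1 for c in comps if len(c) > 4),
        "transitivity": nx.transitivity(und),
        "reciprocity": nx.reciprocity(G),
        "mean_degree_centrality":
            sum(nx.degree_centrality(G).values()) / max(n, 1),
        "triadic_census": nx.triadic_census(G),  # full 16-type census
    }

# Toy conversation graph: one reciprocal dyad inside a triad,
# plus a separate one-way dyad
G = nx.DiGraph([("a", "b"), ("b", "a"), ("b", "c"), ("d", "e")])
feats = tier3_features(G)
```

On this toy graph the function reports one dyad-isolate component ({d, e}) and one triad-isolate component ({a, b, c}), with reciprocity 0.5 (two of four directed links are reciprocated).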

D. Annotated Data

For annotated bot data, we used three data sets. The first data set is a large diverse bot data set that was collected by detecting 15 digit random alpha-numeric strings, as indicated in [8] (a data annotation method using a Tier 0 model). This method provided 262,097 annotated bot accounts. From this data we built network metrics on 13,352 of these accounts. The second data set is the Texas A&M Spammers Honeypot data [27], used in numerous bot research efforts. Our team found 15,743 of these accounts still active, from which we built network metrics on 1,986 accounts. Finally, we used an event-oriented data set. This data set contained 19,221 accounts that were involved in a publicized bot attack against the Atlantic Council Digital Forensic Research Lab (DFR Lab), and tangentially against the NATO Public Affairs Office [34]. This attack primarily occurred between 28 August and 30 August 2017. Immediately after the incident, our team collected tweet/user JSON data as well as user timeline data for each of these accounts, and friends/followers for 935 accounts. Since the incident, 95% of the users have been suspended, validating our collection methodology but limiting the accounts available for snowball sampling to the 935 active accounts.

In order to train a model, we also needed accounts annotated as human. We used the Twitter Streaming API to collect a sample of normal Twitter data, intentionally collecting both weekend and weekday data. This provided 149,372 accounts to tag as human Twitter accounts. Of these accounts, we were able to collect/measure network metrics on 10,692 accounts.

Past research has estimated that 5-8% of Twitter accounts are automated [36]. If this is true, then we mis-labeled a small number of our accounts as human. We believe this is acceptable noise in the data, but it will limit the performance of supervised machine learning models.

IV. METHODS

In this section we introduce our feature engineering and modeling approach.

A. Feature Engineering

We extracted features from Tier 0 through Tier 3, with a focus on measuring the importance of features extracted from Tier 3. The table of proposed features is provided in Table II.

Note that our tiered approach is cumulative, meaning the Tier 3 feature space includes features from Tier 0, Tier 1, and Tier 2. The Tier 3 model therefore includes the Tier 2 network features created by building an entity (mention, hashtag, and URL) co-mention network based only on the user's time-line (last 200 tweets). These Tier 2 network features are distinguished in our results section by the entity prefix.

We have not found research that has built a snowball sampling network for bot detection, and believe that all of the snowball sampling ego network features in our model are novel. To collect these at scale, our team built a Python package that wrapped around the networkx package [22]. We leveraged known network metrics, which are provided in Table II with references. Calculation of Simmelian ties [24] was not available in the networkx package. Our team therefore

created a Python implementation of Dekker's version [15] of the original algorithm [24].
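A simple, unoptimized sketch of the underlying idea follows, using the common definition of a Simmelian tie as a reciprocated tie embedded in at least one fully reciprocated triad. This is our illustration of the concept, not Dekker's algorithm or the authors' implementation:

```python
import networkx as nx

def simmelian_ties(G):
    """Count Simmelian ties in directed graph G: a tie (u, v) is
    counted when u and v are reciprocally connected AND both are
    reciprocally connected to at least one common third node."""
    # Undirected graph keeping only reciprocal (strong) ties
    R = nx.Graph((u, v) for u, v in G.edges() if G.has_edge(v, u))
    count = 0
    for u, v in R.edges():
        # Any common neighbor closes a fully reciprocated triad
        if set(R[u]) & set(R[v]):
            count += 1
    return count

# Toy example: one fully reciprocal triangle plus one reciprocal dyad
G = nx.DiGraph()
G.add_edges_from([("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"),
                  ("a", "c"), ("c", "a"), ("d", "e"), ("e", "d")])
```

Here the three triangle edges are Simmelian, while the reciprocal but unembedded d-e dyad is not.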

For all data sets, human data was sampled so that the classes were balanced. The random forest algorithm was used because of its superior performance on Tier 1 data [7] and its use in other bot detection algorithms [36]. Training, evaluation, and testing were conducted in the scikit-learn Python package [35]. Tuning of the random forest algorithm was conducted through random search of parameter options using 3-fold cross-validation.
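This tuning procedure can be sketched with scikit-learn's RandomizedSearchCV. The parameter grid and synthetic feature matrix below are illustrative assumptions, not the parameters or data actually used in the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for a balanced bot/human feature matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.array([0, 1] * 100)
X[y == 1] += 1.0   # make the two classes separable

# Random search over hyper-parameter options with 3-fold CV
param_dist = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "max_features": ["sqrt", "log2"],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=5, cv=3, random_state=0,
)
search.fit(X, y)
best_model = search.best_estimator_
```

Random search samples a fixed number of parameter combinations rather than exhausting the grid, which keeps tuning cheap when the feature space (and forest) is large.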

The default bot-hunter behavior returns a binary classification for the user. This differs from other research efforts (notably the Botometer approach), which return an estimated probability that the account is a bot. The likelihood estimate can assist in measuring hybridization on a continuous scale, but requires each research team to determine the threshold that they will use to delineate bots. It is difficult to compare results between research teams using this framework because many inevitably use different thresholds. It also allows research teams to choose conservative or aggressive thresholds in order to support their desired narrative. By choosing a binary classification, our model provides some degree of simplicity, reproducibility, and consistency at the cost of the additional information that a likelihood estimate provides.

V. RESULTS

After building the network metrics for all bot data sets as well as the annotated human data, we built and evaluated random forest models for each of the data sets, as well as a model built on a combination of all of the bot training data. Training, evaluating, and testing were conducted at Tier 1, Tier 2, and Tier 3. We evaluated in-sample performance with 10-fold cross-validation, measuring the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve. AUC has been used to evaluate other studies, namely the Botometer framework [36], allowing us to compare our results. We also tested generalizability by testing each trained model on out-of-sample data, namely the other data sets. Additionally, we reported both precision and recall in order to evaluate how models differ in bot vs. human classification. The results are provided in Table III and Figure 3.

Fig. 3: Results by training data (Caverlee, Nato, Random and Combined data sets) and by Tier (dot plot of ROC AUC for each training set at Tiers 1-3)

TABLE III: Table of Results

                       NATO Data                  Random Data
Model     Metric       Tier 1  Tier 2  Tier 3    Tier 1  Tier 2  Tier 3
Random    AUC          0.657   0.621   0.633     0.990   0.992   0.991
          Precision    0.869   0.903   0.897     0.987   0.990   0.989
          Recall       0.374   0.272   0.301     0.989   0.991   0.990
Combined  AUC          0.817   0.854   0.858     0.947   0.964   0.966
          Precision    0.934   0.958   0.973     0.903   0.929   0.925
          Recall       0.685   0.742   0.738     0.925   0.946   0.945

From the results presented in Table III and Figure 3, we see that Tier 1 models continue to provide solid performance, even with basic features extracted from the user profile and last status. We also observe significant prediction improvement between Tier 1 and Tier 2 for all models except those trained on the Caverlee data. We see less improvement between Tier 2 and Tier 3, though it is statistically significant for the Combined and Random data sets but not the NATO data (p-values are 0.02758, 0.04606, and 0.1404, respectively).

The high performance on the Caverlee data is primarily due to the friend and follower distributions in this data. These accounts were originally labeled in 2011, meaning they are exclusively older and mature accounts (because of this fact, we removed account age from the feature space when training or testing with the Caverlee data). These mature spam accounts have had time to develop large followings. The median number of followers for the Caverlee bots is 1190, whereas the median number of followers for the human sample is 287 and for the random bots is 53. All three Caverlee models (Tier 1, 2, and 3) were able to separate bots from humans based almost solely on the follower and friend count, resulting in equally high performance for all tiers on this data.

The final and important observation that we see in Table III is that out-of-sample classification (training on one type of bot data and testing on another type of bot data) resulted in lower performance. In Table III we see that training on Random String data and testing on NATO data only found 37.4% of the bots at Tier 1 and 30.1% at Tier 3, as measured by recall. This result underscores the need for relevant training data for any given bot application. It also subtly highlights that Tier 2 and Tier 3 are more prone to over-fitting.
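The in-sample (10-fold cross-validated AUC) and out-of-sample (train on one bot data set, test on another) evaluation scheme described above can be sketched as follows, with synthetic stand-ins for the annotated data sets:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

def make_data(shift, n=300):
    """Synthetic stand-in for one balanced annotated data set."""
    X = rng.normal(size=(n, 8))
    y = np.array([0, 1] * (n // 2))
    X[y == 1] += shift
    return X, y

X_a, y_a = make_data(shift=1.5)   # stand-in for one bot data set
X_b, y_b = make_data(shift=0.3)   # stand-in for a harder, different one

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# In-sample performance: 10-fold cross-validated ROC AUC
auc_in = cross_val_score(clf, X_a, y_a, cv=10, scoring="roc_auc").mean()

# Out-of-sample performance: train on one data set, test on the other,
# reporting precision and recall on the bot class
clf.fit(X_a, y_a)
pred_b = clf.predict(X_b)
precision_out = precision_score(y_b, pred_b)
recall_out = recall_score(y_b, pred_b)
```

As in Table III, recall on the transferred data set is the metric that exposes how many bots of an unfamiliar type the model actually finds.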

Center for Computational Analysis of Social and Organization
Systems (CASOS). The views and conclusions contained in
this document are those of the authors and should not be in-
terpreted as representing the official policies, either expressed
or implied, of the ONR, ARL, DTRA, or the U.S. government.
R EFERENCES
[1] Kayode Sakariyah Adewole, Nor Badrul Anuar, Amirrudin Kamsin,
Kasturi Dewi Varathan, and Syed Abdul Razak. Malicious accounts:
dark of the social networks. Journal of Network and Computer
Applications, 79:41–67, 2017.
[2] Luca Maria Aiello, Martina Deplano, Rossano Schifanella, and Gian-
carlo Ruffo. People are strange when youre a stranger: Impact and
influence of bots on social networks. Links, 697(483,151):1–566, 2012.
[3] Abdullah Almaatouq, Erez Shmueli, Mariam Nouh, Ahmad Alabdulka-
reem, Vivek K Singh, Mansour Alsaleh, Abdulrahman Alarifi, Anas
Alfaris, et al. If it looks like a spammer and behaves like a spammer,
it must be a spammer: analysis and detection of microblogging spam
accounts. International Journal of Information Security, 15(5):475–491,
Fig. 4: Top 15 Features for Combined Tier 3 Model 2016.
[4] A. Ananthalakshmi. Ahead of malaysian polls, bots flood twitter with
pro-government..., Apr 2018.
[5] Matthew Benigni and Kathleen M Carley. From tweets to intelligence:
Further, in Figure 4 we see the top 15 features and the percentage that each contributed to the predictions of the Combined Tier 3 Model. Network features are among the strongest in the model, demonstrating that these values, while tedious to collect, transform, and model, yield predictive signals that are difficult for a bot puppet master to manipulate. In particular, centrality measures (especially graph betweenness) and the sizes of groups/components are strong predictors of automated behavior.
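These network metrics can be illustrated with a self-contained sketch. The toy edge list below is an illustrative assumption (the actual pipeline computes such metrics with networkx over the snowball-collected conversation network); the sketch computes unnormalized betweenness centrality via Brandes' algorithm together with the count and sizes of connected components.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm)
    for an undirected graph given as {node: set(neighbors)}."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0); sigma[s] = 1
        dist = dict.fromkeys(adj, -1); dist[s] = 0
        queue = deque([s])
        while queue:                       # BFS, counting shortest paths
            v = queue.popleft(); stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1; queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]; preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                       # back-propagate dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}  # undirected: halve

def component_sizes(adj):
    """Sizes of connected components (the 'groups' in the feature set)."""
    seen, sizes = set(), []
    for s in adj:
        if s in seen:
            continue
        queue, size = deque([s]), 0
        seen.add(s)
        while queue:
            v = queue.popleft(); size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w); queue.append(w)
        sizes.append(size)
    return sizes

# Toy conversation network around a seed account "ego",
# plus a second, disconnected conversation (d, e).
edges = [("ego", "a"), ("ego", "b"), ("a", "b"), ("ego", "c"), ("d", "e")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

bc = betweenness(adj)
sizes = component_sizes(adj)
features = {
    "max_betweenness": max(bc.values()),   # the ego brokers paths to "c"
    "n_components": len(sizes),
    "largest_component": max(sizes),
}
```

Because the ego account is the only bridge to "c", its betweenness dominates; metrics like these reflect how an account is embedded in its conversations, which is harder for a bot operator to forge than profile fields.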
VI. CONCLUSION AND FUTURE WORK

In our pursuit of a multi-model bot detection toolbox, this paper builds on past research by adding a model that leverages a feature space extracted from 50,000+ entities collected with single-seed snowball sampling. This model is developed for high-accuracy but low-volume applications. Our research shows that supervised machine learning models are able to leverage these added network metrics to increase prediction performance over Tier 1 models. Additionally, these network features offer an approach for modeling and detecting bot behavior that is difficult for bot puppet-masters to manipulate and evade. We also found that selection of relevant training data is as important as model selection: training on one class of bots and testing on another class produced limited predictive power for all models.

Future research will explore additional content, profile, and temporal features available from the single-node snowball data that we produced with this research. Additionally, future research will seek to build a multi-class bot detection model.
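Single-seed snowball sampling of the kind used for the data collection can be sketched as a bounded breadth-first expansion from one account. The `get_connections` callable below is a hypothetical stand-in for the Twitter API calls, and the budget parameters are illustrative.

```python
from collections import deque

def snowball(seed, get_connections, max_entities=50_000, max_depth=2):
    """Single-seed snowball sample: starting from one seed account,
    repeatedly pull each collected account's connections until the
    entity budget or the depth budget is exhausted."""
    collected = {seed}
    frontier = deque([(seed, 0)])
    while frontier and len(collected) < max_entities:
        account, depth = frontier.popleft()
        if depth >= max_depth:
            continue  # collected, but not expanded further
        for neighbor in get_connections(account):
            if neighbor not in collected:
                collected.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return collected

# Toy stand-in for the API: a small, fixed follower graph.
toy_graph = {
    "seed": ["a", "b"],
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["e"],  # "c" sits at depth 2, so it is never expanded
}
sample = snowball("seed", lambda u: toy_graph.get(u, []))
```

In practice the expansion would be rate-limited and the depth budget kept small, since the frontier grows geometrically with each hop.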
ACKNOWLEDGMENT

This work was supported in part by the Office of Naval Research (ONR) Multidisciplinary University Research Initiative Award N000140811186 and Award N000141812108, the Army Research Laboratory Award W911NF1610049, Defense Threat Reduction Agency Award HDTRA11010102, and the Center for Computational Analysis of Social and Organizational Systems (CASOS). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the ONR, ARL, DTRA, or the U.S. government.