The Presence of Bots and Cyborgs in The #FeesMustFall Campaign

The Presence of Twitter Bots and Cyborgs in the #FeesMustFall Campaign

Yaseen Khan
Department of Information Technology, Durban University of Technology
Durban, South Africa
khanyas786@gmail.com

Surendra Thakur
KZN eSkills CoLab, Durban University of Technology
Durban, South Africa
thakur@dut.ac.za

Abstract: Internet platforms such as Twitter allow cause-related campaigning as well as analysis through the opportunistic classification and aggregation capability provided by the hashtag (#). South African students leveraged Twitter to launch and sustain a campaign now known as the #FeesMustFall campaign. This campaign aimed to lobby government to provide free university education to disadvantaged students. This study examines the #FeesMustFall campaign to determine if automated software robots played a role. The research question was “Did bots and cyborgs play a role in the #FeesMustFall campaign?” A total of 576 823 tweets were harvested, and the data was cleaned by removing duplicate entries. The remaining 490 449 tweets and 90 783 unique users were used to analyze tweet behavior in terms of frequency, volume, content and tweet source. The results show that bots and cyborgs did indeed play a role. This is a significant finding as #FeesMustFall is the first major South African campaign to leverage bots and cyborgs. An important additional finding was that the DeBot API revealed 4 bots not found in our harvested tweets, while the other trait-driven techniques identified suspicious accounts, including two bot or cyborg accounts ranked 1st and 2nd amongst the highest tweeters. This demonstrated a presence of bots during the campaign that assisted in the amplification of the #FeesMustFall hashtag on Twitter.

Keywords: Social Robots, Software Robots, bots, cyborgs, #FeesMustFall

I. INTRODUCTION

South Africa has recently experienced a unique university student-driven social activists’ campaign called #FeesMustFall which dominated local and social media platforms. #FeesMustFall is a youth movement whose objective is to reduce university fees with the aim of achieving free education for all [1]. The use of Social Media has grown exponentially this past decade, nurturing online activism or ‘slacktivism’. This use has allowed research, particularly on Twitter, primarily due to the classification and aggregation capability offered by the hashtag (#). The hashtag allows a topic to be easily shared with users worldwide by enabling others to join the conversation through simply using the same hashtag. This provides the opportunity for easy tracking of topics such as #FeesMustFall. Further, Twitter allows for the development of software robots (bots) or cyborgs to either rebut or amplify a particular point of view pursuant to the views of the originator. #FeesMustFall became desirable to investigate on social media as it is the first major youth-driven online campaign in the age of the Fourth Industrial Revolution (4IR) within South Africa, generating significant media coverage through headlines bannering ‘#FeesMustFall’.

The paper next introduces and discusses the applicability of Twitter for analysis of social media campaigns (II) as well as related works on bot detection and deployment (III). The data collection and processing (IV) will then be explained, followed by the methodology for identifying automation by users (V), findings and discussion (VI), future works (VII) and the conclusion (VIII).

II. TWITTER AND #FEESMUSTFALL

Twitter proved particularly useful because of the classification and aggregation capability of hashtags. Our particular area of interest with the Twitter platform was Twitter bots and cyborgs and whether they exerted any influence during the campaign. Twitter bots and cyborgs, which belong to the broader spectrum of Social Robots, have impacted campaigns worldwide, with the most famous being the 2016 United States presidential campaign [2] and the “Arab Spring” [3]. In South Africa, Twitter has been successfully used to assist the rescue of a ‘carjacked victim’ after the hostage tweeted from the trunk of his hijacked car to his girlfriend, who in turn retweeted the message. The message went viral and the victim was saved within 3 hours [4].

Lobbyists, activists and hackers use or deploy bots for their agenda by attempting to manipulate social media users’ opinions through automated social engagements. The agenda may be personal, cause-related, political or financial. This paper attempts to highlight the presence of Twitter bots and cyborgs without discussing their textual context or interpreting their intentions, which is a social study. Social media platforms contain huge amounts of unstructured opinionated data [5] which require unique analytics to uncover rich information. This is a reason why Twitter and Facebook, inter alia, gained the interest of researchers.

Further, Twitter was selected as the preferred platform for this task due to the convenient way in which Twitter data (tweets) are categorized, which simplifies searching, filtering, downloading and analysis. The timeline determined for the collection of Twitter data was from the first mention of #FeesMustFall, on 21 March 2015, until 10 April 2017, when the data was received from a professional service provider, Podargos [6], rather than Twitter itself due to high comparative costs [7]. During the data analysis, intriguing social behaviour exhibited by top tweeting users was found with respect to, inter alia, their frequency, volume and content. Upon further investigation certain users displayed automated behaviour. Examples included tweeting messages at fixed constant intervals (Fig. 1), while another included tweeting multiple times per second for several seconds, producing an abnormal burst in tweeting activity (Fig. 2). This paper refers to burst mode tweeting as the posting or retweeting of many tweets within a very short space of time.
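The two behaviours described above, fixed-interval tweeting and burst mode tweeting, can be illustrated with a minimal sketch. It is not the procedure used in this study; the function names `interval_spread` and `max_burst` and the 10-second window default are hypothetical choices made for illustration. A spread close to zero suggests posting on a fixed schedule, and a high count inside a short window suggests a burst.

```python
from datetime import datetime, timedelta
from statistics import pstdev


def interval_spread(timestamps):
    """Standard deviation (in seconds) of the gaps between consecutive tweets.

    Returns None when there are fewer than two gaps to compare.
    """
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    return pstdev(gaps) if len(gaps) >= 2 else None


def max_burst(timestamps, window_seconds=10):
    """Largest number of tweets that fall inside any window of window_seconds."""
    ordered = sorted(timestamps)
    best, start = 0, 0
    for end, ts in enumerate(ordered):
        # Slide the left edge forward until the window is short enough.
        while (ts - ordered[start]).total_seconds() >= window_seconds:
            start += 1
        best = max(best, end - start + 1)
    return best


# Hypothetical data: one account posting every 5 minutes, another posting
# 21 tweets in about 8 seconds.
base = datetime(2016, 10, 1, 12, 0, 0)
scheduled = [base + timedelta(minutes=5 * i) for i in range(12)]
bursty = [base + timedelta(seconds=0.4 * i) for i in range(21)]

regularity = interval_spread(scheduled)
burst_size = max_burst(bursty)
```

Any cut-off applied to these two values (how small a spread, how large a burst) would be a further assumption and would need to be validated against accounts reviewed by hand.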
978-1-5386-6477-3/18/$31.00 ©2018 IEEE
Figure 1. Timeline of Tweets in a one-hour scale for User 1

Figure 2. Volume of Tweets within 10 seconds by User 2

Fig. 1 and Fig. 2 depict users with automated characteristics such as tweeting at specific intervals and posting multiple tweets per second for several seconds. These tweeting characteristics are synonymous with Twitter Bots (bots) and Cyborgs [8]. On Twitter, a bot is software that is specially written to troll Social Media and, inter alia, to amplify certain tweets or to repeatedly tweet a boutique of tweets either in burst mode or with a certain time-based frequency. This is what piqued the researchers’ interest in identifying the nature and extent of the impact of Twitter Bots and Cyborgs within our dataset. The 4 methods used to identify automated social behaviour in Twitter from our data are explained in the Methods section. This paper has significance as it is the first paper to identify Twitter Bots and Cyborgs within the #FeesMustFall campaign on Twitter.

Bots and cyborgs have several characteristics ranging from simple to highly complex [9]. The simple bot, for example, may tweet exactly every 15 minutes up to a predetermined maximum number that slips under Twitter’s detection of spam in order to avoid suspension, and does not reply to tweets. The complex bot, by contrast, may exhibit random tweet patterns, mimic tweets from several different users on a specific topic and reply to tweets. The detection of bots used in this paper begins with a simple technique and moves to iteratively more complex methods, with the future aim of building and developing methods to detect complex bots and cyborgs.

III. BOT DEPLOYMENT AND DETECTION

In the DARPA Twitter bot challenge [10] parties were tasked to develop efficient bot detection approaches on Twitter using any approach they desired. It was found that machine learning techniques on their own were insufficient because of the scarcity of training data, and that semi-automated processes that incorporated machine learning were more useful. Popular techniques in detecting bots and cyborgs make use of entropy, spam and account properties components [8], [10], [11], [12], [13]. This is built on the premise, and subsequent research, that activity by human users differs significantly from that of bots and cyborgs [8]. A warped correlation finder named DeBot was developed by [14]. It detects highly synchronous user accounts over a long period in Twitter and rests on the observation that human users cannot achieve such levels of synchronisation over lengthy periods, so detected accounts are much more likely to be bots. The authors have made access to this finder public via an Application Programming Interface (API) [15], which has been used in this research to identify bots on Twitter during the #FeesMustFall campaign.

As outlined by [8], several features were used in the detection of bots and user accounts, which are separated into 3 categories, with ‘Cyborg’ added to the usual human and bot classification. The researchers made use of an entropy-based component, a spam detection component, an account properties component and a decision maker component to identify bots, cyborgs and humans. Our research, however, focuses on a technique based on volume, frequency and source of tweets to identify automation of tweets. Another publicly available online service, Botometer (previously BotOrNot), evaluates the degree to which a Twitter account displays similarity to characteristics of known social bots by leveraging at least one thousand features which are grouped into Network, User, Friends, Temporal, Content and Sentiment classes [11].

The conjecture in South Africa was that another campaign related to a political matter was the first evidence of Bot activity. Bots in themselves are amoral: they are programmed to represent the views of their developers. The developers, on the other hand, may be perceived as positive, negative or mercenary depending on a participant’s view.

IV. DATA COLLECTION AND PROCESSING

Tweets were collected from the first mention of the hashtag, #FeesMustFall, in October 2015, until early April 2017. A total of 576 823 tweets were retrieved by purchase from a professional service provider. Limitations exist in accessing the Twitter API to retrieve tweets, and the cost of retrieving historical data from Twitter is high [7]. Data went through a cleansing process whereby duplicate, perceived erratic and unintelligible data were removed. This left 490 449 tweets which were stored in a database. Formally, each tweet (data point) that was analysed comprised metadata that included the Tweet text; Date Timestamp; Username; Favorited/Liked; Retweeted; Tweet Source and User Language. Videos and images were excluded from the collection and the study.

V. METHODOLOGY FOR IDENTIFYING AUTOMATION BY USERS

To determine the presence of bots or cyborgs 3 different methods have been identified to filter out suspicious bot and cyborg activity for further analyses. An existing bot detecting
technique developed by the DeBot team will complement these methods. Method A detects multiple tweets per instance, Method B detects more than one instance of content duplication of tweets, and Method C detects the percentage of tweet sources per user that stem from automated software. Method D uses the API from DeBot. Results from Methods A, B and C will then be sampled and analyzed by the authors to assess whether or not there has indeed been bot or cyborg activity based on frequency, content and volume of tweets.

These methods are as follows:

Method A: Identifying at least 2 tweets by a user on a single timestamp. The assumption is that a human user is incapable of tweeting or retweeting more than once on a single timestamp without automated assistance.

Let a tweet be $tw_i$, where $i \in \mathbb{N}$.

Let the user of a tweet be $u_j = tw_i.\mathrm{user}$, where $j \in \mathbb{N}$.

Let the timestamp of a tweet be $ts_k = tw_i.\mathrm{timestamp}$, where $k \in \mathbb{N}$.

Each timestamp is representative of the Gregorian Calendar and Coordinated Universal Time (UTC) and follows the format (year/month/day hour:minute:second).

A user $u_j$ is flagged if, for some timestamp $ts_k$,

$$\left|\{\, tw_i : tw_i.\mathrm{user} = u_j,\ tw_i.\mathrm{timestamp} = ts_k \,\}\right| > 1$$

Method B: Identifying users with more than 29 tweets whose tweets are duplicated more than once (i.e. a user that has more than one instance of duplicate tweets).

The number 30 was deliberately chosen by the authors for reasons of space as well as to create a manageable dataset for further investigation. Due to the schedule and trigger mechanisms utilised by bots and cyborgs, the assumption is made that duplicate tweets per user appear more frequently in automation-assisted accounts than in human accounts. The technique was used on users with a minimum of 30 tweets in our dataset. The assumption is made that an account whose total number of tweets exceeds 29 and whose duplicate tweets comprise at least 30% of its total tweets in a determined period is spamming using automation.

Let each unique tweet of a user $u$ be $tw_i$, where $i \in \mathbb{N}$.

Let $U(tw)$ be the total number of unique tweets of user $u$, and let $N(tw)$ be the total number of tweets of user $u$.

A user $u$ is assumed to be a bot or cyborg if:

$$N(tw) \ge \frac{10 \cdot U(tw)}{7}, \quad \text{and} \quad N(tw) \ge 30$$

The first condition is equivalent to the unique tweets making up at most 70% of the total, that is, the duplicate tweets making up at least 30%.

Method C: Identifying users who have posted most of their tweets from known automated applications. In order to amplify tweet volume and engagement, cyborgs and bots make use of trigger mechanisms that automate tasks on Twitter such as retweeting, posting messages, following and replying to posts. This involves applications and software to accomplish such tasks. The data was filtered for users who posted at least 30 tweets within the data, and if 70% or more of their tweets come from the automated sources listed below then the assumption is made that these accounts are cyborgs or bots. The parameters for this method have been chosen to increase the likelihood of filtering out automated accounts and to create a manageable set of users for analysis. The tweet source was part of the metadata retrieved from the timeline of tweets collected.

For this paper, “IF This Then That” (IFTTT), Hootsuite, TweetDeck, Tweetcaster and Buffer have been outlined as the set of automated applications for analysis. IFTTT is an applet creator that provides the ability to automate tweeting and retweeting on Twitter [16]. Hootsuite is a Social Media manager that primarily provides the ability to schedule tweets on Twitter [17]. TweetDeck is a Social Media application for managing multiple Twitter accounts and has scheduling of tweets as a feature [18]. Tweetcaster is an application for Twitter users to manage their accounts and has the ability to schedule tweets [19]. Buffer is a Social Media manager aimed at businesses that provides the ability to schedule tweets on Twitter amongst other automated features [20].

Let $A = \{\text{IFTTT};\ \text{Hootsuite};\ \text{TweetDeck};\ \text{Tweetcaster};\ \text{Buffer}\}$.

Let the number of times an automating tweet source $s \in A$ appears for the $j$th user be summed and denoted as $S(u_j)$, and let $N(u_j)$ be the total number of tweets of the $j$th user, where $j \in \mathbb{N}$.

A user $u_j$ is assumed to be a bot or cyborg if:

$$\frac{S(u_j)}{N(u_j)} \ge 0.7, \quad N(u_j) > 29$$

Method D: Use the DeBot API to identify Twitter Bots with the keyword ‘#Feesmustfall’. The DeBot API is public and is useful in detecting Bot accounts; ‘your_api_key’ is given by the DeBot team upon completion of registration on their website [21]. To find bots for the #FeesMustFall campaign, the Python code used was as follows:

    import debot

    # 'your_api_key' is the key issued by the DeBot team on registration
    db = debot.DeBot('your_api_key')
    # request the bot accounts DeBot relates to the campaign hashtag
    db.get_related_bots('#FeesMustFall')

VI. FINDINGS AND DISCUSSION

Method A flagged a total of 283 users as either bots or cyborgs, while Method B and Method C returned 6 and 135 bot- or cyborg-prone accounts respectively for further analysis. Method D returned a total of 4 bot accounts from the DeBot API. Bot and cyborg detection requires varying methods as there are several types of bots and cyborgs that exhibit different character traits. It is therefore unsurprising that results vary, as research in this area is ongoing to combat evolving bots and cyborgs.
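The filtering rules of Methods A, B and C in Section V can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: tweets are assumed to be dictionaries with hypothetical keys `user`, `timestamp`, `text` and `source`, duplicates are assumed to mean exact text matches, and the source field is assumed to match the application names exactly.

```python
from collections import Counter, defaultdict

# Automated applications named in Method C (exact-match assumption).
AUTOMATED_SOURCES = {"IFTTT", "Hootsuite", "TweetDeck", "Tweetcaster", "Buffer"}


def group_by_user(tweets):
    """Collect each user's tweets into a list."""
    grouped = defaultdict(list)
    for tweet in tweets:
        grouped[tweet["user"]].append(tweet)
    return grouped


def method_a(tweets):
    """Users with at least two tweets on a single timestamp."""
    per_slot = Counter((t["user"], t["timestamp"]) for t in tweets)
    return {user for (user, _), count in per_slot.items() if count > 1}


def method_b(tweets, min_tweets=30, min_duplicate_share=0.3):
    """Users with >= min_tweets tweets whose duplicates reach the given share."""
    flagged = set()
    for user, rows in group_by_user(tweets).items():
        total = len(rows)
        unique = len({t["text"] for t in rows})
        if total >= min_tweets and (total - unique) / total >= min_duplicate_share:
            flagged.add(user)
    return flagged


def method_c(tweets, min_tweets=30, min_automated_share=0.7):
    """Users with >= min_tweets tweets mostly posted from automated sources."""
    flagged = set()
    for user, rows in group_by_user(tweets).items():
        total = len(rows)
        automated = sum(t["source"] in AUTOMATED_SOURCES for t in rows)
        if total >= min_tweets and automated / total >= min_automated_share:
            flagged.add(user)
    return flagged


def make(user, n, text=lambda i: f"t{i}", source=lambda i: "Twitter Web Client"):
    """Build n hypothetical tweets for one user, one per second."""
    return [{"user": user, "timestamp": f"2016/10/01 12:00:{i:02d}",
             "text": f"{user}-{text(i)}", "source": source(i)} for i in range(n)]


sample = (
    make("human", 30)
    + make("dup_user", 30, text=lambda i: f"t{i % 10}")  # 20 of 30 are duplicates
    + make("app_user", 30, source=lambda i: "Buffer" if i < 25 else "Twitter Web Client")
    + [{"user": "twin", "timestamp": "2016/10/01 13:00:00", "text": f"x{k}",
        "source": "Twitter Web Client"} for k in range(2)]  # same timestamp twice
)
```

Each function returns a set of candidate accounts only; as in the paper, the flagged accounts would still need to be sampled and reviewed before being treated as bots or cyborgs.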
A number of users were detected more than once by the varying methods, with a few of them belonging to the Top 10 users with the highest number of tweets in the dataset, such as User 1 and User 2 in Table 1. Method D revealed users that were not in the original dataset. This could be due to several reasons, such as Twitter suspending their accounts when the data was collected, or the accounts being removed by their owners. The top 10 users with the highest number of tweets were compared to the users detected by Methods A, B and C in order to examine the possibility of bots or cyborgs significantly contributing, in terms of volume, to the discussion on Twitter using #FeesMustFall.

TABLE 1. TOP 10 USERS WITH THE HIGHEST NUMBER OF TWEETS AND THEIR RESPECTIVE NUMBER OF HASHTAGS, UNIFORM RESOURCE LOCATORS (URLS) AND RETWEETS

User Name   No. of hashtags (#)   No. of URLs in Tweet   Number of Tweets   Retweets
User 1      63817                 15362                  15403              242
User 2      13665                 4111                   7018               319
User 3      15684                 2215                   2318               1025
User 4      5330                  2294                   2258               70
User 5      7355                  2221                   2193               100
User 6      3388                  712                    2063               7185
User 7      2206                  969                    1739               28974
User 8      1561                  1096                   1053               12034
User 9      1297                  298                    1041               6622
User 10     2590                  366                    949                4171

In Table 1, User 1 and User 2 were ranked 1st and 2nd respectively amongst the highest tweeters and also appear amongst the users found using Methods A, B and C. This prompted deeper analyses into the behaviour of these users in terms of frequency, content and volume of tweets. Fig. 1 depicts the pattern of all the tweets from User 1 within a 60-minute scale, and it can be seen that the user generally tweets in 5 minute intervals, which is consistent with a simple bot or cyborg account that is programmed to schedule posting of tweets. Table 1 reflects that on average a tweet from User 1 will comprise approximately 4 hashtags and 1 URL, and will not be retweeted. Combining what is found for User 1 in Fig. 1 and Table 1, it can therefore be concluded that User 1 is a bot or cyborg. Fig. 2 displays a snapshot of the tweeting behaviour of User 2 within a 10 second timeframe. It can be seen that there are multiple tweets per second and a total of 21 tweets within 10 seconds, which is a prime indicator of automated behaviour. User 1 and User 2 are therefore deemed to be bots or cyborgs among the 90783 unique users, and they contributed approximately 22413 (4.57%) of the 490449 tweets. This is a significant finding about the activity during the #FeesMustFall posts on Twitter, as it suggests that social media analysts who aim to get an overview of public sentiment on a specific topic such as this may be led into inaccurate interpretations if they do not consider the presence of bots and cyborgs as well as their perceived influence.

The aim of the study was to identify the presence of bots and cyborgs during the #FeesMustFall campaign; it is therefore sufficient that at least one bot or cyborg account be found within the dataset, and not all of them. In addition, the methods used to filter users into possible suspicious bot or cyborg accounts were based on the general characteristics of simple bot or cyborg accounts [8]. These methods were not tested for performance as this study does not focus on developing bot or cyborg detection techniques.

A. Limitations of the Study

Data collected on a topic at one point in time is not necessarily the same as data collected on the same topic at a different time, because tweets and accounts may be deleted, suspended, removed or changed from public to private, and metadata may be affected by changes made by users. The methods used do not reflect all bot or cyborg accounts and behaviours in the data.

B. Ethics of the study

The actual names of the users were purposely masked to avoid unnecessary ethical issues that might arise from revealing them. Also, the actual content of the tweets from the identified bot or cyborg accounts was not depicted, to prevent tracking of these users as some of them may still be active.

VII. FUTURE WORKS

Bot and cyborg identification is a complex area which requires many techniques and algorithms to produce efficient methods to detect and filter out such accounts for social media analytics. Some bot writers will write software to strategically evade detection. Indeed, as [14] suggest, humans do not exhibit synchronous behaviour for long periods of time on platforms such as Twitter. The question for future research then concerns the characteristic length of time: how long is long, and do different demographics exhibit different types of engagement? A further goal is to create a real-time bot detector for Twitter using more advanced features including timeline and volume of tweets. Can an account be actively defended against bots?

A triangulation of real-world events with bot and cyborg activity on social networks during cyber-physical campaigns may assist in identifying the influential bots and cyborgs, and by analysing their corresponding sentiment and content researchers could narrow down and predict their intentions. This is important because it shows that not all social media campaigns are entirely human-driven. Identifying the influence of bots may mitigate complex challenges and assist antagonists to reach consensus more quickly.

VIII. CONCLUSION

This paper presents the detection of bots and cyborgs associated with the popular campaign, #FeesMustFall, on Social Media. Bots may not be a new phenomenon internationally; however, this occurrence is a reasonably new phenomenon for South Africa. By using basic methods to filter out and analyse Twitter accounts for cyborg and bot activity, the researchers have concluded that there was indeed bot and/or cyborg activity during the #FeesMustFall campaign on Twitter. Some of these accounts have had a significant impact on Twitter in terms of amplifying the #FeesMustFall hashtag during the campaign. The important contribution of this study is the detection of bots and cyborgs in #FeesMustFall. This is a unique finding for South Africa, where there are no known papers or articles revealing such activity during campaigns of this nature prior to this. The presence of bots and cyborgs during campaigns such as #FeesMustFall poses concerns for governments and relevant stakeholders, since malicious or other types of bots can be created in attempts to influence public opinion, which can affect elections and campaigns alike.
REFERENCES

[1] S. Booysen, Fees Must Fall: Student revolt, decolonisation and governance in South Africa. Johannesburg: Wits University Press, 2016.
[2] A. Bessi and E. Ferrara, "Social bots distort the 2016 U.S. Presidential election online discussion", First Monday, vol. 21, no. 11, 2016.
[3] S. Woolley, "Automating power: Social bot interference in global politics", First Monday, vol. 21, no. 4, 2016.
[4] R. Millham and S. Thakur, "Social Media and Big Data", in The Human Element of Big Data: Issues, Analytics, and Performance, pp. 179-194, 2016.
[5] A. Gandomi and M. Haider, "Beyond the hype: Big data concepts, methods, and analytics", International Journal of Information Management, vol. 35, no. 2, pp. 137-144, 2015.
[6] "Welcome to podargos data services!", Podargos.com, 2018. [Online]. Available: https://www.podargos.com/. [Accessed: 10-Sep-2018].
[7] "Pricing", Developer.twitter.com, 2018. [Online]. Available: https://developer.twitter.com/en/pricing.html. [Accessed: 08-Sep-2018].
[8] Z. Chu, S. Gianvecchio, H. Wang and S. Jajodia, "Detecting Automation of Twitter Accounts: Are You a Human, Bot, or Cyborg?", IEEE Transactions on Dependable and Secure Computing, vol. 9, no. 6, pp. 811-824, 2012.
[9] E. Ferrara, O. Varol, C. Davis, F. Menczer and A. Flammini, "The rise of social bots", Communications of the ACM, vol. 59, no. 7, pp. 96-104, 2016.
[10] V. Subrahmanian, A. Azaria, S. Durst, V. Kagan, A. Galstyan, K. Lerman, L. Zhu, E. Ferrara, A. Flammini and F. Menczer, "The DARPA Twitter Bot Challenge", Computer, vol. 49, no. 6, pp. 38-46, 2016.
[11] C. Davis, O. Varol, E. Ferrara, A. Flammini and F. Menczer, "BotOrNot: A System to Evaluate Social Bots", Proceedings of the 25th International Conference Companion on World Wide Web - WWW '16 Companion, 2016.
[12] C. Zhang and V. Paxson, "Detecting and Analyzing Automated Activity on Twitter", Passive and Active Measurement, pp. 102-111, 2011.
[13] A. Wang, "Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach", Lecture Notes in Computer Science, pp. 335-342, 2010.
[14] N. Chavoshi, H. Hamooni and A. Mueen, "DeBot: Twitter Bot Detection via Warped Correlation", 2016 IEEE 16th International Conference on Data Mining (ICDM), 2016.
[15] N. Chavoshi, "nchavoshi/debot_api", GitHub, 2018. [Online]. Available: https://github.com/nchavoshi/debot_api. [Accessed: 10-Sep-2018].
[16] "IFTTT", Ifttt.com, 2018. [Online]. Available: https://ifttt.com/twitter. [Accessed: 10-Sep-2018].
[17] Hootsuite Inc., "Scheduling - Social Media Marketing & Management Dashboard - Hootsuite", Hootsuite, 2018. [Online]. Available: https://hootsuite.com/platform/scheduling#. [Accessed: 10-Sep-2018].
[18] "TweetDeck", Tweetdeck.twitter.com, 2018. [Online]. Available: https://tweetdeck.twitter.com/. [Accessed: 10-Sep-2018].
[19] "TweetCaster for Twitter", Tweetcaster.com, 2018. [Online]. Available: http://tweetcaster.com/. [Accessed: 10-Sep-2018].
[20] "Social Media Management Platform | Buffer", Buffer.com, 2018. [Online]. Available: https://buffer.com/. [Accessed: 10-Sep-2018].
[21] "DeBot", Cs.unm.edu, 2018. [Online]. Available: https://www.cs.unm.edu/~chavoshi/debot/api.html. [Accessed: 10-Sep-2018].
