Detection and Resolution of Rumours in Social Media

Detection and Resolution of Rumours in Social Media: A Survey
Introduction
• Growing use of social media as a news-gathering platform.
• Content is unmoderated, which allows misinformation to spread.
• Anyone can share their thoughts in real time.
• Concrete proof to establish veracity is often absent.
• Aim: provide an overview of research on the topic.
Rumours - characteristics
DEFINITION –
• FALSE - “unverified and instrumentally relevant information statements in circulation”
• TRUE - “an item of circulating information whose veracity status is yet to be verified at the time of posting.”
Rumour – unverified: there is no official statement and/or evidence confirming or denying it.
RUMOUR TYPE – in a rumour classification system, the factor that largely determines which approaches can be used is the rumour's temporal characteristics:
• New rumours that emerge during breaking news.
  • Arise in the context of breaking news and have not been observed before, so training data may differ from the current situation.
  • Early resolution is crucial; posts must be processed in real time.
  • E.g. identification of terrorists.
• Long-standing rumours discussed over a long time.
  • Circulate for a long time without their veracity being established.
  • A classifier need not detect such rumours, as they may be known a priori.
History – early studies to social media
• Studied from many different perspectives
• Pamela Donovan. 2007. How idle is idle talk? One hundred years of rumor research.
• Psychological studies
• Computational analysis
Traditionally, people’s reactions to rumours were difficult to study because those reactions happen in real time.
• According to Allport and Postman (1946, 1947):
  • More newsworthy events give rise to more rumours.
  • Amount of rumours = Importance of subject × Ambiguity of subject
  • Official announcements are very important for curbing rumours (study, 1947).
Early research -> believability is not a factor.
Current trends -> the more believable a rumour, the more it spreads; the least important rumours spread more.
Medium of spread
• Internet & social media –
  • Powerful tools to spread information.
  • Large number of sources.
  • Ease of sharing.
According to research, Twitter debunks misinformation. Users share opinions, conjectures and evidence.
Carlos Castillo, Marcelo Mendoza, and Barbara Poblete (2013) found that the ratio between tweets supporting and debunking false rumours was 1:1.
Even if users support false rumours at first, self-correction takes place
• with time
• as evidence increases.
Scope and organisation
• Social media – aggregates the judgement of a large user base.
  • In the early stages, the overall tendency is to support false rumours.
  • With time, support shifts towards true information and the debunking of false rumours.
• Social media –
  • Useful and open, with ease of use and a lack of moderation.
  • This leads to problems of information quality assurance.
  • Creates a sense of unease and potential harm.
Challenges posed by rumours
• Domains –
  • News gathering –
    • Great potential for news diffusion, occasionally outpacing professional news outlets.
    • Updates from eyewitnesses.
    • Reports CAN BE FALSE, posted with the aim of gaining popularity.
  • Emergencies and crises –
    • Increased use during crises.
    • Helps locate where help is needed.
    • Platform for real-time updates and coordination.
  • Public opinion –
    • Used to collect the perceptions of the public.
    • Measures aggregate public opinion.
    • Can sway opinion on topics, e.g. Cambridge Analytica.
Challenges posed by rumours (contd.)
• Domains –
  • Stock market –
    • Latest developments in the financial world.
    • Sentiment expressed in tweets can predict the stock market.
    • Social media affects brands and products.
• Studies have looked at
  • Credibility perceptions of users (Westerman et al. 2014. Recency of updates and credibility of information. J. Comput.-Med. Commun. 19, 2 (2014), 171–183)
  • Degree of reliance on social media (Jeffrey Gottfried and Elisa Shearer. 2016. News Use Across Social Media Platforms 2016. Technical Report. Pew Research Center.)
Challenges posed by rumours (contd.)
• Two cases
  • Long-standing rumours –
    • May be known a priori.
    • Used to track public opinion.
  • Emergency rumours –
    • New rumours that arise during events.
    • Affect news gathering.
    • Affect the decisions of the individuals involved.
Data Collection And Annotation
• Access to Social Media API
  • The best way to access, collect and store data from social media platforms is generally through application programming interfaces (APIs).
  • Examples - Twitter, Sina Weibo and Facebook.
• Rumour Data Collection Strategies Classification
  • Classified on the basis of -
  • Long-standing rumours - collection is performed for a rumour or rumours that are known in advance.
    • The list of rumours is input manually.
    • Keywords can be defined to collect posts.
  • Newly emerging rumours - data collection is usually done from a stream of posts in real time.
    • Tweets associated with a rumour must be collected before the rumour has been identified.
    • Keywords are not known in advance, so broader data collection strategies are used, followed by sampling of a subset.
    • Posts for the event are collected and then filtered.
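The two collection strategies can be contrasted with a small sketch. It assumes posts are plain strings already fetched from some platform API; the function names and the every-n-th sampling rule are illustrative assumptions, not taken from the survey.

```python
def matches_keywords(text, keywords):
    """Return True if any keyword occurs in the post text (case-insensitive)."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def collect_known_rumour(posts, keywords):
    """Long-standing rumour: keep only posts matching manually defined keywords."""
    return [post for post in posts if matches_keywords(post, keywords)]


def sample_event_stream(posts, step):
    """Newly emerging rumours: keywords are unknown, so the whole event
    stream is kept and every `step`-th post is sampled for later filtering.
    (Systematic sampling is an assumption made for this sketch.)"""
    return posts[::step]
```

In practice the sampled subset would then be filtered or annotated to find which posts actually carry rumours.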
Rumour Data Collection Strategies
Sampling Strategies:
1. Top-down Sampling
2. Bottom-up Sampling
• Annotation of Rumour Data
  • Rumour Veracity, Stance Toward Rumours, Rumour Relevance, Other Factors.

Access to Social Media APIs
APIs are easy-to-use interfaces that are usually accompanied by documentation that describes how to request the data of interest.
Twitter:
• Provides detailed documentation of ways to use its API.
• Gives access to a REST API to harvest data from its database, as well as a streaming API to harvest data in real time.
Sina Weibo:
• The most popular microblogging platform in China, with many similarities to Twitter.
• Access to some methods is not easily available.
Facebook:
• Provides a documented API with a set of software development kits for multiple programming languages and platforms, which make it easy to develop applications with its data.
Annotation of Rumour Data
• Rumour Veracity
• Stance Toward Rumours
• Rumour Relevance
• Other Factors
Characterising Rumours: Understanding Rumour Diffusion And Features
Rumour Classification: System Architecture
• Rumour detection
• Rumour tracking
• Stance classification
• Veracity classification
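One way to picture how the four components fit together is to chain them in the order listed. This is a minimal sketch under the assumption that each component is supplied as a plain callable; the name `run_pipeline` and the callable signatures are hypothetical, not part of the survey.

```python
def run_pipeline(posts, detect, track, classify_stance, classify_veracity):
    """Chain detection -> tracking -> stance -> veracity over a list of posts.

    All four callables are placeholders for real classifiers:
      detect(post) -> bool
      track(rumour, post) -> bool
      classify_stance(rumour, post) -> str
      classify_veracity(rumour, stances) -> str
    """
    results = []
    for rumour in (post for post in posts if detect(post)):
        # Tracking: gather the other posts that discuss this rumour.
        related = [post for post in posts if post != rumour and track(rumour, post)]
        # Stance: label each related post's orientation toward the rumour.
        stances = [classify_stance(rumour, post) for post in related]
        # Veracity: aggregate the evidence into a final label.
        results.append({
            "rumour": rumour,
            "related": related,
            "stances": stances,
            "veracity": classify_veracity(rumour, stances),
        })
    return results
```

Passing the components as arguments keeps the sketch independent of any particular model for each stage.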
Rumour Detection
• Definition:
  • The task of finding, in a dataset of social media posts, which ones are rumours.
  • A post is a rumour not because it will later prove true or false, but because it is unverified at the time of posting.
• Dataset:
  • The PHEME dataset is the only public dataset, with 1,972 rumours and 3,830 non-rumours associated with 5 breaking news stories.
• Approaches to Rumour Detection:
  • Finding known rumours: a classifier is given a set of predefined rumours.
  • Posts from sceptical users: looks for posts in which users question veracity, e.g. using the pattern “is (that | this | it) true” to find enquiring posts.
  • Context-learning approach: as not all rumours may trigger enquiring posts from users, it uses conditional random fields (CRF) as a sequential classifier that learns the reporting dynamics during an event, so that the classifier can determine, for each new post, whether or not it is a rumour based on what has been seen so far during the event.
  • State-of-the-art approach: leverages context from earlier posts associated with a particular event to determine whether a post constitutes a rumour.
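The enquiry pattern quoted above can be written as a regular expression. The sketch below covers only that single pattern; a real system would presumably use a larger pattern set, and the function name is an assumption.

```python
import re

# Matches "is that true", "is this true" and "is it true", ignoring case
# and allowing any amount of whitespace between the words.
ENQUIRY_PATTERN = re.compile(r"\bis\s+(?:that|this|it)\s+true\b", re.IGNORECASE)


def is_enquiring(post):
    """Return True if the post contains the enquiry pattern."""
    return ENQUIRY_PATTERN.search(post) is not None
```

For example, `is_enquiring("Wait, is this true?")` matches, while a plain statement such as "This is true" does not.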
Rumour Tracking
• Definition:
  • Triggered once a rumour is detected; it consists of identifying subsequent posts associated with the rumour being monitored.
  • Each post is then classified as related or unrelated.
• Datasets:
  • Qazvinian (2011), which includes over 10,000 tweets associated with 5 different rumours, each tweet annotated for relevance toward the rumour as related or unrelated.
• Approaches to Rumour Tracking:
Research in rumour tracking is scarce.
  • Qazvinian (2011)’s machine learning approach:
    • Post features are categorised as “content,” “network,” and “Twitter-specific memes.” A Bayesian classifier is used. The best performance was achieved using content-based features.
  • Tweet Latent Vector (TLV):
    • A latent vector representation of a tweet, built using Semantic Textual Similarity (STS).
  • Event detection and tracking approach:
    • Based on keyword graphs: a graph of keywords is used to detect communities and, subsequently, newly emerging events.
    • A set of keywords associated with an event is used to track new incoming tweets.
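To illustrate the idea of a Bayesian classifier over content features for the related/unrelated decision, here is a minimal multinomial naive Bayes sketch. It is not Qazvinian's actual feature set or model: whitespace tokens stand in for "content" features, add-one smoothing is assumed, and all names are illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenisation (a simplification)."""
    return text.lower().split()


def train_naive_bayes(labelled_posts):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    vocabulary = set()
    for text, label in labelled_posts:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocabulary.add(token)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-posterior for the text."""
    word_counts, label_counts, vocabulary = model
    total_posts = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(count / total_posts)
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Network and meme features, as used in the cited work, would need additional inputs beyond the post text.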
Rumour Stance Classification
• Definition:
  • Determining the type of orientation that each individual post expresses toward the disputed veracity of a rumour.
  • For a set of rumours D = {R1, . . . , Rn}, each post associated with a rumour is assigned a label from Y = {supporting, denying, querying, commenting}.
• Datasets:
  • PHEME stance dataset, which provides tweet-level annotations of stance (support, deny, query, comment) for tweets associated with nine events.
  • Ferreira and Vlachos (2016) dataset, which contains 300 rumoured claims and 2,595 associated news articles, with an estimation of their veracity.
• Approaches to Rumour Stance Classification:
  • One-step problem (six-way classification task):
    • Unrelated to rumour.
    • Four classes of stance.
    • Not determined.
  • Two-step problem:
    • First, a three-way classification task:
      • Related to rumour.
      • Unrelated to rumour.
      • Not determined.
    • Then, a four-class classification task:
      • Stance classification.
  • The highest performance scores are achieved using the two-step approach.
  • Human-labelled, non-automated analysis by Mendoza (2010):
    • Over 95% of tweets associated with true rumours were “affirms,” whereas only 4% were “questions,” and only 0.4% were “denies.”
  • Qazvinian (2011):
    • The tweets were classified as supporting, denying, questioning, or neutral. In terms of results, observations similar to the ones obtained for the rumour tracker are reported.
  • Supervised machine learning by Hamidian and Diab (2015):
    • Used a J48 decision tree. An additional feature, pragmatics, is used.
  • Sequential classification by Kochkina (2017):
    • Uses Long Short-Term Memory networks (LSTMs). Also looks into average word vectors, punctuation, similarity between word vectors in the current tweet, source tweet, and previous tweet, and the presence of negation, pictures and URLs.
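The two-step formulation described above can be sketched as a relevance check followed by a stance decision. The toy rule-based classifiers below are placeholders invented for this example (including the "bridge" topic word); real systems would use trained models at both steps.

```python
def two_step_stance(post, relevance_classifier, stance_classifier):
    """Step 1: three-way relevance; step 2: four-way stance for related posts."""
    relevance = relevance_classifier(post)
    if relevance != "related":
        return relevance  # "unrelated" or "not determined"
    return stance_classifier(post)


def toy_relevance(post):
    """Hypothetical relevance rule for a rumour about a bridge."""
    text = post.lower()
    if "bridge" in text:
        return "related"
    if not text.strip():
        return "not determined"
    return "unrelated"


def toy_stance(post):
    """Hypothetical keyword rules standing in for a stance classifier."""
    text = post.lower()
    if "?" in text:
        return "querying"
    if "not true" in text or "fake" in text:
        return "denying"
    if "confirmed" in text:
        return "supporting"
    return "commenting"
```

A one-step system would instead train a single classifier over all six labels.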
Rumour Veracity Classification
Dataset
• RumourEval 2017
  • 300 rumours annotated for veracity as one of true, false or unverified.
  • Each rumour includes a stream of tweets associated with it.
• Twitter and Sina Weibo
• Liyunabaike.com (a Chinese rumour debunking platform)
Features
• Message-based
  • The length of a message, whether the message contains exclamation/question marks, the number of positive/negative sentiment words, whether the message contains a hashtag and whether it is a retweet.
Rumour Veracity Classification (contd.)
• User-based
  • Registration age, number of followers, number of followees, and the number of tweets the user has authored in the past.
• Topic-based
  • The fraction of tweets that contain URLs, the fraction of tweets with hashtags.
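Several of the message-based and topic-based features listed above can be computed directly from post text. This is a rough sketch: treating a leading "RT @" as a retweet marker and "http://"/"https://" as URL markers are assumptions, and the sentiment-word counts and user-based features are omitted because they need a lexicon and profile data.

```python
def message_features(text):
    """Message-level features for a single post."""
    return {
        "length": len(text),
        "has_exclamation": "!" in text,
        "has_question": "?" in text,
        "has_hashtag": "#" in text,
        # Assumption: retweets are marked with a leading "RT @".
        "is_retweet": text.startswith("RT @"),
    }


def topic_features(texts):
    """Topic-level features aggregated over all posts on a topic."""
    total = len(texts)
    if total == 0:
        return {"url_fraction": 0.0, "hashtag_fraction": 0.0}
    with_url = sum(1 for t in texts if "http://" in t or "https://" in t)
    with_hashtag = sum(1 for t in texts if "#" in t)
    return {
        "url_fraction": with_url / total,
        "hashtag_fraction": with_hashtag / total,
    }
```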
Rumour Veracity Classification (contd.)
• Propagation-based features
  • Depth of the retweet tree or the number of initial tweets on a topic.
Some Other Features:
• Temporal features
  • Capture how rumours spread over time.
• Structural features
  • Model the connectivity between users who posted about the rumour.
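As an illustration of a propagation-based feature, the depth of a retweet tree can be computed level by level. The sketch assumes the tree is given as a child-to-parent mapping of post identifiers and contains no cycles; both the representation and the function name are assumptions.

```python
def retweet_tree_depth(parent_of, root):
    """Return the number of retweet hops from the root to the deepest post.

    parent_of maps each retweet's id to the id of the post it retweeted.
    A root with no retweets has depth 0.
    """
    # Invert the mapping so each post lists its direct retweets.
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)

    depth = 0
    frontier = [root]
    while True:
        next_frontier = [c for node in frontier for c in children.get(node, [])]
        if not next_frontier:
            return depth
        depth += 1
        frontier = next_frontier
```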
Rumour Veracity Classification (contd.)
• Linguistic features
  • Obtained through the Linguistic Inquiry and Word Count (LIWC) dictionaries.
Features on Sina Weibo
  • Client-based
  • Location-based
• Client-based features
  • Information about the software that was used to perform the messaging.
• Location-based features
  • Information relating to whether or not the message was sent from within the same country where the event happened.
• Network features
  • Creating a social network based on reviews or comments attached to the source tweet.
• Negation words (comprehensibility category), past, present, future POS (part of speech) in the tweets (time-orientation category), discrepancy, sweat and exclusion features (writing style category) and finally, home, leisure, religion and sex topic features (topic category)
Approaches
• Machine learning: Bayesian networks, SVM (Support Vector Machines), decision trees based on J48 (89.2% accuracy, 89.1% precision, 89.1% recall and 89.1% F1-measure)
• Random Forest classifier
  • Accuracy (90%), precision (93.5%), recall (89.2%) and F1-measure (89.3%)
• Decision trees with J48 lead to 77.4%, SVM with the RBF kernel to 77.9% and random forests to 81.5%
• Reports the correlation between features and veracity of rumours using logistic regression
  • Features such as mentions of numbers, the source the rumour originated from, and hyperlinks correlate positively with true rumours, while rumours containing wishes correlate positively with false rumours. Rumours that include images correlate negatively with true rumours.
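For reference, the accuracy, precision, recall and F1-measure figures quoted for these classifiers are conventionally derived from confusion-matrix counts. The sketch below shows the standard binary definitions; how the cited studies averaged across classes is not stated in the slides, so this is only the textbook calculation.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary metrics from confusion-matrix counts.

    tp/fp/fn/tn = true positives, false positives, false negatives,
    true negatives. Zero denominators yield 0.0 rather than an error.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```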

Types of microblog posts


• Chat
• News
Rumour detection related applications
• Hoaxy
  • A platform for tracking online misinformation.
• PHEME
  • Research project into establishing the veracity of claims made on the internet.
• RumorLens
  • Aids journalists in finding posts that spread or correct a particular rumour on Twitter by exploring the audiences that those posts have reached.
• TwitterTrails
  • Interactive, web-based tool that allows users to investigate the origin and propagation characteristics of a rumour and its denial on Twitter.
