Detection and Resolution of Rumours in Social Media
Domains –
News gathering –
Great potential for news diffusion, occasionally outpacing professional news outlets
Updates from eye witnesses
Updates can be false, sometimes posted to gain popularity
Emergency and crises –
Increased use during crises
Helps locate where help is needed
Platform for real-time updates and coordination
Public Opinion –
Used to collect perception of public
Measure aggregate public opinion
Can be used to sway opinion on topics, e.g. Cambridge Analytica.
Challenges posed by rumours (contd.)
Domains –
Stock Market –
Latest developments in the financial world.
Sentiment expressed in tweets can help predict stock market movements.
Social media affects brands and products.
Studies have looked at
Credibility perceptions of users (Westerman et al. 2014, Recency
of updates and credibility of information. J. Comput.-Med.
Commun. 19, 2 (2014), 171–183)
Degree of reliance on social media (Jeffrey Gottfried and Elisa
Shearer. 2016. News Use Across Social Media Platforms 2016.
Technical Report. Pew Research Center.)
Challenges posed by rumours (contd.)
Two cases
Long standing rumours –
May be known a priori
Track public opinion
Emergency rumours –
New rumours, in case of events.
Affect news gathering.
Affect the decisions of the individuals involved.
Data Collection And Annotation
Access to Social Media API
The best way to access, collect and store data from social media platforms is
generally through application programming interfaces (APIs).
Examples: Twitter, Sina Weibo and Facebook.
Rumour Data Collection Strategies Classification
Classified according to the type of rumour:
Long-standing rumours - collection is performed for a rumour or rumours that are
known in advance.
The list of rumours is input manually.
Keywords can be defined to collect posts.
Newly emerging rumours - data collection is usually done from a stream of
posts in real time.
Posts associated with a rumour must already be collected before the rumour is identified.
Keywords are not known in advance, so a broader data collection strategy is used,
followed by sampling of a subset.
Posts for the event are collected and then filtered.
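The two collection strategies above can be sketched roughly as follows. This is a minimal illustration that works on an in-memory list of posts rather than a real API; the function names, the sample posts and the keyword lists are all hypothetical.

```python
import re

def matches_keywords(text, keywords):
    """Return True if any keyword occurs in the text as a whole word or phrase."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(k.lower()) + r"\b", lowered) for k in keywords
    )

def collect_long_standing(posts, rumour_keywords):
    """Long-standing rumours: keywords are known in advance, one list per rumour."""
    return {
        rumour: [p for p in posts if matches_keywords(p, kws)]
        for rumour, kws in rumour_keywords.items()
    }

def collect_emerging(posts, event_keywords):
    """Emerging rumours: collect broadly for the event; rumours are sampled later."""
    return [p for p in posts if matches_keywords(p, event_keywords)]

# Hypothetical sample data (not taken from any real dataset).
posts = [
    "Is it true that the bridge has collapsed?",
    "Lovely weather today",
    "Police confirm the bridge is closed after the storm",
    "The moon landing was faked, share this!",
]
long_standing = collect_long_standing(posts, {"moon landing": ["moon landing", "faked"]})
event_posts = collect_emerging(posts, ["bridge", "storm"])
```

In the emerging case the broad event collection would then be sampled and annotated to find the rumours it contains.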
Rumour Data Collection Strategies
Sampling Strategies:
1. Top-down Sampling
2. Bottom-up Sampling
Annotation of Rumour Data
Rumour Veracity, Stance Toward Rumours, Rumour Relevance, Other Factors.
Access to Social Media APIs
APIs are easy-to-use interfaces that are usually accompanied by
documentation that describes how to request the data of interest.
Twitter:
Provides detailed documentation of ways to use its API.
Gives access to a REST API to harvest data from its database as well
as a streaming API to harvest data in real time.
Sina Weibo:
The most popular microblogging platform in China; its API has many
similarities to that of Twitter.
Access to some methods is not easily available.
Facebook:
It provides a documented API with a set of software development
kits for multiple programming languages & platforms that make it
easy to develop applications with its data.
Annotation of Rumour Data
Rumour Veracity
Stance Toward Rumours
Rumour Relevance
Other Factors
Characterising Rumours: Understanding Rumour Diffusion and Features
Rumour Classification: System Architecture
Rumour detection
Rumour tracking
Stance classification
Veracity classification
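How the four components listed above hand data to one another can be sketched as below. Every rule in this sketch is a placeholder keyword heuristic chosen only to make the data flow visible; the cue words, the `Rumour` class and the sample posts are assumptions, not part of any published system.

```python
from dataclasses import dataclass, field

@dataclass
class Rumour:
    source_post: str
    related_posts: list = field(default_factory=list)
    stances: list = field(default_factory=list)
    veracity: str = "unverified"

def detect(posts):
    """Rumour detection (placeholder rule): flag posts containing 'reportedly'."""
    return [Rumour(p) for p in posts if "reportedly" in p.lower()]

def track(rumour, posts, keywords):
    """Rumour tracking (placeholder rule): keep other posts mentioning a keyword."""
    rumour.related_posts = [
        p for p in posts
        if p != rumour.source_post and any(k in p.lower() for k in keywords)
    ]

def classify_stance(post):
    """Stance classification (placeholder rule): simple cue words."""
    text = post.lower()
    if "false" in text or "not true" in text:
        return "deny"
    if "?" in text:
        return "query"
    if "confirmed" in text:
        return "support"
    return "comment"

def classify_veracity(rumour):
    """Veracity classification (placeholder rule): compare support and deny counts."""
    supports = rumour.stances.count("support")
    denies = rumour.stances.count("deny")
    if supports > denies:
        return "true"
    if denies > supports:
        return "false"
    return "unverified"

# Hypothetical sample posts.
posts = [
    "Reportedly the airport is closed",
    "Airport closure confirmed by officials",
    "Is the airport really closed?",
    "The airport story is false",
    "Confirmed: airport closed until noon",
    "Nice sunset tonight",
]
rumours = detect(posts)
for r in rumours:
    track(r, posts, ["airport"])
    r.stances = [classify_stance(p) for p in r.related_posts]
    r.veracity = classify_veracity(r)
```

In a real system each placeholder function would be replaced by a trained classifier; only the order of the stages is meant to carry over.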
Rumour Detection
Definition:
The task is to identify which posts in a dataset of social media posts are
rumours.
A post is a rumour not because it will later turn out to be true or false, but
because it is unverified at the time of posting.
Dataset:
The PHEME dataset is the only publicly available dataset, with 1,972 rumours and
3,830 non-rumours associated with 5 breaking news stories.
Approaches to Rumour Detection:
Datasets:
Qazvinian et al. (2011), which includes over 10,000 tweets associated with 5
different rumours, each tweet annotated for relevance toward the rumour
as related or unrelated.
Approaches to Rumour Tracking:
Research in rumour tracking is scarce.
Machine learning approach of Qazvinian et al. (2011):
Posts have different features, categorised as “content”, “network”, and “Twitter
specific memes”. A Bayesian classifier is used. The best performance was achieved by
using content-based features.
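To make the idea of a Bayesian classifier over content features concrete, here is a minimal multinomial naive Bayes with add-one smoothing over unigram counts. It is a generic textbook sketch, not the classifier described by Qazvinian et al.; the class name, the whitespace tokeniser and the toy training posts are assumptions.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Very simple content features: lowercased whitespace-separated unigrams."""
    return text.lower().split()

class NaiveBayes:
    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # smoothing constant
        self.class_counts = Counter()             # number of posts per label
        self.word_counts = defaultdict(Counter)   # word frequencies per label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)           # log prior
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:            # ignore words never seen in training
                    count = self.word_counts[label][word]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy data: is a post related to an "airport closed" rumour?
train_texts = [
    "airport closed after storm",
    "is the airport closed",
    "airport closure rumour spreads",
    "great coffee this morning",
    "watching football tonight",
    "new phone arrived today",
]
train_labels = ["related"] * 3 + ["unrelated"] * 3
model = NaiveBayes().fit(train_texts, train_labels)
```

Calling `model.predict("airport closed today")` on this toy data gives the label whose training posts share more vocabulary with the input.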
Datasets:
PHEME stance dataset, which provides tweet-level annotations of stance
(support, deny, query, comment) for tweets associated with nine events.
Ferreira and Vlachos (2016) dataset, which contains 300 rumoured claims and
2,595 associated news articles, with an estimation of their veracity.
Approaches to Rumour Stance Classification:
The highest performance scores are achieved using the two-step approach.
Human-labelled, non-automated analysis by Mendoza et al. (2010):
Over 95% of tweets associated with true rumours were “affirms,” whereas
only 4% were “questions,” and only 0.4% were “denies.”
Qazvinian et al. (2011):
The tweets were classified as supporting, denying, questioning, or neutral. In
terms of results, observations similar to the ones obtained for the rumour
tracker are reported.
Supervised machine learning by Hamidian and Diab (2015):
Used a J48 decision tree. An additional feature type, pragmatics, is used.
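Per-stance percentages of the kind reported in the human-labelled analysis above can be computed from a list of annotated labels. The sketch below is only an illustration; the function name and the toy label list are assumptions and do not reproduce any published figures.

```python
from collections import Counter

def stance_distribution(labels):
    """Return the percentage of posts carrying each stance label."""
    total = len(labels)
    if total == 0:
        return {}
    counts = Counter(labels)
    return {stance: 100.0 * n / total for stance, n in counts.items()}

# Hypothetical annotations for the tweets of one rumour.
labels = ["affirms"] * 8 + ["questions"] + ["denies"]
dist = stance_distribution(labels)
```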
User-based features
Registration age, number of followers, number of followees, and
the number of tweets the user has authored in the past.
Topic-based features
The fraction of tweets that contain URLs and the fraction of tweets
with hashtags.
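The user-based and topic-based features listed above could be computed along these lines. The dictionary keys, the URL and hashtag checks, and the sample values are assumptions made for illustration; a real pipeline would read these fields from whatever the platform API returns.

```python
from datetime import date

def user_features(user, today):
    """User-based features from a hypothetical user record."""
    return {
        "registration_age_days": (today - user["registered"]).days,
        "followers": user["followers"],
        "followees": user["followees"],
        "tweets_authored": user["tweets_authored"],
    }

def topic_features(tweets):
    """Topic-based features over all tweets collected for one topic."""
    n = len(tweets)
    if n == 0:
        return {"fraction_with_urls": 0.0, "fraction_with_hashtags": 0.0}
    with_urls = sum(1 for t in tweets if "http://" in t or "https://" in t)
    with_hashtags = sum(1 for t in tweets if "#" in t)
    return {
        "fraction_with_urls": with_urls / n,
        "fraction_with_hashtags": with_hashtags / n,
    }

# Hypothetical sample data.
user = {"registered": date(2015, 1, 1), "followers": 120, "followees": 80,
        "tweets_authored": 950}
tweets = [
    "Flooding downtown https://example.org/a",
    "Is this real? https://example.org/b",
    "Stay safe everyone #flood",
    "No power in my street",
]
u = user_features(user, today=date(2015, 1, 31))
t = topic_features(tweets)
```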
Rumour Veracity Classification (contd.)
Propagation-based features
The depth of the retweet tree or the number of initial tweets on a topic.
Linguistic features
Obtained through the Linguistic Inquiry and Word Count (LIWC)
dictionaries.
Network features
Obtained by creating a social network based on reviews or comments attached to
the source tweet.
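As an illustration of one propagation-based feature, the depth of a retweet tree can be computed with a simple traversal. The representation used here (a dictionary mapping each post to the posts that retweeted it) and the sample tree are assumptions for the sketch.

```python
def tree_depth(children, root):
    """Number of edges on the longest path from the root to any leaf."""
    max_depth = 0
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        max_depth = max(max_depth, depth)
        for child in children.get(node, []):
            stack.append((child, depth + 1))
    return max_depth

# Hypothetical retweet tree: "source" was retweeted by "a" and "b",
# "a" by "c", and "c" by "d".
retweets = {"source": ["a", "b"], "a": ["c"], "c": ["d"]}
depth = tree_depth(retweets, "source")
```

The traversal is iterative so that very deep cascades do not hit Python's recursion limit.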