
A Unified Perspective for Disinformation Detection and

Truth Discovery in Social Sensing: A Survey

FAN XU, Jiangxi Normal University, China


VICTOR S. SHENG, Texas Tech University, USA
MINGWEN WANG, Jiangxi Normal University, China

With the proliferation of social sensing, large amounts of observations are contributed by people or devices.
However, these observations may contain disinformation. Disinformation can propagate across online social
networks at a relatively low cost but result in a series of major problems in our society. In this survey, we provide
a comprehensive overview of disinformation and truth discovery in social sensing under a unified perspective,
including basic concepts and the taxonomy of existing methodologies. Furthermore, we summarize the
mechanism of disinformation from four different perspectives (i.e., text only, text with image/multi-modal,
text with propagation, and fusion models). In addition, we review existing solutions, compare their pros and
cons, and give usage guidance based on detailed lessons learned. To facilitate future studies in this field, we
summarize related publicly accessible real-world data sets and open source codes. Last but most important,
we emphasize potential future research topics and challenges in this domain through a deep analysis of the
most recent methods.

CCS Concepts: • Artificial intelligence → Natural language processing; • Information systems → Content
analysis and feature selection;

Additional Key Words and Phrases: Disinformation detection, truth discovery, social sensing, privacy-aware

ACM Reference format:


Fan Xu, Victor S. Sheng, and Mingwen Wang. 2021. A Unified Perspective for Disinformation Detection and
Truth Discovery in Social Sensing: A Survey. ACM Comput. Surv. 55, 1, Article 6 (November 2021), 33 pages.
https://doi.org/10.1145/3477138

This research was supported by the National Natural Science Foundation of China under Grants 62162031, 61772246, 61728205, and 61876074, and the Joint Funding Project of the Jiangxi Science and Technology Plan under Grant 20192ACBL21030.
Authors' addresses: F. Xu and M. Wang, Jiangxi Normal University, Nanchang 330022, China; emails: {xufan, mwwang}@jxnu.edu.cn; V. S. Sheng (corresponding author), Texas Tech University, Lubbock, TX 79409, USA; email: Victor.sheng@ttu.edu.

1 INTRODUCTION
In social sensing, a new paradigm [1, 127], people or devices contribute large amounts of observations to perceive the environment. In this new paradigm, crowdsourcing can be adopted to harness the wisdom of crowds when collecting real-time information [107, 165, 166]. Meanwhile, social sensing can be embodied in web claims, news, Twitter, Weibo, and so on. Recently, many social sensing-related applications have been presented, including smartphone-based crowdsourcing

tasks [11], disaster management [154], geo-tagging for smart cities [74], green global positioning
system (GPS) [23] (a participatory sensing navigation service for finding the most fuel-efficient
routes), personal health monitoring [13], and so on. Similar to mobile crowdsourcing [12, 138],
social sensing has become an effective paradigm to collect and process data. Meanwhile, since
sensing devices are mobile, they can provide wider coverage than traditional wireless sensor
networks (WSN) do. Besides these plentiful advantages, social sensing also has some disadvantages,
especially disinformation propagation. That is, disinformation can propagate across online social
networks at a relatively low cost but result in a series of major problems in our society. In fact,
on a social sensing or crowdsourcing platform, individual users may have privacy issues when
sharing their answers with others. For example, individual users can report correlations between
search queries and web pages, but the answers may reveal their personal preferences, occupation,
age, state, party, and prior history. Similarly, patient responses to a drug are valuable for doctors
to discover the side effects of the drug, but they contain sensitive information that a patient may
not want to share. Therefore, privacy-preserving truth discovery is an important and challenging
task in social sensing (refer to Section 4 for more details). As we all know, disinformation is
harmful to our lives [53, 173]. Disinformation commonly spreads in the circumstances of breaking
news, often starting as a rumor. For instance, a rumor about the White House having been bombed
spooked stock markets in 2013.1 Similarly, disinformation related to the Hurricane Sandy rumor
eventually forced the US Federal Emergency Management Agency to step in and control the rumor.2
According to statistics, two-thirds of Americans obtain news on social media.3 Humans, however,
are susceptible to false information [124]. Unfortunately, Rubin et al. [100] revealed that
the human ability to detect false information ranges from 55% to 58% accuracy. Although some
websites, e.g., Snopes,4 Politifact,5 and Factcheck,6 can debunk some types of disinformation, they
heavily depend on domain experts to conduct manual fact-checking. Potential issues of manual checking
include low coverage and high latency. Therefore, automatic disinformation detection (ADT)
is necessary. However, conflicts are a general problem within multi-source information about the
same object in a social sensing environment. Fortunately, truth discovery can tackle this challenge
by integrating multi-source noisy information and has achieved great success in data and knowledge
fusion [61].
1 https://www.bbc.com/news/world-us-canada-21508660.
2 https://twitter.com/fema/status/264800761119113216.
3 http://www.journalism.org/2017/09/07/news-use-across-social-media-platforms-2017.
4 https://www.snopes.com.
5 http://www.politifact.com.
6 https://www.factcheck.org.

Some authors have conducted surveys related to fake news detection [35, 170], false information [48, 159], misinformation [20], and rumors [2, 7, 172]. To be more specific, Shu et al. [35] summarized fake news from four perspectives, i.e., credibility of users, false knowledge, writing style of the fake news, and propagation schema. Zhou and Zafarani [170] introduced fake news characteristics from both psychology and social theory perspectives and summarized current representative methods from a data mining viewpoint. Kumar and Shah [48] reported an overview of existing research from five aspects (i.e., actors who spread disinformation, the rationale to deceive users, the importance of disinformation, the characteristics investigated, and related representative algorithms). In comparison, Zannettou et al. [159] focused on providing a detailed typology (i.e., perception, motivation, propagation, detection) of the diverse types of false information, i.e., rumors, fake news, hoaxes, clickbait, and various other shenanigans. Fernandez and Alani [20] focused on misinformation detection and characterized existing methods from four aspects, i.e., misinformation detection based on content, dynamics of misinformation, validation of content, and misinformation management. Zubiaga et al. [172] surveyed rumor debunking and tracking, along with stance and veracity detection of rumors. Alzanin and Azmi [2] only introduced algorithms for rumor detection. Figueira et al. [22] focused on fake news detection based on its content and propagation patterns. However, there exists no literature offering a unified perspective on disinformation detection and truth discovery.
There are two major differences between our manuscript and the existing survey literature
in this field. First, we conducted a deep survey of disinformation detection and truth discovery
(i.e., both privacy-unaware and privacy-preserving truth discovery) in social sensing under a unified
framework. In fact, disinformation detection and truth discovery have many common characteristics,
e.g., a similar problem description and a general methodology. The existing survey literature,
however, focuses on providing a typology of only one topic, such as truth discovery
or one specific type of misinformation (i.e., rumor, fake news, hoax, etc.). Furthermore, we give
more space to the two popular types of disinformation (i.e., rumor and fake news) and try to
introduce them in depth, including the mechanism from four different perspectives (i.e., text only,
text with image/multi-modal, text with propagation, and fusion models). Second, to
facilitate future studies in this field, we summarize related publicly accessible real-world data sets
and open source codes for both disinformation detection and truth discovery in tabular form.
In this survey article, we first provide a comprehensive overview of disinformation and truth
discovery in social sensing under a unified perspective. We first explain related basic concepts
and their problem formulations. We then show the taxonomy of existing methodologies and
provide an in-depth explanation of each category in the taxonomy. Furthermore, we give a detailed
summarization of the mechanism of disinformation from four different perspectives, including
text only, text with image/multi-modal, text with propagation, and fusion models. In addition,
we review existing solutions and compare their pros and cons from detailed lessons learned. To
facilitate future research in this field for novice researchers, we summarize related publicly accessible
real-world datasets and open source codes, which are ignored in all previous survey papers.
Of course, there are many research problems in disinformation detection and truth discovery in
social sensing. At the end of this survey, we discuss some potential future research topics and
challenges after a deep analysis of the most recent approaches.
The rest of this article is organized as follows: In Section 2, the concepts of disinformation,
truth discovery, and related terminologies, such as misinformation, rumor, hoax, and fake news,
are presented. In Section 3, we give an overview of existing approaches for both disinformation
detection and truth discovery under a unified perspective. In Section 4, we introduce privacy-aware
truth discovery and the commonly used evaluation metrics. In Section 5, we summarize related
publicly accessible real-world datasets and open source codes. We conclude the article with detailed
lessons learned and discuss some future research directions in Section 6.

2 CONCEPTS AND PROBLEM STATEMENTS


In this section, we will introduce basic concepts and problem statements related to disinformation.

2.1 Basic Concepts


Because there is no universal definition of disinformation and related terminologies, people confuse
disinformation with misinformation, rumor, hoax, and fake news. Here, we list some acceptable
definitions from both authoritative dictionaries and some references in Table 1 and Table 2,
respectively. From Table 1, we can observe that disinformation is deliberately false information
in all three authoritative dictionaries (i.e., Longman, Oxford, and Merriam-Webster).
By contrast, misinformation is incorrect or inaccurate information that is not deliberately false. In
comparison, a rumor is a kind of information that may or may not be true. It can be classified into


Table 1. Definition of Disinformation and Its Related Terminologies from Dictionaries

Disinformation:
• Longman: false information, which is given deliberately in order to hide the truth or confuse people, especially in political situations.
• Oxford: false information, which is intended to mislead, especially propaganda issued by a government organization to a rival power or the media.
• Merriam-Webster: false information deliberately and often covertly spread (as by the planting of rumors) in order to influence public opinion or obscure the truth.

Misinformation:
• Longman: incorrect information, especially when deliberately intended to deceive people.
• Oxford: false or inaccurate information, especially that which is deliberately intended to deceive.
• Merriam-Webster: incorrect or misleading information.

Rumor:
• Longman: information that is passed from one person to another and which may or may not be true.
• Oxford: a currently circulating story or report of an uncertain or doubtful truth.
• Merriam-Webster: (1) talk or opinion widely disseminated with no discernible source; (2) a statement or report current without known authority for its truth.

Hoax:
• Longman: (1) a false warning about something dangerous; (2) an attempt to make people believe something that is not true.
• Oxford: a humorous or malicious deception.
• Merriam-Webster: to trick into believing or accepting as genuine something false and often preposterous.

Fake (news):
• Longman: (1) a copy of a valuable object, painting, etc., which is intended to deceive people; (2) someone who is not what they claim to be or does not have the skills they say they have.
• Oxford: a thing that is not genuine; a forgery or sham.
• Merriam-Webster: not true, real, or genuine.

misinformation or disinformation according to the user's intention. Compared with rumors, hoaxes
and fake news are definitely disinformation, and fake news is a specific type of hoax. Furthermore,
fake news aims more at financial profit or political gain.
The above definitions are general concepts from authoritative dictionaries. They are still very
hard to operationalize in our research. Therefore, some references give more detailed concepts for these
terminologies, e.g., disinformation, misinformation, rumor, hoax, and fake news. Based on these
terms defined in the references, as shown in Table 2, we can distinguish them from two aspects,
i.e., authenticity and intention. For the former, the authenticity of disinformation, misinformation,
hoax, and fake news is false, whereas the authenticity of a rumor is unknown. For the latter, the
intention is generally bad for disinformation, hoax, and fake news, while it is unknown
for misinformation and rumor.
To be more specific, according to Reference [172], the relationships among these terminologies
can be illustrated in Figure 1. According to their intention, rumors can be further classified into
disinformation and misinformation. Fake news is a typical type of hoax, and a hoax is a kind of
disinformation.

2.2 Problem Statements


Disinformation detection. According to References [73, 172], we define a disinformation
detection dataset as a collection of claims $C = \{C_1, C_2, \ldots, C_{|C|}\}$. Each claim $C_i$ stands for a piece of source
information $r_i$ and ideally consists of all its relevant responses in chronological order. That is,

Table 2. Definition of Disinformation and Its Related Terminologies from References

Disinformation:
• information that is deliberately false (Hernon [28]);
• it is false information (Kshetri and Voas [45]);
• the information's authenticity is false, and its intention is bad (Zhou et al. [170]).

Misinformation:
• information that is false due to an honest mistake (Hernon [28]);
• simply incorrect information (Kshetri and Voas [45]);
• the information's authenticity is false, and its intention is unknown (Zhou et al. [170]).

Rumor:
• both the information's authenticity and its intention are unknown (Zhou et al. [170]);
• rumors can be embodied in either misinformation or disinformation, depending on the intent, and may turn out to be true; the intention of rumors is unknown (Zubiaga et al. [172]);
• unverified and instrumentally relevant information statements in circulation; the intention of a rumor is unknown (DiFonzo and Bordia [16]).

Hoax:
• it is a kind of disinformation; the information's authenticity is false, and its intention is bad (Zubiaga et al. [172]).

Fake (news):
• a type of disinformation that is currently generated manually, to the best of our knowledge; the information's authenticity is false, and its intention is bad (Kshetri and Voas [45]);
• fake news is false news released by a news agent (Zhou et al. [170]);
• it is a kind of disinformation, and its intention is bad (Zubiaga et al. [172]).

Fig. 1. A general categorization of information based on intentions.

$C_i = \{r_i, x_{i1}, x_{i2}, \ldots, x_{im}\}$, where each $x_{i*}$ is a response to the root $r_i$. Then, we formulate the
disinformation detection task as a supervised classification problem. It learns a classifier $f: C_i \rightarrow Y_i$,
where $Y_i$ takes one of two classes (rumor or non-rumor) for binary rumor detection, and one of
four classes (true rumor, false rumor, non-rumor, or unverified rumor) for multi-class rumor detection.
Truth discovery. According to Reference [61], for a collection of objects $O$, a series of sources
$S$ can contribute different information. The goal is to predict the truth $v_o^*$ for each object $o \in O$
by resolving conflicting information from the different sources $\{v_o^s\}$, $s \in S$. Meanwhile, we can estimate
the source weights $\{w_s\}$ ($s \in S$), which will be used to infer truths.
Relationship between disinformation and truth discovery. According to the above problem
statements, if we map an object $o \in O$ in truth discovery to a claim $C_i$ in disinformation

Fig. 2. The categorization of representative algorithms.

detection, then we can set up a unified perspective for them. From the problem statement perspective,
it seems that truth discovery is suitable only for structured data. In fact:
(1) According to References [30, 75, 76, 128–133, 164], it is hard to accurately ascertain both the
correctness of each unstructured tweet and the reliability of each Twitter user. Meanwhile,
they demonstrated that a maximum likelihood estimation (MLE)-based credibility analysis
tool using Expectation Maximization (EM) can be used to analyze the credibility of reported tweets.
Also, the latest literature [75] shows that currently widely used deep neural networks can handle
truth discovery in a social sensing environment.
(2) Besides, References [4, 24, 29, 47, 83, 96, 114, 118, 119, 126, 150] adopted probabilistic
graphical models to conduct a joint interaction to find articles with high credibility, sources
with high reliability, and expert users who perform the role of "citizen journalists" in the
community.
(3) Again, References [31, 42, 75, 111, 112] adopted iteration-based approaches to conduct truth
discovery and fake news detection in social sensing.

3 REPRESENTATIVE APPROACHES
In this section, we first focus on representative algorithms for both disinformation detection and
truth discovery in social sensing under a unified perspective in Section 3.1. We then provide
a brief introduction to the latest privacy-aware truth discovery, a quite promising new
task in this research area, in Section 4.

3.1 A Unification Perspective for Disinformation Detection and Truth Discovery


In this subsection, we first categorize approaches for both disinformation detection and truth
discovery in social sensing, as illustrated in Figure 2. Then, we introduce existing approaches
according to our categorization and illustrate the categories of the currently proposed algorithms
in this area. Because it is impossible to introduce all the literature, we just list some representative and
latest references from both areas in social sensing. In Figure 2, the intuition behind modeling-based
methods is that the propagation patterns of true and fake information are different.
ACM Computing Surveys, Vol. 55, No. 1, Article 6. Publication date: November 2021.
A Unified Perspective for Disinformation Detection and Truth Discovery in Social Sensing 6:7

3.1.1 Traditional Approaches for Disinformation Detection. As illustrated in Figure 2, traditional


approaches for disinformation detection can be categorized into six subgroups. We will explain
representative models of these six subgroups in the following subsections.
(1) Feature-based models. Table 3 lists effective features proposed in the most recent literature
on different platforms (e.g., web claims, Weibo, Twitter, reviews). As shown in Table 3, there exist
more than 20 representative features. These features can be classified into five types:
user-based, linguistic-based, sentiment-based, location & temporal-based, and other features.
Among these five feature types, the user-based features are the most popular, because the user profile
is a good indicator for disinformation detection. Meanwhile, linguistic-based features such as
LIWC (Linguistic Inquiry and Word Count) and readability are widely used in disinformation
detection. What is more, sentiment-based features also give good insight for disinformation
detection, because the emotion of a user can affect the judgment of disinformation. Furthermore,
location & temporal-based features are also good indicators for disinformation detection, since
the location and temporal information embody the place and the time when an event happened.
Due to space limitations, we only briefly introduce representative feature-engineering-based
models of each platform in Table 3.
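As a concrete illustration, the sketch below feeds hypothetical hand-crafted features into a random forest, mirroring the feature-engineering pipelines summarized above. The feature names, toy data, and labels are placeholders rather than the setup of any specific paper.

```python
# A minimal feature-based detector sketch (assumption: features are already
# extracted into a numeric matrix; the names below are illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical columns: follower_count, friend_count, account_age_days,
# sentiment_score, num_urls, num_mentions
X = np.random.rand(200, 6)           # placeholder feature matrix
y = np.random.randint(0, 2, 200)     # 0 = non-rumor, 1 = rumor (toy labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```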
(2) Kernel-based models. Wu et al. [38] and Ma et al. [72] proposed kernel-based models for
rumor detection. They both considered the propagation characteristics of rumors on the Internet. A
kernel method [106] works by embedding data in a vector space (usually of higher dimensionality)
and looking for (linear) relations in that space using a kernel function. If a kernel function
is chosen suitably, then complex relations can be simplified and easily detected. Generally, the kernel
function can effectively calculate distances and angles in a high-dimensional space. Common
kernel functions include the linear kernel, polynomial kernel, radial basis function, and Gaussian kernel.
Wu et al. [38] found that false rumors and normal messages propagate differently.
Based on this observation, they generated a propagation tree for rumor detection. The nodes of
the tree represent opinion leaders and normal users. Each edge of the tree represents a reaction
between two nodes, and the weight of each edge is presented as a triple, including
an approval score, a doubt score, and an overall sentiment score between the two nodes. Then,
they integrated a random walk graph kernel with a feature vector kernel as a hybrid kernel, shown
in Equation (1), to perform rumor detection. In their work, they extracted a total of 23
features consisting of message-based, user-based, and repost-based ones for an RBF (Radial Basis
Function) kernel.
$$K(m_i, m_j) = \beta K(T_i, T_j) + (1 - \beta) K(X_i, X_j), \quad (1)$$

where $m_i$ and $m_j$ are two messages, and $\beta$ ($0 < \beta < 1$) is a coefficient that defines the relative weight
of the two kernels (i.e., the random walk kernel and the feature vector kernel). The functions $K(T_i, T_j)$ and
$K(X_i, X_j)$ are defined as follows, where $T_i$ and $T_j$ denote any two propagation trees, and $X_i$ and $X_j$
denote any two feature vectors:

$$K(T_i, T_j) = e^{\top} (I - \lambda A_x)^{-1} e, \quad (2)$$

where $A_x$ is an adjacency matrix (in random walk graph kernels, typically that of the direct product graph) and $\lambda$ denotes a weighting parameter;

$$K(X_i, X_j) = \langle \phi(X_i), \phi(X_j) \rangle, \quad (3)$$

where $\phi$ denotes a feature map.
Similarly, Ma et al. [72] considered a propagation structure kernel and designed three kinds of
similarity, i.e., user similarity, content similarity, and node similarity. Besides, they also considered
propagation paths from a root node to subtrees to integrate context information.
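For intuition, the following is a minimal sketch of the hybrid-kernel idea in Equation (1): a precomputed propagation-tree kernel matrix is blended with an RBF kernel over feature vectors and fed to an SVM with a precomputed kernel. The tree-kernel matrix here is a random positive-definite stand-in, since computing a real random-walk tree kernel is beyond this snippet.

```python
# A minimal hybrid-kernel sketch in the spirit of Equation (1);
# beta, the toy data, and the stand-in tree kernel are assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

n = 100
X_feat = np.random.rand(n, 23)                     # 23 message/user/repost features
R = np.random.rand(n, n)
K_tree = (R + R.T) / 2 + n * np.eye(n)             # symmetric, positive-definite stand-in

y = np.random.randint(0, 2, n)                     # toy rumor / non-rumor labels

beta = 0.6
K_hybrid = beta * K_tree + (1 - beta) * rbf_kernel(X_feat)   # Equation (1)

svm = SVC(kernel="precomputed")
svm.fit(K_hybrid, y)                               # train on the combined Gram matrix
print(svm.predict(K_hybrid[:5]))                   # prediction uses Gram-matrix rows
```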


Table 3. A General Summary of Representative Text-driven Approaches on Disinformation and Truth Discovery

Traditional approaches

Feature-based models

User-based features:
• Kumar et al. [49]. Features: creator's edit history (i.e., username of the editors of the revision and the editing time). Classifier: random forest. Label: hoax. Platform: Wikipedia.
• Liu et al. [64]. Features: user identity; user location & witness. Classifier: SVM. Label: rumor. Platform: Twitter.
• Qazvinian et al. [91]. Features: user history (i.e., liberal tweeter vs. conservative tweeter). Classifier: Bayes classifier. Label: rumor. Platform: Microblogs.
• Li et al. [54]. Features: user behavior (i.e., authority score, brand deviation score, rating deviation score). Classifier: naive Bayes. Label: spam. Platform: Review.
• Castillo et al. [8]. Features: characteristics of users (i.e., average age; average status count; average number of followers and friends; whether a friend has a URL; user mentions). Classifier: J48 decision tree. Label: fake news. Platform: Twitter.

Linguistic-based features:
• Rosas et al. [98]. Features: LIWC information; readability (i.e., Flesch-Kincaid, Gunning fog, etc.). Classifier: SVM. Label: fake news. Platform: News sites.
• Rubin et al. [120]. Features: absurdity, humor, grammar, negative affect. Classifier: SVM. Label: fake news. Platform: News sites.
• Li et al. [56]. Features: LIWC, POS (part-of-speech), unigrams. Classifier: Bayesian generative model. Label: spam. Platform: Review.
• Kwon et al. [51]. Features: posemo, negate, social, cogmech, excl, tentat. Classifier: random forest. Label: rumor. Platform: Twitter.

Sentiment-based features:
• Liu et al. [64]. Features: belief features (i.e., support, negation, question, or neutrality). Classifier: SVM. Label: rumor. Platform: Twitter.
• Castillo et al. [8]. Features: aggregated features (i.e., average sentiment score; sentiment positive; sentiment negative). Classifier: J48 decision tree. Label: fake news. Platform: Twitter.
• Li et al. [54]. Features: subjective vs. objective; positive vs. negative. Classifier: naive Bayes. Label: spam. Platform: Review.

Location & temporal-based features:
• Li et al. [55]. Features: average travel speed; IPs or city location. Classifier: SVM. Label: spam. Platform: Review.
• Kwon et al. [51]. Features: temporal features (i.e., periodicity and offset of external shock). Classifier: random forest. Label: rumor. Platform: Twitter.
• Yang et al. [19]. Features: the location of the event. Classifier: SVM. Label: rumor. Platform: Weibo.

Other features:
• Wang et al. [135]. Features: entity similarity, core text similarity, claim-to-sentence similarity, claim-to-paragraph similarity, content similarity, publication order. Classifier: decision tree. Label: fake news. Platform: Web claims.
• Potthast et al. [89]. Features: stylometric. Classifier: random forest. Label: fake news. Platform: News articles.
• Zeng et al. [161]. Features: tweet element, rumor, interest, exposure, seasonality. Classifier: regression. Label: rumor. Platform: Twitter.
• Sandulescu et al. [102]. Features: semantic similarity. Classifier: pairwise information radius similarity value. Label: spam. Platform: Review.
• Ma et al. [34]. Features: content-based feature, diffusion-based feature. Classifier: SVM. Label: rumor. Platforms: Twitter, Weibo.
• Zhao et al. [167]. Features: percentage of signal tweets, entropy ratio, tweet lengths, retweets, URLs, hashtags, @mentions. Classifier: decision tree. Label: rumor. Platform: Twitter.
• Lin et al. [63]. Features: similarity-related, review frequency, repeatability index. Classifiers: SVM, logistic regression. Label: spam. Platform: Review.
• Mukherjee et al. [82]. Features: group (individual) spam behavior. Classifiers: SVM, logistic regression. Label: spam. Platform: Review.
• Jindal and Liu [33]. Features: review-centric, reviewer-centric, product-centric. Classifier: logistic regression. Label: spam. Platform: Review.

Kernel-based models:
• Ma et al. [72]. Method: tree kernel. Label: rumor. Platform: Twitter.
• Wu et al. [38]. Method: hybrid kernel. Label: rumor. Platform: Twitter.

EM-based models:
• Wang et al. [133], [129–131], [128, 132], Marshall et al. [76], Huang et al. [30]. Method: maximum likelihood estimation. Label: truth discovery. Platform: Twitter.

Graph-based models:
• Yang et al. [150]. Method: Bayesian network + Gibbs sampling. Label: fake news. Platform: News sites.
• Xia et al. [143]. Method: Bayesian inference. Label: rumor. Platform: Weibo.
• Tschiatschek et al. [119], Kumar et al. [47]. Method: Bayesian inference. Label: fake news. Platform: Facebook.
• Hooi et al. [29], Beutel et al. [4]. Method: Bayesian inference. Label: spam. Platform: Review.
• Tacchini et al. [114]. Method: logistic regression; Boolean label crowdsourcing. Label: hoax. Platform: Facebook.
• Wang et al. [126]. Method: maximum likelihood estimation. Label: rumor. Platform: Weibo.
• Nguyen et al. [84]. Method: Markov random field. Label: fake news. Platform: Weibo.
• Rayana et al. [96]. Method: Markov random field. Label: spam. Platform: Review.
• Mukherjee and Weikum [83]. Method: probabilistic graphical model. Label: truth discovery. Platform: News sites.
• Wang et al. [24]. Method: iterative computation. Label: spam. Platform: Review.

Iteration-based models:
• Shu et al. [112], Shu et al. [111]. Method: tri-relationship optimization. Label: fake news. Platform: News sites.
• Kim et al. [42]. Method: Bayesian inference. Label: fake news. Platform: Twitter.
• Jin et al. [31]. Method: iterative deduction. Label: fake news. Platforms: Twitter, Weibo.

Modeling-based methods:
• Ye et al. [153]. Method: temporal information. Label: spam. Platform: Review.
• Xie et al. [144]. Method: burst detection in time series (curve fitting). Label: spam. Platform: Review.

Deep neural network approaches

DNN-based models:
• Marshall et al. [76]. Method: DNN for truth discovery. Label: truth discovery. Platform: Twitter.

(Tree) RNN-based models:
• Popat et al. [88] (fake news; Twitter), Hanselowski et al. [27] (stance detection; News sites), Wu et al. [141] (Twitter), Sarkar et al. [103] (fake news; News sites), Ruchansky et al. [101] (Twitter, Weibo), Rashkin et al. [94] (News sites), Yao et al. [152] (spam; Review). Method: LSTM (Bi-LSTM).
• Wen et al. [140], Ma et al. [73], Rath et al. [95], Ma et al. [71]. Method: GRU. Label: rumor. Platform: Twitter.

Other deep learning-based methods

CNN-based models:
• Karimi et al. [36] (fake news; News sites), Qian et al. [93] (fake news; Twitter, Weibo). Method: single CNN.
• Liu et al. [66]. Method: CNN-based model. Label: rumor. Platforms: Twitter, Weibo.
• Wang et al. [134]. Method: hybrid CNN (user description with CNN). Label: fake news. Platform: News sites.
• Xu et al. [145]. Method: hybrid CNN (topic with CNN). Label: rumor. Platforms: Twitter, Weibo.

Other deep learning models:
• Ma et al. [70]. Methods: GAN-BOW, GAN-CNN, GAN-GRU. Label: rumor. Platform: Twitter.
• Przybyla [90], Tredici et al. [117]. Method: BERT. Label: fake news. Platform: News sites.
• Yu et al. [156]. Method: BERT. Label: rumor. Platform: Twitter.
• Ma et al. [69], Yu et al. [156], Wei et al. [139], Wu et al. [142], Cheng et al. [10], Chen et al. [9], Li et al. [59]. Method: multi-task learning over rumor detection (non-rumor, true rumor, false rumor, unverified rumor) and stance detection (support, deny, question, comment). Label: rumor and stance. Platform: Twitter.

Hybrid approaches (traditional & deep neural network):
• Shu et al. [110]. Methods: support vector machines, logistic regression, naive Bayes, CNN, LSTM. Label: fake news. Platform: News sites.
• Volkova et al. [122]. Methods: MaxEntropy, random forest, LSTM, CNN. Features: content, style, syntax, connotations, etc.
• Volkova et al. [123]. Method: LSTM/CNN + linguistic cues. Label: spam. Platform: Twitter.

(3) EM (Expectation Maximization)-based models. Recently, EM-based models have been proposed
in the truth discovery literature [30, 76, 128–133]. EM is an iterative optimization algorithm for
Maximum Likelihood Estimation (MLE) [15]. It consists of an E-step and an M-step.
Specifically, the E-step computes a distribution over the labels of the data points using
the current parameters, while the M-step updates the parameters using the current guess of the label
distribution. These works considered a group of M sources, namely S1, S2, . . . , SM, who contribute
individual observations on a collection of N measured variables, C1, C2, . . . , CN. They took sources


Table 4. Term Definitions

P(Cj = 1): the probability that the measured variable Cj is true.
si: the probability that source Si reports an observation of a measured variable; it can be computed as the fraction of measured variables reported by Si over all variables.
d: the probability that a randomly selected variable is true.
ti: the source reliability, which is often not known a priori.
ai: the probability that a source Si reports a measured variable to be true under the condition that it is true.
bi: the probability that a source Si reports a measured variable to be true under the condition that it is false.

as Twitter users, who tweet during observation. The measured variables are embodied by tweet
clusters, which represent observations about the same topic of events. The social sensing data
people observe can be compactly represented by a sensing matrix SC, where SiCj = 1 indicates
that Si reports Cj to be true, and SiCj = 0 otherwise. Given observed data (i.e., a sensing matrix SC),
what is the likelihood of a specific source making a correct observation, and what is the correct
state of each measured variable? Some specific term definitions, which will be used in the truth
discovery problem formulation, are shown in Table 4.
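To make the E-step and M-step concrete, below is a minimal EM sketch over a binary sensing matrix SC using the a_i, b_i, and d notation of Table 4. The random matrix, initialization, and clipping are assumptions of this sketch, not the exact algorithm of the cited papers.

```python
# A minimal EM sketch for truth discovery over a binary sensing matrix
# SC (sources x measured variables), following Table 4's notation.
import numpy as np

rng = np.random.default_rng(0)
SC = rng.integers(0, 2, size=(5, 40))   # SC[i, j] = 1 if source S_i reports C_j as true

a = np.full(5, 0.6)                     # a_i: P(S_i reports | variable is true)
b = np.full(5, 0.3)                     # b_i: P(S_i reports | variable is false)
d = 0.5                                 # d: prior P(variable is true)

for _ in range(50):
    # E-step: posterior Z[j] = P(C_j = 1 | SC) under the current parameters
    log_true = np.log(d) + SC.T @ np.log(a) + (1 - SC.T) @ np.log(1 - a)
    log_false = np.log(1 - d) + SC.T @ np.log(b) + (1 - SC.T) @ np.log(1 - b)
    Z = 1.0 / (1.0 + np.exp(log_false - log_true))
    # M-step: re-estimate the source parameters and the prior from the posterior
    a = np.clip((SC @ Z) / Z.sum(), 1e-6, 1 - 1e-6)
    b = np.clip((SC @ (1 - Z)) / (1 - Z).sum(), 1e-6, 1 - 1e-6)
    d = float(np.clip(Z.mean(), 1e-6, 1 - 1e-6))

print("estimated truths:", (Z > 0.5).astype(int))
print("source parameters a_i:", np.round(a, 3))
```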
(4) Graph-based models. The key idea of graph-based models is to create some latent variables
and adopt hyperparameters to incorporate prior knowledge into the graphical model. Generally, the prior
knowledge can be embodied in the distributions of truths and source weights. We here only list
a few representative works, such as Mukherjee and Weikum [83], Yang et al. [150], Tacchini
et al. [114], Hooi et al. [29], Kumar et al. [47], Beutel et al. [4], Tschiatschek et al. [119], Wang et al.
[24], Rayana et al. [96], Tripathy et al. [118], and Wang et al. [126].
For example, Yang et al. [150] treat news truths and the credibility of users as two latent random
variables and identify users' opinions based on the news' authenticity. To solve the problem,
they proposed a Gibbs sampling-based method to infer the news truths and the credibility of
users. Similarly, Nguyen et al. [84] took false information detection as a reasoning problem
in a Markov random field, which was solved by using an iterative averaging algorithm. Again, Xia
et al. [143] considered the event states, divided the states into many sub-events, and integrated
the current sub-event with the previous sub-event. Then, the combined sub-event was fed into a
time-smoothing-based model to measure the performance of early rumor detection.
More specifically, a Beta distribution with hyperparameter γ = (γ1, γ2) is adopted to generate
the probability of news i being true. For each user j, its credibility is modeled with ϕj1 and ϕj0, which indicate
its true positive rate and its false positive rate, respectively. Based upon these, four variables (i.e.,
ϕk0,0, ϕk0,1, ϕk1,0, and ϕk1,1) are adopted to model the credibility of each unverified user k ∈ K.
(5) Iteration-based models. Iteration-based models adopt the assumption that a fact will
have relatively high confidence when it is contributed by reliable sources, and a source will be
reliable if it provides many highly trustworthy facts. Based upon these two heuristics, fact confidence
can be inferred from source trustworthiness and vice versa. Several representative methods
repeatedly update them until reaching a stable state [31, 42, 111, 112].
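The mutual reinforcement just described can be sketched with a simple averaging scheme, where fact confidence is computed from source trust and source trust from fact confidence. The data and update rules below are illustrative; real systems use more elaborate functions.

```python
# A minimal sketch of the two mutually reinforcing updates:
# fact confidence <- source trust, source trust <- fact confidence.
import numpy as np

provides = np.array([[1, 1, 0, 0],           # provides[s, f] = 1 if source s asserts fact f
                     [0, 1, 1, 0],
                     [0, 0, 1, 1]], dtype=float)

trust = np.full(3, 0.5)                      # initial source trustworthiness
for _ in range(20):
    # a fact gains confidence when asserted by trustworthy sources
    conf = (provides.T @ trust) / provides.sum(axis=0)
    # a source gains trust when its asserted facts are confident
    trust = (provides @ conf) / provides.sum(axis=1)

print("fact confidences:", np.round(conf, 3))
print("source trust:", np.round(trust, 3))
```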
The latest work [112] claimed that the social context generates a tri-relationship (i.e., among news
pieces, publishers, and users) that is effective for debunking fake news. They proposed a tri-relationship
ACM Computing Surveys, Vol. 55, No. 1, Article 6. Publication date: November 2021.
A Unified Perspective for Disinformation Detection and Truth Discovery in Social Sensing 6:13

Fig. 3. A tri-relationship framework.

embedding framework, TriFN, as shown in Figure 3, to model the interactions between publishers/users and news simultaneously.7
7 To improve readability when printed in black and white, we have changed the previously colorful figure to a black-and-white version with the permission of the author of that paper.
(6) Modeling-based methods. Modeling-based methods investigate the propagation
patterns of true and fake news. For example, Xie et al. [144] adopted a curve-fitting technique
to detect spam reviews. They employed burst detection to build CAPT-MDTS (Correlated Abnormal
Patterns detection in Multidimensional Time Series). Similarly,
Ye et al. [153] investigated the streaming-data setting and bucketed the data into window sequences
with temporal information. They proposed an algorithm to identify anomalies in the time-series
signal. In general, we provide a brief summarization of each representative traditional approach,
including its general methodology, the learning algorithm utilized, and its corresponding social
platform, in Table 3.

3.1.2 Neural Network Approaches for Disinformation Detection. In this section, we further introduce
representative deep learning-based models for disinformation detection from the infrastructure
perspective (i.e., DNN, CNN, RNN, LSTM, BERT, etc.) and the mechanism perspective (text with image/multi-modal,
text with propagation, and fusion models).
(1) From the infrastructure perspective
For disinformation detection and truth discovery in social sensing, since most posts and web
claims are written in natural language, how to map a word sequence to a distributed
representation is an interesting research direction. Deep neural networks can automatically
extract high-level abstract features from web claims. Therefore, the Deep Neural Network (DNN)
[75, 140] is a natural way to handle disinformation detection and truth discovery.
Because disinformation mostly takes the form of text, the temporal sequence of text can be
successfully captured by Recurrent Neural Networks (RNN) [27, 66, 71, 73, 88, 94, 95, 101, 103,
134, 141, 152].
By contrast, some researchers [25, 36, 66, 93, 103, 122, 134, 145] adopted Convolutional Neural
Networks (CNN) to capture local features effectively when detecting disinformation. Recently,
Xu et al. [145] focused on topic-driven rumor detection using only source microblogs. They annotated
16 fine-grained topics (i.e., recreational sports, social politics, diffusion forwarding, etc.)
onto the currently popular Twitter and Weibo datasets, conducted topic distribution classification
on source microblogs, and then successfully incorporated the predicted topic vector of the
source microblogs into rumor detection.
Recently, Przybyla [90] and Yu et al. [156] adopted a pre-trained BERT (Bidirectional
Encoder Representations from Transformers) model to conduct fake news or rumor detection.
Generally, the pre-trained BERT model offers good computational efficiency and representation
ability.
Most of these models use cross entropy as the loss function. They can be implemented using
popular deep learning tools such as TensorFlow, Keras, Theano, and PyTorch.
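For reference, here is a minimal PyTorch sketch of such a text classifier: a GRU encoder over token ids trained with a cross-entropy loss. The vocabulary size, dimensions, and the random batch are placeholders.

```python
# A minimal RNN-style claim classifier sketch with cross-entropy loss
# (assumption: posts are already mapped to integer token ids).
import torch
import torch.nn as nn

class GRUClassifier(nn.Module):
    def __init__(self, vocab=5000, emb=100, hid=64, classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.gru = nn.GRU(emb, hid, batch_first=True)
        self.fc = nn.Linear(hid, classes)

    def forward(self, token_ids):
        _, h = self.gru(self.embed(token_ids))   # h: (1, batch, hid) final state
        return self.fc(h.squeeze(0))             # class logits

model = GRUClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, 5000, (8, 30))         # batch of 8 posts, 30 tokens each
labels = torch.randint(0, 2, (8,))               # 0 = non-rumor, 1 = rumor
optimizer.zero_grad()
loss = loss_fn(model(tokens), labels)
loss.backward()
optimizer.step()
print(float(loss))
```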
Recently, Ma et al. [69] adopted a multi-task learning framework to conduct rumor detection
and stance detection simultaneously. They focused on fine-grained rumor detection and stance
detection: the rumor detection categories are false rumor, non-rumor, true rumor, and unverified
rumor, and the stance detection categories are support, deny, question, and comment. Because the
stance toward a web claim is a good indicator for rumor detection, they obtained good detection
performance on both tasks. Similarly, Yu et al. [156] adopted a BERT model to train authenticity
and stance jointly, and the hidden-layer representation of each sub-thread is connected in a
transformer to capture the global interactions between posts. Furthermore, Wei et al. [139]
constructed a time series and an interaction diagram of the posts, represented by a GRU; a GCN
(Graph Convolutional Network) was then adopted to train each single task, and the two vectors
for each task are spliced in time order to train the authenticity of rumors. In Reference [142], a
transformer was adopted to extract private and shared parameters, and a multi-attention mechanism
was also integrated; they selected useful shared features through a gate in the GRU. Besides,
Ma et al. [70] proposed a GAN (Generative Adversarial Networks)-based model, whose objective
is shown in Equation (4). Compared with traditional data-driven approaches, their model can
capture stronger non-trivial patterns via the GAN. Due to space limitations, we only present the
most representative deep learning methods in Table 3, including their general methodology, the
learning algorithm utilized, and their corresponding social platforms.

$$\max_{\Theta_D} \min_{\Theta_G} \; \alpha \left( -\left\| \bar{y} - \hat{y} \right\|_2^2 + \lambda \left\| \Theta_D \right\|_2 \right) + (1 - \alpha)\, \frac{1}{T} \sum_{t=1}^{T} \left\| x_t - x'_t \right\|_2^2, \quad (4)$$

where $\bar{y}$ and $\hat{y}$ are, respectively, the ground-truth and predicted class probability distributions; $\Theta_D$
is the discriminator parameter; $\Theta_G$ is the generator parameter; $\lambda$ is the tradeoff coefficient; $\alpha$ is a
coefficient variable; $x_t$ and $x'_t$ are the $t$-th units in the original and reconstructed sequences, respectively;
$T$ is the length of a sequence; and $\|\cdot\|_2$ represents the $L_2$-norm of a given vector.
There are some hybrid models that integrate neural network models with traditional feature-
based models to conduct disinformation detection [110, 122, 123]. Again, we provide a brief sum-
marization of three representatives in Table 3.
(2) From the mechanism perspective
Text with image/multi-modal: The models shown in Table 3 are text-driven methods.
According to Reference [32], more than 51.60% of microblogs have pictures, and on average, the
forwarding amount of microblogs with pictures is 11 times that of microblogs without pictures.
The massive amount of disinformation on the Internet attracts users' attention by relying on a
large number of false pictures, which shows that pictures play an important role in disinformation
detection. Impressively, Reference [32] was the first attempt to systematically explore
image features on the news verification task. They adopted VGG (Visual Geometry Group)-19
to extract semantic features from images (i.e., visual clarity score, visual coherence score, visual
similarity distribution histogram, visual diversity score, and visual clustering score) and employed
an LSTM along with attention to encode text. Furthermore, Qi et al. [92] adopted a frequency-domain
subnetwork to capture the physical features of a fake news image and employed a pixel-domain
subnetwork to capture the semantic features of the fake news image. Then, they fused the above
two subnetworks dynamically along with semantic information to finally conduct fake news
detection. In Reference [40], they employed an LSTM to concatenate the learned text features and
image features on the encoder side, and reconstructed the original image and text from the hidden-layer
vector on the decoder side. Similar work can be found in References [113, 115, 121].
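The text-image fusion idea can be sketched as a generic late-fusion model in which GRU text features are concatenated with VGG-19 image features before classification. This is not the exact architecture of the cited papers; the network is untrained and all dimensions and data are placeholders.

```python
# A minimal multi-modal late-fusion sketch: text (GRU) + image (VGG-19).
import torch
import torch.nn as nn
from torchvision import models

text_encoder = nn.GRU(input_size=100, hidden_size=64, batch_first=True)
vgg = models.vgg19()                           # untrained here; real systems load pretrained weights
vgg.classifier = vgg.classifier[:-1]           # drop the last layer -> 4096-d image features
fusion_head = nn.Linear(64 + 4096, 2)          # fake / real logits

text = torch.randn(4, 30, 100)                 # batch of 4 posts, 30 word embeddings each
images = torch.randn(4, 3, 224, 224)           # the matching attached images

_, h = text_encoder(text)                      # (1, 4, 64) final text state
fused = torch.cat([h.squeeze(0), vgg(images)], dim=1)
print(fusion_head(fused).shape)                # torch.Size([4, 2])
```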
Text with propagation: Generally, propagation is another key factor in disinformation
detection, because the interaction between a source microblog and its subsequent reactions is a
good indicator for judging disinformation. In Reference [46], they proposed a multi-task learning
method to conduct rumor and stance detection simultaneously. In their work, they built a binary-style
tree to construct the communication relationships between a source post and its subsequent
reactions using a tree LSTM model. Again, Bian et al. [6] built two propagation graphs (i.e.,
top-down and bottom-up) using the GCN framework to conduct rumor detection. Furthermore,
Ren et al. [97] built a heterogeneous graph along with a hierarchical attention mechanism for
representation learning of fake news. They also adopted a GAN method to augment training data.
Similar work can be found in References [41, 57, 68, 151].
Fusion models: Obviously, how to fuse text, propagation, and prior user information is a
big step in disinformation detection. In Reference [157], the local semantic and global communication
information were jointly encoded, and the text representation along with the user information
encoding was learned by using multi-head attention. They finally fused these representations to
conduct rumor detection. In contrast, Lu et al. [67] simulated the potential user interactions by
using a graph network structure, and they also constructed a collaborative attention mechanism
to build an explainable model for fake news detection. Similar work can be found in References
[99, 158].

4 PRIVACY-AWARE TRUTH DISCOVERY

According to the survey paper on truth discovery in the plaintext domain [61], current truth discovery
algorithms make the following two assumptions:
• A source will obtain a high reliability value if it frequently provides reliable information.
• A piece of information will be assigned a high reliability value, and thus regarded as a truth,
if it is supported by many trustworthy sources.
Therefore, the key insight of truth discovery is to iteratively conduct the weight update for user k
(Equation (5)) and the truth modification for object m (Equation (6)):

$$w_k = f\left( \sum_{m=1}^{M} d\left( x_m^k, x_m^* \right) \right), \quad (5)$$

where $f$ is a decreasing function, $M$ is the total number of objects, and $d(\cdot)$ is the distance function
measuring the difference between the observation value $x_m^k$ of user $k$ and the estimated ground
truth $x_m^*$;

$$x_m^* \leftarrow \frac{\sum_{k=1}^{K} w_k \cdot x_m^k}{\sum_{k=1}^{K} w_k}, \quad (6)$$

where $x_m^*$ is the estimated ground truth, $x_m^k$ is the observation value of user $k$, and $K$ is the
total number of users.
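Below is a minimal sketch of the loop formed by Equations (5) and (6) on simulated continuous observations. The specific decreasing function f(x) = -log(x / total loss) is one common choice (as in CRH-style methods) and is an assumption of this sketch.

```python
# A minimal iterative truth-discovery loop implementing Equations (5) and (6).
import numpy as np

rng = np.random.default_rng(1)
truth_gt = rng.normal(size=10)                   # hidden ground truths (simulation only)
noise_levels = np.array([[0.1], [0.5], [2.0]])   # per-user observation noise
obs = truth_gt + rng.normal(scale=noise_levels, size=(3, 10))   # 3 users, 10 objects

w = np.ones(3)
for _ in range(10):
    est = (w @ obs) / w.sum()                    # Equation (6): weighted truth estimate
    loss = ((obs - est) ** 2).sum(axis=1)        # Equation (5): per-user summed distance
    w = -np.log(loss / loss.sum())               # a decreasing function f of the loss

print("user weights:", np.round(w, 3))           # the low-noise user should dominate
```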
In social sensing environments, however, most information is generated by crowdsourcing, and a
key issue of crowdsourcing is privacy. For example: Are the clouds trustworthy?
Will my data be disclosed to other participants? Can others learn my reliability degree? Is the
communication channel secure? On the one hand, we may generate sensitive personal information
on the web, such as the health data of patients, the locations of participants, answers to special
questions, and so on. On the other hand, the reliability degree of a user is also sensitive. For example,
inferring personal information and maliciously manipulating data prices both exist in the real
world. The key idea of privacy-aware truth discovery is to protect the processes of both the weight
update and the truth estimation [62, 148]. We briefly summarize representative works in this
area below.
(1) Homomorphic cryptosystem-based models. Miao et al. [77] proposed a MapReduce-driven
parallel threshold Paillier cryptosystem-based model for privacy-aware truth discovery,
which consists of three components (i.e., a secure sum protocol, weight update, and truth
estimation).
Due to privacy concerns, the plaintext should be encrypted before being sent to a cloud server.
Therefore, they employed the threshold Paillier cryptosystem [14] to design a secure sum protocol.
More specifically, the sum protocol calculates the summation of each user's data in encrypted
form. Although the server knows the summation value, it still cannot infer the individual
data of any user.
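The secure-sum idea can be sketched with the open-source python-paillier (phe) package as a stand-in: the server adds ciphertexts and only the aggregate can be decrypted. Unlike this simplified sketch, the threshold variant used in the cited work splits the decryption capability so that no single party holds the full private key.

```python
# A minimal additively homomorphic secure-sum sketch using python-paillier
# (the `phe` package); keys and values are toy data.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

user_values = [3.5, 1.2, 7.9]                   # each user's private observation
ciphertexts = [public_key.encrypt(v) for v in user_values]

# The server adds ciphertexts without seeing any individual value.
encrypted_sum = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_sum = encrypted_sum + c

print(private_key.decrypt(encrypted_sum))      # ~12.6: only the aggregate is revealed
```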
The secure weight update component needs several steps: the server sends the estimated truth
values to each user; each user encrypts their data and sends it to the cloud server; the cloud server
employs the proposed secure sum protocol to calculate the sum of the user data and sends the
average value to each user; and the remaining process is similar to the aforementioned steps.
Meanwhile, the secure truth estimation also needs several steps: the server sends the encrypted
weights to each user, each user calculates the corresponding ciphertexts, and the server then
updates the truths.
Similarly, Xu et al. [146, 147] adopted an additively homomorphic cryptosystem to design privacy-aware
truth discovery. They proposed a super-increasing sequence to model the input sequence.
Their secure weight update and truth estimation are similar to those of Reference [77]. To improve
efficiency, Miao et al. [78], Zhang et al. [162, 163], and Zheng et al. [168] designed two-cloud
server-based models. Their main algorithms, however, are similar to the above steps. They adopted
two clouds to conduct the interaction process with participating workers. Besides, Zhang et al.
[162, 163] also designed a fault tolerance mechanism in their algorithm.
(2) Yao's garbled circuit-based models. Tang et al. [116] and Zheng et al. [169] adopted Yao's
garbled circuit to conduct encryption in truth discovery. They integrated many gates (i.e., the squaring
gate, the sub gate, etc.) into the Yao's garbled circuit. More specifically, the framework in Reference
[116] consists of four steps: each provider generates random masks to hide their initial
data; each provider sends the random masks to the security service provider (SSP); the SSP
sends a collection of designed garbled circuits to the evaluator; and the evaluator estimates truths based
on the concealed data.
The key idea of the garbled circuit-based model in Reference [116] is that the garbled circuit Cπ is
self-contained, thereby exposing no intermediate values. To approximate the logarithm with a
circuit, they developed a Boolean circuit, as shown in Figure 4, to select the correct linear function
results.8 If the input is within τi and τi+1 (the target knots), then 1 will be generated from the AND
gate, and 0 otherwise. The output of the AND gate affects the final result.
(3) Diffie-Hellman key agreement-based & streaming models. Liu et al. [65] designed a
real-time privacy-preserving truth discovery (RTPT) system based on the Diffie-Hellman key agreement
to encrypt the information. Their secure summation aggregation includes four components (i.e.,
8 To improve the resolution of the original picture, we have drawn a new picture with the permission of the author of that paper.


Fig. 4. The approximated-logarithm circuit. Comp denotes the binary comparator; x̃ stands for the input of the circuit.

Fig. 5. The workflow of the RTPT system.

setup process, key sharing, masked collection, and unmasking). In the setup process, the client and
server exchange public parameters. The workflow of the algorithm consists of three phases,
as shown in Figure 5, i.e., initialization, secure truth estimation (steps 1–3 in Figure 5),
and secure weight update (steps 4–7 in Figure 5).
Specifically, in the initialization phase, each worker executes the common setup process and
initially sets the weight w_i^0 = 1 and the loss d_i^0 = 0. The secure truth estimation needs three steps:
the sensing workers mask their weighted data and weights and send them to the cloud server; the
server conducts the truth update; and the cloud server sends the estimated truths back to each sensing
worker and end-user, respectively. The secure weight update needs four steps accordingly:
each sensing worker conducts a loss update; each worker sends the masked loss to the cloud server;
the cloud server calculates the sum of the losses and sends it back to the sensing workers; and each sensing
worker updates its weight accordingly.
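The masking trick behind such secure aggregation can be sketched as follows: each pair of workers derives a shared random mask (via Diffie-Hellman key agreement in the real protocol; a local random generator stands in here), one worker adds it and the other subtracts it, so the server sees only masked values whose sum equals the true sum.

```python
# A minimal pairwise-masking sketch for privacy-preserving summation.
import numpy as np

rng = np.random.default_rng(7)
values = np.array([4.0, 9.0, 2.5])                 # each worker's private value
n = len(values)

# pairwise masks: worker i adds m[(i, j)], worker j subtracts it (i < j)
masks = {(i, j): rng.normal() for i in range(n) for j in range(i + 1, n)}

masked = values.copy()
for (i, j), m in masks.items():
    masked[i] += m
    masked[j] -= m

print("server sees:", np.round(masked, 3))         # individual values are hidden
print("sum:", round(masked.sum(), 6))              # equals values.sum() = 15.5
```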

4.1 Evaluation Metrics


Because most approaches treat the disinformation detection and truth discovery tasks as classification
problems, the most widely used metrics are Accuracy, Precision, Recall, F1, and

AUC (Equation (7)):

$$AUC = \frac{\sum_{ins_i \in \text{positive class}} rank_{ins_i} - \frac{M(M+1)}{2}}{M \times N}, \quad (7)$$

where $rank_{ins_i}$ stands for the rank of the $i$-th sample; $M$ and $N$ indicate the total numbers of positive
and negative samples, respectively; and the summation runs over the ranks of all positive
samples. More details can be found in Reference [26].
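As a worked example of Equation (7), the snippet below ranks toy prediction scores and applies the formula directly; the scores and labels are made up.

```python
# A worked example of the rank-based AUC formula in Equation (7).
import numpy as np
from scipy.stats import rankdata

scores = np.array([0.9, 0.8, 0.3, 0.6, 0.2, 0.1])
labels = np.array([1, 1, 0, 1, 0, 0])              # M = 3 positives, N = 3 negatives

ranks = rankdata(scores)                           # rank 1 = lowest score; ties averaged
M, N = labels.sum(), (1 - labels).sum()
auc = (ranks[labels == 1].sum() - M * (M + 1) / 2) / (M * N)
print(auc)                                         # 1.0: all positives outrank all negatives
```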

5 DATASETS AND OPEN CODES

To facilitate future studies in this quite promising field, we list in this section related information
about publicly accessible real-world labeled disinformation datasets and some open source codes.
Since open source codes and datasets for privacy-aware truth discovery are not
available yet, we only list the datasets and open codes in the disinformation detection area.

5.1 Real-world Datasets

Table 5 summarizes publicly accessible and commonly used real-world labeled disinformation and truth
discovery datasets. There are 40 such datasets. More specifically, for
each dataset, we provide brief information on its source type (such as rumor, fake news, review,
or web claims), types of labels, URL, and a brief description. For convenience, we categorize these
datasets into six groups, i.e., rumor (10 datasets available), fake news (22 datasets available), hoax
(2 datasets available), opinion (1 dataset available), incongruity (1 dataset available), and truth
discovery (4 datasets available). As Table 5 shows, the binary and multi-class cases occupy 55.00%
and 45.00%, respectively. Due to the high workload of labeling the original data, however, some of
the datasets are very small, containing only 100–1,000 tweets or web claims. With the development of deep
learning, larger datasets are needed in this research area.

5.2 Open-source Codes

Although more than 100 papers on disinformation detection have been published in mainstream
conferences, the number of open source codes is much smaller. To facilitate future research,
we list 22 collections of publicly available source codes on disinformation detection in
Table 6. As shown in Table 6, most of the 22 open source codes are deep neural network-based
models, except that References [96, 114] are graph-based models on Facebook and review data, respectively.
Furthermore, among these 22 codes, 13 are proposed for fake news detection, 5 are
presented for rumor detection, 2 are proposed for hoax detection, 1 is presented for opinion spam
detection, and the remaining 1 is proposed for incongruity detection, which focuses on the
consistency match between a news title and its body text. Besides, only 9 methods can handle
multi-class detection; the remaining 13 code collections can only handle binary detection.

6 CONCLUSIONS AND PROSPECTS

Identifying disinformation is crucial for online social media, where large amounts of information
are easily spread across networks with unverified authority. Disinformation can damage people's
daily lives. In this article, we gave a thorough review of both disinformation detection and
truth discovery in social sensing under a unified perspective. To the best of our knowledge,
this is the first review article from this angle of view. First, we introduced the basic concepts
of disinformation and related terminologies, such as misinformation, rumor, hoax, and
fake news. Then, we provided a taxonomy of both disinformation detection and truth discovery
algorithms in social sensing, along with the promising new research on privacy-aware truth
discovery.
Table 5. Publicly Accessible Real-world Datasets

Rumor:
• Ma et al. [71]. URL: http://alt.qcri.org/~wgao/data/rumdect.zip. Statistics (Twitter): #users=491,229; #posts=1,101,985; #events=992. Statistics (Weibo): #users=2,746,818; #posts=3,805,656; #events=4,664. Used in: Ma et al. [71], Ma et al. [72], Ruchansky et al. [101], Liu et al. [66], Ma et al. [73], Qian et al. [93], Ma et al. [69], Ma et al. [70], Chen et al. [9], Xia et al. [143], Kochkina et al. [43], Qi et al. [92], Khattar et al. [40], Bian et al. [6], Khoo et al. [41], Lu et al. [67], Yuan et al. [157], Yuan et al. [158], Xu et al. [145].
• Zubiaga et al. [174] (PHEME). URL: https://figshare.com/articles/PHEME_rumour_scheme_dataset_journalism_use_case/2068650. Statistics: #tweets=4,842; #conversations=330. Used in: Kwon et al. [72], Ma et al. [70], Chen et al. [10], Yu et al. [156], Wei et al. [139], Cheng et al. [10], Wu et al. [142], Li et al. [59], Nguyen et al. [84], Kochkina et al. [43], Kumar et al. [46], Li et al. [57], Ma et al. [68], Khoo et al. [41], Xu et al. [145].
• Wen et al. [140]. URL: https://github.com/WeimingWen/CCRV. Statistics: #Real=6,225; #Fake=9,404.
• Kwon et al. [50]. URL: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi%3A10.7910%2FDVN%2FBFGAVZ. Statistics: #true_rumours=51; #false_rumours=60.
• Ma et al. [72]. URL: https://www.dropbox.com/s/7ewzdrbelpmrnxu/rumdetect2017.zip?dl=0. Statistics (Twitter15): #tweets=1,490; #users=276,663. Statistics (Twitter16): #tweets=818; #users=173,487.
• Wu et al. [38]. URL: http://adapt.seiee.sjtu.edu.cn/~kzhu/rumor/. Statistics: #false rumors=2,601; #normal messages=2,536; #users=4 million.
• Kwon et al. [51]. URL: http://mia.kaist.ac.kr/publications/rumor. Statistics: #events=104.

Fake news:
• Fake News Challenge 2017 (fifty of the 80 participants made submissions). URL: http://www.fakenewschallenge.org/. Statistics: #rows=49,972; proportions: unrelated=0.73131, discuss=0.17828, agree=0.0736012, disagree=0.0168094.
• Wang et al. [134] (LIAR). URL: https://www.cs.ucsb.edu/~william/data/liar_dataset.zip. Statistics: #short statements=12.8K. Used in: Wang et al. [134], Sarkar et al. [103], Karimi et al. [36], Yang et al. [150], Xu et al. [145].
• CREDBANK. URL: http://compsocial.github.io/CREDBANK-data/. Statistics: #tweets=169 million. Used in: Mitra et al. [80], Mitra and Gilbert [79].
• Mukherjee and Weikum [83]. URL: http://www.mpi-inf.mpg.de/impact/credibilityanalysis/. Statistics: #NewsTrust stories=82K; #Articles=47.6K; #Sources=5.7K. Used in: Mukherjee and Weikum [83], Popat et al. [88].
• Shu et al. [110] (FakeNewsNet). URL: https://github.com/KaiDMML/FakeNewsNet. Statistics (PolitiFact): #news articles_fake=432; #news articles_real=624; #users_fake=95,553; #users_real=249,887. Statistics (GossipCop): #news articles_fake=5,323; #news articles_real=16,817; #users_fake=265,155; #users_real=80,137.
• Rosas et al. [98]. URL: http://lit.eecs.umich.edu/downloads.html. Statistics (FakeNewsAMT): #Fake=240; #Legitimate=240. Statistics (Celebrity): #Fake=250; #Legitimate=250.
• Potthast et al. [89]. URLs: https://doi.org/10.5281/zenodo.1239675 and https://github.com/BuzzFeedNews/2016-10-facebook-fact-check/tree/master/data. Statistics: #articles=1,627; #Mainstream=826; #Left-wing=356; #Right-wing=545.
• Popat et al. [88]. URL: https://www.mpi-inf.mpg.de/dl-cred-analysis/. Statistics: #total claims (Snopes)=4,341; #total claims (PolitiFact)=3,568; #total claims (NewsTrust)=5,344; #total claims (SemEval)=272.
• Hanselowski et al. [27]. URL: https://github.com/UKPLab/coling2018_fake-news-challenge. Statistics: #topics=300.
• Volkova et al. [123]. URL: http://www.cs.jhu.edu/~svitlana/. Statistics: #suspicious_news=174; #verified_news=252; #trust_news=252.
• BS-detector 2017. URL: https://github.com/selfagency/bs-detector. Statistics: #websites=244; #posts=12,999.

Web claim:
• Ferreira and Vlachos [21]. URL: https://github.com/willferreira/mscproject. Statistics: #claims=300; #news_articles=2,595.
• Zlatkova et al. [171]. URL: https://gitlab.com/didizlatkova/fake-image-detection. Statistics: #snopes (reuters)=20,000.

Review:
• Ott et al. [85], Ott et al. [86]. URL: https://myleott.com/op-spam. Statistics: #reviews=1,600.

Truth discovery:
• Wang et al. [128]. URL: http://apollo.cse.nd.edu/. Statistics: #tweets>9.2 million (note: no ground truths …). Used in: Wang et al. [128], [129], [130], [131], [133], …
datasets.html
on Twitter are available) Huang et al. [30],
Marshall et al. [76] .
Truth
#stocks=1000; #sources=50
discovery √
Li et al. [60] http://lunadong.com/fusionDataSets.htm #flights=1200; #sources=38
for data
#books=1263; #sources=894
fusion
A Unified Perspective for Disinformation Detection and Truth Discovery in Social Sensing

(B. Indicates Binary; M. Donates Multiclass).

ACM Computing Surveys, Vol. 55, No. 1, Article 6. Publication date: November 2021.
6:21
Table 6. Open Accessible Source Codes

Fake news detection
  FakeNewsNet, Shu et al. [110] (B.): https://github.com/KaiDMML/FakeNewsNet
  GROVER, Zellers et al. [160] (B.): https://github.com/rowanz/grover
  HDSF, Karimi et al. [37] (B.): https://github.com/zake7749/WSDM-Cup-2019
  FND, Khan et al. [39] (B.): https://github.com/Tawkat/Fake-News-Detection
  Check-It, Paschalides et al. [87] (B.): https://github.com/anguyen120/fake-news-in-time
  GDL, Monti et al. [81] (B.): https://github.com/kc-ml2/ipam-2019-dgl
  EANN, Wang et al. [136] (B.): https://github.com/yaqingwang/EANN-KDD18
  CSI, Ruchansky et al. [101] (B.): https://github.com/s-omranpour/CSI-Code
  NLI, Yang et al. [149] (M.): https://github.com/zake7749/WSDM-Cup-2019
  Stylometric, Potthast et al. [89] (B.): https://github.com/webis-de/ACL-18
  FNC, Hanselowski et al. [27] (M.): https://github.com/UKPLab/coling2018_fake-news-challenge
  HCNN, Wang et al. [134] (M.): https://github.com/ekagra-ranjan/fake-news-detection-LIAR-pytorch
  WebCredibility, Esteves et al. [17] (B., M.): https://github.com/DeFacto/WebCredibility
  Perturbation, Bhat et al. [5] (B.): https://github.com/meghu2791/evaluateNeuralFakenewsDetectors
  WeFEND, Wang et al. [137] (B.): https://github.com/yaqingwang/WeFEND-AAAI20
  MALCOM, Le et al. [52] (B.): https://github.com/lethaiq/MALCOM
  dEFEND, Shu et al. [109] (B.): https://github.com/cuilimeng/dEFEND-web
  DIDAN, Tan et al. [115] (B.): https://github.com/rxtan2/DIDAN/
  Nguyenvo, Vo et al. [121] (B.): https://github.com/nguyenvo09/EMNLP2020
  GCAN, Lu et al. [67] (B.): https://github.com/l852888/GCAN
  SMAN, Yuan et al. [158] (M.): https://github.com/chunyuanY/FakeNewsDetection

Rumor detection
  GAN_Rumor, Ma et al. [70] (B.): https://github.com/majingCUHK/Rumor_GAN
  RumourEval2019_1, Fajcik et al. [18] (M.): https://github.com/MFajcik/RumourEval2019
  RvNN_Rumor, Ma et al. [73] (M.): https://github.com/majingCUHK/Rumor_RvNN
  RumourEval2019_2, Kochkina et al. [44] (M.): https://github.com/seongjinpark-88/RumorEval2019
  CLEARumor, Baris et al. [3] (M.): https://github.com/Institute-Web-Science-and-Technologies/CLEARumor
  Coupled_Hierarchical_Transformer, Yu et al. [156] (M.): https://github.com/nguyenvo09/EMNLP2020
  Uncertainty, Kochkina et al. [43] (M.): https://github.com/kochkinaelena/Uncertainty4VerificationModels
  VRoC, Cheng et al. [10] (M.): https://github.com/cmxxx/VRoC
  GLAN, Yuan et al. [157] (M.): https://github.com/chunyuanY/RumorDetection
  StA, Khoo et al. [41] (M.): https://github.com/serenaklm/rumor_detection

Hoax detection
  BLC_HOAX, Tacchini et al. [114] (B.): https://github.com/gabll/some-like-it-hoax
  HOAXY, Shao et al. [104, 105] (B.): https://hoaxy.iuni.iu.edu/ and http://botometer.iuni.iu.edu

Opinion spam detection
  OSD, Rayana et al. [96] (B.): https://www.dropbox.com/sh/iqcuj0363zcj3go/AAAvbZVR_PSNyJX8AXUXpBqea?dl=0

Incongruity detection
  Incongruity, Yoon et al. [155] (B.): https://github.com/david-yoon/detecting-incongruity

Truth discovery
  CRH, Li et al. [58] (B., M.): https://cse.buffalo.edu/~jing/software.htm
  SQUARE, Sheshadri et al. [108] (B., M.): http://ir.ischool.utexas.edu/square/index.html
  CEKA, Zhang et al. [165] (B., M.): http://ceka.sourceforge.net/
  DAFNA-EA, Waguih et al. [125] (B., M.): https://github.com/daqcri/DAFNA-EA

(B. indicates binary; M. denotes multiclass.)

What is more, we summarized the mechanisms of disinformation from four perspectives: text only, text with
image/multi-modal, text with propagation, and fusion models. Furthermore, we reviewed existing
solutions based on these requirements and compared their pros and cons. Meanwhile,
to facilitate future studies in this field, we provided collections of openly accessible real-world datasets and
open-source code, respectively. To give practical usage guidance, we summarize the detailed lessons
learned below.

6.1 Detailed Lessons Learned


In this section, we describe the detailed lessons learned from existing representative algorithms.
6.1.1 Pros and Cons of Existing Models. Clearly, each traditional feature-based approach has
its pros and cons. For user-based models, user meta-data (e.g., occupation, age, state, party,
and prior history) is a good indicator for disinformation detection. However, this kind of model is
not applicable when user profiles are unavailable, and users often withhold private information
on microblogs to protect their privacy. In comparison, for linguistic-based models, the reliance
on external resources (e.g., LIWC, knowledge bases, sentiment analysis) makes it hard to design an
end-to-end model; a sentiment-based model, for instance, is bounded by the performance of the
underlying sentiment analysis algorithm. Furthermore, location and temporal information is not
easy to obtain from text alone, although location is relatively easy to obtain from pictures. By
contrast, kernel-based models are effective when a suitable kernel (e.g., linear, radial basis function,
or Gaussian) is chosen, but selecting a reasonable kernel is itself a challenging task.
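As a toy illustration of this kernel sensitivity, the sketch below compares three scikit-learn kernels on fabricated posts; the data, labels, and resulting scores are purely illustrative and do not come from any surveyed system.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Fabricated toy posts: 1 = disinformation, 0 = legitimate.
posts = ["BREAKING: celebrity X is dead, share now!!!",
         "Official quarterly report released today",
         "Share before they delete this secret cure!!",
         "Council confirms the road closure schedule"] * 25
labels = [1, 0, 1, 0] * 25

for kernel in ("linear", "rbf", "poly"):
    clf = make_pipeline(TfidfVectorizer(), SVC(kernel=kernel))
    print(kernel, cross_val_score(clf, posts, labels, cv=5).mean())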
Differently, EM-based and iteration-based models are easy to understand: they follow a strict
mathematical formalization and sometimes achieve quite promising performance on a specific
dataset. However, both EM-based and iteration-based models easily fall into local optima, and
setting reasonable initial values is nontrivial, as the sketch below suggests.
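The following minimal sketch is written in the spirit of iterative weighted-voting schemes such as CRH [58]; the update rules here are simplified and ours, not the published algorithm. It alternates between estimating claim truths and source reliabilities, and the uniform initialization below is exactly the kind of choice that can steer such iterations toward a local optimum.

import numpy as np

def truth_discovery(obs, iters=20):
    # obs[s, c] in {0, 1}: source s's vote on binary claim c.
    n_sources, n_claims = obs.shape
    weights = np.ones(n_sources) / n_sources          # uniform initial reliability
    for _ in range(iters):
        truths = (weights @ obs) / weights.sum() > 0.5  # reliability-weighted vote
        errors = np.abs(obs - truths).mean(axis=1)      # per-source error rate
        weights = np.clip(-np.log(errors + 1e-6), 1e-6, None)  # reliable -> heavy
    return truths.astype(int), weights

obs = np.array([[1, 1, 0, 1],   # a mostly reliable source
                [1, 1, 0, 0],
                [0, 0, 1, 0]])  # a source that usually contradicts the others
truths, weights = truth_discovery(obs)
print(truths, np.round(weights, 2))  # the contrarian source ends up down-weighted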
Generally, graph-based models are quite effective, since they can represent complicated relationships
among users, text, and other information. In addition, hybrid models (i.e., combinations of
user-profile, linguistic, and semantic features) are still missing.
Different from traditional feature-based models, current deep learning-based models achieve
better performance on disinformation detection. However, the interpretability of these models
is a big problem: a deep learning-based model often behaves like a black box, and we do
not know exactly why it obtains its good performance. In addition,
although some models investigate the fusion of text with images, or of text with propagation, how
to perform a deep fusion of text, propagation, and pictures together is still an open problem. The
currently popular GCN framework could be a promising infrastructure to fuse them all together; a
minimal layer is sketched below.
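For concreteness, here is a minimal single GCN layer in plain PyTorch (our own didactic sketch, not any surveyed model), applying the standard normalized-adjacency update H' = ReLU(D^{-1/2}(A+I)D^{-1/2} H W); the nodes could carry text, propagation, or image features alike.

import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        a_hat = adj + torch.eye(adj.size(0))        # add self-loops
        d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
        norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt  # symmetric normalization
        return torch.relu(norm_adj @ self.lin(h))

# 5 nodes (e.g., a post, two retweets, an image node, a user node), 16-d features
h = torch.randn(5, 16)
adj = torch.tensor([[0, 1, 1, 1, 1], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0],
                    [1, 0, 0, 0, 0], [1, 0, 0, 0, 0]], dtype=torch.float)
print(GCNLayer(16, 8)(h, adj).shape)  # torch.Size([5, 8])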
Furthermore, current models for multi-modal disinformation detection usually neglect the judgment
of semantic consistency between a source post and its corresponding picture. They merely extract
semantic features from each picture using a VGG network and then concatenate the extracted
VGG features with the text-driven encoding for every picture-text pair, as the sketch below
illustrates. The lack of a picture-text consistency judgment inevitably introduces noise into these
models. In fact, judging the semantic consistency between a source post and its corresponding
picture is a key step for disinformation detection.
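The following sketch shows that concatenation-style fusion (a simplified stand-in for models such as EANN [136], with an untrained linear classifier and a placeholder 300-d text vector; it assumes a recent torchvision and downloads pretrained VGG weights on first run):

import torch
import torch.nn as nn
from torchvision import models

vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).eval()
vgg_head = nn.Sequential(*list(vgg.classifier.children())[:-1])  # 4096-d output

class ConcatFusion(nn.Module):
    def __init__(self, text_dim=300, n_classes=2):
        super().__init__()
        self.classifier = nn.Linear(4096 + text_dim, n_classes)

    def forward(self, image, text_vec):
        with torch.no_grad():                        # frozen visual backbone
            x = vgg.avgpool(vgg.features(image)).flatten(1)
            img_vec = vgg_head(x)                    # VGG features per picture
        # No consistency check: features are concatenated for every pair.
        return self.classifier(torch.cat([img_vec, text_vec], dim=1))

logits = ConcatFusion()(torch.randn(2, 3, 224, 224), torch.randn(2, 300))
print(logits.shape)  # torch.Size([2, 2])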

6.1.2 Best Performance. In terms of performance, Ma et al. [68] obtained the best macro F1-scores
of 78.70% and 71.00% on the Twitter and Pheme datasets, respectively, for the 4-way classification
(i.e., non-rumor, false rumor, true rumor, and unverified rumor) using a tree transformer-based
model. Furthermore, Yang et al. [150] achieved an accuracy of 75.90% and F1-scores of 77.40% and
74.10% for binary true and fake news, respectively, on the Liar dataset, using a Bayesian network-based
model. In addition, Ma et al. [70] obtained an accuracy of 86.30% and F1-scores of 86.60% and
85.80% for binary rumor and non-rumor, respectively, on the Twitter dataset. They also achieved
an accuracy of 78.10% and F1-scores of 78.40% and 77.80% for binary rumor and non-rumor, respectively,
on the Pheme dataset, using a GAN-based model. Besides, Nguyen et al. [84] achieved an
accuracy of 96.20% and a macro F1-score of 97.00% for binary rumor and non-rumor on the
Weibo dataset, using a deep Markov random fields-based model.
Currently, the Twitter, Weibo, Pheme, and Liar datasets are the popular benchmarks in disinformation
detection. Although the reported performance of these models on disinformation detection or
truth discovery is promising, the experiments were usually conducted on the authors' own datasets,
some of which are not publicly available. Furthermore, there is little work comparing existing
models across more benchmark datasets.
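For reference, the macro F1-score quoted above is the unweighted mean of per-class F1, e.g., over the four rumor classes; the toy predictions below are fabricated purely to show the computation.

from sklearn.metrics import f1_score

y_true = ["non-rumor", "false", "true", "unverified", "false", "true"]
y_pred = ["non-rumor", "false", "true", "false", "false", "unverified"]
print(f1_score(y_true, y_pred, average="macro"))  # mean of per-class F1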

6.2 Future Research Directions


Although many new models have emerged over the past decades, research on disinformation detection
and truth discovery is still young. Here, we discuss some open problems that need
to be solved in the near future.
(1) Fine-grained features. At present, traditional feature engineering-based models, such as
LIWC-driven and readability-based methods, are still comparable to some state-of-the-art deep
learning-based models, especially when the amount of labeled data is not very large. To our
knowledge, no publication has made a thorough comparison of different feature engineering-based
models across different platforms, such as Twitter, Weibo, web claims, Facebook, and Wikipedia.
It is necessary to distinguish common platform-independent features from platform-specific
features on different platforms, which would help us understand different communities better.
(2) Early detection algorithms. Current algorithms are effective when we have enough labeled
data, especially propagation-based approaches, which model propagation patterns based
on source posts and their subsequent reactions, e.g., retweets. This inevitably delays
disinformation detection, because more propagation information must first accumulate. Therefore,
performing disinformation detection using only source claims/posts is quite promising, since
debunking disinformation as early as possible matters in many real-world applications; a toy
source-only baseline is sketched below.
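As a minimal illustration (fabricated posts and labels, TF-IDF plus logistic regression; not a surveyed system), a source-only classifier needs nothing beyond the claim text itself:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Fabricated source posts only -- no retweets or propagation features.
posts = ["Vaccine X causes Y, doctors are hiding it!!",
         "City reports quarterly budget figures",
         "Secret memo PROVES the election was rigged",
         "University publishes peer-reviewed study"] * 30
labels = [1, 0, 1, 0] * 30   # 1 = rumor, 0 = non-rumor (toy labels)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(posts, labels)
print(clf.predict(["Leaked memo PROVES officials are hiding the truth"]))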
(3) Unify disinformation detection with truth discovery. All existing approaches focus on
either disinformation detection or truth discovery, treating the two as independent tasks. However,
as we mentioned before, the two tasks can be unified, e.g., through EM-based, graph-based,
iteration-based, or deep neural network-based models. Since the two tasks share some latent common
characteristics, multi-task learning is a feasible way to integrate them simultaneously; a
shared-encoder sketch follows.
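A minimal PyTorch sketch of such a multi-task design (our own illustration: a shared encoder, a disinformation head, and a truth-score head trained with a summed loss over random stand-in data):

import torch
import torch.nn as nn

class UnifiedModel(nn.Module):
    def __init__(self, in_dim=128, hidden=64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.disinfo_head = nn.Linear(hidden, 2)   # fake vs. real
        self.truth_head = nn.Linear(hidden, 1)     # claim truth score

    def forward(self, x):
        z = self.shared(x)                         # representation shared by both tasks
        return self.disinfo_head(z), self.truth_head(z).squeeze(-1)

model = UnifiedModel()
x = torch.randn(32, 128)                           # 32 encoded claims
logits, scores = model(x)
loss = (nn.CrossEntropyLoss()(logits, torch.randint(0, 2, (32,)))
        + nn.MSELoss()(scores, torch.rand(32)))    # joint objective
loss.backward()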
(4) Privacy-aware disinformation detection. Currently, all disinformation detection algorithms
are privacy-ignorant. However, users' posts, claims, and answers may carry private information; for
example, the economic status of a user can be inferred from the trust scores and answers of the user. It
is risky to expose such information to an honest-but-curious server, and dangerous to share it
with servers hosting malicious users or running malicious programs. Therefore, it is urgent
to design privacy-aware disinformation detection algorithms, although it is full of challenges;
one possible building block is sketched below.
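As one possible ingredient, the sketch below applies Laplace noise to a user's locally computed answers before upload, in the style of local differential privacy; the mechanism and parameters are illustrative, not a complete protocol.

import numpy as np

def laplace_perturb(answers, epsilon, sensitivity=1.0):
    # Laplace(sensitivity/epsilon) noise: smaller epsilon means stronger privacy.
    return answers + np.random.laplace(0.0, sensitivity / epsilon, answers.shape)

user_answers = np.array([1.0, 0.0, 1.0, 1.0])       # raw (private) claim votes
print(laplace_perturb(user_answers, epsilon=0.5))   # what the server receives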
(5) Language-related disinformation detection. Most existing approaches focus on disinformation
detection in English, whereas work on Chinese and other languages is relatively scarce.
In fact, Chinese differs considerably from English in its cohesion and coherence characteristics
within and across documents; for example, anaphora and ellipsis are more common in Chinese
than in English. Therefore, language-specific disinformation detection solutions are expected in
the future.
(6) Semantic consistency judgment for picture-text pairs. Current representative multi-modal
approaches for disinformation detection neglect the judgment of semantic consistency between
a source post and its corresponding picture; they generally extract visual features for all
pictures indiscriminately, which inevitably introduces noise and reduces detection performance. In
fact, if we could filter out pictures inconsistent with a source post in advance, then both running time
and detection performance should improve accordingly; a filtering sketch follows.
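A sketch of such a filter (untrained projections for illustration; in practice they would be learned from aligned picture-text pairs, and the 4096-d/300-d input sizes merely mirror the VGG and word-embedding features above):

import torch
import torch.nn as nn
import torch.nn.functional as F

img_proj = nn.Linear(4096, 256)   # e.g., from VGG image features
txt_proj = nn.Linear(300, 256)    # e.g., from averaged word embeddings

def is_consistent(img_vec, txt_vec, threshold=0.3):
    # Cosine similarity in a shared space decides whether the pair matches.
    sim = F.cosine_similarity(img_proj(img_vec), txt_proj(txt_vec), dim=-1)
    return bool(sim > threshold)

img_vec, txt_vec = torch.randn(4096), torch.randn(300)
if not is_consistent(img_vec, txt_vec):
    print("Inconsistent pair: drop the picture before multi-modal fusion.")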

ACKNOWLEDGMENTS
The authors would like to thank the anonymous reviewers for their insightful comments on this article.

REFERENCES
[1] Charu C. Aggarwal. 2013. Managing and Mining Sensor Data. Springer.
[2] Samah M. Alzanin and Aqil M. Azmi. 2018. Detecting rumors in social media: A survey. Procedia Comput. Sci. 142
(2018), 294–300. DOI:10.1016/j.procs.2018.10.495
[3] Ipek Baris, Lukas Schmelzeisen, and Steffen Staab. 2019. CLEARumor at semEval-2019 task 7: Convolving ELMo
against rumors. In Proceedings of the 13th International Workshop on Semantic Evaluation. 1105–1109. DOI:10.18653/
v1/S19-2193
[4] Alex Beutel, Kenton Murray, Christos Faloutsos, and Alexander J. Smola. 2014. CoBaFi: Collaborative Bayesian filter-
ing. In Proceedings of the International Conference Companion on World Wide Web (WWW’14). 97–107. DOI:10.1145/
2566486.2568040
[5] Meghana Moorthy Bhat and Srinivasan Parthasarathy. 2020. How effectively can machines defend against machine-
generated fake news? An empirical study. In Proceedings of the 1st Workshop on Insights from Negative Results in NLP.
48–53. DOI:10.18653/v1/2020.insights-1.7
[6] Tian Bian, Xi Xiao, Tingyang Xu, Peilin Zhao, Wenbing Huang, Yu Rong, and Junzhou Huang. 2020. Rumor detection
on social media with bi-directional graph convolutional networks. In Proceedings of the 34th AAAI Conference on
Artificial Intelligence (AAAI’20). 549–556. DOI:10.1609/aaai.v34i01.5393
[7] Juan Cao, Junbo Guo, Xirong Li, Zhiwei Jin, Han Guo, and Jintao Li. 2018. Automatic rumor detection on microblogs:
A survey. arXiv: 1807.03505v1 (2018).
[8] Carlos Castillo, Marcelo Mendoza, and Barbara Poblete. 2011. Information credibility on Twitter. In Proceedings of
the International Conference Companion on World Wide Web (WWW’11). 675–684. DOI:10.1145/1963405.1963500
[9] Lei Chen, Zhongyu Wei, Jing Li, Baohua Zhou, Qi Zhang, and Xuanjing Huang. 2020. Modeling evolution of message
interaction for rumor resolution. In Proceedings of the 28th International Conference on Computational Linguistics
(COLING’20). 6377–6387. DOI:10.18653/v1/2020.coling-main.561
[10] Mingxi Cheng, Shahin Nazarian, and Paul Bogdan. 2020. VRoC: Variational autoencoder-aided multi-task rumor
classifier based on Text. In Proceedings of the International Conference of World Wide Web (WWW’20). 2892–2898.
DOI:10.1145/3366423.3380054

[11] Yohan Chon, Nicholas D. Lane, Fan Li, Hojung Cha, and Feng Zhao. 2012. Automatically characterizing places with
opportunistic crowdsensing using smartphones. In Proceedings of the ACM Conference on Ubiquitous Computing (Ubi-
Comp’12). 481–490. DOI:10.1145/2370216.2370288
[12] Dephine Christin, Andreas Reinhardt, Salil S. Kanhere, and Matthias Hollick. 2011. A survey on privacy in mobile
participatory sensing applications. J. Syst. Softw. 84, 11 (2011), 1928–1946. DOI:10.1016/j.jss.2011.06.073
[13] Diane J. Cook and Lawrence B. Holder. 2011. Sensor selection to support practical use of health-monitoring smart
environments. Data Mining Knowl. Discov. 1, 4 (2011), 339–351. DOI:10.1002/widm.20
[14] Ronald Cramer, Ivan Damgård, and Jesper B. Nielsen. 2001. Multiparty computation from threshold homomorphic
encryption. Lect. Notes Comput. Sci. 7, 14 (2001), 280–299. DOI:10.7146/brics.v7i14.20141
[15] A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm.
J. Roy. Statist. Soc. Series B (Methodol.) (1977), 1–38. DOI:10.1111/j.2517-6161.1977.tb01600.x
[16] Nicholas DiFonzo and Prashant Bordia. 2007. Rumor, gossip and urban legends. Diogenes 54, 1 (2007), 19–35. DOI:10.
1177/0392192107073433
[17] Diego Esteves, Aniketh Janardhan Reddy, Piyush Chawla, and Lehmann Jens. 2018. Belittling the source: Trustwor-
thiness indicators to obfuscate fake news on the web. In Proceedings of the 1st Workshop on Fact Extraction and
Verification. DOI:10.18653/v1/W18-5508
[18] Martin Fajcik, Lukáš Burget, and Pavel Smrz. 2019. BUT-FIT at semEval-2019 task 7: Determining the rumour
stance with pre-trained deep bidirectional transformers. In Proceedings of the 13th International Workshop on Semantic
Evaluation. DOI:10.18653/v1/S19-2192
[19] Yang Fan, Xiaohui Yu, Liu Yang, and Yang Min. 2012. Automatic detection of rumor on Sina Weibo. In Proceedings of
the ACM SIGKDD Workshop on Mining Data Semantics. DOI:10.1145/2350190.2350203
[20] Miriam Fernandez and Harith Alani. 2018. Online misinformation: Challenges and future directions. In Proceedings
of the International Conference on World Wide Web Companion: the Web Conference Companion. 595–602. DOI:10.
1145/3184558.3188730
[21] William Ferreira and Andreas Vlachos. 2016. Emergent: A novel data-set for stance classification. In Proceedings
of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language
Technologies (NAACL-HLT’16). 1163–1168. DOI:10.18653/v1/N16-1138
[22] Alvaro Figueira, Nuno Guimaraes, and Luis Torgo. 2018. Current state of the art to detect fake news in social media:
Global trendings and next challenges. In Proceedings of the International Conference on Web Information Systems and
Technologies (ICWIST’18). 332–339. DOI:10.5220/0007188503320339
[23] Raghu K. Ganti, Nam Pham, Hossein Ahmadi, Saurabh Nangia, and Tarek F. Abdelzaher. 2010. GreenGPS: A partici-
patory sensing fuel-efficient maps application. In Proceedings of the MobiSys. 151–164. DOI:10.1145/1814433.1814450
[24] Wang Guan, Sihong Xie, Liu Bing, and Philip S. Yu. 2011. Review graph based online store review spammer detection.
In Proceedings of the IEEE International Conference on Data Mining (ICDM’11). 1242–1247. DOI:10.1109/ICDM.2011.
124
[25] Maike Guderlei and Aßenmacher Matthias. 2020. Evaluating unsupervised representation learning for detecting
stances of fake news. In Proceedings of the 28th International Conference on Computational Linguistics (COLING’20).
6339–6349. DOI:10.18653/v1/2020.coling-main.558
[26] David J. Hand and Robert J. Till. 2001. A simple generalisation of the area under the ROC curve for multiple class
classification problems. Mach. Learn. 45, 2 (2001), 171–186. DOI:10.1023/A:1010920819831
[27] Andreas Hanselowski, Avinesh P. V. S., Benjamin Schiller, Felix Caspelherr, Debanjan Chaudhuri, Christian M. Meyer,
and Iryna Gurevych. 2018. A retrospective analysis of the fake news challenge stance detection task. In Proceedings
of the International Conference on Computational Linguistics (COLING'18). 1859–1874.
[28] Peter Hernon. 1995. Disinformation and misinformation through the internet: Findings of an exploratory study.
Government Inform. Quart. 12, 2 (1995), 133–139. DOI:10.1016/0740-624X(95)90052-7
[29] Bryan Hooi, Neil Shah, Alex Beutel, Stephan Günnemann, Leman Akoglu, Mohit Kumar, Disha Makhija, and Christos
Faloutsos. 2016. BIRDNEST: Bayesian inference for ratings-fraud detection. In Proceedings of the SIAM International
Conference on Data Mining (SDM'16). DOI:10.1137/1.9781611974348.56
[30] Chao Huang, Dong Wang, and Nitesh Chawla. 2016. Towards time-sensitive truth discovery in social sensing appli-
cations. In Proceedings of the IEEE International Conference on Mobile Ad Hoc & Sensor Systems. DOI:10.1109/MASS.
2015.39
[31] Zhiwei Jin, Juan Cao, Yongdong Zhang, and Jiebo Luo. 2016. News verification by exploiting conflicting social view-
points in microblogs. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI’16). 2972–2978.
DOI:10.5555/3016100.3016318
[32] Zhiwei Jin, Juan Cao, Yongdong Zhang, Jianshe Zhou, and Qi Tian. 2017. Novel visual and statistical image features
for microblogs news verification. IEEE Trans. Multimedia 19, 3 (2017), 1–38. DOI:10.1109/TMM.2016.2617078

[33] Nitin Jindal and Bing Liu. 2007. Analyzing and detecting review spam. In Proceedings of the IEEE International Con-
ference on Data Mining (ICDM’07). 547–552. DOI:10.1109/ICDM.2007.68
[34] Jing Ma, Wei Gao, Zhongyu Wei, Yueming Lu, and Kam-Fai Wong. 2015. Detect rumors using time series of social
context information on microblogging websites. In Proceedings of the ACM International Conference on Information
and Knowledge Management (CIKM'15). 1751–1754. DOI:10.1145/2806416.2806607
[35] Shu Kai, Suhang Wang, Amy Sliva, Jiliang Tang, and Huan Liu. 2017. Fake news detection on social media: A data
mining perspective. ACM SIGKDD Explor. Newslett. 19, 1 (2017), 22–36. DOI:10.1145/3137597.3137600
[36] Hamid Karimi, Proteek Chandan Roy, Sari Saba Sadiya, and Jiliang Tang. 2018. Multi-source multi-class fake news
detection. In Proceedings of the International Conference on Computational Linguistics (COLING’18). 1546–1557.
[37] Hamid Karimi and Jiliang Tang. 2019. Learning hierarchical discourse-level structure for fake news detection. In
Proceedings of the International Conference on Annual Conference of the North American Chapter of the Association for
Computational Linguistics. DOI:10.18653/v1/N19-1347
[38] Wu Ke, Yang Song, and Kenny Q. Zhu. 2015. False rumors detection on Sina Weibo by propagation structures. In
Proceedings of the IEEE International Conference on Data Engineering. 651–662. DOI:10.1109/ICDE.2015.7113322
[39] Junaed Younus Khan, Md. Tawkat Islam Khondaker, Anindya Iqbal, and Sadia Afroz. 2019. A benchmark study on
machine learning methods for fake news detection. arXiv: 1905.04749v1 (2019).
[40] Dhruv Khattar, Jaipal Singh Goud, Manish Gupta, and Vasudeva Varma. 2019. MVAE: Multimodal variational au-
toencoder for fake news detection. In Proceedings of the International Conference of World Wide Web (WWW’19).
2915–2921. DOI:10.1145/3308558.3313552
[41] Ling Min Serena Khoo, Hai Leong Chieu, Zhong Qian, and Jing Jiang. 2020. Interpretable rumor detection in mi-
croblogs by attending to user interactions. In Proceedings of the 34th AAAI Conference on Artificial Intelligence
(AAAI’20). 8783–8790. DOI:10.1609/aaai.v34i05.6405
[42] Jooyeon Kim, Behzad Tabibian, Alice Oh, Bernhard Schoelkopf, and Manuel Gomez-Rodriguez. 2017. Leveraging the
crowd to detect and reduce the spread of fake news and misinformation. In Proceedings of the ACM International
Conference on Web Search and Data Mining (WSDM’17). DOI:10.1145/3159652.3159734
[43] Elena Kochkina and Maria Liakata. 2020. Estimating predictive uncertainty for rumour verification models. In Pro-
ceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL'20). 6964–6981. DOI:10.
18653/v1/2020.acl-main.623
[44] Elena Kochkina, Maria Liakata, and Isabelle Augenstein. 2017. Turing at SemEval-2017 task 8: Sequential approach
to rumour stance classification with branch-LSTM. In Proceedings of the 11th International Workshop on Semantic
Evaluations (SemEval’2017). 475–480. DOI:10.18653/v1/S17-2083
[45] Nir Kshetri and Jeffrey Voas. 2017. The economics of “fake news.” IT Professional 6 (2017), 8–12. DOI:10.1109/MITP.
2017.4241459
[46] Sumeet Kumar and Kathleen Carley. 2019. Tree LSTMs with convolution units to predict stance and rumor veracity in
social media conversations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
(ACL’19). 1173–1179. DOI:10.18653/v1/P19-1498
[47] Srijan Kumar, Bryan Hooi, Disha Makhija, Mohit Kumar, Christos Faloutsos, and V. S. Subrahamanian. 2018. Rev2:
Fraudulent user prediction in rating platforms. In Proceedings of the ACM International Conference on Web Search and
Data Mining (WSDM’18). 333–341. DOI:10.1145/3159652.3159729
[48] Srijan Kumar and Neil Shah. 2018. False information on web and social media: A survey. arXiv: 1804.08559v1 (2018).
[49] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the web: Impact, characteristics, and detec-
tion of Wikipedia hoaxes. In Proceedings of the International Conference Companion on World Wide Web (WWW’16).
591–602. DOI:10.1145/2872427.2883085
[50] S. Kwon, M. Cha, and K. Jung. 2017. Rumor detection over varying time windows. PLoS One 12, 1 (2017), 1–19.
DOI:10.1371/journal.pone.0168344
[51] Sejeong Kwon, Meeyoung Cha, Kyomin Jung, Chen Wei, and Yajun Wang. 2013. Prominent features of rumor prop-
agation in online social media. In Proceedings of the IEEE International Conference on Data Mining (ICDM’13). 1103–
1108. DOI:10.1109/ICDM.2013.61
[52] Thai Le, Suhang Wang, and Dongwon Lee. 2020. MALCOM: Generating malicious comments to attack neural fake
news detection models. In Proceedings of the IEEE International Conference on Data Mining (ICDM’20). 282–291.
DOI:10.1109/ICDM50108.2020.00037
[53] Newton Lee. 2013. Misinformation and Disinformation. Springer. DOI:10.1007/978-1-4614-5308-6
[54] Fangtao Li, Minlie Huang, Yang Yi, and Xiaoyan Zhu. 2011. Learning to identify review spam. In Proceedings of the
International Joint Conference on Artificial Intelligence (IJCAI’11). 2488–2493. DOI:10.5555/2283696.2283811
[55] Huayi Li, Zhiyuan Chen, Arjun Mukherjee, Bing Liu, and Jidong Shao. 2015. Analyzing and detecting opinion spam
on a large-scale dataset via temporal and spatial patterns. In Proceedings of the International AAAI Conference on Web
and Social Media. 634–637.

[56] Jiwei Li, Myle Ott, Claire Cardie, and Eduard Hovy. 2014. Towards a general rule for identifying deceptive opinion
spam. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’14). 1566–1576.
DOI:10.3115/v1/P14-1147
[57] Jiawen Li, Yudianto Sujana, and Hung-Yu Kao. 2020. Exploiting microblog conversation structures to detect rumors.
In Proceedings of the 28th International Conference on Computational Linguistics (COLING’20). 5420–5429. DOI:10.
18653/v1/2020.coling-main.473
[58] Qi Li, Yaliang Li, Jing Gao, Bo Zhao, Wei Fan, and Jiawei Han. 2014. Resolving conflicts in heterogeneous data by
truth discovery and source reliability estimation. In Proceedings of the ACM SIGMOD International Conference on
Management of Data (SIGMOD'14). 1187–1198. DOI:10.1145/2588555.2610509
[59] Quanzhi Li, Qiong Zhang, and Luo Si. 2019. Rumor detection by exploiting user credibility information, attention
and multi-task learning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
(ACL’19). 5047–5058. DOI:10.18653/v1/P19-1113
[60] Xian Li, Xin Luna Dong, Kenneth Lyons, Weiyi Meng, and Divesh Srivastava. 2013. Truth finding on the deep web:
Is the problem solved? In Proceedings of the International Conference on Very Large Data Bases (VLDB’13). 97–102.
DOI:10.14778/2535568.2448943
[61] Yaliang Li, Jing Gao, Chuishi Meng, Qi Li, and Jiawei Han. 2015. A survey on truth discovery. ACM SIGKDD Explor.
Newslett. 17, 2 (2015), 1–16. DOI:10.1145/2897350.2897352
[62] Yaliang Li, Chenglin Miao, Lu Su, Jing Gao, Qi Li, Bolin Ding, Zhan Qin, and Kui Ren. 2018. An efficient two-layer
mechanism for privacy-preserving truth discovery. In Proceedings of the 24th ACM SIGKDD International Conference
on Knowledge Discovery & Data Mining (KDD’18). 1705–1714. DOI:10.1145/3219819.3219998
[63] Yuming Lin, Zhu Tao, Xiaoling Wang, Jingwei Zhang, and Aoying Zhou. 2014. Towards online review spam detection.
In Proceedings of the International Conference Companion on World Wide Web (WWW’14). 341–342. DOI:10.1145/
2567948.2577293
[64] Xiaomo Liu, Armineh Nourbakhsh, Quanzhi Li, Rui Fang, and Sameena Shah. 2015. Real-time rumor debunking on
Twitter. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM'15).
1867–1870. DOI:10.1145/2806416.2806651
[65] Yuxian Liu, Shaohua Tang, Hao-Tian Wu, and Xinglin Zhang. 2019. RTPT: A framework for real-time privacy-
preserving truth discovery on crowdsensed data streams. Comput. Netw. 148 (2019), 349–360. DOI:10.1016/j.comnet.
2018.11.018
[66] Yang Liu and Yi-Fang Brook Wu. 2018. Early detection of fake news on social media through propagation path
classification with recurrent and convolutional networks. In Proceedings of the 32nd AAAI Conference on Artificial
Intelligence (AAAI'18). 354–361.
[67] Yi-Ju Lu and Cheng-Te Li. 2020. GCAN: Graph-aware co-attention networks for explainable fake news detection
on social media. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL’20).
505–514. DOI:10.18653/v1/2020.acl-main.48
[68] Jing Ma and Wei Gao. 2020. Debunking rumors on Twitter with tree transformer. In Proceedings of the 28th Interna-
tional Conference on Computational Linguistics (COLING’20). 5455–5466. DOI:10.18653/v1/2020.coling-main.476
[69] Jing Ma, Wei Gao, and Wong Kam-Fai. 2018. Detect rumor and stance jointly by neural multi-task learning. In Pro-
ceedings of the International Conference Companion on World Wide Web (WWW’18). 585–593. DOI:10.1145/3184558.
3188729
[70] Jing Ma, Wei Gao, and Wong Kam-Fai. 2019. Detect rumors on Twitter by promoting information campaigns with gen-
erative adversarial learning. In Proceedings of the International Conference Companion on World Wide Web (WWW’19).
3049–3055. DOI:10.1145/3308558.3313741
[71] Jing Ma, Wei Gao, Prasenjit Mitra, Sejeong Kwon, Bernard J. Jansen, Kam-Fai Wong, and Meeyoung Cha. 2016. De-
tecting rumors from microblogs with recurrent neural networks. In Proceedings of the International Joint Conference
on Artificial Intelligence (IJCAI’16). 3818–3824. DOI:10.5555/3061053.3061153
[72] Jing Ma, Wei Gao, and Kam-Fai Wong. 2017. Detect rumors in microblog posts using propagation structure via kernel
learning. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’17). 708–717.
DOI:10.18653/v1/P17-1066
[73] Jing Ma, Wei Gao, and Kam-Fai Wong. 2018. Rumor detection on Twitter with tree-structured recursive neural net-
works. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’18). 1980–1989.
DOI:10.18653/v1/P18-1184
[74] Nicolas Maisonneuve, Matthias Stevens, Maria E. Niessen, Peter Hanappe, and Luc Steels. 2009. Citizen noise pollu-
tion monitoring. In Proceedings of the 10th Annual International Conference on Digital Government Research, Partner-
ships for Public Innovation. 96–103. DOI:10.1145/1556176.1556198
[75] Jermaine Marshall, Arturo Argueta, and Dong Wang. 2017. A neural network approach for truth discovery in social
sensing. In Proceedings of the IEEE International Conference on Mobile Ad Hoc & Sensor Systems. 343–347. DOI:10.
1109/MASS.2017.26

[76] Jermaine Marshall, Munira Syed, and Dong Wang. 2016. Hardness-aware truth discovery in social sensing ap-
plications. In Proceedings of the International Conference on Distributed Computing in Sensor Systems. 143–152.
DOI:10.1109/DCOSS.2016.9
[77] Chenglin Miao, Wenjun Jiang, Lu Su, Yaliang Li, Suxin Guo, Zhan Qin, Houping Xiao, Jing Gao, and Kui Ren. 2015.
Cloud-enabled privacy-preserving truth discovery in crowd sensing systems. In Proceedings of the ACM Conference
on Embedded Networked Sensor Systems. 183–196. DOI:10.1145/2809695.2809719
[78] Chenglin Miao, Lu Su, Wenjun Jiang, Yaliang Li, and Miaomiao Tian. 2017. A lightweight privacy-preserving truth discovery framework for mobile crowd
sensing systems. In Proceedings of the International Conference on Computer Communications (INFOCOM’17). 1–9.
DOI:10.1109/INFOCOM.2017.8057114
[79] Tanushree Mitra and Eric Gilbert. 2015. CREDBANK: A large-scale social media corpus with associated credibility annotations. In
Proceedings of the 9th International AAAI Conference on Web and Social Media (ICWSM’15). 258–267.
[80] Tanushree Mitra, Graham P. Wright, and Eric Gilbert. 2017. A parsimonious language model of social media credibil-
ity across disparate events. In Proceedings of the ACM Conference on Computer Supported Cooperative Work & Social
Computing (CSCW’17). 126–145. DOI:10.1145/2998181.2998351
[81] Federico Monti, Fabrizio Frasca, Davide Eynard, Damon Mannion, and Michael M. Bronstein. 2019. Fake news detec-
tion on social media using geometric deep learning. arXiv: 1902.06673v1 (2019).
[82] Arjun Mukherjee, Liu Bing, and Natalie Glance. 2012. Spotting fake reviewer groups in consumer reviews. In Pro-
ceedings of the International Conference Companion on World Wide Web (WWW’12). 191–200. DOI:10.1145/2187836.
2187863
[83] Subhabrata Mukherjee and Gerhard Weikum. 2015. Leveraging joint interactions for credibility analysis in news com-
munities. In Proceedings of the ACM International Conference on InfomIation and KnowIedge Management (CIKM’15).
353–362. DOI:10.1145/2806416.2806537
[84] Duc Minh Nguyen, Tien Huu Do, Robert Calderbank, and Deligiannis Nikos. 2019. Fake news detection using deep
Markov random fields. In Proceedings of the Conference of the North American Chapter of the Association for Compu-
tational Linguistics: Human Language Technologies (NAACL-HLT’19). 1391–1400. DOI:10.18653/v1/N19-1141
[85] Myle Ott, Claire Cardie, and Jeffrey T. Hancock. 2013. Negative deceptive opinion spam. In Proceedings of the An-
nual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language
Technologies (NAACL-HLT'13). 497–501.
[86] Myle Ott, Yejin Choi, Claire Cardie, and Jeffrey T. Hancock. 2011. Finding deceptive opinion spam by any stretch
of the imagination. In Proceedings of the Annual Meeting of the Association for Computational Linguistics: Human
Language Technologies (ACL-HLT’11). 309–319. DOI:10.5555/2002472.2002512
[87] Demetris Paschalides, Chrysovalantis Christodoulou, Rafael Andreou, George Pallis, Marios D. Dikaiakos, Alexan-
dros Kornilakis, and Evangelos Markatos. 2019. Check-It: A plugin for detecting and reducing the spread of fake
news and misinformation on the web. In IEEE/WIC/ACM International Conference on Web Intelligence. 298–302.
DOI:10.1145/3350546.3352534
[88] Kashyap Popat, Subhabrat Mukherjee, Andrew Yates, and Weikum Gerhard. 2018. DeClarE: Debunking fake news
and false claims using evidence-aware deep learning. In Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP’18). 22–32. DOI:10.18653/v1/D18-1003
[89] Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, and Benno Stein. 2018. A stylometric inquiry into
hyperpartisan and fake news. In Proceedings of the Annual Meeting of the Association for Computational Linguistics
(ACL’18). 231–240. DOI:10.18653/v1/P18-1022
[90] Piotr Przybyla. 2020. Capturing the style of fake news. In Proceedings of the 34th AAAI Conference on Artificial Intel-
ligence (AAAI’20). 490–497. DOI:10.1609/aaai.v34i01.5386
[91] Vahed Qazvinian, Emily Rosengren, Dragomir R. Radev, and Qiaozhu Mei. 2011. Rumor has it: Identifying mis-
information in microblogs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing
(EMNLP’11). 1589–1599. DOI:10.5555/2145432.2145602
[92] Peng Qi, Juan Cao, Tianyun Yang, Junbo Guo, and Jintao Li. 2019. Exploiting multi-domain visual information
for fake news detection. In Proceedings of the IEEE International Conference on Data Mining (ICDM’19). 518–527.
DOI:10.1109/ICDM.2019.00062
[93] Feng Qian, Chengyue Gong, Karishma Sharma, and Yan Liu. 2018. Neural user response generator: Fake news
detection with collective user intelligence. In Proceedings of the 27th International Joint Conference on Artificial
Intelligence and the 23rd European Conference on Artificial Intelligence (IJCAI-ECAI’18). 3834–3840. DOI:10.24963/
ijcai.2018/533
[94] Hannah Rashkin, Eunsol Choi, and Jang Jin Yea. 2017. Truth of varying shades: Analyzing language in fake news
and political fact-checking. In Proceedings of the Conference on Empirical Methods in Natural Language Processing
(EMNLP’17). 2931–2937. DOI:10.18653/v1/D17-1317

[95] Bhavtosh Rath, Gao Wei, Ma Jing, and Jaideep Srivastava. 2017. From retweet to believability: Utilizing trust to
identify rumor spreaders on Twitter. In Proceedings of the IEEE/ACM International Conference on Advances in Social
Networks Analysis & Mining. 179–186. DOI:10.1145/3110025.3110121
[96] Shebuti Rayana and Leman Akoglu. 2015. Collective opinion spam detection: Bridging review networks and meta-
data. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD’15).
985–994. DOI:10.1145/2783258.2783370
[97] Yuxiang Ren, Bo Wang, Jiawei Zhang, and Yi Chang. 2020. Adversarial active learning based heterogeneous graph
neural network for fake news detection. In Proceedings of the IEEE International Conference on Data Mining (ICDM’20).
452–461. DOI:10.1109/ICDM50108.2020.00054
[98] Verónica Pérez-Rosas, Bennett Kleinberg, Alexandra Lefevre, and Rada Mihalcea. 2018. Automatic detection of fake
news. In Proceedings of the International Conference on Computational Linguistics (COLING'18). 3391–3401.
[99] Nir Rosenfeld, Aron Szanto, and David C. Parkes. 2020. A kernel of truth: Determining rumor veracity on Twitter
by diffusion pattern alone. In Proceedings of the International Conference of World Wide Web (WWW’20). 1018–1028.
DOI:10.1145/3366423.3380180
[100] Victoria L. Rubin. 2010. On deception and deception detection: Content analysis of computer-mediated stated be-
liefs. In Proceedings of the American Society for Information Science & Technology Conference (ASIST’10) 47, 1, 1–10.
DOI:10.1002/meet.14504701124
[101] Natali Ruchansky, Sungyong Seo, and Liu Yan. 2017. CSI: A hybrid deep model for fake news. In Proceedings of the
ACM International Conference on Information and Knowedge Management (CIKM’17). 797–806. DOI:10.1145/3132847.
3132877
[102] Vlad Sandulescu and Martin Ester. 2016. Detecting singleton review spammers using semantic similarity. In Pro-
ceedings of the International Conference Companion on World Wide Web (WWW’16). 971–976. DOI:10.1145/2740908.
2742570
[103] Sohan De Sarkar, Fan Yang, and Arjun Mukherjee. 2018. Attending sentences to detect satirical fake news. In Pro-
ceedings of the International Conference on Computational Linguistics (COLING’18). 3371–3380.
[104] Chengcheng Shao, Giovanni Luca Ciampaglia, Alessandro Flammini, and Filippo Menczer. 2016. Hoaxy: A platform
for tracking online misinformation. In Proceedings of the International Conference Companion on World Wide Web
(WWW’16). 745–750. DOI:10.1145/2872518.2890098
[105] Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Alessandro Flammini, and Filippo Menczer. 2018. The
spread of fake news by social bots. Nat. Commun. 9 (2018), 1–9. DOI:10.1038/s41467-018-06930-7
[106] John Shawe-Taylor and Nello Cristianini. 2004. Kernel Methods for Pattern Analysis. Cambridge University Press.
DOI:10.1017/CBO9780511809682
[107] Victor S. Sheng, Foster J. Provost, and Panagiotis G. Ipeirotis. 2008. Get another label? Improving data quality and
data mining using multiple, noisy labelers. In Proceedings of the ACM SIGKDD International Conference on Knowledge
Discovery & Data Mining (KDD’08). 614–622. DOI:10.1145/1401890.1401965
[108] Aashish Sheshadri and Matthew Lease. 2013. SQUARE: A benchmark for research on computing crowd
consensus. In Proceedings of the 1st AAAI Conference on Human Computation (HCOMP'13). 156–164.
[109] Kai Shu, Limeng Cui, Suhang Wang, Dongwon Lee, and Huan Liu. 2019. dEFEND: Explainable fake news detection.
In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD’19).
395–405. DOI:10.1145/3292500.3330935
[110] Kai Shu, Deepak Mahudeswaran, Suhang Wang, Dongwon Lee, and Huan Liu. 2020. FakeNewsNet: A data repository
with news content, social context and dynamic information for studying fake news on social media. Big Data 8, 3
(2020), 171–188. DOI:10.1089/big.2020.0062
[111] Kai Shu, Suhang Wang, and Huan Liu. 2018. Exploiting tri-relationship for fake news detection. In Proceedings of the
32nd AAAI Conference on Artificial Intelligence (AAAI'18). 1–10.
[112] Kai Shu, Suhang Wang, and Huan Liu. 2019. Beyond news contents: The role of social context for fake news detection.
In Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM'19).
312–320. DOI:10.1145/3289600.3290994
[113] Shivangi Singhal, Anubha Kabra, Mohit Sharma, Rajiv Ratn Shah, Tanmoy Chakraborty, and Ponnurangam
Kumaraguru. 2020. SpotFake+: A multimodal framework for fake news detection via transfer learning. In Proceedings
of the 34th AAAI Conference on Artificial Intelligence (AAAI’20). 13915–13916. DOI:10.1609/aaai.v34i10.7230
[114] Eugenio Tacchini, Gabriele Ballarin, Marco L. Della Vedova, Stefano Moret, and Luca de Alfaro. 2017. Some like it
hoax: Automated fake news detection in social networks. In Proceedings of the 2nd Workshop on Data Science for
Social Good. 1–15.
[115] Reuben Tan, Bryan Plummer, and Kate Saenko. 2020. Detecting cross-modal inconsistency to defend against neu-
ral fake news. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’20).
2081–2106. DOI:10.18653/v1/2020.emnlp-main.163

[116] Xiaoting Tang, Cong Wang, Xingliang Yuan, and Qian Wang. 2018. Non-interactive privacy-preserving truth dis-
covery in crowd sensing applications. In Proceedings of the International Conference on Computer Communications
(INFOCOM’18). 1988–1996. DOI:10.1109/INFOCOM.2018.8486371
[117] Marco Del Tredici and Fernández Raquel. 2020. Words are the window to the soul: Language-based user represen-
tations for fake news detection. In Proceedings of the 28th International Conference on Computational Linguistics
(COLING’20). 5467–5479. DOI:10.18653/v1/2020.coling-main.477
[118] Rudra M. Tripathy, Amitabha Bagchi, and Sameep Mehta. 2010. A study of rumor control strategies on social
networks. In Proceedings of the ACM International Conference on Information & Knowledge Management. 1817–1820.
DOI:10.1145/1871437.1871737
[119] Sebastian Tschiatschek, Adish Singla, and Manuel Gomez Rodriguez. 2018. Fake news detection in social networks
via crowd signals. In Proceedings of the International Conference Companion on World Wide Web (WWW’18). 517–524.
DOI:10.1145/3184558.3188722
[120] Yimin Chen, Victoria L. Rubin, Niall J. Conroy, and Sarah Cornwell. 2016. Fake news or truth? Using satirical cues
to detect potentially misleading news. In Proceedings of the Annual Conference of the North American Chapter of the
Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’16). 7–17. DOI:10.18653/v1/
W16-0802
[121] Nguyen Vo and Kyumin Lee. 2020. Where are the facts? Searching for fact-checked information to alleviate the spread
of fake news. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’20).
7717–7731. DOI:10.18653/v1/2020.emnlp-main.621
[122] Svitlana Volkova and Jin Yea Jang. 2018. Misleading or falsification? Inferring deceptive strategies and types in online
news and social media. In Proceedings of the International Conference Companion on World Wide Web (WWW’18).
575–583. DOI:10.1145/3184558.3188728
[123] Svitlana Volkova, Kyle Shaffer, Jin Yea Jang, and Nathan Hodas. 2017. Separating facts from fiction: Linguistic models
to classify suspicious and trusted news posts on Twitter. In Proceedings of the Annual Meeting of the Association for
Computational Linguistics (ACL’17). 647–653. DOI:10.18653/v1/P17-2102
[124] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science 359, 6380 (2018),
1146–1151. DOI:10.1126/science.aap9559
[125] Dalia Attia Waguih and Laure Berti-Equille. 2014. Truth discovery algorithms: An experimental evaluation.
arXiv: 1409.6428 (2014).
[126] Biao Wang, Chen Ge, Luoyi Fu, Song Li, and Xinbing Wang. 2017. DRIMUX: Dynamic rumor influence minimiza-
tion with user experience in social networks. IEEE Transactions on Knowledge and Data Engineering 29, 10 (2017),
2168–2181. DOI:10.1109/TKDE.2017.2728064
[127] Dong Wang, Tarek Abdelzaher, and Lance Kaplan. 2015a. Social Sensing: Building Reliable Systems on Unreliable Data.
Morgan Kaufmann.
[128] Dong Wang, Tarek Abdelzaher, Lance Kaplan, Raghu Ganti, Shaohan Hu, and Hengchang Liu. 2013. Exploitation
of physical constraints for reliable social sensing. In Proceedings of the IEEE Real-time Systems Symposium. 212–223.
DOI:10.1109/RTSS.2013.29
[129] Dong Wang, Md Tanvir Al Amin, Shen Li, Lance Kaplan, Siyu Gu, Chenji Pan, Hengchang Liu, Charu Aggarwal,
Raghu K. Ganti, and Xinlei Wang. 2014. Using humans as sensors: An estimation-theoretic perspective. In Proceedings
of the International Symposium on Information Processing in Sensor Networks (IPSN’14). 35–46. DOI:10.5555/2602339.
2602344
[130] Dong Wang and Chao Huang. 2015. Confidence-aware truth estimation in social sensing applications. In Proceedings
of the 12th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON’15). 336–344.
DOI:10.1109/SAHCN.2015.7338333
[131] Dong Wang, Lance Kaplan, and Tarek F. Abdelzaher. 2014. Maximum likelihood analysis of conflicting observations
in social sensing. ACM Trans. Sensor Netw. 10, 2 (2014), 1–27. DOI:10.1145/2530289
[132] Dong Wang, Lance Kaplan, Hieu Le, and Tarek Abdelzaher. 2012. On truth discovery in social sensing: A maximum
likelihood estimation approach. In Proceedings of the ACM/IEEE International Conference on Information Processing
in Sensor Networks (IPSN’12). 233–244. DOI:10.1145/2185677.2185737
[133] Dong Wang, Jermaine Marshall, and Chao Huang. 2016. Theme-relevant truth discovery on Twitter: An estima-
tion theoretic approach. In Proceedings of the International AAAI Conference on Web & Social Media (ICWSM’16).
408–416.
[134] William Yang Wang. 2017. “Liar, liar pants on fire”: A new benchmark dataset for fake news detection. In Proceedings
of the 55th Annual Meeting of the Association for Computational Linguistics (ACL'17). 422–426. DOI:10.18653/v1/P17-2067
[135] Xuezhi Wang, Yu Cong, Simon Baumgartner, and Flip Korn. 2018. Relevant document discovery for fact-checking
articles. In Proceedings of the International Conference Companion on World Wide Web (WWW’18). 525–533.
DOI:10.1145/3184558.3188723

[136] Yaqing Wang, Fenglong Ma, Zhiwei Jin, Ye Yuan, Guangxu Xun, Kishlay Jha, Lu Su, and Jing Gao. 2018. EANN:
Event adversarial neural networks for multi-modal fake news detection. In Proceedings of the ACM SIGKDD Interna-
tional Conference on Knowledge Discovery & Data Mining (KDD’18). 849–857. DOI:10.1145/3219819.3219903
[137] Yaqing Wang, Weifeng Yang, Fenglong Ma, Jin Xu, Bin Zhong, Qiang Deng, and Jing Gao. 2020. Weak supervision for
fake news detection via reinforcement learning. In Proceedings of the 34th AAAI Conference on Artificial Intelligence
(AAAI’20). 516–523. DOI:10.1609/aaai.v34i10.7230
[138] Feng Wei, Yan Zheng, Hengrun Zhang, Zeng Kai, and Y. Thomas Hou. 2018. A survey on security, privacy and trust
in mobile crowdsourcing. IEEE Internet Things J. 5, 4 (2018), 2971–2992. DOI:10.1109/JIOT.2017.2765699
[139] Penghui Wei, Nan Xu, and Mao Wenji. 2019. Modeling conversation structure and temporal dynamics for jointly
predicting rumor stance and veracity. In Proceedings of the Conference on Empirical Methods in Natural Language
Processing and the International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). 4787–4798.
DOI:10.18653/v1/D19-1485
[140] Weiming Wen, Songwen Su, and Yu Zhou. 2018. Cross-lingual cross-platform rumor verification pivoting on mul-
timedia content. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’18).
3487–3496. DOI:10.18653/v1/D18-1385
[141] Liang Wu and Huan Liu. 2018. Tracing fake-news footprints: Characterizing social media messages by how they
propagate. In Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM'18). 1–9.
DOI:10.1145/3159652.3159677
[142] Lianwei Wu, Yuan Rao, Haolin Jin, Ambreen Nazir, and Ling Sun. 2019. Different absorption from the same sharing:
Sifted multi-task learning for fake news detection. In Proceedings of the Conference on Empirical Methods in Natural
Language Processing and the International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). 4644–
4653. DOI:10.18653/v1/D19-1471
[143] Rui Xia, Kaizhou Xuan, and Yu Jianfei. 2020. A state-independent and time-evolving network for early rumor detec-
tion in social media. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’20).
9042–9051. DOI:10.18653/v1/2020.emnlp-main.727
[144] Sihong Xie, Wang Guan, Shuyang Lin, and Philip S. Yu. 2012. Review spam detection via temporal pattern discovery.
In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD’12). 823–
831. DOI:10.1145/2339530.2339662
[145] Fan Xu, Victor S. Sheng, and Mingwen Wang. 2020. Near real-time topic-driven rumor detection in source microblogs.
Knowl.-Based Syst. 207 (2020), 1–9. DOI:10.1016/j.knosys.2020.106391
[146] Guowen Xu, Hongwei Li, Tan Chen, Dongxiao Liu, Yuanshun Dai, and Yang Kan. 2017. Achieving efficient and
privacy-preserving truth discovery in crowd sensing systems. Comput. Secur. 69 (2017), 114–126. DOI:10.1016/j.
cose.2016.11.014
[147] Guowen Xu, Hongwei Li, Dongxiao Liu, Ren Hao, Yuanshun Dai, and Xiaohui Liang. 2016. Towards efficient privacy-
preserving truth discovery in crowd sensing systems. In Proceedings of the IEEE Global Communications Conference.
1–6. DOI:10.1109/GLOCOM.2016.7842343
[148] Guowen Xu, Hongwei Li, Sen Liu, Mi Wen, and Rongxing Lu. 2019. Efficient and privacy-preserving truth discovery
in mobile crowd sensing systems. IEEE Trans. Vehic. Technol. 68, 4 (2019), 3854–3865. DOI:10.1109/TVT.2019.2895834
[149] Kai-Chou Yang, Timothy Niven, and Kao Hung-Yu. 2019. Fake news detection as natural language inference. arXiv:
1907.07347v1 (2019).
[150] Shuo Yang, Kai Shu, Suhang Wang, Renjie Gu, Fan Wu, and Huan Liu. 2019. Unsupervised fake news detection on
social media: A generative approach. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI'19).
5644–5651.
[151] Xiaoyu Yang, Yuefei Lyu, Tian Tian, Yifei Liu, Yudong Liu, and Xi Zhang. 2020. Rumor detection on social media with
graph structured adversarial learning. In Proceedings of the 29th International Joint Conference on Artificial Intelligence
(IJCAI’20). 1417–1423. DOI:10.24963/ijcai.2020/197
[152] Yuanshun Yao, Bimal Viswanath, Jenna Cryan, Haitao Zheng, and Ben Y. Zhao. 2017. Automated crowdturfing attacks
and defenses in online review systems. In Proceedings of the ACM Conference on Computer and Communications
Security. 1143–1158. DOI:10.1145/3133956.3133990
[153] Junting Ye, Santhosh Kumar, and Leman Akoglu. 2016. Temporal opinion spam detection by multivariate indicative
signals. In Proceedings of the International AAAI Conference on Web & Social Media (ICWSM’16). 743–746.
[154] Jie Yin, Andrew Lampert, Mark Cameron, Bella Robinson, and Robert Power. 2012. Using social media to enhance
emergency situation awareness. IEEE Intell. Syst. 27, 6 (2012), 52–59. DOI:10.1109/MIS.2012.6
[155] Seunghyun Yoon, Kunwoo Park, Joongbo Shin, Hongjun Lim, Seungpil Won, Meeyoung Cha, and Kyomin Jung. 2019.
Detecting incongruity between news headline and body text via a deep hierarchical encoder. In Proceedings of the
33rd AAAI Conference on Artificial Intelligence (AAAI'19). 791–800.

[156] Jianfei Yu, Jing Jiang, Ling Min Serena Khoo, Hai Leong Chieu, and Rui Xia. 2020. Coupled hierarchical transformer
for stance-aware rumor verification in social media conversations. In Proceedings of the Conference on Empirical
Methods in Natural Language Processing (EMNLP’20). 1392–1401. DOI:10.18653/v1/2020.emnlp-main.108
[157] Chunyuan Yuan, Qianwen Ma, Wei Zhou, Jizhong Han, and Songlin Hu. 2019. Jointly embedding the local and global
relations of heterogeneous graph for rumor detection. In Proceedings of the IEEE International Conference on Data
Mining (ICDM’19). 796–805. DOI:10.1109/ICDM.2019.00090
[158] Chunyuan Yuan, Qianwen Ma, Wei Zhou, Jizhong Han, and Songlin Hu. 2020. Early detection of fake news
by utilizing the credibility of news, publishers, and users based on weakly supervised learning. In Proceedings
of the 28th International Conference on Computational Linguistics (COLING’20). 5444–5454. DOI:10.18653/v1/2020.
coling-main.475
[159] Savvas Zannettou, Michael Sirivianos, Jeremy Blackburn, and Nicolas Kourtellis. 2019. The web of false information:
Rumors, fake news, hoaxes, clickbait, and various other shenanigans. J. Data Inf. Qual. 11, 3 (2019), 1–37. DOI:10.1145/
3309699
[160] Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, and Yejin Choi. 2019.
Defending against neural fake news. arXiv: 1905.12616v1 (2019).
[161] Li Zeng, Kate Starbird, and Emma S. Spiro. 2016. Rumors at the speed of light? Modeling the rate of rumor trans-
mission during crisis. In Proceedings of the Hawaii International Conference on System Sciences (HICSS’16). 1969–1978.
DOI:10.1109/HICSS.2016.248
[162] Chuan Zhang, Liehuang Zhu, Chang Xu, Kashif Sharif, Xiaojiang Du, and Mohsen Guizani. 2019. LPTD: Achieving
lightweight and privacy-preserving truth discovery in CIoT. Fut. Gen. Comput. Syst. 90 (2019), 175–184. DOI:10.1016/
j.future.2018.07.064
[163] Chuan Zhang, Liehuang Zhu, Chang Xu, Kashif Sharif, and Ximeng Liu. 2019. PPTDS: A privacy-preserving truth
discovery scheme in crowd sensing systems. Inf. Sci. 484 (2019), 183–196. DOI:10.1016/j.ins.2019.01.068
[164] Daniel Zhang, Dong Wang, Nathan Vance, Yang Zhang, and Steven Mike. 2018. On scalable and robust truth discov-
ery in big data social media sensing applications. IEEE Trans. Big Data (2018), 195–208. DOI:10.1109/TBDATA.2018.
2824812
[165] Jing Zhang, Victor S. Sheng, and Xindong Wu. 2015. CEKA: A tool for mining the wisdom of crowds. J. Mach. Learn.
Res. 16 (2015), 2853–2858. DOI:10.5555/2789272.2912090
[166] Jing Zhang, Xindong Wu, and Victor S. Sheng. 2016. Learning from crowdsourced labeled data: A survey. Artif. Intell.
Rev. 46, 4 (2016), 543–576. DOI:10.1007/s10462-016-9491-9
[167] Zhe Zhao, Paul Resnick, and Qiaozhu Mei. 2015. Enquiring minds: Early detection of rumors in social media from
enquiry posts. In Proceedings of the 24th International Conference on World Wide Web (WWW’15). 1395–1405.
DOI:10.1145/2736277.2741637
[168] Yifeng Zheng, Huayi Duan, and Cong Wang. 2018. Learning the truth privately and confidently: Encrypted
confidence-aware truth discovery in mobile crowdsensing. IEEE Trans. Inf. Forens. Secur. 13 (2018), 2475–2489.
DOI:10.1109/TIFS.2018.2819134
[169] Yifeng Zheng, Huayi Duan, Xingliang Yuan, and Cong Wang. 2017. Privacy-aware and efficient mobile crowdsensing
with truth discovery. IEEE Trans. Depend. Sec. Comput. 17, 1 (2017), 121–133. DOI:10.1109/TDSC.2017.2753245
[170] Xinyi Zhou and Reza Zafarani. 2018. Fake news: A survey of research, detection methods, and opportunities.
arXiv: 1812.00315v1 (2018).
[171] Dimitrina Zlatkova, Preslav Nakov, and Ivan Koychev. 2019. Fact-checking meets fauxtography: Verifying claims
about images. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and
the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). 2099–2108. DOI:10.18653/
v1/D19-1216
[172] Arkaitz Zubiaga, Ahmet Aker, Kalina Bontcheva, Maria Liakata, and Rob Procter. 2018. Detection and resolution of
rumours in social media: A survey. Comput. Surv. 51, 2 (2018). DOI:10.1145/3161603
[173] Arkaitz Zubiaga and Heng Ji. 2014. Tweet, but verify: Epistemic study of information verification on Twitter. Social
Netw. Anal. Mining 4, 1 (2014), 1–12. DOI:10.1007/s13278-014-0163-y
[174] Arkaitz Zubiaga, Maria Liakata, Rob Procter, Geraldine Wong Sak Hoi, and Peter Tolmie. 2016. Analysing how
people orient to and spread rumours in social media by looking at conversational threads. PLoS One 11, 3 (2016).
DOI:10.1371/journal.pone.0150989

Received September 2019; revised May 2021; accepted July 2021