4.1.3. FacebookHoax
The dataset contains posts from Facebook pages associated with scientific news (treated as verified) and with conspiracy content (treated as unverified), gathered using the Facebook Graph API. It comprises a total of 15,500 posts extracted from 32 distinct pages, 14 dedicated to conspiracy-related content and 18 focused on scientific topics; the posts are provided without any accompanying supporting evidence.
4.1.4. GossipCop
GossipCop, since rebranded as Suggest [16], fact-checks fake news about US entertainment and celebrity stories published on websites and in magazines. Each story is given a score between 0 and 10, where 10 indicates that the news is entirely factual and 0 indicates that the rumor is entirely fake or fictitious.
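For downstream binary fact-checking tasks, the 0-10 rating is often reduced to a real/fake label. A minimal sketch, assuming a purely illustrative threshold (the function name and cutoff are not from the dataset itself):

```python
# Hypothetical helper: binarize a GossipCop-style 0-10 veracity rating.
# The threshold of 5.0 is an illustrative assumption, not part of the dataset.

def binarize_gossipcop_score(score: float, threshold: float = 5.0) -> str:
    """Map a 0-10 rating to a binary label: 10 = entirely factual, 0 = fake."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("GossipCop scores lie in [0, 10]")
    return "real" if score >= threshold else "fake"

print(binarize_gossipcop_score(8.0))  # real
print(binarize_gossipcop_score(1.5))  # fake
```

The choice of threshold changes the class balance, so in practice it would be tuned or taken from prior work rather than fixed at the midpoint.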
4.1.5. FakeCovid
Popat et al. (2017) introduced a corpus containing a significantly larger number of verified claims. The dataset consists of 4,956 claims along with their corresponding verdicts, sourced from the Snopes website as well as Wikipedia's collections of confirmed hoaxes and fictitious individuals. For each claim, the researchers retrieved approximately 30 related documents from the web using the Google search engine, yielding a total collection of 136,085 documents.
4.2.1.2 MultiFC
In contrast to the FEVER dataset, whose claims are generated from Wikipedia, the MultiFC dataset proposed by Augenstein et al. [7] gathers claims from 26 fact-checking websites. Specifically, the researchers compiled 34,918 claims, each paired with the top evidence pages retrieved from the web to verify it, together with its context and metadata. They also performed a thorough analysis of the dataset's characteristics, such as the entities mentioned in the claims. The claims were extracted from diverse domains, each characterized by a distinct label set, which is a notable challenge of this dataset.
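The per-domain label sets mean a single MultiFC-style record must carry both its source site and that site's own label vocabulary. An illustrative sketch of such a record (the field names and values below are assumptions, not the dataset's actual schema):

```python
# Illustrative record layout for a MultiFC-style claim; field names are
# hypothetical and chosen only to mirror the description in the text.
from dataclasses import dataclass, field


@dataclass
class ClaimRecord:
    claim: str                      # the claim text
    source_site: str                # which of the 26 fact-checking sites it came from
    label: str                      # a label from that site's own label set
    evidence_pages: list[str] = field(default_factory=list)  # top web evidence
    metadata: dict = field(default_factory=dict)             # e.g. entities, date


record = ClaimRecord(
    claim="Example claim text.",
    source_site="examplecheck",     # hypothetical site name
    label="mostly-true",            # hypothetical domain-specific label
    evidence_pages=["https://example.com/evidence1"],
    metadata={"entities": ["Example Entity"]},
)
print(record.source_site, record.label)
```

Because `label` is only meaningful relative to `source_site`, models trained on such data typically need per-domain output heads or a label-mapping step rather than one shared label space.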
4.2.3.3 WikiFactCheck-English
WikiFactCheck-English is a dataset comprising more than 124k triples of claim, context, and evidence document, extracted from citations and articles on English Wikipedia, together with more than 34k handwritten claims that are contradicted by the evidence documents. It is the largest fact-checking dataset to date that includes actual real-world claims and supporting evidence, and it should facilitate the creation of fact-checking systems that handle real-world assertions and supporting data more effectively.
4.2.3.4 Emergent
The Emergent dataset for rumor debunking was obtained from a digital journalism programme. The dataset includes 2,595 news stories related to 300 rumored claims. Each claim carries a veracity classification (true, false, or unconfirmed), and each story is annotated with its stance toward the claim: supporting, refuting, or merely observing it.
4.2.3.5 FACTIFY
The FACTIFY dataset includes evidence documents linked to written claims and images. It is the first multi-modal fact verification dataset, labeled with three main categories: support, no-evidence, and refute. It also includes textual claims, reference textual resources, and images; the goal is to classify the claims according to the evidence provided. With 50,000 data points encompassing news from India and the US, FACTIFY is notable for being the largest public multi-modal fact verification dataset. It was released as part of a shared task at the De-Factify workshop at AAAI-21.
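Each FACTIFY-style entry thus pairs text and image on both the claim side and the evidence side. A minimal sketch of that structure, assuming hypothetical field names and file paths (not FACTIFY's actual schema):

```python
# Illustrative multi-modal sample layout; names and paths are hypothetical.
from dataclasses import dataclass


@dataclass
class MultiModalSample:
    claim_text: str
    claim_image_path: str           # path to the image accompanying the claim
    document_text: str              # reference textual resource
    document_image_path: str        # image from the evidence document
    label: str                      # "support", "no-evidence", or "refute"


sample = MultiModalSample(
    claim_text="Example claim.",
    claim_image_path="images/claim_0001.jpg",
    document_text="Example reference document.",
    document_image_path="images/doc_0001.jpg",
    label="support",
)
print(sample.label)  # support
```

A verification model for such data needs both a text encoder (claim vs. document) and an image encoder (claim image vs. document image) feeding a shared three-way classifier.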
4.2.4.3 HealthFC
The HealthFC dataset consists of 750 health-related claims in German and English. The evidence for these claims was gathered from clinical studies, collected via the web portal Medizin Transparent. Every claim has a single associated evidence document and is annotated with one of three labels: support, refute, or NEI (not enough information).