
Case Study on Detecting COVID-19 Health-Related Misinformation in Social Media

Mir Mehedi Ahsan Pritom(1), Rosana Montanez Rodriguez(1), Asad Ali Khan(2), Sebastian A. Nugroho(2), Esra'a Alrashydah(3), Beatrice N. Ruiz(4), Anthony Rios(5)

(1) Department of Computer Science, (2) Department of Electrical and Computer Engineering, (3) Department of Civil and Environmental Engineering, (4) Department of Psychology, (5) Department of Information Systems and Cyber Security
University of Texas at San Antonio, USA
Email: {mirmehedi.pritom, rosana.montanezrodriguez, asad.khan, sebastian.nugroho, anthony.rios}@utsa.edu

arXiv:2106.06811v1 [cs.SI] 12 Jun 2021

Abstract

The COVID-19 pandemic has generated what public health officials have called an infodemic of misinformation. As social distancing and stay-at-home orders came into effect, many turned to social media for socializing. This increase in social media usage has made it a prime vehicle for the spreading of misinformation. This paper presents a mechanism to detect COVID-19 health-related misinformation in social media following an interdisciplinary approach. Leveraging social psychology as a foundation and existing misinformation frameworks, we define misinformation themes and associated keywords, which are incorporated into the misinformation detection mechanism using applied machine learning techniques. Next, using a Twitter dataset, we explore the performance of the proposed methodology with multiple state-of-the-art machine learning classifiers. Our method shows promising results, with at most 78% accuracy in classifying health-related misinformation versus true information, using uni-gram-based NLP features generated from tweets and a Decision Tree classifier. We also provide suggestions on alternatives for countering misinformation and on ethical considerations for the study.

1 Introduction

The ongoing COVID-19 pandemic has brought an unprecedented health crisis. Along with the physical health-related effects, the pandemic has brought changes to daily social interactions, such as teleworking, social distancing, and stay-at-home orders, leading to high usage of social media (Statista, 2020). These conditions have paved the way for opportunistic bad actors to fish in troubled waters. From the beginning of the COVID-19 pandemic, an excessive amount of misinformation has spread across social and online digital media (Barua et al., 2020; Cinelli et al., 2020). These misinformation campaigns include hoaxes, rumors, propaganda, or conspiracy theories, often with themed products or services that supposedly protect from contracting the COVID-19 infection (Pritom et al., 2020; WHO, 2020). Social media became the main conduit for the spread of COVID-19 misinformation. The abundance of health-related misinformation spreading over social media presents a threat to public health, especially in controlling and mitigating the spread of COVID-19 (Islam et al., 2020). Misinformation campaigns also affect public attitudes towards compliance with health guidance and hamper efforts towards preventing the spread. In some cases, individuals have lost their lives by making decisions based on misinformation (Barua et al., 2020). Therefore, it is imperative to combat COVID-19 health-related misinformation to minimize its adverse impact on public health (Ali, 2020).

In the past, researchers have focused on identifying social media misinformation (Yu et al., 2017) using various machine learning (ML) and deep learning-based approaches. However, the effectiveness of those approaches in tackling health-related COVID-19 misinformation on social media is unknown. There is also a lack of ground-truth, validated datasets to verify model accuracy in previous research efforts. Moreover, the existing literature does not address the misinformation problem with coordinated interdisciplinary approaches spanning social psychology, information operations, and data science (i.e., applied machine learning). Motivated by these gaps, the primary objective of this study is to leverage interdisciplinary techniques to understand the COVID-19 health-related misinformation problem and derive insights by detecting misinformation using Natural Language Processing (NLP) methods and Machine Learning (ML) classifiers. Grounded in existing work on misinformation propagation (Schneier, 2019) and source credibility theory (Hovland and Weiss, 1951) in social psychology, our study derives various credible health-related themes for understanding health misinformation and proposes detection utilizing state-of-the-art techniques from NLP and applied machine learning.

We use Twitter as our social media platform to study the effectiveness of our proposed methodology. We have collected, processed, labeled, and analyzed tweets to train and test the supervised ML classifiers.
Finally, we have analyzed the performance of the classifiers following our mechanism using standard performance metrics such as accuracy, precision, recall, F1-score, and Macro-F1-score. The major contributions of this paper are:

• Provide a detailed methodology with a prototype for detecting COVID-19 health-related misinformation from social media (i.e., Twitter).

• Propose a fairer social media (e.g., Twitter) annotation process for labeling misinformation.

• Provide a labeled ground-truth dataset of COVID-19 health-related tweets for future model verification.

• Demonstrate the efficacy of state-of-the-art classifiers in detecting COVID-19 health-related misinformation tweets, leveraging different NLP text representation methods such as Bag-of-Words (BoW) and n-grams.

Paper outline. Section 2 presents the problem background, motivation, and related works. Section 3 presents the research methodology and experiment details, including the dataset collection, processing, and analysis steps. Section 4 discusses the experiment results, limitations, future research directions, and ethical considerations of the present study. Section 5 concludes the paper.

2 Background and Related Works

Misinformation research has also been driven by the COVID-19 pandemic, with much ongoing research on fake news and social media misinformation. In this section, we draw from previous works on how existing misinformation detection research leverages Natural Language Processing (NLP), Machine Learning (ML), and interdisciplinary techniques such as the information operations kill chain (i.e., the steps for the propagation of misinformation) and social psychology.

Background Motivation. Authors in (Pan and Zhang, 2020; Sylvia Chou and Gaysynsky, 2020) highlight how the COVID-19 infodemic has added challenges for the public health community. An infodemic is the product of an overabundance of information that undermines public health efforts to address the pandemic (WHO, 2020). The effect of COVID-19 misinformation also impacts law enforcement and public safety entities (Gradoń, 2020). One study finds that social media have a higher prevalence of misinformation than news outlets (Bridgman et al., 2020). Another study highlights that an increase in Twitter conversation on COVID-19 is also a good predictor of COVID-19 regional contagion (Singh et al., 2020). Authors in (Roozenbeek et al., 2020) and (Bridgman et al., 2020) report that exposure to misinformation increases individuals' misconceptions about COVID-19 and lowers their compliance with public health prevention guidelines.

Our approach is also influenced by the Information Operations Kill Chain (Schneier, 2019). The framework is based on the Russian "Operation Infektion" misinformation campaign and provides the basis for our focus on existing grievances. A critical characteristic of misinformation is that it propagates using existing channels by aligning the messages to pre-existing grievances and beliefs in a group (Schneier, 2019; Cyber-Digital Task Force, 2018). Using existing media associated with a credible source (i.e., credible from the audience perspective) makes the message more likely to be accepted by the audience (Hovland and Weiss, 1951). Moreover, Islam et al. (2020) reveal that the oversupply of health-related misinformation, fueled by rumors, stigma, and conspiracy theories on social media platforms, has critical, adverse implications for individuals and communities.

COVID-19 Misinformation. Studies comparing the performance of different ML algorithms have been conducted in the literature. For instance, (Choudrie et al., 2021) analyzes how older adults process various kinds of infodemic content about COVID-19 prevention and cure using Decision Tree and Convolutional Neural Network techniques. Although this study focuses on COVID-19 health-related misinformation, the data is collected via online interviews with 20 adults. (Mackey et al., 2021) have presented an application of unsupervised learning to detect misinformation on Twitter using "hydroxychloroquine" as the keyword. However, that study is scoped only to misinformation related to the word "hydroxychloroquine," one of the many health keywords we have used to filter health-related tweets. Again, (Patwa et al., 2020) presents a manually annotated dataset containing 10,700 social media posts and articles from various sources, such as Twitter and Facebook, and analyzes ML methods' performance in detecting fake news related to COVID-19. The ML models explored in that study were Decision Tree, Logistic Regression, Gradient Boosting, and Support Vector Machine. They have not focused on health-related misinformation, which is the scope of the current study. Next, (Gundapu and Mamidi, 2021) have used supervised ML and deep learning transformer models (namely BERT, ALBERT, and XLNET) for COVID-19 misinformation detection. Like the previous one, they have not provided insights on any health-related themes or keywords. In (Park et al., 2020), an investigation of the information propagation and news sharing behaviors related to COVID-19 in Korea is performed, using content analysis on real-time Twitter data shared by top news channels.
The results show that the spread of COVID-19 related news articles that delivered medical information is more significant than for non-medical information; hence, medical information dissemination impacts the public health decision-making process.

NLP for COVID-19. NLP methods have also been leveraged to detect COVID-19 misinformation on YouTube (Serrano et al., 2020; Li et al., 2020) and Twitter (Al-Rakhami and Al-Amri, 2020). Serrano et al. (2020) have studied catching COVID-19 misinformation videos on YouTube by extracting user conversations in the comments, and proposed a multi-label classifier. Next, Al-Rakhami and Al-Amri (2020) use a two-level ensemble-learning-based framework using Naive Bayes, k-Nearest Neighbor, Decision Tree, Random Forest, and SVM to classify misinformation based on the online credibility of the author. They define credibility based on user-level and tweet-level features leveraging NLP methods. Their findings show that features like account validation (IsV), number of retweets (NoRT), number of hashtags (NoHash), number of mentions (NoMen), and profile follow rates (FlwR) are good predictors of credibility. However, in our work, we have not used any user-level information for classifying misinformation and only relied on the texts of the corresponding tweets. These user-level features may be added as a complement to our methodology for more accurate detection models. Again, we find a study related to COVID-19 that has leveraged machine learning algorithms with Bag of Words (BoW) NLP-based features for classifying COVID-19 diagnoses from textual clinical reports (Khanday et al., 2020).

Other Related Works. Li et al. (2020) have selected the 75 top-viewed videos with the keywords 'coronavirus' and 'COVID-19' to be analyzed for reliability scoring. The videos have been analyzed using their proposed novel COVID-19 Specific Score (CSS), modified DISCERN (mDISCERN), and modified JAMA (mJAMA) scores. In (Ahmed et al., 2020), the authors highlight the drivers of misinformation and strategies to mitigate it by considering COVID-19 related conspiracy theories on Twitter, where they have observed ordinary citizens as the most critical drivers of the conspiracy theories. This finding highlights the need for misinformation detection and content removal policies for social media.

3 Methodology and Experiment Setup

In this study, we first try to understand the types of COVID-19 health-related misinformation that have been disseminated during the pandemic. We try to map them onto the Information Operations Kill Chain (Schneier, 2019) to understand the steps taken to conduct misinformation operations. Based on our understanding, we have studied some of the popular COVID-19 related hoaxes and misinformation articles (Lytvynenko, 2020a,b; Gregory and McDonald, 2020; World Health Organization, 2020) to derive various themes and keywords. Themes describe the pre-existing grievances and beliefs of the group, and keywords are words specific to COVID-19 health-related misinformation that align with a particular theme. These keywords help us to collect and filter relevant tweets from the sheer volume of COVID-19 related tweets for the current study. Again, we have selected only a few days to collect Twitter data due to resource and time constraints. However, to increase the chances of covering common COVID-19 health-related tweets, we have selected dates on which a COVID-19 related event occurred in the United States. This pilot study would reveal the efficacy of our proposed methodology. The selection of dates also covered tweets from the first two months (March-April 2020) to the first four months (June-July 2020) after the global pandemic declaration on March 11, 2020 (Cucinotta and Vanelli, 2020). In this section, we present our methodology for the following modules: (i) Twitter Dataset Collection, (ii) Annotation of Tweets, (iii) Analyzing Tweets, (iv) Classification Tasks for Detection Model, and (v) Performance Evaluation.

3.1 Twitter Dataset Collection

We only consider collecting the tweets or user-generated content posted by anonymous individuals on Twitter. This study does not infer or reveal any authors of the tweets, as we only extract the textual parts of a tweet. The biggest challenge for collecting quality tweets is similar to a "needle in a haystack" problem, as many COVID-19 related tweets are posted on Twitter daily. We have focused only on health-related tweets because misguided tweets, if trusted, directly impact public health.

Although it is possible to collect the related tweets directly using the Twitter API via Tweepy (Roesslein, 2020), there are rate limitations on the API. As an alternative, in this study we use the COVID-19 Tweets dataset (Lamsal, 2020) from IEEE Dataport because this dataset (i) is publicly available online, (ii) provides COVID-19 related tweet IDs daily from as early as March 2020, and (iii) collects tweets from all over the world (thereby giving no regional limitation).

We have selected the following four days: 04/02/2020 (i.e., Stay-at-Home orders in many states in the US), 04/24/2020 (i.e., POTUS comments on the use of disinfectant against COVID-19 go viral), 06/16/2020 (i.e., reports published on the use of common drugs against COVID-19 (Mueller and Rabin, 2020)), and 07/02/2020 (i.e., face cover mandates in many US states).
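The per-day collection in Section 3.1 draws a fixed number of tweet IDs from each selected day's dump. A minimal sketch of that sampling step follows; note that the exact sampling rule and the fixed seed are assumptions (the paper only states that 10,000 tweet IDs are extracted per day and then hydrated with the Hydrator app), and the IDs below are synthetic:

```python
import random

def sample_day_ids(all_ids, k=10_000, seed=0):
    """Draw up to k tweet IDs from one day's ID dump, reproducibly."""
    rng = random.Random(seed)   # fixed seed so the subset is repeatable
    if len(all_ids) <= k:
        return list(all_ids)
    return rng.sample(all_ids, k)

# Synthetic example: one day's dump with 25,000 fake tweet IDs.
day_ids = [1_000_000_000_000 + i for i in range(25_000)]
subset = sample_day_ids(day_ids)
print(len(subset), len(set(subset)))   # 10000 10000 (no duplicate IDs)
```

The sampled IDs would then be passed to a hydration tool to recover the tweet texts.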
Theme                     Keywords
Limiting Civil Liberties  lockdown, hoax
Prevention                tide pods, hand sanitizer, immunity drink, sunlight, uv ray, uv light
Possible Remedies         hydroxychloroquine, homeopathy, chloronique, bleach shots, desinfectant
Worsening Condition       amphetamine, 5g
Origin of the Virus       wuhan virus

Table 1: Examples of COVID-19 Health-Related Themes and Keywords

drug, dengue, wash hand, antibody, fda-approved, facemask, sars, treatment, screening, patient, hand wash, immunity, scientific evidence, health, vaccine, stay at home, home stay, social distance, azithromycin, mask, stay home, fever, testing, immunity drink, heperan sulfate, pneumonia, vitamin d, body pain, n95, slight cough, n-95, herd immunity, antibody test, antibodies

Table 2: List of COVID-19 Health-Related Keywords for Filtering Tweets
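The glossary-based filtering of health-related tweets can be sketched as a case-insensitive keyword match. This is a minimal sketch: only a handful of the Table 1 and Table 2 entries are shown, and plain substring matching is an assumption, since the paper does not state the exact matching rule:

```python
# Small subset of the health keyword glossary (Tables 1 and 2).
HEALTH_KEYWORDS = {
    "hydroxychloroquine", "hand sanitizer", "herd immunity",
    "vaccine", "mask", "social distance", "antibody", "uv light",
}

def is_health_related(tweet_text):
    """Keep a tweet if its lower-cased text mentions any glossary keyword."""
    text = tweet_text.lower()
    return any(kw in text for kw in HEALTH_KEYWORDS)

tweets = [
    "UV light kills the virus in minutes, no Mask needed!",
    "Stock markets rallied again today.",
]
print([is_health_related(t) for t in tweets])  # [True, False]
```

Bare substring matching also fires on embedded words (e.g., "mask" inside "unmasked"); a word-boundary or token-level match would be stricter.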

After selecting the dates, we download the full dataset for each of the dates from IEEE Dataport (Lamsal, 2020). The data contains all the tweet IDs for the tweets, but it does not contain the tweets (i.e., texts) themselves. Next, we use the Hydrator app (the Now, 2020) for extracting the actual tweets. For each selected day, we extract 10,000 tweet IDs and collect those tweets for further processing. We limit our collection to 10,000 tweets per day because of resource and time limitations. We observe that the tweets extracted from the Hydrator app are truncated to a maximum of 280 characters.

Next, we identify various themes of ongoing COVID-19 health-related misinformation (as shown in Table 1) and define a glossary of COVID-19 health-related keywords (shown in Table 2) to filter only the interesting and relevant health-related tweets. This filtering resulted in a total of 2,615 unique tweets across all four selected dates. The health keyword glossary combines COVID-19 health-related misinformation (based on the themes) and true information.

3.2 Annotation of Tweets

In this study, we apply manual annotation to label the filtered tweets. Initially, we defined 5 class labels to annotate all the filtered tweets: (i) true information (T), (ii) misinformation (M), (iii) incomplete (I), (iv) not health-related (N), and (v) unsure (U). Here, true information is COVID-19 related health facts supported by scientific evidence; misinformation is inaccurate COVID-19 health-related information that health organizations like the WHO and CDC have discredited; incomplete information is a truncated tweet that cannot be verified as a complete statement; not health-related is any tweet about COVID-19 that does not directly relate to any health information; and the unsure class contains tweets where the annotator is unsure about the exact categorization into any of the first four labels.

The same set of tweets is independently labeled by multiple annotators (i.e., our research team members). We have relied on majority voting among the tweet labels to finalize the label for each tweet. Moreover, any tweets without a winning label are finalized by conducting an open discussion among the group of annotators to reach a unanimous agreement. This process has produced 314 tweets labeled as T, 210 tweets labeled as M, 173 tweets labeled as I, and 1,918 tweets labeled as N. To aid future research on COVID-19 health-related misinformation detection on social media, we would be happy to share our annotated ground-truth dataset through valid organizational email requests only. We believe the dataset can work as a basis for improved future research.

3.3 Methods for Analyzing Tweets

We have only considered using the tweets with class labels M and T for the health-related misinformation detection study. At first, we tokenize tweets to build two sets of tokens: (i) true information tokens and (ii) misinformation tokens. Next, to improve model performance, we remove the default English stop words (SW_english) listed in (RANKS.NL), plus additional trivial COVID-19 words, from the true information and misinformation tweets. These trivial words are observed by analyzing the most frequent tokens in the true information and misinformation sets of tweets. Some of the highlighted trivial COVID-19 words are included in the set SW_trivial = {covid19, covid, covid-19, coronavirus, corona, covid_19, health}. Moreover, it is important to clean up tweets for reliable and effective analysis, as many tweets are messy. The cleanup process includes transforming all tweets to lower-case letters to avoid redundancy, removal of incomplete links, and cleaning of unnecessary punctuation, irrelevant character sequences (e.g., "...", ",", @), non-meaningful single-character tokens, and digits-only tokens (e.g., "100"). In total, we have removed 222 stop-words (presented as SW, where SW = SW_english ∪ SW_trivial).
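The cleanup steps above can be sketched as follows. This is a minimal sketch: the stop-word set is a tiny stand-in for the 222-word SW list, the regular expressions are assumptions about how links and punctuation were stripped, and the stemming step is omitted:

```python
import re

STOP_WORDS = {"the", "a", "is", "in", "for",               # stand-in for SW_english
              "covid19", "covid", "coronavirus", "health"}  # subset of SW_trivial

def clean_tokens(tweet):
    """Lower-case, strip links, punctuation, and digits, then drop stop words
    and single-character tokens."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove (possibly truncated) links
    text = re.sub(r"[^a-z\s]", " ", text)       # drop punctuation and digit-only noise
    return [w for w in text.split() if len(w) > 1 and w not in STOP_WORDS]

print(clean_tokens("Sunlight kills the CORONAVIRUS in 5 minutes!! https://t.co/x"))
# ['sunlight', 'kills', 'minutes']
```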
Next, we apply the Python NLTK SnowballStemmer for stemming the tweets (NLTK, 2020) and extract the root forms of words for a more generalized model. After these steps, we need to generate features from the tweet texts for the supervised learning classifiers. This study relied only on text features (i.e., extracted from individual tweets) to classify health misinformation versus true information. In this study, we have used the popular Bag of Words (BoW) (Zhang et al., 2010) and different n-gram (Fürnkranz, 1998) NLP methods for feature extraction from the tweets.

3.3.1 Feature Extraction

Bag of Words (BoW): This method uses raw word frequencies in a sentence as features. If the BoW method is used, a tweet is presented as a set of tokens, where each token is a connected word (e.g., no spaces in between), and it stores a frequency count for that token in the tweet. Any tweet tw_i containing multiple BoW tokens (or words), where each token tok_j = w_j ∉ SW and the class label of the i-th tweet is l_i ∈ {M, T}, can be presented as a set of BoW tokenized representations tokenized^B_i = P^B_i ∪ A^B_i for the i-th tweet. Now, P^B_i is the set of tokens that are present and A^B_i is the set of tokens absent in the i-th tweet tw_i, derived as follows:

P^B_i = {∀_j tok_j := Freq(tok_j) | w_j ∈ tw_i}   (1)

A^B_i = {∀_j tok_j := 0 | w_j ∉ tw_i}   (2)

In Eqs. 1 and 2, Freq(tok_j) presents the frequency count of the j-th token within the i-th tweet.

n-grams: This method uses sequences of n (where n ∈ {1, 2, 3}) words as binary features. For 1-grams (or uni-grams), a single-word sequence is considered as a token. Unlike the BoW method, the 1-gram method uses binary feature values (1 or 0) for each token, stating whether the token is present in a tweet or not, and does not record multiple instances of the token. Next, we have used 2-grams (or bi-grams) as features, which use all the sequences of two words as tokens and store them with binary values 1 or 0. Lastly, the 3-gram (or tri-gram) features use all the valid sequences of three words as tokens, with the binary value 1 (if the token is present in the tweet) or 0 (if the token is not present in the tweet).

Now, any tweet tw_i containing all n-gram tokens ∀_j tok^n_j, where n ∈ {1, 2, 3} and the class label of the i-th tweet is l_i ∈ {M, T}, can be presented as a set of n-gram tokenized representations tokenized^n_i = P^n_i ∪ A^n_i for the i-th tweet. Here, P^n_i and A^n_i present the sets of tokens that are present and absent in tweet tw_i, derived by Eq. 3 and Eq. 4, respectively:

P^n_i = {∀_j tok^n_j := 1 | ∀_{w ∈ tok^n_j} w ∈ tw_i}   (3)

A^n_i = {∀_j tok^n_j := 0 | ∀_{w ∈ tok^n_j} w ∉ tw_i}   (4)

In Eq. 3, ∀_{w ∈ tok^n_j} w ∈ tw_i depicts the presence of all the words of the j-th token in tweet tw_i. Now, the n-gram tokens are further derived based on the value of n, as follows:

tok^n_j = (w_j) if n = 1;  (w_j, w_{j+1}) if n = 2;  (w_j, w_{j+1}, w_{j+2}) if n = 3   (5)

From Eq. 5, we see that the uni-gram method (n = 1) uses a single word w_j ∈ tw_i as a token for any tw_i. The bi-gram method (n = 2) uses a pair of two words (w_j, w_{j+1}) as a token, where both w_j, w_{j+1} ∈ tw_i. Lastly, the tri-gram method (n = 3) uses a tuple of three words (w_j, w_{j+1}, w_{j+2}) as a token, where w_j, w_{j+1}, w_{j+2} ∈ tw_i. Moreover, any tweet containing J words has (J − n + 1) tokens under any n-gram method.

3.3.2 Preparing Training and Test Data

The selection of the feature extraction method plays a critical role in health-related misinformation detection. To start preparing the training and testing datasets, the set of tokenized misinformation tweets is presented as D_M = {∀_i tokenized^m_i | l_i = M}, while the set of tokenized true information tweets is presented as D_T = {∀_i tokenized^m_i | l_i = T}. Here, m ∈ {B, n} represents the method used for feature extraction. Next, we merge these datasets to prepare D_merge = D_M ∪ D_T. Then, the D_merge dataset is randomly split with an 80:20 ratio into training data (D_train) and test data (D_test), along with their respective tweet labels. In this pilot study, our training dataset contains |D_train| = 419 tweets, of which 164 are misinformation and 255 are true information. Again, the testing dataset contains |D_test| = 105 tweets, of which 46 are misinformation and 59 are true information. Note that individual tweets may shuffle from training to test and vice versa, but the training and test sizes remain constant, which causes a change in the number of tokens in training and test between the BoW and 1-gram methods.

3.3.3 Analysis with BoW Features

For the BoW method, we observed 5,268 words (or tokens) in D_train (training data), leading to a vocabulary size of 2,301 unique words, and 1,318 words in D_test (test data), leading to a vocabulary size of 838 unique words. Moreover, the tweets contained in D_train consist of 12.57 words on average, with a maximum of 32 and a minimum of 2 words. Likewise, the tweets contained in D_test consist of 12.55 words on average, with a maximum of 28 and a minimum of 2 words, which indicates a similar distribution of tweets in both the training and testing datasets with this method.
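The BoW and n-gram representations of Section 3.3.1 can be sketched as follows (a minimal sketch over an invented token list; the absent-token sets A_i of Eqs. 2 and 4 are left implicit as zeros, as is common for sparse feature dictionaries):

```python
from collections import Counter

def bow_features(tokens):
    """BoW (Eqs. 1-2): raw frequency count for each token present in the tweet."""
    return dict(Counter(tokens))

def ngram_features(tokens, n):
    """n-grams (Eqs. 3-5): binary presence of each n-word sequence."""
    return {gram: 1 for gram in zip(*(tokens[i:] for i in range(n)))}

toks = ["sunlight", "kills", "virus", "sunlight"]
print(bow_features(toks))        # {'sunlight': 2, 'kills': 1, 'virus': 1}
print(len(ngram_features(toks, 2)))  # 3, matching (J - n + 1) = 4 - 2 + 1
```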
3.3.4 Analysis with n-gram Features

In the case of the uni-gram method, we have 5,232 tokens in D_train, leading to 2,286 unique uni-gram tokens, and 1,354 tokens in D_test, leading to 848 unique uni-gram tokens. The average token length for the training set is 12.49 tokens, with a maximum of 32 and a minimum of 2 tokens, while the average token length for the test set is 12.90 tokens, with a maximum of 30 and a minimum of 4 tokens. Next, in the case of bi-grams, we have 4,813 tokens in D_train, leading to 4,298 unique bi-gram tokens, and 1,249 tokens in D_test, leading to 1,171 unique bi-gram tokens. The average token length for the training set is 11.49 tokens, with a maximum of 31 and a minimum of 1 token, while the average token length for the test set is 11.90 tokens, with a maximum of 29 and a minimum of 3 tokens. Lastly, in the case of tri-grams, we have 4,394 tokens in D_train, leading to a tri-gram vocabulary (unique tokens) size of 4,166, and 1,144 tokens in D_test, leading to a vocabulary size of 1,109. The average token length for the training set is 10.49 tokens, while the average token length for the test set is 10.90 tokens.

3.4 Classification Tasks for Misinformation Detection

The purpose of the classification task is to separate COVID-19 health-related tweets into true information and misinformation. We have used multiple machine learning classifiers that are commonly used in the literature for text classification. The present study considers the Decision Tree (DT) (Song and Lu, 2015), Naive Bayes (NB) (Langley and Sage, 2013), Random Forest (RF) (Biau, 2012), Support Vector Machine (SVM) (Evgeniou and Pontil, 2001), and Maximum Entropy Modeling (MEM) (Berger et al., 1996) classifiers. We have selected these classifiers because (i) they are popular, (ii) they are readily available under the Python nltk.classify and nltk.classify.scikitlearn API libraries (NLTK, 2020), and (iii) they would provide a basis for judging whether the proposed methodology is a viable way of tackling COVID-19 themed health-related misinformation. These classifiers are trained on the input features extracted from the tweets (e.g., tokens) and the corresponding numerical labels assigned using the manual annotation process described in Section 3.2. The steps involved in training the classification models are shown in Fig. 1.

3.4.1 Performance Metrics

We have used the following standard evaluation metrics to evaluate our detection (i.e., classification) methodology: class-wise Precision, class-wise Recall, class-wise F1-score, classification Accuracy, and Macro-F1-score. The following list provides the formulas for each of the metrics:

• Accuracy = (1/2) Σ_{c ∈ {M,T}} Accuracy_c, where the individual class accuracy Accuracy_c = (TP_c + TN_c) / (TP_c + TN_c + FP_c + FN_c).

• Precision_c = TP_c / (TP_c + FP_c), for the selected class c ∈ {M, T}.

• Recall_c = TP_c / (TP_c + FN_c), where c ∈ {M, T} depicts the class.

• F1-score_c = 2 × (Recall_c × Precision_c) / (Recall_c + Precision_c), for c ∈ {M, T}.

• Macro-F1-score = (1/2) Σ_{c ∈ {M,T}} F1-score_c.

Here, TP_c is the total number of true positives, TN_c is the total number of true negatives, FP_c is the number of false positives, and FN_c is the total number of false negatives for class label c ∈ {M, T}.

4 Experiment Results and Discussions

Table 3 shows that the uni-gram text representation method with the Decision Tree classifier achieves the best classification accuracy of 78%. We also observe that the uni-gram method has outperformed all other methods for all the classifiers except SVM, where the bi-gram method achieves 75% accuracy over the 70% of the uni-gram method. Table 3 also highlights that all the classifiers achieve at least 74% classification accuracy with at least one of the text representation methods. We also looked at the class-wise F1-score for both the misinformation class and the true information class, which further validates that the uni-gram method consistently outperforms the other n-gram and BoW methods for misinformation class labels. The classifiers perform much better at detecting true information labeled tweets, with F1-scores ranging from 0.783 to 0.833 (i.e., with at least one of the text representation methods) across the different classifiers. In contrast, the F1-score for misinformation detection is always higher for the uni-gram method, with a maximum of 0.683 for the NB classifier. One reason for getting a better result for the true information class than the misinformation class is the imbalance ratio of 1:0.686 for true information:misinformation in the dataset (see the data distribution for both classes in Section 3.3). Moreover, in reality, the number of tweets with true information is usually much higher than the actual misinformation on social media, which is imitated in this study. We also observe that the highest Macro-F1-score of 0.755 is achieved for the Decision Tree classifier with the uni-gram method, which further indicates the effectiveness of uni-gram modeling for health-related tweet classification.

Next, for Precision and Recall, we find that some of the methods (e.g., bi-grams, tri-grams) have achieved a perfect precision (1.0) or recall (1.0) for one of the classes. However, whenever a classifier achieved a perfect precision (very few false positives) for the misinformation class, it had a significantly lower recall of around 0.2 (very high false negatives).
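The formulas of Section 3.4.1 can be sketched directly in code (a minimal sketch over hypothetical confusion counts; the 46 M / 59 T split matches the test set of Section 3.3.2, but the individual counts are invented for illustration). Note that for two mutually exclusive classes, the paper's averaged per-class accuracy reduces to the usual overall accuracy:

```python
def prf(tp, fp, fn):
    """Per-class precision, recall, and F1 (Section 3.4.1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical confusion counts for class M on a 46 M / 59 T test set.
tp_m, fp_m, fn_m, tn_m = 23, 4, 23, 55
p_m, r_m, f1_m = prf(tp_m, fp_m, fn_m)
p_t, r_t, f1_t = prf(tn_m, fn_m, fp_m)   # class T mirrors class M's confusion cells
accuracy = (tp_m + tn_m) / (tp_m + fp_m + fn_m + tn_m)
macro_f1 = (f1_m + f1_t) / 2
print(round(accuracy, 2))  # 0.74
```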
Figure 1: Steps for Processing of Health-Related Tweets to Detect Misinformation

Misinformation True information Aggregated Metrics


Precision Recall F1-score Precision Recall F1-score Accuracy Macro-F1
BoW .684 .650 .667 .785 .810 .797 0.65 .732
uni-gram .683 .683 .683 .780 .780 .780 0.74 .731
NB
bi-grams .680 .472 .557 .747 .875 .806 0.73 .682
tri-grams 1.0 .211 .348 .674 1.0 .805 0.70 .577
BoW .600 .585 .593 .730 .742 .736 0.67 .664
uni-gram .852 .561 .677 .753 .932 .833 0.78 .755
DT
bi-grams 1.0 .278 .435 .711 1.0 .831 0.74 .633
tri-grams 1.0 .131 .233 .653 1.0 .790 0.67 .511
BoW .636 .512 .568 .714 .807 .758 0.68 .663
uni-gram .692 .659 .675 .771 .797 .783 0.74 .731
MEM
bi-grams .492 .861 .626 .865 .500 .634 0.63 .630
tri-grams .413 .868 .559 .750 .242 .366 0.48 .463
BoW .700 .342 .459 .675 .903 .772 0.67 .616
uni-gram .895 .415 .567 .704 .966 .814 0.74 .691
RF
bi-grams .889 .222 .356 .692 .984 .813 0.71 .533
tri-grams .653 .132 .233 .653 1.0 .790 0.67 .511
BoW .737 .341 .467 .679 .919 .781 0.69 .624
uni-gram .739 .415 .531 .688 .898 .779 0.70 .655
SVM
bi-grams .923 .333 .490 .724 .984 .834 0.75 .662
tri-grams 1.0 .105 .191 .646 1.0 .785 0.66 .488

Table 3: Performance of Baseline Classifiers for Misinformation, True Information classes, and Aggregated
Metrics (test data size = 105)
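The class-wise and aggregated metrics reported in Table 3 follow the standard confusion-matrix definitions given in Section 3.4.1. A minimal pure-Python sketch of those computations (the label lists below are hypothetical examples, not drawn from the paper's dataset):

```python
def class_metrics(y_true, y_pred, positive):
    """Class-wise Precision, Recall, and F1-score, treating `positive` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

def aggregated_metrics(y_true, y_pred, classes=("M", "T")):
    """Accuracy plus Macro-F1 (the unweighted mean of the class-wise F1-scores)."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro_f1 = sum(class_metrics(y_true, y_pred, c)[2] for c in classes) / len(classes)
    return accuracy, macro_f1

# Hypothetical labels: M = misinformation, T = true information
y_true = ["M", "M", "T", "T", "T", "M"]
y_pred = ["M", "T", "T", "T", "M", "M"]
print(class_metrics(y_true, y_pred, "M"))
print(aggregated_metrics(y_true, y_pred))
```

Because Macro-F1 averages the per-class F1-scores without weighting by class frequency, it penalizes classifiers that do well on the majority class only, which is why it is reported alongside plain Accuracy for this imbalanced test set.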

positives) for the misinformation class, it has a significantly lower recall of around 0.2 (very high false negatives). Any such behavior indicates a classifier biased towards one of the class labels. Hence, we prefer classifiers whose precision and recall are more balanced across both class labels. Another insight we draw from the current analysis is that the bi-grams and tri-grams methods do not perform well enough because most of the bi-gram and tri-gram tokens are unique and are not repeated in the dataset for either class label. We also feel that the small sample size (n = 524, covering both the training and testing sets) is not enough for a rigorous study, but it is certainly a first step towards exploring the significance of the problem. We believe that with a larger dataset, our methodology's performance could be further generalized.

4.1 Limitations

There are still some limitations in this study. First, we are not inferring a tweet's meaning. For example, consider two sentences: sentence-1 = "Hydroxychloroquine is a medicine for COVID-19" and sentence-2 = "Hydroxychloroquine is not a medicine for COVID-19". Though sentence-2 = ¬sentence-1, and sentence-2 is true information, it does not have enough samples in the training dataset and may get classified as misinformation. Thus, to handle such scenarios, we need to build further methods that distinguish negation from affirmation in a tweet to obtain more accurate models. We can also include anti-misinformation (i.e., true information countering existing misinformation) in our training datasets to make the model more proactive in detecting misinformation campaigns. Second, we have only selected four dates and 10,000 tweets for

each day due to the lack of time and resources to invest in the manual annotation process. Moreover, for some of the tweet IDs (around 5-10% of the 10,000 tweets each day), we extracted empty tweets (i.e., tweets removed by Twitter or their authors). Third, the lack of ground-truth datasets forced us to manually label the data to train supervised learning-based classifiers. These manual tasks may contain impurities and errors, but we have used our best judgment and followed best practices, applying various mechanisms (e.g., group discussion, majority voting) to bring fairness into the annotation process. Fourth, we have only considered BoW and n-gram methods for this pilot study and have not examined other methods such as TF-IDF and word embeddings. Moreover, the BoW method has its own shortcomings (Swarnkar, 2020), which are also present in the current study. Fifth, misinformation spread through images and videos is not covered in this study. Sixth, we only analyze Twitter as a social media platform, but we believe the methodology is still applicable to other platforms (e.g., Facebook, Instagram) with minimal tweaks. However, multimedia-based platforms (e.g., TikTok) need different approaches.

4.2 Ethical Considerations

Although it is essential to understand how to mitigate the effects of misinformation, there are some ethical considerations. Misinformation is commonly encountered in the form of selective information or partial truths in conversations about controversial topics (Schneier, 2019). An example is the statistics on Black-on-Black crimes that are used to explain over-policing of Black communities (Braga and Brunson, 2015). From this perspective, misinformation can be generated when the available information is incomplete (e.g., in developing news stories), or when new findings appear that contradict existing beliefs (Morawska and Cao, 2020). We think misinformation detection mechanisms must consider these factors to avoid being viewed as censorship or a violation of freedom of speech (Kaiser et al., 2020). To ensure freedom of speech, we have opted to label a tweet as misinformation only if we feel its author is casting doubt on the existing health system, therapeutics, or policies for tackling this pandemic. We have also ensured not to violate the privacy of any Twitter user by not using any sensitive account information about any tweet's owner (the author of any tweet labeled as T or M). We have also cleaned up our tweets using regex where a tweet's text tags other Twitter users with '@TwitterID' mentions. Lastly, we believe that a better policy for addressing misinformation would be to provide correct information that does not threaten individuals' existing beliefs (Chan et al., 2017) but deters them from harmful behavior, instead of blocking misinformation content. We recommend that future studies investigate the solution space in this direction to systematize it.

4.3 Future Directions

In the future, it would be interesting to explore detection mechanisms with explainable classifiers to bring trustworthiness to misinformation detection research. Another future work arena would be to study network activities, such as retweets, likes, and follower counts, for accounts tweeting health misinformation, to see whether bots or real accounts have disseminated it. An analysis of those network elements could validate existing misinformation propagation frameworks (Cyber-Digital Task Force, 2018) for health-related misinformation, which could be leveraged to develop proactive mechanisms to detect unseen misinformation activity in real time. Finally, along with detection, correcting health-related misinformation among communities needs to be studied. Because of the continued influence effect, misinformation, once presented, continues to influence later judgments; the authors proposed compelling facts as a medium of correction (Lewandowsky et al., 2012), which needs to be further analyzed in the context of COVID-19 health misinformation.

5 Conclusion

In this paper, we present a methodology for health misinformation detection on Twitter leveraging state-of-the-art techniques from NLP, ML, misinformation propagation, and existing social psychology research. We find that extracting and annotating quality data for misinformation research is challenging. We discover a gap in the availability of ground-truth datasets, which is addressed by our effort. Our findings highlight that the Decision Tree classifier outperforms all other classifiers, with 78% classification accuracy, leveraging simpler uni-gram features for text representation. We recommend that future studies systematize the understanding of health-related misinformation dissemination in social media and its impact on public health at large.

References

Wasim Ahmed, Francesc López Seguí, Josep Vidal-Alaball, and Matthew S Katz. 2020. Covid-19 and the "film your hospital" conspiracy theory: Social network analysis of twitter data. J Med Internet Res, 22(10):e22374.

M. S. Al-Rakhami and A. M. Al-Amri. 2020. Lies kill, facts save: Detecting covid-19 misinformation in twitter. IEEE Access, 8:155961–155970.

Sana Ali. 2020. Combatting against covid-19 & misinformation: A systematic review. Human Arenas.

Zapan Barua, Sajib Barua, Salma Aktar, Najma Kabir, and Mingze Li. 2020. Effects of misinformation on covid-19 individual responses and recommendations for resilience of disastrous consequences of misinformation. Progress in Disaster Science, 8:100119.

Adam L. Berger, Vincent J. Della Pietra, and Stephen A. Della Pietra. 1996. A maximum entropy approach to natural language processing. Comput. Linguist., 22(1):39–71.

Gérard Biau. 2012. Analysis of a random forests model. Journal of Machine Learning Research, 13(38):1063–1095.

Anthony Allan Braga and Rodney Brunson. 2015. The police and public discourse on "Black-on-Black" violence. US Department of Justice, Office of Justice Programs, National Institute of ...

Aengus Bridgman, Eric Merkley, Peter John Loewen, Taylor Owen, Derek Ruths, Lisa Teichmann, and Oleg Zhilin. 2020. The causes and consequences of covid-19 misperceptions: Understanding the role of news and social media. Harvard Kennedy School Misinformation Review, 1(3).

Man-pui Sally Chan, Christopher R Jones, Kathleen Hall Jamieson, and Dolores Albarracin. 2017. Debunking: A meta-analysis of the psychological efficacy of messages countering misinformation. Psychological science, 28(11):1531–1546.

Jyoti Choudrie, Snehasish Banerjee, Ketan Kotecha, Rahee Walambe, Hema Karende, and Juhi Ameta. 2021. Machine learning techniques and older adults processing of online information and misinformation: A covid 19 study. Computers in Human Behavior, 119:106716.

Matteo Cinelli, Walter Quattrociocchi, Alessandro Galeazzi, Carlo Michele Valensise, Emanuele Brugnoli, Ana Lucia Schmidt, Paola Zola, Fabiana Zollo, and Antonio Scala. 2020. The covid-19 social media infodemic. Scientific Reports, 10(1).

Domenico Cucinotta and Maurizio Vanelli. 2020. Who declares covid-19 a pandemic. Acta Bio Medica Atenei Parmensis, 91(1):157–160.

Cyber-Digital Task Force. 2018. Attorney general's cyber-digital task force. Technical report, United States Department of Justice.

Theodoros Evgeniou and Massimiliano Pontil. 2001. Support vector machines: Theory and applications. In Machine Learning and Its Applications, Advanced Lectures, pages 249–257. Springer.

Johannes Fürnkranz. 1998. A study using n-gram features for text categorization. Austrian Research Institute for Artificial Intelligence, 3(1998):1–10.

Kacper Gradoń. 2020. Crime in the time of the plague: Fake news pandemic and the challenges to law-enforcement and intelligence community. Society Register, 4(2):133–148.

John Gregory and Kendrick McDonald. 2020. Trail of Deceit: The Most Popular COVID-19 Myths and How They Emerged.

Sunil Gundapu and Radhika Mamidi. 2021. Transformer based automatic covid-19 fake news detection system.

Carl I Hovland and Walter Weiss. 1951. The influence of source credibility on communication effectiveness. Public opinion quarterly, 15(4):635–650.

Md Saiful Islam, Tonmoy Sarkar, Sazzad Hossain Khan, Abu-Hena Mostofa Kamal, S. M. Murshid Hasan, Alamgir Kabir, Dalia Yeasmin, Mohammad Ariful Islam, Kamal Ibne Amin Chowdhury, Kazi Selim Anwar, Abrar Ahmad Chughtai, and Holly Seale. 2020. COVID-19–Related Infodemic and Its Impact on Public Health: A Global Social Media Analysis. The American Journal of Tropical Medicine and Hygiene, 103(4):1621–1629.

Ben Kaiser, Jerry Wei, Elena Lucherini, Kevin Lee, J Nathan Matias, and Jonathan Mayer. 2020. Adapting security warnings to counter misinformation. arXiv preprint arXiv:2008.10772.

Akib Mohi Ud Din Khanday, Syed Tanzeel Rabani, Qamar Rayees Khan, Nusrat Rouf, and Masarat Mohi Ud Din. 2020. Machine learning based approaches for detecting covid-19 using clinical text data. International Journal of Information Technology, pages 1–9.

Rabindra Lamsal. 2020. Coronavirus (COVID-19) Tweets Dataset.

Pat Langley and Stephanie Sage. 2013. Induction of selective bayesian classifiers. CoRR, abs/1302.6828.

Stephan Lewandowsky, Ullrich K. H. Ecker, Colleen M. Seifert, Norbert Schwarz, and John Cook. 2012. Misinformation and its correction: Continued influence and successful debiasing. Psychological Science in the Public Interest.

Heidi Oi-Yee Li, Adrian Bailey, David Huynh, and James Chan. 2020. Youtube as a source of information on covid-19: a pandemic of misinformation? BMJ Global Health, 5(5):e002604.

Jane Lytvynenko. 2020a. Here are some of the coronavirus hoaxes that spread in the first few weeks.
Jane Lytvynenko. 2020b. Here's a running list of the latest hoaxes spreading about the coronavirus.

Tim K Mackey, Vidya Purushothaman, Michael Haupt, Matthew C Nali, and Jiawei Li. 2021. Application of unsupervised machine learning to identify and characterise hydroxychloroquine misinformation on twitter. The Lancet Digital Health, 3(2):e72–e75.

Lidia Morawska and Junji Cao. 2020. Airborne transmission of sars-cov-2: The world should face the reality. Environment international, 139:105730.

Benjamin Mueller and Roni Caryn Rabin. 2020. Common Drug Reduces Coronavirus Deaths, Scientists Report.

NLTK. 2020. Nltk 3.5 documentation.

Shan L Pan and Sixuan Zhang. 2020. From fighting covid-19 pandemic to tackling sustainable development goals: An opportunity for responsible information systems research. International Journal of Information Management, 55:102196.

Han Woo Park, Sejung Park, and Miyoung Chong. 2020. Conversations and medical news frames on twitter: Infodemiological study on covid-19 in south korea. J Med Internet Res, 22(5):e18897.

Parth Patwa, Shivam Sharma, Srinivas PYKL, Vineeth Guptha, Gitanjali Kumari, Md Shad Akhtar, Asif Ekbal, Amitava Das, and Tanmoy Chakraborty. 2020. Fighting an infodemic: Covid-19 fake news dataset.

Mir Mehedi Ahsan Pritom, K. M. Schweitzer, R. M. Bateman, M. Xu, and S. Xu. 2020. Characterizing the landscape of covid-19 themed cyberattacks and defenses. In 2020 IEEE International Conference on Intelligence and Security Informatics (ISI), pages 1–6.

RANKS.NL. Stopwords. https://www.ranks.nl/stopwords. Last accessed on March 21, 2021.

Joshua Roesslein. 2020. Tweepy: Twitter for python! https://github.com/tweepy/tweepy.

Jon Roozenbeek, Claudia R Schneider, Sarah Dryhurst, John Kerr, Alexandra LJ Freeman, Gabriel Recchia, Anne Marthe Van Der Bles, and Sander Van Der Linden. 2020. Susceptibility to misinformation about covid-19 around the world. Royal Society open science, 7(10):201199.

Bruce Schneier. 2019. Toward an information operations kill chain.

J. C. M. Serrano, O. Papakyriakopoulos, and S. Hegelich. 2020. NLP-based Feature Extraction for the Detection of COVID-19 Misinformation Videos on YouTube. In ACL 2020 Workshop on Natural Language Processing for COVID-19 (NLP-COVID).

Lisa Singh, Shweta Bansal, Leticia Bode, Ceren Budak, Guangqing Chi, Kornraphop Kawintiranon, Colton Padden, Rebecca Vanarsdall, Emily Vraga, and Yanchen Wang. 2020. A first look at covid-19 information and misinformation sharing on twitter. arXiv preprint arXiv:2003.13907.

Yan-Yan Song and Ying Lu. 2015. Decision tree methods: applications for classification and prediction. Shanghai archives of psychiatry, 27(2):130–135.

Statista. 2020. Estimated u.s. social media usage increase due to coronavirus home isolation 2020. https://www.statista.com/statistics/1106343/. Last accessed on March 22, 2021.

Naman Swarnkar. 2020. Bag of words: Approach, python code, limitations. https://blog.quantinsti.com/bag-of-words/#Limitations-of-Bag-of-Words. Last accessed on March 25, 2021.

Wen-Ying Sylvia Chou and Anna Gaysynsky. 2020. A prologue to the special issue: Health misinformation on social media.

Documenting the Now. 2020. Hydrator [computer software]. Retrieved from https://github.com/DocNow/hydrator.

WHO. 2020. Infodemic. https://www.who.int/health-topics/infodemic. Last accessed on March 24, 2021.

World Health Organization. 2020. Trail of Deceit: The Most Popular COVID-19 Myths and How They Emerged.

Feng Yu, Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. 2017. A convolutional approach for misinformation identification. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pages 3901–3907.

Yin Zhang, Rong Jin, and Zhi-Hua Zhou. 2010. Understanding bag-of-words model: a statistical framework. International Journal of Machine Learning and Cybernetics, 1(1-4):43–52.
