Fake News Detection
Abstract— Fake news has grown exponentially owing to the popularity and ease of putting forward our thoughts and opinions online. Such news is easily believed by a large mass of people, and circulating it causes chaos and unwanted disputes. With fierce competition among news outlets and social media, it has become a threat to consumers. For months, the coronavirus and its associated impact have given rise to much fake news worldwide, including in India. During the initial stages of the pandemic, fake news, particularly on social media, spread recklessly. Content about the origin, cause, symptoms, and cure of the virus, along with conspiracy theories, filled news feeds. This brings out the need to help the masses avoid falling prey to fake or biased news using machine learning algorithms. The use of ensemble machine learning algorithms and vectorizers is the main theme of this paper.

Keywords— Fake news, Social awareness, Panic, Covid-19, News bias, Misleading information, Bias identification, Machine learning.

I. INTRODUCTION

Counterfeit news is not new; it existed even before digital media, but with the increased usage of social platforms and digital media, fake news is spreading at an uncontrollable rate.

Fake news is something everyone has been dealing with for many years now. In the past, it was spread through yellow journalism, rumors, and excessive gossip articles; today it has become an even more problematic topic as the use of social media has grown exponentially. Any individual with access to the internet can spread fake news, which may be merely silly or may cause panic among the public. This is a problem because people cannot distinguish fake news from real news with complete accuracy, and they may make decisions according to the news they read or hear. Fake news is thus one of the drivers of psychological warfare, misinformation, economic harm, and distrust, and it has caused huge conflicts on social media platforms like Twitter and Facebook.

There are many definitions of fake news. It can be defined as fabricated material about a real event or about a non-existent one. Fake news can also be satirical, drawing humor from real news while departing from the facts. Nowadays, the spread of fake news has increased because some social media users want to raise their views, readership, or likes. Even the media sometimes spreads satirical news to increase TRP ratings, and coverage can be biased towards a specific person, group, or political party.

Currently, if anything is spreading faster than Covid-19, it is the fake news related to Covid-19: how the virus spreads, lockdowns, the second wave, vaccines, and so on. This has caused unnecessary panic among the public over what to believe. To build awareness, social media platforms such as Instagram, Twitter, and Facebook have taken various measures and approaches to provide the public with the right information, and have also issued quick guidelines on how to distinguish fake news from genuine news [4]. Numerous researchers are likewise tackling this issue, utilizing various methods to detect counterfeit news.

This review paper centers on the machine learning techniques that can be utilized to detect fake news on social media platforms. Section II gives a brief overview of the papers referred to, and Section III describes the techniques and datasets used by different authors to detect fake news.

II. LITERATURE SURVEY

Web-based platforms like Facebook and Twitter are an enormous and persistent supply of news, yet most of their content and articles do not come from professional sources, and a large share of it is either fake or biased. This leads the crowd to wrong conclusions and deliberately stokes political and other kinds of social issues. This paper proposes to locate a dependable and correct model for classifying fake news and articles from real ones using machine learning algorithms.

All the papers referred to here propose approaches for detecting misleading information circulated on social media and other platforms using machine learning algorithms.
Granik et al. [1] discuss implementing a very simple model using a Naïve Bayes classifier to detect fake news, as artificial intelligence was becoming better at solving classification problems. The datasets collected by the authors of [1] were drawn from Facebook posts of news channels such as Politico, ABC News, and CNN.

The authors in [2] proposed a fake news detection model that utilized n-gram analysis along with other machine learning techniques and, unlike [1], also performed feature extraction on the datasets. The dataset used for this experiment was mostly collected from Kaggle and Reuters for fake and real news. The proposed model was also tested on publicly available datasets such as the "Horne and Adali dataset" and "Burfoot and Baldwin's satire dataset".

The authors in [3] collected around thirty-three thousand tweets as their dataset through a Twitter developer account. To recognize fake news extracted from social media, they proposed a system based on stylistic computational analysis grounded in natural language processing, using algorithms such as the one-class Support Vector Machine.

Unlike the previous papers, [5] identifies user profiles that spread fake news on online media platforms. This is done by combining automatic and human points of view to identify features of a particular user profile and the news content it shares. Instead of checking individual news items, the author tries to identify unreliable user profiles that spread fake news on social platforms through offline and online analysis. The datasets required for this purpose were collected from Twitter: (i) offline analysis was done using deep neural networks, and (ii) online analysis was done by giving questionnaires to real users.

The authors of [6] exploited the features of Google BERT (Bidirectional Encoder Representations from Transformers) and implemented a model for distinguishing fake news. The dataset comprises multimedia, news, and social network data from Twitter. They tested performance on two famous publicly available datasets, the LIAR dataset and the FakeNewsNet dataset. The model designed in this paper also categorizes fake news in multimedia content.

Elhadad et al. [7] discuss the detection of misleading information, particularly about Covid-19. They collected data from reputed international organizations such as the WHO, UNICEF, and the UN, as well as from fact-checking websites, to validate their data on Covid-19. The authors created a model for detecting misleading information by utilizing a number of machine learning algorithms and feature extraction techniques.

[8] describes the detection of fake news through approaches that use only the textual features of the news. The authors brought out the power of ensemble methods, proposing a model based on stylometric features and a word-vector representation of the dataset to detect fake news on social media; the stylometric and word-vector features were further improved by using ensemble methods.

Ning Xin Nyow and Hui Na Chua [9] obtained and transformed Twitter data to identify further vital attributes that influence the accuracy of machine learning techniques in categorizing news as real or fake, making use of approaches such as data processing. The authors also shed light on tweet characteristics, the numerous tweet attributes, and the design of an application to reliably distinguish online news.

The authors in [10] begin by outlining well-known information-spreading characteristics using several large models, including the famous Maki-Thompson model. Using these as the foundation, they designed what they describe as "context-aware modeling frameworks capable of capturing specific eventualities in online social media data spread". They proposed four models capable of capturing the elements of data separation for a particular context and additionally provided stochastic versions of those models.

Utilizing a multinomial voting algorithm, [11] focuses mainly on a combined method for detecting counterfeit information. Naïve Bayes, Random Forest, Decision Tree, Support Vector Machine, and k-Nearest Neighbours are among the other machine learning techniques implemented in this paper. The training data used to train the algorithms is obtained from a bag-of-words model. Further verification and validation of the data is completed in Python, and Tableau, a visualization tool, is used in this paper. The implementation is carried out with the algorithms' default parameter values.

Because of the negative influence on society of the widespread circulation of fake news via social media and news outlets, [13] stresses the need for an automated fake news detection tool. To address this problem, a combination of neural network architectures, CNN and LSTM, is utilized, together with two distinct dimensionality reduction approaches, Chi-Square and Principal Components Analysis. This model improved results by 20% in terms of F1 score and 4% in terms of accuracy.

As mentioned in the papers above, TF-IDF and count vectorizers are among the most commonly used; [14] additionally employs English stop words to improve the scores. The paper also uses various classifiers, such as SVM, Naïve Bayes, Decision Tree, and logistic regression.

[15] discusses the US presidential election of 2016 and how counterfeit news resulted in debates and baseless accusations. The authors created a dataset of two hundred tweets on "Hilary Clinton" and assessed them. They explore techniques for extracting and classifying news and perform linguistic analysis on the tweets after text normalization.
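Several of the surveyed papers ([1], [11], [14]) rest on the same core recipe: represent each article or headline as a bag of words and score it with a Naïve Bayes classifier. The sketch below is a minimal, self-contained illustration of that recipe only, not the code of any paper reviewed here; the toy headlines, labels, and class names are invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic tokens only; real systems would also
    # remove stop words and apply stemming, as [14] does.
    return [w for w in text.lower().split() if w.isalpha()]


class NaiveBayesFakeNewsClassifier:
    """Multinomial Naive Bayes over bag-of-words counts, Laplace-smoothed."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.priors = {c: labels.count(c) / len(labels) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.priors[c])
            for tok in tokenize(text):
                # Add-one (Laplace) smoothing avoids zero probabilities
                # for words unseen in class c during training.
                score += math.log((self.word_counts[c][tok] + 1) /
                                  (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


# Toy labelled headlines, invented purely for illustration.
train_texts = [
    "miracle cure for virus found in common spice",
    "celebrity secretly arrested says anonymous post",
    "government confirms new lockdown guidelines today",
    "health ministry releases official vaccine schedule",
]
train_labels = ["fake", "fake", "real", "real"]

clf = NaiveBayesFakeNewsClassifier().fit(train_texts, train_labels)
print(clf.predict("anonymous post claims miracle cure"))    # prints "fake"
print(clf.predict("ministry confirms vaccine guidelines"))  # prints "real"
```

A production pipeline would swap the whitespace tokenizer for stop-word removal and TF-IDF weighting as in [14], and could combine several such classifiers by majority voting as in [11].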