
FAKE NEWS DETECTION USING MACHINE LEARNING

A Seminar Report

Submitted By
Mayuresh Warang PRN Number 72152987K
UNDER THE GUIDANCE OF
Prof Jyotsna Nanajkar
in partial fulfilment for the award of the degree

of

BACHELOR OF ENGINEERING

IN

INFORMATION TECHNOLOGY ENGINEERING

Zeal College of Engineering and Research Narhe, Pune

Department Of Information Technology


Year 2022-23

SAVITRIBAI PHULE PUNE UNIVERSITY: PUNE


ACKNOWLEDGEMENT

I sincerely express my wholehearted thanks to the principal, Dr. Ajit Kate, Zeal College of Engineering
and Research, Narhe, Pune, for his constant encouragement and moral support during this project.

I owe my sincere thanks to Prof Balaji Chaugule, Head of the Department of Information Technology
Engineering, Zeal College of Engineering and Research, Narhe, Pune, for furnishing every
essential facility for doing this project.

I sincerely thank my guide, Prof Jyotsna Nanajkar, Department of Information Technology Engineering,
Zeal College of Engineering and Research, Narhe, Pune, for valuable help and guidance throughout the
project.

It gives me great pleasure to present my seminar report on “Fake News Detection”.


Abstract

In our modern era where the internet is ubiquitous, everyone relies on various online resources for news.
With the increasing use of social media platforms like Facebook and Twitter, news spreads rapidly
among millions of users within a very short span of time. The spread of fake news has far-reaching
consequences, from the creation of biased opinions to swaying election outcomes for the benefit of certain
candidates. Moreover, spammers use appealing news headlines as clickbait to generate advertising
revenue. In this paper, we aim to perform binary classification of various news articles available online
with the help of concepts pertaining to Artificial Intelligence, Natural Language Processing and Machine
Learning. We aim to provide the user with the ability to classify news as fake or real and also to check the
authenticity of the website publishing the news.

Social media for news consumption is a double-edged sword. On the one hand, its low cost and rapid access
to information lead people to seek news from social media. On the other hand, it enables the wide spread of
“Fake News”, i.e. low-quality news with intentionally false information.

Fake news on social media and other media is spreading widely and is a matter of serious concern
because of its ability to cause substantial social and national damage.
Keywords

1) Machine Learning
2) Data Mining
3) Support Vector Machine
4) Random Forest
5) Decision Tree
6) Naïve Bayes
7) KNN
8) Logistic Regression
Contents

1 INTRODUCTION
1.1 Machine Learning
1.2 Data Mining
1.3 Motivation

2 LITERATURE REVIEW

3 FAKE NEWS
3.1 What is Fake News
3.2 Types of Fake News
3.3 Traditional Fake News
3.4 Social Media Fake News

4 FAKE NEWS DETECTION
4.1 Problem Statement

5 FEATURE EXTRACTION
5.1 News Content Feature
5.1.1 Linguistic Based
5.1.2 Visual Based
5.2 Social Context Feature
5.2.1 User Based
5.2.2 Post Based
5.2.3 Network Based

6 MODEL CONSTRUCTION
6.1 News Content Model
6.1.1 Knowledge Based
6.1.2 Style Based
6.2 Social Context Model
6.2.1 Stance Based
6.2.2 Propagation Based

7 ASSESSING DETECTION EFFICACY
7.1 Datasets

8 ALGORITHMS
8.1 Random Forest
8.2 Decision Tree
8.3 Logistic Regression
8.4 Naïve Bayes
8.5 KNN

9 EVALUATION METRICS

10 RELATED AREA
10.1 Rumor Classification
10.2 Truth Discovery
10.3 Clickbait Detection

Conclusion

References
List of figures

9.1 Evaluation Metrics


Chapter 1

Introduction

As an increasing amount of our lives is spent interacting online through social media
platforms, more and more people tend to seek out and consume news from social media
instead of traditional news organizations. The reasons for this change in consumption
behavior are inherent in the nature of these social media platforms: (i) it is often more
timely and less expensive to consume news on social media than through traditional
journalism, such as newspapers or television; and (ii) it is easier to share, comment on, and
discuss the news with friends or other readers on social media. For instance, 62 percent of
U.S. adults got news on social media in 2016, while in 2012 only 49 percent reported seeing
news on social media. It was also found that social media now outperforms television
as the major news source. Despite the benefits provided by social media, the quality
of news on social media is lower than that of traditional news organizations.

Detecting fake news on social media poses several new and challenging research problems.
Though fake news itself is not a new problem (nations or groups have been using the news
media to execute propaganda or influence operations for centuries), the rise of web-generated
news on social media makes fake news a more powerful force that challenges traditional
journalistic norms. There are several characteristics of this problem that make it uniquely
challenging for automated detection. First, fake news is intentionally written to mislead
readers, which makes it nontrivial to detect simply based on news content.

1.1 Machine Learning


Machine Learning (ML) is a class of algorithms that help software systems achieve more
accurate results without having to be reprogrammed explicitly. Data scientists specify the
variables or features that the model needs to analyze and use to develop predictions.
When training is completed, the algorithm applies what it has learned to new data.

1.2 Data Mining


Data mining techniques are categorized into two main methods: supervised and
unsupervised. The supervised method uses training information to predict
hidden activities. Unsupervised data mining attempts to recognize hidden patterns in data
without being given training data, that is, without pairs of inputs and category labels.

1.3 Motivation
Fake news contains misleading information that can be checked. It may maintain a lie about a
certain statistic in a country or exaggerate the cost of certain services, which may
give rise to unrest in some countries, as in the Arab Spring. There are organizations, like the House
of Commons and the Crosscheck project, trying to deal with issues such as confirming that authors are
accountable. However, their scope is limited because they depend on manual human
detection. In a world with millions of articles either removed or published every
minute, this is not feasible manually. A solution could be the
development of a system that provides a credible automated index score, or rating, for the
credibility of different publishers and news contexts. Therefore, there is a need to detect
fake news, which motivates this Fake News Detection project.

Chapter 2

Literature Review

Mykhailo Granik et al., in their paper [1], show a simple approach for fake news detection
using a naive Bayes classifier. This approach was implemented as a software system and tested
against a data set of Facebook news posts. The posts were collected from three large Facebook
pages each from the right and from the left, as well as three large mainstream political news
pages (Politico, CNN, ABC News). They achieved a classification accuracy of approximately
74%. Classification accuracy for fake news is slightly worse. This may be caused by the
skewness of the dataset: only 4.9% of it is fake news.

Himank Gupta et al. [2] gave a framework based on different machine learning approaches that
deals with various problems, including accuracy shortage, time lag (BotMaker), and the high
processing time needed to handle thousands of tweets in 1 sec. First, they collected 400,000
tweets from the HSpam14 dataset. They then characterized these as 150,000 spam tweets and
250,000 non-spam tweets. They also derived some lightweight features along with the top 30
words that provide the highest information gain from a Bag-of-Words model. They
were able to achieve an accuracy of 91.65% and surpassed the existing solution by
approximately 18%.

Marco L. Della Vedova et al. [3] first proposed a novel ML fake news detection method
which, by combining news content and social context features, outperforms existing methods
in the literature, increasing accuracy up to 78.8%. Second, they implemented their method
within a Facebook Messenger chatbot and validated it with a real-world application, obtaining
a fake news detection accuracy of 81.7%. Their goal was to classify a news item as reliable or
fake; they first described the datasets they used for their test, then presented the content-based
approach they implemented and the method they proposed to combine it with a social-based
approach available in the literature. The resulting dataset is composed of 15,500 posts,
coming from 32 pages (14 conspiracy pages, 18 scientific pages), with more than 2,300,000
likes by 900,000+ users. 8,923 (57.6%) posts are hoaxes and 6,577 (42.4%) are non-hoaxes.

Chapter 3
Fake News
3.1 Fake News

Fake news has existed for a very long time, nearly as long as news has circulated widely
after the printing press was invented in 1439. However, there is no agreed
definition of the term “fake news”. Therefore, we first discuss and compare some widely used
definitions of fake news in the existing literature, and provide our definition of fake news that
will be used for the remainder of this survey. A narrow definition of fake news is news
articles that are intentionally and verifiably false and could mislead readers.

The reasons for choosing this narrow definition are threefold. First, the underlying intent of
fake news provides both theoretical and practical value that enables a deeper understanding
and analysis of this topic. Second, any techniques for truth verification that apply to the
narrow conception of fake news can also be applied under the broader definition. Third,
this definition is able to eliminate the ambiguities between fake news and related concepts
that are not considered in this article. The following concepts are not fake news according to
our definition: (1) satire news with proper context, which has no intent to mislead or deceive
consumers and is unlikely to be misperceived as factual; (2) rumors that did not originate
from news events; (3) conspiracy theories, which are difficult to verify as true or false; (4)
misinformation that is created unintentionally; and (5) hoaxes that are only motivated by fun
or to scam targeted individuals.

3.2 Types of Fake News

1) Fake News on Traditional News Media


2) Fake News on Social Media

3.3 Fake News on Traditional News Media

Fake news itself is not a new problem. The media ecology of fake news has been changing
over time from newsprint to radio/television and, recently, online news and social media. We
denote “traditional fake news” as the fake news problem before social media had important
effects on its production and dissemination. Next, we will describe several psychological and
social science foundations that describe the impact of fake news at both the individual and
social information ecosystem levels.

Psychological Foundations of Fake News. Humans are naturally not very good at
differentiating between real and fake news. There are several psychological and cognitive
theories that can explain this phenomenon and the influential power of fake news. Traditional
fake news mainly targets consumers by exploiting their individual vulnerabilities.

3.4 Fake News on Social Media

In this subsection, we will discuss some unique characteristics of fake news on social media.
Specifically, we will highlight the key features of fake news that are enabled by social media.
Note that the aforementioned characteristics of traditional fake news are also applicable to
social media.

Malicious Accounts on Social Media for Propaganda. While many users on
social media are legitimate, social media users may also be malicious, and in some cases are
not even real humans. The low cost of creating social media accounts also encourages
malicious user accounts, such as social bots, cyborg users, and trolls. A social bot refers to a
social media account that is controlled by a computer algorithm to automatically produce
content and interact with humans (or other bot users) on social media.

Chapter 4

Fake News Detection

4.1 Problem Statement

In the previous section, we introduced the conceptual characterization of traditional fake
news and fake news in social media. Based on this characterization, we further explore the
problem definition and proposed approaches for fake news detection.

In this subsection, we present the details of the mathematical formulation of fake news detection
on social media. Specifically, we introduce the definitions of the key components of fake
news and then present the formal definition of fake news detection. The basic notations are
defined below.
1) Let $a$ refer to a News Article. It consists of two major components: Publisher and
Content. Publisher $\vec{p}_a$ includes a set of profile features to describe the original
author, such as name, domain, and age, among other attributes. Content $\vec{c}_a$ consists
of a set of attributes that represent the news article and includes headline, text, image, etc.
2) We also define Social News Engagements as a set of tuples $E = \{e_{it}\}$ to represent
the process of how news spreads over time among $n$ users $U = \{u_1, u_2, \ldots, u_n\}$ and their
corresponding posts $P = \{p_1, p_2, \ldots, p_n\}$ on social media regarding news article $a$. Each
engagement $e_{it} = \{u_i, p_i, t\}$ represents that a user $u_i$ spreads news article $a$
using $p_i$ at time $t$. Note that we set $t = \text{Null}$ if the article $a$ does not have any
engagement yet, and thus $u_i$ represents the publisher.
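
The notation above can be read as a small data model. The following is a minimal sketch in Python, assuming illustrative field names (`name`, `domain`, `age`, `headline`, `text`, `image_url`) and made-up sample values; it is not a prescribed schema, only one way to hold a publisher, a content record, and a list of engagement tuples for one article.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Publisher:
    """Profile features of the original author (the publisher part of an article)."""
    name: str
    domain: str
    age: Optional[int] = None


@dataclass
class Content:
    """Attributes that represent the article itself (the content part)."""
    headline: str
    text: str
    image_url: Optional[str] = None


@dataclass
class Engagement:
    """One engagement tuple: a user spreads the article with a post at a time."""
    user: str
    post: str
    time: Optional[float]  # None plays the role of t = Null


@dataclass
class NewsArticle:
    publisher: Publisher
    content: Content
    engagements: List[Engagement] = field(default_factory=list)


# Hypothetical sample data, for illustration only.
article = NewsArticle(
    publisher=Publisher(name="Example Daily", domain="example.com"),
    content=Content(headline="Sample headline", text="Sample body text."),
)
article.engagements.append(Engagement(user="u2", post="Is this true?", time=2.5))
article.engagements.append(Engagement(user="u1", post="Sharing this story", time=1.0))

# Engagements ordered by time describe how the article spread.
timeline = sorted(article.engagements, key=lambda e: e.time)
print([e.user for e in timeline])
```

Keeping engagements as a separate list makes it straightforward to compute social context features later without touching the content record.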

Chapter 5

Feature Extraction

Fake news detection on traditional news media mainly relies on news content, while in social
media, social context can be used as additional auxiliary information to
help detect fake news. Thus, we will present the details of how to extract and represent useful
features from news content and social context.

5.1 News Content Feature

News content features $\vec{c}_a$ describe the meta information related to a piece of news.
Representative news content attributes are listed below:
1) Source: Author or publisher of the news article
2) Headline: Short title text that aims to catch the attention of readers and describes
the main topic of the article
3) Body Text: Main text that elaborates the details of the news story; there is usually a
major claim that is specifically highlighted and that shapes the angle of the publisher
4) Image: Part of the body content of a news article that provides visual cues to frame
the story

5.1.1 Linguistic Based

Linguistic-based: Since fake news pieces are intentionally created for financial or political
gain rather than to report objective claims, they often contain opinionated and inflammatory
language, crafted as “clickbait” (i.e., to entice users to click on the link to read the full article)
or to incite confusion. Thus, it is reasonable to exploit linguistic features that capture the
different writing styles and sensational headlines to detect fake news. Linguistic-based
features are extracted from the text content in terms of document organization at different
levels, such as characters, words, sentences, and documents. In order to capture the different
aspects of fake news and real news, existing work has utilized both common linguistic features
and domain-specific linguistic features.
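
As a rough illustration of features at the character, word, and sentence levels, the sketch below computes a handful of counts from a headline. The particular features chosen (exclamation marks, all-caps words, average word length) and the sample headline are assumptions made for this example, not a feature set taken from the literature reviewed here.

```python
import re


def linguistic_features(text: str) -> dict:
    """Compute a few illustrative character-, word-, and sentence-level features."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "n_chars": len(text),
        "n_words": len(words),
        "n_sentences": len(sentences),
        "avg_word_len": sum(len(w) for w in words) / len(words) if words else 0.0,
        "exclamations": text.count("!"),
        "all_caps_words": sum(1 for w in words if len(w) > 1 and w.isupper()),
    }


# Hypothetical sensational headline.
headline = "SHOCKING! You won't BELIEVE what happened next!"
features = linguistic_features(headline)
print(features)
```

A feature dictionary like this can be turned into a numeric vector and passed to any of the classifiers discussed in Chapter 8.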

5.1.2 Visual Based

Visual-based: Visual cues have been shown to be an important manipulator for fake news
propaganda. As we have characterized, fake news exploits the individual vulnerabilities of
people and thus often relies on sensational or even fake images to provoke anger or other
emotional responses from consumers. Visual-based features are extracted from visual elements
(e.g. images and videos) to capture the different characteristics of fake news. Fake images
were identified based on various user-level and tweet-level hand-crafted features using a
classification framework.

5.2 Social Context Feature


In addition to features related directly to the content of the news articles, additional social
context features can also be derived from the user-driven social engagements of news
consumption on social media platform. Social engagements represent the news proliferation
process over time, which provides useful auxiliary information to infer the veracity of news
articles. Note that few papers exist in the literature that detect fake news using social context
features. However, because we believe this is a critical aspect of successful fake news
detection, we introduce a set of common features utilized in similar research areas, such as
rumor veracity classification on social media.

5.2.1 User Based


User-based: Fake news pieces are likely to be created and spread by non-human accounts,
such as social bots or cyborgs. Thus, capturing users' profiles and characteristics by user-
based features can provide useful information for fake news detection. User-based features
represent the characteristics of those users who have interactions with the news on social
media. These features can be categorized across different levels: individual level and group
level.

5.2.2 Post Based


People express their emotions or opinions towards fake news through social media posts, such as
skeptical opinions, sensational reactions, etc. Thus, it is reasonable to extract post-based features
to help find potential fake news via reactions from the general public as expressed in posts. Post-
based features focus on identifying useful information to infer the veracity of news from various
aspects of relevant social media posts. These features can be categorized as post level, group
level, and temporal level. Post level features generate feature values for each post.
The aforementioned linguistic-based features and some embedding approaches for news
content can also be applied to each post.

5.2.3 Network Based


Users form different networks on social media in terms of interests, topics, and relations. As
mentioned before, fake news dissemination processes tend to form an echo chamber cycle,
highlighting the value of extracting network-based features to represent these types of network
patterns for fake news detection. Network-based features are extracted via constructing specific
networks among the users who published related social media posts. Different types of networks
can be constructed. The stance network can be built with nodes indicating all the tweets relevant
to the news and the edge indicating the weights of similarity of stances.
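
A stance network of this kind can be sketched as a weighted undirected graph. The example below assumes each tweet already has a numeric stance score in [-1, 1] and uses a simple linear similarity; both the scoring scheme and the similarity function are assumptions for illustration, since the text does not fix either.

```python
from itertools import combinations

# Hypothetical stance scores for tweets about one news item
# (-1 = denies the story, +1 = supports it).
stances = {"t1": 0.9, "t2": 0.8, "t3": -0.7}


def stance_similarity(a: float, b: float) -> float:
    """Map the stance gap to a similarity in [0, 1]; 1 means identical stances."""
    return 1.0 - abs(a - b) / 2.0


# Undirected weighted graph: nodes are tweets, edge weights are stance similarities.
edges = {
    (u, v): stance_similarity(stances[u], stances[v])
    for u, v in combinations(sorted(stances), 2)
}

for (u, v), weight in edges.items():
    print(f"{u} -- {v}: {weight:.2f}")
```

Tweets that agree with each other end up connected by heavy edges, so clusters of like-minded posts become visible as densely weighted groups.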

Chapter 6

Model Construction

In the previous section, we introduced features extracted from different sources, i.e., news
content and social context, for fake news detection. In this section, we discuss the details of
the model construction process for several existing approaches. Specifically we categorize
existing methods based on their main input sources as: News Content Models and Social
Context Models.

6.1 News Content Model


In this subsection, we focus on news content models, which mainly rely on news content
features and existing factual sources to classify fake news. Specifically, existing approaches
can be categorized as Knowledge-based and Style-based.

6.1.1 Knowledge Based


Knowledge-based: Since fake news attempts to spread false claims in news content, the most
straightforward means of detecting it is to check the truthfulness of major claims in a news
article to decide the news veracity. Knowledge-based approaches aim to use external sources
to fact-check proposed claims in news content. Expert-oriented fact-checking relies heavily
on human domain experts to investigate relevant data and documents to construct the verdicts
of claim veracity, for example PolitiFact, Snopes, etc. Crowdsourcing-oriented fact-
checking exploits the “wisdom of the crowd” to enable normal people to annotate news content;
these annotations are then aggregated to produce an overall assessment of the news veracity.
Computational-oriented fact-checking aims to provide an automatic scalable system to
classify true and false claims. Previous computational-oriented fact-checking methods try to
solve two major issues: (i) identifying check-worthy claims and (ii) discriminating the
veracity of fact claims.

6.1.2 Style Based


Style-based: Fake news publishers often have malicious intent to spread distorted and
misleading information and influence large communities of consumers, which requires particular
writing styles, not seen in true news articles, to appeal to and persuade a wide scope of
consumers. Style-based approaches try to detect fake news by capturing the
manipulators in the writing style of news content. There are mainly two typical categories of
style-based methods: Deception-oriented and Objectivity-oriented.

Deception-oriented stylometric methods capture the deceptive statements or claims from
news content. The motivation of deception detection originates from forensic psychology
(i.e., Undeutsch Hypothesis) and various forensic tools including Criteria-based Content
Analysis and Scientific-based Content Analysis have been developed.

Objectivity-oriented approaches capture style signals that can indicate a decreased objectivity
of news content and thus the potential to mislead consumers, such as hyperpartisan styles and
yellow-journalism. Hyperpartisan styles represent extreme behavior in favor of a particular
political party, which often correlates with a strong motivation to create fake news.

6.2 Social Context Based


The nature of social media provides researchers with additional resources to supplement and
enhance News Content Models. Social context models include relevant user social
engagements in the analysis, capturing this auxiliary information from a variety of
perspectives. We can classify existing approaches for social context modeling into two
categories: Stance-based and Propagation-based. Note that very few existing fake news
detection approaches have utilized social context models. Thus, we also introduce similar
methods for rumor detection using social media, which have potential application for fake
news detection.

6.2.1 Stance Based


Stance-based: Stance-based approaches utilize users' viewpoints from relevant post contents
to infer the veracity of original news articles. The stance of users' posts can be represented
either explicitly or implicitly. Explicit stances are direct expressions of emotion or opinion,
such as the “thumbs up” and “thumbs down” reactions expressed on Facebook.

6.2.2 Propagation Based


Propagation-based: Propagation-based approaches for fake news detection reason about the
interrelations of relevant social media posts to predict news credibility. The basic assumption
is that the credibility of a news event is highly related to the credibilities of relevant social
media posts.

Chapter 7
Assessing Detection Efficacy
In this section, we discuss how to assess the performance of algorithms for fake news detection.
We focus on the available datasets and evaluation metrics for this task.

7.1 Datasets

Online news can be collected from different sources, such as news agency homepages, search
engines, and social media websites. However, manually determining the veracity of news is a
challenging task, usually requiring annotators with domain expertise who perform careful
analysis of claims and additional evidence, context, and reports from authoritative sources.

Generally, news data with annotations can be gathered in the following ways: Expert
journalists, Fact-checking websites, Industry detectors, and Crowdsourced workers.
However, there are no agreed upon benchmark datasets for the fake news detection problem.
Some publicly available datasets are listed below:

BuzzFeedNews: This dataset comprises a complete sample of news published on Facebook
by 9 news agencies over a week close to the 2016 U.S. election, from September 19 to 23
and September 26 and 27. Every post and the linked article were fact-checked claim-by-claim
by 5 BuzzFeed journalists.
LIAR: This dataset is collected from the fact-checking website PolitiFact through its API. It
includes 12,836 human-labeled short statements, which are sampled from various contexts,
such as news releases, TV/radio interviews, campaign speeches, etc.
BS Detector: This dataset is collected from a browser extension called BS Detector,
developed for checking news veracity. It searches all links on a given webpage for
references to unreliable sources by checking against a manually compiled list of domains.
CREDBANK: CREDBANK is a large-scale crowdsourced dataset of approximately 60
million tweets that cover 96 days starting from October 2015. All the tweets are
related to over 1,000 news events, with each event assessed for credibility by 30
annotators from Amazon Mechanical Turk.
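
The domain-list check described for BS Detector can be approximated in a few lines. This is only a sketch of the general idea: the flagged domains and page links below are hypothetical placeholders, and the extension's actual list and matching rules are not reproduced here.

```python
from urllib.parse import urlparse

# Hypothetical list of flagged domains, for illustration only.
UNRELIABLE_DOMAINS = {"fake-news.example", "hoax.example"}


def flag_links(links):
    """Return the links whose host is a flagged domain or a subdomain of one."""
    flagged = []
    for link in links:
        host = (urlparse(link).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in UNRELIABLE_DOMAINS):
            flagged.append(link)
    return flagged


page_links = [
    "https://www.fake-news.example/story/1",
    "https://news.example.org/politics",
]
print(flag_links(page_links))
```

Matching on the parsed host rather than on the raw URL string avoids false hits when a flagged domain name merely appears in a path or query string.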

Chapter 8

Algorithm
8.1 Random Forest

A random forest is built from many decision trees, each of which produces its own result.
The random forest then combines the results predicted by this large number of decision
trees. To ensure variation among the decision trees, the random forest randomly selects a
subset of features for each tree.
Random forest works best when the decision trees are uncorrelated. If
applied to similar trees, the overall result will be more or less the same as that of a single
decision tree. Uncorrelated decision trees can be obtained by bootstrapping and feature randomness.
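
The two sources of variation named above, bootstrapping and feature randomness, can be shown with a deliberately simplified sketch. Here each "tree" is reduced to a one-split stump on a single randomly chosen feature, and the forest predicts by majority vote; the stump simplification, the toy features (exclamation count, all-caps word count), and the data are all assumptions made to keep the example short.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature):
    """Fit a one-split tree: pick the threshold on `feature` with the fewest training errors."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, (feature, threshold, left_label, right_label))
    return best[1]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrapping: sample training rows with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Feature randomness: each stump sees one randomly chosen feature.
        feature = rng.randrange(len(rows[0]))
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Majority vote over the predictions of all stumps."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Toy data: [exclamation count, all-caps word count] per headline.
rows = [[5, 4], [4, 5], [6, 3], [0, 0], [1, 0], [0, 1]]
labels = ["fake", "fake", "fake", "real", "real", "real"]
forest = train_forest(rows, labels)
print(predict_forest(forest, [5, 5]), predict_forest(forest, [0, 0]))
```

A full random forest grows deeper trees and samples a fresh feature subset at every split, but the voting and resampling logic follows the same pattern.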

8.2 Decision Tree


The decision tree is an important tool with a flowchart-like structure that is
mainly used for classification problems. Each internal node of the decision tree specifies a
condition or a “test” on an attribute, and the branching is done on the basis of the test
conditions and result. Finally, the leaf node bears a class label that is obtained after evaluating
the attributes along the way. The path from the root to a leaf represents a classification rule.
A notable property is that it works for both categorical and continuous dependent variables.
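
To make the node, branch, and leaf structure concrete, the sketch below walks a small hand-built tree and records the tests passed on the way to the leaf. The tree, its thresholds, and the feature names are hypothetical; it is not a learned model.

```python
# Hypothetical hand-built tree over two illustrative features.
tree = {
    "test": ("exclamations", 2),      # internal node: is exclamations <= 2 ?
    "yes": {"label": "real"},         # leaf node with a class label
    "no": {
        "test": ("all_caps_words", 1),
        "yes": {"label": "real"},
        "no": {"label": "fake"},
    },
}


def classify(node, sample, path=()):
    """Walk from the root to a leaf; return the class label and the tests followed."""
    if "label" in node:
        return node["label"], list(path)
    feature, threshold = node["test"]
    if sample[feature] <= threshold:
        return classify(node["yes"], sample, path + (f"{feature} <= {threshold}",))
    return classify(node["no"], sample, path + (f"{feature} > {threshold}",))


label, rule = classify(tree, {"exclamations": 4, "all_caps_words": 3})
print(label, "because", " and ".join(rule))
```

The list of tests returned alongside the label is exactly the root-to-leaf rule, which is what makes decision trees easy to explain to a reader.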

8.3 Logistic Regression

Despite its name, logistic regression is a classification algorithm, not a regression algorithm.
It is used to estimate discrete values (binary values like 0/1, yes/no, true/false) based on a
given set of independent variable(s). In simple
words, it predicts the probability of occurrence of an event by fitting data to a logit function.
Hence, it is also known as logit regression. Since it predicts a probability, its output values
lie between 0 and 1 (as expected). Mathematically, the log odds of the outcome are modelled
as a linear combination of the predictor variables.
odds = p/(1-p) = probability of event occurrence / probability of event non-occurrence

ln(odds) = ln(p/(1-p))
logit(p) = ln(p/(1-p)) = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk
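
The relationship between the linear combination and the predicted probability can be checked numerically. In this sketch the coefficients b0, b1, b2 and the feature values X1, X2 are made-up numbers; the point is only that inverting the logit yields a probability between 0 and 1, and that applying the logit to that probability recovers the linear combination.

```python
import math


def logit(p: float) -> float:
    """Log odds ln(p / (1 - p)) for a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))


def predict_probability(coefficients, features) -> float:
    """Invert the logit: p = 1 / (1 + exp(-(b0 + b1*X1 + ... + bk*Xk)))."""
    b0, *bs = coefficients
    z = b0 + sum(b * x for b, x in zip(bs, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients b0, b1, b2 and feature values X1, X2.
coefficients = [-1.0, 0.8, 0.5]
features = [2.0, 1.0]

p = predict_probability(coefficients, features)  # linear part: -1.0 + 1.6 + 0.5 = 1.1
print(round(p, 3), round(logit(p), 3))
```

In practice the coefficients are estimated from training data, and a threshold on p (commonly 0.5) turns the probability into a fake or real decision.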

8.4 Naïve Bayes

This algorithm is based on Bayes' theorem with the assumption of independence among
predictors and is used in many machine learning problems. Simply put, Naive Bayes assumes
that the presence of one feature in a class is unrelated to the presence of any other feature.
For example, a fruit may be classified as an apple if it is red, round, and about 3 inches in
diameter. Even if these features depend on each other or on the existence of other features,
Naive Bayes treats all of them as contributing independently to the probability that the
fruit is an apple.
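
For text, the independence assumption means that each word contributes its own factor to the class score. The sketch below is a minimal multinomial Naive Bayes with add-one (Laplace) smoothing; the smoothing choice, the whitespace tokenizer, and the four training headlines are assumptions for illustration and do not come from any dataset discussed in this report.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and words per class; also collect the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab


def predict_naive_bayes(model, doc):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # log P(class) plus the sum of log P(word | class), treating words as independent.
        score = math.log(class_counts[label] / total_docs)
        n_words = sum(word_counts[label].values())
        for word in doc.lower().split():
            # Add-one smoothing avoids zero probabilities for unseen words.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training headlines.
docs = [
    "shocking miracle cure revealed",
    "you will not believe this shocking secret",
    "government publishes annual budget report",
    "city council approves new budget",
]
labels = ["fake", "fake", "real", "real"]

model = train_naive_bayes(docs, labels)
print(predict_naive_bayes(model, "shocking secret cure"))
print(predict_naive_bayes(model, "annual budget report"))
```

Working in log space keeps the product of many small probabilities from underflowing on longer documents.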

8.5 KNN
KNN classifies new cases by a majority vote of their k nearest neighbors. A case is
assigned to the class most common among its K nearest neighbors, as measured by a
distance function. KNN falls in the category of supervised learning, and its main
applications include intrusion detection and pattern recognition. It is
nonparametric, so no specific distribution is assumed for the data. By contrast, a GMM,
for example, assumes a Gaussian distribution of the given data.
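
The voting step can be written in a few lines. This sketch assumes Euclidean distance and k = 3, and uses made-up two-dimensional points (for instance, exclamation count and all-caps word count per headline); other distance functions and values of k are equally valid choices.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    neighbors = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors and labels.
train_points = [(5, 4), (4, 5), (6, 3), (0, 0), (1, 0), (0, 1)]
train_labels = ["fake", "fake", "fake", "real", "real", "real"]

print(knn_predict(train_points, train_labels, (4, 4)))
print(knn_predict(train_points, train_labels, (1, 1)))
```

Because KNN stores the training data and defers all work to prediction time, it needs no training step, at the cost of slower predictions on large datasets.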

Chapter 9

Evaluation Metrics

To evaluate the performance of algorithms for fake news detection problem, various
evaluation metrics have been used. In this subsection, we review the most widely used
metrics for fake news detection. Most existing approaches consider the fake news problem as
a classification problem that predicts whether a news article is fake or not:

1) True Positive (TP): when predicted fake news pieces are actually annotated as fake news;
2) True Negative (TN): when predicted true news pieces are actually annotated as true news;
3) False Negative (FN): when predicted true news pieces are actually annotated as fake news;
4) False Positive (FP): when predicted fake news pieces are actually annotated as true news.

Fig 9.1 Evaluation Metrics
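
The four counts defined above are usually combined into precision, recall, F1, and accuracy. The sketch below computes them with the standard formulas on a small set of made-up annotations and predictions, treating "fake" as the positive class; the labels and data are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred, positive="fake"):
    """Count TP, TN, FP, FN with `positive` as the fake-news class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


# Hypothetical annotations (y_true) and classifier outputs (y_pred).
y_true = ["fake", "fake", "fake", "real", "real", "real", "real", "real"]
y_pred = ["fake", "fake", "real", "fake", "real", "real", "real", "real"]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)    # 2, 4, 1, 1
precision = tp / (tp + fp)                           # share of predicted fake that is fake
recall = tp / (tp + fn)                              # share of actual fake that was caught
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two
accuracy = (tp + tn) / (tp + tn + fp + fn)           # 6 of 8 correct = 0.75

print(tp, tn, fp, fn, round(precision, 3), round(recall, 3), round(f1, 3), accuracy)
```

Precision and recall are worth reporting alongside accuracy because fake news datasets are often skewed, and accuracy alone can look good even when few fake items are caught.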

Chapter 10

Related Area

In this section, we further discuss areas that are related to the problem of fake news detection.
We aim to point out the differences between these areas and fake news detection by briefly
explaining the task goals and highlighting some popular methods.

10.1 Rumor Classification


A rumor can usually be defined as “a piece of circulating information whose veracity status is
yet to be verified at the time of spreading”. The function of a rumor is to make sense of an
ambiguous situation, and the truthfulness value could be true, false or unverified. Previous
approaches for rumor analysis focus on four subtasks: rumor detection, rumor tracking,
stance classification, and veracity classification.

10.2 Truth Discovery


Truth discovery is the problem of detecting true facts from multiple conflicting sources.
Truth discovery methods do not explore the fact claims directly, but rely on a collection of
contradicting sources that record the properties of objects to determine the truth value. Truth
discovery aims to determine the source credibility and object truthfulness at the same time.
The fake news detection problem can benefit from various aspects of truth discovery
approaches under different scenarios. First, the credibility of different news outlets can be
modeled to infer the truthfulness of reported news. Second, relevant social media posts can
also be modeled as social response sources to better determine the truthfulness of claims.

10.3 Clickbait Detection


Clickbait is a term commonly used to describe eye-catching and teaser headlines in online
media. Clickbait headlines create a so-called “curiosity gap”, increasing the likelihood that
readers will click the target link to satisfy their curiosity. Existing clickbait detection
approaches utilize various linguistic features extracted from teaser messages, linked.

Conclusion

With the increasing popularity of social media, more and more people consume news from
social media instead of traditional news media. However, social media has also been used to
spread fake news, which has strong negative impacts on individual users and broader society.
In this report, we explored the fake news problem by reviewing existing literature in two
phases: characterization and detection. In the characterization phase, we introduced the basic
concepts and principles of fake news in both traditional media and social media. In the
detection phase, we reviewed existing fake news detection approaches from a data mining
perspective, including feature extraction and model construction. We also discussed the
datasets, evaluation metrics, and promising future directions in fake news detection
research, as well as ways to extend the field to other applications.

The research in this report focuses on detecting fake news by reviewing it in two stages:
characterization and detection. In the first stage, the basic concepts and principles of fake
news on social media are highlighted. In the detection stage, current methods for detecting
fake news with different supervised learning algorithms are reviewed.

In the 21st century, the majority of tasks are done online. Newspapers, once preferred in
hard copy, are now being replaced by applications like Facebook and Twitter and by news
articles read online. WhatsApp forwards are also a major source of news. The growing
problem of fake news complicates matters further and can distort people's opinions of, and
attitudes towards, digital technology. When people are deceived by fake news, they start
believing that their assumptions about a particular topic are true. Thus, in order to curb
this phenomenon, we have developed a fake news detection system that takes input from the
user and classifies it as true or fake.

Bibliography/References

[1] Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu, “Fake News Detection
on Social Media: A Data Mining Perspective” arXiv:1708.01967v3 [cs.SI], 3 Sep 2017
[2] Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu, “Fake News Detection
on Social Media: A Data Mining Perspective” arXiv:1708.01967v3 [cs.SI], 3 Sep 2017
[3] M. Granik and V. Mesyura, "Fake news detection using naive Bayes classifier," 2017
IEEE First Ukraine Conference on Electrical and Computer Engineering (UKRCON), Kiev,
2017, pp. 900-903.
[4] Fake news websites. (n.d.) Wikipedia. [Online]. Available:
https://en.wikipedia.org/wiki/Fake_news_website. Accessed Feb. 6, 2017
[5] Cade Metz. (2016, Dec. 16). The bittersweet sweepstakes to build an AI that destroys fake
news.
[6] Conroy, N., Rubin, V. and Chen, Y. (2015). “Automatic deception detection: Methods for
finding fake news” at Proceedings of the Association for Information Science and Technology,
52(1), pp.1-4.
[7] Markines, B., Cattuto, C., & Menczer, F. (2009, April). “Social spam detection”. In
Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the
Web (pp. 41-48)
[8] Rada Mihalcea , Carlo Strapparava, The lie detector: explorations in the automatic
recognition of deceptive language, Proceedings of the ACL-IJCNLP
[9] Kushal Agarwalla, Shubham Nandan, Varun Anil Nair, D. Deva Hema, “Fake News
Detection using Machine Learning and Natural Language Processing,” International Journal of
Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-7, Issue-6, March
2019
[10] H. Gupta, M. S. Jamal, S. Madisetty and M. S. Desarkar, "A framework for real-time
spam detection in Twitter," 2018 10th International Conference on Communication Systems
& Networks (COMSNETS), Bengaluru, 2018, pp. 380-383 [11] M. L. Della Vedova, E.
Tacchini, S. Moret, G. Ballarin, M. DiPierro and L. de Alfaro, "Automatic Online Fake News
Detection Combining Content and Social Signals," 2018 22nd Conference of Open
Innovations Association (FRUCT), Jyvaskyla, 2018, pp. 272-279.

18
[12] C. Buntain and J. Golbeck, "Automatically Identifying Fake News in Popular Twitter
Threads," 2017 IEEE International Conference on Smart Cloud (SmartCloud), New York,
NY, 2017, pp. 208-215.
[13] S. B. Parikh and P. K. Atrey, "Media-Rich Fake News Detection: A Survey," 2018 IEEE
Conference on Multimedia Information Processing and Retrieval (MIPR), Miami, FL, 2018,
pp. 436-441
[14] Scikit-Learn- Machine Learning In Python
[15] Dataset- Fake News detection William Yang Wang. " liar, liar pants on _re": A new
benchmark dataset for fake news detection. arXiv preprint arXiv:1705.00648, 2017.
[16] Shankar M. Patil, Dr. Praveen Kumar, “Data mining model for effective data analysis of
higher education students using MapReduce” IJERMT, April 2017 (Volume-6, Issue-4).
[17] Aayush Ranjan, “ Fake News Detection Using Machine Learning”, Department Of
Computer Science & Engineering Delhi Technological University, July 2018.
[18] Patil S.M., Malik A.K. (2019) Correlation Based Real-Time Data Analysis of Graduate
Students Behaviour. In: Santosh K., Hegadi R. (eds) Recent Trends in Image Processing and
Pattern Recognition. RTIP2R 2018. Communications in Computer and Information Science,
vol 1037. Springer, Singapore.
[19] Badreesh Shetty, “Natural Language Processing (NLP) for machine learning” at
towardsdatascience, Medium.
[20] NLTK 3.5b1 documentation, Nltk generate n gram .
[21] Ultimate guide to deal with Text Data (using Python) – for Data Scientists and Engineers
by Shubham Jain, February 27, 2018 .
[22] Understanding the random forest by Anirudh Palaparthi, Jan 28, at analytics vidya.
[23] Understanding the random forest by Anirudh Palaparthi, Jan 28, at analytics vidya.
[24]Shailesh-Dhama,“Detecting-Fake-News-with-Python”, Github, 2019 .
[25] Aayush Ranjan, “ Fake News Detection Using Machine Learning”, Department Of
Computer Science & Engineering Delhi Technological University, July 2018.
[26] What is a Confusion Matrix in Machine Learning by Jason Brownlee on November 18,
2016 in Code Algorithms From Scratch.
