
College of Excellence, 2021-6th Rank

Autonomous and Affiliated to Bharathiar University

Reaccredited with A++ grade by NAAC, An ISO 9001:2015 Certified Institution


Peelamedu, Coimbatore-641004

FAKE NEWS DETECTION USING MACHINE LEARNING


Group Project

Submitted to PSGR Krishnammal College for Women in partial fulfilment of the requirement for
award of the Degree of
BACHELOR OF COMMERCE WITH BUSINESS ANALYTICS

SUBMITTED BY,

NAME REGISTER NUMBER

DHANA PRIYA K 19BCB012


KARTHIKAA R S 19BCB028
RITHISHA CATHARINE C 19BCB043
SAKTHI G 19BCB046
SRINIRANJANI A 19BCB051

Under the Guidance of


Dr. (Mrs.) M. RAJESWARI, M.Sc., Ph.D
Assistant Professor
June 2022
CERTIFICATE

This is to certify that this project work entitled “FAKE NEWS DETECTION USING
MACHINE LEARNING”, submitted to PSGR Krishnammal College for Women in partial
fulfilment of the Degree of Bachelor of Commerce with Business Analytics, is a record of the
original work done by DHANA PRIYA K, KARTHIKAA R S, RITHISHA CATHARINE C,
SAKTHI G and SRINIRANJANI A during the period 2021-2022 of their study in the
Department of B Com (Business Analytics) under supervision and guidance, and that this project
work has not formed the basis for the award of any Degree/ Diploma/ Associateship/ Fellowship
or other similar title to any candidate of any University.

Signature of the Guide

Dr. (Mrs.) M. Rajeswari, M.Sc., Ph.D.,


Assistant Professor
Department of B Com (Business Analytics)

COUNTER SIGNED

Mrs. S. Manasha, M. Com., M. Phil.,


Head of the Department (I/C)
Department of B Com (Accounting and Finance) and (Business Analytics)

Dr. (Mrs.) S. Nirmala, MBA., M.Phil., Ph.D.,


Principal
DECLARATION

We, DHANA PRIYA K, KARTHIKAA R S, RITHISHA CATHARINE C, SAKTHI G
and SRINIRANJANI A of III B.Com (Business Analytics), hereby declare that the project work
entitled “FAKE NEWS DETECTION USING MACHINE LEARNING”, submitted to PSGR
Krishnammal College for Women in partial fulfilment of the Degree of Bachelor of Commerce
with Business Analytics, is the record of original research work done by us during the period
(2021-2022) of our study in the Department of B Com (Business Analytics) under the supervision
and guidance of Dr. (Mrs.) M. Rajeswari, M.Sc., Ph.D., and that this project work has not formed
the basis for the award of any Degree/ Diploma/ Associateship/ Fellowship or other similar title to
any candidate of the University.

REG NO NAME SIGNATURE


19BCB012 DHANA PRIYA K

19BCB028 KARTHIKAA R S

19BCB043 RITHISHA CATHARINE C

19BCB046 SAKTHI G

19BCB051 SRINIRANJANI A
ACKNOWLEDGEMENT

We take this opportunity to acknowledge with great pleasure, deep satisfaction and
gratitude, the contribution of many individuals in the successful completion of this
project report.

We deem it our bounden duty to thank Shri G. Rangaswamy, Managing Trustee,
GRG Trust, for the infrastructure provided.

We express our sincere thanks to Dr. (Mrs.) R. Nandhini, Chairperson, PSGR
Krishnammal College for Women, Coimbatore for her support and for all the
resources provided.

We express our profound gratitude to Dr. (Mrs.) N. Yesodha Devi, M.Com.,
M.Phil., Ph.D., Secretary, PSGR Krishnammal College for Women, Coimbatore
for having given us the opportunity to undergo this course and to undertake this
project.

We express our sincere thanks to Dr. (Mrs.) S. Nirmala, MBA., M.Phil., Ph.D.,
Principal, PSGR Krishnammal College for Women, Coimbatore for her support and
for all the resources provided.

We are extremely grateful to Mrs. S. Manasha, M. Com., M.Phil., Head of the
Department (I/C), Department of B Com (Accounting and Finance) and (Business
Analytics), PSGR Krishnammal College for Women, Coimbatore for the sustained
interest and advice that have contributed to a great extent to the completion of the
project.

We are thankful to our faculty guide Dr. (Mrs.) M. Rajeswari, M.Sc., Ph.D.,
Assistant Professor, Department of B Com (Business Analytics) PSGR
Krishnammal College for Women, Coimbatore for her appropriate guidance and
suggestions.

We express our sincere thanks to Mr. V. Manoj, B.E (E.C.E), (MASTERS IN ML
& AI)., Application Developer/ Data Analyst, I-Bacus-Tech, Coimbatore for
guiding and helping us in the completion of the project.

We also express our gratitude to all faculty members of our department for their
timely support and encouragement.
CONTENTS

S.NO  TITLE

1     INTRODUCTION
      1.1 COMPANY PROFILE
      1.2 ABOUT THE PROJECT
2     REVIEW OF LITERATURE
3     SYSTEM ANALYSIS
      3.1 PROPOSED SYSTEM
4     FEASIBILITY STUDY
5     SYSTEM SPECIFICATION
      5.1 HARDWARE SPECIFICATION
      5.2 SOFTWARE SPECIFICATION
      5.2.1 ABOUT THE SOFTWARE
6     SYSTEM DESIGN
      6.1 PROJECT DESCRIPTION
      6.2 WORKFLOW
      6.3 DATABASE DESIGN
7     SYSTEM TESTING AND IMPLEMENTATION
      7.1 TESTING FUNDAMENTALS
      7.2 IMPLEMENTATION
8     SCREENSHOT LAYOUT AND SAMPLE CODING
9     RESULT AND DISCUSSION
10    CONCLUSION AND FURTHER WORK
      REFERENCES
SYNOPSIS

These days a lot of information is shared over social media, and readers often cannot
differentiate between information that is fake and information that is real. People
start expressing their concern or sharing their opinion as soon as they come across a post,
without verifying its authenticity, which spreads the post further. Fake news and
rumors are the most common forms of false and unauthenticated information and should
be detected as early as possible to avoid their damaging consequences. This report
reviews and analyzes recent literature on detecting fake news over social media.
The aim of this work is to create a model that uses data from past news reports to
predict whether a news report is fake. Researchers have attempted to solve this
problem in many ways to find which methods give the desired results.

OBJECTIVES:

• To analyze fake news detection using the Naive Bayes algorithm.
• To examine fake news detection using Logistic Regression.
• To assess fake news detection using a Decision Tree.
• To evaluate fake news detection using the Random Forest algorithm.
• To analyze fake news detection using the Support Vector Machine algorithm.
1. INTRODUCTION

The advent of the World Wide Web and the rapid adoption of social media platforms (such
as Facebook and Twitter) paved the way for information dissemination on a scale never
witnessed before in human history. With the current usage of social media platforms,
consumers are creating and sharing more information than ever before, some of which is
misleading and has no relevance to reality. Automated classification of a text article as
misinformation or disinformation is a challenging task. Even an expert in a particular domain has
to explore multiple aspects before giving a verdict on the truthfulness of an article. There has
been a rapid increase in the spread of fake news in the last decade, most prominently observed
in the 2016 US elections. A number of studies have primarily focused on detection and
classification of fake news on social media platforms such as Facebook and Twitter.
The extensive spread of fake news can have a serious negative impact on individuals and
society. First, fake news can break the authenticity balance of the news ecosystem. Second, fake
news intentionally persuades consumers to accept biased or false beliefs. Fake news is
usually manipulated by propagandists to convey political messages or influence. Third, fake news
changes the way people interpret and respond to real news.
Many researchers are working on the detection of fake news, and machine
learning is proving helpful in this regard. Machine learning is the part of artificial intelligence
that builds systems which can learn from data and perform different actions. Researchers are
using different algorithms to detect false news. The algorithms first have to be trained with a
data set called the training data set. After the training, these algorithms can be used to perform
different tasks. Once trained, machine learning algorithms can detect fake news
automatically.

1.1 COMPANY PROFILE

ABOUT US

FORUS Technologies is a leading software development concern in New Zealand
& India, managed by IT veterans with more than a decade of experience in leading software
technologies. FORUS Technologies develops customized software
applications and provides website development services based on a range of platforms and technologies.

We provide Web designing, Application Development and Maintenance
Outsourcing services that lead to business process improvement. This allows for reduction
of costs and enables business growth. FORUS Application Development and Maintenance
Services is a part of its IT Services Group.

VISION

• Integrity: Honesty in how we deal with our clients, each other and with the world.
• Candor: Be open and upfront in all our conversations. Keep clients updated on the
real situation. Deal with situations early; avoid last minute surprises.
• Service: Seek to empower and enable our clients. Consider ourselves successful not
when we deliver our client’s final product but when the product is launched and
meets success.
• Kindness: Go the extra mile. Speak the truth with grace. Deliver more than is
expected or promised.
• Competence: Benchmark with the best in the business. Try new and better things.
Never rest on laurels. Move out of comfort zones.

MISSION

FORUS Technologies' mission is “To assist company’s progression and mechanize
there in sequence stream” and to reach out to global markets and create innovative,
world-class software solutions that match international quality standards.

TEAM

FORUS Technologies is proud of its team. FORUS Technologies believes
that people are the key to successfully providing high-level offshore software development
services and, more importantly, to an ongoing relationship with our clients. FORUS
Technologies has a clear policy of employing only the very best in their field. The team is
composed of developers who have significant experience in the Education, Healthcare,
Transportation, Hospitality, Manufacturing, Real Estate, Retailing and Technology
sectors.

VISIONARY & DYNAMIC MANAGEMENT

FORUS Technologies' management is passionate about its operations & team. A few short
years have led to a long track record and evergreen relationships with its clients. The
company’s management has experience in delivering projects to several companies. They
have managed teams effectively, delivering projects on time and to stipulated norms.

Services

• Software Development and Maintenance.


• Knowledge Process Outsourcing.
• Internship for College Students.
• Video Conference Training in multi-locations of world.
• Project Installation and Training through Online mode.
• Technologies Refinement with Development Exposure.
• Training with Live & Real Time Projects.
• Industrial Visits with Training.
• Research Assistance in Engineering and Arts.
• Corporate Training.

WEBSITE DEVELOPMENT AND HOSTING

• Web 2.0 web development and implementation.


• Custom graphics design - logos, web design.
• Corporate identity development - all the elements of a corporate style & marketing
design.

Having a Web site is important, but increasing traffic and attracting visitors to your site
should be the next priority.

• Pay Per Click Advertising.


• Link Building.
• E-mail Newsletters.
• Reporting and Analysis.

APPLICATION DEVELOPMENT SERVICES

Our expert team has the experience and shared knowledge needed to accomplish
application development projects at various levels. We are experienced professionals who
develop advanced systems with complex business logic dealing with large amounts of data
and transactions. We are able to supply you with the most desirable, innovative, and
trustworthy web application solutions.

WEB DEVELOPMENT SERVICES

• Database design and programming.


• Database integration.
• Developing web interface for data entry.
• Add features to or modify existing script.

PORTAL DEVELOPMENT

The Web is full of information and knowledge, and there are many ways to retrieve it. A web
portal is one of the easiest and most resourceful, offering resources and services such
as email, forums, search engines, and online shopping malls that help portal users
interact with individuals and groups spread over the internet.

OUR FOCUS AREAS IN PORTAL DEVELOPMENT ARE

• Entertainment Portal Development


• B2B portal development
• B2C portal development
• E-commerce Portal
• Online Travel portal (web site)

SOLUTIONS

FORUS Technologies understands that every customer and every industry has its
unique software requirements and that today’s readymade software will not suit every client’s
business process. FORUS Technologies will develop and implement business solutions based on
your organization's needs and processes, which will make the application more user friendly.

REACH US

FORUS Technologies, No: 27, Ist Floor, Mani’s Colony, Kalingrayan Street, RamNagar,
Coimbatore 641 012

Dynamic reach: +91 82205-55626, 0422-4957158, www.forustechnologies.com

Dynamic interaction: Info@forustechnologies.com

Visit us: http://www.forustechnologies.com/

OUR RECENT CLIENTS:

Australia PORT BLAIR

1.2 ABOUT THE PROJECT

The main aim of this project is to detect fake news and to measure the accuracy with
which it can be detected. Colab is used for analyzing and visualizing the
data. First, the data is collected from the Kaggle website and pre-processed by
removing repeated values and null values. Fake news is then detected using machine
learning algorithms: Naive Bayes, Logistic Regression, Decision Tree, Random Forest,
and Support Vector Machine. The accuracy of each algorithm is measured, and the
algorithms are compared to find which one achieves the highest accuracy. The report is
created using the result.
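
The pre-processing step described above (removing repeated and null values) can be sketched as follows. This is a minimal illustration in plain Python, not the project's actual notebook code; the sample records and the names `rows` and `clean` are hypothetical.

```python
# Hypothetical records; the real Kaggle file and its column names may differ.
rows = [
    {"title": "A", "text": "Budget passes senate"},
    {"title": "A", "text": "Budget passes senate"},  # exact duplicate
    {"title": "B", "text": None},                     # missing text
    {"title": "C", "text": "Aliens endorse candidate"},
]

def clean(records):
    """Drop rows with missing values, then drop exact duplicates (order kept)."""
    seen = set()
    kept = []
    for row in records:
        # Treat None and empty strings as null values.
        if any(value is None or value == "" for value in row.values()):
            continue
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept

cleaned = clean(rows)
print(len(cleaned), "rows remain after cleaning")
```

In a notebook, the same effect is usually obtained with a data-frame library; the hand-written version only makes the two filtering rules explicit.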

THE PROJECT CONSISTS OF THE FOLLOWING MODULES

• Naive Bayes Algorithm.


• Logistic Regression.
• Decision Tree.
• Random Forest Algorithm.
• Support Vector Machine Algorithm.

2. REVIEW OF LITERATURE

Social networks are stepping up their use of digital fake news detection tools and
educating the public on spotting fake news. At the time of writing, Facebook uses
machine learning algorithms to identify false or sensational claims used in advertising for
alternative cures, places potential fake news articles lower in the news feed, and
provides users with tips on how to identify fake news themselves [3]. Some approaches detect
fake news by using metadata, such as a comparison of the release time of the article with the
timeline of its spread, as well as where the story spread [2].

The nature of social media makes it easy to spread fake news, as a user potentially
sends fake news articles to friends, who then send it again to their friends and so on.
Comments on fake news sometimes fuel its ‘credibility’ which can lead to rapid sharing
resulting in further fake news [11]. Social bots are also responsible for the spreading of fake
news. Bots are sometimes used to target super-users by adding replies and mentions to posts.
Humans are manipulated through these actions to share the fake news articles [25].

With the widespread dissemination of information via digital media platforms, it is
of utmost importance for individuals and societies to be able to judge its credibility.
Fake news is not a recent concept, but it is a commonly occurring phenomenon in current
times. The consequence of fake news can range from being merely annoying to influencing
and misleading societies or even nations. A variety of approaches exist to identify fake news.

By conducting a systematic literature review, we identify the main approaches
currently available to identify fake news and how these approaches can be applied in
different situations. Some approaches are illustrated with a relevant example as well as the
challenges and the appropriate context in which the specific approach can be used. Fake
news that is purposely created to mislead and to cause harm to the public is referred to as
digital disinformation [23].

The spreading of false political information has increased due to the emergence of
streamlined media environments. In a recent study it was found that 43% (13 of 30) of false
news stories were shared on social media platforms, like Twitter, with links to non-credible
news websites [15]. There has been a considerable amount of research done on the influence of
fake news on the political environment. False political statements can convince or
persuade voters to change their opinions. Critics reported that in the national referendum
in the UK (regarding the nation’s withdrawal from the EU) and the 2016 presidential
election in the US, a large amount of false information was shared on social media platforms
and influenced the outcome. Social media platforms, like Facebook, came
under fire in the 2016 US presidential election, when fake news stories from unchecked
sources were spread among many users. The spreading of such fake news has the sole
purpose of changing the public’s opinion [17].

Various techniques can be used to change the public’s opinion. These techniques
include repeatedly retweeting or sharing messages, often with the use of bots or cyborgs,
as well as misleading hyperlinks that lure the social media user to more false information
[6]. One of the biggest problems with fake news is that it allows the writers to receive
monetary incentives. Misleading information and stories are promoted on social media
platforms to deceive social media users for financial gain [16]. One of the main reasons for
falsifying information is to earn money through clicks and views. The more times the link
is clicked, the more advertising money is generated.
Every click corresponds to advertising revenue for the content creator [8].

The more traffic companies or social media users get to their fake news page, the
more profit can be earned through advertising. Writers focus on sensational headlines rather
than truthful information. These attractive headlines deceive individuals into sharing
false information. Clickbait has been indicated as one of the main reasons behind the
spreading of false information [7].

One paper presents a simple approach for fake news detection using a naive Bayes
classifier. This approach was implemented as a software system and tested against a data set
of Facebook news posts. The posts were collected from three large Facebook pages each from
the right and from the left, as well as three large mainstream political news pages (Politico,
CNN, and ABC News). The authors achieved a classification accuracy of approximately 74%.
Classification accuracy for fake news is slightly worse. This may be caused by the skewness
of the dataset: only 4.9% of it is fake news [18].

Conroy, Rubin, and Chen outline several approaches that seem promising towards
the aim of correctly classifying misleading articles. They note that simple content-related
n-grams and shallow parts-of-speech (POS) tagging have proven insufficient for the
classification task, often failing to account for important context information. Rather, these
methods have been shown useful only in tandem with more complex methods of analysis.
Deep syntax analysis using Probabilistic Context Free Grammars (PCFG) has been shown
to be particularly valuable in combination with n-gram methods [19]. Another study detects
false news by using an n-gram model to separate false from truthful information through
machine learning techniques. The authors ran experiments using both linear and
nonlinear classifiers and compared six different machine learning techniques:
K-Nearest Neighbor, Support Vector Machine, Logistic Regression, Linear Support Vector
Machine, Decision Tree, and Stochastic Gradient Descent, which are suited to detecting fake
news.

The authors report experimental results on datasets compiled from truthful
and unreliable news websites. They used 5-fold cross-validation in their experiments, so in
each fold 80% of the dataset is used for training and the remaining 20% for testing. The
authors achieved the highest accuracy of 92% by using unigram features and a linear support
vector machine classifier [4].
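
The 80%/20% figure follows directly from using five folds: each fold serves once as the test set while the other four are used for training. A minimal sketch of such a split is given below, assuming unshuffled, contiguous folds; the function name `k_fold_indices` is hypothetical and this is not the cited authors' code.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `remainder` folds get one extra sample.
        stop = start + fold_size + (1 if fold < remainder else 0)
        test_idx = indices[start:stop]
        train_idx = indices[:start] + indices[stop:]
        yield train_idx, test_idx
        start = stop

folds = list(k_fold_indices(10, k=5))
for train_idx, test_idx in folds:
    print(len(train_idx), "training samples,", len(test_idx), "test samples")
```

In practice the sample order is usually shuffled before splitting so that each fold contains a mix of fake and real articles.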

Another work proposes a model that builds a network for learning a representation of news
reports, authors, and titles simultaneously. To achieve better accuracy, the authors used
several ML algorithms: Support Vector Machine, Convolutional Neural Networks, Long
Short-Term Memory, K-Nearest Neighbors, and Naive Bayes. The proposed model was
first tested with a CNN-based algorithm, which gave an accuracy
of 94% on a combined dataset (Liar and Kaggle). The KNN
model reached only 70%, the worst result. SVM gave an accuracy of 73%, close to
that of KNN, while Naive Bayes reached 91%
accuracy, which was much better.

The proposed project uses NLP techniques for detecting 'fake news', that is,
misleading news stories which come from non-reputable sources. By building a model
based on a K-Means clustering algorithm, the fake news can be detected. The data science
community has responded by taking action against the problem. It is not possible to
determine with complete accuracy whether a news item is real or fake. So the proposed
project uses datasets that are vectorized using the count vectorizer method for the
detection of fake news, and its accuracy is tested using machine learning algorithms [10].
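
The count vectorizer idea mentioned above turns each document into a vector of word counts over a shared vocabulary. The sketch below shows the principle in plain Python; it is an illustration under simplifying assumptions (whitespace tokenization, lowercasing only), and the helper names are hypothetical rather than a library API.

```python
from collections import Counter

def build_vocabulary(documents):
    """Map every distinct lowercase word to a column index (alphabetical order)."""
    words = sorted({word for doc in documents for word in doc.lower().split()})
    return {word: index for index, word in enumerate(words)}

def count_vectorize(document, vocabulary):
    """Return the word-count vector of one document over the vocabulary."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for word, n in counts.items():
        if word in vocabulary:  # words unseen during fitting are ignored
            vector[vocabulary[word]] = n
    return vector

docs = ["fake news spreads fast", "real news spreads slowly"]
vocab = build_vocabulary(docs)
vectors = [count_vectorize(doc, vocab) for doc in docs]
print(vocab)
print(vectors)
```

The resulting vectors are what a classifier such as Naive Bayes or Logistic Regression receives as input.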

Machine-learning methods are employed to detect the credibility of news based on
the text content and the responses given by users. A comparison is made to show that the latter
is more reliable and effective for determining all kinds of news. The method applied
in this work is the highest posterior probability of tokens in the responses of the two classes.
It uses frequency-based features to train algorithms including Support Vector Machine, Passive
Aggressive Classifier, Multinomial Naive Bayes, Logistic Regression and Stochastic
Gradient Classifier. This work also highlights a wide range of features established recently
in this area, which gives a clearer picture for the automation of this problem. The author
conducted an experiment to match lists of fake-related words in the text of responses, to
find out whether response-based detection is a good measure of credibility.
The results were found to be very promising and leave scope for more research in the
area [9].

In the modern era, where the internet is ubiquitous, everyone relies on various online
resources for news. Along with the increase in the use of social media platforms like
Facebook and Twitter, news spreads rapidly among millions of users within a very short span
of time. The spread of fake news has far-reaching consequences, from the creation of biased
opinions to swaying election outcomes for the benefit of certain candidates. Moreover,
spammers use appealing news headlines to generate advertising revenue via clickbait.
The cited project aims to perform a binary classification of various news articles
available online with the help of concepts pertaining to Artificial Intelligence, Natural
Language Processing and Machine Learning [12].

Another project is a web application that gives guidance on the day-to-day
flow of fake news and spam messages in daily news channels, Facebook, Twitter, Instagram
and other social media. The authors show some data analysis from their dataset, which was
retrieved from many online social media sources, and display the main sources with which
fake news and true news are engaged. The project combines multiple models trained by the
authors with a pretrained model from Felipe Adachi. The accuracy is
around 95% for the self-made models and 97% for the pretrained model. The model can
detect news and messages related to COVID-19, political news, geology, etc. [9].

Fake news was categorized into clickbait, influential, and satire. To stop fake news,
the methods adopted were spam detection, stance detection, and benchmark datasets.
The author further examined sentiment analysis, which comes under NLP techniques. A fake
news story that was discussed was the China airport security robot electroshock story that
took place in 2016 and led to over 12 thousand fake news items in 2016, in China, sourced
from 244 different websites [20].

Learning to discern, taking time to think and reason about what is true and what is
not, or what intentions a piece of information may have, as a step before sharing it,
and also making this procedure a habit, that is, fostering critical literacy, is not a first-level
curricular requirement, as Tickle (2018) reflects, although he is aware of its importance [24].
A report of the Pew Research Center, U.S.A., suggests that adults got around 70 percent of
their news from social media. With the news of Donald Trump as president, this information
led to an increase of 960,000 (9 lakh 60 thousand) Facebook users.

In this paper, linguistic features and visual features play their role. The
network features deal with diffusion networks as well as co-occurrence networks.
The authors achieved an accuracy of about 83 percent [21].

3. SYSTEM ANALYSIS

3.1 PROPOSED SYSTEM

The main objective of the study is to measure the accuracy of fake news
detection using five different algorithms, i.e., Naive Bayes, Logistic Regression, Decision
Tree, Support Vector Machine (SVM), and Random Forest, and to compare which of the
algorithms achieves the highest accuracy. The dataset is
retrieved from the Kaggle website.

Fake news can reduce the impact of real news by competing with it, so detecting
fake news is very important. Machine learning techniques make it easier to detect such
news. A confusion matrix is used for visualization.

4. FEASIBILITY STUDY

FEASIBILITY OF STUDY

The dataset, named the news dataset, contains news items from different
places. It consists of attributes such as Title, Text, Subject, and Date. Different
algorithms are used to predict the accuracy level. A confusion matrix is used for
visualization. A confusion matrix is a table that summarizes the performance of a
classification model on a given set of test data by counting correct and incorrect
predictions for each class.
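
To make the confusion matrix concrete, the sketch below counts the four outcomes of a binary classifier and derives accuracy from them. The labels (1 for fake, 0 for real) and the sample predictions are hypothetical and are not results from this project.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}

def accuracy(cm):
    """Accuracy = correctly classified samples / all samples."""
    total = cm["tp"] + cm["fp"] + cm["tn"] + cm["fn"]
    return (cm["tp"] + cm["tn"]) / total

# Illustrative labels only: 1 = fake, 0 = real.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)
acc = accuracy(cm)
print(cm, round(acc, 3))
```

Accuracy alone can hide poor performance on a rare class, which is why the full matrix is worth inspecting alongside it.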

5. SYSTEM SPECIFICATION

5.1 HARDWARE REQUIREMENTS

• Processor - 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz 2.42 GHz.
• RAM - 12.0 GB (11.7 GB usable).
• Mouse - HP S500 Wireless optical mouse.
• Keyboard - Full-sized island-style keyboard with number keypad.
• Display card - Intel(R) HD Graphics 4000.

5.2 SOFTWARE REQUIREMENTS

• Operating system : Windows 11.
• Front end tool : Google Colab.

5.2.1 ABOUT THE SOFTWARE:

COLAB:

Colab is basically a free Jupyter notebook environment running wholly in the cloud.
Most importantly, Colab does not require a setup, plus the notebooks that you will create can
be simultaneously edited by your team members – in a similar manner you edit documents
in Google Docs. The greatest advantage is that Colab supports most popular machine
learning libraries which can be easily loaded in your notebook.

HISTORY:
Google released Colaboratory, a web IDE for Python, to enable machine
learning with storage on the cloud. This internal tool had a fairly quiet public release in
late 2017 and is set to make a huge difference in the world of machine learning, artificial
intelligence and data science work.
Google first started working with the Jupyter development team in 2014 to release
an early version of the tool, and since then the tool has been constantly evolving.
Colaboratory is almost identical in structure to Jupyter. It is a Jupyter notebook
environment that requires no setup to use and runs entirely in the cloud.

FEATURES:
The following are the major features of COLAB. We will use these features throughout the
journey. Here is just a brief introduction to all the features of COLAB.
• Interactive tutorials to learn machine learning and neural networks.
• Write and execute Python 3 code without having a local setup.
• Execute terminal commands from the Notebook.
• Import datasets from external sources such as Kaggle.
• Save your Notebooks to Google Drive.
• Import Notebooks from Google Drive.
• Free cloud service, GPUs and TPUs.
• Integrate with PyTorch, TensorFlow, and OpenCV.
• Import or publish directly from/to GitHub.

BENEFITS:

• Free virtual machines to use: with about 12GB RAM and 50GB hard drive space,
with common dependencies such as numpy, pandas, and even TensorFlow pre-installed.
• Free GPU access.
• Supports Python 3 (Python 2 support has since been deprecated).
• There is integration with Google Drive, so we can share and control permissions and
see other collaborators' work instantly.
• There is a revision history, an extremely useful feature for teams.
• Can add comments on cells: someone, for example, could be given ‘comment-only’
permissions to review code. Comments can also be resolved, replied to, and targeted at others.

DISADVANTAGES:

• All Colaboratory notebooks must be stored in Google Drive, so you need to log into a
Google account before accessing the tool.
• Long-running background computations may be stopped.
• Need to install any specific libraries which are not pre-installed (and need
to repeat this with every session).
• It can be difficult (and potentially costly) to work with bigger datasets as you have to
download and store them in Google Drive (only 15GB is free in Google Drive).

6. SYSTEM DESIGN

6.1 PROJECT DESCRIPTION

The project analyzes and predicts the accuracy of fake news detection using different
algorithms. The data is downloaded from Kaggle and then imported into
Colab. The dataset is preprocessed by removing the null values and inspecting the
information about the dataset. Naive Bayes, Logistic
Regression, Decision Tree, Random Forest, and Support Vector Machine are used to obtain
the accuracy level, and a confusion matrix is used to visualize each result. A comparison is
then made between the algorithms, and the algorithm with the highest accuracy level is
reported as the final output.

6.2 DATA FLOW DIAGRAM

[Figure: data flow diagram. Start → Dataset → Imported dataset (Colab) → Training dataset
and Testing dataset → Naive Bayes, Logistic Regression, Decision Tree, Support Vector
Machine, Random Forest → Accuracy level → Result report.]
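
The division of the imported dataset into a training dataset and a testing dataset can be sketched as below. The 80/20 ratio, the fixed seed, and the function name `train_test_split` are assumptions made for illustration; the report does not state the ratio that was used.

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and split it into (train, test) lists."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

samples = list(range(10))  # stand-ins for news articles
train, test = train_test_split(samples)
print(len(train), "training samples,", len(test), "testing samples")
```

Each of the five algorithms is then fitted on the training part only and scored on the testing part, so that the reported accuracy reflects unseen articles.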

6.3 DATABASE DESIGN

Fig 6.3.1: Database design

Fig 6.3.1 represents the dataset. The dataset is collected from the Kaggle website. Data
cleaning is then done by removing punctuation and empty cells. The dataset table contains the
attributes of each news item: Title, Text, Subject, and Date. It contains mostly political
news that spread during the US presidential election.

ATTRIBUTE    DATA TYPE

TITLE        character
TEXT         character
SUBJECT      character
DATE         integer
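
The cleaning step described above (removing punctuation and empty cells) might look like the following sketch. It is written in plain Python for illustration; the function names clean_text and drop_empty_rows and the sample records are hypothetical, and the project itself may have done this with pandas.

```python
import string


def clean_text(text):
    """Lower-case the text, strip punctuation and collapse extra spaces."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.lower().split())


def drop_empty_rows(rows, required=("title", "text")):
    """Keep only rows whose required fields are present and non-blank."""
    return [r for r in rows if all((r.get(k) or "").strip() for k in required)]


# Hypothetical records using the attributes listed in the table above
records = [
    {"title": "Breaking: Vote!", "text": "Results are in...", "subject": "politics"},
    {"title": "", "text": "Missing title", "subject": "news"},
    {"title": "No body", "text": None, "subject": "news"},
]
cleaned = [
    {**r, "title": clean_text(r["title"]), "text": clean_text(r["text"])}
    for r in drop_empty_rows(records)
]
print(cleaned)
```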

7. SYSTEM TESTING AND IMPLEMENTATION

7.1 TESTING FUNDAMENTALS

• NAIVE BAYES ALGORITHM

The Naive Bayes algorithm is a supervised learning algorithm that is based on Bayes'
theorem and used for solving classification problems. It is mainly used in text
classification, which involves a high-dimensional training dataset.

It is a probabilistic classifier, which means it predicts on the basis of the probability of an
object belonging to each class.
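
To make the probabilistic idea concrete, here is a small from-scratch sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. It is not the code used in the project, which relies on sklearn; the function names train_nb and predict_nb and the toy training sentences are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (text, label). Returns class counts, word counts, vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict_nb(text, class_counts, word_counts, vocab):
    """Return the label with the highest log posterior probability."""
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)           # log prior
        for word in text.lower().split():
            if word in vocab:                           # ignore unseen words
                count = word_counts[label][word] + 1    # add-one smoothing
                score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training set
docs = [
    ("shocking secret exposed", "fake"),
    ("you will not believe this shocking claim", "fake"),
    ("senate passes budget bill", "real"),
    ("minister said the bill will pass", "real"),
]
model = train_nb(docs)
print(predict_nb("shocking claim exposed", *model))
print(predict_nb("senate said budget bill", *model))
```

Working with log probabilities avoids multiplying many small numbers together, which matters once the vocabulary becomes large.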

• LOGISTIC REGRESSION

Logistic regression is one of the most popular Machine Learning algorithms, which comes
under the Supervised Learning technique. It is used for predicting the categorical dependent
variable using a given set of independent variables.

Logistic Regression is a significant machine learning algorithm because it has the ability to
provide probabilities and classify new data using continuous and discrete datasets.
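
As a minimal sketch of how logistic regression provides a probability and then a class, the example below applies the sigmoid function to a weighted sum of features. The weights, the bias, the two word-count features and the 0.5 threshold are hypothetical values chosen for illustration; in practice they are learned from the training dataset.

```python
import math


def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one feature vector."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical weights for two word-count features
weights, bias = [1.5, -2.0], 0.0
p = predict_proba([2, 0], weights, bias)   # weighted sum z = 3.0
label = "fake" if p >= 0.5 else "real"
print(round(p, 3), label)
```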

• DECISION TREE

The Decision Tree algorithm belongs to the family of supervised learning algorithms. It can
be used for solving both regression and classification problems.
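
A decision tree classifier works by repeatedly splitting the data into smaller and purer subsets. The sketch below shows one common way to score a candidate split, the Gini impurity. The function names, the word-count features and the sample rows are hypothetical, and the report does not state which splitting criterion was used.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure group, 0.5 for an even two-class mix."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(rows, feature_index, threshold):
    """Weighted Gini impurity after splitting rows on feature <= threshold."""
    left = [label for feats, label in rows if feats[feature_index] <= threshold]
    right = [label for feats, label in rows if feats[feature_index] > threshold]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical rows: ([count of "shocking", count of "said"], label)
rows = [([3, 0], "fake"), ([2, 1], "fake"), ([0, 4], "real"), ([0, 2], "real")]
print(gini([label for _, label in rows]))   # impurity before splitting
print(split_impurity(rows, 0, 1))           # split on count of "shocking" <= 1
```

A tree-building procedure would try many such splits and keep the one with the lowest weighted impurity at each node.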

• RANDOM FOREST ALGORITHM

Random Forest is a classifier that contains a number of decision trees built on various
subsets of the given dataset and combines their predictions, by averaging or majority vote,
to improve the predictive accuracy.

It is based on the concept of ensemble learning, which is the process of combining multiple
classifiers to solve a complex problem and to improve the performance of the model.
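
The ensemble idea can be illustrated with a majority vote over several classifiers. In this sketch the three "trees" are hand-written rules standing in for trained decision trees, so the function names and rules are purely hypothetical; a real random forest trains each tree on a random subset of the data.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by most trees."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical "trees": simple rules standing in for trained decision trees
def tree_a(words):
    return "fake" if "shocking" in words else "real"


def tree_b(words):
    return "fake" if "secret" in words else "real"


def tree_c(words):
    return "real" if "said" in words else "fake"


forest = [tree_a, tree_b, tree_c]
words = "officials said the shocking report is accurate".split()
votes = [tree(words) for tree in forest]
print(votes, majority_vote(votes))
```

Here one tree is misled by the word "shocking", but the other two outvote it, which is the reason an ensemble can be more accurate than a single tree.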

• SUPPORT VECTOR MACHINE ALGORITHM

Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems.

7.2 IMPLEMENTATION

1. The Naive Bayes algorithm is used to analyze and predict the fake news accuracy level.
2. Logistic regression is used to analyze and predict the fake news accuracy level.
3. The decision tree algorithm is used to analyze and predict the fake news accuracy level.
4. The random forest algorithm is used to analyze and predict the fake news accuracy level.
5. The support vector machine algorithm is used to analyze and predict the fake news
accuracy level.

8. SCREEN LAYOUT AND SAMPLE CODING

I. TO IMPORT DATASET

Fig 8.1: Importing dataset

Fig 8.1 shows the code for importing the dataset from Google Drive. The dataset has
been downloaded from Kaggle and preprocessing is done. The file path is then used to
import the dataset from Drive. Two files, Fake news.csv and training.csv, are imported.

Fig 8.2: Importing Libraries

In Fig 8.2, packages such as pandas, NumPy, matplotlib and seaborn are imported. The
machine learning library sklearn is then used to bring in the algorithms, together with
train_test_split and accuracy_score.

II. VISUALIZING THE MOST FREQUENT WORDS

Fig 8.3: Fake news

Fig 8.3 shows a bar chart of the most frequently used words in the fake news dataset.
The X axis represents the words and the Y axis represents the count. TRUMP is the most
frequently used word in this dataset.

Fig 8.4: True news

Fig 8.4 shows a bar chart of the most frequently used words in the true news dataset.
The X axis represents the words and the Y axis represents the count. SAID is the most
frequently used word in this dataset.
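
The word counts behind bar charts like Fig 8.3 and Fig 8.4 can be produced with a simple counter. The sketch below uses only the Python standard library; the function name most_frequent_words and the sample headlines are hypothetical and stand in for the news dataset.

```python
from collections import Counter


def most_frequent_words(texts, top_n=3):
    """Count words across all texts and return the top_n (word, count) pairs."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts.most_common(top_n)


# Hypothetical headlines standing in for the news dataset
texts = [
    "Trump rally draws crowd",
    "Trump responds to report",
    "Report on Trump campaign",
]
print(most_frequent_words(texts, top_n=2))
```

The resulting (word, count) pairs map directly onto the X axis and Y axis of the bar chart.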

A. TO ANALYZE THE FAKE NEWS DETECTION USING NAIVE BAYES ALGORITHM

Fig 8.5: Naive Bayes Algorithm

Fig 8.5 shows the code for predicting the accuracy level using the Naive Bayes
algorithm. Naive Bayes is used for classification tasks, so it can be used to check whether
a news item is authentic or fake. First a dictionary is created (dct = dict()), then the NB
classifier is imported. The model is fitted (model = pipe.fit(X_train, Y_train)) and the
testing dataset is then used to find the accuracy level. Using this algorithm, the accuracy
level of detecting fake news is 81.43%.

Fig 8.6: Confusion matrix without normalization

Fig 8.6 visualizes the result of the Naive Bayes algorithm as a confusion matrix
without normalization. The X axis represents the predicted label and the Y axis represents
the true label. Fake positive is 783 and fake negative is 0. Real positive is 94 and real
negative is 200.
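
The accuracy level can be recomputed from the four counts in the confusion matrix. The sketch below reads "positive" as correctly classified and "negative" as misclassified, which is an assumption about the wording used here; under that reading the Naive Bayes counts reproduce the 81.43% reported for this algorithm. The function name accuracy_from_counts is hypothetical.

```python
def accuracy_from_counts(fake_correct, fake_wrong, real_correct, real_wrong):
    """Accuracy = correctly classified articles / all articles."""
    correct = fake_correct + real_correct
    total = correct + fake_wrong + real_wrong
    return correct / total


# Counts reported for the Naive Bayes confusion matrix (Fig 8.6)
acc = accuracy_from_counts(fake_correct=783, fake_wrong=0,
                           real_correct=94, real_wrong=200)
print(f"{acc:.2%}")   # 877 correct out of 1077 articles
```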

B. TO EXAMINE THE FAKE NEWS DETECTION USING LOGISTIC
REGRESSION

Fig 8.7: Logistic Regression Algorithm

Fig 8.7 shows the code for predicting the accuracy level using the Logistic
Regression algorithm. This classifier is used when the value to be predicted is categorical;
for example, it can give the result as true or false. The testing dataset is used here.
Using this algorithm, the accuracy of detecting fake news is 97.21%.

Fig 8.8: Confusion matrix without normalization

Fig 8.8 visualizes the result of the Logistic Regression algorithm as a confusion
matrix without normalization. The X axis represents the predicted label and the Y axis
represents the true label. Fake positive is 782 and fake negative is 1. Real positive is 285
and real negative is 29.

C. TO FIGURE OUT THE FAKE NEWS DETECTION USING DECISION TREE

Fig 8.9: Decision Tree Algorithm

Fig 8.9 shows the code for predicting the accuracy level using the Decision Tree
algorithm. This supervised machine learning algorithm can help to detect fake news. It
breaks the dataset down into smaller subsets. To use an ML algorithm, the text data must
first be converted into numerical data, so CountVectorizer from sklearn is used. The testing
dataset is used here. Using this algorithm, the accuracy of detecting fake news is 99.26%.
The next step is to build the confusion matrix to examine the accuracy. The main diagonal
(top left to bottom right) shows the correct predictions and the other diagonal shows the
incorrect predictions.
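
The conversion of text into numerical data can be illustrated with a small bag-of-words sketch. This is a simplified stand-in for what a count vectorizer does, written in plain Python; the function names build_vocabulary and vectorize and the two sample sentences are hypothetical, and sklearn's own tokenization rules differ in detail.

```python
def build_vocabulary(texts):
    """Map each distinct word to a column index (alphabetical order)."""
    words = sorted({w for text in texts for w in text.lower().split()})
    return {word: i for i, word in enumerate(words)}


def vectorize(text, vocabulary):
    """Turn one text into a list of word counts, one entry per vocabulary word."""
    vector = [0] * len(vocabulary)
    for word in text.lower().split():
        if word in vocabulary:             # words unseen in training are skipped
            vector[vocabulary[word]] += 1
    return vector


# Hypothetical two-document corpus
texts = ["fake news spreads fast", "real news spreads"]
vocab = build_vocabulary(texts)
print(vocab)
print([vectorize(t, vocab) for t in texts])
```

Each news item becomes a row of counts, which is the numerical form the classifiers can be trained on.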

Fig 8.10: Confusion matrix without normalization

Fig 8.10 visualizes the result of the Decision Tree algorithm as a confusion matrix
without normalization. The X axis represents the predicted label and the Y axis represents
the true label. Fake positive is 777 and fake negative is 6. Real positive is 292 and real
negative is 2.

D. EVALUATE THE FAKE NEWS DETECTION USING RANDOM FOREST ALGORITHM

Fig 8.11: Random Forest Algorithm

Fig 8.11 shows the code for predicting the accuracy level using the Random Forest
algorithm. In this classifier, a number of decision trees each give a prediction, and the
value with the most votes is the result of the classifier. The Random Forest classifier is
imported, then the model is fitted (model = pipe.fit(X_train, Y_train)). The testing dataset
is then used to find the accuracy level. Using this algorithm, the accuracy level of
detecting fake news is 98.61%.

Fig 8.12: Confusion matrix without normalization

Fig 8.12 visualizes the result of the Random Forest algorithm as a confusion matrix
without normalization. The X axis represents the predicted label and the Y axis represents
the true label. Fake positive is 782 and fake negative is 1. Real positive is 280 and real
negative is 14.

E. TO ANALYZE THE FAKE NEWS DETECTION USING SUPPORT VECTOR
MACHINE ALGORITHM (SVM)

Fig 8.13: SVM Algorithm

Fig 8.13 shows the code for predicting the accuracy level using the Support Vector
Machine algorithm. This algorithm is used for classification as well as regression
problems. It is a supervised machine learning algorithm that learns from the labeled
dataset. First the Support Vector Machine classifier is imported. Then the model is fitted
(model = pipe.fit(X_train, Y_train)). The testing dataset is then used to find the accuracy
level. Using this algorithm, the accuracy of detecting fake news is 99.07%.

Fig 8.14: Confusion matrix without normalization

Fig 8.14 visualizes the result of the SVM algorithm as a confusion matrix without
normalization. The X axis represents the predicted label and the Y axis represents the true
label. Fake positive is 780 and fake negative is 3. Real positive is 287 and real negative
is 7.

9. RESULT AND DISCUSSION

In this project, fake news detection is analyzed using different algorithms. The
accuracy rate of detecting fake news is calculated using the Naive Bayes, logistic
regression, decision tree, random forest and support vector machine algorithms. A
comparison is then made to find out which algorithm provides the highest accuracy level.

Fig 9.1: Comparing different algorithms

Fig 9.1 shows the code for comparing the accuracy levels of the different algorithms.

Fig 9.2: Visualization of the accuracy levels of the different algorithms

Fig 9.2 shows the accuracy levels of the different algorithms, visualized as a count
plot using matplotlib.

ALGORITHM                                  ACCURACY LEVEL

NAIVE BAYES ALGORITHM                      81.43%
LOGISTIC REGRESSION                        97.21%
DECISION TREE ALGORITHM                    99.26%
RANDOM FOREST ALGORITHM                    98.61%
SUPPORT VECTOR MACHINE ALGORITHM (SVM)     99.07%

From this table we can conclude that the decision tree algorithm provides the highest
accuracy level compared to the other machine learning algorithms run in Colab.
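
The comparison can be expressed in a few lines of Python. This is only a sketch of the ranking step using the accuracy levels reported in this section, with the Random Forest value taken as 98.61% as stated in section 8. The variable names are hypothetical and the code shown in Fig 9.1 may differ.

```python
# Accuracy levels reported in the table above (percent)
accuracies = {
    "Naive Bayes": 81.43,
    "Logistic Regression": 97.21,
    "Decision Tree": 99.26,
    "Random Forest": 98.61,
    "Support Vector Machine": 99.07,
}

# Rank the algorithms from highest to lowest accuracy
ranking = sorted(accuracies.items(), key=lambda item: item[1], reverse=True)
for name, acc in ranking:
    print(f"{name:<24}{acc:6.2f}%")

best_name, best_acc = ranking[0]
print("Best:", best_name)
```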

10. CONCLUSION AND FUTURE WORK

The majority of tasks are now done online. Newspapers that were earlier preferred as
hard copies are now being replaced by applications like Facebook and Twitter and by news
articles read online. The growing problem of fake news makes things more complicated, and
it tries to change or hamper the opinion and attitude of people towards the use of digital
technology. Thus, in order to curb the phenomenon, machine learning techniques have to be
used.

FUTURE WORK

In future work, other machine learning algorithms can be used to detect fake news. Fake
news detection can also be optimized and improved using deep learning approaches for
societal benefit. The purpose of uncovering fake news must be the benefit of society as
well as the support of the government.

REFERENCES

[1] Agarwal, Arush, and Akhil Dixit. "Fake News Detection: An Ensemble Learning Approach."
2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS). IEEE,
2020.

[2] Ahmed, Hadeer, Issa Traore, Sherif Saad. "Detection of online fake news using n-gram analysis
and machine learning techniques." International conference on intelligent, secure, dependable
systems in distributed & cloud environments. Springer, Cham, 2017.

[3] Albright J. Welcome to the era of fake news. Media Commun. 2017;5(2):87. doi:
10.17645/mac.v5i2.977.

[4] Al-Rawi, A., Groshek, J., Zhang, L.: What the fake? Assessing the extent of networked political
spamming and bots in the propagation of #fakenews on Twitter. Online Inf. Rev. 43(1), 53–71
(2019).

[5] Bondielli, A., Marcelloni, F.: A survey on fake news and rumour detection techniques. Inf. Sci.
497, 38–55 (2019).

[6] chlin, N.: Fake news: belief in post-truth. Libr. Hi Tech 35(3), 386–392 (2017).

[7] ’Fake news detection using machine learning’’ Aayush Ranjan JULY, 2018.

[8] ’Fake News and Message Detection’’ Lokesh Parab 2020-2021 9. ’Fake news detection using
NLP’’ NSS Rama Chandra ’2017-2022.

[10] ’’Fake News Detection’’ Ritika Nair ‘Northeastern University’

[11] Jain, Anjali, et al. "A smart System for Fake News Detection Using Machine Learning." 2019
International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT).
Vol. 1. IEEE, 2019.

[12] Jain, Anjali, et al. "A smart System for Fake News Detection Using Machine Learning." 2019
International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT).
Vol. 1. IEEE, 2019.

[13] Jang, S.M., et al.: A computational approach for examining the roots and spreading patterns of
fake news: evolution tree analysis. Comput. Hum. Behav. 84, 103–113 (2018).

35
[14] Jang, S.M., Kim, and J.K.: Third person effects of fake news: fake news regulation and media
literacy interventions. Comput. Hum. Behav. 80, 295–302 (2018).

[15] Kanoh, H.: Why do people believe in fake news over the Internet? An understanding from the
perspective of existence of the habit of eating and drinking. Proc. Comput. Sci. 126, 1704–1709
(2018).

[16] M. Granik and V. Mesyura, "Fake news detection using naive Bayes classifier," 2017 IEEE
First Ukraine Conference on Electrical and Computer Engineering (UKRCON), Kiev, 2017, pp.
900-903.

[17] N. J. Conroy, V. L. Rubin, and Y. Chen, “Automatic deception detection: Methods for finding
fake news,” Proceedings of the Association for Information Science and Technology.

[18] O'Brien, Nicole. Machine learning for detection of fake news. Diss. Massachusetts Institute of
Technology, 2018.

[19] O'Brien, Nicole. Machine learning for detection of fake news. Diss. Massachusetts Institute of
Technology, 2019.

[20] Sparks, H., Frishberg, and H.: Facebook gives step-by-step instructions on how to spot fake news
(2020).

[21] Shu, Kai, et al. "Fake news detection on social media: A data mining perspective." ACM
SIGKDD explorations newsletter 19.1 (2017): 22-36.

[22] Tickle, L. (2018, June 12). Fake news: Teaching children the difference between Trump and
truth. The Guardian. https://www.theguardian.com/education/2018/jun/12/fake-news-schools-
trump-truth.

[23] Yang, Shuo, et al. "Unsupervised fake news detection on social media: A generative approach."
Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019. on Social Media: A
Generative Approach.

[24] Yumeng Qin et al. “Predicting Future Rumours.” Chinese Journal of Electronics ( Volume: 27 ,
Issue: 3 , 5 2018, 514 – 520.

[25] Zhang, Jiawei, Bowen Dong, and S. Yu Philip. "Fakedetector: Effective fake news detection
with deep diffusive neural network." 2020 IEEE 36th International Conference on Data Engineering
(ICDE). IEEE, 2020.

