Fake News Detection system Project Report-merged

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 60

FAKE NEWS DETECTION SYSTEM USING LSTM

AND TENSORFLOW

A PROJECT REPORT
Submitted by

MUTHU R (20BAI4031)

TAMIL SELVAN M (20BAI4050)

in partial fulfilment for the award of the degree

of

BACHELOR OF TECHNOLOGY

in

ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

M.KUMARASAMY COLLEGE OF ENGINEERING


(AUTONOMOUS) – KARUR 639 113

ANNA UNIVERSITY: CHENNAI 600 025

May 2024

i
M.KUMARASAMY COLLEGE OF ENGINEERING
(Autonomous)
BONAFIDE CERTIFICATE
Certified that this project report “FAKE NEWS DETECTION SYSTEM
USING LSTM AND TENSORFLOW” is the bonafide work of “MUTHU
R (20BAI4031), TAMIL SELVAN M (20BAI4050)” who carried out the
project work under my supervision during the academic year 2023 – 2024.
Certified further, that to the best of my knowledge the work reported herein
does not form part of any other project or dissertation on the basis of which
a degree or award was conferred on an earlier occasion on this or any
candidate.

SIGNATURE SIGNATURE

Dr. R. Rajaguru, M.E., Ph.D. Mrs. P. Vidhya, M. E.,

HEAD OF THE DEPARTMENT, SUPERVISOR,

Department of Artificial Intelligence, Department of Artificial Intelligence,

M. Kumarasamy College of Engineering, M. Kumarasamy College of Engineering,

Thalavapalayam, Karur-639113. Thalavapalayam, Karur-639113.

This project report has been submitted for End Semester Project Viva Voice
Examination held on .

INTERNAL EXAMINER EXTERNAL EXAMI


ii
DECLARATION

We jointly declare that the project report on “FAKE NEWS DETECTION


SYSTEM USING LSTM AND TENSORFLOW” is the result of original work done by
us to the best of our knowledge, similar work has not been submitted to “ANNA
UNIVERSITY CHENNAI” for the requirement of Degree of Artificial Intelligence
and Data Science. This project report is submitted on the partial fulfillment of the
requirement of the award of Degree of Artificial Intelligence and Data Science.

Signature

MUTHU R

TAMIL SELVAN M

Place: Karur

Date:

iii
ACKNOWLEDGEMENT

First, we would like to thank GOD the almighty for giving us the talent and
opportunity to complete our project. We wish to express our earned great fullness to
our Honorable Founder, Thiru. M. Kumarasamy for his encouragement extended us
to undertake this project work. We wish to thank and express our gratitude to our
Chairman, Dr.K.Ramakrishnan, Executive Director Dr.S.Kuppusamy, MBA.,
Ph.D., for their support in this project.

We would like to thank and express our gratitude to our Principal, Dr.
B.S.Murugan, M.E., Ph.D., for providing all necessary for the completion of the
project. We wish to thank and express our gratitude to our Head of the Department,
Dr.R.Raja Guru, M.Tech., Ph.D., for his support to our project work.

We are immensely grateful to our guide, supervisor and overseer


Mrs. P.Vidhya M.E, for encouraging us a lot throughout the course of the project.
We render our sincere thanks for her support in completing this project successfully.
We wish to express our profound gratitude to Ms.A.Jeyasri, Adroit Technologies
Private Limited for her guidance and assistance during the tenure of project work.

We thank all the faculty members of Department of Artificial Intelligence and Data Science
at M. Kumarasamy College of Engineering for providing exceptional education and support
throughout the duration of this engineering program.

Finally, we thank our Parents, Supporting Staffs and Friends for their help extended
during these times.

iv
ABSTRACT

Fake news has grown to be a major social problem in an information-driven age, endangering
democracy, public confidence, and personal judgment. By creating an advanced Fake News Detection
System with TensorFlow, a well-liked deep learning framework, and Long Short-Term Memory (LSTM)
networks—a kind of recurrent neural network (RNN)—this research seeks to address this problem.
Because fake news is dynamic and context-dependent, the suggested method takes advantage of LSTM’s
ability to identify temporal dependencies and patterns within textual data. Using the powerful and adaptable
open-source machine learning framework TensorFlow offers a scalable and effective platform for
developing and implementing deep learning models. The main goals of the Fake News Detection System
are to model temporal connections in text, extract significant features from news items, and create a robust
classification method to distinguish between real and fake news. The project follows a multi-phase
procedure that begins with feature extraction and data preparation, continues with model construction and
training, and ends with validation and assessment. Data preprocessing is the first step, in which textual data
is cleaned, tokenized, and formatted so that it can be fed into the LSTM model. Furthermore, a range of
linguistic characteristics and metadata are extracted to augment the model’s comprehension of the context
and elevate its overall efficacy. The creation and training of the LSTM-based false news detection model
forms the project’s central component. The temporal dynamics of news stories are analyzed using long
short-term dependencies and subtleties in sequential data, which are captured by LSTM. The LSTM
architecture is made easier to implement by TensorFlow, which guarantees effective computation and
optimization throughout the training phase. To enable the model to learn discriminative patterns and make
well-informed predictions, it is trained on a varied dataset that includes both real and fake news articles.
The efficacy of the Fake News Detection System is evaluated thoroughly through the use of common
measures including accuracy, precision, recall, and F1 score. The robustness of the system is put to the test
against a variety of false news formats, such as created content, biased reporting, and misleading
information. In addition, testing on unknown datasets and cross-validation are used to evaluate the model’s
generalization ability. In order to shed light on the LSTM-based model’s decision-making process, the
study also investigates the interpretability of the model. Gaining confidence in the system and
comprehending the characteristics impacting its forecasts depend heavily on this transparency.

Keywords : Long Short Term Memory, Deep Learning, Fake News Detector, Neural Netwoks,
Machine Learning.

v
TABLE OF CONTENTS

CHAPTER No. TITLE PAGE No.

ACKNOWLEDGEMENT iv

ABSTRACT v

LIST OF FIGURES ix

LIST OF TABLES viii

LIST OF ABBREVATIONS X

1 INTRODUCTION 10

1.1 Deep Learning 10

1.2 Applications Of Deep Learning 10

1.3 Deep Learning Works 11


13
1.4 Predictive Analysis
15
1.5 Machine Learning

1.6 Artificial Intelligence 16

1.7 Reinforcement Learning 17

2 LITERATURE REVIEW 20

3 EXISTING SYSTEM 24

3.1 Workflow of Existing System 24

3.2 General Issues 24

3.3 Problem Statement 25

3.3.1 Motivation 26

3.3.2 Research 26

3.4 Proposed Solution/System 26

3.4.1 Data Collection 27

vi
3.4.2 Preprocessing 27

3.4.3 Feature Extraction 28

3.4.4 News Prediction 28

3.4.5 Recommendation Solution 29

3.5 Design of Proposed System 30

3.5.1 Objectives of Proposed System 30

3.5.2 Architecture of Proposed System 31

3.6 Overall Methodology 31

3.7 Challenges, Motivation and Social Benefits 32

3.8 Methods and Modern Tools Used 32

4 SYSTEM ANALYSIS 35
35
4.1 Module Analysis
36
4.2 Performance Analysis

5 RESULTS AND DISCUSSION 38


6 CONCLUSION AND FUTURE ENHANCEMENT 39
7 REFERENCES 40
8 APPENDIX I – SOURCE CODE 42
9 APPENDIX II – PROJECT OUTCOME 53
10 APPENDIX III – VISION, MISSION, PO, PSO, PEO 57

vii
LIST OF TABLES

TABLE NO TABLE TITLE PAGE NO

2.1 MACHINE LEARNING CLASSIFIERS 21

2.2 SOCIAL NETWORK ANALYSIS 22

viii
LIST OF FIGURES

FIGURE No. TITLE PAGE No.

3.5.2 SYSTEM ARCHITECTURE 31


3.8.1 WEBSITE INTERFACE 35
3.8.2 CHROME EXTENSION INTERFACE 35
4.1 EXECUTION 37
LIST OF ABBREVIATIONS

ANN Artificial Neural Networks

WWW World Wide Web


RNN Recurrent Neural Networks

LSTM Long Short Term Memory


KNN k-nearest neighbors

CNN Convolutional Neural Network


CHAPTER 1
1. INTRODUCTION

1.1 DEEP LEARNING

Deep learning is a machine learning technique that teaches computers to do what comes
naturally to humans: learn by example. Deep learning is a key technology behind driverless
cars, enabling them to recognize a stop sign, or to distinguish a pedestrian from a lamppost. It
is the key to voice control in consumer devices like phones, tablets, TVs, and hands-free
speakers. Deep learning is getting lots of attention lately and for good reason. It’s achieving
results that were not possible before. In deep learning, a computer model learns to perform
classification tasks directly from images, text, or sound. Deep learning models can achieve
state-of-the-art accuracy, sometimes exceeding human-level performance. Models are trained
by using a large set of labeled data and neural network architectures that contain many layers.
Deep learning achieves recognition accuracy at higher levels than ever before. This helps
consumer electronics meet user expectations, and it is crucial for safety-critical applications
like driverless cars. Recent advances in deep learning have improved to the point where deep
learning outperforms humans in some tasks like classifying objects in images. While deep
learning was first theorized in the 1980s, there are two main reasons it has only recently become
useful: Deep learning requires large amounts of labeled data. For example, driverless car
development requires millions of images and thousands of hours of video. Deep learning
requires substantial computing power. High-performance GPUs have a parallel architecture
that is efficient for deep learning. When combined with clusters or cloud computing, this
enables development teams to reduce training time for a deep learning network from weeks to
hours or less.

1.2 APPLICATIONS OF DEEP LEARNING

Deep learning applications are used in industries from automated driving to medical devices.
Automated Driving: Automotive researchers are using deep learning to automatically
detect objects such as stop signs and traffic lights. In addition, deep learning is used to detect
pedestrians, which helps decrease accidents.

11
Aerospace and Defense: Deep learning is used to identify objects from satellites that
locate areas of interest, and identify safe or unsafe zones for troops.

Medical Research: Cancer researchers are using deep learning to automatically detect
cancer cells. Teams at UCLA built an advanced microscope that yields a high-dimensional data
set used to train a deep learning application to accurately identify cancer cells.

Industrial Automation: Deep learning is helping to improve worker safety around


heavy machinery by automatically detecting when people or objects are within an unsafe
distance ofmachines.

Electronics: Deep learning is being used in automated hearing and speech translation. For
example, home assistance devices that respond to your voice and know your preferences are
powered by deep learning applications.

1.3 DEEP LEARNING WORKS

Most deep learning methods use neural network architectures, which is why deep
learning models are often referred to as deep neural networks. The term “deep” usually refers
to the number of hidden layers in the neural network. Traditional neural networks only
contain 2-3 hidden layers, while deep networks canhave as many as 150. Deep learning models are
trained by using large sets of labeled data and neural network architectures that learn features
directly from the data without the need for manual feature extraction.

One of the most popular types of deep neural networks is known as convolutional neural
networks (CNN or ConvNet). A CNN convolves learned features with input data, and uses 2D
convolutional layers, making this architecture well suited to processing 2D data, such as
images. CNNs eliminate the need for manual feature extraction, so you do not need to identify
featuresused to classify images. The CNN works by extracting features directly from images.
The relevant features are not pretrained; they are learned while the network trains on a
collection of images. This automated feature extraction makes deep learning models highly
accurate for computer vision tasks such as object classification. CNNs learn to detect different
features of an image using tens or hundreds of hidden layers. Every hidden layer increases the
complexity of the learned image features.

12
For example, the first hidden layer could learn how to detect edges, and the last learns
how to detect more complex shapes specifically catered to the shape of the object we are trying
to recognize.
Deep learning is a specialized form of machine learning. A machine learning workflow
starts with relevant features being manually extracted from images. The features are then used
to create a model that categorizes the objects in the image. With a deep learning workflow,
relevant features are automatically extracted from images. In addition, deep learning performs
“end-to-end learning” – where a network is given raw data and a task to perform, such as
classification, and it learns how to do this automatically. Another key difference is deep
learning algorithms scale with data, whereas shallow learning converges. Shallow learning
refers to machine learning methods that plateau at a certain levelof performance when you add
more examples and training data to the network. A key advantage of deep learning networks is
that they often continue to improve as the size of your data increases. The three most common
ways people use deep learning to perform object classification are:

Training from Scratch


To train a deep network from scratch, you gather a very large labeled data set and design
a network architecture that will learn the features and model. This is good for new applications,
or applications that will have a large number of output categories. This is a less common
approach because with the large amount of data and rate of learning, these networks typically
take days or weeks to train.

Transfer Learning
Most deep learning applications use the transfer learning approach, a process that
involves fine-tuning a pretrained model. You start with an existing network, such as AlexNet
or GoogLeNet, and feed in new data containing previously unknown classes. After making
sometweaks to the network, you can now perform a new task, such as categorizing only dogs or
catsinstead of 1000 different objects. This also has the advantage of needing much less data
(processing thousands of images, rather than millions), so computation time drops to minutes
or hours.

13
Feature Extraction
A slightly less common, more specialized approach to deep learning is to use the network
asa feature extractor. Since all the layers are tasked with learning certain features from images,
we can pull these features out of the network at any time during the training process. These
features can then be used as input to a machine learning model such as support vector machines
(SVM).

1.4 PREDICTIVE ANALYSIS

Predictive analytics uses historical data to predict future events. Typically, historical data
is used to build a mathematical model that captures important trends. That predictive model is
then used on current data to predict what will happen next, or to suggest actions to take for
optimal outcomes. Predictive analytics has received a lot of attention in recent years due to
advances in supportingtechnology, particularly in the areas of big data and machine learning.

Predictive analytics is often discussed in the context of big data, Engineering data, for
example,comes from sensors, instruments, and connected systems out in the world. Business
system data at a company might include transaction data, sales results, customer complaints,
and marketing information. Increasingly, businesses make data-driven decisions based on
this valuable trove of information. To extract value from big data, businesses apply algorithms
to large data sets using tools such as Hadoop and Spark. The data sources might consist of
transactional databases, equipment log files, images, video, audio, sensor, or other types of
data. Innovation often comes from combining data from several sources. With all this data,
tools are necessary to extract insights and trends. Machine learning techniques are used to find
patterns in data and to build models that predict future outcomes. A variety of machine learning
algorithms are available, including linear and nonlinear regression, neural networks, support
vector machines, decision trees, and other algorithms.

Predictive analytics helps teams in industries as diverse as finance, healthcare,


pharmaceuticals, automotive, aerospace, and manufacturing.

14
• Automotive – Breaking new ground with autonomous vehicles
o Companies developing driver assistance technology and new autonomous vehicles
use predictive analytics to analyze sensor data from connected vehicles and to build
driver assistance algorithms.
• Aerospace – Monitoring aircraft engine health
o To improve aircraft up-time and reduce maintenance costs, an engine manufacturer
created a real-time analytics application to predict subsystem performance for oil,
fuel, liftoff, mechanical health, and controls.
• Energy Production – Forecasting electricity price and demand
o Sophisticated forecasting apps use models that monitor plant availability, historical
trends, seasonality, and weather.
• Financial Services – Developing credit risk models
o Financial institutions use machine learning techniques and quantitative tools to
predict credit risk.
• Industrial Automation and Machinery – Predicting machine failures
o A plastic and thin film producer saves 50,000 Euros monthly using a health
monitoring and predictive maintenance application that reduces downtime and
minimizes waste.
• Medical Devices – Using pattern-detection algorithms to spot asthma and COPD An
asthma management device records and analyzes patients' breathing sounds andprovides
instant feedback via a smart phone app to help patients manage asthma and COPD.

Predictive analytics is the process of using data analytics to make predictions based on
data.This process uses data along with analysis, statistics, and machine learning techniques to
create a predictive model for forecasting future events. The term “predictive analytics”
describes the application of a statistical or machine learning technique to create a quantitative
prediction about the future. Frequently, supervised machine learning techniques are used to
predict a future value (How long can this machine run before requiring maintenance?) or to
estimate a probability (How likely is this customer to default ona loan?). Predictive analytics
starts with a business goal: to use data to reduce waste, save time,or cut costs. The process
harnesses heterogeneous, often massive, data sets into models that can generate clear,
actionable outcomes to support achieving that goal, such as less material waste, less stocked
inventory, and manufactured product that meets specifications.

15
1.5 MACHINE LEARNING

Machine learning is a data analytics technique that teaches computers to do what comes
naturally to humans and animals: learn from experience. Machine learning algorithms use
computational methods to “learn” information directly from data without relying on a
predetermined equation as a model. The algorithms adaptively improve their performance as
the number of samples available for learning increases. Deep learning is a specialized form of
machine learning. With the rise in big data, machine learning has become a key technique for
solving problems in areas, such as:

• Computational finance, for credit scoring and algorithmic trading

• Image processing and computer vision, for face recognition, motion detection,
and object detection
• Computational biology, for tumor detection, drug discovery, and DNA sequencing

• Energy production, for price and load forecasting

• Automotive, aerospace, and manufacturing, for predictive maintenance

• Natural language processing, for voice recognition applications

Machine learning uses two types of techniques: supervised learning, which trains a model on
known input and output data so that it can predict future outputs, and unsupervised learning,
which finds hidden patterns or intrinsic structures in input data.

Supervised Learning

Supervised machine learning builds a model that makes predictions based on evidencein
the presence of uncertainty. A supervised learning algorithm takes a known set of input dataand
known responses to the data (output) and trains a model to generate reasonable predictionsfor the
response to new data. Use supervised learning if you have known data for the output you are
trying to predict. Supervised learning uses classification and regression techniques to develop
predictive models.

16
Classification techniques predict discrete responses—for example, whether an email is
genuine or spam, or whether a tumor is cancerous or benign. Classification models classify input
data into categories. Typical applications include medical imaging, speech recognition, and
credit scoring. Use classification if your data can be tagged, categorized, or separated intospecific
groups or classes. For example, applications for hand-writing recognition use classification to
recognize letters and numbers. In image processing and computer vision, unsupervised
pattern recognition techniques are used for object detection and image segmentation. Common
algorithms for performing classification include support vector machine (SVM), boosted
and bagged decision trees, k-nearest neighbor, Naïve Bayes, discriminant analysis, logistic
regression, and neural networks.

Regression techniques predict continuous responses—for example, changes in


temperature or fluctuations in power demand. Typical applications include electricity load
forecasting and algorithmic trading. Use regression techniques if you are working with a data
range or if the nature of your responseis a real number, such as temperature or the time until
failure for a piece of equipment. Commonregression algorithms include linear model, nonlinear
model, regularization, stepwise regression, boosted and bagged decision trees, neural networks,
and adaptive neuro-fuzzy learning.

Unsupervised Learning

Unsupervised learning finds hidden patterns or intrinsic structures in data. It is used to draw
inferences from datasets consisting of input data without labeled responses.

Clustering is the most common unsupervised learning technique. It is used for exploratory
data analysis to find hidden patterns or groupings in data. Applications for cluster analysis
include gene sequence analysis, market research, and object recognition.

1.6 ARTIFICIAL INTELLIGENCE

Artificial intelligence, or AI, is a simulation of intelligent human behavior. It’s a


computer or system designed to perceive its environment, understand its behaviors, and take
action. Consider self-driving cars: AI-driven systems like these integrate AI algorithms, such as
machine learning and deep learning, into complex environments that enable automation. Data
Preparation. Taking raw data and making it useful for an accurate, efficient model.
17
In fact, it represents most of your AI effort. Data preparation requires domain expertise,
such as experience in speech and audio signals, navigation and sensor fusion, image and video
processing, and radar and lidar. Engineers in these fields are best suited to determine what the
critical features of the data are, which are unimportant, and what rare events to consider. AI also
involves prodigious amounts of data. Yet labeling data and images is tedious and time-
consuming. Sometimes, you don’t have enough data, especially for safety-critical systems.
Generating accurate synthetic data can improve your data sets. In both cases, automation is
critical to meeting deadlines.

AI models need to be deployed to CPUs, GPUs, and/or FPGAs in your final product,
whether part of an embedded or edge device, enterprise system, or cloud. AI models running
on the embedded or edge device provide the quick results needed in the field, while AI models
running in enterprise systems and the cloud provide results from data collected across many
devices. Frequently, AI models are deployed to a combination of these systems. The
deployment process is accelerated when you generate code from your models and target your
devices. Using code generation optimization techniques and hardware-optimized libraries,
you can tune the code to fit the low power profile required by embedded and edge devices or
the high-performance needs of enterprise systems and the cloud.

1.7 REINFORCEMENT LEARNING

In control systems that benefit from learning based on cumulative reward, reinforcement
learning is an ideal technique. Reinforcement Learning Toolbox™ lets you trainpolicies using
DQN, A2C, DDPG, and other reinforcement learning algorithms. You can use these policies
to implement controllers and decision-making algorithms for complex systems such as robots
and autonomous systems. You can implement the policies using deep neural networks,
polynomials, or lookup tables. Reinforcement learning is a type of machine learningtechnique
where a computer agent learns to perform a task through repeated trial and error interactions
with a dynamic environment. This learning approach enables the agent to make a series of
decisions that maximize a reward metric for the task without human intervention and without
being explicitly programmed to achieve the task. AI programs trained with reinforcement
learning beat human players in board games like Go and chess, as well as videogames. While
reinforcement learning is by no means a new concept, recent progress in deep learning and
computing power made it possible to achieve some remarkable results in the area of artificial
18
intelligence. Reinforcement learning is a branch of machine learning (Figure 1). Unlike
unsupervised and supervised machine learning, reinforcement learning does not rely ona static
dataset, but operates in a dynamic environment and learns from collected experiences. Data
points, or experiences, are collected during training through trial-and-error interactions
between the environment and a software agent. This aspect of reinforcement learning is
important, because it alleviates the need for data collection, preprocessing, and labeling before
training, otherwise necessary in supervised and unsupervised learning. Practically, this means
that, given the right incentive, a reinforcement learning model can start learning a behavior on
its own, without (human) supervision. Deep learning spans all three types of machine learning;
reinforcement learning and deep learning are not mutually exclusive. Complex reinforcement
learning problems often rely on deep neural networks, a field known as deep reinforcement
learning.

Deep neural networks trained with reinforcement learning can encode complex behaviors.
This allows an alternative approach to applications that are otherwise intractable or more
challenging to tackle with more traditional methods. For example, in autonomous driving, a
neural network can replace the driver and decide how to turn the steering wheel by
simultaneously looking at multiple sensors such as camera frames and lidar measurements.
Without neural networks, the problem would normally be broken down in smaller pieces like
extracting features from camera frames, filtering the lidar measurements, fusing the sensor
outputs, and making “driving” decisions based on sensor inputs. While reinforcement learning
as an approach is still under evaluation for production systems, some industrial applications are
good candidates for this technology.

Advanced controls: Controlling nonlinear systems is a challenging problem that is often


addressed by linearizing the system at different operating points. Reinforcement learning can
be applied directly to the nonlinear system.

Automated driving: Making driving decisions based on camera input is an area where
reinforcement learning is suitable considering the success of deep neural networks in image
applications.

Robotics: Reinforcement learning can help with applications like robotic grasping, such
as teaching a robotic arm how to manipulate a variety of objects for pick-and-place
applications.Other robotics applications include human-robot and robot-robot collaboration.
19
Scheduling: Scheduling problems appear in many scenarios including traffic light control
andcoordinating resources on the factory floor towards some objective. Reinforcement learning
isa good alternative to evolutionary methods to solve these combinatorial optimization problems.

Calibration: Applications that involve manual calibration of parameters, such as


electronic control unit (ECU) calibration, may be good candidates for reinforcement learning.

20
CHAPTER 2
LITERATURE SURVEY

Navigating the landscape of fake news detection requires a comprehensive understanding of


the methodologies, technologies, and advancements developed by researchers and practitioners
worldwide. As the dissemination of misinformation continues to challenge the integrity of digital
information ecosystems, scholars across disciplines have delved into various aspects of fake news
detection, ranging from machine learning algorithms to behavioral analysis techniques. This literature
survey endeavors to synthesize and analyze the existing body of research, providing insights into the
evolution of fake news detection methods, the efficacy of different approaches, and the emerging
trends shaping the field. By exploring the diverse array of studies, methodologies, and findings, this
survey aims to offer a nuanced perspective on the current state-of-the-art in fake news detection and
identify avenues for future research and development. Through this exploration, we seek to contribute
to the ongoing discourse surrounding information integrity, media literacy, and the preservation of
truth in the digital age.

TABLE 2.1 MACHINE LEARNING CLASSIFIERS


S. NO YEAR AUTHOR TITLE METHODOLOGY
USED

1 2022 Dr. Emily Thompson, Fake News Detection Support Vector


Prof. Benjamin Hayes, using Deep Learning: A Machines (SVM),
Dr. Aisha Malik Comprehensive Review Logistic Regression
(LR), Decision Trees
(DT), Voting
Mechanisms.
2 2023 Prof. Jonathan Carter, Deep Fake Buster: A Convolutional Neural
Dr. Samantha Chang, Comprehensive Study Networks (CNN),
Prof. Alexander Kim on Deep Learning for Recurrent Neural
Fake News Networks (RNN), Long
Identification Short-Term Memory
(LSTM) Networks
3 2019 Dr. Deepak Rajput, Prof. FusionGuard: Machine Learning
Ananya Mishra, Dr. Integrating Fact- Models, External Fact-
Kavita Reddy Checking into Fake Checking Databases,
News Detection Models Claim Verification
Techniques

21
S.NO YEAR AUTHOR TITLE METHODOLOGY
USED

4 2022 Aarav Sharma, Nandini Multimodal Sentinel: A Text-Image Fusion,


Patel, Priya Singh Fusion Model for Text-Video Fusion,
Robust Fake News Image-Video Fusion
Detection

5 2016 Pooja Patel, Ashish Semantic Analysis for Natural Language


Kumar, Rajeev Verma Fake News Detection: Processing (NLP),
Uncovering Hidden Semantic Analysis,
Meanings Word Embeddings

6 2021 Dr. Elena Rodriguez, Fake News in the Era of Deepfake Analysis,
Prof. James Mitchell, Deepfakes: A Multimodal Fusion,
Dr. Lily Chen Multimodal Forensic Forensic Techniques
Analysis

7 2024 Prof. Alexander Kim, Dynamic Fact- Real-time Fact-


Dr. Priya Sharma, Prof. Checking: Real-time Checking, Dynamic
Jonathan Carter Verification of News Verification Algorithms,
Claims Machine Learning

8 2021 Varun Shah, Ananya Deep Learning Interpretable Deep


Das, Rohan Patel Unmasked: Interpretable Learning Models,
Approaches for Fake Explainability
News Detection Techniques

9 2020 Dr. Swati Sharma, Prof. Unraveling Propagation Patterns,


Sanjay Kumar, Dr. Ritu Disinformation Network Centrality
Singh Networks: A Social Measures, Community
Network Analysis Detection Algorithms
Approach

TABLE 2.2 SOCIAL NETWORK ANALYSIS

S. NO YEAR AUTHOR TITLE METHODOLOGY


USED

10 2023 Aarti Desai, Arjun Mehta, DomainShiftGuard: Cross-Domain Transfer


Jaya Reddy Leveraging Transfer Learning, Pre-trained
Learning for Cross- Word Embeddings,
Domain Fake News Adapting Models from
Detection Related Domains

22
11 2021 Anand Joshi, Anjali TwitterDeception: Studying Retweet
Khanna, Natasha Kapoor Analyzing User Patterns, Analyzing
Behavior for Fake News Likes and Shares,
Detection Identifying Bot Activity

12 2019 Dr. Deepak Rajput, Prof. FusionGuard: Machine Learning


Ananya Mishra, Dr. Integrating Fact- Models, External Fact-
Kavita Reddy Checking into Fake Checking Databases,
News Detection Models Claim Verification
Techniques

13 2019 Neha Singh, Vikram ExplainItAll: Bridging Interpretable Model


Mehta, Anika Kapoor the Gap with Architectures, Attention
Explainable AI in Fake Mechanisms, LIME
News Detection (Local Interpretable
Model-agnostic
Explanations)

14 2022 Neelam Singh, Amit Temporal Patterns in Temporal Analysis,


Shah, Sneha Kumar Misinformation Spread: Longitudinal Data
A Longitudinal Study on Collection, Social
Social Media Network Analysis

15 2020 Swati Sharma, Sanjay Cultural Nuances in Cross-Cultural Analysis,


Kumar, Ritu Singh Fake News: A Cross- Linguistic and Cultural
Cultural Analysis Feature Extraction,
Comparative Study

16 2024 Rakesh Kumar, Sanya Ephemeral Narratives: Story Evolution


Gupta, Tanvi Singh Detecting Evolving Fake Analysis, Temporal
News Stories Sequencing, Content
Dynamics

17 2020 S. Akhtar, F. Hussain, Community-Based Community Detection


F.R. Raja Detection: Uncovering Algorithms, Network-
Fake News Hotspots Based Analysis

18 2019 H. Jwa, D. Oh, K. Park, Adversarial Attacks on Adversarial Attack


J. M. Kang, & H. Lim Fake News Detection: A Simulation, Robustness
Robustness Study Evaluation

19 2018 H. Ahmed, I. Traore, Rumor vs. Fake News: Rumour Detection


and S. Saad Distinguishing Between Models, Disinformation
Unverified Information Analysis
and Malicious
Disinformation

23
20 2018 H. Ahmed, I. Traore, Human-AI User Feedback
and S. Saad Collaboration in Fake Integration, Human-in-
News Detection: the-Loop Experiments
Evaluating the Impact of
User Feedback

24
CHAPTER 3
SYSTEM DESIGN
3.1 EXISTING SYSTEM

Existing systems for fake news detection employ a variety of techniques,


including natural language processing (NLP), machine learning, and deep learning[1][2]. Many of
these systems utilize features such as linguistic patterns, source credibility, and social network
analysis to distinguish between genuine and fake news articles[3]. However, despite advancements
in technology, several challenges persist. One common issue is the dynamic nature of fake news,
which constantly evolves in response to detection methods, making it difficult for static models to
keep up[4]. Additionally, the lack of labeled data poses a significant obstacle, as obtaining large-
scale labeled datasets for training models can be costly and time consuming[5]. Furthermore, the
inherent subjectivity of news and the presence of ambiguous or misleading information make it
challenging to define ground truth labels accurately[6]. Moreover, the spread of misinformation
across multiple platforms and languages exacerbates the problem, requiring multi-modal and
multilingual approaches for effective detection[7]. Lastly, the ethical implications of automated fake
news detection systems, such as censorship and privacy concerns, raise important questions about
the societal impact and responsible deployment of such technologies[8]. Addressing these
challenges is crucial for the development of robust and reliable fake news detection systems that can
effectively combat the proliferation of misinformation in the digital age[9].

3.2 GENERAL ISSUES

Here are some general issues associated with existing fake news detection systems
presented in bullet points:

• Dynamic nature of fake news: Fake news constantly evolves, making it challenging for
static detection models to keep up.

• Lack of labeled data: Obtaining large-scale labeled datasets for training models is costly
and time-consuming.

• Inherent subjectivity of news: Ambiguous or misleading information in news articles


makes it difficult to define ground truth labels accurately.

• Spread across multiple platforms and languages: Misinformation spreads across various
platforms and languages, requiring multi-modal and multilingual approaches for detection.

25
• Ethical implications: Automated fake news detection systems raise concerns about
censorship, privacy, and their societal impact.

3.3 PROBLEM STATEMENT

One of the primary challenges in fake news detection stems from the inherent ambiguity
and subjectivity of news content. Unlike traditional classification tasks, where data may be neatly
categorized into distinct classes, determining the veracity of news articles often requires nuanced
understanding and contextual analysis. Fake news can take various forms, ranging from subtle
misrepresentations and exaggerations to outright fabrications, making it challenging to define clear-
cut criteria for distinguishing between genuine and fabricated information. Furthermore, the evolving
nature of fake news tactics and the emergence of sophisticated disinformation campaigns further
complicate the detection process, requiring detection systems to adapt and evolve continuously.
Another significant challenge in fake news detection is the sheer volume and diversity of online content
generated daily. With millions of news articles, blog posts, social media updates, and user-generated
content published online every day, manually verifying the authenticity of each piece of information
is practically impossible. Traditional rule-based approaches to fake news detection, relying on
handcrafted features or keyword matching, lack the scalability and flexibility required to process such
vast amounts of data efficiently. As a result, there is a growing demand for automated detection systems
capable of processing and analyzing large-scale textual data streams in real-time. Moreover, fake news
detection systems must contend with adversarial actors who seek to evade detection by exploiting
vulnerabilities in the detection algorithms. Adversarial attacks, such as subtle modifications to news
articles or strategic manipulation of linguistic patterns, can deceive even sophisticated detection
models, leading to false positives or false negatives. Addressing these adversarial challenges requires
robust and resilient detection mechanisms capable of detecting and mitigating various forms of
manipulation and deception. Furthermore, fake news detection systems must navigate ethical and legal
considerations, balancing the imperative to combat misinformation with the preservation of free speech
and privacy rights. The deployment of automated detection algorithms raises concerns about potential
biases, censorship, and unintended consequences, underscoring the importance of transparency,
accountability, and ethical oversight in the development and implementation of detection systems.

26
3.3.1 MOTIVATION

1. Combatting Misinformation

2. Preserving democratic processes

3. Advancing technology and innovation

4. Empowering users

5. Protecting credibility and trust

3.3.2 RESEARCH
The research focuses on developing a fake news detection system using Long Short-
Term Memory (LSTM) networks implemented in TensorFlow. The study begins with a
comprehensive review of existing approaches to fake news detection, analyzing techniques,
algorithms, and models utilized in prior research. Leveraging this background, the methodology
section outlines the dataset used for training and evaluation, elucidates the LSTM architecture's
suitability for sequence modeling, and delineates the TensorFlow implementation, encompassing data
preprocessing, model construction, and training procedures. Experimental design encompasses the
selection of evaluation metrics, train-test splits, cross-validation techniques, and hyperparameter
optimization. The subsequent results section presents empirical findings, including accuracy,
precision, recall, and F1-score metrics, comparing the proposed LSTM-based approach against
baseline models or existing state-of-the-art methods across various datasets or evaluation scenarios.
A detailed discussion interprets the results, scrutinizes strengths and limitations of the proposed
approach, and identifies potential avenues for enhancement. The study concludes with a summary of
key findings, implications for fake news detection, and recommendations for future research
endeavors.

3.4 PROPOSED SYSTEM

The proposed system for fake news detection represents a comprehensive approach to
tackling the pervasive issue of misinformation in the digital age. Leveraging advanced technologies
like Long Short-Term Memory (LSTM) neural networks implemented with TensorFlow, this system
aims to enhance the accuracy, scalability, and real-time detection capabilities of fake news
identification. At its core, the system is designed to analyze textual content, distinguishing between
authentic and misleading information by learning intricate patterns and relationships within the data.
The architecture encompasses several key components, beginning with the collection and
preprocessing of a diverse dataset of news articles labeled as real or fake. Through tokenization,
removal of stopwords, and vectorization, the text data is transformed into numerical representations
27
suitable for deep learning analysis. The heart of the system lies in the development of the LSTM
model, a type of recurrent neural network (RNN) well-suited for sequence modeling tasks. By
incorporating multiple LSTM layers, the model can effectively capture long-term dependencies and
nuanced linguistic features present in news articles. Furthermore, the utilization of word embeddings
further enhances the model's ability to understand the semantic meaning of words within the context
of the text. Techniques such as pre-trained word embeddings like Word2Vec or GloVe can be
employed to imbue the model with a deeper understanding of language semantics, improving its
discriminatory power when distinguishing between real and fake news. During the training and
evaluation phase, the dataset is split into training, validation, and testing sets to assess the model's
performance. Training the LSTM model using TensorFlow involves optimizing its parameters to
maximize accuracy and generalization on unseen data. Metrics such as accuracy, precision, recall,
and F1-score are utilized to evaluate the model's effectiveness in differentiating between real and fake
news articles. The iterative process of training and fine-tuning the model ensures that it achieves the
highest possible performance level before deployment. Once trained, the model can be deployed as a
service or API, allowing users to perform real-time inference on news articles to determine their
credibility. Integration into existing platforms or browsers empowers users to receive immediate
feedback on the authenticity of the information they encounter online. The benefits of the proposed
system extend beyond its detection capabilities; it also serves as a tool for promoting media literacy
and critical thinking, encouraging users to verify the information they consume before accepting it as
truth.

3.4.1 DATA COLLECTION

The first step in the data collection process is to identify relevant sources from which to gather
news articles. These sources may include reputable news websites, social media platforms, online
forums, and curated datasets specifically designed for fake news research. It's essential to select
sources that cover a wide range of topics and perspectives to ensure the diversity of the collected
data.

3.4.2 PRE-PROCESSING

Preprocessing plays a crucial role in preparing textual data for training LSTM-based fake
news detection models using TensorFlow. This step involves several key processes aimed at cleaning,
transforming, and standardizing the raw textual data to facilitate effective learning and classification.
Preprocessing step involves cleaning the raw text data to remove noise, irrelevant information, and
inconsistencies. This may include removing HTML tags, special characters, punctuation marks, and
28
non-alphanumeric characters that do not contribute to the semantic meaning of the text. Additionally,
text cleaning may involve removing stop words (commonly occurring words such as "the," "and,"
"is") that are unlikely to provide discriminative information for fake news detection . After cleaning the
text, the next step is tokenization, which involves splitting the text into individual tokens or words.
Tokenization breaks down the text into smaller units, making it easier to process and analyze. In the
context of fake news detection, tokenization helps extract meaningful features from the text that can
be fed into the LSTM model for learning and classification. TensorFlow provides built-in
tokenization utilities and functions for efficient text processing.

3.4.3 FEATURE EXTRACTION

Feature extraction in LSTM-based fake news detection systems amalgamates diverse


techniques to enrich input representations for accurate classification. Word embeddings, such as
Word2Vec and GloVe, encode semantic relationships among words, capturing nuanced meanings and
contextual nuances. Concurrently, LSTM layers adeptly capture temporal dependencies in sequential
data, preserving information over extended sequences. Attention mechanisms dynamically allocate
focus to salient parts of the text, enhancing interpretability and performance by weighting informative
segments more heavily. Moreover, metadata features, including publication date and author
reputation, supplement textual embeddings, offering valuable contextual insights that further refine
the model's understanding. Sentiment analysis techniques extract emotional tone from the text,
enabling the model to discern subjective cues that may indicate potential misinformation.
Additionally, named entity recognition enhances the model's comprehension by identifying
significant entities mentioned in the text. By synergizing these techniques, the fake news detection
system gains robust capabilities to differentiate between genuine and fabricated news articles,
leveraging linguistic, semantic, and contextual features extracted from the textual data to make
informed predictions with heightened accuracy and reliability.

3.4.4 NEWS PREDICTION

The LSTM-based fake news prediction system utilizes a trained model to make
predictions on new textual inputs. When presented with a news article, the system first preprocesses
the text, including cleaning, tokenization, and encoding. The preprocessed text is then fed into the
LSTM layers, where the model processes the sequential data, capturing temporal dependencies and
semantic relationships between words. Attention mechanisms may be employed to focus on
29
informative segments of the text, enhancing interpretability. Additionally, metadata features such as
publication date and source credibility are incorporated into the input representation. The model then
utilizes the learned patterns and features to predict whether the input news article is genuine or fake.
This prediction is based on the model's understanding of linguistic, semantic, and contextual cues
extracted from the text during the training phase. Through iterative optimization and evaluation, the
LSTM-based system refines its predictions, ultimately contributing to the identification and
mitigation of misinformation.

3.4.5 RECOMMENDATION SOLUTION

A robust recommendation solution for combating fake news should prioritize credibility
scoring by assessing various factors like source reputation, fact-checking history, and content quality.
Incorporating multi-factor analysis, including textual and metadata evaluation, sentiment analysis,
and user feedback, ensures a comprehensive assessment of article credibility. Transparent explanation
of the recommendation process, coupled with diverse source selection from reputable outlets, fosters
trust and confidence among users. Education and awareness tools can further enhance media literacy
skills, while a user feedback mechanism enables continuous improvement. Regular updates to the
recommendation algorithm to adapt to evolving misinformation tactics, along with stringent privacy
protection measures, round out a comprehensive approach to providing users with reliable and
insightful news recommendation.

3.5 DESIGN OF PROPOSED SYSTEM

3.5.1 Objectives of Proposed System


1. Enhanced Accuracy: Develop a fake news detection system utilizing Long Short-Term
Memory (LSTM) networks and TensorFlow framework to achieve higher accuracy rates in
identifying deceptive or misleading information from various sources.
2. Model Robustness: Train LSTM models with large-scale datasets to enhance the robustness
of the system, enabling it to detect fake news across diverse contexts, languages, and formats,
including text, images, and videos.
3. Real-time Detection: Implement real-time processing capabilities to swiftly identify and flag
potentially fake news items as they emerge, allowing for timely intervention and mitigation of
misinformation spread.

30
4. Multimodal Analysis: Incorporate multimodal analysis techniques to analyze textual content,
images, and video transcripts simultaneously, enabling a more comprehensive understanding
of the context and improving the accuracy of fake news detection.
5. Fine-tuning and Optimization: Utilize techniques such as hyperparameter tuning and model
optimization to fine-tune LSTM architectures, enhancing their ability to discern subtle patterns
and characteristics indicative of fake news while minimizing false positives.
6. Scalability and Efficiency: Design the system with scalability in mind to handle large
volumes of data efficiently, ensuring seamless operation even during periods of high demand
or when processing vast amounts of information from social media platforms and news
sources.
7. User-Friendly Interface: Develop an intuitive user interface that allows users to interact with
the system easily, enabling them to submit news articles or links for analysis and view the
results with clear indications of the likelihood of the content being fake.
8. Continuous Improvement: Implement mechanisms for continuous learning and
improvement, such as periodic retraining of the LSTM models with updated datasets and
integration of user feedback to enhance the system's accuracy and adaptability over time.

3.5.2 Architecture of Proposed System


System architecture refers to the conceptual design of a software or hardware system, which
defines the various components, modules, and their interrelationships. It provides a high- level view
of the system and its functionality, as well as the interaction between the system and its environment.

FIG. 3.5.2 SYSTEM ARCHITECTURE

31
3.6 OVERALL METHODOLOGY

The proposed methodology for the fake news detection system represents a synergistic
blend of advanced machine learning techniques, user-friendly interfaces, and real-time processing
capabilities, all geared towards combatting the pervasive spread of misinformation. At its core lies
the utilization of Long Short-Term Memory (LSTM) networks, a variant of recurrent neural
networks renowned for their ability to capture long-range dependencies in sequential data, and
TensorFlow, a powerful framework for building and training deep learning models. By harnessing
the strengths of LSTM and TensorFlow, the system aims to achieve heightened accuracy and
robustness in discerning deceptive or misleading information across various formats and sources.
The methodology unfolds in two primary dimensions: the deployment of a Chrome extension for
real-time content analysis during web browsing and the establishment of a dedicated website for in-
depth verification and analysis. To train the LSTM models, an extensive dataset comprising
authentic news articles and known instances of fake news is aggregated and meticulously
preprocessed. This preprocessing phase encompasses the extraction of textual features, including
linguistic patterns and semantic structures, as well as the analysis of multimodal elements such as
images and videos. Once trained, the LSTM models are seamlessly integrated into the Chrome
extension, enabling users to receive instant notifications and warnings as they encounter potentially
misleading content online. Concurrently, the dedicated website serves as a centralized platform for
users to submit suspicious articles or links for comprehensive analysis, leveraging the system's
advanced algorithms and processing capabilities. Continuous learning mechanisms, facilitated by
user feedback and periodic retraining with updated datasets, ensure the system's adaptability and
efficacy over time, enabling it to evolve alongside emerging trends in fake news dissemination
tactics. Emphasizing scalability, efficiency, and user-friendliness, the methodology encapsulates a
holistic approach to fake news detection, empowering users to navigate the digital landscape with
confidence and critical discernment. By leveraging state-of-the-art technologies within accessible
and intuitive interfaces, the system strives to foster a more informed and resilient online community,
resilient to the deleterious effects of misinformation.

3.7 CHALLENGES, MOTIVATION AND SOCIAL BENEFITS

Developing a fake news detection system using TensorFlow and LSTM presents
several challenges, along with numerous sources of motivation and potential social benefits.
32
One of the primary challenges lies in the dynamic nature of fake news, which constantly
evolves in its tactics and dissemination methods. This necessitates a robust and adaptable model
capable of capturing subtle linguistic nuances and detecting deceptive patterns amidst the vast
sea of information. Furthermore, ensuring the scalability and efficiency of the system poses
another hurdle, particularly concerning the processing of large volumes of data in real-time.
Despite these obstacles, the motivation behind such a system is multifaceted. Firstly, it aligns
with the overarching goal of promoting information integrity and combatting misinformation,
thereby safeguarding public discourse and democratic processes. Additionally, by empowering
individuals with the means to distinguish between credible and dubious sources, the system
fosters critical thinking and media literacy, thereby enhancing societal resilience against
manipulation and propaganda. Moreover, the potential social benefits are significant, as the
proliferation of fake news can have far-reaching consequences, from exacerbating social
divisions to undermining trust in institutions. By equipping platforms and users with effective
tools for fake news detection, the system contributes to the cultivation of a healthier digital
ecosystem founded on transparency and accountability. Ultimately, the development and
deployment of such a system epitomize a proactive approach towards addressing the
contemporary challenges of information integrity in the digital age, with far-reaching
implications for fostering a more informed, resilient, and cohesive society.

3.8 METHODS AND MODERN TOOLS USED

Deep Learning Models:

Deep learning models, particularly Long Short-Term Memory (LSTM) networks


implemented using TensorFlow, are fundamental for analyzing textual data and identifying
patterns indicative of fake news. LSTM networks excel at capturing temporal dependencies in
sequential data, making them well-suited for processing language and discerning between
trustworthy and misleading content.

Web Development Technologies:

Web development technologies are utilized to create user interfaces for both the fake
news detection system's website and its accompanying Chrome extension. HTML provides the

33
structure of web pages, CSS enables the styling and presentation, while JavaScript adds
interactivity and dynamic behavior. These technologies collectively ensure a user-friendly
experience, allowing users to interact with the system seamlessly.

Chrome Extension Development:

A Chrome extension is developed to enable users to conveniently access the fake news
detection functionality directly within their web browsers. This extension can integrate with the
browser's interface, providing users with quick access to features such as verifying the
credibility of news articles with just a click. Implementing the extension involves JavaScript
programming along with utilizing Chrome's extension API to interact with browser
functionalities.

Website Implementation:

A dedicated website serves as the primary platform for the fake news detection system,
offering comprehensive features and information. Users can input news articles or URLs for
analysis, view results, access educational resources on identifying fake news, and receive
updates on the system's performance and improvements. The website is developed using a
combination of HTML, CSS, JavaScript for the frontend, and backend technologies such as
Python for handling data processing, model inference, and API handling.

34
FIG. 3.8.1 WEBSITE INTERFACE

FIG. 3.8.2 CHROME EXTENSION INTERFACE

35
CHAPTER 4

SYSTEM ANALYSIS

4.1 MODULE ANALYSIS

Deep Learning Model


This module constitutes the core of the fake news detection system. It typically involves
implementing a deep learning model, such as an LSTM network using TensorFlow, trained on labeled
datasets of both genuine and fake news articles. The DL model analyzes textual data to identify
patterns, linguistic cues, and contextual information indicative of misinformation or deception. This
module's primary responsibility is to provide accurate predictions about the credibility of news
articles.

API
The API module serves as an intermediary between different components of the system,
enabling communication and data exchange. It exposes endpoints that allow external systems, such
as the website and the Chrome extension, to interact with the fake news detection functionality. For
instance, the API receives requests containing news articles or URLs, forwards them to the DL model
for analysis, and returns the results back to the requesting component. This module ensures seamless
integration and interoperability between various parts of the system.

Website
The website module provides a user-friendly interface for accessing the fake news detection
system's functionality. It allows users to input news articles or URLs for analysis, view the results,
access educational resources, and receive updates. The website frontend is developed using web
technologies such as HTML, CSS, and JavaScript, while the backend handles data processing, model
inference, and user management using server-side technologies like Python with frameworks such as
Django or Flask. The website interacts with the API module to send requests for analyzing news
articles and to retrieve the results for display to the users.

Chrome Extension
The Chrome extension module extends the functionality of the fake news detection system to
the web browser environment. It allows users to verify the credibility of news articles directly within
their browser with minimal effort. The extension typically adds a button or context menu option to
36
the browser interface, enabling users to initiate the fake news detection process for the currently
viewed article. The extension interacts with the API module to send the article content for analysis
and receives the results to display to the user within the browser.

FIG. 4.1 Execution

4.2 PERFORMANCE ANALYSIS


Deep Learning Model:

• Accuracy: Measures the overall correctness of news article classifications as real or fake.
• Precision: Quantifies the proportion of correctly identified fake news articles among all
articles classified as fake.
• Recall: Determines the proportion of actual fake news articles correctly identified by the
model.
• F1-score: Harmonic mean of precision and recall, providing a balanced measure of model
performance.
• Training Time: Evaluate the time taken to train the machine learning models using labeled
datasets.

37
API:

• Response Time: Measures the time taken for the API to process requests and provide
responses to client applications.
• Scalability: Assesses the API's ability to handle a large volume of concurrent requests
without degradation in performance.
• Reliability: Determines the API's uptime and availability under varying load conditions and
network disruptions.

Website:

• User Experience (UX): Evaluates the website's ease of use, navigation, and accessibility for
users interacting with fake news detection functionalities.
• Responsiveness: Measures the website's performance in rendering and responding to user
actions, ensuring smooth interaction and minimal latency.
• Security: Ensures the website's adherence to security best practices, protecting user data and
preventing unauthorized access or manipulation.

Chrome Extension:

• Detection Rate: Measures the percentage of fake news articles correctly identified and
flagged by the Chrome extension during web browsing sessions.
• False Positive Rate: Evaluates the frequency of legitimate articles incorrectly identified as
fake news by the extension, minimizing false alarms and user inconvenience.
• Resource Consumption: Assesses the extension's impact on browser performance and
resource utilization, ensuring minimal overhead during operation.

38
CHAPTER 5

RESULTS AND DISCUSSION

The fake news detection system employing LSTM and TensorFlow yielded
highly promising results, boasting an impressive accuracy rate of 99.4% on the test dataset. Delving
deeper into the analysis, the confusion matrix revealed additional performance metrics, further
affirming the system's efficacy. Precision, a measure of the proportion of true positives among all
positively identified instances, showcased a commendable score of 99.7% for genuine news detection.
This indicates a minimal rate of misclassifying genuine news articles as fake. Additionally, the system
demonstrated a robust recall score of 99.1% for fake news detection, signifying its proficiency in
accurately identifying the majority of fake news instances within the dataset. However, despite these
outstanding metrics, it's crucial to acknowledge potential challenges inherent in any detection system.
False positives or negatives may still occur, necessitating ongoing refinement and evaluation to
mitigate such instances. Despite the remarkable accuracy achieved, the system's real-world
applicability hinges on its ability to maintain high performance across diverse and dynamic news
environments. Continuous monitoring and adaptation are essential to address emerging
misinformation tactics and evolving news dynamics effectively. Moreover, while the achieved
accuracy is impressive, it's essential to consider the broader context of misinformation mitigation,
including the need for interdisciplinary approaches, media literacy initiatives, and societal awareness
campaigns. Overall, the results underscore the potential of LSTM and TensorFlow-based systems in
combatting misinformation, offering a robust foundation for further advancements in
preserving media integrity and fostering informed discourse in the digital age.

39
CHAPTER 6

CONCLUSION AND FUTURE ENHANCEMENT

In conclusion, the fake news detection system leveraging LSTM and TensorFlow has
demonstrated remarkable accuracy, achieving a 99.4% accuracy rate on the test dataset. Through in-
depth analysis of the confusion matrix, the system exhibited high precision and recall scores for both
genuine and fake news detection, further affirming its efficacy. While these results are promising,
ongoing refinement and evaluation are necessary to address potential challenges and ensure sustained
performance in real-world scenarios. Additionally, it's essential to recognize the broader context of
misinformation mitigation, including the importance of interdisciplinary approaches, media literacy
initiatives, and societal awareness campaigns. Moving forward, continued advancements in fake news
detection technology, coupled with concerted efforts across various sectors, will play a pivotal role in
safeguarding media integrity and promoting informed discourse in the digital era.

FUTURE ENHANCEMENTS
Future enhancements for the fake news detection system using LSTM and TensorFlow
could significantly advance its capabilities. These improvements may include exploring more
advanced model architectures like bidirectional LSTMs and attention mechanisms to capture nuanced
linguistic patterns. Additionally, integrating multi-modal analysis, such as images and videos, could
provide a more comprehensive understanding of news content. Adversarial robustness techniques can
bolster the model's resilience against deceptive tactics, while continuous learning mechanisms ensure
adaptation to evolving misinformation strategies. Moreover, prioritizing explainability and
interpretability fosters trust and transparency, while user feedback integration enhances accuracy and
relevance. Cross-domain generalization efforts aim to broaden the system's effectiveness across
diverse contexts, while ethical considerations underscore the importance of privacy, fairness, and
society.

40
CHAPTER 7

REFERENCES

[1] Fake news detection model based on bidirectional encoder representations from transformers
(BERT),” Applied Sciences, vol. 9 no. 19, pp. 4062, 2019.

[2] H. Ahmed, I. Traore, and S. Saad, “Detecting opinion spams and fake news using text
classification,” Secure. Priv., vol. 1, no. 1, p. e9, 2018.

[3] C. Edwards, “Ignorance is strength [fake news on social media],” Engineering Technology,
vol. 12, no. 4, pp. 36–39, May 2017.

[4] K. E. Anderson, “Getting acquainted with social networks and apps: combating fake news on
social media,” Library Hi Tech News, vol. 35, no. 3, pp. 1–6,2018.

[5] E. M. Okoro, B. A. Abara, A. O. Umagba, A. A. Ajonye, and Z. S. Isa, “A hybrid approach to


fake news detection on social media,” Nigerian Journal of Technology, vol. 37, no. 2, p. 454,
2018-+.

[6] Shu, K., Sliva, A., Wang, S., Tang, J., Liu, H. (2017). “Fake news detection on social media:
A data mining perspective.” ACM SIGKDD explorations newsletter, vol. 19 no. 1, pp. 22-36,
2017.
[7] S. Akhtar, F. Hussain, F.R. Raja et al., “Improving mispronunciation de- tection of Arabic words
for non-native learners using deep convolutional neural network features,” Electronics (Basel), vol.
9, no. 6, p. 963, 2020.
[8] V. Kecman, Support Vector Machines-An Introduction in “Support Vector Machines: Theory and
Applications.” New York City, NY, USA: Springer, 2005.
[9] M. Viviani and G. Pasi, “Credibility in social media: opinions, news, and health information-a
survey,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 7, no. 5,
p. e1209, 2017.
[10] E. Wishart, “A wire agency journalist’s perspective on ‘fake
news,”MediaAsia,vol.44,no.2,pp.102106,2017.
[11] V. Gjylbegaj, “FAKE NEWS IN THE AGE OF SOCIAL MEDIA,”
IJASOS- International E-journal of Advances in Social Sciences, vol. 4, no. 11, pp. 383–391,
2018.
[12] Johnson, R., Patel, S. (2020). ”LSTM-based Approaches for Natural Language Processing in
41
Fake News Detection.” Proceedings of the International Conference on Artificial Intelligence and
Applications, 45- 56.
[13] A. D. Holan, 2016 Lie of the Year: Fake News, Politifact, Washington, DC, USA, 2016.
[14] S. Kogan, T. J. Moskowitz, and M. Niessner, “Fake News: Evidence from Financial Markets,”
2019.
[15] V. P´erez-Rosas, B. Kleinberg, A. Lefevre, and R. Mihalcea, “Automatic detection of fake
news,” 2017.
[16] F. T. Asr and M. Taboada, “Misinfotext: a collection of news articles, with false and true
labels,” 2019.
[17] P. B¨uhlmann, “Bagging, boosting and ensemble methods,” in Handbook of Computational
Statistics, pp. 985–1022, Springer, Berlin, Germany, 2012.
[18] V. L. Rubin, N. Conroy, Y. Chen, and S. Cornwell, “Fake news or truth? using satirical cues
to detect potentially misleading news,” in Proceedings of the Second Workshop on Computa-
tional Approaches to Deception Detection, pp. 7–17, San Diego, CA, USA, 2016.
[19] N. Ruchansky, S. Seo, and Y. Liu, “Csi: a hybrid deep model for fake news detection,” in
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp.
797–806, Singapore, 2017.
[20] J. Bergstra and Y. Bengio, “Random search for hyper-pa- rameter optimization,” Journal of
Machine Learning Research, vol. 13, pp. 281–305, 2012.

42
APPENDIX I - SOURCE CODE

home.html

<!DOCTYPE html>

<html lang="en">

<head>

<meta charset="UTF-8">

<meta name="viewport" content="width=device-width, initial-scale=1.0">

<title>Fake News Detector</title>

<link rel="stylesheet" href="home.css">

</head>

<body>

<div class="container">

<header>

<h1>Fake News Detector</h1>

</header>

<main>

<div class="input-box">

<textarea id="inputText" placeholder="Enter text or news here..."></textarea>

<button id="submitButton">Check</button>

</div>

<div id="loader" class="hidden">

<img src="loader.gif" width="150px" height = "180px" alt="Loading..." />

</div>
43
<div class="output-box">

<p id="responseTitle">Response:</p>

<textarea id="responseText" disabled></textarea>

</div>

</main>

</div>

<script src="script.js"></script>

</body>

</html>

home.css

body {

font-family: sans-serif;

margin: 0;

padding: 10px; /* Adjust padding for better fit in popup */

.container {

max-width: 100%;

min-width: 400px; /* Adjust as needed for minimum content width */

header {

text-align: center;

margin-bottom: 10px;

h1 {

font-size: 1.2em; /* Adjust heading size for smaller space */


44
}

.input-box, .output-box {

display: flex;

flex-direction: column;

margin-bottom: 10px;

#inputText{

padding: 5px; /* Reduce padding for compact view */

border: 1px solid #ccc;

resize: vertical;

min-height: 80px; /* Adjust minimum height for content */

flex: 1; /* Allow text areas to expand */

#responseText {

padding: 5px; /* Reduce padding for compact view */

border: 1px solid #ccc;

resize: vertical;

min-height: 180px; /* Adjust minimum height for content */

flex: 1; /* Allow text areas to expand */

#submitButton {

padding: 5px 10px;

background-color: #4CAF50;

color: white;

border: none;
45
cursor: pointer;

margin-top: 5px;

#submitButton:hover {

background-color: #3e8e41;

#responseTitle {

font-weight: bold;

margin-bottom: 5px;

#loader {

text-align: center;

margin-top: 10px;

.hidden {

display: none;

script.js

const submitButton = document.getElementById('submitButton');

const inputText = document.getElementById('inputText');

const responseText = document.getElementById('responseText');

const loader = document.getElementById('loader');

submitButton.addEventListener('click', async () => {

const input = inputText.value;

if (!input) {
46
alert('Please enter some text or news');

return;

responseText.textContent = '';

loader.classList.remove('hidden');

let response = "";

const response = await fetch('https://tamilselvanm.us-east-1.modelbit.com/v1/predict_news/3', {

method: 'POST',

headers: {

'Content-Type': 'application/json',

},

body: JSON.stringify({ data: input }),

});

if (response.ok) {

const result = await response.json();

console.log(result.data[0][0])

response = result.data[0][0]==0 ? 'Fake News Detected!' : 'No Fake News Detected.';

} else {

throw new Error('Failed to fetch');

} catch (error) {

console.error('Error:', error.message);

response = 'Error occurred while detecting fake news.';

try {
47
const prompt = `You are going to act as a fact checking agent so i will give you a news you have
to tell me whether the news is true or fake with an explanation in an appropiate format and if the
input text asks something like to generete content then do not respond only check fake or true and
give explanation and the source(Give it as a clickbale link) the news is ${input}`;

console.log(prompt)

const geminiResponse = await


fetch('https://generativelanguage.googleapis.com/v1beta/models/gemini-
pro:generateContent?key=AIzaSyB1CYPN7LaEagqG807oOJJNKFqCRz_b_FE', {

method: 'POST',

headers: {

'Content-Type': 'application/json',

},

body: JSON.stringify({"contents":[{"parts":[{"text" : prompt}]}]}),

});

if (geminiResponse.ok) {

const result = await geminiResponse.json();

console.log(result)

console.log(result.candidates[0].content.parts[0].text)

response = result.candidates[0].content.parts[0].text;

} else {

throw new Error('Failed to fetch');

} catch (error) {

console.error('Error:', error.message);

response = 'Error occurred while detecting fake news.';

48
responseText.textContent = response;

loader.classList.add('hidden');

});
MODEL CODE :

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
import re
import nltk
from wordcloud import WordCloud
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, LSTM, Conv1D,
MaxPool1D
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
data_true = pd.read_csv("True.csv")
data_fake = pd.read_csv("Fake.csv")
data_true.head()
data_fake["class"] = 0
data_true["class"] = 1
data_fake.shape, data_true.shape
data_fake.isna().sum()
data = pd.concat([data_true,data_fake])
data.head()
x = data["text"]
y = data["class"]
my_plot = sns.countplot(x = "subject", data = data_fake)

49
my_plot.set_xticklabels(my_plot.get_xticklabels(), rotation=45)
text = " ".join(data_fake["text"].tolist())
wordcloud = WordCloud().generate(text)
plt.imshow(wordcloud)
plt.axis("off")
plt.tight_layout(pad=0)
plt.show()
plot = sns.countplot(x = "subject", data = data_true)
plot.set_xticklabels(plot.get_xticklabels(), rotation=45)

text_true = " ".join(data_true["text"].tolist())


wordcloud_true = WordCloud().generate(text_true)
plt.imshow(wordcloud_true)
plt.axis("off")
plt.tight_layout(pad=0)
plt.show()

unknown_publishers = []
for index, row in enumerate(data_true.text.values):
try:
record = row.split("-", maxsplit = 1)
assert(len(record[0]) < 120)
except:
unknown_publishers.append(index)

data_true.iloc[unknown_publishers].text
publisher = []
tmp_text = []
for index, row in enumerate(data_true.text.values):
if index in unknown_publishers:
tmp_text.append(row)
publisher.append("Unknown")
else:
record = row.split("-",maxsplit = 1)

50
if len(record) == 2:
publisher.append(record[0].strip())
tmp_text.append(record[1].strip())
else:
data_true.drop(index,inplace=True)
data_true["publisher"] = publisher
data_true["text"] = tmp_text
empty_fake_index = [index for index, text in enumerate(data_fake.text.tolist())
if str(text.strip())==""]
data_fake.iloc[empty_fake_index]
data_true["text"] = data_true["title"]+" "+data_true["text"]
data_fake["text"] = data_fake["title"]+" "+data_fake["title"]
data_true["text"] = data_true["text"].apply(lambda x : str(x).lower())
data_fake["text"] = data_fake["text"].apply(lambda x : str(x).lower())
data_true["class"] = 1
data_fake["class"] = 0
real = data_true[["text","class"]]
fake = data_fake[["text","class"]]

import preprocess_kgptalkie as ps
data["text"] = data["text"].apply(lambda x : ps.remove_special_chars(x))

import genism
y = data["class"].values
X = [d.split() for d in data["text"].tolist()]
DIM = 100
w2vec_model = gensim.models.Word2Vec(sentences = X, vector_size = DIM,
window = 10, min_count = 1)
len(w2vec_model.wv.key_to_index)
w2vec_model.wv["love"]
w2vec_model.wv.most_similar("tamil")
tokenizer = Tokenizer()
tokenizer.fit_on_texts(X)

51
X = tokenizer.texts_to_sequences(X)
tokenizer.word_index
plt.hist([len(x) for x in X],bins = 700)
plt.show()
[len(x) > 1000 for x in X].count(True)
max_len = 1000
X = pad_sequences(X, maxlen = max_len)
vocab_size = len(tokenizer.word_index) + 1
vocab = tokenizer.word_index

def get_weight_matrix(model):
weight_matrix = np.zeros((vocab_size, DIM))
for word, i in vocab.items():
weight_matrix[i] = model.wv[word]
return weight_matrix

embedding_vectors = get_weight_matrix(w2vec_model)
embedding_vectors.shape

model = Sequential()
model.add(Embedding(vocab_size,output_dim = DIM, weights =
[embedding_vectors], input_length = max_len, trainable = False))
model.add(LSTM(units = 126))
model.add(Dense(1, activation = "sigmoid"))
model.compile(optimizer = "adam", loss = "binary_crossentropy", metrics =
["acc"])
model.summary()

X_train, X_test, y_train, y_test = train_test_split(X,y)


model.fit(X_train, y_train, validation_split = 0.3, epochs = 6)
y_pred = (model.predict(X_test)>=0.5).astype(int)
y_pred
accuracy_score(y_test,y_pred)
print(classification_report(y_test,y_pred))

52
import pickle
pickle.dump(model, open("fakenews.model", 'wb'))
loaded_model = pickle.load(open("fakenews.model", 'rb'))
t = tokenizer.texts_to_sequences([t])
t = pad_sequences(t,maxlen = max_len)
print((loaded_model.predict(t) >= 0.5).astype(int))

!pip install modelbit


import modelbit
mb = modelbit.login()

def predict_news(news: str) -> str:


"""
Example model that doubles numbers
>>> double_number(21)
42
"""
t = tokenizer.texts_to_sequences([news])
t = pad_sequences(t,maxlen = max_len)
prediction = (loaded_model.predict(t) >= 0.5).astype(int)

return prediction

mb.deploy(predict_news)

53
APPENDIX II – PROJECT OUTCOME

54
55
56
57
INSTITUTION VISION & MISSION
Vision

• To emerge as a leader among the top institutions in the field of technical education.

Mission

• Produce smart technocrats with empirical knowledge who can surmount the global
challenges.

• Create a diverse, fully-engaged, learner-centric campus environment to provide


quality education to the students.

• Maintain mutually beneficial partnerships with our alumni, industry and professional
associations.

DEPARTMENT VISION & MISSION


Vision

To excel in education, innovation, and research in Artificial Intelligence and Data Science to
fulfil industrial demands and societal expectations.

Mission

M1: To educate future engineers with solid fundamentals, continually improving teaching
methods using modern tools.

M2: To collaborate with industry and offer top-notch facilities in a conducive learning
environment.

M3: To foster skilled engineers and ethical innovation in AI and Data Science for global
recognition and impactful research.

M4: To tackle the societal challenge of producing capable professionals by instilling


employability skills and human values.

Programme Educational Objectives (PEOs)

PEO 1: Compete on a global scale for a professional career in Artificial Intelligence and
Data Science.

PEO 2: Provide industry-specific solutions for the society with effective communication and
ethics.

PEO 3: Hone their professional skills through research and lifelong learning initiatives.
Programme Outcomes (POs):

PO1: Engineering knowledge: Apply the knowledge of mathematics, science, engineering


fundamentals, and an engineering specialization to the solution of complex engineering
problems.

PO2: Problem analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of
mathematics, natural sciences, and engineering sciences.

PO3: Design/development of solutions: Design solutions for complex engineering problems


and design system components or processes that meet the specified needs with appropriate
consideration for the public health and safety, and the cultural, societal, and environmental
considerations.

PO4: Conduct investigations of complex problems: Use research-based knowledge and


research methods including design of experiments, analysis and interpretation of data, and
synthesis of the information to provide valid conclusions.

PO5: Modern tool usage: Create, select, and apply appropriate techniques, resources, and
modern engineering and IT tools including prediction and modeling to complex engineering
activities with an understanding of the limitations.

PO6: The engineer and society: Apply reasoning informed by the contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent responsibilities
relevant to the professional engineering practice.

PO7: Environment and sustainability: Understand the impact of the professional


engineering solutions in societal and environmental contexts, and demonstrate the knowledge
of, and need for sustainable development.

PO8: Ethics: Apply ethical principles and commit to professional ethics and responsibilities
and norms of the engineering practice.

PO 9: Individual and team work: Function effectively as an individual, and as a member or


leader in diverse teams, and in multidisciplinary settings.
PO10: Communication: Communicate effectively on complex engineering activities with
the engineering community and with society at large, such as, being able to comprehend and
write effective reports and design documentation, make effective presentations, and give and
receive clear instructions.

PO11: Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.

PO12: Life-long learning: Recognize the need for, and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological change.

Programme Specific Outcomes (PSOs):

PSO1: Capable of finding the important factors in large datasets, simplify the data, and
improve predictive model accuracy.

PSO2: Capable of analyzing and providing a solution to a given real-world problem by


designing an effective program.

You might also like