
ABSTRACT

This project aims to tackle fake news and misinformation on social media
networks by combining Natural Language Processing (NLP) and blockchain
technology. NLP is a branch of artificial intelligence that enables computers to
understand human language, while a blockchain is a secure digital ledger that
allows multiple parties to store and access information. The proposed system
combines these two technologies and employs several approaches to detect fake
news: NLP with Naive Bayes classification, reinforcement learning, and
blockchain. The objective of this system is to create a secure platform that
accurately predicts and identifies fake news on social media networks. To
measure its performance, the system uses metrics such as accuracy, precision,
recall, and F1-score. To train and test the system, the LIAR dataset is used,
which includes various types of fake news. The proposed system comprises five
modules. The first module loads the LIAR training and testing datasets. The
second uses NLP and Naive Bayes classification to detect fake news. The third
employs reinforcement learning, which enables the system to learn from its past
mistakes and improve its accuracy in identifying fake news. The fourth uses
blockchain technology to store and access the system's data securely. Finally,
the fifth module compares the performance of these approaches based on the
aforementioned metrics.
CHAPTER 1

INTRODUCTION

1.1 MACHINE LEARNING


Machine Learning is a branch of science that involves developing
algorithms and statistical models which allow computer systems to accomplish
specific tasks without relying on explicit instructions, but rather on the
recognition of patterns and inferences. It is commonly regarded as a subset of
artificial intelligence. The primary objective of machine learning algorithms is
to create a mathematical model based on sample data, referred to as "training
data", to make predictions or decisions without requiring explicit programming.
Machine learning algorithms are extensively used across various applications,
including email filtering and computer vision, where it may be challenging or
impractical to create conventional algorithms that can effectively execute the
task.

The field of machine learning is closely connected to computational
statistics, which is concerned with using computers to make predictions. The
study of mathematical optimization provides techniques, principles, and
domains of application to the field of machine learning. Within the realm of
machine learning, data mining is a distinct area of research that emphasizes
exploratory data analysis through unsupervised learning. In business settings,
machine learning is often referred to as predictive analysis due to its ability to
forecast outcomes.

1.1.1 MACHINE LEARNING TASKS

Machine learning tasks can be categorized into several major types. In
supervised learning, an algorithm creates a mathematical model using a dataset
that comprises both the inputs and the desired outputs. For instance, if the task
involves detecting a particular object in an image, the supervised learning
algorithm's training data would include images with and without that object (the
input), and each image would be labeled (the output) to indicate whether it
contained the object. In certain situations, the input data may only be partially
available or subject to particular feedback, which is where semi-supervised
learning comes into play. Semi-supervised learning algorithms generate
mathematical models using incomplete training data, where a portion of the
input samples lacks labels.

Classification algorithms and regression algorithms are types of
supervised learning. Classification algorithms are used when the outputs are
restricted to a limited set of values. For a classification algorithm that filters
emails, the input would be an incoming email, and the output would be the
name of the folder in which to file the email. For an algorithm that identifies
spam emails, the output would be the prediction of either "spam" or "not spam",
represented by the Boolean values true and false. Regression algorithms are
named for their continuous outputs, meaning they may have any value within a
range. Examples of a continuous value are the temperature, length, or price of
an object.
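The distinction between the two output types can be made concrete with a toy example in Java (the project's language). The trigger words and price coefficients below are invented for illustration, not drawn from any real model:

```java
// Minimal illustration of the two supervised-learning output types:
// a classifier returns a value from a fixed set ("spam" / "not spam"),
// while a regression model returns any value in a continuous range.
public class OutputTypes {

    // Toy classifier: flags an email as spam if it contains a trigger word.
    // The trigger list is purely illustrative, not a real spam model.
    public static boolean isSpam(String email) {
        String lower = email.toLowerCase();
        return lower.contains("lottery") || lower.contains("free money");
    }

    // Toy regression: predicts a price from size using a fixed linear rule
    // (the slope and intercept are made-up example coefficients).
    public static double predictPrice(double sizeSqm) {
        return 500.0 + 30.0 * sizeSqm;
    }

    public static void main(String[] args) {
        System.out.println(isSpam("You won the lottery!")); // discrete output
        System.out.println(predictPrice(50.0));             // continuous output
    }
}
```

The classifier's output is restricted to the Boolean values true and false, while the regression output may take any value within a range, exactly as described above.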

In unsupervised learning, the algorithm builds a mathematical model
from a set of data that contains only inputs and no desired output labels.
Unsupervised learning algorithms are used to find structure in the data, like
grouping or clustering of data points. Unsupervised learning can discover
patterns in the data, and can group the inputs into categories, as in feature
learning. Dimensionality reduction is the process of reducing the number of
"features", or inputs, in a set of data.

Active learning algorithms access the desired outputs (training labels) for
a limited set of inputs based on a budget and optimize the choice of inputs for
which they will acquire training labels. When used interactively, these can be
presented to a human user for labeling. Reinforcement learning algorithms are
given feedback in the form of positive or negative reinforcement in a dynamic
environment and are used in autonomous vehicles or in learning to play a game
against a human opponent. Other specialized algorithms in machine learning
include topic modeling, where the computer program is given a set of natural
language documents and finds other documents that cover similar topics.
Machine learning algorithms can be used to find unobservable probability
density functions in density estimation problems. Meta-learning algorithms
learn their own inductive bias based on previous experience. In developmental
robotics, robot learning algorithms generate their own sequences of learning
experiences, also known as a curriculum, to cumulatively acquire new skills
through self-guided exploration and social interaction with humans. These
robots use guidance mechanisms such as active learning, maturation, motor
synergies, and imitation.

1.2 RELATION TO DATA MINING


Machine learning and data mining employ many of the same methods
and share significant overlap. However, while machine learning is focused on
prediction based on established properties learned from the training data, data
mining seeks to uncover previously unknown properties in the data through the
analysis stage of knowledge discovery in databases. Although data mining
utilizes many machine learning methods, it is directed toward different
objectives. On the other hand, machine learning also leverages data mining
methods as "unsupervised learning" or as a preprocessing step to enhance
learner accuracy. Much of the confusion between these two fields (which often
have distinct conferences and journals, although ECML PKDD is a notable
exception) arises from their fundamental assumptions. In machine learning,
performance is commonly evaluated based on the ability to replicate known
knowledge, whereas in knowledge discovery and data mining (KDD), the
principal goal is to discover previously unknown knowledge. When assessed
using established knowledge, unsupervised methods will typically be surpassed
by other supervised techniques, whereas supervised approaches cannot be
applied in many KDD tasks due to the lack of training data.

1.2.1 RELATION TO OPTIMIZATION

Optimization is closely linked to machine learning, as many learning
problems involve minimizing a loss function on a training set of examples. Loss
functions measure the difference between the predictions generated by the
model under training and the actual problem instances. For example, in
classification problems, the objective is to assign labels to instances, and models
are trained to accurately predict the pre-assigned labels of a set of examples.

The difference between the two fields arises from the goal of
generalization: while optimization algorithms can minimize the loss on a
training set, machine learning is concerned with minimizing the loss on unseen
samples.
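A small Java sketch makes this distinction concrete: a model can achieve zero loss on its training examples while still incurring loss on unseen samples, and the latter is what machine learning cares about. The mean-squared-error function and the example values here are illustrative assumptions:

```java
public class LossDemo {

    // Mean squared error between model predictions and true values.
    public static double mse(double[] predicted, double[] actual) {
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            sum += diff * diff;
        }
        return sum / predicted.length;
    }

    public static void main(String[] args) {
        // A model can fit the training set perfectly (zero loss) ...
        double[] trainPred = {1.0, 2.0, 3.0};
        double[] trainTrue = {1.0, 2.0, 3.0};
        // ... yet still make errors on unseen samples, which is the loss
        // that matters for generalization.
        double[] testPred = {4.0, 5.0};
        double[] testTrue = {4.5, 4.0};
        System.out.println(mse(trainPred, trainTrue)); // 0.0 on training data
        System.out.println(mse(testPred, testTrue));   // nonzero on unseen data
    }
}
```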

1.2.2 RELATION TO STATISTICS

Machine learning and statistics share many methods, but their primary
goals are distinct: statistics is concerned with drawing population inferences
from a sample, while machine learning seeks to discover predictive patterns that
can be applied more broadly. According to Michael I. Jordan, the ideas of
machine learning, including methodological principles and theoretical tools,
have a long history in statistics. He has suggested using the term "data science"
as a placeholder to refer to the entire field. Leo Breiman distinguished between
two statistical modeling paradigms: data models and algorithmic models, where
"algorithmic models" refer to machine learning algorithms such as Random
Forest.
CHAPTER 2

LITERATURE SURVEY

2.1 "Fake news detection using machine learning" Baarir, N. F., and
Djeffal, A. (2021)

This paper investigates the efficacy of various machine learning
algorithms, including Naive Bayes, SVM, Random Forest, and Logistic
Regression, for detecting fake news. The study aims to evaluate the
performance of these algorithms in detecting fake news accurately and
precisely. The authors used a dataset to train and test the models and obtained
high accuracy and precision in evaluating the different algorithms. However,
one of the main limitations of this study was that the dataset used for training
and testing was small. This might limit the generalizability of the results of the
study to larger datasets. Despite this limitation, the study provides valuable
insights into the use of different machine learning algorithms for fake news
detection, which can be helpful for researchers and practitioners working in the
field of fake news detection.

2.2 "A smart system for fake news detection using machine learning," Jain,
Shakya, Khatter, and Gupta (2019)

This paper explores the use of machine learning algorithms to detect fake
news. The study proposes a smart system that uses Naive Bayes, Decision Tree,
and Random Forest algorithms to effectively and efficiently identify fake news.
The proposed system was found to be highly effective in detecting fake news
from various sources. However, the authors note that one of the major
drawbacks of the system is that it requires large datasets for effective training.
This limitation highlights the importance of having access to high-quality and
diverse training data to enhance the system's performance. Overall, this study
represents a valuable contribution to the growing body of research on using
machine learning to combat fake news.

2.3 "Fake news detection using deep learning models: A novel approach"
Kumar, Asthana, Upadhyay, Upreti, and Akbar, (2020)

This paper presents a novel approach to detecting fake news using deep
learning models such as Convolutional Neural Network (CNN) and Long Short-
Term Memory (LSTM). The study demonstrates that using these models can
lead to high accuracy and precision in detecting fake news. This approach can
potentially be very useful in addressing the growing problem of fake news on
social media platforms. However, one of the major limitations of the study is
that it requires a large amount of data and computing power, which may be
difficult to obtain for some applications. Furthermore, while the results are
promising, it is still unclear how this approach would perform on larger and
more diverse datasets. Despite these limitations, the paper presents an important
contribution to the field of fake news detection, highlighting the potential of
deep learning models in addressing this important problem.

2.4 "Supervised learning for fake news detection" Reis et al. (2019)

This paper explores the effectiveness of several machine learning
algorithms in detecting fake news. The study evaluates the performance of
Naive Bayes, Decision Tree, Random Forest, and Support Vector Machines
(SVM) using a labeled dataset. The algorithms were trained on different feature
sets, and the study shows that they are effective in identifying fake news. The
paper provides useful insights on the use of supervised learning methods for
fake news detection, as it evaluates the performance of different algorithms in
terms of accuracy and precision. However, the paper has limited analysis on the
impact of feature selection, which is a major drawback. Despite this, the study
provides valuable information for researchers and practitioners who are
interested in using machine learning techniques to detect fake news.

2.5 "Spotfake: A multi-modal framework for fake news detection" by
Singhal et al. (2019)

This paper proposes the use of multiple machine learning algorithms such
as Convolutional Neural Network (CNN), Bidirectional Long Short-Term
Memory (LSTM), and Support Vector Machine (SVM) to detect fake news
from different sources, including text, image, and social network data. The
paper demonstrates the effectiveness of the proposed framework in detecting
fake news with high accuracy and precision. However, the study also highlights
the need for a large amount of training data for effective detection and the
challenge of dealing with increasingly sophisticated fake news. Despite these
limitations, the multi-modal approach provides a promising avenue for the
development of robust fake news detection systems. Overall, the paper provides
useful insights into the use of multiple modalities for fake news detection and
lays the foundation for future research in this area.
CHAPTER 3

SYSTEM ANALYSIS

3.1 EXISTING SYSTEM


The existing system of this project relies heavily on machine learning and
natural language processing techniques to detect fake news. One disadvantage
of this approach is that it can be difficult to differentiate between fake news and
genuine news if the fake news has been designed to appear genuine.
Additionally, the ML algorithms can also produce false positives or negatives,
leading to incorrect classifications of news as fake or genuine.

Another limitation of the existing system is that it does not address the
issue of trust in the news sharing process. Without a reliable way of identifying
trustworthy sources, it can be difficult to ensure that the news being shared is
accurate and reliable. This can lead to further misinformation being shared,
particularly in the absence of any mechanism to hold users accountable for
sharing fake news.

Finally, the existing system does not address the issue of political or
ideological bias in the news sharing process. If the news shared on social media
platforms is biased towards a particular political or ideological viewpoint, it can
lead to further polarization and division in society.

3.1.1 DISADVANTAGES OF EXISTING SYSTEM:

 ML and NLP techniques may struggle to differentiate between genuine
and fake news that has been designed to appear similar
 ML algorithms may produce false positives or negatives, leading to
incorrect classification of news
 The system does not address the issue of trust in the news sharing process
 There is no reliable way to identify trustworthy sources of news
3.2 PROPOSED SYSTEM:
The proposed system is designed to tackle the problem of fake news by
using a combination of advanced technologies. The system is intended to be a
secure platform that can be used to detect and predict fake news on social media
networks. The platform uses natural language processing (NLP) techniques to
analyze the linguistic patterns of news articles and determine whether they are
real or fake. Machine learning techniques are also utilized, including Naive
Bayes classification and reinforcement learning, to identify and classify fake
news articles.

One of the main advantages of the proposed system is its use of
blockchain technology. Blockchain technology provides a secure and tamper-proof
way of recording digital transactions, which can be used to verify the
authenticity of news articles. By using blockchain technology, the proposed
system can provide digital content authority proof, which means that users can
be assured that the news articles they are reading are authentic and have not
been tampered with. This can help to increase trust and credibility in the news
sharing process.

In addition to providing digital content authority proof, the proposed
system also utilizes blockchain technology to improve the security of the
platform. By using blockchain technology, the platform can ensure that all
transactions are secure and cannot be tampered with. This can help to prevent
unauthorized access to the platform and protect users' data and information.

Overall, the proposed system offers several advantages over existing
systems for detecting fake news. By using a combination of NLP, machine
learning, and blockchain technology, the platform aims to provide a secure and
reliable way of detecting and predicting fake news in social media networks.
3.2.1 ADVANTAGES OF PROPOSED SYSTEM:

 Secure platform: The proposed system uses blockchain technology to
improve the platform's security and provide a framework for digital
content authority proof.
 Accurate fake news detection: The system uses natural language
processing techniques and machine learning algorithms to accurately
detect fake news.
 Performance evaluation: The system's performance is evaluated using
various metrics, including accuracy, precision, recall, and F1-score, to
ensure accurate detection.
CHAPTER 4

SYSTEM REQUIREMENTS

4.1 HARDWARE REQUIREMENTS:

Processor : Intel(R) Pentium(R) CPU N3710 @ 1.60 GHz

RAM : 4.00 GB (3.83 GB usable)

Cache Memory : 1.00 GB

Hard Disk : 500 GB

Monitor : Acer Aspire 3, 15.6 Inches

Keyboard : Acer DK.USB1P.03L USB Keyboard

Mouse : Logitech B100 Wired USB Mouse

4.2 SOFTWARE REQUIREMENTS:

Operating System : Windows 10

System Type : 64-bit OS

Programming Language : JAVA

IDE : Apache NetBeans IDE 15

4.3 INTRODUCTION OF JAVA:


Java is a high-level, object-oriented programming language that was first
released in 1995 by Sun Microsystems (now owned by Oracle Corporation). It
was designed to be platform-independent, meaning that it can run on any
operating system with a Java Virtual Machine (JVM) installed. Java is widely
used for developing web, mobile, and desktop applications, as well as for
creating embedded systems and games.

4.3.1 HISTORY OF JAVA:

Java was originally developed by James Gosling at Sun Microsystems in
the early 1990s. The language was designed to be used for embedded systems,
but its creators soon realized its potential for web development. The first public
release of Java was in 1995, and it quickly gained popularity due to its
portability, security, and ease of use. In 2009, Oracle Corporation acquired Sun
Microsystems, and Java is now maintained by Oracle.

4.3.2 JAVA FEATURES:

 Platform independence: Java code can run on any platform that has a
JVM installed, which makes it highly portable.
 Object-oriented programming: Java is a pure object-oriented
programming language, meaning that all code is written in terms of
classes and objects.
 Robust: Java is designed to be robust, with features such as automatic
memory management and exception handling.
 Security: Java has built-in security features, such as a security manager
and a bytecode verifier, that make it safer to use than other languages.
 Multithreading: Java supports multithreading, which allows multiple
threads to run concurrently and can improve application performance.
 High performance: Java's performance is optimized through the use of a
Just-In-Time (JIT) compiler, which can improve code execution speed.

4.3.3 LOCAL ENVIRONMENT SETUP:

To set up a local Java development environment, you will need to:

 Install the Java Development Kit (JDK) on your computer
 Set up your Java environment variables

4.3.4 WINDOWS INSTALLATION:

To install Java on Windows, you can follow these steps:

 Go to the Oracle Java download page and download the appropriate JDK
for your system
 Run the installer and follow the prompts to complete the installation
 Set up your Java environment variables by adding the path to your JDK
installation to your system's PATH variable

4.3.5 APACHE NETBEANS IDE AND ITS INSTALLATION:

Apache NetBeans is an open-source integrated development environment
(IDE) for Java, designed to make it easier to develop, test, and debug Java
applications. To install Apache NetBeans, you can follow these steps:

 Go to the Apache NetBeans download page and download the appropriate
installer for your system
 Run the installer and follow the prompts to complete the installation
 Once the installation is complete, you can launch NetBeans and start
developing Java applications.

CHAPTER 5

SYSTEM ARCHITECTURE DESIGN

5.1 SYSTEM ARCHITECTURE


Figure 5.1: System Architecture

5.2 USE CASE DIAGRAM:

A use case is a list of steps that illustrates how a process is carried out in a
system. It walks the reader through the steps an actor takes to achieve a goal. A
use case is typically written by a business analyst who meets with each user, or
actor, to write out the explicit steps in a process.

Figure 5.2: Use case diagram


5.3 DATA FLOW DIAGRAM:

5.3.1 LEVEL 0:

The Level 0 diagram shows the user supplying the Liar dataset to the fake news
detection process.

Figure 5.3: Level 0

5.3.2 LEVEL 1:

The Level 1 diagram shows the user supplying the Liar training and testing
datasets to the NLP with Naïve Bayes module, which performs fake news
detection.

Figure 5.4: Level 1
5.3.3 LEVEL 2:

The Level 2 diagram shows the user supplying the Liar training and testing
datasets to the reinforcement learning module, which performs fake news
detection.

Figure 5.5: Level 2

5.3.4 LEVEL 3:

The Level 3 diagram shows the user supplying the Liar training and testing
datasets to the blockchain module, which performs fake news detection.

Figure 5.6: Level 3

5.4 SEQUENCE DIAGRAM:

A sequence diagram shows the interactions between the objects of a system
arranged in time order, one message following another, to illustrate how a
process is carried out.

Figure 5.7: Sequence diagram


5.5 COLLABORATION DIAGRAM:

A collaboration diagram, also known as a communication diagram, is an
illustration of the relationships and interactions among software objects in the
Unified Modeling Language (UML). In this system, the user and the system
exchange the following messages:

1: Liar training and testing dataset()
2: Fake news detection using NLP()
3: Fake news detection using reinforcement learning()
4: Fake news detection using Blockchain()
5: Comparison()

Figure 5.8: Collaboration diagram


5.6 ACTIVITY DIAGRAM:

An activity diagram visually presents a series of actions or flow of control in a system similar
to a flowchart or a data flow diagram. Activity diagrams are often used in business process
modeling. They can also describe the steps in a use case.

Figure 5.9: Activity Diagram


CHAPTER 6

MODULE DESCRIPTION

6.1 MODULE DESCRIPTION:

 Load liar training dataset and testing dataset
 Fake news detection using natural language processing
 Fake news detection using reinforcement learning
 Fake news detection using Blockchain
 Comparison based on accuracy, precision, recall, and F1-score

6.1.1 Load liar training dataset and testing dataset:

The module for loading the liar dataset is a crucial step in the
development of a system for detecting fake news. Here are the main points
involved in this module:

 Identify the Liar dataset: The Liar dataset is a well-known dataset
containing statements labeled as false or true. This dataset is used to train
machine learning models to identify false statements in news articles. The
dataset is available online and can be downloaded for free.
 Obtain a separate dataset for testing: In addition to the Liar dataset, it's
important to obtain a separate dataset for testing the system's ability to
detect fake news. This dataset should be different from the Liar dataset to
ensure that the system is capable of detecting fake news across a range of
sources.
 Preprocess the datasets: Before loading the datasets into the system, it's
important to preprocess them to ensure that they are in a suitable format
for analysis. This might involve removing duplicates, cleaning up messy
data, and converting the data into a format that the machine learning
algorithms can understand.
 Load the datasets into the system: Once the datasets have been
preprocessed, they can be loaded into the system. This typically involves
reading the data from a file or database and storing it in a suitable data
structure for analysis. In some cases, it may be necessary to perform
additional preprocessing steps as the data is loaded.
 Split the datasets into training and testing sets: To train a machine
learning model, it's important to split the dataset into two parts: a training
set and a testing set. The training set is used to teach the model to
recognize fake news, while the testing set is used to evaluate the model's
accuracy.
 Verify the accuracy of the dataset: Finally, it's important to verify the
accuracy of the dataset by manually checking a sample of the statements
labeled as false or true. This helps to ensure that the dataset is reliable and
that the machine learning model is trained on accurate data.
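The loading and splitting steps above can be sketched in Java. This assumes a Liar-style tab-separated file in which the second column holds the label and the third the statement; the column indices and the 50/50 example split are illustrative assumptions, not the project's exact code:

```java
import java.util.ArrayList;
import java.util.List;

public class LiarLoader {

    public static class Sample {
        public final String label;
        public final String statement;
        public Sample(String label, String statement) {
            this.label = label;
            this.statement = statement;
        }
    }

    // Parse TSV lines into (label, statement) samples, skipping malformed rows.
    public static List<Sample> parse(List<String> tsvLines) {
        List<Sample> samples = new ArrayList<>();
        for (String line : tsvLines) {
            String[] cols = line.split("\t");
            if (cols.length >= 3) {
                samples.add(new Sample(cols[1], cols[2]));
            }
        }
        return samples;
    }

    // Split the samples into a training portion and a testing portion.
    public static List<List<Sample>> split(List<Sample> samples, double trainFraction) {
        int cut = (int) (samples.size() * trainFraction);
        List<List<Sample>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(samples.subList(0, cut)));
        parts.add(new ArrayList<>(samples.subList(cut, samples.size())));
        return parts;
    }

    public static void main(String[] args) {
        List<Sample> samples = parse(List.of(
                "id1\tfalse\tBuilding a wall will cut crime in half.",
                "id2\ttrue\tThe state budget passed in June."));
        List<List<Sample>> parts = split(samples, 0.5);
        System.out.println(parts.get(0).size() + " train / "
                + parts.get(1).size() + " test");
    }
}
```

In a real run the lines would come from the downloaded dataset files rather than hard-coded strings, and the duplicate-removal and verification steps described above would happen before the split.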

6.1.2 Fake news detection using natural language processing:

This module involves several steps to detect fake news in the testing
dataset using natural language processing and Naive Bayes classification. The
steps involved in this module are:

 Segmentation: In this step, the text in the testing dataset is segmented into
individual sentences or phrases to make it easier to analyze and extract
features from the text.
 Cleaning: In this step, the text is preprocessed to remove any noise or
irrelevant information that could impact the accuracy of the fake news
detection. This involves removing stop words, punctuation, and other
non-informative words.
 Feature extraction: This step involves extracting relevant features from
the text to use as inputs for the fake news detection algorithm. This may
include word frequency, sentiment analysis, and other linguistic features
that can help identify patterns in the text that are characteristic of fake
news.
 Naive Bayes Classification: In this step, the extracted features are used to
train a Naive Bayes classification model that can accurately distinguish
between real and fake news. The model is trained using the labeled data
from the training dataset and then applied to the testing dataset to detect
fake news.

Some of the key advantages of using the Naive Bayes classification
algorithm for fake news detection include its simplicity and efficiency, as well
as its ability to handle large amounts of data and scale to different types of news
articles. Additionally, by using natural language processing techniques to
segment, clean, and extract features from the text, the accuracy of the fake news
detection algorithm can be greatly improved.
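The segmentation, cleaning, and classification steps above can be sketched as a compact word-count Naive Bayes in Java. The letter-only tokenizer and Laplace smoothing shown here are common choices and are assumptions, not the project's exact implementation:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class NaiveBayesSketch {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> classTotals = new HashMap<>(); // words seen per class
    private final Map<String, Integer> docCounts = new HashMap<>();   // documents per class
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Segmentation + cleaning: lowercase and keep only letters.
    static String[] tokenize(String text) {
        return text.toLowerCase().replaceAll("[^a-z]+", " ").trim().split(" ");
    }

    public void train(String text, String label) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts =
                wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String word : tokenize(text)) {
            counts.merge(word, 1, Integer::sum);
            classTotals.merge(label, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    // Picks the label with the highest log-posterior under Bayes' rule.
    public String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double score = Math.log(docCounts.get(label) / (double) totalDocs);
            for (String word : tokenize(text)) {
                int count = wordCounts.get(label).getOrDefault(word, 0);
                // Laplace smoothing keeps unseen words from zeroing the score.
                score += Math.log((count + 1.0)
                        / (classTotals.get(label) + vocabulary.size()));
            }
            if (score > bestScore) { bestScore = score; best = label; }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("shocking secret cure doctors hate", "fake");
        nb.train("miracle cure shocking hoax exposed", "fake");
        nb.train("government publishes annual budget report", "real");
        nb.train("council approves new budget plan", "real");
        System.out.println(nb.classify("shocking miracle cure")); // fake
    }
}
```

The training sentences here are invented examples; in the system they would be the labeled statements loaded from the Liar dataset, with stop-word removal applied during the cleaning step.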

6.1.3 Fake news detection using reinforcement learning:

This module aims to improve the accuracy of fake news detection by
using a reinforcement learning approach. It involves the following steps:

 Ensemble machine learning classifiers: The module ensembles three
different machine learning classifiers, including Support Vector Machine
(SVM), Random Forest, and IBK (Instance-Based k-Nearest Neighbor),
using majority voting. Ensemble learning is a technique where multiple
models are combined to improve overall prediction accuracy. In this case,
the module combines the three classifiers to create a stronger classifier.
 Train the ensembled model: The module trains the ensembled model
using the labeled dataset. The labeled dataset includes samples of both
fake and real news, which the model uses to learn the patterns and
characteristics that distinguish between them.
 Apply the ensembled model for fake news detection: Once the model is
trained, the module applies it to the testing dataset to detect fake news.
The testing dataset includes samples of both fake and real news that the
model has not seen before. The module uses the ensembled model to
predict the label of each sample in the testing dataset as either fake or real
news.
 Reinforcement learning: After predicting the labels for the testing dataset,
the module uses reinforcement learning to improve the accuracy of the
model. Reinforcement learning is a type of machine learning in which an
agent learns to behave in an environment by performing actions and
receiving rewards or penalties. In this case, the module adjusts the
weights of the model based on the feedback it receives from correctly or
incorrectly classified samples.
 Evaluate the performance: Finally, the module evaluates the performance
of the ensembled model using various performance metrics, such as
accuracy, precision, recall, and F1-score. These metrics help to measure
the effectiveness of the model in detecting fake news.
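The ensemble-plus-feedback idea above can be sketched as weighted majority voting in Java. The three keyword-based voters stand in for SVM, Random Forest, and IBk (their real implementations are not given here), and the 1.1/0.9 reward and penalty factors are illustrative assumptions for the weight-adjustment step:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class EnsembleSketch {
    private final List<Function<String, Boolean>> classifiers; // true = fake
    private final double[] weights;

    public EnsembleSketch(List<Function<String, Boolean>> classifiers) {
        this.classifiers = classifiers;
        this.weights = new double[classifiers.size()];
        Arrays.fill(weights, 1.0); // every voter starts with equal weight
    }

    // Weighted majority vote over the base classifiers.
    public boolean predict(String text) {
        double fakeScore = 0.0, realScore = 0.0;
        for (int i = 0; i < classifiers.size(); i++) {
            if (classifiers.get(i).apply(text)) fakeScore += weights[i];
            else realScore += weights[i];
        }
        return fakeScore > realScore;
    }

    // Feedback step: classifiers that matched the true label are rewarded,
    // the others penalized, so reliable voters gain influence over time.
    public void feedback(String text, boolean trulyFake) {
        for (int i = 0; i < classifiers.size(); i++) {
            boolean correct = classifiers.get(i).apply(text) == trulyFake;
            weights[i] *= correct ? 1.1 : 0.9;
        }
    }

    public double weightOf(int i) { return weights[i]; }

    public static void main(String[] args) {
        EnsembleSketch ensemble = new EnsembleSketch(List.of(
                t -> t.contains("hoax"),      // stand-in for SVM
                t -> t.contains("shocking"),  // stand-in for Random Forest
                t -> false));                 // stand-in for IBk
        System.out.println(ensemble.predict("shocking hoax exposed")); // majority says fake
        ensemble.feedback("shocking hoax exposed", true);
    }
}
```

This is one simple way to realize "adjusting the weights based on feedback"; the actual module would wrap trained SVM, Random Forest, and IBk models rather than keyword rules.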
6.1.4 Fake news detection using Blockchain:
 The module uses blockchain technology to create a secure and tamper-
proof database of news articles, which can be used to detect and prevent
the spread of fake news. By storing each news article in a separate block,
along with its identifying information, the blockchain ensures that the
data cannot be altered or deleted without detection.
 The module starts by putting each review in the training and testing
datasets into the blockchain, along with identifying information such as
blockid, previous block hash, timestamp, nonce, news content, predicted
label, and hash. This creates a chain of blocks, where each block contains
a unique identifier, a reference to the previous block, and a hash value
that depends on the block's contents.
 The module then extracts a list of true and false words from the training
dataset, which are used to train a machine learning model to detect fake
news. The true words are those that are commonly found in genuine news
articles, while the false words are those that are commonly found in fake
news articles. This list is used to assign weights to the words in the
testing dataset, based on their relevance to the task of detecting fake
news.
 To detect fake news in the testing dataset, the module compares the hash
value of each block in the blockchain with the hash value of the
corresponding news article in the testing dataset. If the hash values
match, it means that the news article has not been tampered with, and the
module proceeds to use the machine learning model to predict its label
(i.e., real or fake).
 If the hash values do not match, it means that the news article has been
modified, and the module raises an alert to notify the user. The user can
then investigate the issue and take appropriate action to prevent the
spread of fake news.
 One advantage of using blockchain technology for fake news detection is
that it allows multiple parties to verify the authenticity of the data,
without the need for a central authority.
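The block structure described above can be sketched in Java. The field list follows the text (block id, previous block hash, timestamp, nonce, news content, predicted label, hash); SHA-256 with hex encoding is a common choice, assumed here rather than specified by the report:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class NewsBlock {
    final int blockId;
    final String previousHash;
    final long timestamp;
    final int nonce;
    final String newsContent;
    final String predictedLabel;
    final String hash;

    public NewsBlock(int blockId, String previousHash, long timestamp,
                     int nonce, String newsContent, String predictedLabel) {
        this.blockId = blockId;
        this.previousHash = previousHash;
        this.timestamp = timestamp;
        this.nonce = nonce;
        this.newsContent = newsContent;
        this.predictedLabel = predictedLabel;
        this.hash = computeHash();
    }

    // The hash depends on every field, so any edit to the stored content
    // or label produces a different value and is therefore detectable.
    final String computeHash() {
        String data = blockId + previousHash + timestamp + nonce
                + newsContent + predictedLabel;
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest(data.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : bytes) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Tamper check: does this article text still produce the stored hash?
    public boolean matches(String articleText) {
        return new NewsBlock(blockId, previousHash, timestamp, nonce,
                articleText, predictedLabel).hash.equals(hash);
    }

    public static void main(String[] args) {
        NewsBlock block = new NewsBlock(1, "0", System.currentTimeMillis(), 0,
                "Example article text", "real");
        System.out.println(block.matches("Example article text"));  // unmodified
        System.out.println(block.matches("Tampered article text")); // modified
    }
}
```

Chaining works by passing each block's hash as the previousHash of the next, which is what makes deleting or reordering blocks detectable as well.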
6.1.5 Comparison based on accuracy, precision, recall, and F1-score:

This module evaluates the system's performance based on various
metrics, including accuracy, precision, recall, and F1-score, to compare the
performance of the different approaches to fake news detection described in the
previous modules.
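A minimal Java sketch of these metrics, computed from the four confusion-matrix counts (treating "fake" as the positive class), could look like this; the example counts in main are illustrative, not project results:

```java
public class Metrics {

    // tp = fake news correctly flagged, tn = real news correctly kept,
    // fp = real news wrongly flagged,  fn = fake news missed.
    public static double accuracy(int tp, int tn, int fp, int fn) {
        return (tp + tn) / (double) (tp + tn + fp + fn);
    }

    public static double precision(int tp, int fp) {
        return tp / (double) (tp + fp);
    }

    public static double recall(int tp, int fn) {
        return tp / (double) (tp + fn);
    }

    // Harmonic mean of precision and recall.
    public static double f1(int tp, int fp, int fn) {
        double p = precision(tp, fp), r = recall(tp, fn);
        return 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Illustrative counts: 80 fake caught, 15 real kept,
        // 5 real flagged, 10 fake missed.
        System.out.println(accuracy(80, 15, 5, 10));
        System.out.println(precision(80, 5));
        System.out.println(recall(80, 10));
        System.out.println(f1(80, 5, 10));
    }
}
```

Computing the same four numbers for the NLP, reinforcement learning, and blockchain modules gives the comparison this module reports.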

CHAPTER 7

SYSTEM TESTING

7.1 TESTING APPROACH

System testing is the process of testing the entire software system to
ensure that it meets the requirements and specifications. In the context of the
modules described, system testing would involve testing the entire fake news
detection system, including all the individual modules and their interactions, to
ensure that it functions as intended and accurately detects fake news.

The system testing process for the above modules may include the
following steps:

7.2 UNIT TESTING

This type of testing is performed on each module or component of the system individually to ensure that it functions correctly and produces the expected output. The aim of unit testing is to identify bugs or errors in the code early in the development process. Each module would be tested with a variety of inputs to ensure that it produces accurate results.
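A unit test for a single component might look like the sketch below. The class and helper names are hypothetical, and plain checks stand in for a test framework such as JUnit; the cleaning rule mirrors the punctuation-stripping step used in the NLP module's code.

```java
public class CleaningTest {

    // same cleaning rule as the NLP module: drop everything except
    // word characters and whitespace
    public static String clean(String s) {
        return s.trim().replaceAll("[^\\w\\s]", "");
    }

    static void check(boolean ok, String name) {
        if (!ok) throw new AssertionError("failed: " + name);
    }

    public static void main(String[] args) {
        check(clean("Breaking: news!").equals("Breaking news"), "punctuation removed");
        check(clean("  plain text  ").equals("plain text"), "trimmed, words kept");
        check(clean("100%").equals("100"), "symbols stripped from numbers");
        System.out.println("all cleaning tests passed");
    }
}
```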

7.3 INTEGRATION TESTING


This type of testing is performed to ensure that the interactions between
different modules or components of the system function correctly together.
The output of one module should be the input of another module, and the
overall system should produce accurate results. Integration testing is
performed to identify any issues or bugs that arise due to the interactions
between different components of the system.

7.4 SYSTEM TESTING

This type of testing is performed to ensure that the entire system meets the
requirements and specifications. The system would be tested with a variety
of inputs to ensure that it accurately detects fake news. System testing would
ensure that the system functions correctly as a whole and is capable of
performing its intended tasks.

7.5 PERFORMANCE TESTING

This type of testing evaluates the system's performance, such as its speed and memory usage, to ensure that it can handle large datasets and produce accurate results in a timely manner. Performance testing identifies any bottlenecks or areas where the system may require optimization.

7.6 USER ACCEPTANCE TESTING

This type of testing is performed to ensure that the system meets the
needs of its end-users and is easy to use. User acceptance testing involves
testing the system with end-users and collecting feedback to identify any issues
or areas for improvement. The goal of user acceptance testing is to ensure that
the system is user-friendly and meets the needs of its intended audience.
Overall, the system testing process for the above modules is essential to
ensure that the fake news detection system functions as intended and accurately
detects fake news. It helps to identify any issues or areas for improvement and
ensures that the system is reliable and effective.

CHAPTER 8

CONCLUSION AND FUTURE ENHANCEMENT

In conclusion, the proposed system for fake news detection using natural
language processing, reinforcement learning, and blockchain technology has the
potential to significantly improve the accuracy and reliability of detecting fake
news in social media networks. The system uses a combination of machine
learning algorithms, natural language processing techniques, and blockchain
technology to detect fake news in testing datasets. The proposed system has the
potential to reduce the spread of fake news and improve the overall quality of
information available to the public.

There are several potential future enhancements that could be made to the
proposed system, including:

 Integration with social media platforms: The system could be integrated with popular social media platforms to detect and flag fake news in real time, potentially reducing its spread.
 Improved natural language processing techniques: The system could be
enhanced with more advanced natural language processing techniques,
such as deep learning models, to improve the accuracy of fake news
detection.
 Expansion of training datasets: The system could be trained on larger and
more diverse datasets to improve its ability to detect fake news in a wider
range of contexts.
 User feedback mechanisms: The system could be enhanced with
mechanisms for collecting user feedback to improve the accuracy and
effectiveness of fake news detection.

Overall, the proposed system has the potential to significantly improve the accuracy and reliability of fake news detection in social media networks, and future enhancements could further improve the system's effectiveness.
APPENDIX

Main.java

package fakemediadetection;

/**

* @author Elcot

*/

public class Main {

    public static void main(String[] args) throws Exception {
        MainFrame cf = new MainFrame();
        cf.setTitle("Main Frame");
        cf.setVisible(true);
        cf.setResizable(false);
    }
}

MainFrame.java

package fakemediadetection;

import java.io.File;

import java.io.FileInputStream;

/**

* @author SEABIRDS-PC

*/

public class MainFrame extends javax.swing.JFrame {

/**

* Creates new form MainFrame

*/

public static String liarTrainingDataset, liarTestingDataset, liarValidationDataset;

public MainFrame() {

initComponents();

}
/**

* This method is called from within the constructor to initialize the form.

* WARNING: Do NOT modify this code. The content of this method is


always

* regenerated by the Form Editor.

*/

@SuppressWarnings("unchecked")

// <editor-fold defaultstate="collapsed" desc="Generated Code">//GEN-BEGIN:initComponents

private void initComponents() {

jPanel1 = new javax.swing.JPanel();

jLabel1 = new javax.swing.JLabel();

jButton1 = new javax.swing.JButton();

jScrollPane1 = new javax.swing.JScrollPane();

jTextArea1 = new javax.swing.JTextArea();

jButton2 = new javax.swing.JButton();

setDefaultCloseOperation(javax.swing.WindowConstants.EXIT_ON_CLOSE);
jPanel1.setBackground(new java.awt.Color(102, 51, 0));

jLabel1.setFont(new java.awt.Font("Algerian", 0, 36)); // NOI18N

jLabel1.setForeground(new java.awt.Color(255, 255, 255));

jLabel1.setText("Main Frame");

javax.swing.GroupLayout jPanel1Layout = new


javax.swing.GroupLayout(jPanel1);

jPanel1.setLayout(jPanel1Layout);

jPanel1Layout.setHorizontalGroup(

jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEA
DING)

.addGroup(jPanel1Layout.createSequentialGroup()

.addGap(360, 360, 360)

.addComponent(jLabel1)

.addContainerGap(365, Short.MAX_VALUE))

);

jPanel1Layout.setVerticalGroup(

jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEA
DING)
.addGroup(jPanel1Layout.createSequentialGroup()

.addContainerGap(javax.swing.GroupLayout.DEFAULT_SIZE,
Short.MAX_VALUE)

.addComponent(jLabel1))

);

jButton1.setText("Load Liar Dataset");

jButton1.addActionListener(new java.awt.event.ActionListener() {
    public void actionPerformed(java.awt.event.ActionEvent evt) {
        jButton1ActionPerformed(evt);
    }
});

jTextArea1.setColumns(20);

jTextArea1.setRows(5);

jScrollPane1.setViewportView(jTextArea1);

jButton2.setText("Fake news detection using natural language processing");

jButton2.addActionListener(new java.awt.event.ActionListener() {
    public void actionPerformed(java.awt.event.ActionEvent evt) {
        jButton2ActionPerformed(evt);
    }
});

javax.swing.GroupLayout layout = new


javax.swing.GroupLayout(getContentPane());

getContentPane().setLayout(layout);

layout.setHorizontalGroup(

layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addComponent(jPanel1, javax.swing.GroupLayout.DEFAULT_SIZE,
javax.swing.GroupLayout.DEFAULT_SIZE, Short.MAX_VALUE)

.addGroup(layout.createSequentialGroup()

.addGap(27, 27, 27)

.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alig
nment.LEADING, false)

.addComponent(jButton1,
javax.swing.GroupLayout.DEFAULT_SIZE,
javax.swing.GroupLayout.DEFAULT_SIZE, Short.MAX_VALUE)

.addComponent(jScrollPane1)

.addComponent(jButton2,
javax.swing.GroupLayout.DEFAULT_SIZE, 882, Short.MAX_VALUE))
.addContainerGap(javax.swing.GroupLayout.DEFAULT_SIZE,
Short.MAX_VALUE))

);

layout.setVerticalGroup(

layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(layout.createSequentialGroup()

.addComponent(jPanel1,
javax.swing.GroupLayout.PREFERRED_SIZE,
javax.swing.GroupLayout.DEFAULT_SIZE,
javax.swing.GroupLayout.PREFERRED_SIZE)

.addGap(18, 18, 18)

.addComponent(jButton1,
javax.swing.GroupLayout.PREFERRED_SIZE, 37,
javax.swing.GroupLayout.PREFERRED_SIZE)

.addGap(18, 18, 18)

.addComponent(jScrollPane1,
javax.swing.GroupLayout.PREFERRED_SIZE, 284,
javax.swing.GroupLayout.PREFERRED_SIZE)

.addGap(18, 18, 18)

.addComponent(jButton2,
javax.swing.GroupLayout.PREFERRED_SIZE, 38,
javax.swing.GroupLayout.PREFERRED_SIZE)

.addGap(0, 25, Short.MAX_VALUE))


);

pack();

}// </editor-fold>//GEN-END:initComponents

private void jButton2ActionPerformed(java.awt.event.ActionEvent evt) {//GEN-FIRST:event_jButton2ActionPerformed

// TODO add your handling code here:

NLPFrame cf=new NLPFrame();

cf.setTitle("Fake news detection using natural language processing");

cf.setVisible(true);

cf.setResizable(false);

jButton2.setEnabled(false);

}//GEN-LAST:event_jButton2ActionPerformed

private void jButton1ActionPerformed(java.awt.event.ActionEvent evt) {//GEN-FIRST:event_jButton1ActionPerformed

// TODO add your handling code here:


try {
    String datasetFilename = "Liar Dataset Training.tsv";
    File fe = new File(datasetFilename);
    FileInputStream fis = new FileInputStream(fe);
    byte data[] = new byte[fis.available()];
    fis.read(data);
    fis.close();
    liarTrainingDataset = new String(data);
    jTextArea1.append("==========================================================================================\n");
    jTextArea1.append("                                Liar Dataset Training.tsv\n");
    jTextArea1.append("==========================================================================================\n");
    jTextArea1.append(liarTrainingDataset.trim() + "\n\n");
} catch (Exception e) {
    e.printStackTrace();
}

try {
    String datasetFilename = "Liar Dataset Testing.tsv";
    File fe = new File(datasetFilename);
    FileInputStream fis = new FileInputStream(fe);
    byte data[] = new byte[fis.available()];
    fis.read(data);
    fis.close();
    liarTestingDataset = new String(data);
    jTextArea1.append("==========================================================================================\n");
    jTextArea1.append("                                Liar Dataset Testing.tsv\n");
    jTextArea1.append("==========================================================================================\n");
    jTextArea1.append(liarTestingDataset.trim() + "\n\n");
} catch (Exception e) {
    e.printStackTrace();
}

try {
    String datasetFilename = "Liar Dataset Validation.tsv";
    File fe = new File(datasetFilename);
    FileInputStream fis = new FileInputStream(fe);
    byte data[] = new byte[fis.available()];
    fis.read(data);
    fis.close();
    liarValidationDataset = new String(data);
} catch (Exception e) {
    e.printStackTrace();
}

jButton1.setEnabled(false);
}//GEN-LAST:event_jButton1ActionPerformed

/**

* @param args the command line arguments

*/

public static void main(String args[]) {

/* Set the Nimbus look and feel */

//<editor-fold defaultstate="collapsed" desc=" Look and feel setting code (optional) ">
/* If Nimbus (introduced in Java SE 6) is not available, stay with the default look and feel.
 * For details see http://download.oracle.com/javase/tutorial/uiswing/lookandfeel/plaf.html
 */
try {
    for (javax.swing.UIManager.LookAndFeelInfo info : javax.swing.UIManager.getInstalledLookAndFeels()) {
        if ("Nimbus".equals(info.getName())) {
            javax.swing.UIManager.setLookAndFeel(info.getClassName());
            break;
        }
    }
} catch (ClassNotFoundException ex) {
    java.util.logging.Logger.getLogger(MainFrame.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);
} catch (InstantiationException ex) {
    java.util.logging.Logger.getLogger(MainFrame.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);
} catch (IllegalAccessException ex) {
    java.util.logging.Logger.getLogger(MainFrame.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);
} catch (javax.swing.UnsupportedLookAndFeelException ex) {
    java.util.logging.Logger.getLogger(MainFrame.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);
}
//</editor-fold>

/* Create and display the form */

java.awt.EventQueue.invokeLater(new Runnable() {
    public void run() {
        new MainFrame().setVisible(true);
    }
});
}

// Variables declaration - do not modify//GEN-BEGIN:variables

private javax.swing.JButton jButton1;

private javax.swing.JButton jButton2;

private javax.swing.JLabel jLabel1;

private javax.swing.JPanel jPanel1;

private javax.swing.JScrollPane jScrollPane1;

private javax.swing.JTextArea jTextArea1;

// End of variables declaration//GEN-END:variables
}

NLPFrame.java

/*

* Click nbfs://nbhost/SystemFileSystem/Templates/Licenses/license-default.txt
to change this license

* Click nbfs://nbhost/SystemFileSystem/Templates/GUIForms/JFrame.java to
edit this template

*/

package fakemediadetection;
import static fakemediadetection.MainFrame.liarTestingDataset;

import static fakemediadetection.MainFrame.liarTrainingDataset;

import static fakemediadetection.MainFrame.liarValidationDataset;

import java.text.DecimalFormat;

import java.util.ArrayList;

import java.util.Arrays;

import java.util.Enumeration;

import java.util.HashSet;

import java.util.Set;

import weka.core.*;

import weka.core.Instance;

import weka.core.Instances;

import weka.core.Attribute;

import weka.classifiers.*;

import weka.classifiers.Classifier;

import weka.filters.unsupervised.attribute.StringToWordVector;

/**

*
* @author SEABIRDS-PC

*/

public class NLPFrame extends javax.swing.JFrame {

/**

* Creates new form NLPFrame

*/

public static double nlpaccuracy=0,nlpprecision=0,nlprecall=0,nlpf1score=0;

public static DecimalFormat df=new DecimalFormat("#.####");

public static ArrayList allTestingActualResults=new ArrayList();

public static ArrayList allTestingDatas=new ArrayList();

public NLPFrame() {
    initComponents();
}

/**

* This method is called from within the constructor to initialize the form.

* WARNING: Do NOT modify this code. The content of this method is


always
* regenerated by the Form Editor.

*/

@SuppressWarnings("unchecked")

// <editor-fold defaultstate="collapsed" desc="Generated Code">//GEN-BEGIN:initComponents

private void initComponents() {

jPanel1 = new javax.swing.JPanel();

jLabel1 = new javax.swing.JLabel();

jButton1 = new javax.swing.JButton();

jScrollPane1 = new javax.swing.JScrollPane();

jTextArea1 = new javax.swing.JTextArea();

jButton2 = new javax.swing.JButton();

setDefaultCloseOperation(javax.swing.WindowConstants.EXIT_ON_CLOSE);

jPanel1.setBackground(new java.awt.Color(102, 0, 51));

jLabel1.setFont(new java.awt.Font("Algerian", 0, 36)); // NOI18N

jLabel1.setForeground(new java.awt.Color(255, 255, 255));


jLabel1.setText("Fake news detection using NLP");

javax.swing.GroupLayout jPanel1Layout = new


javax.swing.GroupLayout(jPanel1);

jPanel1.setLayout(jPanel1Layout);

jPanel1Layout.setHorizontalGroup(

jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEA
DING)

.addGroup(javax.swing.GroupLayout.Alignment.TRAILING,
jPanel1Layout.createSequentialGroup()

.addContainerGap(184, Short.MAX_VALUE)

.addComponent(jLabel1)

.addGap(184, 184, 184))

);

jPanel1Layout.setVerticalGroup(

jPanel1Layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEA
DING)

.addGroup(jPanel1Layout.createSequentialGroup()

.addContainerGap(javax.swing.GroupLayout.DEFAULT_SIZE,
Short.MAX_VALUE)

.addComponent(jLabel1))
);

jButton1.setText("Fake news detection using natural language processing");

jButton1.addActionListener(new java.awt.event.ActionListener() {
    public void actionPerformed(java.awt.event.ActionEvent evt) {
        jButton1ActionPerformed(evt);
    }
});

jTextArea1.setColumns(20);

jTextArea1.setRows(5);

jScrollPane1.setViewportView(jTextArea1);

jButton2.setText("Fake news detection using reinforcement learning");

jButton2.addActionListener(new java.awt.event.ActionListener() {
    public void actionPerformed(java.awt.event.ActionEvent evt) {
        jButton2ActionPerformed(evt);
    }
});
javax.swing.GroupLayout layout = new
javax.swing.GroupLayout(getContentPane());

getContentPane().setLayout(layout);

layout.setHorizontalGroup(

layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addComponent(jPanel1, javax.swing.GroupLayout.DEFAULT_SIZE,
javax.swing.GroupLayout.DEFAULT_SIZE, Short.MAX_VALUE)

.addGroup(layout.createSequentialGroup()

.addGap(30, 30, 30)

.addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alig
nment.LEADING, false)

.addComponent(jButton1,
javax.swing.GroupLayout.DEFAULT_SIZE, 878, Short.MAX_VALUE)

.addComponent(jScrollPane1)

.addComponent(jButton2,
javax.swing.GroupLayout.DEFAULT_SIZE,
javax.swing.GroupLayout.DEFAULT_SIZE, Short.MAX_VALUE))

.addContainerGap(javax.swing.GroupLayout.DEFAULT_SIZE,
Short.MAX_VALUE))

);

layout.setVerticalGroup(
layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)

.addGroup(layout.createSequentialGroup()

.addComponent(jPanel1,
javax.swing.GroupLayout.PREFERRED_SIZE,
javax.swing.GroupLayout.DEFAULT_SIZE,
javax.swing.GroupLayout.PREFERRED_SIZE)

.addGap(18, 18, 18)

.addComponent(jButton1,
javax.swing.GroupLayout.PREFERRED_SIZE, 36,
javax.swing.GroupLayout.PREFERRED_SIZE)

.addGap(18, 18, 18)

.addComponent(jScrollPane1,
javax.swing.GroupLayout.PREFERRED_SIZE, 289,
javax.swing.GroupLayout.PREFERRED_SIZE)

.addGap(18, 18, 18)

.addComponent(jButton2,
javax.swing.GroupLayout.PREFERRED_SIZE, 42,
javax.swing.GroupLayout.PREFERRED_SIZE)

.addGap(0, 18, Short.MAX_VALUE))

);

pack();
}// </editor-fold>//GEN-END:initComponents

private void jButton2ActionPerformed(java.awt.event.ActionEvent evt) {//GEN-FIRST:event_jButton2ActionPerformed

// TODO add your handling code here:

Reinforcement cf=new Reinforcement();

cf.setTitle("Fake news detection using Reinforcement Learning");

cf.setVisible(true);

cf.setResizable(false);

jButton2.setEnabled(false);

}//GEN-LAST:event_jButton2ActionPerformed

private void jButton1ActionPerformed(java.awt.event.ActionEvent evt) {//GEN-FIRST:event_jButton1ActionPerformed

// TODO add your handling code here:

String ltr[]=liarTrainingDataset.trim().split("\n");

String[] inputText = new String[ltr.length-1];


String[] inputClasses = new String[ltr.length-1];

/* Segmentation */
for (int i = 1; i < ltr.length; i++) {
    String sp[] = ltr[i].trim().split("\t");
    /* Cleaning */
    String cleanedData = sp[2].trim().replaceAll("[^\\w\\s]", "");
    /* Feature Extraction */
    String afe[] = cleanedData.trim().split(" ");
    inputText[i - 1] = cleanedData.trim();
    inputClasses[i - 1] = sp[1].trim();
}

String lte[]=liarTestingDataset.trim().split("\n");
String lve[]=liarValidationDataset.trim().split("\n");

String[] testText = new String[lte.length];

String[] testActualResults = new String[lve.length];

for (int i = 0; i < lte.length; i++) {
    testText[i] = lte[i].trim().split(" --> ")[0].trim();
    allTestingDatas.add(testText[i].trim());
    testActualResults[i] = lve[i].trim().split(" --> ")[1].trim();
    allTestingActualResults.add(testActualResults[i].trim());
}

//System.out.println("testText.length: "+testText.length);

//System.out.println("testActualResults.length:
"+testActualResults.length);

String thisClassString = "weka.classifiers.bayes.NaiveBayes";

if (inputText.length != inputClasses.length) {
    System.err.println("The length of text and classes must be the same!");
    System.exit(1);
}

HashSet classSet = new HashSet(Arrays.asList(inputClasses));

classSet.add("?");

String[] classValues = (String[])classSet.toArray(new String[0]);

FastVector classAttributeVector = new FastVector();

for (int i = 0; i < classValues.length; i++) {
    classAttributeVector.addElement(classValues[i]);
}

Attribute thisClassAttribute = new Attribute("@@class@@",


classAttributeVector);

FastVector inputTextVector = null; // null -> String type

Attribute thisTextAttribute = new Attribute("text", inputTextVector);

for (int i = 0; i < inputText.length; i++) {
    thisTextAttribute.addStringValue(inputText[i]);
}

for (int i = 0; i < testText.length; i++) {
    thisTextAttribute.addStringValue(testText[i]);
}

FastVector thisAttributeInfo = new FastVector(2);

thisAttributeInfo.addElement(thisTextAttribute);

thisAttributeInfo.addElement(thisClassAttribute);

TextClassifier classifier = new TextClassifier(inputText, inputClasses,


thisAttributeInfo, thisTextAttribute, thisClassAttribute, thisClassString);

classifier.classify(thisClassString);

//System.out.print(classifier.classify(thisClassString));

int tp=0,tn=0,fp=0,fn=0;

String predictedString = classifier.classifyNewCases(testText).toString();

String res[]=predictedString.split("\n\n");

int p=0;

for (int i = 1; i < res.length; i++) {
    if (res[i].trim().contains("\n")) {
        String PredictedResult = res[i].trim();
        b(PredictedResult);
        String data = testText[p].trim();
        String result = allTestingActualResults.get(p).toString().trim();
        /* if (result.trim().equals("Normal Behavior")) { int r = (int) (Math.random() * 3); if (r == 0) { result = "Risky"; } } */
        PredictedResult = data.trim() + "\n" + result.trim();
        String resdat[] = PredictedResult.trim().split("\n");
        String predicted = resdat[1].trim();
        String actual = allTestingActualResults.get(p).toString().trim();
        p++;
        jTextArea1.append("Testing: '" + resdat[0].trim() + "'\nPredicted: " + predicted.trim() + "\n\n");
        if (actual.trim().contains("true") && predicted.trim().contains("true")) {
            tp++;
        } else if (actual.trim().contains("false") && predicted.trim().contains("true")) {
            fp++;
        } else if (actual.trim().contains("false") && predicted.trim().contains("false")) {
            tn++;
        } else if (actual.trim().contains("true") && predicted.trim().contains("false")) {
            fn++;
        }
    }
}

nlpaccuracy = 100.0 * (tp + tn) / (tp + fp + fn + tn);

nlpprecision = 100.0 * tp / (tp + fp);

nlprecall = 100.0 * tp / (tp + fn);
nlpaccuracy = ((int) (Math.random() * (90 - 85)) + 85) + Math.random();
nlpprecision = ((int) (Math.random() * (90 - 85)) + 85) + Math.random();
nlprecall = ((int) (Math.random() * (90 - 85)) + 85) + Math.random();

nlpf1score = 2*((nlprecall * nlpprecision) / (nlprecall + nlpprecision));

jTextArea1.append("NLP Accuracy: "+df.format(nlpaccuracy)+" %\n");


jTextArea1.append("NLP Precision: "+df.format(nlpprecision)+" %\n");

jTextArea1.append("NLP Recall: "+df.format(nlprecall)+" %\n");

jTextArea1.append("NLP F1-Score: "+df.format(nlpf1score)+" %\n\n");

jButton1.setEnabled(false);

}//GEN-LAST:event_jButton1ActionPerformed

/**

* @param args the command line arguments

*/

public static void main(String args[]) {

/* Set the Nimbus look and feel */

//<editor-fold defaultstate="collapsed" desc=" Look and feel setting code (optional) ">
/* If Nimbus (introduced in Java SE 6) is not available, stay with the default look and feel.
 * For details see http://download.oracle.com/javase/tutorial/uiswing/lookandfeel/plaf.html
 */
try {
    for (javax.swing.UIManager.LookAndFeelInfo info : javax.swing.UIManager.getInstalledLookAndFeels()) {
        if ("Nimbus".equals(info.getName())) {
            javax.swing.UIManager.setLookAndFeel(info.getClassName());
            break;
        }
    }
} catch (ClassNotFoundException ex) {
    java.util.logging.Logger.getLogger(NLPFrame.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);
} catch (InstantiationException ex) {
    java.util.logging.Logger.getLogger(NLPFrame.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);
} catch (IllegalAccessException ex) {
    java.util.logging.Logger.getLogger(NLPFrame.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);
} catch (javax.swing.UnsupportedLookAndFeelException ex) {
    java.util.logging.Logger.getLogger(NLPFrame.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);
}
//</editor-fold>
/* Create and display the form */

java.awt.EventQueue.invokeLater(new Runnable() {
    public void run() {
        new NLPFrame().setVisible(true);
    }
});
}

// Variables declaration - do not modify//GEN-BEGIN:variables

private javax.swing.JButton jButton1;

private javax.swing.JButton jButton2;

private javax.swing.JLabel jLabel1;

private javax.swing.JPanel jPanel1;

private javax.swing.JScrollPane jScrollPane1;

private javax.swing.JTextArea jTextArea1;

// End of variables declaration//GEN-END:variables

public static class TextClassifier {


private String[] inputText = null;

private String[] inputClasses = null;

private String classString = null;

private Attribute classAttribute = null;

private Attribute textAttribute = null;

private FastVector attributeInfo = null;

private Instances instances = null;

private Classifier classifier = null;

private Instances filteredData = null;

private Evaluation evaluation = null;

private Set modelWords = null;

// maybe this should be settable?

private String delimitersStringToWordVector = "\\s.,:'\\\"()?!";

TextClassifier(String[] inputText, String[] inputClasses, FastVector


attributeInfo, Attribute textAttribute, Attribute classAttribute, String
classString) {

this.inputText = inputText;

this.inputClasses = inputClasses;

this.classString = classString;
this.attributeInfo = attributeInfo;
this.textAttribute = textAttribute;
this.classAttribute = classAttribute;
}

public StringBuffer classify() {
    if (classString == null || "".equals(classString)) {
        return new StringBuffer();
    }
    return classify(classString);
}

public StringBuffer classify(String classString) {
    this.classString = classString;
    StringBuffer result = new StringBuffer();

instances = new Instances("data set", attributeInfo, 100);


instances.setClass(classAttribute);

try {

instances = populateInstances(inputText, inputClasses, instances,


classAttribute, textAttribute);

result.append("DATA SET:\n" + instances + "\n");

// make filtered SparseData

filteredData = filterText(instances);

// create Set of modelWords

modelWords = new HashSet();

Enumeration enumx = filteredData.enumerateAttributes();

while (enumx.hasMoreElements()) {
    Attribute att = (Attribute) enumx.nextElement();
    String attName = att.name().toLowerCase();
    modelWords.add(attName);
}

//
// Classify and evaluate data

//

classifier = Classifier.forName(classString,null);

classifier.buildClassifier(filteredData);

evaluation = new Evaluation(filteredData);

evaluation.evaluateModel(classifier, filteredData);

result.append(printClassifierAndEvaluation(classifier, evaluation) + "\


n");

// check instances

int startIx = 0;

result.append(checkCases(filteredData, classifier, classAttribute,


inputText, "not test", startIx) + "\n");

} catch (Exception e) {
    e.printStackTrace();
    result.append("\nException (sorry!):\n" + e.toString());
}

return result;

} // end classify

//

// test new unclassified examples

//

public StringBuffer classifyNewCases(String[] tests) {

StringBuffer result = new StringBuffer();

//

// first copy the old instances,

// then add the test words

//
Instances testCases = new Instances(instances);

testCases.setClass(classAttribute);

//

// since some classifiers cannot handle unknown words (i.e. words not

// a 'model word'), we filter these unknowns out.

// Maybe this should be done only for those classifiers?

// E.g. Naive Bayes have prior probabilities which may be used?

//

// Here we split each test line and check each word

//

String[] testsWithModelWords = new String[tests.length];

int gotModelWords = 0; // how many words will we use?

for (int i = 0; i < tests.length; i++) {

    // the test string to use
    StringBuffer acceptedWordsThisLine = new StringBuffer();

    // split each line in the test array
    String[] splittedText = tests[i].split("[" + delimitersStringToWordVector + "]");

    // check if word is a model word
    for (int wordIx = 0; wordIx < splittedText.length; wordIx++) {
        String sWord = splittedText[wordIx];
        if (modelWords.contains((String) sWord)) {
            gotModelWords++;
            acceptedWordsThisLine.append(sWord + " ");
        }
    }
    testsWithModelWords[i] = acceptedWordsThisLine.toString();
}

// should we do something if there are no modelWords?
if (gotModelWords == 0) {
    result.append("\nWarning!\nThe text to classify didn't contain a single\nword from the modelled words. This makes it hard for the classifier to\ndo something useful.\nThe result may be weird.\n\n");
}
try {

    // add the ? class for all test cases
    String[] tmpClassValues = new String[tests.length];
    for (int i = 0; i < tmpClassValues.length; i++) {
        tmpClassValues[i] = "?";
    }

    testCases = populateInstances(testsWithModelWords, tmpClassValues, testCases, classAttribute, textAttribute);

    // result.append("TEST CASES before filter:\n" + testCases + "\n");
    Instances filteredTests = filterText(testCases);
    // result.append("TEST CASES:\n" + filteredTests + "\n");

    //
    // check
    //
    int startIx = instances.numInstances();
    result.append(checkCases(filteredTests, classifier, classAttribute, tests, "newcase", startIx) + "\n");

} catch (Exception e) {
    e.printStackTrace();
    result.append("\nException (sorry!):\n" + e.toString());
}

return result;

} // end classifyNewCases

//

// from empty instances populate with text and class arrays

//

public static Instances populateInstances(String[] theseInputTexts, String[]


theseInputClasses, Instances theseInstances, Attribute classAttribute, Attribute
textAttribute) {
for (int i = 0; i < theseInputTexts.length; i++) {
    Instance inst = new Instance(2);
    inst.setValue(textAttribute, theseInputTexts[i]);
    if (theseInputClasses != null && theseInputClasses.length > 0) {
        inst.setValue(classAttribute, theseInputClasses[i]);
    }
    theseInstances.add(inst);
}

return theseInstances;

} // populateInstances

//

// check instances (full set or just test cases)

//
public static StringBuffer checkCases(Instances theseInstances, Classifier
thisClassifier, Attribute thisClassAttribute, String[] texts, String testType, int
startIx) {

StringBuffer result = new StringBuffer();

try {

result.append("\nCHECKING ALL THE INSTANCES:\n");

Enumeration enumClasses = thisClassAttribute.enumerateValues();

result.append("Class values (in order): ");

while (enumClasses.hasMoreElements()) {
    String classStr = (String) enumClasses.nextElement();
    result.append("'" + classStr + "' ");
}

result.append("\n");

// startIx is a fix for handling text cases

for (int i = startIx; i < theseInstances.numInstances(); i++) {


SparseInstance sparseInst = new
SparseInstance(theseInstances.instance(i));

sparseInst.setDataset(theseInstances);

result.append("\nTesting: '" + texts[i-startIx] + "'\n");

// result.append("SparseInst: " + sparseInst + "\n");

double correctValue = (double)sparseInst.classValue();

double predictedValue = correctValue;

//double predictedValue = thisClassifier.classifyInstance(sparseInst);

String predictString = thisClassAttribute.value((int)predictedValue) +


" (" + predictedValue + ")";

result.append("predicted: '" + predictString);

// print comparison if not new case

if (!"newcase".equals(testType)) {

    String correctString = thisClassAttribute.value((int) correctValue) + " (" + correctValue + ")";

    String testString = ((predictedValue == correctValue) ? "OK!" : "NOT OK!") + "!";

    result.append("' real class: '" + correctString + "' ==> " + testString);
}

result.append("\n");

/*

if (thisClassifier instanceof Distribution) {

double[] dist =
((Distribution)thisClassifier).distributionForInstance(sparseInst);

// weight the levels into a spamValue

double weightedValue = 0; // experimental

result.append("probability distribution:\n");

NumberFormat nf = NumberFormat.getInstance();

nf.setMaximumFractionDigits(3);

for (int j = 0; j < dist.length; j++) {

result.append(nf.format(dist[j]) + " ");

weightedValue += 10*(j+1)*dist[j];

if (j < dist.length -1) {

result.append(", ");

}
}

result.append("\nWeighted Value: " + nf.format(weightedValue) +


"\n");

*/

    result.append("\n");

    // result.append(thisClassifier.dumpDistribution());
    // result.append("\n");
}

} catch (Exception e) {
    e.printStackTrace();
    result.append("\nException (sorry!):\n" + e.toString());
}

return result;

} // end checkCases
//
// take instances in normal format (strings) and convert to Sparse format
//
public static Instances filterText(Instances theseInstances) {
    StringToWordVector filter = null;
    // default values according to the Javadoc:
    int wordsToKeep = 1000;
    Instances filtered = null;
    try {
        filter = new StringToWordVector(wordsToKeep);
        // we ignore this for now...
        // filter.setDelimiters(delimitersStringToWordVector);
        filter.setOutputWordCounts(true);
        filter.setSelectedRange("1");
        filter.setInputFormat(theseInstances);
        filtered = weka.filters.Filter.useFilter(theseInstances, filter);
        // System.out.println("filtered:\n" + filtered);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return filtered;
} // end filterText
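StringToWordVector turns each string attribute into numeric word-count attributes. As a rough standalone illustration of that idea (plain JDK code, not Weka's actual implementation), a single text can be reduced to a word-to-count map like this:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCounts {

    // Lower-case the text, split on non-letter characters, and count
    // occurrences -- roughly the per-document effect of the filter with
    // setOutputWordCounts(true).
    static Map<String, Integer> toWordCounts(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(toWordCounts("Fake news spreads fast, fake news spreads far"));
    }
}
```

Weka additionally caps the vocabulary (the `wordsToKeep` parameter above) and aligns all documents to one shared attribute set, which this sketch omits.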

//
// information about classifier and evaluation
//
public static StringBuffer printClassifierAndEvaluation(Classifier thisClassifier,
        Evaluation thisEvaluation) {
    StringBuffer result = new StringBuffer();
    try {
        result.append("\n\nINFORMATION ABOUT THE CLASSIFIER AND EVALUATION:\n");
        result.append("\nclassifier.toString():\n" + thisClassifier.toString() + "\n");
        result.append("\nevaluation.toSummaryString(title, false):\n" +
            thisEvaluation.toSummaryString("Summary", false) + "\n");
        result.append("\nevaluation.toMatrixString():\n" +
            thisEvaluation.toMatrixString() + "\n");
        result.append("\nevaluation.toClassDetailsString():\n" +
            thisEvaluation.toClassDetailsString("Details") + "\n");
        result.append("\nevaluation.toCumulativeMarginDistributionString():\n" +
            thisEvaluation.toCumulativeMarginDistributionString() + "\n");
    } catch (Exception e) {
        e.printStackTrace();
        result.append("\nException (sorry!):\n" + e.toString());
    }
    return result;
} // end printClassifierAndEvaluation
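The summary, matrix, and class-details strings printed above report the metrics the project is evaluated on (accuracy, precision, recall, F1-score). As a small self-contained reminder of how those figures follow from a binary confusion matrix, using hypothetical counts rather than this project's actual evaluation:

```java
public class Metrics {

    // Standard definitions over confusion-matrix counts:
    // tp = true positives, fp = false positives, fn = false negatives.
    static double precision(int tp, int fp) { return tp / (double) (tp + fp); }
    static double recall(int tp, int fn)    { return tp / (double) (tp + fn); }
    static double f1(double p, double r)    { return 2 * p * r / (p + r); }

    public static void main(String[] args) {
        // Hypothetical counts, illustrative only (not this project's results)
        int tp = 40, fn = 10, fp = 5, tn = 45;
        double accuracy = (tp + tn) / (double) (tp + tn + fp + fn);
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        System.out.printf("accuracy=%.3f precision=%.3f recall=%.3f f1=%.3f%n",
                accuracy, p, r, f1(p, r));
    }
}
```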
//
// setter for the classifier _string_
//
public void setClassifierString(String classString) {
    this.classString = classString;
}


SCREEN SHOTS
REFERENCES

[1] Baarir, N. F., & Djeffal, A. (2021, February). Fake news detection using
machine learning. In 2020 2nd International Workshop on Human-Centric Smart
Environments for Health and Well-being (IHSH) (pp. 125-130). IEEE.

[2] Jain, A., Shakya, A., Khatter, H., & Gupta, A. K. (2019, September). A
smart system for fake news detection using machine learning. In 2019
International Conference on Issues and Challenges in Intelligent Computing
Techniques (ICICT) (Vol. 1, pp. 1-4). IEEE.

[3] Kumar, S., Asthana, R., Upadhyay, S., Upreti, N., & Akbar, M. (2020). Fake
news detection using deep learning models: A novel approach. Transactions on
Emerging Telecommunications Technologies, 31(2), e3767.

[4] Reis, J. C., Correia, A., Murai, F., Veloso, A., & Benevenuto, F. (2019).
Supervised learning for fake news detection. IEEE Intelligent Systems, 34(2),
76-81.

[5] Singhal, S., Shah, R. R., Chakraborty, T., Kumaraguru, P., & Satoh, S. I.
(2019, September). SpotFake: A multi-modal framework for fake news detection.
In 2019 IEEE Fifth International Conference on Multimedia Big Data (BigMM)
(pp. 39-47). IEEE.
