A Sentiment and Review Analysis Based News Reliability Ranking Platform

National University of Computer and Emerging Sciences
Muhammad Talal (L16-4047)
Obaid Ur Rehman (L16-4062)
Syed Ahmad Saeed (L16-4264)
Supervisor: Dr. Asif Mahmood Gilani
B.S. Computer Science

Final Year Project
June 2020
Anti-Plagiarism Declaration
This is to declare that the above publication produced under the:
Title: A Sentiment and Review Analysis based News Ranking Platform

is the sole contribution of the author(s), and no part hereof has been reproduced on an as-is basis
(cut and paste), which can be considered plagiarism. All referenced parts have been used to
argue the idea and have been cited properly. I/We will be responsible and liable for any
consequence if violation of this declaration is determined.
Date: 03/06/2020
Student 1
Name: Muhammad Talal
Signature: _______________________
Student 2
Name: Obaid Ur Rehman
Signature: _______________________
Student 3
Name: Syed Ahmad Saeed
Signature: _______________________
Table of Contents
List of Tables
List of Figures
Abstract
Introduction
Goals and Objectives
Scope of the Project
Literature Survey / Related Work
Text Classification
Image Manipulation Detection
Requirements and Design
Functional Requirements
Non-Functional Requirements
Hardware and Software Requirements
System Architecture
Architectural Strategies
Use Cases
GUI
Database Design
System Requirements
Design Considerations
Development Methods
Class diagram
Sequence diagram
Policies and Tactics
Implementation and Test Cases
Implementation
Test Case Design and Description
Test Metrics
Experimental Results and Analysis
Text Classifier Results and Analysis
Image Tamper Detection Results and Analysis
Conclusion
References
List of Tables
Table 1: Data Dictionary Table
Table 2: Component ID and Name Mapping
Table 3: Test Case ID and Name Mapping
Table 4: Model Accuracies
Table 5: Model Precision Values
Table 6: Model Recall Rate
Table 7: Tamper Detection Model Performance
List of Figures
Figure 1: Architecture of TI-CNN
Figure 2: CNN Architecture for Image Splicing Detection
Figure 3: System Architecture
Figure 4: Login Screen
Figure 5: User Register Screen
Figure 6: News Article Feed Screen
Figure 7: News Articles Details
Figure 8: ER Diagram
Figure 9: Class Diagram
Figure 10: User Login Sequence Diagram
Figure 11: User Register Sequence Diagram
Figure 12: Show News Articles List Sequence Diagram
Figure 13: View Article Details Sequence Diagram
Figure 14: Post Comment Sequence Diagram
Figure 15: Rate an Article Sequence Diagram
Figure 16: Changing the Password Sequence Diagram
Figure 17: Posting a News Article
Figure 18: Resetting the Password
Figure 19: Updating the Profile
Figure 20: Viewing the Profile
Abstract
In the current digital era, the dissemination of fake news has become a major problem. The
prevalence of digital news sources and social media has accelerated the spread of fake news on
the internet. Moreover, with the common use of image editing software, fake news images are
also very common. The purpose of the project is to build an automated news classification and
ranking system that scrapes news articles from multiple online sources and classifies and
ranks them on the basis of their authenticity. The project uses Computer Vision, Natural
Language Processing and Deep Learning techniques to classify and rank news articles. The
main parts of a news article are its text content and its image data; our system takes both
into account when ranking a given news article. The FakeNewsNet dataset has been used to
train the text classification model, and a synthetic dataset has been used for image tamper
detection. News text is classified with a 1D-CNN with an attention mechanism that uses the
article text (headline and body) and source features; the system is able to classify text
articles with good accuracy and precision. The image tamper detection model is a Faster R-CNN
network fine-tuned for the tamper detection task. It detects tampered images with good
performance on various evaluation datasets and on human-tampered images. The system uses the
scores from both of these models to robustly rank the news articles. A web application has
also been developed to provide an interface through which users can easily read reliable news
articles that are ranked and checked by our system. An automated system like this could
potentially mitigate the spread of fake news and allow people to read news with minimal fact
checking.
Introduction
With the rise of the internet and social media, digital news sources are rapidly becoming the
dominant source of information. This in turn has given rise to fake and misleading
information such as biased articles and clickbait content. The trend of fake news in recent
times has had a major impact in misleading the masses. Moreover, with the increasing
availability of image editing tools such as Photoshop and GIMP, image tampering has become
common, and this technique is often used by news sources to create fake evidence images. Fake
news has therefore become a huge issue in recent times.
The purpose of the project is to minimize the impact of fake news content. We have used
artificial intelligence techniques such as Natural Language Processing (NLP), Deep Learning
and Computer Vision to identify and rank news articles from multiple news sources over the
web, based on their authenticity. We have performed sentiment analysis of a news article's
text in order to decide whether the text seems authentic or not. We have also incorporated
Computer Vision techniques to recognize tampered or forged images used in articles, which
further helps verify the authenticity of the news article.
A platform that is potentially immune to fake news and eliminates bias can prove to be a
great news medium. The goal is to create a platform that collects news from sources over the
web in real time, assigns a reliability rank and promotes news that seems to be authentic.
Such a platform could potentially mitigate the misleading culture of fake news.
The first chapter introduces the project, including its scope and main objectives. The second
chapter gives the detailed literature review. The third chapter covers the requirements and
design specifications of the system. The fourth chapter describes the implementation and test
cases. The fifth chapter presents the experimental results and analysis, and the sixth chapter
gives the final conclusion.
Goals and Objectives

The project has the following goals:
1. Crawl multiple dominant news sources and maintain an up-to-date document database.
2. Implement a model to classify and rank authentic and false news articles.
3. Analyse image data within news articles to identify forged and tampered news images.
4. Maintain and update a reliability rank of sources based upon previous articles to help in
identifying possibly misleading news.
5. Provide a platform for users, which allows them to read and discuss reliable news.
Scope of the Project

The project covers fields such as NLP/Text Processing, Computer Vision, Deep Learning
techniques and Web development, and touches on some topics from information retrieval. We have
combined different techniques used in similar previous research and proposed a deep
learning architecture to classify and rank news articles. We have also incorporated Computer
Vision to further improve the performance of our model.
The steps needed to accomplish this are as follows:
• Create a pipeline that iteratively collects real-time news articles from multiple
sources.
• Create a deep learning model to classify news articles using their text content and image
data.
• Cross-match facts among similar news articles from different sources to filter
anomalous and deviant news articles.
• Create a platform for users to view, interact with and post the latest news according to
the ranking assigned by the proposed system.
Literature Survey / Related Work

The following sections present a detailed survey of previous research.
Text Classification
For text classification, the main methods used are deep neural networks, which include
sequence models and convolutional neural networks. Machine learning techniques such as
logistic regression have also been used in previous research, as have language modeling
techniques. The following sections describe previous research on fake news classification.
2.1.1 Text Classification Using Neural Networks

The method proposed in [1] uses a deep neural network model named TI-CNN that incorporates
both textual and image data to classify news articles. Before training the network, textual
data is vectorized using word embeddings instead of a Bag of Words approach, because the Bag
of Words representation does not preserve the context information of words very well.
The architecture of TI-CNN is illustrated in Figure 1.
Figure 1: Architecture of TI-CNN [1]
The dataset used [22] contains 20,015 news articles, of which 11,941 are fake and 8,074 are
real. The samples contain metadata as well as the image and text content of the news articles.
Analysis of the news articles indicates that fake news articles exhibit various distinguishing
features, such as a different choice of grammar and distribution of words; moreover, fake news
articles make extensive use of altered, low-resolution images. Text analysis shows that,
compared with real news, fake news has fewer words on average, fewer sentences, more
capitalized words, significantly fewer negations and fewer second-person pronouns, while real
news has marginally fewer words per sentence, more diverse grammar and more positive sentiment.
These distinguishing text features are called explicit text features. Explicit features are
also extracted from image data; these include the number of faces and the average image
resolution. Another type of feature, called latent features, is learnt by the Convolutional
Neural Network from both the image and text data. The latent and explicit features are then
unified into the same vector space. For network training, the Rectified Linear Unit (ReLU) is
used as the activation function, Negative Log Likelihood as the loss function and RMSprop as
the optimizer. The experiments show a precision of 0.9220, a recall of 0.9277 and an F1-score
of 0.9210.
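As an illustration of the explicit text features described above, the following is a minimal sketch in Python. It is not the feature extractor of [1]: the word lists `NEGATIONS` and `SECOND_PERSON`, the tokenisation rules and the function name `explicit_text_features` are all assumptions made for this example.

```python
import re

# Illustrative word lists (assumptions, not taken from [1]).
NEGATIONS = {"no", "not", "never", "none", "nobody", "nothing", "neither", "nor"}
SECOND_PERSON = {"you", "your", "yours", "yourself", "yourselves"}


def explicit_text_features(text):
    """Compute a few hand-crafted (explicit) features of an article text."""
    # Split into sentences on terminal punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally followed by an apostrophe part ("don't").
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    lowered = [w.lower() for w in words]
    n_words = len(words)
    n_sentences = len(sentences)
    return {
        "num_words": n_words,
        "num_sentences": n_sentences,
        "words_per_sentence": n_words / n_sentences if n_sentences else 0.0,
        # Fully upper-case words longer than one letter, e.g. "BREAKING".
        "num_capitalized": sum(1 for w in words if len(w) > 1 and w.isupper()),
        "num_negations": sum(1 for w in lowered if w in NEGATIONS or w.endswith("n't")),
        "num_second_person": sum(1 for w in lowered if w in SECOND_PERSON),
    }
```

A feature dictionary of this kind could be turned into a fixed-length vector and combined with learnt (latent) features, which is the general idea the paragraph above attributes to TI-CNN.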
The system proposed in [2] incorporates speaker profiles into an attention-based LSTM model.
Adding a speaker profile greatly improved the accuracy of the model. A purely linguistic
approach is not enough for fake news detection because it is media- and topic-dependent, which
reduces scalability. Two LSTM cells are used: the first obtains the representation of the
news article and the second obtains the vector representation of the speaker profile. Two
attention factors are constructed: the first uses the speaker profile while the second uses
information about the topics of the news articles. The two representations obtained from the
cells are then concatenated and passed to a softmax function for classification. Evaluations
are performed on the LIAR dataset [23], which contains a total of 12,836 short statements from
politifact.com by 3,341 speakers, covering 141 topics. Each news item includes the text content
as well as the speaker's information. The experiment was conducted using different
combinations of the speaker profile attributes. Including credit history improved accuracy by
3%. The location of the speaker, along with job title and party affiliation, gave a further
improvement (2.3%). When all attributes were incorporated, accuracy rose to over 40%, a net
increase of 14.5% over the basic LSTM model.
The method discussed in [3] uses a hybrid deep learning model that draws on three major
characteristics of a news article (the article text, the source and the user information) for
automated classification of news articles. CSI is an abbreviation for Capture, Score and
Integrate, the three modules of the neural network model. In the Capture module, an RNN
learns features for representing the articles. The Score module uses a fully connected neural
network that takes user features as input and outputs a score vector for the users. The
Integrate module combines the feature vectors from the Capture and Score modules and uses the
combined vector to classify the news article. The dataset contains articles from Twitter and
Weibo [4], including each article's textual data as well as information about user
engagements. The CSI model achieved an accuracy of 0.892 and an F1-score of 0.894 on Twitter
articles, and an accuracy of 0.953 and an F1-score of 0.954 on Weibo articles.
2.1.2 Text Classification Using Machine Learning and Algorithmic Approaches

The system proposed in [2] uses two classification techniques: the first is based on logistic
regression, with user interaction with posts as the feature, and the second is a novel
adaptation of Boolean crowdsourcing algorithms that uses the available training sets, since no
prior assumption that users are mostly reliable can be made. The dataset for this system
consists of Facebook posts and user reactions, collected using Facebook's Graph API. The posts
were mainly taken from conspiracy and scientific pages. Posts from scientific pages were
assumed to be non-hoaxes, and posts from conspiracy and non-scientific pages were assumed to
be hoaxes. The dataset comprised a total of 15,500 posts from 32 pages: 14 conspiracy pages
and 18 scientific pages. There were more than 2,300,000 likes by more than 900,000 users.
Around 57.60% of the posts were hoaxes and around 42.40% were non-hoaxes.
In the logistic regression approach, the model learns a weight for each user and assigns
probabilities accordingly. In Boolean Label Crowdsourcing (BLC), users provide True or False
labels (Boolean values) for posts to indicate the nature of each post, for example whether a
post complies with the community guidelines. In the dataset, a user liking a post was
considered a True value. A standard cross-validation analysis was then performed for both
logistic regression and the harmonic BLC algorithm. Both techniques worked exceptionally well:
logistic regression had an accuracy of more than 99%, while harmonic BLC had an accuracy of
99.4%.
The classification power of several classic models was evaluated in [6] using three basic
types of features. The dataset consisted of 2,282 labeled BuzzFeed news articles related to
the 2016 elections in the United States of America, together with comments, shares and
Facebook post reactions extracted from Buzzface [24]. The main features used for detecting
fake news articles were news content features, features extracted from the sources of news
articles, and environment features such as user engagement. A model was trained for each
classifier on the labeled data and then used to classify whether an article is fake or real.
The best results were obtained by the Random Forest and XGB classifiers, with AUC values of
0.85 (± 0.007) and 0.86 (± 0.006), respectively.
2.1.3 Text Classification Using Language Modeling Techniques

The system discussed in [7] computes the relationship between a news article's headline and
body text and uses this relationship to find fake or clickbait news content. The dataset used
for this system is the Fake News Challenge (FNC1) dataset [25]. Another feature used by the
system is cross-matching the stance of a news article published by one source against articles
published by other sources, because the majority of news sources take the same stance on an
authentic news article. An n-gram matching technique is used to classify the body and headline
as related or unrelated: the number of n-grams that match between headline and body is
calculated and multiplied by the IDF score of the matching n-gram, and the result is
normalized by dividing by the sum of the headline length and body length. If the score is
above a threshold, the pair is labeled as related; otherwise it is labeled as unrelated. This
score contributes a weight of 25%. The articles labeled as related are further classified into
three classes, namely discuss, agree and disagree. For this purpose, logistic regression is
performed on the headline data. If the distance between the top two classes is less than a
threshold, three binary classifiers are trained on both the body and the headline of the
article; otherwise, the top class label is assigned to the sample. This classification has a
weight of 75%. The combined accuracy achieved by the system is 87%.
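The headline and body relatedness score described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of [7]: the smoothed IDF formula, the neutral weight for unseen n-grams, the default threshold of 0.05 and all function names are hypothetical choices made for this example.

```python
import math


def ngrams(tokens, n):
    """Return the list of word n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def idf_table(corpus, n):
    """Smoothed inverse document frequency of every n-gram in a tokenised corpus."""
    doc_freq = {}
    for tokens in corpus:
        for gram in set(ngrams(tokens, n)):
            doc_freq[gram] = doc_freq.get(gram, 0) + 1
    total = len(corpus)
    return {g: math.log((1 + total) / (1 + df)) + 1.0 for g, df in doc_freq.items()}


def relatedness_score(headline, body, idf, n=1):
    """IDF-weighted count of headline n-grams found in the body, length-normalised."""
    body_counts = {}
    for gram in ngrams(body, n):
        body_counts[gram] = body_counts.get(gram, 0) + 1
    score = 0.0
    for gram in set(ngrams(headline, n)):
        # N-grams missing from the IDF table get a neutral weight of 1.0 (an assumption).
        score += body_counts.get(gram, 0) * idf.get(gram, 1.0)
    length = len(headline) + len(body)
    return score / length if length else 0.0


def is_related(headline, body, idf, threshold=0.05, n=1):
    """Label a headline/body pair as related when the score exceeds the threshold."""
    return relatedness_score(headline, body, idf, n) > threshold
```

In practice the threshold would be tuned on labeled headline and body pairs; the value used here is only a placeholder.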
The system proposed in [8] combines an n-gram analysis model with machine learning techniques
for fake news detection. An n-gram is simply a contiguous sequence of items such as words,
characters or syllables. This system uses word n-grams to represent a document and produce
the features needed to classify it. Several word-based n-gram features were utilized to
analyze the effect of n-gram type on classification performance. Some preprocessing steps were
applied to clean the data: tokenization, lower-casing of words, removal of stop words and
sentence segmentation. To prevent high-dimensional data from affecting accuracy, the features
were normalized and reduced. For feature reduction, two different methods were analyzed, TF
and TF-IDF. After the data is preprocessed and the feature representation is formed using
n-grams, the classifier is trained on these feature representations. Six different machine
learning techniques were analyzed for classifying a document using these features. A public
dataset by Horne and Adali [9] was used. A second dataset was compiled from two sources:
Reuters.com for real news articles and a fake news dataset from Kaggle for fake news articles.
These datasets were used for training and testing the system. For each experiment, 80% of the
dataset was used for training and 20% for testing. The proposed model showed its best
accuracy, 92%, when using unigram features, TF-IDF feature reduction and a Linear SVM
classifier.
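The preprocessing and unigram TF-IDF weighting described above can be sketched in plain Python. This is an illustrative sketch rather than the pipeline of [8]: the tiny stop-word list, the unsmoothed IDF formula and the function names are assumptions, and the classifier itself (for example a linear SVM from a machine learning library) is left out.

```python
import math
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "to", "and"}


def preprocess(text):
    """Lower-case the text, tokenise it and remove stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one unigram TF-IDF vector per document."""
    tokenised = [preprocess(d) for d in documents]
    vocabulary = sorted({t for tokens in tokenised for t in tokens})
    n_docs = len(tokenised)
    # Number of documents that contain each term.
    doc_freq = {term: sum(1 for tokens in tokenised if term in tokens) for term in vocabulary}
    vectors = []
    for tokens in tokenised:
        vector = []
        for term in vocabulary:
            tf = tokens.count(term) / len(tokens) if tokens else 0.0
            idf = math.log(n_docs / doc_freq[term])
            vector.append(tf * idf)
        vectors.append(vector)
    return vocabulary, vectors
```

Terms that occur in every document receive a weight of zero under this formula, which is how TF-IDF suppresses uninformative words before the vectors are handed to a classifier.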
From the above studies it can be seen that neural network models, notably LSTMs, provide
better performance if enough data is available; moreover, machine learning techniques also
give promising results.
Image Manipulation Detection

For image manipulation or splicing detection, deep neural networks, especially deep
convolutional networks, are the standard approach; unsupervised or self-supervised techniques
that use image metadata are also used. The following sections describe the research in image
manipulation detection.
2.2.1 Image Manipulation Detection Using Neural Networks

The method in [10] focuses on detecting image forgeries produced by GANs. The forgery is a
translation in which an image belonging to a certain domain is mapped to a different domain.
The process uses a Generative Adversarial Network (GAN) composed of two parts, a generator
network and a discriminator network; the generator is trained until it deceives the
discriminator. The detection methods comprise generic image manipulation detectors, including
the architecture proposed by Cozzolino [11] and the network architecture by Bayar [12], and
state-of-the-art Convolutional Neural Network (CNN) architectures, namely InceptionNet v3
[13], DenseNet [14] and XceptionNet [15]. The dataset contains images generated using
CycleGAN [16]; it is divided into multiple categories, each containing both real images and
fake images generated by the network. After training the detectors on compressed and
uncompressed images, the evaluation shows an average accuracy of 89.5% with training and
testing on uncompressed data, 68.9% with training on uncompressed and testing on compressed
data, and 81.5% with training and testing on compressed data. The results show XceptionNet to
be the most accurate, with an average accuracy of 89.03% across all settings.
The system proposed in [17] uses a Convolutional Neural Network designed specifically for
detecting image splicing and copy-move forgeries. The CNN is trained on patches of images
extracted from the training set. The forged samples are extracted randomly along the borders
of the regions where forgery was done. For the negative samples, an equal number of authentic
regions are drawn from the images. The input of the CNN is of size 128x128x3 (a 128x128 patch
in 3 color channels). The architecture is shown in Figure 2.
Figure 2: CNN Architecture for Image Splicing Detection

CNN architecture for learning features from image patches [17]
This CNN extracts features from the input images. The final discriminative features are
obtained from the CNN features by a feature fusion technique and are then used to classify
the image with an SVM classifier. The proposed system achieved an accuracy of 96.04% on CASIA
Version 1.0 [26], 97.83% on CASIA Version 2.0 [26] and 96.38% on DVMM [27], outperforming many
state-of-the-art models.
2.2.2 Image Manipulation Detection Using Self-Supervised Learning

The system discussed in [18] uses anomaly detection to classify images as fake or real. The
system examines whether a given image is consistent with its metadata. The method falls under
the unsupervised or self-supervised learning paradigm; hence a labeled dataset is not
required. Exchangeable image file format (EXIF) features are used as a supervisory signal in
training the classifier. A separate classifier is learned for each EXIF tag, and all the
classifiers are combined to calculate the self-consistency of an image with its EXIF metadata.
The whole image is divided into patches of resolution 128x128, and a Siamese neural network is
used to predict the probability that a given patch is consistent with the EXIF features of the
image. The network outputs 4096-dimensional feature vectors; these vectors are concatenated
and passed through a 4-layer multi-layer perceptron, which predicts the probability that the
patches share the same value for each metadata attribute. The network is trained with image
patches randomly sampled from 400,000 Flickr photos. After calculating the consistency for
each patch, the overall consistency is computed for the image. Five datasets were used for
evaluation: the Columbia dataset [27], Carvalho et al. [19], Realistic Tampering [20], the
In-the-Wild forensics dataset [18] of images scraped from The Onion and Reddit Photoshop
Battles, and the dataset of Hays and Efros [21]. The model had the highest result on each
dataset except one. It achieved 94% accuracy on the Columbia dataset, 64% on the Carvalho
dataset, 65% on the Hays dataset and 59% on the In-the-Wild dataset.
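The patch-based processing described above can be illustrated with a small sketch that tiles an image into 128x128 patches and aggregates per-patch scores. It is not the method of [18]: the non-overlapping stride, the use of a plain mean as the image-level aggregate and the function names are assumptions made for this example.

```python
def patch_grid(height, width, patch=128, stride=128):
    """Top-left (row, col) corners of all full patches that fit in the image."""
    rows = range(0, height - patch + 1, stride)
    cols = range(0, width - patch + 1, stride)
    return [(r, c) for r in rows for c in cols]


def image_consistency(patch_scores):
    """Aggregate per-patch consistency scores into one image-level score.

    A plain mean is used here as a placeholder aggregate (an assumption).
    """
    return sum(patch_scores) / len(patch_scores) if patch_scores else 0.0
```

For a 256x384 image this yields six non-overlapping patches; a smaller stride would give overlapping patches and a denser consistency map at a higher computational cost.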
From the literature survey it can be seen that supervised deep learning techniques provide
better results. The self-supervised techniques also provide promising results; however, in
real-world news article images, metadata is not present. Hence, supervised techniques are more
suitable.
Requirements and Design

Functional Requirements
Following are the Functional Requirements of the system
1. The system shall periodically scrape news articles from various news sources.
2. The system shall store the scraped news articles in a relational database.
3. The system shall classify a given news article according to its authenticity.
4. The system shall separately analyse image data extracted from the news article to detect
image manipulation.
5. The system shall rank the news articles based on their image and text classification
labels.
6. The system shall allow users to create an account.
7. The system shall allow users to view the news articles from various news sources
ranked according to their authenticity.
8. The system shall allow users to post their comment on the news articles.
9. The system shall allow users to rate news articles.
Non-Functional Requirements
Following are the non-functional requirements of the system.
3.2.1 Performance Requirements

The application shall meet the following performance requirements:
• At least 100 users shall be able to simultaneously access the system without
performance degradation.
• News sources shall be scraped at least twice a day.
3.2.2 Security Requirements

The following security requirements shall be met by the system
• User passwords shall be stored as an irreversible salted hash.
• System shall be safe from XSS attacks.
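The salted-hash requirement above can be sketched with the Python standard library as follows. This is a minimal illustration and not the project's implementation: the storage format `iterations$salt$hash`, the iteration count and the function names are assumptions. Django, which is listed among the software requirements, ships its own password hashers, so a deployed system would normally rely on those instead.

```python
import hashlib
import hmac
import os


def hash_password(password, iterations=200_000):
    """Return 'iterations$salt_hex$hash_hex' computed with PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    """Recompute the hash with the stored salt and compare in constant time."""
    iterations, salt_hex, hash_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest.hex(), hash_hex)
```

Because the salt is random, hashing the same password twice gives two different stored strings, which prevents attackers from using precomputed lookup tables.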
3.2.3 Usability Requirements

The following usability requirements shall be met by the system
• System should provide an intuitive User Interface.
• Theme should be consistent across the whole system.

Hardware and Software Requirements

The following sections describe the hardware and software requirements of the system.
3.3.1 Hardware Requirements

Following are the hardware requirements for our project
• A system with high computing power capable of training fairly complex Neural
Networks in reasonable time periods.
• A cloud server for deployment of web platform. The cloud server shall be capable of
handling moderate user traffic.
3.3.2 Software Requirements

Following are the software requirements for our project
• AWS EC2 Instance [34]
• AWS S3 Bucket [35]
• ProxyCrawl [36]
• Google Colab
• TensorFlow
• Keras
• Gensim
• NLTK
• Django
• Angular 8+
• Scrapy
• PostgreSQL
• pgAdmin 4
• Bootstrap 4
System Architecture
This section describes the architecture of the system.
3.4.1 System Modules

Following is a brief description of the internal architecture of the system modules.
3.4.1.1 Web Scrapers

This module shall be responsible for fetching news articles from popular news sources. This module
will comprise two main components.
3.4.1.1.1 RSS Scraper

The RSS scraper will periodically run on the RSS feeds of popular news sources, collect the links
of any new news articles and transfer these links to the appropriate article scraper.
3.4.1.1.2 Article Scraper

The system will comprise multiple article scrapers, each designed for a specific news
source. Each article scraper will work on news links from its respective website, scrape the
content from the website, create structured article data and send it to the final structured
news database. These scrapers will use an external service named ProxyCrawl [36] to fetch
the contents of the article.
3.4.1.2 Ranking Model

This module shall be responsible for assigning a reliability score to pending unranked news
articles in the final structured news database. This module will comprise of three main
components.
3.4.1.2.1 Image Classification Model

This component will work on the images that are part of the article. The model will analyze the
images of the article to classify whether the image is original or if it has been manipulated and
transfer this classification to the complete score component.
3.4.1.2.2 Text Classification Model

This component will work on the textual part of the article. The model will analyze the textual
content of the article in order to assess the reliability of the news article, i.e. whether the article
is fake or not.
3.4.1.2.3 Combined Score

This component will receive the class assigned to the images of the article by the image
classification model and the score assigned to textual content of the article by text classification
model and assign a combined score to the article. This will be the final reliability score of the
article. The component will transfer this score to the finalized news articles database.
3.4.1.3 Final Structured News Database

This component is responsible for storing the structured contents of the articles. This database
will comprise of both articles that are scrapped from popular news sources and articles that are
posted on the Web Platform by the users. The database will keep record of both ranked and
unranked (pending) news articles.
3.4.1.4 Web Platform

This module shall be responsible for the web interface where the user will be able to view,
interact with and post news articles.
3.4.1.4.1 Backend
This is the server side of the web application which will be based on the Python Django
Framework. It will be responsible for managing the user data and news articles and providing
the appropriate responses to the requests made by user from the frontend. It will communicate
with the final structured news database to fetch ranked news articles as well as store pending
articles to the database.
3.4.1.4.2 Frontend
This is the client side of the web application which will be based on Angular Framework. It
will be responsible for providing user with an interface to interact with and post news articles.
It will communicate with the backend to fetch the data for the user and to update user data and
post news articles.
3.4.2 External Systems

Following is a brief description of the system interactions with external systems and entities.
3.4.2.1 RSS Feeds

An RSS feed is a standardized web feed that gives the system access to
updates to websites. The system will use RSS feeds to grab links of any newly posted news
articles on the popular news sources, which will then be scraped for complete articles.
3.4.2.2 ProxyCrawl
ProxyCrawl is a web API based service which provides the functionality of scraping websites
using continuously changing proxies. The system will use ProxyCrawl to extract news article
content in order to avoid getting banned by news sources. Additionally, ProxyCrawl
provides the service of scraping JavaScript-based pages; therefore, the system will also use
ProxyCrawl for grabbing the content of any news sources that use JavaScript pages.
3.4.3 Architecture Diagram

The architecture diagram, showing the internal architecture of the modules as well as the
interaction of the system with external systems, is as follows:
Figure 3: System Architecture

Figure describes the architecture of the whole system and its modules
Architectural Strategies
Following architectural strategies are decided for the project.
3.5.1 Product External Dependencies

The system shall depend upon the following external services:
• The server side of the application would be deployed on an AWS EC2 instance [34].
• We will be using an AWS S3 bucket [35] for storing user uploaded images as well as
images from scraped articles.
• The system will use the ProxyCrawl web API service to scrape external news articles.
• The ranking model will be trained on Google Colab.
3.5.2 Product Enhancement and Extensibility

Following strategies can be adopted:
• Currently the product can fetch news from only a limited number of sources, as an explicit
scraper is required for each source. In the future the product can be extended by
scraping more sources, either through more scrapers or by transitioning
towards basing the platform on an artificial intelligence-based scraper like Diffbot
[32].
• We can target more types of fake news to enhance our text ranking model.
• We can enhance our image manipulation detection model specifically for
news images, e.g. ignoring watermarks, detecting computer-generated images, etc.
• We can extend our website to add more functionalities for user interaction like rating
sources, functionality for sources to add their website on the platform for automatic
news capturing.
3.5.3 Concurrency and Synchronization

Following strategies can be adopted:
• We will be using async operations on frontend, so user is able to interact with website
while the system is working on any time-consuming operations like web API requests.
• We will be using parallel processing on the backend application so that the application
is able to handle multiple users simultaneously.
3.5.4 User Interface Paradigms

The system will follow HCI paradigms such as Ben Shneiderman's Eight Golden Rules and Fitts's law.
Use Cases
Following are the main use cases of the system.
3.6.1 User Registration

Name User Registration
Actors User
Summary User shall register on the website
Pre-Conditions User must be on the signup page
Post-Conditions User gets registered
Special Requirements None
Basic Flow
Actor Action System Response
1 User enters name, email and password. 2 System verifies that the account doesn’t exist, registers the user and displays homepage
Alternative Flow
1 User inputs an existing email. 2-A System displays the message email already exists.
1 User enters an empty field. 2-B System prompts missing field error.
3.6.2 Login
Name Login
Actors User
Summary User shall login into the website
Pre-Conditions User is on the login page and user is registered
Post-Conditions User is signed in
Special Requirements None
Basic Flow
1 User enters email and password. 2 System verifies that the account exists and displays homepage to user.
Alternative Flow
1 User inputs an invalid email or password. 2-A System prompts invalid email or password error.
3.6.3 View List of Articles

Name View List of Articles
Actors User
Summary User shall view list of news articles
Pre-Conditions None
Post-Conditions User shall be on news articles list page
Special Requirements None
Basic Flow
1 User navigates to view list of news articles 2 System will display list of latest news articles sorted by their reliability score.
3.6.4 View Article Details

Name View Article Details
Actors User
Summary User shall view details of a specific article
Pre-Conditions User shall be on the articles list page
Post-Conditions User shall be on news articles list page
Special Requirements None
Basic Flow
1 User selects an article 2 System displays the details of chosen article
3.6.5 Post a Comment

Name Post a Comment
Actors User
Summary User shall post a comment on an article
Pre-Conditions User is on the article’s detail page; user shall be signed in
Post-Conditions Comment will be posted on the article and will be displayed to the user.
Special Requirements None
Basic Flow
1 User enters his comment. 2 System verifies the comment and displays it on the page and adds the comment in the database.
Alternative Flow
1 User inputs an empty comment. 2-A System prompts empty field error.
3.6.6 Rate a News Article

Name Rate a News Article
Actors User
Summary User shall rate a news article
Pre-Conditions User shall be on the article’s detail page; user shall be signed in
Post-Conditions News article rating will be added and average rating will be updated
Special Requirements None
Basic Flow
1 User shall rate the article by clicking on the stars alongside the header 2 User’s rating is displayed along with the average rating
3.6.7 Change Password

Name Change Password
Actors User
Summary User shall change password
Pre-Conditions User is signed in
Post-Conditions Password is updated
Special Requirements None
Basic Flow
1 User enters old password, new password and new password confirmation. 2 System verifies the old password, matches the new password with the confirmation password, updates the password and displays the confirmation prompt to user
Alternative Flow
1 User inputs invalid previous password. 2-A System prompts password error.
1 User enters different new password and confirmation password. 2-C System prompts password error.
1 User enters same new and old password 2-D System prompts password error.
3.6.8 Reset Password

Name Reset Password
Actors User
Summary User shall reset password
Pre-Conditions User is on password reset page
Post-Conditions Password is reset
Special Requirements None
Basic Flow
1 User enters email. 2 System verifies that the account exists and sends a random code to the entered email.
3 User enters the received code. 4 System verifies the code
5 User enters new password and confirmation password 6 System matches the new password with the confirmation password, updates the password and displays the confirmation prompt to user
Alternative Flow
1 User enters invalid email. 2-A System prompts invalid email error.
3 User enters invalid code. 4-A System prompts invalid code error.
5 User enters different new password and confirmation password. 6-A System prompts password error.
3.6.9 Post a News

Name Post a News
Actors User
Summary User shall post news on the website
Pre-Conditions User will be logged into the website
Post-Conditions User article will be saved into the to-be-ranked queue of the model
Special Requirements None
Basic Flow
1 User clicks on post news 2 Post news page will be displayed
3 User enters the text, uploads images and clicks post. 4 News will be saved in the to-be-ranked queue
Alternative Flow
3 User enters empty text. 4-A System prompts missing field error.
3.6.10 View Profile

Name View Profile
Actors User
Summary User shall view their profile
Pre-Conditions User will be logged into the website
Post-Conditions Profile of the user will be displayed.
Special Requirements None
Basic Flow
1 User navigates to profile page 2 User profile will be displayed
3.6.11 Update Profile

Name Update Profile
Actors User
Summary User shall update their profile
Pre-Conditions User will be logged into the website and the profile page is opened
Post-Conditions Profile of the user will be updated.
Special Requirements None
Basic Flow
1 User clicks on edit profile 2 System will display edit profile page
3 User enters updated profile details and clicks on update profile 4 System will verify the data and update the profile
Alternative Flow
3 User enters invalid data in a field. 4-A System prompts invalid data error.
3 User leaves a required field empty 4-B System prompts missing field error.
GUI
This section provides the prototype interface screens of the project.
3.7.1 Login Screen
Figure 4: Login Screen

Figure shows the user interface for login screen
3.7.2 Register Screen
Figure 5: User Register Screen

Figure shows the user interface for register screen
3.7.3 News Article Feed
Figure 6: News Article Feed Screen

Figure shows the news article feed screen
3.7.4 News Article Details
Figure 7: News Articles Details

Figure shows news article details page
Database Design
This section provides Entity Relationship Diagram along with Data Dictionary.
3.8.1 ER Diagram
Figure 8: ER Diagram
Figure shows ER diagram for the database design
3.8.2 Data Dictionary

Table 1: Data Dictionary Table
Following table shows complete data dictionary of each attribute in the Database
Entity | Attribute | Data Type | Nullable | Relation To | Relation Type | Description
Article | Article_id | Int | No | | | Primary key of article
Article | Title | varchar | No | | | Title of the article
Article | Text | varchar | No | | | All text content of the article
Article | Reliability_score | varchar | Yes | | | Score assigned to the article by the reliability ranking model
Article | Type | Int | No | | | Type of article (scraped or user post)
Article | Avg_rating | Int | Yes | | | The average rating of the article
Article | Total_score | Int | Yes | | | Combined reliability score and rating of the article
Article | User_id | Int | Yes | User | 1 to 1 | Id of user who posted the article
Article | Source_id | Int | Yes | News Source | 1 to 1 | Id of news source of article
Article Image | Article_img_id | Int | No | | | Primary key of the article image
Article Image | Article_img_url | varchar | No | | | URL of the image
Article Image | Article_id | Int | No | Article | * to 1 | Id of the article the image belongs to
News Source | Source_id | Int | No | | | Primary key of the source
News Source | Source_img_url | varchar | Yes | | | URL of source image
News Source | Name | varchar | No | | | Name of the source
User | Email | varchar | No | | | Email of the user, primary key of user
User | Password | varchar | No | | | Hashed password of the user
User | Profile_pic_url | varchar | Yes | | | URL of user’s profile picture
User | Name | varchar | No | | | Name of the user
User | Bio | varchar | Yes | | | Small description of the user
Comment | Comment_id | Int | No | | | Primary key of the comment
Comment | Text | Text | No | | | The text of the comment
Comment | | | | | | Id of the article the comment is about
Comment | User_id | Int | No | User | * to 1 | Id of the user who posted the comment
Rating | Rating_id | Int | No | | | Primary key of rating
Rating | Rating_value | Int | No | | | The value of the rating (1 to 5)
Rating | | | | | | Id of the article the rating is about
Rating | User_id | Int | No | User | * to 1 | Id of the user who posted the rating
System Requirements
The product shall require the following features on the system to work as expected
• A stable working internet connection
• A stable up to date web browser

Design Considerations
This section describes the design considerations for the system.
3.10.1 Assumptions
The following assumptions are made for the specifications:
• RSS feeds are available for every news source website
• The website structure remains unchanged for the scrapers to work
• The user should be computer literate.
• The user should have knowledge of English language.
• The user should know the basic usage of internet and web browsers.
3.10.2 Dependencies
Following are the dependencies of the system
• Since it is a web-based application, a constant internet connection is required all the
time.
• Our daily news update depends on how frequently our news sources update their
websites since our scope is to only rank news articles.
• Our system’s web crawler depends solely on ProxyCrawl, a web service that keeps
changing the request IP address so that the website does not ban our crawler.
• We will be using Amazon Web Services for the deployment of our system on cloud.
• To properly scrape news articles, we need up to date RSS feeds.
3.10.3 Constraints
Following are the constraints on the project
• Since model training requires high GPU power, we will need Google Colab for training
our model.
• ProxyCrawl is a paid service. It has request limits for a certain price plan.
• AWS is a paid service and needs to be re-subscribed monthly/yearly.
Development Methods
We will use the Scrum development model for the development of our web application.
Scrum is an implementation of the agile methodology, which is based on iterative and
incremental development. We will divide our work into goals that can be achieved within
one timeboxed iteration (sprint). This timebox will be no longer than one week. The complete
progress will be tracked and re-planned in 15-minute stand-up meetings (daily scrum) every
week.
Scrum is a framework for managing complex work. Since our project is fairly complex in
nature and requires spending time on both research and development, Scrum is the best suited
methodology for it.
Class diagram
The section shows the class diagram of the system.
Figure 9: Class Diagram

Figure shows the class diagram and the relations between classes
Sequence diagram
Following are the sequence diagrams for the system.
3.13.1 User Login Sequence Diagram
Figure 10: User Login Sequence Diagram

Figure shows the sequence diagram and flow for user login
3.13.2 User Register Sequence Diagram
Figure 11: User Register Sequence Diagram

Figure shows the sequence diagram and flow for user register
3.13.3 Show News Articles List Sequence Diagram
Figure 12: Show News Articles List Sequence Diagram

Figure shows the sequence diagram and flow for listing news articles
3.13.4 View Article Details Sequence Diagram
Figure 13: View Article Details Sequence Diagram

Figure shows the sequence diagram and flow for viewing article details
3.13.5 Post a Comment Sequence Diagram
Figure 14: Post Comment Sequence Diagram

Figure shows the sequence diagram and flow of posting a comment
3.13.6 Rate an Article Sequence Diagram
Figure 15: Rate an Article Sequence Diagram

Figure shows the flow of rating an article
3.13.7 Changing the Password Sequence Diagram
Figure 16: Changing the Password Sequence Diagram

Figure shows the flow of changing the user account password
3.13.8 Posting a News Article
Figure 17: Posting a News Article

Figure shows the flow of posting a news article on the website
3.13.9 Resetting the Password
Figure 18: Resetting the Password

Figure shows the flow of resetting the user password through email verification
3.13.10 Updating the Profile
Figure 19: Updating the Profile

Figure shows the flow of updating the user profile
3.13.11 Viewing the Profile
Figure 20: Viewing the Profile

Figure shows the flow of viewing the user profile
Policies and Tactics

3.14.1 Tools to be used
We will use Visual Studio Code for frontend development and PyCharm for backend
development. We will use PostgreSQL for maintaining the database. For the purpose of training
our classification model we will use Google Colab, and for web scraping we will use the ProxyCrawl
web service.
3.14.2 Coding structure

We will follow MVC structure for our web application.
3.14.3 Policies for system testing

We shall use both white box and black box testing but our main focus will be on white box
testing.
3.14.4 Plans for system maintenance

We will keep our web scrapers up to date, as many websites tend to change their designs with
time.
3.14.5 Policies for system interface

Interface will be user friendly. Users will be able to interact with website with ease.
Implementation and Test Cases

Implementation
This section describes the implementation done so far.
4.1.1 Dataset Collection

Dataset used in the project is FakeNewsNet [29]. The dataset consists of real news articles and
fake news articles verified from two major news verification sources i.e. Politifact [31] and
Gossipcop [30]. The dataset contains 432 fake and 624 real articles from Politifact and 6,048
fake and 16,817 real news articles from Gossipcop.
The dataset only contains URLs of fake and real news articles; hence we had to use a
prebuilt scraper [33] to extract the news article contents and headlines for our use. We had to
set up a local Flask server and run the scraper for both news sources.
The obtained dataset contained the data in separate JSON files. The extracted data needed
to be preprocessed for use in the training model because the raw extracted text contains HTTP
delimiters and HTML tags.
In addition to text and headlines, source features were also collected. News source ratings
were collected from Amazon Alexa’s news skill. Another feature was the number of followers
on the Twitter page of each news source, which was available in the FakeNewsNet dataset.
The prominent issue with the dataset is that the classes are highly imbalanced. To fix this class
imbalance we scraped fake news articles from Gossipcop and Politifact to make the classes
roughly equal. The reason for using these two sources is that the actual dataset also uses articles
from these sources.
4.1.2 Dataset Preprocessing

Preprocessing is a must for every text classification task: text data in raw form cannot be
used directly to train deep learning models, and noise in the text can further affect the results.
The JSON files obtained from the scrapers had some junk characters and needed to be cleaned
to remove unnecessary noise from the text and make it easy to use.
The first step is to remove delimiters like carriage return (‘\r’), newline (‘\n’) and punctuation
marks from the text. Secondly, the text needs to be tokenized and converted to integer
sequences where each integer id corresponds to a distinct entry in the vocabulary. For this
purpose, Keras Text Tokenizer is used. The tokenization has been done for each news
document in the corpus.
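The cleaning and tokenization steps described above can be sketched in plain Python. This is only an illustrative stand-in for the Keras Text Tokenizer used in the project: the function names, the frequency-based id assignment and the padding at the end of each sequence are assumptions, and the real tokenizer's filtering and padding defaults differ in detail.

```python
import string
from collections import Counter


def clean_text(text):
    """Replace carriage returns and newlines with spaces, strip punctuation, lowercase."""
    text = text.replace("\r", " ").replace("\n", " ")
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.lower()


def build_vocabulary(documents):
    """Map each distinct word to an integer id; id 0 is reserved for padding."""
    counts = Counter(word for doc in documents for word in clean_text(doc).split())
    # Most frequent words get the smallest ids; ties are broken alphabetically.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {word: idx for idx, (word, _) in enumerate(ordered, start=1)}


def to_sequence(text, vocabulary, max_len):
    """Convert text to a fixed-length list of ids, dropping unknown words."""
    ids = [vocabulary[w] for w in clean_text(text).split() if w in vocabulary]
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))
```

Each news document in the corpus would be passed through `to_sequence` with the same vocabulary, so that every article becomes an integer sequence of equal length.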
Next step after tokenization is Word Embedding. Each word in the corpus is represented as a
dense vector based on its context words. Keras embedding layer is used for this purpose.
The preprocessed data can be then used for training a deep learning model.
The news source features were not available for some sources. To fix this issue the missing
values were interpolated.
4.1.3 Image Dataset Collection

No large dataset is publicly available for image forgery detection. As a solution, a synthetic
dataset for image splice detection is created using two base datasets, namely Microsoft COCO
[38] and PascalVOC [37]. For creating the dataset, two images are chosen from the base datasets:
one image serves as the background, and a random rectangular block is chosen from the second image
as the foreground image. The foreground image is spliced onto the background image at a random
region to create a tampered image, and the bounding box of the random region of the background image
is saved for annotation.
The synthetic dataset consists of 35,000 annotated tampered images along with their 35,000
pristine counterparts.
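The splicing procedure described above can be sketched as follows. This is a simplified illustration, not the project's generation script: images are represented as nested lists of pixel values rather than real image arrays, and the function name, the block-size sampling and the `(top, left, bottom, right)` annotation format are assumptions.

```python
import random


def splice(background, foreground, rng):
    """Paste a random rectangular block of `foreground` onto `background`.

    Returns the tampered image and the bounding box of the pasted region as
    (top, left, bottom, right), with bottom and right exclusive.
    """
    bg_h, bg_w = len(background), len(background[0])
    fg_h, fg_w = len(foreground), len(foreground[0])

    # The block must fit inside both images.
    block_h = rng.randint(1, min(bg_h, fg_h))
    block_w = rng.randint(1, min(bg_w, fg_w))

    # Where the block is cut from the foreground image.
    src_top = rng.randint(0, fg_h - block_h)
    src_left = rng.randint(0, fg_w - block_w)

    # Where the block is pasted onto the background image.
    dst_top = rng.randint(0, bg_h - block_h)
    dst_left = rng.randint(0, bg_w - block_w)

    tampered = [row[:] for row in background]  # keep the pristine image intact
    for dy in range(block_h):
        for dx in range(block_w):
            tampered[dst_top + dy][dst_left + dx] = foreground[src_top + dy][src_left + dx]

    return tampered, (dst_top, dst_left, dst_top + block_h, dst_left + block_w)
```

Keeping the original background unmodified gives the pristine counterpart of each tampered image, and the returned box serves as its annotation.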
4.1.4 Image Dataset Preprocessing

The only preprocessing applied to the images is resizing and normalization. The images are
resized to a fixed resolution of 600 x 600 x 3. All the pixel values are normalized to the range
0-1 to make the training faster.
4.1.5 Text Classification Model Implementation

For the text classification tasks, deep learning models namely LSTM and GRU have been tested
and compared with our proposed 1D-CNN model.
For text and other sequence processing tasks, RNNs have been popular. However, in recent
research CNNs have also been used for text processing. We have also focused on a CNN
model and compared it with RNN-based models.
LSTM and GRU belong to the family of RNNs. However, vanilla RNNs have a problem of
vanishing gradients. Another problem with RNN is that it cannot remember long term relations
between input sequences. These two problems are addressed in later architectures such as
LSTM and GRU.
Before the inputs are fed to the model, they are first given a word embedding matrix representation.
Each row in the matrix is a fixed-length vector representation of a word. To do so, Keras’s
trainable embedding layer is used. The input is a sequence of words represented in the form of
integers; each integer is the id of a word in the vocabulary. The output is a b x n x m tensor
where b is the batch size, n is the maximum number of words allowed and m is the embedding
dimension. This layer is made trainable so that the representations can be fit on our dataset.
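To make the b x n x m output shape concrete, the lookup performed by an embedding layer can be sketched in plain Python. This is only an illustration of the indexing, not the Keras layer itself: the helper names and the random initial values are assumptions, and no training takes place here.

```python
import random


def make_embedding_matrix(vocab_size, embedding_dim, seed=0):
    """One row of `embedding_dim` small random values per vocabulary id."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
        for _ in range(vocab_size)
    ]


def embed(batch, matrix):
    """Replace every integer id in a b x n batch by its row of the matrix."""
    return [[list(matrix[token_id]) for token_id in sequence] for sequence in batch]


# b = 2 articles, n = 4 word ids per article (0 is padding), m = 5 dimensions
batch = [[3, 2, 1, 0], [1, 1, 0, 0]]
matrix = make_embedding_matrix(vocab_size=8, embedding_dim=5)
embedded = embed(batch, matrix)
print(len(embedded), len(embedded[0]), len(embedded[0][0]))  # b, n, m
```

In the trainable layer, the rows of the matrix are updated during training, so words that behave similarly in the dataset end up with similar vectors.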
Three types of deep learning models are tested for classifying the news articles, namely
LSTM, GRU and our proposed CNN. All the models are implemented using TensorFlow
and Keras. In order to train on both the headline and the body of each article, the models were made
multichannel.
For the LSTM and GRU implementations, 50 LSTM/GRU units are used in the text body
channel, because adding more units led to overfitting. The headline channel uses 20
LSTM/GRU units.
For our proposed CNN implementation, a single convolution layer with 128 feature vectors is
used in the text body channel. In the headline channel 64 feature vectors are extracted.
After feature extraction using LSTM, GRU and Convolution Layers from the headline and text
channels, the feature vectors are passed to an attention mechanism layer. The purpose of the
attention is to focus only on a subset of input and process input piece by piece similar to a
human. The output of attention layer is passed to flatten layer. The output from flatten layers
of both channels are concatenated, other input features are also concatenated at this point. The
output from concatenation layer is passed to dense classifier to classify article as fake or real.
For all the model implementations, the Adam optimizer is used with cross-entropy as the loss
function.
The classification algorithms were trained using multiple features. The features were the
Article Text, Article Body, News Source Credibility Features (News Source Star Ratings and
Number of Followers of News Source on Twitter).
Various combinations of these features were embedded in the model and performance was
evaluated.
4.1.6 Tamper Detection Model Implementation

The implemented model is similar to [39], where two Faster-RCNN models are used to detect
tampering in images. The model therefore has
two base CNNs, two Region Proposal Networks (RPNs) and two Region of Interest pooling layers.
In summary, Faster-RCNN first extracts feature maps from the image using a CNN
model; the better practice is to use a pretrained model. The features are passed to a Region
Proposal Network, which uses a fixed number of bounding boxes distributed throughout the
image. At the RPN layer we determine whether each box correctly overlaps an object or not. The RPN
outputs are passed to the Region of Interest layer, which extracts features from the CNN feature maps for the areas proposed by
the RPN. Finally, the RCNN module classifies each region, and the errors are backpropagated to better fit the
data and give better bounding box overlaps. Each module has its own loss and the optimization
function minimizes them. The advantage that Faster-RCNN has over the base RCNN model is that
it uses an RPN layer and does not perform 2000 CNN predictions on the image using selective
search, as done in RCNN.
The method we implemented consists of two Faster-RCNN networks. One gives predictions
on the RGB image and the second one uses noise maps produced using 3 filters proposed in
[39]. The base CNN network is a Resnet-50 pretrained on ImageNet. Both networks give their
bounding boxes and the errors are backpropagated to correctly detect and localize tampered
regions. The anchor boxes used in the networks have size scales of 64, 256 and 1024. The
anchor ratios are 1:1, 1:2 and 2:1.
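The anchor configuration above yields nine anchor shapes per location. The sketch below enumerates them under two assumptions that the text does not state: each ratio is read as width : height, and every anchor at a given scale keeps the same area (scale squared), which is the common Faster-RCNN convention.

```python
import math

SCALES = (64, 256, 1024)
RATIOS = ((1, 1), (1, 2), (2, 1))  # assumed to mean width : height


def anchor_shapes(scales=SCALES, ratios=RATIOS):
    """Return (width, height) for every scale and ratio combination."""
    shapes = []
    for scale in scales:
        for ratio_w, ratio_h in ratios:
            factor = math.sqrt(ratio_w / ratio_h)
            # width / height equals the ratio, width * height equals scale ** 2
            shapes.append((scale * factor, scale / factor))
    return shapes


for width, height in anchor_shapes():
    print(round(width), "x", round(height))
```

During training, each of these shapes is centred on every position of the feature map, and the RPN scores how well each resulting box overlaps a tampered region.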
The RPN and CNN network use an Adam optimizer with a learning rate of 0.00001. The overall
model uses SGD optimizer with momentum with a learning rate of 0.00001.
Before training, all the images are scaled to a resolution of 600x600. The image pixel values
are scaled between 0 and 1. The pixel values are mean normalized as well.
The bounding box predictions are given by both networks. The final bounding box is the region
that is common in both predictions.
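Taking the region common to both predictions amounts to intersecting two rectangles. A minimal sketch follows, assuming boxes are given as `(x_min, y_min, x_max, y_max)` in pixel coordinates; the function name and the choice to return `None` when the boxes do not overlap are illustrative assumptions.

```python
def intersect(box_a, box_b):
    """Return the overlap of two (x_min, y_min, x_max, y_max) boxes, or None."""
    x_min = max(box_a[0], box_b[0])
    y_min = max(box_a[1], box_b[1])
    x_max = min(box_a[2], box_b[2])
    y_max = min(box_a[3], box_b[3])
    if x_min >= x_max or y_min >= y_max:
        return None  # the two predictions have no common region
    return (x_min, y_min, x_max, y_max)


rgb_box = (10, 10, 100, 100)   # prediction of the RGB network
noise_box = (50, 40, 150, 90)  # prediction of the noise network
print(intersect(rgb_box, noise_box))  # (50, 40, 100, 90)
```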
The model is implemented using the Keras framework with TensorFlow as a backend. The RPN
and ROI layers have been implemented as custom layers using Keras layer modules. The
Resnet-50 that has been pretrained on ImageNet is imported from the Keras Applications module.
The Resnet-50 for noise input has been initialized randomly and trained from scratch on the
dataset. These weights could also be pretrained on ImageNet, which would make the training
faster.
4.1.7 Web Platform Implementation

The following section describes the implementation of the web application, covering both the
frontend and the backend.
4.1.7.1 Frontend Implementation

The frontend of the application has been implemented using Angular 8. The system follows an
MVC structure. The Angular HttpClient module along with RxJS observables is used to manage
communication with the backend. The system allows the user to navigate through the ranked
articles fetched from news sources and post comments on them.
The frontend module first displays a Sign In page where the user inputs their credentials, and then
the frontend module authenticates the user with the backend by using an API call. The
authenticated user can then view the ranked list of articles, which the frontend module fetches from the
backend using an API call. The user can also read the complete article using the interface provided by the
frontend application.
4.1.7.2 Backend Implementation

The backend of the system is built on Django along with Django Rest Framework to manage
API requests. The backend follows an MVC structure. The backend of the system is responsible
to manage user data and handle user requests. The backend module only deals with ranked
articles in the database and does not handle articles that are pending ranking.
The backend application fetches ranked articles and their scores from the database and returns
them to the frontend module on receiving an API request. Backend module also stores basic
user information in the database and also authenticates existing user when a user signs in.
4.1.8 News Scraper Implementation

Following are the details of the module responsible for regularly scraping data from news
sources. All scrapers of this module will be built in the Python framework “Scrapy”.
4.1.8.1 RSS Scraper

This scraper is triggered at regular intervals. It scrapes the RSS feeds of all
targeted news sources and looks for any new article links. To scrape the contents of the RSS
feeds, a Python library known as “Feedparser” is used. This scraper matches the top news
articles from RSS feeds with the database articles to identify any new articles, passes these
links to the respective scraper for that website and triggers all the website scrapers with any updated
news.
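The new-link detection step can be sketched as below. To keep the example self-contained, it parses an RSS 2.0 document with the standard library's `xml.etree.ElementTree` instead of Feedparser, and a plain Python set stands in for the article links already stored in the database; the function names and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_links(rss_xml):
    """Return the <link> of every <item> in an RSS 2.0 document, in feed order."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links


def find_new_links(rss_xml, known_links):
    """Return the feed links that are not yet known, without duplicates."""
    seen = set(known_links)
    new_links = []
    for link in extract_links(rss_xml):
        if link not in seen:
            seen.add(link)
            new_links.append(link)
    return new_links


# Hypothetical feed content; a real run would download this from the news source.
FEED = """<rss version="2.0"><channel><title>Example News</title>
<item><title>A</title><link>https://example.com/a</link></item>
<item><title>B</title><link>https://example.com/b</link></item>
</channel></rss>"""

print(find_new_links(FEED, {"https://example.com/a"}))
```

Only the links returned by `find_new_links` would be handed to the website scraper of the corresponding news source.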
4.1.8.2 Website Scraper

This scraper is triggered by the RSS feed scraper. This module contains multiple scrapers.
The purpose of each scraper is to scrape news from a particular website. For now, we have
targeted news from only three news sources. Each website scraper will deal with the new
article links of its respective source and shall be responsible for fetching the article data from that
website.
Test Case Design and Description

All the test cases mentioned here require one of Google Chrome version 45+, Mozilla Firefox version
38+, Opera version 9+, Internet Explorer 10+ or Microsoft Edge 12+ to run. On the Windows
operating system, Windows 7 and later versions are supported. Moreover, an internet
connection is required for all test cases.
Table 2 shows the mapping of component ids with their names in the software.
Table 2: Component ID and Name Mapping

Table shows component id and name of different component in the system.
Component Id Component Name
1 User Management Component
2 News Feed Component
3 News Interaction Component
4 News Scrapper Component
5 User Interface Component
Table 3 contains test case ids with their respective names.

Table 3: Test Case ID and Name Mapping
The table shows the ID and name of each test case.
Test Case Id Test Case Name
1 User Registration Test Case
2 User Registration Alternate Scenario 2A Test Case
3 User Registration Alternate Scenario 2B Test Case
4 User Login Test Case
5 User Login Alternate Scenario 2A Test Case
6 User Login Alternate Scenario 2B Test Case
7 View List of Articles Test Case
8 View Article Details Test Case
9 Post a Comment Test Case
10 Rate a News Article Test Case
11 Reset Password Test Case
12 Reset Password Alternate Scenario 2A Test Case
13 Reset Password Alternate Scenario 2B Test Case
18 Post a News Test Case
19 Post a News Alternate Scenario 4A Test Case
20 View Profile Test Case
21 System Load Test Case
22 News Scraping Schedule Test Case
23 Password Hashing Test Case
24 XSS Attack Prevention Test Case
25 Intuitive User Interface Test Case
26 Theme Consistency Test Case
4.2.1 User Registration Test Case

Component Name: User Management Component
Component ID: 1
Test Case ID: 1 QA Test Engineer: Syed Ahmad Saeed
Test case Version: 1.0 Reviewed By: Muhammad Talal
Test Date: 19/03/2020 Use Case User Registration
Reference(s):
Revision History: -
Objective To test user registration functionality.
Product/Ver/Module: A Sentiment and Review Analysis Based News Reliability Ranking
Platform / 1.0 / Web Platform Module
Environment: Web Browser
Working internet connection
Assumptions: -
Pre-Requisite: User must be on the signup page.
Step No. | Execution description | Procedure result
1 | Enter name, email ID and password, then click the sign up button. | System takes the user to the home page and adds a new user to the database.
Comments: System works properly according to the requirements.
Passed Failed Not Executed
4.2.2 User Registration Alternate Scenario 2A Test Case

Component ID: 1
Test Case ID: 2 QA Test Engineer: Obaid Ur Rehman
Reference(s):
Revision History: -
Objective To test user registration functionality when email already exists.
Assumptions: -
Step No. | Execution description | Procedure result
1 | Enter name, an already existing email ID and password, then click the sign up button. | System displays the message that the email already exists.
Comments:

4.2.3 User Registration Alternate Scenario 2B Test Case

Component ID: 1
Reference(s):
Revision History: -
Objective To test user registration functionality when user leaves a field empty.
Assumptions: -
Step No. | Execution description | Procedure result
1 | Leave the name, email ID or password field empty. | System prompts an empty field error.
Comments:
4.2.4 User Login Test Case

Component ID: 1
Test Case ID: 4 QA Test Engineer: Muhammad Talal
Test case Version: 1.0 Reviewed By: Syed Ahmad Saeed
Test Date: 25/03/2020 Use Case Login
Reference(s):
Revision History: -
Objective To test user login functionality.
Assumptions: -
Pre-Requisite: User must be on the login page and user is registered.
Step No. | Execution description | Procedure result
1 | Enter a valid email ID and password. | System takes the user to the home page.
Comments:

4.2.5 User Login Alternate Scenario 2A Test Case

Component ID: 1
Reference(s):
Revision History: -
Objective To test user login functionality when credentials are invalid.
Assumptions: -
Pre-Requisite: User must be on the login page.
Step No. | Execution description | Procedure result
1 | Enter an invalid email ID or password. | System prompts an invalid email or password error.
Comments:
4.2.6 User Login Alternate Scenario 2B Test Case

Component ID: 1
Reference(s):
Revision History: -
Objective To test user login functionality when user leaves a field empty.
Assumptions: -
Pre-Requisite: User must be on the login page.
Step No. | Execution description | Procedure result
1 | Leave the email ID or password empty. | System prompts an empty field error.
Comments:

4.2.7 View List of Articles Test Case

Component Name: News Feed Component
Component ID: 2
Test Date: 01/04/2020 Use Case View List of Articles
Reference(s):
Revision History: -
Objective To test article listing functionality.
Assumptions: There is at least one ranked news article in the database.
Pre-Requisite: -
Step No. | Execution description | Procedure result
1 | Navigate to the news articles list. | System displays a list of news articles, but the score is very low. The articles are not classified properly.
Comments:
4.2.8 View Article Details Test Case

Component Name: News Feed Component
Component ID: 2
Test Date: 01/04/2020 Use Case View Article Details
Reference(s):
Revision History: -
Objective To test article details functionality.
Assumptions: There is at least one ranked news article in the database.
Pre-Requisite: The user must be on article list page.
Step No. | Execution description | Procedure result
1 | Click on a news article. | System takes the user to the details page of the selected article.
Comments:

4.2.9 Post a Comment Test Case

Component Name: News Interaction Component
Component ID: 3
Test Date: 06/04/2020 Use Case Post a Comment
Reference(s):
Revision History: -
Objective To test user comment functionality.
Assumptions: -
Pre-Requisite: User is on the article’s detail page and is signed in.
Step No. | Execution description | Procedure result
1 | Navigate to the comment section and enter a comment. | System displays the comment on the screen and adds the comment to the database.
Comments: System works properly according to the requirements and properly
authenticates user comments.
4.2.10 Rate a News Article Test Case

Component ID: 3
Test Date: - Use Case Rate a News Article
Reference(s):
Revision History: -
Objective To test article rating functionality.
Assumptions: -
Pre-Requisite: User is on the article’s detail page and is signed in.
Step No. | Execution description | Procedure result
1 | Give a rating by selecting from 1 to 5 stars in the header of the article. | System updates the average rating and displays it on the page.
Comments:
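The average rating update exercised by this test case can be illustrated with an incremental mean. This is a hedged sketch: the function name `updated_average` and the idea of storing the current average together with a rating count are assumptions, since the report does not specify how ratings are stored.

```python
def updated_average(current_average, rating_count, new_rating):
    """Fold one new 1-5 star rating into a stored average without re-reading all ratings."""
    if not 1 <= new_rating <= 5:
        raise ValueError("rating must be between 1 and 5 stars")
    new_count = rating_count + 1
    new_average = current_average + (new_rating - current_average) / new_count
    return new_average, new_count


# An article with three ratings averaging 4.0 receives a 2-star rating.
average, count = updated_average(4.0, 3, 2)
```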

4.2.11 Reset Password Test Case

Component ID: 1
Test Date: - Use Case Reset Password
Reference(s):
Revision History: -
Objective To test reset password functionality.
Assumptions: -
Pre-Requisite: User must be on password reset page.
Step No. | Execution description | Procedure result
1 | Enter email. | System verifies that the account exists and sends a random code to the entered email.
2 | Enter the received code. | System verifies the code and sends the user to the New Password page.
3 | Enter new password and confirmation password. | System matches the new password with the confirmation password, updates the password and displays the confirmation prompt to the user.
Comments:
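The random code used in this flow could be generated and checked as sketched below. The six-digit length and the helper names `generate_reset_code` and `code_matches` are assumptions made for illustration; the report only states that a random code is emailed to the user.

```python
import hmac
import secrets


def generate_reset_code(digits=6):
    """Create a numeric code from a cryptographically secure random source."""
    return "".join(secrets.choice("0123456789") for _ in range(digits))


def code_matches(expected, submitted):
    """Compare codes in constant time; surrounding whitespace in the input is ignored."""
    return hmac.compare_digest(expected.encode(), submitted.strip().encode())


code = generate_reset_code()
```

An empty or wrong submission simply fails the comparison, which corresponds to the invalid code and empty code scenarios tested for this use case.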

4.2.12 Reset Password Alternate Scenario 2A Test Case

Component ID: 1
Reference(s):
Revision History: -
Objective To test reset password functionality when user enters an invalid email id.
Assumptions: -
Pre-Requisite: User must be on the password reset page.
Step No. | Execution description | Procedure result
1 | Enter an invalid email ID. | System prompts an invalid email message.
Comments:
4.2.13 Reset Password Alternate Scenario 2B Test Case

Component ID: 1
Reference(s):
Revision History: -
Objective To test reset password functionality when user leaves an empty field.
Assumptions: -
Step No. | Execution description | Procedure result
1 | Leave any field empty in the form. | System prompts a missing field error.
Comments:


Component ID: 1
Reference(s):
Revision History: -
Objective To test reset password functionality when user enters an invalid code.
Assumptions: -
Step No. | Execution description | Procedure result
1 | Enter any invalid code. | System prompts an invalid code error.
Comments:

Component ID: 1
Reference(s):
Revision History: -
Objective To test reset password functionality when user leaves an empty code field.
Assumptions: -
Step No. | Execution description | Procedure result
1 | Leave the code field empty. | System prompts an empty field error.
Comments:


Component ID: 1
Reference(s):
Revision History: -
Objective To test reset password functionality when user’s new password and confirm password fields don’t match.
Assumptions: -
Step No. | Execution description | Procedure result
1 | Enter a new password and enter a different password in the confirm password field. | System prompts a password error.
Comments:


Component ID: 1
Reference(s):
Revision History: -
Objective To test reset password functionality when user leaves new password and confirm password field empty.
Assumptions: -
Step No. | Execution description | Procedure result
1 | Leave the new password or confirm password field empty. | System prompts an empty field error.
Comments:

4.2.18 Post a News Test Case

Component ID: 3
Test Date: - Use Case Post a News
Reference(s):
Revision History: -
Objective To test user article posting functionality.
Assumptions: -
Pre-Requisite: User is on the article posting page and is signed in.
Step No. | Execution description | Procedure result
1 | Click on post news. | System displays the post news page.
2 | Enter the text, upload images and click post. | News article is saved in the unranked queue in the database and the system displays an article posted success dialog box.
Comments:

4.2.19 Post a News Alternate Scenario 4A Test Case

Component ID: 3
Test Date: - Use Case Post a News
Reference(s):
Revision History: -
Objective To test user article posting functionality when user leaves text empty.
Assumptions: -
Pre-Requisite: User is on the article posting page and is signed in.
Step No. | Execution description | Procedure result
1 | Click on post news. | System displays the post news page.
2 | Leave the text blank and click post. | System displays a missing field error.
Comments:
4.2.20 View Profile Test Case

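Component ID: 3
Test Date: - Use Case View Profile
Reference(s):
Revision History: -
Objective To test view profile functionality.
Product/Ver/Module: A Sentiment and Review Analysis Based News Reliability Ranking
Platform / 1.3 / Web Module
Assumptions: -
Pre-Requisite: User shall be signed in.
Step No. | Execution description | Procedure result
1 | Navigate to the profile page. | System displays the user profile.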
Comments:

4.2.21 System Load Test Case

Component ID: 1
Test Date: - Use Case -
Reference(s):
Revision History: -
Objective To test system under load of 100 users.
Environment: JMeter
Assumptions: -
Pre-Requisite: -
Step No. | Execution description | Procedure result
1 | Create a thread group of 100 users. | A thread group is created in the test plan.
2 | Create an HTTP GET request sampler to the home page. | A sampler is created under the thread group in the test plan.
3 | Create a graph result listener to check throughput. | A result listener is created in the test plan.
4 | Run the test plan and record the throughput. | The system shows an error rate of 0%.
Comments: System shows no errors and error rate is less than 2% on average.
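The test itself was run with JMeter. As a rough stand-in, the idea of issuing 100 concurrent requests and computing the error rate can be sketched in Python. The `fetch` callable is a hypothetical placeholder that returns an HTTP status code; here a fake one is used so that the sketch runs without a server.

```python
from concurrent.futures import ThreadPoolExecutor


def run_load_test(fetch, users=100):
    """Call fetch() once per simulated user in parallel; return the error rate in percent."""

    def attempt(_):
        try:
            return 200 <= fetch() < 400  # success means a 2xx or 3xx status
        except Exception:
            return False  # any raised error counts as a failed request

    with ThreadPoolExecutor(max_workers=users) as pool:
        outcomes = list(pool.map(attempt, range(users)))
    return 100.0 * outcomes.count(False) / users


def fake_home_page():
    """Hypothetical stand-in for a GET request to the home page."""
    return 200


error_rate = run_load_test(fake_home_page, users=100)
```

In a real run, `fetch` would wrap an actual HTTP GET request to the deployed home page.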

4.2.22 News Scraping Schedule Test Case

Component Name: News Scraper Component
Component ID: 4
Test Date: 27/04/2020 Use Case -
Reference(s):
Revision History: -
Objective To test news scraper functionality.
Product/Ver/Module: A Sentiment and Review Analysis Based News Reliability Ranking
Platform / 1.0 / Web Scraper Module
Environment: Python Schedule
Assumptions: -
Pre-Requisite: -
Step No. | Execution description | Procedure result
1 | Create a schedule for the scraper task to run twice a day. | A schedule is created for the task.
2 | Monitor the database to check news articles at the specified time. | Scraper has added new articles to the unranked queue in the database.
Comments: System works properly according to the requirements and scrapes news at the specified time.
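A twice-a-day schedule can be reasoned about with a small helper that computes the next run time. This sketch uses only the Python standard library rather than the scheduling library used in the test, and the run times 06:00 and 18:00 are assumptions chosen for illustration.

```python
from datetime import datetime, time, timedelta

RUN_TIMES = (time(6, 0), time(18, 0))  # assumed schedule: twice a day


def next_run(now, run_times=RUN_TIMES):
    """Return the first scheduled run strictly after `now`."""
    candidates = []
    for day_offset in (0, 1):  # today and tomorrow are enough for a daily schedule
        day = now.date() + timedelta(days=day_offset)
        for run_time in run_times:
            candidates.append(datetime.combine(day, run_time))
    return min(candidate for candidate in candidates if candidate > now)


morning = next_run(datetime(2020, 4, 27, 7, 30))
evening = next_run(datetime(2020, 4, 27, 19, 0))
```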

4.2.23 Password Hashing Test Case

Component ID: 1
Test case Version: 1.0 Reviewed By: Obaid Ur Rehman
Reference(s):
Revision History: -
Objective To test password hashing functionality.
Assumptions: -
Pre-Requisite: -
Step No. | Execution description | Procedure result
1 | Navigate to the user registration page. | User registration page is displayed.
2 | Enter password and other credentials and click the sign up button. | System adds the user to the database and the password is stored in hashed form instead of plain text.
Comments: System works properly according to the requirement and stores the password as a salted hash.
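What a salted hash means in practice can be shown with the standard library. This is an illustrative sketch only: the report does not state which hashing scheme the system uses, so PBKDF2 with SHA-256, the iteration count and the helper names are assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this illustration


def hash_password(password, salt=None):
    """Derive a salted hash; a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-derive the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


salt, stored = hash_password("s3cret-pass")
```

Because each user gets a different salt, two users with the same password end up with different stored hashes.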

4.2.24 XSS Attack Prevention Test Case

Component ID: 3
Reference(s):
Revision History: -
Objective To check XSS attack prevention functionality.
Environment: XSSer
Assumptions: -
Pre-Requisite: -
Step No. | Execution description | Procedure result
1 | Enter a malicious script like <script>alert('XSS')</script> in the comments field. | The comment is filtered out and there is no effect on system behavior.
Comments:
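One common defence against the payload used in this test is to escape HTML special characters before a comment is rendered. The sketch below shows that technique with the standard library; it is not necessarily the filtering the system applies, and `sanitize_comment` is a hypothetical helper name.

```python
import html


def sanitize_comment(text):
    """Escape HTML special characters so the browser shows markup as plain text."""
    return html.escape(text, quote=True)


payload = "<script>alert('XSS')</script>"
safe = sanitize_comment(payload)
```

After escaping, the script tag is displayed as text instead of being executed by the browser.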

4.2.25 Intuitive User Interface Test Case

Component Name: User Interface Component
Component ID: 5
Reference(s):
Revision History: -
Objective To check usability of the interface.
Environment: TestRail
Assumptions: -
Pre-Requisite: -
Step No. | Execution description | Procedure result
1 | Monitor the Register User scenario. | User needs an average of 5.3 clicks to register.
2 | Monitor the Login User scenario. | User needs an average of 3.4 clicks to log in.
3 | Navigate to any news article and monitor the article rating scenario. | User needs an average of 2.2 clicks to rate an article.
4 | Navigate to any news article and monitor the article details scenario. | User needs an average of 1.1 clicks to view article details.
5 | Navigate to any news article and monitor the article comments scenario. | User needs an average of 1.8 clicks to post comments.

4.2.26 Theme Consistency Test Case

Component Name: User Interface Component
Component ID: 5
Reference(s):
Revision History: -
Objective To check usability of the interface.
Environment: TestRail
Assumptions: -
Pre-Requisite: A default color scheme and user interface components design are available.
Step No. | Execution description | Procedure result
1 | Navigate to the Login screen. | Color scheme and user interface components are the same as used in the default theme.
2 | Navigate to the User Registration screen. | Color scheme and user interface components are the same as used in the default theme.
3 | Navigate to the article listing screen. | Color scheme and user interface components are the same as used in the default theme.
4 | Navigate to the news article details screen. | Color scheme and user interface components are the same as used in the default theme.
Test Metrics
Following are the test metrics and their results after the testing process.
Metric Value
Number of Test Cases: 26
Number of Test Cases Passed: 26
Number of Test Cases Failed: 0
Test Case Defect Density: 0
Test Case Effectiveness: 0

Experimental Results and Analysis

For our project, we have worked on image tamper detection in news images and news article
text classification. The results of both are presented in this chapter.
Text Classifier Results and Analysis

This section presents the results of the fake news text classification experiments with various
models and analyzes them using multiple performance metrics.
As discussed earlier, three types of deep learning models were tested for the text classification
task: LSTM, GRU and our proposed CNN approach.
Accuracies after hyperparameter tuning are shown in Table 4.
Table 4: Model Accuracies
Table shows accuracy (%) using various architectures and input features.
Input Feature LSTM GRU CNN
Article Headline 79.7 80.2 80.9
Article Body 82.6 83.7 85.1
Article Body + Headline 84.1 86.8 88.1
Article Body + Headline + Source Rating + Number of Followers 85.3 86.8 87.8
Article Headline (Balanced Dataset) 82.5 82.8 84.5
Article Body (Balanced Dataset) 87.9 87.1 88.3
Article Body + Headline (Balanced Dataset) 88.6 88.0 90.3
Article Body + Headline + Source Rating + Number of Followers (Balanced Dataset) 90.6 90.1 91.8
Because of class imbalance, precision is also a good metric for performance assessment.
Precision values after hyperparameter tuning are shown in Table 5.
Table 5: Model Precision Values

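Table shows precision values using various architectures and input features.
Input Feature LSTM GRU CNN
Article Body + Headline + Source Rating + Number of Followers 0.81 0.84 0.86
Article Body + Headline + Source Rating + Number of Followers (Balanced Dataset) 0.89 0.90 0.92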
Precision alone cannot give confidence in the performance of the algorithm, so recall is also
necessary for evaluation. Table 6 shows the recall of the tested models.
Table 6: Model Recall Rate
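Table shows model recall using various architectures and input features.
Input Feature LSTM GRU CNN
Article Body + Headline + Source Rating + Number of Followers 0.77 0.80 0.84
Article Body + Headline + Source Rating + Number of Followers (Balanced Dataset) 0.87 0.88 0.89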
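For reference, the accuracy, precision and recall reported in Tables 4 to 6 follow the standard definitions, which can be computed from confusion-matrix counts as sketched below. The counts in the example are made up to show why accuracy alone can look good on an imbalanced dataset; they are not results from this project.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical imbalanced test set: 100 fake articles among 1000.
accuracy, precision, recall = classification_metrics(tp=80, fp=20, tn=880, fn=20)
```

In this made-up case the accuracy is 0.96 while precision and recall are both 0.80, which is why all three metrics are reported.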
The results show that the proposed Convolutional Neural Network outperforms the other two
models by a slight margin in both accuracy and precision. The sequence models, GRU and
LSTM, show almost the same test performance. The CNN is also simpler than the LSTM and
GRU and faster to use: its training and prediction times are almost half of theirs.
Incorporating additional features of the news source greatly improved the recall of our
classifier.
The final model we deployed in our system is the CNN with attention mechanism.
Image Tamper Detection Results and Analysis

The image tamper detection model described above was trained on a synthetic dataset for 500
iterations on an Nvidia Tesla P100 GPU. For testing, the well-known evaluation datasets DVMM
[27] and CASIA Version 2 [26] were used: 1000 images from CASIA and 200 images
from DVMM.
For performance assessment we used two metrics. The first is accuracy, i.e. how many images
are correctly classified as tampered or pristine. The second is the pixel-wise Area Under the
ROC Curve (AUC), which measures at the pixel level how well our classifier tells whether a
pixel belongs to a tampered region or not.
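The pixel-wise AUC can be read as the probability that a randomly chosen tampered pixel receives a higher score than a randomly chosen pristine pixel. The sketch below computes it directly from that definition on a flattened mask and score map. It is a naive quadratic-time illustration with made-up values, not the evaluation code used in the project.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical flattened ground-truth mask (1 = tampered) and predicted scores.
mask = [0, 0, 1, 1, 0, 1]
pred = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
auc = roc_auc(mask, pred)
```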
Table 7 shows performance of the tamper detection model.
Table 7: Tamper Detection Model Performance
The table shows model accuracy and AUC on evaluation datasets.
Dataset Accuracy AUC
DVMM 88 0.78
CASIA 81 0.90
The model was also tested on some images manually tampered by a Photoshop expert. These
images are much more challenging than the images in the evaluation data. There were 23 of these
images, and our model was able to classify them correctly.
Considering that a synthetic dataset was used for training, the model performed very well
and was able to classify the majority of the forged samples given to it. The method could be
improved further if a large annotated public dataset of human-made forgeries were available.
The final result by our model is a harmonic mean of the score given by the tamper detection
model and the text classification model.
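The harmonic mean of two scores can be computed as in the following sketch. It assumes that both models output a score between 0 and 1; the function name and the example values are hypothetical.

```python
def combined_score(text_score, image_score):
    """Harmonic mean of two scores in [0, 1]; zero if either score is zero."""
    if text_score <= 0 or image_score <= 0:
        return 0.0
    return 2 * text_score * image_score / (text_score + image_score)


score = combined_score(0.9, 0.6)
```

For the example values the result is 0.72, lower than the arithmetic mean of 0.75, because the harmonic mean penalizes disagreement between the two scores.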
Conclusion
Fake news is a major problem of the digital era. The proposed system aims to tackle this
problem using both textual and image data, which are the main features for classification and
ranking of news articles. We primarily use deep learning techniques for classification of the
textual data and for image manipulation detection in the news article. The features used are
the text content and source credibility (source ratings and number of followers). After dataset
collection and structuring, the article text is preprocessed and used to train three well-known
deep learning architectures, namely Convolutional Neural Network (CNN), Long Short-Term
Memory Network (LSTM) and Gated Recurrent Unit (GRU); an attention mechanism is used in
all architectures as well. After parameter tuning, the CNN with attention mechanism, which is
our proposed model, outperforms the other architectures.
The system is also capable of detecting image splice forgery in news images using a
two-channel Faster R-CNN. For that purpose, a dataset has been prepared to train the network
used for image splice detection and localization in news article images. The
final score is a weighted sum of the results from the image and text classification models. The
live news articles are ranked on the basis of this score.
Moreover, a web application has been designed to display the ranked list of news articles
based on their authenticity and to view the article details. The application is a single-page
web application developed using Angular 8 as the front-end framework. The Python
Django web framework has been used to develop the backend of the system. A web scraper
module has also been developed that extracts news article text from online news articles. The
system ranks and displays them.
The models can be further improved if a good news dataset with both news and image content
becomes available. A dependable source of news source credibility features, such as ratings
and user following, would also help. We found that source credibility information is not
available for many sources and tackled this missing data by interpolation, but a good source
that analyses news sources could greatly help in the detection of fake news.
References
[1] Y. Yang, L. Zheng, J. Zhang, Q. Cui, X. Zhang, Z. Li and P. S. Yu, "TI-CNN:
Convolutional Neural Networks for Fake News Detection," arXiv, 2018.
[2] Y. Long , Q. Lu, R. Xiang, M. Li and C.-R. Huang, "Fake News Detection Through
Multi-Perspective Speaker Profiles," in International Joint Conference on Natural
Language Processing, Taipei, 2017.
[3] N. Ruchansky, S. Seo and Y. Liu, "CSI: A Hybrid Deep Model for Fake News
Detection," in Conference on Information and Knowledge Management, Singapore,
2017.
[4] J. Ma, W. Gao, P. Mitra, S. Kwon, B. J. Jansen, K.-F. Wong and M. Cha, "Detecting
rumors from microblogs with recurrent neural networks," in IJCAI'16 Proceedings of the
Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, 2016.
[5] G. Ballarin, M. L. D. Vedova, E. Tacchini, S. Moret and L. d. Alfaro, "Some Like it
Hoax: Automated Fake News Detection in Social Networks," in Workshop on Data Science for
Social Good (SoGood), Skopje, 2017.
[6] J. C. Reis, A. Correia, F. Murai, A. Veloso and F. Benevenuto, "Supervised Learning for
Fake News Detection," IEEE Intelligent Systems, vol. 34, no. 2, pp. 76-81, 2019.
[7] P. Bourgonje, J. M. Schneider and G. Rehm, "From Clickbait to Fake News Detection:
An Approach based on Detecting the Stance of Headlines to Articles," in EMNLP
Workshop on Natural Language Processing meets Journalism, Copenhagen, 2017.
[8] H. Ahmed, I. Traore and S. Saad, "Detection of Online Fake News Using N-Gram
Analysis and Machine Learning Techniques," in International Conference on Intelligent,
Secure, and Dependable Systems in Distributed and Cloud Environments, 2017.
[9] B. D. Horne and S. Adali, "This Just In: Fake News Packs a Lot in Title, Uses Simpler,
Repetitive Content in Text Body, More Similar to Satire than Real News," in
International Workshop on News and Public Opinion at ICWSM, 2017.
[10] F. Marra, D. Gragnaniello, D. Cozzolino and L. Verdoliva, "Detection of GAN-generated
Fake Images over Social Networks," in IEEE Conference on Multimedia Information
Processing and Retrieval, Miami, 2018.
[11] D. Cozzolino, G. Poggi and L. Verdoliva, "Recasting Residual-based Local Descriptors
as Convolutional Neural Networks: an Application to Image Forgery Detection," in ACM
Workshop on Information Hiding and Multimedia Security, Philadelphia, 2017.
[12] B. Bayar and M. C. Stamm, "A Deep Learning Approach to Universal Image
Manipulation Detection Using a New Convolutional Layer," in ACM Workshop on
Information Hiding and Multimedia Security, Vigo, 2016.
[13] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens and Z. Wojna, "Rethinking the Inception
Architecture for Computer Vision," in IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), Las Vegas, 2016.
[14] G. Huang, Z. Liu, K. Q. Weinberger and L. van der Maaten, "Densely Connected
Convolutional Networks," in IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), Honolulu, 2017.
[15] F. Chollet, "Xception: Deep Learning with Depthwise Separable Convolutions," in IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, 2017.
[16] J.-Y. Zhu, T. Park, P. Isola and A. A. Efros, "Unpaired Image-to-Image Translation
Using Cycle-Consistent Adversarial Networks," in IEEE International Conference on
Computer Vision (ICCV), Venice, 2017.
[17] Y. Rao and J. Ni, "A Deep Learning Approach to Detection of Splicing and Copy-Move
Forgeries in Images," in IEEE International Workshop on Information Forensics and
Security (WIFS), Abu Dhabi, 2016.
[18] M. Huh, A. Liu, A. Owens and A. A. Efros, "Fighting Fake News: Image Splice
Detection via Learned Self-Consistency," in European Conference on Computer Vision,
Munich, 2018.
[19] T. J. d. Carvalho, C. Riess, E. Angelopoulou, H. Pedrini and A. d. R. Rocha, "Exposing
Digital Image Forgeries by Illumination Color Classification," IEEE Transactions on
Information Forensics and Security, vol. 8, no. 7, pp. 1182-1194, 2013.
[20] P. Korus and J. Huang, "Evaluation of random field models in multi-modal unsupervised
tampering localization," in IEEE International Workshop on Information Forensics and
Security (WIFS), Abu Dhabi, 2016.
[21] J. Hays and A. A. Efros, "Scene Completion Using Millions of Photographs," in ACM
Transactions on Graphics, San Diego, 2007.
[22] M. Risdal, “Getting Real about Fake News,” Kaggle, 25-Nov-2016. [Online]. Available:
https://www.kaggle.com/mrisdal/fake-news.
[23] W. Y. Wang, “‘Liar, Liar Pants on Fire’: A New Benchmark Dataset for Fake
News Detection,” arXiv.org, 01-May-2017. [Online]. Available:
https://arxiv.org/abs/1705.00648.
[24] Gsantia, “BuzzFace,” GitHub, 31-Jul-2018. [Online]. Available:
https://github.com/gsantia/BuzzFace.
[25] FakeNewsChallenge, “FakeNewsChallenge(FNC1),” GitHub, 15-Jun-2017. [Online].
Available: https://github.com/FakeNewsChallenge/fnc-1.
[26] P. Sovathana, “CASIA dataset,” Kaggle, 04-Oct-2018. [Online]. Available:
https://www.kaggle.com/sophatvathana/casia-dataset.
[27] “Columbia Image Splicing Detection Evaluation Dataset (DVMM),” Columbia Image
Splicing Detection Evaluation Dataset. [Online]. Available:
http://www.ee.columbia.edu/ln/dvmm/downloads/AuthSplicedDataSet/dlform.html.
[28] Payamesfandiari, “payamesfandiari/fake_news_finder,” GitHub, 01-May-2018.
[Online]. Available:
https://github.com/payamesfandiari/fake_news_finder/tree/master/data/processed.
[29] K. Shu, D. Mahudeswaran, S. Wang, D. Lee and H. Liu, "FakeNewsNet: A Data
Repository with News Content, Social Context and Dynamic Information for Studying
Fake News on Social Media," ArXiv, 2018.
[30] “Gossip Cop,” Gossip Cop. [Online]. Available: https://www.gossipcop.com.
[31] “Fact-checking U.S. politics,” PolitiFact. [Online]. Available:
https://www.politifact.com/.
[32] “Knowledge Graph, AI Web Data Extraction and Crawling,” Diffbot. [Online].
Available: https://www.diffbot.com/.
[33] KaiDMML, “KaiDMML/FakeNewsNet,” GitHub, 19-Nov-2019. [Online]. Available:
https://github.com/KaiDMML/FakeNewsNet/tree/master/code.
[34] “EC2,” Amazon, 2006. [Online]. Available: https://aws.amazon.com/ec2/.
[35] “S3,” Amazon, 2006. [Online]. Available: https://aws.amazon.com/s3/.

[36] “Crawling API For Web Scrapers - Proxy Crawl,” ProxyCrawl. [Online]. Available:
https://proxycrawl.com/scraping-api-avoid-captchas-blocks.
[37] M. Everingham, L. Van Gool, C. K. Williams, J. Winn and A. Zisserman, "The PASCAL
Visual Object Classes (VOC) Challenge," International Journal of Computer Vision, pp.
303-338, 2010.
[38] M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. L.
Zitnick, P. Dollar and T. Y. Lin, "Microsoft COCO: Common Objects in Context," arXiv,
2015.
[39] P. Zhou, X. Han, V. I. Morariu and L. S. Davis, "Learning Rich Features for Image
Manipulation Detection," in CVPR, 2018.
