
An industrial-oriented major project report on

FAKE PROFILE DETECTION USING DEEP LEARNING


Submitted By
PRIYAM KUMAR UPADHYAY (17W91A05N5)
P ANANTHA SAI PHANI TEJA (17W91A05J5)
JATOTHU SAIDA NAYAK (17W91A05L4)
CHADA SRIJA (18W95A0501)

Under the esteemed guidance of

Dr. ANANTHA RAMAN G R


Professor, CSE
To
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY
HYDERABAD
In partial fulfilment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
2017– 2021

MALLA REDDY
INSTITUTE OF ENGINEERING & TECHNOLOGY, (MRET - W9)
(Sponsored by Malla Reddy Educational Society)
Permanently affiliated to JNTUH, Approved by AICTE,
Accredited by NBA & NAAC, An ISO 9001:2015 Certified Institution
Maisammaguda, Dhulapally post, Malkajgiri, Medchal-500100.
DECLARATION

We hereby declare that the project entitled “Fake Profile Detection Using Deep Learning”,
submitted to Malla Reddy Institute of Engineering and Technology (MRET-W9), affiliated to
Jawaharlal Nehru Technological University Hyderabad (JNTUH), for the award of the degree of
Bachelor of Technology in Computer Science & Engineering, is the result of an original
industrial-oriented project done by us.
We further declare that this project report, or any part thereof, has not previously been
submitted to any University or Institute for the award of a degree or diploma.

1. PRIYAM KUMAR UPADHYAY (17W91A05N5)


2. P ANANTHA SAI PHANI TEJA (17W91A05J5)
3. JATOTHU SAIDA NAYAK (17W91A05L4)
4. CHADA SRIJA (18W95A0501)
MALLA REDDY INSTITUTE OF ENGINEERING
& TECHNOLOGY
(Sponsored by Malla Reddy Educational Society)
Affiliated to JNTU, Hyderabad
Approved by AICTE | Accredited by NBA & NAAC
An ISO 9001:2015 Certified Institution
Maisammaguda, Gundlapochampally (Village), Near Dhulapally, Medchal
Malkajgiri (District), Secunderabad-500 100
Phone: +91 9573228520, +91 9676402608

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

BONAFIDE CERTIFICATE

This is to certify that the project titled “Fake Profile Detection Using Deep Learning”,
submitted by PRIYAM KUMAR UPADHYAY (17W91A05N5), P ANANTHA SAI PHANI
TEJA (17W91A05J5), JATOTHU SAIDA NAYAK (17W91A05L4), and CHADA SRIJA
(18W95A0501) in partial fulfilment of the requirements for the degree of BACHELOR OF
TECHNOLOGY IN COMPUTER SCIENCE AND ENGINEERING, Department of
Computer Science & Engineering, is a bonafide record of their work and has not been
submitted for the award of any other degree of this institution.

VIVA VOCE DATE: ________________

INTERNAL GUIDE HOD

EXTERNAL EXAMINER
ACKNOWLEDGEMENT

First and foremost, we are grateful to the Principal, Dr. M ASHOK, Professor, for providing
us with all the resources in the college to make our project a success. We thank him for his
valuable suggestions at the time of seminars, which encouraged us to give our best to the project.

We would like to express our gratitude to Dr. P KIRAN KUMAR REDDY Professor, Dean
of Academics for his support and valuable suggestions during the dissertation work.

We would like to express our gratitude to Dr. ANANTHA RAMAN G R Professor, Head of
the Department and our internal guide, Department of Computer Science and Engineering for
his support and valuable suggestions during the dissertation work.

We offer our sincere gratitude to our project coordinator, Mr. M NAGENDRA RAO, Assistant
Professor, Department of Computer Science and Engineering, who has supported us
throughout this project with his patience and valuable suggestions.

We would also like to thank all the supporting staff of the Dept. of CSE and all other
departments who have been helpful directly or indirectly in making the project a success.

We are extremely grateful to our parents for their blessings and prayers, which gave us the
strength to complete this project.

1. PRIYAM KUMAR UPADHYAY (17W91A05N5)


2. P ANANTHA SAI PHANI TEJA (17W91A05J5)
3. JATOTHU SAIDA NAYAK (17W91A05L4)
4. CHADA SRIJA (18W95A0501)
INDEX

Abstract
List of Figures
List of Screens
List of Tables
List of Abbreviations

CH NO.  CONTENT

1   INTRODUCTION
    1.1 Problem Statement
    1.2 Scope
    1.3 Limitations
    1.4 Conclusion
2   LITERATURE SURVEY
    2.1 Introduction
        2.1.1 Web Scraping
    2.2 Existing System
    2.3 Disadvantages of Existing System
    2.4 Objective
    2.5 Proposed System
    2.6 System Architecture
    2.7 Algorithms
    2.8 Deep Learning
    2.9 Data Analytics
    2.10 Conclusion
3   SYSTEM ANALYSIS
    3.1 Introduction
    3.2 Functional Requirements
    3.3 Non-Functional Requirements
    3.4 Goals of Implementation
    3.5 System Analysis
        3.5.1 Frontend Technologies
        3.5.2 Backend Technologies
        3.5.3 Development Software
    3.6 Conclusion
4   SYSTEM DESIGN
    4.1 Introduction
    4.2 Unified Modelling Language (UML)
        4.2.1 Use Case Diagram
        4.2.2 Class Diagram
        4.2.3 Component Diagram
        4.2.4 Deployment Diagram
        4.2.5 State Chart Diagram
        4.2.6 Activity Diagram
        4.2.7 Sequence Diagram
        4.2.8 Data Flow Diagram
    4.3 Conclusion
5   TECHNOLOGIES USED
    5.1 Introduction
    5.2 Deep Learning
        5.2.1 Importance of Deep Learning
    5.3 Neural Network
    5.4 Artificial Neural Network
        5.4.1 History of ANN
        5.4.2 Architecture of ANN
        5.4.3 Characteristics of ANN
        5.4.4 Working of ANN
    5.5 Conclusion
6   SYSTEM ENVIRONMENT SETUP
    6.1 Introduction
    6.2 Browser Setup
    6.3 Installation of Jupyter
        6.3.1 Conda
        6.3.2 Mamba
        6.3.3 pip
        6.3.4 pipenv
        6.3.5 Docker
        6.3.6 Supported Browsers
        6.3.7 Usage with a Private NPM Registry
        6.3.8 Installation Problems
    6.4 Design and Environmental Setup
        6.4.1 Input Design
        6.4.2 Objectives
        6.4.3 Output Design
    6.5 Conclusion
7   SYSTEM IMPLEMENTATION
    7.1 Introduction
    7.2 Proposed System Modules
    7.3 Snapshots
    7.4 Conclusion
8   SYSTEM TEST
    8.1 Introduction
    8.2 Types of Tests
    8.3 Test Results
    8.4 Conclusion
9   CONCLUSION
    9.1 Project Inference/Conclusion
    9.2 Project Scope & Enhancement
10  REFERENCES
    10.1 Book References
    10.2 Link References

ABSTRACT

Online Social Networks (OSN) are popular applications for sharing various data, including
text, photos, and videos. However, the fake account problem is one of the obstacles in current
OSN systems. Fake profiles play an important role in advanced persistent threats and are also
involved in other malicious activities. Attackers exploit fake accounts to distribute misleading
information such as malware, viruses, or malicious URLs.

Approaches to identifying fake profiles in social media can be classified into those that analyse
profile data and those that analyse individual accounts. Fake profile creation on social networks
is considered to cause more harm than any other form of cybercrime, and this crime has to be
detected even before the user is notified about the fake profile's creation. Many algorithms and
methods have been proposed in the literature for the detection of fake profiles.

This paper sheds light on the role of fake identities in advanced persistent threats and covers
the aforementioned approaches to detecting fake social media profiles. In order to make a
relevant prediction of fake or genuine profiles, we assess the impact of three supervised
machine learning algorithms: Random Forest (RF), Decision Tree (DT-J48), and Naïve Bayes
(NB).

Inspired by the success of deep learning in computer vision, mainly in automatic feature
extraction and representation, we propose Deep Profile, a deep neural network (DNN)
algorithm to deal with fake account issues. Instead of using standard machine learning, we
construct a dynamic CNN to train a learning model for fake profile classification. Notably, we
propose a novel pooling layer to optimize the neural network's performance during training.
As demonstrated by the experiments, we obtain a promising result, with better accuracy and
smaller loss than common learning algorithms in a malicious account classification task.

Keywords: Online social networks, User profiling, Fake profile detection, CNN, Machine
Learning

LIST OF FIGURES

S NO.  FIG NO.   DESCRIPTION
1      1.1.1     Graph showing increase in no. of fake accounts across Facebook
2      2.1.1     Training datasets over extracted and refined dataset
3      2.1.1.1   Web scraping process
4      2.5.1     Fake profile detection system architecture
5      2.6.1     Components of ML algorithms
6      2.7.1     Building components of web application
7      2.8.1     Deep learning and data analytics automation
8      3.5.2.1   Naïve Bayes algorithm
9      4.2.1.1   Use case diagram
10     4.2.2.1   Class diagram
11     4.2.3.1   Component diagram
12     4.2.4.1   Deployment diagram
13     4.2.5.1   State chart diagram
14     4.2.7.1   Sequence diagram
15     5.2.1     Deep learning origination
16     5.2.1.1   Layers of deep learning
17     5.3.1     Neural networks
18     5.4.1.1   Artificial Neural Network (ANN)
19     5.4.2.1   ANN layers
20     5.4.4.1   Working of ANN
21     6.2.1     Browser installation
22     6.2.2     Download browser for setup
23     6.2.3     Default browser selection
24     6.2.4     Acceptance to policy

LIST OF SCREENS

S NO.  SCREEN NO.  DESCRIPTION
1      7.3.1       Imported libraries
2      7.3.2       Trained and compiled output screen
3      7.3.3       Plot of training and validation accuracy values
4      7.3.4       Random forest classifier
5      7.3.5       Plot of training and validation loss values
6      7.3.6       Prediction and ground truth
7      7.3.7       AdaBoost classifier
8      7.3.8       Decision tree classifier

LIST OF TABLES

S NO.  TABLE NO.  DESCRIPTION
1      8.3        Test case results

LIST OF ABBREVIATIONS

ACRONYM DESCRIPTION

UI User Interface

API Application Programming Interface

ANN Artificial Neural Network

OSN Online Social Network

SVM Support Vector Machine

HTTP Hypertext Transfer Protocol

QoS Quality of Service

ML Machine Learning

SRS Software Requirement Specification

1 INTRODUCTION

One of the most popular applications on mobile devices is the Online Social Network (OSN).
Currently, it is an essential element of our daily lives. It has become a popular way to connect
people around the world and to share various data items, including videos, photos, and
messages. However, anomalous issues like fake accounts have become a significant concern in
OSN protection. Several studies propose techniques to improve OSN protection in various
ways. For instance, one study introduces a virtual channel model to improve OSN protection.

Commonly, each device has a security mechanism to unlock and access it, such as a PIN, a
password, or a keyboard pattern. Unfortunately, this conventional model puts user data at risk,
because there is no additional security to check the user's activities and behaviour after the
device is unlocked or the user logs in to the application. Because of these weaknesses,
unauthorized people may be able to crack the simple passwords or PINs of mobile phones or
even wearable devices.

Anomalous accounts have become one of the main challenges in current public OSNs. The
growing number of OSN users heightens the probability of malicious activities. Various
studies propose numerous techniques to distinguish benign from malicious accounts
effectively. Yet it remains a big challenge in OSNs that have a large number of users and a
large amount of information in a dynamic environment. In the transmission process, OSNs
are able to run either independently or in dependent groups. Moreover, for security reasons,
an OSN also organizes the scheme of a single user group.

To avoid all these problems, a lot of research work has been done, most of it in the field of
supervised and unsupervised classifiers from machine learning. Supervised machine learning
uses classifiers such as naïve Bayes, decision trees, SVMs, and ANNs, as well as deep learning
(CNNs). These classifiers detect fake accounts on social media. For the detection of fake
accounts, the first step is to select the target profile for analysis and extract its feature set, such
as name, chat history, location, friends list, followers, likes, comments, and tagging. Then a
supervised or unsupervised machine learning classifier is applied, which determines whether
the target profile is fake or genuine. A lot of work has been done with this technique, and it
gives successful detection results; most machine learning algorithms enhance the accuracy
rate of the system to between 50% and 96%.
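As a rough illustration of this pipeline, the sketch below trains a decision tree on a labelled
profile dataset using scikit-learn. The file name and feature columns are assumed placeholders,
not the exact feature set used in this project.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical labelled dataset: one row per profile, is_fake = 1 for fake.
    df = pd.read_csv("profiles.csv")
    features = ["followers", "friends", "posts", "likes", "comments"]  # assumed columns
    X, y = df[features], df["is_fake"]

    # Hold out part of the data to estimate detection accuracy.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    clf = DecisionTreeClassifier().fit(X_train, y_train)
    print("Accuracy:", clf.score(X_test, y_test))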

1.1 PROBLEM STATEMENT

In today's modern society, social media plays a vital role in everyone's life. The general
purposes of social media are keeping in touch with friends, sharing news, and so on. The
number of social media users is increasing exponentially. Instagram has recently gained
immense popularity among social media users; with more than 1 billion active users, it has
become one of the most used social media sites. After the emergence of Instagram on the
social media scene, people with a good number of followers have come to be called social
media influencers. These influencers have now become a go-to place for business
organizations to advertise their products and services.

The widespread use of social media has become both a boon and a bane for society. The use
of social media for online fraud and for spreading false information is increasing at a rapid
pace, and fake accounts are the major source of false information on social media. Business
organizations that invest huge sums of money in social media influencers must know whether
the following gained by an account is organic or not. So there is a widespread need for a fake
account detection tool that can accurately say whether an account is fake. In this paper, we
use classification algorithms in machine learning to detect fake accounts. The process of
finding a fake account mainly depends on factors such as engagement rate and artificial
activity.
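As a simple illustration of the engagement-rate factor mentioned above, the sketch below
computes interactions per follower. The formula and the example numbers are assumptions
for illustration, not values taken from this project.

    def engagement_rate(likes, comments, followers):
        """Average interactions per follower, as a percentage."""
        if followers == 0:
            return 0.0
        return 100.0 * (likes + comments) / followers

    # A suspiciously low rate on a large following can hint at bought followers.
    rate = engagement_rate(likes=120, comments=8, followers=50_000)
    print(f"{rate:.2f}% engagement")  # 0.26% -- low for 50k followers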

Figure 1.1.1: Graph showing increase in number of fake accounts across Facebook

1.2 SCOPE

Fake profile detection using deep learning is a system that allows users to seamlessly detect
fake profiles across an Online Social Network (OSN). It can be accessed from almost any
system, at any time, with an internet connection, a web browser, and a Python environment.

The project is a sincere effort to simplify the task of administrators in an easily usable format.
We planned to develop this system using many supervised machine learning techniques, with
deep learning for accuracy evaluation.

The proposed framework shows the sequence of processes that need to be followed for
continuous detection of fake profiles, with active learning from feedback on the results given
by the classification algorithm. This framework can easily be implemented by a social
networking company. Using this method and these parameters, fake profile detection becomes
easy, and as a result cybercrime may be reduced.

Some aspects of the project are:

• Open: accessible and readable to every user.
• Adaptable: requires minimal effort and time to understand.
• Multiple participants: supports multiple spectators' participation.
• Archived: data is stored permanently in .csv format.
• Searchable: easy to navigate and search for data sets.
• Filterable: only required content is displayed, with graphical data for analysis
purposes.
• Upgradable: can easily be upgraded when necessary.

Our application can expand to cover additional requirements thanks to its robustness for
enhancements and upgrades of the portal. One of its crucial aspects is its platform
independence. Its operational efficiency ensures that new users can access it at any time.

Another important property is the low latency of the portal, due to its neutral architecture and
its portability towards integrating multiple algorithms across platforms.

This project eliminates the need for constant physical enquiries, which saves a lot of time for
both researchers and facilities in the process of exchanging data sets.

This report discusses the implementation details of the project and the advantages of having
different visualization systems with supervised learning algorithms and neural networks.

1.3 LIMITATIONS

While the information available online is staggering and increasing enormously day by day,
even in our technological age not everyone is ready to sit down at a computer screen and read
for any great length of time to find out whether the information being displayed (the profile)
is genuine. Curling up in front of the fire on a cold day with a book in hand can never be
replaced by sitting in a chair staring at words and profiles on a computer screen.

The algorithm has a few downsides, such as inefficiency in handling categorical variables that
have different numbers of levels. Also, as the number of trees increases, the algorithm's time
efficiency takes a hit.

Fake profile detection also requires constant upkeep, even between issues. Data sets need to
be tested regularly in order to avoid link rot and inconsistencies. And because editing can be
done at any time, there is a responsibility attached to making sure that what needs to be fixed
is fixed.

Some limitations with this system are listed below:

• CNNs do not encode the position and orientation of objects into their predictions.
• They completely lose their internal data about the pose and orientation of an object,
routing all the information to the same neurons, which may not be able to deal with
this kind of information.
• They lack the ability to be spatially invariant to the input data.
• Large amounts of training data are required.
• Wait times need optimization for quality interface generation.
• They are inefficient at handling categorical variables.
• Fake profile detection currently lacks automatic data categorization.

• Fake profile detection lacks automatic detection and acknowledgement of
inappropriate data.

1.4 CONCLUSION

This chapter provides insight into the project and its various limitations and advantages. Here
we get a clear picture of what the system must do and what is expected of it.

This study provides an extensive investigation, with systematic analysis, of the impact of fake
profile detection tools, in order to identify their constraints and limitations. Unfortunately, we
agree with similar previous studies that current tools are still inadequate and inefficient as
replacements for traditional fake profile detection, due to several issues. We found that
developing a successful fake profile detection application is challenging because of four main
issues: complexity, the technology learning dilemma, integrity, and inefficiency. This study
discusses the main implications for shaping the future of Online Social Networks (OSN).

2 LITERATURE SURVEY

2.1 INTRODUCTION

A literature survey or a literature review in a project report is that Chapter which shows the
various analyses and research made in the field of your interest and the results already
published, considering the various parameters of the project and the extent of the project.

It is the most important part of your report, as it gives you direction in your research. It helps
you set a goal for your analysis, thus giving you your problem statement.

A growing OSN can increase the popularity of people and social ratings. As a practical
example, OSN users can gain popularity through many likes, followers, and comments.
However, it is too easy to create fake accounts, and people can buy them online at little cost;
for example, it is easy to buy Twitter and Instagram followers and likes on the internet.
Commonly, to detect anomalous accounts in an OSN, the methods analyse activity variations.
Typically, users' activities keep changing over a period. Sudden changes in information access
patterns and behaviour allow the server to catch a suspicious account. If this fails, the
anomalous account can infect the system with existing fraudulent content.

An infected account can also be caused by a cyborg, a type of fake account with a forged
identity. It changes the user's credibility and utilizes the compromised account to spread
misleading information and rumours and to polarize mass opinion. On the other hand, diverse
communities propose a range of dataset analyses with supervised or unsupervised learning to
address the problem. For instance, with a learning technique, a model can be trained on feature
data over a period to compute a user classification. Several papers also investigate fake node
detection with statistical methods, distributed spatial density-based approaches, SVM, or even
a combination of SVM, RF, and AdaBoost to detect OSN fake accounts.

Not just using OSN feature data, studies of fake detection can utilize dynamic data such as
behavioural analysis, graph theory, learning algorithms, and application patterns. Using these
features, they construct various approaches to identify and classify anomalies. To hinder the
suspicious activities of an intruder in a large OSN, one study explores a method that forms a
community detection algorithm. Another study presents a model based on social behaviour
that explores users' profiles to deal with the detection problem. By analysing the behaviour in
a single OSN environment, the model can classify compromised users. The methods determine
suspicious accounts at varying grades without regard to horizontal classification, or utilize
intelligent sensing models for detecting anomalies.

Various methods are used to obtain an efficient authentication process for multiple issues,
such as key agreement schemes to provide secure roaming services within the information.
The OSN environment needs a system for solving the malicious account problem in order to
obtain proper authentication. For roaming services with user anonymity, the scheme can be
considered as secure authentication and key agreement, physical-social location in the
network, rumour propagation, or even tracking user interaction in a joint community OSN.
Conventional techniques like CAPTCHA are an authentication process used when people and
applications are authorized in a system. However, it is hard to detect and stop fake accounts
with common security approaches. Conventional security techniques utilize CAPTCHAs and
SMS verification to verify accounts and avoid the creation of fake accounts. However,
attackers can beat these challenges with traditional methods: spammers can pass the obstacles
using CAPTCHA farms or SIM card farms.

Fake profile detection using deep learning is a notion that depends highly on the availability
of data sets. Generally, it is linked to the conditions in which it is viewed; therefore, it is a
highly subjective topic. Data set training aims to quantitatively represent the human perception
of quality by enhancing the UI.

Figure 2.1.1: Training datasets over extracted and refined dataset

Our contributions to this field include:

• We introduce a Spiral model architecture to achieve considerably higher accuracy.
• Developing a simple yet accurate model that requires low computational power.
• Considering the characteristics that are usually preferred.

• Providing a practical and interactive approach to data processing.
• Reducing the overall time required for retrieval of trained information from datasets.
• Working with parameters that have not been explored before for the purpose of
Online Social Network enhancement.

This report discusses the implementation details of the model and the advantages of having
different visualization systems for understanding the data standards.

2.1.1 Web Scraping


Web scraping, web harvesting, or web data extraction is data scraping used for extracting
data from websites. The web scraping software may directly access the World Wide Web using
the Hypertext Transfer Protocol or a web browser. While web scraping can be done manually
by a software user, the term typically refers to automated processes implemented using
a bot or web crawler. It is a form of copying in which specific data is gathered and copied from
the web, typically into a central local database or spreadsheet, for later retrieval or analysis.

Web scraping a web page involves fetching it and extracting data from it. Fetching is the
downloading of a page (which a browser does when a user views a page). Therefore, web
crawling is a main component of web scraping, to fetch pages for later processing. Once
fetched, then extraction can take place. The content of a page may be parsed, searched,
reformatted, its data copied into a spreadsheet or loaded into a database. Web scrapers typically
take something out of a page, to make use of it for another purpose somewhere else.

Figure 2.1.1.1: Web Scraping Process

Web scraping is used for contact scraping, and as a component of applications used for web
indexing, web mining and data mining, online price change monitoring and price comparison,
product review scraping (to watch the competition), gathering real estate listings, weather data

monitoring, website change detection, research, tracking online presence and reputation, web
mashup and, web data integration.

Techniques
Web scraping is the process of automatically mining data or collecting information from the
World Wide Web. It is a field with active developments sharing a common goal with
the semantic web vision, an ambitious initiative that still requires breakthroughs in text
processing, semantic understanding, artificial intelligence and human-computer interactions.
Current web scraping solutions range from the ad-hoc, requiring human effort, to fully
automated systems that are able to convert entire web sites into structured information, with
limitations.

Human copy-and-paste
The simplest form of web scraping is manually copying and pasting data from a web page into
a text file or spreadsheet. Sometimes even the best web-scraping technology cannot replace a
human's manual examination and copy-and-paste, and sometimes this may be the only
workable solution when the websites for scraping explicitly set up barriers to prevent machine
automation.

Text pattern matching


A simple yet powerful approach to extract information from web pages can be based on the
UNIX grep command or regular expression-matching facilities of programming languages (for
instance Perl or Python).
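For example, a minimal text-pattern-matching scraper in Python might look like the following
sketch; the URL and the regular expression are placeholders, not project specifics.

    import re
    from urllib.request import urlopen

    # Fetch the page, then extract all hyperlink targets with a regular expression.
    html = urlopen("https://example.com").read().decode("utf-8")
    links = re.findall(r'href="(https?://[^"]+)"', html)
    for link in links:
        print(link)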

HTTP programming
Static and dynamic web pages can be retrieved by posting HTTP requests to the remote web
server using socket programming.

HTML parsing
Many websites have large collections of pages generated dynamically from an underlying
structured source like a database. Data of the same category are typically encoded into similar
pages by a common script or template. In data mining, a program that detects such templates
in a particular information source, extracts its content and translates it into a relational form, is
called a wrapper. Wrapper generation algorithms assume that input pages of a wrapper
induction system conform to a common template and that they can be easily identified in terms

of a URL common scheme. Moreover, some semi-structured data query languages, such as
XQuery and HTQL, can be used to parse HTML pages and to retrieve and transform page
content.

DOM parsing
By embedding a full-fledged web browser, such as Internet Explorer or the Mozilla browser
control, programs can retrieve the dynamic content generated by client-side scripts. These
browser controls also parse web pages into a DOM tree, from which programs can retrieve
parts of the pages. Languages such as XPath can be used to query the resulting DOM tree.
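As a small DOM-parsing sketch, assuming the third-party lxml package is installed; the URL
and the XPath query below are placeholders.

    from urllib.request import urlopen
    from lxml import html

    # Parse the fetched page into a DOM tree and query it with XPath.
    page = urlopen("https://example.com").read()
    tree = html.fromstring(page)
    for a in tree.xpath("//a"):             # every anchor element in the document
        print(a.text_content(), "->", a.get("href"))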

Vertical aggregation
There are several companies that have developed vertical specific harvesting platforms. These
platforms create and monitor a multitude of "bots" for specific verticals with no "man in the
loop" (no direct human involvement), and no work related to a specific target site. The
preparation involves establishing the knowledge base for the entire vertical and then the
platform creates the bots automatically. The platform's robustness is measured by the quality
of the information it retrieves (usually number of fields) and its scalability (how quick it can
scale up to hundreds or thousands of sites). This scalability is mostly used to target the Long
Tail of sites that common aggregators find complicated or too labour-intensive to harvest
content from.

Semantic annotation recognizing


The pages being scraped may embrace metadata or semantic markups and annotations, which
can be used to locate specific data snippets. If the annotations are embedded in the pages,
as Microformat does, this technique can be viewed as a special case of DOM parsing. In
another case, the annotations, organized into a semantic layer, are stored and managed
separately from the web pages, so the scrapers can retrieve data schema and instructions from
this layer before scraping the pages.

Computer vision web-page analysis


There are efforts using machine learning and computer vision that attempt to identify and
extract information from web pages by interpreting pages visually as a human being might.

2.2 EXISTING SYSTEM

Naive Bayes classifiers are a family of simple probabilistic classifiers used in machine
learning. These classifiers are based on applying Bayes' theorem with strong (naive)
independence assumptions between the features.

Naive Bayes is a simple method for constructing classifiers: models that assign class labels to
problem instances, represented as vectors of feature values, where the class labels are drawn
from some finite set. It is not a single algorithm for training such classifiers, but a family of
algorithms based on a common principle: all naive Bayes classifiers assume that the value of a
particular feature is independent of the value of any other feature, given the class variable.
Naive Bayes classifiers are a popular statistical technique for email filtering. They emerged in
the middle of the 90s and were one of the first attempts to tackle the spam filtering problem.
Naive Bayes typically uses bag-of-words features to identify spam email, an approach
commonly used in text classification.

Naïve Bayes classifiers work by correlating the use of tokens (typically words, or sometimes
other constructions, syntactic or not) with spam and non-spam emails, and then using Bayes'
theorem to calculate the probability that an email is or is not a spam message.
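A minimal bag-of-words naive Bayes classifier in the spirit of this description can be sketched
with scikit-learn; the toy messages below are invented for illustration.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    messages = ["win money now", "meeting at noon", "cheap pills win", "lunch tomorrow?"]
    labels = [1, 0, 1, 0]                  # 1 = spam, 0 = non-spam

    vec = CountVectorizer()                # bag-of-words token counts
    X = vec.fit_transform(messages)

    # alpha=1.0 applies Laplace smoothing, which avoids the zero-frequency
    # problem for tokens never seen during training.
    clf = MultinomialNB(alpha=1.0).fit(X, labels)
    print(clf.predict(vec.transform(["win cheap money"])))  # -> [1]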

Before the implementation of fake profile detection using deep learning, there was no precise
model enabling hassle-free dataset study and analysis:
• Time was consumed in distributing data sets.
• Manual work was required.
• Lack of accessibility.
• Higher latency.
• Reduced reliability and portability.

2.3 DISADVANTAGES OF EXISTING SYSTEM

The existing systems have the following disadvantages:

• The main limitation is the assumption of independent predictors.
• It implicitly assumes that all the attributes are mutually independent, which is almost
impossible in real life.
• The current models require high computational power.
• The data attributes used by the current model do not take common data attributes into
consideration.
• The current model uses data attributes that require complex and huge mathematical and
statistical computations.
• A layman will not be able to understand the workings of the current system due to its
high complexity.
• Wait times need optimization for quality interface generation.
• It does not lend itself to a diversity of review data for maximum learning and
questioning of the model.
• If a categorical variable has a category in the test data set that was not observed in the
training data set, the model will assign it a zero probability and will be unable to make
a prediction. This is often known as the zero-frequency problem.

2.4 OBJECTIVE

The main objective of Fake Profile Detection is to provide users with a simple and practical
approach to learning, particularly the concept of handling and managing a single-point hosted
web-based system that seamlessly integrates agile functions to upload, change, and view
various files shared across the user interface website, with user authentication via a username
and password.

Another crucial aspect of Fake Profile Detection is to provide a hassle-free environment with
various options to navigate through the different data sets, analyses, and results.

The various objectives of Fake Profile Detection are:

• Provides spectators with a simple interface to interact with.
• Develops a practical use for learning algorithms.
• Reduces manual paperwork.
• Reduces process and distribution time.
• Increases reliability.
• Increases operational efficiency.
• Quick access.
• Low latency with high QoS (Quality of Service).
• New features can be added to the model as per requirements.
• Provides a secure and safe environment for the trained model.
• An interface for new datasets to integrate.
• Approval for a higher level of safety.
• Provides anonymity and a unique identity to outputs.
• Creates a responsive and agile system.
• Robust in nature.
• Generates interest in the field of scripting-based responsive models.

2.5 PROPOSED SYSTEM


The existing system uses the naïve Bayes method, in which finding an appropriate result is
somewhat difficult. To overcome this problem in detecting fake profiles, we use artificial
neural networks, which are far better than other methods. They give fast and accurate results
in separating fake from clean profiles.

Advantages

• Problems are represented by attribute-value pairs.
• They are used for problems where the target function's output may be discrete-valued,
real-valued, or a vector of several real- or discrete-valued attributes.
• The learning methods are quite robust to noise in the training data. The training
examples may contain errors, which do not affect the final output.
• They are used where fast evaluation of the learned target function is required.
• They can bear long training times, depending on factors such as the number of weights
in the network, the number of training examples considered, and the settings of various
learning algorithm parameters.

It greatly overcomes the lack of availability and converts the datasets into a fully automated
and managed model trained under controlled conditions.

It provides users and researchers with a simple interface to interact with the various aspects,
each holding individual functions. It reduces manual work and also minimizes the carriage of
huge data across the system and network.

Fake profile detection using deep learning reduces and optimizes study and distribution time
due to its robustness and simple architecture. It is a model that need not be deployed in a
cloud-based architectural environment.

Fake profile detection using deep learning is honed with increased reliability and increased
operational efficiency. It is a trained, supervised model that performs its intended function
adequately for the specified data set and operates in a defined environment without failure.
Low latency with high QoS and non-functional properties of the model such as performance,
reliability, availability, and platform independence come as standard with the proposed
project.

2.6 SYSTEM ARCHITECTURE


A deep learning model is designed to continually analyse data with a logic structure similar to
how a human would draw conclusions. To achieve this, deep learning applications use a layered
structure of algorithms called an artificial neural network. The design of an artificial neural
network is inspired by the biological neural network of the human brain, leading to a process
of learning that’s far more capable than that of standard machine learning models.

With the massive amounts of data being produced by the current "Big Data Era," we’re bound
to see innovations that we can’t even fathom yet, and potentially as soon as in the next ten
years. According to the experts, some of these will likely be deep learning applications.

“The analogy to deep learning is that the rocket engine is the deep learning models and the fuel
is the huge amounts of data we can feed to these algorithms.”

Figure 2.5.1: Fake Profile Detection System Architecture

Module Description
Select the profile

First, select the profile which is to be tested as fake or real. Proper care should be taken to
choose features that are not dependent on each other.

Extract the attributes

After proper selection of attributes, a dataset of previously identified fake and real profiles is
needed for training purposes. We made the real profile dataset ourselves, whereas the fake
profile dataset was provided.

Pass it through the algorithm

The selected attributes need to be extracted from the profiles (fake and genuine). Social
networking companies that want to implement our scheme don't need to follow the scraping
process; they can extract the features directly from their databases. We resorted to scraping
the profiles, since no social network dataset is publicly available for research into detecting
fake profiles.

Determine fake or real profiles

After this, the datasets of fake and real profiles are prepared. From these, 80% of both profiles
(real and fake) are used to prepare a training dataset and 20% of both are used to prepare a
testing dataset. We then measure the efficiency of the model trained on the training dataset.
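A sketch of this 80/20 split with scikit-learn, assuming the feature matrix X and the label
vector y have already been extracted from the profiles:

    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        X, y,
        test_size=0.20,    # 20% of the profiles held out for testing
        stratify=y,        # keep the fake/real ratio the same in both sets
        random_state=42,   # reproducible split
    )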

ANN Classifier

ANNs use different types of layers to determine whether profiles are real or fake. This is an
iterative method and gives accurate values. An ANN consists of many artificial neurons
interconnected through nodes.
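An illustrative fully connected ANN for this binary classification, assuming TensorFlow/Keras
is installed and that X_train and y_train come from the split above; the layer sizes and epoch
count are assumptions, not this project's tuned values.

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(X_train.shape[1],)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),   # probability of "fake"
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X_train, y_train, epochs=20, validation_data=(X_test, y_test))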

2.7 ALGORITHMS

Figure 2.6.1: Components of ML Algorithms

Machine learning (ML) is the study of computer algorithms that improve automatically
through experience and by the use of data. It is seen as a part of artificial intelligence. Machine
learning algorithms build a model based on sample data, known as "training data", in order to
make predictions or decisions without being explicitly programmed to do so. Machine learning
algorithms are used in a wide variety of applications, such as in medicine, email
filtering, speech recognition, and computer vision, where it is difficult or unfeasible to develop
conventional algorithms to perform the needed tasks.

A subset of machine learning is closely related to computational statistics, which focuses on
making predictions using computers; but not all machine learning is statistical learning. The
study of mathematical optimization delivers methods, theory and application domains to the
field of machine learning. Data mining is a related field of study, focusing on exploratory data
analysis through unsupervised learning. In its application across business problems, machine
learning is also referred to as predictive analytics.

2.8 DEEP LEARNING

Deep learning (also known as deep structured learning) is part of a broader family of machine
learning methods based on artificial neural networks with representation learning. Learning
can be supervised, semi-supervised or unsupervised.

Deep-learning architectures such as deep neural networks, deep belief networks, graph neural
networks, recurrent neural networks and convolutional neural networks have been applied to
fields including computer vision, speech recognition, natural language processing, machine
translation, bioinformatics, drug design, medical image analysis, material inspection
and board game programs, where they have produced results comparable to and in some
cases surpassing human expert performance.

Artificial neural networks (ANNs) were inspired by information processing and distributed
communication nodes in biological systems. ANNs have various differences from
biological brains. Specifically, neural networks tend to be static and symbolic, while the
biological brain of most living organisms is dynamic (plastic) and analogue.

Deep learning architectures can be constructed with a greedy layer-by-layer method. Deep
learning helps to disentangle these abstractions and pick out which features improve
performance.

For supervised learning tasks, deep learning methods eliminate feature engineering, by
translating the data into compact intermediate representations akin to principal components,
and derive layered structures that remove redundancy in representation.

Deep learning algorithms can be applied to unsupervised learning tasks. This is an important
benefit because unlabelled data are more abundant than the labelled data. Examples of deep
structures that can be trained in an unsupervised manner are neural history
compressors and deep belief networks.

Figure 2.7.1: Building Components of Web Application

2.9 DATA ANALYTICS

Data analytics (DA) is the process of examining data sets in order to find trends and draw
conclusions about the information they contain. Increasingly data analytics is used with the aid
of specialized systems and software. Data analytics technologies and techniques are widely

used in commercial industries to enable organizations to make more-informed business
decisions. It is also used by scientists and researchers to verify or disprove scientific models,
theories, and hypotheses.

Features Selection

Feature selection is one of the basic concepts in machine learning and hugely impacts the
performance of classification and prediction. In our work, in order to make our models train
well, we decided to use only features that directly affect the results.
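One common way to keep only the most informative features is univariate selection; the sketch
below uses scikit-learn's SelectKBest, where k=5 is an illustrative choice rather than this
project's actual setting.

    from sklearn.feature_selection import SelectKBest, chi2

    # Keep the 5 features scoring highest against the label; chi2 requires
    # non-negative feature values (e.g. counts).
    selector = SelectKBest(score_func=chi2, k=5)
    X_selected = selector.fit_transform(X, y)
    print(selector.get_support(indices=True))   # indices of the retained features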

Feasibility Study

The feasibility of the project is analyzed in this phase, and a business proposal is put forth
with a very general plan for the project and some cost estimates. During system analysis, the
feasibility study of the proposed system is carried out. This is to ensure that the proposed
system is not a burden to the company. For feasibility analysis, some understanding of the
major requirements for the system is essential.

The key considerations involved in the feasibility analysis are:

Economic Feasibility

This study is carried out to check the economic impact that the system will have on the
organization. The amount of funds that the company can pour into the research and
development of the system is limited, and the expenditures must be justified. The developed
system is well within the budget, which was achieved because most of the technologies used
are freely available; only the customized products had to be purchased.

Operational Feasibility

Are you into the production of “things”? Perhaps your answer would be yes. We naturally
don't call them things; instead, we call them products, services, or systems. Using the term
“things” sounds foreign because you can't just drop them into an area without touching them:
they need to be connected to an existing service or business. These “things” are an extension
of the organization where they are produced.

Figure 2.8.1: Deep Learning and Data Analytics Automation

Technical Feasibility

This study is carried out to check the technical feasibility, that is, the technical requirements
of the system. Any system developed must not place a high demand on the available technical
resources, as this would lead to high demands being placed on the client. The developed
system must have modest requirements, as only minimal or no changes are required for
implementing this system.

Social Feasibility

This aspect of the study checks the level of acceptance of the system by the user. This includes
the process of training the user to use the system efficiently. The user must not feel threatened
by the system, instead must accept it as a necessity. The level of acceptance by the users solely
depends on the methods that are employed to educate the user about the system and to make
him familiar with it. His level of confidence must be raised so that he is also able to make some
constructive criticism, which is welcomed, as he is the final user of the system.

2.10 CONCLUSION

From the literature survey we can conclude that we have overcome the drawbacks of the
existing model and provide a clean and easy-to-use model for researchers. We have studied
the various problems with the current model and have come up with a model that overcomes
them.

3 SYSTEM ANALYSIS

3.1 INTRODUCTION
A software requirements specification (SRS) is a document that captures a complete
description of how the system is expected to perform. It is usually signed off at the end of the
requirements engineering phase. It is a description of a software system to be developed; it
lays out functional and non-functional requirements and may include a set of use cases that
describe the interactions that the software must provide.

The important parts of the Software Requirements Specification (SRS) document are:

• Functional requirements of the system


• Non-functional requirements of the system
• Goals of implementation

These are the important parts of an SRS.

A software requirements specification is a description of a software system to be developed.
It is modelled after the business requirements specification, also known as a stakeholder
requirements specification.

3.2 FUNCTIONAL REQUIREMENTS OF THE SYSTEM

A Functional Requirement (FR) is a description of the service that the software must offer. It
describes a software system or its component. A function is nothing but inputs to the software
system, its behaviour, and outputs.

It can be a calculation, data manipulation, business process, user interaction, or any other
specific functionality which defines what function a system is likely to perform. Functional
Requirements in Software Engineering are also called Functional Specification.

The main functional requirements of Fake Profile Detection Using Deep learning are:

• It must extend an interface for users to interact with the system.

• The portal must extend to users an interface for viewing all the analysed graphs and
outputs.
• The platform must extend to users an interface to view their trained data.
• The model must be able to accept or reject uploads.
• The user must be extended an interface to change the view of the output.
• The user must be extended an interface to distinguish the outputs.
• The user must be extended an artifact to navigate between different aspects.
• The user must be extended an interface to quit.
• For developing the application, the following are the software requirements:
o Python
o Anaconda
• Operating Systems supported

o Windows 7/Windows XP/Windows 8

• Technologies and Languages used to Develop

o Python

• Debugger and Emulator

o Any Browser (Particularly Chrome)

• Hardware requirements for this model

For developing the application, the following are the hardware requirements:
o Processor: Pentium IV or higher
o RAM: 256 MB
o Space on hard disk: minimum 512 MB

The above indicate the main functionalities of the system.

3.3 NON-FUNCTIONAL REQUIREMENTS OF THE SYSTEM

The non-functional requirements of the system are:


• Confidentiality - The degree to which the software system protects sensitive data and
allows only authorized access to the data.
• Performance – Defines how well the software system accomplishes certain
functions under specific conditions. Examples include the software's speed of
response, throughput, execution time and storage capacity.
• Scalability - Property of a system that describes the ability to appropriately handle
increasing (and decreasing) workloads.
• Capacity - Deal with the amount of information or services that can be handled by
the component or system.
• Availability - The degree to which users can depend on the system to be up (able to
function) during “normal operating times”.
• Reliability - The extent to which the software system consistently performs the
specified functions without failure.
• Recoverability - The capability of the software to re-establish its level of
performance and recover the data directly affected in the case of a failure.
• Maintainability – It is how easy it is for a system to be supported, changed,
enhanced, and restructured over time.
• Security - The extent to which the system is safeguarded against deliberate and
intrusive faults from internal and external sources.
• Data Integrity - The degree to which the data maintained by the software system are
accurate, authentic, and without corruption.
• Usability - The ease with which the user can learn, operate, prepare inputs, and
interpret outputs through interaction with a system.

The above are the major non-functional requirements of the system.

3.4 GOALS OF IMPLEMENTATION

The main goal of this project is to develop a model that is user accessible and easily
understandable. This model/system aims to provide a practical approach to detecting fake
profiles across online social networks.
The model also aims to provide graphical and analysed insights to the user.

3.5 SYSTEM ANALYSIS

3.5.1 FRONTEND TECHNOLOGIES

Python
Python is an interpreted high-level general-purpose programming language. Python's design
philosophy emphasizes code readability with its notable use of significant indentation.
Its language constructs as well as its object-oriented approach aim to help programmers write
clear, logical code for small and large-scale projects.

Python is dynamically-typed and garbage-collected. It supports multiple programming
paradigms, including structured (particularly procedural), object-oriented and functional
programming. Python is often described as a “batteries included” language due to its
comprehensive standard library.

Python was created by Guido van Rossum and first released in 1991.
Python Features:
• Easy-to-learn – Python has few keywords, simple structure, and a clearly defined
syntax. This allows the student to pick up the language quickly.
• Easy-to-read – Python code is more clearly defined and visible to the eyes.
• Easy-to-maintain – Python’s source code is fairly easy to maintain.
• A broad standard library – the bulk of Python's library is very portable and
cross-platform compatible on UNIX, Windows, and Macintosh.

• Interactive Mode – Python has support for an interactive mode which allows interactive
testing and debugging of snippets of code.
• Portable – Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.
• Extendable – You can add low-level modules to the Python interpreter. These modules
enable programmers to add to or customize their tools to be more efficient.
• Databases – Python provides interfaces to all major commercial databases.
• GUI Programming – Python supports GUI applications that can be created and ported
to many system calls, libraries and windows systems, such as Windows MFC,
Macintosh, and the X Window system of Unix.
• Scalable – Python provides a better structure and support for large programs than shell
scripting.
Python Modules
Modules used are as follows
Numpy
Python has a strong set of data types and data structures, yet it wasn't designed for machine
learning per se. Enter numpy (pronounced num-pee). Numpy is a data handling library,
particularly one which allows us to handle large multi-dimensional arrays along with a huge
collection of mathematical operations. The following is a quick snippet of numpy in action.
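(The example below is illustrative.)

    import numpy as np

    a = np.array([[1, 2, 3], [4, 5, 6]])   # a 2x3 array
    print(a.shape)                          # (2, 3)
    print(a.mean(axis=0))                   # column means: [2.5 3.5 4.5]
    print(a @ a.T)                          # matrix product, shape (2, 2)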
NumPy is the fundamental package for scientific computing with Python. It contains among
other things:
• A powerful N-dimensional array object.
• Sophisticated (broadcasting) functions.
• Tools for integrating C/C++ and FORTRAN code.
• Useful linear algebra, Fourier transform, and random number capabilities.
Using NumPy in Python gives functionality comparable to MATLAB since they are both
interpreted, and they both allow the user to write fast programs if most operations work on
arrays or matrices instead of scalars. In comparison, MATLAB boasts many additional
toolboxes, notably Simulink, whereas NumPy is intrinsically integrated with Python, a more
modern and complete programming language. Moreover, complementary Python packages are
available; SciPy is a library that adds more MATLAB-like functionality and Matplotlib is a
plotting package that provides MATLAB-like plotting functionality. Internally, both MATLAB
and NumPy rely on BLAS and LAPACK for efficient linear algebra computations.
Pandas
Pandas is a Python package providing fast, flexible, and expressive data structures designed to
make working with structured (tabular, multidimensional, potentially heterogeneous) and time
series data both easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, real world data analysis in Python. Additionally, it has the broader goal of
becoming the most powerful and flexible open-source data analysis / manipulation tool
available in any language. It is already well on its way toward this goal. Here are some things
pandas does well:
• Easy handling of missing data (represented as NaN) in floating point as well as non-
floating-point data
• Size mutability: columns can be inserted and deleted from DataFrame and higher
dimensional objects
• Automatic and explicit data alignment: objects can be explicitly aligned to a set of
labels, or the user can simply ignore the labels and let Series, DataFrame, etc.
automatically align the data for you in computations
• Powerful, flexible group by functionality to perform split-apply-combine operations on
data sets, for both aggregating and transforming data.
• Make it easy to convert ragged, differently-indexed data in other Python and NumPy
data structures into DataFrame objects
• Intelligent label-based slicing, fancy indexing, and sub-setting of large data sets.
• Intuitive merging and joining data sets.
• Flexible reshaping and pivoting of data sets
• Hierarchical labelling of axes (possible to have multiple labels per tick)
• Robust IO tools for loading data from flat files (CSV and delimited), Excel files,
databases, and saving / loading data from the ultra-fast HDF5 format
• Time series-specific functionality: date range generation and frequency conversion,
moving window statistics, date shifting and lagging.
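A short illustrative snippet; profiles.csv and the is_fake column are placeholders, not names
mandated by this project.

    import pandas as pd

    df = pd.read_csv("profiles.csv")        # load a flat CSV file into a DataFrame
    df = df.dropna()                        # drop rows with missing (NaN) values
    print(df.describe())                    # summary statistics per numeric column
    print(df.groupby("is_fake").size())     # class balance of an assumed label column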

Matplotlib

Matplotlib is designed to be as usable as MATLAB, with the ability to use Python and the
advantage of being free and open-source. Each pyplot function makes some change to a figure:
e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area,
decorates the plot with labels, etc. There is also a procedural “pylab” interface based on a state
machine (like OpenGL), designed to closely resemble that of MATLAB, though its use is
discouraged. SciPy makes use of Matplotlib. Matplotlib was originally written by John D.
Hunter; since then it has had an active development community, and it is distributed under a
BSD-style license. Michael Droettboom was nominated as matplotlib's lead developer shortly
before John Hunter's death in August 2012, and was later joined by Thomas Caswell.
Matplotlib 2.0.x supports Python versions 2.7 through 3.6. Python 3 support started with
Matplotlib 1.2, and Matplotlib 1.4 was the last version to support Python 2.6. Matplotlib has
pledged not to support Python 2 past 2020 by signing the Python 3 Statement.
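
A minimal pyplot sketch of the figure/plot/label sequence described above (the data is arbitrary):

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
plt.figure()                   # create a figure
plt.plot(x, y, "o-")           # plot some lines in a plotting area
plt.xlabel("x")                # decorate the plot with labels
plt.ylabel("x squared")
plt.title("Minimal pyplot example")
plt.show()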
Scipy
Pronounced as Sigh-Pie, this is one of the most important Python libraries of all time. SciPy is
a scientific computing library for Python. It is built on top of NumPy and is a part of the SciPy
stack. This is yet another behind-the-scenes library which does a whole lot of heavy lifting. It
provides modules/algorithms for linear algebra, integration, image processing, optimization,
clustering, sparse matrix manipulation and more.
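
Two of those modules in a minimal sketch, integration and optimization (the integrand and objective function are arbitrary examples):

from scipy import integrate, optimize

# numerically integrate x**2 over [0, 1]; the exact answer is 1/3
area, error = integrate.quad(lambda x: x ** 2, 0, 1)

# find the minimum of (x - 2)**2 starting from x = 0
result = optimize.minimize(lambda x: (x - 2) ** 2, x0=0)
print(area, result.x)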

Scikit-learn
Scikit-learn is a free software machine learning library for the Python programming language.
It features various classification, regression and clustering algorithms, including support
vector machines, random forests, gradient boosting, k-means and DBSCAN, and it is designed
to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
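
A minimal scikit-learn sketch using one of the algorithms named above, a random forest, on synthetic data (this is only an interface illustration, not the project's model):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)   # synthetic toy data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(clf.score(X_test, y_test))   # mean accuracy on the held-out split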
Seaborn
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level
interface for drawing attractive and informative statistical graphics. Seaborn helps you explore
and understand your data. Its plotting functions operate on data frames and arrays containing
whole datasets and internally perform the necessary semantic mapping and statistical
aggregation to produce informative plots. Its dataset-oriented, declarative API lets you focus
on what the different elements of your plots mean, rather than on the details of how to draw
them.
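
A minimal seaborn sketch of this dataset-oriented API, using one of the example datasets bundled with the library:

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")   # a small example dataset shipped with seaborn
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time")
plt.show()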
3.5.2 BACKEND TECHNOLOGIES

MACHINE LEARNING ALGORITHMS

Machine learning is the study of computer algorithms that improve automatically through
experience and by the use of data. It is seen as a part of artificial intelligence. Most machine
learning methods train classifiers on features drawn from various social network attributes,
such as attribute similarity, network friend similarity, and IP address analysis. The machine
learning classifiers used in the proposed model are introduced below.

SUPPORT VECTOR MACHINE

Support vector machine (SVM) is a learning algorithm based on statistical learning theory.
SVM implements the principle of structural risk minimization, which minimizes the empirical
error and the complexity of the learner at the same time, and it achieves good generalization
performance in classification and regression tasks. The goal of SVM for classification is to
construct the optimal hyperplane with the largest margin. In general, the larger the margin, the
lower the generalization error of the classifier.

In this project, SVM was used with a linear and a Gaussian (RBF) kernel in training. The
Gaussian kernel places normal curves around the data points and sums them, so that the
decision boundary can be defined by a topological condition, such as the set of points where
the sum is above 0.5.
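
A hedged sketch of the two kernels mentioned above, trained with scikit-learn on synthetic data (it illustrates the kernels only, not the project's actual features):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)   # linear kernel
rbf_svm = SVC(kernel="rbf").fit(X_train, y_train)         # Gaussian (RBF) kernel
print(linear_svm.score(X_test, y_test), rbf_svm.score(X_test, y_test))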

NAÏVE’S BAYES THEOREM

Naive Bayes classifiers are a collection of classification algorithms based on Bayes' Theorem.
It is not a single algorithm but a family of algorithms where all of them share a common
principle, i.e., every pair of features being classified is independent of each other.

It is a classification technique based on Bayes' Theorem with an assumption of independence


among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a
particular feature in a class is unrelated to the presence of any other feature.
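
A minimal Gaussian Naive Bayes sketch on synthetic data, again only to illustrate the classifier's interface:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
nb = GaussianNB().fit(X_train, y_train)   # assumes features are conditionally independent
print(nb.score(X_test, y_test))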


Figure 3.5.2.1: Naive Bayes Algorithm

DATA ANALYTICS

Data analysis is a process of inspecting, cleansing, transforming, and modelling data with the
goal of discovering useful information, informing conclusions, and supporting decision-
making. Data analysis has multiple facets and approaches, encompassing diverse techniques
under a variety of names, and is used in different business, science, and social science
domains. In today's business world, data analysis plays a role in making decisions more
scientific and helping businesses operate more effectively.
Data requirements
Data are necessary as inputs to the analysis; what is needed is specified based upon the
requirements of those directing the analysis, or the customers who will use the finished product
of the analysis. The general type of entity upon which the data will be collected is referred to
as an experimental unit (e.g., a person or a population of people). Specific variables regarding
a population (e.g., age) may be specified and obtained. Data may be numerical or categorical
(i.e., a text label for numbers).

Data collection
Data are collected from a variety of sources. The requirements may be communicated by
analysts to custodians of the data, such as Information Technology personnel within an
organization. The data may also be collected from sensors in the environment, including traffic
cameras, satellites, recording devices, etc.

Data processing


Data, when initially obtained, must be processed or organized for analysis. For instance, this
may involve placing the data into rows and columns in a table format (known as structured
data) for further analysis, often through the use of spreadsheet or statistical software.

Data cleaning
Once processed and organized, the data may be incomplete, contain duplicates, or contain
errors. The need for data cleaning arises from problems in the way that data are entered and
stored. Data cleaning is the process of preventing and correcting these errors. Common tasks
include record matching, identifying inaccurate data, assessing the overall quality of existing
data, deduplication, and column segmentation.
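
As a small illustration, the pandas sketch below performs two of these tasks, deduplication and correcting a missing entry (the column names and values are made up):

import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2], "age": [25, 25, None]})
df = df.drop_duplicates()                          # deduplication
df["age"] = df["age"].fillna(df["age"].mean())     # correct a missing value
print(df)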

DEEP LEARNING

Deep learning (also known as deep structured learning) is part of a broader family of machine
learning methods based on artificial neural networks with representation learning. Learning
can be supervised, semi-supervised or unsupervised.

Deep-learning architectures such as deep neural networks, deep belief networks, graph neural
networks, recurrent neural networks and convolutional neural networks have been applied to
fields including computer vision, speech recognition, natural language processing, machine
translation, bioinformatics, drug design, medical image analysis, material inspection and board
game programs, where they have produced results comparable to and in some cases surpassing
human expert performance.

Artificial neural networks (ANNs) were inspired by information processing and distributed
communication nodes in biological systems. ANNs have various differences from
biological brains. Specifically, neural networks tend to be static and symbolic, while the
biological brain of most living organisms is dynamic (plastic) and analogue.

3.5.3 DEVELOPMENT SOFTWARE

SUBLIME TEXT EDITOR

Sublime Text is a shareware cross-platform source code editor with a Python application
programming interface (API). It natively supports many programming and markup languages,
and functions can be added by users with plugins, typically community-built and maintained
under free-software licenses.


It includes features such as syntax highlighting, auto-indentation, file type recognition, a
sidebar, macros, and plug-ins and packages that make it easy to work with a code base. Its
"Go to" functionality and many keyboard shortcuts make it easy for experienced developers
to navigate their way around, and to write and find code without having to take their hands
off the keyboard.

VISUAL STUDIO CODE EDITOR

Visual Studio Code is a free source-code editor made by Microsoft for Windows, Linux
and macOS. Features include support for debugging, syntax highlighting, intelligent code
completion, snippets, code refactoring, and embedded Git.

Visual Studio Code is a streamlined code editor with support for development operations
like debugging, task running, and version control. It aims to provide just the tools a
developer needs for a quick code-build-debug cycle and leaves more complex workflows
to fuller featured IDEs, such as Visual Studio IDE.

SUBLIME Vs VISUAL STUDIO TEXT EDITORS

Sublime Text makes it quick and easy to write code and navigate your way around when you
know what you're doing. Visual Studio Code provides more hand-holding and is a great option
for its debugging functionality, but it might slow some speedy, experienced programmers
down when it comes to writing code.

ANACONDA JUPYTER NOTEBOOK

Jupyter Notebook (formerly IPython Notebooks) is a web-based interactive computational
environment for creating Jupyter notebook documents. The term "notebook" can colloquially
refer to many different entities, mainly the Jupyter web application, the Jupyter Python web
server, or the Jupyter document format, depending on context. A Jupyter Notebook document
is a JSON document, following a versioned schema, containing an ordered list of input/output
cells which can contain code, text (using Markdown), mathematics, plots and rich media,
usually ending with the ".ipynb" extension.


3.6 CONCLUSION

The analysis tells us the requirement specifications of the project. The functional requirements
specify the functionality, whereas the software requirements tell us the software and supporting
files required to process the data. The hardware requirements tell us about the hardware
components required to run the software. The various requirements of the system were selected
through a rigorous survey; the development is done in such a way that we ensure that all the
requirements are met and the software is up to the standards of professional software.

System analysis is conducted for the purpose of studying a system or its parts in order to
identify its objectives. It is a problem-solving technique that improves the system and
ensures that all the components of the system work efficiently to accomplish their purpose.
Analysis specifies what the system should do.


4 SYSTEM DESIGN

4.1 INTRODUCTION

System Design is the process or art of defining the architecture, components, modules,
interfaces and data for a system to satisfy specified requirements. One could see it as the
application of systems theory to product development.

System design is the phase that bridges the gap between the problem domain and the existing
system in a manageable way. It is the phase where the SRS document is converted into a format
that can be implemented and decides how the system will operate.

The purpose of the design phase is to plan a solution to the problem specified by the
requirements document. This phase is the first step in moving from the problem domain to
the solution domain. In other words, starting with what is needed, design takes us toward
how to satisfy the needs. The design of a system is perhaps the most critical factor affecting
the quality of the software; it has a major impact on the later phases, particularly testing and
maintenance. The output of this phase is the design document. This document is similar to a
blueprint for the solution and is used later during implementation, testing and maintenance.
The design activity is often divided into two separate phases: System Design and Detailed
Design.

System Design, also called top-level design, aims to identify the modules that should be in the
system, the specifications of these modules, and how they interact with each other to produce
the desired results. At the end of the system design, all the major data structures, file formats,
output formats, and the major modules in the system and their specifications are decided.

During Detailed Design, the internal logic of each of the modules specified in system design
is decided. During this phase, the details of the data of a module are usually specified in a
high-level design description language, which is independent of the target language in which
the software will eventually be implemented. In system design the focus is on identifying
the modules, whereas during detailed design the focus is on designing the logic for each of
the modules. In other words, in system design the attention is on what components are needed,
while in detailed design the issue is how the components can be implemented in software.


Design is concerned with identifying software components, specifying the relationships among
them, specifying the software structure, and providing a blueprint for the implementation phase.
Modularity is one of the desirable properties of large systems. It implies that the system is
divided into several parts in such a manner that the interaction between parts is minimal and
clearly specified.

During the system design activities, Developers bridge the gap between the requirements
specification, produced during requirements elicitation and analysis, and the system that is
delivered to the user. Design is the place where the quality is fostered in development.
Software design is a process through which requirements are translated into a representation
of software.

In this phase, the complex activity of system development is divided into several smaller sub
activities, which coordinate with each other to achieve the main objective of system
development.

4.2 UNIFIED MODELING LANGUAGE (UML)

Unified Modelling Language (UML) is a general-purpose modelling language. The main aim
of UML is to define a standard way to visualize the way a system has been designed. It is quite
like the blueprints used in other fields of engineering.

UML is not a programming language; it is rather a visual language. We use UML diagrams to
portray the behaviour and structure of a system. UML helps software engineers, businessmen
and system architects with modelling, design and analysis. The Object Management Group
(OMG) adopted Unified Modelling Language as a standard in 1997. It’s been managed by
OMG ever since. International Organization for Standardization (ISO) published UML as an
approved standard in 2005. UML has been revised over the years and is reviewed periodically.

UML is linked with object-oriented design and analysis. UML makes use of elements and
forms associations between them to form diagrams. Diagrams in UML can be broadly
classified as:

Structural Diagrams – Capture static aspects or the structure of a system. Structural diagrams
include Component Diagrams, Object Diagrams, Class Diagrams and Deployment Diagrams.


Behaviour Diagrams – Capture dynamic aspects or the behaviour of the system. Behaviour
diagrams include Use Case Diagrams, State Diagrams, Activity Diagrams and Interaction
Diagrams.
4.2.1 USE CASE DIAGRAM

A use case is a methodology used in system analysis to identify, clarify and organize system
requirements. The use case is made up of a set of possible sequences of interactions between
systems and users in a particular environment and related to a particular goal. The method
creates a document that describes all the steps taken by a user to complete an activity.

Every use case contains three essential elements:


• The actor.
• The goal.
• The system.

Figure 4.2.1.1: Use Case Diagram


4.2.2 CLASS DIAGRAM

The class diagram is the main building block of object-oriented modelling. It is used for
general conceptual modelling of the structure of the application, and for detailed modelling
translating the models into programming code. Class diagrams can also be used for data
modelling.

Class diagrams give you a sense of orientation. They provide detailed insight into the
structure of your systems. At the same time they offer a quick overview of the synergy
happening among the different system elements as well as their properties and
relationships.

Figure 4.2.2.1: Class Diagram


4.2.3 COMPONENT DIAGRAM

A component diagram depicts how components are wired together to form larger components
or software systems. Component diagrams are used to illustrate the structure of arbitrarily
complex systems.

Component diagrams represent a set of components and their relationships. These


components consist of classes, interfaces, or collaborations. Component diagrams represent
the implementation view of a system. During the design phase, software artifacts (classes,
interfaces, etc.) of a system are arranged in different groups depending upon their
relationship. Now, these groups are known as components. Finally, it can be said component
diagrams are used to visualize the implementation.

Figure 4.2.3.1: Component Diagram


4.2.4 DEPLOYMENT DIAGRAM

A deployment diagram in the Unified Modelling Language models the physical deployment
of artifacts on nodes. To describe a web site, for example, a deployment diagram would show
what hardware components exist, what software components run on each node, and how the
different pieces are connected.

Deployment diagrams are a set of nodes and their relationships. These nodes are physical
entities where the components are deployed. Deployment diagrams are used for visualizing
the deployment view of a system. This is generally used by the deployment team. Note − If
the above descriptions and usages are observed carefully then it is very clear that all the
diagrams have some relationship with one another. Component diagrams are dependent upon
the classes, interfaces, etc. which are part of class/object diagram. Again, the deployment
diagram is dependent upon the components, which are used to make component diagrams.

Figure 4.2.4.1: Deployment Diagram


4.2.5 STATE CHART DIAGRAM

Any real-time system is expected to react to some kind of internal/external events. These
events are responsible for state changes of the system. A state chart diagram is used to
represent the event-driven state change of a system. It basically describes the state change of
a class, interface, etc. A state chart diagram is used to visualize the reaction of a system to
internal/external factors.

Figure 4.2.5.1: State chart Diagram

4.2.6 ACTIVITY DIAGRAM


An activity diagram visually presents a series of actions or flow of control in a system similar
to a flowchart or a data flow diagram. Activity diagrams are often used in business process
modelling. They can also describe the steps in a use case diagram. Activities modelled can be
sequential and concurrent.

Activity diagrams show high-level actions chained together to represent a process occurring
in your system. Activity diagrams are particularly good at modelling business processes.


4.2.7 SEQUENCE DIAGRAM

A sequence diagram is a type of interaction diagram because it describes how and in what
order a group of objects works together. These diagrams are used by software developers
and business professionals to understand requirements for a new system or to document an
existing process.

Sequence diagrams describe how and in what order the objects in a system function. These
diagrams are widely used by businessmen and software developers to document and understand
requirements for new and existing systems.

Figure 4.2.7.1 Sequence Diagram

4.2.8 DATA FLOW DIAGRAMS

A graphical tool used to describe and analyse the movement of data through a system, manual
or automated, including the processes, stores of data, and delays in the system. Data Flow
Diagrams are the central tool and the basis from which other components are developed. The
transformation of data from input to output, through processes, may be described logically
and independently of the physical components associated with the system. The DFD is also
known as a data flow graph or a bubble chart.

o DFDs are the model of the proposed system. They should clearly show the
requirements on which the new system should be built. Later, during the design
activity, this is taken as the basis for drawing the system's structure charts. The
basic notation used to create DFDs is as follows:

1. Dataflow: Data move in a specific direction from an origin to a destination.

2. Process: People, procedures, or devices that use or produce (transform) data. The
physical component is not identified.

3. Source: External sources or destinations of data, which may be people, programs,
organizations or other entities.

4. Data Store: Here data are stored or referenced by a process in the System.


4.3 CONCLUSION

Through the design content, we can describe the required modules and the different diagrams.
Using the diagrams, we can see what communications are present and also understand the
project easily. The modules help us in designing the project to fulfil the user requirements.
This phase has helped us understand the project better. These diagrams make the process of
construction remarkably simple.

A UML diagram is a partial graphical representation (view) of a model of a system under
design, implementation, or already in existence. A UML diagram contains graphical elements
(symbols) - UML nodes connected with edges (also known as paths or flows) - that represent
elements in the UML model of the designed system. The UML model of the system might also
contain other documentation such as use cases written as templated texts.

The kind of diagram is defined by the primary graphical symbols shown on the diagram.
For example, a diagram where the primary symbols in the contents area are classes is a class
diagram. A diagram which shows use cases and actors is a use case diagram. A sequence
diagram shows a sequence of message exchanges between lifelines.

The UML specification does not preclude mixing different kinds of diagrams, e.g. combining
structural and behavioural elements to show a state machine nested inside a use case.
Consequently, the boundaries between the various kinds of diagrams are not strictly enforced.
At the same time, some UML tools do restrict the set of available graphical elements which
can be used when working on a specific type of diagram.


5 TECHNOLOGIES USED

5.1 INTRODUCTION

Any system requires the implementation of various technologies which together help the
proposed system run. These technologies include both hardware and software. The seamless
integration between these components helps the system run smoothly; without proper
integration between these components, the system may develop unwanted complications. This
project takes advantage of multiple open-source software packages and some trained
supervised learning methods to accomplish the task. Some of the technologies used in this
project include:

• DEEP LEARNING
• NEURAL NETWORKS
• ARTIFICIAL NEURAL NETWORK

5.2 DEEP LEARNING


Deep learning is a branch of machine learning, which is itself a subset of artificial
intelligence. Since neural networks imitate the human brain, so does deep learning. In
deep learning, nothing is programmed explicitly. Basically, it is a class of machine learning
that makes use of numerous nonlinear processing units to perform feature extraction as well
as transformation, where the output from each preceding layer is taken as input by each
successive layer. Deep learning models are capable of focusing on the accurate features
themselves, requiring little guidance from the programmer, and are very helpful in solving
the problem of dimensionality. Deep learning algorithms are used especially when we have
a huge number of inputs and outputs. Since deep learning has evolved from machine learning,
which itself is a subset of artificial intelligence, and as the idea behind artificial intelligence is
to mimic human behaviour, so is "the idea of deep learning to build such algorithms that can
mimic the brain". Deep learning is a collection of statistical machine learning techniques for
learning feature hierarchies, based on artificial neural networks.


Figure 5.2.1: Deep learning origination

5.2.1 Importance of Deep Learning

• Machine learning works only with structured and semi-structured data, while deep
learning works with structured and unstructured data alike
• Deep learning algorithms can perform complex operations efficiently, while many
machine learning algorithms cannot
• Machine learning algorithms use labelled sample data to extract patterns, while deep
learning accepts large volumes of data as input and analyses the input data to extract
features out of an object
• The performance of machine learning algorithms decreases as the amount of data
increases; so, to maintain the performance of the model, we need deep learning


Figure 5.2.1.1: Layers of Deep Learning

Example of Deep Learning

In the figure given above, we provide the raw image data to the first layer, the input layer.
This input layer will then determine patterns of local contrast, meaning it will differentiate on
the basis of colours, luminosity, etc. The 1st hidden layer will then determine facial features,
i.e., it will fixate on eyes, nose, and lips, etc., and it will fixate those facial features on the
correct face template. So, in the 2nd hidden layer, it will actually determine the correct face,
as can be seen in the above image, after which it will be sent to the output layer. Likewise,
more hidden layers can be added to solve more complex problems, for example, if you want
to find a particular kind of face having a large or light complexion. So, as the hidden layers
increase, we are able to solve more complex problems.

5.3 NEURAL NETWORK


A neural network is made of artificial neurons that receive and process input data. Data is
passed through the input layer, the hidden layer, and the output layer. A neural network process
starts when input data is fed to it. Data is then processed via its layers to provide the desired
output.

A neural network learns from structured data and exhibits the output. Learning within neural
networks falls into three different categories:


1. Supervised Learning - with the help of labelled data, inputs, and outputs are fed to the
algorithms. They then predict the desired result after being trained on how to interpret
data.
2. Unsupervised Learning - ANN learns with no human intervention. There is no labelled
data, and output is determined according to patterns identified within the output data.
3. Reinforcement Learning - the network learns depending on the feedback you give it.

The essential building block of a neural network is a perceptron or neuron. It uses the
supervised learning method to learn and classify data.

How Neural Networks work

Neural networks are complex systems built from artificial neurons. An artificial neuron, or
perceptron, consists of:

• Input
• Weight
• Bias
• Activation Function
• Output

Figure 5.3.1: Neural Network


The neurons receive many inputs and produce a single output. Neural networks are layers of
neurons. These layers consist of the following:

• Input layer
• Multiple hidden layers
• Output layer

The input layer receives data represented by numeric values. The hidden layers perform most
of the computation required by the network, and the output layer finally predicts the output.
Each layer is made of neurons. Once the input layer receives data, it is redirected to the hidden
layer, and each input is assigned a weight.

A weight is a value in a neural network that transforms input data within the network's hidden
layers. The input layer takes the input data and multiplies it by the weight values, producing
values for the first hidden layer. The hidden layers transform the data and pass it on to the next
layer, and the output layer produces the desired output. The inputs and weights are multiplied,
and their sum is sent to the neurons in the hidden layer. A bias is applied to each neuron. Each
neuron adds up the inputs it receives, and this sum then passes through the activation function.
The outcome of the activation function decides whether the neuron is activated or not; an
activated neuron transfers information into the next layers. In this way the data flows through
the network until it reaches the output layer. This is called forward propagation: the process of
feeding data into the input nodes and getting the output through the output nodes. The neuron
in the output layer with the highest probability then projects the result. If the output is wrong,
back propagation takes place. When a neural network is designed, weights are initialized for
each input; back propagation means re-adjusting these weights to minimize the error, resulting
in a more accurate output.
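
To make the weighted-sum, bias and activation sequence above concrete, here is a minimal NumPy sketch of forward propagation; the layer sizes and the randomly generated weights are illustrative only, not the project's trained model:

import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.random(8)                             # one input sample with 8 features
W1, b1 = rng.random((4, 8)), rng.random(4)    # hidden layer: 4 neurons
W2, b2 = rng.random((1, 4)), rng.random(1)    # output layer: 1 neuron

h = relu(W1 @ x + b1)        # weighted sum plus bias, then activation
y = sigmoid(W2 @ h + b2)     # output layer produces a probability
print(y)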

5.4 ARTIFICIAL NEURAL NETWORK


ANN is an efficient computing system whose central theme is borrowed from the analogy of
biological neural networks. ANNs are also known as "artificial neural systems," "parallel
distributed processing systems," or "connectionist systems." An ANN acquires a large
collection of units that are interconnected in some pattern to allow communication between
the units.
These units, also referred to as nodes or neurons, are simple processors which operate in
parallel. Every neuron is connected with another neuron through a connection link. Each
connection link is associated with a weight that has information about the input signal. This is
the most useful information for neurons to solve a particular problem because the weight
usually excites or inhibits the signal that is being communicated. Each neuron has an internal
state, which is called an activation signal. Output signals, which are produced after combining
the input signals and activation rule, may be sent to other units. The term "Artificial Neural
Network" is derived from Biological neural networks that develop the structure of a human
brain. Similar to the human brain that has neurons interconnected to one another, artificial
neural networks also have neurons that are interconnected to one another in various layers of
the networks. These neurons are known as nodes. The Artificial Neural Network is biologically
inspired by the neural networks that constitute the human brain.

5.4.1 HISTORY OF ANN

The history of neural networking arguably began in the late 1800s with scientific endeavours
to study the activity of the human brain. In 1890, William James published the first work about
brain activity patterns. In 1943, McCulloch and Pitts created a model of the neuron that is still
used today in artificial neural networks. This model is segmented into two parts:

• A summation over weighted inputs.
• An output function of the sum.

Artificial Neural Network (ANN):

In 1949, Donald Hebb published "The Organization of Behaviour," which illustrated a law for
synaptic neuron learning. This law, later known as Hebbian Learning in honour of Donald
Hebb, is one of the most straightforward and simple learning rules for artificial neural
networks. In 1951, Marvin Minsky made the first Artificial Neural Network (ANN) while
working at Princeton. In 1958, "The Computer and the Brain" was published, a year after John
von Neumann's death. In that book, von Neumann proposed numerous extreme changes to how
analysts had been modelling the brain.


Figure 5.4.1.1: Artificial Neural Network

5.4.2 ARCHITECTURE OF ANN

Artificial Neural Network primarily consists of three layers:

Figure 5.4.2.1: ANN Layers

Input layer:

The input layer contains those artificial neurons (termed units) which receive input from the
outside world. This is where the network takes in the data on which it will learn, recognize, or
otherwise process.

Output layer:


The output layer contains units that respond to the information that is fed into the system and
indicate whether the network has learned the task or not.

Hidden layer:

The hidden layers sit between the input layer and the output layer. The only job of a hidden
layer is to transform the input into something meaningful that the output layer/unit can use in
some way.

Most artificial neural networks are fully interconnected, which means that each of the
hidden layers is individually connected to the neurons in its input layer and also to its output
layer, leaving nothing hanging in the air. This makes a complete learning process possible,
and learning occurs to the maximum when the weights inside the artificial neural network
get updated after each iteration.

5.4.3 CHARACTERISTICS OF ANN

• Neural networks have the ability to map input patterns to their associated output patterns
• Neural networks are able to generalise; hence, new outcomes can be predicted from
past patterns
• Neural networks are stable systems and are tolerant of faults; therefore, they can
distinguish complete patterns from incomplete, partial or noisy patterns
• Neural networks can process data at high speed and in a parallel, distributed manner

5.4.4 WORKING OF ANN

An Artificial Neural Network is best represented as a weighted directed graph, where the
artificial neurons form the nodes. The associations between the neuron outputs and the neuron
inputs can be viewed as directed edges with weights. The Artificial Neural Network receives
the input signal from the external source in the form of a pattern or an image, as a vector. These
inputs are then mathematically denoted by the notation x(n) for every n-th input.


Figure 5.4.4.1: Working of ANN

Afterward, each input is multiplied by its corresponding weight (these weights are the details
utilized by the artificial neural network to solve a specific problem). In general terms, these
weights represent the strength of the interconnections between neurons inside the artificial
neural network. All the weighted inputs are summed inside the computing unit. If the weighted
sum is zero, a bias is added to make the output non-zero, or to otherwise scale up the system's
response; the bias can be viewed as an extra input of 1 with its own weight. The total of the
weighted inputs can range from 0 to positive infinity. To keep the response within the limits
of the desired value, a certain maximum value is benchmarked, and the total of weighted
inputs is passed through the activation function.
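
A single artificial neuron reduced to code, as a hedged sketch (the inputs, weights and threshold activation below are arbitrary illustrations):

import numpy as np

def step(z):                            # a simple threshold activation function
    return 1 if z > 0 else 0

x = np.array([0.5, -1.0, 2.0])          # input signals x(n)
w = np.array([0.4, 0.3, 0.9])           # interconnection weights
bias = 0.1                              # bias shifts the response

weighted_sum = np.dot(x, w) + bias      # summed inside the computing unit
print(step(weighted_sum))               # activation decides whether the neuron fires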

5.5 CONCLUSION

Working on this project has exposed us to the various new and old technologies present in the
market. These technologies have specifically been selected for their versatility and how easily
they work with each other in tandem. We have gained a lot of insight into the various
technologies present in the market and their advantages and disadvantages.


6 SYSTEM ENVIRONMENT SETUP

6.1 INTRODUCTION

In this section we will describe how the development and the deployment setup was done for
the system. This can be used to replicate the exact workings of the system.

6.2 BROWSER SETUP

We will walk through the setup of the Chrome browser.

STEP 1: Go to https://www.google.com/chrome/ in your browser.

You can use any web browser to download Google Chrome. If you haven’t installed a
browser, you can use your operating system’s preinstalled web browser (Internet Explorer for
Windows and Safari for Mac OS X).

Figure 6.2.1: Browser installation

STEP 2: Click "Download Chrome". This will open the Terms of Service window.


Figure 6.2.2: Download Browser for setup

STEP 3: Determine whether you want Chrome as your default browser. If you set it as the
default browser, it will open whenever a link for a web page is clicked in another program,
such as email. You can also choose to send back crash reports, preferences and button clicks;
this does not send any personal information or track websites.

Figure 6.2.3: Default Browser Selection


STEP 4: Click “Accept and Install” after reading the Terms of Service. The installer will start
and you will have Google Chrome installed when it has finished. Depending on your browser
settings, you may need to allow the program to run.

Figure 6.2.4: Acceptance to policy

STEP 5: Sign in to Chrome. After installing, a Chrome window will open showing first-time
use information. You can sign in with your Google account to sync bookmarks, preferences,
and browsing history with any Chrome browser that you use. You can also read how to use
Google Chrome for some tips on your new browser.

6.3 INSTALLATION OF JUPYTER

Jupyter can be installed using conda, mamba, pip, pipenv or docker.

6.3.1 conda

If you use conda, you can install it with:

conda install -c conda-forge jupyterlab


6.3.2 mamba

If you use mamba, you can install it with:

mamba install -c conda-forge jupyterlab


6.3.3 pip

If you use pip, you can install it with:

pip install jupyterlab


If installing using pip install --user, you must add the user-level bin directory to your PATH
environment variable in order to launch jupyter lab. If you are using a Unix derivative
(FreeBSD, GNU/Linux, OS X), you can achieve this by using the
export PATH="$HOME/.local/bin:$PATH" command.

6.3.4 pipenv

If you use pipenv, you can install it as:

pipenv install jupyterlab
pipenv shell

or from a git checkout:

pipenv install git+git://github.com/jupyterlab/jupyterlab.git#egg=jupyterlab
pipenv shell
When using pipenv, in order to launch jupyter lab, you must activate the project's virtualenv.
For example, in the directory where pipenv's Pipfile and Pipfile.lock live (i.e., where you ran
the above commands):

pipenv shell
jupyter lab
Alternatively, you can run jupyter lab inside the virtualenv with

pipenv run jupyter lab

6.3.5 DOCKER

If you have Docker installed, you can install and use Jupyter by selecting one of the
many ready-to-run Docker images maintained by the Jupyter Team. Follow the instructions
in the Quick Start Guide to deploy the chosen Docker image.

Ensure your docker command includes the -e JUPYTER_ENABLE_LAB=yes flag to ensure
Jupyter is enabled in your container.
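
For example, a typical invocation might look like the following; the image name is one of the Jupyter Team's ready-to-run stacks and the port mapping reflects the usual default, both assumptions rather than requirements:

docker run -p 8888:8888 -e JUPYTER_ENABLE_LAB=yes jupyter/datascience-notebook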


6.3.6 SUPPORTED BROWSER

The latest versions of the following browsers are currently known to work:

• Firefox
• Chrome
• Safari
Earlier browser versions may also work, but come with no guarantees.

JupyterLab uses CSS Variables for styling, which is one reason for the minimum versions
listed above. IE 11+ and Edge 14 do not support CSS Variables and are not directly supported
at this time. A tool like postcss can be used to convert the CSS files in the jupyterlab/build
directory manually if desired.

6.3.7 USAGE WITH PRIVATE NPM REGISTRY

To install some extensions, you will need access to an NPM package registry. Some
companies do not allow reaching the public registry directly and have a private registry
instead. To use it, you need to configure npm and yarn to point to that registry (ask your
corporate IT department for the correct URL):

npm config set registry https://registry.company.com/
yarn config set registry https://registry.company.com/

JupyterLab will pick up that registry automatically. You can check which registry URL is
used by JupyterLab by running:

python -c "from jupyterlab.commands import AppOptions; print(AppOptions().registry)"

6.3.8 INSTALLATION PROBLEMS

If your computer is behind a corporate proxy or firewall, you may encounter HTTP and SSL
errors due to the proxy or firewall blocking connections to widely-used servers. For example,
you might see this error if conda cannot connect to its own repositories:

CondaHTTPError: HTTP 000 CONNECTION FAILED for url
<https://repo.anaconda.com/pkgs/main/win-64/current_repodata.json>
Here are some widely-used sites that host packages in the Python and JavaScript open-source
ecosystems. Your network administrator may be able to allow http and https connections to
these domains:


• pypi.org
• pythonhosted.org
• continuum.io
• anaconda.com
• conda.io
• github.com
• githubusercontent.com
• npmjs.com
• yarnpkg.com
Alternatively, you can specify a proxy user (usually a domain user with password) that is
allowed to communicate via the network. This can be easily achieved by setting two common
environment variables: HTTP_PROXY and HTTPS_PROXY. These variables are
automatically used by many open-source tools (like conda) if set correctly.

# For Windows
set HTTP_PROXY=http://USER:PWD@proxy.company.com:PORT
set HTTPS_PROXY=https://USER:PWD@proxy.company.com:PORT

# For Linux / MacOS
export HTTP_PROXY=http://USER:PWD@proxy.company.com:PORT
export HTTPS_PROXY=https://USER:PWD@proxy.company.com:PORT
In case you can communicate via HTTP but installation with conda fails on connectivity
problems to HTTPS servers, you can disable using SSL for conda.

Warning: Disabling SSL in communication is generally not recommended and involves
potential security risks.
# Configure conda to not use SSL
conda config --set ssl_verify False
You can do a similar thing for pip . The approach here is to mark repository servers as
trusted hosts, which means SSL communication will not be required for downloading Python
libraries.

# Install pandas (without SSL)
pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org pandas
Using the tips from above, you can handle many network problems related to installing
Python libraries.


Many Jupyter extensions require working npm and jlpm (an alias for yarn) commands,
which are needed for downloading useful Jupyter extensions or other JavaScript
dependencies. If npm cannot connect to its own repositories, you might see an error like:

ValueError: "@jupyterlab/toc" is not a valid npm package

You can set the proxy or registry used for npm with the following commands.

# Set proxy for NPM
npm config set proxy http://USER:PWD@proxy.company.com:PORT
npm config set https-proxy https://USER:PWD@proxy.company.com:PORT

# Set default registry for NPM (optional, useful in case common JavaScript libs cannot be
found)
npm config set registry http://registry.npmjs.org/
jlpm config set registry https://registry.yarnpkg.com/
In case you can communicate via HTTP, but installation with npm fails on connectivity
problems to HTTPS servers, you can disable using SSL for npm .

Warning: Disabling SSL in communication is generally not recommended and involves
potential security risks.
# Configure npm to not use SSL
npm set strict-ssl False

6.4 DESIGN AND ENVIRONMENT SETUP

6.4.1 INPUT DESIGN

The input design is the link between the information system and the user. It comprises
developing the specifications and procedures for data preparation, i.e., the steps necessary to
put transaction data into a usable form for processing. This can be achieved by having the
computer read data from a written or printed document, or by having people key the data
directly into the system. The design of input focuses on controlling the amount of input
required, controlling errors, avoiding delay, avoiding extra steps and keeping the process
simple. The input is designed in such a way that it provides security and ease of use while
retaining privacy. Input design considered the following things:

➢ What data should be given as input?


➢ How the data should be arranged or coded?


➢ The dialog to guide the operating personnel in providing input.
➢ The methods for preparing input validations and the steps to follow when errors occur.

6.4.2 OBJECTIVES

• Input Design is the process of converting a user-oriented description of the input into a
computer-based system. This design is important to avoid errors in the data input
process and show the correct direction to the management for getting correct
information from the computerized system.
• It is achieved by creating user-friendly screens for data entry to handle large volumes
of data. The goal of designing input is to make data entry easier and free from errors.
The data entry screen is designed in such a way that all the data manipulations can be
performed. It also provides record-viewing facilities.
• When the data is entered, it is checked for validity. Data can be entered with the help
of screens. Appropriate messages are provided as and when needed so that the user is
never left confused. Thus, the objective of input design is to create an input layout
that is easy to follow.

6.4.3 OUTPUT DESIGN

A quality output is one which meets the requirements of the end user and presents the
information clearly. In any system, the results of processing are communicated to the users
and to other systems through outputs. In output design it is determined how the information is
to be displayed for immediate need, as well as the hard copy output. It is the most important
and direct source of information to the user. Efficient and intelligent output design improves
the system's relationship with the user and helps decision-making.

• Designing computer output should proceed in an organized, well thought out manner;
the right output must be developed while ensuring that each output element is designed
so that people will find the system easy and effective to use. When analysts design
computer output, they should identify the specific output that is needed to meet the
requirements.
• Select methods for presenting information.


• Create document, report, or other formats that contain information produced by the
system.

The output form of an information system should accomplish one or more of the following
objectives.

• Convey information about past activities, current status or projections of the future.
• Signal important events, opportunities, problems, or warnings.
• Trigger an action.
• Confirm an action.

6.5 CONCLUSION

In this section we have gone through the environment setup used during development and
deployment of the project. We have gained a lot of knowledge on how to practically implement
these technologies.


7 SYSTEM IMPLEMENTATIONS

7.1 INTRODUCTION

In the implementation phase, the project plan is put into motion and the work of the project is
performed. The project takes shape during the implementation phase. This phase involves the
construction of the actual project result: programmers are occupied with coding, designers are
involved in developing graphic material, contractors are building, and the actual reorganisation
takes place.


7.2 PROPOSED SYSTEM MODULES

IMPORTING LIBRARIES

In []: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas_profiling import ProfileReport   # automated exploratory data reports
import keras                                 # deep learning framework
from keras import regularizers, optimizers
from keras.models import Sequential, model_from_json
from keras.layers import Dense, Dropout, Activation
from sklearn.model_selection import train_test_split
from numpy.random import seed
seed(1)                                      # fix the random seed for reproducibility


READ DATASET
In []: # genuine and fake user CSVs from the project dataset
df_users = pd.read_csv(r"C:\Users\priyam_upadhyay\Desktop\project_file\Detecting-Fake-Profiles-On-Social-Media-master\Detecting-Fake-Profiles-On-Social-Media-master/dataset/users.csv")
df_fusers = pd.read_csv(r"C:\Users\priyam_upadhyay\Desktop\project_file\Detecting-Fake-Profiles-On-Social-Media-master\Detecting-Fake-Profiles-On-Social-Media-master/dataset/fusers.csv")

In []: df_fusers.shape
Out[]: (3351, 38)

In []: df_users.shape
Out[]: (3474, 42)

Add isFake Column

In []: # label vectors: 0 for genuine users, 1 for fake users
isNotFake = np.zeros(3474)   # for df_users
isFake = np.ones(3351)       # for df_fusers

In []: # add an isFake column as the prediction target
df_fusers["isFake"] = isFake
df_users["isFake"] = isNotFake

Combine different datasets into one

In []: df_allUsers = pd.concat([df_fusers, df_users], ignore_index=True)
df_allUsers.columns = df_allUsers.columns.str.strip()

In []: # shuffle the whole dataset
df_allUsers = df_allUsers.sample(frac=1).reset_index(drop=True)


In []: df_allUsers.describe()

Out []:

In []: df_allUsers.head()

Out []:

Distribution of Data in X and Y

In []: Y = df_allUsers.isFake

In []: df_allUsers.drop(["isFake"], axis=1, inplace=True)
X = df_allUsers

In []: profile = ProfileReport(X, title="Pandas Profiling Report")



In []: Y.reset_index(drop=True, inplace=True)

In []: print(Y.shape)

Out []: (6825,)

In []: X.head()

Out []:

In []: # encode the language strings as integer codes
lang_list = list(enumerate(np.unique(X["lang"])))
lang_dict = {name: i for i, name in lang_list}
X.loc[:, "lang_num"] = X["lang"].map(lambda x: lang_dict[x]).astype(int)
X.drop(["name"], axis=1, inplace=True)   # the screen name is not a useful numeric feature

Feature Selection


In []: X = X[[
    "statuses_count",
    "followers_count",
    "friends_count",
    "favourites_count",
    "lang_num",
    "listed_count",
    "geo_enabled",
    "profile_use_background_image"
]]

In []: profile = ProfileReport(X, title="Pandas Profiling Report")
profile

In []: X = X.replace(np.nan, 0)  # replace missing boolean values with zeros, as missing means False

In []: profile = ProfileReport(X, title="Pandas Profiling Report")
profile

Split Data

In []: # 80/20 train-test split, then a further 80/20 train-validation split
train_X, test_X, train_y, test_y = train_test_split(X, Y, train_size=0.8, test_size=0.2, random_state=0)
train_X, val_X, train_y, val_y = train_test_split(train_X, train_y, train_size=0.8, test_size=0.2, random_state=0)

In []: print(train_X.shape)
print(test_X.shape)
print(train_y.shape)
print(test_y.shape)
Out []: (4368, 8)
(1365, 8)
(4368,)
(1365,)

Design Model

In []: model = Sequential()
model.add(Dense(32, activation='relu', input_dim=8))  # 8 selected profile features in
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))             # probability that the profile is fake
model.summary()
Out []:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense) (None, 32) 288
_________________________________________________________________
dense_1 (Dense) (None, 64) 2112
_________________________________________________________________
dense_2 (Dense) (None, 64) 4160
_________________________________________________________________
dense_3 (Dense) (None, 32) 2080
_________________________________________________________________
dense_4 (Dense) (None, 1) 33
=================================================================
Total params: 8,673
Trainable params: 8,673
Non-trainable params: 0

Compile Model
In []: model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])

Training
In []: history = model.fit(train_X, train_y,
epochs=15,
verbose=1,
validation_data=(val_X,val_y))
Out []:


Testing
In []: score = model.evaluate(test_X, test_y, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Out []: Test loss: 0.15890942513942719
Test accuracy: 0.9912087917327881
Graphs
In []: # Plot training and validation accuracy values
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
axes = plt.gca()
axes.set_xlim([0,14])


axes.set_ylim([0.85,1])
axes.grid(True, which='both')
axes.axhline(y=0.85, color='k')
axes.axvline(x=0, color='k')
axes.axvline(x=14, color='k')
axes.axhline(y=1, color='k')
plt.legend(['Train','Val'], loc='lower right')
plt.show()
Out []: [plot of training and validation accuracy]

In []: # Plot training and validation loss values


plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
axes = plt.gca()
axes.set_xlim([0,14])
axes.set_ylim([0,5])
axes.grid(True, which='both')


axes.axhline(y=0, color='k')
axes.axvline(x=0, color='k')
axes.axhline(y=5, color='k')
axes.axvline(x=14, color='k')
plt.legend(['Train','Val'], loc='upper right')
plt.show()
Out []: [plot of training and validation loss]

Prediction
In []: # Select the index of the test sample to inspect
prediction = model.predict(test_X[136:137])
prediction = prediction[0]
print('Prediction\n',prediction)
print('\nThresholded output\n',(prediction>0.5)*1)
Out []: Prediction
[0.9993391]
Thresholded output
[1]
Ground truth
In []: print(test_y[136:137])


Out []: 5389    1.0
Name: isFake, dtype: float64
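
The same 0.5 threshold extends naturally to the whole test set, which lets the reported test accuracy be recomputed by hand. A short sketch (not part of the original run):

# Threshold every predicted probability at 0.5 and compare to the labels
preds = (model.predict(test_X) > 0.5).astype(int).ravel()
print('Manually computed test accuracy:', (preds == test_y.values).mean())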
Saving and Loading the Model

Saving
In []: # serialize model to JSON
model_json = model.to_json()
# Write the file name of the model
with open("model.json", "w") as json_file:
json_file.write(model_json)
# serialize weights to HDF5
# Write the file name of the weights
model.save_weights("model.h5")
print("Saved model to disk")
Out []: Saved model to disk

Loading
In []: # load json and create model
# Write the file name of the model
json_file = open('model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)
# load weights into new model
# Write the file name of the weights
loaded_model.load_weights("model.h5")
print("Loaded model from disk")
Out []: Loaded model from disk
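
One caveat worth noting: model_from_json restores only the architecture and load_weights only the parameters, so the restored model must be compiled again before evaluate can report metrics. A minimal sketch, mirroring the training configuration used above:

# Re-compile the restored model with the same settings used in training
loaded_model.compile(optimizer='adam',
                     loss='binary_crossentropy',
                     metrics=['accuracy'])
score = loaded_model.evaluate(test_X, test_y, verbose=0)
print('Restored model accuracy:', score[1])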


7.3 SNAPSHOTS

Figure 7.3.1: Imported libraries

Figure 7.3.2: Trained and Compiled Output Screen


Figure 7.3.3: Plot training and validation accuracy values

Figure 7.3.4: Random Forest Classifier


Figure 7.3.5: Plot training and validation loss values

Figure 7.3.6: Prediction and Ground truth


Figure 7.3.7: AdaBoost Classifier

Figure 7.3.8: Decision Tree Classifier

7.4 CONCLUSION
Implementing this project helped us understand how to apply deep learning techniques in practice. We also gained a clearer picture of how machine learning and deep learning systems work, from data preparation and feature selection through model design, training, and evaluation, and how to put them to use in real-world situations.


8 SYSTEM TESTS

8.1 INTRODUCTION
The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, sub-assemblies, assemblies, and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail unacceptably. There are various types of tests; each type addresses a specific testing requirement.

8.2 TYPES OF TESTS


Unit testing

Unit testing involves the design of test cases that validate that the internal program logic is functioning properly and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application, done after the completion of an individual unit and before integration. This is structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at the component level and test a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results. A concrete example for this project is sketched below.
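
As a minimal sketch, assuming the language-encoding step from the implementation were wrapped in a helper named encode_lang (a hypothetical name introduced here for illustration), a unit test could check that the mapping is consistent:

import numpy as np
import pandas as pd

def encode_lang(df):
    # Hypothetical helper wrapping the encoding step from Section 7:
    # map each language code to a unique integer in a new column.
    lang_dict = {name: i for i, name in enumerate(np.unique(df["lang"]))}
    out = df.copy()
    out["lang_num"] = out["lang"].map(lang_dict).astype(int)
    return out

def test_encode_lang():
    df = pd.DataFrame({"lang": ["en", "es", "en"]})
    out = encode_lang(df)
    # Identical languages must receive the same integer code,
    # and distinct languages must receive distinct codes.
    assert out.loc[0, "lang_num"] == out.loc[2, "lang_num"]
    assert out.loc[0, "lang_num"] != out.loc[1, "lang_num"]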

Integration testing

Integration tests are designed to test integrated software components to determine if they run as one program. Testing is event-driven and is more concerned with the basic outcome of screens or fields. Integration tests demonstrate that although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is specifically aimed at exposing the problems that arise from the combination of components.

Functional test


Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation, and user manuals.
Functional testing is centred on the following items:
• Valid Input: Identified classes of valid input must be accepted.
• Invalid Input: Identified classes of invalid input must be rejected.
• Functions: Identified functions must be exercised.
• Output: Identified classes of application outputs must be exercised.
• Systems/Procedures: Interfacing systems or procedures must be invoked.

Organization and preparation of functional tests are focused on requirements, key functions, or special test cases. In addition, systematic coverage of business process flows, data fields, predefined processes, and successive processes must be considered for testing. Before functional testing is complete, additional tests are identified and the effective value of current tests is determined.

System Test

System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system testing is the
configuration-oriented system integration test. System testing is based on process descriptions
and flows, emphasizing pre-driven process links and integration points.

White Box Testing

White Box Testing is testing in which the software tester knows the inner workings, structure, and language of the software, or at least its purpose. It is used to test areas that cannot be reached from a black-box level.

Black Box Testing

Black Box Testing is testing the software without any knowledge of the inner workings, structure, or language of the module being tested. Black box tests, like most other kinds of tests, must be written from a definitive source document, such as a specification or requirements document. It is testing in which the software under test is treated as a black box: you cannot "see" into it. The test provides inputs and responds to outputs without considering how the software works. Applied to this project, such a test need only check the model's observable contract, as sketched below.
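
A sketch of such a black-box test, assuming only the trained model and the test features from the implementation are in scope:

def test_model_output_contract(model, test_X):
    # Black-box check: no assumptions about layers or weights, only
    # that one 8-feature row in yields one probability in [0, 1] out.
    pred = model.predict(test_X[0:1])
    assert pred.shape == (1, 1)
    assert 0.0 <= float(pred[0][0]) <= 1.0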

Test strategy and approach

Field testing will be performed manually and functional tests will be written in detail.

Test objectives

• All field entries must work properly.
• Pages must be activated from the identified link.
• The entry screen, messages and responses must not be delayed.

Features to be tested

• Verify that the entries are of the correct format.
• No duplicate entries should be allowed.
• All links should take the user to the correct page.
Integration Testing

Software integration testing is the incremental integration testing of two or more integrated software components on a single platform, intended to expose failures caused by interface defects.

The task of the integration test is to check that components or software applications, e.g., components in a software system or, one step up, software applications at the company level, interact without error. For this project, such a test is sketched below.
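
A sketch of an end-to-end check, assuming the loaded model and the raw feature frame from the implementation are available:

import numpy as np

def test_pipeline_integration(loaded_model, X):
    # Integration check: cleaned features flow through the restored
    # model end to end without shape or dtype errors.
    cleaned = X.replace(np.nan, 0)
    preds = loaded_model.predict(cleaned[:10])
    assert preds.shape == (10, 1)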

Test Results: All the test cases mentioned above passed successfully. No defects were encountered.

Acceptance Testing
User Acceptance Testing is a critical phase of any project and requires significant participation
by the end user. It also ensures that the system meets the functional requirements.

Test Results: All the test cases mentioned above passed successfully. No defects were encountered.


8.3 TEST RESULTS

Test Case Name:  Compile Test
Description:     The user must be able to access the system, and a proper
                 error message must be displayed at the time of accessing it.
Expected Value:  The system should be efficient and reliable in handling requests.
Actual Value:    The system is reliable.
Result:          Ready for Deployment.

Test Case Name:  Operation Test
Description:     Verify whether the user is able to perform and view results,
                 along with other functional operations such as graphical output.
Expected Value:  The user must have all the approved and correct results displayed.
Actual Value:    The system is reliable.
Result:          Ready for Deployment.

Test Case Name:  Explore Operation Test
Description:     Test the working of the ML algorithms and validate the course
                 of action.
Expected Value:  The model must support all the operations.
Actual Value:    The system is reliable.
Result:          Ready for Deployment.

8.4 CONCLUSION
Testing is an especially important phase during the development of a project. It helps us find bugs and unwanted issues within the system. During this phase, we found some bugs in the system that we could easily fix. This helped us determine whether the system was ready for real-world use. After rigorous testing, we found that the system is ready for deployment.


9 CONCLUSIONS

9.1 PROJECT INFERENCE / CONCLUSION

In this research, we have come up with an ingenious way to detect fake accounts on online social networks (OSNs). By using an artificial neural network to its full extent, we have eliminated the need for manual identification of fake accounts, which requires substantial human resources and is a time-consuming process. Existing systems have become obsolete due to advances in the creation of fake accounts, and the factors those systems relied upon are unstable. In this research, we used stable factors, such as engagement rate and artificial activity, to increase the accuracy of the prediction.

9.2 PROJECT SCOPE AND ENHANCEMENTS

Future work is to apply feature sets used in other spam-detection models and hence to realize multi-model ensemble prediction; a rough sketch of this direction is given below. Another direction is to make the system robust against adversarial attacks, such as a botnet that diversifies all of its features, or an attacker that learns from failures.
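
As an illustration only, assuming scikit-learn classifiers trained on the same eight features as the network (cf. Figures 7.3.4 and 7.3.7), the fake-probabilities of several models could be averaged before thresholding:

import numpy as np
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

# Train two tree-based classifiers on the same features as the network
rf = RandomForestClassifier(random_state=0).fit(train_X, train_y)
ada = AdaBoostClassifier(random_state=0).fit(train_X, train_y)

# Soft voting: average each model's estimated probability of "fake"
p_nn = model.predict(test_X).ravel()
p_rf = rf.predict_proba(test_X)[:, 1]
p_ada = ada.predict_proba(test_X)[:, 1]
ensemble_pred = ((p_nn + p_rf + p_ada) / 3 > 0.5).astype(int)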

There is always a possibility of improvement. We have implemented all the core features of the proposed system, but we believe a few more advancements can be made to this artifact:
• Improving the accuracy of the model.
• Working with a larger set of tuples and data from varied platforms for increased reliability.
• Advancing the user interface for better interaction.
• Optimizing the time required for data injection and retrieval.


10 REFERENCES

10.1 BOOK REFERENCES

➢ Thomas Powell, JavaScript: The Complete Reference, 3rd Edition.
➢ Nambouri Sravya, Chavana Sai Praneetha, and S. Saraswathi, "Identify the Human or Bots Twitter Data using Machine Learning Algorithms," International Research Journal of Engineering and Technology (IRJET), Volume 06, Issue 03, March 2019, www.irjet.net, e-ISSN: 2395-0056, p-ISSN: 2395-0072.
➢ M. Smruthi, N. Harini,” A Hybrid Scheme for Detecting Fake Accounts in Facebook”,
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-
3878, Volume-7, Issue-5S3, February 2019.
➢ Tehlan, Pooja, Rosy Madaan, and Komal Kumar Bhatia. "A Spam Detection Mechanism in Social Media using Soft Computing."
➢ Rao, P. S., J. Gyani, and G. Narsimha. "Fake profiles identification in online social
networks using machine learning and NLP." Int. J. Appl. Eng. Res 13.6 (2018): 973-
4562.
➢ Raturi, Rohit. "Machine learning implementation for identifying fake accounts in social network." International Journal of Pure and Applied Mathematics 118.20 (2018): 4785-4797.
➢ Van Der Walt, Estée, and Jan Eloff. "Using machine learning to detect fake identities:
bots vs humans." IEEE Access 6 (2018): 6540-6549.
➢ Kulkarni, Sumit Milind, and Vidya Dhamdhere. "Automatic detection of fake profiles in online social networks." Open Access International Journal of Science and Engineering 3.1 (2018): 70-73.
➢ Ala'M, Al-Zoubi, Ja'far Alqatawna, and Hossam Faris. "Spam profile detection in social
networks based on public features." 2017 8th International Conference on information
and Communication Systems (ICICS). IEEE, 2017.

82
MRIET
FAKE PROFILE DETECTION USING DEEP LEARNING

➢ Elovici, Yuval, and Gilad Katz. "Method for detecting spammers and fake profiles in
social networks." U.S. Patent No. 9,659,185. 23 May 2017.
➢ Gurajala, Supraja, et al. "Profile characteristics of fake Twitter accounts." Big Data &
Society 3.2 (2016): 2053951716674236.
➢ Ferrara, Emilio, et al. "Predicting online extremism, content adopters, and interaction
reciprocity." International conference on social informatics. Springer, Cham, 2016.
➢ Caspi, Avner, and Paul Gorsky. "Online deception: Prevalence, motivation, and
emotion." Cyber Psychology & Behaviour 9.1 (2006): 54-59.
➢ Bergen, Emilia, et al. "The effects of using identity deception and suggesting secrecy on the outcomes of adult-adult and adult-child or adolescent online sexual interactions." Victims & Offenders 9.3 (2014): 276-298.
➢ Wani, Suheel Yousuf, Mudasir M. Kirmani, and Syed Imamul Ansarulla. "Prediction
of fake profiles on Facebook using supervised machine learning techniques-A
theoretical model." International Journal of Computer Science and Information
Technologies (IJCSIT) 7, no. 4 (2016): 1735-1738.
➢ Wu, W., Alvarez, J., Liu, C. and Sun, H.M., 2018. Bot detection using unsupervised
machine learning. Microsystem Technologies, 24(1), pp.209-217.

10.2 LINK REFERENCES

➢ https://www.google.co.in/
➢ https://jupyter.org/install
➢ https://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/
➢ https://www.datarobot.com/wiki/deep-learning/
➢ https://github.com/

