D8 - Fake Profile Detection (Gpku)
MALLA REDDY
INSTITUTE OF ENGINEERING & TECHNOLOGY, (MRET - W9)
(Sponsored by Malla Reddy Educational society)
Permanently affiliated to JNTUH, Approved by AICTE,
Accredited by NBA & NAAC, An ISO 9001:2015 Certified Institution
Maisammaguda, Dhulapally post, Malkajgiri, Medchal-500100.
DECLARATION
We hereby declare that the project entitled “Fake Profile Detection Using Deep Learning”,
submitted to Malla Reddy Institute of Engineering and Technology (MRET-W9), affiliated to
Jawaharlal Nehru Technological University Hyderabad (JNTUH), for the award of the degree
of Bachelor of Technology in Computer Science & Engineering, is the result of an original,
industry-oriented project done by us.
It is further declared that this project report, or any part thereof, has not previously been
submitted to any University or Institute for the award of a degree or diploma.
BONAFIDE CERTIFICATE
This is to certify that the project titled “Fake Profile Detection Using Deep
Learning” is submitted by PRIYAM KUMAR UPADHYAY (17W91A05N5),
P ANANTHA SAI PHANI TEJA (17W91A05J5), JATOTHU SAIDA NAYAK
(17W91A05L4), and CHADA SRIJA (18W95A0501) of B. Tech in partial
fulfilment of the requirements for the degree of BACHELOR OF TECHNOLOGY IN
COMPUTER SCIENCE AND ENGINEERING, Department of Computer Science &
Engineering, and that it has not been submitted for the award of any other degree of this
institution.
EXTERNAL EXAMINER
ACKNOWLEDGEMENT
First and foremost, we are grateful to the Principal, Dr. M ASHOK, Professor, for providing us
with all the resources in the college to make our project a success. We thank him for his
valuable suggestions during the seminars, which encouraged us to give our best to the project.
We would like to express our gratitude to Dr. P KIRAN KUMAR REDDY, Professor and Dean
of Academics, for his support and valuable suggestions during the dissertation work.
We would like to express our gratitude to Dr. ANANTHA RAMAN G R, Professor, Head of
the Department, and our internal guide, Department of Computer Science and Engineering, for
his support and valuable suggestions during the dissertation work.
We offer our sincere gratitude to our project coordinator, Mr. M NAGENDRA RAO, Assistant
Professor in the Computer Science and Engineering department, who has supported us
throughout this project with his patience and valuable suggestions.
We would also like to thank all the supporting staff of the Dept. of CSE and all other
departments who have been helpful directly or indirectly in making the project a success.
We are extremely grateful to our parents, whose blessings and prayers gave us the strength to
complete our project.
Abstract I
List of Figures II
List of Screens III
List of Tables III
List of Abbreviations IV
ABSTRACT
Online Social Networks (OSNs) are popular applications for sharing various kinds of data,
including text, photos, and videos. However, fake accounts are one of the obstacles in current
OSN systems. Fake profiles play an important role in advanced persistent threats and are also
involved in other malicious activities. Attackers exploit fake accounts to distribute harmful
content such as malware, viruses, or malicious URLs.
Approaches to identifying fake profiles in social media can be classified into those aimed at
analyzing profile data and those aimed at individual accounts. Fake profile creation on social
networks is considered to cause more harm than any other form of cybercrime, and this crime
has to be detected even before the user is notified of the fake profile. Many algorithms and
methods have been proposed in the literature for the detection of fake profiles.
This paper sheds light on the role of fake identities in advanced persistent threats and covers
the aforementioned approaches to detecting fake social media profiles. In order to make a
relevant prediction of fake or genuine profiles, we assess the impact of three supervised
machine learning algorithms: Random Forest (RF), Decision Tree (DT-J48), and Naïve Bayes
(NB). Inspired by the success of deep learning in computer vision, mainly in automatic feature
extraction and representation, we propose Deep Profile, a deep neural network (DNN)
algorithm to deal with fake account issues. Instead of using standard machine learning, we
construct a dynamic CNN to train a learning model for fake profile classification. Notably, we
propose a novel pooling layer to optimize the neural network's performance during training.
As demonstrated by the experiments, we obtain promising results, with better accuracy and
smaller loss than common learning algorithms on a malicious account classification task.
Keywords: Online social networks, User profiling, Fake profile detection, CNN, Machine
Learning
MRIET I
FAKE PROFILE DETECTION USING DEEP LEARNING
LIST OF FIGURES
S.NO.  FIG. NO.  DESCRIPTION  PAGE NO.
1  1.1.1  Graph showing increase in no. of fake accounts across Facebook  2
2 2.1.1 Training Datasets over extracted and refined dataset 7
3 2.1.1.1 Web Scraping Process 8
4 2.5.1 Fake Profile Detection System Architecture 15
5 2.6.1 Components of ML Algorithms 16
6 2.7.1 Building Components of web Application 18
7 2.8.1 Deep Learning and Data Analytics Automation 20
8 3.5.2.1 Naïve Bayes Algorithm 29
9 4.2.1.1 Use case Diagram 35
10 4.2.2.1 Class Diagram 36
11 4.2.3.1 Component Diagram 37
12 4.2.4.1 Deployment Diagram 38
13 4.2.5.1 State Chart Diagram 39
14 4.2.7.1 Sequence Diagram 40
15 5.2.1 Deep Learning Origination 44
16 5.2.1.1 Layers of Deep Learning 45
17 5.3.1 Neural Networks 46
18 5.4.1.1 Artificial Neural Network (ANN) 49
19 5.4.2.1 ANN Layers 49
20 5.4.4.1 Working of ANN 51
21 6.2.1 Browser Installation 52
22 6.2.2 Download Browser for setup 53
23 6.2.3 Default Browser Selection 53
24 6.2.4 Acceptance to Policy 54
LIST OF SCREENS
1  7.3.1  Imported Libraries  73
7  7.3.7  AdaBoost Classifier  76
LIST OF TABLES
LIST OF ABBREVIATIONS
ACRONYM DESCRIPTION
UI User Interface
ML Machine Learning
1 INTRODUCTION
One of the most popular applications on mobile devices is the Online Social Network (OSN).
Currently, it is an essential element of our daily lives. It has become a popular way to connect
people around the world and to share various data items, including videos, photos, and
messages. However, anomalous issues like fake accounts have become a significant concern
in OSN protection. Several studies propose techniques to improve OSN protection in various
ways; for instance, one study introduces a virtual channel model to improve OSN protection.
Commonly, each device has a security mechanism for unlocking and accessing it, such as a
PIN, a password, or a keyboard pattern. Unfortunately, this conventional model puts user data
at risk because there is no additional security that checks the user's activities and behaviour
after the device has been unlocked or the application logged in to. Because of these
weaknesses, unauthorized people may be able to crack the simple passwords or PINs of
mobile phones or even wearable devices.
The anomalous account has become one of the main challenges in current public OSNs. The
growing number of users on OSNs raises the probability of malicious activity. Various studies
propose numerous techniques to distinguish benign from malicious accounts effectively. Yet
it remains a big challenge in OSNs, which have large numbers of users and large amounts of
information in a dynamic environment. In the transmission process, OSNs are able to run
either independently or in dependent groups. Moreover, for security reasons, an OSN also
organizes the scheme of a single user group.
To avoid these problems, a great deal of research has been done, most of it in the field of
supervised and unsupervised machine learning classifiers. Supervised machine learning uses
classifiers such as naïve Bayes, decision trees, SVMs, and ANNs, as well as deep learning
(CNNs). These classifiers are used to detect fake accounts on social media. For the detection
of fake accounts, the first step is to select the target profile for analysis and to extract the
profile's feature set, such as name, chat history, location, friends list, followers, likes,
comments, and tagging. Then a supervised or unsupervised machine learning classifier is
applied, which determines whether the target profile is fake or genuine. A lot of work has
been done with this technique, with successful detection results; most machine learning
algorithms help raise the accuracy rate of such systems to somewhere between 50% and 96%.
In today's modern society, social media plays a vital role in everyone's life. The general
purpose of social media is to keep in touch with friends, share news, and so on. The number
of users on social media is increasing exponentially. Instagram has recently gained immense
popularity among social media users; with more than 1 billion active users, it has become one
of the most used social media sites. After the emergence of Instagram on the social media
scene, people with a good number of followers have come to be called social media
influencers. These influencers have now become a go-to channel for business organizations
to advertise their products and services.
The widespread use of social media has become both a boon and a bane for society. The use
of social media for online fraud and for spreading false information is increasing at a rapid
pace, and fake accounts are the major source of false information on social media. Business
organizations that invest huge sums of money in social media influencers must know whether
the following gained by an account is organic or not. So there is a widespread need for a fake
account detection tool that can accurately say whether an account is fake. In this paper, we
use classification algorithms from machine learning to detect fake accounts. The process of
finding a fake account depends mainly on factors such as engagement rate and artificial
activity.
1.2 SCOPE
Fake profile detection using deep learning is a system that allows users to seamlessly detect
fake profiles across an Online Social Network (OSN).
It can be accessed from almost any system, at any time, with the help of an internet
connection and a web browser, using a Python tool.
The project is a sincere effort to simplify the task of administrators in an easily usable
format. We therefore planned to develop this algorithm using several supervised machine
learning techniques, along with deep learning, for accuracy evaluation.
The proposed framework shows the sequence of processes that need to be followed for
continuous detection of fake profiles, with active learning from feedback on the results given
by the classification algorithm. This framework can easily be implemented by a social
networking company. By using these methods and parameters, fake profile detection becomes
easy, and as a result cybercrime may be reduced.
Our application can be expanded to cover additional requirements, thanks to its robustness to
enhancements and upgrades of the portal. One crucial aspect is its platform independence. Its
operational efficiency ensures that new users can access it at any time.
Another important property is the low latency of the portal, owing to its neutral architecture
and its portability towards integrating multiple algorithms across platforms.
This project eliminates the need for constant physical enquiries, which saves a lot of time for
both the researchers and the facilities in the process of exchanging data sets.
This report discusses the implementation details of the project and the advantages of having
different visualization systems with supervised learning algorithms and neural networks.
1.3 LIMITATIONS
While the information available online is staggering and growing enormously day by day,
even in our technological age we cannot forget that not everyone is ready to sit at a computer
screen, read for any great length of time, and work out whether the information being
displayed (the profile) is genuine. Curling up in front of the fire on a cold day with a book in
hand can never be replaced by sitting in a cold chair staring at the words and a profile on a
computer screen.
The algorithm has a few downsides, such as inefficiency in handling categorical variables
that have different numbers of levels. Also, as the number of trees increases, the algorithm's
time efficiency takes a hit.
Fake profile detection also requires constant upkeep, even between issues. Data sets need to
be tested regularly in order to avoid ‘link rot’ and inconsistencies. And because editing can
be done at any time, there is a responsibility attached to making sure that whatever needs to
be fixed is fixed.
• CNNs do not encode the position and orientation of the object into their predictions.
• They completely lose their internal data about the pose and orientation of the object,
and they route all the information to the same neurons, which may not be able to deal
with this kind of information.
• They lack the ability to be spatially invariant to the input data.
• A lot of training data is required.
• Optimization of the wait time needed for quality interface generation.
• Inefficiency in handling categorical variables.
• Fake profile detection currently lacks the feature of automatic data categorization.
• Fake profile detection lacks the feature of automatic detection and acknowledgement of
inappropriate data.
1.4 CONCLUSION
This chapter provides insight into the project, its various limitations, and its various
advantages. Here we get a clear picture of what the system must do and what is expected
of it.
This study provides an extensive investigation, with systematic analysis, of the impact of fake
profile detection tools, in order to identify the constraints and limitations of fake profile
detection. Unfortunately, we agree with similar previous studies that current tools are still too
inadequate and inefficient to replace traditional manual fake profile detection, owing to
several issues. We found that developing a successful fake profile detection application is
challenging because of four main issues: complexity, the technology learning dilemma,
integrity, and inefficiency. This study discusses the main implications for shaping the future
of Online Social Networks (OSNs).
2 LITERATURE SURVEY
2.1 INTRODUCTION
A literature survey, or literature review, in a project report is the chapter that presents the
various analyses and research carried out in the field of interest and the results already
published, considering the various parameters and the extent of the project.
It is the most important part of the report, as it gives the research a direction and helps set a
goal for the analysis, thus yielding the problem statement.
A growing OSN can increase people's popularity and social ratings; in practical terms, OSN
users gain popularity through likes, followers, and comments. However, it is all too easy to
create fake accounts, or to buy them online at little cost; for example, it is easy to buy Twitter
and Instagram followers and likes on the internet. Commonly, methods for detecting
anomalous accounts in OSNs analyse activity variations. Typically, users' activities keep
changing over a period; a sudden change in information access patterns and behaviour allows
the server to catch a suspicious account. If this fails, the anomalous account can infect the
system with further fraud.
An infected account can also be caused by a Cyborg, a type of fake account with a forged
identity. It damages the user's credibility and uses the compromised account to spread
misleading information, spread rumours, and polarize mass opinion. On the other hand,
diverse communities propose a range of dataset analyses with supervised or unsupervised
learning to address the problem. For instance, in a learning technique, the model can train on
feature data over a period to compute a user classification. Several papers also investigate
fake node detection with statistical methods, distributed spatial detection with a density-based
approach, SVMs, or even a combination of SVM, RF, and AdaBoost to detect fake OSN
accounts.
Beyond the OSN feature data, studies of fake account detection can utilize dynamic data such
as behavioural analysis, graph theory, learning algorithms, and application patterns. Using
these features, they construct various approaches to identify and classify anomalies. To
hinder the suspicious activities of intruders in large OSNs, one study explores a method built
on a community detection algorithm. Another study presents a model based on social
behaviour that explores users' profiles to deal with the detection problem. By analysing the
behaviour in a single OSN environment, the model can classify the compromised user. The
methods determine suspicious accounts at varying grades without regard to horizontal
classification, or utilize intelligent sensing models for detecting anomalies.
Various methods are used to obtain an efficient authentication process for multiple issues,
such as key agreement schemes to provide secure roaming services for information. The OSN
environment needs a system for solving the malicious account problem in order to obtain
proper authentication. For roaming services with user anonymity, the scheme can take the
form of secure authentication and key agreement, physical-social location in the network,
rumour propagation analysis, or even tracking user interaction in a joint community OSN.
Conventional techniques like CAPTCHA provide an authentication process when people and
applications are authorized in a system. However, it is hard to detect and stop fake accounts
with common security approaches. Conventional security techniques use CAPTCHAs and
SMS verification to verify accounts and prevent the creation of fake ones. However, attackers
can defeat these challenges with traditional methods: spammers can pass the obstacles using
CAPTCHA farms or SIM card farms.
Fake profile detection using deep learning is a notion that depends heavily on the availability
of data sets. Generally, it is linked to the conditions in which it is viewed; it is therefore a
highly subjective topic. Dataset training aims to quantitatively represent the human
perception of quality by enhancing the UI.
This report discusses the implementation details of the model and the advantages of having
different visualization systems for understanding the data standards.
Web scraping a web page involves fetching it and extracting data from it. Fetching is the
downloading of a page (which a browser does when a user views it); web crawling, the
fetching of pages for later processing, is therefore a main component of web scraping. Once
a page has been fetched, extraction can take place: the content may be parsed, searched, and
reformatted, and its data copied into a spreadsheet or loaded into a database. Web scrapers
typically take something out of a page in order to make use of it for another purpose
somewhere else.
Web scraping is used for contact scraping, and as a component of applications used for web
indexing, web mining and data mining, online price change monitoring and price comparison,
product review scraping (to watch the competition), gathering real estate listings, weather data
monitoring, website change detection, research, tracking online presence and reputation, web
mashups, and web data integration.
Techniques
Web scraping is the process of automatically mining data or collecting information from the
World Wide Web. It is a field with active developments sharing a common goal with
the semantic web vision, an ambitious initiative that still requires breakthroughs in text
processing, semantic understanding, artificial intelligence and human-computer interactions.
Current web scraping solutions range from the ad-hoc, requiring human effort, to fully
automated systems that are able to convert entire web sites into structured information, with
limitations.
Human copy-and-paste
The simplest form of web scraping is manually copying and pasting data from a web page into
a text file or spreadsheet. Sometimes even the best web-scraping technology cannot replace a
human's manual examination and copy-and-paste, and sometimes this may be the only
workable solution when the websites for scraping explicitly set up barriers to prevent machine
automation.
HTTP programming
Static and dynamic web pages can be retrieved by sending HTTP requests to the remote web
server using socket programming.
HTML parsing
Many websites have large collections of pages generated dynamically from an underlying
structured source like a database. Data of the same category are typically encoded into similar
pages by a common script or template. In data mining, a program that detects such templates
in a particular information source, extracts its content and translates it into a relational form, is
called a wrapper. Wrapper generation algorithms assume that input pages of a wrapper
induction system conform to a common template and that they can be easily identified in terms
of a common URL scheme. Moreover, some semi-structured data query languages, such as
XQuery and HTQL, can be used to parse HTML pages and to retrieve and transform page
content.
DOM parsing
By embedding a full-fledged web browser, such as Internet Explorer or the Mozilla browser
control, programs can retrieve the dynamic content generated by client-side scripts. These
browser controls also parse web pages into a DOM tree, from which programs can retrieve
parts of the pages. Languages such as XPath can be used to parse the resulting DOM tree.
Vertical aggregation
There are several companies that have developed vertical specific harvesting platforms. These
platforms create and monitor a multitude of "bots" for specific verticals with no "man in the
loop" (no direct human involvement), and no work related to a specific target site. The
preparation involves establishing the knowledge base for the entire vertical and then the
platform creates the bots automatically. The platform's robustness is measured by the quality
of the information it retrieves (usually the number of fields) and by its scalability (how
quickly it can scale up to hundreds or thousands of sites). This scalability is mostly used to
target the long tail of sites that common aggregators find complicated or too labour-intensive
to harvest content from.
2.2 EXISTING SYSTEM
Naive Bayes classifiers are a family of simple probabilistic classifiers used in machine
learning. These classifiers are based on applying Bayes' theorem with strong (naive)
independence assumptions between the features.
Naive Bayes is a simple method for constructing classifiers: models that assign class labels
to problem instances, represented as vectors of feature values, where the class labels are
drawn from some finite set. It is not a single algorithm for training such classifiers, but a
family of algorithms based on a common principle: all naive Bayes classifiers assume that
the value of a particular feature is independent of the value of any other feature, given the
class variable. Naive Bayes classifiers are a popular statistical technique for email filtering.
They emerged in the mid-1990s and were one of the first attempts to tackle the spam filtering
problem. Naive Bayes filters typically use bag-of-words features to identify spam email, an
approach commonly used in text classification.
Naïve Bayes classifiers work by correlating the use of tokens (typically words, or sometimes
other constructions, syntactic or not) with spam and non-spam emails, and then using Bayes'
theorem to calculate the probability that an email is or is not spam.
Before the implementation of fake profile detection using deep learning, there was no precise
model enabling hassle-free dataset study and analysis:
• Time consumed in distributing data sets.
• Manual work.
• Lack of accessibility.
• Higher latency.
• Reduced reliability and portability.
2.4 OBJECTIVE
The main objective of Fake Profile Detection is to provide users with a simple and practical
approach, particularly to the concept of handling and managing a single hosted web-based
system that seamlessly integrates agile functions to upload, change, and view various files
shared across the user interface website, with user authentication by means of a username and
password.
Another crucial aspect of Fake Profile Detection is to provide a hassle-free environment with
various options to navigate through different levels of education, such as school, senior
secondary, graduate, and post-graduate study books or notes.
Advantages
It greatly overcomes the lack of availability and converts the datasets into a fully automated
and managed model trained under the given conditions.
It provides users and researchers with a simple interface for interacting with the various
components, each holding individual functions. It reduces manual work and also minimizes
the carriage of huge amounts of data across the system and network.
Fake profile detection using deep learning reduces and optimizes study and distribution time,
thanks to its robustness and simple architecture. It is a model that need not be deployed in a
cloud-based architectural environment.
Fake profile detection using deep learning is honed with increased reliability and increased
operational efficiency. It is a trained, supervised model that performs its intended function
adequately for the specified data set and operates in a defined environment without failure.
Low latency with high QoS, and non-functional properties of the model such as performance,
reliability, availability, and platform independence, come as standard with the proposed
project.
With the massive amounts of data being produced by the current "Big Data Era," we're bound
to see innovations that we can't even fathom yet, potentially as soon as the next ten years.
According to experts, some of these will likely be deep learning applications.
“The analogy to deep learning is that the rocket engine is the deep learning models and the fuel
is the huge amounts of data we can feed to these algorithms.”
Module Description
Select the profile
First, select the profile which is to be tested to determine whether it is fake or real. Proper
care should be taken to choose features that are not dependent on each other.
After proper selection of attributes, a dataset of previously identified fake and real profiles is
needed for training purposes. We compiled the real profile dataset, whereas the fake profile
dataset is provided.
The selected attributes need to be extracted from the profiles (fake and genuine). Social
networking companies that want to implement our scheme do not need to follow the scraping
process; they can easily extract the features from their own databases. We resorted to scraping
the profiles, since no social network dataset is publicly available for the research purpose of
detecting fake profiles.
After this, the dataset of fake and real profiles is prepared. From this dataset, 80% of both
kinds of profiles (real and fake) is used to prepare a training dataset and 20% is used to
prepare a testing dataset. We then evaluate the efficiency using the training dataset.
ANN Classifier
An ANN uses different types of layers to determine whether profiles are real or fake. This is
an iterative method and gives accurate values. It consists of many artificial neurons, which
are interconnected by nodes.
2.7 ALGORITHMS
Machine learning (ML) is the study of computer algorithms that improve automatically
through experience and by the use of data. It is seen as a part of artificial intelligence. Machine
learning algorithms build a model based on sample data, known as "training data", in order to
make predictions or decisions without being explicitly programmed to do so. Machine learning
algorithms are used in a wide variety of applications, such as in medicine, email
filtering, speech recognition, and computer vision, where it is difficult or unfeasible to develop
conventional algorithms to perform the needed tasks.
Deep learning (also known as deep structured learning) is part of a broader family of machine
learning methods based on artificial neural networks with representation learning. Learning
can be supervised, semi-supervised or unsupervised.
Deep-learning architectures such as deep neural networks, deep belief networks, graph neural
networks, recurrent neural networks and convolutional neural networks have been applied to
fields including computer vision, speech recognition, natural language processing, machine
translation, bioinformatics, drug design, medical image analysis, material inspection
and board game programs, where they have produced results comparable to and in some
cases surpassing human expert performance.
Artificial neural networks (ANNs) were inspired by information processing and distributed
communication nodes in biological systems. ANNs have various differences from
biological brains. Specifically, neural networks tend to be static and symbolic, while the
biological brain of most living organisms is dynamic (plastic) and analogue.
Deep learning architectures can be constructed with a greedy layer-by-layer method. Deep
learning helps to disentangle these abstractions and pick out which features improve
performance.
For supervised learning tasks, deep learning methods eliminate feature engineering, by
translating the data into compact intermediate representations akin to principal components,
and derive layered structures that remove redundancy in representation.
Deep learning algorithms can be applied to unsupervised learning tasks. This is an important
benefit because unlabelled data are more abundant than the labelled data. Examples of deep
structures that can be trained in an unsupervised manner are neural history
compressors and deep belief networks.
Data analytics (DA) is the process of examining data sets in order to find trends and draw
conclusions about the information they contain. Increasingly data analytics is used with the aid
of specialized systems and software. Data analytics technologies and techniques are widely
Features Selection
Feature selection is one of the basic concepts in machine learning, and it hugely impacts the
performance of classification and prediction. In our work, in order to make our models train
well, we decided to use only features that directly affect the results.
Feasibility Study
The feasibility of the project is analyzed in this phase, and a business proposal is put forth
with a very general plan for the project and some cost estimates. During system analysis, the
feasibility study of the proposed system is carried out to ensure that the proposed system is
not a burden to the company. For feasibility analysis, some understanding of the major
requirements of the system is essential.
Economic Feasibility
This study is carried out to check the economic impact that the system will have on the
organization. The amount of fund that the company can pour into the research and development
of the system is limited. The expenditures must be justified. Thus the developed system is well
within the budget, which was achieved because most of the technologies used are freely
available; only the customized products had to be purchased.
Operational Feasibility
Organizations rarely produce bare "things"; they produce products, services, or systems. Such
products cannot simply be dropped into an environment in isolation: they need to be connected
to an existing service or business. These products are, in effect, an extension of the
organization in which they are produced, and operational feasibility checks how well the
proposed system fits into and operates within that existing environment.
Technical Feasibility
This study is carried out to check the technical feasibility, that is, the technical requirements of
the system. Any system developed must not place a high demand on the available technical
resources, as this would lead to high demands being placed on the client. The developed system
must therefore have modest requirements, with only minimal or no changes needed to
implement it.
Social Feasibility
This aspect of the study checks the level of acceptance of the system by the user. This includes
the process of training the user to use the system efficiently. The user must not feel threatened
by the system, but must instead accept it as a necessity. The level of acceptance by the users
depends on the methods employed to educate them about the system and to make them familiar
with it. Their confidence must be raised so that they can also offer constructive criticism, which
is welcomed, as they are the final users of the system.
2.10 CONCLUSION
From the literature survey we can conclude that we have overcome the drawbacks of the
existing model and provide a clean, easy-to-use model for researchers. We have studied the
various problems with the current model and have come up with a model that overcomes them.
3 SYSTEM ANALYSIS
3.1 INTRODUCTION
A software requirements specification (SRS) is a document that captures a complete description
of how the system is expected to perform. It is usually signed off at the end of the requirements
engineering phase. It is a description of a software system to be developed. It lays out functional
and non-functional requirements and may include a set of use cases that describe interactions
that the software must provide.
The important parts of the Software Requirements Specification (SRS) document are:
A Functional Requirement (FR) is a description of the service that the software must offer. It
describes a software system or its component. A function is nothing but inputs to the software
system, its behaviour, and outputs.
It can be a calculation, data manipulation, business process, user interaction, or any other
specific functionality which defines what function a system is likely to perform. Functional
Requirements in Software Engineering are also called Functional Specification.
The main functional requirements of Fake Profile Detection Using Deep learning are:
• The portal must provide users with an interface for viewing all the analysed graphs
and outputs.
• The platform must provide users with an interface to view their trained data.
• The model must be able to accept or reject uploads.
• The user must be provided with an interface to change the view of the output.
• The user must be provided with an interface to distinguish the outputs.
• The user must be provided with an artifact to navigate between different aspects.
• The user must be provided with an interface to quit.
• For developing the application, the following are the Software Requirements:
o Python
o Anaconda
• Operating Systems supported:
o Any operating system with a Python runtime (e.g., Windows, Linux, macOS)
The main goal of this project is to develop a model that is user accessible and easily
understandable. This model aims to provide a practical approach to detecting fake profiles
across online social networks.
The model also aims to provide graphical and analysed insights to the user.
Python
Python is an interpreted high-level general-purpose programming language. Python's design
philosophy emphasizes code readability with its notable use of significant indentation.
Its language constructs as well as its object-oriented approach aim to help programmers write
clear, logical code for small and large-scale projects.
• Interactive Mode – Python has support for an interactive mode which allows interactive
testing and debugging of snippets of code.
• Portable – Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.
• Extendable – You can add low-level modules to the Python interpreter. These modules
enable programmers to add to or customize their tools to be more efficient.
• Databases – Python provides interfaces to all major commercial databases.
• GUI Programming – Python supports GUI applications that can be created and ported
to many system calls, libraries and windows systems, such as Windows MFC,
Macintosh, and the X Window system of Unix.
• Scalable – Python provides a better structure and support for large programs than shell
scripting.
Python Modules
Modules used are as follows
Numpy
Python has a strong set of data types and data structures. Yet it wasn't designed for Machine
Learning per se. Enter NumPy (pronounced num-py). NumPy is a data-handling library,
particularly one which allows us to handle large multi-dimensional arrays along with a huge
collection of mathematical operations. The following is a quick snippet of NumPy in action.
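A minimal sketch of such a snippet, touching array creation, broadcasting, matrix algebra, and an FFT (the array values are arbitrary illustrations):

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])   # a 2x2 matrix
b = np.array([10.0, 20.0])               # a 1-D array

print(a + b)                     # broadcasting: b is added to each row of a
print(a @ a)                     # matrix multiplication
print(np.linalg.inv(a))          # linear algebra: matrix inverse
print(np.fft.fft([1, 0, 0, 0]))  # Fourier transform of a unit impulse
```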
NumPy is the fundamental package for scientific computing with Python. It contains among
other things:
• A powerful N-dimensional array object.
• Sophisticated (broadcasting) functions.
• Tools for integrating C/C++ and FORTRAN code.
• Useful linear algebra, Fourier transform, and random number capabilities.
Using NumPy in Python gives functionality comparable to MATLAB since they are both
interpreted, and they both allow the user to write fast programs if most operations work on
arrays or matrices instead of scalars. In comparison, MATLAB boasts many additional
toolboxes, notably Simulink, whereas NumPy is intrinsically integrated with Python, a more
modern and complete programming language. Moreover, complementary Python packages are
available; SciPy is a library that adds more MATLAB-like functionality, and Matplotlib is a
plotting package that provides MATLAB-like plotting capabilities.
Matplotlib
Matplotlib is designed to be as usable as MATLAB, with the ability to use Python and the
advantage of being free and open-source. Each pyplot function makes some change to a figure:
e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area,
decorates the plot with labels, etc. There is also a procedural “pylab” interface based on a state
machine (like OpenGL), designed to closely resemble that of MATLAB, though its use is
discouraged. SciPy makes use of Matplotlib. Matplotlib was originally written by John D.
Hunter; since then it has had an active development community, and it is distributed under a
BSD-style license. Michael Droettboom was nominated as Matplotlib's lead developer shortly
before John Hunter's death in August 2012, and was later joined by Thomas Caswell. Matplotlib
2.0.x supports Python versions 2.7 through 3.6. Python 3 support started with Matplotlib 1.2.
Matplotlib 1.4 is the last version to support Python 2.6. Matplotlib has pledged not to support
Python 2 past 2020 by signing the Python 3 Statement.
Scipy
Pronounced as Sigh-Pie, this is one of the most important python libraries of all time. Scipy is
a scientific computing library for python. It is also built on top of numpy and is a part of the
Scipy Stack.
This is yet another behind-the-scenes library which does a whole lot of heavy lifting. It provides
modules and algorithms for linear algebra, integration, image processing, optimization,
clustering, sparse matrix manipulation and much more.
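Two of those capabilities, optimization and sparse matrix manipulation, can be sketched on toy inputs (the quadratic function and the identity matrix below are arbitrary illustrations):

```python
import numpy as np
from scipy import optimize, sparse

# Optimization: minimise the quadratic f(x) = (x - 3)^2
res = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)
print(round(res.x, 4))   # the minimiser is close to 3.0

# Sparse matrix manipulation: store a mostly-zero matrix compactly
m = sparse.csr_matrix(np.eye(1000))
print(m.nnz)             # only the 1000 non-zero entries are stored
```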
Scikit-learn
Scikit-learn is a free machine learning library for the Python programming language. It features
various classification, regression and clustering algorithms, including support vector machines,
random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with
the Python numerical and scientific libraries NumPy and SciPy.
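A minimal sketch of the scikit-learn workflow with one of the algorithms named above, a random forest, on a synthetic dataset rather than real profile data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# A synthetic two-class dataset standing in for profile features
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_tr, y_tr)                       # train on the training split
acc = accuracy_score(y_te, clf.predict(X_te))  # evaluate on held-out data
print(f"accuracy: {acc:.2f}")
```

The same fit/predict pattern applies to the other estimators (SVM, gradient boosting, k-means), which is what makes the library convenient for comparing classifiers.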
Seaborn
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level
interface for drawing attractive and informative statistical graphics. Seaborn helps you explore
and understand your data. Its plotting functions operate on data frames and arrays containing
whole datasets and internally perform the necessary semantic mapping and statistical
aggregation to produce informative plots. Its dataset-oriented, declarative API lets you focus
on what the different elements of your plots mean, rather than on the details of how to draw
them.
3.5.2 BACKEND TECHNOLOGIES
Machine learning is the study of computer algorithms that improve automatically through
experience and by the use of data. It is seen as a part of artificial intelligence. Most machine
learning methods train the classifiers by the use of machine learning algorithms. The classifiers
are based on various social network attributes, such as attribute similarity, network friend
similarity, and IP address analysis. A number of machine learning classifiers used in the
proposed model are introduced below.
Support vector machine (SVM) is a learning algorithm based on statistical learning theory.
SVM implements the principle of structural risk minimization, which minimizes the empirical
error and the complexity of the learner at the same time, and achieves good generalization
performance in classification and regression tasks. The goal of SVM for classification is to
construct the optimal hyperplane with the largest margin. In general, the larger the margin, the
lower the generalization error of the classifier.
In this work, SVM was used with a linear and a Gaussian kernel in training. The Gaussian
kernel places normal curves around the data points and sums them, so that the decision
boundary can be defined by a topological condition, such as curves where the sum is above 0.5.
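The difference between the two kernels can be sketched on a toy dataset of concentric circles, which no linear boundary can separate; the dataset and its parameters below are arbitrary illustrations:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)  # a single separating hyperplane
rbf = SVC(kernel="rbf").fit(X, y)        # Gaussian (RBF) kernel

print(f"linear kernel accuracy: {linear.score(X, y):.2f}")  # roughly chance level
print(f"Gaussian kernel accuracy: {rbf.score(X, y):.2f}")   # near perfect
```

The Gaussian kernel succeeds here because, as described above, it builds the boundary from curves around the data points rather than from a single hyperplane.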
Naive Bayes classifiers are a collection of classification algorithms based on Bayes' theorem.
It is not a single algorithm but a family of algorithms which all share a common principle:
every pair of features being classified is independent of each other.
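A minimal sketch with a Gaussian naive Bayes classifier; the two profile features and their values below are hypothetical, not taken from the project's dataset:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical profile features: [friend count, posts per day]
X = np.array([[500, 3], [450, 2], [480, 4],    # genuine-looking profiles
              [20, 40], [15, 55], [30, 60]])   # fake-looking profiles
y = np.array([0, 0, 0, 1, 1, 1])               # 0 = genuine, 1 = fake

model = GaussianNB().fit(X, y)
print(model.predict([[470, 3], [10, 50]]))     # [0 1]
```

Each feature is modelled independently per class (the naive independence assumption above), which keeps training and prediction fast even with many features.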
DATA ANALYTICS
Data analysis is a process of inspecting, cleansing, transforming, and modelling data with the
goal of discovering useful information, informing conclusions, and supporting decision-
making. Data analysis has multiple facets and approaches, encompassing diverse techniques
under a variety of names, and is used in different business, science, and social science
domains. In today's business world, data analysis plays a role in making decisions more
scientific and helping businesses operate more effectively.
Data requirements
The data necessary as inputs to the analysis are specified based upon the requirements of those
directing the analysis or the customers (who will use the finished product of the analysis). The
general type of entity upon which the data will be collected is referred to as an experimental
unit (e.g., a person or a population of people). Specific variables regarding a population (e.g.,
age) may be specified and obtained. Data may be numerical or categorical (i.e., a text label for
numbers).
Data collection
Data are collected from a variety of sources. The requirements may be communicated by
analysts to custodians of the data, such as Information Technology personnel within an
organization. The data may also be collected from sensors in the environment, including traffic
cameras, satellites, recording devices, etc.
Data processing
Data, when initially obtained, must be processed or organized for analysis. For instance, this
may involve placing data into rows and columns in a table format (known as structured data)
for further analysis, often through the use of spreadsheet or statistical software.
Data cleaning
Once processed and organized, the data may still be incomplete, contain duplicates, or contain
errors. The need for data cleaning arises from problems in the way data are entered and stored.
Data cleaning is the process of preventing and correcting these errors. Common tasks include
record matching, identifying inaccurate data, assessing the overall quality of existing data,
deduplication, and column segmentation.
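Two of those tasks, deduplication and removing erroneous records, can be sketched with pandas; the records and column names below are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical raw records with one duplicate row and one missing value
raw = pd.DataFrame({
    "user": ["alice", "bob", "bob", "carol"],
    "age":  [25, 31, 31, np.nan],
})

cleaned = raw.drop_duplicates()            # deduplication: the repeated "bob" row goes
cleaned = cleaned.dropna(subset=["age"])   # drop records with a missing age
print(len(cleaned))                        # 2 rows remain
```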
DEEP LEARNING
Deep learning (also known as deep structured learning) is part of a broader family of machine
learning methods based on artificial neural networks with representation learning. Learning
can be supervised, semi-supervised or unsupervised.
Deep-learning architectures such as deep neural networks, deep belief networks, graph neural
networks, recurrent neural networks and convolutional neural networks have been applied to
fields including computer vision, speech recognition, natural language processing, machine
translation, bioinformatics, drug design, medical image analysis, material inspection and board
game programs, where they have produced results comparable to and in some cases surpassing
human expert performance.
Sublime Text is a shareware cross-platform source code editor with a Python application
programming interface (API). It natively supports many programming and markup languages,
and functions can be added by users with plugins, typically community-built and maintained
under free-software licenses.
It includes features such as syntax highlighting, auto-indentation, file-type recognition, a
sidebar, macros, plug-ins and packages that make it easy to work with a code base. Its "Go to"
functionality and many keyboard shortcuts make it easy for experienced developers to navigate
their way around, and to write and find code easily without having to take their hands off the
keyboard.
Visual Studio Code is a free source-code editor made by Microsoft for Windows, Linux
and macOS. Features include support for debugging, syntax highlighting, intelligent code
completion, snippets, code refactoring, and embedded Git.
Visual Studio Code is a streamlined code editor with support for development operations
like debugging, task running, and version control. It aims to provide just the tools a
developer needs for a quick code-build-debug cycle and leaves more complex workflows
to fuller featured IDEs, such as Visual Studio IDE.
Sublime Text makes it quick and easy to write code and navigate your way around when you
know what you're doing. Visual Studio Code provides more hand-holding and is a great option
for its debugging functionality, but it might slow some speedy, experienced programmers down
when it comes to writing code.
3.6 CONCLUSION
The analysis gives us the requirement specifications of the project. The functional requirements
specify the functionality, whereas the software requirements state the required software and
supporting files needed to process the data. The hardware requirements describe the hardware
components required to run the software. The various requirements of the system were selected
through a rigorous survey, and the development is done in such a way that all the requirements
are met and the software is up to the standards of a professional software product.
System analysis is conducted for the purpose of studying a system or its parts in order to
identify its objectives. It is a problem-solving technique that improves the system and
ensures that all the components of the system work efficiently to accomplish their purpose.
Analysis specifies what the system should do.
4 SYSTEM DESIGN
4.1 INTRODUCTION
System Design is the process or art of defining the architecture, components, modules,
interfaces and data for a system to satisfy specified requirements. It can be seen as the
application of systems theory to product development.
System design is the phase that bridges the gap between the problem domain and the existing
system in a manageable way. It is the phase where the SRS document is converted into a format
that can be implemented and decides how the system will operate.
The purpose of the design phase is to plan a solution of the problem specified by the
requirement document. This phase is the first step in moving from the problem domain to
the solution domain. In other words, starting with what is needed, design takes us toward
how to satisfy the needs. The design of a system is perhaps the most critical factor affecting
the quality of the software; it has a major impact on the later phases, particularly testing and
maintenance. The output of this phase is the design document. This document is similar to a
blueprint for the solution and is used later during implementation, testing and maintenance.
The design activity is often divided into two separate phases System Design and Detailed
Design.
System Design also called top-level design aims to identify the modules that should be in the
system, the specifications of these modules, and how they interact with each other to produce
the desired results. At the end of the system design all the major data structures, file formats,
output formats, and the major modules in the system and their specifications are decided.
During Detailed Design, the internal logic of each of the modules specified in system design
is decided. During this phase, the details of the data of a module are usually specified in a
high-level design description language, which is independent of the target language in which
the software will eventually be implemented. In system design the focus is on identifying
the modules, whereas during detailed design the focus is on designing the logic for each of
the modules. In other words, in system design the attention is on what components are needed,
while in detailed design the issue is how the components can be implemented in software.
During the system design activities, Developers bridge the gap between the requirements
specification, produced during requirements elicitation and analysis, and the system that is
delivered to the user. Design is the place where the quality is fostered in development.
Software design is a process through which requirements are translated into a representation
of software.
In this phase, the complex activity of system development is divided into several smaller sub
activities, which coordinate with each other to achieve the main objective of system
development.
Unified Modelling Language (UML) is a general-purpose modelling language. The main aim
of UML is to define a standard way to visualize the way a system has been designed. It is quite
like the blueprints used in other fields of engineering.
UML is not a programming language; it is rather a visual language. We use UML diagrams to
portray the behaviour and structure of a system. UML helps software engineers, businessmen
and system architects with modelling, design and analysis. The Object Management Group
(OMG) adopted Unified Modelling Language as a standard in 1997. It’s been managed by
OMG ever since. International Organization for Standardization (ISO) published UML as an
approved standard in 2005. UML has been revised over the years and is reviewed periodically.
UML is linked with object-oriented design and analysis. UML makes the use of elements and
forms associations between them to form diagrams. Diagrams in UML can be broadly
classified as:
• Structural diagrams, which depict the static aspects of a system (e.g., class, component
and deployment diagrams).
• Behavioural diagrams, which include Use Case Diagrams, State Diagrams, Activity and
Interaction Diagrams.
4.2.1 USE CASE DIAGRAM
A use case is a methodology used in system analysis to identify, clarify and organize system
requirements. The use case is made up of a set of possible sequences of interactions between
systems and users in a particular environment and related to a particular goal. The method
creates a document that describes all the steps taken by a user to complete an activity.
The class diagram is the main building block of object-oriented modelling. It is used for
general conceptual modelling of the structure of the application, and for detailed modelling
translating the models into programming code. Class diagrams can also be used for data
modelling.
Class diagrams give you a sense of orientation. They provide detailed insight into the
structure of your systems. At the same time they offer a quick overview of the synergy
happening among the different system elements as well as their properties and
relationships.
A component diagram depicts how components are wired together to form larger components
or software systems. Component diagrams are used to illustrate the structure of arbitrarily
complex systems.
A deployment diagram in the Unified Modelling Language models the physical deployment
of artifacts on nodes. To describe a web site, for example, a deployment diagram would show
what hardware components exist, what software components run on each node, and how the
different pieces are connected.
Deployment diagrams are a set of nodes and their relationships. These nodes are physical
entities where the components are deployed. Deployment diagrams are used for visualizing
the deployment view of a system and are generally used by the deployment team. Note: if the
above descriptions and usages are observed carefully, it is clear that all the diagrams have some
relationship with one another. Component diagrams depend upon the classes, interfaces, etc.
which are part of the class/object diagram. The deployment diagram, in turn, depends upon the
components, which are used to make component diagrams.
A sequence diagram is a type of interaction diagram because it describes how, and in what
order, a group of objects works together. These diagrams are widely used by software
developers and business professionals to document and understand the requirements for new
and existing systems.
A graphical tool used to describe and analyse the movement of data through a system, manual
or automated, including the processes, stores of data, and delays in the system. Data Flow
Diagrams are the central tool and the basis from which other components are developed. The
transformation of data from input to output, through processes, may be described logically
and independently of the physical components associated with the system. The DFD is also
known as a data flow graph or a bubble chart.
o DFDs are the model of the proposed system. They clearly show the
requirements on which the new system should be built. Later, during the design
activity, this is taken as the basis for drawing the system's structure charts. The
basic notation used to create DFDs is as follows:
4. Data Store: Here data are stored or referenced by a process in the system.
4.3 CONCLUSION
The design content describes the required modules and the different diagrams. The diagrams
show the communications present in the system and help us understand the project easily. The
modules help us design the project to fulfil the user requirements. This phase has helped us
understand the project better, and these diagrams make the process of construction remarkably
simple.
The kind of diagram is defined by the primary graphical symbols shown on the diagram. For
example, a diagram where the primary symbols in the contents area are classes is a class
diagram. A diagram which shows use cases and actors is a use case diagram. A sequence
diagram shows a sequence of message exchanges between lifelines.
The UML specification does not preclude mixing different kinds of diagrams, e.g. combining
structural and behavioural elements to show a state machine nested inside a use case.
Consequently, the boundaries between the various kinds of diagrams are not strictly enforced.
At the same time, some UML tools do restrict the set of available graphical elements which
can be used when working on a specific type of diagram.
5 TECHNOLOGIES USED
5.1 INTRODUCTION
Any system requires the implementation of various technologies which together help the
proposed system run. These technologies include both hardware and software. The seamless
integration between these components helps the system run smoothly; without proper
integration, the system may develop unwanted complications. This project takes advantage of
multiple open-source software packages and some trained supervised learning methods to
accomplish the task. Some of the technologies used in this project include:
• DEEP LEARNING
• NEURAL NETWORKS
• ARTIFICIAL NEURAL NETWORK
• Machine learning works only with sets of structured and semi-structured data, while
deep learning works with both structured and unstructured data
• Deep learning algorithms can perform complex operations efficiently, while machine
learning algorithms cannot
• Machine learning algorithms use labelled sample data to extract patterns, while deep
learning accepts large volumes of data as input and analyses the input data to extract
features out of an object
• The performance of machine learning algorithms plateaus as the amount of data
increases; so, to maintain the performance of the model on large volumes of data, we
need a deep learning model.
In the figure given above, we provide the raw image data to the input layer. The input layer
then determines patterns of local contrast, that is, it differentiates on the basis of colours,
luminosity, etc. The first hidden layer then determines facial features: it fixates on the eyes,
nose, lips, etc., and matches those facial features to the correct face template. In the second
hidden layer it determines the correct face, as can be seen in the image above, after which the
result is sent to the output layer. Likewise, more hidden layers can be added to solve more
complex problems, for example, finding a particular kind of face with a dark or light
complexion. As the number of hidden layers increases, we are able to solve more complex
problems.
A neural network learns from structured data and exhibits the output. Learning within neural
networks falls into three different categories:
1. Supervised Learning - with the help of labelled data, inputs, and outputs are fed to the
algorithms. They then predict the desired result after being trained on how to interpret
data.
2. Unsupervised Learning - the ANN learns with no human intervention. There is no
labelled data, and the output is determined according to patterns identified within the
input data.
3. Reinforcement Learning - the network learns depending on the feedback you give it.
The essential building block of a neural network is the perceptron, or neuron; it uses the
supervised learning method to learn and classify data. Neural networks are complex systems
of such artificial neurons. An artificial neuron, or perceptron, consists of:
• Input
• Weight
• Bias
• Activation Function
• Output
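These components can be sketched as a single neuron in NumPy; the input values, weights, and bias below are arbitrary illustrations, and a sigmoid is used as the activation function:

```python
import numpy as np

def neuron(inputs, weights, bias):
    """One artificial neuron: the weighted sum of the inputs plus the bias,
    passed through a sigmoid activation function to produce the output."""
    z = np.dot(inputs, weights) + bias     # weighted sum + bias
    return 1.0 / (1.0 + np.exp(-z))        # sigmoid activation

# Arbitrary illustrative values for the five components listed above
out = neuron(inputs=np.array([0.5, 0.2]),
             weights=np.array([0.4, -0.1]),
             bias=0.1)
print(round(out, 4))   # a value between 0 and 1
```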
The neurons receive many inputs and process a single output. Neural networks are layers of
neurons. These layers consist of the following:
• Input layer
• Multiple hidden layers
• Output layer
The input layer receives data represented by numeric values. The hidden layers perform most
of the computation required by the network, and the output layer predicts the final output. Each
layer is made of neurons. Once the input layer receives data, it is redirected to the hidden layer,
and each input is assigned a weight.
The weight is a value that transforms input data within the network's hidden layers: the network
takes the input data, multiplies it by the weight values, and produces a value for the first hidden
layer. The hidden layers transform the input data and pass it on to the next layer. The inputs
and weights are multiplied, and their sum is sent to the neurons in the hidden layer, where a
bias is applied to each neuron. Each neuron adds up the inputs it receives, and this sum then
passes through the activation function. The outcome of the activation function decides whether
the neuron is activated or not; an activated neuron transfers information on to the following
layers. In this way the data propagates through the network until it reaches the output layer,
which produces the desired output. This is known as forward propagation: the process of
feeding data into the input nodes and obtaining a result at the output nodes. Feed-forward
propagation takes place when the hidden layer accepts the input data, processes it as per the
activation function, and passes it towards the output; the neuron in the output layer with the
highest probability then projects the result. If the output is wrong, back propagation takes place.
While designing a neural network, weights are initialized for each input; back propagation
means re-adjusting each input's weights to minimize the errors, thus resulting in a more
accurate output.
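The forward- and back-propagation loop described above can be sketched for a single sigmoid neuron learning the OR function; the learning rate, iteration count, and random seed below are arbitrary choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Four training examples for the OR function
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [1.0]])

rng = np.random.default_rng(0)
w = rng.normal(size=(2, 1))   # initialized weights, one per input
b = np.zeros(1)               # bias

for _ in range(2000):
    out = sigmoid(X @ w + b)          # forward propagation
    err = out - y                     # prediction error
    grad = err * out * (1 - out)      # error times the sigmoid derivative
    w -= 0.5 * (X.T @ grad)           # back propagation: re-adjust the weights
    b -= 0.5 * grad.sum(axis=0)       # ... and the bias

pred = (sigmoid(X @ w + b) > 0.5).astype(int)
print(pred.ravel())                   # the neuron has learned the OR pattern
```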
An artificial neural network consists of a pool of units that are interconnected in some pattern
to allow communication between the units.
These units, also referred to as nodes or neurons, are simple processors which operate in
parallel. Every neuron is connected with another neuron through a connection link. Each
connection link is associated with a weight that has information about the input signal. This is
the most useful information for neurons to solve a particular problem because the weight
usually excites or inhibits the signal that is being communicated. Each neuron has an internal
state, which is called an activation signal. Output signals, which are produced after combining
the input signals and activation rule, may be sent to other units. The term "Artificial Neural
Network" is derived from Biological neural networks that develop the structure of a human
brain. Similar to the human brain that has neurons interconnected to one another, artificial
neural networks also have neurons that are interconnected to one another in various layers of
the networks. These neurons are known as nodes. The Artificial Neural Network is thus
biologically inspired, modelled after the neural networks of the human brain.
The history of neural networking arguably began in the late 1800s with scientific endeavours
to study the activity of the human brain. In 1890, William James published the first work about
brain activity patterns. In 1943, McCulloch and Pitts created a model of the neuron that is still
used today in artificial neural networks. This model has two parts: a summation over weighted
inputs, and an output function of that sum.
In 1949, Donald Hebb published "The Organization of Behavior," which illustrated a law for
synaptic neuron learning. This law, later known as Hebbian Learning in honour of Donald
Hebb, is one of the most straightforward and simple learning rules for artificial neural
networks. In 1951, Marvin Minsky made the first Artificial Neural Network (ANN) while
working at Princeton. In 1958, "The Computer and the Brain" was published, a year after John
von Neumann's death. In that book, von Neumann proposed numerous radical changes to the
way analysts had been modelling the brain.
Input layer:
The input layer contains those artificial neurons (termed units) which receive input from the
outside world. It is through these units that the network receives the data it will learn from,
recognise, or otherwise process.
Output layer:
The output layer contains units that respond to the information fed into the system and
indicate whether the network has learned the task or not.
Hidden layer:
The hidden layers sit between the input layer and the output layer, and are termed hidden
because they are not visible from outside the network. The only job of a hidden layer is to
transform the input into something meaningful that the output layer/unit can use in some way.
Most artificial neural networks are fully interconnected, meaning that each neuron in a hidden
layer is connected to every neuron in the preceding input layer and in the following output
layer, leaving no unit disconnected. This makes a complete learning process possible, and
learning is most effective when the weights inside the artificial neural network are updated
after each iteration.
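The weight update performed after each iteration amounts to a gradient-descent step. A minimal sketch for a single linear neuron with a squared-error loss (all values illustrative):

```python
import numpy as np

x = np.array([0.5, 0.1, 0.4])   # inputs
w = np.array([0.2, 0.7, 0.3])   # current weights
target, lr = 1.0, 0.1           # desired output and learning rate

pred = w @ x                    # linear neuron output
error = pred - target           # loss L = 0.5 * error**2
grad = error * x                # dL/dw for a linear neuron
w_new = w - lr * grad           # re-adjust weights to reduce the error

# After the update, the prediction is closer to the target:
print(abs((w_new @ x) - target) < abs(error))   # True
```

Repeating this step over many iterations is what drives the error down during training.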
• Neural networks can map input patterns to their associated output patterns
• Neural networks are able to generalise; hence, outcomes for new inputs can be predicted
from past patterns
• Neural networks are stable systems and are fault-tolerant; therefore, they can recall
complete patterns from incomplete, partial or noisy inputs
• Neural networks can process data at high speed and in a parallel, distributed manner
An Artificial Neural Network is best represented as a weighted directed graph in which the
artificial neurons form the nodes, and the connections between neuron outputs and neuron
inputs form the directed edges, each carrying a weight. The Artificial Neural Network
receives its input signal from an external source in the form of a pattern or image, expressed
as a vector. These inputs are then denoted mathematically as x(n) for every n-th input.
Afterwards, each input is multiplied by its corresponding weight (these weights are the
details the artificial neural network uses to solve a specific problem). In general terms, these
weights represent the strength of the interconnections between neurons inside the network.
All the weighted inputs are summed inside the computing unit. If the weighted sum is zero, a
bias is added to make the output non-zero, or to otherwise scale up the system's response; the
bias behaves like an extra input fixed at 1, carrying its own weight. The total of the weighted
inputs can range from 0 to positive infinity, so to keep the response within the desired limits a
maximum value is benchmarked, and the total of the weighted inputs is passed through the
activation function.
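The role of the bias described above, an extra input fixed at 1 carrying its own weight, can be checked directly (values are illustrative):

```python
import numpy as np

x = np.array([0.0, 0.0])           # an all-zero input
w = np.array([0.4, 0.6])
bias = 0.25

weighted_sum = w @ x + bias        # bias keeps the sum non-zero for a zero input

# Equivalently, treat the bias as an extra input fixed at 1 with weight `bias`:
x_aug = np.append(x, 1.0)
w_aug = np.append(w, bias)
print(weighted_sum == w_aug @ x_aug)            # True

out = 1.0 / (1.0 + np.exp(-weighted_sum))       # activation keeps the response in (0, 1)
```

Without the bias, a zero input would always produce a zero weighted sum, regardless of the weights.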
5.5 CONCLUSION
Working on this project has exposed us to the various new and old technologies present in the
market for web development. These technologies were selected for their versatility and for
how easily they work with each other in tandem. We have gained considerable insight into
the technologies available in the market and into their respective advantages and
disadvantages.
6.1 INTRODUCTION
In this section we describe how the development and deployment setup was done for the
system. This can be used to replicate the exact workings of the system.
STEP 1: Open a web browser. You can use any web browser to download Google Chrome. If
you haven't installed a browser, you can use your operating system's preinstalled web
browser (Internet Explorer for Windows and Safari for Mac OS X).
STEP 2: Click "Download Chrome". This will open the Terms of Service window.
STEP 3: Determine whether you want Chrome as your default browser. If you set it as the
default browser, it will open whenever a link to a web page is clicked in another program,
such as email. You can also choose to send usage statistics back to Google; this sends crash
reports, preferences and button clicks, but no personal information, and it does not track the
websites you visit.
STEP 4: Click “Accept and Install” after reading the Terms of Service. The installer will start
and you will have Google Chrome installed when it has finished. Depending on your browser
settings, you may need to allow the program to run.
STEP 5: Sign in to Chrome. After installation, a Chrome window will open showing first-time
use information. You can sign in with your Google account to sync bookmarks, preferences,
and browsing history with any Chrome browser that you use. You can also read "How to Use
Google Chrome" for some tips on your new browser.
6.3.1 conda
6.3.3 pip
JupyterLab can be installed with pip install jupyterlab. If you install using pip install --user,
you must add the user-level bin directory to your PATH environment variable in order to
launch jupyter lab. If you are using a Unix derivative (FreeBSD, GNU/Linux, OS X), you can
achieve this with export PATH="$HOME/.local/bin:$PATH".
6.3.4 pipenv
If you use pipenv, install JupyterLab and activate the environment before launching it:
pipenv install jupyterlab
pipenv shell
jupyter lab
Alternatively, you can run jupyter lab inside the virtualenv with
pipenv run jupyter lab
6.3.5 DOCKER
If you have Docker installed, you can install and use Jupyter by selecting one of the
many ready-to-run Docker images maintained by the Jupyter Team. Follow the instructions
in the Quick Start Guide to deploy the chosen Docker image.
The latest versions of the following browsers are currently known to work:
• Firefox
• Chrome
• Safari
Earlier browser versions may also work, but come with no guarantees.
JupyterLab uses CSS Variables for styling, which is one reason for the minimum versions
listed above. IE 11+ or Edge 14 do not support CSS Variables, and are not directly supported
at this time. A tool like postcss can be used to convert the CSS files in
the jupyterlab/build directory manually if desired.
To install some extensions, you will need access to an NPM package registry. Some
companies do not allow direct access to the public registry and operate a private registry
instead. To use it, you need to configure npm and yarn to point to that registry (ask your
corporate IT department for the correct URL):
If your computer is behind a corporate proxy or firewall, you may encounter HTTP and SSL
errors because the proxy or firewall blocks connections to widely-used servers, for example
when conda cannot connect to its own repositories. Servers that commonly need to be
reachable include:
• pypi.org
• pythonhosted.org
• continuum.io
• anaconda.com
• conda.io
• github.com
• githubusercontent.com
• npmjs.com
• yarnpkg.com
Alternatively, you can specify a proxy user (usually a domain user with password), that is
allowed to communicate via network. This can be easily achieved by setting two common
environment variables: HTTP_PROXY and HTTPS_PROXY . These variables are
automatically used by many open-source tools (like conda ) if set correctly.
# For Windows
set HTTP_PROXY=http://USER:PWD@proxy.company.com:PORT
set HTTPS_PROXY=https://USER:PWD@proxy.company.com:PORT
Many Jupyter extensions require working npm and jlpm (an alias for yarn ) commands,
which are needed for downloading Jupyter extensions and other JavaScript dependencies.
If npm cannot connect to its own repositories, you might see an error like:
# Set default registry for NPM (optional, useful in case if common JavaScript libs cannot be
found)
npm config set registry http://registry.npmjs.org/
jlpm config set registry https://registry.yarnpkg.com/
If you can communicate via HTTP but installation with npm fails with connectivity
problems to HTTPS servers, you can disable SSL for npm .
Warning: disabling SSL verification exposes you to man-in-the-middle attacks and should
only be used as a last resort.
6.4.1 INPUT DESIGN
The input design is the link between the information system and the user. It comprises
developing specifications and procedures for data preparation, together with the steps
necessary to put transaction data into a usable form for processing. This can be achieved by
having the computer read data from a written or printed document, or by having people key
the data directly into the system. The design of input focuses on controlling the amount of
input required, controlling errors, avoiding delay, avoiding extra steps and keeping the
process simple. The input is designed to provide security and ease of use while retaining
privacy. Input design considered the following things:
6.4.2 OBJECTIVES
• Input design is the process of converting a user-oriented description of the input into a
computer-based system. This design is important to avoid errors in the data input
process and to show management the correct direction for getting accurate
information from the computerized system.
• It is achieved by creating user-friendly screens for data entry that can handle large
volumes of data. The goal of designing input is to make data entry easier and free
from errors. The data entry screen is designed so that all data manipulations can be
performed; it also provides record-viewing facilities.
• When data is entered, it is checked for validity. Data can be entered with the help of
screens, and appropriate messages are provided as needed so that the user is never
left in a maze. Thus, the objective of input design is to create an input layout that is
easy to follow.
A quality output is one which meets the requirements of the end user and presents the
information clearly. In any system, the results of processing are communicated to users and
to other systems through outputs. In output design it is determined how the information is to
be displayed for immediate need, and what hard-copy output is produced. The output is the
most important and direct source of information for the user, and efficient, intelligent output
design improves the system's ability to support user decision-making.
• Designing computer output should proceed in an organized, well-thought-out manner;
the right output must be developed while ensuring that each output element is designed
so that people will find the system easy and effective to use. When analysts design
computer output, they should identify the specific output that is needed to meet the
requirements.
• Select methods for presenting information.
• Create document, report, or other formats that contain information produced by the
system.
The output form of an information system should accomplish one or more of the following
objectives.
6.5 CONCLUSION
In this section we have gone through the environment setup used during the development and
deployment of the project. We have gained a lot of knowledge on how to implement these
technologies in practice.
7 SYSTEM IMPLEMENTATIONS
7.1 INTRODUCTION
In the implementation phase, the project plan is put into motion and the work of the project is
performed. The project takes shape during this phase, which involves the construction of the
actual project result: programmers are occupied with coding, designers are developing
graphic material, contractors are building, and the actual reorganisation takes place.
The purpose of the design phase is to plan a solution to the problem specified by the
requirements document. This phase is the first step in moving from the problem domain to
the solution domain: starting with what is needed, design takes us toward how to satisfy
those needs. The design of a system is perhaps the most critical factor affecting the quality
of the software; it has a major impact on the later phases, particularly testing and
maintenance. The output of this phase is the design document, which is similar to a
blueprint for the solution and is used later during implementation, testing and maintenance.
The design activity is often divided into two separate phases: System Design and Detailed
Design.
System Design also called top-level design aims to identify the modules that should be in the
system, the specifications of these modules, and how they interact with each other to produce
the desired results. At the end of the system design all the major data structures, file formats,
output formats, and the major modules in the system and their specifications are decided.
During Detailed Design, the internal logic of each of the modules specified in system design
is decided. In this phase, the details of a module's data are usually specified in a high-level
design description language, which is independent of the target language in which the
software will eventually be implemented.
In system design the focus is on identifying the modules, whereas during detailed design
the focus is on designing the logic for each of the modules. In other words, in system design
the attention is on what components are needed, while in detailed design how the components
can be implemented in software is the issue.
During the system design activities, Developers bridge the gap between the requirements
specification, produced during requirements elicitation and analysis, and the system that is
delivered to the user. Design is the place where the quality is fostered in development.
Software design is a process through which requirements are translated into a representation
of software.
IMPORTING LIBRARIES
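The import cell is not reproduced in text; a representative set consistent with the cells that follow would be (the Keras imports are shown as comments since they are only needed in the model-building and loading cells):

```python
# Data handling, plotting, and the train/test split used throughout the notebook.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# The model-building and loading cells additionally use Keras, e.g.:
#   from tensorflow.keras.models import Sequential, model_from_json
#   from tensorflow.keras.layers import Dense
```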
READ DATASET
In []: df_users = pd.read_csv(r"C:\Users\priyam_upadhyay\Desktop\project_file\Detecting-Fake-Profiles-On-Social-Media-master\Detecting-Fake-Profiles-On-Social-Media-master/dataset/users.csv")
df_fusers = pd.read_csv(r"C:\Users\priyam_upadhyay\Desktop\project_file\Detecting-Fake-Profiles-On-Social-Media-master\Detecting-Fake-Profiles-On-Social-Media-master/dataset/fusers.csv")
In []: df_fusers.shape
Out[]: (3351, 38)
In []: df_users.shape
Out[]: (3474, 42)
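The cell that combines the two frames into df_allUsers is not reproduced in text. A sketch of the likely construction, labelling fake accounts with an isFake column (the column name comes from the later cells; the data here is a small stand-in for the real 3474 genuine and 3351 fake profiles):

```python
import pandas as pd

# Stand-in frames; the real ones are df_users (3474 rows) and df_fusers (3351 rows).
df_users = pd.DataFrame({"statuses_count": [120, 4500]})
df_fusers = pd.DataFrame({"statuses_count": [3]})

df_users["isFake"] = 0     # genuine accounts
df_fusers["isFake"] = 1    # fake accounts
df_allUsers = pd.concat([df_users, df_fusers], ignore_index=True)
print(df_allUsers.shape)   # (3, 2)
```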
In []: df_allUsers.describe()
Out []:
In []: df_allUsers.head()
Out []:
In []: Y = df_allUsers.isFake
In []: print(Y.shape)
In []: X.head()
Out []:
Feature Selection
In []: X = X[[
"statuses_count",
"followers_count",
"friends_count",
"favourites_count",
"lang_num",
"listed_count",
"geo_enabled",
"profile_use_background_image"
]]
In []: X = X.replace(np.nan, 0)  # To replace the missing boolean values with zeros, as it means false
Import Data
In []: print(train_X.shape)
print(test_X.shape)
print(train_y.shape)
print(test_y.shape)
Out []: (4368, 8)
(1365, 8)
(4368,)
(1365,)
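The cell producing these shapes is not reproduced in text, but the numbers are consistent with a two-stage scikit-learn split: 20% held out for testing, then 20% of the remainder for validation. A sketch on stand-in data of the same size (the split ratios are an inference, not taken from the report):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the combined dataset's dimensions: 6825 profiles, 8 features.
X = np.zeros((6825, 8))
y = np.zeros(6825)

# Hold out 20% for testing, then 20% of the remainder for validation.
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=42)
train_X, val_X, train_y, val_y = train_test_split(train_X, train_y, test_size=0.2, random_state=42)

print(train_X.shape, test_X.shape, val_X.shape)   # (4368, 8) (1365, 8) (1092, 8)
```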
Design Model
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 32) 288
_________________________________________________________________
dense_1 (Dense) (None, 64) 2112
_________________________________________________________________
dense_2 (Dense) (None, 64) 4160
_________________________________________________________________
dense_3 (Dense) (None, 32) 2080
_________________________________________________________________
dense_4 (Dense) (None, 1) 33
=================================================================
Total params: 8,673
Trainable params: 8,673
Non-trainable params: 0
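The per-layer parameter counts in the summary can be verified by hand, since a Dense layer with n inputs and m units has (n + 1) x m parameters (a weight per input plus one bias per unit):

```python
# Layer widths read off the summary above; the input dimension of 8
# matches the eight selected features.
layer_sizes = [8, 32, 64, 64, 32, 1]
params = [(n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
print(params)       # [288, 2112, 4160, 2080, 33]
print(sum(params))  # 8673
```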
Compile Model
In []: model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
Training
In []: history = model.fit(train_X, train_y,
epochs=15,
verbose=1,
validation_data=(val_X,val_y))
Out []:
Testing
In []: score = model.evaluate(test_X, test_y, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Out []: Test loss: 0.15890942513942719
Test accuracy: 0.9912087917327881
Graphs
In []: # Plot training and validation accuracy values
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
axes = plt.gca()
axes.set_xlim([0,14])
axes.set_ylim([0.85,1])
axes.grid(True, which='both')
axes.axhline(y=0.85, color='k')
axes.axvline(x=0, color='k')
axes.axvline(x=14, color='k')
axes.axhline(y=1, color='k')
plt.legend(['Train','Val'], loc='lower right')
plt.show()
Out []:
In []: # Plot training and validation loss values
plt.plot(history.history['loss']); plt.plot(history.history['val_loss'])
plt.title('Model loss'); plt.ylabel('Loss'); plt.xlabel('Epoch')
axes = plt.gca()
axes.axhline(y=0, color='k')
axes.axvline(x=0, color='k')
axes.axhline(y=5, color='k')
axes.axvline(x=14, color='k')
plt.legend(['Train','Val'], loc='upper right')
plt.show()
Out []:
Prediction
In []: # Write the index of the test sample to test
prediction = model.predict(test_X[136:137])
prediction = prediction[0]
print('Prediction\n',prediction)
print('\nThresholded output\n',(prediction>0.5)*1)
Out []: Prediction
[0.9993391]
Thresholded output
[1]
Ground truth
In []: print(test_y[136:137])
Loading
In []: # load json and create model
# Write the file name of the model
json_file = open('model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)
# load weights into new model
# Write the file name of the weights
loaded_model.load_weights("model.h5")
print("Loaded model from disk")
Out []: Loaded model from disk
7.3 SNAPSHOTS
7.4 CONCLUSION
Implementing this project has helped us understand the use of various deep learning techniques.
We also learned a lot about the workings of machine learning and deep learning, the various
steps involved in them, and how to implement them in real-world situations.
8 SYSTEM TESTS
8.1 INTRODUCTION
The purpose of testing is to discover errors. Testing is the process of trying to discover every
conceivable fault or weakness in a work product. It provides a way to check the functionality
of components, sub-assemblies, assemblies and/or a finished product. It is the process of
exercising software with the intent of ensuring that the software system meets its requirements
and user expectations and does not fail unacceptably. There are various types of tests, and each
test type addresses a specific testing requirement.
Unit testing
Unit testing involves the design of test cases that validate that the internal program logic is
functioning properly and that program inputs produce valid outputs. All decision branches and
internal code flow should be validated. It is the testing of individual software units of the
application, and it is done after the completion of an individual unit and before integration.
This is structural testing that relies on knowledge of the unit's construction and is invasive.
Unit tests perform basic tests at the component level and test a specific business process,
application, and/or system configuration. Unit tests ensure that each unique path of a business
process performs accurately to the documented specifications and contains clearly defined
inputs and expected results.
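As a concrete illustration, here is a minimal unit test for the thresholding step used later in prediction; the threshold helper is hypothetical, written only for this example:

```python
import numpy as np

def threshold(prediction, cutoff=0.5):
    # Convert sigmoid outputs into 0/1 class labels (1 = fake profile).
    return (np.asarray(prediction) > cutoff).astype(int)

# Unit tests: each unique path produces the documented result.
assert threshold([0.9993391])[0] == 1   # confident "fake" prediction
assert threshold([0.12])[0] == 0        # confident "genuine" prediction
assert threshold([0.5])[0] == 0         # boundary: label 1 requires strictly above cutoff
print("threshold unit tests passed")
```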
Integration testing
Integration tests are designed to test integrated software components to determine whether they
run as one program. Testing is event-driven and is more concerned with the basic outcome of
screens or fields. Integration tests demonstrate that although the components were individually
satisfactory, as shown by successful unit testing, the combination of components is correct
and consistent. Integration testing is specifically aimed at exposing the problems that arise
from the combination of components.
Functional test
Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation, and user manuals.
Functional testing is centred on the following items:
• Valid Input: Identified classes of valid input must be accepted.
• Invalid Input: Identified classes of invalid input must be rejected.
• Functions: Identified functions must be exercised.
• Output: Identified classes of application outputs must be exercised.
• Systems/Procedures: Interfacing systems or procedures must be invoked.
Organization and preparation of functional tests are focused on requirements, key functions,
or special test cases. In addition, systematic coverage pertaining to identified business process
flows, data fields, predefined processes, and successive processes must be considered for
testing. Before functional testing is complete, additional tests are identified and the effective
value of current tests is determined.
System Test
System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system testing is the
configuration-oriented system integration test. System testing is based on process descriptions
and flows, emphasizing pre-driven process links and integration points.
White Box Testing is testing in which the software tester knows the inner workings, structure
and language of the software, or at least its purpose. It is used to test areas that cannot be
reached from a black-box level.
Black Box Testing is testing the software without any knowledge of the inner workings,
structure or language of the module being tested. Black box tests, like most other kinds of
tests, must be written from a definitive source document, such as a specification or
requirements document. It is testing in which the software under test is treated as a black
box: you cannot "see" into it. The test provides inputs and responds to outputs without
considering how the software works.
Field testing will be performed manually and functional tests will be written in detail.
Test objectives
Features to be tested
Software integration testing is the incremental integration testing of two or more integrated
software components on a single platform to produce failures caused by interface defects.
The task of the integration test is to check that components or software applications, e.g.,
components in a software system or – one step up – software applications at the company level
interact without error.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.
Acceptance Testing
User Acceptance Testing is a critical phase of any project and requires significant participation
by the end user. It also ensures that the system meets the functional requirements.
Test Results: All the test cases mentioned above passed successfully. No defects encountered
8.4 CONCLUSION
Testing is an especially important phase during the development of a project. It helps us find
bugs and unwanted issues within the system. During this phase, we found some bugs in the
system that we could easily fix. This helped us determine whether the system was ready for
real-world use; after rigorous testing, we found that the system is ready for deployment.
9 CONCLUSIONS
In this research, we have come up with an ingenious way to detect fake accounts on OSNs.
By using an Artificial Neural Network to its full extent, we have eliminated the need for
manual identification of fake accounts, which requires a lot of human resources and is also a
time-consuming process. Existing systems have become obsolete due to advancements in the
creation of fake accounts, and the factors those systems relied upon are unstable. In this
research, we used stable factors such as engagement rate and artificial activity to increase the
accuracy of the prediction.
Future work is to apply feature sets used in other spam detection models, and hence to realize
multi-model ensemble prediction. Another direction is to make the system robust against
adversarial attacks, such as a botnet that diversifies all features, or an attacker that learns from
failures.
There is always room for improvement. We have implemented all the core features of the
proposed system, but we believe there can be a few more advancements to this artifact:
• Improving the accuracy of the model.
• Working with a larger set of tuples, and with data from varied platforms, for increased reliability.
• Advancement of the user interface for better interaction.
• Optimizing the time required for data injection and retrieval.
10 REFERENCES
➢ JavaScript: The Complete Reference, 3rd Edition, by Thomas Powell.
➢ Nambouri Sravya, Chavana Sai Praneetha, S. Saraswathi, "Identify the Human or Bots
Twitter Data using Machine Learning Algorithms", International Research Journal of
Engineering and Technology (IRJET), Volume 06, Issue 03, Mar 2019, www.irjet.net,
e-ISSN: 2395-0056, p-ISSN: 2395-0072.
➢ M. Smruthi, N. Harini,” A Hybrid Scheme for Detecting Fake Accounts in Facebook”,
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-
3878, Volume-7, Issue-5S3, February 2019.
➢ Tehlan, Pooja, Rosy Madaan, and Komal Kumar Bhatia. "A Spam Detection
Mechanism in Social Media using Soft Computing."
➢ Rao, P. S., J. Gyani, and G. Narsimha. "Fake profiles identification in online social
networks using machine learning and NLP." Int. J. Appl. Eng. Res 13.6 (2018): 973-
4562.
➢ Raturi, Rohit. "Machine learning implementation for identifying fake accounts in social
network." International Journal of Pure and Applied Mathematics 118.20 (2018): 4785-
4797.
➢ Van Der Walt, Estée, and Jan Eloff. "Using machine learning to detect fake identities:
bots vs humans." IEEE Access 6 (2018): 6540-6549.
➢ Kulkarni, Sumit Milind, and Vidya Dhamdhere. "Automatic detection of fake profiles
in online social networks." Open Access International Journal of Science and Engineering
3.1 (2018): 70-73.
➢ Ala'M, Al-Zoubi, Ja'far Alqatawna, and Hossam Faris. "Spam profile detection in social
networks based on public features." 2017 8th International Conference on information
and Communication Systems (ICICS). IEEE, 2017.
➢ Elovici, Yuval, and Gilad Katz. "Method for detecting spammers and fake profiles in
social networks." U.S. Patent No. 9,659,185. 23 May 2017.
➢ Gurajala, Supraja, et al. "Profile characteristics of fake Twitter accounts." Big Data &
Society 3.2 (2016): 2053951716674236.
➢ Ferrara, Emilio, et al. "Predicting online extremism, content adopters, and interaction
reciprocity." International conference on social informatics. Springer, Cham, 2016.
➢ Caspi, Avner, and Paul Gorsky. "Online deception: Prevalence, motivation, and
emotion." Cyber Psychology & Behaviour 9.1 (2006): 54-59.
➢ Bergen, Emilia, et al. "The effects of using identity deception and suggesting secrecy
on the outcomes of adult-adult and adult-child or adolescent online sexual interactions."
Victims & Offenders 9.3 (2014): 276-298.
➢ Wani, Suheel Yousuf, Mudasir M. Kirmani, and Syed Imamul Ansarulla. "Prediction
of fake profiles on Facebook using supervised machine learning techniques-A
theoretical model." International Journal of Computer Science and Information
Technologies (IJCSIT) 7, no. 4 (2016): 1735-1738.
➢ Wu, W., Alvarez, J., Liu, C. and Sun, H.M., 2018. Bot detection using unsupervised
machine learning. Microsystem Technologies, 24(1), pp.209-217.
➢ https://www.google.co.in/
➢ https://jupyter.org/install
➢ https://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/
➢ https://www.datarobot.com/wiki/deep-learning/
➢ https://github.com/