Fake Profiling
[ college logo ]
A STUDY ON
FAKE PROFILING
in
Three Years Diploma Program in Engineering & Technology of Maharashtra
State Board of Technical Education, Mumbai (Autonomous)
ISO 9001:2008 (ISO/IEC-27001:2013)
at
MAHARASHTRA STATE BOARD OF TECHNICAL
EDUCATION, MUMBAI
Certificate
This is to certify that Mr./Mrs. ____________ of the Diploma in Engineering & Technology at [ your college name ] has satisfactorily completed the Micro Project in the subject ETI in the academic year 2019-2020 as per the prescribed curriculum.
Head of
Institute
INDEX
Abstract
Social networks such as Facebook, Twitter and Google+ have attracted millions of users in recent years. One of the most widely used social networks, Facebook, had an initial public offering (IPO) in May 2012 that was among the biggest in Internet technology. For-profit and non-profit organizations primarily use such platforms for target-oriented advertising and large-scale marketing campaigns. Social networks have attracted worldwide attention because of their potential to reach millions of users and possible future customers. This potential is often misused by malicious users who extract sensitive private information from unaware users. One of the most common ways of performing a large-scale data-harvesting attack is the use of fake profiles, in which malicious users present themselves as fictitious or real persons. The main goal of this research is to evaluate the implications of fake user profiles on Facebook. To do so, we established a comprehensive data-harvesting attack, the social engineering experiment, and analyzed the interactions between fake profiles and regular users, eventually undermining the Facebook business model. Furthermore, privacy considerations are analyzed using focus groups. As a result of our work, we provide a set of countermeasures to increase user awareness.
Introduction
In recent years, online social networks such as Facebook, Twitter and Google+ have become a global mass phenomenon and one of the fastest-emerging e-services, according to Gross and Acquisti (2005) and Boyd and Ellison (2007). A study recently published by Facebook (2012) indicates that there were about 901 million monthly active users on the platform at the end of March 2012, making Facebook one of the largest online social networks. Not only ordinary users but also celebrities, politicians and other people of public interest use social media to spread content to others. Furthermore, companies and organizations consider social media sites the medium of choice for large-scale marketing and target-oriented advertising campaigns.
The sustainability of the business model relies on several different factors and is usually not
publicly disclosed. Nonetheless, we assume that two major aspects are significant for Facebook.
First and foremost, Facebook relies on people using their real-life identity and therefore
discourages the use of pseudonyms. Verified accounts allow (prominent) users to verify their
identity and to continue using pseudonyms, e.g., stage names such as ‘Gaga’. This is considered
to be a security mechanism against fake accounts (TechCrunch 2012); moreover, users are asked
to identify friends who do not use their real names.
One of the problems in data classification is the imbalanced distribution of data, in which some classes contain more items than others. This problem arises most often in two-class applications, where one class has many more items than the other. The resampling approach changes the distribution of the training set by processing the data, and there are several resampling approaches to improving classification performance by balancing the dataset [1]. Resampling may balance the class distribution either by removing samples of the majority class (undersampling) or by increasing the samples of the minority class (oversampling). Another approach, artificial sampling of the minority class, is the Synthetic Minority Oversampling Technique (SMOTE), which creates artificial data based on the similarity of characteristics between minority-class items. In the proposed model, the SMOTE method is used because the similarity feature of the nodes is needed and no information should be removed. However, because oversampling approaches replicate minority-class samples from the original data, they may increase noise and processing time and can result in overfitting and a decrease in efficiency.
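The two resampling strategies described above can be sketched as follows. This is a minimal NumPy illustration with a hypothetical 90/10 class split; production code would typically use a library such as imbalanced-learn instead:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_undersample(X, y, majority_label):
    """Drop majority-class samples until both classes are the same size."""
    maj = np.flatnonzero(y == majority_label)
    mino = np.flatnonzero(y != majority_label)
    keep = rng.choice(maj, size=len(mino), replace=False)
    idx = np.concatenate([keep, mino])
    return X[idx], y[idx]

def random_oversample(X, y, minority_label):
    """Duplicate minority-class samples until both classes are the same size."""
    mino = np.flatnonzero(y == minority_label)
    maj = np.flatnonzero(y != minority_label)
    extra = rng.choice(mino, size=len(maj) - len(mino), replace=True)
    idx = np.concatenate([maj, mino, extra])
    return X[idx], y[idx]

# Toy imbalanced dataset: 90 genuine profiles (label 0) vs 10 fake ones (label 1).
X = rng.normal(size=(100, 4))
y = np.array([0] * 90 + [1] * 10)

Xu, yu = random_undersample(X, y, majority_label=0)
Xo, yo = random_oversample(X, y, minority_label=1)
print(np.bincount(yu))  # [10 10]
print(np.bincount(yo))  # [90 90]
```

Note that plain oversampling only duplicates existing minority samples, which is exactly the replication problem the text attributes to overfitting.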
Chawla et al. proposed the SMOTE algorithm. It randomly creates minority-class items according to a certain rule and combines these new samples with the original dataset to produce a new training set. In the oversampling process, different minority-class samples play different roles: samples on the margin of the class contribute more than items at its center. Samples obtained on the margin of a minority class may improve the decision boundary and the classification rate for minority-class prototypes.
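A minimal sketch of the SMOTE idea described above, assuming Euclidean distance over a small hypothetical minority set; each synthetic point is interpolated between a minority sample and one of its k nearest minority-class neighbours:

```python
import numpy as np

rng = np.random.default_rng(42)

def smote(X_min, n_new, k=3):
    """Generate n_new synthetic minority samples by interpolating each
    chosen sample toward one of its k nearest minority-class neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Distances from sample i to every other minority sample.
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        d[i] = np.inf                      # exclude the sample itself
        neighbours = np.argsort(d)[:k]     # k nearest minority neighbours
        j = rng.choice(neighbours)
        gap = rng.random()                 # random point on the segment [i, j]
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_min = rng.normal(size=(10, 4))           # hypothetical minority-class features
X_new = smote(X_min, n_new=80)
print(X_new.shape)  # (80, 4)
```

Because new points lie between existing minority samples rather than on top of them, SMOTE avoids the exact-replication problem of plain oversampling.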
Literature Survey
The literature survey has been carried out to explore previous research in the following relevant areas: Web Mining, Web Content Mining, Extractive Summarization, Query Based Summarization, Sentence Scoring Approaches, Preprocessing for Query Based Summarization, Web Page Filtering Approaches, Web Page Segmentation, Applications of Summarization, and Applications of Query Based Summarization.
2.1 WEB MINING
The World Wide Web accumulates heterogeneous information content to cater to the needs of various user communities and has now become the single largest collection of data in the information era. This enormous corpus, growing beyond everyone's expectations and imagination, opens up scope for further research. Data mining techniques can be applied to this huge collection of diversified content to uncover hidden patterns, trends and useful knowledge that can provide value-added services to web users. A few of the value-added services resulting from this knowledge are search result ranking, targeted marketing, improved customer relationship and service, and fraudulent user or transaction detection. Capturing and discovering knowledge from web data resources has therefore become very important for the web mining research community. Soumen Chakrabarti et al. (1999) described a new hypertext resource discovery system, the focused crawler, which analyzed its crawl boundary to find the most relevant links and avoid irrelevant regions of the web; this approach led to a significant reduction in hardware and network resource requirements. Mei Kobayashi and Koichi Takeda (2000) studied the growth and development of the Internet and the technologies useful for information retrieval on the web, and discussed techniques targeted at problems such as slow retrieval speed.
System Design
Characteristics of the proposed system
The fake-profiling system created for identifying criminals has the following features:
- Compared with the present system, the proposed system is less time-consuming and more efficient.
- Analysis is very easy in the proposed system because it is automated.
- Results are precise and accurate and are produced in a very short span of time, because calculations and evaluations are done by the simulator itself.
- The proposed system is very secure, as there is no chance of leakage, since it depends on the administrator only.
- The logs of candidates and their results are stored and can be backed up for future use.
Admin Table:
S.No. Field name Data Type Description
Fig. Data Flow Diagram of fake profiling
Fig. Use case diagram for Online Examination Portal
Conclusion
As mentioned above, the feature subsets with the highest accuracy were highlighted, as follows: the Spearman's rank-order correlation best pattern was (1000001000110110), the multiple linear regression best pattern was (0110110111001111), and the Wrapper-SVM best pattern was (110111111011111). NN accuracy results are illustrated in Figure 7.
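Such a binary pattern can be read as a column mask over the feature vector. A small illustrative sketch, using the Spearman pattern above and a hypothetical sample count:

```python
import numpy as np

def apply_pattern(X, pattern):
    """Keep only the feature columns whose bit in the pattern string is '1'."""
    mask = np.array([c == "1" for c in pattern])
    return X[:, mask]

# Hypothetical matrix of 5 profiles, each with 16 extracted features.
X = np.random.default_rng(1).normal(size=(5, 16))
X_corr = apply_pattern(X, "1000001000110110")  # Spearman correlation subset
print(X_corr.shape)  # (5, 6) — six features survive the mask
```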
As shown in Figure 7, the SVM classifier has the highest accuracy when using the Wrapper-SVM feature set and the lowest accuracy with the Yang et al. feature set, while the accuracy results for the NN classifier were lower than their SVM counterparts, with the highest accuracy of 0.888 from the regression feature set and the lowest accuracy using the PCA feature set.
By comparing the accuracy results of the three classification algorithms, it is evident that the SVM-NN classification algorithm has the highest classification accuracy on all the feature subsets compared with the other two classifiers, as in Figure 8, with a highest accuracy of 0.983.
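The SVM-NN pipeline described here can be sketched as follows, using scikit-learn and a synthetic dataset in place of the MIB data; the model sizes and parameters are illustrative assumptions, not the paper's settings:

```python
# Sketch of the SVM-NN idea: train an SVM, then use its decision values
# (signed distances to the separating hyperplane) as the input of a small
# neural network. Requires scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the profile dataset: 600 samples, 16 features.
X, y = make_classification(n_samples=600, n_features=16, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

svm = SVC(kernel="rbf").fit(X_tr, y_tr)

# SVM decision values for the training split train the NN stage;
# decision values for the test split evaluate it.
d_tr = svm.decision_function(X_tr).reshape(-1, 1)
d_te = svm.decision_function(X_te).reshape(-1, 1)

nn = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                   random_state=0).fit(d_tr, y_tr)
print("SVM-NN test accuracy:", nn.score(d_te, y_te))
```

The design intent is that the NN learns a better threshold over the SVM's continuous decision values than the SVM's own fixed sign cut-off.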
In this paper, a new classification algorithm was proposed to improve the detection of fake accounts on social networks, in which the decision values of a trained SVM model were used to train an NN model, and the SVM testing decision values were used to test the NN model. To reach this goal, we used the "MIB" baseline dataset from [26] and ran it through a pre-processing phase in which four feature-reduction techniques were used to reduce the feature vector.
The correlation feature set records a remarkable accuracy among the feature-selection technique sets, because the correlation technique not only selects the best features but also removes the redundancy.
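The redundancy-removal property of correlation-based selection can be illustrated by a greedy filter that drops any feature too strongly correlated with one already kept; the threshold and data below are hypothetical:

```python
import numpy as np

def drop_redundant(X, threshold=0.9):
    """Greedy filter: drop any feature whose absolute Pearson correlation
    with an already-kept feature exceeds the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return X[:, keep], keep

rng = np.random.default_rng(0)
a = rng.normal(size=100)
# Column 1 is a near-duplicate of column 0; column 2 is independent.
X = np.column_stack([a, a + 0.01 * rng.normal(size=100), rng.normal(size=100)])
X_kept, keep = drop_redundant(X)
print(keep)  # [0, 2] — the redundant near-duplicate column is removed
```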
17
References
Websites:
www.tutorialspoint.com
www.w3schools.com
www.geeks4geeks.com