1st Review PDF

Prevention of Shilling Attack in recommender
systems by detecting fake user profiles

BATCH NO.:

STUDENT NAME | ROLL NO. | LAB HOURS ATTENDED DURING 08.01.16 TO 08.02.16 / TOTAL LAB HOURS
G.NEERAJA | 2012503551 | 56 hrs
R.DEEPIKA | 2012503540 | 56 hrs
K.LENAVATHI | 2012503546 | 56 hrs
T.KEERTHANA | 2012503516 | 52 hrs

GUIDE NAME: DR.S.THAMARAI SELVI
DOMAIN
Big Data Analytics - Recommender system and collaborative
filtering
OBJECTIVE
To design a shilling attack prevention algorithm which
detects and flags the fake user profiles by their history of
ratings.
PROPOSED SYSTEM
Implement the user-user Collaborative Filtering algorithm for
recommendations on the MovieLens 1M dataset, which is
injected with fake profiles created using attack models.
DWT is used to extract the features, which are then used by an SVM to
classify the profiles.
COLLABORATIVE FILTERING TYPES:

User-based filtering
Item-based filtering
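A minimal sketch of user-based filtering, assuming cosine similarity over co-rated items and a similarity-weighted average for prediction. The project itself uses LensKit, whose API is not shown here; the toy ratings and the names cosine_sim and predict are illustrative only.

```python
from math import sqrt

# Toy ratings: user -> {item: rating}. Values are made up for illustration.
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 3, "c": 5, "d": 4},
    "u3": {"a": 1, "b": 5, "d": 2},
}

def cosine_sim(r1, r2):
    """Cosine similarity computed over the items both users rated."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in common)
    n1 = sqrt(sum(r1[i] ** 2 for i in common))
    n2 = sqrt(sum(r2[i] ** 2 for i in common))
    return dot / (n1 * n2)

def predict(ratings, user, item):
    """Similarity-weighted average of the neighbours' ratings for `item`."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        s = cosine_sim(ratings[user], r)
        num += s * r[item]
        den += abs(s)
    return num / den if den else None  # None when no neighbour rated the item

prediction = predict(ratings, "u1", "d")
```

Because every neighbour's rating is weighted only by similarity, a group of injected profiles that look similar to many users can shift the prediction for a target item, which is what a shilling attack exploits.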
CHALLENGES ADDRESSED
Batch processing is tedious, so detection is performed online.

SVM generally offered better performance than unsupervised
learning algorithms [7] for classifying user profiles.
DWT can be used instead of HHT when:
The speed of the transform implementation is crucial, and
The exact value of the instantaneous frequency is not as
important as its relative change.
Feature | HHT | DWT
Completeness | Yes | Yes
Algorithm for fast application and real time applicability | Slow | Fast
Decision making latency | Yes | Yes
Inverse transform | No | Yes
ARCHITECTURE OF THE PROPOSED SYSTEM

Feature Extraction:
Ratings of the profiles who are blacklisted -> Generating rating series -> Generating DWT scalogram -> Amplitude, phase and frequency of DWT signal -> Calculating feature values
Fake User Detection:
Generate feature set -> SVM based classifier -> Detection results
PHASE I
OBJECTIVE:
To prevent shilling attacks by detecting fake users: apply the
Discrete Wavelet Transform to users' rating series and use Support
Vector Machines to classify the users.
MODULES:
1. User based CF algorithm using LensKit
2. DWT on sample novelty and popularity based rating series
3. SVM training and testing for the model feature set.
RESULT FROM PHASE I
DWT module is implemented and transformed signals are produced, from
which the features can be extracted.
SVM is implemented using LIBSVM and samples are classified.
A user-user CF algorithm is implemented and LensKit is integrated with
the required datasets and packages.
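A minimal sketch of what the DWT module does to a rating series, assuming a single decomposition level and the Haar wavelet. The slides do not state which wavelet family or how many levels the project uses, so both are assumptions, and the sample series is made up.

```python
from math import sqrt

def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    x = list(signal)
    if len(x) % 2:  # pad odd-length input by repeating the last sample
        x.append(x[-1])
    approx = [(x[i] + x[i + 1]) / sqrt(2) for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / sqrt(2) for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt for an even-length input."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / sqrt(2))
        out.append((a - d) / sqrt(2))
    return out

# Hypothetical rating series: ratings over sorted items, 0 = unrated.
series = [5, 5, 0, 4, 1, 0, 0, 3]
approx, detail = haar_dwt(series)
restored = haar_idwt(approx, detail)
```

The approximation coefficients carry the smoothed trend of the series and the detail coefficients carry the abrupt changes. The inverse function illustrates the "Inverse transform: Yes" property listed for DWT.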
LITERATURE SURVEY-PHASE II
Defending Grey Attacks by Exploiting Wavelet Analysis in
Collaborative Filtering Recommender Systems
Zhihai Yang, International Journal of Advanced Research in Artificial Intelligence, Vol. 4, 2015.
The main contributions of this paper are summarized as follows:

Employs novelty, popularity and rating deviation of items to construct rating
series on which the discrete wavelet transform (DWT) is performed.
Extracts 15 features from each series using an amplitude domain analysis method
and uses EM clustering to classify profiles.
CURRENT PROBLEM:
Nearly 45 features are considered for classification.
Special focus only on grey attacks.
SOLUTION:
Extract 17 features from the users' rating series to detect fake
profiles for major attacks such as push and nuke attacks.
Clustering versus SVM for malware detection

Usha Narra, Fabio Di Troia, Journal of Computer Virology and Hacking Techniques,
Springer, 2015.
- Compares clustering techniques such as EM clustering and K-means clustering with
Support Vector Machines.
- Experiments are conducted on a malware dataset and the conclusions are tabulated.
INFERENCES:
EM clustering can be used for classification before a model has been trained.
When a model can be trained, SVM always shows better results than EM clustering.
Comparative Analysis of Hilbert Huang and Discrete Wavelet Transform in
Processing of Signals Obtained from the Cutting Process
Zivana B. Jakovljevic, FME Transactions, Vol. 41, 2013.
The paper gives a comparative survey of HHT and DWT for the analysis of signals
obtained from the cutting process, considering the desired outcomes of the analysis.
INFERENCE: When the speed of the transform implementation is crucial, and the
exact value of the instantaneous frequency is not as important as its relative change,
DWT is the technique of choice.
Survey of review spam detection using machine learning techniques

Michael Crawford, Taghi M. Khoshgoftaar, Journal of Big Data,
Springer, 2015.
The main contributions of this paper are summarized as follows:
Studies prominent machine learning techniques and analyses the
performance of different classification approaches.
INFERENCE:
Unsupervised and semi-supervised methods are currently unable to
match the performance of supervised learning methods.
A limitation of supervised learning is that labeling the dataset for
training is tedious in real-time applications.
Filler Item Strategies for Shilling Attacks against Recommender Systems

Sanjog Ray, Ambuj Mahanti, Proceedings of the 42nd Hawaii International
Conference on System Sciences, IEEE, 2009.
Proposes filler item strategies for both all-user attacks and in-segment attacks.
Experiments are conducted to show that the proposed attack strategies are the most
effective against both user-based and item-based collaborative filtering systems.
INFERENCE:
Provides an effective approach towards constructing attack models.
Shows the importance of the target item and filler items in the construction of
successful attack strategies.
The Definition of Novelty in Recommendation System

Liang Zhang, Journal of Engineering Science and Technology, Vol. 6, 2013.
Gives the definition and algorithm of novel recommendation, explains the meaning of
"novel", and defines the novelty of an item in a recommendation system. Experiments
show that recommending by novelty can still ensure a certain level of accuracy.
LIMITATION: Uses a low-precision algorithm to predict rated values, and that algorithm
is not commonly used.
PHASE II WORK
Module 1:
Inject fake user profiles into the genuine user database.
Module 2:
Generate novelty and popularity based rating series.
Module 3:
Extract 17 features from the DWT scalogram.
Module 4:
Calculate performance metrics and validate.
MODULE 1 : ATTACK MODEL

(1) I_S - set of selected items.
(2) I_F - set of filler items, usually chosen randomly.
(3) I_t - set of target items.
(4) I_∅ - set of unrated items.
Fig. General form of an attack profile [source: Ihsan Gunes et al., 2014]
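A minimal sketch of building one attack profile from the parts listed above, assuming the standard random and average attack models with an empty selected-item set: filler items get ratings drawn around the global mean (random attack) or the item mean (average attack), and the target item gets the maximum rating for a push attack or the minimum for a nuke attack. The function name, its parameters and the sample statistics are hypothetical.

```python
import random

R_MIN, R_MAX = 1, 5  # MovieLens rating scale

def make_attack_profile(all_items, target, filler_size, item_means,
                        global_mean, global_std, model="random",
                        push=True, rng=random):
    """Return {item: rating} for one fake profile (selected-item set left empty)."""
    candidates = [i for i in all_items if i != target]
    fillers = rng.sample(candidates, filler_size)  # filler items, chosen randomly
    profile = {}
    for i in fillers:
        centre = global_mean if model == "random" else item_means[i]
        r = round(rng.gauss(centre, global_std))
        profile[i] = min(R_MAX, max(R_MIN, r))     # clamp to the rating scale
    profile[target] = R_MAX if push else R_MIN     # target item
    return profile                                 # items not in the dict are unrated

# Made-up statistics for illustration only.
items = list(range(20))
means = {i: 3.0 for i in items}
fake = make_attack_profile(items, target=3, filler_size=5, item_means=means,
                           global_mean=3.6, global_std=1.1,
                           rng=random.Random(0))
```

Passing a seeded random.Random instance keeps the injected profiles reproducible between experiment runs.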
MODULE 2 : GENERATION OF NOVELTY AND POPULARITY BASED RATING SERIES
NOVELTY of a recommendation is the degree to which it is unusual
compared with the user's normal taste.
POPULARITY of items usually reflects the genuine users' tastes or
preferences in a collaborative recommender system.
Procedure:
Generate the similarity sim(i,j) between item i and item j.
Generate the novelty of item i to user u, NOI_{u,i}.
Generate the novelty of item i, NOI_i, using NOI_{u,i}.
Sort all items in set I according to NOI_i in descending order.
Create the novelty based rating series of user u.
Do the same for the popularity based rating series.
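The procedure can be sketched as follows. The slides do not give the formulas, so this sketch assumes cosine similarity between items, NOI_{u,i} as the mean dissimilarity of item i to the items user u has rated, NOI_i as the average of NOI_{u,i} over all users, and popularity as the number of users who rated an item. The toy ratings and all function names are illustrative.

```python
from math import sqrt

ratings = {  # toy data: user -> {item: rating}
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 4, "c": 2},
    "u3": {"a": 1, "b": 5, "c": 4},
}
items = sorted({i for r in ratings.values() for i in r})

def item_sim(i, j):
    """Cosine similarity between items i and j over users who rated both."""
    users = [u for u, r in ratings.items() if i in r and j in r]
    if not users:
        return 0.0
    dot = sum(ratings[u][i] * ratings[u][j] for u in users)
    ni = sqrt(sum(ratings[u][i] ** 2 for u in users))
    nj = sqrt(sum(ratings[u][j] ** 2 for u in users))
    return dot / (ni * nj)

def novelty_to_user(u, i):
    """NOI_{u,i}: assumed to be the mean dissimilarity of i to items u rated."""
    others = [j for j in ratings[u] if j != i]
    if not others:
        return 0.0
    return sum(1 - item_sim(i, j) for j in others) / len(others)

def item_novelty(i):
    """NOI_i: assumed to be the average of NOI_{u,i} over all users."""
    return sum(novelty_to_user(u, i) for u in ratings) / len(ratings)

def popularity(i):
    """Assumed popularity: number of users who rated item i."""
    return sum(1 for r in ratings.values() if i in r)

def rating_series(u, score):
    """Ratings of user u over all items sorted by `score`, descending; 0 = unrated."""
    ordered = sorted(items, key=score, reverse=True)
    return [ratings[u].get(i, 0) for i in ordered]

novelty_series = rating_series("u1", item_novelty)
popularity_series = rating_series("u1", popularity)
```

Both series have one entry per item in the system, so every user yields a signal of the same length for the DWT step.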
MODULE 3: Extracting Features from DWT
The 17 features are:
NBAA novelty-based average amplitude of user u with total items,
NBAP novelty-based average phase of user u with total items,
NBAF novelty-based average instantaneous frequency of user u with total items,
PBAA popularity-based average amplitude of user u with total items,
PBAP popularity-based average phase of user u with total items,
PBAF popularity-based average instantaneous frequency of user u with total items,
AAPI average amplitude of user u with popular items,
APPI average phase of user u with popular items,
AFPI average instantaneous frequency of user u with popular items,
AAUI average amplitude of user u with unpopular items,
APUI average phase of user u with unpopular items,
AFUI average instantaneous frequency of user u with unpopular items,
FSTI ratio between the number of items rated by user u and the
number of all items in the recommender system,
FSPI ratio between the number of popular items rated by user u and
the number of all popular items in the recommender system,
FSPII ratio between the number of popular items rated by user u and
the total number of items rated by user u,
FSUI ratio between the number of unpopular items rated by user u
and the total number of items in the recommender system,
FSUII ratio between the number of unpopular items rated by user u and the
total number of items rated by user u.
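The five filler-size ratios in the list can be sketched directly from their definitions. How items are split into popular and unpopular is not stated in the slides, so the popular set is simply passed in; FSUII is read here as the unpopular-item counterpart of FSPII, which is an assumption. The function name and sample data are hypothetical.

```python
def filler_size_features(user_items, all_items, popular_items):
    """Compute FSTI, FSPI, FSPII, FSUI and FSUII for one user profile."""
    user_items = set(user_items)
    all_items = set(all_items)
    popular = set(popular_items) & all_items
    unpopular = all_items - popular
    rated_pop = user_items & popular
    rated_unpop = user_items & unpopular
    n = len(user_items)
    return {
        "FSTI": n / len(all_items),
        "FSPI": len(rated_pop) / len(popular) if popular else 0.0,
        "FSPII": len(rated_pop) / n if n else 0.0,
        "FSUI": len(rated_unpop) / len(all_items),
        # Assumption: unpopular items rated by u over all items rated by u.
        "FSUII": len(rated_unpop) / n if n else 0.0,
    }

# Made-up example: 10 items, 2 of them popular, user rated 4 items.
features = filler_size_features(user_items={0, 2, 3, 4},
                                all_items=range(10),
                                popular_items={0, 1})
```

Attack profiles built with large random filler sets tend to rate far more items than genuine users, which is what these ratios are meant to expose.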
WHY 17 FEATURES ?
We take 17 features for the following reasons:
NBAA, AAPI, APPI - to distinguish all types of attack profiles
NBAP, AAUI - to distinguish further
AFPI - to distinguish bandwagon attack
PBAA and PBAP - to distinguish random and average attack
FSTI, FSPI, FSPII, FSUI and FSUII - to distinguish average attack based on
the number of ratings in a user profile
MODULE 4: PERFORMANCE EVALUATION
Specificity, Sensitivity and Precision:
Specificity = TN / N
Sensitivity = TP / P
Precision = TP / (TP + FP)
where
TN - number of genuine profiles which are correctly classified,
N - total number of genuine profiles,
TP - number of attack profiles which are correctly detected,
P - total number of attack profiles,
FP - number of genuine profiles misclassified as attack profiles.
Detection rate = TP / P
False positive rate = FP / N
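A minimal sketch of the evaluation step, assuming the usual definitions in terms of the counts above: sensitivity (detection rate) TP/P, specificity TN/N with TN = N - FP, precision TP/(TP + FP) and false positive rate FP/N. The function name and the sample counts are made up for illustration.

```python
def detection_metrics(tp, fp, p, n):
    """Metrics for a detector given TP, FP and the class totals P and N."""
    tn = n - fp  # genuine profiles that were not flagged
    return {
        "sensitivity": tp / p,  # same quantity as the detection rate
        "specificity": tn / n,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "false_positive_rate": fp / n,
    }

# Hypothetical run: 50 attack profiles, 1000 genuine profiles.
m = detection_metrics(tp=45, fp=10, p=50, n=1000)
```

Specificity and false positive rate always sum to 1, so reporting the detection rate together with the false positive rate covers both classes.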
PRELIMINARY RESULT:
The attack model vector has been created and has yet to be written
into the dataset files.
Novelty and popularity based rating series are generated for a
specific user's ratings.
Amplitude, phase and frequency are extracted from the DWT signal
for the novelty and popularity based rating series, from which the 17
features for SVM classification will be created.
Screenshots:
LENSKIT ALGORITHM EVALUATOR:
EXTRACTING AMPLITUDE, PHASE AND FREQUENCY FROM DWT:
TIMELINE:
65% completion of implementation - 8/3/16
Implementation completion - 5/4/16
Performance validation - 12/4/16
REFERENCES:
[1] Fuzhi Zhang, Quanqiang Zhou, HHT-SVM: An online method for detecting
profile injection attacks in collaborative recommender systems, in
Knowledge-Based Systems, Vol. 65, pp. 96-105, 2014.
[2] Liang Zhang, The Definition of Novelty in Recommendation System, Journal
of Engineering Science and Technology, Vol. 6, 2013.
[3] Alper Bilge, Zeynep Ozdemir, Huseyin Polat, A novel shilling attack detection
method, in the Proceedings of the International Conference on Information
Technology and Quantitative Management, pp. 166-167, 2014.
[4] Zhihai Yang, Defending Grey Attacks by Exploiting Wavelet Analysis in
Collaborative Filtering Recommender Systems, International Journal of Advanced
Research in Artificial Intelligence, Vol. 4, 2015.
[5] Sanjog Ray, Ambuj Mahanti, Filler Item Strategies for Shilling Attacks against
Recommender Systems, in Proceedings of the Hawaii International Conference on
System Sciences, pp. 1-10, 2009.
[6] Ihsan Gunes, Cihan Kaleli, Alper Bilge, Huseyin Polat, Shilling attacks against
recommender systems: a comprehensive survey, in Artificial
Intelligence Review, Vol. 42, pp. 767-799, 2014.
[7] Michael Crawford, Taghi M. Khoshgoftaar, Survey of review spam detection
using machine learning techniques, in Journal of Big Data, Springer, 2015.
SUGGESTIONS AND REMARKS
