1st Review PDF


Prevention of Shilling Attack in recommender systems by detecting fake user profiles


BATCH NO.:

STUDENT NAME    ROLL NO.      LAB HOURS ATTENDED DURING 08.01.16 TO 08.02.16 / TOTAL LAB HOURS
G.NEERAJA       2012503551    56 hrs
R.DEEPIKA       2012503540    56 hrs
K.LENAVATHI     2012503546    56 hrs
T.KEERTHANA     2012503516    52 hrs

GUIDE NAME: DR.S.THAMARAI SELVI

DOMAIN
Big Data Analytics - Recommender system and collaborative
filtering

OBJECTIVE
To design a shilling attack prevention algorithm which
detects and flags the fake user profiles by their history of
ratings.

PROPOSED SYSTEM
Implement the user-user Collaborative Filtering algorithm for recommendations on the MovieLens 1M dataset, into which fake profiles created with attack models are injected.
DWT is used to extract the features, which the SVM then uses to classify the profiles.
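
As an illustration of the user-user Collaborative Filtering step, here is a minimal Python sketch. It is not the LensKit implementation used in the project: the Pearson similarity, the mean-centred weighted average, the neighbourhood size k and the function names are all assumptions chosen for the example.

```python
from math import sqrt

def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den_a = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
    den_b = sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common))
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / (den_a * den_b)

def predict(ratings, user, item, k=20):
    """Predict a rating from the k most similar users who rated the item.

    ratings maps user -> {item: rating}. Only positively correlated
    neighbours are used (an assumption of this sketch).
    """
    user_mean = sum(ratings[user].values()) / len(ratings[user])
    neighbours = []
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = pearson(ratings[user], other_ratings)
        if sim > 0:
            neighbours.append((sim, other))
    neighbours.sort(key=lambda pair: pair[0], reverse=True)
    neighbours = neighbours[:k]
    if not neighbours:
        return user_mean  # fall back to the user's own mean rating
    num = 0.0
    den = 0.0
    for sim, other in neighbours:
        other_mean = sum(ratings[other].values()) / len(ratings[other])
        num += sim * (ratings[other][item] - other_mean)
        den += sim
    return user_mean + num / den
```

The prediction is not clamped to the rating scale, so a caller working with 1 to 5 star ratings may want to clip the result. Because every neighbour's rating shifts the prediction, injected fake profiles that correlate with genuine users can push or nuke a target item, which is what the detection stage is meant to prevent.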

COLLABORATIVE FILTERING TYPES:


User-based filtering
Item-based filtering

CHALLENGES ADDRESSED

Batch processing is tedious, so detection is performed online.


SVM generally offered the best performance compared to unsupervised
learning algorithms[7] for classifying user profiles.

DWT can be used instead of HHT when:
- the speed of the transform implementation is crucial, and
- the exact value of the instantaneous frequency is not as important as its relative change.

Feature                                                       HHT     DWT
Completeness                                                  Yes     Yes
Algorithm for fast application and real-time applicability    Slow    Fast
Decision making latency                                       Yes     Yes
Inverse transform                                             No      Yes

ARCHITECTURE OF THE PROPOSED SYSTEM


Feature Extraction:
Ratings of the blacklisted profiles -> Generating rating series -> Generating DWT scalogram -> Amplitude, phase, frequency of DWT signal -> Calculating feature values

Fake User Detection:
Generate feature set -> SVM based classifier -> Detection results

PHASE I
OBJECTIVE:
To prevent shilling attacks by detecting fake users: apply the Discrete Wavelet Transform to the users' rating series and use Support Vector Machines to classify the users.

MODULES:
1. User based CF algorithm using LensKit
2. DWT on sample novelty and popularity based rating series
3. SVM training, testing for model feature set.

RESULT FROM PHASE I

The DWT module is implemented and transformed signals are produced from which the features can be extracted.

SVM is implemented using LIBSVM and samples are classified.

A user-user CF algorithm is implemented and LensKit is integrated with the required datasets and packages.

LITERATURE SURVEY-PHASE II
Defending Grey Attacks by Exploiting Wavelet Analysis in
Collaborative Filtering Recommender Systems
Zhihai Yang, International Journal of Advanced Research in Artificial Intelligence, Vol. 4, 2015

The main contributions of this paper are summarized as follows:


Employ novelty, popularity and rating deviation of item to construct rating
series to perform discrete wavelet transform (DWT).
Extract 15 features using the amplitude domain analysis method for each series and use EM clustering for classifying profiles.
CURRENT PROBLEM:
Nearly 45 features are considered for classification.
Special focus only on grey attacks.
SOLUTION:
Extract 17 features from the users' rating series to detect fake profiles for major attacks such as push and nuke attacks.

Clustering versus SVM for malware detection


Usha Narra, Fabio Di Troia, Journal of Computer Virology and Hacking Techniques, Springer, 2015.
- Compares clustering techniques like EM clustering and K-means clustering with Support Vector Machines.
- Experiments are conducted on a malware dataset and conclusions are tabulated.
INFERENCES:
EM clustering can be used for classification before a model has been trained.
When a model can be trained, SVM always shows better results than EM clustering.

Comparative Analysis of Hilbert Huang and Discrete Wavelet Transform in Processing of Signals Obtained from the Cutting Process
Zivana B. Jakovljevic, FME Transactions, Vol. 41, 2013.
The paper gives a comparative survey of HHT and DWT for the analysis of signals obtained from the cutting process, considering the desired outcomes of the analysis.
INFERENCE: When the speed of the transform implementation is crucial, and the exact value of the instantaneous frequency is not as important as its relative change, DWT is the technique of choice.

Survey of review spam detection using machine learning techniques


Michael Crawford, Taghi M. Khoshgoftaar, Journal of Big Data, Springer, 2015.
The main contributions of this paper are summarized as follows:
Studies prominent machine learning techniques and analyses the performance of different classification approaches.
INFERENCE:
Unsupervised and semi-supervised methods are currently unable to match the performance of supervised learning methods.
A limitation of supervised learning is that labeling the dataset for training is tedious in real-time applications.

Filler Item Strategies for Shilling Attacks against Recommender Systems


Sanjog Ray, Ambuj Mahanti, Proceedings of the 42nd Hawaii International
Conference on System Sciences, IEEE, 2009.
Proposes filler item strategies for both all-user attacks and in-segment attacks.
Experiments are conducted to show that their attack strategies are the most effective
attack strategies against both user-based and item-based collaborative filtering systems.
INFERENCE:
Provides an effective approach towards constructing attack models
Shows the importance of target item and filler items in construction of successful
attack strategies.

The Definition of Novelty in Recommendation System


Liang Zhang, Journal of Engineering Science and Technology, Vol. 6, 2013.
Gives the definition and algorithm of novel recommendation, explains the meaning of "novel", and defines the novelty of an item in a recommendation system. Experiments show that recommending by novelty can still maintain a certain level of accuracy.
LIMITATION: Uses a low-precision algorithm to predict the rated value, and that algorithm is not commonly used.

PHASE II WORK

Module 1:
Inject fake user profiles into the genuine user database.

Module 2:
Generate Novelty and Popularity based rating series.

Module 3:
Extract 17 features from the DWT Scalogram .

Module 4:
Calculate Performance metrics and validation.

MODULE 1 : ATTACK MODEL


(1) IS - set of selected items.

(2) IF - set of filler items, usually chosen randomly.

(3) It - set of target items.

(4) I∅ - set of unrated items.

Fig. General form of an attack profile [source: Ihsan Gunes et al., 2014]
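
The general attack profile can be sketched in Python for the two simplest attack models. This is a hedged illustration, not the project's injection code: the function name, the global mean and standard deviation defaults, and the choice to leave the selected-item set empty (as in the usual random and average attacks) are assumptions.

```python
import random

def build_attack_profile(item_means, target_item, filler_size, attack="random",
                         push=True, global_mean=3.6, global_std=1.1,
                         r_min=1, r_max=5, rng=None):
    """Build one fake profile as {item: rating}.

    item_means  : {item: mean rating}, assumed precomputed from genuine data
    attack      : "random" rates fillers around the global mean,
                  "average" rates fillers around each item's own mean
    push        : True gives the target the maximum rating, False the minimum (nuke)
    global_mean, global_std : placeholder values, not measured from MovieLens
    """
    rng = rng or random.Random()
    candidates = [i for i in item_means if i != target_item]
    fillers = rng.sample(candidates, filler_size)  # filler items chosen randomly
    profile = {}
    for item in fillers:
        centre = global_mean if attack == "random" else item_means[item]
        rating = round(rng.gauss(centre, global_std))
        profile[item] = min(r_max, max(r_min, rating))  # keep on the rating scale
    profile[target_item] = r_max if push else r_min
    return profile
```

Every item that is neither a filler nor the target stays unrated, which corresponds to the unrated set in the figure.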

MODULE 2 : GENERATION OF NOVELTY AND POPULARITY BASED RATING SERIES

NOVELTY of an item in recommendation is the degree to which it is unusual compared with the user's normal taste.

POPULARITY of items usually reflects the genuine users' tastes or preferences in a collaborative recommender system.

Procedure:
1. Generate the similarity between item i and item j, sim(i,j).
2. Generate the novelty of item i to user u, NOIu,i.
3. Generate the novelty of item i, NOIi, by using NOIu,i.
4. Sort all items according to NOIi in descending order.
5. Create the novelty based rating series of user u.
6. Repeat the same steps for the popularity based rating series.
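
The sort-then-read-off step of this procedure can be sketched for the popularity based series. The sketch assumes that an item's popularity is the number of users who rated it and that unrated items appear in the series as 0; neither definition is stated on the slides, and the function names are hypothetical. The same `rating_series` helper would serve the novelty based series if it were given NOIi scores instead.

```python
from collections import Counter

def item_popularity(ratings):
    """Count how many users rated each item; ratings maps user -> {item: rating}."""
    counts = Counter()
    for user_ratings in ratings.values():
        counts.update(user_ratings.keys())
    return dict(counts)

def rating_series(user_ratings, item_scores):
    """Order all items by descending score and read off the user's ratings.

    Items the user has not rated contribute 0 (an assumption of this sketch).
    Ties are broken by item id so that the ordering is deterministic.
    """
    ordered = sorted(item_scores, key=lambda i: (-item_scores[i], i))
    return [user_ratings.get(i, 0) for i in ordered]
```

For a user who rated only the most and least popular of three items with 4 and 2, the series would read 4, 0, 2.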

MODULE 3: Extracting Features from DWT

The 17 features are:

NBAA - novelty-based average amplitude of user u with total items,

NBAP - novelty-based average phase of user u with total items,

NBAF - novelty-based average instantaneous frequency of user u with total items,

PBAA - popularity-based average amplitude of user u with total items,

PBAP - popularity-based average phase of user u with total items,

PBAF - popularity-based average instantaneous frequency of user u with total items,

AAPI - average amplitude of user u with popular items,

APPI - average phase of user u with popular items,

AFPI - average instantaneous frequency of user u with popular items,

AAUI - average amplitude of user u with unpopular items,

APUI - average phase of user u with unpopular items,

AFUI - average instantaneous frequency of user u with unpopular items,

FSTI - ratio between the number of items rated by user u and the total number of items in the recommender system,

FSPI - ratio between the number of popular items rated by user u and the total number of popular items in the recommender system,

FSPII - ratio between the number of popular items rated by user u and the total number of items rated by user u,

FSUI - ratio between the number of unpopular items rated by user u and the total number of items in the recommender system,

FSUII - ratio between the number of unpopular items rated by user u and the total number of items rated by user u.

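
The five filler-size ratios in the list are simple set counts, as this Python sketch shows. Two points are assumptions: the set of popular items is taken as an input, because the slides do not say how it is chosen, and FSUII is read as the share of the user's rated items that are unpopular, by analogy with FSPII. The function name is hypothetical.

```python
def filler_size_features(rated_items, all_items, popular_items):
    """Compute FSTI, FSPI, FSPII, FSUI and FSUII for one user."""
    rated = set(rated_items)
    everything = set(all_items)
    popular = set(popular_items)
    rated_popular = rated & popular
    rated_unpopular = rated - popular

    def ratio(numerator, denominator):
        # Guard against empty sets instead of raising ZeroDivisionError.
        return numerator / denominator if denominator else 0.0

    return {
        "FSTI": ratio(len(rated), len(everything)),
        "FSPI": ratio(len(rated_popular), len(popular)),
        "FSPII": ratio(len(rated_popular), len(rated)),
        "FSUI": ratio(len(rated_unpopular), len(everything)),
        "FSUII": ratio(len(rated_unpopular), len(rated)),
    }
```

Under this reading FSPII and FSUII sum to 1 for any user with at least one rating, so a classifier gains little from having both; they are kept here only to match the feature list.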

WHY 17 FEATURES ?
We take 17 features for the following reasons:
NBAA, AAPI, APPI - to distinguish all types of attack profiles
NBAP, AAUI - to distinguish further
AIFP - to distinguish bandwagon attack
PBAA and PBAP - to distinguish random and average attack
FSTI, FSPI, FSPII, FSUI and FSUII - to distinguish average attack based on number of ratings in a user profile

MODULE 4: PERFORMANCE EVALUATION

Specificity, Sensitivity and Precision:

TN - number of genuine profiles which are correctly classified,
N - total number of genuine profiles,
TP - number of attack profiles which are correctly detected,
P - total number of attack profiles,
FP - number of genuine profiles misclassified as attack profiles.

Specificity = TN / N, Sensitivity = TP / P, Precision = TP / (TP + FP)

Detection rate: TP / P

False positive rate: FP / N
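
The evaluation measures can be computed from the counts defined in this module. The sketch below assumes the standard definitions (sensitivity and detection rate as TP / P, specificity as TN / N, precision as TP / (TP + FP), false positive rate as FP / N) and assumes every genuine profile is either correctly classified or a false positive, so that TN = N - FP. The function name is hypothetical.

```python
def detection_metrics(tp, fp, n_attack, n_genuine):
    """Return evaluation measures for a fake-profile detector.

    tp        : attack profiles correctly detected (TP)
    fp        : genuine profiles misclassified as attack profiles (FP)
    n_attack  : total number of attack profiles (P)
    n_genuine : total number of genuine profiles (N)
    """
    tn = n_genuine - fp  # genuine profiles correctly classified
    flagged = tp + fp    # all profiles flagged as attacks
    return {
        "sensitivity": tp / n_attack,  # the same quantity as the detection rate
        "specificity": tn / n_genuine,
        "precision": tp / flagged if flagged else 0.0,
        "false_positive_rate": fp / n_genuine,
    }
```

Specificity and false positive rate always sum to 1 under this assumption, so reporting either one alongside the detection rate is enough.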

PRELIMINARY RESULT:
An attack model vector has been created, which has to be written into the dataset files.

Novelty and popularity based rating series are generated for a specific user's ratings.

Amplitude, phase and frequency are extracted from the DWT signal for the novelty and popularity based rating series, from which the 17 features for SVM classification will be created.
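
To make the amplitude extraction concrete, here is a minimal Python sketch of a single-level Haar DWT applied to a rating series. The choice of the Haar wavelet, the single decomposition level, and the reading of "average amplitude" as the mean absolute value of the coefficients are all assumptions; the slides do not specify them, and phase and instantaneous frequency are left out because their definitions are not given.

```python
from math import sqrt

def haar_dwt(series):
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail) coefficient lists. An odd-length
    series is padded with one trailing zero (an assumption of this sketch).
    """
    x = list(series)
    if len(x) % 2:
        x.append(0)
    approx = [(x[i] + x[i + 1]) / sqrt(2) for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / sqrt(2) for i in range(0, len(x), 2)]
    return approx, detail

def average_amplitude(coefficients):
    """Mean absolute value of the coefficients (0.0 for an empty list)."""
    if not coefficients:
        return 0.0
    return sum(abs(c) for c in coefficients) / len(coefficients)
```

Applied to a novelty or popularity based rating series, the detail coefficients respond to abrupt changes between neighbouring ratings, which is the kind of relative change the DWT was chosen to capture.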

Screenshots:
LENSKIT ALGORITHM EVALUATOR:

EXTRACTING AMPLITUDE, PHASE AND FREQUENCY FROM DWT:

TIMELINE:
65% completion of implementation - 8/3/16
Implementation completion - 5/4/16
Performance validation - 12/4/16

REFERENCES:

[1] Fuzhi Zhang, Quanqiang Zhou, HHT-SVM: An online method for detecting profile injection attacks in collaborative recommender systems, Knowledge-Based Systems, Vol. 65, pp. 96-105, 2014.

[2] Liang Zhang, The Definition of Novelty in Recommendation System, Journal of Engineering Science and Technology, Vol. 6, 2013.

[3] Alper Bilge, Zeynep Ozdemir, Huseyin Polat, A novel shilling attack detection method, in Proceedings of the International Conference on Information Technology and Quantitative Management, pp. 166-167, 2014.

[4] Zhihai Yang, Defending Grey Attacks by Exploiting Wavelet Analysis in Collaborative Filtering Recommender Systems, International Journal of Advanced Research in Artificial Intelligence, Vol. 4, 2015.

[5] Sanjog Ray, Ambuj Mahanti, Filler Item Strategies for Shilling Attacks against Recommender Systems, in Proceedings of the Hawaii International Conference on System Sciences, pp. 1-10, 2009.

[6] Ihsan Gunes, Cihan Kaleli, Alper Bilge, Huseyin Polat, Shilling attacks against recommender systems: a comprehensive survey, Artificial Intelligence Review, Vol. 42, pp. 767-799, 2014.

[7] Michael Crawford, Taghi M. Khoshgoftaar, Survey of review spam detection using machine learning techniques, Journal of Big Data, Springer, 2015.

SUGGESTIONS AND REMARKS
