P2P-BASED DISTRIBUTED COLLABORATIVE FILTERING
(Under the guidance of Prof. Partha Sarathi Chokraborty)

Presented by:
DEBANKAN CHAKRABORTY
ARINDAM KUNDU
RUPAM KUMAR HAZRA
SUKANTA DUTTA
INTRODUCTION
 Collaborative Filtering (CF) has proved to be one of the most successful techniques in recommender systems in recent years.
 However, most existing CF-based recommender systems work in a centralized way and scale poorly, because their computational cost in both time and space grows quickly as the number of records in the user database increases.
 In this article, we first propose a distributed CF algorithm called PipeCF, together with two novel approaches, significance refinement and unanimous amplification, which further improve scalability and prediction accuracy.
 We then show how to implement this algorithm on a Peer-to-Peer (P2P) structure using the distributed hash table (DHT) method, one of the most popular and efficient P2P routing algorithms, to construct a scalable distributed recommender system.
 The experimental data show that the distributed CF-based
recommender system has much better scalability than traditional
centralized ones with comparable prediction efficiency and accuracy.
RECOMMENDATION SYSTEM
 A recommendation system is a specific type of information filtering technique that attempts to present information items (such as movies, music, websites, and news) that are likely to be of interest to the user.
 It is of great importance to the success of e-commerce and the IT industry today, and is gaining popularity in a wide range of applications (e.g. the Netflix project, Google News, Amazon).

 Intuitively, a recommendation system builds a user's profile from his/her past records, compares it with some reference characteristics, and seeks to predict the 'rating' that the user would give to an item he/she has not yet evaluated.

 In most cases, building a recommendation system amounts to solving a large-scale data mining problem.

 Depending on the choice of reference characteristics, a system can follow a content-based approach, a collaborative filtering (CF) approach, or both.
COLLABORATIVE FILTERING
 Collaborative filtering is probably the most familiar, most widely implemented, and most mature recommendation technique.
 It is commonly used in many e-commerce recommender systems to help users select music CDs, movies, and more.
 CF is based on the assumption that people with similar tastes prefer the same items.
 To generate a recommendation, CF first forms a neighborhood of the users with the highest similarity to the user whose preferences are to be predicted.
 Advantage:
The main advantage of CF is that it is completely independent of any item representation, so items can be recommended regardless of their content.
 Disadvantage:
CF systems are known to suffer from two inherent drawbacks: sparsity (lack of sufficient information about the users) and cold start (no information about a new user, or about an item recently added to the system).
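The neighborhood-forming step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: ratings are assumed to be held in memory as a dictionary mapping each user to an {item: rating} dictionary, the neighborhood size `k` is a free parameter, and `agreement` is a hypothetical toy similarity used only to make the example runnable (the Pearson or cosine measures given later would normally be plugged in instead).

```python
from typing import Callable, Dict, List, Tuple

Profile = Dict[str, float]       # item -> rating for one user
Ratings = Dict[str, Profile]     # user -> profile

def neighborhood(target: str, ratings: Ratings,
                 similarity: Callable[[Profile, Profile], float],
                 k: int = 2) -> List[Tuple[str, float]]:
    """Return the k users most similar to `target`, most similar first."""
    scored = [(user, similarity(ratings[target], profile))
              for user, profile in ratings.items() if user != target]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def agreement(a: Profile, b: Profile) -> float:
    """Hypothetical toy similarity: share of co-rated items given identical ratings."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    return sum(a[i] == b[i] for i in common) / len(common)

ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 5, "m2": 3},
    "carol": {"m1": 1, "m2": 3, "m3": 4},
    "dave":  {"m4": 2},
}
print(neighborhood("alice", ratings, agreement, k=2))
```

Here bob agrees with alice on every shared item and carol on two of three, so they form alice's neighborhood; dave shares no items and is ranked last.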
DISTRIBUTED COLLABORATIVE FILTERING
 In this work we propose a novel approach that overcomes the inherent limitations of CF (data sparsity and cold start) by exploiting multiple distributed information repositories.
 These repositories may belong to a single domain or to different domains. To support our approach, we used Loud Voice, a multi-agent communication infrastructure that can connect similar information repositories into a single virtual structure called an "implicit organization".
 Repositories are partitioned among such organizations according to geographical and topical criteria.
 Geographical distribution: imitates a situation where information about a particular user is available only in his/her close vicinity.
 Topical distribution: imitates a situation where each repository stores information related to a limited number of topics (object types).
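As a rough illustration of the two distribution policies, the sketch below splits a set of rating records into repositories keyed either by topic or by region. The `Rating` record and its `topic` and `region` fields are hypothetical; the slides do not specify how records are represented or exchanged in Loud Voice.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple

class Rating(NamedTuple):
    user: str
    item: str
    topic: str    # object type, e.g. "movie" or "book" (hypothetical field)
    region: str   # where the user's information is held (hypothetical field)
    value: float

def partition(records: List[Rating], criterion: str) -> Dict[str, List[Rating]]:
    """Group records into repositories by 'topic' (topical) or 'region' (geographical)."""
    repositories: Dict[str, List[Rating]] = defaultdict(list)
    for record in records:
        repositories[getattr(record, criterion)].append(record)
    return dict(repositories)

records = [
    Rating("u1", "m1", "movie", "europe", 4.0),
    Rating("u1", "b7", "book", "europe", 5.0),
    Rating("u2", "m1", "movie", "asia", 3.0),
]
topical = partition(records, "topic")
geographical = partition(records, "region")
print({t: len(r) for t, r in topical.items()})
```

Under the topical policy, user u1's ratings end up in two different repositories, which is exactly why CF has to combine information across them.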
We employ CF to generate personalized recommendations under different data distribution policies. Taking social distinctions (such as age, occupation, and gender) into account in CF improves the quality of recommendations.
P2P SYSTEM
 The term ‘Peer-to-Peer’ refers to a class of systems and applications that
employ distributed resources to perform a critical function in a decentralized
manner.
 Some of the benefits of a P2P approach include improving scalability by avoiding dependency on centralized points, eliminating the need for costly infrastructure by enabling direct communication among clients, and enabling resource aggregation.
 The main purpose of P2P systems is to share resources among a group of computers, called peers, in a distributed way.
 Among P2P routing algorithms, the distributed hash table (DHT) algorithm is one of the most popular and effective, and it is supported by many P2P systems such as CAN (Ratnasamy, Francis, Handley, Karp, & Shenker, 2001), Chord (Stoica et al., 2001), Pastry (Rowstron & Druschel, 2001), and Tapestry (Zhao et al., 2001).
STRUCTURED P2P NETWORK
 A structured P2P network employs a globally consistent protocol to ensure that any node can route a search to some peer that has the desired file.
 By far the most common form of structured P2P network is the DHT (Distributed Hash Table), in which a variant of consistent hashing is used to assign ownership of each file to a particular peer.
Distributed Hash Tables (DHTs) are a class of decentralized distributed systems that
provide a lookup service similar to a hash table: (key, value) pairs are stored in the DHT,
and any participating node can efficiently retrieve the value associated with a given key.
Responsibility for maintaining the mapping from keys to values is distributed among the
nodes, in such a way that a change in the set of participants causes a minimal amount of
disruption.
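Consistent hashing, which the DHT uses to assign each key to a peer, can be sketched as follows. This is an illustrative single-process model under assumed parameters: node names and keys are hashed with SHA-1 onto a 32-bit identifier ring (`RING_BITS` is an arbitrary choice), and a key belongs to the first node whose identifier equals or follows the key's identifier. The `ConsistentHashRing` class is hypothetical and ignores replication and node failure.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List

RING_BITS = 32  # assumed identifier size; Chord itself uses 160-bit SHA-1 identifiers

def ring_id(name: str) -> int:
    """Map a node name or a resource key onto the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)

class ConsistentHashRing:
    def __init__(self, nodes: Iterable[str]) -> None:
        self._ids: List[int] = []          # sorted node identifiers
        self._node_at: Dict[int, str] = {}
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        node_id = ring_id(node)
        if node_id not in self._node_at:   # skip duplicates (and, very unlikely, id collisions)
            bisect.insort(self._ids, node_id)
            self._node_at[node_id] = node

    def owner(self, key: str) -> str:
        """Peer responsible for `key`: first node at or after the key's id, wrapping around."""
        idx = bisect.bisect_left(self._ids, ring_id(key))
        if idx == len(self._ids):
            idx = 0
        return self._node_at[self._ids[idx]]

ring = ConsistentHashRing(["peer-A", "peer-B", "peer-C"])
keys = [f"item-{n}" for n in range(200)]
before = {k: ring.owner(k) for k in keys}
ring.add("peer-D")
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to peer-D")
```

Adding a peer only reassigns the keys lying between the new peer and its predecessor on the ring; every other key keeps its owner, which is the "minimal disruption" property described above.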
DISTRIBUTED HASH TABLE
 A DHT overlay network is composed of several DHT nodes, and each node keeps a set of resources (e.g. files, ratings of items).
 Each resource is associated with a key (produced, for instance, by hashing
the file name) and each node in the system is responsible for storing a
certain range of keys.
 Peers in the DHT overlay network locate a wanted resource by issuing a lookup(key) request, which returns the identity (e.g. the IP address) of the node that stores the resource with that key.
 The primary goals of a DHT are to provide an efficient, scalable, and robust routing algorithm that reduces the number of P2P hops involved in locating a resource, and to reduce the amount of routing state that must be kept at each peer.
 In Chord (Stoica et al., 2001), each peer keeps routing information about log N other peers (N is the total number of peers in the community).
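Chord's compact routing state and lookup(key) operation can be illustrated with a toy, in-memory model. The sketch below assumes a 6-bit identifier space (`M = 6`) and a fixed, globally known set of node ids instead of Chord's join and stabilization protocol. Each node stores M finger entries (only O(log N) of them distinct), where finger i of node n is the successor of (n + 2^i) mod 2^M; a lookup keeps jumping to the closest finger that precedes the key until the key falls between the current node and its successor. `ChordRing` and its methods are hypothetical names.

```python
from typing import Dict, List, Tuple

M = 6              # identifier bits for this toy ring (Chord uses 160)
SPACE = 2 ** M     # identifiers are 0 .. SPACE-1

def between(x: int, a: int, b: int, include_b: bool = False) -> bool:
    """Is x in the clockwise ring interval (a, b), or (a, b] if include_b?"""
    if include_b and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b      # interval wraps past 0 (a == b means the whole ring minus a)

class ChordRing:
    def __init__(self, node_ids: List[int]) -> None:
        self.nodes = sorted(set(node_ids))
        # finger[i] of node n is successor((n + 2**i) mod 2**M); finger[0] is n's successor
        self.fingers: Dict[int, List[int]] = {
            n: [self.successor((n + 2 ** i) % SPACE) for i in range(M)] for n in self.nodes
        }

    def successor(self, ident: int) -> int:
        """First node whose id is >= ident, wrapping around (global view, used for setup)."""
        for n in self.nodes:
            if n >= ident:
                return n
        return self.nodes[0]

    def closest_preceding_finger(self, n: int, key: int) -> int:
        for f in reversed(self.fingers[n]):
            if between(f, n, key):
                return f
        return n

    def lookup(self, start: int, key: int) -> Tuple[int, List[int]]:
        """Route from `start` using only finger tables; return (owner of key, nodes visited)."""
        n, path = start, [start]
        while not between(key, n, self.fingers[n][0], include_b=True):
            n = self.closest_preceding_finger(n, key)
            path.append(n)
        return self.fingers[n][0], path

ring = ChordRing([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
owner, path = ring.lookup(8, 54)
print(owner, path)   # 56 [8, 42, 51]
```

With exact finger tables, every hop at least halves the remaining distance to the key's predecessor, so a lookup never visits more than M + 1 nodes, even though each node stores only a handful of distinct neighbors.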

Computation of Similarity Function: the similarity between the target user and each candidate user is computed in order to find the nearest neighbors, which are then used to produce high-quality recommendations.

Vector Cosine method computes user similarity as the scalar product of the rating vectors:

$$ s(a,b) = \frac{\sum_{i \in R(a,b)} r_{a,i}\, r_{b,i}}{\sqrt{\sum_{i \in R(a,b)} r_{a,i}^{2}}\;\sqrt{\sum_{i \in R(a,b)} r_{b,i}^{2}}} \qquad (1) $$

Pearson Correlation method computes user similarity using (2). Here, before the scalar product between the two vectors is computed, the ratings are normalized as the difference between the real ratings and the average rating of the user:

$$ s(a,b) = \frac{\sum_{i \in R(a,b)} (r_{a,i} - \bar{r}_a)(r_{b,i} - \bar{r}_b)}{\sqrt{\sum_{i \in R(a,b)} (r_{a,i} - \bar{r}_a)^{2}}\;\sqrt{\sum_{i \in R(a,b)} (r_{b,i} - \bar{r}_b)^{2}}} \qquad (2) $$

in which s(a,b) is the degree of similarity between user a and user b, R(a,b) is the set of items rated by both user a and user b, r_{x,y} is the rating that user x gives to item y, and \bar{r}_x is the average rating of user x.
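The Vector Cosine and Pearson Correlation formulas above translate directly into code. The sketch below is a minimal Python version, assuming each user's profile is an {item: rating} dictionary. It returns 0.0 when the users share no items or a denominator is zero, and, as an assumption, it takes each user's average rating over all of that user's ratings (some implementations average only over the co-rated items R(a,b)).

```python
import math
from typing import Dict, Set

Profile = Dict[str, float]  # item -> rating given by one user

def co_rated(a: Profile, b: Profile) -> Set[str]:
    """R(a, b): the items rated by both users."""
    return set(a) & set(b)

def cosine_similarity(a: Profile, b: Profile) -> float:
    """Vector Cosine: scalar product of the rating vectors over R(a, b), divided by their norms."""
    items = co_rated(a, b)
    num = sum(a[i] * b[i] for i in items)
    den = (math.sqrt(sum(a[i] ** 2 for i in items))
           * math.sqrt(sum(b[i] ** 2 for i in items)))
    return num / den if den else 0.0

def pearson_similarity(a: Profile, b: Profile) -> float:
    """Pearson: centre each rating on the user's average, then take the normalized product."""
    items = co_rated(a, b)
    if not items:
        return 0.0
    mean_a = sum(a.values()) / len(a)   # assumption: average over all of a's ratings
    mean_b = sum(b.values()) / len(b)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in items)
    den = (math.sqrt(sum((a[i] - mean_a) ** 2 for i in items))
           * math.sqrt(sum((b[i] - mean_b) ** 2 for i in items)))
    return num / den if den else 0.0
```

The centring step is what lets Pearson distinguish a user who rates everything high from one who genuinely prefers an item: two users whose ratings move in opposite directions get a similarity near -1, whereas the cosine of two all-positive rating vectors is never negative.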
Computation of Prediction Function:
 Once the nearest neighbors of the active user a are obtained, the predicted rating R_{a,j} of user a on product j is obtained by

$$ R_{a,j} = \bar{P}_a + \frac{\sum_{i \in \mathrm{rates}_j} (P_{i,j} - \bar{P}_i)\, r_{a,i}}{\sum_{i} |r_{a,i}|} $$

 where \bar{P}_a denotes the average rating of customer a, P_{i,j} is the actual rating of neighbor i on product j, \bar{P}_i denotes the average rating of neighbor i, \mathrm{rates}_j is the set of neighbors who have rated product j, and r_{a,i} denotes the correlation between the active user a and its i-th neighbor (i.e. the similarity s(a,i)).
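The prediction formula can be sketched the same way. This is an illustrative version under stated assumptions: profiles are {item: rating} dictionaries, each neighbor is passed as a (profile, correlation) pair, and the denominator is taken as the sum of the absolute correlations |r_{a,i}| over the same neighbors that rated the item. If no neighbor rated the item, the sketch falls back to the active user's average rating; that fallback is a design choice, not something the slides specify.

```python
from typing import Dict, List, Tuple

Profile = Dict[str, float]  # item -> rating given by one user

def mean(profile: Profile) -> float:
    return sum(profile.values()) / len(profile)

def predict(active: Profile, neighbors: List[Tuple[Profile, float]], item: str) -> float:
    """Predicted rating of `item` for the active user.

    `neighbors` holds (profile of neighbor i, correlation r_{a,i}) pairs.
    """
    num = 0.0
    den = 0.0
    for profile, corr in neighbors:
        if item not in profile:        # only neighbors who rated the item contribute
            continue
        num += (profile[item] - mean(profile)) * corr   # deviation weighted by correlation
        den += abs(corr)
    base = mean(active)
    return base + num / den if den else base

active = {"m1": 4, "m2": 2}
neighbors = [({"m1": 5, "m3": 3}, 0.8), ({"m1": 1, "m3": 5}, -0.5), ({"m1": 2}, 0.9)]
print(predict(active, neighbors, "m3"))
```

In this example the first neighbor rated m3 below his/her own average and is positively correlated, while the second rated it above average but is negatively correlated, so both pull the prediction below the active user's average of 3.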
CONCLUSION:

 In this article, we propose a distributed collaborative filtering algorithm for a distributed hash table (DHT) based P2P network, where each peer stores a fraction of the whole rating database in the form of a set of user profiles.

 In addition, peers maintain a data structure named Itemtable, which stores the ratings of different users for a particular item along with some other information, so that the similarity among users and the predicted ratings can be calculated in a parallel and distributed way.
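The Itemtable idea can be illustrated with a small sketch showing how per-item tables let the similarity be computed in pieces. The layout is hypothetical (the slides say only that an Itemtable stores different users' ratings for an item "along with some other information"): each table maps user to rating for one item, `partial_sums` processes a single table on its own and emits the cosine-similarity terms contributed by that item, and `cosine_from_tables` adds the partial terms up. In a real deployment each table would sit on the peer that owns the item's key in the DHT; here everything runs in one process.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, Tuple

ItemTable = Dict[str, float]   # hypothetical layout: user -> rating, for a single item

def build_item_tables(profiles: Dict[str, Dict[str, float]]) -> Dict[str, ItemTable]:
    """Invert user profiles into one table per item."""
    tables: Dict[str, ItemTable] = defaultdict(dict)
    for user, ratings in profiles.items():
        for item, rating in ratings.items():
            tables[item][user] = rating
    return dict(tables)

def partial_sums(table: ItemTable, active: str) -> Dict[str, Tuple[float, float, float]]:
    """Work one peer can do with a single Itemtable: this item's cosine terms per user."""
    if active not in table:
        return {}
    ra = table[active]
    return {u: (ra * rb, ra * ra, rb * rb) for u, rb in table.items() if u != active}

def cosine_from_tables(tables: Dict[str, ItemTable], active: str) -> Dict[str, float]:
    """Add up the per-item partial sums; equals the Vector Cosine over co-rated items."""
    acc: Dict[str, list] = defaultdict(lambda: [0.0, 0.0, 0.0])
    for table in tables.values():      # in a P2P network each table would live on its own peer
        for user, terms in partial_sums(table, active).items():
            for k, term in enumerate(terms):
                acc[user][k] += term
    return {u: dot / (sqrt(aa) * sqrt(bb)) for u, (dot, aa, bb) in acc.items() if aa and bb}

profiles = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 5, "m2": 3},
    "carol": {"m2": 1, "m3": 4, "m4": 2},
}
sims = cosine_from_tables(build_item_tables(profiles), "alice")
print(sims)
```

Because each item contributes independently to the numerator and both norms, the per-item work can run on different peers in parallel, and only three numbers per user pair need to be sent back for aggregation.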
FUTURE RESEARCH:
 In the future, we plan to develop a generic model for users’
cooperation and information trading.

 Our future work will include improving the K-Nearest Neighbor (KNN) method, which can in turn improve scalability, and dealing with trust-related issues.