Content-Based Filtering and Hybrid Methods: Francesco Ricci
Francesco Ricci
eCommerce and Tourism Research Laboratory
Automated Reasoning Systems Division
ITC-irst
Trento Italy
ricci@itc.it
http://ectrl.itc.it
Content
Content-based recommenders
Demographic-based recommendations
Clustering Methods
Utility-based Methods
Hybrid methods
Other Recommendation Techniques
[Burke, 2002]
Content-Based Recommendation
Simple Example
Polar Express
Content-Based Recommender
Syskill & Webert [Pazzani & Billsus, 1997]
Supported Interaction
The user can then explore the Web with a browser that
in addition to showing a page
Syskill & Webert User Interface
System Prediction
Content Model
$E(S) = -\sum_{c \in \{hot, cold\}} p(S_c) \log_2\big(p(S_c)\big)$
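As a minimal sketch of how the entropy E(S) of this slide can be evaluated, the following Python computes it for a set of pages labelled hot or cold, together with the information gain of splitting the pages on the presence of a word. The function names, the toy pages, and the definition of the gain as E(S) minus the size-weighted entropy of the two subsets are illustrative assumptions, not code taken from Syskill & Webert.

```python
import math
from collections import Counter

def entropy(labels):
    """E(S) = -sum_c p(S_c) * log2(p(S_c)) over the classes present in labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(docs, labels, word):
    """Entropy reduction obtained by splitting the documents on presence of `word`."""
    with_word = [lab for doc, lab in zip(docs, labels) if word in doc]
    without_word = [lab for doc, lab in zip(docs, labels) if word not in doc]
    total = len(labels)
    remainder = ((len(with_word) / total) * entropy(with_word)
                 + (len(without_word) / total) * entropy(without_word))
    return entropy(labels) - remainder

# Hypothetical toy data: each page is a set of words, each label is hot or cold.
docs = [{"genome", "protein"}, {"genome", "dna"}, {"football"}, {"football", "dna"}]
labels = ["hot", "hot", "cold", "cold"]

gain_genome = information_gain(docs, labels, "genome")  # word separates the classes
gain_dna = information_gain(docs, labels, "dna")        # word is uninformative
```

In this toy set the word "genome" splits the pages into two pure subsets (zero entropy, maximal gain), while "dna" leaves both subsets as mixed as the original set, which mirrors the slide's point that higher information gain corresponds to smaller entropy after the split.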
Entropy and Information Gain example
Higher Information Gain
Smaller Entropy
Learning
A Better Model for the Document
TF-IDF means Term Frequency Inverse Document Frequency
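To make the TF-IDF weighting concrete, here is a small Python sketch. It assumes one common variant (term frequency as count divided by document length, inverse document frequency as the natural log of N over the number of documents containing the term); the slides do not say which variant is intended, and the function name and sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenised document."""
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            # Assumed variant: tf = count / length, idf = ln(N / df).
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenised documents.
docs = [["pizza", "pasta", "pizza"], ["pasta", "sushi"], ["pasta", "curry"]]
vectors = tf_idf(docs)
```

A term that appears in every document (here "pasta") gets weight zero, while a term that is frequent in one document and rare elsewhere (here "pizza") gets a high weight in that document.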
Using TF-IDF
Problems of Content-Based Recommenders
Core Recommendation Techniques
U is a set of users
I is a set of items/products
[Burke, 2002]
Demographic Methods
Simple Demographic Method
Example
[Figure: plot with axes Age and Education, marked from Low to High]
Demographic-based personalization
Demographic Methods (more sophisticated)
[Pazzani, 1999]
Clustering Methods
Utility methods
Systems using this approach can build a long-term utility function, but
more often they try to acquire a short-term utility function
Utility-related information
Utility
Utility and similarity
If the user has some preferred values for the attributes, e.g., $q_1, \ldots, q_m$,
one can substitute the value of each product attribute with a local
similarity function $sim(q_j, p_j)$:
$U(u_1, \ldots, u_m, q_1, \ldots, q_m, p_1, \ldots, p_m) = \sum_{j=1}^{m} u_j \, sim(q_j, p_j)$
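The weighted sum of local similarities can be sketched in Python as follows. The linear local similarity, the normalising ranges, and the hotel attributes are all hypothetical choices made for illustration; the slides only state that U is the sum over the m attributes of the weight u_j times sim(q_j, p_j).

```python
def local_similarity(q, p, value_range):
    """Hypothetical local similarity: 1 when p equals q, falling linearly to 0."""
    return max(0.0, 1.0 - abs(q - p) / value_range)

def utility(weights, preferred, product, ranges):
    """U = sum_j u_j * sim(q_j, p_j) over the m attributes."""
    return sum(
        u * local_similarity(q, p, r)
        for u, q, p, r in zip(weights, preferred, product, ranges)
    )

# Hypothetical hotel example: attributes are (price per night, km to the centre).
weights = [0.7, 0.3]       # u_j: importance of each attribute
preferred = [100.0, 1.0]   # q_j: values the user prefers
ranges = [200.0, 10.0]     # width used to normalise each attribute
hotel_a = [120.0, 3.0]     # p_j for one product
hotel_b = [100.0, 9.0]     # p_j for another product

score_a = utility(weights, preferred, hotel_a, ranges)
score_b = utility(weights, preferred, hotel_b, ranges)
```

With these made-up numbers the first hotel scores higher: it is slightly more expensive than preferred but much closer to the centre, and the weights decide how those two deviations trade off.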
Hybrid Methods
Fab System
Fab Architecture
User Feedback
Collection Agents
Yao's definition of ndpm
[Balabanovich, 1998]
Ranking accuracy
Comparison with other approaches
Personal = FAB personalised recommendations
Random recommendations
Cool = pages recommended by a human (cool sites of the day)
Public = pages best matching an average user profile (i.e., a non-personalized solution)
[Pazzani, 1999]
Content-Based Profiles
Winnow
Initially all the weights are set to 1. Then, for each document rated by
the user, a linear threshold function is computed (with $\tau$ the threshold):
$\sum_i w_i x_i > \tau$
If the above sum is over the threshold and the user did not like the document, then
the weights associated with each word in the document are divided by 2
If the sum is below the threshold and the user liked the document, then
all weights associated with words in the document are multiplied by 2
[Pazzani, 1999]
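The Winnow update rule described on this slide can be sketched in a few lines of Python. This is a simplified illustration, not the implementation of [Pazzani, 1999]: the threshold value, the treatment of a sum exactly equal to the threshold as "below", the single pass over the data, and the toy documents are all assumptions.

```python
def train_winnow(rated_docs, vocabulary, threshold, epochs=1):
    """rated_docs: list of (set_of_words, liked) pairs; returns {word: weight}."""
    weights = {word: 1.0 for word in vocabulary}  # all weights start at 1
    for _ in range(epochs):
        for words, liked in rated_docs:
            present = [w for w in words if w in weights]
            # sum_i w_i * x_i, with x_i = 1 for words in the document, else 0
            score = sum(weights[w] for w in present)
            if score > threshold and not liked:
                for w in present:
                    weights[w] /= 2.0  # over the threshold but disliked: halve
            elif score <= threshold and liked:
                for w in present:
                    weights[w] *= 2.0  # not over the threshold but liked: double
    return weights

def predict(weights, words, threshold):
    """True when the weighted sum of the document's words exceeds the threshold."""
    return sum(weights.get(w, 0.0) for w in words) > threshold

# Hypothetical ratings: (words describing a restaurant, did the user like it?)
rated = [
    ({"pizza", "pasta"}, True),
    ({"sushi", "noodles"}, False),
    ({"pizza", "sushi"}, True),
    ({"noodles"}, False),
]
vocabulary = {"pizza", "pasta", "sushi", "noodles"}
threshold = 1.5  # assumed value; the slide does not fix it
weights = train_winnow(rated, vocabulary, threshold)
```

After one pass the weight of "pizza" has doubled and the weight of "noodles" has halved, so a new document mentioning pizza and pasta is predicted as liked while one mentioning only sushi and noodles is not.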
Comparison
Content-based recommendation is done with a Bayes classifier
Collaborative is standard, using Pearson correlation
Collaboration via content uses the content-based user profiles built by Winnow
Averaged on 44 users
Precision is computed in the top 3 recommendations = (# of plus in the recommendation list)/3
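The precision measure used in this comparison (positively rated items among the top 3 recommendations, divided by 3) can be sketched as below. The function name and the restaurant names are hypothetical; only the formula comes from the slide.

```python
def precision_at_k(ranked_items, liked_items, k=3):
    """Fraction of the top-k recommended items that the user rated positively."""
    top_k = ranked_items[:k]
    return sum(1 for item in top_k if item in liked_items) / k

# Hypothetical ranked recommendation list and set of positively rated items.
recommended = ["Trattoria A", "Sushi B", "Bistro C", "Diner D"]
liked = {"Trattoria A", "Bistro C", "Diner D"}

precision = precision_at_k(recommended, liked)
```

Only the first three recommendations count, so the liked item ranked fourth does not contribute to the score.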
Details
The target user's training data are 30 restaurants, but the number of those
overlapping with the other users' training data was varied (from 3 to 20)
The content-based user model (used in the collaboration via
content) is built with these 30 restaurants using Winnow
The Bayes classifier (content-based prediction) is built using
the ratings of a user on these 30 restaurants (the ratings of
the other users are never exploited)
The collaborative recommender uses the ratings of the common
restaurants (hence from 3 to 20; it is the only method
affected by this parameter)
44 users
Training: 30 restaurants
Test: 28 restaurants
Hybrid methods
Hybridization Methods
[Burke, 2002]
Weighted
Weighted Example
Switching
Mixed
Feature Combination
[Figure: ratings matrix of users by products alongside the features of the products; labels: target user, target product, rating to predict, information used to make the prediction]
Cascade
Feature Augmentation
Meta-level
Summary
Questions