Recommendation Systems

Chapter 12 (Section
12.4):
Recommender
Second edition of the book, coming soon
Systems
Road Map
Introduction
Content-based recommendation
Collaborative filtering based recommendation
K-nearest neighbor
Association rules
Matrix factorization
CS583, Bing Liu, UIC
Introduction
Recommender systems are widely used on

the Web for recommending products and
services to users.
Most e-commerce sites have such systems.
These systems serve two important functions.
They help users deal with the information overload

by giving them recommendations of products, etc.
They help businesses make more profits, i.e.,
selling more products.
E.g., movie recommendation
The most common scenario is the following:
A set of users has initially rated some subset of

movies (e.g., on the scale of 1 to 5) that they have
already seen.
These ratings serve as the input. The
recommendation system uses these known
ratings to predict the ratings that each user would
give to those not rated movies by him/her.
Recommendations of movies are then made to
each user based on the predicted ratings.
Different variations
In some applications, there is no rating

information while in some others there are also
additional attributes
about each user (e.g., age, gender, income, marital

status, etc), and/or
about each movie (e.g., title, genre, director, leading
actors or actresses, etc).
When no rating information, the system will not

predict ratings but predict the likelihood that a
user will enjoy watching a movie.
The Recommendation
Problem
We have a set of users U and a set of items S

to be recommended to the users.
Let p be an utility function that measures the
usefulness of item s ( S) to user u ( U), i.e.,
p:US R, where R is a totally ordered set (e.g.,

non-negative integers or real numbers in a range)
Objective
Learn p based on the past data

Use p to predict the utility value of each item s ( S)
to each user u ( U)
As Prediction
Rating prediction, i.e., predict the rating score

that a user is likely to give to an item that s/he
has not seen or used before. E.g.,
rating on an unseen movie. In this case, the utility

of item s to user u is the rating given to s by u.
Item prediction, i.e., predict a ranked list of

items that a user is likely to buy or use.
Two basic approaches
Content-based recommendations:
Collaborative filtering (or collaborative

recommendations):
The user will be recommended items similar to the

ones the user preferred in the past;
The user will be recommended items that people

with similar tastes and preferences liked in the past.
Hybrids: Combine collaborative and contentbased methods.
Road Map
Introduction
K-nearest neighbor
Association rules
Content-Based
Recommendation
Perform item recommendations by predicting

the utility of items for a particular user based
on how similar the items are to those that
he/she liked in the past. E.g.,
In a movie recommendation application, a movie

may be represented by such features as specific
actors, director, genre, subject matter, etc.
The users interest or preference is also
represented by the same set of features, called
the user profile.
10
Content-based
recommendation (contd)
Recommendations are made by comparing

the user profile with candidate items
expressed in the same set of features.
The top-k best matched or most similar items
are recommended to the user.
The simplest approach to content-based
recommendation is to compute the similarity
of the user profile with each item.
11
Road Map
Introduction
Collaborative filtering based recommendations
K-nearest neighbor
Association rules
12
Collaborative filtering
Collaborative filtering (CF) is perhaps the

most studied and also the most widely-used
recommendation approach in practice.
k-nearest neighbor,
association rules based prediction, and
matrix factorization
Key characteristic of CF: it predicts the utility

of items for a user based on the items
previously rated by other like-minded users.
13
k-nearest neighbor
kNN (which is also called the memory-based

approach) utilizes the entire user-item
database to generate predictions directly, i.e.,
there is no model building.
This approach includes both
User-based methods
Item-based methods
14
User-based kNN CF
A user-based kNN collaborative filtering

method consists of two primary phases:
the neighborhood formation phase and

the recommendation phase.
There are many specific methods for both.

Here we only introduce one for each phase.
15
Neighborhood formation
phase
Let the record (or profile) of the target user be

u (represented as a vector), and the record of
another user be v (v T).
The similarity between the target user, u, and
a neighbor, v, can be calculated using the
Pearsons correlation coefficient:
sim(u, v)
iC
iC
(ru ,i ru )(rv ,i rv )
(ru ,i ru )
iC
(rv ,i rv )
16
Recommendation Phase
Use the following formula to compute the

rating prediction of item i for target user u
p (u, i ) ru
vV
sim(u, v ) (rv ,i r v )
vV
sim(u, v )
where V is the set of k similar users, rv,i is the

rating of user v given to item i,
17
Issue with the user-based

kNN CF
The problem with the user-based formulation

of collaborative filtering is the lack of
scalability:
it requires the real-time comparison of the target

user to all user records in order to generate
predictions.
A variation of this approach that remedies this

problem is called item-based CF.
18
Item-based CF
The item-based approach works by

comparing items based on their pattern of
ratings across users. The similarity of items i
and j is computed as follows:
sim(i, j )
uU
(ru ,i ru )(ru , j ru )
2
(
r
r
)
uU u,i u
2
(
r
r
)
uU u, j u
19
Recommendation phase
After computing the similarity between items

we select a set of k most similar items to the
target item and generate a predicted value of
user us rating
p (u, i )
jJ u , j
sim(i, j )
sim
(
i
,
j
)
jJ
where J is the set of k similar items

20
Road Map
Introduction
K-nearest neighbor
Association rules
21
Association rule-based CF
Association rules obviously can be used for

recommendation.
Each transaction for association rule mining is
the set of items bought by a particular user.
We can find item association rules, e.g.,
buy_X, buy_Y -> buy_Z
Rank items based on measures such as
confidence, etc.
See Chapter 3 for details
22
Road Map
Introduction
K-nearest neighbor
Association rules
23
The idea of matrix factorization is to

decompose a matrix M into the product of
several factor matrices, i.e.,
M F1F2 ...Fn
where n can be any number, but it is usually
2 or 3.
24
CF using matrix factorization
Matrix factorization has gained popularity for

CF in recent years due to its superior
performance both in terms of recommendation
quality and scalability.
Part of its success is due to the Netflix Prize
contest for movie recommendation, which
popularized a Singular Value Decomposition
(SVD) based matrix factorization algorithm.
The prize winning method of the Netflix Prize

Contest employed an adapted version of SVD
25
The abstract idea
Matrix factorization a latent factor model.

Latent variables (also called features,
aspects, or factors) are introduced to account
for the underlying reasons of a user
purchasing or using a product.
When the connections between the latent variables

and observed variables (user, product, rating, etc.)
are estimated during the training
recommendations can be made to users by
computing their possible interactions with each
product through the latent variables.
26
Netflix Prize Contest
27
Netflix Prize Task
Training data: Quadruples of the form

(user, movie, rating, time)
For our purpose here, we only use triplets, i.e.,
(user, movie, rating)
For example, (132456, 13546, 4) means that the

user with ID 132456 gave the movie with ID 13546
a rating of 4 (out of 5).
Testing: predict the rating of each triplet:

(user, movie, ?)
28
SVD factorization
The technique discussed here is based on

the SVD method given by
Simon Funk at his blog site,

the derivation of Funks method described by
Wagman in the Netflix forums.
the paper by Takacs et al.
The method was later improved by Koren et

al., Paterek and several other researchers.
29
Intuitive Idea
30
Simon Funks SVD method
where U = [u1, u2, , uI] and M = [m1, m2, , mJ]

31
SVD method (contd)
Let us use K = 90 latent aspects (K needs to

be set experimentally).
Then, each movie will be described by only
ninety aspect values indicating how much
that movie exemplifies each aspect.
Correspondingly, each user is also described
by ninety aspect values indicating how much
he/she prefers each aspect.
32
SVD method (contd)
To combine these together into a rating, we

multiply each user preference by the
corresponding movie aspect, and then sum
them up to give a rating to indicate how much
that user likes that movie:
U = [u1, u2, , uI] and M = [m1, m2, , mJ]
Using SVD, we can perform the task

K
rij u i m j uki mkj

T
k 1
33
SVD method (contd)
SVD is a mathematical way to find these two

smaller matrices which minimizes the
resulting approximation error, the mean
square error (MSE).
We can use the resulting matrices U and M to
predict the ratings in the test set.
34
SVD method (contd)
35
SVD method (contd)
To minimize the error, the gradient descent

approach is used.
For gradient descent, we take the partial
derivative of the square error with respect to
each parameter, i.e. with respect to each uki
and mkj.
(eij ) 2
u ki
2eij
eij
u ki
36
SVD method (contd)
37
SVD method (contd)
38
The final update rules
By the same reasoning, we can also compute

the update rule for mkj.
Finally, we have both rules
The final prediction uses Eq. (11)
39
Further improvements
The two basic rules need some

improvements to make them work well.
There are also some pre-processing.
Time was also added later.
Etc
Note:
Funk used stochastic gradient descent

Not the batch (global) gradient descent.
40

Recommendation Systems

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Recommendation Systems

Uploaded by

Copyright:

Available Formats

Chapter 12 (Section

CS583, Bing Liu, UIC

Recommender systems are widely used on

They help users deal with the information overload

CS583, Bing Liu, UIC

E.g., movie recommendation

The most common scenario is the following:

A set of users has initially rated some subset of

CS583, Bing Liu, UIC

In some applications, there is no rating

about each user (e.g., age, gender, income, marital

When no rating information, the system will not

CS583, Bing Liu, UIC

We have a set of users U and a set of items S

p:US R, where R is a totally ordered set (e.g.,

Learn p based on the past data

CS583, Bing Liu, UIC

Rating prediction, i.e., predict the rating score

rating on an unseen movie. In this case, the utility

Item prediction, i.e., predict a ranked list of

CS583, Bing Liu, UIC

Two basic approaches

Collaborative filtering (or collaborative

The user will be recommended items similar to the

The user will be recommended items that people

Hybrids: Combine collaborative and contentbased methods.

CS583, Bing Liu, UIC

CS583, Bing Liu, UIC

Perform item recommendations by predicting

In a movie recommendation application, a movie

CS583, Bing Liu, UIC

Recommendations are made by comparing

CS583, Bing Liu, UIC

CS583, Bing Liu, UIC

Collaborative filtering (CF) is perhaps the

Key characteristic of CF: it predicts the utility

CS583, Bing Liu, UIC

kNN (which is also called the memory-based

CS583, Bing Liu, UIC

A user-based kNN collaborative filtering

the neighborhood formation phase and

There are many specific methods for both.

CS583, Bing Liu, UIC

Let the record (or profile) of the target user be

CS583, Bing Liu, UIC

Use the following formula to compute the

where V is the set of k similar users, rv,i is the

CS583, Bing Liu, UIC

Issue with the user-based

The problem with the user-based formulation

it requires the real-time comparison of the target

A variation of this approach that remedies this

CS583, Bing Liu, UIC

The item-based approach works by

CS583, Bing Liu, UIC

After computing the similarity between items

where J is the set of k similar items

CS583, Bing Liu, UIC

Association rules obviously can be used for

See Chapter 3 for details

CS583, Bing Liu, UIC

CS583, Bing Liu, UIC

The idea of matrix factorization is to

CS583, Bing Liu, UIC