Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 40

Chapter 12 (Section

12.4):
Recommender
Second edition of the book, coming soon
Systems

Road Map

Introduction
Content-based recommendation
Collaborative filtering based recommendation

K-nearest neighbor
Association rules
Matrix factorization

CS583, Bing Liu, UIC

Introduction

Recommender systems are widely used on


the Web for recommending products and
services to users.
Most e-commerce sites have such systems.
These systems serve two important functions.

They help users deal with the information overload


by giving them recommendations of products, etc.
They help businesses make more profits, i.e.,
selling more products.

CS583, Bing Liu, UIC

E.g., movie recommendation

The most common scenario is the following:

A set of users has initially rated some subset of


movies (e.g., on the scale of 1 to 5) that they have
already seen.
These ratings serve as the input. The
recommendation system uses these known
ratings to predict the ratings that each user would
give to those not rated movies by him/her.
Recommendations of movies are then made to
each user based on the predicted ratings.

CS583, Bing Liu, UIC

Different variations

In some applications, there is no rating


information while in some others there are also
additional attributes

about each user (e.g., age, gender, income, marital


status, etc), and/or
about each movie (e.g., title, genre, director, leading
actors or actresses, etc).

When no rating information, the system will not


predict ratings but predict the likelihood that a
user will enjoy watching a movie.

CS583, Bing Liu, UIC

The Recommendation
Problem

We have a set of users U and a set of items S


to be recommended to the users.
Let p be an utility function that measures the
usefulness of item s ( S) to user u ( U), i.e.,

p:US R, where R is a totally ordered set (e.g.,


non-negative integers or real numbers in a range)

Objective

Learn p based on the past data


Use p to predict the utility value of each item s ( S)
to each user u ( U)

CS583, Bing Liu, UIC

As Prediction

Rating prediction, i.e., predict the rating score


that a user is likely to give to an item that s/he
has not seen or used before. E.g.,

rating on an unseen movie. In this case, the utility


of item s to user u is the rating given to s by u.

Item prediction, i.e., predict a ranked list of


items that a user is likely to buy or use.

CS583, Bing Liu, UIC

Two basic approaches

Content-based recommendations:

Collaborative filtering (or collaborative


recommendations):

The user will be recommended items similar to the


ones the user preferred in the past;

The user will be recommended items that people


with similar tastes and preferences liked in the past.

Hybrids: Combine collaborative and contentbased methods.

CS583, Bing Liu, UIC

Road Map

Introduction
Content-based recommendation
Collaborative filtering based recommendation

K-nearest neighbor
Association rules
Matrix factorization

CS583, Bing Liu, UIC

Content-Based
Recommendation

Perform item recommendations by predicting


the utility of items for a particular user based
on how similar the items are to those that
he/she liked in the past. E.g.,

In a movie recommendation application, a movie


may be represented by such features as specific
actors, director, genre, subject matter, etc.
The users interest or preference is also
represented by the same set of features, called
the user profile.

CS583, Bing Liu, UIC

10

Content-based
recommendation (contd)

Recommendations are made by comparing


the user profile with candidate items
expressed in the same set of features.
The top-k best matched or most similar items
are recommended to the user.
The simplest approach to content-based
recommendation is to compute the similarity
of the user profile with each item.

CS583, Bing Liu, UIC

11

Road Map

Introduction
Content-based recommendation
Collaborative filtering based recommendations

K-nearest neighbor
Association rules
Matrix factorization

CS583, Bing Liu, UIC

12

Collaborative filtering

Collaborative filtering (CF) is perhaps the


most studied and also the most widely-used
recommendation approach in practice.

k-nearest neighbor,
association rules based prediction, and
matrix factorization

Key characteristic of CF: it predicts the utility


of items for a user based on the items
previously rated by other like-minded users.

CS583, Bing Liu, UIC

13

k-nearest neighbor

kNN (which is also called the memory-based


approach) utilizes the entire user-item
database to generate predictions directly, i.e.,
there is no model building.
This approach includes both

User-based methods
Item-based methods

CS583, Bing Liu, UIC

14

User-based kNN CF

A user-based kNN collaborative filtering


method consists of two primary phases:

the neighborhood formation phase and


the recommendation phase.

There are many specific methods for both.


Here we only introduce one for each phase.

CS583, Bing Liu, UIC

15

Neighborhood formation
phase

Let the record (or profile) of the target user be


u (represented as a vector), and the record of
another user be v (v T).
The similarity between the target user, u, and
a neighbor, v, can be calculated using the
Pearsons correlation coefficient:
sim(u, v)

CS583, Bing Liu, UIC

iC

iC

(ru ,i ru )(rv ,i rv )

(ru ,i ru )

iC

(rv ,i rv )

16

Recommendation Phase

Use the following formula to compute the


rating prediction of item i for target user u
p (u, i ) ru

vV

sim(u, v ) (rv ,i r v )

vV

sim(u, v )

where V is the set of k similar users, rv,i is the


rating of user v given to item i,

CS583, Bing Liu, UIC

17

Issue with the user-based


kNN CF

The problem with the user-based formulation


of collaborative filtering is the lack of
scalability:

it requires the real-time comparison of the target


user to all user records in order to generate
predictions.

A variation of this approach that remedies this


problem is called item-based CF.

CS583, Bing Liu, UIC

18

Item-based CF

The item-based approach works by


comparing items based on their pattern of
ratings across users. The similarity of items i
and j is computed as follows:
sim(i, j )

CS583, Bing Liu, UIC

uU

(ru ,i ru )(ru , j ru )

2
(
r

r
)
uU u,i u

2
(
r

r
)
uU u, j u

19

Recommendation phase

After computing the similarity between items


we select a set of k most similar items to the
target item and generate a predicted value of
user us rating

p (u, i )

jJ u , j

sim(i, j )

sim
(
i
,
j
)
jJ

where J is the set of k similar items


CS583, Bing Liu, UIC

20

Road Map

Introduction
Content-based recommendation
Collaborative filtering based recommendation

K-nearest neighbor
Association rules
Matrix factorization

CS583, Bing Liu, UIC

21

Association rule-based CF

Association rules obviously can be used for


recommendation.
Each transaction for association rule mining is
the set of items bought by a particular user.
We can find item association rules, e.g.,
buy_X, buy_Y -> buy_Z
Rank items based on measures such as
confidence, etc.

See Chapter 3 for details

CS583, Bing Liu, UIC

22

Road Map

Introduction
Content-based recommendation
Collaborative filtering based recommendation

K-nearest neighbor
Association rules
Matrix factorization

CS583, Bing Liu, UIC

23

Matrix factorization

The idea of matrix factorization is to


decompose a matrix M into the product of
several factor matrices, i.e.,

M F1F2 ...Fn
where n can be any number, but it is usually
2 or 3.

CS583, Bing Liu, UIC

24

CF using matrix factorization

Matrix factorization has gained popularity for


CF in recent years due to its superior
performance both in terms of recommendation
quality and scalability.
Part of its success is due to the Netflix Prize
contest for movie recommendation, which
popularized a Singular Value Decomposition
(SVD) based matrix factorization algorithm.

The prize winning method of the Netflix Prize


Contest employed an adapted version of SVD

CS583, Bing Liu, UIC

25

The abstract idea

Matrix factorization a latent factor model.


Latent variables (also called features,
aspects, or factors) are introduced to account
for the underlying reasons of a user
purchasing or using a product.

When the connections between the latent variables


and observed variables (user, product, rating, etc.)
are estimated during the training
recommendations can be made to users by
computing their possible interactions with each
product through the latent variables.

CS583, Bing Liu, UIC

26

Netflix Prize Contest

CS583, Bing Liu, UIC

27

Netflix Prize Task

Training data: Quadruples of the form


(user, movie, rating, time)

For our purpose here, we only use triplets, i.e.,

(user, movie, rating)

For example, (132456, 13546, 4) means that the


user with ID 132456 gave the movie with ID 13546
a rating of 4 (out of 5).

Testing: predict the rating of each triplet:


(user, movie, ?)

CS583, Bing Liu, UIC

28

SVD factorization

The technique discussed here is based on


the SVD method given by

Simon Funk at his blog site,


the derivation of Funks method described by
Wagman in the Netflix forums.
the paper by Takacs et al.

The method was later improved by Koren et


al., Paterek and several other researchers.

CS583, Bing Liu, UIC

29

Intuitive Idea

CS583, Bing Liu, UIC

30

Simon Funks SVD method

where U = [u1, u2, , uI] and M = [m1, m2, , mJ]


CS583, Bing Liu, UIC

31

SVD method (contd)

Let us use K = 90 latent aspects (K needs to


be set experimentally).
Then, each movie will be described by only
ninety aspect values indicating how much
that movie exemplifies each aspect.
Correspondingly, each user is also described
by ninety aspect values indicating how much
he/she prefers each aspect.

CS583, Bing Liu, UIC

32

SVD method (contd)

To combine these together into a rating, we


multiply each user preference by the
corresponding movie aspect, and then sum
them up to give a rating to indicate how much
that user likes that movie:

U = [u1, u2, , uI] and M = [m1, m2, , mJ]

Using SVD, we can perform the task


K

rij u i m j uki mkj


T

k 1

CS583, Bing Liu, UIC

33

SVD method (contd)

SVD is a mathematical way to find these two


smaller matrices which minimizes the
resulting approximation error, the mean
square error (MSE).
We can use the resulting matrices U and M to
predict the ratings in the test set.

CS583, Bing Liu, UIC

34

SVD method (contd)

CS583, Bing Liu, UIC

35

SVD method (contd)

To minimize the error, the gradient descent


approach is used.
For gradient descent, we take the partial
derivative of the square error with respect to
each parameter, i.e. with respect to each uki
and mkj.
(eij ) 2
u ki

CS583, Bing Liu, UIC

2eij

eij
u ki
36

SVD method (contd)

CS583, Bing Liu, UIC

37

SVD method (contd)

CS583, Bing Liu, UIC

38

The final update rules

By the same reasoning, we can also compute


the update rule for mkj.

Finally, we have both rules

The final prediction uses Eq. (11)

CS583, Bing Liu, UIC

39

Further improvements

The two basic rules need some


improvements to make them work well.
There are also some pre-processing.
Time was also added later.
Etc
Note:

Funk used stochastic gradient descent


Not the batch (global) gradient descent.

CS583, Bing Liu, UIC

40

You might also like