
Collaborative Filtering

Tyler McMullen
... which for the purposes of this talk means:

Recommendations
Netflix

Google Reader

Pandora
Last.fm

...and of course... Amazon


(shameless plug)
I like to think of it as a fill-in-the-blank puzzle.

         Item A  Item B  Item C
Bob        5       1       5
Suzie      5       1       ?
Joe        1       5       1

[Diagram: Dataset, Dataset, Dataset -> Correlate, Correlate, Correlate -> Recommendations -> Content Booster -> Output]

Data

Data > Algorithms


Data

Amazon uses a simple item-to-item correlation system



How can they get away with that?

~ 20 million items

n million users
Data

If every user bought 200 items, their user-item matrix would be 0.001% full.
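
The 0.001% figure is items-per-user divided by total items. The user count cancels out, since every row of the matrix is filled to the same fraction, so the unknown "n million users" doesn't change the result. A quick check, using only the numbers on the slide:

```python
# Fraction of a user-item matrix that is filled when every user has
# interacted with the same number of items. The number of users cancels:
# each row has items_per_user filled cells out of num_items.
NUM_ITEMS = 20_000_000   # "~ 20 million items"
ITEMS_PER_USER = 200     # "if every user bought 200 items"


def matrix_density(items_per_user: int, num_items: int) -> float:
    """Return the fraction of cells in the user-item matrix that are non-empty."""
    return items_per_user / num_items


density = matrix_density(ITEMS_PER_USER, NUM_ITEMS)
print(f"{density:.3%}")  # 0.001%
```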
Data

purchases
ratings

views

shopping cart

votes

wishlists

baby registry

wedding registry

tell-a-friend

anything you can measure!


Data

Data > Algorithms

more kinds of data > more of the same data


Correlation

Find patterns in the data sets


Correlation
Pearson

Singular Value Decomposition

Kendall tau coefficient

Spearman's rho

point biserial correlation coefficient
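
Of the measures listed, Pearson is the simplest to show. Below is a minimal sketch, assuming ratings are stored as nested dicts and that correlation is computed only over the items both users have rated (a common convention; the slides don't say). Returning 0.0 when there is too little overlap or no variance is also an assumption of this sketch. On the fill-in-the-blank table, Suzie correlates perfectly with Bob and inversely with Joe, which suggests filling her blank with something close to Bob's 5.

```python
from math import sqrt

# Ratings from the fill-in-the-blank table; Suzie has not rated Item C.
ratings = {
    "Bob": {"Item A": 5, "Item B": 1, "Item C": 5},
    "Suzie": {"Item A": 5, "Item B": 1},
    "Joe": {"Item A": 1, "Item B": 5, "Item C": 1},
}


def pearson(a: dict, b: dict) -> float:
    """Pearson correlation between two users over their co-rated items.

    Returns 0.0 (treated as "no correlation") when fewer than two items
    are shared or when either user's ratings have zero variance.
    """
    common = [item for item in a if item in b]
    n = len(common)
    if n < 2:
        return 0.0
    xs = [a[item] for item in common]
    ys = [b[item] for item in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


print(pearson(ratings["Suzie"], ratings["Bob"]))  # 1.0
print(pearson(ratings["Suzie"], ratings["Joe"]))  # -1.0
```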


Correlation

Word of caution: watch for O(n^2) here. Correlating each of n users (or items) with every other one takes n(n-1)/2 comparisons.
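
One common way to keep this in check, not necessarily what the talk's code does, is to compare only users who share at least one item, found through an inverted index from items to users. The function name `candidate_pairs` and the users u1 to u4 below are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(ratings: dict) -> set:
    """Return every pair of users who have rated at least one item in common.

    Builds an inverted index (item -> users) so that users with no overlap
    are never compared. Each pair is stored in sorted order, so (a, b) and
    (b, a) collapse into one entry.
    """
    users_by_item = defaultdict(set)
    for user, items in ratings.items():
        for item in items:
            users_by_item[item].add(user)

    pairs = set()
    for users in users_by_item.values():
        for pair in combinations(sorted(users), 2):
            pairs.add(pair)
    return pairs


# Hypothetical users in two groups that share no items.
ratings = {
    "u1": {"x": 5, "y": 3},
    "u2": {"x": 4},
    "u3": {"z": 2},
    "u4": {"z": 5, "w": 1},
}

print(len(list(combinations(ratings, 2))))  # 6 pairs if everyone is compared
print(sorted(candidate_pairs(ratings)))     # [('u1', 'u2'), ('u3', 'u4')]
```

This only helps when the data is sparse: if a single item is shared by every user, the work is back to O(n^2).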


Recommendation

This is the part where we figure out what you'll like.


Recommendation

So we have all these correlation matrices.

One for each of the datasets that we correlated.

         Bob     Suzie   Joe
Bob      -       -0.74   0.856
Suzie    0.87    -       0.1
Joe      0.74    -0.9    -

Recommendation

So let's say we have a user named Fred, who correlates with the other users like this:

Joe 0.9
Bob 0.75
Suzie 0.5
Recommendation

Joe (0.9):    Item A 5, Item B 4
Bob (0.75):   Item B 5, Item C 2
Suzie (0.5):  Item C 2, Item A 2

Recommendation

Item A:  Joe 5, Suzie 2  ->  3.93
Item B:  Joe 4, Bob 5    ->  4.45
Item C:  Bob 2, Suzie 2  ->  2

Recommendation

Item A 3.93
Item B 4.45
Item C 2
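
The slides don't spell out the formula, but the three numbers match a similarity-weighted average of the neighbors' ratings: for Item A, (0.9 * 5 + 0.5 * 2) / (0.9 + 0.5) = 3.93. A minimal sketch that reproduces them; dividing by the sum of absolute similarities is an assumption for when some correlations are negative (every weight on the slide is positive, so it makes no difference here).

```python
from collections import defaultdict

# Fred's correlation with each other user, from the slide.
similarity = {"Joe": 0.9, "Bob": 0.75, "Suzie": 0.5}

# What each of those users rated, from the slide.
neighbor_ratings = {
    "Joe": {"Item A": 5, "Item B": 4},
    "Bob": {"Item B": 5, "Item C": 2},
    "Suzie": {"Item C": 2, "Item A": 2},
}


def predict(similarity: dict, neighbor_ratings: dict) -> dict:
    """Similarity-weighted average of neighbors' ratings, per item.

    The denominator sums |similarity| so it stays positive even if some
    correlations are negative. Items with zero total weight are skipped.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for user, sim in similarity.items():
        for item, rating in neighbor_ratings.get(user, {}).items():
            weighted_sum[item] += sim * rating
            weight_total[item] += abs(sim)
    return {
        item: weighted_sum[item] / weight_total[item]
        for item in weighted_sum
        if weight_total[item] > 0
    }


scores = predict(similarity, neighbor_ratings)
for item, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item} {score:.2f}")
# Item B 4.45
# Item A 3.93
# Item C 2.00
```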
Content Boosting

Your users reveal their preferences in their actions.



If I mark every horror movie in your system as a "1"... I don't like horror movies.

If I rate every Will Smith movie as "5 stars"... I probably like Will Smith.
Content Boosting

All items have properties.


Movies have genres, actors, studio, locations, etc...

Comics have genres, writers, artists, publishers, etc...

Kittens have color, gender, breed, cute captions, etc...


Content Boosting

Movie                  Rating  Genre    Will Smith?
I Am Legend            5       Action   yes
Cloverfield            4       Action   no
Independence Day       4       Action   yes
Sleepless in Seattle   1       Romance  no

Content Boosting

So what do my preferences say about me?

My mean rating is 3.5, so...
Action: +0.8
Romance: -2.5
Will Smith: +1

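
The slide gives only the results, but they match one simple rule: for each property, take the mean rating of the movies that have it and subtract the user's overall mean. Action is (5 + 4 + 4) / 3 - 3.5 = +0.83, shown rounded as +0.8. A minimal sketch of that rule, assuming "No Will Smith" simply means the movie lacks the "Will Smith" property:

```python
from collections import defaultdict

# One user's ratings and each movie's properties, from the slide.
# "No Will Smith" is modeled as the absence of the "Will Smith" tag.
movies = {
    "I Am Legend": (5, {"Action", "Will Smith"}),
    "Cloverfield": (4, {"Action"}),
    "Independence Day": (4, {"Action", "Will Smith"}),
    "Sleepless in Seattle": (1, {"Romance"}),
}


def property_boosts(movies: dict) -> tuple:
    """Return (overall mean rating, {property: mean rating with it - overall mean})."""
    ratings = [rating for rating, _ in movies.values()]
    overall = sum(ratings) / len(ratings)
    by_property = defaultdict(list)
    for rating, props in movies.values():
        for prop in props:
            by_property[prop].append(rating)
    boosts = {p: sum(rs) / len(rs) - overall for p, rs in by_property.items()}
    return overall, boosts


mean, boosts = property_boosts(movies)
print(mean)  # 3.5
for prop in sorted(boosts):
    print(f"{prop}: {boosts[prop]:+.1f}")
# Action: +0.8
# Romance: -2.5
# Will Smith: +1.0
```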
Content Boosting

Your recommendations are only as good as the amount and quality of your data.

Content Boosting is thus especially useful if you have limited data.
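
The slides don't say how the Content Booster's output is combined with the collaborative predictions, so the rule below is purely hypothetical, as is the function name `boosted_score`. It starts from the collaborative prediction, or from the user's mean rating when no similar user has rated the item (the limited-data case), adds the average boost of the item's known properties, and clamps to a 1 to 5 scale. The boosts are the example user's, with Action kept at its unrounded value of 5/6.

```python
from typing import Optional


def boosted_score(
    cf_prediction: Optional[float],
    item_properties: set,
    user_mean: float,
    boosts: dict,
    low: float = 1.0,
    high: float = 5.0,
) -> float:
    """Hypothetical combination rule for content boosting.

    Uses the collaborative-filtering prediction when there is one, else
    the user's mean rating; adds the average boost of the item's
    properties the user has a boost for; clamps to [low, high].
    """
    base = user_mean if cf_prediction is None else cf_prediction
    matched = [boosts[p] for p in item_properties if p in boosts]
    adjustment = sum(matched) / len(matched) if matched else 0.0
    return max(low, min(high, base + adjustment))


# The example user's boosts (mean rating 3.5).
boosts = {"Action": 5 / 6, "Romance": -2.5, "Will Smith": 1.0}

# A new Will Smith action movie that no similar user has rated yet:
print(round(boosted_score(None, {"Action", "Will Smith"}, 3.5, boosts), 2))  # 4.42
# A romance with no collaborative prediction either:
print(boosted_score(None, {"Romance"}, 3.5, boosts))  # 1.0
```

Averaging the matched boosts, rather than summing them, keeps an item with many properties from being pushed far off the scale; either choice is a design decision the slides leave open.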


Output

I have nothing interesting to say about output...



Moving on.
Now let's look at some code.
http://github.com/tyler/collaborative_filter
