
Matrix Factorisation — Part 7


Rakesh4real

Jul 29, 2019 · 6 min read

Table of Contents:
1. Introduction and Recommendation Framework

2. Evaluating Recommendation Systems

3. Content Based Recommendations

4. Neighborhood Based Collaborative Filtering

5. User and Item Based Collaborative Filtering

6. KNN Recommendations

7. Matrix Factorisation

8. Deep Learning — Introduction

9. Restricted Boltzmann Machines

10. AutoRecs

11. Amazon DSSTNE and Sage Maker

12. Real-World Challenges and Solutions


Collaborative filtering is a good method. But if it is that good, why are we seeking an
alternative approach?

It is sensitive to NOISY DATA and SPARSE DATA

Good results are obtained only if both of the conditions below are satisfied.

— A large data set is available

— The data is nice and clean

MODEL BASED METHODS


Instead of finding items or users similar to each other, model-based methods apply Machine Learning and Data Science. The basic steps involved are:

Training our model with data

Predicting using the model

MATRIX FACTORISATION
It has many subcategories of techniques.

CREEPY! It manages to find broader features of users/items on its own (e.g. Action or Romantic). The math doesn't know what to call the newly found features; they are simply described by matrices.

MAIN IDEA:

Describe users and items as combinations of different amounts of underlying features.

For example, we may say Bob likes action movies as well as comedy movies. The same can be expressed as

Bob = 50% Action + 20% Comedy


Matching movies to users based on these attributes is a good basis for predictions.
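As a toy illustration (the numbers here are hypothetical, not from any real dataset), both a user's taste and a movie's makeup can be written as vectors over the same latent genres, and a predicted affinity is simply their dot product:

```python
import numpy as np

# Hypothetical latent-genre weights: [Action, Comedy, Romance]
bob = np.array([0.50, 0.20, 0.05])      # Bob = 50% Action + 20% Comedy + ...
movie = np.array([0.90, 0.10, 0.00])    # a mostly-action movie

# How well the movie's makeup matches Bob's taste
affinity = np.dot(bob, movie)
print(f"Predicted affinity: {affinity:.2f}")   # 0.47
```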

PRINCIPAL COMPONENT ANALYSIS (PCA) ON MOVIE RATINGS


PCA is a statistical procedure which reduces the dimensionality of our user-item matrix without losing any important information!

It is used as:

A dimensionality reduction tool (we can't say what the newly reduced dimensions mean)

A feature extraction tool

Note: Often, the dimensions it finds correspond to features humans have learnt to associate with items (e.g. how 'action-y' a movie is)

Whatever it is about movies that causes individuals to rate them differently, PCA extracts those 'latent features'.
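A minimal sketch of this, assuming a tiny, fully filled ratings matrix (plain PCA cannot handle missing entries) and using scikit-learn:

```python
import numpy as np
from sklearn.decomposition import PCA

# Rows = users, columns = movies; hypothetical ratings with no missing values
R = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 2.0, 1.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

# Compress each user's 4-movie rating vector down to 2 latent features
pca = PCA(n_components=2)
user_factors = pca.fit_transform(R)

print(user_factors.shape)              # (4, 2): each user described by 2 numbers
print(pca.explained_variance_ratio_)   # variance each latent feature preserves
```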


If R (the original matrix) has some missing values, we can reconstruct R by filling in the blank values!

SIGMA MATRIX
It is a simple diagonal matrix used only to scale the values we get to the proper magnitude.

We can multiply this sigma matrix into M or U-Transpose, and R will still be just the product of two matrices.

Reconstruct R to get ratings of all users for all items!

We can even predict ratings using dot products, via the reconstruction formula described above.
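A minimal NumPy sketch of this point, with hypothetical hand-picked factors: folding the diagonal sigma matrix into M leaves R as the product of just two matrices, and any single rating is then a dot product.

```python
import numpy as np

# Hypothetical factors: R is defined as the product M @ Sigma @ Ut
M = np.array([[0.8, 0.1],
              [0.7, 0.3],
              [0.1, 0.9]])
Sigma = np.diag([5.0, 3.0])
Ut = np.array([[0.9, 0.4, 0.1],
               [0.1, 0.5, 0.9]])

R = M @ Sigma @ Ut

# Folding Sigma into M: R is still just the product of two matrices
print(np.allclose(R, (M @ Sigma) @ Ut))    # True

# One user's rating for one item is a single dot product
print((M @ Sigma)[0] @ Ut[:, 2])           # equals R[0, 2]
```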

SINGULAR VALUE DECOMPOSITION (SVD)


A way of computing M, Sigma and U-Transpose all at once, efficiently getting all three factors in one shot!

Very accurate results

Used widely during the Netflix Prize competition
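A sketch of what "one shot" means in practice, using NumPy's SVD routine on a tiny hypothetical matrix with no missing values (real SVD requires a complete matrix, as the next section discusses):

```python
import numpy as np

R = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 2.0, 1.0],
    [1.0, 2.0, 5.0, 4.0],
])

# One call returns all three factors at once
M, sigma, Ut = np.linalg.svd(R, full_matrices=False)

# Keep only the top-2 latent features (a rank-2 approximation of R)
k = 2
R_approx = M[:, :k] @ np.diag(sigma[:k]) @ Ut[:k, :]

# A single reconstructed rating is the dot product of a user row and an item column
print(R_approx[0, 2])   # low-rank estimate of user 0's rating for item 2
```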



But wait! Just before applying PCA to the original matrix R, how do we deal with missing values?

You may fill missing values with averages of some sort… but there is a better way!

Assume we have some ratings for any given row/column in M and U-Transpose

We can treat this as an error-minimisation problem

Find the values of those complete rows and columns that best minimise the errors in the known ratings of R

Use Stochastic Gradient Descent (Just one way to do it)

Apache Spark uses a different technique called ALS (Alternating Least Squares)

Note: You might be confused here, because we are talking about learning the values of the matrices M and U.

Actually, we are predicting ratings rather than computing them directly, which is what SVD does. We are not doing real SVD recommendations, because SVD cannot operate on missing data. Hence, it is an 'SVD-inspired algorithm', not SVD itself. The winner of the Netflix Prize was a variation of SVD called SVD++.
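A bare-bones sketch of this 'SVD-inspired' learning loop (not the surpriselib implementation; the learning rate, regularisation strength and data are all hypothetical): only the known ratings ever enter the update, so missing values are no problem.

```python
import numpy as np

# Known ratings as (user, item, rating) triples; missing entries simply never appear
ratings = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (1, 2, 2.0), (2, 2, 5.0)]
n_users, n_items, k = 3, 3, 2

rng = np.random.default_rng(42)
P = rng.normal(scale=0.1, size=(n_users, k))   # user factor matrix
Q = rng.normal(scale=0.1, size=(n_items, k))   # item factor matrix

lr, reg = 0.02, 0.02                           # hypothetical hyperparameters
for epoch in range(200):
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]                  # error on one known rating
        # Simultaneous SGD step on both factor vectors
        P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                      Q[i] + lr * (err * P[u] - reg * Q[i]))

# Any (user, item) rating, observed or not, is now predicted by a dot product
print(P[0] @ Q[2])
```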

TIPS:
You can see the source code for the SVD algorithm used in surpriselib on GitHub, but it is too complex to understand easily.

Never write your own algorithm. The odds are too high that you will mess up somewhere. Use a third-party library that has been used and validated by others.


SVD vs. SVD++


[Figure: the actual loss function used during SGD]

In SVD++, merely rating an item at all is an indication of some sort of interest in the item, no matter what the rating was!

SVD++ outperformed SVD

Worked really well!!

It is spooky, because we don't understand how the latent features work.
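For reference, a sketch of the SVD++ prediction rule as published by Koren (all the arrays here are hypothetical inputs, not values from the article): on top of the usual biases and factor dot product, every item the user has rated at all contributes an implicit-feedback vector.

```python
import numpy as np

def svdpp_predict(mu, b_u, b_i, p_u, q_i, y_rated):
    """SVD++ prediction (Koren, 2008):
    global mean + user bias + item bias
    + q_i . (p_u + implicit feedback from ALL items the user rated).

    y_rated: array of shape (|N(u)|, k), one implicit factor vector per item
    the user has rated, regardless of the score given."""
    implicit = y_rated.sum(axis=0) / np.sqrt(len(y_rated))  # |N(u)|^-1/2 * sum of y_j
    return mu + b_u + b_i + q_i @ (p_u + implicit)
```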

GIVEN THE USAGE OF SVD IN THE NETFLIX COMPETITION AND ITS SUCCESS, A LOT OF RESEARCH FOCUSED ON SVD. THE ONES LISTED BELOW ARE JUST SOME OF THE ALGORITHMS THAT WERE DEVELOPED.

I. Factorisation Machines
Well suited for predicting ratings/clicks in Recommendation System

More 'general purpose'

Handles sparse data with ease, unlike SVD, which has to be shoehorned into the problem

Available in Amazon SageMaker

Only downside — WORKS ONLY WITH CATEGORICAL DATA!
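A sketch of the factorisation machine model equation (Rendle, 2010), with every array hypothetical: pairwise feature interactions are modelled through dot products of low-dimensional latent vectors, which is exactly what lets it handle sparse, one-hot encoded categorical data.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Factorisation machine: bias + linear terms + all pairwise interactions,
    where the weight of interaction (i, j) is the dot product <V[i], V[j]>."""
    linear = w @ x
    # O(k*n) identity for sum_{i<j} <V[i], V[j]> * x[i] * x[j]
    interactions = 0.5 * np.sum((V.T @ x) ** 2 - (V.T ** 2) @ (x ** 2))
    return w0 + linear + interactions
```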

II. TimeSVD++
Used for predicting the next item in a series of events.

III. Factorized Personalized Markov Chains

Same as above: used for predicting the next item in a series of events.

IV. PLSA (Promising!)

Extracts latent features from the content itself

Use PLSA on movie titles and descriptions and match them up with users

As it is content based, it doesn't do well on its own; combine it with user behaviour data.

HYPERPARAMETER TUNING (SVD)
Hyperparameters are parameters chosen by the ML engineer, rather than learned from the data, in order to make effective predictions. Finding the optimal hyperparameters is called hyperparameter tuning.

Big problem in ML in general

A lot of algorithms are very sensitive to parameters such as learning rates

Hyperparameters vary with datasets

In the case of SVD, the hyperparameters are —

How many latent factors to extract?

How many dimensions to reduce to?

Note: Both of them depend on the nature of the data we have.

Example Code
The code below helps us understand how we can tune hyperparameters in surpriselib using the GridSearchCV package.
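A representative snippet (the article's original screenshot is not reproduced here; this follows surpriselib's documented GridSearchCV API, with hypothetical parameter values):

```python
from surprise import SVD, Dataset
from surprise.model_selection import GridSearchCV

data = Dataset.load_builtin('ml-100k')   # MovieLens 100k ratings

# 'param_grid' defines what to test: every combination will be tried
param_grid = {'n_epochs': [20, 30], 'lr_all': [0.005, 0.010], 'n_factors': [50, 100]}

# 'measures' defines how to test; 'cv' sets the number of cross-validation folds
gs = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3)
gs.fit(data)

print(gs.best_score['rmse'])    # best average RMSE across the folds
print(gs.best_params['rmse'])   # the combination of hyperparameters that achieved it
```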


The GridSearchCV package:

Helps in hyperparameter tuning

Defines a grid of the different parameter values you want to try (every possible combination is tested)

'param_grid' defines what to test

'measures' defines how to test

'cv' sets the number of cross-validation folds

SLIM — Sparse Linear Methods (2011)


MAIN IDEA: A 'sparse aggregation' of the other things the user has rated, multiplied by a 'sparse aggregation' of weights (magic!)

Exciting results!

Outperformed all other recommenders (hit rates in top-N recommendations)

Applicable not only to movies but also to books, music, credit cards and much more!


Because the user has rated only some items, weights exist only for those items and only those terms can be calculated; hence it is a 'sparse' aggregation.

This extends to the entire user-item rating matrix with the formula below.
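For reference, the formula from the SLIM paper (Ning and Karypis, 2011), since it is not reproduced here as an image: the full prediction matrix is A-hat = A · W, where A is the user-item rating matrix and W is a learned sparse item-item weight matrix with a zero diagonal.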

HOW ARE THE WEIGHTS COMPUTED?


Complicated

So complicated that many people can't implement it in code

Similar to SVD, but requires more advanced mathematical concepts like

— Matrix Frobenius Norm

— L1 Norm Regularisation
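A toy sketch of how those pieces fit together, using scikit-learn's ElasticNet as a stand-in for the paper's per-column optimisation (the regularisation strengths are hypothetical, and the exact mapping to the paper's beta and lambda is glossed over):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def slim_weights(A, alpha=0.001, l1_ratio=0.5):
    """Sketch of SLIM: per the paper, each column w_j of W minimises
    1/2 * ||a_j - A w_j||_2^2 + beta/2 * ||w_j||_2^2 + lambda * ||w_j||_1
    subject to w_j >= 0 and w_jj = 0 (an item may not predict itself)."""
    n_items = A.shape[1]
    W = np.zeros((n_items, n_items))
    for j in range(n_items):
        X = A.copy()
        X[:, j] = 0.0                       # enforce the zero-diagonal constraint
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                           positive=True, fit_intercept=False)
        model.fit(X, A[:, j])
        W[:, j] = model.coef_
    return W                                # predictions: A_hat = A @ W
```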

If SLIM comes your way, give it a second look — Promising Stuff!

NEXT>>>

