CSE4077 Recommender Systems

Module 2
MODEL-BASED COLLABORATIVE FILTERING
NEIGHBORHOOD-BASED vs MODEL-BASED
• The neighborhood-based methods can be viewed as generalizations of
k-nearest neighbor classifiers, which are commonly used in machine learning.

• These methods are instance-based methods, whereby a model is not
specifically created up front for prediction.

• Neighborhood-based methods are generalizations of instance-based learning
methods, or lazy learning methods, in which the prediction approach is
specific to the instance being predicted.

• For example, in user-based neighborhood methods, the peers of the target
user are determined in order to perform the prediction.
MODEL-BASED
• In model-based methods, a summarized model of the data is created up
front, as with supervised or unsupervised machine learning methods.

• Therefore, the training (or model-building) phase is clearly separated
from the prediction phase.

• Examples of such methods in traditional machine learning include decision
trees, rule-based methods, Bayes classifiers, regression models, support
vector machines, and neural networks.
CONT.
• In the data classification problem, we are given an m × n matrix in which
the first (n − 1) columns are feature variables (or independent variables),
and the last (i.e., nth) column is the class variable (or dependent
variable).

• All entries in the first (n − 1) columns are fully specified, whereas only
a subset of the entries in the nth column is specified.

• Therefore, a subset of the rows in the matrix is fully specified, and
these rows are referred to as the training data.

• The remaining rows are referred to as the test data.
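As a small illustration of this setup (a minimal sketch; the numbers and the
use of NumPy are my own, not from the slides), the rows with a specified
class value form the training data and the rest form the test data:

import numpy as np

# Hypothetical m x n data matrix: the first (n - 1) columns are features,
# the nth column is the class variable; np.nan marks unspecified classes.
data = np.array([
    [1.0, 0.0, 5.0, 1.0],      # fully specified  -> training row
    [0.5, 1.0, 3.0, 0.0],      # fully specified  -> training row
    [0.9, 0.0, 4.0, np.nan],   # class missing    -> test row
])

has_label = ~np.isnan(data[:, -1])
train, test = data[has_label], data[~has_label]
print(train.shape, test.shape)  # (2, 4) (1, 4)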


• Model-based recommender systems often have a number of advantages over neighborhood-based methods:

1. Space-efficiency: Typically, the size of the learned model is much smaller than the original ratings matrix. Thus,
the space requirements are often quite low.

2. Training speed and prediction speed: One problem with neighborhood-based methods is that the pre-processing
stage is quadratic in either the number of users or the number of items. Model-based systems are usually much
faster in the pre-processing phase of constructing the trained model.

3. Avoiding overfitting: Overfitting is a serious problem in many machine learning algorithms, in which the
prediction is overly influenced by random artifacts in the data. This problem is also encountered in classification
and regression models. The summarization approach of model-based methods can often help in avoiding
overfitting.
Decision and Regression Trees
• Decision and regression trees are frequently used in data
classification.
• Decision trees are designed for those cases in which the dependent
variable is categorical, whereas regression trees are designed for those
cases in which the dependent variable is numerical.
How to use Decision Tree?

• Here, each node tests an attribute (feature), and that test becomes the
basis for further splitting in the downward direction.
• Can you answer:
– How to decide which feature should be located at the root node?
– Which features are most suitable to serve as internal nodes or leaf nodes?
– How to split the tree?
– How to measure the quality of a split? ...and many more.
DATASET

GENDER   AGE     APP DOWNLOADED
F        YOUNG
F        ADULT
M        ADULT
F        ADULT
M        YOUNG
M        YOUNG

(The APP DOWNLOADED values appear as app icons on the original slide.)
Gini Index
Step 1: Gini Impurity Index
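The slides work these steps out graphically. For reference, the Gini
impurity of a node with class proportions p1, ..., pk is 1 − Σ pi², and a
candidate split is scored by the weighted impurity of its child nodes. A
minimal sketch (the helper names are my own, and the example labels are
hypothetical, since the slide shows app icons rather than text labels):

from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(groups):
    """Weighted Gini impurity of a candidate split (a list of label lists)."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini(g) for g in groups)

# Example with hypothetical app labels: splitting by GENDER produces two
# groups of three customers each.
print(split_gini([["A", "A", "B"], ["B", "B", "A"]]))  # ~0.444

The split with the lowest weighted Gini impurity is chosen, which is how
Step 3 below decides which split is the best.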
SPLITTING BY GENDER

STEP 2: SPLIT ROOT
SPLITTING BY AGE
STEP 3: Which split is the best?
STEP 4: FINAL DECISION TREE
WHAT APP WILL YOU RECOMMEND?

• For an Adult customer?
• For a Young customer?
• For a Male customer?
• For a Female customer?
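Answering these questions requires the APP DOWNLOADED labels, which appear
only as icons on the slides. Purely as a sketch with hypothetical labels
("Game" and "News" are invented here), the final tree could be reproduced
with scikit-learn:

from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import OrdinalEncoder

X = [["F", "YOUNG"], ["F", "ADULT"], ["M", "ADULT"],
     ["F", "ADULT"], ["M", "YOUNG"], ["M", "YOUNG"]]
y = ["Game", "News", "News", "News", "Game", "Game"]  # hypothetical labels

enc = OrdinalEncoder()
clf = DecisionTreeClassifier(criterion="gini").fit(enc.fit_transform(X), y)
print(clf.predict(enc.transform([["M", "ADULT"]])))  # e.g., ['News']

With these invented labels, AGE alone separates the two classes, so the
fitted tree splits on AGE at the root; the slide's actual tree depends on
the real app labels.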
Rule Based Collaborative Filtering Recommendation

RULE BASED COLLABORATIVE FILTERING
Association rule mining

        Item1  Item2  Item3  Item4  Item5
Alice     1      0      0      0      ?
User1     1      0      1      0      1
User2     1      0      1      0      1
User3     0      0      0      1      1
User4     0      1      1      0      0

• Mine rules such as Item1 → Item5:
  support = 2/4, confidence = 2/2 (computed without Alice)
Recommendation based on Association Rule Mining

• Simplest approach (using the same binary matrix as above):
– transform 5-point ratings into binary ratings (1 = above user average)
• Mine rules such as:
– Item1 → Item5
• support (2/4), confidence (2/2) (without Alice)
• Make recommendations for Alice (basic method)
HOW TO GENERATE RULES?

• Compute the following rules from the same binary matrix:
• Item1 → Item3, Item5
– Support = 2/4, confidence = 2/2

Add Item 3 and Item 5 to the recommendation list.

• Item1 → Item3
– Compute support and confidence

Add Item 3 to the recommendation list.

Finally, the items recommended to Alice are Item 3 and Item 5.
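A minimal sketch of how these support and confidence values fall out of the
binary matrix (the variable and function names are my own):

import numpy as np

# Binary ratings matrix for User1..User4 (Alice is excluded when mining);
# columns are Item1..Item5.
R = np.array([
    [1, 0, 1, 0, 1],  # User1
    [1, 0, 1, 0, 1],  # User2
    [0, 0, 0, 1, 1],  # User3
    [0, 1, 1, 0, 0],  # User4
])

def rule_stats(antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent
    (item indices are 0-based)."""
    has_a = R[:, antecedent].all(axis=1)
    has_both = has_a & R[:, consequent].all(axis=1)
    support = has_both.sum() / len(R)
    confidence = has_both.sum() / has_a.sum()
    return float(support), float(confidence)

# Item1 -> {Item3, Item5}: support = 2/4, confidence = 2/2
print(rule_stats([0], [2, 4]))  # (0.5, 1.0)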


Probabilistic based Recommendation methods

Probabilistic methods
Calculation of probabilities in a simplistic approach

        Item1  Item2  Item3  Item4  Item5
Alice     1      3      3      2      ?
User1     2      4      2      2      4
User2     1      3      3      5      1
User3     4      5      2      3      3
User4     1      1      5      2      1

X = (Item1 = 1, Item2 = 3, Item3 = …)

• More to consider:
– Zeros (smoothing required)
– like/dislike simplification possible
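On this data, the simplistic approach treats Alice's known ratings as the
feature vector X and each possible value of Item5 as a class, scoring
P(Item5 = v) · Π P(Item_i = x_i | Item5 = v) under the naive independence
assumption. A hedged sketch with Laplace smoothing (the function and
variable names are my own, not from the slides):

import numpy as np

# Rating matrix for User1..User4 (rows) over Item1..Item5 (columns).
R = np.array([
    [2, 4, 2, 2, 4],
    [1, 3, 3, 5, 1],
    [4, 5, 2, 3, 3],
    [1, 1, 5, 2, 1],
])
alice = [1, 3, 3, 2]          # Alice's ratings for Item1..Item4
values = range(1, 6)          # possible rating values, 1..5

def score(v, alpha=1.0):
    """Unnormalized P(Item5 = v | X) under the naive independence
    assumption, with Laplace smoothing alpha (handles the zeros)."""
    rows = R[R[:, 4] == v]    # users who rated Item5 with value v
    prior = (len(rows) + alpha) / (len(R) + alpha * len(values))
    lik = 1.0
    for i, x in enumerate(alice):
        match = (rows[:, i] == x).sum()
        lik *= (match + alpha) / (len(rows) + alpha * len(values))
    return prior * lik

print(max(values, key=score))  # rating value with the highest score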
Naive Bayes Collaborative Filtering
Judging Classification Performance

• A natural criterion for judging the performance of a classifier is the
probability of making a misclassification error.
• Misclassification means that the observation belongs to one class but the
model classifies it as a member of a different class.
• A classifier that makes no errors would be perfect, but we do not expect
to be able to construct such classifiers in the real world, due to "noise"
and to not having all the information needed to classify cases precisely.
Naïve Rule

• The Naive Bayes classification algorithm is a probabilistic classifier,
based on probability models that incorporate strong independence
assumptions.
• These independence assumptions often do not hold in reality, which is why
the method is considered naive.
Naïve Bayes Classifier

• A naive Bayes classifier is an algorithm that uses Bayes' theorem to
classify objects.
• Naive Bayes classifiers assume strong, or naive, independence between the
attributes of data points.
• Popular uses of naive Bayes classifiers include spam filters, text
analysis, and medical diagnosis.
Formula for Naïve Bayes Theorem

• Bayes' theorem provides a principled way of calculating a conditional
probability.
• The simple form of the calculation for Bayes' theorem is as follows:

P(A|B) = P(B|A) * P(A) / P(B)
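For instance (hypothetical numbers, purely for illustration): if
P(spam) = 0.2, P("offer" appears | spam) = 0.5, and P("offer" appears) =
0.15, then P(spam | "offer" appears) = 0.5 × 0.2 / 0.15 ≈ 0.67.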
Naïve Bayes Example

• A naive Bayes classifier assumes that the presence of a particular
feature in a class is unrelated to the presence of any other feature.
• For example, a fruit may be considered to be an apple if it is red,
round, and about 3 inches in diameter.
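Under the naive independence assumption, each of these features contributes
to the classification independently of the others, i.e.

P(apple | red, round, 3 in) ∝ P(red | apple) · P(round | apple) ·
P(3 in | apple) · P(apple)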
Uses of Naïve Algorithm

• Naïve Bayes algorithms are often used in:
– sentiment analysis,
– spam filtering,
– recommendation systems.
Advantages of Naïve Bayes

• It doesn't require as much training data.
• It handles both continuous and discrete data.
• It is highly scalable with the number of predictors and data points.
• It is fast and can be used to make real-time predictions.
Latent factor models (R = UV^T)
• Latent factor models are considered to be state-of-the-art in recommender
systems.

• These models leverage well-known dimensionality reduction methods to fill
in the missing entries.

• Dimensionality reduction methods are used commonly in other areas of data
analytics to represent the underlying data in a small number of dimensions.

• The basic idea of dimensionality reduction methods is to rotate the axis
system, so that pairwise correlations between dimensions are removed.

• The key idea in dimensionality reduction methods is that the reduced,
rotated, and completely specified representation can be robustly estimated
from an incomplete data matrix.
• Latent factor models, such as singular value decomposition (SVD), comprise an
alternative approach by transforming both items and users to the same latent factor
space, thus making them directly comparable.
• The latent space tries to explain ratings by characterizing both products and users
on factors automatically inferred from user feedback.
• For example, when the products are movies, factors might measure obvious
dimensions such as comedy vs. drama, amount of action, or orientation to children;
less well defined dimensions such as depth of character development or quirkiness;
or completely uninterpretable dimensions.
• For users, each factor measures how much the user likes movies that score high on
the corresponding movie factor.
• The use of such correlations is, after all, fundamental to all collaborative filtering methods, whether they are
neighborhood methods or model-based methods.
• For example, user-based neighborhood methods leverage user-wise correlations, whereas item-based neighborhood
methods leverage item-wise correlations.
• Matrix factorization methods provide a neat way to leverage all row and column correlations in one shot to estimate
the entire data matrix.
• This sophistication of the approach is one of the reasons that latent factor models have become the state-of-the-art in
collaborative filtering.

• Consider the simple case in which all entries in the ratings matrix R are
observed. The key idea is that any m × n matrix R of rank k ≤ min{m, n} can
always be expressed in the following product form of rank-k factors:

R = UV^T

where U is an m × k matrix and V is an n × k matrix. When R is incompletely
specified, or fewer factors than the rank are used, the factorization holds
only approximately:

R ≈ UV^T
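As an illustration, a minimal stochastic-gradient sketch that learns such
factors from only the observed entries of the earlier 5 × 5 ratings matrix
(the learning rate, regularization, and k = 2 are illustrative choices, not
from the slides):

import numpy as np

rng = np.random.default_rng(0)

# Incomplete ratings matrix; np.nan marks the missing entry (Alice, Item5).
R = np.array([
    [1., 3., 3., 2., np.nan],
    [2., 4., 2., 2., 4.],
    [1., 3., 3., 5., 1.],
    [4., 5., 2., 3., 3.],
    [1., 1., 5., 2., 1.],
])
m, n, k = R.shape[0], R.shape[1], 2
U = 0.1 * rng.standard_normal((m, k))   # user factors
V = 0.1 * rng.standard_normal((n, k))   # item factors
observed = [(i, j) for i in range(m) for j in range(n) if not np.isnan(R[i, j])]

lr, reg = 0.01, 0.02                    # learning rate, regularization
for epoch in range(500):
    for i, j in observed:
        err = R[i, j] - U[i] @ V[j]     # error on one observed entry
        U[i] += lr * (err * V[j] - reg * U[i])
        V[j] += lr * (err * U[i] - reg * V[j])

pred = U @ V.T                          # completed matrix, R ≈ U V^T
print(round(pred[0, 4], 2))             # predicted rating for the missing entry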