
Recommender systems:

A recommender system is a type of machine learning system designed to suggest items or content to users based on their preferences, interests, or past behavior. These systems are widely used in various applications such as e-commerce platforms, streaming services, social media platforms, and more.
Recommender systems utilize machine learning algorithms to analyze user data and make personalized recommendations. There are several types of recommender systems, including:
• Association Rule Learning: Association Rule Learning (ARL) is a rule-based machine learning technique used to find patterns in data.
• Content-based filtering: Recommends items similar to those the user has liked or interacted with in the past, based on the content of the items.
• Collaborative filtering: Analyzes user behavior, such as ratings or interactions with items, to find patterns and recommend items that similar users have liked.
• Hybrid approaches: Combine multiple techniques, such as content-based and collaborative filtering, to provide more accurate and diverse recommendations.
• Matrix Factorization: Decomposes the user-item rating matrix into lower-dimensional factor matrices in order to predict missing ratings.
Recommender systems play a crucial role in enhancing user experience, increasing
engagement, and driving sales or conversions in various online platforms by helping users
discover relevant and interesting content or products.
Figure 1: Types of recommendation systems

Association Rule Learning

Association rule mining finds interesting associations and relationships among large sets of data items. Such a rule shows how frequently an itemset occurs in a transaction. A typical example is Market Basket Analysis, one of the key techniques used by large retailers to show associations between items. It allows retailers to identify relationships between the items that people frequently buy together.

How does Association Rule Learning work?


Association rule learning works on the concept of if-then rules, such as "if A, then B". Here the "if" element is called the antecedent, and the "then" element is called the consequent. A relationship in which we find an association between two single items is known as single cardinality; as the number of items in a rule increases, the cardinality increases accordingly. To measure the associations between thousands of data items, several metrics are used. These metrics are given below:

o Support
o Confidence
o Lift

Let's understand each of them:


Support
Support is the frequency of an itemset, i.e., how frequently it appears in the dataset. It is defined as the fraction of the transactions T that contain the itemset X. For an itemset X and T transactions, it can be written as:

Support(X) = Freq(X) / T

Confidence
Confidence indicates how often the rule has been found to be true, i.e., how often items X and Y occur together in the dataset given that X occurs. It is the ratio of the number of transactions that contain both X and Y to the number of transactions that contain X:

Confidence(X → Y) = Freq(X, Y) / Freq(X)

Lift
Lift is the strength of a rule, defined by the formula below:

Lift(X → Y) = Support(X, Y) / (Support(X) * Support(Y))

It is the ratio of the observed support to the expected support if X and Y were independent of each other. It has three possible ranges of values:

o If Lift = 1: the occurrence of the antecedent and the consequent are independent of each other.
o If Lift > 1: the two itemsets are positively dependent on each other; the larger the value, the stronger the dependence.
o If Lift < 1: one item is a substitute for the other, which means one item has a negative effect on the occurrence of the other.
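To make these three metrics concrete, here is a small illustrative Python sketch; the baskets and item names below are made up for illustration and are not taken from the text.

transactions = [
    {'bread', 'milk'},
    {'bread', 'butter'},
    {'milk', 'butter'},
    {'bread', 'milk', 'butter'},
]
N = len(transactions)

def support(itemset):
    # Fraction of transactions that contain every item in the itemset
    return sum(itemset <= t for t in transactions) / N

X, Y = {'bread'}, {'milk'}
sup_xy = support(X | Y)
confidence = sup_xy / support(X)           # P(Y | X)
lift = sup_xy / (support(X) * support(Y))  # observed vs. expected support

print(f"Support={sup_xy:.2f}, Confidence={confidence:.2f}, Lift={lift:.2f}")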

Association rule learning can be divided into three types of algorithms:

1. Apriori
2. Eclat
3. F-P Growth Algorithm
As noted above, Association Rule Learning (ARL) is a rule-based machine learning technique used to find patterns in data. The Apriori algorithm is the most common algorithm used when association rule learning takes place; it is a basket-analysis method used to reveal product associations.

Figure 2: Association rule

Apriori Algorithm in Machine Learning

The Apriori algorithm uses frequent itemsets to generate association rules, and it is designed to work on databases that contain transactions. With the help of these association rules, it determines how strongly or how weakly two objects are connected. The algorithm was proposed by R. Agrawal and R. Srikant in 1994. It is mainly used for market basket analysis and helps to find products that can be bought together.

Steps for Apriori Algorithm

Below are the steps for the Apriori algorithm (a library-based sketch of these steps is given after the list):

Step-1: Determine the support of the itemsets in the transactional database, and select the minimum support and minimum confidence.

Step-2: Select all itemsets in the transactions whose support value is higher than the minimum (selected) support value.

Step-3: Find all the rules from these frequent itemsets that have a confidence value higher than the minimum confidence threshold.

Step-4: Sort the rules in decreasing order of lift.
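As a hedged sketch of these four steps in code, the mlxtend library provides apriori and association_rules functions; the toy grocery baskets and the thresholds below are assumptions made up for illustration.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Made-up grocery baskets for illustration
transactions = [
    ['Potato', 'Onion', 'Bread'],
    ['Potato', 'Onion'],
    ['Potato', 'Milk'],
    ['Onion', 'Bread', 'Milk'],
]

# Steps 1-2: one-hot encode the baskets and keep itemsets above the minimum support
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)
frequent = apriori(onehot, min_support=0.5, use_colnames=True)

# Step 3: keep rules above the minimum confidence
rules = association_rules(frequent, metric='confidence', min_threshold=0.5)

# Step 4: sort the rules in decreasing order of lift
print(rules.sort_values('lift', ascending=False)[['antecedents', 'consequents', 'support', 'confidence', 'lift']])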


The Apriori algorithm, developed by Agrawal et al. in 1994, is one of the great successes in the history of data mining and remains the best-known algorithm for extracting association rules. Its name comes from "a priori", meaning "prior", since the algorithm uses prior information about frequent itemsets, that is, it takes the information from the previous step.
An association rule is written {X -> Y}, where {X} and {Y} are sets of items. The rule means that if all items in {X} are in one basket, {Y} will "likely" be in that basket as well.

• {X} is called the antecedent or left-hand side (LHS) of the association rule,

• {Y} is called the consequent or right-hand side (RHS) of the association rule.

An example association rule for grocery items would be {Potato, Onion} -> {Bread}: if Potatoes and Onions {X} are purchased, customers will most likely also buy Bread {Y}. It should be noted that the symbol "->" does not indicate a causal relationship between {X} and {Y}; it merely reflects an estimate of the conditional probability of {Y} given {X}.

Three metrics are important in the Apriori algorithm.

Figure 3: Formula for support, confidence and lift for the association rule X ⟹ Y
1. Support (X, Y): the probability of products X and Y being seen together.

Support (X, Y) = Freq (X, Y) / N

Here N is the total number of transactions. In other words, support is calculated by dividing the number of transactions that contain both X and Y by the total number of transactions. In comparative models, the higher the support value, the better. The support value must be greater than the threshold value; itemsets below the threshold are eliminated, and then the next step is taken.

2. Confidence (X, Y): the probability of purchasing product Y given that product X is purchased.

Confidence (X, Y) = Freq (X, Y) / Freq (X)

Confidence is a probability calculated by dividing the number of transactions containing both X and Y by the number of transactions containing X. In comparative models, the higher the confidence value, the better. The confidence value must be greater than the confidence threshold; rules below the threshold are eliminated, and then the next step is taken.

3. Lift: the multiplier of the probability of purchasing product Y when product X is purchased.

Lift = Support (X, Y) / (Support (X) * Support (Y))

Lift is a measure of co-occurrence that uses both support and confidence. While this metric controls for how often X and Y are purchased individually, it tells us how much more likely item Y is to be purchased when item X is purchased.

• Lift = 1 means that there is no relationship between the items (products or services).
• Lift > 1 means that if product X is purchased, product Y is likely to be purchased.
• Lift < 1 means that if product X is purchased, product Y is unlikely to be purchased.

Mathematical Approach to Apriori Algorithm

Consider the transaction dataset of a store where each transaction contains the list of items purchased by the customers. Our goal is to find the frequent itemsets purchased by the customers and to generate the association rules for them.

We are assuming that minimum support count is 2 and minimum confidence is 50%.

Step 1: Create a table with the support count of every item present in the transaction database.

We compare each item's support count with the minimum support count we have set. If an item's support count is less than the minimum support count, we remove that item.

Here, the support count of I4 < minimum support count, so I4 is removed.


Step 2: Form all 2-item itemsets from the items remaining after the last step. Check whether every subset of each itemset is frequent and remove the infrequent ones (for example, the subsets of { I2, I4 } are { I2 } and { I4 }, but since I4 was not found to be frequent in the previous step, we do not consider it).

Since I4 was discarded in the previous step, we do not take any itemset containing I4.

Now, remove all itemsets whose support count is less than the minimum support count. The remaining itemsets form the final set for this step.

Step 3: Form 3-item itemsets from the itemsets present after the last step. Check whether every subset of each itemset is frequent and remove the infrequent ones.

In this case, if we select { I1, I2, I3 } we must have all of its 2-item subsets, that is, { I1, I2 }, { I2, I3 }, { I1, I3 }. But we do not have { I1, I3 } in our frequent set. The same is true for { I1, I3, I5 } and { I2, I3, I5 }.

So we stop here, as there are no more frequent itemsets.


Step 4: As we have discovered all the frequent itemsets, we will now generate strong association rules. For that, we have to calculate the confidence of each rule.

All the possible association rules are:


1. I1 -> I2
2. I2 -> I3
3. I2 -> I5

4. I2 -> I1
5. I3 -> I2
6. I5 -> I2

So, Confidence( I1 -> I2 ) = SupportCount( I1 ∪ I2 ) / SupportCount( I1 ) = (2 / 2) * 100% = 100%.

Similarly, we calculate the confidence for each rule.

Since all these association rules have confidence ≥ 50%, all of them can be considered strong association rules.

Step 5: We calculate the lift for all the strong association rules.

Lift ( I1 -> I2 ) = Confidence( I1 -> I2 ) / Support( I2 ) = 100 / 4 = 25%.

Similarly, we calculate the lift for each rule and then sort the rules in decreasing order of lift. Note that lift is a ratio rather than a probability: it compares how often customers buy I1 and I2 together with how often they would do so if the two purchases were independent.

Collaborative Filtering

Collaborative filtering recommends items based on similarity measures between users and/or items. The basic assumption behind the algorithm is that users with similar interests have common preferences.

Cosine Similarity

If the distance between two data objects is small, there is a high degree of similarity; when the distance is large, the degree of similarity is low. Some of the popular similarity measures are:
1. Euclidean Distance
2. Manhattan Distance
3. Jaccard Similarity
4. Minkowski Distance
5. Cosine Similarity
Cosine similarity is a metric that is helpful in determining how similar two data objects are irrespective of their size. We can measure the similarity between two sentences in Python using cosine similarity. In cosine similarity, the data objects in a dataset are treated as vectors. The formula to find the cosine similarity between two vectors is:
cos(x, y) = x . y / (||x|| ||y||)
where,
• x . y = dot product of the vectors 'x' and 'y',
• ||x|| and ||y|| = length (magnitude) of the vectors 'x' and 'y',
• ||x|| ||y|| = product of the magnitudes of the two vectors 'x' and 'y'.
Example: Consider finding the similarity between two vectors 'x' and 'y' using cosine similarity. The 'x' vector has values x = { 3, 2, 0, 5 } and the 'y' vector has values y = { 1, 0, 0, 0 }. Applying the formula cos(x, y) = x . y / (||x|| ||y||):

x . y = 3*1 + 2*0 + 0*0 + 5*0 = 3

||x|| = √((3)² + (2)² + (0)² + (5)²) = 6.16

||y|| = √((1)² + (0)² + (0)² + (0)²) = 1

∴ cos(x, y) = 3 / (6.16 * 1) = 0.49

The dissimilarity (cosine distance) between the two vectors 'x' and 'y' is given by:
∴ distance(x, y) = 1 - cos(x, y) = 1 - 0.49 = 0.51
• The cosine similarity between two vectors is reflected in the angle 'θ' between them.
• If θ = 0°, the 'x' and 'y' vectors overlap, thus proving they are similar.
• If θ = 90°, the 'x' and 'y' vectors are dissimilar.

Figure: Cosine similarity between two vectors

Advantages:
• Cosine similarity is beneficial because even if two similar data objects are far apart by Euclidean distance because of their size, they can still have a small angle between them; the smaller the angle, the higher the similarity.
• When plotted in a multi-dimensional space, cosine similarity captures the orientation (the angle) of the data objects and not the magnitude.
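As a quick check of the worked example above, the same numbers can be computed with NumPy (a small sketch, not part of the original text):

import numpy as np

x = np.array([3, 2, 0, 5])
y = np.array([1, 0, 0, 0])

# Cosine similarity = dot product divided by the product of the magnitudes
cos_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(round(cos_sim, 2))      # ~0.49, matching the hand calculation
print(round(1 - cos_sim, 2))  # cosine distance, ~0.51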

The Jaccard Similarity Index is a measure of the similarity between two sets of data.

Developed by Paul Jaccard, the index ranges from 0 to 1. The closer to 1, the more similar
the two sets of data.

The Jaccard similarity index is calculated as:

Jaccard Similarity = (number of observations in both sets) / (number in either set)

Or, written in notation form:


J(A, B) = |A∩B| / |A∪B|

If two datasets share the exact same members, their Jaccard Similarity Index will be 1.
Conversely, if they have no members in common then their similarity will be 0.

The following examples show how to calculate the Jaccard Similarity Index for a few
different datasets.
Example 1: Jaccard Similarity

Suppose we have the following two sets of data:


A = [0, 1, 2, 5, 6, 8, 9]
B = [0, 2, 3, 4, 5, 7, 9]

To calculate the Jaccard Similarity between them, we first find the total number of
observations in both sets, then divide by the total number of observations in either set:
• Number of observations in both: {0, 2, 5, 9} = 4
• Number of observations in either: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} = 10
• Jaccard Similarity: 4 / 10 = 0.4

The Jaccard Similarity Index turns out to be 0.4.


Example 2: Jaccard Similarity Continued

Suppose we have the following two sets of data:


C = [0, 1, 2, 3, 4, 5]
D = [6, 7, 8, 9, 10]

To calculate the Jaccard Similarity between them, we first find the total number of
observations in both sets, then divide by the total number of observations in either set:
• Number of observations in both: {} = 0
• Number of observations in either: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10} = 11
• Jaccard Similarity: 0 / 11 = 0

The Jaccard Similarity Index turns out to be 0. This indicates that the two datasets share no
common members.
Example 3: Jaccard Similarity for Characters

Note that we can also use the Jaccard Similarity index for datasets that contain characters as
opposed to numbers.

For example, suppose we have the following two sets of data:


E = ['cat', 'dog', 'hippo', 'monkey']
F = ['monkey', 'rhino', 'ostrich', 'salmon']

To calculate the Jaccard Similarity between them, we first find the total number of
observations in both sets, then divide by the total number of observations in either set:
• Number of observations in both: {'monkey'} = 1
• Number of observations in either: {'cat', 'dog', 'hippo', 'monkey', 'rhino', 'ostrich', 'salmon'} = 7
• Jaccard Similarity: 1 / 7 = 0.142857
The Jaccard Similarity Index turns out to be 0.142857. Since this number is fairly low, it
indicates that the two sets are quite dissimilar.
The Jaccard Distance

The Jaccard distance measures the dissimilarity between two datasets and is calculated as:

Jaccard distance = 1 – Jaccard Similarity

This measure gives us an idea of how different two datasets are.

For example, if two datasets have a Jaccard Similarity of 80% then they would have a Jaccard
distance of 1 – 0.8 = 0.2 or 20%.
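The three examples above can be reproduced with a few lines of plain Python (a small sketch using built-in sets; nothing here is library-specific):

def jaccard(a, b):
    # |A ∩ B| / |A ∪ B|
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

A = [0, 1, 2, 5, 6, 8, 9]
B = [0, 2, 3, 4, 5, 7, 9]
print(jaccard(A, B))              # 0.4
print(1 - jaccard(A, B))          # Jaccard distance: 0.6

E = ['cat', 'dog', 'hippo', 'monkey']
F = ['monkey', 'rhino', 'ostrich', 'salmon']
print(round(jaccard(E, F), 6))    # 0.142857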

Surprise Library:
The "Surprise" library is a Python library designed specifically for developing recommender systems easily and efficiently. It provides a collection of collaborative filtering algorithms, which are techniques used to generate recommendations based on user behavior patterns and preferences. The purpose of the library is to make the creation of recommender systems easier by facilitating the implementation and evaluation of the algorithms. It is a very useful tool for recommendation systems and can be used in marketing as well as in digital commerce.
For real-world implementations, we need a more extensive library that hides all the implementation details and provides abstract Application Programming Interfaces (APIs) to build recommender systems. Surprise is a Python library for accomplishing this.

The Surprise library is a Python library designed to help you build and evaluate recommender
systems, which are systems that suggest items to users (like movies, products, or books)
based on their preferences and behaviors. Surprise simplifies the process of developing these
systems using collaborative filtering techniques.

Key Features of the Surprise Library

1. Collaborative Filtering Algorithms: Includes algorithms like k-Nearest Neighbors (k-NN), Singular Value Decomposition (SVD), and others.
2. Data Handling: Makes it easy to load and handle datasets, including built-in datasets and custom data.
3. Evaluation Metrics: Provides tools for evaluating the performance of recommendation algorithms using metrics like Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE).

Simple Example

Here’s a step-by-step example to demonstrate how to use the Surprise library to build a
simple recommender system:

Step 1: Install Surprise


First, install the Surprise library using pip:
pip install scikit-surprise
Step 2: Import Necessary Modules
Import the necessary modules from the Surprise library.
from surprise import Dataset, Reader, SVD
from surprise.model_selection import cross_validate
Step 3: Load a Built-in Dataset
Surprise comes with built-in datasets like the Movielens dataset. You can load it easily:

data = Dataset.load_builtin('ml-100k')
Step 4: Choose an Algorithm
Choose an algorithm to train your recommender system. Here, we'll use the SVD (Singular
Value Decomposition) algorithm.

algo = SVD()

Step 5: Evaluate the Algorithm


Use cross-validation to evaluate the performance of the algorithm. This will split the data into
training and test sets multiple times and compute the average performance.

cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Complete Example Code

Here is the complete code put together:

from surprise import Dataset, Reader, SVD
from surprise.model_selection import cross_validate

# Load the MovieLens-100k dataset
data = Dataset.load_builtin('ml-100k')

# Initialize the SVD algorithm
algo = SVD()

# Perform 5-fold cross-validation
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Output

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

        Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std
RMSE    0.9312  0.9295  0.9267  0.9375  0.9304  0.9311  0.0038
MAE     0.7352  0.7341  0.7330  0.7396  0.7354  0.7355  0.0023

This output shows the RMSE and MAE for each of the five folds, as well as the mean and
standard deviation across all folds.
Conclusion

The Surprise library makes it easy to develop and evaluate recommender systems using
collaborative filtering techniques. By following these simple steps, you can quickly build a
recommender system and assess its performance using built-in tools and datasets.
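The step-by-step example imports Reader but only uses a built-in dataset. For custom data, the usual pattern is to load a pandas DataFrame through Reader and Dataset.load_from_df; the column names, rating scale, and tiny ratings table below are assumptions for illustration only.

import pandas as pd
from surprise import Dataset, Reader, SVD, accuracy
from surprise.model_selection import train_test_split

# A tiny made-up ratings table (userID, itemID, rating)
ratings = pd.DataFrame({
    'userID': ['u1', 'u1', 'u2', 'u2', 'u3', 'u3'],
    'itemID': ['i1', 'i2', 'i1', 'i3', 'i2', 'i3'],
    'rating': [5, 3, 4, 2, 1, 4],
})

reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(ratings[['userID', 'itemID', 'rating']], reader)

# Hold out part of the data, fit SVD, and evaluate on the held-out ratings
trainset, testset = train_test_split(data, test_size=0.25)
algo = SVD()
algo.fit(trainset)
predictions = algo.test(testset)
accuracy.rmse(predictions)

# Predict a single (user, item) rating
print(algo.predict('u1', 'i3').est)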

Matrix Factorization

Matrix factorization is a matrix decomposition technique. Matrix decomposition is an approach for reducing a matrix into its constituent parts. Matrix factorization algorithms decompose the user-item matrix into the product of two lower-dimensional rectangular matrices. For example, if the original matrix contains users as rows, movies as columns, and ratings as values, it can be decomposed into two lower-dimensional rectangular matrices.
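As a small illustrative sketch (the ratings matrix and the factor rank below are assumptions, not from the text), the idea can be written directly with NumPy: a users-by-movies rating matrix R is approximated by the product of a user-factor matrix U and an item-factor matrix V.

import numpy as np

# Made-up users-by-movies rating matrix (rows: users, columns: movies)
R = np.array([[5.0, 3.0, 4.0],
              [4.0, 2.0, 4.0],
              [1.0, 5.0, 2.0]])

# Decompose R into two lower-dimensional matrices using a truncated SVD:
# U has shape (users x k) and V has shape (k x movies), here with k = 2
U_full, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
U = U_full[:, :k] * s[:k]   # user-factor matrix
V = Vt[:k, :]               # item-factor matrix

# Multiplying the factors back together gives an approximation of the ratings
print("Approximated ratings:\n", np.round(U @ V, 2))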

Singular Value Decomposition (SVD)

SVD is a factorization method for a matrix that decomposes it into three other matrices. It's
commonly used for dimensionality reduction and noise reduction.

import numpy as np
from sklearn.decomposition import TruncatedSVD

# Sample matrix
X = np.array([[3, 4, 3], [1, 2, 3], [4, 6, 4]])

# Perform SVD
svd = TruncatedSVD(n_components=2) # Reduce to 2 dimensions
X_svd = svd.fit_transform(X)

print("Original Matrix:\n", X)
print("SVD Transformed Matrix:\n", X_svd)
output:
Original Matrix:
[[3 4 3]
[1 2 3]
[4 6 4]]
SVD Transformed Matrix:
[[ 5.82430443 -0.18033773]
[ 3.47734928 1.38118615]
[ 8.23238015 -0.45582502]]

Non-negative Matrix Factorization (NMF)


NMF is a matrix factorization technique where the input matrix and the factors are constrained to be
non-negative. This is useful in contexts where negative values don't make sense, such as in image
processing and text mining.

import numpy as np
from sklearn.decomposition import NMF

# Sample matrix
X = np.array([[3, 4, 3], [1, 2, 3], [4, 6, 4]])

# Perform NMF
nmf = NMF(n_components=2, init='random', random_state=0)
W = nmf.fit_transform(X)
H = nmf.components_

print("Original Matrix:\n", X)
print("NMF Transformed Matrix (W):\n", W)
print("NMF Components Matrix (H):\n", H)

output:
Original Matrix:
[[3 4 3]
[1 2 3]
[4 6 4]]
NMF Transformed Matrix (W):
[[0.36765554 1.16307885]
[1.40759631 0. ]
[0.35854835 1.73810126]]
NMF Components Matrix (H):
[[0.7207332 1.41216128 2.13355718]
[2.21418567 3.10877869 1.87475062]]

Probabilistic Matrix Factorization (PMF)


PMF is a probabilistic approach to matrix factorization often used in collaborative filtering for
recommendation systems. Unlike SVD and NMF, PMF handles uncertainty and noise explicitly.

import numpy as np
# Note: 'pmf' is not a standard package; this code assumes a PMF implementation
# exposing a PMF class with fit() and predict() methods is available locally.
from pmf import PMF

# Sample matrix (with some missing values set to 0 for simplicity)
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]])

# Indicate which values are observed
mask = R > 0

# Initialize PMF
pmf = PMF(num_factors=2, num_iters=100, learning_rate=0.001, reg=0.01)

# Fit the model
pmf.fit(R, mask)

# Predict the missing values
predictions = pmf.predict(R, mask)

print("Original Matrix:\n", R)
print("Predicted Matrix:\n", predictions)
