
Movie Recommendation System Using Item-Based Collaborative Filtering

Machine Learning Assignment 3 Report

submitted to

Mr. Nirmal Kumar Nigam


Assistant Professor - Senior Scale

by

Ujjwal Arora, Reg. No. 140911330


Nihar Maheshwari, Reg. No. 140911274
Nikhil Gupta, Reg. No. 140911466
Kinshuk Puri, Reg. No. 140911110

Nov 2017
1 Introduction
Neighborhood-based collaborative filtering algorithms, also referred to as memory-based
algorithms, were among the earliest algorithms developed for collaborative filtering.
These algorithms are based on the fact that similar users display similar patterns of
rating behavior and similar items receive similar ratings. There are two primary types of
neighborhood-based algorithms:

1. User-based collaborative filtering: In this case, the ratings provided by similar users
to a target user A are used to make recommendations for A. The predicted ratings
of A are computed as the weighted average values of these "peer group" ratings for
each item.

2. Item-based collaborative filtering: In order to make recommendations for target
item B, the first step is to determine a set S of items, which are most similar to
item B. Then, in order to predict the rating of any particular user A for item B, the
ratings in set S, which are specified by A, are determined. The weighted average of
these ratings is used to compute the predicted rating of user A for item B.

For our project, we use item-based collaborative filtering to predict movie ratings
taken from the MovieLens database.

2 Methodology

2.1 Data Set


MovieLens dataset. MovieLens is a web-based research recommender system that debuted
in Fall 1997. Each week hundreds of users visit MovieLens to rate and receive
recommendations for movies. The site now has over 43000 users who have expressed
opinions on 3500+ different movies. We randomly selected enough users to obtain 100,000
ratings from the database (we only considered users that had rated 20 or more movies).
We divided the database into a training set and a test set. For this purpose, we
introduced a variable that determines what percentage of the data goes into the training
set and what percentage into the test set; 75% of the dataset was used as training data
and the remaining 25% as test data. The data set was converted into a user-item matrix A
that had 943 rows (i.e., 943 users) and 1682 columns (i.e., 1682 movies that were rated
by at least one of the users).
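
The split and the matrix construction described above can be sketched as follows. This is
a minimal illustration, not the code used for the report: the function names, the `seed`
parameter, the use of `np.nan` for unrated entries, and the toy ratings are all
assumptions made for the example.

```python
import numpy as np

def train_test_split_ratings(ratings, train_fraction=0.75, seed=0):
    """Shuffle (user, item, rating) triples and split them by train_fraction."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(ratings))
    cut = int(round(train_fraction * len(ratings)))
    train = [ratings[k] for k in order[:cut]]
    test = [ratings[k] for k in order[cut:]]
    return train, test

def build_user_item_matrix(ratings, n_users, n_items):
    """Return an n_users x n_items matrix; np.nan marks 'not rated'."""
    A = np.full((n_users, n_items), np.nan)
    for user, item, rating in ratings:
        A[user - 1, item - 1] = rating  # IDs are assumed to be 1-based
    return A

# Toy data: 3 users, 4 movies, 8 ratings (values are made up for illustration).
toy = [(1, 1, 5), (1, 2, 3), (2, 1, 4), (2, 3, 2),
       (3, 2, 1), (3, 4, 5), (1, 4, 4), (2, 4, 3)]
train, test = train_test_split_ratings(toy)
A = build_user_item_matrix(train, n_users=3, n_items=4)
```

With the full data set, the same call with `n_users=943` and `n_items=1682` would give a
matrix of the shape stated above.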

2.2 Working Principle and Algorithms Used


The first step is to collect the preferences of the users. Our Collaborative Filtering (CF)
implementation stores the data in two 2D matrices, in which each row corresponds to a user
and each column to an item that he or she may have rated. After building the 2D dataset
matrix, we need to compute the similarity between each pair of items and the prediction
score for a target item for a given user.

2.2.1 Item Similarity Computation

Computing the similarity between items is the fundamental step of our recommendation
system, since we want to recommend similar items to users based on what they have rated
before. The basic idea of similarity computation between two items i and j is to first
isolate the users who have rated both of these items and then to apply a similarity
computation technique to determine the similarity s_{i,j}. We used the adjusted cosine
similarity technique for this.
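
A minimal Python sketch of this step, assuming the user-item matrix is a NumPy array with
`np.nan` marking unrated entries (the report does not say how missing ratings are stored).
The function name `adjusted_cosine` and the choice to return 0 when two items share no
raters are illustrative assumptions, not the authors' code. It first isolates the co-rated
users, then subtracts each user's mean rating before taking the cosine.

```python
import numpy as np

def adjusted_cosine(A, i, j):
    """Adjusted cosine similarity between item columns i and j of A.

    A is a user-item matrix with np.nan for missing ratings.
    """
    user_means = np.nanmean(A, axis=1)                   # each user's average rating
    co_rated = ~np.isnan(A[:, i]) & ~np.isnan(A[:, j])   # users who rated both items
    if not co_rated.any():
        return 0.0                                       # assumption: no overlap -> 0
    di = A[co_rated, i] - user_means[co_rated]
    dj = A[co_rated, j] - user_means[co_rated]
    denom = np.sqrt(np.sum(di ** 2)) * np.sqrt(np.sum(dj ** 2))
    return float(np.sum(di * dj) / denom) if denom > 0 else 0.0

# Toy matrix: 3 users x 3 items (made-up ratings).
A = np.array([[5.0, 3.0, 4.0],
              [4.0, 2.0, np.nan],
              [1.0, 5.0, 2.0]])
s01 = adjusted_cosine(A, 0, 1)
```

In this toy matrix, items 0 and 1 are rated in opposite directions relative to each
user's mean, so `s01` comes out negative.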

Adjusted Cosine Similarity
In case of the item-based CF the similarity is computed along the columns, i.e., each
pair in the co-rated set corresponds to a different user. Computing similarity using basic
cosine measure in item-based case has one important drawback: the differences in rating
scale between different users are not taken into account. The adjusted cosine similarity
offsets this drawback by subtracting the corresponding user average from each co-rated
pair. Formally, the similarity between items i and j using this scheme is given by:
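
For reference, the adjusted cosine similarity as defined in the paper by Sarwar et al.
listed in the References can be written as below. The notation is that paper's and is
assumed to match what the report intended: U is the set of users who rated both items i
and j, R_{u,i} is the rating of user u on item i, and \bar{R}_u is the average of user
u's ratings.

```latex
s_{i,j} =
  \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_u)(R_{u,j} - \bar{R}_u)}
       {\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_u)^2}\;
        \sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_u)^2}}
```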

2.3 Prediction Computation


After computing the similarity between items, we compute the prediction on an item i
for a user u as the sum of the ratings given by the user on the items similar to i.
Each rating is weighted by the corresponding similarity s_{i,j}, and the final weighted
sum is divided by the sum of the similarities to get a normalized prediction value. The
prediction P_{u,i} is given by:
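
Following the weighted-sum scheme in the paper by Sarwar et al. listed in the References,
the prediction can be written as below. Here N denotes the set of items similar to i that
user u has rated and R_{u,j} is the rating of user u on item j; taking the absolute value
of the similarities in the denominator follows that paper and is an assumption about what
the report intended.

```latex
P_{u,i} = \frac{\sum_{j \in N} s_{i,j} \, R_{u,j}}{\sum_{j \in N} \lvert s_{i,j} \rvert}
```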

This approach tries to capture how the active user rates the similar items. The
weighted sum is then scaled by the sum of the similarity terms to make sure the prediction
stays within the range of the rating scale.
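
The weighted-sum prediction can be sketched in Python as follows. This is an illustration
under stated assumptions rather than the report's implementation: the function name
`predict_rating`, the optional neighbourhood size `k`, and the use of `np.nan` for unrated
items are all hypothetical.

```python
import numpy as np

def predict_rating(user_ratings, sims_to_target, k=None):
    """Weighted-sum prediction for one user and one target item.

    user_ratings   : 1-D array of the user's ratings, np.nan where unrated
    sims_to_target : 1-D array, similarity of every item to the target item
    k              : optional neighbourhood size (hypothetical parameter)
    """
    rated = ~np.isnan(user_ratings)           # only items the user has rated count
    ratings = user_ratings[rated]
    sims = sims_to_target[rated]
    if k is not None:
        top = np.argsort(-np.abs(sims))[:k]   # keep the k most similar rated items
        ratings, sims = ratings[top], sims[top]
    norm = np.sum(np.abs(sims))
    if norm == 0:
        return float("nan")                   # no usable neighbours
    return float(np.sum(sims * ratings) / norm)

# Toy example: the target item is index 1, which the user has not rated.
user_ratings = np.array([4.0, np.nan, 2.0, 5.0])
sims_to_target = np.array([0.8, 1.0, 0.2, 0.5])
p_all = predict_rating(user_ratings, sims_to_target)
p_top2 = predict_rating(user_ratings, sims_to_target, k=2)
```

When all similarities are non-negative, the result is a weighted average of the user's own
ratings and therefore cannot fall outside their range; with negative similarities that
guarantee no longer holds.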

3 Results
We evaluated the accuracy of the system by comparing the numerical recommendation
scores against the actual user ratings for the user-item pairs in the test dataset. For this,
Root Mean Squared Error (RMSE) was used. Assuming a total of n tuples in the test
dataset, with e_i denoting the error between the predicted and the actual value of the
i-th user-item pair, RMSE is calculated as follows:
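
In the notation of the preceding sentence, the standard definition of RMSE is:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^{2}}
```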

Here are a few of the predicted ratings:

User ID    Movie ID    Actual Rating    Predicted Rating
1          1172        4                3.4
1          1263        2                2
1          1293        2                2
1          1339        3.5              3.8
2          1343        2                2

Our model has a root mean squared error of 1.182.
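
As a small worked example of the metric, the sketch below computes RMSE for only the five
sample rows listed in the table above. It does not reproduce the reported figure of 1.182,
which covers the whole test set; the function name `rmse` is ours.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# The five sample rows from the table of predicted ratings.
actual = [4, 2, 2, 3.5, 2]
predicted = [3.4, 2, 2, 3.8, 2]
sample_rmse = rmse(actual, predicted)  # about 0.3 for these five rows
```

The squared errors are 0.36, 0, 0, 0.09 and 0, so the mean is 0.09 and its square root is
0.3.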

4 Conclusion
Recommender systems are a powerful new technology for extracting additional value for
a business from its user databases. These systems help users find items they want to
buy from a business. Conversely, they help the business by generating more sales.
Recommender systems are rapidly becoming a crucial tool in E-commerce on the Web. Our
results show that item-based techniques hold the promise of allowing CF-based algorithms
to scale to large data sets and at the same time produce high-quality recommendations.

5 Division of Work
The division of work among team members is as follows:

Ujjwal Arora, 140911330

Item Similarity computation.

Accuracy computation.

This report.

Nihar Maheshwari, 140911274

Prediction computation.

Clustering for data reduction.

Accuracy computation.

Nikhil Gupta, 140911466

Item Similarity computation.

Prediction computation.

Writing driver functions.

Kinshuk Puri, 140911110

Gathering and processing the dataset.

Prediction computation.

References

B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Item-Based Collaborative Filtering
Recommendation Algorithms," Proc. 10th Int'l WWW Conf., 2001.
