B.E. PROJECT ON

Hybrid Music Recommender System

SUBMITTED IN PARTIAL FULFILLMENT OF REQUIREMENTS OF AWARD OF


B.E. (COMPUTER ENGINEERING)
DEGREE OF UNIVERSITY OF DELHI

SUBMITTED BY:
Akash Kumar 212/CO/10
Akshat Gupta 215/CO/10
Devender Issar 237/CO/10

GUIDED BY:
Dr. Satbir Jain

COMPUTER ENGINEERING (COE)

NETAJI SUBHAS INSTITUTE OF TECHNOLOGY


UNIVERSITY OF DELHI
2013-14

CERTIFICATE

The project titled "Hybrid Music Recommender System" by Akash
Kumar (212/CO/10), Akshat Gupta (215/CO/10) and Devender Issar
(237/CO/10) is a record of bona fide work carried out by them in the
Division of Computer Engineering, Netaji Subhas Institute of Technology,
New Delhi, under the supervision and guidance of Dr. Satbir Jain, in
partial fulfillment of the requirements for the award of the degree of
Bachelor of Engineering in Computer Engineering, University of Delhi, in the
academic year 2013 - 2014.

Dr. Satbir Jain


Division of Computer Engineering
Netaji Subhas Institute of Technology
New Delhi

Dated: 30 June 2014


CANDIDATES' DECLARATION

This is to certify that the work presented by us in this project titled
"Hybrid Music Recommender System", in partial fulfilment of the requirements
for the award of the degree of Bachelor of Engineering, submitted at the
Department of Computer Engineering, Netaji Subhas Institute of Technology
Delhi, is a genuine account of our work carried out during the period from
January 2014 to May 2014 under the guidance of Dr. Satbir Jain, Department of
Computer Engineering, Netaji Subhas Institute of Technology Delhi. The matter
embodied in the project report has not, to the best of our knowledge, been
submitted for the award of any other degree elsewhere.

Dated:

Akash Kumar

Akshat Gupta

Devender Issar

This is to certify that the above declaration by the students is true to the best
of my knowledge.

Dr. Satbir Jain



ACKNOWLEDGEMENT

No significant achievement can be accomplished single-handedly, especially
when starting a project from the ground up. This B.E. Project, built on such a
revolutionary idea, has by no means been an exception. It took many very
special people to enable it and support it. Here we would like to acknowledge
their precious co-operation and express our sincere gratitude to them.

Dr. Satbir Jain has again been very supportive and involved in yet another
student project. It was his support that helped the project through its
earliest and most vulnerable stages. His name opened many doors for us and
persuaded many people. He was always full of energy and enthusiasm, making
sure that we were provided with everything we needed. No words can fully
express our thanks to him. He was the one who backed us with any assistance we
needed during the project work.

We are also thankful to our friends, who motivated us at each and every
step of this project. Without their interest in our project we could not have
come so far.

Most of all, we would like to thank our wonderful parents, who
motivated us from day one of the project. You were the lights that led us.

It was a great pleasure and honour to spend our time with all of them and
there could be no better payment for the efforts put into completing this B.E.
Project than their valuable presence. They are all very special to us.

ABSTRACT

Recommender systems have become a widespread technology over the past few
years. The music domain has long been influenced by offline radio stations,
where static playlists based on track popularity and expert preselections are
broadcast to every listener. With the advent of music streaming platforms such
as Last.fm or Spotify, the balance has shifted and users can now create their
own private radio stations. As a downside, users now have to curate their own
playlists and are less likely to discover new music. For this, a music
recommender system is an elegant supplement, which can make use of both the
wisdom of the crowds and the user's past listening history.
This dissertation investigates how mathematical models can be applied to
building recommender systems that recognize behavioural patterns in historical
data and filter context information about the user from this data.
In order to further explore the potential of recommender systems, we have
developed an application which makes recommendations to a user based on the
user's past data. We have used user based and content based filtering to
recommend a list of songs which the user might want to listen to.
Table of Contents

1. Detailed Problem Statement


1.1. Introduction
1.2. Motivation
1.3. Problem Statement

2. Literature Survey
2.1. Recommender Systems
2.1.1. Content Based
2.1.2. Collaborative Based
2.2. User Based Collaborative Filtering
2.3. Item Based Collaborative Filtering
2.4. Track Similarity
2.5. Tag Similarity
2.6. Pearson Correlation Coefficient
2.7. Euclidean Distances

3. Software Requirement Specification


3.1. Introduction
3.1.1. Purpose
3.1.2. Scope
3.1.3. Definitions and Acronyms
3.1.4. Technologies Used
3.1.5. References
3.1.6. Overview

3.2. Overall Description


3.2.1. Product Perspective
3.2.2. Product Functions
3.2.3. User Characteristics
3.2.4. Constraints
3.2.5. Assumptions

3.3. Specific Requirements


3.3.1. External Interface Requirements
3.3.2. Functional Requirements
3.3.3. Performance Requirements
3.3.4. Design Constraints
3.3.5. Non-Functional Requirements
3.3.6. Other Requirements

3.4. Change Management Process

4. Model Proposed and Algorithm Used


4.1. Description of Database and test and Training Files

4.2. Making a Song Recommendation


4.2.1. Extract And Process Data
4.2.2. Read Training and Test Data and Build Dictionaries
4.2.3. Compute Average Rating
4.2.4. Compute Similarity
4.2.5. Compute Predicted Rating
4.2.6. Recommend Songs

4.3. Testing the Accuracy of Predictions


4.3.1. Accuracy Testing

5. Conclusion and Results


5.1. Result
5.2. Conclusion

6. Future Scope

7. References

List of Figures

Fig 1: The Collaborative Filtering Process


Fig 2: Pictorial Representation of items
Fig 3: Diagram Depicting Item Based Collaborative Filtering
Fig 4: Diagram Depicting Workflow of Recommender System
Fig 5: Diagram Depicting Phases in Testing
Fig 6: User Details
Fig 7: Song Details
Fig 8: User Song Rating
Fig 9: Track Similarity Results
Fig 10: Tag Similarity Results
Fig 11: Error Graph i.e. abs(Actual Rating-Predicted Rating) v/s
Users.

Chapter 1: DETAILED PROBLEM STATEMENT

1.1. Introduction

1.2. Motivation

1.3. Problem Statement

1.1. INTRODUCTION

Developments in machine learning and artificial intelligence techniques have
created a need for recommender systems which act more intelligently than
trivial systems that merely present the items explicitly selected by the user.

In the past twelve years, there has been interest in the development of
techniques that provide personalised content to users. The types of
application have included filtering of news and messages, presenting lists of
stories or artwork that a user may be interested in, and so on.
Most of these applications have applied a technique known as Collaborative
Filtering.
This involves collecting other users' opinions of how good or useful an item
is, and then ranking items based on this information for presentation to the
user.

1.2. MOTIVATION

While it may be argued that there has been some success with this technique,
there is much room for improvement. Parallel to the development of
collaborative filtering has been content based filtering. This is an approach
that tries to extract, from the items of the collection, useful information
that is a good indicator of their usefulness to the user. It aims to develop
better techniques to locate documents that satisfy a user's information need.

1.3. PROBLEM STATEMENT

Propose a predictive model to recommend a list of songs based on user
feedback (ratings).

Build an efficient hybrid music recommender system.
Base recommendations upon collaborative (user based) as well as content based
filtering.
Suggest songs that would be liked or appreciated by, or be of interest to,
the user.
Maintain a user database for an enhanced user experience.

Understanding:
The project aims to build a system that recommends music using the database
of songs heard by users and the feedback (ratings) provided by them. The
system shall be able to make useful, personalised recommendations based upon
predictive models, and should do so efficiently, with low time complexity.
This is an effort to implement mathematical predictive models to improve the
user experience by making personalised and relevant recommendations. We
believe the user experience is enhanced by taking feedback (ratings) from the
user.

Input to the system:

Initial requirement: a dataset comprising user, song and the rating
(preference) specified by the user.
We also have user details as well as song details for a personalised user
experience.

For working of model: the user's feedback on the recommendations (song
likability) is recorded as input.

Output from the system:

A list of songs that would interest the user, based upon predictive models.

Chapter 2: Literature Survey


2.1. Recommender Systems
2.1.1. Content Based
2.1.2. Collaborative Based
2.2. User Based Collaborative Filtering
2.3. Item Based Collaborative Filtering
2.4. Track Similarity
2.5. Tag Similarity
2.6. Pearson Correlation Coefficient
2.7. Euclidean Distances

2.1. RECOMMENDER SYSTEMS

Recommender systems analyze a user's profile and the relationship between the
user and a target item to help the user purchase or rent items matching the
user's interests[2]. With the help of computers, recommender systems can
analyze huge collections of data on users' preferences to produce good
recommendations. Online companies like Netflix and Amazon use recommender
systems to help users easily find the items they want on their websites[9].
Every time a user logs in to the website, a new list of recommended items is
shown based on the user's past reviews or purchases[3]. Instead of the user
spending time navigating the website and searching for items, a recommender
system saves time by displaying a list of items the user is likely to like,
based on the user's profile.

Recommender systems can also help online companies sell their products better.
A recommender system can give the user a personalized feeling because it is
based on real input from the user and is always up to date. Whenever the user
buys or reviews a new item, a new recommended list is created for that
particular user.

There are two groups of recommender systems:

Content Based
Content-based algorithms use the user's profile to find items matching
the user. For a twenty-three-year-old user, a content-based algorithm will
select items that interest people of this age. A content-based approach
can also use an item's profile to recommend items to the user[1]. For
example, a content-based recommender system can recommend a list of
movies to the user based on the movie genres the user is interested in.
These user and item profiles are difficult to collect and often need to be
obtained from an external source.

Collaborative Filtering
Collaborative filtering (CF) systems build a database of user opinions of
available items. They use the database to find users whose opinions are
similar (i.e., those that are highly correlated) and make predictions of a
user's opinion on an item by combining the opinions of other like-minded
individuals.
They don't need explicit profiles of each user or item[1]. For a user X
who has given all of five movies a rating of five, a CF system will analyse
the data, find all users who gave the same five movies a rating of five, and
then recommend to user X the other movies that these users like.

Collaborative filtering techniques can be an important part of a
recommender system. One key advantage of CF is that it does not
consider the content of the items being recommended. Rather than map
users to items through "content attributes" or "demographics," CF treats
each item and user individually. Accordingly, it becomes possible to
discover new items of interest simply because other people liked them; it
is also easier to provide good recommendations even when the attributes
of greatest interest to users are unknown or hidden.
For example, many movie viewers may not want to see a particular actor
or genre so much as "a movie that makes me feel good" or "a smart,
funny movie." At the same time, CF's dependence on human ratings can
be a significant drawback. For a CF system to work well, several users
must evaluate each item; even then, new items cannot be recommended
until some users have taken the time to evaluate them. These limitations,
often referred to as the sparsity and first-rater problems, cause trouble for
users seeking obscure movies (since nobody may have rated them) or
advice on movies about to be released (since nobody has had a chance to
evaluate them).

2.2. USER BASED COLLABORATIVE FILTERING

User-based collaborative filtering is one of the algorithms most often chosen
by online companies for their recommender systems. It relies on similar
behaviour among the users in a group, such as buying or rating items. The
behaviour of various users in a group can help in recommending items for other
users in the same group to buy or rate.

Although it is considered the most used algorithm in collaborative filtering,
the user-based approach has some limitations. The first limitation is
scalability: the computation of user-based CF becomes more expensive as the
number of users grows. Therefore, it is difficult to use user-based CF in big
online service companies such as Amazon and Netflix. User-based CF recommender
systems can work very well with a small dataset, but they usually don't work
well with a large dataset like Netflix's. The second limitation is
performance: user-based CF is slow because it needs to recompute the
user-user similarities every time it gives a new recommendation.

FIG. 2: PICTORIAL REPRESENTATION OF ITEM BASED COLLABORATIVE FILTERING

User Based Collaborative Filtering can be reduced to two steps:

1. Look for users who share the same rating patterns with the active user (the
user whom the prediction is for).
2. Use the ratings from those like-minded users found in step 1 to calculate a
prediction for the active user.
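
The two steps above can be sketched as follows. This is a minimal
illustration in Python, not the MATLAB implementation of this project. The
function names, the parameter k and the stand-in similarity function
(agreement) are assumptions made only for this example; any user-user
similarity measure can be passed in instead.

```python
def agreement(a, b):
    """Stand-in similarity (an assumption for this sketch):
    1 / (1 + mean absolute rating difference) over co-rated songs."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    diff = sum(abs(a[s] - b[s]) for s in common) / len(common)
    return 1.0 / (1.0 + diff)


def predict_rating(ratings, active, song, similarity=agreement, k=2):
    """ratings: dict user -> dict song -> rating (only rated songs present)."""
    # Step 1: rank the other users who rated `song` by their similarity
    # to the active user and keep the k most similar ones.
    neighbours = sorted(
        ((similarity(ratings[active], r), r[song])
         for user, r in ratings.items()
         if user != active and song in r),
        reverse=True,
    )[:k]
    # Step 2: similarity-weighted average of the neighbours' ratings.
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None  # nobody comparable has rated this song
    return sum(sim * rating for sim, rating in neighbours) / total


ratings = {
    "a": {"s1": 5, "s2": 1},
    "b": {"s1": 5, "s2": 1, "s3": 4},
    "c": {"s1": 1, "s2": 5, "s3": 2},
}
print(predict_rating(ratings, "a", "s3", k=1))
```

With k=1 only the closest neighbour (user b, whose ratings agree with those
of user a) contributes, so the prediction equals b's rating for the song.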

User Based Collaborative Filtering is used by[2]:

Amazon.com's Book Matcher
Moviefinder's We Predict
Style Finder
My CDNOW

2.3. ITEM BASED COLLABORATIVE FILTERING

Instead of computing the similarity between two users, the item-based
collaborative filtering algorithm computes the similarity between two items.
The computation of the item-based algorithm is much simpler and more scalable
than that of the user-based algorithm[7]. Usually, online service companies
have fewer items than users. For example, Netflix's dataset has over 480,000
users but only 18,000 movies[8].

Item-based collaborative filtering, invented by Amazon.com ("users who
bought x also bought y"), proceeds in an item-centric manner:

Build an item-item matrix determining relationships between pairs of items.
Using the matrix and the data on the current user, infer the user's taste.
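
As an illustration of these two steps, the sketch below builds the simplest
possible item-item matrix, a co-occurrence count in the spirit of "users who
bought x also bought y", and uses it to rank items for one user. It is a
Python sketch under stated assumptions, not the method used in this project:
the function names and the toy data are hypothetical, and real systems usually
replace the raw counts with a similarity measure such as cosine or Pearson.

```python
from collections import defaultdict
from itertools import combinations


def co_occurrence(histories):
    """histories: dict user -> set of items.
    Returns a dict (x, y) -> number of users who have both x and y."""
    counts = defaultdict(int)
    for items in histories.values():
        for x, y in combinations(sorted(items), 2):
            counts[(x, y)] += 1
            counts[(y, x)] += 1  # keep the matrix symmetric
    return counts


def recommend(histories, user, counts, n=3):
    """Score every item the user does not have by how often it co-occurs
    with the items the user already has, and return the top n."""
    owned = histories[user]
    scores = defaultdict(int)
    for (x, y), c in counts.items():
        if x in owned and y not in owned:
            scores[y] += c
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


histories = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"a", "c"},
    "u4": {"b", "d"},
}
counts = co_occurrence(histories)
print(recommend(histories, "u1", counts))
```

Because the matrix depends only on item pairs, it can be computed once and
reused for every user, which is the scalability advantage described above.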

FIG. 3: DIAGRAM DEPICTING ITEM BASED COLLABORATIVE FILTERING

Item Based Collaborative Filtering is used by[2]:

Reel.com's Movie Matches
Moviefinder's Match Maker
Amazon.com's Customers Who Bought
CDNOW's Album Advisor
Slope One

2.4. Track Similarity


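Each track is represented as a vector $\vec{t} = (c_1, \ldots, c_n)$, where
$c_j$ represents the number of times the user $j \in U$ has listened to this
track, and $|U| = n$ is the number of all users. The similarity is determined
by the adjusted cosine similarity measure, where $\bar{c}_j$ is the average
number of times a user $j$ listened to a track. Writing $c_{a,j}$ and
$c_{b,j}$ for the components of two tracks $a$ and $b$:

$$\mathrm{sim}(\vec{t}_a, \vec{t}_b) = \frac{\sum_{j \in U} (c_{a,j} - \bar{c}_j)(c_{b,j} - \bar{c}_j)}{\sqrt{\sum_{j \in U} (c_{a,j} - \bar{c}_j)^2}\,\sqrt{\sum_{j \in U} (c_{b,j} - \bar{c}_j)^2}}$$

The incorporation of the average $\bar{c}_j$ in the equation makes it possible
to compare the listening profiles of users with different activity levels,
e.g. if a user is using Last.fm only once a week, his implicit ranking for a
song should be more influential than the one from a user who is tuning in on a
daily basis.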
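
A minimal Python sketch of the adjusted cosine measure described in this
section is given below. It assumes that the play counts are held in a
user-by-track table (plays[j][t]) and that a user's average is taken over all
tracks in that table; both the layout and the function name are assumptions
for illustration only.

```python
import math


def adjusted_cosine(plays, a, b):
    """plays[j][t] = number of times user j listened to track t.
    Returns the adjusted cosine similarity of tracks a and b."""
    num = norm_a = norm_b = 0.0
    for row in plays:
        mean = sum(row) / len(row)  # average play count of this user
        da, db = row[a] - mean, row[b] - mean
        num += da * db
        norm_a += da * da
        norm_b += db * db
    den = math.sqrt(norm_a) * math.sqrt(norm_b)
    return num / den if den else 0.0


# A heavy listener, a light listener with the same taste, and a third user.
plays = [
    [10, 10, 0],
    [1, 1, 0],
    [0, 0, 4],
]
print(adjusted_cosine(plays, 0, 1))
print(adjusted_cosine(plays, 0, 2))
```

Subtracting each user's own average means the heavy and the light listener
in the example contribute in the same direction even though their raw counts
differ by a factor of ten.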

2.5. Tag Similarity


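Each track is represented as a vector $\vec{t} = (l_1, \ldots, l_m)$, where
$l_j \in [0, 100]$ represents the score to what extent the tag $j \in L$
describes this track, and $|L| = m$ is the number of all used tags. Since
for tags we do not need to care about user-specific scales, the similarity
is determined by the cosine similarity measure. Writing $l_{a,j}$ and
$l_{b,j}$ for the tag scores of two tracks $a$ and $b$:

$$\mathrm{sim}(\vec{t}_a, \vec{t}_b) = \frac{\sum_{j \in L} l_{a,j}\, l_{b,j}}{\sqrt{\sum_{j \in L} l_{a,j}^2}\,\sqrt{\sum_{j \in L} l_{b,j}^2}}$$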
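
The cosine measure on tag vectors can be sketched in a few lines of Python.
The function name and the example vectors are assumptions for illustration;
the tag scores are taken to lie in [0, 100] as described above.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equally long tag-score vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    den = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / den if den else 0.0


# Hypothetical tag order: rock, pop, jazz, classic, folk.
pure_rock = [100, 0, 0, 0, 0]
soft_rock = [50, 0, 0, 0, 0]
pure_pop = [0, 100, 0, 0, 0]
print(cosine(pure_rock, soft_rock))
print(cosine(pure_rock, pure_pop))
```

Only the direction of the vector matters: a track tagged rock with score 50
is as similar to a pure rock track as one tagged with score 100.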

2.6. Pearson correlation coefficient

The Pearson Correlation Score is a measure of the correlation (linear
dependence) between two variables X and Y, giving a value between +1 and -1
inclusive.
Pearson's correlation coefficient between two variables is defined as the
covariance of the two variables divided by the product of their standard
deviations.

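For user based algorithms, Pearson correlation computes the similarity between
two users only over the items that both have rated[3]. For example, let $S$ be
the set of items rated by both user $x$ and user $y$. Then the Pearson
correlation computes the similarity between user $x$ and user $y$ as:

$$\mathrm{sim}(x, y) = \frac{\sum_{s \in S} (r_{x,s} - \bar{r}_x)(r_{y,s} - \bar{r}_y)}{\sqrt{\sum_{s \in S} (r_{x,s} - \bar{r}_x)^2}\,\sqrt{\sum_{s \in S} (r_{y,s} - \bar{r}_y)^2}}$$

Here $r_{x,s}$ is the rating user $x$ gives to item $s$ and $\bar{r}_x$ is the
average rating of user $x$.

For item based algorithms, Pearson correlation computes the similarity of two
items $i$ and $j$ by:

$$\mathrm{sim}(i, j) = \frac{\sum_{u} (R_{u,i} - \bar{R}_i)(R_{u,j} - \bar{R}_j)}{\sqrt{\sum_{u} (R_{u,i} - \bar{R}_i)^2}\,\sqrt{\sum_{u} (R_{u,j} - \bar{R}_j)^2}}$$

Here $\bar{R}_i$ is the average rating of item $i$, $R_{u,i}$ is the rating
user $u$ gives to item $i$, and the sums run over the users who rated both
items.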
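
The user based Pearson similarity can be sketched in Python as below. This is
an illustration only: the function name is hypothetical, ratings are assumed
to be stored as one dictionary per user, and each user's mean is assumed to be
taken over the co-rated set S (some formulations use the mean over all of the
user's ratings instead).

```python
import math


def pearson(x, y):
    """x, y: dict item -> rating for two users.
    Returns their Pearson correlation over the items both have rated."""
    common = sorted(set(x) & set(y))  # the set S of co-rated items
    if len(common) < 2:
        return 0.0  # not enough overlap to measure a correlation
    mean_x = sum(x[s] for s in common) / len(common)
    mean_y = sum(y[s] for s in common) / len(common)
    num = sum((x[s] - mean_x) * (y[s] - mean_y) for s in common)
    den = math.sqrt(sum((x[s] - mean_x) ** 2 for s in common)) * \
        math.sqrt(sum((y[s] - mean_y) ** 2 for s in common))
    return num / den if den else 0.0


strict = {"s1": 1, "s2": 2, "s3": 3}
generous = {"s1": 2, "s2": 4, "s3": 6}
opposite = {"s1": 3, "s2": 2, "s3": 1}
print(pearson(strict, generous))
print(pearson(strict, opposite))
```

The first pair correlates perfectly even though one user rates twice as
high as the other, which is the main reason to prefer Pearson over a raw
distance when users apply different rating scales.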

2.7. Euclidean Distances

The Euclidean distance or Euclidean metric is the ordinary straight-line
distance between two points. Analogously, if the ratings that two users give
to songs are treated as Cartesian coordinates, it provides the distance, and
hence a measure of similarity, between the two users (or, with the roles
swapped, between two songs).

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$$

Here $d(p, q)$ is the Euclidean distance between two users $p$ and $q$, and
$p_i$ and $q_i$ are the ratings of song $i$ provided by $p$ and $q$.
To normalize the distance into a similarity score, take
similarity = 1 / (1 + d(p, q)).
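
A short Python sketch of the distance-based similarity described above
follows. The function name and the choice to compare only songs rated by both
users are assumptions for illustration.

```python
import math


def euclidean_similarity(p, q):
    """p, q: dict song -> rating for two users.
    Returns 1 / (1 + d(p, q)) computed over the songs both have rated."""
    common = set(p) & set(q)
    if not common:
        return 0.0  # no shared songs, so no evidence of similarity
    d = math.sqrt(sum((p[s] - q[s]) ** 2 for s in common))
    return 1.0 / (1.0 + d)


print(euclidean_similarity({"a": 1, "b": 1}, {"a": 1, "b": 1}))
print(euclidean_similarity({"a": 1, "b": 1}, {"a": 4, "b": 5}))
```

Identical ratings give a distance of 0 and therefore a similarity of 1;
the similarity falls towards 0 as the ratings move apart.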

Chapter 3: SOFTWARE REQUIREMENT SPECIFICATION

This chapter includes

3.1. Introduction
3.1.1. Purpose
3.1.2. Scope
3.1.3. Definitions and Acronyms
3.1.4. Technologies Used
3.1.5. References
3.1.6. Overview

3.2. Overall Description


3.2.1. Product Perspective
3.2.2. Product Functions
3.2.3. User Characteristics
3.2.4. Constraints
3.2.5. Assumptions

3.3. Specific Requirements


3.3.1. External Interface Requirements
3.3.2. Functional Requirements
3.3.3. Performance Requirements
3.3.4. Design Constraints
3.3.5. Non-Functional Requirements
3.3.6. Other Requirements

3.4. Change Management Process



3.1. INTRODUCTION
3.1.1. Purpose

The Music Recommender System is intended to recommend to the user a list of
songs, based on the user's past experience, which the user has not yet
listened to and might want to listen to. This recommender system will use a
machine learning algorithm, together with feedback in the form of ratings
which the user provides after listening to a song, to predict the most
personalised songs.

This document is meant to delineate the features of the Hybrid Music
Recommender System, so as to serve as a guide to the developers on one hand
and a software validation document for the prospective client on the other.

3.1.2. Scope
Initial functional requirements will be: -

The system must be able to access and store users' song rating data.
The system should be able to recommend a list of songs as predicted by the
algorithm. This list is supposed to be the most preferable list of songs
according to the user's previous ratings of songs.
The system should be able to take feedback from the user.
The system should be able to learn from the ratings provided as feedback by
the user and evolve with time.

Initial non functional requirements will be: -

User details should be secure.
The system should be portable.
The system should be reliable.

This list is by no means a final one. The final list will be dictated by
implementation constraints, market forces and, most importantly, by the
demands of the end users for whom this is being built.

3.1.3. Definitions, Acronyms and Abbreviations

SLA: A Service Level Agreement or SLA is a formal written agreement
made between two parties, the service provider and the service recipient. It
defines the terms of engagement, i.e. the fundamental rules that will govern
the relationship.

3.1.4. Technologies to be used

Programming language:

MATLAB: MATLAB (matrix laboratory) is a multi-paradigm numerical
computing environment and fourth-generation programming language.
Developed by MathWorks, MATLAB allows matrix manipulations,
plotting of functions and data, implementation of algorithms, creation
of user interfaces, and interfacing with programs written in other
languages, including C, C++, Java, and Fortran.

Tools and Development Environment

Wamp Server: WampServer is a Windows web development
environment. It allows you to create web applications with Apache2,
PHP and a MySQL database. Alongside, PhpMyAdmin allows you to
manage your databases easily.
ODBC Connector: ODBC (Open Database Connectivity) is a
standard programming language middleware API for accessing database
management systems (DBMS). The designers of ODBC aimed to make it
independent of database systems and operating systems; an application
written using ODBC can be ported to other platforms, both on the client
and server side, with few changes to the data access code.

3.1.6. Overview

The rest of this SRS is organized as follows: Section 2 gives an overall
description of the software. It states the level of proficiency expected of
the user, some general constraints on building the software, and the
assumptions and dependencies involved. Section 3 gives the specific
requirements which the software is expected to deliver. Functional
requirements are given by a use case diagram. Some performance requirements
and design constraints are also given.

3.2. OVERALL DESCRIPTION


3.2.1. Product perspective

This project is aimed at proposing a model for implementing a music
recommender system which can be integrated into other systems for
recommendation purposes. It does not require user intervention; the only
input the system accepts from the user is feedback.

3.2.2. Product functions

Function 1: FINDING SIMILARITY INDEX
This is performed by applying algorithms to find the measure of
similarity between users as well as between songs. The similarity index is
generally computed using the Pearson coefficient and the Euclidean distance.

Function 2: MAKING RECOMMENDATIONS/SUGGESTING SONGS
Suggest songs to the user that would be of interest to the user, based on
the weighted score of each song to be recommended.

Function 3: USER FEEDBACK
After listening to a song, a user can rate the song.
This rating helps in enhancing the user experience by identifying the user's
music taste.

3.2.3. User characteristics

The user should know how to give feedback to the system, i.e. how to rate
songs.
The user need not have any knowledge about the working of the system.
The system runs on its own and learns on its own.

3.2.4. Constraints

Initially the system starts with the song dataset and the user dataset; the
user-song rating dataset is built over time.
Many parameters like personality traits, social networking groups,
mood, etc. have not been taken into consideration.
Initially the system starts with a smaller dataset. So, it may not give good
results initially but will learn over time.

3.2.5. Assumptions

The details related to the user are provided manually.



3.3. SPECIFIC REQUIREMENTS


3.3.1. External Interface Requirements

1) User Interface Requirements: A user-friendly interface is required for
giving the user personalised suggestions and accepting feedback. Recording
this feedback helps in improving the user experience.

2) Hardware Interface Requirements: There are no hardware interface
requirements.

3) Communication Interface Requirements: No communication is involved, so
there are no communication interface requirements.

3.3.2. Functional Requirements

1) FINDING SIMILARITY INDEX: This is performed automatically by the system.
User has no role in it. The similarity between users as well as between
songs is computed from the stored ratings and song details, using the
Pearson coefficient and the Euclidean distance.
2) MAKING RECOMMENDATIONS: This is performed by the system. User has no
role in it. The system predicts the ratings of songs the user has not yet
listened to and suggests songs based on the weighted score of each song.
3) USER FEEDBACK: User can give feedback anytime. The feedback is in
terms of a rating for a song the user has listened to.

3.3.3. Performance Requirements

The system is based on a learning algorithm. The system will start with a
comparatively smaller dataset. With time, the system will learn and give more
accurate and appropriate predictions.

3.3.4. Design Constraints


1) Development Tools
The system shall be built using MATLAB.

2) Offline Product
There is no need for an internet connection.

3) Database
A database is required to store the dataset which is built and used during the
working of the system.

3.3.5. Non-functional Requirements

1) Reliability and Availability
Back-end Internal Computers: The system shall provide storage of all databases
on redundant computers with automatic switchover.

2) Security
Data security will be maintained by the database management system used.
Network security is not required as there is no transfer of information from
one place to another. User feedback will be stored and not displayed after it
is stored.

3) Usability
The system shall be easy to use and understand. The user will only need to
give feedback. All other functions will be automated.

4) Maintainability
A commercial database is used for maintaining the database and the application
server takes care of the site. In case of a failure, the program is
re-initialized. Also, the software is being designed with modularity in mind
so that maintenance can be done efficiently.

5) Portability
The application is MATLAB based and should be compatible with all other
systems. The end-user part is fully portable and any system should be able to
use the features of the application, including any hardware platform that is
available or will be available in the future.

3.3.6. Other Requirements

1) Dataset
The dataset for this project is quite simple. Below is a summary of the
information needed plus a short description.
i. User Details: It contains fields like User name, User id, Age, Nationality.
ii. Song Details: It contains fields like Song name, Song id, Genre.
iii. User Song Ratings: It consists of user id, song id and the rating
specified by the user.

2) Schedule and Budgets
The model of this project is to be proposed by 30th May 2014.
Due to lack of funding, actual implementation of the finished product is
outside the scope of this project. This project includes proposing the model
and showing the working of the algorithm.

3.4. CHANGE MANAGEMENT PROCESS

The SRS is developed in such a manner that any desired changes can be
introduced by the designing party in the near future according to the suitability.

Chapter 4: MODEL PROPOSED AND ALGORITHM USED

4.1. Description of Database and test and Training Files

4.2. Making a Song Recommendation


4.2.1. Extract And Process Data
4.2.2. Read Training and Test Data and Build Dictionaries
4.2.3. Compute Average Rating
4.2.4. Compute Similarity
4.2.5. Compute Predicted Rating
4.2.6. Recommend Songs

4.3. Testing the Accuracy of Predictions


4.3.1. Accuracy Testing

4.1. DESCRIPTION OF DATABASE AND TEST AND TRAINING FILES

4.1.1. Collecting song and user data

We downloaded the dataset from the Yahoo website. The size of the dataset was
small and manageable and didn't require further modifications to make it
smaller.
The dataset for this project is quite simple. Below is a summary of the
information needed plus a short description.
a. User Details: It contains fields like User name, User id, Age,
Nationality
b. Song Details: It contains fields like Song name, Song id, Genre
c. User Song Ratings: It consists of user id, song id and the rating
specified by the user.

For this program, ratings from 1000 selected users on 1000 selected songs were
extracted. Each user rated only a fraction of the songs, which results in
sparsity of the rating information. Users 1 to 200 are used for testing
purposes. The rest of the users are used as training data. The goal is to
predict the ratings of songs for the first 200 users, make recommendations to
these users, and compare the predicted ratings with the actual ratings given
by these users to the songs.

4.2. MAKING A SONG RECOMMENDATION

This section explains the whole implementation step by step. Below is the
workflow diagram for predicting the ratings of songs and recommending them to
the user.

FIG. 4: DIAGRAM DEPICTING WORKFLOW OF RECOMMENDER SYSTEM

We now describe each section of the diagram in detail, mentioning the
various functions used in them.

4.2.1. EXTRACT AND PROCESS DATA

1. First of all, the data was imported into the MySQL server (WAMP) with the
help of a built-in utility in WAMP for importing data from a CSV file.
2. A connection to the WAMP server was made from the MATLAB script via the
ODBC interface.
Data retrieved in the MATLAB script is in the form of a cell matrix. We have a
choice to either use it as it is or use it after converting it into a normal
two dimensional matrix.
Here the data set is read and three data matrices are created:
song_details_matrix, user_detail_matrix and user_song_rating matrix.

This block uses the functions below for extraction and processing of data,
with the specified inputs and outputs respectively.

1. loadsongdetails, which accesses the song_details table in the MySQL
database via a variable res and creates the song_tag_matrix, where only
the tags of songs are saved for calculating tag similarity later.
The tags of each song are stored in the row whose number is equal to the
song_id of the song. The following tags are used to describe a song:
rock, pop, jazz, classic and folk.

Input:- Input is the empty song_tag_matrix.
Output:- Complete song_tag_matrix filled with the tags of songs.

2. loaduserdetails, which accesses the user_details table in the MySQL
database via a variable res and creates the user_details_matrix,
which consists of user_id, name, age and nationality. The details of each
user are stored in the row whose number is equal to the user_id of the user.

Input:- Input is the empty user_details_matrix.
Output:- Complete user_details_matrix filled with the mentioned details
of users.

3. loadusersongratings, which accesses the user_rating table in the MySQL
database via a variable res and creates the user_song_rating matrix,
in which the row number corresponds to user_id, the column number
corresponds to song_id, and element (i, j) corresponds to the rating
given by user i to song j. A rating of 0 means that the user has not yet
listened to the song.
Input:- Input is the empty user_song_rating matrix.
Output:- The complete user_song_rating matrix filled with the ratings
given by users to the songs.
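
The construction of the user_song_rating matrix described above can be
sketched as follows. This is a Python illustration of the idea, not the MATLAB
code of this project; the function name build_rating_matrix and the sample
rows are assumptions, and the ids are assumed to start at 1 so that row and
column numbers match user_id and song_id.

```python
def build_rating_matrix(rows, n_users, n_songs):
    """rows: iterable of (user_id, song_id, rating) with 1-based ids.
    Returns an n_users x n_songs matrix where 0 means 'not yet rated'."""
    matrix = [[0] * n_songs for _ in range(n_users)]
    for user_id, song_id, rating in rows:
        # Python lists are 0-based, so shift the 1-based ids by one.
        matrix[user_id - 1][song_id - 1] = rating
    return matrix


rows = [(1, 1, 5), (1, 3, 2), (2, 2, 4)]
for line in build_rating_matrix(rows, n_users=2, n_songs=3):
    print(line)
```

Because most users rate only a fraction of the songs, most entries stay at
0, which is the sparsity mentioned in Section 4.1.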

4.2.2. READ TRAINING AND TEST DATA AND BUILD DICTIONARIES

Here we read in the training and test data set. While reading we update -
various data elements. We create a matrix for training dataset. Also for test data
set we make the rating of this song for the user equal to zero.

1. loadtrainingset: This reads the rating data set of users and songs,
takes the ratings of songs for users with user_id from 201 to 1000
and saves them to the training data set.

Input:- The empty training data set and the user_song_rating matrix
filled with ratings.

Output:- The training_data matrix containing the set of training data.

2. loadtestset: This reads the rating data set of users and songs,
takes the ratings of songs for users with user_id from 1 to 200 and
saves them to the test data set.

Input:- The empty test data set and the user_song_rating matrix
filled with ratings.

Output:- The test_data matrix containing the set of test data.
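The split by user_id can be illustrated with a short Python sketch. It assumes the rows of user_song_rating are ordered by user_id, so the first rows belong to the test users; the function name split_users and the toy matrix are hypothetical. With 1000 users, num_test_users=200 would reproduce the 1 to 200 and 201 to 1000 ranges given above.

```python
def split_users(user_song_rating, num_test_users):
    """Return (training_data, test_data) as copies of the rating rows.

    The first num_test_users rows form the test set; the rest form the
    training set. Rows are copied so later edits do not affect the source.
    """
    test_data = [row[:] for row in user_song_rating[:num_test_users]]
    training_data = [row[:] for row in user_song_rating[num_test_users:]]
    return training_data, test_data


# Toy matrix with 5 users and 3 songs; the first user is the test user.
user_song_rating = [
    [5, 0, 3],
    [4, 2, 0],
    [0, 1, 5],
    [2, 2, 2],
    [0, 0, 4],
]
training_data, test_data = split_users(user_song_rating, num_test_users=1)
```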

4.2.3. COMPUTE AVERAGE RATING

In this block we update the average rating data member for all users and songs.
For any user or song, we compute the sum of all its ratings by iterating through
the values of the user_song_rating matrix and then divide by the number of
ratings summed. Entries with a rating of 0 are unrated and are not counted.
Below are the functions utilised by this block.

1. processUsers: In this function the average_user_rating data is
updated for every user in the user_song_rating matrix. This is done
by accessing the row of the user_song_rating matrix that contains
the ratings given to various songs by a particular user. All the
ratings by that user are added and divided by the total number of
songs rated by that user.

Input:- The data array average_user_rating, whose fields are empty.

Output:- After processing is done, the average_user_rating array
holds floating point values giving the average rating given by each
user.

2. processsongs: In this function the average_song_rating data is
updated for every song. This is done by referencing all the ratings
given to a particular song by all users, i.e. the column of the
user_song_rating matrix belonging to that song. All the values in this
column are added and divided by the number of users who rated that
song.

Input:- The empty average_song_rating array.

Output:- The completely filled array containing the average rating of
each song.
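The two averaging steps can be sketched in Python as below. This is an illustrative sketch rather than the project's MATLAB code; the function names compute_user_averages and compute_song_averages are hypothetical, and returning 0.0 for a user or song with no ratings is an assumption, since the report does not say how that case is handled.

```python
def compute_user_averages(user_song_rating):
    """Average of the non-zero ratings in each row (one row per user)."""
    averages = []
    for row in user_song_rating:
        rated = [r for r in row if r != 0]
        # Assumption: a user with no ratings gets an average of 0.0.
        averages.append(sum(rated) / len(rated) if rated else 0.0)
    return averages


def compute_song_averages(user_song_rating):
    """Average of the non-zero ratings in each column (one column per song)."""
    num_songs = len(user_song_rating[0]) if user_song_rating else 0
    averages = []
    for song in range(num_songs):
        rated = [row[song] for row in user_song_rating if row[song] != 0]
        # Assumption: a song with no ratings gets an average of 0.0.
        averages.append(sum(rated) / len(rated) if rated else 0.0)
    return averages


user_song_rating = [
    [5, 0, 3],
    [4, 2, 0],
    [0, 0, 0],
]
average_user_rating = compute_user_averages(user_song_rating)
average_song_rating = compute_song_averages(user_song_rating)
```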

4.2.4. COMPUTE SIMILARITY

Here an O(n^2) brute-force algorithm is run, and similarity is calculated for
every pair of users and for every pair of songs.

Below are the functions used by this block.

setuserSimilarity: An O(n^2) brute-force algorithm runs which, for every
pair of users in the populated list, calculates Pearson's correlation
coefficient. This similarity is stored in the corresponding element of
user_similarity_matrix. Similarity is calculated if and only if the two users
have rated at least 1 song in common. user_similarity_matrix(i, j) contains
the degree of similarity between users i and j.
Input:- The user_song_rating matrix.
Output:- After processing is done, the corresponding
user_similarity_matrix element is updated for every pair of users.
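A minimal Python sketch of this pairwise computation is given below. It is not the project's MATLAB code. It assumes that the means in Pearson's coefficient are taken over the songs both users have rated, and that pairs whose coefficient is undefined (no common songs, or zero variance over the common songs) get a similarity of 0.0; neither detail is stated in the report. The function names are hypothetical.

```python
import math


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over songs rated (non-zero) by both users."""
    common = [(a, b) for a, b in zip(ratings_a, ratings_b) if a != 0 and b != 0]
    if not common:
        return 0.0  # assumption: no common songs means no similarity
    mean_a = sum(a for a, _ in common) / len(common)
    mean_b = sum(b for _, b in common) / len(common)
    numerator = sum((a - mean_a) * (b - mean_b) for a, b in common)
    spread_a = math.sqrt(sum((a - mean_a) ** 2 for a, _ in common))
    spread_b = math.sqrt(sum((b - mean_b) ** 2 for _, b in common))
    if spread_a == 0 or spread_b == 0:
        return 0.0  # assumption: undefined correlation is treated as 0
    return numerator / (spread_a * spread_b)


def set_user_similarity(user_song_rating):
    """O(n^2) loop over user pairs; the matrix is filled symmetrically."""
    n = len(user_song_rating)
    similarity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            value = pearson_similarity(user_song_rating[i], user_song_rating[j])
            similarity[i][j] = similarity[j][i] = value
    return similarity


user_song_rating = [
    [5, 3, 1, 0],
    [4, 2, 0, 5],
    [1, 3, 5, 0],
]
user_similarity_matrix = set_user_similarity(user_song_rating)
```

In this toy data users 1 and 2 agree on the two songs they share, while users 1 and 3 rate in opposite order, so the first pair is strongly positive and the second strongly negative. The diagonal is left at 0.0 because a user is never compared with itself.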

settrackSimilarity: In this function an O(n^2) brute-force algorithm runs
which, for every pair of songs in the populated list, calculates the cosine
similarity. This similarity is stored in the corresponding element of
track_similarity_matrix. track_similarity_matrix(i, j) contains the degree of
similarity between songs i and j based on the ratings given by the users to
these songs.
Input:- The user_song_rating matrix.
Output:- After processing is done, the corresponding
track_similarity_matrix element is updated for every pair of songs.

settagSimilarity: In this function an O(n^2) brute-force algorithm runs
which, for every pair of songs in the populated list, calculates the cosine
similarity of their tags. This similarity is stored in the corresponding
element of tag_similarity_matrix. tag_similarity_matrix(i, j) contains the
degree of similarity between songs i and j based on the tags representing
these songs.
Input:- The songs_tag_matrix.
Output:- After processing is done, the corresponding
tag_similarity_matrix element is updated for the tags of every pair of songs.
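Both song similarities rest on the same cosine measure applied to different vectors, which the following Python sketch illustrates. It is not the project's MATLAB code. It assumes each row of songs_tag_matrix is a 0/1 vector over the five tags, that a song's rating vector is its column of user_song_rating with unrated entries left at 0, and that a zero vector yields a similarity of 0.0. The function names are hypothetical.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def pairwise_similarity(vectors):
    """O(n^2) loop over the lower triangle, mirrored to the upper triangle."""
    n = len(vectors)
    similarity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            value = cosine_similarity(vectors[i], vectors[j])
            similarity[i][j] = similarity[j][i] = value
    return similarity


# Tag similarity: one row per song, columns rock, pop, jazz, classic, folk.
songs_tag_matrix = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
tag_similarity_matrix = pairwise_similarity(songs_tag_matrix)

# Track similarity: compare the columns (songs) of the rating matrix.
user_song_rating = [
    [5, 5, 0],
    [1, 1, 0],
]
song_columns = [list(column) for column in zip(*user_song_rating)]
track_similarity_matrix = pairwise_similarity(song_columns)
```

Songs that share no tag get a tag similarity of 0, and songs that received identical ratings from every user get a track similarity of 1.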

4.2.5. COMPUTE PREDICTED RATINGS

Based on the similarities calculated, we compute predictions. For any user z, if
song k has to be rated, all similar users in the row of user z in
user_similarity_matrix are considered. A term weighted by Pearson's
correlation is calculated from their ratings, rounded off and added to the
average rating of user z. This final rounded-off value gives the predicted
rating.

Below are the functions used by this block.

predictuserrating: Here we come back to the user_similarity_matrix and start
making predictions for the songs in the test set. For this purpose, a set of
similar users is found using the user_similarity_matrix.
The formula used for calculating the predicted rating is given in the
literature survey part. In case none of the users in the set of similar users
has rated song x, the average rating of user i and the average rating of the
song are used to calculate the prediction.
Input:- The user_similarity_matrix, the user_song_rating matrix, the
average_user_rating array and a user_id for which predictions are to be
made.
Output:- An array predicted_rating1 is created which contains
the predicted ratings of songs for the user given in input.
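The prediction step can be sketched in Python as follows. Because the exact formula is given elsewhere in the report, this sketch assumes the commonly used mean-centred weighted sum: the user's own average plus the similarity-weighted average of the neighbours' deviations from their own averages. Treating only positively correlated users as "similar", and taking the mean of the user average and the song average as the fallback, are further assumptions. The function name predict_user_rating is hypothetical.

```python
def predict_user_rating(user, song, user_similarity_matrix, user_song_rating,
                        average_user_rating, average_song_rating):
    """Predict the rating of `user` for `song` from similar users' ratings."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, similarity in enumerate(user_similarity_matrix[user]):
        rating = user_song_rating[other][song]
        # Assumption: only positively correlated users who rated the song count.
        if other == user or similarity <= 0 or rating == 0:
            continue
        weighted_sum += similarity * (rating - average_user_rating[other])
        weight_total += similarity
    if weight_total == 0:
        # Assumed fallback: mean of the user's average and the song's average.
        return round((average_user_rating[user] + average_song_rating[song]) / 2)
    return round(average_user_rating[user] + weighted_sum / weight_total)


user_song_rating = [
    [3, 0, 0],
    [3, 0, 5],
    [1, 0, 3],
    [0, 5, 0],
]
average_user_rating = [3.0, 4.0, 2.0, 5.0]
average_song_rating = [7 / 3, 5.0, 4.0]
user_similarity_matrix = [
    [0.0, 0.8, 0.2, -0.3],
    [0.8, 0.0, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.0],
    [-0.3, 0.0, 0.0, 0.0],
]
# Song index 2 was rated by both similar users; song index 1 by neither.
from_neighbours = predict_user_rating(0, 2, user_similarity_matrix,
                                      user_song_rating, average_user_rating,
                                      average_song_rating)
from_fallback = predict_user_rating(0, 1, user_similarity_matrix,
                                    user_song_rating, average_user_rating,
                                    average_song_rating)
```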

predictscore:
The prediction score p(u, tv) of a track tv from the set of tracks Tnr(u),
which user u has not yet rated, is computed as a linear combination of the
similarity scores of the tag and track recommenders.
Input:- The track_similarity_matrix, the tag_similarity_matrix, the
user_song_rating matrix, the average_song_rating array and a user_id for
which predictions are to be made.
Output:- An array predicted_rating2 is created which contains
the predicted ratings of songs for the user given in input.
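One possible reading of this linear combination is sketched below in Python. The report does not give the weights or the normalisation, so the mixing weight alpha, the weighted average over the user's rated tracks, and the fallback to average_song_rating are all assumptions made for illustration; the function name predict_score is hypothetical.

```python
def predict_score(user, track, track_similarity_matrix, tag_similarity_matrix,
                  user_song_rating, average_song_rating, alpha=0.5):
    """Score an unrated track from the user's rated tracks.

    Each rated track contributes its rating, weighted by
    alpha * tag similarity + (1 - alpha) * track similarity.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for rated_track, rating in enumerate(user_song_rating[user]):
        if rating == 0 or rated_track == track:
            continue
        combined = (alpha * tag_similarity_matrix[track][rated_track]
                    + (1 - alpha) * track_similarity_matrix[track][rated_track])
        if combined <= 0:
            continue  # assumption: dissimilar tracks are ignored
        weighted_sum += combined * rating
        weight_total += combined
    if weight_total == 0:
        # Assumed fallback when no rated track is similar to the target.
        return average_song_rating[track]
    return weighted_sum / weight_total


user_song_rating = [
    [5, 1, 0],   # user 0 has not rated the track at index 2
    [0, 0, 0],   # user 1 has rated nothing
]
tag_similarity_matrix = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
track_similarity_matrix = [
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]
average_song_rating = [5.0, 1.0, 3.5]
score = predict_score(0, 2, track_similarity_matrix, tag_similarity_matrix,
                      user_song_rating, average_song_rating)
fallback_score = predict_score(1, 2, track_similarity_matrix,
                               tag_similarity_matrix, user_song_rating,
                               average_song_rating)
```

With alpha = 0.5 the first rated track gets a combined weight of 0.75 and the second 0.25, so the score leans towards the rating of the track that shares tags with the target.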

4.2.6 RECOMMEND SONGS

After the ratings are predicted on the basis of users, tracks and tags of
songs, we select the top five matches from each prediction. Then, for
every unique user in the test set, the best predicted songs are recommended.

Below are the functions which help in filtering and recommendation.

1. Recommend1: This function reads the previously created
predicted_rating1 array and finds the five songs with the maximum
predicted ratings.

Input:- The predicted_rating1 array.

Output:- The recommendation_list1 array, based on the ratings by
similar users.

2. Recommend2: This function utilises predicted_rating2 and
recommends to a user the five songs with the highest predicted rating
on the basis of track and tag similarity.

Input:- The predicted_rating2 array.

Output:- The recommendation_list2 array, which contains the best five
predicted songs for a user.
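Selecting the five best songs from a prediction array can be sketched in Python as below. The function name recommend_top_n is hypothetical, and breaking ties in favour of the lower song index is an assumption, since the report does not say how ties are resolved.

```python
def recommend_top_n(predicted_rating, n=5):
    """Return the indices of the n songs with the highest predicted rating.

    Ties are broken by the lower song index (an assumption).
    """
    ranked = sorted(range(len(predicted_rating)),
                    key=lambda song: (-predicted_rating[song], song))
    return ranked[:n]


predicted_rating1 = [3.2, 4.8, 1.0, 4.8, 2.5, 4.1, 0.0]
recommendation_list1 = recommend_top_n(predicted_rating1)
```

The same function applies unchanged to predicted_rating2 to obtain recommendation_list2.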

4.3. TESTING THE ACCURACY OF PREDICTIONS

This section explains the accuracy of our predictions. Below is the work
flow diagram of accuracy testing.

FIG. 5: DIAGRAM DEPICTING PHASES IN TESTING

4.3.1 ACCURACY TESTING

Our song recommender is validated on the closeness of its predicted
ratings to the actual ratings. To verify it, we calculate accuracy by:-

1. Degree of closeness.
2. Root mean square (RMS) error.

Following are the working steps to test accuracy:-

4.3.1.1 The given data set is divided into 2 parts having 4/5 and 1/5
of the data respectively.

4.3.1.2 The 4/5 data set is used as the training set.

4.3.1.3 The 1/5 data set is treated as test data even though its
ratings are present.

4.3.1.4 For the 1/5 data set we predict ratings and compare them with
the actual ratings already present in the data file.

4.3.1.5 We compute the accuracy first by fraction closeness.

4.3.1.6 Next we compute the RMS value as a measure of the average
deviation from the original values.
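The two accuracy measures can be sketched in Python as follows. The RMS error is the standard definition. The report does not define fraction closeness precisely, so this sketch assumes it is the fraction of predictions that lie within a given tolerance of the actual rating; the tolerance of 1 rating point and the function names are hypothetical.

```python
import math


def rms_error(actual, predicted):
    """Root mean square of the differences between actual and predicted."""
    pairs = list(zip(actual, predicted))
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


def fraction_close(actual, predicted, tolerance=1.0):
    """Assumed definition: share of predictions within `tolerance` of actual."""
    pairs = list(zip(actual, predicted))
    close = sum(1 for a, p in pairs if abs(a - p) <= tolerance)
    return close / len(pairs)


# Toy held-back ratings and their predictions for four (user, song) pairs.
actual = [5, 3, 4, 1]
predicted = [4, 3, 2, 1]
rms = rms_error(actual, predicted)
closeness = fraction_close(actual, predicted)
```

For this toy data the absolute errors are 1, 0, 2 and 0, so three of the four predictions fall within the tolerance and the RMS error is the square root of 5/4.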

Chapter 5: RESULTS AND CONCLUSION

5.1. Results

5.2. Conclusion

5.1. RESULTS

The prediction model was tested on a Core i3 laptop with 4 GB RAM
running Windows 7 Ultimate, using MATLAB.
Step 1. Following are some snapshots of the tables imported into MySQL from
an Excel file.

Fig 6: User details

Fig 7: Song_details table



Fig 8: Rating of songs

Step 2. Calculating similarity of songs.

2.1 Track similarity was calculated. Following is a screenshot of the
similarity of 8 songs. Similarity varies between -1 and 1. Similarity > 0
means the songs have some common positive ratings by users. Since the
similarity of song i and song j is the same as the similarity of song j and
song i, only the lower part of the matrix is evaluated; the upper part is
symmetric to the lower part.

Fig 9: Track_Similarity Results

2.2 Tag similarity was calculated. Following is a screenshot of the
similarity of 8 songs. Similarity varies between 0 and 1. Similarity = 0
means the songs have no matching content.

Fig 10: Tag Similarity Results

ERROR GRAPH

Fig 11: Error Graph i.e. abs(Actual Rating-Predicted Rating) v/s Users

5.2. CONCLUSION

Recommender systems have always been an active field of wide interest.
The Hybrid Recommender System we developed produced fairly good results.
However, a few more metrics can be incorporated to improve the
recommender's results, subject to the availability of a data set.

The results obtained were much to our satisfaction, but we still feel
that much can be added to them. We also hope that fellow researchers
will actively explore this new dimension. The whole process was
enriching for us.

Chapter 6: FUTURE SCOPE



6. Future Scope

With the advent of technology, people have become more and more choosy.
They seek personalised suggestions and recommendations for everything.
Therefore the future scope of recommender systems is very high.

People will seek recommendations for movies, clothes, food and songs.

In future we aim at developing Multi-Criteria Recommender Systems (MCRS).

Multi-Criteria Recommender Systems (MCRS) can be defined as recommender
systems that incorporate preference information upon multiple criteria. Instead
of developing recommendation techniques based on a single criterion value,
namely the overall preference of user u for item i, these systems try to
predict a rating for unexplored items of u by exploiting preference
information on the multiple criteria that affect this overall preference
value. Several researchers approach MCRS as a Multi-Criteria Decision Making
(MCDM) problem.

REFERENCES

[1] Tho Nguyen
Web based Recommender Systems and Rating Prediction
MSc thesis at San Jose State University, 2009

[2] J. Ben Schafer, Joseph Konstan, John Riedl
Recommender Systems in E-Commerce
EC '99 Proceedings of the 1st ACM Conference on Electronic Commerce

[3] D. Shen, Z. Lu
Computation of Correlation Coefficient and Its Confidence Interval in SAS
SUGI 31 Proceedings, 2006

[4] Mukund Deshpande, George Karypis
Item-Based Top-N Recommendation Algorithms
ACM Transactions on Information Systems (TOIS), Volume 22, Issue 1,
January 2004

[5] G. Adomavicius, A. Tuzhilin
Towards the Next Generation of Recommender Systems: A Survey of the
State of the Art and Possible Extensions
IEEE Transactions on Knowledge and Data Engineering 17 (2005), 734-749

[6] R. Bell and Y. Koren
Scalable Collaborative Filtering with Jointly Derived Neighborhood
Interpolation Weights
IEEE International Conference on Data Mining, IEEE, 2007

[7] Badrul Sarwar, George Karypis, Joseph Konstan, John Riedl
Item Based Collaborative Filtering Recommendation Algorithms
WWW '01 Proceedings of the 10th International Conference on World
Wide Web

[8] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, J. Riedl
GroupLens: An open architecture for collaborative filtering of netnews
In Proceedings of CSCW, 1994

[9] Nathaniel Good, J. Ben Schafer, Joseph A. Konstan, Al Borchers,
Badrul Sarwar, Jon Herlocker, John Riedl
Combining Collaborative Filtering with Personal Agents for Better
Recommendations
http://www.aaai.org/Papers/AAAI/1999/AAAI99-063.pdf

[10] Jinhu Liu, Chengcheng Yang, Zi-Ke Zhang
A two-step Recommendation Algorithm via Iterative Local Least Squares
arXiv:1206.3320v1

[11] Mark O'Connor, Jon Herlocker
Clustering Items for Collaborative Filtering
http://www.csee.umbc.edu/~ian/sigir99-rec/papers/oconner_m.pdf

[12] Jaldert Rombouts, Tessa Verhoef
A simple hybrid movie recommender system
http://www.fon.hum.uva.nl/tessa/Verhoef/Past_projects_files/Eind_Rombouts_Verhoef.pdf
