Music Recommendation System

You might also like

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 24

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Internship Seminar
on
Music Recommender System

UNDER THE GUIDANCE OF : PRESENTED BY:


PROF.MR. NIRUPADI TIDIGOL MUKUND B N (1SZ18CS008)
PRERANA Y (1SZ19CS009)
MOHANMALA K N (1SZ19CS005)
INTRODUCTION
With the rise of digital content distribution, we have access to a huge music
collection. With millions of songs to choose from, we sometimes feel overwhelmed.
Thus, an efficient music recommender system is necessary in the interest of both
music service providers and customers

Our music recommender system is large-scale and personalized. We learn from


users’ listening history and features of songs and predict songs that a user would
like to listen to.
Abstract
 In our project, we will be using a sample data of songs to find correlations between users
and songs so that a new song will be recommended to them based on their previous
history.
We will implement this project using libraries like NumPy, Pandas. We will also be using
Cosine similarity along with Count Vectorizer.
A music recommendation system is a software application designed to suggest songs or
playlists to users based on their music preferences and listening history. The system uses
data mining techniques to analyze the user's listening habits, such as the genres of music
they listen to, the artists they prefer, and the songs they frequently play.
WHY A RECOMMENDER NEEDED

• Music dataset is too big while life is short!!!! You need someone to
teach you how to manage and give you wise suggestions
according to your taste!

• Music service providers need a more efficient system to attraction their clients!
OUR SYSTEM

User‘s Music
listening Prediction of
history
&music Recommender songs that
user will listen
informatio to
n
System

Our system: off-line system


 Features:  Too big dataset:
 Large-scale: 1 000 000  Difficult to implement the whole
users dataset, so need to create a
15000 000 songs small dataset by ourselves
 Open
 Implicit feedback
 Content:
 Format of the Dataset:
 Hdf5 files
 Triplets (user, song, count)  Need to be opened by a
 Meta-data, content-analysis
Python Wrapper
 No users’ demographic
information,
timestamp
Features & Content Difficulties linked to the Data
1.Popularity 2.Same
based artist Content based
model greatest hits Model

Latent factor Nearest


...
Model Neighborhood
3.Collaborative 4.Content-
SVD
filtering based
Model
Idea Pros:
1. Sort songs by popularity in  Idea is simple
a decreasing order  Easy to implement
2. For each user, recommend
 Served as baseline
the songs in order of
popularity, except those Cons:
already in the user’s profile  Not personalized (users and
songs’ information is not taken
into account)
 Some songs will never be listend

Idea & Steps Pros & Cons


Idea Pros:
 Idea is simple
1. Sort songs by popularity in  Easy to implement
a decreasing order  Minimum personalized
2. For each user, the ranking
of songs is re-ordered to
place songs by artists Cons:
3. recommend the songs in the  Only single-meta-data is used
new order, except those  Maximally conservative:
already in the user’s profile explore the space byond songs
doesn’t
with which the user is likely
already familiar

Idea & Steps Pros & Cons


Idea: songs that are often listened by the same user tend Idea: users who listen to the same songs in the past
to be similar and are more likely to be listened together in tend to have similar interests and will probably listen to
future by some other user. the same songs in future.

Item-based User-based
🞂 COLLABORATIVE FILTERING
This method utilizes preferences and behaviors of other users to come up with
recommendations.
The general operation of these systems is to pair users with similar tastes together into
groups. Recommendations are then made based on the collective preferences of the users
within each group.
Thus if user x and user y are both members of the same preference group, and user x
likes a particular sample, then there should be a high probability that user y also like this
sample. Obviously, this method requires a significantly large initial dataset to provide
relevant predictions.
1. Based on item’s description Similarity !=
and user’s preference profile recommendation (no
2. Not based on choices of notion of personalization)
other users with
similar interests
3. We make recommendations
by looking for music Majority of songs have too
whose features are very few listeners, so difficult to
similar to the tastes of the “collaborate”
user

What’s content-based model? And Why?


1. Create a space of songs according to songs features.
We find out neighborhood of each song.

2. We look at each user’s profile and suggest songs which are


neighbors to the songs that he listens to
Idea: SVD  Personalized
 Listening histories are  Meta-data is fully used, all
influenced by a set of factors the information is well
specific to the domain (e.g. explored
Genre, artist).  It works well in many tested
These factors are in general not cases
obvious and we need to infer
those so called latent factors from
the data.
Users and songs are
characterized by latent factors.

Idea
Exploratory Data Exploratory Data Analysis
 
EDA is an approach to analyzing the data using visual techniques. It is used to discover
trends, and patterns, or to check assumptions with the help of statistical summaries and
graphical representations. 
The dataset we have contains around 14 numerical columns but we cannot visualize
such high-dimensional data. But to solve this problem t-SNE comes to the rescue. t-
SNE is an algorithm that can convert high dimensional data to low dimensions and uses
some non-linear method to do so which is not a concern of this article.
 
S System Architecture

🞂
Data Collection and Understanding Process:

The real dataset is used for the research. We have taken music data which
contains 2000 records and 15 fields, including categorical and numeric
features. Each record in the music information, and each field in the record
represents a feature of that particular employee.
Data Preparation and Pre-processing:
After the process of data collection is finished, the process of preparing
the data is performed. It is important to refine this data so that it can be
suitable for the models and generate better results. In this phase we
performed tasks like cleaning, filling the missing data, and removing
unwanted data. The data of Spotify had various attributes which were not
relevant, i.e., as not giving any useful information, like Title, Artist, Top
Genre, Energy, BPM, Liveness, etc.; hence these attributes are removed
in this phase.
Feature Selection:
Feature
🞂 selection is one of the main concepts of DM and Machine Learning. Where it
is a process of selecting necessary useful variables in a dataset to improve the results
of machine learning and make it more accurate, there are a lot of columns in the
predictor variable. So, the correlation coefficient is calculated to see which of them
are important and these are then used for training methods. From there, we get the top
factors that affect performance.
Results:
First latent factors capture properties of the most popular items,
while the additional latent factors represent more refined
features related to unpopular items.

Number of latent factors influences the quality of


long-tail items differently than head items.
[1] McFee, B., BertinMahieux,T., Ellis, D. P., Lanckriet, G. R. (2012, April). The million
song dataset challenge. In Proceedings of the 21st international conference companion
on World Wide Web (pp. 909916).ACM.

[2] Aiolli, F. (2012). A preliminary study on a recommender system for the million
songs dataset challenge. PREFERENCE LEARNING: PROBLEMS AND APPLICATIONS IN AI

[3] Koren, Yehuda. "Recommender system utilizing collaborative filtering combining


explicit and implicit feedback with both neighborhood and latent factor models."
U.S. Patent No. 8,037,080. 11 Oct. 2011.

[4] Cremonesi, Paolo, Yehuda Koren, and Roberto Turrin. "Performance of recommender
algorithms on top-n recommendation tasks." Proceedings of the fourth ACM
conference on Recommender systems. ACM, 2010

You might also like