
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/321528718

Personality Based Music Recommendation System

Thesis · December 2017


DOI: 10.13140/RG.2.2.18154.00962



All content following this page was uploaded by Abhishek Paudel on 05 December 2017.



TRIBHUVAN UNIVERSITY
INSTITUTE OF ENGINEERING
PULCHOWK CAMPUS

PERSONALITY BASED MUSIC RECOMMENDATION SYSTEM

By:
Abhishek Paudel (070/BCT/502)
Brihat Ratna Bajracharya (070/BCT/513)
Miran Ghimire (070/BCT/521)
Nabin Bhattarai (070/BCT/522)

A PROJECT SUBMITTED TO THE DEPARTMENT OF ELECTRONICS AND
COMPUTER ENGINEERING IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE BACHELOR’S DEGREE IN COMPUTER ENGINEERING

DEPARTMENT OF ELECTRONICS AND COMPUTER ENGINEERING


LALITPUR, NEPAL

AUGUST 1, 2017

TRIBHUVAN UNIVERSITY
INSTITUTE OF ENGINEERING
PULCHOWK CAMPUS
DEPARTMENT OF ELECTRONICS AND COMPUTER ENGINEERING

The undersigned certify that they have read, and recommended to the Institute of Engineering
for acceptance, a project report entitled ”Personality Based Music Recommendation
System” submitted by Abhishek Paudel, Brihat Ratna Bajracharya, Miran Ghimire and
Nabin Bhattarai in partial fulfillment of the requirements for the Bachelor’s Degree in
Computer Engineering.

Supervisor: Mr. Daya Sagar Baral, Lecturer


Department of Electronics and Computer Engineering
Institute of Engineering, Pulchowk Campus

Internal Examiner: Dr. Arun Kumar Timilsina, Deputy Director


CARD, Institute of Engineering, Pulchowk Campus

External Examiner: Mr. Mahesh Singh Kathayat, Associate Professor


Kathmandu Engineering College

Coordinator: Mrs. Bibha Sthapit, Deputy Head


Department of Electronics and Computer Engineering
Institute of Engineering, Pulchowk Campus

DATE OF APPROVAL:

COPYRIGHT

The authors have agreed that the Library, Department of Electronics and Computer Engineering,
Institute of Engineering, Pulchowk Campus may make this report freely available
for inspection. Moreover, the authors have agreed that permission for extensive copying of
this project report for scholarly purposes may be granted by the supervisors who supervised
the project work recorded herein or, in their absence, by the Head of the Department wherein
the project report was done. It is understood that recognition will be given to the authors
of this project and to the Department of Electronics and Computer Engineering, Pulchowk
Campus, Institute of Engineering in any use of the material of this report. Copying or
publication or other use of this report for financial gain without the approval of the Department
of Electronics and Computer Engineering, Institute of Engineering, Pulchowk Campus and
the authors’ written permission is strictly prohibited.

Request for permission to copy or to make any other use of the material in this report in
whole or in part should be addressed to:

Head
Department of Electronics and Computer Engineering,
Institute of Engineering, Pulchowk Campus,
Lalitpur, Nepal

ACKNOWLEDGMENT

We would like to express our sincere gratitude to the Department of Electronics and Computer
Engineering at the Institute of Engineering, Pulchowk Campus for providing us the opportunity
to implement the knowledge gained over these years as a major project in our fourth year.

We would also like to express our deepest sense of gratitude and thanks to our supervisor
Mr. Daya Sagar Baral for providing invaluable insight and guidelines for this project.

We would also like to thank all of our friends who have directly or indirectly helped us in
doing this project. Last but not least, we express a deep sense of appreciation to our family
members, who have been a constant source of inspiration for us.

Any kind of suggestion or criticism will be highly appreciated and acknowledged.

Authors:
Abhishek Paudel
Brihat Ratna Bajracharya
Miran Ghimire
Nabin Bhattarai

ABSTRACT

Music is an integral part of our life. We listen to music every day as per our taste and mood.
With the advancement and increase in volume of digital content, the choice for people to
listen to diverse types of music has also increased significantly. Thus, the necessity of
delivering the most suited music to listeners has been an interesting field of research in
computer science. One of the important measures for delivering the best music to a listener
could be his/her personality traits. In this project, we aim to discover the impact of personality
traits on user-to-user collaborative filtering, which is one of the most popular recommendation
techniques used today.

In order to determine the personality of a person, social media like Facebook can be a useful
platform where people express their views on different matters and share their opinions and
thoughts. Such expressions of thoughts and opinions can be leveraged to study the personality
traits of the person, and this information can then be used to try to enhance existing user-to-user
collaborative filtering techniques for music recommendation. The personality traits of the
users can be studied in terms of the standard Big Five Personality Traits, defined as Openness
to experience, Conscientiousness, Extraversion, Agreeableness, and Neuroticism [2]. With
this project, we were able to determine that the personality of the user can be one of the
crucial factors in the recommendation of music.

Keywords: Music Recommendation System, Collaborative filtering, Big Five Personality


Traits, Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism

TABLE OF CONTENTS

TITLE PAGE i

LETTER OF APPROVAL ii

COPYRIGHT iii

ACKNOWLEDGMENT iv

ABSTRACT v

TABLE OF CONTENTS viii

LIST OF FIGURES x

LIST OF TABLES xi

LIST OF ABBREVIATIONS xii

1 INTRODUCTION 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.5 Scope of the Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.6 Understanding Of Requirement . . . . . . . . . . . . . . . . . . . . . . . . 4
1.7 Organization of the Report . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 LITERATURE REVIEW 6

3 THEORETICAL BACKGROUND 7
3.1 General . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Document Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2.1 Obtaining a dataset . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2.2 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2.3 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2.4 Model Creation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2.5 Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2.6 Naive Bayes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2.7 Multinomial Naive Bayes . . . . . . . . . . . . . . . . . . . . . . 15

3.3 Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17


3.4 Recommender System . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.4.1 Knowledge based Recommendation(Searching) . . . . . . . . . . . 19
3.4.2 Utility based Recommendation . . . . . . . . . . . . . . . . . . . . 20
3.4.3 Demographic based Recommendation . . . . . . . . . . . . . . . . 21
3.4.4 Content based Filtering . . . . . . . . . . . . . . . . . . . . . . . . 22
3.4.5 Collaborative Filtering . . . . . . . . . . . . . . . . . . . . . . . . 23
3.4.6 Hybrid Recommendation System . . . . . . . . . . . . . . . . . . 24
3.5 Issues in recommendation system . . . . . . . . . . . . . . . . . . . . . . 25
3.6 Recommendation Model for Experimentation . . . . . . . . . . . . . . . . 26
3.6.1 Global Baseline Algorithm . . . . . . . . . . . . . . . . . . . . . . 27
3.6.2 User to User collaborative filtering . . . . . . . . . . . . . . . . . . 27
3.6.3 Combination of Global Baseline and User to User collaborative filtering 28
3.6.4 Matrix Factorization . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.7 Model Evaluation for Recommender System . . . . . . . . . . . . . . . . . 31

4 METHODOLOGY 32
4.1 Requirement Specification . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.1.1 Functional Requirement . . . . . . . . . . . . . . . . . . . . . . . 32
4.1.2 Non-functional Requirement . . . . . . . . . . . . . . . . . . . . . 32
4.2 Feasibility Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2.1 Operational Feasibility . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2.2 Technical Feasibility . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2.3 Economic Feasibility . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2.4 Legal Feasibility . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2.5 Scheduling Feasibility . . . . . . . . . . . . . . . . . . . . . . . . 34
4.3 Software Development Approach . . . . . . . . . . . . . . . . . . . . . . . 34
4.4 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

5 SYSTEM DESIGN 37
5.1 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.2 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.3 Use Case Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.4 ER Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.5 Activity Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.6 Context Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.7 Data Flow Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.8 Front End of the System(User Interface) . . . . . . . . . . . . . . . . . . . 47
5.9 Back End of the System . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

6 TOOLS AND TECHNOLOGIES 49


6.1 Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.2 Django . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.3 NumPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.4 Pandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.5 NLTK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6.6 Facebook Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6.7 HTML/CSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6.8 JavaScript . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6.9 PostgreSQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.10 Git . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

7 RESULT 52
7.1 Big Five Personality Frequency Distribution . . . . . . . . . . . . . . . . . 52
7.2 Logistic Regression Model . . . . . . . . . . . . . . . . . . . . . . . . . . 52
7.3 Naive Bayes Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
7.4 Evaluation of Recommendation System . . . . . . . . . . . . . . . . . . . 55
7.4.1 Latent Factor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

8 LIMITATIONS AND FUTURE ENHANCEMENTS 62

9 CONCLUSION 63

REFERENCES 64

APPENDIX A 67

APPENDIX B 69

List of Figures

3.1 Graph of logistic (sigmoid) function . . . . . . . . . . . . . . . . . . . . . 11

4.1 Scrum Software Development Cycle . . . . . . . . . . . . . . . . . . . . . 35

5.1 Block Diagram of the System . . . . . . . . . . . . . . . . . . . . . . . . . 37

5.2 Tasks performed by preprocessor unit within the system . . . . . . . . . . . 38

5.3 Tasks performed by classifier unit within the system . . . . . . . . . . . . . 39

5.4 Tasks performed by recommendation unit within the system . . . . . . . . . 40

5.5 Recommendation models used within the system . . . . . . . . . . . . . 40

5.6 Use Case Diagram of the System . . . . . . . . . . . . . . . . . . . . . . . 41

5.7 ER diagram of the System . . . . . . . . . . . . . . . . . . . . . . . . . . 43

5.8 Activity Diagram of the System . . . . . . . . . . . . . . . . . . . . . . . 45

5.9 Context Diagram of the System . . . . . . . . . . . . . . . . . . . . . . . . 46

5.10 Data Flow Diagram of the System: level-0 . . . . . . . . . . . . . . . . . . 47

7.1 Class Frequency Distribution of Users . . . . . . . . . . . . . . . . . . . . 52

7.2 F-Measure vs Number of Iterations (Logistic Regression) . . . . . . . . . . 53

7.3 F-Measure of Naive Bayes Model . . . . . . . . . . . . . . . . . . . . . . 54

7.4 RMSE of various models used in the system . . . . . . . . . . . . . . . . . 56

7.5 RMSE of Collaborative Filtering with User Rating Matrix . . . . . . . . . 56

7.6 RMSE of Collaborative Filtering with similarity in terms of Personality Matrix 57

7.7 RMSE of Collaborative Filtering combined with Global Baseline with User
Rating Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

7.8 RMSE of Collaborative Filtering combined with Global Baseline with User
Personality Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

7.9 RMSE of Collaborative Filtering with User Rating and Personality Matrix . 58

7.10 RMSE of Collaborative Filtering with User Rating and Personality Matrix
combined with Global Baseline . . . . . . . . . . . . . . . . . . . . . . . . 59

7.11 RMSE of Matrix Factorization vs Number of Iterations . . . . . . . . . . . 60

7.12 RMSE of Matrix Factorization vs Number of Latent Factors . . . . . . . . 60

9.1 Home Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

9.2 Recommended Songs Page . . . . . . . . . . . . . . . . . . . . . . . . . . 67

9.3 User Profile Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

9.4 Help Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68



List of Tables

7.1 Confusion Matrix of Logistic Regression Model (Openness) . . . . . . . . 53

7.2 Confusion Matrix of Logistic Regression Model (Conscientiousness) . . . . 53

7.3 Confusion matrix of Logistic Regression Model (Extraversion) . . . . . . . 53

7.4 Confusion Matrix of Logistic Regression Model (Agreeableness) . . . . . . 53

7.5 Confusion matrix of Logistic Regression Model (Neuroticism) . . . . . . . 53

7.6 Confusion Matrix of Naive Bayes (Openness) . . . . . . . . . . . . . . . . 54

7.7 Confusion Matrix of Naive Bayes (Conscientiousness) . . . . . . . . . . . 54

7.8 Confusion Matrix of Naive Bayes (Extraversion) . . . . . . . . . . . . . . 54

7.9 Confusion Matrix of Naive Bayes (Agreeableness) . . . . . . . . . . . . . 55

7.10 Confusion Matrix of Naive Bayes (Neuroticism) . . . . . . . . . . . . . . . 55

7.11 RMSE of Recommendation System Models . . . . . . . . . . . . . . . . . 55



LIST OF ABBREVIATIONS

API Application Programming Interface


CBF Content Based Filtering
CF Collaborative Filtering
CSS Cascading Style Sheet
DFD Data Flow Diagram
ERD Entity Relationship Diagram
HTML Hyper Text Markup Language
IR Information Retrieval
JS JavaScript
ML Machine Learning
MVT Model-View-Template
NLP Natural Language Processing
NLTK Natural Language ToolKit
ORDBMS Object-Relational Database Management System
PBMRS Personality Based Music Recommendation System
PoS Part of Speech
UI User Interface

1. INTRODUCTION

On the Internet, where the number of choices is overwhelming, there is a need to filter,
prioritize and efficiently deliver relevant information in order to alleviate the problem of
information overload, which has created a potential problem for many Internet users.
Recommender systems solve this problem by searching through large volumes of dynamically
generated information to provide users with personalized content and services.

Besides, these days social networks have become widely used and popular mediums for
information dissemination as well as facilitators of social interactions. User contributions
and activities provide valuable insight into individual behavior, experiences, opinions and
interests. Considering that personality, which uniquely identifies each one of us, affects many
aspects of human behavior, mental processes and affective reactions, there are enormous
opportunities for adding new personality-based qualities in order to enhance the current
collaborative filtering recommendation engine.

Previous work has shown that the information in a user’s social media account is reflective of
their actual personality, not an idealized version of themselves, which makes Facebook, a
social networking site with a broad user base, an ideal platform for studying the personality
traits of a user. Several well-studied personality models have been proposed, among which
the ”Big Five Model”, also known as the ”Five Factor Model” (FFM), is the most popular one [2].

1.1. Background

The Big Five Model of personality dimensions has emerged as one of the most well-researched
and well-regarded measures of personality structure in recent years [2]. The model’s five
domains of personality (Openness, Conscientiousness, Extraversion, Agreeableness and
Neuroticism) were conceived by Tupes and Christal [3] as the fundamental traits that emerged
from analyses of previous personality tests. McCrae, Costa and John [4] continued five factor
model research and consistently found generality across age, gender and cultural lines.
The Big Five Model traits are characterized by the following:

1. Openness to Experience: Openness is a general appreciation of art, emotion, adventure,
unusual ideas, imagination, curiosity, and variety of experience. Open people tend to
be more creative and more aware of their feelings. They are also more likely to hold
unconventional beliefs. Some sample items used by persons with this trait are:

• I have excellent ideas.
• I am quick to understand things.
• I am full of ideas.

2. Conscientiousness: Conscientiousness is a tendency to display self-discipline, act
dutifully and strive for achievement against measures or outside expectations. It is related
to the way in which people control, regulate and direct their impulses. The level of
conscientiousness rises among young adults and then declines among older adults. Some
sample items used by persons with this trait are:

• I follow a schedule.
• I am exact in my work.
• I am always prepared.

3. Extraversion: Extraversion is characterized by breadth of activities, surgency from
external activity/situations and energy creation from external means. Extraverts enjoy
interacting with people and are often perceived as full of energy. They tend to be
enthusiastic, action-oriented individuals. They possess high group visibility, like to talk
and assert themselves. Some sample items used by persons with this trait are:

• I am the life of the party.
• I don’t mind being the center of attention.
• I feel comfortable around people.

4. Agreeableness: The agreeableness trait reflects individual differences in general concern
for social harmony. Agreeable individuals value getting along with others. They
are generally considerate, kind, generous, trusting and trustworthy, helpful and willing
to compromise their interests with others. They also have an optimistic view of human
nature. Some sample items used by persons with this trait are:

• I have a soft heart.
• I am interested in people.
• I take time out for others.

5. Neuroticism: Neuroticism is the tendency to experience negative emotions, such as
anger, anxiety or depression. Some sample items used by persons with this trait are:

• I get irritated easily.
• I get stressed out easily.
• I get upset easily.

1.2. Motivation

The growth in the amount of digital information and the number of users on the Internet
has created a potential challenge of information overload, which hinders timely access to
items of interest on the Internet. Thus, the demand for good recommender systems is greater
than ever before. And music is essential to many of our lives. We listen to it when waking up,
while in transit, at work, and with our friends. For many, music is like a constant companion.
It can bring us joy and motivate us, accompany us through difficult times and alleviate our
worries. Hence music is much more than mere entertainment; but, as stated earlier, the growth
in the amount of digital information has created a potential challenge of information overload,
where a recommendation engine plays a crucial role in filtering the vital fragments out of the
large amount of dynamically generated information according to a user’s preferences, interests
or observed behavior. Hence, with this project, we attempt to devise a method to improve the
collaborative filtering engine via the use of personality to compute similar users for the
recommendation of music, as it is believed that people with similar personalities have similar
tastes in music.

1.3. Objectives

The objectives of the project can be summarized with the points below:

1. To find out if the personality of the user can be a crucial factor in the music recom-
mendation system.

2. To find out if the collaborative recommendation engine can be enhanced via the use of
personality for similar-user computation.

1.4. Problem Statement

Music is an essential part of human life. Music is the pleasant sound that leads us to
experience harmony and higher happiness. With the advancement in technology, music has
significantly progressed and increased in terms of quality and volume. The type of music
people create and listen to differs according to place and culture. The taste in music even
differs from person to person, and across moods of the same person. So, it would be very
useful if we could determine some method to find what kind of music a person might be
interested in listening to and use this finding to recommend music to him/her. Collaborative
filtering is one of the most popular filtering techniques today, and with this project we aim
to enhance it. For this, we have assumed that the personality of the user might be one of the
key factors in his/her music listening habits. Hence, via this project, we aim to see if the
personality of the user has any impact on collaborative filtering enhancement, assuming a
correlation between personality and music listening habits exists.

1.5. Scope of the Project

The most important scope of the project is to discover whether the personality traits of an
individual can be used to enhance the recommendation engine in order to provide more
personalized content to the user as a recommendation.

1.6. Understanding Of Requirement

Nowadays, digital data on the Internet is more massive than ever, which has created a
potential challenge of information overload, hindering timely access to items of interest on
the Internet. So, there is a greater requirement than ever for a better recommendation system.
Thus, with this project, we will try to find out if the recommendation engine can perform
better when the personality of the individual is used as one of the metrics for recommendation.
Nowadays, social networking sites have become vastly popular among people of different
religions, castes, ethnic groups and locations around the world, which shows how culturally
diverse the people of the world are. In this diversity, we can also see the variety in the
personalities of people living in different parts of the world. The main purpose of social
networking sites is to connect people from different parts of the world, which makes them the
most suitable platform for studying the personality traits of users. The personality traits thus
studied might be used in a recommendation engine in order to improve its efficiency.

1.7. Organization of the Report

The organization of the report is done in the following ways:

1. Chapter 1: It includes the introduction to the problem and the method we are trying
to employ to solve it.

2. Chapter 2: It includes the Literature Review, which covers the works related to the
project and the notable works existing prior to this project’s development, with their
results.

3. Chapter 3: It includes the theoretical background for the development of the project.

4. Chapter 4: It includes methodology used for the development of the project.

5. Chapter 5: It includes system design techniques along with the use case and activity
diagrams used for the development of the system.

6. Chapter 6: It includes tools and technologies used for the development of the system.

7. Chapter 7: It includes the analysis and the result of the experiment we tried in the
project.

8. Chapter 8: It includes the limitations and future enhancements of the project.

9. Chapter 9: It describes the conclusion of the project.



2. LITERATURE REVIEW

Recommender systems are a rich research area with abundant practical applications. They
are also defined as systems which promote recommendations of people (normally seen as
service providers) as well as recommendations of products/services. In computing,
recommender systems began to appear in the 1990s; they are applications that provide
personalized advice for users about products or services they might be interested in [9].

In 2005, Gonzalez [8] proposed a first model based on psychological aspects; he used
Emotional Intelligence to improve on-line course recommendations.

In 2008, Recommender System based on personality traits [10] was published, experimenting
on a recommender system with personality. It basically tried to recommend a person in a
voting scenario. Here, recommendation was based on the psychological aspects of candidates
and of an imaginary person whom voters dreamed of as the ideal candidate. The system used
the 30 facets of the Big Five personality traits, and only the Big Five personality traits, as
the psychological measures of the users.

In 2014, Improving Music Recommender Systems: What can we learn from research on
music tastes? [5] was published, which discusses music tastes from a psychological point of
view and uses the psychology of music to identify the correlates of music tastes and to
understand how music tastes are formed and evolve through time. It reveals the importance
of social influences on music tastes and provides basic suggestions for the design of music
recommender systems.

Also in 2014, Enhancing Music Recommender Systems with Personality Information and
Emotional States [6] was published, which researches improving music recommendation by
including personality and emotional states. The proposal offers great insight into how a
recommendation engine can be improved with personality via a series of steps.

In 2016, A Comparative Analysis of Personality Based Music Recommendation System [7]
was published, which describes a preliminary study on considering information about the
target user’s personality in a music recommendation system. It proposes five different kinds
of models for the personality based music recommendation system.

In this project, we continue the experimentation of A Comparative Analysis of Personality
Based Music Recommendation System, whereby we try to study the effect of a personality
based system on collaborative filtering.

3. THEORETICAL BACKGROUND

3.1. General

The project tries to study the impact of personality on the collaborative recommendation
engine. Thus, the personality of the user first has to be predicted, which can be done by
studying the user on social media, i.e. Facebook, where status updates might be a good metric
for predicting personality using the ”document classification” technique; the traits of the
person are then used as the similarity metric for similar-user computation in ”collaborative
filtering”.

3.2. Document Classification

Document Classification is an example of Machine Learning (ML) in the form of Natural
Language Processing (NLP). By classifying text, we aim to assign one or more classes
or categories to a document, making it easier to manage or sort. Broadly speaking, there are
two classes of ML techniques:

1. Unsupervised: In unsupervised methods, the model is responsible for clustering similar
documents.

2. Supervised: In supervised methods, a model is created based on a training set. Categories
are predefined and documents within the training dataset are manually tagged
with one or more category labels. A classifier is then trained on the dataset, which
means it can predict a new document’s category from then on. This is the technique
that has been used in the project.
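The supervised setting in point 2 can be illustrated with a toy sketch: training documents are token lists manually tagged with a category, and a new document is assigned the category whose training vocabulary it overlaps most. The scoring rule, categories and data below are hypothetical stand-ins, not the project's actual classifiers (those are described in Sections 3.2.5-3.2.7).

```python
# Toy supervised text classifier: counts how often a new document's words
# appeared in each category's manually tagged training documents.
# The categories and token lists below are illustrative only.
from collections import Counter, defaultdict

def train(labeled_docs):
    # labeled_docs: list of (token_list, category) pairs.
    counts = defaultdict(Counter)
    for tokens, label in labeled_docs:
        counts[label].update(tokens)
    return counts

def classify(tokens, counts):
    # Pick the category whose training word counts best cover the document.
    return max(counts, key=lambda label: sum(counts[label][t] for t in tokens))

model = train([
    (["great", "ideas", "curious"], "openness"),
    (["schedule", "prepared", "exact"], "conscientiousness"),
])
print(classify(["curious", "ideas"], model))  # prints "openness"
```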

Supervised document classification comprises a series of steps, which are briefly described
below:

3.2.1. Obtaining a dataset

The quality of the tagged dataset is by far the most important component of a statistical NLP
classifier. The dataset needs to be large enough to have an adequate number of documents in
each class. The dataset also needs to be of a high enough quality in terms of how distinct
the documents in the different categories are from each other to allow a clear delineation
between categories [11].

3.2.2. Preprocessing

Preprocessing or data preprocessing is a data mining technique that involves transforming
raw data into an understandable format (the exact requirements may differ from project to
project). It comprises various steps:

• Data Cleaning: Data is cleansed through processes such as filling in missing values,
smoothing noisy data or resolving inconsistencies in the data.

• Data Integration: Data with different representations are put together and conflicts
within the data are resolved.

• Data Transformation: Data is normalized, aggregated and generalized.

• Data Reduction: This step aims to present a reduced representation of the data in a
data warehouse.

• Data Discretization: This involves reducing the number of values of a continuous
attribute by dividing the range of the attribute into intervals.
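Several of the generic steps above (cleaning, integration, transformation) can be sketched with pandas, one of the tools used in the project (Section 6.4). The table, column names and values below are hypothetical, chosen only to show each step:

```python
import pandas as pd

# Hypothetical user-rating table with a duplicate row and missing values.
df = pd.DataFrame({
    "user":   ["a", "a", "b", "c", None],
    "rating": [4.0, 4.0, None, 2.0, 5.0],
})

df = df.drop_duplicates()            # integration: resolve duplicate records
df = df.dropna(subset=["user"])      # cleaning: drop rows with no user
df["rating"] = df["rating"].fillna(df["rating"].mean())  # fill missing values
# Transformation: min-max normalization of the rating column.
df["rating_norm"] = (df["rating"] - df["rating"].min()) / (
    df["rating"].max() - df["rating"].min()
)
print(df[["user", "rating", "rating_norm"]])
```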

The preprocessing carried out in the project are:

1. Removal of Stop Words: In computing, stop words are words which are filtered out
prior to, or after, processing of natural language data (text). There is no single definite
list of stop words which all tools use, and such a filter is not always used. Any group
of words can be chosen as the stop words for a given purpose. By removing the
stop words during data preprocessing we reduce the computational complexity of the
program, and hence the project can run more efficiently [12].

2. Convert every character to lowercase: This step is carried out in order to remove the
distinction between the same word written in upper and lower case, so that the model
doesn’t treat them differently.

3. Sentence level processing/tokenization: In lexical analysis, tokenization is the process
of breaking a stream of text up into words, phrases, symbols or other meaningful
elements called tokens. Here, in the project, sentences are tokenized into words
and some further preprocessing is applied after that.

4. Stemming: In linguistic morphology and information retrieval, stemming is the process
of reducing inflected (or sometimes derived) words to their stem, base or root
form, generally a written word form. The stem need not be identical to the morphological
root of the word; it is usually sufficient that related words map to the same stem,
even if this stem is not itself a valid root. Stemming is generally implemented as a
rule-based system [13].

5. PoS tagging: In corpus linguistics, part-of-speech tagging, also called grammatical
tagging or word-category disambiguation, is the process of marking up a word in a
text (corpus) as corresponding to a particular part of speech, based on its definition as
well as its context, i.e., its relationship with adjacent and related words in a phrase,
sentence, or paragraph [14]. This aids in the removal of unwanted parts of speech in
the sentences and helps to build a better model. The parts of speech used in our model
are verb, adverb and adjective.
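
The preprocessing steps above can be illustrated with a simplified, self-contained Python sketch. The stop-word list and suffix-stripping rules below are illustrative placeholders, not the project's actual resources, and PoS tagging is omitted since it requires a trained tagger:

```python
import re

# Illustrative stop-word list; a real system would use a much fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "in", "it"}

def naive_stem(word):
    """Very crude rule-based stemmer: strips a few common inflectional suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize, remove stop words, and stem (steps 1-4)."""
    tokens = re.findall(r"[a-z]+", text.lower())          # lowercase + tokenize
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop-word removal
    return [naive_stem(t) for t in tokens]                # stemming

print(preprocess("The children were singing and dancing in the hall"))
# → ['children', 'were', 'sing', 'danc', 'hall']
```

A production pipeline would replace the toy stemmer with a standard rule-based stemmer (e.g. Porter's) and add a PoS tagger to keep only verbs, adverbs and adjectives.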

3.2.3. Feature Extraction

1. Bag of Words: The bag of words model is a simplifying representation used in natural
language processing and information retrieval (IR). In this model, a text (such as a
sentence or a document) is represented as the bag (multiset) of its words, disregarding
grammar and even word order but keeping multiplicity. The bag of words model is
commonly used in methods of document classification where the frequency of
occurrence of each word is used as a feature for training a classifier. It is comparable
to a unigram language model [15].

2. Feature Vector Creation: Feature vector creation is the process of converting the bag
of words model into vector form, whereby each word is represented by its frequency.
For feature vector creation, a vocabulary is initially built using all of the corpus
available in the dataset, which helps to create a vector space model for words, and
feature vectors are derived from each document accordingly [15].
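
As a concrete illustration, the two steps can be sketched as follows. The tiny corpus is hypothetical; in the project the vocabulary would be built from the full dataset:

```python
from collections import Counter

def build_vocabulary(corpus):
    """Collect every distinct word across the corpus, in a fixed order."""
    return sorted({word for document in corpus for word in document.split()})

def feature_vector(document, vocab):
    """Represent a document as word frequencies over the vocabulary."""
    counts = Counter(document.split())
    return [counts[word] for word in vocab]

corpus = ["happy happy song", "sad song tonight"]
vocab = build_vocabulary(corpus)
print(vocab)                              # → ['happy', 'sad', 'song', 'tonight']
print(feature_vector(corpus[0], vocab))   # → [2, 0, 1, 0]
print(feature_vector(corpus[1], vocab))   # → [0, 1, 1, 1]
```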

3.2.4. Model Creation

This is the classification stage of the project, where a classification algorithm is used to
classify personality. Here we have implemented Naive Bayes and logistic regression as the
classification algorithms.

3.2.5. Logistic Regression

In general, when we make a machine learning based software, we are basically trying to
come up with a function to predict the output for future inputs based on the experience it has
gained through the past inputs and their outputs. The past data is referred to as the training
set.

Logistic regression [31] (also known as logit regression or the logit model) is one of the most
popular machine learning algorithms used for classification problems. Given a training set
having one or more independent (input) variables, where each input set belongs to one of the
predefined classes (categories), the logistic regression model tries to come up with a
probability function that gives the probability for a given input set to belong to one of those
classes. The basic logistic regression model is a binary classifier (having only 2 classes), i.e.,
it gives the probability of an input set belonging to one class instead of the other. If the
probability is less than 0.5, we predict the input set to belong to the latter class. But
logistic regression can be extended to multi-class classification problems as well by using
concepts like “one vs. rest”: we create a classifier for each class that predicts the probability
of an input set belonging to that particular class instead of all other classes. It is popular
because it is a relatively simple algorithm that performs very well on a wide range of
problem domains.

Logistic regression is one of the techniques borrowed by machine learning from the field of
statistics; it was developed by statistician David Cox in 1958. The binary logistic model is
used to estimate the probability of a binary response based on one or more predictor (or
independent) variables (called features).

The name “logistic” comes from the probability function used by this algorithm. The logistic
function (also known as sigmoid function) is defined as:

\mathrm{logistic}(x) = \mathrm{sigmoid}(x) = \frac{e^x}{1 + e^x} = \frac{1}{1 + e^{-x}} \quad (3.1)

The graph of this function is given in Figure 3.1.

The logistic regression classifier uses the logistic function of the weighted (as well as biased)
sum of the input variables to predict the probability of the input set belonging to a class
(or category). The probability function is already fixed. The only thing that we can change
while learning from different training sets is the set of weight parameters (θ) assigned to each
feature.

Figure 3.1: Graph of logistic (sigmoid) function

3.2.5.1 The Algorithm

Let,

m = number of samples in the training set

n = number of features \geq 1

x = \text{input feature vector} = (x_0, x_1, x_2, \dots, x_n)^T, \quad x_0 = 1 \text{ (always)}

y = \text{class} \in \{1, 0\} \text{ with 1 as the primary class}

The training set for this machine learning algorithm is the set of m training samples (examples).

\text{training set} = \left\{ (x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \cdots, (x^{(m)}, y^{(m)}) \right\} \quad (3.2)

The weight parameters for the (n + 1) features are given by:

θ = (θ_0, θ_1, θ_2, \dots, θ_n)^T

Then the hypothesis function used to predict the output y for input feature set x with
parameter θ is given by:

h_θ(x) = \mathrm{sigmoid}\left( \sum_{i=0}^{n} θ_i x_i \right) = \mathrm{sigmoid}(θ^T x) = \frac{1}{1 + e^{-θ^T x}} \quad (3.3)

Now the aim of this machine learning algorithm is to adjust the parameters θ to fit the
hypothesis h_θ(x) to the real output y of the training set with minimum cost (error). For that,
we need to define a cost function, preferably a convex one. There are different types of cost
functions. Linear regression, for instance, uses the sum of the squares of the errors as the
cost function. But in logistic regression, since the output is not linear (even though the input
is), this cost function turns out to be non-convex, and there are no efficient algorithms that
can minimize a non-convex function. Therefore, we define a logarithmic cost function J(θ)
for logistic regression as follows:

J(θ) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\left( h_θ(x^{(i)}), y^{(i)} \right) \quad (3.4)

where,

\mathrm{Cost}(h, y) = \begin{cases} -\log(h) & \text{for } y = 1 \\ -\log(1 - h) & \text{for } y = 0 \end{cases} = -y \log(h) - (1 - y) \log(1 - h) \quad (3.5)

After we define an appropriate convex cost function J(θ ), the machine learning algorithm
basically boils down to finding parameter θ that minimizes J(θ ).

\min_{θ} J(θ) \quad (3.6)

This can be achieved using various optimization algorithms. Some notable ones are gradient
descent, BFGS, Quasi-Newton, L-BFGS, etc. Gradient descent is the simplest one. It is
a hill-climbing optimization algorithm that tries to find a local optimum from the starting
point. But since our cost function J(θ) is convex, there is only one minimum, and that is the
global minimum.

To find the parameter θ that minimizes the cost function J(θ), we initialize θ with small
random values and then iterate the following until convergence:

θ_j := θ_j - α \frac{\partial J(θ)}{\partial θ_j}; \quad j \in \{0, 1, 2, \dots, n\}; \quad α = \text{learning rate}

where,

\frac{\partial J(θ)}{\partial θ_j} = \sum_{i=1}^{m} \left( h_θ(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \quad (3.7)

Note that all θ j ’s must be updated simultaneously. This concept is called batch learning,
contrary to online learning where the parameter is updated separately for every training
example.

The resulting parameter θ that minimizes the cost function J(θ ) is the parameter of the
learning model. Then we can use the hypothesis hθ (x) to predict the output y for any input
feature set x. The output y will be a value in the range (0, 1). The output can be interpreted
as the probability of the given input set belonging to the class 1 (primary class).

Other advanced optimization algorithms such as BFGS, L-BFGS, Quasi-Newton, etc. are
more efficient than basic gradient descent and also have the advantage that we don’t have
to manually select the learning rate (α). These advanced optimization algorithms will
automatically select the appropriate value of α to maximize efficiency.
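
The batch gradient descent procedure described above can be sketched as follows. This is a minimal illustration on a hypothetical one-feature training set; the 1/m factor on the gradient is kept explicit here, although it can equivalently be absorbed into the learning rate α:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x):
    # Hypothesis h_theta(x) = sigmoid(theta^T x); x[0] is the bias input, always 1.
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent on the logistic cost; all theta_j updated together."""
    theta = [0.0] * len(X[0])
    m = len(X)
    for _ in range(iterations):
        errors = [predict(theta, X[i]) - y[i] for i in range(m)]
        theta = [theta[j] - alpha * sum(errors[i] * X[i][j] for i in range(m)) / m
                 for j in range(len(theta))]
    return theta

# Toy training set: x0 = 1 always; class is 1 for the larger feature values.
X = [[1, 1], [1, 2], [1, 3], [1, 4]]
y = [0, 0, 1, 1]
theta = gradient_descent(X, y)
print(round(predict(theta, [1, 4])))  # → 1
print(round(predict(theta, [1, 1])))  # → 0
```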

3.2.5.2 Other Considerations

3.2.5.2.1 Feature Scaling

Feature scaling is the process of scaling (or normalizing) all the features to a range of [−1, 1]
or [0, 1]. This is required because unscaled features cause some features to implicitly get
higher priority, which reduces the accuracy of the learning algorithm. Feature scaling can be
done in the following ways:

x_j^{(i)} = \frac{x_j^{(i)} - \min(x_j)}{\max(x_j) - \min(x_j)} \in [0, 1]

or

x_j^{(i)} = \frac{x_j^{(i)} - \bar{x}_j}{\max(x_j) - \min(x_j)} \in [-1, 1]

or

x_j^{(i)} = \frac{x_j^{(i)} - \bar{x}_j}{σ_{x_j}} \quad \text{(for a normal distribution)}
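
The first and third scaling schemes above can be sketched as follows (the feature values are illustrative):

```python
def min_max_scale(values):
    """Scale a feature column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Scale a feature column to zero mean and unit variance (for normal data)."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

ages = [10, 20, 30, 40]
print(min_max_scale(ages))   # endpoints map to 0.0 and 1.0
print(standardize(ages))     # zero-mean, unit-variance values
```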

3.2.5.2.2 Regularization

Regularization [33] is the process of scaling down the values of the parameter θ to reduce the
problem of over-fitting. Over-fitting is the condition when the learning algorithm fits the
training set very precisely but doesn’t fit test data (not included in the training set).
Regularization is done by introducing a regularization parameter λ term in the overall cost
function J(θ).

J(θ) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\left( h_θ(x^{(i)}), y^{(i)} \right) + \frac{λ}{2m} \sum_{j=1}^{n} θ_j^2 \quad (3.8)

The corresponding first-order derivatives then become:

\frac{\partial J(θ)}{\partial θ_j} = \begin{cases} \sum_{i=1}^{m} \left( h_θ(x^{(i)}) - y^{(i)} \right) x_j^{(i)} & \text{for } j = 0 \\ \sum_{i=1}^{m} \left( h_θ(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{λ}{m} θ_j & \text{for } j \in \{1, 2, \dots, n\} \end{cases}

Regularization helps to solve the over-fitting problem in machine learning. A too-simple
model will be a very poor generalization of the data; at the same time, a complex model may
not perform well on test data due to over-fitting. It is necessary to choose the right model
between the simple and complex extremes. Regularization helps to choose the preferred
model complexity, so that the model is better at predicting. Regularization is nothing but
adding a penalty term to the objective function and controlling the model complexity using
that penalty term.
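
As a sketch, the regularized cost of equation 3.8 can be computed as follows. The values are illustrative; by the usual convention the bias parameter θ_0 is not penalized, matching the derivative above, which has no λ term for j = 0:

```python
import math

def regularized_cost(predictions, labels, theta, lam):
    """Logistic cost J(theta) with an L2 penalty; theta[0] (bias) is not penalized."""
    m = len(labels)
    data_cost = -sum(y * math.log(h) + (1 - y) * math.log(1 - h)
                     for h, y in zip(predictions, labels)) / m
    penalty = lam / (2 * m) * sum(t * t for t in theta[1:])
    return data_cost + penalty

preds = [0.9, 0.2, 0.8]   # hypothetical h_theta(x) outputs
labels = [1, 0, 1]
theta = [0.5, 2.0, -1.0]  # hypothetical parameters
print(regularized_cost(preds, labels, theta, lam=1.0))
```

Increasing λ raises the cost of large parameter values, pushing the optimizer towards a simpler model.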

3.2.6. Naive Bayes

Naive Bayes is one of the models used for classification under the “Bayesian classifier”
family. In machine learning, Naive Bayes classifiers are a family of simple probabilistic
classifiers based on Bayes’ theorem with the assumption of “independence” between the
features. If dependence between the features exists, a Bayesian network can be used for the
classification instead. The major advantages of Naive Bayes are its simplicity, high
scalability and ability to work on huge datasets. Despite the oversimplified assumption,
Naive Bayes has worked quite well in many complex real world situations. Naive Bayes
classifiers are probabilistic, which means that they calculate the probability of each category
for a given sample, and output the category with the highest one. A Naive Bayes classifier is
comparable to a unigram language model created on each set of classes. It is widely used for
text classification in various fields like email sorting, language detection, etc.
There are various variations of Naive Bayes:

1. Multi-variate Bernoulli Naive Bayes: It is used whenever the feature vectors are binary,
i.e., the occurrence of a feature is important rather than its count.

2. Multinomial Naive Bayes: It is typically used for discrete counts, i.e., whenever the
frequency of occurrence of the features is important.

3. Gaussian Naive Bayes: In this model, it is assumed that the feature follows a normal
distribution. Instead of discrete counts, there are continuous features.

In the project, multinomial Naive Bayes is used as the classifier, as for personality prediction
the frequency of occurrence of each feature in the feature vector is important and the
distribution of the features is discrete.

3.2.7. Multinomial Naive Bayes

3.2.7.1 Problem Formulation

In order to understand how Naive Bayes classifiers [16] work, it is important to briefly
understand Bayes’ rule. The probability model was formulated by Thomas Bayes.
Given the set of features (x_1, x_2, x_3, \cdots, x_n),
Bayes’ theorem can be written mathematically as:

P(C_k \mid x) = \frac{P(C_k) \, P(x \mid C_k)}{P(x)} \quad (3.9)

where,
P(C_k|x) is the posterior probability of class C_k given the attributes x,
P(C_k) is the prior probability of the class,
P(x|C_k) is the likelihood, i.e., the conditional probability of the attributes given the class C_k,
P(x) is called the evidence, and
k is used to denote the class label.

Naive Bayes makes the independence assumption, so that 3.9 can be written as:

P(C_k \mid x) = \frac{P(C_k) \, P(x_1 \mid C_k) \, P(x_2 \mid C_k) \cdots P(x_n \mid C_k)}{P(x)} \propto P(C_k) \, P(x_1 \mid C_k) \, P(x_2 \mid C_k) \cdots P(x_n \mid C_k) \quad (3.10)

which is the required equation of Naive Bayes used for the classification of documents.

3.2.7.2 Additive Smoothing

In statistics, additive smoothing [17], also called Laplace smoothing, is a technique used
to smooth categorical data. Given an observation x = (x_1, x_2, \cdots, x_d) from a multinomial
distribution with N trials and parameter vector θ = (θ_1, θ_2, \cdots, θ_d), a smoothed version of
the data gives the estimator:

θ_i = \frac{x_i + α}{N + αd} \quad (3.11)

When α = 1 in 3.11, it is called add-one Laplace smoothing, which has been used as the
smoothing technique in the project in order to cancel out the effect of zero terms by assigning
them a small probability.

3.2.7.3 Underfitting

Underfitting [16] in the Naive Bayes classifier can occur if the probabilities resulting from
the conditional and prior terms are very small. In this case, in order to prevent the program
from underfitting, resulting from the multiplication of very small terms, logarithms can be
used in 3.10, after which the final equation becomes:

P(C_k \mid x) = \log P(C_k) + \sum_{i=1}^{n} \log P(x_i \mid C_k) \quad (3.12)

which is the final equation used in the project for the classification of a user’s Facebook
statuses into a personality.
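
Putting equations 3.10–3.12 together, a multinomial Naive Bayes classifier with add-one smoothing and log probabilities can be sketched as follows. The training samples and class labels here are purely illustrative:

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (tokens, class). Returns document counts, word counts, vocab."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)   # word_counts[c][w] = count of w in class c
    vocab = set()
    for tokens, c in samples:
        class_docs[c] += 1
        word_counts[c].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def classify(tokens, class_docs, word_counts, vocab):
    """argmax_c [log P(c) + sum_i log P(x_i | c)], with add-one smoothing."""
    total_docs = sum(class_docs.values())
    best_class, best_score = None, float("-inf")
    for c in class_docs:
        total_words = sum(word_counts[c].values())
        score = math.log(class_docs[c] / total_docs)   # log prior
        for w in tokens:
            # Laplace add-one smoothing avoids zero probabilities (eq. 3.11).
            score += math.log((word_counts[c][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

samples = [(["love", "party", "friends"], "extrovert"),
           (["quiet", "reading", "alone"], "introvert"),
           (["party", "dancing", "fun"], "extrovert")]
model = train(samples)
print(classify(["party", "fun"], *model))   # → extrovert
```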

3.2.7.4 Overfitting

In order to reduce overfitting and find the best model for the classifier, the k-fold cross-validation
technique has been used. The major advantage of this method is that all observations
are used for both training and testing, and each observation is used for testing exactly
once [18].
In the project, 5-fold cross-validation has been applied, in which the dataset is divided into
5 folds of test and training cases and the classifier is trained and evaluated on each of those
folds.
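
A minimal sketch of the k-fold splitting used here (the data values are illustrative):

```python
def k_fold_splits(data, k=5):
    """Partition data into k folds; each fold serves as the test set exactly once."""
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

data = list(range(10))
for train, test in k_fold_splits(data, k=5):
    print(len(train), len(test))  # → 8 2, for each of the 5 splits
```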

3.2.7.5 Optimization

The Naive Bayes classifier, as seen in 3.10, classifies a feature set into a class via the
multiplication of the prior and conditional probabilities, which requires their computation
each time the classifier tries to classify a feature set.

In order to solve the above problem, the conditional and prior probabilities are precomputed
and stored in a “HashTable” [16], where the conditional probability of each feature is stored
and can easily be retrieved and used for classification. Here the hash table has been
implemented as a dictionary object in Python, as dictionaries at a low level are stored as
hash–value pairs in memory.

After the detection of the personality, this information is used as a metric for the computation
of similar users in the recommendation engine (collaborative filtering) in order to observe its
effect on the recommendation.

3.3. Data Analysis

Data analysis [19] is a primary component of data mining and business intelligence and is
key to gaining the insight that drives business decisions. Data analysis is a proven way
for organizations and enterprises to gain the information they need to make better decisions,
serve their customers and increase productivity and revenue. Besides, with the growth of the
internet, there is a vast amount of digital data and information available, and data analysis
has become more necessary than ever. Some of the data analysis techniques are:

• Descriptive: An analysis technique that uses aggregation and data mining to provide
insight into the past and answer “What has happened?”. It involves the calculation of
simple measures of composition and distribution of variables. They are often used to
describe relationships in data, such as total stock in inventory, average money spent
per customer, etc.

• Predictive: The process of extracting information from existing data sets in order
to determine patterns and predict future outcomes and trends. It encompasses a
variety of statistical techniques from predictive modeling, machine learning and data
mining, such as predicting what items customers will purchase together, or how sales
might close at the end of the year.

• Prescriptive: A data analysis technique that prescribes a number of different possible
actions to the user and guides them towards a solution. In a nutshell, these analytics
are all about providing advice.

– Recommender System: One of the prescriptive data analysis techniques, used to
recommend items to users.

3.4. Recommender System

Recommender systems were originally defined as systems in which “people provide
recommendations as inputs, which the system then aggregates and directs to appropriate
recipients”, e.g., using experts’ knowledge as input for the system to enrich its ability to
recommend to people according to the given knowledge. However, the term now has a
broader connotation, describing any system that produces individualized recommendations
as output or has the effect of guiding the user in a personalized way to interesting or useful
objects in a large space of possible options [20].

Recommender systems are information filtering systems that deal with the problem of
information overload by filtering vital information fragments out of the large amount of
dynamically generated information according to the user’s preferences, interests, or observed
behavior about items. A recommender system has the ability to predict whether a particular
user would prefer an item or not based on the user’s profile [23].

Recommender systems are beneficial to both service providers and users. Recommendation
systems have also been shown to improve the decision making process and its quality. In an
e-commerce setting, recommender systems help enhance revenues, as they are effective
means of selling more products. In scientific libraries, recommender systems support users
by allowing them to move beyond catalog searches. Therefore, the need for efficient and
accurate recommendation techniques within a system that will provide relevant and
dependable recommendations for users cannot be over-emphasized.

Recommender systems typically produce a list of recommendations in one of two ways:
through collaborative or content-based filtering. Collaborative filtering approaches build a
model from a user’s past behavior (items previously purchased or selected and numerical
ratings given to those items) as well as similar decisions made by other users. This model is
then used to predict items (or ratings for items) that the user may have an interest in.
Content-based filtering approaches utilize a series of discrete characteristics of an item, i.e.,
an item profile, together with the purchase history of the user, i.e., a user profile, in order to
recommend items. A hybrid recommender system is one in which two or more
recommendation techniques are combined. Besides these, there are several categorizations
of recommendation systems, which are listed below:

1. Personalized Recommendation: It involves the online suggestion of data, in any format,
that is relevant to each and every user, based on the user’s implicit behavior and
provided details.

(a) Knowledge Based Recommendation(Searching)


(b) Utility Based Recommendation
(c) Demographic Based Recommendation
(d) Content Based Recommendation
(e) Collaborative Recommendation
i. Memory Based(user based, item based)
ii. Model Based(clustering techniques, association techniques, Bayesian net-
works, neural networks, latent factor)
(f) Hybrid Recommendation
i. Weighted
ii. Switching
iii. Mixed
iv. Feature Combinations
v. Cascade
vi. Feature Augmentation
vii. Meta Level

2. Non-Personalized Recommendation: A recommendation system that recommends
items to consumers based on what other consumers have said about the product on
average, i.e., the recommendations are independent of the customer, so all customers
get the same recommendations.

3.4.1. Knowledge based Recommendation(Searching)

A knowledge based recommendation system [22] is based on explicit knowledge about
item classification, user interests and recommendation criteria (which item should be
recommended for which feature). It is an alternative approach to collaborative filtering.

3.4.1.1 Working Mechanism

1. Recommendation: Here recommendation is made based on explicit knowledge.

3.4.1.2 Pros

• Free from cold-start problem.

• Sensitive to changes of preference.

• Can include non-product features.

• Can map from user needs to products.

3.4.1.3 Cons

• Suggestion ability is static (no learning model).

3.4.2. Utility based Recommendation

Utility based recommender systems make recommendations based on the computation of
the utility of each item for the user. Utility based techniques use a multi-attribute utility
function, based on the item ratings that users offer, to describe user preferences, and apply
the utility function to calculate the item utility for each user.

3.4.2.1 Working Mechanism

• Recommendation: Compute the utility of each object for the user and recommend
accordingly.

3.4.2.2 Pros

• Sensitive to changes of preference.

• Can include non-product features.

• No ramp-up required.

3.4.2.3 Cons

• Suggestion ability is static.

• User must input utility function.

3.4.3. Demographic based Recommendation

The demographic recommendation technique [21] uses information about the user only. The
demographic attributes of users include gender, age, knowledge of languages, disabilities,
ethnicity, mobility, employment status, home ownership and even location. The system
recommends items according to demographic similarities of the users.

3.4.3.1 Working Mechanism

1. User profile creation: User profile is created based on their demographic information.

2. User-item matrix construction: The user-item rating matrix is constructed based on the
rating of items by the user.

3. Recommendation: In order to recommend an item to a user, similar users are computed
with the help of cosine similarity; then the rating for that item is computed from the
ratings of the neighborhood of similar users (average or weighted average).

3.4.3.2 Pros

• Can identify cross-genre items.

• Domain knowledge about item is not needed.

• Adaptive: quality improves over time.

3.4.3.3 Cons

• Gathering of demographic data might lead to privacy issues.

• Gray sheep problem

• Stability vs Plasticity problem



3.4.4. Content based Filtering

The content based technique [22] is a domain-dependent algorithm that emphasizes the
analysis of the attributes of items in order to generate predictions. When documents
such as web pages, publications and news are to be recommended, the content-based filtering
technique is the most successful. In content-based filtering, recommendation is based
on user profiles, using features extracted from the content of the items the user has
evaluated in the past.

3.4.4.1 Working Mechanism

1. Item Profile Creation: Initially, an item profile is created with the help of the item’s
features. In the case of movies or music, the available metadata can be used for item
profile creation.

2. User Profile Creation: A user profile is created based on the user’s interaction with the
items, i.e., with the help of their ratings on the items. The user profile is thus built from
the item profiles, either by taking the average or the weighted average of the item
profiles.

3. Recommendation: Cosine similarity is used for the similarity computation between
the user profile and the profiles of items to be recommended; items with the highest
similarity are recommended to the user.
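
The similarity computation in step 3 can be sketched as follows (the profile vectors, feature meanings and item names are hypothetical):

```python
import math

def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| * |b|), used to match user and item profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical profiles over features (e.g. genres: rock, pop, jazz).
user_profile = [4.0, 1.0, 0.0]
item_profiles = {"song_a": [5.0, 0.0, 0.0], "song_b": [0.0, 1.0, 5.0]}

ranked = sorted(item_profiles,
                key=lambda item: cosine_similarity(user_profile, item_profiles[item]),
                reverse=True)
print(ranked)  # → ['song_a', 'song_b']
```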

3.4.4.2 Pros

• Implicit feedback is sufficient.

• Adaptive: quality improves over time.

3.4.4.3 Cons

• New user ramp-up problem (cold start problem).

• Quality is dependent on a large historical data set.

• Stability vs plasticity problem.



Content based filtering can outperform collaborative filtering whenever the ratio of items to
users is very high.

3.4.5. Collaborative Filtering

Collaborative filtering [23] is a domain-independent prediction technique used for content
that cannot easily and adequately be described by metadata, such as movies and music. The
collaborative filtering technique works by building a database (user-item matrix) of
preferences for items by users. It then matches users with relevant interests and preferences
by calculating similarities between their profiles to make recommendations. Such users build
a group called a neighborhood. A user gets recommendations of items that he has not rated
before but that were already positively rated by users in his neighborhood. Recommendations
produced by CF can be either predictions or recommendations. A prediction is a numerical
value R_{i,j} expressing the predicted score of item j for user i, while a recommendation is
a list of the top N items that the user will like the most, as shown in the figure below.
The technique of collaborative filtering can be divided into two categories: memory based
and model based.

• Memory Based: This approach uses user rating data to compute the similarity between
users or items, which is then used for making recommendations. This was an early
approach used in many commercial systems. It is effective and easy to implement.
Typical examples of this approach are neighborhood-based CF and item-based/user-based
top-N recommendations. The user based top-N recommendation algorithm uses a
similarity based vector model to identify the k most similar users to an active user.
After the k most similar users are found, their corresponding user-item matrices are
aggregated to identify the set of items to be recommended. The advantages of this
approach include: the explainability of the results, which is an important aspect of
recommendation systems; easy creation and use; easy facilitation of new data;
content-independence of the items being recommended; and good scaling with
co-rated items. There are several disadvantages with this approach. Its performance
decreases when data gets sparse, which occurs frequently with web-related items. This
hinders the scalability of this approach and creates problems with large datasets.

• Model Based: In this approach, models are developed using different data mining and
machine learning algorithms to predict a user’s rating of unrated items. There are many
model-based CF algorithms: Bayesian networks, clustering models, latent semantic
models such as singular value decomposition, probabilistic latent semantic analysis,
multiple multiplicative factor, etc.

In this approach, methods like singular value decomposition and principal component
analysis, known as latent factor models, compress the user-item matrix into a
low-dimensional representation in terms of latent factors. One advantage of using this
approach is that instead of having a high dimensional matrix containing an abundant
number of missing values, we will be dealing with a much smaller matrix in a
lower-dimensional space. A reduced representation can be utilized for either user-based
or item-based neighborhood algorithms. It handles the sparsity of the original matrix
better than memory based approaches.

3.4.5.1 Pros

• Can identify cross-genre items.

• Domain knowledge not needed

• Adaptive: quality improves over time.

• Implicit feedback sufficient.

3.4.5.2 Cons

• New user/item ramp-up problems (cold-start problem)

• Gray sheep problem

• Stability vs Plasticity problem

3.4.6. Hybrid Recommendation System

All of the known recommendation techniques have strengths and weaknesses, and many
researchers choose to combine the techniques in different ways. The different approaches
used for modeling a hybrid recommendation system are:

• Weighted: The scores of several recommendation techniques are combined together to


produce a single recommendation.

• Switching: The system switches between recommendation techniques depending on


the current situation.

• Mixed: Recommendation from several different recommender systems are presented


at the same time.

• Feature Combination: Features from different recommendation data sources are
thrown together into a single recommendation algorithm.

• Cascade: One recommender refines the recommendations given by another.

• Feature augmentation: Output from one technique is used as an input feature to an-
other.

• Meta-level: The model learned by one recommender is used as input to another.
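
As an example of the weighted approach, scores from two recommenders can be combined as follows (the scores, item names and weights are illustrative):

```python
def weighted_hybrid(score_lists, weights):
    """Combine several recommenders' scores into one ranking (the 'weighted' approach)."""
    combined = {}
    for scores, w in zip(score_lists, weights):
        for item, s in scores.items():
            combined[item] = combined.get(item, 0.0) + w * s
    return sorted(combined, key=combined.get, reverse=True)

cf_scores = {"song_a": 0.9, "song_b": 0.4}       # hypothetical collaborative scores
content_scores = {"song_a": 0.2, "song_b": 0.8}  # hypothetical content-based scores
print(weighted_hybrid([cf_scores, content_scores], weights=[0.7, 0.3]))
# → ['song_a', 'song_b']
```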

3.5. Issues in recommendation system

The issues [22] that can arise in a recommendation system are described as follows:

1. Data Collection: The data used by recommendation engines can be categorized into
explicit and implicit data. Explicit data is all data the users themselves feed into the
system. The collection of explicit data must not be intrusive or time consuming. The
main implicit data source in e-commerce is transaction data. Implicit data needs to be
analyzed before it can be used to describe user features or user-item ratings.

2. Cold Start/Ramp-Up: The cold start problem occurs when too little or no rating data is
available in the initial state. The recommendation system then lacks the data to
produce appropriate recommendations. This mostly occurs in learning models. The
two cold start problems are the new user problem and the new item problem.

3. Stability vs Plasticity: The converse of the cold start problem is the stability vs
plasticity problem. When consumers have rated many items, the preferences in their
established user profiles are difficult to change.

4. Sparsity: In most use cases for recommendation systems, due to the catalog sizes of
e-business vendors, the count of ratings already obtained is very small relative to the
count of ratings that need to be predicted. But collaborative filtering techniques focus
on an overlap in ratings and have difficulties when the rating space is sparse (few
users have rated the same items). Sparsity in the user-item rating matrix degrades the
quality of the recommendations.

5. Performance and Scalability: Performance and scalability are important issues for
recommendation systems, as e-commerce websites must be able to determine
recommendations in real-time and often deal with huge data sets of millions of users
and items. The big growth rates of e-business are making the sets even larger in the
user dimension.

6. User Input Consistency: Recommendation techniques that work with user-to-user
correlations, like collaborative filtering or demographic techniques, depend on the
correlation coefficients between the users in a data set. Users can be categorized into
three classes based on their correlation coefficients with other users. The majority of
users fall into the class of “white sheep”, where there is a high rating correlation with
other users; recommendation engines can easily find recommendations for them. The
opposite type is the “black sheep”, for whom there are only a few or no correlating
users, which makes it quite difficult to find recommendations. The bigger problem is
the “gray sheep”: users who have different opinions or an unusual taste that results in
low correlation coefficients with many users. They fall on a border between user
tastes. Recommendations for them are very difficult to find, and they also cause
different recommendations for their correlated users.

7. Privacy: Privacy is an important issue in recommendation systems. To provide
personalized recommendations, recommendation systems must know something about
their users. In fact, the more the system knows, the more precise the recommendations
can get. Users are concerned about what information is gathered, how it is used, and
whether it is stored. These privacy concerns affect the collection of both explicit and
implicit data. Regarding explicit data, users may be reluctant to disclose information
about themselves and their interests. If questionnaires get too personal, users may give
false information in order to protect their privacy.

3.6. Recommendation Model for Experimentation

The purpose of this study is to understand how personality impacts the collaborative fil-
tering model and to compare it with some other popular models (global baseline, latent
factor). Hence the recommendation models used in the project are:

• Global Baseline Algorithm

• User to User collaborative filtering(with and without personality)

• Combination of Global Baseline and user to user collaborative filtering

• Matrix Factorization

Altogether, eight different recommendation models are created, among which four are
created by combining the global baseline algorithm and user-to-user collaborative filtering
(with and without personality).

3.6.1. Global Baseline Algorithm

The Global Baseline algorithm provides a mechanism to compute an unknown rating with
baseline (i.e. “global effects”) estimates of the corresponding user and item. Mathematically,
suppose µ is the system-wide average rating, bx is the overall deviation of user x's ratings
from the system average, and bi is the deviation in rating for item i; then the global baseline
algorithm rates an item i for a user x as:

GlobalBaselineEstimate[Rx,i ] = µ + bx + bi (3.13)
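Equation 3.13 can be illustrated with a short NumPy sketch (NumPy is one of the project's tools); the function name and the use of np.nan for unknown ratings are illustrative assumptions rather than the system's actual code:

```python
import numpy as np

def global_baseline(R):
    """Baseline estimates mu + bx + bi for every (user, item) pair.

    R is a users-by-items rating matrix with np.nan marking unknown ratings.
    """
    mu = np.nanmean(R)                    # system-wide average rating
    b_x = np.nanmean(R, axis=1) - mu      # each user's deviation from mu
    b_i = np.nanmean(R, axis=0) - mu      # each item's deviation from mu
    return mu + b_x[:, None] + b_i[None, :]

R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 4.0],
              [np.nan, 2.0, 3.0]])
est = global_baseline(R)   # est[0, 2] is the baseline estimate for user 0, item 2
```

Here est[0, 2] = 3.5 + 0.5 + 0.0 = 4.0: the system average, plus user 0's deviation, plus item 2's deviation.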

3.6.2. User to User collaborative filtering

• User-to-rating matrix computation: The user-rating matrix is computed from the rating
data of different users available in the database or dataset.

• Normalization of the ratings: This is done to make the average rating of the system
zero, so that the unknown values can be padded with zeros. Mathematically, suppose
µx is the average rating of user x and Rx,i represents the rating of user x on item i;
then the normalized rating of user x on item i can be computed as:

NormalizedRating[NRx,i ] = Rx,i − µx (3.14)

• Computing similar users: In order to compute similar users, two metrics have been
used in the project: similarity based on the users' rating vectors and similarity based
on personality. In both cases, similar users are computed with the help of cosine
similarity after normalization of the ratings. Mathematically, suppose
ra = [ra1 , ra2 , · · · , ran ] is the rating vector of user a and rb = [rb1 , rb2 , · · · , rbn ]
is the rating vector of user b; then the cosine similarity between users a and b is:

similaritya,b = (ra1 rb1 + ra2 rb2 + · · · + ran rbn ) / ( √(ra1² + ra2² + · · · + ran²) · √(rb1² + rb2² + · · · + rbn²) ) (3.15)

Similarly, users with similar personality are found with the help of the personality
vector.

• Rating prediction: The rating of user x on item i is computed with the help of N
neighbors by taking the similarity-weighted average of the neighbors' ratings:

rx,i = ( ∑y=1..N sx,y · ry,i ) / ( ∑y=1..N sx,y ) (3.16)

• Recommendation: After prediction of the ratings, the top-N items can be recommended
to the users.
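The steps above can be sketched end-to-end with NumPy; the function name, the zero-padding of unknown ratings, and the small epsilon guard against division by zero are our own assumptions, not details fixed by the report:

```python
import numpy as np

def predict_user_user(R, x, i, N=2):
    """Predict user x's rating of item i from the N most similar raters.

    R: users-by-items rating matrix with np.nan for unknown ratings.
    Similarity is cosine similarity on mean-centred ratings (eq. 3.14-3.15);
    the prediction is the similarity-weighted average of eq. 3.16.
    """
    mu = np.nanmean(R, axis=1)                     # per-user average rating
    C = np.nan_to_num(R - mu[:, None])             # normalise; pad unknowns with 0
    norms = np.linalg.norm(C, axis=1)
    sims = C @ C[x] / (norms * norms[x] + 1e-12)   # cosine similarity to user x
    sims[x] = -np.inf                              # exclude user x themself
    raters = [y for y in np.argsort(sims)[::-1] if not np.isnan(R[y, i])][:N]
    num = sum(sims[y] * R[y, i] for y in raters)
    den = sum(sims[y] for y in raters)
    return num / den if den else mu[x]
```

For the personality-based variant, the same cosine computation is applied to the users' Big Five personality vectors instead of their mean-centred rating vectors.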

3.6.3. Combination of Global Baseline and User to User collaborative filtering

The equations 3.13 and 3.16 can be combined and used together as:

rx,i = baselinex,i + ( ∑y=1..N sx,y · (ry,i − baseliney,i ) ) / ( ∑y=1..N sx,y ) (3.17)

where,
rx,i is the rating on item i by user x
baselinex,i is the baseline estimate on item i by user x
baseliney,i is the baseline estimate on item i by user y
sx,y is the similarity between user x and y
N is the total number of neighbors used for the recommendation
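A minimal sketch of equation 3.17, assuming the baseline matrix (eq. 3.13) and the user-user similarities have already been precomputed; the function and argument names are illustrative:

```python
import numpy as np

def predict_combined(R, baseline, sims, x, i, neighbors):
    """Eq. 3.17: the baseline estimate for (x, i) plus the similarity-weighted
    average of the neighbours' deviations from their own baselines.

    baseline: precomputed matrix of mu + bx + bi values,
    sims:     precomputed user-user similarity matrix,
    neighbors: indices of users who have rated item i.
    """
    num = sum(sims[x, y] * (R[y, i] - baseline[y, i]) for y in neighbors)
    den = sum(sims[x, y] for y in neighbors)
    return baseline[x, i] + num / den
```

The deviation term (ry,i − baseliney,i) is what lets this model correct the global estimate with neighborhood information rather than averaging raw ratings.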

3.6.4. Matrix Factorization

Matrix factorization [24] involves factorizing a matrix to find two or more matrices such
that, when the factors are multiplied together, the original matrix is obtained. In a rec-
ommender system, matrix factorization is employed to predict the missing ratings such
that the predicted values are consistent with the existing ratings in the matrix. The intuition
behind using matrix factorization is the assumption that there are some latent features that
determine how a user rates an item. For example, two users would give a high rating to a cer-
tain music if they both like the singer or if the music is of the same genre. Hence, if
these latent features can be discovered, we should be able to predict a rating with respect to
a certain user and a certain item, because the features associated with the user should match
the features associated with the item.

In trying to discover the different features, we also make the assumption that the number of
features is smaller than the number of users and the number of items. Suppose we
have a set U of users and a set D of items. Let R of size |U| × |D| be the matrix that contains
all the ratings that the users have assigned to the items. We also assume that we would like
to discover K latent features. Our task is then to find two matrices, P of size |U| × K
and Q of size |D| × K, such that their product approximates R:

R ≈ P · QT = R̂ (3.18)

In this way, each row of P represents the strength of the association between a user
and the features, and each row of Q represents the strength of the association
between an item and the features. To predict the rating of an item dj by user ui we
calculate the dot product of the two vectors corresponding to ui and dj :

r̂ij = piT qj = ∑k=1..K pik qjk (3.19)

Now we need a way to obtain P and Q. One approach is to first initialize the two matrices
with some values, calculate how 'different' their product is from R, and then try to minimize
this difference iteratively. Such a method is called gradient descent, aiming at finding a
local minimum of the difference. The difference, usually called the error between the
estimated rating and the real rating, can be calculated by the following equation for each
user-item pair:

eij² = ( rij − r̂ij )² = ( rij − ∑k=1..K pik qkj )² (3.20)

Here we consider the squared error because the estimated rating can be either higher or
lower than the real rating. To minimize the error, we need to know in which direction
to modify the values of pik and qkj . In other words, we need to know the gradient
at the current values, and hence we differentiate the above equation with respect to these
two variables separately:

∂eij² / ∂pik = −2 ( rij − r̂ij ) qkj = −2 eij qkj (3.21)

And,

∂eij² / ∂qkj = −2 ( rij − r̂ij ) pik = −2 eij pik (3.22)

Now, the update rules can be formulated for both pik and qkj . Since gradient descent moves
against the gradient, the updates are:

pik ← pik − α ∂eij²/∂pik = pik + 2 α eij qkj (3.23)

And,

qkj ← qkj − α ∂eij²/∂qkj = qkj + 2 α eij pik (3.24)

Here, α is a constant called the learning rate, whose value determines the rate of approaching
the minimum. Usually, α is chosen to be between 0.001 and 0.1, because if we take too
large a step towards the minimum, we run the risk of overshooting it and ending up
oscillating around the minimum. Using the above rules, the operation can be performed
iteratively until the error converges to its minimum, or for a fixed number of iterations.

3.6.4.1 Regularization

Here, in order to avoid over-fitting, regularization is introduced by adding a parameter β
and modifying the squared error as:

eij² = ( rij − ∑k=1..K pik qkj )² + (β/2) ∑k=1..K ( pik² + qkj² ) (3.25)

The new parameter β is used to control the magnitudes of the user-feature and item-
feature vectors, such that P and Q give a good approximation of R without having to
contain large numbers. In practice, β is typically set around 0.02. The new update rules for
this squared error can be obtained similarly to those above and become:

pik ← pik − α ∂eij²/∂pik = pik + α ( 2 eij qkj − β pik ) (3.26)

And,

qkj ← qkj − α ∂eij²/∂qkj = qkj + α ( 2 eij pik − β qkj ) (3.27)

Thus, in this way, matrix factorization can be implemented as a recommender system.
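The iterative procedure above can be sketched as stochastic gradient descent over the observed ratings, using the regularized updates of equations 3.26-3.27. The hyperparameter values, random seed, and toy rating matrix below are illustrative assumptions, not the project's actual configuration:

```python
import numpy as np

def matrix_factorization(R, K=2, alpha=0.01, beta=0.02, steps=5000):
    """Factor R (np.nan = unknown rating) into P (|U| x K) and Q (|D| x K)
    by stochastic gradient descent with the regularised updates 3.26-3.27."""
    rng = np.random.default_rng(0)
    num_users, num_items = R.shape
    P = rng.random((num_users, K))
    Q = rng.random((num_items, K))
    observed = [(i, j) for i in range(num_users)
                for j in range(num_items) if not np.isnan(R[i, j])]
    for _ in range(steps):
        for i, j in observed:
            e = R[i, j] - P[i] @ Q[j]                 # prediction error e_ij
            P[i] += alpha * (2 * e * Q[j] - beta * P[i])
            Q[j] += alpha * (2 * e * P[i] - beta * Q[j])
    return P, Q

R = np.array([[5, 3, np.nan, 1],
              [4, np.nan, np.nan, 1],
              [1, 1, np.nan, 5],
              [1, np.nan, np.nan, 4],
              [np.nan, 1, 5, 4]], dtype=float)
P, Q = matrix_factorization(R)
R_hat = P @ Q.T   # full predicted rating matrix, including the np.nan cells
```

After training, the previously unknown entries of R_hat serve as the predicted ratings, while the known entries are reproduced approximately (not exactly, because of the β penalty).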

3.7. Model Evaluation for Recommender System

Evaluation measures for recommender systems are separated into three categories [28]:

• Predictive Accuracy Measures: These measures evaluate how close the recommender
system came to predicting the actual rating/utility values.

• Classification Accuracy Measures: These measures evaluate the frequency with which
a recommender system makes correct/incorrect decisions regarding items.

• Rank Accuracy Measures: These measures evaluate the correctness of the ordering of
items performed by the recommendation system.

Since the project is about the impact of personality on user-to-user collaborative filtering,
we are concerned only with predictive accuracy measures. There are many variants of
predictive accuracy measures, such as Mean Absolute Error (MAE), Mean Squared Error
(MSE), Root Mean Squared Error (RMSE) and Normalized Mean Absolute Error (NMAE).

Among them, root mean squared error is the most popular one and has been used in
the project. Let uxi be the actual rating of user x on item i and ûxi be the predicted
rating of user x on item i; then, over N such rating pairs, the root mean squared error
can be computed as:

RMSE = √( ∑n=1..N ( uxi − ûxi )² / N ) (3.28)
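Equation 3.28 can be written as a small NumPy helper (the function name is ours):

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error over the known ratings (eq. 3.28)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((actual - predicted) ** 2))

# e.g. rmse([4, 3, 5], [3.5, 3, 4]) = sqrt((0.25 + 0 + 1) / 3)
```

This metric is what is used later to compare the eight recommendation models against one another.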

4. METHODOLOGY

4.1. Requirement Specification

A software requirements specification is a description of a software system to be developed.
It lays out functional and non-functional requirements. It describes what the software product
is expected to do and what it is not expected to do. It lists the necessary requirements for
the project's development. It mainly helps describe the scope of the work and provides
software designers a frame of reference.

4.1.1. Functional Requirement

The functional requirement specification of the project is mainly categorized into user re-
quirements, security requirements and device requirements, each of which is explained in
detail below:

• User Requirement: The user should have an account on Facebook and must have at
least one post, which is needed to analyze the personality for the music recommendation.

• Security Requirement: The user cannot access the Facebook API directly. Users must
provide their own login credentials.

• Device Requirement: The system must be accessed through a web browser.

4.1.2. Non-functional Requirement

The non-functional requirement of the system can be summarized as follows:

• Performance: The system shall provide quick, accurate and reliable results.

• Capacity and Scalability: The system shall be able to store the personality computed by
the system in the database.

• Availability: The system shall be available to users anytime, whenever there is an Inter-
net connection.

• Recovery: In case of malfunctioning or unavailability of the server, the system should be
able to recover and prevent any data loss or redundancy.

• Flexibility and Portability: The system shall be accessible anytime from any location.

4.2. Feasibility Assessment

A feasibility assessment is done to analyze the viability of an idea. In the case of software
development, it examines the practicality of the project or system. The result of the feasibility
assessment determines whether the project should go ahead, be redesigned, or be dropped.
There are five areas of feasibility: Technical, Economic, Legal, Operational and Scheduling.

4.2.1. Operational Feasibility

The operational feasibility analysis describes how the system operates and what resources
the system requires for performing its designated tasks. Being closely related to data analysis
and integration, the system needs to be easily operable for different uses; operations like new
data entry and concurrent use of the system should be fluent, consuming minimum cost and
resources. The system to be designed is operationally feasible, as it can be operated with the
resources of a personal computer, i.e. a browser. The project is developed as a website, which
allows easy access for multiple users. Besides, the analysis tasks performed by the subsystems
are also operationally feasible.

4.2.2. Technical Feasibility

Technical feasibility assessment examines whether the proposed system can actually be de-
signed to solve the desired problems and requirements using available technologies in the
given problem domain. The system is said to be technically feasible if it can be deployed,
operated and managed under the current technological context of our country. Since the
system aims to enhance existing collaborative engines via the use of personality, the dataset
for personality is available, and building a classifier is also feasible, the project can be
considered technically feasible.

4.2.3. Economic Feasibility

Economic feasibility checks whether the cost required for complete system development is
manageable using the available resources in hand. It should be noted that the cost of resources
and the overall cost of deployment of the system should be kept minimum, while operational
and maintenance costs for the system should be within the capacity of the organization. Since
the system is hosted on the Heroku cloud hosting service, which is free of cost for limited use,
the system can be considered economically feasible for development.

4.2.4. Legal Feasibility

Legal feasibility assessment checks the system for any conflicts with legal requirements and
regulations that are to be followed as per the standards maintained by the governing body. As
such, the system that is being developed must comply with all legal boundaries such as
copyright and the authorized use of licenses. This prevents any future conflicts for the
system and also provides a legal basis for the system in the future if anyone tries to use part
of or the full system without the necessary permission and documents. Since the data obtained
from the user's social media is consented to by the user and does not violate any other
obligation of law or privacy, the project can be considered legally feasible.

4.2.5. Scheduling Feasibility

Any project is considered a failure if it is not completed on time. So, scheduling feasibility
estimates the time required for the system to be fully developed and whether that time is
feasible according to the current trend in the market. If the project takes a long time to
complete, it may become outdated, or others may launch a similar system before our system
is complete. So, it is required to fix a deadline for any project, and the system should be
released and operative before the specified deadline. As the scheduling of the project is
consistent with the available time for the project, the project can be considered feasible in
terms of scheduling.

4.3. Software Development Approach

The system to be developed, being huge and dynamic in nature, may not be developed
efficiently and delivered timely with traditional development approaches like waterfall. Thus,
to meet the requirements of the system while ensuring timely delivery and adaptability to
changing requirements, the Scrum methodology under the Agile development method was
chosen for the development of the system.

Scrum is an agile way to manage a project. Agile software development with Scrum is often
perceived as a methodology, but rather than viewing Scrum as a methodology, it is better
thought of as a framework for managing a process.

Figure 4.1: Scrum Software Development Cycle

In the agile Scrum world, instead of providing complete, detailed descriptions of how
everything is to be done on a project, much of it is left up to the Scrum software development
team. This is because the team knows best how to solve the problem it is presented with.
Scrum relies on a self-organizing, cross-functional team. The Scrum team is self-organizing
in that there is no overall team leader who decides which person will do which task or how
a problem will be solved; these issues are decided by the team as a whole. Besides, the team
is cross-functional, so everyone is involved in taking a feature from idea to implementation.

This model suggests that projects progress via a series of sprints. In keeping with an agile
methodology, sprints are time-boxed to no more than a month long, most commonly two
weeks. Scrum advocates a planning meeting at the start of the sprint, where team members
figure out how many items they can commit to and then create a sprint backlog, a list of tasks
to perform during the sprint. During an agile Scrum sprint, the Scrum team takes a small set
of features from idea to coded and tested functionality. At the end, these features are coded,
tested and integrated into the evolving product or system.

On each day of the sprint, all team members attend a Scrum meeting. During that time, team
members share what they worked on the prior day and will work on that day, and identify any
impediments to progress. The model sees routine scrums as a way to synchronize the work of
team members as they discuss the work of the sprint. At the end of a sprint, the sprint review
is conducted, during which the team demonstrates the new functionality to any stakeholder
who wishes to provide feedback that could influence the next sprint. The feedback loop within
Scrum software development may result in changes to the freshly delivered functionality, but
it may just as likely result in revising or adding items to the product backlog.

The primary artifact in Scrum development is the product itself. The Scrum model expects
the team to bring the product or system to a potentially shippable state at the end of each
Scrum sprint. The product backlog is another artifact of Scrum. This is the complete list of
the functionality that remains to be added to the product. The most popular and successful
way to create a product backlog using the Scrum methodology is to populate it with user
stories, which are short descriptions of functionality described from the perspective of a
user or customer. In Scrum project management, on the first day of a sprint and during
the planning meeting, team members create the sprint backlog. The sprint backlog can be
thought of as the team's to-do list for the sprint, whereas the product backlog is a list of
features to be built. The sprint backlog is the list of tasks the team needs to perform in order
to deliver the functionality it committed to deliver during the sprint. An additional artifact
in the Scrum methodology is the sprint burn-down chart, which shows the amount of work
remaining in a sprint and helps determine whether the sprint is on schedule to have all
planned work finished by the desired date.

Hence, Scrum is adaptive, has small repeating cycles, and involves short-term planning with
constant feedback, inspection and adaptation, and it was therefore chosen as the software
development methodology. Here, the project team members can be thought of as the software
development team and the project supervisor as the Scrum master. The Scrum meeting was
conducted regularly at intervals of 3 weeks, keeping a sprint backlog as well as a product
backlog, and hence progress in the project was made in the form of sprints, whereby the
product backlog helped to identify and prioritize the features to implement in each sprint
and the burn-down chart helped to keep the project on schedule. Whenever a bug was found
relating to a feature, it was dealt with immediately before marking the feature complete, i.e.
1-2 sprints were focused only on defect backlogs. Each Scrum meeting lasted about 15
minutes, in which every team member answered three questions: What have I done since the
last meeting? What will I do until the next meeting? What problems do I have? In this way,
the cycle was continued till the product was completely developed.

4.4. Data Collection

In order to predict personality, the dataset for training the classification model was obtained
from the myPersonality website [1]. It consisted of collections of status updates of Facebook
users along with their personality classification scores in terms of the Big Five personality
traits. For the recommendation system, a survey was conducted among colleagues, who gave
ratings to a predefined set of music in the database. Their personality traits were determined
with our personality classification model before they rated the music.

5. SYSTEM DESIGN

5.1. System Overview

The system initially predicts the personality of the user with the help of their social media
account (Facebook). Thus, a classifier is first trained to classify the personality of users on
the basis of their status updates. Afterwards, the predicted personality of the user is used as
one of the metrics for similar-user computation in collaborative filtering, and the effect of
personality on the collaborative filtering engine is observed.

5.2. System Architecture

The figure below is the architectural diagram of the project, showing the processes that are
involved during the development of the project.

Figure 5.1: Block Diagram of the System

• Pre-processing Unit/Data Preprocessor: This subsystem is responsible for the conver-
sion of the status updates of users from the dataset [1], as well as of users logged in via
the API [30], into vector representation via the use of the bag-of-words and tf-idf models.
It is responsible for:

1. Lowercasing the status update
2. Tokenization
3. Filtering stop words
4. Filtering parts of speech
5. Stemming
6. Conversion of textual data to vector representation (numerical form)

The following figure depicts tasks performed within preprocessor unit within the sys-
tem:

Figure 5.2: Tasks performed by preprocessor unit within the system
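The preprocessing steps can be illustrated with a simplified, self-contained sketch; here a toy stop-word list and a crude suffix stripper stand in for NLTK's stopwords corpus and PorterStemmer, and the part-of-speech filtering step is omitted, so this is only an approximation of the actual unit:

```python
import re
import math
from collections import Counter

STOP_WORDS = {"the", "is", "a", "an", "and", "to", "of", "i", "my"}  # toy list

def stem(word):
    # crude suffix stripping standing in for a real stemmer
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(status):
    """Lowercase, tokenize, drop stop words, stem (steps 1-5 above)."""
    tokens = re.findall(r"[a-z']+", status.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]

def tfidf_vectors(documents):
    """Step 6: convert statuses to tf-idf weighted bag-of-words dicts."""
    docs = [preprocess(d) for d in documents]
    n = len(docs)
    df = Counter(term for d in docs for term in set(d))  # document frequency
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append({t: (c / len(d)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors
```

A term appearing in every status (such as "song" below) receives a tf-idf weight of zero, which is why rarer terms dominate the feature vector.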

• Classifier: After the vector representation of the status updates, this subsystem is re-
sponsible for personality prediction. Classifiers are trained by the admin in the system
using the dataset [1] in order to predict personality. In the project, two classifier models
are used for personality classification.
They are:

1. Naive Bayes Classifier


2. Logistic Regression

The following figure depicts tasks performed by classifier unit within the system:

Figure 5.3: Tasks performed by classifier unit within the system
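As an illustration of the first model, here is a minimal from-scratch multinomial Naive Bayes with Laplace smoothing, operating on bag-of-words token lists for a single binary trait. In the actual system the classifiers are trained per Big Five trait on the myPersonality data; the class and method names below, and the treatment of unseen tokens, are our own assumptions:

```python
import math
from collections import Counter

class NaiveBayesTrait:
    """Minimal multinomial Naive Bayes for one binary personality trait
    (e.g. high/low extraversion), trained on token lists."""

    def train(self, docs, labels):
        self.classes = set(labels)
        self.priors = {c: math.log(labels.count(c) / len(labels))
                       for c in self.classes}
        counts = {c: Counter() for c in self.classes}
        for tokens, c in zip(docs, labels):
            counts[c].update(tokens)
        self.vocab = {t for cnt in counts.values() for t in cnt}
        self.loglik = {}
        for c in self.classes:
            total = sum(counts[c].values()) + len(self.vocab)  # Laplace smoothing
            self.loglik[c] = {t: math.log((counts[c][t] + 1) / total)
                              for t in self.vocab}
        return self

    def classify(self, tokens):
        # unseen tokens contribute 0, a simplifying assumption of this sketch
        scores = {c: self.priors[c] +
                     sum(self.loglik[c].get(t, 0.0) for t in tokens)
                  for c in self.classes}
        return max(scores, key=scores.get)
```

Logistic regression, the second model, would instead learn a weight per vocabulary term over the tf-idf vectors produced by the preprocessor.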

• Music Recommender System: The system comprises eight models for the recommenda-
tion of music to the user.
They are:

1. Global Baseline Approach


2. User to User collaborative filtering with rating matrix
3. User to User collaborative filtering with personality matrix
4. User to User collaborative filtering with weighted average of rating and person-
ality matrix
5. Combination of global baseline and CF with rating matrix
6. Combination of global baseline and CF with personality matrix
7. Combination of global baseline and CF with weighted average of rating and per-
sonality matrix
8. Matrix Factorization

The following figure summarizes tasks performed by recommender unit within the
system:

Figure 5.4: Tasks performed by recommendation unit within the system

The following figure depicts various recommendation models used within the system:

Figure 5.5: Recommendation models used within the system
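Models 4 and 7 above blend the two similarity metrics. That blending can be sketched as a weighted average of the rating-based similarity and the cosine similarity of two users' Big Five personality vectors; the weight w and the function names are illustrative assumptions, not values fixed by the report:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors (eq. 3.15)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def blended_similarity(rating_sim, personality_a, personality_b, w=0.5):
    """Weighted average of rating-based and personality-based similarity,
    as used by the 'weighted average' CF models. w is a tunable weight."""
    return w * rating_sim + (1 - w) * cosine(personality_a, personality_b)
```

Setting w = 1 recovers the pure rating-based models (2 and 5), and w = 0 recovers the pure personality-based ones (3 and 6).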

• Storage Unit/Database: It is responsible for storing user data, music data, user-music-
rating data and the user-music-recommendation data made by the recommender system,
as well as providing recommendations based on user feedback. An SQLite database is
used as the storage unit for the project.

5.3. Use Case Diagram

A use case diagram represents the user's interaction with the system, showing the relationship
between the user and the different use cases in which the user is involved. A use case diagram
can identify different types of users of a system as well as the different use cases. A use-case
diagram provides a higher-level view of the system. Use case diagrams are blueprints for
the system. The use cases are shown as ovals and actors as stick figures (even if they are
machines), with lines (known as associations) connecting use cases to the actors who are
involved with them. A box around the use cases emphasizes the boundary between the
system (defined by the use cases) and the actors who are outside of the system. In our case,
the actor 'user' can log in to the system, allow the system to access his profile information,
and view output or results from the system.
The use case diagram of the system, depicting the actors and their interaction with the
system, is given in the figure below:

Figure 5.6: Use Case Diagram of the System

From the above diagram, it is clear that the system involves two actors. They are:

• User: They are the ones who will be using the system directly. The users will be able
to perform actions like logging in, viewing recommendations and listening to music.

• Admin: The admin is directly responsible for training the classifier subsystem and
recommender subsystem, creation of models for the storage engine and verification of
all of these subsystems.

The system is composed of the UI, classifier, music recommender and storage unit. The
classifier within the system is responsible for the classification of the user's personality
and for updating the database. The recommender is responsible for the recommendation
of music to the user and also for updating the database. The storage unit is responsible
for the creation of the database model and storage of system data, and the UI is responsible
for providing the user with login access and profile access, and for displaying to the user
their personality and music recommendations.

5.4. ER Diagram

An Entity-Relationship model describes inter-related things of interest in a specific domain
of knowledge. In software development, the ER model has become an abstract data model
that defines a data/information structure that can be implemented in a database, typically a
relational database. In our database, we have separate tables for user, music, recommendation
and session. Each table has several attributes that best describe the table. To get required
information, say to print the details of a particular user, we need to access the database and
more than one table to retrieve the data. We need certain relationships between the tables in
our database and a common attribute to map tuples of one table to another. The ER diagram
provides a visual reference to the complete database at one glance. We can develop the
database looking at the ER diagram and later use it as a reference for further improvement.
The ER diagram depicting the entities used in the system and the relationships between them
is given below. The entities in the system are:

Figure 5.7: ER diagram of the System

1. Session: It consists of the attributes session id and user id and has a one-to-one rela-
tionship with user.

2. Music: It consists of the attributes artist and song title and has many-to-many relation-
ships with user-music and recommendation.

3. User: It consists of the attributes user id, name and the personality trait attributes, and
has a many-to-many relationship with user-music and one-to-one relationships with
session and recommendation.

4. Recommendation: It consists of the attributes user and music and has a one-to-one
relationship with user and a many-to-many relationship with music.

5. User-Music: It consists of the attributes user, music and rating and has many-to-many
relationships with user and music.

The system is implemented within the Django framework, which provides an abstraction
over the relationships within the database; hence we can directly implement relationships
such as one-to-one, one-to-many and many-to-many.

5.5. Activity Diagram

The activity diagram describes the dynamic aspects of the system. The diagram shows a
user-oriented view of system operation. We have made the activity diagram using swim-
lanes. A swim-lane is a visual element that distinguishes job sharing and responsibilities
for sub-processes. In our system's activity diagram, we have three swim-lanes, and we have
separated jobs/responsibilities accordingly. Each step is a continuation of the previous step.
Decisions are taken wherever necessary, and fork and join are used to divide or merge the
work flow. The objective of making an activity diagram is similar to the objectives of other
UML diagrams; the only difference is that it is used to show the message flow between
activities.

Figure 5.8: Activity Diagram of the System

The diagram above shows the activity diagram of the system. It depicts how the user, admin
and system interact with each other. Initially, the user logs into the system, providing basic
user profile information. Afterwards, the status updates of the new/old user are used to
predict their personality with the classifier. Then music is recommended to the user. Besides,
the user can also view his or her personality.

5.6. Context Diagram

A system context diagram (SCD) in engineering is a diagram that defines the boundary
between the system, or part of a system, and its environment, showing the entities that
interact with it. This diagram is a high-level view of a system, similar to a block diagram. In
our system context diagram, there are two entities, namely user and sysadmin, and a process
(which is the system we developed as our project). The diagram shows the input and output
for each of the entities as well as the process.

Figure 5.9: Context Diagram of the System

5.7. Data Flow Diagram

A data flow diagram (DFD) is a graphical representation of the ”flow” of data through an
information system, modelling its process aspects. A DFD shows what kind of information
will be input to and output from the system, how the data will advance through the system,
and where the data will be stored. However, it does not show information about process
timing or whether processes will operate in sequence or in parallel. The DFD presents both
control and data flows as a unified model. The given diagram is the level-0 DFD that shows
the distinct internal processes of our system. There are four processes and a data store which
stores all data, intermediate outcomes and results. Two entities, user and sysadmin, take part
in the flow of data to/from these processes. Each arrow head in the data flow diagram shows
the direction of the data/information flow, and the label provides the type of data/information
that flows through. The figure given below is the data flow diagram of the project, showing
the flow of data within the system.

It is the level-0 DFD, with the two entities User and Admin. The User is responsible for login,

Figure 5.10: Data Flow Diagram of the System: level-0

viewing personality and viewing recommended music, all of which take data from the user
and recommendation store to provide data to the user. Besides, the admin is responsible for
creating and altering the models (classifier, recommender, database), all of which are reflected
within the user and recommendation store.

5.8. Front End of the System(User Interface)

The user interface is one of the major parts of the system. It is where a user logs in through
their Facebook ID in order to experience personality-based music listening. The user is able
to view his/her recommended music as well as personality via the website, and can also view
a detailed description of the personality traits. Personality classification and music recom-
mendation are all performed in the back end of the system.

5.9. Back End of the System

After the user logs in through Facebook, the user's posts are extracted through the Graph API.
The data obtained goes through the preprocessor, where various NLP techniques such as
tokenization and POS tagging are performed, at the end of which a feature vector is output
by this subsystem. The vector thus created is passed through the classifier, which classifies
the personality of the user; this is then stored in the database and is also fed into the user-to-
user collaborative filtering engine to determine similar users and recommend music to the
user. Besides, among the other recommendation models, the one with the least RMSE value
is used for the recommendation of music. The result thus obtained is sent back to the front
end and displayed to the user. The user can then view the recommended music and his/her
personality traits too.

6. TOOLS AND TECHNOLOGIES

6.1. Python

Python is a widely used high-level programming language for general-purpose program-
ming. Python features a dynamic type system and automatic memory management and
supports multiple programming paradigms, including object-oriented, imperative, functional
and procedural styles. It has a large and comprehensive standard library.

6.2. Django

Django is a free and open-source web framework, written in Python, which follows the
model-view-template (MVT) architectural pattern. Django's primary goal is to ease the
creation of complex, database-driven websites. Django emphasizes reusability and
"pluggability" of components, rapid development, and the principle of don't repeat yourself.
Python is used throughout, even for settings files and data models. Django also provides an
optional administrative create, read, update and delete interface that is generated dynamically
through introspection and configured via admin models.

6.3. NumPy

NumPy is a library for the Python programming language, adding support for large, multi-
dimensional arrays and matrices, along with a large collection of high-level mathematical
functions to operate on these arrays.
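As a brief illustration of how NumPy fits this project, the sketch below builds a small user-item rating matrix as a NumPy array; the array values are made up for the example.

```python
# Illustrative only: a tiny user-item rating matrix of the kind used by
# the recommendation models, with zeros standing for unrated items.
import numpy as np

ratings = np.array([[5, 3, 0],
                    [4, 0, 1]])
print(ratings.shape)         # (2, 3)
print(ratings.mean(axis=1))  # per-user mean rating
```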

6.4. Pandas

Pandas is a software library written for the Python programming language for data
manipulation and analysis. In particular, it offers data structures and operations for
manipulating numerical tables and time series. It offers a wide range of features, including a
DataFrame object for data manipulation with integrated indexing, tools for reading and
writing data between in-memory data structures and different file formats, data alignment,
integrated handling of missing data, and more.
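A short example of the pandas features mentioned above, using a hypothetical ratings table with one missing value (the song names and users are made up):

```python
# Integrated indexing plus missing-data handling on a toy ratings table.
import pandas as pd

df = pd.DataFrame({"song_a": [5, None], "song_b": [3, 4]},
                  index=["user1", "user2"])
print(df.isna().sum().sum())  # count of missing ratings: 1
print(df["song_b"].mean())    # 3.5
```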

6.5. NLTK

NLTK is a leading platform for building Python programs to work with human language data.
It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet,
along with a suite of text processing libraries for classification, tokenization, stemming,
tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and
an active discussion forum. The accompanying book, Natural Language Processing with
Python, provides a practical introduction to programming for language processing.

6.6. Facebook Platform

Facebook Platform is an umbrella term used to describe the set of services, tools, and prod-
ucts provided by the social networking service Facebook for third-party developers to create
their own applications and services that access data in Facebook. The Graph API is the core
of the Facebook Platform, enabling developers to read data from and write data to Facebook.

6.7. HTML/CSS

Hypertext Markup Language (HTML) is the standard markup language for creating web
pages and web applications. Web browsers receive HTML documents from a web server
or from local storage and render them into multimedia web pages. HTML describes the
structure of a web page semantically and originally included cues for the appearance of the
document. Cascading Style Sheets (CSS) is a style sheet language used for describing the
presentation of a document written in a markup language. It is most often used to set the
visual style of web pages and user interfaces written in HTML.

6.8. JavaScript

JavaScript (JS) is a high-level, dynamic, weakly typed, object-based, multi-paradigm,
interpreted programming language. Alongside HTML and CSS, JavaScript is one of the
three core technologies of World Wide Web content production. It is used to make web
pages interactive.

6.9. PostgreSQL

PostgreSQL is an object-relational database management system (ORDBMS) with an
emphasis on extensibility and standards compliance. As a database server, its primary
functions are to store data securely and return that data in response to requests from other
software applications. It can handle workloads ranging from small single-machine
applications to large Internet-facing applications (or data warehousing) with many
concurrent users. PostgreSQL is ACID-compliant and transactional. PostgreSQL has
updatable views and materialized views, triggers, and foreign keys; it supports functions and
stored procedures, among other extensibility features.

6.10. Git

Git is a version control system (VCS) for tracking changes in computer files and coordinating
work on those files among multiple people. It is primarily used for source code management
in software development, but it can be used to keep track of changes in any set of files.
As a distributed revision control system it is aimed at speed, data integrity, and support for
distributed, non-linear workflows.

7. RESULTS

7.1. Big Five Personality Frequency Distribution

The myPersonality dataset contains status updates of 223 users, each labeled with Big Five
personality traits. We analyzed this data to see how users are distributed across the different
personality traits. The frequency distribution of each class of personality traits is given below:

Figure 7.1: Class Frequency Distribution of Users

7.2. Logistic Regression Model

We analyzed the effect of the number of iterations on the F-measure of the logistic regression
model, as shown below:

The following tables show the confusion matrices of the logistic regression model for the Big
Five personality classes:

Figure 7.2: F-Measure vs Number of Iterations (Logistic Regression)

N = 50        Predicted: Yes   Predicted: No
Actual: Yes   4                11
Actual: No    12               23
Table 7.1: Confusion Matrix of Logistic Regression Model (Openness)

N = 50        Predicted: Yes   Predicted: No
Actual: Yes   6                18
Actual: No    11               15
Table 7.2: Confusion Matrix of Logistic Regression Model (Conscientiousness)

N = 50        Predicted: Yes   Predicted: No
Actual: Yes   22               9
Actual: No    12               7
Table 7.3: Confusion Matrix of Logistic Regression Model (Extraversion)

N = 50        Predicted: Yes   Predicted: No
Actual: Yes   9                14
Actual: No    14               13
Table 7.4: Confusion Matrix of Logistic Regression Model (Agreeableness)

N = 50        Predicted: Yes   Predicted: No
Actual: Yes   16               14
Actual: No    12               8
Table 7.5: Confusion Matrix of Logistic Regression Model (Neuroticism)
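From any of the confusion matrices above, precision, recall, and F-measure can be derived with the standard formulas. The sketch below applies them to the Extraversion counts of Table 7.3 (TP = 22, FN = 9, FP = 12, TN = 7):

```python
# Precision, recall, and F-measure from confusion-matrix counts.
# tn is not used by these three metrics but is kept for completeness.
def prf(tp, fn, fp, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

p, r, f = prf(tp=22, fn=9, fp=12, tn=7)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.647 0.71 0.677
```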

7.3. Naive Bayes Model

The following figure shows the F-measure of the Naive Bayes model for the Big Five
personality classes:

Figure 7.3: F-Measure of Naive Bayes Model

The following tables show the confusion matrices of the Naive Bayes model for the Big Five
personality classes:

N = 50        Predicted: Yes   Predicted: No
Actual: Yes   3                12
Actual: No    8                27
Table 7.6: Confusion Matrix of Naive Bayes (Openness)

N = 50        Predicted: Yes   Predicted: No
Actual: Yes   9                15
Actual: No    4                22
Table 7.7: Confusion Matrix of Naive Bayes (Conscientiousness)

N = 50        Predicted: Yes   Predicted: No
Actual: Yes   20               11
Actual: No    12               7
Table 7.8: Confusion Matrix of Naive Bayes (Extraversion)

N = 50        Predicted: Yes   Predicted: No
Actual: Yes   12               11
Actual: No    13               14
Table 7.9: Confusion Matrix of Naive Bayes (Agreeableness)

N = 50        Predicted: Yes   Predicted: No
Actual: Yes   20               10
Actual: No    15               5
Table 7.10: Confusion Matrix of Naive Bayes (Neuroticism)

7.4. Evaluation of Recommendation System

The following table shows RMSE of various recommendation models:

Recommendation Model                                                  RMSE
User to User Collaborative Filtering with User Rating Matrix
  combined with Global Baseline                                       4.72
User to User Collaborative Filtering with User Rating Matrix          3.89
User to User Collaborative Filtering with User Personality Matrix     3.20
User to User Collaborative Filtering with Weighted Average of
  User Personality Matrix and User Rating Matrix                      3.20
User to User Collaborative Filtering with User Personality Matrix
  combined with Global Baseline                                       3.10
User to User Collaborative Filtering with Weighted Average of
  User Personality Matrix and User Rating Matrix combined with
  the Global Baseline Algorithm                                       3.04
Global Baseline Algorithm                                             2.86
Matrix Factorization                                                  0.88
Table 7.11: RMSE of Recommendation System Models
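The RMSE values in Table 7.11 are instances of the standard root mean squared error between predicted and actual ratings, which can be computed as follows (the rating values below are illustrative):

```python
# Root mean squared error: the evaluation metric used in Table 7.11.
import math

def rmse(predicted, actual):
    errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(errors) / len(errors))

print(rmse([3.5, 4.0, 2.0], [4, 4, 1]))  # about 0.645
```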

The following figure depicts the RMSE of various recommendation models:

Figure 7.4: RMSE of various models used in the system

The following figures show the effect of changing the number of nearest neighbors in the
different collaborative filtering models.

Figure 7.5: RMSE of Collaborative Filtering with User Rating Matrix



Figure 7.6: RMSE of Collaborative Filtering with similarity in terms of Personality Matrix

Figure 7.7: RMSE of Collaborative Filtering combined with Global Baseline with User Rat-
ing Matrix

Figure 7.8: RMSE of Collaborative Filtering combined with Global Baseline with User Per-
sonality Matrix

Figure 7.9: RMSE of Collaborative Filtering with User Rating and Personality Matrix

Figure 7.10: RMSE of Collaborative Filtering with User Rating and Personality Matrix com-
bined with Global Baseline

7.4.1. Latent Factor

The following figure shows the RMSE of matrix factorization when the number of iterations
is varied:

Figure 7.11: RMSE of Matrix Factorization vs Number of Iterations

The following figure shows the RMSE of matrix factorization when k is varied, with the
number of iterations fixed at 1000:

Figure 7.12: RMSE of Matrix Factorization vs Number of Latent Factors
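The matrix factorization evaluated above can be sketched with plain stochastic gradient descent over the observed ratings. This is a minimal illustration, not the project's exact implementation; the hyperparameters (k, learning rate, regularization) and the toy rating matrix are assumptions made for the example.

```python
# Matrix factorization by SGD on observed entries; zeros are treated
# as missing ratings rather than actual scores.
import numpy as np

def factorize(R, k=2, steps=2000, lr=0.01, reg=0.02):
    """Factor R into user factors P and item factors Q."""
    users, items = R.shape
    rng = np.random.default_rng(0)
    P = rng.random((users, k))   # user latent factors
    Q = rng.random((items, k))   # item latent factors
    for _ in range(steps):
        for u in range(users):
            for i in range(items):
                if R[u, i] > 0:                  # observed ratings only
                    err = R[u, i] - P[u] @ Q[i]
                    pu = P[u].copy()
                    P[u] += lr * (err * Q[i] - reg * P[u])
                    Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

R = np.array([[5, 3, 0], [4, 0, 1], [1, 1, 5]])
P, Q = factorize(R)
print(np.round(P @ Q.T, 1))  # observed entries reconstructed closely
```

Unobserved entries of the reconstruction P @ Q.T serve as predicted ratings for recommendation.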



Comparing the above models, we can conclude that user to user collaborative filtering with
personality gives slightly better results than user to user collaborative filtering with the user
rating matrix, but matrix factorization outperforms them all. Besides, the weighted average of
user similarity with the rating and personality matrices also performs better than the rating
matrix alone, though its results are comparable to user to user collaborative filtering that uses
personality to compute the similarity. Currently the system uses a switching hybrid
methodology to choose among the different models used in the system, i.e. the one with the
least RMSE.
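The user-to-user similarity underlying the collaborative filtering models compared above is typically cosine similarity between two users' vectors, whether those vectors hold ratings or Big Five trait scores. A minimal sketch with hypothetical trait vectors:

```python
# Cosine similarity between two users' vectors (ratings or Big Five
# scores). The trait vectors below are made up for illustration.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical Big Five vectors (O, C, E, A, N) for two users.
u1 = [0.8, 0.4, 0.6, 0.5, 0.2]
u2 = [0.7, 0.5, 0.5, 0.6, 0.3]
print(round(cosine_similarity(u1, u2), 3))
```

Ratings from the most similar users are then aggregated to produce recommendations.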

8. LIMITATIONS AND FUTURE ENHANCEMENTS

In the context of building the classification model, preprocessing the status updates was a
huge challenge. The status updates of users can contain various emojis which could be
significant but were ignored.

The current recommendation system as a whole suffers from cold start for items, i.e. the item
ramp-up problem. User to user collaborative filtering with the rating matrix suffers from the
stability vs. plasticity issue. Besides, in the collaborative engine, the gray sheep problem still
prevails even with the use of personality, because of sparsity in the user rating matrix.

The current system can be enhanced by considering emojis and some demographic
information about users for personality classification. The ramp-up problem in the
recommendation engine can be solved with the content filtering method, in order to create a
profile of an item. Besides, the stability vs. plasticity issue can be solved by giving low
weights to old ratings of users in user to user collaborative filtering with the rating matrix.

9. CONCLUSION

In this project, we have developed classification models that take a Facebook user's status
updates as input and classify their personality based on the Big Five personality traits. This
information is used by user to user collaborative filtering to find similar users and recommend
music to them. This recommendation model performs better than user to user collaborative
filtering with the rating matrix, but not as well as matrix factorization. Besides, the
recommendation model developed with personality has results comparable to the weighted
average of similarity using the rating matrix and the personality matrix. Hence, with
reference to the current scenario of our project, we can conclude that personality, i.e. the Big
Five traits of the user, can be used to enhance existing user to user collaborative filtering that
computes similarity with the user rating matrix.

References

[1] D. Stillwell and M. Kosinski. (2017, January). myPersonality Dataset. Retrieved from
http://mypersonality.org/wiki/doku.php

[2] L. R. Goldberg, et al. (2006). Five Factor Model of Personality: The international
personality item pool and the future of public-domain personality measures. J Res Pers,
40(1): 84-86.

[3] E. Tupes and R. Christal. (1992). Recurrent personality factors based on trait ratings.
Journal of Personality, 60(2): 225-251.

[4] R. McCrae and O. John. (1992). An introduction to the five-factor model and its
applications. Journal of Personality, 60(2): 175-215.

[5] L. Audery. (2014). Improving music recommender systems: What can we learn from
research on music tastes? ISMIR.

[6] B. Ferwerda and S. Markus. (2014). Enhancing Music Recommender Systems with
Personality Information and Emotional States: A Proposal. UMAP Workshops.

[7] O. Melissa, A. Micareli and G. Sansonetti. (2016). A Comparative Analysis of
Personality-Based Music Recommender Systems. EMPIRE RecSys.

[8] G. Gonzalez and M. Miquel. (2017). Embedding Emotional Context in Recommendation
Systems. 20th International Florida Artificial Intelligence Research Society Conference
(FLAIRS).

[9] P. Resnick and H. R. Varian. (1997). Recommender Systems. Communications of the
ACM, 40(3): 56-58.

[10] M. A. S. N. Nunes. (2008). Recommender system based on personality traits.

[11] G. Parsa. (2017, July). Document Classification. Retrieved from http://www.
kdnuggets.com/2015/01/text-analysis-101-document-classification.html

[12] Dropping common stop words. (2017, March). Retrieved from
http://nlp.stanford.edu/IR-book/html/htmledition/
dropping-common-terms-stop-words-t1.html

[13] M. F. Porter. (2001). Stemming Algorithm Snowball: A Language for Stemming
Algorithms.

[14] Part-of-speech tagging. (2017, March). Retrieved from http://en.wikipedia.org/
wiki/Part-of-speech_tagging

[15] Textual data and vector space model. (2017, July). Retrieved from http://www.
calpoly.edu/~dsun09/lessons/textprocessing

[16] Naive Bayes text classification. (2017, July). Retrieved from
https://nlp.stanford.edu/IR-book/html/htmledition/
naive-bayes-text-classification-1.html

[17] Additive Smoothing. (2017, July). Retrieved from https://en.wikipedia.org/
wiki/Additive_smoothing

[18] Cross Validation. (2017, July). Retrieved from https://www.cs.cmu.edu/
~schneide/tut5/node42.html

[19] Data Analysis. (2017, February). Retrieved from https://www.ngdata.com/
what-is-data-analysis

[20] Recommender System. (2017, July). Retrieved from http://recommender-system.
blogspot.com/2012_10_01_archive.html

[21] L. Safoury and A. Salah. (2013). Exploiting user demographic attributes for solving
cold-start problem in recommender system. Lecture Notes on Software Engineering, 1(3):
303.

[22] A. Tejal. (2015). A Survey on Recommendation System. International Journal of
Innovative Research in Advanced Engineering (IJIRAE).

[23] F. O. Isinkaye, Y. O. Folajimi and B. A. Ojokoh. (2015). Recommendation Systems:
Principles, methods and evaluation. Egyptian Informatics Journal.

[24] Non-negative matrix factorization. (2017, July). Retrieved from https://www.
slideshare.net/BenjaminBengfort/non-negative-matrix-factorization

[25] C. Manning. (2017, June). Stanford NLP - Stanford NLP Group. Retrieved from
https://nlp.standford.edu/manning

[26] Recommendation System, Coursera. (2017, July). Retrieved from https://
coursera.org/learn/machine-learning

[27] Studying the big five personality traits - UK Essays. (2017, January). Retrieved from
https://ukessays.com/essays/psychology/
studying-the-big-five-personality-traits.php

[28] D. Alexander. (2009). Collaborative Filtering and Recommender Systems.

[29] TF-IDF. (2017, July). Retrieved from http://www.tfidf.com/

[30] Facebook Graph API. (2017, July). Retrieved from https://developers.
facebook.com/docs/graph-api

[31] Andrew Ng. (2017, March). Machine Learning [Video lectures]. Retrieved from
https://www.coursera.org/learn/machine-learning

[32] Logistic Regression. (2017, July). Retrieved from https://en.wikipedia.org/
wiki/Logistic_regression

[33] Jason Brownlee. (2017, August). Logistic Regression for Machine Learning [Online
tutorial]. Retrieved from http://machinelearningmastery.com/
logistic-regression-for-machine-learning

APPENDIX A

Output Screenshots

Figure 9.1: Home Page

Figure 9.2: Recommended Songs Page



Figure 9.3: User Profile Page

Figure 9.4: Help Page



APPENDIX B

Stopwords used

’i’, ’me’, ’my’, ’myself’, ’we’, ’our’, ’ours’, ’ourselves’, ’you’, ’your’, ’yours’, ’yourself’,
’yourselves’, ’he’, ’him’, ’his’, ’himself’, ’she’, ’her’, ’hers’, ’herself’, ’it’, ’its’, ’itself’,
’they’, ’them’, ’their’, ’theirs’, ’themselves’, ’what’, ’which’, ’who’, ’whom’, ’this’, ’that’,
’these’, ’those’, ’am’, ’is’, ’are’, ’was’, ’were’, ’be’, ’been’, ’being’, ’have’, ’has’, ’had’,
’having’, ’do’, ’does’, ’did’, ’doing’, ’a’, ’an’, ’the’, ’and’, ’but’, ’if’, ’or’, ’because’, ’as’,
’until’, ’while’, ’of’, ’at’, ’by’, ’for’, ’with’, ’about’, ’against’, ’between’, ’into’, ’through’,
’during’, ’before’, ’after’, ’above’, ’below’, ’to’, ’from’, ’up’, ’down’, ’in’, ’out’, ’on’, ’off’,
’over’, ’under’, ’again’, ’further’, ’then’, ’once’, ’here’, ’there’, ’when’, ’where’, ’why’,
’how’, ’all’, ’any’, ’both’, ’each’, ’few’, ’more’, ’most’, ’other’, ’some’, ’such’, ’no’, ’nor’,
’not’, ’only’, ’own’, ’same’, ’so’, ’than’, ’too’, ’very’, ’s’, ’t’, ’can’, ’will’, ’just’, ’don’,
’should’, ’now’, ’d’, ’ll’, ’m’, ’o’, ’re’, ’ve’, ’y’, ’ain’, ’aren’, ’couldn’, ’didn’, ’doesn’,
’hadn’, ’hasn’, ’haven’, ’isn’, ’ma’, ’mightn’, ’mustn’, ’needn’, ’shan’, ’shouldn’, ’wasn’,
’weren’, ’won’, ’wouldn’

POS tags used

JJ: adjective or numeral, ordinal


example: third ill-mannered pre-war regrettable oiled calamitous first separable ectoplasmic
battery-powered participatory fourth still-to-be-named multilingual multi-disciplinary etc.

JJR: adjective, comparative


example: bleaker braver breezier briefer brighter brisker broader bumper busier calmer
cheaper choosier cleaner clearer closer colder commoner costlier cozier creamier crunchier
cuter etc.

JJS: adjective, superlative


example: calmest cheapest choicest classiest cleanest clearest closest commonest corniest
costliest crassest creepiest crudest cutest darkest deadliest dearest deepest densest dinkiest
etc.

RB: adverb
example: occasionally unabatingly maddeningly adventurously professedly stirringly promi-
nently technologically magisterially predominately swiftly fiscally pitilessly etc.

RBR: adverb, comparative


example: further gloomier grander graver greater grimmer harder harsher healthier heavier
higher however larger later leaner lengthier less-perfectly lesser lonelier longer louder lower
more etc.

RBS: adverb, superlative


example: best biggest bluntest earliest farthest first furthest hardest heartiest highest largest
least less most nearest second tightest worst
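Given tagged tokens of the kind NLTK's tagger emits, the tags listed above can be used to keep only adjective and adverb features. A small sketch with a hand-tagged example sentence (the input pairs are made up; the tag set is the one in this appendix):

```python
# Keep only adjectives (JJ, JJR, JJS) and adverbs (RB, RBR, RBS) from
# (word, tag) pairs, as when restricting classifier features by POS tag.
KEPT_TAGS = {"JJ", "JJR", "JJS", "RB", "RBR", "RBS"}

tagged = [("today", "NN"), ("was", "VBD"), ("really", "RB"),
          ("calm", "JJ"), ("and", "CC"), ("brighter", "JJR")]
kept = [word for word, tag in tagged if tag in KEPT_TAGS]
print(kept)  # ['really', 'calm', 'brighter']
```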
