
ABSTRACT

This project, Emo Player (an emotion-based music player), is a novel approach that automatically plays songs based on the emotions of the user. It recognizes the facial emotion of the user and plays songs that match that emotion. The emotions are recognized using a machine learning method, the Support Vector Machine (SVM) algorithm.

SVM can be used for classification or regression problems; it finds an optimal boundary between the possible outputs. The training dataset used is the Olivetti faces dataset, which contains 400 face images along with their target values. A webcam captures an image of the user, and the facial features of the user are then extracted from the captured image. The training process involves initializing the model with some random parameter values (for example, for the smiling and not-smiling classes), predicting the output with those values, comparing the predictions with the known labels, and then adjusting the values so that the predictions improve.

Evaluation allows the model to be tested against data that has never been used for training and is meant to be representative of how the model might perform in the real world. According to the detected emotion, music is played from predefined directories.

1. PROJECT OVERVIEW
1. INTRODUCTION

1.1 Background
Music plays a very important role in enhancing an individual's life, as it is an important medium of entertainment for music lovers and listeners. In today's world, with ever-increasing advancements in the field of multimedia and technology, various music players have been developed with features like fast forward, reverse, variable playback speed (seek and time compression), local playback, streaming playback with multicast streams, volume modulation, genre classification, etc. Although these features satisfy the user's basic requirements, the user still has to manually browse through the playlist of songs and select songs based on his current mood and behaviour.

1.2 Objectives
Project Emo Player (an emotion-based music player) is a novel approach that helps the user to automatically play songs based on his or her emotions. It recognizes the facial emotion of the user and plays songs that match that emotion. The emotions are recognized using a machine learning method, the Support Vector Machine (SVM) algorithm. The human face is an important part of an individual's body, and it plays an especially important role in conveying an individual's behaviour and emotional state. The webcam captures an image of the user, and the facial features of the user are then extracted from the captured image. Facial expressions are categorized into two classes: smiling and not smiling.
According to the detected emotion, music is played from predefined directories.

1.3 PROBLEM DEFINITION
Using traditional music players, a user has to manually browse through his playlist and select songs that suit his mood and emotional experience. In today's world, with ever-increasing advancements in the field of multimedia and technology, various music players have been developed with features like fast forward, reverse, variable playback speed (seek and time compression), local playback, streaming playback with multicast streams, volume modulation, genre classification, etc.
Although these features satisfy the user's basic requirements, the user still has to manually browse through the playlist of songs and select songs based on his current mood and behaviour. In other words, to meet this requirement the user repeatedly has to go through the tedious process of browsing his playlist according to his mood and emotions.

1.4 PURPOSE
The main concept of this project is to automatically play songs based on the emotions of the user. It aims to provide user-preferred music with emotion awareness. In the existing system the user has to select the songs manually, randomly played songs may not match the mood of the user, and the user has to classify the songs into various emotions and then manually select a particular emotion in order to play them. These difficulties can be avoided by using Emo Player (an emotion-based music player). The emotions are recognized using a machine learning method, the Support Vector Machine (SVM) algorithm. SVM can be used for classification or regression problems. According to the detected emotion, music is played from the predefined directories.

1.5 SCOPE
Facial expressions are a great indicator of a person's state of mind; indeed, the most natural way to express emotions is through facial expressions. Humans tend to link the music they listen to with the emotion they are feeling. Song playlists, though, are at times too large to sort out manually. It would be helpful if the music player were "smart enough" to sort out the music based on the current emotional state of the person. The project sets out to use various techniques for an emotion recognition system and analyses the impact of the different techniques used. By using Emo Player we can easily play songs according to the emotion of the user.
SYSTEM ARCHITECTURE:
In this project, running the main web page starts the application. OpenCV is used to capture images from the webcam as well as for processing purposes, and the Fisherface methodology of OpenCV is used for classification: the Fisherface recognizer is trained and the result is stored in a model file (XML). While the player is in use, this model is loaded to predict the emotion, and the main media player web page is shown. The page contains two options, one for emotion-based detection and the other for random selection of songs. The connection between the web page and the Python code is made through Eel, a small Python library. The emotion-based music system uses three main steps: capturing, detection, and playing of the music. The system detects facial expressions from the captured frames, and after feature extraction the emotions are classified into four classes, i.e. happy, angry, sad, and neutral. The detected emotion is passed to the last step in numerical form, and the music is played according to the emotion that was detected. The main objective of the face detection technique is to identify the face in the frame. The other phase of the project is the random mode, for which Eel is used for the random picking of songs, independent of the queue. The winsound module is used to access the local sound-playing machinery provided on Windows platforms.

1.6 APPLICATION
• Automatically play songs based on the emotion of the user.
• Act as a plugin for websites.
• Recommendations for YouTube.
• Smart TVs.
• Personal assistants.

1.7 LITERATURE REVIEW


Machine learning is a field of computer science that uses statistical techniques to give
computer systems the ability to "learn" (i.e., progressively improve performance on a specific
task) with data, without being explicitly programmed.
The name machine learning was coined in 1959 by Arthur Samuel. Evolved from the
study of pattern recognition and computational learning theory in artificial intelligence,
machine learning explores the study and construction of algorithms that can learn from and
make predictions on data – such algorithms overcome following strictly static program
instructions by making data-driven predictions or decisions, through building a model from
sample inputs. Machine learning is employed in a range of computing tasks where designing
and programming explicit algorithms with good performance is difficult or infeasible;
example applications include email filtering, detection of network intruders or malicious
insiders working towards a data breach, optical character recognition (OCR), learning to
rank, and computer vision.
Machine learning is closely related to (and often overlaps with) computational statistics,
which also focuses on prediction-making through the use of computers. It has strong ties to
mathematical optimization, which delivers methods, theory and application domains to the
field. Machine learning is sometimes conflated with data mining, where the latter subfield
focuses more on exploratory data analysis and is known as unsupervised learning. Machine
learning can also be unsupervised and be used to learn and establish baseline behavioral
profiles for various entities and then used to find meaningful anomalies.
Within the field of data analytics, machine learning is a method used to devise complex
models and algorithms that lend themselves to prediction; in commercial use, this is known
as predictive analytics. These analytical models allow researchers, data scientists, engineers,
and analysts to "produce reliable, repeatable decisions and results" and uncover "hidden
insights" through learning from historical relationships and trends in the data. Effective
machine learning is difficult because finding patterns is hard and often not enough training
data are available; as a result, machine-learning programs often fail to deliver.

Our work is implemented on the Python runtime, so we refer to [1], which describes Python's contribution to the scientific area. Our player plays music according to the type of emotion detected and is based on the support vector machine learning technique, so we refer to [2], which introduces some concepts of machine learning, and to [3] to understand the language for data mining and machine learning. [4] is a video that gives a brief idea of emotion recognition in Python, and [5] is a video that gives a brief idea of facial recognition in Python. Our player needs a dataset to detect the type of emotion, and we need to generate it, so we refer to [6] for dataset generation and [7] for emotion recognition with Python and OpenCV.

Steps in Machine Learning


Machine learning is a field of computer science that gives computers the ability to learn without being programmed explicitly. The power of machine learning is that you can let models determine how to discriminate, rather than relying on hand-written rules and human judgment. The basic steps that make up a machine learning workflow are described below:
1. Gathering data
2. Preparing that data
3. Choosing a model
4. Training
5. Evaluation
6. Hyper parameter tuning
7. Prediction.

Gathering Data:
Once you know exactly what you want and the equipment is in hand, you come to the first real step of machine learning: gathering data. This step is very crucial, as the quality and quantity of the data gathered will directly determine how good the predictive model turns out to be. The data collected is then tabulated and called the training data.
Data Preparation:
After the training data is gathered, you move on to the next step of machine learning: data preparation, where the data is loaded into a suitable place and then prepared for use in machine learning training. Here, the data is first put all together and then the order is randomized, as the order of the data should not affect what is learned. This is also a good time to do any visualizations of the data, as that will help you see whether there are any relevant relationships between the different variables and how you can take advantage of them, as well as showing you whether any data imbalances are present. The data also has to be split into two parts: the first part, used for training the model, will be the majority of the dataset, and the second will be used for the evaluation of the trained model's performance. Other forms of adjustment and manipulation, like normalization and error correction, also take place at this step.
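As a minimal sketch of the shuffling and splitting described above (the file names and the 80/20 split are assumptions chosen for illustration, not values taken from this report):

import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical arrays: flattened face images and their emotion labels
X = np.load("faces.npy")      # shape (n_samples, n_pixels)
y = np.load("labels.npy")     # shape (n_samples,)

# Shuffle and split: most of the data for training, the rest held back for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42)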

Choosing a model:
The next step in the workflow is choosing a model from among the many that researchers and data scientists have created over the years. Choose the one that is right for the job at hand.

Training:
After the previous steps are completed, you move on to what is often considered the bulk of machine learning, called training, where the data is used to incrementally improve the model's ability to predict. The training process involves initializing some random values, say A and B, for our model, predicting the output with those values, comparing the predictions with the known answers, and then adjusting the values so that the predictions move closer to the correct answers.
This process then repeats, and each cycle of updating is called one training step.
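As a hedged illustration of this loop (a toy single-feature linear model with parameters A and B, not the project's actual training code):

import numpy as np

# Toy data: x is the input feature, y the value we want to predict (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

A, B = np.random.randn(), np.random.randn()   # initialize with random values
learning_rate = 0.01

for step in range(1000):                      # each pass is one training step
    y_pred = A * x + B                        # predict with the current values
    error = y_pred - y                        # compare with the known answers
    A -= learning_rate * (error * x).mean()   # adjust the values to reduce the error
    B -= learning_rate * error.mean()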

Evaluation:

Once training is complete, you check whether the model is good enough; this is where the dataset you set aside earlier comes into play. Evaluation allows the model to be tested against data that has never been used for training and is meant to be representative of how the model might perform in the real world.
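Continuing the hypothetical split from the data preparation step, a minimal evaluation of a trained classifier clf (assumed to have been fitted only on the training portion) on the held-out data might look like this:

from sklearn.metrics import accuracy_score

# clf is assumed to be a classifier trained only on X_train / y_train
y_pred = clf.predict(X_test)
print("Held-out accuracy:", accuracy_score(y_test, y_pred))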

Parameter Tuning:
Once the evaluation is over, any further improvement in your training may be possible by tuning the parameters. A few parameters were implicitly assumed when the training was done. One such parameter is the learning rate, which defines how far the line is shifted during each step, based on the information from the previous training step. These values all play a role in the accuracy of the training model and in how long the training will take. For models that are more complex, initial conditions play a significant role in determining the outcome of training. Differences can be seen depending on whether a model starts off training with values initialized to zeroes or to some distribution of values, which then leads to the question of which distribution to use. Since there are many considerations at this phase of training, it is important that you define what makes a model good. These parameters are referred to as hyperparameters. The adjustment or tuning of these parameters depends on the dataset, the model, and the training process. Once you are done with these parameters and are satisfied, you can move on to the last step.
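For an SVM like the one used in this project, one common way to tune hyperparameters is a grid search with cross-validation. The parameter ranges below are illustrative rather than values from this report, and X_train/y_train are the arrays assumed in the earlier sketches:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {"C": [0.1, 1, 10, 100], "gamma": ["scale", 0.01, 0.001]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)
print("Best hyperparameters:", search.best_params_)
clf = search.best_estimator_        # tuned model, reused for the final evaluation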

Prediction:
Machine learning is basically using data to answer questions, so this is the final step, where you get to answer a few questions. This is the point where the value of machine learning is realized: here you can finally use your model to predict the outcome you are interested in. The above-mentioned steps take you from creating a model to predicting its output, and thus form a learning path.

2. System Analysis
2.1 Existing System
The features available in the existing music players present in computer systems are as follows: i. manual selection of songs; ii. party shuffle; iii. playlists; iv. music squares, where the user has to classify the songs manually according to particular emotions, and only for four basic emotions: Passionate, Calm, Joyful, and Excitement.
Using traditional music players, a user has to manually browse through his playlist and select songs that suit his mood and emotional experience. In today's world, with ever-increasing advancements in the field of multimedia and technology, various music players have been developed with features like fast forward, reverse, variable playback speed (seek and time compression), local playback, streaming playback with multicast streams, volume modulation, genre classification, etc.

Although these features satisfy the user's basic requirements, the user still has to manually browse through the playlist of songs and select songs based on his current mood and behaviour. In other words, to meet this requirement the user repeatedly has to go through the tedious process of browsing his playlist according to his mood and emotions.

2.2 Limitations of the Existing System


• It requires the user to manually select the songs.
• Randomly played songs may not match the mood of the user.
• The user has to classify the songs into various emotions and then manually select a particular emotion in order to play the songs.

2.3 PROPOSED SYSTEM
Here we propose an emotion-based music player (Emo Player). Emo Player is a music player which plays songs according to the emotion of the user. It aims to provide user-preferred music with emotion awareness. Emo Player is based on the idea of automating much of the interaction between the music player and its user. The emotions are recognized using a machine learning method, the Support Vector Machine (SVM) algorithm. In machine learning, support vector machines are supervised learning models with associated learning algorithms that analyse data used for classification and regression analysis; SVM finds an optimal boundary between the possible outputs.
The training dataset used is the Olivetti faces dataset, which contains 400 face images along with their target values. The webcam captures an image of the user, and the facial features of the user are then extracted from the captured image. The training process involves initializing the model with some random parameter values (for example, for the smiling and not-smiling classes), predicting the output with those values, comparing the predictions with the known labels, and then adjusting the values so that the predictions improve. Evaluation allows the model to be tested against data that has never been used for training and is meant to be representative of how the model might perform in the real world. According to the detected emotion, the music is played from the predefined directories.
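A minimal sketch of this idea, assuming the 400 Olivetti faces have been hand-labelled as smiling / not smiling (scikit-learn ships only the images and subject IDs, so the label file name below is hypothetical):

import numpy as np
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

faces = fetch_olivetti_faces()                  # 400 face images, 64x64, flattened in faces.data
X = faces.data                                  # shape (400, 4096)
y = np.loadtxt("smile_labels.txt", dtype=int)   # hypothetical 0/1 labels: not smiling / smiling

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="linear").fit(X_train, y_train)
print("Smiling / not-smiling accuracy:", clf.score(X_test, y_test))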

2.4 Advantages of Proposed System


• Users do not have to select songs manually.
• No need for a playlist.
• Users do not have to classify the songs based on emotions.

OVERVIEW OF MODULES

The overall system can be divided into various logical stages: capturing an image, detecting the face, detecting landmark points on the detected face, classifying those feature points with the help of an SVM classifier, and then generating the playlist according to the recognized mood. During the training phase a dataset of images is created to train the SVM classifier, while after the system is deployed a single captured image can be given to the trained SVM file to predict the mood.
The different moods into which the system classifies the images are happy, neutral, and sad. The system pre-sorts the songs according to their genre into the above-mentioned categories. Using these pre-sorted lists and some heuristics, such as the most frequently played songs, personalized playlists are created.
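As a purely illustrative sketch of these stages (every file name below is an assumption, the landmark step is collapsed into classifying the resized face pixels directly, and the trained SVM is assumed to have been saved with joblib), the flow from a captured frame to a playlist could look like this:

import cv2, glob, random
import joblib   # assumed to have been used to save a trained happy/neutral/sad SVM

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
svm = joblib.load("mood_svm.joblib")                 # hypothetical trained mood classifier
moods = {0: "happy", 1: "neutral", 2: "sad"}

frame = cv2.VideoCapture(0).read()[1]                # capture a single image from the webcam
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
if len(faces):
    x, y, w, h = faces[0]
    face = cv2.resize(gray[y:y+h, x:x+w], (64, 64)).flatten().reshape(1, -1)
    mood = moods[int(svm.predict(face)[0])]
    playlist = glob.glob("songs/%s/*.mp3" % mood)    # songs pre-sorted into mood folders
    random.shuffle(playlist)
    print("Suggested %s playlist:" % mood, playlist[:5])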

There are three modes:
1. Queue Mode
2. Emotion Mode
3. Random Mode

1. Queue Mode
In Queue Mode the files are held in a buffer (queue), and the player plays whatever files have been added to that queue, in order.

2. Emotion Mode
This mode is a novel approach that helps the user to automatically play songs based on his or her emotions: it recognizes the facial emotion of the user and plays songs that match that emotion.
3. Random Mode
In Random Mode the player picks the next song at random, independent of the queue. Emo Player is a music player which plays songs according to the emotion of the user; it aims to provide user-preferred music with emotion awareness and is based on the idea of automating the selection of music. The emotions are recognized using a machine learning method, the support vector machine algorithm.

Block Diagram of MoodyPlayer

Sequence diagram

Dataset creation

In order to gain high accuracy in correctly predicting the mood, properly training the model is a vital step. To obtain a well-trained model, providing a dataset which accurately captures the generalized nature of the human face is also important. The dataset must consist of images of subjects with varied face structures and facial expressions. If the system recognizes and processes ethnically different facial structures it will be more generic in nature and can be effectively generalized with high performance.
For the proposed system to work, the dataset must contain images of frontal human faces only. The dataset contains only faces of adult male and female subjects aged between 18 and 36 years, portraying different emotions. Based on the above criteria, a selection was made from multiple databases: the Karolinska Directed Emotional Faces (KDEF) [12], the Taiwanese Facial Expression Image Database [13], and The Japanese Female Facial Expression (JAFFE) Database [14]. Along with these, an additional database was created, named the Indian Facial Emotion Expression Database (IFEED); collectively, all of this forms a single dataset which has been categorized into three different emotions, i.e. happy, neutral, and sad, and labelled accordingly.
Conventionally a dataset is divided into a training set and a testing set using some rule of thumb, but the proposed system uses K-fold cross-validation. To build a generic dataset, certain criteria were imposed during selection: 1) a healthy ratio of male and female subjects, with 80 male and 102 female images; 2) diversity in age, i.e. a broad age range from 18 to 36; 3) care was also taken to represent a good number of subjects from varied geographical regions, i.e. Indian, Japanese, South Asian, and South American. After applying the above criteria, the dataset contains 1050 images in total, with 350 in each of the three moods.
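As a hedged illustration of K-fold cross-validation on such a dataset (X and y are assumed to already hold the 1050 feature vectors and their happy/neutral/sad labels; the classifier choice is illustrative):

from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# X: feature vectors for the 1050 images, y: their mood labels (assumed to be prepared already)
scores = cross_val_score(SVC(kernel="linear"), X, y, cv=5)   # 5-fold cross-validation
print("Accuracy per fold:", scores, "Mean accuracy:", scores.mean())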

3.2 Face detection


It is important for an image to be well lit and recognizable for the system to detect the face. The system should read faces in low light, but if the light is below a certain accepted limit the system prompts the user to retake the picture, as no face is detected.
The image is divided into fixed-size blocks. Histogram of Oriented Gradients (HOG) features are extracted from the blocks overlapping the detector window, and the extracted vectors are fed as input to a linear SVM for object/non-object classification, with the extracted face obtained as the result.
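As a hedged example, dlib ships a pre-trained frontal-face detector built on exactly this HOG + linear SVM scheme (offered here as an illustration; the source code listed later in this report uses OpenCV's Haar cascade instead):

import cv2
import dlib

detector = dlib.get_frontal_face_detector()      # pre-trained HOG + linear SVM detector

frame = cv2.VideoCapture(0).read()[1]
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
rects = detector(gray, 1)                        # 1 = upsample once to find smaller faces
for r in rects:
    face = gray[r.top():r.bottom(), r.left():r.right()]  # crop the detected face region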

3.3 Feature extraction

When the system takes an image as input, it is very important to devise an algorithm to detect emotions. This detection of emotions can be achieved by extracting features of the face: for example, if the lips are stretched and make an extended 'v'-type structure, it means the person is very happy; similarly, if the eyebrows are arched and the lips are in the shape of a small 'n', it amounts to a sad emotion. Permutations and combinations like those stated above determine the appropriate emotion from the face and classify it into the preset emotion categories.
Feature extraction is a necessary stage for predicting the correct mood of the user, so the proposed system uses an ensemble of regression trees to obtain facial landmark positions from a sparse subset of pixel intensities. For robust performance with high-quality prediction, gradient boosting is used for training the ensemble of regression trees, minimizing the effect of misclassified and partially labelled data. There are in total 52 landmark points on the face that the system detects, and these points are the data on the basis of which the mood is predicted. These extracted feature points are then passed to the SVM for classification.
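The ensemble-of-regression-trees landmark detector described above is available, for example, through dlib's shape predictor (Kazemi and Sullivan's method). The commonly distributed model file provides 68 landmarks rather than the 52 points mentioned above, so the sketch below is an illustration under that assumption, not the report's exact setup:

import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Ensemble-of-regression-trees model; the widely available public file has 68 landmarks
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def landmark_vector(gray_image):
    """Return the flattened (x, y) landmark coordinates of the first detected face."""
    rect = detector(gray_image, 1)[0]
    shape = predictor(gray_image, rect)
    points = np.array([[p.x, p.y] for p in shape.parts()], dtype=np.float32)
    return points.flatten()          # 1-D feature vector handed to the SVM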

3.4 Classification
Classification is a general process related to categorization; it is the action or process of grouping things into categories. The system is capable of classifying images into different emotions. For the classification of images, the system uses a Support Vector Machine (SVM). There are two phases in preparing an efficient classifier.
3.4.1 Training Phase: The training dataset is divided into three classes for the three moods, i.e. happy, neutral, and sad; for each of these the corresponding mood value +1, +2, or +3 is assigned in the training file as the correct label for the model to train on. Providing each feature point along with its correct label is most important here, as the learning of the model is supervised. The feature points are the positions of the landmarks on the face calculated in the previous feature extraction step; these feature points are converted into a single dimension from the two-dimensional X, Y coordinate system.
3.4.2 Testing Phase: In the testing phase, a similar input is given to the trained model as in the training phase, with the exception that the correct label is not known to the system while predicting the mood; it is used only to check the accuracy of the predicted result.
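A hedged sketch of these two phases with scikit-learn, assuming X_train/X_test hold the flattened landmark vectors produced by a helper like the one sketched in the previous section, and y_train/y_test hold the labels 1 = happy, 2 = neutral, 3 = sad:

from sklearn.svm import SVC

# 3.4.1 Training phase: landmark vectors paired with their correct labels (supervised learning)
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

# 3.4.2 Testing phase: the true labels of X_test are used only to measure accuracy
predicted = clf.predict(X_test)
print("Test accuracy:", (predicted == y_test).mean())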

Mood detection phases of MoodyPlayer


The figure depicts the system's overview: it processes an image from the IFEED database, detects the face using the HOG technique, then detects the facial landmark points and classifies them using the SVM classifier to predict the mood (here, happy).

3.5 Music classification

In this stage, the predicted mood of the user is used to classify songs and to create a playlist for that particular mood. Songs are categorized into three groups representing the previously specified moods. This classification of songs happens on the basis of the genre of the songs. For experimentation only English songs were considered, which were freely available to download and use. These songs were manually categorized into the three groups mentioned above.

SOFTWARE AND HARDWARE REQUIREMENTS

Hardware Requirements:
The most common set of requirements defined by any operating system or software application is the physical computer resources, also known as hardware. The hardware requirements for this project are:

• Intel i3 processor
• 4 GB RAM
• Webcam
• Speaker

Software Requirements:

Software requirements deal with defining the software resources and prerequisites that need to be installed on a computer to provide optimal functioning of an application. These prerequisites are generally not included in the software installation package and need to be installed separately before the software itself is installed. The software requirements for this project are:

• Python 2.7
• OpenCV 3.1

SYSTEM IMPLEMENTATION

Detecting emotions

Collecting data

Facial expression detection with Fisherface works with the help of trained models. The reason behind this is to allow the user to build a dataset according to their use. If we take a huge dataset of around 25-30k images it will no doubt give good accuracy, but consider the situation where the users of the device are only a few people. In such a condition, if we take a precise dataset of around 400-450 images related to those users as input, it will also give good accuracy, with the benefit of a smaller dataset and less memory needed to operate. A smaller amount of data also gives output faster, which results in a quicker response time. Here we first tried the Cohn-Kanade dataset and then classified it according to our needs to train our model.
   
Loading and saving trained model

For training, we have used the Fisherface method of the cv2 library.

For training the data model we have written a Python script which grabs all the classified images from the folders and maps each one to its emotion. These data are stored in a dictionary, and the .train method is then used to train the model.

To save the model for later use we call the .save method.

At detection time we first load the model into memory using the .read method.

The prediction of the result is based on the prediction and confidence values which the .predict method returns.
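A condensed, hedged sketch of this train/save/load/predict cycle with OpenCV's FisherFaceRecognizer (the project's full code is listed later in this report; the folder layout and file names here are illustrative, and all training images must share the same size):

import cv2
import glob
import numpy as np

emotions = ["angry", "happy", "sad", "neutral"]
fishface = cv2.face.FisherFaceRecognizer_create()

# Train: grab classified grayscale images folder by folder and map them to a label index
data, labels = [], []
for idx, emotion in enumerate(emotions):
    for path in glob.glob("dataset/%s/*.jpg" % emotion):
        data.append(cv2.imread(path, cv2.IMREAD_GRAYSCALE))
        labels.append(idx)
fishface.train(data, np.asarray(labels))
fishface.save("model.xml")                      # persist the trained model

# Predict: reload the model and classify a new face crop of the same size as the training images
fishface.read("model.xml")
face = cv2.imread("test_face.jpg", cv2.IMREAD_GRAYSCALE)
prediction, confidence = fishface.predict(face)
print("Detected emotion:", emotions[prediction], "confidence:", confidence)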

Haarcascade model

The Haar cascade model is a precise, pre-trained face detection model provided by OpenCV. It returns the (x, y) coordinates of the top-left corner of the face frame, together with its width and height measured from those coordinates.

The .detectMultiScale() method is capable of detecting multiple faces, and it returns an array with one set of coordinates for each detected face.

The arguments have been set according to the threshold we need for our checking purpose; we have set them such that they do not affect the model accuracy.

Result Calculation
In our model we do not rely on a single image for testing. While the code runs it takes around 10 images in a short time (1-2 seconds), computes a result for each of them, and reports the emotion that occurs most often among those results. Apart from that, we have written two versions of the code: one works on a single face at a time, while the other works with multiple faces in the image.

Machine Learning

Fisherface ML algorithm
The Fisherface algorithm works on the basis of the LDA and PCA concepts. Linear discriminant analysis (LDA) is a supervised learning method of machine learning; supervised learning means we train the model on data for which the answers are also given. It works on the concept of dimensionality reduction, which reduces the execution time of classification.

Principal Component Analysis (PCA) is a kind of transformation from correlated variables to uncorrelated ones, expressed as mathematical values. It is mostly used for exploring the data and, from that, generating models by probabilistic calculation. The flow of Fisherface is as follows: it takes the classified images, reduces the dimensionality of the data, and, by calculating statistical values according to the given categories, stores numeric values in an .xml file. During prediction it calculates the same values for the given image, compares them with the computed dataset values, and gives the corresponding result with a confidence value.
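To see the PCA-then-LDA idea behind Fisherface conceptually, here is a hedged scikit-learn sketch (not the cv2 implementation actually used in this project; X_train, y_train, and X_test are the arrays assumed in the earlier sketches):

from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

# PCA first removes correlated pixel dimensions; LDA then separates the emotion classes
fisher_like = make_pipeline(PCA(n_components=100), LinearDiscriminantAnalysis())
fisher_like.fit(X_train, y_train)
print("Predicted emotion label:", fisher_like.predict(X_test[:1])[0])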

Resizing images
Whatever images we choose for the dataset, the size matters for getting a precise output. The size is chosen such that the model is able to easily distinguish the face in the image with the Haar cascade model. The size we get from a real-time scan is not always exactly the same as the dataset images (the difference is very small), so we resize it to the exact size of the model data; in our case we have chosen 350x350.

In this method we crop the image using the parameters given by the Haar cascade (clahe_image[]) and use cv2's .resize() method to scale it to the given size. Finally, we store those images in a dictionary and, after a certain count (10), take them to check the result.

Gray scaling images


Gray scaling is needed by the method: because of the contrast and the shading of the face it provides, it helps the algorithm produce its output.

Face detection

As given in the code, the grab_face() method is used to get the images, perform all the operations, and finally return the cropped, grayscaled face values in a dictionary.

Train and predict methods

This code is used to get the prediction and confidence values for the given number of images. The max function is then applied to the obtained outputs, and the final result is shown to the user.

Playing music

Detected emotions

We have implemented the linking of Python with JavaScript through the Eel library, which gives us the privilege of accessing Python methods from JS and vice versa. The starting flow is in the Python code, as the library is implemented in Python; it then transfers control to the HTML and JS. According to the result, we show emoticons.
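A minimal hedged sketch of this Python-JavaScript bridge with Eel (folder and function behaviour are illustrative; the project's real code is listed later in this report):

# emotion_server.py -- minimal Eel bridge (illustrative)
import eel

eel.init("web")                      # folder containing main.html and code.js

@eel.expose                          # makes getEmotion() callable from JavaScript
def getEmotion():
    return "happy"                   # a real implementation would run the classifier here

eel.start("main.html")               # opens the web UI and blocks

On the JavaScript side the exposed function is awaited as let value = await eel.getEmotion()(); exactly as in the code.js listing later in this report.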

 
Emoticons: sad, happy, angry, neutral
We have chosen these four emotions, and according to them the emotion directories used for playing songs are classified.

Methods for playing songs

In the JavaScript file we have implemented several methods for switching songs, covering three cases:
1. Queue
2. Based on emotion
3. Random
The first one is implemented the way a queue works. In the second one we call the Python code to get the emotion from the user's facial expression, and according to that the next song is chosen (randomly, from within that emotion's group) and played. In the third one we directly use the random function. All the methods are dynamic and can handle changes in the number of songs accordingly.

HTML, CSS and JS concepts for online music player.


As we know, CSS gives the player a polished look, and through JS we can interact with the user, so it does not feel like a complicated program running in a console; it also gives the user the freedom to choose any song to play.

TESTING AND EVALUATION



The evaluation of the software indicates that the primary objective of the system has been achieved: using multiple techniques to perform emotion recognition and determining which has the highest success rate. This also suggests that the music classification section of the system has been achieved. Hence the system uses the detected emotion to classify and sort out playlists for a user.

Testing of the developed system was done with around 160 images from various databases. The images were tested using various feature extractors and the two different classifiers. Each technique produced different results. The expressions considered were:
1. Neutral
2. Happy
3. Sad
4. Angry

6.1 During Development:


During the period of development, thorough unit testing occurred after new features had been implemented at the end of each sprint cycle. Testing was done on various subjects, and the observations and comments were taken into account for future development.
6.2 Test Driven Development:
The entire system was designed with a test-driven development approach. Final test cases of people portraying particular emotions were predetermined. A set of 160 images was used to test the system in the end, with 40 images per emotion to test the system thoroughly.
6.3 Results and Analysis:
The following section details a few of the important findings of the system. Results from the basic use of SIFT and SURF were not very successful, but when these two feature detectors were coupled with the Bag of Words model, the results were much more accurate. PCA with BRIEF showed quite good results, but the highest accuracy rates were given by PCA with the Harris edge detector, where the Euclidean distance was found between selective features from the face (figure 2) and the mean face produced by the PCA algorithm.

This section first compares the face localization accuracy for a full frontal face with that for a sideways-facing person. The Viola-Jones technique (discussed in section 4.1.1) is extremely accurate in localizing a frontal face in a picture or a video, but problems arise when the user looks sideways. The table below compares the accuracy of local binary patterns and Viola-Jones in face localization for frontal faces and for sideways-facing subjects.

A total of 80 subjects were tested: 40 were frontal facing and 40 were facing sideways, either to the left or the right.

                    Local Binary Patterns    Viola-Jones
Frontal Face                 40                  39
Sideways Face                17                  14

Source Code :
Capture.py
import cv2
import argparse
import time
import os
import Update_Model
import glob
import random
import eel
#import winsound

frequency=2500
duration=1000

eel.init('WD_INNOVATIVE')
emotions=["angry", "happy", "sad", "neutral"]
fishface = cv2.face.FisherFaceRecognizer_create()
font = cv2.FONT_HERSHEY_SIMPLEX
'''try:
    fishface.load("model.xml")
except:
    print("No trained model found... --update will create one.")'''

parser=argparse.ArgumentParser(description="Options for emotions based music player(Updating the model)")
parser.add_argument("--update", help="Call for taking new images and retraining the model.", action="store_true")
args=parser.parse_args()
facedict={}
video_capture=cv2.VideoCapture(0)
facecascade=cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

def crop(clahe_image, face):
    for (x, y, w, h) in face:
        faceslice=clahe_image[y:y+h, x:x+w]
        faceslice=cv2.resize(faceslice, (350, 350))
        facedict["face%s" %(len(facedict)+1)]=faceslice
    return faceslice

def grab_face():
    ret, frame=video_capture.read()
    #cv2.imshow("Video", frame)
    cv2.imwrite('test.jpg', frame)
    cv2.imwrite("images/main%s.jpg" %count, frame)
    gray=cv2.imread('test.jpg',0)
    #gray=cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    clahe=cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
    clahe_image=clahe.apply(gray)
    return clahe_image

def detect_face():
    clahe_image=grab_face()
    face=facecascade.detectMultiScale(clahe_image, scaleFactor=1.1, minNeighbors=15, minSize=(10, 10), flags=cv2.CASCADE_SCALE_IMAGE)
    if len(face)>=1:
        faceslice=crop(clahe_image, face)
        #return faceslice
    else:
        print("No/Multiple faces detected!!, passing over the frame")

def save_face(emotion):
    print("\n\nLook "+emotion+" untill the timer expires and keep the same emotion for some time.")
    #winsound.Beep(frequency, duration)
    print('\a')

    for i in range(0, 5):
        print(5-i)
        time.sleep(1)

    while len(facedict.keys())<16:
        detect_face()

    for i in facedict.keys():
        path, dirs, files = next(os.walk("dataset/%s" %emotion))
        file_count = len(files)+1
        cv2.imwrite("dataset/%s/%s.jpg" %(emotion, (file_count)), facedict[i])
    facedict.clear()

def update_model(emotions):
    print("Update mode for model is ready")
    checkForFolders(emotions)

    for i in range(0, len(emotions)):
        save_face(emotions[i])
    print("Collected the images, looking nice! Now updating the model...")
    Update_Model.update(emotions)
    print("Model train successful!!")

def checkForFolders(emotions):
    for emotion in emotions:
        if os.path.exists("dataset/%s" %emotion):
            pass
        else:
            os.makedirs("dataset/%s" %emotion)

def identify_emotions():
    prediction=[]
    confidence=[]

    for i in facedict.keys():
        pred, conf=fishface.predict(facedict[i])
        cv2.imwrite("images/%s.jpg" %i, facedict[i])
        prediction.append(pred)
        confidence.append(conf)
    output=emotions[max(set(prediction), key=prediction.count)]
    print("You seem to be %s" %output)
    facedict.clear()
    return output

#songlist=[]
#songlist=sorted(glob.glob("songs/%s/*" %output))
#random.shuffle(songlist)
#os.startfile(songlist[0])

count=0
@eel.expose
def getEmotion():
    count=0
    while True:
        count=count+1
        detect_face()
        if args.update:
            update_model(emotions)
            break
        elif count==10:
            fishface.read("model.xml")
            return identify_emotions()
            break

#eel.start('main.html', options=web_app_options)
#options={'host':'file', 'port': '//'}

eel.start('main.html')#//WD_INNOVATIVE//main.html')
#, options)

'''van Gent, P. (2016). Emotion Recognition With Python, OpenCV and a Face Dataset. A
tech blog about fun things with Python and embedded electronics. Retrieved from:
http://www.paulvangent.com/2016/06/30/making-an-emotion-aware-music-player/
'''
display.py
import tkinter as tk
import cv2
from PIL import Image, ImageTk

width, height = 800, 600

cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)

root = tk.Tk()
root.bind('<Escape>', lambda e: root.quit())
lmain = tk.Label(root)
lmain.pack()

def show_frame():
    _, frame = cap.read()
    frame = cv2.flip(frame, 1)
    cv2image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGBA)
    img = Image.fromarray(cv2image)
    imgtk = ImageTk.PhotoImage(image=img)
    lmain.imgtk = imgtk
    lmain.configure(image=imgtk)
    lmain.after(10, show_frame)

show_frame()
root.mainloop()

Update_model.py

import numpy as np
import glob
import random
import cv2

fishface=cv2.face.FisherFaceRecognizer_create()
data={}

def update(emotions):
    run_recognizer(emotions)
    print("Saving model...")
    fishface.save("model.xml")
    print("Model saved!!")

def make_sets(emotions):
    training_data=[]
    training_label=[]

    for emotion in emotions:
        training=sorted(glob.glob("dataset/%s/*" %emotion))
        for item in training:
            image=cv2.imread(item)
            gray=cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            training_data.append(gray)
            training_label.append(emotions.index(emotion))
    return training_data, training_label

def run_recognizer(emotions):
    training_data, training_label=make_sets(emotions)
    print("Training model...")
    print("The size of the dataset is "+str(len(training_data))+" images")
    fishface.train(training_data, np.asarray(training_label))

code.js

var songrun=false;
var count=1;
var mod=1;
var path=["songs\\ban ja rani.mp3"
,"songs\\Banduk meri laila.mp3"
,"songs\\barish.mp3"
,"songs\\haareya.mp3"
,"songs\\ik vari aa.mp3"
,"songs\\main tera.mp3"
,"songs\\mercy.mp3"
,"songs\\musafir.mp3"
,"songs\\o sathi.mp3"
,"songs\\phir bhi.mp3"];

var sname=["Ban Ja tu meri Rani",


"Banduk Meri Laila",
"Barish",
"Haareya",
"Ik vari aa",
"main tera boyfriend",
"mercy",
"musafir",
"o sathi",
"Phir Bhi"
];

var sd=["Artist: Guru Randhawa<br>Movie: Tumhari Sulu<br>Released: 2017",
"Artists: Ash King, Jigar Saraiya<br>Featured artists: Sidharth Malhotra, Raftaar<br>Movie: A Gentleman<br>Released: 2017",
"Artists: Ash King, Shashaa Tirupati<br>Movie: Half Girlfriend<br>Released: 2017<br>Awards: Zee Cine Award for Song of the Year",
"Artist: Arijit Singh<br>Movie: Meri Pyaari Bindu<br>Released: 2017<br>Nominations: Mirchi Music Awards for Best Song Producer - Programming & Arranging",
"Artist: Arijit Singh<br>Movie: Raabta<br>Released: 2017",
"Artists: Arijit Singh, Neha Kakkar, Meet Bros<br>Movie: Raabta<br>Released: 2017<br>Composer(s): Sohrabbudin (Original); Sourav Roy (Revamped).<br>Genre: Dance music",
"Artist: Badshah<br>Released: 2017<br>Nominations: Mirchi Music Awards for Best Song Engineer - Recording and Mixing",
"Artist: KK<br>Movie: Shab<br>Released: 2017",
"Artist: Arijit Singh<br>Movie: Shab<br>Released: 2017",
"Artists: Arijit Singh, Shashaa Tirupati<br>Movie: Half Girlfriend<br>Released: 2017<br>Written: 2001 (lyrics)<br>Lyricist(s): Manoj Muntashir<br>Composer(s): Mithoon"];

var bool=[];
for(var i=0; i<sd.length; i++)
bool[i]=false;

var icon=["images\\\\1.jpg",
"images\\\\2.jpg",
"images\\\\3.jpg",
"images\\\\4.jpg",
"images\\\\5.jpg",
"images\\\\6.jpg",
"images\\\\7.jpg",
"images\\\\8.jpg",
"images\\\\9.jpg",
"images\\\\10.jpg"];

var mood=[["1","2","3"],["4","5"],["6","7","8"],["9","10"]];
var mmm=["1.png","1.png","1.png","2.png","2.png","3.png","3.png","3.png","4.png","4.png"];

var songs=new Array(icon.length);


for (var i = 0; i<icon.length; i++) {
songs[i]=new Array(4);
songs[i][0]=path[i];
songs[i][1]=sd[i];
songs[i][2]=icon[i];
songs[i][3]=mmm[i];
console.log(songs[i][0]);
console.log(songs[i][1]);
console.log(songs[i][2]);
var ins=document.createElement("div");
ins.id='b'+i;
//ins.onclick=function(){
//next(this);
//};
ins.setAttribute("class", "song");
document.body.appendChild(ins);
document.getElementById('b'+i).innerHTML='<div id="pic" style=\'background-image: url(\"'+songs[i][2]+'\");\'> <input type="button" id="'+"a"+i+'" class="play" > <input type="button" id="'+"c"+i+'" class="add"> </div><div id="data"><br><br>'+songs[i][1]+'</div>';
document.getElementById('a'+i).onclick=function(){
play(this);
};
document.getElementById('c'+i).onclick=function(){
addq(this);
};
}

function setmod(elem){
mod=elem.value;
if(!songrun){
if(mod==2)
getTime();
if(mod==3)
rand_play();
}
}

function play(elem){
console.log(elem.id);
var x=elem.id.charAt(1);
var z=songs[x][0];
document.getElementById("sname").innerHTML=sname[x];
document.getElementById("sel").src= z;
document.getElementById("main_slider").load();
document.getElementById("main_slider").play();
document.getElementById("emoji").style.backgroundImage="url('"+songs[x][3]+"')";
songrun=true;
}

var eqc=1;
var sqc=1;

function addq(elem){
console.log(elem.id);
var x=elem.id.charAt(1);
if(!songrun){
var z=songs[x][0];
document.getElementById("sname").innerHTML=sname[x];
document.getElementById("sel").src= z;
document.getElementById("main_slider").load();
document.getElementById("main_slider").play();
document.getElementById("emoji").style.backgroundImage="url('"+songs[x]
[3]+"')";
songrun=true;
return;
}
if(bool[x]==true)
return;

bool[x]=true;
var l=document.createElement("label");
l.id="e"+eqc;
l.name=x;
l.innerHTML=sname[x]+"<br>";
//var text=document.createTextNode(sname[x]+"<br>");
//l.appendChild(text);
document.getElementById("queue").appendChild(l);
eqc=eqc+1;
}

function nextsong(){
if(sqc==eqc){
alert("Queue is empty.");

return;
}
var elem=document.getElementById("e"+sqc);
var xa=elem.name;
var pa=songs[xa][0];
bool[xa]=false;
document.getElementById("sname").innerHTML=sname[xa];
document.getElementById("sel").src= pa;
document.getElementById("main_slider").load();
document.getElementById("main_slider").play();

document.getElementById("emoji").style.backgroundImage="url('"+songs[xa][3]+"')";

songrun=true;
document.getElementById("queue").removeChild(elem);
sqc=sqc+1;
}

function next_in_Q(){
songrun=false;
if(sqc==eqc){
alert("Queue is empty.");
return;
}
var elem=document.getElementById("e"+sqc);
var xa=elem.name;
var pa=songs[xa][0];
document.getElementById("sname").innerHTML=sname[xa];
document.getElementById("sel").src= pa;
document.getElementById("main_slider").load();
document.getElementById("main_slider").play();

document.getElementById("emoji").style.backgroundImage="url('"+songs[xa][3]+"')";
songrun=true;

document.getElementById("queue").removeChild(elem);
sqc=sqc+1;
}

function rand_play(){
var index=Math.random()*path.length;
index=parseInt(index);
var pa=songs[index][0];
document.getElementById("sname").innerHTML=sname[index];
document.getElementById("sel").src= pa;
document.getElementById("main_slider").load();
document.getElementById("main_slider").play();
document.getElementById("emoji").style.backgroundImage="url('"+songs[index]
[3]+"')";
songrun=true;

}
function moody(val){
var index=Math.random()*mood[val].length;
index=parseInt(index);
var pa=songs[mood[val][index]-1][0];
document.getElementById("sname").innerHTML=sname[mood[val][index]-1];
document.getElementById("sel").src= pa;
document.getElementById("main_slider").load();
document.getElementById("main_slider").play();
document.getElementById("emoji").style.backgroundImage="url('"+songs[mood[val]
[index]-1][3]+"')";
songrun=true;
}

async function getTime() {


let value = await eel.getEmotion()();
if(value=="angry")
moody(0);
else if(value=="happy")

moody(1);
else if(value=="sad")
moody(2);
else
moody(3);
}

main.html
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>musical Arena</title>
<script type="text/javascript" src="/eel.js"></script>
<link rel="stylesheet" type="text/css" href="header.css">
<link rel="stylesheet" type="text/css" href="background.css">
<link rel="stylesheet" type="text/css" href="player.css">
<style>
body{
margin: 0;
}
</style>
</head>

<body>
<br><br><br><br><br>
<div id="first">
<h1 font-family="verdana" align="center" style="font-size: 40px;color: white;margin: 0;">Music Player</h1>
</div>
<!--<div id="second" class="second">

<button onclick="myFunction()">

</div>
-->

<script src="code.js"></script>

<div id="queue"><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Queue<input type="button" id="next" onclick="nextsong()"><hr></div>

<div id="third">
<div id="emoji"></div>
<div id="xyz">
&nbsp;&nbsp;&nbsp;Playing :&nbsp;&nbsp;
<label id="sname" align="center">none</label></div>
<div id="mod">mode : Queue-mode <input type="radio" name="mode"
checked="checked" onclick="setmod(this)" value="1"> &nbsp;&nbsp;Emotion-mode <input
type="radio" name="mode" onclick="setmod(this)" value="2"> &nbsp;&nbsp;Random-mode
<input type="radio" name="mode" onclick="setmod(this)"
value="3"></div>
<audio controls id="main_slider">
<source id="sel" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

<script>
document.getElementById("main_slider").onended=function(){
if(mod==1)
next_in_Q();
else if(mod==2){
getTime();
}
else
rand_play();

};

</script>
</div>
</body>
</html>

SCREEN SHOTS

5. CONCLUSION AND FUTURE WORK

The Emotion-Based Music Player is used to automate and give a better music player experience for the end user. The application solves the basic needs of music listeners without troubling them as existing applications do: it uses technology to increase the interaction of the system with the user in many ways. It eases the work of the end user by capturing an image using a camera, determining the user's emotion, and suggesting a customized playlist through a more advanced and interactive system. The user will also be notified of songs that are not being played, to help them free up storage space.

6.1 Future Work

The application can be improved by modifying and adding a few functionalities.

• The current application uses the Affectiva SDK, which has a lot of limitations; creating a custom emotion recognition system that can be merged into the current application would improve the functionality and performance of the system.
• Making the application run without needing an internet connection.
• Including other emotions.
• Playing songs automatically.
• Optimizing the EMO-algorithm by including additional features which help the system to categorize the user based on many other factors, such as location, and suggesting that the user travel to that location and play songs accordingly.

BIBLIOGRAPHY AND REFERENCES

[1] Science about facial expression, 2011. [Online; accessed 11-July-2017].
[2] Average time spent by a person, 2014. [Online; accessed 11-July-2017].
[3] All music, 2017. [Online; accessed 11-July-2017].
[4] AngularJS - framework, 2017. [Online; accessed 11-July-2017].
[5] Cordova - framework, 2017. [Online; accessed 11-July-2017].
[6] CSS - style sheet, 2017. [Online; accessed 11-July-2017].
[7] Emotional recognition - Affectiva, 2017. [Online; accessed 11-July-2017].
[8] Firebase - real time database, 2017. [Online; accessed 11-July-2017].
[9] JavaScript, 2017. [Online; accessed 11-July-2017].
[10] MEAN - software bundle, 2017. [Online; accessed 11-July-2017].
[11] MongoDB, 2017. [Online; accessed 11-July-2017].
[12] Moodfuse, 2017. [Online; accessed 11-July-2017].
[13] MVC - architecture, 2017. [Online; accessed 11-July-2017].
[14] Node.js, 2017. [Online; accessed 11-July-2017].
[15] NoSQL - non SQL, 2017. [Online; accessed 11-July-2017].
[16] Saavn, 2017. [Online; accessed 11-July-2017].
[17] Stereomood, 2017. [Online; accessed 11-July-2017].

[18] Abat, F., Maaoui, C., and Prusk, A. Human-computer interaction using emotion recognition from facial expression. 2011 UKSim 5th European Symposium on Computer Modeling and Simulation (2011).

[19] Viola, P., and Jones, M. Rapid object detection using a boosted cascade of simple features. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), vol. 1, pp. 511-518. IEEE, 2001.

