
Real Time Feedback Generation System using Facial Emotion Recognition

Vengal Rao Guttha, Harish Kumar Kondakindi
Department of Computer Science & Engineering,
National Institute of Technology Delhi,
New Delhi, India
vengalraoguttha@gmail.com, harishkondakindi97@gmail.com

Abstract—Advertisements have come a long way from textual content to graphical content and finally to video advertising, where an advertisement video is played before or while the user is viewing content of his interest on the internet. Feedback for an advertisement or a service is essential for its improvement and is often collected from the user through forms. Instead of asking users to fill up forms, facial expression analysis can be used to generate the feedback without user intervention. For the purpose of emotion detection, a Support Vector Machine has been employed to detect six types of emotions, namely happy, sad, anger, surprise, disgust and neutral. First the face is detected and 68 facial landmark co-ordinates are located on the face, corresponding to the eyes, mouth, nose and jaw. With these landmarks, a total of 15 real valued parameters are calculated, taking into account the neutral face of the user. These real valued parameters are computed as ratios of different distance metrics corresponding to the actual face and the neutral face. Experimental results show that an accuracy of 91.16% has been achieved in recognizing the six emotions. This information is used to generate a visual feedback to the advertiser. Apart from generating feedback for an advertisement, this can also be used to get opinions of people wherever communication or language becomes a hindrance, either due to illiteracy or to not knowing the language.

Keywords—Facial emotion recognition, Feature extraction, Support Vector Machines, Machine learning

I. INTRODUCTION
Advertisements are of great importance for any business activity, as they attract people to use a particular service or product. One important and widely used advertising medium across the globe is the internet, on which video advertising is most common. Here a video is played before a user is about to watch content of his interest. Customer feedback is essential to improving a company's product, its delivery and even its understanding of the users. Most companies know this but struggle to gather enough good feedback beyond the occasional survey. One reason can be the fact that many companies don't actually ask for feedback. And those that do take feedback often make it hard or irritating for users to provide it. For example, many feedback systems work by asking users to fill a form, which they may not be interested in. This paper proposes a way of generating feedback about an advertisement without the active intervention of the user.

Facial expressions are a form of non-verbal communication. They are a primary means of conveying social information. Facial expression analysis can be used to test the impact of any content, product or service that is supposed to stimulate emotional and facial responses. The system proposed by this paper recognizes the emotional state of the customer while he/she is watching an advertisement and then generates the feedback.

II. LITERATURE SURVEY
Tran Son Hai, Le Hoang Thai and Nguyen Thanh Thuy [1] proposed a system for facial expression classification using Artificial Neural Network (ANN) and K-Nearest Neighbor (KNN) techniques. Firstly, ICA is used to extract facial features, which are fed to the neural network. Then certain facial feature ratios are computed and classified using KNN. A minimum function is used to combine the outputs of the ANN and KNN classifiers.

Olga Krestinskaya and Alex Pappachen James [4] used a template matching method for emotion recognition and demonstrated that pixel normalization and feature extraction based on local mean and standard deviation, followed by Min-Max similarity classification, can improve the overall classification rates.

Shoaib Kamal, Farrukh Sayeed and Mohammed Rafeeq [5] proposed a feature extraction technique embedding 2D-LDA and 2D-PCA. SVM and KNN classifiers were used for the emotion classification.

Dev Drume and Anand Singh Jalal [6] proposed a multilevel classification approach where PCA is used at level-1 and SVM is used at level-2 for emotion classification.

III. PROPOSED SYSTEM
In this paper, we propose a system which takes images (frames) of the user through the front camera of the computer. After some preprocessing steps, each image is classified to the emotion which the user exhibits. This process repeats as long as the advertisement video is being played. At the end of the advertisement, a visual feedback is generated.

Figure 1. Proposed System work flow

A. Face and Facial landmark detection
After a frame is captured, it is first converted into a gray scale image in order to make the face detection process computationally efficient. For the task of face detection, a pre-trained model using a Histogram of Oriented Gradients and an SVM linear classifier is applied. After detecting the face in the image, we apply the facial landmark predictor. The facial landmark detector used is an implementation of the 'One Millisecond Face Alignment with an Ensemble of Regression Trees' paper [2]. This model detects 68 co-ordinates that correspond to different facial localities, namely the eyes, eyebrows, mouth, nose and jaw.
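The paper does not name a specific implementation, but the components described here (a HOG plus linear-SVM face detector and the 68-point ensemble-of-regression-trees landmark model of [2]) match those shipped with the dlib library. The following is a minimal sketch of this step under that assumption:

```python
import cv2
import dlib

# HOG + linear SVM face detector and the 68-point landmark model trained
# per Kazemi & Sullivan [2]; the model file name is dlib's standard one.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def detect_landmarks(frame):
    """Return the 68 (x, y) landmark co-ordinates of the first detected face."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # gray scale for efficiency
    faces = detector(gray, 0)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    return [(shape.part(i).x, shape.part(i).y) for i in range(68)]
```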

B. Feature Extraction
Having detected the facial landmark co-ordinates, various distance measures that play a major role in the detection of facial expression are calculated. In order to take into account the actual significance of the distances, we found that taking the ratio of each distance to a constant distance measure is effective, because the parts of the face may scale to different lengths for different persons. It is observed that in the human face, the length of the nose stays constant while the person is giving a facial expression. So the length of the nose (euclidean distance) is taken as the constant distance measure. The following figures show the different distance measures used.

Figure 2. Various distance measures used: 1. inter eyebrow distance, 2. left eye width, 3. right eye width, 4. mouth width, 5. mouth height, 6. nose to lip, 7. constant distance, 8. chin height

Figure 3. Various distance measures used: 9. left eye diagonal, 10. right eye diagonal, 11. left eye slant, 12. right eye slant, 13. left eye to left eyebrow, 14. right eye to right eyebrow, 15. left mouth edge depth, 16. right mouth edge depth

Based on the above distance measures, the following 15 effective ratios were calculated, which are used as the features of the face:

ratio1 = inter_eye_brow_dist / constant_distance
ratio2 = left_eyebrow_to_left_eye / constant_distance
ratio3 = right_eyebrow_to_right_eye / constant_distance
ratio4 = left_eye_diagonal / constant_distance
ratio5 = right_eye_diagonal / constant_distance
ratio6 = left_eye_width / constant_distance
ratio7 = right_eye_width / constant_distance
ratio8 = nose_to_upper_lip / constant_distance
ratio9 = mouth_height / constant_distance
ratio10 = chin_height / constant_distance
ratio11 = left_mouth_edge_depth / constant_distance
ratio12 = right_mouth_edge_depth / constant_distance
ratio13 = mouth_width / constant_distance
ratio14 = left_eyebrow_slant / constant_distance
ratio15 = right_eyebrow_slant / constant_distance
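The paper does not state which of the 68 landmark indices define each distance measure. The sketch below therefore uses illustrative index choices on the standard 68-point layout and shows only a representative subset of the ratios; the remaining ones follow the same pattern.

```python
import math

def dist(p, q):
    """Euclidean distance between two (x, y) landmark points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def extract_ratios(pts):
    """pts: the 68 (x, y) landmark co-ordinates of one face.
    All index choices are assumptions, not taken from the paper."""
    constant = dist(pts[27], pts[33])        # nose length, the constant measure
    return [
        dist(pts[21], pts[22]) / constant,   # ratio1: inter eyebrow distance
        dist(pts[24], pts[44]) / constant,   # ratio2: left eyebrow to left eye
        dist(pts[19], pts[37]) / constant,   # ratio3: right eyebrow to right eye
        dist(pts[42], pts[45]) / constant,   # ratio6: left eye width
        dist(pts[36], pts[39]) / constant,   # ratio7: right eye width
        dist(pts[33], pts[51]) / constant,   # ratio8: nose to upper lip
        dist(pts[51], pts[57]) / constant,   # ratio9: mouth height
        dist(pts[48], pts[54]) / constant,   # ratio13: mouth width
    ]
```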
C. Setting up the Training Data set
The KDEF (Karolinska Directed Emotional Faces) data set, consisting of 4900 images of facial expressions corresponding to 70 individuals, each displaying 7 different expressions photographed from 5 different angles, was selected. We have used only the front facing images depicting 6 facial expressions (happy, sad, anger, surprise, disgust, neutral). Each front facing image is subjected to feature extraction. After getting the 15 ratios corresponding to a face, we also calculate the ratios for a neutral face of the same person. The ratios of the actual parameters to those of the neutral face are calculated for each front facing image in the data set. For example, the following equation shows how one of the parameters of the dataset is created.

Figure 4. Example of one of the 15 parameters stored for an image

A training data set is developed where the label contains the expression of the face and the other parameters are the real valued parameters (ratios of the actual to the neutral face parameters) extracted using the facial landmarks. This resulted in a data set of around 900 instances. However, there were some images in the KDEF database that we found quite ambiguous in emotion; such images mislead the classifier, and we even saw that some people weren't able to recognize the actual emotion by looking at them. Those images seem to represent a sad expression but were labeled as fear. So along with KDEF we used another database, the Extended Cohn-Kanade data set (CK+), from which we used only the neutral and sad images. The features of these images are added to the existing data set created using the KDEF database.

D. Training the SVM Classifier
The data set obtained above is used to train and test the Support Vector Machine classifier. A randomly chosen 80% of the data set is used as the training set and the remaining 20% is used for testing the classifier. We used an SVM with a linear kernel for this purpose.
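A minimal sketch of this split-and-train step, assuming scikit-learn (the paper does not name a library); X and y are hypothetical arrays holding the 15 ratio features per image and the emotion labels:

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def train_emotion_svm(X, y):
    """X: n_samples x 15 ratio features; y: emotion labels."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.20)                     # random 80/20 split
    clf = SVC(kernel="linear", probability=True)  # linear kernel, per the paper
    clf.fit(X_train, y_train)
    print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
    return clf
```

The probability=True option is an assumption; it would also yield the per-emotion probabilities shown later on the live webcam frames.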
E. Video Analysis
After saving the SVM model obtained above, it can be used for analysis during the video. When a person watches an advertisement, the emotion of the viewer is recognized using the SVM model that we trained, tested and saved previously, by accessing the viewer's face through the front camera of a personal computer. Through the front camera we can capture frames periodically, analyse them and store the results for further interpretation. The process works by initially asking the user to register his neutral face before the advertisement begins. The facial features of that neutral face are then extracted and stored. These stored neutral face features are used to calculate the ratios for the actual face, which are used as input to the model to predict the emotion. The following figures show the various processes performed on a frame captured through the front camera.

Figure 4. Sample image
Figure 5. Gray scale image
Figure 6. Bounded box containing the face
Figure 7. Facial landmark co-ordinates
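A sketch of this capture-and-classify loop, reusing the hypothetical detect_landmarks() and extract_ratios() helpers from the earlier sketches; the frame rate s and duration t follow the notation used in Section V:

```python
import time
import cv2

def analyse_advertisement(clf, neutral, t, s=2):
    """clf: trained SVM; neutral: the viewer's registered neutral-face ratios;
    t: advertisement duration in seconds; s: frames captured per second."""
    cap = cv2.VideoCapture(0)            # front camera
    predictions = []
    for _ in range(int(t * s)):          # N = t*s frames in total
        ok, frame = cap.read()
        if not ok:
            break
        pts = detect_landmarks(frame)
        if pts is not None:
            # normalize each actual-face ratio against the neutral face
            feats = [a / n for a, n in zip(extract_ratios(pts), neutral)]
            predictions.append(clf.predict([feats])[0])
        time.sleep(1.0 / s)              # periodic capture
    cap.release()
    return predictions                   # stored for feedback generation
```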
IV. RESULTS
The developed system was tested using the randomly chosen 20% of instances held out from the created dataset. An accuracy of 91.16% has been achieved in recognizing the six emotions (happy, sad, anger, disgust, surprise and neutral).

TABLE I. ACCURACY ACHIEVED IN RECOGNIZING EMOTION

Emotions recognized                              Accuracy achieved (%)
happy, sad, anger, disgust, surprise, neutral    91.16
happy, surprise, disgust, anger                  97.24
happy, surprise, disgust, anger, neutral         95.23
The following images show different instances of the live web camera feed upon which the system was applied. The numbers on the images represent the probability of the viewer expressing the said emotion.

Figure 8. Happy emotion
Figure 9. Surprise emotion
Figure 10. Disgust emotion
Figure 11. Anger emotion
Figure 12. Sad emotion
Figure 13. Neutral emotion

V. GENERATION OF FEEDBACK
After recognizing the emotion of the viewer throughout the advertisement, this information can be used to generate the feedback. Let s be the rate at which frames are captured in real time and let the duration of the advertisement video be t seconds. Then the number of frames subjected to emotion recognition is

N = t*s

For example, capturing s = 2 frames per second over a t = 30 second advertisement gives N = 60 frames. Therefore N frames will be used to predict the overall emotion of the viewer. The following methods can be used to analyse and predict the overall emotion.

A. Method I
The emotion that was expressed for the maximum amount of time during the advertisement can be thought of as the overall emotion of the viewer towards the advertisement. Let (a,b,c,d,e,f) be a vector with a representing the fraction of time for which the happy emotion is expressed, and similarly b for sad, c for neutral, d for surprise, e for disgust and f for anger. Then

(a,b,c,d,e,f) = ( i/N, j/N, k/N, l/N, m/N, n/N )

where i represents the number of frames in which happy is detected, and similarly j for sad, k for neutral, l for surprise, m for disgust and n for anger.

B. Method II
In this method, different instances of the advertisement video are given different weights, i.e., any emotion during a particular interval of time can be given more importance (weight) than the others. Then, as in the first method,

(a,b,c,d,e,f) = ( Σ happy_i / W, Σ sad_j / W, Σ neutral_k / W, Σ surprise_l / W, Σ disgust_m / W, Σ anger_n / W )

where each sum in a numerator runs over the weights of the instances in which the corresponding emotion is expressed, and W represents the sum of the weights over every instance. After this, the emotion whose fraction is largest is taken as the overall emotion of the viewer. If only a particular instance or interval is of interest, then those weights should be non-zero and the other weights set to zero. It can be observed that Method I is a special case of Method II where equal weight (importance) is given to every instance of the video.
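Both methods reduce to a single weighted vote over the stored per-frame predictions. A sketch, with Method I as the unit-weight special case (function and parameter names are hypothetical):

```python
from collections import defaultdict

def overall_emotion(predictions, weights=None):
    """predictions: per-frame emotion labels from the video-analysis loop;
    weights: importance of each frame (None = Method I, all weights equal)."""
    if weights is None:
        weights = [1.0] * len(predictions)
    W = sum(weights)
    fractions = defaultdict(float)            # the (a, b, c, d, e, f) components
    for emotion, w in zip(predictions, weights):
        fractions[emotion] += w / W
    return max(fractions, key=fractions.get)  # emotion with the largest fraction
```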

CONCLUSION
The main motive of this paper is to implement the use of facial expression recognition to generate feedback for advertisements. Hectic form filling has been replaced by the automatic recognition of opinions. Apart from advertisements, this approach can also be used for other applications where quick reviews or opinions are required. The proposed system was able to process the images, recognize the emotions with decent accuracy and generate feedback in a form that can be used to analyze the user's opinion of the advertisement. In future we would like to focus on improving the emotion recognition rate for side facing images and also on improving the robustness of the system for images with glasses, facial hair, etc. These features, when incorporated into the existing system, would make the overall feedback generation system very effective.

ACKNOWLEDGMENT
We would like to express our gratitude to Ms. Vidushi Bhatti, our mentor, for her guidance and support throughout the project.
REFERENCES
[1] Tran Son Hai, Le Hoang Thai and Nguyen Thanh Thuy, "Facial Expression Classification Using Artificial Neural Network and K-Nearest Neighbor", IJITCS, vol. 7, no. 3, pp. 27-32, 2015. doi: 10.5815/ijitcs.2015.03.04
[2] V. Kazemi and J. Sullivan, "One millisecond face alignment with an ensemble of regression trees", 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, 2014, pp. 1867-1874. doi: 10.1109/CVPR.2014.241
[3] A. N. Sreevatsan, K. G. Sathish Kumar, S. Rakeshsharma and Mohd. Mansoor Roomi, "Emotion recognition from facial expressions: a target oriented approach using neural network", 4th Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), pp. 497-502, December 2004.
[4] O. Krestinskaya and A. P. James, "Facial emotion recognition using min-max similarity classifier", 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Udupi, 2017, pp. 752-758. doi: 10.1109/ICACCI.2017.8125932
[5] S. Kamal, F. Sayeed and M. Rafeeq, "Facial emotion recognition for Human-Computer Interactions using hybrid feature extraction technique", 2016 International Conference on Data Mining and Advanced Computing (SAPIENCE), Ernakulam, 2016, pp. 180-184. doi: 10.1109/SAPIENCE.2016.7684129
[6] D. Drume and A. S. Jalal, "A multi-level classification approach for facial emotion recognition", 2012 IEEE International Conference on Computational Intelligence and Computing Research, Coimbatore, 2012, pp. 1-5. doi: 10.1109/ICCIC.2012.6510279
