
International Academy of Engineering and Medical Research, 2016
Volume-1, ISSUE-1
Published Online October-November 2016 in IAEMR (http://www.iaemr.com)

Real Time Face Expression Recognition of Children with Autism

Suzan Anwar
Department of Computer Science
University of Arkansas at Little Rock
Little Rock, Arkansas, USA
sxanwar@ualr.edu

Mariofanna Milanova
Department of Computer Science
University of Arkansas at Little Rock
Little Rock, Arkansas, USA
mgmilanova@ualr.edu

Abstract— People can accurately identify a familiar face and understand a facial expression in a single glance. However, children with autism spectrum disorder (ASD) often have problems communicating with their parents, teachers, and other kids. In this paper, we present an innovative system to recognize the facial expressions of children with ASD during playtime. Children are observed while playing or using their tablets or laptops while the researchers track the child's facial expressions. We have implemented an Active Shape Model (ASM) tracker which tracks 116 facial landmarks via web-cam input; the tracked landmark points are used to extract facial expression features that are classified with a Support Vector Machine (SVM), which makes our system more robust by recognizing seven expressions rather than only six as in most facial expression systems. The proposed system is applied to the Child Affective Facial Expression (CAFE) set, and we obtained 93% classification accuracy. In addition, another experiment has been carried out in which children performed all 7 expression classes; the general success rate for 4 of these classes has been observed to be 100%.

Keywords— Face Expression, Face Detection, SVM, Expression Classification, Autistic, CAFE, Real Time, ASM, PCA.

I. INTRODUCTION

The analysis of facial expressions has a significant importance in fields such as verbal and non-verbal expression and human-computer interaction. Since our facial expressions are a mirror of our emotions, the terms facial expression recognition and emotion recognition are used interchangeably in the literature. Numerous methods have been developed recently in the field of vision-based automated expression recognition; Fasel and Luettin and Pantic and Rothkrantz have surveyed these studies in detail [1, 2]. As Paleari et al. suggest, it is also possible to develop multi-modal emotion recognition by using voice data in addition to visual data [3]. Furthermore, brain signals can also be used in emotion recognition: Murugappan et al. recognized emotions such as happiness, sadness, surprise and fear using time-frequency analysis of EEG data [4].

The focus of this study is the use of visual data in emotion recognition. Effective detection from face images and videos forms the basis of all facial expression recognition research, and the success of such systems depends on robust detection of facial points. Arı and Akarun classified both head movements and facial expressions by developing facial triangulation-point tracking based on a high-resolution, multi-pose active object model and classifying the head trajectory information with a hidden Markov model [5]. Akakın and Sankur used, as their descriptors, independent component analysis of the trajectories obtained by tracking 17 facial triangulation points and the 3D discrete cosine transform (DCT) of the space-time cube obtained by aligning faces in sequence [6]; when assessed with various classification methods, the best result was obtained with the 3D DCT. Kumano et al. suggested a learning model for each emotion based upon variable-intensity templates and enabled face detection independent of the pose [7]. Sebe et al. studied facial expressions with Bayesian networks, support vector machines and decision trees and made the database they worked on available to researchers [8]. Littlewort et al., on the other hand, suggested the systematic use of AdaBoost, support vector machines and linear discriminant analysis [9]. Shan et al. used local binary pattern statistical features, classified the obtained attributes with various machine learning techniques, and reported that support vector machines attained the best result [10]. Busso et al. used commercial software for facial point tracking, divided the face into five areas of interest, calculated principal component analysis (PCA) coefficients of the facial point locations for each region, and classified the resulting attribute vector with a 3-nearest-neighbor classifier [11].

Aggarwal and Shaohua Wan introduced an automatic facial emotion recognition method that relies on metric learning. A new metric space was learnt in which groups of data points have a higher probability of being in the same class, and they also used specificity and sensitivity to identify the annotation reliability of each annotator [12]. Mao et al. presented a real time emotion recognition system using Kinect sensors to extract both 2D and 3D facial emotion features; they combined the point-location features detected by the Kinect with animation units [13].

In this paper, we present a real time facial emotion recognition system. In detail, we make the following contributions. First, we extract 116 facial features from a PC webcam, recorded video, or still images, and we obtain precise localization of the facial features by proposing a multi-resolution active shape model. Secondly, we reduce the dimension of the input data (features) by applying principal component analysis (PCA); because of its superior classification performance and its ability to deal with high dimensional input data, SVM is the choice of classifier in this paper for facial emotion recognition. Finally, we enable autistic children to play while the proposed system captures and analyzes their emotions, to help them make positive progress with their parents, teachers, and therapists.

The paper begins with a description of the proposed system, including face detection and triangulation point tracking, and proceeds with the derivation of attributes and the emotion classification, which constitute the rest of the system, in Sections II and III. The experiments and the results obtained are given in Section IV, and Section V summarizes the conclusions.

II. PROPOSED SYSTEM

The proposed system pursues a different track and has three main steps, as shown in Fig. 1, which presents the block diagram of the proposed system. The first step is the detection of facial triangulation points using a tracker based on a multi-resolution active shape model. In the second step, local changes in specific regions of the face (forehead wrinkles, eyebrow wrinkles, the distance from the eyes to the eyebrows, wrinkles in the cheeks, and the vertical and horizontal measures of the mouth) are calculated with the help of point location tracking. Finally, a facial expression is identified according to the distance of the obtained attribute vector to those in the learning set.

A. Proposed System Algorithm

The algorithm of our proposed system recognizes the facial emotion according to the following steps (illustrative sketches of these steps are given after Fig. 1):

1. Detect the face in each frame using the Viola-Jones detector.
2. Capture each frame and create a shape s_i for it, s_i = (x_1, y_1, x_2, y_2, ..., x_L, y_L), i = 1, ..., N.
3. Normalize each shape using Procrustes analysis.
4. Apply Principal Component Analysis (PCA) as in eq. (1),

   C_s e_k = \lambda_k e_k   (1)

   where C_s is the covariance matrix constructed from the normalized shapes, e_k is the k-th eigenvector, and \lambda_k is the k-th eigenvalue.
5. Obtain attributes by identifying specific lengths and regions in the face using the landmarks, applying the Mahalanobis distance of eq. (2) and the Sobel filter, whose gradient magnitude is given in eq. (3),

   D_M(x) = \sqrt{(x - \mu)^T S^{-1} (x - \mu)}   (2)

   |G| = \sqrt{G_x^2 + G_y^2}   (3)

6. Apply the SVM classifier using eq. (4),

   f(x) = \mathrm{sgn}\left( \sum_i \alpha_i y_i K(x_i, x) + b \right)   (4)

Fig. 1. The Proposed System's Block Diagram (Detecting and Training: Face Detection, Facial Feature Tracking; Pre-Processing: Frame Interpolation, Edge Points, Sobel Filtering; Classification: Feature Extraction, SVM Classifier).
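To make steps 1 and 2 concrete, the following is a minimal sketch of the per-frame capture and Viola-Jones face detection stage. It is a sketch only, written in Python with OpenCV rather than the C++/OpenCV implementation described in Section IV, and the `track_landmarks` function is a hypothetical placeholder for the 116-point ASM tracker (asmlibrary), which is not reproduced here.

```python
# Sketch of algorithm steps 1-2: Viola-Jones face detection per frame and
# assembly of the flattened shape vector s_i = (x1, y1, ..., xL, yL).
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def track_landmarks(gray, face_box):
    """Hypothetical placeholder for the 116-point ASM tracker (asmlibrary)."""
    raise NotImplementedError

def capture_shapes(num_frames=100, camera_index=0):
    cap = cv2.VideoCapture(camera_index)
    shapes = []
    for _ in range(num_frames):
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Step 1: Viola-Jones face detection.
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            continue
        # Step 2: build the flattened shape vector from the tracked landmarks.
        landmarks = track_landmarks(gray, faces[0])      # expected shape (116, 2)
        shapes.append(np.asarray(landmarks, float).ravel())
    cap.release()
    return np.array(shapes)                              # (num_kept_frames, 232)
```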

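A sketch of steps 3 and 4, shape normalization followed by the eigen-decomposition of eq. (1), is given below. It assumes the `shapes` array from the previous sketch (one row per frame) and performs only a simplified normalization (translation and scale); a full Procrustes alignment would also remove rotation.

```python
# Sketch of steps 3-4: normalization of each shape and PCA on the normalized
# shapes, i.e. solving the eigenproblem C_s e_k = lambda_k e_k of eq. (1).
import numpy as np

def normalize_shape(shape_vec):
    """Remove translation and scale from a flattened (x1, y1, x2, y2, ...) shape."""
    pts = shape_vec.reshape(-1, 2)
    pts = pts - pts.mean(axis=0)                 # remove translation
    scale = np.sqrt((pts ** 2).sum())
    return (pts / scale).ravel()                 # remove scale

def shape_pca(shapes, num_modes=20):
    norm = np.array([normalize_shape(s) for s in shapes])
    mean_shape = norm.mean(axis=0)
    centered = norm - mean_shape
    cov = np.cov(centered, rowvar=False)         # C_s, built from normalized shapes
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues/eigenvectors of C_s
    order = np.argsort(eigvals)[::-1]            # keep the largest modes first
    return mean_shape, eigvals[order][:num_modes], eigvecs[:, order][:, :num_modes]
```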
B. Face Detection and Tracking of Triangulation Points

The Active Shape Model (ASM) introduced by Cootes et al. is one of the most prevalent techniques for the detection and tracking of triangulation points [14]. In order to identify the triangulation points in an image, the location of the face is first detected with an overall face detector (such as Viola-Jones). The average face shape, aligned according to the position of the face, constitutes the starting point of the search. The steps described below are then repeated until the shape converges.

1. For each point, the best matching position with the template is identified by using the gradient of the image texture in the proximity of that point.
2. The identified points are projected from their locations in the training set onto the shape eigenvectors obtained by Principal Component Analysis (PCA).

Whereas the individual template matchers in the first step may diverge from their sound positions so that the obtained shape may not look like a face, the holistic approach used in the second step strengthens the independent weak models by constraining them and associating them with the shapes in the training set. A multi-resolution scheme is utilized in order to prevent the model from getting stuck in local optima: the search starts from the lowest resolution level in the image pyramid and goes from rough to detailed. When tracking is done in a video, the starting point of a search is taken as the shape in the previous frame instead of the average shape; in the cases where the face is not detected, the search is initiated by overall detection of the face. It has been stated that ASM gives better results when it is trained with a model specific to the person [5]. The proposed system in this research therefore uses, besides a general model which overlays models of different children, models specific to each child. ASM tracking is built on the asmlibrary developed by Wei [15]. Fig. 2 shows the triangulation points for the neutral and surprise facial expressions.

Fig. 2. Facial Triangulation Points for Neutral (left) and Surprise (right) Expressions.

C. Pre-Processing and Region Finding

The ASM-based tracker tracks the 116 triangulation points indicated on the left side of Fig. 3. The tracker works robustly on the eyebrow, eye, chin and nose points; however, since it cannot track the flexible lip points correctly, it is not reasonable to use the locations of these points directly as attributes. There are two reasons for this: the first is ASM's holistic modeling of all triangulation point locations, and the second is the loss of small changes in the location of the lip points due to the constraints imposed on the shape by PCA. Further, the difference in intensity at the lip edges is not as significant as for other face components. Therefore, instead of directly using the locations of the tracked triangulation points, the attributes shown on the right side of Fig. 3 and in the list below are obtained by identifying specific lengths and regions in the face using these points (an illustrative sketch of this attribute computation follows Fig. 3).

1. Distance of the mid-point of the eye gap to the eyebrow mid points.
2. Mouth width.
3. Mouth height.
4. Vertical edge magnitude on the forehead.
5. Horizontal edge magnitude in the mid-forehead.
6. Sum of the vertical and horizontal edge magnitudes in the right cheek.
7. Sum of the vertical and horizontal edge magnitudes in the left cheek.

The first three attributes are obtained from the Mahalanobis distance of the corresponding triangulation points to each other. For the other attributes, the image is first smoothed with a Gaussian kernel and then filtered with the vertical and horizontal Sobel kernels separately to compute the edge magnitudes; next, the mean absolute value corresponding to each region is calculated. The horizontal and vertical edge magnitudes for the disgust expression are shown in Fig. 4. For instance, the average vertical edge magnitude corresponding to the blue box on the forehead constitutes the fourth attribute.

Fig. 3. Facial Landmarks (left) and Attributes that are Used to Extract the Region of Interest (right).
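The sketch below illustrates how the seven attributes can be computed from a grayscale frame and the tracked landmarks, using the Mahalanobis distance of eq. (2) and the Sobel gradient magnitude of eq. (3). The landmark indices, the region boxes, and the inverse covariance matrix `S_inv` (estimated from the training data) are hypothetical placeholders rather than values taken from the paper.

```python
# Sketch of the Section II.C attributes: 1-3 from Mahalanobis distances between
# landmark pairs (eq. 2), 4-7 from Gaussian-smoothed Sobel edge magnitudes (eq. 3)
# averaged over face regions.  Indices and boxes below are illustrative only.
import cv2
import numpy as np

def mahalanobis(p, q, S_inv):
    d = np.asarray(p, float) - np.asarray(q, float)
    return float(np.sqrt(d @ S_inv @ d))                      # eq. (2)

def region_edge_magnitudes(gray, box):
    """Mean vertical-edge, horizontal-edge and total edge responses in a box."""
    x, y, w, h = box
    roi = cv2.GaussianBlur(gray[y:y + h, x:x + w], (5, 5), 0)
    gx = cv2.Sobel(roi, cv2.CV_64F, 1, 0, ksize=3)            # responds to vertical edges
    gy = cv2.Sobel(roi, cv2.CV_64F, 0, 1, ksize=3)            # responds to horizontal edges
    mag = np.sqrt(gx ** 2 + gy ** 2)                          # |G|, eq. (3)
    return np.abs(gx).mean(), np.abs(gy).mean(), mag.mean()

def extract_attributes(gray, landmarks, S_inv, boxes):
    """Build the 7-element attribute vector (landmark indices are hypothetical)."""
    a1 = mahalanobis(landmarks[40], landmarks[20], S_inv)     # eye-gap mid point to eyebrow mid point
    a2 = mahalanobis(landmarks[60], landmarks[66], S_inv)     # mouth width
    a3 = mahalanobis(landmarks[63], landmarks[69], S_inv)     # mouth height
    a4, _, _ = region_edge_magnitudes(gray, boxes["forehead"])
    _, a5, _ = region_edge_magnitudes(gray, boxes["mid_forehead"])
    _, _, a6 = region_edge_magnitudes(gray, boxes["right_cheek"])
    _, _, a7 = region_edge_magnitudes(gray, boxes["left_cheek"])
    return np.array([a1, a2, a3, a4, a5, a6, a7])
```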


For motivation regarding the selection of attributes, the movement descriptions corresponding to each emotional expression given in Table 1 can be examined. These clues are based on the leading research of Ekman and Friesen [16].

III. EMOTION CLASSIFICATION

Since facial expressions differ with the person and the environment (such as the lighting), a mechanism that can be configured for a specific person and a specific environment is targeted here. In this mechanism the child starts with a neutral expression and waits for 2 seconds. During this period an average of the derived attributes is calculated and, in the frames that follow, the attributes are normalized by division by this average in order to make the system act independently of environmental variables.

Each individual who will use the system repeats each expression T times, recording a total of NT samples (N = number of expression classes). During the test phase, the child once more starts with a neutral expression, which brings the system to its initial state and avoids disturbances from the environment. In each subsequent frame, the attribute vector of the face is calculated. The average distance of this attribute vector to each class is calculated, giving the values d_i, i = 1, ..., N. The distance values express differences, whereas the values S_i = e^{-d_i} express similarities and are used as similarity metrics. Finally, the vector S_i is normalized so that the sum of its elements equals 1; these values can be used as a probability for each class.

Because of its superior classification performance and its ability to deal with high dimensional input data, SVM is the choice of classifier in this study for facial expression recognition. The key idea of SVM is to map the original input space into a higher dimensional feature space in order to achieve a linear solution; this mapping is done using kernel functions. (Illustrative sketches of the similarity computation and of the SVM classifier are given below.)

When the ASM needs to be extended for the training of a new child, photographs of the individual are taken and the triangulation points of each photograph are marked up automatically by matching with the general model. After these marked points are fine-tuned with Pinotator [17], an ASM specific to the person is generated. As discussed before, the classifier can easily be trained for a new person and environment, enabling the use of the system for new people.
environment, enabling the use of the system for new people.
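For the SVM classifier itself, the minimal sketch below uses scikit-learn; the paper's implementation is in C++ with OpenCV, and the kernel choice and parameters here are illustrative assumptions rather than the values used by the authors.

```python
# Sketch of the kernel SVM of eq. (4), trained on the recorded attribute vectors.
import numpy as np
from sklearn.svm import SVC

def train_expression_svm(X_train, y_train):
    """X_train: (num_samples, num_attributes) array; y_train: expression labels."""
    clf = SVC(kernel="rbf", C=10.0, gamma="scale", probability=True)
    clf.fit(X_train, y_train)
    return clf

def predict_expression(clf, attributes):
    x = np.asarray(attributes, float).reshape(1, -1)
    label = clf.predict(x)[0]            # multi-class decision built from eq. (4)-style binary SVMs
    probs = clf.predict_proba(x)[0]      # per-class probability estimates
    return label, dict(zip(clf.classes_, probs))
```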

Fig. 4. A Picture of the Disgust Expression (Left) and the Corresponding Vertical Edge Magnitude (Right).

Table 1. Emotional Expressions and Their Descriptions

  Expression   Description
  Surprise     Rise of eyebrows, slight opening of mouth, slight fall of chin
  Anger        Frowning of eyebrows, tightening of lips, standing out of eyes
  Happiness    Rise and fall of mouth edges
  Sadness      Fall of mouth edges and frowning of inner eyebrows
  Fear         Rise of eyebrows, standing out of eyes, slight opening of mouth
  Disgust      Rise of upper lip, wrinkling of nose, fall of cheeks

IV. RESULTS

A. Proposed System Requirements

The proposed system is implemented in Microsoft Visual Studio C++ 2013 with OpenCV. It has been deployed on a 14-inch (1920×1080 pixels) Dell laptop with a quad-core Intel i7 processor, 8 GB of RAM, and a 500 GB hard disk drive running Windows 10. The system has also been tested on a Surface Pro 3 tablet, whose specifications are a 12-inch (2160×1440 pixels) display, an Intel Core i7 processor, 4 GB of RAM, and a 64 GB hard disk running Windows 8.

B. CAFE Set Experiment Results

In order to measure the performance of the proposed system, the Child Affective Facial Expression (CAFE) set [18], some images of which are shown in Fig. 5, is used. The CAFE set consists of 1192 images of male and female children of 2 to 8 years old posing for seven facial expressions: sadness, happiness, surprise, anger, disgust, fear, and neutral. The data information for the CAFE set is available in a sortable Excel file, which has been used to compare and evaluate the results produced by our proposed system. The result of the comparison is shown in Fig. 6; our proposed system achieved

a correctness score of 93%. As can be seen, some expressions are still classified with a significant error. The reason for this error is that children in general do not always have distinctive facial expressions and are not always capable of creating them; when the emotions that are misclassified as one another are investigated, it is observed that the child created similar facial expressions.

Fig. 5. Four Samples from the CAFE Set

Fig. 6. Performance Comparison of Our System with the CAFE Set

Table 2 summarizes the expressions in the CAFE dataset; it shows each expression and the number of images that represent it. As seen in Table 2, the number of images per expression is not uniform: the expression "Neutral" has the largest number of images while the expression "Surprise" has the smallest.

Table 2. Expression Data Information in the CAFE Dataset

  Expression   No. of Images
  Happiness    215
  Anger        205
  Disgust      191
  Fear         140
  Neutral      230
  Surprise     103
  Sadness      108

Sample emotion recognition results on the CAFE set can be seen in Fig. 7. The probability rates of the corresponding emotion expressions are given in the interface.

C. Real Time Experiment Results

In order to measure the performance of the system in a real time environment, recordings of 6 children (5 female, 1 male) between 2 and 15 years old were taken for the 7 classes (neutral, surprise, anger, happiness, sadness, fear, disgust), with 3 repetitions each. Four of the children are from one family and diagnosed with Asperger's syndrome. The test was conducted while the children were playing games or watching movies. The classification success rate has been calculated as 80.76%, and the error matrix seen in Fig. 8 is obtained. As can be seen, the neutral, anger and happiness expressions can be classified without error. A significant part of the error occurs when the fear and surprise expressions are misclassified as each other; the reason for this is that in both expressions the child raises the eyebrows and has a similar facial appearance. The sadness and disgust expressions can be misclassified as one another too. When the person creates the emotional expressions in a distinctive way, the system can work with a 100% success rate.
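As an illustration of how the reported success rate and the error (confusion) matrix of Fig. 8 can be computed from true and predicted labels, a short sketch follows; the label lists shown are placeholders, not the experimental data.

```python
# Sketch of the evaluation behind Fig. 8: overall success rate and confusion matrix.
from sklearn.metrics import accuracy_score, confusion_matrix

classes = ["neutral", "surprise", "anger", "happiness", "sadness", "fear", "disgust"]
y_true = ["anger", "fear", "surprise", "happiness"]        # placeholder ground truth
y_pred = ["anger", "surprise", "surprise", "happiness"]    # placeholder predictions

print("success rate:", accuracy_score(y_true, y_pred))
cm = confusion_matrix(y_true, y_pred, labels=classes)      # rows: true class, columns: predicted
print(cm)
```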


In addition to this experiment, another experiment has been carried out in which the children performed only the first 4 emotion classes, which they are most comfortable producing. The general success rate of this experiment has been observed to be 100%. The reason for this complete success rate is that these classes have distinctive facial expressions and the children are more capable of creating them; when the emotions that are misclassified as one another are investigated, it is observed that the child created similar facial expressions. The probability rates of the corresponding emotion expressions are given in the interface. As can be seen in the picture in Fig. 9, the system can give promising results even in cases where the face is partially blocked. Sample emotion recognition results for the children can be seen in Fig. 10.

V. CONCLUSION
In this research, an emotion recognition system based on real-time tracking of facial triangulation points has been proposed. Attributes that serve as an effective emotion descriptor are proposed, even in cases where the triangulation point tracker does not provide exact results. The neutral, surprise, anger, happiness, sadness, fear, and disgust facial expressions have been classified, and an overall 93% success rate has been obtained when the proposed system is applied to the CAFE set, with 80% during the real time experiments. The emotions that are confused with each other are observed to be the sadness-disgust-fear group. The system success rate is found to be 100% when children perform the 4 expressions they are most comfortable with. The system, which can easily be configured for each individual and environment, has produced successful results even in cases of partial blockage.
Fig. 7. CAFE set Emotion Recognition Results

Fig. 9. The Results When the Face is Partially Blocked

Fig. 8. Confusion Matrix for Seven Classes

Fig. 10. Results of Real Time Emotion Recognition

REFERENCES

[1] B. Fasel and J. Luettin, "Automatic facial expression analysis: a survey," Pattern Recognition, vol. 36, pp. 259-275, 2003.
[2] M. Pantic and L. J. M. Rothkrantz, "Automatic analysis of facial expressions: the state of the art," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, pp. 1424-1445, 2000.
[3] M. Paleari, R. Chellali, and B. Huet, "Features for multimodal emotion recognition: An extensive study," IEEE Conf. on Cybernetics and Intelligent Systems, pp. 90-95, 2010.
[4] M. Murugappan, M. Rizon, R. Nagarajan, S. Yaacob, D. Hazry, and I. Zunaidi, "Time-frequency analysis of EEG signals for human emotion detection," 4th Kuala Lumpur Int'l Conference on Biomedical Engineering, 2008.
[5] İ. Arı and L. Akarun, "Facial Feature Tracking and Expression Recognition for Sign Language," IEEE Signal Processing and Communications Applications Conference, Antalya, 2009.
[6] H. Ç. Akakın and B. Sankur, "Spatiotemporal Features for Effective Facial Expression Recognition," IEEE 11th European Conf. on Computer Vision, Workshop on Sign Gesture Activity, 2010.
[7] S. Kumano, K. Otsuka, J. Yamato, E. Maeda, and Y. Sato, "Pose-Invariant Facial Expression Recognition Using Variable-Intensity Templates," International Journal of Computer Vision, vol. 83, pp. 178-194, Nov. 2008.
[8] N. Sebe, M. S. Lew, Y. Sun, I. Cohen, T. Gevers, and T. S. Huang, "Authentic facial expression analysis," Image and Vision Computing, vol. 25, pp. 1856-1863, 2007.
[9] G. Littlewort, M. S. Bartlett, I. Fasel, J. Susskind, and J. Movellan, "Dynamics of facial expression extracted automatically from video," Image and Vision Computing, vol. 24, pp. 615-625, 2006.
[10] C. Shan, S. Gong, and P. W. McOwan, "Facial expression recognition based on Local Binary Patterns: A comprehensive study," Image and Vision Computing, vol. 27, pp. 803-816, May 2009.
[11] C. Busso, Z. Deng, S. Yildirim, M. Bulut, C. M. Lee, A. Kazemzadeh, S. Lee, U. Neumann, and S. Narayanan, "Analysis of emotion recognition using facial expressions, speech and multimodal information," Proceedings of the 6th International Conference on Multimodal Interfaces (ICMI '04), New York, 2004.
[12] S. Wan and J. K. Aggarwal, "Spontaneous facial expression recognition: A robust metric learning approach," Pattern Recognition, vol. 47, 2014.
[13] Q. Mao, X. Pan, Y. Zhan, and X. Shen, "Using Kinect for real-time emotion recognition via facial expressions," Frontiers of Information Technology & Electronic Engineering, vol. 16, no. 4, pp. 272-282, April 2015.
[14] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active shape models - their training and application," Computer Vision and Image Understanding, vol. 61, pp. 38-59, 1995.
[15] Y. Wei, "Research on Facial Expression Recognition and Synthesis," M.S. thesis, Department of Computer Science and Technology, Nanjing University, 2009. http://code.google.com/p/asmlibrary/
[16] P. Ekman and W. Friesen, "Facial Action Coding System," Consulting Psychologists Press, 1978.
[17] İ. Arı and Y. Açıköz, "Fast Image Annotation with Pinotator," IEEE 19th Signal Processing and Communications Applications Conference, 2011.
[18] The Child Affective Facial Expression (CAFE) set, https://nyu.databrary.org/volume/30
