Real Time Face Expression Recognition in Children with Autism

All content following this page was uploaded by Mariofanna Milanova on 22 November 2016.
Abstract— People can accurately identify a familiar face and understand a facial expression in a single glance. However, children with autism spectrum disorder (ASD) often have problems communicating with their parents, teachers, and other children. In this paper, we present an innovative system to recognize facial expressions in children with ASD during playtime. Children are observed while playing or using their tablets or laptops while the researchers track the child's facial expressions. We have implemented an Active Shape Model (ASM) tracker, which tracks 116 facial landmarks from web-cam input; the tracked landmark points are used to extract face expression features for a Support Vector Machine (SVM) based classifier, which makes our system robust, recognizing seven expressions rather than only six as in most face expression systems. The proposed system is applied to the Child Affective Facial Expression (CAFE) set, and we obtained 93% classification accuracy. In addition, another experiment has been carried out in which children performed all 7 expression classes. The success rate for 4 of the classes in this experiment was observed to be 100%.

Keywords— Face Expression, Face Detection, SVM, Expression Classification, Autistic, CAFE, Real Time, ASM, PCA.

I. INTRODUCTION

Analysis of facial expressions has significant importance in fields such as verbal and non-verbal expression and human-computer interaction. Since our facial expressions are a mirror of our emotions, the terms facial expression recognition and emotion recognition are used interchangeably in the literature. Numerous methods have been developed recently in the field of vision-based automated expression recognition; Fasel and Luettin and Pantic and Rothkrantz have surveyed this research in detail [1, 2]. As Paleari et al. suggest, it is also possible to develop multi-modal emotion recognition by using voice data in addition to visual data [3]. Furthermore, brain signals can also be used in emotion recognition: Murugappan et al. recognized emotions such as happiness, sadness, surprise, and fear using time-frequency methods on EEG data [4].

The focus of this study is the use of visual data in emotion recognition. The creation of an effective detector for face images and videos forms the basis of all facial expression recognition research, and the success of these detectors rests on robust detection of facial points. Arı and Akarun classified both head movements and facial expressions by developing facial triangulation tracking based on a high-resolution, multi-pose active object model, and used the head trajectory information with a Hidden Markov classification model [5]. Akakın and Sankur used, as their detectors, the independent component analysis of the trajectories obtained by tracking 17 facial triangulation points, and the 3D discrete cosine transform (DCT) of the space-time cube obtained by aligning faces in sequence [6]. When assessed with various classification methods, the best result was obtained with the 3D DCT. Kumano et al. suggested a learning model for each emotion based on variable-intensity templates and enabled recognition independent of pose [7]. Sebe et al. studied face expressions with Bayesian networks, support vector machines, and decision trees, and made the database they worked on available to researchers [8]. Littlewort et al., on the other hand, suggested the systematic use of AdaBoost, support vector machines, and linear discriminant analysis [9]. Shan et al. used a local binary pattern statistical model, classified the obtained attributes with various machine learning techniques, and reported that support vector machines attained the best result [10]. Busso et al. used commercial software for facial point tracking, separated the human face into 5 areas of interest, calculated principal component analysis (PCA) coefficients of the facial point locations for each region, and classified the resulting attribute vector with the 3 nearest neighbors [11].
International Academy of Engineering and Medical Research, 2016
Volume-1, Issue-1
Published Online October-November 2016 in IAEMR (http://www.iaemr.com)
Aggarwal and Shaohua Wan introduced an automatic face emotion recognition method that relies on metric learning. A new metric space was learnt in which data points of the same class have a higher probability of being grouped together; they also used specificity and sensitivity to identify the annotation reliability of each annotator [12]. Mao et al. presented a real-time emotion recognition system using Kinect sensors to extract both 2D and 3D facial emotion features, combining the point locations detected by the Kinect with animation units [13].

In this paper, we present a real-time face emotion recognition system. In detail, we make the following contributions. First, we extract 116 facial features from a PC webcam, recorded video, or still image, and find the precise localization of the facial features with a proposed multi-resolution active shape model. Secondly, we reduce the dimension of the input data (features) by applying principal component analysis (PCA). Because of its superior classification performance and its ability to deal with high-dimensional input data, SVM is the classifier of choice in this paper for facial emotion recognition. Finally, the proposed system enables autistic children to play while it captures and analyzes their emotions, to help

2. Frame capturing: create a shape s_i for each frame,
   s_i = (x_1, y_1, x_2, y_2, …, x_L, y_L), i = 1, …, N.
3. Normalize each shape using Procrustes analysis.
4. Apply principal component analysis (PCA), eq. (1):

   C_s e_k = λ_k e_k    (1)

   where C_s is the covariance matrix constructed from the normalized shapes, e_k is the kth eigenvector, and λ_k is the kth eigenvalue.
5. Obtain attributes by identifying specific lengths and regions in the face using the landmarks, applying the Mahalanobis distance in eq. (2) and a Sobel filter; the gradient magnitude is given in eq. (3):

   D_M(x) = √((x − μ)ᵀ S⁻¹ (x − μ))    (2)

   |G| = √(G_x² + G_y²)    (3)

6. Apply the SVM classifier using eq. (4):

   f(x) = sign(Σ_i α_i y_i K(x_i, x) + b)    (4)
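The steps above can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes synthetic shape vectors in place of real tracked landmarks, an RBF kernel for K, and illustrative (not learned) SVM coefficients.

```python
import numpy as np

def pca_shape_basis(shapes, n_components):
    """Eq. (1): eigendecomposition of the covariance C_s of normalized shapes.

    shapes: (N, 2L) array, one shape vector s_i per row.
    Returns the mean shape, the top eigenvalues, and the top eigenvectors.
    """
    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    cov = np.cov(centered, rowvar=False)        # C_s
    eigvals, eigvecs = np.linalg.eigh(cov)      # solves C_s e_k = lambda_k e_k
    order = np.argsort(eigvals)[::-1]           # largest variance first
    return (mean_shape,
            eigvals[order][:n_components],
            eigvecs[:, order][:, :n_components])

def svm_decision(x, support_vectors, alphas, labels, bias, gamma=0.5):
    """Eq. (4): f(x) = sign(sum_i alpha_i y_i K(x_i, x) + b), RBF kernel."""
    k = np.exp(-gamma * np.sum((support_vectors - x) ** 2, axis=1))
    return np.sign(np.dot(alphas * labels, k) + bias)
```

In a full pipeline, `pca_shape_basis` would be applied to the Procrustes-normalized landmark shapes, and a multi-class strategy (e.g. one-vs-one) over binary decisions like `svm_decision` would cover the seven expression classes.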
B. Face Detection and Tracking of Triangulation Points

The Active Shape Model (ASM) introduced by Cootes et al. is one of the most prevalent techniques for the detection and tracking of triangulation points [14]. In order to identify the triangulation points in an image, the location of the face is first detected with an overall face detector (such as Viola-Jones). The average face shape, aligned according to the position of the face, constitutes the starting point of the search. Then the steps described below are repeated until the shape converges.

1. For each point, the best matching position with the template is identified using the gradient of the image texture in the proximity of that point.
2. The identified points are projected onto the shape eigenvectors obtained from the training set by Principal Component Analysis (PCA).

Whereas the individual template matchers in the first step may diverge from their sound positions, so that the obtained shape may not look like a face, the holistic approach of the second step strengthens the independent weak models by constraining them and associating them with the shapes in the training set. A multi-resolution scheme is used to prevent the model from getting stuck in local optima: the search starts from the lowest resolution level in the image pyramid and proceeds from coarse to fine. When tracking a video, the starting point of the search is the shape in the previous frame instead of the average shape. In cases where the face is not detected, the search is re-initiated by overall face detection. It has been stated that ASM gives better results when it is trained with a model specific to the person [5]. The system proposed in this research therefore uses, besides a general model which overlays models of different children, models specific to each child. ASM tracking is implemented with the asmlibrary developed by Wei [15]. Fig. 2 shows the triangulation points for the neutral and surprise face expressions.

Fig. 2. Facial Triangulation Points for Neutral (left) and Surprise (right) Expression.

C. Pre-Processing and Region Finding

The ASM-based tracker tracks the 116 triangulation points indicated on the left side of Fig. 3. The tracker works robustly on the eyebrow, eye, chin, and nose points; however, since it cannot track the flexible lip points correctly, it is not reasonable to use the locations of these points directly as attributes. There are two reasons for this: first, ASM's holistic modeling of all triangulation point locations, and second, the loss of small changes in the lip point locations to the constraints imposed on the shape by PCA. Further, the intensity difference at the lip edges is not as significant as at other face components. Therefore, instead of directly using the locations of the tracked triangulation points, the attributes shown on the right side of Fig. 3 and in the list below are obtained by identifying specific lengths and regions in the face using these points.

1. Distance of the mid-point of the eye gap to the eyebrow mid-points.
2. Mouth width.
3. Mouth height.
4. Magnitude of the vertical edges in the forehead.
5. Magnitude of the horizontal edges in the mid-forehead.
6. Sum of the vertical and horizontal edge magnitudes in the right cheek.
7. Sum of the vertical and horizontal edge magnitudes in the left cheek.

The first three attributes are obtained from the Mahalanobis distance of the corresponding triangulation points to each other. For the other attributes, the image is first smoothed with a Gaussian kernel, then filtered separately with vertical and horizontal Sobel kernels to compute the edge magnitudes. Next, the mean absolute value corresponding to each region is calculated. The horizontal and vertical edge magnitudes for the disgust expression are shown in Fig. 4. For instance, the average vertical edge magnitude in the blue box on the forehead constitutes the fourth attribute.

Fig. 3. Facial Landmarks (left) and Attributes Used to Extract the Regions of Interest (right).
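The edge-based region attributes and the Mahalanobis distance can be sketched as follows. This is a simplified illustration with hypothetical box coordinates, not the exact filter sizes or regions used in the paper:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def region_edge_attribute(image, box, sigma=1.0, axis=0):
    """Mean Sobel edge magnitude inside box = (r0, r1, c0, c1).

    The image is first smoothed with a Gaussian kernel, then filtered
    with a Sobel kernel; axis=0 responds to horizontal edges (vertical
    gradient), axis=1 to vertical edges.
    """
    smoothed = gaussian_filter(image.astype(float), sigma=sigma)
    edges = np.abs(sobel(smoothed, axis=axis))
    r0, r1, c0, c1 = box
    return edges[r0:r1, c0:c1].mean()

def mahalanobis(x, mu, cov):
    """Eq. (2): D_M(x) = sqrt((x - mu)^T S^-1 (x - mu))."""
    d = x - mu
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))
```

For example, attribute 4 would be `region_edge_attribute(frame, forehead_box, axis=0)` for a forehead box derived from the tracked landmarks, and attributes 6 and 7 would sum the results for both axes over each cheek box.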
IV. RESULTS
a correctness score of 93%. As can be seen, all expressions can be classified, though with significant error for some classes. The reason for this error is that the children in general do not have distinctive face expressions and are not always capable of producing them. When the emotions that are misclassified as one another are investigated, it is observed that the children created similar face expressions for them.

Table 2. Expression Data Information in the CAFE Dataset

Expression   No. of Images
Happiness    215
Anger        205
Disgust      191
Fear         140
Neutral      230
Surprise     103
Sadness      108
V. CONCLUSION
In this research, an emotion recognition system based on real-time tracking of facial triangulation points has been proposed. Attributes are proposed that remain effective emotion indicators even in cases where the triangulation point tracker does not provide exact results. Neutral, surprise, anger, happiness, sadness, fear, and disgust face expressions have been classified, with an overall success rate of 93% when the proposed system is applied to the CAFE set and 80% during the real-time experiments. The emotions confused with each other are observed to be the sadness-disgust-fear group. The system's success rate reaches 100% when children perform the 4 expressions they are most comfortable with. The system, which can easily be configured for each individual and environment, has produced successful results even under partial occlusion.
Fig. 7. CAFE set Emotion Recognition Results
VI. REFERENCES
[3] M. Paleari, R. Chellali, and B. Huet, "Features for multimodal emotion recognition: An extensive study," IEEE Conf. on Cybernetics and Intelligent Systems, pp. 90–95, 2010.
[4] M. Murugappan, M. Rizon, R. Nagarajan, S. Yaacob, D. Hazry, and I. Zunaidi, "Time-frequency analysis of EEG signals for human emotion detection," 4th Kuala Lumpur Int'l Conference on Biomedical Engineering, 2008.
[5] İ. Arı and L. Akarun, "Facial Feature Tracking and Expression Recognition for Sign Language," IEEE Signal Processing and Communications Applications, Antalya, 2009.
[6] H. Ç. Akakın and B. Sankur, "Spatiotemporal Features for Effective Facial Expression Recognition," IEEE 11th European Conf. on Computer Vision, Workshop on Sign Gesture Activity, 2010.
[7] S. Kumano, K. Otsuka, J. Yamato, E. Maeda, and Y. Sato, "Pose-Invariant Facial Expression Recognition Using Variable-Intensity Templates," International Journal of Computer Vision, vol. 83, pp. 178–194, Nov. 2008.
[8] N. Sebe, M. S. Lew, Y. Sun, I. Cohen, T. Gevers, and T. S. Huang, "Authentic facial expression analysis," Image and Vision Computing, vol. 25, pp. 1856–1863, 2007.
[9] G. Littlewort, M. S. Bartlett, I. Fasel, J. Susskind, and J. Movellan, "Dynamics of facial expression extracted automatically from video," Image and Vision Computing, vol. 24, pp. 615–625, 2006.
[10] C. Shan, S. Gong, and P. W. McOwan, "Facial expression recognition based on Local Binary Patterns: A comprehensive study," Image and Vision Computing, vol. 27, pp. 803–816, May 2009.
[11] C. Busso, Z. Deng, S. Yildirim, M. Bulut, C. M. Lee, A. Kazemzadeh, S. Lee, U. Neumann, and S. Narayanan, "Analysis of emotion recognition using facial expressions, speech and multimodal information," Proceedings of the 6th International Conference on Multimodal Interfaces - ICMI '04, New York, 2004.
[12] S. Wan and J. K. Aggarwal, "Spontaneous facial expression recognition: A robust metric learning approach," Pattern Recognition, vol. 47, 2014.
[13] Q. Mao, P. Xinyu, Y. Zhan, and S. Xiangiun, "Using Kinect for real time emotion recognition via facial expression," Frontiers of Information Technology & Electronic Engineering, vol. 16, no. 4, pp. 272–282, April 2015.
[14] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active shape models - their training and application," Computer Vision and Image Understanding, vol. 61, pp. 38–59, 1995.
[15] Y. Wei, "Research on Facial Expression Recognition and Synthesis," M.S. Thesis, Department of Computer Science and Technology, Nanjing University, 2009. http://code.google.com/p/asmlibrary/
[16] P. Ekman and W. Friesen, "Facial Action Coding System," Consulting Psychologists Press, 1978.
[17] İ. Arı and Y. Açıköz, "Fast Image Annotation with Pinotator," IEEE 19th Signal Processing and Communications Applications Conference, 2011.
[18] https://nyu.databrary.org/volume/30