Comparison Between Geometry-based and Gabor-wavelets-based Facial Expression Recognition
feature space, we will also study the desired number of hidden units, i.e., the appropriate dimension to represent a facial expression in order to achieve a good recognition rate. Finally, we note that a similar representation of faces has been developed in Wiskott et al. [21] for face recognition, where they use labeled graphs, based on a Gabor wavelet transform, to represent faces, and face recognition is done through elastic graph matching.
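For concreteness, a 2-D Gabor kernel of the general kind underlying such representations can be generated as follows. This is an illustrative sketch only: the function name and parameter values are arbitrary, and it does not reproduce the specific kernel parameterization used in this work.

```python
import numpy as np

def gabor_kernel(size, wavelength, orientation, sigma):
    """Complex 2-D Gabor kernel: a plane wave restricted by a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate the coordinate along which the wave oscillates.
    x_r = x * np.cos(orientation) + y * np.sin(orientation)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))   # Gaussian window
    carrier = np.exp(1j * 2.0 * np.pi * x_r / wavelength)  # complex plane wave
    return envelope * carrier

# A Gabor coefficient at a fiducial point is then the inner product of such a
# kernel with the image patch centred at that point; varying wavelength and
# orientation yields a multi-scale, multi-orientation feature set.
```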
Figure 3: Gabor wavelet representation: Examples of three kernels.

3 The Architecture and Training

The architecture of our FER system is based on a two-layer perceptron (see Fig. 4). As described in Sect. 2, an image is first preprocessed, and two sets of features (geometric positions and Gabor wavelet coefficients) are extracted. These features are fed into the input units of the two-layer perceptron. The objective of the first layer is to perform a nonlinear reduction of the dimensionality of the feature space, depending on the number of hidden units. Note that there are no interconnections in the first layer between the geometric and Gabor-wavelet parts, because they are two pieces of information very different in nature. Each output unit of the second layer gives the probability of the input image belonging to the associated facial expression.

The FER problem is considered as a statistical classification problem. The training is done by minimizing the cross-entropy for multiple classes [3]:

$$E = -\sum_{n} \sum_{k=1}^{c} t_{nk} \ln \frac{y_{nk}}{t_{nk}}$$

where $t_{nk}$ and $y_{nk}$ are respectively the pattern target value and the network output value, representing the probability that input $x_n$ belongs to class $C_k$. The activation function of the output units is the softmax function:

$$y_k = \frac{\exp(a_k)}{\sum_{k'=1}^{c} \exp(a_{k'})}$$

where $a_k = \sum_j w_{kj} z_j$ and $z_j$ is the output of hidden unit $j$. The activation function of the hidden units is 'tanh':

$$z_j = g(a_j) = \tanh(a_j) = \frac{e^{a_j} - e^{-a_j}}{e^{a_j} + e^{-a_j}}$$

where $a_j = \sum_i w_{ji} x_i$ and $x_i$ is the value of input unit $i$.

The two-layer perceptron is trained through Rprop (resilient propagation) [16], which is a local adaptive learning scheme performing supervised batch learning. The idea is to eliminate the harmful influence of the size of the partial derivative on the weight step. In consequence, the weight update depends only on the sign of the derivative, and is exclusively determined by a weight-specific, so-called "update-value" $\Delta_{ij}^{(t)}$:

$$\Delta w_{ij}^{(t)} = \begin{cases} -\Delta_{ij}^{(t)} & \text{if } \frac{\partial E}{\partial w_{ij}}^{(t)} > 0 \\ +\Delta_{ij}^{(t)} & \text{if } \frac{\partial E}{\partial w_{ij}}^{(t)} < 0 \\ 0 & \text{otherwise} \end{cases}$$

The update-value $\Delta_{ij}^{(t)}$ itself is adapted based on a sign-dependent learning process:

$$\Delta_{ij}^{(t)} = \begin{cases} \eta^{+} \cdot \Delta_{ij}^{(t-1)} & \text{if } \frac{\partial E}{\partial w_{ij}}^{(t-1)} \cdot \frac{\partial E}{\partial w_{ij}}^{(t)} > 0 \\ \eta^{-} \cdot \Delta_{ij}^{(t-1)} & \text{if } \frac{\partial E}{\partial w_{ij}}^{(t-1)} \cdot \frac{\partial E}{\partial w_{ij}}^{(t)} < 0 \\ \Delta_{ij}^{(t-1)} & \text{otherwise} \end{cases}$$

where $0 < \eta^{-} < 1 < \eta^{+}$ (we use $\eta^{-} = 0.5$, $\eta^{+} = 1.2$).
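The forward pass and the Rprop update rule above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation; the step-size bounds `delta_min` and `delta_max` are conventional Rprop additions that the text does not specify.

```python
import numpy as np

def forward(X, W_hid, W_out):
    """Two-layer perceptron: tanh hidden units, softmax output units."""
    Z = np.tanh(X @ W_hid)                   # z_j = tanh(a_j), a_j = sum_i w_ji x_i
    A = Z @ W_out                            # a_k = sum_j w_kj z_j
    A = A - A.max(axis=1, keepdims=True)     # stabilize exp before softmax
    Y = np.exp(A) / np.exp(A).sum(axis=1, keepdims=True)
    return Y

def cross_entropy(Y, T, eps=1e-12):
    """E = -sum_n sum_k t_nk ln(y_nk / t_nk); terms with t_nk = 0 contribute 0."""
    mask = T > 0
    return -np.sum(T[mask] * np.log((Y[mask] + eps) / T[mask]))

def rprop_step(w, grad, prev_grad, delta,
               eta_minus=0.5, eta_plus=1.2, delta_min=1e-6, delta_max=50.0):
    """One Rprop batch update: the step size depends only on the gradient's sign."""
    sign_change = grad * prev_grad
    # Grow the update-value when the sign is stable, shrink it when it flips.
    delta = np.where(sign_change > 0, np.minimum(delta * eta_plus, delta_max), delta)
    delta = np.where(sign_change < 0, np.maximum(delta * eta_minus, delta_min), delta)
    w = w - np.sign(grad) * delta            # -delta if dE/dw > 0, +delta if < 0
    return w, delta
```

For one-hot targets the cross-entropy reduces to the negative log-probability assigned to the correct class, which is what the training minimizes.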
[Figure 5: Training and generalization recognition rates (%) versus the number of hidden units (1 to 20), one panel per feature set.]
4 Experiments

We train a two-layer perceptron using $S - 1$ of the segments and test its performance by evaluating the error function (recognition rate) using the remaining segment. The above process is repeated for each of the S possible choices of the segment to be omitted from the training process. Finally, we average the results over all S trained two-layer perceptrons.

Figure 6: Comparison of the generalized recognition rates
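The leave-one-segment-out protocol above can be outlined as follows. This is a schematic sketch: `train` and `evaluate` are hypothetical stand-ins for the perceptron training and recognition-rate routines.

```python
import numpy as np

def s_fold_rate(samples, labels, S, train, evaluate):
    """Average recognition rate over the S possible held-out segments."""
    segments = np.array_split(np.arange(len(samples)), S)
    rates = []
    for held_out in range(S):
        test_idx = segments[held_out]
        train_idx = np.concatenate([segments[i] for i in range(S) if i != held_out])
        model = train(samples[train_idx], labels[train_idx])   # fit on S-1 segments
        rates.append(evaluate(model, samples[test_idx], labels[test_idx]))
    return np.mean(rates)
```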
Since the training is a nonlinear optimization problem, the final result depends on the initial guess of the weights of the perceptrons. So, each perceptron is furthermore trained ten times with randomly initialized weights. Thus, the result for each configuration shown below is the average of the results produced by 100 trained two-layer perceptrons.

We have carried out experiments on the FER using the developed architecture by using geometric positions alone, using Gabor wavelet coefficients alone, and by using the combination of the two pieces of information. In order to investigate the appropriate dimension to code the facial expression, we vary the number of hidden units from 1 to 20. The perceptrons with geometric positions alone were trained by running 250 cycles through all the training data, while the other perceptrons were trained by running only 100 cycles. The recognition rates on the training data and on the test data (generalization) with respect to the number of hidden units are shown in Fig. 5. In particular, the generalized recognition rates are compared in Fig. 6.

From the experimental results, we have the following observations:

- Gabor coefficients are much more powerful than geometric positions;
- At least two hidden units are necessary to code facial expressions reasonably;
- Probably from 5 to 7 hidden units are sufficient to code facial expressions precisely;
- Adding geometric positions improves the recognition rate only for low-dimensional coding (with less than 5 hidden units); no improvement is observed when 5 or more hidden units are used.
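The averaging protocol described above (ten random initializations for each of the S cross-validation splits, i.e. 100 trained networks per configuration) can be outlined as follows. This is a schematic sketch; `train_fold` is a hypothetical stand-in for training one perceptron on one split and returning its test recognition rate.

```python
import numpy as np

def averaged_rate(S, n_restarts, train_fold, seed=0):
    """Mean recognition rate over S folds x n_restarts random initializations
    (10 x 10 = 100 trained perceptrons per configuration in the text)."""
    rng = np.random.default_rng(seed)
    rates = [train_fold(fold, int(rng.integers(1 << 31)))  # fresh init seed per run
             for fold in range(S)
             for _ in range(n_restarts)]
    return float(np.mean(rates))
```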
The recognition rate (i.e., the agreement with the labeling provided by the expressors) achieved by our system is 90.1% with 7 hidden units. This should be compared with the agreement between human subjects and the expressors' labeling. In the study of Lyons et al. [12], 60 human non-expert subjects were asked to rate each facial image for content of the six basic facial expressions. In 20.5% of all cases, the category which received the highest rating (averaged over all subjects) disagreed with the expression label of the image. This is similar to the results reported in the literature but with a different image database [2, 10]. Several sources of this disagreement may be identified. The expressor may have posed the expression inaccurately or even incorrectly in some cases. The experimental subjects may have confused one expression with another when performing the rating task (for example, fear may be confused with surprise, and anger with disgust). Finally, in a small percentage of cases there is also a possibility that the images were mislabelled by the experimenter.

In order to give the reader a concrete feeling of the FER results, we show a few examples in Fig. 7 and Fig. 8. The original labeling in the database and our system outputs are both shown. Note that our system provides the probability it believes that an image belongs to each of the facial expression classes. The examples shown in Fig. 7 have obtained a consistent labeling from our system, while for those in Fig. 8, our system does not agree with the labeling given in the database. Note that even in the latter case, our system usually gives a reasonable result, because the expressor may have posed an incorrect expression.

  Label: Surprise   Label: Happiness   Label: Disgust   Label: Fear
  NN outputs:       NN outputs:        NN outputs:      NN outputs:
  Neu.  0.000       Neu.  0.122        Neu.  0.001      Neu.  0.002
  Hap.  0.000       Hap.  0.720        Hap.  0.000      Hap.  0.000
  Sad.  0.000       Sad.  0.000        Sad.  0.428      Sad.  0.022
  Sur.  1.000       Sur.  0.000        Sur.  0.000      Sur.  0.000
  Ang.  0.000       Ang.  0.000        Ang.  0.016      Ang.  0.001
  Dis.  0.000       Dis.  0.000        Dis.  0.555      Dis.  0.005
  Fear  0.000       Fear  0.158        Fear  0.000      Fear  0.970

Figure 7: Examples of correct labeling

4.2 Experiments After Excluding Fear Images

The expressors found it most difficult to pose fear expressions accurately. In addition, humans have more difficulty in recognizing fear. There is some evidence supporting the hypothesis that fear expressions are processed differently from the other basic facial expressions [1]. If we exclude the fear images from the database, an experiment with 30 human non-experts shows that in 85.6% of all cases, human subjects agree with the expressors' labeling, about 6% higher than when fear images are included. Hence, we have repeated exactly the same analysis as in the last subsection but with a dataset in which all fear images were excluded. The results are shown in Fig. 9. The same general observations can be made. When 7 hidden units are used, our system achieves a generalized recognition rate of 73.3% with geometric positions alone, 92.2% with Gabor wavelet coefficients alone, and 92.3% with combined information.

  Label: Anger      Label: Fear        Label: Sadness   Label: Happiness
  NN outputs:       NN outputs:        NN outputs:      NN outputs:
  Neu.  0.000       Neu.  0.010        Neu.  0.099      Neu.  0.822
  Hap.  0.000       Hap.  0.000        Hap.  0.846      Hap.  0.017
  Sad.  0.091       Sad.  0.502        Sad.  0.052      Sad.  0.150
  Sur.  0.000       Sur.  0.000        Sur.  0.000      Sur.  0.011
  Ang.  0.224       Ang.  0.001        Ang.  0.000      Ang.  0.000
  Dis.  0.685       Dis.  0.007        Dis.  0.002      Dis.  0.000
  Fear  0.000       Fear  0.479        Fear  0.002      Fear  0.000

Figure 8: Examples of disagreement

5 Conclusion

In this paper, we have compared the use of two types of features extracted from face images for recognizing facial expressions. The first type is the geometric positions of a set of fiducial points on a face. The second type is a set of multi-scale and multi-orientation Gabor wavelet coefficients extracted from the face image at the fiducial points. They can be used either independently or jointly. We have developed an architecture based on a two-layer perceptron. Comparison of the recognition performance with different types of features shows that Gabor wavelet coefficients are much more powerful than geometric positions, and that the agreement between the computer and the expressors' labeling is higher than that between human subjects and the expressors' labeling.

Furthermore, since the first layer of the perceptron actually performs a nonlinear reduction of the dimensionality of the feature space, we have also studied the desired number of hidden units, i.e., the appropriate dimension to represent a facial expression in order to achieve a good recognition rate.
[Figure 9: Recognition rate (%) versus the number of hidden units for geometric positions, Gabor coefficients, and the combination.]

[7] P. Ekman and W. Friesen. Unmasking the Face: A Guide to Recognizing Emotions from Facial Expressions. Consulting Psychologists Press, Palo Alto, CA, 1975.

[8] I. Essa and A. Pentland. Coding, analysis, interpretation, and recognition of facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):757–