Accurate and Reliable Facial Expression Recognition Using Advanced Softmax Loss With Fixed Weights

IEEE SIGNAL PROCESSING LETTERS, VOL. 27, 2020, pp. 725–729

Ping Jiang, Gang Liu, Quan Wang, and Jiang Wu
Abstract—An important challenge for facial expression recognition (FER) is that real-world training data are usually imbalanced. Although many deep learning approaches have been proposed to enhance the discriminative power of deep expression features and achieve good predictive performance, few works have focused on the multiclass imbalance problem. When supervised by softmax loss (SL), which is widely used in FER, the classifier is often biased against minority categories (i.e., minority categories receive smaller interclass angular distances). In this letter, we present advanced softmax loss (ASL) to mitigate the bias induced by data imbalance and hence increase accuracy and reliability. The proposed ASL essentially magnifies the interclass diversity in the angular space to enhance the discriminative power for every category. The proposed loss can easily be implemented in any deep network. Extensive experiments on the FER2013 and real-world affective faces (RAF) databases demonstrate that ASL is significantly more accurate and reliable than many state-of-the-art approaches and that it can easily be plugged into other methods to improve their performance.

Index Terms—Deep learning, convolutional neural networks, softmax loss, multiclass imbalance, facial expression recognition.

Manuscript received January 2, 2020; revised April 4, 2020; accepted April 11, 2020. Date of publication April 22, 2020; date of current version May 21, 2020. This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 61702395, Grant 61972302, and Grant 61711530248, in part by the Key Program of Shaanxi Technology Committee of China under Grant 2019NY-182 and Grant 2020NY-167, and in part by the High-Level Culture Program of Yulin College under Grant 207010074. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Mylene Q. Farias. (Corresponding author: Gang Liu.)

Ping Jiang is with the School of Computer Science and Technology, Xidian University, Xi’an 710071, China, and also with the School of Information Engineering, Yulin University, Yulin 719000, China (e-mail: jiangping@yulinu.edu.cn).

Gang Liu and Quan Wang are with the School of Computer Science and Technology, Xidian University, Xi’an 710071, China (e-mail: gliu_xd@163.com; qwang@xidian.eud.cn).

Jiang Wu is with the School of Information Engineering, Yulin University, Yulin 719000, China (e-mail: wujiang@yulinu.eud.cn).

Digital Object Identifier 10.1109/LSP.2020.2989670

1070-9908 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

Facial expression recognition (FER) is a popular topic in computer vision and machine learning because of its vast array of potential applications in human-computer interfaces [1], [2]. In recent years, the focus of FER has shifted from controlled laboratory environments to real-world conditions owing to the success of deep learning techniques [3]–[7]. FER in realistic settings is still a challenging problem, since real-world data contain additional factors that are unrelated to expressions, such as head pose, illumination, and gender.

In the FER literature, the most widely used loss function is softmax loss (SL) [8]–[12]. Under the supervision of SL, deeply learned features are likely to be separable in the angular space but not sufficiently discriminative. Hence, many deep embedding approaches have been presented to enhance the discriminative power of deep features by reducing intraclass variation and enhancing interclass differences. Wen et al. [13], [14] proposed the classical center loss (CL), which pulls the deep features of each class toward their class center. Li et al. [15], [16] reduced intraclass variation by minimizing the distance between a sample and the center of its neighbors. Based on CL and the locality-preserving loss, Luo et al. [17] designed a local subclass loss to constrain intraclass variation. On the basis of SL and CL, Cai et al. [18] added island loss (IL) to simultaneously increase the interclass angular distance. The exclusive regularization in [19] is actually a special case of IL. However, almost no methods have addressed the problem that real-world expression datasets are usually imbalanced [20].

Under the supervision of SL, the angular space of minority classes is often compressed, which leads to poor discrimination results; that is, minority categories usually have less interclass diversity. Several equidistant prototype embeddings [21]–[23] have been used to constrain the target vectors to be equidistant in Euclidean space, but they cannot guarantee that these vectors are uniformly distributed in the angular domain. Therefore, this work proposes advanced softmax loss (ASL) to mitigate the bias against minority categories by using fixed (unlearnable) weights in which all weight vectors are uniformly distributed in the angular space (where each weight vector corresponds to a class). Our major contributions are as follows:

- An analysis of interclass diversity shows that it is ideal for all classes to be uniformly distributed in the angular domain.
- A novel method ensures that the deeply learned expression features are uniformly distributed and hence enhances the discriminative ability for minority classes.
- Almost all works report only their best scores, but these results can be misleading because the recognition results vary across implementations (e.g., in our experiments, the difference between the best and the worst score exceeds 2% for some methods). Thus, we present a new performance metric (reliability) to measure the variation in classification scores and employ a fairer metric (the mean score) to evaluate the conventional recognition accuracy.
- Exhaustive experiments on the FER2013 dataset [24] and the real-world affective faces (RAF) database [15] demonstrate that the proposed method is more accurate and reliable than many state-of-the-art approaches and that it can easily be incorporated into other methods (such as loss functions [25]–[28] that are extensions of SL).

II. PROPOSED METHOD

A. Softmax Loss

SL is widely used in FER works and is defined as follows:

$$L_S = -\frac{1}{m}\sum_{i=1}^{m}\log\frac{e^{w_{y_i}^{T}x_i}}{\sum_{k\in N}e^{w_k^{T}x_i}}, \tag{1}$$

where $N$ denotes the set of class labels and $(\cdot)^T$ denotes the transpose operator. $x_i\in\mathbb{R}^d$ is the deep feature of the $i$th sample, and $y_i\in N$ is the corresponding class label. $w_k\in\mathbb{R}^d$ denotes the $k$th column (vector) of the weight parameters $W=[w_1,w_2,\ldots,w_n]\in\mathbb{R}^{d\times n}$ in the SL. $n$ and $m$ are the numbers of classes and training samples, respectively.

It is well known that the deeply learned features supervised by SL exhibit a "radial" distribution and are likely to be separable in the angular domain. For example, Fig. 1 shows the 2D features learned on the MNIST [29] training set (in which only 1,000 samples each are used for digits 2 and 5) with LeNet++ networks, as in [13], [14], [30]. However, SL usually compresses the angular space of minority classes (see digits 2 and 5 in Fig. 1(a)), which leads to poor generalization to the test samples.

Fig. 1. Distributions of 2D deep features trained on a part of the MNIST dataset under the supervision of (a) SL and (b) ASL.

In well-trained convolutional neural networks (CNNs), the weight vector $w_k$ can be regarded as the cluster center of all $x_i$ with $y_i = k$ [19]. To augment the discriminative power of the CNN, different classes should be as widely separated as possible; therefore, we examine the following loss function, which is similar to IL [18]:

$$L_{inter} = \sum_{k\in N}\sum_{l\in N,\, l\neq k}\frac{w_k^{T}w_l}{\|w_k\|\,\|w_l\|} = \sum_{k\in N}\sum_{l\in N,\, l\neq k}\hat{w}_k^{T}\hat{w}_l, \tag{2}$$

where $\hat{w}_k$ and $\hat{w}_l$ are the normalized weight vectors and $\|\cdot\|$ denotes the Euclidean norm. $\hat{w}_k^T\hat{w}_l$ is the cosine of $A_{kl}$, the angle between $w_k$ and $w_l$, so $L_{inter}$ is the sum of the cosines of all pairwise angles; minimizing it therefore magnifies the interclass angular distances. Because $\|\sum_{k\in N}\hat{w}_k\|^2 = n + L_{inter} \ge 0$, $L_{inter}$ attains its minimal value $-n$ only when the following equation holds for every $k\in N$:

$$\hat{w}_k = -\sum_{l\in N,\, l\neq k}\hat{w}_l. \tag{3}$$

Based on the concept of the polytope [31], [32], we know that $\hat{w}_k^T\hat{w}_l = 1/(1-n)$ for all $k\neq l$ is a solution of Eq. (3); i.e., these points are the vertices of a regular $(n-1)$-simplex (note that $d \ge n-1$ must hold, where $d$ is the feature dimension). That is, to enhance the discriminative power between different classes, it is ideal that the weight vectors $w_k$ are uniformly distributed in the angular space (i.e., $\cos A_{kl} = 1/(1-n)$ for every pair $k\neq l$). Hence, the bias due to imbalanced training data can be mitigated by forcing $w_k$ to be uniformly distributed in the angular domain.

B. Advanced Softmax Loss

As mentioned above, if every $A_{kl}$ is equal to $\arccos(1/(1-n))$ (where $\arccos$ denotes the inverse of the cosine function), we obtain the optimal scenario for enhancing discriminative power; thus, we aim to ensure that the learned features are uniformly distributed in the angular domain. For the seven expression classes considered in this letter ($n = 7$), this angle is $\arccos(-1/6) \approx 99.6^\circ$.

Note that $\hat{w}_k^T\hat{w}_l = 1/(1-n)$ is one solution of Eq. (3) but not the only one. For example, the vectors $(1;0;0)$, $(0;1;0)$, $(-1;0;0)$, and $(0;-1;0)$ satisfy Eq. (3), but their pairwise cosines are $0$ or $-1$ rather than the desired $-1/3$. Thus, we cannot achieve the ideal case by optimizing the loss $L_{inter}$ or by solving Eq. (3). This is why IL is not very effective (see Tables I and II).

For convenience, we suppose that $w_k$ is a unit vector (i.e., $\|w_k\| = 1$), so that $\cos(A_{kl}) = w_k^T w_l$. Since $A_{kl}\in[0,\pi]$, on which the cosine is one-to-one, our final goal $A_{kl} = \arccos(1/(1-n))$ for all $k\neq l$ is reached exactly when $J = 0$, where $J$ is defined as follows:

$$J = \frac{1}{2n(n-1)}\sum_{k\in N}\sum_{l\in N,\, l\neq k}\left(w_k^{T}w_l - \frac{1}{1-n}\right)^{2}. \tag{4}$$

Algorithm 1: Computing Weights for ASL.
Input: Rate $\alpha$
Output: Weights $w_k$ ($k\in N$)
1: Randomly initialize $w_k$ with a normal distribution
2: while $\max_{k,l\in N,\, k\neq l}\left|w_k^{T}w_l - \frac{1}{1-n}\right| \ge 10^{-4}$ do
3: Compute the gradients $\Delta w_k$ as in Eq. (5)
4: Update the weights:
5: $w_k \leftarrow w_k - \alpha\Delta w_k$
6: end while
7: Normalize $w_k$ to $\|w_k\| = 1$

Algorithm 2: Training Strategy for ASL.
Input: Training data $\{I_i, y_i\}_{i=1}^{m}$, learning rate $\eta$, number of iterations $T$
Output: Network layer parameters $\theta$
1: Randomly initialize network layer parameters $\theta^1$; generate ASL parameters $w_k$ according to Algorithm 1
2: for $t = 1$ to $T$ do
3: Compute the loss $L_S^t$ as in Eq. (1)
4: Compute the back-propagation error: $\partial L_S^t / \partial x_i^t$
5: Update the network layer parameters:
6: $\theta^{t+1} = \theta^t - \eta\sum_{i=1}^{m}\frac{\partial L_S^t}{\partial x_i^t}\frac{\partial x_i^t}{\partial \theta^t}$
7: end for
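As a rough illustration, the following pure-Python sketch computes fixed ASL weights in the spirit of Algorithm 1 and then evaluates Eq. (1) with those weights frozen, which is the fundamental difference between ASL and SL. The names `compute_asl_weights`, `asl_loss`, `dot`, and `unit` are hypothetical, and this is not the authors' code. It uses the update direction $\Delta w_k$ of Eq. (5), $\alpha = 0.1$, and the tolerance $10^{-4}$ from Algorithm 1, but it departs from Algorithm 1 in one assumed respect: after every update it subtracts the mean vector (so the vectors keep approximately satisfying Eq. (3)) and renormalizes them to unit length, rather than normalizing only once at the end. Without this step the norms of the $w_k$ can drift during the updates, and a single final normalization would then no longer yield pairwise cosines of $1/(1-n)$.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def unit(v):
    """Rescale v to unit Euclidean norm."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [a / norm for a in v]


def compute_asl_weights(n, d, alpha=0.1, tol=1e-4, max_iter=200_000, seed=0):
    """Sketch of Algorithm 1: n unit vectors in R^d with pairwise cosines near 1/(1 - n).

    Assumption (not part of the letter's Algorithm 1): after every gradient step the
    vectors are re-centered (mean vector subtracted) and renormalized to unit length.
    """
    if n < 2 or d < n - 1:
        raise ValueError("need n >= 2 classes and feature dimension d >= n - 1")
    rng = random.Random(seed)
    target = 1.0 / (1.0 - n)
    scale = 1.0 / (n * (n - 1))
    # Step 1: random normal initialization (scaled to unit length in this sketch).
    w = [unit([rng.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(n)]
    for _ in range(max_iter):
        # Residuals w_k^T w_l - 1/(1 - n) for k != l; the diagonal is set to 0.
        res = [[dot(w[k], w[l]) - target if k != l else 0.0 for l in range(n)]
               for k in range(n)]
        # Step 2: stop once the largest pairwise deviation is below tol.
        if max(abs(r) for row in res for r in row) < tol:
            break
        # Step 3: Delta w_k as in Eq. (5), computed for every k before any update.
        grads = [[scale * sum(res[k][l] * w[l][j] for l in range(n)) for j in range(d)]
                 for k in range(n)]
        # Steps 4-5: w_k <- w_k - alpha * Delta w_k.
        w = [[a - alpha * g for a, g in zip(w[k], grads[k])] for k in range(n)]
        # Assumed extra step: subtract the mean vector, then rescale to unit length.
        mean = [sum(col) / n for col in zip(*w)]
        w = [unit([a - m for a, m in zip(v, mean)]) for v in w]
    return w


def asl_loss(features, labels, weights):
    """Eq. (1) evaluated with fixed (non-trainable) weights: mean softmax cross-entropy."""
    total = 0.0
    for x, y in zip(features, labels):
        logits = [dot(w_k, x) for w_k in weights]
        top = max(logits)  # subtract the largest logit for a numerically stable log-sum-exp
        log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_norm - logits[y]
    return total / len(features)
```

For the seven expression classes, `compute_asl_weights(7, d)` should return seven unit vectors whose pairwise cosines are close to $-1/6$, provided $d \ge 6$. Because these weights never change during training, one option in PyTorch (the framework used in Section III-A) is to store them with `nn.Module.register_buffer` instead of as an `nn.Parameter`, so that the optimizer never updates them.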
The gradient of $J$ with respect to $w_k$ is proportional to $\Delta w_k$ (the exact gradient equals $2\Delta w_k$; the constant factor can be absorbed into the rate $\alpha$), where

$$\Delta w_k = \frac{1}{n(n-1)}\sum_{l\in N,\, l\neq k}\left(w_k^{T}w_l - \frac{1}{1-n}\right)w_l. \tag{5}$$

The desired $w_k$ is independent of the deep features $x_i$; therefore, unlike the weights in conventional SL, the ideal $w_k$ need not be updated in every mini-batch. In the proposed ASL, the weight vectors $w_k$ are first randomly initialized with a normal distribution and then optimized by gradient descent. The details for computing $w_k$ are shown in Algorithm 1, and the rate $\alpha$ is set to 0.1 in this letter.

Algorithm 2 illustrates the training process when the deep networks are supervised by ASL. Compared with conventional SL, the fundamental difference is that ASL sets the weights $w_k$ according to Algorithm 1 when the networks are initialized and keeps them unchanged during training.

Fig. 1(b) shows an example of ASL with the same settings as in Section II-A, where one can clearly see that all classes are nearly uniformly distributed in the angular space. ASL enlarges the interclass angular distance of the minority categories (digits 2 and 5) and hence improves the recognition performance.

III. EXPERIMENTS

A. Implementation Details

Our experiments are implemented in Python with the PyTorch framework on a workstation with the following specifications: an Intel Xeon Silver 4110 2.10 GHz CPU, 32 GB of RAM, and an NVIDIA Titan RTX 24 GB GPU.

1) Tested Databases: A series of experiments were conducted on two benchmark expression databases: the FER2013 database [24] and the RAF dataset [15]. The FER2013 database contains 28,709, 3,589, and 3,589 images for training, private testing, and public testing, respectively; the training images include 3,995 angry, 436 disgusted, 4,097 fearful, 7,215 happy, 4,830 sad, 3,171 surprised, and 4,965 neutral expressions. The RAF dataset provides 12,271 images for training and 3,068 images for testing, where the training set consists of 705 angry, 717 disgusted, 281 fearful, 4,772 happy, 1,982 sad, 1,290 surprised, and 2,524 neutral images. Both training sets are therefore strongly imbalanced (e.g., 436 disgusted vs. 7,215 happy images in FER2013, and 281 fearful vs. 4,772 happy images in RAF).

2) Image Preprocessing: A 44 × 44 image is randomly cropped from the input 48 × 48 grayscale image and then normalized by dividing all pixel values by 255. The cropped image is randomly flipped horizontally with a probability of 50% to augment the data, and it is finally expanded to 3 × 44 × 44 by replicating the single gray channel three times.

3) Training & Testing: For a fair comparison, this letter employs the same fundamental network architecture throughout the experiments; the architecture is an extension of ResNet18 [33]. The output of the final dropout layer is the deeply learned expression feature $x_i$.

All models are trained end to end by stochastic gradient descent (SGD) with a mini-batch of 64 samples and a momentum coefficient of 0.9. All trainable network weights are initialized with a variance scaling initializer (He initializer) [34], and the L2 weight decay is $5\times10^{-4}$. All models are trained for 100 epochs; the learning rate is initialized to 0.01 and, after 40 epochs, is multiplied by a factor $\gamma = 0.9$ every 5 epochs.

This letter uses the ten-crop method to classify the test images: each test image is cropped into ten 44 × 44 images, and it is assigned to the class with the highest average score over these ten crops.

TABLE I. PERFORMANCE COMPARISON ON THE FER2013 DATABASE. THE BEST RESULTS ARE MARKED IN BOLDFACE.

TABLE II. PERFORMANCE COMPARISON ON THE RAF DATASET. THE BEST RESULTS ARE MARKED IN BOLDFACE.

TABLE III. THE RESULTS OF JOINT SUPERVISION FOR THE FER2013 DATASET. THE BEST RESULTS ARE MARKED IN BOLDFACE.

TABLE IV. THE RESULTS OF JOINT SUPERVISION FOR THE RAF DATABASE. THE BEST RESULTS ARE MARKED IN BOLDFACE.

B. Performance Evaluation

There are two popular criteria for evaluating the performance of classifiers: the overall accuracy rate (OA) and the average class accuracy (ACA), which are defined as follows:

$$OA = \frac{\sum_{c\in N} z_c}{\sum_{c\in N} n_c}\times 100\%, \tag{6}$$

$$ACA = \frac{1}{|N|}\sum_{c\in N}\frac{z_c}{n_c}\times 100\%, \tag{7}$$

where $z_c$ is the number of samples that belong to the $c$th class and are correctly classified, $n_c$ is the number of samples in the $c$th class, and $|N|$ denotes the number of categories. OA is the most widely used criterion, whereas ACA weights every category equally and therefore reflects the accuracy on minority categories as much as on majority ones. For a good classifier, both OA and ACA should be as large as possible.

In our experiments, every model was tested repeatedly $N$ ($N = 10$) times, yielding $N$ results. We therefore measure the recognition performance by the following two statistics of each evaluation criterion $EC$ (i.e., OA or ACA):

$$Mean = \frac{1}{N}\sum_{i=1}^{N} EC_i, \tag{8}$$

$$STD = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(EC_i - Mean\right)^2}, \tag{9}$$

where $EC_i$ is the $i$th value of $EC$. Mean indicates the accuracy; i.e., the classifier is more accurate if Mean is larger. The standard deviation (STD) indicates the reliability (variation) of the recognition results; i.e., the recognizer is more reliable when STD is smaller.
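The following minimal Python sketch of Eqs. (6)-(9) shows how OA and ACA can diverge on imbalanced data and how Mean and STD summarize repeated runs. The function names and the three-class counts are hypothetical, chosen only for illustration; they are not taken from Tables I-IV. Following Eq. (9), the standard deviation divides by $N$ rather than $N-1$.

```python
import math


def overall_accuracy(correct, totals):
    """Eq. (6): OA = sum_c z_c / sum_c n_c * 100%."""
    return 100.0 * sum(correct) / sum(totals)


def average_class_accuracy(correct, totals):
    """Eq. (7): ACA = (1/|N|) * sum_c (z_c / n_c) * 100%."""
    return 100.0 * sum(z / n for z, n in zip(correct, totals)) / len(totals)


def mean_and_std(scores):
    """Eqs. (8)-(9): mean and standard deviation (divided by N) of N repeated scores."""
    runs = len(scores)
    mean = sum(scores) / runs
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / runs)
    return mean, std


# Hypothetical 3-class example: a classifier that always predicts the majority class.
correct = [90, 0, 0]  # z_c: correctly classified samples per class
totals = [90, 5, 5]   # n_c: samples per class
print(overall_accuracy(correct, totals))        # 90.0
print(average_class_accuracy(correct, totals))  # 33.33...
print(mean_and_std([70.0, 72.0]))               # (71.0, 1.0)
```

In the toy example, always predicting the majority class gives a high OA (90%) but an ACA of only about 33%, which is why ACA is the more informative criterion for minority categories.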
C. Experimental Results and Analysis

1) Performance Comparison: In this experiment, we examine the performance of the proposed ASL function. The basic network was supervised by different loss functions (i.e., SL, IL [18], CL [14], L2SL [25], and ASL), and the results on the FER2013 and RAF databases are shown in Tables I and II.

On the FER2013 dataset, ASL clearly outperforms all of the other methods with respect to both accuracy (i.e., the greatest Mean) and reliability (i.e., the smallest STD) for both ACA and OA.

On the RAF database, ASL achieves the best OA and the greatest ACA (see Mean in Table II). Although IL has the highest reliability, its recognition accuracy is clearly the lowest.

This experiment illustrates that the proposed ASL is more accurate, and generally more reliable, than many state-of-the-art methods.

2) Joint Supervision: As with conventional SL, the proposed ASL can be applied jointly with other loss functions. In this experiment, we tested two joint supervision signals (i.e., ASL+CL and ASL+L2SL) on the two databases, and the results are shown in Tables III and IV. One can clearly see that ASL+CL and ASL+L2SL are superior to CL and L2SL, respectively, in every case. Although ASL+CL and ASL+L2SL yield only a slight improvement in accuracy, their improvement in reliability is remarkable.

This experiment demonstrates that the proposed ASL can be combined with existing loss functions (here, CL and L2SL) to improve their performance.

3) Discussion: The proposed method has the following advantages:

First, the deeply learned expression features supervised by ASL have larger and more balanced (unbiased) interclass angular distances; hence, ASL performs better on both majority and minority categories. Consequently, the proposed ASL is superior to many state-of-the-art approaches.

Second, since the proposed method constrains only the weight vectors, it can easily be plugged into other methods to improve their capability.

Finally, the proposed ASL is easy to implement and has a lower computational cost, because the classifier weights are fixed and are not updated during training. For example, the time to train one epoch on the FER2013 database for ASL, SL, IL, CL, and L2SL is 40.02 s, 40.75 s, 49.22 s, 41.73 s, and 43.83 s, respectively. Although ASL needs approximately 30 s to initialize the network parameters, it still has a lower computational cost overall: over 100 training epochs, its per-epoch saving relative to SL amounts to about 73 s.

IV. CONCLUSION

In facial expression recognition tasks, the expression data are usually imbalanced; however, few FER approaches cope with this problem effectively. Hence, this letter presents a novel loss function, called advanced softmax loss (ASL), to mitigate the bias induced by imbalanced training expression data. The proposed loss gives every class an equivalent classification domain and discriminative ability by using fixed (unlearnable) weight parameters that have the same magnitude and are uniformly distributed in the angular space. The experimental results on two benchmark real-world expression databases demonstrate that ASL is more accurate and reliable than many state-of-the-art FER methods. The proposed loss can be used on its own as the supervision signal and can also jointly supervise deep networks with other loss functions.

In principle, ASL can be applied to many classification problems; however, it cannot efficiently handle a large number of categories (e.g., face recognition), because the number of pairwise angles grows quadratically with the number of classes ($n(n-1)/2$ pairs) and the regular simplex additionally requires $d \ge n-1$. Addressing this limitation is left for future work.
REFERENCES

[1] R. K. Gupta and S. D. Senturia, “Real time face detection and facial expression recognition: Development and applications to human computer interaction,” in Proc. Conf. Comput. Vis. Pattern Recognit. Workshop, 2003.
[2] J. Deng, C. Pang, Z. Zhang, Z. Pang, H. Yang, and G. Yang, “CGAN based facial expression recognition for human-robot interaction,” IEEE Access, vol. 7, pp. 2169–3536, 2019.
[3] G. E. Hinton, A Practical Guide to Training Restricted Boltzmann Machines. Berlin, Germany: Springer, 2012, pp. 599–619.
[4] A. Majumder, L. Behera, and V. K. Subramanian, “Automatic facial expression recognition system using deep network-based data fusion,” IEEE Trans. Cybern., vol. 48, no. 1, pp. 103–114, Jan. 2018.
[5] Y.-H. Lai and S.-H. Lai, “Emotion-preserving representation learning via generative adversarial network for multi-view facial expression recognition,” in Proc. IEEE Int. Conf. Autom. Face Gesture Recognit. Workshops, 2018, pp. 263–270.
[6] H. Yang, Z. Zhang, and L. Yin, “Identity-adaptive facial expression recognition through expression regeneration using conditional generative adversarial networks,” in Proc. IEEE Int. Conf. Autom. Face Gesture Recognit. Workshops, 2018, pp. 294–301.
[7] F. Zhang, T. Zhang, Q. Mao, and C. Xu, “Joint pose and expression modeling for facial expression recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 3359–3368.
[8] S. Li and W. Deng, “Deep facial expression recognition: A survey,” Oct. 2018, arXiv:1804.08348v2.
[9] B. Fasel, “Robust face analysis using convolutional neural networks,” Object Recognit. Supported User Interact. Service Robots, vol. 2, Aug. 2002, pp. 40–43.
[10] B. Sun, L. Li, G. Zhou, and J. He, “Facial expression recognition in the wild based on multimodal texture features,” J. Elect. Imag., vol. 25, no. 6, 2016, Art. no. 061407.
[11] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 580–587.
[12] K. Wang, X. Peng, J. Yang, S. Lu, and Y. Qiao, “Suppressing uncertainties for large-scale facial expression recognition,” 2020, arXiv:2002.10392v2.
[13] Y. Wen, K. Zhang, Z. Li, and Y. Qiao, “A discriminative feature learning approach for deep face recognition,” in Proc. Eur. Conf. Comput. Vis., 2016, pp. 499–515.
[14] Y. Wen, K. Zhang, Z. Li, and Y. Qiao, “A comprehensive study on center loss for deep face recognition,” Int. J. Comput. Vision, vol. 127, pp. 668–683, Jun. 2019.
[15] S. Li, W. Deng, and J. Du, “Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild,” in Proc. Conf. Comput. Vision Pattern Recognit., 2017, pp. 2584–2593.
[16] S. Li and W. Deng, “Reliable crowdsourcing and deep locality-preserving learning for unconstrained facial expression recognition,” IEEE Trans. Image Process., vol. 28, no. 1, pp. 356–370, Jan. 2019.
[17] Z. Luo, J. Hu, and W. Deng, “Local subclass constraint for facial expression recognition in the wild,” in Proc. 24th Int. Conf. Pattern Recognit., 2018, pp. 3132–3137.
[18] J. Cai, Z. Meng, A. S. Khan, Z. Li, J. O. Reilly, and Y. Tong, “Island loss for learning discriminative features in facial expression recognition,” in Proc. IEEE Int. Conf. Autom. Face Gesture Recognit., 2018, pp. 302–309.
[19] K. Zhao, J. Xu, and M.-M. Cheng, “RegularFace: Deep face recognition via exclusive regularization,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2019, pp. 1136–1144.
[20] S. Li and W. Deng, “A deeper look at facial expression dataset bias,” Apr. 2019, arXiv:1904.11150v1.
[21] W. Deng, Y. Liu, J. Hu, and J. Guo, “The small sample size problem of ICA: A comparative study and analysis,” Pattern Recognit., vol. 45, pp. 4438–4450, 2012.
[22] W. Deng, J. Hu, X. Zhou, and J. Guo, “Equidistant prototypes embedding for single sample based face recognition with generic learning and incremental learning,” Pattern Recognit., vol. 47, pp. 3738–3749, 2014.
[23] M. Hayat, S. Khan, W. Zamir, J. Shen, and L. Shao, “Max-margin class imbalanced learning with Gaussian affinity,” Jan. 2019, arXiv:1901.07711v1.
[24] J. Ian et al., “Challenges in representation learning: A report on three machine learning contests,” Neural Netw., vol. 64, pp. 59–63, Apr. 2015.
[25] F. Wang, X. Xiang, J. Cheng, and A. L. Yuille, “NormFace: L2 hypersphere embedding for face verification,” in Proc. ACM Int. Conf. MultiMed., 2017, pp. 1041–1049.
[26] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song, “SphereFace: Deep hypersphere embedding for face recognition,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2017, pp. 6738–6746.
[27] H. Wang et al., “CosFace: Large margin cosine loss for deep face recognition,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2018, pp. 5265–5274.
[28] J. Deng, J. Guo, and N. Xue, “ArcFace: Additive angular margin loss for deep face recognition,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2019, pp. 4685–4694.
[29] Y. LeCun and C. Cortes, “The MNIST database of handwritten digits,” Oct. 2019.
[30] R. Gao, F. Yang, W. Yang, and Q. Liao, “Margin loss: Making faces more separable,” IEEE Signal Process. Lett., vol. 25, no. 2, pp. 308–312, Feb. 2018.
[31] W. Rudin, Principles of Mathematical Analysis, vol. 3. New York, NY, USA: McGraw-Hill, 1964, ch. 10.
[32] M. A. El-Gebeily and Y. A. Fiagbedzi, “On certain properties of the regular n-simplex,” Int. J. Math. Educ. Sci. Technol., vol. 35, no. 4, pp. 617–629, 2004.
[33] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2016, pp. 770–778.
[34] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proc. IEEE Int. Conf. Comput. Vision, 2015, pp. 1026–1034.
