
Neurocomputing 204 (2016) 198–210

Contents lists available at ScienceDirect

Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

Face recognition using class specific dictionary learning for sparse representation and collaborative representation

Bao-Di Liu a,*, Bin Shen b, Liangke Gui c, Yu-Xiong Wang c, Xue Li e, Fei Yan d, Yan-Jiang Wang a

a College of Information and Control Engineering, China University of Petroleum, Qingdao 266580, China
b Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
c School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
d Lijin County Party Committee Office, Dongying 257499, China
e Department of Electronic Engineering, Tsinghua University, Beijing 100084, China

Article info

Article history:
Received 2 March 2015
Received in revised form 7 July 2015
Accepted 20 August 2015
Available online 8 April 2016

Keywords:
Class specific dictionary learning
Sparse representation
Collaborative representation
Face recognition

Abstract

Recently, sparse representation based classification (SRC) and collaborative representation based classification (CRC) have been successfully used for visual recognition and have demonstrated impressive performance. Given a test sample, SRC or CRC formulates its linear representation with respect to the training samples and then computes the residual error for each class. SRC or CRC assumes that the training samples from each class contribute equally to the dictionary of the corresponding class, i.e., the dictionary consists of the training samples of that class. This, however, leads to high residual error and instability. To overcome this limitation, we propose a class specific dictionary learning algorithm. Specifically, by introducing the dual form of dictionary learning, an explicit relationship between the basis vectors and the original image features is represented, which also enhances interpretability. SRC or CRC can thus be considered as a special case of the proposed algorithm. A blockwise coordinate descent algorithm and Lagrange multipliers are then adopted to optimize the corresponding objective function. Extensive experimental results on five benchmark face recognition datasets show that the proposed algorithm achieves superior performance compared with conventional classification algorithms.

© 2016 Elsevier B.V. All rights reserved.

1. Introduction samples. Ho et al. [9] and Tao et al. [23] proposed a nearest sub-
space method to assign the label of a test image by comparing its
Face recognition is a classical yet challenging research topic in reconstruction error for each category.
computer vision and pattern recognition [33]. Two stages are Under the nearest subspace [44, 45] framework, Wright et al.
usually considered for effective face recognition: (1) feature [27] proposed a sparse representation based classification (SRC)
extraction, (2) classifier construction and label prediction. For the system and achieved impressive performance. Given a test sample,
first stage, Turk and Pentland [25] proposed eigenfaces by per- sparse representation techniques represent it as a sparse linear
forming principal component analysis (PCA). He et al. [8] proposed combination of the training samples. The predicted label is
laplacianfaces to preserve local information. Belhumeur et al. [2] determined by the residual error from each class. Different from
suggested fisherfaces to maximize the ratio of between-class traditional decomposition frameworks like PCA, non-negative
scatter to within-class scatter. Yan et al. [28] proposed a multi- matrix factorization [39], and low-rank factorization [40], SRC
subregion based correlation filter bank algorithm to extract both allows coding under over-complete bases, and thus makes the
the global-based and local-based face features. For the latter stage, attained sparse codes capable of representing the data more
Richard et al. [19] proposed a nearest neighbor method to predict adaptively and flexibly. To analyze SRC, Zhang et al. [31] proposed
the label of a test image using its nearest neighbors in the training collaborative representation based classification (CRC) as an
alternative approach. CRC represents a test sample as the linear
combination of almost all the training samples. They found that it
n
Corresponding author. Tel.: þ 86 15764217948. is the collaborative representation rather than the sparse repre-
E-mail addresses: thu.liubaodi@gmail.com (B.-D. Liu),
stanshenbin@gmail.com (B. Shen), guiliangke@gmail.com (L. Gui),
sentation that makes the nearest subspace method powerful for
yuxiongw@cs.cmu.edu (Y.-X. Wang), lixue421@gmail.com (X. Li), classification. SRC, CRC, and their variants have been also used in
232842281@qq.com (F. Yan), yjwang@upc.edu.cn (Y.-J. Wang). other visual data sensing and analysis tasks, such as image

http://dx.doi.org/10.1016/j.neucom.2015.08.128
0925-2312/& 2016 Elsevier B.V. All rights reserved.

classification [12], image inpainting [41], object detection [42], image annotation [1], and transfer learning [43].

Despite their promise, both the SRC and CRC algorithms directly use the training samples as the dictionary for each class. By contrast, a well learned dictionary, especially one enforcing some discriminative criteria, can greatly reduce the residual error and achieve superior performance on classification tasks. Existing discriminative dictionary learning approaches fall mainly into three types: shared dictionary learning, class specific dictionary learning, and hybrid dictionary learning. In shared dictionary learning, each basis is associated with all the training samples. Mairal et al. [16] proposed to learn a discriminative dictionary jointly with a linear classifier of coding coefficients. Liu et al. [15] learned a Fisher discriminative dictionary. Liu et al. [35-37] and Yu et al. [38] presented a graph embedded dictionary learning method. Zhang and Li [32] proposed a joint dictionary learning algorithm for face recognition. In class specific dictionary learning, each basis corresponds to a single class only, so that the class specific reconstruction error can be used for classification. Yang et al. [30] learned a dictionary for each class with sparse coefficients and applied it to face recognition. Sprechmann and Sapiro [22] also learned a dictionary for each class with sparse representation and used it in signal clustering. Castrodad and Sapiro [4] learned a set of action specific dictionaries with a non-negative penalty on both dictionary atoms and representation coefficients. Wang et al. [26] introduced mutual incoherence information to promote class specific dictionary learning in action recognition. Yang et al. [29] embedded Fisher discriminative information into class specific dictionary learning.

The shared dictionary learning approaches usually lead to a dictionary of small size, and the discriminative information (i.e., the label information corresponding to the coding coefficients) is embedded into the dictionary learning framework. The class specific dictionary learning approaches usually focus on the classifier construction aspect, since each basis vector is fixed to a single class label. A combination of shared basis vectors and class specific basis vectors is learned in hybrid dictionary learning. Zhou et al. [34] learned a hybrid dictionary with Fisher regularization on the coding coefficients. Gao et al. [6] learned a shared dictionary to encode common visual patterns and a class specific dictionary to encode subtle visual differences among categories for fine-grained image representation. Liu et al. [12] proposed a hierarchical dictionary learning method to produce a shared dictionary and a cluster specific dictionary. In spite of the demonstrated performance of hybrid dictionary learning, it remains a challenge to balance the shared dictionary and the class specific dictionary.

In this paper, motivated by the superior performance of the SRC and CRC algorithms and of class specific dictionary learning methods, we propose class specific dictionary learning (CSDL) for both the sparse representation based classifier (CSDL-SRC) and the collaborative representation based classifier (CSDL-CRC). Fig. 1 shows the framework of our proposed CSDL. The major distinction between our approach and existing class specific dictionary learning methods is that the existing methods directly optimize the dictionary basis vectors (the "primal" form), whereas we leverage a "dual" reformulation of dictionary learning and optimize the weights instead. Compared with the "primal" form widely used by existing methods, our "dual" form offers several benefits:

- It provides an explicit relationship between the basis vectors and the original image features, thus enhancing the interpretability of the learned dictionary.
- It is easy to kernelize due to the separation of the original data, which is difficult for the existing methods. The generalization to kernel spaces is elaborated in Section 5.7.3.
- Most existing class specific dictionary learning methods focus on introducing additional regularization terms, which could easily be incorporated into our dual formulation of class specific dictionary learning to further improve performance.

Our main contributions are threefold:

- We propose a novel class specific dictionary learning scheme that considers the weight of each sample when generating the dictionary (i.e., subspace). The traditional CRC and SRC methods perform face recognition without a training procedure (i.e., the training samples are used directly for predicting labels). By contrast, our proposed method compensates for this deficiency by introducing class specific dictionary learning, and it is applicable to both CRC and SRC. Furthermore, it is reasonable to assume that different samples contribute unevenly in constructing the corresponding subspace. CRC and SRC can thus be viewed as special cases of our proposed algorithm.
- We propose the dual form of dictionary learning to enhance interpretability.
Fig. 1. The framework of our proposed class specific dictionary learning algorithm.
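The weighting scheme described in the contributions above (and detailed in Section 3) replaces the implicit identity weight matrix of SRC/CRC with a learned block-diagonal matrix, so that every dictionary atom is a weighted combination of training samples from one class only. The following numpy sketch illustrates how such a block-diagonal W assembles class-pure atoms; all names, dimensions, and the random weights are purely illustrative, not taken from the paper's code.

```python
import numpy as np

# Illustrative sketch (not the authors' code): SRC/CRC implicitly use the
# identity weight matrix W = I, while CSDL learns one weight block W^c per
# class and assembles them into a block-diagonal W, so each dictionary atom
# is a weighted combination of training samples from a single class only.

rng = np.random.default_rng(0)
D, C, n_c, K_c = 64, 3, 5, 10          # feature dim, classes, samples/class, atoms/class
X_blocks = [rng.standard_normal((D, n_c)) for _ in range(C)]    # per-class samples X^c
W_blocks = [rng.standard_normal((n_c, K_c)) for _ in range(C)]  # per-class weights W^c

X = np.hstack(X_blocks)                # D x N, with N = C * n_c
W = np.zeros((C * n_c, C * K_c))       # N x (C * K_c), block-diagonal
for c in range(C):
    W[c * n_c:(c + 1) * n_c, c * K_c:(c + 1) * K_c] = W_blocks[c]

B = X @ W                              # dictionary: atoms stay class-pure
# Each atom of class c depends only on the samples of class c:
assert np.allclose(B[:, :K_c], X_blocks[0] @ W_blocks[0])
```

Setting each `W_blocks[c]` to the identity recovers the plain SRC/CRC dictionary, which is the sense in which those methods are special cases of CSDL.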

- We propose a simple optimization procedure that combines blockwise coordinate descent and Lagrange multipliers to solve the minimization problem.

A preliminary version of this work appeared as Liu et al. [13]. We clarify many technical details omitted in the previous version, present results on substantially extended datasets (e.g., the Yale dataset and the ORL dataset), and provide an in-depth analysis of how, why, and when the proposed algorithm is superior to the standard approaches. These extensive experimental evaluations demonstrate that our proposed algorithm achieves impressive performance on face recognition tasks.

The rest of the paper is organized as follows. Section 2 reviews the sparse representation and collaborative representation based classifier algorithms. Section 3 explains the proposed class specific dictionary learning for sparse representation and collaborative representation methods. The solution to the corresponding optimization problem, together with its convergence analysis, is elaborated in Section 4, where the overall algorithm is also summarized. The application of the proposed CSDL algorithm to face recognition is shown in Section 5. Finally, discussions and conclusions are drawn in Section 6.

2. Overview of SRC and CRC

In this section, two classical classifier algorithms, the sparse representation based classifier and the collaborative representation based classifier, are briefly reviewed. Sparse representation and collaborative representation algorithms can be considered as methods of rearranging the structure of the original data in order to make the representation compact under over-complete and non-orthogonal bases. Hence, a data vector is represented as a linear combination of active basis vectors.

2.1. Sparse representation based classification

Wright et al. [27] proposed the sparse representation based classification (SRC) algorithm for robust face recognition. Given the training samples $X = [X^1, X^2, \ldots, X^C] \in \mathbb{R}^{D \times N}$, $X^c \in \mathbb{R}^{D \times N_c}$ represents the training samples from the cth class, $C$ is the number of classes, $N_c$ is the number of training samples in the cth class ($N = \sum_{c=1}^{C} N_c$), and $D$ is the dimension of the samples. Supposing that $y \in \mathbb{R}^{D \times 1}$ is a test sample, the sparse representation algorithm aims to solve the following objective function:

$$\hat{s} = \arg\min_s \left\{ \|y - Xs\|_2^2 + 2\alpha\|s\|_1 \right\}. \quad (1)$$

Here, Eqn. (1) is the ℓ1-norm regularized least-squares (ℓ1-ℓs) minimization problem, $s \in \mathbb{R}^{N \times 1}$ is the sparse code of $y$, and $\alpha$ is the regularization parameter controlling the tradeoff between fitting goodness and sparseness. The sparse representation based classifier then finds the minimum residual error over the classes:

$$id(y) = \arg\min_c \|y - X^c \hat{s}^c\|_2^2, \quad (2)$$

where $\hat{s}^c$ represents the partial sparse codes related to the images in class $c$. The procedure of SRC is shown in Algorithm 1. The impressive results of SRC have been reported in Wright et al. [27].

Algorithm 1. Algorithm for SRC.

Require: Training samples $X \in \mathbb{R}^{D \times N}$, $\alpha$, and test sample $y$
1: Code $y$ with the dictionary $X$ via ℓ1-minimization: $\hat{s} = \arg\min_s \{\|y - Xs\|_2^2 + 2\alpha\|s\|_1\}$
2: for $c = 1$; $c \le C$; $c$++ do
3:   Compute the residuals $e_c(y) = \|y - X^c \hat{s}^c\|_2^2$
4: end for
5: $id(y) = \arg\min_c \{e_c\}$
6: return $id(y)$

2.2. Collaborative representation based classification

Zhang et al. [31] proposed collaborative representation based classification (CRC) by replacing the objective function of SRC with an ℓ2 regularized minimization problem:

$$\hat{s} = \arg\min_s \left\{ \|y - Xs\|_2^2 + \beta\|s\|_2^2 \right\}. \quad (3)$$

Here, $s \in \mathbb{R}^{N \times 1}$ is the fitting coefficient of $y$, and $\beta$ is the regularization parameter controlling the tradeoff between fitting goodness and the collaborative property (i.e., multiple entries of $X$ participating in representing the test sample).

The residual error $e_c$ in Algorithm 1 is associated with most of the images in class $c$. Both the SRC and CRC algorithms directly use the training samples as the dictionary and encode the test sample $y$ as

$$y \approx XWs, \quad (4)$$

where $W \in \mathbb{R}^{N \times N}$ is an identity matrix. This means that the training samples contribute equally to constructing the dictionary $B = XW$ when representing the test sample $y$.

3. Proposed class specific dictionary learning algorithm

From Eqns. (1), (3), and (4), we observe that $W$ is predefined as an identity matrix $I$. However, for different classification tasks and sample distributions, one would favor a data-driven formulation of $W$. That is, to make $W$ more adaptive, it would be of great benefit to let the training samples of the same class carry different weights when constructing the bases of the corresponding dictionary, while letting the training samples contribute nothing when constructing the bases of the dictionaries of the other classes. Hence, we generalize the weight coefficient matrix from $W = I$, as in SRC and CRC, to a block-diagonal matrix as given in Fig. 2:

$$W = \begin{bmatrix} W^1 & 0 & 0 & \cdots & 0 \\ 0 & W^2 & 0 & \cdots & 0 \\ 0 & 0 & W^3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & W^C \end{bmatrix}$$

$W$ can be obtained effectively by dictionary learning, and SRC and CRC can thus be considered as special cases of our proposed class specific dictionary learning. Now, $e_c$ is calculated by $e_c(y) = \|y - X^c W^c \hat{s}^c\|_2$. The within-class residual error will further decrease via learning an adaptive dictionary for sparse representation or collaborative representation.

These considerations motivate our dictionary learning for sparse representation, whose objective function is

$$f(W^c, S^c) = \|X^c - X^c W^c S^c\|_F^2 + 2\alpha \sum_{n=1}^{N_c} \|S^c_{\bullet n}\|_1 \quad \text{s.t. } \|X^c W^c_{\bullet k}\|_F^2 \le 1,\ \forall k = 1, 2, \ldots, K. \quad (5)$$

Fig. 2. The learned weight coefficient matrix W for constructing the dictionary.
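As a concrete illustration of Eqns. (3)-(4) and the class-residual rule of Algorithm 1, the following is a minimal numpy sketch of CRC classification. The function `crc_predict` and its argument names are our own illustration, not code from the paper.

```python
import numpy as np

# Minimal sketch of CRC (Eqn. (3)) combined with the class-residual rule of
# Algorithm 1. Names are illustrative, not from the paper's code.

def crc_predict(X, labels, y, beta=0.01):
    """X: D x N training matrix, labels: length-N class ids, y: D-dim test vector."""
    N = X.shape[1]
    # Closed-form collaborative code: s = (X^T X + beta I)^{-1} X^T y
    s = np.linalg.solve(X.T @ X + beta * np.eye(N), X.T @ y)
    classes = np.unique(labels)
    # e_c(y) = ||y - X^c s^c||_2, using only the columns (and codes) of class c
    residuals = [np.linalg.norm(y - X[:, labels == c] @ s[labels == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))]
```

For SRC, the same residual rule applies, with `s` obtained from the ℓ1 regularized problem of Eqn. (1) instead of the closed form above.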

Similarly, the objective function of learning dictionary for col- Based on the convexity and monotonic property of the parabolic
laborative representation is function, it is easy to know that f ðSckn Þ reaches the minimum at the
       unique point
f W c ; Sc ¼ X c  X c W c Sc 2F þ βSc 2F
 c c 2
Sckn ¼ minf½W c X c X c kn  ½ES~c kn ;  αg
kn
s:t: X W •k F r 1; ∀k ¼ 1; 2; …; K: ð6Þ T T

þ maxf½W c X c X c kn  ½ES~c kn ; αg;


T T kn
In Eqns. (5) and (6), c is the cth class and K is the size of the ð10Þ
learned dictionary. In the paper, K is set to be twice of the number ( c
S ; p ak J q a n
where E ¼ W c X c X c W c , S~c ¼
T T kn pq
of the training samples in c. Wc is thus the learned weight coef-
0; p ¼ k&q ¼ n:
ficient for constructing the dictionary and Sc is the corresponding
sparse representation or collaborative representation. Eqns. Furthermore, given that the optimal value for Sckn does not
depend on the other entries in the same row, each entire row of Sc
(5) and (6) can be intuitively viewed as the “dual” reformulations
can be optimized simultaneously. That is,
of the “primal” form of the sparse representation and collaborative
Sck ¼ minf½W c X c X c k  ½ES~c k ;  αg
representation problems, respectively. This is better understood if T T k

one draws an analogy with linear SVM training: one can formulate
þ maxf½W c X c X c k  ½ES~c k ; αg;
T T k
the training problem as an optimization over (1) either directly the ð11Þ
( c
weights of a linear classifier vector of the same dimension as the Sp ; p a k
where S~c ¼
k
input signal (2) or the support vector weights, that is a vector of
0; p ¼ k:
dimension equal to the size of the training set that is used to
linearly combine the training inputs. In the context of the problem 4.1.2. ℓ2–ℓs minimization subproblem
here, the support vector weights are analogous to the weights W. With Sc fixed , the objective function of ℓ2–ℓs subproblem
The atoms now capture a notion of centroids similar to K-means, becomes
which explicitly expresses what happens during dictionary      
f W c ¼ X c  X c W c Sc 2F s:t: X c W c•k 2F r 1; ∀k ¼ 1; 2; …; K: ð12Þ
learning, leading to enhanced interpretability.
Here, the Lagrange multipliers are used to solve the ℓ2-norm
constrained minimization subproblem.
4. Optimization of the objective function W c can be optimized in a column-wise manner. Specifically,
T
ignoring the constant term trfX c X c g, its Lagrangian is
In this section, we focus on solving the optimization problem of K h
X i
LðW c ; λk ; μk Þ ¼  2
T
the two types of CSDL algorithms proposed above. Specifically, Sc X c X c W c•k
k
similar to the optimization strategy used in Lee et al. [11] and Liu k¼1

et al. [14], we cast the problem as two subproblems via alternating X


K h i
T T T
þ W c•k X c X c W c Sc Sc
minimization: ℓ1-norm or ℓ2-norm regularized least-squares k
k¼1
minimization subproblem with W c fixed, and ℓ2-norm con-  h i
strained least-squares minimization subproblem with Sc fixed. þ λk 1  W X X W cT cT c c
; ð13Þ
kk

where λk is the Lagrange variable.


4.1. Class specific dictionary learning for sparse representation
According to the Karush–Kuhn–Tucker (KKT) conditions, the
optimal solution W c•k should satisfy the following criteria:
The sparse representation based class specific dictionary
learning problem can be decomposed into ℓ1-norm regularized ∂LðW c ; λk Þ
ðaÞ : ¼ 0; ð14Þ
least-squares (ℓ1-ℓs) minimization subproblem with W c fixed and ∂W c•k
ℓ2-norm constrained least-squares (ℓ2-ℓs) minimization sub-
problem with Sc fixed.
T T
ðbÞ : ð1  ½W c X c X c W c kk Þ ¼ 0
4.1.1. ℓ1–ℓs minimization subproblem λk a 0: ð15Þ
With W c fixed, the objective function of ℓ1-ℓs subproblem is Hence, the solution to W c ð:; kÞ becomes
  X
Nc 
W c•k ¼ Sc•k  ½W~ c F•k sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
T k
f Sc ¼ ‖X c  X c W c Sc ‖2F þ 2α Sc k1 : ð7Þ  ;
•n
Sc  W~ c F ~ c kF
k
n¼1 T X c T X c Sc  W
T T
•k •k •k
Eqn. (7) can be simplified as •k

ð16Þ
  n o N h
X i (
T T
f Sc ¼ tr X c X c  2 X c X c W c Sc•n W•pc; p≠k
where F ¼ Sc Sc and W~ c ¼
T k
n•
n¼1
0; p ¼ k:
X
N h i X
K X
N 
þ
T T T
Sc•k W c X c X c W c Sc•n þ 2α Sc j: ð8Þ Algorithm 2. CSDL algorithm for sparse representation.
kn
n¼1 k¼1n¼1
Require Data matrix X c A RDN , α, and K
T
Ignoring the constant term trfX c X c g, the objective function of Sckn 1: W c ’randðN c ; KÞ; Sc ’zerosðK; N c Þ
c 
reduces
 to Eqn. (9) with W and S1n ; S2n ; …; SKn ⧹Sckn fixed. Here,
c c c
2: for k ¼ 1; k r K; k þ þ do
( Sc1n ; Sc2n ; …; ScKn ⧹Sckn stands for all elements in Sc except the qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
3: T
element in the kth row and nth column): W c•k ¼ W c•k = W c•k W c•k
( ) 4: end for
  nh i o XK h i
T T T T
f Sckn ¼ Sckn 2 W c X c X c W c kk þ2Sckn W c X c X c W c kl Scln 5: iter ¼ 0
l ¼ 1;l≠k 6: while ðf ðiterÞ  f ðiter þ 1ÞÞ=f ðiterÞ 4 1e  5 do
nh i o  7: iter’iter þ 1
2Sckn cT
W X X cT c
þ 2αSckn j: ð9Þ
kn 8: Update Sc :
9: T T
T T
Here, ½W c X c X c W c kk ¼ 1. f ðSckn Þ is a piece-wise parabolic function. E ¼ W c Xc XcW c

10:   for $k = 1$; $k \le K$; $k$++ do
11:     $S^c_{k\bullet} = \min\{[W^{cT}X^{cT}X^c]_{k\bullet} - [E\tilde{S}^{c,k}]_{k\bullet}, -\alpha\} + \max\{[W^{cT}X^{cT}X^c]_{k\bullet} - [E\tilde{S}^{c,k}]_{k\bullet}, \alpha\}$
12:   end for
13:   Update $W^c$:
14:   Compute $F = S^cS^{cT}$, $G = F \odot (\mathbf{1} - I)$
15:   for $k = 1$; $k \le K$; $k$++ do
16:     $W^c_{\bullet k} = S^{cT}_{k\bullet} - W^cG_{\bullet k}$
17:     $W^c_{\bullet k} = W^c_{\bullet k} / \sqrt{W^{cT}_{\bullet k}X^{cT}X^cW^c_{\bullet k}}$
18:   end for
19:   Update the objective function: $f = \|X^c - X^cW^cS^c\|_F^2 + 2\alpha\sum_{n=1}^{N_c}\|S^c_{\bullet n}\|_1$
20: end while
21: return $W^c$ and $S^c$

4.1.3. Convergence analysis and discussion

The objective function in Eqn. (5) is nondifferentiable and nonconvex. Under our optimization framework, it is partitioned into $2K$ blocks, in which each block is a column of $W^c$ or a row of $S^c$:

$$f(W^c_1, \ldots, W^c_K, S^c_1, \ldots, S^c_K) = \left\|X^c - X^c\sum_{k=1}^{K}W^c_kS^c_k\right\|_F^2 + 2\alpha\sum_{k=1}^{K}\|S^c_k\|_1 \quad \text{s.t. } \|X^cW^c_k\|_2^2 \le 1,\ \forall k = 1, 2, \ldots, K. \quad (17)$$

Since the exact minimization point is obtained by Eqn. (11) or Eqn. (16), each operation updates $S^c_1, \ldots, S^c_K, W^c_1, \ldots, W^c_K$ alternately and monotonically decreases the objective function in Eqn. (5). Considering that the objective function is obviously bounded below, it converges.

Now consider the convergence of the variables $W^c$ and $S^c$. Each subproblem with respect to $W^c_1, W^c_2, \ldots, W^c_K, S^c_1, S^c_2, \ldots, S^c_K$ is convex. The problem thus satisfies the separability and regularity properties proposed in Tseng [24], and the optimal value for a given block of variables is uniquely achieved by the solutions in Eqns. (11) and (16), due to the strict convexity of the subproblems at each iteration. These properties make the sequence ($W^c$ and $S^c$) generated by alternating minimization with block coordinate descent converge to a stationary point. The detailed proof of the convergence of (blockwise) coordinate descent for functions satisfying such mild conditions is given in Theorem 4.1 of Tseng [24] and also in Bertsekas [3].

The time complexities of one iteration round of sparse coding (obtaining $S$) and weight learning (learning $W$) are $O(K^2D + 2KDN + K^2N)$ and $O(KD + KDN + 2K^2N)$, respectively. Here, $K$ represents the number of bases, $D$ the dimensionality of the features, and $N$ the number of features.

4.1.4. Overall algorithm

Our algorithm for sparse representation based class specific dictionary learning is shown in Algorithm 2. Here, $\mathbf{1} \in \mathbb{R}^{K \times K}$ is a square matrix with all elements 1, $I \in \mathbb{R}^{K \times K}$ is the identity matrix, and $\odot$ indicates the Hadamard product. By iterating $S^c$ and $W^c$ alternately, the sparse codes are obtained and the corresponding bases are learned.

4.2. Class specific dictionary learning for collaborative representation

Similar to the above procedure, the collaborative representation based class specific dictionary learning problem can be decomposed into the ℓ2-norm regularized least-squares (ℓ2-ℓs) minimization subproblem with $W^c$ fixed and the ℓ2-norm constrained least-squares (ℓ2-ℓs) minimization subproblem with $S^c$ fixed.

For the ℓ2-ℓs regularized minimization subproblem, the objective function is

$$f(S^c) = \|X^c - X^cW^cS^c\|_F^2 + \beta\|S^c\|_F^2. \quad (18)$$

Eqn. (18) can be solved by setting its derivative to zero, and its analytical solution is

$$S^c = \left(W^{cT}X^{cT}X^cW^c + \beta I\right)^{-1}W^{cT}X^{cT}X^c. \quad (19)$$

For the ℓ2-ℓs constrained minimization subproblem, the objective function is the same as Eqn. (12), and the solution thus remains as in Eqn. (16).

4.3. CSDL for label prediction

Algorithm 3 shows our proposed CSDL-SRC or CSDL-CRC algorithm for predicting the label of a test sample.

Algorithm 3. Algorithm for label prediction of our proposed CSDL algorithm.

Require: Training samples $X \in \mathbb{R}^{D \times N}$, $\alpha$, $\beta$, and test sample $y$
1: for $c = 1$; $c \le C$; $c$++ do
2:   Code $y$ with the learned $W^c$ via Eqn. (11) or Eqn. (19)
3:   Compute the residuals $e_c(y) = \|y - X^cW^cs^c\|_2$
4: end for
5: $id(y) = \arg\min_c \{e_c\}$
6: return $id(y)$

5. Experimental results

In this section, we use five standard benchmark datasets, namely the Yale dataset [2], ORL dataset [20], Extended YaleB dataset [7], CMU PIE dataset [21], and AR dataset [17], to evaluate the performance of our proposed class specific dictionary learning (CSDL) algorithm. We first describe the experimental settings and then present the corresponding experimental results on these datasets. Finally, we give some additional analysis and discussion with respect to dictionary size, visualization, and generalization to kernel spaces.

5.1. Experimental settings

For each dataset, every face image is cropped to 32 × 32, pulled into a column vector, and ℓ2 normalized to form the raw ℓ2 normalized feature. After that, power normalization is also performed to produce the power normalized features. To eliminate the randomness of the experiments, the data are randomly split into a training set and a testing set 10 times, and the mean and standard deviation of the face recognition rate are reported. To show the performance of our proposed CSDL algorithm, four classical algorithms are used as baselines: nearest neighbor classification (NN), collaborative representation based classification (CRC) [31], sparse representation based classification (SRC) [27], and SVM [5]. Given a test sample, CRC and SRC compute its linear representation with respect to all the training samples and use the fitting error as the criterion for classification. For SVM, the one-against-all multi-class classification strategy of LIBSVM [5] is used.

For the Yale and ORL datasets, we randomly select 5 samples per class for training and use the remaining samples per class for testing. For the AR dataset, 7 samples per class are used for training and the rest for testing. For the Extended YaleB dataset and CMU PIE dataset, 10 and 20 samples per class are used for training and testing, respectively.

Fig. 3. Examples of the Yale dataset.

Fig. 4. Influence of α for SRC & CSDL-SRC and β for CRC & CSDL-CRC on the Yale dataset, respectively.
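Before turning to the per-dataset results, the feature pipeline of Section 5.1 (crop to 32 × 32, flatten, ℓ2 normalize, then power normalize) can be sketched as below. The power exponent ρ = 0.5 (signed square root) and the final re-normalization are our assumptions, since the paper does not specify them.

```python
import numpy as np

# Sketch of the Section 5.1 feature pipeline. The exponent rho = 0.5 and the
# re-normalization after power normalization are assumptions, not from the paper.

def l2_normalize(v, eps=1e-12):
    return v / max(np.linalg.norm(v), eps)

def power_normalize(v, rho=0.5):
    return np.sign(v) * np.abs(v) ** rho      # signed power, preserves sign of each entry

def extract_features(img32x32):
    v = l2_normalize(img32x32.reshape(-1).astype(float))   # "l2 norm" feature
    return v, l2_normalize(power_normalize(v))             # ("l2 norm", "power norm")
```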

Three parameters need tuning in our proposed CSDL algorithm: α adjusts the tradeoff between the reconstruction error and sparsity for CSDL-SRC; β adjusts the tradeoff between the reconstruction error and the collaborative property for CSDL-CRC; and K is the dictionary size for each class. α is varied on the grid {10⁻⁵, 10⁻⁴, 10⁻³, 10⁻², 10⁻¹}, β is varied on the grid {10⁻⁵, 10⁻⁴, 10⁻³, 10⁻², 10⁻¹, 10⁰}, and K is set to twice the number of training samples per class. The detailed parameter tuning procedure is shown in the following subsections.

5.2. Yale dataset

The Yale dataset contains 165 grayscale face images of 15 individuals. For each individual, the images are captured under varying facial expressions or configurations: center-light, with glasses, happy, left-light, without glasses, normal, right-light, sad, sleepy, surprised, and wink. Fig. 3 shows some sample images from the dataset. The performance with different values of α or β is reported in Fig. 4. From Fig. 4, using ℓ2 normalized features, the optimal β for CRC is 0.01 with a recognition rate of 79.67%, while for CSDL-CRC β is 0.1 with a rate of 80.89%; the optimal α for SRC is 0.001 with a rate of 77.67%, while for CSDL-SRC α is 0.001 with a rate of 79.00%. Using power normalized features, the optimal β for CRC is 0.1 with a rate of 82.11%, while for CSDL-CRC β is 0.1 with a rate of 83.56%; the optimal α for SRC is 0.01 with a rate of 81.33%, while for CSDL-SRC α is 0.01 with a rate of 84.56%. Table 1 compares the recognition rates of NN, SVM, CRC, SRC, CSDL-CRC, and CSDL-SRC. From Table 1, our proposed CSDL algorithm achieves superior performance to the other four classical classification methods.

Table 1
Recognition rate on the Yale dataset (%).

Methods    | ℓ2 norm      | Power norm
NN         | 55.78 ± 4.15 | 60.44 ± 4.97
SVM        | 76.00 ± 3.93 | 78.22 ± 2.47
CRC        | 79.67 ± 1.82 | 82.11 ± 1.10
SRC        | 77.67 ± 2.37 | 81.33 ± 2.45
CSDL-CRC   | 80.89 ± 1.37 | 83.56 ± 2.21
CSDL-SRC   | 79.00 ± 2.89 | 84.56 ± 1.61

5.3. ORL dataset

For the ORL dataset, there are 10 different images of each of 40 distinct subjects. For some subjects, the images were taken at

Fig. 5. Examples of the ORL dataset.

Fig. 6. Influence of α for SRC & CSDL-SRC and β for CRC & CSDL-CRC on the ORL dataset, respectively.

different times, varying the lighting, facial expressions (open/closed eyes, smiling/not smiling), and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background with the subject in an upright, frontal position. Fig. 5 shows some sample images from the dataset. The performance with different values of α or β is reported in Fig. 6. Table 2 shows the recognition rates of NN, SVM, CRC, SRC, CSDL-CRC, and CSDL-SRC. From Table 2, our proposed CSDL algorithm achieves superior performance to the other four classical classification methods.

Table 2
Recognition rate on the ORL dataset (%).

Methods    | ℓ2 norm      | Power norm
NN         | 85.80 ± 3.18 | 84.25 ± 2.82
SVM        | 94.65 ± 2.57 | 93.70 ± 2.95
CRC        | 94.60 ± 2.39 | 93.75 ± 2.29
SRC        | 93.65 ± 2.22 | 92.15 ± 1.55
CSDL-CRC   | 96.45 ± 1.61 | 95.45 ± 1.96
CSDL-SRC   | 95.55 ± 1.50 | 95.50 ± 1.45

5.4. Extended YaleB dataset

For the Extended YaleB dataset, there are 2,414 frontal face images of 38 individuals in total. All the images are captured under varying illumination conditions. Fig. 7 shows some sample images from the dataset. The performance with different values of α or β is reported in Fig. 8. Table 3 shows the recognition rates of NN, SVM, CRC, SRC, CSDL-CRC, and CSDL-SRC. From Table 3, our proposed CSDL algorithm achieves superior performance to the other four classical classification methods.

5.5. CMU PIE dataset

The CMU PIE dataset contains 41,368 images of 68 individuals in total. Each individual appears under 13 different poses and 43 different illumination conditions, and with 4 different expressions. Each individual thus may lie on multiple manifolds. Five near frontal poses (C05, C07, C09, C27, C29) and all different illuminations and expressions are used in our experiment. There

Fig. 7. Examples of the Extended YaleB dataset.

Fig. 8. Influence of α for SRC & CSDL-SRC and β for CRC & CSDL-CRC on the Extended YaleB dataset, respectively.

are about 170 images for each individual and 11,554 images in total. Fig. 9 shows some sample images from the dataset. The performance with different values of α or β is reported in Fig. 10. Table 4 shows the recognition rates of NN, SVM, CRC, SRC, CSDL-CRC, and CSDL-SRC. From Table 4, our proposed CSDL algorithm achieves superior performance to the other four classical classification methods.

Table 3
Recognition rate on the Extended YaleB dataset (%).

Methods    | ℓ2 norm      | Power norm
NN         | 53.14 ± 1.24 | 51.46 ± 1.49
SVM        | 85.76 ± 0.80 | 94.82 ± 0.76
CRC        | 89.96 ± 1.04 | 97.37 ± 0.74
SRC        | 89.92 ± 1.09 | 97.13 ± 0.56
CSDL-CRC   | 90.36 ± 0.90 | 97.74 ± 0.70
CSDL-SRC   | 91.46 ± 0.64 | 98.28 ± 0.57

5.6. AR dataset

For the AR dataset, there are over 4,000 frontal faces of 126 individuals. A subset consisting of 50 male and 50 female categories is used here, with 26 face images for each class. Compared with the two former datasets, the AR dataset contains more facial variations, such as illumination changes, various expressions, and facial disguises. Fig. 11 shows some sample face images from the dataset. The performance with different values of α or β is reported in Fig. 12. Table 5 shows the recognition rates of NN, SVM, CRC, SRC, CSDL-CRC, and CSDL-SRC. From Table 5, our proposed CSDL algorithm achieves superior performance to the other four classical classification methods.

5.7. Analysis and discussions

In this section, first, the size of bases per class is briefly analyzed. Second, the learned dictionary is visualized and analyzed. Third, the generalization to kernel spaces is illustrated. Finally, a comparison with state-of-the-art dictionary learning based face recognition methods is presented.

5.7.1. The size of bases K per class

Fig. 13 shows the influence of the size of bases K per class for our proposed CSDL-CRC and CSDL-SRC. From Fig. 13, when K is small, e.g., K = 5, the performance of CSDL is relatively poor. When K is larger than twice the number of training samples per class, the performance becomes high and steady.

5.7.2. Visualization of the learned dictionary

Fig. 14 shows the comparison of the original face images and the class specific learned dictionary with size 20 (the learned

Fig. 9. Examples of the CMU PIE dataset.

Fig. 10. Influence of α for SRC & CSDL-SRC and β for CRC & CSDL-CRC on the CMU PIE dataset.

dictionary for each class is obtained by X^c W^c. Fig. 14 demonstrates one representative class of training face images and the corresponding dictionary from the Extended YaleB dataset). From Fig. 14, both CSDL-SRC and CSDL-CRC efficiently extend the original face subspace with more choices of bases. A larger subspace would reduce the residual error more adaptively when reconstructing the samples from the same class.

Table 4
Recognition rate on the CMU PIE dataset (%).

Methods     ℓ2 norm         Power norm
NN          44.78 ± 1.48    44.18 ± 1.41
SVM         85.06 ± 0.61    85.54 ± 0.79
CRC         86.54 ± 0.76    88.07 ± 0.85
SRC         86.51 ± 0.85    84.35 ± 0.79
CSDL-CRC    88.91 ± 0.81    89.74 ± 0.86
CSDL-SRC    90.15 ± 0.72    91.32 ± 0.56

5.7.3. Generalization to kernel spaces

Another important property of our proposed class specific dictionary learning is that it is easy to kernelize, due to the separation of the original data. Suppose that there exists a feature mapping function φ: R^D → R^t. It maps the original feature space to a high dimensional kernel space: X = [X_1, X_2, …, X_N] → φ(X) = [φ(X_1), φ(X_2), …, φ(X_N)]. Then, the objective function of learning the dictionary for sparse representation, Eqn. (5), can be generalized to reproducing kernel Hilbert spaces as

f(W^c, S^c) = ‖φ(X^c) − φ(X^c) W^c S^c‖²_H + 2α Σ_{n=1}^{N_c} ‖S^c_{•n}‖₁
s.t. ‖φ(X^c) W^c_{•k}‖²_H ≤ 1, ∀k = 1, 2, …, K.   (20)

Similarly, the objective function of learning the dictionary for collaborative representation, Eqn. (6), can be generalized to reproducing kernel Hilbert spaces as

f(W^c, S^c) = ‖φ(X^c) − φ(X^c) W^c S^c‖²_H + β ‖S^c‖²_F
s.t. ‖φ(X^c) W^c_{•k}‖²_H ≤ 1, ∀k = 1, 2, …, K.   (21)

Now, the Frobenius norm has been replaced by the inner-product norm of that Hilbert space, such that ‖φ(X^c)‖²_H = κ(X^c, X^c), with kernel function κ(X^c_i, X^c_j) = φ(X^c_i)^T φ(X^c_j). The dictionary becomes a set of K arbitrary functions in that Hilbert space. Using the "kernel trick", we obtain Eqn. (22) for class specific dictionary learning for sparse representation as

‖φ(X^c) − φ(X^c) W^c S^c‖²_H + 2α ‖S^c‖₁ = tr(κ(X^c, X^c)) − 2 tr(κ(X^c, X^c) W^c S^c) + tr(S^{cT} W^{cT} κ(X^c, X^c) W^c S^c) + 2α Σ_{n=1}^{N_c} ‖S^c_{•n}‖₁.   (22)

Fig. 11. Examples of the AR dataset.

Fig. 12. Influence of α for SRC & CSDL-SRC and β for CRC & CSDL-CRC on the AR dataset, respectively.

Similarly, we obtain Eqn. (23) for class specific dictionary learning for collaborative representation as

‖φ(X^c) − φ(X^c) W^c S^c‖²_H + β ‖S^c‖²_F = tr(κ(X^c, X^c)) − 2 tr(κ(X^c, X^c) W^c S^c) + tr(S^{cT} W^{cT} κ(X^c, X^c) W^c S^c) + β ‖S^c‖²_F.   (23)

Therefore, we can now search for an optimal dictionary directly in the kernel space through optimizing W^c instead of B^c (the latter is an extremely hard optimization problem). Since Eqns. (22) and (23) only depend on the kernel function κ(X^c, X^c) = φ(X^c)^T φ(X^c), which can be pre-computed before sparse representation and collaborative representation, we can now handle arbitrary kernels with tractable computation.

Table 5
Recognition rate on the AR dataset (%).

Methods     ℓ2 norm         Power norm
NN          36.89 ± 1.34    42.23 ± 1.55
SVM         88.52 ± 0.85    88.35 ± 0.98
CRC         94.92 ± 0.43    95.15 ± 0.44
SRC         92.48 ± 0.49    90.82 ± 0.71
CSDL-CRC    96.05 ± 0.42    96.21 ± 0.32
CSDL-SRC    95.54 ± 0.46    96.28 ± 0.42
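The Gram-matrix-only evaluation that makes this tractable can be sketched in NumPy. For the linear kernel, where φ is the identity, the kernel-space reconstruction term of Eqns. (22) and (23) must agree with the explicit Frobenius-norm computation; the sizes below are toy values, not taken from the experiments.

```python
import numpy as np

def kernel_reconstruction_error(K, W, S):
    """Reconstruction term of Eqns. (22)/(23), computed from the Gram
    matrix K = kappa(X^c, X^c) alone:
        tr(K) - 2 tr(K W S) + tr(S^T W^T K W S)
    The feature map phi is never evaluated explicitly.
    """
    return (np.trace(K)
            - 2.0 * np.trace(K @ W @ S)
            + np.trace(S.T @ W.T @ K @ W @ S))

# Sanity check with a linear kernel: the kernel-space objective must
# match the explicit feature-space objective ||X - X W S||_F^2.
rng = np.random.default_rng(1)
D, N, Kdim = 20, 12, 6                 # feature dim, samples, atoms (toy sizes)
X = rng.normal(size=(D, N))
W = rng.normal(size=(N, Kdim))         # dual coefficients: dictionary B = X W
S = rng.normal(size=(Kdim, N))

gram = X.T @ X                         # kappa(X, X) for the linear kernel
explicit = np.linalg.norm(X - X @ W @ S, 'fro') ** 2
via_kernel = kernel_reconstruction_error(gram, W, S)
print(np.allclose(explicit, via_kernel))   # True
```

For a nonlinear kernel (e.g., RBF), only the construction of `gram` changes; the objective evaluation is otherwise identical, which is exactly why the Gram matrix can be pre-computed once.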

5.7.4. Comparison with state-of-the-art dictionary learning based face recognition methods

We also compare the proposed CSDL-SRC and CSDL-CRC methods with state-of-the-art dictionary learning based face recognition methods, including joint dictionary learning (JDL) [34], Fisher discrimination dictionary learning (FDDL) [29], discriminative KSVD (DKSVD) [32], label consistent KSVD (LCKSVD) [10], and dictionary learning with structure incoherence (DLSI) [18]. Table 6 shows their comparison for face recognition on the AR dataset. For completeness, we also include the results of SRC, CRC, NN, and SVM.

Fig. 13. Influence of the size of bases K per class on the Extended YaleB dataset.

Fig. 14. Comparison of the original faces and the learned dictionary from the Extended YaleB dataset with ℓ2 normalized features.
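Each atom visualized in Fig. 14 is a column of B^c = X^c W^c, i.e., an explicit linear combination of the training faces of that class, which is what makes it displayable as a face-like image. The following NumPy sketch uses made-up sizes (the figure itself shows 20 atoms per class); the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
h, w = 32, 32                          # illustrative image size
n_train, K = 30, 20                    # training faces per class, atoms per class
Xc = rng.random((h * w, n_train))      # vectorized training faces of one class
Wc = rng.normal(size=(n_train, K))     # learned dual coefficients W^c

Bc = Xc @ Wc                           # class specific dictionary, shape (h*w, K)
atoms = Bc.T.reshape(K, h, w)          # one (h, w) image per atom, ready to plot
print(atoms.shape)                     # (20, 32, 32)
```

In the actual method, `Wc` comes from the blockwise coordinate descent optimization rather than random sampling; the reshaping step for visualization is the same.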

For a fair comparison, we follow the experimental setup in Yang et al. [29], whose training set and testing set are publicly available.¹ The default number of bases is set as the number of training samples. The eigenface feature of dimension 300 is used. The cropped AR dataset consists of over 4,000 frontal images from 126 individuals. For each individual, 26 pictures were taken in two separate sessions. We chose a subset consisting of 50 male subjects and 50 female subjects in the experiment. For each subject, the 7 images with illumination and expression changes from Session 1 were used for training, and the other 7 images with the same conditions from Session 2 were used for testing. The size of the face images is 60 × 43. In Table 6, α is set to 0.01 for CSDL-SRC(power) and 0.001 for CSDL-SRC(ℓ2); β is set to 0.01 for both CSDL-CRC(power) and CSDL-CRC(ℓ2). From Table 6, our proposed CSDL algorithm achieves recognition rates comparable to the state-of-the-art dictionary learning based face recognition methods with less complexity.

Table 6
Recognition rate of competing methods on the AR dataset (%).

Methods             Recognition rate
NN                  71.3
SVM                 85.4
CRC                 90.4
SRC                 89.8
DKSVD               85.4
LCKSVD              89.7
DLSI                89.8
JDL                 91.7
FDDL                92.0
CSDL-CRC(ℓ2)        91.3
CSDL-SRC(ℓ2)        91.1
CSDL-CRC(power)     92.1
CSDL-SRC(power)     92.9

¹ http://www4.comp.polyu.edu.hk/~cslzhang/code/FDDL_IJCV.zip

6. Conclusion

In this paper, motivated by the fact that sparse representation and collaborative representation are two powerful tools in face recognition tasks and that different training samples contribute unequally to the class specific dictionary, class specific dictionary learning for both sparse representation and collaborative representation is proposed. A dual form of dictionary learning is thus proposed to explicitly interpret the relationship between the training samples and the basis vectors. Based on these, class specific dictionary learning is optimized by alternating blockwise coordinate descent and Lagrange multipliers. Experimental results show that our proposed CSDL algorithms, i.e., CSDL-SRC and CSDL-CRC, have achieved superior performance in face recognition tasks.

Acknowledgment

This paper is supported partly by the National Natural Science Foundation of China (Grant nos. 61402535, 61271407), the Natural Science Foundation for Youths of Shandong Province, China (Grant no. ZR2014FQ001), the Qingdao Science and Technology Project (no. 14-2-4-111-jch), and the Fundamental Research Funds for the Central Universities, China University of Petroleum (East China) (Grant no. 14CX02169A).

References

[1] D. Tao, L. Jin, W. Liu, X. Li, Hessian regularized support vector machines for mobile image annotation on the cloud, IEEE Trans. Multimedia 15 (4) (2013) 833–844.
[2] P.N. Belhumeur, J.P. Hespanha, D. Kriegman, Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 711–720.
[3] D.P. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, MA, 1999.
[4] A. Castrodad, G. Sapiro, Sparse modeling of human actions from motion imagery, Int. J. Comput. Vis. 100 (1) (2012) 1–15.
[5] C. Chang, C. Lin, Libsvm: a library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (3) (2011) 27.
[6] S. Gao, I.W.-H. Tsang, Y. Ma, Learning category-specific dictionary and shared dictionary for fine-grained image categorization, IEEE Trans. Image Process. 23 (2) (2014) 623–634.
[7] A. Georghiades, P. Belhumeur, D. Kriegman, From few to many: illumination cone models for face recognition under variable lighting and pose, IEEE Trans. Pattern Anal. Mach. Intell. 23 (6) (2001) 643–660.
[8] X. He, S. Yan, Y. Hu, P. Niyogi, H.-J. Zhang, Face recognition using laplacianfaces, IEEE Trans. Pattern Anal. Mach. Intell. 27 (3) (2005) 328–340.
[9] J. Ho, M. Yang, J. Lim, K. Lee, D. Kriegman, Clustering appearances of objects under varying illumination conditions, in: Proceedings of the 16th International Conference on Computer Vision and Pattern Recognition, IEEE, Madison, Wisconsin, 2003, pp. 1–11.
[10] Z. Jiang, Z. Lin, L.S. Davis, Label consistent k-svd: learning a discriminative dictionary for recognition, IEEE Trans. Pattern Anal. Mach. Intell. 35 (11) (2013) 2651–2664.
[11] H. Lee, A. Battle, R. Raina, A. Ng, Efficient sparse coding algorithms, in: Proceedings of Advances in Neural Information Processing Systems, The MIT Press, Vancouver, British Columbia, 2006, pp. 801–808.
[12] B.-D. Liu, B. Shen, X. Li, Locality sensitive dictionary learning for image classification, in: Proceedings of the 22nd International Conference on Image Processing, IEEE, Quebec, 2015, pp. 3807–3811.
[13] B.-D. Liu, B. Shen, Y.-X. Wang, Class specific dictionary learning for face recognition, in: Proceedings of the International Conference on Security, Pattern Analysis, and Cybernetics, IEEE, Wuhan, Hubei, 2014, pp. 229–234.
[14] B.-D. Liu, Y.-X. Wang, B. Shen, Y.-J. Zhang, Y.-J. Wang, Blockwise coordinate descent schemes for sparse representation, in: Proceedings of the 39th International Conference on Acoustics, Speech and Signal Processing, IEEE, Florence, 2014, pp. 5304–5308.
[15] B.-D. Liu, Y.-X. Wang, Y.-J. Zhang, Y. Zheng, Discriminant sparse coding for image classification, in: Proceedings of the 37th International Conference on Acoustics, Speech and Signal Processing, IEEE, Kyoto, 2012, pp. 2193–2196.
[16] J. Mairal, J. Ponce, G. Sapiro, A. Zisserman, F.R. Bach, Supervised dictionary learning, in: Proceedings of Advances in Neural Information Processing Systems, The MIT Press, Vancouver, British Columbia, 2009, pp. 1033–1040.
[17] A. Martinez, The AR Face Database, CVC Technical Report 24, 1998.
[18] I. Ramirez, P. Sprechmann, G. Sapiro, Classification and clustering via dictionary learning with structured incoherence and shared features, in: Proceedings of the 23rd International Conference on Computer Vision and Pattern Recognition, IEEE, San Francisco, California, 2010, pp. 3501–3508.
[19] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, Wiley-Interscience, 2001, pp. 373–378.
[20] F.S. Samaria, A.C. Harter, Parameterisation of a stochastic model for human face identification, in: Proceedings of the 2nd Workshop on Applications of Computer Vision, IEEE, Sarasota, Florida, 1994, pp. 138–142.
[21] T. Sim, S. Baker, M. Bsat, The CMU pose, illumination, and expression (PIE) database, in: Proceedings of the Fifth International Conference on Automatic Face and Gesture Recognition, IEEE, 2002, pp. 46–51.
[22] P. Sprechmann, G. Sapiro, Dictionary learning and sparse coding for unsupervised clustering, in: Proceedings of the 35th International Conference on Acoustics, Speech and Signal Processing, IEEE, Dallas, Texas, 2010, pp. 2042–2045.
[23] D. Tao, X. Li, X. Wu, S.J. Maybank, Geometric mean for subspace selection, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2) (2009) 260–274.
[24] P. Tseng, Convergence of a block coordinate descent method for nondifferentiable minimization, J. Optim. Theory Appl. 109 (3) (2001) 475–494.
[25] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cogn. Neurosci. 3 (1) (1991) 71–86.
[26] H. Wang, C. Yuan, W. Hu, C. Sun, Supervised class-specific dictionary learning for sparse modeling in action recognition, Pattern Recognit. 45 (11) (2012) 3902–3911.
[27] J. Wright, A.Y. Yang, A. Ganesh, S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2) (2009) 210–227.
[28] Y. Yan, H. Wang, D. Suter, Multi-subregion based correlation filter bank for robust face recognition, Pattern Recognit. 47 (11) (2014) 3487–3501.
[29] M. Yang, L. Zhang, X. Feng, D. Zhang, Sparse representation based Fisher discrimination dictionary learning for image classification, Int. J. Comput. Vis. 109 (3) (2014) 209–232.
[30] M. Yang, L. Zhang, J. Yang, D. Zhang, Metaface learning for sparse representation based face recognition, in: Proceedings of the 17th International Conference on Image Processing, IEEE, Hong Kong, 2010, pp. 1601–1604.
[31] L. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: which helps face recognition? in: Proceedings of the 13th International Conference on Computer Vision, IEEE, Barcelona, 2011, pp. 471–478.
[32] Q. Zhang, B. Li, Discriminative k-svd for dictionary learning in face recognition, in: Proceedings of the 23rd International Conference on Computer Vision and Pattern Recognition, IEEE, San Francisco, California, 2010, pp. 2691–2698.
[33] W. Zhao, R. Chellappa, P.J. Phillips, A. Rosenfeld, Face recognition: a literature survey, ACM Comput. Surv. 35 (4) (2003) 399–458.
[34] N. Zhou, Y. Shen, J. Peng, J. Fan, Learning inter-related visual dictionary for object recognition, in: Proceedings of the 25th International Conference on Computer Vision and Pattern Recognition, IEEE, Providence, Rhode Island, 2012, pp. 3490–3497.
[35] W. Liu, D. Tao, Multiview hessian regularization for image annotation, IEEE Trans. Image Process. 22 (7) (2013) 2676–2687.
[36] W. Liu, D. Tao, J. Cheng, Y. Tang, Multiview hessian discriminative sparse coding for image annotation, Comput. Vis. Image Underst. 118 (1) (2014) 50–60.
[37] W. Liu, Z. Zha, Y. Wang, K. Lu, D. Tao, p-Laplacian regularized sparse coding for human activity recognition, IEEE Trans. Ind. Electron. (2016).
[38] J. Yu, D. Tao, M. Wang, Adaptive hypergraph learning and its application in image classification, IEEE Trans. Image Process. 21 (7) (2012) 3262–3272.
[39] Y.-X. Wang, Y.-J. Zhang, Nonnegative matrix factorization: a comprehensive review, IEEE Trans. Knowl. Data Eng. 25 (6) (2013) 1336–1353.
[40] Y.-X. Wang, L.-Y. Gui, Y.-J. Zhang, Neighborhood preserving non-negative tensor factorization for image representation, in: Proceedings of the 37th IEEE International Conference on Acoustics, Speech and Signal Processing, Kyoto, 2012, pp. 3389–3392.
[41] Y.-X. Wang, Y.-J. Zhang, Image inpainting via weighted sparse non-negative matrix factorization, in: Proceedings of the 18th IEEE International Conference on Image Processing, Brussels, 2011, pp. 3409–3412.
[42] Y.-X. Wang, M. Hebert, Model recommendation: generating object detectors from few samples, in: Proceedings of the 28th IEEE International Conference on Computer Vision and Pattern Recognition, Boston, Massachusetts, 2015, pp. 1619–1628.
[43] Y.-X. Wang, M. Hebert, Learning by transferring from unsupervised universal sources, in: Proceedings of the 30th AAAI Conference on Artificial Intelligence, Phoenix, Arizona, 2016.
[44] C. Xu, D. Tao, C. Xu, Multi-view intact space learning, IEEE Trans. Pattern Anal. Mach. Intell. 37 (12) (2015) 2531–2544.
[45] C. Xu, D. Tao, C. Xu, Large-margin multi-view information bottleneck, IEEE Trans. Pattern Anal. Mach. Intell. 36 (8) (2014) 1559–1572.

Bao-Di Liu received the Ph.D. degree in Electronic Engineering from Tsinghua University. Currently, he is an assistant professor in the College of Information and Control Engineering, China University of Petroleum, China. His research interests include computer vision and machine learning.

Bin Shen is a Ph.D. candidate in the Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA. Before joining Purdue, he received B.S. and M.S. degrees from the Department of Electronic Engineering, Tsinghua University, Beijing, in 2007 and 2009, respectively. His research interests include image processing, machine learning and data mining.

Liangke Gui received his B.S. degree in Electronic Information Engineering from Shandong University, Jinan, China, in 2012. He is currently an M.S. student at the Language Technology Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA. His research interests include computer vision, image processing and multimodal machine learning.

Yu-Xiong Wang is a Ph.D. student in the Robotics Institute, School of Computer Science, at Carnegie Mellon University. His research interests include computer vision, image processing, and machine learning.

Xue Li received the B.S. degree in Electronic Engineering from Beijing Institute of Technology (BIT), Beijing, China, in 2011. Currently, she is a Ph.D. candidate in the Department of Electronic Engineering at Tsinghua University, Beijing, China. Her research interests include image classification, automatic image annotation and machine learning.

Fei Yan received the B.S. degree in Electronic and Information Engineering from China University of Petroleum. Currently, he is a staff member in the Lijin County Party Committee Office, China. His research interests include computer vision and machine learning.

Yan-Jiang Wang received the M.S. degree from Beijing University of Aeronautics and Astronautics, Beijing, China, in 1989 and the Ph.D. degree from Beijing Jiaotong University, Beijing, China, in 2001. Now he is a professor in the College of Information and Control Engineering, China University of Petroleum, Qingdao, China. He is also the head of the Institute of Signal and Information Processing, China University of Petroleum. His research interests include pattern recognition, computer vision, and cognitive computation.
