
Matrix Principal Component Analysis for Image Compression and Recognition

Kohei Inoue, Kenji Hara and Kiichi Urahama

Department of Visual Communication Design


Kyushu University
Fukuoka-shi, 815-8540 Japan

Abstract

Vector data are mapped into low-dimensional vector spaces with one orthonormal matrix in the principal component analysis (PCA), which has been extended to the 2DPCA, in which matrix data are mapped into low-dimensional matrix spaces with one orthonormal matrix. We present a matrix principal component analysis (MPCA) method in which matrix data are mapped into low-dimensional matrix spaces with two orthonormal matrices, which are derived from the higher-order singular value decomposition of a third-order tensor as an approximate solution of the simultaneous dimensionality reduction of matrices. The MPCA method is applied to image compression and recognition of face images, where it outperforms the PCA and the 2DPCA in compression accuracy, recognition speed and classification rates. In addition to improving recognition, the method is also useful for speeding up the dimensionality reduction of learning data. We examine this by using it as a pre-processor for the PCA, together with its extension to third-order tensors, for similarity search of images with their color histograms.

1 Introduction

This paper addresses the dimensionality reduction of data given in the form of matrices, a typical example of which is image data. A popular technique for this task is the principal component analysis (PCA), as used in the eigenface method [1]. The PCA, however, deals with data given in the form of vectors, hence its application to matrix data requires unfolding the matrices into vectors, which are then mapped into low-dimensional vector spaces with one orthonormal matrix that minimizes the errors of the reconstructed vectors from the original ones. Unfolding matrices into vectors, unfortunately, disorders the neighborhood relations among their elements; hence the PCA exploits the correlations between data vectors for compression, but not the correlations between the elements within each datum.

We should deal with data given in matrix form intactly in order to reduce the redundancies arising from correlations between elements as well as those between data. Yang et al. [2] have presented the two-dimensional PCA (2DPCA), which reduces data matrices to smaller ones by multiplying them by one orthonormal matrix that minimizes the errors of the matrices reconstructed from the original data. Its computational time for mapping data is shorter than that of the PCA. The 2DPCA, however, exploits correlations only between the columns of data matrices, hence its compression capability is insufficient.

If we extend the 2DPCA to exploit both the correlations between columns and those between rows of data matrices, we arrive at a mapping with two orthonormal matrices multiplied to the data matrices from both sides, in a form similar to the singular value decomposition. One such form of simultaneous low-rank approximation of matrices has been presented by Pesquet-Popescu et al. [3] and Shashua et al. [4]. Their algorithms, however, do not necessarily ensure orthogonality between the basis vectors of the mapping matrices, hence they cannot be used for image recognition and are restricted to image compression. An algorithm which gives orthonormal mapping matrices has been presented by Ye et al. [5]. Their algorithm is, however, an iterative solution method, hence it demands long computational time, especially for large data. They called their algorithm the generalized principal component analysis (GPCA) [5], but this term is confusing because it has also been used for other types of generalization of the PCA such as [6]. We call it Ye's-GPCA in this paper.

We present, as an alternative to Ye's-GPCA, a direct solution method for the simultaneous low-rank approximation of multiple matrices. Our method is based on an analytical formulation of the higher-order singular value decomposition for the dimensionality reduction of a single third-order tensor. Our algorithm is a direct solution method, in contrast to the iterative one by Ye et al., hence it is faster than their algorithm. We call this algorithm the matrix principal component analysis (MPCA). Though the term MPCA has also been used for the multi-way PCA, which is an extension of the PCA to tensors [7], that terminology appears rarely, hence the MPCA is less confusing than the GPCA.
We apply the MPCA to the compression and recognition of face images and show that the MPCA outperforms the 2DPCA, which in turn outperforms the PCA, in compression accuracy, recognition speed and classification rates.

In addition to its high recognition speed, the MPCA is also useful for speeding up the dimensionality reduction of learning data. We show this by using it as a pre-processor for the PCA, together with its extension to third-order tensor data, for similarity search of images with color histograms.

2 Simultaneous Low-Dimensional Approximation of Multiple Matrices

Let us be given n p × q matrices A_k (k = 1, ..., n). We consider linear dimensionality reduction methods which minimize the errors of the matrices reconstructed from the compressed ones with respect to the original data.

2.1 Principal Component Analysis

The PCA requires that each data matrix be unfolded into a vector a_k of length pq. The vectors are preliminarily centered by subtracting their mean \bar{a} = \sum_{k=1}^{n} a_k / n from them as b_k = a_k - \bar{a}. The centered vector b_k is projected to c_k = U^T b_k with a pq × r matrix U which is orthonormal: U^T U = I_r, where I_r is the r × r identity matrix. Reconstruction from this projection is given by a_k^{(rec)} = \bar{a} + U c_k = \bar{a} + U U^T b_k. Hence the projection matrix U minimizing the reconstruction error \| a_k - a_k^{(rec)} \| is given by

U_* = \arg\min_U \sum_{k=1}^{n} \| b_k - U U^T b_k \|^2    (1)

where \| \cdot \| is the Euclidean norm of a vector. Since

\sum_{k=1}^{n} \| b_k - U U^T b_k \|^2 = \sum_{k=1}^{n} \mathrm{tr}[(b_k - U U^T b_k)(b_k^T - b_k^T U U^T)] = \sum_{k=1}^{n} \mathrm{tr}(b_k b_k^T) - \sum_{k=1}^{n} \mathrm{tr}(U^T b_k b_k^T U)    (2)

where \mathrm{tr}(\cdot) is the trace of a matrix, eq.(1) is equivalent to

U_* = \arg\max_U \sum_{k=1}^{n} \mathrm{tr}(U^T b_k b_k^T U)    (3)

The solution U_* is the matrix whose columns are the first r eigenvectors u_j of the covariance matrix \sum_{k=1}^{n} b_k b_k^T, i.e. U_* = [u_1, ..., u_r].
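The eigendecomposition step above translates directly into code. The following is a minimal NumPy sketch of eqs.(1)-(3); the function and variable names are ours, and `numpy.linalg.eigh` is just one convenient way to obtain the leading eigenvectors.

```python
import numpy as np

def pca_basis(A_list, r):
    """Vector PCA of eqs.(1)-(3): unfold each p x q matrix into a length-pq
    vector, center, and keep the first r eigenvectors of sum_k b_k b_k^T."""
    a = np.stack([A.reshape(-1) for A in A_list])  # n x pq unfolded data
    a_bar = a.mean(axis=0)
    b = a - a_bar                                  # centered vectors b_k
    cov = b.T @ b                                  # sum_k b_k b_k^T  (pq x pq)
    eigvec = np.linalg.eigh(cov)[1]                # eigenvalues in ascending order
    U = eigvec[:, ::-1][:, :r]                     # U_* = [u_1, ..., u_r]
    c = b @ U                                      # projections c_k = U^T b_k (as rows)
    return a_bar, U, c
```

Reconstruction follows the text as a_k^{(rec)} = \bar{a} + U c_k, reshaped back to p × q.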
2.2 Two-Dimensional PCA

In the PCA described above, data matrices are transformed into vectors. We next treat them in their intact matrix forms. At first, the mean \bar{A} = \sum_{k=1}^{n} A_k / n is subtracted from each A_k as B_k = A_k - \bar{A}. This centered matrix is projected to C_k = B_k V with a q × r matrix V which is orthonormal: V^T V = I_r. Reconstruction from this projection becomes A_k^{(rec)} = \bar{A} + C_k V^T = \bar{A} + B_k V V^T. Hence the projection matrix V minimizing the reconstruction error \| A_k - A_k^{(rec)} \| is given by

V_* = \arg\min_V \sum_{k=1}^{n} \| B_k - B_k V V^T \|_F^2    (4)

where \| \cdot \|_F is the Frobenius norm of a matrix. Since

\sum_{k=1}^{n} \| B_k - B_k V V^T \|_F^2 = \sum_{k=1}^{n} \mathrm{tr}[(B_k - B_k V V^T)(B_k^T - V V^T B_k^T)] = \sum_{k=1}^{n} \mathrm{tr}(B_k B_k^T) - \sum_{k=1}^{n} \mathrm{tr}(V^T B_k^T B_k V),    (5)

eq.(4) is equivalent to

V_* = \arg\max_V \sum_{k=1}^{n} \mathrm{tr}(V^T B_k^T B_k V)    (6)

The solution V_* is the matrix whose columns are the first r eigenvectors v_j of the covariance matrix \sum_{k=1}^{n} B_k^T B_k, i.e. V_* = [v_1, ..., v_r].

Yang et al. at first called this method IMPCA (image PCA) [8], but later renamed it 2DPCA (two-dimensional PCA) [2].
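As a minimal sketch (our own naming, not from the paper), the 2DPCA basis of eq.(6) is obtained from the q × q column covariance matrix:

```python
import numpy as np

def two_d_pca_basis(A_list, r):
    """2DPCA of eqs.(4)-(6): first r eigenvectors of sum_k B_k^T B_k (q x q)."""
    A = np.stack(A_list)                  # n x p x q
    A_bar = A.mean(axis=0)
    B = A - A_bar                         # centered matrices B_k
    cov = np.einsum('kpi,kpj->ij', B, B)  # sum_k B_k^T B_k
    V = np.linalg.eigh(cov)[1][:, ::-1][:, :r]  # V_* = [v_1, ..., v_r]
    C = B @ V                             # projections C_k = B_k V  (n x p x r)
    return A_bar, V, C
```

Note that the matrix decomposed here is only q × q, in contrast to the pq × pq covariance of the vector PCA above.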
2.3 Simultaneous Singular Value Decomposition of Matrices

In the above 2DPCA, only the number of columns of the data matrices is reduced, while their rows are kept intact. This suggests another projection of the form C_k = U^T B_k, which reduces only the rows and leaves the columns intact. Thus the 2DPCA is partial, and this observation leads to a combination of the two projections which reduces rows and columns at once with the form U^T B_k V.

We first subtract from A_k their mean \bar{A} = \sum_{k=1}^{n} A_k / n as B_k = A_k - \bar{A}, and these centered matrices are projected to C_k = U^T B_k V with a p × r_1 matrix U and a q × r_2 matrix V, where U and V are orthonormal: U^T U = I_{r_1}, V^T V = I_{r_2}.
Reconstruction from this projection is A_k^{(rec)} = \bar{A} + U C_k V^T, hence the projection matrices U and V minimizing the reconstruction error \| A_k - A_k^{(rec)} \| are given by

\{U_*, V_*\} = \arg\min_{U,V} \sum_{k=1}^{n} \| B_k - U C_k V^T \|_F^2    (7)

This is an extension of the singular value decomposition (SVD) to multiple matrices. In the SVD, the center matrix C_k is diagonal, while in eq.(7) it is a dense matrix. In contrast to the PCA and the 2DPCA, we cannot obtain the solution of eq.(7) analytically. Ye et al. [5] presented a method for solving eq.(7) numerically with an iterative scheme, which, however, converges only to a locally optimal solution and demands long computational time, especially for large data. Ye et al. called this iterative algorithm the GPCA, which is confusing as described above, hence we refer to it as Ye's-GPCA in this paper.

3 Approximate Solution Based on Higher-Order SVD

We derive an approximate solution of eq.(7) analytically from the higher-order SVD (HOSVD) [9]. Before that, we summarize the HOSVD briefly.

3.1 Higher-Order SVD

The higher-order SVD (HOSVD) is an extension of the SVD to tensors and is defined for tensors of any order, of which the third order is relevant to eq.(7).

If we pile up the n p × q matrices B_k, we get a third-order tensor B = [b_{ijk}] (i = 1, ..., p; j = 1, ..., q; k = 1, ..., n). Before we define the HOSVD for this tensor, we define some terminology.

[Definition 1] A tensor composed of piled-up matrices can be decomposed into its constituent matrices, which are then laid end to end to form an oblong matrix. This oblong matrix, and also the operation itself, is called unfolding and is denoted by B_(1) = [B_1, ..., B_n], B_(2) = [B_1^T, ..., B_n^T], B_(3) = [H_1^T, ..., H_p^T], where H_i = [b_{ijk}] (j = 1, ..., q; k = 1, ..., n). Conversely, the operation restoring B from B_(1), B_(2), B_(3) is called folding.

[Definition 2] A tensor obtained by folding X S_(1), the product of a matrix X and the matrix S_(1) which is an unfolding of a tensor S, is denoted by S ×_1 X and is called the product of the tensor S and the matrix X. This multiplication is commutative: S ×_1 X ×_2 Y = S ×_2 Y ×_1 X.

The HOSVD is defined using this notation as follows:

[Property 1] An arbitrary third-order tensor of size p × q × n can be decomposed as B = T ×_1 X ×_2 Y ×_3 Z, where T is a third-order core tensor of size p × q × n and X, Y, Z are matrices of size p × p, q × q, n × n. This decomposition is called the HOSVD [9].

We can reduce the rank of the tensor by discarding the columns of X except for the first r_1 ones and denoting the remaining p × r_1 matrix as U, similarly denoting the first q × r_2 submatrix of Y as V and the first n × r_3 submatrix of Z as W. Then the tensor B is shrunk to \tilde{B} = S ×_1 U ×_2 V ×_3 W, where S is a third-order core tensor of size r_1 × r_2 × r_3. This \tilde{B} is called a truncated HOSVD, which we call simply the HOSVD in the sequel. This (truncated) HOSVD \tilde{B} satisfies the following property:

[Property 2] There holds the inequality \| B - \tilde{B} \|^2 \leq \sum_{i=r_1+1}^{p} \sigma_{i(1)}^2 + \sum_{j=r_2+1}^{q} \sigma_{j(2)}^2 + \sum_{k=r_3+1}^{n} \sigma_{k(3)}^2, where \sigma_{i(l)} (l = 1, 2, 3) are the mode-l singular values, i.e. the singular values of the l-th unfolding [9].

Property 2 is formally an extension of the error bound of the SVD. The SVD is, however, the minimum-error solution, while the HOSVD is not necessarily so. Nevertheless, it is practically a good approximation of the minimum-error solution and coincides with it for many well-posed tensors [9]. Thus the HOSVD is a good approximate solution of

\min_{U,V,W,S} \| B - S ×_1 U ×_2 V ×_3 W \|_F^2   subject to   U^T U = I_{r_1}, V^T V = I_{r_2}, W^T W = I_{r_3}    (8)

A practical procedure for computing the HOSVD of a tensor B is as follows (a code sketch is given below):
(1) We unfold B into the three matrices B_(1), B_(2), B_(3).
(2) We compute their first r_1, r_2, r_3 left singular vectors and array them into the matrices U, V, W.
(3) We compute the core tensor S by S = B ×_1 U^T ×_2 V^T ×_3 W^T.
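A compact NumPy sketch of this three-step procedure follows. The helper names are ours; the exact column ordering inside each unfolding does not affect the left singular vectors, so a simple `moveaxis` + `reshape` suffices here.

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` unfolding: move that axis to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(B, r1, r2, r3):
    """Truncated HOSVD of a third-order tensor B (p x q x n), steps (1)-(3):
    leading left singular vectors of the unfoldings give U, V, W, and the
    core tensor is S = B x_1 U^T x_2 V^T x_3 W^T."""
    U = np.linalg.svd(unfold(B, 0), full_matrices=False)[0][:, :r1]
    V = np.linalg.svd(unfold(B, 1), full_matrices=False)[0][:, :r2]
    W = np.linalg.svd(unfold(B, 2), full_matrices=False)[0][:, :r3]
    S = np.einsum('pqn,pa,qb,nc->abc', B, U, V, W)   # core tensor (r1 x r2 x r3)
    return S, U, V, W
```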
3.2 Matrix PCA

We can derive an analytical approximate solution of eq.(7) from the above HOSVD. Our derivation is based on the following relation:

\| B - S ×_1 U ×_2 V ×_3 W \|_F^2 = \| B - C ×_1 U ×_2 V \|_F^2 = \sum_{k=1}^{n} \| B_k - U C_k V^T \|_F^2    (9)

where C = S ×_3 W = [c_{ijk}] (i = 1, ..., r_1; j = 1, ..., r_2; k = 1, ..., n) and C_k = [c_{ijk}] (i = 1, ..., r_1; j = 1, ..., r_2).

The last term in eq.(9) coincides with the objective function of eq.(7). From this observation, we know:

[Property 3] The two matrices U and V derived from the HOSVD are approximate solutions of eq.(7).

Hence we can approximately solve eq.(7) analytically by:
(1) unfolding B into B_(1), computing its first r_1 left singular vectors and arraying them into a matrix U, and next
(2) unfolding B into B_(2), computing its first r_2 left singular vectors and arraying them into a matrix V.

This is a direct solution method, in contrast to the iterative algorithm Ye's-GPCA. We call this direct method the matrix PCA (MPCA).
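A minimal NumPy sketch of this direct method (our own code, following steps (1)-(2) above and the projection C_k = U^T B_k V of Section 2.3):

```python
import numpy as np

def mpca(A_list, r1, r2):
    """Matrix PCA: SVDs of the mode-1 and mode-2 unfoldings of the centered
    tensor give U and V; each matrix is then projected to C_k = U^T B_k V."""
    A = np.stack(A_list)                              # n x p x q
    A_bar = A.mean(axis=0)
    B = A - A_bar                                     # centered matrices B_k
    B1 = np.concatenate(list(B), axis=1)              # B_(1) = [B_1, ..., B_n]     (p x nq)
    B2 = np.concatenate([Bk.T for Bk in B], axis=1)   # B_(2) = [B_1^T, ..., B_n^T] (q x np)
    U = np.linalg.svd(B1, full_matrices=False)[0][:, :r1]
    V = np.linalg.svd(B2, full_matrices=False)[0][:, :r2]
    C = U.T @ B @ V                                   # C_k = U^T B_k V  (n x r1 x r2)
    return A_bar, U, V, C
```

Reconstruction is A_k^{(rec)} = \bar{A} + U C_k V^T, i.e. `A_bar + U @ C @ V.T` with broadcasting.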

3.3 Comparison of Speed with Ye's-GPCA

The direct solution method MPCA is expected to be faster than the iterative algorithm Ye's-GPCA. We measured the time for computing U and V for 200 monochromatic square images, i.e. p = q in this example. We set the lower dimensions as r_1 = r_2 = p. The computational times are illustrated in Fig.1, where the horizontal axis is the image size p^2. Our MPCA is faster than Ye's-GPCA. The U and V obtained by the two methods almost coincide, i.e. in this example an almost optimal solution is obtained with both Ye's-GPCA and the MPCA.

Figure 1: Time for computation of U and V.
4 Face Image Compression and Recognition

We examined the performance of the MPCA in comparison with the PCA and the 2DPCA for face image compression and recognition. We used the ORL database of faces (http://www.uk.research.att.com/facedatabase.html), which is composed of 40 persons with 10 images each, totally 400 face images with various illuminations and expressions. The image size is 92 × 75.

4.1 Compression of Face Images

The relationship between the compression rates and the reconstruction errors is plotted in Fig.2 for the PCA, the 2DPCA and the MPCA. The compression rate is 1 - (pq + pqr + nr)/npq for the PCA, 1 - (pq + qr + npr)/npq for the 2DPCA and 1 - (pq + pr_1 + qr_2 + nr_1r_2)/npq for the MPCA (with this definition, the rate is high when the data size after compression is small). At the same compression rate, the reconstruction error decreases in the order MPCA < 2DPCA < PCA, i.e. the MPCA can compress images more strongly under the same error tolerance.

Figure 2: Relation between compression rate and reconstruction error.
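The three rate formulas count the stored mean, the basis matrices and the per-image coefficients; a small helper (ours, for illustration only) makes the bookkeeping explicit:

```python
def compression_rates(p, q, n, r, r1, r2):
    """Compression rates of Sec.4.1: 1 - (stored values) / (original n*p*q values)."""
    pca  = 1 - (p*q + p*q*r + n*r) / (n*p*q)            # mean + U (pq x r) + c_k (r each)
    d2   = 1 - (p*q + q*r + n*p*r) / (n*p*q)            # mean + V (q x r) + C_k (p x r each)
    mpca = 1 - (p*q + p*r1 + q*r2 + n*r1*r2) / (n*p*q)  # mean + U, V + C_k (r1 x r2 each)
    return pca, d2, mpca

# e.g. compression_rates(92, 75, 400, 10, 10, 10) for the face-image setting
```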
We confirm this result with the reconstructed images shown in Fig.3, where the five images in the top row are reconstructed from the PCA compression with r = 5, 10, 15, 20, 25. The numbers in the parentheses are the compression rates (nearer to one for higher compression). The five images in the middle row are reconstructed from the 2DPCA with r = 5, 10, 15, 20, 25, and those in the bottom row are reconstructed from the MPCA with r_1 = r_2 = 5, 10, 15, 20, 25. As shown in Fig.3, the MPCA gives good reconstructions even at high compression rates. In the 2DPCA, as was reported in [2, 8], horizontal blur is noticeable owing to its compression along the rows only, as is seen in the left image in the middle row of Fig.3. In contrast, the MPCA compresses images along both rows and columns, hence such directional blur is not seen in its reconstructed images.

Figure 3: Reconstructed images.
Figure 4: Relation between time and rate of classification.

4.2 Face Recognition

The high compression rates achievable with the MPCA are expected to be valuable for face recognition, reducing the computational cost of classification. We examined face recognition by the MPCA using the same 400 face images as above, divided in half into 200 learning data and 200 test data. The relationship between the classification time and the recognition rate is shown in Fig.4, where the horizontal axis is the average classification time per datum and the vertical axis is the average recognition rate. The PCA is plotted with the dotted line, the broken line denotes the 2DPCA and the solid line is the MPCA. The breadth of the lines is narrow for the 2DPCA and the MPCA because their classification time is short even at their lowest compression rates. The recognition rates increase in the order PCA < 2DPCA < MPCA, while the computational time decreases in the order PCA > 2DPCA > MPCA.
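The paper does not state which classifier produces the rates in Fig.4; a nearest-neighbour rule on the compressed features C_k = U^T B_k V is the usual choice in this line of work, and the sketch below assumes it (the names and the distance measure are ours):

```python
import numpy as np

def nn_classify(C_train, labels, C_test):
    """Assumed 1-nearest-neighbour classifier in the compressed feature space.
    C_train: n_train x r1 x r2 features, C_test: n_test x r1 x r2 features."""
    preds = []
    for c in C_test:
        d = np.linalg.norm(C_train - c, axis=(1, 2))  # Frobenius distance to each training feature
        preds.append(labels[int(np.argmin(d))])
    return np.array(preds)
```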
5 Use as Preprocessor for PCA and Extension to Third-Order Tensors

We showed in the above section that the MPCA is useful for speeding up the computation in the recognition phase. It is also useful for speeding up the computation in the learning phase. In the PCA, data are unfolded into vectors whose high dimensionality demands long computational time in the eigenvector decomposition and the projection. We can reduce this computational time by preliminarily reducing the dimension of the data with the MPCA and applying the PCA to the resulting medium-dimensional data. We examine this use of the MPCA as a pre-processor for the PCA, together with its extension to third-order tensor data.

5.1 Tensor PCA

Let us be given n third-order tensors B_k (k = 1, ..., n) of size p_1 × p_2 × p_3 with elements B_k = [b_{i_1 i_2 i_3 k}] (i_1 = 1, ..., p_1; i_2 = 1, ..., p_2; i_3 = 1, ..., p_3). We project each B_k into a lower-rank third-order tensor C_k as C_k = B_k ×_1 U^T ×_2 V^T ×_3 W^T, from which B_k is approximately reconstructed as \tilde{B}_k = C_k ×_1 U ×_2 V ×_3 W, where U, V, W are p_1 × r_1, p_2 × r_2, p_3 × r_3 matrices and C_k is an r_1 × r_2 × r_3 third-order core tensor. The simultaneous low-rank approximation problem, similar to eqs.(1) and (7), is written in this case as

\min_{U,V,W,C_k} \sum_{k=1}^{n} \| B_k - C_k ×_1 U ×_2 V ×_3 W \|_F^2   subject to   U^T U = I_{r_1}, V^T V = I_{r_2}, W^T W = I_{r_3}    (10)

Similarly to eq.(7), we can get an approximate solution of eq.(10) from the HOSVD of the fourth-order tensor B = [b_{i_1 i_2 i_3 k}] (i_1 = 1, ..., p_1; i_2 = 1, ..., p_2; i_3 = 1, ..., p_3; k = 1, ..., n), whose HOSVD is an approximate solution of

\min_{U,V,W,X,S} \| B - S ×_1 U ×_2 V ×_3 W ×_4 X \|_F^2   subject to   U^T U = I_{r_1}, V^T V = I_{r_2}, W^T W = I_{r_3}, X^T X = I_{r_4}    (11)

where S is an r_1 × r_2 × r_3 × r_4 core tensor. If we set r_4 = n, then the objective function of eq.(11) is transformed as

\| B - S ×_1 U ×_2 V ×_3 W ×_4 X \|_F^2 = \| B - C ×_1 U ×_2 V ×_3 W \|_F^2 = \sum_{k=1}^{n} \| B_k - C_k ×_1 U ×_2 V ×_3 W \|_F^2    (12)

where C = S ×_4 X is a fourth-order core tensor of size r_1 × r_2 × r_3 × n. If we denote C = [c_{i_1 i_2 i_3 k}] (i_1 = 1, ..., r_1; i_2 = 1, ..., r_2; i_3 = 1, ..., r_3; k = 1, ..., n), then C_k = [c_{i_1 i_2 i_3 k}] (i_1 = 1, ..., r_1; i_2 = 1, ..., r_2; i_3 = 1, ..., r_3). The last term in eq.(12) is the same as the objective function of eq.(10). Hence we can get an approximate solution of eq.(10) by the following procedure (see the code sketch below):
(1) We pile up the B_k into a fourth-order tensor B.
(2) We unfold B into B_(1) and compute its first r_1 left singular vectors and array them into a matrix U.
(3) We unfold B into B_(2) and compute its first r_2 left singular vectors and array them into a matrix V.
(4) We unfold B into B_(3) and compute its first r_3 left singular vectors and array them into a matrix W.
(5) We compute C_k = B_k ×_1 U^T ×_2 V^T ×_3 W^T (k = 1, ..., n).

We call this algorithm the third-order tensor PCA (TPCA). The first-order TPCA is the vector PCA, i.e. the basic PCA of section 2.1, and the second-order TPCA is the matrix PCA (MPCA) of section 3.2.
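A minimal NumPy sketch of steps (1)-(5), with our own helper names (as before, the column ordering inside each unfolding does not affect the left singular vectors):

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` unfolding: move that axis to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tpca(B_list, r1, r2, r3):
    """Third-order tensor PCA (steps (1)-(5) of Sec.5.1)."""
    B = np.stack(B_list)                                             # (1) n x p1 x p2 x p3
    U = np.linalg.svd(unfold(B, 1), full_matrices=False)[0][:, :r1]  # (2)
    V = np.linalg.svd(unfold(B, 2), full_matrices=False)[0][:, :r2]  # (3)
    W = np.linalg.svd(unfold(B, 3), full_matrices=False)[0][:, :r3]  # (4)
    C = np.einsum('kabc,ai,bj,cl->kijl', B, U, V, W)  # (5) C_k = B_k x_1 U^T x_2 V^T x_3 W^T
    return U, V, W, C
```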
5.2 Image Retrieval

We examined an application of the TPCA as a pre-processor for speeding up the dimensionality reduction of database
data in an image retrieval method using the PCA [10]. We experimented with similarity search in a database of 512 color images whose RGB color histograms are represented by 8 × 8 × 8 third-order tensors. We preliminarily reduced those tensors to 4 × 4 × 4 tensors with the TPCA, which were then unfolded into 64-dimensional vectors and projected to lower-dimensional vectors with the PCA. The computational time for the dimensionality reduction of the entire data is plotted in Fig.5, where the x-axis denotes the final dimension of the data. The broken line denotes direct application of the PCA and the solid line denotes the combination of the TPCA and the PCA. The precision of retrieval with vectors reduced by TPCA+PCA was almost the same as that with the PCA alone, as shown in Fig.6. The precision was evaluated as the proportion of images retrieved with the compressed histograms that coincide with those retrieved with the original color histograms.

Figure 5: Computational time for dimensionality reduction of data.

Figure 6: Precision of retrieval of color images.
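A sketch of this two-stage pipeline (all names are ours, and the fixed sizes 8 × 8 × 8 and 4 × 4 × 4 follow the experiment described above; the histogram data themselves are not reproduced here):

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` unfolding: move that axis to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tpca_then_pca(histograms, final_dim):
    """Sec.5.2 pipeline (sketch): 8x8x8 color histograms -> 4x4x4 TPCA cores
    -> unfolded 64-dim vectors -> `final_dim`-dimensional PCA features."""
    B = np.stack(histograms)                              # n x 8 x 8 x 8
    bases = [np.linalg.svd(unfold(B, m), full_matrices=False)[0][:, :4]
             for m in (1, 2, 3)]                          # TPCA bases U, V, W
    C = np.einsum('kabc,ai,bj,cl->kijl', B, *bases)       # 4 x 4 x 4 core tensors C_k
    X = C.reshape(len(C), -1)                             # n x 64 unfolded vectors
    Xc = X - X.mean(axis=0)                               # centered vectors
    P = np.linalg.eigh(Xc.T @ Xc)[1][:, ::-1][:, :final_dim]
    return Xc @ P                                         # final PCA features
```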

6 Conclusion

We have presented a direct solution method, the MPCA, for simultaneously reducing the dimensionality of multiple matrices. We have shown that the MPCA is faster than the iterative algorithm Ye's-GPCA, and that its rates of image compression and classification are higher than those of the PCA and the 2DPCA. We extended the MPCA to third-order tensor data and used it as a pre-processor for dimensionality reduction with the PCA, applying it to similarity search of images with their color histograms.

References

[1] M. Turk and A. Pentland, "Eigenfaces for recognition," J. Cog. Neurosci., vol. 3, pp. 71-86, 1991.

[2] J. Yang, D. Zhang, A. F. Frangi and J.-Y. Yang, "Two-dimensional PCA: a new approach to appearance-based face representation and recognition," IEEE Trans. PAMI, vol. 26, no. 1, pp. 131-137, Jan. 2004.

[3] B. Pesquet-Popescu, J. Pesquet and A. P. Petropulu, "Joint singular value decomposition - a new tool for separable representation of images," Proc. ICIP, vol. 2, pp. 569-572, Greece, Oct. 2001.

[4] A. Shashua and A. Levin, "Linear image coding for regression and classification using the tensor-rank principle," Proc. CVPR, vol. 1, pp. 42-49, Hawaii, Dec. 2001.

[5] J. Ye, R. Janardan and Q. Li, "GPCA: an efficient dimension reduction scheme for image compression and retrieval," ACM Conf. KDD, 2004.

[6] R. Vidal, Y. Ma and S. Sastry, "Generalized principal component analysis (GPCA)," IEEE Conf. CVPR, 2003.

[7] A. Smilde, R. Bro and P. Geladi, "Multi-way Analysis: Applications in the Chemical Sciences," Wiley, Oct. 2004.

[8] J. Yang and J.-Y. Yang, "From image vector to matrix: a straightforward image projection technique - IMPCA vs. PCA," Patt. Recogn., vol. 35, no. 9, pp. 1997-1999, Sept. 2002.

[9] L. De Lathauwer, B. De Moor and J. Vandewalle, "A multilinear singular value decomposition," SIAM J. Matrix Anal. Appl., vol. 21, no. 4, pp. 1253-1278, Apr. 2000.

[10] R. T. Ng and A. Sedighian, "Evaluating multidimensional indexing structures for images transformed by principal component analysis," Storage and Retrieval for Image and Video Databases (SPIE), pp. 50-61, 1996.
