
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 16, NO. 3, MAY 2005

High-Speed Face Recognition Based on Discrete Cosine Transform and RBF Neural Networks

Meng Joo Er, Member, IEEE, Weilong Chen, and Shiqian Wu, Member, IEEE

Abstract—In this paper, an efficient method for high-speed face recognition based on the discrete cosine transform (DCT), the Fisher's linear discriminant (FLD) and radial basis function (RBF) neural networks is presented. First, the dimensionality of the original face image is reduced by using the DCT, and large-area illumination variations are alleviated by discarding the first few low-frequency DCT coefficients. Next, the truncated DCT coefficient vectors are clustered using the proposed clustering algorithm. This process makes the subsequent FLD more efficient. After implementing the FLD, the most discriminating and invariant facial features are maintained and the training samples are clustered well. As a consequence, further parameter estimation for the RBF neural networks is fulfilled easily, which facilitates fast training of the RBF neural networks. Simulation results show that the proposed system achieves excellent performance with high training and recognition speed, a high recognition rate, as well as very good illumination robustness.

Index Terms—Discrete cosine transform (DCT), face recognition, FERET database, Fisher's linear discriminant (FLD), illumination invariance, Olivetti Research Laboratory (ORL) database, radial basis function (RBF) neural networks, Yale database.

Manuscript received April 25, 2003; revised July 19, 2004.
W. Chen and M. J. Er are with the Computer Control Laboratory, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639758, Singapore (e-mail: emjer@ntu.edu.sg).
S. Wu is with the Department of Human Centric Media, Institute for Infocomm Research, Singapore 119613, Singapore.
Digital Object Identifier 10.1109/TNN.2005.844909

I. INTRODUCTION

HUMAN face recognition has become a very active research area in recent years, mainly due to increasing security demands and its potential commercial and law enforcement applications. Numerous approaches have been proposed for face recognition and considerable successes have been reported [1]. However, it is still a difficult task for a machine to recognize human faces accurately in real time, especially under variable circumstances such as variations in illumination, pose, facial expression, makeup, etc. The similarity of human faces and the unpredictable variations are the greatest obstacles in face recognition.

Generally speaking, research on face recognition can be grouped into two categories, namely, feature-based and holistic (also called template matching) approaches [1], [2]. Feature-based approaches are based on the shapes and geometrical relationships of individual facial features, including eyes, mouth, nose, and chin. On the other hand, holistic approaches handle the input face images globally and automatically extract important facial features based on the high-dimensional intensity values of the face images. Although feature-based schemes are more robust against rotation, scale, and illumination variations, they greatly rely on the accuracy of facial feature detection methods, and it has been argued that existing feature-based techniques are not reliable enough for extracting individual facial features. Holistic face recognition has attracted more attention since the well-known statistical method, principal component analysis (PCA) [also known as the Karhunen-Loeve transform (KLT)], was applied to face recognition [3], [4]. Another well-known approach is the Fisherfaces, in which the Fisher's linear discriminant (FLD) is employed after the PCA is used for dimensionality reduction [5]. Compared with the Eigenface (PCA) approach, the Fisherface approach is more insensitive to large variations in lighting direction and facial expression. More recently, some variants of the FLD (LDA) have been developed for face recognition, such as F-LDA [7], D-LDA [8], FD-LDA [9], and KDDA [10], etc. However, the computational requirements of these approaches are greatly related to the dimensionality of the original data and the number of training samples. When the face database becomes larger, the time for training and the memory requirement will significantly increase. Moreover, a system based on the PCA must be retrained when new classes are added. As a consequence, it is impractical to apply the PCA in systems with a large database.

The discrete cosine transform (DCT) has been employed in face recognition [11]–[14]. The DCT has several advantages over the PCA. First, the DCT is data independent. Second, the DCT can be implemented using a fast algorithm. In [11], the DCT is applied for dimensionality reduction and then the selected low-frequency DCT coefficient vectors are fed into a multilayer perceptron (MLP) classifier. It is well known that the problems arising from the curse of dimensionality should be considered in pattern recognition. It has been suggested that as the dimensionality increases, the sample size needs to increase exponentially in order to obtain an effective estimate of multivariate densities [16]. In face recognition applications, the original input data are usually of high dimension, whereas only limited training samples are available. Therefore, dimensionality reduction is a very important step which will greatly improve the performance of the face recognition system. However, if only the DCT is employed for dimensionality reduction, we cannot keep enough frequency components for important facial features while avoiding the problem of the curse of dimensionality. Besides, some variable features exist in the low-dimensional features extracted by the DCT. Hence, in our proposed system, the FLD is also employed in the DCT domain to extract the most discriminating features of face images. With the combination of the DCT and the FLD, more DCT coefficients can be kept and the most discriminating features can be extracted at high speed.

In [22], the clustering process is implemented after the FLD is employed. However, the FLD is a linear projection method and the results are globally optimal only for linearly separable data.



Fig. 1. Block diagram of high-speed face recognition based on DCT and RBF neural networks.

There are lots of nonlinear variations in human faces, such as pose and expression. Once faces with different poses are put into the same class, they will actually smear the optimal projection such that we cannot get the most discriminating features and good generalization. For this reason, we should ensure that the face images in each class do not have large nonlinear variations. In this paper, instead of regarding each individual as one class, we propose a subclustering method to split one class into several subclasses before implementing the FLD. Moreover, the subclustering process is very crucial to the structure determination of the following radial basis function (RBF) neural networks. In fact, the number of clusters is just the number of hidden neurons in the RBF neural networks.

Neural networks have been widely applied in pattern recognition for the reason that neural-networks-based classifiers can incorporate both statistical and structural information and achieve better performance than simple minimum distance classifiers [1]. Multilayered networks (MLNs), usually employing the backpropagation (BP) algorithm, are widely used in face recognition [21]. Recently, RBF neural networks have been applied in many engineering and scientific applications, including face recognition [22], [23]. The RBF neural networks possess the following salient features: 1) they are universal approximators [25]; 2) they have a simple topological structure [26]; 3) they can implement fast learning algorithms because of locally tuned neurons [27]. Based on the advantages of RBF neural networks and the efficient feature extraction method, a high-speed RBF neural networks classifier, whereby near-optimal parameters can be estimated according to the properties of the feature space instead of using the gradient descent training algorithm, is proposed in this paper. As a consequence, our system is able to achieve high training and recognition speed, which facilitates real-time applications of the proposed face recognition system. The block diagram of our proposed face recognition system is shown in Fig. 1.

The organization of the paper is as follows. In Section II, we first present the DCT-based feature extraction method together with its relevant properties against illumination variations. Then, the proposed clustering algorithm and the FLD method implemented in the DCT domain are presented. Section III describes the architecture of the RBF neural networks and the parameter estimation approach, which is based on the properties of the previously extracted feature vectors. Experimental results are presented and discussed in Section IV. Finally, conclusions are drawn in Section V.

II. FEATURE EXTRACTION

A. DCT

The DCT has been widely applied to solve numerous problems in the digital signal processing community. In particular, many data compression techniques employ the DCT, which has been found to be asymptotically equivalent to the optimal Karhunen-Loeve transform (KLT) for signal decorrelation. The formula of the DCT and its inverse transform can be found in [15].

In the JPEG image compression standard, original images are initially partitioned into rectangular nonoverlapping blocks (8×8 blocks) and then the DCT is performed independently on the subimage blocks [15]. In our proposed system, we simply apply the DCT to the entire face image. If the DCT is only applied to the subimages independently, some relationship information between subimages cannot be obtained. However, we can obtain all frequency components of a face image by applying the DCT to the entire face image. In addition, some low-frequency components are only related to the illumination variations and can be discarded.

For an $N \times N$ image, we have an $N \times N$ DCT coefficient matrix covering all the spatial frequency components of the image. The DCT coefficients with large magnitude are mainly located in the upper-left corner of the DCT matrix. Accordingly, as illustrated in Fig. 2, we scan the DCT coefficient matrix in a zig-zag manner starting from the upper-left corner and subsequently convert it to a one-dimensional (1-D) vector. Detailed discussions about image reconstruction errors using only a few significant DCT coefficients can be found in [11]. As a holistic feature extraction method, the DCT converts high-dimensional face images into low-dimensional spaces in which the more significant facial features, such as the outline of the hair and face and the position of the eyes, nose and mouth, are maintained. These facial features are more stable than the variable high-frequency facial features. As a matter of fact, the human visual system is more sensitive to variations in the low-frequency band.

Illumination variations are still an unsolved problem in face recognition, particularly for the holistic approach. For the PCA (Eigenface) method, it has been suggested that by discarding the three most significant principal components, variations due to lighting can be reduced. In [5], experimental results showed that the PCA method performs better under variable lighting conditions after removing the first three principal components. However, the first several components correspond not only to illumination variations but also to some useful information for discrimination [5]. Besides, since the PCA method is highly dependent on the training samples, different components are obtained with different training samples. Therefore, there is no guarantee that the first three principal components are definitely related to the illumination variations. In this paper, we investigate the illumination-invariant property of the DCT by discarding several of its low-frequency coefficients. It is well known that the first DCT coefficient represents the dc component of an image, which is solely related to the brightness of the image. Therefore, the image becomes DC-free (i.e., zero mean) and invariant against uniform brightness changes by simply removing the first DCT coefficient. Fig. 3 illustrates the robustness of the DC-free DCT against uniform brightness variation.

It should be noted that the two reconstructed images in Fig. 3(c) and (d) are slightly different. This is because in adjusting the original image to different levels of brightness, some intensity values of the image reach the maximum or minimum value (for an 8-b grayscale intensity image, the maximum and minimum intensity values are 255 and 0, respectively) and some information is lost. In addition, the DC-free DCT is only invariant against linear brightness variations. In other words, it is not robust against varying contrast.

Some of the low-frequency components of the DCT also account for large-area nonuniform illumination variations. Consequently, the nonuniform illumination effect can be reduced by discarding several low-frequency DCT coefficients. Face images under different lighting conditions and their corresponding reconstructed images after discarding the first three coefficients are illustrated in Fig. 4. In the proposed system, the truncated DCT works like a bandpass filter which inhibits high-frequency irrelevant information and low-frequency illumination variations.
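Concretely, the feature extraction stage amounts to a 2-D DCT of the whole image, a zig-zag scan, and truncation at both ends of the scan. The following minimal sketch illustrates this, assuming SciPy's dctn for the fast transform and a square input image; the function names and default coefficient counts are ours, chosen for illustration.

```python
import numpy as np
from scipy.fft import dctn  # fast 2-D DCT-II (SciPy >= 1.4)

def zigzag_indices(n):
    """(row, col) pairs visiting an n x n matrix in zig-zag order,
    starting from the upper-left corner as in Fig. 2."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def dct_features(image, n_coeffs=60, n_discard=3):
    """Truncated DCT feature vector of a square grayscale image.
    Dropping the first coefficient removes the dc (brightness) term;
    dropping a few more suppresses large-area illumination variations."""
    coeffs = dctn(image.astype(float), norm='ortho')
    scan = [coeffs[r, c] for r, c in zigzag_indices(image.shape[0])]
    return np.asarray(scan[n_discard:n_discard + n_coeffs])
```

The truncated vector thus realizes the bandpass behavior described above: the tail of the zig-zag scan (high frequencies) is cut off, and its head (the dc term and the lowest frequencies) is discarded.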

Fig. 2. Scheme of scanning two-dimensional (2-D) DCT coefficients to a 1-D vector.

Fig. 3. Illustration of the brightness invariance of the DC-free DCT. (a) and (b) Same images with different brightness. (c) and (d) Are, respectively, obtained from the inverse DCT transform of (a) and (b) after setting the first coefficients to the same value. Since the first coefficient represents the dc component of the image, after the inverse DCT transform, some of the image intensity values will become negative if the first coefficient is set to zero. Therefore, in order to display the images correctly after taking the inverse DCT transform, we choose a median value between the first DCT coefficients of (a) and (b).

Fig. 4. Effect of nonuniform illumination reduction of the DCT after discarding the first three coefficients. (a)–(c) are under different lighting conditions (center-light, left-light, right-light). (d)–(f) are obtained from the inverse DCT transform after setting the first coefficient to the same appropriate value, and the second and the third coefficients to zero.

Fig. 5. Clustering with category information.

B. Clustering

Several well-known clustering algorithms, such as k-means clustering and fuzzy c-means clustering, are widely used in RBF neural networks [27], [30]. However, these clustering approaches are unsupervised learning algorithms and no category information about patterns is used. On the contrary, in face recognition, the category information of the training samples is known in advance. Accordingly, we can simply cluster the training samples within each class. As depicted in Fig. 5, it is impossible for a purely unsupervised clustering algorithm to separate each class accurately.

In the proposed system, the classes are split in terms of their Euclidean distances. It is unreasonable to split the classes according to the degree of overlap because there are still great overlaps between classes after performing the DCT. Moreover, the following FLD will reduce the within-class scatter and, thus, make the sparsely distributed training samples in each subclass tighter. Hence, we only need to prevent the samples with large variations from being clustered in the same subclass. The proposed clustering algorithm is as follows (a code sketch is given at the end of this subsection).

1) Let $u$ be the total number of clusters and $c$ be the number of classes. Initially, set each class to be one cluster, i.e., $u = c$.

2) Find the two training samples $x^i_{p1}$ and $x^i_{p2}$ with the largest Euclidean distance in class $i$. These two samples are called clustering reference samples (CRSs).

3) Compute the Euclidean distances from the two samples to the samples $x_k$ in other classes, denoted as $d_{p1,k}$ and $d_{p2,k}$, $k = 1, 2, \ldots, n_o$, where $n_o$ is the number of samples not belonging to class $i$:

$d_{p1,k} = \| x^i_{p1} - x_k \|$   (1)

$d_{p2,k} = \| x^i_{p2} - x_k \|$   (2)

where $\| \cdot \|$ denotes the Euclidean norm.

4) Compute the mean values and the standard deviations of $d_{p1,k}$ and $d_{p2,k}$:

$m_{p1} = \frac{1}{n_o} \sum_{k=1}^{n_o} d_{p1,k}$   (3)

$m_{p2} = \frac{1}{n_o} \sum_{k=1}^{n_o} d_{p2,k}$   (4)

$\sigma_{p1} = \sqrt{ \frac{1}{n_o} \sum_{k=1}^{n_o} ( d_{p1,k} - m_{p1} )^2 }$   (5)

$\sigma_{p2} = \sqrt{ \frac{1}{n_o} \sum_{k=1}^{n_o} ( d_{p2,k} - m_{p2} )^2 }$   (6)

5) Define the scope radii of the two CRSs as

$r_{p1} = m_{p1} - \alpha \sigma_{p1}$   (7)

$r_{p2} = m_{p2} - \alpha \sigma_{p2}$   (8)

where $\alpha$ is a positive constant clustering factor. If the distance between the two CRSs exceeds their scope radii, then split the cluster into two clusters with CRSs $x^i_{p1}$ and $x^i_{p2}$, respectively, and set $u = u + 1$.

6) There are three scenarios for any sample $x^i_j$ in class $i$ ($x^i_j$ is not a CRS). These scenarios, as depicted in Fig. 6, will be handled as follows:
i) If only one CRS's scope comprises $x^i_j$, then $x^i_j$ will be merged with the cluster to which this CRS belongs.
ii) If more than one CRS's scope comprises $x^i_j$, then $x^i_j$ will be merged into the cluster to which the CRS with the shortest distance to $x^i_j$ belongs.
iii) If no CRS's scope comprises $x^i_j$, then $x^i_j$ is regarded as another CRS belonging to a new cluster. Set $u = u + 1$ and compute the radius according to steps 4)–5). Repeat step 6) until $u$ does not change.

7) Apply steps 2)–6) to all classes.

It follows from (7) and (8) that the radius of a CRS is chosen according to the mean distance and standard deviation from this CRS to the training samples belonging to other classes. The clustering factor $\alpha$ controls the clustering extent. The larger the value of $\alpha$, the more clusters there are. Therefore, $\alpha$ should be chosen carefully so that the FLD will work efficiently. After the clustering algorithm and the FLD are implemented, the sparsely distributed training samples cluster more tightly, which simplifies the parameter estimation of the RBF neural networks in the sequel.
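The following sketch implements one pass of steps 2)–6) for a single class, assuming the samples are given as NumPy arrays. The exact split test in step 5) was garbled in the printed text, so the condition used here (split when the CRS separation exceeds both scope radii) is our assumption, as is the single-pass simplification of the "repeat until $u$ does not change" loop.

```python
import numpy as np

def subcluster_class(X, X_other, alpha=1.0):
    """One split pass of the proposed subclustering for a single class.
    X       : (n, d) feature vectors of this class
    X_other : (m, d) feature vectors of all other classes
    Returns a list of index arrays, one per subcluster."""
    # Step 2: the two most distant samples in the class are the CRSs.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    p1, p2 = np.unravel_index(np.argmax(dists), dists.shape)

    def scope_radius(x):
        # Steps 3-5, eqs. (1)-(8): mean minus alpha times the standard
        # deviation of the distances to samples of other classes.
        d = np.linalg.norm(X_other - x, axis=1)
        return d.mean() - alpha * d.std()

    r1, r2 = scope_radius(X[p1]), scope_radius(X[p2])
    if dists[p1, p2] <= max(r1, r2):   # assumed split test: compact enough
        return [np.arange(len(X))]

    # Step 6: merge each sample into the nearest covering CRS; a sample
    # covered by no CRS seeds a new cluster (scenario iii).
    crs, radii = [p1, p2], [r1, r2]
    clusters = {p1: [p1], p2: [p2]}
    for j in range(len(X)):
        if j in crs:
            continue
        d_to_crs = [np.linalg.norm(X[j] - X[c]) for c in crs]
        inside = [i for i, (d, r) in enumerate(zip(d_to_crs, radii)) if d <= r]
        if inside:                     # scenarios i) and ii)
            best = min(inside, key=lambda i: d_to_crs[i])
            clusters[crs[best]].append(j)
        else:                          # scenario iii)
            crs.append(j)
            radii.append(scope_radius(X[j]))
            clusters[j] = [j]
    return [np.array(v) for v in clusters.values()]
```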
C. FLD

In order to obtain the most salient and invariant features of human faces, the FLD is applied in the truncated DCT domain. The FLD is one of the most popular linear projection methods for feature extraction. It is used to find a linear projection of the original vectors from a high-dimensional space to an optimal low-dimensional subspace in which the ratio of the between-class scatter and the within-class scatter is maximized. In this paper, we apply the FLD to discount variations such as illumination and expression. The details about the FLD can be found in [2]. It should be noted that we apply the FLD after clustering such that the most discriminating facial features can be effectively extracted. The discriminating feature vectors projected from the truncated DCT domain to the optimal subspace are calculated as

$y_k = W^T_{\mathrm{FLD}} x_k$   (9)

where $x_k$ are the truncated DCT coefficient vectors and $W_{\mathrm{FLD}}$ is the FLD optimal projection matrix.
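A standard way to obtain the projection matrix of (9) is to solve the eigenproblem of the between- and within-class scatter matrices (see [2]). The sketch below is a textbook construction of this step, assuming the subclass labels come from the clustering stage; the function and argument names are ours, and the pseudoinverse is used only to keep the sketch robust when the within-class scatter is near singular.

```python
import numpy as np

def fld_projection(X, labels, n_dims):
    """Fisher's linear discriminant in the truncated DCT domain.
    X      : (N, d) truncated DCT coefficient vectors
    labels : (N,) subclass labels from the clustering step
    Returns W (d, n_dims) maximizing between- over within-class scatter."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))                    # between-class scatter
    Sw = np.zeros((d, d))                    # within-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - mean_all, mc - mean_all)
        Sw += (Xc - mc).T @ (Xc - mc)
    # Top eigenvectors of Sw^-1 Sb span the optimal subspace of eq. (9).
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-evals.real)[:n_dims]
    return evecs[:, order].real

# Eq. (9): y = W.T @ x for each truncated DCT vector x.
```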
III. CLASSIFICATION USING RBF NEURAL NETWORKS

A. Structure Determination and Parameter Estimation of RBF Neural Networks

The traditional three-layer RBF neural networks are employed for classification in the proposed system. The architecture is identical to the one used in [22]. We employ the most frequently used Gaussian function as the radial basis function since it best approximates the distribution of data in each subset. In face recognition applications, the RBF neural networks are regarded as a mapping from the feature hyperspace to the classes. Therefore, the number of inputs of the RBF neural networks is determined by the dimension of the input vectors. In the proposed system, the truncated DCT vectors after implementing the FLD are fed to the input layer of the RBF neural networks. The number of outputs is equal to the class number. The hidden neurons are very crucial to the RBF neural networks; they represent the subsets of the input data. After the clustering algorithm is implemented, the FLD projects the training samples into the subspace in which the training samples are clustered more tightly. Our experimental results show that the training samples are well separated and there are no overlaps between subclasses after the FLD is performed. Consequently, in our system, the number of subclasses (i.e., the number of hidden neurons of the RBF neural networks) is determined by the previous clustering process.

Fig. 6. Illustration of one class split into three subclasses.

In the proposed system, we simplify the estimation of the RBF parameters according to the data properties instead of using supervised learning, since nonlinear supervised methods often suffer from long training times and the possibility of being trapped in local minima. Two important parameters are associated with each RBF unit, the center $c_i$ and the width $\sigma_i$. Each center should well represent its subclass because the classification is actually based on the distances between the input samples and the centers of each subclass. There are different strategies for selecting RBF centers with respect to different applications [24]. Here, as the FLD keeps the most discriminating features for each sample in each subclass, it is reasonable to choose the mean value of the training samples in every subclass as the RBF center:

$c_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x^i_j$   (10)

where $x^i_j$ is the $j$th sample in the $i$th subclass and $n_i$ is the number of training samples in the $i$th subclass.

1) Width Estimation: To our knowledge, every subclass has its own features, which leads to a different scope for each subclass. The width of an RBF unit describes the properties of a subclass because the width of a Gaussian function represents the standard deviation of the function. Besides, the width controls the amount of overlap of the Gaussian functions. If the widths are too large, there will be great overlaps between classes, so that the RBF units cannot represent the subclasses well and the output of the correct class will not be significant, which will lead to great misclassification. On the contrary, too small a width will result in rapid reduction in the value of the Gaussian function and, thus, poor generalization [28]. Accordingly, our goal is to select the width that minimizes the overlaps between different classes so as to preserve local properties as well as maximize the generalization ability of the network.

As foreshadowed earlier, the FLD makes the subclasses well separated. However, it has been indicated that the FLD method achieves the best performance on the training data, but generalizes poorly to new individuals, particularly when the training data set is small [29]. The distribution of training samples cannot well represent the new inputs. Hence, in this special case, the width of each subclass cannot be estimated merely according to the small number of training samples in each subclass. Our studies show that the distances from the centers of the RBF units to new input samples belonging to other classes are similar to the distances to the training samples in other classes. These distances can be used to estimate the widths of the RBF units since they generally imply the range of the RBF units. In [31], it has been indicated that patterns which are not consistent with data statistics (noisy patterns) should be rejected rather than used for training. Accordingly, the following method for width estimation is proposed:

$d_{ij,kl} = \| c_{ij} - c_{kl} \|, \quad k \neq i$   (11)

$d^{\mathrm{med}}_{ij} = \mathrm{median}_{k,l} ( d_{ij,kl} )$   (12)

where $c_{ij}$ is the center of the $j$th cluster belonging to the $i$th class, $c_{kl}$ is the center of the $l$th cluster belonging to the $k$th class, and $d^{\mathrm{med}}_{ij}$ is the median distance from the center $c_{ij}$ to the centers belonging to other classes. In the proposed system, since the centers of the RBF units well represent the training samples in each cluster, we estimate the width of a cluster by calculating the distances from its center to the centers belonging to other classes instead of to the individual training samples, so as to avoid excessive computational complexity. Hence, the width of the $j$th cluster of the $i$th class is estimated as

$\sigma_{ij} = d^{\mathrm{med}}_{ij} / \sqrt{ \ln ( 1 / \gamma ) }$   (13)

where $\gamma$ is a factor that controls the overlap of this cluster with other clusters belonging to different classes. Equation (13) is derived from the Gaussian function: setting the unit output $\exp( -(d^{\mathrm{med}}_{ij})^2 / \sigma^2_{ij} )$ equal to $\gamma$ and solving for $\sigma_{ij}$. It should be noted that $d^{\mathrm{med}}_{ij}$ is determined by the distances to the cluster centers belonging to other classes (not other clusters) because one class can be split into several clusters and the overlaps between clusters from the same class are allowed to be great. The median distance $d^{\mathrm{med}}_{ij}$ well measures the relative scope of the RBF units. Furthermore, by selecting a proper factor $\gamma$, suitable overlaps between different classes can be guaranteed.
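Putting (10)–(13) together, the centers and widths can be computed directly from the clustered training data, with no iterative training. A minimal sketch, assuming the subclusters and their class labels come from the clustering stage (the function and argument names are ours):

```python
import numpy as np

def rbf_centers_widths(clusters, cluster_class, gamma=0.1):
    """Estimate RBF unit parameters from the clustered training data.
    clusters      : list of (n_i, d) arrays, one per subcluster
    cluster_class : class index of each subcluster
    gamma         : overlapping factor of eq. (13), 0 < gamma < 1."""
    # Eq. (10): each center is the mean of its subcluster.
    centers = np.stack([c.mean(axis=0) for c in clusters])
    widths = np.empty(len(clusters))
    for i, ci in enumerate(centers):
        # Eqs. (11)-(12): median distance from this center to the
        # centers of clusters belonging to *other classes* only.
        other = [j for j in range(len(centers))
                 if cluster_class[j] != cluster_class[i]]
        d_med = np.median(np.linalg.norm(centers[other] - ci, axis=1))
        # Eq. (13): with phi(x) = exp(-||x - c||^2 / sigma^2), requiring
        # phi = gamma at distance d_med gives the width below.
        widths[i] = d_med / np.sqrt(np.log(1.0 / gamma))
    return centers, widths
```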
2) Weight Adjustment: In the first stage, we estimate the parameters of the RBF units by using unsupervised training methods. The second phase of training is to optimize the second-layer weights of the RBF neural networks. Since the output of the RBF neural networks is a linear model, we can apply linear supervised learning to minimize a suitable error function. The sum-of-squares error function is given by

$E = \frac{1}{2} \sum_{n=1}^{N} \sum_{k} \left( \sum_{j=1}^{u} w_{kj} \phi_j ( x^n ) - t^n_k \right)^2$   (14)

where $t^n_k$ is the target value for output unit $k$ when the $n$th training sample is fed to the network, $\phi_j ( x^n )$ is the output of the $j$th RBF unit, $u$ is the number of RBF units generated according to the clustering algorithm in Section II-B,

and $N$ is the total number of training samples. This problem can be solved by the linear least squares (LLS) paradigm [16].

Let $n_i$ and $n_o$ be the number of input and output neurons, respectively. Furthermore, let $\Phi$ be the $u \times N$ RBF unit matrix and $T$ be the $n_o \times N$ target matrix consisting of "1's" and "0's" with exactly one "1" per column that identifies the processing unit to which a given exemplar belongs. The problem of finding an optimal weight matrix $W^*$ such that the error function (14) is minimized is solved by

$W^* = T \Phi^{\dagger}$   (15)

where $\Phi^{\dagger}$ is the pseudoinverse of $\Phi$ and is given by

$\Phi^{\dagger} = \Phi^T ( \Phi \Phi^T )^{-1}$   (16)

In the proposed system, however, direct solution of (16) can lead to numerical difficulties due to the possibility of $\Phi \Phi^T$ being singular or near singular. This problem can be best solved by using the technique of singular value decomposition (SVD) [32].
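In code, the two training phases therefore reduce to a closed-form solve. A brief sketch, assuming NumPy (whose pinv is itself SVD-based, matching the remedy suggested above); the helper names are ours:

```python
import numpy as np

def rbf_outputs(X, centers, widths):
    """Hidden-layer output matrix Phi (u x N) with Gaussian units
    phi_j(x) = exp(-||x - c_j||^2 / sigma_j^2)."""
    d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / widths[:, None] ** 2)

def solve_output_weights(Phi, T):
    """Eq. (15): W* = T Phi^+. Using the SVD-based pseudoinverse avoids
    forming (Phi Phi^T)^-1 explicitly as in eq. (16)."""
    return T @ np.linalg.pinv(Phi)

def classify(x, centers, widths, W):
    """Assign x to the class whose output neuron responds most strongly."""
    phi = rbf_outputs(x[None, :], centers, widths)[:, 0]
    return int(np.argmax(W @ phi))
```

Here T is the 0/1 target matrix described above, built by placing a single "1" in each column at the row of the corresponding training sample's class.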

IV. EXPERIMENTAL RESULTS AND DISCUSSIONS

In order to evaluate the proposed face recognition system, our experiments are performed on three benchmark face databases: 1) the Olivetti Research Laboratory (ORL) database; 2) the FERET database; and 3) the Yale database. Besides, for each database we use the evaluation method most commonly applied to that database. In this way, the experimental results can be fairly compared with other face recognition approaches.

A. Testing on the ORL Database

First, our face recognition system is tested on the ORL database, which contains 400 images of 40 subjects. In the following experiments, five images per subject are randomly selected as training samples and another five as test images. Therefore, a total of 200 images are used for training and another 200 for testing, and there are no overlaps between the training and testing sets. Here, we verify our system based on the average error rate, $E_{\mathrm{ave}}$, which is defined as

$E_{\mathrm{ave}} = \frac{1}{q n_t} \sum_{i=1}^{q} n^{(i)}_{\mathrm{mis}}$   (17)

where $q$ is the number of simulation runs (the proposed system is evaluated on ten runs, $q = 10$), $n^{(i)}_{\mathrm{mis}}$ is the number of misclassifications for the $i$th run, and $n_t$ is the total number of testing samples for each run. We also denote the maximum and minimum misclassification rates over the ten runs as $E_{\max}$ and $E_{\min}$, respectively.

The dimension of the feature vectors fed into the RBF neural networks is essential for recognition. In [22], experimental results showed that the best results are achieved when the feature dimension is 25–30. If the feature dimension is too small, the feature vectors do not contain sufficient information for recognition. However, this does not mean that more information will result in a higher recognition rate. It has been indicated that if the dimension of the network input is comparable to the size of the training set, the system is liable to overfitting and results in poor generalization [17]. Moreover, the addition of some unimportant information may become noise and degrade the performance. Our experiments also showed that the best recognition rates are achieved when the feature dimension is about 30. Hence, a feature dimension of 30 is adopted in the following simulation studies.

Fig. 7. RMSE curve.

1) Parameter Selection: Two parameters, namely the clustering factor $\alpha$ and the overlapping factor $\gamma$, need to be determined. As foreshadowed in Section II-B, the subclustering process is based on the mean value and standard deviation of the distances from a CRS to the samples in other classes. Normally, we can choose $\alpha = 1$ as the initial value, since the difference between the mean distance and the standard deviation approximately implies the scope of a CRS. Nevertheless, with different databases and applications, the proper value of $\alpha$ should be adjusted accordingly. For the ORL database, the following experimental results show that the proper value of $\alpha$ lies in the range of 1.0–2.0.

Since the overlapping factor $\gamma$ is not related to $\alpha$, we can fix the value of $\alpha$ when estimating $\gamma$. The value of $\alpha$ is set to 1 for the following parameter estimation process. It follows from (13) that the factor $\gamma$ actually represents the output of the RBF unit when the distance between the input and the RBF center is equal to $d^{\mathrm{med}}$. As a result, $\gamma$ should be a small value. We can initially assume that $\gamma$ lies in a small range; a more optimal and precise $\gamma$ can then be estimated by finding the minimum value of the root mean square error (RMSE). The RMSE curves for five different training sets are depicted in Fig. 7. It is evident that there is only one minimum in each RMSE curve. We should not choose the exact minimizing value of $\gamma$, because the FLD makes the training samples in each cluster tighter in comparison with the testing samples. Accordingly, in order to obtain better generalization, we choose a slightly larger value for $\gamma$. In the following experiments, we choose the value of 0.1 for the overlapping factor $\gamma$, which is shown to be a proper value for the RBF width estimation in our system.

2) Number of DCT Coefficients: In order to determine how many DCT coefficients should be chosen, we test the recognition performance with different numbers of DCT coefficients. Simulation results are summarized in Table I. Here, the clustering factor $\alpha$ is set to 1. We can see from Table I that

TABLE I
RECOGNITION PERFORMANCE VERSUS NUMBER OF DCT COEFFICIENTS ($\alpha$ = 1, $\gamma$ = 0.1)

TABLE II
RECOGNITION PERFORMANCE VERSUS CLUSTERING FACTOR ($\gamma$ = 0.1)

TABLE III
PERFORMANCE ON 10 SIMULATIONS ($\alpha$ = 1.5, $\gamma$ = 0.1)

TABLE IV
RECOGNITION PERFORMANCE COMPARISON OF DIFFERENT APPROACHES

more DCT coefficients do not necessarily mean better recognition performance, because high-frequency components are related to unstable facial features such as expression. There will be more variable information for recognition when the number of DCT coefficients increases. According to Table I, the best performance is obtained when 50–60 DCT coefficients are used in our recognition system. In addition, Table I shows that the performance of our system is relatively stable when the number of DCT coefficients changes significantly. This is mainly due to the FLD algorithm, which discounts the irrelevant information as well as keeps the most invariant and discriminating information for recognition.

3) Effect of Clustering: As mentioned in Section II, the FLD is a linear projection paradigm and it cannot handle nonlinear variations in each class. Therefore, the proposed subclustering algorithm is applied before taking the FLD. The clustering factor $\alpha$ controls the extent of clustering as well as determines the number of RBF units. A small number of clusters may lead to great overlap between classes, so that the optimal FLD projection direction cannot be obtained. On the other hand, an increase in the number of clusters may result in poor generalization because of overfitting. Moreover, since the training samples in each class are limited, increasing the number of clusters reduces the number of training samples in each cluster, so that the FLD will work inefficiently. Table II shows one run of the recognition results with different numbers of clusters, where $E_{\mathrm{mis}}$ denotes the misclassification rate. The best performance is obtained when $\alpha$ lies in the range of 1.0–2.0. The results show that subclustering improves the performance even when the FLD is applied on small clusters. Without the clustering process, face images with large nonlinear variations will be in the same cluster. The FLD will then discount some important facial features instead of extracting them, since the FLD is a linear global projection method. Therefore, for face images with large variations such as pose, scale, etc., the subclustering is necessary before implementing the FLD. This process will be more effective if there are more training samples for each cluster.

By setting the optimal parameters in the proposed system, we obtain high recognition performance based on ten simulation studies whose results are shown in Table III.

4) Comparisons With Other Approaches: Many face recognition approaches have been evaluated on the ORL database. In order to compare the recognition performance, we choose some recent approaches tested under similar conditions for comparison. Approaches are evaluated on recognition rate, training time, and recognition time. Comparative results of different approaches are shown in Table IV. Our experiments are performed on a Pentium II 350-MHz computer, using Windows 2000 and Matlab 6.1. It is hard to compare the speed of different algorithms performed on different computing platforms. Nevertheless, according to the information on the different computing systems listed in Table IV, we can approximately compare their relative speeds. It is evident from the table that our proposed approach achieves a high recognition rate as well as high training and recognition speed.

5) Computational Complexity: In this section, in order to provide more information about the computational efficiency of the proposed system, the approximate complexity of each part is analyzed and the results are summarized in Table V.

In face recognition applications, the dimensionality of the original face image is usually considerably greater than the number of training samples. Therefore, the computational complexity mostly lies in the dimensionality reduction stage. The training and recognition speed are greatly improved because the fast DCT reduces the computational complexity from $O(N^2)$ to $O(N \log_2 N)$ for an $N \times N$ image, where $N$ is a power of 2. Moreover, the proposed parameter estimation method is much faster than the gradient descent training algorithm, which can take up to hundreds of epochs.

6) Performances With Different Numbers of Training Samples: Since the FLD is a statistical method for feature extraction, the choice of training samples will affect its performance. In [6], the authors indicate that the FLD works efficiently only

TABLE V
COMPUTATIONAL COMPLEXITY

Fig. 9. Reconstructed images after discarding several low-frequency DCT


coefficients. (a) Original image. (b) Reconstructed image after discarding the
first three DCT coefficients. (c) Reconstructed image after discarding the first
six DCT coefficients (in order to display the image, the first coefficient is
actually retained).

Fig. 8. Performances with different number of training samples (results are


based on ten runs).

when the number of training samples is large and representative


for each class. Moreover, small number of training samples will
result in poor generalization for each RBF unit. Simulation re-
sults with different numbers of training samples are shown in
Fig. 10. Examples of normalized FERET face images from four subsets. (a)
Fig. 8. Our approach is promising if more training samples are Dup 1. (b) Dup 2. (c) Fafb. (d) Fafc.
available.
7) Performances After Discarding Several DCT Coeffi- change greatly with time. In this case, discarding the first
cients: As illustrated in Section II, the DC-free DCT has the several low-frequency DCT coefficients will mainly reduce
robustness against linear brightness variations. The truncated large area illumination variations.
DCT also alleviates the effect of the large area nonuniform
illumination by discarding several low-frequency components. B. Testing on the Feret Database
However, there are no such large illumination variations in The proposed feature extraction method is also tested on the
the ORL database. Therefore, discarding the first three DCT FERET database which contains more subjects with different
coefficients will not get better performance. On the contrary, variations [37]. We employ the Colorado State University
the performance will get worse (see Table VIII). The reason (CSU) Face Identification Evaluation System to evaluate our
is that some holistic facial features, for example, the relative feature extraction method [38]. The original face images are
intensity of the hair and the face skin, will be more or less first normalized by using the preprocessing program provided
ruined since they are low-frequency components (see Fig. 9). in the CSU evaluation system. Examples of normalized images
In fact, this kind of influence is slight compared to large area are shown in Fig. 10. Four testing probe subsets with different
illumination variations. We can see from Fig. 9 that the main evaluation tasks are used (see Fig. 10 and Table VII). We
facial features such as face outline, eyes, nose and mouth are only compare our proposed feature extraction method with the
maintained well after discarding several low-frequency DCT baseline PCA method with or without the first three principal
coefficients. Furthermore, in many face recognition applica- components. To generate the cumulative match curve, the
tions, only faces without hair are used for recognition for the Euclidean distance measure is used. Here we only evaluate our
reason that the human’s hair is an unstable feature which will proposed feature extraction method but not the classifier. Since

Fig. 11. Cumulative match curves. (a) Dup1. (b) Dup2. (c) Fafb. (d) Fafc.

TABLE VI
PERFORMANCES AFTER DISCARDING SEVERAL DCT COEFFICIENTS

TABLE VII
FOUR PROBE SUBSETS AND THEIR EVALUATION TASKS

only normalized frontal face images are used in this experiment and the training samples for each class are limited, the subclustering process is skipped. The cumulative match curves for the four probe sets are, respectively, shown in Fig. 11. (For the PCA approach, 50 components are used. For the DCT+FLD approach, 70 DCT coefficients are used and the dimensionality of the feature vectors is also 50 after implementing the FLD.)
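For reference, a cumulative match curve records, for each rank $r$, the fraction of probe images whose correct gallery identity appears among the $r$ nearest gallery images. A sketch of this identification protocol under the Euclidean distance (our own illustration, not the CSU system's implementation):

```python
import numpy as np

def cumulative_match_curve(gallery, g_ids, probes, p_ids, max_rank=50):
    """cmc[r-1] = fraction of probes whose true match is within rank r.
    gallery, probes : (G, d) and (P, d) feature matrices
    g_ids, p_ids    : identity labels of gallery and probe images."""
    hits = np.zeros(max_rank)
    g_ids = np.asarray(g_ids)
    for x, pid in zip(probes, p_ids):
        d = np.linalg.norm(gallery - x, axis=1)
        ranked = g_ids[np.argsort(d)]
        first = np.nonzero(ranked == pid)[0]
        if first.size and first[0] < max_rank:
            hits[first[0]:] += 1      # a hit counts at its rank and beyond
    return hits / len(probes)
```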
From the cumulative match curves of the four different probe sets, we can see that the performance is improved by discarding the first three low-frequency DCT coefficients, because illumination variations are reduced. The histogram equalization in the preprocessing procedure can only deal with uniform illumination. However, by discarding several low-frequency DCT coefficients, both uniform and nonuniform illumination variations can be reduced. We can see from Fig. 11(d) that the performance of the PCA is greatly improved by discarding the first three components. However, on the other probe sets, the performance becomes even worse without the first three components. Because the PCA is a statistical approach which is data dependent, the first three components are not necessarily related to illumination variations; it depends on the training and testing sets. We can see from the experimental results that only the probe set Fafc, with large illumination changes, gets better performance by discarding the first three components. It should be highlighted that low-frequency DCT coefficients are mainly associated with illumination variations for face images without hair. Therefore, discarding several low-frequency coefficients will only reduce the illumination variations but not ruin facial features. It should be noted that we do not compare our feature extraction approach with all other available approaches tested

Fig. 12. Cumulative match curves (D-LDA). (a) Dup1. (b) Dup2. (c) Fafb. (d) Fafc.

on the FERET database, and we do not claim that our approach achieves the best performance on the FERET database. The main objective of this experiment is to show that the performance is indeed improved by discarding the first several low-frequency DCT coefficients. Moreover, compared with the baseline PCA approach, our approach performs better on the four probe sets and the computational cost is greatly reduced.

Recently, a variant of the FLD (LDA) called direct LDA (D-LDA) has been proposed to solve the small sample size problem instead of using the PCA+FLD scheme [8]. This method discards the null space of the between-class scatter matrix rather than the null space of the within-class scatter matrix. The D-LDA can also be employed in our system after implementing the DCT for dimensionality reduction. As illustrated in Fig. 12, our empirical study on the FERET database shows that the D-LDA does not achieve better performance than the DCT+D-LDA (the numbers in the figure legend represent the number of DCT coefficients kept). In addition, similar to the results obtained in Section IV-A, more DCT coefficients do not necessarily correspond to better performance, because there will be more irrelevant and unstable information. Moreover, the D-LDA is more computationally expensive than the DCT+D-LDA or DCT+FLD methods. It should be noted that in our experiment, the best performance of the DCT+D-LDA is obtained when the number of DCT coefficients is close to the largest number of DCT coefficients that the DCT+FLD method can keep, and the DCT+D-LDA then achieves similar results to the DCT+FLD method.

C. Testing on the Yale Database

We further test our proposed approach on the Yale database, which contains 165 frontal face images covering 15 individuals taken under 11 different conditions. Each individual has different facial expressions (happy, sad, winking, sleepy, and surprised), illumination conditions (center-light, left-light, right-light), and small occlusion (with glasses).

TABLE VIII
COMPARISON ON THE YALE DATABASE

TABLE IX
PERFORMANCE ON THE YALE DATABASE ($\gamma$ = 0.1)

Fig. 13. Examples of the centered Yale images.

Fig. 14. Examples of the closely cropped Yale images.

In order to be consistent with the experiments of [5], the centered face images of the normalized Yale face database and their closely cropped images are used in our experiment.¹ Examples of the centered Yale images and the closely cropped Yale images are shown in Figs. 13 and 14, respectively.

¹The normalized Yale face database can be obtained from http://www-white.media.mit.edu/vismod/classes/mas622-00/datasets/

For the Yale face database, our system is also tested by using the "leaving-one-out" strategy which was used in [5]. In order to fairly evaluate the different feature extraction methods, the nearest neighbor classifier was used for classification. We also skip the subclustering process, since only frontal normalized face images are used. Comparative results are shown in Table VIII.
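A sketch of this "leaving-one-out" protocol with the nearest neighbor classifier (our own illustration of the standard procedure):

```python
import numpy as np

def leave_one_out_error(features, labels):
    """Each image is held out in turn and classified by its nearest
    Euclidean neighbor among all remaining images."""
    labels = np.asarray(labels)
    errors = 0
    for i in range(len(features)):
        d = np.linalg.norm(features - features[i], axis=1)
        d[i] = np.inf                  # exclude the held-out image itself
        errors += labels[np.argmin(d)] != labels[i]
    return errors / len(features)
```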
The results of the Eigenface (PCA) and Fisherface methods in our experiment are not identical to the results shown in [5]. This is mainly due to slightly different cropping sizes and normalization. Nevertheless, the results are very close. By discarding the first three coefficients, the DCT achieves better performance because variations due to lighting are reduced. However, if the FLD is further applied to the DCT feature vectors, discarding the first three coefficients does not improve the performance significantly, because the FLD also discounts illumination variations. We can see from Table VIII that our feature extraction method achieves similar performance to the Fisherface method. However, the main advantage of our method is its low computational load, which is crucial for high-speed face recognition in large databases. If we use the proposed RBF neural networks for classification, our system achieves better performance (see Table IX).

V. CONCLUSION

This paper presents a high-speed face recognition method based on the techniques of DCT, FLD, and RBF neural networks. Facial features are first extracted by the DCT, which greatly reduces the dimensionality of the original face image as well as maintains the main facial features. Compared with the well-known PCA approach, the DCT has the advantages of data independency and fast computational speed. Besides, we have explored another property of the DCT. It turns out that by simply discarding the first DCT coefficient, the proposed system is robust against uniform brightness variations of images. Furthermore, by discarding the first few low-frequency DCT coefficients, the effect of nonuniform illumination can be alleviated. In order to obtain the most invariant and discriminating features of faces, the linear projection method FLD is further applied to the feature vectors. Before implementing the FLD, we propose a clustering algorithm to prevent training samples with large variations from being clustered in the same class. This process guarantees an optimal projection direction for the FLD as well as determines the number of hidden neurons of the RBF neural networks for classification. After the FLD is applied, there are no overlaps between classes, and the architecture and parameters of the RBF neural networks are determined according to the distribution properties of the training samples. Simulation results on three benchmark face databases show that our system achieves high training and recognition speed, as well as a high recognition rate. More importantly, it is insensitive to illumination variations.

ACKNOWLEDGMENT

The authors would like to thank the researchers who have kindly provided the public face databases such as ORL, FERET, and Yale. The authors are also grateful to the anonymous reviewers for their valuable comments.

REFERENCES

[1] R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and machine recognition of faces: A survey," Proc. IEEE, vol. 83, no. 5, pp. 705–740, May 1995.
[2] R. Brunelli and T. Poggio, "Face recognition: Features versus templates," IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 10, pp. 1042–1053, Oct. 1993.
[3] M. Kirby and L. Sirovich, "Application of the Karhunen-Loeve procedure for the characterization of human faces," IEEE Trans. Pattern Anal. Mach. Intell., vol. 12, no. 1, pp. 103–108, Jan. 1990.
[4] M. A. Turk and A. P. Pentland, "Eigenfaces for recognition," J. Cogn. Neurosci., vol. 3, pp. 71–86, 1991.
[5] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces versus Fisherfaces: Recognition using class specific linear projection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 711–720, Jul. 1997.
[6] A. Martinez and A. Kak, "PCA versus LDA," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 2, pp. 228–233, 2001.
[7] R. Lotlikar and R. Kothari, "Fractional-step dimensionality reduction," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 6, pp. 623–627, Jun. 2000.
[8] H. Yu and J. Yang, "A direct LDA algorithm for high-dimensional data—With application to face recognition," Pattern Recognit., vol. 34, pp. 2067–2070, 2001.

[9] J. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, "Face recognition using LDA-based algorithms," IEEE Trans. Neural Netw., vol. 14, no. 1, pp. 195–200, Jan. 2003.
[10] ——, "Face recognition using kernel direct discriminant analysis algorithms," IEEE Trans. Neural Netw., vol. 14, no. 1, pp. 195–200, Jan. 2003.
[11] Z. Pan, R. Adams, and H. Bolouri, "Image redundancy reduction for neural network classification using discrete cosine transforms," in Proc. IEEE-INNS-ENNS Int. Joint Conf. Neural Networks, vol. 3, Como, Italy, 2000, pp. 149–154.
[12] Z. M. Hafed and M. D. Levine, "Face recognition using the discrete cosine transform," Int. J. Comput. Vis., vol. 43, no. 3, pp. 167–188, 2001.
[13] V. V. Kohir and U. B. Desai, "Face recognition using a DCT-HMM approach," in Proc. IEEE Workshop on Applications of Computer Vision (WACV'98), Princeton, NJ, 1998, pp. 226–231.
[14] S. Eickeler, S. Müller, and G. Rigoll, "High quality face recognition in JPEG compressed images," in Proc. IEEE Int. Conf. Image Processing (ICIP), Kobe, Japan, 1999, pp. 672–676.
[15] W. Pennebaker and J. Mitchell, JPEG Still Image Data Compression Standard. New York: Van Nostrand, 1993.
[16] C. M. Bishop, Neural Networks for Pattern Recognition. New York: Oxford Univ. Press, 1995.
[17] J. L. Yuan and T. L. Fine, "Neural-network design for small training sets of high dimension," IEEE Trans. Neural Netw., vol. 9, no. 2, pp. 266–280, Mar. 1998.
[18] J. Mao and A. K. Jain, "Artificial neural networks for feature extraction and multivariate data projection," IEEE Trans. Neural Netw., vol. 6, no. 2, pp. 296–317, Mar. 1995.
[19] S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, "Face recognition: A convolutional neural-network approach," IEEE Trans. Neural Netw., vol. 8, no. 1, pp. 98–113, Jan. 1997.
[20] S.-H. Lin, S.-Y. Kung, and L.-J. Lin, "Face recognition/detection by probabilistic decision-based neural network," IEEE Trans. Neural Netw., vol. 8, no. 1, pp. 114–132, Jan. 1997.
[21] D. Valentin, H. Abdi, A. J. O'Toole, and G. W. Cottrell, "Connectionist models of face processing: A survey," Pattern Recognit., vol. 27, pp. 1209–1230, 1994.
[22] M. J. Er, S. Wu, J. Lu, and H. L. Toh, "Face recognition with radial basis function (RBF) neural networks," IEEE Trans. Neural Netw., vol. 13, no. 3, pp. 697–710, May 2002.
[23] F. Yang and M. Paindavoine, "Implementation of an RBF neural network on embedded systems: Real-time face tracking and identity verification," IEEE Trans. Neural Netw., vol. 14, no. 5, pp. 1162–1175, Sep. 2003.
[24] S. Haykin, Neural Networks, A Comprehensive Foundation. New York: Macmillan, 1994.
[25] J. Park and I. W. Sandberg, "Universal approximation using radial-basis-function networks," Neural Comput., vol. 3, pp. 246–257, 1991.
[26] S. Lee and R. M. Kil, "A Gaussian potential function network with hierarchically self-organizing learning," Neural Netw., vol. 4, pp. 207–224, 1991.
[27] J. Moody and C. J. Darken, "Fast learning in networks of locally-tuned processing units," Neural Comput., vol. 1, pp. 281–294, 1989.
[28] S. Wu and M. J. Er, "Dynamic fuzzy neural networks: A novel approach to function approximation," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 30, no. 2, pp. 358–364, 2000.
[29] G. Donato, M. S. Bartlett, J. C. Hager, P. Ekman, and T. J. Sejnowski, "Classifying facial actions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 10, pp. 974–989, 1999.
[30] W. Pedrycz, "Conditional fuzzy clustering in the design of radial basis function neural networks," IEEE Trans. Neural Netw., vol. 9, no. 4, pp. 601–612, Jul. 1998.
[31] A. G. Bors and I. Pitas, "Median radial basis function neural network," IEEE Trans. Neural Netw., vol. 7, no. 6, pp. 1351–1364, Nov. 1996.
[32] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing. Cambridge, U.K.: Cambridge Univ. Press, 1992.
[33] F. Samaria, "Face recognition using hidden Markov models," Ph.D. dissertation, Trinity College, Univ. Cambridge, Cambridge, U.K., 1994.
[34] A. S. Tolba and A. N. Abu-Rezq, "Combined classifiers for invariant face recognition," in Proc. Int. Conf. Information Intelligence and Systems, 1999, pp. 350–359.
[35] T. Tan and H. Yan, "Object recognition based on fractal neighbor distance," Signal Process., vol. 81, pp. 2105–2129, 2001.
[36] Y.-S. Ryu and S.-Y. Oh, "Simple hybrid classifier for face recognition with adaptively generated virtual data," Pattern Recognit. Lett., vol. 23, pp. 833–841, 2002.
[37] P. J. Phillips, H. Wechsler, J. Huang, and P. Rauss, "The FERET database and evaluation procedure for face recognition algorithms," Image Vis. Comput. J., vol. 16, no. 5, pp. 295–306, 1998.
[38] R. Beveridge, D. Bolme, M. Teixeira, and B. Draper, The CSU Face Identification Evaluation System User's Guide: Version 5.0. Fort Collins, CO: Colorado State Univ., May 2003.

Meng Joo Er (S'82–M'87) received the B.Eng. and M.Eng. degrees in electrical engineering from the National University of Singapore, in 1985 and 1988, respectively, and the Ph.D. degree in systems engineering from the Australian National University, Canberra, in 1992.
From 1987 to 1989, he worked as a Research and Development Engineer at Chartered Electronics Industries Pte, Ltd., Singapore, and as a Software Engineer at Telerate Research and Development Pte, Ltd., Singapore. He is currently Director of the Intelligent Systems Center, a University Research Centre cofunded by Nanyang Technological University (NTU) and Singapore Engineering Technologies, Ltd., and an Associate Professor in the School of Electrical and Electronic Engineering (EEE), NTU. His research interests include control theory and applications, robotics and automation, fuzzy logic and neural networks, artificial intelligence, biomedical engineering, parallel computing, power electronics and drives, and digital signal processor applications. He has authored a book entitled Dynamic Fuzzy Neural Networks: Architectures, Algorithms, and Applications (New York: McGraw-Hill, 2003), seven book chapters, and numerous published works in his research areas of interest.
Dr. Er was the winner of the Institution of Engineers, Singapore (IES) Prestigious Publication (Application) Award in 1996 and the IES Prestigious Publication (Theory) Award in 2001. In recognition of his outstanding research work as a promising young academic staff member, he was awarded a Commonwealth Fellowship tenable at the University of Strathclyde, U.K., in 2000. Due to his excellent performance in teaching, he received the Teacher of the Year Award for the School of EEE in 1999. In 1998, he was invited as a Guest Researcher at the Precision Instrument Development Center of the National Science Council, Taiwan. He was invited as a Panelist to the IEEE-INNS-ENNS International Conference on Neural Networks held in Como, Italy, in 2000. He served as the Editor of the IES Journal on Electronics and Computer Engineering from 1995 to 2004. Currently, he serves as an Associate Editor of the International Journal of Humanoid Robotics and an Associate Editor of Neurocomputing. He is a Member of the Editorial Board of the International Journal of Computer Research. He served as a Member of the International Scientific Committee of the International Conference on Circuits, Systems, Communications, and Computers (CSC) in 1998. He was a Member of the International Scientific Committee of a unique three-conference series on Soft Computing consisting of the 2001 World Scientific and Engineering Society (WSES) International Conference on Neural Networks and Applications, the 2001 WSES International Conference on Fuzzy Sets and Fuzzy Systems, and the 2001 WSES International Conference on Evolutionary Computation. He was invited to serve as the General Chairman of four WSES International Conferences, namely the International Conference on Robotics, Distance Learning, and Intelligent Communications Systems (RODLICS), organized by WSES and cosponsored by the IEEE Robotics and Automation Society, the International Conference on Speech, Signal, and Image Processing 2001 (SSIP'2001), the International Conference on Multimedia, Internet, Video Technologies (MIV'2001), and the International Conference on Simulation (SIM'2001), held in Malta in 2001. He also served as a Member of the International Program Committee of the International Conference on Computational Intelligence, Robotics, and Autonomous Systems (CIRAS'2001) held in Singapore in 2001. Moreover, he served as a CoChairman of the Keynotes/Workshop Sub-Committee of the Asian Control Conference, held in Singapore in 2002. He also served as the General Chair of two 2002 WSEAS International Conferences, namely the 2002 WSEAS International Conference on Electronics, Control, and Signal Processing (ECS'2002) and the 2002 WSEAS International Conference on E-ACTIVITIES, in Singapore. He became the first Singapore citizen to be invited to serve as a Member of the Board of Directors of WSEAS. He has also been very active in organizing international conferences. He was a member of the main organizing committees of the International Conference on Control, Automation, Robotics, and Vision (ICARCV) for six consecutive terms (1994, 1996, 1998, 2000, 2002, and 2004) and the Asian Conference on Computer Vision (ACCV) in 1995. He was the Co-Chairman of the Technical Programme Committee and Person-in-Charge of Invited Sessions for ICARCV'96 and ICARCV'98. He was in charge of International Liaison for ICARCV'2000 and was Technical Program Chair for ICARCV'2002. He served as the General Chair of ICARCV'2004.

Weilong Chen received the B.S. degree in automatic control engineering from Southeast University, Nanjing, China, in 1999. He is currently pursuing the Ph.D. degree in electrical and electronics engineering at Nanyang Technological University, Singapore.
His research interests are in pattern recognition, computer vision, and neural networks.

Shiqian Wu (M'02) received the B.S. and M.Eng. degrees from Huazhong University of Science and Technology, Wuhan, China, in 1985 and 1988, respectively, and the Ph.D. degree in electrical and electronic engineering from Nanyang Technological University, Singapore, in 2001.
He is currently a Research Scientist in the Department of Human Centric Media, Institute for Infocomm Research, Singapore. From 1988 to 1997, he was a Lecturer and then became an Associate Professor at the Huazhong University of Science and Technology, Wuhan, Hubei, China. From 2000 to 2002, he served as a Research Engineer and then Research Fellow in the Center for Signal Processing, Singapore. In 2002, he was an Associate Research Staff Member in the Laboratories for Information Technology, Singapore. He has published more than 30 international journal and conference papers. He is a coauthor of a book entitled Dynamic Fuzzy Neural Networks: Architectures, Algorithms and Applications (New York: McGraw-Hill, 2003). Currently, his research interests include fuzzy systems, neural networks, computer vision, pattern recognition, face detection and recognition, and infrared image analysis.
