Neurocomputing 199 (2016) 16–30


Face recognition under varying illuminations using logarithmic fractal dimension-based complete eight local directional patterns

Mohammad Reza Faraji*, Xiaojun Qi
Department of Computer Science, Utah State University, Logan, UT 84322-4205, USA

Article info

Article history: Received 8 October 2015; Received in revised form 6 January 2016; Accepted 31 January 2016; Available online 16 March 2016.

Keywords: Face recognition; Illumination variations; Complete eight local directional patterns; Logarithmic fractal dimension

Abstract

Face recognition under varying illumination is challenging. This paper proposes an effective method to produce illumination-invariant features for images with various levels of illumination. The proposed method seamlessly combines adaptive homomorphic filtering, simplified logarithmic fractal dimension, and complete eight local directional patterns to produce illumination-invariant representations. Our extensive experiments show that the proposed method outperforms two of its variant methods and nine state-of-the-art methods, and achieves the overall face recognition accuracy of 99.47%, 94.55%, 99.53%, and 86.63% on Yale B, extended Yale B, CMU-PIE, and AR face databases, respectively, when using one image per subject for training. It also outperforms the compared methods on the Honda UCSD video database using five images per subject for training and considering all necessary steps including face detection, landmark localization, face normalization, and face matching to recognize faces. Our evaluations using receiver operating characteristic (ROC) curves also verify that the proposed method has the best verification and discrimination ability compared with other peer methods.

© 2016 Elsevier B.V. All rights reserved.

1. Introduction preprocessing method is finally applied on the cropped images to


obtain illumination-insensitive face images. In this stage, if face
Face recognition with a wide range of applications in security, images contain facial expressions such as smile, anger, or scream,
forensic investigation, and law enforcement is a good compromise Gabor features-based expression normalization can be employed to
between reliability and social acceptance [1,2]. A typical face extract facial features that are robust to expression variations [11,12].
recognition system includes four main stages (shown in Fig. 1 [1]): In the final stage, a face matching process is used on the normalized
face detection, landmark localization, face normalization, and face faces to recognize the identity of the faces. This stage includes fea-
matching. In the first stage, face detectors such as the SNoW-based ture extraction and face classification. In feature extraction, each face is
(sparse network of windows) face detector [3], the Viola-Jones converted to a one-dimensional vector to be used for classification.
detector [4], and the skin-color-pixel-based Viola-Jones face detec- Each face might also be divided into a regular grid of cells to extract
tor [5] detect faces in images or video frames. In the second stage, histograms [13]. Dimension reduction methods can also be used to
respective facial landmarks are localized for each detected face. reduce the dimension of the data before a classifier is employed to
Some of recently proposed facial landmark localization techniques verify the identity of the input face image. Representative dimension
include boosted regression with Markov networks (BoRMaN) [6], reduction methods include principal component analysis (PCA) [14–
16], independent component analysis (ICA) [17,15], kernel principal
local evidence aggregation for regression-based facial point detec-
component analysis (KPCA) [18], Fisher's linear discriminant analysis
tion (LEAR) [7], discriminative response map fitting (DRMF) [8], and
(FDA) [19], kernel Fisher's linear discriminant analysis (KFDA)
Chehra v.1 (meaning “face” in hindi) [9]. In the third stage, detected
[20,21], and singular value decomposition (SVD). Finally, the face
faces are geometrically normalized and corrected by facial land-
classification stage uses the extracted features to classify and verify a
marks. First, geometric rectification is applied to normalize the pose.
new unknown face image with respect to the image database. Some
Pose normalization then uses facial landmarks to estimate the
of the popular classification methods used in face recognition sys-
frontal view for each face [10]. Face images can be automatically
tems are k nearest neighbors [22,23], neural-networks [24,25], and
cropped using facial landmarks, since landmarks exactly indicate the
the sparse representation classifier (SRC) [26,27].
locations of key facial points in the image. The illumination However, five key factors including illumination variations,
pose changes, facial expressions, age variations, and occlusion
n
Corresponding author: significantly affect the performance of face recognition systems.
E-mail address: Mohammadereza.Faraji@aggiemail.usu.edu (M.R. Faraji). Among these factors, illumination variations such as shadows,

http://dx.doi.org/10.1016/j.neucom.2016.01.094
0925-2312/© 2016 Elsevier B.V. All rights reserved.

Fig. 1. Typical framework for illumination-invariant face recognition [1].

underexposure (too dark), and overexposure (too bright) attracted minimize the difference from the same person. LTrP encodes the
much attention in the last decade [28,1]. In the image of a familiar relationship among the center pixel and its neighborhood pixels
face, changing the direction of the light source leads to shifts in the using derivatives in vertical and horizontal directions. LFD per-
location and shape of shadows, changes in highlights (i.e., making forms a log function on face images and transfers images to the
the face appear too bright or too dark), and reversal of contrast. fractal dimension domain to produce illumination-invariant face
Braje et al. [29] studied the effects of illumination in face recog- representations.
nition from the psychological perspectives. They found out that Reflectance field estimation methods estimate the face reflec-
illumination variations have enormously complex effects on the tance field, which is illumination-invariant, from a face image. They
image of an object. Specifically, varying the illumination direction usually apply the Lambertian reflectance-illumination model (i.e.,
resulted in larger image differences than did varying the identity retinex theory) [1,42] as their face imaging model. These methods
of the face. Furthermore, cast shadows introduced spurious are generally effective. However, the surface of the human face is
luminance edges that may be confused with object contours and approximately Lambertian, which makes illumination-invariant
consequently significantly hindered performance of face recogni- representation less sufficient [43]. Self-quotient image (SQI)
tion systems. As a result, there is a need for extracting facial fea- [44,45], adaptive smoothing (AdaS) [46], Gradientface [47], Weber-
tures which are invariant or insensitive to illumination variations. face [48], generalized Weberface (GWF) [49], and local Gradientface
These invariant facial features can then be used for recognizing the XOR and binary pattern (LGXBP) [50] are representative methods.
face. Various methods have been proposed to deal with illumi- SQI, which is a ratio image between a given test image and its
nation variations. These methods can be categorized into the fol- smoothed version, implicitly indicates that each preprocessed image
lowing three groups: gray-level transformation methods, gradient is illumination-invariant [51]. The AdaS method estimates the illu-
or edge extraction methods, and reflection field estimation mination by smoothing the input image using an iterative method.
methods [1,30]. The Gradientface method creates a ratio image between the y-gra-
Gray-level transformation methods redistribute the intensities dient and the x-gradient of a given image. The Weberface method,
in a face image with a linear or non-linear transformation function inspired by Weber's law and based on the Weber local descriptor
to correct the uneven illumination to some extent [1]. They are (WLD) [52], creates a ratio image between the local intensity var-
simple and computationally efficient and fast. But, they are not iation and the background. The Gradientface and Weberface meth-
effective in eliminating the effects of illumination and therefore ods are proven to be illumination-insensitive. The GWF method is a
achieve an inferior performance than the methods in the other generalized multi-scale version of the Weberface method. It also
two groups. Representative methods include histogram equaliza- considers an inner and outer ground for each pixel and assigns
tion (HE) [31] and gamma intensity correction (GIC) [32]. HE different weights for them to develop a weighted GWF (wGWF). The
adjusts and flattens the contrast of the image using the image's LGXBP method uses a histogram-based descriptor to produce
histogram. GIC maps the image Iðx; yÞ to the image Gðx; yÞ using illumination-invariant representation of a face image. It first trans-
αIðx; yÞ1=γ , where α is a gray-stretch parameter and γ is the Gamma forms a face image into the logarithm domain and obtains a pair of
coefficient. illumination-insensitive components: gradient-face orientation and
Gradient or edge extraction methods produce illumination- gradient-face magnitude. These two components are then encoded
insensitive representations by extracting gray-level gradients or by LBP and local XOR patterns to form the local descriptor and the
edges from face images [1]. Representative methods include local histogram for face recognition.
binary patterns (LBP) [33], local directional patterns (LDP) [34], In this paper, we propose an effective method which encodes
enhanced LDP (EnLDP) [35], local directional number patterns face images with various levels of illumination. To this end, we
(LDN) [36], eight local directional patterns (ELDP) [37], adaptive first filter images using the adaptive homomorphic filter to par-
homomorphic eight local directional patterns (AH-ELDP) [38], tially reduce the illumination. We then use the simplified loga-
discriminant face descriptor (DFD) [39], local tetra patterns (LTrP) rithmic fractal dimension transformation as an edge enhancer
[40], enhanced center-symmetric local binary pattern (ECS-LBP) technique to enhance facial features and remove illumination to
[41], and logarithmic fractal dimension (LFD) [30]. LBP takes P some point. We finally produce eight edge directional images
pixels in a circle of radius R around each pixel and threshold these using Kirsch compass masks and propose a complete ELDP
pixels based on the value of the central pixel, where P and R are (CELDP), which considers both directions and magnitudes of the
usually set to be 8 and 2 for face recognition applications, edge responses, to obtain illumination-invariant representations.
respectively. LDP, EnLDP, LDN, and ELDP produce eight directional The contributions of the proposed method are: (1) Using the
edge images using Kirsch compass masks and encode the direc- adaptive homomorphic filter to partially reduce the illumination
tional information to obtain noise and illumination-invariant by attenuating the low-frequency (i.e., illumination) component of
representations. AH-ELDP filters face images using the homo- each face image based on the characteristics of each face image.
morphic filter, enhances the filtered images using an interpolative (2) Transforming filtered face images to the LFD domain by skip-
function and then performs the ELDP method to produce final AH- ping every other adjacent scale (i.e., reducing the computational
ELDP images. The DFD method is an improvement of the LBP time by half) to enhance facial features such as eyes, eyebrows,
descriptor. It is involved with three-step feature extraction to nose, and mouth while keeping noise at a low level. (3) Employing
maximize the appearance difference from different persons and a gradient-based descriptor which considers the relations among

directions and magnitudes of all eight directional edge responses to represent more valuable structural information from the neighborhood, capture the distinct characteristics of each face image, and therefore achieve robustness against illumination variations and noise. (4) Achieving the highest face recognition accuracy on five face databases using the same set of predetermined parameters and a simple nearest neighbor-based classifier compared with nine recently proposed state-of-the-art methods.

The rest of this paper is organized as follows: Section 2 describes some of previous work and presents the proposed method. Section 3 presents experimental results on four publicly available image databases as well as a standard video database and evaluates the verification and discrimination ability of the proposed method using ROC curves. Finally, Section 4 draws the conclusions and presents the directions of future work.

2. Methodology

In this section, we first describe some of the related methods in more detail and then present the proposed method.

2.1. Previous work

The LDP-based approaches such as LDP, EnLDP, LDN, and ELDP operate in the gradient domain to produce illumination-invariant representations. They use directional information to produce edge responses which are insensitive to lighting variations [36]. Specifically, for each face image, eight directional edge images are computed by eight Kirsch compass masks (M0, M1, ..., M7) shown in Fig. 2. Eight masks can be produced by rotating the first Kirsch mask (M0) 45° apart in eight directions. The ith directional edge image, Ii, is computed by the convolution of the face image I(x, y) and the corresponding mask Mi. Then, each LDP-based approach performs a different operation to produce illumination-invariant images. For example, for each pixel (x, y) of the image I, LDP [34] considers an 8-bit binary code and assigns 1's to the three bits corresponding to the three prominent numbers and 0's to the other five bits. EnLDP [35] and LDN [36] consider a 6-bit binary code. The first and second three bits respectively code the positions of the top and second top positive directional numbers for EnLDP, and the positions of the top positive and top negative directional numbers for LDN. ELDP [37] considers an 8-bit binary code and assigns 1's to the corresponding positive numbers and 0's to the corresponding negative numbers. In this way, ELDP uses all the eight directional numbers and represents more valuable structural information. However, it requires applying a Gaussian filter to smooth images before performing the ELDP process. The generated codes for all the LDP-based methods are then converted to their corresponding decimal values to produce LDP, EnLDP, LDN, and ELDP images, which are illumination-invariant.

Fig. 2. Kirsch compass masks in eight directions.

The recently proposed AH-ELDP [38] method combines three steps to produce illumination-invariant images. It first applies an adaptive homomorphic filter on the original face image in the frequency domain to attenuate the low frequency signal (i.e., illuminance) and strengthen the high frequency signal (i.e., texture information) of the image. An adaptive c is used to adjust the homomorphic filter for each face image based on the slope of changes in the low frequency components of the image and reduce the illumination influence. Next, AH-ELDP adopts the idea of the interval type-2 fuzzy sets [53] to apply an interpolative enhancement function to stretch the contrast of the filtered image. Finally, similar to the ELDP method, it uses Kirsch compass masks to compute edge responses and considers the relations among eight directional numbers to obtain the AH-ELDP image.

These LDP-based approaches are similar to LBP from one perspective since the LBP method also transfers the 8-bit binary code to a decimal value. However, LDP approaches compute the eight directional edge images using Kirsch compass masks and use the directional numbers to generate the 6-bit or 8-bit code, accordingly. LBP, on the other hand, uses the center pixel of the original image in each neighborhood as the threshold and computes the sparse points to generate the 8-bit code. Therefore, LDP approaches use more edge information from the entire neighborhood, while LBP discards most of the information in the neighborhood by using few numbers of intensities in a neighborhood, which makes the LBP method sensitive to noise and limits its accuracy [38,36].

The fractal analysis (FA) as a type of texture analysis has been recently used in medical imaging [54–56], signal processing [57], and also in face recognition under varying illuminations [30]. Specifically, the recently proposed LFD method [30] performs a log function on a face image to expand dark pixels and compress bright pixels for partial illumination reduction. It then transfers the face image to the fractal dimension (FD) domain to produce the illumination-invariant representation by using the differential box-counting (DBC) algorithm, a popular computationally efficient method for dealing with large images.

2.2. The proposed method

The proposed method consists of three steps. The first step is to use the adaptive homomorphic filter to partially reduce the illumination. The second step is to apply the simplified LFD method to enhance facial features. The last step employs a complete ELDP (CELDP), which uses both directions and magnitudes of edge responses to produce the illumination-invariant representation. Fig. 3 illustrates the framework of the proposed method to produce final matching scores for each face image.

2.2.1. Adaptive homomorphic filter

In the first step, we apply the adaptive homomorphic filter as a Gaussian high-pass filter on a face image to strengthen the high-frequency signal and attenuate the low-frequency signal. A face image I can be represented based on the Lambertian-reflectance model so that I(x, y) = R(x, y)L(x, y), where I(x, y), R(x, y), and L(x, y) are respectively the pixel intensity, reflectance, and illuminance at position (x, y) of the face image [38,58]. R corresponds to the texture information such as the contrast arrangement of skin, eyebrows, eyes, and lips of a face and is therefore considered as the high frequency signal. L contains the illumination values of neighboring pixels which are similar to each other and is therefore considered as the low frequency signal [59,38]. The process of applying the filter is summarized in Fig. 4. It indicates that the logarithmic function is used to separate the R and L functions so that the operation can be performed in the frequency domain. The Fourier transform result is then multiplied with the adaptive homomorphic filter to reduce the illuminance. The filtered image

is obtained by the exponential function of the inverse Fourier transform.

The homomorphic filter H(u, v) is represented as follows:

H(u, v) = (\gamma_H - \gamma_L)\left[1 - \exp\left(-c\,D^2(u, v)/D_0^2\right)\right] + \gamma_L    (1)

where D(u, v) is the distance from (u, v) to the origin of the centered Fourier transform, D0 is the cutoff distance measured from the origin, γL < 1 and γH > 1 are the parameters of the filter, and c is a constant to control the sharpness of the slope of the filter function as it transitions between γH and γL. Since c along with D0 (i.e., D0/√c) determines the actual cutoff distance, it is considered as the most important parameter of the filter [38]. As proposed in [38], the c parameter is computed by:

c = Mag1 / Mag2    (2)

where Magi indicates the ith largest magnitude value (i.e., the strength of the frequency components) at non-zero frequency locations within a square window centering at the origin of the Fourier transform.

2.2.2. Logarithmic fractal dimension

In the second step, we perform the LFD method on top of the filtered image. The LFD method first applies a log function on the filtered image to expand the values of dark pixels and compress the values of bright pixels. Then, it transfers the obtained image to the FD domain using the DBC algorithm [30]. To this end, LFD computes a sequence of two-dimensional matrices Mat to overlay the filtered image f of size M × N at each pixel (x, y) as follows:

\mathrm{Mat}_{d-1}(x, y) = \mathrm{floor}\left(\frac{\max(W_d(x, y)) - \min(W_d(x, y))}{d} + 1\right)\left(\frac{d_{max}}{d}\right)^{2}    (3)
where d is the scaling factor with a minimum value of 2 and a maximum value of dmax that represents how much a specific structure of pixels is self-similar to its surroundings. W_d(x, y) is a d × d neighborhood block whose center is at location (x, y) of an image. When d is an odd number, the center is in the middle of the neighborhood block. When d is an even number, the center is to the left and above the middle of the neighborhood block. The max and min operations obtain the highest and lowest intensity values in the image block covered by the d × d neighborhood block centering at the location (x, y) of an image, respectively. Specifically, for each scaling factor d, the center of a sliding neighborhood block of size d × d is positioned at each pixel in an image. The maximum intensity value and the minimum intensity value of the image portion covered by the sliding neighborhood block are used to compute the new value at the corresponding pixel location following Eq. (3).

Each two-dimensional matrix Mat_{d-1} of size M × N is then converted to a one-dimensional column vector of size (M × N) × 1 and the converted (dmax − 1) column vectors are concatenated to obtain the two-dimensional matrix V, which has (M × N) rows and (dmax − 1) columns. Each row in the matrix V contains the values computed by Eq. (3) at a particular location in an image across scales from 2 to dmax. For example, v_(x,y) denotes the obtained values at location (x, y) of an image across scales from 2 to dmax. A log function is then applied on all the elements of V and the respective scaling factor d. The LFD image is finally computed as follows:
\mathrm{LFD}(x, y) = \frac{S_v(x, y)}{S}    (4)

where S and S_v are the sums of squares as follows:

S = \sum_{d=1}^{d_{max}-1} (\log(d))^2 - \frac{\left(\sum_{d=1}^{d_{max}-1} \log(d)\right)^2}{d_{max}-1}    (5)

S_v(x, y) = \sum_{d=1}^{d_{max}-1} \log(d)\,\log(v_{(x,y),d}) - \frac{\left(\sum_{d=1}^{d_{max}-1} \log(d)\right)\left(\sum_{d=1}^{d_{max}-1} \log(v_{(x,y),d})\right)}{d_{max}-1}    (6)
Fig. 3. The framework of the proposed method.

Fig. 4. Homomorphic filtering approach to partially reduce the illuminance of the face image I(x, y).
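To make this step concrete, a minimal sketch of the adaptive homomorphic filtering of Eqs. (1) and (2) is given below. The paper's implementation is in MATLAB; this sketch is in Python/NumPy, the function name and the 7 × 7 window used to pick Mag1 and Mag2 are our own choices, and log1p/expm1 are used only to avoid log(0).

import numpy as np

def adaptive_homomorphic_filter(img, gamma_h=1.1, gamma_l=0.5, d0=15, win=7):
    """Sketch of adaptive homomorphic filtering (Eqs. (1)-(2)).

    img is a 2-D grayscale face image; win is the side of the square
    window around the origin of the centered spectrum used to pick the
    two largest non-zero-frequency magnitudes (an illustrative choice).
    """
    rows, cols = img.shape
    # Log transform separates reflectance and illuminance (I = R * L).
    log_img = np.log1p(img.astype(np.float64))
    spec = np.fft.fftshift(np.fft.fft2(log_img))

    # Adaptive c = Mag1 / Mag2 (Eq. (2)), excluding the DC term.
    cy, cx = rows // 2, cols // 2
    half = win // 2
    window = np.abs(spec[cy - half:cy + half + 1, cx - half:cx + half + 1]).copy()
    window[half, half] = 0.0
    mags = np.sort(window.ravel())[::-1]
    c = mags[0] / (mags[1] + 1e-12)

    # Transfer function H(u, v) of Eq. (1), evaluated on the centered grid.
    v, u = np.ogrid[:rows, :cols]
    dist2 = (v - cy) ** 2 + (u - cx) ** 2
    h = (gamma_h - gamma_l) * (1.0 - np.exp(-c * dist2 / d0 ** 2)) + gamma_l

    # Attenuate low frequencies, go back to the spatial domain, undo the log.
    filtered = np.fft.ifft2(np.fft.ifftshift(h * spec)).real
    return np.expm1(filtered)

With γH = 1.1, γL = 0.5, and D0 = 15 (the values used later in Section 3.1), the same routine can be applied to every face image, the only per-image adaptation being the computed c.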

The original LFD method [30] sets dmax to be 10 and recommends to set dmax to be between 9 and 12 for face images. However, we observe that the structure of pixels in a face image changes slowly across scales. Therefore, skipping every other adjacent scale does not lose much information and we propose to set d = 2, 4, 6, ..., dmax. In this way, the number of overlay two-dimensional matrices Mat (i.e., the column dimension of the matrix V) decreases to a half (i.e., dmax/2) and these targeted scales are involved in the computation in Eqs. (5) and (6). Obviously, this reduces computational time to transfer an image to its FD image by a half and therefore reduces the run time of the method significantly.
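The following sketch (again Python/NumPy rather than the authors' MATLAB code) illustrates the simplified LFD transform of Eqs. (3)-(6) with the even-scale selection d = 2, 4, ..., dmax proposed above. The generic maximum/minimum filters stand in for the sliding d × d block, and using log(d) of the retained scales as the regression abscissa is our reading of how the skipped scales enter Eqs. (5) and (6).

import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def lfd_image(f, d_max=10, step=2):
    """Sketch of the (simplified) LFD transform of Eqs. (3)-(6).

    f is the homomorphically filtered face image; step=2 keeps only the
    even scales d = 2, 4, ..., d_max as proposed here, while step=1
    reproduces the original LFD scale set.
    """
    f = np.log1p(f.astype(np.float64))            # log step of the LFD method
    scales = np.arange(2, d_max + 1, step)

    # Eq. (3): one overlay matrix per retained scale, built from the local
    # max/min of a d x d sliding block (the exact centering convention for
    # even d described above is approximated by these generic filters).
    mats = []
    for d in scales:
        w_max = maximum_filter(f, size=int(d))
        w_min = minimum_filter(f, size=int(d))
        mats.append(np.floor((w_max - w_min) / d + 1.0) * (d_max / d) ** 2)

    # Eqs. (4)-(6): per-pixel least-squares slope of log(values) vs. log(d).
    log_d = np.log(scales.astype(np.float64))          # shape (n,)
    log_v = np.log(np.stack(mats, axis=-1))            # shape (H, W, n)
    n = len(scales)
    s = np.sum(log_d ** 2) - np.sum(log_d) ** 2 / n
    s_v = log_v @ log_d - np.sum(log_v, axis=-1) * np.sum(log_d) / n
    return s_v / s                                     # the LFD image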
2.2.3. Complete ELDP (CELDP)

The LFD image obtained from the adaptive homomorphic filtered image is fed into the ELDP process. To this end, we compute eight edge directional images using Kirsch compass masks to produce the ELDP image. Given a pixel (x, y) in the LFD image, if I_i(x, y) shows the ith (i = 0, 1, ..., 7) directional number computed by the ith Kirsch compass mask M_i, the respective ELDP value at the pixel (x, y) is defined as follows:

\mathrm{ELDP}(x, y) = \sum_{i=0}^{7} D(I_i(x, y)) \cdot 2^{i}    (7)

where D(z) = 1 if z > 0, and 0 otherwise.

Fig. 5 shows the process of producing an ELDP value of a pixel using its eight directional numbers. Their directions (i.e., positive and negative) are used to obtain the ELDP image, while their respective magnitudes are discarded. We notice that a directional number with a higher value is expected to contain more information about the texture changes in its respective direction. Therefore, we propose to consider and integrate the influence of the magnitudes of the directional numbers in the ELDP process. To this end, we adopt the idea of the completed LBP (CLBP) method [60], which considers magnitudes of difference between the neighborhood pixels and the central pixel as well as their corresponding signs, to design the proposed CELDP. We define CELDP as the composition of direction component (CELDPD) and magnitude component (CELDPM), which respectively are defined as follows:

\mathrm{CELDP}_D(x, y) = \mathrm{ELDP}(x, y)    (8)

\mathrm{CELDP}_M(x, y) = \sum_{i=0}^{7} M(|I_i(x, y)|, m) \cdot 2^{i}    (9)

where M(z, m) = 1 if z > m, and 0 otherwise, and m is the average of absolute values of all the eight edge directional images. Basically, CELDPD is the traditional ELDP and contains the information of the directions of edge responses. CELDPM has a similar 8-bit binary code structure. However, it thresholds the absolute value of all eight magnitude values for each pixel based on the average of absolute values of all the eight edge images (i.e., m). Fig. 6 shows the process of producing a CELDPM value of a pixel using its eight magnitudes.

Fig. 5. CELDPD (i.e., ELDP) code computation using the signs of eight directional numbers computed from eight Kirsch compass masks for a pixel.

Fig. 6. CELDPM code computation using eight magnitudes thresholded by the average of absolute values of all the eight edge images (i.e., m).

Fig. 7 illustrates the intermediate results of the proposed method together with the final CELDPD and CELDPM images for an input face image. Clearly, both CELDPD and CELDPM images contain important features which are illumination-invariant.

2.2.4. Classification

Similar to the Weberface, AH-ELDP, and LFD methods, the one nearest neighbor (1NN) method with l2 norm (1NN-l2) is used as the classifier. 1NN-l2 simply assigns an input image to its nearest neighbor image in the database. Therefore, recognition accuracy only shows the effectiveness of preprocessing methods in producing illumination-invariant features. To utilize the CELDPD and CELDPM images of each input image for the classification task, we first compute similarity distances DisD and DisM between the CELDPD and CELDPM images of the input image and their corresponding CELDPD and CELDPM preprocessed images for all images in the database, respectively. We then independently normalize all distances DisD and DisM to be in the range of [0, 1]. We denote the normalized distances as NDisD and NDisM, respectively. The final similarity measure Dis is computed by fusing NDisD and NDisM as follows:

\mathrm{Dis} = \alpha \cdot \mathrm{NDis}_D + (1 - \alpha) \cdot \mathrm{NDis}_M    (10)

where α is the weight of the direction component of the edge images at the decision level. We set α to be 0.80, as Guo et al. [60] proved that the local difference error made by using the sign component of the LBP code to reconstruct an image is only 1/4 of the respective error made by the magnitude component of the LBP code. This valid argument is used in our system to compute the final similarity measure as a proportion of similarity measures obtained from CELDPD and CELDPM components.
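For illustration, a sketch of the CELDP encoding of Eqs. (7)-(9) and the fused matching score of Eq. (10) is given below (Python/NumPy, not the authors' code). The Kirsch masks are generated by rotating M0; the threshold m is taken here as a single global average of the absolute edge responses, which is one reading of the description above (a per-pixel average over the eight responses is another); and the min-max normalization of DisD and DisM over the gallery is likewise an assumption.

import numpy as np
from scipy.ndimage import convolve

# First Kirsch compass mask; the other seven are its 45-degree rotations.
M0 = np.array([[-3, -3, 5],
               [-3,  0, 5],
               [-3, -3, 5]], dtype=np.float64)

def kirsch_masks():
    masks, m = [], M0.copy()
    for _ in range(8):
        masks.append(m)
        # Rotate the eight outer entries one position to get the next mask.
        o = [m[0, 0], m[0, 1], m[0, 2], m[1, 2], m[2, 2], m[2, 1], m[2, 0], m[1, 0]]
        o = o[-1:] + o[:-1]
        m = np.array([[o[0], o[1], o[2]],
                      [o[7], 0.0,  o[3]],
                      [o[6], o[5], o[4]]])
    return masks

def celdp(lfd_img):
    """CELDPD and CELDPM codes (Eqs. (7)-(9)) of an LFD image."""
    responses = np.stack([convolve(lfd_img, m) for m in kirsch_masks()])
    weights = (2 ** np.arange(8))[:, None, None]
    celdp_d = np.sum((responses > 0) * weights, axis=0)                # Eqs. (7)-(8)
    m_bar = np.mean(np.abs(responses))                                 # threshold m
    celdp_m = np.sum((np.abs(responses) > m_bar) * weights, axis=0)    # Eq. (9)
    return celdp_d, celdp_m

def fused_nearest_neighbor(probe, gallery, alpha=0.80):
    """1NN matching with the fused similarity measure of Eq. (10).

    probe is a (CELDPD, CELDPM) pair; gallery is a list of such pairs.
    Returns the index of the nearest gallery image.
    """
    dis_d = np.array([np.linalg.norm(probe[0] - g[0]) for g in gallery])
    dis_m = np.array([np.linalg.norm(probe[1] - g[1]) for g in gallery])
    scale = lambda x: (x - x.min()) / (x.max() - x.min() + 1e-12)      # to [0, 1]
    dis = alpha * scale(dis_d) + (1.0 - alpha) * scale(dis_m)          # Eq. (10)
    return int(np.argmin(dis))

In a full system, celdp would be applied once to every gallery image, and fused_nearest_neighbor would implement the 1NN-l2 classification used throughout Section 3.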
3. Experimental results

3.1. Experimental settings

We evaluate the proposed method by conducting experiments on four publicly available image databases with large illumination variations, namely, Yale B, extended Yale B, PIE, and AR image databases [61–63] as well as the Honda UCSD video database [64]. We use the recognition accuracy, which is computed as the percentage of the number of correctly recognized faces to the number of all probe (testing) images, as the performance measure. We also use ROC curves to illustrate the ability of each method to verify and discriminate face images.
We compare the proposed method with nine recently proposed
state-of-the-art methods such as Gradientface, Weberface, LFD,
LBP, LDP, EnLDP, LDN, ELDP, and AH-ELDP. Fig. 8 demonstrates six original images from the Yale B face database and their

Fig. 7. Illustration of the intermediate results of the proposed method.

Fig. 8. Comparison of preprocessed images obtained by different methods. (a) Original face images in S0 of the Yale B face database. Illumination-invariant images produced
by the methods of (b) Gradientface, (c) Weberface, (d) LBP, (e) LDP, (f) EnLDP, (g) LDN, (h) LFD, (i) ELDP, (j) AH-ELDP, (k) CELDPD, and (l) CELDPM.

corresponding preprocessed images generated by nine compared state-of-the-art methods and the CELDPD and CELDPM images produced by the proposed method. All of these methods are implemented in MATLAB and their parameters are set as recommended by the respective researchers. For the LBP method, we apply the uniform LBP operator with 8 pixels in a circle of radius of 2. We also use the Gaussian filter with the standard deviation (SD) of 0.3 to smooth the image for Gradientface and ELDP methods and use the Gaussian filter with the SD of 1 for Weberface method, since the above methods achieve decent accuracy for all five databases using these values. Also, similar to [38,30], the parameters of the proposed system are set to be: γH = 1.1, γL = 0.5, D0 = 15, and dmax = 10. However, when images are of size 64 × 64, we set D0 to be proportional to its size (i.e., D0 = 64 × 0.15 = 9.6).
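The fixed parameter set above can be captured in a small helper. The 0.15 scale factor is implied by D0 = 15 for 100 × 100 images and D0 = 64 × 0.15 = 9.6 for 64 × 64 images; treating it as a general rule for other sizes is our extrapolation rather than a statement made in the paper.

def default_parameters(image_side):
    """Parameter set used across all five databases, with D0 scaled to size."""
    return {
        "gamma_h": 1.1,
        "gamma_l": 0.5,
        "d_max": 10,
        "d0": 0.15 * image_side,   # 100 -> 15, 64 -> 9.6
    }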

Results on the Yale B face database: The Yale B database contains include the average accuracy together with its SD for all subsets
grayscale face images of 10 subjects under nine poses. We use including S00 , S1, S2, S3, S4, and S5 (i.e., 630 testing images in total)
frontal face images in our experiments. Each subject has 64 images in the last column of Table 1. The proposed method outperforms
(480  640 pixels). All face images are cropped and resized to the nine state-of-the-art methods with the best face recognition
100  100 pixels. These images are categorized into six subsets accuracy of 99.47% and the smallest SD of 0.004, while the second
based on the angle between the light source directions and the highest accuracy of 96.67% is obtained by the AH-ELDP method. It
central camera axis: S0 (0°, 6 images per subject), S1 (1–12°, improves the face recognition accuracy of Gradientface, Weber-
8 images per subject), S2 (13–25°, 10 images per subject), S3 (26– face, LBP, LDP, EnLDP, LDN, LFD, ELDP, and AH-ELDP by 9.18%,
50°, 12 images per subject), S4 (51–77°, 10 images per subject), and 7.52%, 10.92%, 15.69%, 15.73%, 20.72%, 4.56%, 8.85%, and 2.90%,
S5 (above 78°, 18 images per subject). S0 contains 6 images per respectively.
subject, which are shown in the first column of Fig. 8. These We also include the performance of the two variant systems,
6 images, respectively, have  35°, 20°, 0°, 20°, 45°, and 90° of namely, CELDPM (i.e., CELDP with only magnitude information)
elevation of the light source. Positive elevation implies that the and CELDPD (i.e., CELDP with only directional information), in the
light source is above the horizon, while negative elevation implies tenth and eleventh rows of Table 1. It is shown that the CELDPD
that the light source is below the horizon. Similar to experimental method achieves the second best recognition accuracy, while the
settings in [38,30], we conduct six experiments to evaluate the CELDPM method achieves the fifth best accuracy. This is expected
performance of each compared method. In the ith experiment, we and confirms the claim that the direction contains more important
use the ith image from each subject in S0 as the reference information in edge response images than magnitude [60].
(training) image and the remaining five images from each subject Results on the extended Yale B face database: The extended Yale
in S0 (50 images in total) and all the images in each of the other B database contains grayscale face images of 38 subjects under
five subsets as probe (testing) images. In each experiment, the nine poses. We use frontal face images in our experiments. Each
remaining five probe images for each subject in S0 are formed as a subject has 64 images (2414 images out of 2432 images are used
new subset S0’ with 50 images. Then, for each experiment, we since 18 images are corrupted). All face images are cropped and
compute the recognition accuracy for each of the six subsets (i.e., resized to 100  100 pixels. These frontal face images are cate-
S0', S1, S2, S3, S4, and S5). Next, the average face recognition gorized into six subsets based on the angle between the light
accuracy of each subset across all the six experiments is computed source directions and the central camera axis: S0 (0°, 228 images),
and, considered as the recognition accuracy for the respective S1 (1–12°, 301 images), S2 (13–25°, 380 images), S3 (26–50°, 449
subset. We also report the standard deviation of the recognition images), S4 (51–77°, 380 images), and S5 (above 78°, 676 images).
accuracy across all the six experiments for each subset. Finally, we S0 contains 6 images per subject, which are similar to S0 in the
compute the average and standard deviation of the face recogni- Yale B database and have  35°,  20°, 0°, 20°, 45°, and 90° of
tion accuracy across all the six subsets. Table 1 summarizes the elevation of the light source. We conduct six experiments for each
average recognition accuracy and its SD of each subset for the nine compared method using the same experimental settings with the
state-of-the-art compared methods and the proposed method. We Yale B database. Table 2 summarizes the average recognition

Table 1
Recognition accuracy (%) and corresponding SD in parentheses for Yale B face images.

Method S0′ S1 S2 S3 S4 S5 Ave.

Gradientface 83.67 (0.10) 95.62 (0.06) 90.83 (0.09) 84.86 (0.10) 91.00 (0.09) 95.56 (0.05) 91.11 (0.06)
Weberface 87.33 (0.12) 93.96 (0.09) 90.17 (0.12) 89.17 (0.06) 89.83 (0.08) 98.33 (0.01) 92.51 (0.06)
LBP 83.33 (0.08) 96.25 (0.05) 91.83 (0.09) 85.28 (0.07) 88.50 (0.07) 90.93 (0.05) 89.68 (0.04)
LDP 81.67 (0.12) 92.71 (0.10) 86.67 (0.17) 87.22 (0.03) 84.00 (0.03) 84.07 (0.08) 85.98 (0.03)
EnLDP 79.67 (0.11) 92.92 (0.10) 87.00 (0.13) 81.81 (0.10) 85.33 (0.12) 87.13 (0.09) 85.95 (0.05)
LDN 77.00 (0.13) 91.25 (0.13) 84.50 (0.16) 80.14 (0.08) 81.33 (0.11) 80.93 (0.10) 82.40 (0.05)
LFD 92.33 (0.13) 95.83 (0.10) 93.50 (0.13) 93.06 (0.10) 95.67 (0.06) 97.59 (0.03) 95.13 (0.08)
ELDP 83.33 (0.12) 95.42 (0.06) 89.83 (0.10) 84.86 (0.09) 90.50 (0.10) 95.46 (0.05) 91.38 (0.06)
AH-ELDP 93.67 (0.06) 99.58 (0.01) 96.50 (0.05) 94.31 (0.06) 96.17 (0.04) 98.15 (0.02) 96.67 (0.03)
CELDPM 92.00 (0.10) 96.04 (0.07) 93.67 (0.10) 94.58 (0.06) 94.17 (0.07) 96.85 (0.03) 95.00 (0.06)
CELDPD 99.33 (0.01) 100.00 (00) 99.67 (0.01) 98.47 (0.01) 99.50 (0.01) 99.44 (0.00) 99.36 (0.004)
CELDP 99.33 (0.01) 100.00 (00) 99.50 (0.01) 99.03 (0.01) 99.67 (0.01) 99.44 (0.00) 99.47 (0.004)
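The accuracies in Table 1 are averages over the six experiments described above (the ith image of S0 of each subject as the single reference, all remaining images as probes). A schematic of that protocol, assuming a preprocess function and a 1NN matcher are available, might look like:

import numpy as np

def yale_b_protocol(subjects, preprocess, match):
    """Schematic of the six leave-one-reference-out experiments behind Table 1.

    subjects maps a subject id to {"S0": [...], "S1": [...], ..., "S5": [...]};
    preprocess produces the illumination-invariant representation of an image;
    match(probe_feature, gallery) returns the predicted subject id (1NN with
    the l2 norm in the paper).  Returns {subset: (mean accuracy, SD)}.
    """
    names = ["S0'", "S1", "S2", "S3", "S4", "S5"]
    per_subset = {name: [] for name in names}
    for i in range(6):                       # i-th image of S0 is the reference
        gallery = {sid: preprocess(d["S0"][i]) for sid, d in subjects.items()}
        for name in names:
            correct = total = 0
            for sid, d in subjects.items():
                if name == "S0'":            # remaining five S0 images per subject
                    probes = [im for j, im in enumerate(d["S0"]) if j != i]
                else:
                    probes = d[name]
                for im in probes:
                    correct += int(match(preprocess(im), gallery) == sid)
                    total += 1
            per_subset[name].append(correct / total)
    return {name: (np.mean(acc), np.std(acc)) for name, acc in per_subset.items()}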

Table 2
Recognition accuracy (%) and corresponding SD in parentheses for extended Yale B face images.

Method S0′ S1 S2 S3 S4 S5 Ave.

Gradientface 68.25 (0.15) 86.54 (0.19) 79.69 (0.21) 73.05 (0.07) 77.59 (0.13) 82.17 (0.13) 78.76 (0.07)
Weberface 71.84 (0.17) 84.38 (0.15) 78.55 (0.21) 77.43 (0.06) 78.68 (0.12) 88.24 (0.05) 81.32 (0.09)
LBP 63.07 (0.17) 82.45 (0.18) 76.32 (0.24) 67.52 (0.07) 72.54 (0.10) 75.17 (0.11) 73.44 (0.07)
LDP 67.89 (0.18) 84.60 (0.23) 77.10 (0.25) 73.31 (0.11) 65.96 (0.06) 64.32 (0.11) 71.18 (0.08)
EnLDP 65.61 (0.19) 86.21 (0.19) 77.41 (0.21) 68.93 (0.07) 71.58 (0.13) 71.23 (0.17) 73.29 (0.08)
LDN 61.58 (0.21) 83.17 (0.24) 74.30 (0.26) 65.07 (0.09) 66.27 (0.10) 64.08 (0.16) 68.47 (0.09)
LFD 79.30 (0.23) 85.88 (0.24) 84.82 (0.26) 83.03 (0.21) 83.33 (0.17) 81.21 (0.06) 82.91 (0.17)
ELDP 70.53 (0.13) 87.76 (0.11) 81.14 (0.14) 72.94 (0.09) 78.25 (0.15) 83.70 (0.13) 79.85 (0.09)
AH-ELDP 77.46 (0.10) 88.26 (0.05) 84.82 (0.12) 81.89 (0.06) 86.32 (0.11) 89.79 (0.06) 85.77 (0.05)
CELDPM 79.30 (0.20) 86.38 (0.21) 81.32 (0.23) 82.52 (0.13) 83.33 (0.17) 84.96 (0.07) 83.38 (0.15)
CELDPD 90.96 (0.04) 93.63 (0.02) 94.34 (0.05) 92.69 (0.03) 95.26 (0.03) 93.96 (0.02) 93.71 (0.02)
CELDP 92.46 (0.06) 94.07 (0.03) 94.34 (0.06) 93.98 (0.03) 96.05 (0.03) 95.02 (0.01) 94.55 (0.02)

accuracy and its SD of each subset for the nine state-of-the-art the proposed method with different sized images. Fig. 11 shows
compared methods and the proposed method. S00 for each all eight images for a subject from the AR database. We conduct
experiment contains the images in S0 excluding the training eight experiments to evaluate the performance of the proposed
image. We also include the average accuracy and its SD for S00 , S1, method. In the ith experiment, we use the ith image from each
S2, S3, S4, and S5 (i.e., 2376 testing images in total) in the last subject as the reference image and all the other seven images as
column of Table 2. The proposed method outperforms the nine the testing images.
compared methods with the best face recognition accuracy of Table 4 summarizes the average recognition accuracy of the ten
94.55% and the smallest SD of 0.02, while the second highest compared methods for all reference images. The proposed CELDP
accuracy of 85.77% is obtained by the AH-ELDP method. It method achieves the highest average recognition accuracy of
improves the face recognition accuracy of Gradientface, Weber- 86.63%, while the second highest accuracy of 85.95% is obtained by
face, LBP, LDP, EnLDP, LDN, LFD, ELDP, and AH-ELDP by 20.05%, the AH-ELDP method. Compared with the other nine methods, the
16.27%, 28.74%, 32.83%, 29.01%, 38.09%, 14.04%, 18.41%, and 10.24%, proposed CELDP method improves the face recognition accuracy of
respectively. Gradientface, Weberface, LBP, LDP, EnLDP, LDN, LFD, ELDP, and AH-
Similar to Yale B database, we include the results for CELDPM ELDP by 19.49%, 27.47%, 14.20%, 12.51%, 1.64%, 4.44%, 1.59%, 1.26%
and CELDPD in Table 2 to confirm that the direction contains and 0.08%, respectively.
more important information in edge response images than The obtained accuracy for our two variant systems in Table 4
magnitude. indicates that CELDP improves the performance of both variations
Results on the PIE face database: The PIE database contains and outperforms all the compared methods. This also clearly
41,368 grayscale images (486  640 pixels) from 68 subjects. They verifies the effectiveness of incorporating direction and magnitude
are captured under various poses, illuminations, and expressions. of edge responses to achieve significant better results while
Frontal images from the illumination subset (C27) are used in our CELDPD and CELDPM alone yield inferior results.
experiments. This subset contains 21 images per subject. All face Results on the Honda UCSD video database: The Honda UCSD video
images are cropped and resized to 100  100 pixels. Fig. 9 shows database is a standard video database for evaluating face tracking/
all 21 images for a subject from this database and their corre- recognition algorithms [64]. Each video sequence is recorded in an
sponding CELDPD and CELDPM images. We conduct 21 experi- indoor environment at 15 frames per second, and each lasts for at
ments to evaluate the performance of the proposed methods. In least 15 s. The resolution of each video sequence is 640  480. Each
the ith experiment, we use the ith image from each subject as the individual is recorded in at least two video sequences. All the video
reference (training) image and all the other 20 images as the probe sequences contain significant 2-D (in-plane) and 3-D (out-of-plane)
(testing) images. Fig. 10 shows the recognition accuracy of differ- head rotations. In each video, the person rotates and turns his/her
ent methods under each reference set. Obviously, the proposed head in his/her own preferred order and speed. Typically in about
method outperforms the other methods for all reference images 15 s, the individual is able to provide a wide range of different poses.
except for the AH-ELDP method which achieves comparable However, some of these sequences contain difficult events such as
accuracy as the proposed method. partial occlusion, face partly leaving the field of view, and large scale
Table 3 summarizes the average recognition accuracy and its changes. The database contains two datasets corresponding to the
SD of the ten compared methods for all reference images. The two video sequences. We use training and testing subsets from the
proposed method achieves the highest average recognition first dataset, which respectively contain 20 videos for training and 39
accuracy of 99.53% and the smallest SD of 0.01, while the second videos for testing. The training videos belong to 20 human subjects,
highest accuracy of 99.45% is obtained by the AH-ELDP method. while the testing videos contain 19 individuals of these subjects. We
Compared with the other nine methods, the proposed method exclude the subject that is not in the testing set in our experiments.
improves the face recognition accuracy of Gradientface, In this research, the Viola–Jones face detector [4] is used to
Weberface, LBP, LDP, EnLDP, LDN, LFD, ELDP, and AH-ELDP by detect faces in each frame. It is a robust real-time face detector to
2.46%, 5.10%, 4.36%, 19.31%, 6.10%, 10.05%, 1.71%, 0.60% and 0.08%, detect faces in each frame. It is a robust real-time face detector to
respectively. It clearly shows the effectiveness of the proposed the most recently proposed facial landmark localization technique,
method. Chehra v.1 [9], to localize the landmarks in each face. Chehra is fully
The obtained accuracy for our two variant systems (i.e., the automatic real-time face and eyes landmark detection and track-
CELDPM and CELDPD) in Table 3 indicates that CELDP improves the ing software capable of handling faces under uncontrolled natural
performance of both variations and outperforms all the compared settings. It determines 49 landmarks for each face. Fig. 12 shows a
methods. In addition, CELDPD and CELDPM achieve the third and video frame from the Honda database together with the face
fourth accuracy, respectively. detection and landmark localization results, where the detected
Results on the AR face database: The AR database, collected at face is positioned in the green bounding box and its 49 landmarks
Ohio State University, contains over 4000 color images corre- are marked by “n”.
sponding to 126 people's faces (70 men and 56 women). They Illumination-invariant methods require the face images to be in
are frontal view faces with different facial expressions, illumi- the frontal view and the same size. Therefore, a series of pre-
nation conditions, and occlusions (sun glasses and scarf) [63]. processing steps should be applied. Since the bounding box for a
Each person participated in two sessions, separated by a two- detected face contains lots of non-facial features, especially the
week (14 days) time period. The same pictures were taken in background, each face needs to be cropped. If all faces are in the
both sessions. We use the cropped version of the database [65], frontal view similar to the frontal view face shown in Fig. 12, we
which contains 100 subjects (50 men and 50 women) of the size only need to crop and resize all faces. We can use the landmarks,
of 165  120 pixels. However, we still crop the faces to be similar i.e., the location of eyebrows, left and right corners of the left and
to the other databases that we used. We only use images with right eyes (i.e., external corners), and the lower part of mouth, to
illumination conditions and neutral expressions. They include automatically crop the face. Fig. 13 shows the cropped version of
two images with neutral expressions, two images with light the detected face in Fig. 12.
from the left, two images with light from the right, and two However, in the Honda UCSD video database, most of the
images with light from both sides for each subject. Since the faces are in non-frontal views. Fig. 14(a) shows a typical detec-
original size is small, all images are resized to 64  64 pixels. ted face in the non-frontal view and its landmarks. More
This also helps us to evaluate the robustness of parameters in operations should be applied to normalize the pose and

Fig. 9. Illustration of sample images of the CMU-PIE database and their CELDP images. (a) 21 samples for a subject. (b) Corresponding CELDPD images. (c) Corresponding
CELDPM images.

estimate the frontal view so the processed face can be used in compute the roll value for each detected face. Roll is defined as
the proposed system as well as other compared systems. Here, the angle between the horizontal line and the line connecting
we use landmarks for different tasks. To this end, we first the external corners of the eyes. It is shown in Fig. 14(b). We

then determine whether the face is in the frontal/right/left view. is greater than y2, the face is in the right view. Otherwise, it is in
As shown in Fig. 14(c), y1 is computed as the distance from the the left view.
nose tip to the external corner of the right eye, and y2 is com- Next, we rotate the face image using the computed roll value.
puted as the distance from the nose tip to the external corner of We also need to rotate the landmarks to their new locations.
the left eye. If y1 is equal to y2, the face is in the frontal view. If y1 Fig. 14(d) shows the result after rotating the face and its respective
landmarks. In the proposed system, we assume the face is in the
left view before we estimate its frontal view. Therefore, if the face
is in the right view (i.e., y1 4 y2 ), we flip the face to obtain its
respective left view. Fig. 14(f) shows the flipped face for Fig. 14(e),
which is now in the left view. Then, we appropriately stretch each
row from the left side of the face to estimate the frontal view for
the half of the face. The stretching operation needs to keep
important regions of a face in an appropriate scale with respect to
each other. Therefore, we divide the half face to four vertical
bands. The boundary of each band is determined using specific
landmarks or midpoint between a pair of specific landmarks. For
every row in the cropped image, its respective four segments in
the four bands are stretched to have an equal size to produce the
estimated frontal view of the half face. The bands and the
employed landmarks to estimate the frontal view of the half face
together with the estimated frontal view are shown in Fig. 14(g).
Fig. 10. Comparison of the recognition accuracy of ten methods for PIE face images. Finally, the right view of the face is reconstructed by flipping the

Table 3
Average recognition accuracy (%) and corresponding SD in parentheses for PIE face images.

Methods Accuracy
Gradientface 97.14 (0.03)
Weberface 94.70 (0.06)
LBP 95.37 (0.04)
LDP 83.42 (0.10)
EnLDP 93.81 (0.07)
LDN 90.44 (0.09)
LFD 97.86 (0.02)
ELDP 98.91 (0.01)
AH-ELDP 99.45 (0.01)
CELDPM 96.25 (0.04)
CELDPD 99.29 (0.01)
CELDP 99.53 (0.01)

Table 4
Average recognition accuracy (%) and corresponding SD in parentheses for AR face images.

Methods Accuracy
Gradientface 72.50 (0.03)
Weberface 67.96 (0.03)
LBP 75.86 (0.03)
LDP 77.00 (0.01)
EnLDP 85.23 (0.01)
LDN 82.95 (0.02)
LFD 85.27 (0.02)
ELDP 85.55 (0.01)
AH-ELDP 85.95 (0.01)
CELDPM 80.07 (0.02)
CELDPD 83.76 (0.01)
CELDP 86.63 (0.01)

Fig. 11. Eight images with light changes and neutral expression from a subject in the AR face database.

estimated frontal left view. The final estimated frontal view of the
face is shown in Fig. 14(h).
After estimating the frontal view for each face, all face ima-
ges are resized to the size of 64  64. In this experiment, we use
every other ten frame to evaluate the performance of compared
methods and exclude other frames due to the large number of
frames. As a result, there are 42 face images for each subject in
this database on average. We can then conduct the experiment
on this reduced database. For the Honda UCSD video database,
we use five estimated frontal views of each subject from the
training subset for training and use estimated frontal views
from the testing subset for testing. Since subjects have facial
expressions, we apply Gabor filters as a post-processing step to
deal with expression variations. We also use the SRC [26,27] as
the classifier since more than one image are used for training.
The SRC classifier takes advantage of sparse representation to
achieve better performance when each subject has multiple
Fig. 12. A detected face from the Honda database. Each of 49 landmarks extracted
images in training.
by Chehra v.1 is represented by “n”.

Fig. 13. Cropped face for the detected face in Fig. 12 using the facial landmarks.

Fig. 14. Pose normalization process. (a) A detected face and its respective landmarks; (b) the angle between the external corners of the eyes and the horizontal axis determines the roll value (here, r = 9.52°); (c) the distances between the external corners of the eyes and the nose tip determine whether the face is in the left view or in the right view (e.g., y1 > y2 means it is in the right view); (d) the rotated face using the roll value and its respective landmarks, which are also rotated to the new locations; (e) cropped face using the landmarks; (f) the cropped face is flipped; (g) the estimated frontal view of the left side of the face; and (h) the right side of the face is reconstructed using the left side of the face.
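A minimal sketch of the roll estimation and left/right view test of Fig. 14(b)-(c) follows; the landmark arguments are the external eye corners and the nose tip taken from the 49 Chehra landmarks, and the function name and exact tie handling are illustrative.

import numpy as np

def roll_and_view(right_eye_outer, left_eye_outer, nose_tip):
    """Roll angle (degrees) and view label from three facial landmarks.

    Each argument is an (x, y) pixel coordinate of the external corner of
    the right/left eye or of the nose tip, as seen in the image.
    """
    dx = left_eye_outer[0] - right_eye_outer[0]
    dy = left_eye_outer[1] - right_eye_outer[1]
    roll = np.degrees(np.arctan2(dy, dx))    # angle to the horizontal eye line

    y1 = np.linalg.norm(np.subtract(nose_tip, right_eye_outer))
    y2 = np.linalg.norm(np.subtract(nose_tip, left_eye_outer))
    if np.isclose(y1, y2):
        view = "frontal"
    elif y1 > y2:
        view = "right"
    else:
        view = "left"
    return roll, view

A face classified as a right view would then be mirrored (e.g., with np.fliplr) to obtain its left view before the band-wise stretching described above.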

Table 5
Recognition accuracy (%) for Honda UCSD video
face images.

Methods Accuracy

Gradientface 71.72
Weberface 74.49
LBP 74.51
LDP 60.35
EnLDP 72.73
LDN 70.33
LFD 75.13
ELDP 75.15
AH-ELDP 71.84
CELDPM 68.16
CELDPD 75.53
CELDP 76.89

Table 5 summarizes the recognition accuracy of the ten


compared methods. The proposed CELDP method achieves the Fig. 15. ROC curves for the Yale B database.
highest average recognition accuracy of 76.89%, while the
second highest accuracy of 75.15% is obtained by the ELDP
method. Compared with the other nine methods, the proposed
CELDP method improves the face recognition accuracy of
Gradientface, Weberface, LBP, LDP, EnLDP, LDN, LFD, ELDP, and
AH-ELDP by 7.21%, 3.22%, 3.19%, 27.41%, 5.72%, 9.33%, 2.34%,
2.31% and 7.03%, respectively. Comparing with the accuracy of
the two variant systems, the CELDP method improves the
performance of both variations to outperform all the compared
methods.
It should be mentioned that the compared methods behave
differently on the Yale B, extended Yale B, CMU-PIE, AR, and
Honda UCSD databases. For example, the Weberface method
achieves the fourth best accuracy of 92.51% and 81.32% on the
Yale B and extended Yale B databases, the seventh best accuracy
of 94.70% on the CMU-PIE database, the tenth best accuracy of
67.96% on the AR database, and the fifth best accuracy of 74.49%
on the Honda UCSD database, respectively. LFD achieves the
third best accuracy on Yale B, extended Yale B, and Honda UCSD
databases, while it achieves the fourth best accuracy on AR and
CMU-PIE databases. Clearly, the proposed CELDP consistently Fig. 16. ROC curves for the extended Yale B database.
outperforms all the other methods by achieving the highest
recognition accuracy on all these databases with different
illuminations.
match indicates an input image for a subject is determined to be the
On the other hand, the comparison of the performance of our
same subject as the reference (training) image.
two variant systems with the nine state-of-the-art methods on
We plot ROC curves for the four databases. For Yale B and
five databases indicates that the CELDPD method outperforms all
extended Yale B databases, we use each of the six images in S0 as
nine compared methods for Yale B, Extended Yale B, and Honda
the reference image and all the rest as probe images and compute
UCSD databases and yields comparable results for the PIE and
the average of respective similarity measures as true and false
AR databases. However, the CELDPM method only yields com-
scores. In a similar setting, we use each of the 21 images for each
parable performance for the five databases. This performance
subject in the CMU-PIE database as the reference image and all the
difference between CELDPD and CELDPM is expected, since the
rest as probe images and compute the average of respective
direction contains more important information than the
similarity measures as true and false scores. Similarly, we use each
magnitude. of the eight images for each subject in the AR database as the
reference image and all the rest as probe images and compute the
3.2. ROC curves and computational time average of respective similarity measures as true and false scores.
These scores are then used to compute the TPR and the FPR in
The ROC curve is a graphical plot to illustrate the verification and dealing with each reference image. Finally, the TPR and FPR for
discrimination ability of a binary classifier. It plots the true positive each database can be easily computed as the average of all the TPR
rate (TPR) against the false positive rate (FPR) of the classifier. A and FPR values across all experiments over different reference
classifier or method with higher TPR and lower FPR indicates a images.
better verification and discrimination ability. Since we use the same For the Honda UCSD video database, since we use multiple
classifier for all methods, the ROC curve can illustrate the ability of images from each subject for training and conduct one experiment
each method to verify and discriminate face images. Here, a positive using the remaining images as testing images, we compute the

Fig. 20. Average computational run-time for a face image (100 × 100).

average of similarity measures between each probe image from


subject i and training images for the subject i as true scores. We
then compute the average of similarity measures between each
Fig. 17. ROC curves for the PIE database.
probe image from subject i and training images for all the subjects
excluding the subject i as false scores.
Figs. 15–19 compare the respective ROC curves of ten compared
methods for Yale B, extended Yale B, CMU-PIE, AR, and Honda
UCSD databases, respectively. Clearly, the proposed CELDP method
achieves the best verification and discrimination ability compared
with the other state-of-the-art methods since it has the largest
area under the ROC curve.
The proposed method also has a comparable computational
time with the compared state-of-the-art methods. Fig. 20 shows
average computational run-time to preprocess a face image of
the size of 100  100 and produce the illumination-invariant
features from the preprocessed image. The LFD method has the
highest computational time, which is also employed in the
proposed system. However, we efficiently simplify the method
by skipping every other adjacent scale while using the same
dmax of 10.

4. Conclusions

This paper proposes a method to produce illumination-invariant representations for face images using logarithmic fractal dimension-based complete eight local directional patterns. The proposed method uses the adaptive homomorphic filter to partially reduce the illumination effect by attenuating the low-frequency component of the face image. It then applies the simplified logarithmic fractal dimension operation as an edge-enhancement technique to enhance facial features such as the eyes, eyebrows, nose, and mouth while keeping noise at a low level. Finally, the proposed method achieves robustness against illumination variations and noise using CELDP, which not only considers the relations among the directions and magnitudes of all eight directional edge responses but also represents more valuable structural information from the neighborhood. Our experimental results verify that the proposed method outperforms its two variant methods and nine recent state-of-the-art methods and also obtains the best verification and discrimination ability on four publicly available image databases and one video database.
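To make the first stage of this pipeline concrete, the sketch below shows a generic (non-adaptive) homomorphic filter that attenuates the low-frequency illumination component in the log domain. The Gaussian high-emphasis transfer function and its parameters (gamma_l, gamma_h, sigma) are illustrative assumptions and not the adaptive filter actually used in the proposed method.

import numpy as np

def homomorphic_filter(img, gamma_l=0.5, gamma_h=1.5, sigma=30.0):
    # Work in the log domain so illumination (low frequency) and reflectance
    # (high frequency) become approximately additive.
    img = np.asarray(img, dtype=np.float64) + 1.0          # avoid log(0)
    log_img = np.log(img)
    F = np.fft.fftshift(np.fft.fft2(log_img))

    rows, cols = img.shape
    u = np.arange(rows) - rows / 2.0
    v = np.arange(cols) - cols / 2.0
    D2 = u[:, None] ** 2 + v[None, :] ** 2                 # squared distance from the center
    # Gaussian high-emphasis filter: low frequencies scaled by gamma_l (< 1),
    # high frequencies boosted toward gamma_h (> 1).
    H = (gamma_h - gamma_l) * (1.0 - np.exp(-D2 / (2.0 * sigma ** 2))) + gamma_l

    filtered = np.real(np.fft.ifft2(np.fft.ifftshift(H * F)))
    return np.exp(filtered) - 1.0                          # back to the intensity domain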

Xiaojun Qi received her M.S. and Ph.D. degrees in Computer Science from Louisiana State University in 1999 and 2001, respectively. In Fall 2002, she joined the Department of Computer Science at Utah State University as a tenure-track assistant professor. In July 2008, she received tenure and was promoted to Associate Professor. In July 2015, she was promoted to Professor. She has been a senior IEEE member since 2010. Her research interests include content-based image retrieval, digital image watermarking and information hiding, and pattern recognition and computer vision.

Mohammad Reza Faraji received the B.S. degree in Applied Mathematics from Zanjan University in 2005 and the M.S. degree in Systems Engineering from Amirkabir University of Technology in 2009. He received his Ph.D. degree in Computer Science from Utah State University in Summer 2015. He is currently a postdoctoral researcher with the Computer Science Department at Utah State University. His research interests include pattern recognition, computer vision, and fuzzy systems.
