
Comparison of Holistic and Feature Based Approaches to Face Recognition

A minor thesis submitted in partial fulfilment of the requirements for the degree of
Master of Applied Science in Information Technology

Stewart Tseng

School of Computer Science and Information Technology,
Faculty of Applied Science,
Royal Melbourne Institute of Technology University,
Melbourne, Victoria, Australia.

10th of July 2003


Declaration

I certify that all work on this thesis was carried out between March 2003 and July 2003 and it has
not been submitted for any academic award at any other college, institute or university. The work
presented was carried out under the supervision of Dr. Ron van Schyndel and Dr. Vic Ciesielski.
All other work in the thesis is my own except where acknowledged in the text.

Signed,

Stewart Tseng
10th of July 2003

Contents

Acknowledgements

Abstract

1 Introduction

2 Background
  2.1 Early Face Recognition Developments
  2.2 Brief Overview of Face Recognition Approaches

3 Holistic Face Recognition
  3.1 Karhunen-Loève expansion
  3.2 Linear Discriminant Analysis
  3.3 Summary

4 Feature Based Face Recognition
  4.1 General Approaches
  4.2 Elastic Bunch Graph Matching
  4.3 Summary

5 Experiments
  5.1 Dataset
  5.2 Results
  5.3 Discussion

6 Conclusions

A Covariance Matrix

B Generalised Eigenproblem

Acknowledgements

First and foremost, I would like to sincerely thank my supervisors, Ron van Schyndel and
Vic Ciesielski, for their invaluable supervision and direction, as well as the many interesting
and constructive discussions which have helped shape this minor thesis.

Special thanks to the following people who have assisted this research including Justin Zobel
(RMIT University), Albert Lionelle (Colorado State University), Matthew Turk (University of
California), Laurenz Wiskott (Humboldt-University Berlin), João Hespanha (University of Cali-
fornia), Peter Kalocsai (University of Southern California) and Aleix Martinez (Ohio State Uni-
versity).

Abstract

Face recognition is concerned with identifying or verifying individuals by their faces. Many
face recognition approaches have been proposed, and these can be classified as either holistic
or feature based. At present, there have been only a small number of investigations comparing
holistic and feature based approaches. Our goal is to compare one holistic and one feature
based approach on a collection of 3,282 face images. Additionally, we have surveyed recent
holistic and feature based approaches. We compared the performance of the Eigenface and
Elastic Bunch Graph Matching approaches and found that Elastic Bunch Graph Matching
achieved a recognition rate of 96.2%, significantly higher than the 71.6% recognition rate
achieved by the Eigenface approach.

Chapter 1

Introduction

Face recognition is concerned with identifying individuals from a collection of face images. It
belongs to a broad range of biometric approaches that also includes fingerprint, iris/retina and
voice recognition. Overall, biometric approaches are concerned with identifying individuals by
their unique physical characteristics. Traditionally, passwords and Personal Identification
Numbers have been employed to formally identify individuals, but the disadvantages of such
methods are that someone else may use them or they can be easily forgotten. Given these
problems, biometric approaches such as face recognition, fingerprint, iris/retina and voice
recognition provide a far superior means of identifying individuals, because they not only
uniquely identify individuals but also minimise the risk of someone else using another person's
identity. However, a disadvantage of fingerprint, iris/retina and voice recognition is that they
require active cooperation from individuals. For example, fingerprint recognition requires
participants to press their fingers onto a fingerprint reading device, iris/retina recognition
requires participants to stand in front of an iris/retina scanning device, and voice recognition
requires participants to speak into a microphone. Face recognition is therefore considered more
versatile than other biometrics, in the sense that individuals can be identified actively, by
standing in front of a face scanner, or passively, as they walk past one.

There are also disadvantages to using face recognition. Faces are highly dynamic and can vary
considerably in orientation, lighting, scale and facial expression; face recognition is therefore
considered a difficult problem to solve. Given these difficulties, many researchers from a range
of disciplines including pattern recognition, computer vision and artificial intelligence have pro-
posed many solutions to minimise such difficulties in addition to improving the robustness and
accuracy of such approaches. As many approaches have been proposed, there have also been
extensive surveys written over the last thirty years (Goldstein et al. 1971, Kaya & Kobayashi
1972, Harmon et al. 1981, Samal & Iyengar 1992, Chellappa et al. 1995, Zhao et al. 2000).

One can fundamentally classify the proposed approaches as either holistic based, where faces are
recognised using global features, or feature based, where faces are recognised using local
features. The features used in holistic and feature based approaches are fundamentally different.
Features found by holistic approaches capture the principal variations of pixel data in face
images, and these variations are used to distinguish one individual from another. Alternatively,
the features found by feature based approaches represent facial features such as the eyes, nose
and mouth, and it is these features that are used to uniquely identify individuals.

The first goal of this paper is to compare the face recognition performance of a holistic and a
feature based approach. We achieve our comparison by testing these two approaches on the AR
face database (Martinez & Benavente 1998). In addition to our comparison, the second goal is
to provide a survey of recent holistic and feature based approaches, which complements
previously written surveys on face recognition. Unless stated otherwise, all approaches discussed
in this paper use static two-dimensional grey scale images to represent faces. Furthermore, we
define a dataset to be a collection of faces, the gallery set to be the training set and the probe
set to be the test set.

Face recognition has far-reaching benefits for corporations, government and the greater society.
Corporate applications include access to computers, secure networks and video conferencing;
access to office buildings and restricted sections of these buildings; access to storage archives;
and identifying members at conferences and annual general meetings.

Further corporate applications include access and authorisation to operate machinery, clocking
on and off at the beginning and end of work, assignment of work responsibilities and
accountability based on identity, monitoring employees, and confirming the identity of clients,
suppliers and transport and logistics companies when they send and receive packages.
Additionally, sales, marketing and advertising companies could identify their customers in
conjunction with customer relationship management software.

Applications of face recognition in state and federal government may include access to
parliamentary buildings, press conferences and secure confidential government documents,
reports and doctrines. Specific government uses can include Australian Customs verifying the
identity of individuals against their passport files and documents, or state and federal police
using face recognition to improve crime prevention and facilitate police activities.

Applications of face recognition in the greater society may include election voting registration,
access to venues and functions, verifying the identity of drivers against their issued licenses and
personal identification cards, confirming identity for point-of-sale transactions such as credit
cards and EFTPOS, and confirming identity when accessing funds from an automatic teller
machine. Other applications include facilitating home security and gaining access to motor
vehicles.

As there are many potential applications of face recognition, this paper limits its discussion to
the computational aspects of face recognition. The paper begins with the earlier developments
of face recognition and an overview of other face recognition approaches in chapter 2. In
chapter 3, we focus on describing recent holistic approaches, while in chapter 4 we describe
recent feature based approaches. We then compare the performance of a holistic approach to a
feature based approach and report the outcomes in chapter 5. We conclude by summarising the
paper, providing insight into our future work and outlining three leading commercial vendors
in chapter 6. We shall begin with the background of face recognition.

Chapter 2

Background

In this chapter, we shall outline earlier approaches to face recognition, which have contributed
to its development, in section 2.1. In section 2.2, we will provide an overview of a number of
approaches presently utilised in face recognition. We now begin with a discussion of the earlier
developments of face recognition.

2.1 Early Face Recognition Developments

The central question of face recognition is: how do people recognise faces? More importantly,
what are the processes for recognising faces? These questions have led to many investigations
in the fields of cognitive psychology and neurophysiology. In respect to face recognition,
cognitive psychology and neurophysiology are concerned with the empirical evaluation and
theoretical development of cognitive processes that enable conceptual and practical
understanding of how human cognition recognises faces. Overall, cognitive psychologists and
neurophysiologists expected that through such investigations they would be able to reconstruct
the cognitive processes involved, and therefore predict the patterns used by human cognition to
recognise faces (Young & Bruce 1991).

Such studies have included the familiarity and unfamiliarity of faces (Benton 1980), investiga-
tions into people with Prosopagnosia, where the patients are no longer able to recognise faces
of previously known individuals (Hécaen & Angelergues 1962) and the effects of face distinc-
tiveness (Shepherd et al. 1991) when recognising faces. Other studies have also included face
uniqueness (Hay & Young 1982) and use of face caricatures, which distort faces to improve their
uniqueness and distinction amongst the general population (Rhodes et al. 1987, Benson & Perret
1991).

Specific to cognitive psychology, several different but complementary information processing
models have been proposed (Hay & Young 1982, Ellis 1986, Bruce & Young 1986). Even though
some argued that these information processing models were too generalised and abstract in
their representations of human cognition, the models were still applied to test the validity of
their hypotheses, which was achieved with the use of computers. For example, Sergent (1989)
used computers to facilitate research into, and understanding of, face recognition.

In parallel to cognitive psychology and neurophysiology, the field of computer science was
interested in constructing computational models to achieve face recognition. Fundamentally,
face recognition interrelates to and is influenced by computer vision and artificial intelligence
(Clowes 1971, Minsky 1975, Marr 1980). This led to initial developments of face recognition
systems (Baron 1981, Kohonen 1984, Stonham 1986). However, it was the work of Kanade (1973)
that combined low-level processing of images and high-level decision making to develop one of
the first full face recognition systems. The majority of others had manually selected features to
perform face recognition, whereas Kanade (1973) opted for an autonomous approach to
selecting feature points of the face. The system identified people via three stages: extracting
features autonomously, adjusting the location of feature points to different face sizes, and
measuring the similarity between two faces by the Euclidean distance amongst the feature
points. Specifically, the first stage of the system captured a coarse resolution image of the face
via a face scanner or television camera. The coarse images were transformed to binary images
by applying a Laplacian operator with a threshold of 30. After the transformation, the vertical
and horizontal integral projections of the binary images were used to locate and extract feature
points of the face (Kanade 1973). The second part of this stage was similar, except that faces
were scanned at higher resolutions, but only at specific locations, based on the feature points
extracted from the coarsely scanned faces; the higher resolution scans were again obtained with
a face scanner or television camera. In the second stage, the system adjusted the feature points
found from the high resolution faces to match the ratio and location of different face sizes. In
the last stage, the system determined whether a probe face matched a gallery face, defined by
the smallest Euclidean distance amongst the feature points between the probe and gallery sets.
Kanade (1973) reported achieving a recognition rate of approximately 45% to 75% with a
dataset of 40 faces. The 40 faces consisted of 20 faces that formed the gallery set and another
20 faces for the probe set. Kanade (1973) reported that the system correctly recognised 15 of
the 20 probe faces.
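To make the first stage concrete, the following sketch is our own illustration of the binarisation
and integral projections; only the Laplacian operator and the threshold of 30 come from the
description above, while the array layout and edge handling are assumptions.

```python
import numpy as np

def binarise_with_laplacian(image, threshold=30):
    """Approximate a 4-neighbour Laplacian and threshold it (the threshold of 30
    is the value reported by Kanade 1973); image is a 2-D grey scale array.
    Note that np.roll wraps around the image edges in this simple sketch."""
    lap = (np.roll(image, 1, 0) + np.roll(image, -1, 0) +
           np.roll(image, 1, 1) + np.roll(image, -1, 1) - 4 * image)
    return (np.abs(lap) > threshold).astype(np.uint8)

def integral_projections(binary):
    """Row and column sums of the binary image; peaks in these profiles
    indicate candidate feature points such as the eyes and mouth."""
    horizontal = binary.sum(axis=1)  # one value per image row
    vertical = binary.sum(axis=0)    # one value per image column
    return horizontal, vertical
```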
Looking forward to the present, there have been many significant contributions since the work
of Kanade (1973). In the next section, we shall give an overview of present-day face recognition
approaches.

2.2 Brief Overview of Face Recognition Approaches

The area of face recognition has seen many diverse, novel and promising approaches proposed.
In this section, we provide an overview of present approaches in the face recognition literature
and also describe various face datasets that have been used to test the feasibility of such
approaches. In addition, we shall outline an industry standard dataset, which has provided a
common measure of accuracy to evaluate the performance of different face recognition
approaches.

Support Vector Machines (Vapnik 1998, Burges 1998) have previously been used for general
pattern recognition, and were applied to face recognition by Guo et al. (2000), who used
Support Vector Machines in conjunction with a binary decision tree and found improvements
comparable to the Eigenface approach (Turk & Pentland 1991). Also, Xi et al. (2002) reported
using Support Vector Machines for face recognition, where they were able to accurately and
autonomously extract features from faces. Tefas et al. (2001) used Support Vector Machines
with the elastic bunch graph matching approach (Wiskott et al. 1999) and reported achieving a
lower error rate for recognising faces when compared to the standard elastic bunch graph
matching approach. A general evaluation of Support Vector Machines for face recognition was
provided by Johnsson et al. (1999).

The Fast Fourier Transform is another approach that has been applied to face recognition.
Colmenarez & Huang (1998) proposed using the Fast Fourier Transform to represent the gallery
set as a number of templates. Face recognition in this approach was achieved by minimising the
Euclidean distance amongst the peaks found when comparing a gallery face to a probe face. In
contrast, Kalocsai & Biederman (1998) used the Fast Fourier Transform to simulate neural and
behavioural effects of the human visual system when recognising faces.

Another face recognition approach was the Hidden Markov Model. This approach had been used
in speech and character recognition, but was applied to face recognition by Samaria & Fallside
(1993), Samaria (1994) and Samaria & Young (1994).

In contrast, the majority of face recognition approaches focused only on directly recognising
faces and did not specifically consider the inherent difficulty of lighting variations on faces.
Such lighting variations led to a reduction in face recognition performance. Adini et al. (1997)
therefore provided an empirical study to evaluate whether a selected number of approaches
were invariant to the lighting conditions being tested. They constructed a database of 125
images of 25 individuals. Five face images were taken of each person under strictly controlled
lighting variations, which included illumination of the left side and the right side of the face.
They found that without pre-processing the grey scale faces, where faces would normally be
normalised against the average face of a gallery set, the evaluated approaches would always
incorrectly match a gallery face to a probe face. This was indicated by a 100% error rate, where
the error rate measures the incorrect identification of individuals. In addition, they found that
with pre-processing, the incorrect identification rate varied from 20% to 100%. Similarly,
Georghiades et al. (1998) also conducted an investigation into the effects of lighting variations
on faces.

As lighting and other variations are important factors to consider when performing face
recognition, there was also a common need to compare different face recognition approaches,
and hence for a standard gallery and probe set to enable the benchmarking of face recognition
performance. Therefore, Phillips et al. (2000) proposed testing the performance and feasibility
of face recognition approaches on a large dataset, and developing a common measure of
performance to evaluate face recognition approaches as well as identify future research
directions. A large dataset was constructed for the Face Recognition Technology (FERET)
program (Phillips et al. 2000), which contained 14,126 images of 1,199 individuals. The
performance measure was based on the cumulative recognition rate. The cumulative recognition
rate is found by first ranking each probe face in relation to the gallery set. The rank indicates
how well a probe face matches its intended gallery face, given that the probe and gallery face
represent the same individual. A probe face with a rank of 1 has the smallest Euclidean
distance to its intended gallery face; in general, a probe face with the nth smallest Euclidean
distance to the intended gallery face has a rank of n (Phillips et al. 2000). Once the rank is
found for each probe face, a cumulative rank score is calculated for each rank: the number of
probe faces with rank at most n, divided by the total number of probe faces. It is the percentage
form of this cumulative rank score which denotes the cumulative recognition rate. For example,
suppose that having determined the rank of each probe face, 15 of a total of 20 probe faces fall
within a rank of 10 (inclusive of rank 1 to rank 10). The cumulative recognition rate in this
instance would be 75% at rank 10.
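A minimal sketch of this calculation, assuming each probe's rank has already been determined,
is:

```python
import numpy as np

def cumulative_recognition_rates(ranks, max_rank):
    """ranks[i] is the 1-based rank at which probe i matched its intended
    gallery face; returns the cumulative recognition rate (as a percentage)
    at each rank from 1 to max_rank."""
    ranks = np.asarray(ranks)
    return np.array([(ranks <= r).mean() * 100 for r in range(1, max_rank + 1)])

# The worked example from the text: 15 of 20 probes within rank 10 gives
# a cumulative recognition rate of 75% at rank 10.
```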
In a further investigation, Phillips et al. (2003a) compared 10 commercial vendor products
against a much larger database containing 121,589 images of 37,437 individuals. The faces were
originally drawn from a collection of approximately 6.8 million images of 6.1 million individuals
(Phillips et al. 2003a).

In the past, others have constructed probe and gallery sets for the purpose of evaluating their
own proposed approaches, as shown in tables 2.1 and 2.2. The listed datasets are further
discussed in chapters 3 and 4. In addition, tables 2.1 and 2.2 include the recognition rates
achieved on these datasets, which we briefly explain in the footnotes following table 2.2.

As a typical gallery set would contain only a single view of each known individual, those single
views were most likely frontal views of faces. Motivated by this limitation, Beymer & Poggio
(1995) proposed a pose invariant face recognition approach, which generated virtual orientation
views from affine transformations of the frontal views of faces. A similar but independent
proposal on pose invariant face recognition was investigated by Lando & Edelman (1995).
Complementary to the pose invariant investigations, Moses et al. (1996) experimented with the
effects of using upright faces versus inverted faces of individuals, whereby the faces had varying
poses and the inverted faces were created from the upright faces. They discovered that inverted
faces did not perform as well as upright faces, since the inverted faces were treated as a
different face class, which significantly reduced face recognition performance. In a further
investigation, Moses & Ullman (1998) evaluated global, local and hybrid approaches to face
recognition based on their analysis of the effects of using frontal and varying poses of faces. The
hybrid approaches, as discussed by Moses & Ullman (1998), were based on using both a global
and a local approach to achieve face recognition.

A comparison of feature based versus template matching approaches was put forth by Brunelli
& Poggio (1993). We were likewise interested in providing a recent account of developments in
holistic and feature based approaches. It is hence the focus of the next two chapters to describe
holistic face recognition approaches in chapter 3 and feature based face recognition approaches
in chapter 4.
Face Datasets            Individuals  Variation             Total Images  Image Type   Image Dimension    Rate
Sirovich & Kirby (1987)  115          113 male, 2 female    115           Grey Scale   128×128            -
Turk & Pentland (1991)   16           male                  2500          Grey Scale   512×512, 256×256,  96% [1]
                                                                                       128×128, 64×64,    85% [2]
                                                                                       32×32, 16×16       64% [3]
O'Toole et al. (1993)    150          75 male, 75 female    150           Grey Scale   151×225 approx.    98.7% [4]
                                                                          (16 levels)
Pentland et al. (1994)   [21] [45]    [9] [2]               [189] [90]    Grey Scale   -                  90% [5]
                                                                                                          95% [6]
Moghaddam et al. (1998)  112          frontal view          224           Grey Scale   -                  89.5% [7]
Belhumeur et al. (1997)  [5] [16]     [66] [10]             [330] [160]   Grey Scale   -                  -
Yang et al. (2000)       [15] [40]    [11] [10]             [165] [400]   Grey Scale   29×41              -
Swets & Weng (1996)      504          -                     1614          Grey Scale   -                  98% [8]
Zhao et al. (1999)       -            -                     115           -            -                  -
Kim et al. (2002)        133          lighting              665           -            -                  -

Table 2.1: Face Datasets A: Provides an overview of the face dataset used by various face
recognition approaches, as well as details on how many different individuals were represented,
the gender of the individuals, the total number of images used in the experiments, the format
type of the images, the different image dimensions used, and the recognition rate.

Face Datasets            Individuals  Variation              Total Images  Image Type  Image Dimension  Rate
Lu et al. (2003)         [20] [40]    [-] [10]               [575] [400]   Grey Scale  112×92           -
Craw et al. (1992)       [1] [-]      [-] [-]                [64] [50]     Grey Scale  -                95% [9]
Manjunath et al. (1992)  86           facial expression      306           Grey Scale  128×128          94% [10]
                                      & orientation
Wiskott et al. (1999)    [250] [108]  frontal, orientation   [500] [216]   Grey Scale  256×384          99% [11]
                                                                                       256×384          97% [12]
Wiskott et al. (1995)    111          male, female,          111           Grey Scale  128×128          90.2% [13]
                                      bearded, glasses
Kalocsai et al. (2000)   325          2                      650           Grey Scale  -                93% [14]

Table 2.2: Face Datasets B: Provides an overview of the face dataset used by various face
recognition approaches, as well as details on how many different individuals were represented,
the gender of the individuals, the total number of images used in the experiments, the format
type of the images, the different image dimensions used, and the recognition rate.

[1] averaged over lighting
[2] averaged over orientation
[3] averaged over different face scales
[4] used 1 to 100 eigenvectors for recognition
[5] view based recognition of 9 different orientations (-90° to +90°) of the face
[6] modular eigenspace recognition based on the eigeneyes and eigenfaces
[7] based on the intrapersonal and extrapersonal class recognition rate
[8] within the top 15 correctly identified individuals
[9] based on recognising a moving sequence of a single person
[10] within the top 3 correctly identified
[11] within the top 10 correctly identified
[12] within the top 4 correctly identified
[13] correctly identified gender
[14] used 100 kernels that statistically provided the most information for recognition

Chapter 3

Holistic Face Recognition

Holistic face recognition utilises global information from faces to perform face recognition. The
global information is fundamentally represented by a small number of features which are
directly derived from the pixel information of face images. These features distinctly capture the
variance among different individual faces and are therefore used to uniquely identify
individuals. In this chapter, we shall describe two holistic approaches to face recognition: the
Karhunen-Loève expansion and Linear Discriminant Analysis.

In the Karhunen-Loève expansion section, we define the Karhunen-Loève expansion concept and
explain the use of partial and full faces by Sirovich & Kirby (1987) and Kirby & Sirovich (1990),
as well as an influential paper on the Eigenface approach by Turk & Pentland (1991).
Additionally, we discuss the Karhunen-Loève expansion as applied to face recognition by others
(O'Toole et al. 1993, Pentland et al. 1994, Moghaddam et al. 1998, Yang et al. 2000).

Alternatively, in the Linear Discriminant Analysis section, we define the concept of Linear Dis-
criminant Analysis. Furthermore, we shall discuss the work of Belhumeur et al. (1997) on Fisher-
faces and many others who have proposed variations of Linear Discriminant Analysis to achieve
face recognition (Swets & Weng 1996, Zhao et al. 1999, Kim et al. 2002, Lu et al. 2003).

3.1 Karhunen-Loève expansion

The Karhunen-Loève expansion, also known as Principal Component Analysis or the Hotelling
transform, is traditionally concerned with feature selection for signal representation. Applied to
face recognition, it finds a small number of features, defined by the principal components of the
face. The principal components are found by representing two-dimensional face images as
one-dimensional vectors and then selecting the principal components which capture the highest
variances amongst individual faces. Specifically, the principal components of the face are found
by solving for the eigenvectors and eigenvalues of the covariance matrix (Appendix B). In other
words, the eigenvectors constitute a small number of features that represent the variations
amongst faces in the dataset; this small number of features can also be called the feature space.
Furthermore, by finding the best features to represent faces, there is greater efficiency in
memory storage and run-time execution.

To find the eigenvectors $u$ from the dataset $D$, we consider the face as a grey scale image
where each pixel in the image is an element of a vector; thus a face image of $N \times N$
pixels is represented as the column vector

$$\Gamma = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{N^2} \end{bmatrix} \qquad (3.1)$$

where the face image is arranged as a one dimensional column vector in raster order. For
example, the end of the first row of the face image is joined by the start of the second row of
the face image, and the process continues for the rest of the face image.

Then, for each face $\Gamma_n$ in the dataset $D$, we find the average face $\Psi$ for the
dataset, where $\Gamma_n$ denotes the $n$th face (previously written $\Gamma$ for a single
face in the dataset) and $M$ is the size of the dataset. Subtracting $\Psi$ centres the faces
near the coordinate system origin.

$$\Psi = \frac{1}{M} \sum_{n=1}^{M} \Gamma_n \qquad (3.2)$$
Figure 3.1: Average Face (Sirovich & Kirby 1987). From left to right: the normal face, the
average face for the dataset, and the normalised face, which is the difference between the
normal face and the average face.

Once the average face $\Psi$ (equation 3.2) is found, we normalise each face by subtracting the
average face $\Psi$ from each face $\Gamma_n$ in the dataset $D$,

$$\Phi_n = \Gamma_n - \Psi \qquad (3.3)$$

The eigenvectors $u$ are derived from the covariance matrix $C$ (Appendix A.1). The
covariance matrix represents the relationship between two matrices and, in this application,
also represents the variance of the dataset (Fukunaga 1990), where $\Phi_n^T$ is the transpose
of the normalised face vector:

$$C = \frac{1}{M} \sum_{n=1}^{M} \Phi_n \Phi_n^T \qquad (3.4)$$

To find the eigenvectors $u$ we can solve the following

$$C u_k = \lambda_k u_k \qquad (3.5)$$

where the $\lambda_k$ are the eigenvalues, which satisfy the characteristic equation
$\det(C - \lambda I) = 0$.

In choosing the optimal eigenvectors $u_k$, we select those with the highest eigenvalues
$\lambda_k$ of the covariance matrix,

$$\lambda_k = \frac{1}{M} \sum_{n=1}^{M} \left( u_k^T \Phi_n \right)^2 \qquad (3.6)$$

where the $M'$ highest eigenvalues $\lambda_k$ represent the largest variances of the normalised
faces $\Phi_n$, whereas the remaining eigenvalues have variances close to zero. Variances close
to zero do not provide any significant discriminatory information for face recognition.

Having found the $M'$ highest ranked eigenvalues $\lambda_k$, we can find the associated
eigenvectors $u_k$ by the following

$$u_k = \sum_{n=1}^{M} v_{kn} \Phi_n \qquad (3.7)$$

where the $v_{kn}$ are the components of the eigenvectors of the reduced $M \times M$ matrix
$L$, with $L_{mn} = \Phi_m^T \Phi_n$.

By finding the eigenvectors $u$ for the dataset $D$, the optimal projection is defined as

$$\Omega = U^T \Phi \qquad (3.8)$$

where $\Omega$ is the feature space, or small number of features representing the dataset $D$,
and $U = [u_1, \ldots, u_{M'}]$ is the optimal transformation matrix for the dataset $D$.
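The following sketch pulls equations 3.1 to 3.8 together. It is an illustrative implementation
under our own naming, not code from the thesis, and it computes the eigenvectors via the
reduced $M \times M$ matrix $L$ mentioned above:

```python
import numpy as np

def eigenfaces(faces, num_components):
    """faces: (M, N*N) array, one raster-ordered face vector per row (eq. 3.1)."""
    mean_face = faces.mean(axis=0)                 # average face, eq. 3.2
    phi = faces - mean_face                        # normalised faces, eq. 3.3
    # Eigenvectors of the small M x M matrix phi phi^T yield those of the
    # full covariance matrix (eq. 3.4) after projecting back through phi.
    small = phi @ phi.T / len(faces)
    eigvals, eigvecs = np.linalg.eigh(small)       # eq. 3.5 on the reduced matrix
    order = np.argsort(eigvals)[::-1][:num_components]  # highest variances, eq. 3.6
    u = phi.T @ eigvecs[:, order]                  # eigenfaces, eq. 3.7
    u /= np.linalg.norm(u, axis=0)                 # orthonormalise the columns
    weights = phi @ u                              # feature-space projection, eq. 3.8
    return mean_face, u, weights
```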
We shall now describe how the Karhunen-Loève expansion has been applied to face recognition.

Sirovich & Kirby (1987) were the first to apply the Karhunen-Loève expansion to face
recognition. In their initial investigation, they used a small number of features to represent the
dataset. Once they reconstructed images from this small number of features, they observed that
the images resembled faces, which they called Eigenpictures (see figure 3.2, where the first 8
principal components of the upper portion of a face are shown). They tested their approach on
a gallery set of 115 grey scale faces consisting of 113 males and 2 females. The images in the
gallery set were 128×128 pixels and contained small lighting variations. The images were
frontal views of the face but were manually cropped to contain only the eyes and nose. From
their experiments using 40 features, they found error rates of 3.9% and 2.4% respectively for
two female probe faces. The error rate was based on a probe set of 115 faces consisting of 113
male and 2 female faces. They concluded from these error rates that their approach was gender
independent. In another experiment, also using 40 features, they found an error rate of 7.8%
when faces with insufficient lighting were tested on the same probe set. While a limitation of
this investigation was the use of partial faces, we shall see in the following investigation the
advantages of using full faces for face recognition.

Kirby & Sirovich (1990) extended their previous investigation (Sirovich & Kirby 1987) of the
Karhunen-Loève expansion to full frontal faces and face reflections. Face reflections were
created by mirroring images about the mid-line of the face. Using full frontal faces and face
reflections, they investigated the effects of even and odd symmetry eigenfunctions on face
recognition (Kirby & Sirovich 1990). From their experiments, they discovered that the majority
of the small number of features corresponded to the even symmetry eigenfunctions, and
concluded that the even symmetry eigenfunctions represented the underlying, generally
symmetrical structural properties of the face. They further concluded that even and odd
symmetry eigenfunctions facilitated the identification of individuals, and reduced the face
recognition error rate for probe faces which were not part of the gallery set.

Motivated by the work of Sirovich & Kirby (1987), Turk & Pentland (1991) applied the
Karhunen-Loève expansion, in an approach called Eigenfaces, to achieve face recognition. Turk
& Pentland (1991) wanted to use global, holistic information of the face to perform face
recognition, whereas others had previously used individual, local facial features (Turk 2001).
They proposed a general framework for face recognition, which defined the face space for face
recognition and the face map for face detection. The face space was defined by the Eigenfaces
constructed from the gallery set. The framework transformed probe faces into their Eigenface
equivalents (probe Eigenfaces), which were then matched against the face space to determine
whether a face belonged to a known individual or not. Specifically, the closeness of the probe
Eigenfaces to face space was determined by minimising the Euclidean distance under the face
class metric and the face space metric. For illustrative purposes, we outline these two metrics:

The face class metric is defined as

$$\epsilon_f^2 = \left\| \Omega - \Omega_f \right\|^2 \qquad (3.9)$$

where $\Omega$ is the probe Eigenface representation and $\Omega_f$ is the face class $f$.

The face space metric is defined as

$$\epsilon^2 = \left\| \Phi - \Phi_s \right\|^2 \qquad (3.10)$$

where $\Phi$ is the mean adjusted face (the input face minus the average face) and $\Phi_s$ is
its projection onto the face space.

Given that the face class and face space metrics determined whether the probe Eigenfaces
matched a face in the gallery set, an arbitrary threshold $\theta$ was specified for the face class
and face space. If the probe Eigenfaces were below the threshold $\theta$ for the face class and
face space, the face was recognised; otherwise, if the probe Eigenfaces were above the threshold
$\theta$, they were classified as unknown. Additionally, if the same probe face was consistently
classified as unknown, it would be added to the face space and face class.

The face space comparison had four possible outcomes to indicate how closely the probe
Eigenfaces matched a face in the gallery set (a sketch of the resulting decision rule follows the
list):

1. close to face class and close to face space

2. far from face class and close to face space

3. close to face class and far from face space

4. far from face class and far from face space
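A minimal sketch of this decision rule, assuming a single threshold $\theta$ and precomputed
distances from eq. 3.9 and eq. 3.10:

```python
def classify(eps_space, eps_class, theta):
    """Map the face space / face class distances onto the four outcomes."""
    if eps_space < theta and eps_class < theta:
        return "known individual"   # outcome 1: close to both
    if eps_space < theta:
        return "unknown face"       # outcome 2: a face, but matching no class;
                                    # a candidate for adding to the face space
    return "not a face"             # outcomes 3 and 4: far from face space
```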

Whilst Turk & Pentland (1991) selected a small number of features with the highest variances,
O'Toole et al. (1993) investigated whether selecting features with different variances improved
face recognition. In their experiments, O'Toole et al. (1993) explored the effects of selecting
features with higher variances compared with selecting features with lower variances. They
discovered that selecting the first 15 features with the highest variances captured the general
features of faces, whereas selecting features with lower variances captured the unique features of
faces for face recognition.

In contrast, Pentland et al. (1994) observed that the limitations of face recognition included
face variations in lighting, orientation and scale. They proposed extending the existing
Eigenface approach (Turk & Pentland 1991) to handle a larger dataset containing 7,562 face
images of approximately 3,000 individuals. Their proposal included a view-based and a modular
eigenspace approach to face recognition. The view-based approach extended the single set of
features for the gallery set to multiple sets of features, which captured the scale and orientation
variance present in the gallery set. Thus, the view-based approach encompassed the scale and
orientation variations in M independent subspaces. They rationalised that by using M
independent subspaces, compared to the Eigenface approach of using a single subspace, they
were able to capture and represent the shape and geometry of faces. For their other approach,
they proposed a modular eigenspace, where individual features of faces were modularised to
form Eigenface equivalents; the eyes, nose and mouth were defined as eigeneyes, eigennose and
eigenmouth respectively. They stated that a modular eigenspace made their approach robust to
large variations in the gallery set. Additionally, they suggested a coarse-to-fine strategy could
be employed, where at the coarse level eigenfaces were used to recognise whole faces, and at the
fine level the eigeneyes, eigennose and eigenmouth were used to recognise individuals by their
facial features.

Moghaddam et al. (1998) proposed an improvement to the Eigenface approach. They stated
that a limitation of the Eigenface approach was its face similarity measure: using the Euclidean
distance to measure face similarity did not capture significant face variations that could
improve face recognition. They therefore defined a different, probabilistic measure of face
similarity, which they incorporated into the Eigenface approach. The probabilistic face
similarity measure captured significant face variations through two mutually exclusive classes,
the intrapersonal and extrapersonal classes. The intrapersonal class captured the face variations
of the same individuals, including variations in lighting and facial expression. In contrast, the
extrapersonal class captured variations amongst different individuals. They tested their
approach on the 1996 FERET dataset (Phillips et al. 2000) and found that their proposed face
similarity measure improved the recognition of faces by up to 10% compared to using the
Euclidean distance to measure face similarity.

A different approach to face recognition was proposed by Yang et al. (2000). Based on higher
order statistics (Rajagopalan et al. 1999), Yang et al. (2000) proposed the kernel Eigenface
approach to capture higher level information, such as edges and curves, from the relationships
amongst three or more pixels of face images. They inferred that capturing such relationships
within face images would improve face recognition. Their approach was an improvement on the
Eigenface approach: they stated that the fundamental difference between the kernel Eigenface
feature space and the Eigenface feature space was that kernel Eigenface projected faces into a
higher dimensional feature space, whereas the Eigenface approach projected faces into a lower
dimensional subspace. The higher dimensional projection for the kernel Eigenface approach can
be formalised as

$$\phi : \mathbb{R}^a \rightarrow \mathbb{R}^b, \qquad x \mapsto \phi(x) \qquad (3.11)$$

where $\phi$ maps into the higher dimensional feature space, $x$ is the original face image with
dimensionality $a$ and $\phi(x)$ is the new face image with dimensionality $b$, where $b > a$
implies that $\phi(x)$ lies in a feature space of higher dimension than the original
$a$-dimensional space.

The previous covariance matrix shown in eq. 3.4 now becomes

$$C = \frac{1}{M} \sum_{n=1}^{M} \phi(\Phi_n)\, \phi(\Phi_n)^T \qquad (3.12)$$

where $b$ defines the new size of the covariance matrix $C$.
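As an illustration only, a kernel projection of face vectors can be written with scikit-learn's
KernelPCA standing in for the authors' implementation; the kernel choice, degree and
component count below are our assumptions:

```python
from sklearn.decomposition import KernelPCA

# gallery_faces / probe_faces are assumed to be (num_images, num_pixels) arrays.
kpca = KernelPCA(n_components=40, kernel="poly", degree=2)
gallery_features = kpca.fit_transform(gallery_faces)  # nonlinear feature space
probe_features = kpca.transform(probe_faces)          # project probes the same way
```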

Given this difference, they noted that even though kernel Eigenface is a nonlinear technique,
they were still able to find a small number of features via the principal components of the face,
and these features could be used to uniquely identify individuals. In their experiments they
compared the Eigenface and kernel Eigenface approaches on two datasets: the Yale (Yale Face
Database 1997) and AT&T (AT&T Face Database 1994) datasets. The Yale dataset consisted
of 165 images of 15 different individuals (11 images per individual) and contained variations in
facial expression and lighting; these images were resized to 29×41 pixels. The AT&T dataset
consisted of 400 images of 40 individuals with variations in facial expression and pose; these
images were similarly resized. According to their experiments, the kernel Eigenface approach
produced a lower error rate than the Eigenface approach on both the Yale and AT&T datasets.
In the next section, we shall describe a different holistic approach to face recognition.

3.2 Linear Discriminant Analysis

Fisher's Linear Discriminant, also known as Linear Discriminant Analysis, finds a small number
of features that differentiate the faces of different individuals while grouping together the faces
of the same individual. The small number of features is found by maximising the Fisher
Discriminant Criterion (Fisher 1936), which tightens the grouping of each individual's faces
whilst separating the groupings of different individuals. By grouping the faces of the same
individual, these features can be used to determine the identity of individuals.

Linear Discriminant Analysis is defined by the between class scatter $S_B$ and the within class
scatter $S_W$. The between class scatter $S_B$ concerns faces of different individuals, while the
within class scatter $S_W$ concerns faces of the same individual. Specifically, the between class
scatter $S_B$ represents the scatter of the face class means around the overall mean of all face
classes, whilst the within class scatter $S_W$ represents the scatter of features around the mean
of each face class. The definition of the between class scatter is

$$S_B = \sum_{i=1}^{C} N_i \left( \mu_i - \mu \right) \left( \mu_i - \mu \right)^T \qquad (3.13)$$

where $C$ is the number of face classes (individuals), $N_i$ is the number of faces per face
class, $\mu_i$ is the mean of face class $i$ and $\mu$ is the overall mean of all face classes. We
also outline the within class scatter as

$$S_W = \sum_{i=1}^{C} \sum_{x_k \in X_i} \left( x_k - \mu_i \right) \left( x_k - \mu_i \right)^T \qquad (3.14)$$

where $X_i$ is the set of faces in face class $i$ and $\mu_i$ is the mean of face class $i$.
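As a concrete illustration (our own sketch, assuming faces are stored row-wise with integer
class labels), the two scatter matrices can be computed as:

```python
import numpy as np

def scatter_matrices(faces, labels):
    """Compute S_B (eq. 3.13) and S_W (eq. 3.14) for row-wise face vectors."""
    faces, labels = np.asarray(faces), np.asarray(labels)
    overall_mean = faces.mean(axis=0)
    dim = faces.shape[1]
    S_B = np.zeros((dim, dim))
    S_W = np.zeros((dim, dim))
    for c in np.unique(labels):
        class_faces = faces[labels == c]
        mu_i = class_faces.mean(axis=0)
        diff = (mu_i - overall_mean)[:, None]
        S_B += len(class_faces) * diff @ diff.T   # between class scatter
        centred = class_faces - mu_i
        S_W += centred.T @ centred                # within class scatter
    return S_B, S_W
```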

Given eq. 3.13 and eq. 3.14, to find a small number of features we maximise the Fisher
Discriminant Criterion ratio (Fisher 1936) of the between class scatter $S_B$ to the within
class scatter $S_W$. The Fisher Discriminant Criterion ratio is

$$W_{opt} = \arg\max_{W} \frac{\left| W^T S_B W \right|}{\left| W^T S_W W \right|} \qquad (3.15)$$

where $W$ represents the small number of features and its columns $u_k$ form the optimal
projection for the gallery set. The optimal projection can be found by solving the generalised
eigenproblem (Appendix B) for Linear Discriminant Analysis,

$$S_B u_k = \lambda_k S_W u_k \qquad (3.16)$$

where $\lambda_k$ are the eigenvalues and $S_W$ is the within class scatter.

To maximise the Fisher Discriminant Criterion ratio, the between class scatter $S_B$ must be
maximised whilst the within class scatter $S_W$ is minimised. However, the Fisher Discriminant
Criterion becomes unstable once the within class scatter $S_W$ is singular. The within class
scatter $S_W$ being singular can be caused by the number of faces in the gallery (training) set
being smaller than the number of pixels in each face image.
the number of pixels present in each face image. We shall now describe how Linear Discriminant
Analysis has been applied to face recognition.

Belhumeur et al. (1997) were motivated to develop a tolerant variant of the Fisher Discriminant
Criterion, which they called Fisherfaces. Fisherfaces were described as a stable approach that
prevents the within class scatter $S_W$ from becoming singular whilst still maximising the ratio
of the between class scatter $S_B$ to the within class scatter $S_W$. Fisherfaces are defined in
terms of finding a small number of features

$$W_{opt}^T = W_{fld}^T\, W_{pca}^T \qquad (3.17)$$

where $W_{pca}$ is defined by eq. 3.18 and $W_{fld}$ is defined by eq. 3.19.

The equation for $W_{pca}$ is

$$W_{pca} = \arg\max_{W} \left| W^T S_T W \right| \qquad (3.18)$$

where the columns of $W_{pca}$ are the optimal eigenvectors of the total scatter (covariance)
matrix $S_T$.

For $W_{fld}$, this is

$$W_{fld} = \arg\max_{W} \frac{\left| W^T W_{pca}^T S_B W_{pca} W \right|}{\left| W^T W_{pca}^T S_W W_{pca} W \right|} \qquad (3.19)$$

where $W_{pca}^T$ is the transpose of $W_{pca}$.

Similarly, Swets & Weng (1996) defined an approach similar to that of Belhumeur et al. (1997),
in which they proposed using two projections to solve the problem of the within class scatter
$S_W$ being singular. They proposed using the Karhunen-Loève expansion to reduce the face
dimensionality by an optimal projection; the resulting principal components of the face were
then projected as input to Linear Discriminant Analysis. In their experiment, the features
found by the Karhunen-Loève expansion were compared to the features found by Linear
Discriminant Analysis, to evaluate how well each captured the variance of the dataset. Their
experiment used the dataset supplied by the Weizmann Institute (Weizmann Face Database
2000), which contained small variations in scale, pose and position, as well as variable lighting
and facial expressions. From their experiments, they highlighted that Linear Discriminant
Analysis outperformed the Karhunen-Loève expansion, basing this outcome on Linear
Discriminant Analysis capturing 95% of the variance with 15 features, compared to the
Karhunen-Loève expansion capturing 95% of the variance with 35 features. Furthermore, they
found that Linear Discriminant Analysis was not affected by the lighting variations present in
the face images.
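A hypothetical sketch of this two-projection pipeline, with scikit-learn standing in for the
authors' implementation (the variables gallery_faces, gallery_labels, probe_faces, num_faces
and num_classes, as well as the component count, are our assumptions):

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# PCA first reduces dimensionality so the within class scatter is nonsingular,
# then LDA finds the discriminant axes in the reduced space.
pca = PCA(n_components=num_faces - num_classes)
reduced_gallery = pca.fit_transform(gallery_faces)
lda = LinearDiscriminantAnalysis()
gallery_features = lda.fit_transform(reduced_gallery, gallery_labels)
probe_features = lda.transform(pca.transform(probe_faces))
```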

However, whereas Swets & Weng (1996) used two projections to prevent the within class
scatter $S_W$ from becoming singular, Zhao et al. (1999) instead added a small positive number
$\varepsilon$ to the within class scatter $S_W$, ensuring that it would always be positive. They
also emphasised a modified weighted Euclidean distance metric to improve the recognition
accuracy of faces, defined by weighting the regularised variances (eigenvalues). They compared
their modified weighted distance metric to the standard weighted distance metric and the
unweighted distance metric for face recognition. According to their experiments, the modified
weighted Euclidean distance metric performed better than the other two distance metrics
(Zhao et al. 1999).

Whilst Kim et al. (2002) observed that a limitation of Linear Discriminant Analysis is its use of
a single set of features for a gallery set, they proposed a Linear Discriminant Analysis mixture
model, which fundamentally captures several different sets of features for a gallery set. The
Linear Discriminant Analysis mixture model consisted of a Principal Component Analysis
mixture model to first reduce the dimensionality of faces, followed by a Linear Discriminant
Analysis mixture model to find the different sets of features. The Linear Discriminant Analysis
mixture model used the standard Fisher Discriminant Criterion ratio, maximising the between
class scatter $S_B$ whilst minimising the within class scatter $S_W$. They used the PSL
dataset (Wang & Tan 2000), taken from the MPEG7 community, to test their proposal. The
dataset contained 271 different individuals with lighting and pose variations. Comparing the
Principal Component Analysis mixture model against the Linear Discriminant Analysis mixture
model, their experiments showed that the Principal Component Analysis mixture model
performed better for a small number of features (see Martinez & Kak (2001) for a comparison
of Principal Component Analysis and Linear Discriminant Analysis). On the other hand, they
claimed the Linear Discriminant Analysis mixture model performed better than the Principal
Component Analysis mixture model when more than 35 features were used.

In contrast, Lu et al. (2003) proposed a new Linear Discriminant Analysis method called Direct
Fractional-Step Linear Discriminant Analysis, which combined two existing approaches:
Fractional-Step Linear Discriminant Analysis (Lotlikar & Kothari 2000) and Direct Linear
Discriminant Analysis (Yu & Yang 2001). Fractional-Step Linear Discriminant Analysis reduces
the face dimensionality over a small number of iterations, whereas Direct Linear Discriminant
Analysis directly classifies face images at their original high dimensionality. Furthermore,
Direct Linear Discriminant Analysis minimises the loss of discriminatory information by
evaluating whether the null space (features with variances close to zero) provides any
discriminatory information that facilitates the identification of individuals (Chen et al. 2000).
Direct Fractional-Step Linear Discriminant Analysis was described as using the Direct Linear
Discriminant Analysis approach to find a small number of features to form the feature
subspace, taking the null space into account. Thereafter, it used Fractional-Step Linear
Discriminant Analysis to further reduce the feature subspace by removing the smallest
variances (eigenvalues) over a number of iterations. The resulting small number of features was
then used to perform face recognition. In their experiments, they used the ORL dataset (AT&T
Face Database 1994), provided by AT&T Laboratories Cambridge, and the UMIST dataset
(UMIST Face Database 2000). The ORL dataset contained 400 images of 40 individuals, with
10 images per individual, and included variations in lighting and facial expression; the faces
were taken at different times and a few faces had glasses. Of the 400 images in the ORL
dataset, 200 were used for training and 200 for testing. The UMIST dataset had 575 images of
20 individuals, containing profile and frontal views; 160 randomly selected images were used for
training and the remaining 415 images for testing. From their experiments, Direct
Fractional-Step Linear Discriminant Analysis achieved a low error rate of 4% using 22 features
on the ORL dataset. They did not state the lowest error rate on the UMIST dataset, but
showed a figure indicating approximately 2.5% using 12 features.

3.3 Summary

We have discussed in this chapter several Karhunen-Loève expansion and Linear Discriminant
Analysis approaches. These two holistic approaches are fundamentally different: the
Karhunen-Loève expansion finds a small number of features via the principal components of the
face, whereas Linear Discriminant Analysis finds a small number of features by maximising the
grouping of faces from the same individual and minimising the grouping of faces from different
individuals. As holistic approaches represent global information of faces, a disadvantage is that
the variances captured may not correspond to relevant features of the face. One advantage of
feature based approaches is therefore that they attempt to accurately capture relevant features
from face images. In the next chapter, we shall discuss feature based approaches, which use a
priori information to uniquely identify individuals by their facial features.

Figure 3.2: First Eight Principal Components of the Upper Portion of the Face (Sirovich &
Kirby 1987). From top to bottom, left to right: the 1st principal component is at the top left,
the 4th at the bottom left, the 5th at the top right and the 8th at the bottom right.

Chapter 4

Feature Based Face Recognition

Feature based face recognition uses a priori information, or local features of faces, to select a
number of features with which to uniquely identify individuals. Local features include the eyes,
nose, mouth, chin and head outline, which are selected from face images. In this chapter, we
will describe general feature based approaches and Elastic Bunch Graph Matching.

In the General Approaches section, we describe the work of Craw et al. (1992), who applied a
priori information to find face features, and a biologically motivated approach by Manjunath
et al. (1992).

In the Elastic Bunch Graph Matching section, we shall briefly describe a predecessor of Elastic
Bunch Graph Matching, called Dynamic Link Architecture, by Lades et al. (1993). We shall
then describe Elastic Bunch Graph Matching, which was proposed by Wiskott et al. (1999).
Additionally, we will discuss other work on Elastic Bunch Graph Matching (Wiskott et al.
1995, Kalocsai et al. 2000). We now begin with the general approaches to feature based face
recognition.

4.1 General Approaches

The general approaches to feature based face recognition are concerned with using a priori
information of the face to find local face features. Alternatively, another general approach is to
find significant local geometries of the face that correspond to the local features of faces. We
will now discuss the general approaches that have been applied to face recognition.

Craw et al. (1992) were motivated to locate features within faces. Their approach utilised a
priori information to accurately find local features. Their implementation consisted of two
parts: the first part was designed to identify individual features such as the eyes (general
location of the eyes), eye (iris and the surrounding white of the iris), chin, cheek, hair, jaw-line,
mouth, mouth-bits (edges and outline of the lips), head outline and nose; the second part
refined the features found in the first part by using a priori information to locate 40 pre-defined
face features. Specifically, the process of finding these 40 pre-defined features was to initially
find the head outline within face images, which in turn facilitated finding the other face
features. They employed a polygonal shape as the initial head outline. An initial shape score
(Craw et al. 1992) was used to measure how closely the initial head outline matched the face.
The head outline was then iteratively deformed to match the average head outline for all face
images, with allowable deformations including scale, orientation and location. The iterative
deformation of the head outline was guided by a secondary shape score (Craw et al. 1992) that
indicated how closely the orientation and scale of the head outline matched the face image.
Once the head outline was found, they could concentrate on finding the other face features,
based on the a priori information of the face, within the boundaries of the head outline. From
their experiments, they reported a success rate of approximately 95% on 64 images taken from
a moving sequence of a single person. They also experimented with 50 still frontal views of
different individual faces, for which their approach successfully found 43 complete head
outlines. However, they stated that the remaining head outlines found were only partial,
caused by variations around the chin and mouth of individual faces. Furthermore, they
indicated an error rate of 6% for inaccurately identifying the areas of the chin and mouth of
faces. Therefore, they concluded the error rates could be directly attributed to faces having
beards and moustaches.

In contrast, Manjunath et al. (1992) proposed a method that did not utilise a priori
information to find face features, but instead found significant curvature changes within faces
that corresponded to face features. Their approach recognised faces in three stages. The first
stage extracted features from faces utilising Gabor wavelets (Manjunath et al. 1992, equation
1). In the second stage, they created a graph-like data structure that represented the face
features found as a collection of interconnected nodes. Two types of graphs were created: an
input face graph represented a probe face and a model face graph represented a gallery face. In
the third stage, their approach matched the input face graphs to the model face graphs in order
to determine the identity of the probe set in relation to the gallery set. Specifically, the
matching process determined how similar an input face graph was to a gallery face by its
distance; a probe face with the smallest distance to a gallery face inferred a match. In their
experiments, they used a dataset of 306 faces, containing approximately 2 to 4 images of each
of 86 individuals, with variations in facial expression and orientation and minor variations in
position and scale. They found an 86% recognition rate when the probe face was within the
first rank, and a 96% recognition rate when the probe face was within the top three ranked
matches. In the next section, we shall describe a similar biologically motivated approach.

4.2 Elastic Bunch Graph Matching

Elastic Bunch Graph Matching recognises faces by matching the probe set, represented as input
face graphs, to the gallery set, represented as the model face graph. Fundamental to Elastic
Bunch Graph Matching is the concept of nodes. Essentially, each node of the input face graph
represents a specific feature point of the face: one node represents an eye, another represents
the nose, and so on for the other face features. The nodes of the input face graph are
interconnected to form a graph-like data structure which is fitted to the shape of the face, as
illustrated in figure 4.1.

Figure 4.1: Face Graph (Wiskott et al. 1999)

In contrast to the input face graphs, only one model face graph is used to represent the entire
gallery set. The model face graph can be conceptually thought of as a number of face graphs,
built from the gallery set rather than the probe set, stacked on top of each other and
concatenated into a single graph. This allows the same types of face features from different
individuals to be grouped together. For example, the eyes of different individuals can be grouped
together to form the eye feature point of the model face graph, and the noses of different
individuals can be grouped together to form the nose feature point. An illustration of the model
face graph is shown in figure 4.2.

Given these definitions, the identity of an input face graph is determined by the gallery face
that achieves the smallest distance to it through the model face graph. The distance is
determined by the node similarity measure between the input face graph and the model face graph
(for example, see Wiskott et al. (1999)).
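Concretely, a graph-level similarity in the spirit of Wiskott et al. (1999) can be assembled as
the average of the node similarities; the uniform weighting below is a simplification, as
implementations may also include a term penalising edge-length distortion:

\[
S\left(G^{I}, G^{M}\right) = \frac{1}{N}\sum_{n=1}^{N} S\left(J^{I}_{n}, J^{M}_{n}\right)
\]

where \(G^{I}\) is the input face graph, \(G^{M}\) the model face graph fitted to a particular
gallery face, \(J_{n}\) the jet at node \(n\) and \(N\) the number of nodes. The gallery face
maximising this similarity (equivalently, minimising the distance) is taken as the match.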

Figure 4.2: Model Face Graph (Wiskott et al. 1999). A single model face graph is used to
represent the entire gallery set. Generally, a feature point of the model face graph is formed by
grouping together the same type of feature, such as the eyes or the noses, taken from the
different individuals of the gallery set. For example, the eye feature point of the model face
graph is represented by a group of eyes taken from different individuals.

A predecessor to Elastic Bunch Graph Matching was Lades et al.'s (1993) proposal of an extension
to Artificial Neural Networks, called Dynamic Link Architecture. Dynamic Link Architecture
grouped sets of neurons into more symbolic representations, the purpose being to facilitate
object recognition that is invariant to position and other distortions. The focus of Dynamic Link
Architecture was to demonstrate the method's performance by recognising faces; they added that
Dynamic Link Architecture was not specifically tuned for face recognition, but was intended for
generic object recognition.
Lades et al. (1993) initially proposed a complete Artificial Neural Network implementation.
However, for ease of implementation, they chose to implement the Elastic Graph Matching method
instead. In their implementation each node represented a single feature of the face by what they
called a jet. The jets were found by using the Gabor wavelet transform (Wiskott et al. 1999). The
nodes were connected by edges that defined the relative distance between two nodes, and were
interconnected to form a graph that represented the face features. Recognition of faces was
achieved by building an input face graph and then matching it against the model face graph to
find the best match.
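To make the notion of a jet concrete, the following is a minimal sketch of extracting a jet at a
single feature point; the kernel size, wavelengths and orientation count are illustrative
assumptions, not the parameters used by Lades et al. (1993):

import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel: a plane wave restricted by a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    rot = x * np.cos(theta) + y * np.sin(theta)          # coordinate along the wave
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))   # Gaussian envelope
    return envelope * np.exp(2j * np.pi * rot / wavelength)

def jet_at(image, px, py, wavelengths=(4, 8, 16), n_orientations=8):
    """A jet: the complex Gabor responses at one feature point (px, py).
    For clarity each kernel is convolved over the whole image; a real
    implementation would convolve once and sample many points."""
    jet = []
    for wl in wavelengths:
        for k in range(n_orientations):
            kern = gabor_kernel(31, wl, k * np.pi / n_orientations, wl / 2)
            resp = convolve2d(image, kern, mode='same', boundary='symm')
            jet.append(resp[py, px])
    return np.array(jet)     # complex amplitudes a_j * exp(i * phi_j)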

Wiskott et al. (1999) extended the Dynamic Link Architecture implementation to have multiple jets
for each node. The multiple jets represented the same types of features taken from different
individuals, as we described at the start of this section; they therefore called their extension
Elastic Bunch Graph Matching. In addition to proposing this extension, they employed a
coarse-to-fine strategy to find the best match between the model face graph and an input face
graph. At the coarse level, they used a similarity measure that did not consider phase (Wiskott
et al. (1999), eq. 5; a sketch of this measure follows this paragraph), where phase refers to the
phase of the complex Gabor coefficients that make up a jet. At the fine level, phase was taken
into account (Wiskott et al. (1999), eq. 7), because phase varies rapidly with translation of the
face image pixels; compensating for phase variation therefore improves the localisation of the
jets, ultimately leading to greater accuracy in identifying individuals. In their experiments
they used the FERET (Phillips et al. 2000) and the Bochum (Bochum Face Database 1995) datasets.
They used 250 frontal views of individuals from the FERET dataset as the model face graph. On the
FERET dataset they found a 98% recognition accuracy, where 245 out of the 250 input face graphs
containing variations in facial expression were correctly identified. For the Bochum dataset they
used 108 neutral frontal views for the model face graph, and found a 94% recognition accuracy,
where 102 of the 108 input face graphs with variations in facial expression were correctly
recognised. Also see Okada et al. (1998) for details on their preparation for, and the outcome
of, the FERET Phase III test.
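The phase-insensitive similarity of the coarse stage is the normalised dot product of the jet
magnitudes (cf. Wiskott et al. 1999, eq. 5); a direct transcription, with our own function name:

import numpy as np

def similarity_magnitude(jet1, jet2):
    """Phase-insensitive jet similarity: normalised dot product of the
    Gabor magnitudes a_j = |c_j|. Returns a value in [0, 1]."""
    a1, a2 = np.abs(jet1), np.abs(jet2)
    return float((a1 @ a2) / np.sqrt((a1 @ a1) * (a2 @ a2)))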

Wiskott et al. (1995) proposed a specific implementation of Elastic Bunch Graph Matching that
identified the gender of individuals. They proposed using high-level topological information to
describe the nodes of the model face graph, whereby individual nodes of the model face graph were
labelled as male, female, bearded or wearing glasses. The gender of an input face graph was
determined by the majority label of its matching nodes from the model face graph; for example, if
the majority of the nodes were labelled male, the input face graph was classified as male. In
their experiment they used 112 faces with neutral frontal views, of which 65% were male, 28% wore
glasses and 19% had beards. They achieved 90.2% accuracy for gender identification, 90.2% correct
detection of faces wearing glasses and 92.9% correct detection of faces having beards.

An improvement to the Elastic Bunch Graph Matching method was proposed by Kalocsai et al. (2000).
In their investigation, they explored the effect of weighting Gabor kernels to improve face
recognition, where 40 Gabor kernels were computed at each of 48 feature points of the face. Using
a dataset of Caucasian faces, they found that the most discriminatory face features were situated
around the forehead and eyes, while the least discriminatory were the mouth, nose, cheeks and the
lower outline of the face. They also used a dataset of Asian faces and found similar results, the
most discriminatory features again being the forehead and eyes, although the nose and cheeks were
also found to be discriminatory. They concluded that using the highest weighted kernels, as
compared to the lowest weighted kernels, provides a more compact representation of faces and
achieves higher recognition rates.
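To make the weighting concrete, one plausible form, offered only as a sketch since Kalocsai et
al. (2000) define their own statistic, attaches a weight \(w_j\) to the contribution of each
kernel \(j\) in the jet similarity:

\[
S_w(J, J') = \frac{\sum_j w_j\, a_j a'_j}{\sqrt{\sum_j w_j a_j^2 \,\sum_j w_j a_j'^2}}
\]

where \(a_j\) and \(a'_j\) are the Gabor magnitudes of the two jets and the weights reflect how
well kernel \(j\) discriminates between individuals.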

4.3 Summary

In this chapter we have described feature-based approaches, covering general approaches and
Elastic Bunch Graph Matching. We have described how feature-based approaches apply a priori
information to find the local face features that are used to uniquely identify individuals. We
have seen in our description of general approaches that a priori information constrains the
search space for finding and locating face features, as well as being used to conceptually relate
the features found to the high-level semantics of the face. In contrast, Elastic Bunch Graph
Matching uses a biologically motivated approach to finding and locating the face features, and
represents the features found as a graph-like structure that semantically resembles the shape of
the human face. In the next chapter we shall evaluate the performance of a holistic approach and
a feature-based approach in our experiments.

Chapter 5

Experiments

The aim of the experiments is to compare how accurately the Eigenface (Turk & Pentland 1991) and
Elastic Bunch Graph Matching (Wiskott et al. 1999) implementations can identify individuals under
varying face conditions.

The basis for our comparison of a holistic and a feature-based approach is the Eigenface
implementation, developed by the Massachusetts Institute of Technology Media Laboratory Vision
and Modelling Group, and the Elastic Bunch Graph Matching implementation, developed by the
Colorado State University Computer Science Department. We have modified both programs for the
purpose of enabling an objective comparison.

In the dataset section, we describe the AR Face Database (Martinez & Benavente 1998) that was
used in our experiments. In the results section, we test accuracy on the dataset, highlighting
the results obtained by the Eigenface and Elastic Bunch Graph Matching implementations. We
conclude with a discussion of the performance of the Eigenface and the Elastic Bunch Graph
Matching. We now begin by describing the dataset.

 1. neutral expression                       8. wearing sun glasses
 2. smile                                    9. wearing sun glasses & lighting left side of face
 3. anger                                   10. wearing sun glasses & lighting right side of face
 4. scream                                  11. wearing a scarf (covering mouth & neck)
 5. lighting left side of face              12. wearing a scarf & lighting left side of face
 6. lighting right side of face             13. wearing a scarf & lighting right side of face
 7. lighting all sides of face

Table 5.1: 13 Face Variations: The face images were taken in two sessions, 14 days apart, and
captured 13 defined face variations.

5.1 Dataset

We chose the AR Face Database (Martinez & Benavente 1998) because we wanted to use a different
dataset to the FERET database (Phillips et al. 2000), so that we could independently evaluate the
Eigenface and Elastic Bunch Graph Matching implementations. However, since we wanted to compare
the two approaches, we utilised the cumulative recognition rate that was used in the FERET
program (Phillips et al. 2000), which allows us to objectively compare their performance.

The gallery set contained 126 individuals: 70 men and 56 women. All 13 face variations, as
outlined in table 5.1, were based on a frontal view of the face. The original images had
dimensions of 768 by 576 pixels, but we rescaled them to 128 by 96 pixels for our experiments.
The AR Face Database consisted of two sessions, taken 14 days apart.

For the experiments, we used two gallery sets to represent the 126 individuals. The first gallery
set consisted of 133 images of the 126 individuals, with only neutral expressions, taken from the
first session. The second gallery set had 119 images of the 126 individuals, again with only
neutral expressions, taken from the second session. For the probe sets, a combined total of 3,030
faces corresponding to the same 126 individuals was used; these images covered the other 12 face
variations and did not include the neutral expression. The probe images were divided into two
sets: the 1st set had 1,596 face images taken from the first session of the AR Face Database, and
the 2nd set had 1,434 face images taken from the second session. We emphasise that the probe sets
were disjoint from the gallery sets, as they did not contain neutral expressions.

5.2 Results

Our criterion for evaluating the accuracy of the Elastic Bunch Graph Matching and Eigenface
approaches was the cumulative recognition rate of Phillips et al. (2000), which we defined
earlier in chapter 2.
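For clarity, the following is a minimal sketch of how such a cumulative recognition curve can be
computed from a matrix of probe-to-gallery distances; the variable names are ours, and we assume
closed-set identification, i.e. every probe identity appears in the gallery:

import numpy as np

def cumulative_recognition(distances, probe_ids, gallery_ids):
    """distances[i, j]: distance of probe i to gallery image j.
    Returns rate[r] = fraction of probes whose correct identity appears
    within the top (r + 1) ranked gallery matches."""
    n_probes, n_gallery = distances.shape
    gallery_ids = np.asarray(gallery_ids)
    hits = np.zeros(n_gallery)
    for i in range(n_probes):
        order = np.argsort(distances[i])            # best match first
        ranked = gallery_ids[order]
        rank = int(np.argmax(ranked == probe_ids[i]))  # first correct match
        hits[rank:] += 1                            # counts for this rank onwards
    return hits / n_probes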

We systematically tested the Eigenface and Elastic Bunch Graph Matching approaches by testing the
1st probe set against the 1st gallery set and the 2nd probe set against the 2nd gallery set.

In our first experiment we performed a series of tests on the Eigenface approach to measure its
identification performance. Using the gallery sets, we trained the system incrementally, varying
the number of Eigenfaces: between 1 and 133 Eigenfaces for the first gallery set, and between 1
and 119 for the second. We found the highest cumulative recognition rates were achieved when we
trained 133 and 119 Eigenfaces for the first and second gallery sets respectively. As seen in
figure 5.1, training 133 Eigenfaces for the 1st Set yielded a cumulative recognition rate of
71.2%, and training 119 Eigenfaces for the 2nd Set yielded 71.9%.
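As an illustration of the identification step that we varied, the following sketch assumes the
mean face and the eigenfaces have already been computed from the gallery; it reflects the general
procedure of Turk & Pentland (1991) rather than the MIT implementation itself:

import numpy as np

def identify(probe, gallery, mean_face, eigenfaces, k):
    """Project onto the top-k eigenfaces and return the index of the
    nearest gallery image in that face space.
    probe: (d,) flattened image; gallery: (n, d); eigenfaces: (d, m), columns."""
    U = eigenfaces[:, :k]                    # keep only the top-k eigenfaces
    w_probe = U.T @ (probe - mean_face)      # probe weights in face space
    w_gallery = (gallery - mean_face) @ U    # gallery weights, one row per image
    d = np.linalg.norm(w_gallery - w_probe, axis=1)
    return int(np.argmin(d))                 # nearest-neighbour match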

Figure 5.1: Eigenface Results. For the 1st Set, 133 Eigenfaces were used to train the system; for
the 2nd Set, 119 Eigenfaces were trained. The rank on the horizontal axis is the cumulative
individual rank of probe images against the gallery, and the cumulative recognition rate is the
identification accuracy of the Eigenface approach.

Our next experiment tested the Elastic Bunch Graph Matching. As stated in Wiskott et al. (1999),
they selected the face feature locations by hand. We also manually selected the feature locations
of the face, but only for a small number of images. Thereafter, we used the Elastic Bunch Graph
Matching itself to automate the selection of the remaining features for the two gallery sets,
employing a narrowing local search to automate the feature selection. Interestingly, the features
found by this automated selection were just as accurate as our manually selected features, so we
used the automatically selected features to train the system.

Another important aspect of Elastic Bunch Graph Matching was deciding on the type of similarity
measure for matching the probe and gallery sets. We decided to use a predictive iterative search.
Starting from an estimate of a given feature location, the predictive iterative search
iteratively refined the location to increase the similarity measure for that feature, stopping
once the similarity could no longer be increased (Bolme 2003). As shown in figure 5.2, we
achieved a 95.0% cumulative recognition rate for the 1st set and, in comparison, a 97.3%
cumulative recognition rate for the 2nd set.
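A hedged sketch of such an iterative refinement, as we understand it from Bolme (2003); the
single-pixel step size and the jet and similarity functions (here passed in as arguments, e.g.
the earlier jet_at and similarity_magnitude sketches) are assumptions rather than details of the
Colorado State implementation:

def refine_location(image, x, y, model_jet, jet_at, similarity, max_steps=50):
    """Hill-climb a feature point towards the neighbouring pixel that
    maximises jet similarity; stop when no neighbour improves it."""
    best = similarity(jet_at(image, x, y), model_jet)
    for _ in range(max_steps):
        moved = False
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            s = similarity(jet_at(image, x + dx, y + dy), model_jet)
            if s > best:                 # keep the best improving move
                best, x, y, moved = s, x + dx, y + dy, True
        if not moved:                    # local maximum reached
            break
    return x, y, best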

Figure 5.2: Elastic Bunch Graph Matching Results. We achieved these results by using a narrowing
local search to locate face features at specific locations of the face, and an iterative
predictive search to measure the match between the probe and gallery sets by their similarity.
The iterative predictive search found the maximum similarity measure for each feature by
continuously searching around a specific location of the face, stopping once the similarity could
not be further increased. The rank is the cumulative individual rank of probe images against the
gallery, and the cumulative recognition rate is the identification accuracy of the Elastic Bunch
Graph Matching approach.

In figures 5.3 and 5.4, we compare the performance of the Eigenface and Elastic Bunch Graph
Matching on the 1st and 2nd sets of the AR Face Database (Martinez & Benavente 1998). For both
sets, the two approaches achieve relatively similar cumulative recognition rates at the first
rank, with the Eigenface achieving 63.8% for the 1st Set and 65.8% for the 2nd Set, compared to
62.9% and 64.9% respectively for the Elastic Bunch Graph Matching; beyond the first rank,
however, the Elastic Bunch Graph Matching clearly outperforms the Eigenface approach.

Figure 5.3: 1st set, Eigenface & Elastic Bunch Graph Matching. We compare the cumulative
recognition rate on the 1st set for the Eigenface and Elastic Bunch Graph Matching.

5.3 Discussion

Overall, the Eigenface and Elastic Bunch Graph Matching performed better on the 2nd set than on
the 1st set in figures 5.1 and 5.2. Our primary interpretation is that the probe set of the 2nd
set contains fewer faces than that of the 1st set. This implies either that the smaller number of
tests happened to yield a higher recognition rate (although it could equally have led to higher
error rates), or that the probe images of the 2nd set are more similar to its gallery set than is
the case for the 1st set.

Figure 5.4: 2nd set, Eigenface & Elastic Bunch Graph Matching. We compare the cumulative
recognition rate on the 2nd set for the Eigenface and Elastic Bunch Graph Matching.

Comparing the performance of the Eigenface and Elastic Bunch Graph Matching approaches, the
cumulative recognition rate indicates that in this instance the Elastic Bunch Graph Matching had
the higher identification rate. We attribute its performance to two factors: the Gabor wavelets
used to capture the features accurately found features around particular locations of the face;
and, when face features were covered by objects such as sun glasses and scarves, the system
relied on the remaining features to perform recognition. It also appeared that even though the
features were automatically selected, rather than manually selected, the system still achieved a
high recognition rate.

On the other hand, the Eigenface approach did not perform as well. A few reasons can be
attributed to this; critical to its performance was the variance available to uniquely identify
gallery faces. Some probe faces did not fall within the defined face space of the gallery, as
they contained large variations in illumination and changes in the apparent shape of the face,
attributable to faces wearing a scarf or sun glasses. Another factor was that the probe sets
deviated substantially from the average face of the gallery set. Hence, we gather that because
the variance of the probe sets is vastly different to the variance of the gallery set, the
performance of the Eigenface approach was directly affected.

Chapter 6

Conclusions

Face recognition is a difficult problem because faces can vary substantially in their
orientation, lighting, scale and facial expression. Our first goal was to provide a survey of
recent holistic and feature based approaches that complements previous surveys. We described
holistic approaches including the Karhunen-Loève expansion and Linear Discriminant Analysis.
Holistic approaches have the advantages of distinctly capturing the most prominent features
within the face images used to uniquely identify individuals amongst a gallery set, as well as
finding features automatically. However, their recognition performance can be significantly
affected when a probe set deviates from the average face of a gallery set because of lighting,
orientation and scale, or when the features found do not form part of the face but are some other
captured feature, for example features from the background of a face image. This led to our
discussion of feature based approaches, where we described general approaches and Elastic Bunch
Graph Matching. The advantages of feature based approaches include the accurate selection of
facial features to uniquely identify individuals, and robust recognition performance despite
variations in facial expression, occlusion of the face by other objects, orientation or lighting.
On the other hand, the disadvantages of these approaches are that manually selected features may
be inaccurately located by the human user, while automatically selected features are reliant on
the accuracy of the feature based approach, which can likewise lead to inaccurately located face
features.

Our second goal was to compare the performance of a holistic and a feature based approach on the
AR Face Database (Martinez & Benavente 1998). In our comparison, we found the Elastic Bunch Graph
Matching outperformed the Eigenface approach when we tested the two approaches on this database.
The tests were divided into two sets: for the 1st set, the cumulative recognition rate for
Elastic Bunch Graph Matching was 95.2% while the Eigenface approach achieved 71.2%; for the 2nd
set, the cumulative recognition rate for Elastic Bunch Graph Matching was 97.5%, whereas the
Eigenface approach achieved 71.9%. Closer analysis of the results showed that at the first rank
the Eigenface and Elastic Bunch Graph Matching approaches achieved relatively equal cumulative
recognition rates, but over the remaining ranks the Eigenface approach increased its cumulative
recognition rate only slightly, because the probe faces were too different from the average face
of the gallery set. In comparison, the Elastic Bunch Graph Matching incrementally increased its
cumulative recognition rate as the rank increased.

In future work, we would like to extend our experiments to evaluate a wider breadth of holistic
and feature based approaches. The Eigenface approach in our experiment was limited in that it did
not scale to larger datasets, because the gallery and probe sets needed to be stored in memory in
order to perform face recognition. We would like to extend the Eigenface approach with a more
scalable implementation that can handle much larger datasets. Future work in the face recognition
area includes many research directions, such as the use of colour two-dimensional images,
three-dimensional models (as opposed to two-dimensional images) to handle varying orientations,
adequate size and scale of faces, and the handling of lighting variations.

Interestingly, just as the research community continues to improve existing face recognition
approaches, face recognition has also become commercially viable, with face recognition vendors
reporting that they can achieve robust and high recognition rates. These claims were put to the
test in a recent independent face recognition vendor evaluation by Phillips et al. (2003a), who
evaluated 10 leading commercial face recognition vendors. They found that Cognitec Systems
GmbH's FaceVACS, Eyematic's Visual Sensing Software and Identix's FaceIt were amongst the top
three for recognition accuracy. We therefore provide a brief account of these three leading
commercial vendors' face recognition systems. Cognitec Systems GmbH's FaceVACS can be classified
as a feature based approach. This system recognises faces by the following process:

1. locate the position and size of faces, and determine the positions of the eyes

2. check whether the image quality is adequate for face recognition

3. normalise faces by scaling them to a fixed size and positioning the eyes at a fixed location

4. pre-process faces using standard image processing techniques

5. extract face features

6. if building a gallery set, store the features of each face as part of the gallery set

7. compare features from the gallery set to the probe set to determine a match

8. determine a match by a threshold: if the match score is higher than the threshold, this
constitutes a match (a sketch of this decision follows the list)
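The following is a minimal sketch of the threshold decision in step 8, assuming a scalar match
score has already been computed; the threshold value and function names are illustrative, not
FaceVACS internals:

def verify(score, threshold=0.8):
    """Accept the claimed identity only if the match score clears the threshold."""
    return score > threshold

def best_match(scores, threshold=0.8):
    """scores: mapping of gallery identity -> match score.
    Return the best-scoring identity, or None if nothing clears the
    threshold (i.e. the probe is treated as unknown)."""
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None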

Recently, Cognitec Systems GmbH's FaceVACS was installed at Sydney's international airport as
part of the SmartGate system, where the software is used to verify the identity of air crew
before they board their aeroplanes. Similarly, Eyematic's Visual Sensing Software is also
considered a feature based approach. The system consists of four components: locating faces
within images, finding local feature points, extracting those local feature points, and matching
the features found to a gallery set of faces. Matching of the local feature points is based on a
threshold of acceptance, where a similarity above the threshold indicates a match. Identix's
FaceIt also uses local features of the face to perform face recognition and is likewise
considered a feature based approach. In the Phillips et al. (2003a) vendor evaluation, the
identification recognition rate achieved by Cognitec Systems GmbH's FaceVACS was 87% (Phillips et
al. 2003b, fig. 7); second was Identix's FaceIt with 84%, and third was Eyematic's Visual Sensing
Software with 80% (Phillips et al. 2003b, fig. 7).

Commercial face recognition systems have provided performances comparable to those being
developed by the research community. We have also seen that biologically motivated face
recognition approaches have, to the present, achieved robust and accurate results. We envisage
that as researchers work more closely with aspects of human cognition and perception, their
approaches will yield higher recognition rates and be more robust on larger datasets. In
perspective, as we strive to develop autonomous face recognition systems, we should also
conceptually model the characteristics of human cognition and perception.

Bibliography

Adini, Y., Moses, Y. & Ullman, S. (1997), ‘Face Recognition: The problem of Compensating for Changes in
Illumination Direction’, IEEE Transaction on Pattern Analysis and Machine Intelligence 19(7), 721–732.

AT&T Face Database (1994), http://www.uk.research.att.com/facedatabase.html. AT&T Laboratories Cambridge.

Baron, R. (1981), ‘Mechanisms of Human Facial Recognition’, International Journal of Man-Machine Studies
15, 137–138.

Belhumeur, P., Hespanha, J. & Kriegman, D. (1997), ‘Eigenfaces vs. Fisherfaces: Recognition Using Class Specific
Linear Projection’, IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7), 711–720.

Benson, P. & Perret, D. (1991), Perception and Recognition of Photographic Quality Facial Caricatures: Implica-
tions for the Recognition of Natural Images, in V. Bruce, ed., ‘Face Recognition’, Vol. 3(1), Lawrence Erlbaum
Associates, East Sussex, England, United Kingdom, pp. 105–135.

Benton, A. (1980), ‘The Neuropsychology of Facial Recognition’, American Psychologist 35, 176–186.

Beymer, D. & Poggio, T. (1995), Face Recognition From One Example View, A.i. memo no. 1536, Artificial
Intelligence Laboratory, Massachusetts Institute of Technology, Massachusetts, United States.

Bochum Face Database (1995), http://www.neuroinformatik.ruhr-uni-bochum.de/top.html. Institut für
Neuroinformatik, Ruhr-University.

Bolme, D. (2003), Elastic Bunch Graph Matching, Master thesis, Colorado State University, Fort Collins, Colorado,
United States.

Bruce, V. & Young, A. (1986), ‘Understanding Face Recognition’, British Journal of Psychology 77, 305–327.

Brunelli, R. & Poggio, T. (1993), ‘Face Recognition: Features versus Templates’, Pattern Analysis and Machine
Intelligence 15(10), 1042–1052.

Burges, C. (1998), ‘A Tutorial on Support Vector Machines for Pattern Recognition’, Data Mining and Knowledge
Discovery 77, 1–43.

Chellappa, R., Wilson, C. & Sirohey, S. (1995), ‘Human and Machine Recognition of Faces: A Survey’, Proceedings
of the IEEE 83(5), 705–741.

Chen, L., Liao, H., Ko, M., Lin, J. & Yu, G. (2000), ‘A new lda-based face recognition system which can solve the
small sample size problem’, Pattern Recognition 33(10), 1713–1726.

Clowes, M. (1971), ‘On Seeing Things’, Artificial Intelligence 2, 79–112.

Colmenarez, A. & Huang, T. (1998), Face Detection and Recognition, in H. Wechsler, P. J. Phillips, V. Bruce, F. F.
Soulié & T. S. Huang, eds, ‘Face Recognition: From Theory to Applications’, NATO ASI Series F, Springer-
Verlag.

Craw, I., Tock, D. & Bennett, A. (1992), Finding Face Features, in ‘European Conference on Computer Vision’,
pp. 92–96.

Ellis, H. (1986), Processes Underlying Face Recognition, in R. Bruyer, ed., ‘The Neuropsychology of Face Percep-
tion and Facial Expression’, Lawrence Erlbaum Associates, New Jersey, United States.

Fisher, R. (1936), ‘The use of multiple measures in taxonomic problems’, Ann. Eugenics 7, 179–188.

Fukunaga, K. (1990), Introduction to Statistical Pattern Recognition, Computer Science and Science Computing,
2nd edn, Academic Press, New York.

Georghiades, A., Kriegman, D. & Belhumeur, P. (1998), Illumination Cones for Recognition Under Variable Light-
ing Face, in ‘Conference Proceedings on Computer Vision and Pattern Recognition’, California, United States,
pp. 52–58.

Goldstein, A., Harmon, L. & Lesk, A. (1971), ‘Identification of Human Faces’, Proceedings of the IEEE 59, 748–
760.

Guo, G., Li, S. & Chan, K. (2000), Face recognition by Support Vector Machines, in ‘Proceedings of Fourth IEEE
Conference Automatic Face and Gesture Recognition’, Grenoble, France, pp. 196–201.

Harmon, L., Khan, M., Lasch, R. & Ramig, P. (1981), ‘Machine Identification of Human Faces’, Pattern Recognition
13, 97–110.

Hay, D. & Young, A. (1982), The Human Face, in A. Ellis, ed., ‘Normality and Pathology in Cognitive Functions’,
Academic Press, New York, United States, pp. 173–202.

Hécaen, H. & Angelergues, R. (1962), ‘Agnosia for Faces (Prosopagnosia)’, Archives of Neurology 7, 92–100.

Johnsson, K., Kittler, J., Li, Y. & Matas, J. (1999), Support Vector Machines for Face Authentication, in ‘British
Machine Vision Conference’, Nottingham, United Kingdom, pp. 543–553.

Kalocsai, P. & Biederman, I. (1998), Differences of Face and Object Recognition in Utilizing Early Visual Informa-
tion, in H. Wechsler, P. J. Phillips, V. Bruce, F. F. Soulié & T. S. Huang, eds, ‘Face Recognition: From Theory to
Applications’, NATO ASI Series F, Springer-Verlag.

Kalocsai, P., von der Malsburg, C. & Horn, J. (2000), ‘Face Recognition by statistical analysis of feature detectors’,
Image and Vision Computing 18, 273–278.

Kanade, T. (1973), Picture Processing System by Computer Complex and Recognition of Human Faces, Ph.D thesis,
Department of Information Science, Kyoto University, Japan.

Kaya, Y. & Kobayashi, K. (1972), A Basic Study of Human Face Recognition, in S. Watanabe, ed., ‘Frontiers of
Pattern Recognition’, Academic Press, New York, United States, pp. 265–289.

Kim, H., Kim, D. & Bang, S. (2002), Face Recognition using LDA Mixture Model, in ‘16th International Conference
on Pattern Recognition’, Vol. 2, pp. 486– 489.

Kirby, M. & Sirovich, L. (1990), ‘Application of the Karhunen-Loève Procedure for the Characterization of Human
Faces’, IEEE Transaction. Pattern Analysis and Machine Intelligence 12(1), 103–108.

Kohonen, T. (1984), Self-Organisation and Associative Memory, Springer-Verlag, Berlin, Germany.

Lades, M., Vorbrüggen, J., Buhmann, J., Lange, J., von der Malsburg, C., Würtz, R. & Konen, W. (1993), ‘Distortion
invariant object recognition in the dynamic link architecture’, IEEE Transactions on Computers 42(3), 300–311.

Lando, M. & Edelman, S. (1995), Generalisation From a Single View in Face Recognition, in ‘International Work-
shop on Automatic Face- and Gesture-Recognition’, Zürich, Switzerland, pp. 80–85.

Lotlikar, R. & Kothari, R. (2000), ‘Fractional-step dimensionality reduction’, IEEE Transactions on Pattern Analysis
and Machine Intelligence 22(6), 623–627.

Lu, J., Plataniotis, K. & Venetsanopoulos, A. (2003), Face Recognition using LDA-Based Algorithms, in ‘IEEE
Transactions on Neural Networks’, Vol. 14(1), pp. 195– 200.

Manjunath, B., Chellappa, R. & von der Malsburg, C. (1992), ‘A Feature Based Approach to Face Recognition’,
IEEE Conference Proceedings on Computer Vision and Pattern Recognition pp. 373–378.

Marr, D. (1980), A Computational Investigation into the Human Representation and Processing of Visual Informa-
tion, in J. Wilson, ed., ‘Vision’, W.H Freeman and Company, San Francisco, United States.

Martinez, A. & Benavente, R. (1998), The AR Face Database, Technical Report 24, Computer Vision Center,
Universitat Autónoma de Barcelona (UAB), Barcelona, Spain.

Martinez, A. & Kak, A. (2001), ‘PCA versus LDA’, IEEE Transactions on Pattern Analysis and Machine Intelli-
gence 23(2), 228–233.

Minsky, M. (1975), A Framework for Representing Knowledge, in P. Winston, ed., ‘The Psychology of Computer
Vision’, McGraw-Hill, New York, United States.

Moghaddam, B., Wahid, W. & Pentland, A. (1998), Beyond Eigenfaces: Probabilistic Matching for Face Recog-
nition, in ‘Proceedings of Third IEEE International Conference on Automatic Face and Gesture Recognition’,
Nara, Japan, pp. 30–35.

Moses, Y., Edelman, S. & Ullman, S. (1996), ‘Generalisation to Novel Images in Upright and Inverted Faces’,
Perception 25, 443–461.

Moses, Y. & Ullman, S. (1998), ‘Generalization to Novel Views: Universal, Class-based, and Model-based Process-
ing’, International Journal on Computer Vision 29, 233–253.

Okada, K., Steffens, J., Maurer, T., Hong, H., Elagin, E., Neven, H. & von der Malsburg, C. (1998), The Bochum/USC
Face Recognition System And How it Fared in the FERET Phase III test, in H. Wechsler, P. J. Phillips, V. Bruce,
F. F. Soulié & T. S. Huang, eds, ‘Face Recognition: From Theory to Applications’, Springer-Verlag, pp. 186–205.

O’Toole, A., Abdi, H., Deffenbacher, K. & Valentin, D. (1993), ‘Low-dimensional representation of faces in higher
dimensions of the face space’, Journal of Optical Society America A 10(3), 405–411.

Pentland, A., Moghaddam, B. & Starner, T. (1994), View-Based and Modular Eigenspaces for Face Recognition, in
‘Proceedings of IEEE Conference on Computer Vision and Pattern Recognition CVPR ‘94’, Seattle, Washington,
pp. 84–91.

Phillips, P., Grother, P., Micheals, R., Blackburn, D., Tabassi, E. & Bone, M. (2003a), Face Recognition Vendor Test
2002 - Evaluation Report, Technical Report NISTIR 6965, DoD Counterdrug Technology Development Program
Office, Virginia, United States.

Phillips, P., Grother, P., Micheals, R., Blackburn, D., Tabassi, E. & Bone, M. (2003b), Face Recognition Vendor Test
2002 - Overview and Summary, Technical Report NISTIR 6965, DoD Counterdrug Technology Development
Program Office, Virginia, United States.

Phillips, P., Moon, H., Rizvi, S. & Rauss, P. (2000), ‘The FERET Evaluation Methodology for Face-Recognition
Algorithms’, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(10), 1090–1104.

Rajagopalan, A., Burlina, P. & Chellappa, R. (1999), Higher Order Statistical Learning for Vehicle Detection in
Images, in ‘Proceedings. Seventh International Conference on Computer Vision’, Vol. 2, pp. 1204–1209.

Rhodes, G., Brennan, S. & Carey, S. (1987), ‘Identification and Ratings of Caricatures: Implications for Mental
Representations of Faces’, Cognitive Psychology 190, 473–497.

Samal, A. & Iyengar, P. (1992), ‘Automatic Recognition and analysis of human faces and facial expressions: A
survey’, Pattern Recognition 25(1), 65–77.

Samaria, F. (1994), Face Recognition using Hidden Markov Models, Ph.D thesis, Trinity College, University of
Cambridge, England.

Samaria, F. & Fallside, F. (1993), ‘Face Identification and Feature Extraction using Hidden Markov Models’, Image
Processing: Theory and Applications .

Samaria, F. & Young, S. (1994), ‘HMM-Based Architecture for Face Identification’, Image and Vision Computing
12(8), 537–543.

Sergent, J. (1989), Structural Processing of Faces, in A. Young & H. Ellis, eds, ‘Handbook of Research on Face
Processing’, Vol. 3(1), Elsevier Science Publishers B.V., Amsterdam, Netherlands, pp. 57–91.

Shepherd, J., Gibling, F. & Ellis, H. (1991), The Effects of Distinctiveness, Presentation Time and Delay on Face
Recognition, in V. Bruce, ed., ‘Face Recognition’, Vol. 3(1), Lawrence Erlbaum Associates, East Sussex, Eng-
land, United Kingdom, pp. 137–145.

Sirovich, L. & Kirby, M. (1987), ‘Low-dimensional procedure for the characterization of human faces’, Journal of
the Optical Society of America A 4(3), 519–524.

Stonham, J. (1986), Practical Face Recognition and Verification with WISARD, in H. Ellis, M. Jeeves, F. Newcombe
& A. Young, eds, ‘Aspects of Face Processing’, Martinus Nijhoff, Dordrecht, Netherlands.

Swets, D. & Weng, J. (1996), ‘Using Discriminant Eigenfeatures for Image Retrieval’, IEEE Transactions on Pattern
Analysis and Machine Intelligence 18(8), 831–836.

Tefas, A., Kotropoulos, C. & Pitas, I. (2001), Using support vector machines to enhance the performance of elastic
graph matching for frontal face authentication, in ‘IEEE Transactions on Pattern Analysis and Machine Intelli-
gence’, Vol. 23 (7), Grenoble, France, pp. 735–746.

Turk, M. (2001), ‘A Random Walk Through Eigenspace’, IEICE Transaction on Information and Systems E84-
D(12), 1586–1595.

Turk, M. & Pentland, A. (1991), ‘Eigenfaces for Recognition’, Journal of Cognitive Neuroscience 3(1), 71–86.

UMIST Face Database (2000), http://images.ee.umist.ac.uk/danny/database.html. University of Manchester Institute
of Science and Technology.

Vapnik, V. (1998), Statistical Learning Theory, John Wiley & Sons, New York, United States.

Wang, L. & Tan, T. K. (2000), Experimental Results of Face Description Based on the 2nd-order Eigenface Method,
ISO/MPEG m6001, Panasonic Singapore Laboratories Pte Ltd (PSL).

Weizmann Face Database (2000), ftp://ftp.idc.ac.il/pub/users/cs/yael/Facebase/. Weizmann Institute.

Wiskott, L., Fellous, J., Krüger, N. & von der Malsburg, C. (1995), Face Recognition and Gender Determination, in
M. Bichsel, ed., ‘Proceedings of International Workshop on Automatic Face- and Gesture-Recognition’, Zürich,
pp. 92–97.

Wiskott, L., Fellous, J., Krüger, N. & von der Malsburg, C. (1999), Face recognition by elastic bunch graph match-
ing, in L.C. Jain et al., ed., ‘Intelligent Biometric Techniques in Fingerprint and Face Recognition’, CRC Press,
chapter 11, pp. 355–396.

Xi, D., Podolak, I. & Lee, S. (2002), Facial component extraction and face recognition with support vector machines,
in ‘Proceedings of Fifth IEEE Conference Automatic Face and Gesture Recognition’, Washington DC, United
States, pp. 76–81.

Yale Face Database (1997), http://cvc.yale.edu/projects/yalefaces/yalefaces.html. Yale University.

Yang, M., Ahuja, N. & Kriegman, D. (2000), Face Recognition Using Kernel Eigenfaces, in ‘Proceedings. Inter-
national Conference on Image Processing’, Vol. 1, Vancouver, Canada, pp. 37–40.

Young, A. & Bruce, V. (1991), Perceptual Categories and the Computation of “Grandmother”, in V. Bruce, ed.,
‘Face Recognition’, Vol. 3(1), Lawrence Erlbaum Associates, East Sussex, England, United Kingdom, pp. 5–49.

Yu, H. & Yang, J. (2001), ‘A direct LDA algorithm for high-dimensional data - with application to face recognition’,
Pattern Recognition 34(10), 2067–2070.

Zhao, W., Chellappa, R. & Phillips, P. (1999), Subspace Linear Discriminant Analysis for Face Recognition, Tech-
nical Report CAR-TR-914, Centre for Automation Research, University of Maryland, College Park, Maryland.

Zhao, W., Chellappa, R., Rosenfeld, A. & Phillips, P. (2000), Face recognition: A literature survey, Technical Report
CAR-TR-948, Centre for Automation Research, University of Maryland, College Park, Maryland, United
States.

Zobel, J. (1997), Writing for Computer Science: the art of effective communication, Springer-Verlag, Singapore.

Appendix A

Covariance Matrix

The following defines the covariance matrix for a set of face images. We define A as the
covariance matrix, M as the size of the dataset, \(\Gamma_n\) as the n-th face image, \(\Psi\) as
the average face of the dataset, \(\Phi_n\) as the n-th normalised face and \(\Phi_n^{T}\) as the
transpose of the normalised face.

\[
\Psi = \frac{1}{M}\sum_{n=1}^{M}\Gamma_n, \qquad
\Phi_n = \Gamma_n - \Psi, \qquad
A = \frac{1}{M}\sum_{n=1}^{M}\Phi_n\Phi_n^{T}
\tag{A.1}
\]

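As a check on this definition, a minimal numpy sketch, assuming each face image has been
flattened into one row of X:

import numpy as np

def covariance_matrix(X):
    """X: (M, d) array, one flattened face image per row.
    Returns the d x d covariance matrix A of eq. A.1."""
    M = X.shape[0]
    psi = X.mean(axis=0)        # average face
    phi = X - psi               # normalised faces
    return phi.T @ phi / M      # A = (1/M) * sum of outer products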
Appendix B

Generalised Eigenproblem

In this example, we demonstrate how the associated eigenvalues and eigenvectors can be solved for
a small covariance matrix. Given that the covariance matrix A is the 2 × 2 square matrix

\[
A = \begin{pmatrix} 4 & 1 \\ 1 & 4 \end{pmatrix} \tag{B.1}
\]

we first find the eigenvalues of the covariance matrix A, where each eigenvalue and its
eigenvector must satisfy the characteristic equation

\[
(A - \lambda I)\,u = 0 \tag{B.2}
\]

where \(\lambda\) is an eigenvalue, I is the identity matrix and u is an eigenvector.

We begin by substituting the covariance matrix from eq. B.1 and the identity matrix into the
characteristic equation eq. B.2:

\[
\left( \begin{pmatrix} 4 & 1 \\ 1 & 4 \end{pmatrix}
- \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right) u = 0 \tag{B.3}
\]

which gives

\[
\begin{pmatrix} 4-\lambda & 1 \\ 1 & 4-\lambda \end{pmatrix} u = 0 \tag{B.4}
\]

A non-trivial solution exists only when the determinant of the matrix in eq. B.4 vanishes:

\[
\det \begin{pmatrix} 4-\lambda & 1 \\ 1 & 4-\lambda \end{pmatrix} = 0 \tag{B.5}
\]

The determinant produces an algebraic equation, which we can solve by factorisation:

\[
(4-\lambda)^2 - 1 = \lambda^2 - 8\lambda + 15 = (\lambda - 5)(\lambda - 3) = 0 \tag{B.6}
\]

Therefore from eq. B.6 we find two eigenvalues, \(\lambda_1 = 5\) and \(\lambda_2 = 3\).

Having found the eigenvalues of the covariance matrix A, the next step is to find the
eigenvectors. We achieve this by substituting an eigenvalue into eq. B.4. In this instance, we
use \(\lambda_1 = 5\), which produces

\[
\begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix} u = 0 \tag{B.7}
\]

An eigenvector satisfying eq. B.7 is

\[
u = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \tag{B.8}
\]

Thus we can confirm that the characteristic equation is satisfied, by evaluating the product in
eq. B.7:

\[
\begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \end{pmatrix} \tag{B.9}
\]

Having satisfied the characteristic equation, to verify that the eigenvalue \(\lambda_1 = 5\) is
derived from the covariance matrix A, we calculate the product of the covariance matrix A and the
eigenvector:

\[
A u = \begin{pmatrix} 4 & 1 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} 5 \\ 5 \end{pmatrix} \tag{B.10}
\]

and taking out the common factor, the result is

\[
A u = 5 \begin{pmatrix} 1 \\ 1 \end{pmatrix} \tag{B.11}
\]

which verifies the eigenvalue \(\lambda_1 = 5\) that we previously found.
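As a quick numerical check of this worked example, the matrix values following eq. B.1:

import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 4.0]])           # covariance matrix of eq. B.1

# eigh is appropriate for symmetric matrices; eigenvalues return in ascending order
values, vectors = np.linalg.eigh(A)
print(values)                         # [3. 5.]
u = vectors[:, 1]                     # eigenvector paired with lambda = 5
print(np.allclose(A @ u, 5 * u))      # True: A u = 5 u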
