
SYSTEMATIC STUDY OF

ROBUST FACE
RECOGNITION

ABSTRACT

Face recognition has become a prominent research area in recent years. It finds an extensive range of applications, ranging from video games and Facebook to high-security scenarios such as border control. Researchers have shown great interest in addressing the challenges of face recognition such as pose, expression and illumination variations and occlusion. Face recognition is becoming increasingly accurate as the technology evolves. Due to the non-contact and remote sensing capabilities of the face biometric, it is regarded as the only viable biometric trait to be used in surveillance systems for subject identification. Integrating face recognition capability in surveillance systems will significantly improve the security of the monitored area.

Among all other scenarios, face recognition in surveillance systems is significant for security applications, especially at night time when the subject is far away from the camera. Cross-spectral face recognition is generally required for night-time surveillance, because the registered face images in the gallery are usually good-quality visual face images (VIS), whereas the probe images are in the near-infrared (NIR) modality. Images acquired in different spectral bands differ considerably owing to distinct imaging mechanisms, imaging environments, etc. In addition, the difference in standoff distances, termed cross-distance face matching, creates an unequal image-quality scenario due to illumination variations, and this also has to be accounted for. Considering the increased threats at night time, the proposed approach for cross-spectral and cross-distance face recognition uses a common feature-based representation for both NIR and VIS images. This paves the way for bridging the modality gap, compensates for the various other differences and thus yields increased recognition rates.

The long distance and night time face recognition system comprises major stages such as face detection, feature extraction and feature matching. The probe image is one of the long distance night time images from the database, and the gallery data consists of visible images taken at short indoor distances. The NIR face images are detected using the popular Viola Jones face detection algorithm and pre-processed to improve their quality so as to match the gallery data. Then highly discriminative features are extracted from both the gallery data and the probe data. For a long distance and night time face recognition system, the biggest challenge is to address cross-spectral (VIS vs NIR) and cross-distance (1 m vs 60 m, 100 m or 150 m) matching. Hence, an additional step that helps bridge the cross-spectral and cross-distance gap is carried out during pre-processing.



The proposed method for long distance and night time face recognition was evaluated using the LDHF database. The database consists of 100 classes, each with a single sample, and therefore classifiers were not used for matching. The results show that the proposed method performs better at larger distances such as 100 m and 150 m.



TABLE OF CONTENTS

Chapter No. Title Page No.


ABSTRACT ii
TABLE OF CONTENTS v
LIST OF TABLES x
LIST OF FIGURES xi
LIST OF SYMBOLS AND ABBREVIATIONS xiv
1 INTRODUCTION 1
1.1 Applications of Face Recognition
Technology 5
1.2 Challenges of Face Recognition System 5
1.3 Long Distance and Night Time Face
Recognition 9
1.3.1 Night Time Imaging Modes 9
1.3.2 Face Recognition in the NIR Band 11
1.3.3 Cross-Spectral and Cross-Distance
Matching 12
1.4 Problem Definition 13
1.5 Objectives of the Research Work 14
1.6 Organization of the Thesis 14
2 LITERATURE SURVEY 16
2.1 Introduction 16
2.2 Literature Survey on Face Detection
Techniques 20

2.3 Literature Survey on Preprocessing Techniques 22
2.4 Literature Survey on Feature Extraction and
Matching Techniques 25
2.5 Inference from the Literature Survey 27
2.6 Conclusion 28
3 FACE DETECTION 33
3.1 Introduction 33
3.2 The Viola Jones Face Detector 35
3.2.1 The Integral Image and Feature
Extraction 35
3.2.2 AdaBoost Learning 37
3.2.3 Cascade Classifier 39
3.3 Analysis of Viola Jones Face Detector on the
LDHF Database 39
3.3.1 The LDHF Database Description 39
3.3.2 Experimental Results and Discussion 40
3.4 Conclusion 44
4 PRE-PROCESSING 46
4.1 Introduction 46
4.2 Proposed Approach for Preprocessing 47
4.2.1 Median Filtering 48
4.2.2 Wavelet Normalization 49
4.2.3 Difference of Gaussian Filtering 50
4.3 Photometric Normalization Techniques 51
4.3.1 The Non Local Means 52
4.3.2 The Adaptive Non Local Means 53
4.3.3 Single Scale Retinex 54
4.3.4 Multi Scale Retinex 54
4.3.5 Adaptive Single Scale Retinex 55

4.3.6 Isotropic Smoothing 56

4.3.7 Anisotropic Smoothing 56


4.3.8 Anisotropic Smoothing Stable 57
4.3.9 Discrete Cosine Transform 57
4.3.10 Difference of Gaussian 58
4.3.11 Homomorphic Filtering 58
4.3.12 Large Scale and Small Scale Features 59
4.3.13 Single Scale Self Quotient Image 59
4.3.14 Multi Scale Self Quotient Image 60
4.3.15 WeberFaces 60
4.3.16 Multi Scale WeberFaces 61
4.3.17 Retina Model 62
4.3.18 Steerable Gaussians 63
4.3.19 Tan and Triggs 63
4.3.20 Wavelet Normalization 64
4.4 Performance Measures for Evaluating
Photometric Normalization Techniques 65
4.4.1 Entropy 65
4.4.2 Root Mean Square Contrast 65
4.4.3 Histogram Spread 65
4.4.4 Histogram Flatness Measure 66
4.5 Evaluation Scheme for Modality Gap
Reduction using Difference of Gaussian
Filtering 67
4.5.1 Root Mean Square Error 67
4.5.2 Peak Signal to Noise Ratio 67
4.5.3 Mean Absolute Error 68
4.5.4 Structural Content 68
4.5.5 Normalized Cross Correlation 69

4.5.6 Normalized Absolute Error 69


4.6 Experimental Results and Discussion 70

4.6.1 Results of Photometric Normalization 70


4.6.2 Results of Modality Gap Reduction
using DoG Filtering 77
4.7 Conclusion 78
5 FEATURE EXTRACTION AND MATCHING 80
5.1 Introduction 80
5.2 Proposed Approach for Feature Extraction 81
5.2.1 Image Representation using Wavelet
Transform 83
5.2.2 Local Binary Pattern Features 85
5.2.3 Histogram of Oriented Gradients 86
5.2.4 Normal Fitting Parameters 87
5.3 Matching Techniques 88
5.3.1 Euclidean Distance 88
5.3.2 Cosine Similarity 89
5.3.3 Canberra Distance 89
5.3.4 Manhattan Distance 89
5.3.5 Chebyshev distance 90
5.3.6 Statistic Value X2 90
5.3.7 Chord Distance 90
5.3.8 Pearson’s Correlation Coefficient 91
5.4 Results and Discussions 91
5.5 Conclusion 94
6 CONCLUSION 95
6.1 Introduction 95
6.2 Summary of the Work Done 95
6.2.1 Analysis of Viola Jones Face Detector on the LDHF Database 96


6.2.2 Wavelet based Pre-processing
Approach 96

6.2.3 Highly Discriminative Feature Selection 97


6.3 Future Scope 97
REFERENCES 98
LIST OF PUBLICATIONS 107
CURRICULUM VITAE 108

LIST OF TABLES

Table No. Title Page No.


2.1 Summarization of Literatures related to Long
Distance and Night Time Face Recognition 27
3.1 The detection rates at different distances 40
4.1 Measurement Parameters 66
4.2 Performance measures attained for sample images taken at distance 150 m 74
4.3 Performance measures attained for sample images taken at distance 100 m 75
4.4 Performance measures attained for sample images taken at distance 60 m 76
4.5 Average values of 100 images for each similarity measure 78
5.1 Result Comparison in terms of recognition rates 92
5.2 Comparison of recognition rates between different distance measures 93

LIST OF FIGURES

Figure No. Title Page No.


1.1 General Block Diagram of Face Recognition system 3
1.2 Example showing same individual taken at different lighting conditions 6
1.3 Pose variations of a single subject 7
1.4 Co-operative and non-co-operative face 7
1.5 Intra-class variability 8
1.6 Inter-class similarity 8
1.7 Example showing (a) VIS and (b) NIR images 11
1.8 Example showing (a) intra-spectral, (b) cross-spectral & cross-distance 12
1.9 Overview of the Research work 19
3.1 Illustration of integral image generation 36
3.2 Comparison of detection rates for various distances 41
3.3 Perfect detection at 60 m 41
3.4 Partial detection at 60 m 42
3.5 Perfect detection at 100 m 42
3.6 Partial detection at 100 m 42
3.7 No detection at 100 m 43
3.8 Perfect detection at 150 m 43
3.9 Partial detection at 150 m 43
3.10 No detection at 150 m 44
4.1 Basic stages of pre-processing in the proposed approach 47
4.2 Calculating the median value of a pixel neighborhood 48
4.3 Block Diagram of Wavelet Normalization 50

4.4 Photometric Normalized outputs for a sample image at 150 m: (a) anisotropic smoothing, (b) anisotropic smoothing stable, (c) adaptive nl means, (d) adaptive single scale retinex, (e) DCT normalization, (f) DoG, (g) homomorphic filtering, (h) isotropic smoothing, (i) lssf normalization, (j) multiscale retinex, (k) multiscale self-quotient image, (l) multi scale weberfaces, (m) nl means, (n) retina modeling, (o) steerable gaussians, (p) single scale retinex, (q) single scale self-quotient image, (r) tan & triggs, (s) weberfaces, (t) wavelet normalization 71
4.5 Photometric Normalized outputs for a sample image at 100 m: (a) anisotropic smoothing, (b) anisotropic smoothing stable, (c) adaptive nl means, (d) adaptive single scale retinex, (e) DCT normalization, (f) DoG, (g) homomorphic filtering, (h) isotropic smoothing, (i) lssf normalization, (j) multiscale retinex, (k) multiscale self-quotient image, (l) multi scale weberfaces, (m) nl means, (n) retina modeling, (o) steerable gaussians, (p) single scale retinex, (q) single scale self-quotient image, (r) tan & triggs, (s) weberfaces, (t) wavelet normalization 72
4.6 Photometric Normalized outputs for a sample image at 60 m: (a) anisotropic smoothing, (b) anisotropic smoothing stable, (c) adaptive nl means, (d) adaptive single scale retinex, (e) DCT normalization, (f) DoG, (g) homomorphic filtering, (h) isotropic smoothing, (i) lssf normalization, (j) multiscale retinex, (k) multiscale self-quotient image, (l) multi scale weberfaces, (m) nl means, (n) retina modeling, (o) steerable gaussians, (p) single scale retinex, (q) single scale self-quotient image, (r) tan & triggs, (s) weberfaces, (t) wavelet normalization 73

4.7 (a) DoG filtered VIS image (b) DoG filtered NIR image 77
5.1 Stages of feature extraction 81
5.2 Wavelet decomposition upto 3 levels 84
5.3 Three level wavelet decomposition of sample face images (a) 1 m and (b) 150 m 92

LIST OF SYMBOLS AND ABBREVIATIONS

Symbols

σ - standard deviation
θ - threshold
Φ(x) - mother wavelet
i(x,y) - input image
ii(x,y) - integral image
Id(x) - Photometric Normalized Image
Gσ - Gaussian Kernel
h - decay parameter
k(x,y) - smoothing kernel
p - polarity
wi - weights

Abbreviations
2D - Two Dimensional
3D - Three Dimensional
AISS - Anisotropic Smoothing
AISSS - Anisotropic Smoothing Stable
ANLM - Adaptive Non Local Means
ASSR - Adaptive Single Scale Retinex
CCD - Charge Coupled Devices
DARPA - Defense Advanced Research Projects Agency
DCT - Discrete Cosine Transform
DoG - Difference of Gaussian

DSLR - Digital Single Lens Reflex


FR - Face Recognition

FERET - Face Recognition Technology


FRVT - Face Recognition Vendor Test
HFR - Heterogeneous Face Recognition
HFM - Histogram Flatness Measure
HMF - Homomorphic Filtering
HOG - Histogram of Oriented Gradients
HS - Histogram Spread
IR - Infra-Red
ISS - Isotropic Smoothing
KLT - Kanade-Lucas-Tomasi
LBP - Local Binary Pattern
LDA - Linear Discriminant Analysis
LDHF - Long Distance Heterogeneous Face Database
LED - Light Emitting Diode
Lssf - Large scale and small scale features
LSNA - Local Structure of Normalized Appearance
LWIR - Long Wave Infra-Red
MB-LBP - Multiscale Block Local Binary Pattern
MSR - Multi Scale Retinex
MSSQI - Multi Scale Self Quotient Image
MSWF - Multi Scale WeberFaces
MWIR - Mid Wave Infra-Red
NAE - Normalized Absolute Error
NIR - Near Infra-Red
NIST - National Institute of Standards and Technology
NK - Normalized Cross Correlation
NLM - Non Local Means

PSNR - Peak Signal to Noise Ratio


RM - Retina Model
RMSE - Root Mean Square Error

SC - Structural Content
SG - Steerable Gaussians
SSR - Single scale Retinex
SSSQI - Single Scale Self Quotient Image
SWIR - Short Wave Infra-Red
TT - Tan and Triggs
VIS - Visual
WF - WeberFaces
WN - Wavelet Normalization

CHAPTER 1

INTRODUCTION

Face recognition is a technology used to distinctively identify or verify a person by comparing and analysing patterns based on the person's facial features. Face recognition systems employ techniques that can predict whether there is a match based on numerous points on an individual's face (P.J. Phillips et al 2009). The human face plays a significant role in our social interaction by conveying people's identity. Using the human face as a key to security, biometric face recognition technology has received significant attention in the past several years due to its potential for a wide variety of applications in both law enforcement and non-law enforcement scenarios (Unar J.A et al 2014). When compared to other biometric systems using fingerprint/palmprint and iris, face recognition technology has distinct advantages because of its non-intrusive process (Seo et al 2011). Face images are usually captured from a distance without touching the person being identified, and the identification does not require interacting with the person. Above all, a face recognition system helps deter crime, because face images that have been captured and stored can later help identify or verify an intruder (Kang D et al 2014).

Face recognition has become a predominant area of research in computer vision over the last ten years or so, and is also one of the most successful applications of image analysis and understanding (Maeng H et al 2011). Owing to the importance of the problem, not only computer science researchers are interested in it, but neuroscientists and psychologists as well. There is also a general opinion that advances in computer vision research will provide useful insights to neuroscientists and psychologists into how the human brain works, and vice versa (Montag et al 2016).

Facial recognition technology may seem to have come out of nowhere, but in truth it has been in the making for some time. The important milestones in the history of face recognition shed light on how this transformative technology came into existence and how it has progressed over time. Woodrow Wilson Bledsoe is regarded as the father of face recognition; he established a system that could classify photos of faces by hand using what is called a RAND tablet, a device that can be used to input horizontal and vertical coordinates on a grid using a stylus that emitted electromagnetic pulses (Zhao et al. 2003). The system was able to manually record the coordinate locations of numerous facial features, including the eyes, nose, hairline and mouth.

In 1971, Goldstein, Harmon and Lesk succeeded in improving the accuracy of the manual face recognition system. They used 21 specific individual markers, including lip thickness and hair colour, in order to automatically identify faces; however, as in Bledsoe's system, the actual biometrics still had to be computed manually. Sirovich and Kirby (1987) started to apply linear algebra to the problems of facial recognition in what came to be called the Eigenface approach, which began as a search for a low-dimensional representation of facial images. The efforts of Sirovich and Kirby showed that feature analysis on a collection of facial images could form a set of basis features. In 1991, Turk and Pentland successfully expanded the Eigenface method by exploring how to detect faces within images. This innovation led to the first instances of automatic face recognition. Though their approach was constrained by both technological and environmental factors, it was a significant milestone in demonstrating the viability of automatic facial recognition. The Defense Advanced Research Projects Agency (DARPA) and the National Institute of Standards and Technology rolled out the Face Recognition Technology (FERET) programme beginning in the 1990s in order to encourage the commercial face recognition market (T.Bourlai et al 2012). The effort involved creating a database of facial images. The National Institute of Standards and Technology (NIST) launched the Face Recognition Vendor Tests (FRVT) in the early 2000s. Building on FERET, the FRVTs were designed to offer independent government evaluations of commercially available facial recognition systems, in addition to prototype technologies.

(Figure: block diagram showing Input Image → Pre-Processing → Feature Extraction → Classification → Recognized output, with a Database feeding the classification stage.)

Figure 1.1 General Block Diagram of Face Recognition System


At the beginning of 2010, Facebook started deploying facial recognition functionality that helped recognize people whose faces appear in the photographs that Facebook users post day-to-day. In 2017, Apple released the iPhone X, publicizing face recognition as one of its primary new features. The face recognition system employed in the phone is used for device security.

The new iPhone model sold out almost immediately, confirming that consumers now accept facial recognition as the new gold standard for security. These successive advances in face recognition technology substantiate its necessity in the days to come and indicate that more research is required to improve its effectiveness.

By and large, a face recognition system involves four phases, namely face detection, pre-processing, feature extraction and classification (J.C. Klontz et al. 2013), as shown in Figure 1.1. The input to a face recognition system is always an image or a video stream, and the output is usually an identification or verification of the subject in the image or video. Face detection is the first step in face recognition and does the job of locating a face in the input image (D.Huang et al. 2007). The face images are then pre-processed and enhanced in the pre-processing stage in order to compensate for illumination variation and other changes (Z.Pan et al. 2003). The next phase is feature extraction, in which highly discriminative, non-redundant and informative features are extracted to facilitate good recognition rates. Feature extraction is the procedure of converting the input data into a vector of features that can well represent the input data (S.Zhao et al. 2005). Classification, being the final step in face recognition, yields the output of the system.
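As a rough illustration of this pipeline, the sketch below (Python with OpenCV) wires the four stages together using deliberately simple stand-ins: a Haar-cascade detector, histogram equalization, a grey-level histogram feature and a nearest-neighbour matcher. These stand-ins are illustrative assumptions only and are not the methods proposed in this thesis.

```python
import cv2
import numpy as np

# Haar-cascade frontal-face detector shipped with OpenCV (stand-in for the detection stage).
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess(face):
    """Stand-in pre-processing: resize the crop and equalize its histogram."""
    return cv2.equalizeHist(cv2.resize(face, (128, 128)))

def extract_features(face):
    """Stand-in features: a normalized grey-level histogram (not the thesis's descriptors)."""
    hist = cv2.calcHist([face], [0], None, [64], [0, 256]).ravel()
    return hist / (hist.sum() + 1e-8)

def recognize(image_bgr, gallery):
    """gallery: dict mapping subject id -> enrolled feature vector."""
    grey = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    identities = []
    for (x, y, w, h) in detector.detectMultiScale(grey, scaleFactor=1.1, minNeighbors=5):
        probe = extract_features(preprocess(grey[y:y + h, x:x + w]))
        # Classification by nearest neighbour in feature space (Euclidean distance).
        best = min(gallery, key=lambda sid: np.linalg.norm(gallery[sid] - probe))
        identities.append(best)
    return identities
```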


1.1 APPLICATIONS OF FACE RECOGNITION TECHNOLOGY

Facial recognition technology has conventionally been associated with the security sector, though today it is actively expanding into other industries including retail, marketing and health (L.Shen et al. 2012). It also has great usefulness in human computer interaction, multimedia, virtual reality, computer entertainment, medical records, information security, online banking, biometric identification (including passports and driver licences), automated identity verification at border controls, law enforcement (e.g. video surveillance and investigation) and personal security such as driver monitoring and home video surveillance systems. Of all these applications, face recognition plays a major role in verifying access for security in surveillance scenarios, which is the greatest need of the hour (Kang et al 2014). This research work focuses on this particular application.

1.2 CHALLENGES OF FACE RECOGNITION SYSTEM

The success of a face recognition system depends upon the quality of the probe images (input images acquired under uncontrolled conditions) and the gallery images (images acquired under controlled conditions, say at 1 m indoors), since image quality is important for extracting the facial features. Without accurate facial features, the robustness of the approaches will also be lost (F.Nicolo et al 2012). Thus even the best recognition algorithm fails as the quality of the image declines.

An important challenge that has surprised researchers is that images of the same person can appear very different due to variation in lighting. Illumination can alter the appearance of an object significantly. It has been observed that the variations among images of the same face due to illumination and viewing direction are almost always greater than the image variations due to a change in face identity (Y.Moses et al 1994). As is evident in Figure 1.2, the same person with the same facial expression is likely to appear strikingly different when the direction of the light varies. These variations are exaggerated even further by additional factors including facial expression, hair style, perspiration, cosmetics, and even changes due to aging.

Figure 1.2 Example showing same individual taken at different lighting conditions

Figure 1.3 exemplifies the differences in the pose of a single person. Typically, the gallery data used by face recognition systems consists of frontal-view face images. Frontal-view images contain more unambiguous information about a face than profile or other pose-angle images (B.Klare 2010). The difficulty arises when the system has to recognize a rotated face using this frontal-view gallery data. The user is thus required to collect numerous views of an individual in a face database.


Figure 1.3 Pose Variations of a single subject

The issues under pose variation have been divided into three categories: the modest case with a small rotation angle; the most regularly addressed case, where a set of training image pairs (frontal and rotated images) is available; and the most problematic case, where the required training image pairs are not available (Z.Lei et al 2009).

Figure 1.4 Co-operative and Non Co-operative Face


Figure 1.5 Intra-class Variability

Non-co-operative capture is a condition that usually occurs in any surveillance environment. A co-operative face is captured under controlled conditions; the pose, the lighting and all other necessary factors are pre-determined (Z.Lei et al, 2012). In a surveillance environment, by contrast, the face image obtained is generally non-co-operative; it may be captured outdoors and even at night time. Figure 1.4 shows the difference between a co-operative and a non-co-operative face and the variations in different images of the same person. Figure 1.5 illustrates intra-class variability.

Figure 1.6 Inter-class Similarity


There are cases where images of two different people appear to be the same. Such a condition is termed inter-class similarity, as shown in Figure 1.6.

1.3 LONG DISTANCE AND NIGHT TIME FACE RECOGNITION

Mostly, face recognition systems depend on the use of face images captured in the visible range of the electromagnetic spectrum, i.e. 380-750 nm. Nevertheless, in real-world circumstances (law enforcement and military) we have to deal with harsh environmental conditions characterized by unfavourable lighting and pronounced shadows. An obvious example is a night-time environment (Kang et al, 2014), where human recognition based exclusively on visible-spectrum images may not be feasible. Inadequate illumination at night time and low-resolution or blurred face images at long distances are viewed as the most important challenges for face recognition in surveillance scenarios.

1.3.1 Night Time Imaging Modes

To deal with such demanding face recognition scenarios, multi-spectral camera sensors are very beneficial because they can capture images during both day and night (S.Z.Li et al 2013). Therefore, recognizing faces across the infrared spectrum has become an area of growing interest. For face recognition, both visible and IR sensors are significant. Visible sensors have the advantage of low cost and much higher spatial resolution when compared to certain infrared sensors, particularly short-wave IR or cooled (thermal) infrared ones.


The infrared (IR) spectrum is divided into different spectral bands based on the response of various detectors: the active IR band and the thermal (passive) IR band. The active IR band (0.7-2.5 µm) is classified into the NIR (near infrared) and the SWIR (short wave IR) spectrum. NIR has the benefit of allowing imaging at night, but its limitation is that an illuminator is essential, and this can be detected (it is not possible to stealthily illuminate the scene). SWIR has a longer wavelength range than NIR and is more tolerant to low levels of obscurants like fog and smoke. Differences in appearance between images captured in the visible and the active IR bands are due to the properties of the object being imaged. The benefits of SWIR are discussed in (T. Bourlai, A. Ross, C. Chen, and L. Hornak, 2012). SWIR may capture facial features that are not perceived in the visible spectrum and can be combined with visible-light imagery to create a more complete image of the human face. The SWIR range has recently become practical for face recognition, predominantly since the development of indium gallium arsenide sensors, which are designed to work well in night-time conditions. A further advantage is that the external light source that may be required in the SWIR band can covertly illuminate the scene, since it emits light that is invisible to the human eye (N. Kalka 2011).

The passive IR band is further divided into the Mid-Wave (MWIR) and the Long-Wave Infrared (LWIR) band. MWIR ranges from 3-5 µm, whereas LWIR ranges from 7-14 µm. Both MWIR and LWIR cameras are able to sense temperature variations across the face at a distance and produce thermograms in the form of 2D images. The major difference between MWIR and LWIR is that MWIR has both reflective and emissive properties, whereas LWIR consists primarily of emitted radiation. The advantage is that they are both almost totally insensitive to external illumination (M. Ao et al 2009). An added advantage is that they reveal different image characteristics of the facial skin. However, their limitations are that they are subject to variations in the temperature of the surrounding environment, and to variations in the heat patterns of the face caused by various factors, e.g. stress, changes in the temperature of the surrounding environment, physical activity etc. The suitability of MWIR for face recognition technology has lately been demonstrated by T.Bourlai et al. (2012).

1.3.2 Face Recognition in the NIR Band

Figure 1.7 Example showing (a) VIS image and (b) NIR image

Considering face recognition at night time, a dedicated light source capable of illuminating the subject's face in the dark is needed. Among the various types of lighting, an infrared source is the most commonly adopted for night-time face recognition. In particular, the use of near-infrared (NIR) illumination for night-time face recognition in surveillance scenarios has the following advantages (Maeng et al, 2012): the NIR illuminator is generally not visible to the human eye and keeps the surveillance operation covert; NIR images are generally not affected by ambient temperature or the emotional and health condition of the subject, in contrast to thermal images; the NIR illuminator is cheaper than thermal sensors (S.Z.Li et al 2007); NIR illumination can easily penetrate glasses; and NIR light is robust to variations in ambient lighting. Thus, it is not surprising that a number of research groups have studied face recognition using NIR illumination and have also proposed various NIR face detection and recognition methods.

1.3.3 Cross-Spectral and Cross-Distance Matching

Figure 1.8 Example showing (a) intra-spectral and (b) cross-spectral & cross-distance matching

The term cross-spectral matching denotes a condition where the gallery and the probe data are of two different spectral modes. For example, in night time surveillance the probe image is usually an IR image, whereas the reference images in the gallery would generally be visual images (B.Zhang et al, 2010). In such cases, there is a need for matching images of two different spectra, in contrast to intra-spectral matching, where two or more images of the same spectrum are compared.

Similarly, cross-distance matching denotes a condition where the gallery image is taken at a close distance of about 1 metre and the probe data is taken at long distances, for example 30 m, 60 m, 100 m or 150 m. Such conditions usually prevail in military surveillance, where continuous monitoring is strictly required even at night time. Figure 1.8 shows examples of intra-spectral, cross-spectral and cross-distance matching.

1.4 PROBLEM DEFINITION

The problem formulated based on the literature study is given below; it brings out the area of concern, the conditions to be improved and the difficulties to be eliminated in long distance and night time face recognition, and leads to framing the objectives of this research work and carrying out meaningful investigations.
1. Face detection in long distance and night time images is still a challenging problem. There is no proven face detection method for the above scenario.
2. The faces detected from long distance and night time images are low in contrast, contain fewer facial features and show dissimilarity (a modality gap) with their visual counterparts in the case of cross-spectral face matching.
3. Extraction of highly discriminative facial features from a detected face image having the various limitations mentioned above is still a challenging problem in face recognition. It is understood that a single type of feature may not be sufficient to attain better recognition.


1.5 OBJECTIVES OF THE RESEARCH WORK

1. To apply and test the performance of the benchmarked Viola Jones Face
detection Algorithm on Long Distance and Night Time Face Images.
2. To develop a novel approach to carry out pre-processing such that the
differences in modality, distance and illumination are compensated.
3. To develop a novel approach for feature extraction which yields highly
discriminative features and thus enhances the recognition rate for long
distance and night time face images.

Figure 1.9 Overview of the Research work

1.6 ORGANIZATION OF THE THESIS

The remainder of the thesis is organized as follows:

A comprehensive literature survey on long distance and night time face images is presented in Chapter 2, which provides the necessary background for the rest of the chapters.


Chapter 3 describes the face detection techniques and the


performance of the Viola Jones Face Detection Algorithm for Long Distance and
Night Time Face Images.
Chapter 4 presents the pre-processing approach using the Median
Filtering, Wavelet Normalization and the Difference of Gaussian Filtering.
Chapter 5 describes the approach for feature extraction using the
combination of Wavelet Transform, Histogram of Oriented Gradients, Local
Binary Patterns and Normal Fitting Parameters.
Chapter 6 gives a concise summary of the different stages of the long distance and night time face recognition system and highlights the results obtained at the various stages.


CHAPTER 2

LITERATURE SURVEY

2.1 INTRODUCTION

Face recognition has gained a significant position among the most commonly used applications of image processing, and the availability of enabling technologies in this field has contributed a great deal to it. Biometrics has received considerable attention and has become one of the most reliable choices for recognition in recent years because of the availability of feasible technology after extensive research in the field, and because of the loopholes in other systems of identification (M. Ao et al. 2009). Nevertheless, efforts are still under way to develop more user-friendly systems that meet the requirements of security applications and yield more accurate results, in order to protect our assets and secure our privacy. Ambiguities exist in traditional methods of recognition, which authenticate people and grant them access to virtual and physical domains by examining an individual's behavioural and physiological traits and characteristics (Ming-Hsuan 2008). A significant benefit of face recognition is that it can be carried out without physical contact. Databases for face recognition systems vary from static, controlled photographs to uncontrolled videos (P.J.Philip et al. 2009). This variability imposes a large collection of technical challenges for such systems in image processing, analysis and understanding. In face recognition there are different challenges that arise during the various stages and affect the recognition rate; to solve these issues, a general statement of the problem must first be formulated and examined. Any face recognition system comprises three main parts: pre-processing, feature selection and classification (Seo et al. 2011).

Human beings are capable of learning to recognize hundreds of faces over their life span, and can easily identify familiar faces even after a separation of several years. This ability is so robust that it is hardly affected by the lapse of time or by visual changes due to aging, expressions, distractions and conditions such as beards, changes in hair style or glasses (Unar et al, 2014). The ability of humans to infer character from facial appearance may be questionable, but face recognition itself is an essential and important element of the human perception system and a routine task for all humans. Building a system comparable to the human perception system is still an active area of research (K.W.Bowyer et al, 2006); however, such systems yield successful results only under restricted conditions. An ideal face recognition technique should consider classification issues as well as representation. Face recognition has become a vital and essential concern for many applications such as security systems, credit card verification, criminal identification, video surveillance, person identification, people tagging, database investigation and pervasive computing (Y.Adini et al, 2009). In the last several years, abundant algorithms and methodologies have been proposed for recognizing a face from an image. In these methodologies, computers focus on detecting and recognizing individual features and traits such as the nose, head outline, eyes and mouth, and on describing a face shape and model by the size, position and relations between these features. Several researchers have observed that the recognition rate is higher if 3D faces are used (X.Zou et al, 2007).


The process of recognizing human faces from images and videos is a genuinely difficult problem. Though there are several approaches to the task, none accomplishes it with 100% accuracy because of the numerous challenges faced by such systems (D.Huang et al, 2009). These challenges can be divided into two categories, intrinsic and extrinsic factors. Intrinsic factors comprise the physical condition of the human face, e.g. aging and facial expressions, whereas extrinsic factors are those that change the appearance of the face externally, e.g. lighting conditions, pose variation, and long distance and night time imaging (H.Han et al, 2013). This research focuses exclusively on one important challenge of face recognition, namely long distance and night time imaging in a surveillance environment.

With the extensive deployment of surveillance video cameras, robust face recognition in surveillance videos is in high demand for access control, security monitoring, etc. However, it is still very challenging for existing face recognition algorithms to work accurately on real-world surveillance data containing a wide range of variations (Maeng et al, 2012).

The approaches for face recognition can be broadly classified into three groups: general algorithms, 2D techniques and 3D approaches. Each category can be further classified as follows (Chihaoui et al, 2015). General algorithms are divided into holistic approaches and local approaches. Principal component analysis, Fisher discriminant analysis, artificial neural networks, line edge maps and directional corner points are methods that can be categorized under holistic approaches. Local approaches include template matching, modular PCA, elastic bunch graph matching and local binary patterns.


Two Dimensional (2D) techniques are further classified into real view-based matching, transformation in image space and transformation in feature space. Beymer's method and the panoramic view are characterized under real view-based matching (Jameel, 2015). The methods under transformation in image space are parallel deformation, pose parameter manipulation, active appearance models, the linear shape model and the Eigen light-field. Transformation in feature space covers kernel methods (kernel PCA, kernel FDA), expert fusion, correlation filters, local linear regression and tied factor analysis.

Three Dimensional (3D) approaches for face recognition are


classified as Generic shape-based models, Feature-based 3D construction and
Image-based 3D construction. Generic shape-based models cover approaches
such as the Cylindrical 3D pose recovery, Probabilistic geometry assisted face
recognition and Automatic texture synthesis (Bevilacqua et al, 2006). Feature
based 3D construction covers Composite deformable model, Jiang's method,
multi-level quadratic variation minimization. Methodologies that come under
Image-based 3D construction are Morphable model, illumination cone model and
stereo matching (S. J. D. Prince et al, 2008).

All the approaches presented above have already been tested on different datasets and address only one or two challenges each, which makes it difficult to single out the best one on the basis of recognition rate alone (X.Chai et al, 2007). Therefore, the larger the number of problems/challenges addressed, the higher the flexibility for real-time applications, and this stands as the focus of our research work.

A face recognition system is not a single process; the functioning of every module matters in achieving the expected target. The rest of this chapter is organized around (a) face detection, (b) pre-processing and (c) feature extraction, and the works available for each of these sub-modules are discussed. The chapter covers several articles from reputed journals and conferences and discusses the various existing methods of face recognition. The advantages and disadvantages of these works are also discussed briefly in order to assess the suitability of the techniques for achieving a better recognition rate.

2.2 LITERATURE SURVEY ON FACE DETECTION TECHNIQUES

Face detection is one of the most studied topics in the computer vision literature, not just because of the challenging nature of the face as an object, but also because of the innumerable applications that require face detection as a first step (H.A. Rowley et al, 1998). Over the past 15 years, remarkable progress has been made thanks to the availability of data captured in unconstrained conditions through the Internet, the effort made by the community to create publicly available benchmarks, and progress in the development of robust computer vision algorithms (Devendra Singh et al, 2012).

Human face detection in colour images can be carried out using an approach presented by Sharif M. et al (2012) that utilizes the HSV colour space. For real-time video the approach works in two steps: first, a statistical model is applied to obtain H (hue) and S (saturation) ratios for skin regions; second, the face location in the image is approximated with respect to the detected skin on the basis of defined ratios for region width and height (Lienhart R et al, 2002). Lastly, to verify the face within the roughly detected skin region, an eye-template matching algorithm is applied. The presented model has been tested in a real-time environment with fairly acceptable performance. A novel face detection technique is presented by Salih and Muhittin (2009), based on an accelerated GPU object detection system, which can effectively detect 90.8% of faces in a real-time environment (high-resolution video ranging from 640×480 to 1920×1080) without sacrificing accuracy. Real-time facial feature detection based on conditional regression forests on low-quality images is presented by Marciniak et al (2011). The system is tested on the Labeled Faces in the Wild database, which contains facial images of 5749 individuals, and achieves 87.5% accuracy, an improvement over previously designed facial feature detection systems. Likewise, a human face retrieval framework for video databases has been presented, applying a fast Haar-like features based algorithm and the Kanade-Lucas-Tomasi (KLT) tracker. The approach is implemented using OpenCV and achieves 94.17% accuracy.

An innovative approach for face area localization is proposed by Dibakar (2010), which localizes the face by analysing human body shape characteristics and skin colour information, and detects about 97.5% of faces correctly. A reliable face detection technique is proposed by W. Chen et al (2006), which depends on skin colour detection in colour images using a 2D Gaussian model and histogram. The system does not require any training, which significantly reduces computational cost. To gain efficiency and robustness, a fusion strategy is adopted that gives accurate results with 0.904 probability on the Stottinger dataset. Zhu et al (2004) suggested a face detection and pose estimation technique employing a tree-structured shape model, with the system trained under fully supervised conditions. The system is validated under multiple viewpoint, illumination and expression conditions with around 750,000 images of 337 people, and achieves 99.9% accuracy when allowing ±15° error tolerance on the MultiPIE database.

Similarly, an efficient face recognition technique is presented by Guillaumin (2011), exploiting caption-based supervision. The approach works as a two-stage process: first, faces are retrieved from video frames, and then the correct association of each face with the database is established. Human face recognition for a real-time attendance system was suggested by Susheel et al (2010). The system works in two main stages: first, face detection based on AdaBoost with a Haar cascade is used, and second, face recognition based on fast and simple PCA and LDA is executed. The system is tested on 500 images, and after face detection the images are stored in JPEG format at 100 x 100 pixels for face recognition. This approach achieves accurate results for a general-purpose online attendance system. Wood et al (2006) proposed a robust face detection method based on lighting-variable AdaBoosting, which adapts to fluctuating illumination and relies on multiple features such as global and local intensity variations. The system is tested on the standard Caltech-101 dataset and achieves an overall accuracy of 95% across different lighting levels.

Paul Viola and Michael Jones (Paul Viola et al, 2001) defined a machine learning approach for visual object detection that is capable of processing images extremely rapidly while achieving high detection rates. Their work is distinguished by three important contributions. The first contribution was the introduction of a new image representation called the Integral Image, which allows the features exploited by the detector to be calculated rapidly. The next contribution was a learning algorithm based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers. The third was the combination of increasingly complex classifiers in a "cascade", which allows background regions of the image to be quickly discarded while spending more computation on promising, object-like regions. Even though the Viola Jones algorithm was proposed well over a decade ago, it still stands as the most widely accepted detector both commercially and academically.
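As a brief illustration of the first contribution, the sketch below (Python/NumPy, not taken from the thesis) computes an integral image and uses it to evaluate the sum of pixels inside an arbitrary rectangle with at most four array references, which is what makes Haar-like rectangle features so cheap to evaluate.

```python
import numpy as np

def integral_image(img):
    """Integral image ii(x, y): sum of all pixels above and to the left of (x, y), inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1+1, c0:c1+1] obtained from the integral image in constant time."""
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

img = np.arange(25, dtype=np.int64).reshape(5, 5)  # toy 5x5 "image"
ii = integral_image(img)
# A two-rectangle Haar-like feature: difference between the sums of two adjacent boxes.
feature = box_sum(ii, 0, 0, 4, 1) - box_sum(ii, 0, 2, 4, 3)
```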

2.3 LITERATURE SURVEY ON PRE-PROCESSING TECHNIQUES


The basic purpose of pre-processing in long distance and night time face recognition is to enhance outdoor-captured images affected by varying lighting conditions, varying distances, degradation of image quality due to the long distance between the camera and the subject, random noise and modality differences. In an uncontrolled environment, the subject may not be aware of the surveillance scenario, and the recognition process is carried out automatically. In such a case, the automatically detected person at a distance is matched against a database of images to recognize the person. The procedure of recognizing a person is nevertheless dependent on the image capturing device, such as a video surveillance camera. The odds of correct recognition are influenced by the quality of the captured image, which is in turn dependent on the efficiency of the camera, the distance between the person and the camera, the lighting conditions and also whether the person looks at the camera. In all these cases, pre-processing the captured image might give better recognition results.

A number of pre-processing techniques have been discussed in the literature to standardize face images and to eliminate differences in appearance. Goswami et al (2012) use a sequential chain of retinex and self-quotient pre-processing techniques to normalize the face images. Bourlai and Cukic (2013) employ techniques like contrast limited adaptive histogram equalization, retinex, self-quotient and difference of Gaussians filtering. Maeng et al. (2012) used histogram equalization and Gaussian smoothing as pre-processing techniques, thereby focusing on the effect of distance in the heterogeneous face matching scenario. Kang et al. suggested an image restoration method influenced by Locally Linear Embedding (LLE) to recover high-quality face images from corrupted probe images. Zhu et al. (2004) proposed a transductive method named transductive heterogeneous face matching (THFM) to decrease the domain difference that arises from heterogeneous data while concurrently learning a discriminative model for the target subjects. Ancong Wu et al (2008) put forth deep zero-padding for training a one-stream network so that domain-specific nodes evolve automatically in the network used for cross-modality matching. Dahua Lin et al (2002) articulated an algorithm in which two transforms are learned simultaneously to convert the samples of the two modalities into a common feature space.

It is interesting to note that researchers state the variation between face images of different persons to be smaller than the variation between face images of the same person under different illumination; it has also been shown that illumination creates larger variation in face images than pose (Adini et al. 1997). A simple and widely used method applied at the pre-processing stage of face recognition to eliminate illumination variation is Histogram Equalization (HE), which helps enhance the overall contrast of an image (Gonzalez et al. 1992). Many variants of histogram normalization techniques, such as uniform histogram distribution, normal histogram distribution, log-normal histogram distribution and histogram truncation, have been put forth by different researchers as pre-processing or post-processing techniques for illumination normalization. Jobson et al. (1997) extended the retinex theory to the Single Scale Retinex approach, which strengthens the local contrast and also the brightness of face images; in this scheme the illumination pre-processing works on the basis of a gamma correction technique. Wang et al. proposed the self-quotient image (SQI) to handle the fluctuating lighting conditions in face recognition (O. Arandjelović 2013 and H.Wang et al. 2004). Another approach to photometric normalization is local normalization, proposed by Xie and Lam (2006) to diminish the effect of uneven lighting conditions and obtain equivalent face images under normal lighting. Chen et al. (2006) discarded an appropriate percentage of DCT coefficients in zigzag order in order to curtail the variation of face images of the same individual under different lighting conditions, and then the inverse DCT transform was applied to obtain the final illumination normalized images.
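To make two of the surveyed normalizations concrete, the sketch below (Python/OpenCV, with illustrative parameter values rather than those used in the cited works) applies histogram equalization and Difference-of-Gaussians (DoG) filtering; DoG filtering is also the band-pass step used later in this thesis to reduce the VIS-NIR modality gap.

```python
import cv2
import numpy as np

def hist_equalize(gray):
    """Histogram equalization: spreads the grey-level histogram to boost global contrast."""
    return cv2.equalizeHist(gray)

def dog_filter(gray, sigma1=1.0, sigma2=2.0):
    """Difference of Gaussians: subtract two blurred copies to keep a band of spatial
    frequencies, suppressing slow illumination gradients and very fine noise."""
    g = gray.astype(np.float32) / 255.0
    blur1 = cv2.GaussianBlur(g, (0, 0), sigma1)
    blur2 = cv2.GaussianBlur(g, (0, 0), sigma2)
    dog = blur1 - blur2
    # Rescale to the 0-255 range for display or storage.
    dog = cv2.normalize(dog, None, 0, 255, cv2.NORM_MINMAX)
    return dog.astype(np.uint8)

face = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input image
if face is not None:
    normalized = dog_filter(hist_equalize(face))
```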


2.4 LITERATURE SURVEY ON FEATURE EXTRACTION AND MATCHING TECHNIQUES

Feature extraction represents the relevant information contained in an image such that the process of classifying the face image is made easy by a well-defined procedure. Feature extraction is carried out after the pre-processing stage in a face recognition system (F. Ferri et al, 1994). In most cases of face recognition and image processing, feature extraction is a distinct form of dimensionality reduction. Feature selection is a serious issue for the whole system, since the matching process cannot recognize efficiently from poorly selected features (H. Frigui et al, 1999). The important criteria for feature extraction are as follows: features must contain the information required to distinguish between classes; they must be insensitive to irrelevant variations in the input; and they should be limited in number, in order to enable efficient computation of highly discriminant functions. Features can also be termed descriptors. The process of feature extraction involves extracting relevant features from faces to form feature vectors (A.K. Jain et al, 1997). These feature vectors are then used by the matchers to compare the input image with the target output. This makes the work of the matcher simpler, as comparing different classes by looking at these features allows a fairly easy distinction (C. Lee et al, 1993). Feature extraction methods can be grouped into four main categories: global features, local features, statistical features and geometrical features.

Global features represent the image as a whole and generalize the entire image. Contour representations, shape descriptors and texture features can be categorized under global features (T. Ojala et al, 2002). A few examples of global features are shape matrices, invariant moments (Hu, Zernike), the Histogram of Oriented Gradients (HOG) and Co-HOG. Generally, global features are used for low-level applications like object detection and classification (Tsai et al, 2002). Global features give the overall information in an image when compared to local features. This may not be needed in all cases, as many applications require only specific or relevant data from the image.
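As an example of one global descriptor named above, the sketch below extracts a HOG feature vector with scikit-image; the cell and block sizes are illustrative defaults, not values taken from the cited works or from this thesis.

```python
import numpy as np
from skimage import io
from skimage.feature import hog

# Hypothetical pre-processed face crop; any 2-D grey-level array works here.
face = io.imread("face.png", as_gray=True)

# HOG: gradient-orientation histograms pooled over cells and normalized over blocks,
# giving a single fixed-length descriptor for the whole image.
descriptor = hog(face,
                 orientations=9,
                 pixels_per_cell=(8, 8),
                 cells_per_block=(2, 2),
                 block_norm="L2-Hys")
print(descriptor.shape)  # usable directly by any distance-based matcher
```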

Local features represent key points in an image; usually, they capture the texture pattern in an image (A. Ben-Hur et al, 2003). Some examples of local descriptors are the Scale Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF), the Local Binary Pattern (LBP), Binary Robust Invariant Scalable Keypoints (BRISK), Maximally Stable Extremal Regions (MSER) and the Fast Retina Keypoint (FREAK). Local features are commonly used for high-level applications like object recognition (T. Hastie et al, 2000). Local features are region specific; they mostly highlight only the relevant data and tend to eliminate redundancy.
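For instance, a basic 3x3 LBP (one of the local descriptors listed above, and a feature later used in Chapter 5) can be sketched in a few lines of NumPy; the 256-bin histogram of the resulting codes serves as the texture descriptor. This is a minimal sketch, not the exact LBP variant used in the thesis.

```python
import numpy as np

def lbp_codes(gray):
    """Basic 3x3 LBP: each pixel is coded by thresholding its 8 neighbours against it."""
    g = gray.astype(np.int32)
    center = g[1:-1, 1:-1]
    code = np.zeros_like(center)
    # Neighbour offsets in clockwise order, each contributing one bit of the code.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dr, dc) in enumerate(offsets):
        neighbour = g[1 + dr:g.shape[0] - 1 + dr, 1 + dc:g.shape[1] - 1 + dc]
        code |= ((neighbour >= center).astype(np.int32) << bit)
    return code

def lbp_histogram(gray):
    """256-bin normalized histogram of LBP codes, used as the texture feature vector."""
    codes = lbp_codes(gray)
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / max(hist.sum(), 1)
```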

Statistical features are compact representations of certain quantitative properties of an image, derived from a statistical distribution of points (C. Chatterjee et al, 1997). A wide selection of statistical feature descriptors can be used for feature extraction, ranging from simple descriptive statistics to complex transformations (P.A. Chou et al, 1991). Some examples of statistical feature extraction techniques are mean and standard deviation computations, frequency count summarizations, Karhunen-Loève transformations, etc. The following are a few statistical features: arithmetic mean, standard deviation, kurtosis, skewness, entropy and percentiles. These quantitative features extracted from the image are structured into a fixed-length feature vector. Statistical features enable high speed and low complexity and are also invariant to style variations to some extent.
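A minimal sketch of such a fixed-length statistical feature vector, using NumPy and SciPy (the particular statistics chosen here simply mirror the list above and are not a prescription from the cited works):

```python
import numpy as np
from scipy import stats

def statistical_features(gray):
    """Fixed-length vector of simple image statistics: mean, standard deviation,
    skewness, kurtosis, grey-level entropy and three percentiles."""
    x = gray.astype(np.float64).ravel()
    hist, _ = np.histogram(x, bins=256, range=(0, 256))
    p = hist / max(hist.sum(), 1)
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return np.array([
        x.mean(),
        x.std(),
        stats.skew(x),
        stats.kurtosis(x),
        entropy,
        np.percentile(x, 25),
        np.percentile(x, 50),
        np.percentile(x, 75),
    ])
```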

Geometrical features represent local and global properties of characters and are highly tolerant to distortions and style variations (Cheng-Lin Liu, 2007). They are topological features that encode certain knowledge about the contour of the image, and they require some knowledge of what type of components build up the image (Oivind Due Trier et al, 1996). Geometric features represent segments, perimeters and areas of face images formed by the detected points.


Image matching is the task of establishing correspondences between two images of the same scene or object. It is a high-level machine vision technique that tries to match the features of the input image with those of predefined gallery data (Nilamani Bhoi et al, 2010). Mostly, the approaches used for image matching comprise detecting a set of interest points that are associated with image descriptors computed from the image data. Once the features are extracted from two or more images, the next step is to establish some preliminary feature matches between these images. The performance of a matcher depends solely on the efficiency of the selected features. For good accuracy, it is recommended to use several feature descriptors at the same time.
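A hedged sketch of such feature matching follows, using two of the distance measures compared later in this thesis (Euclidean and chi-squared); the gallery and probe vectors are assumed to be the concatenated descriptors produced by a feature-extraction stage, and the random vectors below merely stand in for real data.

```python
import numpy as np

def euclidean(a, b):
    return np.linalg.norm(a - b)

def chi_squared(a, b, eps=1e-10):
    """Chi-squared distance, commonly used with histogram features such as LBP or HOG."""
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))

def match(probe, gallery, distance=chi_squared):
    """gallery: dict of subject id -> feature vector. Returns the closest identity."""
    return min(gallery, key=lambda sid: distance(gallery[sid], probe))

# Illustrative usage with random vectors standing in for real descriptors.
rng = np.random.default_rng(0)
gallery = {f"subject_{i}": rng.random(128) for i in range(100)}
probe = rng.random(128)
print(match(probe, gallery), match(probe, gallery, distance=euclidean))
```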

2.5 INFERENCE FROM THE LITERATURE SURVEY


Facial Recognition (FR) has become an important technology for handling the tremendously growing need for identification and verification over the last century. There are large numbers of commercial, security and forensic applications requiring the use of face recognition technologies. The key advantage of facial recognition is that it requires no physical interaction on behalf of the users. Most face recognition systems depend on the use of face images captured in the visible range of the electromagnetic spectrum, i.e. 380-750 nm. However, in real-world scenarios (military and law enforcement) it is necessary to deal with harsh environmental conditions characterized by unfavourable lighting, pronounced shadows and long distance. One such example is a night-time military environment, where human recognition based solely on visible-spectrum images may not be feasible. In order to deal with such difficult FR scenarios, multi-spectral camera sensors are very useful because they can image day and night. Thus, recognition of faces across the infrared spectrum has become an area of growing interest. The infrared (IR) spectrum is divided into different spectral bands based on the response of various detectors, i.e. the active IR and the thermal (passive) IR band. The active IR band (0.7-2.5 µm) is divided into the NIR (near infrared) and the SWIR (short wave IR) spectrum. NIR has the advantage that it can capture face images at a large standoff during night time which contain sufficient information for face recognition. However, the literature shows that there is a great need for improvement in the area of long distance and night time face recognition, which is the main goal of this project.

The challenges with regard to long distance and night time facial recognition technologies are: expensive long-range IR cameras; image quality (e.g. image resolution, compression, blur and noise); time span (facial aging); occlusion; demographic information (e.g. gender, race/ethnicity or age); variations in pose, expression and illumination, which depend on the operational environment; and the below-par performance of FR algorithms and software packages. Existing face recognition systems have addressed some of these challenges through new image acquisition set-ups, cross-spectral and cross-distance face databases and new face recognition algorithms. However, the problems related to image quality, image variations and recognition accuracy still exist, and addressing them constitutes the main objectives of this project work.

The proposed face recognition system will be of considerable use to military and law enforcement agencies. It can be installed at checkpoints at base camps and other places targeted by suicide bombers. It is expected to provide identification/verification of (i) persons with felony or misdemeanor records, (ii) persons affiliated with a terrorist organization, or (iii) persons reported to a law-enforcement agency as missing. The literature most closely related to this research is summarized in Table 2.1.

2.6 CONCLUSION
This review presents the various literatures pertaining to the different stages of long distance and night time face recognition.
Table 2.1 Summarization of Literatures related to Long Distance and Night Time Face Recognition

1. Kang et al, 2014
   Image Acquisition & Setup: Canon 600D DSLR with telephoto lens and RayMax300 illuminator
   Database: LDHF; 100 subjects; 1 m, 60 m, 100 m and 150 m
   Pre-processing: Geometric and photometric normalization, image restoration
   Feature Extraction: Combination of three filters (DoG, CSDN and Gaussian) + descriptors (SIFT, MLBP)
   Matching: RS-LDA
   Merits & Demerits: Up to 150 m, still images, cross-spectral and cross-distance matching; rank-1 accuracy at 60 m, 100 m and 150 m of 82%, 69% and 28% respectively

2. Bourlai et al, 2012
   Image Acquisition & Setup: Canon EOS 5D Mark II, Canon PowerShot SX110, Goodrich SU640 and the XenICs Xeva-818, FLIR Systems NIR camera with PTZ platform
   Database: VIS, SWIR, MWIR and NIR; 30, 60, 90 and 120 m
   Pre-processing: Geometric and photometric normalization (CLAHE, SSRlog, SSRatan)
   Feature Extraction: LBP, LTP
   Matching: PCA, PCA+LDA, BIC, ML, MAP
   Merits & Demerits: Intra-spectral, cross-spectral and cross-distance; rank-1 identification rates of 0.998, 0.996, 0.968, 0.952 (intra-spectral) and 0.988, 0.985, 0.939, 0.922 (cross-spectral and cross-distance)

3. Bourlai et al, 2012
   Image Acquisition & Setup: NIR camera with PTZ platform
   Database: Unknown
   Pre-processing: Geometric normalization, masking, histogram equalization, pixel normalization
   Feature Extraction: LBP, LTP
   Matching: PCA, PCA+LDA, BIC, ML, MAP
   Merits & Demerits: Baseline G8 identification rate of 100%; intra-spectral and cross-distance LDA (CMC) with 80% training set

4. Maeng et al, 2011
   Image Acquisition & Setup: NIR camera, telescope and NIR illuminator
   Database: NFRAD-DB; 1 m and 60 m
   Pre-processing: Histogram equalization, Difference of Gaussian (DoG) filtering
   Feature Extraction: SIFT and MLBP
   Matching: FaceVACS, DoG-SIFT and DoG-MLBP
   Merits & Demerits: Still images, illumination pattern; DoG-SIFT gives high performance (CMC)

5. Yao et al, 2007
   Image Acquisition & Setup: Camcorder, telescope, eye piece
   Database: UTK-LRHM
   Pre-processing: Wavelet-transform-based multi-scale processing for restoring and enhancing data with high magnification values
   Feature Extraction: Face recognition engines (FaceIt, VeriLook)
   Matching: SR+WL
   Merits & Demerits: Only visual images; SR+WL produced better results than the original images (CMC)

6. Goswami, 2010
   Image Acquisition & Setup: Turntable consisting of several pillars holding adjacent NIR-VIS cameras at different heights
   Database: 2103 NIR and 2086 VIS images of 430 subjects
   Pre-processing: Photometric normalization (Sequential Chain Preprocessing (SQ), Single Scale Retinex (SSR), Self Quotient Image (SQI))
   Feature Extraction: LBP; dimensionality reduction with LDA
   Matching: Nearest Neighbor classifier with two distance measures (Chi-squared histogram distance, normalized correlation), LDA, LDA+CCA
   Merits & Demerits: Cross-spectral, only short distances; VIS-->NIR outperforms NIR-->VIS; cross-distance not addressed; the SQ combination gives the best pre-processing

7. Rara et al, 2009
   Image Acquisition & Setup: Not specified
   Database: FRGC
   Pre-processing: Not specified
   Feature Extraction: MAP-MRF, AAM
   Matching: Moment-based recognition, PCA, Procrustes
   Merits & Demerits: Up to 33 m; 2D Procrustes outperforms the 3D version; identification rate stable for 3 m and 15 m, highly unstable for 33 m

8. Li et al, 2013
   Image Acquisition & Setup: One WFOV video camera and two NFOV cameras with IR filter and powerful IR illuminators
   Database: Human subjects inspected at 20 to 30 m distance
   Pre-processing: Global motion estimation, local motion estimation, undesired motion removal and image deblurring
   Feature Extraction: PCA
   Matching: PCA
   Merits & Demerits: Intra-spectral matching; stabilization and deblurring increased the detection score from 0.58 to 0.64 with no false alarms; recognition score increased from 0.12 to 0.79

9. Medioni et al, 2009
   Image Acquisition & Setup: Two-camera system consisting of an inexpensive large field of view video camera and a narrow-focus high-resolution camera
   Database: 3D and 2D face data
   Pre-processing: Not specified
   Feature Extraction: AdaBoost classifier
   Matching: 3D matching engine
   Merits & Demerits: Visual images; 2D and 3D give better performance at shorter distances; as distance increases the performance of 3D decreases

10. Chen et al, 2009
    Image Acquisition & Setup: Not specified
    Database: 6 samples from 250 subjects
    Pre-processing: Not specified
    Feature Extraction: LBP, multi-resolution LBP, LBP histogram
    Matching: Manifold learning
    Merits & Demerits: Intra-spectral; recognition performance increases from 2% to 94.2% at rank 1 (homogeneous illumination condition) and from 3% to 97.3% at rank 1 (heterogeneous)
A number of authors have experimented with face recognition systems in controlled environments, but not much work has focused on uncontrolled conditions. This report has compiled the various image acquisition setups, databases, pre-processing techniques, feature descriptors and classifiers that have been used. Most of the reported work relies on still images with only frontal views, and comparatively little work has addressed long distance face recognition.

Summarizing the methods discussed in this chapter, each has its own pros and cons and each is effective in its own field of usage. Although some schemes become intricate and their computational cost in terms of time and space can grow high, the trade-off is made in favour of functionality. The main purpose of conducting this survey is to bring together all the work related to long distance and night time face images into one document.

CHAPTER 3

FACE DETECTION

3.1 INTRODUCTION

Face detection is a technology used to spot faces in an image. It is the very first and most important step of face recognition. The difference between face detection and face recognition is that the latter gives the identity of the person whereas the former gives the locations of the faces (Bakshi U. et al. 2014). Face detection can be regarded as one of the most complex and challenging problems in the arena of computer vision, because of the large intra-class variations created by changes in facial appearance, lighting and expression. Furthermore, in applications such as real-time surveillance and biometrics, camera restrictions and pose variations make the distribution of human faces in feature space more dispersed and complex than that of frontal faces (Hatem et al, 2015). This further complicates the problem of robust face detection and makes it difficult to spot the locations of faces in an image precisely. It should also be noted that there are numerous variables affecting detection performance, including the wearing of glasses, different skin coloring, gender, facial hair and facial expressions (Rath et al, 2014).

According to Yang's (2004) survey, the techniques for face detection can be broadly divided into four types: knowledge-based, feature invariant, template matching and appearance-based. Many face detection methods emphasize detecting frontal faces under good lighting conditions (Abdullah et al, 2014).

• Knowledge-based methods employ human-coded rules to represent facial features, such as two symmetric eyes, a nose in the middle and a mouth underneath the nose.
• Feature invariant methods attempt to discover facial features that are invariant to pose, lighting condition or rotation. Skin color, edges and shapes fall under this category.
• Template matching methods compute the correlation between a test image and pre-selected facial templates.
• Appearance-based methods employ machine learning to extract discriminative features from a pre-labeled training set. The Eigenface method is the most basic method in this category. Face detection algorithms proposed in more recent years, such as support vector machines, neural networks (Rowley et al, 1998), statistical classifiers (Schneiderman-Kanade, 2000) and AdaBoost-based face detection, also fit into this category.

Several databases encompassing NIR images are available for face detection. The Equinox dataset (Singh et al, 2012) was established with 90 individuals at a resolution of 240×320 pixels. This database contains visible and IR images taken under different controlled lighting conditions. The surveillance camera face database (Barnouti et al, 2016) is used as a real-world dataset that covers a considerable portion of the IR spectrum for collecting data at night time; around 4160 images were taken from 130 subjects in both day and night time (Anil K Jain, 2012). The WVUM database has 1250 images from 50 subjects at different poses; for every pose, nine multi-spectral images were acquired, corresponding to 100 nm wide spectral sub-bands in the range from 950 nm to 1650 nm. The Hong Kong Polytechnic University NIR face database (Varsha Gupta et al, 2015) provides a huge number of NIR images from its biometric centre. The images of each subject differ in pose, angle and expression. About 34000 images are provided, with a large number of them captured in the 780-1100 nm spectral band. The CASIA NIR-VIS 2.0 face database (Mohsen et al, 2017) contains 725 subjects with a resolution of 640×480 pixels in both NIR and VIS images; the number of images available in this dataset is 17500.

3.2 THE VIOLA JONES FACE DETECTOR

The Viola-Jones face detector rests on three main ideas that make it possible to build an effective face detector that runs in real time: the integral image, classifier learning with AdaBoost and the attentional cascade structure.

3.2.1 Integral image and Feature Extraction

The very first step of the Viola-Jones face detection algorithm (Paula Viola et al, 2004) is to convert the input image into an integral image, also called a summed area table, which is designed for quickly and efficiently computing the sum of values in a rectangular subset of a pixel grid. The integral image at location (x, y) contains the sum of the pixels above and to the left of (x, y), as given in Equation 3.1

ii(x, y) = \sum_{x' \le x,\, y' \le y} i(x', y')    (3.1)

where i(x, y) is the pixel value of the original image and ii(x, y) is the corresponding integral image value.

Input image (3×3, all pixels equal to 1):   1 1 1 / 1 1 1 / 1 1 1
Corresponding integral image:               1 2 3 / 2 4 6 / 3 6 9

Figure 3.1 Illustration of Integral Image Generation

With the help of the integral image, calculating the sum of any rectangular area is extremely efficient, as illustrated in Figure 3.1. The sum of the pixels in a rectangle ABCD can be estimated with only four values from the integral image, as shown in Equation 3.2

\sum_{(x, y) \in ABCD} i(x, y) = ii(D) + ii(A) - ii(B) - ii(C)    (3.2)

Haar-like features are rectangular digital image features that resemble Haar wavelets. The concept of such features, introduced by Papageorgiou et al in 1998, delivered a method for encoding image properties in a form that can be computed much more quickly than working directly with the RGB values at every pixel, which makes feature calculation computationally expensive. A Haar-like feature considers adjacent rectangular regions at a definite location in a detection window, sums up the pixel intensities in each region and calculates the difference between these sums. Viola and Jones also extended this set by defining similar features comprising three and four rectangles.
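To make Equations 3.1 and 3.2 concrete, the following sketch (a small NumPy illustration, not part of the original detector implementation) computes an integral image with cumulative sums and evaluates a rectangle sum with four look-ups; the 3×3 all-ones input reproduces the example of Figure 3.1.

```python
import numpy as np

def integral_image(img):
    """Integral image ii(x, y): sum of all pixels above and to the left (Eq. 3.1)."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in the rectangle [top..bottom, left..right] using only
    four look-ups of the integral image (Eq. 3.2)."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

img = np.ones((3, 3), dtype=np.int32)   # the example of Figure 3.1
ii = integral_image(img)
print(ii)                               # [[1 2 3], [2 4 6], [3 6 9]]
print(rect_sum(ii, 1, 1, 2, 2))         # bottom-right 2x2 block of ones -> 4
```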

3.2.2 AdaBoost Learning

Given a feature set along with a training set of positive and negative images, any number of machine learning methodologies could be used to learn a classification function. Viola and Jones use a modified version of AdaBoost to select a small set of features and train the classifier (Paula Viola, 2001). A single AdaBoost classifier consists of a weighted sum of many weak classifiers, where each weak classifier is a threshold on a single Haar-like rectangular feature. The weight associated with a given sample is adjusted based on whether or not the weak classifier correctly classifies the sample. A single weak classifier can be defined as in Equation 3.3

h(x, f, p, \theta) = \begin{cases} 1 & \text{if } p f(x) < p\theta \\ 0 & \text{otherwise} \end{cases}    (3.3)

where f denotes the feature value, θ is the threshold and p is the polarity indicating the direction of the inequality. The steps in the implementation of the AdaBoost learning procedure are as follows:
1. Given training sample images (x1, y1), ..., (xn, yn), where yi = 0, 1 for negative and positive examples respectively.

2. Initialize the classifier count t = 0 and the sample weights w_i = \frac{1}{2m}, \frac{1}{2l} for yi = 0, 1 respectively, where m and l are the numbers of negative and positive samples.

3. While the number of negative samples rejected is less than 50%:

(a) Increment t = t + 1.

(b) Normalize the weights w_i = \frac{w_i}{\sum_j w_j}.

(c) Select the best weak classifier with respect to the weighted error

\epsilon_t = \min_{f, p, \theta} \sum_i w_i \, | h(x_i, f, p, \theta) - y_i |    (3.4)

(d) Define ht(x) = h(x, ft, pt, θt), where ft, pt and θt are the minimizers of \epsilon_t.

(e) Update the weights as w_i = w_i \beta_t^{1 - e_i}, where \beta_t = \frac{\epsilon_t}{1 - \epsilon_t} and e_i = 0 if x_i is classified correctly, otherwise e_i = 1.

(f) Compute the strong classifier

H(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \gamma_t \\ 0 & \text{otherwise} \end{cases}    (3.5)

where \alpha_t = \log \frac{1}{\beta_t} and \gamma_t is chosen such that all positive training samples are correctly classified.

(g) Assess the negative samples with the newly computed strong classifier H and update the number of rejected negative samples.

To compute the minimum error in step (c), it is necessary to search over every possible threshold for every single feature over all training samples. Nevertheless, for a given feature, the optimal θ can be found in a single pass through a list of the training samples sorted by feature value. This can be achieved by maintaining four sums: the total sum of positive sample weights T+, the total sum of negative sample weights T−, the sum of positive sample weights below the current sample w+, and the sum of negative sample weights below the current sample w−. The error for a threshold which splits the range between the current and the previous sample in the sorted list can then be computed as e = min(w+ + (T− − w−), w− + (T+ − w+)). The first term in the min function is the error obtained when all samples below the current sample are labeled as negative and the samples above are labeled as positive; in this case, the polarity of the weak classifier should be p = −1.
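The threshold search just described can be sketched as follows. This is an illustrative NumPy implementation of the single-pass scan over one sorted feature; the variable names (feature_values, labels, weights) are chosen for this example and are not taken from the original work.

```python
import numpy as np

def best_stump_threshold(feature_values, labels, weights):
    """Single pass over one sorted feature maintaining the running sums T+, T-,
    w+ and w-, returning the (error, threshold, polarity) minimising
    e = min(w+ + (T- - w-), w- + (T+ - w+))."""
    f = np.asarray(feature_values, dtype=float)
    y = np.asarray(labels)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(f)
    f, y, w = f[order], y[order], w[order]

    T_pos, T_neg = w[y == 1].sum(), w[y == 0].sum()
    s_pos = s_neg = 0.0                          # w+ and w-: weights below the current sample
    best_err, best_thr, best_pol = np.inf, None, None

    for i in range(len(f)):
        # threshold placed between the previous and the current sample
        thr = f[i] if i == 0 else 0.5 * (f[i - 1] + f[i])
        e_neg_below = s_pos + (T_neg - s_neg)    # below -> negative, above -> positive (p = -1)
        e_pos_below = s_neg + (T_pos - s_pos)    # below -> positive, above -> negative (p = +1)
        err = min(e_neg_below, e_pos_below)
        if err < best_err:
            best_err, best_thr = err, thr
            best_pol = -1 if e_neg_below <= e_pos_below else +1
        if y[i] == 1:
            s_pos += w[i]
        else:
            s_neg += w[i]
    return best_err, best_thr, best_pol
```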

3.2.3 Cascade Classifier

The cascaded classifier consists of several stages, each containing a strong classifier produced by AdaBoost. The job of each stage is to determine whether a given sub-window is definitely not a face or may be a face. When a sub-window is classified as a non-face by a given stage it is immediately discarded. Conversely, a sub-window classified as a may-be-face is passed on to the next stage in the cascade. It follows that the more stages a given sub-window passes, the higher the chance that the sub-window contains a face.

3.3 ANALYSIS OF VIOLA JONES FACE DETECTOR ON THE LONG


DISTANCE HETEROGENEOUS FACE DATABASE

3.3.1 The LDHF Database Description

The Long Distance Heterogeneous Face Database (LDHF) was developed by Korea University and contains both VIS and NIR images of 70 male and 30 female subjects. The images were acquired at four distances: one meter, 60 m, 100 m and 150 m. The one meter images were taken indoors and serve as the reference images. Each subject was captured under fluorescent light using a DSLR camera with a Canon F1.8 lens, and the corresponding NIR images were collected using a modified DSLR camera with an NIR illuminator of 24 IR LEDs and no visible light. Long distance (over 60 m) VIS images were collected during the daytime using a telephoto lens coupled with a DSLR camera, and the NIR images were collected using the DSLR camera with NIR light provided by a RayMax300 illuminator. Each subject was captured both in the daytime and at night.

3.3.2 Experimental Results and Discussion:

This research work considers only the NIR images taken at night at the various standoff distances as probe images, while the 1 m VIS images are considered as the gallery data. At each distance, 100 images are taken into account for the experiments. Even though the Viola Jones algorithm is a proven face detection algorithm, its performance on long distance images has not been discussed in the literature. In this work, the performance of the Viola Jones face detector on the LDHF database was therefore analyzed.
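As an illustration of how such an evaluation could be scripted, the following sketch applies OpenCV's pre-trained frontal-face Haar cascade (an off-the-shelf Viola-Jones implementation) to a probe image; the file names and detector parameters are placeholders, not the exact settings used in this work.

```python
import cv2

# OpenCV ships a trained Viola-Jones cascade for frontal faces.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("probe_150m_nir.png")            # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.equalizeHist(gray)                     # mild contrast boost for NIR frames

# scaleFactor and minNeighbors control the multi-scale scan and the grouping of hits.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                 minSize=(24, 24))

for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detected.png", img)
```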
Table 3.1 The detection rates at different distances

S.No  Distance & Type   Perfect Detection %   Partial Detection %   No Detection %   Wrong Detection %   Total Detection %
1.    1 m VIS           89                    11                    0                0                   94.5
2.    1 m NIR           90                    10                    0                0                   95
3.    60 m VIS          22                    78                    0                0                   61
4.    60 m NIR          51                    49                    0                0                   75.5
5.    100 m VIS         8                     92                    0                0                   54
6.    100 m NIR         38                    60                    1                1                   68
7.    150 m VIS         19                    81                    0                0                   59.5
8.    150 m NIR         44                    24                    25               7                   56

The detection rates at the various distances are shown in Table 3.1, which indicates that the performance of the benchmark detector decreases as the standoff distance increases. Figure 3.3 shows perfect detection of NIR images at 60 m. Even though the detection rate at 60 m is much higher than at 100 m and 150 m, the Viola Jones detector still produces partial detections and false acceptances at this distance, as shown in Figure 3.4. Figure 3.5 shows the accurate detection achieved at 100 m, while Figure 3.6 shows a partial detection with false acceptances at 100 m, in which two non-face regions are detected along with the correctly detected face. Figure 3.7 shows a false rejection (no detection) at 100 m. Figure 3.8 shows the accurate detection of NIR images at 150 m, and Figures 3.9 and 3.10 show a false acceptance and a false rejection at 150 m respectively.

Table 3.1 reflects the level of accuracy and efficiency of the Viola Jones algorithm on NIR images. It is to be noted that the perfect detection rate at 100 m is found to be lower than at 150 m. This is because the discrimination is affected by background features merging with the facial features. For easier comparison, a graphical representation of Table 3.1 is shown in Figure 3.2.

Figure 3.2 Comparison of detection rates (accurate detection, false acceptance and false rejection) at 1 m, 60 m, 100 m and 150 m

Figure 3.3 Perfect detection at 60 m

Figure 3.4 Partial detection at 60m

Figure 3.5 Perfect detection at 100m.

Figure 3.6 Partial detection at 100m.

Figure 3.7 No detection at 100m

Figure 3.8 Perfect detection at 150m

Figure 3.9 Partial detection at 150m

Figure 3.10 No detection at 150m.

3.4 CONCLUSION

Face detection is well known to be one of the most challenging problems in image processing, and no solution has yet been achieved whose performance is comparable to humans in both precision and speed. In most cases, a gain in precision is attained at the expense of a decline in run-time performance (computational time); since many applications demand high precision, managing the computation so as to reduce processing time becomes a problem with hard constraints.

The task of detecting a face in an image is not an easy problem, because many difficulties arise and must be taken into account in order to achieve a good recognition rate. Faces generally occupy very little area in most images and they are usually located arbitrarily, which means that face detection algorithms must search over all areas of a given image to be successful. Some methods perform an introductory scan over the image in an attempt to find the areas of interest early on. Algorithms must also take into account the fact that faces vary greatly in many aspects such as size, complexion and how they are accessorized. Furthermore, faces in images can look very different depending on their
orientation and pose. In particular, the seminal work by Viola and Jones has made
face detection practically feasible in some applications such as digital cameras and
photo organization software.

In this work, the effect and performance of the Viola Jones face detection algorithm on long distance NIR images have been described. It is encouraging to see the Viola-Jones algorithm increasingly being used in face recognition as well as in other object detection problems, but when it is applied to NIR images the difficulty of face detection increases. The Viola Jones algorithm detects only frontal, upright face images, and the experimental results show its ineffectiveness on NIR images owing to dark pixels. Hence, the features extracted from the NIR images need to be improved.

The Viola Jones algorithm may be modified to improve its detection of NIR images and to overcome its inefficiency with respect to the reflectance effects present during detection. Another interesting idea for improving face detection performance is to consider contextual information; in an environment with low variations, such adaptation could bring significant improvements to face detection.

CHAPTER 4

PRE-PROCESSING

4.1 INTRODUCTION

The main aim of pre-processing is to improve the image data by suppressing unwanted distortions or enhancing image features that are important for further processing. In simple words, pre-processing prepares the image for feature extraction. Raw face images captured by a surveillance camera may suffer from a number of problems arising from lighting conditions, noise due to dust inside the camera, faulty CCD (Charge Coupled Device) elements, and salt and pepper noise caused by sharp and sudden distortions in the image signal.

In addition, the most serious factors affecting the performance of face recognition systems are strong variations in pose and illumination. Often, the variation between images of different faces is smaller than the variation between images of the same face taken in a variety of environments: the differences between images of one face under different illumination conditions can be greater than the differences between images of different faces under the same illumination. Accordingly, the changes induced by illumination can be larger than the differences between individuals, causing systems based on comparing images to misclassify the identity of the input image (Adini et al., 1997). In a long distance and night time face recognition system, the greatest challenge is therefore to bridge the modality gap between the different spectral modes (VIS vs NIR) and the difference in distance between the probe and the gallery images. Reducing the gap between spectral modes and the differences in distance between the probe and gallery images is accomplished using photometric normalization and DoG filtering respectively, as elucidated in the following sections.

4.2 PROPOSED APPROACH FOR PRE-PROCESSING

NIR face images appear different from VIS face images, and NIR images become blurred as the stand-off distance increases. Moreover, the gallery images are acquired indoors while the probe images are captured outdoors. To address these problems, pre-processing, as shown in Figure 4.1, is applied to the images to reduce the appearance difference between NIR and VIS images and to enhance the image quality.

Median Filtering → Wavelet Normalization → Difference of Gaussian Filtering
(the first two stages together constitute the Photometric Normalization)

Figure 4.1 Basic stages of pre-processing in the proposed approach

The pre-processing stage involves two major steps: Photometric


Normalization and Difference of Gaussian (DoG) Filtering. Photometric
Normalization includes Noise Removal and Contrast Enhancement. Contrast has
been an important factor in any subjective evaluation of image quality. Improving
the contrast of the image is very important as the images taken at night time are
usually of very low contrast and they are also sensitive to illumination variations.
The different stages in the proposed approach are explained in detail.

4.2.1 Median Filtering

The median filter is a nonlinear digital filtering technique commonly used to remove noise from an image or signal. Median filtering has the great advantage of preserving edges while removing noise; this characteristic is particularly important in this work, since most of the features used for matching are extracted from edges. Median filtering is a smoothing technique, like linear Gaussian filtering. Almost all smoothing techniques are effective at removing noise in smooth patches or smooth regions of a signal, but adversely affect edges. Median filtering is particularly effective at removing salt and pepper noise (a condition in which an image has dark pixels in bright regions and bright pixels in dark regions).

Input patch (5×5):
123 125 126 130 140
122 124 126 127 135
118 120 150 125 134
119 115 119 123 133
111 116 110 120 130

Neighbourhood values of the centre pixel (sorted): 115, 119, 120, 123, 124, 125, 126, 127, 150
Median value: 124

Figure 4.2 Calculating the median value of a pixel neighborhood

The median filter considers each pixel in the image in turn and looks at its nearby neighbors to decide whether or not it is representative of its surroundings. Instead of simply replacing the pixel value with the mean of the neighboring pixel values, it replaces it with the median of those values. The median is estimated by first sorting all the pixel values from the adjoining neighborhood w into numerical order and then replacing the pixel under consideration with the middle pixel value.

y[m, n] = \text{median}\{\, x[i, j] : (i, j) \in w \,\}    (4.1)

where w represents a neighborhood defined by the user, centered around location [m, n] in the image. If the neighborhood under consideration contains an even number of pixels, the average of the two middle pixel values is used. Figure 4.2 illustrates an example calculation. As can be seen, the central pixel value of 150 is rather unrepresentative of the surrounding pixels and is replaced with the median value of 124. A 3×3 square neighborhood is used here; it is to be noted that larger neighborhoods produce more severe smoothing and thus affect edges.

The median is a more robust average than the mean, so a single highly unrepresentative pixel in a neighborhood does not affect the median value significantly. Since the median value must be the value of one of the pixels in the neighborhood, the median filter does not generate new unrealistic pixel values when the filter straddles an edge. For this reason, the median filter is much better at preserving sharp edges than the mean filter.
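A minimal sketch of the 3×3 median filtering of Equation 4.1, using SciPy (an implementation choice assumed here, not prescribed by the text) on the patch of Figure 4.2:

```python
import numpy as np
from scipy.ndimage import median_filter

patch = np.array([[123, 125, 126, 130, 140],
                  [122, 124, 126, 127, 135],
                  [118, 120, 150, 125, 134],
                  [119, 115, 119, 123, 133],
                  [111, 116, 110, 120, 130]])

# 3x3 median filtering (Eq. 4.1): each pixel is replaced by the median of its window.
smoothed = median_filter(patch, size=3)

# The outlier 150 at the centre is replaced by the median of its neighbourhood, 124.
print(patch[2, 2], "->", smoothed[2, 2])
```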

4.2.2 Wavelet Normalization


Wavelet normalization is a technique used to normalize illumination
in the facial images. In this method, the image is decomposed into subbands of
low frequency and high frequency components which are subsequently
manipulated individually. Histogram equalization is applied to the approximation
(low frequency) coefficients and at the same time the detail (high frequency)
coefficients are accentuated by multiplying by a scalar (>1) so as to enhance
edges. A normalized image is acquired from the modified coefficients by applying
inverse wavelet transform. The resultant image has not only enhanced contrast but
also enhanced edges and details that will facilitate the further face recognition
task. The block diagram of the proposed scheme is shown in the Figure below.

Wavelet Transform → [Approximation Coefficients → Histogram Equalization; Detail Coefficients → Scalar Multiplication] → Inverse Wavelet Transform

Figure 4.3 Block Diagram of Wavelet Normalization

Contrast enhancement is achieved by equalizing the histogram of the image pixel gray-levels in the spatial domain so as to redistribute them uniformly. Histogram equalization reorganizes the gray-levels of the image using the histogram statistics: a mapping function is applied to the original gray-level values such that the cumulative density function of the processed image histogram approximates a straight line. This redistribution of pixel brightness towards a uniform distribution improves the contrast of the image. Here, histogram equalization is used to enhance the contrast of the approximation coefficients; therefore, the illumination of the approximation image is also normalized.

Edge enhancement highlights the fine details in the original image. The perceptibility of edges and small features can be improved by enlarging the amplitude of the high frequency components of the image. To accentuate the details, each element in the detail coefficient matrix is multiplied by a scale factor (equal to 2 in our case).
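The wavelet normalization of Figure 4.3 can be sketched as below. The use of PyWavelets and OpenCV, the single-level decomposition, the specific Daubechies order and the rescaling of the coefficient ranges are assumptions of this illustration; the histogram equalization of the approximation band and the detail gain of 2 follow the text.

```python
import numpy as np
import pywt
import cv2

def wavelet_normalize(gray, wavelet="db2", detail_gain=2.0):
    """Single-level wavelet normalization: equalize the approximation band,
    amplify the detail bands and reconstruct (Figure 4.3)."""
    cA, (cH, cV, cD) = pywt.dwt2(gray.astype(np.float32), wavelet)

    # Histogram-equalize the low-frequency (approximation) coefficients.
    cA_u8 = cv2.normalize(cA, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    cA_eq = cv2.equalizeHist(cA_u8).astype(np.float32)
    # Map the equalized band back to the original coefficient range.
    cA_eq = cA_eq / 255.0 * (cA.max() - cA.min()) + cA.min()

    # Multiply the high-frequency (detail) coefficients by a scalar > 1 to enhance edges.
    out = pywt.idwt2((cA_eq, (detail_gain * cH, detail_gain * cV, detail_gain * cD)),
                     wavelet)
    return cv2.normalize(out, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```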

4.2.3 Difference of Gaussian Filtering

The Difference of Gaussian (DoG) filter is a type of filter that convolves the original grey image with Gaussian kernels in order to smooth it in the spatial domain; its transfer function is the difference of two Gaussians with different widths. In this approach the DoG filter acts as a bridge between the VIS and NIR images: it suppresses the variations and highlights the similarities between the two different modes, specifically the VIS and the NIR images. It also acts as a feature enhancement procedure, since it can be used to increase the visibility of edges.

The DoG (V. Struc, 2009) is in reality a band-pass filter that removes high frequency components representing noise as well as some low frequency components representing the homogeneous areas in the image. The frequency components of the passing band are assumed to be associated with the edges in the image. The DoG normalization can be constructed using Equation 4.2.

DoG \triangleq G_{\sigma_1} - G_{\sigma_2} = \frac{1}{\sqrt{2\pi}} \left( \frac{1}{\sigma_1} e^{-(x^2+y^2)/2\sigma_1^2} - \frac{1}{\sigma_2} e^{-(x^2+y^2)/2\sigma_2^2} \right)    (4.2)

The difference of Gaussians algorithm removes high frequency detail


that often includes random noise, rendering this approach one of the most suitable
for processing images with a high degree of noise. A major drawback to
application of the algorithm is an inherent reduction in overall image contrast
produced by the operation.
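A sketch of the DoG filtering of Equation 4.2, implemented as the difference of two Gaussian-blurred copies of the image; the σ values shown are illustrative and not the settings used in the experiments.

```python
import cv2
import numpy as np

def dog_filter(gray, sigma1=1.0, sigma2=2.0):
    """Difference of Gaussians (Eq. 4.2): band-pass filtering obtained as the
    difference of two Gaussian-smoothed versions of the image."""
    g = gray.astype(np.float32)
    narrow = cv2.GaussianBlur(g, (0, 0), sigma1)   # keeps finer detail
    wide = cv2.GaussianBlur(g, (0, 0), sigma2)     # keeps only coarse structure
    dog = narrow - wide
    return cv2.normalize(dog, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```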

4.3 PHOTOMETRIC NORMALIZATION TECHNIQUES

A real-world face recognition system needs to work under different imaging conditions, with varying illumination, pose, facial expression, image quality and so on. However, the actual performance of face recognition systems under such varying conditions shows that they are still too unreliable to be deployed in real-world applications. Illumination in particular is a major problem in face recognition. Several researchers have expressed the problem as follows: the variation between images of different faces can be smaller than the variation between images of the same face under different illumination. It can even be shown that illumination causes larger variation in face images than pose (V. Štruc et al, 2009).

The illumination problem arises due to uneven lighting on faces. This uneven lighting introduces variations in illumination which greatly affect classification, since the facial features used for classification are distorted by this variation (F. Xiaolong et al, 2009). This section provides a brief description of the photometric normalization techniques used in existing face recognition systems.

4.3.1 The Non-Local Means (NLM)


The photometric normalized image Id(x) in the Non Local Means algorithm (V. Štruc, 2011) is constructed by calculating each value of Id(x) as a weighted average of the pixels of In(x), given by Equation 4.3,

I_d(z) = \sum_{x \in I_n} w(z, x)\, I_n(x)    (4.3)


where, w(z, x) represents the Gaussian weighting function (W. Chen,
2006) that estimates the similarity between the local neighborhoods of the pixel at
the spatial locations z and x. The weighting function can be defined by Equation
4.4

w(z, x) = \frac{1}{Z(z)} \exp\!\left( - \frac{ G_{\sigma} \left\| I_n(\Omega_x) - I_n(\Omega_z) \right\|_2^2 }{ h^2 } \right)    (4.4)

where, Gσ represents a Gaussian kernel with the standard deviation σ,
Ωx and Ωz denotes the local neighborhoods of the pixels at the locations x and z,
respectively, h stands for the parameter that controls the decay of the exponential
function, and Z(z) represents a normalizing factor.

4.3.2 The Adaptive Non-Local Means (ANLM)

The Adaptive Non Local Means Algorithm is a modified version of the


non-local means algorithm in which the decay parameter h is a function of local
contrast and not a fixed and preselected value (V. Štruc, 2011). The local contrast
between neighboring pixel locations a and b can be defined as in Equation 4.5,

\sigma_{a,b} = \frac{ | I_n(a) - I_n(b) | }{ | I_n(a) + I_n(b) | }    (4.5)

By assuming that a is an arbitrary pixel location within In(x) and b denotes a neighboring pixel location (above, below, left or right of a), four contrast images encoding the local contrast in one of the four possible directions can be built. Finally, the contrast image Ic(x) is estimated as the average of the four directional contrast images. To link the decay parameter h with the contrast image, the logarithm of the inverse of the (8-bit gray-scale) contrast image from Equation 4.5 is first computed,

I_{ic}(x) = \log\!\left[ \frac{1}{I_c(x)} \right]    (4.6)

where the numerator denotes a unit matrix. Subsequently, the values of the inverted contrast image Iic(x) are linearly mapped to values of the decay parameter h, which thus becomes a function of the spatial location,

h(x) = \frac{ I_{ic}(x) - I_{ic}^{min} }{ I_{ic}^{max} - I_{ic}^{min} } \,( h_{max} - h_{min} ) + h_{min}    (4.7)

where Iicmax and Iicmin denote the maximum and minimum value of the
inverted contrast image Iic(x), respectively, and hmax and hmin stand for the target
maximum and minimum values of the decay parameter h.

4.3.3 Single Scale Retinex (SSR)


The single scale retinex technique (D. Jobson et al, 1997), originally named the center/surround retinex algorithm, can be derived from Equation 4.8 shown below under the common assumptions regarding the characteristics of the reflectance and luminance functions of an image.

R'(x, y) = \log l(x, y) - \log\,[\, l(x, y) * k(x, y) \,]    (4.8)

where “*” denotes the convolution operator, k(x,y) denotes a smoothing


kernel and R’(x,y) stands for the illumination invariant reflectance output of the
single scale retinex algorithm and l(x,y) denotes the luminance of the input image.
It is imperative to choose an appropriate smoothing kernel while implementing
the SSR algorithm. One prominent way is to represent it in the form of a Gaussian
function.
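A sketch of the single scale retinex of Equation 4.8 with a Gaussian surround kernel; the σ value and the small constant guarding the logarithm are assumptions of this illustration.

```python
import cv2
import numpy as np

def single_scale_retinex(gray, sigma=15.0, eps=1.0):
    """SSR (Eq. 4.8): R'(x, y) = log I(x, y) - log[I(x, y) * k(x, y)],
    with k a Gaussian smoothing kernel and eps guarding against log(0)."""
    img = gray.astype(np.float32) + eps
    blurred = cv2.GaussianBlur(img, (0, 0), sigma)
    retinex = np.log(img) - np.log(blurred)
    return cv2.normalize(retinex, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```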

4.3.4 Multi Scale Retinex (MSR)


The multi scale retinex technique proposed to use smoothing kernels
of different sizes and basically combine the outputs of different single scale
retinex implementations. Formally, the illumination invariant reflectance of an
input face image I(x,y) using the multi scale retinex (D. Jobson et al,1997) is
computed as in Equation 4.9.

R'(x, y) = \sum_{i=1}^{M} w_i \left( \log l(x, y) - \log\,[\, l(x, y) * K_i(x, y) \,] \right)    (4.9)

where Ki(x,y) denotes a Gaussian kernel at the i-th scale, and wi stands
for the weight associated with the i-th Gaussian kernel Ki(x,y).

4.3.5 Adaptive Single Scale Retinex (ASSR)

The adaptive single scale retinex technique was proposed (X. Xie et al, 2006) to tackle the halo effects often encountered with the original single scale retinex technique, by incorporating an adaptive smoothing procedure with a discontinuity preserving filter into the single scale retinex algorithm, with the goal of robustly estimating the image luminance.

The key idea of adaptive smoothing is to iteratively convolve the input


image I(x,y) with the 3×3 averaging mask w(x,y) whose coefficients reflect the
discontinuity level of the input image at each of the spatial positions (x,y).
Mathematically, the iterative smoothing procedure at the (t+1)-th iteration is
given by Equation 4.10 and 4.11
L^{(t+1)}(x, y) = \frac{1}{N^{(t)}(x, y)} \sum_{i=-1}^{1} \sum_{j=-1}^{1} L^{(t)}(x+i, y+j)\, w^{(t)}(x+i, y+j)    (4.10)

and

L^{(t+1)}(x, y) = \max\{ L^{(t+1)}(x, y),\, L^{(t)}(x, y) \}    (4.11)

4.3.6 Isotropic Smoothing (ISS)

Isotropic smoothing (O. Arandjelovic et al, 2009) estimates the luminance L(x,y) of the imaging model as a blurred version of the original input image I(x,y). However, it does not apply a simple smoothing filter to the image to produce the blurred output; instead it constructs the luminance function L(x,y) by minimizing the following energy-based cost function,

J(L(x, y)) = \iint \left( L(x, y) - I(x, y) \right)^2 dx\,dy \;+\; \lambda \iint \left( L_x^2(x, y) + L_y^2(x, y) \right) dx\,dy    (4.12)

where the first term forces the luminance L(x,y) to be close to the original input image I(x,y), the second term imposes a smoothing constraint on L(x,y), and the parameter λ controls the relative importance of the smoothing constraint.

4.3.7 Anisotropic smoothing (AISS)


The anisotropic smoothing technique (O. Arandjelovic et al, 2009) introduces an additional weight function ρ(x,y) that controls the fit between the input image I(x,y) and the luminance L(x,y). The anisotropic smoothing is based on the following cost function:

J(L(x, y)) = \iint \rho(x, y) \left( L(x, y) - I(x, y) \right)^2 dx\,dy \;+\; \lambda \iint \left( L_x^2(x, y) + L_y^2(x, y) \right) dx\,dy    (4.13)
It has to be noted that the usefulness of the anisotropic smoothing
procedure heavily depends on the right choice of the parameter λ.

4.3.8 Anisotropic Smoothing Stable (AISSS)

The anisotropic smoothing stable technique, also known as Modified Anisotropic Diffusion normalization (MAS) (V. Struc et al, 2011), is based on anisotropic normalization. Here the local contrast estimate is more dynamic and is therefore able to saturate the extreme values brought into the contrast calculation by pixel intensities close to 0 in the original facial images. A dynamic post-processing step is applied in the final stage.

4.3.9 Discrete Cosine Transform (DCT)


The Discrete Cosine Transform (V. Struc, 2011) based
normalization takes the following steps: first the technique takes the logarithm of
the input image I(x,y) to separate the reflectance and luminance. Next, the entire
image is transformed to the frequency domain via the DCT transform, where the
manipulation of the DCT coefficients, with the goal of achieving illumination
invariance, takes place. Here, the first DCT coefficient C(0, 0) is set to

C(0, 0) = \log(\mu) \cdot \sqrt{MN}    (4.14)

where M and N denote the dimensions of the input image I(x,y) and μ
is chosen near the mean value of I(x,y).

A predefined number (X. Tan, 2007) of DCT coefficients encoding


the lowest frequency information of the image is then set to zero. As the final step,
the modified matrix of DCT coefficients is transformed back to the spatial domain
via the inverse DCT to produce the illumination invariant representation of the
facial image.
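A sketch of the DCT-based normalization described above; the triangular low-frequency mask used here is a simplification of the zig-zag truncation in the cited technique, and the number of discarded coefficients is illustrative.

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(a):
    return dct(dct(a, axis=0, norm="ortho"), axis=1, norm="ortho")

def idct2(a):
    return idct(idct(a, axis=0, norm="ortho"), axis=1, norm="ortho")

def dct_normalize(gray, n_coeffs=20):
    """DCT-based illumination normalization (Section 4.3.9): log transform,
    discard low-frequency coefficients, reset the DC term (Eq. 4.14), inverse DCT."""
    M, N = gray.shape
    img = gray.astype(np.float64) / 255.0
    coeffs = dct2(np.log(img + 0.01))

    # Zero a triangular block of low-frequency coefficients (simplified truncation).
    for u in range(n_coeffs):
        coeffs[u, :n_coeffs - u] = 0.0

    coeffs[0, 0] = np.log(img.mean()) * np.sqrt(M * N)   # Eq. 4.14 with mu = mean(I)
    out = idct2(coeffs)
    return (255 * (out - out.min()) / (out.max() - out.min() + 1e-8)).astype(np.uint8)
```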

4.3.10 Difference of Gaussian (DoG)

The DoG (V. Struc, 2009) is actually a band-pass filter, which


removes high frequency components representing noise, and also some low
frequency components representing the homogeneous areas in the image. The
frequency components in the passing band are assumed to be associated to the
edges in the image. The Dog normalized technique can be constructed using the
Equation 4.15

DoG \triangleq G_{\sigma_1} - G_{\sigma_2} = \frac{1}{\sqrt{2\pi}} \left( \frac{1}{\sigma_1} e^{-(x^2+y^2)/2\sigma_1^2} - \frac{1}{\sigma_2} e^{-(x^2+y^2)/2\sigma_2^2} \right)    (4.15)

The Difference of Gaussians algorithm removes high frequency detail


that often includes random noise, rendering this approach one of the most suitable
for processing images with a high degree of noise. A major drawback to
application of the algorithm is an inherent reduction in overall image contrast
produced by the operation.

4.3.11 Homomorphic Filtering (HMF)


The final illumination invariant image I'(x,y) in the homomorphic filtering algorithm (R. S. Ghiass et al, 2014) is ultimately obtained by taking the inverse transform of the filtered image and then its exponential, as in Equation 4.16

I'(x, y) = \exp\{ \mathcal{F}^{-1} [\, H(u, v) \cdot Z(u, v) \,] \}    (4.16)

where "·" represents element-wise multiplication. It should be noted that the result of the homomorphic filtering procedure is an image I'(x,y) normalized with respect to the reflectance of the image, since no direct subtraction of the luminance is performed. Nevertheless, the result approximates the reflectance because the effect of the luminance is reduced and that of the reflectance is accentuated through the filtering operation in the frequency domain.

4.3.12 Large Scale and Small-scale Features (Lssf)

The Lssf normalization technique (V. Struc et al, 2011) is based on


the concept that the large scale (low frequency) features and the small scale (high
frequency) features of an image can be considered for processing separately such
that the resultant yields a good quality photometric normalized image. According
to this, the normalization is done to the large scale features S and certain amount
of processing is done to the small scale features ρ. This can be achieved using the
Equation 4.17,

𝐼𝑛𝑜𝑟𝑚 (𝑥, 𝑦) = 𝜌′(𝑥, 𝑦)𝑆𝑛𝑜𝑟𝑚 (𝑥, 𝑦) (4.17)


where I(x,y) = ρ(x,y)S(x,y), ρ’ = T1(ρ) and Snorm = T2(S). T1 indicates
the smoothing of the small scale features and T2 indicates the illumination
normalization on the large scale features.

4.3.13 Single Scale Self Quotient Image (SSSQI)


The Single Scale Self Quotient Image (H. Wang et al, 2004) can be obtained from Equation 4.18,

Q(x, y) = \frac{ I(x, y) }{ F(x, y) * I(x, y) } \cong R(x, y)    (4.18)

where F(x,y) is a low pass filter. A low pass filter is used here because illumination can be considered a low frequency component. It should be noted that the properties of Q(x,y) depend on the kernel size of F(x,y): if the kernel is too small, Q(x,y) will approximate one and the albedo information will be lost; if it is too large, halo effects will appear near edges. The SQI is usually regarded as an intrinsic property of the face images of a person. This technique is mainly used in cases where shadows have to be removed from the images.

4.3.14 Multi Scale Self Quotient Image (MSSQI)

The properties of the output Q(x,y) of the previous technique depend on the kernel size of the filter F(x,y). If the kernel is too small, then Q ≈ 1 and all reflectance information is lost; on the other hand, if the kernel size is too large, halo effects are likely to appear near edges. To avoid this problem Wang (2004) proposed the multi scale approach given by Equation 4.19,

𝑄(𝑥, 𝑦) = ∑𝑛𝑘=1 𝑚𝑘 𝑇{𝑄𝑘 (𝑥, 𝑦)} (4.19)


where m_k is a weighting factor, T is a nonlinear function and Q_k are the quotient images corresponding to scale k, as shown in Equation 4.20

Q_k(x, y) = \frac{ I(x, y) }{ \left( \frac{1}{N} W_k G_k \right) * I(x, y) }, \qquad k = 1, 2, \ldots, n    (4.20)

where N is a normalization factor and W_k G_k is the weighted Gaussian kernel at scale k.

4.3.15 Weberfaces (WF)

Illumination changes in face images are normalized using a normalization technique based on Weber's law (1834). Ernst Weber (1834) experimentally observed that the ratio of the increment threshold to the background intensity is a constant. This can be represented mathematically as in Equation 4.21

\frac{\Delta I}{I} = k    (4.21)

where ∆I signifies the increment in I, I signifies the initial stimulus intensity and k indicates that the ratio on the left-hand side remains constant despite changes in I. The fraction ∆I/I is known as the Weber fraction. The ratio image can be estimated using the Weber local descriptor (WLD) suggested by Chen et al (2009), which consists of two components called differential excitation and orientation. The weberface is obtained by applying the WLD to the face image as shown in Equation 4.22

WF(x, y) = \arctan\!\left( \alpha \sum_{i \in A} \sum_{j \in A} \frac{ I(x, y) - I(x - i\Delta x,\, y - j\Delta y) }{ I(x, y) } \right)    (4.22)

where A = {-1, 0, 1}. The arctangent prevents the output from growing too large in magnitude and partially suppresses the side effect of noise. Ic denotes the intensity of the current pixel, Ii represents the intensities of its eight neighbors and α is the adjusting parameter for the intensity difference between neighboring pixels. In this work, α takes the default value of two.
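A sketch of the Weberface computation of Equation 4.22 over the eight-neighbourhood with α = 2; the edge padding and the small constant added to the intensities are assumptions of this illustration.

```python
import numpy as np

def weberface(gray, alpha=2.0, eps=1.0):
    """Weberface (Eq. 4.22): arctan of alpha times the sum of relative intensity
    differences between each pixel and its eight neighbours."""
    img = gray.astype(np.float64) + eps
    padded = np.pad(img, 1, mode="edge")
    acc = np.zeros_like(img)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            neighbour = padded[1 + di:1 + di + img.shape[0],
                               1 + dj:1 + dj + img.shape[1]]
            acc += (img - neighbour) / img
    wf = np.arctan(alpha * acc)
    return ((wf - wf.min()) / (wf.max() - wf.min() + 1e-8) * 255).astype(np.uint8)
```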

4.3.16 Multi Scale Weberfaces (MSWF)

In the multi scale Weberfaces algorithm (V. Struc et al, 2009), Weberfaces at different scales are obtained by calculating the differential excitation of filtering windows with different radii. The different scales reveal that small scale Weberfaces mainly contain local facial information and more facial detail, while large-scale Weberfaces include global information and less detail. The multi-scale Weberface is calculated as in Equation 4.23

MWF(x, y) = \sum_{i=1}^{3} w_i\, WF_i(x, y)    (4.23)

where WF_i represents the Weberface at radius i (i = 1, 2, 3) and w_i is its weighting coefficient.

4.3.17 Retina Model (RM)

The variation in illumination can be normalized using the retina


modeling technique which mimics the performance of the human retina. The
algorithm (V. Struc, 2009) combines two adaptive nonlinear functions and a
Difference of Gaussians filter. This can be connected to the performance of two
layers of the retina, the photoreceptors and the outer plexiform layer. The two
adaptive nonlinear functions are given by Equations 4.24 and 4.25,

F_1(p) = I_{in}(p) * G_1 + \frac{ \bar{I}_{in} }{ 2 }    (4.24)

I_{ia2}(p) = \left( I_{ia1}(\max) + F_2(p) \right) \frac{ I_{ia1}(p) }{ I_{ia1}(p) + F_2(p) }    (4.25)

where F_1 and F_2 are the filter adaptation factors at pixel p, I_in is the intensity of the input image, * is the convolution operator, G_1 and G_2 are 2D Gaussian filters with standard deviations σ_1 and σ_2 respectively, I_max is the maximal value of the image intensity and I_ia1 is the light adapted image.

4.3.18 Steerable Gaussians (SG)

The Steerable Gaussians (SG) technique is a steerable-filtering-based normalization (H. Guillaume et al, 2005) that uses steerable filters to remove the illumination-induced variations in the appearance of face images. The underlying idea is that many of the irregular small blocks on the face are directional. In simple words, the steerable Gaussian decomposition is a steerable pyramid decomposition used to segregate the characteristics of a facial image into various sub-bands at various scales, with approximations and details.

Since a one-level decomposition is not sufficient to segregate these characteristics effectively, it is necessary to explore various combinations of sub-bands at higher levels to obtain a proper isolation. A direct solution would have to measure the derivatives in all directions to capture the full multi-orientation information in the facial images, which entails a very high computational cost. According to the theory of steerable filtering, however, an image's derivative in any direction can be interpolated from a small set of basis derivatives. In this work, every image is decomposed into sub-bands of 3 levels and 6 orientations to create the database of images.

4.3.19 Tan and Triggs (TT)

Photometric normalization using the Tan and Triggs method (X. Tan et al, 2007) involves the following steps: gamma correction, difference of Gaussian filtering, masking and finally contrast equalization. The idea behind the technique is to remove the illumination component, which depends on the lighting conditions during image capture, so that only the reflectance component, which is truly an object property, remains. Gamma correction is a nonlinear gray level transformation that replaces each pixel intensity I in the image with I^γ for 0 < γ < 1 (or with log I when γ = 0). Gamma correction increases the dynamic range of an image but cannot remove all effects of intensity gradients such as shading. Hence a DoG filter, which acts as a band pass filter, is used to remove the shading effects and other noise. The final step of the preprocessing chain is contrast equalization, which globally rescales the image intensities so as to standardize a robust measure of the overall contrast or intensity variation.

I = \frac{ I }{ \left( \mathrm{mean}\left( |I|^{\alpha} \right) \right)^{1/\alpha} }    (4.26)

I = \frac{ I }{ \left( \mathrm{mean}\left( \min(\tau, |I|)^{\alpha} \right) \right)^{1/\alpha} }    (4.27)

I = \tau \tanh\!\left( \frac{I}{\tau} \right)    (4.28)

The output of the above steps is an image with pixel intensities in the range (−τ, τ). In our work the value of τ is set to ten.
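A sketch of the Tan and Triggs chain described above; τ = 10 follows the text, whereas the gamma and α values are common defaults from the literature rather than settings stated in this work.

```python
import cv2
import numpy as np

def tan_triggs(gray, gamma=0.2, sigma1=1.0, sigma2=2.0, alpha=0.1, tau=10.0):
    """Tan & Triggs preprocessing chain (Section 4.3.19): gamma correction,
    DoG band-pass filtering and two-stage contrast equalization."""
    img = (gray.astype(np.float64) / 255.0) ** gamma          # gamma correction

    # Difference of Gaussian band-pass filtering removes shading and noise.
    dog = (cv2.GaussianBlur(img, (0, 0), sigma1)
           - cv2.GaussianBlur(img, (0, 0), sigma2))

    # Two-stage contrast equalization (Eqs. 4.26 and 4.27).
    dog = dog / (np.mean(np.abs(dog) ** alpha) ** (1.0 / alpha))
    dog = dog / (np.mean(np.minimum(tau, np.abs(dog)) ** alpha) ** (1.0 / alpha))

    # Final nonlinear compression into the range (-tau, tau) (Eq. 4.28).
    out = tau * np.tanh(dog / tau)
    return cv2.normalize(out, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```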

4.3.20 Wavelet Normalization (WN)

Wavelet Normalization (V. Struc et al, 2011), which is based on the 2D discrete wavelet transform (2DWT), decomposes the image into four sub-bands: the low-low sub-band generated by the approximation coefficients and the low-high, high-low and high-high sub-bands generated by the detail coefficients. Contrast enhancement of the degraded image is performed by histogram equalization of the approximation coefficients, while edge enhancement is achieved by multiplying the detail coefficients by a scalar (>1). A normalized image is then obtained from the modified coefficients by applying the inverse wavelet transform. In this work, the Daubechies wavelet is selected for the implementation.

4.4 PERFORMANCE MEASURES FOR EVALUATING PHOTOMETRIC
NORMALIZATION TECHNIQUES

The performance of the pre-processing techniques is evaluated


using a set of four measures (H. Guillaume et al, 2005) that reflect the quality of
the photometric normalized output. The metrics chosen to evaluate the various
photometric normalization techniques are given below and summarized in Table
4.1.

4.4.1 Entropy
Entropy (S.K. Pal et al, 1983) measures the information content of an image; the higher the entropy, the richer the details in the image.

4.4.2 Root Mean Square Contrast

Root Mean Square Contrast is defined as the standard deviation of


all pixel intensities of an image. This does not depend on the angular frequency
content or the spatial distribution of contrast in the image.

4.4.3 Histogram Spread

Histogram spread (HS) can be defined as the ratio of the quartile


distance to the range of the histogram. With this concept, it is possible to verify
that the low contrast images with narrow and peaky histogram have low value of
HS in comparison with the high contrast images with broad and flat histogram.

4.4.4 Histogram Flatness Measure

The Histogram Flatness Measure (HFM) (J. A. Stark, 2000) is defined as the ratio of the geometric mean of the image histogram to its arithmetic mean. It is a well-known fact that the geometric mean of a data set is always less than or equal to its arithmetic mean; therefore HFM ∈ [0, 1]. It can also be verified that low contrast images with a narrow and peaky histogram have a low HFM value in comparison with high contrast images with a broad and flat histogram. Note that bins having zero count are ignored when calculating the geometric and arithmetic means.

Table 4.1 Measurement Parameters

S.No  Measurement Parameter         Expression
1.    Entropy                       Entropy = - \sum_{k=0}^{L-1} P(k) \log_2 P(k)
2.    Contrast_RMS                  C_{RMS} = \sqrt{ \sum_{x=1}^{M} \sum_{y=1}^{N} [ I(x, y) - \mu ]^2 }
3.    Histogram Spread              HS = (3rd quartile - 1st quartile of the histogram) / (max - min of the pixel value range)
4.    Histogram Flatness Measure    HFM = (geometric mean of P) / (arithmetic mean of P)

P denotes the histogram count of the image I(x,y) of size MxN and µ represents
the mean of the intensity values of the image.
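The four measures of Table 4.1 can be computed for an 8-bit grayscale image as sketched below; the RMS contrast is taken as the standard deviation of the pixel intensities, as defined in Section 4.4.2, and zero-count bins are skipped for the HFM as noted above.

```python
import numpy as np

def quality_measures(gray):
    """Entropy, RMS contrast, histogram spread and histogram flatness (Table 4.1)."""
    img = gray.astype(np.float64)
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))

    p = hist / hist.sum()
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))

    c_rms = img.std()                                # standard deviation of intensities

    # Histogram spread: inter-quartile range of the histogram over the 8-bit range.
    cdf = np.cumsum(hist) / hist.sum()
    q1 = np.searchsorted(cdf, 0.25)
    q3 = np.searchsorted(cdf, 0.75)
    hs = (q3 - q1) / 255.0

    # Histogram flatness: geometric mean over arithmetic mean of the non-zero bins.
    nz = hist[hist > 0].astype(np.float64)
    hfm = np.exp(np.mean(np.log(nz))) / nz.mean()

    return entropy, c_rms, hs, hfm
```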

4.5 EVALUATION SCHEME FOR MODALITY GAP REDUCTION
USING DIFFERENCE OF GAUSSIAN FILTERING

The following set of similarity measures have been selected to evaluate


the functionality of the Difference of Gaussian (DoG) filtering for reducing the
spectral differences between NIR and VIS images: Root Mean Squared Error
(RMSE), peak signal to noise ratio (PSNR), Mean Absolute Error (MAE),
Structural Content (SC), Normalised Cross-Correlation (NK), Maximum
Difference (MD) and Normalised Absolute Error (NAE).

4.5.1 Root-Mean-Square Error (RMSE)

The root-mean-square error (RMSE) can be termed as a frequently


used measure of the differences between values that are predicted by a model
estimator and the values actually observed. In simple words the RMSE represents
the sample standard deviation of the differences between predicted values and
observed values. RMSE is a good measure (Sankur B et al, 2004) of accuracy,
though it only compares forecasting errors of different models for a particular
variable and not between variables, because it is scale-dependent. RMSE can be
given by the Equation 4.29

RMSE = \sqrt{ \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( X_{i,j} - Y_{i,j} \right)^2 }    (4.29)

4.5.2 Peak Signal-To-Noise Ratio (PSNR)

The peak signal-to-noise ratio (PSNR) is a measure (S. Venkatesh et al, 1995) used to evaluate the quality of a compressed image with respect to the original. This quality index is defined as the ratio between the maximum power of a signal and the power of the noise. PSNR is usually expressed on the logarithmic decibel scale because many signals have a very wide dynamic range. The higher the PSNR value, the more similar the image is to the original from a perceptual point of view. PSNR is given by Equation 4.30

PSNR = 20 \log_{10}(255 / RMSE)    (4.30)

4.5.3 Mean Absolute Error (MAE)

MAE measures the average magnitude of the errors in a set of


forecasts, without considering the direction. MAE is the average over the
verification samples of the absolute values of the differences between forecast and
the corresponding observation. In the calculation of MAE the individual
differences are weighted equally as Equation 4.31

MAE = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} | x(i, j) - y(i, j) |    (4.31)

4.5.4 Structural Content (SC)


It is one of the correlation-based measures (S. Venkatesh, et al,1995)
which represents the closeness (relationship) between two digital images which
can also be quantified in terms of correlation function. Hence, this metric measure
is complementary to the difference-based measure.

SC = \frac{ \sum_{i=1}^{M} \sum_{j=1}^{N} \left( y(i, j) \right)^2 }{ \sum_{i=1}^{M} \sum_{j=1}^{N} \left( x(i, j) \right)^2 }    (4.32)

4.5.5 Normalised Cross Correlation (NK)

The closeness between two digital images can also be quantified in terms
of correlation function. The Normalised Cross Correlation is also complementary
to the difference based measures. All the correlation based measures tend to 1, as
the difference between two images tend to zero.
NK = \frac{ \sum_{i=1}^{M} \sum_{j=1}^{N} \left( x(i, j) \times y(i, j) \right) }{ \sum_{i=1}^{M} \sum_{j=1}^{N} \left( x(i, j) \right)^2 }    (4.33)

4.5.6 Normalised Absolute Error (NAE)

Normalised absolute error is a measure of how far the processed image is from the original image, with a value of zero indicating a perfect fit. A large value of NAE indicates poor image quality.
NAE = \frac{ \sum_{i=1}^{M} \sum_{j=1}^{N} | x(i, j) - y(i, j) | }{ \sum_{i=1}^{M} \sum_{j=1}^{N} x(i, j) }    (4.34)
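The similarity measures of Equations 4.29 to 4.34 (together with the maximum difference) can be computed as sketched below for a reference image x and a processed image y of equal size; the small constant guarding against division by zero is an assumption of this illustration.

```python
import numpy as np

def similarity_measures(x, y):
    """RMSE, PSNR, MAE, SC, NK, MD and NAE between a reference image x and a
    processed image y (Eqs. 4.29 - 4.34)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)

    rmse = np.sqrt(np.mean((x - y) ** 2))                      # Eq. 4.29
    psnr = 20 * np.log10(255.0 / (rmse + 1e-12))               # Eq. 4.30
    mae = np.mean(np.abs(x - y))                               # Eq. 4.31
    sc = np.sum(y ** 2) / np.sum(x ** 2)                       # Eq. 4.32
    nk = np.sum(x * y) / np.sum(x ** 2)                        # Eq. 4.33
    md = np.max(np.abs(x - y))                                 # maximum difference
    nae = np.sum(np.abs(x - y)) / np.sum(x)                    # Eq. 4.34

    return {"RMSE": rmse, "PSNR": psnr, "MAE": mae,
            "SC": sc, "NK": nk, "MD": md, "NAE": nae}
```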

4.6 EXPERIMENTAL RESULTS AND DISCUSSIONS

4.6.1 Results of Photometric Normalization

The outputs of the various photometric normalization techniques for a sample image at distances of 150 m, 100 m and 60 m are shown in Figures 4.4, 4.5 and 4.6 respectively. The primary goal of photometric normalization is contrast enhancement. Visual inspection shows that the contrast of the input image has been improved to different extents by the different techniques. Of all the techniques tried, wavelet normalization gives the best contrast, as can be clearly seen in its outputs in the figures. At 60 m, the resulting images show fairly clear edges, whereas the images at 150 m are much more scattered. From this, it is evident that as the distance increases the details in the images decrease.

Tables 4.2, 4.3 and 4.4 show the values of entropy, root mean square contrast, histogram spread and histogram flatness measure for sample images. The higher the entropy, the higher the relative quality of the image, and a high root mean square contrast indicates a high-contrast result. The values of histogram spread and histogram flatness measure lie in the range between 0 and 1. The NL means and adaptive NL means normalization give a moderate boost to these parameters. The single scale retinex and multiscale retinex techniques almost double the contrast level, but the adaptive single scale retinex does not show a considerable improvement. The original isotropic smoothing technique provides better normalization than the anisotropic and anisotropic stable smoothing algorithms.

The DCT normalization, Difference of Gaussian and Lssf normalization give a reasonable lift to the entropy, contrast and histogram-based parameters, and the Lssf normalization in particular gives a good improvement in image contrast. The single scale and multiscale self-quotient images deliver a good amount of improvement, whereas multiscale weberfaces, Tan & Triggs, steerable Gaussians and the retina model display adequate improvement. The wavelet normalization provides the highest values of all the techniques, and the analysis shows that it provides good results at all distances. These pre-processed results are therefore expected to provide better recognition rates for a long distance face recognition system.

Figure 4.4 Photometric Normalized outputs for a sample image at 150m (a)
anisotropic smoothing, (b) anisotropic smoothing stable, (c) adaptive nl
means, (d) adaptive single scale retinex, (e) DCT normalization, (f) DoG, (g)
homomorphic filtering, (h) isotropic smoothing, (i) lssf normalization, (j)
multiscale retinex, (k) multiscale self-quotient image, (l) multi scale
weberfaces, (m) nl means, (n) retina modeling, (o) steerable gaussians, (p)
single scale retinex, (q) single scale self-quotient image, (r) Tan & Triggs, (s)
weberfaces, (t) wavelet normalization .

Figure 4.5 Photometric Normalized outputs for a sample image at 100m (a)
anisotropic smoothing, (b) anisotropic smoothing stable, (c) adaptive nl
means, (d) adaptive single scale retinex, (e) DCT normalization, (f) DoG, (g)
homomorphic filtering, (h) isotropic smoothing, (i) lssf normalization, (j)
multiscale retinex, (k) multiscale self-quotient image, (l) multi scale
weberfaces, (m) nl means, (n) retina modeling, (o) steerable gaussians, (p)
single scale retinex, (q) single scale self-quotient image, (r) Tan & Triggs, (s)
weberfaces, (t) wavelet normalization .

Figure 4.6 Photometric Normalized outputs for a sample image at 60m (a)
anisotropic smoothing, (b) anisotropic smoothing stable, (c) adaptive nl
means, (d) adaptive single scale retinex, (e) DCT normalization, (f) DoG, (g)
homomorphic filtering, (h) isotropic smoothing, (i) lssf normalization, (j)
multiscale retinex, (k) multiscale self-quotient image, (l) multi scale
weberfaces, (m) nl means, (n) retina modeling, (o) steerable gaussians, (p)
single scale retinex, (q) single scale self-quotient image, (r) Tan & Triggs, (s)
weberfaces, (t) wavelet normalization

Table 4.2 Performance measures attained for sample images taken at
distance 150m
S.No Technique | Image 0003_150_n: Entropy, C_RMS, HS, HFM | Image 0007_150_n: Entropy, C_RMS, HS, HFM
1 Without pre-processing 5.89 17.52 0.15 0.06 5.80 16.12 0.29 0.38
2 NLM 6.99 34.47 0.16 0.32 6.68 32.12 0.13 0.24
3 ANLM 7.11 36.89 0.18 0.37 6.84 32.51 0.15 0.27
4 SSR 7.10 36.97 0.18 0.34 6.89 37.53 0.15 0.33
5 ASSR 5.80 33.59 0.12 0.18 5.01 25.11 0.06 0.09
6 MSR 7.03 35.27 0.18 0.31 6.80 35.20 0.14 0.29
7 ISS 7.18 37.97 0.16 0.43 6.94 35.24 0.13 0.41
8 AISS 7.03 35.92 0.14 0.41 6.78 32.58 0.11 0.36
9 AISSS 6.51 24.05 0.10 0.17 6.56 27.39 0.09 0.24
10 DCT 6.98 32.36 0.15 0.32 6.58 26.47 0.11 0.22
11 DoG 6.13 20.30 0.08 0.09 6.93 32.59 0.14 0.33
12 HMF 5.93 18.74 0.07 0.12 5.80 18.11 0.06 0.11
13 Lssf 7.10 35.02 0.18 0.29 7.00 35.66 0.15 0.31
14 SSSQI 7.42 36.97 0.27 0.67 7.13 54.69 0.20 0.54
15 MSSQI 7.57 58.75 0.32 0.71 7.29 56.43 0.24 0.61
16 MSWF 6.95 32.48 0.15 0.25 6.88 34.37 0.13 0.33
17 RM 6.56 25.05 0.11 0.17 6.69 29.31 0.11 0.26
18 SG 5.86 16.26 0.07 0.07 6.36 23.53 0.10 0.16
19 TT 6.65 25.91 0.12 0.17 6.84 31.87 0.12 0.33
20 WN 7.92 76.03 0.53 0.99 7.91 73.21 0.50 0.98
21 WF 7.22 38.29 0.19 0.37 7.16 39.35 0.15 0.44

Table 4.3 Performance measures attained for sample images taken at
distance 100m

S.No Technique | Glowing eyes (Image 0077_100_n): Entropy, C_RMS, HS, HFM | Dull/No glow (Image 0019_100_n): Entropy, C_RMS, HS, HFM
1 Without pre-processing 6.40 29.52 0.28 0.13 7.26 48.44 0.38 0.28
2 NLM 6.81 33.38 0.14 0.30 6.95 38.49 0.16 0.37
3 ANLM 6.91 34.85 0.15 0.34 7.03 39.14 0.16 0.39
4 SSR 6.88 37.08 0.18 0.30 7.12 42.60 0.19 0.40
5 ASSR 5.59 34.66 0.10 0.18 5.34 32.11 0.08 0.15
6 MSR 6.79 35.03 0.17 0.28 7.03 40.09 0.18 0.36
7 ISS 7.01 37.17 0.13 0.46 7.07 37.68 0.14 0.45
8 AISS 6.95 36.49 0.11 0.44 6.73 32.86 0.10 0.36
9 AISSS 6.99 32.90 0.15 0.34 6.33 23.97 0.08 0.15
10 DCT 6.99 32.90 0.15 0.34 6.78 30.11 0.11 0.30
11 DoG 5.51 16.73 0.05 0.06 6.80 30.53 0.11 0.32
12 HMF 6.26 22.93 0.10 0.15 5.51 17.72 0.05 0.09
13 Lssf 6.67 28.71 0.13 0.21 6.93 34.03 0.14 0.28
14 SSSQI 7.37 60.76 0.27 0.66 7.25 60.56 0.25 0.62
15 MSSQI 7.47 60.11 0.31 0.68 7.35 60.91 0.29 0.65
16 MSWF 6.86 32.90 0.13 0.29 6.89 34.66 0.13 0.34
17 RM 6.51 26.05 0.09 0.22 6.66 28.37 0.11 0.25
18 SG 6.00 17.17 0.07 0.08 5.91 17.45 0.06 0.08
19 TT 6.77 29.14 0.12 0.24 6.79 30.30 0.11 0.28
20 WN 7.95 75.70 0.52 0.99 7.95 75.24 0.52 0.98
21 WF 7.13 38.85 0.16 0.44 7.11 38.85 0.15 0.41

Table 4.4 Performance measures attained for sample images taken at
distance 60m

S.No Technique | Glowing eyes (Image 0004_60_n): Entropy, C_RMS, HS, HFM | Dull/No glow (Image 0007_60_n): Entropy, C_RMS, HS, HFM
1 Without pre-processing 6.79 38.48 0.32 0.25 7.76 73.79 0.56 0.48
2 NLM 6.32 31.11 0.09 0.27 6.54 34.36 0.11 0.31

3 ANLM 6.34 29.91 0.08 0.26 6.74 35.71 0.13 0.34

4 SSR 6.44 33.22 0.10 0.28 6.83 45.36 0.16 0.43

5 ASSR 4.53 28.92 0.06 0.12 5.20 34.01 0.08 0.17

6 MSR 6.36 31.84 0.09 0.25 6.71 41.66 0.14 0.37

7 ISS 6.52 33.17 0.07 0.37 6.57 33.75 0.07 0.38

8 AISS 6.33 29.81 0.06 0.32 6.32 30.78 0.06 0.32

9 AISSS 5.68 18.77 0.05 0.09 5.65 19.01 0.04 0.10

10 DCT 6.56 28.88 0.09 0.28 6.84 30.06 0.12 0.29

11 DoG 6.12 22.30 0.07 0.16 6.36 25.70 0.08 0.23

12 HMF 5.32 16.09 0.05 0.07 4.72 11.35 0.05 0.02

13 Lssf 6.05 24.27 0.07 0.13 5.89 21.38 0.06 0.08

14 SSSQI 6.67 55.03 0.16 0.44 6.83 57.08 0.17 0.48

15 MSSQI 6.74 55.03 0.19 0.46 6.87 56.42 0.20 0.50

16 MSWF 6.01 24.86 0.06 0.17 5.97 23.01 0.05 0.12

17 RM 6.16 26.04 0.06 0.22 6.29 25.23 0.07 0.22

18 SG 5.55 15.46 0.04 0.07 5.47 14.39 0.04 0.06

19 TT 6.89 37.28 0.10 0.45 6.85 33.82 0.10 0.38

20 WN 7.95 75.63 0.52 0.99 7.94 77.19 0.53 0.99

21 WF 6.30 28.94 0.07 0.25 6.26 27.29 0.06 0.21

4.6.2 Results of Modality Gap Reduction using DoG Filtering

Figure 4.7 (a) DoG Filtered VIS images (b) DoG Filtered NIR Images

In order to evaluate the effectiveness of this particular stage, the set of measures mentioned earlier has been employed. Table 4.5 lists the values of the various measures for 100 images and demonstrates the effectiveness of applying the DoG filter to reduce the modality gap; the average value of each measure over the 100 images in each category is reported. From the RMSE and PSNR values, it is clear that the error between the images has been reduced, thus improving the signal content.
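
A minimal sketch of the DoG filtering step applied to both the VIS gallery images and the NIR probe images is given below; the two Gaussian scales and the function name are illustrative assumptions, not necessarily the exact settings used in this work.

import numpy as np
from scipy.ndimage import gaussian_filter

def dog_filter(img, sigma_inner=1.0, sigma_outer=2.0):
    # band-pass filtering: subtract a coarse Gaussian blur from a fine one,
    # which suppresses the modality-dependent low-frequency illumination content
    img = img.astype(np.float64)
    dog = gaussian_filter(img, sigma_inner) - gaussian_filter(img, sigma_outer)
    # rescale to [0, 1] so that VIS and NIR outputs are directly comparable
    return (dog - dog.min()) / (dog.max() - dog.min() + 1e-12)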

Table 4.5 Similarity measures before and after DoG filtering

S.No Measures | 1 m: Before DoG, After DoG | 60 m: Before DoG, After DoG | 100 m: Before DoG, After DoG | 150 m: Before DoG, After DoG
1 RMSE 0.22 0.08 0.92 0.25 0.93 0.33 1.55 0.34
2 PSNR 54.85 58.92 49.49 54.1 48.78 53.1 46.43 52.8
3 MAE 0.23 0.06 0.2 0.18 0.25 0.22 0.46 0.25
4 NK 0.991 0.995 0.93 0.96 0.91 0.93 0.88 0.90
5 SC 0.94 0.95 0.98 1.08 0.98 1.19 0.89 1.16
6 NAE 29.94 5.75 28.6 19.78 32.68 30.3 39.87 34.4

4.7 CONCLUSION

This work provides an insight into the need for a photometric normalization technique for long distance and night time face recognition, where the task involves comparing an NIR image taken at a long distance against a VIS image taken under controlled conditions at a shorter distance. Various photometric normalization techniques reported in the literature are compared and their results on image samples are shown. Based on the results obtained, the wavelet-based photometric normalization technique outperforms the competing techniques. The wavelet transform not only enhances the gray-level contrast of the image pixels in the spatial domain, but also simultaneously enhances the edges of the face images in the frequency domain. This property enables the face recognition system to work effectively under a wide range of illumination conditions.
The work also elucidates the effectiveness of the DoG algorithm in bridging the modality gap and in enhancing the features so that the visibility of the edges increases. The work emphasizes the necessity of a robust system that is capable of recognizing faces from a long distance at night time. Future work will use discriminative feature extractors to represent the features and use them for classification, thus yielding a complete face recognition system.

CHAPTER 5

FEATURE EXTRACTION AND MATCHING

5.1 INTRODUCTION

Feature extraction is the most imperative stage in face recognition


(Abiyev, 2014). The recognition rate of the system depends on the expressive data extracted from the face image. Features may be explicit structures in the image such as points, edges or objects. In simple terms, features are representations of the interesting parts of an image as a compact feature vector. Feature extraction generally faces dimensionality problems (Tzimiropoulos et al, 2014). When the input data to an algorithm is too large to be processed and is suspected to be redundant, it can be reduced to a subset of features or a feature vector that is expected to contain the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the complete initial data.

Image matching, or comparing images in order to obtain a measure of their similarity, is a fundamental aspect of a face recognition system. Image
matching can be defined as “the process of bringing two images geometrically
into agreement so that corresponding pixels in the two images correspond to the
same physical region of the scene being imaged” (Dai & Lu, 1999). After the
descriptors/feature vectors are computed, they can be compared to find a
relationship between images for accomplishing matching/recognition tasks. This

chapter describes in detail about the novel approach used for feature extraction
and matching.

5.2 PROPOSED APPROACH FOR FEATURE EXTRACTION

The subsequent stage after pre-processing is feature extraction or feature description. Feature extraction can be defined as extracting interest points or keypoints from an image that are significant for representing the given face image. Feature description is critical because the matching phase depends entirely on the highly discriminative features extracted from the images. Hence, an efficient feature extractor that highlights the unique discriminative features of both the probe image and the gallery data is expected to improve the recognition rate of a face recognition system.

[Block diagram: Pre-processed Image, Wavelet Transform, HOG, LBP, Normal Fitting Parameters, Feature Dataset]

Figure 5.1 Stages of Feature Extraction

In order to represent the images in the right way, a combination of feature extractors is used here, as shown in Figure 5.1. The proposed methodology uses a combinative approach comprising a global feature, a local feature, a statistical feature and a combination of global and local features. This combinative approach exploits the advantages of each type of feature descriptor and produces a stronger feature vector that supports an efficient recognition process. Initially, the pre-processed face images are decomposed into three levels using the wavelet transform.

Wavelet transform decomposes the image into approximations and


details. The approximation or scaling coefficients are the low-pass representation
of the image and the details are the wavelet coefficients. At every subsequent
level, the approximation coefficients are divided into a coarser approximation
(low-pass) and high-pass (detail) part (K.P. Soman et al, 2004). The wavelet coefficients, comprising the vertical, horizontal and diagonal detail coefficients, are separately used to estimate two different feature descriptors, namely LBP and HOG. First, the wavelet coefficients are described by the local binary pattern
(LBP) which labels the pixels of an image by thresholding the neighborhood of
each pixel and considers the result as a binary number. The Histogram of Oriented
Gradients is calculated by individually dividing the wavelet subbands into 16 ×16
blocks with 50% overlap. Each block is then divided into 2 ×2 cells with size 8 ×8
pixels. At every cell, the gradient magnitude and gradient orientation is computed.
The gradient orientation is quantized into 9 bins. Finally, the histograms are
concatenated together to form the HOG feature vector. The normal fitting parameters of the image are then estimated, and they act as prime members in describing the image efficiently. Each stage in feature extraction is described in
detail in the subsections below.

5.2.1 Image Representation using Wavelet Transform

The wavelet transform is one of the most important and potent tools for image representation. Wavelets can be described as mathematical functions that split the data into different frequency components and then explore each component with a resolution matched to its scale (Zixiang Xiong et al, 1999). Wavelets have advantages over traditional Fourier methods in analysing physical situations where the signal contains discontinuities and sharp spikes. The primary objective of wavelets is to analyse the data based on scale. The wavelet transform can be categorized as a combination of local and global feature description. In this work, the wavelet transform plays a predominant role as it segregates the required details from the face image, and only these details are represented by the feature descriptors explained in the next subsections.

The dilated and translated mother function, also called the analyzing wavelet Φ(x), defines an orthogonal basis as in Equation (5.1):

\Phi_{(s,l)}(x) = 2^{-s/2}\, \Phi\!\left( 2^{-s} x - l \right) \qquad (5.1)

Here the variables s and l are integers that scale and dilate the mother function Φ to generate wavelets, such as a Daubechies wavelet family (Daubechies et al, 1997). The scale index s specifies the wavelet's width, and l gives the location index. In order to span the image at different resolutions, the analyzing wavelet is employed in a scaling equation as in Equation (5.2),

W(x) = \sum_{k=-1}^{N-2} (-1)^{k}\, c_{k+1}\, \Phi(2x + k) \qquad (5.2)

where W(x) is the scaling function for the mother function Φ, and c_k are the wavelet coefficients. The wavelet coefficients must satisfy linear and quadratic constraints of the form given in Equations (5.3) and (5.4),

\sum_{k=0}^{N-1} c_{k} = 2 \qquad (5.3)

\sum_{k=0}^{N-1} c_{k}\, c_{k+2l} = 2\, \delta_{l,0} \qquad (5.4)

where δ is the delta function and l is the location index.

[Diagram: subband layout of the three-level decomposition, showing the LL, LH, HL and HH bands at levels 1 to 3]

Figure 5.2 Wavelet Decomposition up to 3 levels

After pre-processing, wavelet transform is applied to the image to


decompose it into different levels. First, a one-level, one dimensional DWT is
applied across the rows of the image. Second, a one-level, one-dimensional DWT

is applied across the columns of the transformed image from the first step. As
represented in Figure 5.2, the effect of these two sets of operations gives a
transformed image with four distinct bands: LL, LH, HL and HH. Here, L denotes
low-pass filtering and H denotes high-pass filtering. The LL band approximately
refers to a down-sampled version of the original image (Cohen A et al, 1992). The
LH band is likely to preserve the localized horizontal features, whereas the HL
band preserves the localized vertical features in the original image.

Lastly, the HH band is expected to isolate localized high-frequency point features in the image. Ideally, in the one-dimensional case, since it extracts only the highest frequencies in the image, we could stop right there. Unlike the one-dimensional discrete wavelet transform, however, additional levels of decomposition can extract lower frequency features in the image; these additional levels are applied only to the LL band of the transformed image at the previous level. Figure 5.2 illustrates the two-dimensional DWT at three levels on a given sample image.
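
A minimal sketch of this decomposition using PyWavelets is given below; the wavelet family ('db1', i.e. Haar) and the function name are assumed choices for illustration.

import numpy as np
import pywt

def wavelet_detail_subbands(img, wavelet="db1", levels=3):
    # three-level 2-D DWT; coeffs = [cA3, (cH3, cV3, cD3), (cH2, cV2, cD2), (cH1, cV1, cD1)]
    coeffs = pywt.wavedec2(img.astype(np.float64), wavelet, level=levels)
    cH, cV, cD = coeffs[1]   # horizontal (LH), vertical (HL) and diagonal (HH) details at level 3
    return cH, cV, cD        # these detail sub-bands feed the LBP and HOG descriptors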

5.2.2 Local Binary Pattern (LBP) features

Local Binary Pattern (LBP) is a simple yet very effective texture operator which labels the pixels of an image by thresholding the neighborhood of each pixel and treats the result as a binary number. Due to its discriminative power and computational simplicity, the LBP texture operator has become a popular approach in various applications. It can be seen as a unifying approach to the traditionally divergent statistical and structural models of texture analysis. Perhaps the most important property of the LBP operator in real-world applications is its robustness to monotonic gray-scale changes caused, for example, by illumination variations. Another important property is its computational simplicity, which makes it possible to analyze images in challenging real-time settings.

The basic idea behind the LBP operator is that two-dimensional surface textures can be described by two complementary measures: local spatial patterns and gray-scale contrast. In its simplest form, the LBP feature vector is computed using the following steps (a code sketch follows this list):

1. Divide the examined window into cells (e.g. 16×16 pixels for each cell).
2. For each pixel in a cell, compare the pixel to each of its 8 neighbors (left-top, left-middle, left-bottom, right-top, and so on), tracking the pixels along a circle, i.e. clockwise or counter-clockwise.
3. Where the center pixel's value is greater than the neighbor's value, write "0"; otherwise, write "1". This gives an 8-digit binary number, which is usually converted to decimal for convenience.
4. Compute the histogram, over the cell, of the frequency of each "number" occurring (i.e. each combination of which pixels are smaller and which are greater than the center). This histogram can be seen as a 256-dimensional feature vector.
5. Concatenate the histograms of all cells to obtain a feature vector for the entire window.
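
A minimal sketch of the LBP stage using scikit-image is shown below. Since Section 5.4 reports 256 LBP values in total, this sketch computes a single 256-bin histogram over a whole wavelet detail sub-band rather than per-cell histograms; that pooling choice, like the radius of 1 pixel and the function name, is an assumption made for illustration.

import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(subband, n_points=8, radius=1):
    # 8-neighbor LBP codes in the range 0..255 ("default" method)
    codes = local_binary_pattern(subband, n_points, radius, method="default")
    # 256-bin histogram of the codes, normalised to sum to one
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / (hist.sum() + 1e-12)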

5.2.3 Histogram of Oriented Gradients

Histogram of Oriented Gradients (HOG) can be defined as a dense feature description method for images (Dalal et al, 2005), which extracts features from all locations in the image (or a region of interest), unlike SIFT, which takes into account only the local neighborhood of keypoints. The Histogram of Oriented Gradients is a global feature extractor which describes the image as a whole entity. It has the advantages of being simple, of being invariant to small shifts, and of working well even at small resolutions.

The HOG descriptor can be implemented using the following steps:


1. Split the image into small connected regions called cells. For each cell
compute a histogram of gradient directions or edge orientations for the
pixels within the cell.

2. Discretize each cell into angular bins according to the gradient orientation.
3. Each pixel in a cell contributes a weighted gradient to its corresponding angular
bin.
4. Groups of adjacent cells are considered as spatial regions called blocks.
The grouping of cells into a block is the basis for grouping and
normalization of histograms.
5. Normalized group of histograms represents the block histogram. The set of
these block histograms represents the descriptor.
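
A minimal sketch using scikit-image's hog function with the parameters stated in Section 5.2 (8×8-pixel cells, 2×2-cell blocks, i.e. 16×16-pixel blocks with 50% overlap, and 9 orientation bins) is given below; the block normalisation scheme and the function name are assumed choices. For a 48×32 detail sub-band this yields 5×3 blocks × 4 cells × 9 bins = 540 values, i.e. 1620 values over the three sub-bands, consistent with the dimensionality reported in Section 5.4.

from skimage.feature import hog

def hog_descriptor(subband):
    # dense gradient-orientation histograms over overlapping 16x16-pixel blocks
    return hog(subband,
               orientations=9,
               pixels_per_cell=(8, 8),
               cells_per_block=(2, 2),
               block_norm="L2-Hys",      # assumed normalisation scheme
               feature_vector=True)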

5.2.4 Normal Fitting Parameters

Normal fitting parameters are statistical features used to quantitatively represent the relevant information in the image. Because the normal fitting parameters describe the image efficiently, the estimated values are also considered prime members of the feature descriptor. The normal fitting parameters (A. Papoulis, 1965) can be calculated using Equation (5.5),

f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left[ -\frac{1}{2} \left( \frac{x-\mu}{\sigma} \right)^{2} \right], \qquad -\infty < x < \infty \qquad (5.5)

where μ denotes the distribution mean and σ denotes the distribution standard deviation. The distribution mean can be computed from Equation (5.6),

\hat{\mu} = \bar{x} = \frac{\sum_{i=1}^{n} x_{i}}{n} \qquad (5.6)

where n is the sample size. The distribution standard deviation is computed from Equation (5.7),

\hat{\sigma} = \sqrt{ \frac{ n \sum_{i=1}^{n} x_{i}^{2} - \left( \sum_{i=1}^{n} x_{i} \right)^{2} }{ n(n-1) } } \qquad (5.7)
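
A direct NumPy transcription of Equations (5.6) and (5.7) applied to the pixel intensities of a face image is sketched below; note that the thesis counts four normal-fitting components in the final descriptor, so the two estimates shown here are only part of that set, and the function name is illustrative.

import numpy as np

def normal_fit_parameters(img):
    x = img.astype(np.float64).ravel()
    n = x.size
    mu_hat = x.sum() / n                                   # Equation (5.6)
    sigma_hat = np.sqrt((n * (x ** 2).sum() - x.sum() ** 2)
                        / (n * (n - 1)))                   # Equation (5.7)
    return mu_hat, sigma_hat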

5.3 MATCHING

The final and most interesting phase in face recognition system is


matching the extracted features of the input image with that of the features of the
gallery data. The output of this stage says if the subject detected from the input
image belongs to the gallery or not. This is achieved by using suitable similarity
measures which yields the measure of similarity of the input image with that of
all the images in the gallery. Then the gallery image which gives the closest
distance is checked to find if it is of the same subject as that of the input image.
Resemblance can be measured either as a distance or a similarity. Distance or
similarity measures are used to resolve several pattern recognition problems such
as classification and clustering (Aggarwal et al, 2001). Most of the distance
measures can be converted into similarity measures and vice versa. A good
distance measure must satisfy the following rules: when two images are identical, the minimum value is always zero; the distance is always positive when two images differ, as negative distances are not allowed; and the distance between an image A and an image B is always the same as the distance between B and A. A selection of the most commonly used and most effective measures is used in this work.

5.3.1 Euclidean Distance

The Euclidean Distance is also called the L2 distance (Bugatti et al, 2008). For two vectors in an n-dimensional space, u = (x_1, x_2, ..., x_n) and v = (y_1, y_2, ..., y_n), the Euclidean Distance ED(u, v) is defined by Equations (5.8) and (5.9):

ED(u, v) = \sqrt{ (x_{1}-y_{1})^{2} + (x_{2}-y_{2})^{2} + \cdots + (x_{n}-y_{n})^{2} } \qquad (5.8)

ED(u, v) = \sqrt{ \sum_{i=1}^{n} (x_{i}-y_{i})^{2} } \qquad (5.9)

5.3.2 Cosine Similarity

The cosine similarity between any two vectors can be defined as a measure that calculates the cosine of the angle between them (S. Santani et al, 1999). This metric is a measurement of orientation and not magnitude. Given two vectors A and B, the cosine similarity can be computed as in Equation (5.10):

\cos(\theta) = \frac{A \cdot B}{\lVert A \rVert\, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_{i} B_{i}}{\sqrt{\sum_{i=1}^{n} A_{i}^{2}}\, \sqrt{\sum_{i=1}^{n} B_{i}^{2}}} \qquad (5.10)

5.3.3 Canberra Distance

The Canberra distance can be defined as a numerical measure of the distance between pairs of points in a vector space, introduced by G. N. Lance and W. T. Williams (1967). It is generally considered to be a weighted version of the L1 (Manhattan) distance. The Canberra distance d between vectors p = (p_1, p_2, ..., p_n) and q = (q_1, q_2, ..., q_n) in an n-dimensional real vector space is given by Equation (5.11):

d(p, q) = \sum_{i=1}^{n} \frac{|p_{i} - q_{i}|}{|p_{i}| + |q_{i}|} \qquad (5.11)

5.3.4 Manhattan Distance

The Manhattan distance between two images is given by the sum of the absolute differences of their corresponding components (Bugatti et al, 2008). The distance d between x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n) with n variables is given by Equation (5.12),

d = \sum_{i=1}^{n} |x_{i} - y_{i}| \qquad (5.12)

5.3.5 Chebyshev Distance

The Chebyshev distance is defined on a vector space as the greatest of the differences between two given vectors along any coordinate dimension. It is named after Pafnuty Chebyshev (David et al, 2004). The Chebyshev distance D between two vectors or points p and q, with standard coordinates p_i and q_i respectively, is given by Equation (5.13),

D(p, q) = \max_{i} |p_{i} - q_{i}| \qquad (5.13)

5.3.6 Statistic Value X2

The X² statistic highlights large discrepancies between two given feature vectors and measures how improbable the distribution is. For two vectors F = (f_1, f_2, ..., f_n) and G = (g_1, g_2, ..., g_n), the distance is given by Equation (5.14),

d_{\chi^{2}}(f, g) = \sum_{i=1}^{n} \frac{(f_{i} - m_{i})^{2}}{m_{i}}, \qquad \text{where } m_{i} = \frac{f_{i} + g_{i}}{2} \qquad (5.14)

5.3.7 Chord Distance

The chord distance between two vectors x and y is the distance (Bugatti et al, 2008) between the projections of x and y onto the unit sphere, which can be calculated by Equation (5.15),

D(x, y) = \left\lVert \frac{x}{\lVert x \rVert} - \frac{y}{\lVert y \rVert} \right\rVert_{2} \qquad (5.15)

5.3.8 Pearson’s Correlation Coefficient

The Pearson's Correlation Coefficient (S. Santani et al, 1999) is given by Equation (5.16),

\rho = \frac{\sum_{i=1}^{d} (x_{i} - u)(y_{i} - v)}{\sqrt{\sum_{i=1}^{d} (x_{i} - u)^{2}}\, \sqrt{\sum_{i=1}^{d} (y_{i} - v)^{2}}} \qquad (5.16)

where u = \frac{1}{n} \sum_{i=1}^{n} x_{i} and v = \frac{1}{n} \sum_{i=1}^{n} y_{i}.
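
A compact NumPy sketch of several of these measures, together with the nearest-gallery matching rule described at the start of this section, is given below. The small epsilon terms guard against division by zero and are an implementation convenience not present in the equations; the cosine similarity is converted into a distance as 1 − cos(θ) so that all measures can be minimised, and the function names are illustrative.

import numpy as np

def euclidean(u, v):                                  # Equation (5.9)
    return np.sqrt(np.sum((u - v) ** 2))

def cosine_distance(u, v):                            # 1 - cos(theta), from Equation (5.10)
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

def canberra(u, v):                                   # Equation (5.11)
    return np.sum(np.abs(u - v) / (np.abs(u) + np.abs(v) + 1e-12))

def chebyshev(u, v):                                  # Equation (5.13)
    return np.max(np.abs(u - v))

def chi_square(u, v):                                 # Equation (5.14)
    m = (u + v) / 2.0
    return np.sum((u - m) ** 2 / (m + 1e-12))

def match(probe, gallery, dist=euclidean):
    # return the index of the gallery feature vector closest to the probe vector
    return int(np.argmin([dist(probe, g) for g in gallery]))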

5.4 RESULTS AND DISCUSSIONS

As mentioned previously, the feature extraction stage should be given extra care so that the matching phase performs well and yields good recognition rates. The pre-processed face images are first decomposed up to 3 levels as shown in Figure 5.3. The detail coefficients derived from the horizontal, vertical and diagonal detail subbands are each of size 48 × 32. Only these horizontal, vertical and diagonal detail coefficients are considered for the further stages. HOG and LBP features are extracted from the detail subbands, yielding 1620 and 256 features respectively. Another important segment in the feature extractor combination is the statistical feature descriptor, which is represented by the normal fitting parameters of the original image. There are 4 components in the normal fitting parameters, and thus the result is a feature set of size 1880. During experimentation, the recognition performance of each type of feature was tested separately as well as in different possible combinations, but the combination of these three types of features yielded the maximum result.
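
Putting the pieces of this chapter together, a sketch of the combined descriptor is shown below. The wavelet family, the LBP pooling over whole sub-bands and the HOG block normalisation are assumed choices, and only the two normal-fit estimates of Equations (5.6) and (5.7) are appended here, whereas the thesis counts four normal-fitting components; under these assumptions the sketch produces a 1878-dimensional vector rather than the reported 1880.

import numpy as np
import pywt
from skimage.feature import hog, local_binary_pattern

def face_feature_vector(img):
    # level-3 detail sub-bands of the pre-processed face image
    coeffs = pywt.wavedec2(img.astype(np.float64), "db1", level=3)
    details = coeffs[1]                                 # (cH3, cV3, cD3)

    # 540 HOG values per 48x32 sub-band, i.e. 1620 in total (see Section 5.2.3)
    hog_feats = np.concatenate([
        hog(d, orientations=9, pixels_per_cell=(8, 8),
            cells_per_block=(2, 2), block_norm="L2-Hys") for d in details])

    # one 256-bin LBP histogram pooled over the three sub-bands
    codes = np.concatenate([
        local_binary_pattern(d, 8, 1, method="default").ravel() for d in details])
    lbp_hist, _ = np.histogram(codes, bins=256, range=(0, 256))

    # normal-fit estimates of the original image intensities (Equations 5.6 and 5.7)
    x = img.astype(np.float64).ravel()
    fit = np.array([x.mean(), x.std(ddof=1)])

    return np.concatenate([hog_feats, lbp_hist.astype(np.float64), fit])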

Figure 5.3 Three level wavelet decomposition of sample face images (a) 1m
and (b) 150 m

The feature descriptor combination comprising the HoG and LBP of the wavelet detail coefficients together with the normal fitting parameters yields an efficient feature vector that is highly discriminative and unique for every face image.

Table 5.1 Result Comparison in terms of Recognition rates


S.No Distance | Recognition Rates (%): Proposed approach, AHFR, HFR
1 150 m 32 28 15
2 100 m 78 69 48
3 60 m 92 82 65

The significant details in a face image are usually contained in its high frequency components. Hence, the wavelet transform decomposes the signal into low and high frequencies, i.e. the approximations and details; the details alone are used to extract the HoG and LBP features, while the approximation coefficients are not used.

Table 5.2 Comparison of recognition rates between different distance measures

Distance Measure | 1 m | 60 m | 100 m | 150 m
Euclidean 100 92 78 32
Cosine 100 90 78 32
Canberra 81 64 34 11
Sum Chord 100 79 23 19
Partial Histogram Intersection 94 70 65 6
Manhattan 100 75 70 11
Chebyshev 86 62 45 2
Fractional 76 45 34 2
X2 Statistics 87 79 41 8
Pearson Correlation Coefficient 98 82 28 23

The greatest challenge faced in this work is that the gallery contains only a single image of the target, whereas in most face recognition databases the gallery contains a minimum of three images for every subject. However, this way of comparing a single image against 100 images, with a single version of every subject, using the proposed combination of feature vectors is found to be quite successful, with recognition rates of 32%, 78% and 72% at 150 m, 100 m and 60 m respectively, as shown in Table 5.1. It is to be noted that although the quality of the images at 60 m is higher than that at 100 m and 150 m, the recognition rate at 60 m is found to be lower than that at 100 m. This is because all images are resized to a common size, which reduces the quality of the 60 m images and results in a comparatively lower recognition rate.

Table 5.2 shows the recognition rates yielded by the different distance and similarity measures. All the measures give their best performance at 1 m, and as the distance increases the recognition rate decreases. The Euclidean and Cosine measures give 32% at 150 m for the proposed approach, which is higher than the results provided by the existing methods for long distance and night time face recognition.

5.5 CONCLUSION

Feature extraction, being the most significant phase in the entire face recognition system, needs to represent the image in a highly discriminative way so that the matching process works well. The combinative approach used in this research paves the way for a better representation of the features and thus enhances the recognition rate.

The database comprises several classes (100 classes), each with only a single sample, and therefore using classifiers was not a possibility for matching. The results show that the proposed method performs better at higher distances like 150 m and 100 m. In future, this approach can be tested with larger databases and with images that have variations in pose, expression and occlusion.

CHAPTER 6

CONCLUSION

6.1 INTRODUCTION

This research work has dealt with improving the recognition rate of a long distance and night time face recognition system. The research has produced a different approach for pre-processing the different modality images and a combinative approach for feature extraction. This chapter summarizes the work done and discusses the contributions. Finally, practicable directions for the future scope of the work are discussed.

The rest of the chapter is organized as follows: Section 6.2 summarizes the work done and Section 6.3 gives the scope for future work.

6.2 SUMMARY OF THE WORK DONE

The focus of this work is to improve the recognition rate of a long


distance and night time face recognition system by working on cross-spectral and
cross-distance based images. The experiments are carried out using the Long

Distance Heterogeneous Face Database. Experimental results of each module are
tested and analyzed to highlight the importance of the proposed methodology.

6.2.1 Analysis of Viola Jones Face Detector on the LDHF Database

The task of detecting a face in an image is not an easy problem, because many difficulties arise and must be taken into account. In this work, the performance of the Viola Jones face detection algorithm on NIR long distance images is described. The Viola-Jones algorithm is increasingly being used in face recognition as well as in other object detection problems, but when it is applied to NIR images the difficulty of face detection increases. Only frontal upright face images are detected by the Viola Jones algorithm, and it shows its ineffectiveness towards NIR images due to the dark pixels. It is evident from this work that Viola Jones does not perform well when the distance between the capturing device and the subject increases.

6.2.2 Wavelet based Preprocessing Approach

This work provides an insight into the need for a photometric normalization technique for long distance and night time face recognition, where the task involves comparing an NIR image taken at a long distance against a VIS image taken under controlled conditions at a shorter distance. Various photometric normalization techniques reported in the literature are compared and their results on image samples are shown. The wavelet-based photometric normalization technique outperforms all other techniques in all aspects.

The work also elucidates the effectiveness of the DoG algorithm in bridging the modality gap and in enhancing the features so that the visibility of the edges increases. The work emphasizes the necessity of a robust system that is capable of recognizing faces from a long distance at night time. Future work will use discriminative feature extractors to represent the features and use them for classification, thus yielding a complete face recognition system.

6.2.3 Highly Discriminative Feature Representation

Feature extraction, being the most significant phase in the entire face recognition system, needs to represent the image in a highly discriminative way so that the matching process works well. The combinative approach used in this research paves the way for a better representation of the features and thus enhances the recognition rate.

The database consists of several classes (100 classes), each with a single sample, and therefore classifiers were not used for matching. The results show that the proposed method performs better at higher distances like 150 m and 100 m. In future, this approach can be tested with larger databases and with images that have variations in pose, expression and occlusion.

6.3 FUTURE SCOPE

The research work emphasizes the need for an improved Viola Jones algorithm that can reliably detect faces in different modality images taken from different distances at night time. A suitable technique to identify common features from NIR and VIS images that yields better recognition rates can also be introduced.

REFERENCES

Abdullah, M., Wazzan, M. and Bo-saed, S., 2012. “Optimizing face recognition
using PCA”, International Journal of Artificial Intelligence & Applications, 3(2),
pp.23-31.

Abiyev R.H., (2014), “Facial Feature Extraction Techniques for Face


Recognition” Journal of Comput Science, 10(12), 2360-2365.

A. Ben-Hur and I. Guyon, (2003), “Detecting stable clusters using principal


component analysis”, In M.J. Brownstein and A. Kohodursky, editors, Methods
In Molecular Biology, pages 159–182. Humana Press.

Aggarwal, C.C., Hinneburg, A., Keim, D.A. (2001), “On the surprising behavior
of distance metrics in high dimensional space”, Lecture Notes in Computer
Science 1973, 420–434

Ancong Wu, Wei-Shi Zheng, Hong-Xing Yu, Shaogang Gong, Jianhuang Lai
(2017), “RGB-Infrared Cross-Modality Person Re-identification”, IEEE
International Conference on Computer vision (ICCV), 10.1109/ICCV.2017.575.

A.K. Jain, D. Zongker, (1997), "Feature Selection: Evaluation Application and


Small Sample Performance", IEEE Trans. Pattern Analysis and Machine
Intelligence, vol. 19, no. 2, pp. 153-158.

Anil K. Jain, (2011), “Face Recognition: Some Challenges in Forensics”, IEEE


International Conference On Automatic Face and Gesture Recognition, pp 726-
733.

A. Papoulis, (1965),” Probability, Random Variables and Stochastic Processes”,


McGraw-Hill.

Bakshi, U. and Singhal, R., (2014), “A survey on face detection methods and
feature extraction techniques of face recognition”, International Journal of
Emerging Trends & Technology in Computer Science (IJETTCS), 3(3), pp.233-
237.

Barnouti, Nawaf Hazim. (2016), “Face Recognition Using Eigen-Face
Implemented On DSP Processor”, International Journal of Engineering Research
and General Science, 4(2), pp.107-113

Bevilacqua, V., Mastronardi, G., Melonascina, F., Nitti, D., (2006), “Stereo-
Matching Techniques Optimisation Using Evolutionary Algorithms”, LNCS 4113
© Springer-Verlag Berlin Heidelberg 2006, pp. 612-621

B. Klare and A. K. Jain (2010) “Heterogeneous face recognition: Matching nir to


visible light images”. In Pattern Recognition (ICPR), 20th International
Conference on, pages 1513–1516. IEEE.

Bugatti, P. H., Traina, A. J. M., and Traina Jr., C. (2008), “Assessing the best
integration between distance-function and image-feature to answer similarity
queries”, In Proceedings of the 23rd ACM SAC, pages 1225–1230, New York,
NY, USA. ACM.

B. Zhang, L. Zhang, D. Zhang, L. Shen (2010), Directional binary code with


application to PolyU near-infrared face database, Pattern Recognition. Letters. 31,
2337–2344.

C. Chatterjee, V.P. Roychowdhury, (1997), "On Self-Organizing Algorithms and


Networks for Class-Separability Features", IEEE Trans. Neural Networks, vol. 8,
no. 3, pp. 663-678.

C. Lee, D.A. Landgrebe, (1993), "Feature Extraction Based on Decision


Boundaries", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, no.
4, pp. 388-400.

Cheng-Lin Liu, (2007), “Normalization-Cooperated Gradient Feature Extraction


for Handwritten Character Recognition”, IEEE Transaction on pattern analysis
and machine intelligence, vol. 29, no. 8.

Chihaoui M, Bellil W, Elkefi, A, Amar C.B, (2015), “Face recognition using


HMM-LBP”, In Hybrid Intelligent Systems; Springer: Cham, Switzerland, pp.
249–258.

Cohen,A.,I. Daubechies and J. Feauveau,(1992), ”Biorthogonal bases for


compactly supported Wavelets”, Communications on Pure Applied
Mathematics,45,485-560.

Dalal, N., Triggs, B (2005), “Histograms of Oriented Gradients for Human


Detection”, In: IEEE Conference on Computer Vision and Pattern Recognition
(CVPR).

Daubechies, Wim Swelden (1997),” Factoring wavelet transforms in to lifting
steps”, Princeton University, NJ and Lucent technologies, NJ.

David M. J. Tax, Robert Duin, Dick De Ridder (2004), “Classification, Parameter


Estimation and State Estimation: An Engineering Approach Using MATLAB”,
John Wiley and Sons. ISBN 0-470-09013-8.

Devendra Singh Raghuvanshi,Dheeraj Agrawal, (2012), “Human Face Detection


by using Skin Color Segmentation, Face Features and Regions Properties”,
International Journal of Computer Applications, Vol. 38, No.9,pp.14-17.

D. Huang, Y. Wang, Y. Wang 2007, “A robust method for near infrared face
recognition based on extended local binary pattern”, in: Proceedings of ISVC, pp.
437–446.

D. Huang, M. Ardabilian, Y. Wang, L. Chen, Asymmetric 3d/2d face recognition


based on lbp facial representation and canonical correlation analysis, in: ICIP,
2009, pp. 3325–3328.

D. Jobson, Z. Rahman, and G. Woodell (1997) “A multiscale retinex for bridging


the gap between color images and the human observations of scenes,” IEEE
Transactions on Image Processing, vol. 6, no. 7, pp. 965–976.

E.P. Xing and R.M. Karp, (2002), “Cliff: Clustering of high-dimensional


microarray data via iterative feature filtering using normalized cuts”, In 9th
International Conference on Intelligence Systems for Molecular Biology.

F. Ferri, P. Pudil, M. Hatef, J. Kittler, (1994), "Comparative Study of Techniques


for Large Scale Feature Selection", Pattern Recognition in Practice IV, pp. 403-
413.

F. Nicolo, N.A. Schmid (2012), “Long range cross-spectral face recognition:


matching SWIR against visible light images”, IEEE Trans. Inf. Forensics Security.
7, 1717–1726.

F. Xiaolong and V. Brijeshl, (2009), “Selection and fusion of facial features for
face recognition,” Expert Systems With Applications, vol.36,pp.7157–7169.

Hatem, H., Beiji, Z., Majeed, R., Lutf, M. and Waleed, J., (2015), “Face Detection
and Pose Estimation Based on Evaluating Facial Feature Selection”, International
Journal of Hybrid Information Technology, 8(2), pp.109-120.

H. Frigui, R. Krishnapuram, (1999), "A Robust Competitive Clustering Algorithm
with Applications in Computer Vision", IEEE Trans. Pattern Analysis and
Machine Intelligence, vol. 21, no. 5, pp. 450-465.

H.A. Rowley, S. Baluja, and T. Kanade, (1998) “Neural Networks Based Face
Detection”, IEEE Trans. Pattern Analysis an Machine Intelligence, vol. 20, no. 1,
pp. 22-38.

H. Han, B. Klare, K. Bonnen, A. Jain, (2013), “Matching composite sketches to


face photos: A component-based approach”, IEEE Transactions on Information
Forensics and Security 191–204.

H.Guillaume, C.Fabien, and S.Marcel, (2005) “Lighting normalization algorithms


for face verification,” IDIAP.

H.Wang, S. Li, Y.Wang, and J. Zhang (2004) “Self quotient image for face
recognition,” in Proceedings of the International Conference on Image Processing,
pp. 1397– 1400.

Jameel S, (2015), “Face recognition system using PCA and DCT in HMM”,
International Journal of Advanced Research in Computer Communication
Engineerin, 4, 13–18.

J. A. Stark (2000), “Adaptive Image Contrast Enhancement Using Generalizations


of Histogram Equalization,” IEEE Transactions on Image Processing, 9(5),
pp.889-896.

J.C. Klontz, A.K. Jain (2013), “A Case Study on Unconstrained Facial


Recognition Using the Boston Marathon Bombings Suspects”, MSU Technical
Report.

J. Goldstein, L. D. Harmon, and A. B. Lesk (1971), “Identification of human


faces”, Proceeding IEEE, vol. 59, no. 5, pp. 748-760.

Kang D, Han H, Jain AK, Lee S.W (2014) “Nighttime face recognition at large
standoff: cross-distance and cross-spectral matching”, Pattern Recognition,
47(12):3750–66.

K.P. Soman, K.I. Ramachandran (2004),” Insight in to Wavelets from Theory to


Practice”, PHI, New Delhi

K. W. Bowyer, K. Chang, P. Flynn (2006) “A survey of approaches and challenges


in 3d and multi-modal 3d + 2d face recognition”, CVIU,1–15.

Lin D., Tang X. (2006) “Inter-modality Face Recognition”. In: Leonardis A.,
Bischof H., Pinz A. (eds) Computer Vision – ECCV 2006. ECCV 2006. Lecture
Notes in Computer Science, vol 3954. Springer, Berlin, Heidelberg

Lienhart R, Maydt J (2002), "An extended set of Haar-like features for rapid
object detection”, In International conference on image Processing, vol 1, pp I-
900–I-903.

L. Shen, J. He, S. Wu, S. Zheng (2012), “Face recognition from visible and near-
infrared images using boosted directional binary code”, in: Proceedings of ICIC,
pp. 404–411.

L. Sirovich and M Kirby (1987), “A low dimensional procedure for the


characterization of human faces”, JOSA A 4, no. 3: 519-524.

M. Ao, D. Yi, Z. Lei, and S. Z. Li. (2009) “Handbook of Remote Biometrics for
Surveillance and Security”, Advances in Computer Vision and Pattern
Recognition Series, Springer.

Maeng H, Choi H.C, Park U, Lee S.W, Jain AK (2011) “NFRAD: near-infrared
face recognition at a distance”, International joint conference on biometrics
compendium (IJCB), IEEE.

Maeng H, Liao S, Kang D, Lee S.W, Jain AK (2012) “Nighttime face recognition
at long distance: cross-distance and cross-spectral matching”, ACCV, Daejeon,
Korea.

Marciniak T, Drgas Sz, Cetnarowicz D (2011), “Fast face location using


AdaBoost algorithm and identification with matrix decomposition methods”, In
Multimedia Communications, Services and Security; Communication in
Computer Vision and Information Science, vol 149, pp 242– 250

Ming-Hsuan Yang (2008),"Face Detection", in Encyclopedia of Biometrics (eds.


S. Z. Li).

Mohsen A., Abdul Hossen R., Abd Alsaheb Ogla M. and Mahmood Ali, (2017),
“Face Detection by Using OpenCV’s Viola Jones Algorithm based on coding
eyes”, Iraqi Journal of Science.

Montag C, Duke É, Markowetz A. (2016), “Toward Psychoinformatics: Computer


Science Meets Psychology”. Computational and Mathematical Methods in
Medicine; 2016:2983685. doi:10.1155/2016/2983685.

N. Kalka, T. Bourlai, B. Cukic, and L. Hornak (2011) “Cross-spectral Face
recognition in Heterogeneous Environments: A Case Study on Matching Visible
to Short-wave Infrared Imagery” In International Joint Conference on Biometrics.

Nilamani Bhoi, Mihir Narayan Mohanty. (2010), “Template Matching based Eye
Detection in Facial Image”.International Journal of Computer Applications (0975-
8887)Volume 12-No.5.

O. Arandjelović (2013) "Making the most of the self-quotient image in face


recognition.,” In Proc. IEEE International Conference on Automatic Face and
Gesture Recognition.

Oivind Due Trier, Anil K. Jain, Torfinn Taxt, (1996), “Feature extraction Methods
for character recognition – A Survey”, Journal of Pattern Recognition, Vol. 29,
No. 4, pp. 641-642.

Paul Viola and Michael J. Jones, (2001), "Rapid Object Detection using a
Boosted Cascade of Simple Features", IEEE CVPR.

P.A. Chou, "Optimal Partitioning for Classification and Regression


Trees", (1991), IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 13,
no. 4, pp. 340-354.

P. Viola and M. J. Jones, "Robust real-time face detection," Int. J. Computer


Vision, vol. 57, no. 2, pp.137-154, 2004.

P.J. Phillips, P.J. Flynn, J.R. Beveridge, W.T. Scruggs, A.J. O'Toole, D. Bolme,
K.W.

Bowyer, B.A. Draper, G.H. Givens, Y.M. Lui, H. Sahibzada, J.A. Scallan, W.
Samual (2009), “Overview of the multiple biometrics grand challenge”, in:
Proceedings of ICB, pp. 705–714.

Rath, S.K. and Rautaray, S.S., (2014), “A Survey on Face Detection and
Recognition Techniques in Different Application Domain”, International Journal
of Modern Education and Computer Science, 6(8), pp.34-44

R. C. Gonzalez, R. E. Woods, and S. L. Eddins (1992) “Digital image processing


using matlab”.

R. G. Lyons, (2004), “Understanding Digital Signal Processing”, Prentice Hall.

R. S. Ghiass, O. Arandjelović, A. Bendada, and X. Maldague, (2014), "Infrared
face recognition: a comprehensive review of methodologies and databases.,”
Pattern Recognition, vol. 47, no. 9, pp. 2807–2824.

Sankur B., Sezginb M. (2004), “Image Thresholding Techniques: a Survey over


Categories”. Journal of Electronic Imaging, vol. 13(1), pp. 146-165.

S. Santani and R. Jain (1999), “Similarity measures,” IEEE Trans. on Pattern


Analysis and Machine Intelligence, Vol. 29, no.9, 871-883.

Seo, H.J., Milanfar, P. (2011), “Face Verification Using the LARK


Representation”, IEEE Transactions on Information Forensics and Security, PP.
99.

Singh, A., Singh, S.K. and Tiwari, S., (2012), “Comparison of face Recognition
Algorithms on Dummy Faces”, The International Journal of Multimedia & Its
Applications, 4(4), pp.121-135

S. J. D. Prince, J. Warrell, J. H. Elder, and F. M. Felisberti, (2008) “Tied Factor


Analysis for Face Recognition across Large Pose Differences,” IEEE Trans.
Pattern Anal. Mach. Intell., vol. 30, no. 6, pp. 970-984.

S. K. Pal, R. A. King and A. A. Hashim, (1983), “Automatic gray-level


thresholding through index of fuzziness and entropy,” Pattern Recog. Lett., vol.
1, pp. 141-146.

S. Venkatesh, P.L. Rosin (1995), “Dynamic Threshold Determination by Local


and Global Edge Evaluation”, CVGIP: Graphical Models and Image Processing,
57, 146-160.

S.Z. Li, R. Chu, S. Liao, L. Zhang (2007), “Illumination invariant face recognition
using near-infrared images”, IEEE Trans. Pattern Anal. Mach. Intell. 29, 627–639.

S. Z. Li, D. Yi, Z. Lei, and S. Liao (2013). “The casia nir-vis 2.0 face database”.
In Computer Vision and Pattern Recognition Workshops (CVPRW), 2013 IEEE
Conference on, pages 348–353.

S. Zhao, R.R. Grigat (2005), An automatic face recognition system in the near
infrared spectrum, in: Proceedings of MLDM, pp. 437–444.

T. Bourlai, A. Ross, C. Chen, and L. Hornak (2012), “A Study on using Middle-


Wave Infrared Images for Face Recognition”. In SPIE, Biometric Technology for
Human Identification IX, Baltimore, U.S.A.

T. Bourlai and B. Cukic (2012) “Multi-spectral face recognition: identification of
people in difficult environments”. In Intelligence and Security Informatics (ISI),
IEEE International Conference on, pages 196–201.

T. Hastie, R. Tibshirani, and J. Friedman, (2000), “The Elements of Statistical


Learning, Data Mining, Inference and Prediction”, Springer Verlag.

T. Ojala, M. Pietikainen, and T. Maenp, (2002), “Multiresolution gray-scale and


rotation invariant texture classification with local binary patterns”,. IEEE PAMI,
24(7):971–987.
Tsai, D.M., and Chiang, C.H. (2002), “Rotation-invariant pattern matching using
wavelet decomposition”, Pattern Recognition Letters 23, 191-201.

Tzimiropoulos, G., S. Zafeiriou and M. Pantic (2011), “Principal component


analysis of image gradient orientations for face recognition”, Proceedings of the
IEEE International Conference on Automatic Face and Gesture Recognition and
Workshops, Mar. 21- 25, Santa Barabara, USA. pp: 553-558. DOI:
10.1109/FG.2011.5771457

Unar J.A., Seng W.C., Abbasi A. (2014), “A review of biometric technology


along with trends and prospects”, Pattern Recognition, 47 (8), pp. 2673-2688.

Varsha Gupta and Dipesh Sharma, (2015) “A study of various Face Detection
Methods”, International Journal of Advanced Research in computer and
communication Engineering, Vol 3, Issue 5.

V. Štruc, J. Žibert, and N. Pavešić, (2009), "Histogram remapping as a


preprocessing step for robust face recognition,” WSEAS Transactions on
Information Science and Applications, vol. 6, no. 3, pp. 520–529.

V. Štruc and N. Pavešić, (2011), "Photometric normalization techniques for


illumination invariance,” Advances in Face Image Analysis: Techniques and
Technologies, pp. 279–300.

W. Chen, M. Er, and S. Wu, (2006) “Illumination compensation and


normalization for robust face recognition using discrete cosine transform in
logarithmic domain,” IEEE Transactions on Systems, Man and Cybernetics - part
B, vol. 36, no. 2, pp. 458–466.

X. Chai, S. Shan, X. Chen, and W. Gao, (2007), “Locally linear regression for
pose-invariant face recognition,” IEEE Trans. Image Proc., vol. 16, no. 7, pp.
1716-1725.

X.Tan and B.Triggs, (2007) “Enhanced local texture feature sets for face
recognition under difficult lighting conditions,” In Proceedings of the IEEE
international workshop on analysis and modeling of faces and gestures, pp. 168–
182.

X. Xie and K.-M. Lam (2006) “An efficient illumination normalization method
for face Recognition”, Pattern Recognition Letters.

X. Zou, J. Kittler, K. Messer, (2007) “Illumination invariant face recognition: A


survey”, in: Biometrics: Theory, Applications, and Systems, pp. 1–8.

Y. Adini, Y. Moses, and S. Ullman, (1997) “Face recognition: The problem of


compensating for changes in illumination direction.,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 721–732.

Y. Moses, Y. Adini, and S. Ullman (1994), “Face recognition: The problem of


compensating for changes in illumination direction”. In European Conf. on
Computer Vision, pages 286–296

Zhao, W., Chellappa, R., Rosenfeld, A., Phillips, P.J. (2003), “Face recognition:
A literature survey”. ACM Computing Surveys 35, 399–458.

Zixiang Xiong, Kannan Ramachandran, Michael T. Orchard and Ya-Qin Zhang


(1999),” A comparative study of DCT and Wavelet based image coding”, IEEE
Transactions on Circuit and Systems for Video Technology, Vol 9, No5, August
1999.

Z. Lei and S. Z. Li. (2009), “Coupled spectral regression for matching


heterogeneous faces”. In Computer Vision and Pattern Recognition, 2009. CVPR
2009. IEEE Conference on, pages 1123–1128.

Z. Lei, S. Liao, A. K. Jain, and S. Z. Li. (2012), "Coupled discriminant analysis for
heterogeneous face recognition”. IEEE Transactions on Information Forensics
and Security, 7(6):1707–1716.

Z. Pan, G.E. Healey, M. Prasad, B.J. Tromberg (2003), “Face recognition in


hyperspectral images”, IEEE Trans. Pattern Anal. Mach. Intell. 25, 1552–1560.

