Face Recognition and Age Progression Using SVM
MASTER OF TECHNOLOGY IN
ELECTRONICS AND COMMUNICATION
With Specialization in
DIGITAL COMMUNICATION
SUBMITTED BY
MUKESH CHAUHAN
0302EC15MT08
CERTIFICATE
Date:
RAJIV GANDHI PROUDYOGIKI VISHWAVIDYALAYA
BHOPAL (M.P)
(UNIVERSITY OF TECHNOLOGY OF MADHYA PRADESH)
CANDIDATE’S DECLARATION
I hereby declare that this work is original and has not been submitted in
part or full to any other university or institute for the award of any Degree
or Diploma.
Date : ……………………..
Mukesh Chauhan
(Enrolment No. 0302EC15MT08)
VINDHYA INSTITUTE OF TECHNOLOGY AND
SCIENCE SATNA (M.P.)
_______________RECOMMENDATION_______________
Guide
I would like to extend special thanks to my family, who gave their time, directly and indirectly, to help me complete this thesis work.
I would also like to thank all my friends, who made the time spent completing this thesis cheerful and exciting.
ABSTRACT
TABLE OF CONTENTS I-III
LIST OF FIGURES IV-V
LIST OF TABLES VI
LIST OF ABBREVIATIONS VII
2.3 Face detection techniques 16
2.3.1 Classification-based 16
2.3.2 Regression-based 17
2.4 Bio-inspired Models 18
2.5 Literature Review 19
4.3 Limit of Age Progression 46
REFERENCES: 73-75
LIST OF FIGURES
Sr. No. Title Page No.
Figure 3.14 Some sample images from FG-NET 35
Figure 3.15 Some sample images from MORPH album images 36
Figure 5.3 Training samples of different face and non-face images to distinguish 61
Figure 5.4 These layouts come after the face part detection push button has been pressed 62
LIST OF TABLES
Sr. No. Title Page No.
Table 3.1 MAEs of using separate SVR and SVM for different Gabor 37
features on FG-NET
Table 3.2 MAE (years) measures for 75 and 68 shape feature 38
Table 3.3 MAE (years) measures for R, I and M parts on FG-NET and 38
MORPH
Table 3.4 MAE (years) at different age groups on FG-NET 39
Table 3.5 MAE (years) comparisons. 39
LIST OF ABBREVIATIONS
CHAPTER 1
INTRODUCTION
The human face holds a significant amount of information and attributes such as
expression, gender and age. Most people can easily recognize human traits like
emotional states; they can tell whether a person is happy, sad or angry from the face.
Likewise, it is easy to determine a person's gender. However, estimating a person's
age just by looking at old or recent pictures of them is often a bigger challenge. Our
objective in this thesis is to develop human face detection and age progression
estimation from face images. Given a face image of a person, we label it with an
estimated age. Aging is a non-reversible process: human face characteristics change
with time, which produces major variations in appearance. The aging signs displayed
on faces are uncontrollable and personalized, such as hair whitening, muscle drooping
and wrinkles. The aging signs also depend on many external factors such as lifestyle
and degree of stress. For instance, smoking causes several changes in facial
characteristics: a 30-year-old person who smokes a pack of cigarettes each day may
look like a 42-year-old. Compared with other facial characteristics such as identity,
expression and gender, aging effects display three unique characteristics:
We have to distinguish between two computer vision problems. Age synthesis aims at
simulating the aging effects on human faces (i.e., simulating how a face would look
at a certain age) with customized single or mixed facial attributes (identity,
expression, gender, age, ethnicity, pose, etc.); it is the inverse procedure of face
detection, as shown in Figure 1.1. Face detection and time-domain analysis, in
contrast, aims at automatically labelling a face image with the exact age (year) or the
age group (year range) of the individual face.
The bio-inspired features (BIF) have had significant success in the past few years on
a wide range of computer vision tasks such as category recognition and face
recognition, but have only recently been applied to the human face detection problem.
Our approach builds an extended version of BIF (Extended BIF, EBIF) that can
encode facial features for face representation. A face representation using EBIF is
extracted for a labelled database of face images by incorporating fine detailed facial
features, automatic initialization using Active Shape Models, and analysis of a more
complete facial area that includes the forehead details. Then, for age estimation, a
cascade of Support Vector Machine (SVM) and Support Vector Regression (SVR)
models is trained on the extracted facial features to learn an age estimator. We
evaluated our algorithm on the widely used FG-NET and MORPH benchmark
databases, showing the superiority of our proposed algorithm (extended BIF, EBIF)
over state-of-the-art methods.
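As a rough illustration of such a cascade (not the thesis implementation — the feature vectors, group boundary and kernel settings below are invented for the sketch), an SVM can first assign a face to a coarse age group and a per-group SVR can then refine the estimate:

```python
# Sketch of an SVM -> SVR age-estimation cascade on synthetic data.
import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)
n, d = 300, 32                      # 300 synthetic "faces", 32-dim features
ages = rng.uniform(0, 69, n)
X = rng.normal(size=(n, d))
X[:, 0] = ages / 70.0 + rng.normal(scale=0.05, size=n)  # age-correlated feature

groups = (ages // 35).astype(int)   # two coarse age groups: 0-34 and 35-69

clf = SVC(kernel="rbf").fit(X, groups)          # stage 1: age-group classifier
regs = {g: SVR(kernel="rbf", C=10.0).fit(X[groups == g], ages[groups == g])
        for g in np.unique(groups)}             # stage 2: per-group regressors

def estimate_age(x):
    g = int(clf.predict(x.reshape(1, -1))[0])   # pick the age group first
    return float(regs[g].predict(x.reshape(1, -1))[0])  # regress within it

pred = np.array([estimate_age(x) for x in X])
mae = np.mean(np.abs(pred - ages))
print(f"training MAE: {mae:.2f} years")
```

The per-group regressors only ever see ages from their own range, which is the point of cascading a classifier before the regressor.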
In this thesis, we study different facial parts using EBIF, mainly the whole face, eye
wrinkles and the internal face, using different feature shape points. The analysis shows
that eye wrinkles contain the most important aging features, while covering only 30%
of the face compared to the internal face and the whole face. We build our own large
database using the internet and fast-growing social media content such as that of
Flickr. This provides a rich repository of tagged images that can serve the task of
age estimation. We present an automatic Advance Analysis of Image Processing
Algorithm image mining engine that is capable of collecting images using
human-age-related text queries covering various ethnic groups and image qualities.
Then, we use the Active Shape Model for robust face detection together with Viola
and Jones' AdaBoost Haar-like face detector; this also acts as a removal step for
non-face images. After that, we use the extended bio-inspired features (EBIF) to
extract the facial aging information. The images downloaded from the internet are not
accurately labelled with descriptive tags of true ages, so we cannot use the Advance
Analysis of Image Processing Algorithm image collection directly as a training
dataset. This motivates us to introduce a robust universal labeller algorithm that labels
Flickr images automatically, with no manual human labelling. Finally, we use the
FG-NET and MORPH standard databases as testing datasets, showing the superiority
of our proposed image web mining algorithm over state-of-the-art methods. Our
human age estimator has several advantages. First, it is an automatic system that does
not require any human intervention or input parameters to estimate a subject's age.
Second, it is based on recently proposed algorithms in image and facial image
analysis for the task of evaluating the age of a human face. Third, only a single image
of the subject is required to estimate his or her age, within seconds. Lastly, the
estimated ages are very close to the real ages. Good-quality images are not essential
for the method, because the Active Shape Model is trained on images of different
qualities with large variations in lighting, pose and expression, and is therefore able
to extract the face accurately.
1.1 Motivation
Automatic face detection and age progression in the time domain from facial images
has recently emerged as a technology with multiple interesting applications. The
following examples demonstrate some beneficial uses of such software.
ECRM is a management strategy that uses the latest computer vision algorithms and
approaches to build interaction tools for effectively establishing relationships with
customers and serving them individually. Customers are classified into different age
groups such as babies, teenagers, adults and senior adults. It is important to take
their habits, preferences, responsiveness and expectations of marketing into
consideration: companies can earn more money by acknowledging this fact,
responding directly to customers' specific needs based on their age groups, and
customizing products or services for each customer age group. The most challenging
part is maintaining enough personal information records or histories for all customer
age groups, where companies need to invest a large amount to establish long-term
customer relationships. For example, the owner of a fast food shop wants to know the
most popular sandwiches or meals for each age group; advertising companies want to
target specific audiences (potential customers) with specific advertisements in terms
of age groups; mobile manufacturers want to know which age group is most
interested in their new product models shown in a public kiosk; and clothing stores
may display suitable fashions for males or females according to their age groups.
The internet is the largest image database that has ever existed; it allows access to
billions of face images uploaded by regular users with descriptive tags and titles,
albeit noisy in many cases. Among the most famous websites are facebook.com and
flickr.com, where users can benefit from a face detection application that can
estimate human age accurately and retrieve face images based on an estimated-age
image query for friendship requests and face image collection.
1.2.4 Challenges
There were several challenges in developing our algorithm, because face images can
demonstrate a wide degree of variation in both shape and texture. Appearance
variations are caused by individual differences, the deformation of an individual face
due to changes in expression and speaking, and lighting variations. These issues are
explained further in the following points:
2) Enhancement module;
In the core system module, our face detection algorithm consists of two tasks: face
representation, using the extended bio-inspired features (EBIF), which build on the
bio-inspired features to encode the facial features robustly; and age estimation for
analysis over the time domain, where we train a cascade of Support Vector Machine
(SVM) and Support Vector Regression (SVR) models to learn the relationship
between the coded representation of the face and the actual age of the subjects. Once
this relationship is established, it is possible to estimate the age of a previously
unseen image. Some concepts need to be explained first.
Actual Age: The real age (accumulated years after birth) of an individual.
Appearance Age: The age information shown by the visual appearance.
Perceived Age: The individual's age as gauged by human subjects from the visual
appearance.
Estimated Age: The individual's age as recognized by a machine from the visual
appearance.
Age Categorization: Ages are further categorized on the basis of the age group to
which they belong.
We use the actual age and estimated age definitions, together with progression
estimation, in this work.
1.4 Enhancement module
In this module, we make several improvements to enhance the output of the core
system module, such as further analysis of different facial parts to validate their
importance, and increasing the number of pictures in the FG-NET aging database
using the MORPH dataset. We combine MORPH pictures with FG-NET ones using
pre-defined selection criteria, which ensure the integrity and fairness of the pictures
chosen from the MORPH database for insertion into the FG-NET database, resulting
in a new database with large variations. Our selection criteria can be applied to any
combination of databases to produce a more generalized one. Output from this
module shows significant improvements in face detection and age progression
analysis.
1.5 Application module
After building the core system and enhancement modules, we demonstrate the
application module that has several components:
Image collector that crawls images from the internet using human-age-related text
queries under several conditions, such as image quality, different poses, expressions,
multiple faces in the same image, and single-face images. This leads to a large
database for the purpose of having more training images.
The crawled images suffer from different problems, such as face misalignment and
multiple face instances in the same image with possibly incorrect labels for the image
faces. This motivated us to propose different solutions to overcome the
above-mentioned problems.
We then build a more generalized database using the downloaded images as a training
set, using the internet as a rich image repository for the task of age estimation. The
FG-NET and MORPH datasets are used as testing sets to show the superiority of our
web-based application components over state-of-the-art methods.
1.6 Neural Network for Face Recognition System
A face recognition system consists of two parts: hardware and software. The system
is used for automatic recognition of users or confirmation of a password. The input is
either a digital picture or a video frame from a video stream. State institutions and
some private organizations use such systems for face recognition, especially for
identifying faces from video cameras as an input parameter, or in biometric systems
for checking identity using cameras and 3D scanners. The system must recognize
where the face is in a picture, extract it, and perform verification. There are many
ways to verify a face, but the most popular is recognition of the face's characteristics.
A face has about 80 characteristic parameters, some of which are: width of the nose,
spacing between the eyes, height of the eye socket, shape of the zygomatic bone, and
jaw width.
A novel framework for automatic face detection in human face images using our
proposed algorithm for face representation. A robust face detection scheme is then
introduced using a newly proposed integration of support vector machine (SVM)
and support vector regression (SVR).
We analyze different facial parts: a) eye wrinkles, b) the internal face (the whole face
without forehead features) and c) the whole face (face with forehead features). The
purpose of this study is to locate the most important aging features in the input face
image. Next, we increase the number of missing pictures in older age groups for
performance enhancement. Predefined selection criteria are used to combine the
MORPH and FG-NET database images for range completion, which results in a new
database with large variations of ethnic groups. These selection criteria can be
applied to different databases to generate a more generalized one.
We design a human face detection application by utilizing images to build a more
generalized database with different ethnic groups. It has several components that
contribute to the construction of the database, such as an image collector, a face
detection and noise removal step, face representation, and a labelling algorithm.
CHAPTER 2
LITERATURE SURVEY
Human face aging is a non-reversible process, causing human face characteristics to
change over time, such as hair whitening, muscle drooping and wrinkles, even though
some beauty care products may slightly reverse minor photo-aging effects. People
have different patterns of aging; with time, human faces take different forms at
different ages, yet there is general discriminative information we can always describe.
Previous work on aging can be broken down into two major categories: (a) age
progression and (b) age estimation and recognition.
2.1 Age progression
2.1.1 Implicit Statistical Synthesis
Zhi-Hua Zhou used AAM to build aging functions for young faces under 30 years of
age, in which PCA is applied to extract the shape and texture variations from a set of
training examples. The PCA coefficients for the linear reconstruction of training
samples are considered model parameters, which control different types of
appearance variations. This aging model can be used for age normalization and
further improvement of face recognition performance [17].
Figure 2.1 shows the AAM example and the aging appearance simulation result from
Zhi-Hua Zhou's method. Ramanathan and Chellappa proposed an appearance-based
adult aging framework. The shape aging is manipulated by a physically-based
parametric muscle model, while the texture aging is manipulated by an
image-gradient-based wrinkle extraction function. This model can predict and
simulate facial aging in two cases: weight gain and weight loss. The wrinkle
simulation module can generate different effects such as subtle, moderate and strong.
Figure 2.1: Face aging using active appearance model and principal component
analysis
Volker Blanz presented a dynamic face aging model with multi-resolution and
multilayer image representations. Associated with a layered And-Or graph, all 50
training face images are decomposed into different parts. The general aging effects
are learned from global hair style and shape changes, facial component deformations,
and facial wrinkle geography. A dynamic Markov process model is built on the
graph structures of the training data. The graph structures over different age groups
are finally sampled in terms of the dynamic model to simulate new aging faces.
The figure above shows this model and some aging simulation results, which are
highly photorealistic [7].
2.1.2 Explicit Mechanical Synthesis
Both face aging and rejuvenation can be simulated using this method. The first and
third columns in Figure 2.4 show the original images of two persons of different ages.
The second and fourth columns show the two rendering results for face aging and
rejuvenation (exchanging the ages between the two faces).
(a) Subject 1 (original face in the left picture, simulated face in the right picture);
(b) Subject 2 (original face in the left picture, simulated face in the right picture)
Figure 2.4: Face aging and rejuvenating results using image-based surface detail
transfer results
Existing face detection frameworks using face images typically consist of two main
stages: age image representation and face detection techniques.
2.2.1 Image representation
Kwon and Lobo studied cranio-facial development theory. The cranio-facial theory
uses a mathematical model to describe the growth of a person's head from infancy to
adulthood. Farkas provided a comprehensive overview of face anthropometry, the
science of measuring sizes and proportions of human faces. For age growth
characterization, people usually utilize distance ratios measured from facial
landmarks, instead of using the mathematical models directly. There are two reasons
people do not use the mathematical formulation for age estimation: the mathematical
model cannot analyze head growth naturally, specifically when ages approach
adulthood; and it is hard to measure head growth from 2D face images. Kwon and
Lobo classified human faces into three groups: babies, young adults and senior
adults. The experiment was performed on a small database of 47 faces. The authors
did not report the overall performance on this small database. Face detection
approaches based on the anthropometry model can only deal with young ages, since
the human head shape does not change much in the adult period [15].
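The distance-ratio idea can be shown with a hedged sketch: the landmark coordinates below are invented for illustration, not taken from any real database, but the ratios computed are of the kind anthropometric methods use.

```python
# Anthropometric distance ratios from hypothetical 2-D facial landmarks.
import math

def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Made-up landmark positions (x, y) in pixels for one face:
left_eye, right_eye = (60, 80), (120, 80)
nose_tip, mouth = (90, 120), (90, 150)

eye_mid = ((60 + 120) / 2, 80)                 # midpoint between the eyes
eye_distance = dist(left_eye, right_eye)       # inter-ocular distance
eye_to_nose = dist(eye_mid, nose_tip)          # eye midpoint to nose tip
eye_to_mouth = dist(eye_mid, mouth)            # eye midpoint to mouth

# Ratios of this kind change as the cranium develops from infancy to adulthood,
# which is why they can separate babies from adults but not adult age groups.
print(eye_distance / eye_to_mouth)  # -> ~0.857
print(eye_to_nose / eye_to_mouth)   # -> ~0.571
```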
The active appearance model (AAM) is a statistical face model proposed initially for
coding face images. Given a set of training face images, a statistical shape model and
an intensity model are learned separately, based on principal component analysis.
The AAMs were used successfully for face encoding, and were extended to face
aging by proposing an aging function, age = f(b), to explain the variation in age. In
the aging function, age is the actual age of an individual in a face image, b is a vector
containing 50 raw model parameters learned from the AAMs, and f is the aging
function. The aging function defines the relationship between the age of individuals
and the parametric description of the face images. The experiments were performed
on a database of 500 face images of 60 subjects. Different classifiers were tried for
face detection based on this age image representation, especially the quadratic aging
function. In comparison with the anthropometry-model-based approaches, the AAMs
can deal with any age in general, rather than just young ages. In addition, the
AAM-based approaches to age image representation consider both shape and texture,
rather than just the facial geometry as in the anthropometric-model-based methods.
These approaches are applicable to precise age estimation, since each test image is
labelled with a specific age value chosen from a continuous range. Further
improvements on these aging pattern representation methods are: (1) to provide
evidence that the relation between face and age can be essentially represented by a
quadratic function; (2) to deal with outliers in the age labelling space; and (3) to deal
with high-dimensional parameters.
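The quadratic aging function can be illustrated with a one-parameter toy version: a single scalar model parameter b stands in for the 50-dimensional AAM vector, and the data are synthetic, so this is a sketch of the functional form only, not the actual fitting procedure used in the cited work.

```python
# Fitting age = f(b) with a quadratic f, on synthetic one-dimensional data.
import numpy as np

rng = np.random.default_rng(3)
b = rng.uniform(-1, 1, 100)                     # stand-in AAM model parameter
age = 30 + 25 * b + 10 * b**2 + rng.normal(scale=2.0, size=100)  # noisy ages

coeffs = np.polyfit(b, age, deg=2)              # least-squares quadratic fit
f = np.poly1d(coeffs)                           # f(b) -> estimated age

pred = f(b)
print("fitted coefficients (quad, lin, const):", np.round(coeffs, 1))
print("MAE on training data:", np.abs(pred - age).mean())
```

With real AAM parameters, b is a vector, so the quadratic has cross-terms and far more coefficients; the least-squares principle is the same.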
Geng et al. [19] introduced the Aging pattern Subspace (AGES), which deals with a
sequence of an individual's aging face images modelled all together, instead of
dealing with each aging face image separately. An aging pattern is defined as a
sequence of personal face images, coming from the same person, sorted in temporal
order. If the face images of all ages are available for an individual, the corresponding
aging pattern is called a complete aging pattern; otherwise, it is called an incomplete
aging pattern. The AGES method can simulate the missing ages by using an EM-like
iterative learning algorithm. The AGES method works in two stages: the learning
stage and the face detection stage. In learning the aging pattern subspace, the PCA
technique is used to obtain a subspace representation. The difference from the
standard PCA approach is that there may be missing age images in each aging
pattern. The proposed EM-like iterative learning method is used to minimize the
reconstruction error, characterized by the difference between the available age images
and the reconstructed face images. The initial values for missing faces are set to the
average of the available face images. Then the eigenvectors of the covariance matrix
of all face images and the means can be computed. Given the eigenvectors and the
mean face, the reconstruction of the faces can be computed. The process iterates until
the reconstruction error is small enough. In the face detection stage, a test face image
needs to find an aging pattern suitable for it and a proper age position in that aging
pattern. The age position is returned as the estimate of the age of the test face image.
To do this, the test face is verified at every possible position in the aging pattern and
the one with the minimum reconstruction error is selected.
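The EM-like fill-in procedure just described can be sketched with numpy on toy data: short random vectors stand in for flattened face images, and a small SVD subspace stands in for the aging pattern subspace. This is a hypothetical stand-in for the real AGES learning stage, kept only to show the iterate-reconstruct-refill loop.

```python
# Toy EM-like imputation: initialize missing entries, then iterate
# PCA reconstruction and refill until convergence.
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=(20, 8))            # 20 "aging patterns", 8-dim "faces"
mask = rng.random(data.shape) < 0.2        # ~20% of entries treated as missing
X = data.copy()
X[mask] = np.nan

# Step 0: initialize missing values with the per-column (per-age) mean.
col_mean = np.nanmean(X, axis=0)
X[mask] = np.take(col_mean, np.where(mask)[1])

for _ in range(20):                        # iterate: PCA -> reconstruct -> refill
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    k = 3                                  # keep a small subspace
    recon = (U[:, :k] * s[:k]) @ Vt[:k] + mean
    X[mask] = recon[mask]                  # only the missing entries are updated

err = np.abs(X[mask] - data[mask]).mean()  # distance of imputed values to truth
print(f"mean abs imputation error: {err:.3f}")
```

On this random data there is no true low-rank structure, so the error mainly demonstrates that the loop runs and converges; on real aging patterns the subspace assumption is what makes the imputation meaningful.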
In terms of utilizing the AAMs for encoding face images, Geng et al. used 200 AAM
features to encode each face image. The experiment evaluating the AGES method
was performed on the FG-NET aging database; the Mean Absolute Error (MAE) was
reported as 6.77 years. In practical use of the AGES method, one problem is that in
order to estimate the age of an input face image, the AGES method assumes there
exist face images of the same individual at different ages, or at least a similar aging
pattern for that face image, in the training database. This assumption might not be
satisfied for some aging databases, and it is not easy to collect a large database
containing face images of the same individual at many different ages under close
imaging conditions. Another problem of the AGES method is that the AAM face
representation might not encode facial wrinkles well for senior people, because the
AAM method only encodes the image intensities, without using any spatial
neighborhood to calculate texture patterns. Intensities of single pixels usually cannot
characterize local texture information. In order to represent the facial wrinkles of
older adults, the texture patterns at local regions need to be considered.
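The MAE figure reported above, and used throughout this chapter to compare estimators, is simply the average absolute difference between estimated and true ages. With illustrative made-up numbers:

```python
# Mean Absolute Error (MAE) in years, as used to compare age estimators.
def mae(true_ages, estimated_ages):
    """Average absolute difference between true and estimated ages."""
    assert len(true_ages) == len(estimated_ages)
    return sum(abs(t - e) for t, e in zip(true_ages, estimated_ages)) / len(true_ages)

# Hypothetical results for five test faces:
true_ages = [5, 18, 30, 47, 60]
estimated = [8, 15, 33, 45, 52]
print(mae(true_ages, estimated))  # -> 3.8
```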
Pramod Kumar Pisharady and Martin Saerbeck [18] proposed using the Biologically
Inspired Features (BIF) for face detection via faces. The theory behind this method is
explained in detail in the next section. BIF can achieve an MAE of 4.77 years on
the FG-NET aging database, and MAEs of 3.91 years and 3.47 years for females and
males respectively on the YGA database. Considering both age and gender estimation
in an automatic framework, the BIF + Age Manifold feature combined with SVM can
achieve MAEs of 2.61 years and 2.58 years for females and males respectively on the
YGA database. These results demonstrate the superior performance of the BIF for the
task of age estimation.
Given an aging feature representation, the next step is to estimate ages. Face
detection approaches fall into two categories: a) classification-based and b)
regression-based.
2.3.1 Classification-based
Zhi-Hua Zhou et al. [17] evaluated the performance of different classifiers for age
estimation, including the nearest neighbor classifier, Artificial Neural Networks
(ANNs), and a quadratic function classifier. The face images are represented by the
AAM method. From experiments on a small database containing 400 images at ages
ranging from 0 to 35 years, it was reported that the quadratic function classifier can
reach an MAE of 5.04 years, which is slightly lower than the nearest neighbor
classifier but higher than the ANNs. The SVM was applied to face detection by Guo
et al. on a large YGA database with 8,000 images. The MAEs are 5.55 and 5.52 years
for females and males, respectively. The MAE is 7.16 years for the FG-NET aging
database.
Kanno et al. presented an artificial neural network for 4-class age-group
classification, which achieved 80% accuracy on 110 young male faces. Gaussian
models were fitted in a low-dimensional 2DLDA+LDA feature space using the EM
algorithm. The age-group classification is determined by fitting the test image to each
Gaussian model and comparing the likelihoods. For 5-year-range age-group
classification, their system achieves accuracies of about 50% for males and 43% for
females. For 10-year-range age-group classification, it achieves about 72% for males
and 63% for females. For 15-year-range age-group classification, it achieves about
82% for males and 74% for females.
2.3.2 Regression-based
Michel F. Valstar and Timur Almaev [5] investigated three formulations for the aging
function: linear, quadratic, and cubic, each with 50 raw model parameters. The
optimal model parameters are learned from training face images of different ages
using a genetic algorithm. SDP is an effective tool but computationally very
expensive; when the size of the training set is large, the SDP solution may be difficult
to obtain. An Expectation Maximization (EM) algorithm was used to solve the
regression problem and speed up the optimization process. The MAEs are reported as
6.95 years for both females and males on the YGA database, and 5.33 years for the
FG-NET aging database. Zhou et al. presented generalized Image Based Regression
(IBR) aimed at multiple-output settings. A boosting scheme is used to select features
from a redundant Haar-like feature set. The proposed training algorithm can also
significantly reduce the computational cost. The IBR achieves an MAE of 5.81 years
in a 5-fold cross-validation test on the FG-NET aging database. Suo et al. compared
Age-group-specific Linear Regression (ALR), MLP, SVR, and logistic regression
(multi-class AdaBoost) on FG-NET and their own databases, and achieved the best
performance with MLP in their experiments. Guo et al. chose the SVMs as a
representative classifier and the SVR as a representative regressor, and compared
their performance using the same input data. From their experiments, the SVMs
perform much better than the SVR on the YGA database (5.55 versus 7.00 years, and
5.52 versus 7.47 years, for females and males, respectively), while the SVMs perform
much worse than the SVR on the FG-NET database (7.16 versus 5.16 years). From
these experimental results, we can see that the classification-based face detection can
be much better or much worse than the regression-based approach in different cases.
Recent work suggests that visual processing in the cortex can be modelled as a
hierarchy of increasingly sophisticated representations. A recent theory suggests that
the feed-forward path of object recognition in the cortex accounts for the first few
hundred milliseconds of visual processing; the primate cortex follows a mostly
feed-forward hierarchy. Riesenhuber et al. proposed a new hierarchical model derived
from a feed-forward model of the primate visual object recognition pathway, called
the "HMAX" model. The standard framework, as shown in Figure 2.5, consists of
different layers of computational units called simple (S) and complex (C) cell units,
creating increasing complexity as the layers progress from the primary visual cortex
(V1) to the inferior temporal cortex (IT). A notable property of the model is the
nonlinear maximum operation "MAX" over the S units, rather than the linear
summation operation "SUM", for pooling inputs at the C layers. Specifically, the first
layer of the model, called the S1 layer, is created by convolving a pyramid of Gabor
filters at 4 orientations and 16 scales over the input gray-level image.

Adjacent pairs of S1 scales are then grouped together to form 8 "bands" of units
for each orientation. The second layer, called the C1 layer, is then generated by
taking the maximum values within a local spatial neighbourhood and across the
scales within a band, so the resulting C1 representation contains 8 bands and 4
orientations. The advantage of taking the "MAX" operation within a small range of
position and scale is tolerance to small shifts and scale changes.
Figure 2.5: Bio-inspired model
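The S1/C1 stages described above can be sketched in miniature with numpy: a tiny Gabor filter bank at 4 orientations followed by local MAX pooling. The filter size, wavelength and pooling cell below are illustrative choices, not the 16-scale pyramid and band structure of the full HMAX model.

```python
# Toy S1/C1 stage: Gabor filtering followed by local MAX pooling.
import numpy as np

def gabor_kernel(size, theta, wavelength=4.0, sigma=2.0, gamma=0.5):
    """Real part of a Gabor filter at orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2))
            * np.cos(2 * np.pi * xr / wavelength))

def convolve2d_valid(img, k):
    """Naive 'valid'-mode 2-D correlation, fine for small kernels."""
    kh, kw = k.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def c1_max_pool(s1, cell=4):
    """MAX over non-overlapping cell x cell neighbourhoods (the C1 step)."""
    h, w = (s1.shape[0] // cell) * cell, (s1.shape[1] // cell) * cell
    s1 = s1[:h, :w]
    return s1.reshape(h // cell, cell, w // cell, cell).max(axis=(1, 3))

img = np.random.default_rng(1).random((32, 32))         # stand-in gray image
for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):  # 4 orientations
    s1 = np.abs(convolve2d_valid(img, gabor_kernel(7, theta)))  # S1 response
    c1 = c1_max_pool(s1)                                # C1 MAX pooling
    print(round(theta, 3), c1.shape)
```

The MAX pooling is what gives C1 its tolerance to small shifts and scale changes: the strongest S1 response in each neighbourhood survives wherever exactly it occurred.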
[1] Mr. Dinesh Chandra Jain and Dr. V. P. Pawar proposed a new way to recognize
the face using facial recognition software and neural network methods, building a
facial recognition system to protect against fraud and terrorism. Face recognition is
an important and secure way to prevent fraud everywhere; government agencies are
investing a considerable amount of resources into improving security systems as a
result of recent terrorist events that dangerously exposed flaws and weaknesses in
today's safety mechanisms. Badge- or password-based authentication procedures are
too easy to hack.
LDA, ICA, SVM, Gabor wavelets, and soft computing tools such as ANN for
recognition, together with various hybrid combinations of these techniques. This
review investigates all these methods against the parameters that challenge face
recognition, such as illumination, pose variation and facial expressions.
[3] Mamta Dhanda, Seth Jai Prakash Mukund Lal Institute of Engineering and
Technology, proposed a face recognition system design based upon "eigenfaces".
The original images of the training set are transformed into a set of eigenfaces E.
Then, the weights are calculated for each image of the training set and stored in the
set W. Upon observing an unknown image Y, the weights are calculated for that
particular image and stored in the vector WY. Afterwards, WY is compared with the
weights of images which are known with certainty to be faces.
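The eigenface pipeline just described can be sketched compactly with numpy. Random vectors stand in for real flattened face images, so the "eigenfaces" here carry no facial meaning; the sketch only shows the mean-subtraction, projection-to-weights, and nearest-weight matching steps.

```python
# Eigenfaces sketch: PCA on training images, then nearest-weight matching.
import numpy as np

rng = np.random.default_rng(4)
train = rng.random((10, 64))                 # 10 flattened 8x8 training "faces"

mean_face = train.mean(axis=0)
A = train - mean_face                        # center the training set
U, s, Vt = np.linalg.svd(A, full_matrices=False)
E = Vt[:5]                                   # top-5 eigenfaces (the set E)

W = A @ E.T                                  # weights of each training image

unknown = train[3] + rng.normal(scale=0.01, size=64)  # noisy copy of face 3
WY = (unknown - mean_face) @ E.T             # weights WY of the unknown image Y

distances = np.linalg.norm(W - WY, axis=1)   # compare WY against stored weights
print("best match:", int(np.argmin(distances)))  # -> 3
```

Matching in weight space rather than pixel space is the whole point: distances are computed in a 5-dimensional subspace instead of the 64-dimensional image space.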
found by methods sensitive to these high-order statistics. Independent component analysis
(ICA), a generalization of PCA, is one such method.
[7] Volker Blanz and Thomas Vetter, Member, IEEE, proposed a method for face
recognition across variations in pose, ranging from frontal to profile views, and
across a wide range of illuminations, including cast shadows and specular reflections.
To account for these variations, the algorithm simulates the process of image
formation in 3D space using computer graphics, and it estimates the 3D shape and
texture of faces from single images. The estimate is achieved by fitting a statistical,
morphable model of 3D faces to images. The model is learned from a set of textured
3D scans of heads.
[9] WU Xiao-Jun, Josef Kittler, YANG Jing-Yu, Kieron Messer and WANG Shi-Tong
proposed a new kernel direct discriminant analysis (KDDA) algorithm.
First, a recently advocated direct linear discriminant analysis (DLDA) algorithm is
overviewed. Then the new KDDA algorithm is developed which can be considered as
a kernel version of the DLDA algorithm. The design of the minimum distance
classifier in the new kernel subspace is then discussed.
[10] Juwei Lu, K.N. Plataniotis and A.N. Venetsanopoulos of the Bell Canada Multimedia
Laboratory, The Edward S. Rogers Sr. Department of Electrical and Computer
Engineering, proposed a kernel machine based
discriminant analysis method, which deals with the nonlinearity of the face patterns’
distribution. The proposed method also effectively solves the so-called “small sample
size” (SSS) problem which exists in most FR tasks. The new algorithm has been
tested, in terms of classification error rate performance, on the multi-view UMIST
face database.
[11] Felix Juefei-Xu, Khoa Luu, Marios Savvides, Tien D. Bui and Ching Y. Suen
proposed a novel framework utilizing the periocular region for age-invariant face
recognition. To obtain age-invariant features, they first perform preprocessing
schemes such as pose correction and illumination and periocular region normalization,
and then apply robust Walsh-Hadamard transform encoded local binary patterns (WLBP)
on the preprocessed periocular region only. They find that the WLBP feature on the
periocular region maintains consistency for the same individual across ages.
[13] Thomas Serre, Lior Wolf and Tomaso Poggio introduced a novel set of features
for robust object recognition. Each element of this set is a complex feature
obtained by combining position- and scale-tolerant edge detectors over neighboring
positions and multiple orientations. Their system's architecture is motivated by a
quantitative model of the visual cortex.
[15] Young H. Kwon and Niels da Vitoria Lobo proposed a theory and practical
computations for visual age classification from facial images. Currently, the theory
has only been implemented to classify input images into one of three age groups:
babies, young adults and senior adults. The computations are based on cranio-facial
development theory and skin wrinkle analysis. In the implementation, primary
features of the face are found first, followed by secondary feature analysis.
[16] Asuman Günay and Vasif V. Nabiyev proposed a novel age estimation method,
Global and Local feAture based Age estiMation (GLAAM), relying on global and
local features of facial images. Global features are obtained with Active Appearance
Models (AAM). Local features are extracted with regional 2D-DCT (2-dimensional
Discrete Cosine Transform) of normalized facial images.
CHAPTER 3
Face detection accuracy depends on how well the input images are represented by
good, general, discriminative features. The choice of classification or regression
also has an impact on the estimated age for an unknown image. In this chapter, we
present a complete Face detection framework with a description of each component.
The output of this framework is the estimated age for the input face image. The core
system module is the first of the three modules of our system for the Face
detection task. The major contributions presented in this chapter are: (1) automatic
localization of facial landmarks for the input faces, for the first time with the
bio-inspired features (BIF) method, using Active Shape Models (ASM); (2) utilizing
micro facial features to reveal facial details in the forehead, leading to a significant
increase in the overall accuracy over the state-of-the-art methods; (3) a new set of
Gabor feature parts (I – Imaginary and M – Magnitude) that were not investigated
before for the Face detection problem, along with the commonly used real part in BIF;
and (4) increasing the number of shape features through inclusion of the forehead
shape features.
The proposed algorithm for Face detection is divided into five steps. First, the facial
landmarks of the face image are detected automatically using ASM (as opposed to
the case of BIF, where this step was performed manually). The image is cropped to the
area covering a fixed number of points generated from the ASM step (several
numbers of points were tested experimentally). Then, the cropped image undergoes
filtering by a family of Gabor functions that yields three sets of features: the Real,
Imaginary and Magnitude (R, I, M) parts at different orientations and scales. The filtered
outputs undergo a feature dimensionality reduction step that keeps only the maximum
(MAX) and standard deviation (STD) of the Gabor filtered outputs. Finally, both
classification-based and regression-based models were used in the training phase
(SVM and SVR in this case) to produce the final age model estimator. The complete
framework block diagram is illustrated in Figure 3.1.
Face images can demonstrate a wide degree of variation in both shape and texture.
Appearance variations are caused by individual differences, the deformation of an
individual face due to changes in expression, pose and speaking, as well as lighting
variations. Figure 3.2 shows how the image of a person varies with different poses.
Figure 3.3 shows illumination changes caused by light sources at arbitrary positions
and intensities that contribute a significant amount of variability. Typically we
locate the features of a face in order to perform Face detection on the detected shape
features.
Figure 3.2: Images of the same person with different head pose
Figure 3.3: Images of the same person under different lighting conditions.
In this step, we aim at accurately localizing the facial region to extract features only
from the relevant parts of the input image. This localization step was manually
performed in, which limits its practical usage. In this work, we explore the use of
Active Shape Models for the automatic localization of facial landmark points; which
has two main stages, namely training and fitting. In the training stage, we manually
locate landmark points for hundreds of images in such a way that each landmark
represents a distinguishable point present on every example image. An object shape is
represented by a set of labeled points or landmarks. The number of landmarks should
be large enough to show the overall shape and the details where it is needed.
We need to determine the number of landmark points that can adequately represent
the shape. We tried 75 and 68 points which were provided by respectively. We use 75
points to cover the whole face including the forehead. Then, we build a statistical
shape model based on these annotated images. The model that will be used to describe
a shape and its typical appearances is based on the variations of the spatial position of
each landmark point within the training set. The model represents the expected shape
and local grey-level structure of a target object in an image Having built the shape
model, points on the incoming face image are fitted. First, we detect the face in the
input image. Second, we initialize the shape points and do image alignment fitting to
get the detected shape features automatically. Finally the input image is cropped to the
area covered by the ASM fitted landmark points. In this thesis, we use 75 points for
the landmark points as opposed to 68. This includes features from the forehead region
which contributed to an increase of accuracy, as shown in the reported experiments.
The difference between using 68 and 75 landmark points is illustrated in Figure 3.4.
Figure 3.4: 68 and 75 points samples from FG-NET.
Figure 3.5 shows the output of the fitting stage for different head poses; it is clear
that ASM is able to accurately extract the facial features from the original input face
images of the same person in Figure 3.2 under different head rotations.
Figure 3.5: cropped face images from fitting stage of the same person with different
head poses.
Figure 3.6: cropped face images from fitting stage with different source of lighting.
ASM successfully extracted the facial features under different lighting conditions as
shown in Figure 3.6. However, the fitting stage could not detect the other input face
images in Figure 3.3 due to the high intensity of the light source and illumination. This
issue can be solved by annotating images with illumination changes caused by light
sources at arbitrary positions and intensities in the training stage, which would
contribute a significant amount of variability.
Texture features have proven to be distinctive for the task of Face detection from
facial images. In particular, the use of Gabor features has proven successful. We
follow a similar approach but with the addition of different forms of Gabor
functions, namely the imaginary and magnitude features. The cropped image output
from the feature detection localizer block using the Active Shape Model (ASM) is filtered
by a family of Gabor functions with different forms, 8 orientations and 16 scales.
Gabor functions for a particular scale (σ) and orientation (θ) are described by the equation:

Ψ(x, y) = (1/σ²) · exp(−‖k‖² ‖z‖² / (2σ²)) · (exp(i k·z) − exp(−σ²/2))

whose real and imaginary parts give Equations 3.1 and 3.2. Here z = (x − m/2, y − n/2) is
the position relative to the centre of an m×n filter and k = (Kmax/f^ν) · exp(iμπ/8)
encodes the scale ν and orientation μ.
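These kernels are evaluated by the MATLAB `gabor` routine listed in Chapter 5; a Python transcription of that routine might look like the following sketch (the default `kmax`, `f` and `sig` values are common choices from the Gabor literature, not parameters taken from the thesis):

```python
import cmath
import math

def gabor_kernel(m, n, nu, mu, kmax=math.pi / 2, f=math.sqrt(2), sig=2 * math.pi):
    """Complex m x n Gabor kernel at scale nu and orientation mu, mirroring
    the MATLAB gabor() routine given in Chapter 5."""
    k = kmax / f ** nu * cmath.exp(1j * mu * math.pi / 8)  # wave vector
    nk = abs(k) ** 2
    psi = [[0j] * n for _ in range(m)]
    for x in range(m):
        for y in range(n):
            zx, zy = x - m / 2, y - n / 2  # position relative to centre
            env = sig ** -2 * math.exp(-0.5 * nk * (zx ** 2 + zy ** 2) / sig ** 2)
            psi[x][y] = env * (cmath.exp(1j * (k.real * zx + k.imag * zy))
                               - math.exp(-sig ** 2 / 2))
    return psi

kern = gabor_kernel(3, 3, nu=0, mu=0)
```

The real and imaginary parts of `kern` correspond to the R and I filter banks used below.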
The starting filter size is 3×3 rather than 5×5, which is capable of revealing facial
details in the forehead area (with the parameters shown in Table 3.1). This can be
observed in Figure 3.7, where the input face image is of size 59×80. The S1 units at
four orientations of band 1 (filter sizes 3×3 and 5×5) are displayed. Figure 3.8 zooms
in on parts that show the difference between the 3×3 and 5×5 filter sizes.
Figure 3.7: Gabor filtered results at band 1 (two scales with filter sizes 3×3 and 5×5)
at four orientations. Note that in the case of the 3x3 filter, the forehead features start
to be more visible.
Equations 3.1 and 3.2 generate two sets of features (the R – Real and I – Imaginary
parts) respectively. We also generate a magnitude feature, which is dependent on the
R and I features and is not an independent feature like them. The benefit of using the
Magnitude part is shown in the reported experiments. M is described by the following
equation:

M = √(R² + I²)
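Splitting a complex Gabor response into the three feature parts is then direct; a minimal sketch:

```python
import math

def gabor_parts(response):
    """Split a complex Gabor filter response into the three feature parts
    used here: R (real), I (imaginary) and M = sqrt(R^2 + I^2)."""
    r, i = response.real, response.imag
    return r, i, math.sqrt(r ** 2 + i ** 2)

r, i, m = gabor_parts(complex(3.0, 4.0))
print(r, i, m)  # 3.0 4.0 5.0
```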
Figure 3.8: Gabor filtered results with filter sizes 3x3 and 5x5
The results for the M – Magnitude, R – Real and I – Imaginary image parts after applying
pyramids of Gabor functions with different forms, orientations and scales are shown in
Figures 3.9, 3.10 and 3.11 respectively. The Gabor filtered outputs can serve as candidate
features for the Face detection problem. However, they have a very high dimension,
leading to difficulties in training. In addition, there are redundancies in the Gabor
filter outputs. Hence, a commonly adopted scheme is to summarize the outputs of the
Gabor filters using some statistical measure. Here, we adopt operations used and
proven to work quite well, namely the maximum "MAX" and the standard deviation
"STD", with a variation on the "MAX" definition that avoids image sub-sampling to
keep the local variations, which might be important for characterizing facial details
(e.g., wrinkles, creases and muscle drop). The three sets of Gabor features are shown
in Figure 3.12 with the "MAX" and "STD" operators applied to them.
Figure 3.10: Gabor filtered results for R – Real image part
Table 3.3 – Gabor filter parameters
Figure 3.12: Real, Imaginary and Magnitude parts after applying “MAX” and “STD”
operators
where Fᵢ corresponds to the maximum value of the two adjacent filters of the same
scale band in the S1 layer at pixel i, xᵢⱼ and xᵢ,ⱼ₊₁ are the filtered values at scales
j and j+1 at pixel i, and F̄ is the mean of the filtered values within a pooling grid
of size Ns×Ns.
The two nonlinear operations "MAX" and "STD" are applied for each orientation and
each scale band independently. For instance, consider the first band, S = 1. For each
orientation it contains two S1 maps: one obtained using a filter of size 3×3 and the
other obtained using a filter of size 5×5. The two maps have the same width and
height but different values, because different filters are applied. First, we take a max
over the two scales by recording only the maximum value from the two maps, leading
to a maximum map through the "MAX" operation.
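The two pooling operators can be sketched as follows; the map contents and the pooling grid size Ns are toy values for illustration only:

```python
import math

def max_pool_scales(map_a, map_b):
    """Element-wise MAX over the two S1 maps of one scale band (e.g. the
    3x3- and 5x5-filtered outputs), keeping full resolution (no sub-sampling)."""
    return [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(map_a, map_b)]

def std_pool(m, ns):
    """STD operator: standard deviation of the values inside each
    non-overlapping ns x ns pooling grid."""
    out = []
    for i in range(0, len(m) - ns + 1, ns):
        row = []
        for j in range(0, len(m[0]) - ns + 1, ns):
            vals = [m[i + di][j + dj] for di in range(ns) for dj in range(ns)]
            mean = sum(vals) / len(vals)
            row.append(math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)))
        out.append(row)
    return out

s1_max = max_pool_scales([[1, 5], [2, 0]], [[4, 3], [1, 6]])
print(s1_max)  # [[4, 5], [2, 6]]
print(std_pool(s1_max, 2))
```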
A face feature localizer is used to detect the face in each image using the Active Shape
Model (ASM) stage. Then, the images are cropped and resized to 59×80 gray-level
images. For the face representation, we use our extension of the biologically-inspired
features method to model each face for the purpose of age estimation, which leads to a
total of 6,100 features per image. We use a cascade of classification and regression: we
build six SVR models and one SVM model using the experimentally selected
parameters provided in Table 3.2.
Using SVR or SVM separately cannot adequately estimate age because of the
diversity of the aging process across different ages. Hence, we combine the SVR and
SVM models by selecting which model to use for each age group, based on the MSE
results over the training set. The age of a test image is predicted using a cascade of
SVM and SVR models by taking the average over the estimated ages, as shown in
Figure 3.13. Then, based on the decision nodes, the final age is estimated.
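The per-age-group selection between SVM and SVR can be sketched as follows; the age groups and MSE numbers here are purely illustrative, not results from the thesis:

```python
def pick_models(train_errors):
    """For each age group, keep whichever model (SVM or SVR) had the lower
    MSE on the training set; that model is then used for test images
    falling in the group."""
    return {group: min(errs, key=errs.get) for group, errs in train_errors.items()}

# Hypothetical training MSEs per age group (illustrative numbers only).
train_mse = {
    "0-9":   {"SVM": 2.1, "SVR": 2.9},
    "10-19": {"SVM": 3.4, "SVR": 2.7},
    "20-29": {"SVM": 4.0, "SVR": 3.1},
}
print(pick_models(train_mse))  # {'0-9': 'SVM', '10-19': 'SVR', '20-29': 'SVR'}
```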
Figure 3.13: Face detection and categorization over the progression process for test
images using a cascade of SVR and SVM models
The mean absolute error is defined as MAE = (1/N) · Σₖ |l̂ₖ − lₖ|, where lₖ is the
ground-truth age of test image k, l̂ₖ is the estimated age and N is the total number of
test images. The cumulative score is defined as CS(j) = (Ne≤j / N) × 100%, where Ne≤j
is the number of test images on which the Face detection makes an absolute error no
higher than j years.
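Both measures are straightforward to compute; a minimal sketch with made-up ages:

```python
def mae(truth, est):
    """Mean absolute error over N test images."""
    return sum(abs(t - e) for t, e in zip(truth, est)) / len(truth)

def cumulative_score(truth, est, j):
    """CS(j): percentage of test images with absolute error <= j years."""
    hits = sum(1 for t, e in zip(truth, est) if abs(t - e) <= j)
    return 100.0 * hits / len(truth)

truth = [20, 35, 50, 8]   # ground-truth ages (illustrative)
est = [22, 33, 58, 8]     # estimated ages (illustrative)
print(mae(truth, est))                  # 3.0
print(cumulative_score(truth, est, 5))  # 75.0
```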
Collecting face images is an important task for building models for accurate age
estimation. However, it is extremely hard in practice to collect large aging databases,
especially when one wants to collect a chronological image series from a single
individual. In this thesis, we have used two standard aging databases, FG-NET and
MORPH; we summarize them below along with other existing benchmark aging
databases.
The publicly available MORPH face database was collected by the face aging group
at the University of North Carolina at Wilmington for the purpose of face biometrics
applications. This longitudinal database records individuals' metadata, such as age,
gender, ethnicity, height, weight and ancestry, and is organized into two albums.
Album 1 contains 1,724 face images of 515 subjects taken between 1962 and 1998.
The average age is 27.3 years and the maximum is 68 years. There are 294 images
of females and 1,430 images of males. The time lapse between an individual's first
and last images ranges from 46 days to 29 years. Figure 3.16 shows example images
from MORPH Album 1. Album 2 contains about 55,000 face images, of which about
77% are Black faces and 19% are White, while the remaining 4% includes Hispanic,
Asian, Indian and Other. Figure 3.17 shows example images from MORPH Album 2.
The Yamaha Gender and Age (YGA) database is private and not publicly available,
so we did not use it in our evaluations. The YGA database contains 8,000 high-resolution
outdoor color images of 1,600 Asian subjects, 800 females and 800 males, with ages
ranging from 0 (newborn) to 93 years. Each subject has about 5 near-frontal images at
the same age and a ground-truth label of his or her approximate age as an integer. The
photos contain large variations in illumination, facial expression, and makeup. The
faces are cropped automatically by a face detector and resized to 60×60 gray-level
patches.
The age range of the MORPH data is assumed to be the same as that of the FG-NET
Aging Database (0-69), although the actual range is much smaller (15-68); the same
strategy was used in previous work. The results of using separate models for the
different face representation features (R – Real, I – Imaginary, M – Magnitude) are
shown in Table 3.1 on the FG-NET aging database in terms of MAE. It is clear that
using SVR or SVM separately cannot adequately estimate age because of the
diversity of the aging process across different ages. Hence, we combine the SVR and
SVM models by selecting which model to use for each age group, based on the MSE
results over the training set.
Table 3.1: MAEs of using separate SVR and SVM for different Gabor features on
FG-NET
Table 3.2: MAE (years) measures for 75 and 68 shape feature.
We first measure the effect of utilizing detailed shape features by using 75 points as
opposed to 68. Experimental results are provided in Table 3.2, which shows the
superiority of the more detailed shape representation on FG-NET.
The second set of experiments aimed at measuring MAE and CS scores of the
proposed method against the state-of-the-art methods. The MAE results on the FG-
NET and MORPH databases are shown in Table 3.3 for the different face representation
features. On FG-NET, the R – Real part has an MAE of 3.16, which is less than the
3.31 of the M – Magnitude and the 3.50 of the I – Imaginary parts. For the MORPH
database, the authors of the compared work used 433 images representing only
subjects of Caucasian descent; we used the same images for consistency. The R – Real
part has an MAE of 4.11 on MORPH, which is significantly less than the M – Magnitude
and I – Imaginary parts (4.34 and 4.47, respectively).
Table 3.3: MAE (years) measures for R, I and M parts on FG-NET and MORPH
The MAEs at each age group (about a 10-year span each) are given in Table 3.4 for
FG-NET. Our extension of the bio-inspired features and the new set of Gabor features
(I – Imaginary and M – Magnitude) have MAEs that are lower than the state-of-the-art
methods. These average errors are substantially smaller than those of the RUN method
(5.78 years) and even significantly lower than the very recent BIF approach (4.77
years), which was announced as the best reported result so far. See Table 3.5 for an
aggregate comparison of MAE values for both the FG-NET and MORPH databases.
Table 3.4: MAE (years) at different age groups on FG-NET.
CS curves are similarly shown in Figure 3.18 for the R – Real part, which we call the
extended bio-inspired features (EBIF) method, and for the BIF method.
3.1.11 Summary
In this study, we presented a human age estimator based on the bio-inspired
features method. We explored a new set of Gabor features for the first time (I –
Imaginary and M – Magnitude). We combined BIF with the Active Shape Model
(ASM) for initialization. Furthermore, we experimented with the extraction of
finer facial features and showed experimentally the superiority of the proposed
contributions. Evaluated on the FG-NET and MORPH benchmark databases, our
algorithm achieved high accuracy in estimating human ages compared to published
methods. We also tested the proposed algorithm on the MORPH database to show its
generalization capabilities. We improve the output results by combining the core
system module with a second module called the enhancement module, in which we
investigate different facial parts, such as the eye wrinkles and internal face, and create
a more generalized database by combining the FG-NET and MORPH databases to
enhance the results in the older age groups by supplying the missing pictures.
Most Face detection frameworks use the whole face to estimate the age of the input
face image in order to achieve better accuracy. In the previous chapter, we used the
whole face, represented by 75 shape feature points, to estimate the age using our
enhanced version of the bio-inspired features (EBIF), which achieved better accuracy
than the state-of-the-art. Researchers have focused on finding a global aging function
using larger datasets or different methods, but few have explored where the most
important aging features lie in the face. One exception is Zhi-Hua Zhou, who
empirically studied the significance of different facial parts for automatic age
estimation. The algorithm is based on his previous work on statistical face models.
His investigation assessed the following face regions: the whole face (including the
hairline), the whole internal face, the upper part of the face and the lower part of the
face, as shown in Figure 3.17.
Figure 3.16: Different facial regions used for Face detection
Experimental results revealed that the area around the eyes, Figure 3.17(c), proved to
be the most significant for age estimation. The model of the upper facial part
minimized estimation error and standard deviation, resulting in a mean error of 3.83
years and a standard deviation of 3.70 years. Zhi-Hua Zhou claims that the introduction
of the hairline (when using the whole face) has a negative effect on the results: the
increased variability of the hair region distorts the Face detection task.
Zhi-Hua Zhou's analysis was limited to subjects ranging from 0 to 35 years old, and
contained 330 images, of which only 80 were used for testing purposes. Evidently,
faces with more wrinkles were not used, leaving in doubt the method's ability to
estimate the age of subjects older than 35 years. This motivated us to analyze these
facial parts using faces with more wrinkles and to estimate the age of subjects older
than 35 years to see the impact of the analysis on the older ages. In this chapter, we use
the core Face detection components explained in Chapter 3 to analyze different facial
parts: a) the eye wrinkles, b) the whole internal face (without the forehead area) and
c) the whole face (with the forehead area), using different shape feature points. The
analysis shows that the eye wrinkles, which cover 30% of the facial area, contain the
most important aging features compared to the internal face and the whole face. We
use the I – Imaginary and M – Magnitude parts of the Gabor function introduced in
Chapter 3 to extensively analyze the eye wrinkles. As also shown in the experiments
section of Chapter 3, the FG-NET MAE results in the older age groups are high
compared to the younger age groups, due to the very limited number of pictures of
older age groups in the training phase. We enhance the results of those older age
groups in FG-NET by increasing the number of missing pictures in the older age
groups using the MORPH database. In the next section, we explain the proposed Face
detection framework for analyzing different human facial parts.
As shown in the diagram of Figure 3.18, the proposed Face detection algorithm
consists of two main stages, namely: 1) the pre-processing stage and 2) the Face
detection process.
In the pre-processing stage, the facial landmarks for the whole face, internal face and
eye wrinkles are detected using the Active Shape Model (ASM) block, as was done
previously in Section 3.2.1. The images are cropped to the area covering a fixed
number of points generated from the ASM stage. Then, in the Face detection process,
the cropped images undergo filtering by a family of Gabor functions at different
orientations and scales using the S1 block. The filtered outputs undergo a feature
dimensionality reduction step that keeps the maximum (MAX) and standard
deviation (STD) of the Gabor filtered outputs using the C1 block. Finally, both
classification-based and regression-based models were used in the training phase
(SVM and SVR in this case) to produce the final age model estimator.
We use the same methodology as in Section 3.2.1 to extract the shape feature points,
but here we analyze the eye wrinkles using 20 points, the internal face area using 58
points and the whole face using 75 points. The purpose of this analysis is to determine
the locations of the most important aging features using the eye wrinkles or the
internal face only, rather than the whole face area. Finally, we build three separate
statistical shape models, one for each facial region.
Each face region has its own annotation points that were used during the training
stage. The eye wrinkles have 20 points, taken from the BioID database, used to build a
shape model that is applied in the fitting stage to extract the area covering these
points. The whole face and internal face have 75 and 58 points respectively, used to
build shape models for them; these shape models are used in the fitting stage to
extract the areas covering their points. Thus, the points used to describe each facial
part are uncorrelated with each other. The difference between using 75, 20 and 58
landmark points is illustrated in the figure below. The facial regions covered by each
shape model reproduce the facial parts used by Zhi-Hua Zhou except for the mouth
area, for which no shape model was built in our experiments due to the lack of
annotation points for this area.
3.2.3 Face Representation
We use the same face representation explained in the earlier chapter. Figure 3.20
shows the output of the S1 layer for the eye wrinkles, internal face and whole face.
Figure 3.20: Gabor filtered results using the I, M and R parts at different bands with
filter size 3×3 at orientation 0°
CHAPTER 4
4.1.1 Pose correction: the input face is warped to an approximately frontal pose using
the alignment pipeline of ; denote the aligned photo by I.
4.1.2 Texture age progression: relight the source and target age cluster averages to
match the lighting of I, yielding AIs and AIt. Compute the flow Fsource–input between
AIs and I and warp AIs to the input image coordinate frame, and do the same for
Ftarget–input. This yields a pair of illumination-matched projections, Js and Jt, both
warped to the input. The texture difference Jt − Js is added to the input image I.
4.1.3 Flow age progression: apply the flow from the source cluster to the target
cluster, Ftarget–source, mapped to the input image, i.e., apply Finput–target ∘
Ftarget–source to the texture-modified image I + Jt − Js. For efficiency, we precompute
bidirectional flows from each age cluster to every other age cluster.
Aspect ratio progression: apply the change in aspect ratio to account for variation in
head shape over time. Per-cluster aspect ratios were computed as the ratio of the
distance between the left and right eyes to the distance between the eyes and the
mouth, averaged over the fiducial point locations of the images in each cluster. We
also allow for differences in skin tone (albedo) by computing a separate rank-4
subspace and projection for each colour channel.
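The per-face aspect ratio computation might be sketched as follows; the fiducial point layout and the use of the eye midpoint are assumptions for illustration:

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def aspect_ratio(left_eye, right_eye, mouth):
    """Eye-to-eye distance over the distance from the eye midpoint to the
    mouth; fiducial points are (x, y) tuples (assumed layout)."""
    eye_mid = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
    return dist(left_eye, right_eye) / dist(eye_mid, mouth)

# Eyes 6 apart, mouth 8 below the eye midpoint -> ratio 6/8.
print(aspect_ratio((0, 0), (6, 0), (3, 8)))  # 0.75
```

The per-cluster value would then be this ratio averaged over all faces in the cluster.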
The main focus of this study is to move research on human Face detection and age
progression toward real applications and practical usage, rather than being bounded
to the existing databases with their limitations to a single human ethnic group or to
well-annotated faces. All methods and algorithms should take into consideration a
more generalized database that contains various races with different image qualities
and conditions.
In this work, we address the following issues:
2. Face misalignment, which can be rectified by using the Active Shape Model (ASM)
to locate the correct facial landmarks in the face images.
3. The problem of multiple faces in the same image with possibly incorrect image
labels. This motivated us to design the universal labeller algorithm for efficient
and effective image labelling.
The proposed work is capable of detecting a small image that is part of a larger image,
but it can hardly detect images of large size. This is due to the template matching
algorithm, which is based on the matrix representation of the captured image; including
a large image and its matrix information in template matching makes it quite hard to
recognize within the ancestor (parent) image.
Since this research includes the eyes, ears, nose and other specific parts of the face
image, it uses essentially the same concept as the template matching algorithm
already described in an earlier chapter.
This thesis investigates face recognition and its modules, such as face part detection.
Furthermore, for age progression it considers three time phases: the age of a human in
childhood, in middle age and in old age. By observing these three different phases of
a human, this research identifies the factors on which we can differentiate the images:
Texture of the skin
Aging signs of the skin
Slight shifts or deformations of face parts
All of the above parameters are specific criteria by which age progression can be
determined, in conditions that may not match reality. There are many factors that can
affect a person and their face as they age: environmental conditions, food
consumption, exercise, daily routine and many other factors affect the looks as well
as the overall personality of a human being. So it is hard to find or predict the actual
face after some years.
CHAPTER 5
The success of any Face detection framework depends on the availability of data,
so data collection is an extremely laborious and important task. An ideal data set
should cover a wide range of ages, include many different subjects and contain at
least one image for each age of each subject.
People rely on multiple cues to estimate other people's ages, such as the face, voice,
gait and hair. Combining the face with one or more other cues for Face detection
might remarkably improve the current performance.
Display the detected points.
Initialize the tracker with the initial point locations and the initial video frame.
Get the next frame and track the points; note that some points may be lost.
Estimate the geometric transformation between the old points and the new points,
and eliminate outliers.
Insert a bounding box around the object being tracked and display the tracked points.
Display the annotated video frame using the video player object.
Clean up.
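The transformation-estimation step above can be illustrated with a least-squares 2D similarity fit between the old and new point sets; this is a minimal pure-Python sketch without the outlier-elimination step (a real tracker would add, e.g., RANSAC):

```python
def estimate_similarity(old_pts, new_pts):
    """Least-squares 2D similarity transform (scale/rotation a, b and
    translation tx, ty) mapping old_pts onto new_pts:
    u = a*x - b*y + tx,  v = b*x + a*y + ty."""
    n = len(old_pts)
    ox = sum(p[0] for p in old_pts) / n; oy = sum(p[1] for p in old_pts) / n
    nx = sum(p[0] for p in new_pts) / n; ny = sum(p[1] for p in new_pts) / n
    num_a = num_b = den = 0.0
    for (x, y), (u, v) in zip(old_pts, new_pts):
        x, y, u, v = x - ox, y - oy, u - nx, v - ny  # centre both point sets
        num_a += x * u + y * v
        num_b += x * v - y * u
        den += x * x + y * y
    a, b = num_a / den, num_b / den
    tx = nx - (a * ox - b * oy)
    ty = ny - (b * ox + a * oy)
    return a, b, tx, ty

# Points translated by (2, 3): expect a=1, b=0, tx=2, ty=3.
a, b, tx, ty = estimate_similarity([(0, 0), (1, 0), (0, 1)],
                                   [(2, 3), (3, 3), (2, 4)])
```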
This metric is also known as the Taxicab or Manhattan distance metric. It sums the
absolute values of the differences between pixels in the original image and the
corresponding pixels in the template image; it is the l1 norm of the difference
image. The lowest SAD score estimates the best position of the template within the
search image. The general SAD distance metric is:

SAD(r, c) = Σᵢ Σⱼ |I(r+i, c+j) − T(i, j)|
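A direct implementation of SAD template matching over toy gray-level matrices might look like this:

```python
def sad(image, template, top, left):
    """Sum of absolute differences between the template and the image patch
    whose top-left corner is at (top, left): the l1 norm of the difference."""
    return sum(abs(image[top + i][left + j] - template[i][j])
               for i in range(len(template)) for j in range(len(template[0])))

def best_match_sad(image, template):
    """Slide the template over the image; the lowest SAD score gives the
    best template position."""
    h, w = len(template), len(template[0])
    positions = [(i, j) for i in range(len(image) - h + 1)
                        for j in range(len(image[0]) - w + 1)]
    return min(positions, key=lambda p: sad(image, template, *p))

img = [[9, 9, 9, 9],
       [9, 1, 2, 9],
       [9, 3, 4, 9],
       [9, 9, 9, 9]]
print(best_match_sad(img, [[1, 2], [3, 4]]))  # (1, 1)
```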
5.3.2 Sum of Squared Differences (SSD)
This metric is also known as the Euclidean distance metric. It sums the squares of the
differences between pixels in the original image and the corresponding pixels in the
template image; it is the square of the l2 norm of the difference image. The general
SSD distance metric is:

SSD(r, c) = Σᵢ Σⱼ (I(r+i, c+j) − T(i, j))²
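The SSD variant differs from SAD only in squaring the differences; a minimal sketch:

```python
def ssd(image, template, top, left):
    """Sum of squared differences between the template and the image patch
    whose top-left corner is at (top, left): the squared l2 norm."""
    return sum((image[top + i][left + j] - template[i][j]) ** 2
               for i in range(len(template)) for j in range(len(template[0])))

patch_score = ssd([[1, 2], [3, 5]], [[1, 2], [3, 4]], 0, 0)
print(patch_score)  # 1
```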
% See also: GUIDE, GUIDATA, GUIHANDLES
if nargout
[varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
gui_mainfcn(gui_State, varargin{:});
end
% End initialization code - DO NOT EDIT
% --- Outputs from this function are returned to the command line.
function varargout = New_Fig_OutputFcn(hObject, eventdata, handles)
% varargout cell array for returning output args (see VARARGOUT);
% hObject handle to figure
% eventdata reserved - to be defined in a future version of MATLAB
% handles structure with handles and user data (see GUIDATA)
% Get default command line output from handles structure
varargout{1} = handles.output;
clear all;
close all;
clc;
if ~exist('gabor.mat','file')
fprintf ('Creating Gabor Filters ...');
create_gabor;
end
if exist('net.mat','file')
load net;
else
createffnn
end
if exist('imgdb.mat','file')
load imgdb;
else
IMGDB = loadimages;
end
while (1==1)
choice=menu('Face Detection',...
'Create Database',...
'Initialize Network',...
'Train Network',...
'Test on Photos',...
'Exit');
if (choice ==1)
IMGDB = loadimages;
end
if (choice == 2)
createffnn
end
if (choice == 3)
net = trainnet(net,IMGDB);
end
if (choice == 4)
pause(0.001);
[file_name file_path] = uigetfile ('*.jpg');
if file_path ~= 0
im = imread ([file_path,file_name]);
try
im = rgb2gray(im);
end
tic
im_out = imscan (net,im);
toc
figure;imshow(im_out,'notruesize');
end
end
if (choice == 5)
clear all;
clc;
close all;
return;
end
end
function Psi = gabor (w,nu,mu,Kmax,f,sig)
m = w(1);
n = w(2);
K = Kmax/f^nu * exp(i*mu*pi/8);
Kreal = real(K);
Kimag = imag(K);
NK = Kreal^2+Kimag^2;
Psi = zeros(m,n);
for x = 1:m
for y = 1:n
Z = [x-m/2;y-n/2];
Psi(x,y) = (sig^(-2))*exp((-.5)*NK*(Z(1)^2+Z(2)^2)/(sig^2))*...
(exp(i*[Kreal Kimag]*Z)-exp(-(sig^2)/2));
end
end
function im_out = imscan (net,im)
close all
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
% PARAMETERS
SCAN_FOLDER = 'imscan/';
UT_FOLDER = 'imscan/under-thresh/';
TEMPLATE1 = 'template1.png';
TEMPLATE2 = 'template2.png';
Threshold = 0.5;
DEBUG = 0;
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
warning off;
delete ([UT_FOLDER,'*.*']);
delete ([SCAN_FOLDER,'*.*']);
if (DEBUG == 1)
mkdir (UT_FOLDER);
mkdir (SCAN_FOLDER);
end
[m n]=size(im);
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
% First Section
C1 = mminmax(double(im));
C2 = mminmax(double(imread (TEMPLATE1)));
C3 = mminmax(double(imread (TEMPLATE2)));
Corr_1 = double(conv2 (C1,C2,'same'));
Corr_2 = double(conv2 (C1,C3,'same'));
Cell.state = int8(imregionalmax(Corr_1) | imregionalmax(Corr_2));
Cell.state(1:13,:)=-1;
Cell.state(end-13:end,:)=-1;
Cell.state(:,1:9)=-1;
Cell.state(:,end-9:end)=-1;
Cell.net = ones(m,n)*-1;
[LUTm, LUTn] = find(Cell.state == 1);
imshow(im);
hold on
plot(LUTn,LUTm,'.y');pause(0.01);
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
% Second Section
while (1==1)
[i j] = find(Cell.state==1,1);
if isempty(i)
break;
end
imcut = im(i-13:i+13,j-9:j+8);
Cell.state(i,j) = -1;
Cell.net(i,j) = sim(net,im2vec(imcut));
if Cell.net(i,j) < -0.95
for u_=i-3:i+3
for v_=j-3:j+3
try
Cell.state(u_,v_)=-1;
end
end
end
plot(j,i,'.k');pause(0.01);
continue;
elseif Cell.net(i,j) < -1*Threshold
plot(j,i,'.m');pause(0.01);
continue;
elseif Cell.net(i,j) > 0.95
plot(j,i,'.b');pause(0.01);
for u_=i-13:i+13
for v_=j-9:j+9
try
Cell.state(u_,v_)=-1;
end
end
end
elseif Cell.net(i,j) > Threshold
plot(j,i,'.g');pause(0.01);
elseif Cell.net(i,j) < Threshold
plot(j,i,'.r');pause(0.01);
end
for i_=-1:1
for j_=-1:1
m_=i+i_;
n_=j+j_;
if (Cell.state(m_,n_) == -1 || Cell.net(m_,n_)~=-1)
continue;
end
imcut = im(m_-13:m_+13,n_-9:n_+8);
Cell.net(m_,n_) = sim(net,im2vec(imcut));
if Cell.net(m_,n_) > 0.95
plot(n_,m_,'.b');pause(0.01);
for u_=m_-13:m_+13
for v_=n_-9:n_+9
try
Cell.state(u_,v_)=-1;
end
end
end
continue;
end
if Cell.net(m_,n_) > Threshold
Cell.state(m_,n_) = 1;
plot(n_,m_,'.g');pause(0.01);
if (DEBUG == 1)
imwrite(imcut,[SCAN_FOLDER,'@',int2str(m_),',',int2str(n_),' (',int2str(fix(Cell.net(m_,n_)*100)),'%).png']);
end
else
Cell.state(m_,n_) = -1;
plot(n_,m_,'.r');pause(0.01);
if (DEBUG == 1)
imwrite(imcut,[UT_FOLDER,'@',int2str(m_),',',int2str(n_),' (',int2str(fix(Cell.net(m_,n_)*100)),'%).png']);
end
end
end
end
end
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
% Third Section
hold off
figure;imshow (Cell.net,[]);
xy_ = Cell.net > Threshold;
xy_ = imregionalmax(xy_);
xy_ = imdilate (xy_,strel('disk',2,4));
[LabelMatrix,nLabel] = bwlabeln(xy_,4);
CentroidMatrix = regionprops(LabelMatrix,'centroid');
xy_ = zeros(m,n);
for i = 1:nLabel
xy_(fix(CentroidMatrix(i).Centroid(2)),...
fix(CentroidMatrix(i).Centroid(1))) = 1;
end
xy_ = drawrec(xy_,[27 18]);
im_out (:,:,1) = im;
im_out (:,:,2) = im;
im_out (:,:,3) = im;
for i = 1:m
for j=1:n
if xy_(i,j)==1
im_out (i,j,1)=0;
im_out (i,j,2)=255;
im_out (i,j,3)=0;
end
end
end
%~~~~~~~~~~~~~~~~~~~~~
function IMGDB = loadimages
% (face_folder, non_face_folder, file_ext, out_max and out_min are assumed
% to be set elsewhere as configuration; their definitions are not shown here.)
if exist('imgdb.mat','file')
load imgdb;
else
IMGDB = cell (3,0);
end
fprintf ('Loading Faces ');
folder_content = dir ([face_folder,'*',file_ext]);
nface = size (folder_content,1);
for k=1:nface
string = [face_folder,folder_content(k,1).name];
image = imread(string);
[m n] = size(image);
if (m~=27 || n~=18)
continue;
end
f=0;
for i=1:length(IMGDB)
if strcmp(IMGDB{1,i},string)
f=1;
end
end
if f==1
continue;
end
fprintf ('.');
IM {1} = im2vec (image); % ORIGINAL FACE IMAGE
IM {2} = im2vec (fliplr(image)); % MIRROR OF THE FACE
IM {3} = im2vec (circshift(image,1));
IM {4} = im2vec (circshift(image,-1));
IM {5} = im2vec (circshift(image,[0 1]));
IM {6} = im2vec (circshift(image,[0 -1]));
IM {7} = im2vec (circshift(fliplr(image),1));
IM {8} = im2vec (circshift(fliplr(image),-1));
IM {9} = im2vec (circshift(fliplr(image),[0 1]));
IM {10} = im2vec (circshift(fliplr(image),[0 -1]));
for i=1:10
IMGDB {1,end+1}= string;
IMGDB {2,end} = out_max;
IMGDB (3,end) = {IM{i}};
end
end
fprintf ('\nLoading non-faces ');
folder_content = dir ([non_face_folder,'*',file_ext]);
nnface = size (folder_content,1);
for k=1:nnface
string = [non_face_folder,folder_content(k,1).name];
image = imread(string);
[m n] = size(image);
if (m~=27 || n~=18)
continue;
end
f=0;
for i=1:length(IMGDB)
if strcmp(IMGDB{1,i},string)
f=1;
end
end
if f==1
continue;
end
fprintf ('.');
IM {1} = im2vec (image);
IM {2} = im2vec (fliplr(image));
IM {3} = im2vec (flipud(image));
IM {4} = im2vec (flipud(fliplr(image)));
for i=1:4
IMGDB {1,end+1}= string;
IMGDB {2,end} = out_min;
IMGDB (3,end) = {IM{i}};
end
end
fprintf('\n');
save imgdb IMGDB;
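The ten face variants built above (the original, its mirror, and one-pixel circular shifts of each) have direct NumPy counterparts in fliplr and roll. A minimal sketch of the same augmentation on a synthetic 27x18 crop, for illustration only:

```python
import numpy as np

crop = np.arange(27 * 18).reshape(27, 18)   # stand-in for a 27x18 face crop

variants = []
for base in (crop, np.fliplr(crop)):        # original and mirrored face
    variants.append(base)
    for shift in (1, -1):
        variants.append(np.roll(base, shift, axis=0))  # circshift(image, +/-1)
        variants.append(np.roll(base, shift, axis=1))  # circshift(image, [0 +/-1])

print(len(variants))  # 10 training vectors per face image
```

Each face image therefore contributes ten slightly perturbed training vectors, which makes the network more tolerant of small misalignments.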
fprintf('\n This program detects a target in an image')
fprintf('\n Entering the image for MATLAB...')
fprintf('\n Save the image or its copy in MATLAB working Directory')
imagname = input('\n Enter the name of the image file (filename.ext) : ','s');
w = imread(imagname);
w = im2double(w);
sizw = size(w);
figure
imshow(w)
title('Input Image')
pause(3.5);
close;
fprintf('\n Entering the target image for MATLAB...')
fprintf('\n Save the target image or its copy in MATLAB working Directory')
tarname = input('\n Enter the name of the target image file (filename.ext) : ','s');
t = imread(tarname);
t = im2double(t);
sizt = size(t);
figure
imshow(t)
title('Target Image')
pause(3.5);
close;
ww = rgb2gray(w);
tt = rgb2gray(t);
tedge = edge(tt);
wedge = edge(ww);
out = filter2(tedge,wedge);
o = max(max(out));
output = (1/o)*out;
sav = input('\n Do you like to SAVE Result Image? (y/n) : ','s');
if (sav == 'y')
fprintf('\n You choose to SAVE the Result Image')
naming = input('\n Type the name of the new image file (filename.ext) : ','s');
fprintf('\n Saving ...')
imwrite(output,naming);
fprintf('\n The new file is called %s and it is saved in MATLAB working Directory',naming)
else
fprintf('\n You choose NOT to SAVE the Result Image')
end
clear
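The matching stage above (edge maps correlated with filter2, then peak-normalized) can be sketched in NumPy. The helper below imitates filter2's 'same'-size 2-D correlation on a toy edge map; it is an illustration of the technique, not the thesis code.

```python
import numpy as np

def xcorr2_same(a, b):
    """2-D cross-correlation with 'same'-size output, imitating MATLAB's
    filter2(b, a). Pure NumPy, written for clarity rather than speed."""
    bm, bn = b.shape
    pa = np.pad(a, ((bm // 2, bm - bm // 2 - 1), (bn // 2, bn - bn // 2 - 1)))
    out = np.zeros(a.shape, dtype=float)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out[i, j] = np.sum(pa[i:i + bm, j:j + bn] * b)
    return out

scene = np.zeros((12, 12)); scene[4:7, 5:8] = 1.0   # toy scene edge map
template = np.ones((3, 3))                          # toy target edge map
score = xcorr2_same(scene, template)
score = score / score.max()                         # peak-normalize, as in the script
peak = tuple(int(v) for v in np.unravel_index(score.argmax(), score.shape))
print(peak)  # (5, 6): the centre of the bright block
```

The normalized score map peaks at the centre of the target region, which is exactly how the script above localizes the target inside the input image.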
Fig 5.2: After clicking the face recognition push button, a GUI appears showing the part of the training set designed for face and non-face images.
Fig 5.3: Training samples of face and non-face images used to distinguish the two classes
Fig 5.4: This layout appears after the face part detection push button has been pressed.
This step requires one picture containing a full face image; after it is fetched, it is named with its extension. Then the target area to be extracted from that image is selected.
Fig 5.6: Target image, which is a part of the image above
Fig 5.8: Output after executing age progression on the image of PM Modi
Fig 5.9: Scroll buttons for the shape blend and color blend, which control the main signs of aging in old age.
The success of any face detection framework depends on the availability of data, so data collection is an extremely laborious and important task. An ideal data set should cover a wide age range, include many different subjects, and contain at least one image for each age of each subject.
For better face detection results, facial attribute decomposition plays an important role, because a face image exhibits multiple facial attributes: identity, expression, gender, age, race, pose, etc. Decomposing these attributes is essential for extracting age-related features.
People rely on multiple cues to estimate other people's age, such as face, voice, gait and hair. Combining the face with one or more other cues might remarkably improve the current face detection performance.
Active Shape Models and advanced image processing algorithms can be trained on images under pose and illumination variations, because pose and illumination variations are always troublesome in real applications. Pose-invariant and illumination-invariant techniques (intensively investigated in face recognition) might also be introduced into face detection methods.
Considering the numerous influential factors, face detection can hardly be made certain. At this point, face detection can be treated as a fuzzy classification problem, e.g. "I am 85% sure that you are 18 years old", rather than using SVM or SVR.
Application-oriented face detection techniques may treat the task as a binary classification problem, since most age-specific access control systems only need to determine whether the user is older than a certain age.
Human aging patterns can be affected by many internal and external factors such as genetics, race, gender, health, lifestyle, and even weather conditions. Incorporating these influential factors could improve the performance of age estimation.
Age progression using bio-inspired features has not yet been investigated. Further research could establish an automatic aging scheme that requires only one input image of a subject and produces an age-progressed or age-regressed image of the person at a target age.
Paying more attention to face detection algorithms, such as combining classification and regression methods, might improve the results further, because the choice of face detection algorithm is always critical and can increase or decrease performance notably.
(II) AGE PROGRESSION
TEST 2 - FACE PART DETECTION

Image    Part detected (eye)    Part not detected (eye)
1        Y
2        Y

Image    Part detected (nose)   Part not detected (nose)
1        Y
2        Y
AGE PROGRESSION - Test 3
Sliding the shape blend control changes the shape of the face, indicating that age progression is taking effect on the face.
AGE PROGRESSION - Test 4
In this phase, age is progressed from childhood to old age, confirming that age progression works across the full range.
CHAPTER 6
6.1 Conclusion
We have developed a fully automatic face detection and progression framework in this thesis. A three-module architecture is proposed:
1) Core system module;
2) Enhancement module;
3) Application module.
We constructed a new database using the internet as a rich repository for image collection. A large number of images were crawled by an image collector using human age-related queries.
In the core system module, we built the main components of our human face detection system. We introduced a novel face representation schema with two main steps, the first being face cropping using the Active Shape Model (ASM) to crop the face image to the area that covers the face boundary.
In the enhancement module, we make several improvements to the output of the core system module, such as further analysis of different facial parts to validate their importance, and increasing the number of pictures in the FG-NET aging database using the MORPH dataset.
After building the core system and enhancement modules, we demonstrate the
application module that has several components:
An image collector that crawls images from the internet using human age-related text queries under several conditions, such as image quality, different poses, expressions, and multiple faces versus a single face in the image. This leads to a large database for the purpose of having more training images.
The crawled images suffer from different problems, such as face misalignment and multiple face instances in the same image with possibly incorrect labels for the faces. This motivated us to propose different solutions to overcome the above-mentioned problems.
Face recognition systems used today work very well under constrained conditions,
although all systems work much better with frontal mug-shot images and constant
lighting. All current face recognition algorithms fail under the vastly varying
conditions under which humans need to and are able to identify other people. Next
generation person recognition systems will need to recognize people in real-time and
in much less constrained situations.
We believe that identification systems that are robust in natural environments, in the
presence of noise and illumination changes, cannot rely on a single modality, so that
fusion with other modalities is essential. Technology used in smart environments has
to be unobtrusive and allow users to act freely. Wearable systems in particular require
their sensing technology to be small, low-powered and easily integrated with the user's
clothing. Considering all the requirements, identification systems that use face
recognition and speaker identification seem to us to have the most potential for wide-
spread application.
Cameras and microphones today are very small, light-weight and have been
successfully integrated with wearable systems. Audio and video based recognition
systems have the critical advantage that they use the modalities humans use for
recognition. Finally, researchers are beginning to demonstrate that unobtrusive audio-
and-video based person identification systems can achieve high recognition rates
without requiring the user to be in highly controlled environments.
Further, after recognizing the face and its several parts using the template matching algorithm, this research enhances the module for age progression of the face. From any stage of a human face we can easily predict its future and past appearance. In the future this technique could be used for the automatic updating of passport photos in airport databases, and of employee databases of government agencies and government-sector examinations.
REFERENCES
[1] Dinesh Chandra Jain (Research Scholar in Computer Science) and V. P. Pawar (Associate Professor, School of Computer Science), SRTM University, Nanded (M.S.), "A Novel Approach for Recognition of Human Face Automatically Using Neural Network Method," Volume 2, Issue 1, January 2012, ISSN: 2277 128X.
[5] Michel F. Valstar, Timur Almaev, Jeffrey M. Girard, Gary McKeown, Marc Mehu,
Lijun Yin, Maja Pantic and Jeffrey F. Cohn: “FERA 2015 - Second Facial Expression
Recognition and Analysis Challenge” 978-1-4799-6026-2/15/$31.00 ©2015 IEEE.
[6] Marian Stewart Bartlett, Member, IEEE, Javier R. Movellan, Member, IEEE, and
Terrence J. Sejnowski, Fellow, IEEE: “Face Recognition by Independent Component
Analysis” IEEE Transactions On Neural Networks, Vol. 13, No. 6, November 2002.
[7]Volker Blanz and Thomas Vetter, Member, IEEE “Face Recognition Based on
Fitting a 3D Morphable Model” IEEE Transaction On Pattern Analysis And Machine
Intelligence, Vol. 25, No. 9, September 2003.
[8] Juwei Lu, Student Member, IEEE, Konstantinos N. Plataniotis, Member, IEEE,
and Anastasios N. Venetsanopoulos, Fellow, IEEE, "Face Recognition Using Kernel Direct Discriminant Analysis Algorithms," IEEE Transactions on Neural Networks, Vol. 14, No. 1, January 2003.
[9] Wu Xiao-Jun, Josef Kittler, Yang Jing-Yu, Kieron Messer, and Wang Shi-Tong, Department of Computer Science, Jiangsu University of Science and Technology, Zhenjiang, China, "A New Kernel Direct Discriminant Analysis (KDDA) Algorithm for Face Recognition."
[10] Juwei Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, Bell Canada Multimedia Laboratory, The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, M5S 3G4, Ontario, Canada, "Face Recognition Using Kernel Direct Discriminant Analysis Algorithms," submitted to IEEE Transactions on Neural Networks December 12, 2001; revised and re-submitted July 16, 2002; accepted for publication August 1, 2002.
[11] Felix Juefei-Xu, Khoa Luu, Marios Savvides, Tien D. Bui, and Ching Y. Suen, "Investigating Age Invariant Face Recognition Based on Periocular Biometrics," ©2011 IEEE.
[13] Thomas Serre, Lior Wolf, and Tomaso Poggio, "Object Recognition with Features Inspired by Visual Cortex," Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA 02142, {serre,liorwolf}@mit.edu, tp@ai.mit.edu.
[15] Young H. Kwon and Niels da Vitoria Lobo, "Age Classification from Facial Images," School of Computer Science, University of Central Florida, Orlando, Florida 32816. Received May 26, 1994; accepted September 6, 1996.
[16] Asuman Günay and Vasif V. Nabiyev, "Age Estimation Based on AAM and 2D-DCT Features of Facial Images," (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 6, No. 2, 2015.
[17] Xin Geng, Kate Smith-Miles, and Zhi-Hua Zhou, "Facial Age Estimation by Learning from Label Distributions."
[18] Pramod Kumar Pisharady and Martin Saerbeck “Pose Invariant Face Recognition
Using Neuro-Biologically Inspired Features” International Journal of Future
Computer and Communication, Vol. 1, No. 3, October 2012.
[19] Xin Geng and Kate Smith-Miles, "Automatic Age Estimation Based on Facial Aging Patterns," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 12, December 2007.
[22] A. Stone, “The Aging Process of the Face & Techniques of Rejuvenation”,
http://www.aaronstonemd.com/Facial Aging Rejuvenation.shtm
[25] A. Lanitis, C. Taylor, and T. Cootes, "Toward Automatic Simulation of Aging Effects on Face Images," IEEE Transactions on PAMI, vol. 24, no. 4, pp. 442-455, 2002.