Chapter - 1: Introduction: Dept. of Electronics and Communication
Chapter – 1: Introduction
Artificial intelligence (AI) is being incorporated into all walks of life and is automating procedures in the new era. Computer vision, an important branch of AI, provides the computer with a human-like view of the world. AI is applied in fields such as face detection, face generation, face identification, agriculture, supermarkets and many more. Tasks that were previously considered difficult or even impossible are now performed very easily with the help of AI, yielding precise and highly accurate automated systems.
Deep learning is a subfield of machine learning in which the model mimics the human brain and its capabilities. A deep learning model is a type of artificial neural network (ANN) that imitates the human brain in performing tasks such as face detection, face generation, speech recognition, language translation and decision making. Deep learning training can take place with human interference, with minimal human interference or with no human interference; these modes are known as supervised, semi-supervised and unsupervised learning respectively. With the help of deep learning models, computers can be trained to perform accurate image detection, classification and identification. In deep learning, instead of giving the models predefined equations to organize the data, the models evolve from basic parameters: the computer learns those parameters on its own, extracting features and forming many layers for processing the data.
Face detection and recognition are performed in order to identify an individual in a digital image. Various features such as skin tone, age, gender and hairstyle are considered when a face has to be recognized. Face detection and recognition can be done using a CNN, with the data pre-processing performed by the Local Gradient Number Pattern, which is coded with the help of LDP-based methods.
Face detection is simply finding a face in an image: a technology based on artificial intelligence for identifying faces in an image. Face detection and recognition have grown from a basic computer vision technique into highly advanced artificial neural networks and play an important role in face tracking and face analysis.
The main function of face detection is to determine whether a face is present in a given image. Face detection, also called facial detection, is usually performed on a grayscale image. In face detection, the return value is the location of the face and the area occupied by the face in the digital image. After the detection process, information such as age, expression, gender and pose is extracted from the faces for the recognition process. Identifying a specific person from the face is known as face recognition. In the beginning, face detection and recognition worked only for upright, frontal faces, but now many new techniques can detect and recognize a face from any angle.
In the face recognition process, feature description and extraction play important roles. The two common methods for feature extraction are geometric-based and appearance-based methods. The former uses geometric information to specify facial features by encoding and combining the shapes and locations of facial components into a feature vector. The latter uses descriptors to analyze facial information. However, geometric-feature-based methods generally require accurate and reliable facial feature detection and tracking, which is difficult to achieve in several situations. Appearance-based methods apply image filters either to the whole face, to produce holistic features, or to specific face regions, to produce local features, in order to capture changes in the appearance of the face image. The performance of appearance-based methods is much better in relatively controlled environments, but it degrades under environmental changes.
In the past decades, even though great progress has been made in the development of face analysis and recognition, challenges such as illumination, background and varying expressions remain difficult problems in the face recognition field. The key to face recognition is developing an effective and robust descriptor that is capable of extracting the face features while removing the influence of noise, light and expression.
Many approaches for face analysis and recognition have been developed such as, collaborative
preserving fisher discriminant analysis (CPFDA), discriminative multi-scale sparse coding
(DMSC), two-dimensional quaternion principal component analysis (2D-QPCA), and real-time
face recognition using deep learning and visual tracking (RFR-DLVT). These methods use the
global facial information for recognition and they are sensitive to the changes of illumination,
partial occlusions, expression and poses. Therefore, local descriptors have been studied deeply in the past decades. As a typical representative, the local binary pattern (LBP) was presented, which is simple in principle and low in computational complexity; in addition, it combines structural as well as statistical features. Because of these advantages of LBP, numerous improved operators have been introduced in recent years, such as extended LBP, local ternary pattern (LTP) and modified gradient pattern (MGP). It is generally known that edge and direction information is important for image analysis. These features were also incorporated into extended methods of LBP, such as local maximum edge binary patterns (LMEBP), local edge binary pattern (LEBP) and local tetra patterns (LTrP).
Further, the local directional pattern (LDP) was proposed to reduce the influence of the random
noise and monotonic illumination changes. However, LDP still produces unreliable codes for the
uniform and smooth neighborhoods. Based on LDP, enhanced local directional pattern (ELDP),
local direction number (LDN), local direction texture pattern (LDTP), and local edge direction
pattern (LEDP) were put forward to overcome the shortcomings.
Besides the methods based on local features, deep learning approaches based on the Convolutional Neural Network (CNN) have emerged in recent years. These methods can learn high-quality information from face images by training models on large amounts of data. Although CNN-based methods have attracted considerable attention, research on local features is still continuing.
There are many advantages of facial recognition in many aspects of life and some of
them are as follows:
i. Processing is quicker: The face recognition process takes little time, usually a second or two, which is very useful. For cyber security, companies need a tool that is quick and safe; a recognition system solves this problem, and another added advantage is that it is very difficult to fool the system.
ii. Enhanced Security: Facial recognition is used in surveillance systems, with whose help it is very easy to find thieves and other trespassers. On a higher level, it is used to track terrorists and other criminals by scanning. The main benefit is that there is nothing to steal, as in the case of passwords or PIN numbers. It can also be used to unlock phones and other devices.
iii. Banking Purpose: Facial recognition can be used in the banking system because, despite much care being taken, banking frauds still happen. With face recognition, only the face is scanned instead of sending a one-time password. Sending one-time passwords can be stopped, as many frauds happen due to their duplication or theft; with face recognition, duplicating and stealing are not possible.
iv. Seamless Integration: This is one of the biggest advantages in industry, as facial recognition is very easily integrated with any system. It does not require any additional expense and integrates easily with existing security software, thereby improving safety.
The main limitation is the analysis of the face, because facial appearance varies due to changes in pose, expression, illumination and other factors such as age and make-up. Face descriptors are sensitive to local intensity variations, which occur commonly along edge components such as the eyes, eyebrows, nose, mouth, whiskers, beard or chin due to internal factors (eyeglasses, contact lenses or makeup) and external factors (different backgrounds). This sensitivity of the existing face representation methods generates different patterns of local intensity variations and makes learning a face detector difficult. Selecting the right features for classification is also difficult.
Another drawback is the difficulty of extracting all the features in the face, which requires a huge number of images. The sub-optimal performance of most face recognition systems (FRSs) in unconstrained environments is another drawback. The lack of highly effective image pre-processing approaches, which are typically required before the feature extraction and classification stages, might be one of the reasons for poor performance. Further, the wide applicability of most FRSs in real-life scenarios is limited, since only minimal face recognition issues are considered in most FRSs.
One of the main drawbacks is that very little corresponding training data is available, and such data is essential for training a deep learning model; without it, the model may not function as expected. The datasets for training a deep learning model are mainly obtained in two ways: from already available datasets or from self-created datasets. These datasets have to be maximized in terms of both quality and quantity, and a lack of such data may cause the deep learning model to function abnormally. Creating new datasets for training deep learning models requires human involvement and a very large amount of time, and self-created datasets are small, so data augmentation is performed. Another drawback is the domain gap, where a computer-generated image may not be similar to real-world images. Due to these drawbacks, some method has to be incorporated into the face recognition system that does not require creating more datasets manually.
1.4. Motivation
In CNN-based face recognition methods, the image itself is the input to the CNN, and the CNN learns from the pixel level. In the learning process of a CNN, feature extraction starts from the lowest-level features, i.e., raw data, which makes learning feature extraction more difficult. To overcome this, a new method has to be proposed such that the input images given to the CNN make it learn the characteristics of the image better than pixel-level learning.
1.5. Objectives
Face recognition has become a blooming topic, and all industries are trying to implement it in their systems for security reasons. Due to this, there is a high demand for the recognition process, and research has been carried out at the industry level as well as the educational level. As a result, deep learning, a sub-branch of machine learning, is gaining much popularity in the field of facial recognition.
LGNP is a face descriptor used to generate LGNP images for face feature extraction in face analysis. These LGNP images are then used for the training and testing process, using a CNN classifier for face recognition and classification.
A Convolutional Neural Network (CNN) is a deep learning algorithm that consists of neurons with learnable weights and biases, like any other neural network. A CNN can be implemented in the Python programming language, which is free and open source. Python is used here because it is a high-level language with many inbuilt modules for deep learning, such as TensorFlow and OpenCV (cv2). CNNs are designed using the Keras module, which provides an interface to TensorFlow. In this project, a Convolutional Neural Network model is proposed for the classification and recognition of faces. A CNN involves less preprocessing in extracting the features and provides better accuracy. It employs Convolution+ReLU and max pooling for feature extraction. The extracted features are then fed to a feed-forward neural network for training and validation. At the output layer, the softmax activation function is used, which provides the labeling in terms of probability.
The main objectives of this project are: i] to generate Local Gradient Number Pattern (LGNP) images from already existing datasets using the LGNP code based on LDP-based methods, and ii] to classify and recognize faces using a Convolutional Neural Network (CNN).
LGNP is used as a facial feature descriptor that is coded in gradient space, i.e., the gray value distributions are considered in the proposed coding scheme. It is used to extract the gradient features in an image, and it can be calculated with the help of the LGNP code. The gradient features describe the spatial gray changes in the local neighborhood: a smaller gradient value shows that the point has a uniform local distribution, while a bigger gradient value shows that the point is in a region where the gray level changes rapidly. In the proposed local gradient number pattern (LGNP) descriptor, gradients are computed in a local neighborhood and describe the distributions of the gray values. Inherently, this sharpens the local neighborhood, emphasizes the transitional parts in the gray scale and reveals the structural relationship between the pixels. The gradient is used to improve the edges in face images, to remove shadows in the pattern and to enhance small mutations in flat areas.
Let the image function be f(x, y), and let gx and gy be the partial derivatives at each pixel in the neighborhood. The gradient magnitude is given as follows:

G(x, y) = √(gx² + gy²)        (2)

To reduce complexity and simplify the calculation, the square root operation is replaced by the sum of absolute values:

G(x, y) = |gx| + |gy|        (3)

In the proposed method, the Sobel operator is used to compute the gradient values in the horizontal and vertical directions in the local neighborhood. The 3 × 3 Sobel kernel traverses the entire image, and the edges are covered since zero padding is done.
Let N represent a local 3 × 3 neighborhood centered on (x, y). The gradients in the horizontal and vertical directions, Gx and Gy respectively, are calculated as follows:

     [ −1  0  1 ]              [  1  2  1 ]
Gx = [ −2  0  2 ] * N     Gy = [  0  0  0 ] * N        (4)
     [ −1  0  1 ]              [ −1 −2 −1 ]

where the two matrices are the 3 × 3 Sobel kernels.
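As a sketch, the zero-padded Sobel gradients and the |Gx| + |Gy| combination can be computed in plain NumPy (using correlation without kernel flipping, as is conventional in image processing; the report's implementation uses cv2 instead):

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)

def correlate3x3(img, kernel):
    # Zero-pad by one pixel so border pixels keep a full 3x3 neighborhood
    padded = np.pad(img.astype(float), 1)
    out = np.zeros(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

def gradient_image(gray):
    gx = correlate3x3(gray, SOBEL_X)  # horizontal gradient Gx
    gy = correlate3x3(gray, SOBEL_Y)  # vertical gradient Gy
    return np.abs(gx) + np.abs(gy)    # the |gx| + |gy| simplification of Eq. (2)
```

Note that on a constant region both kernels sum to zero, so the gradient image is zero there, matching the observation that small gradients indicate uniform local distributions.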
Further, the gradient values in the eight directions are sorted in ascending order, i.e., {gi} (i = 1, 2, …, 8), and the directions are numbered from 0 to 7.

LGNP(x, y) = 8 × d_ga + d_gb        (6)

where '+' implies concatenation, d_ga and d_gb are the direction numbers corresponding to ga and gb, and a and b are the sorted indexes (a, b ∈ [1, 8]). The final LGNP value, LGNP(x, y), is converted into the corresponding decimal number.
For coding in LGNP, gradient values in two directions are coded. The gray transitions reflected in the gradient values provide prominent detail features, such as the lip border, eye border, wrinkles and nose bridge. The directions that reflect gray transitions are selected in the LGNP coding scheme, since the detailed features play a major role in face recognition. Obviously, there are many ways of selecting the two gradient values. The gradient values {gi} (i = 1, 2, …, 8) are divided into two parts: the first part contains the bigger gradient values {g5, g6, g7, g8}, and the second part the smaller gradient values {g1, g2, g3, g4}. In general, smaller gradient values mean more prominent details of the edge. Based on this, the combination of the smallest value g5 in the first part and the smallest value g1 in the second part is selected, i.e., (a, b) = (5, 1), to form the final LGNP code.
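Under the convention above (directions numbered 0 to 7, gradients sorted in ascending order), the code for one pixel can be sketched as follows. This is a minimal NumPy sketch; the stable sort and the mapping of neighbours to direction numbers are assumptions, since the report does not fix them:

```python
import numpy as np

def lgnp_code(neighbourhood_gradients):
    # neighbourhood_gradients: the 8 gradient values around a pixel,
    # indexed by direction number 0..7
    order = np.argsort(neighbourhood_gradients, kind="stable")  # ascending g1..g8
    d_gb = int(order[0])    # direction of g1, smallest of the smaller part
    d_ga = int(order[4])    # direction of g5, smallest of the bigger part
    return 8 * d_ga + d_gb  # Eq. (6): concatenate two 3-bit direction codes

lgnp_code(np.array([3, 1, 4, 1, 5, 9, 2, 6]))  # 8*2 + 1 = 17
```

Because each direction number fits in 3 bits, the multiplication by 8 is exactly the concatenation described above, and every LGNP code lies in [0, 63].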
There has been tremendous development in the field of Artificial Intelligence (AI) aimed at reducing the gap between the abilities of man and machine. It finds a wide range of applications in the fields of electronics and mechanics, making amazing things happen. One of the domains of AI is computer vision.
Computer vision is capable of making machines view things as a human does, interpret the situation in the same manner, and use this knowledge to take decisions and initiate dependent tasks. Its applications include image recognition, image classification, media recreation, language processing, speech recognition, etc. The Convolutional Neural Network (CNN) is a deep learning algorithm developed as an advancement in computer vision.
The architecture of a ConvNet is inspired by the human visual cortex and resembles the connection pattern of neurons in the human brain. Individual neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. A collection of such receptive fields combines to cover the entire visual area.
By applying appropriate filters, a CNN is capable of successfully capturing the spatial and temporal dependencies in an image. The CNN algorithm achieves a better fit to the image dataset due to the reduction in the number of parameters involved and the reusability of weights; in other words, the network can be trained to understand the sophistication of the image better. A ConvNet reduces the input images to a form that is easier to process, by employing filters, without losing the features that are critical for making the decision. A Convolutional Neural Network is composed of two parts, a feature extractor and a classifier, as shown in figure 3.1. Feature extraction is achieved using a number of convolution and pooling layers; a pair of convolution and pooling layers forms a single layer. The number of such layers to be used depends on the types of features to be extracted.
Convolution layers contain a set of filters whose weights have to be learned. The dimensions of a filter are much smaller than those of the input image. These filters are called kernels. Each kernel is convolved with the input volume to compute an activation map made of neurons: the filter is slid across the width and height of the input, and the dot products between the input and the filter are computed at every spatial position. The output volume of the convolutional layer is obtained by stacking the activation maps of all the filters along the depth dimension. Since the width and height of each filter are designed to be smaller than the input, each neuron in the activation map is connected only to a small local region of the input volume.
Figure 3.2 depicts the convolution process. Assume an input image of dimension 5x5x1, as shown in figure 3.2.1, denoted by I and shown in green. The filter used for convolving the input image in the convolutional layer is called the kernel (or filter) and is denoted by K, depicted in yellow. An example of a 3x3x1 kernel matrix is

K = 0 0 0
    0 1 0
    0 0 0

With the stride value kept at 1, the kernel shifts 9 times, performing a dot product between the kernel and the portion of the input image over which it is hovering.
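The sliding dot product above can be sketched as a valid convolution with stride 1; the 5x5 input values used here are illustrative:

```python
import numpy as np

def conv2d_valid(img, kernel, stride=1):
    kh, kw = kernel.shape
    out_h = (img.shape[0] - kh) // stride + 1
    out_w = (img.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            # Dot product of the kernel with the patch it currently covers
            out[i, j] = np.sum(img[r:r + kh, c:c + kw] * kernel)
    return out

I = np.arange(25).reshape(5, 5)                  # a 5x5x1 input (example values)
K = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]])  # the kernel K from the text
conv2d_valid(I, K).shape                         # (3, 3): the kernel visits 9 positions
```

With this particular K, each output value is simply the centre pixel of the covered patch, which makes it easy to verify the 9 kernel positions by hand.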
If the image has 3 channels, such as RGB, the convolution kernel has the same depth as the input image. Matrix multiplication is performed between each (Kn, In) pair in the stack ([K1, I1]; [K2, I2]; [K3, I3]), and then all the results are summed with the bias to give a squashed one-depth-channel convolved feature output, as shown in figure 3.3.
The convolution layer determines features such as edges, colors and shapes. The number of layers used in a CNN depends on the level of features to be extracted. The purpose of the first convolution layer is to capture low-level features such as edges, color and gradient orientation. With more layers, the CNN extracts high-level features, making the network capable of understanding the images in the dataset as humans would.
Like the convolutional layer, the pooling layer is responsible for reducing the spatial size of the convolved feature. This decreases the computational power required to process the data through dimensionality reduction. Furthermore, it is useful for extracting dominant features that are rotationally and positionally invariant, thus maintaining the effective training of the model.
Pooling can be performed in two ways: maximum pooling and average pooling. Maximum pooling returns the maximum of all the values from the portion of the input image covered by the kernel; figure 3.4 shows maximum pooling. Average pooling, on the other hand, returns the average of all the values from the portion of the image covered by the kernel.
Figure 3.5 shows the difference between maximum pooling and average pooling. Max pooling also acts as a noise suppressant: it discards the noisy activations altogether, performing de-noising along with dimensionality reduction. Average pooling, on the other hand, simply performs dimensionality reduction as its noise-suppressing mechanism. Thus max pooling performs a lot better than average pooling.
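The two pooling operations can be sketched as follows; a 2x2 window with stride 2 is assumed, and the 4x4 input values are illustrative:

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            # Max pooling keeps the strongest activation; average pooling smooths
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

x = np.array([[1, 3, 2, 9],
              [5, 6, 1, 7],
              [4, 2, 8, 6],
              [3, 1, 5, 4]], dtype=float)
pool2d(x, mode="max")  # [[6., 9.], [4., 8.]]
pool2d(x, mode="avg")  # [[3.75, 4.75], [2.5, 5.75]]
```

Either way the 4x4 map is reduced to 2x2, which is the dimensionality reduction described above.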
The convolutional layer and the pooling layer together form the i-th layer of a Convolutional Neural Network. The number of such layers depends on the complexity of the images. The number of layers can be increased to capture low-level features in even more detail, but at the cost of more computational power.
The extracted features are converted into a one-dimensional vector by the flatten layer and fed to the neural network, which works as the classifier.
Figure 3.6: Fully Connected layer with 2 dense layers and 3 classes.
The flattened output is fed to a feed-forward neural network. Over a series of epochs, the model becomes able to distinguish between dominating and certain low-level features in images and to classify the images using the softmax/sigmoid activation function.
There are different types of CNNs which powers AI: LeNet, AlexNet, VGGNet and so on.
Weights and bias: Weight is the parameter within a neural network that transforms input data
within the network’s hidden layers. When the inputs are transmitted between neurons, the
weights are applied to the inputs and passed into an activation function along with the bias.
Weights and bias are both learnable parameters.
Stride: The stride size defines the number of steps the kernel moves while traversing the input image. It is one by default, i.e., if stride = 1, the kernel moves one step at a time over the image.
Padding: In order to maintain the same dimensionality between the input image and output
image, the image matrix is padded with zeros along the width and height of the image.
Sigmoid: The sigmoid function is similar to softmax; the only difference is that sigmoid is used only for binary classification.
Learning rate: The rate at which the weights are updated during training is referred to as the
step size or learning rate.
Adam Solver: Adam is an optimization algorithm used as a substitute for the classical stochastic gradient descent method for updating the weights of the network during the training process. The configuration parameters considered for the optimization are alpha, beta1, beta2 and epsilon. Alpha represents the learning rate, beta1 is the exponential decay rate for the first moment estimate, beta2 is the exponential decay rate for the second moment estimate, and epsilon is a very small number used to prevent division by zero. The learning rate decay of the optimizer is given by the ratio of alpha to the square root of the epoch t.
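One Adam update can be sketched as follows; the standard update rule with bias correction is assumed, since the report only names the parameters:

```python
import numpy as np

def adam_step(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment: exponentially decayed average of gradients
    m = beta1 * m + (1 - beta1) * grad
    # Second moment: exponentially decayed average of squared gradients
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)  # bias correction for the first moment
    v_hat = v / (1 - beta2 ** t)  # bias correction for the second moment
    # epsilon prevents division by zero in the weight update
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

On the very first step (t = 1) the bias-corrected moments equal the raw gradient statistics, so the weight moves by roughly alpha in the direction opposite the gradient.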
Rectified Linear Unit (ReLU): ReLU is an activation function used to overcome the vanishing gradient problem. The ReLU layer acts differently for positive and negative inputs: a positive input passes through the layer without any change, while a negative input gives zero as the output. ReLU has become the default activation function for almost all neural networks, as a model that uses it is easier to train and often achieves better performance. The ReLU activation function is shown in figure 3.7, where the X axis represents the input to the function and the Y axis its output. ReLU can be expressed as

f(x) = max(0, x)
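A one-line sketch of this behaviour:

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x): positive inputs pass through, negatives become zero
    return np.maximum(0, x)

relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0]))  # [0., 0., 0., 1.5, 3.]
```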
Flatten: Flatten is the function that converts the pooled feature map into a single column, which is passed to the fully connected layer. Dense adds fully connected layers to the neural network.
Softmax: The softmax function turns a vector of K real values into a vector of K real values that sum to 1. The input values can be positive, negative, zero or greater than one, but softmax transforms them into values between 0 and 1 so that they can be interpreted as probabilities. Softmax is used for multi-class classification.
Chapter 4: Implementation
4.1. Problem Statement
The main objective of this project is to develop a descriptor for face recognition:
i] Generate Local Gradient Number Pattern (LGNP) images from already existing datasets using the LGNP code based on LDP-based methods.
ii] Classify and recognize faces using a Convolutional Neural Network (CNN).
Figure 4.1: Block diagram Face Recognition based on LGNP and CNN
The block diagram shown in figure 4.1 consists of two parts: the Local Gradient Number Pattern (LGNP), which is used to extract local gradient information describing the distribution of gray values in the local neighborhood of face images, and the Convolutional Neural Network (CNN), which is used for the classification and recognition of faces.
LGNP is a face feature descriptor used to extract the gradient features in an image; it can be calculated with the help of the LGNP code. The input image is first converted into a grayscale image, and then kernel convolution is applied to this grayscale image using the Sobel operator kernels. A 3 x 3 kernel is used as an overlapping window so that all the pixel values in the image are covered as the kernel traverses the entire image. Zero-padded convolution is performed in order to retain the boundary elements. The gradient image in the horizontal direction (gradient X image) is obtained by convolving the X-direction Sobel kernel with the grayscale image, and the gradient image in the vertical direction (gradient Y image) by convolving the Y-direction Sobel kernel with the grayscale image. The final gradient image is obtained by combining the gradient X and gradient Y images, adding the corresponding absolute values of these images.
The LGNP descriptor is then used to obtain the LGNP image from the gradient image. LGNP values are calculated for every pixel in the gradient image. Let M represent a local 3 x 3 neighborhood centered on (p, q) in the gradient image; the gradient values of the eight directions are sorted in ascending order and the directions are numbered from 0 to 7. The respective direction numbers are assigned to the chosen two gradient values, and these direction numbers are used to generate the LGNP value. LGNP values are generated for every 3 x 3 overlapping window by traversing the entire gradient image, and thus the LGNP image is formed.
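The traversal just described can be sketched as follows; as before, the assignment of direction numbers to the eight neighbours is an assumption not fixed by the report:

```python
import numpy as np

# Clockwise neighbour offsets, taken here as directions 0..7 (an assumption)
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lgnp_image(grad):
    # Zero-pad so border pixels also have a full 3x3 neighbourhood
    padded = np.pad(grad.astype(float), 1)
    out = np.zeros(grad.shape, dtype=np.uint8)
    for p in range(grad.shape[0]):
        for q in range(grad.shape[1]):
            neigh = np.array([padded[p + 1 + dp, q + 1 + dq] for dp, dq in OFFSETS])
            order = np.argsort(neigh, kind="stable")  # ascending g1..g8
            out[p, q] = 8 * order[4] + order[0]       # code with (a, b) = (5, 1)
    return out
```

Every code is at most 8 × 7 + 7 = 63, so the LGNP image fits comfortably in an 8-bit image and can be fed to the CNN like any grayscale input.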
The generated LGNP images are then used for face recognition and classification, with a Convolutional Neural Network as the classifier. For training and testing purposes, the classifier needs training, validation and testing images; therefore, the LGNP images are divided into training, validation and testing sets. A labelled dataset is prepared from the LGNP images to train and to validate the Convolutional Neural Network for the classification of faces.
Face recognition and classification are implemented using a Convolutional Neural Network. The CNN is trained using the LGNP face images as its input, the classifier is generated, and the classification result is tested with the test dataset, i.e., test images in LGNP form are passed into the CNN classifier to predict the faces. The layers of the proposed CNN model are shown in figure 4.2.
The CNN consists of a number of sequential layers: i) the input layer, ii) the hidden layers and iii) the output layer. The hidden layers consist of one or more convolutional layers and pooling layers, depending on the level of features to be extracted; a convolution layer and a pooling layer together form a single layer. In the proposed model there are 3 hidden layers: two convolution + pooling layers and one dense layer. The model works in two phases: i) training and validation, and ii) testing.
The Convolutional Neural Network has to be trained with the training dataset from the database; here, the dataset is the generated LGNP images of the ORL dataset. In Keras, the CNN is created as a stack of layers added one by one. First, we add a convolution layer that performs the dot product of the input image with the kernels to learn the filters and optimize the filter weights. Then we add the ReLU (rectified linear unit) activation function, which helps the CNN learn non-linear decision boundaries; this is required because convolution is a linear operation on the spatial properties, which makes it difficult to distinguish colors, edges and borders. This is followed by a pooling layer that reduces the dimension of the features, depending on the stride size used, without eliminating the spatial features. Max pooling is found to be highly efficient and hence is employed. The proposed model has two layers of convolution + pooling. A dropout strategy is adopted to improve the classification effect.
A flatten layer is then used to convert the multidimensional features into one-dimensional features to ease the training of the fully connected network (FCN). The fully connected part contains two dense layers. In an FCN, each neuron in a layer is connected to all the neurons in the previous layer, and these connections have weights. The features extracted by the previous layers are applied to the FCN.
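The stack described above can be sketched in Keras as follows. The filter counts, kernel sizes, dense width and dropout rate are illustrative assumptions (the report specifies the layer types but not these values), as are the 112 x 92 ORL input size and the 40 output classes:

```python
from tensorflow.keras import layers, models

def build_model(input_shape=(112, 92, 1), num_classes=40):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),    # convolution + ReLU
        layers.MaxPooling2D((2, 2)),                     # first conv + pool layer
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),                     # second conv + pool layer
        layers.Dropout(0.25),                            # dropout to curb overfitting
        layers.Flatten(),                                # to a one-dimensional vector
        layers.Dense(128, activation="relu"),            # first dense layer
        layers.Dense(num_classes, activation="softmax")  # class probabilities
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

`model.summary()` then prints the per-layer output dimensions and parameter counts of the kind shown in figure 4.3.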
The dimensions at the output of each layer and the parameters (weights and biases) of the CNN compiled in Python are shown in figure 4.3.
Testing
Once the CNN is trained and validated, we can test the working of the classifier with appropriate test images. In this project, the softmax activation function is used at the output layer, which provides labels in terms of probability. There is one output for each class; the CNN outputs a high probability if the input image belongs to the respective class, and the output probabilities of all classes sum to one.
The accuracy of the classification depends predominantly on the training of the neural network. Training a neural network model simply means learning (determining) good values for all the weights and biases from labeled examples (different classes). The number of times the CNN goes through the entire training and validation dataset is called the number of epochs; the sequence in which the CNN goes through the training dataset differs in each epoch. In each epoch, the weights of the fully connected network (FCN) of the CNN are optimized to yield accurate classification; that is, the larger the number of epochs, the higher the accuracy. In practice this number will necessarily be large for implementing a highly efficient neural network.
The accuracy metric is used to measure the algorithm's performance in an interpretable way.
How accurately the model classifies the training dataset and the validation dataset is called the
training accuracy and the validation accuracy respectively. For training, a hold-out validation
technique is used: the data is split into two parts, the training set and the validation set. The
training dataset is used to train the model, while the validation dataset is used only to evaluate
the model's performance.
Metrics on the training set show how the model is progressing in terms of its training, but it is
the metrics on the validation set that give a measure of the quality of the model: how well it is
able to make new predictions on data it has not seen before.
A loss function quantifies how good or bad a given predictor is at classifying the input data
points in a dataset: the smaller the loss, the better a job the classifier does at modeling the
relationship between the input data and the output targets.
The error made on the training dataset during training is called the training loss, whereas the
validation loss is the error obtained by running the validation set through the already trained
network. These losses are not percentages; each is the sum of the errors that the neural network
makes over the samples in the training and validation datasets respectively.
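The categorical cross-entropy loss used later in the project can be illustrated with a small NumPy sketch; the label and prediction values here are made up:

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Per-sample loss for one-hot labels y_true and predicted probabilities y_pred."""
    y_pred = np.clip(y_pred, eps, 1.0)         # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=1)

# Two illustrative samples from a 3-class problem (values are made up).
y_true = np.array([[0, 1, 0],
                   [1, 0, 0]])
y_pred = np.array([[0.1, 0.8, 0.1],    # confident and correct -> small loss
                   [0.3, 0.4, 0.3]])   # unsure -> larger loss
per_sample = categorical_cross_entropy(y_true, y_pred)
print(per_sample)            # loss for each sample
print(per_sample.sum())      # summed loss over this small "dataset"
```

Summing the per-sample losses over a dataset gives the (unnormalized) training or validation loss described above.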
Chapter 5: Results
The Computer Vision module (cv2) is used to generate the LGNP images. The results of
the Local Gradient Number Pattern (LGNP) model and the Convolutional Neural Network
(CNN) model are shown below. The CNN makes use of Keras, an open-source neural-network
library, in the Python code to train the model. Keras runs on top of the numerical library
TensorFlow and is intended to make training neural networks faster and easier.
The ORL database is composed of 400 face images of 40 different volunteers; there are 10
different images of each of the 40 distinct subjects. For some of the subjects, the images were
taken at different times, with varying lighting, facial expressions (open/closed eyes,
smiling/non-smiling) and facial details (glasses/no-glasses). All the images are taken against a
dark homogeneous background and the subjects are in up-right, frontal position (with tolerance
for some side movement). The files are in PGM format and can be conveniently viewed using
the 'xv' program. The size of each image is 92x112, 8-bit grey levels. The images are organised
in 40 directories (one for each subject) named as: sX , where X indicates the subject number
(between 1 and 40). In each directory there are 10 different images of the selected subject named
as: Y.pgm where Y indicates which image for the specific subject (between 1 and 10).
The features of the dataset are as given in the table 5.1 below. Sample images of a
volunteer from the dataset are as shown in the figure 5.1.
The original image is first converted into a grayscale image; the obtained grayscale image
is shown in figure 5.2.
The grayscale image is padded with zeros to retain the boundary elements. A 3 x 3 window
then traverses the entire grayscale image. A sample 3 x 3 window and the corresponding
calculations are shown below.
        [50  48  44]
    N = [44  50  43]
        [43  53  41]
The gradient images in the horizontal and vertical directions are calculated using the Sobel
operator masks.
         [-1   0   1]         [-146   13    9]
    Gx = [-2   0   2] * N  =  [-200   10   11]
         [-1   0   1]         [-205   12    8]
The same method is applied in an overlapping manner so that all the pixels are converted, and
the corresponding gradient-X image is obtained as shown in figure 5.3.
         [ 1   2   1]         [ ...  ...  ...]
    Gy = [ 0   0   0] * N  =  [ -10   -2    2]
         [-1  -2  -1]         [   7    2   -2]
The same method is applied in an overlapping manner so that all the pixels are converted, and
the corresponding gradient-Y image is obtained as shown in figure 5.4.
G = |Gx| + |Gy|, taking absolute values element-wise:

        [146  13   9]   [...  ...  ...]   [...  ...  ...]
    G = [200  10  11] + [ 10    2    2] = [210   12   13]
        [205  12   8]   [  7    2    2]   [212   14   10]

The same method is applied in an overlapping manner so that all the pixels are converted, and
the corresponding final gradient image is obtained as shown in figure 5.5.
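The padded Sobel gradient computation can be sketched in NumPy as below. This sketch correlates the masks directly (no kernel flip) over a zero-padded image, so the border values it produces may differ from the worked example above depending on the convolution convention used there.

```python
import numpy as np

def sobel_total_gradient(gray):
    """|Gx| + |Gy| over a zero-padded image, using the 3x3 Sobel masks."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])   # horizontal mask
    ky = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]])   # vertical mask
    padded = np.pad(gray.astype(int), 1)   # zero padding retains border pixels
    h, w = gray.shape
    gx = np.zeros((h, w), dtype=int)
    gy = np.zeros((h, w), dtype=int)
    for i in range(h):
        for j in range(w):
            window = padded[i:i + 3, j:j + 3]   # overlapping 3x3 windows
            gx[i, j] = np.sum(kx * window)
            gy[i, j] = np.sum(ky * window)
    return np.abs(gx) + np.abs(gy)   # total gradient image G

N = np.array([[50, 48, 44],
              [44, 50, 43],
              [43, 53, 41]])
result = sobel_total_gradient(N)
print(result)
```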
The window N, its total-gradient window G, and the gradient values in the eight directions
(calculated by Eq. (4) and Eq. (5)) are:

    N = [50  48  44]    G = [...  ...  ...]    Eight directions: 0, 1, 26, 7, 3, ...
        [44  50  43]        [210   12   13]
        [43  53  41]        [212   14   10]
The gradient values in the eight directions are sorted in ascending order, i.e. {g_i}
(i = 1, 2, ..., 8), and numbered from 0 to 7. An example of computing LGNP is shown in
figure 5.6.

    Directions: 7  4  6  2  1  3  5  0
The final LGNP value obtained, i.e. LGNP(x, y), is converted into the corresponding decimal
number. The centre pixel value in the 3 x 3 total-gradient window is then replaced by the
calculated LGNP value:
    [...  ...  ...]      [...  ...  ...]
    [210   12   13]  ->  [210   15   13]
    [212   14   10]      [212   14   10]
The same method is applied in an overlapping manner over the complete gradient image so that
an LGNP value is calculated for every pixel, thus generating the LGNP images. The LGNP
image obtained is shown in figure 5.7.
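The sorting-and-numbering step can be sketched as follows. The bottom row of the sample window here is illustrative, and the final encoding of the ranks into a single LGNP code (Eq. (4) and Eq. (5)) is not reproduced in this sketch.

```python
import numpy as np

def direction_ranks(g_window):
    """Rank the 8 neighbouring gradient values of a 3x3 window.

    The neighbours are sorted in ascending order and numbered 0..7, as in the
    LGNP description above; turning these ranks into the final LGNP code
    follows Eq. (4) and Eq. (5) of the report and is not reproduced here.
    """
    g = np.asarray(g_window)
    neighbours = np.delete(g.flatten(), 4)         # drop the centre pixel
    order = np.argsort(neighbours, kind='stable')  # indices, smallest first
    ranks = np.empty(8, dtype=int)
    ranks[order] = np.arange(8)                    # rank 0 = smallest value
    return ranks

window = np.array([[210, 12, 13],
                   [212, 14, 10],
                   [205, 11, 16]])   # bottom row is an illustrative assumption
ranks = direction_ranks(window)
print(ranks)   # rank of each of the eight neighbours
```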
The features used here are edge features, line features and centre-surround features. The image
properties of the LGNP image are shown in table 2 below.
The face classifier model is trained with the LGNP images generated for the ORL database. The
set used here contains 9 volunteers, with 10 images per volunteer. From each volunteer, 6
images are selected for training, 2 for validation and 2 for testing. There are therefore
54 images for training, 18 for validation and 18 for testing.
From Keras, ModelCheckpoint is imported to save the model by monitoring a particular model
parameter, and EarlyStopping to stop the training early when the monitored parameter stops
improving. Objects of these two classes are constructed and passed to fit_generator as callback
functions. In this analysis, validation accuracy is monitored by passing val_acc to
ModelCheckpoint: the model is saved only when the validation accuracy in the present epoch is
greater than in the previous epoch. The same quantity, val_acc, is also passed to EarlyStopping.
The patience parameter is set to 3, which means the model will cease to train if the validation
accuracy does not increase for 3 epochs. The model.fit_generator function is used to train the
model since ImageDataGenerator is used for passing data to it. The training and validation data
are passed to fit_generator; steps_per_epoch sets the number of batches for the training data,
and validation_steps does the same for the validation data.
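A minimal sketch of this callback setup is shown below; the saved file name matches the training log further down, while the generator objects in the commented-out training call are placeholders for the actual ImageDataGenerator flows used in the project.

```python
# Sketch of the described callback setup. 't1.h5' matches the training log;
# train_gen and val_gen are placeholders for the project's data generators.
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

checkpoint = ModelCheckpoint('t1.h5', monitor='val_accuracy',
                             save_best_only=True, verbose=1)   # save only improving epochs
early_stop = EarlyStopping(monitor='val_accuracy', patience=3,
                           verbose=1)                          # stop after 3 stagnant epochs

# model.fit_generator(train_gen, steps_per_epoch=len(train_gen),
#                     validation_data=val_gen, validation_steps=len(val_gen),
#                     epochs=20, callbacks=[checkpoint, early_stop])
```

Note that recent Keras versions report the monitored quantity as val_accuracy (as in the log below) rather than val_acc.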
On executing the code the model starts to train, and the training and validation accuracy and
loss are obtained. The graph of accuracy, validation accuracy, loss and validation loss versus
the number of epochs is plotted as shown in figure 5.9.
Epoch 00001: val_accuracy improved from -inf to 0.27778, saving model to t1.h5
Epoch 2/20
Epoch 00002: val_accuracy improved from 0.27778 to 0.44444, saving model to t1.h5
Epoch 3/20
Epoch 00003: val_accuracy improved from 0.44444 to 0.66667, saving model to t1.h5
Epoch 4/20
Epoch 5/20
Figure 5.9: Graph of accuracy, validation accuracy, loss and validation loss vs number of
epochs
Testing part:
The best trained model is saved in a .h5 file after training and validation. In the testing part, the
unlabelled images stored in the testing folder are loaded into the CNN classifier, which holds
the best trained model, and these images are converted to NumPy arrays. Each test image
passed through the CNN classifier yields a vector of length 9, each element being a probability
value for one class; these probability values show how well the test image's features match the
features of the 9 classes in the trained model. The CNN outputs a high probability if the input
test image belongs to the respective class. The classifier also outputs the class label
corresponding to the highest probability value. The process is repeated for all the test images in
the test folder, and their corresponding probability values and labels are shown below in
figure 5.10.
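The labelling step, picking the class with the highest probability, can be sketched with a made-up probability vector (the values below are illustrative, not taken from figure 5.10):

```python
import numpy as np

# Hypothetical class labels and an illustrative 9-element probability vector
# of the kind a softmax output layer produces (values are made up).
labels = ['s1', 's2', 's3', 's4', 's5', 's6', 's7', 's8', 's9']
probs = np.array([0.01, 0.02, 0.01, 0.02, 0.85, 0.04, 0.02, 0.02, 0.01])

predicted = labels[int(np.argmax(probs))]   # class with the highest probability
print(predicted)
# In the project this vector comes from the saved model for each test image:
#   probs = model.predict(x)[0]   # model = load_model('t1.h5')
```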
s4
[0.00399778550490737, 0.007950114086270332, 0.006784763652831316,
0.0018150405958294868, 0.6553345918655396, 0.07012850791215897,
0.012728000059723854, 0.00039862864650785923,
0.24086247384548187]
s5
[0.00332050072029233, 0.008696336299180984, 0.011834756471216679,
0.004824170842766762, 0.6621164083480835, 0.056516166776418686,
0.03386259824037552, 0.0008811790030449629, 0.21794790029525757]
s5
[2.9639310014317743e-05, 0.00034859354491345584, 0.0008452803594991565,
0.000176896239281632, 0.00047038032789714634, 0.9925413131713867,
0.0038705384358763695, 0.00032607425237074494, 0.0013913377188146114]
s6
[0.0008361845975741744, 0.01328081451356411, 0.015331235714256763,
0.0006310901953838766, 0.013530968688428402, 0.8209946155548096,
0.0946493148803711, 0.003510827897116542, 0.03723505139350891]
s6
[8.051218901528046e-05, 0.0012897439301013947, 0.0008837956702336669,
0.0004185155557934195, 0.00013006202061660588, 0.0007197260856628418,
0.9896787405014038, 0.0057305688969790936, 0.0010682501597329974]
s7
[1.311535743298009e-05, 0.003005495760589838, 0.0011181056033819914,
0.0021446323953568935, 7.558979268651456e-05, 0.0007242822903208435,
0.9921547770500183, 0.00011474672646727413, 0.0006493180408142507]
s7
[0.0071134306490421295, 0.0025696419179439545, 0.009996820241212845,
0.00970382895320654, 5.1145139877917245e-05, 0.008949404582381248,
0.01436237245798111, 0.9435909986495972, 0.0036624432541429996]
s8
[0.0024990406818687916, 0.0009316701325587928, 0.018764814361929893,
0.013309920206665993, 9.139371104538441e-05, 0.010013804771006107,
0.034046001732349396, 0.9160909056663513,
0.004252416081726551]
s8
Figure 5.10: Output of CNN classifier (probability values and predicted labels of 18 test
images)
The accuracy obtained is the fraction of images predicted correctly. To test the classifier,
2 samples from each person are taken for testing. Since there are 9 volunteers, there are 18 test
images in the testing folder. The CNN classifier outputs the predicted label for every test
image. Here, the classifier has predicted 17 labels correctly out of 18 test images. Hence, the
accuracy of the classifier is calculated as 17/18 x 100% = 94.44%.
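The calculation is simply the fraction of correct predictions:

```python
# Accuracy as the fraction of correctly predicted test images, using the
# figures quoted above (17 correct out of 18).
correct, total = 17, 18
accuracy = correct / total * 100
print(round(accuracy, 2))   # -> 94.44
```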
The classifier also outputs the test images along with their corresponding predicted labels and
actual labels, which are shown in figure 5.11.
In this project the LGNP descriptor is proposed for image preprocessing and is used to prepare
the processed data of the dataset. The proposed scheme encodes the gradient values instead of
the edge responses to achieve robustness to grey-level and noise changes. The Sobel operators
are used instead of the Kirsch templates to reduce the computational complexity. LGNP images
are generated for the dataset to be classified. A CNN-based deep learning model is proposed to
perform face recognition and classification. Since the generated LGNP images are the input to
the CNN, what the CNN learns is the edge knowledge of the processed image. This processed
knowledge is easier for the CNN to learn and understand, and the face recognition is better;
therefore, the processed images are chosen when training the CNN. The CNNs are developed in
Python using the deep learning library Keras, which works on the numerical library
TensorFlow. Both CNNs are composed of three hidden layers: two convolution+pooling layers
and one dense layer. The output dense layer uses the softmax activation function with the
categorical_crossentropy loss function and provides the classification in terms of a
one-dimensional array with as many elements as there are classes. Since the number of images
in the face database is limited, random flips and random shifts are used for data augmentation.
The order of the training and test sets is shuffled to avoid over-fitting. To improve the
classification effect, the rectified linear unit (ReLU) and a dropout strategy are used, and batch
normalization (BN) is used to accelerate convergence. The model classifies the faces with a
maximum accuracy of 94.44% using the Convolutional Neural Network classifier.
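The random-flip and random-shift augmentation mentioned above can be sketched with Keras' ImageDataGenerator; the shift fractions here are illustrative assumptions, and a synthetic array stands in for an LGNP image.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Sketch of the random-flip / random-shift augmentation; the 10% shift
# fractions are illustrative assumptions, not the project's exact values.
datagen = ImageDataGenerator(horizontal_flip=True,   # random horizontal flips
                             width_shift_range=0.1,  # random shifts up to 10%
                             height_shift_range=0.1,
                             rescale=1.0 / 255)      # scale pixels into [0, 1]

# One synthetic 112x92 single-channel "image" standing in for an LGNP image.
x = np.random.randint(0, 256, (1, 112, 92, 1)).astype('float32')
batch = next(datagen.flow(x, batch_size=1))   # one randomly augmented image
print(batch.shape)
```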
LGNP may produce the same code for neighbourhoods with entirely different visual
perception, as LBP- and LDP-like approaches do. To overcome this, a simple convex-concave
partition (CCP) strategy called Fuzzy CCP can be used to enhance the LGNP descriptor.
Different sets of images can be used for the training, validation and testing parts to obtain
accurate transfer learning in the CNN model.
Further, FCCP_LGNP can be combined with deep learning technology to enhance its accuracy
and expand its scope of application.