Facial Expression Recognition

Using Deep Learning

Major Project
Submitted in partial fulfilment of the requirements

In

Computer Science and Engineering

To

Feroze Gandhi Institute of Engineering and Technology, Raebareli

By

Rishabh and Animesh Pratap Singh

Under the Guidance of

Prof. Dr. Manish Saxena

FEROZE GANDHI INSTITUTE OF ENGINEERING AND TECHNOLOGY,

RAEBARELI (U.P.) 2023-24


CERTIFICATE
This is to certify that students of 2nd year MCA (4th semester), Computer Science
and Engineering Department, Feroze Gandhi Institute of Engineering and Technology,
Raebareli, have completed their Major Project entitled FACIAL EMOTION
RECOGNITION USING DEEP LEARNING. They have submitted their project report
in partial fulfilment of the curriculum of the degree of Bachelor of Computer
Science and Engineering from Feroze Gandhi Institute of Engineering and
Technology, Raebareli.

Dr. Digvijay Singh Chauhan                         Dr. Manish Saxena
Director, FGIET                                    Project Guide
Acknowledgement

It is our proud privilege and duty to acknowledge the kind help and guidance
received from several people in the preparation of this report. It would not have
been possible to prepare this project in this form without their valuable help,
co-operation, and guidance.

First and foremost, we wish to record our sincere gratitude to the Management of
this college and to our Director, Dr. Digvijay Singh Chauhan, Feroze Gandhi
Institute of Engineering and Technology, Raebareli, for his constant support and
encouragement in the preparation of this project and for making available the
library and laboratory facilities needed to prepare it.

We express our sincere gratitude to our guide, Dr. Manish Saxena, for guiding us
in the investigations for this project and in carrying out the experimental work.
Our numerous discussions with him were extremely helpful. We hold him in esteem
for the guidance, encouragement, involvement, and inspiration we received from
him. His vision and execution created structure, definition, and realism around
the project and fostered the ideal environment for us to learn and work. This
project is a result of his teaching, encouragement, and inputs in the numerous
meetings he had with us despite his busy schedule. The experience was a novel one,
and we would like to thank all the people who lent their valuable time to the
completion of this report. Without their consideration, it would have been
difficult to complete it.
Declaration
We, Rishabh and Animesh Pratap Singh, do hereby declare that the project report
submitted to the FEROZE GANDHI INSTITUTE OF ENGINEERING AND
TECHNOLOGY, RAEBARELI in partial fulfilment of the requirements for the project
entitled FACIAL EMOTION RECOGNITION USING DEEP LEARNING is an original
piece of research work carried out by us under the guidance and supervision of
Dr. Manish Saxena. We further declare that the information has been collected
from genuine and authentic sources and that we have not submitted this project
report to this or any other university for the award of any diploma, degree, or
certificate.
Abstract
Face recognition is a biometric technology based on the identification of a
person's facial features. Face images are collected, and the recognition
equipment automatically processes them. This report surveys related research on
face recognition from different perspectives: it describes the development stages
and related technologies of face recognition, introduces research on face
recognition under real-world conditions, and presents the general evaluation
standards and common databases used in the field. We close with a forward-looking
view: face recognition has become a major direction of future development, with
many potential applications.

Keywords:
 facial recognition
 face detection
 feature extraction
 face verification
 fiducial point
 face alignment
 convolutional neural networks
 boosting
 deep neural networks
 video-based face recognition
 infrared face recognition
TABLE OF CONTENTS

CERTIFICATE
DECLARATION
ACKNOWLEDGEMENT
LIST OF FIGURES
ABSTRACT

CHAPTER 1. INTRODUCTION
1.1 FACE RECOGNITION
1.1.1 GEOMETRIC
1.1.2 PHOTOMETRIC
1.2 FACE DETECTION
1.2.1 PRE-PROCESSING
1.2.2 CLASSIFICATION
1.2.3 LOCALIZATION

CHAPTER 2. LITERATURE SURVEY
2.1 FEATURE BASED APPROACH
2.1.1 DEFORMABLE TEMPLATES
2.2 LOW LEVEL ANALYSIS
2.3 FEATURE ANALYSIS
2.3.1 FEATURE SEARCHING
2.4 CONSTELLATION METHOD
2.5 DEEP LEARNING
2.6 LINEAR SUBSPACE METHOD
2.7 STATISTICAL APPROACH

CHAPTER 3. REVIEW FROM 2015 TO 2022
3.1 REVIEW FROM 2015 TO 2022

CHAPTER 4. PREVIOUS RESEARCH RELATED WORK
3.1 DIGITAL IMAGE PROCESSING
3.2 FUNDAMENTAL STEPS IN IMAGE PROCESSING
3.3 ELEMENTS OF DIGITAL IMAGE PROCESSING SYSTEMS
3.3.1 A SIMPLE IMAGE FORMATION MODEL
4.1 FACE RECOGNITION ASSISTANT WITH TENSORFLOW
4.2 HOW IT WORKS
4.2.1 EMBEDDING GENERATION
4.2.2 INSTALLATION
4.2.3 DATABASE

CHAPTER 5. IMPLEMENTATION
5.1 FACE DETECTION IN IMAGES
5.2 REAL-TIME FACE DETECTION
5.3 FACE DETECTION PROCESS
5.4 FACE DETECTION ALGORITHM

CHAPTER 6. METHODOLOGY
6.1 FACE RECOGNITION USING GEOMETRICAL FEATURES
6.1.1 FACE RECOGNITION USING TEMPLATE MATCHING
6.2 PROBLEM SCOPE AND SYSTEM SPECIFICATIONS
6.3 BRIEF OUTLINE OF THE IMPLEMENTED SYSTEM
6.4 FACE RECOGNITION DIFFICULTIES
6.5 PRINCIPAL COMPONENT ANALYSIS
6.6 UNDERSTANDING EIGENFACES
6.7 IMPROVING FACE DETECTION USING RECONSTRUCTION
6.8 POSE INVARIANT FACE RECOGNITION

CHAPTER 7. RESULT ANALYSIS

CHAPTER 8. EXPERIMENTAL SETUP

CHAPTER 9. RESULT ANALYSIS

CHAPTER 10. CONCLUSION

CHAPTER 11. REFERENCES

LIST OF FIGURES:

Figure 1: Photometric stereo image
Figure 2: Geometric facial recognition
Fig: Face detection algorithm
Fig: Detection method
Fig 2.2: Face detection
Fig 3.1: Fundamental steps in digital image processing
Fig 3.3: Elements of a digital image processing system
Fig 5.1: A successful face detection in an image
Fig 5.2.1: Frame 1 from camera
Fig 5.2.2: Frame 2 from camera
Fig 5.2.3: Spatio-temporally filtered image
Fig 5.3: Face detection
Fig 5.3.1: Average human face in grey-scale
Fig 5.4: Face detection algorithm
Fig 5.4.1: Mouth detection
Fig 5.4.2: Nose detection
Fig 6.1: Geometrical features (white) which could be used for face recognition
Fig 6.7: Face recognition using template matching
Fig 6.8: Pose-invariant face recognition

CHAPTER-1
INTRODUCTION
Face recognition is the task of identifying an already detected object as a known
or unknown face. The problem of face recognition is often confused with the
problem of face detection. Face recognition, on the other hand, decides whether a
detected "face" belongs to someone known or unknown, using a database of faces to
validate the input face.

1.1 FACE RECOGNITION:

DIFFERENT APPROACHES OF FACE RECOGNITION:

There are two predominant approaches to the face recognition problem: geometric
(feature based) and photometric (view based). As researcher interest in face
recognition continued, many different algorithms were developed, three of which
have been well studied in the face recognition literature.

Recognition algorithms can be divided into two main approaches:

1. Geometric: based on the geometrical relationships between facial landmarks,
   i.e. the spatial configuration of facial features. The main geometrical
   features of the face, such as the eyes, nose, and mouth, are first located,
   and faces are then classified on the basis of various geometrical distances
   and angles between these features. (Figure 2)

2. Photometric stereo: used to recover the shape of an object from a number of
   images taken under different lighting conditions. The shape of the recovered
   object is defined by a gradient map, which is made up of an array of surface
   normals (Zhao and Chellappa, 2006). (Figure 1)

Popular recognition algorithms include:

1. Principal Component Analysis using eigenfaces (PCA),

2. Linear Discriminant Analysis (LDA), and

3. Elastic Bunch Graph Matching using the Fisherface algorithm.


Figure 1: Photometric stereo image

Figure 2: Geometric facial recognition

1.2 FACE DETECTION: Face detection is one of the most widely used computer
vision applications and a fundamental problem in computer vision and pattern
recognition. Over the last decade, multiple face feature detection methods have
been introduced, and in recent years deep learning and convolutional neural
networks (CNNs) have powered highly accurate face detection solutions.

"To begin, the system must find the face in the image or video. Most cameras now
include a built-in face detection feature. Snapchat, Facebook, and other social
media platforms employ face detection to let users apply effects to images and
videos taken within their apps. Many apps identify the person in a photo this
way; with this face detection technique they can even find a person standing in
a crowd."

The face detection system can be divided into the following steps:

1. Pre-processing: To reduce variability, the face images are processed before
they are fed into the network. All positive examples, i.e. the face images, are
obtained by cropping images with frontal faces so that they include only the
front view. All the cropped images are then corrected for lighting using standard
algorithms. (A code sketch of this step appears after this list.)

2. Classification: Neural networks are implemented to classify the images as
faces or non-faces by training on these examples. We use both our own
implementation of a neural network and the MATLAB neural network toolbox for this
task. Different network configurations are experimented with to optimize the
results.

3. Localization: The trained neural network is then used to search for faces in
an image and, if present, localize them in a bounding box. The facial properties
the system works with include position, scale, orientation, and illumination.
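A minimal sketch of the pre-processing step in Python with OpenCV is shown below;
the file name and crop coordinates are hypothetical placeholders, not values from
this project.

    import cv2

    # Sketch of the pre-processing step: crop to the frontal face region,
    # convert to grayscale, and correct lighting via histogram equalization.
    img = cv2.imread("subject01.jpg")              # hypothetical input image
    x, y, w, h = 60, 40, 128, 128                  # assumed frontal-face crop box
    face = img[y:y + h, x:x + w]                   # keep only the front view
    gray = cv2.cvtColor(face, cv2.COLOR_BGR2GRAY)  # reduce variability: drop color
    gray = cv2.equalizeHist(gray)                  # standard lighting correction
    gray = cv2.resize(gray, (48, 48))              # fixed input size for the network
    cv2.imwrite("subject01_preprocessed.png", gray)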

Fig: Face detection algorithm


CHAPTER-2
LITERATURE SURVEY
Face detection is a computer technology that determines the location and size of
human faces in an arbitrary (digital) image. The facial features are detected,
and any other objects such as trees, buildings, and bodies are ignored. Face
detection can be regarded as a specific case of object-class detection, where the
task is to find the locations and sizes of all objects in an image that belong to
a given class. It can also be regarded as a more general case of face
localization, in which the task is to find the locations and sizes of a known
number of faces (usually one). Basically, there are two types of approaches to
detecting the facial region in a given image: the feature-based and the
image-based approach. The feature-based approach tries to extract features of the
image and match them against knowledge of facial features, while the image-based
approach tries to get the best match between training and testing images.

Fig: Detection method

2.1 FEATURE BASED APPROACH:

Active Shape Model: Active shape models focus on complex non-rigid features, such
as the actual physical and higher-level appearance of features. Active Shape
Models (ASMs) are aimed at automatically locating landmark points that define the
shape of any statistically modelled object in an image.
2.1.1 Deformable templates:

Deformable templates were introduced by Yuille et al. to take into account a
priori knowledge of facial features and to improve on the performance of snakes.
Locating a facial feature boundary is not an easy task, because the local
evidence of facial edges is difficult to organize into a sensible global entity
using generic contours. The low brightness contrast around some of these features
also makes edge detection difficult. Yuille et al. took the concept of snakes a
step further by incorporating global information about the eye to improve the
reliability of the extraction process.

2.1.2 PDM (Point Distribution Model):

Independently of computerized image analysis, and before ASMs were developed,


researchers developed statistical models of shape. The idea is that once you
represent shapes as vectors, you can apply standard statistical methods to them
just like any other multivariate object. These models learn allowable
constellations of shape points from training examples and use principal
components to build what is called a Point Distribution Model. These have been
used in diverse ways, for example for categorizing Iron Age brooches. Ideal Point
Distribution Models can only deform in ways that are characteristic of the
object.

2.2 LOW LEVEL ANALYSIS:

These methods are based on low-level visual features such as color, intensity,
edges, and motion. Skin color is a vital feature of human faces, and using skin
color as a feature for tracking a face has several advantages: color processing
is much faster than processing other facial features, and under certain lighting
conditions color is orientation invariant. This property makes motion estimation
much easier, because only a translation model is needed. Tracking human faces
using color as a feature also has several problems; for example, the color
representation of a face obtained by a camera is influenced by many factors
(ambient light, object movement, etc.). A sketch of a simple skin-color mask
follows.
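The sketch below assumes OpenCV and uses illustrative HSV thresholds; real
thresholds depend heavily on the lighting factors just mentioned.

    import cv2
    import numpy as np

    # Threshold in HSV space to obtain a candidate skin mask.
    frame = cv2.imread("frame.jpg")                    # hypothetical input
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    lower = np.array([0, 40, 60], dtype=np.uint8)      # assumed lower HSV bound
    upper = np.array([25, 180, 255], dtype=np.uint8)   # assumed upper HSV bound
    mask = cv2.inRange(hsv, lower, upper)              # 255 where color is skin-like
    skin = cv2.bitwise_and(frame, frame, mask=mask)    # keep only skin-colored pixels
    cv2.imwrite("skin_mask.png", skin)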

Fig 2.2: Face detection
2.3 FEATURE ANALYSIS:

These algorithms aim to find structural features that exist even when the pose,
viewpoint, or lighting conditions vary, and then use these to locate faces. These
methods are designed mainly for face localization.

2.3.1 Feature Searching

Viola Jones Method:

Paul Viola and Michael Jones presented an approach to object detection that
minimizes computation time while achieving high detection accuracy. Their method
[39] was a fast and robust face detector, about 15 times quicker than any
technique at its time of release, with 95% accuracy at around 17 fps. The
technique relies on simple Haar-like features that are evaluated quickly through
a new image representation. Based on the concept of an "integral image", it
generates a large set of features and uses the AdaBoost boosting algorithm to
reduce the over-complete set, and the introduction of a degenerate tree of
boosted classifiers provides robust and fast inference. The detector is applied
in a scanning fashion on gray-scale images, and the scanned window can be scaled,
as can the features evaluated. A sketch using OpenCV's implementation follows.
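OpenCV ships a trained Viola-Jones detector; the sketch below uses its bundled
Haar cascade (the input file name is a placeholder).

    import cv2

    # Viola-Jones style detection: scan a gray-scale image at multiple scales.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    img = cv2.imread("group_photo.jpg")            # hypothetical input
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:                     # one box per detected face
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite("detections.png", img)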

2.4 CONSTELLATION METHOD:

All the methods discussed so far can track faces, but locating faces of various
poses against a complex background remains truly difficult. To reduce this
difficulty, investigators group facial features into face-like constellations
using more robust modelling approaches such as statistical analysis. Various
types of face constellations have been proposed by Burl et al., who use
statistical shape theory on features detected with a multiscale Gaussian
derivative filter. Huang et al. also apply a Gaussian filter for pre-processing
in a framework based on image feature analysis.

2.5 DEEP LEARNING:

"Deep learning is a class of machine learning algorithms that uses multiple
layers to progressively extract higher-level features from the raw input [8]. For
example, in image processing, lower layers may identify edges, while higher
layers may identify concepts relevant to a human, such as digits or letters or
faces."
CHAPTER 3
REVIEW FROM 2015 TO 2022

As one of the cognitive and affective research fields, FER aims to predict an
individual's emotional state from deformations displayed on the face. Many works
have attempted to make this an achievable task, and FER research has produced
several models, together with different FER databases and their annotations. The
successes recorded so far in the literature mostly concern FER models that
predict the basic emotion from a facial expression image. Less consideration has
been given to other aspects of FER research, such as intensity estimation of the
emotion, facial expression ambiguity, and label inconsistency and correlation
among labels. This section presents the research diversity in FER, categorised by
machine learning problem definition: SLL, SLL extension (FER and intensity
estimation), MLL, and LDL. The trend in FER approaches to emotion recognition is
pictorially presented in Figure 2, and Table 2 presents the categories of emotion
recognition research in FER with their associated limitations.

A. SINGLE LABEL LEARNING (MULTICLASS):


Early studies on the human cognitive and affective aspects of computer vision
were piloted by the established work that defined six basic classes of emotion.
Classifying an instance of a facial expression image into one of the six basic
emotion states is a multiclass task, termed single label learning. Figure 5A
illustrates how SLL reports only one emotion out of all the possible outcomes.
Methods that attempt the facial expression multiclass task are well represented
in the FER literature; they revolve around handcrafted features, conventional
machine learning, and deep learning models.
B. FACIAL EXPRESSION RECOGNITION AND INTENSITY
ESTIMATION:
Facial expression intensity estimation refers to the observable differences
between facial expression images of the same expression, or the degree of
dissimilarity of a facial expression image from its reference base. It is one of
the facial expression analysis tasks; expression intensity is estimated in both
emotion and AU quantification. Figure 3 shows samples of expression intensity
from sequence data (Figure 3A) and static data (Figure 3B). Some methods for FER
intensity estimation have been explored in the field. Khairunmi [25] grouped
these methods into distance-based, cluster-based, regression-based, and
probabilistic graphical-based approaches. Observation shows that the
distance-based approach quantifies facial expression intensity before recognition
of the emotion; this model disagrees with how humans express emotion.

One line of work proposed an ensemble of naive Bayesian classifiers for
expression classification and intensity estimation, respectively: several naive
Bayes classifiers weakly classify selected features, a robust classifier is
generated from the weak classifiers' outputs for expression classification, and
the normalised output scores serve as the class intensity estimate. Wu et al.
[61] considered expression intensity estimation by quantifying the energy
variation of a facial expression sequence. They were motivated by the possibility
of quantifying an energy value for each state of expression using facial
landmarks. The model employed an HMM to discriminate different expressions and
used a linear regression algorithm to obtain intensity curves for each
expression.

C. MULTILABEL LEARNING:
AUs at different affective states are triggered in the same region of the face.
GLMM uses the features extracted for different expressions in the same region to
classify them as zero or non-zero, making it possible for a group to contain
different expressions. The global solution of the model is achieved by a function
called the maximum margin hinge loss. GLMM was later enhanced to Adaptive Group
Lasso Regression [74] to assign a continuous value to the distribution of
expressions present in a non-zero group. GLMM shows superior performance compared
with some existing ML methods in experiments conducted on s-JAFFE.

Most FER databases do not come with distribution scores; to apply LDL to these
data sets, methods are needed that recover or relabel the data with distribution
scores. The few techniques that address this challenge include label enhancement
based on fuzzy clustering algorithms, which employs C-means clustering to cluster
feature vectors and iteratively minimises the objective function to achieve a
label distribution from logical labels.
CHAPTER-4
PREVIOUS RESEARCH RELATED WORK

3.1 DIGITAL IMAGE PROCESSING:

Interest in digital image processing methods stems from two principal application
areas:

1. Improvement of pictorial information for human interpretation

2. Processing of scene data for autonomous machine perception

In the second application area, interest focuses on procedures for extracting
image information in a form suitable for computer processing. Examples include
automatic character recognition, industrial machine vision for product assembly
and inspection, military reconnaissance, and automatic processing of
fingerprints.

Image: An image refers to a 2D light intensity function f(x, y), where (x, y)
denotes spatial coordinates and the value of f at any point (x, y) is
proportional to the brightness or gray level of the image at that point. A
digital image is an image f(x, y) that has been discretized both in spatial
coordinates and in brightness. The elements of such a digital array are called
image elements or pixels.

A simple image model: To be suitable for computer processing, an image f(x, y)
must be digitized both spatially and in amplitude. Digitization of the spatial
coordinates (x, y) is called image sampling; amplitude digitization is called
gray-level quantization. The storage and processing requirements increase rapidly
with the spatial resolution and the number of gray levels. A small sketch of
gray-level quantization follows.
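The NumPy sketch below uses a random array as a stand-in for a sampled image and
re-quantizes its 256 gray levels down to fewer levels, illustrating the
storage/quality trade-off.

    import numpy as np

    def quantize(f, levels):
        """Map 8-bit gray levels onto `levels` evenly spaced values."""
        step = 256 // levels
        return (f // step) * step + step // 2

    f = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # stand-in sampled image
    f4 = quantize(f, 4)    # 4 gray levels: visible banding, minimal storage
    f64 = quantize(f, 64)  # 64 gray levels: close to the original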

Types of image processing

 Low level processing


 Medium level processing
 High level processing

3.2 FUNDAMENTAL STEPS IN IMAGE PROCESSING

Fundamental steps in image processing are:

1. Image acquisition: to acquire a digital image.

2. Image pre-processing: to improve the image in ways that increase the chances
of success for the other processes.

3. Image segmentation: to partition an input image into its constituent parts or
objects.

4. Image representation: to convert the input data to a form suitable for
computer processing.

5. Image description: to extract features that yield some quantitative
information of interest, or features that are basic for differentiating one class
of objects from another.

6. Image recognition: to assign a label to an object based on the information
provided by its description.

Fig 3.1: Fundamental steps in digital image processing

3.3 ELEMENTS OF DIGITAL IMAGE PROCESSING SYSTEMS

A digital image processing system contains the following blocks, as shown in the
figure:
Fig 3.3: Elements of a digital image processing system

The basic operations performed in a digital image processing system include

1. Acquisition

2. Storage

3. Processing

4. Communication

5. Display

TensorFlow is a multipurpose machine learning framework. TensorFlow can be used
anywhere from training huge models across clusters in the cloud to running models
locally on an embedded system such as a phone.

Machine learning has been around for a while, and there are many open-source
libraries such as TensorFlow in which you can find pre-trained models and build
on top of them without starting from scratch. What we are trying to achieve here
falls under image classification, where our machine learning model has to
classify the faces in images amongst the recognized people.

Face Recognition Assistant with TensorFlow:

Have you ever desired a tireless personal keeper able to recognize people passing
through your gate and inform you whenever you want? Do you want to remove badges
from your office and automatically and precisely take note of all people coming
in and out? Or do you want to create your personal big-brother world in your
neighbourhood? If so, this repository is right for you. Powered by a cascade of
four state-of-the-art neural networks, it gives you the ability to recognize,
with a cheap low-range camera, all persons transiting its field of view.

How it works:

There are four networks running behind the curtains. They all worked in a cascade
manner. So, the output of the first network is the starting point of the second one and
so on.

 Faces detection
 Facial landmarks detection
 Face side detection
 Embeddings generation

Installation:

We used only a couple of libraries, but for simplicity we also created a
'requirements.txt' file. We purposely did not add TensorFlow to that file,
because it is up to you to choose your preferred version (GPU, CPU, etc.).
Moreover, TensorFlow installation can sometimes be as easy as a 'pip install'.
Database embeddings creation:

The first step is to create reference embeddings (vectors of 256 elements). By
default we acquire 50 frames (keeping the best one out of every 5) from the main
camera connected to the computer. The code then automatically produces encodings
for all acquired persons and trains a simple kNN model on all generated
encodings. Finally, the code saves the encodings and the kNN model in two
separate pickle files inside the faceNet folder. A hedged sketch of this
enrollment step follows.
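In the sketch below, scikit-learn stands in for the repository's kNN code;
embed() is a hypothetical placeholder for the embedding network, and the image
file names are assumptions.

    import pickle
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def embed(face_file):
        # Stand-in: a real system runs the face through the embedding CNN
        # and returns its 256-element vector.
        rng = np.random.default_rng(abs(hash(face_file)) % (2 ** 32))
        return rng.standard_normal(256)

    people = {"alice": ["alice_01.png"], "bob": ["bob_01.png"]}  # assumed files
    X = [embed(f) for name, files in people.items() for f in files]
    y = [name for name, files in people.items() for f in files]

    knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)  # simple kNN model
    with open("encodings.pkl", "wb") as fh:              # two pickle files,
        pickle.dump({"X": X, "y": y}, fh)                # as described above
    with open("knn_model.pkl", "wb") as fh:
        pickle.dump(knn, fh)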

Recognizing facial expressions would help systems detect whether people are happy
or sad, as a human being can. This would allow software and AI systems to provide
an even better experience to humans in various applications. From detecting
probable suicides and preventing them, to playing mood-based music, there is a
wide variety of applications where emotion or mood detection can play a vital
role.

The system uses a CNN (convolutional neural network) to extract the physiological
signals and make a prediction. Results are obtained by capturing the person's
image through a camera and correlating it with a training dataset to predict the
person's emotional state.
CHAPTER-5

IMPLEMENTATION
The problem of face recognition is all about face detection. This is a fact that
seems quite bizarre to new researchers in this area; however, before face
recognition is possible, one must be able to reliably find a face and its
landmarks. This is essentially a segmentation problem, and in practical systems
most of the effort goes into solving this task. In fact, the actual recognition
based on features extracted from these facial landmarks is only a minor last
step. There are two types of face detection problems: 1) face detection in images
and 2) real-time face detection.

5.1 FACE DETECTION IN IMAGES

Fig 5.1: A successful face detection in an image

Most face detection systems attempt to extract a fraction of the whole face,
thereby eliminating most of the background and other areas of an individual's
head, such as hair, that are not necessary for the face recognition task. With
static images, this is often done by running a window across the image; the face
detection system then judges whether a face is present inside the window
(Brunelli and Poggio, 1993). Unfortunately, with static images there is a very
large search space of possible face locations.

5.2 REAL-TIME FACE DETECTION:

Real-time face detection involves detecting a face in a series of frames from a
video capture device. While the hardware requirements for such a system are far
more stringent, from a computer vision standpoint real-time face detection is
actually a far simpler process than detecting a face in a static image. This is
because, unlike most of our surrounding environment, people are continually
moving: we walk around, blink, fidget, wave our hands about, and so on.

Fig 5.2.1: Frame 1 from camera    Fig 5.2.2: Frame 2 from camera

Fig 5.2.3: Spatio-temporally filtered image

Since a real-time face detection system is presented with a series of frames in
which to detect a face, spatio-temporal filtering (taking the difference between
subsequent frames) identifies the area of the frame that has changed, and the
individual can be detected (Wang and Adelson, 1994; Adelson and Bergen, 1986).
Furthermore, as seen in the figure, exact face locations can be identified using
a few simple rules, such as: 1) the head is the small blob above a larger blob,
the body; and 2) head motion must be reasonably slow and contiguous, since heads
do not jump around erratically (Turk and Pentland, 1991a, 1991b). A sketch of
this frame-differencing step follows.
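The sketch below reads two consecutive frames from the default camera with OpenCV
and differences them, so only moving regions (typically people) survive.

    import cv2

    cap = cv2.VideoCapture(0)            # default camera
    ok1, frame1 = cap.read()
    ok2, frame2 = cap.read()
    if ok1 and ok2:
        g1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
        g2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(g1, g2)       # pixels that changed between frames
        # Moving regions survive thresholding; the static background is suppressed.
        _, motion = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        cv2.imwrite("motion_mask.png", motion)
    cap.release()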

5.3 FACE DETECTION PROCESS:

Fig 5.3: Face detection

Face detection is the process of identifying different parts of the human face,
such as the eyes, nose, and mouth. In this project, this process is implemented
with TensorFlow, and the author attempts to detect faces in still images using
image invariants.

Fig 5.3.1: Average human face in grey-scale (scaled colormap and its negative)

5.4 FACE DETECTION ALGORITHM:

Fig 5.4: Face detection algorithm

Fig 5.4.1: Mouth detection    Fig 5.4.2: Nose detection
CHAPTER-6
METHODOLOGY
Over the last few decades, many techniques have been proposed for face
recognition. Many of the techniques proposed during the early stages of computer
vision cannot be considered successful, but almost all of the recent approaches
to the face recognition problem have been creditable. According to the research
by Brunelli and Poggio (1993), all approaches to human face recognition can be
divided into two strategies:

(1) Geometrical features and

(2) Template matching

6.1 FACE RECOGNITION USING GEOMETRICAL FEATURES

This technique involves computing a set of geometrical features, such as nose
width and length, mouth position, and chin shape, from the picture of the face we
want to recognize. This set of features is then matched with the features of
known individuals, and a suitable metric, such as Euclidean distance (finding the
closest vector), can be used to find the closest match. Most pioneering work in
face recognition was done using geometric features (Kanade, 1973), although Craw
et al. (1987) did relatively recent work in this area. (A sketch of such
nearest-vector matching appears at the end of this section.)

Fig 6.1: Geometrical features (white) which could be used for face recognition

The advantage of using geometrical features as a basis for face recognition is
that recognition is possible even at very low resolutions and with noisy images
(images with many disorderly pixel intensities): although the face cannot be
viewed in detail, its overall geometrical configuration can still be extracted.
The technique's main disadvantage is that automated extraction of the facial
geometrical features is very hard. Automated recognition based on geometrical
feature extraction is also very sensitive to the scaling and rotation of a face
in the image plane (Brunelli and Poggio, 1993). This is apparent when we examine
Kanade's (1973) results, where he reported a recognition rate of between 45% and
75% with a database of only 20 people. However, if these features are extracted
manually, as in Goldstein et al. (1971) and Kaya and Kobayashi (1972),
satisfactory results may be obtained.
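A toy sketch of the nearest-vector matching described above; the feature values
are illustrative assumptions, not measurements.

    import numpy as np

    # Each face is a small vector of measured distances/angles; the closest
    # known vector under the Euclidean metric is the recognized identity.
    known = {
        "person_a": np.array([38.0, 52.0, 31.0, 14.0]),  # e.g. eye gap, nose length
        "person_b": np.array([44.0, 48.0, 35.0, 17.0]),
    }
    probe = np.array([39.0, 51.0, 30.0, 15.0])           # unknown face's features

    best = min(known, key=lambda name: np.linalg.norm(known[name] - probe))
    print("closest match:", best)                        # -> person_a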

6.1.1 Face recognition using template matching

This is similar to the template matching technique used in face detection, except
that here we are not trying to classify an image as 'face' or 'non-face' but are
trying to recognize a particular face. A sketch with OpenCV follows the figure.

Fig 6.7: Face recognition using template matching
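The sketch below uses OpenCV's normalized cross-correlation; the file names are
placeholders, and in practice the template must be no larger than the probe
image.

    import cv2

    probe = cv2.imread("probe.png", cv2.IMREAD_GRAYSCALE)         # hypothetical files
    template = cv2.imread("known_face.png", cv2.IMREAD_GRAYSCALE)
    res = cv2.matchTemplate(probe, template, cv2.TM_CCOEFF_NORMED)
    _, score, _, loc = cv2.minMaxLoc(res)   # best correlation score and its position
    print("similarity %.2f at %s" % (score, loc))
    # Recognition: repeat against each known person's template, pick the best score.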

6.2 PROBLEM SCOPE AND SYSTEM SPECIFICATION:

The following problem scope for this project was arrived at after reviewing the
literature on face detection and face recognition and determining possible
real-world situations where such systems would be of use. The following system
requirements were identified:

1. A system to detect frontal view faces in static images.
2. A system to recognize a given frontal view face.
3. Only emotionless, frontal view faces will be presented to the face detection
and recognition systems.
4. All implemented systems must display a high degree of lighting invariance.
5. All systems must possess near real-time performance.
6. Both fully automated and manual face detection must be supported.
7. Frontal view face recognition will be realised using only a single known
image.
8. The automated face detection and recognition systems should be combined into a
fully automated face detection and recognition system. The face recognition
sub-system must display a slight degree of invariance to scaling and rotation
errors in the segmented image extracted by the face detection sub-system.
6.3 BRIEF OUTLINE OF THE IMPLEMENTED SYSTEM:

Fully automated face detection of frontal view faces is implemented using a
deformable template algorithm relying on the image invariants of human faces.
This was chosen because a comparable neural-network based face detection model
would have needed far too much training data and a great deal of computing time.
The main difficulties in implementing a deformable template based technique were
the creation of the bright and dark intensity sensitive templates and the design
of an efficient implementation of the detection algorithm.

6.4 FACE RECOGNITION DIFFICULTIES:

1. Identify similar faces (inter-class similarity)

2. Accommodate intra-class variability due to

2.1 head pose

2.2 illumination conditions

2.3 Emotions

2.4 facial accessories

2.5 aging effects

3. Cartoon faces

6.5 PRINCIPAL COMPONENT ANALYSIS (PCA):

Principal Component Analysis (the Karhunen-Loeve expansion) is a suitable
strategy for face recognition because it identifies variability between human
faces that may not be immediately obvious. PCA does not attempt to categorise
faces using familiar geometrical differences, such as nose length or eyebrow
width. Instead, a set of human faces is analysed using PCA to determine which
'variables' account for the variance between faces. In face recognition, these
variables are called eigenfaces, because when plotted they display an eerie
resemblance to human faces. A sketch follows.
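A minimal sketch with scikit-learn, using random data as a stand-in for a set of
flattened 48x48 face images.

    import numpy as np
    from sklearn.decomposition import PCA

    faces = np.random.rand(100, 48 * 48)     # stand-in: 100 flattened face images
    pca = PCA(n_components=20).fit(faces)    # keep the 20 strongest components
    eigenfaces = pca.components_.reshape(20, 48, 48)  # each plots like a ghostly face
    weights = pca.transform(faces)           # each face as 20 eigenface coefficients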

6.7 IMPROVING FACE DETECTION USING RECONSTRUCTION:

Reconstruction cannot be used as a means of near real-time face detection in
images, since it would involve resizing the face detection window area and large
matrix multiplications, both of which are computationally expensive. However,
reconstruction can be used to verify whether potential face locations identified
by the deformable template algorithm actually contain a face. Instead of
identifying a single potential face location, the face detection algorithm can be
modified to output many high-'faceness' locations, which can then be verified
using reconstruction. This is especially useful because occasionally the best
'faceness' location found by the deformable template algorithm may not contain
the ideal frontal view face pixel area; if the reconstructed image differs
greatly from the face detection window, the location is unlikely to contain a
face. A sketch of this verification step follows.
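The sketch below continues the PCA sketch of section 6.5: project a candidate
window into face space, reconstruct it, and reject the window if the
reconstruction error is large. The rejection threshold is an assumption.

    import numpy as np
    from sklearn.decomposition import PCA

    faces = np.random.rand(100, 48 * 48)      # stand-in training faces
    pca = PCA(n_components=20).fit(faces)     # face space from section 6.5

    window = np.random.rand(48 * 48)          # candidate face-detection window
    recon = pca.inverse_transform(pca.transform(window.reshape(1, -1)))[0]
    error = np.linalg.norm(window - recon)    # distance from face space
    is_face = error < 10.0                    # hypothetical threshold: reject if large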

6.8 POSE INVARIANT FACE RECOGNITION:

Extending the frontal view face recognition system to a pose-invariant
recognition system is quite simple if one of the proposed specifications of the
face recognition system is relaxed. Successful pose-invariant recognition is
possible if many images of a known individual are in the face database. Nine
images of each known individual can be taken, as shown below; then, if an image
of the same individual is submitted within a 30° angle of the frontal view, he or
she can be identified.

Fig 6.8: Pose-invariant face recognition

Project category:
Deep Learning (Subset of ML)

Language and Software Tools used:

1. Python as a programming language


2. OpenCV
3. TensorFlow
4. Visual Studio Code
5. Kaggle dataset
6. CNN as an Algorithm

Project Life Cycle:


Waterfall Model: A classical model used in the system development life cycle to
create a system with a linear and sequential approach. It is termed 'waterfall'
because the model develops systematically from one phase to another in a downward
fashion. The waterfall approach does not define a process for going back to a
previous phase to handle changes in requirements; it is the earliest approach
used for software development.
CHAPTER – 7
RESULT ANALYSIS
According to the basic definition of emotion by Ekman and Friesen, emotions are
divided into seven classes: happy, sad, surprise, fear, disgust, neutral, and
angry. This report uses OpenCV face detection classifiers, one of which draws the
bounding box around the face so the correct expression can be detected. For
training the CNN models we used 48x48 grayscale images from Kaggle's Face
Expression Recognition Dataset
(https://www.kaggle.com/jonathanoheix/face-expression-recognition-dataset). The
FER dataset is divided into two folders, test and train, each further divided
into separate folders containing one of the seven types of FER data.

IN MODEL BUILDING:

Total params: 4,478,727
Trainable params: 4,474,759
Non-trainable params: 3,968
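The report does not list the exact layer stack, so the following is only a
minimal sketch of a comparable Keras CNN for 48x48 grayscale inputs and 7 emotion
classes; it is not the project's precise architecture.

    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input((48, 48, 1)),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.BatchNormalization(),            # source of non-trainable params
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(7, activation="softmax"),  # one output per emotion class
    ])
    model.summary()   # prints total/trainable/non-trainable parameter counts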
MAKING TRAINING AND VALIDATION DATA:

IMPORTING LIBRARIES:

DISPLAYING IMAGES:

FITTING THE MODEL WITH TRAINING AND VALIDATION DATA (RESULTS):
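A sketch of the loading and training calls, reusing the model from the sketch
above and assuming the Kaggle dataset's train/test folder layout; the paths are
placeholders.

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    gen = ImageDataGenerator(rescale=1.0 / 255)
    train = gen.flow_from_directory("images/train", target_size=(48, 48),
                                    color_mode="grayscale", class_mode="categorical")
    val = gen.flow_from_directory("images/test", target_size=(48, 48),
                                  color_mode="grayscale", class_mode="categorical")

    # Categorical crossentropy: each image belongs to exactly one of the 7 classes.
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(train, validation_data=val, epochs=10)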

The CNN model, trained for 10 epochs, gives an accuracy of 66.7%. It has been
observed that the designed CNN model can flawlessly detect facial expressions
such as happy, sad, and surprise, subject to 48x48 grayscale images only.

Accuracy:

 To study the accuracy of the model, the metric is set to accuracy. The loss is
set to categorical crossentropy, as the data has to be classified into only the 7
defined categories and each image can belong to one classification type only.
CHAPTER 8
EXPERIMENTAL SETUP
CHAPTER 9
RESULT ANALYSIS
To analyze the performance of the algorithm, the extended Cohn-Kanade expression
dataset was used initially. That dataset has only 486 sequences from 97 posers,
causing accuracy to reach at most about 45%. To overcome this low accuracy,
multiple datasets were downloaded from the Internet, and the authors' own
pictures at different expressions were included. As the number of images in the
dataset increased, the accuracy also increased. We kept 70% of the 10K dataset
images for training and 30% for testing. In all, 25 iterations were carried out,
with a different 70% training set each time, and the error bar was computed as
the standard deviation. The figure shows the optimization of the number of layers
for the CNN. For simplicity, we kept the number of layers and the number of
filters the same for the background removal CNN (the first-part CNN) and the face
feature extraction CNN (the second-part CNN). In this study we varied the number
of layers from 1 to 8 and found that maximum accuracy was obtained at around 4
layers. This was not very intuitive, as one would assume the number of layers to
be directly proportional to accuracy and inversely proportional to execution
time. Given the maximum accuracy obtained with 4 layers, we selected the number
of layers to be 4. The execution time increased with the number of layers but did
not add significant value to our study, hence it is not reported in the current
manuscript. The next figure shows the optimization of the number of filters for
both layers. A sketch of this evaluation protocol follows.
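In the sketch below, train_and_score is a hypothetical placeholder for one full
CNN training run, and the data arrays are stand-ins for the 10K-image dataset.

    import numpy as np
    from sklearn.model_selection import train_test_split

    def train_and_score(X_tr, y_tr, X_te, y_te):
        return 0.9 + 0.05 * np.random.rand()   # placeholder for a real training run

    X = np.random.rand(10000, 48 * 48)         # stand-in for the 10K-image dataset
    y = np.random.randint(0, 7, 10000)

    scores = []
    for seed in range(25):                     # 25 iterations, fresh 70/30 split each
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7,
                                                  random_state=seed)
        scores.append(train_and_score(X_tr, y_tr, X_te, y_te))
    print("accuracy %.3f +/- %.3f" % (np.mean(scores), np.std(scores)))  # std = error bar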

A vast amount of further work could be done by feeding each layer a different
number of filters; this could be automated using servers. Due to the authors'
limited computational power, we did not carry out this study, but it would be
much appreciated if other researchers found better numbers than 4 (layers) and 4
(filters) and increased the accuracy beyond the 96% we could achieve. The figure
below shows regular front-facing cases with angry and surprise emotions, which
the algorithm could easily detect.
CHAPTER-10
CONCLUSION
This work can be further studied and researched to find more accurate models
using different algorithms and image processing techniques. With more people
entering this research field, there is a chance that a fully automated facial
emotion recognition system can be brought to market with 100% accuracy. Such
models will help researchers build efficient artificial intelligence: one cannot
imagine a humanoid without the ability to know what a person feels in order to
help or serve them.

In this project, a LeNet-architecture-based convolutional neural network (CNN) is
implemented to classify human facial expressions, i.e., happy, sad, surprise,
anger, disgust, and neutral. The system has been evaluated using accuracy,
precision, recall, and F1-score. The classifier achieved an accuracy of 56.77%, a
precision of 0.57, a recall of 0.57, and an F1-score of 0.57.

This project proposes an approach for recognizing the category of facial
expressions. Detection of expressions from facial images is useful in many
applications, such as robotic vision, video surveillance, digital cameras,
security, and human-computer interaction.

Facial expression recognition is a very challenging problem, and more effort
should be made to improve the classification performance for important
applications. Our future work will focus on improving the performance of the
system and deriving more appropriate classifications, which may be useful in many
real-world applications.
CHAPTER 11
REFERENCES
• Adelson, E. H., and Bergen, J. R. (1986). The Extraction of Spatio-Temporal
Energy in Human and Machine Vision. Proceedings of Workshop on Motion:
Representation and Analysis (pp. 151-155), Charleston, SC, May 7-9.

• AAFPRS (1997). A newsletter from the American Academy of Facial Plastic and
Reconstructive Surgery. Third Quarter 1997, Vol. 11, No. 3, p. 3.

• Baron, R. J. (1981). Mechanisms of human facial recognition. International
Journal of Man-Machine Studies, 15:137-178.

• Beymer, D., and Poggio, T. (1995). Face Recognition From One Example View.
A.I. Memo No. 1536, C.B.C.L. Paper No. 121, MIT.

• Bichsel, M. (1991). Strategies of Robust Object Recognition for the Automatic
Identification of Human Faces. PhD thesis, Eidgenossische Technische Hochschule,
Zurich.

• Brennan, S. E. (1982). The caricature generator. M.S. Thesis, MIT.

• Brunelli, R., and Poggio, T. (1993). Face Recognition: Features versus
Templates. IEEE Transactions on Pattern Analysis and Machine Intelligence,
15(10):1042-1052.

• Craw, I., Ellis, H., and Lishman, J. R. (1987). Automatic extraction of face
features. Pattern Recognition Letters, 5:183-187, February.

• Deffenbacher, K. A., Johanson, J., and O'Toole, A. J. (1998). Facial ageing,
attractiveness, and distinctiveness. Perception, 27(10):1233-1243.

• Dunteman, G. H. (1989). Principal Component Analysis. Sage Publications.

• Frank, H., and Althoen, S. (1994). Statistics: Concepts and Applications.
Cambridge University Press, p. 110.

• Gauthier, I., Behrmann, M., and Tarr, M. (1999). Can face recognition really be
dissociated from object recognition? Journal of Cognitive Neuroscience, in press.

• Goldstein, A. J., Harmon, L. D., and Lesk, A. B. (1971). Identification of
human faces. In Proc. IEEE, Vol. 59, p. 748.

• de Haan, M., Johnson, M. H., and Maurer, D. (1998). Recognition of individual
faces and average face prototypes by 1- and 3-month-old infants. Centre for Brain
and Cognitive Development, Department of Psychology, Birkbeck College.

• Hadamard, J. (1923). Lectures on the Cauchy Problem in Linear Partial
Differential Equations. Yale University Press.

• Haralick, R. M., and Shapiro, L. G. (1992). Computer and Robot Vision,
Volume I. Addison-Wesley.

• Haxby, J. V., Ungerleider, L. G., Horwitz, B., Maisog, J. M., Rapoport, S. I.,
and Grady, C. L. (1996). Face encoding and recognition in the human brain. Proc.
Nat. Acad. Sci., 93:922-927.

• Heisele, B., and Poggio, T. (1999). Face Detection. Artificial Intelligence
Laboratory, MIT.

• Jang, J., Sun, C., and Mizutani, E. (1997). Neuro-Fuzzy and Soft Computing.
Prentice Hall.

• Johnson, R. A., and Wichern, D. W. (1992). Applied Multivariate Statistical
Analysis. Prentice Hall, p. 356-39.
