Facial Recognition 1
Major Project
Submitted in partial fulfilment of the requirements
In
To
By
Director of FGIET
Acknowledgement
It is our proud privilege and duty to acknowledge the help and guidance
received from several people in the preparation of this report. It would not have been
possible to prepare this project in its present form without their valuable help, co-operation
and guidance.
First and foremost, we wish to record our sincere gratitude to the Management of this
college and to our Director, Digvijay Singh Chauhan, Feroze Gandhi Institute of
Engineering and Technology, Raebareli, for his constant support and encouragement
in the preparation of this project and for making available the library and laboratory
facilities needed to prepare it.
We express our sincere gratitude to our guide, Dr. Manish Saxena, for guiding us in
the investigations for this project and in carrying out the experimental work. Our numerous
discussions with him were extremely helpful. We hold him in esteem for the guidance,
encouragement, involvement and inspiration received from him. His vision and
execution, aimed at creating structure, definition and realism around the project,
fostered the ideal environment for us to learn and do. This project is a result of his
teaching, encouragement and inputs in the numerous meetings he had with us,
despite his busy schedule. The experience was a novel one, and we would like to thank
all the people who have lent their valuable time for the completion of the report.
Without their consideration, it would have been difficult to complete the report.
Declaration
We, Rishabh and Animesh Pratap Singh, do hereby declare that the project report
submitted to the FEROZE GANDHI INSTITUTE OF ENGINEERING AND
TECHNOLOGY, RAEBARELI in partial fulfilment of the requirements, entitled FACIAL
EMOTION RECOGNITION USING DEEP LEARNING, is an original piece of research work
carried out by us under the guidance and supervision of Dr. Manish Saxena. We
further declare that the information has been collected from genuine and authentic
sources and that we have not submitted this project report to this or any other university
for the award of a diploma, degree, or certificate examination.
Abstract
Face recognition is a biometric technology based on the identification of the
facial features of a person. People collect face images, and the recognition
equipment automatically processes them. This report introduces related
research on face recognition from different perspectives: it describes the
development stages and the related technologies of face recognition, surveys
research on face recognition under real-world conditions, and introduces the
general evaluation standards and the commonly used face recognition databases.
We close with a forward-looking view: face recognition has become a key future
development direction and has many potential applications.
Keywords:
facial recognition
face detection
feature extraction
face verification
fiducial point
face alignment
convolutional neural networks
boosting
deep neural networks
video-based face recognition
infrared face recognition
TABLE OF CONTENTS
CERTIFICATE
ACKNOWLEDGEMENT
LIST OF FIGURES
ABSTRACT
CHAPTER 1. INTRODUCTION
LIST OF FIGURES:
Figure 1 - Photometric stereo image
Fig 6.1: Geometrical features (white) which could be used for face recognition
CHAPTER-1
INTRODUCTION
Face recognition is the task of identifying an already detected object as a known or
unknown face. The problem of face recognition is often confused with the problem of
face detection. Face recognition, on the other hand, is to decide whether the "face" is
someone known or unknown, using for this purpose a database of faces in order to
validate this input face.
There are two predominant approaches to the face recognition problem: Geometric
(feature based) and photometric (view based). As researcher interest in face
recognition continued, many different algorithms were developed, three of which
have been well studied in face recognition literature.
1.2 FACE DETECTION: Face detection is one of the most widely used computer
vision applications and a fundamental problem in computer vision and pattern
recognition. In the last decade, multiple face feature detection methods have
been introduced. In recent years, deep learning and convolutional neural
networks (CNNs) have shown great results in powering highly accurate face
detection solutions.
"To begin, the system must find the face in the image or video. Most cameras
now include a built-in face detection feature. Snapchat, Facebook, and other
social media platforms employ face detection to let users apply effects to
images and videos taken using their apps. Many apps identify the person in a
photo this way; they can even find a person standing in a crowd with this face
detection technique."
The face detection system can be divided into the following steps:
1. Pre-processing: To reduce the variability in the faces, the images are processed
before they are fed into the network. All positive examples, that is, the face images,
are obtained by cropping images with frontal faces to include only the front view.
All the cropped images are then corrected for lighting through standard algorithms.
3. Localization: The trained neural network is then used to search for faces in an
image and, if present, localize them in a bounding box. The features of the face on
which the work is done include position, scale, orientation, and illumination.
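The pre-processing step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the project's actual code: the crop box is assumed to come from a detector, and histogram equalization stands in for whichever standard lighting-correction algorithm was used.

```python
import numpy as np

def preprocess_face(image, box):
    """Crop a detected face region and correct its lighting.

    image : 2-D numpy array of gray levels (0-255)
    box   : (top, left, height, width) of the detected face
    """
    top, left, h, w = box
    face = image[top:top + h, left:left + w]    # crop to the frontal face only
    # Lighting correction via histogram equalization over the crop
    hist, _ = np.histogram(face.flatten(), bins=256, range=(0, 256))
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) * 255 / (cdf.max() - cdf.min())  # stretch CDF to 0-255
    return cdf[face].astype(np.uint8)

# A synthetic 100x100 image with a dim 48x48 face region
img = np.full((100, 100), 40, dtype=np.uint8)
img[20:68, 30:78] = np.linspace(30, 90, 48 * 48).reshape(48, 48).astype(np.uint8)
face = preprocess_face(img, (20, 30, 48, 48))
print(face.shape)              # (48, 48)
print(face.min(), face.max())  # contrast stretched toward the full 0-255 range
```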
Active Shape Model: Active shape models focus on complex non-rigid features like
the actual physical and higher-level appearance of features. This means that Active
Shape Models (ASMs) are aimed at automatically locating landmark points that define
the shape of any statistically modelled object in an image.
2.1.1 Deformable templates:
Deformable templates were introduced by Yuille et al. to take into account the
a priori knowledge of facial features and to improve on the performance of snakes.
Locating a facial feature boundary is not an easy task because the local evidence of
facial edges is difficult to organize into a sensible global entity using generic
contours. The low brightness contrast around some of these features also makes the
edge detection process difficult. Yuille et al. took the concept of snakes a step
further by incorporating global information about the eye to improve the reliability
of the extraction process.
Skin-color-based approaches rely on low-level visual features like color, intensity,
edges, and motion. Color is a vital feature of human faces, and using skin color as a
feature for tracking a face has several advantages. Color processing is much faster
than processing other facial features, and under certain lighting conditions color is
orientation invariant. This property makes motion estimation much easier because only
a translation model is needed. Tracking human faces using color as a feature also has
problems: the color representation of a face obtained by a camera is influenced by
many factors (ambient light, object movement, etc.).
Fig. 2.2: Face detection
2.4 FEATURE ANALYSIS:
These algorithms aim to find structural features that exist even when the pose,
viewpoint, or lighting conditions vary, and then use these to locate faces. These
methods are designed mainly for face localization.
Paul Viola and Michael Jones [39] presented an approach for object detection which
minimizes computation time while achieving high detection accuracy: a fast and robust
method for face detection that was 15 times quicker than any technique at its time of
release, with 95% accuracy at around 17 fps. The technique relies on simple Haar-like
features that are evaluated quickly through a new image representation. Based on the
concept of an "Integral Image", it generates a large set of features and uses the
boosting algorithm AdaBoost to reduce the overcomplete set, while the introduction of
a degenerate tree of boosted classifiers provides robust and fast inference. The
detector is applied in a scanning fashion on gray-scale images; the scanned window
can be scaled, as can the features evaluated.
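The speed of the Viola-Jones detector comes from the integral image: after one pass of cumulative sums, the sum over any rectangle, and hence any Haar-like feature, costs only a few array look-ups. A minimal numpy sketch of the idea (not the original implementation):

```python
import numpy as np

def integral_image(img):
    """Cumulative sums over rows then columns: ii[r, c] is the sum of
    all pixels above and to the left of (r, c), inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, h, w):
    """Sum of pixels in a rectangle using four integral-image look-ups.
    Guards handle rectangles touching the top or left border."""
    total = ii[top + h - 1, left + w - 1]
    if top > 0:
        total -= ii[top - 1, left + w - 1]
    if left > 0:
        total -= ii[top + h - 1, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

def haar_two_rect(ii, top, left, h, w):
    """A simple two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, top, left, h, half) - rect_sum(ii, top, left + half, h, half)

img = np.arange(16, dtype=np.int64).reshape(4, 4)
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 2, 2))       # 0+1+4+5 = 10
print(haar_two_rect(ii, 0, 0, 4, 4))  # left-half sum minus right-half sum = -16
```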
All methods discussed so far are able to track faces, but some issues, like locating
faces of various poses against complex backgrounds, remain truly difficult. To reduce
this difficulty, investigators form groups of facial features into face-like
constellations using more robust modelling approaches such as statistical analysis.
Various types of face constellations have been proposed by Burl et al., who use
statistical shape theory on the features detected from a multiscale Gaussian
derivative filter. Huang et al. also apply a Gaussian filter for pre-processing in a
framework based on image feature analysis, an image-based approach.
FER can appropriately predict an individual's emotional state from deformation
displayed in the face, as one of the cognitive and affective research fields. Many
works have attempted to make it an achievable task, and FER research has produced
several models and different FER databases together with their annotations. The
successes recorded so far in the literature concern FER models that predict the basic
emotion from a facial expression image. Little consideration is given to other aspects
of FER research: intensity estimation of the emotion, facial expression ambiguity, and
label inconsistency and correlation among labels. This section presents the research
diversity in FER, categorised by machine learning problem definition: single-label
learning (SLL), SLL extensions (FER and intensity estimation), multi-label learning
(MLL), and label distribution learning (LDL). The trend in FER approaches to emotion
recognition is pictorially presented in Figure 2. Table 2 presents the categories of
emotion recognition research in FER with their associated limitations.
C. MULTILABEL LEARNING:
AUs at different affective states are triggered in the same region of the face.
GLMM used the features extracted for different expressions in the same region
to classify them as zero or non-zero, making it possible for a group to contain
different expressions. The global solution of the model was achieved by a
maximum-margin hinge loss function. GLMM was later enhanced to Adaptive Group
Lasso Regression [74] to assign a continuous value to the distribution of
expressions present in a non-zero group. GLMM shows superior performance
compared with some existing ML methods in the experiment conducted on s-JAFFE.
Most FER databases do not come with distribution scores; to apply LDL to these data
sets, methods are needed that recover or relabel the data with distribution scores.
The few techniques that consider this challenge include label enhancement based on
fuzzy clustering, which employs C-means clustering to cluster feature vectors and
iteratively minimises the objective function to obtain a label distribution from
logical labels.
CHAPTER-4
PREVIOUS RESEARCH RELATED WORK
In this second application area, interest focuses on procedures for extracting image
information in a form suitable for computer processing. Examples include automatic
character recognition, industrial machine vision for product assembly and inspection,
military reconnaissance, and automatic processing of fingerprints.
Image: An image refers to a 2D light intensity function f(x, y), where (x, y) denotes
spatial coordinates and the value of f at any point (x, y) is proportional to the
brightness or gray level of the image at that point. A digital image is an image
f(x, y) that has been discretized in both spatial coordinates and brightness. The
elements of such a digital array are called image elements or pixels.
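The definition can be made concrete: a digital image is simply a 2D array of sampled brightness values, and pixel access is array indexing. A small illustrative example:

```python
import numpy as np

# A digital image: f(x, y) discretized into a 4x4 array of 8-bit gray levels.
f = np.array([[  0,  64, 128, 255],
              [ 32,  96, 160, 224],
              [ 16,  80, 144, 208],
              [  8,  72, 136, 200]], dtype=np.uint8)

print(f.shape)   # spatial resolution: (4, 4) -> 4 rows x 4 columns
print(f[1, 2])   # gray level at spatial coordinate (1, 2): 160
print(f.max())   # brightest pixel: 255
```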
2. Image pre-processing: to improve the image in ways that increase the chances of
success of the other processes.
3. Image segmentation: to partition an input image into its constituent parts or
objects.
4. Image representation: to convert the input data to a form suitable for computer
processing.
A digital image processing system contains the following blocks, as shown in the
figure:
Fig 3.3: Elements of digital image
1. Acquisition
2. Storage
3. Processing
4. Communication
5. Display
Machine learning has been here for a while; there are a lot of open-source libraries
like TensorFlow where you can find many pre-trained models and build cool things on
top of them, without starting from scratch. What we are trying to achieve here falls
under image classification, where our machine learning model has to classify the
faces in the images amongst the recognized people.
How it works:
There are four networks running behind the curtains. They all work in a cascaded
manner, so the output of the first network is the starting point of the second one,
and so on.
Face detection
Facial landmark detection
Face side detection
Embedding generation
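The cascade above can be sketched as a chain of functions, each consuming the previous stage's output. The function bodies below are illustrative placeholders, not the project's actual networks or API:

```python
# Each stage stands in for a trained network; names and return values
# are illustrative only.

def detect_faces(image):
    # Stage 1: return bounding boxes of candidate faces.
    return [{"box": (10, 10, 48, 48)}]

def detect_landmarks(image, face):
    # Stage 2: locate fiducial points inside each detected box.
    face["landmarks"] = {"left_eye": (22, 25), "right_eye": (38, 25)}
    return face

def estimate_side(face):
    # Stage 3: decide which side of the face is visible, using eye spacing.
    lx = face["landmarks"]["left_eye"][0]
    rx = face["landmarks"]["right_eye"][0]
    face["side"] = "frontal" if abs(lx - rx) > 10 else "profile"
    return face

def embed(face):
    # Stage 4: produce an embedding vector for recognition.
    face["embedding"] = [0.1, 0.4, 0.2]  # placeholder vector
    return face

def pipeline(image):
    # Cascade: output of each stage feeds the next.
    return [embed(estimate_side(detect_landmarks(image, f)))
            for f in detect_faces(image)]

results = pipeline(image=None)
print(results[0]["side"])            # "frontal"
print(len(results[0]["embedding"]))  # 3
```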
Installation:
We used only a couple of libraries, but for your convenience we also created a
'requirements.txt' file. We purposely didn't add the TensorFlow installation to the
txt file, because it's up to you to choose your preferred version (GPU, CPU, etc.).
Moreover, TensorFlow installation can sometimes be as easy as a 'pip install'.
Database embeddings creation:
Recognizing facial expressions would help systems to detect whether people are happy
or sad, as a human being can. This would allow software and AI systems to provide an
even better experience to humans in various applications. From detecting probable
suicides and stopping them to playing mood-based music, there is a wide variety of
applications where emotion detection or mood detection can play a vital role in AI
applications.
The system works on a CNN (convolutional neural network) to extract the physiological
signals and make a prediction. The results are drawn by scanning the person's image
through a camera and then correlating it with a training dataset to predict one's
emotional state.
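The feature extraction inside a CNN rests on the convolution operation. A minimal numpy sketch (a hand-rolled loop, not the project's trained model) shows how a small kernel responds to structure such as an edge:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (strictly, cross-correlation, as in most CNN
    libraries): slide the kernel over the image and take dot products."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge kernel applied to an image with a sharp vertical boundary.
img = np.zeros((5, 5))
img[:, 3:] = 1.0                       # right side bright, left side dark
edge_kernel = np.array([[-1.0, 1.0]])  # responds to left-to-right increases
response = conv2d(img, edge_kernel)
print(response.shape)   # (5, 4)
print(response[0])      # strongest response in the boundary column
```

A trained CNN learns many such kernels automatically; this loop only makes the underlying arithmetic visible.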
CHAPTER-5
IMPLEMENTATION
The problem of face recognition is all about face detection. This is a fact that seems
quite bizarre to new researchers in this area. However, before face recognition is
possible, one must be able to reliably find a face and its landmarks. This is
essentially a segmentation problem and in practical systems, most of the effort goes
into solving this task. In fact the actual recognition based on features extracted from
these facial landmarks is only a minor last step. There are two types of face
detection problems: 1) Face detection in images and 2) Real-time face detection.
Most face detection systems attempt to extract a fraction of the whole face, thereby
eliminating most of the background and other areas of an individual's head, such as
hair, that are not necessary for the face recognition task. With static images, this
is often done by running a window across the image. The face detection system then
judges if a face is present inside the window (Brunelli and Poggio, 1993).
Unfortunately, with static images there is a very large search space of possible
locations of a face in an image.
Real-time face detection involves detection of a face from a series of frames from a
video capturing device. While the hardware requirements for such a system are far
more stringent, from a computer vision standpoint real-time face detection is
actually a far simpler process than detecting a face in a static image. This is
because, unlike most of our surrounding environment, people are continually moving:
we walk around, blink, fidget, wave our hands about, etc.
Fig. 5.2.1: Frame 1 from camera. Fig. 5.2.2: Frame 2 from camera.
Since in real-time face detection the system is presented with a series of frames in
which to detect a face, spatio-temporal filtering (finding the difference between
subsequent frames) can identify the area of the frame that has changed and detect the
individual (Wang and Adelson, 1994; Adelson and Bergen, 1986). Furthermore, as seen
in the figure, exact face locations can easily be identified by using a few simple
rules, such as: 1) the head is the small blob above a larger blob, the body; 2) head
motion must be reasonably slow and contiguous, as heads won't jump around erratically
(Turk and Pentland, 1991a, 1991b).
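The frame-differencing idea can be shown directly with numpy: subtracting subsequent frames and thresholding yields a mask of moving pixels. The frames and blob below are synthetic stand-ins for camera input:

```python
import numpy as np

def motion_mask(frame_prev, frame_next, threshold=20):
    """Spatio-temporal filtering by frame differencing: pixels whose gray
    level changed by more than `threshold` between frames are flagged as
    moving (e.g. a blinking, fidgeting person)."""
    diff = np.abs(frame_next.astype(np.int16) - frame_prev.astype(np.int16))
    return diff > threshold

# Two synthetic frames: a bright 10x10 "head" blob moves 5 pixels right.
f1 = np.zeros((40, 40), dtype=np.uint8)
f2 = np.zeros((40, 40), dtype=np.uint8)
f1[5:15, 10:20] = 200
f2[5:15, 15:25] = 200
mask = motion_mask(f1, f2)
rows, cols = np.nonzero(mask)
print(rows.min(), rows.max())  # 5 14 -- the changed region spans the head rows
print(mask.sum())              # 100 pixels changed (vacated strip + entered strip)
```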
Fig 6.1: Geometrical features (white) which could be used for face recognition
The advantage of using geometrical features as a basis for face recognition is that
recognition is possible even at very low resolutions and with noisy images (images
with many disorderly pixel intensities). Although the face cannot be viewed in detail
its overall geometrical configuration can be extracted for face recognition. The
technique's main disadvantage is that automated extraction of the facial geometrical
features is very hard. Automated geometrical-feature-extraction-based recognition is
also very sensitive to the scaling and rotation of a face in the image plane (Brunelli
and Poggio, 1993). This is apparent when we examine Kanade's (1973) results, where he
reported a recognition rate of between 45 and 75% with a database of only 20 people.
However, if these features are extracted manually, as in Goldstein et al. (1971) and
Kaya and Kobayashi (1972), satisfactory results may be obtained.
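Manual extraction can be illustrated with a toy sketch: given hand-marked landmark coordinates (the values below are hypothetical), a vector of inter-landmark distances normalised by eye spacing can be compared between faces. This illustrates the geometrical-feature idea only; it is not any of the cited systems:

```python
import numpy as np

def geometric_features(landmarks):
    """Build a feature vector of inter-landmark distances (eye spacing,
    eyes-to-nose, eyes-to-mouth), normalised by eye spacing so the
    vector is less sensitive to image scale."""
    le, re, nose, mouth = (np.array(landmarks[k], dtype=float)
                           for k in ("left_eye", "right_eye", "nose", "mouth"))
    eye_dist = np.linalg.norm(re - le)
    mid_eyes = (le + re) / 2
    return np.array([1.0,  # eye spacing normalised to itself
                     np.linalg.norm(nose - mid_eyes) / eye_dist,
                     np.linalg.norm(mouth - mid_eyes) / eye_dist])

def match(query, known, threshold=0.1):
    """Recognise the query face if its feature vector is close to a known one."""
    return np.linalg.norm(query - known) < threshold

a = geometric_features({"left_eye": (30, 40), "right_eye": (70, 40),
                        "nose": (50, 65), "mouth": (50, 85)})
b = geometric_features({"left_eye": (60, 80), "right_eye": (140, 80),
                        "nose": (100, 130), "mouth": (100, 170)})  # same face, 2x scale
print(match(a, b))   # True -- the scale-normalised features agree
```

Note that this particular normalisation handles scale but not in-plane rotation, which is exactly the sensitivity the paragraph above describes.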
This is similar to the template matching technique used in face detection, except
that here we are not trying to classify an image as 'face' or 'non-face' but are
trying to recognize a face.
The following problem scope for this project was arrived at after reviewing the
literature on face detection and face recognition, and determining possible
real-world situations where such systems would be of use. The following system
requirements were identified:
1. A system to detect frontal-view faces in static images.
2. A system to recognize a given frontal-view face.
3. Only emotionless, frontal-view faces will be presented to the face detection and
recognition systems.
4. All implemented systems must display a high degree of lighting invariance.
5. All systems must possess near real-time performance.
6. Both fully automated and manual face detection must be supported.
7. Frontal-view face recognition will be realised using only a single known image.
8. Automated face detection and recognition systems should be combined into a fully
automated face detection and recognition system. The face recognition sub-system
must display a slight degree of invariance to scaling and rotation errors in the
segmented image extracted by the face detection sub-system.
6.3 BRIEF OUTLINE OF THE IMPLEMENTED SYSTEM:
2.3 Emotions
3. Cartoon faces
Project category:
Deep Learning (Subset of ML)
IN MODEL BUILDING:
IMPORTING LIBRARIES:
DISPLAYING IMAGES:
The CNN model, trained for 10 epochs, gives an accuracy of 66.7%. It has been
observed that the designed CNN model can detect facial expressions such as happy,
sad, and surprise, subject to 48x48 gray-scale images only.
Accuracy:
To study the accuracy of the model, the metric is set to accuracy. The loss is set
to categorical cross-entropy, as the data has to be classified into only the 7
defined categories and each image can belong to one classification type only.
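The loss described above has a simple closed form: for a one-hot label over the 7 categories, categorical cross-entropy is minus the log of the probability the model assigns to the true class. A numpy sketch (the 7 class names shown are the common FER set, assumed here rather than taken from the project):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for single-label classification: -log of the probability
    assigned to the true class (y_true is a one-hot vector)."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred))

classes = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]
y_true = np.zeros(7)
y_true[3] = 1.0                        # true label: "happy" (one-hot)
y_pred = np.array([0.05, 0.02, 0.03, 0.70, 0.08, 0.07, 0.05])
loss = categorical_crossentropy(y_true, y_pred)
print(round(loss, 4))   # -ln(0.70) = 0.3567
```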
CHAPTER 8
EXPERIMENTAL SETUP
CHAPTER 9
RESULT ANALYSIS
To analyze the performance of the algorithm, the extended Cohn-Kanade expression
dataset was used initially. The dataset had only 486 sequences with 97 posers,
limiting accuracy to about 45% at maximum. To overcome this low efficiency, multiple
datasets were downloaded from the Internet, and the authors' own pictures at
different expressions were also included. As the number of images in the dataset
increased, the accuracy also increased. We kept 70% of the 10K dataset images as
training images and 30% as testing images. In all, 25 iterations were carried out,
with a different 70% training set each time. Finally, the error bar was computed as
the standard deviation. The figure shows the optimization of the number of layers
for the CNN. For simplicity, we kept the number of layers and the number of filters
the same for the background removal CNN (the first-part CNN) and the face feature
extraction CNN (the second-part CNN). In this study, we varied the number of layers
from 1 to 8 and found that maximum accuracy was obtained at around 4. This was not
very intuitive, as one might assume the number of layers is directly proportional to
accuracy and inversely proportional to execution time. Hence, due to the maximum
accuracy obtained with 4 layers, we selected the number of layers to be 4. The
execution time increased with the number of layers and did not add significant value
to our study, hence it is not reported in the current manuscript. The figure shows
the number-of-filters optimization for both layers.
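The evaluation protocol above, a 70/30 split repeated over 25 iterations with the standard deviation as the error bar, can be sketched as follows. The per-iteration accuracies are simulated stand-ins, since the trained model itself is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

def split_70_30(n_samples, rng):
    """Shuffle sample indices and split 70% training / 30% testing."""
    idx = rng.permutation(n_samples)
    cut = n_samples * 7 // 10  # 70% cut-off via integer arithmetic
    return idx[:cut], idx[cut:]

# 25 iterations, each with a fresh 70/30 split of the 10K images.
accuracies = []
for _ in range(25):
    train_idx, test_idx = split_70_30(10_000, rng)
    assert len(set(train_idx) & set(test_idx)) == 0   # splits are disjoint
    accuracies.append(0.96 + rng.normal(0, 0.005))    # simulated test accuracy
mean_acc = np.mean(accuracies)
error_bar = np.std(accuracies)   # error bar = std over the 25 iterations
print(len(train_idx), len(test_idx))   # 7000 3000
```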
Also, a vast amount of further work could be done if each layer were fed a different
number of filters. This could be automated using servers. Due to the computational
power limitations of the authors, we did not carry out this study, but it would be
highly appreciated if other researchers came up with better numbers than 4 (layers)
and 4 (filters) and increased the accuracy beyond the 96% we could achieve. The
figure below shows regular front-facing cases with angry and surprise emotions,
which the algorithm could easily detect.
CHAPTER-10
CONCLUSION
This work can be further studied and researched to find more accurate models using
different algorithms and image processing techniques. With more people entering this
research field, there is a chance that a fully automated facial emotion recognition
system approaching 100% accuracy can be brought to market. Such models will help
researchers build efficient artificial intelligence: one can't think of a humanoid
without the ability to know what a person feels in order to help or serve them.
• Beymer, D. and Poggio, T. (1995) Face Recognition From One Example View, A.I.
Memo No. 1536, C.B.C.L. Paper No. 121. MIT
• Craw, I., Ellis, H., and Lishman, J.R. (1987). Automatic extraction of face features.
Pattern Recognition Letters, 5:183-187, February.
• Deffenbacher K.A., Johanson J., and O'Toole A.J. (1998) Facial ageing,
attractiveness, and distinctiveness. Perception. 27(10):1233-1243
• Gauthier, I., Behrmann, M. and Tarr, M. (1999). Can face recognition really be
dissociated from object recognition? Journal of Cognitive Neuroscience, in press.
• Goldstein, A.J., Harmon, L.D., and Lesk, A.B. (1971). Identification of human
faces. In Proc. IEEE, Vol. 59, page 748
• de Haan, M., Johnson, M.H. and Maurer D. (1998) Recognition of individual faces
and average face prototypes by 1- and 3- month-old infants. Centre for Brain and
Cognitive
• Haralick, R.M. and Shapiro, L.G. (1992) Computer and Robot Vision, Volume I.
Addison-Wesley
• Haxby, J.V., Ungerleider, L.G., Horwitz, B., Maisog, J.M., Rapoport, S.I., and
Grady, C.L. (1996). Face encoding and recognition in the human brain. Proc. Nat.
Acad. Sci. 93: 922-927.
• Jang, J., Sun, C., and Mizutani, E. (1997) Neuro-Fuzzy and Soft Computing.
Prentice Hall.
• Johnson, R.A., and Wichern, D.W. (1992) Applied Multivariate Statistical Analysis.
Prentice Hall. p356-39