
A Real-Time Face Detection Method Based on Blink Detection
SEMINAR REPORT
In partial fulfilment of the requirements for the award of

Bachelor of Technology
In
Computer Science and Engineering
Of
A P J Abdul Kalam Technological University
NOVEMBER 2023
Submitted By

FEBA T RAJEEV (CEK20CS018)
Under the Guidance of
JINU L

Department of Computer Science & Engineering


College of Engineering Kottarakkara
Kollam 691 531. Phone: 0474-2453300
http://www.cekottarakkara.ac.in
cekottarakkara@ihrd.ac.in
DECLARATION
I, FEBA T RAJEEV (CEK20CS018), declare that the seminar report “A Real-Time Face Detection Method Based on Blink Detection” is the result of original work done by me and that, to the best of my knowledge, a similar work has not been submitted to COLLEGE OF ENGINEERING KOTTARAKKARA for the fulfillment of the requirements of any course of study. This seminar report is submitted in partial fulfillment of the requirements for the B.Tech degree in Computer Science and Engineering.

01-11-2023 FEBA T RAJEEV


College of Engineering Kottarakkara
Dept. of Computer Science & Engineering

Certificate

This is to certify that this report titled “A Real-Time Face Detection Method Based on Blink Detection” is a bonafide record of the CSQ413 SEMINAR done by CEK20CS018, FEBA T RAJEEV, Seventh Semester B.Tech. Computer Science & Engineering student, under our guidance and supervision, in partial fulfillment of the requirements for the award of the degree, B.Tech. Computer Science and Engineering of A P J Abdul Kalam Technological University during the academic year 2023-2024.

Guide
Mr. Jinu L.
Assistant Professor
Computer Science & Engineering

Coordinator
Mrs. Indu P K
Assistant Professor
Computer Science & Engineering

Head of the Department
Mr. Manoj Ray D
Assistant Professor
Computer Science & Engineering
Acknowledgments
We wish to thank Almighty God, to whom we are greatly indebted for the past, present and future of our lives and for making this venture a success.

I place on record my sincere gratitude to our principal, Dr. BHADRAN V, College of Engineering Kottarakkara, who gave us this opportunity and the encouragement to complete the project with flying colors.

I am deeply grateful to the HOD, Dr. Manoj Ray, Dept. of CSE, for his invaluable guidance and support throughout the preparation of this seminar.

A sincere expression of gratitude goes to our teacher, Mrs. INDU P K, Dept. of CSE, for her blessings and for her help in improving the clarity of the subject.

We remain deeply grateful to our teacher, Mrs. GEETHU RAJU, Dept. of CSE, for her immense motivation, support and patience throughout this work. I am pleased to express my thanks to all teaching and non-teaching staff members of our college for their technical support and prayers. We owe heartfelt thanks to our parents and entire family for their unconditional support before and after the successful completion of this course. Furthermore, wholehearted thanks go to all my friends for their comments, care and help, which became the soul of this work.
Abstract

Face anti-spoofing refers to a computer determining whether a detected face is a real face or a forged one. In user authentication scenarios, photo fraud attacks, in which an illegitimate user logs into the system with a picture of a legitimate user, are easy to carry out. To address this problem and the influence of illumination in real-time video face recognition, this paper proposes a real-time face detection method based on blink detection. The method first extracts image texture features with the LBP algorithm, which reduces the effect of illumination changes to a certain extent. The extracted features are then input into a ResNet network to which an attention mechanism is added to enhance facial feature extraction. Meanwhile, the BiLSTM method is used to extract the temporal characteristics of images taken from different angles or at different times to obtain more facial details. In addition, local and global features are fused by SPP pooling, which enriches the expressive ability of the feature maps and improves detection accuracy. Finally, the eye aspect ratio (EAR) is calculated with face key point detection to achieve face anti-spoofing, so that real-time face recognition resistant to fraud is realised. The experimental results show that the proposed algorithm achieves good accuracy on the NUAA, CASIA-SURF and CASIA-FASD datasets, reaching 99.48%, 98.65% and 99.17%, respectively.
Contents

1 Introduction
1.1 Overview of Existing System
1.2 Feature Extraction and Enhancement
1.3 Fusion of Local and Global Features
1.4 Face Anti-Spoofing and Authentication

2 Related Works
2.1 Residual Neural Network
2.2 Attention Mechanism
2.3 Bi-Directional Long-Short Term Memory
2.4 Spatial Pyramid Pooling
2.5 Local Binary Pattern
2.6 Face Key Point Technology

3 Architecture/Design
3.1 Experimental Environment And Preprocessing
3.2 Datasets
3.3 Evaluation Indicators
3.4 Algorithm Model

4 Experimental Result And Analysis
4.1 Experimental Results
4.2 Comparison With Other Algorithms
4.2.1 Real-time Face Recognition
4.2.2 Anti-Spoofing
4.3 Ablation Experiment
4.4 Single-Sheet Model Recognition Results

5 Conclusion

References

Chapter 1

Introduction

Introduction of Topic

1.1 Overview of Existing System


The existing systems for face recognition and anti-spoofing have made progress in identity verification but still face limitations, particularly in handling variations in lighting and in countering biometric spoofing. These systems also struggle with large changes in face angle and have difficulty maintaining high recognition accuracy.
The proposed LBASResnet50 method aims to overcome these challenges by combining Local Binary Pattern (LBP) for illumination-invariant feature extraction, an attention mechanism with ResNet50 for enhanced facial feature selection, and Bidirectional Long Short-Term Memory (BiLSTM) to capture temporal features. Additionally, Spatial Pyramid Pooling (SPP) is employed to fuse local and global features, improving the network’s generalization capability. Anti-spoofing is achieved by calculating the Eye Aspect Ratio (EAR) from face key point detection.
The method achieves high accuracy on the NUAA, CASIA-SURF and CASIA-FASD datasets. It offers a comprehensive solution for real-time face detection that addresses challenges related to lighting, posture, background and fraudulent attempts while maintaining high recognition accuracy and providing anti-spoofing protection.

1.2 Feature Extraction and Enhancement


This section focuses on the initial steps of the proposed real-time face de-
tection method. It begins by discussing the use of Local Binary Pattern
(LBP) to extract texture features from facial images. LBP is employed to
mitigate the effects of changing illumination conditions, which is a common
challenge in face recognition. This step is crucial because it helps the system
achieve a degree of illumination invariance, making it more robust to varying
lighting conditions. The section then delves into the use of the ResNet50
architecture as the base network for facial feature extraction. To enhance
feature extraction, an attention mechanism is integrated with ResNet50.
This mechanism highlights important facial features, contributing to more
accurate and reliable feature selection. Additionally, Bidirectional Long
Short-Term Memory (BiLSTM) is introduced to capture temporal features
from images taken at different angles or times. This temporal information
enhances the accuracy of feature selection and adds an extra layer of detail
to the recognition process.

1.3 Fusion of Local and Global Features


In this section, the paper explores the fusion of local and global features. It
introduces Spatial Pyramid Pooling (SPP) technology, which fully considers
both local and global information in the facial images. By combining these
features, the system’s network gains improved generalization abilities, mak-
ing it more versatile and adaptable to a wide range of scenarios. Feature
fusion is essential for achieving a holistic understanding of the facial data,
leading to better recognition outcomes.
The use of SPP pooling is pivotal in enriching the expression capability
of the feature map. By considering various scales of information, the system
can better understand the intricacies of the facial features. This section
emphasizes the significance of combining local and global features and how
it contributes to enhancing the model’s overall performance.

1.4 Face Anti-Spoofing and Authentication


This final section deals with the critical issue of face anti-spoofing and au-
thentication. The paper describes the use of eye-related features, specifi-
cally the Eye Aspect Ratio (EAR) value, which is calculated through face
key point detection technology. This technology helps distinguish between a
live human face and a spoofed or fraudulent attempt. If the system detects
a living body, it proceeds with face recognition, ensuring the recognition
results are effective.
The section also presents the high accuracy achieved by the proposed model, LBASResnet50, on the NUAA, CASIA-SURF and CASIA-FASD datasets, which demonstrates the effectiveness of the model in real-world scenarios. Additionally, the section
outlines the key contributions of the proposed model, emphasizing the signif-
icance of each component and its impact on the overall performance. Over-
all, this section highlights the achievements and strengths of the developed
real-time face detection method in addressing authentication challenges and
anti-spoofing measures.

Chapter 2

Related Works

2.1 Residual Neural Network


With the continuous advancement of Convolutional Neural Networks (CNNs),
researchers have been progressively adding more convolutional layers to
these networks to extract deeper and more intricate features from images.
However, as the depth of these networks increases, they encounter a critical
challenge: the vanishing gradient problem. When the number of network
layers becomes exceedingly deep, the gradients necessary for training the
network diminish, leading to a drop in network accuracy. To address this
limitation, the Residual Neural Network (ResNet) architecture was intro-
duced.
At the heart of ResNet lies the concept of residual units. These units incorporate skip (shortcut) connections, illustrated in Figure 1, which let information flow directly across layers and bypass layers that may not be contributing to the learning process. The final output of a residual unit is the sum of its input and the output of its convolutional layers: if the input to the unit is denoted x, the output is H(x) = F(x) + x, where F(x) is the residual mapping computed by the stacked convolution operations. Because the identity path carries both the signal and its gradient directly through the unit, the network can continue to learn nonlinear transformations as it grows deeper, which mitigates the vanishing gradient problem.
ResNet networks use two primary residual structures: the Building Block, used in shallower networks such as ResNet18 and ResNet34, and the Bottleneck, designed for deeper networks. The Bottleneck structure, shown in Figure 2, is used by ResNet50, the network employed in the method proposed in this paper. A Bottleneck stacks a 1 × 1 convolution that reduces the number of channels, a 3 × 3 convolution that operates on this reduced representation, and a 1 × 1 convolution that restores the channel count. This design reduces computational complexity while preserving the network’s expressive power, allowing ResNet50 to capture intricate and nuanced features from images even with a large number of layers.
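The bottleneck residual unit described above can be sketched in NumPy as a forward pass. This is a minimal illustration only, assuming channels-last arrays of shape (H, W, C), stride 1 and no batch normalization (real ResNet50 bottlenecks also use batch normalization and, in some units, a projection shortcut); the function names are hypothetical and not taken from the paper.

```python
import numpy as np


def conv1x1(x, w):
    """1 x 1 convolution: a per-pixel matrix product over channels.
    x: (H, W, C_in), w: (C_in, C_out) -> (H, W, C_out)."""
    return x @ w


def conv3x3(x, w):
    """3 x 3 convolution (cross-correlation), stride 1, zero 'same' padding.
    x: (H, W, C_in), w: (3, 3, C_in, C_out) -> (H, W, C_out)."""
    h, wd, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[3]))
    for i in range(3):
        for j in range(3):
            # shift the padded map by (i, j) and apply that kernel tap to every pixel
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]
    return out


def relu(x):
    return np.maximum(x, 0.0)


def bottleneck(x, w_reduce, w_mid, w_expand):
    """Bottleneck residual unit: H(x) = relu(F(x) + x)."""
    f = relu(conv1x1(x, w_reduce))  # 1 x 1: reduce channels, e.g. 256 -> 64
    f = relu(conv3x3(f, w_mid))     # 3 x 3: spatial features at the reduced width
    f = conv1x1(f, w_expand)        # 1 x 1: restore channels, e.g. 64 -> 256
    return relu(f + x)              # identity shortcut, then the final ReLU


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 256))
y = bottleneck(x,
               rng.normal(scale=0.05, size=(256, 64)),
               rng.normal(scale=0.05, size=(3, 3, 64, 64)),
               rng.normal(scale=0.05, size=(64, 256)))
print(y.shape)  # (8, 8, 256)
```

With all weights set to zero, F(x) = 0 and the unit reduces to relu(x), which shows why the identity shortcut makes extra depth cheap: a unit can always fall back to passing its input through.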

2.2 Attention Mechanism


The concept of the attention mechanism, as described in reference [8], has its roots in the study of human vision and cognitive science. Human perception is limited by information processing bottlenecks, which leads to a selective focus on specific elements of visual information while others are disregarded. Deep learning replicates this idea through the attention mechanism, which enables a neural network to determine which portions of the input data are critical and should receive more processing resources. This selective focus is achieved through weight vectors that guide the network’s attention.
In this paper, the authors adopt the Squeeze-and-Excitation Network (SENet), which employs a channel attention mechanism to capture the information most relevant to facial features. The SENet structure, illustrated in Figure 3, comprises two pivotal operations: Squeeze and Excitation.
The Squeeze operation (Fsq) performs global average pooling on the input features, compressing the original features of dimensions H × W × C into a compact 1 × 1 × C representation. This reduction condenses the spatial information of each channel into a single value, giving one descriptor per channel for further processing.
The Excitation operation (Fex) applies two fully connected layers to the result of the Squeeze operation: the first reduces the channel dimension by a reduction ratio and is followed by a ReLU, and the second restores it to C. A Sigmoid activation function then produces a weight between 0 and 1 for each channel. These weights are used in a scaling operation (Fscale) that multiplies each channel of the input features by its weight, allowing the network to emphasize or de-emphasize specific features according to their importance.
The SENet’s attention mechanism facilitates the network’s ability to dy-
namically focus on vital facial features. By adaptively weighting different
channels, it ensures that the most relevant information is captured and uti-
lized for subsequent processing, leading to improved feature selection and,
ultimately, enhancing the system’s performance in face detection and recog-
nition. This approach is integral to the real-time face detection method’s
effectiveness in this paper, as it enables the network to pay attention to the
most salient aspects of facial data, even amidst varying lighting conditions
and challenging poses.
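The Squeeze, Excitation and Scale steps can be illustrated with a small NumPy sketch. It assumes a single feature map of shape (H, W, C) and a reduction ratio r for the first fully connected layer (r = 16 in the original SENet design; the value used in LBASResnet50 is not stated here), and it omits the bias terms of the fully connected layers for brevity. The function name is hypothetical.

```python
import numpy as np


def se_block(x, w1, w2):
    """Squeeze-and-Excitation on x of shape (H, W, C).
    w1: (C, C // r) first FC layer, w2: (C // r, C) second FC layer (biases omitted)."""
    z = x.mean(axis=(0, 1))               # Squeeze (Fsq): global average pool -> (C,)
    s = np.maximum(z @ w1, 0.0)           # Excitation (Fex): FC 1 + ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(s @ w2)))   # FC 2 + Sigmoid -> one weight in (0, 1) per channel
    return x * s                          # Scale (Fscale): reweight each channel


rng = np.random.default_rng(0)
c, r = 64, 16
x = rng.normal(size=(7, 7, c))
out = se_block(x, rng.normal(size=(c, c // r)), rng.normal(size=(c // r, c)))
print(out.shape)  # (7, 7, 64)
```

Because the weights come from a global summary of each channel, the block is cheap: it adds only two small matrix products per feature map, which is why it can be inserted into ResNet50 without changing the shapes of the surrounding layers.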

2.3 Bi-Directional Long-Short Term Memory


LSTM, or Long Short-Term Memory [9], is a variant of Recurrent Neural Networks (RNNs) specially designed to model time series data effectively. BiLSTM (Bi-directional Long Short-Term Memory) extends it with two LSTMs: a forward LSTM that reads the sequence from start to end and a backward LSTM that reads it from end to start, so that each output combines past and future context. The network structure of BiLSTM is shown in Figure 4, with w1 to w6 denoting shared weights.
This research paper discusses the integration of BiLSTM to enhance the
extraction of bidirectional sequence features from images. By doing so, it in-
creases the volume of information accessible to the network model, enhances
the algorithm’s contextual awareness, and ultimately leads to improved ac-
curacy in video face recognition.
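For reference, the following NumPy sketch shows what a BiLSTM computes: one LSTM reads the sequence forward, a second LSTM reads it backward, and their hidden states are concatenated at each time step. It is a generic single-layer formulation; the gate order [input, forget, cell candidate, output] and the helper names are illustrative choices, and how the paper arranges ResNet50 feature maps into a sequence is not specified here, so the input is simply a (T, D) array of feature vectors.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm(seq, W, U, b):
    """Single-layer LSTM over seq of shape (T, D); returns hidden states (T, H).
    W: (D, 4H), U: (H, 4H), b: (4H,), gates stacked as [input, forget, cell, output]."""
    hidden = U.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    states = []
    for x_t in seq:
        i, f, g, o = np.split(x_t @ W + h @ U + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # forget old memory, write new content
        h = sigmoid(o) * np.tanh(c)                   # expose part of the memory as output
        states.append(h)
    return np.stack(states)


def bilstm(seq, forward_params, backward_params):
    """Concatenate a forward pass and a time-reversed backward pass: (T, 2H)."""
    forward = lstm(seq, *forward_params)
    backward = lstm(seq[::-1], *backward_params)[::-1]  # re-align to the original time order
    return np.concatenate([forward, backward], axis=1)


def init_params(rng, d, h):
    """Hypothetical small random initialisation for one direction."""
    return (rng.normal(scale=0.1, size=(d, 4 * h)),
            rng.normal(scale=0.1, size=(h, 4 * h)),
            np.zeros(4 * h))


rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 32))  # e.g. 5 time steps of 32-dim features
out = bilstm(seq, init_params(rng, 32, 16), init_params(rng, 32, 16))
print(out.shape)  # (5, 32)
```

Each row of the output therefore carries 16 values summarising the frames up to that step and 16 values summarising the frames from that step onward.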

2.4 Spatial Pyramid Pooling


Because the fully connected layers at the end of a typical CNN expect an input of fixed length, CNNs generally require input images of a fixed size. Real images, however, come in different sizes and resolutions. If a batch of such images has to be cropped to a common size, problems can arise: for example, when overlapping areas are cropped, some regions are repeated, which implicitly increases their weight. This paper uses Spatial Pyramid Pooling (SPP) to remove the requirement that the CNN input size be fixed, so that the input image can have an arbitrary size and aspect ratio. As shown in Figure 5, SPP pools the input feature map at three pyramid levels, dividing it into 4 × 4, 2 × 2 and 1 × 1 grids of bins and pooling within each bin. The pooled outputs are flattened into one-dimensional arrays, these arrays are concatenated, and the resulting fixed-length vector is sent to the fully connected layer.
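The pyramid pooling step can be sketched in NumPy as follows. It assumes max pooling inside each bin (as in the original SPP-net formulation) and a channels-last feature map; bin boundaries use floor for the start and ceil for the end so that the 4 × 4, 2 × 2 and 1 × 1 grids always cover the whole map, whatever its height and width. The function name is hypothetical.

```python
import numpy as np


def spp(feature_map, levels=(4, 2, 1)):
    """Spatial pyramid pooling of an (H, W, C) map into a fixed-length vector.
    Output length is C * sum(n * n for n in levels), independent of H and W."""
    h, w, _ = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            r0, r1 = (i * h) // n, -(-(i + 1) * h // n)      # floor start, ceil end
            for j in range(n):
                c0, c1 = (j * w) // n, -(-(j + 1) * w // n)
                # max over the spatial extent of this bin, one value per channel
                pooled.append(feature_map[r0:r1, c0:c1].max(axis=(0, 1)))
    return np.concatenate(pooled)  # flatten and concatenate all pyramid levels


rng = np.random.default_rng(0)
a = spp(rng.normal(size=(13, 9, 8)))
b = spp(rng.normal(size=(7, 21, 8)))
print(a.shape, b.shape)  # (168,) (168,)
```

Both inputs produce 8 × (16 + 4 + 1) = 168 values, which is exactly the property that lets the fully connected layer accept feature maps of any size.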

2.5 Local Binary Pattern


Local Binary Pattern (LBP) is a widely used image processing technique that captures local texture information in images. LBP is valued for two key attributes: grayscale invariance, since its codes are unchanged by monotonic changes in brightness, and rotation invariance, which strictly holds for the rotation-invariant variant of the operator. In a typical LBP operation, a 3 × 3 window is applied to an image, with the center pixel serving as a threshold. The grayscale values of the eight neighboring pixels are compared with this threshold: when a neighboring pixel’s grayscale value is greater than or equal to that of the center pixel, it is marked as 1; otherwise, it is marked as 0. The resulting 8-bit binary number can be converted into a decimal value, giving 256 possible patterns.
LBP is instrumental in characterizing textures, patterns and details within an image, making it particularly useful in computer vision applications such as facial recognition, texture analysis and object detection. Its invariance properties enable it to work effectively across various lighting conditions and orientations.
The LBP value of the center pixel of the 3 × 3 window is a concise representation of the local texture in that region, which makes it a valuable tool for feature extraction in pattern recognition, image analysis and machine learning. It is computed as

$$\mathrm{LBP}(c) = \sum_{p=0}^{7} s\big(I(p) - I(c)\big)\, 2^{p}$$

where p denotes the p-th pixel in the window other than the center pixel, I(c) is the gray value of the center pixel, and I(p) is the gray value of the p-th pixel in the neighborhood. The function s(x) is defined as follows:

$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
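The basic LBP operator can be sketched in NumPy as follows. The neighbor ordering (clockwise from the top-left pixel, bit p = 0 to 7) is an assumption, since any fixed order gives an equivalent descriptor, and border pixels are handled by edge replication so that a 224 × 224 input yields a 224 × 224 texture image; the paper’s exact border handling is not stated. Only the basic 3 × 3 operator is shown, not the rotation-invariant variant.

```python
import numpy as np

# (dy, dx) offsets of the 8 neighbors, p = 0..7, clockwise from the top-left pixel
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp(gray):
    """Basic 3 x 3 LBP: code = sum_p s(I(p) - I(c)) * 2**p, with s(x) = 1 if x >= 0."""
    g = np.asarray(gray, dtype=np.int32)
    h, w = g.shape
    padded = np.pad(g, 1, mode="edge")   # replicate borders to keep the output size
    center = padded[1:h + 1, 1:w + 1]
    codes = np.zeros((h, w), dtype=np.int32)
    for p, (dy, dx) in enumerate(OFFSETS):
        neighbor = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        codes += (neighbor >= center).astype(np.int32) << p
    return codes.astype(np.uint8)        # 256 possible patterns: 0..255


patch = np.array([[1, 9, 1],
                  [9, 5, 9],
                  [1, 9, 1]])
print(lbp(patch)[1, 1])  # 170 (bits 1, 3, 5 and 7 are set)
```

Because only the sign of I(p) − I(c) matters, brightening or scaling the whole image (any monotonic change) leaves every code unchanged, which is the grayscale invariance the text relies on for illumination robustness.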

2.6 Face Key Point Technology


In this study, the authors used the 68-point facial landmark model from Dlib for face key point detection, producing the detailed face feature map depicted in Figure 6. Among the 68 feature points, the key focus was on the eyes. Each eye is represented by six feature points, as illustrated in Figure 7, numbered clockwise starting from the left corner of the eye.
The Eye Aspect Ratio (EAR) computed from these points plays a pivotal role in assessing eye state. When the eye is open, the EAR fluctuates within a fairly stable range; when the eye closes, the EAR drops rapidly, theoretically approaching zero. The EAR is therefore a vital indicator for eye state recognition. By monitoring EAR values over time, the system can distinguish between open and closed eyes, which is crucial for applications such as driver drowsiness detection and alertness monitoring in human-computer interaction systems. This approach reflects the paper’s emphasis on precise eye state analysis for its intended application.
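The EAR formula itself is not written out above. The definition commonly used with this six-point layout (from Soukupová and Čech’s blink detection work), for landmarks p1 to p6 numbered clockwise from the left corner as in Figure 7, is

$$\mathrm{EAR} = \frac{\lVert p_2 - p_6 \rVert + \lVert p_3 - p_5 \rVert}{2\,\lVert p_1 - p_4 \rVert}$$

The sketch below computes it and counts blinks over a sequence of per-frame EAR values. The closed-eye threshold (0.2) and the minimum number of consecutive closed frames (2) are illustrative assumptions, not values from the paper, and landmark extraction with Dlib is left out (in the 68-point model the two eyes are points 36 to 41 and 42 to 47, counting from zero).

```python
import math


def eye_aspect_ratio(eye):
    """eye: six (x, y) landmarks p1..p6, p1 = left corner, numbered clockwise."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)  # two eyelid-to-eyelid distances
    horizontal = math.dist(p1, p4)                    # corner-to-corner width
    return vertical / (2.0 * horizontal)


def count_blinks(ear_values, threshold=0.2, min_closed_frames=2):
    """Count closed-then-reopened episodes in a per-frame EAR sequence."""
    blinks, closed = 0, 0
    for ear in ear_values:
        if ear < threshold:
            closed += 1                 # eye currently closed
        else:
            if closed >= min_closed_frames:
                blinks += 1             # eye reopened after a long enough closure
            closed = 0
    return blinks


# image coordinates (y grows downward), so the upper eyelid points have y = -1
open_eye = [(0, 0), (1, -1), (2, -1), (3, 0), (2, 1), (1, 1)]
print(round(eye_aspect_ratio(open_eye), 3))                      # 0.667
print(count_blinks([0.31, 0.30, 0.12, 0.08, 0.29, 0.11, 0.30]))  # 1
```

Requiring several consecutive low-EAR frames filters out single-frame landmark noise; the short dip at the end of the example is therefore not counted as a blink.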

Chapter 3

Architecture/Design

3.1 Experimental Environment And Preprocessing
The experiments in this study were conducted on a computing platform
with an Intel i7-6000CPU, operating at 3.40 GHz, and 32GB of memory.
The researchers employed the Python programming language and utilized
NumPy for efficient matrix computations. The backend framework of choice
was TensorFlow, which was used to develop an improved ResNet neural
network.
Prior to training the model, the researchers performed essential data
preprocessing steps. They leveraged the built-in frontal face detector from
Dlib, a popular computer vision library, for face detection within the images.
Detected faces were subsequently adjusted and resized to a standardized
format of 224x224 pixels. This step ensured that all input images were
consistent in size and orientation, a crucial requirement for training deep
learning models effectively.
The original dataset was then divided into two subsets: a training set
and a test set. This division followed a common practice, splitting the data
with a ratio of 7:3, where 70 percent of the data was allocated for training
and the remaining 30 percent for testing. Moreover, the labels associated
with the images were categorized into two classes: real faces and fraudulent
faces. This binary classification approach is typically used in scenarios where
the objective is to distinguish between genuine facial images and fraudulent
or manipulated ones, such as in facial recognition or security applications.
The combination of these hardware specifications, software tools, and
data preprocessing techniques provided a robust foundation for the subse-
quent experiments in training and evaluating the improved ResNet network’s
performance, particularly in the context of facial image classification and
fraud detection.
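The 7:3 split with binary labels can be sketched as follows. This assumes a simple random split with a fixed seed; whether the paper stratified by class or by subject is not stated, and the label encoding (1 = real face, 0 = fraudulent face), the file names and the function name are hypothetical choices for illustration.

```python
import random


def split_dataset(samples, train_ratio=0.7, seed=42):
    """Shuffle (path, label) pairs and split them train_ratio : (1 - train_ratio)."""
    items = list(samples)
    random.Random(seed).shuffle(items)   # reproducible shuffle
    cut = int(round(len(items) * train_ratio))
    return items[:cut], items[cut:]


# hypothetical file names; label 1 = real face, 0 = fraudulent face
data = [(f"face_{k:03d}.jpg", k % 2) for k in range(100)]
train, test = split_dataset(data)
print(len(train), len(test))  # 70 30
```

If the same person appears in both subsets, test accuracy can be optimistic; splitting by subject identity instead of by image is a common alternative when subject IDs are available.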

3.2 Datasets
The research paper used three distinct datasets for experimental evaluation: the NUAA dataset, the CASIA-SURF dataset [14] and the CASIA-FASD dataset. Each of these datasets contributed to the paper’s comprehensive assessment of face recognition and anti-fraud techniques.

1. NUAA Dataset: The NUAA dataset is designed for the specific pur-
pose of detecting photo-printing fraud. It consists of images from 15
individuals, with 500 images per individual. These images have a reso-
lution of 640x480 pixels. Each individual has both real and fraudulent
face images in the dataset, making it suitable for assessing the perfor-
mance of anti-fraud techniques.

2. CASIA-SURF Dataset: CASIA-SURF is a diverse dataset containing several modalities, namely RGB, depth and infrared (IR) images, and captures individuals of different ages. It comprises 1,000 different subjects and 21,000 videos. The RGB images have a resolution of 1280x720, while the depth and IR images have a resolution of 640x480. Each subject is recorded in one genuine video and six attack videos. These attacks involve occluding areas such as the eyes, nose and mouth, making the dataset valuable for evaluating the robustness of facial recognition systems against spoofing attacks.

3. CASIA-FASD Dataset: CASIA-FASD is a face anti-fraud database covering both printed-photo and video attacks. Data were collected with cameras of three different qualities from 50 volunteers. Each volunteer contributed 12 videos captured under varying conditions, including different resolutions and lighting conditions. This dataset is a rich resource for testing anti-fraud techniques in both photo and video scenarios, reflecting real-world challenges in face recognition and anti-spoofing.

The CASIA-FASD dataset therefore has a total of 600 video recordings (50 volunteers × 12 videos). A sample picture is shown in Figure 10: the upper part is a real face, and the lower part is a fraudulent face.
By utilizing these diverse datasets, the paper aimed to provide a com-
prehensive evaluation of face recognition systems and their effectiveness in
handling fraud detection and anti-spoofing challenges. These datasets en-
compass various aspects of facial recognition, including photo attacks, video
attacks, and different image modalities, enabling a thorough assessment of
the proposed techniques in real-world scenarios.

3.3 Evaluation Indicators
The paper uses two primary evaluation metrics to assess the effectiveness of its face recognition and anti-spoofing algorithms. The first metric, accuracy (ACC), is a common and intuitive measure of classification performance: it is the proportion of correct classifications, computed by adding the true positive (TP) and true negative (TN) predictions and dividing the sum by the total number of samples. A higher accuracy indicates a better-performing classifier, one that more often assigns samples to the correct positive or negative class. Accuracy is also easy to understand, which makes it a go-to metric for many classification tasks.
In the context of face anti-spoofing, the paper introduces a second metric, the half total error rate (HTER), which is specifically relevant for evaluating the model’s ability to distinguish between genuine and fraudulent faces. HTER takes into account both the false acceptance rate (FAR) and the false rejection rate (FRR). FAR is the rate at which fraudulent samples are incorrectly accepted as genuine, while FRR is the rate at which genuine samples are incorrectly rejected as fraudulent. HTER is the average of the two, HTER = (FAR + FRR) / 2, so both types of error must be kept low. A lower HTER indicates better and more balanced performance, reducing the risks of wrongly accepting fraudulent faces as genuine and wrongly rejecting genuine faces as fraudulent. This makes HTER a crucial metric for evaluating the robustness and reliability of anti-spoofing systems, which must distinguish real from fake faces to ensure security and accuracy in applications such as face recognition and fraud detection. By employing both accuracy and HTER, the paper provides a well-rounded evaluation of its algorithms, addressing the specific requirements and trade-offs in face anti-spoofing scenarios.
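Both metrics can be computed from binary predictions as in the sketch below, which treats a real face as the positive class (1) and a fraudulent face as the negative class (0); that encoding and the function name are assumptions for illustration. In confusion-matrix terms, ACC = (TP + TN) / (TP + TN + FP + FN), FAR = FP / (FP + TN), FRR = FN / (FN + TP) and HTER = (FAR + FRR) / 2.

```python
def evaluate(y_true, y_pred):
    """Return (ACC, FAR, FRR, HTER) for labels 1 = real face, 0 = fraudulent face."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # real face accepted
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # attack rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # attack accepted as real
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # real face rejected as attack
    acc = (tp + tn) / len(pairs)
    far = fp / (fp + tn) if fp + tn else 0.0            # false acceptance rate
    frr = fn / (fn + tp) if fn + tp else 0.0            # false rejection rate
    return acc, far, frr, (far + frr) / 2


print(evaluate([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 1, 1]))
# (0.625, 0.5, 0.25, 0.375)
```

The example shows why HTER complements accuracy: the accuracy of 62.5% hides the fact that half of the attacks were accepted, which FAR and HTER expose directly.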

3.4 Algorithm Model


The proposed network model, LBASResnet50, is shown in Figure 11. First, the image is preprocessed: the frontal face detector in the dlib library is used to detect the face, and the image is resized to 224 × 224. Because the LBP features of real and fraudulent face images differ, and because the LBP algorithm has the advantages of rotation invariance and grayscale invariance, which effectively limit the influence of illumination changes, the texture features of the input images are extracted with the LBP algorithm. The extracted texture features are then input into a network model based on ResNet50; compared with other network architectures, ResNet solves the vanishing gradient problem caused by deepening the network. In the improved ResNet50 network, called ResNet50-BAS, a BiLSTM is first added to extract bidirectional feature information from the image and better account for local and global features. Then the classical channel attention mechanism SENet is added, which assigns higher weights to important features so that they can be recognized more clearly. Finally, SPP pooling is used in the network to improve the robustness of the model. For the LBASResnet50 model, several experiments showed that the results are poor when the threshold is less than 0.9, and that they fluctuate considerably and are also poor when the threshold is larger than 0.9; therefore, 0.9 was chosen as the threshold value. In addition, before LBP texture feature extraction, the original image is cropped to 224 × 224 during preprocessing, so the texture image produced by the LBP algorithm is also 224 × 224. As a result, the texture image appears slightly enlarged compared with the original image. The structural model of ResNet50-BAS is shown in Figure 12. Part of the LBP texture map (selected from the NUAA dataset) is shown in Figure 13: the left is the texture map of a real face, and the right is the texture map of a photo. The selected dataset is used to train this network model, and the trained model is then used for real-time anti-fraud face detection. First, real-time face images are captured from the camera. The extracted images are then fed into the trained model to obtain the classified face images. Finally, to improve liveness detection, the classified images are further checked by eye blink detection to judge whether the face is live, and the final conclusion is drawn.
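The final real-time decision, in which a captured frame sequence must first pass the trained classifier and then show a blink, can be sketched as the two-stage rule below. The paper does not spell out how the 0.9 threshold is applied, so treating it as a cut-off on the mean “real face” probability over the captured frames is an assumption, as are the EAR threshold, the minimum closed-frame count and the function name; the network itself is represented only by its per-frame output probabilities.

```python
def is_live_face(real_probs, ear_values, prob_threshold=0.9,
                 ear_threshold=0.2, min_closed_frames=2):
    """Two-stage liveness decision (hypothetical helper).

    real_probs: per-frame probability that the classifier assigns to "real face".
    ear_values: per-frame eye aspect ratios for the same frames.
    """
    # Stage 1: the classifier must judge the face real with high confidence.
    if not real_probs or sum(real_probs) / len(real_probs) < prob_threshold:
        return False
    # Stage 2: require at least one blink (eye closed for enough frames, then reopened).
    closed = 0
    for ear in ear_values:
        if ear < ear_threshold:
            closed += 1
        elif closed >= min_closed_frames:
            return True
        else:
            closed = 0
    return False


print(is_live_face([0.97, 0.95, 0.96], [0.30, 0.10, 0.09, 0.31]))  # True
print(is_live_face([0.97, 0.95, 0.96], [0.30, 0.29, 0.31, 0.30]))  # False: no blink
print(is_live_face([0.40, 0.55, 0.50], [0.30, 0.10, 0.09, 0.31]))  # False: classifier rejects
```

Ordering the checks this way means a printed photo that fools the texture classifier is still rejected, because a static image cannot produce the closed-then-open EAR pattern.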

Chapter 4

Experimental Result And Analysis

4.1 Experimental Results


The LBASResnet50 model is a deep neural network built upon the ResNet-
50 architecture, known for its effectiveness in image recognition tasks. Dur-
ing training, the Adam optimizer is utilized, which is favored for its quick
convergence and computational efficiency. The model is trained on three dis-
tinct datasets: NUAA, CASIA-SURF, and CASIA-FASD, each for a specific
number of epochs. It undergoes 85 epochs of training on the NUAA dataset
and 80 epochs on both the CASIA-SURF and CASIA-FASD datasets. Mini-
batches of 128 samples are used during training.
The primary objective of training is to minimize the cross-entropy loss
function, a common choice for classification tasks, which aids in enhancing
the model’s accuracy. After the training process, the model demonstrates
impressive accuracy rates, achieving 99.48 percent on NUAA, 98.65 percent
on CASIA-SURF, and 99.39 percent on CASIA-FASD. The loss function
gradually decreases, indicating that the model is learning and improving its
predictions.
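To make the training setup concrete, the sketch below applies the same ingredients (mini-batches of 128, the Adam update rule and a cross-entropy loss) to a toy logistic-regression classifier in NumPy. It is not the LBASResnet50 training code: the model, the synthetic data, the learning rate and the use of the binary form of cross-entropy are assumptions chosen to keep the example small, and the Adam hyperparameters are the commonly used defaults (β1 = 0.9, β2 = 0.999, ε = 1e-8).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def binary_cross_entropy(p, y, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


class Adam:
    """Adam optimizer with bias-corrected first and second moment estimates."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, params, grads):
        if self.m is None:
            self.m, self.v = np.zeros_like(params), np.zeros_like(params)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grads
        self.v = self.beta2 * self.v + (1 - self.beta2) * grads ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


def train(X, y, epochs=80, batch_size=128, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    opt = Adam(lr=lr)
    for _ in range(epochs):
        order = rng.permutation(len(X))        # reshuffle every epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            p = sigmoid(Xb[idx] @ w)
            grad = Xb[idx].T @ (p - y[idx]) / len(idx)  # gradient of mean BCE w.r.t. w
            w = opt.step(w, grad)
    return w


rng = np.random.default_rng(1)
X = rng.normal(size=(512, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w = train(X, y)
p = sigmoid(np.hstack([X, np.ones((512, 1))]) @ w)
print(binary_cross_entropy(p, y) < np.log(2))  # True: loss fell below the untrained value
```

At initialization every prediction is 0.5, so the loss starts at ln 2 ≈ 0.693; the falling loss printed here mirrors the decreasing loss curve described for the full model.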
The overall performance of the LBASResnet50 model is noteworthy, with an average accuracy of 99.17 percent across the three datasets ((99.48 + 98.65 + 99.39) / 3 ≈ 99.17). This demonstrates its effectiveness and generalizability in recognizing and classifying images. The accuracy and loss curves also show that, after a certain period of training, the model’s accuracy stabilizes with little fluctuation, confirming the robustness of the model’s learned features.

4.2 Comparison With Other Algorithms


This section compares the proposed LBASResnet50 model with other real-time face recognition models and examines its performance in anti-spoofing.

4.2.1 Real-time Face Recognition


The LBASResnet50 model outperforms other real-time face recognition mod-
els, as indicated in Table 1. The superiority of this model can be attributed
to several key factors:

• LBP Algorithm: The model uses the Local Binary Pattern (LBP) algorithm,
which encodes each pixel by comparing its intensity with those of its
neighbours. Because the resulting codes depend only on the outcome of these
comparisons, LBP is robust to monotonic changes in gray level, which makes
the model more resilient to illumination changes. This is a crucial advantage
in real-world scenarios, where lighting conditions can be unpredictable.

• BiLSTM: The model also incorporates a Bidirectional Long Short-Term
Memory (BiLSTM) network. A BiLSTM processes a sequence in both the
forward and backward directions, so it can capture temporal dependencies
between consecutive frames of a video, which is crucial for real-time video
recognition. The model can therefore use the timing characteristics between
frames, improving its accuracy in recognizing faces in dynamic video streams.

• SENet Mechanism: The addition of the Squeeze-and-Excitation Network
(SENet) mechanism is another notable enhancement. SENet improves feature
extraction by dynamically recalibrating channel-wise feature responses: it
learns a weight for each channel and scales that channel by it. This helps
capture more discriminative features, which is particularly important for
real-time face recognition, where subtle details can make a significant
difference.
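The illumination argument for LBP can be made concrete with a small example. The report does not specify which LBP variant the model uses, so the following is a minimal sketch of the basic 3x3 operator, assuming 8 neighbours and a "neighbour >= centre" comparison; the function name lbp_8neighbour is hypothetical. It shows that any strictly increasing brightness or contrast change leaves the codes unchanged. It does not, however, cover non-uniform lighting such as cast shadows.

```python
import numpy as np


def lbp_8neighbour(image: np.ndarray) -> np.ndarray:
    """Basic 3x3 Local Binary Pattern (hypothetical helper, not from the report).

    Bit k of each output code is set when the k-th neighbour is >= the centre
    pixel. Border pixels are dropped, so the output shape is (H-2, W-2).
    """
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    # Neighbour offsets in clockwise order, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(centre.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view of the image aligned with the centre pixels.
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= centre).astype(np.uint8) << bit
    return codes


# Usage: a monotonic brightness/contrast change leaves the codes unchanged.
rng = np.random.default_rng(0)
face_patch = rng.integers(0, 200, size=(6, 6))
brighter = 1.5 * face_patch + 40
print(np.array_equal(lbp_8neighbour(face_patch), lbp_8neighbour(brighter)))  # True
```

In the full model, histograms of such codes (or the code maps themselves) would be passed to the network as texture features. How exactly LBASResnet50 feeds them into ResNet50 is not described in this section.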

4.2.2 Anti-Spoofing
Table 2 compares the Half Total Error Rate (HTER) of the LBASResnet50
model with that of other liveness detection models. HTER is the average of
the false acceptance rate (FAR) and the false rejection rate (FRR), so a
lower value means fewer errors.
The LBASResnet50 model is effective in face anti-spoofing, the task of
distinguishing genuine faces from spoof attempts (e.g., photos or masks). Its
lower HTER values in Table 2 indicate that it outperforms the compared models
on this task. This can be attributed to its ability to capture subtle cues and
features that differentiate real faces from fraudulent ones.
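Because HTER is the metric behind the anti-spoofing comparison, a short sketch of how it is computed may help. This is a minimal illustration, assuming the model outputs a liveness score where higher means "more likely real" and a fixed decision threshold. Benchmarks commonly fix that threshold on a development set before applying it to the test set. The function hter and the scores below are hypothetical and are not taken from the report.

```python
import numpy as np


def hter(scores, is_live, threshold):
    """Half Total Error Rate at a fixed threshold (hypothetical helper).

    scores:  liveness scores, higher means "more likely a real face".
    is_live: True for genuine samples, False for spoof attacks.
    Returns (HTER, FAR, FRR).
    """
    scores = np.asarray(scores, dtype=float)
    is_live = np.asarray(is_live, dtype=bool)
    accepted = scores >= threshold
    far = np.mean(accepted[~is_live])  # spoofs wrongly accepted as real
    frr = np.mean(~accepted[is_live])  # real faces wrongly rejected
    return (far + frr) / 2, far, frr


# Usage with made-up scores: one of three spoofs is accepted (FAR = 1/3)
# and one of three real faces is rejected (FRR = 1/3).
h, far, frr = hter([0.9, 0.8, 0.4, 0.3, 0.6, 0.1],
                   [True, True, True, False, False, False],
                   threshold=0.5)
print(f"HTER={h:.3f} FAR={far:.3f} FRR={frr:.3f}")  # HTER=0.333 FAR=0.333 FRR=0.333
```

Averaging FAR and FRR keeps the metric from rewarding a model that simply rejects everything or accepts everything.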
In summary, the LBASResnet50 model achieves superior performance in
real-time face recognition owing to its illumination-robust LBP algorithm,
the temporal awareness provided by BiLSTM, and the feature enhancement
provided by the SENet mechanism. It also demonstrates strong anti-spoofing
capability, a critical aspect of modern face recognition systems that supports
the model’s reliability in practical applications such as security and access
control.
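The SENet recalibration credited above follows a squeeze, excitation, and scale pattern. The sketch below uses NumPy and applies this pattern to a single (C, H, W) feature map. It assumes random weights in place of learned ones and a reduction ratio r, for which 16 is a common default. The report does not say where in ResNet50 the block is inserted. The function se_block is hypothetical.

```python
import numpy as np


def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation recalibration (hypothetical helper).

    x:  feature map of shape (C, H, W)
    w1: (C // r, C), b1: (C // r,)   -- reduction layer
    w2: (C, C // r), b2: (C,)        -- expansion layer
    Returns the rescaled feature map and the per-channel weights.
    """
    z = x.mean(axis=(1, 2))                         # squeeze: global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ z + b1)           # excitation, step 1: ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))   # excitation, step 2: sigmoid -> (0, 1)
    return x * s[:, None, None], s                  # scale each channel by its weight


# Usage with random (untrained) weights, C = 8 channels and r = 4.
rng = np.random.default_rng(0)
C, r = 8, 4
feature_map = rng.standard_normal((C, 14, 14))
w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
recalibrated, weights = se_block(feature_map, w1, b1, w2, b2)
print(recalibrated.shape, weights.shape)  # (8, 14, 14) (8,)
```

In a trained network, the weights learned in w1 and w2 decide which channels are amplified and which are suppressed for a given input. This is the "dynamic" part of the recalibration.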

4.3 Ablation Experiment


The paper presents an algorithm for face recognition that incorporates four
key components: BiLSTM, SENet, SPP, and ResNet50. To demonstrate the
effectiveness of this algorithm, the authors conduct an ablation analysis by
systematically evaluating different configurations on three datasets: NUAA,
CASIA-SURF, and CASIA-FASD. The results are summarized in Table 3,
and comparison result plots are provided for the base ResNet50 structure
(item a) in Figures 14-16.
This ablation analysis isolates the contribution of each component and its
impact on recognition performance. The key findings are as follows:
a) ResNet50 as the Base Network: The starting point is ResNet50, a
well-established convolutional neural network architecture. The accuracy of
this configuration serves as the baseline against which the other
configurations are compared.
b) Addition of BiLSTM: The incorporation of the Bi-directional Long
Short-Term Memory (BiLSTM) module enhances the network’s ability to
capture temporal dependencies between frames in video data, which is cru-
cial for real-time recognition. It contributes positively to accuracy.
c) Addition of SENet: The inclusion of the Squeeze-and-Excitation Net-
work (SENet) module improves feature extraction, enhancing the model’s
capability to capture and emphasize important features. This contributes
to a further boost in accuracy.
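d) Addition of SPP: The Spatial Pyramid Pooling (SPP) module pools the
feature map over grids of several sizes and concatenates the results. This
produces a fixed-length output that also summarizes spatial information at
multiple scales. As a result, the network becomes more robust and adaptable
to varying spatial resolutions of the input images, which also improves
accuracy.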
e) Combined BiLSTM, SENet, and SPP: The final algorithm includes
all components, combining the strengths of BiLSTM, SENet, and SPP with
ResNet50. The results demonstrate that this comprehensive configuration
outperforms the other ablation settings, achieving better recognition accu-
racy and showing strong generalization ability across the three datasets.
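The fixed-length property that lets SPP handle varying input resolutions can be shown with a short NumPy sketch. This is an illustration under stated assumptions. The report does not give the pyramid levels, so levels (1, 2, 4) and max pooling, as in the original SPP-net, are assumed here. The function spatial_pyramid_pool is hypothetical.

```python
import numpy as np


def spatial_pyramid_pool(x, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) map over an n x n grid for each n in `levels`
    (hypothetical helper). The output length is C * sum(n * n) for any H, W.
    """
    c, h, w = x.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor/ceil bin edges so the bins cover the whole map even when
            # H or W is not divisible by n; every bin is non-empty.
            y0, y1 = (i * h) // n, -((-(i + 1) * h) // n)
            for j in range(n):
                x0, x1 = (j * w) // n, -((-(j + 1) * w) // n)
                pooled.append(x[:, y0:y1, x0:x1].max(axis=(1, 2)))
    return np.concatenate(pooled)


# Usage: two feature maps of different spatial size give vectors of the same
# length, 256 * (1 + 4 + 16) = 5376, which a fully connected layer can accept.
rng = np.random.default_rng(0)
small = rng.random((256, 7, 7))
large = rng.random((256, 10, 13))
print(spatial_pyramid_pool(small).shape, spatial_pyramid_pool(large).shape)  # (5376,) (5376,)
```

The coarse level (1x1) captures global information, and the finer levels keep some spatial layout. This is the multi-scale spatial information referred to in item d).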

4.4 Single-Sheet Model Recognition Results


This section presents the recognition performance of the LBASResnet50
model on the NUAA dataset, as well as its real-time recognition capability
using a camera and blink detection.
Recognition on NUAA Dataset: Figure 17 illustrates the recognition
results on a randomly selected image from the NUAA dataset. In (a),
LBASResnet50 correctly classifies a real face, demonstrating its ability to
effectively identify genuine faces. In (b), it correctly identifies a false face,
showcasing its capability to discern fake or spoofed faces.
These results illustrate the model’s proficiency in distinguishing real from
fake faces, a fundamental requirement for security and authentication
applications.
Real-time Recognition Using a Camera and Blink Detection: In Figure
18, LBASResnet50’s real-time recognition performance is evaluated using a
camera, along with additional blink detection, to further enhance security
measures. The results for real and fake faces are presented as (a) and (b)
respectively.
(a) shows the successful recognition of a real face, indicating that the
model can reliably authenticate individuals in real-time scenarios, such as
access control or identity verification.
(b) demonstrates the model’s ability to detect fake faces, which is partic-
ularly important in countering various spoofing attempts, including the use
of photos or videos. The capability to differentiate between real and non-
real faces is a critical feature for ensuring the system’s robustness against
fraudulent access.
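The report does not state which blink-detection algorithm is used alongside LBASResnet50. A widely used option is the eye aspect ratio (EAR), which is computed from six eye landmarks and drops sharply when the eye closes. The sketch below assumes that the landmarks come from some external face-landmark detector. It also assumes a threshold of 0.2 and a minimum of 2 consecutive closed frames, which are illustrative values rather than values from the report. The functions eye_aspect_ratio and count_blinks are hypothetical.

```python
import numpy as np


def eye_aspect_ratio(eye):
    """EAR for six eye landmarks p1..p6, shape (6, 2) (hypothetical helper).

    p1 and p4 are the eye corners; p2, p3 lie on the upper lid and p6, p5 on
    the lower lid, so p2-p6 and p3-p5 are vertical eye openings.
    """
    eye = np.asarray(eye, dtype=float)
    vertical = np.linalg.norm(eye[1] - eye[5]) + np.linalg.norm(eye[2] - eye[4])
    horizontal = np.linalg.norm(eye[0] - eye[3])
    return vertical / (2.0 * horizontal)


def count_blinks(ear_per_frame, threshold=0.2, min_frames=2):
    """Count runs of >= min_frames consecutive frames with EAR below threshold
    (hypothetical helper; threshold and min_frames are assumed values)."""
    blinks, run = 0, 0
    for ear in ear_per_frame:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:  # clip ended while the eye was still closed
        blinks += 1
    return blinks


# Usage with made-up per-frame EAR values.
live_clip = [0.31, 0.30, 0.12, 0.09, 0.29, 0.31]  # eye closes for two frames
photo_clip = [0.30] * 6                           # a printed photo never blinks
print(count_blinks(live_clip), count_blinks(photo_clip))  # 1 0
```

A simple liveness rule built on this sketch would accept a face only if at least one blink is observed within a time window. A replayed video of a blinking person would still pass such a rule, which is why it complements the texture-based classifier rather than replacing it.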

Chapter 5

Conclusion

This paper proposes a real-time face detection method based on blink
detection, called LBASResnet50, to address the problems caused by
illumination and expression changes in real-time face recognition. The model
takes ResNet50 as its base network and feeds the texture features extracted
by the LBP algorithm into it, improving tolerance to illumination during
recognition. A BiLSTM is then added to capture contextual information and
extract time-series features, improving the accuracy of real-time recognition.
In addition, a channel attention mechanism is added to extract key feature
information and assign importance weights, and SPP is used to improve the
robustness of the model. Finally, whether a face is real is judged by eye
blink detection. The experimental results indicate that the proposed method
achieves good accuracy in anti-spoofing real-time face recognition. Because
paper, electronic device screens, and real faces have different structures,
the facial images captured by cameras differ in brightness and illumination
information. In future work, we will consider efficiently separating
brightness and reflected-light features from RGB images to further improve
model performance. We will also consider applying sparse representation to
deep-learning-based face recognition.

References

[1] Z. Wei, C. Gang, H. Gang, and Y. Shi, “Real-time face recognition system
based on Gabor wavelet and LBPH,” Comput. Technol. Develop., vol. 29, no. 3,
pp. 47–50, 2019.

[2] A. Liu, X. Li, J. Wan, Y. Liang, S. Escalera, H. J. Escalante, M. Madadi,
Y. Jin, Z. Wu, X. Yu, Z. Tan, Q. Yuan, R. Yang, B. Zhou, G. Guo, and
S. Z. Li, “Cross-ethnicity face anti-spoofing recognition challenge: A review,”
IET Biometrics, vol. 10, no. 1, pp. 24–43, Jan. 2021.

[3] D. Xiong and W. Hongchun, “Face liveness detection algorithm based on
deep learning and feature fusion,” Comput. Appl., vol. 40, no. 4,
pp. 1009–1015, 2020.

[4] H. Qi, Y. Shi, X. Mu, and M. Hou, “Knowledge granularity for continuous
parameters,” IEEE Access, vol. 9, pp. 89432–89438, 2021.

[5] R. Cai, “DRL-FAS: A novel framework based on deep reinforcement learning
for face anti-spoofing,” IEEE Trans. Inf. Forensics Security, vol. 16,
pp. 937–951, 2021.
