Seminar Title 2
Bachelor of Technology
In
Computer Science and Engineering
Of
A P J Abdul Kalam Technological University
NOVEMBER 2023
Submitted By
FEBA T RAJEEV (CEK20CS018)
Under the Guidance of
JINU L
Certificate
Guide Coordinator
Mr. Manoj Ray D
Assistant Professor
Computer Science & Engineering
Acknowledgments
We wish to thank the almighty God, to whom we are greatly indebted for the
past, present and future of our life, and for making this venture a success.
1 Introduction
1.1 Overview of Existing System
1.2 Feature Extraction and Enhancement
1.3 Fusion of Local and Global Features
1.4 Face Anti-Spoofing and Authentication
2 Related Works
2.1 Residual Neural Network
2.2 Attention Mechanism
2.3 Bi-Directional Long Short-Term Memory
2.4 Spatial Pyramid Pooling
2.5 Local Binary Pattern
2.6 Face Key Point Technology
3 Architecture/Design
3.1 Experimental Environment and Preprocessing
3.2 Datasets
3.3 Evaluation Indicators
3.4 Algorithm Model
5 Conclusion
References
Chapter 1
Introduction
Introduction of Topic
The LBP algorithm is applied first to achieve a degree of illumination
invariance, making the method more robust to varying lighting conditions.
The section then delves into the use of the ResNet50
architecture as the base network for facial feature extraction. To enhance
feature extraction, an attention mechanism is integrated with ResNet50.
This mechanism highlights important facial features, contributing to more
accurate and reliable feature selection. Additionally, Bidirectional Long
Short-Term Memory (BiLSTM) is introduced to capture temporal features
from images taken at different angles or times. This temporal information
enhances the accuracy of feature selection and adds an extra layer of detail
to the recognition process.
Chapter 2
Related Works
ResNet50 employs a bottleneck layer to reduce computational complexity while
preserving the network's expressive power. This, in turn, allows ResNet50 to
capture intricate and nuanced features from images, even in the presence of
numerous network layers.
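The reduce-transform-expand-with-shortcut pattern of a bottleneck can be sketched as follows. This is a schematic NumPy illustration only, not the paper's implementation: plain matrix multiplies stand in for the real 1x1, 3x3 and 1x1 convolutions, and the function and weight names are my own.

```python
import numpy as np

def bottleneck(x, w_reduce, w_mid, w_expand):
    """Schematic ResNet bottleneck on an (N, C) feature batch: reduce the
    channel count, transform in the cheaper reduced space, expand back,
    then add the identity shortcut."""
    relu = lambda v: np.maximum(0.0, v)
    h = relu(x @ w_reduce)   # C -> C/4: fewer channels, cheaper middle step
    h = relu(h @ w_mid)      # transform in the reduced space
    h = h @ w_expand         # C/4 -> C: restore the channel count
    return relu(h + x)       # residual shortcut: output = F(x) + x

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
w_reduce = rng.standard_normal((8, 2))
w_mid = rng.standard_normal((2, 2))
w_expand = rng.standard_normal((2, 8))
y = bottleneck(x, w_reduce, w_mid, w_expand)
print(y.shape)  # (2, 8): shape preserved, as the shortcut requires
```

The shortcut requires the output shape to match the input, which is why the last weight matrix expands back to the original channel count.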
The Squeeze operation (Fsq) applies global average pooling to compress each
H × W × C feature map into a descriptor of 1 × 1 × C. This reduction condenses
the spatial information of every channel into a single value, preparing it for
further processing.
The Excitation operation (Fex) is equally significant, involving two fully
connected layers applied to the result of the Squeeze operation. Subse-
quently, the Sigmoid activation function is employed to obtain a weight
matrix. This weight matrix is then utilized in a scaling operation (Fscale),
allowing the network to emphasize and de-emphasize specific features based
on their importance.
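The Squeeze, Excitation and Scale steps described above can be sketched as follows. This is a minimal NumPy illustration with random stand-in weights, not the paper's implementation; the function name se_block and the reduction ratio of 2 are my own choices.

```python
import numpy as np

def se_block(feature_map, w1, b1, w2, b2):
    """Squeeze-and-Excitation: reweight the channels of an (H, W, C) map."""
    # Squeeze (Fsq): global average pooling -> a 1 x 1 x C descriptor
    z = feature_map.mean(axis=(0, 1))            # shape (C,)
    # Excitation (Fex): two fully connected layers, the first reducing C,
    # the second restoring it, followed by a sigmoid to get weights in (0, 1)
    h = np.maximum(0.0, z @ w1 + b1)             # ReLU, shape (C/r,)
    s = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))     # sigmoid, shape (C,)
    # Scale (Fscale): multiply each channel by its learned weight
    return feature_map * s                        # broadcasts over H, W

rng = np.random.default_rng(0)
fmap = rng.standard_normal((4, 4, 8))            # toy 4x4 map with 8 channels
w1, b1 = rng.standard_normal((8, 4)), np.zeros(4)
w2, b2 = rng.standard_normal((4, 8)), np.zeros(8)
out = se_block(fmap, w1, b1, w2, b2)
print(out.shape)  # (4, 4, 8): same shape, channels reweighted
```

Because the sigmoid output is a single weight per channel, every spatial position in a channel is scaled by the same factor, which is exactly the "emphasize or de-emphasize whole features" behaviour described above.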
The SENet’s attention mechanism facilitates the network’s ability to dy-
namically focus on vital facial features. By adaptively weighting different
channels, it ensures that the most relevant information is captured and uti-
lized for subsequent processing, leading to improved feature selection and,
ultimately, enhancing the system’s performance in face detection and recog-
nition. This approach is integral to the real-time face detection method’s
effectiveness in this paper, as it enables the network to pay attention to the
most salient aspects of facial data, even amidst varying lighting conditions
and challenging poses.
In the BiLSTM structure, the recurrent connections are parameterized by weight
matrices w1 to w6, which are shared across all time steps.
This research paper discusses the integration of BiLSTM to enhance the
extraction of bidirectional sequence features from images. By doing so, it in-
creases the volume of information accessible to the network model, enhances
the algorithm’s contextual awareness, and ultimately leads to improved ac-
curacy in video face recognition.
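The bidirectional idea can be sketched as follows. This is my own schematic NumPy illustration, not the paper's code: a plain tanh RNN cell stands in for the LSTM cell to keep it short, but the key mechanism is the same, one left-to-right pass, one right-to-left pass, and the two hidden states concatenated at every step.

```python
import numpy as np

def rnn_pass(xs, wx, wh):
    """One directional pass: h_t = tanh(x_t @ wx + h_{t-1} @ wh)."""
    h, out = np.zeros(wh.shape[0]), []
    for x in xs:
        h = np.tanh(x @ wx + h @ wh)
        out.append(h)
    return out

def bi_rnn(xs, wx_f, wh_f, wx_b, wh_b):
    """Bidirectional sequence features: forward and backward hidden states
    concatenated per time step, so every step sees past and future context."""
    fwd = rnn_pass(xs, wx_f, wh_f)
    bwd = rnn_pass(xs[::-1], wx_b, wh_b)[::-1]   # backward pass, re-aligned
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(1)
xs = [rng.standard_normal(4) for _ in range(5)]  # 5 frames, 4 features each
wx_f, wh_f = rng.standard_normal((4, 3)), rng.standard_normal((3, 3))
wx_b, wh_b = rng.standard_normal((4, 3)), rng.standard_normal((3, 3))
feats = bi_rnn(xs, wx_f, wh_f, wx_b, wh_b)
print(len(feats), feats[0].shape)  # 5 steps, each a 6-dim (3 fwd + 3 bwd) vector
```

The concatenation is what doubles the information available per frame compared with a one-directional pass.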
The LBP operator offers two key attributes: rotation invariance and grayscale invariance. In a typical
LBP operation, a 3x3 window is applied to an image, with the center pixel
serving as a threshold. The grayscale values of the eight neighboring pixels
are compared to this threshold. When a neighboring pixel’s grayscale value
exceeds that of the center pixel, it is marked as 1; otherwise, it is marked
as 0. This binary pattern generates an 8-bit binary number, which can be
converted into a decimal representation, resulting in 256 possible patterns.
LBP is instrumental in characterizing textures, patterns, and details
within an image, making it particularly useful in various computer vision
applications, including facial recognition, texture analysis, and object de-
tection. Its invariance properties enable it to work effectively across various
lighting conditions and orientations.
The LBP value for the center pixel within the 3x3 window serves as a
concise representation of the local texture information in that specific region,
making it a valuable tool for feature extraction. It has found applications in
pattern recognition, image analysis, and machine learning, contributing to
the understanding and processing of images in computer vision systems.
The LBP code of the center pixel is computed as LBP = Σ(p=0..7) s(I(p) − I(c)) · 2^p,
where p indexes the eight pixels in the window other than the center pixel,
I(c) is the gray value of the center pixel, and I(p) is the gray value of the
p-th neighboring pixel. The thresholding function s(x) is defined as
s(x) = 1 for x ≥ 0 and s(x) = 0 otherwise.
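As a small illustration (my own sketch, not code from the paper), the 3x3 LBP computation with s(x) = 1 for x ≥ 0 and 0 otherwise can be written as:

```python
import numpy as np

def lbp_code(window):
    """LBP code of the centre pixel of a 3x3 grayscale window.
    Each neighbour whose gray value is >= the centre's contributes a 1-bit,
    giving an 8-bit code in 0..255."""
    assert window.shape == (3, 3)
    c = window[1, 1]
    # neighbour order: clockwise starting at the top-left corner
    idx = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = [1 if window[i, j] >= c else 0 for i, j in idx]  # s(I(p) - I(c))
    return sum(b << p for p, b in enumerate(bits))          # sum of s(.) * 2^p

w = np.array([[10, 200, 30],
              [40,  50, 60],
              [70,  80, 90]])
print(lbp_code(w))  # 122
```

Note the grayscale invariance: adding a constant to every pixel leaves all the comparisons, and hence the code, unchanged.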
Chapter 3
Architecture/Design
3.2 Datasets
The research paper utilized three distinct datasets for experimental evalu-
ation: the NUAA dataset, CASIA-SURF dataset [14], and CASIA-MFSD
dataset. Each of these datasets contributed to the paper’s comprehensive
assessment of face recognition and anti-fraud techniques.
1. NUAA Dataset: The NUAA dataset is designed for the specific pur-
pose of detecting photo-printing fraud. It consists of images from 15
individuals, with 500 images per individual. These images have a reso-
lution of 640x480 pixels. Each individual has both real and fraudulent
face images in the dataset, making it suitable for assessing the perfor-
mance of anti-fraud techniques.
This dataset has a total of 600 video recordings. A sample picture is shown in
Figure 10: the upper part is a real face, and the lower part is a fraudulent
face.
By utilizing these diverse datasets, the paper aimed to provide a com-
prehensive evaluation of face recognition systems and their effectiveness in
handling fraud detection and anti-spoofing challenges. These datasets en-
compass various aspects of facial recognition, including photo attacks, video
attacks, and different image modalities, enabling a thorough assessment of
the proposed techniques in real-world scenarios.
3.3 Evaluation Indicators
The paper utilizes two primary evaluation metrics to assess the effective-
ness of its face recognition and anti-spoofing algorithms. The first met-
ric, accuracy (ACC), is a common and intuitive measure of classification
performance. It calculates the proportion of correct classifications by sum-
ming the true positive (TP) and true negative (TN) predictions and divid-
ing it by the total number of samples. A higher accuracy score indicates
a better-performing classifier, demonstrating its ability to accurately cat-
egorize samples as positive or negative. Additionally, accuracy provides
an easy-to-understand performance indicator, making it a go-to metric for
many classification tasks.
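As a concrete illustration of the formula (a sketch with made-up counts, not figures from the paper):

```python
# Accuracy from a confusion-matrix summary: ACC = (TP + TN) / total samples.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

# e.g. 45 genuine and 40 spoof samples classified correctly out of 100
print(accuracy(tp=45, tn=40, fp=5, fn=10))  # 0.85
```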
In the context of face anti-spoofing, the paper introduces a second metric,
the half total error rate (HTER), which is specifically relevant for evaluating
the model's ability to distinguish between genuine and fraudulent faces. HTER
takes into account both the false acceptance rate (FAR) and the false rejection
rate (FRR). FAR quantifies the rate at which fraudulent samples are incorrectly
accepted as genuine, while FRR represents the rate at which genuine samples are
mistakenly rejected as fraudulent. HTER is calculated as the average of FAR and
FRR, emphasizing the importance of minimizing both types of errors. A lower
HTER signifies improved model performance, indicating a more balanced approach
that mitigates the risks of wrongly accepting fraudulent faces as genuine and
incorrectly rejecting genuine faces as fraudulent. This makes HTER a crucial
metric for evaluating the robustness and reliability of anti-spoofing systems,
which must contend with the
challenges of distinguishing between real and fake faces to ensure security
and accuracy in applications such as face recognition and fraud detection.
By employing both accuracy and HTER, the paper provides a well-rounded
evaluation of its algorithms, addressing the specific requirements and trade-
offs in face anti-spoofing scenarios.
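The HTER computation can be illustrated with a small sketch (my own example counts, not results from the paper):

```python
# HTER is the mean of the two complementary error rates.
def hter(false_accepts, n_spoof, false_rejects, n_genuine):
    far = false_accepts / n_spoof     # spoof samples wrongly accepted as genuine
    frr = false_rejects / n_genuine   # genuine samples wrongly rejected as spoof
    return (far + frr) / 2

# FAR of 2% and FRR of 6% average to an HTER of 4%
print(round(hter(false_accepts=2, n_spoof=100, false_rejects=6, n_genuine=100), 4))
```

Averaging the two rates means a classifier cannot score well by simply accepting (or rejecting) everything, which is exactly the balance the text describes.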
Chapter 4
The comparison emphasizes the LBASResnet50 model's performance in
anti-spoofing. Let's delve into a more detailed explanation of the findings:
4.2.2 Anti-Spoofing
Table 2 shows a comparison of the Half Total Error Rate (HTER) between the
LBASResnet50 model and other liveness detection models, emphasizing its
performance in anti-spoofing. Here's a further explanation:
The LBASResnet50 model is effective in face anti-spoofing, which is the
task of distinguishing between genuine faces and spoof attempts (e.g., photos
or masks). The lower HTER values in Table 2 indicate that this model excels
in anti-spoofing measures. This success can be attributed to its ability to
capture subtle cues and features that differentiate real faces from fraudulent
ones.
In summary, the LBASResnet50 model achieves superior performance in
real-time face recognition due to its illumination-robust LBP algorithm, tem-
poral awareness through BiLSTM, and feature enhancement via the SENet
mechanism. It also demonstrates strong capabilities in anti-spoofing, which
is a critical aspect of modern face recognition systems, ensuring the model’s
reliability in various practical applications, including security and access
control.
b) Addition of BiLSTM: The incorporation of the Bi-directional Long
Short-Term Memory (BiLSTM) module enhances the network’s ability to
capture temporal dependencies between frames in video data, which is cru-
cial for real-time recognition. It contributes positively to accuracy.
c) Addition of SENet: The inclusion of the Squeeze-and-Excitation Net-
work (SENet) module improves feature extraction, enhancing the model’s
capability to capture and emphasize important features. This contributes
to a further boost in accuracy.
d) Addition of SPP: The Spatial Pyramid Pooling (SPP) module intro-
duces spatial information handling, making the network more robust and
adaptable to varying spatial resolutions in input images, which also posi-
tively impacts accuracy.
e) Combined BiLSTM, SENet, and SPP: The final algorithm includes
all components, combining the strengths of BiLSTM, SENet, and SPP with
ResNet50. The results demonstrate that this comprehensive configuration
outperforms the other ablation settings, achieving better recognition accu-
racy and showing strong generalization ability across the three datasets.
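The SPP module in (d) can be sketched as follows. This is a NumPy illustration of max pooling over a pyramid of grids on a single-channel map; the 1x1/2x2/4x4 pyramid levels are a common choice used here for illustration, not necessarily the paper's exact configuration.

```python
import numpy as np

def spp(feature_map, levels=(1, 2, 4)):
    """Spatial Pyramid Pooling sketch: max-pool an (H, W) map over grids of
    1x1, 2x2 and 4x4 cells and concatenate, giving a fixed-length vector
    (1 + 4 + 16 = 21 values) regardless of the input's spatial size."""
    H, W = feature_map.shape
    pooled = []
    for n in levels:
        # cell boundaries that cover the whole map even when n doesn't divide H, W
        hs = np.linspace(0, H, n + 1).astype(int)
        ws = np.linspace(0, W, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = feature_map[hs[i]:hs[i + 1], ws[j]:ws[j + 1]]
                pooled.append(cell.max())
    return np.array(pooled)

rng = np.random.default_rng(2)
print(spp(rng.standard_normal((13, 9))).shape)   # (21,)
print(spp(rng.standard_normal((32, 24))).shape)  # (21,) again: size-independent
```

The fixed output length is what makes the network adaptable to varying spatial resolutions: the fully connected layers after SPP always see the same number of inputs.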
The paper also evaluates LBASResnet50's real-time recognition capabilities
using a camera and blink detection. Let's break down the key points and their
implications:
Recognition on NUAA Dataset: Figure 17 illustrates the recognition
results on a randomly selected image from the NUAA dataset. In (a),
LBASResnet50 correctly classifies a real face, demonstrating its ability to
effectively identify genuine faces. In (b), it correctly identifies a false face,
showcasing its capability to discern fake or spoofed faces.
This is a crucial achievement, as it validates the model’s proficiency in
distinguishing real and fake faces, which is a fundamental requirement for
security and authentication applications.
Real-time Recognition Using a Camera and Blink Detection: In Figure
18, LBASResnet50’s real-time recognition performance is evaluated using a
camera, along with additional blink detection, to further enhance security
measures. The results for real and fake faces are presented as (a) and (b)
respectively.
(a) shows the successful recognition of a real face, indicating that the
model can reliably authenticate individuals in real-time scenarios, such as
access control or identity verification.
(b) demonstrates the model’s ability to detect fake faces, which is partic-
ularly important in countering various spoofing attempts, including the use
of photos or videos. The capability to differentiate between real and non-
real faces is a critical feature for ensuring the system’s robustness against
fraudulent access.
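The excerpt does not detail how blinks are detected. One common approach, shown here purely as an illustrative assumption rather than the paper's method, is to track the eye aspect ratio (EAR) over facial landmarks and count sustained dips below a threshold:

```python
import numpy as np

def eye_aspect_ratio(eye):
    """EAR over 6 eye landmarks p1..p6: the two vertical landmark distances
    divided by twice the horizontal one. EAR drops sharply when the eye
    closes, so a dip below a threshold for a few frames indicates a blink."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

def count_blinks(ear_series, thresh=0.2, min_frames=2):
    """Count dips of the per-frame EAR below `thresh` lasting >= `min_frames`."""
    blinks, run = 0, 0
    for ear in ear_series:
        if ear < thresh:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:
        blinks += 1
    return blinks

# a schematic open eye: EAR well above a typical 0.2 threshold
eye = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 1.0],
                [3.0, 0.0], [2.0, -1.0], [1.0, -1.0]])
print(round(eye_aspect_ratio(eye), 3))  # 0.667
# open frames ~0.3 with one two-frame closure: exactly one blink
print(count_blinks([0.31, 0.30, 0.12, 0.10, 0.29, 0.30]))  # 1
```

A static photo never produces such a dip, which is why blink counting serves as a cheap liveness cue against photo attacks.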
Chapter 5
Conclusion
This paper proposes a real-time face detection method based on blink de-
tection called LBASResnet50 to solve the problems of illumination and ex-
pression changes in the process of real-time face recognition. The model
takes ResNet50 as the basic network structure and sends the texture fea-
tures extracted by the LBP algorithm into the basic network to improve
the tolerance to illumination in the recognition process. BiLSTM is then added
to capture context information and extract time-series features, improving the
accuracy of real-time recognition. At
the same time, the channel attention mechanism is added to extract key
feature information and assign important weights, and SPP pooling is used
to improve the robustness of the model. Finally, the real face is judged
by eye blink detection. The experimental results indicate that the method
proposed in this paper achieves good accuracy in anti-spoofing real-time face
recognition. Due to the different structures of paper, electronic
device screens and real faces, the facial images acquired by cameras differ in
brightness and illumination information. In the next research, we will con-
sider efficiently separating brightness and reflected light features from RGB
images to further improve model performance. In addition, we will consider
applying sparse representation to deep-learning-based face recognition.
References
[1] Z. Wei, C. Gang, H. Gang, and Y. Shi, “Real-time face recognition sys-
tem based on Gabor wavelet and LBPH,” Comput. Technol. Develop.,
vol. 29, no. 3, pp. 47–50, 2019.
[4] H. Qi, Y. Shi, X. Mu, and M. Hou, “Knowledge granularity for con-
tinuous parameters,” IEEE Access, vol. 9, pp. 89432–89438, 2021.