
Deepfake Detection using cascaded CNN-LSTM-FCNs to identify AI-altered video based on eye state sequence

Approach:

Fig 1. Approach for Deepfake detection using Eye State Sequence
Eye Localization:

During the pre-processing phase, the video is parsed into individual frames, and each face within these frames is detected and analyzed. Recognized faces undergo a subsequent alignment step to ensure uniformity in eye orientation and direction. Following alignment, the eyes are cropped and their pixel values are saved for further processing.

Blink Detection System:

The blink detection system operates on sequences of the cropped eye regions obtained from Stage 1's eye localization. These cropped eye regions, saved as RGB images, serve as the input data sequences. A pre-trained Convolutional Neural Network (CNN) extracts spatial features from each eye image, and the extracted features are fed into a Long Short-Term Memory (LSTM) network for further, temporal feature extraction; a minimal sketch follows Fig 2.

Fig 2. Approach for building Blink detection System
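A minimal PyTorch sketch of this cascade follows. The ResNet-18 backbone and the LSTM hidden size of 128 are assumptions, since the text specifies only a pre-trained CNN followed by an LSTM:

import torch
import torch.nn as nn
from torchvision import models

class BlinkDetector(nn.Module):
    # A pre-trained CNN extracts spatial features from each cropped eye
    # frame; an LSTM then models the temporal pattern across the sequence.
    def __init__(self, hidden_size=128):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop classifier head
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden_size, 1), nn.Sigmoid())

    def forward(self, x):                  # x: (batch, time, 3, H, W)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1))  # (b*t, 512, 1, 1)
        feats = feats.view(b, t, -1)       # (b, t, 512)
        out, _ = self.lstm(feats)          # temporal feature extraction
        return self.head(out).squeeze(-1)  # per-frame blink probability in [0, 1]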


Blink Probability Dataset:

The Blink Detection System outputs, for each frame of the input video, a probability value indicating the likelihood of blinking, so the length and timing of each probability sequence depend on the frame rate and duration of the video. To use blinking patterns as a valid feature for system training, it was crucial to standardize the frame rate, so every sequence was converted to 50 frames per second.

If an input sequence was too long, it was trimmed to a fixed length using a sliding window approach. The windows were set to 4.5 seconds, yielding 225 eye-state probability values per sequence and providing ample opportunity for at least one blink; a sketch of this standardization follows.
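A minimal sketch of the standardization and windowing, assuming linear interpolation for the frame-rate conversion and a one-second window stride (neither detail is given above):

import numpy as np

TARGET_FPS = 50
WINDOW_LEN = int(4.5 * TARGET_FPS)  # 225 eye-state probabilities per window

def resample_to_50fps(probs, source_fps):
    # Linear interpolation onto a 50 fps time grid (assumed method).
    src_t = np.arange(len(probs)) / source_fps
    dst_t = np.arange(int(len(probs) / source_fps * TARGET_FPS)) / TARGET_FPS
    return np.interp(dst_t, src_t, probs)

def sliding_windows(probs, stride=TARGET_FPS):
    # Trim an over-long 50 fps sequence into fixed 4.5 s windows.
    return [probs[i:i + WINDOW_LEN]
            for i in range(0, len(probs) - WINDOW_LEN + 1, stride)]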

The Blink Pattern Dataset (BPD) was then created, labeling authentic videos with 0 and tampered ones with 1. Each input had 225 features, representing the eye-state probabilities over 4.5 seconds, where each probability ranges from 0 (eye fully open) to 1 (eye fully closed). The dataset was then used for training, validating, and testing the study's models in the final stage of the process.

Fig 3. Approach for creating Blink Probability Dataset
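As an illustration, the BPD can be assembled as follows; real_windows and fake_windows are hypothetical lists holding the windowed sequences from authentic and tampered videos:

import numpy as np

def build_bpd(real_windows, fake_windows):
    # Each row holds 225 eye-state probabilities; labels: 0 = authentic, 1 = tampered.
    X = np.vstack(real_windows + fake_windows)                       # shape (n, 225)
    y = np.array([0] * len(real_windows) + [1] * len(fake_windows))  # labels
    return X, y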


Deepfake Classifier:

The final stage is a Fully Connected Network (FCN) that discriminates among the blink-probability features and predicts the class. A forward pass applies an individual weight to each connected neuron; backpropagation then updates the weights using adaptive moment estimation (Adam), whose adaptive gradients adjust the learning rate during training, with cross-entropy loss guiding the error calculation. The FCN processes the sequences representing eye-blinking states and was tested with different architectures: three experiments explore varying parameters, inputs are fed in batches in each epoch, and model accuracy is assessed. The study emphasizes efficient layer combinations and optimal training settings for the FCN's performance; a training-step sketch follows.
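A minimal PyTorch sketch of such a classifier and one training step; the layer widths (128 and 64) and the learning rate are illustrative assumptions, since the report compares several architectures:

import torch
import torch.nn as nn

fcn = nn.Sequential(           # fully connected layers; widths are assumptions
    nn.Linear(225, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 2),          # two classes: authentic vs. tampered
)
optimizer = torch.optim.Adam(fcn.parameters(), lr=1e-3)  # adaptive moment estimation
criterion = nn.CrossEntropyLoss()                        # cross-entropy loss

def train_step(x_batch, y_batch):
    optimizer.zero_grad()
    loss = criterion(fcn(x_batch), y_batch)  # forward pass + error calculation
    loss.backward()                          # backpropagation
    optimizer.step()                         # adaptive gradient update
    return loss.item()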

Image Processing:

To transform the video datasets into image frames, I employed the Dlib library. I used a pre-existing shape predictor to estimate 68 facial landmarks (https://github.com/italojs/facial-landmarks-recognition/blob/master/shape_predictor_68_face_landmarks.dat), which facilitated the extraction of key facial features.

Fig 4. 68 landmarks in the Dlib library


Landmark indices     Respective region of the face
0 to 16              Jawline
17 to 21             Right eyebrow
22 to 26             Left eyebrow
27 to 35             Nose
36 to 41             Right eye
42 to 47             Left eye
48 to 67              Mouth (48 to 59 outer lip, 60 to 67 inner lip)

Table 1. Dlib landmarks and their respective regions of the face

Fig 5. Real landmark detection using Dlib
Fig 6. Dlib landmarks applied to images in the dataset
Fig 7. Flowchart of Image Processing

Step-by-Step Guide to Utilizing the Dlib Library for Extracting Facial Landmarks from Video Datasets

Step 1: Import the required libraries for image processing

 The code below imports cv2 and dlib for pre-processing of images and videos, and os to interact with the operating system, allowing the Python script to perform tasks related to file and directory manipulation, environment variables, and system commands.
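import os    # file and directory manipulation, environment variables, system commands
import cv2   # OpenCV: reading videos and writing image frames
import dlib  # face detector and 68-point shape predictor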
Step 2: Face detection using landmarks

 In this step, the input video is parsed into individual frames, and facial landmarks are detected within each frame. The script then uses these landmarks to crop each frame precisely, eliminating extraneous background elements. The resulting cropped frames, tailored to the identified landmarks, are stored in a designated folder. This preliminary preprocessing refines the dataset by isolating and preserving the facial regions of interest; the corresponding code sketches appear at the end of this section.

Input Video → Landmark-based Face Detection → Face Alignment

Fig 8. Example of Face Preprocessing using the proposed method

The ongoing work involves cropping the eyes and converting these eye images into RGB format; the extracted eye pixels are then saved for further processing.

Fig 9. Example of left and right eye crops
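A sketch of the eye cropping, using the eye landmark ranges from Table 1; the crop_eye helper and the five-pixel margin are illustrative:

import cv2

def crop_eye(frame_bgr, shape, indices, pad=5):
    # Bounding box around the given landmark indices, with a small margin.
    xs = [shape.part(i).x for i in indices]
    ys = [shape.part(i).y for i in indices]
    x0, y0 = max(min(xs) - pad, 0), max(min(ys) - pad, 0)
    eye = frame_bgr[y0:max(ys) + pad, x0:max(xs) + pad]
    return cv2.cvtColor(eye, cv2.COLOR_BGR2RGB)  # convert to RGB before saving

# right_eye = crop_eye(frame, landmarks, range(36, 42))
# left_eye  = crop_eye(frame, landmarks, range(42, 48))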


The following is a sketch of the face detection using landmarks; the extract_faces helper, the padding-free landmark bounding box, and the frame-naming scheme are illustrative assumptions:
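import os
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_faces(video_path, out_dir):
    # Parse the video into frames, detect faces, crop each frame to the
    # landmark bounding box, and save the crops to the designated folder.
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for face in detector(gray):
            shape = predictor(gray, face)
            xs = [shape.part(i).x for i in range(68)]
            ys = [shape.part(i).y for i in range(68)]
            crop = frame[min(ys):max(ys), min(xs):max(xs)]  # drop background
            cv2.imwrite(os.path.join(out_dir, f"frame_{idx:05d}.png"), crop)
        idx += 1
    cap.release()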
The following sketches the face alignment, assuming the common recipe of rotating the frame so that the two eye centers lie on a horizontal line:
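import cv2
import numpy as np

def align_face(frame, shape):
    # Rotate the frame so the two eye centers lie on a horizontal line,
    # giving uniform eye orientation and direction across frames.
    right = np.mean([(shape.part(i).x, shape.part(i).y) for i in range(36, 42)], axis=0)
    left = np.mean([(shape.part(i).x, shape.part(i).y) for i in range(42, 48)], axis=0)
    angle = np.degrees(np.arctan2(left[1] - right[1], left[0] - right[0]))
    center = ((right[0] + left[0]) / 2.0, (right[1] + left[1]) / 2.0)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    return cv2.warpAffine(frame, M, (frame.shape[1], frame.shape[0]))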
References:

1. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0278989
2. https://www.youtube.com/watch?v=SIZNf_Ydplg&t=676s
3. https://www.youtube.com/watch?v=mckNUYfiqr8&t=511s
