
Deepfake Buster: A Deep Learning-Based Approach to Detect Propaganda Videos and Images

Astitva Narain, Janhvi Tripathi, Kaushiki Tiwari


Department of Computer Science and Engineering, Raj Kumar Goel Institute
of Technology and Management, AKTU, Uttar Pradesh, India

Ms. Tanvi Payal
Assistant Professor, Department of Computer Science and Engineering,
Raj Kumar Goel Institute of Technology and Management, AKTU, Uttar Pradesh, India

ABSTRACT

The surge in artificial intelligence has led to the emergence of deepfake videos,
presenting a significant societal threat due to their potential for misuse in
various domains such as social and political manipulation. These videos, created
through advanced AI techniques, fabricate realistic images and sounds by analyzing
multiple angles of a target individual to mimic their behavior and speech.
Detecting deepfakes is increasingly challenging as creators develop more
sophisticated techniques. To counter this, a proposed model employs deep learning
to scrutinize video frames for inconsistencies in facial features, compression
rates, and other telltale signs of manipulation. This model utilizes a
convolutional neural network (CNN) with transfer learning, trained on the
"DeepFake" dataset, specifically curated for deepfake forensics. The neural
network is trained to identify discrepancies around facial regions, indicative of
deepfake manipulation. The paper details methods to enhance the model's learning
process, emphasizing the importance of robust digital media forensics,
particularly in face forensics.

Keywords: convolutional neural networks, deepfake

1. INTRODUCTION

In today's era, dominated by artificial intelligence and machine learning, the
fabrication of data has become all too common. This distortion of reality poses a
significant challenge as fake images and videos proliferate across the internet.
Deepfakes, powered by advanced technology, are becoming increasingly
indistinguishable from genuine content, making them difficult to detect. While
deepfake technology may offer certain benefits, it also brings about significant
harm.

In a time marked by rampant misinformation, the danger of accepting deepfake
content at face value cannot be overstated. These manipulated videos and images
can be used for various malicious purposes, including defamation of celebrities,
political manipulation, personal attacks, intimidation, propaganda, piracy, and
other nefarious activities.

One of the primary targets for deepfake manipulation is the human face. Many
algorithms and techniques have been developed to detect facial manipulation, which
typically falls into two categories: expression and identity manipulation. In
expression manipulation, the facial expressions of one individual are superimposed
onto another in real-time, while identity manipulation involves swapping the faces
of two individuals. This latter form of manipulation is particularly concerning as
it can be used to spread false information by replacing the face of a well-known
figure.

The paper presents methodologies for detecting deepfakes, employing neural
networks such as Convolutional Neural Networks (CNNs) and various image
preprocessing techniques. Ultimately, the goal is to distinguish genuine images
from fake ones. This is accomplished by training the model on a dataset and
utilizing a suitable CNN architecture to classify images as either real or fake.

The algorithm works in the following three steps:

• Compilation of a comprehensive dataset comprising both authentic and synthetic
videos, subsequently converted into frames.

• Extraction and alignment of facial features, followed by hidden feature
extraction using a pre-trained Convolutional Neural Network (e.g., the VGG16
model, also known as OxfordNet).

• Development of a model trained on the processed dataset, which classifies images
as real or fake, followed by post-processing for video analysis.

2. DEEPFAKE CREATION

The landscape of deepfake creation is vast, with an array of tools available
online catering to users of varying technical proficiency. From novices to
experts, anyone can now generate deepfakes effortlessly, regardless of their
technical knowledge. These deepfake videos vary widely in quality, ranging from
basic, easily identifiable fabrications to highly sophisticated manipulations that
defy detection even by astute observers.

At the core of deepfake creation lie artificial intelligence and deep learning
methods. One commonly employed technique involves leveraging a specific type of
neural network known as an autoencoder. An autoencoder compresses input images
into a lower-dimensional representation with an encoder and then reconstructs them
with a decoder. Notably, the autoencoder operates as a self-supervised algorithm:
the input image itself serves as the training target. An upgrade to this method is
the Generative Adversarial Network (GAN), an unsupervised deep learning algorithm,
which further improves the quality of the deepfakes created. A GAN comprises two
neural networks: the generator and the discriminator.
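
The compress-and-reconstruct behaviour of an autoencoder can be written compactly. The notation below is an assumption introduced for illustration and does not appear in the paper: $f_\theta$ denotes the encoder, $g_\phi$ the decoder, $x$ an input face image and $z$ its compressed code. The squared-error loss is only one common choice of reconstruction objective.

```latex
z = f_\theta(x), \qquad
\hat{x} = g_\phi(z), \qquad
\mathcal{L}(\theta, \phi) = \lVert x - \hat{x} \rVert_2^2
```

Since the reconstruction $\hat{x}$ is compared against the input $x$ itself, training needs no external labels.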

Figure 1: Deepfake generation using an autoencoder-decoder

The generator's role is to produce images closely resembling real ones, while the
discriminator is trained on a dataset containing both authentic and generated
images to distinguish between them. As training progresses, the generator
generates images that are increasingly challenging for the discriminator to
differentiate from real ones. Through this iterative process, both networks
improve and learn from each other, resulting in the generation of higher quality
deepfakes.

Deep Convolutional GANs (DCGANs) further enhance this process by leveraging
convolutional layers for improved efficiency. An individual's face in an existing
image can be replaced with someone else's face by overlaying the latter onto the
former using artificial neural networks. Additionally, techniques like FaceSwap
enable the swapping of faces in manipulated images and videos. This method
utilizes image compression to seamlessly integrate the superimposed face into the
original image, ensuring color matching for a more realistic appearance. With
advancements in artificial intelligence, technologies can even dynamically replace
expressions from one person to another in real-time.

The initial foray into deepfake creation began with FakeApp, an application
enabling users to seamlessly swap faces with those of others. Since then, numerous
similar applications have emerged, including Face Swap, DeepFaceLab, DFaker,
FaceSwap-GAN, DeepFake-tf, and various others. These applications have
proliferated over time, providing users with increasingly sophisticated tools for
generating deepfakes.

3. RELATED WORK

This section aims to explore and evaluate different techniques utilized for
detecting deepfakes, many of which rely on deep learning methodologies. These
approaches are focused on identifying anomalies within the video or individual
frames. Methods employing image analysis target various parameters such as facial
warping artifacts, blinking rates, and head movements. In 2018, "MesoNet" was
introduced, utilizing a variant of the Inception module to detect faults at a
mesoscopic level. Convolutional Neural Networks (CNNs) have demonstrated
exceptional feature extraction capabilities, serving as a cornerstone for models
aiming to identify deepfake videos.

Additionally, several approaches have combined CNNs with other learning models
such as Recurrent Neural Networks (RNNs), Long Short-Term Memory Networks (LSTMs),
and Capsule Networks to enhance accuracy by detecting temporal discrepancies.
These techniques have shown promising results, particularly on datasets containing
videos generated by Face Swap and other deepfake methods. Despite significant
advancements, the quest for more robust models to detect lower quality deepfakes
persists. Maintaining considerable accuracy remains a challenge, especially given
the enhancements in deepfake creation techniques. This paper endeavors to discuss
successful feature extraction and processing methods essential for identifying
discrepancies in deepfake videos.

4. DEEPFAKE DETECTION

Deepfakes employ specialized techniques that typically alter specific areas on the
face, serving as a foundation for superimposition. While these algorithms follow a
similar process for generating different deepfakes, they often leave behind
discernible discrepancies during the editing phase. Key factors such as
compression changes, variations in lighting, and temporal inconsistencies like lip
and eye movements can be specifically targeted to train models to identify
deepfake videos.

Among the proposed methods for deepfake detection, Convolutional Neural Networks
(CNNs) have gained considerable traction. CNNs exhibit remarkable efficacy and
scalability for tasks involving image and video processing compared to other
supervised learning methods in artificial intelligence. CNNs can extract features
from images, which can be applied to various applications. By combining feature
extraction through CNNs with other supervised learning techniques, more precise
models for deepfake detection can be developed.

For practical deployment, transfer learning represents a viable approach for
detecting deepfakes. Transfer learning involves leveraging knowledge gained from
pre-trained models and applying it to related tasks, facilitating the development
of more robust and efficient deepfake detection systems.

Utilizing pre-trained weights of neural networks involves training a refined
version of the same network on a different dataset, tailored for a specific
application. Several pre-trained models, such as VGG Net, Xception, Inception, and
ResNet, have been made available as open-source resources. A proposed approach
involves fine-tuning a pretrained convolutional network like VGG Net, particularly
for image analysis focusing on human faces. This fine-tuned model can leverage the
learned features from the pre-trained network while adapting to the nuances of the
new dataset, enhancing its effectiveness for face-related applications.

Tools like Keras for Python offer convenient functionalities for implementing
neural networks, especially for transfer learning. In this approach, the model
utilizes pre-processed frames (images) from the original dataset and applies
transfer learning using a fine-tuned VGG Net model for deepfake detection.

Figure 2: Process flow for deepfake detection

4.1 Dataset

The Celeb-DF dataset comprises 408 authentic videos and 795 synthesized videos
generated using an altered Deepfake generation algorithm. These videos have an
average duration of 13 seconds and operate at a frame rate of 30 fps. Unlike
previous datasets, the synthesized videos in Celeb-DF exhibit fewer visual
artifacts, resulting in higher video quality. This dataset presents a more
challenging scenario for deepfake detection compared to previous datasets, as its
deepfakes leave fewer visual artifacts to detect, thereby raising the difficulty
level of the problem.

4.2 Proposed Method

This paper introduces a method for training a classifier using video frames as
input. The frames undergo face extraction and alignment before being fed into the
classifier for training.

Prior to training the model, the dataset undergoes preprocessing, which includes
face alignment and extraction. The proposed model focuses on detecting faults
introduced during deepfake creation around the outline of the face. Thus, face
extraction is employed to isolate the area requiring processing, while face
alignment accommodates variations in head positions within the deepfake video.

The proposed classifier is based on a fine-tuned convolutional model trained on
the preprocessed dataset. It utilizes a VGG-16 model as its base, supplemented
with batch normalization, dropout, and a custom two-node dense layer. The final
dense layer comprises two nodes representing the two classes (real and fake).
Batch normalization normalizes and scales inputs from the previous layer, while
dropout reduces overfitting and aids in weight optimization by randomly
deactivating nodes during each training step. This introduces randomness into the
training process, enhancing model robustness.

Additionally, the Adam Optimizer is utilized for its superior performance compared
to other optimizers such as AdaGrad and RMSProp. The Adam Optimizer combines the
benefits of RMSProp with the inclusion of a momentum parameter, resulting in
efficient learning. Maintaining an optimal learning rate, typically around 0.001,
is crucial for successful feature extraction and model training.

Transfer learning is then implemented on this model for a preprocessed dataset.
The image is passed through the neural network, which assesses the image and
extracts simple features from it. These features are then checked for pixel-level
anomalies introduced while creating the fake image, such as traces of lossy
compression, artifacts introduced during image warping, and subtle colour changes.
These areas of discrepancy can be highlighted using the dlib library.

Following training, the results obtained for all frames within a training video
undergo post-processing, where video analysis is conducted.

4.3 Fake Video Detection

The fake image detection model is then applied to each individual frame within the
video. The insights obtained from each frame are aggregated to draw a conclusive
assessment about the video's class. Subsequently, the weights in the network are
updated based on this collective information. This approach offers a simpler
implementation compared to processing videos directly within the neural network
itself.

Adjusting the parameter indicating the number of frames to be used can be done by
monitoring the model's learning process.

During testing, any given video can be processed frame by frame, and the
predictions for each frame are utilized to derive the final classification. This
video processing approach enhances accuracy by incorporating various versions of
similar inputs into the dataset fed to the neural network. Moreover, different
image transformation operations like zooming, flipping, and slight rotation enrich
the dataset, as the frame's output class remains consistent even after these
transformations.

Integrating the feature extraction and pre-processing model with models designed
to detect temporal features, such as Recurrent Neural Networks (RNNs), can further
improve accuracy.

5. RESULT ANALYSIS

The model described in the paper achieves an accuracy of approximately 70%.
Figure 3 illustrates the plot of the categorical cross-entropy loss function value
against the accuracy for the model trained over 20 epochs. As depicted, with each
epoch, the loss diminishes while accuracy ascends.

Figure 3: Loss vs. Accuracy curve for 20 epochs
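
As a rough illustration of how the per-frame predictions of Section 4.3 could be aggregated into a video-level class, and how an accuracy figure could then be computed, the following Python sketch may help. It rests on assumptions that the paper does not state: the aggregation rule (mean fake-probability over frames with a 0.5 threshold), the function names `classify_video` and `video_accuracy`, and the toy numbers are all hypothetical and are not the authors' implementation or results.

```python
from statistics import mean


def classify_video(frame_fake_probs, threshold=0.5):
    """Label a video 'fake' if the mean per-frame fake probability reaches the threshold.

    Averaging is an assumed aggregation rule; majority voting is another option.
    """
    if not frame_fake_probs:
        raise ValueError("at least one frame prediction is required")
    return "fake" if mean(frame_fake_probs) >= threshold else "real"


def video_accuracy(predicted_labels, true_labels):
    """Return the fraction of videos whose predicted label matches the ground truth."""
    if not true_labels or len(predicted_labels) != len(true_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


# Toy per-frame classifier outputs for three videos (made-up numbers).
videos = {
    "clip_a": [0.91, 0.85, 0.40, 0.77],
    "clip_b": [0.10, 0.35, 0.22],
    "clip_c": [0.55, 0.45, 0.30],
}
truth = {"clip_a": "fake", "clip_b": "real", "clip_c": "fake"}

predictions = {name: classify_video(probs) for name, probs in videos.items()}
accuracy = video_accuracy(
    [predictions[name] for name in videos],
    [truth[name] for name in videos],
)
```

In this toy run two of the three videos are classified correctly; `clip_c` is missed because only one of its frames scores above the threshold, which shows how the choice of aggregation rule affects the video-level result.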

Enhancing the model's learning can be achieved by increasing the number of epochs
for more comprehensive training. The model demonstrates relatively good accuracy
over a dataset comprising low-resolution images. However, learning from
low-resolution images poses challenges, and further efforts are required to
construct a high-resolution dataset for deepfakes.

Another challenge arises in the form of compression. While signature styles for
compression in deepfakes can aid in training the model, compression may still
introduce errors in learning. Therefore, techniques for temporal analysis are
necessary to address this issue effectively.

6. DISCUSSIONS, CONCLUSIONS AND FUTURE RESEARCH DIRECTIONS

Deepfake detection is a major need in today's world and calls for robust
techniques, as detecting deepfakes will become more challenging in the future. As
deepfakes can have major social and political impact, detection techniques should
be improved continuously.

In this paper, the proposed method applies transfer learning to a VGG-16 model
trained on the dataset and focuses on facial manipulation for detecting forgery.
Transfer learning is essential as the model should be trained in a reasonable
amount of time and should require minimum resources to give the desired accuracy
for its classification over varied examples in the dataset. The proposed model
works well and can successfully gather features required for further processing to
test for deepfakes.

To enhance performance, future research could focus on detecting temporal and
audio discrepancies, integrating this information with features extracted from the
image processing module.

The proposed model's accuracy tends to decrease with low-quality images, and
additional efforts are needed to increase accuracy further, particularly when
dealing with medium-quality videos. Combined models trained on temporal parameters
could be employed to achieve this. Improving the dataset quality is crucial for
enhancing training effectiveness.

Ensemble learning techniques offer another avenue for improving model accuracy and
accommodating dataset variance. Aggregating results across each frame and various
learning models can yield optimal outcomes.

The authors anticipate that the techniques outlined for deepfake analysis will
catalyze further exploration in image and video forgery and digital media
forensics.

7. REFERENCES

1. Karen Simonyan and Andrew Zisserman, “Very Deep Convolutional Networks for
Large-Scale Image Recognition”, ICLR 2015, arXiv:1409.1556v6 [cs.CV], 10 Apr 2015.

2. 107 California Law Review (2019, Forthcoming); U of Texas Law, Public Law
Research Paper No. 692; U of Maryland Legal Studies Research Paper No. 2018-21.

3. Yuezun Li and Siwei Lyu, “Exposing DeepFake Videos By Detecting Face Warping
Artifacts”, arXiv:1811.00656v3 [cs.CV], 22 May 2019.
https://doi.org/10.1016/S09694765(19)30137-7

4. Darius Afchar, Vincent Nozick, Junichi Yamagishi and Isao Echizen, “MesoNet: a
Compact Facial Video Forgery Detection Network”, arXiv:1809.00888v1 [cs.CV],
4 Sep 2018. https://doi.org/10.1109/WIFS.2018.8630761

5. Xin Yang, Yuezun Li and Siwei Lyu, “Exposing Deep Fakes Using Inconsistent Head
Poses”, ICASSP 2019 - 2019 IEEE ICASSP, 17 May 2019.

6. Schwartz, Oscar (12 November 2018). "You thought fake news was bad? Deep fakes
are where the truth goes to die". The Guardian.

7. https://medium.com/@sh.tsang/reviewinception-v3-1st-runner-up-imageclassification-in-ilsvrc-2015-17915421f77c

8. Brian Dolhansky, Russ Howes, Ben Pflaum, Nicole Baram, Cristian Canton Ferrer,
“The Deepfake Detection Challenge (DFDC) Preview Dataset”, arXiv:1910.08854
[cs.CV], 19 Oct 2019.

9. Pavel Korshunov and Sebastien Marcel, “Vulnerability assessment and detection
of Deepfake videos”, IAPR International Conference 2019.

16. Thanh Thi Nguyen, Cuong M. Nguyen, Dung Tien Nguyen, Duc Thanh Nguyen, Saeid
Nahavandi, “Deep Learning for Deepfakes Creation and Detection”, arXiv:1909.11573
[cs.CV], 25 Sep 2019.

10. Ekraam Sabir, Jiaxin Cheng, Ayush Jaiswal, Wael AbdAlmageed, Iacopo Masi, Prem
Natarajan, “Recurrent Convolutional Strategies for Face Manipulation Detection in
Videos”, arXiv:1905.00582 [cs.CV], 2 May 2019.

11. https://doi.org/10.1109/ICCRE.2019.8724212

12. Francesco Marra, Diego Gragnaniello, Davide Cozzolino, Luisa Verdoliva,
“Detection of GAN-Generated Fake Images Over Social Networks”, IEEE Conference on
Multimedia Information Processing and Retrieval (MIPR), 10-12 April 2018.

13. Shubhangi Tirpude, Naman Vidyabhanu, Hashir Sheikh, Shoeb Pathan, Zeeshan Ali
Syed, Shivam Singh, “Abnormal X-Ray Detection System using Convolution Neural
Network”, International Journal of Advanced Trends in Computer Science and
Engineering, ISSN 2278-3091, Volume 9, No. 1, January-February 2020.
