Presentation On A Deep Learning Approach To Learn Lip Sync From Audio
Pre-defense
Introduction
With the advancement of information technology, media has undergone an epoch-making change over the past few years, moving into a new era of audio and visual content driven by the popularity of, and demand for, creative generated media.
Motivation
• Our approach is based on synthesizing video from audio in the region
around the mouth, and using compositing techniques to borrow the rest
of the head and torso from other stock video.
Objectives
• Generate a photorealistic mouth texture that preserves fine detail in the
lips and teeth and reproduces time-varying wrinkles and dimples
around the mouth and chin.
Methodology
Our Wav2Lip model produces significantly more accurate lip
synchronization in dynamic, unconstrained talking-face videos. Quantitative
metrics indicate that the lip sync in our generated videos is almost as good
as in real synced videos.
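The quantitative metrics referred to above are, in Wav2Lip-style evaluations, LSE-D (lip-sync error distance) and LSE-C (confidence), computed from a SyncNet-style expert that embeds lip crops and audio windows into a shared space. The sketch below is only illustrative: the embeddings are random placeholders, and the frame count, dimension, and `max_offset` are assumed values, not the real evaluation pipeline (which uses a pretrained SyncNet).

```python
import numpy as np

def offset_distance(video_emb, audio_emb, offset):
    # Mean Euclidean distance between video frame t and audio frame t+offset,
    # averaged over the overlapping portion of the two sequences.
    T = len(video_emb)
    if offset >= 0:
        v, a = video_emb[: T - offset], audio_emb[offset:]
    else:
        v, a = video_emb[-offset:], audio_emb[: T + offset]
    return float(np.linalg.norm(v - a, axis=1).mean())

def lse_metrics(video_emb, audio_emb, max_offset=5):
    # LSE-D: distance at zero offset (lower = better sync).
    # LSE-C: margin between the median and minimum distance over a window
    #        of offsets (higher = the model is more confident the streams
    #        are in sync at some offset).
    dists = [offset_distance(video_emb, audio_emb, o)
             for o in range(-max_offset, max_offset + 1)]
    lse_d = dists[max_offset]
    lse_c = float(np.median(dists) - min(dists))
    return lse_d, lse_c

# Demo with hypothetical embeddings: a well-synced clip should score a
# lower LSE-D and a higher LSE-C than a mismatched one.
rng = np.random.default_rng(0)
audio = rng.normal(size=(100, 64))
synced = audio + 0.05 * rng.normal(size=(100, 64))   # lips match the audio
mismatched = rng.normal(size=(100, 64))              # unrelated lip motion
d_sync, c_sync = lse_metrics(synced, audio)
d_mis, c_mis = lse_metrics(mismatched, audio)
```

With real SyncNet embeddings the same comparison holds: generated videos that track the audio closely score near the LSE-D/LSE-C of ground-truth synced video.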
Outcome