
Literature Review

• [1] Wu et al. introduce an approach that bridges the gap between static image models and dynamic video generation. By harnessing a pretrained text-to-image diffusion model, the authors demonstrate how a one-shot tuning process on a single video-caption pair can adapt the model to generate videos from textual prompts (a minimal sketch of this tuning loop follows Figure 1 below).
[Figure 1: Tune-A-Video architecture [1]. The input video and caption ("A rabbit is eating watermelon on the table.") are noised, passed through a 3D U-Net with a temporal component, and denoised under classifier-free guidance on the edit prompt ("A dog is eating burger") to produce the output video.]
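
Building on the pipeline in Figure 1, the following is a minimal PyTorch sketch of the one-shot tuning loop, assuming a hypothetical inflated 3D U-Net in which only the temporal-attention parameters are trainable; all names and interfaces here are illustrative stand-ins, not the authors' code.

import torch

def tune_one_shot(unet, video_latents, text_emb, alphas_cumprod, steps=500, lr=3e-5):
    # Freeze everything, then unfreeze only the temporal-attention weights;
    # the spatial weights keep their pretrained image-model values.
    for p in unet.parameters():
        p.requires_grad_(False)
    temporal = [p for n, p in unet.named_parameters() if "temporal_attn" in n]
    for p in temporal:
        p.requires_grad_(True)
    opt = torch.optim.AdamW(temporal, lr=lr)

    for _ in range(steps):
        t = torch.randint(0, len(alphas_cumprod), (1,))
        noise = torch.randn_like(video_latents)                    # (frames, C, H, W)
        a = alphas_cumprod[t].view(1, 1, 1, 1)
        noisy = a.sqrt() * video_latents + (1 - a).sqrt() * noise  # forward noising
        pred = unet(noisy, t, text_emb)                            # predict the added noise
        loss = torch.nn.functional.mse_loss(pred, noise)           # standard epsilon objective
        opt.zero_grad(); loss.backward(); opt.step()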

• [2] Zhao et al. present "ControlVideo", a method for incorporating conditional control into one-shot text-to-video editing. The framework allows dynamic adjustment of video content based on textual descriptions, enabling precise manipulation of videos with minimal input (see the sketch below).
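
As a rough illustration of what control-conditioned denoising can look like, here is a minimal sketch that combines a ControlNet-style residual branch with classifier-free guidance; unet and control_net follow hypothetical interfaces and are not the paper's actual API.

import torch

@torch.no_grad()
def guided_step(unet, control_net, latents, t, text_emb, null_emb, control_map, scale=7.5):
    # ControlNet-style branch: encode the per-frame control signal
    # (e.g. edge maps) into residual features injected into the U-Net.
    residuals = control_net(control_map, t, text_emb)

    eps_cond = unet(latents, t, text_emb, residuals=residuals)    # conditional pass
    eps_uncond = unet(latents, t, null_emb, residuals=residuals)  # unconditional pass
    # Classifier-free guidance: push the prediction toward the text condition.
    return eps_uncond + scale * (eps_cond - eps_uncond)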

• [3] Ge et al. tackle the challenge of preserving temporal consistency in generated videos. The work introduces a noise prior specifically designed for video diffusion models, aimed at enhancing the coherence and stability of video frames across time (see the sketch below).
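
One way such a prior can be realized is a mixed noise model in which every frame shares a common noise component, so adjacent frames are denoised from correlated starting points. The sketch below is an illustration in that spirit; the mixing weight alpha is an assumed parameter, not a value taken from [3].

import torch

def mixed_noise(num_frames, shape, alpha=1.0):
    # Shared component, common to all frames, plus an independent per-frame part.
    shared = torch.randn(1, *shape).expand(num_frames, *shape)
    indep = torch.randn(num_frames, *shape)
    # Weights chosen so each frame's noise keeps unit variance:
    # alpha/(1+alpha) + 1/(1+alpha) = 1.
    w_shared = (alpha / (1.0 + alpha)) ** 0.5
    w_indep = (1.0 / (1.0 + alpha)) ** 0.5
    return w_shared * shared + w_indep * indep

Larger alpha strengthens the shared component and hence the frame-to-frame correlation, trading per-frame diversity for temporal stability.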

