DreamFusion: Text-to-3D using 2D Diffusion
[Poole et al., ICLR 2023]
Team 15
20190156 Yun Kim
20190063 Ki Nam Kim
https://forums.fast.ai/t/new-paper-upainting-unified-text-to-image-diffusion-generation-with-cross-modal-guidance/101669
Q. Can we train a text-to-3D model directly using a text-3D object pair dataset?
Text-3D pair dataset (800K) ≪ Text-Image pair dataset (5B)
https://blog.allenai.org/objaverse-a-universe-of-annotated-3d-objects-718ef3d61fd6
https://paperswithcode.com/dataset/laion-5b
Background: NeRF
Issue using NeRF in text-to-3D
• Training NeRF requires multiple images of the scene from various viewpoints.
• In text-to-3D, however, we have no ground-truth images, only a single text prompt. → We can't train NeRF in the usual way.
Q. Then how can we train NeRF without ground-truth images, using only a single text prompt?
[Figure: DreamFusion training loop. Text prompt "A yellow lego bulldozer" → NeRF model renders an image → the rendered image is input to a text-to-image diffusion model → SDS loss (contains information on how to adjust the rendered image to align with the provided text) → backpropagation optimizes NeRF.]
[Figure: reverse diffusion process of the text-to-image diffusion model. Starting from pure noise x_T, the model denoises step by step, x_T → x_{T−1} → … → x_2 → x_1 → x_0, with the text prompt conditioning every denoising step.]
Denoise
Key point!
The diffusion model doesn't predict the denoised image directly; it predicts the noise first and subtracts it to denoise the image.
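The key point above can be sketched numerically: if the model's noise prediction were perfect, subtracting the (scaled) predicted noise from the noisy image recovers the clean image exactly. This toy NumPy sketch uses the standard DDPM forward formula with an assumed schedule value; it is an illustration, not the real diffusion model.

```python
import numpy as np

rng = np.random.default_rng(0)

alpha_bar_t = 0.5                   # cumulative noise-schedule value at timestep t (assumed)
x0 = rng.standard_normal((8, 8))    # stand-in for a clean image
eps = rng.standard_normal((8, 8))   # the noise that was actually added

# Forward process: make the noisy image x_t
x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1 - alpha_bar_t) * eps

# The model predicts the noise eps_hat, not the clean image. With a perfect
# prediction (eps_hat == eps), subtracting the scaled noise recovers x0:
eps_hat = eps
x0_recovered = (x_t - np.sqrt(1 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)

print(np.allclose(x0_recovered, x0))  # True
```

In practice the prediction is imperfect, so the residual between the predicted and true noise is exactly the signal DreamFusion exploits.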
Dreamfusion Pipeline
Step-by-step: how NeRF is optimized using SDS loss
1. Render image x_0 from NeRF.
2. Generate random noise ε.
3. Select a random denoise timestep t: t = random(0, T).
4. Add the noise to make the noisy image x_t at timestep t.
5. Predict the noise ε̂_t with the text-to-image diffusion model, conditioned on the text prompt "A yellow lego bulldozer".
6. Use (ε̂_t − ε) as the SDS update direction that tells NeRF how to render a better image, and backpropagate it through the renderer to optimize NeRF.
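The pipeline steps can be sketched as a training loop. This is a minimal NumPy sketch, assuming a stand-in renderer and a dummy frozen noise predictor (a real implementation would use a NeRF and a pretrained text-to-image diffusion model, and `render_nerf` / `predict_noise` are hypothetical names); the toy linear noise schedule is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def render_nerf(theta):
    """Stand-in for NeRF rendering: the parameters ARE the image (assumption)."""
    return theta

def predict_noise(x_t, t):
    """Stand-in for the frozen text-to-image diffusion model (assumption).
    A real model would also be conditioned on the text prompt."""
    return 0.1 * x_t  # dummy noise prediction

theta = rng.standard_normal((8, 8))  # "NeRF parameters" (here: the image itself)
T = 1000
lr = 0.1

for step in range(100):
    x0 = render_nerf(theta)                      # 1. render image from NeRF
    eps = rng.standard_normal(x0.shape)          # 2. generate random noise
    t = int(rng.integers(0, T))                  # 3. pick a random timestep
    alpha_bar = 1.0 - t / T                      # toy noise schedule (assumption)
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * eps  # 4. add noise
    eps_hat = predict_noise(x_t, t)              # 5. predict the noise
    # SDS: (eps_hat - eps) is the update direction, pushed straight back
    # through the renderer (the diffusion model's Jacobian is skipped).
    grad = eps_hat - eps
    theta -= lr * grad                           # 6. backprop / update NeRF
```

The design choice worth noting is step 6: SDS treats (ε̂_t − ε) itself as the gradient of the rendered image, so the expensive diffusion U-Net never needs to be differentiated through.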
Result: Examples
Limitation
• SDS loss is not a perfect loss function; it often produces oversmoothed results.
• DreamFusion uses the 64x64 Imagen model, so the image resolution is limited to 64x64.
Oversmoothed example
(64x64)
Limitation
Janus Problem
DreamFusion approximates the view direction by categorizing camera angles into four rough categories: "overhead", "front", "side", and "back".
However, this coarse conditioning can cause the same features (e.g., faces, eyes) to appear at multiple angles.
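The coarse view conditioning amounts to appending one of four phrases to the text prompt based on the camera angle. A minimal sketch, assuming illustrative angle thresholds (the function name and exact cutoffs are ours, not from the paper):

```python
def view_dependent_prompt(prompt, elevation_deg, azimuth_deg):
    """Append a coarse view phrase to the prompt based on camera angles.
    The specific thresholds below are illustrative assumptions."""
    if elevation_deg > 60:
        view = "overhead view"
    elif azimuth_deg < 45 or azimuth_deg > 315:
        view = "front view"
    elif 135 < azimuth_deg < 225:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"

print(view_dependent_prompt("a yellow lego bulldozer", 10, 180))
# → "a yellow lego bulldozer, back view"
```

Because everything between these coarse bins shares the same phrase, the diffusion model can still prefer a "front"-looking face at many azimuths, which is one explanation for the Janus problem.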
Contribution
Papers inspired by Dreamfusion
- As the originator of using a 2D diffusion model to create 3D objects, this methodology inspired many subsequent studies that improved the SDS loss, producing better text-to-3D models.
- It offers a revolutionary approach to 3D tasks: instead of relying on scarce 3D data, it utilizes abundant 2D data alone.
Thank You
Team 15
20190156 Yun Kim
20190063 Ki Nam Kim
Quiz
https://forms.gle/HG67Nz3DrawxLVkq7