S52163 - 3D by AI: Using Generative AI and NeRFs for Building Virtual Worlds, With Q&A in Japanese


3D by AI: Using Generative AI and NeRFs for Building

Virtual Worlds [S52163]


Gavriel State, Senior Director, Simulation and AI | GTC Online - March 2023
1
NVIDIA Omniverse
USD Based Platform for Creating and Connecting Virtual Worlds

USD | Physics | Materials | Path-Tracing | AI

2
Omniverse: 3D Workflows and Tools for Every Industry

ARCHITECTURE, ENGINEERING, AND CONSTRUCTION | MEDIA, ENTERTAINMENT, AND GAME DEVELOPMENT | PRODUCT DESIGN AND MANUFACTURING

SCIENTIFIC VISUALIZATION | ROBOTICS | AUTONOMOUS VEHICLES

3
Virtual Worlds Need Content
Without it, we just have a blank slate

4
Traditional Approach: Manual Artist-Created Assets
Very labour intensive – impossible to scale!

5
AI can Help Everywhere

Lecture hall with green chairs

Asset Creation | Behavior & Animation | World Capture + Augmentation


6
Neural Radiance Fields (NeRFs) introduced in 2020
NeRFs are an astonishing new method for capturing reality

Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. 2020. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis.

7
What is a NeRF exactly?
Light field learned by a network from 2D inputs

Note: Lighting for the scene is baked into the NeRF during training!

8
How do we render NeRFs?
Evaluate the network using a set of input rays from the camera

Once trained, a NeRF can be rendered from arbitrary new camera positions.

9
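Concretely, rendering samples points along each camera ray, queries the network for a density and color at each sample, and composites them with the volume-rendering sum from the original paper. A minimal PyTorch sketch of that compositing step follows; nerf_mlp, the tensor shapes, and the near/far bounds are illustrative placeholders, not the Instant NGP or Omniverse API.

import torch

def render_rays(nerf_mlp, rays_o, rays_d, near=0.1, far=4.0, n_samples=64):
    """Composite color along rays using the standard NeRF volume-rendering sum."""
    # Sample depths along each ray: [n_rays, n_samples]
    t = torch.linspace(near, far, n_samples, device=rays_o.device)
    t = t.expand(rays_o.shape[0], n_samples)
    # 3D sample positions and view directions: [n_rays, n_samples, 3]
    pts = rays_o[:, None, :] + rays_d[:, None, :] * t[..., None]
    dirs = rays_d[:, None, :].expand_as(pts)

    # Query the network for per-sample density sigma and color rgb
    sigma, rgb = nerf_mlp(pts, dirs)          # [n_rays, n_samples], [n_rays, n_samples, 3]

    # Distances between adjacent samples (last segment extends to infinity)
    delta = torch.cat([t[:, 1:] - t[:, :-1],
                       torch.full_like(t[:, :1], 1e10)], dim=-1)
    # Opacity of each segment and transmittance accumulated up to it
    alpha = 1.0 - torch.exp(-sigma * delta)
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                     1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]
    weights = alpha * trans                    # [n_rays, n_samples]

    # Weighted sums give the pixel color and the expected depth
    color = (weights[..., None] * rgb).sum(dim=1)
    depth = (weights * t).sum(dim=1)
    return color, depth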
2022: NVIDIA Instant NGP – Real-Time NeRFs
TIME Magazine Named NVIDIA Instant NeRF a Best Invention of 2022

The original neural radiance fields paper from 2020 took many hours to train a model and rendered at 0.03 fps. With Instant NGP, NeRFs train in seconds and render at real-time rates (30 fps or faster).

Thomas Müller, Alex Evans, Christopher Schied, and Alexander Keller

10
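Much of Instant NGP's speedup comes from swapping the large MLP input encoding for a multiresolution hash grid of learned features that feeds a tiny network. The sketch below approximates the lookup for a single resolution level; the hashing primes follow the paper's description, but the code itself is a simplified illustration, not NVIDIA's tiny-cuda-nn implementation.

import torch

# Spatial-hashing primes described in the Instant NGP paper; the rest is a simplification.
PRIMES = (1, 2654435761, 805459861)

def hash_encode_level(x, table, resolution):
    """Trilinearly interpolated hash-grid features for 3D points at one resolution level.

    x:     [n, 3] points in [0, 1]
    table: [table_size, feat_dim] learnable feature vectors for this level
    """
    table_size = table.shape[0]
    primes = torch.tensor(PRIMES, device=x.device)
    xs = x * resolution
    x0 = torch.floor(xs).long()          # lower corner of the enclosing voxel
    frac = xs - x0.float()               # position within the voxel

    feats = 0.0
    for dx in (0, 1):                    # accumulate the 8 voxel corners
        for dy in (0, 1):
            for dz in (0, 1):
                corner = x0 + torch.tensor([dx, dy, dz], device=x.device)
                # Spatial hash: XOR of coordinates scaled by large primes, modulo table size
                cx, cy, cz = (corner * primes).unbind(-1)
                idx = (cx ^ cy ^ cz) % table_size
                w = ((frac[:, 0] if dx else 1 - frac[:, 0]) *
                     (frac[:, 1] if dy else 1 - frac[:, 1]) *
                     (frac[:, 2] if dz else 1 - frac[:, 2]))
                feats = feats + w[:, None] * table[idx]
    return feats                          # [n, feat_dim] features fed to the small MLP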
Real-time NeRF rendering
Real-time NeRF rendering
Instant NGP NeRFs in VR
With Editing!

13
Limitations of today are tomorrow’s applied research
Over 140 research papers related to NeRFs in 2022 alone

Eikonal Fields for Refractive Novel-View Synthesis (Bemana et al., SIGGRAPH 2022)
Decomposing NeRF for Editing via Feature Field Distillation (Kobayashi et al., NeurIPS 2022)
NeRFReN: Neural Radiance Fields with Reflections (Guo et al., CVPR 2022)
NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes (Vora, Radwan, et al., TMLR 2022)

14
NeRF capture tools beginning to appear broadly
Luma AI iPhone Capture Tool Example

15
NeRF Environments + NeRF Objects
Two ways to use NeRF in a virtual world

16
NeRF Objects In Omniverse
Development Prototype

17
NeRF Rendering
Depth Based Composition + Proxy Mesh

Diagram: the RTX renderer's RGB + depth output is composited with the inferenced NeRF output to produce the final composited output; reflections and shadows are provided by the proxy mesh.

18
Standardization for Production

+ NeRF

19
3D Capture with Traditional Asset Formats

20
Forward (Traditional) 3D Rendering

Diagram: Mesh + Material → Renderer → Output

21
Differentiable Rendering

Diagram: Mesh + Material → Renderer → Output, with gradients flowing back through the renderer

NVDiffRast: https://nvlabs.github.io/nvdiffrast/
DIB-R: https://nv-tlabs.github.io/DIB-R/

22
Differentiable Rendering – Inverse Graphics

Diagram: Mesh + Material → Renderer → Output; a loss against reference images is backpropagated to optimize the mesh and material

23
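In practice, inverse graphics is gradient descent through the renderer: render the current mesh and material estimate, compare against reference photos, and backpropagate the image loss into the scene parameters. A hedged PyTorch-style sketch follows; differentiable_render stands in for a real differentiable rasterizer such as nvdiffrast or DIB-R, and all names and shapes here are assumptions.

import torch

def fit_scene(differentiable_render, target_images, cameras,
              vertices, texture, n_iters=1000, lr=1e-2):
    """Recover geometry and appearance by optimizing through a differentiable renderer."""
    # Treat vertex positions and texture as learnable parameters
    vertices = vertices.clone().requires_grad_(True)
    texture = texture.clone().requires_grad_(True)
    opt = torch.optim.Adam([vertices, texture], lr=lr)

    for it in range(n_iters):
        opt.zero_grad()
        loss = 0.0
        for img, cam in zip(target_images, cameras):
            # Forward: rasterize the current estimate from this camera
            rendered = differentiable_render(vertices, texture, cam)
            # Image-space loss against the reference photo
            loss = loss + torch.nn.functional.mse_loss(rendered, img)
        # Backward: gradients flow through the renderer to geometry and texture
        loss.backward()
        opt.step()
    return vertices.detach(), texture.detach()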

3D MoMa / nvdiffrec

Diff render

End-to-end Training
Joint optimization

Extracting Triangular 3D Models, Materials, and Lighting From Images


Munkberg, Hasselgren, Shen, Gao, Chen, Evans, Müller, Fidler
CVPR 2022 (Oral)

24
Learning Example

25
Real World Capture

26
3D Capture from a Single Image

27
GANverse 3D
Generative Adversarial Networks + Differentiable Rendering

Diagram: a photo is fed to a neural network that predicts explicit geometry, texture, and lighting; a differentiable rendering pipeline re-renders the prediction, and losses (including a mask loss) against the photo train the network.

28
Training data: images only! No 3D needed!

•Car: 1,422,984
•Van: 2,674,607
•Bus: 4,799,402
•Building: 1,638,425
•Bicycle: 3,071,418
•Tricycle: 111,341
•Traffic sign: 150,647
•Dog: 1,731,929
•Human: 3,276,718
•Pedestrian: 659,542
•Person: 507,506
•Skater: 2,454,728
•Skateboard: 1,073,733

29
GANverse3D: 3D Content Creation from Images

• 3D (mesh) • texture • part map


30
GANverse3D Omniverse Extension

31
Digression: Latent Spaces / Embeddings

32
DeepSearch
AI Search of Nucleus Server

33
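An embedding maps an asset or a text query to a point in a latent space where semantically similar things land close together; nearest-neighbor search in that space is what enables tools like DeepSearch. A minimal sketch with cosine similarity follows; the embed function and nucleus_asset_list are placeholders for whatever encoder and asset index are actually used.

import numpy as np

def cosine_similarity(a, b):
    """Similarity of two embedding vectors: 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query, asset_paths, embed, top_k=5):
    """Rank assets by how close their embeddings are to the query embedding."""
    q = embed(query)                        # e.g. a text encoder
    scored = [(path, cosine_similarity(q, embed(path))) for path in asset_paths]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Usage sketch: search("lecture hall with green chairs", nucleus_asset_list, embed)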
DeepSearch
Latent Space Visualization

34
GET3D
3D Asset Creation

35
GET3D

Diagram: at inference, separate geometry and texture latents pass through mapping networks to produce geometry and texture features, which are decoded into a textured mesh. During training, the mesh is rendered with differentiable rendering and a GAN discriminator judges the geometry and texture as real or fake.

36
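The structural idea is two independent latent codes, one for geometry and one for texture, each passed through its own mapping network before decoding into a textured mesh that GAN discriminators score via differentiable renders. A deliberately toy PyTorch sketch of that two-branch layout follows; the layer sizes and decoder heads are illustrative assumptions, not the published GET3D architecture.

import torch
import torch.nn as nn

class TwoBranchGenerator(nn.Module):
    """Toy GET3D-style generator: separate geometry and texture latent branches."""
    def __init__(self, z_dim=512, w_dim=512, n_verts=1024):
        super().__init__()
        self.n_verts = n_verts
        # One mapping network per branch, as in StyleGAN-style generators
        self.map_geo = nn.Sequential(nn.Linear(z_dim, w_dim), nn.ReLU(),
                                     nn.Linear(w_dim, w_dim))
        self.map_tex = nn.Sequential(nn.Linear(z_dim, w_dim), nn.ReLU(),
                                     nn.Linear(w_dim, w_dim))
        # Placeholder decoders standing in for the SDF/mesh and texture-field heads
        self.decode_geometry = nn.Linear(w_dim, 3 * n_verts)   # e.g. vertex offsets
        self.decode_texture = nn.Linear(w_dim, 3 * n_verts)    # e.g. per-vertex RGB

    def forward(self, z_geo, z_tex):
        w_geo = self.map_geo(z_geo)
        w_tex = self.map_tex(z_tex)
        vertices = self.decode_geometry(w_geo).view(-1, self.n_verts, 3)
        colors = torch.sigmoid(self.decode_texture(w_tex)).view(-1, self.n_verts, 3)
        # A textured mesh, to be differentiably rendered for the GAN loss
        return vertices, colors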
GET3D
Fine grained interpolation

37
GET3D
Locally perturbing the latent codes: creating content variations

38
3D Content Creation: text to 3D

39
Back to 2D: Diffusion Models
Picasso / Edify-Image

Example prompts:
"A highly detailed digital painting of a portal in a mystic forest with many beautiful trees. A person is standing in front of the portal."
"A highly detailed zoomed-in digital painting of a cat dressed as a witch wearing a wizard hat in a haunted house, artstation."
"An image of a beautiful landscape of an ocean. There is a huge rock in the middle of the ocean. There is a mountain in the background. Sun is setting."

Diffusion Process: Pure Noise → Final Image

40
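A diffusion model is trained to predict the noise that was added to an image; generation runs that process in reverse, starting from pure noise and denoising step by step. A compact DDPM-style sampling loop is sketched below; the noise-prediction model and the linear beta schedule are generic placeholders, not Edify-Image internals.

import torch

@torch.no_grad()
def ddpm_sample(model, shape, n_steps=1000, device="cuda"):
    """Generate an image by reversing the diffusion process, one denoising step at a time."""
    betas = torch.linspace(1e-4, 0.02, n_steps, device=device)   # noise schedule
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape, device=device)                        # start from pure noise
    for t in reversed(range(n_steps)):
        eps = model(x, t)                                        # predicted noise at step t
        # Remove the predicted noise component (DDPM mean estimate)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:
            # Add back a smaller amount of noise, except at the final step
            x = mean + torch.sqrt(betas[t]) * torch.randn_like(x)
        else:
            x = mean
    return x  # final image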
Diffusion Models
Text Generation with Edify-Image

"A photo of a golden retriever puppy wearing a green shirt. The shirt has text that says “NVIDIA rocks”. Background office. 4k dslr."

Comparison panels: Stable Diffusion | DALL·E 2 | Edify-Image

41


AI Based Texture + Material Generation
Using 2D Generative Models for 3D Applications

43
AI Based Texture + Material Generation
Wrapping Generated Materials around a Model

“Black and yellow wrap” | “Cheetah-inspired style”

44
AI Based Texture + Material Generation
Wrapping Generated Materials around a Model

45
Bringing it all Together
Time for some real magic

Diagram: a text input provides the latent-space embedding that conditions the diffusion process (pure noise → final image); the diffusion model is combined with Instant-NGP NeRFs + hashgrid and differentiable rendering.

46
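This combination is the recipe behind DreamFusion-style text-to-3D, which Magic3D builds on: render the 3D representation from a random view, noise the render, and use the text-conditioned diffusion model's noise prediction as a gradient signal for the 3D parameters (score distillation). A schematic sketch of one such update follows; render_nerf, diffusion_eps, and the schedule are placeholders, and the usual timestep weighting is omitted.

import torch

def score_distillation_step(render_nerf, diffusion_eps, text_embedding,
                            camera, optimizer, n_steps=1000, device="cuda"):
    """One score-distillation update: make the 3D model's render look plausible to the diffusion model."""
    betas = torch.linspace(1e-4, 0.02, n_steps, device=device)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)

    # Differentiably render the current 3D model from a random camera
    image = render_nerf(camera)                               # [1, 3, H, W]

    # Noise the render at a random diffusion timestep
    t = torch.randint(20, n_steps - 20, (1,), device=device).item()
    noise = torch.randn_like(image)
    noisy = torch.sqrt(alpha_bar[t]) * image + torch.sqrt(1 - alpha_bar[t]) * noise

    # Ask the frozen, text-conditioned diffusion model what noise it sees
    with torch.no_grad():
        eps_pred = diffusion_eps(noisy, t, text_embedding)

    # The SDS gradient (predicted minus injected noise) is pushed back through the renderer
    optimizer.zero_grad()
    image.backward(gradient=(eps_pred - noise))
    optimizer.step()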


Text to 3D Asset Generation
Magic3D

47
Text to 3D Asset Generation
Picasso / Edify-3D

48
AI for Behaviour
ASE: Adversarial Skill Embedding

49
Character animation

50
Isaac Gym: Reinforcement Learning with GPU-Parallel Simulation

Example tasks: Ant Locomotion, Balancing Bot, Cartpole, Humanoid Locomotion

Diagram: many environments are simulated in parallel; observation and reward vectors flow from the environments to the agents, which return an actions vector.

51
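The throughput win comes from keeping every environment on the GPU and stepping them all as one batched tensor operation. A hedged sketch of what a vectorized rollout loop looks like follows; the envs object with batched reset/step and the policy network are stand-ins, not the actual Isaac Gym API.

import torch

def collect_rollout(envs, policy, horizon=64):
    """Collect one batched rollout: every tensor has a leading num_envs dimension."""
    obs = envs.reset()                                  # [num_envs, obs_dim]
    observations, actions, rewards = [], [], []

    for _ in range(horizon):
        with torch.no_grad():
            act = policy(obs)                           # one forward pass for all environments
        next_obs, rew, done, _ = envs.step(act)         # all environments step together on the GPU

        observations.append(obs)
        actions.append(act)
        rewards.append(rew)
        obs = next_obs                                  # finished envs are assumed to auto-reset

    # Stack into [horizon, num_envs, ...] tensors for the RL update (e.g. PPO)
    return (torch.stack(observations), torch.stack(actions), torch.stack(rewards))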
PhysX 5
Open Source USD Physics Implementation

52
Isaac Gym
Synthetic Physics Data – RL Training
Training in Simulation

Wheeled Quadruped Locomotion Dexterous Object Manipulation

53
Isaac Gym
Trained policy transfer to real robots

Wheeled Quadruped Locomotion Dexterous Object Manipulation

54
Character Animation AI: ASE

Latent space that represents reusable motion “skills”

Diagram: a task policy (e.g. “Strike a target”) maps the character state s and task goal g to a latent skill z; the low-level skill policy, conditioned on s and z, produces the motion.
55
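Structurally, ASE-style control is a two-level policy: a pretrained low-level policy turns (state, skill latent) into motor actions, so a task policy only has to choose skill latents. A toy PyTorch sketch of that hierarchy follows; the network sizes and interfaces are illustrative assumptions, not the published implementation.

import torch
import torch.nn as nn

class SkillPolicy(nn.Module):
    """Low-level policy: decodes a latent skill z into actions, conditioned on state s."""
    def __init__(self, obs_dim, act_dim, z_dim=64, hidden=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + z_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, act_dim))

    def forward(self, s, z):
        return self.net(torch.cat([s, z], dim=-1))

class TaskPolicy(nn.Module):
    """High-level policy: picks a skill latent z from state s and task goal g."""
    def __init__(self, obs_dim, goal_dim, z_dim=64, hidden=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, z_dim))

    def forward(self, s, g):
        # Skill latents are typically kept on a hypersphere; normalization stands in for that here
        z = self.net(torch.cat([s, g], dim=-1))
        return torch.nn.functional.normalize(z, dim=-1)

# Usage sketch: action = skill_policy(s, task_policy(s, g))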
Character Animation AI: ASE
“Real” data: Mocap motion clips

Dataset: 187 clips, 30 minutes of example motion clips

56
Large-Scale Training

57
Large-Scale Training

58
Large-Scale Training

59
Large-Scale Training

60
Game AI (Tennis) Learned from Broadcast videos

Pipeline: Video Data → Motion Reconstruction → Physics Simulation

61
Game AI (Tennis) Learned from Broadcast videos

62
63
