
GPU Technology Conference 2017

VR Rendering Improvements Featuring Autodesk VRED
Michael Nikelsky, Sr. Principal Engineer, Autodesk
Ingo Esser, Sr. Engineer, Developer Technology, NVIDIA

© 2017 Autodesk
AGENDA
NVIDIA VRWorks: at a glance
Autodesk VRED: VR Rendering Improvements
NVIDIA VRWORKS
Comprehensive SDK for VR Developers
GRAPHICS, HEADSET, TOUCH & PHYSICS, AUDIO, PROFESSIONAL VIDEO
GRAPHICS PIPELINE
VR Workloads

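Desktop: 1920 x 1080 at 60 Hz, N vertices, 124M Pix/s
VR headset: 2 x 1512 x 1680 at 90 Hz, 2N vertices, 457M Pix/s
Geometric Pipeline load: 3x
Rasterization and Fragment Shader load: ~3.6x
Pipeline stages: Preprocessing, Geometric Pipeline, Rasterization, Fragment Shader, Postprocessing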
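The pixel and vertex figures on this slide follow from resolution, view count and refresh rate. The sketch below recomputes them as a quick check, assuming the 1512 x 1680 target is rendered once per eye; the function names are illustrative and not part of any SDK.

```cpp
#include <cassert>
#include <cstdint>

// Pixels produced per second for a given target size, view count and refresh rate.
std::uint64_t pixelsPerSecond(std::uint64_t width, std::uint64_t height,
                              std::uint64_t views, std::uint64_t hz) {
    assert(width > 0 && height > 0 && views > 0 && hz > 0);
    return width * height * views * hz;
}

// Desktop case from the slide: one 1920 x 1080 view at 60 Hz.
std::uint64_t desktopPixelRate() { return pixelsPerSecond(1920, 1080, 1, 60); }

// Headset case from the slide: two 1512 x 1680 views (assumed one per eye) at 90 Hz.
std::uint64_t hmdPixelRate() { return pixelsPerSecond(1512, 1680, 2, 90); }

// Geometry load: 2N vertices at 90 Hz compared with N vertices at 60 Hz.
double geometryFactor() { return (2.0 * 90.0) / (1.0 * 60.0); }

// Pixel load: headset pixel rate relative to the desktop pixel rate.
double pixelFactor() {
    return static_cast<double>(hmdPixelRate()) / static_cast<double>(desktopPixelRate());
}
```

The rates come out at roughly 124M and 457M pixels per second, and the geometry factor is exactly 3, matching the slide.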
SINGLE PASS STEREO
Traditional Rendering

Render eyes separately


Doubles CPU and GPU load

SINGLE PASS STEREO
Using SPS to improve rendering performance

Single Pass Stereo uses Simultaneous Multi-Projection architecture


Draw geometry only once
Vertex/Geometry stage runs once
Outputs two positions for left/right
Only rasterization is performed per-view

SINGLE PASS STEREO
OpenGL

In OpenGL via GL_NV_stereo_view_rendering


Create a texture array for rendering the left and right eye simultaneously
No other changes needed, shaders perform SPS
SINGLE PASS STEREO
Vertex Shader

Calculate projection space position


proj_pos = proj * view * model * inPosition;

Output both positions via different builtin variables; only the x component may differ
gl_Position = proj_pos + vec4(offset, 0, 0, 0);
gl_SecondaryPositionNV = proj_pos - vec4(offset, 0, 0, 0);
Use the declaration and value of gl_Layer to route output to layers 0 and 1 of the texture array
layout(secondary_view_offset=1) out highp int gl_Layer;
gl_Layer = 0;
GRAPHICS PIPELINE
Single Pass Stereo Performance Results

Single Pass Stereo brings benefits in geometry bound scenarios
Heavy fragment shaders will reduce scaling
Pipeline stages: Preprocessing, Geometric Pipeline (SPS), Rasterization, Fragment Shader, Postprocessing
HMD OPTICS
Countering Lens Distortion

Displayed Image Optics User’s View


HMD RENDERING
Oversampling near the borders

Rendered Image Displayed Image


LENS MATCHED SHADING
Four Viewports

Original Image LMS Image


LENS MATCHED SHADING
OpenGL

In OpenGL via GL_NV_clip_space_w_scaling extension
Set up four viewports, rendering full resolution
Set scissors to each quadrant (figure: Viewport 0 with Scissor 0)
glScissorArrayv(0, 4, scissors);

W scaling parameters
glViewportPositionWScaleNV(i, Wx, Wy);
LENS MATCHED SHADING
Shaders

gl_ViewportMask[0] controls broadcasting of vertices and primitives

Inefficient: set mask in vertex shader
gl_ViewportMask[0] = 15;

More efficient: filter in pass-through geometry shader
Determine quadrant(s) for each primitive
Set bit(s) in gl_ViewportMask[0]
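Determining the quadrant(s) for a primitive can be sketched on the CPU as follows. This is a conservative bounding-box test, not the geometry shader used in the talk; the quadrant-to-bit numbering and the names Vec2 and viewportMask are assumptions made for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

struct Vec2 { float x, y; };  // position relative to the view center at (0, 0)

// Assumed bit layout (hypothetical): bit 0 = left/bottom, bit 1 = right/bottom,
// bit 2 = left/top, bit 3 = right/top.
std::uint32_t viewportMask(const std::array<Vec2, 3>& tri) {
    const float minX = std::min({tri[0].x, tri[1].x, tri[2].x});
    const float maxX = std::max({tri[0].x, tri[1].x, tri[2].x});
    const float minY = std::min({tri[0].y, tri[1].y, tri[2].y});
    const float maxY = std::max({tri[0].y, tri[1].y, tri[2].y});

    // The bounding box decides which halves the triangle may touch.
    const bool left = minX < 0.0f, right = maxX >= 0.0f;
    const bool bottom = minY < 0.0f, top = maxY >= 0.0f;

    std::uint32_t mask = 0;
    if (left && bottom)  mask |= 1u << 0;
    if (right && bottom) mask |= 1u << 1;
    if (left && top)     mask |= 1u << 2;
    if (right && top)    mask |= 1u << 3;
    assert(mask != 0);  // a finite triangle always lands in at least one quadrant
    return mask;
}
```

A triangle covering the center yields 15, the same value the vertex shader variant sets unconditionally; a triangle inside one quadrant yields a single bit, so it is rasterized into one viewport only. The bounding box can overestimate for thin diagonal triangles, which costs some performance but never drops geometry.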
LENS MATCHED SHADING
Scaling and Unscaling

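HMD runtime can't consume w-warped images yet, need to unscale before submit
Figure: Quadrant 0 spans (0,0) to (w/2, h/2); scale maps P to P', unscale maps P' back to P

$$\mathit{scale} = \frac{1}{1 - w_x P'_x - w_y P'_y}, \qquad P' = \mathit{scale} \cdot P$$

$$\mathit{unscale} = \frac{1}{1 + w_x P_x + w_y P_y}, \qquad P = \mathit{unscale} \cdot P'$$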
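The round trip between linear and w-scaled coordinates can be checked numerically. The sketch below assumes the clip-space modification described by NV_clip_space_w_scaling, w' = w + Wx * x + Wy * y, applied to points in normalized device coordinates within one quadrant. The names warp and unwarp are illustrative, and which of the slide's P and P' corresponds to the warped point is not asserted here.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// Forward mapping: dividing by the modified w shrinks points far from the center.
Vec2 warp(Vec2 p, float wx, float wy) {
    const float d = 1.0f + wx * p.x + wy * p.y;
    assert(d > 0.0f);  // holds inside a quadrant whose coefficient signs match
    return {p.x / d, p.y / d};
}

// Inverse mapping: recovers the linear position from a warped one.
Vec2 unwarp(Vec2 q, float wx, float wy) {
    const float d = 1.0f - wx * q.x - wy * q.y;
    assert(d > 0.0f);
    return {q.x / d, q.y / d};
}
```

With Wx = Wy = 2.0, the point (0.5, 0.5) is pulled in to (1/6, 1/6) and unwarp returns it to (0.5, 0.5). An unscale pass before submitting to the HMD runtime would evaluate one of these mappings per output pixel to find its source texel.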
LENS MATCHED SHADING
Extreme example, Wx = 2.0 Wy = 2.0

GRAPHICS PIPELINE
Lens Matched Shading Results

LMS can improve performance of the Raster / Fragment stage
Trade-off between quality and performance
Pipeline stages: Preprocessing, Geometric Pipeline (SPS), Rasterization (LMS), Fragment Shader (LMS), Postprocessing
HMD RENDERING
VR SLI functionality

HMD rendering step: VR SLI functionality
Prepare scene: GL allocations & uploads are broadcast
Upload left view data to GPU0, upload right view data to GPU1: separate data upload
Render scene on both GPUs: GL render calls are broadcast
Transfer texture: efficient texture copies
Submit to HMD
VR SLI
Updates between NVX and NV extensions

Command & data broadcast


BufferSubData to specific GPU
CopyImageSubData & CopyBufferSubData
GPU-GPU Framebuffer Blit
Global barrier & directed sync functions
GPU Masks
Per-GPU sample locations
Per-GPU queries
VR SLI
Broadcast allocations & uploads
Figure: Geometry, Parameters and Textures (tex0, tex1) are present on both GPUs; one GPU holds the left view data, the other the right view data
VR SLI
Broadcast allocations & uploads

for( auto i = 0; i < 2; ++i )
{
  // Per-eye data
  sceneData.viewMatrix = view[i];
  sceneData.viewProjMatrix = proj[i] * view[i];

  glMulticastBufferSubDataNV(
    1<<i,                              // GPU mask
    sceneUbo,                          // same UBO
    0, sizeof(SceneData), &sceneData   // different data
  );
}
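The effect of the GPU mask in the loop above can be illustrated without a GL context. The following is a CPU-side mock, not the real glMulticastBufferSubDataNV call: SceneData is reduced to a single integer, and mockMulticastUpload and uploadPerEye are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kGpuCount = 2;

struct SceneData { int eye; };  // stand-in for the per-eye matrices

// One logical UBO that physically exists once per GPU.
using PerGpuUbo = std::array<SceneData, kGpuCount>;

// Mock of a masked upload: only GPUs whose bit is set receive the data.
void mockMulticastUpload(std::uint32_t gpuMask, PerGpuUbo& ubo, const SceneData& data) {
    assert(gpuMask < (1u << kGpuCount));  // reject bits for GPUs that do not exist
    for (int gpu = 0; gpu < kGpuCount; ++gpu) {
        if (gpuMask & (1u << gpu)) {
            ubo[gpu] = data;
        }
    }
}

// Same structure as the slide's loop: eye i goes to GPU i via mask 1 << i.
PerGpuUbo uploadPerEye() {
    PerGpuUbo ubo{};
    for (int i = 0; i < kGpuCount; ++i) {
        mockMulticastUpload(1u << i, ubo, SceneData{i});
    }
    return ubo;
}
```

After uploadPerEye, the same buffer name holds left-eye data on GPU 0 and right-eye data on GPU 1, which is what lets a single broadcast draw call render a different view on each GPU. A mask with both bits set would behave like an ordinary broadcast upload.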
VR SLI
Broadcast render commands
Application sends draw commands only once
Commands are broadcast between GPUs
Figure: Render is executed on both GPUs, each with its own tex0 and tex1
VR SLI
Broadcast render commands
glBindFramebuffer( ..., renderFBO );
glFramebufferTexture2D( ..., tex0, 0 );   // tex0 on both GPUs
render();                                 // render on both GPUs
Figure: the left eye (L) is rendered into tex0 on one GPU, the right eye (R) into tex0 on the other
VR SLI
Texture transfer
Copy function allows direct copy between GPUs
Avoids CPU copy, transfer directly via PCIe

// GPU 1 waits for GPU 0 (target is ready)
glMulticastWaitSyncNV( 0, GPUMASK_1 );
// Copy tex0 @ GPU 1 to tex1 @ GPU 0
glMulticastCopyImageSubDataNV( 1, 1<<0, tex0, ..., tex1, ..., width, height, 1 );
// GPU 0 waits for GPU 1 (copy is done)
glMulticastWaitSyncNV( 1, GPUMASK_0 );
GRAPHICS PIPELINE
VR SLI Results

VR SLI covers a wide variety of workloads
Perfect load balancing between left/right eye and two GPUs
Copy overhead and view independent workloads limit scaling
Some pre- and postprocessing can be distributed
Pipeline stages: Preprocessing, Geometric Pipeline, Rasterization, Fragment Shader, Postprocessing
Diagram labels: VR SLI, SPS (Geometric Pipeline), LMS (Rasterization / Fragment Shader)
TRY IT OUT!

NVIDIA VRWorks SDK provides OpenGL, Direct3D & Vulkan samples


developer.nvidia.com/vrworks
Extensions
www.khronos.org/registry/OpenGL/extensions/NV/NV_stereo_view_rendering.txt

www.khronos.org/registry/OpenGL/extensions/NV/NV_clip_space_w_scaling.txt

www.khronos.org/registry/OpenGL/extensions/NV/NV_gpu_multicast.txt

Safe harbor statement

We may make statements regarding planned or future development efforts for our
existing or new products and services. These statements are not intended to be a
promise or guarantee of future availability of products, services or features but
merely reflect our current plans and are based on factors currently known to us. These
planned and future development efforts may change without notice. Purchasing
decisions should not be made based upon reliance on these statements.
These statements are being made as of May, 9th 2017 and we assume no obligation
to update these forward-looking statements to reflect events that occur or
circumstances that exist or change after the date on which they were made. If this
presentation is reviewed after May, 9th 2017, these statements may no longer
contain current or accurate information.
Autodesk VRED Professional

▪ Visualization and virtual prototyping tool
▪ Focus on Automotive
▪ High Quality OpenGL and raytracing rendering
▪ VR support: Powerwalls, Cave, Oculus Rift, HTC Vive
Image courtesy of Porsche AG
Requirements

▪ Engineering Datasets
▪ 30-70M triangles inside the view frustum
▪ 3-5k meshes
▪ 10-20k scenegraph nodes
▪ 100-300 materials
▪ Realistic appearance
▪ Measured materials
▪ No data reduction possible
Image courtesy of Porsche AG
Single Pass Stereo

▪ Render to layered texture
▪ Use latest drivers
▪ Don't write to individual layers
▪ Adjust frustum culling to account for both eyes
▪ Set up uniform buffers with matrices for both eyes
▪ Set layout(secondary_view_offset = 1) out int gl_Layer
▪ Use gl_Layer to access correct matrices for shading
▪ Write gl_SecondaryPositionNV in vertex or geometry shader
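One way to adjust frustum culling for both eyes is to keep an object whenever it intersects either eye's frustum, since a single draw now feeds both views. A minimal sketch with bounding spheres follows; Plane, Sphere, Frustum and both functions are hypothetical and not taken from VRED.

```cpp
#include <array>
#include <cassert>

struct Plane { float nx, ny, nz, d; };      // a point p is inside when n.p + d >= 0
struct Sphere { float x, y, z, radius; };   // bounding sphere of an object
using Frustum = std::array<Plane, 6>;

// Standard sphere-versus-frustum test: reject if fully behind any plane.
bool intersects(const Frustum& f, const Sphere& s) {
    assert(s.radius >= 0.0f);
    for (const Plane& p : f) {
        const float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.radius) {
            return false;
        }
    }
    return true;
}

// Stereo variant: an object is drawn if at least one eye can see it.
bool visibleInStereo(const Frustum& left, const Frustum& right, const Sphere& s) {
    return intersects(left, s) || intersects(right, s);
}
```

Culling against only one eye would drop objects at the outer edge of the other eye's view. Building a single frustum that encloses both eyes is an alternative that needs one test per object instead of two, at the cost of being slightly more conservative.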
Lens Matched Shading

▪ Not yet available in VRED
▪ Divide view into 4 quadrants
▪ Set lens coefficients for each quadrant
▪ Set up scissor masks for each viewport
▪ Render to all viewports
▪ Unproject the distortion
Lens Matched Shading

▪ Need to avoid rendering outside the visible area
▪ glWindowRectanglesEXT
▪ Hidden Area Mesh
▪ Need to calculate which viewports a triangle touches
▪ Use pass-through geometry shader for best performance
▪ Requires different shader for each geometry type
Datasets used for testing

▪ Small Dataset: ~5.5M triangles, ~900 meshes, 2.5k nodes
▪ Medium Dataset: ~34M triangles, ~3k meshes, 19k nodes
▪ Large Dataset: ~63M triangles, ~5k meshes, 17k nodes
▪ Measurements done using
▪ 2 Quadro P6000
▪ 4x Multisampling + Pixelfilter
▪ HTC Vive
Results
[Bar chart: frame time in milliseconds for the Small, Medium and Large datasets, grouped by Baseline, Single Pass Stereo, Lens Matched Shading and LMS + SPS]
Bar labels in extraction order (axis ticks omitted): 29,0; 22,5; 23,0; 16,6; 17,1; 15,3; 15,0; 15,0; 10,0; 9,5; 8,1; 7,8
Occlusion Culling

▪ Shader based occlusion culling


▪ https://github.com/nvpro-samples/gl_occlusion_culling
▪ Algorithm
▪ Render all geometries visible in the previous frame
▪ Disable Color and Depth writes
▪ Rasterize bounding boxes of all geometries
▪ Record visible bounding boxes
▪ Read back results
▪ Render remaining visible geometries
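The control flow of these steps can be sketched on the CPU as follows. The GPU work (rasterizing bounding boxes with color and depth writes disabled, then reading back) is replaced by a hypothetical callback boxVisible, and FrameResult and cullFrame are illustrative names rather than code from the linked sample or from VRED.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct FrameResult {
    std::vector<std::size_t> firstPass;   // drawn up front: visible in the previous frame
    std::vector<std::size_t> secondPass;  // drawn after the readback: newly visible
    std::vector<bool> visible;            // feeds the next frame as "previous frame"
};

// boxVisible(i) stands in for "the bounding box of object i passed the depth test".
FrameResult cullFrame(const std::vector<bool>& visibleLastFrame,
                      const std::function<bool(std::size_t)>& boxVisible) {
    assert(boxVisible && "visibility query must be callable");
    const std::size_t count = visibleLastFrame.size();
    FrameResult result;
    result.visible.resize(count, false);

    // Render everything that was visible last frame; this fills the depth buffer.
    for (std::size_t i = 0; i < count; ++i) {
        if (visibleLastFrame[i]) result.firstPass.push_back(i);
    }
    // Test all bounding boxes against that depth buffer and record the results.
    for (std::size_t i = 0; i < count; ++i) {
        result.visible[i] = boxVisible(i);
    }
    // After the readback, render objects that became visible this frame.
    for (std::size_t i = 0; i < count; ++i) {
        if (result.visible[i] && !visibleLastFrame[i]) result.secondPass.push_back(i);
    }
    return result;
}
```

Objects that were visible last frame but are hidden now are still drawn once in the first pass; they drop out from the next frame on because result.visible becomes the new previous-frame set.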
Why the readback?

▪ Original algorithm relies on bindless buffers and textures


▪ Requires custom memory management
▪ Few buffers shared by many objects
▪ Difficult to handle out of memory scenarios
▪ CPU does not know what is visible
▪ Requires binding of all shaders and geometries twice
▪ Binding costs can eliminate performance gains
▪ Sorted rendering difficult
Occlusion Culling results
[Bar chart: frame time in milliseconds for the Small, Medium and Large datasets, grouped by Baseline, Occlusion Culling and LMS + SPS + Occlusion Culling]
Bar labels in extraction order (axis ticks omitted): 22,5; 17,9; 16,6; 17,1; 14,5; 13,3; 10,0; 7,0; 5,5
VR SLI Rendering

▪ For details see GTC 2016 talk: "Integrating VR SLI into Autodesk VRED"
▪ Use one GPU per eye
▪ Bind rendersurface
▪ Setup Camera Buffer for both eyes
▪ Render the scene
▪ Copy rendersurface from GPU1 to GPU0
▪ Submit rendersurfaces to HMD
▪ New NV_gpu_multicast extension allows more flexibility
▪ Occlusion Culling
VR SLI results
[Bar chart: frame time in milliseconds for the Small, Medium and Large datasets, grouped by Baseline, SLI, SLI + Culling and SLI + LMS + Culling]
Bar labels in extraction order (axis ticks omitted): 22,5; 16,6; 11,2; 10,0; 8,9; 5,8; 6,0; 5,8; 5,6; 5,6; 5,7; 6,0
Conclusion and final thoughts

▪ Using extensions can greatly improve performance


▪ Not every extension always works
▪ Test out different options
▪ SLI still the best option
▪ Asynchronous Reprojection/Timewarp helps a lot
Autodesk and the Autodesk logo are registered trademarks or trademarks of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and/or other countries. All other brand names, product names, or trademarks belong to their respective holders.
Autodesk reserves the right to alter product and services offerings, and specifications and pricing at any time without notice, and is not responsible for typographical or graphical errors that may appear in this document.
© 2017 Autodesk. All rights reserved.
