Professional Documents
Culture Documents
Batch, Batch, Batch
Batch, Batch, Batch
Matthias Wloka
What Is a Batch?
Every DrawIndexedPrimitive() is a batch
Submits n number of triangles to GPU Same render state applies to all tris in batch SetState calls prior to Draw are part of batch
Lots of guesses
Changing state inefficient on GPUs (WRONG) GPU triangle start-up costs (WRONG) OS kernel transitions (WRONG)
triangles/batch
Optimization Opportunities
100 90 80 million triangles/s 70 60 50 40 30 20 10 0
10 30 50 70 90 110 130 150 170 190 300 500 700 900 1100 1300 1500
Athlon XP 2.7+; NVIDIA GeForce FX 5800 Athlon XP 2.7+; NVIDIA GeForce4 Ti 4600 Athlon XP 2.7+; NVIDIA GeForce3 Ti 500 Athlon XP 2.7+; NVIDIA GeForce4 MX 440 Athlon XP 2.7+; NVIDIA GeForce2 MX/MX 400
>100x
40x
triangles/batch
triangles/batch
CPU-Limited?
Then performance results only depend on
How fast the CPU is
Not GPU
CPU processes draw calls (and SetStates), i.e., batches Lets graph batches/s!
fast CPU
slow CPU
batch-size: triangles/batch
fast CPU Two distinct bands, corresponding to different CPU speeds slow CPU
batch-size: triangles/batch
fast CPU Straight horizontal lines: batches/s independent of number of triangles per batch batch-size: triangles/batch
slow CPU
fast CPU Different GPUs perform similarly; slight variations due to different driver paths
slow CPU
batch-size: triangles/batch
~170k batches/s
Athlon XP 2.7+
Athlon XP 2.7+; NVIDIA GeForceFX 5800 Ultra Athlon XP 2.7+; NVIDIA GeForce4 Ti 4600 Athlon XP 2.7+; NVIDIA GeForce3 Ti 500 Athlon XP 2.7+; NVIDIA GeForce4 MX 440 Athlon XP 2.7+; NVIDIA GeForce2 MX/MX 400 1GHz Pentium 3; NVIDIA GeForceFX 5800 Ultra 1GHz Pentium 3; NVIDIA GeForce4 Ti 4600 1GHz Pentium 3; NVIDIA GeForce3 Ti 500 1GHz Pentium 3; NVIDIA GeForce4 MX 440 1GHz Pentium 3; NVIDIA GeForce2 MX/MX 400 1GHz Pentium 3; Radeon 9700/9500 SERIES
batches/s
x ~2.7
~60k batches/s
1GHz Pentium 3
10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 triangles/batch
1GHz Pentium 3; NVIDIA GeForce4 Ti 4600; OpenGL 1GHz Pentium 3; NVIDIA GeForce4 Ti 4600; Direct3D
batches/s
OpenGL
x 1.7-2.3
Direct3D Direct3D
10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 triangles/batch
CPU Limited?
Yes, at < 130 tris/batch (avg) you are
completely, utterly, totally, 100%
CPU limited!
Nvidia is optimizing drivers, but Submitting X batches: O(X) work for CPU
CPU (game, runtime, driver) processes batch Can reduce constants but not order O()
CPU MHz
GPU MTris GPU 32-bit AA Fill GPU GFlops CPU MHz 5000 4000 3000
0
TNT2 GeforceFX GeForce GeForce2 GeForce2 Ultra GeForce3 GeForce3 GeForce4 Ti Riva TNT Ti
2H97 1H98 2H98 1H99 2H99 1H00 2H00 1H01 2H01 1H02 2H02
Avg. 18month CPU Speedup: 2.2 Avg. 18month GPU Speedup: 3.0-3.7
I am a donut! Ask not how many tris/batch, but rather how many batches/frame! You get X batches per frame, depending on:
Target CPU spec Desired frame-rate How much % CPU available for submitting batches
What is X?
25k batches/s @ 100% 1 GHz CPU
Target: 30fps; 2GHz CPU; 20% (0.2) Draw/SetState:
X = 333 batches/frame
Or
Game is CPU bound, but dont care because you budgeted your CPU ahead of time, right? GPU idle (available for upping visual quality)
To improve batching
Making GPU work, so CPU does NOT Overall effect: faster game
Acknowledgements
Many thanks to Gary McTaggart, Valve Jay Patel, Blizzard Tom Gambill, NCSoft Scott Brown, NetDevil Guillermo Garcia-Sampedro, PopTop