
Performance Optimization

Matthias Wloka
Visual Concepts
Recent Background
Caveats

 Discussion limited to only shaders

 No magic bullets

 All common sense really


Console vs. PC

 Console: fixed configuration


 Tune to that particular machine once…
 At most one version per console

 PC: Ugh!
 Product of:
 IHVs * Chipsets * Clocks * Memories * Drivers * …
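
To make the product above concrete, here is a minimal sketch of how quickly the configuration count grows. The per-axis counts are hypothetical placeholders chosen for illustration, not market data, and the axes elided by "…" are left out.

```python
from math import prod

# Hypothetical number of variants per axis (illustrative assumptions only).
config_axes = {
    "ihvs": 3,
    "chipsets": 10,
    "clocks": 4,
    "memories": 4,
    "drivers": 5,
}

def config_count(axes):
    """Number of distinct PC configurations if every axis varies independently."""
    return prod(axes.values())

print(config_count(config_axes))
```

Even with these modest assumed counts the total runs into the thousands, which is why per-configuration tuning is not practical on PC.
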
PC: Pick Your Battles

 Group configs into categories


 Three or fewer seems reasonable
 Five is right out

 Cater one to highest-end:


 Reviewers, marketing, and IHVs like it

 Cater one to low-end:


 Publishers like it
 Probably do not need to fret about quality
PC: No Low-Level Control

 Your shader code is always interpreted


 By driver

 Pulling out the last 5% not worth it


 Driver might (un)do it anyway
 How does it affect the other 1000 configs?
Non-Solution: Adv. Options

 100 sliders adjusting graphics qualities


 Uh, I consider myself a graphics pro:
 Yet, I mostly cannot discern what sliders do
 Or how they possibly influence performance

 Can your mother figure out your sliders?


 Use categories (see above)!
 And resolution
 All other qualities (sensibly!) depend on those
Don’t Worry About Performance
 Optimize development time instead
 Use high-level languages, etc.

 Optimize quality instead

 Unless you know it is going to be hairy


Use Performance Profilers

 All hardware vendors have them

 They tell you how long each draw call or state change takes

 Don’t optimize the thing that takes 2%


 Look at the larger picture
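
The "2%" point can be checked with Amdahl's law. This is a minimal sketch; the fractions and local speedups below are assumed numbers for illustration, not measurements from any profiler.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-frame speedup when `fraction` of the frame
    time is made `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# Making a 2% item ten times faster barely moves the frame time...
small = overall_speedup(0.02, 10.0)
# ...while merely doubling the speed of a 40% item gives a 1.25x frame speedup.
large = overall_speedup(0.40, 2.0)

print(f"2% item, 10x faster:  {small:.3f}x")
print(f"40% item, 2x faster:  {large:.3f}x")
```
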
Know Your Targets

 Target frame rate?


 Don’t optimize beyond it
 Only IHVs care about 80 vs. 85 Hz

 Quality?
 Camera is an inch from the uniform:
 Does that really have to look perfect?
 Or are you sacrificing general performance/quality for this non-case?
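
A target frame rate translates directly into a per-frame time budget. The sketch below converts the two rates mentioned above into milliseconds; the helper names are illustrative, not from any API.

```python
def frame_time_ms(hz):
    """Time budget per frame, in milliseconds, for a given frame rate."""
    return 1000.0 / hz

def meets_target(measured_ms, target_hz):
    """True if a measured frame time already fits the target's budget."""
    return measured_ms <= frame_time_ms(target_hz)

budget_80 = frame_time_ms(80)
budget_85 = frame_time_ms(85)
print(f"80 Hz: {budget_80:.2f} ms, 85 Hz: {budget_85:.2f} ms, "
      f"difference: {budget_80 - budget_85:.2f} ms")
```

Going from 80 to 85 Hz buys less than a millisecond per frame. Once `meets_target` returns true for the chosen rate, further optimization is better spent on quality.
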
It’s a Pipeline: Exactly One Stage Is Bottleneck

[Pipeline diagram: Vertex/Index Data → Vertex Shader → Triangle Setup → Raster → Pixel Shader → Raster Operations, plus Texture/Framebuffer memory]
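
A toy model of the bottleneck idea, assuming all stages overlap perfectly so the slowest stage alone sets the frame time. The stage timings are made up for illustration; real hardware overlaps stages less cleanly.

```python
def bottleneck(stage_times_ms):
    """Return (stage_name, time_ms) of the slowest stage.
    Under the full-overlap assumption this time is the frame time."""
    stage = max(stage_times_ms, key=stage_times_ms.get)
    return stage, stage_times_ms[stage]

# Hypothetical per-frame stage costs in milliseconds.
frame = {
    "vertex_shader": 3.0,
    "triangle_setup": 1.0,
    "raster": 2.0,
    "pixel_shader": 9.0,
    "raster_ops": 4.0,
}

print(bottleneck(frame))

# Speeding up a stage that is not the bottleneck changes nothing.
faster_vertex = dict(frame, vertex_shader=1.0)
print(bottleneck(faster_vertex))
```
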
Why Is That Stage Slow?

 Understand that stage

 For example: shader limited by


 Number of instructions/shader units
 Texture fetch (cache and latency issues)
 Cost of input fetches
 Number of registers vs. number of threads
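
The registers-versus-threads trade-off can be sketched with simple arithmetic: a fixed register file is divided among threads, so a shader that needs more registers leaves room for fewer threads in flight to hide texture latency. The register file size and thread limit below are hypothetical, not those of any specific GPU.

```python
def threads_in_flight(register_file_size, registers_per_thread, max_threads):
    """Threads a shader unit can keep resident, limited by whichever
    runs out first: register file space or the hardware thread limit."""
    return min(max_threads, register_file_size // registers_per_thread)

REGISTER_FILE = 4096  # hypothetical registers per shader unit
MAX_THREADS = 256     # hypothetical hardware thread limit

for regs in (16, 32, 64):
    print(regs, "registers/thread ->",
          threads_in_flight(REGISTER_FILE, regs, MAX_THREADS), "threads")
```
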
Ensure It Works As Expected

 Real problem may be up- or downstream

 An endless array of little things:


 Zcull, memory layouts, …

 Let IHV devrel verify your sanity


Optimization Strategies

 Reduce amount of work

 Push work to other stages

 Perform work more efficiently


Reduce Amount of Work

 By far the most effective optimization

 Reduce quantity!

 Reduce quality!
Example: Vertex Shader

 Ask artist to make more efficient model


 I.e., same look for lower cost

 Use less expensive model


 I.e., lower quality
 LOD schemes are alive and well!

 Avoid rendering it at all


 I.e., better culling schemes
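
One common LOD scheme picks a model by camera distance. This is a minimal sketch of that idea, not the speaker's method; the switch distances are hypothetical and would normally be tuned per asset.

```python
import bisect

# Hypothetical switch distances in world units: below 10 use LOD 0,
# from 10 up to 30 use LOD 1, from 30 up to 80 use LOD 2, otherwise LOD 3.
LOD_DISTANCES = [10.0, 30.0, 80.0]

def select_lod(distance):
    """Return the LOD index for a camera distance:
    0 is the most detailed model, len(LOD_DISTANCES) the cheapest."""
    return bisect.bisect_right(LOD_DISTANCES, distance)

for d in (5.0, 20.0, 50.0, 200.0):
    print(d, "->", "LOD", select_lod(d))
```
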
Vertex Shader Cont.

 Does the vertex need all its inputs?


 Derive data on the fly
 Compress data

 Throw out computations


 E.g., is your lighting physically correct?
 Make it incorrect (with little/no loss of quality)
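
Two small illustrations of trimming vertex inputs, written in Python for clarity rather than as shader code. Deriving the bitangent from normal and tangent, and storing unit vectors as signed 8-bit values, are common techniques chosen here as examples; the slides do not name specific ones.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def derive_bitangent(normal, tangent, handedness=1.0):
    """Derive on the fly: rebuild the bitangent instead of storing it per vertex."""
    return tuple(handedness * c for c in cross(normal, tangent))

def quantize_snorm8(v):
    """Compress: map each component from [-1, 1] to an integer in [-127, 127]."""
    return tuple(round(max(-1.0, min(1.0, c)) * 127) for c in v)

def dequantize_snorm8(q):
    """Expand signed 8-bit components back to floats in [-1, 1]."""
    return tuple(c / 127.0 for c in q)

print(derive_bitangent((0, 0, 1), (1, 0, 0)))
print(dequantize_snorm8(quantize_snorm8((0.5, -0.25, 1.0))))
```

The quantized form uses one byte per component instead of a four-byte float, at the cost of a small reconstruction error.
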
Push Work to Other Stages

 Effective on small and big scale

 Use other stages to reduce work


 Even at ‘unreasonable’ cost!
 Culling

 Use other stages to do the work


Example: Particle Systems

 See Lutz’s earlier talk

 Particles can compute anywhere


 CPU threads,
 Vertex shader,
 Pixel shader…
Example: Vertex Shader

 Use CPU to do more effective culling


 E.g., frustum cull each triangle

 Compute constants/uniforms on CPU


 Shame on you if you are not already doing this

 Do per-vertex work per pixel
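
A minimal sketch of per-triangle frustum culling on the CPU. It assumes each frustum plane is given as `(nx, ny, nz, d)` with the inside being where `n·p + d >= 0`; that convention and the function names are assumptions for illustration. The test is conservative: it rejects a triangle only when all three vertices lie outside the same plane, so some triangles that are actually invisible may still be kept.

```python
def signed_distance(plane, point):
    """Signed distance from a point to a plane (nx, ny, nz, d), positive inside."""
    nx, ny, nz, d = plane
    return nx * point[0] + ny * point[1] + nz * point[2] + d

def triangle_outside_frustum(planes, triangle):
    """True if all three vertices are outside at least one common plane."""
    return any(
        all(signed_distance(plane, v) < 0.0 for v in triangle)
        for plane in planes
    )

# Stand-in "frustum": the axis-aligned box [-1, 1]^3 described by six planes.
BOX_PLANES = [
    (1, 0, 0, 1), (-1, 0, 0, 1),
    (0, 1, 0, 1), (0, -1, 0, 1),
    (0, 0, 1, 1), (0, 0, -1, 1),
]

inside = ((0, 0, 0), (0.5, 0, 0), (0, 0.5, 0))
outside = ((2, 0, 0), (3, 0, 0), (2, 1, 0))
straddling = ((0, 0, 0), (2, 0, 0), (0, 2, 0))

for name, tri in (("inside", inside), ("outside", outside), ("straddling", straddling)):
    print(name, "culled" if triangle_outside_frustum(BOX_PLANES, tri) else "kept")
```
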


Perform Work More Efficiently

 Use more efficient algorithm

 Fine-tune to max out multiple resources


 I.e., perfectly balance the machine

 Use assembly

 Most satisfying, but sadly least effective


Conclusions: Think Big

 Reduce amount of work


 Reducing quality may be invisible or ok

 Push work to other stages


 Idle CPU threads are fair game
