GDC2006 - C++ For Next Gen Consoles

C++ on Next-Gen Consoles: Effective Code for New Architectures
Pete Isensee
Development Manager
Microsoft Game Technology Group
Last Year at GDC
 Chris Hecker ranted
 What did he say?
 Programmers: danger ahead
 Out-of-order execution: good
 In-order execution: bad
 Microsoft and Sony are going to screw you
 You are so hosed. Game over, man.

 “There’s absolutely nothing you can do about this”
Console Hardware Architectures
 Optimized to do floating-point math
 Optimized for multithreaded tasks
 Optimized to run games
 Not optimized to run general purpose code
 Not optimized to do branch prediction, code
reordering, instruction pipelining or other
out-of-order magic
 Large L2 caches
 Large latencies
We’re Game Programmers.
We Love Challenges.
 We will make games on these consoles
 The solution is not assembly language
 The solution is to tailor our C/C++ engines,
inner loops and bottleneck functions to the
realities of the hardware
 Remember: C++ code can make or break
your game’s performance
Not Covering
 Profiling (do it)
 Multithreading (do it)
 Memory allocation (avoid in game loop)
 Compiler settings (experiment)
 Exception handling (avoid it)
Topics for Today
 Thinking about L2
 Optimize memory access
 Use CPU caches effectively

 Thinking about in-order processing
 Avoid function call overhead
 Tips for efficient math
 Avoid hidden C++ inefficiencies
Optimize Memory Access
 Proverb: thou shalt treat memory as if it were
thy hard drive
 You will be memory-bound on new consoles
 Recommendations
 Never read from the same place twice in a frame
 Read data sequentially
 Write data sequentially
 Use everything you read
Minimize Data Passes
 Game frame loops often access data twice
 Or three times
 Or more

 Optimize for a single pass
 Consider less frequent operations
 AI
 Physics, collision
 Networking
 Particle systems
[Figure: Multiple Pass Architecture]
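A minimal sketch of the single-pass idea. The Particle struct and both update functions are hypothetical and not from the talk. The two-pass version pulls every particle through the cache twice; the fused version does both steps while the data is already loaded.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical particle type, for illustration only.
struct Particle {
    float pos;
    float vel;
    float life;
};

// Two passes: every particle is read and written twice per frame.
void updateTwoPass(std::vector<Particle>& ps, float dt) {
    for (std::size_t i = 0; i < ps.size(); ++i)
        ps[i].pos += ps[i].vel * dt;
    for (std::size_t i = 0; i < ps.size(); ++i)
        ps[i].life -= dt;
}

// One pass: same result, each particle is touched once.
void updateOnePass(std::vector<Particle>& ps, float dt) {
    for (std::size_t i = 0; i < ps.size(); ++i) {
        Particle& p = ps[i];
        p.pos += p.vel * dt;
        p.life -= dt;
    }
}
```

Fusing only works when the second step does not need the first step to have finished for all elements, so check the data dependencies before merging loops.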
Pointer Aliasing Explained
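void init( float *a, const float *b ) {
    a[0] = 1.0f - *b;
    a[1] = 1.0f - *b;
}

Nominal case: b points outside a
    b = 0.0, a = { 0.0, 0.0 } before the call
    a = { 1.0, 1.0 } after the call
Worst case: b points into a
    float a[2]={0.0f};
    init( a, &a[0] ); // a = { 1.0, 0.0 } after the call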
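The two cases can be checked with a small sketch. The function init is the one from the slide; the helpers nominalSecond and aliasedSecond are hypothetical names added here to make the outcome observable.

```cpp
// Same function as on the slide: no restrict, so a and b may overlap.
void init(float* a, const float* b) {
    a[0] = 1.0f - *b;  // if b == &a[0], this store changes *b
    a[1] = 1.0f - *b;  // so the compiler must reload *b here
}

// Nominal case: b points outside a, so a[1] ends up 1.0.
float nominalSecond() {
    float a[2] = {0.0f, 0.0f};
    float b = 0.0f;
    init(a, &b);
    return a[1];
}

// Worst case: b aliases a[0], so the second line sees the new a[0]
// and a[1] ends up 1.0 - 1.0 = 0.0.
float aliasedSecond() {
    float a[2] = {0.0f, 0.0f};
    init(a, &a[0]);
    return a[1];
}
```

Because the compiler cannot rule out the worst case, it has to emit the second load even when every real caller passes non-overlapping pointers.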
A Solution: Restrict
 Restrict keyword tells the compiler there’s no
aliasing
 Restrict permits the compiler to generate much
more efficient code

void init( float* __restrict a,
           const float* __restrict b ) {
    a[0] = 1.0f - *b; // compiler can do
    a[1] = 1.0f - *b; // the right thing
}
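A sketch of two ways to remove the redundant load, assuming a compiler that accepts the __restrict spelling (MSVC, GCC and Clang do). The names initRestrict and initLocal are hypothetical. The second function is not from the talk: it gets a similar effect without any extension by reading *b once into a local.

```cpp
// __restrict is a promise from the caller that a and b never overlap.
// Calling this with overlapping pointers is undefined behavior.
void initRestrict(float* __restrict a, const float* __restrict b) {
    a[0] = 1.0f - *b;
    a[1] = 1.0f - *b;  // *b may stay in a register
}

// Portable alternative: one explicit load, no extension needed.
// Note: with overlapping pointers this writes the same value twice,
// which differs from what the unrestricted slide version would do.
void initLocal(float* a, const float* b) {
    const float v = 1.0f - *b;
    a[0] = v;
    a[1] = v;
}
```

Either form only helps if the no-overlap assumption really holds at every call site.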
What to Restrict
 Use restrict widely
 Pointer parameters of functions
 Local pointers
 Pointers in structs/classes
 But not:
 Function return types
 Casts
 Global pointers (maybe)
 References (maybe)
Use the CPU Caches Effectively
 The L2 cache is your best friend
 Using the cache well is an art
 Ensure you have a good profiler by your side
Keep the Working Set Small
 Pack commonly used data together
 Frequently used data might deserve its own
struct/class
 Keep rarely used data separate
 Example: texture file names
 Consider bitfields
 Bitfields are extremely efficient on PowerPC
 Consider other forms of lossless
compression
Inefficient Structs Are Bad Mojo
struct InefficientCar {
    bool manual;      // padding here
    wheel wheels[8];  // 8 wheels?
    bool convertible; // more pad
    char engine;      // 4 bits used
    char file[32];    // rarely used
    double maxAccel;  // double?
};
sizeof(InefficientCar) = 80
Carefully Design Structures
struct EfficientCar {
    wheel wheels[4];  // 4 wheels
    wheel *moreWheels;
    char *file;       // stored elsewhere
    float maxAccel;   // float
    unsigned engine:4; // bitfields
    unsigned manual:1;
    unsigned convertible:1;
};
sizeof(EfficientCar) = 32
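A compilable sketch of the two layouts for checking sizes on your own target. The talk does not show the wheel type, so the 4-byte wheel below is an assumption; the slide's figures of 80 and 32 are consistent with a 4-byte wheel and 32-bit pointers. The helper perCacheLine is a hypothetical name. On a 64-bit host the pointers are wider, so the exact numbers will differ.

```cpp
#include <cstddef>

// Assumption: a wheel is 4 bytes; its real definition is not in the talk.
struct wheel { float radius; };

struct InefficientCar {
    bool   manual;
    wheel  wheels[8];
    bool   convertible;
    char   engine;
    char   file[32];
    double maxAccel;
};

struct EfficientCar {
    wheel    wheels[4];
    wheel*   moreWheels;
    char*    file;
    float    maxAccel;
    unsigned engine : 4;
    unsigned manual : 1;
    unsigned convertible : 1;
};

// Number of whole objects of type T that fit in one cache line.
template <typename T>
std::size_t perCacheLine(std::size_t lineBytes) {
    return lineBytes / sizeof(T);
}
```

With a 128-byte cache line, more EfficientCar objects fit per line than InefficientCar objects, which is the point of keeping the working set small.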
Choose the Right Container
 Prefer contiguous containers
 Or at least mostly contiguous
 Examples: array, vector, deque

 Avoid node-based containers
 List, set/map, binary trees, hash tables
 If you must use a tree, consider a custom
allocator for memory locality
 Vector + std::sort is often faster (and
smaller) than set or map or hash tables, by
an order of magnitude
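A minimal sketch of the vector + std::sort approach, assuming the data is built once and then queried many times. The SortedIds class is a hypothetical example, not code from the talk.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// A sorted, contiguous "set" of ints: build once, query many times.
class SortedIds {
public:
    explicit SortedIds(const std::vector<int>& ids) : ids_(ids) {
        std::sort(ids_.begin(), ids_.end());
        // Drop duplicates so the container behaves like a set.
        ids_.erase(std::unique(ids_.begin(), ids_.end()), ids_.end());
    }

    // Binary search over contiguous memory, no pointer chasing.
    bool contains(int id) const {
        return std::binary_search(ids_.begin(), ids_.end(), id);
    }

    std::size_t size() const { return ids_.size(); }

private:
    std::vector<int> ids_;
};
```

This trades cheap lookups for expensive inserts, so it fits data that changes rarely. If elements are inserted and removed every frame, measure before choosing it.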
Avoid Function Call Overhead
 Function call overhead was a surprising
cause of performance issues on Xbox
 The same is true on Xbox 360 and PS3
 Fortunately, there are lots of solutions
 Research compiler settings. On Xbox 360:
 Inline “any suitable”
 Enable link-time code generation

 Spend time ensuring the compiler is inlining the right things
Avoid Virtual Functions
 Weigh the limitations of virtual functions
 Adds a branch instruction
 Branch is always mispredicted
 Compiler is limited in how it can optimize

 Consider replacing
 virtual void Draw() = 0;
 With
 Xbox360.cpp: void Draw() { ... }
 Windows.cpp: void Draw() { ... }
 PS3.cpp: void Draw() { ... }
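A single-file sketch of the per-platform idea. In a real project each definition would live in its own .cpp file and the build would compile exactly one of them; here hypothetical SKETCH_PLATFORM_* macros stand in for that build-system choice. Draw returns a string only so the selected version is observable; the slide's version returns void.

```cpp
// Shared header: a plain, non-virtual declaration.
const char* Draw();

// Exactly one definition is compiled, so every call to Draw() is a
// direct call that the compiler and linker can inline or optimize.
// The macro names below are hypothetical.
#if defined(SKETCH_PLATFORM_XBOX360)
const char* Draw() { return "Xbox360"; }   // would be Xbox360.cpp
#elif defined(SKETCH_PLATFORM_PS3)
const char* Draw() { return "PS3"; }       // would be PS3.cpp
#else
const char* Draw() { return "Windows"; }   // would be Windows.cpp
#endif
```

This only replaces virtual functions whose implementation is fixed per build. Behavior that varies between objects at run time still needs some form of dispatch.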
Maximize Leaf Functions
 Leaf functions don’t call other functions, ever
 If a potential leaf function calls another
function, the high-level function:
 Is much less likely to be inlined
 Must set up a stack frame
 Must set up registers

 Potential solutions
 Remove the inner function completely
 Inline the inner function
 Provide two versions of the outer function
Unroll Inner Loops
 Compiler can’t unroll loops where n is variable
 Even unrolling from ++i to i+=4 can be a
significant gain
 Eliminates three branch instructions
 Increases opportunity for code scheduling

 Don’t forget to hoist invariants out, too


Example Unrolling
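// original
for( i=a.beg(); i!=a.end(); ++i )
    process(i);

// unrolled, with a.end() hoisted out of the loop
e = a.end();
e4 = e - ( e - a.beg() ) % 4; // stop at a multiple of 4
for( i=a.beg(); i!=e4; i+=4 ) {
    process(i); process(i+1);
    process(i+2); process(i+3);
}
for( ; i!=e; ++i ) // 0-3 leftover elements
    process(i);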
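A runnable sketch of the same pattern on a plain array. The function sumUnrolled is a hypothetical example. It uses four independent accumulators so the adds do not wait on each other, and a cleanup loop so the element count does not have to be a multiple of four.

```cpp
#include <cstddef>

// Sum n floats, four per iteration, then handle the leftovers.
float sumUnrolled(const float* data, std::size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    // n rounded down to a multiple of 4, computed once outside the loop.
    const std::size_t n4 = n & ~static_cast<std::size_t>(3);
    std::size_t i = 0;
    for (; i < n4; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    float sum = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i)  // 0 to 3 remaining elements
        sum += data[i];
    return sum;
}
```

Splitting the sum across accumulators changes the order of the floating-point additions, so the result can differ in the last bits from a straight loop.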
Pass Native Types by Value
 Tradition says that “large” types are passed
by pointer or reference, but be careful
 New consoles have really large registers
 Native types include
 64-bit int (__int64)
 VMX vector (__vector4) – 128 bits!

 Pass structs by pointer or reference
 One exception: pass structs consisting of bitfields <= 64 bits by value
Know Data Type Performance
 int32 and int64 have equivalent perf
 float and double have equivalent perf
 int8 and int16 are slower than int
 They generate extra instructions
 High bits cleared or sign-extended
 Example: int32 adds 2X faster than int16 adds
 Recommendations
 Store as smallest type required
 Load into int32, int64 or double for calculations
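A minimal sketch of "store small, compute wide". The function sumSamples is a hypothetical example: the data stays in 16-bit storage to keep the working set small, and each value is widened once on load so all arithmetic happens in 32 bits.

```cpp
#include <cstddef>
#include <cstdint>

// Samples are stored as 16-bit values to save memory and cache space.
std::int32_t sumSamples(const std::int16_t* samples, std::size_t n) {
    std::int32_t total = 0;  // accumulate in the native 32-bit type
    for (std::size_t i = 0; i < n; ++i) {
        const std::int32_t s = samples[i];  // widen once on load
        total += s;
    }
    return total;
}
```

Accumulating in 32 bits also avoids the overflow a 16-bit accumulator would hit on inputs like the ones in a typical audio or vertex buffer.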
Use Native Vector Types
 In CS 101, you learned to create abstract
data types, such as matrices
typedef std::vector<float> vec;  // holds 4 floats
typedef std::vector<vec> matrix; // holds 4 vecs

 This code is an abomination
 At least on Xbox 360 and PS3
 Xbox 360 and PS3 have dedicated vector
math units called VMX units
 Use them!
Your Math Buddies
 __vector4 (4 32-bit floats; 128-bit register)
 XMVECTOR (typedef for __vector4)
 XMMATRIX (array of 4 __vector4s)
 XMVECTOR operators (+,-,*,/)
 Hundreds of XMVECTOR and XMMATRIX
functions
 Xbox 360-specific, but similar constructs in
PS3 compilers
Avoid Floating-Point Branches
 FP branches are slow
 Cache has to be flushed
 ~10X slower than int branches

 Avoid loops with float test expressions
 Eliminate altogether if possible
 Can be faster to calculate values
you won’t use!
 Compare integers instead
 Replace with fsel when possible
 10-20X performance gain
The fsel Option in Detail
 Definition of hardware implementation:
float fsel(float a, float b, float c)
{
    return ( a >= 0.0f ) ? b : c;
}

 You can replace expressions like
 v = ( w < x ) ? y : z; // slow
 With faster expressions like
 v = fsel( w - x, z, y ); // turbo
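A portable sketch of the select-instead-of-branch rewrite. The helper selectNonNeg is a hypothetical stand-in that selects on a >= 0; that is an assumption about how a hardware select behaves, so check your compiler's intrinsic and its argument order before mapping this to a real instruction. On a compiler without such an intrinsic the helper still compiles to an ordinary compare, so the sketch shows only the shape of the rewrite.

```cpp
// Hypothetical stand-in for a floating-point select:
// returns ifNonNeg when a >= 0, otherwise ifNeg.
inline float selectNonNeg(float a, float ifNonNeg, float ifNeg) {
    return (a >= 0.0f) ? ifNonNeg : ifNeg;
}

// max(x, y) written as a select on the sign of x - y.
inline float maxSel(float x, float y) {
    return selectNonNeg(x - y, x, y);
}

// Clamp v to [lo, hi] with two selects and no if statements.
inline float clampSel(float v, float lo, float hi) {
    v = selectNonNeg(v - lo, v, lo);     // v < lo -> lo
    return selectNonNeg(hi - v, v, hi);  // v > hi -> hi
}
```

Both candidate values are computed before the select, which is the "calculate values you won't use" trade. The subtraction also means NaN inputs and overflow in x - y can behave differently from a direct comparison.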
Prefer Platform-Specific Funcs
 The C runtime (CRT) is not usually the best
option when performance matters
 Xbox 360 examples
 Prefer CreateFile to fopen or C++ streams
 Options for asynchronous reads and other goodness
 Prefer XMemCpy to memcpy
 2-6X faster
 Prefer XMemSet to memset
 8-14X faster
Avoid Hidden C++ Inefficiencies
 C++ rocks the house!
 C++ can bring your game to its knees!
 Consider these innocuous snippets

 Quaternion q;
 s.push_back( k );
 if( (float)i > f )
 obj->Draw();
 GameObject arr[1000];
 a = b + c;
 i++;
C++ is Dangerous
 With power comes responsibility
 Beware constructors
 Is initialization the right thing to do?
 Beware hidden allocations
 Conversion casts may have significant cost
 Use virtual functions with care
 Beware overloaded operators
 Stick to known idioms
 Operator++ should be a constant-time operation.
 Really.
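A small sketch that makes one hidden cost visible: the allocations behind push_back. The function countReallocations is a hypothetical helper that counts how often the vector's buffer grows, with and without a reserve call up front.

```cpp
#include <cstddef>
#include <vector>

// Counts how many times push_back had to grow the buffer.
// Each growth is a hidden allocation plus a copy of every element.
std::size_t countReallocations(std::size_t n, bool reserveFirst) {
    std::vector<int> s;
    if (reserveFirst)
        s.reserve(n);  // one allocation, made explicitly
    std::size_t reallocs = 0;
    std::size_t cap = s.capacity();
    for (std::size_t k = 0; k < n; ++k) {
        s.push_back(static_cast<int>(k));
        if (s.capacity() != cap) {  // capacity changed: buffer was reallocated
            ++reallocs;
            cap = s.capacity();
        }
    }
    return reallocs;
}
```

With reserve the loop performs no reallocations; without it the count depends on the library's growth policy. In a game loop the reserve would normally happen at load time, not per frame.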
Summary
 There absolutely are many things you can
do to efficiently program next-gen consoles
 Two key issues: L2/memory and in-order
processing
 Treat memory as you would a hard disk
 Watch out for those branches; use tricks like fsel

 Prefer a light C++ touch


What’s Next
 Our games are only as good as the weakest
member of the team
 Share what you’ve learned
 “The sharing of ideas allows us to stand on
one another’s shoulders instead of on one
another’s feet” – Jim Warren
Questions
 pkisensee@msn.com
 Fill out your feedback forms
