
64-bit Insider

Volume I, Issue 13, June 2006, Microsoft

The 64-bit Advantage

The computer industry is changing, and 64-bit technology is the next, inevitable step. The
64-bit Insider newsletter will help you adopt this technology by providing tips and tricks
for a successful port.

Development and migration of 64-bit technology is not as complicated as the 16-bit to 32-bit
transition. However, as with any new technology, several areas do require close examination
and consideration. The goal of the 64-bit Insider newsletter is to identify potential
migration issues and provide viable, effective solutions to these issues. With a plethora of
Web sites already focused on 64-bit technology, the intention of this newsletter is not to
repeat previously published information. Instead, it will focus on 64-bit issues that are
somewhat isolated yet extremely important to understand. It will also connect you to reports
and findings from 64-bit experts.

Optimization on Windows 64-bit: Part 1 of 3

Your computer applications can have increased resources and scalability, thanks to 64-bit
processors and operating systems. However, it is still important to understand and follow
the basic principles of software optimization. This issue of the 64-bit Insider newsletter
is the first of a three-part series that focuses on different aspects of optimization. In
this issue, we discuss many of the principles and tools for software optimization as they
relate to 64-bit processors. In upcoming issues, we will examine software optimization for
multi-core and multiprocessor systems and also for specific 64-bit processors.

64-bit Insider Newsletter


Volume 1, Issue 13
Page 1/7
What is Optimization?
First, we should clarify what we mean by optimization. Optimization means improving the
efficiency of your application. That can mean building an application that minimizes the
use of some set of computer resources: for example, one that uses as little RAM as
possible or runs as fast as possible. In some cases, optimization may also relate to
network bandwidth or hard disk space.

Software optimization techniques can require changes to any level of your application—
from the high-level architecture of a multi-component system, to the algorithms you use
to implement small, functional units, down to the specific machine-code instructions
you use to execute simple statements. Although it can be a mistake to focus too much
on optimization when functionality is still immature, optimization should never be too
far from your mind. Commercial software and custom solutions routinely include
performance benchmarks in their requirements specifications.

Optimizing your Application


There are three levels at which you can improve the efficiency of your 32-bit or 64-bit
applications.
1. Enhance hardware (add memory or add processors).
2. Make code modifications.
3. Use compiler options.

Enhancing Hardware
Adding processors is the fastest way to gain short-term performance increases. But this
option works only if the processor is the cause of the bottleneck in your application,
which is not always the case. However, assuming that your algorithm is optimal and all the
other alternatives discussed in this newsletter have been applied, adding processors is a
valid way to increase the performance level of your application.

A basic axiom of system performance is that Random Access Memory (RAM) storage is more
expensive, and faster, than disk storage. A common technique used to improve performance is
to eliminate or reduce disk access by expanding available memory and keeping everything
stored in RAM. 64-bit Windows® makes this technique feasible by greatly increasing the
amount of RAM that is available to an application. This technique is a cheap but effective
way to speed up certain applications. For example, databases and Web servers can make
significant performance gains by moving to 64-bit systems that have large amounts of memory.
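
As a rough illustration of trading memory for disk access, the following C++ sketch keeps every item it has loaded in RAM, so each key reaches the slow backing store at most once. The class name MemoryCache and the loader callback are hypothetical and stand in for whatever disk or database read an application performs; this is a sketch of the idea, not a recommended cache design.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical helper: keeps every loaded item in RAM so that each key
// touches the slow backing store (for example, a disk file) at most once.
class MemoryCache {
public:
    using Loader = std::function<std::string(const std::string&)>;

    explicit MemoryCache(Loader loader) : loader_(std::move(loader)) {
        assert(static_cast<bool>(loader_));  // a loader callback is required
    }

    // Returns the cached value, calling the slow loader on the first request only.
    const std::string& get(const std::string& key) {
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++slow_loads_;
            it = cache_.emplace(key, loader_(key)).first;
        }
        return it->second;
    }

    // Number of times the slow backing store was actually consulted.
    std::size_t slow_loads() const { return slow_loads_; }

private:
    Loader loader_;
    std::unordered_map<std::string, std::string> cache_;
    std::size_t slow_loads_ = 0;
};
```

A cache like this never evicts anything, so its size is bounded only by the memory the process can address. That is why the approach becomes more practical in a 64-bit process with a large amount of RAM.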

Modifying Your Code
Making code modifications does not necessitate using a new algorithm or changing the design
of the application. This newsletter assumes that you are already using an optimal design
that works best in your chosen scenarios. Making code modifications can also mean
optimizing the application’s use of memory or using compiler directives that help the
compiler create code that works better for specific processors. This series of newsletters
will provide several directives that help in this way.

Using Compiler Options


For example, compilers will not always use Single Instruction, Multiple Data (SIMD) in
certain algorithms. This limitation may be due to the complexity of the loops or the
inability of the compiler to guarantee the independence between iterations of the loop
that is required to ensure correct behavior. A suitable compiler directive or restructuring
of the loop may be required to let the compiler know that SIMD will work fine.
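
One way to give the compiler the independence guarantee described above is to promise that the pointers in a loop never overlap. The sketch below uses the __restrict keyword, a compiler extension accepted by the Microsoft, Intel, GCC, and Clang compilers. The function name and parameters are illustrative, and whether a given compiler actually emits SIMD instructions for this loop depends on the compiler version and switches.

```cpp
#include <cassert>
#include <cstddef>

// __restrict is a compiler extension (not standard C++). It promises that
// the three pointers never refer to overlapping memory.
void scale_and_add(float* __restrict out,
                   const float* __restrict a,
                   const float* __restrict b,
                   float scale,
                   std::size_t count) {
    assert(out != nullptr && a != nullptr && b != nullptr);
    // Each iteration reads a[i] and b[i] and writes only out[i], so the
    // iterations are independent and the loop is a candidate for SIMD.
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = a[i] * scale + b[i];
    }
}
```

The promise is the caller's responsibility: passing overlapping buffers to a function declared this way results in undefined behavior.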

In addition to compiler directives or switches, command-line options passed to the compiler
and linker enable you to identify conditions where the compiler can make further
optimizations, conditions that the compiler might not be able to identify on its own. For
example, the /fp:fast option tells the compiler to use faster but less-precise
floating-point instructions.
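
One reason a fast floating-point mode can change results is that it may allow the compiler to regroup arithmetic, and floating-point addition is not associative. The following C++ sketch shows the effect with two hand-written groupings of the same sum; the function names and the sample values mentioned afterward are illustrative only.

```cpp
#include <cassert>

// Sums three values strictly left to right: (a + b) + c.
double sum_left_to_right(double a, double b, double c) {
    return (a + b) + c;
}

// Sums the same values with a different grouping: a + (b + c).
double sum_regrouped(double a, double b, double c) {
    return a + (b + c);
}
```

With a = 1e20, b = -1e20, and c = 1.0 under IEEE 754 double-precision arithmetic, the first function returns 1.0 and the second returns 0.0, because 1.0 is too small to survive being added to -1e20. Code that depends on a particular grouping should therefore be checked carefully before a fast floating-point mode is enabled.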

Compiler Switches
Table 1 identifies some of the compiler switches related to optimization that are
available in the C++ compilers from Microsoft and Intel.

Table 1 Compiler Switches

Microsoft C++          Intel C/C++                Description
/O1                    -O1                        Creates the smallest possible code.
/O2                    -O2                        Creates faster code, possibly increasing size.
/O3                    -O3                        Creates even faster code (sometimes).
/Oa                    -fno-alias                 Assumes no aliasing in the application, and enables some register and loop optimizations.
/Ow                    -fno-fnalias               Assumes no aliasing between functions, but allows aliasing within functions.
/Ob                    -Ob                        Controls inline expansion. Inlining functions reduces function call overhead.
/Og                    -                          Combines several types of optimizations.
/Oi                    -fbuiltin                  Enables inlining of intrinsic functions to replace some common functions.
/Os                    -Os                        Optimizes for speed, but favors small code.
/Ot                    -                          Favors speed over size.
/Ox                    -                          Provides maximum optimization.
/arch                  -mtune, -mcpu, -Qx<cpu>    Uses Streaming SIMD Extensions (SSE) or SSE2 instructions.
/G5                    -mtune, -mcpu, -Qx<cpu>    Favors the Pentium processor.
/G6                    -mtune, -mcpu, -Qx<cpu>    Favors the Pentium Pro, II, III, and Pentium 4 processors.
/G7                    -mtune, -mcpu, -Qx<cpu>    Favors Pentium 4 and AMD Athlon processors.
/fp:fast               -fp-model                  Enables more aggressive optimizations on floating-point data, possibly sacrificing precision.
/GL                    -Qipo                      Enables whole-program (interprocedural) optimization.
/LTCG:PGI, /LTCG:PGO   -prof-gen                  Enables profile-guided optimization.
/favor                 -                          Optimizes for either the AMD or Intel 64-bit processors or for both.

Note: Please review the documentation for both compilers to learn the specific details
and associated caveats for each switch. The Intel compiler has additional options for
high-level and floating-point optimization. Please refer to the Intel documentation for
more information about these options.

Understanding Link-Time Code Generation and Profile-Guided Optimization
Confusion frequently surrounds the terminology and differences between Whole
Program Optimization (or Link-Time Code Generation [LTCG]) and Profile-Guided
Optimization (PGO). This section clarifies the differences.

In the Microsoft® Visual Studio® documentation, Whole Program Optimization is also called
LTCG. Confusion usually stems from the fact that PGO is also a type of Whole Program
Optimization and uses LTCG to get its work done. As a first step toward clarifying both of
these, this newsletter will use only the term LTCG, not the less accurate term Whole
Program Optimization.

Both LTCG and PGO are techniques that allow you to optimize your application
without making changes to your code.

Link-Time Code Generation


LTCG is simply a mechanism by which the generation of machine code (and accompanying
optimization) is delayed until link time. So strictly speaking, it is not a form of
optimization; LTCG just enables additional optimizations. The classic compile/link process
compiles and optimizes files individually and then links them together. LTCG enables
additional optimizations because it postpones the optimization steps until link time, when
all the object files are available and can all be optimized together.

As shown in Figure 1, there are two differences between the procedures for compiling and
linking when performed with and without LTCG. First, when using LTCG, the optimization step
is moved to link time. Second, the object files generated by the compiler are not in the
standard Common Object File Format (COFF). They are in a proprietary format that can change
between versions of the compiler. As a result, programs like DUMPBIN and EDITBIN do not
work with these object files.

The proprietary format allows the optimizer to consider all object files during
optimization. This consideration enables more effective inlining, memory
disambiguation, and other inter-procedural optimizations. Also, the executable can be
better arranged to reduce offsets for things like Thread Local Storage and to reduce
paging in large executables. Refer to the compiler documentation for a full description
of the optimizations that are enabled by LTCG.
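
To make the cross-file inlining case concrete, the following C++ sketch shows the kind of call that LTCG can optimize across object files. For brevity both functions appear in one listing; the file names math_utils.cpp and main.cpp, the function names, and the build commands in the comments are illustrative assumptions that use the /GL and /LTCG switches described in this newsletter.

```cpp
#include <cassert>

// Build sketch (file names are hypothetical):
//   cl /c /O2 /GL main.cpp math_utils.cpp
//   link /LTCG main.obj math_utils.obj /OUT:app.exe

// Imagine this function lives in math_utils.cpp. Without LTCG, code in
// other source files sees only its declaration and must emit a real call.
int square(int value) {
    return value * value;
}

// Imagine this function lives in main.cpp. With /GL and /LTCG, the
// optimizer sees both files at link time and may inline square() here.
long long sum_of_squares(int limit) {
    assert(limit >= 0);
    long long total = 0;
    for (int i = 1; i <= limit; ++i) {
        total += square(i);
    }
    return total;
}
```

Whether the call is actually inlined is the optimizer's decision. The point of the sketch is only that the decision becomes possible once all object files are visible at link time.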

[Figure 1 diagram. Without LTCG: each C++ file passes through the compiler and optimizer to
produce an OBJ file (COFF), and the linker combines the OBJ files into the EXE. With LTCG:
the compiler (/GL) produces OBJ files (CIL), and the linker (/LTCG) runs the optimizer over
them to produce the EXE.]

Figure 1 Comparison of compiling procedures with and without LTCG

Profile-Guided Optimization
Many optimization techniques are heuristic in nature and many involve trade-offs
between image size and speed. For example, choosing whether to inline a function
depends on how large the function is and how often it is called. Small functions that are
called many times should be inlined. Large functions that are called rarely, from only a
few locations, should not be inlined. At least, this practice is usually a safe bet. But you
must also consider situations that fall between these two extremes.

Generic algorithms can be used to determine whether or not to inline functions. But,
frequently, it is not clear how often a function is called. For example, the function may
be guarded by a condition. Similar problems exist for branch prediction (which
determines the order of switch and other conditional statements).

PGO is a technique whereby your program is executed with a representative set of data. Your
program’s behavior is monitored to determine how often pieces of code are executed and how
often certain branches are taken. This technique produces a profile that the optimizer can
use to make better decisions during optimization.

So, PGO provides additional information about your application’s behavior that is gathered
during a runtime analysis of the application. This information is not available from
static analysis of your application alone. LTCG must still be used; however, PGO gives the
optimization process even more information so that better decisions can be made.

PGO requires a three-step approach, which is also highlighted in Figure 2:

1. Compile and link your application to produce an instrumented program version that
gathers information on your application’s behavior at runtime. This step requires a /GL
switch for the compiler and a /LTCG:PGI switch for the linker.

2. Execute your application and feed it data or user input that represents data that
would be expected from a target user. It is important to choose this data
carefully; otherwise, you will be optimizing your program for irrelevant
scenarios.

3. Re-link your application with the /LTCG:PGO switch to re-optimize your application by
using the new information generated in Step 2.

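[Figure 2 diagram. The C++ compiler (/GL) produces OBJ files (CIL). The linker (/LTCG:PGI)
and optimizer produce an instrumented EXE. Running the instrumented EXE on sample data
produces a profile. The linker (/LTCG:PGO) and optimizer then use the profile to produce
the optimized EXE.]

Figure 2 Profile-Guided Optimization three-step approach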

Again, please consult the compiler documentation for a full description of the kinds of
optimizations that can be performed during PGO.
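
The three-step approach can be sketched with a small C++ function whose branches are taken with very different frequencies, which is the kind of information a profile records. The build commands in the comments use the switches named in the steps above. The file name app.cpp, the function and type names, and the assumption that app.cpp also contains a main() that feeds representative data to classify() are all illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build sketch following the three steps (file names are hypothetical):
//   1. cl /c /O2 /GL app.cpp
//      link /LTCG:PGI app.obj /OUT:app.exe
//   2. app.exe        (run the instrumented build on representative input)
//   3. link /LTCG:PGO app.obj /OUT:app.exe

struct Tally {
    std::size_t common = 0;  // readings below the alarm threshold
    std::size_t rare = 0;    // readings at or above the alarm threshold
};

// Counts how many readings take the common path and how many take the rare
// path. A profile gathered in step 2 records how often each branch is taken.
Tally classify(const std::vector<int>& readings, int alarm_threshold) {
    Tally tally;
    for (int reading : readings) {
        if (reading >= alarm_threshold) {
            ++tally.rare;
        } else {
            ++tally.common;
        }
    }
    return tally;
}
```

If the training data in step 2 contained mostly alarm readings while real users rarely trigger alarms, the profile would describe the wrong branch as hot. This is the risk that step 2 warns about when it asks for carefully chosen data.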
Summary
Optimization means to maximize your application’s performance by reducing its use of
expensive or slow resources, and doing so without reducing the application’s ability to
do work. Although spending money on extra hardware can help, appropriate changes to
how your application is written or built can yield substantial performance gains, as well.

Assuming that your algorithms are sound, compiler flags should be the first place you
look to increase your application’s performance. Both LTCG and PGO can be enabled
by using compiler flags and can substantially improve performance without changing a
single line of code.
