Optimization in GCC | Linux Journal http://www.linuxjournal.com/article/7269

Optimization in GCC
Jan 26, 2005 By M. Tim Jones

Here's what the O options mean in GCC, why some optimizations aren't optimal after all
and how you can make specialized optimization choices for your application.

In this article, we explore the optimization levels provided by the GCC compiler
toolchain, including the specific optimizations provided in each. We also identify
optimizations that require explicit specifications, including some with architecture
dependencies. This discussion focuses on the 3.2.2 version of gcc (released February
2003), but it also applies to the current release, 3.3.2.

From Issue #131, March 2005

Levels of Optimization

Let's first look at how GCC categorizes optimizations and how a developer can
control which are used and, sometimes more important, which are not. A large
variety of optimizations are provided by GCC. Most are categorized into one of three
levels, but some are provided at multiple levels. Some optimizations reduce the size
of the resulting machine code, while others try to create code that is faster,
potentially increasing its size. For completeness, the default optimization level is
zero, which provides no optimization at all. This can be explicitly specified with
option -O0. (Plain -O is equivalent to -O1, not -O0.)
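
As a small illustration of how a program can tell which of these modes it was built in, the sketch below relies on the predefined macros __OPTIMIZE__ and __OPTIMIZE_SIZE__ described in GCC's preprocessor documentation. The function name optimization_mode is hypothetical and not part of the article; the macros only distinguish "no optimization", "some optimization" and "optimizing for size", not the exact -O level.

```c
/* Hypothetical helper (not from the article): reports the optimization
   mode this translation unit was compiled in, using macros that GCC
   predefines. __OPTIMIZE__ is defined whenever optimization is on;
   __OPTIMIZE_SIZE__ is additionally defined when optimizing for size. */
const char *optimization_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "optimizing for size";
#elif defined(__OPTIMIZE__)
    return "optimizing (-O1 or higher)";
#else
    return "not optimizing (-O0, the default)";
#endif
}
```

Calling this function from main and printing the result after building the same source with different -O options is a quick way to confirm which mode a given build actually used.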

Level 1 (-O1)

The purpose of the first level of optimization is to produce an optimized image in a
short amount of time. These optimizations typically don't require significant
amounts of compile time to complete. Level 1 also has two sometimes conflicting
goals: to reduce the size of the compiled code while increasing its performance. The
set of optimizations provided in -O1 supports these goals, in most cases. These are
shown in Table 1 in the column labeled -O1. The first level of optimization is
enabled as:

gcc -O1 -o test test.c

Table 1. GCC optimizations and the levels at which they are enabled.

Any optimization can be enabled outside of any level simply by specifying its name
with the -f prefix (more recent GCC releases ignore most -f optimization flags unless
an -O level is also given), as:

gcc -fdefer-pop -o test test.c

We also could enable level 1 optimization and then disable any particular
optimization using the -fno- prefix, like this:

gcc -O1 -fno-defer-pop -o test test.c

This command would enable the first level of optimization and then specifically
disable the defer-pop optimization.

Level 2 (-O2)

The second level of optimization performs all other supported optimizations within
the given architecture that do not involve a space-speed trade-off, that is, a balance
between code size and execution speed. For example, loop unrolling and function
inlining, which have the effect of increasing code size while also potentially making
the code faster, are not performed. The second level is enabled as:

gcc -O2 -o test test.c

Table 1 shows the level -O2 optimizations. The level -O2 optimizations include all of
the -O1 optimizations, plus a large number of others.
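
To illustrate the kind of space-speed trade-off that -O2 leaves out, the following hand-written sketch shows what loop unrolling does to a simple summation loop. The function names are hypothetical, and the compiler performs its own version of this transformation internally rather than on C source, so this is only meant to show why the unrolled form is larger but executes fewer loop tests.

```c
#include <stddef.h>

/* Plain loop: one addition and one loop test per element. */
long sum_rolled(const int *data, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* The same loop unrolled four times by hand: more code, but only one
   loop test for every four elements. A second loop handles the
   leftover elements when n is not a multiple of four. */
long sum_unrolled4(const int *data, size_t n)
{
    long total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }
    for (; i < n; i++)
        total += data[i];
    return total;
}
```

Both functions return the same result for any input; the difference is only in how much code is generated and how many branches are executed.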

Level 2.5 (-Os)

The special optimization level (-Os or size) enables all -O2 optimizations that do not
increase code size; it puts the emphasis on size over speed. This includes all
second-level optimizations, except for the alignment optimizations. The alignment
optimizations skip space to align functions, loops, jumps and labels to an address
that is a multiple of a power of two, in an architecture-dependent manner. Skipping
to these boundaries can increase performance as well as the size of the resulting code
and data spaces; therefore, these particular optimizations are disabled. The size
optimization level is enabled as:

gcc -Os -o test test.c

In gcc 3.2.2, reorder-blocks is enabled at -Os, but in gcc 3.3.2 reorder-blocks is
disabled.
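
To make the size cost of the alignment optimizations concrete, the arithmetic behind "skipping space" to a power-of-two boundary can be sketched as below. The helper names and the example address are hypothetical; the real padding is decided by the compiler and assembler for each target.

```c
#include <stddef.h>

/* Rounds addr up to the next multiple of align.
   Assumes align is a power of two (for example 4, 16 or 32). */
size_t align_up(size_t addr, size_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Number of bytes skipped (and therefore wasted) to reach that boundary. */
size_t align_padding(size_t addr, size_t align)
{
    return align_up(addr, align) - addr;
}
```

For instance, a function that would otherwise start at address 0x1003 and is aligned to 16 bytes begins at 0x1010 instead, leaving 13 bytes of padding. Repeated across many functions, loops and labels, this padding is the growth that -Os avoids.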

Level 3 (-O3)

The third and highest level enables even more optimizations (Table 1) by putting
emphasis on speed over size. This includes optimizations enabled at -O2 and
rename-register. The optimization inline-functions also is enabled here, which can
increase performance but also can drastically increase the size of the object,
depending upon the functions that are inlined. The third level is enabled as:

gcc -O3 -o test test.c

Although -O3 can produce fast code, the increase in the size of the image can have
adverse effects on its speed. For example, if the size of the image exceeds the size of
the available instruction cache, severe performance penalties can be observed.
Therefore, it may be better simply to compile at -O2 to increase the chances that the
image fits in the instruction cache.
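
The effect of inline-functions on code size can be pictured with a hand-written sketch. The functions below are hypothetical and only approximate what the compiler does: the body of a small helper is copied into every call site, which removes the call overhead but duplicates the code.

```c
/* Small helper: a typical candidate for inlining. */
static int square(int x)
{
    return x * x;
}

/* Version with real calls: compact, but each use of square() pays
   for a call and a return unless the compiler inlines it. */
int sum_of_squares_call(int a, int b)
{
    return square(a) + square(b);
}

/* Roughly what the caller looks like after inlining: no calls remain,
   but the helper's body now appears once per call site. */
int sum_of_squares_inlined(int a, int b)
{
    return (a * a) + (b * b);
}
```

With a helper this small the duplication is harmless, but when larger functions are inlined at many call sites the object can grow enough to cause the instruction-cache problem described above.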

Comments

hi, follow the "Listing 3.

Submitted by Anonymous (not verified) on Mon, 10/11/2010 - 20:58.

Hi, I followed "Listing 3. Simple Example of gprof", but when using -O or -O2, the profile is "Flat
profile". How do I resolve this?

My steps are:
1: gcc -o test_optimization test_optimization.c -pg -march=i386
2: ./test_optimization
3: gprof --no-graph -b ./test_optimization gmon.out
4: The result is:

Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
 0.00      0.00     0.00        1     0.00     0.00  factorial

If I add -O2, the result is:

Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name

single optimization flag without level

Submitted by Anonymous on Sun, 03/21/2010 - 17:36.

Any optimization can be enabled outside of any level simply by specifying its name with the -f prefix, as:
gcc -fdefer-pop -o test test.c

In current versions of GCC this is incorrect (http://gcc.gnu.org/wiki/FAQ#optimization-options). A single
optimization flag without an optimization level doesn't work. I don't know about old versions.

gcc 4.2.3 vs visual c 2005

Submitted by nanjil (not verified) on Thu, 05/08/2008 - 16:52.

hello:
I just compiled some code under gcc (cygwin) and visual c 2005 on a laptop with a dual-core intel processor.

The debuggable gcc code was about 2x faster than the visual c++ debuggable code.

however, the situation reversed when i used O3 optimization in gcc and "release" optimization in visual c.

now the visual c code is 2x faster than gcc.

i did not expect that large a difference; it is HUGE!!

am i missing something, or has anybody else noticed a similar thing?

visual c++ optimizations

Submitted by Anonymous (http://corporatedrones.wordpress.com) (not verified) on Thu, 02/12/2009 - 06:47.

apparently, MSVC uses a few insecure optimizations, counting on the developer having written
secure code. Probably that's why its debug build is slower.

I've seen lots of situations where gcc code gives an error right away, promptly showing
me the bug, while MSVC happily executes the code until it finally stumbles upon a non-static field of a class and
finally gives an error. For me, this is simply misleading, and that's why I prefer gcc.

Detailed article, that is great!

Submitted by Anonymous (not verified) on Sun, 09/09/2007 - 16:57.

You wrote in great detail!

It is really useful for me right now since I am doing my thesis work on optimization under Linux. Thank
you so much!

Someone should write some

Submitted by Anonymous (not verified) on Thu, 11/02/2006 - 11:57.

Someone should write some "C" code and a few scripts that will enable / disable every compiler option
and then print out which options worked best for _your_ particular system.

A benchmark that would specifically test each option (as opposed to using a single, huge benchmark)
could be written.

EG: no point in benchmarking if we should use:

gcc -O2 -O3 code.c -- One disables the other

gcc -fno-gcse SSE2_code.c

Benchmarks need to have a 'large' effect on the option that is being switched.

This could be run overnight (or on multiple machines, each doing part of the testing) and results provided on a web page
somewhere.

Experts could put in their two cents and a wiki of snippets could be fed into a code compilator (not compiler, just a
bunch of scripts) that would compilate all the snippets and produce a final program to be compiled on many different
machines.

This way we could figure out, if we had such-and-such a system, "how often" (what % of the time) we would simply be
better off using a particular option, and when that is more likely based on the TYPE of program we are running
(wordprocessor vs. MultiMedia app).

EG: If you have a Pentium it is ALWAYS (or should be if gcc is correct) best to use the -march=pentium option - BUT - it
is NOT always best to use "-fcrossjumping" (though it _could_ be for certain applications).

The output of all this could simply be a half dozen command line choices for each processor - including a "general purpose
'best'" setting and a "quick compile with great optimization" setting (for intermediate builds).

This is something that a few dozen people need to work on to get the ball rolling and then the rest of us need to pitch in and
compile the resulting test scripts to check for errors. With everyone's help we should have the so-called answer(S) to
"which compilation options should I use for machine-X when compiling application category Y".

Just a thought ...

Looks like you have a good


Submitted by Anonymous (not verified) on Tue, 11/21/2006 - 12:32.

Looks like you have a good project to setup now.

Got Table?


Submitted by Anonymous (http://www.screwylizardracing.com) (not verified) on Sun, 02/05/2006 - 18:08.

Where can I get a readable copy of Table 1? The copy here is too small to read, and can't be enlarged.

try clicking on it


Submitted by Anonymous (not verified) on Thu, 03/30/2006 - 02:58.

try clicking on it

-O versus -O0


Submitted by Anonymous (not verified) on Wed, 02/02/2005 - 12:19.

Minor comment -- -O defaults to -O1, not to -O0.
