
CME 212 Section Notes

Christopher Maes

These notes describe how to profile the example graphics program bounce, available at
http://www.stanford.edu/class/cme212/assignments.html.

Why Profile?
Profiling is used to determine which parts of a program to optimize for speed or memory
usage [2]. A general rule of thumb is that 90% of a program’s time is spent in just 10% of
the code. Profiling tells you which 10% of the code that is.

Profiling with gprof


The most commonly used profiler on Linux systems is the program gprof. gprof is part of
GNU Binutils and is usually installed alongside the GCC compiler. Profiling a program with
gprof involves three steps:
preparing your program for profiling, executing your program to collect data, and running
gprof to analyze the results.
To prepare your program for profiling, you need to compile your program with the additional
options ‘-pg -g’. The option ‘-pg’ inserts profiling code into the program, and the option
‘-g’ adds debugging information to the program. Inserting the line
CFLAGS = -pg -g

at the beginning of bounce’s Makefile will include these flags in the implicit rules which build
.o files from .c files. In addition, you need to add these flags to the rule that constructs
bounce as follows:
bounce: ${OBJ}
	gcc $(CFLAGS) -o $@ ${OBJ} -L/usr/X11R6/lib64 -lX11
# Note: $(CFLAGS) adds the compilation flags to the link command

Running make then yields the following output:


chris@nullspace: make
cc -pg -g -c -o bounce.o bounce.c
cc -pg -g -c -o graphics.o graphics.c
cc -pg -g -c -o timer.o timer.c
gcc -pg -g -o bounce bounce.o graphics.o timer.o -L/usr/X11R6/lib64 -lX11

Next we run bounce, and let it run for a few seconds to collect data.
chris@nullspace: ./bounce
Hit q to quit.

This produces a file called gmon.out which contains the profiling information collected.
To display this information we use the program gprof. There are three main sources of
information displayed by gprof: a flat profile, a call graph, and an annotated source listing.
First, we will take a look at the flat profile by giving gprof the option ‘-p’:
Christopher Maes CME 212 Section Notes February 20, 2008

chris@nullspace: gprof -p ./bounce gmon.out


Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
  0.00      0.00     0.00    25750     0.00     0.00  Bounce
  0.00      0.00     0.00    25750     0.00     0.00  DrawCircle
  0.00      0.00     0.00     2576     0.00     0.00  CheckForQuit
  0.00      0.00     0.00     2575     0.00     0.00  ClearScreen
  0.00      0.00     0.00     2575     0.00     0.00  Refresh
  0.00      0.00     0.00        2     0.00     0.00  SetCAxes
  0.00      0.00     0.00        1     0.00     0.00  InitializeGraphics
  0.00      0.00     0.00        1     0.00     0.00  create_gc
  0.00      0.00     0.00        1     0.00     0.00  create_simple_window

<output omitted>

The flat profile lists the functions executed by the program bounce, and displays the
percentage of bounce’s running time spent in each function and the number of calls to it.
The remaining columns are explained in the text that gprof prints below the flat profile.
Something funny is going on here: according to gprof, bounce made a lot of calls to the
functions Bounce and DrawCircle (see the calls column above) but spent none of its running
time in these functions (see the % time column above). To understand what is happening
here we need to understand a little about how gprof works. The number of calls to a
function is obtained by counting; compiling the code with -pg adds code to bounce that
counts function calls. However, the amount of time spent inside a function is estimated
via a sampling process. Note the line at the beginning of gprof’s output specifying
that the sampling period is 0.01 seconds. So, every 0.01 seconds the program is sampled
and the function in which the program currently resides is recorded. Unfortunately, this
sampling process doesn’t work very well for a program like bounce, where most of the time
is spent in library calls. To see where bounce is spending all of its time we will turn to a
profiler based on simulation rather than sampling.
Before we do, we’ll briefly mention the annotation abilities of gprof. Executing gprof with
the option ‘-A’ will produce source code annotated with calls and timing information. For
example, the following command:

chris@nullspace: gprof -A ./bounce gmon.out


<output omitted>
        2575 -> void ClearScreen(void) {
                  XSetForeground(display,gc,black);
                  XFillRectangle(display,pixmap,gc,0,0,width,height);
                }
<output omitted>


shows us that there were 2575 calls to the function ClearScreen.
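
The difference between counting calls and sampling time, described earlier in this section, can be illustrated with a toy model. The sketch below is not how gprof is implemented. It is a minimal simulation under assumed timings: each frame of a hypothetical bounce-like loop spends 3993 microseconds inside a library call, then 2 microseconds in Bounce and 5 microseconds in DrawCircle. The function name simulate_profile, the label <library>, and all durations are invented for illustration.

```python
def simulate_profile(timeline, period_us):
    """Return (calls, samples) for a list of (name, duration_us) intervals.

    Calls are counted exactly, once per interval. Samples are taken every
    period_us microseconds and charged to whichever interval is running.
    """
    calls, samples = {}, {}
    now = 0                    # start time of the current interval
    next_sample = period_us    # time of the next sampling tick
    for name, duration_us in timeline:
        calls[name] = calls.get(name, 0) + 1
        end = now + duration_us
        # Charge every tick that falls in [now, end) to this interval.
        while next_sample < end:
            samples[name] = samples.get(name, 0) + 1
            next_sample += period_us
        now = end
    return calls, samples


# Hypothetical timings (microseconds) for one frame of a bounce-like loop.
FRAME = [("<library>", 3993), ("Bounce", 2), ("DrawCircle", 5)]

# 100 frames, sampled every 0.01 seconds (10,000 microseconds).
calls, samples = simulate_profile(FRAME * 100, period_us=10_000)
print(calls)    # {'<library>': 100, 'Bounce': 100, 'DrawCircle': 100}
print(samples)  # {'<library>': 39}
```

In this model Bounce and DrawCircle are each counted 100 times yet receive no samples, because every sampling tick lands inside the long library interval. That mirrors the flat profile above, where the call counts are large but the time columns read 0.00. The model only shows how exact call counts can coexist with zero sampled time; it does not explain every detail of the real output.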

Profiling with Cachegrind and Callgrind


To see where bounce is spending its time we turn to the program Valgrind. Valgrind is a
suite of tools for debugging and profiling. Valgrind is perhaps most famous for its Memcheck
tool, which can be used to find memory leaks and errors. However, Valgrind also includes
the Cachegrind and Callgrind tools which can be used to construct a very accurate profile
of a program.
Valgrind is basically a virtual machine or processor emulator. We execute a program through
Valgrind, and Valgrind records information about the instructions the program executes and
the memory the program accesses. Since Valgrind is a processor emulator, it does not need
to augment a program to profile it. This means that we don’t need to prepare a program
for profiling, so we can remove the ‘-pg’ flag from the Makefile. This also means that we
can run Valgrind on programs that we do not have the source code for. However, running a
program through Valgrind will cause the program to run around 50 times slower [1].
To use Valgrind with bounce, remember to remove the ‘-pg’ flag from the Makefile, then
run make clean; make, and then the following command:

chris@nullspace: valgrind --tool=callgrind ./bounce


==7360== Callgrind, a call-graph generating cache profiler.
==7360== Copyright (C) 2002-2007, and GNU GPL'd, by Josef Weidendorfer et al.
==7360== Using LibVEX rev 1732, a library for dynamic binary translation.
==7360== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==7360== Using valgrind-3.2.3-Debian, a dynamic binary instrumentation framework.
==7360== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==7360== For more details, rerun with: -v
==7360==
==7360== For interactive control, run 'callgrind_control -h'.
Hit q to quit.
==7360==
==7360== Events : Ir
==7360== Collected : 33103198
==7360==
==7360== I refs: 33,103,198

As before, we let bounce run for a few seconds to collect profiling information. The number
enclosed in ‘==’ on the left-hand side of Valgrind’s output is the process id (in this case 7360).
Valgrind (or really the Callgrind tool) produces a file called callgrind.out.#, where # is
the process id.
Running the program callgrind_annotate produces a summary of the data collected in
the callgrind.out file:

chris@nullspace: callgrind_annotate
Reading data from 'callgrind.out.7360'...


--------------------------------------------------------------------------------
Profile data file 'callgrind.out.7360' (creator: callgrind-3.2.3-Debian)
--------------------------------------------------------------------------------
I1 cache:
D1 cache:
L2 cache:
Timerange: Basic block 0 - 7586446
Trigger: Program termination
Profiled target: ./bounce (PID 7360, part 1)
Events recorded: Ir
Events shown: Ir
Event sort order: Ir
Thresholds: 99
Include dirs:
User annotated:
Auto-annotation: off

--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
33,103,198 PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir file:function
--------------------------------------------------------------------------------
21,379,973 ???:XCheckMaskEvent [/usr/lib/libX11.so.6.2.0]
3,437,648 bounce.c:main [/home/chris/Desktop/cme212/assignment3/bounce]
2,525,400 graphics.c:DrawCircle [/home/chris/Desktop/cme212/assignment3/bounce]
1,381,284 ???:XFillArc [/usr/lib/libX11.so.6.2.0]
703,261 bounce.c:Bounce [/home/chris/Desktop/cme212/assignment3/bounce]
588,550 ???:XSetForeground [/usr/lib/libX11.so.6.2.0]
287,916 ???:0x0000000000071E20 [/lib/libc-2.6.1.so]
276,765 ???:0x0000000000025690 [/usr/lib/libX11.so.6.2.0]
245,774 ???:0x0000000000071440 [/lib/libc-2.6.1.so]
200,182 ???:_XFlush [/usr/lib/libX11.so.6.2.0]
180,125 ???:_XData32 [/usr/lib/libX11.so.6.2.0]
166,896 ???:_XFlushGCCache [/usr/lib/libX11.so.6.2.0]
127,365 ???:XFillRectangle [/usr/lib/libX11.so.6.2.0]
125,233 ???:XCopyArea [/usr/lib/libX11.so.6.2.0]
117,460 ???:0x0000000000009680 [/lib/ld-2.6.1.so]
104,994 ???:_XEventsQueued [/usr/lib/libX11.so.6.2.0]
102,474 ???:0x0000000000009AB0 [/lib/ld-2.6.1.so]
90,544 ???:malloc [/lib/ld-2.6.1.so]
85,738 ???:_XWireToEvent [/usr/lib/libX11.so.6.2.0]
85,718 ???:_XEnq [/usr/lib/libX11.so.6.2.0]
80,045 ???:free [/lib/libc-2.6.1.so]
<output omitted>

During the profiling run bounce executed 33,103,198 instructions; 21,379,973 of these
(about 65%) were executed inside the X11 library function XCheckMaskEvent.
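
To turn the raw instruction counts into shares of the total, the rows of the listing can be parsed and divided by PROGRAM TOTALS. The following is a minimal sketch in Python, assuming each row has the form "count file:function [object]" as in the listing above. The names ROWS, ROW_RE and parse_rows are made up for this example, and only a few rows are copied in.

```python
import re

# A few rows copied from the callgrind_annotate listing above.
ROWS = """\
21,379,973 ???:XCheckMaskEvent [/usr/lib/libX11.so.6.2.0]
3,437,648 bounce.c:main [/home/chris/Desktop/cme212/assignment3/bounce]
2,525,400 graphics.c:DrawCircle [/home/chris/Desktop/cme212/assignment3/bounce]
1,381,284 ???:XFillArc [/usr/lib/libX11.so.6.2.0]
703,261 bounce.c:Bounce [/home/chris/Desktop/cme212/assignment3/bounce]
"""
TOTAL_IR = 33_103_198  # PROGRAM TOTALS from the same listing

# Assumed row layout: "<count> <file>:<function> [<object file>]".
ROW_RE = re.compile(r"^\s*([\d,]+)\s+(\S+):(\S+)\s+\[(.*)\]\s*$")


def parse_rows(text):
    """Yield (instruction_count, function_name) for each matching row."""
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            yield int(match.group(1).replace(",", "")), match.group(3)


# Percentage of all executed instructions attributed to each function.
shares = {name: 100.0 * ir / TOTAL_IR for ir, name in parse_rows(ROWS)}
for name, percent in shares.items():
    print(f"{name:16s} {percent:5.1f}%")
```

For XCheckMaskEvent this gives roughly 64.6%, which is the "about 65%" figure quoted above. These percentages cover only the instructions executed inside each function itself, not in the functions it calls.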
To get an annotated source listing we can run the following command:


chris@nullspace:~/cme212/: callgrind_annotate --auto=yes


<output omitted>
         2      printf("Hit q to quit.\n");
     1,189  => ???:puts (1x)
       867  => ???:_dl_runtime_resolve (1x)
     6,592      while(!CheckForQuit()) {
21,729,937  => graphics.c:CheckForQuit (2197x)
    98,820        for(i=0;i<N;i++) {
   307,440          ax=forceX[i]/mass[i];
   351,360          ay=forceY[i]/mass[i]-grav;
         .
   307,440          u[i]+=ax*dt;
   307,440          v[i]+=ay*dt;
   417,240          x[i]+=u[i]*dt;
   417,240          y[i]+=v[i]*dt;
         .
   549,000          Bounce(&x[i],&y[i],&u[i],&v[i],L,W);
   703,261  => bounce.c:Bounce (21960x)
         .        }
<output omitted>

For instance, the output above shows lines 47-59 of bounce.c.


In addition to the terminal program callgrind_annotate, a graphical program KCachegrind
can be used to examine callgrind.out files. A screenshot of KCachegrind is shown in
Figure 1.

Figure 1: A screenshot of the program KCachegrind. Note the coverage list (left), the call
tree graph (bottom right) and the tree map visualization (top right).

KCachegrind is able to produce annotated call graphs from the profiling information provided
in the callgrind.out file. A call graph shows the relationship between function calls. Each
node in a call graph represents a function or subroutine. We draw an edge from node a to
node b if function a calls function b. With KCachegrind it is possible to construct call graphs
in which each function is annotated with the share of running time spent in it. Figure 2 shows a call graph for the
program bounce.


                                       main
                                      99.09%

Bounce           ClearScreen      DrawCircle   Refresh   CheckForQuit      XCloseDisplay
2.12%            1.65%            14.35%       2.85%     65.64%            1.08%

XFillRectangle   XSetForeground   XFillArc     XFlush    XCheckMaskEvent   _XFreeDisplayStructure
1.33%            1.78%            5.11%        2.32%     65.29%            1.03%

Figure 2: A call graph for bounce. Only functions at most two levels below main that
consume 1% or more of the total running time are shown.

This call graph provides a nearly complete description of where bounce spends its time.
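
The node-and-edge description of a call graph maps directly onto a small data structure. The sketch below stores the percentages from Figure 2 in a dictionary and a few edges in an adjacency list, then prints the functions above a cost threshold as an indented tree. Only some edges are recoverable from these notes. The edges out of main follow the first row of the figure, ClearScreen's callees come from the annotated source shown earlier, and the edge from CheckForQuit to XCheckMaskEvent is an assumption inferred from the nearly equal percentages. The remaining second-row edges are left out. The names COST, CALLS and call_tree are made up for this example.

```python
# Cost of each function as a percentage of total running time (Figure 2).
COST = {
    "main": 99.09,
    "Bounce": 2.12, "ClearScreen": 1.65, "DrawCircle": 14.35,
    "Refresh": 2.85, "CheckForQuit": 65.64, "XCloseDisplay": 1.08,
    "XFillRectangle": 1.33, "XSetForeground": 1.78, "XFillArc": 5.11,
    "XFlush": 2.32, "XCheckMaskEvent": 65.29, "_XFreeDisplayStructure": 1.03,
}

# An entry a -> [b, ...] means "function a calls function b".
CALLS = {
    "main": ["Bounce", "ClearScreen", "DrawCircle", "Refresh",
             "CheckForQuit", "XCloseDisplay"],
    "ClearScreen": ["XSetForeground", "XFillRectangle"],  # from the source listing
    "CheckForQuit": ["XCheckMaskEvent"],                  # assumed edge
}


def call_tree(graph, cost, root, threshold=1.0, depth=0):
    """Return indented lines for root and its callees costing >= threshold %.

    Assumes the graph has no cycles (no recursion in the profiled program).
    """
    if cost[root] < threshold:
        return []
    lines = [f"{'  ' * depth}{root} {cost[root]:.2f}%"]
    for callee in graph.get(root, []):
        lines += call_tree(graph, cost, callee, threshold, depth + 1)
    return lines


print("\n".join(call_tree(CALLS, COST, "main", threshold=5.0)))
```

With a threshold of 5%, only main, DrawCircle, CheckForQuit and XCheckMaskEvent remain, which makes the dominant path through the program easy to see.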

References
[1] KCachegrind website, http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindWhat

[2] Performance analysis, http://en.wikipedia.org/wiki/Performance_analysis
