Profiling
Christopher Maes
These notes describe how to profile the example graphics program bounce, available at
http://www.stanford.edu/class/cme212/assignments.html.
Why Profile?
Profiling is used to determine which parts of a program to optimize for speed or memory
usage [2]. A general rule of thumb is that 90% of a program’s time is spent in just 10% of
the code. Profiling enables you to determine which 10% of the code.
at the beginning of bounce’s Makefile will include these flags in the implicit rules which build
.o files from .c files. In addition, you need to add these flags to the rule that constructs
bounce as follows:
bounce: ${OBJ}
gcc $(CFLAGS) -o $@ ${OBJ} -L/usr/X11R6/lib64 -lX11
# Note ^^^^^ added compilation flags
Next we run bounce, and let it run for a few seconds to collect data.
chris@nullspace:./bounce
Hit q to quit.
This produces a file called gmon.out which contains the profiling information collected.
To display this information we use the program gprof. There are three main sources of
information displayed by gprof: a flat profile, a call graph, and an annotated source listing.
First, we will take a look at the flat profile by giving gprof the option ‘-p’:
Christopher Maes CME 212 Section Notes February 20, 2008
<output omitted>
The flat profile gives a list of functions executed by the program bounce, and displays the
percentage of bounce’s running time spent in the function and the number of calls to the
function. For information about the other columns please see the output of gprof.
Something funny is going on here. According to gprof, bounce made a lot of calls to the
functions Bounce and DrawCircle (see the calls column above) but spent none of its running
time in these functions (see the % time column above). To understand what is happening
here we need to understand a little about how gprof works. The number of calls to a
function is obtained by counting: compiling the code with -pg adds code to bounce that
counts function calls. However, the amount of time spent inside a function is estimated
via a sampling process. Note the line at the beginning of gprof’s output specifying
that the sampling period is 0.01 seconds. Every 0.01 seconds the program is sampled
and the function in which the program currently resides is recorded. Unfortunately, this
sampling process doesn’t work very well for a program like bounce where most of the time
is spent in library calls. To see where bounce is spending all of its time we will turn to a
profiler based on simulation rather than sampling.
Before we do, we’ll briefly mention the annotation abilities of gprof. Executing gprof with
the option ‘-A’ will produce source code annotated with calls and timing information. For
example, the following command:
As before, we let bounce run for a few seconds to collect profiling information. The number
enclosed in ‘==’ on the left-hand side of Valgrind’s output is the process id (in this case 7360).
Valgrind (or really the Callgrind tool) produces a file called callgrind.out.#, where # is
the process id.
Running the program callgrind_annotate produces a summary of the data collected in
the callgrind.out file:
chris@nullspace: callgrind_annotate
Reading data from ’callgrind.out.7360’...
--------------------------------------------------------------------------------
Profile data file ’callgrind.out.7360’ (creator: callgrind-3.2.3-Debian)
--------------------------------------------------------------------------------
I1 cache:
D1 cache:
L2 cache:
Timerange: Basic block 0 - 7586446
Trigger: Program termination
Profiled target: ./bounce (PID 7360, part 1)
Events recorded: Ir
Events shown: Ir
Event sort order: Ir
Thresholds: 99
Include dirs:
User annotated:
Auto-annotation: off
--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
33,103,198 PROGRAM TOTALS
--------------------------------------------------------------------------------
Ir file:function
--------------------------------------------------------------------------------
21,379,973 ???:XCheckMaskEvent [/usr/lib/libX11.so.6.2.0]
3,437,648 bounce.c:main [/home/chris/Desktop/cme212/assignment3/bounce]
2,525,400 graphics.c:DrawCircle [/home/chris/Desktop/cme212/assignment3/bounce]
1,381,284 ???:XFillArc [/usr/lib/libX11.so.6.2.0]
703,261 bounce.c:Bounce [/home/chris/Desktop/cme212/assignment3/bounce]
588,550 ???:XSetForeground [/usr/lib/libX11.so.6.2.0]
287,916 ???:0x0000000000071E20 [/lib/libc-2.6.1.so]
276,765 ???:0x0000000000025690 [/usr/lib/libX11.so.6.2.0]
245,774 ???:0x0000000000071440 [/lib/libc-2.6.1.so]
200,182 ???:_XFlush [/usr/lib/libX11.so.6.2.0]
180,125 ???:_XData32 [/usr/lib/libX11.so.6.2.0]
166,896 ???:_XFlushGCCache [/usr/lib/libX11.so.6.2.0]
127,365 ???:XFillRectangle [/usr/lib/libX11.so.6.2.0]
125,233 ???:XCopyArea [/usr/lib/libX11.so.6.2.0]
117,460 ???:0x0000000000009680 [/lib/ld-2.6.1.so]
104,994 ???:_XEventsQueued [/usr/lib/libX11.so.6.2.0]
102,474 ???:0x0000000000009AB0 [/lib/ld-2.6.1.so]
90,544 ???:malloc [/lib/ld-2.6.1.so]
85,738 ???:_XWireToEvent [/usr/lib/libX11.so.6.2.0]
85,718 ???:_XEnq [/usr/lib/libX11.so.6.2.0]
80,045 ???:free [/lib/libc-2.6.1.so]
<output omitted>
During the profiling run bounce executed 33,103,198 instructions; 21,379,973 of these
instructions were executed inside the X11 library function XCheckMaskEvent.
To get an annotated source we can run the following command:
Figure 1: A screenshot of the program KCachegrind. Note the coverage list (left), the call
tree graph (bottom right) and the tree map visualization (top right).
KCachegrind is able to produce annotated call graphs from the profiling information provided
in the callgrind.out file. A call graph shows which functions call which. Each
node in a call graph represents a function or subroutine. We draw an edge from node a to
node b if function a calls function b. With KCachegrind it is possible to annotate and filter
call graphs according to the amount of time spent in each function. Figure 2 shows a call
graph for the program bounce.
[Call graph image not reproduced; the root node reads: main, 99.09%]
Figure 2: A call graph for bounce. Only functions less than two levels below main that
consume 1% or more of the total running time are shown.
This call graph provides a nearly complete description of where bounce spends its time.
References
[1] KCachegrind website. http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindWhat