Nonfictionsrivastav A 94 Atom

ATOM A System for Building Program Analysis Customized Tools
Amitabh Srivastava and Alan Eustace Digital Equipment Western Research Laboratory 250 University Ave., Palo Alto, CA 94301 {amitabh, eustace}(!decwrl .pa. dec. com
Abstract
ATOM building (Analysis Tools with OM) is a single framework infrastructure for a wide range of customized program analysis tools. the common present in all codeand time-consuming details in Building a basic block tools; this is the difficult and analysis routines.
Introduction
important for underon new writor Computer architects need such perform
Program analysis tools are extremely standing program behavior. architectures. Software
It provides instrumenting part.
tools to evaluate how well programs will programs and identify scheduling
writers need tools to analyze their
The user simply
defines the tool-specific
critical pieces of code. Compiler atgorithm
instrumentation counting code.

ATOM, using
ers often use such tools to find out how well their instruction or branch prediction is performing to provide input for profile-driven optimizations. The tirst
tool like Pixie with ATOM requires only a page of

OM link-time
technology,
organizes the fi-
Over the past decade three classes of tools for different machines and applications Epoxie[14] have been developed. counting class consists of basic block basic block is executed. tools like Pixie[9],
nal executable such that the application tion is directly inter-process
program and users program ATOM to the takes
analysis routines run in the same address space. Informapassed from the application simple procedure communication analysis routines through calls instead of
and QPT[8] that count the number of times each The second class consists of adtraces.
or files on disk.
dress tracing tools that generate data and instruction
care that anatysis routines do not interfere with the programs execution, and precise information simulation ATOM OSF/1. memory pipeline or interpretation. has been implemented It is efficient recording, simulation, on the Alpha AXP under profiling, dynamic and inand has been used to build a diverse counting, instruction evaluating and data cache simulation, branch prediction, about the program is preATOM uses no sented to the analysis routines at all times.
Pixie and QPT also generate address traces and communicate trace data to analysis routines through inter-process communication. communicated but it collects ifying ulators. Tracing and analysis on the WRL Titan [3] via shared memory MPTRACE but required operating [6] is also similar to Pixie by instrumenting
system modifications. assembly code. ATUM microcode is analyzed offline. by instrumenting
traces for multiprocessors
set of tools for basic block
[1] generates address traces by mod-
and saves a compressed trace in a file that The third class of tools consists of simsupports multiprocessor simulation simulation assembly language code. g88[2] PROTEUS [4] 88000 selec-
struction scheduling. Permission to cop without fee all or part of this material is granted provided t{ at the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association of Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. SIGPLAN 94-6/94 Orlando, Florida USA Q 1994 ACM 0-89791 -662-xJ9410006..$3.5O
Tango Lite[7]
also supports multiprocessor is done by the compiler. using threaded interpreter
but instrumentation Shade[5] attempts to
simulates Motorola
techniques.
address the problem of large address traces by allowing simulation. These existing tools have several limitations.
tive generation of traces but has to resort to instruction-level
First, most tools are designed to perform a single specific type of instrumentation, tracing. Modifying typically block counting or address these tools to produce more detailed
196
or less detailed information insufficient information
is difficult.
A tool generating
the application at all times.
program is presented to analysis routines
is of no use to the user. too much computed informaq
Second, most address tracing tools compute detailed address information. However, As ATOM of compiler works on object modules, it is independent and language systems. tion renders the tool inefficient tire instruction extremely for the user. For example, a branches run
user interested in branch behavior has to sift through the entrace, even though only conditional The instruction need to be examined. into gigabytes. Third, tools based on instruction-level overheads to the processing time. been used to make the simulation system, but simulation many times slower. Fourth, tools such as Tango Lite, which instrument assembly language code, change the application addresses. Instrumenting all libraries have to be available lection mechanisms. communication, programs heap as library routines is inconvenient simulation add large and address traces are and typically
In this paper, we describe the design and implementation of ATOM. We show through a real example how to build tools. Finally, we evaluate the systems performance.
large even for small programs
Design of ATOM
is based on the observation all can be accomplished that al-
Several techniques have faster, such as in the Shade The design of ATOM appear vastly different, menting basic block counting store instruction, each conditional though tasks like basic block counting and cache simulation by instruof each a program at a few selected points. tools instrument instrument For example, each load and ATOM al-
nevertheless makes the programs run
the beginning
basic block, data cache simulators branch instruction.
in assembly language form.
and branch prediction
analyzers instrument Therefore,
Finally, most address tracing tools provide trace data colData in form of address traces is communicated to the data analysis routines through inter-process or files on disk, Both are expensive, and the large size of address traces further aggravates this problem. Using a shared buffer reduces this expense but still requires a lot of process switching efficiently and sometimes can be implemented by providing the prinonly with changes to the operating system. in building performance tools. The important systems are listed
lows a procedure call to be inserted before or after any program, procedure, basic block, or instruction. viewed as a linear collection collection instructions. Furthermore, the common the infrastructure ATOM separates the tool-specific needed in all tools. manipulation for object-code part from It provides and a highThe user infrastructure A program is of of procedures, procedures as a
of basic blocks, and basic blocks as a collection
ATOM overcomes these limitations cipal ingredient below. ATOM is a tool-building ranging can be easily built. features that distinguish
level view of the program in object-module defines the tool-specific indicating the points in the application
form.
it from previous
part in instrumentation routines by program to be instru-
mented, the procedure calls to be made, and the arguments system. A diverse set of tools to cache modeling to be passed. The user also provides code for these procedures in the analysis routines. both the application of print The user specATOM1 Figure 1. In the first step, common the users instrumentation of data is through procedure calls. proThis tool will instrument machinery is combined program with routines to build a custom tool. an application at points routines. application program so
f
from basic block counting
The analysis routines do not program; if
share any procedures or data with the application in all codesame library procedure, like print in the final executable, internally works
f,
ATOM provides the common infrastructure instrumenting user simply specifies the tool details. ATOM allows selective instrumentation. ifies the points in the application ments to be passed. The communication Information
program and the analysis routines use the there are two copies one in the application in
tools, which is the cumbersome part. The
program and the other in the analysis routines. in two steps, as shown program to be irMru-
mented, the procedure calls to be made, and the argu-
is directly passed from the application interprocess
specified by the users instrumentation cation program to build an instrumented executable. The instrumented from application that information
gram to the specified analysis routine with a procedure call instead of through mechanism. Even though the anatysis routines run in the same address space as the application, precise information about communication, files on disk, or a shared buffer with central dispatch
In the second step, this custom tool is applied to the appliexecutable is organized
program is communicated
1ExtemaUy, the user specifies: atomprog in.rt.c ard.c -o Prog.atom to produce dre instrumented program prog.atom.
197
generic object moditier
user application
applia2a3ion
analysis output
user instruouenting n
user a~:~is
application output
Figure 1: The ATOM Process
directly to procedures in the analysis routines through procedure calls. The data is passed as arguments to the handling routine in the requested form, and does not have to go through a central dispatch mechanism. To reduce the communication plication address space. ATOM partitions places the application More importantly, with the information plication program such that they do not interfere to a procedure call, the apthe symbol name space and with each others execution. program and the analysis routines run in the same and analysis routines in the executable
The user provides taining
three files to ATOM: routines,
the application a file conspecand
program object module that is to be instrumented, the instrumentation the analysis routines. ify where the application The instrumentation
and a file containing routines
program is to be instrumented
what procedure calls are to be made. The user provides code for these procedures in the analysis routines. sections show how to write the instrumentation routines for our example tool. The next two and analysis
the analysis routine is always presented (data and text addresses) about the apas if it was executing uninstrumented. Our branch counting tool needs to examine all the conditional branches in the program. We traverse the program a procein the basic block is a conditional The instrumentation the dure at a time, and examine each basic block in the procedure. If the last instruction branch, we instrument ATOM
Instrument
Defhing
Instrumentation
Routines
Section 4 describes how the system guarantees the precise information. ATOM, piler modules. built using OM[l Since OM 1], is independent to work of any comwith different and language system because it operates on objectis designed
the instruction.
architectures,
ATOM can be applied to other architectures.
routines are given in Figure 2. starts the instrumentation

prOCedure3. t rument by defining routine enables
process by invoking instrumentation

The instrumentation
All
modules
Building Customized An Example
Tools:
contain the Ins

process begins in the analysis
procedure. the prototype be called
of each procedure from the application interpret the ar-
that will ATOM 1P rot
In this section we show how to build a simple tool that counts how many times each conditional are written to a file. zoM ~a~ inltiauy implemented on the DECStations running under ULTRIX and was ported to Alpha AXP running under OSF/1. UL~IX, DECStatlon and Alpha AXP are trademarks of Digital Equipment Corporation.
program. guments.
This
to correctly o primitive
branch in the program
is
The AddCal
is used to define
PrintBranch,
taken and how many times it is not taken. The final results
the prototypes. procedures

and Cl o seFile
In our example, prototypes of four analysis

CondBranch, Besides are defined. the standard C data
OpenFi.le,
s~e Instmment procedure takes argc and argv as arguments which can be optionally passedfrom the atom command line.
198
verses the program dure, Get First Instrument(int { Proc *p; Block *b; Inst *insq int nbranch = O; AddCaBProto(OpenFile(int)); AddCaBProto(CondBranch(int, AddCalWroto(PrintBranch(int, AddCallProto(CloseFileo); for(p=GetFirstProco; p!=NULL;p=GetNextProc(p)){ for(b=GetFirstBlock(p) ;b!=NULL;b=GetNextBlock(b)){ inst = GetLastInst(b); if(IsInstType(inst, InstTypeCondBr)){ AddCallInst(inst, InstBefore, CondBrattch, nbranch,BrCondValue); AddCallProgram(ProgramAfter, nbranch++; PrintBranch, nbranch, InstPC(inst)); VALUE)); long)); iargc, char **iargv)
GetNext Block
a procedure
at a time.
In each proceand Using these
Block
returns the first basic block
returns the next basic block.
primitives procedure.
the inner loop traverses all the basic blocks of a we are interested
I nst
In this example, branch instructions. it is a conditional itive.

AddCall CondBranch
only in conditional in the baplimthe and check if

Type
We find the last instruction primitive branch using the Is I nst instructions are ignored.
sic block using the Get Last All

Inst
other
With
primitive,
ore
a call to the analysis procedure
is added at the conditional branch instruction.

Bef
The Inst
argument specifies that the call is to be is executed. The two arguments are the linear number of the vatue specifies
made before the instruction to be passed to CondBranch branch and its condition The AddCallP
(programBef ore)
value. The condition

k
whether the branch will be taken.

rogram
used to insert calls before program starts executprogram
the application
ing aud after (P rog ramAf finishes executing. tively.
t e r) the application
These calls are generally used to initialle
) } }
AddCallProgram(ProgramBefore, AddCallProgram(ProgramAfter, } OpenFile, CloseFile); nbranch);
ize analysis routine data and print results at the end, respecA catl to OpenFi before the application program starts executing creates the branch statistics array and opens the output file. We insert calls for each branch to print its count at the end. PC (program counter) and its accumulated branch after the application Finally, the CloseFile
Note that these calls are made only once for each conditional program has finished executing. procedure is executed which closes
the output file. If more than one procedure is to be called at a point, the calls are made in the order in which they were Figure 2: Instrumentation Routines: Branch Counting Tool added by the instrumentation routines.
types as arguments, as REGV and VALUE. the run-time
AIXIM
supports additional
types such
Defining
Analysis
Routines
If the argument type is REGV, the acregister are passed. may

Ef fAddrValue
The anatysis routines contain code and data for all procedures needed to analyze information tion program. in the instrumentation that is passed from the applicaThese include procedures that were specified routines but may contain other proce-
tual argument is not an integer but a register number, and contents of the specified argument or BrCondValue.
BrCondValue
For the VALUE be Ef fAddrValue
type, the actual argument
dures that these procedures may call. The analysis routines do not share the code for any procedure with the application program, including Code
PrintBranch,
passes the memory store instructions.
address being referenced
by load and
k used for conditional
library routines.
OpenFi le, CondBranch,
branches and passes zero if the run-time evatttates to a false and a non-zero evaluates to true.
VALUE. CondBranch
branch condition type
for
procedures
value if the condition uses the argument
and CloseFile routines uses its argument
whose prototypes containing
were 3.
defined in instrumentation The OpenFile
are given in Figure
the number It also routine in-
ATOM atlows the user to traverse the whole program by modeling a program as consisting of a sequence of proceGet First P roc
of branches to atlocate the branch statistics array. opens a file to print results. The CondBranch
dures, basic blocks and instructions. returns the next procedure.
re-
turns the first procedure in the program, and Get Next The outermost
for
P roc
dArl~t,fter me~od would be to store the PC of each branch in ~ arraY

and pass the array at the end to be printed along wittr ttre counts. ATOM allows passing of arrays as arguments.
loop tra-
199
4
#include <stdio.h> File Wile struct BranchInfo{ long taken; long notTaken; } *bstats; void OpenFile(int n){ BranchInfo));
Implementation
is built using OM[l
of ATOM
1], a link-time code modificaof object files builds a
ATOM
tion system. and libraries symbolic
OM takes as input a collection that make up a complete representation,
program,
intermediate
applies instrumentarepresenrou-
tion and optimizations[12, ATOM starts by linking
13] to the intermediate
tation, and finally outputs an executable. the users instrumentation tines with OM using the standard linker to produce a custom tool. This tool is given as input the application the analysis routines. symbolic representations analysis routines. provide of the application representation requested. program and to build interIt uses OMS infrastructure
bstats = (strttctBrrmchInfo *) malloc (n * sizeof(struct file = fopen(btaken.out, fprintf(file, } void CondBranch(int if (taken) bstats[n] taken++; else bstats[n] .notTaken++; n, long taken){ w);
program and the of the program to
PC \t Taken \t Not Taken \n);
The traversal and query primitives
face with the intermediate the information intermediate in [11]. conveniently representation
More details of OMS so it can be exe-
and how it is built are described
We extended the OMS representation
annotated for procedure call insertions. pass builds the instrumented representation. This pass is
OMS code generation modified
}
void PrintBranch(int n, long pc){ fprintf(file, OX%lX \t %d \t %d\n, PC, bstats[n].taken, } void CloseFileo{ fclose(file); } bstats[n].notTaken);
cutable from the intermediate
to organize the data and text sections in a specific is presented to the analysis routines at
order because ATOM has to ensure that precise information about the application all times. In this section, we first describe the extensions to the intermediate representation and the insertion of procedure calls. we describe how Next, we discuss how we minimize ATOM organizes the final executable. the number of registers
that need to be saved and restored. Finally,
Inserting
Procedure
Calls
representation of OM to have a
We extended the intermediate Figure 3: Analysis Routines: Branch Counting Tool
slot for actions that may be performed before or after the entity is executed. The entity maybe a procedure, basic block,
ittSttWCtiOtt
crements the branch taken or branch not taken counters for the specified branch by examining gument.
Print Branch prints
or an edges. The AddCal

representation the call when describing and indicating
1 primitives
annotate to the to
the condition
value arthe
the intermediate action slot
by adding
a structure
the PC of the branch,
to be inserted,
arguments
number of times the branch is taken and number of times it is not taken. CloseFi.le closes the output file.
be passed,
the call is to be made.
Cur-
rently, adding calls to edges is not implemented.

AddCa 11P rot o primitive, a linked calls and ATOM
VtXifi(X
The protothat. The
type of the procedure must already have been added with the
action slot contains as multiple they list of all such actions at a point. to be perThe order will be
Collecting
Program
Statistics
ATOM is given as input the format, The
To find the branch statistics, fully linked application the instrumentation instrumented
formed in which
can be added is maintained
program
in object-module executable.
are added
so that calls
routines, and the analysis routines. program
made in the order they were specified.
output is the instrumented program
When this
After the intermediate
representation
has been fully annoThis process is easy
is executed, the branch statistics are
tated, the procedure calls are inserted.
produced as a side effect of the normal program execution.
5An edge ~onnect~two basic blocks and representsthe transfer of control between them.
200
because all insertion is done on OMS intermediate tation and no address tixups are needed. ATOM, does not steal any registers from the application ters that may be modified
represenlike QPT, programb.
here
where to save these caller-save
registers, and which code, where the call is We The
caller-save registers to save. Saving registers in the application being inserted, is not a good idea if there are more than a few registers to be saved, as it may cause code explosion. create a wrapper routine for each analysis procedure. and makes the call to the analysis routine. routine. Unfortunately,
It allocates space on the stack before the call, saves regisduring the call, restores the saved registers after the call and deallocates the stack space. This enables a number of mechanisms such as signals, setjmp and vfork to work correctly without needing any special attention. The calling conventions are followed in setting up calls to analysis routines. The first six arguments are passed in
wrapper routine saves and restores the necessary registers, The application in calls to program now calls the wrapper routine instead of the analysis this creates an indirection analysis routines. However, this has the advantage that it
registers and the rest are passed on the stack. The number of instructions needed to setup an argument depends on the type of the argument. For example, a 16-bit integer constant can a 32-bit constant in two instructions, and so on. Passing subroutine is within branch range, be built in 1 instruction,
makes no changes to the analysis code so it works well with a debugger like dbx. This is the default mechanism. ATOM provides an additional routines. facility in which the saves and restores of caller-save registers are added to the analysis No wrapper routines are created in this case. The extra space is allocated in the anrttysis routines stack frame. This requires bumping the stack frame and fixing stack references in the analysis routines as needed. This is more work but is more efficient level debugging. optimization as analysis routines are called directly. is available as a higher Since this modifies the analysis routines, it hampers sourceThis mechanism option. the analysis routines. The data flow
a 64-bit program counter in 3 instructions contents of a register takes 1 instruction. To make the call, a pc-relative instruction
is used if the analysis routine
otherwise, the value of the procedure is loaded in a register and a js r instruction is used for the procedure call. Therewhen a call is made This register the proceturn address register is always modified
so we always save the return address register. dures address for the js r instruction.
becomes a scratch registeq it is used for holding
The number of registers that need to be saved and restored is reduced by examining summary information with inall regisof the analysis routines determines all Only these registers need to be the
Reducing
Procedure
Call Overhead
the registers that may be modified when the control reaches a The application terprocedural not follow program may have been compiled conventions 8. Therefore, particular analysis procedure. number of different routines. Moreover, if an analysis routine contains procedure calls and delay the saves of other to other analysis routines, we save only the registers directly used in this analysis routine none of theprocedttrecalls path the program normally takes. registers to procedures that maybe called. We only do this if occur in a loop. Thus we distribute This helps analysis routines involves printing that the cost of saving registers; the overhead now depends on the return if their argument is valid but otherwise raise art error optimization and may contain routines that do in the call to the analysis routines as they have to allow The calling convensaved and restored. We use register renaming to minimize
the catling
caller-save registers used in the analysis
ters that may be modified
need to be saved, The analysis routines, on the other hand, have to follow the calling conventions arbitrary procedures to be linked in.
tions define some registers as callee-save registers that are preserved across procedure calls, and others as caller-save registers that are not preserved across procedure calls. All the caller-save registers need to be saved before the call to the analysis routine and restored on return from the analysis routines. This is necessary to maintain program. the execution state of the application automatically The callee-save registers would
an error. Raising an error typically
message and touching a lot more registers. For such routines, the common case of a vatid argument has low overhead as few registers are saved. This optimization current implementation. The number of registers that need to be saved may be further reduced by computing live registers in the applilive variable Only cation program. anatysis[l OM can do interprocehral is available in the
be saved and restored in analysis routines if Iivo issues need to be addressed
they are used by them.
Gpixie ~te~s three registers away from the application program for its own use. Pixie maintains three memory locations that have the values of these three registers, and replaces the use of dtese registers by uses of the memory locations. 7~pha[10] ha5 a signed 21-bit pc-relative subroutine branch ~stm~tion. The application may contain hand-crafted assembly language code that often does not follow standard conventions. All)M can handle such programs. $AnalYsis routines are analogous to standard library routines hat have to follow calling conventions so they can be linked wirb progrants.
1] and compute all registers live at a point. execution. Optimization
the live registers need to be saved and restored to preserve the state of the program inlining such as further reduce the overhead of procedure calls at the
201
A
Stack I
textstart ~
cad-only data ~xcedion data program text
a
t
instrumented program text Program Text Addresses Changed \ analysis gp
new datastart
old datastart /\ program data initialized program data uninitialized
n
Heap Figure 4: Memory layout have not The text addresses. the addresses supplied. program text However,
program data initialized program data uninitialized I
I ~a::ram Addresses Unchanged
t Uninstrumented Proaram LayZut Instrumented Program Layout
cost of increasing the code size. These refinements been added to the current system.
same as before. This is shown in Figure 4. addresses of the application program have changed because of the addition of instrumented code. How-
Keeping
Pristine
Behavior
ever, we statically
know the map from the new to original program, the original PC is simply
If an analysis routine asks for the PC of an inThis works well for most of the tools. if the address of a procedure in the application If the
One major goal of ATOM is to avoid perturbing in the application program. Therefore, are put in the space between the application and data segments. Analysis cedures or data with the application data and code for all procedures that they may need. The data sections unchanged. of the application and uninitialized program;
struction in the application
the analysis routines programs
routines do not share any prothey contain routines are not including library program
is taken, its address may exist in a register. is not the original text address.
analysis routine asks for the contents of such a register, the value supplied not implemented We have to return in our current system the ability
moved, so the data addresses in the application The initialized
program are programs data
original text address in such cases. Analysis routines may dynamically
allocate data on heap. program do not 1 routines, one in two
data of analysis
Since analysis routines and the application share any procedures, there are two sbrkl the application
routines is put in the space between the application must be located before all uninitialized data by initializing it with zero.
text and data segments. In an executable, all initialized
program and the other in the analysis routines
data, so the uninitial-
that atlocate space on the same heap. ATOM provides options for tools that must allocate dynamic memory.
ized data of the analysis routines is converted to initialized The start of the stack and heap10 are unchanged,
lo~ the MPha Am
so all stack and heap addresses are
Mder OSF/1 stack begins at start of text S9gItIetIt
and grows towards low memory, and heap starts at end of uninitialized data and grows towards high memory. 11sbrk routines ~ocate more data spacefor tie program.
202
The first method links the variables of the two sbrks, so both allocate space on the same heap without on each other. stepping This Each starts where the other left off.
to the uninstrumented tool.
programs
execution
time for each
Figure 6 shows the ratios for the SPEC92 programs.
The procedure call overhead is dependent on the code in the analysis routines, and the number and type of arguments that are passed. ATOM uses the data flow summary information along with register renaming to save. The contribution instrumented program to find the necessary registers time is also dependent on of procedure call overhead in the
method is useful for tools that are not concerned with the heap addresses being same as in the uninstrumented of the program. also sufficient Such tools include version basic block counting, that require
branch analysis, inline analysis and so on. This method is for tools such as cache modeling This is the default behavior. precise heap addresses but do not allocate dynamic memory in analysis routines. The second method is for tools that allocate dynamic memory and also require heap addresses to be same as in the tminstrumented version of the application application between the application program. To keep the heap addresses as before, the heap is partitioned and the analysis routines. The appli-
execution
the number of times the procedure calls take place. The inline tool instruments only procedure call sites; the total overhead is much less than the cache tool, which do when the control information communication instruments each memory reference. The amount of work the analysis routines reaches them is totally to compute. dependent on Although the the user is trying
overhead is small, we expect it to decrease live register analysis and inlining. Alpha AXP 3000
cation heap starts at the same address but the analysis heap is now made to start at a higher address. The user supplies the offset by which the start of analysis heap is changed. ATOM modifies the sbrk in analysis routines to start at the new address; the two sbrks application are not linked this time. The disad-
further when we implement
All measurements were done on Digital Model 400 with 128 Mb memory.
vantage of this method is that there is no runtime check if the heap grows and enters into the analysis heap.
Status
runs on Alpha AXP with The system Work is in
ATOM is built using OM and currently Fortrart, C++ and two different currently
under OSF/1. It has been used with programs compiled
Performance
how long ATOM takes to instrument a program, and programs execution time compares to 20 SPEC92 progr~s with programs execution time.
C compilers. modules.
works on non-shared library
To find how well ATOM performs, two measurements are of interest how the instrumented the uninstrumented
progress for adding support for shared libraries. ATOM projects. has been used both in hardware Besides the SPEC92 benchmarks, real applications and software it has successand at a Few
fully instrumented few universities12.
of up to 96 Megabytes. inside Digital
We used ATOM to instrument
The system is being used extensively
11 tools, The tools are briefly described in Figure 5. The time taken to instrument a program is the sum of the ATOMs processing time and the time taken by the users instrumentation routines. different The time taken by a tool varies as each tool does amounts of processing. For example, the malfoc
Our focus until now has mainly been on functionality. optimization overhead. Currently, reduction
have been added to reduce the procedure call in register saves has been information of We live register analysis system. data flow summary
obtained by computing along with inlining
tool simply asks for the malloc procedure and instruments i~ the processing time is very small. CPU pipeline scheduling The pipe tool does static for each basic block at instrttmen-
analysis routines. We plan to implement are just starting to instrument
to further improve the performance. the operating
tation time and takes more time to instrument an application. The time taken to instrument 20 SPEC92 programs with each tool is also shown in Figure 5. The execution time of the instrumented program is the sum of the execution time of the uninstrumented analysis routines. application
program, the procedure call setup, and the time spent in the This total time represents the time needed and do not include those numbers statistics. The time spent in analysis time required by
12A~M is available to external users. If you would like a copY. Please contact the authors.
by the user to get the final answers. Many systems process the collected data offline as part of data collecting other systems. We compared each instrumented programs execution time
routines is analogous to the postprocessing
20 3
Analysis Tool branch

cache dyninst gprof inline io malloc pipe prof Syscall unalign
Tool Description
prediction using 2-bit history table counts
Time to instrument SPEC92 suite

110.46 WCS 120.58 SCXX 126.31 WCS 113.24 146.50
SeCS S(XS
Average Time
5.52 WCS 6.03 WX 6.32 St%X 5.66 S(3(3 7.33 Sees
model direct mapped 8k byte cache computes dynamic instruction call graph based profiling finds potential inpui/output inlining summary tool tool
call sites
121.60 WCS 97.93 Sees 257.48 SCXX 122.53 WCS 120.53 St?CS 135.61 S(?CS
6.08 sees
4.90 sees 12.87 WCS 6.13 WCS 6.03 WCS 6.78 WCS
histogam of dynamic memory pipeline stall tool Instruction profiling tool system call summary tool unalign access tool
Figure 5: Time taken by ATOM to instrument
20 SPEC92 benchmark programs
Analysis Tool
branch cache dyninst gprof inline io malloc pipe prof Syseall unalign
Instrumentation points
each conditional eaeh basic block each procedure/each basic block each call site before/after write procedure before/after malloc procedure basic block each basic block each procedure/each before/after each basic block branch each memory reference
Number of Arguments 3 1 3 2 1 4 1 2 2 2 3
Time taken by Instrumented Program

3.03X 11.84x 2.91x 2.70x 1.03X 1.OIX 1.02X 1.80x 2.33x 1.OIX 2.93x
each system call
Figure 6: Execution
time of instrumented
SPEC92 Programs as compared to uninstrumented
SPEC92 programs
Conclusion
modification details from tool tools to harda high-level view of the program,
Acknowledgements
Great many people have helped us bring ATOM to its current form. Jim Keller, Mike Burrows, Roger Cruz, John Edmondson, Mike McCallig, of bugs and instability Dirk Meyer, Richard Swan and Mike process. Uhler were ottrtirst users and they braved through amine field during the early development Jeremy Dion, Ramsey Haddad, Russel Kao, Greg Lueck and Louis Monier built popular tools with ATOM. Many people, too many to name, gave comments, reported bugs, and provided encouragement. Linda Wilson atl. Roger Cruz, Jeremy Dion, Ramsey PLDI reviewers gave useful Haddad, Russell Kao, Jeff Mogul, Louis Monier, David Wall, and anonymous comments on the earlier drafts of this paper. Our thanks to
By separating object-module details and by presenting ATOM has transferred
the power of building
ware and software designers. only on what information
A tool designer concentrates
is to be collected and how to proATOMs fast commuthat there proof the
cess it. Tools can be built with few pages of code and they compute only what the user asks for.
nication between application traces is no need to record and analysis means
as all data is immediately in one execution
cessed, and final results instrumented programs. of tools ATOM program. It has already to solve will
are computed Thus,
one can process
long-running a wide variety We hope for studies
been used to build and software
hardware
problems. platform
continue
to be an effective design.
in software
and architectural
204
References
[1]
l(l),
pp 1-18, March 1993. Also available as WRL
Research Report 92/6, December 1992.
Anant Agarwal, Richard Sites, and Mark Horwitz,

ATUM A New Technique for Capturing Address [12] Amitabh Srivastava and David W. Wall. LinkTraces Using Microcode.
Proceedings of the 13th
Time Optimization bit Architecture.
of Address Calculation
on a 64-
International
Symposium on Computer Architec-
Proceedings of the SIGPLAN94 Language Design

1994. in
ture, June 1986.

[2] Robert Bedichek. Some Efficient Architectures
Conference on Programming
and Implementation, to appear. Also available as

WRL Research Report 94/1, February [13] Amitabh Lazana, and Srivastava. Unreachable Simulation Techniques. Winter 1990 USENIX Conprocedures object-oriented [3] Anita chines: Borg, R.E. Kessler, Georgia David Wall. Long Address Traces from RISC MaGeneration and Analysis, Proceedings of the 17th Annual Symposium on ComputerArchitecprogramming,
ference, January 1990.
ACM LOPLAS, Vol
1, #4, pp 355-364, December 1992. Also available as WRL Research Report 93/4, August 1993. [14] David W. Wall. Systems for late code modification. In Robert Giegerich Code Generation and Susan L. Graham, eds, - Concepts, Tals, Techniques,
ture, May 1990, also available as WRL Reseasch

Report 89/14, Sep 1989. [4] Eric A. Brewer, Chrysanthos N. Dellarocas, Adrian Colbrook, and William E. Weihl. PROTEUS: A High-Performance tor. MIT/LCS/l_R-5 Parallel-Architecture 16, MIT, 1991. David Keppel, Shade A Fast for Execution Profiling. of Simula-
pp. 275-293, Springer-Verlag,
1992. Also available
as WRL Research Report 92/3, May 1992.
[51 RobertF. Cmelikand

Instruction-Set Washington. [6] Susan J. Eggers, Koldinger,
Simulator
Technical Report UWCSE 93-06-06, University
David
R.
Keppel,
Eric
J.
and Henry M. Levy. T&hniques
for EfMulti-
ficient Inline Tracing on a Shared-Memory
processor. SIGMETRICS Conference on A4eastwe-
ment and Modeling of Computer Systems, VOI8, no

1, May 1990. [7] Stephen R. Goldschmidt and John L. Hennessy, Simulations Computer of MulSystems
The Accuracy of Trace-Driven tiprocessors. CSL-TR-92-546, Laboratory,
Stanford University,
September 1992. ex-
[8] James R. Larus and Thomas Ball. Rewriting
ecutable files to measure program behavior. Soft-
ware, Practice and Experience, vol 24, no. 2, pp

197-218, February 1994. [9] MIPS Computer Systems, Inc. Assembly Language
Programmers Guide, 1986.

[10] Richard L, Sites, ed. Alpha Architecture Reference
Manual
Digital Press, 1992. Srivastava and David W. Wall. A PracCode Optimization
[11] Amitabh
tical System for Intermodule at Link-Time.
Journal of Programming Language,
205

Nonfictionsrivastav A 94 Atom

Uploaded by

Copyright:

Available Formats

You might also like

Nonfictionsrivastav A 94 Atom

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Nonfictionsrivastav A 94 Atom

Uploaded by

Copyright:

Available Formats

ATOM A System for Building Program Analysis Customized Tools

It provides instrumenting part.

writers need tools to analyze their

The user simply

defines the tool-specific

critical pieces of code. Compiler atgorithm

instrumentation counting code.

tool like Pixie with ATOM requires only a page of

organizes the fi-

nal executable such that the application tion is directly inter-process

program and users program ATOM to the takes

dress tracing tools that generate data and instruction

system modifications. assembly code. ATUM microcode is analyzed offline. by instrumenting

traces for multiprocessors

set of tools for basic block

[1] generates address traces by mod-

also supports multiprocessor is done by the compiler. using threaded interpreter

but instrumentation Shade[5] attempts to

tive generation of traces but has to resort to instruction-level

or less detailed information insufficient information

the application at all times.

program is presented to analysis routines

is of no use to the user. too much computed informaq

large even for small programs

nevertheless makes the programs run

basic block, data cache simulators branch instruction.

in assembly language form.

and branch prediction

analyzers instrument Therefore,

of basic blocks, and basic blocks as a collection

part in instrumentation routines by program to be instru-

from basic block counting

The analysis routines do not program; if

tools, which is the cumbersome part. The

mented, the procedure calls to be made, and the argu-

is directly passed from the application interprocess

generic object moditier

Figure 1: The ATOM Process

The user provides taining

three files to ATOM: routines,

the application a file conspecand

and a file containing routines

ATOM can be applied to other architectures.

routines are given in Figure 2. starts the instrumentation

process by invoking instrumentation

Building Customized An Example

contain the Ins

procedure. the prototype be called

of each procedure from the application interpret the ar-

that will ATOM 1P rot

branch in the program

the prototypes. procedures

In our example, prototypes of four analysis

In each proceand Using these

returns the first basic block

returns the next basic block.

In this example, branch instructions. it is a conditional itive.

only in conditional in the baplimthe and check if