Nonfictionsrivastav A 94 Atom

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 10

ATOM A System for Building Program Analysis Customized Tools

Amitabh Srivastava and Alan Eustace Digital Equipment Western Research Laboratory 250 University Ave., Palo Alto, CA 94301 {amitabh, eustace}(!decwrl .pa. dec. com

Abstract
ATOM building (Analysis Tools with OM) is a single framework infrastructure for a wide range of customized program analysis tools. the common present in all codeand time-consuming details in Building a basic block tools; this is the difficult and analysis routines.

Introduction
important for underon new writor Computer architects need such perform

Program analysis tools are extremely standing program behavior. architectures. Software

It provides instrumenting part.

tools to evaluate how well programs will programs and identify scheduling

writers need tools to analyze their

The user simply

defines the tool-specific

critical pieces of code. Compiler atgorithm

instrumentation counting code.


ATOM, using

ers often use such tools to find out how well their instruction or branch prediction is performing to provide input for profile-driven optimizations. The tirst

tool like Pixie with ATOM requires only a page of


OM link-time

technology,

organizes the fi-

Over the past decade three classes of tools for different machines and applications Epoxie[14] have been developed. counting class consists of basic block basic block is executed. tools like Pixie[9],

nal executable such that the application tion is directly inter-process

program and users program ATOM to the takes

analysis routines run in the same address space. Informapassed from the application simple procedure communication analysis routines through calls instead of

and QPT[8] that count the number of times each The second class consists of adtraces.

or files on disk.

dress tracing tools that generate data and instruction

care that anatysis routines do not interfere with the programs execution, and precise information simulation ATOM OSF/1. memory pipeline or interpretation. has been implemented It is efficient recording, simulation, on the Alpha AXP under profiling, dynamic and inand has been used to build a diverse counting, instruction evaluating and data cache simulation, branch prediction, about the program is preATOM uses no sented to the analysis routines at all times.

Pixie and QPT also generate address traces and communicate trace data to analysis routines through inter-process communication. communicated but it collects ifying ulators. Tracing and analysis on the WRL Titan [3] via shared memory MPTRACE but required operating [6] is also similar to Pixie by instrumenting

system modifications. assembly code. ATUM microcode is analyzed offline. by instrumenting

traces for multiprocessors

set of tools for basic block

[1] generates address traces by mod-

and saves a compressed trace in a file that The third class of tools consists of simsupports multiprocessor simulation simulation assembly language code. g88[2] PROTEUS [4] 88000 selec-

struction scheduling. Permission to cop without fee all or part of this material is granted provided t{ at the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association of Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. SIGPLAN 94-6/94 Orlando, Florida USA Q 1994 ACM 0-89791 -662-xJ9410006..$3.5O

Tango Lite[7]

also supports multiprocessor is done by the compiler. using threaded interpreter

but instrumentation Shade[5] attempts to

simulates Motorola

techniques.

address the problem of large address traces by allowing simulation. These existing tools have several limitations.

tive generation of traces but has to resort to instruction-level

First, most tools are designed to perform a single specific type of instrumentation, tracing. Modifying typically block counting or address these tools to produce more detailed

196

or less detailed information insufficient information

is difficult.

A tool generating

the application at all times.

program is presented to analysis routines

is of no use to the user. too much computed informaq

Second, most address tracing tools compute detailed address information. However, As ATOM of compiler works on object modules, it is independent and language systems. tion renders the tool inefficient tire instruction extremely for the user. For example, a branches run

user interested in branch behavior has to sift through the entrace, even though only conditional The instruction need to be examined. into gigabytes. Third, tools based on instruction-level overheads to the processing time. been used to make the simulation system, but simulation many times slower. Fourth, tools such as Tango Lite, which instrument assembly language code, change the application addresses. Instrumenting all libraries have to be available lection mechanisms. communication, programs heap as library routines is inconvenient simulation add large and address traces are and typically

In this paper, we describe the design and implementation of ATOM. We show through a real example how to build tools. Finally, we evaluate the systems performance.

large even for small programs

Design of ATOM
is based on the observation all can be accomplished that al-

Several techniques have faster, such as in the Shade The design of ATOM appear vastly different, menting basic block counting store instruction, each conditional though tasks like basic block counting and cache simulation by instruof each a program at a few selected points. tools instrument instrument For example, each load and ATOM al-

nevertheless makes the programs run

the beginning

basic block, data cache simulators branch instruction.

in assembly language form.

and branch prediction

analyzers instrument Therefore,

Finally, most address tracing tools provide trace data colData in form of address traces is communicated to the data analysis routines through inter-process or files on disk, Both are expensive, and the large size of address traces further aggravates this problem. Using a shared buffer reduces this expense but still requires a lot of process switching efficiently and sometimes can be implemented by providing the prinonly with changes to the operating system. in building performance tools. The important systems are listed

lows a procedure call to be inserted before or after any program, procedure, basic block, or instruction. viewed as a linear collection collection instructions. Furthermore, the common the infrastructure ATOM separates the tool-specific needed in all tools. manipulation for object-code part from It provides and a highThe user infrastructure A program is of of procedures, procedures as a

of basic blocks, and basic blocks as a collection

ATOM overcomes these limitations cipal ingredient below. ATOM is a tool-building ranging can be easily built. features that distinguish

level view of the program in object-module defines the tool-specific indicating the points in the application

form.

it from previous

part in instrumentation routines by program to be instru-

mented, the procedure calls to be made, and the arguments system. A diverse set of tools to cache modeling to be passed. The user also provides code for these procedures in the analysis routines. both the application of print The user specATOM1 Figure 1. In the first step, common the users instrumentation of data is through procedure calls. proThis tool will instrument machinery is combined program with routines to build a custom tool. an application at points routines. application program so
f

from basic block counting

The analysis routines do not program; if

share any procedures or data with the application in all codesame library procedure, like print in the final executable, internally works
f,

ATOM provides the common infrastructure instrumenting user simply specifies the tool details. ATOM allows selective instrumentation. ifies the points in the application ments to be passed. The communication Information

program and the analysis routines use the there are two copies one in the application in

tools, which is the cumbersome part. The

program and the other in the analysis routines. in two steps, as shown program to be irMru-

mented, the procedure calls to be made, and the argu-

is directly passed from the application interprocess

specified by the users instrumentation cation program to build an instrumented executable. The instrumented from application that information

gram to the specified analysis routine with a procedure call instead of through mechanism. Even though the anatysis routines run in the same address space as the application, precise information about communication, files on disk, or a shared buffer with central dispatch

In the second step, this custom tool is applied to the appliexecutable is organized

program is communicated

1ExtemaUy, the user specifies: atomprog in.rt.c ard.c -o Prog.atom to produce dre instrumented program prog.atom.

197

generic object moditier

user application

applia2a3ion

analysis output

user instruouenting n

user a~:~is

application output

Figure 1: The ATOM Process

directly to procedures in the analysis routines through procedure calls. The data is passed as arguments to the handling routine in the requested form, and does not have to go through a central dispatch mechanism. To reduce the communication plication address space. ATOM partitions places the application More importantly, with the information plication program such that they do not interfere to a procedure call, the apthe symbol name space and with each others execution. program and the analysis routines run in the same and analysis routines in the executable

The user provides taining

three files to ATOM: routines,

the application a file conspecand

program object module that is to be instrumented, the instrumentation the analysis routines. ify where the application The instrumentation

and a file containing routines

program is to be instrumented

what procedure calls are to be made. The user provides code for these procedures in the analysis routines. sections show how to write the instrumentation routines for our example tool. The next two and analysis

the analysis routine is always presented (data and text addresses) about the apas if it was executing uninstrumented. Our branch counting tool needs to examine all the conditional branches in the program. We traverse the program a procein the basic block is a conditional The instrumentation the dure at a time, and examine each basic block in the procedure. If the last instruction branch, we instrument ATOM
Instrument

Defhing

Instrumentation

Routines

Section 4 describes how the system guarantees the precise information. ATOM, piler modules. built using OM[l Since OM 1], is independent to work of any comwith different and language system because it operates on objectis designed

the instruction.

architectures,

ATOM can be applied to other architectures.

routines are given in Figure 2. starts the instrumentation


prOCedure3. t rument by defining routine enables

process by invoking instrumentation


The instrumentation

All

modules

Building Customized An Example

Tools:

contain the Ins


process begins in the analysis

procedure. the prototype be called

of each procedure from the application interpret the ar-

that will ATOM 1P rot

In this section we show how to build a simple tool that counts how many times each conditional are written to a file. zoM ~a~ inltiauy implemented on the DECStations running under ULTRIX and was ported to Alpha AXP running under OSF/1. UL~IX, DECStatlon and Alpha AXP are trademarks of Digital Equipment Corporation.

program. guments.

This

to correctly o primitive

branch in the program

is

The AddCal

is used to define
PrintBranch,

taken and how many times it is not taken. The final results

the prototypes. procedures


and Cl o seFile

In our example, prototypes of four analysis


CondBranch, Besides are defined. the standard C data

OpenFi.le,

s~e Instmment procedure takes argc and argv as arguments which can be optionally passedfrom the atom command line.

198

verses the program dure, Get First Instrument(int { Proc *p; Block *b; Inst *insq int nbranch = O; AddCaBProto(OpenFile(int)); AddCaBProto(CondBranch(int, AddCalWroto(PrintBranch(int, AddCallProto(CloseFileo); for(p=GetFirstProco; p!=NULL;p=GetNextProc(p)){ for(b=GetFirstBlock(p) ;b!=NULL;b=GetNextBlock(b)){ inst = GetLastInst(b); if(IsInstType(inst, InstTypeCondBr)){ AddCallInst(inst, InstBefore, CondBrattch, nbranch,BrCondValue); AddCallProgram(ProgramAfter, nbranch++; PrintBranch, nbranch, InstPC(inst)); VALUE)); long)); iargc, char **iargv)
GetNext Block

a procedure

at a time.

In each proceand Using these

Block

returns the first basic block

returns the next basic block.

primitives procedure.

the inner loop traverses all the basic blocks of a we are interested
I nst

In this example, branch instructions. it is a conditional itive.


AddCall CondBranch

only in conditional in the baplimthe and check if


Type

We find the last instruction primitive branch using the Is I nst instructions are ignored.

sic block using the Get Last All


Inst

other

With

primitive,
ore

a call to the analysis procedure

is added at the conditional branch instruction.


Bef

The Inst

argument specifies that the call is to be is executed. The two arguments are the linear number of the vatue specifies

made before the instruction to be passed to CondBranch branch and its condition The AddCallP
(programBef ore)

value. The condition


k

whether the branch will be taken.


rogram

used to insert calls before program starts executprogram

the application

ing aud after (P rog ramAf finishes executing. tively.

t e r) the application

These calls are generally used to initialle

) } }
AddCallProgram(ProgramBefore, AddCallProgram(ProgramAfter, } OpenFile, CloseFile); nbranch);

ize analysis routine data and print results at the end, respecA catl to OpenFi before the application program starts executing creates the branch statistics array and opens the output file. We insert calls for each branch to print its count at the end. PC (program counter) and its accumulated branch after the application Finally, the CloseFile

Note that these calls are made only once for each conditional program has finished executing. procedure is executed which closes

the output file. If more than one procedure is to be called at a point, the calls are made in the order in which they were Figure 2: Instrumentation Routines: Branch Counting Tool added by the instrumentation routines.

types as arguments, as REGV and VALUE. the run-time

AIXIM

supports additional

types such

Defining

Analysis

Routines

If the argument type is REGV, the acregister are passed. may


Ef fAddrValue

The anatysis routines contain code and data for all procedures needed to analyze information tion program. in the instrumentation that is passed from the applicaThese include procedures that were specified routines but may contain other proce-

tual argument is not an integer but a register number, and contents of the specified argument or BrCondValue.
BrCondValue

For the VALUE be Ef fAddrValue

type, the actual argument

dures that these procedures may call. The analysis routines do not share the code for any procedure with the application program, including Code
PrintBranch,

passes the memory store instructions.

address being referenced

by load and

k used for conditional

library routines.
OpenFi le, CondBranch,

branches and passes zero if the run-time evatttates to a false and a non-zero evaluates to true.
VALUE. CondBranch

branch condition type

for

procedures

value if the condition uses the argument

and CloseFile routines uses its argument

whose prototypes containing

were 3.

defined in instrumentation The OpenFile

are given in Figure

the number It also routine in-

ATOM atlows the user to traverse the whole program by modeling a program as consisting of a sequence of proceGet First P roc

of branches to atlocate the branch statistics array. opens a file to print results. The CondBranch

dures, basic blocks and instructions. returns the next procedure.

re-

turns the first procedure in the program, and Get Next The outermost
for

P roc

dArl~t,fter me~od would be to store the PC of each branch in ~ arraY


and pass the array at the end to be printed along wittr ttre counts. ATOM allows passing of arrays as arguments.

loop tra-

199

4
#include <stdio.h> File Wile struct BranchInfo{ long taken; long notTaken; } *bstats; void OpenFile(int n){ BranchInfo));

Implementation
is built using OM[l

of ATOM
1], a link-time code modificaof object files builds a

ATOM

tion system. and libraries symbolic

OM takes as input a collection that make up a complete representation,

program,

intermediate

applies instrumentarepresenrou-

tion and optimizations[12, ATOM starts by linking

13] to the intermediate

tation, and finally outputs an executable. the users instrumentation tines with OM using the standard linker to produce a custom tool. This tool is given as input the application the analysis routines. symbolic representations analysis routines. provide of the application representation requested. program and to build interIt uses OMS infrastructure

bstats = (strttctBrrmchInfo *) malloc (n * sizeof(struct file = fopen(btaken.out, fprintf(file, } void CondBranch(int if (taken) bstats[n] taken++; else bstats[n] .notTaken++; n, long taken){ w);

program and the of the program to

PC \t Taken \t Not Taken \n);

The traversal and query primitives

face with the intermediate the information intermediate in [11]. conveniently representation

More details of OMS so it can be exe-

and how it is built are described

We extended the OMS representation

annotated for procedure call insertions. pass builds the instrumented representation. This pass is

OMS code generation modified

}
void PrintBranch(int n, long pc){ fprintf(file, OX%lX \t %d \t %d\n, PC, bstats[n].taken, } void CloseFileo{ fclose(file); } bstats[n].notTaken);

cutable from the intermediate

to organize the data and text sections in a specific is presented to the analysis routines at

order because ATOM has to ensure that precise information about the application all times. In this section, we first describe the extensions to the intermediate representation and the insertion of procedure calls. we describe how Next, we discuss how we minimize ATOM organizes the final executable. the number of registers

that need to be saved and restored. Finally,

Inserting

Procedure

Calls
representation of OM to have a

We extended the intermediate Figure 3: Analysis Routines: Branch Counting Tool

slot for actions that may be performed before or after the entity is executed. The entity maybe a procedure, basic block,
ittSttWCtiOtt

crements the branch taken or branch not taken counters for the specified branch by examining gument.
Print Branch prints

or an edges. The AddCal


representation the call when describing and indicating

1 primitives

annotate to the to

the condition

value arthe

the intermediate action slot

by adding

a structure

the PC of the branch,

to be inserted,

arguments

number of times the branch is taken and number of times it is not taken. CloseFi.le closes the output file.

be passed,

the call is to be made.

Cur-

rently, adding calls to edges is not implemented.


AddCa 11P rot o primitive, a linked calls and ATOM
VtXifi(X

The protothat. The

type of the procedure must already have been added with the
action slot contains as multiple they list of all such actions at a point. to be perThe order will be

Collecting

Program

Statistics
ATOM is given as input the format, The

To find the branch statistics, fully linked application the instrumentation instrumented

formed in which

can be added is maintained

program

in object-module executable.

are added

so that calls

routines, and the analysis routines. program

made in the order they were specified.

output is the instrumented program

When this

After the intermediate

representation

has been fully annoThis process is easy

is executed, the branch statistics are

tated, the procedure calls are inserted.

produced as a side effect of the normal program execution.

5An edge ~onnect~two basic blocks and representsthe transfer of control between them.

200

because all insertion is done on OMS intermediate tation and no address tixups are needed. ATOM, does not steal any registers from the application ters that may be modified

represenlike QPT, programb.

here

where to save these caller-save

registers, and which code, where the call is We The

caller-save registers to save. Saving registers in the application being inserted, is not a good idea if there are more than a few registers to be saved, as it may cause code explosion. create a wrapper routine for each analysis procedure. and makes the call to the analysis routine. routine. Unfortunately,

It allocates space on the stack before the call, saves regisduring the call, restores the saved registers after the call and deallocates the stack space. This enables a number of mechanisms such as signals, setjmp and vfork to work correctly without needing any special attention. The calling conventions are followed in setting up calls to analysis routines. The first six arguments are passed in

wrapper routine saves and restores the necessary registers, The application in calls to program now calls the wrapper routine instead of the analysis this creates an indirection analysis routines. However, this has the advantage that it

registers and the rest are passed on the stack. The number of instructions needed to setup an argument depends on the type of the argument. For example, a 16-bit integer constant can a 32-bit constant in two instructions, and so on. Passing subroutine is within branch range, be built in 1 instruction,

makes no changes to the analysis code so it works well with a debugger like dbx. This is the default mechanism. ATOM provides an additional routines. facility in which the saves and restores of caller-save registers are added to the analysis No wrapper routines are created in this case. The extra space is allocated in the anrttysis routines stack frame. This requires bumping the stack frame and fixing stack references in the analysis routines as needed. This is more work but is more efficient level debugging. optimization as analysis routines are called directly. is available as a higher Since this modifies the analysis routines, it hampers sourceThis mechanism option. the analysis routines. The data flow

a 64-bit program counter in 3 instructions contents of a register takes 1 instruction. To make the call, a pc-relative instruction

is used if the analysis routine

otherwise, the value of the procedure is loaded in a register and a js r instruction is used for the procedure call. Therewhen a call is made This register the proceturn address register is always modified

so we always save the return address register. dures address for the js r instruction.

becomes a scratch registeq it is used for holding

The number of registers that need to be saved and restored is reduced by examining summary information with inall regisof the analysis routines determines all Only these registers need to be the

Reducing

Procedure

Call Overhead

the registers that may be modified when the control reaches a The application terprocedural not follow program may have been compiled conventions 8. Therefore, particular analysis procedure. number of different routines. Moreover, if an analysis routine contains procedure calls and delay the saves of other to other analysis routines, we save only the registers directly used in this analysis routine none of theprocedttrecalls path the program normally takes. registers to procedures that maybe called. We only do this if occur in a loop. Thus we distribute This helps analysis routines involves printing that the cost of saving registers; the overhead now depends on the return if their argument is valid but otherwise raise art error optimization and may contain routines that do in the call to the analysis routines as they have to allow The calling convensaved and restored. We use register renaming to minimize

the catling

caller-save registers used in the analysis

ters that may be modified

need to be saved, The analysis routines, on the other hand, have to follow the calling conventions arbitrary procedures to be linked in.

tions define some registers as callee-save registers that are preserved across procedure calls, and others as caller-save registers that are not preserved across procedure calls. All the caller-save registers need to be saved before the call to the analysis routine and restored on return from the analysis routines. This is necessary to maintain program. the execution state of the application automatically The callee-save registers would

an error. Raising an error typically

message and touching a lot more registers. For such routines, the common case of a vatid argument has low overhead as few registers are saved. This optimization current implementation. The number of registers that need to be saved may be further reduced by computing live registers in the applilive variable Only cation program. anatysis[l OM can do interprocehral is available in the

be saved and restored in analysis routines if Iivo issues need to be addressed

they are used by them.

Gpixie ~te~s three registers away from the application program for its own use. Pixie maintains three memory locations that have the values of these three registers, and replaces the use of dtese registers by uses of the memory locations. 7~pha[10] ha5 a signed 21-bit pc-relative subroutine branch ~stm~tion. The application may contain hand-crafted assembly language code that often does not follow standard conventions. All)M can handle such programs. $AnalYsis routines are analogous to standard library routines hat have to follow calling conventions so they can be linked wirb progrants.

1] and compute all registers live at a point. execution. Optimization

the live registers need to be saved and restored to preserve the state of the program inlining such as further reduce the overhead of procedure calls at the

201

A
Stack I
textstart ~

cad-only data ~xcedion data program text

a
t
instrumented program text Program Text Addresses Changed \ analysis gp

new datastart

old datastart /\ program data initialized program data uninitialized

n
Heap Figure 4: Memory layout have not The text addresses. the addresses supplied. program text However,

program data initialized program data uninitialized I

I ~a::ram Addresses Unchanged

t Uninstrumented Proaram LayZut Instrumented Program Layout

cost of increasing the code size. These refinements been added to the current system.

same as before. This is shown in Figure 4. addresses of the application program have changed because of the addition of instrumented code. How-

Keeping

Pristine

Behavior

ever, we statically

know the map from the new to original program, the original PC is simply

If an analysis routine asks for the PC of an inThis works well for most of the tools. if the address of a procedure in the application If the

One major goal of ATOM is to avoid perturbing in the application program. Therefore, are put in the space between the application and data segments. Analysis cedures or data with the application data and code for all procedures that they may need. The data sections unchanged. of the application and uninitialized program;

struction in the application

the analysis routines programs

routines do not share any prothey contain routines are not including library program

is taken, its address may exist in a register. is not the original text address.

analysis routine asks for the contents of such a register, the value supplied not implemented We have to return in our current system the ability

moved, so the data addresses in the application The initialized

program are programs data

original text address in such cases. Analysis routines may dynamically

allocate data on heap. program do not 1 routines, one in two

data of analysis

Since analysis routines and the application share any procedures, there are two sbrkl the application

routines is put in the space between the application must be located before all uninitialized data by initializing it with zero.

text and data segments. In an executable, all initialized

program and the other in the analysis routines

data, so the uninitial-

that atlocate space on the same heap. ATOM provides options for tools that must allocate dynamic memory.

ized data of the analysis routines is converted to initialized The start of the stack and heap10 are unchanged,
lo~ the MPha Am

so all stack and heap addresses are

Mder OSF/1 stack begins at start of text S9gItIetIt

and grows towards low memory, and heap starts at end of uninitialized data and grows towards high memory. 11sbrk routines ~ocate more data spacefor tie program.

202

The first method links the variables of the two sbrks, so both allocate space on the same heap without on each other. stepping This Each starts where the other left off.

to the uninstrumented tool.

programs

execution

time for each

Figure 6 shows the ratios for the SPEC92 programs.

The procedure call overhead is dependent on the code in the analysis routines, and the number and type of arguments that are passed. ATOM uses the data flow summary information along with register renaming to save. The contribution instrumented program to find the necessary registers time is also dependent on of procedure call overhead in the

method is useful for tools that are not concerned with the heap addresses being same as in the uninstrumented of the program. also sufficient Such tools include version basic block counting, that require

branch analysis, inline analysis and so on. This method is for tools such as cache modeling This is the default behavior. precise heap addresses but do not allocate dynamic memory in analysis routines. The second method is for tools that allocate dynamic memory and also require heap addresses to be same as in the tminstrumented version of the application application between the application program. To keep the heap addresses as before, the heap is partitioned and the analysis routines. The appli-

execution

the number of times the procedure calls take place. The inline tool instruments only procedure call sites; the total overhead is much less than the cache tool, which do when the control information communication instruments each memory reference. The amount of work the analysis routines reaches them is totally to compute. dependent on Although the the user is trying

overhead is small, we expect it to decrease live register analysis and inlining. Alpha AXP 3000

cation heap starts at the same address but the analysis heap is now made to start at a higher address. The user supplies the offset by which the start of analysis heap is changed. ATOM modifies the sbrk in analysis routines to start at the new address; the two sbrks application are not linked this time. The disad-

further when we implement

All measurements were done on Digital Model 400 with 128 Mb memory.

vantage of this method is that there is no runtime check if the heap grows and enters into the analysis heap.

Status
runs on Alpha AXP with The system Work is in

ATOM is built using OM and currently Fortrart, C++ and two different currently

under OSF/1. It has been used with programs compiled

Performance
how long ATOM takes to instrument a program, and programs execution time compares to 20 SPEC92 progr~s with programs execution time.

C compilers. modules.

works on non-shared library

To find how well ATOM performs, two measurements are of interest how the instrumented the uninstrumented

progress for adding support for shared libraries. ATOM projects. has been used both in hardware Besides the SPEC92 benchmarks, real applications and software it has successand at a Few

fully instrumented few universities12.

of up to 96 Megabytes. inside Digital

We used ATOM to instrument

The system is being used extensively

11 tools, The tools are briefly described in Figure 5. The time taken to instrument a program is the sum of the ATOMs processing time and the time taken by the users instrumentation routines. different The time taken by a tool varies as each tool does amounts of processing. For example, the malfoc

Our focus until now has mainly been on functionality. optimization overhead. Currently, reduction

have been added to reduce the procedure call in register saves has been information of We live register analysis system. data flow summary

obtained by computing along with inlining

tool simply asks for the malloc procedure and instruments i~ the processing time is very small. CPU pipeline scheduling The pipe tool does static for each basic block at instrttmen-

analysis routines. We plan to implement are just starting to instrument

to further improve the performance. the operating

tation time and takes more time to instrument an application. The time taken to instrument 20 SPEC92 programs with each tool is also shown in Figure 5. The execution time of the instrumented program is the sum of the execution time of the uninstrumented analysis routines. application

program, the procedure call setup, and the time spent in the This total time represents the time needed and do not include those numbers statistics. The time spent in analysis time required by
12A~M is available to external users. If you would like a copY. Please contact the authors.

by the user to get the final answers. Many systems process the collected data offline as part of data collecting other systems. We compared each instrumented programs execution time

routines is analogous to the postprocessing

20 3

Analysis Tool branch


cache dyninst gprof inline io malloc pipe prof Syscall unalign

Tool Description
prediction using 2-bit history table counts

Time to instrument SPEC92 suite


110.46 WCS 120.58 SCXX 126.31 WCS 113.24 146.50
SeCS S(XS

Average Time
5.52 WCS 6.03 WX 6.32 St%X 5.66 S(3(3 7.33 Sees

model direct mapped 8k byte cache computes dynamic instruction call graph based profiling finds potential inpui/output inlining summary tool tool

call sites

121.60 WCS 97.93 Sees 257.48 SCXX 122.53 WCS 120.53 St?CS 135.61 S(?CS

6.08 sees
4.90 sees 12.87 WCS 6.13 WCS 6.03 WCS 6.78 WCS

histogam of dynamic memory pipeline stall tool Instruction profiling tool system call summary tool unalign access tool

Figure 5: Time taken by ATOM to instrument

20 SPEC92 benchmark programs

Analysis Tool
branch cache dyninst gprof inline io malloc pipe prof Syseall unalign

Instrumentation points
each conditional eaeh basic block each procedure/each basic block each call site before/after write procedure before/after malloc procedure basic block each basic block each procedure/each before/after each basic block branch each memory reference

Number of Arguments 3 1 3 2 1 4 1 2 2 2 3

Time taken by Instrumented Program


3.03X 11.84x 2.91x 2.70x 1.03X 1.OIX 1.02X 1.80x 2.33x 1.OIX 2.93x

each system call

Figure 6: Execution

time of instrumented

SPEC92 Programs as compared to uninstrumented

SPEC92 programs

Conclusion
modification details from tool tools to harda high-level view of the program,

Acknowledgements
Great many people have helped us bring ATOM to its current form. Jim Keller, Mike Burrows, Roger Cruz, John Edmondson, Mike McCallig, of bugs and instability Dirk Meyer, Richard Swan and Mike process. Uhler were ottrtirst users and they braved through amine field during the early development Jeremy Dion, Ramsey Haddad, Russel Kao, Greg Lueck and Louis Monier built popular tools with ATOM. Many people, too many to name, gave comments, reported bugs, and provided encouragement. Linda Wilson atl. Roger Cruz, Jeremy Dion, Ramsey PLDI reviewers gave useful Haddad, Russell Kao, Jeff Mogul, Louis Monier, David Wall, and anonymous comments on the earlier drafts of this paper. Our thanks to

By separating object-module details and by presenting ATOM has transferred

the power of building

ware and software designers. only on what information

A tool designer concentrates

is to be collected and how to proATOMs fast commuthat there proof the

cess it. Tools can be built with few pages of code and they compute only what the user asks for.
nication between application traces is no need to record and analysis means

as all data is immediately in one execution

cessed, and final results instrumented programs. of tools ATOM program. It has already to solve will

are computed Thus,

one can process

long-running a wide variety We hope for studies

been used to build and software

hardware

problems. platform

continue

to be an effective design.

in software

and architectural

204

References
[1]

l(l),

pp 1-18, March 1993. Also available as WRL

Research Report 92/6, December 1992.

Anant Agarwal, Richard Sites, and Mark Horwitz,


ATUM A New Technique for Capturing Address [12] Amitabh Srivastava and David W. Wall. LinkTraces Using Microcode.

Proceedings of the 13th

Time Optimization bit Architecture.

of Address Calculation

on a 64-

International

Symposium on Computer Architec-

Proceedings of the SIGPLAN94 Language Design


1994. in

ture, June 1986.


[2] Robert Bedichek. Some Efficient Architectures

Conference on Programming

and Implementation, to appear. Also available as


WRL Research Report 94/1, February [13] Amitabh Lazana, and Srivastava. Unreachable Simulation Techniques. Winter 1990 USENIX Conprocedures object-oriented [3] Anita chines: Borg, R.E. Kessler, Georgia David Wall. Long Address Traces from RISC MaGeneration and Analysis, Proceedings of the 17th Annual Symposium on ComputerArchitecprogramming,

ference, January 1990.

ACM LOPLAS, Vol

1, #4, pp 355-364, December 1992. Also available as WRL Research Report 93/4, August 1993. [14] David W. Wall. Systems for late code modification. In Robert Giegerich Code Generation and Susan L. Graham, eds, - Concepts, Tals, Techniques,

ture, May 1990, also available as WRL Reseasch


Report 89/14, Sep 1989. [4] Eric A. Brewer, Chrysanthos N. Dellarocas, Adrian Colbrook, and William E. Weihl. PROTEUS: A High-Performance tor. MIT/LCS/l_R-5 Parallel-Architecture 16, MIT, 1991. David Keppel, Shade A Fast for Execution Profiling. of Simula-

pp. 275-293, Springer-Verlag,

1992. Also available

as WRL Research Report 92/3, May 1992.

[51 RobertF. Cmelikand


Instruction-Set Washington. [6] Susan J. Eggers, Koldinger,

Simulator

Technical Report UWCSE 93-06-06, University

David

R.

Keppel,

Eric

J.

and Henry M. Levy. T&hniques

for EfMulti-

ficient Inline Tracing on a Shared-Memory

processor. SIGMETRICS Conference on A4eastwe-

ment and Modeling of Computer Systems, VOI8, no


1, May 1990. [7] Stephen R. Goldschmidt and John L. Hennessy, Simulations Computer of MulSystems

The Accuracy of Trace-Driven tiprocessors. CSL-TR-92-546, Laboratory,

Stanford University,

September 1992. ex-

[8] James R. Larus and Thomas Ball. Rewriting

ecutable files to measure program behavior. Soft-

ware, Practice and Experience, vol 24, no. 2, pp


197-218, February 1994. [9] MIPS Computer Systems, Inc. Assembly Language

Programmers Guide, 1986.


[10] Richard L, Sites, ed. Alpha Architecture Reference
Manual

Digital Press, 1992. Srivastava and David W. Wall. A PracCode Optimization

[11] Amitabh

tical System for Intermodule at Link-Time.

Journal of Programming Language,

205

You might also like