
Efficiently Computing Static Single Assignment Form and the Control Dependence Graph

RON CYTRON, JEANNE FERRANTE, BARRY K. ROSEN, and MARK N. WEGMAN, IBM Research Division, and F. KENNETH ZADECK, Brown University

In optimizing compilers, data structure choices directly influence the power and efficiency of practical program optimization. A poor choice of data structure can inhibit optimization or slow compilation to the point that advanced optimization features become undesirable. Recently, static single assignment form and the control dependence graph have been proposed to represent data flow and control flow properties of programs. Each of these previously unrelated techniques lends efficiency and power to a useful class of program optimizations. Although both of these structures are attractive, the difficulty of their construction and their potential size have discouraged their use. We present new algorithms that efficiently compute these data structures for arbitrary control flow graphs. The algorithms use dominance frontiers, a new concept that may have other applications. We also give analytical and experimental evidence that all of these data structures are usually linear in the size of the original program. This paper thus presents strong evidence that these structures can be of practical use in optimization.

Categories and Subject Descriptors: D.3.3 [Programming Languages]: Language Constructs - control structures; data types and structures; procedures, functions and subroutines; D.3.4 [Programming Languages]: Processors - compilers; optimization; I.1.2 [Algebraic Manipulation]: Algorithms - analysis of algorithms; I.2.2 [Artificial Intelligence]: Automatic Programming - program transformation

A preliminary version of this paper, "An Efficient Method of Computing Static Single Assignment Form," appeared in the Conference Record of the 16th ACM Symposium on Principles of Programming Languages (Jan. 1989). F. K. Zadeck's work on this paper was partially supported by the IBM Corporation, the Office of Naval Research, and the Defense Advanced Research Projects Agency under contract N00014-83K-0146 and ARPA order 6320, Amendment 1. Authors' addresses: R. Cytron, J. Ferrante, and M. N. Wegman, Computer Sciences Department, IBM Research Division, P.O. Box 704, Yorktown Heights, NY 10598; B. K. Rosen, Mathematical Sciences Department, IBM Research Division, P.O. Box 218, Yorktown Heights, NY 10598; F. K. Zadeck, Computer Science Department, P.O. Box 1910, Brown University, Providence, RI 02912. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. (c) 1991 ACM 0164-0925/91/1000-0451 $01.50
ACM Transactions on Programming Languages and Systems, Vol. 13, No. 4, October 1991, Pages 451-490.

General Terms: Algorithms, Languages

Additional Key Words and Phrases: Control dependence, control flow graph, def-use chain, dominator, optimizing compilers

1. INTRODUCTION

In optimizing compilers, data structure choices directly influence the power and efficiency of practical program optimization. A poor choice of data structure can inhibit optimization or slow compilation to the point where advanced optimization features become undesirable. Recently, static single assignment (SSA) form [5, 43] and the control dependence graph [24] have been proposed to represent data flow and control flow properties of programs. Each of these previously unrelated techniques lends efficiency and power to a useful class of program optimizations. Although both of these structures are attractive, the difficulty of their construction and their potential size have discouraged their use [4]. We present new algorithms that efficiently compute these data structures for arbitrary control flow graphs. The algorithms use dominance frontiers, a new concept that may have other applications.

We also give analytical and experimental evidence that the sum of the sizes of all dominance frontiers is usually linear in the size of the original program. This paper thus presents strong evidence that SSA form and control dependence can be of practical use in optimization.

Figure 1 illustrates the role of SSA form in a compiler. The intermediate code is put into SSA form, optimized in various ways, and then translated back out of SSA form. Optimizations that can benefit from using SSA form include code motion [22] and elimination of partial redundancies [43], as well as the constant propagation discussed later in this section. Variants of SSA form have been used for detecting program equivalence [5, 52] and for increasing parallelism in imperative programs [21].

The representation of simple data flow information (def-use chains) may be made more compact through SSA form. If a variable has D definitions and U uses, then there can be D × U def-use chains. When similar information is encoded in SSA form, there can be at most E def-use chains, where E is the number of edges in the control flow graph [40]. Moreover, the def-use information encoded in SSA form can be updated easily when optimizations are applied. This is important for a constant propagation algorithm that deletes branches to code proven at compile time to be unexecutable [50]. Specifically, the def-use information is just a list, for each variable, of the places in the program text that use the variable. Every use of V is indeed a use of the value provided by the (unique) assignment to V.

To see the intuition behind SSA form, it is helpful to begin with straight-line code. Each assignment to a variable is given a unique name (shown as a subscript in Figure 2), and all of the uses reached by that assignment are renamed to match the assignment's new name. Most programs, however, have branch and join nodes. At the join nodes, we add a special form of assignment called a φ-function.
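In modern terms, the renaming step for straight-line code can be sketched as a single pass that numbers each assignment and points every use at the latest version. This is only an illustration in Python; the function name and the statement encoding are ours, not the paper's.

```python
# Sketch: rename straight-line code into single assignment form.
# A statement is (target, [operand variables]); constants are omitted.
def rename_straightline(stmts):
    count = {}      # variable -> number of versions created so far
    current = {}    # variable -> name of the current version
    out = []
    for target, operands in stmts:
        # Uses are renamed to the most recent version of each operand.
        ops = [current.get(v, v + "0") for v in operands]
        # The assignment itself creates a fresh version of its target.
        count[target] = count.get(target, 0) + 1
        name = "%s%d" % (target, count[target])
        current[target] = name
        out.append((name, ops))
    return out

# The straight-line example of Figure 2: V <- 4; <- V + 5; V <- 6; <- V + 7
# ("t" stands for the anonymous targets of the two uses).
prog = [("V", []), ("t", ["V"]), ("V", []), ("t", ["V"])]
```

Running `rename_straightline(prog)` yields versions V1, t1, V2, t2, with each use reached by exactly one assignment, as in Figure 2.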

[Figure 1: the original intermediate code is translated into SSA form P0, transformed through optimized SSA programs P1, P2, ..., Pn, and translated back to optimized intermediate code.]

Fig. 1. Vertical arrows represent translation to/from static single assignment form. Horizontal arrows represent optimizations.

V ← 4            V1 ← 4
  ← V + 5          ← V1 + 5
V ← 6            V2 ← 6
  ← V + 7          ← V2 + 7

Fig. 2. Straight-line code and its single assignment version.

if P                            if P
  then V ← 4                      then V1 ← 4
  else V ← 6                      else V2 ← 6
                                V3 ← φ(V1, V2)
/* Use V several times. */      /* Use V3 several times. */

Fig. 3. if-then-else and its single assignment version.

In Figure 3, the operands to the φ-function indicate which assignments to V reach the join point. Subsequent uses of V become uses of V3. The old variable V is thus replaced by new variables V1, V2, V3, ..., and each use of a variable Vi is reached by just one assignment to Vi. Indeed, there is only one assignment to Vi in the entire program. This simplifies the record keeping for several optimizations. An especially clear example is constant propagation based on SSA form [50]; the next subsection sketches this application.

1.1 Constant Propagation

Figure 4 is a more elaborate version of Figure 3. The else branch includes a test of Q, and propagating the constant true to this use of Q tells a compiler that the else branch never falls through to the join point. If Q is the constant true, then all uses of V after the join point are really uses of the constant 4. Such possibilities can be taken into account without SSA form, but the processing is either costly [48] or difficult to understand [49]. With SSA form, the algorithm is fast and simple. Initially, the algorithm assumes that each edge is unexecutable (i.e., never followed at run time) and that each variable is constant with an as-yet unknown value (denoted ⊤). Worklists are initialized appropriately, and the assumptions are corrected until they stabilize. Suppose the algorithm finds that variable P is not constant (denoted ⊥) and, hence, that either branch of the outer conditional may be taken.

if P                        if P
  then do                     then do
    V ← 4                       V1 ← 4
  end                         end
  else do                     else do
    V ← 6                       V2 ← 6
    if Q then return            if Q then return
  end                         end
... ← V + 5                 V3 ← φ(V1, V2)
                            ... ← V3 + 5

Fig. 4. Example of constant propagation.

The outedges of the test of P are marked executable, and the statements they reach are processed. When V1 ← 4 is processed, the assumption about V1 is changed from ⊤ to 4, and all uses of V1 are notified that they are uses of 4. (Each use of V1 is indeed a use of this value, thanks to the single assignment property.) In particular, the φ-function combines 4 with V2. The second operand of the φ-function, however, is associated with the inedge of the join point that corresponds to falling through the else branch. So long as this edge is considered unexecutable, the algorithm uses ⊤ for the second operand, no matter what is currently assumed about V2. Combining 4 with ⊤ yields 4, so uses of V3 are still tentatively assumed to be uses of 4. Eventually, the assumption about Q may change from ⊤ to a known constant or to ⊥. A change to either false or ⊥ would lead to the discovery that the second inedge of the join point can be followed, and then the 6 at V2 combines with the 4 at V1 to yield ⊥ at V3. A change to true would have no effect on the assumption at V3. Traditional constant propagation algorithms, on the other hand, would see that assignments of two different constants to V seem to reach all uses after the join point. They would thus decide that V is not constant at these uses.

The algorithm just sketched is linear in the size of the SSA program it sees [50], but this size might be nonlinear in the size of the original program. In particular, it would be safe but inefficient to place a φ-function for every variable at every join point. This paper shows how to obtain SSA form efficiently. When φ-functions are placed carefully, nonlinear behavior is still possible but is unlikely in practice. The size of the SSA program is typically linear in the size of the original program; the time to do the work is essentially linear in the SSA size.
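The three-level value lattice used by this algorithm (⊤ for as-yet unknown, a constant, ⊥ for not constant) and the treatment of φ-operands on unexecutable edges can be sketched as follows; the helper names are ours, not the paper's.

```python
# Sketch of the constant propagation lattice: TOP (unknown value),
# a constant, or BOTTOM (not constant).
TOP, BOTTOM = "T", "_|_"

def meet(a, b):
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOTTOM

def phi_value(operands, executable):
    # Operand j flows in along inedge j; operands on edges still
    # assumed unexecutable contribute nothing (i.e., TOP).
    val = TOP
    for v, ex in zip(operands, executable):
        if ex:
            val = meet(val, v)
    return val
```

For V3 ← φ(V1, V2) of Figure 3 with V1 = 4: while the else inedge is unexecutable, `phi_value([4, 6], [True, False])` is still 4; once both edges are executable, 4 meets 6 and V3 drops to ⊥.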

1.2 Where to Place φ-Functions

At first glance, careful placement might seem to require the enumeration of pairs of assignment statements for each variable. Checking whether there are two assignments to V that reach a common point might seem to be intrinsically nonlinear. In fact, however, it is enough to look at the dominance frontier of each node in the control flow graph. Leaving the technicalities to later sections, we sketch the method here.

Suppose that a variable V has just one assignment in the original program, so that any use of V will be either a use of the value V0 at entry to the program or a use of the value V1 from the most recent execution of the assignment to V. Let X be the basic block of code that assigns to V, so X will determine the value of V when control flows along any edge X → Y. When entered along X → Y, the code in Y will always see V1 and be unaffected by V0. If Y ≠ X, but all paths to Y must still go through X (in which case X is said to strictly dominate Y), then the code in Y will see V1. Indeed, any node strictly dominated by X will always see V1, no matter how far from X it may be. Eventually, however, control may be able to reach a node Z not strictly dominated by X. Suppose Z is the first such node on a path, so that Z sees V1 along one inedge but may see V0 along another inedge. Then Z is said to be in the dominance frontier of X and is clearly in need of a φ-function for V. In general, no matter how many assignments to V may appear in the original program and no matter how complex the control flow may be, we can place φ-functions for V by finding the dominance frontier of every node that assigns to V, then the dominance frontier of every node where a φ-function has already been placed, and so on.

The same concept of dominance frontiers used for computing SSA form can also be used to compute control dependences [20, 24], which identify the branch conditions affecting statement execution. Informally, a statement is control dependent on a branch if one edge from the branch definitely causes that statement to execute while another edge can cause the statement to be skipped. Such information is vital for detection of parallelism [2], program optimization, and program analysis [28].
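The placement rule just sketched, taking the dominance frontier of every node that assigns to V, then of every node that received a φ-function, and so on, is a simple worklist computation once the dominance frontiers DF(X) are available. A sketch, assuming DF has been precomputed (the details appear in later sections):

```python
# Sketch: place phi-functions for one variable V in the iterated
# dominance frontier of the nodes that assign to V.
def place_phis(assign_nodes, DF):
    work = list(assign_nodes)
    has_phi = set()
    while work:
        x = work.pop()
        for z in DF.get(x, ()):
            if z not in has_phi:
                has_phi.add(z)
                work.append(z)  # a new phi is itself an assignment to V
    return has_phi

# Diamond CFG: A branches to B and C, which join at D, so
# DF(B) = DF(C) = {D}. Assignments to V in B and C put a phi at D.
DF = {"B": {"D"}, "C": {"D"}}
```

Each node enters the worklist at most once, so the work is proportional to the total size of the dominance frontiers visited.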

1.3 Outline of the Rest of the Paper

Section 2 reviews the representation of control flow by a directed graph. Section 3 explains SSA form and sketches how to construct it. This section also considers variants of SSA form as defined here. Our algorithm can be adjusted to deal with these variants. Section 4 reviews the dominator tree concept and formalizes dominance frontiers. Then we show how to compute SSA form (Section 5) and the control dependence graph (Section 6) efficiently. Section 7 explains how to translate out of SSA form. Section 8 shows that our algorithms are linear in the size of programs restricted to certain control structures. We also give evidence of general linear behavior by reporting on experiments with FORTRAN programs. Section 9 summarizes the algorithms and time bounds, compares our technique with other techniques, and presents some conclusions.

2. CONTROL FLOW GRAPHS

The statements of a program are organized into (not necessarily maximal) basic blocks, where program flow enters a basic block at its first statement and leaves the basic block at its last statement [1, 36]. Basic blocks are indicated by the column of numbers in parentheses in Figure 5. A control flow graph is a directed graph whose nodes are the basic blocks of a program and two additional nodes, Entry and Exit. There is an edge from Entry to

I ← 1                  (1)
J ← 1                  (1)
K ← 1                  (1)
L ← 1                  (1)
repeat                 (2)
  if (P)               (2)
    then do            (3)
      J ← I            (3)
      if (Q)           (3)
        then L ← 2     (4)
        else L ← 3     (5)
      K ← K + 1        (6)
    end                (6)
    else K ← K + 2     (7)
  print (I, J, K, L)   (8)
  repeat               (9)
    if (R)             (9)
      then L ← L + 4   (10)
  until (S)            (11)
  I ← I + 6            (12)
until (T)              (12)

Fig. 5. Simple program and its control flow graph. (The graph itself, with nodes Entry, 1 through 12, and Exit, is not reproduced here.)

any basic block at which the program can be entered, and there is an edge to Exit from any basic block that can exit the program. For reasons related to the representation of control dependence and explained in Section 6, there is also an edge from Entry to Exit. The other edges of the graph represent transfers of control (jumps) between the basic blocks. We assume that each node is on a path from Entry and on a path to Exit. For each node X, a successor of X is any node Y with an edge X → Y in the graph; Succ(X) is the set of all successors of X, and similarly for predecessors. A node with more than one successor is a branch node; a node with more than one predecessor is a join node. Finally, each variable is considered to have an assignment in Entry to represent whatever value the variable may have when the program is entered. This assignment is treated just like the ones that appear explicitly in the code. Throughout this paper, CFG denotes the control flow graph of the program under discussion.

For any nonnegative integer J, a path of length J in CFG consists of a sequence of J + 1 nodes (denoted X0, ..., XJ) and a sequence of J edges (denoted e1, ..., eJ) such that ej runs from Xj-1 to Xj for all j with 1 ≤ j ≤ J. (We write ej: Xj-1 → Xj.) As is usual with sequences, one item (node or edge) may occur several times. The null path, with J = 0, is


allowed. We write p: X0 →* XJ for an unrestricted path p, but p: X0 →+ XJ if p is known to be nonnull. Paths p: X0 →* XJ and q: Y0 →* YK are said to converge at a node Z if

(1) X0 ≠ Y0;
(2) XJ = Z = YK;
(3) (Xj = Yk) implies (j = J or k = K).

Intuitively, the paths p and q start at different nodes and come together at the end. The paths are almost node-disjoint, but the use of or rather than and in (3) is deliberate. One of the paths may happen to be a cycle Z →+ Z, but we still need to consider the possibility of convergence.
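Conditions (1) through (3) can be transcribed directly into a small checker over explicit node sequences. This is only an illustration; the quadratic pair scan is written for clarity, not efficiency.

```python
# Paths are given as node sequences p = [X0, ..., XJ] and q = [Y0, ..., YK].
def converge_at(p, q, z):
    J, K = len(p) - 1, len(q) - 1
    if p[0] == q[0]:                 # (1) the paths start at different nodes
        return False
    if not (p[J] == z == q[K]):      # (2) both paths end at Z
        return False
    for j in range(J + 1):           # (3) any other meeting is at an end
        for k in range(K + 1):
            if p[j] == q[k] and not (j == J or k == K):
                return False
    return True
```

For example, `converge_at(["Z", "A", "Z"], ["Y", "Z"], "Z")` holds, illustrating that one of the converging paths may itself be a cycle through Z.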

3. STATIC SINGLE ASSIGNMENT FORM

This section initially assumes that programs have been expanded to an intermediate form in which each statement evaluates some expressions and uses the results either to determine a branch or to assign to some variables. (Other program constructs are considered in Section 3.1.) An assignment statement A has the form LHS(A) ← RHS(A), where the left-hand side LHS(A) is a tuple of distinct target variables (U, V, ...) and the right-hand side RHS(A) is a tuple of expressions, with the same length as the LHS tuple. Each target variable in LHS(A) is assigned the corresponding value from RHS(A). In the examples already discussed, all tuples are 1-tuples, and there is no need to distinguish between V and (V). Section 3.1 sketches the use of longer tuples in expanding other program constructs (such as procedure calls) so as to make explicit the variables used and/or changed by the statements in the source program. Such explicitness is a prerequisite for most optimization anyway. The only novelty in our use of tuples is the fact that they provide a simple and uniform way to fold the results of analysis into the intermediate text. Practical concerns about the size of the intermediate text are addressed in Section 3.1.

Translating a program into SSA form is a two-step process. In the first step, some trivial φ-functions V ← φ(V, V, ...) are inserted at some of the join nodes in the control flow graph. In the second step, new variables Vi (for i = 0, 1, 2, ...) are generated. Each mention of a variable V in the program is replaced by a mention of one of the new variables Vi, where a mention may be in a branch expression or on either side of an assignment statement. Throughout this paper, an assignment statement may be either an ordinary assignment or a φ-function. Figure 6 displays the result of translating a simple program to SSA form.

A φ-function at entrance to a node X has the form V ← φ(R, S, ...), where V, R, S, ... are variables. The number of operands R, S, ... is the number of control flow predecessors of X. The predecessors of X are listed in some arbitrary fixed order, and the jth operand of φ is associated with the jth

(The left column of the figure repeats the program of Figure 5; the right column is its SSA form:)

I1 ← 1
J1 ← 1
K1 ← 1
L1 ← 1
repeat
  I2 ← φ(I3, I1)
  J2 ← φ(J4, J1)
  K2 ← φ(K5, K1)
  L2 ← φ(L9, L1)
  if (P)
    then do
      J3 ← I2
      if (Q)
        then L3 ← 2
        else L4 ← 3
      L5 ← φ(L3, L4)
      K3 ← K2 + 1
    end
    else K4 ← K2 + 2
  J4 ← φ(J3, J2)
  K5 ← φ(K3, K4)
  L6 ← φ(L5, L2)
  print (I2, J4, K5, L6)
  repeat
    L7 ← φ(L9, L6)
    if (R)
      then L8 ← L7 + 4
    L9 ← φ(L8, L7)
  until (S)
  I3 ← I2 + 6
until (T)

Fig. 6. Simple program and its SSA form.

predecessor. If control reaches X from its jth predecessor, then the run-time support1 remembers j while executing the φ-functions in X. The value of φ(R, S, ...) is just the value of the jth operand. Each execution of a φ-function uses only one of the operands, but which one depends on the flow of control just before entering X. Any φ-functions in X are executed before the ordinary statements in X. Some variants of φ-functions as defined here are useful for special purposes. For example, each φ-function can be tagged

1When SSA form is used only as an intermediate form (Figure 1), there is no need actually to provide this support. For us, the semantics of φ are only important when assessing the correctness of intermediate steps in a sequence of program transformations beginning and ending with code that has no φ-functions. Others [6, 11, 52] have found it useful to give φ another parameter that incidentally encodes j, under various restrictions on the control flow.

with the node X where it appears [5]. When the control flow of a language is suitably restricted, each φ-function can be tagged with information about conditionals or loops [5, 6, 11, 52]. The algorithms in this paper apply to such variants of SSA form as well.

SSA form may be considered as a property of a single program or as a relation between two programs. A single program is defined to be in SSA form if each variable is a target of exactly one assignment statement in the program text. Translation to SSA form replaces the original program by a new program with the same control flow graph. For every original variable V, the following conditions are required of the new program:

(1) If two nonnull paths X →+ Z and Y →+ Z converge at a node Z, and nodes X and Y contain assignments to V (in the original program), then a trivial φ-function V ← φ(V, ..., V) has been inserted at Z (in the new program).

(2) Each mention of V in the original program or in an inserted φ-function has been replaced by a mention of a new variable Vi, leaving the new program in SSA form.

(3) Along any control flow path, consider any use of a variable V (in the original program) and the corresponding use of Vi (in the new program). Then V and Vi have the same value.
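Condition (3) relies on the operational reading of φ given earlier: on entry from the jth predecessor, every φ-function in the node selects its jth operand, and all φ-functions execute before the node's ordinary statements. A sketch of that rule (the helper name and encoding are ours):

```python
# phis: list of (target, [operand names]); j: index of the predecessor
# along which control entered; env: current variable values.
def execute_phis(phis, j, env):
    # All phi-functions read their operands simultaneously, before any
    # of their targets are written.
    values = {target: env[ops[j]] for target, ops in phis}
    env.update(values)
    return env
```

The simultaneous read matters when φ targets feed each other: executing a2 ← φ(b1, ...) and b2 ← φ(a1, ...) on entry from predecessor 0 exchanges the two values rather than copying one twice.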
Minimal SSA form is translation to SSA form with the proviso that the number of φ-functions inserted is as small as possible, subject to Condition (1) above. The optimizations that depend on SSA form are still valid if there are some extraneous φ-functions beyond those that would appear in minimal SSA form. However, extraneous φ-functions can cause information to be lost, and they always add unnecessary overhead to the optimization process itself. Thus, it is important to place φ-functions only where they are required. One variant of SSA form [52] would sometimes forego placing a φ-function at a convergence point Z, so long as there are no more uses for V in or after Z. The φ-function could then be omitted without any risk of losing Condition (3). This has been called pruned SSA form [16], and it is sometimes preferable to our placement of φ-functions at all convergence points. However, as Figure 16 in Section 7 illustrates, our form is sometimes preferable to pruned form. When desired, pruned SSA form can be obtained by a simple adjustment of our algorithm [16, Section 5.1].

For any variable V, the nodes at which we should insert φ-functions in the original program can be defined recursively by Condition (1) in the definition of SSA form. A node Z needs a φ-function for V if Z is a convergence point for two paths that originate at two different nodes, both containing assignments to V or needing φ-functions for V. Nonrecursively, a node Z needs a φ-function for V because Z is a convergence point for two nonnull paths X →+ Z and Y →+ Z that start at nodes X and Y already containing assignments to V. If Z did not already contain an assignment to V, then the φ-function inserted at Z adds Z to the set of nodes that contain assignments to V. With more nodes to consider as origins of paths, we may observe more nodes appearing as convergence points of

nonnull paths originating at nodes with assignments to V. The same set of nodes needing φ-functions could thus be found by iterating the observation/insertion cycle.2 The algorithm presented here obtains the same end results in much less time.

3.1 Other Program Constructs

If the source program computes expressions using constants and scalar variables to assign values to scalar variables, then it is straightforward to derive an intermediate text that is ready to be put into SSA form (or otherwise analyzed and transformed by an optimizing compiler).

However,

source programs commonly use other constructs as well: Some variables are not scalars, and some computations do not explicitly indicate which variables they use or change. This section sketches how to map some important constructs to an explicit intermediate text suitable for use in a compiler.

3.1.1 Arrays. It will suffice to consider one-dimensional arrays here. Arrays of higher dimension involve more notation but no more concerns. If A and B are array variables, then array assignment statements like A ← B or A ← 0, if allowed by the source language, can be treated just like assignments to scalar variables. Many of the mentions of A in the source program, however, will be mentions of A(i) for some index i, which may be an integer variable. Treating A(i) as a variable would be awkward, because A(i) may be the same variable as A(j), and because the value of A(i) could be changed by assigning to i rather than to A(i). An easier approach is illustrated in Figure 7. The entire

array is treated like a single scalar variable, which may be one of the operands of Access or Update.3 The expression Access(A, i) evaluates to the ith component of A; the expression Update(A, j, V) evaluates to an array value that is of the same size as A and has the same component values, except for V as the value of the jth component. Assigning a scalar value V to A(j) is equivalent to assigning an array value to the entire array A, where the new array value depends on the old one, as well as on the index j and on the new scalar value V. The translation to SSA form is unconcerned with whether the values of variables are large objects or what the operators mean. As with scalars, translation of array references to SSA form removes some

anti- and output-dependences [32]. In the program in Figure 7a, dependence analysis may prohibit reordering the use of A(i) by the first statement and the definition of A(j) by the second statement. After translation to SSA form, the two references to A have been renamed, and reordering is then possible. For example, the two statements can execute concurrently. Where optimization does not reorder the two references, Section 7.2 describes how transla-

2In typical cases, paths originating at φ-functions add nothing beyond the contributions from paths originating at ordinary assignments. The straightforward iteration is still too slow, even in typical cases.
3This notation is similar to Select and Update [24], which, in turn, is similar to notation for referencing aggregate structures in a data flow language [23].

  ← A(i)            ← Access(A, i)           ← Access(A8, i7)
A(j) ← V           A ← Update(A, j, V)      A9 ← Update(A8, j1, V1)
  ← A(k) + 2       T ← Access(A, k)         T1 ← Access(A9, k1)
                     ← T + 2                  ← T1 + 2
     (a)                  (b)                      (c)

Fig. 7. Source code with array component references (a) is equivalent to code with explicit Access and Update operators that treat the array A just like a scalar (b). Transformation to SSA form proceeds as usual (c).
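The semantics of Access and Update can be modeled directly by pure functions on array values. The copy below models the meaning only; a real compiler would not generate a copy. (Python sketch; the concrete values are ours.)

```python
def Access(A, i):
    # The ith component of the array value A.
    return A[i]

def Update(A, j, v):
    # A new array value, equal to A except that component j is v.
    B = list(A)
    B[j] = v
    return B

# Assigning to A(j) becomes an assignment of a whole new array value:
A0 = [10, 20, 30]
A1 = Update(A0, 1, 99)
```

A1 differs from A0 only at index 1, and A0 itself is untouched, which is why the renamed versions A8 and A9 in Figure 7c can be treated as distinct single-assignment values.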

tion out of SSA form reclaims storage that would otherwise be necessary to maintain distinct variables.

Consider the loop shown in Figure 8a, which assigns to every component of array A. The value assigned to each component A(i) does not depend (even indirectly) on any of the values previously assigned to components of A. In effect, the whole loop is like an assignment of the form A ← (...), where A is not used in (...). Any assignment to a component of A that is not used before entering such an initialization loop is dead code that should be eliminated. With or without SSA form, the Update operator in Figure 8b makes A appear to be live at entry to the loop.
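The liveness effect described here is easy to see mechanically: because Update mentions the old array on its right-hand side, a naive scan of the loop body of Figure 8b finds a use of the array's entry value A0. (The statement encoding below is ours, not the paper's.)

```python
# Each statement is (target, [right-hand-side variables]).
loop_body = [
    ("i2", ["i1", "i3"]),   # i2 <- phi(i1, i3)
    ("A1", ["A0", "A2"]),   # A1 <- phi(A0, A2)
    ("A2", ["A1", "i2"]),   # A2 <- Update(A1, i2, i2)
    ("i3", ["i2"]),         # i3 <- i2 + 1
]

def rhs_uses(stmts):
    used = set()
    for _, ops in stmts:
        used.update(ops)
    return used
```

Since A0 is in `rhs_uses(loop_body)`, the whole array appears live at loop entry even though every component is overwritten; HiddenUpdate in Figure 8c removes exactly that spurious use.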

One reasonable response to the crudeness of the Update operator is to accept it. Address calculations and other genuine scalar calculations can still be optimized extensively. Another response is to perform dependence analysis [3, 10, 32, 51], which can sometimes determine that no execution of the assignment to A(i) in Figure 8a uses values produced by any other assignment, as is usually the case for initialization of arrays. The problem, for us or for anyone who uses Update, is to communicate some results of dependence analysis, formulated in terms of accesses of A, to optimizations (like dead code elimination) that are formulated in terms of scalar variables. A simple solution to this formal problem is shown in Figure 8c, where the HiddenUpdate operator does not mention the assigned array operand. The actual code generated for an assignment from a HiddenUpdate expression is exactly the same as for an assignment from the corresponding Update expression, where the hidden operand is supplied by the target of the assignment.
3.1.2 Structures. A structure can be generally regarded as an array, where references to structure fields are treated as references to elements of the array. Thus, an assignment to a structure field is translated into an Update of the structure, and a use of a structure field is translated into an Access of the structure. In the prevalent case of simple structure field references, this treatment results in arrays whose elements are indexed by constants. Dependence analysis can often determine independence among such accesses, so that optimization may move an assignment to one field far from an assignment to another field. If analysis or language semantics reveals a structure whose n fields are always accessed disjointly, then the

integer A(1:100)       integer A0(1:100)           integer A0(1:100)
                       integer A1(1:100)           integer A1(1:100)
                       integer A2(1:100)           integer A2(1:100)
i ← 1                  i1 ← 1                      i1 ← 1
repeat                 repeat                      repeat
  A(i) ← i               i2 ← φ(i1, i3)              i2 ← φ(i1, i3)
  i ← i + 1              A1 ← φ(A0, A2)              A1 ← φ(A0, A2)
until i > 100            A2 ← Update(A1, i2, i2)     A2 ← HiddenUpdate(i2, i2)
                         i3 ← i2 + 1                 i3 ← i2 + 1
                       until i3 > 100              until i3 > 100
      (a)                    (b)                         (c)

Fig. 8. Source loop with array assignment (a) is equivalent to code with an Update operator that treats the array A just like a scalar (b). As Section 7.2 explains, the eventual translation out of SSA form will leave just one array here. Using HiddenUpdate (c) is a purely formal way to summarize some results of dependence analysis, if available.

structure can be decomposed into n distinct variables. The elements of such structures are united in the source program only for organizational reasons, and the decomposition makes the actual use of the structure apparent. Translation of such programs to SSA form makes them more amenable to subsequent optimization.


implicit Implicit References to Variables.

The and referenced,

construction used by a statement

of

SSA may

form In or use

knowing to those

those variables

variables explicitly

modified

a statement.

variables not mentioned by the statement itself. Examples of such references are global variables modified or used by a procedure call, variables, and dereferenced pointer variables. To obtain SSA form,

aliased

we must account for implicit as well as explicit references, either by conservative assumptions or by analysis. Heap storage can be conservatively modeled by representing the entire heap as a single variable that is both modified and used by any statement that may change the heap. More refined modeling is also possible [15], but the conservative approach is already strong enough to support optimization of code that does not involve affect the heap but is interspersed with heap-dependent code. For any statement S, three types of references form: (1) (2) (3) MustMod(S) is the set of variables is the set of variables that that whose
must

translation

into

SSA

be modified be modified prior

by execution by execution to execution

of of of S

s.
MayMod(S) may

s.
May Use(S) is the set of variables may be used by S. values

We represent an assignment
ACM Transactions

implicit references for any statement statement A, where all the variables
on Programmmg Languages and Systems, Vol,

S by transformation to in MayMod(S) appear


1991

13, No, 4, October

Static Single Assignment Form and the Control Dependence Graph in An LHS( A) and appear called all in compiler (directly the RHS( may variables A). or may not have access to the of their in May Use(S) u

463

( ikfcz~illod(s) bodies effects.

MustMod(S)) optimizing procedures

of all If no

or indirectly)

or to summaries

specific information is available, it is customary sumptions. A call might change any global variable by reference, will be performed. To determine but it is reasonable Such assumptions aliasing to assume limit and that not change. the extent to extract effects

to make conservative asor any parameter passed variables that more local to the caller can variinformation: see [14, 15, result are is that small. is at all. Both own transformations detailed aliasing, the usual and analysis, been The on global

Techniques parameter

are available

of procedures pointer

ables, see [7-9], [19], [37], and [42]; to determine 27, 29, 33, 34], and [44]. When there Small often a sophisticated side analysis and technique the tuples directly. are few tuples effects Many (both

is applied, LHS Sophisticated

RHS)

can be represented

however,

unavailable.

compilers

do no interprocedural

analysis analyzed. compilers

Consider a call to an external procedure that has not tuples for the call must contain all global variables. representation structure plus that a direct of the tuples includes a flag representation can still of the be compact. that few local (set to indicate

The representation all globals that

caln be a tuple

are in the tuple) are in the

variables

because of parameter transmission. The cal analyses of optimization techniques

intuitive (with

explanations and theoretior without SSA form) are compact representations

conveniently formulated in terms of explicit can still be used in the implementations. 3.2 Overview of the SSA Algorithm to minimal SSA form

tuples;
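The MustMod/MayMod/MayUse classification and the tuple transformation can be sketched in code. The Python fragment below is a hypothetical illustration, not an API from the paper: the CallEffects class and its fields are invented for the example. It encodes the conservative assumptions for an unanalyzed call (any global or by-reference parameter may be modified, nothing must be) and builds the assignment tuples with MayMod(S) on the LHS and MayUse(S) ∪ (MayMod(S) − MustMod(S)) on the RHS.

```python
from dataclasses import dataclass

@dataclass
class CallEffects:
    """Hypothetical summary of a call site's visible variables."""
    globals_: frozenset       # global variables visible at the call
    ref_params: frozenset     # actuals passed by reference
    value_params: frozenset   # actuals passed by value

    def may_mod(self):
        # Conservative: any global or by-reference parameter may change;
        # other locals of the caller may not.
        return self.globals_ | self.ref_params

    def must_mod(self):
        # Without interprocedural analysis, no modification is certain.
        return frozenset()

    def may_use(self):
        return self.globals_ | self.ref_params | self.value_params

def as_assignment(effects):
    """LHS gets MayMod(S); RHS gets MayUse(S) | (MayMod(S) - MustMod(S))."""
    lhs = sorted(effects.may_mod())
    rhs = sorted(effects.may_use() | (effects.may_mod() - effects.must_mod()))
    return lhs, rhs
```

With this sketch, a call that can see globals g1 and g2, one by-reference parameter p, and one by-value parameter x is modeled as the assignment (g1, g2, p) ← (g1, g2, p, x).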

3.2 Overview of the SSA Algorithm

Translation to minimal SSA form is done in three steps:

(1) The dominance frontier mapping is constructed from the control flow graph (Section 4.2).
(2) Using the dominance frontiers, the locations of the φ-functions for each variable in the original program are determined (Section 5.1).
(3) The variables are renamed (Section 5.2) by replacing each mention of an original variable V by an appropriate mention of a new variable Vi.

4. DOMINANCE

Section 4.1 reviews the dominance relation [47] between nodes in the control flow graph and how to summarize this relation in a dominator tree. Section 4.2 introduces the dominance frontier mapping and gives an algorithm for its computation.

4.1 Dominator Trees

Let X and Y be nodes in the control flow graph CFG of a program. If X appears on every path from Entry to Y, then X dominates Y. Domination is both reflexive and transitive. If X dominates Y and X ≠ Y, then X strictly
[Figure 9: the control flow graph of the simple program and its dominator tree, with each node X annotated as described in the caption.]

Fig. 9. Control flow graph and dominator tree of the simple program. The sets of nodes listed in ( ) and [ ] brackets summarize the dominance frontier calculation from Section 4.2. Each node X is annotated with two sets [DFlocal(X) | DF(X)] and a third set (DFup(X)).

dominates Y. In formulas, we write X ≫ Y for strict domination and X ≥ Y for domination; if X does not strictly dominate Y, we write X !≫ Y. The immediate dominator of Y (denoted idom(Y)) is the closest strict dominator of Y on any path from Entry to Y. In a dominator tree, the children of a node X are all immediately dominated by X. The root of a dominator tree is Entry, and any node Y other than Entry has idom(Y) as its parent in the tree. The dominator tree for CFG from Figure 5 is shown in Figure 9.

Let N and E be the numbers of nodes and edges in CFG. The dominator tree can be constructed in O(Eα(E, N)) time [35] or (by a more difficult algorithm) in O(E) time [26]. For all practical purposes, α(E, N) is a small constant,⁴ so this paper will consider the dominator tree to have been found in linear time. The dominator tree of CFG has exactly the same set of nodes as CFG but a very different set of edges. Here, the words predecessor, successor, and path always refer to CFG. The words parent, child, ancestor, and descendant always refer to the dominator tree.

⁴Under the definition of α used in analyzing the dominator tree algorithm [35, p. 123], N ≤ E implies that α(E, N) = 1 when log₂N < 16 and α(E, N) = 2 when 16 ≤ log₂N < 2¹⁶.
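Dominators can also be computed by a much simpler method than the near-linear algorithms cited above. The sketch below uses the classic iterative data-flow formulation — Dom(Entry) = {Entry}, and Dom(Y) = {Y} ∪ ⋂ Dom(P) over the predecessors P of Y — which is quadratic in the worst case but easy to check by hand. The graph encoding and function names are assumptions of this sketch, not constructs from the paper.

```python
def dominators(succ, entry):
    """Map each node to its set of dominators by iterating to a fixed point.

    `succ` maps every node (including Exit) to its list of CFG successors.
    """
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for x, ys in succ.items():
        for y in ys:
            pred[y].add(x)
    dom = {n: set(nodes) for n in nodes}   # start from the top element
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            # Dom(n) = {n} union the intersection over predecessors.
            new = {n} | set.intersection(*(dom[p] for p in pred[n])) if pred[n] else {n}
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def idoms(dom, entry):
    """idom(Y): the strict dominator of Y dominated by all its other strict
    dominators (the strict dominators of Y form a chain; pick the deepest).
    Assumes every node except `entry` is reachable from `entry`."""
    result = {}
    for n, ds in dom.items():
        if n == entry:
            continue
        strict = ds - {n}
        result[n] = next(d for d in strict if strict <= dom[d])
    return result
```

On a diamond-shaped CFG (Entry → A, A → {B, C}, B → D, C → D, D → Exit), this yields idom(B) = idom(C) = idom(D) = A, matching the chain structure described above.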

4.2 Dominance Frontiers

The dominance frontier DF(X) of a CFG node X is the set of all CFG nodes Y such that X dominates a predecessor of Y but does not strictly dominate Y:

    DF(X) = {Y | (∃P ∈ Pred(Y)) (X ≥ P and X !≫ Y)}.

Computing DF(X) directly from the definition would require searching much of the dominator tree. The total time to compute DF(X) for all nodes X would be quadratic, even when the sets themselves are small. To compute the dominance frontier mapping in time linear in the size ΣX |DF(X)| of the mapping, we define two intermediate sets DFlocal and DFup for each node, such that the following equation holds:

    DF(X) = DFlocal(X) ∪ ⋃ over Z ∈ Children(X) of DFup(Z).    (4)

Given any node X, some of the successors of X may contribute to DF(X). This local contribution DFlocal(X) is defined by

    DFlocal(X) =def {Y ∈ Succ(X) | X !≫ Y}.

Given any node Z that is not the root Entry of the dominator tree, some of the nodes in DF(Z) may contribute to DF(idom(Z)). The contribution DFup(Z) that Z passes up to idom(Z) is defined by

    DFup(Z) =def {Y ∈ DF(Z) | idom(Z) !≫ Y}.

LEMMA 1. The dominance frontier equation (4) is correct.

PROOF. Because dominance is reflexive, DFlocal(X) ⊆ DF(X). Because dominance is transitive, each child Z of X has DFup(Z) ⊆ DF(X). We must still show that everything in DF(X) has been accounted for. Suppose Y ∈ DF(X), and let U → Y be an edge such that X dominates U but does not strictly dominate Y. If U = X, then Y ∈ DFlocal(X), and we are done. If U ≠ X, on the other hand, then there is a child Z of X that dominates U. This Z cannot strictly dominate Y, because X does not. Thus Y ∈ DFup(Z). □

The intermediate sets can be computed with simple equality tests, as follows:

LEMMA 2. For any node X,

    DFlocal(X) = {Y ∈ Succ(X) | idom(Y) ≠ X}.

PROOF. We assume that Y ∈ Succ(X) and show that (X !≫ Y) ⟺ (idom(Y) ≠ X). The ⇐ part is true because the immediate dominator is defined to be a strict dominator. For the ⇒ part, suppose that X strictly dominates Y and, hence, that some child V of X dominates Y. Then V appears on any path from Entry to Y. One such path goes to X and then follows the edge X → Y, so either V = Y or V dominates X. But the child V cannot dominate X, so V = Y and idom(Y) = idom(V) = X. □

LEMMA 3. For any node X and any child Z of X in the dominator tree,

    DFup(Z) = {Y ∈ DF(Z) | idom(Y) ≠ X}.

PROOF. We assume that Y ∈ DF(Z) and show that (X !≫ Y) ⟺ (idom(Y) ≠ X). The ⇐ part is true because strict dominance is the transitive closure of immediate dominance. For the ⇒ part, suppose that X strictly dominates Y and, hence, that some child V of X dominates Y. Choose a predecessor U of Y such that Z dominates U. Then V appears on any path from Entry to Y. One such path goes to U and then follows the edge U → Y, so either V = Y or V dominates U. If V = Y, then idom(Y) = idom(V) = X, and we are done. Suppose V ≠ Y (and, hence, that V dominates U) and derive a contradiction. Only one child of X can dominate U, so V = Z and Z strictly dominates Y. This contradicts the hypothesis that Y ∈ DF(Z). □

These results imply the correctness of the dominance frontier algorithm in Figure 10. The /*local*/ line effectively computes DFlocal(X) on the fly and uses it in (4) without needing to devote storage to it. The /*up*/ line is similar for DFup(Z). We traverse the dominator tree bottom-up, visiting each node X only after visiting each of its children. To illustrate the working of this algorithm, we have annotated the dominator tree in Figure 9 with the information in [ ] and ( ) brackets.

    for each X in a bottom-up traversal of the dominator tree do
        DF(X) ← ∅
        for each Y ∈ Succ(X) do                 /*local*/
            if idom(Y) ≠ X then DF(X) ← DF(X) ∪ {Y}
        end
        for each Z ∈ Children(X) do
            for each Y ∈ DF(Z) do               /*up*/
                if idom(Y) ≠ X then DF(X) ← DF(X) ∪ {Y}
            end
        end
    end

Fig. 10. Calculation of DF(X) for each CFG node X.

THEOREM 1. The algorithm in Figure 10 is correct.

PROOF. Direct from the preceding lemmas. □

Let CFG have N nodes and E edges. The loop over Succ(X) in Figure 10 examines each edge just once, so all executions of the /*local*/ line complete in time O(E). Similarly, all executions of the /*up*/ line complete in time O(size(DF)). The overall time is thus O(E + size(DF)), which amounts to a worst-case complexity of O(E + N²). However, Section 8 shows that, in practice, the size of the DF mapping is usually linear. We have implemented this algorithm in the PTRAN compiler [2] and have observed that it is faster than standard data-flow computations.

4.3 Relating Dominance Frontiers to Joins

We start by stating more formally the nonrecursive characterization of where the φ-functions should be located. Given a set 𝒮 of CFG nodes, the set J(𝒮) of join nodes is defined to be the set of all nodes Z such that there are two nonnull CFG paths that start at two distinct nodes in 𝒮 and converge at Z. The iterated join J⁺(𝒮) is the limit of the increasing sequence of sets of nodes

    J1 = J(𝒮);
    Ji+1 = J(𝒮 ∪ Ji).

In particular, if 𝒮 happens to be the set of assignment nodes for a variable V, then J⁺(𝒮) is the set of φ-function nodes for V. The join and iterated join operations map sets of nodes to sets of nodes. We extend the dominance frontier mapping from nodes to sets of nodes in the natural way: DF(𝒮) = ⋃ over X ∈ 𝒮 of DF(X). As with join, the iterated dominance frontier DF⁺(𝒮) is the limit of the increasing sequence of sets of nodes

    DF1 = DF(𝒮);
    DFi+1 = DF(𝒮 ∪ DFi).

The actual computation of DF⁺(𝒮) is performed by the efficient worklist algorithm in Section 5.1; the formulation here is convenient for relating iterated dominance frontiers to iterated joins. If the set 𝒮 is the set of assignment nodes for a variable V, then we will show that

    J⁺(𝒮) = DF⁺(𝒮)

(this equation depends on the fact that Entry is in 𝒮) and, hence, that the location of the φ-functions for V can be computed by the worklist algorithm for computing DF⁺(𝒮) that is given in Section 5.1. The following lemmas do most of the work by relating dominance frontiers to joins:
LEMMA 4. For any nonnull path p: X →+ Z in CFG, there is a node X′ ∈ {X} ∪ DF⁺({X}) on p that dominates Z. Moreover, unless X dominates every node on p, the node X′ can be chosen in DF⁺({X}).

PROOF. If X dominates every node on p, then we just choose X′ = X to get all the claimed properties of X′. We may therefore assume that some of the nodes on p are not dominated by X. Let the sequence of nodes on p be X = X0, ..., XJ = Z. For the smallest i such that X does not dominate Xi, the predecessor Xi−1 is dominated by X and so puts Xi into DF(X). Thus, there are choices of j with Xj ∈ DF⁺({X}). Consider X′ = Xj for the largest j with Xj ∈ DF⁺({X}). We will show that X′ ≥ Z. Suppose not. Then j < J, and there is a first k with j < k ≤ J such that X′ does not dominate Xk. The predecessor Xk−1 is dominated by X′ and so puts Xk into DF(X′). Thus, Xk ∈ DF(DF⁺({X})) ⊆ DF⁺({X}), contradicting the choice of j. □

LEMMA 5. Let X ≠ Y be two nodes in CFG, and suppose that nonnull paths p: X →+ Z and q: Y →+ Z in CFG converge at Z. Then Z ∈ DF⁺({X}) ∪ DF⁺({Y}).

PROOF. We consider three cases. In the first two cases, we prove that Z ∈ DF⁺({X}) ∪ DF⁺({Y}). Then we show that the first two cases are (unobviously) exhaustive because the third case leads to a contradiction. Let X′ be from Lemma 4 for the path p, with the sequence of nodes X = X0, ..., XJ = Z. Let Y′ be from Lemma 4 for the path q, with the sequence of nodes Y = Y0, ..., YK = Z.

Case 1. We suppose that X′ is on q and show that Z ∈ DF⁺({X}). By the definition of convergence (specifically, (3) in Section 2), X′ = Z. We may now assume that Z = X′ = X and that X dominates every node on p. (Otherwise, Lemma 4 already asserts that Z ∈ DF⁺({X}).) Because X dominates the predecessor XJ−1 of Z but does not strictly dominate Z, we have Z ∈ DF(X) ⊆ DF⁺({X}).

Case 2. We suppose that Y′ is on p and show that Z ∈ DF⁺({Y}), reasoning just as in Case 1.

Case 3. We derive a contradiction from the suppositions that X′ is not on q and Y′ is not on p. Because X′ ≥ Z but X′ is not on q, X′ ≫ YK = Z and, therefore, X′ dominates all predecessors of YK. In particular, X′ ≥ YK−1. But X′ ≠ YK−1, so X′ ≫ YK−1, and we continue inductively to show that X′ ≫ Yk for all k; in particular, X′ ≫ Y′, since Y′ is one of the Yk. On the other hand, by similar reasoning from the supposition that Y′ ≥ Z but Y′ is not on p, we can show that Y′ ≫ X′. Two nodes cannot strictly dominate each other, so Case 3 is impossible. □

LEMMA 6. For any set 𝒮 of CFG nodes, J(𝒮) ⊆ DF⁺(𝒮).

PROOF. We apply Lemma 5 to the two converging paths that witness each join node. □

LEMMA 7. For any set 𝒮 of CFG nodes such that Entry ∈ 𝒮, DF(𝒮) ⊆ J(𝒮).

PROOF. Consider any X ∈ 𝒮 and any Y ∈ DF(X). There is a path from X to Y where all the nodes before Y are dominated by X. There is also a path from Entry to Y where none of the nodes are dominated by X. The paths therefore converge at Y. □

THEOREM 2. The set of nodes that need φ-functions for any variable V is the iterated dominance frontier DF⁺(𝒮), where 𝒮 is the set of nodes with assignments to V.

PROOF. By Lemma 6 and induction on i in the definition of J⁺, we can show that J⁺(𝒮) ⊆ DF⁺(𝒮). The induction step is as follows:

    Ji+1 = J(𝒮 ∪ Ji) ⊆ J(𝒮 ∪ DF⁺(𝒮)) ⊆ DF⁺(𝒮 ∪ DF⁺(𝒮)) = DF⁺(𝒮).    (5)

The node Entry is in 𝒮, so Lemma 7 and another induction yield DF⁺(𝒮) ⊆ J⁺(𝒮). The induction step is as follows:

    DFi+1 = DF(𝒮 ∪ DFi) ⊆ DF(𝒮 ∪ J⁺(𝒮)) ⊆ J(𝒮 ∪ J⁺(𝒮)) = J⁺(𝒮).    (6)

The set of nodes that need φ-functions for V is precisely J⁺(𝒮), so (5) and (6) prove the theorem. □
5. CONSTRUCTION OF MINIMAL SSA FORM

5.1 Using Dominance Frontiers to Find Where φ-Functions Are Needed

The algorithm in Figure 11 inserts trivial φ-functions. The outer loop of this algorithm is performed once for each variable V in the program. Several data structures are used:

W is the worklist of CFG nodes being processed. In each iteration of the outer loop, W is initialized to the set 𝒜(V) of nodes that contain assignments to V. Each node X in the worklist ensures that each node Y in DF(X) receives a φ-function. Each iteration terminates when the worklist becomes empty.

Work(*) is an array of flags, one flag Work(X) for each node X, where Work(X) indicates whether X has ever been added to W during the current iteration of the outer loop.

HasAlready(*) is an array of flags, one for each node, where HasAlready(X) indicates whether a φ-function for V has already been inserted at X.

The flags Work(X) and HasAlready(X) are independent: we need two flags because the property of assigning to V is independent of the property of needing a φ-function for V. The flags could have been implemented with just the values true and false, but this would require additional record keeping to reset any true flags between iterations without the expense of looping over all the nodes. It is simpler to devote an integer to each flag and to test flags by comparing them with the current iteration count.

    IterCount ← 0
    for each node X do
        HasAlready(X) ← 0
        Work(X) ← 0
    end
    W ← ∅
    for each variable V do
        IterCount ← IterCount + 1
        for each X ∈ 𝒜(V) do
            Work(X) ← IterCount
            W ← W ∪ {X}
        end
        while W ≠ ∅ do
            take X from W
            for each Y ∈ DF(X) do
                if HasAlready(Y) < IterCount then
                    place ⟨V ← φ(V, ..., V)⟩ at Y
                    HasAlready(Y) ← IterCount
                    if Work(Y) < IterCount then
                        Work(Y) ← IterCount
                        W ← W ∪ {Y}
                    end
                end
            end
        end
    end

Fig. 11. Placement of φ-functions.

Let each node X have Aorig(X) original assignments to variables, where each ordinary assignment statement LHS ← RHS contributes the length of the tuple LHS to Aorig(X). Counting assignments to variables is one of several measures of program size.⁵ By this measure, the translation to SSA form expands the program from size Aorig = ΣX Aorig(X) to size Atot = ΣX Atot(X), where each φ-function placed at X contributes 1 to Atot(X) = Aorig(X) + Aφ(X). There is a similar expansion in the number of mentions of variables, from Morig = ΣX Morig(X) to Mtot = ΣX Mtot(X), where each φ-function placed at X contributes 1 plus the indegree of X to Mtot(X) = Morig(X) + Mφ(X).

Placing a φ-function at Y in Figure 11 has cost linear in the indegree of Y, so there is an O(ΣX Mφ(X)) contribution to the running time from the work done when HasAlready(Y) < IterCount. Replacing mentions of variables will contribute at least O(Mtot) to the running time of any SSA translation algorithm, so the cost of placement can be ignored in analyzing the contribution of Figure 11 to the O(...) bound for the whole process. The O(N) cost of initialization can be similarly ignored because it is subsumed by the cost of the dominance frontier calculation. What cannot be ignored is the cost of managing the worklist W. The statement take X from W in Figure 11 is performed Atot(X) times, and each time all Y ∈ DF(X) are polled. The contribution of Figure 11 to the running time of the whole process is therefore O(ΣX (Atot(X) × |DF(X)|)). Sizing the output in the natural way, we can also describe the contribution as O(Atot × avrgDF), where avrgDF is the weighted average

    avrgDF =def (ΣX (Atot(X) × |DF(X)|)) / (ΣX Atot(X))    (7)

that emphasizes the dominance frontiers of nodes with many assignments. As Section 8 shows, the dominance frontiers are small in practice, and Figure 11 is effectively O(Atot). This, in turn, is effectively O(Aorig).

⁵The various measures relevant here are reviewed when the whole SSA translation process is summarized in Section 9.1.
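The two algorithms of Sections 4.2 and 5.1 can be sketched together in Python. The first function follows the Figure 10 bottom-up computation of DF, using the equality tests of Lemmas 2 and 3; the second follows the Figure 11 worklist, returning the iterated dominance frontier of each variable's assignment nodes. The dictionary-based CFG encoding is an assumption of this sketch, and Python sets stand in for the integer-flag arrays, so the constant-factor trick with IterCount is not reproduced here.

```python
def dominance_frontiers(succ, idom, entry):
    """Figure 10: bottom-up walk of the dominator tree, DF via (4)."""
    children = {}
    for y, x in idom.items():
        children.setdefault(x, []).append(y)
    df = {n: set() for n in succ}

    def walk(x):
        for z in children.get(x, []):      # visit children first (bottom-up)
            walk(z)
        for y in succ[x]:                  # /*local*/  (Lemma 2)
            if idom.get(y) != x:
                df[x].add(y)
        for z in children.get(x, []):      # /*up*/     (Lemma 3)
            for y in df[z]:
                if idom.get(y) != x:
                    df[x].add(y)

    walk(entry)
    return df

def place_phis(df, assigns):
    """Figure 11: for each variable, the iterated dominance frontier DF+
    of its assignment nodes, computed by a worklist."""
    phis = {}
    for v, nodes in assigns.items():
        has_phi, on_work = set(), set(nodes)
        work = list(nodes)
        while work:
            x = work.pop()
            for y in df[x]:
                if y not in has_phi:
                    has_phi.add(y)         # place a phi for v at y
                    if y not in on_work:   # enqueue y at most once
                        on_work.add(y)
                        work.append(y)
        phis[v] = has_phi
    return phis
```

On the diamond CFG (A branches to B and C, which rejoin at D), a variable assigned in both B and C gets exactly one φ-function, at the join node D.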

5.2 Renaming

The algorithm in Figure 12 renames all mentions of variables. New variables are denoted Vi, where i is an integer, for each variable V. Figure 12 begins with a top-down traversal of the dominator tree by calling SEARCH at the root node Entry. The visit to a node processes the statements associated with the node in sequential order, starting with any φ-functions that may have been inserted. The processing of a statement requires work only for those variables actually mentioned in the statement. In contrast with Figure 11, we need a loop over all variables only when we initialize two arrays among the following data structures:

S(*) is an array of stacks, one stack S(V) for each variable V. The stacks hold integers. The integer i at the top of S(V) is used to construct the variable Vi that should replace a use of V.

C(*) is an array of integers, one counter C(V) for each variable V. The integer C(V) tells how many assignments to V have been processed.

WhichPred(X, Y) is an integer telling which predecessor of Y in CFG is X. The jth operand of a φ-function in Y corresponds to the jth predecessor of Y in the listing of the inedges of Y.

Each assignment statement A has the form LHS(A) ← RHS(A), where the right-hand side RHS(A) is a tuple of expressions and the left-hand side LHS(A) is a tuple of distinct target variables. These tuples change as mentions of variables are renamed, but the original target tuple is still remembered as oldLHS(A). To minimize notation, a conditional branch is treated as an ordinary assignment to a special variable whose value indicates which edge the branch should follow. A real implementation would recognize that a conditional branch involves a little less work than a genuine assignment: The LHS part of the processing can be omitted.

    for each variable V do
        C(V) ← 0
        S(V) ← EmptyStack
    end
    call SEARCH(Entry)

    SEARCH(X):
        for each statement A in X do
            if A is an ordinary assignment then
                for each variable V used in RHS(A) do
                    replace use of V by use of Vi, where i = Top(S(V))
                end
            end
            for each V in LHS(A) do
                i ← C(V)
                replace V by Vi in LHS(A)
                push i onto S(V)
                C(V) ← i + 1
            end
        end
        for each Y ∈ Succ(X) do
            j ← WhichPred(Y, X)
            for each φ-function F in Y do
                replace the j-th operand V in RHS(F) by Vi, where i = Top(S(V))
            end
        end
        for each Y ∈ Children(X) do
            call SEARCH(Y)
        end
        for each assignment A in X do
            for each V in oldLHS(A) do
                pop S(V)
            end
        end
    end SEARCH

Fig. 12. Renaming mentions of variables. The list of uses of Vi grows with each replacement of V by Vi in a RHS.

The processing of an assignment statement A considers the mentions of variables in A. The simplest case is that of a target variable V in the tuple LHS(A). We need a new variable Vi. By keeping a count C(V) of the number of assignments to V that have already been processed, we can find an appropriate new variable by using i = C(V) and then incrementing C(V). To facilitate renaming uses of V in the future, we also push i (which identifies Vi) onto a stack S(V) of integers that identify new variables replacing V.
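The counter-and-stack discipline just described can be sketched as follows. The statement encoding — mutable ["assign", target, uses] and ["phi", target, {predecessor: operand}] triples, with the operand map standing in for WhichPred — is a hypothetical toy representation, not the paper's. The search procedure mirrors Figure 12: rename RHS uses from the stack tops, generate new target names, fill in the φ-operands of CFG successors, recurse over dominator-tree children, and pop the stacks on exit.

```python
from collections import defaultdict

def rename(nodes, succ, children, entry):
    """Top-down dominator-tree walk in the style of Figure 12."""
    counter = defaultdict(int)           # C(V): assignments to V processed
    stack = defaultdict(list)            # S(V): subscripts, top = current Vi

    def top(v):                          # Vi for i = Top(S(V)); V0 by default
        return "%s%d" % (v, stack[v][-1] if stack[v] else 0)

    def search(x):
        pushed = []                      # oldLHS names, popped on exit
        for s in nodes[x]:
            if s[0] == "assign":         # phi operands are filled by predecessors
                s[2][:] = [top(v) for v in s[2]]
            v = s[1]                     # rename the target V to Vi
            i = counter[v]
            stack[v].append(i)
            counter[v] = i + 1
            pushed.append(v)
            s[1] = "%s%d" % (v, i)
        for y in succ[x]:                # fill our operand slot in successors' phis
            for s in nodes[y]:
                if s[0] == "phi":
                    s[2][x] = top(s[2][x])
        for y in children.get(x, []):    # recurse over dominator-tree children
            search(y)
        for v in pushed:                 # restore the stacks
            stack[v].pop()

    search(entry)
    return nodes
```

Running this on the diamond CFG with v assigned in B and C and a φ-function already placed at D renames the φ to v3 ← φ(v1, v2) and the subsequent use of v in D to v3.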

Static Single Assignment Form and the Control Dependence A subtler any variable expressions computation V used in in the tuple. is needed RHS( A); We want for the right-hand that is, V appears V by to replace

Graph A).

473

side RHS( Vi, where

Consider one of the

in at least

V, is the target

of the assignment that There are two subcases

produces the value for V actually used in RHS( A). because A maybe either an ordinary assignment or a but they later in 4 and of

~-function. Both subcases get V, from the top of the stack S(V), inspect S(V) at different times in Figure 12. Lemma 10, introduced this section, shows that The correctness proof V, is correctly for renaming chosen depends

in both subcases. on results from Section it makes program.

on three more lemmas. We start by showing that the assignment to a variable in the transformed
LEMMA 8.

sense to speak

Each

new variable

Vi in the transformed

program

is a target

of

exactly

one assignment. Because the counter C(V) is incremented after processing each

PRooF.

assignment to V, there can be at most one assignment to V,. To show that there is at least one assignment to V,, we consider the two ways V, can be mentioned. If V, is mentioned on the LHS on an assignment, then there is nothing more to show. If the value of when time Vi when is used, on the other onto hancl, S(V), then it was i = Zop(S( V )) was true pushed to vi. because at the time the algorithm renamed the old use

of V to a use of Vi. At the earlier an assignment

i was pushed been changed

to V had just

to an assignment

K
X may program contain or were assignments introduced to the variable to receive V that appeared in the for all 12. loop

A node original

the value

of a o-function

V. Let TopA/7er( V, X) denote the new variable in effect for V after statements in node X have been considered by the first loop in Figure Specifically, and define we consider the top of each stack S(V) at the end of this

TopAfter If there that assigns


LEMMA

( V, X)

~f

V,

where X, then from the

i = Top(S(V)). the top-down closest traversal of ensures X that

are no assignments V, X) is to V.

to V in inherited

TopAf3er(

dominator

not

9. For any variable have a rp-function for V,

V and

any CFG

edge X ~ Y such

that

Y does

TopAfter(V, PRooF. o-function We may for assume

X) that Z has

= TopAfter( X # idom( Ye DF(Z),

V, idom( Y). then

Y)). Y does not have

(8) a

Because

V, if a node

Z does not assign

to a new to a

variable derived from V. We use this fact twice below. By Lemma 2, Ye DFIOC=l( X) G DF( X), and, thus, X does not new idom( variable X), derived from X)), V. Let U be the first node in idom( idom( . . . that ( V, X) assigns to such a variable. ( V, U).
Vol.

assign

the Then

sequence (9)

TopAfter
ACM Transactions

= TopAfter
Languages

on Programming

and Systems,

13, No. 4, October

1991.

474 Because DF(U). any

Ron Cytron et al U assigns to a new variable Y), we get derived from V, it follows dominates that that Y $ is

But

U dominates

a predecessor

of Y, so U strictly Z >> X because implies

Y. For

Z with

U >> Z >> idom(

Z >> Y and there

an edge X * Y. By the choice assign to a new variable derived 7opAfter( and (8) follows The preceding form, but first from (9). V, U)

of U, U>> Z >> X from V. Therefore, = TopAfler

Z does not

( V, idom(

Y)),

K
helps extend establish TopAfter Condition (3) in the definition which
A.

lemma we must

of SSA

so as to specify

new variable There is one top of each

corresponds to V just before and just after iteration of the first loop in Figure 12 for stack just before and just ~f ~f V, V, after this where where iteration

each statement A. We consider to define

the

TopBefore( TopAfter

V, A) ( V, A)

i = Top( S ( V ) ) before i = TOP( S( V ) ) after

processing processing

A; A.

to be the last In particular, if A happens TopAfter( V, A) = TopAfler( V, X). If, on the another
LEMMA

statement in block X, then A is followed by other hand, V, B). program V and any

statement

B, then

TopA/ler(

V, A) = TopBefore(

and

10. Consider any control flow the same path in the original program.

path in the transformed Consider any variable

occurrence of a statement A along the path. If A is from then the value of Vjust before executing A in the original as the value program. PRooF. executed is either TopBefore( the original before A. Case of TopBefore( V, A) Just before executing

the original program, program is the same A in the transformed

We use induction

along

the path.

We consider

the

kth

statement

along the path and assume that, for all j < k, the jth statement T not from the original programG or has each variable V agreeing with V, T) just program before and T. We assume show that that the kth statement TopBefore( A is from V, A) just V agrees with

1. Suppose

that

is not

the

first

original

statement

in

its

basic

block Y. hypothesis just after

Let T be the statement just before A in Y. By the induction and the semantics of assignment, V agrees with TopAfter(V, T)
T.

Thus,
Suppose

V agrees
that edge

with

TopBefore(
first by the original control

V, A) just
statement path. We

before
in its

A. basic that block Y,

Case 2.
Let X +

A is the followed

Y be the

claim

V agrees

with TopAfler( V, X) when control flows along this edge. If X = Entry, then TopAfter( V, X) is VO and does hold the entry value of V. If X # Entry, let T be the last statement in X. By the induction hypothesis and the semantics of assignment, V agrees with TopAfter( V, T)

It

could be a nominal

Entry

assignment

or a @function, 13, No, 4, October 1991

ACM Transactions

on Programming

Languages and Systems, Vol

Static Single Assignment Form and the Control Dependence Graph

475

... ?
~ + Y

... +
T t

...

...

@(..., TopAfter(V,X),

. ..) Y

/*

No ~ for

*/

V4K

V 2 ZopAfter(V,

idom(Y))

1
Fig. 13. In the proof of Lemma 10, the node Y mayor may not have a +-function for V.

just

after

and, still

hence, bridge along before

with the X +

TopAfter( gap Y between and A in for V

V, X)

when

control that V V in the

flows agrees agrees

along with with there

x+
We

Y.
must knowing V, X) may not knowing ahead of that A

TopAfter( TopBefore( may or program. Case

V, A) just

doing

Y. As Figure

13 illustrates,

be a @-function

transformed

2.1.

Suppose

that

has

a ~-function

Vi+-

4(.

..)

in

Y.

The

operand corresponding to X + Y is TopAfter ( V, X), thanks to the loop over successors of X in Figure 12. This is the operand whose value is assigned to Vi, which doing is TopBefore( A in Y. that Y. V does not have V, X) = TopAfter( a ~-function V, idom( Y)) in Y. By Lemma 9, V, A). Thus, V agrees with TopBefore( V, A) just before

Case 2.2. V agrees just before

Suppose with doing


3.

TopAfter( A in Any

= TopBefore(

V, A)

K
can be put and then number into minimal SSA form by computin Figure of variables Let avr.gDF 11 in be in the

THEOREM

program frontiers

ing

the dominance

applying

the algorithms

and Figure

12. Let A ~Ot be the total

of assignments

to variables

of mentions resulting program. ~ Let MtOt be the total number the resulting program. Let E be the number of edges in CFG. the weighted the whole average is E + ~ x I DF(X) I + (A,o,
x

(7) of dominance

frontier

sizes. Then

the running

time of

process O

avrgDF)

+ M,O,

)
. to V in the that need

PROOF. Figure 11 places the φ-functions for V at the nodes in the iterated dominance frontier DF⁺(S), where S is the set of nodes with assignments to V in the original program. By Theorem 2, DF⁺(S) is the set of nodes that need φ-functions for V, so we have correctly obtained the φ-functions, with the fewest possible φ-functions. We must still show that the translation to SSA form is done correctly by Figure 12. Condition (1) in the definition of SSA form follows from the renaming; Condition (2) follows from Lemma 8; Condition (3) follows from Lemma 10. Let N be the number of nodes in CFG. The dominator tree has O(N) edges, and Figure 12 runs in O(N + E + M_tot) time. The N + E term is subsumed by the O(E + Σ_X |DF(X)|) cost of computing dominance frontiers, so Figure 12 contributes O(M_tot). As explained at the end of Section 5.1, Figure 11 contributes O(A_tot × avrgDF). □

⁷Each ordinary assignment LHS ← RHS contributes the length of the tuple LHS to A_tot, and each φ-function contributes 1 to A_tot.

ACM Transactions on Programming Languages and Systems, Vol. 13, No. 4, October 1991.
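The placement step that the theorem charges to O(A_tot × avrgDF) can be sketched as a small worklist pass over the dominance frontiers. The Python below is an illustrative sketch, not the paper's Figure 11: it assumes DF is supplied as a dict from node to set, and all names are ours.

```python
def place_phi_functions(df, assign_nodes):
    """Worklist pass in the spirit of Figure 11: a phi-function for a
    variable V is needed at every node of DF+(S), the iterated
    dominance frontier of the set S of nodes that assign to V."""
    has_phi = set()                  # nodes that receive a phi for V
    worklist = list(assign_nodes)    # start from the assignment nodes
    on_list = set(assign_nodes)
    while worklist:
        x = worklist.pop()
        for y in df.get(x, ()):      # every frontier node needs a phi
            if y not in has_phi:
                has_phi.add(y)
                if y not in on_list:     # a phi is itself an assignment,
                    on_list.add(y)       # so its frontier is needed too
                    worklist.append(y)
    return has_phi
```

Each node enters the worklist at most once, so the cost is proportional to the frontier sizes visited, which is the A_tot × avrgDF bound of the theorem.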

6. CONSTRUCTION OF CONTROL DEPENDENCE

In this section we show that control dependences [24] are essentially the dominance frontiers in the reverse graph of the control flow graph. Let X and Y be nodes in CFG. If X appears on every path from Y to Exit, then X postdominates Y.⁸ Like the dominator relation, the postdominator relation is reflexive and transitive. If X postdominates Y but X ≠ Y, then X strictly postdominates Y. The immediate postdominator of Y is the closest strict postdominator of Y on any path from Y to Exit. In a postdominator tree, the children of a node X are all of the nodes immediately postdominated by X. A CFG node Y is control dependent on a CFG node X if both of the following hold:

(1) There is a nonnull path p: X →⁺ Y such that Y postdominates every node after X on p.
(2) The node Y does not strictly postdominate the node X.

In other words, there is some edge from X that definitely causes Y to execute, and there is also some path from X that avoids executing Y. We associate with this control dependence from X to Y the label on the control flow edge from X that causes Y to execute. Our definition of control dependence here can easily be shown to be equivalent to the original definition [24].
LEMMA 11. Let X and Y be CFG nodes. Then Y postdominates a successor of X if and only if (iff) there is a nonnull path p: X →⁺ Y such that Y postdominates every node after X on p.
PROOF. Suppose that Y postdominates a successor U of X. Choose any path q from U to Exit. Then Y appears on q. Let r be the initial segment of q that reaches the first appearance of Y on q. For any node V on r, we can get from U to Exit by following r to V and then by taking any path from V to Exit. Because Y postdominates U but does not appear before the end of r, Y must postdominate V as well. Let p be the path that starts with the edge X → U and proceeds along r. Then Y postdominates every node after X on p.
⁸The postdominance relation in [24] is irreflexive, whereas the definition we use here is reflexive. The two relations are identical on pairs of distinct elements. We choose the reflexive definition here to make postdominance the dual of the dominance relation.
build RCFG
build dominator tree for RCFG
apply the algorithm in Figure 10 to RCFG to find the dominance frontier mapping RDF
for each node X do
    CD(X) ← ∅
end
for each node Y do
    for each X ∈ RDF(Y) do
        CD(X) ← CD(X) ∪ {Y}
    end
end

Fig. 14. Algorithm for computing the set CD(X) of nodes that are control dependent on X.
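Figure 14 transliterates almost directly into executable form. The sketch below is an illustrative Python version under stated assumptions: a naive iterative dominator computation stands in for the efficient method cited as [35], and graphs are encoded as dicts from node to successor set (our encoding, not the paper's).

```python
def dominators(succ, root):
    """All-dominators by iterative set intersection; adequate for small
    examples, not the near-linear method the paper cites as [35]."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for u, vs in succ.items():
        for v in vs:
            pred[v].add(u)
    dom = {n: set(nodes) for n in nodes}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in nodes - {root}:
            ps = [dom[p] for p in pred[n]]
            new = (set.intersection(*ps) if ps else set()) | {n}
            if new != dom[n]:
                dom[n], changed = new, True
    return dom, pred

def dominance_frontiers(succ, root):
    """DF(d) contains y iff d dominates a predecessor of y
    but does not strictly dominate y."""
    dom, pred = dominators(succ, root)
    df = {n: set() for n in succ}
    for y in succ:
        for p in pred[y]:
            for d in dom[p]:
                if d == y or d not in dom[y]:
                    df[d].add(y)
    return df

def control_dependences(succ, exit_node):
    """Figure 14: CD(X) = { Y : X is in DF(Y) computed on RCFG }."""
    rsucc = {n: set() for n in succ}
    for u, vs in succ.items():
        for v in vs:
            rsucc[v].add(u)            # reverse every edge
    rdf = dominance_frontiers(rsucc, exit_node)   # Exit is the RCFG root
    cd = {n: set() for n in succ}
    for y, xs in rdf.items():
        for x in xs:
            cd[x].add(y)
    return cd
```

On a small diamond-shaped CFG with an Entry → Exit edge, this reproduces the expected dependences: the branch node is the sole controller of its two arms, and Entry controls the spine.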

Node    CD(Node)
Entry   1, 2, 8, 9, 11, 12
1
2       3, 6, 7
3       4, 5
4
5
6
7
8
9       10
10
11      9, 11
12      2, 8, 9, 11, 12

Fig. 15. Control dependences of the program in Figure 5.
Conversely, given a path p with these properties, let U be the first node after X on p. Then U is a successor of X and Y postdominates U. □

The reverse control flow graph RCFG has the same nodes as the control flow graph CFG, but has an edge Y → X for each edge X → Y in CFG. The roles of Entry and Exit are also reversed. The postdominator relation on CFG is the dominator relation on RCFG.

COROLLARY 1. Let X and Y be nodes in CFG. Then Y is control dependent on X in CFG iff X ∈ DF(Y) in RCFG.

PROOF. Using Lemma 11 to simplify the first condition in the definition of control dependence, we find that Y is control dependent on X iff Y postdominates a successor of X but does not strictly postdominate X. In RCFG this means that Y dominates a predecessor of X but does not strictly dominate X; that is, X ∈ DF(Y). □

Figure 14 applies this result to the computation of control dependences.
After building the dominator tree for RCFG by the standard method [35] in time O(Eα(E, N)), we spend O(size(RDF)) finding dominance frontiers and then inverting them. The total time is thus O(E + size(RDF)) for all practical purposes. By applying the algorithm in Figure 14 to the control flow graph in Figure 5, we obtain the control dependences in Figure 15. Note that the edge from Entry to Exit was added to CFG so that the control dependence relation, viewed as a graph, would be rooted at Entry.
7. TRANSLATING FROM SSA FORM

Many powerful analysis and transformation techniques can be applied to programs in SSA form. Eventually, however, a program must be executed. The φ-functions have precise semantics, but they are generally not represented in existing target machines. This section describes how to translate out of SSA form by replacing each φ-function with some ordinary assignments. Naive translation could yield inefficient object code, but efficient code can be generated if two already useful optimization techniques are applied: dead code elimination and storage allocation by coloring.

Naively, a k-input φ-function at entrance to a node X can be replaced by k ordinary assignments, one at the end of each control flow predecessor of X. This is always correct, but these ordinary assignments sometimes perform a good deal of useless work. If the naive replacement is preceded by dead code elimination and then followed by coloring, however, the resulting code is efficient.
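The naive replacement can be sketched as follows (illustrative Python over a hypothetical tuple-encoded IR of our own: a block is a list of ('phi', target, args) or ('copy', target, source) statements, with φ arguments listed in predecessor order):

```python
def naive_phi_removal(blocks, preds):
    """Replace each k-input phi 'v <- phi(a1,...,ak)' at the entry of
    block X by k ordinary copies 'v <- aj', one appended at the end of
    the j-th control flow predecessor of X."""
    for x in list(blocks):
        phis = [s for s in blocks[x] if s[0] == 'phi']
        blocks[x] = [s for s in blocks[x] if s[0] != 'phi']
        for _, target, args in phis:
            for arg, p in zip(args, preds[x]):
                blocks[p].append(('copy', target, arg))
    return blocks
```

A production translator would also have to split critical edges and place the copies before a predecessor's terminating branch; the sketch ignores both concerns.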

7.1 Dead Code Elimination

The original source program may have dead code (i.e., code that has no effect on any program output). Some of the intermediate steps in compilation (such as procedure integration) may also introduce dead code. Code that once was live may become dead in the course of optimization. With so many possible sources for dead code, it is natural to perform (or repeat) dead code elimination late in the optimization process, rather than burden many intermediate steps with concerns about dead code.
Translation to SSA form is one of the compilation steps that may introduce dead code. Suppose that V is assigned and then used along each branch of an if...then...else..., but that V is never used after the join point. The original assignments to V are live, but the added assignment by the φ-function is dead. Often, such dead φ-functions are useful, as in the equivalencing and redundancy elimination algorithms that are based on SSA form [5, 43]. One such use is shown in Figure 16. Although others have avoided placement of dead φ-functions in translating to SSA form [16, 52], we prefer to include the dead φ-functions to increase optimization opportunities.

There are many different definitions of dead code in the literature. Dead code is sometimes defined to be unreachable code and sometimes defined (as it is here) to be ineffectual code. In both cases, it is desirable to use the broadest possible definition, subject to the correctness condition that dead code really can be safely removed.⁹ A procedural version of the definition is

⁹The definition used here is broader than the usual one [1, p. 595] and similar to that of faint variables [25, p. 489].
    if P1                            if P1
        then do                          then do
            Y1 ← 1                           Y1 ← 1
            use of Y1                        use of Y1
        end                              end
        else do                          else do
            Y2 ← X1                          Y2 ← X1
            use of Y2                        use of Y2
        end                              end
    Y3 ← φ(Y1, Y2)                   Y3 ← φ(Y1, Y2)
    ...                              ...
    if P1                            use of Y3
        then Z1 ← 1                  use of Y3
        else Z2 ← X1
    Z3 ← φ(Z1, Z2)
    use of Z3
    use of Y3

Fig. 16. On the left is an unoptimized program containing a dead φ-function that assigns to Y3. The value numbering technique in [5] can determine that Y3 and Z3 have the same value. Thus, Z3 and many of the computations that produce it can be eliminated. The dead φ-function is brought to life by using Y3 in place of Z3.
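The equivalence exploited in Figure 16 can be found mechanically by simple value numbering in the spirit of [5]: hash each right-hand side over the value numbers of its operands, keying φ-functions by their join point so that congruent φ-functions at the same join share a number. The Python sketch below uses an illustrative three-tuple statement encoding of our own, not the paper's representation.

```python
from itertools import count

def value_numbers(stmts):
    """Assign a value number to each SSA definition by hashing the
    operator together with the value numbers of its operands."""
    vn, table, fresh = {}, {}, count()
    for target, op, args in stmts:
        # Unknown operands (e.g., inputs like X1) number as themselves.
        key = (op, tuple(vn.get(a, a) for a in args))
        if key not in table:
            table[key] = next(fresh)
        vn[target] = table[key]
    return vn
```

Running this on the left-hand program of Figure 16 gives Y3 and Z3 the same number, so uses of Z3 may be rewritten to Y3.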

more intuitive than a recursive version, so we prefer the procedural style. Initially, all statements are tentatively marked dead. Some of the statements, however, need to be marked live because of the conditions listed below. Marking these statements live may cause others to be marked live. When the worklist eventually empties, any statements that are still marked dead are truly dead and can be safely removed from the code. A statement is marked live iff at least one of the following holds:

(1) The statement is one that may be assumed to affect program output, such as an I/O statement, an assignment to a reference parameter, or a call to a routine that may have side effects.
(2) The statement is an assignment statement, and there are statements already marked live that use some of its outputs.
(3) The statement is a conditional branch, and there are statements already marked live that are control dependent on this conditional branch.

Several published algorithms eliminate dead code in the narrower sense that requires every conditional branch to be marked live [30, 31, 38].¹⁰ Our algorithm, given in Figure 17, goes one step further in eliminating conditional branches. (An unpublished algorithm by R. Paige does this also.)
¹⁰The more readily accessible [31] contains a typographical error; the earlier technical report [30] should be consulted for the correct version.
for each statement S do
    if S ∈ PreLive
        then Live(S) ← true
        else Live(S) ← false
end
WorkList ← PreLive
while (WorkList ≠ ∅) do
    take S from WorkList
    for each D ∈ Definers(S) do
        if Live(D) = false
            then do
                Live(D) ← true
                WorkList ← WorkList ∪ {D}
            end
    end
    for each block B in CD⁻¹(Block(S)) do
        if Live(Last(B)) = false
            then do
                Live(Last(B)) ← true
                WorkList ← WorkList ∪ {Last(B)}
            end
    end
end
for each statement S do
    if Live(S) = false
        then delete S from Block(S)
end

Fig. 17. Dead code elimination.
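As a cross-check of the marking rules, the algorithm can be phrased over plain lookup tables (an illustrative Python sketch; the table names mirror the legend that follows, but the encoding is an assumption of ours):

```python
def dead_code_elimination(stmts, pre_live, definers, block_of, last, cd_inv):
    """Worklist pass in the style of Figure 17: a statement is live if
    it is in PreLive, if it feeds a live statement, or if it is the
    terminating branch Last(B) of a block B on which a live
    statement's block is control dependent."""
    live = {s: (s in pre_live) for s in stmts}
    worklist = list(pre_live)
    while worklist:
        s = worklist.pop()
        needed = set(definers.get(s, ()))          # condition (2)
        for b in cd_inv.get(block_of[s], ()):      # condition (3)
            needed.add(last[b])
        for d in needed:
            if not live[d]:
                live[d] = True
                worklist.append(d)
    return [s for s in stmts if live[s]]
```

Note that a statement that depends only on itself (a self-feeding update) never enters the worklist and is correctly deleted, exactly as the text below observes for condition (2).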

The following data structures are used:

- Live(S) indicates that statement S is live.
- PreLive is the set of statements whose execution is initially assumed to affect program output. Statements that result in I/O or side effects outside the current procedure scope are typically included.
- WorkList is a list of statements whose liveness has been recently discovered.
- Definers(S) is the set of statements that provide values used by statement S.
- Block(S) is the basic block containing statement S.
- Last(B) is the statement that terminates basic block B.
- ipdom(B) is the basic block that immediately postdominates block B.
- CD⁻¹(B) is the set of nodes that are control dependence predecessors of the node corresponding to block B. This is the same as RDF(B) in Figure 14.

After the algorithm in Figure 17 discovers all of the live statements, those still not marked live are deleted.¹¹ Other optimizations (such as code motion) may leave empty blocks, so there is already ample reason for a late optimization to remove empty blocks; the empty blocks left by dead code elimination can be removed along with any other empty blocks. The fact that statements are considered dead until marked live is crucial for condition (2): statements that depend (transitively) on themselves are never marked live unless required by some other live statement. Condition (3) is handled by the loop over CD⁻¹(Block(S)); a basic block whose termination controls a block with live statements is itself live.

7.2 Allocation by Coloring
At first, it might seem possible simply to map all occurrences of Vi back to V and to delete all of the φ-functions. However, the new variables introduced by translation to SSA form cannot always be eliminated, because optimization may have capitalized on the storage independence of the new variables. The persistence of the new variables can be illustrated with the code motion example in Figure 18. The source program (Figure 18a) assigns to V twice and uses it twice. The assignments serve separate purposes, and translation to SSA form (Figure 18b) introduces the separate variables V1 and V2. Code motion can then move the invariant assignment to V2 out of the loop (Figure 18c), and the dead assignment to V3 will be eliminated. The optimized program has a region in which V1 and V2 are simultaneously live. Thus, both variables are required: The original variable V cannot substitute for both renamed variables.

Any graph coloring algorithm [12, 13, 17, 18, 21] can be used to reduce the number of variables needed and thereby can remove most of the associated assignment statements. The choice of coloring technique should be guided by the eventual use of the output. If the goal is to produce readable source code, then it is desirable to consider each original variable V separately, coloring just the SSA variables derived from V. If the goal is machine code, then all of the SSA variables should be considered at once. In both cases, the process of coloring changes most of the assignments that were inserted to model φ-functions into identity assignments, that is, assignments of the form V ← V. These identity assignments can all be deleted. Storage savings are especially noticeable for arrays. If optimization does not perturb the order of the first two statements in Figure 7, then arrays A1 and A2 can be assigned the same color and, hence, can share the same storage. The array A2 is then assigned an Update from an identically colored array. Such operations can be implemented inexpensively by

¹¹A conditional branch can be deleted by transforming it to an unconditional branch to any one of its prior targets.

(a)
    while (...) do
        read V
        W ← V + W
        V ← 6
        W ← V + W
    end

(b)
    while (...) do
        W3 ← φ(W0, W2)
        V3 ← φ(V0, V2)
        read V1
        W1 ← V1 + W3
        V2 ← 6
        W2 ← V2 + W1
    end

(c)
    V2 ← 6
    while (...) do
        W3 ← φ(W0, W2)
        read V1
        W1 ← V1 + W3
        W2 ← V2 + W1
    end

Fig. 18. Program that really uses two instances for a variable. (a) Source program; (b) unoptimized SSA form; (c) result of code motion.

assigning to just one component if the arrays share storage. In particular, the actual operation performed by HiddenUpdate is always of this form.
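The effect of coloring on the copies inserted for φ-functions can be illustrated in a few lines (Python sketch; the coloring itself is assumed to be given, e.g., by one of the algorithms cited above, and is represented here as a plain dict from SSA name to storage color):

```python
def rewrite_copies(copies, color):
    """Rewrite each copy 'dst <- src' through the coloring; identity
    assignments (same storage on both sides) are deleted."""
    out = []
    for dst, src in copies:
        d, s = color[dst], color[src]
        if d != s:               # keep only copies that move data
            out.append((d, s))
    return out
```

When all SSA variables derived from V receive the same color, every inserted copy becomes V ← V and disappears; a coloring that must keep two instances apart (as in Figure 18) leaves a genuine copy behind.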

8. ANALYSIS AND MEASUREMENTS

The number of nodes that contain φ-functions for a variable V is a function of the program control flow structure and the assignments to V. Program structure alone determines dominance frontiers and the number of control dependences. It is possible that dominance frontiers may be larger than necessary for computing φ-function locations for some programs, since the actual assignments are not taken into account. In this section we prove that the size of the dominance frontiers is linear in the size of the program when control flow branching is restricted to if-then-else constructs and while-do loops. (We assume that predicates and expressions perform no internal branching.) Such programs can be described by the grammar given in Figure 19. We also give experimental results that suggest linear behavior for actual programs.

THEOREM 4. For programs comprised of straight-line code, if-then-else, and while-do constructs, the dominance frontier of any CFG node contains at most two nodes.

PROOF. Consider a top-down parse of a program using the grammar shown in Figure 19. Initially, we have a single node <program> in the parse tree and a control flow graph CFG with two nodes and one edge: Entry to Exit. The initial dominance frontiers are DF(Entry) = ∅ = DF(Exit). For each production, we consider the associated changes to CFG and to the dominance frontiers of nodes. When a production expands a nonterminal parse tree node S, a new subgraph is inserted into CFG in place of S. In this new subgraph, each node corresponds to a symbol on the right-hand side of the production. We will show that applying any of the productions preserves the following invariants:

- Each CFG node S corresponding to an unexpanded <statement> symbol has at most one node in its dominance frontier.
<program> ::= <statement>                                          (1)
<statement> ::= <statement> <statement>                            (2)
<statement> ::= if <predicate> then <statement> else <statement>   (3)
<statement> ::= while <predicate> do <statement>                   (4)
<statement> ::= <variable> ← <expression>                          (5)

Fig. 19. Grammar for control structures.
- Each CFG node T corresponding to a terminal symbol has at most two nodes in its dominance frontier.

We consider the productions in turn.

(1) This production adds a CFG node S and CFG edges Entry → S → Exit, yielding DF(S) = {Exit}.

(2) When this production is applied, S is replaced by two nodes S1 and S2. Edges previously entering S now enter S1, and edges previously leaving S now leave S2. A single edge is inserted from S1 to S2. Although the control flow graph has changed, consider how this production affects the dominator tree: Nodes S1 and S2 dominate all nodes that were previously dominated by S; additionally, S1 dominates S2. Thus, we have DF(S1) = DF(S) and DF(S2) = DF(S).

(3) When this production is applied, a CFG node S is replaced by nodes Tif, Sthen, Selse, and Tendif. Edges that previously entered S now enter Tif, and edges that previously left S now leave Tendif. Edges are inserted from Tif to both Sthen and Selse; edges are also inserted from Sthen and Selse to Tendif. In the dominator tree, Tif and Tendif both dominate all nodes that were dominated by S; additionally, Tif dominates Sthen and Selse. By the argument made for production (2), we have DF(Tif) = DF(S) = DF(Tendif). Now consider nodes Sthen and Selse. From the definition of a dominance frontier, we obtain DF(Sthen) = DF(Selse) = {Tendif}.

(4) When this production is applied, a CFG node S is replaced by nodes Twhile and Sdo. All edges previously associated with node S are now associated with node Twhile. Edges are inserted from Twhile to Sdo and from Sdo to Twhile. Node Twhile dominates all nodes that were dominated by node S; additionally, Twhile dominates Sdo. Thus, we have DF(Twhile) = DF(S) ∪ {Twhile} and DF(Sdo) = {Twhile}.

(5) After application of this production, the new control flow graph is isomorphic to the old graph. □
COROLLARY 2. For programs comprised of straight-line code, if-then-else, and while-do constructs, every node is control dependent on at most two nodes.
PROOF. Consider a program

P composed CFG.

of the allowed control

constructs graph

and its R(CFG is

associated

control
ACM

flow

graph

The reverse
Languages

flow
Vol.

Transactions

on Programmmg

and Systems,

13, No. 4~October

1991.

484

Ron Cytron et al.


Table I. Summary Statistics of Our Experiment

Package name | Statements in all procedures | Statements per procedure (Min / Median / Max) | Description
EISPACK      |  7,034                       | 22 /  89 / 327                                | Dense matrix eigenvectors and values
FL052        |  2,054                       |  9 /  54 / 351                                | Flow past an airfoil
SPICE        | 14,093                       |  8 /  43 / 753                                | Circuit simulation
Totals       | 23,181                       |  8 /  55 / 753                                | 221 FORTRAN procedures
The reverse control flow graph RCFG is then itself a structured control flow graph for some program. For all Y in RCFG, DF(Y) contains at most two nodes by Theorem 4. By Corollary 1, Y is control dependent on at most two nodes. □
Unfortunately, the linearity results do not hold for all program structures. In particular, consider the nest of repeat-until loops illustrated in Figure 5: the dominance frontier of each loop entrance includes each of the entrances to surrounding loops. For n nested loops, this leads to a dominance frontier mapping whose total size is O(n²), yet the mapping is used in placing at most O(n) φ-functions for each variable. Most of the dominance frontier computation is thus not actually used in placing φ-functions, so it seems that computing dominance frontiers might take excessive time with respect to the resulting number of actual φ-functions. We therefore wish to measure the number of dominance frontier nodes as a function of program size over a diverse set of programs.

We implemented our algorithms for constructing dominance frontiers and placing φ-functions in the PTRAN system, which already offered the required control flow and data flow analysis [2]. We ran these algorithms on 61 procedures from our local EISPACK library [46] and on 160 procedures from two Perfect benchmarks [39]. Some summary statistics of these procedures are shown in Table I. These FORTRAN programs were chosen because they contain irreducible intervals and other unstructured constructs. As the plot in Figure 20 shows, the size of the dominance frontier mapping appears to vary linearly with program size. The ratio of these sizes ranged from 0.6 (the Entry node has an empty dominance frontier) to 2.1. For the programs we tested, the plot in Figure 21 shows that the number of φ-functions is also linear in the size of the original program. The ratio of

these sizes ranged from 0.5 to 5.2. The largest ratio occurred for a procedure of only 12 statements, and 95 percent of the procedures had a ratio under 2.3. All but one of the remaining procedures contained fewer than 60 statements. Finally, the plot in Figure 22 shows that the size of the control dependence graph is linear in the size of the original program. The ratio of these sizes ranged from 0.6 to 2.4, which is very close to the range of ratios for dominance frontiers. The ratio avrgDF (defined by (7) in Section 5.1) measures the cost of placing φ-functions relative to the number of assignments in the resulting
Fig. 20. Size of dominance frontier mapping versus number of program statements.

Fig. 21. Number of φ-functions versus number of program statements.
SSA form program. no correlation with

This ratio varied program size.

from

1 to 2, with

median

1.3. There

was

We also measured the expansion A,., / A.... in the number of assignments when translating to SSA form. This ratio varied from 1.3 to 3.8. Finally, we measured the expansion MtOt / MO,,~ in the number of mentions (assignments
ACM Transactions on Programming Languages and Systems, Vol. 13, No. 4, October 1991.

486
160 150 1400 130 120 110 100 900 800 700 ! 600 500 400 300 1 200 loo 0

Ron Cytron et al.

* ** *

* *******

** * * **

**

**2** $*

?@% ****
I
200

0
Fig. 22.

100

1
300 400

I
500 600 700

I
800

Size of control

dependence graph versus number

of program

statements.

or uses) of variables when translating to SSA form. This ratio varied from 1.6 to 6.2. For both of these ratios, there was no correlation with program size.

9. DISCUSSION

9.1 Summary of Algorithms and Time Bounds

The conversion to SSA form is done in three steps:

(1) The dominance frontier mapping is constructed from the control flow graph CFG (Section 4.2). Let CFG have N nodes and E edges, and let DF be the mapping from nodes to their dominance frontiers. The time to compute the dominator tree and then the dominance frontiers in CFG is O(E + Σ_X |DF(X)|).

(2) Using the dominance frontiers, the locations of the φ-functions for each variable in the original program are determined (Section 5.1). Let A_tot be the total number of assignments to variables in the resulting program, where each ordinary assignment statement LHS ← RHS contributes the length of the tuple LHS to A_tot, and each φ-function contributes 1 to A_tot. Placing φ-functions contributes O(A_tot × avrgDF) to the overall time, where avrgDF is the weighted average (7) of the sizes |DF(X)|.

(3) The variables are renamed (Section 5.2). Let M_tot be the total number of mentions of variables in the resulting program. Renaming contributes O(M_tot) to the overall time.

To state the time bounds in terms of fewer parameters, let the overall size R of the original program be the maximum of the relevant numbers: N
nodes, E edges, A_orig original assignments to variables, and M_orig original mentions of variables. In the worst case, avrgDF = Ω(N) = Ω(R), and translation can require Ω(R²) insertions of φ-functions. Thus, A_tot = Ω(R²) at worst. In the worst case, a φ-function has Ω(R) operands. Thus, M_tot = Ω(R³) at worst. The one-parameter worst-case time bounds are thus O(R²) for finding dominance frontiers and O(R³) for translation to SSA form. However, the data in Section 8 suggest that the entire translation to SSA form will be linear in practice. The dominance frontier of each node in CFG is small, as is the number of φ-functions added for each variable. In effect, avrgDF is constant, A_tot = O(A_orig), and M_tot = O(M_orig). The entire translation process is effectively O(R).

Control dependences are read off of the dominance frontiers in the reverse graph RCFG (Section 6) in time O(E + size(RDF)). Since the size of RDF is the size of the output from the control dependence calculation, this algorithm is linear in the size of the output. The only quadratic behavior is caused by the output being Ω(R²) in the worst case. The data in Section 8 suggest that the control dependence calculation is effectively O(R).

9.2 Related Work

Minimal SSA form is a refinement of Shapiro and Saint's [45] notion of a pseudoassignment. The pseudoassignment nodes for V are exactly the nodes that need φ-functions for V. A closer precursor [22] of SSA form associated new names for V with pseudoassignment nodes and inserted assignments from one new name to another. Without explicit φ-functions, however, it was difficult to manage the new names or reason about the flow of values. Suppose the control flow graph CFG has N nodes and E edges for a program with Q variables. One algorithm [41] requires O(Eα(E, N)) bit vector operations (where each vector is of length Q) to find all of the pseudoassignments. A simpler algorithm [43] for reducible programs computes SSA form in time O(E × Q). With lengths of bit vectors taken into account, both of these algorithms are essentially O(R²) on programs of size R, and the simpler algorithm sometimes inserts extraneous φ-functions. The method presented here is O(R³) at worst, but Section 8 gives evidence that it is O(R) in practice. The earlier O(R²) algorithms have no provision for running faster in typical cases; they appear to be intrinsically quadratic.

For CFG with N nodes and E edges, previous general control dependence algorithms [24] can take time quadratic in (N + E). This analysis is based on the worst-case O(N) depth of the (post)dominator tree [24, p. 326]. Section 6 shows that control dependences can be determined by computing dominance frontiers in the reverse graph RCFG. In general, our approach can also take quadratic time, but the only quadratic behavior is caused by the output being Ω(N²) in the worst case. In particular, suppose a program is comprised only of straight-line code, if-then-else, and while-do constructs. By Corollary 2, our algorithm computes control dependences in linear time. We obtain a better time bound for such programs because our algorithm is based on dominance frontiers, whose sizes are not necessarily related to the depth of the dominator tree. For languages that offer only these constructs, control dependences can also be computed from the parse tree [28] in linear time, but our algorithm is more robust. It handles all cases in quadratic time and typical cases in linear time.

9.3 Conclusions

Previous work has shown that SSA form and control dependences can support powerful code optimization algorithms that are highly efficient in terms of time and space bounds based on the size of the program after translation to the forms. We have shown that this translation can be performed efficiently, that it leads to only a moderate increase in program size, and that applying the early steps in the SSA translation to the reverse graph is an efficient way to compute control dependences. This is strong evidence that SSA form and control dependences form a practical basis for optimization.

ACKNOWLEDGMENTS

We would like to thank Fran Allen for early encouragement and Fran Allen, Trina Avery, Julian Padget, Bob Paige, Tom Reps, Randy Scarborough, and Jin-Fan Shaw for various helpful comments. We are especially grateful to the referees, whose thorough reports led to many improvements in this paper.

REFERENCES

1. AHO, A. V., SETHI, R., AND ULLMAN, J. D. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, Mass., 1986.
2. ALLEN, F. E., BURKE, M., CHARLES, P., CYTRON, R., AND FERRANTE, J. An overview of the PTRAN analysis system for multiprocessing. J. Parallel Distrib. Comput. 5 (Oct. 1988), 617-640.
3. ALLEN, J. R. Dependence analysis for subscripted variables and its application to program transformations. Ph.D. thesis, Department of Computer Science, Rice Univ., Houston, Tex., Apr. 1983.
4. ALLEN, J. R., AND JOHNSON, S. Compiling C for vectorization, parallelization and inline expansion. In Proceedings of the SIGPLAN '88 Symposium on Compiler Construction. SIGPLAN Not. (ACM) 23, 7 (June 1988), 241-249.
5. ALPERN, B., WEGMAN, M. N., AND ZADECK, F. K. Detecting equality of values in programs. In Conference Record of the 15th ACM Symposium on Principles of Programming Languages (Jan. 1988). ACM, New York, pp. 1-11.
6. BALLANCE, R. A., MACCABE, A. B., AND OTTENSTEIN, K. J. The program dependence web: A representation supporting control-, data-, and demand-driven interpretation of imperative languages. In Proceedings of the SIGPLAN '90 Symposium on Compiler Construction. SIGPLAN Not. (ACM) 25, 6 (June 1990), 257-271.
7. BANNING, J. B. An efficient way to find the side effects of procedure calls and the aliases of variables. In Conference Record of the 6th ACM Symposium on Principles of Programming Languages (Jan. 1979). ACM, New York, pp. 29-41.
8. BARTH, J. M. An interprocedural data flow analysis algorithm. In Conference Record of the 4th ACM Symposium on Principles of Programming Languages (Jan. 1977). ACM, New York, pp. 119-131.
9. BURKE, M. An interval-based approach to exhaustive and incremental interprocedural data flow analysis. ACM Trans. Program. Lang. Syst. 12, 3 (July 1990), 341-395.
10. BURKE, M., AND CYTRON, R. Interprocedural dependence analysis and parallelization. In Proceedings of the SIGPLAN '86 Symposium on Compiler Construction. SIGPLAN Not. (ACM) 21, 7 (June 1986), 162-175.
11. CARTWRIGHT, R., AND FELLEISEN, M. The semantics of program dependence. In Proceedings of the SIGPLAN '89 Symposium on Compiler Construction. SIGPLAN Not. (ACM) 24, 7 (July 1989), 13-27.
12. CHAITIN, G. J. Register allocation and spilling via graph coloring. In Proceedings of the SIGPLAN '82 Symposium on Compiler Construction. SIGPLAN Not. (ACM) 17, 6 (June 1982), 98-105.
13. CHAITIN, G. J., AUSLANDER, M. A., CHANDRA, A. K., COCKE, J., HOPKINS, M. E., AND MARKSTEIN, P. W. Register allocation via coloring. Comput. Lang. 6 (1981), 47-57.
14. CHASE, D. R. Safety considerations for storage allocation optimizations. In Proceedings of the SIGPLAN '88 Symposium on Compiler Construction. SIGPLAN Not. (ACM) 23, 7 (June 1988), 1-10.
15. CHASE, D. R., WEGMAN, M., AND ZADECK, F. K. Analysis of pointers and structures. In Proceedings of the SIGPLAN '90 Symposium on Compiler Construction. SIGPLAN Not. (ACM) 25, 6 (June 1990), 296-310.
16. CHOI, J., CYTRON, R., AND FERRANTE, J. Automatic construction of sparse data flow evaluation graphs. In Conference Record of the 18th ACM Symposium on Principles of Programming Languages (Jan. 1991). ACM, New York, pp. 55-66.
17. CHOW, F. C. A portable machine-independent global optimizer: Design and measurements. Ph.D. thesis and Tech. Rep. 83-254, Computer Systems Laboratory, Stanford Univ., Stanford, Calif., Dec. 1983.
18. CHOW, F. C., AND HENNESSY, J. L. The priority-based coloring approach to register allocation. ACM Trans. Program. Lang. Syst. 12, 4 (Oct. 1990), 501-536.
19. COOPER, K. D. Interprocedural data flow analysis in a programming environment. Ph.D. thesis, Dept. of Mathematical Sciences, Rice Univ., Houston, Tex., 1983.
20. CYTRON, R., AND FERRANTE, J. An improved control dependence algorithm. Tech. Rep. RC 13291, IBM Corp., Armonk, N.Y., 1987.
21. CYTRON, R., AND FERRANTE, J. What's in a name? In Proceedings of the 1987 International Conference on Parallel Processing (Aug. 1987), pp. 19-27.
22. CYTRON, R., LOWRY, A., AND ZADECK, F. K. Code motion of control structures in high-level languages. In Conference Record of the 13th ACM Symposium on Principles of Programming Languages (Jan. 1986). ACM, New York, pp. 70-85.
23. DENNIS, J. B. First version of a data flow procedure language. Tech. Rep. Comput. Struct. Group Memo 93 (MAC Tech. Memo 61), MIT, Cambridge, Mass., May 1975.
24. FERRANTE, J., OTTENSTEIN, K. J., AND WARREN, J. D. The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst. 9, 3 (July 1987), 319-349.
25. GIEGERICH, R. A formal framework for the derivation of machine-specific optimizers. ACM Trans. Program. Lang. Syst. 5, 3 (July 1983), 478-498.
26. HAREL, D. A linear time algorithm for finding dominators in flow graphs and related problems. In Proceedings of the 17th ACM Symposium on Theory of Computing (May 1985).

22, Whats in a name? In Proceedings of the 1987 International (Aug. 1987), pp. 19-27. CYTRON, R., LOWRY, A., AND ZADECK, F. K. Code motion of control structures in high-level on Principles of Programming languages. In Conference Record of the 13th ACM Symposium Languages (Jan. 1986). ACM, New York, pp. 70-85. DENNIS, J. B. First version of a data flow procedure language. Tech. Rep. Comput. Strut. Group Memo 93 (MAC Tech. Memo 61), MIT, Cambridge, Mass., May 1975. FERRANTE, J., OTTENSTEIN, K. J., AND WARREN, J. D. The program dependence gmph and ACM Trans. Program. Lang. Syst. 9, 3 (July 1987), 319 349. its use in optimization. ACM GIEGERICH, R. A formal framework for the derivation of machine-specific optimizers. Trans. Program. Lang. Syst. 5, 3 (July 1983), 478-498. HAREL, D. A linear time algorithm for finding dominators in flow graphs and related of the 17th ACM Symposium on Theory of Computing (May 1985). problems. In Proceedings
Conference on Parallel Processing CYTRON, R., AND FERRANTE, J.

23. 24. 25. 26.

ACM, New York, pp. 185-194. 27. HORWITZ, S., PFEIFFER, P., AND REPS, T.
Proceedings of the SIGPLAN 89 Symposium

Dependence
on

analysis

for pointer

variables.
SIGPLf~N

In
Not.

Compiler

Construction.

(ACM) 24, 7 (June 1989). 28. HORWITZ, S., PRINS, J., AND REPS, T.
ACM Trans. Program. Lang. Syst.

29.

30. 31. 32. 33.

Integrating non-interfering versions of programs. 1989), 345-387. JONES, N. D., AND MUCHNICK, S. S. Flow analysis and optimization of lLISP-like structures. Flow Analysis, S. S. Muchnick and N, D. Jones, Eds. Prentice-Hall, Englewood In Program Cliffs, N. J., 1981, chap. 4, pp. 102-131. KENNEDY, K. W. Global dead computation elimination. Tech. Rep. SETL Newsl. 111, Courant Institute of Mathematical Sciences, New York Univ., New York, N. Y., Aug. 1973. In Program Flow Analysis, KENNEDY, K. W. A survey of data flow analysis techniques. S. S. Muchnick and N. D. Jones, Eds. Prentice-Hall, Englewood Cliffs, N. J., 1981. of Computers and Computations. Wiley, New York, 1978. KUCK, D. J. The Structure LARUS, J. R. Restructuring symbolic programs for concurrent execution on multiprocessors. at Berkeley, Tech. Rep. UCB/CSD 89/502, Computer Science Dept., Univ. of California Berkeley, Calif., May 1989.
11, 3 (July

ACM Transactions

on Proq-amming

Languages and Systems, Vol. 13, No. 4, October 1991.

490
34

.
LARUS, J
Proceedings

Ron Cytron et al R., AND HILFINGER, P. N


of the ACM SIGPLAN

Detecting

conflicts

between

structure

accesses
SIGPLAN

In
Not.

88 Sympostum

on Compder

Construction.

23, 7 (July 1988), 21-34, 35. LENGAUER, T., AND TARJAN, R. E. ACM Trans. Program. Lang. Syst.

36. 37

38. 39. 40 41. 42. 43.

44.

A fast algorithm for finding dominators m a flowgraph. 1 (July 1979), 121-141 Prentice-Hall, EngleProgram Flow Analysis. MUCHNICK, S. S., AND JONES, N. D , EDS wood Cliffs, NJ., 1981 Record of the MYEKS, E. W. A precise interprocedural data flow algorithm, In Conference 8th ACM Symposium on Principles of Programming Languages (Jan. 1981). ACM, New York, pp. 219-230. OTTENSTEIN, K. J. Data-flow graphs as an intermediate form. Ph.D. thesis, Dept. of Computer Science, Purdue Univ., W. Lafayette, Ind., Aug. 1978. POINTER, L. Perfect report: 1. Tech. Rep CSRD 896, Center for Supercomputing Research and Development, Univ. of Illinois at Urbana-Champaign, Urbana, 111,, July 1989 REIF, J. H., AND LEWIS, H. R. Efficient symbolic analysis of programs. J. Comput. Syst, SCZ. 32, 3 (June 1986), 280-313. REIF, J. H., AND TARJAN, R. E Symbolic program analysis in almost linear time. SIAM J. Comput. 11, 1 (Feb. 1982), 81-93 ROSEN, B. K. Data flow analysis for procedural languages J. ACM 26, 2 (Apr. 1979), 322-344. ROSEN, B. K , WEGMAN, M. N., AND ZADECK, F. K. Global value numbers and redundant Record of the 15th ACM Symposui m on Przn ciples of Programcomputations. In Conference ming Languages, (Jan. 1988). ACM, New York, pp. 12-27, RUGGIERI, C., AND MURTAGH, T. P. Lifetime analysis of dynamically allocated objects. In
1, Conference Record of the 15th ACM Symposzum on Principles of Programming Languages

(Jan. 1988). ACM, New York, pp. 285-293. The representation of algorithms. Tech. Rep, CA-7002-1432, 45. SHAPIRO, R. M , AND SAINT, H Massachusetts Computer Associates, Feb. 1970. 46. SMITH, B. T,, BOYLE, J. M., DONGARRA, J. J,, GARBOW, B, S., IIIEBE, Y., KLEMA, V. C,, AND MOLER, C B. Matrzx Eigensystem Routines Eispack Guide. Springer-Verlag, New York, 1976. 3, 1 (1974), 62-89, 47 TARJAN, R. E. Finding dominators in directed graphs. SIAM J. Comput Property extraction in well-founded property sets. IEEE Trans. Softw. Eng. 48. WEGBREIT, B SE-1, 3 (Sept. 1975), 270-285. 49. WEGMAN, M. N , AND ZADECK, F, K. Constant propagation with conditional branches. In
Conference Record of the 12th ACM Symposium on Prlnczples of Programm mg Languages

(Jan.). ACM, New York, pp. 291-299. 50. WEGMAN, M. N., AND ZADECK, F. K. Constant propagation with conditional branches. ACM Trans. Program. Lang. Syst. To be published. 51 WOLFE, M. J. Optimizing supercompilers for supercomputers. Ph.D, thesis, Dept of Computer Science, Univ. of Illinois at Urbana-Champaign, Urbana 111., 1982 52. YANG, W., HORWITZ, S., AND REPS, T. Detecting program components with equivalent behaviors. Tech Rep. 840, Dept. of Computer Science, Univ. of Wisconsin at Madison, Madison, Apr. 1989 Received July 1989; revised March 1991; accepted March 1991

ACM

TransactIons

on Programming

Languages

and Systems,

Vol

13, No

4, October

1991
