Download as pdf or txt
Download as pdf or txt
You are on page 1of 12

INFORMATION AND CONTROL 3, 179-190 (1960)

A Note on the Application of Graph Theory


to Digital Computer Programming*
RICHARD M. KARP
International Business Machines Corporation Research
Center, Yorktown Heights, New York

A graph-theoretic model for the description of flowcharts and pro-


grams is defined. It is shown that properties of directed graphs and
the associated connection matrices can be used to detect errors and
eliminate redundancies in programs. These properties are also used
in the synthesis of composite programs. Finally, the model is ex-
panded to take into account frequencies of execution of portions of a
program, and a problem concerning optimum arrangement of a pro-
gram in storage is solved.

LIST OF SYMBOLS
XI~ X2~ Operational elements
. .. ~ X~
a, ~, % . . . Decision elements
mij E l e m e n t of a c o n n e c t i o n m a t r i x
C = {ci} Set of i n s t r u c t i o n s d e p e n d e n t on a g i v e n condi-
tion
D = {d~.} Set of i n s t r u c t i o n s which cause a decision n o t to
hold
S =- {sk} Set of i n s t r u c t i o n s which set u p a c o n d i t i o n
e, e' A preset b i n a r y v a r i a b l e a n d its n e g a t i o n
P 1 , P~ A pair of p r o g r a m s to be c o m b i n e d
M 1 , M2 M a t r i c e s associated w i t h P1 a n d P2
M C o n n e c t i o n m a t r i x describing a composite pro-
gram
* This paper is derived from a dissertation submitted in partial fulfillment of
the requirements for the Ph.D. degree in applied mathematics at Harvard Uni-
versity, Cambridge, Massachusetts. This work was partially supported by Bell
Telephone Laboratories, Inc. through a contractual agreement with Harvard Uni-
versity.
179
180 ic~Rv

M S Revised form of M
p~. Conditional probability that Xj will follow X~-
N A vector of expected frequencies of elements
Ni The ith element of N
P The matrix (Pc')
Ti The execution time of X~
Ccy N~(1 - plj)
Xij One if X~- follows X~ in storage, zero otherwise

I. INTRODUCTION
In any discussion of a digital computer program, reference is inevitably
made to the "pattern of control" of the program, and to the "paths"
and "loops" which may be traversed in the course of the calculation.
The use of such descriptive terms indicates ~ widespread awareness of
the relevance to programming of the concepts of graph theory. It is
therefore surprising that these concepts have rarely been applied ana-
lytically to programs.
The desirability of such analysis was recognized as early as 1947,
when Goldstine and yon Neumann suggested the development of a
formal logic of programs, based on the flow diagram. Since that time,
however, progress in this area has been slow. A few writers (Blum, 1956;
Perlis and Smith, 1957; Janov, 1957a, b; and Voorhees, 1958) have sug-
gested interesting notations for describing the manner in which the vari-
ous sections of a program are connected together. The work of Janov
also contains some results on the equivalence transformations of pro-
grams.
In spite of the contributions of these writers, there remains a need for
a systematic study of the application of graph theory to digital computer
programming. A model suitable for such a study is developed in the
writer's doctoral thesis. This paper presents the essential features of this
model, and describes some of the applications which have been investi-
gated. A recent paper by Prosser (1959) employs Boolean matrices to
derive independently some of the results given in Section III of this
paper.
Ii. THE REPRESENTATION OF FLOWCHARTS AND PROGRAMS
The flowchart is a commonly used device for the representation of
algorithms. In the form considered here (see Fig. 1), it consists of an
arrangement of elements connected by directed ares. These elements
APPLICATION OF GRAPH THEORY TO PROGRAMMING 181

FIG. 1. A f l o w c h a r t

are of two types: the operational element, designated by a square con-


taining one of the symbols X1, X2, • • • , X~, and the decision element,
designated by a diamond-shaped symbol containing a Greek letter.
Each operational element corresponds to a set of operations performed
successively in executing the algorithm represented; each decision ele-
ment corresponds to a ehoiee between two or more courses of action.
An instance of the execution of the algorithm may be specified by a path
through the flowchart, beginning at a given initial element, and termi-
nating at an operational element having no successor. This model is not
capable of describing flowcharts which modify their own structures.
Other writers (see, for example, Goldstine a n d von Neumann, 1947)
have introduced an element called the variable remote connector to
designate self-modification in flowcharts. The analysis of self-modifying
flowcharts is quite difficult and will not be discussed in this paper.
Given a program written either in machine language or in the source
language of an automatie programming system, one can perform a seg-
mentation into operational elements and decision elements, provided
that the program does not modify its own transfer-of-control instrue-
tions. An operational element corresponds to a sequence of instructions
uninterrupted by a conditional transfer of control, and a decision ele-
ment corresponds to a conditional transfer-of-control instruction. A
segmentation of this kind is implicit in the procedures of Scott (1958)
and Haibt (1959), who were concerned with the automatic preparation
of flowcharts from programs. Henceforth, the terms "program" and
"flowchart" will be used interehangeably.
Corresponding to any given flowchart one can construct a directed
linear graph (el. K6nig, 1950), with a node corresponding to each de-
cision element, initial or terminal operational element, or to any point
at which two arrowheads meet in the flowchart. The graph correspond-
182 KARP

a; $ X5
I ~'1'~ 2 aX2 } ,B' '~ Y ~ 6 X6 7

TtX3
FIG. 2. G r a p h corresponding to t h e flowchart of Fig. 1

ing to the flowchart of Fig. 1 is shown in Fig. 2. I n this example each


decision has two outcomes, labelled -4- and - in the flowchart. Corre-
spondingly, the graph has two arcs, labelled ~ and ~', for each decision
element ~. Decisions with more than two outcomes can be represented
by a simple extension of the notation.
Figure 3 shows a connection matrix (m~.) derived from the graph of Fig.
2. The element mlj is zero whenever nodes i and j of the graph are not
connected by an arc. If nodes i and j are connected by a single arc a l ,
then mij = al • If nodes i and j are connected by the arcs a l , a2, • • • , a~
then

mij = al + a2 + . . . + a~.
The diagonal element m , is zero unless there exists an arc whose head
and tail meet at the node i.

III. SOME TECHNIQUES OF ANALYSIS


The preceding section introduced the directed graph and connection

0 X1 0 0 0 0 0

0 aX 2 0 0 0

~0 0 0 0 0
~' 0 o ~'
y'x 3 r o iI
0 0 0 0 5X 5 + 6~X4

0 0 0 0 0 X6 /

0 0 0 0 0 /
FIG. 3. A connection m a t r i x
A P P L I C A T I O N OF G R A P H T H E O R Y TO P R O G R A M M I N G 183

matrix as descriptors of flowcharts. Of the two descriptors, the connec-


tion matrix is the more suitable for formal manipulation. In this section
we define some fundamental entities associated with connection matrices.
These definitions are then applied to the analysis of flowcharts.
A string of matrix elements of the form m~ilmjlj~ • • • mik_li is a path
of length k from node i to node j. If i = j, the path is called a cycle. A
proper path is one such that no number appears more than once as a
row index or more than once as a column index in the representation
of the path. A proper cycle is a proper path which is a cycle. Any cyclic
permutation of the symbols representing the arcs in a proper cycle gives
a valid representation of the proper cycle. A path B is called a subpath
of a path Z if Z can be written as Z = A B C , where A and C are paths,
either or both of which m a y be null. Thus, the path m2ama4m~2 is a
subpath of the path m~2m2ama4m42m25. A path is a proper path if and
only if it has no subpath which is a cycle, except possibly itself.
For the flowchart of Fig. 1, the proper paths from initial node to termi-
nal node are:
X~aX2~ 'v~XsX~
XlaX2fl'v~'X4X6
l~he proper cycles are:
Xlot '

XtaX25
"y'Xa
Convenient procedures for enumerating the proper paths and proper
cycles of a connection matrix (or the corresponding directed graph)
have been given by Hohn et al. (1957) and by Lunts (1950). The avail-
ability of these procedures will be assumed.
M a n y properties desirable in a "good" flowchart m a y be expressed
in terms of the graph-theoretic concepts just introduced. The first such
property is called eligibility. A flowchart is said to be eligible if each of
its elements lies in a finite path from the initial node to some terminal
node. If a flowchart is ineligible, at least one of the following undesirable
conditions must hold:
1. There is an element which cannot be reached from the initial node,
and hence is totally redundant.
2. There is an element from which no path leads to a terminal node,
184 KARe

so that the algorithm must fail to terminate if that element is reached.


Examples of both these conditions are shown in Fig. 4.
A simple test of eligibility is based on the concept of linkage. Two
paths are said to link each other if one of them can be written A}B, and
the other, C}'D, where A, B, C, and D are subpaths and } is the name
of a decision element) The following recursive procedure m a y be de-
fined:
1. Let S0 be the set of all proper paths from the initial node to a termi-
nal node.
2. The set & having been formed, let &+l be formed by adding to
& all proper cycles which link paths or cycles in & .
3. Let the process continue until, for some N, SN+I = S~.
THEOREM: A flowchart is eligible if and only if each of its elements
appears in some member of S~.
A second type of problem solvable by simple graph-theoretic means
may be stated. Suppose that a given program contains a set C = {c~}
of instructions which require for their proper execution that some speci-
fied condition be satisfied. Typical conditions are the following: tape
rewound, sense light on, accumulator cleared, appropriate constant in
temporary storage, modifiable data address properly set. Suppose also
that there is a set D = {dj} of instructions which cause the condition
not to hold, and a set S = {sk} of instructions which set up the condi-
tion. To ensure proper execution of the instructions in C it is necessary
to test that every proper path from any d~. e D to any c~ e C contains
an sk E S. This condition may be checked by enumerating these proper
paths, using the method of Hohn. The corresponding synthesis problem,
which requires the efficient insertion of elements of S into the program,
is also readily solved. The reader familiar with computer programming
will recognize m a n y frequently encountered variations of this problem.
IV. COMPOSITE PROGRAMS
Often, a program treats several different versions of a single algo-
rithm. The procedure to be used in any particular instance of the exe-
cution of the program is to be specified by the user. This specification
can conveniently be given by preset binary variables, represented
physically by sense switches or equivalent devices. A program in which
the choice of a procedure is so specified will be called a composite pro-
gram. It is desirable to find algorithms for synthesizing efficient compo-
i The restriction to binary decisions is an inessential simplifying assumption.
APPLICATION OF GRAPH THEORY TO PROGRAMMING 185

FIG. 4. Examples of ineligible flowcharts

site programs when a program for each of the possible procedures is


available. It is also desirable to find algorithms for performing the inverse
process, in which a program efficient for a single procedure or only a
few procedures is generated from a composite program.
Let us suppose that the programs PI and P2 of Fig. 5a and b are to
be combined. These programs have the operational elements XI, X2, X3,
and X5 in common; one instance of each element is to appear in the
composite program. The binary variable E will be used to specify which
of the two cases of the composite program is to be executed; the case
e - 1 corresponds to PI ; the case ep = 1, to P2.
The synthesis procedure is as follows:
I. Construct matrices MI and 21//2, each having a row and column
:for each operational element in either P~ or Pc. The entry (M~)ij will
contain a Boolean expression giving the conditions under which Xj. is a
successor of Xi in Pl • These matrices are shown in Fig. 6.

(a)

( b)
FIG. 5. A p a i r of p r o g r a m s to be c o m b i n e d
186 KARP

0 1 0 0 0 0 1 0 0

: a' 0 ~ : 13 I 0 , 8 0 0

M 1= 0 0 1 M2= 0 0 0 0 1

i 0 yl 0 o
° 0 0 0 o

0 0 0 0 0 0
FIG. 6. Matrices representing the structures of P1 and P2
Y
2. Construct a matrix M such that
(M)i~ = e(M1)~ + d ( M 2 ) ~ .
This matrix is shown in Fig. 7.
3. Form a matrix M* from M as follows:
(a) If (M),~. = e + g, (M*)i~ = 1
(b) If {E:} does not appear in some row of M delete all occurrences

The matrix M* is shown in Fig. 8. This matrix is a complete representa-


tion of the composite program, which is shown in Fig. 9. A generaliza-
tion of this procedure to composite programs involving more than one
preset binary variable can be made, but it is not entirely trivial.
The synthesis of composite programs from individual eases has been
considered. Next, it is necessary to consider the inverse problem, in

/c~ ~-F~l o o o1

;!oo. ;1
,

M =

0 ~ 7 "l 0

0 0 0

FIG. 7. Matrix representing the structure of a composite program


APPLICATION OF GRAPH THEORY TO PROGRAMMING 187

0 1 0 0 0
e/~l E~ I E I/3 E~ Oi

M " 0 0 0 E •
t
0 0 )" 0 -

0 0 0 0

F i e . 8. R e v i s e d m a t r i x for a c o m p o s i t e p r o g r a m

which an efficient program for a single case is derived from a composite


program. It will now be assumed that the program of Fig. 9 is given,
and that an efficient program is desired for the ease e = 1. This can
be done simply by deriving the proper paths and proper cycles of the
composite program, removing those in which e' appears, and constructing
a program which has only the remaining proper paths and proper cycles.
In the program of Fig. 9 the only proper path which does not contain
g is:
XIX2EaX47Xs .
The proper cycles which do not contain d are:
EO/'Z2

~X43" X3
Removing the symbol e from these paths, and constructing the unique
eligible program which has the resulting proper path and proper cycles,
we obtain the program of Fig. 5a, as expected.

I F I o . 9. A c o m p o s i t e p r o g r a m
188 KARF

V. A MARKOV MODEL FOR PROGRAMS


For many purposes it is important to know the frequencies of execu-
tion of the operational elements in a program, as well as the manner in
which these elements are interconnected. Often, approximate estimates
of these frequencies can be obtained from a model which treats a program
as a stochastic process whose states are the operational elements. In
the simplest possible stochastic model, a program is considered as a
Markov process with constant transition probabilities. A model of this
type is suitable only if it may be assumed that the outcomes of any
given decision in the program are selected with constant probabilities,
independently of the sequence of program steps prior to the execution
of the decision. Although a Markov model will be used for illustrative
purposes, more complicated models may be required in practice.
Let p~j be the conditional probability that operational element Xj
will follow, given that element X~ is currently being executed. If Xi is a
terminal element, let pil -- 1, where X~ is the initial element of the pro-
gram, and P~s = 0 otherwise. Then if the Markov process described by
the transition matrix P = (p~j) is ergodic, ~ the frequencies of the ele-
ments are given by a solution N = (NI, N2, ... , Nn) of the matrix
equation
NP = N
normalized so t h a t the frequencies of all terminal elements sum to 1.
Once the frequencies N~ have been found, either b y the model de-
scribed or b y some other means, certain useful results can be obtained.
First, if r~ denotes the execution time of element X~, and if no two op-
erational elements are executed simultaneously, t h e n the expected exe-
cution time of the program is given b y

Secondly, certain minimization problems of interest can also be


treated, once the quantities p~- and N~ are known. One such problem is
to minimize the expected number of transfers of control performed in
executing a program. If X j is the successor of X~ in storage, then a trans-
fer of control at the termination of X~ is required whenever some ele-
m e n t other t h a n X j is to be executed next; the expected n u m b e r of
such transfers of control is N~(1 - p~j), which we denote b y c ~ . L e t
2 Finite state Markov processes are discussed by Feller (1957), Chapters 15
and 16.
APPLICATION OF GRAPH THEORY TO PROGRAMMING 189

x~j be one if X i immediately follows X~ in storage, zero otherwise. Let


an element Xn+l, consisting of data words and other words which are
not executed as instructions, be defined. Then c~,~+1 = c~+I,~ = 0 for
all i. We define xi,~+l = 1 if X~ is the last element in storage,
and x~+l,~ = 1 if X~ is the first element in storage. Otherwise xi,~+l and
x~+l,~ are equal to zero for all i. Thus X~+~ is the successor of the last op-
erational element in storage, and the predecessor of the first. Then (xi~)
is constrained to be a cyclic permutation matrix. Subject to this con-
straint, we wish to minimize the expected number of transfers of control
executed, given by
nq-1 n-~-I
E E
i=l j = l

This is precisely the traveling-salesman problem, for which methods of


solution have been given by Eastman (1958) and Dantzig et al. (1954).
The method of E a s t m a n seems quite effective for small problems (most
problems involving matrices of dimension ten can be done easily by
hand), but it has not been tested for large problems.
ACKNOWLEDGMENT
It is a pleasure to acknowledge the many helpful suggestions of my colleagues
both at the Harvard Computation Laboratory and at the IBM Research Center,
and to express special thanks to Professor Anthony G. Oettinger for his guidance
and encouragement.

RECEIVED: October 7, 1959.

REFERENCES

BLUM, E. K. (1956). Automatic digital encoding system II (Ades II). U. S. Naval


Ordnance Laboratory.
])ANTZIG, G., FULKERSON,R., AND JOHNSON, S. (1954). Solution of a large scale
traveling-salesman problem. J. Operations Res. Soc. Am., 2, No. 4, 393-410.
EASTMAN,W. L. (1958). Linear programming with pattern constraints. Doctoral
Thesis, Harvard University.
FELLER, W. (1957). "An Introduction to Probability Theory and its Applica-
tions," Vol. 1, 2nd edn. Wiley, New York.
GOLDSTINE, H. H., ANDYON ~NTEUMANN,J. (1947). "Planning and Coding of Prob-
lems for an Electronic Computing Instrument," Vol. 1. Part II. Institute for
Advanced Study, Princeton.
ttA~BT, L. M. (1959). A program to draw multilevel flowcharts. Proc. Western Joint
Computer Conf., pp. 131-137.
HOHIN,F. E., SESI-IU,S., ANDAUFENKAMP,D. D. (1957). The theory of nets. IRE
Trans. on Electronic Computers, EC-6, No. 3, 154-161.
190 KanP

JANOV, Y. I. (1957a). O ravnosil'nosti preobrazovanijax sxem programm (On the


equivalence and transformation of program schemata). Doklady Akad. Nauk
SSSR, 113, No. 1, 39-42. (English translation given in Communications Assoc.
Computing Mach., 1, No. 10, 8-12, 1958.
JANov, Y. I. (1957b). O matrichnyx sxcmax (On matrix schemes). Doklady Akad.
Nauk SSSR, 113, No. 2, 283-286.
KARP, R. M. (1959). Some applications of logical syntax to digital computer pro-
gramming. Doctoral Thesis, Harvard University.
K6NIo, D. (1950). "Theorie der Endlichen und Unendlichen Graphen," (Theory
of Finite and Infinite Graphs). Chelsea, New York.
LUNTS, A. G. (1950). Prilozhenie matrichnoj bulevoj algebry k analizu i sintezu
relejno-kontaktnyx sxem (The application of Boolean matrix algebra to the
analysis and synthesis of relay contact networks). Doklady Akad. Nauk NSSR,
70, No. 3, 421-423. (English translation given in "Theory of Switching."
Progress Report No. BL-5, Harvard Computation Laboratory, 1954.
PERLIS, A. J., AND SMITH, J. W. (1957). A mathematical language compiler. Frank-
lin Inst. Symposium on Automatic Coding, pp. 87-102.
PROSSER, R. T. (1959). Applications of Boolean matrices to the analysis of flow
diagrams. Proc. Eastern Joint Computer Conf., pp. 133-138.
SCOTW, A. E. (1958). Automatic preparation of flowchart listings. J. Assoc. Com-
puting Mach., 5, No. 1, 57-66.
VOORI~EES, E. A. (1958). Algebraic formulation of flow diagrams. Communications
Assoc. Computing Mach., 1, No. 6, 4-8.

You might also like