The Collected Works of Professor Donald Knuth

Donald Ervin Knuth is an American computer scientist, mathematician, and professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer science. He is the author of the multi-volume work The Art of Computer Programming.

Semantics of Context-Free Languages

by
DONALD E. KNUTH
California Institute of Technology

ABSTRACT

"Meaning" may be assigned to a string in a context-free language by defining
"attributes" of the symbols in a derivation tree for that string. The attributes
can be defined by functions associated with each production in the grammar. This
paper examines the implications of this process when some of the attributes are
"synthesized", i.e., defined solely in terms of attributes of the descendants of
the corresponding nonterminal symbol, while other attributes are "inherited",
i.e., defined in terms of attributes of the ancestors of the nonterminal symbol.
An algorithm is given which detects when such semantic rules could possibly lead
to circular definition of some attributes. An example is given of a simple
programming language defined with both inherited and synthesized attributes, and
the method of definition is compared to other techniques for formal specification
of semantics which have appeared in the literature.

A simple technique for specifying the "meaning" of languages defined
by context-free grammars is introduced in Section 1 of this paper, and its
basic mathematical properties are investigated in Sections 2 and 3. An
example which indicates how the technique can be applied to the formal
definition of programming languages is described in Section 4, and finally,
Section 5 contains a somewhat biased comparison of the present method to
other known techniques for semantic definition. The discussion in this
paper is oriented primarily towards programming languages, but the same
methods appear to be relevant also in the study of natural languages.
1. Introduction. Let us suppose that we want to give a precise definition
of binary notation for numbers. This can be done in many ways, and in
this section we want to consider a manner of definition which can be
generalized so that the meaning of other notations can be expressed in the
same way. One such way to define binary notation is to base a definition on

MATHEMATICAL SYSTEMS THEORY, Vol. 2, No. 2
Published by Springer-Verlag New York Inc.
the following context-free grammar:

         $B \to 0$
         $B \to 1$
         $L \to B$
(1.1)    $L \to L\,B$
         $N \to L$
         $N \to L\,.\,L$

(Here the terminal symbols are ., 0, and 1; the nonterminal symbols are
B, L, and N, standing respectively for bit, list of bits, and number; and a
binary number is intended to be any string of terminal symbols which can
be obtained from N by application of the above productions.) This grammar
says in effect that a binary number is a sequence of one or more 0's
and 1's, optionally followed by a radix point and another sequence of one
or more 0's and 1's. Furthermore, the grammar assigns a certain tree
structure to each binary number; for example, the string 1101.01 receives the
following structure:

                      N
                   /  |  \
                  L   .   L
                 / \     / \
                L   B   L   B
(1.2)          / \  |   |   |
              L   B 1   B   1
             / \  |     |
            L   B 0     0
            |   |
            B   1
            |
            1

It is natural to define the meaning of binary notation (1.1) in a step-by-step
manner corresponding to this structure; the meaning of the notation
as a whole is built up from meanings of each part. This can be done by
assigning attributes to the nonterminal symbols, as follows:
Each bit B has a "value" v(B) which is an integer.
Each list of bits L has a "length" l(L) which is an integer.
Each list of bits L has a "value" v(L) which is an integer.
Each number N has a "value" v(N) which is a rational number.
(Note that each L has two attributes; in general we could ascribe any
desired number of attributes to each nonterminal symbol.)
The grammar (1.1) may now be augmented so that semantic rules are
given for each rule of the syntax:

         $B \to 0$              $v(B) = 0$
         $B \to 1$              $v(B) = 1$
         $L \to B$              $v(L) = v(B)$, $l(L) = 1$
(1.3)    $L_1 \to L_2\,B$       $v(L_1) = 2v(L_2) + v(B)$, $l(L_1) = l(L_2) + 1$
         $N \to L$              $v(N) = v(L)$
         $N \to L_1\,.\,L_2$    $v(N) = v(L_1) + v(L_2)/2^{l(L_2)}$

(In the fourth and sixth rules subscripts have been used to distinguish
between occurrences of like nonterminals.) Here the semantic rules define
all of the attributes of a nonterminal in terms of the attributes of its
immediate descendants, so ultimately values are defined for each attribute.
The semantic rules are phrased in terms of notations which are assumed to
be already understood. Notice for example that the symbol "0" in the
semantic rule "v(B) = 0" is to be interpreted quite differently from the
symbol "0" in the production "$B \to 0$"; the former denotes a mathematical
concept, the integer zero, while the latter denotes a written character which
has a certain elliptical shape. In a sense it is just coincidence that the two
symbols look the same.
The structure (1.2) may be augmented by showing the attributes at
each level:

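(1.4)   N(v = 13.25)
        ├── L(v = 13, l = 4)
        │   ├── L(v = 6, l = 3)
        │   │   ├── L(v = 3, l = 2)
        │   │   │   ├── L(v = 1, l = 1)
        │   │   │   │   └── B(v = 1)
        │   │   │   │       └── 1
        │   │   │   └── B(v = 1)
        │   │   │       └── 1
        │   │   └── B(v = 0)
        │   │       └── 0
        │   └── B(v = 1)
        │       └── 1
        ├── .
        └── L(v = 1, l = 2)
            ├── L(v = 0, l = 1)
            │   └── B(v = 0)
            │       └── 0
            └── B(v = 1)
                └── 1
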
Thus "1101.01" means 13.25 (in decimal notation).
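The purely synthesized rules (1.3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `parse_bits` and `binary_value` are hypothetical, the tree is never built explicitly (the left-recursive productions for L are applied as the bits are read from left to right), and exact rationals are used for v(N).

```python
from fractions import Fraction

def parse_bits(s):
    """Apply L -> B and L1 -> L2 B from left to right; return (v(L), l(L))."""
    value, length = 0, 0
    for ch in s:
        if ch not in "01":
            raise ValueError(f"not a bit: {ch!r}")
        bit = int(ch)                                    # v(B) = 0 or v(B) = 1
        if length == 0:
            value, length = bit, 1                       # L -> B
        else:
            value, length = 2 * value + bit, length + 1  # L1 -> L2 B
    if length == 0:
        raise ValueError("a list of bits must be nonempty")
    return value, length

def binary_value(text):
    """Return v(N) for a string derived from N in grammar (1.1)."""
    if "." in text:
        left, right = text.split(".", 1)
        v1, _ = parse_bits(left)
        v2, l2 = parse_bits(right)
        return v1 + Fraction(v2, 2 ** l2)                # N -> L1 . L2
    v, _ = parse_bits(text)
    return Fraction(v)                                   # N -> L

print(binary_value("1101.01"))   # 53/4, i.e. 13.25
```

Every attribute here flows from the bits toward N, which is why a single left-to-right pass suffices.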
This manner of defining semantics for context-free languages is es-
sentially well known, since it has already been used by several authors. But
there is an important way to extend this method, and it is this extension
which will be of primary interest to us.
Suppose for example that we want to define the semantics of binary
notation in a different way corresponding more closely to the manner in
which we usually think of the notation. The leading "1" in "1101.01"
really denotes 8, although according to (1.4) it is ascribed the value 1.
Perhaps therefore it would be better to define the semantics in such a way that
positional characteristics play a role. We could have the following
attributes:

Each B has a "value" v(B) which is a rational number.
Each B has a "scale" s(B) which is an integer.
Each L has a "value" v(L) which is a rational number.
Each L has a "length" l(L) which is an integer.
Each L has a "scale" s(L) which is an integer.
Each N has a "value" v(N) which is a rational number.

These attributes can be defined as follows:

         Syntactic rules        Semantic rules

         $B \to 0$              $v(B) = 0$
         $B \to 1$              $v(B) = 2^{s(B)}$
         $L \to B$              $v(L) = v(B)$, $s(B) = s(L)$, $l(L) = 1$
(1.5)    $L_1 \to L_2\,B$       $v(L_1) = v(L_2) + v(B)$, $s(B) = s(L_1)$,
                                $s(L_2) = s(L_1) + 1$, $l(L_1) = l(L_2) + 1$
         $N \to L$              $v(N) = v(L)$, $s(L) = 0$
         $N \to L_1\,.\,L_2$    $v(N) = v(L_1) + v(L_2)$, $s(L_1) = 0$,
                                $s(L_2) = -l(L_2)$

(Here the semantic rules are listed using the convention that the
right-hand side of each equation is the definition of the left-hand side; thus,
"s(B) = s(L)" says that s(L) is to be evaluated first, then s(B) is defined to
have this same value.)
The important feature of grammar (1.5) is that some of the attributes
are defined for nonterminals which appear on the right side of the
corresponding production, while in (1.3) all attributes were defined when the
nonterminal appeared on the left side. Here we are using both synthesized
attributes (which are based on the attributes of the descendants of the
nonterminal symbol) and inherited attributes (which are based on the attributes
of the ancestors). Synthesized attributes are evaluated from the bottom up
in the tree structure, while inherited attributes are evaluated from the top
down. Grammar (1.5) contains the synthesized attributes v(B), v(L), l(L),
v(N) and also the inherited attributes s(B) and s(L), so the evaluation
involves going in both directions. The evaluated structure corresponding to
the string 1101.01 is

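(1.6)   N(v = 13.25)
        ├── L(v = 13, l = 4, s = 0)
        │   ├── L(v = 12, l = 3, s = 1)
        │   │   ├── L(v = 12, l = 2, s = 2)
        │   │   │   ├── L(v = 8, l = 1, s = 3)
        │   │   │   │   └── B(v = 8, s = 3)
        │   │   │   │       └── 1
        │   │   │   └── B(v = 4, s = 2)
        │   │   │       └── 1
        │   │   └── B(v = 0, s = 1)
        │   │       └── 0
        │   └── B(v = 1, s = 0)
        │       └── 1
        ├── .
        └── L(v = .25, l = 2, s = -2)
            ├── L(v = 0, l = 1, s = -1)
            │   └── B(v = 0, s = -1)
            │       └── 0
            └── B(v = .25, s = -2)
                └── 1
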
Here it can be noted that the "length" attributes of the L's to the right of
the radix point must be evaluated from the bottom up before the "scale"
attributes can be evaluated (from the top down) and finally the "value"
attributes (from the bottom up).
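The three passes just described (lengths from the bottom up, scales from the top down, values from the bottom up) can be made concrete with a small Python sketch of rules (1.5). The class `L` and the helpers `build`, `set_scales`, `set_values`, and `evaluate` are hypothetical names chosen for this illustration; each B node is folded into its parent L node, since in (1.5) s(B) always equals the scale of that parent.

```python
from fractions import Fraction

class L:
    """A list-of-bits node: L -> B (prefix is None) or L1 -> L2 B."""
    def __init__(self, prefix, bit):
        self.prefix, self.bit = prefix, bit      # bit is the character '0' or '1'
        # l(L) is synthesized, so it is known as soon as the node is built.
        self.length = 1 if prefix is None else prefix.length + 1
        self.scale = self.value = None

def build(bits):
    """Build the left-recursive chain of L nodes for a nonempty bit string."""
    if not bits or any(ch not in "01" for ch in bits):
        raise ValueError(f"not a list of bits: {bits!r}")
    node = None
    for ch in bits:
        node = L(node, ch)
    return node

def set_scales(node, scale):
    """Top-down pass for the inherited attribute s."""
    while node is not None:
        node.scale = scale                       # also s(B), since s(B) = s(L1)
        scale += 1                               # s(L2) = s(L1) + 1
        node = node.prefix

def set_values(node):
    """Bottom-up pass for v(L), with v(B) = 2^s(B) for a 1 bit and 0 otherwise."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.prefix
    total = Fraction(0)
    for n in reversed(chain):                    # innermost L first
        total += Fraction(2) ** n.scale if n.bit == "1" else 0
        n.value = total                          # v(L1) = v(L2) + v(B)
    return total

def evaluate(text):
    """Return v(N) under rules (1.5)."""
    left, point, right = text.partition(".")
    l1 = build(left)
    set_scales(l1, 0)                            # s(L) = 0 or s(L1) = 0
    value = set_values(l1)
    if point:
        l2 = build(right)
        set_scales(l2, -l2.length)               # s(L2) = -l(L2)
        value += set_values(l2)
    return value

print(evaluate("1101.01"))   # 53/4, i.e. 13.25
```

Note that `set_scales` for the list to the right of the radix point cannot start until `l2.length` is available, which mirrors the ordering constraint observed in (1.6).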
Grammar (1.5) is probably not the "best possible" grammar for binary
notation, but it does seem to correspond better to our intuition than gram-
mar (1.3). (A grammar which agrees more exactly with our conventional
understanding of binary notation could be based on a different set of pro-
duction rules which would assign another structure to the string of bits at
the right of the radix point; then the "length" attribute, which is not really
relevant, would be unnecessary.)
Our interest in grammar (1.5) is not that it is an ideal definition of
binary notation, but rather that it shows an interaction between inherited
and synthesized attributes. It is not always obvious when semantic rules
such as those in (1.5) do not amount to a circular definition, because the
attributes are not evaluated in a single direction; an algorithm which tests
for circularity appears later in this paper.
The importance of inherited attributes is that they arise naturally in
practice and that they are "dual" to synthesized attributes in a straight-
forward manner. Although binary notation can be formulated using
nothing but synthesized attributes, there are many languages for which
such a restriction leads to a very awkward and unnatural definition of
semantics. Situations which involve a mixture of inherited and synthesized
attributes are essentially the same as the cases which have been most dif-
ficult to handle in previous formulations of semantic rules.
2. Formal properties. Let us now put the ideas of synthesized and in-
herited attributes into a more precise and more general setting.
Suppose we have a context-free grammar $\mathcal{G} = (V, N, S, \mathcal{P})$, where $V$ is
the (finite) vocabulary of terminal and nonterminal symbols; $N \subseteq V$ is the
set of nonterminal symbols; $S \in N$ is the "start symbol", which appears on
the right-hand side of no production rule; and $\mathcal{P}$ is the set of production
rules. Semantic rules are added to $\mathcal{G}$ in the following manner: To each
symbol $X \in V$ we associate a finite set $A(X)$ of attributes; $A(X)$ is partitioned
into two disjoint sets, the synthesized attributes $A_0(X)$ and the inherited
attributes $A_1(X)$. We require $A_1(S)$ to be empty (i.e., the start symbol $S$ has
no inherited attributes); similarly we require $A_0(X)$ to be empty if $X$ is a
terminal symbol. Each attribute $\alpha$ in $A(X)$ has a (possibly infinite) set of
possible values $V_\alpha$, from which one value will be selected (by means of the
semantic rules) for each appearance of $X$ in a derivation tree.
Let $\mathcal{P}$ consist of $m$ productions, and let the $p$-th production be

(2.1)    $X_{p0} \to X_{p1} X_{p2} \cdots X_{pn_p}$,

where $n_p \ge 0$, $X_{p0} \in N$, and $X_{pj} \in V$ for $1 \le j \le n_p$. The semantic rules are
functions $f_{pj\alpha}$ defined for all $1 \le p \le m$, $0 \le j \le n_p$, and $\alpha \in A_0(X_{pj})$ if $j = 0$,
$\alpha \in A_1(X_{pj})$ if $j > 0$. Each such function is a mapping of $V_{\alpha_1} \times V_{\alpha_2} \times \cdots \times V_{\alpha_t}$
into $V_\alpha$, for some $t = t(p, j, \alpha) \ge 0$, where each $\alpha_i = \alpha_i(p, j, \alpha)$ is an attribute
of some $X_{pk_i}$, for $0 \le k_i = k_i(p, j, \alpha) \le n_p$, $1 \le i \le t$. In other words, each
semantic rule maps values of certain attributes of $X_{p0}, X_{p1}, \ldots, X_{pn_p}$ into
the value of some attribute of $X_{pj}$.
For example, (1.5) is the grammar $\mathcal{G} = (\{0, 1, ., B, L, N\}, \{B, L, N\}, N,
\{B \to 0, B \to 1, L \to B, L \to LB, N \to L, N \to L\,.\,L\})$. The attributes are
$A_0(B) = \{v\}$, $A_1(B) = \{s\}$, $A_0(L) = \{v, l\}$, $A_1(L) = \{s\}$, $A_0(N) = \{v\}$, $A_1(N) = \emptyset$,
and $A_0(x) = A_1(x) = \emptyset$ for $x \in \{0, 1, .\}$. The attribute value sets are $V_v =$
{rational numbers}, $V_s = V_l =$ {integers}. A typical production rule is the
fourth production $X_{40} \to X_{41} X_{42}$, where $n_4 = 2$, $X_{40} = X_{41} = L$, $X_{42} = B$. A
typical semantic rule corresponding to this production is $f_{40v}$, which defines
$v(X_{40})$ in terms of other attributes; in this case $f_{40v}$ maps $V_v \times V_v$ into $V_v$, and
it is the mapping $f_{40v}(x, y) = x + y$. (This is the rule "$v(L_1) = v(L_2) + v(B)$" of
(1.5); in terms of the rather cumbersome notation of the preceding
paragraph we have $t(4, 0, v) = 2$, $\alpha_1(4, 0, v) = \alpha_2(4, 0, v) = v$, $k_1(4, 0, v) = 1$,
$k_2(4, 0, v) = 2$.)
The semantic rules may be used to assign a "meaning" to strings of the
context-free language, in the following way. For any derivation of a
terminal string $t$ from $S$ by a sequence of productions, construct the derivation
tree in the usual way: The root of this tree is $S$, and each node is labeled
either with a terminal symbol, or with a nonterminal symbol $X_{p0}$
corresponding to an application of the $p$-th production, for some $p$; in the latter
case the node has $n_p$ immediate descendants,

                        $X_{p0}$
(2.2)                /     |      \
              $X_{p1}$  $X_{p2}$  ...  $X_{pn_p}$

(cf. (1.2)). Now let $X$ be the label of a node of the tree and let $\alpha \in A(X)$ be
an attribute of $X$. If $\alpha \in A_0(X)$ then $X = X_{p0}$ for some $p$, while if $\alpha \in A_1(X)$
then $X = X_{pj}$ for some $j$ and $p$, $1 \le j \le n_p$, where in either case the tree in the
neighborhood of this node has the form (2.2). The attribute $\alpha$ is defined to
have the value $v$ at this node if, in the corresponding semantic rule

(2.3)    $f_{pj\alpha}\colon V_{\alpha_1} \times \cdots \times V_{\alpha_t} \to V_\alpha$

all of the attributes $\alpha_1, \ldots, \alpha_t$ have previously been defined to have the
respective values $v_1, \ldots, v_t$ at the respective nodes labeled $X_{pk_1}, \ldots,
X_{pk_t}$, and $v = f_{pj\alpha}(v_1, \ldots, v_t)$. This process of attribute definition is to be
applied throughout the tree until no more attribute values can be defined,
and then the defined attributes at the root of the tree constitute the
"meaning" corresponding to the derivation tree (cf. (1.6)).
It is natural to require that the semantic rules are formulated in such a
way that all attributes can always be defined at all nodes, in any conceivable
derivation tree. Let us say the semantic rules are well defined if this condition
holds. Since there are in general infinitely many derivation trees, it is
important to be able to decide if a given grammar has well defined semantic
rules or not. An algorithm for testing this condition is presented in Section 3.
Let us note that this method of semantic definition is as powerful as
any conceivable method could be, in the sense that the value of any
attribute of any node of a derivation tree may depend in any desired way on
the entire tree. For example, suppose we ascribe two inherited attributes
$l$ ("location") and $t$ ("tree") to each symbol except $S$ in a context-free
grammar, and one synthesized attribute $s$ ("subtree") to each
nonterminal symbol. Here $l$ ranges over finite sequences of positive integers
$\{a_1 . a_2 . \cdots . a_k\}$ which specify the location of tree nodes in a familiar
index or "Dewey decimal" notation (see [8, p. 310]); $t$ and $s$ consist of sets
of ordered pairs $(l, X)$, where $l$ is a node location and $X$ is a symbol of the
grammar denoting the label of the node at location $l$. The semantic rules,
for each production (2.1), are:

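         $l(X_{pj}) = \begin{cases} l(X_{p0}) . j & \text{if } X_{p0} \ne S; \\ j & \text{if } X_{p0} = S; \end{cases}$

(2.4)    $t(X_{pj}) = \begin{cases} t(X_{p0}) & \text{if } X_{p0} \ne S; \\ s(X_{p0}) & \text{if } X_{p0} = S; \end{cases}$

         $s(X_{p0}) = \{(l(X_{p0}), X_{p0}) \mid X_{p0} \ne S\} \cup \bigcup_{j=1}^{n_p} \{s(X_{pj}) \mid X_{pj} \in N\}.$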

Thus, for example, in the tree (1.2) we have

s(N) = {(1, L), (2, .), (3, L), (1.1, L), (1.2, B), (3.1, L), (3.2, B),
        (1.1.1, L), (1.1.2, B), (1.2.1, 1), (3.1.1, B), (3.2.1, 1),
        (1.1.1.1, L), (1.1.1.2, B), (1.1.2.1, 0), (3.1.1.1, 0),
        (1.1.1.1.1, B), (1.1.1.2.1, 1), (1.1.1.1.1.1, 1)}.
This clearly contains all the information of the entire derivation tree. The
semantic rules (2.4) define the attribute $t$ on all nodes (except the root)
to be the set representing the entire derivation tree, while $l$ is the location
of that node. It is therefore evident that any conceivable function of the
derivation tree can be an attribute of any node, since such a function is
$f(t, l)$ for some $f$.
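The set of (location, label) pairs for tree (1.2) can be recomputed mechanically. The Python sketch below is an illustration only: `tree`, `bits`, and `subtree_set` are hypothetical helper names, locations are written as dotted strings, and terminal nodes are recorded as well as nonterminal ones, on the assumption that this is what the enumerated set for (1.2) intends.

```python
def tree(label, *children):
    """A node is a (label, list of children) pair."""
    return (label, list(children))

def bits(s):
    """Derivation tree for a list of bits, using L -> B and L -> L B."""
    leaf = tree("B", tree(s[-1]))
    return tree("L", leaf) if len(s) == 1 else tree("L", bits(s[:-1]), leaf)

def subtree_set(node, location=()):
    """Collect (Dewey location, label) for every node strictly below `node`."""
    pairs = set()
    for j, child in enumerate(node[1], start=1):
        loc = location + (j,)                    # l(X_pj) = l(X_p0) . j
        pairs.add((".".join(map(str, loc)), child[0]))
        pairs |= subtree_set(child, loc)
    return pairs

T = tree("N", bits("1101"), tree("."), bits("01"))
for loc, label in sorted(subtree_set(T), key=lambda pair: (len(pair[0]), pair[0])):
    print(loc, label)
```

The result has 19 pairs, one for each node of (1.2) other than the root.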
Similarly we can show that synthesized attributes alone are sufficient
to define the meaning associated with any derivation tree, since the syn-
thesized attribute w defined by the rule
(2.5)    $w(X_{p0}) = \{(0, X_{p0})\} \cup \bigcup_{j=1}^{n_p} \{(j . \alpha, X) \mid (\alpha, X) \in w(X_{pj}),\ X_{pj} \in N\}$
evaluated at the root specifies the entire tree. Any semantic rules definable
by the method of this section can be considered to be a function of this
attribute w, and therefore the method is inherently no more powerful
than a method which uses no inherited attributes. But this statement is
very misleading, since semantic rules which do not use inherited attributes
are often considerably more complicated (and more difficult to under-
stand and to manipulate) than semantic rules which allow both kinds of
attributes. The ability to let the whole tree influence the attributes of each
node of the tree often leads to rules of semantics which are much simpler
and which correspond to the way in which we actually understand the
meanings involved.
3. Testing for circularity. Now let us consider an algorithm which
determines whether or not a collection of semantic rules, as described in
the previous section, is well defined; in other words, we want to know when
the semantic rules will always lead to definitions of all attributes at all
nodes of all derivation trees. We may assume that the grammar contains
no "useless" productions, i.e., that each production of $\mathcal{P}$ appears in the
derivation of at least one terminal string.
Let $\mathcal{T}$ be any derivation tree obtainable in the grammar, having only
terminal symbols as labels of its terminal nodes, but allowed to have any
symbol of $V$ (not only the start symbol $S$) as the label of the root. Then we
can define a directed graph $D(\mathcal{T})$ corresponding to $\mathcal{T}$ by taking the
ordered pairs $(X, \alpha)$ as vertices, where $X$ is a node of $\mathcal{T}$ and $\alpha$ is an
attribute of the symbol which is the label of node $X$. The arcs of $D(\mathcal{T})$ go
from $(X_1, \alpha_1)$ to $(X_2, \alpha_2)$ if and only if the semantic rule for the value of
attribute $\alpha_2$ depends directly on the value of attribute $\alpha_1$. For example,
if $\mathcal{T}$ is the tree (1.2) and if the semantic rules are given by (1.5), then
$D(\mathcal{T})$ is the directed graph

(3.1)    [Figure: the directed graph $D(\mathcal{T})$ for tree (1.2) under rules (1.5); the drawing is not recoverable from the source.]

In other words, the vertices of $D(\mathcal{T})$ are the attribute values which must be
determined, and the arcs specify the dependency relations which imply
that certain attribute values must be computed before others. (Cf. (1.6).)
It is clear that the semantic rules are well defined if and only if no
directed graph $D(\mathcal{T})$ contains an oriented cycle. For if there are no
oriented cycles, there is a well-known procedure which assigns values to
each attribute (see [8, p. 258]). And if there is an oriented cycle in some
$D(\mathcal{T})$, the fact that the grammar contains no useless productions implies
that there is an oriented cycle in some $D(\mathcal{T}')$ in which the root of $\mathcal{T}'$ has
the label $S$; this $\mathcal{T}'$ is a derivation tree of the language for which it is
impossible to evaluate all of the attributes. Therefore the problem, "Are the
semantic rules well-defined?" reduces to the problem, "Do the directed
graphs $D(\mathcal{T})$ contain any oriented cycles?"
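For a single tree, the connection between acyclicity and evaluability can be illustrated with ordinary topological sorting, which is presumably the well-known procedure referred to. The Python sketch below is an assumption-laden illustration: `evaluation_order` is a hypothetical name, and the example graph is the dependency graph worked out by hand from rules (1.5) for the one-bit string "1" (derivation N, L, B, 1).

```python
from collections import deque

def evaluation_order(vertices, arcs):
    """Order the vertices so that u precedes w for every arc (u, w).

    Returns None when the graph contains an oriented cycle.
    """
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for u, w in arcs:
        successors[u].append(w)
        indegree[w] += 1
    ready = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for w in successors[u]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return order if len(order) == len(vertices) else None

# Attribute instances for the string "1" under rules (1.5).
VERTICES = ["v(N)", "v(L)", "l(L)", "s(L)", "v(B)", "s(B)"]
ARCS = [("s(L)", "s(B)"),    # s(B) = s(L)
        ("s(B)", "v(B)"),    # v(B) = 2^s(B)
        ("v(B)", "v(L)"),    # v(L) = v(B)
        ("v(L)", "v(N)")]    # v(N) = v(L)

print(evaluation_order(VERTICES, ARCS))
```

Adding a hypothetical arc from v(N) back to s(L) closes a cycle, and the function then returns None: no attribute on the cycle can be defined first.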
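Each directed graph $D(\mathcal{T})$ may be regarded as the superposition of
smaller directed graphs $D_p$ corresponding to each of the productions
$X_{p0} \to X_{p1} \cdots X_{pn_p}$ of the grammar, $1 \le p \le m$. In the notation of Section
2, the directed graph $D_p$ has vertices $(X_{pj}, \alpha)$, for $0 \le j \le n_p$, $\alpha \in A(X_{pj})$, and
arcs from $(X_{pk_i}, \alpha_i)$ to $(X_{pj}, \alpha)$ for $0 \le j \le n_p$, $\alpha \in A_0(X_{pj})$ if $j = 0$, $\alpha \in A_1(X_{pj})$ if
$j > 0$, $k_i = k_i(p, j, \alpha)$, $\alpha_i = \alpha_i(p, j, \alpha)$, $1 \le i \le t(p, j, \alpha)$. In other words, $D_p$
reflects the dependencies of all the semantic rules associated with the $p$-th
production. For example the six productions of grammar (1.5) correspond
to six directed graphs, namely

         $D_1$: vertices $v(B)$, $s(B)$; no arcs
         $D_2$: vertices $v(B)$, $s(B)$; arc $s(B) \to v(B)$
         $D_3$: vertices $v(L)$, $l(L)$, $s(L)$, $v(B)$, $s(B)$; arcs $v(B) \to v(L)$, $s(L) \to s(B)$
(3.2)    $D_4$: vertices $v(L_1)$, $l(L_1)$, $s(L_1)$, $v(L_2)$, $l(L_2)$, $s(L_2)$, $v(B)$, $s(B)$;
                arcs $v(L_2) \to v(L_1)$, $v(B) \to v(L_1)$, $l(L_2) \to l(L_1)$, $s(L_1) \to s(L_2)$, $s(L_1) \to s(B)$
         $D_5$: vertices $v(N)$, $v(L)$, $l(L)$, $s(L)$; arc $v(L) \to v(N)$
         $D_6$: vertices $v(N)$, $v(L_1)$, $l(L_1)$, $s(L_1)$, $v(L_2)$, $l(L_2)$, $s(L_2)$;
                arcs $v(L_1) \to v(N)$, $v(L_2) \to v(N)$, $l(L_2) \to s(L_2)$
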


The directed graph (3.1) is obtained by "pasting together" various
subgraphs having these forms. In general if $\mathcal{T}$ has a terminal symbol as the
label of the root, $D(\mathcal{T})$ has no arcs; if the root of $\mathcal{T}$ is labeled with a
nonterminal symbol, $\mathcal{T}$ has the form

                     $X_{p0}$
(3.3)              /    |    \
        $\mathcal{T}_1$   ...   $\mathcal{T}_{n_p}$

for some $p$, where $\mathcal{T}_j$ is a derivation tree with $X_{pj}$ as the label of the root,
for $1 \le j \le n_p$. In the former case we will say $\mathcal{T}$ is a derivation tree of type
0, and in the latter case we will say $\mathcal{T}$ is a derivation tree of type $p$; according
to the definition, $D(\mathcal{T})$ is obtained in this case from $D_p, D(\mathcal{T}_1), \ldots,
D(\mathcal{T}_{n_p})$ by identifying the vertices for attributes of $X_{pj}$ with the
corresponding vertices for the attributes of the root of $\mathcal{T}_j$ in $D(\mathcal{T}_j)$, $1 \le j \le n_p$.

In order to test whether $D(\mathcal{T})$ contains oriented cycles, one further
concept is useful. Let $p$ be the number of a production, and for $1 \le j \le n_p$
suppose $G_j$ is any directed graph whose vertices are a subset of $A(X_{pj})$, the
attributes of $X_{pj}$; then let

(3.4)    $D_p[G_1, \ldots, G_{n_p}]$

be the directed graph obtained from $D_p$ by adding an arc from $(X_{pj}, \alpha)$ to
$(X_{pj}, \alpha')$ whenever there is an arc from $\alpha$ to $\alpha'$ in $G_j$. For example, if we have

         [Figure: example graphs $G_1$ (on vertices $v$, $l$, $s$) and $G_2$ (on vertices $v$, $s$); the arcs are not recoverable from the source.]

and if $D_4$ is the directed graph appearing in (3.2), then $D_4[G_1, G_2]$ is

         [Figure: the directed graph $D_4[G_1, G_2]$; the drawing is not recoverable from the source.]

The following algorithm may now be used: "For $1 \le p \le m$ let $D'_p$ be the
directed graph with vertices $A(X_{p0})$, and with an arc from $\alpha$ to $\alpha'$ if and only
if there is an oriented path from $(X_{p0}, \alpha)$ to $(X_{p0}, \alpha')$ in $D_p$. Let $D'_0$ be the
empty directed graph having no vertices. Now add further arcs to $D'_1, \ldots,
D'_m$ by the following procedure until no further arcs can be added: Choose
an integer $p$, with $1 \le p \le m$, and for $1 \le j \le n_p$ let $q(j) = 0$ if $X_{pj}$ is terminal,
or choose an integer $q(j)$ such that $X_{pj}$ is the left-hand side of the $q(j)$-th
production, i.e., $X_{q(j)0} = X_{pj}$. Then if there is an oriented path from $(X_{p0}, \alpha)$
to $(X_{p0}, \alpha')$ in the directed graph

(3.5)    $D_p[D'_{q(1)}, \ldots, D'_{q(n_p)}]$,

there should be an arc from $\alpha$ to $\alpha'$ in $D'_p$." It is clear that this process must
ultimately terminate with no more arcs added, since only finitely many arcs
are possible in all.
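In the case of grammar (1.5), this algorithm begins with

         $D'_1$: vertices $v$, $s$; no arcs
         $D'_2$: vertices $v$, $s$; arc $s \to v$
         $D'_3$: vertices $v$, $l$, $s$; no arcs
         $D'_4$: vertices $v$, $l$, $s$; no arcs
         $D'_5$: vertex $v$; no arcs
         $D'_6$: vertex $v$; no arcs

and adds arcs until finally we have

         $D'_1$: vertices $v$, $s$; no arcs
         $D'_2$: vertices $v$, $s$; arc $s \to v$
         $D'_3$: vertices $v$, $l$, $s$; arc $s \to v$
         $D'_4$: vertices $v$, $l$, $s$; arc $s \to v$
         $D'_5$: vertex $v$; no arcs
         $D'_6$: vertex $v$; no arcs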

After the above algorithm terminates, we can prove that there is an
oriented path from $(X, \alpha)$ to $(X, \alpha')$ in some $D(\mathcal{T})$, where $\mathcal{T}$ is a derivation tree of
type $p$ with root $X$, if and only if there is an arc from $\alpha$ to $\alpha'$ in $D'_p$. For the
construction does not add any arc from $\alpha$ to $\alpha'$ unless such a $D(\mathcal{T})$ exists; the
algorithm could readily be extended so that it would in fact print out an
appropriate derivation tree $\mathcal{T}$ for each arc in $D'_1, \ldots, D'_m$. Conversely,
suppose $\mathcal{T}$ is a derivation tree with root $X$, for which $D(\mathcal{T})$ contains an
oriented path from $(X, \alpha)$ to $(X, \alpha')$; we can prove by induction on the
number of nodes of $\mathcal{T}$ that there is an arc from $\alpha$ to $\alpha'$ in $D'_p$, where $\mathcal{T}$ is of type
$p$: Since $D(\mathcal{T})$ contains at least one arc, $\mathcal{T}$ must be of the form (3.3), and
$D(\mathcal{T})$ is "pasted together" from $D_p, D(\mathcal{T}_1), \ldots, D(\mathcal{T}_{n_p})$. By induction and
the fact that no arcs run from $D(\mathcal{T}_j)$ to $D(\mathcal{T}_{j'})$ for $j \ne j'$, any arcs of the
assumed path which appear in $D(\mathcal{T}_1), \ldots, D(\mathcal{T}_{n_p})$ may be replaced by
appropriate arcs in $D_p[D'_{q(1)}, \ldots, D'_{q(n_p)}]$, where $\mathcal{T}_j$ is of type $q(j)$, $1 \le j \le n_p$;
and we have an oriented path from $(X_{p0}, \alpha)$ to $(X_{p0}, \alpha')$ in this directed graph, hence
there is an arc from $\alpha$ to $\alpha'$ in $D'_p$.
The above algorithm now affords a solution to the problem posed in
this section:
THEOREM. Semantic rules added to a grammar as described in Section 2 are
well defined if and only if none of the directed graphs (3.5), for any admissible
choice of $p, q(1), \ldots, q(n_p)$ as specified in the above algorithm, contains an
oriented cycle.
Proof. If (3.5) contains an oriented cycle, the remarks just made prove
that some $D(\mathcal{T})$ contains an oriented cycle. Conversely, if $\mathcal{T}$ is a tree with
the fewest possible nodes, such that $D(\mathcal{T})$ contains an oriented cycle, then
$\mathcal{T}$ must be of the form (3.3) and $D(\mathcal{T})$ is "pasted together" from $D_p$,
$D(\mathcal{T}_1), \ldots, D(\mathcal{T}_{n_p})$. By the minimality of $\mathcal{T}$, the oriented cycle involves
at least one arc of $D_p$, and therefore we may argue as above that any arcs
of the cycle which are within $D(\mathcal{T}_1), \ldots, D(\mathcal{T}_{n_p})$ may be replaced by arcs
of (3.5) when $\mathcal{T}_j$ is of the type $q(j)$.
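The procedure of this section can be sketched in Python and run on grammar (1.5). Everything named here is an assumption of the sketch rather than notation from the paper: productions are numbered from 0 instead of 1, a vertex (j, a) stands for attribute a of X_pj, `DEPS[p]` lists the arcs of D_p read off from rules (1.5), and `d_prime[p]` holds the arcs of D'_p. The sets start empty, so the first sweep reproduces the initial graphs. It should be noted that the procedure merges all arcs found for a production into a single graph; Knuth's later published correction to this paper concerns this section, and the sketch follows the text as given here without incorporating it.

```python
from itertools import product

def reachable_pairs(arcs, sources):
    """Pairs (u, w) with u in sources and w reachable from u by one or more arcs."""
    succ = {}
    for u, w in arcs:
        succ.setdefault(u, set()).add(w)
    pairs = set()
    for u in sources:
        stack, seen = list(succ.get(u, ())), set()
        while stack:
            x = stack.pop()
            if x not in seen:
                seen.add(x)
                stack.extend(succ.get(x, ()))
        pairs |= {(u, x) for x in seen}
    return pairs

def has_cycle(arcs):
    vertices = {x for arc in arcs for x in arc}
    return any(u == w for u, w in reachable_pairs(arcs, vertices))

def composites(p, productions, deps, d_prime):
    """Yield the arcs of D_p[D'_q(1), ..., D'_q(n_p)] for each admissible choice of q."""
    nonterminals = {lhs for lhs, _ in productions}
    choices = [
        [q for q, (lhs, _) in enumerate(productions) if lhs == sym]
        if sym in nonterminals else [None]       # terminal: the empty graph D'_0
        for sym in productions[p][1]
    ]
    for q in product(*choices):
        arcs = set(deps[p])
        for j, qj in enumerate(q, start=1):
            if qj is not None:
                arcs |= {((j, a), (j, b)) for a, b in d_prime[qj]}
        yield arcs

def circularity_test(productions, deps):
    """Return (no cycle found in any graph (3.5), final arc sets of the D'_p)."""
    m = len(productions)
    d_prime = [set() for _ in range(m)]
    changed = True
    while changed:                               # repeat until no arc can be added
        changed = False
        for p in range(m):
            for arcs in composites(p, productions, deps, d_prime):
                roots = {x for arc in arcs for x in arc if x[0] == 0}
                found = {(u[1], w[1])
                         for u, w in reachable_pairs(arcs, roots) if w[0] == 0}
                if not found <= d_prime[p]:
                    d_prime[p] |= found
                    changed = True
    acyclic = not any(has_cycle(arcs)
                      for p in range(m)
                      for arcs in composites(p, productions, deps, d_prime))
    return acyclic, d_prime

PRODUCTIONS = [("B", ["0"]), ("B", ["1"]), ("L", ["B"]),
               ("L", ["L", "B"]), ("N", ["L"]), ("N", ["L", ".", "L"])]
DEPS = [
    set(),
    {((0, "s"), (0, "v"))},
    {((1, "v"), (0, "v")), ((0, "s"), (1, "s"))},
    {((1, "v"), (0, "v")), ((2, "v"), (0, "v")), ((1, "l"), (0, "l")),
     ((0, "s"), (1, "s")), ((0, "s"), (2, "s"))},
    {((1, "v"), (0, "v"))},
    {((1, "v"), (0, "v")), ((3, "v"), (0, "v")), ((3, "l"), (3, "s"))},
]

print(circularity_test(PRODUCTIONS, DEPS))
```

For (1.5) no cycle is found, and the only arc ever recorded is s to v, for the second, third, and fourth productions. As a contrast, if the rule l(L) = 1 of the third production were hypothetically replaced by one that depends on s(L), the rule s(L2) = -l(L2) of the sixth production would close a cycle and the test would report it.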
4. A simple programming language. Now let us consider an example
of how the above techniques of semantic definition can be applied to
programming languages. For simplicity let us study a formal definition of a
little language that describes Turing machine programs.
A Turing machine (in the classical sense) processes an infinite tape
which may be thought of as divided into squares; the machine can read or
write characters from a finite alphabet on the tape in the square which is
currently being scanned, and it can move the scanning position to the left
or right. The following program, for example, adds unity to an integer
expressed in binary notation and prints a radix point at the right of this
number, assuming that the square just to the right of the number is to be
scanned at the beginning and end of the program:
         tape alphabet is blank, one, zero, point;
         print "point";
         go to carry;
         test: if the tape symbol is "one" then
(4.1)        {print "zero"; carry: move left one square; go to test};
         print "one";
         realign: move right one square;
         if the tape symbol is "zero" then go to realign.

(It is hoped that the reader will find this programming language sufficiently
self-explanatory that he understands it before any formal definition of the
language is given, although of course this is not necessary. The above pro-
gram is not intended as an example of good programming, rather as an ex-
ample of the features of the simple language considered in this section.)
Since every programming language must have a name, let us call the
language Turingol. Any well-formed Turingol program defines a program
for a Turing machine; let us say a Turing machine program consists of

a set $Q$ of "states";
a set $\Sigma$ of "symbols";
an "initial state" $q_0 \in Q$;
a "final state" $q_\infty \in Q$;

and a "transition function" $\delta$ which maps $(Q - \{q_\infty\}) \times \Sigma$ into $\Sigma \times \{-1, 0, +1\} \times Q$.
If $\delta(q, s) = (s', k, q')$ we may say informally that, if the machine is
in state $q$ and scanning symbol $s$, it will print symbol $s'$, move $k$ spaces to
the right (meaning one space to the left if $k = -1$), and go into state $q'$.
More formally, a Turing machine program defines a computation on any
"initial tape contents", i.e., on any doubly infinite sequence

(4.2)    $\ldots, a_{-3}, a_{-2}, a_{-1}, a_0, a_1, a_2, a_3, \ldots$

of elements of $\Sigma$, as follows: At any moment of the computation there is a
"current state" $q \in Q$ and an integer-valued "tape position" $p$; initially $q = q_0$
and $p = 0$. If $q \ne q_\infty$, and if $\delta(q, a_p) = (s', k, q')$, the computation proceeds by
replacing the value of $a_p$ by $s'$, then by replacing $p$ by $p + k$ and $q$ by $q'$. If
$q = q_\infty$, the computation terminates. (The computation might not
terminate; for program (4.1) this happens if and only if $a_j =$ "one" for all $j < 0$.)
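The computation just defined translates almost directly into a simulator. The Python sketch below makes several assumptions that are not part of the definition: the tape is a dictionary in which missing squares read as "blank", a step limit guards against non-terminating computations, and the machine `DELTA` is a hypothetical three-state example that adds one to a binary number lying just to the left of the starting square. It is not the Turing machine program that Turingol would produce for (4.1).

```python
def run(delta, q0, q_final, tape, max_steps=10_000):
    """Run a Turing machine program; return the final tape as a dict."""
    tape = dict(tape)                    # position -> symbol, "blank" elsewhere
    q, p = q0, 0                         # initially q = q0 and p = 0
    steps = 0
    while q != q_final:
        if steps == max_steps:
            raise RuntimeError("no termination within max_steps")
        symbol, k, q = delta[(q, tape.get(p, "blank"))]
        tape[p] = symbol                 # replace a_p by s'
        p += k                           # replace p by p + k
        steps += 1
    return tape

# Hypothetical machine: step left once, then propagate a carry leftward.
DELTA = {("q0", s): (s, -1, "carry") for s in ("blank", "one", "zero")}
DELTA[("carry", "one")] = ("zero", -1, "carry")
DELTA[("carry", "zero")] = ("one", 0, "halt")
DELTA[("carry", "blank")] = ("one", 0, "halt")

# The number 101 occupies squares -3, -2, -1; square 0 is scanned first.
result = run(DELTA, "q0", "halt", {-3: "one", -2: "zero", -1: "one"})
print([result[i] for i in (-3, -2, -1)])   # ['one', 'one', 'zero'], i.e. 110
```

The transition table is total on every non-final state and symbol the example can encounter, as the definition of $\delta$ requires.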
Now that we have a precise definition of Turing machine programs, we
wish to define the Turing machine program corresponding to any given
Turingol program (and at the same time to define the syntax of Turingol).
For this purpose it is convenient to introduce a few abbreviation conven-
tions.
(1) The semantic rule "include x in B" associated with a production will
mean that x is to be a member of set B, where B is an attribute of the start
symbol S of the grammar. The value of B will be the set of all x for which
such a semantic rule has appeared corresponding to each appearance of
the production in the derivation tree. (This rule may be regarded as an
abbreviation for the semantic rule
(4.3)    $B(X_{p0}) = \bigcup_{j=1}^{n_p} B(X_{pj}) \cup \{x \mid \text{"include } x \text{ in } B\text{" is associated with the } p\text{-th production}\}$

added to each production, with a set B added as a synthesized attribute of
each nonterminal symbol, and B(x) the empty set for each terminal symbol.
These rules clearly make B(S) the desired set.)
(2) The semantic rule "define f(x) = y" associated with a production will
mean that y is to be the value of the function f evaluated at x, where f is an
attribute of the start symbol S of the grammar. If two rules occur defining
f(x) for the same value of x, this is an error condition, and any derivation
tree which allows this condition to occur may be said to be malformed.
Furthermore, f may be used as a function in any other semantic rules, with
the proviso that f(x) may only appear when f has been defined at x; any
derivation tree which calls for an undefined value of f(x) is malformed.
(This type of rule is important, for example, to ensure that there is agree-
ment between the declaration and use of identifiers. In the example below
this convention implies that programs are malformed if the same identifier
is used twice as a label or if a go to statement specifies an identifier which is
not a statement label. The rule may essentially be thought of as "include
(x, y) in f", as in (1), if f is regarded as a set of ordered pairs; additional
checks for malformedness are also included. We may regard "well-formed
or malformed" as an attribute of S; appropriate semantic rules analogous
to (4.3) which completely specify this "define f(x) = y" convention are
readily constructed and left to the reader.)
(3) The function "newsymbol" appearing in any semantic rule will
have, as its value, an abstract element which for each evaluation of
"newsymbol" is different from the abstract element produced by other
evaluations of newsymbol. (This convention can readily be expressed in terms of
other semantic rules, e.g., by making use of the l attributes of (2.4), which
have a different value at each node of a tree. The function newsymbol
serves as a convenient source of "raw material" for constructing sets.)
We have observed that conventions (1), (2), (3) can be replaced by other
constructions of semantic rules which do not use these conventions, so they
are not "primitives" for semantics. But they are of fairly wide utility, since
they correspond to concepts which are often needed, so they may be re-
garded as fundamental aspects of the techniques for semantic definition
presented in this paper. The effect of using these conventions is to reduce
the number of attributes that are explicitly mentioned and to avoid un-
necessarily long rules.
Now it is a simple matter to present a formal definition of the syntax
and semantics of Turingol.

Nonterminal symbols: P (program), S (statement), L (list of statements),
I (identifier), O (orientation), A (alphabetic character), D (declaration).

Terminal symbols: a b c d e f g h i j k l m n o p q r s t u v w x y z . , : ; " " { } tape
alphabet is print go to if the symbol then move left right one square

Start symbol: P
Attributes:
Name of attribute   Type of value                         Purpose
Q                   Set                                   States of the program
Σ                   Set                                   Symbols of the program
q0                  Element of Q                          Initial state
q∞                  Element of Q                          Final state
δ                   Function from (Q − q∞) × Σ            Transition function
                    into Σ × {−1, 0, +1} × Q
label               Function from strings of              State table for statement
                    letters into elements of Q            labels
symbol              Function from strings of              Symbol table for tape
                    letters into elements of Σ            symbols
follow              Element of Q                          State immediately following
                                                          statement or list of statements
d                   ±1                                    Direction
text                String of letters                     Identifier
start               Element of Q                          State at the beginning of a
                                                          statement or list of statements
                                                          (an inherited attribute).
Productions and semantics: See Table 1.

Notice that two states correspond to each statement S: start (S) is the
state corresponding to the first instruction of the statement (if any), and it
is an inherited attribute of S; follow (S) is the state which "follows" the state-
ment, the state which is normally reached after the statement is executed.
In the case of a "go statement", however, the program does not transfer to
follow (S), since the action of the statement is to change control to another
place; follow (S) may be said to follow statement S "statically" or "textually",
not "dynamically" during a run of the program.
In Table 1, follow (S) is a synthesized attribute; it is possible to give
similar semantic rules in which follow (S) is inherited, although a less effi-
cient program would be obtained for null statements (see Rule 4.4). Simi-
larly, both start (S) and follow (S) could be synthesized attributes, but at the
expense of additional instructions in the Turing machine program for
statement lists (Rule 6.2).
This example would be somewhat simpler if we had used a less standard
definition of Turing machine instructions. The definition we have used
requires reading, printing, and shifting in each instruction, and also makes
the Turing machine into a kind of "one-plus-one-address computer" in
which each instruction specifies the location (state) of the next instruction.

Table 1.

Description     No.     Syntactic rule            Example                       Semantic rules

Letters         1.1     A → a                     a                             text(A) = a.
                ...     (similarly for all letters)
                1.26    A → z                     z                             text(A) = z.
Identifiers     2.1     I → A                     m                             text(I) = text(A).
                2.2     I → I A                   marilyn                       text(I) = text(I) text(A).
Declarations    3.1     D → tape alphabet is I    tape alphabet is marilyn      define symbol(text(I)) = newsymbol;
                                                                                include symbol(text(I)) in Σ.
                3.2     D → D, I                  tape alphabet is marilyn,     define symbol(text(I)) = newsymbol;
                                                  jayne, birgitta               include symbol(text(I)) in Σ.
Print           4.1     S → print "I"             print "jayne"                 define δ(start(S), s) =
statement                                                                         (symbol(text(I)), 0, follow(S))
                                                                                  for all s ∈ Σ;
                                                                                follow(S) = newsymbol;
                                                                                include follow(S) in Q.
Move            4.2     S → move O one square     move left one square          define δ(start(S), s) =
statement                                                                         (s, d(O), follow(S))
                                                                                  for all s ∈ Σ;
                                                                                follow(S) = newsymbol;
                                                                                include follow(S) in Q.
                4.2.1   O → left                  left                          d(O) = −1.
                4.2.2   O → right                 right                         d(O) = +1.
Go statement    4.3     S → go to I               go to boston                  define δ(start(S), s) =
                                                                                  (s, 0, label(text(I)))
                                                                                  for all s ∈ Σ;
                                                                                follow(S) = newsymbol;
                                                                                include follow(S) in Q.
Null statement  4.4     S → (empty)                                             follow(S) = start(S).
Conditional     5.1     S1 → if the tape          if the tape symbol is         define δ(start(S1), s) = (s, 0, follow(S2))
statement               symbol is "I" then S2     "marilyn" then                  for all s ∈ Σ − symbol(text(I));
                                                  print "jayne"                 define δ(start(S1), s) = (s, 0, start(S2))
                                                                                  for s = symbol(text(I));
                                                                                start(S2) = newsymbol;
                                                                                follow(S1) = follow(S2);
                                                                                include start(S2) in Q.
Labeled         5.2     S1 → I: S2                boston: move left             define label(text(I)) = start(S1);
statement                                         one square                    start(S2) = start(S1);
                                                                                follow(S1) = follow(S2).
Compound        5.3     S → {L}                   {print "jayne";;              start(L) = start(S);
statement                                         go to boston}                 follow(S) = follow(L).
List of         6.1     L → S                     print "jayne"                 start(S) = start(L);
statements                                                                      follow(L) = follow(S).
                6.2     L1 → L2; S                print "jayne";;               start(L2) = start(L1);
                                                  go to boston                  follow(L2) = newsymbol;
                                                                                include follow(L2) in Q;
                                                                                start(S) = follow(L2);
                                                                                follow(L1) = follow(S).
Program         7       P → D; L.                 tape alphabet is marilyn,     q0 = newsymbol;
                                                  jayne, birgitta;              include q0 in Q;
                                                  print "jayne".                start(L) = q0;
                                                                                q∞ = follow(L).

The method of defining semantic rules in this example, with an inherited
"start (S)" and a synthesized "follow (S)" attribute, lends itself readily also
to computers or automata in which the (n + 1)st instruction normally is
performed after the n-th. Then (follow (S) − start (S)) would be the number
of instructions "compiled" for statement S.
This definition of Turingol seems to approach the desirable goal of
stating almost exactly the same things which would appear in an informal
programmer's manual explaining the language, except that the description
is completely formal and unambiguous. In other words, this definition
perhaps corresponds to the way we actually understand the language in
our minds. The Definition 4.1 of a print statement, for example, might be
freely rendered in English as follows:
"A statement may have the form
        print "I"
where I is an identifier. This means that, whenever this statement is
executed, the tape symbol on the currently scanned square will be replaced by
the symbol denoted by I, regardless of what symbol was being scanned;
afterwards the program will continue with a new instruction, which is
defined (by other rules) to be the instruction following this statement."
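
The same rule can also be rendered as executable code. The following Python sketch evaluates the semantic rules of Rule 4.1 for a single print statement; it is an illustration rather than part of the definition. The function name print_statement, the use of dictionaries for the symbol table and the transition function, and the string-valued states returned by this newsymbol are all assumptions made for the sketch.

```python
import itertools

_fresh = itertools.count()


def newsymbol():
    # Hypothetical stand-in for convention (3): a distinct value per call.
    return f"q{next(_fresh)}"


def print_statement(start, ident, symbol, sigma, delta, states):
    """Apply the semantic rules of Rule 4.1 to  S -> print "I".

    start  : the inherited attribute start(S)
    ident  : text(I), the identifier between the quotes
    symbol : dict playing the role of the 'symbol' table
    sigma  : the set of tape symbols
    delta  : dict playing the role of the transition function
    states : the set Q
    Returns the synthesized attribute follow(S).
    """
    follow = newsymbol()            # follow(S) = newsymbol
    states.add(follow)              # include follow(S) in Q
    for s in sigma:                 # define delta(start(S), s) for all s
        # Print symbol(text(I)), do not move, continue at follow(S).
        delta[(start, s)] = (symbol[ident], 0, follow)
    return follow
```

Every scanned symbol s leads to the same action, which is exactly the "regardless of what symbol was being scanned" clause of the English rendering.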
5. Discussion. The idea of defining semantics by associating synthe-
sized attributes with each nonterminal symbol, and associating correspond-
ing semantic rules with each production, is due to Irons [6, 7]. Originally
each nonterminal symbol was given exactly one attribute, its "translation".
This idea was applied by Irons and later authors, notably McClure [14], in
the design of "syntax-directed compilers" which translate programming
languages into machine instructions.
As we have observed in Section 2, synthesized attributes alone are (in
principle) sufficient to define any function of a derivation tree. But in
practice, the inclusion of inherited attributes as well as synthesized attri-
butes, as described in this paper, leads to important simplifications. The
definition of Turingol, for example, shows that the agreement between
declaration and use of symbols, and the association of labels to statements,
may be easily treated. "Block structure" is another common aspect of pro-
gramming languages whose definition is greatly facilitated by the use of
inherited attributes. In general, inherited attributes are useful when part
of the meaning of some construction is determined by the context in which
that construction appears. The method of Section 2 shows how both in-
herited and synthesized attributes can be treated formally, and Section 3
shows that it is possible to rule out problems of circularity (which are poten-
tial sources of difficulty when both inherited and synthesized attributes
are mixed).
The principal contributions to formal semantic definition of program-
ming languages, at least those known to the author at the time of writing,
are de Bakker's definition of ALGOL 60 by means of a growing Markovian
algorithm [1]; Landin's definition of ALGOL 60 by means of the λ-calculus
[9, 10, 11] (see also Böhm [2, 3]); McCarthy's definition of Micro-ALGOL by
means of recursive functions applied to the program and to "state vectors"
[12] (see also McCarthy and Painter [13]); Wirth and Weber's definition
of Euler, by means of semantic rules applied as a program is parsed [16];
and the IBM Vienna Laboratory's definition of PL/I [15] based on the
work of McCarthy, Landin, and abstract machines defined by Elgot [4, 5].
The most striking difference between the previous methods and the
definition of Turingol in Table 1 is that the other definitions are processes
which are defined on programs as a whole in a rather intricate manner; it
may be said that a person must understand an entire compiler for the
language before he can understand the definition of the language. This
difficulty is most pronounced in the work of de Bakker, who defines a
machine having approximately 800 instructions, analogous to Markov
algorithms but somewhat more complicated; at each stage of this ma-
chine's computations we are to execute the last applicable instruction, so
we cannot verify that instruction number 100 will be performed until we
can prove to ourselves that the 700 subsequent instructions are inapplic-
able; furthermore, additional instructions are added to the list by the ac-
tions of the machine. It is clearly very difficult for a reader to understand
the workings of such a machine, or to give formal proofs of its important
properties. By contrast, the above definition of Turingol defines each con-
struction of the language only in terms of its "immediate environment",
minimizing to a large extent the interconnections between the definitions
of different parts of the language. The definition of compound statements
and go statements, etc., does not influence the definition of print state-
ments in a substantial way; for example, any of Rules 4.1, 4.2, 4.3, 4.4, 5.1,
5.3 could be deleted and we would obtain a valid definition of another
language. This localization and partitioning of the semantic rules tends to
make the definition easier to understand and more concise.
Although the other authors cited above do not make use of such an
intricately interwoven definition as de Bakker's, the relatively complex
interdependence is still present. For example, consider the formal defini-
tion of Euler given by Wirth and Weber [16, pp. 94-98]; this is a concise
definition of a very sophisticated language, and so it is certainly one of the
most successful formal definitions ever devised. Yet even though Wirth
and Weber tested their definition by means of extensive computer simula-
tion, it is quite probable that their language contains some features which
would surprise its authors. The following Euler program is syntactically
and semantically well-formed, although the label L is never followed by
a colon:
    ⊢ begin label L; new A; A ← 0;
        if false then go to L else L;
        out 1; L; A ← A + 1; out 2;
        if false then go to L else
        if A < 2 then go to L else out 3; L end ⊣

The output of this program is 1, 2, 2, 3! Oversights such as this are not un-
expected when an algorithmic definition of a language is constructed; they
are less likely to occur when the methods of Section 4 are employed.
It appears to be reasonable to assert that none of the previous schemes
for formal definition of semantics could produce a definition of Turingol
that is as brief or as easy to comprehend as the definition given above; and
(although the details have not of course been worked out) it also appears
that ALGOL 60, Euler, Micro-ALGOL, and PL/I can be defined using the
methods of Section 4 in a manner which has advantages over the defini-
tions previously given. But of course the author cannot judge these things
impartially, and more experience is needed before these claims can be
substantiated.
Notice that semantic rules as given in this paper do not depend on any
particular form of syntactic analysis. In fact, they need not even be tied
down to specific forms of the syntax: All that the semantic rules depend on
is the name of the nonterminal symbol on the left of a production and the
names of the nonterminals on the right. Particular punctuation marks, and
the order in which the nonterminals appear on the right-hand side of any
production, are immaterial as far as the semantic rules are concerned.
Thus, the method of semantics considered here blends well with
McCarthy's idea [12, 13] of "abstract syntax".
When a syntax is ambiguous, in the sense that some strings of the
language have more than one derivation tree, the semantic rules give us
one "meaning" for each derivation tree. For example, suppose the rules
    L1 → B L2        v(L1) = 2^{l(L2)} v(B) + v(L2),   l(L1) = l(L2) + 1
are added to grammar (1.3). Then the grammar becomes syntactically
ambiguous; but it still is semantically unambiguous since the attribute
v(N) has the same value over all derivation trees. On the other hand, if we
were to change production 5.2 of Turingol from S → I: S to S → S: I, the
grammar would become syntactically and semantically ambiguous.
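
The claim of semantic unambiguity can be checked mechanically on small inputs. The Python sketch below enumerates every derivation tree of a bit string under the productions L → B, L → L B, and L → B L, and collects the attribute pairs (v, l) that result. It assumes that grammar (1.3), which lies outside this excerpt, uses the rules v(L1) = 2 v(L2) + v(B) and l(L1) = l(L2) + 1 for L1 → L2 B; the function name values is hypothetical.

```python
def values(bits):
    """Return the set of (v, l) pairs over all derivation trees of `bits`
    under the productions L -> B | L B | B L."""
    if len(bits) == 1:
        return {(int(bits), 1)}                  # L -> B: v(L) = v(B), l(L) = 1
    results = set()
    # L1 -> L2 B (assumed rule of grammar (1.3)):
    #   v(L1) = 2 v(L2) + v(B),  l(L1) = l(L2) + 1
    for v2, l2 in values(bits[:-1]):
        results.add((2 * v2 + int(bits[-1]), l2 + 1))
    # L1 -> B L2 (the added rule):
    #   v(L1) = 2^{l(L2)} v(B) + v(L2),  l(L1) = l(L2) + 1
    for v2, l2 in values(bits[1:]):
        results.add((2 ** l2 * int(bits[0]) + v2, l2 + 1))
    return results
```

Although the string 1101 has several derivation trees, values("1101") contains the single pair (13, 4), so every tree assigns the same meaning.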

REFERENCES

[1] J. W. DE BAKKER, Formal definition of programming languages, with an application to the
definition of ALGOL 60, Math. Cent. Tracts 16, Mathematisch Centrum, Amsterdam, 1967.
[2] C. BÖHM, The CUCH as a formal and description language, Formal Language Description
Languages for Computer Programming, pp. 266-294, Proc. IFIP Working Conf.,
Vienna (1964), North Holland, 1966.
[3] CORRADO BÖHM and WOLF GROSS, "Introduction to the CUCH," Automata Theory (ed. by
E. R. Caianiello), pp. 35-65, Academic Press, 1966.
[4] C. C. ELGOT, "Machine species and their computation languages," Formal Language
Description Languages for Computer Programming, pp. 160-179, Proc. IFIP Working
Conf., Vienna (1964), North Holland, 1966.
[5] C. C. ELGOT and A. ROBINSON, "Random-access, stored program machines, an approach
to programming languages," J. ACM 11 (1964), 365-399.
[6] EDGAR T. IRONS, A syntax directed compiler for ALGOL 60, Comm. ACM 4 (1961), 51-55.
[7] EDGAR T. IRONS, Towards more versatile mechanical translators, Proc. Sympos. Appl.
Math., Vol. 15, pp. 41-50, Amer. Math. Soc., Providence, R. I., 1963.
[8] DONALD E. KNUTH, The Art of Computer Programming, I, Addison-Wesley, 1968.
[9] P. J. LANDIN, "The mechanical evaluation of expressions," Comp. J. 6 (1964), 308-320.
[10] P. J. LANDIN, A formal description of ALGOL 60, Formal Language Description Languages
for Computer Programming, pp. 266-294, Proc. IFIP Working Conf., Vienna (1964),
North Holland, 1966.
[11] P. J. LANDIN, A correspondence between ALGOL 60 and Church's lambda notation,
Comm. ACM 8 (1965), 89-101, 158-165.
[12] JOHN MCCARTHY, A formal definition of a subset of ALGOL, Formal Language Description
Languages for Computer Programming, pp. 1-12, Proc. IFIP Working Conf.,
Vienna (1964), North Holland, 1966.
[13] JOHN MCCARTHY and JAMES PAINTER, Correctness of a compiler for arithmetic expressions,
Proc. Sympos. Appl. Math., Vol. 17, to appear, Amer. Math. Soc., Providence,
R. I., 1967.
[14] ROBERT M. MCCLURE, TMG - A syntax directed compiler, Proc. ACM Nat. Conf. 20
(1965), 262-274.
[15] PL/I Definition Group of the Vienna Laboratory, Formal definition of PL/I, IBM Technical
Report TR 25.071 (1966).
[16] NIKLAUS WIRTH and HELMUT WEBER, Euler: A generalization of ALGOL, and its formal
definition, Comm. ACM 9 (1966), 11-23, 89-99, 878.

(Received 15 November 1967)


Interview with Donald E. Knuth

Published on Free Software Magazine (http://www.freesoftwaremagazine.com)



A prime number of questions for the Professor Emeritus of
the Art of Computer Programming
By Gianluca Pignalberi
We all know that the typesetting of Free Software Magazine is entirely TeX-based. Some readers may not
yet know that Prof. Donald Knuth designed TeX, and did so about 30 years ago. Since then the TeX project
has generated a lot of related tools (e.g., LaTeX, ConTeXt, Ω, and others).

This year I had the chance and the honor of interviewing Professor Knuth. I’m proud, as a journalist and
FSM’s TeX-nician, to see it published in what I consider “my magazine”.

Prof. Knuth while reading one of the magazines typeset by his program TeX. Photo by Jill Knuth (she
is a graduate of Flora Stone Mather College (FSM))

Donald E. Knuth, Professor Emeritus of the Art of Computer Programming, Professor of (Concrete)
Mathematics, creator of TeX and METAFONT, author of several fantastic books (such as Computers
and Typesetting, The Art of Computer Programming, Concrete Mathematics) and articles, recipient of the
Turing Award, Kyoto Prize, and other important awards; fellow of the Royal Society (I could keep
going). Is there anything you feel you have wanted to master and haven’t? If so, why?

Thanks for your kind words, but really I’m constantly trying to learn new things in hopes that I can then help
teach them to others. I also wish I was able to understand non-English languages without so much difficulty;
I’m often limited by my linguistic incompetence, and I want to understand people from other cultures and
other eras.

Your algorithms are well known and well documented (I’ll only quote, for brevity’s sake, the
Knuth-Morris-Pratt String Matching algorithm), which allows everyone to use, study and improve
upon them freely. If it wasn’t clear through your actions, in an interview with Dr. Dobb’s Journal, you
stated your opinion about software patents, which are forcing people to pay fees if they either want to
use or modify patented algorithms. Has your opinion on software patents changed or strengthened? If
so, how? And how do you view the EU parliament’s wishes to adopt software patent laws?

I mention patents in several parts of The Art of Computer Programming. For example, when discussing one of
the first sorting methods to be patented, I say this:

Alas, we have reached the end of the era when the joy of discovering a new algorithm was
satisfaction enough! Fortunately the oscillating sort isn’t especially good; let’s hope that
community-minded folks who invent the best algorithms continue to make their ideas freely
available.

I don’t have time to follow current developments in the patent scene; but I fear that things continue to get
worse. I don’t think I would have been able to create TeX if the present climate had existed in the 1970s.

On my recent trip to Europe, people told me that the EU had wisely decided not to issue software patents. But
a day or two before I left, somebody said the politicians in Brussels had suddenly reversed that decision. I
hope that isn’t true, because I think today’s patent policies stifle innovation.

However, I am by no means an expert on such things; I’m just a scientist who writes about programming.

The portion of the paper where Knuth answered question 2 of this interview. Please note that he
typographically quotes and typesets even when writing by hand: the quotation from TAOCP and the
word TeX

So far you have written three volumes of The Art of Computer Programming, are working on the fourth,
are hoping to finish the fifth volume by 2010, and still plan to write volumes six and seven. Apart from
the Selected papers series, are there any other topics you feel you should write essays on, but haven’t got
time for? If so, can you summarize what subject you would write on?

I’m making slow but steady progress on volumes 4 and 5. I also have many notes for volumes 6 and 7, but
those books deal with less fundamental topics and I might find that there is little need for those books when I
get to that point.

I fear about 20 years of work are needed before I can bring TAOCP to a successful conclusion; and I’m 67
years old now; so I fondly hope that I will remain healthy and able to do a good job even as I get older and
more decrepit. Thankfully, at the moment I feel as good as ever.

If I have time for anything else I would like to compose some music. Of course I don’t know if that would be
successful; I would keep it to myself if it isn’t any good. Still, I get an urge every once in a while to try, and
computers are making such things easier.

There are rumours that you started the TeX project because you were tired of seeing your manuscripts
mistreated by the American Mathematical Society. At the same time, you stated that you created it
after seeing the proofs of your book The Art of Computer Programming. Please, tell our readers briefly
what made you decide to start the project, which tools you used, and how many people you had at the
core of the TeX team.

No, the math societies weren’t to blame for the sorry state of typography in 1975. It was the fact that the
printing industry had switched to new methods, and the new methods were designed to be fine for magazines
and newspapers and novels but not for science. Scientists didn’t have any economic clout, so nobody cared if
our books and papers looked good or bad.

I tell the whole story in Chapter 1 of my book Digital Typography, which of course is a book I hope
everybody will read and enjoy.

The tools I used were home grown and became known as Literate Programming. I am enormously biased
about Literate Programming, which is surely the greatest thing since sliced bread. I continue to use it to write
programs almost every day, and it helps me get efficient, robust, maintainable code much more successfully
than any other way I know. Of course, I realize that other people might find other approaches more to their
liking; but wow, I love the tools I’ve got now. I couldn’t have written difficult programs like the MMIX
meta-simulator at all if I hadn’t had Literate Programming; the task would have been too difficult.

At the core of the TeX team I had assistants who read the code I wrote, and who prepared printer drivers and
interfaces and ported to other systems. I had two students who invented algorithms for hyphenation and line
breaking. And I had many dozens of volunteers who met every Friday for several hours to help me make
decisions. But I wrote every line of code myself.

Chapter 10 of my book Literate Programming explains why I think a first-generation project like this would
have flopped if I had tried to delegate the work.

Portrait of Donald E. Knuth by Alexandra Drofina. Commercial users should write to Yura Revich
(revich@computerra.ru) for permission

Maybe you feel that some of today’s technologies are still unsatisfactory. If you weren’t busy writing
your masterpieces, what technology would you try to revolutionize and in what way?


Well certainly I would try to work for world peace and justice. I tend to think of myself as a citizen of the
world; I am pleasantly excited when I see the world getting smaller and people of different cultures working
together and respecting their differences. Conversely I am distressed when I learn about deep-seated hatred or
when I see people exploiting others or shoving them around pre-emptively.

In what way could the desired revolution come about? Who knows… but I suspect that “Engineers Without
Borders” are closer than anybody else to a working strategy by which geeks like me could help.

Thank you again for your precious time.

Thank you for posing excellent questions!

Biography
Gianluca Pignalberi is Free Software Magazine's Compositor.

Copyright information
This article is made available under the "Attribution-NonCommercial-NoDerivs" Creative Commons License
3.0 available from http://creativecommons.org/licenses/by-nc-nd/3.0/.

Source URL:
http://www.freesoftwaremagazine.com/articles/interview_knuth

Dancing Links
Donald E. Knuth, Stanford University
arXiv:cs/0011047v1 [cs.DS] 15 Nov 2000

My purpose is to discuss an extremely simple technique that deserves to be better known.
Suppose x points to an element of a doubly linked list; let L[x] and R[x] point to the
predecessor and successor of that element. Then the operations

        L[R[x]] ← L[x],    R[L[x]] ← R[x]        (1)

remove x from the list; every programmer knows this. But comparatively few programmers
have realized that the subsequent operations

        L[R[x]] ← x,    R[L[x]] ← x        (2)

will put x back into the list again.
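
Operations (1) and (2) can be tried out directly. The following Python sketch stores the links in two arrays L and R, as in the notation above; the circular layout and the helper names make_circular_list, remove, restore, and traverse are assumptions chosen for the illustration.

```python
def make_circular_list(n):
    """Nodes 0..n-1 in a circular doubly linked list, stored as arrays."""
    L = [(i - 1) % n for i in range(n)]
    R = [(i + 1) % n for i in range(n)]
    return L, R


def remove(L, R, x):
    # Operation (1): unlink x; L[x] and R[x] themselves are left untouched.
    L[R[x]] = L[x]
    R[L[x]] = R[x]


def restore(L, R, x):
    # Operation (2): relink x using the stale pointers it still holds.
    L[R[x]] = x
    R[L[x]] = x


def traverse(R, start):
    """List the nodes reachable from `start` by following R links."""
    out, i = [start], R[start]
    while i != start:
        out.append(i)
        i = R[i]
    return out
```

Removing nodes 2 and then 3 from a five-element list leaves 0, 1, 4; restoring them in the opposite order, 3 and then 2, brings back the original list. The restorations must undo the removals in reverse order, which is what a backtrack program does naturally.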


This fact is, of course, obvious, once it has been pointed out. Yet I remember feeling
a definite sense of “Aha!” when I first realized that (2) would work, because the values of
L[x] and R[x] no longer have their former semantic significance after x has been removed
from its list. Indeed, a tidy programmer might want to clean up the data structure by
setting L[x] and R[x] both equal to x, or to some null value, after x has been deleted.
Danger sometimes lurks when objects are allowed to point into a list from the outside;
such pointers can, for example, interfere with garbage collection.
Why, therefore, am I sufficiently fond of operation (2) that I am motivated to write an
entire paper about it? The element denoted by x has been deleted from its list; why would
anybody want to put it back again? Well, I admit that updates to a data structure are
usually intended to be permanent. But there are also many occasions when they are not.
For example, an interactive program may need to revert to a former state when the user
wants to undo an operation or a sequence of operations. Another typical application arises
in backtrack programs [16], which enumerate all solutions to a given set of constraints.
Backtracking, also called depth-first search, will be the focus of the present paper.
The idea of (2) was introduced in 1979 by Hitotumatu and Noshita [22], who showed
that it makes Dijkstra’s well-known program for the N queens problem [6, pages 72–82]
run nearly twice as fast without making the program significantly more complicated.
Floyd’s elegant discussion of the connection between backtracking and nondeterminis-
tic algorithms [11] includes a precise method for updating data structures before choosing
between alternative lines of computation, and for downdating the data when it is time to
explore another line. In general, the key problem of backtrack programming can be re-
garded as the task of deciding how to narrow the search and at the same time to organize
the data that controls the decisions. Each step in the solution to a multistep problem
changes the remaining problem to be solved.
In simple situations we can simply maintain a stack that contains snapshots of the
relevant state information at all ancestors of the current node in the search tree. But the
task of copying the entire state at each level might take too much time. Therefore we often
need to work with global data structures, which are modified whenever the search enters
a new level and restored when the search returns to a previous level.

For example, Dijkstra’s recursive procedure for the queens problem kept the current
state in three global Boolean arrays, representing the columns, the diagonals, and the
reverse diagonals of a chessboard; Hitotumatu and Noshita’s program kept it in a doubly
linked list of available columns together with Boolean arrays for both kinds of diagonals.
When Dijkstra tentatively placed a queen, he changed one entry of each Boolean array
from true to false; then he made the entry true again when backtracking. Hitotumatu
and Noshita used (1) to remove a column and (2) to restore it again; this meant that they
could find an empty column without having to search for it. Each program strove to record
the state information in such a way that the placing and subsequent unplacing of a queen
would be efficient.
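
The Boolean-array bookkeeping can be sketched in a few lines of Python. This is written in the spirit of the description above and is not a transcription of Dijkstra's program; the function name count_queens and the indexing of the two diagonal arrays are assumptions.

```python
def count_queens(n):
    """Count placements of n nonattacking queens, one queen per row."""
    col_free = [True] * n
    diag_free = [True] * (2 * n - 1)      # indexed by row + col
    anti_free = [True] * (2 * n - 1)      # indexed by row - col + n - 1

    def place(row):
        if row == n:
            return 1
        total = 0
        for col in range(n):
            d, a = row + col, row - col + n - 1
            if col_free[col] and diag_free[d] and anti_free[a]:
                # Tentatively place a queen: one entry of each array
                # changes from true to false.
                col_free[col] = diag_free[d] = anti_free[a] = False
                total += place(row + 1)
                # Backtrack: make the same entries true again.
                col_free[col] = diag_free[d] = anti_free[a] = True
        return total

    return place(0)
```

Note that the inner loop must scan all n columns to find a free one; keeping the free columns in a doubly linked list, updated by (1) and (2), removes that search.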
The beauty of (2) is that operation (1) can be undone by knowing only the value of x.
General schemes for undoing assignments require us to record the identity of the left-hand
side together with its previous value (see [11]; see also [25], pages 268–284). But in this
case only the single quantity x is needed, and backtrack programs often know the value
of x implicitly as a byproduct of their normal operation.
We can apply (1) and (2) repeatedly in complex data structures that involve large
numbers of interacting doubly linked lists. The program logic that traverses those lists
and decides what elements should be deleted can often be run in reverse, thereby deciding
what elements should be undeleted. And undeletion restores links that allow us to continue
running the program logic backwards until we’re ready to go forward again.
This process causes the pointer variables inside the global data structure to execute an
exquisitely choreographed dance; hence I like to call (1) and (2) the technique of dancing
links.
The exact cover problem. One way to illustrate the power of dancing links is to consider
a general problem that can be described abstractly as follows: Given a matrix of 0s and
1s, does it have a set of rows containing exactly one 1 in each column?
For example, the matrix

        0 0 1 0 1 1 0
        1 0 0 1 0 0 1
        0 1 1 0 0 1 0
        1 0 0 1 0 0 0        (3)
        0 1 0 0 0 0 1
        0 0 0 1 1 0 1

has such a set (rows 1, 4, and 5). We can think of the columns as elements of a universe,
and the rows as subsets of the universe; then the problem is to cover the universe with
disjoint subsets. Or we can think of the rows as elements of a universe, and the columns as
subsets of that universe; then the problem is to find a collection of elements that intersect
each subset in exactly one point. Either way, it’s a potentially tough problem, well known
to be NP-complete even when each row contains exactly three 1s [13, page 221]. And it is
a natural candidate for backtracking.
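
Checking a proposed solution is easy even though finding one may be hard. The Python sketch below verifies that rows 1, 4, and 5 of matrix (3) contain exactly one 1 in each column; the names MATRIX and is_exact_cover are hypothetical.

```python
MATRIX = [
    [0, 0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 0, 1],
]


def is_exact_cover(matrix, rows):
    """True if the chosen rows (numbered from 1) have exactly one 1 per column."""
    width = len(matrix[0])
    return all(
        sum(matrix[r - 1][j] for r in rows) == 1
        for j in range(width)
    )
```

Here is_exact_cover(MATRIX, [1, 4, 5]) holds, while a choice such as rows 2 and 4 fails because both rows have a 1 in the first column.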
Dana Scott conducted one of the first experiments on backtrack programming in 1958,
when he was a graduate student at Princeton University [34]. His program, written for the
IAS “MANIAC” computer with the help of Hale F. Trotter, produced the first listing of all
ways to place the 12 pentominoes into a chessboard leaving the center four squares vacant.
For example, one of the 65 solutions is shown in Figure 1. (Pentominoes are the case n = 5
of n-ominoes, which are connected n-square subsets of an infinite board; see [15]. Scott
was probably inspired by Golomb’s paper [14] and some extensions reported by Martin
Gardner [12].)

Figure 1. Scott’s pentomino problem.

This problem is a special case of the exact cover problem. Imagine a matrix that
has 72 columns, one for each of the 12 pentominoes and one for each of the 60 cells of
the chessboard-minus-its-center. Construct all possible rows representing a way to place
a pentomino on the board; each row contains a 1 in the column identifying the piece, and
five 1s in the columns identifying its positions. (There are exactly 1568 such rows.) We can
name the first twelve columns F I L P N T U V W X Y Z, following Golomb’s recommended
names for the pentominoes [15, page 7], and we can use two digits ij to name the column
corresponding to rank i and file j of the board; each row is conveniently represented by
giving the names of the columns where 1s appear. For example, Figure 1 is the exact cover
corresponding to the twelve rows
I 11 12 13 14 15
N 16 26 27 37 47
L 17 18 28 38 48
U 21 22 31 41 42
X 23 32 33 34 43
W 24 25 35 36 46
P 51 52 53 62 63
F 56 64 65 66 75
Z 57 58 67 76 77
T 61 71 72 73 81
V 68 78 86 87 88
Y 74 82 83 84 85 .
Solving an exact cover problem. The following nondeterministic algorithm, which I
will call algorithm X for lack of a better name, finds all solutions to the exact cover problem
defined by any given matrix A of 0s and 1s. Algorithm X is simply a statement of the
obvious trial-and-error approach. (Indeed, I can’t think of any other reasonable way to do
the job, in general.)

If A is empty, the problem is solved; terminate successfully.
Otherwise choose a column, c (deterministically).
Choose a row, r, such that A[r, c] = 1 (nondeterministically).
Include r in the partial solution.
For each j such that A[r, j] = 1,
    delete column j from matrix A;
    for each i such that A[i, j] = 1,
        delete row i from matrix A.
Repeat this algorithm recursively on the reduced matrix A.

The nondeterministic choice of r means that the algorithm essentially clones itself into
independent subalgorithms; each subalgorithm inherits the current matrix A, but reduces
it with respect to a different row r. If column c is entirely zero, there are no subalgorithms
and the process terminates unsuccessfully.
The subalgorithms form a search tree in a natural way, with the original problem at
the root and with level k containing each subalgorithm that corresponds to k chosen rows.
Backtracking is the process of traversing the tree in preorder, “depth first.”
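The trial-and-error procedure can be sketched directly with sets, before any linked lists enter the picture. This is only an illustration in Python, not the implementation discussed later: the function name algorithm_x is hypothetical, the column rule (alphabetically first) is one arbitrary deterministic choice, and the six rows are assumed to be those of matrix (3), which is defined earlier in the paper.

```python
def algorithm_x(rows, columns, chosen=()):
    """Yield every exact cover of `columns` as a list of row keys.

    rows    -- dict mapping a row key to the set of columns where the row has a 1
    columns -- set of columns still present in the reduced matrix
    """
    if not columns:                      # A is empty: the problem is solved
        yield list(chosen)
        return
    c = min(columns)                     # deterministic choice of a column
    for r, row in rows.items():          # each r with A[r, c] = 1 is one branch
        if c not in row:
            continue
        # Delete every column j with A[r, j] = 1, and every row i that
        # has a 1 in any of those columns (this includes r itself).
        reduced_columns = columns - row
        reduced_rows = {i: other for i, other in rows.items()
                        if not (other & row)}
        yield from algorithm_x(reduced_rows, reduced_columns, chosen + (r,))

# Assumed contents of matrix (3), one set of column names per row.
MATRIX_3 = {
    1: {"C", "E", "F"},
    2: {"A", "D", "G"},
    3: {"B", "C", "F"},
    4: {"A", "D"},
    5: {"B", "G"},
    6: {"D", "E", "G"},
}

solutions = list(algorithm_x(MATRIX_3, set("ABCDEFG")))
print(solutions)   # [[4, 5, 1]]
```

Each recursive call copies the reduced matrix, which is exactly the cost that the linked representation described below is designed to avoid.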
Any systematic rule for choosing column c in this procedure will find all solutions,
but some rules work much better than others. For example, Scott [34] said that his initial
inclination was to place the first pentomino first, then the second pentomino, and so on;
this would correspond to choosing column F first, then column I, etc., in the corresponding
exact cover problem. But he soon realized that such an approach would be hopelessly slow:
There are 192 ways to place the F, and for each of these there are approximately 34 ways
to place the I. The Monte Carlo estimation procedure described in [24] suggests that the
search tree for this scheme has roughly 2 × 10^12 nodes! By contrast, the alternative of
choosing column 11 first (the column corresponding to rank 1 and file 1 of the board),
and in general choosing the lexicographically first uncovered column, leads to a search tree
with 9,015,751 nodes.
Even better is the strategy that Scott finally adopted [34]: He realized that piece X
has only 3 essentially different positions, namely centered at 23, 24, and 33. Furthermore,
if the X is at 33, we can assume that the P pentomino is not “turned over,” so that it takes
only four of its eight orientations. Then we get each of the 65 essentially different solutions
exactly once, and the full set of 8 × 65 = 520 solutions is easily obtained by rotation and
reflection. These constraints on X and P lead to three independent problems, with

    103,005 nodes and 19 solutions (X at 23);
    106,232 nodes and 20 solutions (X at 24);
    126,636 nodes and 26 solutions (X at 33, P not flipped),
when columns are chosen lexicographically.
Golomb and Baumert [16] suggested choosing, at each stage of a backtrack procedure,
a subproblem that leads to the fewest branches, whenever this can be done efficiently. In
the case of an exact cover problem, this means that we want to choose at each stage a
column with fewest 1s in the current matrix A. Fortunately we will see that the technique
of dancing links allows us to do this quite nicely; the search trees for Scott’s pentomino
problem then have only
10,421 nodes (X at 23);
12,900 nodes (X at 24);
14,045 nodes (X at 33, P not flipped),
respectively.
The dance steps. One good way to implement algorithm X is to represent each 1 in the
matrix A as a data object x with five fields L[x], R[x], U[x], D[x], C[x]. Rows of the matrix
are doubly linked as circular lists via the L and R fields (“left” and “right”); columns are
doubly linked as circular lists via the U and D fields (“up” and “down”). Each column
list also includes a special data object called its list header.
The list headers are part of a larger object called a column object. Each column ob-
ject y contains the fields L[y], R[y], U[y], D[y], and C[y] of a data object and two additional
fields, S[y] (“size”) and N[y] (“name”); the size is the number of 1s in the column, and the
name is a symbolic identifier for printing the answers. The C field of each object points
to the column object at the head of the relevant column.
The L and R fields of the list headers link together all columns that still need to be
covered. This circular list also includes a special column object called the root, h, which
serves as a master header for all the active headers. The fields U[h], D[h], C[h], S[h], and
N[h] are not used.
For example, the 0-1 matrix of (3) would be represented by the objects shown in
Figure 2, if we name the columns A, B, C, D, E, F, and G. (This diagram “wraps around”
toroidally at the top, bottom, left, and right. The C links are not shown because they
would clutter up the picture; each C field points to the topmost element in its column.)
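One possible way to lay out these fields is as parallel arrays indexed by node number. The sketch below is an assumption about representation, not the paper's code: the function name build, the choice of node 0 for the root h, and the numbering of headers as 1..n are all illustrative, and the six rows are again assumed to be those of matrix (3).

```python
def build(column_names, rows):
    """Build the four-way-linked structure as parallel arrays.

    Node 0 is the root h, nodes 1..n are the column headers, and the
    remaining nodes are the 1s of the matrix, numbered row by row.
    """
    n = len(column_names)
    size = 1 + n + sum(len(row) for row in rows)
    L, R = list(range(size)), list(range(size))
    U, D = list(range(size)), list(range(size))
    C = list(range(size))
    S = [0] * (n + 1)                  # sizes, meaningful for headers only
    N = [None] + list(column_names)    # names, meaningful for headers only
    for y in range(n + 1):             # circular list of the root and the headers
        L[y], R[y] = (y - 1) % (n + 1), (y + 1) % (n + 1)
    where = {name: y for y, name in enumerate(column_names, start=1)}
    x = n + 1
    for row in rows:
        first = x
        for name in row:
            c = where[name]
            C[x] = c
            U[x], D[x] = U[c], c       # append x at the bottom of column c
            D[U[c]] = x
            U[c] = x
            L[x], R[x] = L[first], first   # append x at the right end of its row
            R[L[first]] = x
            L[first] = x
            S[c] += 1
            x += 1
    return L, R, U, D, C, S, N

# Assumed rows of matrix (3), each written as a string of column names.
ROWS = ["CEF", "ADG", "BCF", "AD", "BG", "DEG"]
L, R, U, D, C, S, N = build("ABCDEFG", ROWS)

names, y = [], R[0]
while y != 0:                          # walk the header list starting from the root
    names.append(N[y])
    y = R[y]
print(names, S[1:])
```

Walking right from the root visits the headers A through G, and the sizes come out as 2, 2, 2, 3, 2, 2, 3, in agreement with the numbers in Figure 2.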
Our nondeterministic algorithm to find all exact covers can now be cast in the following
explicit, deterministic form as a recursive procedure search(k), which is invoked initially
with k = 0:
If R[h] = h, print the current solution (see below) and return.
Otherwise choose a column object c (see below).
Cover column c (see below).
For each r ← D[c], D[D[c]], . . . , while r ≠ c,
    set O_k ← r;
    for each j ← R[r], R[R[r]], . . . , while j ≠ r,
        cover column j (see below);
    search(k + 1);
    set r ← O_k and c ← C[r];
    for each j ← L[r], L[L[r]], . . . , while j ≠ r,
        uncover column j (see below).
Uncover column c (see below) and return.
The operation of printing the current solution is easy: We successively print the rows
containing O_0, O_1, . . . , O_{k−1}, where the row containing data object O is printed by
printing N[C[O]], N[C[R[O]]], N[C[R[R[O]]]], etc.

[Diagram: root h and column headers A B C D E F G, with sizes 2 2 2 3 2 2 3.]

Figure 2. Four-way-linked representation of the exact cover problem (3).

To choose a column object c, we could simply set c ← R[h]; this is the leftmost
uncovered column. Or if we want to minimize the branching factor, we could set s ← ∞
and then
    for each j ← R[h], R[R[h]], . . . , while j ≠ h,
        if S[j] < s set c ← j and s ← S[j].
Then c is a column with the smallest number of 1s. (The S fields are not needed unless
we want to minimize branching in this way.)
The operation of covering column c is more interesting: It removes c from the header
list and removes all rows in c’s own list from the other column lists they are in.
   
Set L[R[c]] ← L[c] and R[L[c]] ← R[c].
For each i ← D[c], D[D[c]], . . . , while i ≠ c,
    for each j ← R[i], R[R[i]], . . . , while j ≠ i,
        set U[D[j]] ← U[j], D[U[j]] ← D[j],
        and set S[C[j]] ← S[C[j]] − 1.
Operation (1), which I mentioned at the outset of this paper, is used here to remove objects
in both the horizontal and vertical directions.
Finally, we get to the point of this whole algorithm, the operation of uncovering a given
column c. Here is where the links do their dance:
 
For each i ← U[c], U[U[c]], . . . , while i ≠ c,
    for each j ← L[i], L[L[i]], . . . , while j ≠ i,
        set S[C[j]] ← S[C[j]] + 1,
        and set U[D[j]] ← j, D[U[j]] ← j.
Set L[R[c]] ← c and R[L[c]] ← c.
Notice that uncovering takes place in precisely the reverse order of the covering operation,
using the fact that (2) undoes (1). (Actually we need not adhere so strictly to the principle
of “last done, first undone” in this case, since j could run through row i in any order. But
we must be careful to unremove the rows from bottom to top, because we removed them
from top to bottom. Similarly, it is important to uncover the columns of row r from right
to left, because we covered them from left to right.)
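The covering and uncovering operations can be tried out on a small instance to see that the second exactly undoes the first. The following Python sketch makes several assumptions that are not in the paper: the class name Links, the storage of fields in parallel arrays with node 0 as the root h, and the use of the presumed rows of matrix (3) as test data.

```python
class Links:
    """Four-way-linked 0-1 matrix in parallel arrays; node 0 is the root h."""

    def __init__(self, names, rows):
        n = len(names)
        size = 1 + n + sum(len(row) for row in rows)
        self.L, self.R = list(range(size)), list(range(size))
        self.U, self.D = list(range(size)), list(range(size))
        self.C = list(range(size))
        self.S = [0] * (n + 1)
        for y in range(n + 1):                      # ring of root and headers
            self.L[y], self.R[y] = (y - 1) % (n + 1), (y + 1) % (n + 1)
        where = {name: y for y, name in enumerate(names, start=1)}
        x = n + 1
        for row in rows:
            first = x
            for name in row:
                c = where[name]
                self.C[x] = c
                self.U[x], self.D[x] = self.U[c], c          # bottom of column c
                self.D[self.U[c]] = x
                self.U[c] = x
                self.L[x], self.R[x] = self.L[first], first  # right end of row
                self.R[self.L[first]] = x
                self.L[first] = x
                self.S[c] += 1
                x += 1

    def cover(self, c):
        L, R, U, D = self.L, self.R, self.U, self.D
        L[R[c]] = L[c]                 # remove c from the header list
        R[L[c]] = R[c]
        i = D[c]
        while i != c:                  # rows of c, from top to bottom
            j = R[i]
            while j != i:              # other 1s of row i, from left to right
                U[D[j]] = U[j]
                D[U[j]] = D[j]
                self.S[self.C[j]] -= 1
                j = R[j]
            i = D[i]

    def uncover(self, c):
        L, R, U, D = self.L, self.R, self.U, self.D
        i = U[c]
        while i != c:                  # rows of c, from bottom to top
            j = L[i]
            while j != i:              # other 1s of row i, from right to left
                self.S[self.C[j]] += 1
                U[D[j]] = j
                D[U[j]] = j
                j = L[j]
            i = U[i]
        L[R[c]] = c                    # put c back into the header list
        R[L[c]] = c

def snapshot(m):
    return (m.L[:], m.R[:], m.U[:], m.D[:], m.S[:])

m = Links("ABCDEFG", ["CEF", "ADG", "BCF", "AD", "BG", "DEG"])  # assumed matrix (3)
before = snapshot(m)
m.cover(1)                             # header 1 is column A
sizes_after_cover = m.S[1:]
m.uncover(1)
print(sizes_after_cover, snapshot(m) == before)
```

With this data the sizes after covering A are 2, 2, 2, 1, 2, 2, 2, and the snapshot comparison reports that every link and size has returned to its original value.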

[Diagram: root h and column headers A B C D E F G, with sizes 2 2 2 1 2 2 2.]

Figure 3. The links after column A in Figure 2 has been covered.

Consider, for example, what happens when search(0) is applied to the data of (3) as
represented by Figure 2. Column A is covered by removing both of its rows from their
other columns; the structure now takes the form of Figure 3. Notice the asymmetry of the
links that now appear in column D: The upper element was deleted first, so it still points to
its original neighbors, but the other deleted element points upward to the column header.
Continuing search(0), when r points to the A element of row (A, D, G), we also cover
columns D and G. Figure 4 shows the status as we enter search(1); this data structure
represents the reduced matrix
        B C E F
      ( 0 1 1 1 )
      ( 1 1 0 1 ) .        (4)

Now search(1) will cover column B, and there will be no 1s left in column E. So
search(2) will find nothing. Then search(1) will return, having found no solutions, and
the state of Figure 4 will be restored. The outer level routine, search(0), will proceed to
convert Figure 4 back to Figure 3, and it will advance r to the A element of row (A, D).

[Diagram: root h and column headers A B C D E F G, with sizes 2 1 2 1 1 2 1.]

Figure 4. The links after columns D and G in Figure 3 have been covered.

Soon the solution will be found. It will be printed as

A D
B G
C E F

if the S fields are ignored in the choice of c, or as

A D
E F C
B G

if the shortest column is chosen at each step. (The first item printed in each row list is the
name of the column on which branching was done.) Readers who play through the action
of this algorithm on some examples will understand why I chose the title of this paper.
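Readers who would rather let a machine play through the action can use something like the following Python sketch of the whole procedure. It is an illustration under stated assumptions, not the program used for the experiments in this paper: the class name DLX, the array layout with node 0 as the root h, the use_sizes flag, and the rows taken for matrix (3) are all hypothetical. Where the procedure says "cover column j" for a data object j, the sketch covers the column C[j].

```python
class DLX:
    """Sketch of the search procedure on parallel arrays; node 0 is the root h."""

    def __init__(self, names, rows):
        n = len(names)
        size = 1 + n + sum(len(row) for row in rows)
        self.L, self.R = list(range(size)), list(range(size))
        self.U, self.D = list(range(size)), list(range(size))
        self.C = list(range(size))
        self.S = [0] * (n + 1)
        self.N = [None] + list(names)
        for y in range(n + 1):                      # ring of root and headers
            self.L[y], self.R[y] = (y - 1) % (n + 1), (y + 1) % (n + 1)
        where = {name: y for y, name in enumerate(names, start=1)}
        x = n + 1
        for row in rows:
            first = x
            for name in row:
                c = where[name]
                self.C[x] = c
                self.U[x], self.D[x] = self.U[c], c          # bottom of column c
                self.D[self.U[c]] = x
                self.U[c] = x
                self.L[x], self.R[x] = self.L[first], first  # right end of row
                self.R[self.L[first]] = x
                self.L[first] = x
                self.S[c] += 1
                x += 1

    def cover(self, c):
        L, R, U, D = self.L, self.R, self.U, self.D
        L[R[c]] = L[c]
        R[L[c]] = R[c]
        i = D[c]
        while i != c:
            j = R[i]
            while j != i:
                U[D[j]] = U[j]
                D[U[j]] = D[j]
                self.S[self.C[j]] -= 1
                j = R[j]
            i = D[i]

    def uncover(self, c):
        L, R, U, D = self.L, self.R, self.U, self.D
        i = U[c]
        while i != c:
            j = L[i]
            while j != i:
                self.S[self.C[j]] += 1
                U[D[j]] = j
                D[U[j]] = j
                j = L[j]
            i = U[i]
        L[R[c]] = c
        R[L[c]] = c

    def choose(self, use_sizes):
        c = j = self.R[0]
        while use_sizes and j != 0:                 # leftmost column of minimum size
            if self.S[j] < self.S[c]:
                c = j
            j = self.R[j]
        return c

    def row(self, x):
        names, j = [self.N[self.C[x]]], self.R[x]
        while j != x:
            names.append(self.N[self.C[j]])
            j = self.R[j]
        return names

    def search(self, use_sizes=True, O=()):
        if self.R[0] == 0:
            yield [self.row(x) for x in O]
            return
        c = self.choose(use_sizes)
        self.cover(c)
        r = self.D[c]
        while r != c:
            j = self.R[r]
            while j != r:
                self.cover(self.C[j])
                j = self.R[j]
            yield from self.search(use_sizes, O + (r,))
            j = self.L[r]
            while j != r:
                self.uncover(self.C[j])
                j = self.L[j]
            r = self.D[r]
        self.uncover(c)

ROWS = ["CEF", "ADG", "BCF", "AD", "BG", "DEG"]     # assumed rows of matrix (3)
leftmost = list(DLX("ABCDEFG", ROWS).search(use_sizes=False))
shortest = list(DLX("ABCDEFG", ROWS).search(use_sizes=True))
print(leftmost)   # [[['A', 'D'], ['B', 'G'], ['C', 'E', 'F']]]
print(shortest)   # [[['A', 'D'], ['E', 'F', 'C'], ['B', 'G']]]
```

The generator must be run to exhaustion (as list() does here) if the links are to be restored afterwards; abandoning it early leaves some columns covered.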
Efficiency considerations. When algorithm X is implemented in terms of dancing links,
let’s call it algorithm DLX. The running time of algorithm DLX is essentially proportional
to the number of times it applies operation (1) to remove an object from a list; this is also
the number of times it applies operation (2) to unremove an object. Let’s say that this
quantity is the number of updates. A total of 28 updates are performed during the solution
of (3) if we repeatedly choose the shortest column: 10 updates are made on level 0, 14 on
level 1, and 4 on level 2. Alternatively, if we ignore the S heuristic, the algorithm makes
16 updates on level 1 and 7 updates on level 2, for a total of 33. But in the latter case
each update will go noticeably faster, since the statements S[C[j]] ← S[C[j]] ± 1 can
be omitted; hence the overall running time will probably be less. Of course we need to
study larger examples before drawing any general conclusions about the desirability of the
S heuristic.

Figure 5. The search tree for one case of Scott’s pentomino problem.

A backtrack program usually spends most of its time on only a few levels of the search
tree (see [24]). For example, Figure 5 shows the search tree for the case X = 23 of Dana
Scott’s pentomino problem using the S heuristic; it has the following profile:
Level     Nodes             Updates        Updates per node
   0         1 (  0%)         2,031 (  0%)      2031.0
   1         2 (  0%)         1,676 (  0%)       838.0
   2        22 (  0%)        28,492 (  1%)      1295.1
   3        77 (  1%)        77,687 (  2%)      1008.9
   4       219 (  2%)       152,957 (  4%)       698.4
   5       518 (  5%)       367,939 ( 10%)       710.3
   6     1,395 ( 13%)       853,788 ( 24%)       612.0
   7     2,483 ( 24%)       941,265 ( 26%)       379.1
   8     2,574 ( 25%)       740,523 ( 20%)       287.7
   9     2,475 ( 24%)       418,334 ( 12%)       169.0
  10       636 (  6%)        32,205 (  1%)        50.6
  11        19 (  0%)           826 (  0%)        43.5
Total   10,421 (100%)     3,617,723 (100%)       347.2
(The number of updates shown for level k is the number of times an element was removed
from a doubly linked list during the calculations between levels k − 1 and k. The 2,031 up-
dates on level 0 correspond to removing column X from the header list and then removing
2030/5 = 406 rows from their other columns; these are the rows that overlap with the
placement of X at 23. A slight optimization was made when tabulating this data: Col-
umn c was not covered and uncovered in trivial cases when it contained no rows.) Notice
that more than half of the nodes lie on levels ≥ 8, but more than half of the updates occur
on the way to level 7. Extra work on the lower levels has reduced the need for hard work
at the higher levels.
The corresponding statistics look like this when the same problem is run without the
ordering heuristic based on S fields:
Level      Nodes              Updates        Updates per node
   0          1 (  0%)          2,031 (  0%)      2031.0
   1          6 (  0%)          5,606 (  0%)       934.3
   2         24 (  0%)         30,111 (  0%)      1254.6
   3        256 (  0%)        249,904 (  1%)       976.2
   4        581 (  1%)        432,471 (  2%)       744.4
   5      1,533 (  1%)      1,256,556 (  7%)       819.7
   6      3,422 (  3%)      2,290,338 ( 13%)       669.3
   7     10,381 ( 10%)      4,442,572 ( 25%)       428.0
   8     26,238 ( 25%)      5,804,161 ( 33%)       221.2
   9     46,609 ( 45%)      3,006,418 ( 17%)        64.5
  10     13,935 ( 14%)        284,459 (  2%)        20.4
  11         19 (  0%)         14,125 (  0%)       743.4
Total   103,005 (100%)     17,818,752 (100%)       173.0
Each update involves about 14 memory accesses when the S heuristic is used, and about
8 accesses when S is ignored. Thus the S heuristic multiplies the total number of memory
accesses by a factor of approximately (14 × 3,617,723)/(8 × 17,818,752) ≈ 36% in this
example. The heuristic is even more effective in larger problems, because it tends to
reduce the total number of nodes by a factor that is exponential in the number of levels
while the cost of applying it grows only linearly.
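The 36% figure follows from the two update totals and the two per-update access counts quoted above. A short Python check, with variable names chosen here only for illustration:

```python
# Update totals from the two profiles, and the stated accesses per update.
updates_with_s, updates_without_s = 3_617_723, 17_818_752
accesses_with_s = 14 * updates_with_s
accesses_without_s = 8 * updates_without_s

ratio = accesses_with_s / accesses_without_s
print(f"{accesses_with_s:,} vs {accesses_without_s:,}: ratio {ratio:.1%}")
```

The estimated memory traffic is about 50.6 million accesses with the heuristic against about 142.6 million without it.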
Assuming that the S heuristic is good in large trees but not so good in small ones,
I tried a hybrid scheme that uses S at low levels but not at high levels. This experiment
was, however, unsuccessful. If, for example, S was ignored after level 7, the statistics for
levels 8–11 were as follows:
Level     Nodes       Updates
   8     18,300     5,672,258
   9     28,624     2,654,310
  10      9,989       213,944
  11         19        10,179
And if the change was applied after level 8, the statistics were
Level     Nodes       Updates
   9     11,562     1,495,054
  10      6,113       148,162
  11         19         6,303
Therefore I decided to retain the S heuristic at all levels of algorithm DLX.

My trusty old SPARCstation 2 computer, vintage 1992, is able to perform approxi-
mately 0.39 mega-updates per second when working on large problems and maintaining the
S fields. The 120 MHz Pentium I computer that Stanford computer science faculty were
given in 1996 did 1.21 mega-updates per second, and my new 500 MHz Pentium III does
5.94. Thus the running time decreases as technology advances; but it remains essentially
proportional to the number of updates, which is the number of times the links do their
dance. Therefore I prefer to measure the performance of algorithm DLX by counting the
number of updates, not by counting the number of elapsed seconds.
Scott [34] was pleased to discover that his program for the MANIAC solved the pen-
tomino problem in about 3.5 hours. The MANIAC executed approximately 4000 instruc-
tions per second, so this represented roughly 50 million instructions. He and H. F. Trotter
found a nice way to use the “bitwise-and” instructions of the MANIAC, which had 40-bit
registers. Their code, which executed about 50,000,000/(103,005+106,232+154,921) ≈ 140
instructions per node of the search tree, was quite efficient in spite of the fact that they
had to deal with about ten times as many nodes as would be produced by the order-
ing heuristic. Indeed, the linked-list approach of algorithm DLX performs a total of
3,617,723 + 4,547,186 + 5,526,988 = 13,691,897 updates, or about 192 million memory
accesses; and it would never fit in the 5120-byte memory of the MANIAC! From this stand-
point the technique of dancing links is actually a step backward from Scott’s 40-year-old
method, although of course that method works only for very special types of exact cover
problems in which simple geometric structure can be exploited.
The task of finding all ways to pack the set of pentominoes into a 6 × 10 rectangle is
more difficult than Scott’s 8 × 8 − 2 × 2 problem, because the backtrack tree for the 6 × 10
problem is larger and there are 2339 essentially different solutions [21]. In this case we
limit the X pentomino to the upper left quarter of the board; our linked-memory algorithm
generates 902,631 nodes and 309,134,131 updates (or 28,320,810 nodes and 4,107,105,935
updates without the S heuristic). This solves the problem in less than a minute on a Pen-
tium III; however, again I should point out that the special characteristics of pentominoes
allow a faster approach.
John G. Fletcher needed only ten minutes to solve the 6 × 10 problem on an IBM 7094
in 1965, using a highly optimized program that had 765 instructions in its inner loop [10].
The 7094 had a clock rate of 0.7 MHz, and it could access two 36-bit words in a single clock
cycle. Fletcher’s program required only about 600 × 700,000/28,320,810 ≈ 15 clock cycles
per node of the search tree; so it was superior to the bitwise method of Scott and Trotter,
and it remains the fastest algorithm known for problems that involve placing the twelve
pentominoes. (N. G. de Bruijn discovered an almost identical method independently;
see [7].)
With a few extensions to the 0-1 matrix for Dana Scott’s problem, we can solve the
more general problem of covering a chessboard with twelve pentominoes and one square
tetromino, without insisting that the tetromino occupy the center. This is essentially the
classic problem of Dudeney, who invented pentominoes in 1907 [9]. The total number of
such chessboard dissections has apparently never appeared in the literature; algorithm DLX
needs 1,526,279,783 updates to determine that it is exactly 16,146.
Many people have written about polyomino problems, including distinguished math-
ematicians such as Golomb [15], de Bruijn [7, 8], Berlekamp, Conway and Guy [4]. Their
arguments for placing the pieces are sometimes based on enumerating the number of ways
a certain cell on the board can be filled, sometimes on the number of ways a certain piece
can be placed. But as far as I know, nobody has previously pointed out that such problems
are actually exact cover problems, in which there is perfect symmetry between cells and
pieces. Algorithm DLX will branch on the ways to fill a cell if some cell is difficult to fill,
or on the ways to place a piece if some piece is difficult to place. It knows no difference,
because pieces and cells are simply columns of the given input matrix.

92 solutions, 14,352,556 nodes, 1,764,631,796 updates
100 solutions, 10,258,180 nodes, 1,318,478,396 updates
20 solutions, 6,375,335 nodes, 806,699,079 updates
0 solutions, 1,234,485 nodes, 162,017,125 updates

Figure 6. Packing 45 Y pentominoes into a square.

Algorithm DLX begins to outperform other pentomino-placing procedures in problems
where the search tree has many levels. For example, let’s consider the problem of packing
45 Y pentominoes into a 15 × 15 square. Jenifer Haselgrove studied this with the help of
a machine called the ICS Multum—which qualified as a “fast minicomputer” in 1973 [20].
The Multum produced an answer after more than an hour, but she remained uncertain
whether other solutions were possible. Now, with the dancing links approach described
above, we can obtain several solutions almost instantly, and the total number of solutions
turns out to be 212. The solutions fall into four classes, depending on the behavior at the
four corners; representatives of each achievable class are shown in Figure 6.

Applications to hexiamonds. In the late 1950s, T. H. O’Beirne introduced a pleasant
variation on polyominoes by substituting triangles for squares. He named the resulting
shapes polyiamonds: moniamonds, diamonds, triamonds, tetriamonds, pentiamonds, hex-
iamonds, etc. The twelve hexiamonds were independently discovered by J. E. Reeve and
J. A. Tyrell [32], who found more than forty ways to arrange them into a 6 × 6 rhombus.
Figure 7 shows one such arrangement, together with some arrow dissections that I couldn’t
resist trying when I first learned about hexiamonds. The 6 × 6 rhombus can be tiled by
the twelve hexiamonds in exactly 156 ways. (This fact was first proved by P. J. Torbijn
[35], who worked without a computer; algorithm DLX confirms his result after making
37,313,405 updates, if we restrict the “sphinx” to only 3 of its 12 orientations.)

4 solutions, 6,677 nodes, 4,687,159 updates
0 solutions, 7,603 nodes, 3,115,387 updates
156 solutions, 70,505 nodes, 37,313,405 updates
41 solutions, 35,332 nodes, 14,948,759 updates
3 solutions, 5,546 nodes, 3,604,817 updates

Figure 7. The twelve hexiamonds, packed into a rhombus and into various arrowlike shapes.

O’Beirne was particularly fascinated by the fact that seven of the twelve hexiamonds
have different shapes when they are flipped over, and that the resulting 19 one-sided hexi-
amonds have the correct number of triangles to form a hexagon: a hexagon of hexiamonds
(see Figure 8). In November of 1959, after three months of trials, he found a solution; and
two years later he challenged the readers of New Scientist to match this feat [28, 29, 30].
Meanwhile he had shown the puzzle to Richard Guy and his family. The Guys pub-
lished several solutions in a journal published in Singapore, where Richard was a professor
[17]. Guy, who has told the story of this fascinating recreation in [18], says that when
O’Beirne first described the puzzle, “Everyone wanted to try it at once. No one went to
bed for about 48 hours.”
A 19-level backtrack tree with many possibilities at each level makes an excellent
test case for the dancing links approach to covering, so I fed O’Beirne’s problem to my
program. I broke the general case into seven subcases, depending on the distance of the
hexagon piece from the center; furthermore, when that distance was zero, I considered two
subcases depending on the position of the “crown.” Figure 8 shows a representative of
each of the seven cases, together with statistics about the search. The total number of
updates performed was 134,425,768,494.
My goal was not only to count the solutions, but also to find arrangements that were
as symmetrical as possible, in response to a problem that was stated in Berlekamp, Conway,
and Guy’s book Winning Ways [4, page 788]. Let us define the horizontal symmetry of
a configuration to be the number of edges between pieces that also are edges between pieces
in the left-right reflection of that configuration. The overall hexagon has 156 internal edges,
and the 19 one-sided hexiamonds have 96 internal non-edges. Therefore if an arrangement
were perfectly symmetrical—unchanged by left-right reflection—its horizontal symmetry
would be 60. But no such perfectly symmetric solution is possible.
The vertical symmetry of a configuration is defined similarly, but with respect to top-
bottom reflection. A solution to the hexiamond problem is maximally symmetric if it has
the highest horizontal or vertical symmetry score, and if the smaller score is as large as
possible consistent with the larger score. Each of the solutions shown in Figure 8 is, in
fact, maximally symmetric in its class. (And so is the solution to Dana Scott’s problem
that is shown in Figure 1: It has vertical symmetry 36 and horizontal symmetry 30.)
The largest possible vertical symmetry score is 50; it is achieved in Figure 8(c), and in
seven other solutions obtained by independently rearranging three of its symmetrical sub-
parts. Four of the eight have a horizontal symmetry score of 32; the others have horizontal
symmetry 24. John Conway found these solutions by hand in 1964 and conjectured that
they were maximally symmetric overall. But that honor belongs uniquely to the solution
in Figure 8(f), at least by my definition, because Figure 8(f) has horizontal symmetry 52
and vertical symmetry 27. The only other ways to achieve horizontal symmetry 52 have
vertical symmetry scores of 20, 22, and 24. (Two of those other ways do, however, have
the surprising property that 13 of their 19 pieces are unchanged by horizontal reflection;
this is symmetry of entire pieces, not just of edges.)
After I had done this enumeration, I read Guy’s paper [18] for the first time and learned
that Marc M. Paulhus had already enumerated all solutions in May 1996 [31]. Good, our
independent computations would confirm the results. But no—my program found 124,519
solutions, while his had found 124,518! He reran his program in 1999 and now we agree.
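The grand totals can be cross-checked against the per-class statistics reported with Figure 8. A small Python tally, with the table name FIGURE_8 chosen here for illustration and the numbers copied from that figure:

```python
# Per-class statistics from Figure 8: (solutions, nodes, updates).
FIGURE_8 = {
    "a": (1_914, 4_239_132, 2_142_276_414),
    "b": (5_727, 21_583_173, 11_020_236_507),
    "c": (11_447, 20_737_702, 10_315_775_812),
    "d": (7_549, 24_597_239, 12_639_698_345),
    "e": (6_675, 17_277_362, 8_976_245_858),
    "f": (15_717, 43_265_607, 21_607_912_011),
    "g": (75_490, 137_594_347, 67_723_623_547),
}

total_solutions = sum(s for s, _, _ in FIGURE_8.values())
total_updates = sum(u for _, _, u in FIGURE_8.values())
print(f"{total_solutions:,} solutions, {total_updates:,} updates")
# 124,519 solutions, 134,425,768,494 updates
```

Both sums agree with the totals stated in the text.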

(a) hsym = 51, vsym = 24:  1,914 solutions,   4,239,132 nodes,  2,142,276,414 updates
(b) hsym = 52, vsym = 24:  5,727 solutions,  21,583,173 nodes, 11,020,236,507 updates
(c) hsym = 32, vsym = 50: 11,447 solutions,  20,737,702 nodes, 10,315,775,812 updates
(d) hsym = 51, vsym = 22:  7,549 solutions,  24,597,239 nodes, 12,639,698,345 updates
(e) hsym = 48, vsym = 30:  6,675 solutions,  17,277,362 nodes,  8,976,245,858 updates
(f) hsym = 52, vsym = 27: 15,717 solutions,  43,265,607 nodes, 21,607,912,011 updates
(g) hsym = 48, vsym = 29: 75,490 solutions, 137,594,347 nodes, 67,723,623,547 updates

Figure 8. Solutions to O’Beirne’s hexiamond hexagon problem,
with the small hexagon at various distances from the center of the large one.

O’Beirne [29] also suggested an analogous problem for pentominoes, since there are
18 one-sided pentominoes. He asked if they can be put into a 9 × 10 rectangle, and
Golomb provided an example in [15, Chapter 6]. Jenifer Leech wrote a program to prove
that there are exactly 46 different ways to pack the one-sided pentominoes in a 3 × 30
rectangle; see [26]. Figure 9 shows a maximally symmetric example (which isn’t really
very symmetrical).

46 solutions, 605,440 nodes, 190,311,749 updates, hsym = 51, vsym = 48

Figure 9. The one-sided pentominoes, packed into a 3 × 30 rectangle.

I set out to count the solutions to the 9 × 10 problem, figuring that an 18-stage exact cover
problem with six 1s per row would be simpler than a 19-stage problem with seven 1s per
row. But I soon found that the task would be hopeless, unless I invented a much better
algorithm. The Monte Carlo estimation procedure of [24] suggests that about 19 quadrillion
updates will be needed, with 64 trillion nodes in the search trees. If that estimate is correct,
I could have the result in a few months; but I’d rather try for a new Mersenne prime.
I do, however, have a conjecture about the solution that will have maximum horizontal
symmetry; see Figure 10.

hsym = 74, vsym = 49

Figure 10. Is this the most symmetrical way to pack one-sided pentominoes into a rectangle?

A failed experiment. Special arguments based on “coloring” often give important in-
sights into tiling problems. For example, it is well known [5, pages 142 and 394] that if we
remove two cells from opposite corners of a chessboard, there is no way to cover the remain-
ing 62 cells with dominoes. The reason is that the mutilated chessboard has, say, 32 white
cells and 30 black cells, but each individual domino covers one cell of each color. If we
present such a covering problem to algorithm DLX, it makes 4,780,846 updates (and finds
13,922 ways to place 30 of the 31 dominoes) before concluding that there is no solution.
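The coloring argument itself takes only a few lines to verify. In the Python sketch below, the coordinate system and the convention that a cell is white when its rank plus file is odd are assumptions made for the illustration; any checkerboard coloring leads to the same conclusion.

```python
def is_white(cell):
    """Color convention assumed here: (rank + file) odd means white."""
    i, j = cell
    return (i + j) % 2 == 1

board = {(i, j) for i in range(8) for j in range(8)}
mutilated = board - {(0, 0), (7, 7)}            # two opposite corners removed

white = sum(is_white(cell) for cell in mutilated)
black = len(mutilated) - white

# Every domino placement is a pair of horizontally or vertically adjacent cells.
dominoes = [(a, b) for a in mutilated
            for b in ((a[0] + 1, a[1]), (a[0], a[1] + 1)) if b in mutilated]
balanced = all(is_white(a) != is_white(b) for a, b in dominoes)
print(white, black, balanced)   # 32 30 True
```

Since every placement covers one cell of each color, 31 dominoes would need 31 white and 31 black cells, which the 32 to 30 split rules out without any search.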
The cells of the hexiamond-hexagon problem can be colored black and white in a
similar fashion: All triangles that point left are black, say, and all that point right are
white. Then fifteen of the one-sided hexiamonds cover three triangles of each color; but
the remaining four, namely the “sphinx” and the “yacht” and their mirror images, each
have a four-to-two color bias. Therefore every solution to the problem must put exactly
two of those four pieces into positions that favor black.
I thought I’d speed things up by dividing the problem into six subproblems, one
for each way to choose the two pieces that will favor black. Each of the subproblems was
expected to have about 1/6 as many solutions as the overall problem, and each subproblem
was simpler because it gave four of the pieces only half as many options as before. Thus
I expected the subproblems to run up to 16 times as fast as the original problem, and I
expected the extra information about impossible correlations of piece placement to help
algorithm DLX make intelligent choices.
But this turned out to be a case where mathematics gave me bad advice. The overall
problem had 6675 solutions and required 8,976,245,858 updates (Figure 8(e)). The six
subproblems turned out to have respectively 955, 1208, 1164, 1106, 1272, and 970 solutions,
roughly as expected; but they each required between 1.7 and 2.2 billion updates, and the
total work to solve all six subproblems was 11,519,571,784. So much for that bright idea.
Applications to tetrasticks. Instead of making pieces by joining squares or triangles
together, Brian Barwell [3] considered making them from line segments or sticks. He
called the resulting objects polysticks, and noted that there are 2 disticks, 5 tristicks, and
16 tetrasticks. The tetrasticks are especially interesting from a recreational standpoint; I
received an attractive puzzle in 1993 that was equivalent to placing ten of the tetrasticks
in a 4 × 4 square [1], and I spent many hours trying to psych it out.
Barwell proved that the sixteen tetrasticks cannot be assembled into any symmetrical
shape. But by leaving out any one of the five tetrasticks that have an excess of horizontal
or vertical line segments, he found ways to fill a 5×5 square. (See Figure 11.) Such puzzles
are quite difficult to do by hand, and he had found only five solutions at the time he wrote
his paper; he conjectured that fewer than a hundred solutions would actually exist. (The
set of all solutions was first found by Wiezorke and Haubrich [37], who invented the puzzle
independently after seeing [1].)
Polysticks introduce a new feature that is not present in the polyomino and polyia-
mond problems: The pieces must not cross each other. For example, Figure 12 shows a
non-solution to the problem considered in Figure 11(c). Every line segment in the grid of
5 × 5 squares is covered, but the ‘V’ tetrastick crosses the ‘Z’.
We can handle this extra complication by generalizing the exact cover problem. In-
stead of requiring all columns of a given 0-1 matrix to be covered by disjoint rows, we
will distinguish two kinds of columns: primary and secondary. The generalized problem
asks for a set of rows that covers every primary column exactly once and every secondary
column at most once.
The tetrastick problem of Figure 11(c) can be set up as a generalized cover problem
in a natural way. First we introduce primary columns F, H, I, J, N, O, P, R, T, U, V,
W, X, Y, Z representing the fifteen tetrasticks (excluding L), as well as columns Hxy
representing the horizontal segments (x, y) −− (x + 1, y) and Vxy representing the vertical
segments (x, y) −− (x, y + 1), for 0 ≤ x, y < 5. We also need secondary columns Ixy to
represent interior junction points (x, y), for 0 < x, y < 5. Each row represents a possible
placement of a piece, as in the polyomino and polyiamond problems; but if a piece has two
consecutive horizontal or vertical segments and does not lie on the edge of the diagram, it
should include the corresponding interior junction point as well.

(a)  72 solutions, 1,132,070 nodes, 283,814,227 updates
(b) 382 solutions, 3,422,455 nodes, 783,928,340 updates
(c) 607 solutions, 2,681,188 nodes, 611,043,121 updates
(d) 530 solutions, 3,304,039 nodes, 760,578,623 updates
(e) 204 solutions, 1,779,356 nodes, 425,625,417 updates

Figure 11. Filling a 5 × 5 grid with 15 of the 16 tetrasticks;
we must leave out either the H, the J, the L, the N, or the Y.

Figure 12. Polysticks are not supposed to cross each other as they do here.

For example, the two rows corresponding to the placement of V and Z in Figure 12
are
V H23 I33 H33 V43 I44 V44
Z H24 V33 I33 V32 H32
The common interior point I33 means that these rows cross each other. On the other hand,
I33 is not a primary column, because we do not necessarily need to cover it. The solution
in Figure 11(c) covers only the interior points I14, I21, I32, and I41.
Fortunately, we can solve the generalized cover problem by using almost the same
algorithm as before. The only difference is that we initialize the data structure by making
a circular list of the column headers for the primary columns only. The header for each
secondary column should have L and R fields that simply point to itself. The remainder
of the algorithm proceeds exactly as before, so we will still call it algorithm DLX.
A generalized cover problem can be converted to an equivalent exact cover problem
if we simply append one row for each secondary column, containing a single 1 in that col-
umn. But we are better off working with the generalized problem, because the generalized
algorithm is simpler and faster.
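
The conversion can be checked on a toy instance. The sketch below is a brute-force counter written only for illustration, not algorithm DLX; the function names and the four-row example with columns P, Q (primary) and S (secondary) are assumptions invented for this check.

```python
from itertools import combinations


def count_covers(rows, primary, secondary=()):
    """Count row subsets that cover each primary column exactly once
    and each secondary column at most once (brute force, tiny inputs only)."""
    total = 0
    for k in range(len(rows) + 1):
        for chosen in combinations(rows, k):
            used = [c for row in chosen for c in row]
            exactly_once = all(used.count(c) == 1 for c in primary)
            at_most_once = all(used.count(c) <= 1 for c in secondary)
            if exactly_once and at_most_once:
                total += 1
    return total


def to_exact_cover(rows, secondary):
    """Append one singleton row per secondary column."""
    return [list(r) for r in rows] + [[c] for c in secondary]


rows = [["P", "S"], ["Q", "S"], ["Q"], ["P"]]
generalized = count_covers(rows, primary=["P", "Q"], secondary=["S"])
exact = count_covers(to_exact_cover(rows, ["S"]), primary=["P", "Q", "S"])
print(generalized, exact)
```

Both counts agree, because every generalized cover extends in exactly one way: a singleton row is added for each secondary column that the cover leaves untouched.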
I decided to experiment with the subset of welded tetrasticks, namely those that do not
form a simple connected path because they contain junction points: F, H, R, T, X, Y. There
are ten one-sided welded tetrasticks if we add the mirror images of the unsymmetrical pieces
as we did for one-sided hexiamonds and pentominoes. And—aha—these ten tetrasticks can
be arranged in a 4 × 4 grid. (See Figure 13.) Only three solutions are possible, including
the two perfectly symmetric solutions shown. I’ve decided not to show the third solution,
which has the X piece in the middle, because I want readers to have the pleasure of finding
it for themselves.

Figure 13. Two of the three ways to pack the one-sided welded tetrasticks into a square.

There are fifteen one-sided unwelded tetrasticks, and I thought they would surely fit
into a 5 × 5 grid in a similar way; but this turned out to be impossible. The reason is that
if, say, piece I is placed vertically, four of the six pieces J, J′, L, L′, N, N′ must be placed
to favor the horizontal direction, and this severely limits the possibilities. In fact, I have
been unable to pack those fifteen pieces into any simple symmetrical shape; my best effort
so far is the “oboe” shown in Figure 14.

Figure 14. The fifteen one-sided unwelded tetrasticks.

Figure 15. Do all 25 one-sided tetrasticks
fit in this shape?

I also tried unsuccessfully to pack all 25 of the one-sided tetrasticks into the Aztec
diamond pattern of Figure 15; but I see no way to prove that a solution is impossible. An
exhaustive search seems out of the question at the present time.
Applications to queens. Now we can return to the problem that led Hitotumatu and
Noshita to introduce dancing links in the first place, namely the N queens problem, because
that problem is actually a special case of the generalized cover problem in the
previous section. For example, the 4 queens problem is just the task of covering eight
primary columns (R0, R1, R2, R3, F0, F1, F2, F3) corresponding to ranks and files, while
using at most one element in each of the secondary columns (A0, A1, A2, A3, A4, A5, A6,
B0, B1, B2, B3, B4, B5, B6) corresponding to diagonals, given the sixteen rows

R0 F0 A0 B3
R0 F1 A1 B4
R0 F2 A2 B5
R0 F3 A3 B6
R1 F0 A1 B2
R1 F1 A2 B3
R1 F2 A3 B4
R1 F3 A4 B5
R2 F0 A2 B1
R2 F1 A3 B2
R2 F2 A4 B3
R2 F3 A5 B4
R3 F0 A3 B0
R3 F1 A4 B1
R3 F2 A5 B2
R3 F3 A6 B3 .

In general, the rows of the 0-1 matrix for the N queens problem are
Ri Fj A(i + j) B(N − 1 − i + j)
for 0 ≤ i, j < N . (Here Ri and Fj represent ranks and files of a chessboard; Ak and Bℓ
represent diagonals and reverse diagonals. The secondary columns A0, A(2N − 2), B0, and
B(2N − 2) each arise in only one row of the matrix so they can be omitted.)
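
The row formula can be turned directly into a small program. The following is a minimal sketch that uses plain set-based backtracking rather than dancing links; the function names are hypothetical, and all 4N − 2 diagonal columns are kept for simplicity instead of dropping the four that occur in only one row.

```python
def queens_rows(n):
    """Rows Ri Fj A(i+j) B(n-1-i+j) for 0 <= i, j < n, in row-major order."""
    return [[f"R{i}", f"F{j}", f"A{i + j}", f"B{n - 1 - i + j}"]
            for i in range(n) for j in range(n)]


def count_generalized_covers(rows, primary):
    """Count row sets covering every primary column exactly once and
    every other column at most once."""
    row_sets = [frozenset(r) for r in rows]

    def search(uncovered, live):
        if not uncovered:
            return 1  # all primary columns are covered

        def options(col):
            return [r for r in live if col in row_sets[r]]

        # Branch on the primary column with the fewest remaining options.
        col = min(uncovered, key=lambda c: len(options(c)))
        total = 0
        for r in options(col):
            # Keep only the rows that share no column with the chosen row.
            survivors = [s for s in live
                         if row_sets[s].isdisjoint(row_sets[r])]
            total += search(uncovered - row_sets[r], survivors)
        return total

    return search(frozenset(primary), list(range(len(rows))))


def count_queens(n):
    primary = [f"R{i}" for i in range(n)] + [f"F{j}" for j in range(n)]
    return count_generalized_covers(queens_rows(n), primary)
```

For N = 4, `queens_rows(4)` reproduces the sixteen rows listed above, and `count_queens(4)` returns 2. Diagonal columns never appear in `primary`, so they restrict which rows can be combined without ever being required.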
When we apply algorithm DLX to this generalized cover problem, it behaves quite
differently from the traditional algorithms for the N queens problem, because it branches
sometimes on different ways to occupy a rank of the chessboard and sometimes on different
ways to occupy a file. Furthermore, we gain efficiency by paying attention to the order in
which primary columns of the cover problem are considered when those columns all have
the same S value (the same branching factor): It is better to place queens near the middle
of the board first, because central positions rule out more possibilities for later placements.
Consider, for example, the eight queens problem. Figure 16(a) shows an empty board,
with 8 possible ways to occupy each rank and each file. Suppose we decide to place a queen
in R4 and F7, as shown in Figure 16(b). Then there are five ways to cover F4; after choosing
R5 and F4, Figure 16(c), there are four ways to cover R3, and so on. At each stage we
choose the most constrained rank or file, using the “organ pipe ordering”
R4 F4 R3 F3 R5 F5 R2 F2 R6 F6 R1 F1 R7 F7 R0 F0
to break ties. Placing a queen in R2 and F3 after Figure 16(d) makes it impossible to
cover F2, so backtracking will occur even though only four queens have been tentatively
placed.
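
The organ-pipe ordering is easy to generate mechanically. The sketch below is one way to do it; the function name is hypothetical, and taking the middle index to be N // 2 (which also fixes the behavior for odd N) is an assumption that matches the even cases shown in the text.

```python
def organ_pipe_order(n):
    """Primary columns ordered from the middle of the board outward."""
    mid = n // 2
    indices = [mid]
    for d in range(1, n):
        if mid - d >= 0:
            indices.append(mid - d)
        if mid + d < n:
            indices.append(mid + d)
    # Each index contributes its rank first and then its file.
    return [f"{kind}{i}" for i in indices for kind in ("R", "F")]


print(" ".join(organ_pipe_order(8)))
# prints: R4 F4 R3 F3 R5 F5 R2 F2 R6 F6 R1 F1 R7 F7 R0 F0
```

Linking the header nodes in this order makes the tie-breaking rule automatic if the column chooser keeps the first minimum it encounters while walking the header list.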

[Figure 16 shows four 8 × 8 boards, panels (a) through (d), with files F0 to F7 across and ranks R7 down to R0. Attacked cells are marked ×, queens are marked q, and each rank and file that is still uncovered is labeled with its number of remaining choices.]

Figure 16. Solving the 8 queens problem by treating ranks and files symmetrically.

The order in which header nodes are linked together at the start of algorithm DLX can
have a significant effect on the running time. For example, experiments on the 16 queens
problem show that the search tree has 312,512,659 nodes and requires 5,801,583,789 updates,
if the ordering R0 R1 . . . R15 F0 F1 . . . F15 is used, while the organ-pipe ordering
R8 F8 R7 F7 R9 F9 . . . R0 F0 requires only about 54% as many updates. On the other
hand, the order in which individual elements of a row or column are linked together has
no effect on the algorithm’s total running time.
Here are some statistics observed when algorithm DLX solved small cases of the
N queens problem using organ-pipe order, without reducing the number of solutions by
taking symmetries of the board into account:

N Solutions Nodes Updates R-Nodes R-Updates


1 1 2 3 2 3
2 0 3 19 3 19
3 0 4 56 6 70
4 2 13 183 15 207
5 10 46 572 50 626
6 4 93 1,497 115 1,765
7 40 334 5,066 376 5,516
8 92 1,049 16,680 1,223 18,849
9 352 3,440 54,818 4,640 71,746
10 724 11,578 198,264 16,471 269,605
11 2,680 45,393 783,140 67,706 1,123,572
12 14,200 211,716 3,594,752 312,729 5,173,071
13 73,712 1,046,319 17,463,157 1,589,968 26,071,148
14 365,596 5,474,542 91,497,926 8,497,727 139,174,307
15 2,279,184 31,214,675 513,013,152 49,404,260 800,756,888
16 14,772,512 193,032,021 3,134,588,055 308,130,093 4,952,973,201
17 95,815,104 1,242,589,512 20,010,116,070 2,015,702,907 32,248,234,866
18 666,090,624 8,567,992,237 141,356,060,389 13,955,353,609 221,993,811,321

Here “R-Nodes” and “R-Updates” refer to the results when we consider only R0, R1, . . . ,
R(N − 1) to be primary columns that need to be covered; columns Fj are secondary. In
this case the algorithm reduces to the usual procedure in which branching occurs only on
ranks of the chessboard. The advantage of mixing rows with columns becomes evident as
N increases, but I’m not sure whether the ratio of R-Updates to Updates will be unbounded
or approach a limit as N goes to infinity.
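
The ratios in question can be read off the table with a short calculation. The numbers below are copied from the table and from the 16 queens experiment quoted earlier; only a few values of N are included, and the names `STATS` and `r_update_ratio` are hypothetical, chosen for this illustration.

```python
# (Updates, R-Updates) pairs copied from the table for a few values of N.
STATS = {
    8: (16_680, 18_849),
    12: (3_594_752, 5_173_071),
    16: (3_134_588_055, 4_952_973_201),
    18: (141_356_060_389, 221_993_811_321),
}

# 16 queens: updates with organ-pipe order versus R0 ... R15 F0 ... F15.
ORGAN_PIPE_UPDATES_16 = 3_134_588_055
NATURAL_ORDER_UPDATES_16 = 5_801_583_789


def r_update_ratio(n):
    """Ratio of R-Updates to Updates for board size n."""
    updates, r_updates = STATS[n]
    return r_updates / updates


for n in sorted(STATS):
    print(n, round(r_update_ratio(n), 3))

ordering_ratio = ORGAN_PIPE_UPDATES_16 / NATURAL_ORDER_UPDATES_16
print(round(ordering_ratio, 2))
```

The ratio of R-Updates to Updates is about 1.13 for N = 8 and about 1.57 for N = 18, and the last line confirms the figure of roughly 54% quoted for the two header orderings.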
I should point out that special methods are known for counting the number of solutions
to the N queens problem without actually generating the queen placements [33].
Concluding remarks. Algorithm DLX, which uses dancing links to implement the “natural”
algorithm for exact cover problems, is an effective way to enumerate all solutions
to such problems. On small cases it is nearly as fast as algorithms that have been tuned
to solve particular classes of problems, like pentomino packing or the N queens problem,
where geometric structure can be exploited. On large cases it appears to run even faster
than those special-purpose algorithms, because of its ordering heuristic. And as computers
get faster and faster, we are of course tackling larger and larger cases all the time.
In this paper I have used the exact cover problem to illustrate the versatility of dancing
links, but I could have chosen many other backtrack applications in which the same ideas
apply. For example, the approach works nicely with the Waltz filtering algorithm [36];
perhaps this fact has subliminally influenced my choice of names. I recently used dancing
links together with a dictionary of about 600 common three-letter words of English to find
word squares such as

ATE BED OHM PEA TWO
WIN OAR RUE URN ION
LED WRY BET BAY TEE
in which each row, column, and diagonal is a word; about 60 million updates produced
all solutions. I believe that a terpsichorean technique is significantly better than the
alternative of copying the current state at every level, as considered in the pioneering
paper by Haralick and Elliott on constraint satisfaction problems [19]. Certainly the use
of (1) and (2) is simple, useful, and fun.
“What a dance / do they do / Lordy, I am tellin’ you!” [2]
Acknowledgments. I wish to thank Sol Golomb, Richard Guy, and Gene Freuder for the
help they generously gave me as I was preparing this paper. Maggie McLoughlin did an
excellent job of translating my scrawled manuscript into a well-organized TEX document.
And I profoundly thank Tomas Rokicki, who provided the new computer on which I did
most of the experiments, and on which I hope to keep links dancing merrily for many years.
Historical notes. (1) Although the IAS computer was popularly known in Princeton
as the “MANIAC,” that title properly belonged only to a similar but different series of
computers built at Los Alamos. (See [27].) (2) George Jelliss [23] discovered that the
great puzzle masters H. D. Benjamin and T. R. Dawson experimented with the concept
of polysticks already in 1946–1948. However, they apparently did not publish any of their
work. (3) My names for the tetrasticks are slightly different from those originally proposed
by Barwell [3]: I prefer to use the letters J, R, and U for the pieces he called U, J, and C,
respectively.
Program notes. The implementation of algorithm DLX that I used when preparing
this paper is file dance.w on webpage http://www-cs-faculty.stanford.edu/~knuth/programs.html.
See also the related files polyominoes.w, polyiamonds.w, polysticks.w,
and queens.w.

References

[1] 845 Combinations Puzzles: 845 Interestingly Combinations (Taiwan: R.O.C. Patent
66009). [There is no indication of the author or manufacturer. This puzzle, which
is available from www.puzzletts.com, actually has only 83 solutions. It carries a
Chinese title, “Dr. Dragon’s Intelligence Profit System.”]
[2] Harry Barris, Mississippi Mud (New York: Shapiro, Bernstein & Co., 1927).
[3] Brian R. Barwell, “Polysticks,” Journal of Recreational Mathematics 22 (1990), 165–175.
[4] Elwyn R. Berlekamp, John H. Conway, and Richard K. Guy, Winning Ways for Your
Mathematical Plays 2 (London: Academic Press, 1982).
[5] Max Black, Critical Thinking (Englewood Cliffs, New Jersey: Prentice–Hall, 1946).
[Does anybody know of an earlier reference for the problem of the “mutilated chessboard”?]
[6] Ole-Johan Dahl, Edsger W. Dijkstra, and C. A. R. Hoare, Structured Programming
(London: Academic Press, 1972).
[7] N. G. de Bruijn, personal communication (9 September 1999): “. . . it was almost my
first activity in programming that I got all 2339 solutions of the 6 × 10 pentomino on
an IBM1620 in March 1963 in 18 hours. It had to cope with the limited memory of
that machine, and there was not the slightest possibility to store the full matrix . . .
But I could speed the matter up by having a very long program, and that one was
generated by means of another program.”
[8] N. G. de Bruijn, “Programmeren van de pentomino puzzle,” Euclides 47 (1971/72),
90–104.
[9] Henry Ernest Dudeney, “74.—The broken chessboard,” in The Canterbury Puzzles,
(London: William Heinemann, 1907), 90–92, 174–175.
[10] John G. Fletcher, “A program to solve the pentomino problem by the recursive use
of macros,” Communications of the ACM 8 (1965), 621–623.
[11] Robert W. Floyd, “Nondeterministic algorithms,” Journal of the ACM 14 (1967),
636–644.
[12] Martin Gardner, “Mathematical games: More about complex dominoes, plus the
answers to last month’s puzzles,” Scientific American 197, 6 (December 1957), 126–140.
[13] Michael R. Garey and David S. Johnson, Computers and Intractability (San Francisco:
Freeman, 1979).
[14] Solomon W. Golomb, “Checkerboards and polyominoes,” American Mathematical
Monthly 61 (1954), 675–682.
[15] Solomon W. Golomb, Polyominoes, second edition (Princeton, New Jersey: Princeton
University Press, 1994).
[16] Solomon W. Golomb and Leonard D. Baumert, “Backtrack programming,” Journal
of the ACM 12 (1965), 516–524.
[17] Richard K. Guy, “Some mathematical recreations,” Nabla (Bulletin of the Malayan
Mathematical Society) 7 (1960), 97–106, 144–153.
[18] Richard K. Guy, “O’Beirne’s Hexiamond,” in The Mathemagician and Pied Puzzler,
edited by Elwyn Berlekamp and Tom Rodgers (Natick, Massachusetts: A. K. Peters,
1999), 85–96.
[19] Robert M. Haralick and Gordon L. Elliott, “Increasing tree search efficiency for
constraint satisfaction problems,” Artificial Intelligence 14 (1980), 263–313.
[20] Jenifer Haselgrove, “Packing a square with Y-pentominoes,” Journal of Recreational
Mathematics 7 (1974), 229.
[21] C. B. and Jenifer Haselgrove, “A computer program for pentominoes,” Eureka 23, 2
(Cambridge, England: The Archimedeans, October 1960), 16–18.
[22] Hirosi Hitotumatu and Kohei Noshita, “A technique for implementing backtrack
algorithms and its application,” Information Processing Letters 8 (1979), 174–175.
[23] George P. Jelliss, “Unwelded polysticks,” Journal of Recreational Mathematics 29
(1998), 140–142.
[24] Donald E. Knuth, “Estimating the efficiency of backtrack programs,” Mathematics of
Computation 29 (1975), 121–136.
[25] Donald E. Knuth, TEX: The Program (Reading, Massachusetts: Addison–Wesley,
1986).
[26] Jean Meeus, “Some polyomino and polyamond problems,” Journal of Recreational
Mathematics 6 (1973), 215–220.
[27] N. Metropolis and J. Worlton, “A trilogy of errors in the history of computing,”
Annals of the History of Computing 2 (1980), 49–59.
[28] T. H. O’Beirne, “Puzzles and Paradoxes 43: Pell’s equation in two popular problems,”
New Scientist 12 (1961), 260–261.
[29] T. H. O’Beirne, “Puzzles and Paradoxes 44: Pentominoes and hexiamonds,” New
Scientist 12 (1961), 316–317. [“So far as I know, hexiamond has not yet been put
through the mill on a computer; but this could doubtless be done.”]
[30] T. H. O’Beirne, “Puzzles and Paradoxes 45: Some hexiamond solutions: and an
introduction to a set of 25 remarkable points,” New Scientist 12 (1961), 379–380.
[31] Marc Paulhus, “Hexiamond Homepage,” http://www.math.ucalgary.ca/~paulhusm/hexiamond1.
[32] J. E. Reeve and J. A. Tyrell, “Maestro puzzles,” The Mathematical Gazette 45 (1961),
97–99.
[33] Igor Rivin, Ilan Vardi, and Paul Zimmermann, “The n-queens problem,” American
Mathematical Monthly 101 (1994), 629–639.
[34] Dana S. Scott, “Programming a combinatorial puzzle,” Technical Report No. 1 (Princeton,
New Jersey: Princeton University Department of Electrical Engineering, 10 June
1958), ii + 14 + 5 pages. [From page 10: “. . . the main problem in the program was
to handle several lists of indices that were continually being modified.”]
[35] P. J. Torbijn, “Polyiamonds,” Journal of Recreational Mathematics 2 (1969), 216–227.
[36] David Waltz, “Understanding line drawings of scenes with shadows,” in The
Psychology of Computer Vision, edited by P. Winston (New York: McGraw–Hill, 1975),
19–91.
[37] Bernhard Wiezorke and Jacques Haubrich, “Dr. Dragon’s polycons,” Cubism For Fun
33 (February 1994), 6–7.
Addendum. During November, 1999, Alfred Wassermann of Universität Bayreuth succeeded
in covering the Aztec diamond of Figure 15 with one-sided tetrasticks, using a
cluster of workstations running algorithm DLX. The 107 possible solutions, which are
quite beautiful, have been posted at http://did.mat.uni-bayreuth.de/wassermann/.
He subsequently enumerated the 10,440,433 solutions to the 9 × 10 one-sided pentomino
problem; many of these turn out to be more symmetric than the one in Figure 10.

Computer Science Department
Mathematical Writing
by
Donald E. Knuth, Tracy Larrabee, and Paul M. Roberts
This report is based on a course of the same name given at Stanford University during
autumn quarter, 1987. Here's the catalog description:
CS 209. Mathematical Writing: Issues of technical writing and the effective
presentation of mathematics and computer science. Preparation of theses,
papers, books, and "literate" computer programs. A term paper on a topic of
your choice; this paper may be used for credit in another course.
The first three lectures were a "minicourse" that summarized the basics. About two
hundred people attended those three sessions, which were devoted primarily to a discussion
of the points in §1 of this report. An exercise (§2) and a suggested solution (§3) were also
part of the minicourse.
The remaining 28 lectures covered these and other issues in depth. We saw many
examples of "before" and "after" from manuscripts in progress. We learned how to avoid
excessive subscripts and superscripts. We discussed the documentation of algorithms,
computer programs, and user manuals. We considered the process of refereeing and editing.
We studied how to make effective diagrams and tables, and how to find appropriate
quotations to spice up a text. Some of the material duplicated some of what would be discussed
in writing classes offered by the English department, but the vast majority of the lectures
were devoted to issues that are specific to mathematics and/or computer science.
Guest lectures by Herb Wilf (University of Pennsylvania), Jeff Ullman (Stanford),
Leslie Lamport (Digital Equipment Corporation), Nils Nilsson (Stanford), Mary-Claire
van Leunen (Digital Equipment Corporation), Rosalie Stemer (San Francisco Chronicle),
and Paul Halmos (University of Santa Clara) were a special highlight as each of these
outstanding authors presented their own perspectives on the problems of mathematical
communication.
This report contains transcripts of the lectures and copies of various handouts that
were distributed during the quarter. We think the course was able to clarify a surprisingly
large number of issues that play an important part in the life of every professional who
works in mathematical fields. Therefore we hope that people who were unable to attend
the course might still benefit from it, by reading this summary of what transpired.
The authors wish to thank Phyllis Winkler for the first-rate technical typing that
made these notes possible.
Caveat: These are transcripts of lectures, not a polished set of essays on the subject.
Some of the later lectures refer to mistakes in the notes of earlier lectures; we have decided
to correct some (but not all) of those mistakes before printing this report. References to
such no-longer-existent blunders might be hard to understand. Understand?
Videotapes of the class sessions are kept in the Mathematical & Computer Sciences
Library at Stanford.
The preparation of this report was supported in part by NSF grant CCR-8610181.
Table of Contents

§1. Minicourse on technical writing
§2. An exercise on technical writing
§3. An answer to the exercise
§4. Comments on student answers (1)
§5. Comments on student answers (2)
§6. Preparing books for publication (1)
§7. Preparing books for publication (2)
§8. Preparing books for publication (3)
§9. Handy reference books
§10. Presenting algorithms
§11. Literate Programming (1)
§12. Literate Programming (2)
§13. User manuals
§14. Galley proofs
§15. Refereeing (1)
§16. Refereeing (2)
§17. Hints for Referees
§18. Illustrations (1)
§19. Illustrations (2)
§20. Homework: Subscripts and superscripts
§21. Homework: Solutions
§22. Quotations
§23. Scientific American Saga (1)
§24. Scientific American Saga (2)
§25. Examples of good style
§26. Mary-Claire van Leunen on 'hopefully'
§27. Herb Wilf on Mathematical Writing
§28. Wilf's first extreme
§29. Wilf's other extreme
§30. Jeff Ullman on Getting Rich
§31. Leslie Lamport on Writing Papers
§32. Lamport's handout on unnecessary prose
§33. Lamport's handout on styles of proof
§34. Nils Nilsson on Art and Writing
§35. Mary-Claire van Leunen on Calisthenics (1)
§36. Mary-Claire's handout on Composition Exercises
§37. Comments on student work
§38. Mary-Claire van Leunen on Which vs. That
§39. Mary-Claire van Leunen on Calisthenics (2)
§40. Computer aids to writing
§41. Rosalie Stemer on Copy Editing
§42. Paul Halmos on Mathematical Writing
§43. Final truths

§1. Notes on Technical Writing
Stanford's library card catalog refers to more than 100 books about technical writing,
including such titles as The Art of Technical Writing, The Craft of Technical Writing,
The Teaching of Technical Writing. There is even a journal devoted to the subject, the
IEEE Transactions on Professional Communication, published since 1958. The American
Chemical Society, the American Institute of Physics, the American Mathematical Society,
and the Mathematical Association of America have each published "manuals of style."
The last of these, Writing Mathematics Well by Leonard Gillman, is one of the required
texts for CS 209.
The nicest little reference for a quick tutorial is The Elements of Style, by Strunk and
White (Macmillan, 1979). Everybody should read this 85-page book, which tells about
English prose writing in general. But it isn't a required text-it's merely recommended.
The other required text for CS 209 is A Handbook for Scholars by Mary-Claire van
Leunen (Knopf, 1978). This well-written book is a real pleasure to read, in spite of its
unexciting title. It tells about footnotes, references, quotations, and such things, done
correctly instead of the old-fashioned "op. cit." way.
Mathematical writing has certain peculiar problems that have rarely been discussed
in the literature. Gillman's book refers to the three previous classics in the field: An
article by Harley Flanders, Amer. Math. Monthly, 1971, pp. 1-10; another by R. P. Boas
in the same journal, 1981, pp. 727-731. There's also a nice booklet called How to Write
Mathematics, published by the American Mathematical Society in 1973, especially the
delightful essay by Paul R. Halmos on pp. 19-48.
The following points are especially important, in your instructor's view:
1. Symbols in different formulas must be separated by words.
Bad: Consider $S_q$, $q < p$.
Good: Consider $S_q$, where $q < p$.
2. Don't start a sentence with a symbol.
Bad: $x^n - a$ has $n$ distinct zeroes.
Good: The polynomial $x^n - a$ has $n$ distinct zeroes.
3. Don't use the symbols ∴, ⇒, ∀, ∃, ∋; replace them by the corresponding words.
(Except in works on logic, of course.)
4. The statement just preceding a theorem, algorithm, etc., should be a complete sentence
or should end with a colon.
Bad: We now have the following
Theorem. H(x) is continuous.
This is bad on three counts, including rule 2. It should be rewritten, for example, like
this:
Good: We can now prove the following result.
Theorem. The function H(x) defined in (5) is continuous.
Even better would be to replace the first sentence by a more suggestive motivation,
tying the theorem up with the previous discussion.

[§ 1. MINICOURSE ON TECHNICAL WRITING 1)


5. The statement of a theorem should usually be self-contained, not depending on the
assumptions in the preceding text. (See the restatement of the theorem in point 4.)
6. The word "we" is often useful to avoid passive voice; the "good" first sentence of
example 4 is much better than "The following result can now be proved." But this
use of "we" should be used in contexts where it means "you and me together", not a
formal equivalent of "I". Think of a dialog between author and reader.
In most technical writing, "I" should be avoided, unless the author's persona is relevant.
7. There is a definite rhythm in sentences. Read what you have written, and change the
wording if it does not flow smoothly. For example, in the text Sorting and Searching it
was sometimes better to say "merge patterns" and sometimes better to say "merging
patterns". There are many ways to say "therefore" , but often only one has the correct
rhythm.
8. Don't omit "that" when it helps the reader to parse the sentence.
Bad: Assume A is a group.
Good: Assume that A is a group.
The words "assume" and "suppose" should usually be followed by "that" unless another
"that" appears nearby. But never say "We have that x = y," say "We have x = y."
And avoid unnecessary padding "because of the fact that" unless you feel
that the reader needs a moment to recuperate from a concentrated sequence of ideas.
9. Vary the sentence structure and the choice of words, to avoid monotony. But use
parallelism when parallel concepts are being discussed. For example (Strunk and
White #15), don't say this:
Formerly, science was taught by the textbook method, while now the laboratory
method is employed.
Rather:
Formerly, science was taught by the textbook method; now it is taught by
the laboratory method.
Avoid words like "this" or "also" in consecutive sentences; such words, as well as
unusual or polysyllabic utterances, tend to stick in a reader's mind longer than other
words, and good style will keep "sticky" words spaced well apart. (For example, I'd
better not say "utterances" any more in the rest of these notes.)
10. Don't use the style of homework papers, in which a sequence of formulas is merely
listed. Tie the concepts together with a running commentary.
11. Try to state things twice, in complementary ways, especially when giving a definition.
This reinforces the reader's understanding. (Examples, see §2 below: $N^n$ is defined
twice, $A_n$ is described as "nonincreasing", $L(C, P)$ is characterized as the smallest
subset of a certain type.) All variables must be defined, at least informally, when they
are first introduced.



12. Motivate the reader for what follows. In the example of §2, Lemma 1 is motivated
by the fact that its converse is true. Definition 1 is motivated only by decree; this is
somewhat riskier.
Perhaps the most important principle of good writing is to keep the reader uppermost
in mind: What does the reader know so far? What does the reader expect next and
why?
When describing the work of other people it is sometimes safe to provide motivation
by simply stating that it is "interesting" or "remarkable"; but it is best to let the
results speak for themselves or to give reasons why the things seem interesting or
remarkable.
When describing your own work, be humble and don't use superlatives of praise, either
explicitly or implicitly, even if you are enthusiastic.
13. Many readers will skim over formulas on their first reading of your exposition. Therefore,
your sentences should flow smoothly when all but the simplest formulas are
replaced by "blah" or some other grunting noise.
14. Don't use the same notation for two different things. Conversely, use consistent notation
for the same thing when it appears in several places. For example, don't say "$A_j$
for $1 \le j \le n$" in one place and "$A_k$ for $1 \le k \le n$" in another place unless there is a
good reason. It is often useful to choose names for indices so that $i$ varies from 1 to
$m$ and $j$ from 1 to $n$, say, and to stick to consistent usage. Typographic conventions
(like lowercase letters for elements of sets and uppercase for sets) are also useful.
15. Don't get carried away by subscripts, especially when dealing with a set that doesn't
need to be indexed; set element notation can be used to avoid subscripted subscripts.
For example, it is often troublesome to start out with a definition like "Let
$X = \{x_1, \ldots, x_n\}$" if you're going to need subsets of $X$, since the subset will have to
be defined as $\{x_{i_1}, \ldots, x_{i_m}\}$, say. Also you'll need to be speaking of elements $x_i$ and $x_j$
all the time. Don't name the elements of $X$ unless necessary. Then you can refer to
elements $x$ and $y$ of $X$ in your subsequent discussion, without needing subscripts; or
you can refer to $x_1$ and $x_2$ as specified elements of $X$.

16. Display important formulas on a line by themselves. If you need to refer to some of
these formulas from remote parts of the text, give reference numbers to all of the most
important ones, even if they aren't referenced.
17. Sentences should be readable from left to right without ambiguity. Bad examples:
"Smith remarked in a paper about the scarcity of data." "In the theory of rings,
groups and other algebraic structures are treated."
18. Small numbers should be spelled out when used as adjectives, but not when used as
names (i.e., when talking about numbers as numbers).
Bad: The method requires 2 passes.
Good: Method 2 is illustrated in Fig. 1; it requires 17 passes. The count was
increased by 2. The leftmost 2 in the sequence was changed to a 1.
19. Capitalize names like Theorem 1, Lemma 2, Algorithm 3, Method 4.



20. Some handy maxims:
Watch out for prepositions that sentences end with.
When dangling, consider your participles.
About them sentence fragments.
Make each pronoun agree with their antecedent.
Don't use commas, which aren't necessary.
Try to never split infinitives.
21. Some words frequently misspelled by computer scientists:
implement not impliment
complement not compliment
occurrence not occurence
dependent not dependant
auxiliary not auxilliary
feasible not feasable
preceding not preceeding
referring not refering
category not catagory
consistent not consistant
PL/I not PL/1
descendant (noun) not descendent
its (belonging to it) not it's (it is)
The following words are no longer being hyphenated in current literature:
nonnegative
nonzero
22. Don't say "which" when "that" sounds better. The general rule nowadays is to use
"which" only when it is preceded by a comma or by a preposition, or when it is used
interrogatively. Experiment to find out which is better, "which" or "that", and you'll
understand this rule.
Bad: Don't use commas which aren't necessary.
Good: Don't use commas that aren't necessary.
Another common error is to say "less" when it should be "fewer".
23. In the example at the bottom of §2 below, note that the text preceding displayed
equations (1) and (2) does not use any special punctuation. Many people would have
written
of "nonincreasing" vectors:

$$A_n = \{(a_1, \ldots, a_n) \in N^n \mid a_1 \ge \cdots \ge a_n\}. \tag{1}$$

If $C$ and $P$ are subsets of $N^n$, let:

$$L(C,P) = \ldots$$

and those colons are wrong.



24. The opening paragraph should be your best paragraph, and its first sentence should
be your best sentence. If a paper starts badly, the reader will wince and be resigned to
a difficult job of fighting with your prose. Conversely, if the beginning flows smoothly,
the reader will be hooked and won't notice occasional lapses in the later parts.
Probably the worst way to start is with a sentence of the form "An x is y." For
example,
Bad: An important method for internal sorting is quicksort.
Good: Quicksort is an important method for internal sorting, because
Bad: A commonly used data structure is the priority queue.
Good: Priority queues are significant components of the data structures needed
for many different applications.
25. The normal style rules for English say that commas and periods should be placed inside
quotation marks, but other punctuation (like colons, semicolons, question marks,
exclamation marks) stays outside the quotation marks unless it is part of the quotation.
It is generally best to go along with this illogical convention about commas
and periods, because it is so well established, except when you are using quotation
marks to describe some text as a specific string of symbols. For example,
Good: Always end your program with the word "end".
On the other hand, punctuation should always be strictly logical with respect to
parentheses and brackets. Put a period inside parentheses if and only if the sentence
ending with that period is entirely within the parentheses. The punctuation within
parentheses should be correct, independently of the outside context, and the punctuation
outside the parentheses should be correct if the parenthesized statement would
be removed.
Bad: This is bad, (although intentionally so.)
26. Resist the temptation to use long strings of nouns as adjectives: consider the packet
switched data communication network protocol problem.
In general, don't use jargon unnecessarily. Even specialists in a field get more pleasure
from papers that use a nonspecialist's vocabulary.
Bad: "If $L^+(P, N_0)$ is the set of functions $f\colon P \to N_0$ with the property that

$$\exists_{n_0 \in N_0}\ \forall_{p \in P}\quad p \ge n_0 \Rightarrow f(p) = 0$$

then there exists a bijection $N_1 \to L^+(P, N_0)$ such that if $n \to f$ then

$$n = \prod_{p \in P} p^{f(p)}.$$

Here $P$ is the prime numbers and $N_1 = N_0 \sim \{0\}$."



Better: "According to the 'fundamental theorem of arithmetic' (proved in ex.
1.2.4-21), each positive integer $u$ can be expressed in the form

$$u = 2^{u_2} 3^{u_3} 5^{u_5} 7^{u_7} 11^{u_{11}} \cdots = \prod_{p\ \mathrm{prime}} p^{u_p},$$

where the exponents $u_2, u_3, \ldots$ are uniquely determined nonnegative integers,
and where all but a finite number of the exponents are zero."
[The first quotation is from Carl Linderholm's neat satirical book Mathematics Made
Difficult; the second is from D. Knuth's Seminumerical Algorithms, Section 4.5.2.]

When in doubt, read The Art of Computer Programming for outstanding examples
of good style.
[That was a joke. Humor is best used in technical writing when readers can understand
the joke only when they also understand a technical point that is being made. Here
is another example from Linderholm:
"... 0D = 0 and N0 = N, which we may express by saying that 0 is
absorbing on the left and neutral on the right, like British toilet paper."
Try to restrict yourself to jokes that will not seem silly on second or third reading.
And don't overuse exclamation points!]



§2. An Exercise on Technical Writing
In the following excerpt from a term paper, $N$ denotes the nonnegative integers, $N^n$
denotes the set of $n$-tuples of nonnegative integers, and
$A_n = \{(a_1, \ldots, a_n) \in N^n \mid a_1 \ge \cdots \ge a_n\}$. If $C, P \subseteq N^n$, then $L(C,P)$ is defined to be
$\{c + p_1 + \cdots + p_m \mid c \in C,\ m \ge 0, \text{ and } p_j \in P \text{ for } 1 \le j \le m\}$.
We want to prove that $L(C,P) \subseteq A_n$ implies $C, P \subseteq A_n$.
The following proof, directly quoted from a sophomore term paper, is mathematically
correct (except for a minor slip) but stylistically atrocious:
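$L(C,P) \subset A_n$
$C \subset L \Rightarrow C \subset A_n$
Spse $p \in P$, $p \notin A_n \Rightarrow p_i < p_j$ for $i < j$
$c + p \in L \subset A_n$
$\therefore c_i + p_i \ge c_j + p_j$ but $c_i \ge c_j \ge 0$, $p_j \ge p_i$ $\therefore (c_i - c_j) \ge (p_j - p_i)$
but $\exists$ a constant $k \ni c + kp \notin A_n$
let $k = (c_i - c_j) + 1$   $c + kp \in L \subset A_n$
$\therefore c_i + kp_i \ge c_j + kp_j \Rightarrow (c_i - c_j) \ge k(p_j - p_i)$
$\Rightarrow k - 1 \ge k \cdot m$   $k, m \ge 1$   Contradiction
$\therefore p \in A_n$
$\therefore L(C,P) \subset A_n \Rightarrow C, P \subset A_n$ and the
lemma is true.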

A possible way to improve the quality of the writing:


Let $N$ denote the set of nonnegative integers, and let

$$N^n = \{(b_1, \ldots, b_n) \mid b_i \in N \text{ for } 1 \le i \le n\}$$

be the set of $n$-dimensional vectors with nonnegative integer components. We shall be
especially interested in the subset of "nonincreasing" vectors,

$$A_n = \{(a_1, \ldots, a_n) \in N^n \mid a_1 \ge \cdots \ge a_n\}. \qquad (1)$$

If $C$ and $P$ are subsets of $N^n$, let

$$L(C,P) = \{c + p_1 + \cdots + p_m \mid c \in C,\ m \ge 0, \text{ and } p_j \in P \text{ for } 1 \le j \le m\} \qquad (2)$$

be the smallest subset of $N^n$ that contains $C$ and is closed under the addition of elements
of $P$. Since $A_n$ is closed under addition, $L(C,P)$ will be a subset of $A_n$ whenever $C$ and $P$
are both contained in $A_n$. We can also prove the converse of this statement.
Lemma 1. If $L(C,P) \subseteq A_n$ and $C \ne \emptyset$, then $C \subseteq A_n$ and $P \subseteq A_n$.
Proof. (Now it's your turn to write it up beautifully.)



§3. An Answer
Here is one way to complete the exercise in the previous section. (But please try to WORK
IT YOURSELF BEFORE READING THIS.) Note that a few clauses have been inserted to help
keep the reader synchronized with the current goals and subgoals and strategies of the
proof. Furthermore the notation $(b_1, \ldots, b_n)$ is used instead of $(p_1, \ldots, p_n)$, in the second
paragraph below, to avoid confusion with formula (2).
Proof. Assume that $L(C,P) \subseteq A_n$. Since $C$ is always contained in $L(C,P)$, we must
have $C \subseteq A_n$; therefore only the condition $P \subseteq A_n$ needs to be verified.
If $P$ is not contained in $A_n$, there must be a vector $(b_1, \ldots, b_n) \in P$ such that $b_i < b_j$
for some $i < j$. We want to show that this leads to a contradiction.
Since the set $C$ is nonempty, it contains some element $(c_1, \ldots, c_n)$. We know that the
components of this vector satisfy $c_1 \ge \cdots \ge c_n$, because $C \subseteq A_n$.
Now $(c_1, \ldots, c_n) + k(b_1, \ldots, b_n)$ is an element of $L(C,P)$ for all $k \ge 0$, and by
hypothesis it must therefore be an element of $A_n$. But if we take $k = c_i - c_j + 1$, we have
$k \ge 1$ and
$$c_i + kb_i \ge c_j + kb_j,$$
hence
$$c_i - c_j \ge k(b_j - b_i). \qquad (3)$$
This is impossible, since $c_i - c_j = k - 1$ is less than $k$, yet $b_j - b_i \ge 1$. It follows that
$(b_1, \ldots, b_n)$ must be an element of $A_n$. □
Note that the hypothesis $C \ne \emptyset$ is necessary in Lemma 1, for if $C$ is empty the set
$L(C,P)$ is also empty regardless of $P$.
[This was the "minor slip".]

BUT ... don't always use the first idea you think of. The proof above actually
commits another sin against mathematical exposition, namely the unnecessary use of proof
by contradiction. It would have been better to use a direct proof:
Let $(b_1, \ldots, b_n)$ be an arbitrary element of $P$, and let $i$ and $j$ be fixed subscripts with
$i < j$; we wish to prove that $b_i \ge b_j$. Since $C$ is nonempty, it contains some element
$(c_1, \ldots, c_n)$. Now the vector $(c_1, \ldots, c_n) + k(b_1, \ldots, b_n)$ is an element of $L(C,P)$ for all
$k \ge 0$, and by hypothesis it must therefore be an element of $A_n$. But this means that
$c_i + kb_i \ge c_j + kb_j$, i.e.,
$$c_i - c_j \ge k(b_j - b_i), \qquad (3)$$
for arbitrarily large $k$. Consequently $b_j - b_i$ must be zero or negative.
We have proved that $b_j - b_i \le 0$ for all $i < j$, so the vector $(b_1, \ldots, b_n)$ must be an
element of $A_n$. □
This form of the proof has other virtues too: It doesn't assume that the $b_i$'s are
integer-valued, and it doesn't require stating that $c_1 \ge \cdots \ge c_n$.

[8 §3. AN ANSWER TO THE EXERCISE]


§4. Excerpts from class, October 7 [notes by TLL]
Our first serious business involved examining "the worst abusers of the 'Don't use symbols
in titles' rule." Professor Knuth (hereafter known as Knuth) displayed a paper by Gauss
that had a long displayed formula in the title. He showed us a bibliography he's preparing
that references not only that paper but another with even more symbols in the title.
(Such titles make more than bibliographies difficult; they make bibliographic data retrieval
systems and keyword-in-context produce all sorts of hiccups.)
In his bibliography Knuth has tried to keep his citations true to the original sources. The
bibliography contains mathematical formulas, full name spellings (even alternate spellings
when common), and completely spelled-out source journal names. (This last may be
unusual enough that some members of a field may be surprised to see the full journal
name written out, but it's a big help to novices who want to find it in the library.)
We spent the rest of class going over some of the solutions that students had turned in for
the exercise of §2 (each sample anonymous). He cautioned us that while he was generally
pleased by the assignments, he was going to be pointing out things that could be improved.
The following points were all made in the process of going through these samples.

In certain instances, people did not understand what constitutes a proof. Fluency
in mathematics is important for Computer Science students but will not be taught
in this class.

Not all formulas are equations. Depending on the formula, the terms 'relation',
'definition', 'statement', or 'theorem' might be used.

Computer Scientists must be careful to distinguish between mathematical notation
and programming language notation. While it may be appropriate to use p[r] in a
program, in a formal paper it is probably better to use p with a subscript of r. As
another example, it is not appropriate to use a star (*) to denote multiplication in
a paper about mathematics. Just say xy.

Some people called $p$ an element of $P$ and $p_r$ an element of $p$. Everything was an
"element." It's better to call $p_r$ a "component" of $p$, thus distinguishing two kinds
of subsidiary relationships.

It is natural in mathematics to hold off some aspects of your definition - to "place
action before definition" - but it is possible
to carry this too far, if too much is being held back. The best location for certain
definitions is a subjective matter.

Remember to put words between adjacent formulas.

When you use ellipses, such as $(p_1, \ldots, p_n)$, remember to put commas before and
after the three dots. When placing ellipses between commas the three dots belong
on the same level as the commas, but when the ellipsis is bracketed by symbols such
as '+' or '<' the dots should be at mid-level.
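
The two kinds of ellipsis described above can be sketched in LaTeX as follows. This is only an illustration, assuming ordinary math mode; the symbols $p_1, \ldots, p_n$ are placeholders, and \ldots and \cdots are the standard commands for baseline and mid-level dots.

```latex
% Dots between commas sit on the baseline, with a comma before and after.
$(p_1, \ldots, p_n)$

% Dots between operators or relations are raised to mid-level.
$p_1 + \cdots + p_n$ \quad and \quad $p_1 < \cdots < p_n$
```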

[§4. COMMENTS ON STUDENT ANSWERS (1) 9]


Be careful with the spacing around ellipses. The character string '. . .;' looks strange
(it should have more space after the last dot). All kinds of accidents happen concerning
spaces in formulas. Typesetting systems like TeX have built-in rules to
cover 99% of the cases, but if you write a lot of mathematics you will get bitten.

Linebreaks in the middle of formulas are undesirable. There are ways to avoid
them with TeX (as well as other text formatting systems). People who use TeX and
wish to use the vertical bar and the empty set symbol in notation like '$\{c \mid c \in \emptyset\}$'
should be aware of the TeX commands \mid and \emptyset.
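
A minimal sketch of both points, assuming plain LaTeX math mode. The extra group in the second formula is one common way to forbid a line break inside a short formula; it is offered as an illustration, not as the method the class had in mind.

```latex
% \mid gives the vertical bar the spacing of a relation;
% \emptyset produces the empty-set symbol.
$\{\, c \mid c \in \emptyset \,\}$

% Wrapping a short formula in an extra group keeps TeX from
% breaking the line after one of its relations.
${a_1 \ge \cdots \ge a_n}$
```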

Comments such as, "We demonstrate the second conclusion by contradiction," and
"There must be a witness to the unsortedness of P," are useful because they tell the
reader what is going on or bring in new and helpful vocabulary.

Numbering all displayed formulas is usually a bad idea; number the important ones
only. Extraneous parentheses can also be distracting. For example, in the phrase
"let $k$ be $(c_i - c_j) + 1$," the parentheses should be omitted.

You can overdo the use of any good tool. For instance, you could overuse typographic
tools by having 20 different fonts in one paper.
Two more topics were touched on (and are sure to be discussed further): the use of 'I' in
technical writing, and the use of past or present tense in technical writing.
Knuth says that Mary-Claire van Leunen defends the use of 'I' in scholarly articles, but
that he disagrees (unless the identity of the author is important to the reader). Knuth
likes the "teamwork" aspect of using 'we' to represent the author and reader together. If
there are multiple authors, they can either "revel in the ambiguity" of continuing to use
'we', or they can use added disambiguating text. If one author needs to be mentioned
separately, the text can say 'one of the authors (DEK)', or 'the first author', but not 'the
senior author'.
Knuth (hereafter known as Don) recommends that one of two approaches be used with
respect to tenses of verbs: Either use present tense throughout the entire paper, or write
sequentially. Sequential writing means that you say things like, "We saw this before. We
will see this later." The sequential approach is more appropriate for lengthy papers. You
can use it even more effectively by using words of duration: "We observed this long ago.
We saw the other thing recently. We will prove something else soon."



§5. Excerpts from class, October 9 [notes by TLL]
"I'm thinking about running a contest for the best Pascal program that is also a sonnet,"
was one of the first sentences out of Don's mouth on the topic of the exact definition
of "Mathematical Writing." He admitted that such a contest was "probably not the
right topic for this course." However, a program (presumably even an iambic pentameter
program) is among the documents that he will accept as the course term paper. He
will accept articles for professional journals, chapters of books or theses, term papers for
other courses, computer programs, user manuals or parts thereof: anything that falls into
a definite genre where you have a specific audience in mind and the technical aspect is
significant.
We spent the rest of class continuing to examine the homework assignment. In the interest
of succinct notes, I have replaced many literal phrases by their generic equivalents. For
example, I might have replaced 'A > B' by '(relation)'. This time I have divided the comments
into two sets: those dealing with what I will call "form" (parentheses, capitalization,
fonts, etc.) and those dealing with "content" (wording, sentence construction, tense, etc.).
First, the comments concerning form:

Don't overdo the use of colons. While the colon in 'Define it as follows:' is fine, the
one in 'We have: (formula)' should be omitted since the formula just completes the
sentence. Some papers had more colons than periods.

Should the first word after a colon be capitalized? Yes, if the phrase following the
colon is a full sentence; No, if it is a sentence fragment. (This is not "yet" a standard
rule, but Don has been trying it for several years and he likes it.)

While too many commas will interfere with the smooth flow of a sentence, too
few can make a sentence difficult to read. As examples, a sentence beginning with
'Therefore,' does not need the comma following 'therefore'. But 'Observe that if
(symbol) is (formula) then so is (symbol) because (reasoning)' at least needs a
comma before 'because'.

Putting too many things in parentheses is a stylistic thing that can get very tiring.
(When Don moves from his original, handwritten draft to a typed, computer-stored
version his most frequent change is to remove extra parentheses.)

Among the parentheses most in need of removal are nested parentheses. To this end,
it is better to write '(Definition 2)' than '(definition (2))'. Unfortunately, however,
you can't use the former if the definition was given in displayed formula (2). Then
it's probably best to think of a way to avoid the outer parentheses altogether.

In some cases your audience may expect nested parentheses. In this case (or in any
other case when you feel you must have them), should the outer pair be changed to
brackets (or curly-braces)? This was once the prevailing convention, but it is now
not only obsolete but potentially dangerous; brackets and curly braces have semantic
content for many scientific professionals. ("The world is short of delimiters," says
Don.) Typographers help by using slightly larger parentheses for the outer pair in
a nested set.

[§5. COMMENTS ON STUDENT ANSWERS (2) 11]

An entire paper or proof in capital letters is distracting. It gives the impression of
sustained shouting. Same goes for boldface, etc.

Paul Halmos introduced the handy convention of placing a box at the end of a
proof; this box serves the same function as the initials 'Q.E.D.'. If you use such a
box, it seems best to leave a space between it and the final period.

Try to make it clear where new paragraphs begin. When using displayed formulas,
this can become confusing unless you are careful.

Using notational or typographic conventions can be helpful to your readers (as long
as your convention is appropriate to your audience). Boldface symbols or arrows
over your vectors are each appropriate in the correct context. When using a raised
'st' in phrases such as 'the $i+1^{st}$ component', it's better to use roman type: '$i+1^{\rm st}$'.
Then it's clear that you aren't speaking of "1 raised to the power st."

Avoid "psychologically bad" line breaks. This is subjective, but you can catch many
such awkward breaks by not letting the final symbol lie on a line separate from the
rest of its sentence. If you are using TeX, a tilde (~) in place of a space will cause the
two symbols on either side of the tilde to be tied together. (Other text processors
also have methods to disallow line breaks at specific points.)
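
As a small illustration of the tilde, here is a sketch in LaTeX. The sentence itself is made up for the example; only the use of ~ as an unbreakable space is standard TeX behavior.

```latex
% The tilde is an unbreakable space (a "tie"): the line cannot break
% between the word and the symbol that follows it.
Let $k$ be the constant of Lemma~1, and consider the vector~$c$.

% Without the tie, the lone symbol may be stranded at the start of a line:
% ... and consider the vector $c$.
```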

Some of us are much better at spelling than others of us. Those of us who are not
naturally wonderful spellers should learn to use spelling checkers.

Allowing formulas to get so long that they do not format well or are unnecessarily
confusing "violates the principle of 'name and conquer' that makes mathematics
readable." For example, '$v = c + u(c_i - c_j + 1)$' should be '$v = c + ku$, where
$k = c_i - c_j + 1$', if you're going to do a lot of formula manipulation in which
$(c_i - c_j + 1)$ remains as a unit.

Be stingy with your quotation marks. "Three cute things in quotes is a little too
cute."

Remember to minimize subscripts. For example, '$p_i$ is an element of $P$' could more
easily be '$p$ is an element of $P$'.

Remember to capitalize words like theorem and lemma in titles like Lemma 1 and
Theorem 23.

Remember to place words between adjacent formulas. A particularly bad example
was, "Add $p$ $k$ times to $c$."

Be careful to define symbols before you use them (or at least to define them very
near where you use them).



Don't get hung up on one or two styles of sentences. The following sort of thing
can become very monotonous:
Thus, - - - -.
Consequently, - - - -.
Therefore, - - - -.
And so, - - - -.

On the other hand, parallelism should be used when it is the point of the sentence.
Now the comments involving content:
Try to make sentences easily comprehensible from left to right. For example, "We
prove that (grunt) and (snort) implies (blah)." It would be better to write "We
prove that the two conditions (grunt) and (snort) imply (blah)." Otherwise it
seems at first that (grunt) and (snort) are being proved.
While guidelines have been given for the use of the word 'that', the final placement
must be dictated by cadence and clarity. Read your words aloud to yourself.
The word 'shall' seems to be a natural word for definitions to many mathematical
readers, but it is considered formal by younger members of the audience.
Be precise in your wording. If you mean "not nonincreasing," don't say "increasing";
the former means that $p_j < p_{j+1}$ for some $j$, while the latter means that $p_j < p_{j+1}$ for
all $j$.
Avoid passive voice. (My temptation to write, "Passive voice is bad," was overwhelming.)
For example, replace "It can be shown . . ." by "A proof shows . . . ."

Mixed tenses on the same subject are awkward. For example, "We assume now
(grunt), hoping to show a contradiction," is better than, "We assume now (grunt),
and will show that this leads to a contradiction."
Many people used the ungainly phrase "Assume by contradiction that (blah)." It
is better to say, "The proof that (blah) is by contradiction," and even better to say
"To prove (grunt), let us assume the opposite and see what happens."
In general, a conversational tone giving signposts and clearly written transition
paragraphs provides for pleasant reading. One especially easy-to-read proof contained
the phrases "The operative word is zero," "The lemma is half proved," and
"We divide the proof into two parts, first proving (blah) and then proving (grunt)."
You can give relations in two ways, either saying '$p_i < p_j$' or '$p_j > p_i$'. The latter is
for "people who are into dominance," Don says, but the former is much easier for a
reader to visualize after you've just said '$p = (p_1, p_2, \ldots, p_n)$ and $i < j$'. Similarly,
don't say '$i < j$ and $p_j > p_i$'; keep $i$ and $j$ in the same relative position.



§6. Excerpts from class, October 12 [notes by TLL]

Don opened class by saying that up until now he has been criticizing our writing; now he
will show us what he does to his own. Perhaps apropos showing us his own writing he
quoted Dijkstra: "A good teacher will teach his students the importance of style and how
to develop their own style-not how to mimic his."
First he showed us a letter from Bob Floyd. The letter opened by saying 'Don, Please stop
using so many exclamation points!' and closed with at least five exclamation points. After
receiving this letter he looked in The Art of Computer Programming and found about two
exclamation points per page. (Among the other biographical tidbits we learned at this
class were that Don went to secretarial school, types 80 words per minute, and once knew
two kinds of shorthand.)
Don is writing a book with Oren Patashnik and Ron Graham. The book is entitled
Concrete Mathematics and is to be used for CS 260. He showed us two copies of Chapter
Five of this book: one copy he called "Before" and one he called "After".
The Before copy actually came into existence long after the work on the book began. Oren
wrote several drafts using the LaTeX book style, and then the authors availed themselves
of the services of a book designer. The designer decided how wide the text was, what
fonts were to be used, what chapter headings looked like, and a host of other things. The
designer, at the authors' request, has left room in the inner margins for "graffiti". That is,
for informal snappy comments from the peanut gallery. (This idea was "stolen" from the
booklet Approaching Stanford.)
The After copy is just another formally typeset revision of the Before copy. Neither copy
has yet been through a professional copy editor. Having now mentioned copy editors and
book designers, Don said, "In these days of author self-publishing, we must not forget the
value of professionals." (Actually, the copy editor was first mentioned when an error in
punctuation was displayed on the screen.)
Upon receiving a question from the audience concerning how many times he actually
rewrites something, Don told us (part of) his usual rewrite sequence:
His first copy is written in pencil. Some people compose at a terminal, but Don says, "The
speed at which I write by hand is almost perfectly synchronized with the speed at which
I think. I type faster than I think so I have to stop, and that interrupts the flow."
In the process of typing his handwritten copy into the computer he edits his composition
for flow, so that it will read well at normal reading speed. Somewhere around here the
text gets TeXed, but the description of this stage was tangled up with the description of
the process of rewriting the composition. Of course, rewriting does not all occur at any
one stage. As Don said, "You see things in different ways on the different passes. Some
things look good in longhand but not in type."
While discussing his own revisions, he mentioned those of two other Computer Science
authors. Nils Nilsson had at least five different formal drafts of his "Non-Monotonic Reasoning"
chapter. Tony Hoare revised the algorithm in his paper on "Communicating
Sequential Processes" more than a dozen times over the course of two years.

[14 §6. PREPARING BOOKS FOR PUBLICATION (1)]


Don, obviously a fan of rewriting in general, told us that he knows of many computer
programs that were improved by scrapping everything after six months and starting from
scratch. He said that this exact approach was used at Burroughs on their Algol compiler
in 1960 and the result was what Don considers to be one of the best computer programs
he has ever seen. On the limits of the usefulness of rewriting, he did say, "Any writing can
be improved. But eventually you have to put something out the door."
The last part of class was spent discussing the font used in the coming book: Euler.
The Euler typeface was designed by Hermann Zapf ("probably the greatest living type
designer") and is an especially appropriate font to use in a book that is all about Euler's
work. The idea of the face is that it is supposed to look a bit handwritten. For example, the
zero to be used for mathematics has a point at the apex because "when people write zeros,
they never really close them". This zero is different from the zero used in the text (for
example, in a date), so book preparation with Euler needs more care than usual. You have
to distinguish mathematical numerals from English-language numerals in the manuscript.
Somebody asked about 'all' versus 'all of'. Which should it be? Answer: That's a very
good question. Sometimes one way sounds best, sometimes the other. You have to use
your ear. Another tricky business is the position of 'only' and 'also'; Don says he keeps
shifting those words around when he edits for flow.

§7. Excerpts from class, October 14 [notes by PMR]

Don discussed the labours of the book designer and showed us specimen "page plans" and
example pages. The former are templates for the page and show the exact dimensions of
margins, paragraphs, etc. His designer also suggested a novel scheme for equations: They
are to be indented much like paragraphs rather than being centered in the traditional
way. We also saw conventions for the display of algorithms and tables. Although Don
is doing his own typesetting, he is using the services of the designer and copy editor.
These professionals are well worth their keep, he said. Economists in the audience were
not surprised to hear that the prices of books bear almost no relation to their production
costs. Hardbacks are sometimes cheaper to produce than paperbacks. For those interested
in such things, Don recommended a paperback entitled One Book Six Ways (available in
the Bookstore) that describes the entire production process by means of actual documents.
Returning to the editing of his Concrete Maths text, Don went through more of the Before
and After pages he began to show us on Monday, picking out specific examples that
illustrate points of general interest.
He exhorted writers to try to put themselves in their readers' shoes: "Ask yourself what
the reader knows and expects to see next at some point in the text." Ideally, the finished
version reads so simply and smoothly that one would never suspect that it had been rewritten
at all. For example, part of the Concrete Math draft said
(Before) The general rule is ( . . . ) and it is particularly valuable because ( . . . ).
The transformation in (5.12) is called ( . . . ). It is easily proved since
( . . . and . . . ).

[§7. PREPARING BOOKS FOR PUBLICATION (2) 15]

Reading this at speed and in context made it clear that readers would be hanging on their
chairs wondering why the rule was true; so we should first tell them why, before stressing
the rule's significance:
(After) The general rule is ( . . . ) and it is easily proved since ( . . . and . . . ).
[new paragraph] Identity (5.12) is particularly valuable because ( . . . ).
It is called ( . . . ).
Don's favorite dictionary was of no help on the question of 'replace with' vs. 'replace by'.
The phrase 'by replacing - - by - - ' is bad (due to the repetition), and 'by replacing - -
with - - ' seems worse. In this case the solution is to choose another word: 'by changing
- - to - - '.
As a very general rule, try reading at speed. You will often get a much better sense of the
rhythm of the sentence than you did when you wrote it.
It is a bad idea to display false equations. The reader's eye is apt to alight upon them in
the text and treat them as gospel. It is much better to put them into the text, as in "So
the equation ' . . . ' is always false!"
Be sure that the antecedent of any pronoun that you use is clear. For example, the previous
paragraph has two sentences beginning 'It is . . . '; they are fine. But sometimes such a
sentence structure is troublesome because 'it' seems to be referring to an object under
discussion. For example,
(Before) Two things about the derivation are worthy of note. First, it's a great
convenience to be summing . . . .
(After) Two things about this derivation are worthy of note. First, we see again
the great convenience of summing . . . .
Towards the end of the editing process you will need to ensure that you don't have a page
break in the middle of a displayed formula. Often you'll simply have to think up something
else to say to fill up the page, thus pushing the displayed formula entirely onto the next
page. Try to think of this as a stimulus to research!
Let proofs follow the same order as definitions, e.g., where you have to deal with several
separate cases.
Hyphens, dashes, and minus signs are distinct and should not be used interchangeably.
The shortest is the hyphen. The next is the en-dash, as in 'lines 10–18'. Longer still is the
minus sign, used in formulae: '10 − 18 = −8'. The longest of all is the em-dash—used in
sentences.
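
For readers who typeset with TeX, the four marks are conventionally entered as shown in this sketch. The sample phrases are taken from the paragraph above; the input conventions (one, two, or three hyphens, and a hyphen in math mode) are standard TeX usage rather than something stated in the class notes.

```latex
% hyphen: one keystroke, for compound words
integer-valued

% en-dash: two hyphens, for number ranges
lines 10--18

% minus sign: a hyphen typed inside math mode
$10 - 18 = -8$

% em-dash: three hyphens, for punctuation within a sentence
The longest of all is the em-dash---used in sentences.
```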
When proofreading you may catch technical errors as well as stylistic errors. Think about
the mathematics too, not just the prose. For example, the book was discussing a purported
argument that $0^0$ should be undefined "because the functions $x^0$ and $0^x$ have different
limiting values when $x \to 0$". Don revised this statement to ". . . when x decreases to 0,"
because $0^x$ is undefined when $x \to 0$ through negative values.



When you use the word 'instead', be clear about the contrast you are drawing. The reader
should immediately understand what you are referring to:
(Before) And when x = −1 instead, . . .
(After) And when x = −1 instead of +1, . . .

Notice the helpful use of a redundant '+' sign here.


Use the present tense for timeless facts. Things that we proved some time ago are nevertheless
still true.
Try to avoid repeating words in a sentence.
(Before) - - approach the values - - fill in the values --.
(After) - - approach the values - - fill in the entries - - .
In answer to a question from the class, Don suggested giving page numbers only for remote
references (to equations, say). Usually it is enough to say 'using Equation 5.14' or whatever.
It becomes unwieldy to give page numbers for every single such reference. A member of
the class suggested the 'freeway method' for numbering tables; number them with the page
number on which they appear. Don confessed that he hadn't thought of this one. Sounds
like a neat idea.

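The formula

(Before)   $$\sum_{k \le m} \binom{r}{k} \left(k - \frac{r}{2}\right) = -\frac{m+1}{2} \binom{r}{m+1}$$

looks a bit confusing because of the minus sign on the right, so Don
changed it to

(After)   $$\sum_{k \le m} \binom{r}{k} \left(\frac{r}{2} - k\right) = \frac{m+1}{2} \binom{r}{m+1}.$$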

There may be many ways to write a formula; you have the freedom to select the best.
(This change also propagated into the subsequent text, where a reference to 'the factor
$(k - r/2)$' had to be changed to 'the factor $(r/2 - k)$'.)
Somebody saw an integral sign on that page and asked about the relative merits of
$\int_{-\infty}^{a} f(x)\,dx$ versus other notations like

$$\int\limits_{-\infty}^{a} f(x)\,dx \qquad \int\limits_{x=-\infty}^{x=a} f(x)\,dx\,.$$

Don said that putting limits above and below, instead of at the right, traded vertical space
for horizontal space, so it depends on how wide your formulas are: Both forms are used.



Whichever form you adopt should be consistent throughout an entire book. Somebody
suggested

but Don pooh-poohed this.


He said that major writing projects each have their own style; you get to understand the
style that's appropriate as you write more and more of the book, just as novelists learn
about the characters they are creating as they develop a story. In Concrete Mathematics
he is learning how to use the idea of "graffiti in the margin" as he writes more. One nice
application is to quote from the first publications of important discoveries; thus famous
mathematicians like Leibniz join the writers of 20th century graffiti.

§8. Excerpts from class, October 16 [notes by TLL]

We continued to examine Before and After pages from the book of which Don is a co-author.
The following points were made in reference to changes Don decided to make.
When long formulas don't fit, try to break the lines logically. In some cases the
author can even change some of the math (perhaps by introducing a new symbol) to
make the formula placement less jarring. Such a change is best made by the author,
since the choice of how to display a complex expression is an important part of any
mathematical exposition.
Sometimes moving a formula from embedded text to one separately displayed will
allow the formula to be more logically divided. The placement of the equals sign (=)
is different for line breaks in the middle of displayed versus embedded formulas: The
break comes after the equals sign in an embedded formula, but before the equals
sign in a display.

While editing for flow, sentences can be broken up by changing semicolons to periods;
or if you want the sentences to join into a quickly moving blur, you can change
periods to semicolons. Breaking existing paragraphs into smaller paragraphs can
also be helpful here.
While making such changes make sure to preserve clarity. For example, make sure
that any sentences you create that begin with conjunctions are constructed clearly,
and that words like 'it' have clear antecedents. (Sentences that begin with the word
'And' are not always evil. )
Make sure your variable names are not misleading. Variable names that are too
similar to conceptually unrelated variables can be confusing. Systematic variable
renaming is one of the advantages of text editors.
We noted last time that present tense is correct for facts that are still true; but it
is okay to use past tense for "facts" that have turned out to be in error.

[18 §8. PREPARING BOOKS FOR PUBLICATION (3)]


One of the most common errors that mathematicians make when they get their own
typesetting systems is to over-use the form of fraction with a horizontal bar ($\frac{1+x}{y}$)
rather than a slash ($(1+x)/y$). The stacked form can lead to tiny little numbers,
especially when they are used in exponents. One of the most common changes that
mathematical copy editors make is to slash a mathematician's fractions. (They even
know that they have to add parentheses when they do this.)

Exercises are some of the most difficult parts of a book to write. Since an exercise
has very little context, ambiguity can be especially deadly; a bit of carefully chosen
redundancy can be especially important. For this reason, exercises are also the
hardest technical writing to translate to other languages.
Copyright law has changed, making it technically necessary to give credit to all
previously published exercises. Don says that crediting sources is probably sufficient
(he doesn't plan to write every person referenced in the exercises for his new book,
unless the publisher insists). Tracing the history of even well-known theorems can
be difficult, because mathematicians have tended to omit citations. He recently
spent four hours looking through the collected works of Lagrange trying to find the
source of "Lagrange's inequality," but he was unsuccessful. Therefore he's not too
unhappy with the new law.
We can dispense with some of our rhetorical guidelines when writing the answers to
exercises. Answers that are quick and pithy, and answers that start with a symbol,
are quite acceptable.

§9. Excerpts from class, October 16 (continued) [notes by TLL]


From esoteric mathematics we moved on to reference books. Don showed us six such books
that he likes to have next to him when he writes.
1. The Oxford English Dictionary (usually called the OED). He showed us the
two-volume "squint print" edition rather than the 16-volume set. This compact edition
is often offered as a bonus given to new members upon joining a book club. (There
is a project in Toronto that will have the entire OED online.)
2. The OED Supplement. The supplement brings the OED up to date. The supplement
comes in four volumes, each of which costs $100 or more, so you may have to
go to the library for this one.
3. The American Heritage Dictionary. Don likes this dictionary because of the usage
notes and the Appendix containing Indo-European root words. (For example, the
usage notes will help you choose between 'compare to' or 'compare with' in a specific
sentence.)
4. The Longman Dictionary of Contemporary English. Instead of the historical words
found in the previously mentioned dictionaries, this one has the words used on
the street. Current slang and popular usage are explained in very simple English.

[§9. HANDY REFERENCE BOOKS 19]


(For example, the nuances of 'mind-bending' versus 'mind-blowing' versus
'mind-boggling' are explained.)

5. Webster's New Word Speller Divider. Don said that people who don't spell well find
this book to be quite useful. [I saw no indication that he actually uses it, though. ]
6. Roget's Thesaurus. This book is a synonym dictionary. Don says that he owns two,
one for home and one for his Stanford office, and he uses them in many different
ways: when he knows that a word exists but has forgotten it; when he wants to
avoid repetition; when he wants to define a new technical term or a new name for
a paper or program.

The issue of British versus American dialect came up. When writing for international
audiences, should we use British or American spellings and conventions? Don says he
agrees with the rule that Americans should write with their own spellings and the British
should do the same. The two styles should be mixed only when, say, an American writes
about the 'labor of the British Labour Party'. (Readers of these classnotes will now
understand why TLL and PMR spell some words differently.)

§10. Excerpts from class, October 19 [notes by TLL]

Should this course have been named "Computer Scientifical Writing" or "Informatical
Writing" rather than "Mathematical Writing"? The Computer Science Department is
offering this class, but until now we have been talking about topics that are generally of
concern to all writers who use mathematics. Today we begin to discuss topics specific to
the writing of Computer Science.
We are not abandoning mathematical concerns; Don says that a technical typist in Computer
Science must know all that a Math department typist must know plus quite a bit
more. He showed us two examples where mathematical journals had trouble presenting
programs, algorithms, or concrete mathematics in papers he wrote. In order to solve the
first problem, Don had to convince the typesetters at Acta Arithmetica to create "floor"
and "ceiling" functions by carving off small pieces of the metal type for square brackets.
The second problem had to do with typographic conventions for computer programs; The
American Mathematical Monthly was using different fonts for the same symbol at different
points in a procedure, and was interchangeably using ":=", ": =", and "=:" to represent an
assignment symbol.
Stylistic conventions for programming languages originated with Algol 60. Prior to 1960,
FORTRAN and assembly languages were displayed using all uppercase letters in variable
width fonts that did not mix letters and numbers in a pleasant manner. Fortunately, Algol's
visual presentation was treated with more care: Myrtle Kellington of ACM worked from
the beginning with Peter Naur (editor of the Algol report) to produce a set of conventions
concerning, among other things, indentation and the treatment of reserved words.
Don found the prevailing variable-width fonts unacceptable for use in the displayed computer
programs in Volume 1 of The Art of Computer Programming, and he insisted that

[20 §10. PRESENTING ALGORITHMS]
he needed fixed width type. The publishers initially said that it wasn't possible, but they
eventually found a way to mix typewriter style with roman, bold, and italic.
Don says he had a difficult time trying to decide how to present algorithms. He could
have used a specific programming language, but he was afraid that such a choice would
alienate people (either because they hated the language or because they had no access to
the language). So he decided to write his algorithms in English.
His Algorithms are presented rather like Theorems with labeled steps; often they have
accompanying (but very high-level) flow charts (a technique he first saw in Russian literature
of the 1950s). The numbered steps have parenthetical remarks that we would call
comments; after 1968 these parenthetical remarks are often invariant relations that can be
used in a formal proof of program correctness.
Don has received many letters complimenting him on his approach, but he says it is not
really successful. Explaining why, he said, "People keep saying, 'I'm going to present an
algorithm in Knuth's style,' and then they completely botch it by ignoring the conventions
I think are most important. This style must just be a personal style that works for me.
So get a personal style that works for you." In recent papers he has used the pidgin Algol
style introduced by Aho, Hopcroft, and Ullman; but he will not change his style for the
yet-unfinished volumes of The Art of Computer Programming because he wants to keep
the entire series consistent.
Don says that a computer program is a piece of literature. ("I look forward to the day
when a Pulitzer Prize will be given for the best computer program of the year.") He
says that, apart from the benefit to be gained for the readers of our programs, he finds
that treating programs in this manner actually helps to make them run smoothly on the
computer. ("Because you get it right when you have to think about it that way.")
He gave us a reprint of "Programming Pearls" by Jon Bentley, from Communications of
the ACM 29 (May 1986), pages 364-369, and told us we had best read it by Wednesday
since it will be an important topic of discussion. Don, who was 'guest oyster' for this
installment of "Programming Pearls," warned us that "this represents the best thing to
come out of the TeX project. If you don't like it, try to conceal your opinions until this
course is over."

Bentley published that article only after Don had first published the idea of "literate
programming" in the British Computer Journal. (Don says that he chose the term in
hopes of making the originators of the term "structured programming" feel as guilty when
they write illiterate programs as he is made to feel when he writes unstructured programs.)
When Bentley wanted to know why Don did not publish this in America, Don said that
Americans are illiterate and wouldn't care anyway. Bentley seems to have disagreed with
at least part of that statement. (As did many of his readers: The article was so popular
that there will now be three columns a year devoted to literate programming.)
As Don began explaining the "WEB" system, he restated two previously mentioned principles:
The correct way to explain a complex thing is to break it into parts and then explain
each part; and things should be explained twice (formally and informally). These two
principles lead naturally to programs made up of modules that begin with text (informal
explanation) and finish with Pascal (formal explanation).
The WEB system allows a programmer to keep one source file that can produce either a
typesetting file or a programming language source file, depending on the transforming
program used.
Monday's final topic was the "blight on the industry": user manuals. Don would like us
to bring in some really stellar examples of bad user manuals. He tried to find some of
his favorites but found that they had been improved (or hidden) when he wasn't looking.
While he could have brought in the improved manuals, bad examples are much more fun.
He showed a brand-new book, The AWK Programming Language, to illustrate a principle
often used by the writers of user manuals: Try to write for the absolute novice. He says
that many manuals say just that, but then proceed to use jargon that even some experts
are uncomfortable with. While the AWK book does not explicitly state this goal, the
authors (Aho, Weinberger, Kernighan = AWK) told him that they had this goal in mind.

But the book fails to be comprehensible by novices. It fails because, as Don says, "If you
are a person who has been in the field for a long time, you don't realize when you are
using jargon." However, Don says that just because the AWK book fails to meet this goal
does not mean that it isn't a good book. ("Perhaps the best book in Computer Science
published this year.") He explains this by saying, "If you try to write for the novice, you
will communicate with the experts; otherwise you communicate with nobody."

§11. Excerpts from class, October 21 [notes by PMR]

Don opened class with the good news that Mary-Claire van Leunen has agreed to help
read the term papers and drafts thereof, despite the fact that her name was incorrectly
capitalized in last week's notes.
Returning to the subject of "Literate Programming," Don said that it takes a while to find
a new style to suit a new system like WEB. When he was trying to write the WEB program
in its own language he tore up his first 25 pages of code and started again, having finally
found a comfortable style. He digressed to talk about the vicious circle involved in writing
a program in its own language. To break it, he hand-simulated the program on itself to
produce a Pascal program that could then be used to compile WEB programs. The task was
eased because there is obviously no need for error-handling routines when dealing with
code that you have to debug anyway. But there is also another kind of bootstrapping
going on; you can evolve a style to write these programs only by sitting down and writing
programs. Don told us that he wrote WEB in just two months, as it was never intended to
be a polished product like TeX.
We spent the rest of the class looking at WEB programs that had been written by undergraduates
doing independent research with Don during the Spring. We saw how they had
(or had not) adapted to its style. Don said that he had got a lot of feedback and sometimes
found it hard to be dispassionate about stylistic questions, but that some things
were clearly wrong. He showed us an example that looked for all the world just like a
Pascal program; the student had obviously not changed his ways of thinking or writing at

[22 §11. LITERATE PROGRAMMING (1)]


all, and so had failed to make any use of the features of the system. The English in his
introductory paragraph also left a lot to be desired.
Don showed us his thick book TeX: The Program, a listing of the code for TeX, written
in WEB. It consists of almost 1400 modules. The guiding principle behind WEB is that each
module is introduced at the psychologically right moment. This means that the program
can be written in such a way as to motivate the reader, leaving TANGLE to sort everything
out later on. [The TANGLE processor converts WEB programs to Pascal programs.] After
all, we don't need to worry about motivating the compiler. (Don added the aside that
contrary to superstition, the machine doesn't spend most of its time executing those parts
of the code that took us the longest to write.) It seems to be true that the best way in
which to present program constructs to the reader is to use the same order in which the
creator of the program found himself making decisions about them. Don himself always
felt it was quite clear what had to be presented next, throughout the entire composition
of this huge program. There was at all points a natural order of exposition, and it seems
that the natural orderings for reading and writing are very much the same.
The first student hadn't used this new flexibility at all; he had essentially just used WEB to
throw in comments here and there.
A general problem of exposition arose: How are we to describe the behavior of a computer
program? Do we see the program as essentially autonomous, "running itself," or are we
participants in the action? Our attitude to this determines whether we are going to say
'we insert the element in the heap' or 'it inserts the element . . . '. Don favours 'we'; at
any rate one should be consistent.
Students used descriptors and imperatives for the names of their modules; Don said he
favours the latter, as in ⟨Store the word in the dictionary⟩, which works much better than
⟨Stores the word in the dictionary⟩. On the other hand, where a module is essentially
a piece of text with a declarative function (a list of declarations, say), we should use a
descriptor to name it: ⟨Procedures for sorting⟩.
Incidentally, it is natural to capitalize the first letter of a module name.
One student used the identifier 'FindInNewWords'. This looks comparatively bad in print:
Uppercase letters were not designed to appear immediately following lowercase ones. Since
the use of compound nouns is almost inevitable, WEB provides a neat solution. It allows
a short underscore to be used to conjoin words like get_word. (Since the Pascal compiler
will not accept identifiers like this, TANGLE quietly removes the underscore.) Don told us
that Jim Dunlap of Digitek, who made some of the best early compilers, invariably used
identifiers forty-or-so characters long. The meaning was always quite clear although no
comments appeared.
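
The underscore removal mentioned above is easy to picture with a small sketch. What follows is only an illustration in Python, not TANGLE's actual code: the function name remove_underscores and the sample lines are made up, and the only behaviour modelled is the one these notes describe (underscores vanish from identifiers). Leaving quoted Pascal strings alone is an extra assumption of the sketch.

```python
import re

# A token is either a Pascal string literal ('...' with '' as an escaped
# quote) or an identifier that may contain underscores.
_TOKEN = re.compile(r"'(?:[^']|'')*'|[A-Za-z][A-Za-z0-9_]*")


def remove_underscores(line: str) -> str:
    """Drop underscores from identifiers; leave string literals untouched."""
    def fix(match: re.Match) -> str:
        text = match.group(0)
        if text.startswith("'"):
            return text  # string literal: keep exactly as written
        return text.replace("_", "")  # identifier: get_word -> getword
    return _TOKEN.sub(fix, line)


if __name__ == "__main__":
    print(remove_underscores("get_word(caps_range);"))
    print(remove_underscores("write('a_b'); get_word"))
```

The writer keeps the readable name get_word in the source, and only the compiler ever sees the squeezed form.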

Each module should contain an informal but clear description of what it actually does. A
play-by-play account of an algorithm, a simple stepping through of the process, does not
qualify. We are trying to convey an intuition of what is going on, so a high-level account
is much more helpful.
We saw several modules that were much too long. Don thinks that a dozen lines of code
is about the right length for a module. Often he simply recommended that the students
cut the offending specimens into several pieces, each of more manageable size. The whole
philosophy behind WEB is to break a complex thing into tractable parts, so the code should
reflect this. Once you get the idea, you begin writing code this way, and it's easier.
We saw an example in which the student had slipped into "engineerese" in his descriptive
text-all conjunctions and no punctuation. This worked for James Joyce, but it doesn't
make for good documentation. One student had apparently managed to break WEB-the
formatting of begins and ends came out all wrong. Heaven knows what he did.
One student put comments after each end to show what was being ended, as end {while}.
This is a good idea when writing Pascal, but it's unnecessary in WEB. Thus it's a good
example of a convention that is no longer appropriate to the new style; when you change
style you needn't carry excess baggage along.
Don had more to say about the anthropomorphization of computer systems. Why prompt
the user with 'Name of file to process?' when we can have the computer say 'What
file should I process?'? Don generally likes the use of 'I' by the computer when
referring to itself, and thinks this makes it easier for users to conceptualize what is going
on. Perhaps humans can think of complex processes best in terms of demons in boxes,
so why not acknowledge this? Eliza, the AI program that simulates a certain type of
psychiatrist, managed to fool virtually everyone by an extension of this approach. Eliza
may or may not be a recommendation for anthropomorphisms, or for psychiatry. There
are those, such as Dijkstra, who think such use of 'I' to be a bad thing.
As in the case of maths, don't start a sentence with a symbol. So don't say 'data assumes
that . . .'; it can easily be rewritten.
We saw several programs by one student who had developed a very distinctive and (Don
thought) colourful style. His prose is littered with phrases like "Oooops! How can we
fix this?" and "Now to get down to the nitty-gritty!" This stream-of-consciousness style
really does seem to motivate reading, and helps infect the reader with the author's obvious
enthusiasm. There were a few small nits to pick with this guy though: His descriptions
could often be more descriptive. Why not call a variable caps_range instead of just range?
Don also had to point out to him that 'complement' and 'component' are in fact two
different words.
In WEB you can declare your variables at any point in the program. Don thinks it is always
a good idea to add some comment when you do so, even if only a very cursory explanation
is needed.
A note about asterisks: Be warned that typeset asterisks tend to appear higher above the
line than typewritten ones, so your multiplication formulae may come out looking strange.
Better to use × for multiplication, and to use a typewriter-style font with body-centered
'*' symbols instead of the '*' in normal typographic fonts.
Another freshman was digitizing the Mona Lisa for reasons best known to insiders of Don's
research project. Don pointed out that since the program uses a somewhat specialized data
structure (the heap) that might be unknown to the readers, the author should keep all the
heap routines together in the text so that they can be read as a group while fresh in the
reader's mind. In WEB we are not constrained by top-down, bottom-up, or any other order.



This student capitalized the first letter of every word in titles of modules, even 'And'
and the like. This looks rather unnatural-it is better to follow the newspaper-headlines
convention by leaving such words entirely in lowercase, and even better to capitalize only
the first word.
Don thought it a good idea to use typewriter type for hexadecimal numbers, for instance
when saying '3F represents 63'. But leave the '63' in normal type. This convention looks
appropriate and provides a kind of subliminal type-checking.


The words used in the documentation should match the words used in the formal program;
you will only confuse the reader by using two different terms for the same thing.
It's a good idea to develop the habit of putting your begins and ends inside the called
modules, not putting them in the calling module. That is, do it like this:
    if down = 4 then ⟨Punt⟩;

    ⟨Punt⟩ =
      begin snap;
      place;
      kick;
      end

Not like this:

    if down = 4 then
      begin ⟨Punt⟩;
      end

    ⟨Punt⟩ =
      snap;
      place;
      kick

Incidentally, appalling bugs will occur if we mix the two conventions!
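
Why mixing the two conventions is so dangerous becomes clear once module references are viewed as plain text substitution. The following Python sketch is only an illustration, not TANGLE itself: the function expand and the dictionaries are made up, and the only thing borrowed from WEB is the @<...@> way of writing a module name in a source file.

```python
import re

# A module reference is written @<Name@> in the source text.
MODULE_REF = re.compile(r"@<(.*?)@>")


def expand(text: str, modules: dict) -> str:
    """Replace every @<Name@> by the (recursively expanded) module body."""
    def substitute(match: re.Match) -> str:
        name = match.group(1).strip()
        return expand(modules[name], modules)
    return MODULE_REF.sub(substitute, text)


CALLER = "if down = 4 then @<Punt@>;"

# Recommended convention: the called module carries its own begin/end.
punt_with_begin_end = {"Punt": "begin snap; place; kick; end"}

# Mixed conventions: a bare caller AND a bare module body.
punt_bare = {"Punt": "snap; place; kick"}

if __name__ == "__main__":
    print(expand(CALLER, punt_with_begin_end))
    print(expand(CALLER, punt_bare))
```

In the second expansion only snap is governed by the if; place and kick are executed on every down, and nothing in the readable source hints at it.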


§12. Excerpts from class, October 23 [notes by PMR]
One of the chief aspects of WEB is to encourage better programming, not just better exposition
of programs. For example, many people say that around 25% of any piece of code
should be devoted to error handling and user guidance. But this will typically mean that
a subroutine might have 15 lines of 'what to do if the data is faulty' followed by one or
two lines of 'what to do in the normal course of events'. The subroutine then looks very
much like an error-handling routine. This fails to motivate the writer to do a good job;
his heart just isn't in the error handling. WEB provides a solution to this. The procedure
can have a single line near the beginning that says ⟨Check if the data is wrong 28⟩ and
points to another module. Thus the proper focus is maintained: In the main module we
have code devoted to handling the normal cases, and elsewhere we have all the error-case
instructions. The programmer never feels that he's writing a whole lot of stuff where he'd
really much rather be writing something else; in module 28, it feels right to do the best
error detection and recovery. Don showed us an example of this from his undergraduate
class in which a routine had two references of the form
if . . . then . . . else char_error
pointing to a very brief error-reporting module.
We looked at a program written by another student who had the temerity to include some
comments critical of WEB. Don struck back with the following:
It is good practice to use italics for the names of variables when they appear in
comments.
Let the variables in the module title correspond to the local parameters in the
module itself.
According to this student's comments, his algorithm uses 'tail recursion'. This is
an impressive phrase, helpful in the proper context; but unfortunately that is not
the kind of recursion his program uses.
However, Don did grant that his exposition was good and gave a nice intuition of the
functions of the modules.
We saw a second program by the same student. It had the usual sprinkling of "wicked
whiches"-'which's that should have been 'that's. The purpose of the program was to
"enforce" the triangle inequality on a table of data that specified the distances between
pairs of large cities in the US. Don commented here that his project (from which these
programs came) intends to publish interesting data sets so that researchers in different
places can replicate each other's results. He also observed that a program running on a
table of "real data," as here (the actual "official" distances between the cities in question)
is a lot more interesting than the same program running on "random data." Returning to
the nitty-gritty of the program, Don observed that the student had made a good choice of
variable names, for instance 'villain' for those parts of the data that were causing inconsistencies.
This fitted in nicely with the later exposition; he could talk about 'cut throats'
and so forth. (Don added that we nearly always find villainy pretty unamusing in real life,

[26 §12. LITERATE PROGRAMMING (2)]


but the word makes for a witty exposition in artificial life; the English language has lots
of vocabulary just waiting for such applications. )
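
The check at the heart of this student's program can be sketched in a few lines. This is only an illustration in Python, not the student's WEB program: the table layout (a nested dictionary), the name triangle_villains, and the mileages are all made up, and the sketch merely reports offending triples. How the original program "enforced" the inequality is not described in these notes.

```python
from itertools import permutations


def triangle_villains(dist: dict) -> list:
    """Return every (a, b, c) for which going a -> c directly is listed as
    longer than going a -> b -> c, i.e. dist[a][c] > dist[a][b] + dist[b][c]."""
    cities = sorted(dist)
    return [
        (a, b, c)
        for a, b, c in permutations(cities, 3)
        if dist[a][c] > dist[a][b] + dist[b][c]
    ]


# Hypothetical symmetric table; the A-C entry is deliberately too large.
TABLE = {
    "A": {"B": 100, "C": 250},
    "B": {"A": 100, "C": 100},
    "C": {"A": 250, "B": 100},
}

if __name__ == "__main__":
    for villain in triangle_villains(TABLE):
        print(villain)
```

Trying all ordered triples takes time proportional to the cube of the number of cities, which is tolerable for a table of large cities.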
Don wondered aloud why it is that people talk about "the nth and mth positions" (as this
student had) thereby reversing the natural (or at any rate alphabetical) order?
He also pointed to an issue that arises with the move from typewriters to computer
typesetting: the fact that we now distinguish between opening and closing quotes. We
saw an example where the student had written ”main program”. To add to the confusion,
different languages have different conventions for quotes; in German they appear like
this: „The Name of the Rose“. How to represent this in a standard ASCII file remains a
mystery.
Back to the triangle inequality. Don pointed out that one obvious check for bad data in
the distance table follows from the fact that the road distance can not be less than a Great
Circle route. ( "It could, if you had a tunnel" commented a New Yorker in the audience.)
The student had written a nice group of modules based on this fact, and it illustrated the
WEB facility of being able to put displayed equations into comments.
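
The great-circle check can be made concrete as follows. This Python sketch is an illustration under stated assumptions rather than a reconstruction of the student's modules: it uses the standard haversine formula on a spherical Earth, the radius constant is an assumed mean value, and the names great_circle_miles and looks_too_short are invented here.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean radius of a spherical Earth


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def looks_too_short(road_miles, lat1, lon1, lat2, lon2):
    """Flag a table entry whose road distance is below the great-circle bound."""
    return road_miles < great_circle_miles(lat1, lon1, lat2, lon2)


if __name__ == "__main__":
    # Two points on the equator, a quarter of the way around the globe.
    print(great_circle_miles(0.0, 0.0, 0.0, 90.0))
    print(looks_too_short(100.0, 0.0, 0.0, 0.0, 90.0))
```

A real table would need a little slack in the comparison, since published coordinates and mileages are rounded (and, as the New Yorker pointed out, tunnels exist).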
"So WEB effectively just does macro substitution?" asked another member of the class.
Exactly, said Don. In fact the macros he uses are not very general-they really allow only
one parameter. This means he doesn't need a complex parser, but in fact one can do a
great deal within this restriction. For instance, it is not difficult to simulate two-parameter
macros if we wish.
Someone in the class commented that it seemed a little strange to put variable declarations
in a different module from their use. Don said that this was OK as long as they are close
to their use, but large procedures should have their local variables "distributed" as the
exposition proceeds.
Don recalled that older versions of Algol allowed you to declare a variable in the middle
of a block. This fits in nicely with the WEB philosophy, but unfortunately cannot be done
in modern Pascal. Indeed, Don became painfully aware of the limitations of Pascal for
system programming when he was writing WEB-you can't have an array of file names, for
example. He got around them, though, with macros.
One example of improving Pascal via macros is to define (in WEB)
    string_type(#) == packed array [1..#] of char
so that you can say things like
    name_code: string_type(2)
when declaring a two-letter string variable.
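
A one-parameter macro of this kind needs very little machinery, which is presumably why no complex parser is required. The Python sketch below is only an illustration of the idea, not WEB's implementation: the function expand_macros is invented here, and it assumes the argument contains no nested parentheses.

```python
import re


def expand_macros(text: str, macros: dict) -> str:
    """Expand each name(arg) whose name is a key of `macros`,
    substituting arg for the '#' in the macro body."""
    if not macros:
        return text
    # Longest names first so that a macro name cannot shadow a longer one.
    names = "|".join(re.escape(n) for n in sorted(macros, key=len, reverse=True))
    pattern = re.compile(r"\b(" + names + r")\s*\(([^()]*)\)")

    def substitute(match: re.Match) -> str:
        body = macros[match.group(1)]
        return body.replace("#", match.group(2).strip())

    return pattern.sub(substitute, text)


MACROS = {"string_type": "packed array [1..#] of char"}

if __name__ == "__main__":
    print(expand_macros("name_code: string_type(2);", MACROS))
```

The declaration the Pascal compiler finally sees contains the full packed array type, while the reader sees only string_type(2).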
At this point, prompted by a note from Tracy, Don announced that 23 copies of the
Handbook for Scholars had arrived in the Bookstore, with more to come. A resounding
cheer echoed throughout Terman.
Don commented that the student had called a certain variable 'scan'. Since this
variable was essentially a place marker, Don thought that a noun would be much better
than a verb: 'place', perhaps. Let the function determine the part of speech; think of it
as a kind of Truth in Naming. Verbs are for procedures, not data.
The last student had written a program to handle graph structures based on encounters
between the characters in novels. He too had made the "quote mistake". The student gave
a nice characterization of the input and output of the program, using the typewriter font
to illustrate data as it appears in a file.
This student also showed a bit of inconsistency in the use of 'it' and 'we' as the personification
of his program. We seem to be finding the same old faults over and over now, Don
said, so perhaps that indicates that we have found them all. Discuss.

§13. Excerpts from class, October 26 [notes by PMR]


We moved on to the subject of user manuals. Don was disappointed that nobody had
responded to his request in a previous lecture to give him glaring examples of bad ones;
either they are being much better written these days or we hadn't taken him seriously. So
instead, Don produced mini-sized user manuals written by CS graduate students for his
class CS 304 earlier this year. The students had had to tackle five weird and wonderful
problems in ten weeks; one of the problems had been to design and implement some
software and to write a one-page user manual for the 'Digiflash' display system. This is
the kind of thing you see in Times Square, and increasingly in bars and post offices, in
which news and advertising flows across a sort of dot-matrix screen. In this case, the
screen was to be a simple array of 8 by 256 pixels. The students had only two weeks in
which to write the system and manual, which were then subject to the ultimate test, the
truly Naïve User. The idea was that the user would need no understanding of computers
or of writing, but should still be able to use the system to produce a variety of visual
effects. The students divided themselves into four teams and so we saw four solutions to
the problem.
A common failing was that terminology that seemed perfectly transparent meant nothing to
Don's wife: "Menu" and "Scrolling" for example. Such terms are so familiar to CS people,
it never occurs to them that these are actually technical terms.
Don went through the solutions in ascending order of competence. The class reaction to
this discussion might almost have led one to believe that some of the authors were sitting
amongst us.
Don digressed on the subject of 'i.e. '. Is it formal, he asked, or is it part of the language?
He confessed that he was considering taking all the 'i.e.'s out of his new book. One thing
he does know: You should always put a comma after 'i.e.'. (Except in this instance.)
The first solution could be described as a very logical approach, almost an archetypically CS
solution. The manual was essentially a hierarchy of definitions. The writers talked about
MESSAGES (or MESSAGEs-consistency was not their watchword) when they wanted to
say: 'here are objects that have a special meaning for us and whose definition you ought to
know'. But, said Don, formal definitions are not the way to explain something to a novice.
This write-up apparently thought the phrase LEFT-INDENTED to be self-explanatory,
although it meant 'flush with the left margin'. (Left unindented?) The user was prompted

[28 §13. USER MANUALS]


to enter 'Type of message (1-6):' Why should there be numeric types? Sentences like
"And now you should ENTER the data" do nothing to help the user relax-the capitals
look too much like DANGER SIGNS.
Don's wife commented that one thing she always needs to know is "How do I get out of
a mess if I do something wrong?" Don said that this is something manuals almost never
explain-perhaps it never occurs to their authors that somebody will eventually want to
stop playing with their program. The solution we were looking at did have a one-line
description of how to EXIT, but Don said even this is jargon.
The second solution was Digiflash TM. It had a good introductory and motivational paragraph,
albeit with a whole crowd of 'which's that should have been 'that's. Unfortunately
it claimed that the system was very easy to use and understand-a claim that can rebound
by making the user feel stupid. There was a major flaw in the program in that one was ex­
pected to hit OPTION-B to enter 'bold' mode, and then OPTION-B again to leave it. Don
thought it would be far more natural to type OPTION-N ( for 'normal') instead. Option-V
was "reverse video"-another jargon word, and why wasn't this OPTION-R? There were
some cute options though: 'M' for 'slowly materializing' text, and an assortment of small
animal logos that could be made to appear.
The third solution was the DiJKSTra system, so named to keep it sufficiently Dutch
(obscure in-joke, please ignore). The authors had a nice use of the phrase 'flashing bar'
instead of the more technical 'cursor' (though for some reason they still felt impelled to
define the latter as the former), and likewise they said 'hit return' instead of 'enter' (or
worse, ENTER). They also kept their sentences nice and short. Another good idea was
that the manual invited the user to type '?' to get an online demonstration, thus sparing
us a painful description of such arcane concepts as boldface italic reverse video fade-in
mode and incidentally helping to keep the manual terse. If a picture is truly worth a
thousand words, said somebody, then an animated demonstration must be worth at least
a paragraph. One problem with this system was that the user is prompted for five or so
parameters every time he enters a new line, and the defaults are fixed. Wouldn't it be
better, asked Don, to default to the style used for the preceding line?
The last solution (though not even typeset, much less TeXed) Don declared to be the best.
There was a good overview and a step-by-step description of the system; very friendly
looking. Crisp sentences. Easy to skim. Helpful redundancy and diagrams. Don said
that there's really nothing much you can do about the reader who insists on starting at
a random point in the middle of a text. When he surreptitiously watches people looking
at his books in the bookstore, he notices that they always start in the middle somewhere,
not at the preface, where he wanted them to begin.
There was a good use of a symbol in the text to indicate the control key, and likewise
diagrams of the keyboard to explain which keys to use for left, right, next message, previous
message, etc. It was also good to emphasize that the control key must actually be held
down while another key is typed (that is, they are not simply typed successively). Perhaps
the main flaw was that the user was expected to realize that 'up' meant, in effect, 'go to
the previous message'; and 'down', 'go to the next message'. To those unfamiliar with
full-screen editors, this mightn't be obvious. There was a nice use of icons to describe
scrolling up, down, left, or right though. One obscurity was the advice

| THIS IS VISIBLE HERE | BUT NOT HERE

Don declared that he didn't know what this was supposed to mean; it would be a lot better
to say 'Extra long messages can be seen if you make them move'.
It's good to have plenty of comments like 'Good luck!' and 'Enjoy!' scattered here and
there. But Don thought the phrase 'this system has been carefully redesigned not to bite'
hardly reassuring.

§14. Excerpts from class, October 26 (continued) [notes by PMR]


In Don's mailbox today he found galley proofs from the ACM, to be corrected and returned
within 48 hours of this time two weeks ago. Unfazed by this injunction he went over the
text with us. The Algol programs seemed to be laid out properly. There were occasional
cryptic marginal notes: 'Bad proof, Camera copy OK'. He took this to mean that his copy
was made by a laser printer instead of a phototypesetting machine. We learned that 'Au'
means not gold but 'author' in the copy-editing world. The copy editor had substituted
'cleverer' for Don's 'more clever', citing Fowler. Don sighed and recalled the occasion when
Scientific American had replaced his 'more common' with 'commoner'. It was noticeable
that the copy editor was not going to change anything without Don's specific approval,
not even removing the first 'of' in '... several possible of values of the variable n ...'. Don
told us that at the moment all papers are re-typed by the publishers, except for one or
two AI journals that have used TeX for several years. But next year a math journal will
be adopting a policy in which the author's text is manipulated electronically throughout
the whole process. This should speed publication and reduce errors and costs.
Some of the notes in the galley were signed 'Ptr', that is 'printer', and asked 'OK?'. Don
answers affirmatively by circling the 'OK'. At one point he was asked to sanction the
insertion of a whole new sentence. Apparently he had made reference to Figure 14 before
Figure 13, and his approval was sought to make an extra comment first about 'Figures
13-16'. (The extra comment was wrong but fixable.)
The publishers also insisted on more details in his bibliography. They wanted to know,
for example, exactly where and when a conference had taken place. Someone in the class
pointed out that Mary-Claire van Leunen recommends omitting the location of conferences.
Don replied that libraries often nowadays index conferences by city for those poor souls who
can remember nothing else about them; so such information was useful. He observed that
people have a great tendency to copy citation information blindly into their own papers,
and so errors propagate unchecked. When Elwyn Berlekamp wrote his book on coding
theory, he found that nearly half the information in bibliographies of papers was wrong!
Don wrapped up the galley proof discussion by showing us a few tables of (somewhat)
standardized proof-readers' symbols.

[30 §14. GALLEY PROOFS]


§15. Excerpts from class, October 30 [notes by PMR]

Today Don spoke about the refereeing process. A paper submitted to an academic journal
is usually passed to one or more referees by the editor of the journal. Each referee is
intended to be an expert in the relevant field, and thus in a position to tell the editor
whether or not the paper merits publication. Don pointed out that many of us will one
day find our papers being subject to just this scrutiny; and some of us will certainly be
asked to assess other people's papers ourselves.
Don talked about his now famous research on "The Toilet Paper Problem." This was first
published in the MONTHLY, and as Don pointed out to the Editor in his cover letter, many
of its readers probably keep their copies in the bathroom anyway. The editor (Halmos)
replied a little gravely that "jokes are dangerous in our journal", and asked Don to think
twice about the scatological references. Don did agree to change the section names (which
originally continued the pun with such headings as 'An absorbing barrier', 'A process
of elimination', and 'Residues') to innocuous equivalents, but kept the title intact. In
justification of this, Don pointed out to the editor that two talks had already been given on
his results under this title, and that the material had been widely circulated and discussed.
"Your toilet paper is accepted" replied Halmos. Don confessed that he still has occasional
doubts when he catches sight of the title amongst his papers, but the deed is done now.
Still, it did get reasonably good reviews, even in Russia.
Don showed us an article entitled 'Rules for Referees' by Forscher, published in Science
(October 15, 1965). These rules constitute a rather traditional view, Don said, and emphasize
the legal rights and responsibilities of all concerned. Don thought that this view seems
oriented more to the advancement of careers than to that of science as such; the right
reason to publish is to build upon the results of others and provide a foundation for future
research. It is a sad truth, said Don, that an editor can all too easily find himself spending
a great deal of time dealing with those authors whose papers don't merit publication, for
it is usually very hard to convince them of the fact. Rebuttals are followed by counter-rebuttals,
and so on. But fortunately this doesn't happen so often that the whole business
of science gets bogged down.
The referee is conventionally regarded as a sort of "expert witness," whose task is to tell
the editor whether the paper deserves to be published or not. The first criterion should
be originality; is the material presented a genuine advance on previous work?
Don urged referees to see their primary responsibility as being to authors and readers, not
just to editors. Don himself decided long ago that he would put more of his efforts into
refereeing papers before their publication than into reviewing published papers. Don hoped
that he could thus do his bit to encourage high standards of writing in Computer Science
and help the field win respect. These days there are more good people to go around, both
in refereeing and reviewing.
In the 1960's Don circulated a list of 'Hints to referees' to try to encourage good practice.
He wanted to show us the list, but not a copy can be found. Don has written to some of
the people to whom he sent it, so it is possible that a copy will turn up before the end of
the quarter.

[§15. REFEREEING (1)]

Don disagreed with our guest speaker, Herb Wilf, who had said that he would tolerate
more stylistic lapses in the Journal of Algorithms than in the MONTHLY. Authors, thought
Don, should always be encouraged to do better; he could recall only a single occasion when,
as a referee or editor, he could recommend no improvements at all. (The author in this
case was George Collins writing for the ACM journal.) Let us publish journals to be proud
of, he said. This was sadly not true of Computer Science in the early 60s. Some published
results were just plain wrong; or a correct result was incorrectly proved; or a paper simply
contained no results at all! Contrast this state of affairs, said Don, to the math journals
that were published in the 20s and 30s: leafing through them at random we see a host
of familiar names and theorems, because so much of what was written then was polished,
significant, and worth reprinting in textbooks. The same could not be said of today's
efforts; perhaps we have grown increasingly tolerant of substandard work.

Referees should try to be teachers, said Don. The author you criticize today will be writing
another paper tomorrow, so try to help him improve his writing. Unfortunately, referees
will often be over-critical and make quite tasteless comments on papers, knowing that
they do so under a cloak of anonymity. This only angers the author and he learns nothing.
Try to supply constructive criticism, Don urged. These human issues are not discussed in
Forscher's 'Rules'.

In addition, the referee can contribute to the technical quality of a paper by giving references
to related work of which the author was apparently unaware, or improving the
results. Don himself has contributed results anonymously to papers; more than one author
has had to add a footnote: "My thanks to the referee for Theorems 4, 5, and 6."
Don was always pleased to feel that by doing this the image of the journal was improved.
A journal should be seen as a source of wisdom, so let us be cooperative toward this end,
not legalistic.

How should one choose a journal to which to submit a paper? Don thought the answer is
to look for the one with the best referees, not the one with the least critical editor. After
all, an author presumably wants to know whether he has really made a contribution to his
field. So find a journal that has handled papers on related subjects.

Someone asked whether the letters that appear in journals are also refereed. Don said
that sometimes they are, sometimes not. There is often nothing to distinguish letters from
short papers.

Some journals do not use referees at all. Their readership must be willing to wade through
a great deal of nonsense. The ACM did at one time have plans to publish an unrefereed
journal, but to Don's relief that never came to fruition.

At this point Don confessed to a sneaky trick he had pulled way back in the 60s. At that
time he had just begun to edit the programming languages material for the Communications
of the ACM and the Journal of the ACM. He had no way of knowing which of his
referees were any good, so in an effort to calibrate them he sent them all a copy of the same paper
and solicited their opinions. Don had already refereed the paper himself, of course, and
found it a very badly written exposition of a very interesting algorithm (due to someone
besides the author). As such, it was certainly worthy of the referees' study.

We looked at some of the results. One commentator simply went through line by line, listing
his complaints point by point. Another made much more general comments: "A paper
with this title should contain (1) a complete algorithm; (2) a proof or at least a convincing
explanation of correctness; (3) a statement of limitations on the algorithm's applicability.
None of these can be found here." A third said that the paper contained little that was
new, and supplied a substantial bibliography for the author to go away and study. The next
referee liked the algorithm and recommended the paper for publication. Don was surprised;
he had mistakenly thought that this referee had originally invented the algorithm himself!
Another critic dismissed the paper as 'incredibly poorly written'. Another commented it
was not a paper to be read, but rather a puzzle to be solved.
Don told us that as a result of his experiment, the algorithm actually became quite well-known.
On one occasion Don ripped into a paper with a long report on its failings, and was later
told by the author that those constructive comments had changed his life: The author had
resolved that from then on he was going to study writing and give a lot of attention to
exposition. This nameless individual went on to become a renowned professor at a great
(but here equally nameless) university, and an editor of a fine journal.
In answer to a question, Don said that if the content of a paper was obviously bad, he
would not spend time reviewing the grammar. But in studying a paper that really has
something to say, then he would also try to ensure that it was said as well as possible.
Don showed us some referees' reports on one of his recent papers. The editor had told him
that these were 'mostly positive'-in fact two were in favour and one against. The referees
in this case had been asked to answer a specific list of questions about his paper-Don
said that this tedious format might at least cause a referee to consider issues he might
otherwise have forgotten about. The referees did agree that Don hadn't made enough
reference to earlier work in the subject. This didn't surprise him; the paper was his first
venture into an unfamiliar field. The referees were helpful enough to comment now and
then that they had particularly enjoyed certain sections, and they provided a whole slew
of references to other work-references that Don said had led to some new ideas. They
were also able to point out subtle technical errors; Don had to write a program to convince
himself that one in particular of these criticisms was valid. Finally, we were amused to
see that the referees were asked to assign an overall rating to the paper by checking one
of a series of boxes, ranging from (as the most fulsome praise) "accept", down through
"accept with major modifications" and "accept with minor changes" to "Reject: submit
to _______". When checking this last box (and most damning indictment), the referee was
asked to suggest a less prestigious journal that might publish such inferior work. By such a
downward filtering, even the most appalling paper stands some chance of finding its place
in the pages of what Dijkstra has characterised as a "Write-Only Journal". With four new
scientific papers being published every minute throughout the world, we can rest assured
that many do so.

§16. Excerpts from class, November 2 [notes by TLL]
Today's handout, "Hints for Referees" by D. Knuth (see §17 below), could have been
subtitled "Ask and ye shall receive." Last Friday Don mentioned in class that he could
find no copy of this document, but when he returned to his office immediately after class
he found it sitting on his desk. (To be truthful, he thinks this copy has gone through a
few revisions since it left his hands; he no longer recognizes the style of all the comments.)
Before demonstrating to us how highly he esteems referees and the lengths to which he
will go in order to get referees, Don told us to note an important date on our calendar:
On Wednesday, November 18, we are to turn in the first drafts of our Term Papers ("The
closer to the final version, the better").
The identities of the referees for a journal paper are usually hidden from the author. Is
the identity of the author ever hidden from the referees? In some few journals, yes. Don
is well aware that the name written just below the title of a paper can strongly affect the
reader's reaction, so he submitted a journal paper using the sobriquet Ursula N. Owen.
(Those of us who have read Agatha Christie's And Then There Were None realize that his
use of the name U. N. Owen is a pleasant allusion.)
Don doesn't always resort to pseudonyms, but neither does he always get his papers refereed.
On occasion he has recruited his own referees when he found out that his target
publication was supplying none. As an example, his paper on Literate Programming for
the British Computer Journal generated no referee reports (and no feedback of any kind);
it went right into print.
Don repeatedly stated how invaluable he found "feedback from a motivated reader." He
showed us the comments that "Ursula" got on her paper, and they were pertinent in more
than one way. The referee found typographic errors and suggested notation changes, as
well as finding errors where there were none present. The last set of comments were more
important than they might at first seem because they pointed out where Don's presentation
was misleading or overly subtle.
In another example, the referee significantly improved one of the theorems while remaining
anonymous. Instead of being content with an acknowledgement to an anonymous
contributor, such a referee could be jealous and publish his own competing paper.
In contrast to such substantive contributions, Don showed us another example with suggestions
that he called "facile generalizations" (terminology attributed to Polya): generalizations
that are merely mechanical manipulations of a given argument without creating
new insight.
Don says that refereeing is a "cooperative effort -- a correspondence between tens of thousands
of world authorities," and he is perfectly willing to exploit the system by letting
referees improve his papers as he helps with theirs.
He showed us a series of letters passing between himself and the Journal of Number Theory.
He had produced a result that seemed novel (could not be found by exploring the standard
pathways in the Math Library), but since Number Theory is not his field of expertise, Don
was unwilling to claim that the result was not a duplication. He told this to the editors

[34 §16. REFEREEING (2)]


of the journal and asked for feedback. ("I put in a lot of time reviewing other people's
papers. This is my chance to get some of that time back.")
The referee reports on that paper found references that Don "couldn't have found in a
million years." The results were similar but not identical, so the referee offered to check
with a famous Russian expert. As Don was availing himself of this offer, someone else was
publishing on the same subject. ("You have to decide, do you want speedy publishing or
rigorous checking?")
Finally, he showed us two examples that dealt with ambiguity. In the first, he and David
Fuchs had written a paper entitled "Optimal Font Caching." One of the referees pointed
out that this paper could be about the caching of optimal fonts, or the best of all possible
caching mechanisms for fonts. An analogous title "Common Sense Amplifiers" was cited.
(Don and Dave solved this problem by changing the title to "Optimal Prepaging and Font
Caching.") In the second, he had to cope with the IEEE Journal on Coding Theory's
penchant for writing out the words 'one' and 'zero' for the symbols '1' and '0'. Since 'one'
is an English pronoun, he was forced to use the word 'unity' in one place to make the text
unambiguous.

Hints for referees

(please keep in your file)

D. Knuth

In a relatively new field such as computing there is bound to be a lot of
trash published since there are too few people available to recognize the poor
quality of much of the material. But this discourages people in the computing
profession from reading the literature and causes a poor image for the profession
in the eyes of others. The only way to prevent this is to have a strong
refereeing system. Although the job of refereeing is not simple, it is an important
responsibility, nearly the most important thing anyone could be doing
for the field of computer science.

Papers generally will fall into the following categories:

1. Publish essentially as is; the only changes necessary are very simple
typographical matters which can be changed by the editor.

2. Publish after author's minor revision; the referee suggests points
which must be changed before the paper meets the standards for publication.

3. Publish only if the author makes major revisions. (Perhaps the paper
is much too long or is badly written. The revised paper will be refereed again.)

4. Reject. (There is nothing salvageable.)

The goals of a referee are to keep the quality of publication as high as
possible and also to help the author to produce better papers in the future.
Your referee's report should be designed to give the author the maximum benefit,
yet not compromise on quality. Try to get every author to put out the best paper
he is capable of writing; a paper rarely falls in category 1 above. Never put a
paper in category 1, if you feel the author can do better, even if the paper as
it stands is reasonably good! A paper should only be put into category 3, if the
substance of the paper is considered significant enough to warrant the additional
amount of labor to rewrite and reconsider the paper.

To judge the publishability of the paper you certainly know what is good and
what is bad but the following brief list is included here anyway.

(a) The paper should contribute to the state of the art and/or should be a
good expository paper. If it is purely expository it should be clearly designated
as such.

(b) All technical material must be accurate (e.g. no incorrect equations,
etc.). A referee should check this carefully.

(c) The article must be understandable, readable, and written in good
English style.

(d) The bibliography should be adequate.

It is tempting to postpone refereeing tasks by putting the paper aside for
a few days. But it takes no longer to do it today than it will in a week's time.
If you feel that you are for some reason unable to referee the paper, please return
it immediately. At any rate, the referee's report is expected in no more than
four weeks. Remember that the refereeing cycle is "critical path time" in the
publication process.

Return the manuscript to the editor; please don't mark it up. You should
submit the report in duplicate. Remember that one copy will be sent directly to
the author; it is up to you whether you want to mention your name on it or not.
If you desire, you may write an accompanying letter to the editor which of course
will not be passed on to the author. This letter, however, must not constitute
the referee's report.

[36 §17. HINTS FOR REFEREES]


§18. Excerpts from class, November 4 [notes by PMR]
Today, Don said, we are going to talk about the use of pictures and illustrations in mathematical
writing, and about the problem of "getting across the feeling of complicated
algorithms."
But first, by popular demand, Don showed us his first publication. This was a description
of a system of weights and measures known as the Potrzebie System, which appeared in
the pages of MAD magazine in 1957. Any resemblance to the Metric System is purely
coincidental. It is an extremely natural and logical system, Don told us. For example, the
units of time were named after the editors of MAD (the new editors substituted their
own names). He felt there was also a need for new units of counting, and so coined the
MAD; 48 things constitute one MAD (or 49, a baker's MAD). Don didn't publish a better
illustrated work until The TeXbook, he claimed, nor another paid one until he wrote for
ACM Computing Surveys some 12 years later. MAD forked over no less than $25 for this
research paper, no mean sum thirty years ago. 'The Potrzebie System' still heads the list
of publications on his C.V.
MAD inexplicably declined Don's second article, "RUNCIBLE: Algorithmic Translation
on a Limited Computer," which was picked up by Communications of the ACM in 1959.
Perhaps this was because it contained what even Don admits is probably one of the worst
"spaghetti" flow charts ever drawn. Not only does the chart attempt to illustrate the
entire algorithm, but it contains an error (a misdirected arrow). The article included
a play-by-play account of the algorithm, which helped ameliorate the obscurity of the
chart. Back in those days, Don now admits, he didn't know any better. Likewise, full of
youthful enthusiasm at being able to communicate improvements on a previously published
algorithm (Don was a Junior then), he failed to mention his co-authors in the paper; Don
did the writing but other students contributed illustrations and most of the ideas of the
algorithm. At the time he had no notion there was academic prestige to be gained through
publication, Don confessed. This is, he said, a common mistake among young authors who
frequently overlook proper acknowledgements in their haste to get the news out. At the
other extreme, he recalled, Paul Erdős once cited a railroad car porter as a co-author.
Diagrams are good if they are kept small, said Don. As an example of a useful one that
is not small, he showed us a fold-out syntax chart for a slightly extended version of the
Algol 60 language. It does convey quite a good impression of what the language is, and
gives computer scientists something to hang on the wall where chemists put their Periodic
Tables.
Don's "Programming Pearls" article came up again. He had ended that paper with the
observation that the only fair test of his WEB system would be this: Someone should provide
a challenge problem, and Don would use WEB to write an ultra-elegant solution to it. Jon
Bentley responded to this challenge; he devised such a problem and invited Don to submit
his solution for publication. Holding Don to his claim that WEB programs should be works
of literature, Jon then published the solution along with a literary critique. In this review
Jon commented that Don could have eased the exposition of his data structure with a
suitable diagram. Don agreed that this would have helped the reader get a handle on
it (he had described the data structure in words only). He told us that diagrams were

[§18. ILLUSTRATIONS (1) 37]


actually quite easy to do in WEB, a claim that was greeted with a certain skeptical laughter
from the class (all doubtless recalling hours wasted trying to get tables just right).
Referring again to his 'optimal prepaging' paper (which included a diagram in which two
approximately diagonal lines crawled across the page, touching occasionally to indicate a
page fault) Don told us that the referee had complained that the figure was too detailed.
Don disagreed with this, saying that the detail was there for those who want to see it, but
could easily be ignored by those who don't. Don confessed that he always has been very
concerned with the minutiae of his subject, and seldom thought any detail too trifling to
bother with.
Don discussed a paper he had written with Michael Plass on TeX's algorithm for placing
line-breaks in a paragraph [Software--Practice & Experience 11 (1981), 1119-1184]. The
main difficulty writing the paper was: How to describe the problem and the new algorithm?
First of all, they chose a paragraph from one of Grimm's Fairy Tales as "test data" with
which to illustrate the process. As Don remarked once before, it is better to use "real" data
than "sample data" that have in fact been cooked up solely to use as an example. [Grimm's
Fairy Tales, along with the text of Harold and Maude, are kept on-line on SAIL (an ancient
and eccentric CSD computer).] Corresponding to each line of any right-and-left-justified
paragraph is a real number, positive or negative, indicating the degree to which the line
had to be stretched or compressed to fit the space exactly. In his paper, Don prints these
numbers in a column beside his typeset paragraph. Don used a couple of lines of the paper
itself to show how bad it looks if these adjustments are too extreme (and of course had to
tell the printers that this was a deliberate mistake, lest they "correct" it).
Don outlined three basic algorithms: first fit (which essentially packs the text as tightly as
possible one line at a time); best fit (which can loosen it up if this works better, but still
works line by line); and optimum fit (optimal in the sense that it minimizes the sum of the
"demerits" earned by the various distortions of each line, taken over the paragraph as a
whole). To describe this last algorithm, Don drew a diagram. It is essentially a graph, each
node on level p corresponding to a different word after which the pth line might be broken.
Edges run between nodes on successive levels, and are labelled by the demerits scored by
the line of text they define. The problem of finding an optimal fit thus reduces to finding
a least-cost path from the top to the bottom node; well-understood search techniques can
be used for this. Don commented that certain "demerit-cutoffs" were used to limit the
number of nodes on each level and thus speed the algorithm. This means that a solution
in which one very distorted line permits all the rest to be displayed perfectly might be
missed.
If the above account is opaque, it only goes to show why diagrams can be so useful!
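The least-cost-path view of optimum fit described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the algorithm of the paper itself: the demerit formula (squared leftover space, with the last line free), the use of character counts for width, and the names first_fit, optimum_fit, demerits and total_demerits are all assumptions made for the example. The demerit cutoffs are not modelled, and every word is assumed to fit on a line by itself.

```python
def demerits(line, width, is_last):
    """Hypothetical cost of one line: squared leftover space (last line is free)."""
    slack = width - len(" ".join(line))
    if slack < 0:
        return float("inf")  # the line does not fit at all
    return 0 if is_last else slack ** 2


def first_fit(words, width):
    """Greedy method: pack each line as tightly as possible, one line at a time."""
    lines, current = [], []
    for word in words:
        trial = current + [word]
        if current and len(" ".join(trial)) > width:
            lines.append(current)
            current = [word]
        else:
            current = trial
    if current:
        lines.append(current)
    return lines


def optimum_fit(words, width):
    """Least-cost path over possible breakpoints, found by dynamic programming."""
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[j]: cheapest way to set words[:j]
    prev = [0] * (n + 1)             # prev[j]: where the line ending at j starts
    best[0] = 0
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + demerits(words[i:j], width, j == n)
            if cost < best[j]:
                best[j], prev[j] = cost, i
    lines, j = [], n
    while j > 0:                     # walk the cheapest path backwards
        lines.append(words[prev[j]:j])
        j = prev[j]
    return lines[::-1]


def total_demerits(lines, width):
    """Sum of the per-line demerits over the whole paragraph."""
    last = len(lines) - 1
    return sum(demerits(line, width, k == last) for k, line in enumerate(lines))
```

With the words "aaa bb cc ddddd" and a width of 6, first_fit produces the lines "aaa bb", "cc", "ddddd" for 16 demerits under this toy cost, while optimum_fit accepts a looser first line ("aaa", "bb cc", "ddddd") and scores 10, which is the kind of paragraph-wide trade-off the text describes.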
The article includes histograms to illustrate how frequently TeX generates more-or-less
distorted lines of text. As he explained, this was biased by the fact that he would usually
re-write any particularly ugly paragraph. A second histogram confirmed that the text was
considerably more distorted when it hadn't been hand-crafted to the line-width that TeX
was generating, yet the new algorithm was significantly better than Brands X and Y.
Finally, we saw an old Bible whose printers were so keen to fill out the page width that
they inserted strings of o's to fill up any gaps.
Don found many illustrative illustrations in the book The Visual Display of Quantitative
Information by Tufte. He also recommended How to Lie with Statistics by Huff, which
advises (for example) that if you would impress your populace with the dazzling success
of the Five-Year Plan in increasing wheat production by 17%, then draw two sacks, the
first 6 cm and the second 7 cm tall. The perceived increase, of course, corresponds to the
apparent volumes of the sacks, and 7³ is about 59% larger than 6³ . . . .
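The arithmetic behind the two-sacks trick is easy to check. The short sketch below (the helper name percent_increase is made up for this illustration) compares the honest increase in height with the increase in apparent volume, assuming the sacks are scaled equally in all three dimensions.

```python
def percent_increase(old, new):
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1) * 100


height_gain = percent_increase(6, 7)            # the honest figure, about 17%
volume_gain = percent_increase(6 ** 3, 7 ** 3)  # what the eye perceives, just under 59%

print(f"height: {height_gain:.1f}%  volume: {volume_gain:.1f}%")
```

A 17% change in height is thus read by the eye as a change more than three times as large.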
Don referred to Terry Winograd's book Language as a Cognitive Process. Algorithms for
parsing English sentences are there illustrated as charts defining augmented transition
networks, or ATNs: nodes correspond to internal states, edges are transitions between states
and correspond to individual words. Winograd also has a nice use of nested diagrams
(boxes within boxes) to replace the more traditional tree diagrams.
We saw a scattergram of smiley-faces of somewhat indeterminate significance; a wit in
SITN projected Don's amongst them. The idea is that several dimensions of numeric
data can be used to control features on these faces; humans are supposedly wired to read
nuances in facial expressions quite easily.
Don showed us a table from his Art of Computer Programming that listed the many,
many states of the Caltech elevator. He said he wished that he'd been able to dream up a
diagram to capture that example more neatly: A listing of events is the best way he knows
to convey the essential features of asynchronous processes.
The third Volume of this tome does contain a large fold-out illustration comparing the
performances of various sort-on-tapes algorithms. Certain subtleties arise from overlaps,
rewinds, and buffering that tend to elude conventional algorithmic analysis. Don's diagram
neatly captures these, and clearly shows that certain sophisticated algorithms (one was
even patented by its author) are in fact slower than traditional methods. Unanticipated
rewind times can cause significant slow-downs, and the chart shows why.



§19. Excerpts from class, November 6 [notes by TLL]
We spent the first half of class examining the solutions to a homework assignment (see §20
below). Don says that the solutions were surprisingly good (see §21).
One of the proofs described in that section contains illustrations in four colors. Don says
that color can be used effectively in talks, but usually not in papers (for that matter,
Leslie Lamport says that proofs should never be presented in talks, but only in papers).
Technical illustrations, even without four colors, cause no end of trouble: Don says that
the amount of work involved in preparing a paper for publication is proportional to the
cube of the number of illustrations. But they are indispensable in many cases.
Don showed us several of the illustrations, charts, and tables from The Art of Computer
Programming, Volume 3, and recounted the difficulties in choosing clear methods of
presenting his ideas. He also mentioned some technical and artistic problems that he had with
an illustration: At what angle should the truncated octahedron on page 13 be displayed?
His books contain some numerical tables ("which are sometimes thought to be
unenlightening"); Don says that they can sometimes present ideas that can't be demonstrated
graphically (such as numbers oscillating about 2 with period 2π, page 41). Diagrams with
accompanying text are also used. Don made sure that the final text was arranged opposite
the diagrams to which it refers.
The book contains a running example of how 16 particular numbers are sorted by dozens
of different algorithms. Each algorithm leads to a different graphical presentation of the
sorting activities on those numbers (pages 77, 82, 84, 97, 98, 106, 110, 113, 115, 124, 140,
143, 147, 151, 161, 165, 166, 172, 175, 205, 251, 253, 254, 359).

§20. A Homework Problem


The Appendix to Gillman's book takes a paper that has horrible notation and simplifies
it greatly. Your assignment is to take Gillman's simplification and produce something
simpler yet. Aim for notation that needs no double subscripts or subscripted superscripts.
This assignment will be graded! Please take time to do your best.
Here is a statement of Gillman's simplification. This is your starting point. What is the
best way to present Sierpinski's theorem?

Lemma. There is a one-to-one correspondence between the set of all real numbers $\alpha$ and
the set of all pairs $((n_k), (t_k))$, where $(n_k)_{k\ge 1}$ is an increasing sequence of positive integers
and $(t_k)_{k\ge 1}$ is a sequence of real numbers.

Notation. The sequences $(n_k)$ and $(t_k)$ corresponding to $\alpha$ are called $(n_k^\alpha)$ and $(t_k^\alpha)$.
The set of real numbers is called $R$.



Theorem. Assume that $(A_\alpha)_{\alpha\in R}$ is a family of countably infinite subsets of $R$ such that,
for $\alpha \ne \beta$, either $\alpha \in A_\beta$ or $\beta \in A_\alpha$. Then there is a sequence of functions $f_n : R \to R$
such that, if $S$ is any uncountable subset of $R$, we have $f_n(S) = R$ for all but finitely
many $f_n$.

Proof. Let the countable set $A_\alpha$ consist of the real numbers

$$\alpha_1,\ \alpha_2,\ \alpha_3,\ \ldots\,.$$

If $\alpha$ is any real number, define an increasing sequence of positive integers $(l_k^\alpha)$ by letting
$l_1^\alpha = n_1^{\alpha_1}$ and then, after $l_{k-1}^\alpha$ has been defined, letting $l_k^\alpha$ be the least integer in the
sequence $(n_1^{\alpha_k}, n_2^{\alpha_k}, \ldots)$ that is greater than $l_{k-1}^\alpha$.

Let $f_n$ be the function on real numbers defined by the rule

$$f_n(\alpha) = \begin{cases} t_n^{\alpha_k}, & \text{if } n = l_k^\alpha \text{ for some } k \ge 1; \\ 0, & \text{otherwise.} \end{cases}$$

We will show that the sequence of functions $f_n$ satisfies the theorem, by proving that any
set $S$ for which infinitely many $n$ have $f_n(S) \ne R$ must be countable.
Suppose, therefore, that $(n_k)$ is an increasing sequence of integers and that $(t_k)$ is a
sequence of real numbers such that

$$t_{n_k} \notin f_{n_k}(S) \qquad \text{for all } k \ge 1.$$

Let $t_j = 0$ if $j$ is not one of the numbers $\{n_1, n_2, \ldots\}$. By the lemma, there's a real
number $\beta$ such that $n_k = n_k^\beta$ and $t_k = t_k^\beta$ for all $k$.

Let $\alpha$ be any real number $\ne \beta$ such that $\alpha \notin A_\beta$. We will prove that $\alpha \notin S$; this will
prove the theorem, because all elements of $S$ must then lie in the countable set $A_\beta \cup \{\beta\}$.
By hypothesis, $\beta \in A_\alpha$. Hence we have $\beta = \alpha_k$ for some $k$. If we set $n = l_k^\alpha$, we know by
the definition of $f_n$ that

$$f_n(\alpha) = t_n^{\alpha_k} = t_n^\beta = t_n.$$

But the construction of $l_k^\alpha$ tells us that $n = n_j^{\alpha_k} = n_j^\beta = n_j$ for some $j$. Therefore

$$f_{n_j}(\alpha) = t_{n_j}.$$

We chose $t_{n_j} \notin f_{n_j}(S)$, hence $\alpha \notin S$. ∎



[Here are additional excerpts from TLL's classnotes for October 16, when the homework
problem was handed out:] The first thing that we learned in class today was that now
would be a good time to buy Leonard Gillman's book (Writing Mathematics Well). Not
only have several copies (finally) arrived at the bookstore, but Don has given us a homework
assignment straight out of the Appendix of this book.
The assignment (which is due on Friday, October 30th) is to take the "simplified version"
of the proof in Gillman's case study and to simplify it still further. The main simplifying
principle is to minimize subscripts and superscripts. When we are done, there should be
no subscripted subscripts and no subscripted superscripts. As Don said, "Try to recast
the proof so that the idea of the proof remains the same, but the proof gets shorter."
The original proof was written by Sierpinski. Don told us that Sierpinski was a great
mathematician who wrote several papers cited in Concrete Mathematics, from the year
1909 as well as 1959. But the notation in Sierpinski's original proof quoted by Gillman
was so complicated that it confused even him: His proof contained an error that was found
by another mathematician (after publication).
While the mathematics used in the proof is not trivial, it uses only functions and sets and
should be accessible to us. (This is not to say that it is immediately obvious.) Anyone who
is uncomfortable with what sets are, what it means for a set to be countable, or what a
one-to-one correspondence is, may need some help with this assignment. Don recommended
visiting the TAs during office hours as a good first step for those who feel they need help.
(It might also help to remember that Don says, "It's not necessary to understand the proof
completely in order to do this assignment.")
Don't worry if the hypothesis of the theorem seems pretty wild; it is pretty wild. It implies
the "Continuum Hypothesis." The Continuum Hypothesis states that there are no infinities
between the countably infinite (the cardinality of the integers) and the continuum (the
cardinality of the real numbers). From 1900 to 1960, the truth or falsity of the Continuum
Hypothesis was one of the most famous unsolved problems of mathematics; Sierpinski
published his paper as a step toward solving that problem. Paul Cohen, a Stanford professor,
eventually showed that both the Continuum Hypothesis and its negation are consistent
with standard set theory; so the hypothesis can be neither proved nor disproved.

[Here are additional excerpts from PMR's classnotes for October 23:] The homework
assignment is due a week from today, Don said; so do it as well as possible, and let's not
have any excuses!



§21. Solutions to the Homework Assignment
Most students pleased the instructor by handling this rather well. Either you already knew
a lot about writing, or you have learned something this quarter; in any case the exercise
seems to have been good practice.
Several answers or excerpts from answers are attached. First is Solution A, an unexpurgated
draft that was written by your instructor before handing out the assignment. The
main idea here is to "hold back" before enumerating the elements of a set; you can say that
$S$ is countable without writing $S = \{s_1, s_2, \ldots\}$. This solution also simplifies Sierpinski's
proof in minor ways. For example, it's not necessary to have the hypothesis $\alpha \ne \beta$ to
conclude that $\alpha \in A_\beta$ or $\beta \in A_\alpha$, because the existence of a family $A_\alpha$ that satisfies
Sierpinski's more complicated hypothesis is equivalent to the existence of a family that
satisfies the simplified one.
The grader objected to the last sentence in the first paragraph of my proof. He asks, "Has
some 'initialization' of $L_\alpha$ been omitted?" He apparently wants $k = 1$ to be singled out as
a special case, for more effective exposition. The sentence makes perfectly good sense to
me, but maybe there should be a concession to readers who are unaccustomed to empty
constraints.
Solution B introduces two nice techniques of a different kind. First, the lemma becomes
a sequence of ordered pairs instead of an ordered pair of sequences. Second, the need for
a notational correspondence between $\alpha$ and the corresponding sequence is avoided by just
using English words, saying that one is the counterpart of the other. In other words, we
can hold back in giving notations for a correspondence, since plain words are sufficient
(even better at times).
Solution B also "factors" the proof into two parts, one that describes a subgoal (the crucial
property that the functions $f_n$ will possess) and one that applies the coup de grace. Much
less must be kept in mind when you read a factored proof, because the two pieces have a
simple interface. Moreover, the reader is told that the proof is "essentially a diagonalization
technique"; this gives an extremely helpful orientation. It is no wonder that the grader
found Solution B easier to understand than Solution A.
Solution C is by another student who found words superior to notation in this case.
Solution D cannot be shown in full because it contains seven illustrations, some of which
are in four colors. But the excerpts that are shown do capture its expository flavor.
A combination of the ideas from all these solutions would lead to a truly perspicacious
proof of Sierpinski's theorem.



Solution A

Lemma. There is a one-to-one correspondence between the set of all real numbers $\alpha$ and
the set of all pairs $(N, T)$, where $N$ is a countable set of integers and $T$ is a sequence of
real numbers.

Notation. The set $N$ corresponding to $\alpha$ is called $N_\alpha$, and the sequence $T$ is called
$(\alpha_1, \alpha_2, \ldots)$. The set of real numbers is called $R$.

Theorem. Assume that there is an uncountable family of countable subsets $A_\alpha$, one for
each real number $\alpha$, with the property that either $\alpha \in A_\beta$ or $\beta \in A_\alpha$, for all real $\alpha$ and $\beta$.
Then there exists a countable family $F$ of functions $f : R \to R$ such that, if $S$ is any
uncountable subset of $R$, we have $f(S) = R$ for all but finitely many $f \in F$.

Proof. If $\alpha$ is any real number, we can construct a countable set of integers $L_\alpha$ as follows:
For $k = 1, 2, \ldots$, let $\beta$ be the $k$th element of $A_\alpha$, in some enumeration of this countable
set. Include in $L_\alpha$ any element of $N_\beta$ that's not already present in $L_\alpha$ because of the first
$k - 1$ elements of $A_\alpha$.

Now let $F = \{f_1, f_2, \ldots\}$ be the countable set of functions defined for all real $\alpha$ as follows:

$$f_n(\alpha) = \begin{cases} \beta_n, & \text{if } n \in L_\alpha \text{ and } n \text{ corresponds to } \beta \in A_\alpha; \\ 0, & \text{if } n \notin L_\alpha. \end{cases}$$

We will show that $F$ satisfies the theorem, by proving that any given set $S \subseteq R$ is countable
whenever $\{\, n \mid f_n(S) \ne R \,\}$ is infinite.
Let $S$ be a set such that $N = \{\, n \mid f_n(S) \ne R \,\}$ is infinite, and suppose that

$$t_n \notin f_n(S) \qquad \text{for all } n \in N.$$

Let $t_n = 0$ for $n \notin N$. By the lemma, there is a real number $\beta$ such that $N = N_\beta$ and
$(t_1, t_2, \ldots) = (\beta_1, \beta_2, \ldots)$.

Let $\alpha$ be any real number such that $\alpha \notin A_\beta$. We will prove that $\alpha \notin S$; this will prove
the theorem, because all elements of $S$ then must lie in the countable set $A_\beta$.
By hypothesis, $\beta \in A_\alpha$. Hence there is some $n \in L_\alpha$ corresponding to $\beta$, and $f_n(\alpha) = \beta_n$
by definition of $f_n$. Also $n \in N_\beta = N$, by the construction of $L_\alpha$. But $\beta_n = t_n \notin f_n(S)$,
so $\alpha$ cannot be in $S$. ∎



Solution B
Sierpinski's Theorem

Lemma. There is a one-to-one correspondence between the set of all real numbers $\alpha$ and
the set of all sequences of ordered pairs $((n_k, t_k))_{k\ge 1}$, where the first component $(n_k)$ is an
increasing sequence of positive integers and the second component $(t_k)$ is a sequence of
real numbers.

We shall call the sequence of ordered pairs corresponding to $\alpha$ the counterpart of $\alpha$, and
vice versa.

Theorem. Suppose that there exists a family of countably infinite subsets of the reals $R$,
denoted by $(A_\alpha)_{\alpha\in R}$, with the property that $\alpha \ne \beta$ implies either $\alpha \in A_\beta$ or $\beta \in A_\alpha$.
Then there is a sequence of functions $f_n : R \to R$ such that for any uncountable subset $S$
of $R$, we have $f_n(S) = R$ for all but finitely many $f_n$.

Proof: Using the existence of $(A_\alpha)_{\alpha\in R}$, we first construct a sequence of functions $f_n$ with
the property that for all $\alpha$, and for all $\beta \in A_\alpha$, there exists an ordered pair $(n, t)$ in the
counterpart of $\beta$ such that $f_n(\alpha) = t$. The construction is essentially a diagonalization
technique. For each $\alpha$, let the countable set $A_\alpha$ be enumerated as

$$\beta_1,\ \beta_2,\ \beta_3,\ \ldots\,.$$

Start with $(n_1, t_1)$ being the first ordered pair in the counterpart of $\beta_1$. Proceed inductively,
and let $(n_k, t_k)$ be the first ordered pair in the counterpart of $\beta_k$ such that $n_k > n_{k-1}$. This
selection can be made because the first component of the counterpart of $\beta_k$ is unbounded.
Thus, we have constructed a sequence of ordered pairs $((n_k, t_k))_{k\ge 1}$ with $n_k$ increasing and
each $(n_k, t_k)$ in the counterpart of $\beta_k$. Using this sequence, we then define the function $f_n$
by the rule

$$f_n(\alpha) = \begin{cases} t_k, & \text{if } n = n_k \text{ for some } k; \\ 0, & \text{otherwise.} \end{cases}$$

Indeed, $f_n$ is well-defined since $n_i \ne n_j$ for $i \ne j$. Moreover, the sequence $(f_n)$ has the
desired property that for every $\alpha$ and every $\beta$ in $A_\alpha$, there is an ordered pair $(n, t)$ in the
counterpart of $\beta$ such that $f_n(\alpha) = t$.

Now we show that any subset $S$ of $R$ for which infinitely many $n$ have $f_n(S) \ne R$ must be
countable, thereby proving the theorem. If $f_n(S) \ne R$ then there exists a real $t \notin f_n(S)$.
So if there are infinitely many $f_n$ such that $f_n(S) \ne R$, then there is a sequence of ordered
pairs $(n, t)$ with $n$ increasing and $t \notin f_n(S)$. Let the counterpart of this sequence of
ordered pairs be $\beta$. Thus, every ordered pair $(n, t)$ in the counterpart of $\beta$ has $t \notin f_n(S)$.
Now consider all real $\alpha \notin A_\beta \cup \{\beta\}$. By the hypothesis, we must have $\beta \in A_\alpha$. We
constructed the sequence $(f_n)$ in such a way that there is an ordered pair $(n, t)$ in the
counterpart of $\beta$ with $f_n(\alpha) = t$. But by the choice of $\beta$, we have $t \notin f_n(S)$. Hence,
$f_n(\alpha) = t \notin f_n(S)$ implies $\alpha \notin S$. Therefore $S$ must be a subset of $A_\beta \cup \{\beta\}$, a countable
set, implying that $S$ is also a countable set.



Solution C

. . . If the real number $\alpha$ corresponds to the pair $((n_k), (t_k))$, then we call $(n_k)_{k\ge 1}$ the
integer sequence of $\alpha$ and $(t_k)_{k\ge 1}$ the real sequence of $\alpha$.
. . . Proof. Note that a given real number $\alpha$ has associated with it both integer and
real sequences, as well as a set of reals $A_\alpha = \{\alpha_1, \alpha_2, \alpha_3, \ldots\}$. We add to this list and
construct an infinite set of integers $L_\alpha = \{l_1, l_2, l_3, \ldots\}$ in which each $l_i$ comes from the
integer sequence of $\alpha_i$.

$$f_n(\alpha) = \begin{cases} t_n, & \text{if } n = l_i \in L_\alpha, \text{ where } (t_k) \text{ is the real sequence of } \alpha_i; \\ 0, & \text{otherwise.} \end{cases}$$

With these functions we will establish the contrapositive of the theorem: If $f_n(S) \ne R$ for
infinitely many integers $n$, then $S$ is countable. . . .

Solution D
As a step toward proving the Continuum Hypothesis, which states that there are no infinities
between the countably infinite and the continuum, Sierpinski proposed the following
theorem.
Suppose we have a function, spec(α), that maps every real α to a countably infinite subset
of the reals (Figure A). Now suppose we make the additional hypothesis that for any two
reals α ≠ β, either α ∈ spec(β) or β ∈ spec(α) (Figure B). Then we can draw the following
conclusion. There exists . . .

Figure A. Each real number α determines spec(α),
a countably infinite subset of the reals.

Figure B. By hypothesis, either α ∈ spec(β),
or β ∈ spec(α). Here α is not in spec(β),
so β must be in spec(α).



§22. Excerpts from class, November 9 [notes by PMR]

Quotation . . . a writer expresses himself in quoting words that have been used
before because they give his meaning better than he can give it himself, or because
they are beautiful or witty, or because he expects them to touch a chord of
association in his readers, or because he wishes to show that he is learned and
well-read. Quotation due to the last motive is invariably ill-advised; the discerning
reader detects it and is contemptuous, the undiscerning is perhaps impressed, but
even then is at the same time repelled, pretentious quotation being the surest road
to tedium.

Fowler, Dictionary of Modern English Usage.

Mais malheur à l'auteur qui veut toujours instruire! Le secret d'ennuyer est celui
de tout dire.

Voltaire, De la Nature de l'Homme.

Il ne faut jamais qu'un prince donne dans les détails. Il faut qu'il pense, et laisse
et fasse agir: il est l'âme, et non pas le bras.

Montesquieu, Mes Pensées.

Don's secret delight, he confessed today, is to "play a library as if it were a musical
instrument." Using the resources of a great library to solve a specific problem: now that, to him,
is real living. One of his favourite ways to spend an afternoon is amongst the labyrinthine
archives, pursuing obscure cross-references, tracking down ancient and neglected volumes,
all in the hope of finding the perfect quotation with which to open or conclude a chapter.
Don takes great pleasure in finding a really good aphorism with which to preface a piece
of writing. So many people have written so many neat things down the ages, he said, that
it behooves us to take every opportunity to pass them on. Don has been known to take
such a liking to a phrase that he has written an article to publish along with it.

So how are we to find that wonderfully apposite quotation with which to preface our term
paper? Serendipity, said Don. Live a full and varied life, read widely, keep your eyes and
ears open, live long and prosper. You will stumble across great quotations. For example,
Webster defines 'bit' as "a boring tool"; Don was able to use this when introducing a
computer science talk.
Sometimes one needs to go about the search more systematically. For example, Don's
TEXbook consists of 27 chapters, 10 appendices, and a preface. His format demands two
relevant quotations at the end of each of these. His METAFONT book posed exactly the
same problem. How did he go about it?

The first secret, he confided, is Bartlett. There are numerous dictionaries of quotations
[filed under PN 6000 in the reference section of Green Library], of which Bartlett's Familiar
Quotations is the most familiar. It was here, under the heading 'technique' in the index,
that Don found a quote from Leonard Bacon deriding Technique as the death of true Art.
Now 'TeX', in Greek, means both 'technique' and 'art', so this seemed pretty appropriate
for The TeXbook where the (Greek) name TeX is explained.
When Bartlett fails, we can try the OED. This incomparable dictionary lists every word
along with contexts in which it has been used; very often it prints a memorable quotation
that incorporates the word in question. Likewise, we can turn to concordances of
Shakespeare or Chaucer to find every single instance in which these authors used any given
word.
Leafing through The TeXbook, Don picked out some of his favourites: Goethe on
mathematicians (and why they are like Frenchmen); Paul Halmos telling us that the best notation
is no notation (write mathematics as you would speak it!). Tacitus had something to say
about the macro (or rather, about the ancient politician of that name).
A stiffer challenge was provided by a book that listed the METAFONT code defining each
letter of the alphabet (as well as other symbols) in a certain typeface; Don had to come
up with quotes for individual letters of the alphabet. No problem: James Thurber had
proposed the abolition of 'O'; Ambrose Bierce had scathing things to say about 'M' in
his famous Devil's Dictionary; Benjamin Franklin once wrote to Bodoni concerning the
exact form of the letter 'T'; a technical report about statistical properties of the alphabet
deliberately made no use of the letter 'E'.
Some of the best quotations are taken entirely out of context. The economist Leontief had
something to say about (economic) output; Don quoted him in his chapter on (computer)
output. Galsworthy's comments on Expressionists found their way into his section on
expressions.
In a pinch, said Don, quote yourself. You could even find someone famous and ask her to
say something-anything!-on such-and-such a subject. In another desperate case, Don
couldn't find anything much that had been said about fonts. No matter, he quoted the
explorer Pedro Font writing about something else entirely (the discovery of Palo Alto, as it
happens). If you are Don Knuth, you may even be able to quote Mary-Claire van Leunen
praising your use of quotation!
Computer technology now gives us another quote-locating resource. When Albert Camus'
The Plague is available online, it will be a simple matter for this note-taker to find the
part in which a writer agonizes for a week before putting a comma in a particular sentence,
and then for another week before taking it out again; just search for occurrences of the
word 'comma' in the text. Don used this technique to find quotations involving the word
'expression' in Grimm's Fairy Tales and Wuthering Heights, both of which are available
on SAIL.
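The word search described above takes only a few lines once a text is available in machine-readable form. The sketch below is an assumption-laden illustration: `find_word` is an invented helper, and the sample text is made up for the demonstration rather than quoted from Camus or any other book.

```python
import re


def find_word(text, word):
    """Return (line_number, line) pairs for lines containing `word` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


# Invented sample text, used only to exercise the function.
sample = """He added a comma.
Then he slept.
The COMMA came out again.
Commas everywhere."""

for number, line in find_word(sample, "comma"):
    print(number, line)
```

Matching on word boundaries keeps 'Commas' out of the results; dropping the `\b` anchors would widen the search to inflected forms.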
If any member of the class would like to demonstrate virtuosity at "playing the library,"
he could try to track down the quotation "God is in the details." Don rather identifies
with God in this, but hasn't been able to track down the reference. A number of people
have assured him that it originated with Mies van der Rohe, but despite reading all the
works and contacting the two biographers of this architect, he has not been able to find it.
Someone told him that Flaubert once wrote "Le bon Dieu est dans le détail." Don hasn't
the patience to search exhaustively in Flaubert's voluminous publications, but he did try
French equivalents of Bartlett, finding the two quotes above (which express the opposite
sentiment). The God-in-details aphorism remains an orphan to this day.
Don has found another quote that so well expresses his philosophy on the subject of error
that he is having it carved in slate by English stonecutters, to occupy pride of place in his
garden:
The road to wisdom? Well it's plain
and simple to express:
err
and err
and err again
but less
and less
and less.
Mention was also made of indexes for books. The Sears & Roebuck catalogue for 1897
contains the useful advice: "If you don't find it in the index, look very carefully through
the entire catalogue." A British judge named Lord Campbell wanted legislation to compel
writers to index their work (but was never able to get round to it himself).
Tangentially, Don mentioned that the designers had given his TeXbook rather large
paragraph indentations (perhaps it's the style of the 80s, he said). This meant that he
sometimes had to add or subtract words to ensure that the last line of each paragraph was at
least as long as the indentation on the following one. The page looks rather strange if this
isn't the case.

§23. Excerpts from class, November 11 [notes by TLL]


Today we heard war stories-stories of the wars between Don Knuth and the Scientific
American editorial staff.
However, before we got completely on track, Don told us a little about the book he is
writing this quarter: Concrete Math.
He said that this summer he went to see Snow White and the Seven Dwarfs and was
very impressed. ("Who would have conceived, in 1937, that such a work of art could be
made?" ) He said he was inspired; that he wanted to produce a work of art as inspired as
Snow White, "except that I wanted to finish it in three months."
A book in three months: This means that Don has to "crank out" four pages a day,
including Saturdays, Sundays, and Holidays. Surprisingly, Don says, "Here it is November,
and I am still happy." He says sometimes he gets up in the morning and can't wait to get
writing; at other times he just finds it a chore that he has to do; "but once I get started,
it's easy-starting is the hard part."
At this point he delivered the punch line to his story on inspiration: We have one more
week to finish the first draft of our term papers. We have the good fortune to have two
professional editors who have volunteered to read our papers: Mary-Claire van Leunen
and Rosalie Stemer.

Moving immediately from his statement that we were lucky to have professionals editing
our work to the stories of his wars with a professional editor, Don showed us a quotation
from The Plague, by Camus [found by PMR].
"What I really want, doctor, is this. On the day when the manuscript reaches
the publisher, I want him to stand up-after he's read it through, of course-and
say to his staff: 'Gentlemen, hats off! ' "
Of course, the fictional character who made the above statement is portrayed by Camus
as being not only naive but a bit mentally unstable. This doesn't mean that a person
couldn't harbor a healthy enmity for an overzealous copy editor.
We now review the correspondence concerning one paper that Don eventually had
published in Scientific American (henceforth known as SA):
In the Fall of 1975, Don received a letter from Dennis Flanagan (the editor). The letter
invited him to write a paper, of about 6000 words, on the topic of Algorithms, for SA's
600,000 readers. It offered him a $500 honorarium for such a paper. (This means he got
about eight cents per word or eight cents per 100 readers-depending on how you like to
think of such things.)
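The two "eight cents" figures come from the same division carried out two ways. A minimal check, using only the numbers quoted in the letter (the helper name `cents_per_unit` is invented here):

```python
def cents_per_unit(dollars, units):
    """Convert a dollar amount into cents per unit."""
    return 100.0 * dollars / units


per_word = cents_per_unit(500, 6000)                  # $500 for about 6000 words
per_100_readers = cents_per_unit(500, 600_000 / 100)  # $500 across 6000 groups of 100 readers

print(f"{per_word:.1f} cents per word")
print(f"{per_100_readers:.1f} cents per 100 readers")
```

Both come to roughly 8.3 cents, because 600,000 readers taken a hundred at a time is the same count as the 6000 words.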
After some correspondence concerning the date that the paper was to be received (Don
had been ill and the date needed to be pushed back), we came to the cover letter for the
original manuscript that Don submitted to SA. He told Mr. Flanagan that he understood
that some editing would take place, but that he had gone out of his way to try to imitate
the "Scientific American style." Don told them, "It will be interesting to see what you do
to this, my masterpiece."
Don soon got a letter back from Mr. Flanagan acknowledging Don's paper, telling him
that it might have to be "slightly edited," and warning him that it might take a while to
give it the attention it deserves. (Don also got his $500 at this point. )
Finally, 14 months later, Don received an edited copy of his paper together with a cover
letter that explained that it had been "edited for the general reader." Don was told to
correct any errors that they might have inadvertently introduced and exhorted to get back
to them within the next two weeks.
To put it mildly, Don was not pleased with the results of this editing. Every sentence
had been rewritten. He wrote a letter to Martin Gardner-a letter written more to vent
frustration than in expectation of achieving any result-in which he stated many of his
grievances. One of his comments covers the general tone: "I was astonished to see how
many editorial changes were made that took perfectly good English and turned it into
something that would be worth no more than B- on a high school term paper."

In addition to showing us his letter to Gardner (and Mr. Gardner's sympathetic response)
he showed the class the original and the edited versions. Among SA's changes: Changing
all uses of 'we', transforming some long sentences to several short sentences, transforming
some short sentences into one long sentence, removing commas (commas that Don
found necessary), changing 'which's to 'that's, removing technical jargon, changing 'most
common' to 'commonest', and introducing a few errors. (Don found many of the changes
gratuitous, but the editorial introduction of errors was useful because it meant that Don's
exposition had not been clear enough.)
The next letter we saw was the cover letter for the, now re-edited, manuscript that Don sent
back to Dennis Flanagan. He mentioned his extensive re-editing, stated that he appreciated
some aspects of the editing more than some others, and asked to see the galley proofs: he
said viewing the proofs was especially important since there is "so much technical material
that is typographical in nature."
Two weeks later Don got back the proofs and a letter. The letter argued successfully with
some of Don's objections to the original editing job (they stuck by 'that' instead of 'which',
hurray!); less successfully, SA refused to budge on 'commonest' (boo). The letter also said
that sending proofs to an author was unprecedented. But the printer was having a terrible
time with the mathematics, so they made an exception. (Don pointed out that this was
largely dictated by the printing mechanisms they were using.) But it was a good thing
the proofs were sent, because important changes were made during a 1.5-hour telephone
conversation.
By the end of class, about the time that Don showed us his second letter to Martin
Gardner (the one in which he said he shouldn't have been so frustrated in the first place),
Don admitted that some of the disputed changes really had been appropriate ones. He said
that the original copy editor had improved his article in some ways, but that his further
editing had improved it still further. At the end everybody was happy. (Music up.)
As a final parenthetical remark, he told us about the way that the (quite long) captions
for illustrations in SA are typeset: The linebreaks are determined by hand. The final
line always ends at the right margin (there's no extra white space). To achieve this, the
SA copy editors must count letters and reword the captions until they fit. One of the
methods that they use to make things fit nicely is to start at the end of the caption and
start removing 'the's. ("It is not placed at root of tree because it is too far from center of
alphabet.") At least, this was the system in 1977.
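The 'the'-removal trick can be mimicked mechanically. The sketch below is a simplification under stated assumptions: it only shortens a caption until it is no longer than a character budget, whereas the SA editors were aiming for a final line that ends exactly at the margin. The function name `drop_articles_to_fit` and the "restored" caption with its articles put back are both hypothetical; only the shortened caption appears in the notes.

```python
def drop_articles_to_fit(caption, max_chars):
    """Delete occurrences of 'the', working backwards from the end, until the caption fits."""
    words = caption.split()
    i = len(words) - 1
    while i >= 0 and len(" ".join(words)) > max_chars:
        if words[i].lower() == "the":
            del words[i]
        i -= 1
    return " ".join(words)


# Hypothetical original: the quoted caption with its articles restored.
restored = ("It is not placed at the root of the tree because it is "
            "too far from the center of the alphabet.")

print(drop_articles_to_fit(restored, 90))
print(drop_articles_to_fit(restored, 79))
```

With a budget of 90 characters only the last two articles go; with a budget of 79 all four are removed, which reproduces the caption quoted in the notes.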
Among comments about how the SA editorial staff is overworked and how he shouldn't
have been so upset, he did get off a parting shot: "After spending all this time doing
crazy stuff like caption filling, it's no wonder the copy editor had no time for polishing my
article."

§24. Friday the 13th, part 24: The Class notes [notes by PMR]


The Story So Far. Readers will recall that our hero, Prof. Don, is locked in mortal
combat with Scientific American, a journal whose global reach is exceeded only by its
editorial hubris. Will Don's definitive Algorithms article reach the world unscathed? Or
will it suffer the death of a thousand 'improvements' at the hands of a horde of dyslexic
copy-editors? Now Read On . . .
On March 25, Don received the page proofs for his article, which was to appear in the
April edition. ("Ever since Martin Gardner's famous April Fool hoax, I had wanted to get
into an April issue," he mused.) Don picked up the phone and spent the next hour and a
half in damage-limitation negotiations with an editor code-named TEB.

[§24. SCIENTIFIC AMERICAN SAGA (2) 51 ]


Some straightforward errors were easily corrected: A '1' had metamorphosed into an 'l'
and an '∅' into a 'φ'. Typesetters who are unfamiliar with mathematics invariably find
creative things to do with this "empty set" symbol, Don said. Many problems show up
only at the page-proof stage. For example, one page began with the solitary last line of a
paragraph and then broke with a new subheading. Since the paragraph could just as easily
introduce the new subsection as conclude the previous one, Don just moved the subheading
back to precede it, on the previous page. Don also got his floor brackets restored where
square brackets appeared on the page proof.
Don didn't get his way on everything, though. Brackets were used interchangeably with
parentheses in a mathematical formula, despite Don's protest that the former have special
meanings.
Neither was Scientific American ('SA', hereinafter) able to get hold of a photograph of
a particular Mesopotamian clay tablet that is housed in the Louvre. It is a table of
reciprocals, and is probably the earliest example of a large database that was sorted into
order for ease of retrieval. Don thinks this object definitely deserves a place in the hearts
and minds of CS folk, being perhaps the first ever significant piece of data processing.
Even a modern computer might need a second or so to do the work involved.
On the whole, Don was pretty happy with his article. It enjoys a continuing success
as an SA reprint; thousands of copies are still sold to schools (with the page references
carefully renumbered). As far as Don knows, it's the only one of his articles to have been
translated into Farsi (Persian). He showed us that in this language, as in others where
the text runs right to left across the page, mathematical formulae are not reversed. The
word 'hashing' invariably gives translators pause; it becomes 14 characters in Chinese, and
a French translator of one of his books once put in a call to the Académie Française to
establish the authorized equivalent.
All the re-editing was painful at the time, admits Don, but in the long run he has come
to agree that this cooperative effort did much to remove the jargon and make the paper
accessible to a general audience. Martin Gardner, Don told us, attributes his success as a
mathematics writer to the fact that he is not a mathematician.
Don's paper for the IEEE Transactions on Information Theory makes for a sadder tale:
They made such a mess of it that Don decided the game was just not worth the candle, and he
advises everyone to read the Stanford CSD Report instead. For example, IEEE says 'zero'
and 'one' instead of '0' and '1'. Don likes to use 'lg' to mean 'log to the base 2', but they
changed this without explanation to 'log', despite the fact that to most people this latter
means 'log to the base 10' (or, to number theorists, the natural log, base e).
Not the greatest copy-editors, Don sighed.
More recently, Don wrote for the October issue of the ACM Transactions on Graphics,
and encountered some really shocking copy-editing. They changed '... data has to ...' to
'... data have to ...'. Now long ago Don was told that 'data' is really plural, but everywhere
it is used both as a singular and as a plural, even in the reliably conservative ("antediluvian,"
chimed Mary-Claire) New York Times. Don thought it quite right to use it as a singular
when referring to data as some kind of collective stuff. Don wrote and complained that
the ACM should certainly know about data. In the end, Don kept everyone happy by
changing the sentence to read '... data must ...'.
Mary-Claire van Leunen sanctioned the term 'Automata Theory', although one would not
normally incorporate a plural adjective into a compound noun. But no-one has ever said
'automaton theory', and no-one ever will.

ACM did gracefully admit to and correct some straightforward mistakes, such as 'this
number plus that number are equal to 63'. But where Don wrote 1000000 they substituted
1,000,000. Don objected that although this might be justified in text, his use is perfectly
OK in a formula. Well then, they replied, write 10^6. Fine, said Don, but what do I do
when the number is 1234567? The IEEE standard here is to insert spaces, thus: 1 234 567.
Don doesn't like this in formulae, but agrees that it may be useful in a high-precision
context, such as numerical tables.

Don recalled a remark by George Forsythe that every scientist should try to write for a
general audience (not just for other scientists) at least once in his life. Don has done
this three times now, so feels that he's done his bit! He gave his first such lecture to a
non-technical audience in Norway and found it surprisingly hard to understand their 'mind
set'. The problem is to make the talk interesting, but convey how it feels to a computer
scientist to do computer science. The public probably imagine that mathematicians sit
and factor polynomials all day, and that CS types design videogames. How to convey the
soul of the subject to them? In this lecture, Don presented a sequence of algorithms for
a search task. Since we all have to look up information in large tables or indexes now
and then, he hoped the audience would have a clear intuition of the problem. Brute force
searching is clearly too slow; binary search is natural and powerful; hashing is better still,
but very unintuitive to most people. Don was asked to write up his talk for a Norwegian
magazine called Forskningsnytt, 'Research News' (a sort of Scientific Norwegian). In the
course of doing so he learned enough of the language to write v and h instead of l and r to
designate left and right sons in a tree structure. Dr. Ole Amble, a numerical analyst who
was one of Norway's computer pioneers, helped Don with Norwegian style on this article,
and got interested in search algorithms as a result. He asked Don whether there mightn't
be a way to combine the advantages of binary search and hashing? Don at first told him
"obviously not," but then realized what Amble meant . . . alas, too late to include in the
just-published Volume 3 of ACP. But this combination of methods made a nice conclusion
to his SA paper, which was based on this Norwegian prototype.

It was in April of 1977 that Don's travails with SA prompted him to investigate typesetting
for himself; in May of that year he designed the first draft of TeX and spent his sabbatical
(and ten more years) perfecting it, putting Volume 4 of ACP on the backburner.

We had a few minutes left to look at other changes that SA made to Don's original
manuscript. In the first case we looked at, there seemed to be no reason for restructuring
a sentence to put Amble's name first instead of the motivation of his discovery. But Mary-Claire
noted that SA always tries to stress the human contributions in science, sometimes
at the expense of the ideas. Don also mentioned another surprising thing he learned about
SA's editorial policy: They never display equations. (PMR knows at least one scientist who
refuses to read SA for this very reason: 'How can you explain science without equations? Pah!')

§25. Excerpts from class, November 16 [notes by TLL]


After a brief (but charming) musical prelude, Don demonstrated to us that we are not alone
in being concerned with the mechanics of writing. He showed us four small publications
that touched on some of the humorous aspects of written rhetoric.
We briefly viewed a Russell Baker column entitled "Block That That Cursor"; a "Peanuts"
comic strip with a punchline concerning comma placement; a quotation from the New York
Times ("Plagiarize creatively, but quotes can be dangerous if you don't acknowledge the
source"); and an article by Richard Feynman in the Caltech Alumni magazine. Feynman
discussed his disappointment with his experience of serving on the Challenger Disaster
Panel; he complained that instead of discussing ideas, the panel spent all their time
"word-smithing" (deciding how to reword or re-punctuate sentences in the committee's report).
Feynman's dismay at the amount of time he spent dealing with commas, wicked-whiches,
and typographic presentation is not unique. Don said, "Word-smithing is a much greater
percentage of what I am supposed to be doing in my life than I would have ever thought.
That's one of the main reasons I am teaching this course."
Don also showed us what he thinks is a wonderful piece of writing: a spoof on the Sam
Spade genre, full of detectives, blondes, .38's, and the 'sweet smell of greenbacks'. It turned
out to be a passage from "Getting Even" by Woody Allen. Likewise for the term papers,
he said, try to have a genre in mind (though perhaps not this one) and do a good job in
that genre.
To help prepare us for the guest speakers coming up soon, Don handed out copies of several
of their works, encouraging us to read them as examples of good practice. First he handed
out an "Editor's Corner" article published by Herb Wilf last January:
This issue marks another changing-of-the-guard for the MONTHLY. Paul Halmos' act
will be a tough one to follow . . .
Wilf's article contains a nice exposition of problems related to Riemann's famous unproved
Hypothesis.
Don also showed us another draft of a paper by Herb: "n coins in a fountain". This title,
he said, was just too good to pass up, even though it includes a formula. But Don would
have capitalized the n instead of putting it in italics. As for the objection about starting
a title with a symbol, why shouldn't we regard N as simply another English word (after
all, it appears in most dictionaries as the first entry under 'N')? The paper, however, was
never published. Delving more deeply into the subject, Herb found that it had all been
done before. C'est la guerre.
The next guest speaker after Wilf will be Jeff Ullman, who will tell us how to become rich
by writing textbooks. Don recommended that we look closely at Chapter 11 of Jeff's book
Principles of Database Systems (second edition), which shows "excellent simplification of
subtle problems and algorithms."

[54 §25. EXAMPLES OF GOOD STYLE]


Don handed out two examples by the third guest speaker Leslie Lamport. One, from
Notices of the American Mathematical Society 34 (June 1987), is entitled "Document
Production: Visual or Logical?" and Don said "It's a 'flame' but very well written so I
wanted you all to read it. It's a nice polemic that takes the 'WYSIWYG versus Markup'
controversy and reformulates the problem along more fruitful lines." The other Lamport
article is entitled "A simple approach to specifying concurrent systems"; it will soon be
published in Communications of the ACM.
Don says the latter paper is the best technical report he has seen in the last year or so.
The paper is unusual because of its question-and-answer format. While dialogs have been
used effectively by experts in other fields (such as Socrates, Galileo, George Dantzig, and
Alfréd Rényi), this is the first time, as far as Don knows, that such a format has been used
in computer science.
Before moving on to the next handout, Don told us about writing his book Surreal
Numbers. Like Leslie Lamport's paper, Don's book is presented as a dialog. Don's
dialog presents some ideas that John Conway told him at lunch one day (Don wrote
the ideas down on a napkin and then lost the napkin). The most extraordinary aspect
of this book is that Don wrote it in six days ("And then I rested"). That week was
very special for Don. ("It was the most exciting week in my life. I don't think I can
ever recapture it.")
When Don wrote the book he was in Norway. He was in the middle of writing one of
the volumes of The Art of Computer Programming (isn't he always?), and he did not
expect Jill (his wife) to be sympathetic when he told her that he wanted to write yet
another book, even if he did think he could write it in a week. Perhaps Jill knows
more about Don than Don knows about Jill, because she not only didn't complain
but she got quite into the spirit of the thing.
Just what was the spirit of the thing? "Intellectual whimsey" probably isn't far off.
Don rented a hotel room ("near where Ibsen wrote") and spent his week writing,
taking long walks ("to get my head clear"), eavesdropping on his fellow hotel guests
at breakfast ("so I could hear what dialog really sounds like"), and pretending that
Jill's visits were clandestine ("we had always read about people having affairs in hotels
. . .").
Don said he wrote "with a muse on my shoulder." Every night's sleep was filled with
ideas and solutions; before dozing off he would have to get up and write down the first
letter of every word of the ideas he had (and he would spend the morning decoding
these cryptic scribbles). He told us that he was more perceptive during this week;
his description of the King's Garden during an evening walk was worthy of Timothy
Leary.

All this prolific word production must have left him in verbal debt: When he finished
the book he tried to write a letter to Phyllis telling her how to type the book. He
couldn't. Except he must have eventually: the book is still in print and sells several
hundred copies a year (in seven languages).

Still another handout was part of a chapter written by Nils Nilsson and Mike Genesereth
for their new book Logical Foundations of Artificial Intelligence. Chapter 6, entitled
"Nonmonotonic reasoning," presents a new area of research at the level of a graduate student.
Don says that the chapter has an excellent blend of formal and informal discussion, with
well-chosen examples; this subject had never been "popularized" before, so the task of
writing a good exposition was especially challenging. Don also praised the authors'
typographic conventions (for example, logic is presented using a "typewriter" font).
Don said that we already have Mary-Claire's book, so he didn't have to introduce her to
us. But he ran across some electronic mail she had written recently, and thought it was a
particularly elegant essay, so he passed it along (see §26 below). Computer scientists and
mathematicians are way behind real writers when it comes to exquisite style.
Finally, just in case we still craved more good examples to read, he handed out some
excerpts from a paper written by Garey, Graham, Johnson, and Knuth. Don says that
he included it because it has two proofs of difficult theorems: proofs that are not, and
probably could not be, trivial.
Don tried to interest his readers in the first proof (and algorithm) by presenting an example
as a mathematical puzzle. He says that by solving the puzzle the reader can see that the
problem is not simplistic, but that an algorithm might be possible. ("This builds exactly
the right mental structures in the reader's mind for this particular problem, I think. The
algorithm itself is the worst algorithm I have ever had to present, but there is probably
no simpler one.") While flashing us part of the algorithm (complete with more cases
than could fit on the monitor), Don said, "The ability to handle lots of cases is Computer
Science's strength and weakness. We are good at dealing with such complexity, but we
sometimes don't try for unity when there is unity."
The second proof involves the reduction of one problem to another. The reduction requires
a very complicated system, one that Don found was well served by an extended
biological metaphor and some involved terminology. As his metaphor, he chose the jellyfish
( "an unrooted, free-floating tree" ); he named pieces of the data structure stems, polyps,
tentacles, heads, and nematocysts (the biological term for stingers).
Mary-Claire asked, "If that structure turns out to be generally useful, are you going to
be sad that you called it a nematocyst rather than a stinger?" Don said No, but he
has been sorry about names he has chosen in the past. (He wishes he had called LR(k)
grammars L(k) grammars.) When he was writing The Art of Computer Programming,
Volume three, he used the word "Daemon" to refer to what are now called "Oracles," but
the Oracle replaced the Daemon before it was too late.
Another last minute terminology substitution happened when Aho, Hopcroft, and Ullman
substituted "NP-complete" for "Polynomially-complete" in their text on Algorithms,
even though they had already gotten galley proofs using the original name. The name
was changed at that late date as the result of a poll conducted throughout the Theoretical
Computer Science community (suggested names were NP-Hard, Herculean Problem, and
Augean Problem).

------- Forwarded Message

Replied: Forwarded 14 Aug 85 12:11

Return-Path: <mcvl>
Received: by lewis.ARPA (4.22.01/4.7.34)
        id AA20294; Mon, 5 Aug 84 12:22:47 pdt
From: mcvl (Mary-Claire van Leunen)
Message-Id: <8408051922.AA20294@lewis.ARPA>
Date: 5 Aug 1984 1222-PDT (Monday)
To : adam• • ar . urt;Stanford . BITJET. a ••nt.OSha.ta . baldwinGYal • . 1RPA. bask.tt .
baigel, bill, benttVi.e-rlch . ARP 1 . blatt , cUIl.OD, dlcv&Z ! j mcl.
ellisOYal •. 1RP1 • • atrinGKIT-IX . 1RPA. foir. SU.lson . soodmanOYal • . 1RP 1 .
guarino . h.rbiaonGultra . DEC . h.ub.rtOYal • . ARP1 . hominS.
hsuClrlang . DEC , john •• on, tarlton, kel •• y , kac. larrab •• , llYin, 11,
10VA.yOYal • . 1RPA. mbrown. mccall . paOX.roz . ARP A . mcdani. l .
mcvl-• • aay. OPurdu• . 1RPA . mcvl-. a.ayaOVa.hin�ou . ARPA • •••hanOYal • . 1RP1.
minoyor.z . DEC. naushton. park.rOYal •. 1RPA. p.t.rs . p.tit . philbin.
pi.rc •• ramahaw. LEICBTERJORl.I .DEC . r••• • r.idOGlaci.r.
r.nt.chGuuc . CS.ET. ritt .rOYal • . ARPA. rOb.on. paOX.roz . ARP 1 .
rnttonb.rsOYal• . ARP1 . ahi• •ra . siddall • ••ok.y • • 0 . paOX.roz . 1RP1 .
st ••art . swart , troweXero x , vall , wick, vilhala. witt lnblrstYal • . ARPA ,
vli7.Yal.-RinSOYal • . 1RPA. youuSOYal • . 1RPl
Cc: mcvl
Subject: #6: "hopefully"

Q: I have often heard that it is incorrect to use
"hopefully" to mean "it is hoped." But the Random House
dictionary lists that as definition number two and gives no
warnings of any kind. Is this usage standard?

A:

The story on "hopefully" is one of the strangest in modern
English.

"Hopeful" has had two senses ever since it first appeared in the
language late in the 15th century. A person could be hopeful
(expectant, eager, desirous); or a situation could be hopeful
(promising, auspicious, bright).

As with most adjectives, both of these "hopeful"s regularly
produced "-ly" adverbial forms, but the kind of hopefulness that
means expectant and eager produced adverbs more easily than the
kind that means promising and bright. There's nothing
mysterious about that difference in frequency. A person can
carry himself hopefully or eye a desirable object hopefully or
prepare himself hopefully for a possible future. Impersonal
substantives, on the other hand, serve less often than personal
ones at the head of the kind of active verbs we modify with
adverbs of manner. Nonetheless, a wager can be shaping up
hopefully, a day can begin hopefully, the omens can augur
hopefully. All perfectly straightforward and normal. The first
OED citation for "hopefully" in this second sense is from 1531.

[§26. MARY-CLAIRE VAN LEUNEN ON 'HOPEFULLY' 57]

Early in the 1930s this second sense of "hopefully" began to
appear in a different kind of construction, as what's called a
sentence adverb. Sentence adverbs are part of a class of
expressions that can modify whole clauses; such expressions are
called absolutes. Look at some sentence adverbs at work:

    Interestingly, most mathematicians failed to notice the
    correspondence.

    Presumably he knows what he's doing.

    Regrettably there is no remedy for this kind of infection.

    Fortunately we managed to get out before he noticed us.

    Hopefully the weather will clear up before it's time to
    leave.

Then in the early 1960s attacks against "hopefully" began to
appear in print. That's about the right lag time for usage
controversies, and I looked forward to figuring out how to wield
my cudgel. I believe, by the way, that if rational debate had
ensued I would have been against "hopefully" as a sentence
adverb. Unfortunately, reason never entered into it.

The attacks were by and large astoundingly ill-informed. Some
managed to convey that it was "hopefully" as a sentence adverb
to which they were opposed but lacked the technical vocabulary
with which to express the idea. Some opposed both "hopeful" and
"hopefully" in the senses promising(ly), auspicious(ly),
bright(ly). Most bizarre of all, some took it upon themselves
to oppose all sentence adverbs.

The authors of these attacks presented themselves as defenders
of the purity of the language against the onslaught of wicked
barbarians. They asserted (apparently without ever feeling the
need to check the evidence) that the objects of their attack
were very recent additions to the language -- true in the case
of "hopefully" as a sentence adverb, but not true at all of
impersonal "hopeful" and "hopefully," and completely zany when
it comes to sentence adverbs in general. (In five minutes with
the OED I was able to find "certainly" being used as a sentence
adverb in 1300.) One came to see that these self-proclaimed
language defenders knew nothing of even the most elementary
tools of the trade.

So. What should we do? Ignore the ignorant bully-boys or
knuckle under?

When I was young I once attended a school at which one group of
girls declared that anyone who wore yellow on Thursday was a
freak. The rest of us recognized the interdict as arbitrary,
irrational, and mean. Its only purpose was to wound. We talked
about it among ourselves, and we tried to fire ourselves up to
wear yellow on the fatal day. But there were sixteen lonely
hours between last study hall on Wednesday and getting dressed
Thursday morning. Nobody owned very many items of yellow
clothing anyway; quite likely they were already in the laundry.
And yellow's not a flattering color in the morning light. Not
at all flattering. Tends to make the wearer look green.

I moved in the middle of the year, so I have no idea whether
some brave child eventually wore a yellow blouse, or yellow
socks, or a yellow handkerchief to that school on a Thursday.
Hopefully there was more than one; hopefully the girls who laid
down the original rule saw how jaunty the rebels looked in their
yellow outfits; hopefully -- oh, devoutly to be hoped -- they
all became best friends and behaved beautifully to one another
and never did anything petty or malicious again as long as they
lived.
§27. Excerpts from class, October 28 [notes by TLL]

Class opened as Don introduced today's guest speaker: Professor Herbert Wilf. Professor
Wilf is on the faculty at the University of Pennsylvania but is spending his sabbatical
year at Stanford.

As Wilf took the dais he pronounced this "a marvelous course." ("Taken earlier in my
career it would have saved me and the world a lot of grief, mostly me.") The course
topic is one of daily concern for him; apart from writing his own papers he edits two
very different journals: the American Mathematical Monthly ("The MONTHLY") and the
Journal of Algorithms.
The Journal of Algorithms was founded in 1980 by Wilf and Knuth and is a research
journal. Results are reported there if they are new, if they are important, and if they are
significant contributions to the field. If these conditions are met, a little leeway can be
given in the area of beautiful presentation. But the MONTHLY is an expository journal. It
is a home for excellent mathematical exposition. (It also seems to be a popular place to
send "proofs" of Fermat's Last Theorem.)

Though he told us that he feels "older without feeling wiser" and is uncomfortable setting
down rules for a human interaction that "involves part brain and part hormone system"
he gave us several pointers.

Get the attention of your readers immediately. Snappy titles, arresting first
sentences, and lucid initial paragraphs are all methods of doing this.

As examples, he showed us a paper by Andrew M. Gleason with the title "Trisecting
the Angle, the Heptagon, and the Triskaidecagon"; a paper by Hugh Thurston that
began "Can a graph be continuous and discontinuous?"; and the first paragraph of
an autobiographical piece by Olga Taussky-Todd that started with some insight
into the author's fascination with matrices. Gleason's paper was attention-getting
mostly because Gleason is famous ("that helps").

Get everything up front. Tell your readers in plain English what you are going to
write about and let them decide for themselves whether or not they are interested.
("You can quintuple your readership if you will let them in on what it is that you
are doing.")

Remember that people scan papers when they read them. Potential readers
will skim looking for statements of theorems; if all of your text is discursive they will
have nothing to latch onto. Summarize your results using bold face ("or neon")
so that the page flippers can make an informed decision. Similarly, drop notational
abbreviations and convoluted references in the statements of theorems.

A little motivation is good, but readers don't like too much. Presenting
examples that do not yield desired results can be quite useful, but the technique
loses its charm after a small number of such examples. (Far from overdoing this
technique, many writers will introduce mysteriously convenient starting points for
their theorems. "Whenever I see 'Consider the following . . .' I know the author
really means to say 'Here comes something from the left field bleachers.'")

[§27. HERB WILF ON MATHEMATICAL WRITING 59]

He gave us the names of three books (not written by anyone in the room) that he considers
superb books of mathematics:

Problems and Theorems in Analysis, by Pólya and Szegő. It has a "Problems"
section and an "Answers" section. The problems are self-contained, digestible pieces
of more complex problems. The answers are on the spare side and have been the
cause of much head-scratching over the years. By solving several of these
self-contained problems, a reader can arrive at an understanding of major results in the
field.

Introduction to the Theory of Numbers, by Hardy and Wright. This book is
"short on motivation." Theorems are stated and proved concisely and precisely. In
the preface the authors claim that "the subject matter is so attractive that only
extravagant incompetence could make it dull."

Mathematical Analysis, by Rudin. This book is rigorous. It teaches the reader
what is and is not a proof. A reader who survives this book feels strong.

Wilf commented that all three of these books are quite dry, but Knuth objected (along
the same lines as those used by Hardy and Wright in their preface) and Wilf amended his
statement: Each of these books is very lean.

Discussing the change of his own writing style over time, he told us that when he was
younger he didn't have much self-esteem and stuck to established forms. Now that he
feels better about himself he has developed his own, much chattier, style. (Speaking of
chattiness, he is also a fan of the use of the first-person in technical writing.) He says he
aims to be chatty leading up to a proof, prove it in the "lean and mean" style that Rudin
would use, and then be chatty again after he finishes the proof.

The last things that Wilf discussed were two handouts (§28 and §29 below): "Enumeration
of orbits of mappings under action of $C_n$, the cyclic group," and "Counting necklaces."
Each handout discusses the same mathematical problem, solved the same way. "Enumeration
of . . ." takes a half page; "Counting Necklaces" takes four pages.

Some audience members will appreciate the half page of exposition that is condensed to
the word "evidently" in the shorter paper; some will merely be annoyed by it. As the
Monthly editor he gets letters from people who complain about the informal style creeping
into recent publications. "Mathematics is a serious business, not a comic pursuit," said
one such letter.

Finally, Wilf doesn't mean to say that either of the two approaches is superior ("They are
the two sides of the coin"); he means for us to examine each and decide what techniques
we want from each.

§28. From Acta Hypermathica

Enumeration of orbits of mappings under action of $C_n$, the cyclic group

B. Nimble

Say that a mapping $f\colon [n] \to [k]$ is irreducible if $\forall \{a \in C_n;\ a \ne 1\}$
we have $f \circ a \ne f$. If $M(n,k)$ is the number of these and $F(n,k)$ is the
number of orbits of mappings $f$ under the action of $C_n$, then evidently

\[ F(n,k) = \sum_{d \mid n} M(d,k) \qquad (n \ge 1) \tag{1} \]

But since, clearly,

\[ \sum_{d \mid n} d\,M(d,k) = k^n \qquad (n \ge 1) \tag{2} \]

we find from (2),

\[ M(n,k) = (1/n) \sum_{d \mid n} \mu(n/d)\,k^d \]

and from (1),

\[ F(n,k) = \sum_{d \mid n} (1/d) \sum_{\delta \mid d} \mu(d/\delta)\,k^{\delta}
          = (1/n) \sum_{d \mid n} \phi(n/d)\,k^d \]

where $\phi$ is Euler's function, and the last equality follows from simple
manipulations.

[§28. WILF'S FIRST EXTREME 61]


§29. From MathWorld

Counting necklaces

R. U. Certain

Suppose we have a supply of beads of k different colors, and we wish to
construct necklaces of n beads. How many different necklaces can we make?
The word 'different' is to be understood in the sense of rotations; two necklaces
are equivalent if one can be carried into the other by a rotation. Another
interesting problem would have resulted if we had allowed the customer to
pick up the necklace off of the counter and flip it over. In the latter case we
would have been studying equivalences under the action of the dihedral group
(generated by a cyclic shift right by 1 unit and a 180° flip) instead of under
the cyclic group, which is in fact what we're going to talk about here.
For instance, if n = 4 and k = 2 there are 6 different necklaces, and these
are shown below.

The problem is to find F(n, k), the number of different n-bead necklaces
of at most k colors (let's call these (n, k) necklaces).
Among all (n, k) necklaces we distinguish a subset that we will call the
'prime' necklaces. Say that a necklace is prime if it does not result from
concatenating a number of repetitions of a shorter pattern. Thus, among
the (4, 2) necklaces above, the first one is not prime because it results from
stringing together 4 identical shorter strings (viz., 'A'). The third and sixth
ones are also not prime, whereas the second, fourth and fifth ones are prime.
Let M(n, k) denote the number of prime (n, k) necklaces (e.g., M(4, 2) =
3).
The re&lOn for concentrating on these prime necklaces will now appear.

Construction. Begin with some divisor d of n and a prime (d, k) necklace.
Cut the necklace immediately to the right of one of its beads. This yields a
linear string of length d. Make n/d copies of that linear string and concatenate
them to produce a single linear string of length n.

We claim that every possible one of the k^n possible linear strings of n beads
of k colors can be constructed once and only once by such a cutting operation.
To prove that, let w be such an n-string, and let d be the smallest integer
with the property that w is the concatenation of n/d copies of a string of length
d. There is always at least one candidate for such an integer d, since d = n
will work. Evidently, whatever d is, it is certainly a divisor of n.

[62 §29. WILF'S OTHER EXTREME]


Having found d for the given string w, construct a necklace of d beads by
taking the first d beads from w and tying their ends together. The resulting
(d, k) necklace is prime (why?) and it is the one and only prime necklace with
the property that if we apply the 'Construction' stated above, the given string
w results.
Hence every prime (d, k) necklace yields d different linear strings, and
every linear string turns up at some point in the game. That is to say,

$$\sum_{d \mid n} d\,M(d,k) = k^n \qquad (n \ge 1).$$
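
The counting argument above can be checked by brute force for small cases. The following Python sketch is only an illustration: the function names and the choice of the lexicographically least rotation as the representative of a necklace are assumptions made here, not part of the original notes.

```python
from itertools import product

def smallest_period(w):
    """Smallest d such that w is n/d copies of w[:d]; d = n always qualifies."""
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return d

def canonical(w):
    """Represent a necklace by its lexicographically least rotation (an assumed convention)."""
    return min(w[i:] + w[:i] for i in range(len(w)))

def prime_necklaces(d, k):
    """All prime (d, k) necklaces, found by enumerating the k**d linear strings."""
    found = set()
    for beads in product(range(k), repeat=d):
        if smallest_period(beads) == d:      # not a repetition of a shorter pattern
            found.add(canonical(beads))
    return found

def check_identity(n, k):
    """Check that the sum over d | n of d * M(d, k) equals k**n."""
    total = sum(d * len(prime_necklaces(d, k))
                for d in range(1, n + 1) if n % d == 0)
    return total == k ** n
```

Calling `len(prime_necklaces(4, 2))` counts the prime (4, 2) necklaces directly, and `check_identity(n, k)` compares both sides of the identity for whatever small n and k one cares to try.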

After Möbius inversion, we get

$$M(n,k) = (1/n) \sum_{d \mid n} \mu(n/d)\,k^d \qquad (n \ge 1) \tag{1}$$

where $\mu$ is the Möbius function. Hence we have an explicit formula for the
number of prime necklaces.
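
As a quick check of formula (1), take n = 4 and k = 2. The divisors of 4 are 1, 2 and 4, and the standard values of the Möbius function are $\mu(1) = 1$, $\mu(2) = -1$, $\mu(4) = 0$, so

$$M(4,2) = \tfrac14\bigl(\mu(4)\,2^1 + \mu(2)\,2^2 + \mu(1)\,2^4\bigr) = \tfrac14\,(0 - 4 + 16) = 3,$$

which agrees with the value M(4, 2) = 3 quoted earlier.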
Unfortunately, that wasn't the question. The question was to find the
number of all different necklaces whether prime or not.
Fortunately, the latter number, F(n, k), is easily obtainable from M(n, k).
Take a divisor d of n and a prime (d, k) necklace. Cut it somewhere, and make
n/d copies of the resulting string. Concatenate them (sound familiar?) into
a single n-string, but now (this part shouldn't sound familiar) tie its ends
together. The result of this operation is a necklace, not just a linear string.
Hence it doesn't matter where we make the cut: wherever the cut is made, the
end result is the same necklace.
Bottom line: $F(n,k) = \sum_{d \mid n} M(d,k)$.
Since we have an explicit formula (1) for M and an explicit formula for F
in terms of M, we're finished, aren't we? Well in a sense yes, but if we stick
with it we'll find that simplifying the expression is half the fun. Where we are
is that

$$\begin{aligned}
F(n,k) &= \sum_{d \mid n} M(d,k) \\
       &= \sum_{d \mid n} (1/d) \sum_{d' \mid d} \mu(d/d')\,k^{d'}.
\end{aligned}$$
The first step in the simplification process is to invoke the Law of Double
Sums: 'Interchange Them'. This gives

$$F(n,k) = \sum_{d' \mid n} k^{d'} \sum_{d' \mid d \mid n} \mu(d/d')/d \tag{2}$$

in which the innermost sum is over d's that are simultaneously divisors of n
and multiples of d'.


Now that kind of a sum is very confusing to handle unless you use a
language that has been attributed by a well-known SU professor to Iverson,
although said professor has since developed its use to a fare-thee-well. The
idea is to move all of the fine print up from the bottom of the airline ads into
the main text so you can see that you have to stay over a weekend to get thee
best fares.
The way to apply this sage advice (thymely too) is to use the function T(·),
the 'truth-value' function. Its value on any particular '·' is 1 if its argument is
true and 0 else. In this case we have,

$$\sum_{d' \mid d \mid n} \mu(d/d')/d = \sum_{d} T(d' \mid d)\,T(d \mid n)\,\mu(d/d')\,(1/d).$$

But $T(d' \mid d) = T(\exists t : d = td')$, so we can replace d by td' in the inner sum
above, and sum on t. The inner sum now becomes

$$\sum_{t} T(td' \mid n)\,\mu(t)/(td') = (1/d') \sum_{t \mid (n/d')} \mu(t)/t$$

where the fine print has now reverted to the bottom of the ad.
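
The change of variable can be sanity-checked numerically. The sketch below is an illustration under stated assumptions: the helper names (`mobius`, `T`, `inner_sum_over_d`, `inner_sum_over_t`) are invented here, `dp` stands for d', and the factor 1/d' that comes from 1/d = 1/(td') is kept explicit. Exact rational arithmetic avoids any rounding questions.

```python
from fractions import Fraction

def mobius(m):
    """Moebius function: 0 if m has a repeated prime factor, else (-1)**(number of prime factors)."""
    result, p = 1, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:          # p divides the original m at least twice
                return 0
            result = -result
        p += 1
    return -result if m > 1 else result

def T(statement):
    """Truth-value function: 1 if the statement is true, 0 otherwise."""
    return 1 if statement else 0

def inner_sum_over_d(n, dp):
    """Sum of mu(d/d')/d over every d in 1..n, with the fine print moved into T factors."""
    total = Fraction(0)
    for d in range(1, n + 1):
        if T(d % dp == 0) * T(n % d == 0):   # d' divides d, and d divides n
            total += Fraction(mobius(d // dp), d)
    return total

def inner_sum_over_t(n, dp):
    """The same sum after substituting d = t*d': (1/d') * sum of mu(t)/t over t | (n/d')."""
    m = n // dp
    return Fraction(1, dp) * sum(Fraction(mobius(t), t)
                                 for t in range(1, m + 1) if m % t == 0)
```

For any n and any divisor d' of n, the two functions should return the same fraction.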
Next we want to relate this last sum to Euler's totient function, from the
theory of numbers. The well known evaluation of the Euler function in terms
of the prime factorization of an integer n is

$$\phi(n) = n \prod_{p \mid n} (1 - 1/p)$$

where the product is over prime divisors of n. If we multiply out all of the
factors of the product we get an enormous alternating sum of reciprocals of
various products of prime divisors of n. Those products run through precisely
the square-free divisors of n, i.e., those in which no prime factor is repeated,
and those are exactly the divisors of n on which the Möbius function is nonzero.
What that all boils down to is that

$$\phi(n)/n = \sum_{d \mid n} \mu(d)/d.$$
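
A small worked case may make the last identity concrete. Take n = 12, whose divisors are 1, 2, 3, 4, 6 and 12. The Möbius function vanishes on 4 and 12 (both contain the square 4), and $\phi(12) = 4$ because exactly 1, 5, 7 and 11 are coprime to 12. Hence

$$\frac{\phi(12)}{12} = \frac{4}{12} = \frac13, \qquad \sum_{d \mid 12} \frac{\mu(d)}{d} = 1 - \frac12 - \frac13 + \frac16 = \frac13.$$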

If we substitute into the inner sum that we've been fussing over, and then
substitute back in (2) we get the final final result that exactly

$$F(n,k) = (1/n) \sum_{d \mid n} \phi(n/d)\,k^d \tag{3}$$

different necklaces of n beads can be made out of beads of k colors.
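
Formula (3) is easy to compare against a direct count. The following Python sketch is illustrative only; the function names are assumptions, and the brute-force version identifies each rotation class by its lexicographically least rotation, a convention chosen here for convenience.

```python
from itertools import product
from math import gcd

def phi(m):
    """Euler's totient: the number of j in 1..m with gcd(j, m) = 1."""
    return sum(1 for j in range(1, m + 1) if gcd(j, m) == 1)

def necklaces_formula(n, k):
    """F(n, k) from formula (3): (1/n) * sum over d | n of phi(n/d) * k**d."""
    total = sum(phi(n // d) * k ** d for d in range(1, n + 1) if n % d == 0)
    return total // n

def necklaces_brute_force(n, k):
    """Count rotation classes of the k**n linear strings directly."""
    classes = {min(w[i:] + w[:i] for i in range(n))
               for w in product(range(k), repeat=n)}
    return len(classes)
```

For n = 4 and k = 2 the formula gives (2·2 + 1·4 + 1·16)/4 = 6, matching the six necklaces mentioned at the start of the section.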



There will be a brief pause while everybody lets n = 4 and k = 2 to see
that the formula really gives F(4, 2) = 6 . . . . . . . . . OK, now that that's out of the
way, let's try another simple case of the formula. This time, suppose n is a
prime number p. The virtue of that assumption is that primes have only two
divisors, so there are only two terms in the sum for F(n, k). After recalling
that $\phi(n) = n - 1$ if (and only if) n is prime, we discover that

$$F(p,k) = (k^p - k + pk)/p.$$

Now the exact number is not the thing that is most interesting here. It's
the fact that the right hand side is an integer. It has to be, because it's the
number of ways of doing something! Thus, the numerator, $k^p - k + pk$, must be
divisible by p, and therefore $k^p - k$ must be divisible by p, whatever the positive
integer k might be. Well that is exactly Fermat's Little Theorem, and we got a
proof of it as a spinoff from a combinatorial formula. The main point is that
counting formulas must give integer answers, and if the answers don't look like
integers then we may have discovered something interesting.
By the way, formula (3) must still be an integer even when n isn't prime,
but it sure doesn't look like it. I wonder what that might mean...
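
The divisibility claim for prime n can be spot-checked in a few lines. This is a hedged sketch rather than a proof: the helper names are assumptions, and the primality test is plain trial division, which is adequate only for small numbers.

```python
def is_prime(m):
    """Trial-division primality test, adequate for small m."""
    return m >= 2 and all(m % q for q in range(2, int(m ** 0.5) + 1))

def necklaces_prime_length(p, k):
    """F(p, k) = (k**p - k + p*k) / p for prime p, returned as an exact integer."""
    numerator = k ** p - k + p * k
    if numerator % p != 0:
        raise ArithmeticError("numerator is not divisible by p")
    return numerator // p

def fermat_holds(p, k):
    """True when k**p - k is divisible by p."""
    return (k ** p - k) % p == 0
```

The restriction to primes matters for the two-term form: with n = 4 and k = 2, the quantity 2^4 - 2 = 14 is not divisible by 4, so for composite n the full sum in formula (3) is needed.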

§30. Excerpts from class, November 18 [notes by PMR]

"No man but a blockhead ever wrote except for money. "

-Samuel Johnson, quoted in Boswell's Life of Samuel Johnson


April 5th, 1776

Today we got an entirely different perspective on the whole ball of wax. Don began his
fortnight's sabbatical by turning the stage over to one of Computer Science's most prolific
authors: Professor Jeff Ullman. A large crowd had gathered to hear Jeff's advice on "How
to get rich by writing books" -an illustration of one of the principles of cover design, he
said: Attract people with something that isn't in the book at all.

Jeff started by talking a bit about the pragmatics of publishing-how the money flows. He
kicked off with a back-of-an-envelope calculation. A book is a megabyte of text. Jeff can
write perhaps two or three kilobytes of first draft per hour-say one kilobyte per hour of
finished text. We can all train ourselves up to much the same performance, he asserted. So
it takes around a thousand hours of labour to write a book. Now then, a typical CS text
might sell for $40. A good book on a specialized topic, or a mediocre book on a general
topic, might well sell 1500 copies in the US and 500 copies abroad. (These figures put the
200,000 copies of Don's ACP sold in the USSR into some perspective.) A 15% royalty is
standard on domestic sales, a rather lower rate for foreign sales. All in all, our talented
specialist or so-so generalist can expect to net maybe $8000 over his book's lifetime of
perhaps five years. Of course, fame as well as fortune is to be gained through publication,
but Jeff dismissed such non-financial motivations as being beyond the scope of his talk.

"I told you to be a lawyer. Or a doctor," someone's mother was heard to whisper. But
Jeff forestalled a mass exodus to the GSB by going on to tell us how to make book-writing
a going concern. Firstly, he said, it's quite feasible to double the royalty rate. CS authors
have some leverage with publishers in that their books sell quite well-a publisher's costs
are very sublinear in the number of copies sold, so he can afford to pay a lot more for a
book that will sell 5000, instead of 2000, copies. What's more, a computer scientist often
keeps his publisher's costs down by preparing his own camera-ready copies. Jeff is happy
to tell you more about how to drive a hard bargain with your publisher-go and talk to
him about it!

Secondly, you need to aim for ten thousand domestic sales; say two thousand a year for five
years. That's 5-10% of the entire market in a topic like compilers or operating systems.
There's nothing off-the-wall about this, provided you find the right niche: Let yours be
the hardest book on the subject, or the easiest. Or the best. This wasn't so hard to do in
the early days of CS, when there was a big demand for textbooks but only a few authors;
it's certainly going to get harder as the field matures. If you're going for the big bucks,
advised Jeff, choose a young and booming field-biogenetics perhaps.

Increase your royalties and sales, and your efforts can net you as much as a medium-grade
hooker's: say $100 per hour. Top-notch computer scientists should aspire to no less.
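
Jeff's back-of-an-envelope figures can be replayed in a few lines of Python. This is only a rough sketch: the function names are invented here, and the last helper assumes the royalty is paid on the $40 list price, which the notes do not state.

```python
def hours_to_write(book_bytes=1_000_000, finished_bytes_per_hour=1_000):
    """A megabyte of text at one kilobyte of finished text per hour."""
    return book_bytes // finished_bytes_per_hour

def hourly_rate(lifetime_income_dollars, hours):
    """Lifetime income from the book divided by the hours spent writing it."""
    return lifetime_income_dollars / hours

def gross_royalty(list_price_dollars, royalty_percent, copies):
    """ASSUMPTION: royalty is a percentage of the list price (not stated in the notes)."""
    return list_price_dollars * royalty_percent * copies // 100

hours = hours_to_write()                                # 1000 hours per book
baseline = hourly_rate(8_000, hours)                    # the $8000 net scenario
improved = hourly_rate(gross_royalty(40, 30, 10_000), hours)  # doubled royalty, 10,000 sales
```

Under these assumptions the first scenario works out to $8 per hour, and the improved one to $120 per hour before any deductions, which is the same order of magnitude as the "say $100 per hour" quoted above.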

[66 §30. JEFF ULLMAN ON GETTING RICH]


A miscellany of tips
• Find a co-author or two. Co-authors won't save you any time, but they do help
filter out your idiosyncrasies. Jeff said that when he writes alone "my own craziness
takes over" and the book turns out a dud. He was just "too weird" in Principles
of Programming Systems-although not too weird for the Japanese, who continue to
buy it. His Database book went well, though probably because Chris Date's book
provided a framework and the necessary "dose of reality." Filtering out oddball stuff
has a big effect on quality. And since a textbook that is only marginally better than
the competition will nevertheless grab the lion's share of sales, any small improvement
is well worth having.
• Jeff never saw a book with too many examples. Use lots. Even a very simple example
will get three-quarters of an idea across. A page or two later you can refine it with
a complex example that illustrates all the "grubbies." But finding good examples-
examples that illustrate all and only the points you are concerned with-is not easy;
Jeff has no recipe. You must be prepared to spend a lot of time on it.
• Jeff endorsed Don's exhortation: "Put yourself in the reader's place!" If Mary-Claire
concurs, we may even be convinced.
• Spend the day reading about a topic, and write it up in the evening. That way, you'll
get the expository order right. You have an advantage over the experts because you
can still remember what was hard to learn.
• Jeff often sees a definition in Chapter 2 and its use in Chapter 5. This just isn't the
way readers work; it's essential to keep definitions and uses close together. Don't be
ashamed to repeat yourself if that's what it takes.
• Those who can, do; those who can't, teach; those who can't teach, show off. Remember
that the object of exposition is education, not showmanship.
• There is a tradeoff in using powerful mechanics to justify your methods; they may
be too opaque. Jeff had to decide whether to spend 20 pages teaching asymptotic
analysis in order to spend 5 pages applying its theorems, or whether just to say "It
can be shown that . . . " and refer his readers to another text. In the end he got around
the dilemma by doing only the most basic calculations and proving nothing deep. In
general, keep the level of your exposition down so that you can rely on your readers
understanding it.
A couple of tactical remarks:
State the types of your variables. Talk about " . . . the set S . . . " not about " . . . S
. . . ".
Jeff's English professor, now a leading poet, told him never to use the non-referential 'this'.
Recognizing the dearth of poetry in CS, Jeff now forbids his students to use it either. 90%
of the time it doesn't matter; the other 10% leaves your readers bewildered. One book
presents four ideas in a row and then says "This leads us to consider . . . ". What leads us
to consider?



Coping with the competition
Like it or not, book-writing is an increasingly competitive sport. But just because every
other introductory Pascal text starts with 'write' statements doesn't mean that yours has
to start with 'while', just to be different. Don't slavishly imitate another's style, but
don't avoid it either. Know the market, know thyself, and work out a compromise of your
own. Don't hesitate to follow the crowd when they are all going in the right direction.

This last remark brought Jeff ("I am not a lawyer") Ullman round to the tricky subject
of plagiarism. According to Prentice-Hall's Guide to Authors, imitation ceases to be the
sincerest form of flattery and becomes something much more culpable if a reasonable person
could not believe that you didn't have the other chap's book open in front of you as you
wrote yours. That said, remember that you can't copyright ideas as such, but only ways
of expressing them. Jeff shamelessly admits that his Compilers book borrowed another's
table of contents and the general front-to-back expository scheme.

Jeff showed us a suspicious case in which an author had written "Knuth has shown . . . "
and then went on to quote more-or-less verbatim from ACP. The coincidence of notation
is hardly conclusive, he said, but the identical use of italics is pretty damning.

Don here pointed out that his disciple had actually corrected a typo, for one sentence
was in fact the exact logical negation of the other. But this book contained much worse
examples of plagiarism: A dozen or so successive equations lifted straight from elsewhere.
In these notes, names have been suppressed to protect the guilty.

Someone asked about second and subsequent editions. Jeff said that these will still consume
a kilohour or so, although they'll go faster if you can use your earlier examples. But the
financial advantages are very real: People stop buying a book when it has been out for five
years, so publish a new edition and start the clock ticking again!

One person asked about writing survey papers-surely they will contain a lot of verbatim
quotes? There's no problem since the writer is not presenting the work as his own, Jeff
said. Besides, accusations of plagiarism hinge on financial loss, and no one writes technical
papers to make money. But be explicit in your quotation if you feel more comfortable
doing so.

Why don't expositions of CS make more use of analogy, asked someone, drawing an analogy
with physics texts (which are planted thick with analogy, metaphor, and simile). Jeff
thought it partly due to the nature of the subject, but encouraged us to use analogy where
we are sure that the reader will get the point.

Asked about progress on Parallel Computation, Jeff confessed that it may never be finished:
"That's another point about co-authors . . . ". Jeff left us, and Don, to reflect on his maxim:

"Never spend more than a year on anything."



§31. Excerpts from class, November 20 [notes by TLL]
Today's special guest lecturer was Leslie Lamport of DEC SRC. Leslie, sporting a Mama's
Barbeque T-shirt ("WALK IN - PIG OUT"), took the stage and gave us a very active
lecture. (He clearly believes in one of his own maxims: "You've got to be excited about
what you are doing.")

The first thing Leslie told us was that he would restrict his advice to the writing of papers
(not books). "I have one thing to say about writing a paper for publication: Don't. The
market is flooded. Why add to the detritus?" After the appropriate dramatic pause, he
continued with, "But seriously folks, somebody has to write papers."

While we are asking ourselves if our own papers are worth writing, Leslie asks that we
keep in mind two bad reasons for writing a paper:

The first bad reason is "to have a long publications list." Leslie says he would like to think
that the people who are supposed to be impressed by a long publications list would be
more impressed with quality than quantity. Admitting that this might not always be the
case, he appealed to our own sense of integrity to police us where others' standards do not.

The second bad reason is "to have a paper published in a specific conference." Leslie
has known people whose need to insert papers in specific proceedings is greater than their
need to disseminate accurate information. This approach "sometimes leads to pretty sloppy
papers." He told us that he knows of one case where the authors of a conference paper
promised to send a correction, once they figured it out, to each conference participant.

Leslie recognizes one good reason to publish a paper: "You have done something that you
are excited about."

Just how excited can you be and yet not publish a paper? Leslie was once told: "Judge an
artist not by the quality of what is framed and hanging on the walls, but by the quality of
what's in the wastebasket." Similarly, Leslie thinks that we should be judged on the "best
thing that we have done that we decided not to publish."

Moving on to how we learn to write well, Leslie told us that learning to write is more like
learning to play the piano than like learning to type. While both typing and piano-playing
involve motor skills, a good pianist must spend much time studying music in its entirety;
he must spend more time away from the piano than in front of it. Correspondingly, we
should learn to write by reading. Leslie paid homage to Halmos and Knuth, but said that
they cannot match Fowles and Eliot: We should read great literature in order to learn
how to write good mathematical literature.

We must know what we want to present before we can present it well. As Leslie said,
"Bad writing comes from bad thinking, and bad thinking never produces good writing."
We must keep in mind what we are writing-and to whom.

The question of audience is closely related to where a paper, once written, should be
published. Appropriate places may be a Tech Report, a letter, a Journal, or the bottom
drawer of your desk. (Don't really throw anything out: it is good to have the record, even
if you don't publish your work.) How do we choose?

[§31. LESLIE LAMPORT ON WRITING PAPERS 69]


Journal articles should be polished and timeless. Conference papers can be a little rougher.
Conference papers are appropriate for work that is "not yet ready for the archives." Technical
reports (usually distributed by an institution) are good for work that is not even
ready for the general world but still should be written up.
Leslie asks us to remember that "in each case, you still have readers. That tech report
may some day turn into a Journal article. You've got to be excited about your writing."
As for the central theme of a paper, Leslie told us that he enjoys the Elizabethan use of the
word "conceit" to denote a fanciful or cute idea around which a paper can be built. While
such an idea can be a good catalyst as we begin to write, we should be willing to abandon
it. After all, we use such metaphors or themes in order to present ideas-we should not
allow them to intrude. The line between what can be called a conceit and the merely cute
is a fine one. Beware of jokes. Just how funny will a joke be ten years after it was included
in a Journal paper?
While jokes should be left out, examples are welcome additions to most papers. Leslie
said, "It is better to have one solid example than to have a dry, abstract, academic paper."
He also said that it is never a mistake to have too simple an example ("at least not for a
lecture"). Demonstrating that "examples keep you honest," Leslie told us about a major
revision of one of his published theories upon discovering that his original draft of the
theory was not powerful enough to deal with the example that he wanted to use in his
paper.
Expressing concern that people often "fix the sentence and not the idea," Leslie told us
that we can be too concerned with details. For example, he tells us not to think about
formatting when we are writing. ("Don't think about format. Do think about structure.")
He suggests that whenever we have some detail, such as complex notation, we shouldn't
write it out: We should use a macro.
Leslie discussed trends in notation, showing us a translation of Newton's Principia
Mathematica. Newton stated his mathematical theorems in non-mathematical language that
was very difficult to read. Instead of saying that something is inversely proportional to the
square of the distance, we can get the point across better by saying that it is $c/d^2$.
Thus algebra has provided us with a tool for presenting the structure of a formula. But
can't we improve present practice by making the structure of an entire discourse more clear?
Leslie gave us a handout demonstrating two forms of a proof: a paragraph form and a form
that looks like the tabular proofs that high-school students produce in Plane Geometry
homework. (See §33 below.) Pointing out that the tabular proof is much easier to
read, Leslie cautioned us that he was not talking about formatting, but the structure that
the tabular form enforces. He says that writing proofs in such tabular "statement-reason"
forms will help us clarify proofs that are to be presented in paragraph style. (The flip side
of the handout also contains an example that Leslie did not have time to explain. The
example shows some "bloated prose" that Leslie trimmed down by half.)
In discussing writing itself, Leslie said, "You should be excited about what you are writing
and that excitement should show." Saying that this principle can especially be applied
to first sentences ("You want something that leaps out at you"), he read us several first
sentences from various compositions. The first sentence can be expected to be nontechnical
and to represent an author's best effort. He was pleased with some of the first sentences
from his own work and less pleased with others, but he was ecstatic about some of the
first sentences he read us by T. S. Eliot or Allen Ginsberg. Thus it might be a good idea
to ask ourselves: "What would T. S. have written, if he were writing this paper?"

What characterizes a good first sentence? Leslie says to "avoid passive wimpiness," but
to be simple and direct. "Get right down to business." Of course, once you have hit your
readers in the gut with your first sentence, you can't let them down with your second.
Continuing in this vein, by induction, "When you come to sentence number 2079, you've
got to keep socking it to them." (He illustrated this by reading an arresting sentence from
the middle of The Four Quartets by T. S. Eliot, choosing the sentence at random.)

Leslie finished his lecture by saying, "I am not T. S. Eliot. I need to pay more attention
to my writing. As do we all."

§32. How I changed my co-author's draft

Original draft:

In this section, we describe some of the highlights of the research area. We discuss
some of the most significant, elegant, and useful algorithms, and some corresponding
lower bound results. Since the literature in the area is vast and varied, we have found
the selection and organization of these results to be a formidable task. We have chosen
to simplify our task by restricting our attention to four major categories of results:
shared memory algorithms, distributed consensus algorithms, distributed network
algorithms and concurrency control. Each of these categories has a very rich research
literature of its own, and we think that together, they provide a representative picture
of work in the area. Still, our description is incomplete, since we neglect many other
interesting topics.

Revised version:

In this section, we discuss some of the most significant algorithms and lower bound
results. We restrict our attention to four major categories: shared memory algorithms,
distributed consensus algorithms, distributed network algorithms and concurrency
control. Although we are neglecting many interesting topics, these four areas provide
a representative picture of distributed computing.

[§32. LAMPORT'S HANDOUT ON UNNECESSARY PROSE 71]


§33. Toward structured proofs

Modified version of Corollary 3 from page 170 of Calculus by Michael Spivak:

Proposition. If $f'(x) > 0$ for all $x$ in an interval, then $f$ is increasing on
the interval.

PROOF: Let a and b be two points in the interval with a < b. We must
prove that f(a) < f(b).
By the Mean Value Theorem, there is some x in (a, b) with

$$f'(x) = \frac{f(b) - f(a)}{b - a}.$$

But by hypothesis $f'(x) > 0$ for all x in (a, b), so

$$\frac{f(b) - f(a)}{b - a} > 0.$$

Since b - a > 0 it follows that f(b) > f(a). ∎

PROOF: Let a and b be two points in the interval with a < b. We must
prove that f(a) < f(b).

   Statement                                        Reason
1. There exists x in (a, b) with                    1. The Mean Value Theorem.
   $f'(x) = \frac{f(b) - f(a)}{b - a}$
2. $f'(x) > 0$                                      2. By 1 and hypothesis.
3. $\frac{f(b) - f(a)}{b - a} > 0$                  3. By 1 and 2.
4. $b - a > 0$                                      4. By choice of a and b.
5. $f(b) > f(a)$                                    5. By 3 and 4.

[ 72 §33. LAMPORT'S HANDOUT ON STYLES OF PROOF ]


§34. Excerpts from class, November 23 [notes by PMR]
Nils Nilsson, latest in our line-up of megastar guest speakers, spoke on the subject of "Art
and Writing." He began by showing us two photographs: Edward Weston's print of a
snail-shell (strangely reminiscent of a human form), and Ansel Adams's "Aspens in New
Mexico." Having thus set the artistic mood, Nils went on to talk about what this has to
do with writing. Novels and plays are recognised as art; mathematical writing should also
qualify, he said. Writing can be both art and communication; indeed, real communication
happens only when writing is charged with artistic passion.
For Nils, a key word is Composition. Nils once took a course in photography from a teacher
who declared that:

Composition = Organisation + Simplification.

This formulation made a lasting impression on Nils. It applies equally to writing as to
photography. A quote from Edward Weston: "Composition is the strongest way of seeing."
A typical artistic phrase, said Nils, but what does it mean? Some might say that
Weston anticipated the findings of recent research in computer vision: The viewer must
participate, construct models, form hypotheses. There are no spectator sports! Likewise,
a photographer sees best when a scene is well-composed.
"Life is very nice, but it lacks form.
It is the aim of art to give it some."
- Jean Anouilh.
But like all art, said Nils, writing should be fun. Just as the painter takes pleasure in the
smell of his paints, so should the writer feel good when surrounded by the tools of his art:
paper, ink, typewriter, word-processor, whatever. He must feel a thrill, as Don does when
pin-pointing a reference. Another key word, then, is Joy.
But if writing is to be art, we must first master the craft. Only when our grasp of the
minutiae is perfect can we transcend technique and aspire to genius. Nils gave us a "broad
brush" overview of some important points, along with some autobiographical tales.
1. Start early. Impressionable minds are best. Some people find that writing becomes
a real compulsion; if this happens to you, then let the urge take over!
Way back in 1954 Nils took a Stanford course on "Scientific Writing." Writing an essay
or two a week, he learned to become clear and organised-and got an A- for his paper on
"Ionic Oscillations." Nils was pretty pleased, and thus began his career as a writer. In the
Air Force, he discovered a growing urge to write a book about radar; he realises now that
this was mainly a compulsion to get the material organised. By 1960 he had an outline of
the book, but it never saw the light of day; after leaving the Air Force he joined SRI and
got deeply involved with something else entirely (Neural Nets, as it happens ) .
2. Write, rewrite, rewrite, rewrite . . . . This dictum really is true, said Nils. It
is the extremely rare artist who does not need to labour over and over on his work.
Mozart was said to be an exception; his first draft was his final version. Beethoven,
on the other hand, rewrote his work over and over, and even then was never satisfied.
As someone once remarked: "A work of art is never completed, only abandoned."

[§34. NILS NILSSON ON ART AND WRITING 73]

A member of the class quoted Robert Heinlein as saying that a writer must resist the urge
to rewrite. (But then, Heinlein writes great thick books and pretty poor ones at that.
Likewise, Barbara Cartland is said to wander about the house dictating her novels into a
tape-recorder, whence they are transcribed and published. Of the literary qualities of her
work, the less said the better.)
"Easy writing makes damned hard reading." Nils couldn't remember the source of this
quote.* Hemingway put it rather more colourfully, which we blush to repeat here. Think
of your early drafts as being like an artist's sketches, urged Nils: Be prepared to throw
away nearly all of them. Neither are you done when the book is finished and on the shelves.
Maurice Karnaugh (inventor of Karnaugh Maps) wrote to Nils after Principles of Artificial
Intelligence was published and pointed out that the A* algorithm as Nils had defined it
would fail on a certain graph. This led to a correction in a subsequent edition.
Never let anything you write be published without having had others critique it. A university
is a good environment in which to get feedback on your work, though you may need
to give some thought to the timing of your requests for comments (unless you have infinite
resources of willing readers). Nils told us about the time that he thought he had a neat
result in non-monotonic reasoning and circumscription. He wrote it up and sent it to John
McCarthy, who passed it on to Vladimir Lifschitz, who discovered that Nils's derivation
"appeared to contain an oversight . . . ."

Nils always tries to teach a course on a topic at the time that he is writing it up-it's
ridiculous to inflict your ramblings on the world unless you are prepared to do this, he
said.
Nils decided that since he had the whole book on-line, he would take a crack at publishing it
himself. He and his wife Karen set up the Tioga Publishing Company. One big advantage
about this cottage-industry approach is the ease with which the author can make changes
in subsequent editions. Karen went on to become a full-time publisher; Tioga's theme has
now changed from AI to nature and the environment. So Nils considers himself pretty well
"vertically integrated" in the world of books.
3. Read. Read a great deal; it'll sharpen your style and get your critical faculties
working.
4. Model the Reader. Deja vu. This should be obvious, said Nils, but there's really
a lot to it. Ask yourself what the reader's primitives are, and write with them in mind.
In fact, the whole issue is so complex and important that Nils likes to operationalise it
with AI-type "daemons." Any number of these have to be running in the background
as you write, catching errors and providing constructive criticism. You have to be
asking all the time: "How is the reader going to misunderstand me here?" You must
automatically insert forests of guidelines to keep him on track. You develop these
daemons by practice-it's a kind of motor skill, like playing tennis or riding a bicycle.
A split infinitive should really jar, Nils said: "It's got to light up in red!" The daemons
have to run automatically; you can't be consciously checking a list of rules all the time.
Besides, if writing is to be fun, it can't be compulsive!

* "Easy reading is damned hard writing." -Nathaniel Hawthorne. (Just lucky to find
it -DEK.)

5. Master the Medium. You need a good vocabulary, though this needn't mean a
huge list of big words. There are issues other than pure language: indexes, tables,
graphs, and how to use them to best effect. As Don pointed out earlier, we can use
typography to make important distinctions, as with the typewriter font for logical
formulae.

In the future, said Nils, it's clear that reading and writing will be far more interactive
processes; the Media Lab is not all hype. It's not clear yet what will prove necessary or
useful; just as it took several centuries to invent the index, it will probably take us a long
time to identify the "stable points" offered by our new technology. We in the audience are
at the cutting edge of these experiments.

6. Master the Material. There's a lot of internal feedback involved in writing; one
comes to understand the material in a new way on trying to organise it for publication.
Nils drew this diagram:

[Diagram with the labels "Internal Model", "writing", and "Text"]

As Mary-Claire said on Wednesday, "How do I know what I mean until I hear what I say?"
Even Nils sometimes finds himself thinking "I don't believe that!" when he hears himself
lecture. I am reminded of the (true) story of a professor who was always seen to take a pad
of blank paper with him when he delivered a talk. When asked what it was for, he replied:
"Why, if I say anything good I'll want to write it down!" So go to lectures and classes,
give talks. All these things help modify your internal model and get things into shape.

"In a very real sense, the writer writes in order to teach himself, to understand
himself; the publishing of his ideas, though it brings gratifications, is a curious
anticlimax."
- Alfred Kazin

7. Simplify. Lie, if it helps. You can add the correct details later on, but it is essential
to present the reader with something straightforward to start off with. So don't be
afraid to bend the facts initially where this leads to a useful simplification and then
pay back the debt to truth later by gradual elaborations.

"Another noteworthy characteristic of this manual is that it doesn't always tell
the truth."
- Don Knuth, The TeXbook (page vii)

"Everything should always be made as simple as possible, but not simpler."
- Albert Einstein

Ted Shortliffe did a great job with Mycin, Nils said. But with 20/20 hindsight he might
have done better to invent a simplified system for expository purposes. For example,
he could have demonstrated the backward-chaining techniques and only later dealt with
"certainty factors."

By using simple examples we can get ourselves on the winning side of the 80-20 rule: we
can convey 80% of the truth with only 20% of the difficulty. Mathematicians, of course,
like to go the other way: They never state a theorem in three dimensions if it can be
generalised to n. Such terse elegance can be painful for the reader.

8. Avoid Recycling. With on-line text and sophisticated editors (I refer to software,
not the mandarins behind Scientific American) it is very tempting to re-use portions
of old material. Resist the temptation. Almost certainly you are writing in a new
context, with a new emphasis. Hopefully you are older and wiser, and perhaps even a
better writer than you were when the old material was written. So do rewrite it; it's
worth the extra effort.

9. Aim for Excellence. You've got to keep shooting for perfection, even if you'll never
get there. What the Great have said on this:

"We are all apprentices in a craft where no-one ever becomes a master."
- Ernest Hemingway

"Someday I'll build the perfect birch-bark canoe."
- John McPhee

"Someday I'll write the perfect AI textbook."
- Nils Nilsson

"Ah, but a man's reach should exceed his grasp, or what's a Heaven for?"
- Robert Browning

"The message of these books is that, here in the 80s, 'good' is no longer good
enough. In today's business environment, 'good' is a word we use to describe
an employee whom we are about to transfer to a urinal-storage facility in the
Aleutian Islands. What we want, in our 80s business executive, is somebody
who demands the best in everything; somebody who is never satisfied; somebody
who, if he had been in charge of decorating the Sistine Chapel, would have said:
"That is a good fresco, Michelangelo, but I want a better fresco, and I want it by
tomorrow morning."
- Dave Barry

§35. Excerpts from class, November 25 [notes by TLL]

Don opened class by introducing guest lecturer Mary-Claire van Leunen and by giving us
the title of her talk: "Calisthenics." Mary-Claire opened her talk by telling us a story.
Many years ago Mary-Claire was a frequent passenger on the Chicago bus system. The
neighborhood where she boarded her #5 bus was a gathering spot for "bummy guys."
All of these guys were interested in money: Some begged, others peddled. Among the
peddlers (hawking wares ranging from trenchcoats full of watches to freedom from the
peddler's presence) was a man whom Mary-Claire patronized quite regularly. He sold
pencil stubs (obviously collected from trash bins); but Mary-Claire said his patter was
charming enough to rate one or two purchases a week.
"These pencils are magic pencils," he would say. "Buy a magic pencil. Only 25
cents."
"What's a magic pencil?" would come the expected response.
"With this pencil, you can write the truth."
Inevitably, someone would pipe up, "But I can write lies with it."
"Oh, you can break the magic. But if you really believe, you can write the truth."
Mary-Claire sees this as the wonder and the motivation behind the craft of writing: If
you work hard, you can explain a new truth to someone you will never meet, perhaps to
someone who will live after you are dead.
Such a vocation requires preparation. The Composition Exercises that Mary-Claire has
given us (see §36 below) were designed to help us become as strong as we can. Our readers
are more likely to be tolerant of a few weaknesses if they are surrounded and supported
by strength.
Mary-Claire has given these exercises to students before, but preparing this draft for
our class pushed her to really write the exercises. The copy that she referred to over
the TV monitors was slightly different from the copies that we have been given.
Mary-Claire, hoping that these differences represent improvements, invites us to suggest
further improvements to the draft. (She says that she might publish something that evolves
from this draft, but probably not soon: She is not a fast writer.)
The first set of exercises, labeled "Vocabulary," is designed to increase our command of
just that.
The first of the pair is an exercise that was done by little Greek boys: Taking a composition
and swapping all the old words (nouns, verbs, adjectives, and adverbs) for new ones. What
is the effect of these changes? What happens when a vulgarism is used? When a hoity-toity
word is used?

[§35. MARY-CLAIRE VAN LEUNEN ON CALISTHENICS (1) 77]

The second vocabulary exercise, the writing of a thesaurus entry, is best done over a
week. After several days of slowly adding to our set of synonyms, we should compare
our entry with an entry in our thesaurus. (Everyone needs at least one thesaurus and a
good unabridged dictionary. In addition to more than one kind of dictionary, Mary-Claire
recommends Sidney Landau's book Dictionaries to help us understand how to best use our
dictionaries.)

"Syntax," the next set of exercises, deals with syntactic mastery. Mary-Claire says even
though vocabulary improvement is more often considered than increasing our command
of syntax, improving our syntactic armory is more important. She says that most of the
time we will use our basic three to four thousand words; we must use them in the most
interesting way possible.

Speaking of using words in interesting ways, Mary-Claire has been reading the first drafts
of our term papers. There must be room for some improvement there: Her first comment
was, "Nobody sits down to write a boring paper." How can we tell when something we
write is "syntactically impoverished"? She gathered some statistics that might help us get
the right idea.

One of her tricks was to study the first 10 complete sentences on the third page of every
paper. First she charted the average length of the 10 sentences: The averages varied from
15.6 to 24.4 words per sentence. Mary-Claire says that any of us with averages under
20 words per sentence are in the correct range for adult writing. (But the writer with the
24.4 average had better have pretty wonderful results, to compensate for the extra work
that it takes to read his paper.)

Sheer variation in sentence length is one indication of syntactic variation and appropriate
pacing. With 10 sentences we should be aiming for 9 or 10 different lengths. The samples
from our papers yielded 6 to 9 different lengths. The difference between the word count
of the shortest sentence and that of the longest varied from 17 words to 37
words. The ideal chart of sentence lengths should look like a bell-curve centered around
15 to 18 words per sentence.
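
The tallies described above are easy to reproduce on our own drafts. The following Python sketch is only an illustration, not Mary-Claire's actual procedure: the function names `sentence_lengths` and `summarize` are made up for this example, and the sentence splitter is deliberately naive (it breaks after a period, exclamation mark, or question mark followed by whitespace, so abbreviations such as "e.g." will fool it).

```python
import re


def sentence_lengths(text, limit=10):
    """Return the word counts of the first `limit` sentences in `text`.

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences[:limit]]


def summarize(lengths):
    """Compute the three statistics mentioned in the notes."""
    return {
        # Mean number of words per sentence.
        "average": sum(lengths) / len(lengths),
        # How many different sentence lengths occur in the sample.
        "distinct": len(set(lengths)),
        # Longest sentence minus shortest sentence, in words.
        "spread": max(lengths) - min(lengths),
    }


if __name__ == "__main__":
    sample = (
        "Short one. This sentence has five words. Why? "
        "Because variety in length keeps readers awake."
    )
    print(summarize(sentence_lengths(sample)))
```

Run on a sample of ten sentences, the three numbers can be compared with the ranges reported in class: averages from 15.6 to 24.4 words, 6 to 9 distinct lengths, and spreads of 17 to 37 words.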

She asks us to note that we did not have enough short ("and punchy") sentences. A few
long sentences are also important. She said, "A well constructed 46-word sentence is not
a difficult beast, but it had better not be your crucial point." We should remember
that we have a responsibility to emphasize and deemphasize our points to the reader; long
sentences are one method of deemphasizing a point.

Beyond the word counts, she looked at the templates used to construct our sentences.
For example, she found two writers who would appear to be similar if we just looked
at their sentence length average and variation, but who had quite different methods of
constructing their sentences. One of these writers used the same sentence construction for
almost every sentence (adverbial + subject + transitive-verb + object), and the other
used many different styles of construction. But the second writer was not free of flaws. He
had two sentences in a row with a full independent clause followed by a full parenthetical
independent clause. Mary-Claire says that we must learn what syntactic usage is unusual
so that we do not overuse it.
One syntactic trick not normally thought unusual was startlingly absent from our papers:
tight parallels. The use of two adjacent sentences with exactly the same construction is
an effective way to communicate similarity to our readers. Mary-Claire is curious how we
could all avoid this technique. Perhaps it is an artifact of the way that students of our
generation were taught?
Given this motivation to "increase our syntactic muscle," Mary-Claire led us back to
discussing the "Syntax" exercises. She passed over the first two exercises as obvious, but
a few comments were made on "periodic" sentences.
We who have had mathematical training might be tempted to guess that a periodic sentence
is one that repeats cyclically, but we would be wrong. Periodic sentences are those whose
grammatical and physical ends coincide: We must get adverbials out of the final position.
For example, a verb that is intransitive must end the sentence. Period.
Periodic sentences are not really appropriate in our kind of writing; they are a high literary
form. Even though such a sentence form is more frequently encountered in church than in
conference papers, the use of periodic sentences will heighten our awareness that we can
control sentence structure.
The next exercise has us recast a sentence so as to change emphasis. What are the emphatic
positions? The front of the sentence and the back. ("The middle of a sentence is sort of a
slum.") But she says not to take her word for it; we should write sentences with varying
emphasis and find out for ourselves.
Mary-Claire says that the last syntactic exercise is "incredibly wonderful": Write nonsense.
Write a completely unrelated stream of thoughts with the correct glue: Words like "thus,"
"therefore," and "as we can see." She says this is a fun exercise to do after a couple of
drinks. (Maybe we need a class lab?)
Moving on to exercises labeled "Manual labor," Mary-Claire told us that these should
logically come first, but she wanted to woo us with the logical stuff before we ran into the
weird stuff. Why is it important to use different methods to copy other people's writing?
Because writing (and even reading) is partly a manual process. Mary-Claire typed out
large sections of our papers as she was analyzing them. She said that if you tie a baby's
hands behind his back, but give him otherwise adequate mental stimulation, he will not
learn to speak well.
When we want to read a passage of text seriously, such manual labor can help us slow our
brains down until we can give the passage the consideration it deserves. (W. H. Auden
said that the proper way to show contempt for a poem is to copy it on a typewriter; the
proper way to show admiration is to copy it in longhand.) Memorization and recitation
can also help us to be able to read word by word. ("Make yourself into a book that you
can take to prison should worse come to worst.")
Mary-Claire is aware that we may not buy this "manual labor" technique at first, but she
asks us to take it on faith. She took the technique on faith for ten years and then wrote a

Most writers are aware how important the manual part of composition is: They have very
rigid restrictions on how they compose. ("Oh, I can only write on yellow pads with a
fountain pen.") Mary-Claire says we should be able to compose on a cocktail napkin.

While discussing the section labeled "Frozen sounds," Mary-Claire told us about reading
aloud to her students their own writing. Some students were chagrined; others glowed.
She says we should form partnerships with other novice writers: Read and listen to each
other. But she cautions that a little goes a long way. If the writing is good, we can live on
that joy for quite a while; if the writing is bad, we won't be able to stand it for very long.

At this point in the lecture, Mary-Claire noticed that very few minutes remained. So her
comments on the final exercises were limited to those that she thought were the most
important.

Concerning the "Marks on paper" exercises, Mary-Claire quoted from E. M. Forster: "How
do I know what I mean till I see what I say?" We need to remember that writing is "the
most forgiving medium known to man." We can work on it until we get it right.

Rushing past the "Stance, voice, and tone" section, she told us that she borrowed
techniques from speech therapists, who ask patients to exaggerate their defects until they
understand just exactly what characterizes their defects. For example, she says, "If
anyone has ever told you that you are 'breezy,' write something truly off the wall."

She told us that the sections labeled "Observation," "Same as and different from," and
"Invention" are less important for us than for pure writers. Our discipline provides the
glue that writers with more freedom have to manufacture from scratch.

She warns us that the "Scansion" exercises are hard, but very important. She realizes that
she may have trouble convincing us that we need to write verse in order to learn to write
mathematics, but once again she says, "Trust me."

She reminded us that the "Precis" exercises were touched upon by Leslie Lamport in his
talk. At some point we cannot reduce the word count of a piece of prose without changing
the structure of that prose. (We should never change the meaning, but we will have
to dispense with some details.) This point comes at different percentages of decrease,
depending on the flabbiness of the original text.

The final exercises she discussed, "Nearly real," are aptly named. They really are very
much like real writing. For instance, Mary-Claire says that "Writing a joke is exposition
at its purest. Things aren't funny unless they are well written."

She suggests that we try "Ben Franklin's exercise," rewriting a passage of someone else's
from memory and limited written hints, but that we try it with Don's writing. When we
have finished, what do we like better about Don's version? What do we like better about
our own?

Before the cameraman could shoo us out of the room, Mary-Claire reminded us once again
that these exercises are "very hard work." She closed with, "I hope they will serve you as
well as they have served me."

Some of us surely hope the same.

Composition Exercises

Draft

Mary-Claire van Leunen

Unless you plan to do nothing else but composition exercises, there are enough here to last you for the
next decade. I had a wonderful student who did nearly all of them in a year, but he really did do nothing
else. Some of the exercises are quite deep, and you might easily be able to do them again and again in
different guises for the rest of your life. I've done all of them myself.

Many of these exercises tell you to take a passage of such-and-such a length and work some kind of
transformation on it. Whose passage should it be, your own or someone else's? Either; or rather, both.
You can learn different things by doing the exercise different ways. If a piece of your own writing is still
fresh enough in your mind so that you can remember what problems you were trying to solve as you
wrote it, the experience of working completely arbitrary changes on it can be exhilarating, not unlike
setting dollar bills on fire.

[§36. MARY-CLAIRE'S HANDOUT ON COMPOSITION EXERCISES 81]

Vocabulary:

1. Replace.

Take a passage five pages long and replace at least three words in every sentence with others
that mean approximately the same thing. (- "Get into your hands a 1500-word portion of a
written work and swap out of every sentence a minimum of three words, substituting others
without changing what's being said." - "Select a longish section from something you've
been reading and change the vocabulary of every sentence without changing the
signification.")

2. Multiply.

Choose a word and write a thesaurus entry for it - all the words at every level of diction that
mean approximately the same thing. Compare your entry to the entries in which the word
actually appears in some real thesaurus.

Most writers like to have several good desk dictionaries and at least one good thesaurus. In addition,
you might like a book by Sidney Landau called Dictionaries. It has helped me understand how
dictionaries and thesauruses get made and thus how to use them better.

Syntax:

1. Transform.

Take a passage of five pages and transform every sentence so that it says approximately the
same thing in different syntax. Change the vocabulary as little as possible. (- "Taking a
five-page passage, transform every sentence to say an approximation of the same thing in
different syntax." - "Can you transform every sentence in a passage of five pages so that
approximately the same thing gets said in different syntax?")

2. Build tight parallels.

Write a sentence containing a tight parallel: a pair of structures that match perfectly in the
number and kind of all their parts and subordinate structures. Push yourself till you can
construct sentences that contain tight parallels in which each member is fifteen or twenty
words long.

3. Be periodic.

Rewrite a non-periodic sentence or pair of sentences as a periodic sentence. (A periodic
sentence is one whose grammatical and physical ends coincide - one with no adverbials to
the right of the predicate. The first sentence in this paragraph is non-periodic; a periodic
version might read: "Rewrite as a periodic sentence a non-periodic sentence or pair of
sentences.") Write an entire paragraph in periodic sentences; an entire page.

4. Emphasize.

Recast a sentence so as to express the same meaning but emphasize a different point.

5. Write nonsense.

Write a paragraph of coherent, tightly structured nonsense - all the connectives and labels in
place but no meaning. (Made-up words not allowed.)

Manual labor:

1. Copy I.

Copy out a passage of your own writing or someone else's with a pen; with a pencil; with a
crayon; first with your left hand and then with your right; with a manual typewriter; with a
word-processor.

2. Copy II.

Copy out a passage from something you like; from something you dislike; from something
you find difficult to read; from something you find laughably easy; from someone you'd like
to imitate; from someone completely unlike you; from something written a hundred years
ago; from something written last year; from something scrawled off in haste; from something
overwritten and finicky.

The first of these copying exercises is exploratory and interesting, and I certainly recommend that you do
it, but the second is in another class altogether. Copying as a means of close reading is an inexhaustible
source of information. Word-processing has temporarily confused writers about the connection between
their hands and their brains. What you do with your hands is the easy part. Use it to support the hard
part, which is what you do with your brain.

Frozen sounds:

1. Transcribe.

Make a tape recording of five minutes of radio news and transcribe it. Transcribe five
minutes of a public lecture; five minutes of dialog from a television show; five minutes of
ordinary conversation among three or four people. Talk extemporaneously into a tape
recorder for five minutes and transcribe that.

2. Listen.

Read aloud a page of your own writing. Ask a friend of yours to read it aloud. Ask a second
friend. Ask a stranger.

Learn to mutter aloud what you're writing as you write it. It's only a minor eccentricity, and there's no
more efficient way of checking for both cadence and tone.

Marks on paper:

1. Close your eyes.

Compose a paragraph with your eyes shut. Start over again from the beginning on a fresh
piece of paper as often as you like, but don't peek.

2. Tabulate.

Write a sentence with two conditionals ("If it rains and if Peter arrives on time ..."); rewrite it
as a table. Write a paragraph with several linked conditionals; rewrite it as a table.

3. Caption.

Describe a picture in a single line that fits under the picture exactly; in two lines that fit
under the picture exactly.

Stance, voice, and tone:

1. Change stance.

Rewrite a textbook explanation as a personal letter to an intelligent child - a beloved niece or
nephew, for instance. Write a description of this morning's events as a letter to your spouse
or lover; rewrite the description as a letter to an old high-school teacher of yours; rewrite it
yet again as a report to an examining psychiatrist; to an anthropologist; to a police inspector;
to a reporter from People magazine.

2. Take both sides.

Write a vigorous, closely reasoned argument for some small household economy like
re-using plastic bags or turning mattresses; now write a vigorous, closely reasoned argument
against it.

3. Hyperbolize.

Describe your current dwelling as an unscrupulous realtor would; describe your most recent
meal in restaurant-menu prose; describe an object on your desk as if it were for sale by mail
order.

4. Euphemize.

Describe the symptoms of severe gastroenteritis accurately but without recourse to vivid
language. Describe human sexual intercourse in the diction of a knowledgeable prude.
Describe an employee's forced resignation for incompetence in language that attempts to
leave no opening for a libel suit.

5. Obfuscate.

Take a passage of simple prose and rewrite it so that the same ideas seem obscure and
difficult.

6. Pontificate.

Take a straightforward passage written in the first person and rewrite it so as to make the
author seem pompous and self-important.

7. Vacillate.

Take an argumentative passage and inject it with doubts, quibbles, and hesitations.

8. Strengthen; vitiate.

Find a weak, flabby paragraph and rewrite it, inventing ideas and details where necessary, to
make it vigorous and strong. Now do the reverse: Find a strong paragraph and weaken it.

9. Change tone.

Look through a magazine or a newspaper for a sarcastic letter to the editor; rewrite it to
make the same point but without the sarcasm. Find a short factual piece; rewrite it as the
preamble to a petition asking for some action on the facts.

Observation:

1. Expand and contract.

Write a paragraph describing some small event of the last day - fixing your breakfast, or
catching the bus, or buying a newspaper. Expand the description to five pages. Now cut it
back to a paragraph again and compare the new paragraph to the old one.

2. Rethink.

Describe a favorite food by its appearance alone; describe only the sounds in the opening
credits for a movie; categorize and describe the objects on your desk by texture; by color.

3. Sensualize.

Choose an object and describe it by sight; by sound; by smell; by taste; by touch. Choose an
event from your daily life and describe it as a sensory experience.

4. Louis Agassiz's exercise.

Put a green leaf or a flower on a plate and describe it every day for two weeks. (The original
version used a fish and took two months.)

Same as and different from:

1. Compare.

Choose two unlike things and build a simile capturing some point of similarity between
them. Choose two similar things and explain how they differ.

2. Analogize.

Invent an extended analogy that would help an illiterate understand what a library is good
for; that would help a child understand getting fired from a job; that would help a
city-dweller understand the agricultural year.

3. Differentiate.

Choose a ten- or fifteen-word entry in a thesaurus and explain how the words differ from
one another.

Invention:

1. Combine words.

Choose two words at random from a dictionary and write a sentence that uses both of them;
choose three words at random and do the same.

2. Categorize.

Take twenty nouns at random from a dictionary and arrange them in categories. Write an
explanation of your scheme for arranging them.

Usually we don't need to do pure invention; we start from something, even if it's only "What I Did on My
Summer Vacation." Much of learning a discipline is learning how to do invention - how to recognize the
kinds of ideas that make that discipline go forward and how to get yourself into position to have such
ideas yourself.

Another part of learning a discipline is learning what you don't have to invent because it's already been
done for you. The forms for taking advantage of that backlog of ideas vary from one discipline to
another, but the underlying habits of thought are similar. The best set of exercises I have ever seen on
those habits is at the end of the section called "External Aids to Invention" in Edward P. J. Corbett's
Classical Rhetoric for the Modern Student.

Scansion:

1. Versify.

Render a newspaper story in couplets of iambic tetrameter; in triplets of dactylic hexameter.
Render a recipe into rhymed vers libre. Render an expository passage as a ballad. Render a
short argumentative passage as a villanelle.

2. Sonnetize.

Write a new sonnet every day for a week. (Be sure to throw these sonnets away.)

3. Explode.

Take a piece of metric verse and expand every line by one foot without altering the meaning
- from tetrameter to pentameter, for instance, or from pentameter to hexameter.

Verse-writing is to other composition exercises what lifting hundred-pound weights is to touching your
toes. But you must honor conventional rhyme and stress in order to get the benefit of writing verse;
otherwise you'll cheat yourself by writing near misses. For help on words like "villanelle" and
"hexameter," get a prosody handbook; get John Hollander's, and you'll find yourself reading it for fun.

Precis:

1. Reduce.

Choose a passage and count the number of words in it. Reduce them by 5% without
changing the meaning; by a quarter; by half.

2. Abstract.

Describe in no more than ten sentences the content of an article; of a book.

Nearly real:

1. Flip.

Take a paragraph and rewrite it so that the last sentence comes first and the first sentence
comes last. The middle will have to be completely rewritten, but try to change the first and
last sentences as little as possible.

2. Repace.

Rewrite an ordinary paragraph so as to enforce a leisurely, ruminative pace on the reader;
now rewrite it the other way, to make it seem unusually quick and light and tripping.

3. Crunch.

Write a 300-word description of some concrete physical object without using any adjectives
or adverbs; write a thousand-word description without any.

4. Unpack.

Take a metaphorical passage in either verse or prose and rewrite it as a series of flat-footed
prose comparisons.

5. Define.

Write dictionary definitions for a common word like "hand" or "mean" or "find." Compare
your definitions to those in several dictionaries.

6. Exemplify.

Choose an abstract noun like "precocity" or "fortitude" and describe three or more instances
of it. Impose an order on the instances and explain the order.

7. Explain.

Rewrite a simple sweater pattern to meet the needs of someone who has never knitted.
Rewrite a chocolate-cake recipe for someone who has never cooked. Rewrite directions on
how to set ignition points for someone who has never driven a car.

8. Instruct.

Write a set of instructions on how to draw some fairly complicated object without even
naming it or any of its parts - a Christmas tree with ornaments and a star on top, for
instance, or a house with a chimney, windows, doors, and foundation plantings.

Try your instructions out on a friend.

9. Write a joke.

So go ahead, write a joke.

10. Translate.

Buy a book in a language you don't know and a bilingual dictionary for the language.
Translate passages.

11. Ben Franklin's exercise.

Take a passage of someone else's, three or four pages long, and reduce it to a set of one- and
two-word hints to yourself about the contents, each written on a separate piece of paper.
Jumble the hints, put them in a box, and take them out again after three weeks. Arrange
them and reconstruct the passage. Compare your reconstruction to the original.

12. Push.

Write sentences at a deliberate pace for five minutes without repeating yourself, without
writing nonsense, and without stopping. Increase the time gradually till you can do this
exercise for twenty minutes.

Tropes:

In addition to doing all these exercises, my student and I also worked our way through Richard
Lanham's Handlist of Rhetorical Terms, writing an example for every rhetorical figure listed.
Contrary to what I had expected, writing an example for every figure in Lanham turned out to be
quite shallow. I believe that merely thinking about our doing it will give you every bit as much
benefit as doing it yourself.

Books Mentioned
23 November 1987

Benjamin Franklin.
The Autobiography.

Edward P. J. Corbett.
Classical Rhetoric for the Modern Student.
Oxford University Press, second edition 1971.

John Hollander.
Rhyme's Reason: A Guide to English Verse.
Yale University Press, 1981.

Sidney I. Landau.
Dictionaries: The Art and Craft of Lexicography.
Charles Scribner's Sons, 1984.

Richard A. Lanham.
A Handlist of Rhetorical Terms: A Guide for Students of English Literature.
University of California Press, 1969.

§37. Excerpts from class, November 30 [notes by PMR]

During the whole of a dull, dark, and soundless day in the autumn of the year,
when the clouds hung oppressively low in the heavens, I had been passing alone,
on horseback, through a singularly dreary tract of country; and at length found
myself, as the shades of evening drew on, within view of the melancholy Terman
Engineering building.
-E. A. Poe (amended)
Don, like Mary-Claire, scans the pages of The New Yorker for choice malapropisms to
entertain us. In its columns the law firm of Choate, Hall, & Stewart had been rendered
as Choate, Hall, Ampersand, and Stewart, presumably by a journalist receiving dictation
over the telephone.
We also saw a splendid dangling participle from the same source:
"Flavor and texture of cooked okra are different from other vegetables. We usually
don't eat it raw, but in judging at fairs, I frequently taste a slice of a pod to check
maturity and condition. In soups, it is used as a thickening agent. When fried, I
love okra."
[When sober, can't stand the stuff. -The New Yorker]
Don announced that he had good news and bad news for us. He gave us the good news
first. Mary-Claire is to speak again on Wednesday. Also, Don finally got up the courage
to ask Paul Halmos to appear in our guest spot; he readily agreed and will speak next
Wednesday (9th December). This talk should be a fitting climax to the course. And a
week from today (Monday, 7th December) we will hear from Rosalie Sterner, a copy-editor
for The San Francisco Chronicle.
Having thus softened us up with these cheerful tidings, Don delivered the Bad News: The
first drafts of the term papers were, well . . . "their content was not one hundred percent
pleasing to your instructor." What makes a professor's life worthwhile? The knowledge
that he has succeeded in teaching something. In particular, there's a joy in the thought that
a student was able to do something that he couldn't have managed without the professor's
help. Don confessed that this joy did not run through him as he read our drafts. Indeed,
he could almost think that many of them were written before the class started. Have we
been relaxing too much, he wondered? Has our writing in fact changed at all? Have we
learnt nothing? Disturbing thoughts, he said.
Of the thirteen papers submitted, eleven were sprinkled with wicked whiches, at least two
in each. Don himself has been guilty of these in his time, and of course there is no-one like
a convert for rooting out heresy. But these are the 80s and we are supposed to be sensitised
to these things. And heaven knows, we've talked about this issue enough in class! So what
is he to think about this landslide of carelessness? Shaking his head, Don declared that
we left him with no alternative; he would have to resort to the ultimate sanction: a quiz.
In keeping with Honor Code protocols, Don left the room while we each wrote a sentence
that used a 'that' correctly where a 'which' would have been wrong, and another

[§37. COMMENTS ON STUDENT WORK 89]


complementary sentence, which used a 'which' correctly where a 'that' would have been
incorrect. A minute passed. And then another minute. There was little to hear but the
scratching of pencils and the beating of hearts.

Don returned. We spent a few minutes looking at the various examples that the class had
come up with, some correct and some incorrect. By and large, the class redeemed itself by
the creative solutions that were submitted:

All the students that know when to use 'which' and 'that' will pass the quiz. The
exam, which took place at the beginning of class, was not difficult.

A paper that uses two whiches improperly does not demonstrate that the author
hasn't learned anything. My first draft, which was written this summer, had a
million of them.

Beware of examples that are misleading. My term paper, which contains many
wicked whiches, is otherwise not too bad.

CS-types just love self-reference, it seems.

Is it not true, TLL asked of Mary-Claire, that people invariably get their whiches and
thats right when they speak? Mary-Claire replied that people almost never say 'which'
improperly in general speech; it's only when they feel under pressure that they resort to
this unnatural diction. So unnecessary use of 'which' really conveys a bad tone in your
writing; it makes you sound nervous. (Conversely, on paper we can often fool our audience
into thinking that we are a lot more comfortable than we really are.)

Don observed that all translations of the Bible are strewn with erroneous whiches. ("Thou
shalt not suffer a wicked which to live," he might have said.) A clamour of voices pointed
out that Fowler is quite clear about the rule. True, but it was never enforced until the late
70s, Don countered. It seems particularly strange, he said, that the New English Bible
should commit this error, as its editors take great pride in the literary qualities of their
text. Mary-Claire resolved this mystery: Apparently our "oldest and closest allies" on that
far-off island regard this whole issue as unmitigated nonsense!

Don made a final plea to us: "You all keep your text online, so it's very very easy to locate
all your whiches and check them. Please don't cause your instructor any more pain on this
score!"
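
Don's plea lends itself to a small script. The following is only a minimal sketch, not anything used in the class: it lists every 'which' in a text and flags those that are not preceded by a comma, a preposition, or the end of a sentence as candidates for 'that'. The function name `find_whiches`, the preposition list, and the flagging heuristic are all assumptions made for illustration; a flagged 'which' still needs a human ear.

```python
import re

# Prepositions that commonly introduce a legitimate "which" ("in which", "of which").
# This list is an illustrative assumption, not taken from the class notes.
PREPOSITIONS = {"in", "of", "to", "for", "on", "by", "at", "from",
                "with", "over", "during", "under", "through"}

WHICH = re.compile(r"\bwhich\b", re.IGNORECASE)   # does not match "whiches"
LAST_WORD = re.compile(r"([A-Za-z']+)$")          # word ending right before the match


def find_whiches(text):
    """Return (line, column, suspicious) for every 'which' in text.

    'suspicious' is True when the word is not set off by punctuation and
    does not follow a preposition, so it may be a 'that' in disguise.
    """
    hits = []
    for m in WHICH.finditer(text):
        start = m.start()
        line = text.count("\n", 0, start) + 1
        col = start - (text.rfind("\n", 0, start) + 1) + 1
        before = text[:start].rstrip()            # also looks across line breaks
        word = LAST_WORD.search(before)
        set_off = before == "" or before[-1] in ",;(.?!:"
        after_prep = word is not None and word.group(1).lower() in PREPOSITIONS
        hits.append((line, col, not (set_off or after_prep)))
    return hits
```

Running `find_whiches` over a draft and reading only the entries marked suspicious gives a short checklist; the final decision for each one is still a matter of whether 'that' sounds right in its place.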

Sneaky Don had saved one more item of good news to lighten our spirits after this depressing
interlude: A letter from Leonard Gillman, editor of the Sierpinski proof over which we
had laboured many moons ago. Professor Gillman was fulsome in his praise of our suggestions,
and is now working on an improved write-up. Particular credit went to Student B,
of course. Gillman is an Emeritus Professor as of three months ago; Don drools to think
of all the free time he must have.

We moved on. Don claimed to have discovered a new (?) rule only by seeing it broken in
three of the papers he read. It is this: The text should make sense if we read through it
omitting the titles of subsections. So, for example, don't say:

2. Contour Integration. This technique, invented by Cauchy, is used . . .



Rather, say:
2. Contour Integration. The technique of integrating along curves in the
complex plane, invented by Cauchy, is used . . .
The point is that the subheading should not be referred to explicitly or regarded as an
"integral" part of the text. Think of it as some kind of marginal note or meta-level
comment.
We spent the rest of the time looking at the draft of a paper about graph theory written
by Ramsey Haddad and Alex Schäffer. Firstly, Don pointed to their careful attention to
definitions. This is particularly important in graph theory as various authors use terms
inconsistently. A path may or may not be the same thing as a simple path. At least
one writer uses the term walk to make a distinction here. So it is necessary to define one's
graph theory terms clearly right at the start, even the most basic ones. Remarkably, there
was a time when the symbol '=' was not in general use. Fermat never used it, preferring
always to write 'æq' or 'adæq' or fuller Latin words like 'adæquabantur' that these terms
abbreviate. So in those days you would have to define the symbol '=' at the beginning of
your article if you intended to use it. (The equals sign was invented by Robert Recorde
in his Whetstone of Witte, 1557, but it did not come into general use until more than a
hundred years later. Descartes used '=' to mean something completely different.) The
moral: Ask yourself what background your readers share, and what they may or may not
have in common. "Be aware of what's diverse in your readership."
We saw a somewhat intimidating multi-part definition. It would become less formidable
to the reader if shortened. In this particular case, the expression

could have been condensed to

since W2 is used nowhere else. (In the Haddad-Schäffer paper, '*' means 'zero or more'
and '+' means 'one or more', but let's not worry about that here.) Try to be succinct,
said Don: "Less is more."
It is important to be consistent in your use of terms, and you need to be especially careful
about this when working with co-authors. In this paper, one writer talked about 'dominators'
and the other about 'parents', referring to the same concept. (Freudian slip?)
A related issue: Don't define terms that you never use. Don recalled Feynman's complaint
about New Maths: you are taught the symbols ∩ and ∪ in second grade, but you don't
use them in any nontrivial way for seven years.
Next came a tricky question of tenses. "Gabow and Tarjan[Gab83] show that for many
algorithms that had such a multiplicative factor in their worst-case complexities, the
multiplicative term can be removed." Here 'had' should be 'have'; an algorithm lives forever,
and its worst-case complexity is a timeless fact about it. However, the problem solved
by an algorithm can have different known complexities at different times; therefore 'had'



would be okay if 'algorithms' were 'problems'. (The quoted sentence also exhibits other
anomalies. A 'multiplicative factor' is not also a 'multiplicative term'; factors are
multiplied, terms are added. Also the logic of the sentence can be unwound to make the point
clearer: "Gabow and Tarjan have shown how to improve the algorithms by removing such
a multiplicative factor from the worst-case complexities in many cases [Gab83].")

We talked about abbreviations for bibliographic references. Don didn't like the lack of
space before the bracket in ". . . Tarjan[Gab83] . . ."; neither does he like this kind of thing:
"In [Smith 80] it was shown . . .". References should ideally be parenthetical; we should be
able to read the sentence ignoring them and still have it make sense (cf. subheadings). Some
citation styles write up names and dates in full, but this can get repetitious: ". . . Knuth
[Knuth83] has shown that . . .". Don's paper on goto's was published first in ACM Computing
Surveys and later incorporated into a book. For this second printing he had to
make numerous changes to the sentences containing citations, because the originals would
look strange in the different context and format of the book. Oren Patashnik pointed out
that the Chicago Manual of Style recommends that you don't number references, lest you
have to make changes all through the text every time you insert a new one. This is less of
an issue when a system like TeX handles such things automatically, of course. The CMS
is full of such efficiency tips.

Too many commas can be a bad thing (bad things?). For example, consider this sentence:
"Our algorithm to recognize and label the graphs when given a directed graph, G, with
distinguished vertex s, can be summarized as follows." Remove the commas around 'G'
and put one after 'graphs'. As a rough guide, put a comma where a speaker would pause
to draw breath.

The word "loop" was ambiguous when first used; Don replaced it by "self-loop".

A sentence in the paper began "If any H_j (j > 0) has . . .". In fact it was known that H_0
satisfies the stated condition, so Don suggested that the authors simplify the statement by
omitting the j > 0 condition. Moral: Give a simple rule rather than an optimal one.

Elsewhere we saw ". . . all the H_j's . . .". This is of course the standard way to form
the plural of a symbol, but you are going to get into trouble when you start also using
the construct H_j' (that is, H_j primed). A simple way to avoid the problem is just to
say: ". . . each H_j is . . .". Alternatively, you might want to invent another name for the
concept, particularly if you are going to be using it time and time again. It's just not
elegant to have too many symbols crowded on the page. At one point the authors wrote
". . . of H_j s descendants . . .". This doesn't work at all; you do need an apostrophe for the
genitive (possessive) case.

Three small points:

Instead of ". . . the one vertex path . . .", write ". . . the one-vertex path . . .".

The preposition 'at' would be better than 'of' in ". . . vertex of distance < d . . .".

We certainly need a space here: ". . . using 4.3(below) we derive . . .".

Some authors have a disconcerting habit of using a lemma or theorem that is not proved
until later on in the book. This can leave the reader wondering whether someone hasn't



pulled a fast one on him (essentially by using a result to prove that same result). So make
it quite clear to the reader that your proof structures do respect the necessary partial
ordering.
Using ties: TeX and other systems allow you to specify that certain blanks are not to
be used as line breaks. For example, put ties after the word 'dominates' in the phrase
'v dominates x and x dominates w'. In ". . . if e is . . ." it is best to put such a tie
between e and 'is'. The idea is to keep line breaks from interrupting or distorting the
message.
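
As a rough illustration of the first example, the sketch below inserts TeX ties (written `~`) mechanically. It assumes that the symbols are typeset in math mode as `$v$`, `$x$`, and so on; the function name `add_ties` and its `words` parameter are hypothetical, and the notes do not describe any such tool.

```python
import re


def add_ties(tex, words=("dominates",)):
    """Replace the blank between each listed word and a following
    math symbol ($...$) with a TeX tie, so no line break can fall there."""
    alternatives = "|".join(re.escape(w) for w in words)
    # Match the word, then whitespace, only when a '$' comes next.
    pattern = re.compile(r"\b(" + alternatives + r")\s+(?=\$)")
    return pattern.sub(r"\1~", tex)
```

Applied to `$v$ dominates $x$ and $x$ dominates $w$`, this ties each 'dominates' to the symbol after it and leaves ordinary prose untouched. The second case in the notes (a tie between e and 'is') runs in the other direction and would need its own pattern; in practice most authors simply type the ties by hand as they write.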
Beware of the unfortunate coincidence! Sometimes we cannot use an idiom because some
word is also being used in another sense. For example: ". . . n vertices have been deleted
by this point." In one of the term papers, someone was using contour integration to study
aerodynamics. There were airplanes and complex planes all over the place, much to Don's
confusion. Another example that came up was the word 'left'. This can be either left as
against right (in a tree structure, say), or a past tense of the verb 'to leave'. So 'the node x
is left' might be ambiguous.
In a final remark for today, Don suggested that a sequence of examples that build upon
one another is much more useful than a number of unrelated ones. The paper by Haddad
and Schäffer has a particularly nice sequence of illustrations demonstrating this point.
After class, everyone got back two independently annotated copies of their term papers.

§38. Excerpts from class, December 2 [notes by TLL]


Don welcomed Mary-Claire van Leunen to her encore lecture by pointing out the intriguing
books that Mary-Claire had placed on the desk; he said that he hoped that she could now
tell us all the things that the clock had prevented her from telling us last lecture.
Mary-Claire countered by saying that Don had only invited her back "on the theory that no one
could possibly be that nervous two weeks in a row."
Leaving the books alone for the moment, Mary-Claire told us the tale of which and that.
The story opens in the 17th century, when speakers of English have two relative pronouns:
which and that. What are relative pronouns? Here are some sentences (examples taken
from the Concise Oxford Dictionary) where which and that are used as relative pronouns,
both singular and plural:
Our Father which art in Heaven . . .
These are the ones which I want to learn.
. . . the one that I mean . . .
These are the ones that I want to learn.
Which and that are not always used as relative pronouns. Here are some sentences (again
from the Concise Oxford) where they serve other functions (along with the technical term
for the function they are serving) :
Which? Say which. (interrogative pronoun)
Which one? Say which one. (interrogative adjective)

[§38. MARY-CLAIRE VAN LEUNEN ON WHICH VS. THAT 93]
. . . during which time we can . . . (relative adjective)
I like that. (demonstrative pronoun)
I like that thing. (demonstrative adjective)
. . . not all that wonderful . . . (adverb)
. . . no doubt that he can . . . (subordinating conjunction)
O that we could! (particle)

We have no spoken evidence from the 17th century, but Language Theorists believe that
writing and speech were very far apart. That is, they believe that no one's ideal was
to write the way that he sounded. Theorists cite two pieces of evidence to support this
claim: The first is that the Theorists themselves find it difficult to believe that, in the last
three centuries, spoken English has evolved as fast as it must have if the written language
and the spoken language originally matched. The second piece of evidence comes from
examining extant 17th century guidelines on writing or speaking effectively.

By examining samples of writing from the 19th century (particularly the everyday writing
that was used for communication rather than as examples of great literature), we can see
that the written language has evolved into one much closer to the spoken language. Language
Theoreticians of that time said that this evolution was good, but their admonitions
came after the direction of evolution was already evident. (We should remember that our
language belongs to millions of people. It cannot be controlled by the decrees of any one
person or group.) We now move on to our own century.

In 1906 H. W. Fowler and F. G. Fowler published The King's English (Oxford University
Press still has it in print). Here the brothers Fowler write down for the first time that
conversational rhythms are to be reflected in written English.

In 1926 H. W. Fowler published Modern English Usage (also still available from Oxford
University Press). In this book the surviving brother continues the explanation of the
relationship between spoken and written English, but he does so much more clearly.

While we are following the hot trail of our current subject, we should not lose sight of the
vast range of the contributions that Fowler made in this landmark book. Mary-Claire calls
Fowler the "great theoretician of the semicolon." Fowler saw the semicolon, which has no
spoken equivalent, as a structuring device that operates between the levels of the sentence
and the paragraph. This is just one example of how Fowler tried to utilize the graphical
nature of print to the advantage of written English.

Returning to the evolution of written English toward spoken English, let's examine how
people use which and that when they talk.

Speakers do not use which as a relative pronoun because speakers do not normally express
thoughts that are long enough to contain non-restrictive clauses: our spoken sentences
are shorter than our written ones. People do use which when they talk, but they use
non-referential whiches to introduce new thoughts that are tacked on to old thoughts.
Examples of this kind of usage seem strange when written down (because we don't use
non-referential whiches in written English), but they sound perfectly normal when heard



on the street. Here is one:

I went sailing this weekend; which tells you why my nose is pink.

Fowler realized that written English would sound more like speech if the choice of relative
pronoun was uniquely determined by whether or not the clause it introduced was restrictive
or non-restrictive. He wrote several thousand words on this subject; here are a few of them:

A supposed, and misleading, distinction is that 'that' is the colloquial and 'which'
the literary relative. That is a false inference from an actual but misinterpreted
fact. It is a fact that the proportion of 'that's to 'which's is far higher in speech
than in writing; but the reason is not that the spoken 'that's are properly converted
into written 'which's. It is that the kind of clause properly begun with
'which' is rare in speech with its short detached sentences, but very common in
the more complex and continuous structure of writing, while the kind properly
begun with 'that' is equally necessary in both. This false inference, however,
tends to verify itself by persuading the writers who follow rules of thumb actually
to change the original 'that' of their thoughts into a 'which' for presentation in
print.

The two kinds of relative clause, to one of which 'that' and to the other of which
'which' is appropriate, are the defining and the non-defining; and if writers would
agree to regard 'that' as the defining relative pronoun, and 'which' as the non-defining,
there would be much gain both in lucidity and in ease. Some there
are who follow this principle now; but it would be idle to pretend that it is the
practice either of most or of the best writers.

There is no doubt that Fowler has had a significant influence on the English language, but
why is it that his effect on American English has been greater than on British English?
To answer that question, we move our focus to New York in the year 1925: Harold Ross
has just founded the New Yorker magazine.

Ross was a man who liked things to be clearly defined. He took Modern English Usage as
gospel. For decades the New Yorker had reliably influential prose, and for decades H. W.
Fowler's dictums were applied blindly to that prose. Mary-Claire was nearly nonplused as
she mentioned reading a collection of letters from a New Yorker editor to various literary
luminaries. ("I'm sorry, but we had to change all your whiches to thats," sounds rather
presumptuous when addressed to John Updike.) The New Yorker no longer treats Fowler
as divinely inspired, and they haven't since the 1950s, but that leaves close to three decades
of blind obedience to consider.

According to Mary-Claire, Harold Ross's attachment to obedience is not unusual, for
Americans. She says that Americans look in a reference work, see what it says, and then
decide to obey or to disobey; Britishers look in the same reference, see what it says, and
then formulate new and different ways of treating the same questions. Why is it that
Britishers feel that they are entitled to an opinion but Americans do not? Two partial
answers might combine to give us a single satisfactory one.



Most British people have been English speakers for generations, but most Americans are
descendants of recent immigrants. Immigrants are told, "These are the facts. If you want
to speak English, follow the rules." Perhaps more important, educated British people are
taught to write from day one. Many of the exercises that Mary-Claire gave us in her lecture
on Calisthenics are actually used in British Grammar Schools. British University students
discuss their weekly writing with their tutors, and they regularly write about 2000 words
a week.

In 1957, we Americans acquired a new source of authority on writing English, this time
American English. An old classmate of E. B. White's sent him a copy of the book that they
had received from their English professor, Will Strunk. The decision-makers at Macmillan
decided to publish a book that contained Strunk's monograph plus an extra chapter by
White on "spiritual things." The combination of Strunk's clear and simple instructions
and White's beautiful prose made The Elements of Style, by William Strunk, Jr., and
E. B. White, the landmark of written style for our generation. Here is their entire essay
on which and that:

"That" is the defining or restrictive pronoun, "which" the non-defining or
non-restrictive. See under Rule 3.

The lawn mower that is broken is in the garage. (Tells which one.)

The lawn mower, which is broken, is in the garage. (Adds a fact about
the only mower in question.)

Rule 3 says "Enclose parenthetic expressions between commas."

Mary-Claire has a copy of the first edition of The Elements of Style, in which White uses
a 'which' for a 'that' (this has been changed in later editions). The line originally read:

. . . a coinage of his own which he felt was similar to . . .

The Elements of Style has many departures from guidelines presented by Fowler. It was
written for the American audience, and it was written for an audience without a high level
of grammatical sophistication. In contrast, as Mary-Claire said, "Fowler is rough going
for those of us whose Latin is weak and whose Greek is non-existent." Future editions of
Fowler may need prefaces explaining what adjectives, adverbs, and the like are. It is most
common for people to learn those terms when they learn their first non-native language
(though Latin is the only language to which the terms are perfectly suited).

Fortunately for native English speakers, there is a rule completely lacking in jargon that
we can use to determine whether a 'which' should be a 'that':

If you can substitute 'that' for 'which', do it.

Mary-Claire attributed this rule to Leslie Lamport. Leslie says that his version of the rule
is actually:

If it sounds all right to replace a 'which' by a 'that', then Strunk & White say
replace it.



Which brings us to our next issue: Are whiches that could be thats always wrong? Don said
that now that he knows the rules, he finds every "wicked which" an irritating distraction
from his reading enjoyment. He seemed to imply that whiches in restrictive clauses are
always wrong.
Mary-Claire said that the rules given hold for "everyday, expository prose: shirtsleeve
prose, not literary prose." (She did not tell us how to decide when the everyday rules
should be violated.) As for Don's discomfort on finding whiches in well-beloved authors,
she said "I believe we are encountering obedience here." After class, Leslie Lamport had
this to say on the same subject:

I unfortunately have somewhat the same reaction as Don to "incorrect" uses of
which, for which I curse the evil influence of Strunk & White. When I observed
that writers such as Dickens and Fowles are "incorrect," I quickly lost my desire
to be "correct." However, I can't completely unlearn the reflex of being bothered
by the "incorrect" usage.
I still try to use thats when Strunk & White tells me to, because I know that
many of my readers will have been similarly indoctrinated. But I will throw in
an occasional wicked which to avoid a string of thats.

Mary-Claire's final word on the subject:

"Which and that are not in themselves very important. But tone is important,
and tone consists entirely of making these tiny, tiny choices. If you make enough
of them wrong (choices like which versus that), then you won't get your
maximum readership. The reader who has to read the stuff will go on reading
it, but with less attention, less commitment than you want. And the reader who
doesn't have to read will stop."



§39. Excerpts from class, December 2 (continued) [notes by TLL]
During the final moments of class, Mary-Claire finally got to those intriguing books. They
were worth waiting for.

First, she reminded us of "Franklin's Exercise" from her previous lecture. She read us a
passage from The Autobiography of Benjamin Franklin where Franklin mentions it:

A question was once, somehow or other, started between Collins and me, of the
propriety of educating the female sex in learning, and their abilities for study.
He was of opinion that it was improper, and that they were naturally unequal to
it. I took the contrary side, perhaps a little for dispute's sake. He was naturally
more eloquent, had a ready plenty of words; and sometimes, as I thought, bore
me down more by his fluency than by the strength of his reasons. As we parted
without settling the point, and were not to see one another again for some time,
I sat down to put my arguments in writing, which I copied fair and sent to him.
He answered, and I replied. Three or four letters of a side had passed, when
my father happened to find my papers and read them. Without entering into
the discussion, he took occasion to talk to me about the manner of my writing;
observed that, though I had the advantage of my antagonist in correct spelling
and pointing (which I ow'd to the printing-house) , I fell far short in elegance of
expression, in method and in perspicuity, of which he convinced me by several
instances. I saw the justice of his remarks, and thence grew more attentive to the
manner in writing, and determined to endeavor an improvement.

About this time I met with an odd volume of the Spectator. It was the third.
I had never before seen any of them. I bought it, read it over and over, and was
much delighted with it. I thought the writing excellent, and wished, if possible,
to imitate it. With this view I took some of the papers, and, making short hints
of the sentiment in each sentence, laid them by a few days, and then, without
looking at the book, try'd to compleat the papers again, by expressing each
hinted sentiment at length, and as fully as it had been expressed before, in any
suitable words that should come to hand. Then I compared my Spectator with
the original, discovered some of my faults, and corrected them. But I found I
wanted a stock of words, or a readiness in recollecting and using them, which I
thought I should have acquired before that time if I had gone on making verses;
since the continual occasion for words of the same import, but of different length,
to suit the measure, or of different sound for rhyme, would have laid me under
a constant necessity of searching for variety, and also have tended to fix that
variety in my mind, and make me master of it. Therefore I took some of the tales
and turned them into verse; and, after a time, when I had pretty well forgotten
the prose, turned them back again. I also sometimes jumbled my collections of
hints into confusion, and after some weeks endeavored to reduce them into the
best order, before I began to form the full sentences and compleat the paper.
This was to teach me method in the arrangement of thoughts. By comparing my
work afterwards with the original, I discovered many faults and amended them;
but I sometimes had the pleasure of fancying that, in certain particulars of small

[98 §39. MARY-CLAIRE VAN LEUNEN ON CALISTHENICS (2)]


import, I had been lucky enough to improve the method or the language, and this
encouraged me to think I might possibly in time come to be a tolerable English
writer, of which I was extreamly ambitious.

Next, she reminded us of the verse-writing exercises that she had so highly recommended
during the previous lecture. She showed us the book that she and her student, Steven
Astrachan, had worked through together: A Prosody Handbook, by Shapiro and Beum.
She said what would have been even better was Rhyme's Reason by John Hollander.
Mary-Claire said we should all buy Hollander's book, and then she tried to make sure we
would by playing on the Computer Scientist's love of self-reference. This is Hollander's
description of one particular poetic form:

This form with two refrains in parallel?
(Just watch the opening and the third line.)
The repetitions build the villanelle.

The subject established, it can swell
across the poet-architect's design:
This form with two refrains in parallel

Must never make them jingle like a bell,
Tuneful but empty, boring and benign;
The repetitions build the villanelle

By moving out beyond the tercet's cell
(Though having two lone rhyme-sounds can confine
This form). With two refrains in parallel

A poem can find its way into a hell
Of ingenuity to redesign
the repetitions. Build the villanelle

Till it has told the tale it has to tell;
Then two refrains will finally intertwine.
This form with two refrains in parallel
The repetitions build: The Villanelle.

Mary-Claire told us that she once wrote out a recipe for making bagels in Alexandrine
couplets. It was a good exercise, and it was hard. She says that it was so hard that she
actually began to believe that the results would be intelligible (and interesting) to someone
else. She sent the recipe off to a food magazine and received "a truly astounded letter of
rejection." She cautioned us again that the verse exercises, useful as they are, "really are
only exercises."

The final book that she showed us was A Handlist of Rhetorical Terms, by Richard A.
Lanham. She said Lanham is the source for many of the great words she dazzles people
with. Some of the terms in Lanham's book are more useful than others; there are some
terms in the book that can only be represented in Greek syllabic verse.

Mary-Claire and Steven wrote out examples of each term in Lanham's book. Having
performed the exercise, Mary-Claire confidently told us that it was not profitable. She
said that her warning us not to try exercises that won't do us any good proves that she
isn't totally crazy, and that the exercises that she did give us are worth doing.

§40. Excerpts from class, December 4 [notes by PMR]

"All the officer patients in the ward were forced to censor letters written by all
the enlisted-men patients, who were kept in residence in wards of their own. It
was a monotonous job, and Yossarian was disappointed to learn that the lives of
enlisted men were only slightly more interesting than the lives of officers. After
the first day he had no curiosity at all. To break the monotony, he invented
games. Death to all modifiers, he declared one day, and out of every letter that
passed through his hands went every adverb and every adjective. The next day he
made war on articles. He reached a much higher plane of creativity the following
day when he blacked everything in the letters but a, an and the. That erected
more dynamic intralinear tensions, he felt, and in just about every case left a
message far more universal."
- from Catch-22 by Joseph Heller
Don rewarded today's early birds with the chance to participate in a referendum. We
voted to decide the due-date for the term papers, Monday 14th or Wednesday 16th. UN
observers were not surprised to find the latter date was favoured by the populace; the only
surprise was that the vote was not quite unanimous. Very well then, said Don: All papers
to be handed to him, his secretary or TA's, by 5pm (Pacific Standard Time) on Wednesday
16th December. (The real early birds were rewarded with some cookies that Sherry was
handing round. And very good they were too.)
It was of course too much to hope that we could get through the whole of a CS class
without computers rearing their ugly heads; today they did. Don's topic was computer
programs that are supposed to help us with our writing. Two such, style and diction,
are available on Navajo (a CSD Unix machine). These are relatively old programs.
State-of-the-art systems cost a lot of money, and so naturally Stanford doesn't have them. There
is a program called sexist, for example, which attempts to alert us to controversial word
usage. Don recalled the occasion when the (London) Times quoted him as saying that it
wasn't appropriate to talk about 'mother and daughter' nodes in a tree structure, and he
received a lot of irate mail as a result. People seem to be less uptight about such things
these days, he said.
The style program takes a piece of text and scores it according to 'readability'. The
analysis is very superficial, way below the level of human critiquing. However, said Don,
these programs are kind of fun. And they do provide an excuse to read the document
from another point of view. Even if the analysis is wrong it does prompt you to re-read
your prose, and this has to be a good thing. Don recalled Richard Feynman's anecdote
about his first day at Oak Ridge Laboratories: Having no idea what he was supposed to
be doing, Feynman pointed to a random symbol in the blueprints and said, "What about
this then?" A technician immediately agreed that Feynman had spotted a significant and
potentially dangerous oversight in the design.

[100 §40. COMPUTER AIDS TO WRITING]


To illustrate the programs, Don had run them on a dozen or so sample texts. For instance,
he used a passage from the rather ponderous introduction to a book by Alonzo Church;
samples of PMR's and TLL's notes for CS 209; versions of his own exposition of binomial
coefficients, vintage 1965 and 1985; Wuthering Heights; Grimm's Fairy Tales; and part
of a book about the Bible that Don is writing on weekends. The style routine produces
four different readability grades for any piece of text. Each is literally a "grade" in that
it indicates what level of education the piece suggests. The basis of the grading is very
straightforward; it's a linear formula whose variables are the average number of syllables
(or letters) per word and the average number of words per sentence (or sometimes the
reciprocal of this value). For example, there are constants α, β, γ such that

grade = α (words/sentence) + β (syllables/word) + γ.

How were α, β, and γ determined? The authors of each readability index simply look at
a large number of pieces of writing and assign them a grade-level 'by eye'; that is, they
estimate the age of the intended reader. Each piece of text is then characterised by three
real numbers: the average number of words per sentence, the average number of syllables
per word, and the subjective grade level. So each piece determines a single point in 3-space
(plotted against three orthogonal axes); the set of pieces determines a scatter of points in
3-space. Standard linear regression techniques are used to find the plane that is the "best
fit" for these points. The three parameters above define this plane.
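The fitting step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name fit_readability_plane and the sample numbers are invented for the example, and nothing here comes from the style program or from any published readability index. Each sample is a triple (words per sentence, syllables per word, grade assigned by eye), and the function returns least-squares values of α, β, γ by solving the normal equations.

```python
def fit_readability_plane(samples):
    """Least-squares fit of grade = a*wps + b*spw + c.

    samples: list of (words_per_sentence, syllables_per_word, grade).
    Returns the tuple (a, b, c).
    """
    # Each sample contributes one row [wps, spw, 1] of the design matrix.
    rows = [(w, s, 1.0) for w, s, _ in samples]
    grades = [g for _, _, g in samples]

    # Normal equations: (X^T X) p = X^T y, a 3x3 linear system.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * g for r, g in zip(rows, grades)) for i in range(3)]

    # Solve by Gauss-Jordan elimination with partial pivoting.
    M = [A[i] + [rhs[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("samples do not determine a unique plane")
        for r in range(3):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return tuple(M[i][3] / M[i][i] for i in range(3))


# Hypothetical data: the "by eye" grades are generated from made-up
# constants (0.4, 12, -16) purely so the fit has something to recover.
MEASUREMENTS = [(10, 1.2), (20, 1.5), (15, 1.3), (25, 1.8), (8, 1.1)]
SAMPLES = [(w, s, 0.4 * w + 12.0 * s - 16.0) for w, s in MEASUREMENTS]

alpha, beta, gamma = fit_readability_plane(SAMPLES)
```

Because the invented grades lie exactly on a plane, the fit should return values very close to the constants that generated them; real by-eye grades would scatter around the plane instead, and the fitted constants would only summarise the trend.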
Someone asked whether we should be shooting for some specific grade level, and if so, what
level? Don replied that his usual aim is to minimise the level, although overdoing this will
defeat the purpose.
In addition to the raw scores, a variety of other parameters come out of a style analysis:
average length of sentences, percentage of sentences that are much shorter or longer than
the average, percentage of sentences that begin with various parts of speech, etc. The
program also attempts to classify sentences into types and tabulate their frequencies, as well
as telling us the percentages of nouns, adjectives, verbs (active or passive), etc. A sentence
is considered "passive" if a passive verb appears in it anywhere, even in a subclause.
Curiously, style classifies any sentence that begins 'It . . . ' or 'There . . . ' as an "expletive."
This seems a little strange to those of us who are old enough to remember Watergate. We
always thought that it was quite a different class of words that the transcribers of Tricky
Dicky's tapes felt the need to delete.
Don's theological piece stood out as being pitched at a significantly lower grade level than
the other specimens. He was initially surprised by this, and double checked the data
to make sure there was no mistake. But on reflection he concluded that we usually write
more obscurely when writing about our own field. The two versions of his binomial chapter
had very similar scores, despite their having been written twenty years apart. Church's
piece scored high. Don said that the statistics were misleading here; although Church's
sentences are quite long, they are not ugly but musical. Still they were not a special joy
for the reader.
The style output also noted a lot of passive voice in Church (perhaps not surprising in a
technical work) and a paucity of adjectives in Grimm's Fairy Tales. Don noted that Mark
Twain didn't think much of adjectives either.

A companion program called diction operates on different lines. It has an internal dictionary
of 450 words and phrases that it deems 'questionable' and flags them, inviting the
writer to find an alternative way to express himself. For example, diction doesn't like
the word 'gratuitous', and flags its use as an error. Neither does it like the phrases 'number
of' or 'due to'. Don noted that copy editors generally prefer 'because of' to 'due to'
in ordinary writing, and perhaps diction is overlooking the mathematical usage: "This
theorem, due to Cauchy, is used . . .". In Don's book TeX: The Program, the copy-editor
changed all Don's 'due to's to 'owing to'; Don changed them all back again. But he
searched unsuccessfully for a reference to the mathematical usage in his dictionaries, so
he wondered aloud if he was completely out of line with the rest of the world. The class
unanimously reassured him that 'due to' was quite the elegant way to give credit for a
scientific innovation. Lexicographers are out of touch here.

The word 'very' is also on diction's list of suspects. Don recalled that someone had once
advised him thus: "Try changing all your 'very's to 'damn's and see what results. Don't
use 'very' unless you would happily use 'damn' in its place." Damn good advice!

The diction filter also objected to 'literally' and 'in fact', but partially redeemed itself by
catching a wicked 'which'. A sister program, explain, expands on diction's objections
and recommends improvements. For example, explain suggests that we write 'if' instead
of 'assuming that', and 'really' instead of 'actually'. In practice, users reportedly accept
about 50% of diction's suggestions. And that's as it should be: we've got to keep these
machines in their place.
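
The kind of lookup that diction performs can be imitated with a short Python sketch. The assumptions are loud ones: the phrase list below contains only the handful of entries these notes mention, not the program's real dictionary of 450, and the function name flag_phrases is hypothetical. The sketch simply reports where each listed phrase occurs, ignoring case.

```python
import re

# Phrases these notes report diction as flagging; the real list is far longer.
QUESTIONABLE = ["gratuitous", "number of", "due to", "very", "literally", "in fact"]


def flag_phrases(text, phrases=QUESTIONABLE):
    """Return sorted (offset, matched text) pairs for each flagged phrase."""
    hits = []
    for phrase in phrases:
        # \b keeps 'very' from matching inside a longer word such as 'every'.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.group(0)))
    return sorted(hits)


hits = flag_phrases("This theorem, due to Cauchy, is very useful.")
```

A filter this crude cannot tell the mathematical 'due to' from the ordinary one, which is exactly the weakness Don pointed out; the writer still has to decide which flags to accept.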

§41. Excerpts from class, December 7 [notes by PMR]

Today we heard from our penultimate guest speaker, Rosalie Stemer. Rosalie is a wire
features editor at the San Francisco Chronicle, teaches copy editing at Berkeley, and has
worked as a copy editor for the San Francisco Chronicle, the Kansas City Star, and Chicago
Daily News. So she wields an ultimate pen.

It's a sad truth, Rosalie said, that people who should be able to write well often can't. She
illustrated with a newspaper headline:

DISABLED FLY TO SEE CARTER

and a story that began: "Doing what he loved best, golf pro John Smith died while . . .".
She told us about the occasion when a newspaper was having trouble fitting the word
'psychiatrist' into a headline, and resolved the problem simply by writing 'dentist' instead.

Rosalie went through a story filed by an experienced journalist, pointing out its good and
bad features and the changes she had made.

"Nine out of ten books bought in this century by the U.S. Library of Congress, one
of the great research libraries of the world, will self-destruct in 30 to 50 years."

[102 §41. ROSALIE STEMER ON COPY EDITING]


She faulted this sentence on a number of counts. The 'great research libraries' phrase
puts the wrong focus on the sentence--we are not really concerned about the status of the
Library of Congress in this article. 'Nine of ten . . . ' would be better, she said; the word
'out' is superfluous. And does the Library of Congress buy the books that it houses? No.
Publishers give books to the Library of Congress, as required by law.
The second sentence noted that the problem is plaguing fine book collectors, among others.
What does 'fine' modify here, Rosalie asked: the books or their collectors?
"Yet many books several hundred years old are in excellent condition, Dr. Norman
Shaffer, the congressional library's director of preservation, said yesterday."
Rosalie thought the subject and verb too far apart. Moreover, she said, it raises the
question: "Why are they in excellent condition?"
Other points: ". . . the cheaper process of making their products . . .": cheaper than what?
"Another solution lies in persuading . . .": a solution to what? And who should be doing
the persuading? Rosalie saw a systematic error here. A hallmark of good writing is that
it answers more questions than it raises, she said.
Someone asked whether reporters perhaps write a little sloppily in the knowledge that
the copy-editor will go over their copy and clean it up? Rosalie said that they certainly
shouldn't do this. Someone else pointed out that it's probably a bad idea to start talking
about solutions (to problems) in the same breath as chemical solutions. This is the
"unfortunate coincidence" problem that Don talked about recently.
Rosalie shuddered over an extremely awkward sentence about acids and alkalis; fortunately
someone in the class was able to decipher and explain it. "One hopeful sign . . ." was
another problem: the sign is not full of hope. Over and over again we saw sentences in
which the subject and verb are separated by many words; these are hard to read. Rosalie
pointed out a number of places in which whole phrases could be dropped without any loss
of meaning: "Dr. Shaffer said one of the most encouraging signs is the fact he has heard
one of the largest paper manufacturers . . ." can be reduced to "Dr. Shaffer said one of
the largest paper manufacturers . . .". Whenever you see "he has heard" you can often
improve or delete it, Rosalie said. A similar case: "The reason for removing the spaces
from the list is that . . ." can be (better) written "We remove the spaces because . . .".

Rosalie conceded that good writing is very difficult. We must strive to be clear, coherent,
accurate, and concise. This last is especially important, she said, and quoted Pascal:
"I have made this letter longer than usual because I lack the time to make it shorter."
Rosalie was pleased to note that the first drafts of our term papers were quite a bit better
than something else she had read recently: a report by a local software company. After
just a few weeks we are pushing out the envelope of Silicon Valley literacy! But many of our
sentences could be improved, she said, by cutting them shorter. Out with the semicolons,
in with the periods. Don't write one long sentence if you can say the same thing in two
short ones. A semicolon should be used only where the separated clauses have a very
close relationship, and even there a period is often better. She quoted William Zinsser
in his book On Writing Well as saying "The semicolon all too easily conveys 'a certain
19th-century mustiness' and slows the pace of the writing." Another common error was
the frequent repetition of a word like 'this', 'they', 'just', or 'then'. Reading the piece
aloud will often help you spot such over-uses.
Rosalie wasn't too keen on some of the conversational idioms that crept into our writing:
sentences that begin 'Anyway, . . . '; an algorithm described as 'pretty straightforward'
(perhaps a bad idea in a paper on pretty-printers?). Neither did she like the phrase
'again iterate through'-this sounds awkward; surely the same point could be put more
smoothly? A lot of sentences suffered from not having their subject near the beginning:
"If . . . , the graphic interface . . .". Someone suggested that these kinds of problems, as
well as over-use of the passive voice, are easily avoided if we stick to a subject-verb-object
style.
Someone asked whether 'in the context of' wasn't a "noise-phrase", one that could be
deleted without any loss of meaning? Rosalie said that this was often so, but that sometimes
it can mean something. In the example we were looking at, the phrase had been
used early on, so it seemed reasonable to repeat it somewhere else to make it clear to the
reader that we were once again talking about the same thing as before.
A student asked whether perhaps there aren't different styles of writing appropriate to
scientific journals and to newspapers? Newspapers would probably put a greater premium
on simple, direct sentences, for example. Rosalie said that there might be something in
this, but that clear writing was always good.
The sentence "Each of the b_i's are then multiplied by this factor" can be improved on
two counts: Change 'are' to 'is' and eliminate the passive voice. How about: "Multiply
each b_i by this factor"?
If you are using commas to insert a parenthetical note, you must put a comma on both
sides: ". . . this node, b, is thus . . .".
Rosalie didn't like a sentence that began 'It is unlikely . . .'. The pronoun reference problem
appears: What is unlikely? There is no uniform rewrite-rule for this, but we can usually
find an alternative construction that conveys the same meaning. Like Jeff Ullman, Rosalie
wasn't enthusiastic about 'This is done by . . .'; better, she said, to say "This procedure
(process, step, etc.) is done by . . .".


A very common error is the misplaced 'only'. To illustrate, Rosalie took the sentence "I hit
him in the eye yesterday" and inserted 'only' in each of the eight possible positions. Sure
enough, each resulting sentence carries a somewhat different force:
Only I hit him in the eye yesterday.
I only hit him in the eye yesterday.
I hit only him in the eye yesterday.
I hit him only in the eye yesterday.
I hit him in only the eye yesterday.
I hit him in the only eye yesterday.
I hit him in the eye only yesterday.
I hit him in the eye yesterday only.
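
Rosalie's experiment is mechanical enough to reproduce in a few lines of Python. This is a small sketch, not anything shown in class; the function name insert_everywhere is made up for the illustration. It places the chosen word in every gap of the sentence, including before the first word and after the last, so a sentence of seven words yields eight variants.

```python
def insert_everywhere(sentence, word="only"):
    """Return one variant of the sentence per possible position of word."""
    words = sentence.rstrip(".").split()
    variants = []
    for i in range(len(words) + 1):
        new_words = words[:i] + [word] + words[i:]
        text = " ".join(new_words)
        # Restore the capital letter and full stop of a complete sentence.
        variants.append(text[0].upper() + text[1:] + ".")
    return variants


variants = insert_everywhere("I hit him in the eye yesterday")
```

The program can list the positions, but deciding which variant says what you mean remains the writer's job.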

If you say "Here we only calculate the position of two vertices" you probably mean "Here
we calculate the position of only two vertices".
We saw a sentence that contained four or five occurrences of the word 'then'-surely a
trifle excessive? Someone remarked that the sentence was probably an anglicised version
of a line of computer code, which abounds in 'if . . . then . . . 's, sometimes deeply nested.
Another line that caught Rosalie's eye was: ". . . saving the computation for the place where
it is really needed." The word 'really' was used again in the same paragraph. She thought this
sounded altogether too vague for a piece of technical writing: Is the computation needed
or not? What is this "really needed"? There was a definite difference of opinion over
this question: Some people in the class couldn't see any objection to this usage. Someone
argued that the "really" amounted to stylistic advice (as in "when painting a house, be
especially careful on the window-frames, where precision is really important"), but it is
by no means superfluous; the word makes a substantive contribution to the meaning of
the sentence. Rosalie's objection stemmed mainly from the fact that the word "really" is
much over-used in colloquial speech. In the end we agreed that it would probably be better
to say something like: ". . . saving the computation for those vertices where the additional
work contributes more to the visual quality".
Can an object witness a property? To Rosalie's ear this was a strange construction. But
the class assured her that this is common usage in computer science. Technical terms take
on an anarchic life of their own!
In the last minute, Rosalie showed us a list of pairs of words that are frequently (and
sometimes amusingly) conflated. For example, 'prostrate' and 'prostate'. One common
confusion is 'alternately' vs. 'alternatively'. These are not synonyms. (Alternately, Tracy
and I take notes in class. You could read them, or alternatively you could take your own
notes.)

abdicate, abrogate      convince, persuade              nauseated, nauseous
adverse, averse         continually, continuously       noisy, noisome
affect, effect          disinterested, uninterested     pore, pour
alleged, intended       flaunt, flout                   precipitate, precipitous
allude, elude           gantlet, gauntlet               prostate, prostrate
allusion, illusion      imply, infer                    rack, wrack
assume, presume         lay, lie                        rebut, refute
breach, breech          lectern, podium                 who, whom

§42. Excerpts from class, December 9 [notes by TLL]

Don started class by introducing Paul Halmos. Paul is a distinguished author, a professor
of mathematics at the University of Santa Clara, and a spicy and entertaining lecturer. As
Don said, "He brings our program of guest speakers to a triumphant conclusion."

Paul started his lecture by wondering why we had called him here. "I don't have anything
new to say," he said. "What I had to say has already been majorized by Don and Mary-Claire."
He said that even the act of talking about mathematical writing was difficult, by
comparison with the act of talking about mathematics itself. We don't have to remember
much about math, because we know its structure; we can develop and discover the material
as we talk about it. The structure of mathematical writing is much more elusive, so how
do we know what to say about it? Sure, Paul brought several pages of prepared notes to
class, but he claims that even those won't help him much.

Not that the subject of mathematical writing isn't important. Some mathematicians have
disdain for anything other than great theorems. "Anything else is beneath them." But
they are wrong. Mathematicians who merely think great theorems have no more done
their job than painters who merely think great paintings.

Paul has read our handouts, and he wants to make a few comments. He wants to have a
dialog with us; he admonished us to break in whenever we feel the urge.

He is going to drift in and out of many different topics but only after he has given us an
anchor and a rough outline. The anchor? Two basic rules:

Do organize material.
Do not distract the reader.

The outline? Four aspects of good mathematical communication:

Semantics (words, and the job they do);
Syntax (also known as grammar);
Symbols (very meaningful to mathematical writers);
Style (synthesis of the above).

Turning first to Semantics, Paul spoke to us about the natural process of change inherent
in language and how it affects our word usage. Some changes are good; some changes are
bad. According to Paul, one of the most often discussed symptoms of that change is the
word 'hopefully'.

The most recent literary tradition, handed down to us by our grandparents, tells us that
'hopefully' means the exact opposite of 'hopelessly':

"I don't have a chance in the world to be promoted," he growled hopelessly.
"My chances look good," his colleague grinned hopefully.

But another, impersonal use of 'hopefully' has become popular: an evasive form in which
one can say "Hopefully he won't be re-elected" instead of "I hope he won't." This conflicts
with the normal usage of other words that can end both -fully and -lessly. Although we
may think that interest rates will rise, we don't say "Thoughtfully interest rates will rise."

[106 §42. PAUL HALMOS ON MATHEMATICAL WRITING]


Although we may fear that muggers are in the street, we don't say "Fearfully those muggers
are still out there." Consistent English usage would prohibit

Hopefully I'll visit you again next year,

as much as it prohibits

Hopelessly I'll not be able to come.

Paul doesn't like the new usage, which he calls "illogical and ugly." The mere fact of change
is bothersome. But he realizes that his is only one vote, and he seems to be outnumbered.
On balance, it is perhaps a good change, one that might even make communication easier.
"The English language won't collapse if the other side wins." In fact, Paul says,

Arguably the change is a needed one.

But he is surprised to hear himself saying that.

Paul sees other changes as needless and careless. It grates on his ears when he hears,

The earthquake decimated more than half the houses.

Of course some would say, Why do we need to reserve a special word for the random
destruction of one out of ten? Paul thinks muddying the meaning of the word is bad, but
he admits that it is harmless.

Other unneeded, and harmful, obfuscations should be discouraged. 'Imply' does not mean
'infer', and 'disinterested' does not mean 'uninterested'. To confuse these words is to lose
valuable distinctions. Tragically, the differences between these words are becoming so
confused that if we are writing for a large audience, and if we need to make use of the
distinctions, we probably shouldn't.

Evidence of bad changes can even be found in our handouts. In §4, one of the TAs (not the
one with the charming British accent) used 'reference' as a verb. Paul's response: "There
is no such verb, and if there were, it sure as hell wouldn't be transitive." How would it
sound to say "I quotationed the author"?

Barry Hayes pointed out that in Computer Science, 'reference' is a technical term used
as a verb. Technical terms like 'majorize' sometimes creep into our vocabularies. Don
supported him by saying that computer programmers "reference and de-reference things
all the time." Paul's response: "My condolences. You know, the French say English is
ruining their language. How the French feel about English is how I feel about that."

We moved on to Syntax. "Obviously," said Paul, "people approve of it; nobody uses
ungrammatical English on purpose." Syntax changes more slowly than semantics. However,
he once heard the following lovely sentence:

If I'da knowed I coulda rode, I woulda went.

This has rhythm, it's communicative, it's personal; but of course it's not grammatical
English. Therefore it distracts the reader from what is actually being said. Here's another
non-made-up example:

Us'll go along with she if her'll go along with we.

If we are trying to communicate with people who use such grammar, we should use their
language so as not to distract them from what we're saying. But as technical writers we
are presumably not addressing that audience, certainly not in print.
Paul would like to advance the thesis that grammar is logic. This notion is abhorrent
to linguists, who see grammar as illogical, inconsistent, and contradictory; and they are
right. Nevertheless, grammar is the organizational principle that lies behind linguistic
communication. A typical English sentence like 'He saw her' contains case, tense, and
gender; such things give a tremendous amount of information in condensed form, and they
can be seen as logic. To identify grammar with logic is less of an error than to reject logic
altogether.
Speaking of case, Paul says, "Cases are good things, even though in English by now they
are vestigial." They do exist, and they must be treated with respect. We say
I don't know him,

but we wouldn't be caught dead saying


I don't know he.

Similarly, we say
He is the President of France,
but never
Him is the President of France.
Therefore we would not logically ask,
Whom is the President of France?
Simple, right? Well, there are more confusing cases too:
I don't know who is the President of France.

Or should it be 'I don't know whom is the President of France'? A grammatical push-pull
is involved here. (The nominative wins, and 'Who' is correct.)
Paul would like to stamp out abuses such as 'I hate whomever said that'. An attention to
logical rules of grammar helps us to clarify our own thinking in general.
Taking issue with part of our first handout, Paul says the rule "A preposition is a bad
word to end a sentence with" is "reactionary grammarian balderdash". Consider:
Palo Alto is a good place to live in.
Don Knuth is fun to have a drink with.
There aren't many people I would say that to.
All of these are examples of prepositions in "post position" that could only be ruined by
being made grammatically pure. (We have all heard Winston Churchill's famous statement
about "the sort of nonsense up with which I will not put.") Why should we do gymnastics
for sentences with only one preposition at the end? Paul gave us a famous sentence ending
with five prepositions:

What did you want to bring that book I didn't want to be read to out of up for?

On the discussed and re-discussed subject of 'which' and 'that', he says that Mary-Claire
stole his thunder. It is worthwhile to get it right, but it is not terribly important.
We began discussing Symbols by discussing punctuation. Paul urges everyone (contrary
to rule #25 in §1) to place quotation marks logically, every time. He gave us what he
sees as a ridiculous example from Kate Turabian, whom he calls "The Antichrist," in The
Chicago Manual of Style:

See the section on "Quotations," which may be found elsewhere in this volume.
Paul was incensed. "Horrors", he said. "You see the illogic, don't you? There's no
reason for it. It's not a grammatical convention; it's a totally arbitrary typographer's
convention. The battle against this sort of stupidity can be won." He has succeeded in
getting his own books punctuated logically. Bob Floyd gave support by mentioning how
deadly such conventions are in a book about computer programming.
But then Don remarked that one of Paul's two main points was not to distract the reader.
Paul said, "And your implied, snide, argument?" "Well," said Don, "I guess I'm implying
that you think you're distracting only the copy editors and not the readers." "Yes, I believe
that's right, with respect to commas and quotation marks."
Mary-Claire asked, "Just how far are you willing to go in the direction of logic? Are you
willing to place periods outside the quotation marks in actual dialog that already has its
own punctuation?". Her example:
He said, "No.".
Paul said that if you push him in a corner he might go so far as to say "Yes.". And
Mary-Claire responded, "That's what I thought. Luckily there's not much dialog in the
sort of stuff you write.". (Paul conceded that he doesn't really have an ear for dialog and
doesn't have immediate plans to break into the world of fiction. He would love to write
a novel, some piece of literature that isn't expository, but he's not being held back by an
inability to punctuate.)
The second Symbols-related point that Paul wanted to bring to our attention was the
subject of written versus symbolic numerals. He gave us an example where 'one' could
either be a pronoun or a numeral, depending on the context:
What are we to do when x is one?
The sentence preceding that one may have been
The solutions of the equation are the singularities of the
function we are studying.
Or it may have been
Everything is clear when x is 2 or greater.
Another example (this time from Birkhoff & MacLane's classic text):
The first few positive primes are
2, 3, 5, 7, 11, . . . .
Any positive integer which is not one
or a prime can be factored . . .

He urges us to remove such ambiguities by using '1' when we want to speak of the numeral.

The number of solutions is either two or three.


The only solutions are 2 or 3.
Paul now moved on to the final area of discussion: Style.

Rule #6 in §1 suggests that we use 'we' to avoid passive voice. This use of 'we' is equivalent
to "the reader and I". Paul says that even better is to avoid both passive voice and the
use of 'we' through judicious use of imperative and indicative moods along with an outlying
kind of non-sentential phrase. For example,

We can now prove the following result:

becomes

A consequence of all this is the following result.

Or,
Consequence: A implies B.
The latter technique can occasionally be used in a sequential manner,

Consequence 1: X. Consequence 2: Y.

ending with a final blaze of glory,

Conclusion: Z.
Alternatively, here's an example of imperative mood:

All we need to do to get the answer is to replace x by 7 throughout.

Just say
Replace x by 7 throughout.

Paul finds this less distracting. Using 'we' is not a crime, but it adds an irrelevant dimension
that can often be replaced by something clearer and smoother.

He gave a lengthier example of a typical passage that shows how both 'we' and passive
voice can be avoided without sounding artificial:

If U is (something), then the spectral theorem justifies the assumption
(something). If f is (something), then (something) equals (something).
Since, however, y is (something), it follows that* (something). Since,
moreover, the assumption (something) implies that (something), the Lebesgue
theorem is applicable. This completes the proof of convergence.

(The example would be more effective, of course, if the (something)s were replaced by
meaty concepts, but that would distract us from the point at issue.)

* Passive, God forgive me, or at least not active; but this phrase is standard and inoffensive
to my ear.

An audience member asked if using 'we' introduced a light tone that imperative doesn't
have. Paul agreed that it does, and stated that he isn't sure he wants that tone in his
writing.
Another questioner asked about first person singular. Paul likes it, but he admits that it
can be disturbing: "Who does that jerk think he is?" He reluctantly agrees that the first
person singular should be avoided in formal technical writing.
Leslie Lamport asked at this point if the use of 'we' could not be avoided by avoiding
prose proofs in favor of tabular proofs. Paul didn't like that idea at all: He finds symbols
insidious and much prefers prose proofs. But then he had second thoughts, saying that he
and Lamport might not disagree too much on the need to rethink the techniques of proof
presentation. Outline form (not too heavily symbolic) might be advantageous.
Paul said that he casts all possible votes in favor of Rule #9 in §1: Do not echo unusual
words. We had been told that this is a good idea because it avoids monotony. Paul says
that it is a good idea because two uses of the same word in unrelated passages will be
associated in the reader's mind and cause unwarranted connections. (Bob Floyd says that
it will also cause technical typists to omit all words between the two occurrences.)
Paul's next bugaboo ("Do I dare do this thing?") was the phrase 'he or she' when he
feels the traditional neuter pronoun 'he' would be sufficient. As soon as he brought this
up, Mary-Claire disagreed, but Paul held the floor and quoted from authority by reading
Mary-Claire's words from page 4 of her own book:
This 'his' is generic, not gendered. 'His or her' becomes clumsy with repetition
and suggests that 'his' alone elsewhere is masculine, which it isn't. 'Her' alone
draws attention to itself and distracts from the topic at hand.
Mary-Claire responded, "Deeply moving quotation, but it is not true that the traditional
solution to this problem in English is 'he'. The traditional solution is 'they'."
Many people in the audience stated pieces of opinions, but time was nearly up. "To each
their own." Paul moved to the next topic: Proof by contradiction. He emphasized that
proofs by contradiction should not be used if a direct proof is available. For example,
he noted that proofs of linear independence often say, "Suppose the variables are linearly
dependent. Then there are coefficients, not all zero, such that . . . contradicting the
assumption that the coefficients are nonzero." This circuitous route can usually be replaced
by a direct argument: "If the linear relation . . . holds, the coefficients are all zero. Hence
the variables are linearly independent."


Don pointed out that proof by contradiction is often the easiest way to prove something
when you're first solving a problem for yourself, but such stream-of-consciousness proofs
don't usually lead to the best exposition.
Paul wound up his speech by repeating his opening rules: "Do organize," and "Do not
distract."
The trouble is that it is hard to say what organization is. But we recognize it when we see
it. "Give me a book, or a paper or a manuscript, and I'll tell you if it is organized," said
Paul. The material is in linear order, but organization means much more than that. "The
plot of an exposition is rarely a straight line." Branches and alternative threads must be
woven together. Paul says he spends most of his writing time working on organization of the
material. He suggests that we look at Roget's Thesaurus, an Encyclopedia, a do-it-yourself
article, and a good textbook, for increasingly complex examples of non-linearly-organized
presentations.
"Do organize," and "Do not distract." Except that all rules are made to be broken. When
you want to jar your readers, Paul suggests that you distract them by changing your
notation, screaming ungrammatical sentences, or being awkwardly repetitious.
His final words to the class were, "Anything that helps communication is good. Anything
that hurts is bad. And that's all I have to say."

§43. Excerpts from class, December 11 [notes by TLL]

The final lecture of CS 209 was partially devoted to course evaluation. (We were, no doubt,
harsh but fair.) Don told us that we would spend the last 40 minutes of class looking at
the notes of people who have been going over our handouts but haven't had a chance to
speak. (More course evaluations, perhaps?) Don said that he wanted to "end on a note of
honesty and truth."
The first comments that he addressed were from Nelson Blachman (father of course member
Nancy Blachman). Nelson is very interested in writing (he writes papers frequently), and
he took the time to suggest improvements to the first few handouts.
Don liked some of these suggestions, but he found others incompatible with his personal
style. He said, "The main thing that I get from this is that the style has to be your own.
You will write things that someone else will never write." Don says he has learned this
lesson well by writing an annual Christmas letter with his wife, Jill. "We get along 364
days of the year," he said, "but there is no way that we can write a sentence acceptable
to both of us." (They have solved the problem by writing alternate paragraphs.)
Among Nelson's suggestions were:
Changing 'the above proof' to 'the proof above'. Don agrees with this change
mostly because editors are forever calling him on it, but the original usage doesn't
sound terribly odd to him. Nelson says that 'above' and 'below' are two adjectives
that never precede the things they modify. Don thinks 'above' has become an
adjective, but 'below' hasn't (yet).
Changing ', i.e.' to '; i.e.' . Don says that that is a matter of taste and pacing.
Changing the spelling of 'hiccups' to 'hiccoughs'. Don's dictionaries preferred the
shorter spelling.
Changing 'depending on the usage, the terms this, that, or the other might be
used' to 'depending on the usage, the term this, that, or the other might be used'.
Don didn't see this as an improvement.

[112 §43. FINAL TRUTHS]

Changing 'programming language notation' to 'programming-language notation'.
Don said that the suggestion might be appropriate for readers in other disciplines,
but in our field the hyphenation would become annoying. Analogous cases are
'random number generator' and 'floating point arithmetic', each of which is
potentially ambiguous, but so familiar in computer science that a hyphen looks
wrong.
Then Don briefly showed us an example of a problem that often occurs when mathematicians
are allowed to typeset their own text. A novice typesetter tends to make built-up fractions
like $\frac{n(n+1)(2n+1)}{3}$ instead of using the more readable slashed form $n(n+1)(2n+1)/3$.
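The contrast can be sketched in LaTeX markup. This is only an illustration: the notes do not say which typesetting system Don's example came from, and the surrounding words are placeholders.

```latex
% Built-up fraction inside running text: the numerator and denominator
% are squeezed into small type and disturb the line spacing.
... so the quantity is $\frac{n(n+1)(2n+1)}{3}$ in this case ...

% Slashed form: everything stays on the baseline at full size.
... so the quantity is $n(n+1)(2n+1)/3$ in this case ...
```

The built-up form is usually better reserved for displayed equations, where there is room for it.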
Next, we returned to Mary-Claire's essay on 'hopefully' (see §26 above). Don says that
he passed it out to us more for the style of the essay than the content, but it does make
good technical points as well. To his surprise, Mary-Claire said that after re-reading it
she actually wanted to improve the style. (This proves once again that nothing is perfect.)
Here is what Mary-Claire wrote to him:
1) The dates should be expressed in the same terms. Given that I'm going to need to
say '1637', I have to say 'late in the 1500s', not 'late in the 16th century'.
2) The sentence
Impersonal substantives, on the other hand, serve less often than personal
ones at the head of the kind of active verbs we modify with adverbs
of manner
is so horrid I'd prefer to think I was drunk when I wrote it. To fix it I have to
rewrite the whole paragraph, sliding 'adverb of manner' up earlier:
As with most adjectives, both of these 'hopeful's regularly produced
'-ly' adverbs of manner. The kind of hopefulness that means expectant
and eager produced adverbs more readily than the kind that means
promising and bright. There's nothing mysterious about that difference
in frequency. The pattern
(personal noun) (active verb) (adverb of manner)
is very common. People can carry themselves hopefully or eye a desirable
object hopefully or prepare themselves hopefully for a possible future.
The pattern
(impersonal noun) (active verb) (adverb of manner)
is less common. Impersonal nouns serve less often than personal ones
as subjects of the kind of active verbs that we modify with adverbs of
manner. Nonetheless, a wager can be shaping up hopefully, a day can
begin hopefully, . . . etc.
Bob Floyd sent a few comments to Don, beginning with his opinion of the usage of
hopefully. First, he reports that only 44% of the American Heritage Usage Panel found the
use of hopefully as a sentential adverb acceptable. Bob also provided several authoritative
quotations to support his objection to its use. (Don said that this is the main concern:
Using 'hopefully' raises hackles in many people, distracting them from what you're trying
to say; that's why he doesn't use it. But he thinks some of the documents that Bob uses to
support his position were probably written by the people that Mary-Claire was calling
ignorant in her essay.)

Tom Henzinger, who is Austrian, observed that the German language has a common
word 'hoffentlich' that corresponds precisely to the new English usage of 'hopefully'. This
reminded Don that he often needs words that the English language just doesn't have. For
example, we have hundreds of ways to say that Jane beat Jim, but we have few ways to
say that Jim lost to Jane. (And we have to use two words in the latter case where only
one is needed in the former.) Don said:

Our language often lacks verbs that correspond to "reflexive" relations. We have
an abundance of words like 'dominate' but none like 'dominate or equal to'. So
we must use long-winded phrases like 'less than or equal to'; sometimes, but not
often enough, we can say 'at most'.

Returning to Bob Floyd's comments, Bob sent Don several citations to support his claim
that exclamation points should be used only with actual exclamations or interjections.
Some examples: Ouch! Stop! Thief! Well, I'll be! To Don's surprise, none of the authorities
even mention that exclamation points can indicate surprise! Paul Halmos, speaking from
the peanut gallery today, told about a trick he has to get around this: You can put the
exclamation point in parentheses (!).* Then everybody is happy, because you've made an
exclamation of surprise.

Bob said, "Advice to always avoid splitting infinitives is unwise." Don agreed that split
infinitives can provide good emphasis and that rewrites can sound forced or awkward.

About not ending sentences with prepositions, Bob said, "You have no case, give up." Don
agreed, saying that he had not understood the issue. "Coming from Milwaukee, where half
the people speak English with a heavy dose of German, has made me oversensitive to
sentences that end funny." However, there is a problem with sentences ending with
prepositions, namely when they already have a structure that accommodates the preposition
in the middle:

Avoid such prepositions, which such sentences end with. The people who don't
like the rule against prepositions in post position would never think of writing
such sentences, so they probably have forgotten why the overly restrictive rule
was first formulated.

Bob next objected to Don's suggestion not to omit 'that's. Don admitted that there are
cases when leaving out a that produces a better sentence. For example, 'He said he was

* Don was able to use that trick the next day in Chapter 8 of his book. (Who said this
course wasn't practical?) But he found that it was like an unusual word: You can't easily
repeat it twice in the same chapter.

going' is a better sentence than 'He said that he was going.' But, in this example 'that'
is not needed as a grammatical help because the pronoun (in nominative case) keeps the
syntax clear. In technical writing we often have more complicated sentences, which can
benefit from the extra information that 'that' provides.
Someone in the class mentioned a related issue: Should the word 'then' be used in sentences
like "If I get there early enough, (then) I will save you a seat." (Rosalie had suggested
that it should not.) Don says that there is a difference between technical writing and
newspaper writing, and he believes that well placed 'then's can make a paper more easily
understood. In that particular sentence he would definitely leave out 'then'; but in
mathematical contexts (where the phrase after the comma is likely a mathematical statement) he
would definitely leave it in. Don says that our brains only have time to do simple parsing
when we are reading for speed and comprehension. As Paul Halmos said, "Anything that
helps communication is good."
The final subject that Don introduced was a behind-the-scenes discussion between
Mary-Claire, Don, and one of the class members: Dan Schroeder. Dan received the comments on
his term paper and objected to the claim that he had "wicked-whiches"; he gave involved
logical reasoning in support of why his whiches really should be whiches. Don said, "If you
have to think that long about the sentence, it is probably wrong." Mary-Claire said that
writers have to contend with overly-sensitive readers like Don, who wince at all whiches
that aren't preceded by commas or prepositions.
In one place Dan did not place a comma before a which because he was concerned about
coincident commas. This is what Mary-Claire has to say about coincident commas:
Coincident commas are not a sign of bad construction, any more than the
coincidence of a final comma and a period, or a final comma and a semicolon, or
any other two marks of punctuation. Where two commas coincide, we write only
one. Where a comma and a period coincide, we write the period. Etc. Truly,
coincident punctuation is not a problem.
(Did you catch the coincident periods there?)
After this comment we were thrown from the room in order to make way for another class.
As always in this course, there was more to say than there was time to say it in.

Postscript: The instructor received an anonymous contribution after class, in response to
his request for a poetically stated computer program:

This algorithm to count bits
Rotates VALUE one left and sums its
two's-comp negation
in a zeroed location
Repeats WORD LENGTH times, then exits.

(Not only does this rhyme and scan, it also works. In fact, it may be the fastest way to
do sideways addition on the GE635 and similar machines.)
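The limerick can be traced in a few lines of Python. This is a minimal sketch, not the contributor's program: the function name, the masking that simulates a fixed-width register, and the default width of 36 bits (assumed here for a GE635-style word) are illustrative choices. The trick works because the sum of all rotations of a w-bit word with k one-bits is k(2^w - 1), whose negation is congruent to k modulo 2^w.

```python
def sideways_add(value, word_length=36):
    """Count the 1 bits of `value` by the rotate-and-negate method.

    `word_length` (hypothetical default: 36 bits) is the simulated
    register width; Python integers are unbounded, so every result
    is masked back into that width.
    """
    mask = (1 << word_length) - 1
    value &= mask
    total = 0                                  # "a zeroed location"
    for _ in range(word_length):               # "repeats WORD LENGTH times"
        # "Rotates VALUE one left"
        value = ((value << 1) | (value >> (word_length - 1))) & mask
        # "sums its two's-comp negation"
        total = (total - value) & mask
    return total


if __name__ == "__main__":
    for v in (0, 1, 0b1011, (1 << 36) - 1):
        print(v, sideways_add(v), bin(v).count("1"))
```

Each printed row shows the input, the result of the rotate-and-negate loop, and Python's own bit count, which should agree.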



SIGACT News 18 Apr.-June 1976

BIG OMICRON AND BIG OMEGA AND BIG THETA

Donald E. Knuth
Computer Science Department
Stanford University
Stanford, California 94305

Most of us have gotten accustomed to the idea of using the notation
O(f(n)) to stand for any function whose magnitude is upper-bounded by a
constant times f(n), for all large n. Sometimes we also need a
corresponding notation for lower-bounded functions, i.e., those functions
which are at least as large as a constant times f(n) for all large n.
Unfortunately, people have occasionally been using the O-notation for
lower bounds, for example when they reject a particular sorting method
"because its running time is O(n^2)." I have seen instances of this in
print quite often, and finally it has prompted me to sit down and write
a Letter to the Editor about the situation.
The classical literature does have a notation for functions that are
bounded below, namely Ω(f(n)). The most prominent appearance of this
notation is in Titchmarsh's magnum opus on Riemann's zeta function [8],
where he defines Ω(f(n)) on p. 152 and devotes his entire Chapter 8 to
"Ω-theorems". See also Karl Prachar's Primzahlverteilung [7], p. 245.
The Ω notation has not become very common, although I have noticed
its use in a few places, most recently in some Russian publications I
consulted about the theory of equidistributed sequences. Once I had
suggested to someone in a letter that he use Ω-notation "since it had
been used by number theorists for years"; but later, when challenged to
show explicit references, I spent a surprisingly fruitless hour searching
in the library without being able to turn up a single reference. I have
recently asked several prominent mathematicians if they knew what Ω(n^2)
meant, and more than half of them had never seen the notation before.
Before writing this letter, I decided to search more carefully, and
to study the history of O-notation and o-notation as well. Cajori's
two-volume work on history of mathematical notations does not mention any of
these. While looking for definitions of Ω I came across dozens of books
from the early part of this century which defined O and o but not Ω.

I found Landau's remark [6, p. 883] that the first appearance of O known
to him was in Bachmann's 1894 book [1, p. 401]. In the same place, Landau
said that he had personally invented the o-notation while writing his
handbook about the distribution of primes; his original discussion of O
and o is in [6, pp. 59-63].
I could not find any appearances of Ω-notation in Landau's publications;
this was confirmed later when I discussed the question with George Pólya, who
told me that he was a student of Landau's and was quite familiar with his
writings. Pólya knew what Ω-notation meant, but never had used it in
his own work. (Like teacher, like pupil, he said.)
Since Ω notation is so rarely used, my first three trips to the
library bore little fruit, but on my fourth visit I was finally able to
pinpoint its probable origin: Hardy and Littlewood introduced Ω in their
classic 1914 memoir [4, p. 225], calling it a "new" notation. They used
it also in their major paper on distribution of primes [5, see pp. 125ff],
but they apparently found little subsequent need for it in later works.
Unfortunately, Hardy and Littlewood didn't define Ω(f(n)) as I wanted
them to; their definition was a negation of o(f(n)), namely a function
whose absolute value exceeds Cf(n) for infinitely many n, when C is a
sufficiently small positive constant. For all the applications I have seen
so far in computer science, a stronger requirement (replacing "infinitely
many n" by "all large n") is much more appropriate.
After discussing this problem with people for several years, I have
come to the conclusion that the following definitions will prove to be
most useful for computer scientists:

O(f(n)) denotes the set of all g(n) such that there exist positive
constants C and n_0 with |g(n)| ≤ Cf(n) for all n ≥ n_0.

Ω(f(n)) denotes the set of all g(n) such that there exist positive
constants C and n_0 with g(n) ≥ Cf(n) for all n ≥ n_0.

Θ(f(n)) denotes the set of all g(n) such that there exist positive
constants C, C', and n_0 with Cf(n) ≤ g(n) ≤ C'f(n) for all
n ≥ n_0.

Verbally, O(f(n)) can be read as "order at most f(n)"; Ω(f(n)) as
"order at least f(n)"; Θ(f(n)) as "order exactly f(n)". Of course,
these definitions apply only to behavior as n → ∞; when dealing with
f(x) as x → 0 we would substitute a neighborhood of zero for the
neighborhood of infinity, i.e., |x| ≤ x_0 instead of n ≥ n_0.
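The three definitions can be exercised numerically. The sketch below is an illustration only: the function names and the cutoff `n_max` are hypothetical, and checking a finite range can refute a proposed pair of witnesses (C, n_0) but can never prove an asymptotic claim.

```python
def holds_big_o(g, f, C, n0, n_max=10_000):
    """Check |g(n)| <= C*f(n) for n0 <= n <= n_max (finite range only)."""
    return all(abs(g(n)) <= C * f(n) for n in range(n0, n_max + 1))


def holds_big_omega(g, f, C, n0, n_max=10_000):
    """Check g(n) >= C*f(n) for n0 <= n <= n_max (finite range only)."""
    return all(g(n) >= C * f(n) for n in range(n0, n_max + 1))


def holds_big_theta(g, f, C_lo, C_hi, n0, n_max=10_000):
    """Check C_lo*f(n) <= g(n) <= C_hi*f(n) for n0 <= n <= n_max."""
    return all(C_lo * f(n) <= g(n) <= C_hi * f(n)
               for n in range(n0, n_max + 1))


if __name__ == "__main__":
    g = lambda n: 3 * n * n + 10 * n      # example function, chosen here
    f = lambda n: n * n
    print(holds_big_o(g, f, C=4, n0=10))                 # 10n <= n^2 once n >= 10
    print(holds_big_omega(g, f, C=3, n0=1))              # 3n^2 + 10n >= 3n^2
    print(holds_big_theta(g, f, C_lo=3, C_hi=4, n0=10))
    # A running time of n is O(n^2) but not "order at least n^2":
    print(holds_big_omega(lambda n: n, f, C=1, n0=1))
```

The last call fails already at n = 2, which is the sense in which rejecting a sorting method "because its running time is O(n^2)" says nothing about a lower bound.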
Although I have changed Hardy and Littlewood's definition of Ω,
I feel justified in doing so because their definition is by no means in
wide use, and because there are other ways to say what they want to say
in the comparatively rare cases when their definition applies. I like
the mnemonic appearance of Ω by analogy with O, and it is easy to
typeset. Furthermore, these two notations as defined above are nicely
complemented by the Θ-notation, which was suggested to me independently
by Bob Tarjan and by Mike Paterson.
The definitions above refer to "the set of all g(n) such that ...",
rather than to "an arbitrary function g(n) with the property that ...";
I believe that this definition in terms of sets, which was suggested to me
many years ago by Ron Rivest as an improvement over the definition in the
first printing of my volume 1, is the best way to define O-notation.
Under this interpretation, when the O-notation and its relatives are
used in formulas, we are actually speaking about sets of functions rather
than single functions. When A and B are sets of functions, A+B
denotes the set {a+b | a∈A and b∈B}, etc.; and "1 + O(n^{-1})" can be taken
to mean the set of all functions of the form 1+g(n), where |g(n)| ≤ Cn^{-1}
for some C and all large n. The phenomenon of one-way equalities
arises in this connection, i.e., we write 1 + O(n^{-1}) = O(1) but not
O(1) = 1 + O(n^{-1}). The equal sign here really means ⊆ (set inclusion),
and this has bothered many people who propose that we not be allowed to
use the = sign in this context. My feeling is that we should continue
to use one-way equality together with O-notations, since it has been
common practice of thousands of mathematicians for so many years now, and
since we understand the meaning of our existing notation sufficiently well.
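The one-way equality in this paragraph can be checked directly from the set definitions. The short derivation below is a worked illustration added for the reader; the names h and g are chosen here and do not appear in the letter.

```latex
% Forward direction: every member of 1 + O(n^{-1}) lies in O(1).
\[
  h(n) = 1 + g(n), \quad |g(n)| \le C n^{-1} \ \text{for } n \ge n_0
  \;\Longrightarrow\;
  |h(n)| \le 1 + C \ \text{for } n \ge \max(n_0, 1),
\]
so $h(n) \in O(1)$ and hence $1 + O(n^{-1}) \subseteq O(1)$.

% Reverse direction fails: the constant function h(n) = 2 is in O(1),
% but h(n) - 1 = 1 cannot satisfy 1 \le C n^{-1} for all large n.
\[
  2 \in O(1), \qquad 2 \notin 1 + O(n^{-1}).
\]
```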
We could also define ω(f(n)) as the set of all functions whose ratio
to f(n) is unbounded, by analogy to o(f(n)). Personally I have felt
little need for these o-notations; on the contrary, I have found it a good
discipline to obtain O-estimates at all times, since it has taught me
about more powerful mathematical methods. However, I expect someday I may
have to break down and use o-notation when faced with a function for
which I can't prove anything stronger.
Note that there is a slight lack of symmetry in the above definitions
of O, Ω, and Θ, since absolute value signs are used on g(n) only in
the case of O. This is not really an anomaly, since O refers to a
neighborhood of zero while Ω refers to a neighborhood of infinity.
(Hardy's book on divergent series uses O_L and O_R when a one-sided
O-result is needed. Hardy and Littlewood [5] used Ω_L and Ω_R for
functions respectively < -Cf(n) and > Cf(n) infinitely often. Neither
of these has become widespread.)
The above notations are intended to be useful in the vast majority
of applications, but they are not intended to meet all conceivable needs.
For example, if you are dealing with a function like (log log n)^{cos n}
you might want a notation for "all functions which oscillate between
log log n and 1/log log n where these limits are best possible". In
such a case, a local notation for the purpose, confined to the pages of
whatever paper you are writing at the time, should suffice; it isn't
necessary to worry about standard notations for a concept unless that
concept arises frequently.
I would like to close this letter by discussing a competing way to
denote the order of function growth. My library research turned up the
surprising fact that this alternative approach actually antedates the
O-notation itself. Paul du Bois-Reymond [2] used the relational notations

g(n) ≺ f(n) ,   f(n) ≻ g(n)

already in 1871, for positive functions f(n) and g(n), with the meaning
we can now describe as g(n) = o(f(n)) (or as f(n) = ω(g(n))). Hardy's
interesting tract on "Orders of Infinity" [3] extends this by using also
the relations

g(n) ≼ f(n) ,   f(n) ≽ g(n)

to mean g(n) = O(f(n)) (or, equivalently, f(n) = Ω(g(n)), since we are
assuming that f and g are positive). Hardy also wrote

f(n) ≍ g(n)

when g(n) = Θ(f(n)), and

f(n) ≍̶ g(n)

when lim_{n→∞} f(n)/g(n) exists and is neither 0 nor ∞; and he wrote

f(n) ∼ g(n)

when lim_{n→∞} f(n)/g(n) = 1. (Hardy's ≍̶ notation may seem peculiar at
first, until you realize what he did with it; for example, he proved the
following nice theorem: "If f(n) and g(n) are any functions built up
recursively from the ordinary arithmetic operations and the exp and log
functions, we have exactly one of the three relations f(n) ≺ g(n),
f(n) ≍̶ g(n), or f(n) ≻ g(n).")
Hardy's excellent notation has become somewhat distorted over the years.
For example, Vinogradov [9] writes f(n) ≪ g(n) instead of f(n) ≼ g(n);
thus, Vinogradov is comfortable with the formula

\[ 200^2 \ll \binom{n}{2}, \]

while I am not. In any event, such relational notations have intuitively
clear transitive properties, and they avoid the use of one-way equalities
which bother some people. Why, then, should they not replace O and the
new symbols Ω and Θ?
The main reason why O is so handy is that we can use it right in the
middle of formulas (and in the middle of English sentences, and in tables
which show the running times for a family of related algorithms, etc.).
The relational notations require us to transpose everything but the function
we are estimating to one side of an equation. (Cf. [7], p. 191.) Simple
derivations like

\begin{align*}
\left(1 + \frac{H_n}{n}\right)^{H_n}
  &= \exp\bigl(H_n \ln(1 + H_n/n)\bigr) \\
  &= \exp\bigl(H_n (H_n/n + O(\log n/n)^2)\bigr) \\
  &= \exp\bigl(H_n^2/n + O((\log n)^3/n^2)\bigr) \\
  &= \exp\bigl((\ln n + \gamma)^2/n + O((\log n)^3/n^2)\bigr) \\
  &= \bigl(1 + O((\log n)^3/n^2)\bigr)\, e^{(\ln n + \gamma)^2/n}
\end{align*}

would be extremely cumbersome in relational notation.
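The final estimate in this derivation can be sanity-checked numerically. The sketch below is an illustration, not part of the letter: it assumes the left-hand side is (1 + H_n/n)^{H_n}, which is what the first exp(...) line implies, and the function names are hypothetical. It divides the relative error by (log n)^3/n^2, so an O-estimate of that order predicts a bounded result.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, to double precision


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, summed accurately with fsum."""
    return math.fsum(1.0 / k for k in range(1, n + 1))


def scaled_error(n):
    """Relative error of the estimate, divided by (log n)^3 / n^2."""
    h = harmonic(n)
    exact = (1.0 + h / n) ** h
    main_term = math.exp((math.log(n) + EULER_GAMMA) ** 2 / n)
    return (exact / main_term - 1.0) / (math.log(n) ** 3 / n ** 2)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, scaled_error(n))
```

If the printed values stay within a fixed band as n grows, that is consistent with the O((log n)^3/n^2) term; a finite table like this is evidence, of course, not a proof.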



When I am working on a problem, my scratch paper notes often contain
ad-hoc notations, and I have been using an expression like "(≤ 5n^2)"
to stand for the set of all functions which are ≤ 5n^2. Similarly, I can
write "(∼ 5n^2)" to stand for functions which are asymptotic to 5n^2,
etc.; and "(≼ n^2)" would therefore be equivalent to O(n^2), if I made
appropriate extensions of the ≼ relation to functions which may be
negative. This would provide a uniform notational convention for all
sorts of things, for use in the middle of expressions, giving more than
just the O and Ω and Θ proposed above.
In spite of this, I much prefer to publish papers with the O, Ω,
and Θ notations; I would use other notations like "(∼ 5n^2)" only
when faced with a situation that needed it. Why? The main reason is
that O-notation is so universally established and accepted, I would not
feel right replacing it by a notation "(≤ f(n))" of my own invention,
however logically conceived; the O-notation has now assumed important
mnemonic significance, and we are comfortable with it. For similar
reasons, I am not abandoning decimal notation although I find that octal
(say) is more logical. And I like the Ω and Θ notations because they
now have mnemonic significance inherited from O.
Well, I think I have beat this issue to death, knowing of no other
arguments pro or con the introduction of Ω and Θ. On the basis of the
issues discussed here, I propose that members of SIGACT, and editors of
computer science and mathematics journals, adopt the O, Ω, and Θ
notations as defined above, unless a better alternative can be found
reasonably soon. Furthermore I propose that the relational notations of
Hardy be adopted in those situations where a relational notation is more
appropriate.

References

[1] Paul Bachmann, Die Analytische Zahlentheorie. Zahlentheorie, pt. 2
(Leipzig: B. G. Teubner, 1894).
[2] Paul du Bois-Reymond, "Sur la grandeur relative des infinis des
fonctions," Annali di Mat. pura ed applic. (2), 4 (1871), 338-353.
[3] G. H. Hardy, "Orders of Infinity," Cambridge Tracts in Math. and
Math. Physics, 12 (1910; Second edition, 1924).
[4] G. H. Hardy and J. E. Littlewood, "Some problems of Diophantine
approximation," Acta Mathematica 37 (1914), 155-238.
[5] G. H. Hardy and J. E. Littlewood, "Contributions to the theory of
the Riemann zeta function and the theory of the distribution of
primes," Acta Mathematica 41 (1918), 119-196.
[6] Edmund Landau, Handbuch der Lehre von der Verteilung der Primzahlen,
2 vols. (Leipzig: B. G. Teubner, 1909).
[7] Karl Prachar, Primzahlverteilung (Berlin: Springer, 1957).
[8] E. C. Titchmarsh, The Theory of the Riemann Zeta-Function (Oxford:
Clarendon Press, 1951).
[9] I. M. Vinogradov, The Method of Trigonometrical Sums in the Theory
of Numbers, translated from the 1947 Russian edition by K. F. Roth
and Anne Davenport (London: Interscience, no date).

11 May 1976

Dear Editor,

The reader of "The Four Russians' Algorithm for Boolean Matrix
Multiplication is Optimal in its Class" (News, Vol. 8, No. 1) is advised
that its contents are essentially subsumed by "An Algorithm for the
Computation of Linear Forms" by J. E. Savage, SIAM J. Comput. Vol. 3
(1974) 150-158, which the author has kindly brought to my attention.
Savage presents therein a generalization of the Four Russians' Algorithm,
several applications of it, and a counting argument lower bound similar
to Moon and Moser's.
Sincerely,

Dana Angluin
Oral History of Donald Knuth

Interviewed by:
Edward Feigenbaum

Recorded: March 14, 2007 and March 21, 2007


Mountain View, California

CHM Reference number: X3926.2007

© 2007 Computer History Museum



Edward Feigenbaum: My name is Edward Feigenbaum. I’m a professor at Stanford University, in the
Computer Science Department. I’m very pleased to have the opportunity to interview my colleague and
friend from 1968 on, Professor Don Knuth of the Computer Science Department. Don and I have
discussed the question of what viewers and readers of this oral history we think there are. We’re
orienting our questions and comments to several groups of you readers and viewers. First, the generally
intelligent and enlightened science-oriented person who has seen the field of computer science explode
in the past half century and would like to find out what is important, even beautiful, and what some of the
field’s problems have been. Second, the student of today who would like orientation and motivation
toward computer science as a field of scholarly work and application, much as Don and I had to do in the
1950s. And third, those of you who maybe are not yet born, the history of science scholar of a dozen or
100 years from now, who will want to know more about Donald Knuth, the scientist and programming
artist, who produced a memorable body of work in the decades just before and after the turn of the
millennium. Don and I share several things in our past in common. Actually, many things. We both went
to institutes of technology, I to Carnegie Institute of Technology, now Carnegie Mellon University, and
Don to Case Institute of Technology, now Case Western Reserve. We both encountered early computers
during that college experience. We both went on to take a first job at a university. And then our next job
was at Stanford University, in the new Computer Science Department, where we both stayed from the
earliest days of the department. I’d like to ask Don to describe his first encounter with a computer. What
led him into the world of computing? In my case, it was an IBM 701, learned from the manual. In Don’s
case, it was an IBM Type 650 that had been delivered to Case Institute of Technology. In fact, Don even
dedicated one of his major books to the IBM Type 650 computer. Don, what is the story of your discovery
of computing and your early work with this intriguing new artifact?

Donald Knuth: Okay. Thanks, Ed. I guess I want to add that Ed and I are doing a team thing here; next
week, I’ll be asking Ed the questions that he’s asking me today. We thought that it might be more fun for
both of us and, also for people who are listening or watching or reading this material, to see the
symmetrical approach instead of having a historian interviewing us. We’re colleagues, although we work
in different fields. We can give you some slants on the thing from people who sort of both have been
there. We’re going to be covering many years of the story today, so we can’t do too much in-depth. But
we also want to do a few things in depth, because the defining thing about computer science is that
computer science plunges into things at low levels as well as sticking on high level. Since we’re going to
cover so many topics, I’m sure that I won’t sleep tonight because I’ll be saying to myself, “Oh, I should’ve
said such and such when he asked me that question”. So I think Ed and I also are going to maybe add
another little thing to this oral interview, where we might want to add a page or two of afterthoughts that
come to us later, because then I don’t have to be so careful about answering every question that he asks
me now. The interesting thing will be not only the wrong answer that pops first in my mind, but also
maybe a slightly corrected thing. One of the stories of my life, as you’ll probably find out, is I try to get
things correct. I probably obsess about not making too many mistakes. Okay. Now, your question, Ed,
was how did I get into the computing business. When the computers were first built in the ’40s I was ten
years old, so I certainly was not a pioneer in that sense. I saw my first computer in 1957, which is pretty
late in the history game as far as computers are concerned. On the other hand, programming was still
pretty much a new thing. There were, I don’t know, maybe 2,000 programmers in the world at that time.
I’m not sure how to figure it, but it was still fairly early from that point of view. I was a freshman in college.
Your question was: how did I get to be a physics student there in college. I grew up in Milwaukee,
Wisconsin. Those of you who want to do the math can figure out, I was born in 1938. My father was the
first person among all his ancestors who had gone to college. My mother was the first person in all of her
ancestors who had gone to a year of school to learn how to be a typist. There was no tradition in our
family of higher education at all. I think [that’s] typical of America at the time. My great-grandfather was a
blacksmith. My grandfather was a janitor, let’s say. The people were pretty smart. They could play
cards well, but they didn’t have an academic background. I don’t want to dwell on this too much, because
I find that there’s lots of discussion on the internet about the early part of my life. There’s a book called
Mathematical People, in which people asked me these questions at length -- how I got started. The one
thing that stands out most, probably, is when I was an eighth grader there was a contest run by a local TV
station, a company called Zeigler’s Giant Bar. They said, “How many words can you make out of the
letters in ‘Zeigler’s Giant Bar’?” Well, there’s a lot of letters there. I was kind of intrigued by this question,
and I had just seen an unabridged dictionary. So I spent two weeks going through the unabridged
dictionary, finding every word in that dictionary that could be made out of the letters in “Zeigler’s Giant
Bar”. I pretended that I had a stomachache, so I stayed home from school those two weeks. The bottom
line is that I found 4500 words that could be made, and the judges had only found 2500. I won the
contest, and I won Zeigler’s Giant Bars for everybody in my class, and also got to be on television and
everything. This was the first indication that I would obsess about problems that would take a little bit of
-- what do you call it? -- long attention span to solve. But my main interest in those days was music. I
almost became a music major when I went to college. Our high school was not very strong in science,
but I had a wonderful chemistry and physics teacher who inspired me. When I got the chance to go to
Case, looking back, it seems that the thing that really turned it was that Case was a challenge. It was
supposed to have a lot of meat. It wasn’t going to be easy. At the college where I had been admitted to
be a music major, the people, when I visited there, sort of emphasized how easy it was going to be there.
Instead of coasting, I think I was intrigued by the idea that Case was going to make me work hard. I was
scared that I was going to flunk out, but still I was ready to work. I worked especially hard as a freshman,
and then I coasted pretty much after that. In my freshman year, I started out and I found out that my
chemistry teacher knew a lot of chemistry, but he didn’t know physics or mathematics. My physics
teacher knew physics and chemistry, but he didn’t know much about mathematics. But my math teacher
knew all three things. I was very impressed by my math teacher. Then in my sophomore year in physics,
I had to take a required class of welding. I just couldn’t do welding, so I decided maybe I can’t be a
physicist. Welding was so scary. I’ve got these thousands of volts in this stuff that I’m carrying. I have to
wear goggles. I can’t have my glasses on, I can’t see what I’m doing, and I’m too tall. The table is way
down there. I’m supposed to be having these scary electrons shooting all over the place and still connect
X to Y. It was terrible. I was a miserable failure at welding. On the other hand, mathematics! In the
sophomore year for mathematicians, they give you courses that are what we now call discrete
mathematics, where you study logic and things that are integers instead of continuous quantities. I was
drawn to that. That was something, somehow, that had great appeal to me. Meanwhile, in order to
support myself, I had to work for the statisticians at Case. First this meant drawing graphs and sorting
cards. We had a fascinating machine where you put in punch cards and they fall into different piles, and
you can look at what comes out. Then I could plot the numbers on a graph, and get some salary from
this. Later on in my freshman year there arrived a machine that, at first, I could only see through the
glass window. They called it a computer. I think it was actually called the IBM 650 “Univac”. That was a
funny name, because Univac was a competing brand. One night a guy showed me how it worked, and
gave me a chance to look at the manual. It was love at first sight. I could sit all night with that machine
and play with it. Actually, to be exact, the first programs I wrote for the machine were not in machine
language but in a system called the Bell Interpretive System. It was something like this. You have an
instruction, and the instruction would say, “Add the number in cell 2 to the number in cell 15 and put the
result in cell 30.” We had instructions like that, a bunch of them. This was a simple way to learn
programming. In fact, I still believe that it might be the best way to teach people programming, instead of
teaching them what we call high-level language right now. Certainly, it’s something that you could easily
teach to a fourth or fifth grader who hasn’t had algebra yet, and get the idea of what a machine is. I was
pledging a fraternity, and one of my fraternity brothers didn’t want to do his homework assignment where
he was supposed to find the roots of a fifth-degree equation. I looked at some textbooks, and it told me
how to solve a fifth-degree equation. I programmed it in this Bell Interpretive Language. I wrote the
program. My memory is that it worked correctly the first time. I don’t know if it really gave the right
answers, but miraculously it ground out these numbers. My fraternity brother passed his course, I got
into the fraternity, and that was my first little program. Then I learned about the machine language inside
the 650. I wrote my first program for the 650 probably in the spring of my freshman year, and debugged it
at night. The first time I wrote the program, it was about 60 instructions long in machine language. It was
a program to find the prime factors of a number. The 650 was a machine that had decimal arithmetic with
ten-digit numbers. You could dial the numbers on the console of the machine. So you would dial a ten-
digit number, and my program would go into action. It would punch out cards that would say what are the
factors of this number that you dialed in there. The computer was a very slow computer. In order to do a
division instruction, it took five milliseconds. I don’t know, is that six orders of magnitude slower than
today’s machines, to do division? Of course, the way I did factoring was by division. To see if a number
was divisible by 13, I had to divide by 13. I divided by 15 as well, 17, 19. It would try to find everything
that divided. If I started out with a big ten-digit number that happened to be prime -- had no divisors at all
-- I think it would take 15 or 20 minutes for my program to decide. Not only did my program have about
60 or so instructions when I started, they were almost all wrong. When I finished, it was about 120, 130
instructions. I made more errors in this program than there were lines of code! One of the things that I
had to change, for example, that took a lot of work, was I had originally thought I could get all the prime
factors onto one card. But a card had 80 columns, and each number took ten digits. So I could only get
eight factors on a single card. Well, you take a number like 2 to the 32nd power, that’s going to take four
cards. Because it’s two times two times two times two [and so on]. I had to put in lots of extra stuff in my
program that would handle these cases. So my first program taught me a lot about the errors that I was
going to be making in the future, and also about how to find errors. That’s sort of the story of my life, is
making errors and trying to recover from them. Did I answer your question yet?
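
[Editorial illustration] The factoring program described above can be sketched in modern terms. What follows is a minimal Python sketch, not a reconstruction of the 650 machine code: it factors by trial division, dividing by 2 and then by every odd number (including non-primes such as 15, as in the account above), and it groups the factors eight to a card, since an 80-column card holds eight ten-digit fields. The function names `prime_factors` and `pack_cards` are invented for this example.

```python
def prime_factors(n):
    """Return the prime factors of n (n >= 2) in nondecreasing order.

    Uses plain trial division: try 2, then every odd number 3, 5, 7, ...
    Composite trial divisors such as 15 never divide, because their
    prime factors have already been removed from n.
    """
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1 if d == 2 else 2  # 2, 3, 5, 7, 9, ...
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


def pack_cards(factors, per_card=8):
    """Group factors into 'cards' of at most per_card entries.

    Assumption for illustration: 80 columns / 10 digits = 8 fields per card.
    """
    return [factors[i:i + per_card] for i in range(0, len(factors), per_card)]


if __name__ == "__main__":
    cards = pack_cards(prime_factors(2 ** 32))
    print(len(cards))  # 2**32 has 32 factors of 2, so 4 cards
```

The 2-to-the-32nd case shows why one card was not enough: 32 factors at eight per card need four cards.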

Feigenbaum: No.

Knuth: No.

Feigenbaum: Don, a couple questions about your early career, before Case and at Case. It’s very
interesting that you mentioned the Zeigler’s Giant Bar, because it points to a really early interest in
combinatorics. Your intuition at combinatorics is one of the things that impresses so many of us. Why
combinatorics, and how did you get to that? Do you see combinatorics in your head in a different way
than the rest of us do?

Knuth: I think that there is something strange about my head. It’s clear that I have much better intuition
about discrete things than continuous things. In physics, for example, I could pass the exams and I could
do the problems in quantum mechanics, but I couldn’t intuit what I was doing. I didn’t feel right being able
to get an “A” on an exam without ever having the idea of how I would’ve thought of the questions that the
person made up solving the exam. But on the other hand, in my discrete math class, these were things
that really seemed part of me. There’s definitely something in how I had developed by the time I was a
teenager that made me understand discrete objects, like zeros and ones of course, or things that are
made out of alphabetical letters, much better than things like Fourier transforms or waves -- radio waves,
things like this. I can do these other things, but it’s like a dog standing on his hind legs. “Oh, look, the
dog can walk.” But no, he’s not walking; he’s just a dog trying to walk. That’s the way it is for me in a lot
of the continuous or more geometrical things. But when it comes to marks on papers, or integer numbers
like finding the prime factors of something, that’s a question that appealed to me more than even finding
the roots of a polynomial.

Feigenbaum: Don, question about that. Sorry to interject this question, behaving like a cognitive
psychologist.

Knuth: This is what you’re paid to do, right?

Feigenbaum: Right. Herb Simon -- Professor Simon, of Carnegie Mellon University -- once did a set of
experiments that kind of separated thinkers into what he called “visualizers” and “symbolizers”. When
you do the combinatorics and discrete math that you do, which so amazes us guys who can’t do it that
well, are you actually visualizing what’s going on, or is it just pure symbol manipulation?

Knuth: Well, you know, I’m visualizing the symbols. To me, the symbols are reality, in a way. I take a
mathematical problem, I translate it into formulas, and then the formulas are the reality. I know how to
transform one formula into another. That should be the subtitle of my book Concrete Mathematics: How
to Manipulate Formulas. I’d like to talk about that a little. It started out… My cousin, Earl, who died, Earl
Goldschlager [ph?], he was an engineer, eventually went to Southern California, but I knew him in
the Cleveland area. When I was in second grade he went to Case. He was one of the people who sort of
influenced me that it may be good to go to Case. When I was visiting him in the summer, he told me a
little bit about algebra. He said, “If you have two numbers, and you know that the sum of these numbers
is 100 and the difference of these numbers is 20, what are the two numbers?” He said, “You know how
you can solve this, Don? You can say X is one of the numbers and Y is one of the numbers. X plus Y is
100. X minus Y is 20. And how do you find those numbers?” he says. “Well, you add these two
equations together, and you get 2X = 120. And you subtract the equations from each other, and you get
2Y = 80. So X must be 60, and Y must be 40. Okay?” Wow! This was an “aha!” thing for me when I
was in second grade. I liked symbols in this form. The main thing that I enjoyed doing, in seventh grade,
was diagramming sentences. NPR had a show; a woman published a book about diagramming
sentences, “The Lost Art of Diagramming Sentences”, during the last year. This is where you take a
sentence of English and you find its structure. It says, “It’s a noun phrase followed by a verb phrase.”
Let’s take a sentence here, “How did you get to be a physics student?” Okay. It’s not a noun phrase
followed by a verb phrase; this is an imperative sentence. It starts with a verb. “How did you get…” It’s
very interesting, the structure of that sentence. We had a textbook that showed how to diagram simple
English sentences. The kids in our class, we would then try to apply this to sentences that weren’t in the
book, sentences that we would see in newspapers or in advertisements. We looked at the hymns in the
hymnal, and we couldn’t figure out how to diagram those. We spent hours and hours trying to figure this
out. But we thought about structure of language, and trying to make these discrete tree structures out of
English sentences, in seventh grade. My friends and I, this turned us on. When we got to high school,
we breezed through all our English classes because we knew more than the teachers did. They had
never studied this diagramming. So I had this kind of interest in symbols and diagramming early on --
discrete things, early on. When I got into logic as a sophomore, and saw that mathematics involved a lot
of symbol manipulation, then that took me there. I see punched cards in this. I mean, holes in cards are
nice and discrete. The way a computer works is totally discrete. A computer has to stand on its hind legs
trying to do continuous mathematics. I have a feeling that a lot of the brightest students don’t go into
mathematics because -- curious thing -- they don’t need algebra at the level I did. I don’t think I was
smarter than the other people in my class, but I learned algebra first. A lot of very bright students today
don’t see any need for algebra. They see a problem, say, the sum of two numbers is 100 and the
difference is 20, they just sort of say, “Oh, 60 and 40.” They’re so smart they don’t need algebra. They
go on seeing lots of problems and they can just do them, without knowing how they do it, particularly.
Then finally they get to a harder problem, where the only way to solve it is with algebra. But by that time,
they haven’t learned the fundamental ideas of algebra. The fact that they were so smart prevented them
from learning this important crutch that I think turned out to be important for the way I approach a
problem. Then they say, “Oh, I can’t do math.” They do very well as biologists, doctors and lawyers.
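
[Editorial illustration] The sum-and-difference problem mentioned above can be written out as a two-step elimination. This is only a worked version of the arithmetic already given in the answer, using the same X and Y:

```latex
\begin{align*}
X + Y &= 100, & X - Y &= 20 \\
(X + Y) + (X - Y) &= 120 &&\Rightarrow\; 2X = 120 \;\Rightarrow\; X = 60 \\
(X + Y) - (X - Y) &= 80  &&\Rightarrow\; 2Y = 80  \;\Rightarrow\; Y = 40
\end{align*}
```

Checking: 60 + 40 = 100 and 60 - 40 = 20.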

Feigenbaum: You’re recounting your interest in the structure of languages very early. Seventh grade, I
think you said. That’s really interesting. Because among the people -- well, the word “computer science”
wasn’t used, but we would now call it “information technology” people -- your early reputation was in
programming languages and compilers. Were the seeds of that planted at Case? Tell us about that early
work. I mean, that’s how I got to know you first.

Knuth: Yeah, the seeds were planted at Case in the following way. First I learned about the 650. Then,
I’m not sure when it was but it probably was in the summer of my freshman year, where we got a program
from Carnegie -- where you were a student -- that was written by Alan Perlis and three other people.

Feigenbaum: “IT”.

Knuth: The IT program, “IT”, standing for Internal Translator.

Feigenbaum: Yeah, it was Perlis, [Harold] van Zoeren, and Joe Smith.

Knuth: In this program you would punch on cards an algebraic formula. You would say, “A = B + C.”
Well, in IT, you had to say, “X1 = X2 + X4.” Because you didn’t have a plus sign, you had to say, “A” for
the plus sign. So you had to say, “X1 Z X2 A X4.” No, “S,” I guess, was plus, and “A” was for absolute
value. But anyway, we had to encode algebra in terms of a small character set, a few letters. There
weren’t that many characters you could punch on a card. You punch this thing on a card, and you feed
the card into the machine. The lights spin around for a few seconds and then -- punch, punch, punch,
punch -- out come machine language instructions that set X1 equal to X2 + X4. Automatic programming
coming out of an algebraic formula. Well, this blew my mind. I couldn’t understand how this was
possible, to do this miracle where I had just these punches on the card. I could understand how to write a
program to factor numbers, but I couldn’t understand how to write a program that would convert algebra
into machine instructions.
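
[Editorial illustration] The step that seemed miraculous, turning a punched formula into instructions, can be sketched for the single statement quoted above. This is a toy in Python, not the IT algorithm. It assumes the letter encoding as recalled in the interview ("Z" for equals, "S" for plus), and the output mnemonics LOAD, ADD and STORE are hypothetical, not IBM 650 operation codes. The function name `translate` is invented for this example.

```python
def translate(statement):
    """Translate a statement such as 'X1 Z X2 S X4' into pseudo-instructions.

    Assumed encoding (from the recollection in the interview, not from the
    IT manual): 'Z' stands for '=', 'S' stands for '+'.
    The mnemonics emitted here are hypothetical, not real 650 opcodes.
    """
    tokens = statement.split()
    if len(tokens) < 3 or tokens[1] != "Z":
        raise ValueError("expected '<var> Z <var> [S <var>] ...'")
    target, rhs = tokens[0], tokens[2:]
    # The right-hand side must alternate: variable, 'S', variable, ...
    if len(rhs) % 2 == 0 or any(op != "S" for op in rhs[1::2]):
        raise ValueError("right-hand side must be variables joined by 'S'")
    code = ["LOAD " + rhs[0]]          # bring the first operand into the accumulator
    for operand in rhs[2::2]:
        code.append("ADD " + operand)  # add each further operand
    code.append("STORE " + target)     # put the result in the target variable
    return code


if __name__ == "__main__":
    for line in translate("X1 Z X2 S X4"):
        print(line)
```

For the quoted statement this emits LOAD X2, ADD X4, STORE X1. A real translator also has to handle precedence, parentheses and subscripts, which is where most of the work lies.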

Feigenbaum: It hadn’t yet occurred to you that the computer was a general symbol-manipulating device.

Knuth: No. No, that occurred to Lady Lovelace, but it didn’t occur to me. I’m slow to pick up on these
things, but then I persevere. So I got hold of the source code for IT. It couldn’t be too long, because the
650 had only 2,000 words of memory, and some of those words of memory had to be used to hold the
data as well as the instructions. It’s probably, I don’t know, 1,000 lines of code. The source code is not
hard to find. They published it in a report and I’ve seen it in several libraries. I’m pretty sure it has been on the
internet for a long time. I went through every line of that program. During the summer we have a family
get-together on a beach on Lake Erie. I would spend the time playing cards and playing tennis. But most of
the time I was going through this listing, trying to find out the miracle of how IT worked. Ok, it wasn’t
impossible after all. In fact, I thought of better ways to do it than were in that program. Since we’re in a
history museum, we should also mention that the program had originally been developed when Perlis
was at Purdue, before he went to Carnegie, with three other people there. I think maybe Smith and van
Zoeren came with Alan to Carnegie. But there was Sylvia Orgel and several other people at Purdue who
had worked on a similar project, for a different computer at Purdue. Purdue also produced another
compiler, a different one. It’s not as well-known as IT. But anyway, I didn’t know this at the time, either.
The code, once I saw how it happened, was inspiring to me. Also, the discipline of reading other people’s
programs was something good to learn early. Through my life I’ve had a love of reading source materials --
reading something that pioneers had written and trying to understand what their thought processes were
in order to write this out. Especially when they’re solving a problem I don’t know how to solve, because
this is the best way for me to put into my own brain how to get past stumbling blocks. At Case I also
remember looking at papers that Fermat had written in Latin in the 17th century, in order to understand
how that great number theorist approached problems. I have to rely on friends to help me get through
Sanskrit manuscripts and things now, but I still…. Just last month, I found, to my great surprise, that the
concept of orthogonal Latin squares, which we’ll probably talk about briefly later on, originated in North
Africa in the 13th century. Or was it the 14th century? I was looking at some historical documents and I
came across accounts of this Arabic thing. By reading it in French translation I was able to see that the
guy really had this concept, orthogonal Latin squares, that early. The previous earliest known example
was 1724. I love to look at the work of pioneers and try to get into their minds and see what’s happening.

Feigenbaum: One of the things worth observing -- it’s off the track but as long as we’re talking about
history -- is that our current generation, and generations of students, don’t even know the history of their
own field. They’re constantly reinventing things, or thoughtlessly disregarding things. We’re not just
talking about history going back in time hundreds of years. We’re talking about history going back a
dozen years, or two-dozen years.

Knuth: Yeah, I know. It’s such a common failing. I would say that’s my major disappointment with my
teaching career. I was not able to get across to any of my students this love for that kind of
scholarship, reading source material. I was a complete failure at passing this on to the people that I
worked with the most closely. I don’t know what I should’ve done. When I came to Stanford from
Caltech, I had been researching Pascal. I couldn’t find much about Pascal’s work in the Caltech library.
At Stanford, I found two shelves devoted to it. I was really impressed by that. Then I came to the
Stanford engineering library, and everything was in storage if it was more than five years old. It was a
basket case at that time, in the 60’s.

Knuth: I’ve got to restrain myself from telling too much about the early compiler. But anyway, after
IT, I have to mention that I had a job by this time at the Case Computing Center. I wasn’t just drawing
graphs for statisticians anymore. Case was one of the very few institutions in the country with a really
enlightened attitude that undergraduate students were allowed to touch the computers by themselves,
and also write software for the whole campus. Dartmouth was another place. There was a guy named
Fred Way who set the policy at Case, and instead of going the way most places go, which would hire
professionals to run their computer center, Case hired its own students to play with the machines and to
do the stuff everybody was doing. There were about a dozen of us there, and we turned out to be fairly
good contributors to the computing industry in the long range of things. I told all of my friends how this IT
compiler worked, and we got together and made our own greatly improved version the following year. It
was called RUNCIBLE. Every program in those days had to have an acronym and this was the Revised
Unified New Compiler Basic Language Extended, or something like this. We found a reason for the
name. But we added a million bells and whistles to IT, basically.

Feigenbaum: All on the 2000 word drum.

Knuth: All on the 2000 word drum. Not only that, but we had four versions of our compiler. One of them
would compile to assembly language. One would compile directly into machine language. One version
would use floating point hardware. And one version would use floating point attachment. If you changed
613 instructions, you would go from the floating point attachment to the floating point hardware version. If
you changed another 372 instructions, it would change from the assembly language version to the
machine language version. If we could figure out a way to save a line of code in the 372 instructions in
one version, then we had to figure out a way to correspondingly save another line of code in the other
version. Then we could have another instruction available to put in a new feature. So RUNCIBLE went
through the stages of software development that have since become familiar, where there is what they
call “creeping featurism”, where every user you see wants a new thing to be added to the software. Then
you put that in and pretty soon the thing gets… you have a harder and harder user manual. That is the
way software always has been. We got our experience of this. It was a group of us developing this;
about, I don’t know, eight of us worked together on different parts of it. But my friend, Bill Lynch, and I
did most of the parts that were the compiler itself. Other people were working on the subroutines that
would support the library, and things like that. Since I mentioned Bill Lynch, I should also, I guess... I
wrote a paper about the way RUNCIBLE worked inside, and it was published in the Communications of
the ACM during my senior year, because we had seen other articles in this journal that described
methods that were not as good as the ones that were in our compiler. So we thought, okay, let’s put it to
work. But I had no idea what scientific publishing was all about. I had only experienced magazines
before, and magazines don’t give credit for things, they just tell the news. So I wrote this article and it
explained how we did it in our compiler. But I didn’t mention Bill Lynch’s name or anything in the article. I
found out to my great surprise afterwards that I was getting credit for having invented these things, when
actually it was a complete team effort. Mostly other people, in fact. I had just caught a few bugs and
done a lot of things, but nothing really very original. I had to learn about scholarship, about scientific
publishing and things as part of this story. So we got this experience with users, and I also wrote the user
manual for this machine. I am an undergraduate. Case allows me to write the user manual for
RUNCIBLE, and it is used as a textbook in classes. Here I’ve got a class that I am taking; I can take a
class and I wrote the textbook for it already as an undergraduate. This meant that I had an unusual
visibility on campus, I guess. The truth is that Case was a really great college for undergraduates, and it
had superb teachers. But it did not have very strong standards for graduate studies. It was very difficult
to get admitted to the undergraduate program at Case, and a lot of people would flunk out. But in
graduate school it wasn’t so hard to get over. I noticed this, and I started taking graduate courses,
because there was no competition. This impressed my teachers -- “Oh, Knuth is taking graduate
courses” -- not realizing that this was the line of least resistance so that I could do other things like write
compilers as a student. I edited a magazine and things like that, and played in the band, and did lots of
activities. Now [to] the story, however: What about compilers? Well, I got a job at the end of my senior
year to write a compiler for Burroughs, who wanted to sell their drum machine to people who had IBM
650s. Burroughs had this computer called the 205, which was a drum machine that had 4000 words of
memory instead of 2000, and they needed a compiler for it. ALGOL was a new language at the time.
Somebody heard that I knew something about how to write compilers, and Thompson Ramo Wooldridge
[later TRW Inc.] had a consulting branch in Cleveland. They approached me early in my senior year and
said, “Don, we want to make a proposal to Burroughs Corporation that we will write them an ALGOL
compiler. Would you write it for us if we got the contract?” I believe what happened is that they made a
proposal to Burroughs that for $75,000 they would write an ALGOL compiler, and they would pay me
$5,000 for it, something like this. Burroughs turned it down. But meanwhile I had learned about the 205
machine language, and it was kind of appealing to me. So I made my own proposal to Burroughs. I said
I’ll write you an ALGOL compiler for $5,000, but I can’t implement all of ALGOL. I think I told them I can’t
implement all of ALGOL for this; I am just one guy. Let’s leave out procedures -- subroutines. Well, this
is a big hole in the language! Burroughs said, “No, no -- you got to put in procedures.” I said, “Okay, I
will put in procedures, but you got to pay me $5,500.” That’s what happened. They paid me $5,500,
which was a fairly good salary in those days. I think a college professor was making eight or nine
thousand dollars a year in those days. So between graduating from Case and going to Cal Tech, I
worked on this compiler. As I drove out to California, I drove 100 miles a day and I sat in a motel and
wrote code. The coding form on which I wrote this code, I have since donated to the Computer History
Museum, and you can see exactly the code that I wrote. I debugged it, and it was Christmas time [when]
I had the compiler ready for Burroughs to use. So I was interested; I had two compilers that I knew all the
code by the end of 1960. Then I learned about other projects. When I was in graduate school some
people came to me and said, “Don, how about writing software full time? Quit graduate school. Just
name your price. Write compilers for a living, and you will have a pretty good living.” That was my
second year of graduate school.

Feigenbaum: In what department at Cal Tech?

Knuth: I was at Cal Tech in the math department. There was no such thing as a computer science
department anywhere.

Feigenbaum: Right. But you didn’t do physics.

Knuth: I didn’t do physics. I switched into math after my sophomore year at Case, after flunking
welding. I switched into math. There were only seven of us math majors at Case. I went to Cal Tech,
and that’s another story we’ll get into soon. I’m in my second year at Cal Tech, and I was a consultant to
Burroughs. After finishing my compiler for Burroughs, I joined the Product Planning Department. The
Product Planning Department was largely composed of people who had written the best software ever
done in the world up to that time, which was a Burroughs ALGOL compiler for the 220 computer. That
was a great leap forward for software. It was the first software that used list processing and high level
data structures in an intelligent way. They took the ideas of Newell and Simon and applied them to
compilers. It ran circles around all the other things that we were doing. I wanted to get to know these
people, and they were by this time in the Product Planning Group, because Burroughs was doing its very
innovative machines that are the opposite of RISC. They tried to make the machine language look like
algebraic language. This group I joined at Burroughs as a consultant. So I had a programming hat when
I was outside of Cal Tech, and at Cal Tech I am a mathematician taking my grad studies. A startup
company, called Green Tree Corporation because green is the color of money, came to me and said,
“Don, name your price. Write compilers for us and we will take care of finding computers for you to
debug them on, and assistance for you to do your work. Name your price.” I said, “Oh, okay.
$100,000,” assuming that this was… In that era this was not quite at Bill Gates’s level today, but it was
sort of out there. The guy didn’t blink. He said, “Okay.” I didn’t really blink either. I said, “Well, I’m not
going to do it. I just thought this was an impossible number.” At that point I made the decision in my life
that I wasn’t going to optimize my income; I was really going to do what I thought I could do for… well, I
don’t know. If you ask me what makes me most happy, number one would be somebody saying “I
learned something from you”. Number two would be somebody saying “I used your software”. But
number infinity would be… Well, no. Number infinity minus one would be “I bought your book”. It’s not
as good as “I read your book”, you know. Then there is “I bought your software”; that was not in my own
personal value. So that decision came up. I kept up with the literature about compilers. The
Communications of the ACM was where the action was. I also worked with people on trying to debug the
ALGOL language, which had problems with it. I published a few papers; “The Remaining Trouble
Spots in ALGOL 60” was one of the papers that I worked on. I chaired a committee called “Smallgol”
which was to find a subset of ALGOL that would work on small computers. I was active in programming
languages.

Feigenbaum: Was McCarthy on Smallgol?

Knuth: No. No, I don’t think he was.

Feigenbaum: Or Klaus Wirth?

Knuth: No. There was a big European group, but this was mostly Americans. Gosh, I can’t remember.
We had about 20 people as co-authors of the paper. It was Smallgol 61? I don’t know. It was so long
ago I can’t remember. But all the authors are there.

Feigenbaum: You were still a graduate student.

Knuth: I was a graduate student, yeah. But this was my computing life.

Feigenbaum: What did your thesis advisors think of all this?

Knuth: Oh, at Case they thought it was terrible that I even touched computers. The math professor said,
“Don’t dirty your hands with that.”

Feigenbaum: You mean Cal Tech.

Knuth: No, first at Case. Cal Tech was one of the few graduate schools that did not have that opinion,
that I shouldn’t touch computers. I went to Cal Tech because they had this [strength] in combinatorics.
Their computing system was incredibly arcane, and it was terrible. I couldn’t run any programs at Cal
Tech. I mean, I would have to use punched paper tape. They didn’t even have punch cards, and their
computing system was horrible unless you went to JPL, Jet Propulsion Laboratory, which was quite a bit
off campus. There you would have to submit a job and then come back a day later. You couldn’t touch
the machines or anything. It was just hopeless. At Burroughs I could go into what they called the
fishbowl, which was the demonstration computer room, and I could run hands-on every night, and get
work done. There was a program that I had debugged one night at Burroughs that was solving a problem
that Marshall Hall, my thesis advisor, was interested in. It took more memory than the Burroughs
machine had, so I had to run it at JPL. Well, eight months later I had gotten the output from JPL and I
had also accumulated the listings that were 10 feet high in my office, because it’s a one- or two-day
turnaround time and then they give you a memory dump at the end of the run. Then you can say, “Oh, I’ll
change this and I’ll try another thing tomorrow.” It was incredibly inefficient, brain damaged computing at
Cal Tech in the early ‘60s. But I kept track with the programming languages community and I became
editor of the programming languages section of the Communications of the ACM and the Journal of the
ACM in, I don’t know, ’64, ’65, something like that. I was not a graduate student, but I was just out of
graduate school in the ‘60s. That was definitely the part of computing that I did by far the most in, in
those days. Computing was divided into three categories. By the time I came to Stanford, you were
either a numerical analyst, or artificial intelligence, or programming language person. We had three
qualifying exams and there was a tripartite division of the field.

Feigenbaum: Don, just before we leave your thesis advisor: your thesis itself was in mathematics, not in
computing, right?

Knuth: Yes.

Feigenbaum: Tell us a little bit about that and what your thesis advisor’s influence on your work was at
the time.

Knuth: Yeah, because this is combinatorial, and it’s definitely an important part of the story.
Combinatorics was not an academic subject at Case. Cal Tech was one of the few places that had it as a
graduate course, and there were textbooks that began to be written. I believe at Stanford, for example,
George Dantzig introduced the first class in combinatorics probably about 1970. It was something that
was low on the totem pole in the mathematics world in those days. The high on the totem pole was the
Bourbaki school from France, of highly abstract mathematics that was involved with higher orders of
infinities and things. I had colleagues at Cal Tech that I would say, “You and I intersect at countable
infinity, because I never think of anything that is more than countable infinity, and you never think of
anything that is less than countable infinity.” I mostly stuck to things that were finite in my own work. At
Case, when I’m a senior, we had a visiting professor, R. C. Bose from North Carolina, who was a very
inspiring lecturer. He was an extremely charismatic guy, and he had just solved a problem that became
front page news in the New York Times. It was to find orthogonal Latin squares. Now, today there is a
craze called Sudoku, but I imagine by the time people are watching this tape or listening to this tape that
craze will have faded away. An N-by-N Latin square is an arrangement of N letters so every row and
every column has all N of the letters. An orthogonal Latin square is where you have two Latin squares
with the property that if you put them next to each other, so you have a symbol from the first and a symbol
from the second, the N squared cells you get have all N squared possibilities. All combinations of A will
occur with A somewhere. A will occur with B somewhere. Z will occur with Z somewhere. A famous
paper, from 1783, I think, by Leonhard Euler had conjectured that it was impossible to find orthogonal Latin
squares that were 10 by 10, or 14 by 14, or 18 by 18, or 6 by 6 -- all the cases that were twice an odd
number. This conjecture was believed for 170 years, and even had been proved three times, but people
found holes in the proof. In 1959 R. C. Bose and two other people found that it was wrong, and they
constructed Latin squares that were 10 by 10 and 14 by 14. They showed that in all those cases
it was actually possible to find orthogonal Latin squares. I met Bose. I was taking a class from him. It
was a graduate class, and I was taking graduate classes. He asked me if I could find some 12 by 12
orthogonal Latin squares. It sounded like an interesting program, so I wrote it up and I presented him
with the answer the next morning. He was happy and impressed, and we found five mutually orthogonal
Latin squares of the order of 12. That became a paper. There are some interesting stories about that, that I won’t
go into. The main thing is that he was on the cutting edge on this research. I was at an undergraduate
place where we had great teaching, but we did not have cutting edge researchers. He could recommend
me to graduate school, and he could also tell me Marshall Hall is very good at combinatorics. He gives
me a good plug for going to Cal Tech. I had visited California with my parents on summer vacations, and
so when I applied to graduate school I applied to Stanford, Berkeley and Cal Tech, and no other places.
When I got admitted to Cal Tech, I got admitted to all three. I took Cal Tech because I knew that they had
a good combinatorial attitude there, which was not really true at Stanford. In fact, [at] Stanford I wouldn’t
have been able to study Latin squares at all. While we’re at it, I might as well mention that I got
fellowships. I got a National Science Foundation Fellowship, Woodrow Wilson Foundation Fellowship, to
come to these places, but they all had the requirement that you could not do anything except study as a
graduate student. I couldn’t be a consultant to Burroughs and also have an NSF fellowship. So I turned
down the fellowships. Marshall Hall was then my thesis advisor. He was a world class mathematician,
and had done, for a long time, pioneering work in combinatorics. He was my mentor. But it was a funny
thing, because I was such in awe of him that when I was in the same room with him I could not think
straight. I wouldn’t remember my name. I would write down what he was saying, and then I would go
back to my office so that I could figure it out. We couldn’t do joint research together in the same room.
We could do it back and forth. It was almost like farming my programs out to JPL to be run. But we did
collaborate on a few things. The one thing that we did the most on actually never got published,
however, because it turned out that it just didn’t lead to the solution. He thought he had a way to solve
the Burnside problem in group theory, but it didn’t pan out. After we did all the computation I learned a lot
in the process, but none of these programs have ever appeared in print or anything. It taught me how to
deal with tree structures inside a machine, and I used the techniques in other things over the years. He
also was an extremely good advisor, in better ways than I was with my students. He would seem to keep
track of me to make sure I was not slipping. When I was working with my own graduate students, I was
pretty much in a mode where they would bug me instead of me bugging them. But he would actually
write me notes and say, Don, why don’t you do such and such? Now, I chose a thesis topic which was to
find a certain kind of what they call block designs. I will just say: symmetric block designs with parameter
Lambda equals 2. Anybody could look that up and find out what that means. I don’t want to explain it
now. At the time I did this, I believe there were six known designs of this form altogether. I had found a
new way to look at those designs, and so I thought maybe I’ll be able to find infinitely many more such
designs. They would be mostly academic interest, although statisticians would justify that they could use
them somehow. But mostly, just, do they exist or not? This was the question. Purely intellectual
curiosity. That was going to be my thesis topic: to see if I could find lots of these elusive combinatorial
patterns. But one morning I was looking at another problem entirely, having to do with finite projective
geometry, and I got a listing from a guy at Princeton who had just computed 32 solutions to a problem
that I had been looking at with respect to a homework problem in my combinatorics class. He had found
that there are 32 solutions of Type A, and 32 solutions of Type B, to this particular problem. I said, hmm,
that’s interesting, because the 32 solutions of Type A, one of those was a well known construction. The
32 of Type B, nobody had ever found any Type B solutions before for the next higher up case. I
remember I had just gotten this listing from Princeton, and I was riding up on the elevator with Olga Todd,
one of our professors, and I said, “Mrs. Todd, I think I’m going to have a theorem in an hour. I’m going to
look at these two lists of 32 numbers. For every one on this page I am going to find a corresponding one
on this page. I am going to psyche out the rule that explains why there happen to be 32 of each kind.”
Sure enough, an hour later I had seen how to get from each solution on the first page to the solution on
the second page. I showed this to Marshall Hall. He said, “Don, that’s your thesis. Don’t worry on this
Lambda equals 2 business. Write this up and get out of here.” So that became my thesis. And it is a
good thing, because since then only one more design with Lambda equals 2 has been discovered in the
history of the world. I might still be working on my thesis if I had stuck to that problem. I felt a little guilty
that I had solved my PhD problem in one hour, so I dressed it up with a few other chapters of stuff. The
whole thesis is 70 some pages long. I discovered that it is now on the internet, probably for people’s
curiosity, I suppose: what did he write about in those days? But of all the areas of mathematics that I’ve
applied to computer science, I would say the only area that I have never applied to computer science is
the one that I did my thesis in. It just was good training for me to exercise my brain cells.
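
[Illustration, not part of the interview: the definitions of a Latin square and of a pair of orthogonal Latin squares given in the answer above can be checked mechanically. The following is a minimal Python sketch under stated assumptions: squares are lists of rows, and the function names `is_latin_square` and `are_orthogonal` are hypothetical, chosen here for illustration. It is not the program Knuth wrote for Bose.]

```python
def is_latin_square(square):
    """True if every row and every column contains all N symbols exactly once."""
    n = len(square)
    if n == 0:
        return False
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    # Every row must have length N and use exactly the same N symbols.
    if not all(len(row) == n and set(row) == symbols for row in square):
        return False
    # Every column must also use exactly those N symbols.
    return all({square[r][c] for r in range(n)} == symbols for c in range(n))


def are_orthogonal(a, b):
    """True if superimposing a and b yields all N*N ordered pairs of symbols."""
    n = len(a)
    if len(b) != n or not (is_latin_square(a) and is_latin_square(b)):
        return False
    pairs = {(a[r][c], b[r][c]) for r in range(n) for c in range(n)}
    return len(pairs) == n * n


# A small order-3 example: a[r][c] = (r + c) mod 3, b[r][c] = (2r + c) mod 3.
A = [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
B = [[0, 1, 2], [2, 0, 1], [1, 2, 0]]
```

With these definitions, `are_orthogonal(A, B)` holds for the order-3 pair, while a square is never orthogonal to itself for N greater than 1, because only the N pairs of the form (x, x) ever occur.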

Feigenbaum: Yeah. In fact for your colleagues, that is kind of a black hole in their knowledge of you
and understanding of you, is that thesis.

Knuth: The thesis, yeah. Well, I was going to say the reason that it is not used anymore is because
these designs turn out… Okay, we can construct them with all this pain and careful, deep analysis. But
it turned out later on that if we just work at random, we get even better results. So it was kind of pointless
from the point of view of that application, except for certain codes and things like that.

Feigenbaum: Don, just a footnote to that story. I intended this would come up later in the interview, but
it’s just so great a point to bring it in. When I’ve been advising graduate students, I tell them that the
really hard part of the thesis is finding the right problem. That’s at least half the problem.

Knuth: Yeah.

Feigenbaum: And then the other half is just doing it. And that’s the easy part of it. So I am not
impressed by this one hour. I mean, the hard part went into finding the problem, not in the solving of it.
We will get to, of course, the great piece of work that you did on The Art of Computer Programming. But
it’s always seemed to me that the researching and then writing the text of The Art of Computer
Programming was a problem generator for you. The way you and I have expressed it in the past is that
you were weaving a fabric and you would encounter holes in the fabric. Those would be the great
problems to solve, and that’s more than half the work. Once you find the problems you can go get at
them. Do you want to comment on that?

Knuth: Right. Well, yeah. We will probably comment on it more later, too. But I guess one of the
blessings and curses of the way I work is that I don’t have difficulty thinking of questions. I don’t have too
much difficulty in the problem generation phase -- what to work on. I have to actively suppress
stimulation so that I’m not working on too many things at once. But you can ask questions that are…
The hard thing, for me anyway, is not to find a problem, but to find a good problem. To find a problem
that has some juice to it. Something that will not just be isolated to something that happens to be true,
but also will be something that will have spin offs. That once you’ve solved the problem, the techniques
are going to apply to many other things, or that this will be a link in a chain to other things. It’s not just
having a question that needs an answer. It’s very easy to… There’s a professor; I might as well mention
his name, although I don’t like to. It would be hard to mention the concept without somebody thinking of
his name. His name is [Florentin] Smarandache. I’ve never met him, but he generates problems by the
zillions. I’ve never seen one of them that I thought had any merit whatsoever. I mean, you can generate
sequences of numbers in various ways. You can cube them and remove the middle digit, or something
like this. And say, “Oh, is this prime?”, something like that. There’s all kinds of ways of defining
sequences of numbers or patterns of things and then asking a question about it. But if one of my
students said “I want to work on this for a thesis”, I would have to say “this problem stinks”. So the hard
thing is not to come up with a problem, but to come up with a fruitful problem. Like the famous problem of
Fermat’s Last Theorem: can there be A to the N, plus B to the N equals C to the N, for N greater than 2.
It has no applications. So you found A, B and C. It doesn’t really matter to anything. But in the course of
working on this problem, people discovered beautiful things about mathematical structures that have
solved uncountably many practical applications as a spin off. So that’s one. My thesis problem that I
solved was probably not in that sense, though, extremely interesting either. It answered a question
whether there existed projective geometries of certain orders that weren’t symmetrical. All the cases that
people had ever thought of were symmetrical, and I thought of unsymmetrical ways to do it. Well, so
what? But the technique that I used for it led to some insight and got around some other blocks that
people had in other theory. I have to worry about not getting bogged down in every question that I think
of, because otherwise I can’t move on and get anything out the door.

Feigenbaum: Don, we've gotten a little mixed up between the finishing of your thesis and your assistant
professorship at Caltech, but it doesn't matter. Around this time there was the embryonic beginnings of a
multi-volume work which you're known for, "The Art of Computer Programming." Could you tell us the
story about the beginning? Because soon it's going to be the middle of it, you were working on it so fast.

Knuth: This is, of course, really the story of my life, because I hope to live long enough to finish it. But I
may not, because it's turned out to be such a huge project. I got married in the summer of 1961, after
my first year of graduate school. My wife finished college, and I could use the money I had made -- the
$5000 on the compiler -- to finance a trip to Europe for our honeymoon. We had four months of wedded
bliss in Southern California, and then a man from Addison-Wesley came to visit me and said "Don, we
would like you to write a book about how to write compilers." The more I thought about it, I decided “Oh
yes, I've got this book inside of me.” I sketched out that day -- I still have the sheet of tablet paper on
which I wrote -- I sketched out 12 chapters that I thought ought to be in such a book. I told Jill, my wife, "I
think I'm going to write a book." As I say, we had four months of bliss, because the rest of our marriage
has all been devoted to this book. Well, we still have had happiness. But really, I wake up every morning
and I still haven't finished the book. So I try to -- I have to -- organize the rest of my life around this, as
one main unifying theme. The book was supposed to be about how to write a compiler. They had heard
about me from one of their editorial advisors, that I knew something about how to do this. The idea
appealed to me for two main reasons. One is that I did enjoy writing. In high school I had been editor of
the weekly paper. In college I was editor of the science magazine, and I worked on the campus paper as
copy editor. And, as I told you, I wrote the manual for that compiler that we wrote. I enjoyed writing,
number one. Also, Addison-Wesley was the people who were asking me to do this book; my favorite
textbooks had been published by Addison Wesley. They had done the books that I loved the most as a
student. For them to come to me and say, "Would you write a book for us?", and here I am just a second-
year graduate student -- this was a thrill. Another very important reason at the time was that I knew that
there was a great need for a book about compilers, because there were a lot of people who even in 1962
-- this was January of 1962 -- were starting to rediscover the wheel. The knowledge was out there, but it
hadn't been explained. The people who had discovered it, though, were scattered all over the world and
they didn't know of each other's work either, very much. I had been following it. Everybody I could think
of who could write a book about compilers, as far as I could see, they would only give a piece of the
fabric. They would slant it to their own view of it. There might be four people who could write about it, but
they would write four different books. I could present all four of their viewpoints in what I would think was
a balanced way, without any axe to grind, without slanting it towards something that I thought would be
misleading to the compiler writer for the future. I considered myself as a journalist, essentially. I could be
the expositor, the tech writer, that could do the job that was needed in order to take the work of these
brilliant people and make it accessible to the world. That was my motivation. Now, I didn’t have much
time to spend on it then, I just had this page of paper with 12 chapter headings on it. That's all I could do
while I'm a consultant at Burroughs and doing my graduate work. I signed a contract, but they said "We
know it'll take you a while." I didn't really begin to have much time to work on it until 1963, my third year
of graduate school, as I'm already finishing up on my thesis. In the summer of '62, I guess I should
mention, I wrote another compiler. This was for Univac; it was a FORTRAN compiler. I spent the
summer, I sold my soul to the devil, I guess you say, for three months in the summer of 1962 to write a
FORTRAN compiler. I believe that the salary for that was $15,000, which was much more than an
assistant professor. I think assistant professors were getting eight or nine thousand in those days.

Feigenbaum: Well, when I started in 1960 at [University of California] Berkeley, I was getting $7,600 for
the nine-month year.

Knuth: Yeah, so you see it. I got $15,000 for a summer job in 1962 writing a FORTRAN compiler. One
day during that summer I was writing the part of the compiler that looks up identifiers in a hash table. The
method that we used is called linear probing. Basically you take the variable name that you want to look
up, you scramble it, like you square it or something like this, and that gives you a number between one
and, well in those days it would have been between 1 and 1000, and then you look there. If you find it,
good; if you don't find it, go to the next place and keep on going until you either get to an empty place, or
you find the number you're looking for. It's called linear probing. There was a rumor that one of
Professor Feller's students at Princeton had tried to figure out how fast linear probing works and was
unable to succeed. This was a new thing for me. It was a case where I was doing programming, but I
also had a mathematical problem that would go into my other [job]. My winter job was being a math
student, my summer job was writing compilers. There was no mix. These worlds did not intersect at all
in my life at that point. So I spent one day during the summer while writing the compiler looking at the
mathematics of how fast does linear probing work. I got lucky, and I solved the problem. I figured out
some math, and I kept two or three sheets of paper with me and I typed it up. [“Notes on ‘Open’
Addressing”, 7/22/63] I guess that's on the internet now, because this became really the genesis of my
main research work, which developed not to be working on compilers, but to be working on what they call
analysis of algorithms, which is, have a computer method and find out how good is it quantitatively. I can
say, if I got so many things to look up in the table, how long is linear probing going to take. It dawned on
me that this was just one of many algorithms that would be important, and each one would lead to a
fascinating mathematical problem. This was easily a good lifetime source of rich problems to work on.
Here I am then, in the middle of 1962, writing this FORTRAN compiler, and I had one day to do the
research and mathematics that changed my life for my future research trends. But now I've gotten off the
topic of what your original question was.
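
[Illustration, not part of the interview: the linear probing lookup described in the answer above (scramble the name into a table position, look there, and step to the next place until the name or an empty place is found) can be sketched in Python as follows. The class name `LinearProbingTable`, the squaring-based scramble, and the returned probe count are assumptions made for this sketch; it does not reproduce the code of the 1962 FORTRAN compiler.]

```python
class LinearProbingTable:
    """Open-addressing hash table using linear probing (illustrative sketch)."""

    def __init__(self, size=1000):
        self.size = size
        self.slots = [None] * size  # None marks an empty place

    def _home(self, name):
        # "Scramble" the name: encode it as an integer, square it, and reduce
        # modulo the table size. This stands in for whatever scrambling the
        # original compiler used; it is not meant to be a good hash function.
        n = int.from_bytes(name.encode("utf-8"), "big")
        return (n * n) % self.size

    def lookup(self, name, insert=False):
        """Return (index, probes) if found or inserted, else (None, probes)."""
        i = self._home(name)
        for probes in range(1, self.size + 1):
            if self.slots[i] is None:      # reached an empty place
                if insert:
                    self.slots[i] = name
                    return i, probes
                return None, probes
            if self.slots[i] == name:      # found the name we are looking for
                return i, probes
            i = (i + 1) % self.size        # go to the next place, wrapping around
        if insert:
            raise RuntimeError("table is full")
        return None, self.size
```

The probe count returned by `lookup` is the quantity whose average behavior the analysis mentioned above is concerned with: as the table fills up, runs of occupied places grow and more probes are needed per lookup.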

Feigenbaum: We were talking about sort of the… You talked about the embryo of The Art of Computing.
The compiler book morphed into The Art of Computer Programming, which became a seven-volume plan.

Knuth: Exactly. Anyway, I'm working on a compiler and I'm thinking about this. But now I'm starting,
after I finish this summer job, then I began to do things that were going to be relating to the book. One of
the things I knew I had to have in the book was an artificial machine, because I'm writing a compiler book
but machines are changing faster than I can write books. I have to have a machine that I'm totally in
control of. I invented this machine called MIX, which was typical of the computers of 1962. In 1963 I
wrote a simulator for MIX so that I could write sample programs for it, and I taught a class at Caltech on
how to write programs in assembly language for this hypothetical computer. Then I started writing the
parts that dealt with sorting problems and searching problems, like the linear probing idea. I began to
write those parts, which are part of a compiler, of the book. I had several hundred pages of notes
gathering for those chapters for The Art of Computer Programming. Before I graduated, I've already
done quite a bit of writing on The Art of Computer Programming. I met George Forsythe about this time.
George was the man who inspired both of us [Knuth and Feigenbaum] to come to Stanford during the
'60s. George came down to Southern California for a talk, and he said, "Come up to Stanford. How
about joining our faculty?" I said "Oh no, I can't do that. I just got married, and I've got to finish this book
first." I said, "I think I'll finish the book next year, and then I can come up [and] start thinking about the
rest of my life, but I want to get my book done before my son is born.” Well, John is now 40-some years
old and I'm not done with the book. Part of my lack of expertise is any good estimation procedure as to
how long projects are going to take. I way underestimated how much needed to be written about in this
book. Anyway, I started writing the manuscript, and I went merrily along writing pages of things that I
thought really needed to be said. Of course, it didn't take long before I had started to discover a few
things of my own that weren't in any of the existing literature. I did have an axe to grind. The message
that I was presenting was in fact not going to be unbiased at all. It was going to be based on my own
particular slant on stuff, and that original reason for why I should write the book became impossible to
sustain. But the fact that I had worked on linear probing and solved the problem gave me a new unifying
theme for the book. I was going to base it around this idea of analyzing algorithms, and have some
quantitative ideas about how good methods were. Not just that they worked, but that they worked well:
this method worked 3 times better than this method, or 3.1 times better than this method. Also, at this
time I was learning mathematical techniques that I had never been taught in school. I found they were
out there, but they just hadn't been emphasized openly, about how to solve problems of this kind. So my
book would also present a different kind of mathematics than was common in the curriculum at the time,
that was very relevant to analysis of algorithms. I went to the publishers, I went to Addison Wesley, and
said "How about changing the title of the book from ‘The Art of Computer Programming’ to ‘The Analysis
of Algorithms’." They said that will never sell; their focus group couldn't buy that one. I'm glad they stuck
to the original title, although I'm also glad to see that several books have now come out called “The
Analysis of Algorithms”, 20 years down the line. But in those days, The Art of Computer Programming
was very important because I'm thinking of the aesthetical: the whole question of writing programs as
something that has artistic aspects in all senses of the word. The one idea is “art” which means artificial,
and the other “art” means fine art. All these are long stories, but I’ve got to cover it fairly quickly. I've got
The Art of Computer Programming started out, and I'm working on my 12 chapters. I finish a rough draft
of all 12 chapters by, I think it was like 1965. I've got 3,000 pages of notes, including a very good
example of what you mentioned about seeing holes in the fabric. One of the most important chapters in
the book is parsing: going from somebody's algebraic formula and figuring out the structure of the
formula. Just the way I had done in seventh grade finding the structure of English sentences, I had to do
this with mathematical sentences. Chapter ten is all about parsing of context-free language, [which] is
what we called it at the time. I covered what people had published about context-free languages and
parsing. I got to the end of the chapter and I said, well, you can combine these ideas and these ideas,
and all of a sudden you get a unifying thing which goes all the way to the limit. These other ideas had
sort of gone partway there. They would say “Oh, if a grammar satisfies this condition, I can do it
efficiently.” “If a grammar satisfies this condition, I can do it efficiently.” But now, all of a sudden, I saw
there was a way to say I can find the most general condition that can be done efficiently without looking
ahead to the end of the sentence. That you could make a decision on the fly, reading from left to right,
about the structure of the thing. That was just a natural outgrowth of seeing the different pieces of the
fabric that other people had put together, and writing it into a chapter for the first time. But I felt that this
general concept, well, I didn't feel that I had surrounded the concept. I knew that I had it, and I could
prove it, and I could check it, but I couldn't really intuit it all in my head. I knew it was right, but it was too
hard for me, really, to explain it well. So I didn't put it in The Art of Computer Programming. I thought it
was beyond the scope of my book. Textbooks don't have to cover everything when you get to the harder
things; then you have to go to the literature. My idea at that time [is] I'm writing this book and I'm thinking
it's going to be published very soon, so any little things I discover and put in the book I didn't bother to
write a paper and publish in the journal because I figure it'll be in my book pretty soon anyway. Computer
science is changing so fast, my book is bound to be obsolete. It takes a year for it to go through editing,
and people drawing the illustrations, and then they have to print it and bind it and so on. I have to be a
little bit ahead of the state-of-the-art if my book isn't going to be obsolete when it comes out. So I kept
most of the stuff to myself that I had, these little ideas I had been coming up with. But when I got to this
idea of left-to-right parsing, I said "Well here's something I don't really understand very well. I'll publish
this, let other people figure out what it is, and then they can tell me what I should have said." I published
that paper I believe in 1965, at the end of finishing my draft of the chapter, which didn't get as far as that
story, LR(k). Well now, textbooks of computer science start with LR(k) and take off from there. But I
want to give you an idea of…

Feigenbaum: Don, for historical reasons, tell the audience where the LR(k) paper was published so they
can go look it up.

Knuth: It was published in the journal called Information and Control, which has now changed its name
to Information and Computation. In those days, you can see why they called it Information and Control. It
was the journal that had had the best papers on parsing of languages at the time. It's a long paper, and
difficult. It's also reprinted in my book “Selected Papers on Computer Languages”, with a few corrections
to the original. In the original, I drew the trees with the root at the bottom. But everybody draws trees with
the root at the top now, so the reprint has trees drawn in a more modern notation. I'm trying to give the
flavor of the way things were in 1965. My son was born in the summer of '65, and I finished this work on
LR(k) at Christmastime in '65. Then I had, I think, one more chapter to write. But early in '66 I had all

CHM Ref: X3926.2007 © 2007 Computer History Museum Page 17 of 73


Oral History of Donald Knuth

3000 pages of the manuscript ready. I typed chapter one. My idea was, I looked at these pages -- the
pages were all hand-written -- and it looked to me like my handwriting, I would guess, that was, I don't
know how many words there were on a page. I had chapter one and I typed it and I sent it to the
publishers, and they said "Don, what have you written? This book is going to be huge." I had actually
written them a letter earlier as I'm working on sorting. I said to the guy who signed me up, I signed a
contract with him; by this time, he had been promoted. No, I'm not sure about this, but anyway, I
wrote to him in '63 or '64 saying, "You know, as I'm working on this book on compilers, there's a few
things that deserve more complete treatment than a compiler writer needs to know. Do you mind if I add a
little bit here?" They said "Sure, Don, go right ahead. Whatever you think is good to write about, do it."
Then I send them chapter one a few years later. By this time, I guess the guy's promoted, and he's
saying "Oh my goodness, what are we going to do? Did you realize that this book is going to be more
than 2,000 pages long?", or something like this. No, I didn't. I had read a lot of books, and I thought I
understood about things. I had my typed pages there, and I was figuring five typed pages would go into
one page of text. It just looked to me, to my eyes, if I had five typewritten pages -- you know, the letters in
a textbook are smaller. But I should have realized that the guys at the publishing house knew something
about books too. They told me "No, no, it was one and a half pages of text makes a book [page]." I didn't
believe it. So I went back to my calculus book, which was an Addison Wesley book, and I typed it out.
Sure enough, they were absolutely right. It took one and a half pages. So I had three times longer. No
wonder it had taken me so long to get chapter one done! I'm sitting here with much, much more than I
thought I had. Meanwhile computer science hasn't been standing still, and I knew that more still has to
be written as I go. I went to Boston, and I happened to catch a glance at some notes that my editor had
written to himself for the meeting that we were going to have with his bosses, and one of the comments
on there was "Terrific cost bind" or something like that. Publishing houses all have their horror stories
about a professor who writes 12 volumes about the history of an egg, or something like this, and it never
sells, and it just is a terrible thing that they have a contract that they've signed. So they have to figure out
how to rescue something out of this situation coming with this monster book. We thought at first we
would package it into three volumes instead of one. Then they sent out chapter one to a dozen readers
in a focus group, and they got comments on it. Well, the readers liked what they saw in that chapter, and
so at least I had some support from them. Then after a few more months we decided to package it. They
figured out that of the 12 chapters there were seven of them that would sell, and we could stick the other
five in some way that would make a fairly decent seven-volume set. That was what was finally
announced in 1966 or something: that it would come out in seven volumes. After typing chapter one I
typed chapter two, and so on. I kept working on it. All the time when I'm not teaching my classes at
Caltech, I'm typing up my notes and polishing the hand-written notes that I had made from these 3000
pages of rough draft. That sets the scene for the early days of The Art of Computer Programming.

Feigenbaum: What year are we at now?

Knuth: What happened is, I'm at Caltech. I'm a math professor. I'm teaching classes in algebra and
once in a while combinatorics at Caltech. Also one or two classes connected with computing, like sorting,
I think I might have taught one quarter. But most of the things I'm teaching at Caltech are orthogonal to
The Art of Computer Programming. My daughter is born in December of '66. I've got the entire
manuscript of volume one to the publisher, I think, during '66. I'm working on typing up chapters three
and four at the beginning of '67. I think this is approximately the way things stand. I was trying to finish
the book before my son was born in '65, and what happened is that I got… I'm seeing now that…
Volume one actually turned out to be almost 700 pages, which means 1,000 type-written pages. You can
see why I said that my blissful marriage wasn't quite so blissful, because I'm working on this a lot. I'm
doing most of it actually watching the late late show on television. I have also some earplugs for when
the kids are screaming a little bit too much. Here I am, typing The Art of Computer Programming when
the babies are crying, although I did also change diapers and so on.

Feigenbaum: I think that what we need to do is talk about… This is December '66, when your daughter
was born.

Knuth: Yeah.

Feigenbaum: That leads sort of directly into this magical year of 1967, which didn't end so magically.
Let's continue on with 1967 in a moment.

Knuth: Okay.

Feigenbaum: Don, once you told me that 1967 was your most creative year. I'd like to get into it. You
also said you had only a very short time to do your research during that year, and the year didn't end so
well for you. Let's talk about that.

Knuth: Well, it's certainly a pivotal year in my life. You can see in retrospect why I think things were
building up to a crisis, because I was just working at high pitch all the time. I think I mentioned I was
editor of ACM Communications, and ACM Journal, in the programming languages sections. I took the
editorial duties very seriously. A lot of people were submitting papers, and I would write long referee
reports in many cases, as well as discussing with referees all the things I had to be doing. I was a
consultant to Burroughs on innovative machines. I was consumed with getting The Art of Computer
Programming done, and I had children, and being a father, and husband. I would start out every day and
I would say "Well, what am I going to accomplish today?" Then I would stay up until I finished it. I used
to be able to do this. When I was in high school and I was editor of the paper, I would do an all-nighter
every week when the paper came out. I would just go without sleep on those occasions. I was sort of
used to working in this mode, where I didn't realize I was punishing my body. We didn't have iPods and
things like that, but still I had the TV on. That was enough to kill the boredom while I had to do the typing
of a lot of material. Now, in 1967, is when things came to a head. Also, it was time for me to make a
career decision. I was getting offers. I think I was offered full professorships at North Carolina in Chapel
Hill, and also at Purdue, I think. I had to make a decision as to what I should do. I was promoted to
Associate Professor at Caltech surprisingly early. The question is, where should I spend the rest of my
life? Should I be a mathematician? Should I be a computer scientist? By this time I had learned that
it was actually possible to do mathematical work as a computer scientist. I had analysis of algorithms
to do. What would be a permanent home? I visited Stanford. I gave a talk about my left-to-right parsing. I
discovered a theorem about it sitting in one of the student dormitories, Stern Hall, the night I gave the
lecture. I came up there, I liked George Forsythe very much, I liked the people that I met here very much.
I was thinking Stanford would be a nice place, but also there were other places too that I wanted to check
out carefully. I was also trying to think about what to do long-term for my permanent home. I don't like to
move. My model of my life was going to be that I was going to make one move in my lifetime to a place
where I had tenure, and I would stay there forever. I wanted to check all these things out, so I was
confronted with this aspect as well. I was signed up to be an ACM lecturer, ACM National Lecture
Program, for two or three weeks in February of 1967, which meant that I give a list of three talks. Each
ACM chapter or university that wants to have a speaker, they coordinate so that I have a schedule. I go
from city to city every day. You probably did the same thing about then.

Feigenbaum: Yep.

Knuth: Stanford and Berkeley were on this list, as well as quite a few schools in the east. That was
three weeks in February where I was giving talks, about different things about programming languages,
mostly. When I'm at Caltech, I've got to be either preparing my class lectures, or typing my book and
getting it done. I'm in the middle of typing chapter four at this time, which is the second part of volume
two. I'm about, I don't know, one third of the way into volume two. That's why I don't have time to do
research. If I get a new idea, if I'm saying "Here's a problem that ought to be solved", when am I going to
do it? Maybe on the airplane. As you know, when you're a lecturer every day goes the same way. You
get up at your hotel, and you get on the plane. Somebody meets you at noon and you go out to lunch
and then they have small talk. They ask you the same questions; "Where are you going to be tomorrow,
Don", and so on. You give your lecture in the afternoon, there's a party in the evening, and then you go
to your hotel. The next morning you just go off to the next city. After three weeks of this, I got really not
very good. I skipped out in one case. There was a snowstorm in Atlanta, so I skipped my talk in Atlanta
and I stayed an extra day. I'm trying to give you the flavor of this. But on this trip in February, also, it
turned out to be very fruitful because one of my stops was in Cornell, where Peter Wegner was a visiting
professor. We went out for a hike that weekend to talk about the main topic in programming languages in
those days: how do you define the semantics of a programming language. What's a good way to
formalize the meaning of the sentences in that language? When someone writes a string of symbols, we
wanted to say exactly what that means, and do it in a way that we can prove interesting results about,
and make sure that we’ve translated it correctly. There were a lot of ideas floating in the air at the time. I
had been thinking of how I'm presenting it in The Art of Computer Programming. I said, well, you know,
there were two basic ways to do this. One is top down, where you have the context telling you what to
do. You start out and you say, “Oh, this is supposed to be a program. What does a program mean?”
Then a program tells the things inside the program what they're supposed to mean. The other is bottom
up, where you just start with one symbol, this is a number one, and say “this means one”, and then you
have a plus sign, and one plus two, and you build up from the bottom, and say “that means three”. So we
have a bottom-up version of semantics, and a top-down version of semantics. Peter Wegner says to me
"Don, why don't you use both top-down and bottom-up? Have the synthesized attributes from the bottom
up and the inherited attributes that come down from the environment." I said "Well, this is obviously
impossible. You get into circular reasoning. You can't define something in terms of itself." We were
talking about this, and after ten minutes I realized I was shouting to him, because I was realizing that he
was absolutely right. You could do it both ways, and define the things in a way that they would not
interfere with each other; that certain aspects of the meaning could come from the top, and other aspects
from the bottom, and that this actually made a beautiful combination.
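
To make the combination concrete, here is a minimal Python sketch of an attribute evaluation that mixes the two directions. It is loosely modeled on the binary-numeral example commonly used to introduce attribute grammars; the class and function names (`BitList`, `parse_bits`, `length`, `value`, `number_value`) are illustrative assumptions, not anything quoted in the interview. The `scale` of each bit is inherited (passed down from the context), while `length` and `value` are synthesized (passed up from the parts). The scale handed to the fractional part depends on that part's own synthesized length, yet nothing ends up defined in terms of itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BitList:
    """Parse-tree node for L -> B | L B, with the last bit split off."""
    prefix: Optional["BitList"]  # the shorter list to the left, if any
    last: int                    # the rightmost bit, 0 or 1


def parse_bits(text: str) -> BitList:
    """Build the left-recursive parse tree for a non-empty string of bits."""
    if not text or set(text) - {"0", "1"}:
        raise ValueError("expected a non-empty string of 0s and 1s")
    node = None
    for ch in text:
        node = BitList(node, int(ch))
    return node


def length(node: BitList) -> int:
    """Synthesized attribute: computed bottom-up from the subtrees."""
    return 1 if node.prefix is None else length(node.prefix) + 1


def value(node: BitList, scale: int) -> float:
    """Synthesized value, given the inherited scale of the rightmost bit.

    The scale flows top-down: the prefix list inherits scale + 1.
    """
    own = node.last * 2 ** scale
    if node.prefix is None:
        return own
    return value(node.prefix, scale + 1) + own


def number_value(int_part: str, frac_part: str = "") -> float:
    """Rule for N -> L | L . L: choose the inherited scales, add the values."""
    total = value(parse_bits(int_part), 0)
    if frac_part:
        frac = parse_bits(frac_part)
        # Inherited scale of the fraction depends on its synthesized length.
        total += value(frac, -length(frac))
    return total


print(number_value("1101", "01"))  # binary 1101.01
```

The evaluation order is the point of the sketch: `length` is computed first, then fed back down as a scale, and only then is `value` computed. That ordering is what keeps the definition from being circular.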

Ed Feigenbaum: Don, we were speaking about semantics of programming languages and you were
shouting at Peter Wegner.

Don Knuth: I’m shouting at Peter Wegner because it turns out that there’s a beautiful way to combine
the top-down and bottom-up approaches simultaneously when you’re defining semantics. This is
happening on a weekend as we’re hiking at Cornell in a beautiful park by frozen icicles. I can remember
the scene because this was kind of an “aha” moment that doesn’t happen to you very often in your life.
People tell me now no one’s allowed in that park in February because it’s too risky that you’re going to
slide and hurt yourself. It was when all of a sudden it occurred to me that this might be possible. But I
don’t have time to do research. I have to go on and give more lectures. Well, I find myself the next week
at Stanford University speaking to the graduate students. I gave one of my regular lectures, and then
there was an hour where the students ask questions to the visitor. There was a student there named
Susan Graham, who of course turned out to be a very distinguished professor at Berkeley and editor of
Transactions on Programming Languages and Systems, and she asked me a question. “Don, how do
you think would be a good way to define semantics of programming languages?” In the back of my mind
through that week I had been tossing around this idea that Peter and I had talked about the week before.
So I said, “Let’s try to sketch out a simple language and try to define its semantics”. On the blackboard,
in response to Susan’s questions, we would erase, and try things, and some things wouldn’t work. But
for the next 15 or 20 minutes I tried to write down something that I had never written down before, but it
was sort of in the back of my mind: how to define a very simple algebraic language and convert it into a
very simple machine language which we invented on the spot to be an abstract but very simple computer.
Then we would try to write out the formal semantics for this, so that I could write a few lines in this
algebraic language, and then we could parse it and see exactly what the semantics would be, which
would be the machine language program. Of course there must have been a lot of bugs in it, but this is
the way I had to do research at that time. I had a chance while I’m in front of the students to think about
the research problem that was just beginning to jell. Who knows how bad it was fouled up, but on the
other hand, being a teacher, that’s when you get your thoughts in order best. If you’re only talking to
yourself, you don’t organize your thoughts with as much discipline. It probably was also not a bad way to
do research. I didn’t get a chance to think about it when I got home to Caltech because I’m typing up The
Art of Computer Programming when I’m at home, and I’m being an editor, and I’m teaching my classes
the rest of the time at Caltech. Then in April I happened to be giving a lecture in Grenoble, and a
Frenchman, Louis Bolliet, asked me something about how one might define semantics, in another sort of
a bull session in Grenoble in France. That was my second chance to think about this problem, when I
was talking with him there. I was stealing time from the other things. That wasn’t the only thing going on
in ’67. I wasn’t only thinking of what to do with my future life, and editing journals and so on, I’m also
teaching a class at Caltech for sophomores. It’s an all year class, sort of an introduction to abstract
mathematics. While I was looking at a problem, we had a visitor at Caltech named Trevor-- what’s his
last name-- Evans, Trevor Evans. He and I were discussing how to work from axioms, and to prove
theorems from axioms. This is a basic thing in abstract mathematics. Somebody sets down an axiom,
like the associative law; it says that parentheses “ab” times “c” is equal to “a” times parentheses “bc.”
That’s an axiom. I was looking at other axioms that were sort of random. One of the things I asked my
students in the class was, I was trying to teach the sophomores how to do mini research problems. So I
gave them axioms which I called the “axioms of a grope.” They were supposed to develop “grope theory”
-- they were supposed to grope for theorems. Of course the mathematical theory well developed is a
“group”, which I had been teaching them; axioms of groups. One of them is the associative law. Another
axiom of groups is that an element times its inverse is equal to the identity. Another axiom is that the
identity times anything, identity times “X”, is “X”. So groups have axioms. We learned in the class how to
derive consequences of these axioms that weren’t exactly obvious at the beginning. So I said, okay, let’s
make a “grope.” The axiom for a grope is something like “x” times the quantity “yx” was equal to “y”. I give
them this axiom, and I say to the class, what can you derive? Can you find all gropes that have five
elements? Can you prove any theorems about normal subgropes, or whatever it is? Make up a theory.
As a class we came back in a week saying what theorems could you come up with. We tried to imagine
ourselves in the shoes of an inventor of a mathematical theory, starting with axioms. Well, Trevor Evans
was there and he showed me how to define what we called the “free grope,” which is the set of all… It
can be infinite, but you take all strings of letters, all formulas. Is it possible to tell whether one formula
can be proved equal to the other formula just by using this one axiom of the grope, “x” times “yx” equals
“y”? He showed me a very nice way to solve that problem, because he had been working on word
problems in what’s called universal algebra, the study of axiom systems. While I was looking at Trevor
Evans’ solution to this problem -- this problem arose in connection with my teaching of the class -- I
looked at Trevor Evans’ solution to this problem and I realized that I could develop an actual method that
would work with axioms in general, one that a machine could carry out without thinking. The machine could
start out with the axioms of group theory, and after a small amount of computation, it could come up with
a set of 10 consequences of those axioms that would be enough to decide the word problem for free
groups. And the machine was doing it. We didn’t need a mathematician there to prove, to say, “Oh, now
try combining this formula and this formula.” With the technique I learned from Trevor Evans, and then
with a little extra twist that I put on it, I could set the machine going on axioms and it would automatically
know which consequences of these things, which things to plug in, would be potentially fruitful. If we were
lucky, like we were in the case of group theory axioms, it would finally get to the end and say, “Now,
there’s nothing more that can be proved. I’ve got enough. I’ve got a complete set of reductions. If you apply
these reductions and none of them applies, you’ve got it.” It relates to AI techniques of expert systems, in
a way. This idea came to me as I’m teaching this basic math class. The students in this class were
supposed to do a term paper. In the third quarter, everybody worked on this. One of the best students in
the class, Peter Bendix, chose to do his term paper by implementing the algorithm that I had sketched on
the blackboard in one of the lectures at that time. So we could do experiments during the spring of ’67,
trying out a whole bunch of different kinds of axioms and seeing which ones the machine would solve and
which ones it would keep spinning and keep generating more and more reductions that seemed to go
without limit. We figured out in some cases how we could introduce new axioms that would bring the
whole thing back down again. So we’re doing a lot of experiments on that kind of thing. I don’t have time
to sit down at home and work out the theory for it, but I knew it had lots of possibilities. Here I had
attribute grammars coming up in February, and these reduction systems coming up in March, and I’m
supposed to be grinding out Volume Two of The Art of Computer Programming. The text of volume one
had gone to Addison-Wesley the previous year, and the copy editor had sent me back corrections and
told me, “Don, this isn’t good writing. You’ve got to change this,” and he’d teach me Addison-Wesley
house style. The page proofs started coming. I started going through galley proofs, but now it was time to
get page proofs for volume one. Volume one was published in January of 1968, but the page proofs
started to be available in the spring also.
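
The idea of a "complete set of reductions" described above can be illustrated with a small Python sketch. It does not implement the completion procedure itself; it only shows how a finished rule set is used to decide the word problem: rewrite both terms until no rule applies, then compare the results. The ten rules below are the set usually quoted in the term-rewriting literature for the group axioms. The transcript does not list them, so they should be read as an assumption, as should the term encoding (nested tuples, with strings standing for variables in rules and for generators in the terms being tested) and every function name.

```python
E = ("e",)  # the identity element


def mul(a, b):
    return ("*", a, b)


def inv(a):
    return ("i", a)


# Assumed complete set of ten reductions for groups (left side -> right side).
RULES = [
    (mul(E, "x"), "x"),
    (mul("x", E), "x"),
    (mul(inv("x"), "x"), E),
    (mul("x", inv("x")), E),
    (mul(mul("x", "y"), "z"), mul("x", mul("y", "z"))),
    (inv(E), E),
    (inv(inv("x")), "x"),
    (mul(inv("x"), mul("x", "y")), "y"),
    (mul("x", mul(inv("x"), "y")), "y"),
    (inv(mul("x", "y")), mul(inv("y"), inv("x"))),
]


def match(pattern, term, env):
    """Return bindings that make pattern equal to term, or None."""
    if isinstance(pattern, str):  # a rule variable
        if pattern in env:
            return env if env[pattern] == term else None
        return {**env, pattern: term}
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        env = match(p, t, env)
        if env is None:
            return None
    return env


def substitute(template, env):
    """Instantiate a rule's right side with the bindings found by match."""
    if isinstance(template, str):
        return env[template]
    return (template[0],) + tuple(substitute(t, env) for t in template[1:])


def rewrite_once(term):
    """Apply one rule at the root or in a subterm; None if nothing applies."""
    for lhs, rhs in RULES:
        env = match(lhs, term, {})
        if env is not None:
            return substitute(rhs, env)
    if not isinstance(term, str):
        for k in range(1, len(term)):
            new = rewrite_once(term[k])
            if new is not None:
                return term[:k] + (new,) + term[k + 1:]
    return None


def normal_form(term):
    """Keep reducing until none of the rules applies."""
    while True:
        nxt = rewrite_once(term)
        if nxt is None:
            return term
        term = nxt


def provably_equal(s, t):
    """Two terms are equal in every group iff their normal forms coincide."""
    return normal_form(s) == normal_form(t)


print(provably_equal(mul(mul("a", inv("b")), mul("b", inv("a"))), E))
print(provably_equal(mul("a", "b"), mul("b", "a")))
```

The first query asks whether (a b⁻¹)(b a⁻¹) equals the identity in every group; the second asks whether multiplication always commutes. Comparing normal forms only gives a decision procedure because the rule set is assumed to be both terminating and confluent.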

Q: So it’s layer, upon layer, upon layer.

Don Knuth: Right. There’s a conference in April in Norway on simulation languages; that was another of
the things that I’d been working on at Burroughs. We had a language called SOL, Simulation Oriented
Language, which was an improvement of the state-of-the-art in systems simulation, in what they called
discrete simulation languages. There was an international conference held in Norway by the people who
had invented the Simula language, which wasn’t very well known. They organized this conference and I
went to that, visiting Paris and Grenoble on my way because Maurice Nivat and I had also become
friends. His thesis was on theory of context-free grammars, and no one in France would read it. He
found a guy in America who would appreciate his work, so he came out and we spent some time together
in ’66 getting to know each other and talking about context-free grammar research. I visited him in Paris
and then I went to Grenoble, and then went to Norway for this conference on simulation languages where
I presented a paper about SOL, and learned about Simula, and so on. My parents and Jill’s parents are
taking care of our kids while we’re in Europe during this time in April. I’m scheduled in June to lecture at
a summer school in Copenhagen, an international summer school. I’m giving lectures about how to
parse, what’s called top-down parsing. “LL(k)” is the terminology that developed after these lectures. This
was a topic that I did put in my draft of Chapter 10. It was something that I understood well enough that I
didn’t have to publish it at the time. I gave it for the first time in these lectures in June in Copenhagen.
That was a one-week series of lectures with several lectures every day, five days, to be given there. The
summer school met for two weeks, and I was supposed to speak in the second week of that summer
school. All right. What happened then in May is I had a massive bleeding ulcer, and I was in the hospital.
My body gave out. I was just doing all this stuff, and it couldn’t take it. I learned about myself. I had a
wonderful doctor who showed me his textbook about ulcers. At that time they didn’t know that ulcers are
related to this bacteria. As far as they were concerned, it was just acid.

Q: Stress.

Don Knuth: Yeah. People would get operations so that their stomachs wouldn’t produce so much acid,
and things like that. Anyway, he showed me his textbook, and his textbook described the typical ulcer
patient; what other people call the “Type A” personality. It just described me to a “T”, all of the things that
were there. I was an automaton, I think, basically. I had been all my life pretty much a test-taking
machine. You know, I saw a goal and I put myself to it, and I worked on it and pushed it through. I didn’t
say no to people when they said, “Don, can you do this for me?” At this point I saw, I could all of a
sudden get to understand, that I had this problem; that I shouldn’t try to do the impossible. The doctor, I
say he’s so wonderful because doctors usually talk down to patients and they keep their secrets to
themselves. But here he let me look at this textbook so I could know that he wasn’t just telling me
something to make me feel good. I had access to anything I wanted to know about my condition. So I
wrote a letter to my publisher, framed in black, saying, “I’m not going to be able to get the manuscript of
volume two to you this year. I’m sorry. I’m not supposed to work for the next three weeks.” In fact, you
can tell exactly where this was. I was writing in a part of volume two when the ulcer happened, when it
started to burst or whatever. I was working out the answer to a problem about greatest common divisors
that goes about in the middle of volume two. It was an exercise where the answer had a lot of cases to it,
so it takes about a page and a half to explain the answer. It was a problem that needed to be studied and
nobody had studied before, and I was working at it. All of a sudden, bingo. The reason you can find it is
if you look in the index to volume two under “brute force,” it refers you to a page, an answer page. I was
solving this problem by brute force, and so you look at that page, you can see exactly what exercise I was
working on. Then I put it away. I only solved half of the exercise before I could work on it again a few
weeks later. I went into the hospital. It wasn’t too bad, but the blood supply… I took iron pills and got
ready. I could still go to Copenhagen to give my lectures in June. However, the first week was supposed
to be lectures by Niklaus Wirth, and the second week was supposed to be lectures by me. But Klaus
had just gone on an around the world tour with his wife and had come down with dysentery in India and
was extremely ill, and had to cancel his lectures. So I was supposed to go on in the first week instead.
But I was stealing time so bad, I hadn’t really prepared my lectures. I said, oh, I have a week. I’ll go to
Copenhagen, listen to Klaus and I’ll prepare my lectures. I hadn’t prepared. So I’m talking about stuff that
has never been written down before, never been developed with the students. I get to Copenhagen with
one day to prepare for this week of lectures. Well, one thing in Copenhagen, there’s wonderful parks all
over the city. I sat down under a big tree in one of those parks on the first day, and I thought of enough
things to say in my first two lectures. On the second day I gave the lectures, and I sat down under that
tree and I worked out the lectures for the next day. These lectures became my paper called “Top-Down
Syntax Analysis.” That was the story of the first part of June. The second part of June I’m going to a
conference in Oxford, one of the first conferences on discrete mathematics. There I’m presenting my
paper on the new method that I had discovered, now called the Knuth-Bendix algorithm, about the word
problems in universal algebra. After I finished my lectures at Copenhagen I had time to write the paper
that I was giving at Oxford the following week. There at Oxford, I meet a lot of other people and get more
stimulated about combinatorial research, which I can’t do. Come back to Caltech and I’m working as a
consultant as well. I resigned from ten editorial boards at this time. No more ACM Journal, no more
Communications. I gave up all of the editorships that I was on in order to cut down my work load. I started
working again on volume two where I left off at the time of the ulcer, but I would be careful to go to sleep
and keep a regular schedule. In the fall I went to a conference in Santa Barbara, a conference on
combinatorial mathematics. That was my first chance to be away from Caltech, away from my teaching
duties, away from having to type The Art of Computer Programming. That’s where I had three days to sit
on the beach and develop the theory of attribute grammars, this idea of top-down and bottom-up. I cut
out of the whole conference. I didn’t go to any of the talks. I just sat on the beach and worked on the
theory of attribute grammar. As it turned out, I wasn’t that interested in most of the talks, although I met
people that became lifelong friends at the meals and we talked about things off-line. But the formal talks
themselves, I was getting disappointed with mathematical talks. I found myself, in most lectures on
mathematics that I heard in 1966 and ’67, I sat in the back row and I said, “So what? So what?” Computer
science was becoming much more exciting to me. When I finally made my career decision as to where to
go, I had four main choices. One was stay at Caltech. They offered me full professor of mathematics. I
could go to Harvard as a full professor in applied science, which meant computer science. That was as
close as you could get to computer science there. At Harvard my job would have been to build up a
computer science department there. Harvard was, in Floyd’s term, an advanced backwater at that point
in time for computer science, and Caltech was as well. Because Caltech and Harvard are so good at
physics and chemistry and biology, they were thinking of computers because they can help chemists and
physicists and biologists. They didn’t think of it as having problems of its own interest. Stanford, where
we had the best group of computer scientists in the world already there, and knowing that computer
science had a great future, and also the best students in the world were there to work with, the program
was already built up. I could come to Stanford and be one of the boys and do computer science, instead
of argue for computer science and try to do barnstorming. Berkeley was the fourth place. I admired
Berkeley very much as probably the greatest all around institution for covering everything. Everything
Stanford covered it covered well, but it didn’t have a professor of Sanskrit, and Berkeley had a professor
of Sanskrit, that sort of thing. But I was worried about Berkeley because Ronald Reagan was governor.
Stanford was a private school and wouldn’t be subject to the whims of politicians so much as the
University of California. Stanford had this great other thing where the faculty can live on campus, so I
knew that I could come to Stanford and the rest of my life I would be able to bike to work; I wouldn’t have
to do any commuting. And Forsythe was a wonderful person, and all the group at Stanford were great,
and the students were the best. So it was almost a no-brainer, why I finally came to Stanford. My offer
from Stanford came through in February of ’68, which was the end. The other three had already come in
earlier, but I was waiting for Stanford before I made my final decision. In February of ’68 I finally got the
offer from Stanford. It was a month after volume one had been published, and George said, “Oh yes,
everybody’s all smiles now.”

Ed Feigenbaum: Everyone was all smiles because they had gone out on a limb to offer you a full
professorship?

Don Knuth: No, because the committees were saying, “This guy is just 30 years old.” You know, I was
born in ’38 and this was January of ’68. But when they looked at the book, they said, “Oh, there’s some
credibility here.” That helped me. I got through ’67 and learned how to slack off a little bit, right? I’ve
always felt after that, hearing many other stories of people of when did they get these special insights that
turned out to be important in their research thing, that was very rarely in a settled time of their life, where
they had comfortable living conditions and good – the word is escaping me now - but anyway, luxury;
set up a nice office space and good lighting and so forth. No, people are working in a garret, they’re
starving, they’ve got kids screaming, there’s a war going on or something. But that’s when they get a lot
of their most… almost every breakthrough idea. I’ve always wondered, if you wanted to set up a think
tank where you were going to get the most productivity out of your scientists, wouldn’t you have to, not
exactly torture them, but deprive them of things? It’s not sustainable. Still, looking back, that was a time
when I did as much science as I could, as well as try to fulfill all my other obligations.

Ed Feigenbaum: Don, to go back to the Stanford move. A couple of questions come up, because I was
around. I remember sitting in George Forsythe’s office, just a handful of us people considering the
appointment of this young guy from Caltech who had this wonderful outline of books. One of the things
that we were discussing was [that] Don Knuth wanted us to also hire Bob Floyd. It turns out that hiring
Bob Floyd was a wonderful idea. Bob Floyd was magnificent. But it hadn’t occurred to us until you
brought it up, and then we did it. Can you go into that story?

Don Knuth: Yeah, because Bob was a very special person to me throughout this period. As I said, I’d
been reading the literature about programming languages avidly. When I was asked to write a book
about it in ’62, I knew there were these people who had written nice papers, but nobody knew how to sort
out the chaff from the wheat. In the early days, like by 1964, my strong opinion was that five good papers
about programming languages had ever been written, and four of them were by Bob Floyd. I met Bob the
first time in summer of ’62 when I was working on this Fortran compiler for Univac. At the end of the
summer I went to the ACM conference in Syracuse, New York, and Bob was there. We hit it off very well
right away. He was showing me his strange idea that you could prove a computer program correct,
something that had never occurred to me. I said I was a programmer in one room, and I was a
mathematician in another room. Mathematicians prove things. Programmers write code and they hope it
works, and they twiddle it until it works. But Bob’s saying, no, you don’t have to twiddle; you can take a
program and you can give a mathematical proof that it works. He was way ahead of me. There were very
few people who had ever conceived of putting those two worlds together at that time.

Ed Feigenbaum: [John] McCarthy was one of them, though.

Don Knuth: McCarthy, exactly, right. John and Bob were probably… I don’t know if there was anybody
in Europe yet who had seen this right. Bob tells me his thoughts about this when I meet him in this
conference in Syracuse. Then I went to visit him a year later when I was in Massachusetts at the crisis
meeting with my publishers. He lived there, and I went and spent a couple of days in Topsfield where he
lived. We shared ideas about sorting. Then we had a really exciting correspondence over the next time
where letters go back and forth, each one trying to trump the other about coming up with a better idea
about something that’s now called sorting networks. Bob and I developed a theory of sorting networks
between us in the correspondence. We were thinking at the time, this looks like Leibniz writing to
Bernoulli in the old days of scientists trying to develop a new theory. We had a very exciting time working
on these letters. Every time I would send a letter off to Bob, thinking, “Okay, now this is the last result,” he
would come back with a brand new idea and make me work harder to come up with the next step in our
development of this theory. We weren’t talking only about programming languages; we were talking also
about a variety of algorithms. We found that we had lots of common interests. He came out to visit me a
couple of times in California, and I visited him. So when I was making my career decision, I said, “Hey
Bob, wouldn’t it be nice if we could both end up at the same place?” I wrote him a letter, probably the
same letter where I was describing to him my idea about left-to-right parsing. As soon as I discovered it, I
wrote immediately to Bob a 12-page letter with ideas of left-to-right parsing after I had come up with the
idea. He comes back and says, “Oh, bravo, and did you think about this,” and so on. So we had this
going on. Then at the beginning of ’67 I said, “You know, Bob, why don’t we think about trying to get into
the same place together? What is your take on the different places in the world?” At that time he was at
Carnegie. He had left Computer Associates and spent, I think, two years at Carnegie. He was enjoying it
there, and he was teaching and introducing new things into the curriculum there. He wrote me this letter
assessing all of the schools at the time, the way he thought their development of computer science was.
When I quoted him a minute ago saying Harvard was an advanced backwater, that comes out of that
letter that he was describing the way he looked at things. At the end of the letter he says -- I had already
mentioned that Stanford was my current number one but I wasn’t totally sure -- and at the end he ended
up concurring. He said if I would go there and he could go there, chances are he would go there, too. I
presented this to Forsythe, saying why don’t we try to make it a package deal. This meant they had to
give up two professors to replace us with. They couldn’t get two new billets for us, and so it was a lot of
work on Stanford’s part, but it did develop. Except that you had to lose two other good people, but I think
Bob and I did all right for the department.

Ed Feigenbaum: Maybe your first great service to our department was recruiting Bob Floyd.

Don Knuth: Well, I don’t know. I did have to work a little bit the year after I got here. To my surprise they
had appointed him as an associate professor but me as a full professor. It was understandable because
he didn’t have a Ph.D. He had been a child prodigy, and I think he had gotten into graduate school at
something like age 17, and then dropped out to become a full time programmer. So he didn’t have the
academic credentials, although he had all the best papers in the field. I had to meet with the provost and
say it’s time to promote him to full professor. The thing that clinched it was that he was the only person
that had gotten -- this was 1969 -- he was the only person that had been invited to give keynote
addresses in two sessions of the International Congress in Ljubljana.

Ed Feigenbaum: In ’71.

Don Knuth: ’71, yeah. That helped.

Ed Feigenbaum: That was IFIPS.

Don Knuth: Yeah, IFIPS: Information Processing.

Feigenbaum: Don, maybe we could just say a little more about Bob [Floyd] and his life at Stanford.

Knuth: Right. As it turned out, when we got together we couldn’t collaborate quite as well as when we
were writing letters. I noticed this was true in other cases. Like sometimes I could advise my students
better when I was on sabbatical than when we were having weekly meetings. It’s not easy to work face-
to-face all the time, but rather sometimes offline instead of online. I told you my experience with Marshall
Hall -- that I couldn’t think in his presence. I have to confess that there are some women computer
scientists that when I’m in their presence, I think only of their brown eyes. I love their research, but I’m
wired in certain ways that mean that we should write our joint papers by mail, or by fe-mail. Anyway. We
did a lot of joint work in the early ‘70s, but also it turned out that when Bob became chair of the
department… I’m not sure exactly when that was; probably right after my sabbatical.

Feigenbaum: I think 1972.

Knuth: Yeah. I went on leave of absence for a year in Norway and then I came back and Bob was chair
of the department. He took that job extremely seriously, and worked on it to such an extent that he
couldn’t do any research very much at all during those three or four years when he was chair. I don’t
know how many years, five years.

Feigenbaum: I started being chair in ’76, so like four years.

Knuth: Okay, so it was four years. That included very detailed planning of all aspects of our new building.
When he came back, then he had two years of sabbatical. That’s one credit that you get. So there was a
break in our joint collaboration. Afterwards, he never quite caught up to the leading edge of the same
research topics that I was in. We would work on things occasionally, but not at all the way we had done
previously. We wrote a paper that we were quite pleased with at the end of the ‘80s, but it was not the
kind of thing that we imagined originally, that would always be in each other’s backyard. In fact, I’m a
very bad coworker. You can’t count on me to do anything, because it takes me a while to finish stuff and
I think of something else. So how can anybody rely on me as being able to go with their agenda? Bob,
during the ‘70s, came up with a lot of ideas, like his method for half-tone, for making gray-level pictures,
that is in all the printers of the world now. That was done completely independently. I didn’t even know
about it until a couple years after he had come up with these inventions. But I’m dedicating a book to
Bob. My collected works are being published in eight volumes. The seventh volume is selected papers
on design of algorithms. That one is dedicated to Bob Floyd, because a lot of the joint papers, joint work
we did, occurs in that volume. He was one of the few people in my life that really I consider one of my
teachers, the gurus that inspired me.

Feigenbaum: Don, I’m going to call that the end of your first period of Stanford. I wanted to move into
some questions about what I call your second Stanford period. This is very different. I’ve sort of
delineated this as a very different time. I saw you shifting gears, and I couldn’t believe what was
happening. You became, in a solitary way, the world’s greatest programmer. It was your engineering
phase. This was TeX and METAFONT. All of a sudden, you disappeared into just miles of code, and
fantastic coding ideas just pouring out, plus your engineering. We were in the new building and you were
running back and forth from your office to where this new printing machine was installed. You’d be
debugging it with your eyes and with your symbols and pulling your hair out of your head, because it
wasn’t working right, and all that. You were just what the National Academy of Engineering would call an
engineer. Tell me about that period in your life.

Knuth: Okay, well, it ties in with several things. There was a year that you didn’t see me when I was up
at McCarthy’s lab...

Feigenbaum: Well, I heard all about it.

Knuth: …starting this. One of the first papers that I collaborated with Bob Floyd on in 1970 [had] to do
with avoiding go-to statements. There was a revolutionary new way to write programs that came along in
the ‘70s, called structured programming. It was a different way than we were used to when I had done all
my compilers in the ’60s. Bob and I, in a lot of our earliest conversations at Stanford, were saying, “Let’s
get on the bandwagon for this. Let’s understand structured programming and do it right.” So one of our
first papers was to do what we thought was a better approach to this idea of structured programming than
some people had been taking. Some people had misunderstood that if you just get rid of go-to
statements you had a structured program. That’s like saying zero population growth; you have a
numerical goal, but you don’t change the structure. People were figuring out a way to write programs that
were just as messy as before, but without using the word “go-to” in them. We said no, no, no; here’s
what the real issues are. Bob and I were working on this. This is going on, and we’re teaching students
how to write programs at Stanford, but we had never really written more than textbook code ourselves in
this style. Here we are, being full professors, telling people how to do it, having never done it ourselves
except in really sterile cases with not any real world constraints. I probably was itching… Thank you for
calling me the world’s greatest programmer. I was always calling myself that in my head. I love
programming, and so I loved to think that I was doing it as well as anybody. But the fact is, the new way
of programming was something that I didn’t have time to put much effort into.

Feigenbaum: The emphasis in my comment was on the solitary. You were a single programmer doing
all this. No team.

Knuth: That’s right. As I said, it’s hard for me to have somebody else doing the drumming. I had to
march to my... I had The Art of Computer Programming, too. I could never be a reliable part of a team
that I wasn’t the head of, I guess. I did first have to get into that mode, because I was forced to. I was
chair of the committee at Stanford for our university reports. We put out lots and lots of reports from all
phases of the department through these years. We had a big mailing list. People were also trading their
reports with us. We had to have a massive bookkeeping system just to keep the correspondence, so that
the secretaries in charge of it could know who had paid for their reports, who we were sharing with. All
this administrative type of work had to be done. It seemed like just a small matter of programming to do
this. I had a grad student who volunteered to do this as his master’s project; to write up a program that
would take care of all of the administrative chores of the Stanford tech reports distribution. He turned in
his term paper and I looked at it superficially and I gave him an A on it, and he graduated with his
master’s degree. A week later, the secretary called me up and said, “Don, we’re having a little trouble
with this program. Can you take a look at it for us?” The program was running up at the AI lab, which I
hadn’t visited very often. I went up there and took a look at the program. I got to page five of the
program and I said, “Hmmm. This is interesting. Let me make a copy of this page. I’m going to show it
to my class.” [It was] the first time I saw where you change one symbol on the page and you can make
the program run 50 times faster. He had misunderstood a sorting algorithm. I thought this was great.
Then I turned to the next page and he has a searching algorithm there for binary search. I said, “Oh, he
made a very interesting error here. I’ll make a copy of this page so I can show my class next time I teach
about the wrong way to do binary search.” Then I got to page eight or nine, and I realized that the way he
had written his program was hopelessly wrong. He had written a program that would only work on the
test case that he had used in his report for the master’s thesis, that was based on a database of size
three or something like this. If you increased the database to four, all the structures would break down. It
was the most weird thing. I would never conceive of it in my life. He would assume that the whole
database was being maintained by the text editor, and the text editor would generate an index, the way
the thing did. Anyway, it was completely hopeless. There was no way to fix the program. I thought I was
going to spend the weekend and give it to the secretary on Monday and she could work on it. There was
no way. I had to spend a month writing a program that summer -- I think it was probably ’75, ’76 -- to
cover up for my terrible error of giving this guy an A without seeing it. The report that he had, made it
look like his program was working. But it only worked on that one case. It was really pathetic. So I said,
“Okay, I’ll use structured programming. I’ll do it right. This is my chance to do structured programming.
I’ll get a learning experience out of it.” I got a good appreciation for writing administrative-type
programming. I used to think it was trivial, [but] there was a lot to it. After a month I had a structured
program that would do Stanford reports, and I could install that and get back to the rest of my life.
Meanwhile, I’d been up at the AI lab and I met the people up there. I got to know Leland Smith, who is a
great musician professor. Leland Smith told me about a problem that he had. He was typesetting music.
He says, “I’ve got a piece of music and it maybe has 50 bars of music. I have to decide when to turn the
page. I know how many notes are in each bar of the music, and I know how much can fit on the page.
But I like to have the breaks come out right. Are there any algorithms that could work for this?” He
described the problem to me. He had the sequence of numbers, how many notes there are, and try to
find a way to break it into lines and pages in a decent way. I looked at the problem and said, “Hey
Leland, this is great. It’s a nice application of something we in computer science call the dynamic
programming algorithm (method). Look, here’s how dynamic programming can be used to solve this
problem.” Then I’m teaching Stanford’s problem seminar the next fall, and it came up in class. I would
show the students, “Look how we had this music problem, and we can solve it with dynamic
programming.” One of the students, I don’t remember who it was, raised his hand and said, “You know,
you could also use that to text, to printing books. You could say, instead of notes into bars, you could
also say you’ve got letters and words into lines, and make paragraphs choosing good line breaks that
way.” I said, “Hey, that’s cool. You’re right.” Then comes, in the mail, the proof sheets for the second
edition of volume two. I had changed a lot of pages in volume two of The Art of Computer Programming.
I got page proofs for the new edition. During the ‘70s, printing technology changed drastically. Printing
was done with hot lead in the ‘60s, but they switched over to using film in the ‘70s. My whole book had
been completely retypeset with a different technology. The new fonts looked terrible! The subscripts
were in a different style from the large letters, for example, and the spacing was very bad. You can look
at books printed in the early ‘70s and it turns out that if it wasn’t simple -- well, almost everything looked
atrocious in those days. I couldn’t stand to see my books so ugly. I spent all this time working on it, and
you can’t be proud of something that looks hopeless. I’m tearing out my hair. I went to Boston again and
they said, “Oh, well, we know these people in Poland. They can imitate the fonts that you had in the old
hot lead days. It’s probably not legal, but we can probably sneak it through without…” You know, the
copyright problems of the fonts. “They’ll try to do the best they can, and do better.” Then they come back
to me, at the beginning of ’77, with the new version done with these Polish fonts which are supposed to
solve the problem. They are just hopelessly bad. At the very same time, February of ’77, I’m on
Stanford’s comprehensive exam committee, and we’re deciding what the reading list is going to be for
next year’s comp. Pat Winston had just come out with a new book on artificial intelligence, and the proofs
of it were just being done at III Corporation [Information International, Incorporated] in Southern
California; at [Ed] Fredkin’s company. They had a new way of typesetting using lasers. All digital, all dots
of ink. Instead of photographic images and lenses, they were using algorithms, bits. I looked at these
galley proofs of Winston’s book. I knew it was just bits, but they looked gorgeous. They looked
absolutely as good as anything I’d ever seen printed by any method. By this time I was working at the AI
lab, where we had the Xerox Graphics Printer, which did bits at about 120 dots per inch. It looked
interesting, but it didn’t look beautiful by any stretch of the imagination. Here, with I think this was 1,000
dots per inch at III, you couldn’t tell the difference. It was like: I come from Wisconsin and in Wisconsin
we never eat margarine. Margarine was illegal to bring into the State of Wisconsin unless you didn’t color
it. I’m raised on butter. It’s the same thing here. With typography, I’m thinking: okay, digital typography
would have to be like margarine. It couldn’t be the real thing. But, no! Our eyes don’t see any difference
when you’ve got enough dots to the inch. A week later, I’m flying down with Les Earnest to Southern
California to III, and finding out what’s going on there. How can we get this machine and do it?
Meanwhile, I planned to have my sabbatical year in ‘77-’78. I was going to spend my sabbatical year in
Chile.
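
The page-breaking problem that Leland Smith posed above, and the student's remark that the same idea applies to breaking words into lines, can be sketched as a small dynamic program. This is only a minimal illustration, not Knuth's actual method: the function name `break_into_lines`, the squared-slack cost, and the sample numbers are all assumptions made for the example.

```python
def break_into_lines(sizes, capacity):
    """Split `sizes` into consecutive groups, each summing to at most
    `capacity`, minimizing the total squared unused space per group.

    The cost function (squared slack, last group included) is an
    assumption chosen for illustration only.
    """
    n = len(sizes)
    inf = float("inf")
    best = [0] + [inf] * n   # best[i]: minimum cost of laying out the first i items
    prev = [0] * (n + 1)     # prev[i]: index where the last group of that layout starts

    for i in range(1, n + 1):
        total = 0
        # Try every group that ends at item i-1 and starts at item j-1.
        for j in range(i, 0, -1):
            total += sizes[j - 1]
            if total > capacity:
                break
            cost = best[j - 1] + (capacity - total) ** 2
            if cost < best[i]:
                best[i] = cost
                prev[i] = j - 1

    if best[n] == inf:
        raise ValueError("some item is larger than the capacity")

    # Walk the prev[] links backwards to recover the chosen groups.
    groups = []
    i = n
    while i > 0:
        j = prev[i]
        groups.append(sizes[j:i])
        i = j
    groups.reverse()
    return groups


# Hypothetical data: four bars with 3, 2, 2 and 5 notes, 6 notes per line.
# Filling each line greedily would give [[3, 2], [2], [5]]; the dynamic
# program prefers the more even layout [[3], [2, 2], [5]].
layout = break_into_lines([3, 2, 2, 5], 6)
```

Each entry `best[i]` is computed once from earlier entries, so the work grows with the number of items times the longest possible group rather than with the number of possible layouts.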

Feigenbaum: Don, can I interrupt you just a second?

Knuth: Yeah.

Feigenbaum: I don’t know if Fredkin was still involved with III at that time. But III never gets enough
credit for those really revolutionary ideas.

Knuth: That’s right.

Feigenbaum: Not just those ideas, but the high speed graphics ideas.

Knuth: Oh yeah. That’s when I met Rich Sherpel [ph?] down there, and he was working on character
recognition problems. They had been doing it actually for a long time on microfilm, before doing
Winston’s book. This was the second generation. First they had been using the digital technology at
really high resolutions on microfilm. And so many other things [were] going on. Fredkin is a guy who--

Feigenbaum: Right at the beginning, Fredkin revolutionized film reading, using the PDP-1. Anyway, I
interrupted you. You were on your Chile.

Knuth: Ed’s life is ten times as interesting as mine. I’m sure that every time I hear more about Ed, it
adds just another… He’s an incredible person. We got to get 20 oral histories.

Feigenbaum: I think Ed may be a subject for one of these oral histories of the Computer History
Museum.

Knuth: Yeah, we’ve got to do it. Anyway, I cancelled my sabbatical plan for Chile. I wrote to them
saying I’m sorry; instead of working on volume four during my sabbatical, I’m going to work on
typography. I’ve got to solve this problem of getting typesetting right. It’s only zeros and ones. I can get
those dots on the page, and I’ve got to write this program. That’s when I became an engineer.

Feigenbaum: I’m going to let you go on with this, but I just wanted to ask a question in the middle here,
just related to myself, actually. How much of this motivation to do TeX related to your just wanting to get
back to being a programmer? Life was going on in too abstract a way, and you wanted to get back to
being a programmer and learning what the problems were, or the joy of programming.

Knuth: It’s a very interesting hypothesis, because really you can see that I had this. The way I
approached the CS reports problem the year before was an indication of this; that I did want to sink my
teeth into something other than a toy problem. It wasn’t real large, but it wasn’t real small either. It’s true
that I probably had this craving. But I had a stronger craving to finish volume four. I did sincerely believe
that it was only going to take me a year to do it.

Feigenbaum: Maybe volume four wasn’t quite ready. Maybe…

Knuth: Oh, this is true.

Feigenbaum: …it was still cooking.

Knuth: No, no, absolutely. You’re absolutely right. In 1975 and ’76, you can check it out. Look at the
Journal of the ACM. Look at the SIAM Journal on Computing. Look at, well, there’s also SIAM Review
and there’s math journals, combinatorial journals, Communications of the ACM, for that matter. You’ll
find more than half of those articles are things that belong in volume four. People were discovering things
right and left that I knew deserved to be done right in volume four. Volume four is about combinatorial
algorithms. Combinatorial algorithms was such a small topic in 1962 when I made that chapter seven of
my outline that Johan Dahl asked me, when I was in Norway, “How did you ever think of putting in a
chapter about combinatorial algorithms in 1962?” I said, “Well, the only reason was, that was the part I
thought was most fun.” I really enjoy writing, like this program for Bose that I did overnight. It was a
combinatorial program. So I had to have this chapter just for fun. But there was almost nothing known
about it at the time. People will talk about combinatorial algorithms nowadays [and] they usually use
“combinatorial” in a negative way. In a pejorative sense, instead of the way I look at it. They say, “Oh,
the combinatorial is going to kill you.” “Combinatorial” means “It’s exploding. You can’t handle it, it’s a
huge problem.” The way I look at it is, combinatorial means this is where you’ve got to use some art.
You’ve got to be really skillful, because one good idea can save you six orders of magnitude and make
your program run a million times faster. People are coming up with these ideas all the time. For me, the
combinatorial explosion was the explosion of research. Not the problems exploding, but the ideas were
exploding. So there’s that much more to cover. It’s true that I also in the back of my mind I’m scared stiff
that I can’t write volume four anymore. So maybe I’m waiting for it to simmer down. Somebody did say to
me once, after I solved the problem of typesetting, maybe I would start to look at binding or something,
because I had to have some other reason [to delay]. I’ve certainly seen enough graduate student
procrastinators in my life. Maybe I was in denial.

Feigenbaum: Anyway, you headed into this major engineering problem.

Knuth: As far as I knew, though, it was going to take me a year. I was going to work and I was going to
enjoy having a year of writing this kind of a program. The program was going to be just for me and my
secretary, Phyllis; my super-secretary, Phyllis. I was going to teach her how to do it. She loved to do
technical typing. I could write my books and she could make them; dotting I’s and crossing T’s and spit
and polish that she did on my math papers when she always typed my math papers.

Session 2: March 21, 2007

Ed Feigenbaum: My name is Edward Feigenbaum. I am a Professor of Computer Science Emeritus at
Stanford University and a colleague of Donald Knuth’s since the day that he showed up at Stanford. This
is session No. 2 of an oral history in which Don has been discussing his early work and what we call the
first Stanford period. We’re now about to go into what we’re calling the second Stanford period, in which
he discusses his work on typesetting, printing, and font design, TeX and Metafont. But Don, before we
start on that, I want to mention that in the week since we met for the first session, the news came out that
John Backus, the leader of the team that developed Fortran for IBM, died. John’s work intersects to
some degree with the work that you spoke about last week in the first part of the interview, the work that
Alan Perlis did at Carnegie Mellon, at that time Carnegie Tech, on IT, Internal Translator, and the work
that you did on RUNCIBLE. I happened to be around both places at that time. I was a summer student
working for IBM at the time that the Fortran group was working in New York, and interacted with them.
Then I came back to Carnegie Tech and was startled, actually, seeing Perlis’s compiler up and running
on the [IBM] 650. It looked to me as if Perlis’s work was up and running somewhere around 6 months
before, maybe 9 months before, Fortran was running on the IBM 704 in New York. The question for you
is: is priority important in computer science research as it is in many other disciplines, like chemistry or
physics? Is there a priority discussion to be had here on the question of Fortran and IT?

Don Knuth: Okay, those are great questions. It’s funny you’d ask that, because the number one thing
on my mind as I was walking into the building this morning was thinking about John Backus’s death. It
was a shock to me to learn about it yesterday, but then I was just thinking, “Oh yeah, he always wore
clothes like this.” [Motions to himself] Whenever I saw him he was wearing a denim jacket. I can say just
a few orthogonal things about the whole situation that strike me first. In the first place, when I was a
student we had no information at all about Fortran. I didn’t hear about it until after I had been using IT,
and I think after RUNCIBLE. It was, like, 1959 when people were coming out with something called
FORTRANSIT, which was a translator from Fortran to IT, so that people on the 650 could use the
Fortran language. You see, the IBM 650 was the world’s first mass-produced computer, the first time
there were more than 100 of any one kind of computer. Fortran was developed for a 704, which there
were several dozen of those, but it was aircraft industries and so on. It was people who could afford a
much bigger kind of machine than the 650. It was a different world. You were lucky as a summer
student. You could see the other world, but I was more in the boondocks. I told you last time that priority
was so far from my mind when I wrote this article about RUNCIBLE for the Communications [of the ACM]
that I failed to mention any of the people who were working with me. We had this team at Case, but we
didn’t name any names in our story as to who came up with the improvements that we made, because it
wasn’t something that we knew anything about. But that might have been my naiveté as a college
undergraduate. The first time I learned about upsmanship or something -- academic priority -- was, in
fact, from my teacher Bose, the man who worked on Latin squares. The reason he wanted me to get this
program working overnight is because he was in intense competition with another group in Canada that
was also trying to find Latin squares of order 12. It turned out it was approximately a dead heat between
the two groups. That amazed me, that there could be so much competition for being first at the time. It
wasn’t part of the culture that I grew up in. All I can report is that I was amazed later on to find out also,
when people are talking about the discovery of DNA and all this, how much passion went into these
things, because it just wasn’t something that I personally experienced. But it just might be, again, my
naiveté. I presented a paper at the ACM Conference in 1962 in Syracuse, which was the summer that I
wrote my Fortran compiler for the Univac solid-state machine. At the end of that summer I gave this
paper at the ACM called, “A History of Writing Compilers.” I guess I didn’t call it “The History of Writing
Compilers.” Basically I was trying to explain in my talk what I knew about the various developments that
had come up in technology for writing compilers. Of course I mentioned Fortran, and the ways in
which they had dealt with the question, for example, of operator precedence. That was, if you write
AxB+C without parentheses, Fortran would recognize that as first multiply A by B, and then add C. I’ll go
back and give you a better example. AxB+CxD, but you write that without any parentheses. Now what
would happen in Fortran, is Fortran would know that the mathematicians usually mean by that that you
take A and multiply it by B, and you take C and multiply by D, and then you add the two things together.
But IT wouldn’t do it that way. IT would require you to put parentheses if you want to do it, and, if my
memory is correct, otherwise it would associate to the right. So it would take A times the quantity B plus
the quantity C times D. So Fortran had to invent a way to do this. The way they did it was rather clever.
They replaced the times sign by right parenthesis times left parenthesis, and they replaced the plus sign
by two right parentheses plus two left parentheses. Then they put a whole bunch of parentheses around
the whole thing. The result is that you had an expression that was fully parenthesized, but since you had
guarded the plus sign with two parentheses and the times with only one, the times was done first. It was
a clever idea. It’s just one of the things I mention in my paper in this Syracuse thing. Well, a reviewer of
my paper afterwards said, “He didn’t talk about the history of writing compilers. He just talked about the
history of him writing one particular compiler.” Well, if you look at my paper you’ll see it’s not a fair
criticism. The reviewer was undoubtedly teed off that I had not mentioned his compiler. I gave a history
of many ideas that were used in building compilers, but I didn’t give a history of what people had done in
compilers. As years went on, I got more interested in history. In 1962 I was 24 years old. It’s like Mark
Twain or somebody said, that “When you’re a teenager you think your parents are the stupidest people in
the world. Five years later you wonder how they could learn so much in five years.” You get more
interested in history and the overall thing. Well anyway, this criticism, that I hadn’t given a very
comprehensive history of compilers, weighed on my mind. So the next few years after ’62, I actually
started looking into the history of compilers, trying to get a real understanding as to who did what when,
and first, and so on. Where did the ideas come from before the little excrescences of the story that I
knew? In fact, the main lecture I was giving on my ACM lecture tour in 1967 was the real early history of
writing compilers. By that time I had gone through and I had studied Grace Hopper’s work, and I had
studied Backus’s work, and the Fortran 0, and the many developments in England and Russia and so on,
that had taken place in the earliest days. So the talk that I was giving when I’m making this nationwide
lecture tour is mostly this talk of redeeming myself for giving a very unbalanced view of the history of
compiler development that I had given in 1962. Later on I worked with my student Luis Trabb Pardo in

CHM Ref: X3926.2007 © 2007 Computer History Museum Page 33 of 73


Oral History of Donald Knuth

order to really do it right, because the Harvard University Press had asked me to edit a sourcebook on
computer science which was intended to print the documents from the early days that had come out
before their time. I had collected a lot of these early things. They’ve done this with many other fields:
sourcebook on logic, sourcebook on mathematics, sourcebook on chemistry, and that kind of thing.
Harvard had a big series. I was asked to do a sourcebook on computer science. In the course of this I
worked with Luis to get a really thorough history of programming languages, their early development. We
presented this as a paper at a big conference in Los Alamos in 1976. 1976 was the year everybody had
history on their minds, because it was the bicentennial of America. We had a big conference where almost all
the computer pioneers who were still living were assembled. People like [Konrad] Zuse, whom I met there, came
from Europe, and the people who had worked on the Colossus computers. All these pioneers were there.
The paper that I presented at that time was “The Early History of Computer Languages.” This was one of
the most difficult papers to write, in the sense of total amount of work expended, because what I
presented in this talk was 20 predecessors to Fortran. Not only was Fortran not number 1, but it was
number 21, basically. Although one of the 20 preceding Fortran was the preliminary specs of Fortran,
which wasn’t implemented, but people were using it in mockups and trial runs. Going to Zuse’s work, for
example, Zuse had a high-level language, his Plankalkül. Many, many other pioneers [attended]. I
brought that picture all together. I’m quite proud of the paper now, because of all the work I put into it. As
I was writing it I found out actually I only had 19 predecessors of Fortran. Just a week before the
conference I learned about another one at Livermore that had been developed. I went out to Livermore,
and right in my own backyard was one of the first. So there was a great amount of activity going on. IT
was part of this, for sure. But most of the people didn’t know [of] the existence of the others. Fortran
itself was strongly influenced by a compiler for the Whirlwind computer that John Backus learned about
when he went to a conference at MIT in 1954. Then John got his team together and did that. I had a
great admiration for John. I remember that the first time I came to Stanford, which was about 1964, was
when I first met him. We had corresponded. He and Barbara invited me to their house, and he also
introduced me to topless bars at the time. It was interesting as a nice phenomenon in San Francisco, you
know. We had a pleasant evening together. That was on the same trip that I visited Stanford at
Forsythe’s invitation. John and I always hit it off well, and I admired his breadth of interest in all these
things. But your question was mostly about priority. I think it cuts two ways. In the first place, I don’t like
to think of it as saying somebody did it before somebody else. That’s the popular interpretation of the
priority. But the opposite is where you just have an idea and you have no idea where it came from.
That’s very bad, I think, just to assume that ideas have no connection to each other or they didn’t spring
from somewhere. Because how are we going to get another idea tomorrow if we don’t have a lot of case
studies as to how ideas can germinate? I go out of my way in my books The Art of Computer
Programming to try to track down the sources of the concepts that we have in computer science.
Sometimes I tell people I only do this in order to make computer science respectable, to show that it’s not
a fly-by-night thing, but it’s deeply rooted in ancient history and so on. Well, of course, it’s nice to have
computer science a little bit respectable. We are the new kid on the block. But that’s not really the point.
The point is that really there were people who would’ve been computer scientists, if computers had been
around, that were living a hundred years ago. They just happened to have been born at the wrong time,
but they had the same kind of strange way of looking at things that I do. I can see that in their writings. I
was reading last year a manuscript from 14th century India, and I felt the guy was talking to me. I doubt if
any of his contemporaries really knew what he was, but here it was. I said to my wife, “This guy is a
computer scientist. I know exactly what’s going on because I went through the same kind of a thought
process when I was looking at a similar problem when I was younger.” So the idea of priority is more,
instead, really learning the human element of it. How somebody was able to combine ideas and then
make a non-obvious leap that would then influence somebody else. For this reason I love to read source
documents instead of [reading] somebody boiling down a source document. I boil it down myself in my
books. I try to recommend that. I try to give places so that people can check out the originals when they
can. We have much less of this cutthroat idea of competition in the field than I read about when I study
the novel by my friend who invented the birth control pill.

Feigenbaum: Carl Djerassi.

Knuth: Yeah, Carl Djerassi’s novel, “The Bourbaki Gambit”. Or something like this, right? It’s all about a
world of science that I don’t feel computer science inhabits. It’s a different…

Feigenbaum: Yeah, Carl’s a chemist.

Knuth: Yeah.

Feigenbaum: I was going to bring a quote from a chemist to this interview but I didn’t have it exactly right
so I didn’t do it. But in chemistry, the knife is sharp.

Knuth: Yeah. I worked on open source publishing a few years ago, and I was surprised to find out
that… I was looking at some of the general policy, and in other fields than computer science when you
submit your paper, you can list people that you don’t want to be referees of your paper. It blew my mind.
I said, “Why do you do this?” and he said, “Well, because they think the other guys are going to steal their
ideas.” <laughs> We share ideas. The whole Silicon Valley culture, the venture capitalists get together
for lunch every Tuesday and say, “These are the startups I’m thinking of starting,” and somebody will say,
“Well why don’t you change it a little bit?” They share ideas openly because they know that there’s a half-
life of these ideas, and in six months they’ll get better. The companies are even better because of it. The
biology community would never think of such a thing, of sharing their plans for new development. It is
quite a different culture. I think you’re right.
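
The parenthesis trick that Knuth attributes to the early Fortran compiler, earlier in this part of the interview, can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: it handles just `+` and `*` with single-letter operands, wraps the result in exactly two pairs of parentheses, and uses hypothetical function names (`fully_parenthesize`, `eval_without_precedence`) that do not come from any compiler. The evaluator deliberately knows nothing about precedence and associates to the right, which is how Knuth recalls IT behaving, so any correct grouping must come from the inserted parentheses alone.

```python
def fully_parenthesize(expr: str) -> str:
    """Rewrite an expression over + and * so parentheses alone fix the order.

    Follows the substitution described in the interview: '*' becomes ')*(',
    '+' becomes '))+((' and the whole string is wrapped in '((' ... '))'.
    Restricting the sketch to two operators is an assumption for illustration.
    """
    pieces = []
    for ch in expr.replace(" ", ""):
        if ch == "*":
            pieces.append(")*(")
        elif ch == "+":
            pieces.append("))+((")
        else:
            pieces.append(ch)
    return "((" + "".join(pieces) + "))"


def eval_without_precedence(expr: str, env: dict) -> int:
    """Evaluate with no operator precedence at all.

    Only parentheses group; otherwise operators associate to the right.
    Operands are single letters looked up in env (a simplifying assumption).
    """
    tokens = list(expr.replace(" ", ""))
    pos = 0

    def parse_expr() -> int:
        nonlocal pos
        left = parse_atom()
        if pos < len(tokens) and tokens[pos] in "+*":
            op = tokens[pos]
            pos += 1
            right = parse_expr()  # everything to the right binds first
            return left + right if op == "+" else left * right
        return left

    def parse_atom() -> int:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            pos += 1  # skip the matching ')'
            return value
        return env[tok]

    return parse_expr()


env = {"A": 2, "B": 3, "C": 4, "D": 5}
rewritten = fully_parenthesize("A*B+C*D")
print(rewritten)                                  # ((A)*(B))+((C)*(D))
print(eval_without_precedence("A*B+C*D", env))    # 46, i.e. A*(B+(C*D))
print(eval_without_precedence(rewritten, env))    # 26, i.e. (A*B)+(C*D)
```

Because the plus sign is guarded by two parentheses on each side and the times sign by only one, each product closes before the sum does, which is the effect Knuth describes.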

Feigenbaum: Don, just a small follow-up question for this. As you were speaking, it was bubbling in my
mind that in various sciences, including ours, the big prizes are sometimes given for what are considered
breakthrough ideas. So a young person can win a big prize. Sometimes it’s given for career
contributions. The Nobel Prizes are like this too. Sometimes a brilliant thing flashes up on the screen,
like the CT scan. The Nobel Prize was given to a EE guy for the CT scan, in medicine. But often the
prize is given to someone for a career’s worth of work. Do you think that we have breakthrough ideas in
computer science of that sort?

Knuth: Yeah, but I minimize their importance, in a sense. We do have landmark ideas that sort of all of
a sudden… Something like, in the theoretical field, the idea of NP-completeness. All of a sudden we had
thousands of people inspired by this idea. But how many of them are there? It was interesting. I wrote a
letter to Allen Newell when I was starting to write “The Art of Computer Programming.” I think I wrote it to
him in 1963 or something like this. I said, “Allen, I’m struck by the fact that all good ideas in computer
science were invented before 1960 and we’ve just been rediscovering the wheel since then.” I’m not
sure, but sort of that was the thrust of my letter. Allen replied to me, “Oh no, Don, you’re suffering from
the bow wave phenomenon,” or something like this. Then I had another conversation with Juris
Hartmanis, who was the head of the department at Cornell. Juris was a wonderful leading person in
Automata Theory, and he happened to have been a student of Marshall Hall as well. He was recruiting
me to come to Cornell at the same time I was considering Berkeley, and Stanford, and other things. He
visited me, and I visited Cornell with the serious idea of going there, and didn’t put it on my final list
because people don’t drop into Cornell the way they drop into Stanford. I would have to go to them, and I
don’t like traveling that much. But we had this conversation, and one of the questions that struck me, he
said, “Don, what was the most important new idea in computer science during the past year?” I couldn’t
think of a single thing. Say 1965, or something like that. I couldn’t think of any breakthrough. For the
next ten or so years, I asked myself the same thing at the end of every year. What was the breakthrough
that occurred this year? I couldn’t come up with anything. Almost never could come up with anything.
On the other hand, in ten years the whole field had changed. I realized that what it really is, it’s like a
great wall, where everybody’s contributing bricks to the wall, and each brick… In other words, it’s the
community enterprise that really has made it such a thriving field. I like to give credit to everybody who
puts in one of these bricks. Of course, we’ve got to have the major prizes in order to get into the
newspaper and things like this. But so many things go into this. The big breakthrough is not the real
story, although they’re wonderful when they occur. That’s my take on that.

Feigenbaum: Thanks, Don. We could sit here and discuss that endlessly. I’m going to resist doing that
because I’d like to get on to the moment when you are sitting in that little office in [Stanford’s Margaret]
Jacks Hall, and you decide that you and Phyllis need a better language in which to basically, essentially,
typeset your books. Listening to the early part of your conversation about this last week, it occurred to
me that one level below the surface, there’s something else about books. My wife just stopped being,
she was a trustee, a member of the Board of the San Francisco Center for the Book. She’s a book artist
and she loves books.

Knuth: Oh, I see. I visited there two weeks ago.

Feigenbaum: I get this feeling that there’s something about books inside you, inside your head, that you
absolutely love. Can you just tell us about your love affair with books?

Knuth: That goes very deep. My parents disobeyed the conventional wisdom by teaching me to read
before I went into kindergarten. All of their friends said, “No, he’s going to be bored in school,” but I was
the youngest member of the “bookworm club” in Milwaukee Public Library. I think I was two-and-a-half
years old, or something like this. The Milwaukee Journal ran a little blurb about it with my picture in it
because I was a member of the bookworm club at the library. I have loved books since I was a child. In those days
there weren’t big drug problems and so on, and little kids could ride the streetcars downtown. I went
down to the library one day, and the lights went out in the library. I went over to the window so I could
see better the book I was reading. It didn’t occur to me the library was closing. My parents called a
couple hours later and said, “Where is he? Where’s our son?”, and the librarian found me in the book. I
have kind of a strange love affair with books going way back. In my undergraduate years, I think I
mentioned last time that a lot of my favorite textbooks were published by Addison-Wesley: the calculus
book that I had, the physics book that I had, the book on number theory that I had seen. Addison-
Wesley, for technical books, had a special thing. The president of the company had actually done
something that other publishers… I know you were an editor for McGraw-Hill. McGraw-Hill would farm
out their typesetting, but Addison-Wesley had its own house composition plant. Hans Wolf had his team
of people making the type, right next door to where the editorial offices were. It was the philosophy of the
company really to get a special house style, and really good designers, and they made their mark on it.
They also published the first book on computer science: Wilkes, Wheeler, and Gill in 1951, or something
like this. That was one of the very first books Addison-Wesley put out, at the time when it was a
struggling new company. I also had this thing about the appearance of books. I wanted my books to be
something whose appearance other readers would treasure, not just something that had some words in
it.

Feigenbaum: Let’s go back to the time when you were planning TeX, and Phyllis was in the outside
office, and the two of you needed a language.

Knuth: Right. Phyllis had been typing all of my technical papers. I have never seen her equal anywhere,
and I’ve met a lot of really good technical typists. She really loved it too, and so we had a fairly good
thing. She could read my handwriting. I always composed my manuscripts by hand. People ask me
about this. I might as well digress yet again. I also love keyboard things. I’ve been playing the piano for
ages. When I was in high school I learned how to run a stenograph machine, like court reporters use. I
went to Spencerian College for summer class. I had the idea I’d get to college and I’m going to take
notes with a stenograph machine. I tried it for two weeks at Case before giving it up. We had been
taught shortcuts for how to say, “Dear Sir,” and “Yours very truly,” but we didn’t have any abbreviations
for chemistry and all these other things.

Feigenbaum: Yeah, or differential equations.

Knuth: But anyway, I’m fascinated by keyboards. I also took typing and I was a very good typist. I could
do 70 words a minute or something like this. I got myself a Russian typewriter with a Cyrillic keyboard so
that I could do my Russian homework in undergrad as well. I love keyboards. But I always compose my
manuscripts handwritten. The reason is that I type faster than I think. There’s a synchronization
problem. I can think of ideas at about the rate I can write them down with a pencil. But with typing I’m
going faster, so I have to sync, and my thoughts have to start up and stop again in a way that involves
more of my brain. As a college student I found I could write a letter home much faster by hand, much
faster than I could type it even though I’m a great typist. The synchronization was slowing down the total
thing. Phyllis and I had this nice, symbiotic relationship. She could read my handwriting, she knew when
to display a formula, make it look beautiful. You almost would think she knew more mathematics than I
did, sometimes, the way she would correct a formula that I had and didn’t look right to her. She would
change it and also get it right. When I’m learning that typesetting is a problem of zeros and ones -- just a
matter of programming to get the ink where it’s supposed to go -- my thought was definitely that this
would be something that I would make so that Phyllis would be able to take my handwritten manuscripts
and go from there. I used her as the model for the language that I was developing, and I also would be
able to understand it myself.

Feigenbaum: You didn’t have in mind another mathematician…

Knuth: No.

Feigenbaum: …doing his own work?

Knuth: That’s right. I knew that Bell Labs had a system where they had been using secretaries to
typeset. Bell Labs had the EQN system. I learned later that other people had developed systems where
they hire and train secretaries. I used the Bell Labs system, which I knew was a working system, where
when somebody uses the Greek letter Alpha, they say “A-l-p-h-a.” The guys in these commercial systems, for the
letter Alpha, they say, “Oh no, these secretaries can never learn that. They’re scared by any hint that it’s
Greek.” They just know that it’s this symbol, and so they type QA for the letter Alpha, and that gives them
job satisfaction because they know this code that the mathematicians don’t know. My philosophy was,
though, that I knew that Phyllis would like to write “A-l-p-h-a”. What went into the design was her as a
model. And the fact that I knew that the secretaries at Bell Labs had a language that was in existence,
that it was something that secretaries could learn. At that time when I started TeX, some physics journals
were already being typeset with the EQN system from Bell Labs. It looked horrible -- the spacing was just
ugly -- but it was the first generation of this. But I knew that they had a language that the secretaries
could learn. All I had to do was tune up the aesthetics of the final product.

Feigenbaum: Don, I would like to ask you about the activities going on. You mentioned that TeX took
much longer than you had anticipated. You had anticipated a one-year project. You ended up with a ten-
year project. It kind of carves out a section of your life in which you were being an interface designer.
You were being a programmer. I use the term “programmer” because you yourself use it in bio material
on the web, that when you were doing TeX you were a programmer. Then there’s all the other things
going on, both with TeX, with fonts, with the rest of your life. Can you tell us about those three things?

Knuth: Okay.

Feigenbaum: A designer story. A programmer story.

Knuth: A life story. Okay, there are stories. The first part of it, I’m designing a language for my
secretary. This took place in sort of two all-nighters. I made a draft. I sat up at the AI lab one evening
and into the early morning hours, composing what I thought would be the specifications of a language. I
had already been playing around. I looked at my book and I found excerpts from several dozen pages
where I thought it gave all the variety of things I need in the book. Then I sat down and I thought, well, if I
were Phyllis, how would I like to key this in? What would be a reasonable format that would appeal to
Phyllis, and at the same time something that as a compiler writer I felt that I could translate into the book,
because TeX is another kind of a compiler. Instead of going into machine language, instead, you’re
going into words on a page. That’s a different output language, but it’s analogous in recognizing the
constructs that appear in the source file. So I went through and this day I drafted how I would typeset
those 12 sample segments in a language that I thought Phyllis would understand. I also mentioned a
mini-users manual for teaching this language. I wrote the draft of this one night, and I showed it to a
bunch of people for their comments. Then a few weeks later I went through the same thing again.
Fortunately, the Stanford AI lab, where I did this work, had a very good backup system. All of the files
that were on that computer for more than 20 years, stored on archival tapes, are now being made available
through the internet. Thanks to looking at these old so-called DART tapes, I found the drafts that I
made of TeX on those days when I did the design. Since I believe in source documents, as I said, I
published those in my book, “Digital Typography”, so the people could see what the raw thoughts were,
and all the mistakes, the words that were there at the very beginning. Just as an idea of a design
process. Then I showed the second version of this design to two of my graduate students, and I said,
“Okay, implement this, please, this summer. That’s your summer job.” I thought I had specified a
language. I had to go away. I spent several weeks in China during the summer of 1977, and I had
various other obligations. I assumed that when I got back from my summer trips, I would be able to play
around with TeX and refine it a little bit. To my amazement, the students, who were outstanding
students, had not completed [it]. They had a system that was able to do about three lines of TeX. I
thought, “My goodness, what’s going on? I thought these were good students.” Well afterwards I
changed my attitude to saying, “Boy, they accomplished a miracle.” Because going from my
specification, which I thought was complete, they really had an impossible task, and they had succeeded
wonderfully with it. These students, by the way, [were] Michael Plass, who has gone on to be the brains
behind almost all of Xerox’s Docutech software and all kinds of things that are inside of typesetting devices
now, and Frank Liang, one of the key people for Microsoft Word. He did important mathematical things
as well as his hyphenation methods, which are widely used in all languages now. These guys were actually
doing great work, but I was amazed that they couldn’t do what I thought was just sort of a routine task.
Then I became a programmer in earnest, where I had to do it. The reason is when you’re doing
programming, you have to explain something to a computer, which is dumb. When you’re writing a
document for a human being to understand, the human being will look at it and nod his head and say,
“Yeah, this makes sense.” But then there’s all kinds of ambiguities and vagueness that you don’t realize
until you try to put it into a computer. Then all of a sudden, almost every five minutes as you’re writing the
code, a question comes up that wasn’t addressed in the specification. “What if this combination occurs?”
It just didn’t occur to the person writing the design specification. When you’re faced with implementation,
a person who has been delegated this job of working from a design would have to say, “Well hmm, I don’t
know what the designer meant by this.” If I hadn’t been in China they would’ve scheduled an
appointment with me and stopped their programming for a day. Then they would come in at the
designated hour and we would talk. They would take 15 minutes to present to me what the problem was,
and then I would think about it for a while, and then I’d say, “Oh yeah, do this.” Then they would go
home and they would write code for another five minutes and they’d have to schedule another
appointment. I’m probably exaggerating, but this is why I think Bob Floyd’s Chiron compiler never got
going. Bob worked many years on a beautiful idea for a programming language, where he designed a
language called Chiron, but he never touched the programming himself. I think this was actually the
reason that he had trouble with that project, because it’s so hard to do the design unless you’re faced
with the low-level aspects of it, explaining it to a machine instead of to another person. Maybe it was
Forsythe, I think it was, who said, “People have said traditionally that you don’t understand something
until you’ve taught it in a class. The truth is you don’t really understand something until you’ve taught it to
a computer, until you’ve been able to program it.” At this level, programming was absolutely important.

Feigenbaum: Could I stop you just a second? That’s exactly the same methodology that I learned from
Herb Simon and Al Newell at Carnegie, which is, it’s useless to spit out theories of human thinking unless
you can program them. You get every detail. You have to make a decision about every detail.

Knuth: Yeah, and they’re trying to come up with models of the brain and chess players and things like
this. It becomes very clear at this point.

Feigenbaum: No room for hand waving.

Knuth: But also in every field. Composing music. I took a class in music theory during my sabbatical
year, my year in Princeton before coming to Stanford. The idea of music theory, you’re supposed to
decide whether or not certain combinations of notes are going to sound good or not. But if they had
presented it as a programming thing -- write a program that decides whether or not these notes are going
to sound good or not -- that would’ve focused the issue, the attention, so much more sharply. It’s a
dream that if I finish my “Art of Computer Programming,” one of the things I want to do before I die is to
spend time programming for musical composition, and see if I can come up with some good music that is
developed with computer aid. I feel that in order to really understand music, it’s going to help me to be
able to program that. Who knows?

Feigenbaum: Don, let me go back to the programming stage. I would wander into your office in Jacks
occasionally, and occasionally you would jump up and down and show me something. I remember one
day you were showing me something that had to do with paragraph formatting, where you had uncovered
a link between that and, I think, dynamic programming or some other kind of mathematical programming.
That was a very interesting story, which is told other places, but maybe you want to use that as an
example to illustrate the link between one part of your life and another.

Knuth: I’m not sure if I mentioned that. I was telling somebody about that in the last two weeks. I don’t
know if I mentioned it last week.

Feigenbaum: Well say it again, even if you did.

Knuth: Okay. I had met Leon Smith at the AI lab.

Feigenbaum: Oh yeah, I think that was in the previous interview.

Knuth: Then in my class they said they could do this with the dynamic programming algorithm that I
used for music. It turned out to also work for English texts, and that was a revelation for my student. But
then when I got to actually programming it, I had to also organize it so that I could handle lots of text. I
had to develop a new data structure in order to be able to do the paragraph coming in text and enter it in
an efficient way. I had to introduce some ideas that are called “glue”, and “penalties”, and figure out how
that glue should disappear at boundaries in certain cases and not in others. All these things would never
have occurred to me unless I was writing the program. Edsger Dijkstra gave this wonderful Turing lecture
early in the 70s called “The Humble Programmer.” One of the points he made early on in his talk was
that when they asked him in Holland what his job title was, he said, “Programmer,” and they said, “No,
that’s not a job title. You can’t do that; programmers are just coders.” They’re people who are assigned
like scribes were in the days when you needed somebody to write a document in the Middle Ages.
Dijkstra said he was proud to be a programmer. Unfortunately he changed his attitude completely, and I
think he wrote his last computer program in the 1980s. At this conference I went to in 1967 about
simulation language, Chris Strachey was going around asking everybody at the conference what was the
last computer program you wrote. This was 1967. Some of the people said, “I’ve never written a
computer program.” Others would say, “Oh yeah, here’s what I did last week.” I asked Edsger this
question when I visited him in Texas in the 90s and he said, “Don, I write programs now with pencil and
paper, and I execute them in my head.” He finds that a good enough discipline. I think he was mistaken
on that. He taught me a lot of things, but I really think that if he had continued... One of Dijkstra’s
greatest strengths was that he felt a strong sense of aesthetics, and he didn’t want to compromise his
notions of beauty. They were so intense that when he visited me in the 1960s, I had just come to
Stanford. I remember the conversation we had. It was in the first apartment, our little rented house,
before we had electricity in the house. We were sitting there in the dark, and he was telling me how he
had just learned about the specifications of the IBM System/360, and it made him so ill that his heart was
actually starting to flutter. He intensely disliked things that he didn’t consider clean to work with. So I can
see that he would have distaste for the languages that he had to work with on real computers. My
reaction to that was to design my own language, and then make Pascal so that it would work well for me
in those days. But his response was to do everything only intellectually. So, programming. I happened
to look the other day. I wrote 35 programs in January, and 28 or 29 programs in February. These are
small programs, but I have a compulsion. I love to write programs and put things into it. I think of a
question that I want to answer, or I have part of my book where I want to present something. But I can’t
just present it by reading about it in a book. As I code it, it all becomes clear in my head. It’s just the
discipline. The fact that I have to translate my knowledge of this method into something that the machine
is going to understand just forces me to make that crystal-clear in my head. Then I can explain it to
somebody else infinitely better. The exposition is always better if I’ve implemented it, even though it’s
going to take me more time.

Feigenbaum: It’s not just the exposition. It’s the understanding. That’s why I don’t do theoretical AI. I
just can’t understand the thing from a theoretical point of view until I experiment with it.

Knuth: Yeah. That’s absolutely true. I’ve got to get another thought out of my mind though. That is,
early on in the TeX project I also had to do programming of a completely different type. I told you last
week that this was my first real exercise in structured programming, which was one of Dijkstra’s huge...
That’s one of the few breakthroughs in the history of computer science, in a way. He was actually
responsible for maybe two of the ten that I know. So I’m doing structured programming as I’m writing
TeX. I’m trying to do it right, the way I should’ve been writing programs in the 60s. Then I also got this
typesetting machine, which had, inside of it, a tiny 8080 chip or something. I’m not sure exactly. It was a
Zilog, or some very early Intel chip. Way before the 386s. A little computer with 8-bit registers and a
small number of things it could do. I had to write my own assembly language for this, because the
existing software for writing programs for this little micro thing was so bad. I had to write actually
thousands of lines of code for this, in order to control the typesetting. Inside the machine I had to control
a stepper motor, and I had to accelerate it. Every so often I had to give another [command] saying,
“Okay, now take a step,” and then continue downloading a font from the mainframe. I had six levels of
interrupts in this program. I remember talking to you at this time, saying, “Ed, I’m programming in
assembly language for an 8-bit computer,” and you said “Yeah, you’ve been doing the same thing and it’s
fun again.” You know, you’ll remember. We’ll undoubtedly talk more about that when I have my turn
interviewing you in a week or so. This is another aspect of programming: that you also feel that you’re in
control and that there’s not a black box separating you. It’s not only the power, but it’s the knowledge of
what’s going on; that nobody’s hiding something. It’s also this aspect of jumping levels of abstraction. In
my opinion, the thing that computer scientists are best at is seeing things at many levels of detail: high
level, intermediate levels, and lowest levels. I know if I’m adding 1 to a certain number, that this is getting
me towards some big goal at the top. People enjoy most the things that they’re good at. Here’s a case
where if you’re working on a machine that has only this 8-bit capability, but in order to do this you have to
go through levels, of not only that machine, but also to the next level up of the assembler, and then you
have a simulator in which you can help debug your programs, and you have higher level languages that
go through, and then you have the typesetting at the top. There are these six or seven levels all present
at the same time. A computer scientist is in heaven in a situation like this.

Feigenbaum: Don, to get back, I want to ask you about that as part of the next question. You went back
into programming in a really serious way. It took you, as I said before, ten years, not one year, and you
didn’t quit. As soon as you mastered one part of it, you went into Metafont, which is another big deal. To
what extent were you doing that because you needed to, what I might call expose yourself to, or upgrade
your skills in, the art that had emerged over the decade-and-a-half since you had done RUNCIBLE? And
to what extent did you do it just because you were driven to be a programmer? You loved programming.

Knuth: Yeah. I think your hypothesis is good. It didn’t occur to me at the time that I just had to program
in order to be a happy man. Certainly I didn’t find my other roles distasteful, except for fundraising. I
enjoyed every aspect of being a professor except dealing with proposals, which I did my share of, but that
was a necessary evil sort of in my own thinking, I guess. But the fact that now I’m still compelled to… I
wake up in the morning with an idea, and it makes my day to think of adding a couple of lines to my
program. Gives me a real high. It must be the way poets feel, or musicians and so on, and other
people, painters, whatever. Programming does that for me. It’s certainly true. But the fact that I had to
put so much time in it was not totally that, I’m sure, because it became a responsibility. It wasn’t just for
Phyllis and me, as it turned out. I started working on it at the AI lab, and people were looking at the
output coming out of the machine and they would say, “Hey, Don, how did you do that?” Guy Steele was
visiting from MIT that summer and he said, “Don, I want to port this to take it to MIT.” I didn’t have two
users. First I had 10, and then I had 100, and then I had 1000. Every time it went to another order of
magnitude I had to change the system, because it would almost match their needs but then they would
have very good suggestions as to something it wasn’t covering. Then when it went to 10,000 and when
it went to 100,000, the last stage was 10 years later when I made it friendly for the other alphabets of the
world, where people have accented letters and Russian letters. I had started out with only 7-bit codes. I
had so many international users by that time, I saw that was a fundamental error. I started out with the
idea that nobody would ever want to use a keyboard that could generate more than about 90 characters.
It was going to be too complicated. But I was wrong. So it [TeX] was a burden as well, in the sense that I
wanted to do a responsible job. I had actually consciously planned an end-game that would take me four
years to finish, and [then] not continue maintaining it and adding on, so that I could have something
where I could say, “And now it’s done and it’s never going to change.” I believe this is one aspect of
software that, not for every system, but for TeX, it was vital that it became something that wouldn’t be a
moving target after a while.

Feigenbaum: The books on TeX were a period. That is, you put a period down and you said, “This is it.”

Knuth: 1986 was it, in other words. Five volumes were published, “Computers and Typesetting,
Volumes A, B, C, D, and E”, and that was to be the end. Then we had this 1988 and 1989, changing
everything from 7-bit to 8-bit, which was a major rewrite, done with the help of volunteers all over the
world. But I still had to personally do everything myself in order to make sure that it wasn’t going to
diverge.

Feigenbaum: This was at the same time it was being ported over to personal computers?

Knuth: It was ported over to personal computers already in 1980. It was ported to 200 different
programming environments -- I’m considering the combination of operating system and language -- by
1981. TeX ’82 was the complete rewrite and incompatible break with TeX ’78. The original design, TeX
’78, had already been ported to 200 different environments before I did TeX ’82. We also made sure that
this could be ported.

Feigenbaum: Did you have to design that porting environment?

Knuth: Yes. We worked on the porting environment. This was the genesis of literate programming.
One of the aspects of literate programming that doesn’t get top billing is the way it helps for porting a
system. It’s called the change file mechanism. I have my master files, and nobody is allowed to touch these.
It says at the top of the file, “Do not change this file unless you are D.E. Knuth.” I don’t know how many
D.E. Knuths there are in the world, but anyway I get to change the master file. But change files come
along. The change file starts out with a line saying, “Okay, now go to the first line in the master file that
matches this,” and then it quotes lines from the master file. When it comes to the end, then it says, “Now
replace those by these lines.” This turned out to be a very flexible mechanism. It also had extra features,
like you can include another change file in the midst of one change file. But anyway, there’s the master
files that I write, and you have everybody who’s porting it. You have hundreds of these change files.
Then I make a change to the master file, because I find a bug, or because I have to have a new feature
before TeX is frozen. Still, the change file has very minor corrections in it. The error checking was
sufficiently good that you would usually find that the people who were porting it to another environment,
their ports would automatically work, even though I was changing the thing and they understood the port.
So that mechanism has worked well.
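
[Illustration: the change file idea described above can be sketched in a few lines of Python. This is a simplified model, not the actual WEB implementation. It assumes the `@x` / `@y` / `@z` markers of WEB change files, exact line-for-line matching, and changes applied in order; the function names `parse_change_file` and `apply_changes` and the sample master text are hypothetical.]

```python
def parse_change_file(text):
    """Split a change file into (old_lines, new_lines) pairs.

    Lines between @x and @y quote the master file; lines between
    @y and @z are their replacement.
    """
    changes, old, new, state = [], [], [], None
    for line in text.splitlines():
        if line.startswith("@x"):
            old, new, state = [], [], "old"
        elif line.startswith("@y"):
            state = "new"
        elif line.startswith("@z"):
            changes.append((old, new))
            state = None
        elif state == "old":
            old.append(line)
        elif state == "new":
            new.append(line)
    return changes


def apply_changes(master_lines, changes):
    """Return a patched copy of master_lines; the master itself is never modified."""
    out, pos = [], 0
    for old, new in changes:
        n = len(old)
        # Find the first place at or after `pos` where the quoted lines match.
        start = next(
            (i for i in range(pos, len(master_lines) - n + 1)
             if master_lines[i:i + n] == old),
            None,
        )
        if start is None:
            # Error checking: a change that no longer matches is reported.
            raise ValueError(f"change did not match master: {old!r}")
        out.extend(master_lines[pos:start])
        out.extend(new)
        pos = start + n
    out.extend(master_lines[pos:])
    return out


master = ["program demo;", "const buf_size = 500;", "begin", "  run;", "end."]
change_text = "@x\nconst buf_size = 500;\n@y\nconst buf_size = 3000;\n@z\n"
ported = apply_changes(master, parse_change_file(change_text))
```

[Because a change is located by matching quoted lines rather than by line number, the same change file keeps working when the master file is edited elsewhere, and it fails loudly when the quoted lines themselves have changed.]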

Feigenbaum: Don, I wanted to, while we’re talking about TeX and this decade, bring in fonts. Font
design, your interest in the art of font design, bringing Chuck Bigelow to Stanford. All of that, and
Metafont as a program, and as a book.

Knuth: Yeah. Metafont. Wow, there’s so many layers here. I just received in the mail two days ago a
wonderful book by Hermann Zapf, who’s about to celebrate his 90th birthday. It tells the story of his life
and everything, and I’m just thinking about it because I met so many wonderful people. The graphic
designers are about the nicest people I’ve ever met in my life, and this came out of this group. It starts
out, actually, very briefly, at Stanford. Stanford has a wonderful professor, Matt Kahn, who taught a
course in basic design. Jill and I took his class – audited his class -- in 1976, I think it was. I got to rub
shoulders with artists during this time. He also gave a lot of insight into the way artists do their wonderful
things. Then a few years later when I’m working on TeX, of course aesthetics is very important to me.
That’s why I didn’t like the Bell Labs system; otherwise I would’ve adopted the Bell Labs system. I had to
have something that looked beautiful to me. Stanford has a wonderful collection of fine printing, called
the Gunst Collection. I went through and I absorbed the writings of type designers through the centuries,
and studied, and started to learn what makes good quality different from ordinary quality in published
books. That was during the earliest time working in TeX. Before the summer of ’77, I could be mostly
found, like during May of that year just before my sabbatical, I could probably mostly be found in the
Stanford Library reading about the history of letter forms. Before I went to China I had drafted the letters
for A to Z. I’m not sure if I had gotten into all the letters. I think I had probably 26 lower case and 26
upper case letters by the time I left for China. But I had to do fonts at the same time as TeX. It wasn’t
something [where] I can do TeX and then I can do fonts. It’s a chicken and egg problem. You can’t do
typesetting unless you have the fonts to work with. Structured programming gave me a different feeling
from programming the old way. A feeling of confidence that I didn’t have to debug something immediately
as I wrote it. Even more important, I didn’t have to mock-up the unwritten parts of the program. I didn’t
have to do any fast prototyping on something like this, because when you use structured programming
methodology, you have more confidence that it’s going to be right, that you don’t have to try it out first. In
fact, I wrote all of the code for TeX over a period of seven months, before I even typed it into a computer.
It wasn’t until March of 1978 when I spent three weeks debugging everything I had written up to that time.
Certainly you can imagine how I’m feeling in October, November, saying, “Hmm. I wonder if this is really
going to typeset a paragraph, if these data structures I have for dynamic programming are really going to
work.” Maybe I’m a little curious about it, but structured programming still was strong enough that I
thought, “No, no. If I’m going to try to minimize my total time, then why should I have to first debug my
prototype and then debug the real thing? Why don’t I just do all the debugging once and save total time?”
The same with fonts. I had to have fonts. I couldn’t debug TeX until I had the fonts. So it’s all mixed up,
but working on one for a month and then going to the other for a month and coming back. I thought fonts
were going to be easy. I had seen Butler Lampson playing around with fonts at Xerox PARC. He was
sitting at a terminal and he had a big letter “B.” I can sort of visualize it now. He was drawing splines
around the edge. In my art class project I had done a project for Matt Kahn [which] taught me about
splines, so I knew how to program splines. I thought, okay, I’ll get the letters that are used in the old
edition of “The Art of Computer Programming”, and I’ll do like Butler did, and I’ll make my font. I was
going to go over to Xerox PARC and work with their equipment. They said, “Fine. Sure, Don. We’ll give
you an office over here. Of course, any fonts you design here become Xerox property. You won’t mind
that?” I said, “What? All I’m going to come out with [are] my measurements, a bunch of numbers. How
can you own those numbers? These are just integers. Numbers belong to God.” Well, this is a
debatable point. But they said anything I do there would belong to them. So I worked instead at the
Stanford AI lab, where we didn’t have anywhere near as good of precision cameras. We had a TV
camera and a great amount of distortion. If you turned the light slightly up just a tiny bit, the width of the
letters on the screen would grow by 25%. It was impossible to do any quality work through that. I had to
learn all kinds of tricks for getting around it. It became much more difficult to do fonts than I had expected.
You were saying the other day that a story has to have moments of tragedy as well as success. One of
the greatest disappointments in my whole life was the day that I received in the mail the new edition of
volume 2 of “The Art of Computer Programming,” which was typeset with my fonts and which was
supposedly to be the crowning moment of my life when I had succeeded with the TeX project. I think it
was 1981, and I had gotten the best typesetting equipment, and I had written a program for the 8-bit
microprocessor inside, and it had 5,000 dots-per-inch, and all of the proofs that I had coming out looked
good on this machine. I went over to Addison-Wesley and they typeset it, and it came in a book. There
was the book, and it was in the familiar beige color covers. I opened the book up and I’m thinking oh, this
is going to be a nice moment. [But] this doesn’t look the same!

Feigenbaum: You sent them film, right?

Knuth: I sent them film. It doesn’t look the same as my other books. I had volume 2, first edition. I had
volume 2, second edition. They were supposed to look the same. Everything I had known up to that
point was that they would look the same. All the measurements seemed to agree. But a lot of distortion
goes on, and our optic nerves aren’t linear. All kinds of things happening. I wrote it up once, when I say I
burned with disappointment. I mean, I really felt a hot flash where I “Ohhhhh!”

Feigenbaum: Yeah. Probably seething anger too.

Knuth: I don’t know. So, I mean, I—

Feigenbaum: You were saying that you put so much effort into this and it wasn’t beautiful.

Knuth: It wasn’t that bad. Some people didn’t notice any difference at all, but the worst was the
numerals. The numbers 1, 2, 3, 4, 5 are really in a rather different style from letters, and they’re very
tricky. I didn’t realize that when browsing a book our eyes jump and focus on different parts, and one of
the things we focus on most, often when we’re using a book, is the page numbers. And the 2 was really
ugly. And the 6 -- there is something about the 6 that it’s just not a 6. And the 5’s! Anyway, I got to the
point where I was so upset. Some of California highway signs -- the speed limit signs for 50 miles an
hour, or 25 miles an hour -- the 5 is really ugly. It looks like the 5 that I used to have. I couldn’t live in
Santa Rosa because they have lousy 5’s on their speed limit signs in Santa Rosa. It just reminds me of
this awful time. There will be a time when I would be looking at all of the 2’s that I could see as I’m riding
a bus, or something like this, and how am I going to get this 2 to be right, because the numbers were the
worst of all. The letters were okay, but I’d seen the numbers, and I can’t read my book without seeing
these numbers. I’m looking up a page and I look in the index. Oh, yeah, I see, page 413. Then I have to
read all these numbers in order to get to page 413.

Feigenbaum: How did this get by your eyes?

Knuth: Before.

Feigenbaum: How come it didn’t get caught in the process?

Knuth: You see, it’s the context. Having it on a film… Ok, first of all, we’re working with the Xerox
Graphics Printer, which has a very low resolution. Everything has jaggies -- jagged edges -- in that
machine. I knew about this even before I started to go into typography. We had the Xerox Graphics
Printer and we were saying, “Oh, this is interesting, but it’s not a book.” Then I had the nice results from
Pat Winston’s book that looked like a book. That was professionally designed type; it wasn’t done by a
computer programmer. But now I was trying to match exactly the type that we had in the other [version].
I would debug my whole book looking at Xerox XGP proofs. Then I would go to my high-res machine,
this expensive typesetter in the basement, and [on] that machine it was certainly crisp, and I didn’t see
any jaggies in those. I had no indication that when this would actually then go to be printed on paper, the
ink gets a little distorted by the printing process, and even more so bound in a place that looked exactly…
It’s the context. It had to look right, and it didn’t at that time. I’m happy to say that I open my books now
and I like what I see.

Feigenbaum: You’re at the bottom of this trough—

Knuth: Even though they don’t match exactly 1968, the ways they differ are pleasing to me. But I had to…
So then I went to all the best type designers in the world. I had learned some of their names, and I was
able to invite them to participate in my research project, and I got to meet [them]. I could see, for
example, that Hermann Zapf, from some of the things he had written, he seemed to be a very open-
minded guy. So I wrote him a letter introducing myself and saying, “Would you be interested in spending
two weeks at Stanford?” And boy! He’s the absolute best in the world. In my apprenticeship he’s one of
my great teachers. As you mentioned Chuck Bigelow, Chuck was the dean of typography in America. I
worked out to get some donations that we would be able to hire Chuck and have a joint appointment with
the art department. I was glad to find out that after we had gone through the process of committees and
getting the appointments approved by two departments and everything, the week after he had accepted
our offer he received a MacArthur Prize Fellowship, which certainly enhanced my credibility too with the
art department. This was a big, new thing for them; we had never had a joint thing with the art
department before. I brought Matthew Carter, who is considered definitely the leading type designer in
America. There was a great article about him in The New Yorker last year. He was out here for a quarter.
Many other visitors and industry leaders from around the world helped me at the time. Finally by 1986 I
was ready. I had type that I could be happy with. They said to me, “Don, that’s the normal five years’
apprenticeship as a type designer. That’s the way it goes.” Originally, I thought it was just going to be a
matter of making a few measurements and taking a few numbers, and that would be it.

Feigenbaum: That was the TeX story, the METAFONT story. Anything else going on during this time,
[in] the other parts of your life?

Knuth: Okay. I had to work so intensively on this software that I could not keep up my normal teaching
load at Stanford. I think three or four quarters… I’m not sure. Were you chair?

Feigenbaum: I was chair ’76 through ’81.

Knuth: ’81, yeah. So I had to approach you and say, “Can you give me a leave of absence this quarter
because I’m doing software?” Also then Nils [Nilsson] probably. Do you know who? No?

Feigenbaum: After me I think Gene Golub may have taken over.

Knuth: Gene. Okay. Anyway, I missed three or four quarters during a period of four years, because I
found that writing software was much more difficult than anything else I had done in my life, in the
following sense. I had to keep so many things in my head at once. I couldn’t just put them down and
start something else. It really took over my life during this period. I used to sort of think there were
different kinds of tasks: writing a paper, writing a book, teaching a class, things like that. I could juggle all
of those simultaneously. But software was an order of magnitude harder. I couldn’t do that and still teach
a good Stanford class. Of course, I’m advising my grad students through all this period, and they’re doing
great theses related to typography. Mostly, not always. But the other parts of my life were largely on
hold. That includes The Art of Computer Programming. Except volume 2 was my big project, to get the
new edition of volume 2 done with TeX. In 1980 I spent several months just doing pure… There were
new developments in the algorithms that belong in volume 2, and I wrote a lot of new material for volume
2 during this period. But then in order to get TeX and METAFONT completely finished, that was the
focus. At Stanford we had a unique class taught in the spring of ’84 when the new version of METAFONT
was being done. I co-taught it with Chuck Bigelow and Richard Southall. Richard is not a type designer
but an expert in the interface between the designer and the actual final product. He’s a talented designer
but he’s not one of the leading designers. His main expertise is actually knowing what distortions you
have to make in order to get it to look right on the page. The three of us co-taught the class. The class
met three days a week, once by Chuck, once by Richard and once by me. The students in the class are
learning to design fonts at the same time. It was a great quarter doing this class, and it was all recorded
on videotape. Unfortunately the tapes were all erased, so we just have our memories of this class. My
life was pretty much typography. When it got to The Art of Computer Programming, every three months I
would take a look at the journals that had come in for those three months and I would scan the titles. For
each article I would say, “Oh, this belongs in volume 4, in a certain part.” I kept an index of them for a
while. I started throwing the preprints that I would receive in the mail, I started first putting them into a
box. All my preprints had been organized well for volume 4, into 32 compartments. But then they were
starting to overflow, so then I had X1, which just had overflow from all the compartments, and X2 and X3.
I got up to X15 of these preprints. Then I gave up on that and I started putting them into a big box in a
room in my house. And then the box overflowed and there was a big pile on the floor.

Feigenbaum: Yeah. I remember visiting you in your study when it was just a chaos of piles.

Knuth: Yeah. So in 1993, I think it was, I finally attacked the pile. I went through and I had accumulated,
I think it was, 14 linear feet of material that I had just been saying “someday get to this for volume 4.” I
think it took me a year to go through all of that and organize it and get ready to write the real volume 4
after all this time. So I put that on hold. Then before 1994 I had to get ready to, well, I’m retiring. We’ll
probably get into my third Stanford period. But typography was it for the early part of the ’80s. Then I
started doing a lot of mathematical research in the late part of the ’80s, analysis of algorithms, my real
life’s work.

Feigenbaum: I’d like to do that, to move on to the third period. You’ve already mentioned one of them,
the retirement issue, and let’s talk about that. The second one you mentioned quite early on, which is the
birth in your mind of literate programming, and that’s another major development. Before I quit my little
monologue here I also would like to talk about random graphs, because I think that’s a stunning story that
needs to be told. Let’s talk about either the retirement or literate programming.

Knuth: I’m glad you brought up literate programming, because it was in my mind the greatest spinoff of
the TeX project. I’m not the best person to judge, but in some ways, certainly for my own life, the
main plus I got out of the TeX project was that I learned a new way to program. I love programming, but I
really love literate programming. The idea of literate programming is that I’m talking to, I’m writing a
program for, a human being to read rather than a computer to read. It’s still a program and it’s still doing
the stuff, but I’m a teacher to a person. I’m addressing my program to a thinking being, but I’m also being
exact enough so that a computer can understand it as well. And that made me think. I’m not sure if I
mentioned last week, but I think I did mention last week, that the genesis of literate programming was that
Tony Hoare was interested in publishing source code for programs. This was a challenge, to find a way to
do this, and literate programming was my answer to this question. That is, if I had to take a large
program like TeX or METAFONT, fairly large, it’s 500 or 600 pages of a book--how would you do that? The
answer was to present it as sort of a hypertext, where you have a lot of simple things connected in simple
ways in order to understand the whole. Once I realized that this was a good way to write programs, then I
had this strong urge to go through and take every program I’d ever written in my life and make it literate.
It’s so much better than the next best way, I can’t imagine trying to write a program any other way. On the
other hand, the next best way is good enough that people can write lots and lots of very great programs
without using literate programming. So it’s not essential that they do. But I do have the gut feeling that if
some company would start using literate programming for all of its software that I would be much more
inclined to buy that software than any other.

Feigenbaum: Just a couple of things about that that you have mentioned to me in the past. One is your
feeling that programs can be beautiful, and therefore they ought to be read like poetry. The other one is a
heuristic that you told me about, which is if you want to get across an idea, you got to present it two ways:
a kind of intuitive way, and a formal way, and that fits in with literate programming.

Knuth: Right.

Feigenbaum: Do you want to comment on those?

Knuth: Yeah. That’s the key idea that I realized as I’m writing The Art of Computer Programming, the
textbook. That the key to good exposition is to say everything twice, or three times, where I say
something informally and formally. The reader gets to lodge it in his brain in two different ways, and they
reinforce each other. All the time I’m giving in my textbooks I’m saying not only that I’m… Well, let’s see.
I’m giving a formula, but I’m also interpreting the formula as to what it’s good for. I’m giving a definition,
and immediately I apply the definition to a simple case, so that the person learns not only the output of
the definition -- what it means -- but also to internalize, using it once in your head. Describing a computer
program, it’s natural to say everything in the program twice. You say it in English, what the goals of this
part of the program are, but then you say in your computer language -- in the formal language, whatever
language you’re using, if it’s LISP or Pascal or Fortran or whatever, C, Java -- you give it in the computer
language. You alternate between the informal and the formal. Literate programming enforces this idea.
It has very interesting effects. I find that, for example, writing a system program, I did examples with
literate programming where I took device drivers that I received from Sun Microsystems. They had device
drivers for one of my printers, and I rewrote the device driver so that I could combine my laser printer with
a previewer that would get exactly the same raster image. I took this industrial strength software and I
redid it as a literate program. I found out that the literate version was actually a lot better in several other
ways that were completely unexpected to me, because it was more robust. When you’re writing a
subroutine in the normal way, a good system program, a subroutine, is supposed to check that its
parameters make sense, or else it’s going to crash the machine. If they don’t make sense it tries to do a
reasonable error recovery from the bad data. If you’re writing the subroutine in the ordinary way, just
start the subroutine, and then all the code. Then at the end, if you do a really good job of this testing and
error recovery, it turns out that your subroutine ends up having 30 lines of code for error recovery and
checking, and five lines of code for what the real purpose of the subroutine is. It doesn’t look right to you.
You’re looking at the subroutine and it looks like the purpose of the subroutine is to write certain error
messages out, or something like this. Since it doesn’t quite look right, a programmer, as he’s writing it, is
suddenly unconsciously encouraged to minimize the amount of error checking that’s going on, and get it
done in some elegant fashion so that you can see what the real purpose of the subroutine is in these five
lines. Okay. But now with literate programming, you start out, you write the subroutine, and you put a line
in there to say, “Check for errors,” and then you do your five lines. The subroutine looks good. Now you
turn the page. On the next page it says, “Check for errors.” Now you’re encouraged. As you’re writing
the next page, it looks really right to do a good checking for errors. This kind of thing happened over and
over again when I was looking at the industrial software. This is part of what I meant by some of the
effects of it. But the main point of being able to combine the informal and the formal means that a human
being can understand the code much better than just looking at one or the other, or just looking at an
ordinary program with sprinkled comments. It’s so much easier to maintain the program. In the
comments you also explain what doesn’t work, or any subtleties. Or you can say, “Now note the
following. Here is the tricky part in line 5, and it works because of this.” You can explain all of the things
that a maintainer needs to know. I’m the maintainer too, but after a year I’ve forgotten totally what I was
thinking when I wrote the program. All this goes in as part of the literate program, and makes the
program easier to debug, easier to maintain, and better in quality. It does better error messages and
things like that, because of the other effects. That’s why I’m so convinced that literate programming is a
great spinoff of the TeX project.

Feigenbaum: Just one other comment. As you describe this, it’s the kind of programming methodology
you wish were being used on, let’s say, the complex system that controls an aircraft. But Boeing isn’t
using it.

Knuth: Yeah. Well, some companies do, but the small ones. Hewlett-Packard had a group in Boise that
was sold on it for a while. I keep getting… I got a letter from Korea not so long ago. The guy says he
thinks it’s wonderful; he just translated the CWEB manual into Korean. A lot of people like it, but it doesn’t
take over. It doesn’t get to a critical mass. I think the reason is that a lot of people don’t enjoy writing the
English parts. A lot of good programmers don’t enjoy writing the English parts. Two percent of the
world’s population is born to be programmers. I don’t know what percent is born to be writers, but you
have to be in the intersection in order to be really happy with literate programming. I tried it with Stanford
students. I had seven undergraduates. We did a project leading to the Stanford GraphBase. Six of the
seven did very well with it, and the seventh one hated it.

Feigenbaum: Don, I want to get on to other topics, but you mentioned GWEB. Can you talk about WEB
and GWEB, just because we’re trying to be complete?

Knuth: Yeah. It’s CWEB. The original WEB language was invented before the [world wide] web of the
internet, but it was the only pronounceable three-letter acronym that hadn’t been used at the time. It
described nicely the hypertext idea, which now is why we often refer to the internet as a web too. CWEB
is the version that Silvio Levy ported from the original Pascal. English and Pascal was WEB. English and
C is CWEB. Now it works also with C++. Then there’s FWEB for Fortran, and there’s noweb that works
with any language. There’s all kinds of spinoffs. There’s the one for Lisp. People have written books
where they have their own versions of CWEB too. I got this wonderful book from Germany a year ago
that goes through the entire MP3 standard. The book is not only a textbook that you can use in an
undergraduate course, but it’s also a program that will read an MP3 file. The book itself will tell exactly
what’s in the MP3 file, including its header and its redundancy check mechanism, plus all the ways to
play the audio, and algorithms for synthesizing music. All of it a part of a textbook, all part of a literate
program. In other words, I see the idea isn’t dying. But it’s just not taking over.

Feigenbaum: We’ve been talking about, as we’ve been moving toward the third Stanford period which
includes the work on literate programming even though that originated earlier. There was another event
that you told me about which you described as probably your best contribution to mathematics, the
subject of random graphs. It involved a discovery story which I think is very interesting. If you could sort
of wander us through random graphs and what this discovery was.

Knuth: Well, let me try to set the scene and connect it to the past a little bit. We finished the TeX project.
The climax of that was 1986, although I did have to come back into it later on to make it more world
friendly. But after 1986, that was a sabbatical year for me, so it was also a time when I spent the whole
year in Boston. It was the year I gave to my wife as her sabbatical. It was 25 years of marriage; I thought
I could help her for one year, and she’s been helping me for all the rest. That was a break. I came back
to Stanford after that, and I plunged into what I consider my main life’s work, analysis of algorithms.
That’s a very mathematical thing, and so instead of having font design visitors to my project, I had great
algorithmic analysts to my project, especially Philippe Flajolet from Paris. I started working on some
powerful mathematical approaches to analysis of algorithms that were unheard of in the ‘60s when I
started the field. We were excited about these developments and able to analyze a lot more algorithms
that previously were untouchable. Also other visitors, like Boris Pittel and so on. I had good research
funding to do work on analysis of algorithms. In fact I brought in the TeX project originally as just a minor
thing on my contract. “Say, by the way, we’re going to write these technical papers and we need a
publishing method to present our work, so I’ll spend a little time on typography.” That lasted only a year,
and then I got special funding for working on TeX. But throughout that time I also was doing a little bit of
support, with graduate students and visitors, doing analysis of algorithms. This became a major thing
again in the late ‘80s. I found on the web one of my progress reports from 1987 listing ten
accomplishments of that year. I had to say that I don’t know if any other year was as fruitful as that year,
as far as my project was concerned anyway. It was certainly in full swing again finally after, from 1977 to
1986, the work on typography. So here I am in math mode, and thriving on the beauties of this subject.
The main glory of it then occurred after the new ideas had started to gel. We started to see the deeper
implications. As you learn the new techniques you apply it to new problems that were previously
unreachable. One of the problems that was out there that was fascinating is the study of random graphs.
Graphs are one of the main focuses of volume 4, all the combinatorial algorithms, because they’re
ubiquitous in applications. A lot of times in order to understand what an algorithm is doing, you see what
would it do if I applied it to random data of various kinds. Yesterday at our computer forum Pat Hanrahan
was telling me how many people he knows that are working with random graphs to study the internet, and
so on. One of the simplest models of random graphs is one that also the physicists had been interested
in for many years. It connects to so-called Bose-Einstein statistics, they tell me, although I don’t really
understand that much about that part of physics. This model is very simple. We start out with N points
that are totally disconnected from each other. These points don’t exist in three-dimensional space. They
exist just as N objects in any number of dimensions. Initially there’s no connection at all between any
objects. But you can imagine that somebody draws two random objects, totally at random. Close your
eyes, find one, and each one with equal probability, 1/N. Then find another one and then put a
connection between those two. “Zap.” Those two are now joined. Okay. Now we have N-2 objects that
are still independent, but two of them are connected together. Do it again, and maybe you’ll connect two
others. After you do it a few more times you might find that these two are together, and these two are
together, but then you will hook them together and we’ll get four. Or we might have two that get a third; a
guy goes with them. Eventually we build up trees of things, meaning that they’re hooked together but they
don’t have cycles. There’s no loops. Everything in a tree is connected to everything else in the tree, but
there’s only one way to get from each one to each other one. There’s no loops. But we keep on adding.
This random process keeps going on, adding more and more connections, one at a time. Eventually
cycles occur. If we keep on going on and on and on and on, eventually everything is going to be
connected to everything else directly. This is called the evolution of random graphs. We can ask, at any
point in time, what does the random graph look like after we’ve added M connections to these N groups?
What does it look like? Paul Erdős and Alfréd Rényi had proved in 1960 that an amazing thing happens
as we add these connections. When M gets to a value which is approximately one half of N times the
natural log of N, all of a sudden a “big bang” occurs, where comparatively little connection was true
before the big bang, compared to a lot after the big bang. The statistics are something like this. If we say
that M, the number of edges, is equal to lambda over 2 times N. If M is N over 2, if we went ahead,
added half as many edges as there are points, then lambda is 1. If lambda is 10, then I’ve added 5N
point connections. The thing is, if lambda is less than 1… So we consider a large value of N, and we
have fewer than one half… Sorry. If lambda is less than log N… No. Ok. Change my definition so the
number of edges is equal to lambda times natural log N times N over 2. If lambda is less than 1, then
almost surely the graph consists of only trees, and the largest tree is of size something like the logarithm
of N. It’s almost totally dispersed. If lambda is equal to 1, almost surely there is a component of size N to
the two thirds power; if N is a million, a component of size approximately 10,000. It’s N to the two thirds
power. It goes from log N size trees to connect the part that’s big, that has N to the two thirds. If lambda
is greater than 1, it’s proportional to N, not N to the two thirds. So there is this jump between a very small
number and no cycles. If lambda is 1 minus, if lambda is 0.999999, you still only get log N. If lambda is
1.000001, you get N. There is this bang that’s occurring, and the question…
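
[Editorial illustration, not part of the interview.] The process described above (draw two points at random, join them, repeat) can be sketched with a union-find structure that records the size of the largest connected piece after each new connection. This is only a minimal sketch under stated assumptions: the names `DisjointSets` and `evolve` are hypothetical, the random generator is seeded for repeatability, and a draw that picks the same point twice is simply treated as an edge that does not merge anything.

```python
import random


class DisjointSets:
    """Union-find over n points, tracking the largest component size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.largest = 1

    def find(self, x):
        # Walk to the root, halving the path as we go.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Join the components of a and b; return False if already joined."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # this edge stays inside one component (a cycle)
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.largest = max(self.largest, self.size[ra])
        return True


def evolve(n, m, seed=0):
    """Add m random connections to n isolated points.

    Returns a list whose entry i is the size of the largest component
    after i + 1 connections have been added.
    """
    rng = random.Random(seed)
    sets = DisjointSets(n)
    history = []
    for _ in range(m):
        sets.union(rng.randrange(n), rng.randrange(n))
        history.append(sets.largest)
    return history
```

Calling, say, `evolve(100000, 100000)` and looking at the history at a few values of M gives an empirical feel for how abruptly the largest component grows; the exact numbers depend on the seed.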

Feigenbaum: By “bang” you mean a discontinuity.

Knuth: Discontinuity, a double jump. People who have studied the Erdős and Rényi, and physicists,
could study it from the point of view of starting from zero and going up to lambda equals 1, and then their
equations would blow up at lambda equals 1. Or they could study the later stages, larger lambda, and
lambda gets down towards 1, and there the equations blow up. Okay? Now a Russian man in St.
Petersburg had noticed to his surprise that actually there was some similarity between the blow-ups
from the top and the blow-ups from the bottom. What we proposed to study was what happens in the
middle, and center on the middle, if possible. I guess we’ll continue the story later.

Feigenbaum: We’ll continue that story.

Feigenbaum: Don, we’re at the discontinuity point, and you’re about to explore both sides of that point,
and the story’s going to get really interesting here.

Knuth: Well, I hope so, but at about this time, Dick Karp at Berkeley was also interested in the evolution
of random graphs, and this explosion phenomenon. It relates in a vague way to computer algorithms,
because if we have data that has a lot of connections in it, then we would want to use a different kind of
data structure to represent it in the computer, and certain strategies would work a lot better. Dick Karp had
shown that, for example, if we want to take the transitive closure of a binary relation, you use a different
method, or if you want to update the consequences of adding a new thing, depending on how big the
graph is, you want to choose a different strategy. So this becomes a problem also in analysis of
algorithms as well as in physics. He had a couple of his graduate students do a simulation and try to grow
a lot of random graphs and see what happened. The word we heard from this simulation -- it actually
turned out we misunderstood it -- but what it seemed to imply from what we heard from what the Berkeley
students had done was the following: as the graph is growing and getting more and more connections,
the graph first gets to a point where it has one cycle. It’s not just trees, but there’s also one of the
components has an extra edge in it, more than needed to connect things together. Not only are the
things connected, but also there’s another edge making a cycle. Eventually there will be two cycles, and
three cycles, and things like this, and there’ll be more things happening. What we thought the Berkeley
group had discovered was that there almost never was a case where two of the connected components
of the graph would have cycles. In other words, as we’re adding edges, components merged together;
things that used to be apart become one. You might think that actually in a graph if we have a left
component and a right component, the left component might get a cycle and the right component might
get a cycle, and then they might merge later. But in the Berkeley experiments, it seemed, this almost
never happened. Instead, whichever component first got a cycle, it was the only one that had cycles later
on. Others would merge into it, but none of these other components would grow their own cycles first.
They weren’t big enough to have cycles. We thought, well, if this is true, this would also have
implications for data structures and algorithms. We could design our algorithms so that they could have
one place for the cycle guy, and one place for the other ones. We could have our data structure and say,
well, here’s where the cycles are, and here’s where the trees are, and then we could do faster updating.
So we set out, really, not originally to understand everything about the way the graph goes through this
critical point. Our original goal was to just try to prove what we thought the Berkeley group had found
empirically, this phenomenon that there’s sort of almost always only one main component, or one main
component that has cycles.


Feigenbaum: Don, can I interrupt you just a second to ask a question? What puzzles me, and puzzles
maybe the audience, is how often do analysts, mathematical analysts, do empirical experiments to
discover things? Is that a usual thing, or was it special in this case?

Knuth: It’s a fast-growing area in mathematics. The Journal of Experimental Mathematics was founded
by Silvio Levy less than ten years ago. He was my co-author with CWEB, but he’s very broad. It’s
because computers are now there, so we can now do empirical studies with mathematics. It’s not too
common. My professor, Marshall Hall, was sort of famous for his observation with combinatorial things,
that at the time he best expressed the wisdom of the 1960s of saying that when you’re doing
mathematics it’s nice to do a bunch of experiments with pencil and paper. If some problem has a
parameter N associated with it, you can usually go up to some value of N, like N=10 or something, by
hand. Then with the computer you can go on with N=11. Combinatorial problems tend to grow faster, to
the point where the computer can go beyond the hand thing. But then you can’t go to N=12, because
that’s already too much, because the problem is growing so much. So he says computers were good for
going one case beyond what you could do by hand. But now computers are better by orders of
magnitude than they were then, and also the tools that we have now for examining mathematical things
are much better, the software that we have.

Feigenbaum: If this journal is only ten years old, this work that you were doing around 1990 must’ve
been very much an early kind of a pioneering thing.

Knuth: Well, it was, actually. I guess there was another story associated with that, and that is I did
empirical studies on the first cycle that occurs with a random graph. There was the paper that I wrote just
previous to the one, the work I did with Philippe Flajolet. We first developed the theory, and then we
wanted to have a section at the end of the paper that validated [it] experimentally, so we could see how
big the graph had to be before the asymptotics would kick in. A lot of graph problems actually behave
differently when the size is small. Our theorems we knew were true when N gets up to larger than the
size of the universe, but how did we actually know, if N is a million, is our theory correct? So I ran
experiments, sort of as a last phase of writing the previous paper, in order to test the thing in practice for
small values, since our mathematics was entirely concentrated on the case where N is getting very large,
the size of the graph is getting very large. I ran the program over Christmas vacation. I think I let it run a
little longer than I intended, I think because of timesharing, nobody else was using it at the Christmas
vacation. I didn’t realize, but a week later I got a bill from Betty Scott for $60,000 of computer time, which
was way more than I had in my budget of my research grant. I refused to pay it, basically. I said, “I’m
sorry, I have to declare bankruptcy.” The worst part of the story is that I found out, 15 years later, that I
had a bug in my program and all the answers were wrong, all the $60,000 of calculation. What we had to
report in our paper was that actually our theory didn’t seem to be very relevant for the small values of N.
And Professor, our stat professor -- oh, what’s his name? I see him in front of me, but I don’t know -- he
was looking at our data and he figured out another algorithm by which he could calculate things by hand.
He knew that our answers were wrong. Sure enough, all this money that I wasted on this empirical
calculation, no wonder it didn’t agree with our theory, because my program was, in fact, wrong. In the
reprint of that paper on the first cycles, which came out in my Collected Papers on Discrete Mathematics,
I think it is -- I don’t remember which of mine -- I recomputed this table with a correct program. Of course,
it only took five minutes on a modern computer. But with the SAIL [Stanford Artificial Intelligence Lab]
computer we got a whopping bill. So it wasn’t very usual to do empirical calculations at that time, and it
was at Berkeley that the guys do.


Feigenbaum: Let’s go back to Berkeley. You had interpreted, but probably misinterpreted, the Berkeley
numbers.

Knuth: That’s right. The Berkeley numbers were telling us there might be this giant component
phenomenon, that the seed is planted very early, and then it stays with the thing. That was our original
motivation for studying the… What we finally found out was a good explanation of the Big Bang, but our
motivation -- we didn’t start out in saying, “I’m going to solve this problem.” That would’ve been a
hopeless problem. That would’ve been too much, even for an optimist like me, to say he was going to
tackle that problem. It just turned out that we stumbled on the answer. But in our course of looking at it,
we did find a way to slow down the Big Bang, and that’s not too hard to understand. Let’s imagine again
that we’re watching this graph evolve. Every graph as it evolves finally gets to a point that a cycle
appears in one component. Okay, we have one component containing a cycle. Now then it comes to a
point where there are two cycles in the whole graph. There are two possibilities. Either the two cycles
are in the same component, or one cycle in this component, one cycle in another one. So there’s a fork
in the road. It goes one direction or the other. Then when the third cycle appears, we have three
possibilities. We could have all three in one component, or we could have one and two, or we could have
one, one and one. And so on. You could draw an abbreviated history if you just look at which
components have cycles. There’s a branching diagram that every evolving graph goes through some
path in this diagram. The Berkeley experiment, as we understood it, was that almost always we were on
the upper line of this path. Almost always there’s only one component that contains cycles. These other
possibilities are there, but rare. We developed tools of complex analysis that I had mostly learned from
Philippe Flajolet. It got to the point where I could prove that it wasn’t almost always happening on the top
line, because at the very first branch, if I’m not mistaken, the odds were 72 to 5 that it would take the first
branch, but 5 cases out of 77, you’d take the bottom branch. It’s not going to zero, but that most of the
time it takes the top branch. But then maybe those two will join together and will get up to the top branch
again. We started to have more mathematics so we could find the first branch, in the two cases. A few
more days later, we could extend that. We could say, “Oh, what happens when the two go into three, and
three go into four?” We were getting peculiar numbers, but we could calculate these probabilities by a
long sequence of steps; a lot of calculus, a lot of Mathematica -- or Macsyma, I guess it was at that time --
using the symbolic algebra systems to grind out these strange probabilities. The truth actually turned
out that the Berkeley experiments had sampled the graph. Say you have a million nodes. They would
sample it after you had a thousand edges, and then you print out what’s the state then. What about
1,100 edges, 1,200 edges? I’m sorry, the critical point occurs at 1/2 N log N. They would sample it at
periodic times, but they wouldn’t sample it [exactly] at the state where you get the first cycle, the second
cycle, as in our mathematics. What we were doing is we were seeing the graph at a certain number of
seconds at time. The truth is that these deviations from the top line disappear very quickly. There’ll be a
brief instant of time where there’s two [cycles], where you’re not on the top line, but then it jumps back up
to the top line again. If you’re only sampling the graph at intervals of time, you almost never see the case
where you’re not on the top line. That’s why we misunderstood what we thought the Berkeley
experiments contained. We actually were able to prove sort of a climactic theorem, to get an exact
probability that it stays on the top line throughout and never ever has more than one cycle. The number
was something like 5π/12. Amazing. No, it’s got to be less than one, but it equals a small rational
multiple of π. That was the exact probability of staying on the top line. It’s kind of amazing that the
number π would occur in this connection. So we had these new mathematical tools. What it finally gave
us was a way to look at the Big Bang from the center of the process, and none of our equations blow up.
We’re able to slow down the Big Bang and watch it happen, by means of this new scale of
measurement saying, “Look at it after there’s a certain number of cycles in the graph.”
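
[Editorial illustration, not part of the interview.] The bookkeeping described above, counting how many components contain a cycle as the graph evolves, and the difference between watching every step and sampling at intervals, can be sketched as follows. This is a rough sketch under assumptions, not the method used at Berkeley or in the paper: the function names are hypothetical, a draw that picks the same point twice is counted as a cycle, and "staying on the top line" is taken to mean that the count of cyclic components never exceeds one at the observed steps.

```python
import random


def cyclic_component_counts(n, m, seed=0):
    """Add m random edges to n isolated points.

    Returns a list whose entry i is the number of components that
    contain at least one cycle after i + 1 edges have been added.
    """
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n
    cyclic = [False] * n  # meaningful only at component roots

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    num_cyclic = 0
    counts = []
    for _ in range(m):
        ra = find(rng.randrange(n))
        rb = find(rng.randrange(n))
        if ra == rb:
            # Extra edge inside one component: it now has a cycle.
            if not cyclic[ra]:
                cyclic[ra] = True
                num_cyclic += 1
        else:
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]
            # Two cyclic components merging leave one cyclic component.
            if cyclic[ra] and cyclic[rb]:
                num_cyclic -= 1
            cyclic[ra] = cyclic[ra] or cyclic[rb]
        counts.append(num_cyclic)
    return counts


def stays_on_top_line(counts, interval=1):
    """True if no observed step shows more than one cyclic component.

    With interval > 1 only every interval-th step is observed, which
    mimics sampling the graph at periodic times.
    """
    return all(c <= 1 for c in counts[interval - 1::interval])
```

For example, the toy history `[0, 1, 2, 1]` leaves the top line at its third step, yet `stays_on_top_line([0, 1, 2, 1], interval=2)` reports that it never did, because the sampled steps miss the brief deviation. That is the kind of effect the paragraph above attributes to periodic sampling.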


Feigenbaum: Don, tell me about the words, “stumble upon.”

Knuth: Stumble upon, yeah.

Feigenbaum: “We stumbled upon.”

Knuth: Right, yeah.

Feigenbaum: What happened?

Knuth: I had these numbers now. They were numbers like 5/77. I wrote these numbers down, and they
just looked like really crazy numbers. Then one day I decided to take the series, it’s a power series, X +
(5/77)X^2, or something like this. You have a sequence of powers of X with weird rational numbers
attached to each power. I realized that the mathematics that we were developing actually would simplify
if we weren’t using those numbers, but instead forming the exponential of these power series. You take
e^(f(X)) instead of f(X). I used Macsyma to calculate e^(f(X)), and it has rational coefficients, too. But one of
those rational coefficients was something like 23023. Or 17017, or something. It wasn’t just a random
number. You play with numbers [and] you know that 23023 is 23 x 7 x 11 x 13, because 7 x 11 x 13 is
1,001, and that happens to involve a lot of small prime numbers. So here’s this number with a lot of small
prime factors appearing. If we didn’t take the exponential, the numbers just looked crazy; they didn’t
have small prime factors, they didn’t have any nice mathematical redeeming features. But after I took
the exponential, all of a sudden the numbers that I was looking at looked like old friends. They were
something that, you know, there had to be a reason for it. God didn’t want these numbers just to be
there. There had to be some mathematical reason. You could say that’s “stumbling on” something. An
hour later I could see the pattern for all of the numbers, because now it was all small prime factors and I
could guess what the next one in the series is. Before having this combination, it was impossible. The
funny story is that I made this discovery in the middle of the night, about 4:00. I could explain why it’s
5/77ths and everything, and I could draw the diagram of the transition of every graph as it goes through
the beginning of the Big Bang. Bill Gates was visiting Stanford the next day, and they were trying to
impress him so that he would donate some money to build a new house for the computer science
department. They asked me to meet with him in the morning. I’m not sure if we had ever met before. I
know he says now that he had studied my books rather hard when he was at Harvard. I was all filled with
the enthusiasm about having seen a pattern in these numbers. I drew on the blackboard the branching
structure of the first moments of the Big Bang, and I put my rational numbers on there, and I put my
formula involving 6N factorial, or anyway all the pattern that I had noticed. Later on, Carolyn Tajnai, who
was walking him around between the buildings, said to me, “Don, can you recreate for me what you put
on the blackboard that day? Because Bill was really enthusiastic about this.” The next day he wrote a
check to Stanford, for I don’t know, $10 million, or something like this. I always use this story if somebody
says, “Who says theoretical computer science has no value?”
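
[Editorial illustration, not part of the interview.] Two small computations mentioned above can be sketched with exact rational arithmetic: forming the exponential of a power series, and factoring a number such as 23023 into primes. The helper names `exp_series` and `prime_factors` are hypothetical, and the series used in the usage note is only the "something like this" example from the conversation, not the actual series from the paper.

```python
from fractions import Fraction


def exp_series(f, terms):
    """Coefficients of exp(f(X)) up to X^(terms - 1).

    f is a list of coefficients [f0, f1, f2, ...] with f0 == 0.
    If g = exp(f) then g' = f' * g, which gives the recurrence
    n * g[n] = sum over k = 1..n of k * f[k] * g[n - k].
    """
    assert f[0] == 0, "constant term must be zero"
    g = [Fraction(1)] + [Fraction(0)] * (terms - 1)
    for n in range(1, terms):
        total = Fraction(0)
        for k in range(1, min(n, len(f) - 1) + 1):
            total += k * f[k] * g[n - k]
        g[n] = total / n
    return g


def prime_factors(n):
    """Prime factorization of n >= 2 by trial division, smallest first."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors
```

As a sanity check, `exp_series([0, 1], 5)` reproduces the familiar coefficients 1/n! of e^X, and `prime_factors(23023)` returns `[7, 11, 13, 23]`, matching the observation that 7 x 11 x 13 is 1,001. Feeding in the illustrative series as `exp_series([0, 1, Fraction(5, 77)], 6)` shows how rational coefficients propagate, though those values carry no meaning beyond the illustration.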

Feigenbaum: Great story. Here’s a question which kind of wraps up the Stanford… Well, you were
going to talk about your retirement, and then I was going to ask you about volumes five through seven.


Knuth: Okay.

Feigenbaum: Say something about the retirement.

Knuth: I was afraid you’d ask me about volumes five through seven, so I’ll talk about retirement. That’s
really when we’re getting into phase 3 of Stanford. Phase 4 will be retirement, then, because phase 2
was certainly intensive software work, and then phase 3 was back into intensive analysis for “Art of...”
The next phase is going into retirement. As I said, I had my sabbatical year in Boston in ’86. That was to
be the climax and finishing of the TeX project. Then I come back from sabbatical and get back to speed
and so on. One of the things that happens when you come back from sabbatical is people will say to you,
“Oh, aren’t you glad to be home?” And, you know, I say, “Yeah, it’s nice”, and all this. But I found that it
wasn’t. I wasn’t really as happy as I let on. I mean, I was certainly enjoying this research that I was
doing, but I wasn’t making any progress at all on Volume 4. I’m doing this work, giant component, Big
Bang type of explorations, and I’m learning all of this thing. But at the end of the year, how much more
had been done? I’ve still got this 11 feet of preprints stacked up in my closet that I haven’t touched,
because I had to put that all on hold for the TeX project. I figured the thing that I’m going to be able to do
best for the world is going to be to finish “The Art of Computer Programming”. I can do cutting edge
research, but maybe I shouldn’t be just enjoying myself on this, but I should be getting stuff out the door
that’s going to be “The Art of Computer Programming,” which I had promised to write in 1962, and here it
is late 1980s. After two years, I started thinking about it during the summer of ’88, as to what I should be
doing with my life. At this point, see, I’m 50 years old. I was born in ’38, this is 1988. I decided that I
didn’t need money anymore. I didn’t need my Stanford salary. I had enough money in the bank. I didn’t
get any money from the TeX project -- that’s in the public domain -- but “The Art of Computer
Programming,” you know, [is] selling by the thousands every year all the time. So I can afford to do
whatever I want with my life. I don’t have to be employed. I can do what’s the best way to use whatever
gifts I have to put out. I decided that I really wanted to do “The Art of Computer Programming,” and get
this done. The only way to do it was to stop being a professor full time. I really had to be a writer full
time. I wrote a memo to Nils Nilsson, who is our chair, saying, “Nils, I’ve decided that after two more
years I would like to go on leave of absence and never come back.” I would love to continue an affiliation
with Stanford whereby I would be giving occasional lectures, but I think the thing I really want to do is
write “The Art of Computer Programming.” I don’t like the idea of a professor who just spends all his time
writing books and getting paid for being a professor, so I shouldn’t call myself any more a fulltime
professor. I shouldn’t be drawing my Stanford salary. I’m going to be doing only the books, except for
occasional things. I’d like to be five percent time to keep participating in things, but I’ll never get my
books done unless I can really put fulltime into that. If I’m only going to make one day’s worth of progress
out of every 365, it’s going to take an estimated several centuries to finish at this rate. I wrote this letter to
Nils, and then we had meetings with the Dean, Gibbons, and the provost, who you know is Jim Rosse.
They thought maybe they could find a donor to Stanford who’d like to endow a professorship for
somebody who writes “The Art of Computer Programming.” They didn’t find that, but they did say that we
could have an amicable way to achieve this. It looked like in a year-and-a-half we’d be able to find
someone who would take over my role as leading the analysis of algorithms activities in the department.
Unfortunately, that never happened. We never found a senior person to take over what I was doing. But
as of January 1990 I became on leave of absence. They allowed the leave of absence to continue until I
was 55 years old and I could officially retire with a pension. I didn’t get any buy-out or anything like this,
like people are talking about now, but I do get some of my health insurance and so on through Stanford.
This is the kind of retirement that I worked out. I was able to also create my own title. I’m “Professor
Emeritus of The Art of Computer Programming”, with a capital “T” in “The Art of Computer Programming.”
I love that title. So starting at age 55, which is officially the beginning of ’93, I was Professor Emeritus.
The arrangement was that I give occasional lectures, which I’m now giving about three a year. We were
hoping for more, like six a year, but it’s on the average three because I’m out of town a lot. I have an
office and a secretary, and I’m on campus a lot. But I don’t have to raise funds for research projects.
Unfortunately I don’t have direct work with graduate students like I did before, and I don’t have regular
teaching. I enjoyed those things very much, and I think that the students that I had, I’m proud of every
one of them. The thing is, “The Art of Computer Programming” is something I have to do my best at.

Feigenbaum: Let me ask you a little easier question than Volume 5 through 7. Where are you in
Volume 4?

Knuth: So, Volume 4: I’m on page 12 of Volume 4 right now, although I’ve already written 400 pages
that come starting after about page 50. I’ve got a lot of it under my belt now, but you know as a computer
programmer you don’t write the initialization first. I’m at the point now I’m ready to write the initialization
to Volume 4.

Feigenbaum: But this volume must make you particularly nervous, because it’s on combinatorial
algorithms.

Knuth: It has [been] subject to combinatorial explosion, so it will be Volume 4A, 4B, and 4C -- possibly 4D.
I’m sure it won’t get up to 4Z, but there will be sub-volumes to Volume 4 because of the huge growth in
combinatorial algorithms. By the way, while it’s in my mind, let me, because it related to a question you
asked me last week and I didn’t think of a response at the time. It was something about being an
engineer versus being a scientist, or something like this. The way I tended to phrase that is the relation
between theory and practice in my life. I always thought that the best way to sum up my professional
work is that it has been a mixture of theory and practice, almost equally. The theory that I do gives me
the vocabulary and the ways to do practical things that can make giant steps instead of small steps when
I’m doing a practical problem, because of the theory that I know. The practice that I do makes me able to
consider better, more robust theories, theories that are richer than if they’re just purely inspired by other
theories. There’s this basic symbiotic relationship between those things that’s probably central to the
whole thing. At least four times in my life when I was asked to give kind of a philosophical talk about the
way I look at my professional work, the title was always “Theory and Practice.” I think the first time I did this
you were chair of the department, and I had just gotten the “Fletcher Jones” professorship. That was the
title, and I was asked to speak for five minutes on my life as I get this endowed chair at Stanford. My title
was “Theory and Practice.” I remember that in that talk I gave a kind of a spoof. I started out and I said,
“Well, I’ve written so many pages of books, and I’ve published so many papers, and I’ve had so many
students.” I gave a lot of the numerical statistics, and I said, “And that just about sums me up. So now
that I’ve got this chair, I’m going to follow the advice of the Fonz and ‘Sit on it.’” I remember I had made a
pretty compelling case for why I was tired and ready to ‘sit on it.’ I scared you to the point where you
were really sweating blood there. Then, of course, in the next sentence I said, “And of course, you know
that this is impossible, and that I couldn’t possibly do this.” And “Whew!” I could see you, you know,
doing this. [Showing relief] That was the first time I gave a talk about “Theory and Practice.” I gave another
one; the next one was actually very interesting. It was given in the Theater of Epidaurus in Greece, the
best preserved ancient theater. It was the keynote speech for the European Association for Theoretical
Computer Science. They had their annual meeting in Greece that year. Greece is the place for
philosophy, and also the words “theory” and “practice” both come from Greek words. So naturally I
decided I would speak on “Theory and Practice” in Greece, and I could speak in this temple of Greek
culture giving this talk. Melina Mercouri was the Greek Minister of Culture, and she introduced me in the
speech. It was a great moment of my life to summarize the roles, the tradeoffs between the two. At that
time I was working on TeX; it was early ’80s. My main message to the theorists is, “Your life is only half
there unless you also get nurtured by practical work.” And, I said, “Software is hard.” My experience with
TeX taught me to have much more admiration for the colleagues that are devoting most of their life to
software than I had previously done, because I didn’t realize how much more bandwidth of my brain was
being taken up by that work than it was when I was doing just theoretical work.

Feigenbaum: While we have just a moment left, if that Greek lecture was written up, do you know where
it is, so the audience could go look it up?

Knuth: I have a book called, “Selected Papers on Computer Science.” George Forsythe told me early
on when I came to Stanford, he said, “Don, sometimes in your life you’re going to be speaking not to
professionals, but you’re going to be talking to a much more general audience. It’s always scary to do
that, because you don’t understand… It’s easier to give a speech to somebody that’s exactly like you
than to somebody who has a different way of thinking.” When I wrote for Scientific American or
something -- every once in a while I would write something that was not addressed to somebody in my
own field. This book, 200 pages or something, contains all of the papers that I wrote in this way. Three
or four versions, takes on “Theory and Practice,” including the Fonz Winkler one, are reprinted
in that volume. Thanks for asking.

Feigenbaum: Okay.

Feigenbaum: You’ve reviewed for us what you might call the chronologically-oriented themes of your
career. Pre-Stanford, first Stanford period, and so on, until your retirement -- your pseudo-retirement, I
should say. Cutting through all these are other kinds of themes that touch on many different points in
the chronological explanation of your life. In my field, I really call these the heuristics of leading a career.
In fact, I told you once that I felt that one of the bad decisions I made in my career was leaving what was
then Carnegie Tech a year too early, before I learned all I had to learn from Herb Simon. I don’t mean
learning the material. Not the content, but the heuristics of leading a life. Could you talk a little bit about
that? If a Ph.D. program is kind of a research training apprenticeship where the students learn these
heuristics, what are they learning from you?

Knuth: I have some slants that I would tend to emphasize. Other professors would emphasize other
slants. I don’t have a monopoly on wisdom of this kind. The kind of things that I would tend to
emphasize are not just doing trendy stuff. In fact, I’d probably overemphasize that. If something is really
popular, I tend to think maybe I should back off. I tell myself and my students to really go with your own
aesthetics, what you think is important. Not what you think other people think you want to do, but what
you really want to do yourself. That’s sort of been a guiding heuristic all the way through. When I was
working on typography, it wasn’t fashionable for a computer science professor to do typography, but I
thought it was important and a beautiful subject. So what? In fact, other people told me that they’re so
glad that I put a few years into it because that made it academically respectable and now they
could work on it themselves. They sort of were afraid to do it themselves. But all the way through, when
my books came out, they weren’t usually copies of any other books. They always were something that
hadn’t been fashionable to do, but they corresponded to my own perception of what ought to be done.
Also, your word “heuristics” triggers another thing in my mind, and that is George Polya. Polya wrote this
great book called, “Heuristics and Plausible Reasoning” [properly, “Mathematics and Plausible Reasoning”]. Of course, I know heuristics is a great word for
you because you had the “Heuristic Programming Project” and all these things. Heuristic, meaning
discovery. Polya also inspired me. I had the great fortune to get to know him because he was a Stanford
professor. He came to my house many times and I spent a lot of time with him. One of the things that
cuts across also many years is he had an approach to teaching that he called, “Let Us Teach Guessing.”
When he was teaching his classes, he’s not saying, “Memorize this, guys.” He’s saying, “Here’s a
problem. How do you solve it?”, with the idea that the students are going to make mistakes and then
they’re going to learn how to recover from mistakes, as well as making guesses. These are important
heuristics for my life, both in the teaching aspect and in the research aspect. Let me talk about the
teaching aspect first. Polya gave a demonstration lecture that was filmed at Stanford, and I saw it when I
was still at Caltech. I saw this. He presents the students with a problem, with something like, “You have
a piece of cheese and you cut it with four strokes. How many pieces are you going to get?” Something
like this. Then he has the students try to analyze this. He started out by looking at simpler problems,
where it’s on a plane instead of in three dimensions, and you only take two cuts; things like this. At the
end of the hour he has all the students understanding not only the solution to this problem, but also
having taken it apart and discovered the solution themselves. That’s what goes into their brain, because
then they can solve other problems later on. I adopted this as a model for my own classes, already at
Caltech. Whenever I taught a class that had a decent textbook, I would devote the class time to problem
solving as a group, instead of reading to them or lecturing to them about what’s in the book. I would
assume that they could read the book on their own. They come to class, we do things that aren’t in the
book. We take a problem that’s similar to ones in the book and we try to work on it, almost like a
language class. I go down the row and, “It’s your turn, your turn, your turn.” People soon learned that if
they make a mistake, we all do, and we recover. I’d give a rule that nobody’s allowed to speak more than
twice in the hour, so that everybody participates. My teaching assistants would take notes, so that the
students could concentrate on what was going on instead of having to worry about having their notes
right so they couldn’t listen fully. The teaching assistant’s notes would then be typed up later on by
Phyllis and distributed to everyone. So we could record these sessions in the class as to things that
aren’t in the book, and how to recover from errors. I kept that style of teaching all the way through until I
retired. That was a great source of pleasure. I could use it except in the cases where there was no
textbook available. In my own research, this idea of guessing is also very important. When I read a
technical paper, I don’t turn the page until I try to guess what’s on the next page. Or, [say] the guy writing
the paper is going to state a problem. Before I look any further, I’ll say, “Now how would I solve this
problem?” I fail almost always. But I’m ready now to read the solution to the problem, so then I turn the
page and I see. But by this time I’m ready for what’s happening. When I work on “The Art of Computer
Programming,” over a period of 40 years I’ve gathered together dozens of papers on the same subject. I
haven’t read any of them yet except to know that they belong to this subject. Then I read those papers,
the first two papers extremely slowly with this “Don’t turn the page until you’ve tried to solve the problem
yourself and do it yourself.” With this method, I can then cover dozens of papers. The last ones, I’m
ready for. I just know what to look at that’s a little different than what I’ve already learned how to absorb.
That’s been a key heuristic in my own research, based on guessing.

Feigenbaum: That’s a really interesting story. In fact, my little footnote to that is that I called my own
project, the “Heuristic Programming Project” because I didn’t want to infringe on John McCarthy’s term,
“Artificial Intelligence.” Stanford Artificial Intelligence Laboratory. Everyone knew what programming
was, but no one knew what heuristics were. When they asked me, I would just quote Polya. I’d say,
“Polya says heuristic is the art of good guessing.”

Knuth: Yeah. Okay, very good.

Feigenbaum: Anyway, I wanted to ask you a little bit about the process of gathering up the literature and
writing it up in “The Art of Computer Programming” that you’ve been doing. To go back to Artificial
Intelligence, part of the program is a problem solver, but then there’s the part we don’t understand very
well, which is the problem generator. I’ve always thought of “The Art of Computer Programming” as
some kind of a problem generator for you. In fact, I’ve been jealously thinking of that. As you begin to
put things together, you see the holes.

Knuth: Yeah. The main perk that I get from working on “The Art of Computer Programming” is that I get
first crack at a lot of really natural research problems. Because I’m the only person so far who’s read a
paper by two authors who didn’t know of each other’s existence. I can see where this guy’s ideas fit in
with this guy’s ideas. They’re both working on the same problem, but they don’t realize it because they
have different vocabularies, very often. Artificial intelligence people, you know, have a _________
algorithm or something like this. The electrical engineers are working on a problem with a different
vocabulary, a different slant on it, but they’re thinking of something else. The people in operations
research are thinking of another way. Each person will take the problem and solve it in one respect.
Person B will solve a similar problem in another respect. I get to be the one who solves problem A in
respect to B and vice versa. Often these problems are natural and unify the subject. They tie the
problems in with even more parts of the subject, which make more of a pattern instead of having page 1,
page 2, page 3. Somehow it’s a network instead of a branching structure. Then there are also the other
problems that I can’t solve. Those make good research problems. I usually know somebody in the world
that I can suggest it to, and then science advances that way. But I get first crack at it. If it’s an easy one,
then I have a chance. It’s fun to do this. The danger is I have to know when to stop. If I couldn’t go
on, if I had to solve a problem before I turned to the next problem, I’d never get to the end of The Art.

Feigenbaum: Another way to put it is you can’t plug every hole.

Knuth: That’s right. Very good. It’s a lot of work writing “The Art of Computer Programming,” but the big
benefit is this chance to see patterns that other people didn’t have the opportunity to see, because they
just didn’t spend 40 years gathering the material the way I did.

Feigenbaum: I wanted to ask you, again it’s a heuristics question, but it has to do with another
qualitative aspect of picking problems and finding their solutions, which is the aesthetic that you
mentioned. You mentioned that you had an aesthetic, that other people have aesthetics. You’ve
mentioned to me in the past some various criteria that you use in your aesthetic. Do you want to mention
any of those?

Knuth: Okay. For example, when I’m writing a computer program, I could have different aesthetics. I
could say that the program should be the fastest possible, right? Or it could be the one that uses the
smallest amount of memory. Or the one that takes up the smallest number of keystrokes to type. Or the
one that’s easiest to explain to a student. Or the one that’s hardest to explain to a student. There are
lots of different measures that you can apply to a program. Or to anything; to a piece of literature, music,
whatever. You can say, “My goal is to make this best for teenagers” or whatever it is. Somehow you
have an audience in mind, or some criterion. All artists are trying to optimize some constraints or other
that you have in your mind, as to what you consider most beautiful or most important in this particular
piece of work. In the combinatorial work I’m doing now in volume 4, the main goal tends to be speed,
because we have these problems that involve zillions of cases. Every time we can save 100
nanoseconds, if we’re doing it a billion times, that’s an hour [strictly, 100 seconds]. We look for things like that because we
know that everything we do is going to have a large payoff in that way. But other programs, I just want it
to be elegant in a way that hangs together; somebody can read it and smile. There are so many different
criteria of it. But in all cases, the thing that turns me on is the beauty of it and the style that goes with it.
Dijkstra had a great remark about teaching programming. I find style important in programming. Like the
style in IT, in Perlis’ program, was not great. The program worked, but it was sort of bumpy. Another
program I read when I was in my first year of programming was the SOAP II assembler by Stan Poley at
IBM. It was a symphony. It was smooth. Every line of code did two things. It was like seeing a grand
master playing chess. That’s the first time I got a turn-on saying, “You can write a beautiful program.”
I’ve mentioned that several times, because it did have an important effect on my life. I’m worried about
the present state of programming. Programmers now are supposed to mostly just use libraries.
Programmers aren’t allowed to do their own thing from scratch anymore. They’re supposed to have
reusable code that somebody else has written. There’s a bunch of things on the menu and you choose
from these when you put them together. Where’s the fun in that? Where’s the beauty of that? It’s very
hard, [but] we have to figure out a way that we can make programming interesting for the next generation
of programmers, that it’s not going to be just a matter of reading a manual and plugging in the parameters
in the right order to get stuff. I’ve got to say something else, too, that pops into my mind. I saw a review
a year or so ago in Computing Reviews. Someone had written a book, something about tricks of the
trade, or something like this. It was somebody telling how to use machines efficiently by using some of
the less well-known instructions in the machine. The reviewer of the book hated this. He said, “If I ever
caught any of my programmers using anything in this book, I’d fire them.” Of course I immediately went
to the library and got out the book, because this was the book for me. My attitude is, if there’s a method
that works well and it’s not commonly known to students, let’s not stop using it. Let’s teach the students
how to use this so that it’s understandable and it can be used in the next generation. But this guy, he
was saying, “No, no. We already understand all the possible good ways to write programs. I’m not going
to let anybody write for me using anything subtle.”

Feigenbaum: Yeah, that was the kind of thing that I was telling you. Bob Bemer would come down
when I was a graduate student and tell us about these tricks, like unintended side effects of instructions.
“The designer never intended this but you can do this with it.”

Knuth: Of course, I told you about when I’m writing RUNCIBLE and we were saving one line of code
here because we can use one constant for two different purposes. [In the 650] you could store a data
address and you could store an instruction address. You could actually put one constant in there, and you
could store it with one thing and it would zero out one field and store another one. So we could save;
instead of having two constants we could have one, all kinds of stuff like that. That is terrible
programming. I don’t recommend it at all. If you have a machine that has only 2000 words of memory,
okay. But I’m not recommending tricks just because they’re tricks. Although if your aesthetic is to cram
something in small, like you’re writing something for Game Boy or something, and you can put ten extra
features in there without increasing the size of the cartridge, okay, that’s fine. But [for] most of the things,
it’s much more important to have stuff that is not tricky to the point of breaking whenever you make a
slight change to something else. With literate programming you can document this stuff very carefully, to
warn people against it, but still it’s not great [or] to be recommended. But the fact is, a computer doesn’t
slow down when it gets to a part of the program that was harder to understand. The computer doesn’t
say, “Oh, I don’t understand what I’m doing here,” and then go slower like a human being does. So
there’s no reason for us not to put subtle tricks in our programs -- unless we can’t document them well enough
for the person who’s going to have to modify the program to be able to fathom them.

Feigenbaum: Don, I wanted to ask you about another word that you have used, and lots of scientists
use the word, difficult to define, but the word is “taste.” Good taste in problems, good taste in finding
problems, good taste in solving problems. Do you want to say anything about good taste?

Knuth: Well, there’s no accounting for taste. I was going to mention how Dijkstra was talking about
style. That is, you want to teach your students that they should have taste, but you don’t want to tell them
to have the same taste as you. You try to give them the idea of taste. You can imagine a music
composer. If Beethoven or Stravinsky or somebody would take on students, would they be a great
teacher if they told them to compose exactly like they did? Or if they said, “Here’s an example of a strong
style. Now develop your own.” That’s what you really want to do. My feeling is it’s important to have
taste driving yourself and to try to refine your taste, but you can’t impose it on somebody else. There’s no
absolute way for me to know that what I believe is beautiful is going to appeal to somebody else. Still, if I
am trying to define beauty by what other people think is beautiful rather than what I think is beautiful, I think I’m making a mistake.
That’s why I was talking about trendy stuff a minute ago.

Feigenbaum: The other issue that you’ve talked about in the past is exercising some control of your
problem selection by knowing what it is you don’t know. Any words of wisdom about that?

Knuth: Well, the best way to learn what you don’t know is to try to program it, as we were saying. Well,
not exactly. Words of wisdom? I don’t know. I often learn what I don’t know by trying to program it for a
computer. But also I found, like in trying to translate something written in another language, if I try to put
it in my own words, then I realize that I don’t know. If I read somebody else’s translation, I don’t get as
much out of it as if I take a text and try to put it in my own words. This exercise of being a teacher, or in
some way putting yourself into it, is the best way for me to discover what I don’t understand. You can
think you understand something until you try to program it, or do some other thing where you are really
not just repeating something but you’re actually processing it.

Feigenbaum: When you discover some of these holes that need to be plugged, some of them are easy
to solve, and some of them you just don’t know what the answer is.

Knuth: Yeah. I remember the first time in my life when I spent more than 10 hours on a problem and
actually got the answer. When you start out in life, when you start doing something that you don’t know,
you think of a question and then you answer it. First then you discover that oh, the Greeks already had
done that. Then you learn a few more things, and you ask some more questions, and you say oh yeah,
this was done in the 17th century, the 18th century, 19th century. Finally you get to something that, for the
first time in your life, you discovered something that, as far as you know, nobody had discovered before.
Then you’re asking questions and you don’t get anywhere with them. You have to go on to the next
question. I do remember there was a time, when all of a sudden… Up until that day, if I couldn’t get it in
the first hour, I didn’t get it even if I spent a week on it. But here was a time when I had actually worked
on something more than 10 hours and I did get the solution. That was the big time in my life to realize
that I could go that far. What I do now, though, is I try to give myself an hour on these problems, and then
I say, “Well no, I’ll have to pass that on to somebody else,” send a letter to somebody who might do it.
Unless I think I’m almost there, if I think “Well, maybe in another five minutes I’ll get the answer.” Then
another hour later, if I still think I’m five minutes from the answer, I keep going at it. Sometimes I’m
trapped in this mode for a week still. But not too often anymore. Just in the past week I sent off two
problems to other people that I thought would be worth their attention, that they might like.

Feigenbaum: I’d like to switch to the personal Don Knuth. We at Stanford know the personal Don
Knuth. The people watching this video or the scholars of 50 years from now may know the professional
and mathematical Knuth, but they won’t have the privilege of knowing the personal Don Knuth. So I
wanted to ask you a few questions that just relate to the Don that we know and love. You say in your
various biographies, you always end by quoting or saying to the reader that your avocation is music, and
if I had to write out my biography like that, I would also. I would also say my avocation is music. I get as
much of a thrill every week by going to the Stanford choruses as anything else that happens in the week.
But your musical background is way more extensive than mine. Could you tell us something about the
role of music in your life, and if there is any connection with your work? What the role of music in
influencing your work has been?

Knuth: Okay. Well it’s certainly one of my greatest loves, is music. We were just talking a minute ago
about taste. I don’t like all kinds of music. Like everybody, I have certain music that really touches me
deeply, and other kinds that I’m not really enthused about. For example, I spent an hour-and-a-half last
night playing through the score of South Pacific. More than half of the songs in there I find really
beautiful. On the other hand, if I were a professor of music, I would have to find a way to distance myself
from opera, because I’ve given opera a good shot many times, and I’ve seen excellent performances, but
it has never turned me on. So everybody has their taste. My own musical tastes are fairly eclectic. I
love jazz, I love to play things by Dave Brubeck, but other kinds of jazz don’t seem to work very well for
me either. I like Beatles music. I don’t get too thrilled by some kinds of hip-hop and so on. Every
generation also has their own favorite kind of music. It must be partly because of the records that my
father played when I was growing up. Things like Brahms’ Symphonies are things that are deeply
satisfying to me now too, by their familiarity, by what I learned. My father was a musician. He was a
church organist, and a pretty good one. He played at the Chicago World’s Fair in the ’30s before he got
married. I started piano lessons probably when I was five years old. Throughout high school I was the
accompanist for the chorus, for the choir, and I played in the band. I wanted to play bassoon, but that
was taken, so I played tuba -- the sousaphone. Those were the two instruments that you didn’t have to
own yourself. The school owned the tuba and the bassoons, and our family was poor. We didn’t have
money to buy instruments. My dad earned enough money to buy a piano by teaching piano lessons
himself. I did then get into the band as well as the keyboard music. I took a year of organ lessons when
I was in high school, from my piano teacher. I almost became a music major, as I mentioned, in college.
I went into physics but if I had gone to _________ University, I would’ve been a music major there. I
started looking at arranging. I made arrangements for our school band. When I got into college I wrote
the music for a five-minute skit that our fraternity put on. It was called “Nebbishland.” That was when
nebbishes were popular in greeting cards. I don’t know if anybody in the future will know what nebbishes
were, but one of the lines in there was “We’re all on the verge of insanity.” It might bring back some
memories anyway. Nebbishes. “I’m a nebbish and a nebbish isn’t snobbish.” I’ll probably put this great
musical piece of mine into the final volume of my collected papers, which is going to be called “Selected
Papers on Fun and Games.” There’ll be a little bit of music in there that I did for fun over the years.
During the ’60s, at the church where I was going, I was a member of the choir. I had mentioned to the
choir director and organist that I had taken a year of organ lessons when I was younger. He knew that I
could do some keyboard skills. If we needed a harpsichord accompaniment or something, I could help
out and he could be directing, or I could go to the console while he’s directing and I can be playing. One
Saturday I got a phone call from his wife saying, “My husband has just come down with a detached retina
in his eye.” In those days, the only way to cure this was for him to sit still for six months with a pack
holding his head steady. She said, “Don, did you say that you knew a little bit about the organ? Can you
play on Sunday and be our temporary organist?” That’s what happened. For six months I was the
organist at our church in Pasadena. Fortunately Pasadena was the home of some of America’s best
organists. There was a famous teacher, Clarence Mader, and five of his students are still located in
the Pasadena area. If you look at the National Recitalists of the American Guild of Organists, five of them
are from Pasadena. There are others from the east and all around, but we had a very good concentration
of this. So I joined the American Guild of Organists and got to see some very excellent musicians. At
that time I learned something about the literature of the organ. I thought hey, it would be cool in the future
if I sort of was a college professor with an interest in organ. If I had 40 more years to look at this music,
there were some neat pieces of organ music that are so good I could never get tired of them and I could learn to
play. When I had my year in Princeton between Caltech and Stanford, I took organ at the Westminster
Choir College. I had a teacher there and I had some other classes there at the college as part of my
year. My teacher, Mary Krimmel, taught me a lot about how to perform. Also, I had made some friends in
Pasadena that had an organ in their own house, and that seemed kind of interesting to me. My father
also had an organ. It was an electronic organ but he had an organ in our house in Milwaukee. When Jill
and I were planning our dream house to be built on the Stanford campus, we decided that we would have
two special rooms in the house. One was my room where I would have a music room and have room for
an organ, and one was her art room, a studio, where she’d have good lighting for working on her art
projects. We couldn’t afford to put in an organ at the beginning, but the architect made sure that there
would be enough bracing in the floor to handle several tons of weight and there was a nice 16-foot ceiling
so that we would have room for a good organ. I spent the next few years thinking about what kind of an
organ would be good to have in the home. Peter Naur in Copenhagen introduced me to five great organ
builders in Denmark. The year that I spent in Norway, I visited him also for a week and talked to some of
the world’s greatest organ makers that he could introduce me to in Denmark. I found out, though, that I
couldn’t buy a Danish organ with any reasonable economic certainty. Because the way it works in
Denmark was that they don’t give you a fixed price on it. The Danish labor contracts are tied to the rate
of inflation. I would have to give them a blank check and say, well, whatever it costs, I would have to pay.
What happened then is that I also talked to American organ builders. I found a very fine one whose shop
is near UCLA, and we hit it off very well. I started going around to all the organs around the Bay Area
and all the Stanford organs and listening to each pipe and each note and making notes, and then worked
with the builder, Pete Seeker [ph?], down in the Los Angeles area. It turned out then that they built an
organ for my house. It’s a nice company that builds about four organs a year. They made an organ for

CHM Ref: X3926.2007 © 2007 Computer History Museum Page 64 of 73


Oral History of Donald Knuth

my house, and I still haven’t seen another house organ that I would rather have than it. It’s designed to
be enjoyed by the person playing it, rather than for the audience. But it really has a lot of varieties of
tone.

Feigenbaum: Why don't you continue with your discussion of the organ?

Knuth: My main hobby had turned out to be then the focus on organs, and this had lots of interesting
little side stories. I'll give you a few. In the first place, I'm making the arrangements for this organ in my
home just at the time when I'm finishing volume three. I have a few jokes in the indexes to my books.
Some of them haven't been discovered yet. Like one of them in the TeX [book] that people found.
There's one that I think, if you look under “ten”… No, no, I'm sorry. If you look under, oh what's her
name? She was the star of the movie "Ten." Oh, goodness, you know the movie I'm talking about?

Feigenbaum: Yeah, sure.

Knuth: Anyway, she's this beautiful woman, and I put her name in the index of the book. If you look
there it will just tell everywhere the number ten appears in the textbook; you can find it indexed that way.
In volume four I have a place in the index where it says "pun resisted." It refers you to a page, and you're
supposed to figure out what pun that I could have made on that page that doesn't appear. I have fun with
my indexes. I try to make them useful, but it takes me six weeks to write them so I have to do something
to amuse myself during that six weeks. In volume three if you look under "royalties, use of" you get to a
page that has a picture of organ pipes on it, because this is what allowed me to get an organ in my
house. In those days it cost $35,000. Other people on the block, their house cost $35,000 in those days.
You can't believe it now, but that was true. That's one little story. That was actually put in before the
organ was built, but I had to sign a contract some years in advance. Through the years, then, the fact
that I can play organ has given me intro to lots of the great organs of the world. I don't have to be a great
organist, I just have to be pretty good for a computer scientist. Then the leading computer scientist, my
host wherever I am, Mexico City or Paris or whatever, will know somebody who knows an organist. Then
I get introduced, and they'll take me over there, and I get to play on the organ. I've played on the world's
best organs. I played on the largest organ too. I got to a point where I had sort of given plenty of
lectures, and I couldn't accept any invitations to travel to give a lecture. But a guy in Philadelphia wrote to
me, and he said, "Don, we really want you to lecture at Drexel University." He says, "Now about organs."
He said, "If you come I'll let you play on the Wanamaker organ, which was the largest musical
instrument in the world. Then we can go to Eaton Hall, and then we can go to Benjamin Franklin…this
old American organ", and so on. He arranged four great organs for me to play in the Philadelphia area,
so naturally I went to speak at Drexel University.

Feigenbaum: “It's a deal”, yeah.

Knuth: The last time I was in Paris I got to play on a really great unique organ. I had two hours to play
on it. I went to Israel, I could play on the organ in the Mormon Center, wherever. This fall I'll be playing in
Bordeaux. I was in Zurich -- organ -- last year, or a year and a half ago. I don't have to be a great
organist in order to have these opportunities. I just have to be a computer scientist who is not too bad. I
don't play in public very much. The one exception, really, was at the University of Waterloo about five
years ago. They have an organ professor there, and he and I put on a concert of organ duets -- music
written for two performers at one or two organs. I practiced with him several times for this. That was the
highlight of my organ playing, where we put this on. The music was, I guess, broadcast a couple of times
on Canadian radio as well. I got to work with a really fine organist. On my web page there's a reference
to the program that we played, some very interesting music.

Feigenbaum: Yeah, I was going to say -- if they broadcast it, you might have tapes, and you can put
them on the website.

Knuth: Yeah, I think I might have a tape in my collection somewhere. Good idea to try to put it on the
Internet.

Feigenbaum: Don, let me move on to something which is important in your life and which we all know
about. The world didn't really know much about it until you published the book "3:16". Namely, your
religious belief, and your studies of religion and religious thought. Do you want to say anything about
that?

Knuth: Well, yeah. This is the Computer History Museum, but it is part of my…

Feigenbaum: We're talking about you, though.

Knuth: That's right. The thing is, I think computer science is wonderful, but it's not everything.
Throughout my life I've been in a very loving religious community. My father -- also my mother, it wasn’t
her career, but she sang in the church choir for 60 years -- but my father dedicated his life to being an
educator in the Lutheran school systems. I was raised in Lutheran schools, and Lutheran high school,
before I went to college. I come from a Midwestern Lutheran German background that has set the scene
for my life. This is something that I've gotten to appreciate, that Luther was a theologian who said you
don't have to close your mind. You keep questioning. You never know the answer. You don't just blindly
believe something. Also, he had ways of making it both intellectual and faith, as a combination. That's
part of my background. I had a lot of exposure to it as I'm young. I'm also a scientist. On Sundays I
would study with other people of our church on aspects of the Bible and other topics in religion. I got this
strange idea that maybe -- the Bible is a complicated subject -- maybe I could study the Bible the way a
scientist would do it, by using random sampling. Like a Gallup poll. You have a complicated thing and
you want to look at a small number of samples. You talk to 1000 people and you try to find out what the
sentiment is in the United States about something. I thought, well what if I did this with the Bible? This
was a complicated book. There's been tens of thousands of books written about the Bible. Instead of
somebody else telling me what parts of the Bible to look at, what if I just chose parts that were selected in
an unbiased way? I was doing this also with other things. I wrote a paper… About this same time we
had this conference which was a pilgrimage to Khwarizm [now Khiva, in Uzbekistan]. The word
algorithm means “from Khwarizm”; it's an Arabic word. We went to Khwarizm, and I gave a talk there
trying to analyze what is the difference between a mathematician and a computer scientist. We did this
by looking at page 100 of many books. We sampled the works of mathematicians to find out, what do
mathematicians do? From an AI point of view we tried to say what would we have to program in the
computer in order to reproduce page 100 of these books. It's this idea of sampling. I was using it also for
grading term papers. A student gives me a 50-page paper. I don't have time to read 40 of these papers
and get my grades done. I only have a week to do the grading. So I would look at parts of the term
paper. The student wouldn't know which parts I was going to look at, but I would look at parts of them
and I would use that to assess the quality of the whole. I got in trouble with this Master's thesis about the
CS bookkeeping problem, that we talked about last week. But anyway, I'm using sampling. So I said, let
me do it to the Bible too. I wanted to have a rule. I would study this with my friends at our church in
Menlo Park. We would, as a group, discuss randomly chosen parts of the Bible. The rule I decided on
was we were going to study Chapter 3, Verse 16 of every book of the Bible. Genesis 3:16, Exodus 3:16,
and it ends with Revelation 3:16. The reason is, that if any part of the Bible is known by its number, it's
John 3:16. There's a verse in the Bible that people put up on Super Bowl Sunday, and it's supposed to
be a capsule summary of the gospel. A lot of people knew it; 3:16 had a catchy phrase in people's
minds. I said, "Okay, we all know John 3:16. Nobody knows what's Genesis 3:16." Well it turned out
very interesting. It's about women's liberation. Exodus 3:16, and so on. I mentioned Peter Wegner last
week. Well, his wife Judith Wegner is a great scholar of women's issues in the Hebrew scriptures. She
couldn't believe it, but three of the verses that I chose are key verses in her own studies. It just turned out
a really strange coincidence. Isaiah 3:16 talks about women strutting. Anyway, it's very funny. It was
just serendipity, but in fact there's a nice joke about it. Somebody called it a “cross section” of the
scriptures because of the cross in Christian theology. I did this with my friends in Menlo Park. Actually I
had announced that we were going to meet the next Sunday and we were going to study Genesis 3:16,
Exodus 3:16. Then I came down with an attack of kidney stones, and I was in the Stanford Hospital. We
couldn't meet for our first session of this group. But I looked and I was in hospital room 316. So I said,
"Whoa. Well, God wants me to continue with this project."

Feigenbaum: A big sign.

Knuth: So we went through, and the class grew in interest all the way through. It could have been a real
dud [if] all these are really boring, but it didn't happen that way. It was sustained all the way through, and
people got inspired. Some of the women in the class were very good at calligraphy, and they would take
these verses and they would write them beautifully, and we'd put them up in front of us as we're studying
the things. I had this experience in the late '70s where sampling gave some insight into the complicated
thing called the Bible. All of a sudden I get this “Aha!” moment in the middle of the night after I met
Hermann Zapf and a whole bunch of other experts on letter forms. I'm working on the TeX project in the
early '80s and I said, "Boy, this class that we did on the 3:16 turned out to be really interesting for us. It
would be interesting to other people too. We could make it into a book. What if I asked Hermann if he
would do a few pages of a book for me like the women in the class had been doing?" He was sort of the
dean of all the calligraphers of the world. He's the god to the calligraphers and he knows all the
calligraphers everywhere. I didn't really dream of asking him to do it, but I asked him to do the cover. I
said, "Hermann, I got this strange idea for a book on 3:16. Can you make for me the most beautiful 3
that's ever been drawn in the history of mankind, and the most beautiful colon, and the most beautiful 1
and 6 to go with it, and make it for the cover of the book?" He sends me back a letter. He says, "Don this
is wonderful" and he also gives me sketches of a couple other verses that he looked at in his German
Bible. He says, "Don, I know the best calligraphers in every country of the world. We could invite them,
each one, to do a page." So he's on the bandwagon. I got him, and everybody loves him. To make a
long story short, as I'm on my sabbatical year in Boston I also am going to the Boston Public Library
several hours looking up what all the greatest writers about the Bible have said through the centuries
about Genesis 3:16, et cetera, et cetera. I made my own translations of these verses. This ties in with
your question a minute ago about how do you know what you don't understand? I'm thinking if I really
want to understand Genesis 3:16 I shouldn't read somebody else's translation, but I should look at the
Hebrew words and how those words have been used in other parts of Genesis and so on; how other
people have translated these. I copied out 60 different translations of each verse, so I knew that in my
own translation every mistake I made had also been made by at least ten other people. Then I made my
translation, and I sent out a letter -- Hermann and I signed it, both -- commissioning the best artists all over
the world to do these verses as a page in the book. While I'm in Boston these artworks started coming.
It's like getting a Christmas present every day, with these beautiful mailing tubes and all of that. I mean,
the calligraphers also write beautiful letters; "Dear Don" and things like this. Many stories involved with
individual pages later on. That's why I know I can say that the graphic artists are the best people in the
world, because Jill and I have met most of these people in subsequent years. We didn't know very many
of them, of course, in the beginning. Then I had to write the actual text to go with it. I could go to Harvard
Theological Library, and the Boston Public [Library], and I spent a few days at Yale Divinity School, and
the Graduate Theological Union at Berkeley has a great library. Here in Menlo Park we've got an
excellent library in St. Patrick's Seminary. And all the theological literature is well indexed, so that I can,
like Jonah 3:16, you can see what articles in the theological literature have been written that refer to
Jonah 3:16. So I'm not just having a cross-section of the Bible, but all the secondary literature about the
Bible. There's all these tens of thousands of books. I can just look at a few parts of them that are
relevant to this thing, and I can crack open books that I would never see before. For example, John
Calvin writes 90 volumes about theology. But I'm a Lutheran. Why should I ever read any of these? But,
no. Now I look at a few pages of John Calvin. He wrote about Genesis 3:16. Okay, good. I find out he's
got some insights that none of the other people had. I get to appreciate John Calvin. I get to appreciate
St. Patrick. I get to appreciate people from early days of Christianity, different people in the 17th Century,
18th Century, 19th Century, 20th Century, all the different streams; atheist, Jews. Not too many Muslims,
but there was some connection with India and so on that came up. It turned out to be really interesting.
This idea of sampling turned out to be a good time-efficient way to get into a complicated subject. The
result was that I actually got too confident that I knew much more. I started to feel that I knew more about
the Bible than I actually had any right to do, because I'm only studying less than 1/500th of the Bible. But
the thing is, people have this idea. There's a classical definition that a liberal education is that you know
everything about something and something about everything. Now I had the point where there were a
few things that I knew everything about. I mean, I had 60 pegs of things that I had researched and I had
found out just about everything that had been written about these small parts of the Bible, but these I had
surrounded. There was nothing vague, so everything else in the Bible sort of could be tacked onto
something solid. It gave me more of a secure feeling that I understood the Bible scholarship than I really
did. But it really shows that this methodology has a lot of merit as long as you don't bias it to a particular
way. It turned out to be an educational experience for me. I met these wonderful artists, and their work
was shown all around the world. It was supported by the National Endowment for the Arts, and it got into
many countries. It was shown in the Guinness Museum in Dublin, and great places like that. I saw
some of the work in San Francisco in February on exhibit still.

Feigenbaum: Is it still up there?

Knuth: The original deal was with the artists that they retained possession of the work, and I was paying
them some money for the reproduction rights. What eventually developed was that the collection was so
good, there was a strong feeling that it ought to be kept intact. So we wrote to them saying was it okay if
the San Francisco Public Library wants to keep it in their Harrison Collection. They have the world's best
collection of calligraphy, and they would like to accession these works into their collection. The artists
agreed to this, so that's what happened.

Feigenbaum: So it's permanently in the San--

Knuth: It's permanently in the San Francisco Public Library.

Feigenbaum: Oh, magnificent.

Knuth: Yeah. So I had to come out of the closet saying, "Oh, I'm going to write a book about the Bible."
Well, Isaac Asimov did this. I mentioned this for the first time to somebody when I'm living in Boston that
year on my sabbatical year. There was an ACM SIGCSE convention there -- the computer science
education group -- and I mentioned that I was spending my time at the library looking up these Bible
verses. I thought they would say, "Oh, gosh, you're over the hill now, Don." But surprisingly, people to
my face didn't really laugh at me too much. It's something that I never would talk about in a Stanford
class, but this is a part of my life that integrated with it.

Feigenbaum: Which is why I brought it up.

Knuth: Okay, thank you.

Feigenbaum: I'd like to see if we can bring this full circle by getting into, finally, two aspects of your
career looking back a little bit and looking ahead a little bit. We know you, and the world knows you,
pretty much -- and you've said it yourself I think on the web somewhere -- that you are pretty much a lone
wolf. In fact, I think you even said it last week in this interview.

Knuth: Yeah, could have been.

Feigenbaum: You operate by yourself. We all know that. We leave you alone. You've cut yourself off
from email. You work in your study for long hours. Two questions about that. Is that a myth? Because
you keep talking about all the places you've traveled to and all the people you see. And the other is, how
do you feel about working with collaborators?

Knuth: Okay. I think I mentioned last week that the trouble… I enjoy working with collaborators, but I
don't think they would enjoy working with me, because I'm very unreliable. I march to my own drummer
and I can't be counted on to meet deadlines because I always underestimate things. I'm not a great
co-worker. Also I'm very bad at delegating. That's why I resisted any chance, any opportunity to be the chair
of the department. I knew that I would just be awful at [it]. I'd have to do it myself. I have no good way to
work with somebody else on tasks that I can do myself. I'm just unable to. It's a huge skill that I lack.
With the TeX project I think it was important, however, that I didn't delegate the writing of the code, as we
said before. I needed to be the programmer on a first generation project. I needed to write the manual
too, a manual. I can't understand… Other users write the manual their way, but I had to write a manual
too. If I delegated that, I wouldn't have realized some parts of it are impossible to explain, and I changed
them as I wrote the manual realizing that it was a stupid thing that was there. So I was the tech writer of
the project. I was a user of the project. I had to use TeX in order to typeset volume two. As I'm
typesetting volume two, I kept track of the changes that I made to TeX as I went through volume two. It
turned out almost a perfect straight line: every four pages I type, I got a new idea for how to improve TeX.
For the first 500 pages of that 700-page book, I got a new idea every four pages. The last 200 pages
were sheer boredom and I didn't do it. With 500 pages, if I hadn't been the user, I would not have had
such a good system. I had to live with it before I gave it to somebody else. These are cases in my life
where I think it's a good thing I didn't delegate. Then again, with the TeX project, once it was there, once
I had this prototype out there, then we're getting more and more users. Then we would have every
Friday, for two or three hours, a community meeting of several dozen people discussing questions,
issues, problems with TeX, how to make it better, how to adapt it to their problems. Everybody coming to
Stanford knowing about this could join our sessions on Friday. Also there was quite a team of volunteers
associated with the project. There I'm working together with the group, but I'm still insisting that I be the
final filter on the stuff. Now I should have mentioned, on this giant component work that we did, I
mentioned that Philippe Flajolet and Boris Pittel were involved with it. But also what turned out is, I got it
to a point where I couldn't prove some of the main theorems. I met Svante Janson, one of the greatest
mathematicians, a Swedish guy. I was visiting Norway, so I went up to Uppsala to show him my work on
this. He got enthusiastic about it, and he saw how to get me to the next level of some things that I was
stumped on. Then he had a visitor from Poland who was there, so it turned out that our paper was
published under four authors. He was the leading author because of alphabetical order, and so that's a
joint work. Svante and Tomasz [Luczak] and myself and Boris all worked on the drafts of this. It's a giant
paper too. It filled a whole issue of a journal, I don't know, 120 pages or something like this. We went
through many, many drafts of this, all working together on it. So it's not that I can't ever work with
anyone. In the same room with a person, I think I mentioned that I couldn't think when I was with
Marshall Hall, and so on.

Feigenbaum: It's too distracting, yeah.

Knuth: There was a guy in Princeton who was my office mate, and he and I were perfectly tuned to each
other. Ed Bender. Ed and I, I mean, he could start a sentence and then I would know. Then we would
work on a problem and I would take it as far as I could. Then I'd be stuck and then he would know how to
do the next thing. Then he would be stuck and then I could take it over. Once in a while you find
somebody where you can really do this online interaction, and the synchronization problem is nil, and it
works very well. But I found that actually terrifying, because it would be your responsibility that we had to
invent science whenever we were together. I already had promised to do so many other things, if I get
more stimulation it’ll kill me. I have to finish The Art of Computer Programming, and all these other
things. So part of my being a loner is in order to fulfill the responsibilities that I have already
accumulated. And knowing that I'm not that great for integrating in with somebody else's agenda; I've
got too strong opinions of my own as to what I have to do. On the other hand, I came to Stanford so that I
could collaborate, so that there would be a music department that I could be with. Caltech didn't have a
music department. At Stanford I could be a chair of the Ph.D. thesis committee, I could chair the oral
examination of music students, and students of German, and things like this. I like to come to a place
where there are people who aren't clones of myself that I can learn from. Whenever I am stumped on
something, I can turn to them, and they can help me learn this. I need people to help me read German
and French and Japanese and Russian things, Sanskrit, and so on, when I get into a historical question
where I don't have a translation handy. All kinds of parts of the university are very important that I
collaborate in that way, but not so often where it's a long term project.

Feigenbaum: I think, in all of that, there's probably -- I don't want to go into it here because it's your
interview not mine -- but I think there are some deep issues there having to do with problem solving and
concentration. When you're into something really deep, like piecing together 14 different articles on one
subject to try to make sense of them, it's straining all the limited human information processing abilities,
and you really can't stand a lot of input. Too many symbols change context.

Knuth: Yeah, it's a bandwidth question. It's easier for my left brain to communicate with my right brain
than for me to communicate with another person's brain.

Feigenbaum: Don, there's a little last thing I want to talk about. This is a little bit of a paradox. Well not
so much; you said you like collaboration, so it's not really a paradox. But you say, I think somewhere on
your website or else in one of your publications, that you're predicting that the future of computer science
will be in terms of contributing pairs of people.

Knuth: Yeah.

Feigenbaum: One a computer scientist, and one somebody from another discipline. I'd like to have you
talk about that, particularly in view of the fact that you did string search algorithms, and yet you did not
collaborate with a biologist; those people have the most intensive string search problems that there are
today.

Knuth: Right. Well, to take them in last-in first-out order: the biologists didn't have those problems in
1972.

Feigenbaum: That's true. That's right.

Knuth: The human genomes, now we've got all this data, but there wasn't such data then. Certainly if I
was doing the work now, it would be a different thing. But this pairwise thing is a notion that I have that
might be way off the wall. I didn't limit it to computer scientists and X. I was viewing it as a university as a
whole, including humanities, medicine, everything. I'm saying knowledge in the world is exploding, and
there are so many things now, that trying to look at the way a university might be 100 years from now
compared to the way universities have evolved up to this point, in the following way. Up until this point
we had subjects, and a person would identify themselves with what I call the vertices of a graph, where
one vertex would be mathematics. Another vertex would be biology. Another vertex would be computer
science, a new vertex on the block. There would be a physics vertex, and so on. Then, okay, there was
biophysics and things. There was English, and Spanish, Latin. But people identified themselves as
vertices, because these were the specialties. You could sort of live in that vertex, and you would be able
to understand most of the lectures that were given by your colleagues. The subjects were getting bigger
and bigger, but, still, we used to be able to have a computer science colloquium every week and
everybody would come and we would know. But knowledge is growing and growing to the point where
nobody can say they know all of mathematics, certainly. But also there's also so much interdisciplinary
work now, where we see a mathematician can study the printing industry and see that some of the ideas
of dynamic programming apply to book publishing. Wow! There's interactions galore wherever you look.
You mentioned the electrical engineer who gets a Nobel Prize for medicine because he can do CT
scanning, or whatever. My model of the way the future might go is that people wouldn't identify
themselves with vertices, but rather with edges, with the connections between. Each person is a bridge.
Each person is a bridge between two other areas, and that they identify themselves by the two sub-
specialties that they happen to have a talent for. Then it's more of a network than a group of
departments. It doesn't mean that I'm a loner, but that I'm communicating with the other people who are
branches in the adjacent fields. This is the context in which that remark came up.

Feigenbaum: What you're saying is that it’s an interdisciplinary world.

Knuth: …world. We're going to find that most of the people we talk to are people that have one foot in
the same place as we do.

Feigenbaum: Live on the edges, not in the nodes.

Knuth: Yeah.

Feigenbaum: Don, thank you for sharing all of this with everyone. Not only everyone now, but everyone
50, 100, 200 years from now.

Knuth: Well, thank you for directing it all this way. I hope I can do half as well when I have to sit in your
shoes next time.

END OF INTERVIEW



SPECIAL SECTION

Guest Editor
Peter Neumann

The Complexity of Songs

DONALD E. KNUTH

Every day brings new evidence that the concepts of computer science are applicable to areas of life
which have little or nothing to do with computers. The purpose of this survey paper is to demonstrate
that important aspects of popular songs are best understood in terms of modern complexity theory.

It is known [3] that almost all songs of length \(n\) require a text of length \(\sim n\). But this puts a
considerable space requirement on one's memory if many songs are to be learned; hence, our ancient
ancestors invented the concept of a refrain [14]. When the song has a refrain, its space complexity can
be reduced to \(cn\), where \(c < 1\) as shown by the following lemma.

LEMMA 1.
Let \(S\) be a song containing \(m\) verses of length \(V\) and a refrain of length \(R\) where the refrain
is to be sung first, last, and between adjacent verses. Then, the space complexity of \(S\) is
\((V/(V+R))\,n + O(1)\) for fixed \(V\) and \(R\) as \(m \to \infty\).

PROOF.
The length of \(S\) when sung is

\[ n = R + (V+R)m \tag{1} \]

while its space complexity is

\[ c = R + Vm. \tag{2} \]

By the Distributive Law and the Commutative Law [4], we have

\[ c = n - (V+R)m + mV = n - Vm - Rm + Vm = n - Rm. \tag{3} \]

The lemma follows. □

(It is possible to generalize this lemma to the case of verses of differing lengths
\(V_1, V_2, \ldots, V_m\), provided that the sequence \((V_k)\) satisfies a certain smoothness condition.
Details will appear in a future paper.)

A significant improvement on Lemma 1 was discovered in medieval European Jewish communities where
an anonymous composer was able to reduce the complexity to \(O(\sqrt{n})\). His song "Ehad Mi Yode'a" or
"Who Knows One?" is still traditionally sung near the end of the Passover ritual, reportedly in order to
keep the children awake [6]. It consists of a refrain and 13 verses \(v_1, \ldots, v_{13}\), where \(v_k\) is
followed by \(v_{k-1} \ldots v_2 v_1\) before the refrain is repeated; hence \(m\) verses of text lead to
\(\tfrac{1}{2}m^2 + O(m)\) verses of singing. A similar song called "Green Grow the Rushes O" or "The Dilly
Song" is often sung in western Britain at Easter time [1], but it has only twelve verses (see [1]), where
Breton, Flemish, German, Greek, Medieval Latin, Moldavian, and Scottish versions are cited.

The coefficient of \(\sqrt{n}\) was further improved by a Scottish farmer named O. MacDonald, whose
construction¹ appears in Lemma 2.

The research reported here was supported in part by the National Institute of Wealth under grant
$262,144.

¹ Actually MacDonald's priority has been disputed by some scholars; Peter Kennedy ([8], p. 676) claims
that "I Bought Myself a Cock" and similar farmyard songs are actually much older.

© 1984 ACM 0001-0782/84/0400-0344 75¢

344 Communications of the ACM April 1984 Volume 27 Number 4
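
The arithmetic behind Lemma 1 and the cumulative-verse count for "Ehad Mi Yode'a" can be checked numerically. The Python sketch below is only an illustration: the function names and the sample values V = 4 and R = 2 are assumptions made here, not part of the paper. It evaluates equations (1) and (2), confirms identity (3), and shows that the gap between c and (V/(V+R))n stays constant as m grows, which is the O(1) term of the lemma.

```python
from fractions import Fraction


def sung_length(m, V, R):
    """Equation (1): refrain sung first, last, and between adjacent verses."""
    return R + (V + R) * m


def text_length(m, V, R):
    """Equation (2): the refrain is stored once, each of the m verses once."""
    return R + V * m


def cumulative_verses_sung(m):
    """Verses sung when verse k is followed by verses k-1, ..., 1."""
    return m * (m + 1) // 2  # 1 + 2 + ... + m = m^2/2 + O(m)


if __name__ == "__main__":
    V, R = 4, 2  # illustrative lengths (assumed values, arbitrary units)
    for m in (1, 10, 100, 1000):
        n = sung_length(m, V, R)
        c = text_length(m, V, R)
        assert c == n - R * m  # identity (3)
        # Exact difference between c and the leading term (V/(V+R)) n.
        gap = c - Fraction(V, V + R) * n
        print(m, n, c, gap)
    print(cumulative_verses_sung(13))
```

For these sample values the gap works out to R²/(V+R) for every m, so the ratio c/n approaches V/(V+R) from above as the song gets longer.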



LEMMA 2.
Given positive integers $\alpha$ and $\lambda$, there exists a song
whose complexity is $(20 + \lambda + \alpha)\sqrt{n/(30 + 2\lambda)} + O(1)$.

PROOF.
Consider the following schema [9].

$$V_0 = \text{'Old MacDonald had a farm, '}\ R_1$$
$$R_1 = \text{'Ee-igh, '}^2\ \text{'oh! '}$$
$$R_2(x) = V_0\ \text{'And on this farm he had some '}\ x\ \text{', '}\ R_1\ \text{'With a'}$$
$$U_1(x, x') = x\ \text{', '}\ x'\ \text{' here and a '}\ x\ \text{', '}\ x'\ \text{' there; '}$$
$$U_2(x, y) = x\ \text{'here a '}\ y\ \text{', '}$$
$$U_3(x, x') = U_1(x, x')\ U_2(\epsilon, x)\ U_2(\text{'t'}, x')\ U_2(\text{'everyw'}, x\ \text{', '}\ x')$$
$$V_k = U_3(W_k, W_k')\ V_{k-1} \quad \text{for } k \ge 1$$

where

$$W_1 = \text{'chick'},\quad W_2 = \text{'quack'},\quad W_3 = \text{'gobble'},\quad W_4 = \text{'oink'},\quad W_5 = \text{'moo'},\quad W_6 = \text{'hee'}, \tag{5}$$

and

$$W_k' = W_k \ \text{ for } k \ne 6; \qquad W_6' = \text{'haw'}. \tag{6}$$

The song of order m is defined by

$$\mathcal{S}_0 = \epsilon, \qquad \mathcal{S}_m = R_2(W_m'')\ V_m\ \mathcal{S}_{m-1} \quad \text{for } m \ge 1, \tag{7}$$

where

$$W_1'' = \text{'chicks'},\quad W_2'' = \text{'ducks'},\quad W_3'' = \text{'turkeys'},\quad W_4'' = \text{'pigs'},\quad W_5'' = \text{'cows'},\quad W_6'' = \text{'donkeys'}. \tag{8}$$

The length of $\mathcal{S}(m)$ is

$$n = 30m^2 + 153m + 4\bigl(m\ell_1 + (m-1)\ell_2 + \cdots + \ell_m\bigr) + (a_1 + \cdots + a_m) \tag{9}$$

while the length of the corresponding schema is

$$c = 20m + 211 + (\ell_1 + \cdots + \ell_m) + (a_1 + \cdots + a_m). \tag{10}$$

Here $\ell_k = |W_k| + |W_k'|$ and $a_k = |W_k''|$, where $|x|$
denotes the length of string x. The result follows at
once, if we assume that $\ell_k = \lambda$ and $a_k = \alpha$ for all
large k. □

Note that the coefficient $(20 + \lambda + \alpha)/\sqrt{30 + 2\lambda}$
assumes its minimum value at

$$\lambda = \max(1, \alpha - 10) \tag{11}$$

when $\alpha$ is fixed. Therefore if MacDonald's farm animals
ultimately have long names they should make slightly
shorter noises.

Similar results were achieved by a French Canadian
ornithologist, who named his song schema "Alouette"
[2, 15]; and at about the same time by a Tyrolean
butcher whose schema [5] is popularly called "Ist das
nicht ein Schnitzelbank?" Several other cumulative
songs have been collected by Peter Kennedy [8], including
"The Mallard" with 17 verses and "The Barley
Mow" with 18. More recent compositions, like "There's
a Hole in the Bottom of the Sea" and "I Know an Old
Lady Who Swallowed a Fly" unfortunately have comparatively
large coefficients.

A fundamental improvement was claimed in England
in 1824, when the true love of U. Jack gave to him a
total of 12 ladies dancing, 22 lords a-leaping, 30 drummers
drumming, 36 pipers piping, 40 maids a-milking,
42 swans a-swimming, 42 geese a-laying, 40 golden
rings, 36 collie birds, 30 french hens, 22 turtle doves,
and 12 partridges in pear trees during the twelve days
of Christmas [11]. This amounted to $\tfrac{1}{6}m^3 + \tfrac{1}{2}m^2 + \tfrac{1}{3}m$
gifts in m days, so the complexity appeared to be
$O(\sqrt[3]{n})$; however, it was soon pointed out [10] that his
computation was based on n gifts rather than n units of
singing. A complexity of order $\sqrt{n/\log n}$ was finally
established (see [7]).

We have seen that the partridge in the pear tree gave
an improvement of only $1/\sqrt{\log n}$; but the importance
of this discovery should not be underestimated since it
showed that the $n^{0.5}$ barrier could be broken. The next
big breakthrough was in fact obtained by generalizing
the partridge schema in a remarkably simple way. It
was J. W. Blatz of Milwaukee, Wisconsin who first discovered
a class of songs known as "m Bottles of Beer on
the Wall"; her elegant construction² appears in the following
proof of the first major result in the theory.

THEOREM 1.
There exist songs of complexity O(log n).

PROOF.
Consider the schema

$$V_k = T_k B W \text{', '}\ \ T_k B \text{'; '}\ \ \text{'If one of those bottles should happen to fall, '}\ \ T_{k-1} B W \text{'.'} \tag{12}$$

where

$$B = \text{' bottles of beer'}, \qquad W = \text{' on the wall'}, \tag{13}$$

and where $T_k$ is a translation of the integer k into English.
It requires only O(m) space to define $T_k$ for all
$k < 10^m$ since we can define

$$T_{q \cdot 10^m + r} = T_q\ \text{' times 10 to the '}\ T_m\ \text{' plus '}\ T_r \tag{14}$$

for $1 \le q \le 9$ and $0 \le r < 10^{m-1}$.

Therefore the songs $S_k$ defined by

$$S_0 = \epsilon, \qquad S_k = V_k S_{k-1} \quad \text{for } k \ge 1 \tag{15}$$

have length $n \asymp k \log k$, but the schema which defines
them has length O(log k); the result follows. □

Theorem 1 was the best result known until recently³,
perhaps because it satisfied all practical requirements
for song generation with limited memory space. In fact,
99 bottles of beer usually seemed to be more than sufficient
in most cases.

However, the advent of modern drugs has led to
demands for still less memory, and the ultimate improvement
of Theorem 1 has consequently just been
announced:

THEOREM 2.
There exist arbitrarily long songs of complexity O(1).

PROOF: (due to Casey and the Sunshine Band). Consider
the songs $S_k$ defined by (15), but with

$$V_k = \text{'That's the way,'}\ U\ \text{'I like it, '}\ U \tag{16}$$
$$U = \text{'uh huh,'}\ \text{'uh huh'}$$

for all k. □

It remains an open problem to study the complexity
of nondeterministic songs.

² Again Kennedy ([8], p. 631) claims priority for the English, in this case
because of the song "I'll drink m if you'll drink m + 1." However, the English
start at m = 1 and get no higher than m = 9, possibly because they actually
drink the beer instead of allowing the bottles to fall.

³ The chief rival for this honor was "This old man, he played m, he played
knick-knack...".

Acknowledgment. I wish to thank J. M. Knuth and J. S.
Knuth for suggesting the topic of this paper.

REFERENCES
1. Rev. S. Baring-Gould, Rev. H. Fleetwood Sheppard, and F.W. Bussell, Songs of the West (London: Methuen, 1905), 23, 160-161.
2. Oscar Brand, Singing Holidays (New York: Alfred Knopf, 1957), 68-69.
3. G.J. Chaitin, "On the length of programs for computing finite binary sequences: Statistical considerations," J. ACM 16 (1969), 145-159.
4. G. Chrystal, Algebra, an Elementary Textbook (Edinburgh: Adam and Charles Black, 1886), Chapter 1.
5. A. Dörrer, Tiroler Fasnacht (Wien, 1949), 480 pp.
6. Encyclopedia Judaica (New York: Macmillan, 1971), v. 6 p. 503; The Jewish Encyclopedia (New York: Funk and Wagnalls, 1903); articles on Ehad Mi Yode'a.
7. U. Jack, "Logarithmic growth of verses," Acta Perdix 15 (1826), 1-65535.
8. Peter Kennedy, Folksongs of Britain and Ireland (New York: Schirmer, 1975), 824 pp.
9. Norman Lloyd, The New Golden Song Book (New York: Golden Press, 1955), 20-21.
10. N. Picker, "Once señores brincando al mismo tiempo," Acta Perdix 12 (1825), 1009.
11. ben shahn, a partridge in a pear tree (New York: the museum of modern art, 1949), 28 pp. (unnumbered).
12. Cecil J. Sharp, ed., One Hundred English Folksongs (Boston: Oliver Ditson, 1916), xlii.
13. Christopher J. Shaw, "that old favorite, Apiapt / a Christmastime algorithm," with illustrations by Gene Oltan, Datamation 10, 12 (December 1964), 48-49. Reprinted in Jack Moshman, ed., Faith, Hope and Parity (Washington, D.C.: Thompson, 1966), 48-51.
14. Gustav Thurau, Beiträge zur Geschichte und Charakteristik des Refrains in der französischen Chanson (Weimar: Felber, 1899), 47 pp.
15. Marcel Vigneras, ed., Chansons de France (Boston: D.C. Heath, 1941), 52 pp.

Permission to copy without fee all or part of this material is granted
provided that the copies are not made or distributed for direct commercial
advantage, the ACM copyright notice and the title of the publication
and its date appear, and notice is given that copying is by permission of
the Association for Computing Machinery. To copy otherwise, or to
republish, requires a fee and/or specific permission.
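As an informal companion to the proof of Theorem 1, the schema (12) through (15) can be transcribed into a short Python sketch. This is an illustration under stated assumptions rather than the paper's construction verbatim: the single-digit names in `DIGITS` are an assumed set of base cases for T_0 through T_9, the exponent m is taken to be the position of the leading digit, and the remainder r is allowed to range over every value below 10^m so that each k is covered.

```python
# Assumed base cases T_0 .. T_9; the paper does not spell these out.
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

B = " bottles of beer"   # equation (13)
W = " on the wall"

def T(k):
    """Translate k into 'English' following the recursive rule (14)."""
    if k < 10:
        return DIGITS[k]
    m = len(str(k)) - 1              # k = q * 10**m + r with 1 <= q <= 9
    q, r = divmod(k, 10 ** m)
    return T(q) + " times 10 to the " + T(m) + " plus " + T(r)

def V(k):
    """Verse k of schema (12)."""
    return (T(k) + B + W + ", "
            + T(k) + B + "; "
            + "If one of those bottles should happen to fall, "
            + T(k - 1) + B + W + ".")

def S(k):
    """Song S_k = V_k S_{k-1} with S_0 empty, as in (15)."""
    return "".join(V(j) for j in range(k, 0, -1))

if __name__ == "__main__":
    print(V(99))
    for k in (9, 99, 999):
        print(k, len(S(k)))          # sung length grows much faster than the rule set
```

The definitions of `T`, `V`, and `S` stay the same size no matter how large k becomes, while `len(S(k))` keeps growing, which is the gap between schema length and song length that the theorem exploits.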



Twenty Questions for Donald Knuth | | InformIT http://www.informit.com/articles/article.aspx?p=...

Twenty Questions for Donald Knuth
By Donald E. Knuth
May 20, 2014

To celebrate the publication of the eBooks of The Art of Computer Programming
(TAOCP), we asked several computer scientists, contemporaries, colleagues, and
well-wishers to pose one question each to author Donald E. Knuth. Here are his
answers.

Check informit.com/knuth throughout 2014 to purchase Vol 3-4A eBooks as they become
available. If you want email notifications, send an email to taocp@awl.com.

1. Jon Bentley, researcher: What a treat! The last time I had an opportunity like this was at
the end of your data structures class at Stanford in June, 1974. On the final day, you opened
the floor so that we could ask any question on any topic, barring only politics and religion. I still
vividly remember one question that was asked on that day: "Among all the programs you've
written, of which one are you most proud?"

Your answer (as I approximately recall it, four decades later) described a compiler that you
wrote for a minicomputer with 1024 available bytes of memory. Your first draft was 1029 bytes
long, but you eventually had it up and running and debugged at 1023 bytes. You said that you
were particularly proud of cramming so much functionality into so little memory.

My query today is a slight variant on that venerable question. Of all the programs that you've
written, what are some of which you are most proud, and why?

Don Knuth: I'd like to ask you the same! But that's something like asking parents to name their
favorite children.

Of course I'm proud of TeX and METAFONT, because they seem to have helped to
change the world, and because they led to many friendships. Furthermore they've made these
eBooks possible: I'm enormously happy that the work I did more than 30 years ago has
miraculously survived many changes of technology, and that the 3,000 pages of TAOCP now
look so great on a little tablet—even after zooming.

While I was preparing for Volume 4 of TAOCP in the 90s, I wrote several dozen short routines
using what you and I know as "literate programming." Those little essays have been packaged
into The Stanford GraphBase (1994), and I still enjoy using and modifying them. My favorite is
the implementation of Tarjan's beautiful algorithm for strong components, which appears on
pages 512–519 of that book.

I have to admit some pride also in the implementation of IEEE floating-point arithmetic that
appears in my book MMIXware (1999), as well as that book's metasimulator for MMIX, in which I
explain many principles of advanced pipelined computers from the ground up.

Literate programming continues to be one of the greatest joys of my life. In fact, I find myself
writing roughly two programs per week, on average, both large and small, as I draft new
material for the next volumes of TAOCP.

2. Dave Walden, TeX Users Group: Might you publish the original 3,000-page version of
TAOCP (before the decision to change it into seven volumes), as a historical artifact of your
view of the state of the art of algorithms and their analysis circa 1965? I think lots of people
would like to see this.

Don Knuth: Scholars can look at the handwritten pages that led to Volumes 1–3 by going to
the Stanford Archives, and all of the remaining pages will be deposited there eventually. I see
little value in making those drafts more generally available—although some of the material
about baseball that I decided not to use is pretty cool. Archives from the real pioneers of
computer science, who wrote in the 40s and 50s, should be published first.

I do try to retain the youthful style of the original, in the pages that I write today, except where
my first draft was embarrassingly naïve or corny. I've also learned when to say "that" instead of
"which," thanks in part to Guy Steele's tutelage.

3. Charles Leiserson, MIT: TAOCP shows a great love for computer science, and in particular,
for algorithms and discrete mathematics. But love is not always easy. When writing this series,
when did you find yourself reaching deepest into your emotional reservoir to overcome a
difficult challenge to your vision?

Don Knuth: Again, Charles, I'd like to ask you exactly the same question!

For me, I guess, the hardest thing has always been to figure out what to cut. And I obviously
haven't been very successful at that, in spite of much rewriting.

The most difficult technical challenge was to write the metasimulator for MMIX. I needed to do
that behind the scenes, in order to shape what actually appears in the books, and it was surely
the toughest programming task that I've ever faced. Without the methodology of literate
programming, I don't think I could have finished that job successfully.

Many of the "starred" mathematical sections also stretched me pretty far. Overall, however,
after working on TAOCP for more than fifty years, I can't think of any aspect of the activity
where the effort of writing wasn't amply repaid by what I learned while doing it.

4. Dennis Shasha, NYU: How does a beautiful algorithm compare to a beautiful theorem? In
other words, what would be your criteria of beauty for each?

Don Knuth: Beauty has many aspects, of course, and is in the eye of the beholder. Some
theorems and algorithms are beautiful to me because they have many different applications;
some because they do powerful things with severely limited resources; some because they
involve aesthetically pleasing patterns; some because they have a poetic purity of concept.

For example, I mentioned Tarjan's algorithm for strong components. The data structures that he
devised for this problem fit together in an amazingly beautiful way, so that the quantities you
need to look at while exploring a directed graph are always magically at your fingertips. And his
algorithm also does topological sorting as a byproduct.
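
The remark about topological sorting can be illustrated with a short sketch of Tarjan's strong-components algorithm. This is a plain recursive Python version written for illustration, not the literate program from The Stanford GraphBase. The graph representation (a dict mapping each vertex to a list of successors) and all names are assumptions. Components are emitted in reverse topological order of the condensed graph, which is the byproduct mentioned above.

```python
def strong_components(graph):
    """Return the strongly connected components of a directed graph.

    graph maps each vertex to a list of its successors. A component is
    emitted before any other component that has an arc leading into it.
    """
    index = {}        # discovery number of each visited vertex
    low = {}          # smallest discovery number known to be reachable
    stack = []        # visited vertices not yet assigned to a component
    on_stack = set()
    components = []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:           # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            components.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return components

if __name__ == "__main__":
    g = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4], 6: []}
    print(strong_components(g))
```

Because the sketch is recursive, very deep graphs may exceed Python's default recursion limit; an explicit stack would avoid that at the cost of some clarity.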

It's even possible sometimes to prove a beautiful theorem by exhibiting a beautiful algorithm.
Look, for instance, at Theorem 5.1.4D and/or Corollary 7H in TAOCP.

5. Mark Taub, Pearson: Does the emergence of "apps" (small, single-function, networked
programs) as the dominant programming paradigm today impact your plans in any way for
future material in TAOCP?

Don Knuth: People who write apps use the ideas and paradigms that are already present in
the first volumes. And apps make use of ever-growing program libraries, which are intimately
related to TAOCP. Users of those libraries ought to know something about what goes on inside.

Future volumes will probably be even more "app-likable," because I've been collecting tons of
fascinating games and puzzles that illustrate programming techniques in especially instructive
and appealing ways.

6. Radia Perlman, Intel: (1) What is not in the books that you wish you'd included? (2) If you'd
been born 200 years ago, what kind of career might you imagine you'd have had?

Don Knuth: (1) Essentially everything that I want to include is either already in the existing
volumes or planned for the future ones. Volume 4B will begin with a few dozen pages that
introduce certain newfangled mathematical techniques, which I didn't know about when I wrote
the corresponding parts of Volume 1. (Those pages are now viewable from my website in
beta-test form, under the name "mathematical preliminaries redux.") I plan to issue similar
gap-filling "fascicles" when future volumes need to refer to recently invented material that
ultimately belongs in Volume 3, say.

(2) Hey, what a fascinating question—I don't think anybody else has ever asked me that before!

If I'd been born in 1814, the truth is that I would almost certainly have had a very limited
education, coupled with hardly any access to knowledge. My own male ancestors from that era
were all employed as laborers, on farms that they didn't own, in what is now called northern
Germany.

But I suppose you have a different question in mind. What if I had been one of the few people
with a chance to get an advanced education, and who also had some flexibility to choose a
career?

All my life I've wanted to be a teacher. In fact, when I was in first grade, I wanted to teach first
grade; in second grade, I wanted to teach second; and so on. I ended up as a college teacher.
Thus I suppose that I'd have been a teacher, if possible.

To continue this speculation, I have to explain about being a geek. Fred Gruenberger told me
long ago that about 2% of all college students, in his experience, really resonated with
computers in the way that he and I did. That number stuck in my mind, and over the years I was
repeatedly able to confirm his empirical observations. For instance, I learned in 1977 that the
University of Illinois had 11,000 grad students, of whom 220 were CS majors!

Thus I came to believe that a small percentage of the world's population has somehow
acquired a peculiar way of thinking, which I happen to share, and that such people happened to
discover each other's existence after computer science had acquired its name.

For simplicity, let me say that people like me are "geeks," and that geeks comprise about 2% of
the world's population. I know of no explanation for the rapid rise of academic computer science
departments—which went from zero to one at virtually every college and university between
1965 and 1975—except that they provided a long-needed home where geeks could work
together. Similarly, I know of no good explanation for the failure of many unsuccessful software
projects that I've witnessed over the years, except for the hypothesis that they were not
entrusted to geeks.

So who were the geeks of the early 19th century? Beginning a little earlier than 1814, I'd maybe
like to start with Abel (1802); but he's been pretty much claimed by the mathematicians. Jacobi
(1804), Hamilton (1805), Kirkman (1806), De Morgan (1806), Liouville (1809), Kummer (1810),
and China's Li Shanlan (1811) are next; I'm listing "mathematicians" whose writings speak
rather directly to the geek in me. Then we get precisely to your time period, with Catalan (1814)
and Sylvester (1814), Boole (1815), Weierstraß (1815), and Borchardt (1817). I would have
enjoyed the company of all these people, and with luck I might have done similar things.

By the way, the first person in history whom I'd classify as "100% geek" was Alan Turing. Many
of his predecessors had strong symptoms of our disease, but he was totally infected.

7. Tony Gaddis, author: Do you remember a specific moment when you discovered the joy of
programming, and decided to make it your life's work?

Don Knuth: During the summer of 1957, between my freshman and sophomore years at Case
Tech in Cleveland, I was allowed to spend all night with an IBM 650, and I was totally hooked.

But there was no question of viewing that as a "life's work," because I knew of nobody with
such a career. Indeed, as mentioned above, my life's work was to be a teacher. I did write a
compiler manual in 1958, which by chance was actually used as the textbook for one of my
classes in 1959(!). Still, programming was for me primarily a hobby at first, after which it
became a way to support myself while in grad school.

I saw no connection between computer programming and my intended career as a math
professor until I met Bob Floyd late in 1962. I didn't foresee that computer science would ever
be an academic discipline until I met George Forsythe in 1964.

8. Robert Sedgewick, Princeton: Don, I remember some years ago that you took the position
that you weren't trying to reach everyone with your books—knowing that they would be
particularly beneficial to people with a certain interest and aptitude who enjoy programming and
exploring its relationship to mathematics. But lately I've been wondering about your current
thoughts on this issue. It took a long time for society to realize the benefits of teaching
everyone to read; now the question before us is whether everyone should learn to program.
What do you think?

Don Knuth: I suppose all college professors think that their subject ought to be taught to
everybody in the world. In this regard I can't help quoting from a wonderful paper that John
Hammersley wrote in 1968:

Just for the fun of getting his reactions, I asked an eminent scholar of English Literature
what educational benefits might lie in the study of goliardic verse, Erse curses, and runic
erotica. 'A working background of goliardic verse would be more than helpful to anyone
hoping to have some modest facility in his own mother tongue', he declared; and with that
he warmed to his subject and to the poverties of unlettered science, so that it was some
minutes before I could steer him back to the Erse curses, about which he seemed a good
deal less enthusiastic. 'Really', he said, 'that sort of thing isn't my subject at all. Of course, I
applaud breadth of vocabulary; and you never know when some seemingly useless piece
of knowledge may not turn out to be of cardinal practical importance. I could certainly
envisage a situation in which they might come in very handy indeed'. 'And runic erotica?'
'Not extant'. (Was it only my fancy that heard a note of faint regret in his reply?) Certainly
the higher flights of scholarship can add savour; but does the man-in-the-street have the
time and the pertinacity and the intellectual digestion for them?

Programming, of course, is not just an ordinary subject. It is intrinsically empowering, and
applicable to many different kinds of knowledge. And I also know that you've been having
enormous successes, at Princeton and online, teaching advanced concepts of programming to
students from every discipline.

But your question asks about everybody. I still think many years will have to go by before I
would recommend that my own highly intelligent wife, son, and daughter should learn to
program, much less that everybody else I know should do so.

Nick Trefethen told me a few years back that he had just visited his son's high school in Oxford,
which is one of the best anywhere, and learned that not a single student knew how to program!
Britain is now beginning to change that, indeed at a more rapid pace than in America. Yet such
a revolution almost surely needs to take place over a generation or more. Where are the
teachers going to come from?

My own experience is with the subset of college students who are sufficiently interested in
programming that they expect it to become an integral part of their life. TAOCP is essentially for
specialists. I've primarily been writing it for geeks, not for a general audience, because
somebody has to write books that aren't for dummies. (By a "dummy" I mean a smart non-geek.
That's a much larger market, and very important; but it's not my target audience, and general
education is not my forte.)

On the other hand, believe it or not, I try to explain everything in my books by imagining a
non-specialist reader. My goal is to be jargon-free whenever possible; I especially try to avoid
terms from higher mathematics that tend to frighten the programmer-on-the-street. Whenever
possible I try to translate results from the theoretical literature into a language that high-school
students could understand.

I know that my books still aren't terribly easy to fathom, even for geeks. But I could have made
them much, much harder.

9. Barbara Steele: What was the conversion process, and what tools did you use, to convert
your print books to eBooks?

Don Knuth: I knew that these volumes would not work especially well as eBooks unless they
were converted by experts. Fortunately I received some prize money in 2011, which could be
used to pay for professional help. Therefore I was able to achieve the kind of quality that I
envisioned, without delaying my work on future volumes, by letting the staff at Mathematical
Sciences Publishers in Berkeley (MSP) handle all of the difficult stuff.

My principal goal was to make the books easily searchable—and that's a much more
challenging problem than it seems, if you want to do it right. Secondarily, I wanted to let readers
easily click on the number of any exercise or equation or illustration or table or algorithm, etc.,
and to jump to that exercise; also to jump readily between an exercise and its answer.

The people at MSP wrote special software that converts my source text into suitable input
to other software that creates pdf files. I don't know the details, except that they use "change
files" analogous to those used in WEB and CWEB. I've checked the results pretty carefully, and I
couldn't be more pleased. Moreover, they've designed things so that it won't be hard for me to
make changes next year, as readers discover bugs in the present editions.

(My style of writing tends to maximize the number of opportunities to make mistakes, hence I
would be fooling myself if I thought that the books were now perfect. Therefore it has always
been important to keep future errata in mind. The production staff at Addison-Wesley has been
consistently wonderful in the way they allow me to correct about fifty pages every year in each
volume.)

10. Silvio Levy, MSP: Could you comment on the differences between the print, pdf, ePUB,
etc., editions of TAOCP? What would you say is gained or lost with each?

Don Knuth: The printed versions weigh a lot more, but they don't need battery power or a
tether to electricity. They are always there; I don't have to turn them on, and I can have them all
open at once.

I can scribble in the margins (and elsewhere) of the print versions, and I can highlight text in
different colors. Ten years from now I expect analogous features will be commonly available for
eBooks.

I'm used to flipping pages and finding my way around a regular book, much more so than in an
eBook; but my grandchildren might have the opposite reaction.

The great advantage of an eBook is the reader's ability to search exhaustively. What fun it is to
look for all occurrences of a random word like 'game', or for a random word fragment like 'gam'
or 'ame', and find lots of cool material that I don't recall having written. The search feature on
these books works even better than I had a right to hope for.

The index in a printed book has the advantage of being more focused. But that index also
appears in the eBook, and in the eBook you can even click in the index to get to the cited
pages.

Today's eBook readers are often inconvenient for setting bookmarks and going back to where
you were a couple of minutes ago, especially after you click on an Internet link and then want to
go back to reading. But that software will surely improve, and so will today's electronic devices.

In the future I look forward to curated eBooks that have additional notes by experts—and
possibly even graffiti in the style of Concrete Mathematics—somewhat analogous to the
"director's comments" and other extras found on the DVDs for films. One could select different
subsets of these comments when reading.

11. Peter Gordon, Addison-Wesley (retired): If the full range of today's eBook features and
functionalities had been available when TAOCP was first published, would you have written
those volumes very differently?

Don Knuth: Well, I don't think I would have gotten very far at all. I would have had to think
about doing everything in color, and with interactive figures, tables, equations, and exercises. A
single person cannot use the "full range" of features that eBooks potentially have.

But by limiting myself to what can be presented well in black-and-white type, on printed pages
of a fixed size, I was fortunately able to complete 3,000 pages over a period of 50 years.

12. Udi Manber, Google: The early volumes of TAOCP established computer programming as
computer science. They introduced the necessary rigor. This was at the time when computers
were used mostly for numerical applications. Today, most applications are related to people
—social interaction, search, entertainment, and so on. Rigor is rarely used in the development
of these applications. Speed is not always the most important factor, and "correctness" is rarely
even defined. Do you have any advice on how to develop a new computer science that can
introduce rigor to these new applications?

Don Knuth: The numerical computations that were somewhat central when computer science
was born are by no means gone; they continue to grow, year by year. Of course, they now
represent a much smaller piece of the pie, but I don't believe in concentrating too much on the
big pieces.

My work on METAFONT introduced me to applications where "correctness" cannot be
defined. How do I know, for example, that my program for the letter A produces a correct
image? I never will; and I've learned to live with that uncertainty. On the other hand, when I
implemented the routines that interpret METAFONT specifications and draw the
associated bitmaps, there was plenty of room for rigor. The algorithms that go into font
rendering are among the most interesting I've ever seen.

As a user of products from Google and Adobe and other corporations, I know that a
tremendous amount of rigor goes into the manipulation of map data, transportation data, pixel
data, linguistic data, metadata, and so on. Furthermore, much of that processing is done with
distributed and decentralized algorithms that require more rigor than anybody ever thought of in
the 60s.

So I can't say that rigor has disappeared from the computer science scene. I do wish, however,
that Google's and Adobe's and Apple's programmers would learn rigorously how to keep their
systems from crashing my home computers, when I'm not using Linux.

In general I agree with you that there's no decrease in the need for rigor, rather an increase in
the number of kinds of rigor that are important. The fact that correctness can't be defined on the
"bottom line" should not lull people into thinking that there aren't intermediate levels within
every nontrivial system where correctness is crucial. Robustness and quality are compromised
by every weak link.

On the other hand, I certainly don't think that everything should be mathematized, nor that
everything that involves computers is properly a subdiscipline of computer science. Many parts
of important software systems do not require the special talents of geeks; quite the contrary.
Ideally, many disciplines collaborate, because a wide variety of orthogonal skill sets is a
principal reason why life is such a joy. Vive la différence.

Indeed, I myself follow the path of rigor only partway: Rarely do I ever give a formal proof that
any of my programs are correct, once I've constructed an informal proof that convinces me. I
have no real interest, for example, in defining exactly what it would mean for TeX to be correct,
or for verifying formally that my implementation of that 550-page program is free of bugs. I know
that anomalous results are possible when users try to specify pages that are a mile wide, or
constants that involve a trillion zeros, etc. I've taken care to avoid catastrophic crashes, but I
don't check every addition operation for possible overflow.

There's even a fundamental gap in the foundations of my main mathematical specialty, the
analysis of algorithms. Consider, for example, a computer program that sorts a list of numbers
into order. Thanks to the work of Floyd, Hoare, and others, we have formal definitions of
semantics, and tools by which we can verify that sorting is indeed always achieved. My job is to
go beyond correctness, to an analysis of such things as the program's running time: I write
down a recurrence, say, which is supposed to represent the average number of comparisons
made by that program on random input data. I'm 100% sure that my recurrence correctly
describes the program's performance, and all of my colleagues agree with me that the
recurrence is "obviously" valid. Yet I have no formal tools by which I can prove that my
recurrence is right. I don't really understand my reasoning processes at all! My student Lyle
Ramshaw began to create suitable foundations in his thesis (1979), but the problem seems
inherently difficult. Nevertheless, I don't lose any sleep over this situation.

13. Al Aho, Columbia: We all know that the Turing Machine is a universal model for
sequential computation.

But let's consider reactive distributed systems that maintain an ongoing interaction with their
environment—systems like the Internet, cloud computing, or even the human brain. Is there a
universal model of computation for these kinds of systems?

Don Knuth: I'm not strong on logic, so TAOCP treads lightly on this sort of thing. The TAOCP
model of computation, discussed on pages 4–8 of Volume 1, considers "reactive processes,"
a.k.a. "computational methods," which correspond to single processors. I've long planned to
discuss recursive coroutines and other cooperative processes in Chapter 8, after I finish
Chapter 7. The beautiful model of context-free parsing via semiautonomous agents, in Floyd's
great survey paper of 1964, has strongly influenced my thinking in this regard.

I'd like to see extensions of the set-theoretic model of computation at the beginning of Volume 1
to the things you mention. They might well shed light on the subject.

But fully distributed processes are well beyond the scope of my books and my own ability to
comprehend them. For a long time I've thought that an understanding of the way ant colonies
are able to perform incredibly organized tasks might well be the key to an understanding of
human cognition. Yet the ants that invade my house continually baffle me.

14. Guy Steele, Oracle Labs: Don, you and I are both interested in program analysis: What
can one know about an algorithm without actually executing it? Type theory and Hoare logic are
two formalisms for that sort of reasoning, and you have made great contributions to using
mathematical tools to analyze the execution time of algorithms. What do you think are
interesting currently open problems in program analysis?

Don Knuth: Guy, I'm sure you aren't really against the idea of program execution. You and I
both like to know things about programs and to execute them. Often the execution contradicts
our supposed knowledge.

The quest for better ways to verify programs is one of the famous grand challenges of computer
science. And as I said to Udi, I'm particularly rooting for better techniques that will avoid
crashes.

Just now I'm writing the part of Volume 4B that discusses algorithms for satisfiability, a problem
of great industrial importance. Almost nothing is known about why the heuristics in modern
solvers work as well as they do, or why they fail when they do. Most of the techniques that have
turned out to be important were originally introduced for the wrong reasons!

If I had my druthers, I wish people like you would put a lot of effort into a problem of which I've
only recently become aware: The programmers of today's multithreaded machines need new
kinds of tools that will make linked data structures much more cache-friendly. One can in many
cases start up auxiliary parallel threads whose sole purpose is to anticipate the memory
accesses that the main computational threads will soon be needing, and to preload such data
into the cache. However, the task of setting this up is much too daunting, at present, for an
ordinary programmer like me.

15. Robert Tarjan, Princeton: What do you see as the most promising directions for future
work in algorithm design and analysis? What interesting and important open problems do you
see?

Don Knuth: My current draft about satisfiability already mentions 25 research problems, most
of which are not yet well known to the theory community. Hence many of them might well be
answered before Volume 4B is ready. Open problems pop up everywhere and often. But your
question is, of course, really intended to be much more general.

In general I'm looking for more focus on algorithms that work fast with respect to problems
whose size, n, is feasible. Most of today's literature is devoted to algorithms that are
asymptotically great, but they are helpful only when n exceeds the size of the universe.

In one sense such literature makes my life easier, because I don't have to discuss those
methods in TAOCP. I'm emphatically not against pure research, which significantly sharpens
our abilities to deal with practical problems and which is interesting in its own right. So I
sometimes play asymptotic games. But I sure wouldn't mind seeing a lot more algorithms that I
could also use.

For instance, I've been reading about algorithms that decide whether or not a given graph G
belongs to a certain class. Is G, say, chordal? You and others discovered some great
algorithms for the chordality and minimum fillin problems, early on, and an enormous number of
extremely ingenious procedures have subsequently been developed for characterizing the
graphs of other classes. But I've been surprised to discover that very few of these newer
algorithms have actually been implemented. They exist only on paper, and often with details
only sketched.

Two years ago I needed an algorithm to decide whether G is a so-called comparability graph,
and was disappointed by what had been published. I believe that all of the supposedly "most
efficient" algorithms for that problem are too complicated to be trustworthy, even if I had a year
to implement one of them.

Thus I think the present state of research in algorithm design misunderstands the true nature of
efficiency. The literature exhibits a dangerous trend in contemporary views of what deserves to
be published.

Another issue, when we come down to earth, is the efficiency of algorithms on real computers.
As part of the Stanford GraphBase project I implemented four algorithms to compute minimum
spanning trees of graphs, one of which was the very pretty method that you developed with
Cheriton and Karp. Although I was expecting your method to be the winner, because it
examines much of the data only half as often as the others, it actually came out two to three
times worse than Kruskal's venerable method. Part of the reason was poor cache interaction,
but the main cause was a large constant factor hidden by O notation.

16. Frank Ruskey, University of Victoria: Could you comment on the importance of working
on unimportant problems? My sense is that computer science research, funding, and academic
hiring is becoming more and more focused on short-term problems that have at their heart an
economic motivation. Do you agree with this assessment, is it a bad trend, and do you see a
way to mitigate it?

Similarly, could you comment on the demise of the individual researcher? So many papers that
I see published these days have multiple authors. Five-author papers are routine. But when I
dig into the details it seems that often only one or two have contributed the fresh ideas; the
others are there because they are supervisors, or financial contributors, or whatever. I'm pretty
sure that Euler didn't publish any papers with five co-authors. What is the reason for this trend,
how does it interfere with trying to establish a history of ideas, and what can be done to reverse
it?

Don Knuth: I was afraid somebody was going to ask a question related to economics. I've
never understood anything about that subject. I don't know why people spend money to buy
things. I'm willing to believe that some economists have enough wisdom to keep the world
running some of the time, but their reasons are beyond me.

I just write books. I try to tell stories that seem to be important, at least for geeks. I've never
bothered to think about marketing, or about what might sell, except when my publishers ask me
to answer questions as I'm doing now!

Three years ago I published Selected Papers on Fun and Games, a 750-page book that is
entirely devoted to unimportant problems. In many ways the fact that I was able to live during a
time in the history of the world when such a book could be written has given me even more
satisfaction than I get when seeing the currently healthy state of TAOCP.

I've reached an age where I can fairly be described as a "grumpy old man," and perhaps that is
why I strongly share your concern for the alarming trends that you bring up. I'm profoundly
upset when people rate the quality of my work by measuring the extent to which it affects Wall
Street.

Everybody seems to understand that astronomers do astronomy because astronomy is
interesting. Why don't they understand that I do computer science because computer science is
interesting? And that I'd do it regardless of whether or not it made money for anybody? The
reason is probably that not everybody is a geek.

Regarding joint authorship, you are surely right about Euler in the 18th century. In fact I can't
think of any two-author papers in mathematics, until Hardy and Littlewood began working
together at the beginning of the 20th century.

In my own case, two of my earliest papers were joint because the other authors did the theory
and I wrote computer programs to validate it. Two other papers were related to the ALGOL
language, and done together with ACM committees. In a number of others, written while I was
at Caltech, I did the theory and my student co-authors wrote computer programs to validate it.
There was one paper with Mike Garey, Ron Graham, and David Johnson, in which they did the
theory and my role was to explain what they did. You and I wrote a joint paper in 2004, related
to recursive coroutines, in which we shared equally.

The phenomenon of hyperauthorship still hasn't infected computer science as much as it has
hit physics and biology, where I've read that Thomson-Reuters indexed more than 200 papers
having 1,000 authors or more, in a single recent year! When I cite a paper in TAOCP, I like to
mention all of the authors, and to give their full names in the index. That policy will become
impossible if CS publication practices follow in the footsteps of those fields.

Collaborative work is exhilarating, and it's wonderful when new results are obtained that
wouldn't have been discovered by individuals working alone. But as you say, authors should be
authors, not hangers-on.

You mention the history of ideas. To me the method of discovery tends to be more important
than the identification of the discoverers. Still, credit should be given where credit is due;
conversely, credit shouldn't be given where credit isn't due.

I suppose the multiple-author anomalies are largely due to poor policies related to financial
rewards. Unenlightened administrators seem to base salaries and promotions on publication
counts.

What can we do? As I say, I'm incompetent to deal with economics. I've gone through life
refusing to go along with a crowd, and bucking trends with which I disagree. I've often declined
to have my name added to a paper. But I suppose I've had a sheltered existence; young people
may be forced to bow to peer pressure.

17. Andrew Binstock, Dr. Dobb's: At the ACM Turing Centennial in 2012, you stated that you
were becoming convinced that P = NP. Would you be kind enough to explain your current
thinking on this question, how you came to it, and whether this growing conviction came as a
surprise to you?

Don Knuth: As you say, I've come to believe that P = NP, namely that there does exist an
integer M and an algorithm that will solve every n-bit problem belonging to the class NP in
n^M elementary steps.

Some of my reasoning is admittedly naïve: It's hard to believe that P ≠ NP and that so many
brilliant people have failed to discover why. On the other hand if you imagine a number M that's
finite but incredibly large (like say the number 10↑↑↑↑3 discussed in my paper on "coping
with finiteness") then there's a humongous number of possible algorithms that do n^M bitwise or
addition or shift operations on n given bits, and it's really hard to believe that all of those
algorithms fail.

My main point, however, is that I don't believe that the equality P = NP will turn out to be helpful
even if it is proved, because such a proof will almost surely be nonconstructive. Although I think
M probably exists, I also think human beings will never know such a value. I even suspect that
nobody will even know an upper bound on M.

Mathematics is full of examples where something is proved to exist, yet the proof tells us
nothing about how to find it. Knowledge of the mere existence of an algorithm is completely
different from the knowledge of an actual algorithm.

For example, RSA cryptography relies on the fact that one party knows the factors of a number,
but the other party knows only that factors exist. Another example is that the game of N × N
Hex has a winning strategy for the first player, for all N. John Nash found a beautiful and
extremely simple proof of this theorem in 1952. But Wikipedia tells me that such a strategy is
still unknown when N = 9, despite many attempts. I can't believe anyone will ever know it when
N is 100.

More to the point, Robertson and Seymour have proved a famous theorem in graph theory: Any
class 𝒞 of graphs that is closed under taking minors has a finite number of minor-minimal
graphs. (A minor of a graph is any graph obtainable by deleting vertices, deleting edges, or
shrinking edges to a point. A minor-minimal graph H for 𝒞 is a graph whose smaller minors all
belong to 𝒞 although H itself doesn't.) Therefore there exists a polynomial-time algorithm to
decide whether or not a given graph belongs to 𝒞: The algorithm checks that G doesn't contain
any of 𝒞's minor-minimal graphs as a minor.

But we don't know what that algorithm is, except for a few special classes 𝒞, because the set
of minor-minimal graphs is often unknown. The algorithm exists, but it's not known to be
discoverable in finite time.

This consequence of Robertson and Seymour's theorem definitely surprised me, when I
learned about it while reading a paper by Lovász. And it tipped the balance, in my mind, toward
the hypothesis that P = NP.

The moral is that people should distinguish between known (or knowable) polynomial-time
algorithms and arbitrary polynomial-time algorithms. People might never be able to implement a
polynomial-time-worst-case algorithm for satisfiability, even though P happens to equal NP.

18. Jeffrey O. Shallit, University of Waterloo: Decision methods, automated
theorem-proving, and proof assistants have been successful in a number of different areas: the
Wilf-Zeilberger method for combinatorial identities and the Robbins conjecture, to name two.
What do you think theorem discovery and proof will look like in 100 years? Rather like today, or
much more automated?

Don Knuth: Besides economics, I was also afraid that somebody would ask me about the
future, because I'm a notoriously bad prophet. I'll take a shot at your question anyway.

Assuming 100 years of sustainable civilization, I'm fairly sure that a large percentage of
theorems (maybe even 38.1966%) will be discovered with computer aid, and that a nontrivial
percentage (maybe 0.7297%) will have computer-verified proofs that cannot be understood by
mortals.

In my Ph.D. thesis (1963), I looked at computer-generated examples of small finite projective
planes, and used that data to construct infinitely many planes of a kind never before known.
Ten years later, I discovered the so-called Knuth-Morris-Pratt algorithm by studying the way one
of Steve Cook's automata was able to recognize concatenated palindromes in linear time. Such
investigations are fun.

A few months ago, however, I tried unsuccessfully to do a similar thing. I had a 5,000-step
mechanically discovered proof that the edges of a smallish flower snark graph cannot be
3-colored, and I wanted to psych out how the machine had come up with it. Although I gave up
after a couple of days, I do think it would be possible to devise new tools for the study of
computer proofs in order to identify the "aha moments" therein.

In February of this year I noticed that the calculation of an Erdős-discrepancy constant (made
famous by Tim Gowers' Polymath project, in which many mathematicians collaborated via the
Internet) makes an instructive benchmark for satisfiability-testing algorithms. My first attempt to
compute it needed 49 hours of computer time. Two weeks later I'd cut that down to less than 2
hours, but there still were 20 million steps in the proof. I see no way at present for human
beings to understand more than the first few thousand of those steps.

19. Scott Aaronson, MIT: Would you recommend to other scientists to abandon the use of
email, as you have done?

Don Knuth: My own situation is unusual, because I do my best work when I'm not interrupted. I
eat, sleep, and write content, more-or-less as a recluse who spends considerable time reading
archives and other people's code. As I say on my home page, most people need to keep on top
of things, but my role is to get to the bottom of things.

So I don't recommend a no-email policy to people who thrive on communication. And I actually
take advantage of others in this respect (either shamelessly or shamefully, I'm not sure which),
by pestering them with random questions, even though I don't want anybody to pester
me—except about the one topic that I happen to be zooming in on at any particular time.

I do welcome email that reports bugs in TAOCP, because I always try to correct them as soon
as possible.

Other unsolicited messages go to the bit bucket in the sky, otherwise known as /dev/null.

20. J. H. Quick, blogger: Why is this multi-interview called "twenty questions," when only 19
questions were asked?

Don Knuth: I'm stumped. No, wait—Radia asked two.

Incidentally, the eVolumes of TAOCP contain some 4,500 questions, and almost as many
answers.

INFORMATION AND CONTROL 8, 607-639 (1965)

On the Translation of Languages from Left to Right


DONALD E. KNUTH
Mathematics Department, California Institute of Technology, Pasadena, California

There has been much recent interest in languages whose grammar
is sufficiently simple that an efficient left-to-right parsing algorithm
can be mechanically produced from the grammar. In this paper, we
define LR(k) grammars, which are perhaps the most general ones
of this type, and they provide the basis for understanding all of the
special tricks which have been used in the construction of parsing
algorithms for languages with simple structure, e.g. algebraic
languages. We give algorithms for deciding if a given grammar satisfies
the LR(k) condition, for given k, and also give methods for generating
recognizers for LR(k) grammars. It is shown that the problem of
whether or not a grammar is LR(k) for some k is undecidable, and the
paper concludes by establishing various connections between LR(k)
grammars and deterministic languages. In particular, the LR(k)
condition is a natural analogue, for grammars, of the deterministic
condition, for languages.

I. INTRODUCTION AND DEFINITIONS


The word "language" will be used here to denote a set of character
strings which has been variously called a context free language, a (simple)
phrase structure language, a constituent-structure language, a definable set,
a BNF language, a Chomsky type 2 (or type 4) language, a push-down
automaton language, etc. Such languages have aroused wide interest
because they serve as approximate models for natural languages and
computer programming languages, among others. In this paper we single
out an important class of languages which will be called translatable from
left to right; this means if we read the characters of a string from left to
right, and look a given finite number of characters ahead, we are able to
parse the given string without ever backing up to consider a previous
decision. Such languages are particularly important in the case of
computer programming, since this condition means a parsing algorithm can
be mechanically constructed which requires an execution time at worst
proportional to the length of the string being parsed. Special-purpose
methods for translating computer languages (for example, the well-known
precedence algorithm, see Floyd (1963)) are based on the fact
that the languages being considered have a simple left-to-right structure.
By considering all languages that are translatable from left to right, we
are able to study all of these special techniques in their most general
framework, and to find for a given language and grammar the "best
possible" way to translate it from left to right. The study of such
languages is also of possible interest to those who are investigating human
parsing behavior, perhaps helping to explain the fact that certain English
sentences are unintelligible to a listener.
Now we proceed to give precise definitions to the concepts discussed
informally above. The well-known properties of characters and strings
of characters will be assumed. We are given two disjoint sets of
characters, the "intermediates" $I$ and the "terminals" $T$; we will use upper
case letters $A, B, C, \ldots$ to stand for intermediates, and lower case
letters $a, b, c, \ldots$ to stand for terminals, and the letters $X, Y, Z$ will be
used to denote either intermediates or terminals. The letter $S$ denotes
the "principal intermediate character" which has special significance as
explained below. Strings of characters will be denoted by lower case
Greek letters $\alpha, \beta, \gamma, \ldots$, and the empty string will be represented by $\epsilon$.
The notation $\alpha^n$ denotes $n$-fold concatenation of string $\alpha$ with itself;
$\alpha^0 = \epsilon$, and $\alpha^n = \alpha\alpha^{n-1}$. A production is a relation $A \to \theta$ where $A$ is in
$I$ and $\theta$ is a string on $I \cup T$; a grammar $\mathcal{G}$ is a set of productions.
We write $\varphi \to \psi$ (with respect to $\mathcal{G}$, a grammar which is usually
understood) if there exist strings $\alpha, \theta, \omega, A$ such that $\varphi = \alpha A\omega$, $\psi = \alpha\theta\omega$,
and $A \to \theta$ is a production in $\mathcal{G}$. The transitive completion of this
relation is of principal importance: $\alpha \Rightarrow \beta$ means there exist strings
$\alpha_0, \alpha_1, \ldots, \alpha_n$ (with $n > 0$) for which $\alpha = \alpha_0 \to \alpha_1 \to \cdots \to \alpha_n = \beta$.
Note that by this definition it is not necessarily true that $\alpha \Rightarrow \alpha$; we will
write $\alpha \Rightarrow^{*} \beta$ to mean $\alpha = \beta$ or $\alpha \Rightarrow \beta$. A grammar is said to be circular
if $\alpha \Rightarrow \alpha$ for some $\alpha$. (Some of this notation is more complicated than
we would need for the purposes of the present paper, but it is introduced
in this way in order to be compatible with that used in related papers.)
The language defined by $\mathcal{G}$ is
    $\{\alpha \mid S \Rightarrow \alpha \text{ and } \alpha \text{ is a string over } T\}$,    (1)
namely, the set of all terminal strings derivable from $S$ by using the
productions of $\mathcal{G}$ as substitution rules. A sentential form is any string $\alpha$
for which $S \Rightarrow \alpha$.
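
The relations just defined can be tried out on a small grammar. The following is only a sketch under stated assumptions: the function names `step`, `sentential_forms`, and `language` are hypothetical, the empty string stands for $\epsilon$, the six productions are those of the paper's example grammar (2), and the search terminates only because that grammar has finitely many sentential forms.

```python
# Productions A -> theta as (A, theta) pairs; "" plays the role of epsilon.
# These are the productions of the paper's example grammar (2).
PRODUCTIONS = [("S", "AD"), ("A", "aC"), ("B", "bcd"),
               ("C", "BE"), ("D", ""), ("E", "e")]
INTERMEDIATES = {lhs for lhs, _ in PRODUCTIONS}

def step(phi):
    """All psi with phi -> psi: replace one occurrence of A by theta."""
    return {phi[:i] + theta + phi[i + 1:]
            for i, ch in enumerate(phi)
            for lhs, theta in PRODUCTIONS if lhs == ch}

def sentential_forms(start="S"):
    """All alpha with S => alpha (one or more steps).
    Assumes this set is finite; otherwise the loop never ends."""
    seen, frontier = set(), step(start)
    while frontier:
        seen |= frontier
        frontier = {psi for phi in frontier for psi in step(phi)} - seen
    return seen

def language(start="S"):
    """The sentential forms that consist of terminals only."""
    return {alpha for alpha in sentential_forms(start)
            if not any(ch in INTERMEDIATES for ch in alpha)}

print(sorted(language()))
```

Since $S \Rightarrow S$ does not hold here, `"S"` itself is absent from `sentential_forms()`, which is the sense in which this grammar is not circular.
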

For example, the grammar
    $S \to AD$,  $A \to aC$,  $B \to bcd$,  $C \to BE$,  $D \to \epsilon$,  $E \to e$    (2)
defines the language consisting of the single string "abcde". Any
sentential form in a grammar may be given at least one representation as
the leaves of a derivation tree or "parse diagram"; for example, there is
but one derivation tree for the string abcde in the grammar (2), namely,

              bcd     e
               |      |
               B      E
                \    /
           a      C         ε
            \    /          |
              A             D            (3)
                \         /
                     S

(The root of the derivation tree is $S$, and the branches correspond in
an obvious manner to applications of productions.) A grammar is said
to be unambiguous if every sentential form has a unique derivation tree.
The grammar (2) is clearly unambiguous, even though there are several
different sequences of derivations possible, e.g.
    $S \to AD \to aCD \to aBED \to abcdED \to abcdeD \to abcde$    (4)
    $S \to AD \to A \to aC \to aBE \to aBe \to abcde$    (5)
In order to avoid the unimportant difference between sequences of
derivations corresponding to the same tree, we can stipulate a particular
order, such as insisting that we always substitute for the leftmost
intermediate (as done in (4)) or the rightmost one (as in (5)).
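
The two orders of substitution can be reproduced mechanically. This is a minimal sketch, assuming a hypothetical helper `derive` and relying on the fact that every intermediate of grammar (2) has exactly one production, so no choice among productions ever arises.

```python
# Grammar (2); the empty string stands for epsilon.
PRODUCTIONS = {"S": "AD", "A": "aC", "B": "bcd", "C": "BE", "D": "", "E": "e"}

def derive(start="S", leftmost=True):
    """Return the sentential forms obtained by always replacing the
    leftmost (or the rightmost) intermediate character."""
    forms = [start]
    current = start
    while True:
        positions = [i for i, ch in enumerate(current) if ch in PRODUCTIONS]
        if not positions:          # only terminals remain
            return forms
        i = positions[0] if leftmost else positions[-1]
        current = current[:i] + PRODUCTIONS[current[i]] + current[i + 1:]
        forms.append(current)

print(" -> ".join(derive(leftmost=True)))    # sequence (4)
print(" -> ".join(derive(leftmost=False)))   # sequence (5)
```

Both calls end in the same terminal string and correspond to the same tree (3); only the order of the steps differs.
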
In practice, however, we must start with the terminal string $abcde$ and
try to reconstruct the derivation leading back to $S$, and that changes our
outlook somewhat. Let us define the handle of a tree to be the leftmost
set of adjacent leaves forming a complete branch; in (3) the handle is
$bcd$. In other words, if $X_1, X_2, \ldots, X_t$ are the leaves of the tree (where
each $X_i$ is an intermediate or terminal character or $\epsilon$), we look for the
smallest $k$ such that the tree has the form
    [Figure: the adjacent leaves $X_j \cdots X_k$ joined as a complete branch to a single node $Y$]
for some $j$ and $Y$. If we consider going from $abcde$ backwards to reach $S$,
we can imagine starting with tree (3), and "pruning off" its handle;
then prune off the handle ("e") of the resulting tree, and so on until
only the root $S$ is left. This process of pruning the handle at each step
corresponds exactly to derivation (5) in reverse. The reader may easily
verify, in fact, that "handle pruning" always produces, in reverse, the
derivation obtained by replacing the rightmost intermediate character
at each step, and this may be regarded as an alternative way to define
the concept of handle. During the pruning process, all leaves to the right
of the handle are terminals, if we begin with all terminal leaves.
We are interested in algorithms for parsing, and thus we want to be
able to recognize the handle when only the leaves of the tree are given.
Number the productions of the grammar $1, 2, \ldots, s$ in some arbitrary
fashion. Suppose $\alpha = X_1 \cdots X_n \cdots X_t$ is a sentential form, and suppose
there is a derivation tree in which the handle is $X_{r+1} \cdots X_n$, obtained
by application of the $p$th production. ($0 \le r \le n \le t$, $1 \le p \le s$.) We
will say $(n, p)$ is a handle of $\alpha$.
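
For grammar (2) the pairs $(n, p)$ can be listed by running the rightmost derivation forward and reading it backwards. The sketch below is illustrative only: the function name `handles` is hypothetical, and the numbering of the productions (1 for $S \to AD$ through 6 for $E \to e$) is one arbitrary choice among many, as the text allows.

```python
# Productions of grammar (2), numbered 1..6 in the order listed
# (an arbitrary numbering chosen for this sketch).
PRODUCTIONS = [("S", "AD"), ("A", "aC"), ("B", "bcd"),
               ("C", "BE"), ("D", ""), ("E", "e")]
RULE = {lhs: (p, rhs) for p, (lhs, rhs) in enumerate(PRODUCTIONS, start=1)}

def handles(start="S"):
    """Return [(sentential form, (n, p)), ...] in pruning order.
    Each rightmost-derivation step that rewrites position i by a
    right-hand side rhs creates a handle ending at n = i + len(rhs)."""
    steps = []
    current = start
    while any(ch in RULE for ch in current):
        i = max(k for k, ch in enumerate(current) if ch in RULE)
        p, rhs = RULE[current[i]]
        current = current[:i] + rhs + current[i + 1:]
        steps.append((current, (i + len(rhs), p)))
    return steps[::-1]             # reverse: from abcde back toward S

for form, (n, p) in handles():
    print(f"{form!r}: handle ends at n={n}, production p={p}")
```

The entry for the form `A` has $n = 1$ with the empty right-hand side of $D \to \epsilon$, which shows how an empty handle is still assigned a right boundary.
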
A grammar is said to be translatable from left to right with bound $k$
(briefly, an "LR(k) grammar") under the following circumstances.
Let $k \ge 0$, and let "$\dashv$" be a new character not in $I \cup T$. A $k$-sentential
form is a sentential form followed by $k$ "$\dashv$" characters. Let
$\alpha = X_1X_2 \cdots X_nX_{n+1} \cdots X_{n+k}Y_1 \cdots Y_u$ and
$\beta = X_1X_2 \cdots X_nX_{n+1} \cdots X_{n+k}Z_1 \cdots Z_v$
be $k$-sentential forms in which $u \ge 0$, $v \ge 0$ and in which
none of $X_{n+1}, \ldots, X_{n+k}, Y_1, \ldots, Y_u, Z_1, \ldots, Z_v$ is an intermediate
character. If $(n, p)$ is a handle of $\alpha$ and $(m, q)$ is a handle of $\beta$, we require
that $m = n$, $p = q$. In other words, a grammar is LR(k) if and only if
any handle is always uniquely determined by the string to its left and
the $k$ terminal characters to its right.
This definition is more readily understandable if we take a particular
value of $k$, say $k = 1$. Suppose we are constructing a derivation sequence
such as (5) in reverse, and the current string (followed by the delimiter
$\dashv$ for convenience) has the form $X_1 \cdots X_nX_{n+1}\alpha\dashv$, where the tail end
"$X_{n+1}\alpha\dashv$" represents part of the string we have not yet examined; but
all possible reductions have been made at the left of the string so that
the right boundary of the handle must be at position $X_r$ for $r \ge n$. We
want to know, by looking at the next character $X_{n+1}$, if there is in fact
a handle whose right boundary is at position $X_n$; if so, we want this
handle to correspond to a unique production, so we can reduce the
string and repeat the process; if not, we know we can move to the right
and read a new character of the string to be translated. This process
will work if and only if the following condition ("LR(1)") always holds
in the grammar: If $X_1X_2 \cdots X_nX_{n+1}\omega_1$ is a sentential form followed by
"$\dashv$" for which all characters of $X_{n+1}\omega_1$ are terminals or "$\dashv$", and if
this string has a handle $(n, p)$ ending at position $n$, then all 1-sentential
forms $X_1X_2 \cdots X_nX_{n+1}\omega$ with $X_{n+1}\omega$ as above must have the same
handle $(n, p)$. The definition has been phrased carefully to account for
the possibility that the handle is the empty string, which if inserted
between $X_n$ and $X_{n+1}$ is regarded as having right boundary $n$.
This definition of an LR(k) grammar coincides with the intuitive
notion of translation from left to right looking $k$ characters ahead.
Assume at some stage of translation we have made all possible reductions
to the left of $X_n$; by looking at the next $k$ characters $X_{n+1} \cdots X_{n+k}$,
we want to know if a reduction on $X_{r+1} \cdots X_n$ is to be made, regardless
of what follows $X_{n+k}$. In an LR(k) grammar we are able to decide
without hesitation whether or not such a reduction should be made. If a
reduction is called for, we perform it and repeat the process; if none
should be made, we move one character to the right.
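
The reduce-or-move-right loop described above can be sketched for grammar (2) with one character of lookahead. The decision rule `decide` below is written by hand for this one grammar and is purely an assumption of the sketch; it is not produced by the general construction, and the names `decide`, `parse`, and `END` are hypothetical.

```python
END = "⊣"   # end delimiter, written here as a single Unicode character

def decide(stack, lookahead):
    """Hand-written rule for grammar (2) only: return (lhs, rhs) if a
    handle ends at the right end of the stack, or None to move right."""
    if stack.endswith("bcd"):
        return ("B", "bcd")
    if stack.endswith("e"):
        return ("E", "e")
    if stack.endswith("BE"):
        return ("C", "BE")
    if stack.endswith("aC"):
        return ("A", "aC")
    if stack == "A" and lookahead == END:
        return ("D", "")           # empty handle, taken once no input remains
    if stack == "AD":
        return ("S", "AD")
    return None

def parse(text):
    """Reduce whenever decide() finds a handle, otherwise read one more
    character. Returns the reductions made, or raises ValueError."""
    rest = text + END
    stack, reductions = "", []
    while not (stack == "S" and rest == END):
        action = decide(stack, rest[0])
        if action is not None:
            lhs, rhs = action
            stack = stack[:len(stack) - len(rhs)] + lhs
            reductions.append(f"{lhs} -> {rhs or 'ε'}")
        elif rest[0] != END:
            stack, rest = stack + rest[0], rest[1:]   # move right
        else:
            raise ValueError(f"no handle found; stuck at {stack!r}")
    return reductions

print(parse("abcde"))
```

The reductions come out in the order of derivation (5) read backwards, and no decision is ever revisited, which is the behavior the LR(k) condition is meant to guarantee.
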
An LR(k) grammar is clearly unambiguous, since the definition
implies every derivation tree must have the same handle, and by
induction there is only one possible tree. It is interesting to point out
furthermore that nearly every grammar which is known to be unambiguous is
either an LR(k) grammar, or (dually) is a right-to-left translatable
grammar, or is some grammar which is translated using "both ends
toward the middle." Thus, the LR(k) condition may be regarded as the most
powerful general test for nonambiguity that is now available.
When/~ is given, we will show in Section II that it is possible to decide
if a grammar is LR(/c) or not. T h e essential reason behind this that the
possible configurations of a tree below its handle may be represented by a
regular (finite automaton) language.
Several related ideas have appeared in previous literature. Lynch
(1963) considered special cases of LR(1) grammars, which he showed are
unambiguous. Paul (1962) gave a general method to construct left-to-right
parsers for certain very simple LR(1) languages. Floyd (1964a)
and Irons (1964) independently developed the notion of bounded context
grammars, which have the property that one knows whether or not to
reduce any sentential form $\alpha\theta\omega$ using the production $A \to \theta$ by examining
only a finite number of characters immediately to the left and right of $\theta$.
Eickel (1964) later developed an algorithm which would construct a
certain form of push-down parsing program from a bounded context
grammar, and Earley (1964) independently developed a somewhat
similar method which was applicable to a rather large number of LR(1)
languages but had several important omissions. Floyd (1964a) also
introduced the more general notion of a bounded right context grammar;
in our terminology, this is an LR(k) grammar in which one knows
whether or not $X_{r+1} \cdots X_n$ is the handle by examining only a given
finite number of characters immediately to the left of $X_{r+1}$, as well as
knowing $X_{n+1} \cdots X_{n+k}$. At that time it seemed plausible that a bounded
right context grammar was the natural way to formalize the intuitive
notion of a grammar by which one could translate from left to right without
backing up or looking ahead by more than a given distance; but it
was possible to show that Earley's construction provided a parsing
method for some grammars which were not of bounded right context,
although intuitively they should have been, and this led to the above
definition of an LR(k) grammar (in which the entire string to the left of
$X_{n+1}$ is known).
It is natural to ask if we can in fact always parse the strings
corresponding to an LR(k) grammar by going from left to right. Since there
are an infinite number of strings $X_1 \cdots X_{n+k}$ which must be used to make
a parsing decision, we might need infinite wisdom to be able to make
this decision correctly; the definition of LR(k) merely says a correct
decision exists for each of these infinitely many strings. But it will be
shown in Section II that only a finite number of essential possibilities
really exist.
Now we will present a few examples to illustrate these notions.
Consider the following two grammars:

    $S \to aAc,\quad A \to bAb,\quad A \to b.$    (6)

    $S \to aAc,\quad A \to Abb,\quad A \to b.$    (7)

Both of these are unambiguous and they define the same language,
$\{ab^{2n+1}c\}$. Grammar (6) is not LR(k) for any k, since given the partial
string $ab^m$ there is no information by which we can replace any b by A;
parsing must wait until the "c" has been read. On the other hand grammar
(7) is LR(0), in fact it is a bounded context language; the sentential
forms are $\{aAb^{2n}c\}$ and $\{ab^{2n+1}c\}$, and to parse we must reduce a substring
ab to aA, a substring Abb to A, and a substring aAc to S. This example
shows that LR(k) is definitely a property of the grammar, not of the
language alone. The distinction between grammar and language is
extremely important when semantics is being considered as well as syntax.
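The claim that grammars (6) and (7) define the same language can be checked mechanically for short strings. The following is only a sketch under stated assumptions: the helper `language`, the convention that uppercase letters are intermediates, and the length bound 13 are choices made for this illustration and do not come from the paper.

```python
from collections import deque

def language(productions, start, max_len):
    """Return all terminal strings of length <= max_len derivable from `start`.

    Uppercase letters are intermediates; every other character is a terminal.
    Sentential forms longer than max_len are pruned, which is safe here
    because no production of (6) or (7) shortens a string.
    """
    seen, words = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost intermediate, if there is one.
        pos = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if pos is None:
            words.add(form)
            continue
        for lhs, rhs in productions:
            if lhs == form[pos]:
                new = form[:pos] + rhs + form[pos + 1:]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
    return words

G6 = [("S", "aAc"), ("A", "bAb"), ("A", "b")]   # grammar (6)
G7 = [("S", "aAc"), ("A", "Abb"), ("A", "b")]   # grammar (7)

L6 = language(G6, "S", 13)
L7 = language(G7, "S", 13)
print(sorted(L6, key=len))
print(L6 == L7)
```

Agreement up to a length bound is of course not a proof; it only illustrates that the two grammars differ in how they build the strings, not in which strings they build.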
The grammar

    $S \to aAd,\quad S \to bAB,\quad A \to cA,\quad A \to c,\quad B \to d$    (8)

has the sentential forms $\{ac^nAd\} \cup \{ac^{n+1}d\} \cup \{bc^nAB\} \cup \{bc^nAd\} \cup
\{bc^{n+1}B\} \cup \{bc^{n+1}d\}$. In the string $bc^{n+1}d$, d must be replaced by B, while
in the string $ac^{n+1}d$, this replacement must not be made; so the decision
depends on an unbounded number of characters to the left of d, and the
grammar is not of bounded context (nor is it translatable from right
to left). On the other hand this grammar is clearly LR(1) and in fact
it is of bounded right context since the handle is immediately known by
considering the character to its right and two characters to its left;
when the character d is considered the sentential form will have been
reduced to either aAd or bAd.
The grammar

    $S \to aA,\quad S \to bB,\quad A \to cA,\quad A \to d,\quad B \to cB,\quad B \to d$    (9)

is not of bounded right context, since the handle in both $ac^nd$ and $bc^nd$
is "d"; yet this grammar is certainly LR(0). A more interesting example is

    $S \to aAc,\quad S \to b,\quad A \to aSc,\quad A \to b.$    (10)

Here the terminal strings are $\{a^nbc^n\}$, and the b must be reduced to S
or A according as n is even or odd. This is another LR(0) grammar
which fails to be of bounded right context.
In Section III we will give further examples and will discuss the
relevance of these concepts to the grammar for ALGOL 60. Section IV
contains a proof that the existence of k, such that a given grammar is
LR(k), is recursively undecidable.
Ginsburg and Greibach (1965) have defined the notion of a deterministic
language; we show in Section V that these are precisely the
languages for which there exists an LR(k) grammar, and thereby we
obtain a number of interesting consequences.
II. ANALYSIS OF LR(k) GRAMMARS
Given a grammar $\mathcal{G}$ and an integer $k \ge 0$, we will now give two ways
to test whether $\mathcal{G}$ is LR(k) or not. We may assume as usual that $\mathcal{G}$ does
not contain useless productions, i.e., for any A in I there are terminal
strings $\alpha, \beta, \gamma$ such that $S \Rightarrow \alpha A\gamma \Rightarrow \alpha\beta\gamma$.
The first method of testing is to construct another grammar $\mathcal{F}$ which
reflects all possible configurations of a handle and k characters to its
right. The intermediate symbols of $\mathcal{F}$ will be $[A; \alpha]$, where $\alpha$ is a k-letter
string on $T \cup \{\dashv\}$; and also $[p]$, where p is the number of a production in
$\mathcal{G}$. The terminal symbols of $\mathcal{F}$ will be $I \cup T \cup \{\dashv\}$.
For convenience we define $H_k(\sigma)$ to be the set of all k-letter strings $\beta$
over $T \cup \{\dashv\}$ such that $\sigma \Rightarrow \beta\gamma$ with respect to $\mathcal{G}$ for some $\gamma$; this is
the set of all possible initial strings of length k derivable from $\sigma$.
Let the pth production of $\mathcal{G}$ be

    $A_p \to X_{p1} \cdots X_{pn_p}, \qquad 1 \le p \le s,\quad n_p \ge 0.$    (11)

We construct all productions of the following form:

    $[A_p; \alpha] \to X_{p1} \cdots X_{p(j-1)}[X_{pj}; \beta]$    (12)

where $1 \le j \le n_p$, $X_{pj}$ is intermediate, and $\alpha$, $\beta$ are k-letter strings over
$T \cup \{\dashv\}$ with $\beta$ in $H_k(X_{p(j+1)} \cdots X_{pn_p}\alpha)$. Add also the productions

    $[A_p; \alpha] \to X_{p1} \cdots X_{pn_p}\,\alpha\,[p]$    (13)

It is now easy to see that with respect to $\mathcal{F}$,

    $[S; \dashv^k] \Rightarrow X_1 \cdots X_n X_{n+1} \cdots X_{n+k}[p]$    (14)

if and only if there exists a k-sentential form $X_1 \cdots X_n X_{n+1} \cdots
X_{n+k} Y_1 \cdots Y_u$ with handle $(n, p)$ and with $X_{n+1} \cdots Y_u$ not intermediates.
Therefore by definition, $\mathcal{G}$ will be LR(k) if and only if $\mathcal{F}$ satisfies
the following property:

    $[S; \dashv^k] \Rightarrow \theta[p]$ and $[S; \dashv^k] \Rightarrow \theta\varphi[q]$ implies $\varphi = \epsilon$ and $p = q$.    (15)

But $\mathcal{F}$ is a regular grammar, and well-known methods exist for testing
Condition (15) in regular grammars. (Basically one first transforms $\mathcal{F}$
so that all of its productions have the form $Q_i \to aQ_j$, and then if $Q_0 =
[S; \dashv^k]$, one can systematically prepare a list of all pairs $(i, j)$ such that
there exists a string $\alpha$ for which $Q_0 \Rightarrow \alpha Q_i$ and $Q_0 \Rightarrow \alpha Q_j$.)
When k = 2, the grammar $\mathcal{F}$ corresponding to (2) is

    $[S; \dashv\dashv] \to [A; \dashv\dashv]$          $[C; \dashv\dashv] \to [B; e\dashv]$
    $[S; \dashv\dashv] \to A[D; \dashv\dashv]$         $[C; \dashv\dashv] \to B[E; \dashv\dashv]$
    $[S; \dashv\dashv] \to AD\dashv\dashv[1]$          $[C; \dashv\dashv] \to BE\dashv\dashv[4]$
    $[A; \dashv\dashv] \to a[C; \dashv\dashv]$         $[B; e\dashv] \to bcde\dashv[3]$            (16)
    $[A; \dashv\dashv] \to aC\dashv\dashv[2]$          $[E; \dashv\dashv] \to e\dashv\dashv[6]$
    $[D; \dashv\dashv] \to \dashv\dashv[5]$

It is, of course, unnecessary to list productions which cannot be reached
from $[S; \dashv\dashv]$. Condition (15) is immediate; one may see an intimate
connection between (16) and the tree (3).
Our second method for testing the LR(k) condition is related to the
first but it is perhaps more natural and at the same time it gives a method
for parsing the grammar $\mathcal{G}$ if it is indeed LR(k). The parsing method is
complicated by the appearance of $\epsilon$ in the grammar, when it becomes
necessary to be very careful deciding when to insert an intermediate
symbol A corresponding to the production $A \to \epsilon$. To treat this condition
properly we will define $H_k'(\sigma)$ to be the same as $H_k(\sigma)$ except omitting
all derivations that contain a step of the form
    $A\omega \to \omega$,
i.e., when an intermediate as the initial character is replaced by $\epsilon$. This
means we are avoiding derivation trees whose handle is an empty string
at the extreme left. For example, in the grammar
    $S \to BC\dashv\dashv\dashv,\quad B \to Ce,\quad B \to \epsilon,\quad C \to D,\quad C \to Dc,\quad D \to \epsilon,\quad D \to d$
we would have
    $H_3(S) = \{\dashv\dashv\dashv,\ c\dashv\dashv,\ ce\dashv,\ cec,\ ced,\ d\dashv\dashv,\ dc\dashv,\ dce,\ de\dashv,\ dec,\ ded,\ e\dashv\dashv,\ ec\dashv,\ ed\dashv,\ edc\}$
    $H_3'(S) = \{dce,\ de\dashv,\ dec,\ ded\}$.
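The sets $H_k$ and $H_k'$ can be computed as least fixed points of prefix equations. The sketch below is one possible way to do so and is not taken from the paper: the function names, the dictionary encoding of the grammar, and the use of "$" in place of the delimiter are all assumptions of this illustration. For the primed sets, the only change is that the leftmost symbol of a derivation is never allowed to use an empty production.

```python
def concat_k(left, right, k):
    """All concatenations x + y, truncated to their first k symbols."""
    return {(x + y)[:k] for x in left for y in right}

def sequence_prefixes(symbols, first_table, rest_table, grammar, k):
    """k-truncated prefixes of a symbol string.

    The first symbol is looked up in first_table, the others in rest_table;
    symbols that are not intermediates stand for themselves.
    """
    acc = {""}
    for i, sym in enumerate(symbols):
        table = first_table if i == 0 else rest_table
        acc = concat_k(acc, table[sym] if sym in grammar else {sym}, k)
    return acc

def fixed_point(grammar, k, rest_table, allow_empty_bodies):
    """Least solution of the prefix equations, one set per intermediate."""
    table = {a: set() for a in grammar}
    rest = table if rest_table is None else rest_table
    changed = True
    while changed:
        changed = False
        for a, bodies in grammar.items():
            for body in bodies:
                if not body and not allow_empty_bodies:
                    continue  # never erase an intermediate at the extreme left
                new = sequence_prefixes(body, table, rest, grammar, k)
                if not new <= table[a]:
                    table[a] |= new
                    changed = True
    return table

def H(grammar, k, sigma, primed=False):
    """H_k(sigma), or H_k'(sigma) when primed is True."""
    plain = fixed_point(grammar, k, None, True)
    first = fixed_point(grammar, k, plain, False) if primed else plain
    found = sequence_prefixes(sigma, first, plain, grammar, k)
    return {s for s in found if len(s) == k}

# The example grammar from the text, with "$" standing for the delimiter.
G_example = {"S": ["BC$$$"], "B": ["Ce", ""], "C": ["D", "Dc"], "D": ["", "d"]}
print(sorted(H(G_example, 3, "S")))
print(sorted(H(G_example, 3, "S", primed=True)))
```

Strings shorter than k are kept during the iteration because a later symbol may extend them; only strings of exactly k letters are reported at the end.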
As before we assume the productions of $\mathcal{G}$ are written in the form (11).
We will also change $\mathcal{G}$ by introducing a new intermediate $S_0$ and adding
a "zeroth" production
    $S_0 \to S\dashv^k$    (16)
and regarding $S_0$ as the principal intermediate. The sentential forms are
now identical to the k-sentential forms as defined above, and this is a
decided convenience.
Our construction is based on the notion of a "state," which will be
denoted by $[p, j; \alpha]$; here p is the number of a production, $0 \le j \le n_p$,
and $\alpha$ is a k-letter string of terminals. Intuitively, we will be in state
$[p, j; \alpha]$ if the partial parse so far has the form $\theta X_{p1} \cdots X_{pj}$, and if $\mathcal{G}$
contains a sentential form $\theta A_p\alpha \cdots$; that is, we have found j of the
characters needed to complete the pth production, and $\alpha$ is a string
which may legitimately follow the entire production if it is completed.
At any time during translation we will be in a set $\mathcal{S}$ of states. There
are of course only a finite number of possible sets of states, although it is
an enormous number. Hopefully there will not be many sets of states
which can actually arise during translation. For each of these possible
sets of states we will give a rule which explains what parsing step to
perform and what new set of states to enter.
During the translation process we maintain a stack, denoted by
    $\mathcal{S}_0 X_1 \mathcal{S}_1 X_2 \mathcal{S}_2 \cdots X_n \mathcal{S}_n \mid Y_1 \cdots Y_k\,\omega.$    (17)
The portion to the left of the vertical line consists alternately of state
sets and characters; this represents the portion of a string which has
already been translated (with the possible exception of the handle)
and the state sets $\mathcal{S}_i$ we were in just after considering $X_1 \cdots X_i$. To the
right of the vertical line appear the k terminal characters $Y_1 \cdots Y_k$
which may be used to govern the translation decision, followed by a
string $\omega$ which has not yet been examined.
Initially we are in the state set $\mathcal{S}_0$ consisting of the single state
$[0, 0; \dashv^k]$, the stack to the left of the vertical line in (17) contains only
$\mathcal{S}_0$, and the string to be parsed (followed by $\dashv^k$) appears at the right.
Inductively at a given stage of translation, assume the stack contents
are given by (17) and that we are in state set $\mathcal{S} = \mathcal{S}_n$.
Step 1. Compute the "closure" $\mathcal{S}'$ of $\mathcal{S}$, which is defined recursively as
the smallest set satisfying the following equation:

    $\mathcal{S}' = \mathcal{S} \cup \{[q, 0; \beta] \mid$ there exists $[p, j; \alpha]$ in $\mathcal{S}'$, $j < n_p$,
           $X_{p(j+1)} = A_q$, and $\beta$ in $H_k(X_{p(j+2)} \cdots X_{pn_p}\alpha)\}.$    (18)

(We thus have added to $\mathcal{S}$ all productions we might begin to work on,
in addition to those we are already working on.)
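The closure of Eq. (18) is a straightforward worklist computation. The following sketch is an illustration only: the tuple encoding `(p, j, alpha)` of a state, the list-of-pairs encoding of productions, the helper names, and the character "$" standing for the delimiter are assumptions made here. It is applied to grammar (26) of Section III, whose production 0 already carries the delimiter.

```python
def concat_k(left, right, k):
    """All concatenations x + y, truncated to their first k symbols."""
    return {(x + y)[:k] for x in left for y in right}

def first_k(prods, k):
    """Fixed-point table: intermediate -> k-truncated terminal prefixes."""
    nts = {lhs for lhs, _ in prods}
    table = {a: set() for a in nts}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in prods:
            acc = {""}
            for sym in rhs:
                acc = concat_k(acc, table[sym] if sym in nts else {sym}, k)
            if not acc <= table[lhs]:
                table[lhs] |= acc
                changed = True
    return table

def H(table, sigma, k):
    """H_k(sigma): the k-letter strings that can begin a string derived from sigma."""
    acc = {""}
    for sym in sigma:
        acc = concat_k(acc, table[sym] if sym in table else {sym}, k)
    return {s for s in acc if len(s) == k}

def closure(prods, states, k):
    """Smallest set containing `states` and closed under Eq. (18)."""
    table = first_k(prods, k)
    result, work = set(states), list(states)
    while work:
        p, j, alpha = work.pop()
        rhs = prods[p][1]
        if j < len(rhs) and rhs[j] in table:      # X_{p(j+1)} is an intermediate
            for beta in H(table, rhs[j + 1:] + alpha, k):
                for q, (lhs, _) in enumerate(prods):
                    if lhs == rhs[j] and (q, 0, beta) not in result:
                        result.add((q, 0, beta))
                        work.append((q, 0, beta))
    return result

# Grammar (26); production numbers are list indices, "$" is the delimiter.
G26 = [("S", "B$"), ("B", "a"), ("B", "LR"), ("L", "a"), ("L", "LNb"),
       ("R", "a"), ("R", "bNR"), ("N", "a"), ("N", "bNNb")]
print(sorted(closure(G26, {(0, 0, "$")}, 1)))
```

The resulting seven states correspond to the first line of Table I in Section III.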
Step 2. Compute the following sets of k-letter strings:

    $Z = \{\beta \mid$ there exists $[p, j; \alpha]$ in $\mathcal{S}'$, $j < n_p$,
           $\beta$ in $H_k'(X_{p(j+1)} \cdots X_{pn_p}\alpha)\}$    (19)
    $Z_p = \{\alpha \mid [p, n_p; \alpha]$ in $\mathcal{S}'\}, \qquad 0 \le p \le s.$    (20)

$Z$ represents all strings $Y_1 \cdots Y_k$ for which the handle does not appear
on the stack, and $Z_p$ represents all for which the pth production should
be used to reduce the stack. Therefore, $Z, Z_0, \ldots, Z_s$ must all be disjoint
sets, or the grammar is not LR(k). These formulas and remarks are
meaningful even when k = 0. Assuming the Z's are disjoint, $Y_1 \cdots Y_k$
must lie in one of them, or else an error has occurred. If $Y_1 \cdots Y_k$ lies
in $Z$, shift the entire stack left:
    $\mathcal{S}_0 X_1 \mathcal{S}_1 \cdots X_n \mathcal{S}_n Y_1 \mid Y_2 \cdots Y_k\,\omega$
and rename its contents by letting $X_{n+1} = Y_1$, $Y_1 = Y_2$, ...:

    $\mathcal{S}_0 X_1 \mathcal{S}_1 \cdots X_n \mathcal{S}_n X_{n+1} \mid Y_1 \cdots Y_k\,\omega'$

and go on to Step 3. If $Y_1 \cdots Y_k$ lies in $Z_p$, let $r = n - n_p$; the stack
now contains $X_{r+1} \cdots X_n$, equalling the righthand side of production p.
Replace the stack contents (17) by

    $\mathcal{S}_0 X_1 \mathcal{S}_1 \cdots X_r \mathcal{S}_r A_p \mid Y_1 \cdots Y_k\,\omega$    (21)

and let $n = r$, $X_{n+1} = A_p$. (Notice that obvious notational conventions
have been used here to deal with empty strings; we have $0 \le r \le n$.
If $n_p = 0$, i.e. if the righthand side of production p is empty, we have
just increased the stack size by going from (17) to (21), otherwise the
stack has gotten smaller.)
Step 3. The stack now has the form

    $\mathcal{S}_0 X_1 \mathcal{S}_1 \cdots X_n \mathcal{S}_n X_{n+1} \mid Y_1 \cdots Y_k\,\omega.$    (22)

Compute $\mathcal{S}_n'$ by Eq. (18) and then compute the new set $\mathcal{S}_{n+1}$ as follows:

    $\mathcal{S}_{n+1} = \{[p, j+1; \alpha] \mid [p, j; \alpha]$ in $\mathcal{S}_n'$ and $X_{n+1} = X_{p(j+1)}\}.$    (23)

This is the state set into which we now advance; we insert $\mathcal{S}_{n+1}$ into
the stack (22) just to the left of the vertical line and return to Step 1,
with $\mathcal{S} = \mathcal{S}_{n+1}$ and with n increased by one. However, if $\mathcal{S}$ now equals
$\{[0, 1; \dashv^k]\}$ and $Y_1 \cdots Y_k = \dashv^k$, the parsing is complete.
This completes the construction of a parsing method. In order to
properly take care of the most general case, this method is necessarily
complicated, for all of the relevant information must be saved. The
structure of this general method should shed some light on the important
special cases which arise when the LR(k) grammar is of a simpler
type.
We will not give a formal proof that this parsing method works, since
the reader may easily verify that each step preserves the assertions we
made about the state sets and the stack. The construction of all possible
state sets that can arise will terminate since there are finitely many of
these. The grammar will be LR(k) unless the Z sets of Eqs. (19)-(20)
are not disjoint for some possible state set. The parsing method just
described will terminate since any string in the language has a finite deri-
vation, and each execution of Step 2 either finds a step in the derivation
or reduces the length of string not yet examined.
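Steps 1 to 3 can be put together into a small interpreter. What follows is a sketch under several simplifying assumptions that are not in the paper: the grammar has no empty productions (so that $H_k'$ coincides with $H_k$), production 0 of the input grammar already ends with k copies of the delimiter (written "$" here), state sets are recomputed on the fly instead of being tabulated in advance, and the stopping test checks that the state $[0, 1; \dashv^k]$ is a member of the current set rather than the whole set. All function and variable names are hypothetical.

```python
def concat_k(left, right, k):
    return {(x + y)[:k] for x in left for y in right}

def first_k(prods, k):
    """Fixed-point table: intermediate -> k-truncated terminal prefixes."""
    table = {lhs: set() for lhs, _ in prods}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in prods:
            acc = {""}
            for sym in rhs:
                acc = concat_k(acc, table.get(sym, {sym}), k)
            if not acc <= table[lhs]:
                table[lhs] |= acc
                changed = True
    return table

def H(table, sigma, k):
    acc = {""}
    for sym in sigma:
        acc = concat_k(acc, table.get(sym, {sym}), k)
    return {s for s in acc if len(s) == k}

def closure(prods, table, states, k):
    """Step 1: the closure defined by Eq. (18)."""
    result, work = set(states), list(states)
    while work:
        p, j, alpha = work.pop()
        rhs = prods[p][1]
        if j < len(rhs) and rhs[j] in table:
            for beta in H(table, rhs[j + 1:] + alpha, k):
                for q, (lhs, _) in enumerate(prods):
                    if lhs == rhs[j] and (q, 0, beta) not in result:
                        result.add((q, 0, beta))
                        work.append((q, 0, beta))
    return result

def parse(prods, text, k=1, end="$"):
    """Return the production numbers used, in the order of reduction."""
    table = first_k(prods, k)
    stack = [{(0, 0, end * k)}]      # S_0, then X_1, S_1, X_2, S_2, ...
    rest = text + end * k
    used = []
    while True:
        look = rest[:k]
        if (0, 1, end * k) in stack[-1] and look == end * k:
            return used               # parsing is complete
        closed = closure(prods, table, stack[-1], k)
        # Step 2: is the lookahead in Z, or in some Z_p?
        shift = any(look in H(table, prods[p][1][j:] + a, k)
                    for p, j, a in closed if j < len(prods[p][1]))
        reduce_by = {p for p, j, a in closed
                     if j == len(prods[p][1]) and a == look}
        if shift + len(reduce_by) > 1:
            raise ValueError("Z sets overlap: grammar is not LR(k)")
        if shift:
            symbol, rest = rest[0], rest[1:]
        elif reduce_by:
            p = reduce_by.pop()
            used.append(p)
            del stack[len(stack) - 2 * len(prods[p][1]):]
            symbol = prods[p][0]
        else:
            raise ValueError("malformed input near " + repr(rest))
        # Step 3: advance over the new symbol X_{n+1}, Eq. (23).
        exposed = closure(prods, table, stack[-1], k)
        new = {(p, j + 1, a) for p, j, a in exposed
               if j < len(prods[p][1]) and prods[p][1][j] == symbol}
        if not new:
            raise ValueError("malformed input near " + repr(rest))
        stack += [symbol, new]

# Grammar (26) of Section III; production numbers are list indices.
G26 = [("S", "B$"), ("B", "a"), ("B", "LR"), ("L", "a"), ("L", "LNb"),
       ("R", "a"), ("R", "bNR"), ("N", "a"), ("N", "bNNb")]
print(parse(G26, "abaa"))
```

A practical parser would tabulate the reachable state sets once, as in Table I of Section III, instead of recomputing closures at every step.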
III. EXAMPLES
Now let us give three examples of applications to some nontrivial
languages. Consider first the grammar

    $S \to \epsilon,\quad S \to aAbS,\quad S \to bBaS,$
    $A \to \epsilon,\quad A \to aAbA,\quad B \to \epsilon,\quad B \to bBaB$    (24)

whose terminal strings are just the set of all strings on {a, b} having exactly
the same total number of a's and b's. There is reason to believe (24) is
the briefest possible unambiguous grammar for this language. We will
prove it is unambiguous by showing it is LR(1), using the first construction
in Section II. The grammar $\mathcal{F}$ will be
    $[S; \dashv] \to \dashv[1]$
    $[S; \dashv] \to a[A; b],\quad [S; \dashv] \to aAb[S; \dashv],\quad [S; \dashv] \to aAbS\dashv[2]$
    $[S; \dashv] \to b[B; a],\quad [S; \dashv] \to bBa[S; \dashv],\quad [S; \dashv] \to bBaS\dashv[3]$
    $[A; b] \to b[4]$
    $[A; b] \to a[A; b],\quad [A; b] \to aAb[A; b],\quad [A; b] \to aAbAb[5]$
    $[B; a] \to a[6]$
    $[B; a] \to b[B; a],\quad [B; a] \to bBa[B; a],\quad [B; a] \to bBaBa[7]$
The strings entering into condition (15) are therefore

    $(aAb, bBa)^*\dashv[1],\quad (aAb, bBa)^*aAbS\dashv[2],\quad (aAb, bBa)^*bBaS\dashv[3]$
    $(aAb, bBa)^*a(a, aAb)^*b[4],\quad (aAb, bBa)^*a(a, aAb)^*aAbAb[5]$
    $(aAb, bBa)^*b(b, bBa)^*a[6],\quad (aAb, bBa)^*b(b, bBa)^*bBaBa[7].$

Here $(\alpha, \beta)^*$ denotes the set of all strings which can be formed by
concatenation of $\alpha$ and $\beta$; clearly condition (15) is met.
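The two properties claimed for grammar (24), namely that it generates exactly the strings with equally many a's and b's and that it is unambiguous, can be spot-checked by counting derivation trees for all short strings. This is only an empirical sketch and not a substitute for the LR(1) argument above; the function names and the length bound are assumptions of this illustration.

```python
from functools import lru_cache
from itertools import product

# Grammar (24); the empty string "" stands for the empty righthand side.
G24 = {"S": ["", "aAbS", "bBaS"], "A": ["", "aAbA"], "B": ["", "bBaB"]}

@lru_cache(maxsize=None)
def ways(symbols, text):
    """Number of distinct derivation trees by which `symbols` yields `text`."""
    if not symbols:
        return 1 if not text else 0
    head, tail = symbols[0], symbols[1:]
    if head not in G24:               # terminal: must match the next character
        return ways(tail, text[1:]) if text[:1] == head else 0
    total = 0
    for cut in range(len(text) + 1):  # head derives text[:cut]
        left = sum(ways(body, text[:cut]) for body in G24[head])
        if left:
            total += left * ways(tail, text[cut:])
    return total

def check(max_len):
    """True if every string up to max_len has 1 tree when balanced, else 0."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            text = "".join(letters)
            balanced = text.count("a") == text.count("b")
            if ways("S", text) != (1 if balanced else 0):
                return False
    return True

print(check(10))
```

The recursion terminates because every nonempty righthand side in (24) begins with a terminal, so each expansion consumes a character of the text.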
Our second example is quite interesting. Consider first the set of all
strings obtainable by fully parenthesizing algebraic expressions involving
the letter a and the binary operation +:
    $S \to a,\quad S \to (S + S)$    (25)
where in this grammar "(", "+", and ")" denote terminals. Given any
such string we will perform the following acts of sabotage:
(i) All plus signs will be erased.
(ii) All parentheses appearing at the extreme left or extreme right
will be erased.
(iii) Both left and right parentheses will be replaced by the letter b.
Question: After all these changes, is it still possible to recreate the
original string? The answer is, surprisingly, yes; it is not hard to see
that this question is equivalent to being able to parse any terminal string
of the following grammar unambiguously:

    Production #   Production        Production #   Production
    0              S → B⊣
    1              B → a             2              B → LR
    3              L → a             4              L → LNb          (26)
    5              R → a             6              R → bNR
    7              N → a             8              N → bNNb

Here B, L, R, N denote the sets of strings formed from (25) with alterations
(i) and (iii) performed, and with parentheses removed from
both ends, the left end, the right end, or neither end, respectively.
It is not immediately obvious that grammar (26) is unambiguous,
nor is it immediately clear how one could design an efficient parsing
algorithm for it. The second construction of Section II shows however
that (26) is an LR(1) grammar, and it also gives us a parsing method.
Table I shows the details, using an abbreviated notation.
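The three acts of sabotage are easy to simulate, and for small expressions one can confirm by enumeration that no two originals collapse to the same damaged string. The sketch below is an illustration only; the function names and the bound of seven letters are assumptions chosen here.

```python
def expressions(size):
    """All fully parenthesized sums, per (25), with exactly `size` letters a."""
    if size == 1:
        return ["a"]
    return ["(" + left + "+" + right + ")"
            for split in range(1, size)
            for left in expressions(split)
            for right in expressions(size - split)]

def sabotage(text):
    text = text.replace("+", "")                      # (i) erase plus signs
    text = text.lstrip("(").rstrip(")")               # (ii) strip outer parentheses
    return text.replace("(", "b").replace(")", "b")   # (iii) both become b

originals = [e for size in range(1, 8) for e in expressions(size)]
damaged = {sabotage(e) for e in originals}

print(sabotage("(a+(a+a))"), sabotage("((a+a)+a)"))
print(len(originals), len(damaged))
```

If the two printed counts agree, the sabotage is reversible on every expression in the sample, which is what the unambiguity of grammar (26) predicts.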
In Table I, the symbol 21⊣ stands for the state $[2, 1; \dashv]$, and 41ab
stands for two states $[4, 1; a]$ and $[4, 1; b]$. "Shift" means "perform the
shift left operation" mentioned in Step 2; "reduce p" means "perform
the transformation (21) with production p." The first lines of Table I
                                   TABLE I
                       PARSING METHOD FOR GRAMMAR (26)

    State set S    Additional states in S'    If Y1 is   then        If X(n+1) is   then go to

    00⊣            10⊣ 20⊣ 30ab 40ab          a          shift       B              01⊣
                                                                     a              11⊣ 31ab
                                                                     L              21⊣ 41ab
    01⊣                                       ⊣          stop
    11⊣ 31ab                                  ⊣          reduce 1
                                              a, b       reduce 3
    21⊣ 41ab       50⊣ 60⊣ 70b 80b            a, b       shift       R              22⊣
                                                                     N              42ab
                                                                     a              51⊣ 71b
                                                                     b              61⊣ 81b
    22⊣                                       ⊣          reduce 2
    42ab                                      b          shift       b              43ab
    51⊣ 71ab                                  ⊣          reduce 5
                                              a, b       reduce 7
    61⊣ 81ab       70ab 80ab                  a, b       shift       N              62⊣ 82ab
                                                                     a              71ab
                                                                     b              81ab
    43ab                                      a, b       reduce 4
    62⊣ 82ab       50⊣ 60⊣ 70b 80b            a, b       shift       R              63⊣
                                                                     N              83ab
                                                                     a              51⊣ 71b
                                                                     b              61⊣ 81b
    63⊣                                       ⊣          reduce 6
    83ab                                      b          shift       b              84ab
    84ab                                      a, b       reduce 8
are formed as follows: Given the initial state $\mathcal{S}$ = {00⊣}, we must form
$\mathcal{S}'$ according to Eq. (18). Since $X_{01} = B$ and $X_{02} = \dashv$ we must include
10⊣ and 20⊣ in $\mathcal{S}'$. Since $X_{21} = L$ and $X_{22} = R$ we must include 30ab,
40ab in $\mathcal{S}'$ (a and b being the possible initial characters of $R\dashv$). Since
$X_{41} = L$ and $X_{42} = N$ we must, similarly, include 30ab and 40ab in $\mathcal{S}'$;
but these have already been included, and so $\mathcal{S}'$ is completely determined.
Now $Z = \{a\}$ in this case, so the only possibility in Step 2 is to
have $Y_1 = a$ and shift. Step 3 is more interesting; if we ever get to
Step 3 with $\mathcal{S}_n = \mathcal{S}$ (this includes later events when a reduction (21) has
been performed) there are three possibilities for $X_{n+1}$. These are
determined by the seven states in $\mathcal{S}'$, and the righthand column is merely
an application of Eq. (23).
An important shortcut has been taken in Table I. Although it is
possible to go into the state set "51⊣ 71b", we have no entry for that
set; this happens because 51⊣ 71b is contained in 51⊣ 71ab. A procedure
for a given state set must be valid for any of its subsets. (This implies less
error detection in Step 2, but we will soon justify that.) It is often
possible to take the union of several state sets for which the parsing
action does not conflict, thereby considerably shortening the parsing
algorithm generated by the construction of Section II.
When only one possibility occurs in Step 2 there is no need to test
the validity of $Y_1 \cdots Y_k$; for example in Table I line 1 there is no need
to make sure $Y_1 = a$. One need do no error detection until an attempt
to shift $Y_1 = \dashv$ left of the vertical line occurs. At this point the stack
will contain "$\mathcal{S}_0 S \mathcal{S}_1 \mid \dashv^k$" if and only if the input string was well-formed;
for we know a well-formed string will be parsed, and (by definition!)
a malformed string cannot possibly be reduced to "$S\dashv^k$" by
applying the productions in reverse. Thus, any or all error detection
may be saved until the end. (When k = 0, $\dashv$ must be appended at the
right in order to do this delayed error check.)
One could hardly write a paper about parsing without considering the
traditional example of arithmetic expressions. The following grammar is
typical:

    Production #   Production        Production #   Production
    0              S → E⊣            4              T → P
    1              E → −T            5              T → T∗P          (27)
    2              E → T             6              P → a
    3              E → E − T         7              P → (E)
This grammar has the terminal alphabet {a, −, ∗, (, ), ⊣}; for example,
the string "a − (−a∗a − a)⊣" belongs to the language. Table II shows
how our construction would produce a parsing method. In line 10, the
notation "4, 5, 6" appearing in the X column means rules 4, 5, and 6
apply to this state set also. Such "factoring" of rules is another way to
simplify the parsing routine produced by our construction, and the
reader will undoubtedly see other ways to simplify Table II.
By means of our construction it is possible to determine exactly what
information about the string being parsed is known at any given time.
Because of this detailed knowledge, it will be possible to study how much
of the information is not really essential (i.e., how much is redundant)
and thereby determine the "best possible" parsing method for a grammar,
in some sense. The two simplifications already mentioned (delayed
error checking, taking unions of compatible state sets) are simplifications
of this kind, and more study is needed to analyze this problem further.
In many cases it will not be necessary to store the state sets $\mathcal{S}_i$ in the
stack, since the states $\mathcal{S}_r$ which are used in the latter part of Step 2 can
often be determined by examining a few of the X's at the top of the
stack. Indeed, this will always be true if we have a bounded right context
grammar, as defined in Section I. Both grammars (26) and (27)
are of bounded context.
From Table I we can see how to recover the necessary state set
information without storing it in the stack. We need only consider those
state sets which have at least one intermediate character in the "$X_{n+1}$"
column, for otherwise the state set is never used by the parser. Then it is
immediately clear from Table I that {00⊣} is always at the bottom of
the stack, {21⊣, 41ab} is always to the right of L, {61⊣, 81ab} is always
to the right of b, and {62⊣, 82ab} is always to the right of N.
Grammar (27) is related to the definition of arithmetic expressions in
the ALGOL 60 language, and it is natural to ask whether ALGOL 60 is
an LR(k) language. The answer is a little difficult because the definition
of this language (see Naur (1963)) is not done completely in terms of
productions; there are "comment conventions" and occasional informal
explanations. The grammar cannot be LR(k) because it has a number
of syntactic ambiguities; for example, we have the production
    ⟨open string⟩ → ⟨open string⟩ ⟨open string⟩
which is always ambiguous. Another type of ambiguity arises in the
parsing of ⟨identifier⟩ as ⟨actual parameter⟩. There are eight ways to do
                                   TABLE II
                       PARSING METHOD FOR GRAMMAR (27)

    State set S           Additional states in S'              Y1       Step 2 action   Rule #    X(n+1)   Go to

    00⊣ 71⊣)−∗            10⊣)− 20⊣)− 30⊣)− 40⊣)−∗             − ( a    shift           1         E        01⊣ 72⊣)−∗ 31⊣)−
                          50⊣)−∗ 60⊣)−∗ 70⊣)−∗                                          2         −        11⊣)−
                                                                                        3         T        21⊣)− 51⊣)−∗
                                                                                        4         P        41⊣)−∗
                                                                                        5         a        61⊣)−∗
                                                                                        6         (        71⊣)−∗
    01⊣ 72⊣)−∗ 31⊣)−                                           ⊣        stop            7         )        73⊣)−∗
                                                               ) −      shift           8         −        32⊣)−
    11⊣)−                 40⊣)−∗ 50⊣)−∗ 60⊣)−∗ 70⊣)−∗                                   9         T        12⊣)− 51⊣)−∗
                                                                                        4, 5, 6
    21⊣)− 51⊣)−∗                                               ∗        shift           10        ∗        52⊣)−∗
                                                               ⊣ ) −    reduce 2
    32⊣)−                 40⊣)−∗ 50⊣)−∗ 60⊣)−∗ 70⊣)−∗                                   11        T        33⊣)− 51⊣)−∗
                                                                                        4, 5, 6
    12⊣)− 51⊣)−∗                                               ∗        shift           12        ∗        52⊣)−∗
                                                               ⊣ ) −    reduce 1
    52⊣)−∗                60⊣)−∗ 70⊣)−∗                        ( a      shift           13        P        53⊣)−∗
                                                                                        5, 6
    33⊣)− 51⊣)−∗                                               ∗        shift           14        ∗        52⊣)−∗
                                                               ⊣ ) −    reduce 3
    p n_p X                                                    X        reduce p
this:
    ⟨actual parameter⟩ → ⟨array identifier⟩ → ⟨identifier⟩
    ⟨actual parameter⟩ → ⟨switch identifier⟩ → ⟨identifier⟩
    ⟨actual parameter⟩ → ⟨procedure identifier⟩ → ⟨identifier⟩
    ⟨actual parameter⟩ → ⟨expression⟩ → ⟨designational expression⟩
        → ⟨identifier⟩
    ⟨actual parameter⟩ → ⟨expression⟩ → ⟨Boolean expression⟩
        → ⟨variable⟩ → ⟨identifier⟩
    ⟨actual parameter⟩ → ⟨expression⟩ → ⟨Boolean expression⟩
        → ⟨function designator⟩ → ⟨identifier⟩
    ⟨actual parameter⟩ → ⟨expression⟩ → ⟨arithmetic expression⟩
        → ⟨variable⟩ → ⟨identifier⟩
    ⟨actual parameter⟩ → ⟨expression⟩ → ⟨arithmetic expression⟩
        → ⟨function designator⟩ → ⟨identifier⟩
These syntactic ambiguities reflect bona fide semantic ambiguities,
if the identifier in question is a formal parameter to a procedure, for it is
then impossible to determine what sort of identifier will be the actual
argument in the absence of specifications. At the time the ALGOL 60
report was written, of course, the whole question of syntactic ambiguity
was just emerging, and the authors of that document naturally made
little attempt to avoid such ambiguities. In fact, the differentiation
between array identifiers, switch identifiers, etc. in this example was done
intentionally, to provide explanation along with the syntax (referring
to identifiers which have been declared in a certain way). In view of this,
a ninth alternative
    ⟨actual parameter⟩ → ⟨string⟩ → ⟨formal parameter⟩ → ⟨identifier⟩
might also have been included in the ALGOL 60 syntax (since section
4.7.5.1 specifically allows formal parameters whose actual parameter is a
string to be used as actual parameters, and this event is not reflected in
any of the eight possibilities above). The omission of this ninth alternative
is significant, since it indicates the philosophy of the ALGOL 60 report
towards formal parameters: they are to be conceptually replaced by
the actual parameters before rules of syntax are employed.
At any rate when parsing is considered it is desirable to have an
unambiguous syntax, and it seems clear that with little trouble one
could redefine the syntax of ALGOL 60 so that we would have an LR(1)
grammar for the same language.
By the "ALGOL 60 language" we mean the set of strings meeting
the syntax for ALGOL 60, not necessarily satisfying any semantical
restrictions. For example,
    begin array x[100000 : 0]; y := z/0 end
would be regarded as a string in the ALGOL 60 language.
It is interesting to observe that it might be impossible to define
ALGOL 60 using an RL(k) grammar (where by RL(k) we mean "trans-
latable from right to left," defined dually to LR(k)). Several features
of that language make it most suited to a left-to-right reading; for ex-
ample, going from right to left, note that the basic symbol comment
radically affects the parsing of the characters to its right. A similar
language, for which some LR(k) grammars but no RL(k) grammars
exist, is considered in Section V of this paper; but we also will give an
example there which makes it appear possible that ALGOL 60 could be
RL(k).

IV. AN UNSOLVABLE PROBLEM


Post (1947) introduced his famous correspondence problem which has
been used to prove quite a number of linguistic questions undecidable.
We will define here a similar unsolvable problem, and apply it to the
study of LR(k) grammars.
THE PARTIAL CORRESPONDENCE PROBLEM. Let $(\alpha_1, \beta_1), (\alpha_2, \beta_2), \ldots,
(\alpha_n, \beta_n)$ be ordered pairs of nonempty strings. Do there exist, for all p > 0,
ordered p-tuples of integers $(i_1, i_2, \ldots, i_p)$ such that the first p characters
of the string $\alpha_{i_1}\alpha_{i_2} \cdots \alpha_{i_p}$ are respectively equal to the first p characters
of $\beta_{i_1}\beta_{i_2} \cdots \beta_{i_p}$?
The ordinary correspondence problem asks for the existence of a
p > 0 for which the entire strings $\alpha_{i_1} \cdots \alpha_{i_p}$ and $\beta_{i_1} \cdots \beta_{i_p}$ are equal.
A solution to the ordinary correspondence problem implies an affirmative
answer to the partial correspondence problem, but the general solvability
of either problem is not directly related to the solvability of the other.
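For any single fixed p the question in the partial correspondence problem is decidable by exhaustive search; the difficulty lies in the quantifier "for all p". The sketch below illustrates the bounded search only. The function name and the sample pairs are assumptions made for this illustration.

```python
def agree_on_first(pairs, p):
    """True if some p-tuple of indices makes the first p characters agree.

    Because every string is nonempty, p chosen pairs always supply at least
    p characters on each side, so the comparison at depth p is well defined.
    """
    def extend(a, b, depth):
        m = min(len(a), len(b), p)
        if a[:m] != b[:m]:
            return False              # a mismatch can never be repaired later
        if depth == p:
            return True
        return any(extend(a + x, b + y, depth + 1) for x, y in pairs)
    return extend("", "", 0)

# A single pair that agrees for every p (the alpha side is all a's).
print(agree_on_first([("a", "aa")], 5))
# A pair that agrees on one character but never on two.
print(agree_on_first([("ab", "ac")], 1), agree_on_first([("ab", "ac")], 2))
```

The search prunes a branch as soon as the two concatenations disagree on a position both already cover, but its worst-case cost still grows exponentially in p.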
There are relations between the partial correspondence problem and
the Tag problem (see Cocke and Minsky (1964)) but no apparent simple
connection. We can, however, prove that the partial correspondence
problem is recursively unsolvable, using methods analogous to those
devised by Floyd (1964b) for dealing with the ordinary correspondence
problem and using the determinacy of Turing machines.
For this purpose, let us use the definition and notation for Turing
machines as given in Post (1947); we will construct a partial correspondence
problem for any Turing machine and any initial configuration. The
characters used in our partial correspondence problem will be
    $q_i,\ \bar q_i,\ S_j,\ \bar S_j,\ h,\ \bar h, \qquad 1 \le i \le R,\quad 0 \le j \le m.$
If the initial configuration is
    $S_{j_1} S_{j_2} \cdots S_{j_{k-1}}\, q_{i_1}\, S_{j_k} \cdots S_{j_n}$
the pair of strings
    $(\bar h,\ \bar h\, h S_{j_1} \cdots S_{j_{k-1}}\, q_{i_1}\, S_{j_k} \cdots S_{j_n} h)$    (28)
will enter into our partial correspondence problem. We also add the
pairs
    $(\bar h, h),\ (h, \bar h),\ (S_j, \bar S_j),\ (\bar S_j, S_j),\ (\bar q_i, q_i), \qquad 1 \le i \le R,\quad 0 \le j \le m.$    (29)
Finally, we give pairs determined by the quadruples of the Turing machine:

    Form of quadruple     Corresponding pairs, $0 \le t \le m$:
    $q_i S_j L q_l$       $(h q_i S_j,\ \bar h \bar q_l \bar S_0 \bar S_j)$,  $(S_t q_i S_j,\ \bar q_l \bar S_t \bar S_j)$
    $q_i S_j R q_l$       $(q_i S_j h,\ \bar S_j \bar q_l \bar S_0 \bar h)$,  $(q_i S_j S_t,\ \bar S_j \bar q_l \bar S_t)$          (30)
    $q_i S_j S_k q_l$     $(q_i S_j,\ \bar q_l \bar S_k)$

Now it is easy to see that these corresponding pairs will simulate the
behavior of the Turing machine. Since the pair (28) is the only pair
having the same initial character, and since the pairs in (30) are the
only ones involving any $q_i$ in the lefthand string, the only possible
strings which can be initial substrings of both $\alpha_{i_1}\alpha_{i_2} \cdots$ and
$\beta_{i_1}\beta_{i_2} \cdots$ are initial substrings of
    $\bar h\, \alpha_0 \bar\alpha_1 \alpha_1 \bar\alpha_2 \alpha_2 \cdots,$    (31)
where $\alpha_0, \alpha_1, \alpha_2$, etc. represent the successive stages of the Turing
machine's tape (with h's placed at either end, and where $\bar\alpha$ is an obvious
notation signifying the "barring" of each letter of $\alpha$). For these pairs,
therefore, the partial correspondence problem has an affirmative answer if
and only if the Turing machine never halts. And the problem of telling if a
Turing machine will ever halt is, of course, well known to be recursively
unsolvable.
We will apply this result to LR(k) grammars as follows:
THEOREM. The problem of deciding, for a given grammar $\mathcal{G}$, whether or
not there exists a $k \ge 0$ such that $\mathcal{G}$ is LR(k), is recursively unsolvable.
This theorem is in contrast to the results of Section II, where we
showed the problem to be solvable when k is also given. To prove this
theorem we will reduce the partial correspondence problem to the LR(k)
problem for a particular class of grammars.
Let $(\alpha_1, \beta_1), \ldots, (\alpha_n, \beta_n)$ be pairs of strings entering into the partial
correspondence problem, and let
    $X_1, X_2, \ldots, X_n, +$
be n + 1 characters distinct from those appearing among the $\alpha$'s and
$\beta$'s. Let $\mathcal{G}$ be the following grammar:

    $S \to A,\quad S \to B,\quad A \to X_i + \alpha_i,\quad B \to X_i + \beta_i,$
    $A \to X_i A \alpha_i,\quad B \to X_i B \beta_i, \qquad 1 \le i \le n.$    (32)

The sentential forms are

    $\{X_{i_m} \cdots X_{i_1} A\, \alpha_{i_1} \cdots \alpha_{i_m}\} \cup \{X_{i_m} \cdots X_{i_1} B\, \beta_{i_1} \cdots \beta_{i_m}\}$
    $\cup\ \{X_{i_m} \cdots X_{i_1} + \alpha_{i_1} \cdots \alpha_{i_m}\} \cup \{X_{i_m} \cdots X_{i_1} + \beta_{i_1} \cdots \beta_{i_m}\}.$
We will show $\mathcal{G}$ is LR(k) for some k if and only if the partial
correspondence problem has a negative answer. If the answer is affirmative,
for every p we have sentential forms $X_{i_m} \cdots X_{i_1} + \alpha_{i_1} \cdots \alpha_{i_m}$,
$X_{i_m} \cdots X_{i_1} + \beta_{i_1} \cdots \beta_{i_m}$ in which the first p characters following "+" agree.
The handle must include the "+" sign, but the p - q characters following
the handle do not tell us whether the production $A \to X_{i_1} + \alpha_{i_1}$ or
$B \to X_{i_1} + \beta_{i_1}$ is to be applied, if q is the maximum length of the
strings $\alpha_i$, $\beta_i$. Hence the grammar is not LR(p - q). On the other hand, if
the answer to the partial correspondence problem is negative, there is
a p for which, knowing $(i_1, \ldots, i_{\min(m,p)})$ and the first p characters
of $\alpha_{i_1} \alpha_{i_2} \cdots \alpha_{i_m} \dashv^p$ or $\beta_{i_1} \beta_{i_2} \cdots \beta_{i_m} \dashv^p$, we can distinguish whether it
is a string of $\alpha$'s or a string of $\beta$'s, and therefore $\mathcal{G}$ is in fact a bounded
context grammar.
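
The shape of grammar (32) can be made concrete with a short sketch. The Python below builds the productions from a list of pairs and spells out the terminal strings they derive. The function names, the token encoding of the marker characters, and the sample pairs are illustrative assumptions, not part of the original construction.

```python
def correspondence_grammar(pairs):
    """Productions of grammar (32) for pairs [(alpha_1, beta_1), ...].

    Each production is (left side, list of right-side symbols); the
    marker characters X_i are encoded as the tokens "X1", "X2", ...
    """
    productions = [("S", ["A"]), ("S", ["B"])]
    for i, (alpha, beta) in enumerate(pairs, start=1):
        x = f"X{i}"
        productions.append(("A", [x, "+", *alpha]))
        productions.append(("B", [x, "+", *beta]))
        productions.append(("A", [x, "A", *alpha]))
        productions.append(("B", [x, "B", *beta]))
    return productions


def sentence(pairs, indices, side):
    """Terminal string X_{i_m} ... X_{i_1} + g_{i_1} ... g_{i_m},
    where g is alpha for side "A" and beta for side "B"."""
    which = 0 if side == "A" else 1
    markers = [f"X{i}" for i in reversed(indices)]
    tail = "".join(pairs[i - 1][which] for i in indices)
    return markers + ["+"] + list(tail)


def agreement(pairs, indices):
    """Number of leading characters after "+" on which both sides agree."""
    a = sentence(pairs, indices, "A")
    b = sentence(pairs, indices, "B")
    start = len(indices) + 1          # skip the markers and the "+"
    count = 0
    for x, y in zip(a[start:], b[start:]):
        if x != y:
            break
        count += 1
    return count


# Hypothetical sample pairs, chosen only for illustration.
PAIRS = [("a", "aa"), ("ab", "b")]
```

With these sample pairs the index sequence [1, 2] gives the same string "aab" on both sides, so the two sentences coincide after the "+" sign and no amount of lookahead to the right of the handle can tell the A-production from the B-production.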
We have proved slightly more, answering a question posed by Floyd
(1964a, p. 66):
THEOREM. The problems of deciding whether a given grammar (i) has
bounded context, or (ii) has bounded right context, are recursively un-
solvable.
These theorems could be sharpened in the usual ways to show that we
can assume the grammar $\mathcal{G}$ is unambiguous, linear, has at most two
terminals, and has either a bounded number of productions or a bounded
length of string in a production, and can still prove the problem to be
unsolvable.
V. CONNECTIONS WITH DETERMINISTIC LANGUAGES
Ginsburg and Greibach (1965) define a deterministic language as one
which is accepted by a so-called deterministic push-down automaton
(DPDA). The latter is a device which has a finite number of states
$q_0, q_1, q_2, \ldots, q_r$ and which manipulates strings of characters in two
alphabets T and I, according to the production rules of the following
two types:
$$A q_i \to \theta q_j \tag{33}$$
$$A q_i a \to \theta q_j \tag{34}$$
Here A and a are single characters in I and T, respectively, and $\theta$ is
any string over I. When A is the special character $\vdash$ we require $\theta$ to be a
nonempty string whose initial character is $\vdash$. For each pair $A q_i$, where
A is in I and $0 \le i \le r$, we stipulate there is either a unique rule of
type (33) and none of type (34), or there are no rules of type (33) and
at most one of type (34) for each a in T. Some of the states are
designated as "final states", and the terminal string $\alpha$ is accepted by the
DPDA if and only if $\vdash q_0 \alpha \Rightarrow \vdash \omega q_f$ for some final state $q_f$ and some
string $\omega$. Here the relation "$\Rightarrow$" is generated from "$\to$" as in Section I.
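
The definition above can be exercised with a small simulator. The following Python is a minimal sketch under stated assumptions: the dictionary encoding of rules (33) and (34), the function name `run_dpda`, the step limit, and the sample automaton for the strings a^n b^n (n >= 1) are all hypothetical and do not come from the paper.

```python
def run_dpda(eps_rules, read_rules, finals, text,
             start="q0", bottom="|-", max_steps=10_000):
    """Return True if the DPDA accepts `text`.

    eps_rules  -- {(A, q): (theta, q2)}     rules of type (33): A q   -> theta q2
    read_rules -- {(A, q, a): (theta, q2)}  rules of type (34): A q a -> theta q2
    theta is a tuple of stack symbols that replaces A (last item = new top).
    `bottom` stands for the left end marker of the stack.
    """
    stack, state, pos = [bottom], start, 0
    for _ in range(max_steps):          # guard against endless type-(33) loops
        if pos == len(text) and state in finals:
            return True                 # all input read and a final state reached
        if not stack:
            return False
        top = stack[-1]
        if (top, state) in eps_rules:
            theta, state = eps_rules[(top, state)]
        elif pos < len(text) and (top, state, text[pos]) in read_rules:
            theta, state = read_rules[(top, state, text[pos])]
            pos += 1
        else:
            return False                # no rule applies
        stack[-1:] = theta              # replace the top symbol A by theta
    return False


# Hypothetical DPDA for { a^n b^n : n >= 1 }.
READ = {
    ("|-", "q0", "a"): (("|-", "A"), "q1"),   # first a: push A
    ("A", "q1", "a"): (("A", "A"), "q1"),     # further a's: push A
    ("A", "q1", "b"): ((), "q2"),             # first b: pop A
    ("A", "q2", "b"): ((), "q2"),             # further b's: pop A
}
EPS = {("|-", "q2"): (("|-",), "q3")}          # stack emptied: go to final state
FINALS = {"q3"}
```

Each pair (A, q) has either one rule of type (33) or rules of type (34) only, which is the determinism condition stated above; `run_dpda(EPS, READ, FINALS, "aabb")` follows the unique sequence of moves and accepts.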
THEOREM. If $\mathcal{G}$ is an LR(k) grammar, and if $\mathcal{G}$ defines the language L,
there is a DPDA which accepts the language $L \dashv^k$.
The second construction of Section II is in fact closely related to a
DPDA. The grammar $\mathcal{G}$ augmented by production (16) defines the
language $L \dashv^k$. To construct such a DPDA we will take as our states, $q_i$,
terminal k-letter strings $[Y_1 \cdots Y_k]$, and there will also be various
auxiliary states. The terminal alphabet for the DPDA will be $T \cup \{\dashv\}$
and the intermediate alphabet will be $\{\mathcal{S}\} \cup I \cup T \cup \{\vdash\}$. We want our
DPDA to arrive at the configuration
$$\vdash \mathcal{S}_0 X_1 \mathcal{S}_1 \cdots X_n \mathcal{S}_n [Y_1 \cdots Y_k]\, \omega \tag{35}$$
if and only if the stack in the parsing algorithm of Section II is
"$\mathcal{S}_0 X_1 \mathcal{S}_1 \cdots X_n \mathcal{S}_n \mid Y_1 \cdots Y_k \omega$" at a corresponding stage of the
calculation.
Clearly we can construct productions of form (34) which read the
first k characters of our input string $Y_1 \cdots Y_k \omega$ and get us to the initial
configuration $\vdash \{[0, 0; \dashv^k]\} [Y_1 \cdots Y_k]\, \omega$. Now assume the DPDA has
arrived at the configuration (35); as in steps 1 and 2 of the parsing
algorithm we can compute the sets $Z$ and $Z_p$. If $Y_1 \cdots Y_k$ is in $Z$, we
create instructions of the form (34)
$$\mathcal{S}_n [Y_1 \cdots Y_k]\, a \to \mathcal{S}_n Y_1 \mathcal{S}_{n+1} [Y_2 \cdots Y_k a] \tag{36}$$
where $\mathcal{S}_{n+1}$ is determined by $X_{n+1} = Y_1$ (or a if k = 0) in (23). If
$Y_1 \cdots Y_k$ is in $Z_p$, we let $q^{(0)}, q^{(1)}, \ldots, q^{(2n_p)}$ be new auxiliary states
and write
$$\mathcal{S}_n [Y_1 \cdots Y_k] \to \mathcal{S}_n q^{(0)}$$
$$\mathcal{S} q^{(2t)} \to q^{(2t+1)}, \quad X_{p(n_p - t)}\, q^{(2t+1)} \to q^{(2t+2)}, \quad 0 \le t < n_p, \text{ all } \mathcal{S}. \tag{37}$$
$$\mathcal{S} q^{(2n_p)} \to \mathcal{S} A_p \mathcal{S}_{n+1} [Y_1 \cdots Y_k], \quad \text{all } \mathcal{S}.$$
where $\mathcal{S}_{n+1}$ is determined from $\mathcal{S}$ by using (23) with $\mathcal{S}_n = \mathcal{S}$, $X_{n+1} = A_p$.
We make one exception to this rule, namely, if $Y_1 \cdots Y_k = \dashv^k$ and
$\mathcal{S} = \{[0, 0; \dashv^k]\}$, we change the last instruction to
$$\mathcal{S} q^{(2n_p)} \to q_f$$
where $q_f$ is the unique final state of our DPDA.


The rules (36) and (37) for all possible combinations of $\mathcal{S}_n$ and
$[Y_1 \cdots Y_k]$, plus the few initial and final ones, give us a DPDA which
exactly follows the procedure of the parsing algorithm in Section II.
COROLLARY. If $\mathcal{G}$ is an LR(k) grammar and if $\mathcal{G}$ defines the language L,
there is a DPDA which accepts the language L.
For Ginsburg and Greibach (1965) have proved, among several other
interesting theorems, that if $L_0$ is deterministic and R is regular, then
$\{\alpha \mid \alpha\beta \text{ in } L_0 \text{ for some } \beta \text{ in } R\}$ is deterministic. We take $L_0 = L \dashv^k$ and
$R = \{\dashv^k\}$.

We now prove a converse result.
THEOREM. If L is deterministic, there is an LR(1) grammar $\mathcal{G}$ which
defines L.
To prove this theorem, we want to take an arbitrary DPDA with its
instructions of the forms (33) and (34), and construct a corresponding
grammar. First it will be necessary to simplify the problem a little, and
so we will require that all of the instructions of our DPDA are of three
types:
type (i):   $A q_i a \to A q_j$
type (ii):  $A q_i \to q_j$                                        (38)
type (iii): $A q_i \to A B q_j$
where A, B are intermediates, a is terminal. This involves no loss of
generality, since a rule (34) can be replaced by $A q_i a \to A q$, $A q \to \theta q_j$
for some new state q, and we are left with type (i) and rules of the
form (33). The rule $A q_i \to \theta q_j$ is of type (ii) if $\theta$ is empty, otherwise
assume $\theta = A_1 A_2 \cdots A_t$ with $t \ge 1$. If $A_1 \ne A$ we have $A \ne \vdash$ so we
can replace (33) by
$$A q_i \to q, \quad B q \to B A_1 q' \text{ for all intermediates } B, \quad A_1 q' \to \theta q_j$$
where q, q' are new states. Thus we may assume $A = A_1$, and hence
the rule (33) may be replaced by a sequence of t - 1 rules of type (iii),
introducing t - 2 new states, provided t > 1. Finally if t = 1, the rule
$A q_i \to A q_j$ may be replaced by
$$A q_i \to A A q, \quad A q \to q_j$$
where q is a new state, thereby reducing all rules to the forms (38).
For any pair $A q_i$ we still have the deterministic property that if more
than one rule appears with $A q_i$ on the left, all such rules are of type (i),
and there is at most one such rule for any particular terminal character a.
A further assumption is needed about final states. If $q_f$, $q_{f'}$ are final
states (possibly identical), we want to avoid the situation
$$\omega q_f \Rightarrow \omega' q_{f'} \tag{39}$$
since this would imply an input string would be "accepted twice" by
the DPDA. To exclude this possibility, we double the number of states
in the DPDA, using two states $q_i$, $\bar{q}_i$ for each original state $q_i$. The
instructions (38) are then replaced by
type (i)   $A q_i a \to A q_j$, $A \bar{q}_i a \to A q_j$.
type (ii)  $A q_i \to q_j$ if $q_i$ is not final, $A q_i \to \bar{q}_j$ if $q_i$ is final, $A \bar{q}_i \to \bar{q}_j$.
type (iii) $A q_i \to A B q_j$ if $q_i$ is not final, $A q_i \to A B \bar{q}_j$ if $q_i$ is final,
           $A \bar{q}_i \to A B \bar{q}_j$.
One easily verifies that (39) cannot occur, and the same set of strings
is accepted; basically we get into a state $\bar{q}_i$ if the current string has been
accepted, and then we do not accept the string again, but return to an
unbarred state when the next rule of type (i) is used.
Once the DPDA has been modified to meet these assumptions, let it
have the states $q_0, \ldots, q_r$; we are ready to construct a grammar for
the language it accepts. We begin by defining the languages $L_{iAt}$ for
$0 \le i, t \le r$ and for all intermediates A of the DPDA:
$$L_{iAt} = \{\alpha \mid A q_i \alpha \Rightarrow A q \to q_t \text{ for some } q\} \tag{40}$$
where no step in the derivation represented by "$\Rightarrow$" affects the A
appearing at the left.
Construct the following productions for all rules (38) of the DPDA:
Rule                                 Productions for $\mathcal{G}$
type (i)    $A q_i a \to A q_j$      $L_{iAt} \to a\, L_{jAt}$,  $0 \le t \le r$.
type (ii)   $A q_i \to q_j$          $L_{iAj} \to \epsilon$                          (41)
type (iii)  $A q_i \to A B q_j$      $L_{iAt} \to L_{jBs} L_{sAt}$,  $0 \le s, t \le r$.
An easy induction based on the length of the derivation "$\Rightarrow$" or the
derivation in $\mathcal{G}$ establishes the equality of the sets of strings defined in
(40) and the sets of strings derivable from $L_{iAt}$ using the productions
(41).
Another set of languages is also important:
$$L_{iA} = \{\alpha \mid A q_i \alpha \Rightarrow A \omega q_f, \text{ some string } \omega, \text{ some final state } q_f\}. \tag{42}$$
We construct the following further productions:
Rule                                 Productions for $\mathcal{G}$
type (i)    $A q_i a \to A q_j$      $L_{iA} \to a\, L_{jA}$
type (ii)   $A q_i \to q_j$          (none)                                          (43)
type (iii)  $A q_i \to A B q_j$      $L_{iA} \to L_{jB}$,  $L_{iA} \to L_{jBs} L_{sA}$,  $0 \le s \le r$.
$q_i$ is final                       $L_{iA} \to \epsilon$, all A.
Again, induction establishes the equivalence of (42) and (43). The
language derivable from $L_{0\vdash}$ using $\mathcal{G}$ is precisely the language L of the
theorem, by the definition of a DPDA.
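
The two production tables can be generated mechanically. The sketch below is one possible encoding, not the paper's own: it reads L_{iAt} as the tuple ("L", i, A, t) and L_{iA} as ("L", i, A), writes the left end marker as "|-", and takes the rule list of the example DPDA (44) that appears later in this section as sample input. It emits every production called for by (41) and (43) and makes no attempt to delete useless ones.

```python
def grammar_from_dpda(rules, r, finals):
    """Productions (41) and (43) for a DPDA whose rules have the forms (38).

    rules  -- tuples ("i", A, i, a, j), ("ii", A, i, j), ("iii", A, i, B, j)
              standing for  A q_i a -> A q_j,  A q_i -> q_j,  A q_i -> A B q_j
    r      -- highest state number (the states are q_0 .. q_r)
    finals -- set of final state numbers
    A right side is a tuple of symbols; () stands for the empty string.
    """
    states = range(r + 1)
    stack_symbols = {rule[1] for rule in rules} | {
        rule[3] for rule in rules if rule[0] == "iii"
    }
    prods = set()
    for rule in rules:
        kind, A, i = rule[0], rule[1], rule[2]
        if kind == "i":
            a, j = rule[3], rule[4]
            for t in states:                                   # table (41)
                prods.add((("L", i, A, t), (a, ("L", j, A, t))))
            prods.add((("L", i, A), (a, ("L", j, A))))          # table (43)
        elif kind == "ii":
            j = rule[3]
            prods.add((("L", i, A, j), ()))                     # table (41)
        elif kind == "iii":
            B, j = rule[3], rule[4]
            for s in states:
                for t in states:                                # table (41)
                    prods.add((("L", i, A, t),
                               (("L", j, B, s), ("L", s, A, t))))
                prods.add((("L", i, A),                         # table (43)
                           (("L", j, B, s), ("L", s, A))))
            prods.add((("L", i, A), (("L", j, B),)))            # table (43)
    for i in finals:                                            # table (43)
        for A in stack_symbols:
            prods.add((("L", i, A), ()))
    return prods


# Rules of the example DPDA (44); "|-" is an assumed spelling of the end marker.
DPDA_44 = [
    ("i", "|-", 0, "a", 1), ("iii", "|-", 1, "A", 2),
    ("i", "A", 2, "a", 1), ("iii", "A", 1, "A", 2),
    ("i", "A", 2, "b", 3), ("i", "A", 2, "c", 4),
    ("ii", "A", 3, 5), ("ii", "A", 4, 6), ("ii", "A", 6, 2),
    ("i", "|-", 5, "c", 1),
]
PRODUCTIONS_44 = grammar_from_dpda(DPDA_44, 6, {1, 3})
```

The resulting set contains, among many productions that would later be discarded as useless, the entries listed for (44) such as L_{0|-} -> a L_{1|-} and L_{3A5} -> (empty).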
Now remove all useless productions from $\mathcal{G}$, i.e., those which can never
appear in a derivation of a terminal string starting from $L_{0\vdash}$. We claim
the resulting grammar $\mathcal{G}$ is LR(1). This result could be proved using
either of the constructions in Section II, where the state sets have a
rather simple form, but for purposes of exposition we will give here a
more intuitive explanation which shows the connection between the
operation of the DPDA and the parsing process.
Consider any string $\alpha \dashv$ where $\alpha$ is accepted by the DPDA, and
consider the step-by-step behavior of the DPDA as it processes $\alpha$. At
the same time we will be building a partial derivation tree which reflects
all of the information known at a given stage of the parse. The nodes of
this partial tree will contain symbols [i, A, *] which means that in the
only possible parsing of the string the intermediate $L_{iAt}$, for some t =
0, 1, ..., r or t "blank", must fill that position. We will be "at" some
node [i, A, *] of the tree, meaning this particular node below the handle
is of interest, and at the same time the DPDA will contain the
configuration $\cdots A q_i \cdots$.
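All of this can be clarified by considering an example, so we will
consider the following "random" DPDA:
Rules of DPDA                     Productions of $\mathcal{G}$ (useless ones deleted)

$\vdash q_0 a \to \vdash q_1$     $L_{0\vdash} \to a L_{1\vdash}$
$\vdash q_1 \to \vdash A q_2$
$A q_2 a \to A q_1$               $L_{2At} \to a L_{1At}$ (t = 2, 5, 6), $L_{2A} \to a L_{1A}$
$A q_1 \to A A q_2$               $L_{1A2} \to L_{2A6} L_{6A2}$, $L_{1At} \to L_{2A2} L_{2At}$
$A q_2 b \to A q_3$               $L_{2A5} \to b L_{3A5}$, $L_{2A} \to b L_{3A}$
$A q_2 c \to A q_4$               $L_{2A6} \to c L_{4A6}$                          (44)
$A q_3 \to q_5$                   $L_{3A5} \to \epsilon$
$A q_4 \to q_6$                   $L_{4A6} \to \epsilon$
$A q_6 \to q_2$                   $L_{6A2} \to \epsilon$
$\vdash q_5 c \to \vdash q_1$     $L_{5\vdash} \to c L_{1\vdash}$
$q_1$ final
$q_3$ final                       $L_{3A} \to \epsilon$
Consider the action of the DPDA when given the string $aaacb \dashv$.
We have
$$\vdash q_0\, aaacb \dashv \to \vdash q_1\, aacb \dashv \to \vdash A q_2\, aacb \dashv \to \vdash A q_1\, acb \dashv \to \vdash AA q_2\, acb \dashv$$
$$\to \vdash AA q_1\, cb \dashv \to \vdash AAA q_2\, cb \dashv \to \vdash AAA q_4\, b \dashv$$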
Corresponding to these seven transitions we will build the following
partial tree, one node at a time:

           ...
 c   [4, A, *]
  \   /
 [2, A, *]
      \
   a   [1, A, *]
    \   /
   [2, A, *]                                  (45)
        \
     a   [1, A, *]
      \   /
     [2, A, *]
          \
       a   [1, ⊢, *]
        \   /
       [0, ⊢, *]
We are now "at" node [4, A, *], signified by the three dots above it. At
this point the DPDA uses the rule $A q_4 \to q_6$ and we transform the top
of tree (45) to

 c   L_{4A6}
  \   /
 L_{2A6}   [6, A, *]
      \     /
   a   [1, A, *]
    \   /
   [2, A, *]                                  (46)

(Thus, two handles are recognized and then removed from the tree.)
Then the DPDA uses the rule $A q_6 \to q_2$ and (46) becomes

[Tree (47): diagram not recoverable from the source.]

by reducing three more handles. When the rule $A q_2 b \to A q_3$ is next
applied, the tree becomes

        b   [3, A, *]
         \   /
 L_{2A2}   [2, A, *]
      \     /
   a   [1, A, *]
    \   /
   [2, A, *]                                  (48)
        \
     a   [1, ⊢, *]
      \   /
     [0, ⊢, *]
Now $q_3$ is a final state and the next character is "$\dashv$", so we complete
the parsing; (48) becomes

        b   L_{3A}
         \   /
 L_{2A2}   L_{2A}
      \     /
   a   L_{1A}
    \   /
   L_{2A}                                     (49)
       \
    a   L_{1⊢}
     \   /
    L_{0⊢}
Having worked the example, we can consider the general case. Suppose
the DPDA is in the configuration $\cdots C A q_i \alpha \cdots$, and suppose we are
at node [i, A, *] of the tree. If $q_i$ is a final state and $\alpha$ = "$\dashv$", by
condition (39) we must now complete the parsing, so we proceed to replace
each [i, A, *] in the tree by $L_{iA}$ until the root is reached (as in going from
(48) to (49)). If $q_i$ is not final or $\alpha \ne$ "$\dashv$", there are three cases
depending on the pair $A q_i$:
Case (i). The DPDA contains a rule of the form $A q_i a \to A q_j$. Then
the only possible parse must occur by changing

from  [i, A, *]   to   a   [j, A, *]
                        \   /
                       [i, A, *]

as we did in changing (47) to (48).


Case (ii). The DPDA contains a rule of the form $A q_i \to q_j$. Then
our tree must be changed from

[Diagram for case (ii) not recoverable from the source.]

as we did in changing from (45) to (46) and (46) to (47). Here $n \ge 0$.


Case (iii). The DPDA contains a rule of the form $A q_i \to A B q_j$. Then
the only possible parse must occur by changing

from  [i, A, *]   to   [j, B, *]
                            \
                         [i, A, *]

as we did while building tree (45).
Cases (i), (ii), (iii) are mutually exclusive by the definition of DPDA,
and the arguments are justified by the fact that our tree represents all
possible productions of the grammar that could conceivably work.
Notice that in the parsing we actually have almost an LR(0) grammar
since it was necessary to look at the character following the handle only
when $q_i$ was a final state, to see if the next character is "$\dashv$" or not.
As a consequence of our two theorems, we find a language can be
generated by an LR(k) grammar if and only if it is deterministic, if and
only if it can be generated by an LR(1) grammar.
The theorem cannot be improved to "LR(0) grammar", since obviously
even the simple language $\{\epsilon, a\}$ cannot be given an LR(0) grammar.
However, it is possible to show that the language $L \dashv$ can always
be given an LR(0) grammar; simply take the LR(1) grammar of the
second theorem, and reapply the first theorem to get another DPDA
for $L \dashv$. This DPDA has only one final state $q_f$, which leads to no
further states, so the construction of the second theorem applied to this
new grammar will be LR(0). A deterministic language in which no
accepted string is a proper initial substring of any other will likewise
have an LR(0) grammar.
Our last theorem shows that "deterministic" is essentially an asym-
metric property, for there are languages which are translatable from
right to left but which are not deterministic.
THEOREM. The following productions constitute an RL(0) grammar for
which the corresponding language is not deterministic:
$$S \to Ac, \quad S \to B, \quad A \to aAbb, \quad A \to abb, \quad B \to aBb, \quad B \to ab. \tag{50}$$
Proof: The terminal strings of this language are either $a^n b^{2n} c$ or $a^n b^n$,
where n > 0. The grammar is clearly RL(0). On the other hand, suppose
we could find an LR(k) grammar for the same language. (The problem
is, of course, the appearance of "c" at the extreme right.) If we consider
the derivations of the infinitely many strings $a^n b^n$ we must find one in
which a recursive intermediate appears; thus, there will be an
intermediate C and strings $\alpha, \beta, \gamma, \delta, \omega$ such that
$S \Rightarrow \alpha C \omega \Rightarrow \alpha\beta C \delta\omega \Rightarrow \alpha\beta\gamma\delta\omega = a^n b^n$
for some n. Now $\alpha\beta^t\gamma\delta^t\omega$ must be in the language for all
$t \ge 0$, and $\beta$ is not empty since the grammar is unambiguous. We see
therefore that $\beta = a^p$, $\delta = b^p$ for some p > 0. This implies that C cannot
appear in the derivation of any of the strings $a^n b^{2n} c$. For arbitrarily large
t, the language contains strings $\alpha\beta^{t+1}\gamma\delta^{t+1}\omega = a^{n+pt} b^{n+pt}$ in which, by
nonambiguity, the handle must be at least p(t + 1) characters from the
right and must lead to a sentential form $\alpha\beta^{t+1} C \delta^{t+1}\omega$ with p(t + 1)
characters to the right of the handle; yet the language also contains the
strings $a^{n+pt} b^{2(n+pt)} c$ which must not have the same handle, so the
grammar cannot be LR(k). By the preceding theorem the language is not
deterministic in the left-to-right sense.
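
The first claim of the proof, that grammar (50) generates exactly the strings a^n b^{2n} c and a^n b^n, can be checked by brute force for short strings. The following Python is a sketch under stated assumptions: the dictionary encoding of the grammar and the function name `language` are illustrative, and the length pruning is valid only because no production of (50) shortens a sentential form.

```python
from collections import deque

# Grammar (50): upper-case letters are intermediates, lower-case are terminals.
GRAMMAR_50 = {
    "S": ["Ac", "B"],
    "A": ["aAbb", "abb"],
    "B": ["aBb", "ab"],
}


def language(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`,
    found by breadth-first search over leftmost derivations."""
    seen, found = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost intermediate, if any.
        idx = next((k for k, ch in enumerate(form) if ch in grammar), None)
        if idx is None:
            found.add(form)
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


SHORT_STRINGS = language(GRAMMAR_50, "S", 9)
```

Up to length 9 the search finds the four strings a^n b^n with n = 1..4 and the two strings a^n b^{2n} c with n = 1, 2, and nothing else. A left-to-right parser cannot know which family it is in until it sees whether "c" appears at the far right, which is the difficulty the proof exploits.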
When this paper was being prepared, an attempt was made to show
that the language $\{a^n b^n\}d \cup \{a, b\}^{*}c$ cannot be given an LR(k) grammar.
Although this seemed plausible at first, the following grammar actually
does work:
$$S \to A, \quad S \to bC, \quad S \to Bd, \quad S \to BcC, \quad S \to c$$
$$A \to Bc, \quad A \to BaC, \quad A \to aA, \tag{51}$$
$$B \to ab, \quad B \to aBb,$$
$$C \to c, \quad C \to aC, \quad C \to bC.$$
This is an LR(0) grammar.


Indeed, we can note that a DPDA is able to recognize the complement
of the strings it accepts, so that if L is a deterministic language not
involving the character "c," the language $L \cup \{\alpha c \mid \alpha$ a string on the
terminal symbols of $L\}$ would actually be deterministic, contrary to
expectations. This weakens the argument that "comment" in Algol 60
might make it a non-RL language.

VI. REMARKS AND OPEN QUESTIONS


The concept of LR(k) grammars sheds much light on the translation
problem for phrase structure languages, and it suggests several inter-
esting areas for further investigation.
Of principal interest would be the study of grammatical transformations
which preserve the LR(k) condition. Many such transformations
are well known (for example, the removal of "empty" from a grammar;
elimination of left-recursion; reducing to a "normal form" in which all
productions are of type $A \to BC$ or $A \to a$; the operation of transduction
which converts a grammar to another grammar for its translation; and
many special cases of the latter). Which of these grammatical
modifications take LR(k) grammars into LR(k) grammars? Similar questions
apply to bounded context and bounded right context grammars.
Another important area of research is to develop algorithms that
accept LR(k) grammars, or special classes of them, and to mechanically
produce efficient parsing programs. In Section III we indicated three
ways to simplify the general parsing schemes produced by our construc-
tion and many more techniques certainly exist. A table such as Table II
shows essentially all of the information available during the parsing, and
much of it can be recognized as repetitive or redundant.
There are also implications for automata theory. We have shown that
a deterministic push-down automaton accepts precisely those languages
that can be given an LR(k) grammar. This result can be strengthened
to show that in fact such languages can always be given a bounded right
context grammar: We simply modify the construction (41), (43) by
changing
$L_{iAt} \to \alpha$   to   $L_{iAt} \to M_{iA}\, \alpha$
$L_{iA} \to \alpha$    to   $L_{iA} \to M_{iA}\, \alpha$
and adding the productions $M_{iA} \to \epsilon$ for all i, A. This has the effect of
keeping the necessary information in the sentential form that has been
parsed.
The question is, however, what type of automaton is capable of accept-
ing precisely those languages for which a bounded context grammar can
be given. The bounded context condition is symmetric with respect to
left and right, and we have shown that the deterministic property is
not; for example, the mirror reflection of language (50) is a deterministic
language which cannot be defined by a bounded context grammar.
The speed of parsing is another area of interest. Although LR(k)
grammars can be efficiently parsed with an execution time essentially
proportional to the length of string, there are more general grammars
which can be parsed at a linear rate of speed. This may involve, for
example, backing up a bounded number of times, or scanning back and
forth from left to right and right to left in combination, etc. For every
general parsing method known, there are grammars which cause it to
take an exponential amount of time; yet it has never been proved that
the parsing problem is necessarily inefficient in general. Are there par-
ticular grammars for which no conceivable parsing method will be able
to find one parse of each string in the language with running time at
worst linearly proportional to the length of string? Are there general
parsing methods for which a linear parsing time can be guaranteed for
all grammars? (In these questions, a parsing method means a process of
constructing a derivation sequence from a terminal string by scanning a
bounded number of characters at a time.)
Finally, we might mention another generalization of LR(k) to be
explored. The "second handle" of a tree may be regarded as the left-most
complete branch of terminals lying to right of the handle, and similarly
we can consider the r-th handle. A parsing process which always reduces
one of the first t handles leads to what might be called an LR(k, t)
grammar. (In our case, t = 1.) The grammar
$$S \to ACc, \quad S \to BCd, \quad A \to a, \quad B \to a, \quad C \to Cb, \quad C \to b \tag{52}$$
is not LR(k, 1) for any k, since "a" is the handle in both $ab^nc$ and
$ab^nd$; but it is LR(0, 2). The following reduction rules serve to
parse (52):
$$ab \to aC, \quad Cb \to C, \quad aCc \to ACc, \quad aCd \to BCd, \quad ACc \to S, \quad BCd \to S.$$
One might choose to call this left-to-right translation, although we had
to back up a finite amount.
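
The reduction rules for (52) can be tried out as plain string rewriting. The driver below is only a sketch: the function name, the choice to apply the first applicable rule at its leftmost match, and the reading of the final rule as BCd -> S (the source prints "BCc", but (52) contains S -> BCd) are all assumptions of this example.

```python
# Reduction rules for grammar (52), in the order given in the text.
# Assumption: the last rule is read as BCd -> S, matching production S -> BCd.
RULES = [
    ("ab", "aC"), ("Cb", "C"),
    ("aCc", "ACc"), ("aCd", "BCd"),
    ("ACc", "S"), ("BCd", "S"),
]


def reduce_to_start(text):
    """Apply the first applicable rule at its leftmost match until only "S"
    remains. Returns the list of intermediate strings, or None if stuck."""
    steps = [text]
    while text != "S":
        for lhs, rhs in RULES:
            pos = text.find(lhs)
            if pos >= 0:
                text = text[:pos] + rhs + text[pos + len(lhs):]
                steps.append(text)
                break
        else:
            return None          # no rule applies: not a sentence of (52)
    return steps
```

For the input "abbc" the "a" is left alone until the trailing "c" has been seen, and only then rewritten to A: this is the bounded back-up mentioned in the text.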

RECEIVED: June 23, 1965

REFERENCES
COCKE, J., AND MINSKY, M. (1964), Universality of Tag systems with P = 2.
J. Assoc. Comput. Mach. 11, 15-20.
EARLEY, J. (1964), "Generating Productions from BNF" (preliminary report).
Carnegie Institute of Technology.
EICKEL, J. (1964), Generation of parsing algorithms for Chomsky type 2 languages.
Tech. Hoch. München, Ber. #6401.
FLOYD, R. W. (1963), Syntactic analysis and operator precedence. J. Assoc.
Comput. Mach. 10, 316-333.
FLOYD, R. W. (1964a), Bounded context syntactic analysis. Commun. Assoc.
Comput. Mach. 7, 62-66.
FLOYD, R. W. (1964b), "New Proofs of Old Theorems in Logic and Formal
Linguistics." Computer Associates, Inc., Wakefield, Massachusetts.
GINSBURG, S., AND GREIBACH, S. (1965), "Deterministic Context-Free Languages"
(preliminary report). Am. Math. Soc. Not. 12, 246, 367.
IRONS, E. T. (1964), "Structural connections" in formal languages. Commun.
Assoc. Comput. Mach. 7, 67-71.
LYNCH, W. C. (1963), "Ambiguities in BNF Languages." Thesis, Univ. of
Wisconsin.
NAUR, P., ed. (1963), Revised Algol 60 report. Commun. Assoc. Comput. Mach. 6,
1-17.
PAUL, M. (1962), A general processor for certain formal languages. Proc. Symp.
Symbolic Languages in Data Processing, Rome, 1962. Gordon and Breach,
New York.
POST, E. L. (1947), Recursive unsolvability of a problem of Thue. J. Symbolic
Logic 12, 1-11.

SOFTWARE-PRACTICE AND EXPERIENCE, VOL. 19(7), 607-685 (JULY 1989)

The Errors of TEX*

DONALD E. KNUTH
Computer Science Department, Stanford University, Stanford, California 94305, USA.

SUMMARY
This paper is a case study of program evolution. The author kept track of all changes made to
TEX during a period of ten years, including the changes made when the original program was
first debugged in 1978. The log book of these errors, numbering more than 850 items, appears
as an appendix to this paper. The errors have been classified into fifteen categories for purposes
of analysis, and some of the noteworthy bugs are discussed in detail. The history of the TEX
project can teach valuable lessons about the preparation of highly portable software and the
maintenance of programs that aspire to high standards of reliability.
KEY WORDS Errors Debugging TEX Program evolution Language design True confessions

INTRODUCTION
I make mistakes. I always have, and I probably always will. But I like to think that I
learn something, every time I go astray. In fact, one of my favourite poems consists
of the following lines by Piet Hein:1

The road to wisdom? Well, it's plain
and simple to express:
Err
and err
and err again
but less
and less
and less.

I am writing this paper on 5 May 1987, exactly ten years since I began to work
intensively on software systems for typesetting. I have certainly learned a lot during
those ten years, judging from the number of mistakes I made; and I would like to
share what I have learned with other people who are developing software. The best
way to do this, as far as I know, is to present a list of all the errors that were corrected
in TEX while it was being developed, and to attempt to analyse those errors.

TEX is a trademark of the American Mathematical Society.

0038-0644/89/070607-79$39.50    Received 19 September 1988
© 1989 by John Wiley & Sons, Ltd.    Revised 28 November 1988

When I mentioned my plan for this paper to Paul M. B. Vitányi, he told me about
a best-selling book that his grand-uncle had written for civil engineers, devoted entirely
to descriptions of foundation work that had proved to be defective. The preface to that
book2 says

It is natural that engineers should not wish to draw attention to their
mistakes, but failures are sometimes due to causes of which there has been
no previous experience or of which no information is available. An engineer
cannot be blamed for not foreseeing the unknown, and in such cases his
reputation would not be harmed if full details of the design and of the
phenomena that caused the failure were published for the guidance of others.
. . . To be forewarned is to be forearmed.
In my own case I cannot claim that ‘unknown’ factors lay behind my blunders, since
I was totally in control of my programming environment. I can justly be blamed for
every mistake I made, and I am certainly not proud of the record. But I see no harm
in admitting the horrible truth about my tendency to err, when such details might
shed light on the problem of writing large programs. (Besides, I am lucky enough to
have a secure job.)
Empirical studies of programming errors, conducted by Endres3 and by Basili and
Perricone,4 have already led to interesting results and to the conclusion that ‘more data
must be collected on different projects’. I cannot claim that the data presented below
will be as generally applicable as theirs, because all of the programming I shall
discuss was done by one person (me). Insightful models of truly large-scale software
development and program evolution have been introduced by Belady and Lehman.5
However, I do have one advantage that the authors of previous studies did not have;
namely, the entire program for TEX has been published.6 Hence I can give fairly
precise information about the type and location of each error. The concept of scale
cannot easily be communicated by means of numerical data alone; I believe that a
detailed list gives important insights that cannot be gained from statistical summaries.

TYPES OF ERROR
Some people undoubtedly think that everything I did on TEX was an error, from start
to finish. But I shall consider only a limited class of errors here, based on the log books
I kept while I was developing the program. Whenever I made a change, I noted it
down for future reference, and it is these changes that I shall discuss in detail. Edited
forms of my log books appear in the appendix below.
I guess I could say that this paper is about ‘changes’, not ‘errors’, because many of
the changes were made in order to introduce new features rather than to correct
malfunctions. However, new features are necessary only when a design is deficient (or
at least non-optimal). Hence, I will continue to say that each change represents an
error, even though I know that no complex system will ever be error-free in this
extended sense.
The errors in my log books have each been assigned to one of fifteen general
categories for purposes of analysis:
A, an algorithm awry. Here my original method proved to be incorrect or
inadequate, so I needed to change the procedure. For example, error no. 212 fixed
a problem in which footnotes appeared on a page backwards: the last footnote
came out first.
B, a blunder or botch. Here I knew what I ought to do, but I wrote something
else that was syntactically correct-sort of a mental typo. For example, in error
no. 126 I wrote ‘before’ when I meant ‘after’ and vice versa. I was thinking so
much of the Big Picture that I did not have enough brainpower left to get the
small details right.
C, a clean-up for consistency or clarity. Here I changed the rules of the language
to make things easier to remember and/or more logical. Sometimes this was just
a surface change to TEX’S ‘syntactic sugar’, as in error no. 16 where I decided
that \input would be a better name than \require.
D, a data structure debacle. Here I did not properly update the representation of
information to preserve the appropriate invariants. For example, in error no. 105
I failed to return nodes to available memory when they were no longer accessible.
E, an efficiency enhancement. Here I changed the program so that it would run
faster; the existing code was correct but slow. For example, in error no. 287 I
decided to give TEX the ability to preload fount information, since it took a
while to read thirty short files at the beginning of every run.
F, a forgotten function. Here I did not remember to do everything I had intended,
when I actually got around to writing a particular part of the code. It was a
simple error of omission, rather than commission. For example, in error no. 11
and again in no. 172 I had a loop of the form while p ≠ null do, and I forgot to
advance the pointer p inside the loop! This seems to be one of my favourite
mistakes: I often forget the most obvious things.
G, a generalization or growth of ability. Here I realized that some extension of the
existing specifications was desirable. For example, error no. 303 generalized my
original primitive command ‘\ifT (char)’ (which tested if a given character was
‘T’ or not) to the primitive ‘\if (char)(char)’ (which tested if two given characters
were equal). Eventually, in no. 666, I decided to generalize further and allow
‘\if (token)(token)’.
I, an interactive improvement. Here I made TEX respond better to the user’s
needs. Sometimes I saw how to help TEX identify and recover from errors in
the documents it was processing. I also kept searching for better ways to
communicate the reasons underlying TEX’s behaviour, by making diagnostic
information available in symbolic form. For example, error no. 54 introduced
‘. . . ’ into the display of context lines so that users could easily tell when
information was truncated.
L, a language liability. Here I misused or misunderstood the programming language
or system hardware I was working with. For example, in error no. 24 I wanted
to reduce a counter modulo 8, so I wrote t := (t - 1) mod 8; this unfortunately
made t negative because of the way mod was defined. Sometimes I forgot the
precedence of operators, etc.
M, a mismatch between modules. Here I forgot the conventions I had built into a
subroutine when I actually got around to using that subroutine. For example,
in error no. 64 I had a macro with four parameters (x0, y0, x1, y1) that define a
rectangle; but when I used it, I gave the parameters in different order, (x0, x1,
y0, y1). Such ‘interface errors’ included cases when a procedure had unwanted
side-effects (such as clobbering a global variable) that I failed to take into
account. Some mismatches (such as incorrect data types) were caught by the
compiler and not entered in my log.
P, a promotion of portability. Here I changed the organization or documentation
of the program; this affected only a person who would try to read or modify the
code, not a person who tried to run it. For example, in error no. 59, one of my
comments about how to set the size of memory had ‘1’ where I meant to say
‘5’. (Most changes of this kind were not recorded in my log; I noted only the
noteworthy ones.)
Q, a quest for quality. Here I changed the specifications of what the program should
output from given input, when I learned how to improve the typographic
appearance of the output. For example, error no. 187 changed TEX’S behaviour
when typesetting formulae that have an unusually complex superscript; as a
result, TEX now produces [formula not recoverable from the source] instead of
[formula not recoverable from the source].

R, a reinforcement of robustness. Whenever I realized that TEX could loop or crash
in the presence of certain erroneous input, I tried to make the code bullet-proof.
For example, error no. 200 made sure that a user-supplied character number was
between 0 and 127; otherwise parts of TEX’s memory could be wiped out.
S, a surprising scenario. Errors of type S were particularly bad bugs that forced
me to change my original ideas, because of unforeseen interactions between
various parts of the program. For example, error no. 25 was logged when I first
discovered a consequence of TEX’s convention about blank lines denoting the
end of a paragraph: There is often a blank space in TEX’S internal data structure
just before a paragraph ends, because a space is usually supplied at the end of
the line just preceding a blank line. Thus I had to write new code to delete the
unwanted space. Whenever such unexpected phenomena showed up, I had to
go back to the drawing board and fix the design.
T, a trivial typo. Sometimes I did not type the right thing when I entered the
program into the computer, although my original pencil draft was correct. For
example, in error no. 48 I had typed ‘-’ instead of ‘+’. If a typing mistake was
detected by the compiler as a syntax error, I did not log it, because bad syntax
can easily be corrected.
Nine of these categories (A, B, D, F, L, M, R, S, T) represent ‘bugs’; such errors
absolutely had to be corrected. The other six categories (C, E, G, I, P, Q) represent
‘enhancements’; I could have refused to consider the existing situation erroneous. As
remarked earlier, I am considering all items in the log to be indications of error. But
there is a significant difference between errors of these two kinds: I felt guilty when
fixing the bugs, but I felt virtuous when making the enhancements.
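As an illustrative aside, the split between bugs and enhancements can be written as a small lookup. This is only a sketch: the names BUGS, ENHANCEMENTS and kind are assumptions introduced here and do not come from the log itself.

```python
# Category letters exactly as listed in the text above. The names
# BUGS, ENHANCEMENTS and kind() are illustrative assumptions.
BUGS = set("ABDFLMRST")        # errors that absolutely had to be corrected
ENHANCEMENTS = set("CEGIPQ")   # changes that could have been declined


def kind(category: str) -> str:
    """Return 'bug' or 'enhancement' for a one-letter category code."""
    if category in BUGS:
        return "bug"
    if category in ENHANCEMENTS:
        return "enhancement"
    raise ValueError(f"unknown category: {category!r}")


if __name__ == "__main__":
    # Nine bug categories plus six enhancement categories, fifteen in all.
    print(len(BUGS), len(ENHANCEMENTS), len(BUGS | ENHANCEMENTS))
    print(kind("F"), kind("Q"))
```

The two sets are disjoint, so every log entry falls on exactly one side of the bug/enhancement divide.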
My classification of errors into fifteen categories is ad hoc, but at the moment it is
the best way I can think of to make sense out of my experiences. Some of the bug
categories refer to simple flaws in the basic mechanics of programming: writing the
right thing but typing it wrong (T); thinking the right thing but writing it wrong (B);
knowing the right thing but forgetting to think it (F); imperfectly knowing the tools
(L) or the specifications (M). Such bugs are easy to fix once they have been identified.
Categories A and D represent the next level of difficulty, as we get into technical
aspects of what programming is all about. (As Niklaus Wirth has said, Algorithms +
Data Structures = Programs.) Category R covers the special situation in which we
want a program to survive even when its input is incorrect. Finally, category S accounts
for higher-level surprises; these are the subtle bugs that result from complex interactions
between different parts of a system. Thus the nine types of bugs have a somewhat
logical structure. The remaining six categories, namely cleanliness (C), efficiency (E),
generalization (G), interaction (I), portability (P) and quality (Q), seem to provide a
reasonable way to classify the various kinds of enhancements that were made to TEX during
its development.
My classification scheme relies more on essential functionality than on the external
form of the program. Thus it is not easy to use my statistics about the number of
errors per category to answer questions such as ‘How many bugs were due to improper
use of goto statements?’ Such questions are interesting to teachers of programming,
but I no longer think that they are extremely important. If I had indexed my errors
by syntactic categories, I would have found that error nos. 45, 91, 119, 155, 231, 352,
354, 419, 523, 581 and 801 could be ascribed to my use or abuse of goto; also no. 512
could be added to this list, since return and goto are analogous. Thus we can conclude
from my experience with TEX that goto statements can indeed be harmful. On the
other hand we must balance this fact with the realization that bad gotos account for
only 1.4 per cent of my errors; we must identify other culprits if we’re going to do
away with the other 98.6 per cent. Sure enough, several other errors were caused by
lapses in my use of other control structures: A case statement got me in trouble in
no. 21; a while confused me in no. 29; if-then-else led me astray in nos. 467, 471, 680
and 843. (See also nos. 796 and 845, where efficiency of control was important.) I
conclude that every feature of a programming language can be harmful, if it is misused.
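The 1.4 per cent figure can be checked with a few lines of arithmetic. The sketch below assumes the denominator is the 849 entries that the log contained as of May 1987; the variable and function names are illustrative only.

```python
# Error numbers attributed in the text to goto, plus the one return case.
GOTO_ERRORS = [45, 91, 119, 155, 231, 352, 354, 419, 523, 581, 801]
RETURN_ERRORS = [512]
TOTAL_ERRORS = 849  # assumption: size of the whole log as of May 1987


def goto_share_percent() -> float:
    """Percentage of all logged errors ascribed to goto or return."""
    count = len(GOTO_ERRORS) + len(RETURN_ERRORS)
    return 100.0 * count / TOTAL_ERRORS


if __name__ == "__main__":
    share = goto_share_percent()
    print(f"{share:.1f} per cent goto-related, {100 - share:.1f} per cent other")
```

Twelve entries out of 849 round to 1.4 per cent, which leaves 98.6 per cent for other causes.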
Some of the errors noted in my log book were much more devastating than others.
In certain cases the changes were far-reaching, affecting dozens of different parts of
the program; several days of ‘hacking’ were necessary before such changes had been
made and verified. For example, change no. 110 required major surgery to the program,
because my original ideas were incapable of handling aligned tables inside of aligned
tables. On the other hand, some of my errors were only venial sins, and some of the
changes were merely twiddles; for example, no. 87 simply improved the wording of a
diagnostic message. Although the log does not give an explicit weighting to the errors,
the ‘heavy’ errors tend to cancel with the ‘light’ ones, so we can still get a reasonable
insight into the stability of the program if we calculate, say, the number of errors
logged per year.

CHRONOLOGY
The development of TEX has taken place over a period of ten years, and the lessons
I learned can best be understood when they are put into the context of the other things
I was doing during that time. Typography has many facets, hence TEX itself was only
one of the projects I decided to work on. The two most significant companion systems
were METAFONT* (a system for typeface design) and Computer Modern (a family of
typefaces defined in terms of the METAFONT language); these programs had to be
debugged just as TEX did, and their debugging logs show a similar development
history. I also needed a dozen or so utility routines to support TEX and METAFONT;
the most notable of these are TANGLE and WEAVE, which constitute the WEB system of
structured documentation.

* METAFONT is a trademark of Addison-Wesley Publishing Company, Inc.

Beginnings
The genesis of TEX probably took place on 1 February 1977, when I first chanced
to see the output of a high-resolution typesetting machine. I was told that this fine
typography (the galley proofs of a book by Winston, which our faculty was considering
for inclusion in an exam syllabus) was produced by entirely digital methods; yet I
could see no difference between the digital type and ‘real’ type. Therefore I realized
that a central aspect of printing had been reduced to bit manipulation. As a computer
scientist, I could not resist the challenge of improving print quality by manipulating
those bits better. Therefore my diary entry for 8 February says that, already at that
time, I began discussing the possibility of new typesetting software with people at
Stanford’s Artificial Intelligence Lab. By 13 February I had changed my plan to spend
a forthcoming sabbatical year in South America; instead of travelling to an exotic place
and working on Volume 4 of The Art of Computer Programming, I had decided to stay
at Stanford and work on digital typography.
I mentioned earlier that the design of TEX was begun on 5 May 1977. A week later,
I wrote a draft report containing what I thought was a pretty complete design, and I
stayed up until 5 a.m. typing it into the computer. The problem of typesetting seemed
quite straightforward, so I soon started thinking about founts instead; I spent the next
45 days writing a program that was destined to evolve into METAFONT. By 28 June,
I had 25 lower-case letters in various styles that looked reasonably good to me at the
time; and three days later I figured out how to handle the 26th letter, which required
some new ideas.
I went back to thinking about TEX on 3 July. Several people had made thoughtful
comments on my earlier draft, and I prepared a thoroughly revised language definition
after two weeks of further study. (This included two days of working with dictionaries
in order to develop an algorithm for hyphenation of English.) The resulting document,
I thought, was a reasonably complete specification of a language for typesetting, and
I left it in the capable hands of two graduate students who were my research assistants
that summer (Frank Liang and Michael Plass). Their job was to implement TEX while
I flew off for a visit to China. I returned on 25 August and had just one day to meet
with them before leaving on another three-week trip. On 14 September I returned and
they presented me with a sheet of paper that had been typeset by their proto-TEX
program! They had implemented only about 15 per cent of the language, and they
had used data structures that were not general enough or efficient enough to support
the remaining 85 per cent; but they had chosen their subset wisely, so that a small
test program could run from start to finish. Hence it was easy for me to imagine what
a complete system would entail.
Now it was time for Liang and Plass to go back to school, and time for my sabbatical
year to begin. I started coding the ‘final version of TEX’ (or so I thought) on 16
September, and immediately I discovered that their summer work represented a truly
heroic achievement. Although I had thought that my specification of TEX was quite
complete, I encountered loose ends every 15 minutes or so when I was actually faced
with writing the code. I soon realized that if I had been in my students’ shoes, having
to implement this language when the author was completely unreachable, I would
have thrown up my hands in despair; important policy decisions had to be made at
every turn.
That was the first big lesson I learned during my work with TEX: the designer of
a new kind of system must participate fully in the implementation. Even if I had been
available for consultation with my students, they would have had to come to me so
often with questions that the work would have dragged on forever. I can imagine them
having to spend a half hour or so explaining each particular problem to me, and we
would have needed literally hundreds of those meetings. Now I knew why other
projects I had heard about, in which the language designer had decided not to be the
compiler writer, had failed.
By 14 October I had coded all of TEX except for the parts that typeset mathematics,
and except for the routines that convert from TEX’s internal representation into codes
for an output device. At this point I had to leave for three weeks of travel in Europe.
This European trip had been planned long before, so it was mostly unrelated to
typesetting; but I did have some interesting discussions about curve-drawing with
mathematicians I met in Oberwolfach, Germany, and in Oslo, Norway. I also was able
to arrange a visit to the headquarters of Monotype Corporation in Redhill, England.
After returning, I spent November finishing the numerals, upper-case letters, and
punctuation marks of the first-draft Computer Modern types. I needed to have a
complete fount because I had been invited to give a lecture about this work to the
American Mathematical Society, and I did not want to have only lower-case examples
to show. I prepared the AMS lecture during December and presented it in January,
so I did not have a chance to resume the coding of TEX until 14 January. But finally
I was able to write the following in my diary on 9 February 1978:
Finished the TEX programs including all loose ends and got them all compiled
without syntax errors (4 a.m.).

TEX was the first fairly large program I had written since 1970; so it was my first
non-trivial ‘structured program’, in the sense that I wrote it while consciously applying
the methodology I had learned in the early 1970s from Dijkstra, Hoare, Dahl and
others. I found that structured programming greatly increased my confidence in the
correctness of the code, while the code still existed only on paper. Therefore I could
wait until the whole program was written, before trying to debug any of it. This saved
a lot of time, because I did not have to prepare ‘dummy’ versions of non-existent
modules while testing modules that were already written; I could test everything in its
final environment. Of course I had a few qualms in January about whether my code
from September would really work; but that gave me more of an incentive to finish
the whole thing sooner.
Even on 10 February, when TEX had been compiled and was ready to be tested, I
did not feel any compelling need to try it immediately. I knew that the program was
fairly readable and ‘informally proved correct’, so I spent the next month making italic,
greek, script, symbols and large delimiter founts. My test program for TEX required
those founts, so I did not want to start testing until everything was in place. Again, I
knew I was saving time by not having to prepare prototypes that would merely simulate
the real thing; structured programming gave me the courage to wait until the whole
system was ready. I finished the large symbols on 8 March, and I happily penned the
following in my diary on 9 March:

Entered all accumulated corrections to TEX program and compiled it -
tomorrow the debugging begins!

My log book for errors in TEX began that next day, 10 March; the debugging
process will be discussed below. By 29 March I had decided that TEX was essentially
working,

. . . (except perhaps for error recovery) - it's time to celebrate!

I began tuning up the founts and drafting ideas for a user manual; then I spent a few
days at Alphatype Corporation in Illinois, from whom Stanford had decided to purchase
a phototypesetter. From 11 April to 11 May I took time off from typography to work
on dozens of updates to Seminumerical Algorithms, which is Volume 2 of The Art of
Computer Programming;12 I wanted to incorporate new research results into that text,
which was to be TEX’s first big application. Then on 14 May I began to get TEX
running again; proof copies of pages iv to 8 of Volume 2 came out of our Xerox
Graphics Printer on 15 May.
My work was cut out for me during the next weeks: I became a production user of
TEX, typing the manuscript of Volume 2. This proved to be an invaluable experience,
as explained below. By the time my sabbatical year ended, on 24 September, I had
finished the typing up to page 441 of that 700-page book. Improvements to TEX kept
occurring to me all during that time, of course, except during a month-long vacation
trip with my family. (Even on vacation I kept seeing founts everywhere and thinking
about how to draw such letterforms by computer. I spent one morning sitting by one
of the trails in the Grand Canyon designing the algebraic notation for METAFONT; my
founts had previously been written in a primitive macro language and compiled directly
into machine code, not interpreted.) I also spent three weeks that summer writing the
first manual for TEX.
Although my sabbatical year was over, I kept working on typography in odd moments
between classes in the autumn; the text of Volume 2 was completed on the morning
of 15 November. On 17 November I began writing METAFONT, and my diary entry
for 31 December 1978 was this:

Finished the METAFONT interpreter, just in time to celebrate New Year's
eve (11:59 p.m.).

Other people had begun to use TEX in August of 1978, and I was surprised to see
how fast the system was propagating. I spent my spare time during the first three
months of 1979 thinking about how to make TEX available in Pascal form. (The
original program was written in SAIL, a language that was available on only a few
computers.) During this period I began to experiment with the typesetting of Pascal
programs; I wrote a program called BLAISE that converted Pascal source code into a
file for pretty-printing. BLAISE soon developed into a system called DOC for
structured documentation, completed on 31 March, 1979; programs in DOC format
could be converted either to Pascal or to TEX. Luis Trabb Pardo and Ignacio Zabala
subsequently used DOC to prepare a highly portable version of TEX in Pascal, completed
in April of 1980.
About this time I learned another big lesson: writing software is much harder than
writing books. I could not simultaneously teach classes well and finish what needed to
be done on typography. So I asked to be excused from teaching in the spring of 1979;
my diary for March 22 said,

Now my obligations are fairly well cleared away and it’s back to the stalled
research on TEX.

(It turned out that I was able to teach during only 13 of the 21 academic quarters
between my sabbatical years in that period. I continued to supervise graduate students,
but I gave no classroom lectures during 1983 when the work on TEX and METAFONT
was at its peak; I also missed three months in 1982, 1984 and 1985. I really enjoy
teaching, but I could not see any way to finish the TEX project without relinquishing
almost all of my other duties.)
On 1 April 1979, I returned to METAFONT, which had been written but not
debugged. METAFONT began to work on 28 April. Then I began to design software
for the Alphatype machine; that took about three months. During the summer I wrote
the METAFONT manual, which gave me further experience with TEX. And TEX also
received an important stimulus from the American Mathematical Society that summer,
when several people (including Barbara Beeton and Michael Spivak) were given the
opportunity to spend some time at Stanford developing TEX macros. The AMS people
introduced me to several important applications, such as the indexes to Mathematical
Reviews, which stretched TEX to its limits and led to substantial improvements.

Endings
By 14 August 1979, I felt that TEX was essentially complete and fairly stable. I
lectured that evening to about 100 participants of the Western Institute for Computer
Science in Santa Cruz, telling about my experiences developing and debugging the
program. At that time my log book of errors had accumulated 420 items; little did I
know that the final total would be more than twice that! But already I knew that I
had learned a lot by keeping the log, and I must have been enthusiastic because I
lectured from 7:30 to 9:30 p.m. (The audience was equally enthusiastic; they kept
asking me questions until 11:30 p.m. So I resolved to write a paper about the errors
of TEX, and at last I am able to do so.)
I devoted the last months of 1979 and the first months of 1980 to Computer Modern,
which needed to be rewritten in terms of the new METAFONT. Then I needed to
update Volume 2 again (computer science marches inexorably forward) until I had
finally finished producing camera-ready copy on our Alphatype. This was the goal I
had hoped to achieve during my sabbatical year; I reached it at 2 a.m. on 29 July
1980, about two years late. During the rest of 1980 I wrote papers about what I
thought were the most novel ideas in TEX13 and in METAFONT.14
But my research on TEX was by no means finished. About 50 people from all over
the U.S.A. met at Stanford on 22 February 1980, and established the TEX Users
Group (TUG). I asked them if they would mind my cleaning up the language in
several upward-incompatible ways, even though this would make the user manual and
their existing computer files obsolete; and nobody objected to such changes! Soon
TUG grew dramatically, under the able chairmanship of Richard Palais, and it became
international. I realized that I could not disappoint all these people by leaving TEX
in its current state and returning immediately to work on subsequent volumes of The
Art of Computer Programming.
I needed to work out a better 'endgame strategy', and it soon became clear what
ought to be done: the original versions of TEX and METAFONT should be scrapped,
once they had served their purpose of accumulating enough user experience to indicate
what such languages ought to be. New versions of TEX and METAFONT should be
written, designed to last a long time and to be highly portable between computers and
typesetting devices of all kinds. Moreover, these new programs should be published,
because TEX was making it possible to improve the state of the art of program
documentation. I decided to do my best to produce a stable system and to explain all
I knew about it, so that other people could take it over and maintain it if it proved to
be important. This way I could return to other pursuits in good conscience, knowing
that if my typographic research had any merit it would be carried on by others in
whatever ways would prove to be necessary.
So that was my new goal; I thought I could achieve it in one or two more years.
The original TEX program was renamed TEX78, and the new one was to be called
TEX82.
Classes and miscellaneous chores kept me too busy to do much else during the first
half of 1981, but I began to write TEX82 on 22 August. By 9 September I realized
that the DOC system needed to be completely revised, so I spent two months replacing
it by a much better system called WEB. Since then my programming language of choice
has been WEB (which, unlike DOC, was written in its own language). After a month in
Europe, I was able to resume writing TEX82 on 1 December 1981. The draft of
TEX82 was completed on 29 June 1982; as before, I wrote the entire program before
trying to run any of it.
Meanwhile I had other problems to worry about. When my new copy of Seminumerical
Algorithms arrived in January 1981, I had expected to be filled with joy at the
consummation of so much hard work. Instead, I burned with disappointment, as I
realized that I still had a great deal to learn about founts. The early Computer Modern
typefaces were not at all what I had hoped to achieve, when I first saw them in print.
They had looked reasonably good at low resolution, so I had blithely assumed that
high resolution would be much better. Not so. My education in typefaces was barely
beginning. Later in 1981 I met Richard Southall, a professor of type design who had
exactly the expertise I was lacking; so I invited him to visit Stanford. We spent the
entire month of April 1982 working about 16 hours a day, revising Computer Modern
from A to z.
I debugged TEX82 in the summer of 1982, then began to write the new manual
(called The TEXbook15) in October. The first manual had been written hastily and
finished in 21 days, but I wanted The TEXbook to meet much higher standards.
Therefore I was not able to finish it until a full year later.
It was during this period, October 1982 to October 1983, that TEX became a mature
system. I had to rethink every aspect of its design as I rewrote the manual. Fortunately
I was aided by a wonderful group of knowledgeable volunteers, who would meet with
me for two or three hours every Friday noon and we would discuss the trade-offs of every
important decision. The diverse backgrounds of these people provided an important
counterweight to my one-sided views. Finally, on 9 December 1983, I decided that
the first phase of my endgame strategy was complete; I gratefully hosted a coming-of-
age party for TEX, with 36 guests of honour, at the Fuki-Sushi restaurant in Palo
Alto.
The rest is history. I wrote METAFONT in WEB between December 1983 and July
1984; I wrote The METAFONTbook between August 1984 and October 1985, taking off
five months (February to July) to rewrite Computer Modern in terms of the new
METAFONT. I began another sabbatical year in October 1985, just after the TEX project
disbanded. Finally, after adding a few more finishing touches, I was able to celebrate
the long-planned completion of my ‘endgame’ on 21 May 1986, when my publishers
sponsored a reception at the Computer Museum in Boston; that was the day I first
saw the five hardcover volumes of Computers & Typesetting, the books that summarize
my nine years of work on TEX, METAFONT and Computer Modern.
Another year has gone by and I would like to report that TEX has proved to be 100
per cent correct. But I cannot, not yet. For I stumbled across a hidden TEX anomaly
last January. And I have just been teaching a course about software development based
on the internal structure of TEX; students in the class have noticed a few things that
should be improved. So I suppose there is still at least one bug lurking there. I plan
to hold off publishing this paper until another year or so has gone by, so that I will
have more reason to believe that my log book of errors is complete.

CONTENTS OF THE LOG BOOKS
As I said, the appendix to this paper reproduces the entire list of errors that I kept as
TEX was evolving. The best way to comprehend how TEX evolved is to peruse this
list. The first 519 items refer to the original program TEX78, which was written in
SAIL, from the time I began to debug it to the time I stopped maintaining it. The
remaining items, numbered 520-849 (as of May 1987), refer to the ‘real’ program
TEX82, which was written in WEB. I did not keep any record of errors removed during
the hectic period when TEX82 was being debugged, but items 520 and following
include every change that was made to TEX82 after it passed its first test. The
differences between TEX78 and TEX82, seen from a user’s standpoint, have been listed
elsewhere.16
I have tried to edit the log entries so that they can be understood in terms of the
published listing6 of TEX82. For example,

15 Add the forgotten case ‘set_font:’ to eq_destroy. §275 F

is entry no. 15. My original log entry referred to case ‘[font]’ in ‘eqdestroy’ using
SAIL syntax, but I have changed to Pascal syntax in the edited log. Similarly, the 1978
identifier font eventually became set_font, so I have adopted the published equivalent.
TEX82 contains a procedure called eq_destroy in §275 of the program, and this
procedure is quite similar to the eqdestroy of TEX78; so I have supplied §275 as a
program reference. (It turns out that eq_destroy no longer needs a ‘set_font:’ subcase,
but it did in 1978.) The ‘F’ after §275 means that this was a bug of type F, a forgotten
function.
Changes to a program often spawn other changes later. I have tried to indicate that
phenomenon in the appendix by prefixing the number of a prior error when it was an
important part of the reason for a subsequent error. Thus no. 67 is

25 ↦ 67 Replace the space at paragraph end by fillglue, not by zero. §816 B

Error no. 25 was logged when I had been surprised to find a space at the end of TEX’s
internal representation of a paragraph. I had 'cured' the problem by converting the
space from a normal interword space to a space of width zero. But that was not good
enough, since it was possible for TEX to try breaking a line at the zero-width space.
A better solution was to replace the space by the glue that is always added to fill out
the end of a paragraph.
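The two entry layouts described above (a plain entry, and an entry prefixed by the number of the error that caused it) are regular enough to parse mechanically. The following sketch is an illustration only: it assumes that the section reference is written with ‘§’, that the causal prefix uses ‘↦’ (or the ASCII stand-in ‘->’), and that each entry fits on one line. The names LogEntry, ENTRY and parse_entry are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    number: int             # entry number in the log
    cause: Optional[int]    # earlier error that led to this one, if any
    text: str               # description of the change
    section: Optional[int]  # program section reference, if any
    category: str           # one-letter category code


# Assumed layout: [cause ↦] number text [§section] category
ENTRY = re.compile(
    r"^(?:(?P<cause>\d+)\s*(?:↦|->)\s*)?"
    r"(?P<number>\d+)\s+"
    r"(?P<text>.*?)\s+"
    r"(?:§(?P<section>\d+)\s+)?"
    r"(?P<category>[A-Z])$"
)


def parse_entry(line: str) -> LogEntry:
    """Split one log line into its parts; raise ValueError if it does not fit."""
    m = ENTRY.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized log entry: {line!r}")
    cause, section = m.group("cause"), m.group("section")
    return LogEntry(
        number=int(m.group("number")),
        cause=int(cause) if cause else None,
        text=m.group("text"),
        section=int(section) if section else None,
        category=m.group("category"),
    )


if __name__ == "__main__":
    print(parse_entry("15 Add the forgotten case 'set_font:' to eq_destroy. §275 F"))
    print(parse_entry("25 ↦ 67 Replace the space at paragraph end by fillglue, "
                      "not by zero. §816 B"))
```

Entries that cite several sections or several prior errors would need a richer pattern; this sketch covers only the two forms quoted in the text.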
Figure 1 shows a time chart of the first 519 log entries-the errors of TEX78. There
is a burst of activity right near the beginning, since I logged the first 237 errors during
the three weeks of initial debugging. Thus the main line in Figure 1, which shows the
cumulative number of errors as a function of time, is nearly horizontal at the beginning.
But it is nearly vertical at the end, since only 13 changes were made during the last
year of TEX78’s activity.
Another line also appears in Figure 1: it represents the total number of different
pages I typeset with TEX78 as I was experimenting with the first version. The dotted
line in July 1978 stands for the 200 pages of the first TEX manual, and the dotted line
in June 1979 stands for the 100 pages of the first METAFONT manual; the remaining
solid lines stand for the 700 pages of Volume 2 and some experiments with DOC.
Figure 1 shows that four different phases can be distinguished in the development
of TEX78. First came the debugging phase (Phase 0), already mentioned. Then came
a longer period of time (Phase 1) when I typeset several hundred pages of Volume 2
and the first user manual; this experience suggested many amendments to my original
design. Then TEX suddenly had more than one user, and different kinds of errors
began to show up. New users find new bugs. This coming-out phase (Phase 2) included
small bursts of changes when I faced new applications: a suite of difficult test cases
posed by the American Mathematical Society, then the application to Pascal formatting,
then the complex index to Mathematical Reviews. Finally there was Phase 3, when
changes were made in anticipation of a future TEX82; I wanted several new ideas to
be well tested before I programmed the 'ultimate' TEX.

THE INITIAL DEBUGGING STAGE


Let us roll the clock back now and look more closely at the earliest days of TEX78.
In some ways this was the most interesting time, because the whole concept of TEX
was just beginning to take shape. Figure 2 is a modified version of Figure 1, redrawn
with a time warp. There is now exactly one error per time unit, so the 18-day debugging
phase has been slowed down to almost half of the total development time; on the other
hand, the years 1981-1982 at the bottom go by so fast as to be barely visible.
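The effect of the time warp can be checked numerically. If every error occupies one unit of the warped axis, a phase takes up the same fraction of the chart as its share of the errors. The sketch below uses the counts quoted in the text (237 errors in the debugging phase, 519 in all for TEX78); the function name is an illustrative assumption.

```python
DEBUG_PHASE_ERRORS = 237  # errors logged during initial debugging
TEX78_ERRORS = 519        # all errors logged for TEX78


def warped_share(errors_in_phase: int, total_errors: int) -> float:
    """Fraction of the warped time axis occupied by one phase.

    With exactly one error per time unit, the length of a phase on the
    warped axis equals the number of errors logged in that phase.
    """
    return errors_in_phase / total_errors


if __name__ == "__main__":
    share = warped_share(DEBUG_PHASE_ERRORS, TEX78_ERRORS)
    print(f"debugging phase fills {share:.1%} of the warped chart")
```

The result is a little under 46 per cent, consistent with "almost half" of the warped development time.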
I mentioned that TEX78 was entirely coded before I first tried to run it on 10
March. My debugging strategy was to walk through the program using the BAIL
debugger, a system program by John Reiser that allowed me to execute the statements
of my program one at a time; BAIL would also interpret additional SAIL statements
that I entered on-line. Whenever I came to a section of program that I had seen before,
I could set a break-point and continue at high speed until coming to new material.
Watching the program execute itself in this ‘dynamic order’ has always been insightful
for me, after I have desk-checked it in the ‘static order’ of my original code.

[Figure 1. The rise and fall of TEX78. Time chart of cumulative errors and of pages typeset; legible annotations include ‘TEX installers’ workshop’ and ‘Begin coding TEX82’ (Aug 81).]

[Figure 2. The errors of TEX78. The chart of Figure 1 redrawn with one error per time unit; legible annotations include the debugging phase (data structures, memory management; syntax, error recovery; basic typesetting primitives; output; paragraphing; page breaking; alignment; math typesetting; ‘realistic’ test program), ‘Phase 1: first applications’, ‘New linebreaking algorithm’, ‘installers’ workshop’ and ‘Begin coding TEX82’.]

Figure 2 shows that I got through the program initialization the first day; then I
was gradually able to check out the routines for basic data management, parsing and
error reporting. On the fourth day TEX began to combine boxes and glue, and there
was visible output on the fifth day. During the following three days I tested the
algorithms for breaking paragraphs into lines and breaking lines into pages. All this
went rather smoothly; I had already logged 101 errors during this first week, but all
of the problems were comparatively minor oversights, to be expected in any program
of this size.
On the ninth day I tackled alignment of tables, and got a big shock: my original
algorithms were quite wrong. I had greatly misunderstood this aspect of TEX, because
I had greatly underestimated the complications of nested alignments. (The log mentions
some of the puzzlement and frustration I felt at the time.) I wrestled with alignment
for two days before finding a solution.
Then I looked at the last remaining part of TEX, the code for typesetting mathematics;
this took another four days. (Well, the ‘days’ were nights actually; I worked during
the night to avoid delays due to time-sharing.) Finally I had seen essentially all of
TEX in operation, and I could let it run at full speed instead of relying on single-step
mode. I spent six more days helping TEX get through its first test data; finally the
test was passed. Whew! The debugging phase was over, 18 days and 237 log-book
entries after it began.
I kept track of how long this process took, so that I’d be better able to estimate the
duration of future programming projects. Table I gives the figures.
The total debugging time, 132 h, was extremely encouraging to me, because it was
much less than the 41 days it had taken me to write the program. Previously I had
needed to devote about 70 per cent of program development time to debugging, but
now the figure had dropped to about 30 per cent. I considered this to be a tremendous
victory for structured programming, since my programming time had also decreased
from what it had been with old habits. Later, with the WEB system, I noticed even
further gains in productivity.
How big was TEX at the time? I estimated this by counting the number of semicolons
(4857) and the number of occurrences of the SAIL reserved words comment (480) and
else (223). Since I always put semicolons before end, the total number of statements
in the program could be computed as

; - comment + else = 4857 - 480 + 223 = 4600


Thus the debugging strategy I used allowed me to verify about 35 statements per hour.
The fact that I made 237 log entries in 132 h means that I was logging things only
about once every 33 min; thus the total time needed to keep the log was negligible. I
can definitely recommend the practice to everybody. During most of the debugging
time I was clicking away at the keys of my terminal, getting to know exactly what TEX
was doing; I needed only a few extra minutes to make the log entries, which helped
me to get to know myself.
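
The estimates in the preceding paragraphs can be checked with a few lines of Python. This is only a sketch of the arithmetic: the variable names are illustrative, and the numbers are the ones quoted in the text.

```python
# Counts and times quoted in the text.
semicolons = 4857
comments = 480
elses = 223
debug_hours = 132
log_entries = 237

# Statement estimate: semicolons, minus 'comment' occurrences, plus 'else' occurrences.
statements = semicolons - comments + elses

# Verification rate and logging interval during the debugging phase.
statements_per_hour = statements / debug_hours
minutes_per_entry = debug_hours * 60 / log_entries

print(statements)                  # 4600
print(round(statements_per_hour))  # 35
print(round(minutes_per_entry))    # 33
```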

EARLY TYPESETTING EXPERIENCE


Now that TEX was able to typeset its test program, I could proceed to my main goal,
the typesetting of Volume 2. This was a somewhat tedious task (the keyboarding of a
700-page book is not one of life’s greatest pleasures), but the regular appearance of
nice-looking pages kept me happy. The jagged line in Figure 2 shows my progress in
terms of pages typeset versus errors in the TEX log; a similar (even more jagged) line
appears in Figure 1, showing pages typeset as a function of time.

Table I

Day              Time, h    Day              Time, h
10 March 1978               19 March 1978    7.5
11 March 1978               20 March 1978    10
12 March 1978               21 March 1978    8
13 March 1978               22 March 1978    6
14 March 1978               23 March 1978    7.5
15 March 1978               25 March 1978    7
16 March 1978               26 March 1978    6
17 March 1978               27 March 1978    8
18 March 1978               29 March 1978    6

The most striking thing about the jagged line in Figure 2 is that it is almost straight.
Ideas about how to improve TEX kept occurring to me quite regularly as I typed the
manuscript. Between 13 May and 22 June I processed about 250 pages, and added 69
new entries to the log. Those 69 entries included 29 ‘bugs’ and 40 ‘enhancements’;
thus, I thought of a new way to improve TEX at a regular rate of about one enhancement
for every six pages typed.
I mentioned earlier my firm conviction that I could not have correctly delegated the
coding of TEX to another person; I had to be doing it myself, because writing a new
sort of program implies continually revising the specifications. Similarly, I could not
have correctly delegated these initial typing experiments to another person. I had to
put myself in the rôle of a regular user; there is no substitute for such experience,
when a new system is being designed.
But at the time I was not thinking about creating a system that would be used
widely; I was designing TEX primarily for my own use. The idea that TEX could or
should be generalized to other applications besides The Art of Computer Programming
dawned on me only gradually, as people kept noticing what I was doing and expressing
an interest in it.
John McCarthy observed during this period that TEX was doing a reasonable job
with respect to traditional mathematical copy, but he suspected that I would have a
tough time typesetting a book about TEX itself. ‘That will be the real test’, he said,
‘because you’ll have to shut off many of TEX’s automatic features in order to handle
problems of self-reference’.
In July I succumbed to John’s challenge and prepared a user manual for TEX. Sure
enough, this experience helped me identify quite a few weaknesses in the existing
design, things that I probably wouldn’t have noticed if I had confined my attention to
The Art of Computer Programming alone. Again I thought of enhancements at the rate
of about one for every six or seven pages, as I wrote the manual; but these were not
really occasioned by defects in TEX’s ability to be self-referential, as John had predicted.
The new enhancements came about because the process of manual-writing forced me
to think about TEX as a whole, in a new way. The perspective of a teacher/expositor
helped me to notice several inconsistencies and shortcomings.
Thus, I came to the conclusion that the designer of a new system must not only be
the implementor and the first large-scale user; the designer should also write the first
user manual. The separation of any of these four components would have hurt TEX
significantly. If I had not participated fully in all these activities, literally hundreds of
improvements would never have been made, because I would never have thought of
them or perceived why they were important.

PHASES 2 AND 3: USERS


But a system cannot be successful if it is too strongly influenced by a single person.
Once the initial design is complete and fairly robust, the real test begins as people with
many different viewpoints undertake their own experiments. At the beginning of
August, I distributed 45 copies of the draft manual to people who had expressed
interest in using TEX and who had promised to give me feedback before the ‘real’ user
manual would be issued in September. So TEX had a multitude of users for the first
time, and I began to learn about a wide variety of new applications and perceptions.
I continued to typeset the remaining 450 pages of Volume 2, and my personal
experiences with those pages continued to suggest regular improvements to TEX until
I got up to about page 500. But the final 200 pages were just drudgework, not really
inspirational to me in any way as far as TEX was concerned. Nor did I learn much
more, except about page layout, when I typed the METAFONT manual some months
later. The really important influences on TEX after the first manual was published were
the users, first because they made different kinds of mistakes than I had anticipated, and
later because they had important suggestions about how to improve TEX’s capabilities.
Guy Steele was visiting Stanford that summer; he took a copy of TEX back to MIT
with him, and I began to get feedback from two coasts. One of Guy’s suggestions,
which I staunchly resisted at the time, was to include some sort of mini-programming
language in TEX so that users could do numerical calculations. Slowly but surely I
began to understand the need for such features, which eventually became a basic part
of TEX82. Another early user was Terry Winograd, who pushed TEX’s early macro
capabilities to their limits. He and Michael Spivak, who began to work with TEX in
the summer of 1979, taught me a lot about the peculiar properties of macro expansion.
Researchers at Xerox PARC also had a significant influence on TEX at this time; Lyle
Ramshaw modified the program to work with Xerox’s new founts and new output
devices, while Leo Guibas and Doug Wyatt undertook to rewrite TEX in the MESA
language.
Figures 1 and 2 indicate that the first TEX user manual was issued in five versions.
‘Manual 0’ was the preliminary draft, handed out to 45 guinea pigs who agreed to help
me test the very first system. ‘Manual 1’ was a Stanford technical report issued a month
later; it was reprinted as ‘Manual 2’ in November, using the higher-resolution printing
devices at Xerox PARC. The American Mathematical Society published a paperback
version of Manual 2 in the summer of 1979; that was ‘Manual 3’. Then Digital Press
published ‘Manual 4’, which included the METAFONT manual and some background
information, in December 1979.
The publishers of manuals 3 and 4 asked readers to mail a reply card if they were
interested in forming a TEX User’s Group, and more than 100 people answered Yes.
So the first TUG meeting, in February 1980, marked the beginning of yet another
phase in the life of the SAIL program TEX78. A great influx of new users and new
applications made me strive for a more complete language. Hence there was a flurry
of activity at the end of March 1980, when I decided to extend TEX in more than a
dozen ways. These extensions represented only a fraction of the ideas that had been
suggested, but they seemed to provide all the requested functions in a clean way. The
time was ripe to make the extensions now or never, because the first versions of TEX
in Pascal were due to be released in April.
The last significant batch of changes to TEX78 were made in the summer of 1980,
when TEX acquired the ability to typeset paragraphs with arbitrary shapes. Still, the
error log shows that I kept adding enhancements regularly as the world-wide use of
TEX continued to grow. It turned out that the final bugs corrected in TEX78 were all
introduced by recent enhancements; they were not present in the program of 1978.
The most significant pattern to be found among the enhancements made to TEX78
after its earliest days is the ‘unbundling’ of things that used to be frozen inside the

code. At first I had fairly rigid ideas about how much space to put in certain places,
about how much penalty to charge for certain line breaks, about how to interpret
various characters in the input, and even about where to find certain characters in
founts. One by one, starting already at change no. 104, these things became parameters
that could be changed by users who had different requirements and/or different
preferences.

THE REAL TEX
I had vastly underestimated the complexities and subtleties of typesetting when I had
naively expected to work out a complete system for myself during a single sabbatical
year. By 1980 it became clear that I had acquired almost a moral obligation to advance
the art and science of typography in a more substantial way. I realized that I could
never be happy with the monster I had created unless I started over and built an
entirely new system, using the experience I had gained from TEX78.
I began writing the new system in the summer of 1981, and I decided to call it
TEX82 because I knew it would take a year to complete. Once again I could not
delegate the job to an associate; I wanted to rethink every detail of TEX, and I wanted
to have a thorough taste of ‘literate programming’ before I dared to inflict such ideas
on others. I wanted to produce truly portable software that would have a chance to
serve for many years as a reliable component of larger systems. I wanted TEX82 to
justify the confidence that people were placing in TEX78, which was getting more
praise than it deserved.
Figure 3 shows the development of TEX82, starting at the moment I decided that
it was essentially bug-free; this illustration uses the same time-warp strategy as Figure 2.

[Figure 3: a chart with one axis marked 100, 200, 300 and a time axis marked monthly from Oct 82 to Sep 83, then 1984 and 1985; labels: Phase 1: manual revision, Phase 2: global users, Phase 3: convergence; Version 1.0, Version 1.3, Version 2.0.]

Figure 3. The errors of TEX82

From the beginning there were hundreds of users, so TEX82’s Phase 1 was analogous
to TEX78’s Phase 2. But now there was yet a new dimension: several dozen people
were also reading the code and making well-informed comments on how to improve
it. Furthermore I had regular meetings with volunteer helpers who represented many
different points of view. So I had a golden opportunity to hone the ideas to a new
state of perfection.
Two major changes were installed very early in TEX82’s history. One was to the
way founts are selected in a document (change no. 545), and the other was to the
treatment of conditional parts of macros (change no. 564). Both of these changes
impinged on many of the fundamental assumptions I had made when writing the code;
these were definitely the most traumatic moments in TEX’s medical history. I was glad
to see that WEB’s documentation facilities helped greatly to make such drastic revisions
possible.
Phase 1 of TEX82 ended about a year after it began, when I completed writing The
TEXbook. The log reveals that most of the changes made to TEX during 1983 relate
to the chapters of the manual that I was writing at the time. This was the period when
TEX really grew up. As I said above, manual writing provides an ideal incentive for
system improvements, because you discover and remove glitches that you cannot justify
in print. When you are writing a user manual, you also have your last chance to make
any enhancements that you have thought about before; if certain enhancements are
not made then, you know that you will forever wish you had taken time to add them.
As with TEX78, the error log of enhancements to TEX82 shows a significant trend
toward greater user control. More and more things that were originally hardwired in
the system became parametric instead of automatic.
Phase 2 of TEX82 began with the paperback publication of The TEXbook and ended
with the publication of the hardcover edition. During this phase (which lasted from
October 1983 to May 1986) I was mostly working on METAFONT and Computer
Modern, so TEX changed primarily in ways that would blend better with those systems.
The log entries of Phase 2, nos. 790 to 840, also show that a number of ever-more
subtle bugs were detected by ever-more sophisticated users during this time. There
was also a completely unsubtle bug, no. 808, which somehow had sneaked through all
my tests and caused no apparent harm for an amazingly long time.
Now TEX82 is in its third and final phase. It has grown from the original 4600
statements in SAIL to 1376 modules in WEB, representing about 14,000 statements in
Pascal. Five volumes describing the complete systems for TEX, METAFONT and
Computer Modern have been published. No more changes will be made except to
correct any bugs that still might lurk in the code (or perhaps to improve the efficiency
or portability, when it is easy to do so while correcting a real bug). I hope TEX82 will
remain stable at least until I finish Volume 7 of The Art of Computer Programming.

TEST PROGRAMS
Since 1960 I have had extremely good luck with a method of testing that may deserve
to be better known: instead of using a normal, large application to test a software
system, I generally get best results by writing a test program that no sane user would
ever think of writing. My test programs are intended to break the system, to push it
to its extreme limits, to pile complication on complication, in ways that the system
programmer never consciously anticipated. To prepare such test data, I get into the
meanest, nastiest frame of mind that I can manage, and I write the nastiest code I can
think of; then I turn around and embed that in even nastier constructions that are
almost obscene. The resulting test program is so crazy that I could not possibly explain
to anybody else what it is supposed to do; nobody else would care! But such a program
proves to be an admirable way to flush the bugs out of software.
In one of my early experiments, I wrote a small compiler for Burroughs Corporation,
using an interpretive language specially devised for the occasion. I rigged the interpreter
so that it would count how often each instruction was interpreted; then I tested the
new system by compiling a large user application. To my surprise, this big test case
did not really test much; it left more than half of the frequency counts sitting at zero!
Most of my code could have been completely messed up, yet this application would
have worked fine. So I wrote a nasty, artificially contrived program as described above,
and of course I detected numerous new bugs while doing so. Still, I discovered that
10 per cent of the code had not been exercised by the new test. I looked at the
remaining zeros and said, ‘Shucks, my source code wasn’t nasty enough, it overlooked
some special cases I had forgotten about’. It was easy to add a few more statements,
until eventually I had constructed a test routine that invoked all but one of the
instructions in the compiler. (And I proved that the remaining instruction would never
be executed in any circumstances, so I took it out.)
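
The frequency-count idea can be sketched in Python as follows. This is not the Burroughs interpreter described above: the three-instruction stack machine, its opcode names and the two test programs are all hypothetical, invented only to show how counts left at zero expose what a test never exercised.

```python
from collections import Counter

# Hypothetical three-instruction stack machine, invented for illustration.
OPCODES = ("push", "add", "neg")

def run(program, counts):
    """Interpret `program`, tallying in `counts` how often each opcode runs."""
    stack = []
    for op, *args in program:
        counts[op] += 1
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "neg":
            stack.append(-stack.pop())
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack

def unexercised(counts):
    """Return the opcodes whose frequency count is still zero."""
    return [op for op in OPCODES if counts[op] == 0]

counts = Counter()
result = run([("push", 2), ("push", 3), ("add",)], counts)  # an 'ordinary' test
missed_before = unexercised(counts)                         # ['neg']
run([("push", 1), ("neg",)], counts)                        # a test aimed at the gap
missed_after = unexercised(counts)                          # []
```

The point of the design is that the counts, rather than the tester's intuition, decide when the test data is nasty enough.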
I used such ‘torture tests’ to debug three compilers during the 1960s. In each case
very few bugs were ever discovered after the tests had been passed, so the methodology
was quite effective. But when I debugged TEX78, my test program was quite tame by
comparison, except when I was first testing the mathematics routines (20-23 March).
I guess I was not trying as hard as usual to make TEX a bullet-proof system, because
I was still thinking of myself as TEX’s main user. My original test program for TEX78
was written with an ‘I hope it works’ attitude, rather than ‘I bet I can make it fail’. I
suppose I would have found several dozen of the bugs that showed up later (such as
nos. 240 and 263) if I had stuck to the torture-test methodology. Still, considering my
mood at the time, I suppose it was a good idea to have a test program that would look
like real typography; I did not know what TEX should do until I could judge the
aesthetic quality of its output.
At any rate, my first test program was based on a sampling of material from Vol-
ume 2. I went through that book and boiled it down to five pages that illustrated just
about every kind of typographical difficulty to be found in the entire volume. (The
output of this test program can be seen in another paper, where David Fuchs and I
used the same test data to study some algorithms for fount management.)
Years later, when TEX82 was ready to be debugged, I understood pretty clearly
what the program was supposed to do, so I could then apply the superior torture-test
methodology. My test program was called TRIP; I spent about five days preparing the
first draft of TRIP in July 1982. Here, for example, is a relatively tame part of the
original TRIP code:

\def\gobble#1{} \floatingpenalty 100
\everypar{A\insert200{\baselineskip400pt\splittopskip\count15pt
\hbox{\vadjust{\penalty999}}\hbox to -10pt{}}\showthe\pagetotal
\showthe\pagegoal\advance\count15by1\mark{\the\count15}%
\splitmaxdepth-1pt\paR\gobble}% abort every paragraph abruptly
\def\weird#1{\csname\expandafter\gobble\string#1
\string\csname\endcsname} \message{\the\output\weird\one}

(Please do not ask me what it means.) Since then I have probably spent at least 200
hours modifying and maintaining TRIP, but I consider that time well spent, and I
think TRIP is one of the most significant products of the TEX project. The reason
is that the TRIP test has detected extremely subtle bugs in hundreds of implementations
of TEX, bugs that would have been almost impossible to track down in any other way.
TEX82, with its TRIP test, has proved to be much more reliable than any of the Pascal
compilers it has been compiled with. In fact, I believe it is fair to say that TEX82 has
helped to flush out at least one previously unknown compiler bug whenever it has been
ported to a new machine or tried on a compiler that has not seen TEX before! These
compiler errors were detectable because of the TRIP test. Later I developed a similar
test program for METAFONT, called TRAP, and it too has helped to exorcise dozens
of compiler bugs.
A single test program cannot detect all possible mistakes. For example, TEX might
terminate with a ‘fatal error’ in several ways, only one of which can happen on any
particular run. Furthermore, TRIP runs almost automatically, so it does not test all of
TEX’s capability for on-line interaction. But TRIP does exercise almost all of TEX’s
code, and it does so in tricky combinations that tend to fail if any part of TEX is
damaged. Therefore it has proved to be a great time-saver: whenever I modify TEX,
I simply check that the results of the TRIP test have changed appropriately.
The only difficulty with the TRIP methodology is that I must check the output
myself to see if it is correct. Sometimes I need to spend several hours before I have
determined the appropriate output; and I am fallible. So TEX might give the wrong
answer without my being aware of it. This happened in bug nos. 543 and 722, when
I learned to my surprise that TEX had never before done the correct thing with TRIP.
A system utility for comparing files suffices now to convince me that incremental
changes to TEX or TRIP cause the correct incremental changes to the TRIP test output;
but when I began debugging, I needed to verify by hand that thousands of lines of
output were accurate.
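
The incremental check described above, comparing new test output against output already verified by hand, might be sketched in Python as below. This is an assumption-laden illustration rather than the utility actually used: it relies on the standard `difflib` module, and the sample lines are invented, not real TRIP output.

```python
import difflib

def changed_lines(reference, candidate):
    """Return unified-diff lines showing how `candidate` departs from `reference`.

    Both arguments are lists of output lines; an empty result means no change.
    """
    return list(difflib.unified_diff(reference, candidate,
                                     fromfile="accepted", tofile="new",
                                     lineterm=""))

# Hypothetical log lines, invented for illustration.
accepted = ["line 1 ok", "badness 100", "line 3 ok"]
unchanged_run = list(accepted)
modified_run = ["line 1 ok", "badness 120", "line 3 ok"]

no_diff = changed_lines(accepted, unchanged_run)   # nothing to inspect
diff = changed_lines(accepted, modified_run)       # only the altered line to inspect
```

Only the lines reported in `diff` then need to be judged by hand, which is what makes the hand-checked reference reusable from one change to the next.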
I should mention that I also believe in the merit of formal and informal correctness
proofs. I generally try to prove my programs correct, informally, by stating appropriate
invariants in my documentation and checking at my desk that those relations are
preserved. But I can make mistakes in proofs and in specifying the conditions for
correctness, just as I make mistakes in programming; therefore I do not rely entirely
on correctness proofs, nor do I rely entirely on empirical test routines such as TRIP.

LOCATION AND TYPE OF ERRORS


Let me review again the fifteen classes of errors that are listed in my error log:

A - Algorithm      F - Forgotten        P - Portability
B - Blunder        G - Generalization   Q - Quality
C - Cleanup        I - Interaction      R - Robustness
D - Data           L - Language         S - Surprise
E - Efficiency     M - Mismatch         T - Typo

I mentioned before that each of the errors listed in the appendix refers where possible
to its approximate location in the program listing of TEX82. It is natural to wonder
whether the errors are uniformly interspersed throughout the code, or if certain parts
were particularly vulnerable. Figure 4 shows the actual distribution. No part of the
program has come through unscathed (or, shall we rather say, unimproved), but some
parts have seen significantly more action. The boxes to the left of the vertical lines in
Figure 4 represent ‘bugs’ (A, B, D, F, L, M, R, S, T), whereas the boxes to the right
represent ‘enhancements’ (C, E, G, I, P, Q). The most unstable parts of TEX78 were
the parts I understood least when I began to write the code, namely mathematical
formatting and alignment. The most unstable parts of TEX82 were the parts that
differed most from TEX78 (the conditional instructions and other aspects of macro
expansion; also the increased user access to registers and internal quantities used in
TEX’s decision-making).
I should mention why hyphenation is almost never mentioned in the log of TEX78.
Although I said earlier that TEX78 was entirely written before any of it was tested,
that is not quite true. The hyphenation algorithm was quite independent of everything
else and easily isolated from the code, so I had written and debugged it separately
during three days in October 1977. (There is obviously no advantage to testing
independent programs simultaneously; that leads only to confusion. But the rest of
TEX was highly interdependent, and it could not easily be run when any of the parts
were absent, except for the routines that produced the final output.) The hyphenation
algorithm of TEX78 was English-specific; Frank Liang, who had helped me with this
part of TEX78, developed a much better approach in his thesis, and I ultimately
incorporated his algorithm in TEX82.

§0      Input/output, strings
§50     Error handling
§100    Data structures for semantics
§150    Basic operations on data
§200    The hash table
§250    Data structures for syntax
§300    Low-level parsing
§350    Macro expansion
§400    Medium-level parsing
§450    Conditionals
§500    File name scanning
§550    Font data
§600    Binary output
§650    Data structures for math
§700    Math typesetting
§750    Alignment
§800    Line breaking
§850    Line breaking, continued
§900    Hyphenation
§950    Page breaking
§1000   The chief executive
§1050   Building boxes
§1100   Building lists
§1150   Building math formulas
§1200   Assigning to user registers
§1250   Miscellaneous
§1300   Initialization
§1350   Extensions
§1400

Figure 4. Distribution of bugs | enhancements by program location

Figure 5 shows the accumulated number of errors of each type in TEX78, with bugs
at the bottom and enhancements at the top. Initially the log entries are mostly bugs,
with occasional enhancements of type I; at the end, however, enhancements C, G and
Q predominate. Figure 6 is a similar diagram for TEX82. In the latter case the vast
majority of errors are enhancements, and there are no bugs of types M or T. That is
because the debugging phase of TEX82 does not appear in the log, not because I
learned how to make fewer mistakes.

SOME NOTEWORTHY BUGS


The gestalt of TEX’s evolution can best be perceived by scanning through the log
book, item by item. But I would like to single out several errors that were particularly
instructive or otherwise memorable.

Figure 5. Accumulated errors of TEX78, divided into fifteen categories

Figure 6. Accumulated errors of TEX82, not counting its initial debugging stage

A, Algorithmic anomalies
I decided from the beginning that the algorithms of TEX would be in the public
domain. But if I were to change my mind and charge a fee for my services in inventing
them, I would probably request the highest price for a comparatively innocuous-looking
group of statements now found in sections 851 and 854 of the program. This precise
sequence of logical tests, used to control when a line break is being forced because
there is no ‘feasible’ alternative, has the essential form

if α1 ∨ α2 then
    if α3 ∧ α4 ∧ α5 ∧ α6 then σ1
    else if α7 then σ2 else σ3
else σ4

and most of the appropriate boolean conditions αi were discovered only with great
difficulty. The program now warns any readers who seek to improve TEX to ‘think
thrice before daring to make any changes here’. Some indications of my struggles with
this particular logic appear in error nos. 75, 93 and 506.
TEX’s line-breaking algorithm determines the optimum sequence of breaks for each
paragraph, in the sense that the total ‘demerits’ are minimized over all feasible sequences
of breaks. The original algorithm was fairly simple, but it continued to evolve as I
fiddled with the formula used to calculate demerits. Demerits are based on the ‘badness’
b of the line (which measures how loose or tight the spacing is) and the ‘penalty’ p for
the break (which may be at a hyphen or within a mathematical formula). A penalty
might be negative to indicate a good break. The original formula for demerits in TEX78
was

\[ D = \max(b + p,\, 0)^2; \]

error no. 76 replaced this by

\[ D = \begin{cases} (1 + b + p)^2, & \text{if } p \ge 0; \\ (1 + b)^2 - p^2, & \text{if } p < 0. \end{cases} \]

The extra constant 1 was used to encourage paragraphs with fewer lines; the subtraction
of p² when p < 0 gave fewer demerits to good breaks. This improved formula was
published on page 1128 of the article on line-breaking by Knuth and Plass.13 The first
draft of TEX82 added an obvious generalization to the improved formula by introducing
a \linepenalty parameter, l, to replace the constant 1. A further improvement was
made in change no. 554, when I realized that better results would be obtained by
computing demerits as follows:

\[ D = \begin{cases} (l + b)^2 + p^2, & \text{if } p \ge 0; \\ (l + b)^2 - p^2, & \text{if } p < 0. \end{cases} \]

Otherwise, a line with, say, (b, p) = (50, 100), followed by a line with (b, p) = (0, 0),
would be considered inferior to a pair of lines with (b, p) = (0, 100) and (100, 0),
although the second pair of lines would actually look much worse.
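
The comparison in the last paragraph can be worked through numerically. The following Python sketch evaluates both rules on the two pairs of lines, taking the first pair as (50, 100) and (0, 0); the function names are illustrative, and the \linepenalty value l = 1 is an assumption, since the text does not say which value the example has in mind.

```python
def demerits_old(b, p, l=1):
    """Earlier rule: badness and a non-negative penalty are added before squaring."""
    if p >= 0:
        return (l + b + p) ** 2
    return (l + b) ** 2 - p ** 2

def demerits_new(b, p, l=1):
    """Rule after change no. 554: the penalty is squared separately."""
    if p >= 0:
        return (l + b) ** 2 + p ** 2
    return (l + b) ** 2 - p ** 2

def total(rule, lines):
    """Sum the demerits of consecutive lines given as (badness, penalty) pairs."""
    return sum(rule(b, p) for b, p in lines)

pair_a = [(50, 100), (0, 0)]    # one moderately loose line, then a perfect one
pair_b = [(0, 100), (100, 0)]   # a perfect line, then a very bad one

old_a, old_b = total(demerits_old, pair_a), total(demerits_old, pair_b)
new_a, new_b = total(demerits_new, pair_a), total(demerits_new, pair_b)
print(old_a, old_b)  # 22802 20402: the earlier rule prefers pair_b
print(new_a, new_b)  # 12602 20202: the revised rule prefers pair_a
```

Under these assumptions the earlier rule charges the first pair more demerits, while the revised rule reverses the ranking, in agreement with the text.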

B, Blunders
A typical blunder, among the 50 or so errors of class B in the appendix, is illustrated
by error nos. 7 and 92. I had declared two symbolic constants in my program, new_line
(for one of the three states of TEX’s lexical scanner) and next_line (for the sequence of
ASCII codes carriage-return and line-feed, needed in SAIL output conventions).
Although the meanings were quite dissimilar, the names were quite similar; therefore
I confused them in my mind. The compiler did not detect any syntax error, because
both were legal in an output statement, so I had to detect and correct the bugs myself.
I could have avoided these errors by using a name like cr_lf instead of next_line; but
that sounds too jargony. A better alternative would have been new_line_state instead
of new_line.

D, Data disasters
My most striking error in data-structure updating was no. 630, which crept in when
I made change no. 625. The error needs a bit of background information before I can
explain it: using an idea of Luis Trabb Pardo, I was able to save one bit in each node
of TEX’s main data structures by putting the nodes in which the bit would be 1 (the
so-called charnodes) into the upper part of the mem array, all other nodes into the
lower part. (It was very important to save this bit, because I needed at least 32
additional bits in every charnode.) One of the aspects of change no. 625 was to optimize
my data structure for representing mathematical subformulae that consist of a single
letter. I could recognize and simplify such a subformula by looking for a list that
consisted of precisely two elements, namely a charnode followed by a ‘kern node’ (for
an ‘italic correction’). A kern node is identified by (a) not being a charnode, i.e. not
having a high memory address, and (b) having the subfield type = 11.

I forgot to test condition (a). But my program still worked in almost every case,
because unsuitable lists of length 2 are rare as subformulae, and because the type
subfield of a charnode records a fount number. Amazingly, however, within one week
of my installing change no. 625, some user happened to create a mathematical list of
length 2 in which the second element was a character from fount number 11!
This example demonstrates that I was lucky to have a wide variety of users. Still,
such a bug might survive for years before it would cause trouble for anybody.
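
The two-part test for a kern node, and the effect of omitting condition (a), can be illustrated with a small Python sketch. The memory layout here is hypothetical: `HI_MEM_MIN`, the dictionary standing in for the mem array and the addresses are all invented for illustration; only the type code 11 comes from the text.

```python
KERN_NODE = 11       # type code of a kern node, as stated in the text
HI_MEM_MIN = 1000    # hypothetical boundary: addresses at or above it hold charnodes

def is_char_node(p):
    """A charnode is recognized by its high memory address alone."""
    return p >= HI_MEM_MIN

def is_kern_buggy(mem, p):
    """Tests only condition (b), in the manner of error no. 630."""
    return mem[p]["type"] == KERN_NODE

def is_kern(mem, p):
    """Tests conditions (a) and (b) together."""
    return not is_char_node(p) and mem[p]["type"] == KERN_NODE

# In a charnode the same subfield records a fount number, so fount 11 looks like a kern.
mem = {
    200: {"type": KERN_NODE},   # a genuine kern node in low memory
    1500: {"type": 11},         # a character from fount number 11
    1501: {"type": 3},          # a character from fount number 3
}

print(is_kern_buggy(mem, 1500), is_kern(mem, 1500))  # True False
```

The incomplete test gives the right answer for every fount number except 11, which is why the error could stay hidden through ordinary use.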

F, Forgetfulness
As I am writing this paper, I am trying to remember all the points I wanted to
explain about TEX’s evolution. Probably I will forget something, as I did when I was
writing the program for TEX.
Usually a bug of class F was easily noticed when I first looked at the corresponding
part of the code, with my walk-through-in-execution-order method of debugging. But
I would like to mention two of the F errors that were among the most difficult to find.
Both of them occurred in routines that had worked correctly the first few times they
were exercised; indeed, these routines had been called hundreds of times, with perfect
results, so I no longer suspected that they could be the source of any trouble.
Error no. 91 occurred in the memory allocation subroutine, the first time I ran out
of memory. That subroutine had the general form

begin (Get ready to search);
repeat (Look at an available slot);
    if (big enough) then goto found;
    (Move to next slot);
until (back at the beginning);
found: (Allocate and return, unless the available list becomes exhausted);
ovfl: (Give an overflow message);
end

The bug is obvious: I forgot to say ‘goto ovfl’ just before the label ‘found:’. And it is
also obvious why this bug was hard to find: I had lost my suspicions that this subroutine
could fail, but when it did fail it allocated one node right in the middle of another.
My linked data structure was therefore destroyed, but its defective fields did not cause
trouble until several hundred additional operations had been performed by the parts
of the program where I was still looking for bugs.
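
For comparison, here is a minimal Python sketch of the same search with the missing exit made explicit. It is not the SAIL routine: the list of free-slot sizes is a hypothetical stand-in for the circular list of available memory, and the exception plays the part of the label ‘ovfl’.

```python
class Overflow(Exception):
    """Raised when no free slot is large enough."""

def allocate(free_slots, size):
    """First-fit search over a list of free slot sizes; returns the chosen index.

    `free_slots` is a hypothetical stand-in for the available list; each entry
    is the number of words free in that slot.
    """
    for index, available in enumerate(free_slots):
        if available >= size:            # 'big enough': the 'found' case
            free_slots[index] -= size
            return index
    # The search came back to the beginning without success. This explicit
    # exit plays the part of the forgotten 'goto ovfl'.
    raise Overflow(f"no slot can hold {size} words")

slots = [2, 5, 3]
first = allocate(slots, 4)       # fits in slot 1
try:
    allocate(slots, 9)           # nothing is big enough
    overflowed = False
except Overflow:
    overflowed = True
```

Without the final `raise`, a failed search would fall through into the allocation step, which corresponds to the behaviour described above.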
Error no. 203 was even more difficult to find; it lurked in TEX’s get_next routine,
the subroutine that is executed far more than any other. Whenever TEX is ready to
see another token of input, get_next comes into action. Therefore, by the time I had
corrected 200 errors, get_next had probably obtained the correct next token more than
100,000 times; I considered it rock-solid reliable.
Since get_next is part of TEX’s ‘inner loop’, I had wanted it to be efficient. Indeed,
I learned later that the very first statement of get_next, ‘cur_cs ← 0’, is performed more
often than any other single statement of TEX82. (Empirical tests covering a period of
more than a year show that ‘cur_cs ← 0’ was performed more than 1.4 billion times on
Stanford’s SUAI computer. The get_avail routine, which is next in importance, was
invoked only about 438 million times.) Knowing that get_next was critical, I had tried
to avoid performing ‘cur_cs ← 0’ in my first implementations, in cases where I knew
that the value of cur_cs would not be examined by the consumers of get_next’s tokens.
In fact, I knew that cur_cs would be irrelevant in the vast majority of cases. (But I
also knew, and forgot, Hoare’s dictum that premature optimization is the root of all
evil in programming.)
Well, you can almost guess the rest. When I corrected my serious misunderstanding
of alignments, error nos. 108 and 110, I introduced a new case in get_next, and that
new case filled my thoughts so much that I forgot to worry about the ‘cur_cs ← 0’
operation. Still, no harm was done unless cur_cs was actually being looked at; TEX
would not fail unless \cr occurred in an alignment having a special sort of template
that required back-up in the parser. As before, the effect of this error was buried in a
data structure, where it remained hidden until much later. I found the bug only by
temporarily inserting new code that continually monitored the integrity of the data
structures. (Such code later became a standard diagnostic feature of TEX82; it can be
seen, for example, in section 167.)

L, Language lossage
Some of my errors (nos. 98, 295, 296, 480) were due to the fact that algorithms
involving floating-point numbers sometimes fail because of round-off errors. (I have
assigned these errors to class L instead of class A, although it was a close call.) TEX82
was designed to be portable so that it gives essentially identical results on all computers;
therefore I avoided floating-point calculations in critical parts of the new program.
Two other errors in my log belong unambiguously to class L: in nos. 63 and 827, I
failed to insert parentheses into a macro definition. As a result, when I used the macro
with text replacement, any frequent user of macros can guess what happened. (Namely,
in no. 827, I had declared the macro

hi_mem_stat_min = mem_top - 13

and used it in the statement

dyn_used ← mem_top + 1 - hi_mem_stat_min;

this gave a minus where I wanted a plus.)
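
The effect of the missing parentheses can be reproduced with a toy text-replacement step. The sketch below is illustrative only: `expand` and `evaluate` are hypothetical helpers, and the value chosen for `mem_top` is an arbitrary assumption, since the result does not depend on it.

```python
MEM_TOP = 30000  # arbitrary illustrative value; not taken from the TEX sources


def expand(statement, macros):
    """Replace each macro name by its body, purely textually."""
    for name, body in macros.items():
        statement = statement.replace(name, body)
    return statement


def evaluate(expression):
    """Evaluate an arithmetic expression in which only mem_top is defined."""
    return eval(expression, {"__builtins__": {}}, {"mem_top": MEM_TOP})


expression = "mem_top + 1 - hi_mem_stat_min"
unparenthesized = {"hi_mem_stat_min": "mem_top - 13"}
parenthesized = {"hi_mem_stat_min": "(mem_top - 13)"}

print(expand(expression, unparenthesized))            # mem_top + 1 - mem_top - 13
print(evaluate(expand(expression, unparenthesized)))  # -12
print(evaluate(expand(expression, parenthesized)))    # 14
```

Without parentheses the 13 is subtracted rather than added, so the two results differ by 26 whatever the value of `mem_top`.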

M, Mismatches
When I write a program I tend to forget the exact specifications of its subroutines.
One of my frequent flubs is to blur the distinction between an object and a pointer to
that object. In TEX82, for example, I noticed when I got to error no. 79 that I had
called vpackage(p, . . . ) where p pointed to the first node of a vlist, whereas in the
declaration of vpackage I had assumed parameters of the form (h, . . . ) where h points
to a list header; thus link(h), not h itself, was assumed to point to the first list item.
The compiler did not catch the error because both h and link(h) were of type pointer.
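
A small sketch shows why such a mismatch goes unnoticed. The names here (`Node`, `make_vlist`, `total_height`) are hypothetical stand-ins and not the real vpackage; the only point is that a routine expecting a list header silently skips the first item when it is handed a pointer to that item instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    height: int
    link: Optional["Node"] = None


def make_vlist(heights):
    """Build a list with a header node; header.link is the first real item."""
    header = Node(0)
    tail = header
    for h in heights:
        tail.link = Node(h)
        tail = tail.link
    return header


def total_height(h):
    """Expects a list header h, so the traversal starts at link(h)."""
    total = 0
    p = h.link
    while p is not None:
        total += p.height
        p = p.link
    return total


header = make_vlist([10, 20, 30])
print(total_height(header))       # 60: called with the header, as declared
print(total_height(header.link))  # 50: called with the first node; item 10 is lost
```

Both calls are type-correct and both return a plausible number, which is why only a check of every call site reveals the inconsistency.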
While fixing this bug it occurred to me that vpackage was an oft-used subroutine
and that I might have made the same mistake more than once. So I looked closely at
each of the 26 places I had called vpackage, and the results proved that I was remarkably
inconsistent: I had specified a list head 14 times, and a direct pointer 12 times!
(Fortunately there was not a 13-13 split; that would have been unlucky.)
This error reminded me that I should always check the entire program whenever I
notice a mistake; failures tend to recur. In fact, several errors of TEX82 (nos. 803, 813,
815, 837) were first noticed when I was debugging similar portions of METAFONT.

R, Robustness
Most of the changes of type R were introduced to keep TEX from crashing when
users supply input that does not obey the rules. But some of the Rs in the log are
intended to keep TEX alive even when other parts of TEX are failing, because of my
programming errors or because somebody else is trying to produce a new modification
of TEX.
Thus, for example, in nos. 99 and 123, I redesigned two of my procedures so that
they would produce a symbolic printout of given data structures in memory even when
those data structures were malformed. I made it possible to obtain meaningful output
from arbitrary bit configurations in memory, so that while debugging TEX I could
interactively look at garbage and guess how it might have arisen.
One of the most recent changes to TEX, no. 846, has the same flavour: the parameter
to show_node_list was redeclared to be of type integer instead of type pointer, because
buggy calls on show_node_list might not supply a valid pointer.

S, Surprises
The most serious errors were those due to my global misunderstandings of how the
system fits together. The final error in TEX78 was of type S, and I suppose the final
error of TEX82 will be yet another surprise.
Let me mention just two of these. The first is extremely embarrassing, but it makes
a good story. TEX produces DVI files as output, where DVI stands for Device
Independent. The DVI language is like a machine language, consisting of 8-bit instruc-
tion codes followed in certain cases by arguments to the instructions. Two of the
simplest instructions of DVI language are push (code 141) and pop (code 142). It turns
out that TEX might output push followed immediately by pop in various circumstances,
and this needlessly clutters up the DVI file; so I decided to optimize things a bit by
checking to see whether the final byte in my output buffer was push before TEX would
output a pop. If so, I could cancel both instructions. This technique even made it
possible to detect and cancel long redundant sequences such as push push pop push
push pop pop pop. Naturally, I checked to see that the buffer had not been entirely
cancelled out when I tested for such an optimization. (I was not 100 per cent stupid.)
But I failed to realize that the byte just preceding pop might just happen to be 141
(the binary code for push) when it was the final operand byte of some other instruction.
Ouch!
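
The push/pop mishap is easy to reproduce on a toy output buffer. In the sketch below the opcode values for push (141) and pop (142) come from the text; the value 143 for a one-operand `right1` instruction, the class `DviBuffer`, and the remedy of remembering where genuine push opcodes were written are assumptions made for illustration, not a description of the repair actually made in TEX.

```python
PUSH, POP, RIGHT1 = 141, 142, 143  # RIGHT1 (one operand byte) is an assumed opcode


def emit_pop_naive(buf):
    """Cancel a trailing push by looking only at the final byte (the bug)."""
    if buf and buf[-1] == PUSH:
        buf.pop()
    else:
        buf.append(POP)


class DviBuffer:
    """Remembers the positions of genuine, not yet cancelled push opcodes."""

    def __init__(self):
        self.data = []
        self.push_positions = []

    def emit(self, opcode, *operands):
        if opcode == PUSH:
            self.push_positions.append(len(self.data))
        self.data.append(opcode)
        self.data.extend(operands)

    def emit_pop(self):
        last = len(self.data) - 1
        if self.push_positions and self.push_positions[-1] == last:
            self.push_positions.pop()   # the final byte really is a push opcode
            self.data.pop()
        else:
            self.data.append(POP)


naive = [RIGHT1, 141]        # 'right1' whose operand byte happens to be 141
emit_pop_naive(naive)
print(naive)                 # [143]: the operand was mistaken for a push

safe = DviBuffer()
safe.emit(RIGHT1, 141)
safe.emit_pop()
print(safe.data)             # [143, 141, 142]
```

The position-tracking variant still cancels the long redundant sequence push push pop push push pop pop pop down to nothing, because each cancellation exposes the previous genuine push at the end of the buffer.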
The other S bug I want to discuss is truly an example of global misunderstanding,
because it arose in connection with my misperceptions about \global definitions in
TEX documents. Users can define control sequences such as \abc inside a TEX
‘group’, which is essentially a ‘block’ in the sense of Algol scope rules. At the end of
a group, local definitions are rescinded and control sequences revert to the meanings
they had at the beginning of the group. In my first implementation of TEX78 I went
even further: If \abc was defined inside a group but not before the group had begun,
I actually removed \abc from the hash table when the group ended.
There is one exception, however, to TEX’S local scope rules (and it is usually the
exceptions that lead to surprises). Users can state that a definition is \global; this
means that the new definition will survive at the end of the current group, unless it
has been globally redefined again. Therefore my implementation removed control
sequences from the hash table at group endings only when they had not been globally
defined.
That caused bug no. 422, which was identical to one of the first serious bugs I had
ever encountered when learning to program in the 1950s: deletions from an ‘open’ hash
table might make other keys inaccessible, unless the deletions occur in FIFO order,
or unless the deletion algorithm takes special precautions to relocate keys in the table.
(See my book Sorting and Searching,22 pp. 526-527, where I say, in italics, ‘The
obvious way to delete records from a scatter table doesn’t work.’) Alas, I had deleted
the control-sequence records in the ‘obvious way’ in TEX78, not realizing that global
definitions destroyed the FIFO order.
To fix bug no. 422, I could not patch the definition procedure by using Algorithm
6.4R from my book,22 because the organization of TEX did not allow for relocation of
keys. So I needed to change the hash table algorithm from linear probing to chaining,
which supports arbitrary deletions. This change was not as painful as it might have
been at this late date (August 1979), because I had needed an excuse anyway to
overcome my initial hash table design. In order to keep the original implementation
simple, I had decided to require that control sequence names be essentially unique
when restricted to their first six letters. Such a restriction was quite reasonable when
I was to be the only user of TEX; but it was becoming intolerable when the number
of users began to grow into the thousands. Therefore change no. 422 not only altered the
hash discipline, it also changed the entire representation mechanism so that identifiers of
arbitrary length could be accommodated.
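
The failure of the ‘obvious way’ to delete from an open (linear probing) table, and the reason chaining tolerates arbitrary deletions, can be seen in a toy model. Everything below is a sketch under assumptions: the table size, the deliberately weak `toy_hash`, and the class names are hypothetical and bear no relation to TEX’s actual hash table; the open table is also assumed never to become full.

```python
def toy_hash(key, size):
    """A weak hash in which anagrams such as \\abc and \\cba collide."""
    return sum(key.encode("ascii")) % size


class LinearProbingTable:
    def __init__(self, size=8):
        self.slots = [None] * size           # assumed never to fill up

    def insert(self, key):
        i = toy_hash(key, len(self.slots))
        while self.slots[i] is not None:      # probe for the next empty slot
            i = (i + 1) % len(self.slots)
        self.slots[i] = key

    def contains(self, key):
        i = toy_hash(key, len(self.slots))
        while self.slots[i] is not None:      # an empty slot ends the search
            if self.slots[i] == key:
                return True
            i = (i + 1) % len(self.slots)
        return False

    def delete_obvious(self, key):
        i = toy_hash(key, len(self.slots))
        while self.slots[i] is not None:
            if self.slots[i] == key:
                self.slots[i] = None          # leaves a hole in the probe path
                return
            i = (i + 1) % len(self.slots)


class ChainedTable:
    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def insert(self, key):
        self.buckets[toy_hash(key, len(self.buckets))].append(key)

    def contains(self, key):
        return key in self.buckets[toy_hash(key, len(self.buckets))]

    def delete(self, key):
        self.buckets[toy_hash(key, len(self.buckets))].remove(key)


open_table, chained = LinearProbingTable(), ChainedTable()
for name in ["\\abc", "\\cba"]:               # two colliding control sequences
    open_table.insert(name)
    chained.insert(name)
open_table.delete_obvious("\\abc")
chained.delete("\\abc")
print(open_table.contains("\\cba"))           # False: the key is still stored but unreachable
print(chained.contains("\\cba"))              # True
```

In the open table the second key is still physically present after the deletion; it has merely become inaccessible because the search stops at the hole.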
And that was not the end of the story. Another year and a half went by before I
realized (in no. 493) that TEX allows declarations like

\def\abc{...}
\global\def\xyz{...\abc...}

within a group. In such cases I could not eliminate \abc from the hash table at the
end of the group, because a reference to \abc still survived within \xyz. I finally
decided not to delete anything from the hash table (although I did provide a mechanism
to prevent unwanted keys from ever getting in; see nos. 294 and 769).
How did such serious bugs remain undetected for so long? They lay dormant because
normal usage of TEX does not require complicated interactions between local and
global definitions in groups. Most formatting is simpler than this; even complex books
such as The Art of Computer Programming and the TEX manual itself do not need such
generality. But if I had used the TRIP test methodology in the early days, I would
have found and corrected the local/global problems right at the start. This experience
suggests that all software systems be subjected to the meanest, nastiest torture tests
imaginable; otherwise they will almost certainly continue to exhibit bugs for years after
they have begun to produce satisfactory results in large applications.

T, Typographical trivia
The typographical errors of TEX were not especially significant, but I will mention
two of them (nos. 69 and 86), where my original SAIL code looked like this:

glueshrink(q) ← glueshrink(q) ← glueshrink(t);
x ← x ← width(q).

SAIL was written for the extended ASCII character set that once was widely used at
Stanford, MIT, CMU and a few other places; one of the important characters was
‘←’, for Algol’s ‘:=’. The language allowed multiple assignment, hence both of these
statements were syntactically correct (although rather silly).
A language designer straddles a narrow line between restrictiveness and permissive-
ness. If almost every sequence of characters is syntactically correct, the inevitable
typographical errors will almost never be detected. But if almost no sequences of
characters are syntactically correct, typing becomes a real pain.
In TEX78 I made a terrible decision (no. 402) to allow users to type a letter such as
‘a’ whenever TEX was expecting to see a number; the meaning was to use the ASCII
code of a (97) as the number. This extended the language for certain hacker-type
applications; but it caused all sorts of grief to ordinary users, because their typographical
errors were being treated as perfectly meaningful TEX input, and they could not figure
out what was going wrong. (I compounded the error in no. 507; see also no. 511. This
is a sorry part of the record.) TEX82 resolved the problem by using a special character
to introduce ASCII constants.

SOME NOTEWORTHY ENHANCEMENTS


Let us turn now to the other six kinds of errors in the log.

C, Clean-ups
The stickiest issue in TEX has always been the treatment of blank spaces. Users
tend to insert spaces in their computer files so that the files look nice, but document
processors must also treat spaces as objects that appear in the final output. Therefore,
when you see documents nowadays that have been prepared by systems other than
TEX, you often find cases where double spaces appear incorrectly between words; and
when you see documents prepared with TEX, you run into cases where a necessary
space between words has disappeared. I kept searching for rules that would be simple
enough to be easily learned, yet natural enough that they could be applied almost
unconsciously. I finally concluded that no such rules existed, and I opted for the best
compromise I could find.
Several of the log entries refer to the question of optional spaces after a macro
definition. In no. 133, I decided to ignore a space that appears there; this was prompted
by experiences recorded in my comments following nos. 115 and 119. But no. 133 caused
a timing problem in no. 560, because the macro definition had not been fully processed
when TEX wanted to check for the optional space; if the user invoked the macro
immediately, instead of putting a space there, TEX was not ready to respond. Finally
in no. 606 I came to the conclusion that TEX users will best be able to keep their sanity
if I do not ignore spaces after definitions; then dozens of similar-appearing cases all
have consistent rules.
(See also no. 220, for space after $$; nos. 361, 708, 720 and 723, for space after
constants; no. 440, for space after active characters; and no. 632, for space after ‘\␣’.)

G, Generalizations
TEX continued to grow new capabilities as people would present me with new
applications. When I could not handle the new problem nicely with the existing TEX,
I would usually end up changing the system. (But I kept the changes minimal, because
I always wanted to finish and get on with other things. More about that later.)
Such generalizations were often built incrementally on the shoulders of their prede-
cessors. For example, the original TEX78 had \output and \mark and macro
definitions, which scanned and remembered lists of tokens, but there was no good way
to assign a list of tokens to a ‘token list variable’ without causing macro expansion.
Then TEX82 added a feature called \everypar, which Arthur Keller had long been
lobbying for. One day I noticed that I could solve a user’s problem in a tricky way by
temporarily using \everypar to store a list of tokens. This was quite different from
the intended use of \everypar, of course; so I introduced a new primitive operation
called \tokens for such purposes (no. 559). Later, \everypar spawned several
descendants called \everymath and \everydisplay (no. 568), \everyhbox and
\everyvbox (no. 649), \everyjob (no. 657), \everycr (no. 688). I eventually found
applications where \tokens was not enough by itself and I needed to borrow one of
the \every features temporarily to do some non-standard hackery. So I finally replaced
\tokens by an array of 256 registers called \toks (no. 713), analogous to TEX’s
existing arrays of registers for integers, dimensions, boxes and glue. TEX82 also
acquired the ability to make assignments between different kinds of token-list variables
(no. 746). In such ways I tried to keep the design ‘orthogonal’ as the language grew.
Of course every language designer likes to keep a language simple by applying
Occam’s razor. I was pleased to discover early in 1977 that simple primitive operations
involving boxes, glue and penalties could account for many of the fundamental oper-
ations of typesetting. This was a real unification of basic principles, and it turned out
to be even better when I realized that the concepts of ordinary line-breaking applied
also to tasks that seemed much harder.13 But I also fooled myself into thinking that
TEX had fewer primitives than it really did, by ‘overloading’ operations that were
essentially independent and calling them single features.
For example, my original design of TEX78 would break paragraphs into lines by
ignoring all lines whose badness exceeded 200. Later (no. 104) I made this threshold
value user-settable by introducing a new primitive called \jpar. Setting \jpar=2
was something like setting \tolerance=200 in TEX82; but I also included a peculiar
new convention: if \jpar was odd, the paragraphs would be set with ragged right
margins, otherwise they would be justified to the full width!
Thus, in my attempt to minimize primitives, I had loaded two independent ideas
onto a single parameter. I had also packed half a dozen different kinds of diagnostic
output into a single number called \tracing (see no. 199), whose binary digits were
examined individually when TEX was deciding whether to trace parts of its operations.
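
The cost of this kind of overloading shows up as soon as the packed values are decoded. The sketch below is hypothetical throughout: the factor of 100 relating \jpar to the badness threshold is inferred only from the remark that \jpar=2 resembled \tolerance=200, and the assignment of trace categories to bits is invented for illustration.

```python
TRACE_FLAGS = ("paragraphs", "pages", "macros")  # hypothetical bit assignment


def decode_jpar(jpar):
    """One number carries two unrelated meanings: threshold and parity."""
    return {"tolerance": 100 * jpar,             # assumed scaling, see lead-in
            "ragged_right": jpar % 2 == 1}


def decode_tracing(tracing):
    """Each binary digit of the number switches one kind of diagnostic."""
    return {name: bool((tracing >> bit) & 1)
            for bit, name in enumerate(TRACE_FLAGS)}


print(decode_jpar(2))     # {'tolerance': 200, 'ragged_right': False}
print(decode_jpar(3))     # {'tolerance': 300, 'ragged_right': True}
print(decode_tracing(5))  # {'paragraphs': True, 'pages': False, 'macros': True}
```

Under this encoding a user cannot ask for ragged-right setting at exactly the threshold 200, because raggedness is tied to the parity of the same number; separate parameters remove that coupling.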
Then I began to see the need for more user-settable numbers, and I shuddered to
think at the resultant multiplicity of new primitives. So I replaced both \jpar and
\tracing by a single primitive called \chpar (no. 244); one could now
say, for example, \chpar1=2 instead of \jpar=2. This change gave me the
courage to add new parameters for hyphen penalties, etc., and I even added a new
parameter to control the raggedness of right margins (no. 334). Now the parity of
\jpar was irrelevant; henceforth, the right margins could be either straight or ragged,
or they could be produced using some smoothly varying compromise between those
extremes--‘one third of the way to full raggedness.’
My decision to introduce \chpar in TEX78 was not too bad, because TEX is a
macro language and I could immediately define \jpar and \tracing as abbreviations
for \chpar1 and \chpar2. But still, those arbitrary numerical codes were inelegant.
TEX82 now has fifty different primitive operations that denote integer-valued par-
ameters, each with standard (but user-changeable) names. The old \jpar has become
\tolerance and \pretolerance. The old \tracing has been unbundled into
\tracingparagraphs, \tracingpages, \tracingmacros, and half a dozen more,
with separate parameters such as \showboxdepth to govern the amount of display.

I, Interactions
About 15 per cent of the errors in the TEX log have been classified type I. The
main issue in such cases is to help users identify and recover from errors in their source
programs, and this is always problematical because there are so many ways to make
mistakes. ‘When your error is due to misunderstanding rather than mistyping, . . . TEX
can only explain what looks wrong from its own viewpoint; such an explanation is
bound to be mysterious unless you understand the machine’s attitude’.15 Which you
don’t.
Still, I kept trying to make TEX respond more productively, and every such change
was logged as an ‘error’ in my original design. The most memorable error of this type
was probably no. 213, when I first realized how nice it would be if I could insert a
token or two that TEX could read immediately, instead of aborting a run and starting
from scratch. (This was soon followed by no. 242, when deletion of tokens was also
allowed in response to an error message.) I would never have thought of these
improvements if I had not participated in the implementation and testing of TEX, and
I have often wished for similar features in the compilers I have used since. This one
feature must have saved me hundreds of hours as a TEX user during recent years.
Another improvement in interaction did not occur to me until several months and
several hundred pages of output later. Error no. 338 records the blessed day when I
gave TEX the ability to track ‘runaways’, parts of the program that were being processed
in the wrong mode because of missing right delimiters. (Further refinements to that
change were logged as entry nos. 344, 426 and 793.) Without such provisions, errors
that TEX could not have detected until long after their appearance would have been
much harder to track down.
There was another significant improvement in interaction that never made it into
my error log, because I included it in the original TEX82 without ever putting it into
TEX78. This is the short_display procedure, for showing the contents of ‘overfull boxes’
and such things in an abbreviated form easily understood by novice users. The
short_display idea was invented by Ralph Stromquist, who installed it in his early
version of TEX at the University of Wisconsin.
P, Portability
The first changes of type P were simply enhancements to the comments in my SAIL
program, but the advent of WEB made it possible for TEX to become truly independent
of the machine and operating system it was being run on.
Change no. 633 is perhaps the most instructive class-P modification: I decided to
guarantee compatibility between DEC-like systems (which break the source file into
lines according to the appearance of ASCII carriage-return characters) and IBM-like
systems (which have fixed-length source lines reminiscent of 80-column cards),* in the
following way: whenever TEX reads a line of input, on any system, it automatically
removes all blank spaces that appear at the right end. The presence or absence of such
blanks therefore cannot influence the behaviour of TEX in any way. An ASCII file
whose lines are at most 80 characters long (as defined by carriage returns, with or
without blanks in front of those carriage returns) can be converted to a file of 80-
character records that will produce identical results with TEX, simply by padding each
line with blanks.
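
The rule of change no. 633 amounts to a one-line normalization, sketched here in Python. The helper names `normalize_line` and `to_card_records` and the sample lines are hypothetical; the sketch only illustrates that a carriage-return-delimited file and its blank-padded 80-character counterpart look the same once trailing blanks are removed.

```python
def normalize_line(line):
    """Remove all blank spaces at the right end of an input line."""
    return line.rstrip(" ")


def to_card_records(lines, width=80):
    """Pad each line with blanks to a fixed-length record (IBM-like)."""
    return [line.ljust(width) for line in lines]


dec_like = ["\\hbox{a} ", "b", ""]       # with or without blanks before the line end
ibm_like = to_card_records(dec_like)     # every record is exactly 80 characters

print([len(record) for record in ibm_like])
print([normalize_line(x) for x in dec_like] ==
      [normalize_line(x) for x in ibm_like])
```

Only blanks are stripped; other trailing characters are left alone, so the content of a line that ends in a non-blank character is unchanged.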
Change no. 791 carried no. 633 to its logical conclusion.

Q, Quality
From the beginning, I wanted TEX to produce documents of the highest possible
typographical quality. The time had come when computer-produced output no longer
needed to settle for being only ‘pretty good’; I wanted to equal or exceed the quality
of the best books ever printed by photographic methods.
As Kernighan and Cherry have said, ‘The main difficulty is in finding the right
numbers to use for esthetically pleasing positioning. . . . Much of this time has gone
into two things-fine-tuning (what is the most esthetically pleasing space to use between
the numerator and denominator of a fraction?), and changing things found deficient
by our users (shouldn’t a tilde be a delimiter?)’.23
I too had trouble with numerators and denominators: change no. 229 increased the
amount of space surrounding the bar line in displayed fractions, and I should have
made a similar change to fractions in text. (Page 68 of the new Volume 2 turned out
to be extremely ugly because of badly spaced fractions.) TEX82 was able to improve
the situation because of my experiences with TEX78, but even today I must take
special precautions in my TEX documents to get certain square roots to look right.

THE EVOLUTION PROCESS AS A WHOLE


Looking now at the entire log of errors, I am struck by the fact that my attitude during
those years was clearly far from ideal: my overriding goal was always to finish, to
finish, to get this long-overdue project done so that I could resume work on other
long-overdue projects. I never wanted to spend extra time studying alternatives for the
best possible typesetting language; only rarely was I in a mood to consider any changes

*Paradoxically, DEC has also introduced the VMS operating system, which has fixed-length lines that can include
troublesome carriage-returns. But that is another story.
to TEX whatsoever. I wanted TEX to produce the highest quality, sure, but I wanted
to achieve that with the minimum amount of work on my part.
At the end of almost every day between 29 March 1978 and 29 March 1980, I felt
that TEX78 was a complete system, containing no bugs and needing no further
enhancements. At the end of almost every day since 9 September 1982, I have felt
that TEX82 was a complete system, containing no bugs and needing no further
enhancements. Each of the subsequent steps in the evolution of TEX has been viewed
not as an evolutionary step towards a vague distant goal, but rather as the final
evolutionary step towards the finally reached goal! Yet, over time, TEX has changed
dramatically as a result of many such ‘final steps’.
Was this horizon-limiting attitude harmful, or was it somehow a blessing in disguise?
I am pleased to see that TEX actually kept getting simpler as it kept growing, because
the new features blended with the old ones. I was constantly bombarded by ideas for
extensions, and I was constantly turning a deaf ear to everything that did not fit well
with TEX as I conceived it at the time. Thus TEX converged, rather than diverged,
to its final form. By acting as an extremely conservative filter, and by believing that
the system was always complete, I was perhaps able to save TEX from the ‘creeping
featurism’ that destroys systems whose users are allowed to introduce a patchwork of
loosely connected ideas.
If I had time to spend another ten years developing a system with the same aims as
TEX-if I were to start all over again from scratch, without any considerations of
compatibility with existing systems-I could no doubt come up with something that
is marginally better. But at the moment I cannot think of any big improvements. The
best such system I can envision today would still look very much like TEX82; so I
think this particular case study in program evolution has proved to be successful.
Of course I do not mean to imply that all problems of typography have been solved.
Far from it! There still are countless important issues to be studied, relating especially
to the many classes of documents that go far beyond what I ever intended TEX to
handle.

CONCLUSIONS
My purpose in this paper has been to describe what I think are the most significant
aspects of the experiences I had while developing TEX, basing this on a study of more
than 800 errors that I noted down in log books over the years. I have tried to interpret
many specific facts and observations in a sufficiently general way that readers may
understand how to apply similar concepts to other software developments.
In Volume 1 of The Art of Computer Programming, I wrote:

Debugging is an art that needs much further study . . . The most effective
debugging techniques seem to be those which are designed and built into
the program itself . . . Another good debugging practice is to keep a record
of every mistake that is made. Even though this will probably be quite
embarrassing, such information is invaluable to anyone doing research on
the debugging problem, and it will also help you learn how to reduce the
number of future errors.

Well, I hope that my error log in the appendix below, especially the first 237 items
(which relate specifically to debugging), will be useful somehow to people who study
the debugging process.
But if you ask whether keeping such a log has helped me learn how to reduce the
number of future errors, my answer has to be no. I kept a similar log for errors in
METAFONT, and there was no perceivable reduction. I continue to make the same
kinds of mistakes.
What have I really learned, then? I think I have learned, primarily, to have a better
sense of balance and proportion. I now understand the complexities of a medium-size
software system, and the ways in which it can be expected to evolve. I now understand
that there are so many kinds of errors, we cannot stamp them out by systematically
eliminating everything that might be ‘considered harmful’. I now understand enough
about my propensity to err that I can accept it as a fact of life; I can now be convinced
more easily of my fallacy when I have made a mistake. Indeed, I now strive energetically
to find faults in my own work, even though it would be much easier to look for
assurances that everything is OK. I now look forward to making (and correcting)
hundreds of future errors as I write Volume 4 of The Art of Computer Programming.

ADDENDUM: FIFTEEN MONTHS MORE
As I mentioned above, I began to write this paper in May 1987, but I decided to wait
before publication until more time had gone by. Then I could present a ‘complete’
and ‘final’ record of TEX’S errors.
Now it is September 1988, and I have decided to bring this paper to a possibly
premature conclusion, because I am scheduled to present it at a conference. TEX
still has not shown encouraging signs of becoming quiescent; indeed, sixteen more
entries have entered the error log since May 1987, including three as recent as June
1988. Therefore it still is not the right moment to manufacture TEX on a chip!
All errors known to me as of 1 September 1988, are now included in the appendix
to this paper; the total has now reached 865.* I plan to publish a brief note ten years
from now, bringing the list to its absolutely final form.
I have been paying a reward to everyone who discovers new bugs in TEX, and
doubling the amount every year. Last December I made two payments of $40.96 each,
and my chequebook has been hit for five $81.92 payments in recent months. I am
desperately hoping that this incentive to discover the final bugs will produce them
before I am unable to pay the promised amount. (Surely in 1998 I won’t be writing
cheques for $83,886.08?)
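
The arithmetic behind that worry is a simple doubling. In the sketch below the function name `reward` is hypothetical, and treating $81.92 as the amount for 1988 is an assumption based on the payments mentioned above.

```python
from decimal import Decimal


def reward(year, base_year=1988, base_amount=Decimal("81.92")):
    """Reward in dollars if the amount doubles every year (assumed base: 1988)."""
    return base_amount * Decimal(2) ** (year - base_year)


print(reward(1987))  # the $40.96 payments of the previous December
print(reward(1998))  # ten further doublings: 81.92 * 1024 = 83886.08
```

Ten doublings multiply the amount by 1024, which is how $81.92 becomes $83,886.08.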
As I expected, half of the most recent errors have fallen into the surprise (S)
category-even though surprises, by definition, are unexpected. But one of the others
(error no.854) was perhaps the most surprising of all, because it was the result of a
terrible algorithm by a person who certainly should have known better (me). I wanted
to multiply the two’s-complement fixed-point number

$$A = -16 + a_1 \times 2^{-4} + a_2 \times 2^{-12} + a_3 \times 2^{-20}, \qquad 0 \le a_i < 256$$

*Errors 866 and 867 were added after this paper was first submitted.

by the positive quantity $Z/2^{16}$, where $Z$ is an integer, $2^{26} \le Z < 2^{27}$, obtaining an
answer of the form $P/2^{16}$ where $P$ is an integer, $|P| < 2^{31}$; all intermediate quantities
in the calculation were required to be less than $2^{31}$ in absolute value. My program did
this by computing

$C \leftarrow 16 * Z$;
$Z \leftarrow Z \operatorname{div} 16$;
$P \leftarrow ((a_3 * Z) \operatorname{div} 256 + a_2 * Z) \operatorname{div} 256 + a_1 * Z - C$;

I should rather have computed

$Z \leftarrow Z \operatorname{div} 16$;
$P \leftarrow ((a_3 * Z) \operatorname{div} 256 + a_2 * Z) \operatorname{div} 256 + (a_1 - 256) * Z$

(Consider, for example, the case $Z = 2^{26} + 15$ and $a_1 = a_2 = a_3 = 255$, so that $A = -2^{-20}$.
The first method gives $P = -304$; the second method gives the correct answer,
$P = -64$.)
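
Both computations are short enough to check directly. The Python sketch below is a transcription under one assumption: Python’s `//` is used for div, which agrees with truncating integer division here because every dividend is nonnegative. The function names are hypothetical.

```python
def product_original(a1, a2, a3, z):
    """The first method: C is formed from Z before Z is truncated."""
    c = 16 * z
    z = z // 16
    return ((a3 * z) // 256 + a2 * z) // 256 + a1 * z - c


def product_corrected(a1, a2, a3, z):
    """The second method: the -16 term is folded into (a1 - 256) * Z."""
    z = z // 16
    return ((a3 * z) // 256 + a2 * z) // 256 + (a1 - 256) * z


z = 2**26 + 15
print(product_original(255, 255, 255, z))   # -304
print(product_corrected(255, 255, 255, z))  # -64
```

The two results differ by 240, which is 16 × (Z mod 16): the first method subtracts 16 times the full Z while every other term uses the truncated value of Z div 16.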
Let me close by discussing one more recent error, no. 864. This change yields only
a slight gain in efficiency, so I need not have made it; but it was easy to correct one
more statement while I was fixing no. 863. It is an instructive example of how a design
methodology based on invariants might not lead to the best algorithm unless we think
a bit harder about what is going on.
Here is the idea: each run of TEX determines a threshold value O above which the
(one-word) charnodes will reside, below which all other (variable-size) nodes will be
stored. Actually there are two values, €lo and 8 , ; memory positions between 8" and O1
are empty. (In the program, O0 is actually called lo-mem-max, and 8, is called
himem-min.) TEX changes Oo and 8, conservatively as it runs, so that they will
converge to values appropriate to particular applications. The boundary value 8 was
originally fixed at compile time; this transition to 'late binding' was change no. 819.
When TEX needs more space for charnodes, it usually sets 8' + O1 - 1; when TEX
needs more space for variable-size nodes, it usually sets Oo + O0 + 1000. But we need
to have O0 < 8,. Therefore, instead of setting Oo t €lo + 1000, my original code said

if el - eo > 1000 then 8, + 8, + 1000


else if 8, - 8, > 2 then 8, t (e, + 8, + 2) div 2
else (Report memory overflow).

(The variable θ0 had to increase by at least 2.) Chris Thompson of Cambridge
University pointed out that this strategy, although preserving the necessary invariants,
is discontinuous. If θ1 - θ0 = 1001, the algorithm gobbles up all the discretionary
space that is left. Therefore change no. 864 substituted better logic:

if θ1 - θ0 ≥ 1998 then θ0 ← θ0 + 1000
else if θ1 - θ0 > 2 then θ0 ← θ0 + 1 + (θ1 - θ0) div 2
else ⟨Report memory overflow⟩.

The new version also avoids problems on certain computers when θ0 and θ1 are
negative; that was error no. 863. (Of course, when TeX is this close to running out of
memory, it probably will not survive much longer anyway. I am grasping at straws.
But I might as well grasp intelligently.)
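
The discontinuity that Chris Thompson noticed is easy to see by running both rules on the same gap. The sketch below is only an illustration in Python, not the TeX source: theta0 and theta1 play the roles of lo_mem_max and hi_mem_min, and the function and exception names are hypothetical.

```python
# Illustrative sketch of the two growth rules quoted above; not the TeX source.
# theta0 plays the role of lo_mem_max and theta1 the role of hi_mem_min.


class MemoryOverflow(Exception):
    """Raised when no discretionary space is left between the two regions."""


def grow_original(theta0, theta1):
    """Rule in use before change no. 864."""
    if theta1 - theta0 > 1000:
        return theta0 + 1000
    if theta1 - theta0 > 2:
        return (theta0 + theta1 + 2) // 2
    raise MemoryOverflow


def grow_revised(theta0, theta1):
    """Rule substituted by change no. 864."""
    if theta1 - theta0 >= 1998:
        return theta0 + 1000
    if theta1 - theta0 > 2:
        return theta0 + 1 + (theta1 - theta0) // 2
    raise MemoryOverflow


# With a gap of 1001 the original rule leaves a single free position,
# while the revised rule takes roughly half of what remains.
print(grow_original(0, 1001))  # 1000
print(grow_revised(0, 1001))   # 501
```

The revised rule only ever divides the positive gap θ1 - θ0, which is presumably why it sidesteps machine-dependent division when θ0 and θ1 themselves are negative.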

ACKNOWLEDGEMENTS

I have already mentioned that the TEX project has had hundreds of volunteers who
helped to guide me through all these developments. Their names can be found in the
rosters of the TEX Users Group; I couldn’t possibly list them all here. Luis Trabb
Pardo and David R. Fuchs were my ‘right-hand men’ for TeX78 and TeX82, respectively.
The project received generous financial backing from several independent sources,
notably the System Development Foundation, the U.S. National Science Foundation,
and the Office of Naval Research. The material on which this report has been based
is now housed in the Stanford University Archives; I wish to thank the archivist,
Roxanne L. Nilan, for her friendly co-operation. The preparation of this paper has
been supported by U. S. National Science Foundation grant CCR-86-10181. Thanks
are due to the referee who helped me to remove errors not from TEX but from this
paper. And above all, I want to thank my wife, Jill, for ten years of exceptional
tolerance; software development is much more demanding than the other things I
usually do. Jill also helped me to design the format for the appendix that follows.

APPENDIX: THE COMPLETE ERROR LOG


Each entry is numbered and cross-referenced (where possible) to other entries and to
the TEX program, as explained in the text above. Sometimes I have given credit to
the person who detected the error or suggested the change, but (alas) I did not always
remember to note such information down. Here are the initials of people who made
so many contributions that I have abbreviated their names in the log entries:
ARK Arthur Keller
CET Chris Thompson
DRF David Fuchs
FY Frank Yellin
HWT Howard Trickey
JS Jim Sterken
LL Leslie Lamport
MDS Mike Spivak

10 Mar 1978
1 Rename a few external variables to make their first six letters unique. L
2 Initialize escape_char to -1, not 0 [it will be set to the first character input]. §240 D
3 Fix bug: The test ‘id < '200’ was supposed to distinguish one-letter identifiers
from longer (packed) ones, but negative values of id also pass this test. §356 L
4 Fix bug: I wrote ‘while α ∧ (β ∨ γ)’ when I meant ‘while (α ∧ β) ∨ γ’. §259 B
5 Initialize the input routines in INITEX [at this time a short, separate program
not under user control], in case errors occur. §1337 R
6 Don’t initialize mem in INITEX, it wastes time. §164 E
7 Change ‘new_line’ [which denotes a lexical scanning state] to ‘next_line’ [which
denotes carriage-return and line-feed] in print commands. B
8 Include additional test ‘mem[p] ≠ 0 ∧’ in check_mem. §168 F
9 Fix inconsistency between the eq_level conventions of macro_def and eq_define. §277 M
About six hours of debugging time today.
INITEX appears to work, and the test routine got through start_input, chcode
[the TeX78 command for assigning a cat_code], get_next, and back_input the
first time.
11 Mar 1978
10 Insert space before ‘(’ on terminal when opening a new file. §537 I
11 Put ‘p ← link(p)’ into the loop of show_token_list, so that it doesn’t loop
forever. §292 F
12 Shift the last item found by scan_toks into the info field. [With SAIL all packing
of fields was done by arithmetic operations, not by the compiler.] §474 L
12 ↦ 13 Fix the previous bugfix: I shifted by the wrong amount. §474 B
14 Add a feature that prints a warning when the end of a file page occurs within
a macro definition or call. [System dependent.] §336 I
Unintended bugs in my test routine [a format intended eventually to typeset
The Art of Computer Programming] helped check out the error recovery
mechanisms. For example, I had ‘\lft{#}’ instead of ‘\lft{##}’ inside a
macro, and three cases of improper { and } nesting.
15 Add the forgotten case ‘set_font:’ to eq_destroy. §275 F
16 Change \require to \input. §376 C
17 Add code for the case cur_cmd = 0 [later known as the case ‘t ≥ cs_token_flag’]
when scanning a tokenlist. §357 F
That’s the first “big” error I’ve spotted so far.
18 Introduce a ‘d’ option in the error routine, to facilitate debugging. §84 I
19 Assign a floating-point constant ignore_depth to prev_depth, instead of assigning
the integer constant flag [since prev_depth is type real in TeX78]. §215 L
20 Improve the readability and spacing of show_node_list output. §182,187 I
21 Set the variable v before using the case construction in show_node_list, because
there’s one case where v didn’t receive a value [as part of the field unpacking]. §182 F
About seven hours today.
12 Mar 1978
One hour to enter yesterday’s corrections and recompile.
At this point TeX correctly located further unintended syntax errors in acphdr
[the test file].
22 Insert debug_help into succumb, giving a chance to look at memory before the
system dies. §93 I
23 Use eq_destroy wherever necessary in unsave. §283 D
24 Change ‘t ← (t - 1) mod 8’ to ‘t ← (t - 1) land 7’ in id_name, since SAIL
has -1 mod 8 = -1. [At this time, id_name is a routine that unpacks
control sequence names, according to a scheme that will become obsolete
after change #422.] L
25 Remove the space that appears at end of paragraph. (I hadn’t anticipated
that.) §816 S
26 Throw away unwanted line-feed after getting a carriage-return in response to
in_chr_w [a system routine for input from the terminal]. §83 L
27 Delete spurious call to flush_list in end_token_list. §324 B
Why did I make such a silly mistake?
28 Fix bug in get_x_token: I forgot to say ‘macro_call’ (which is the main point of
that routine)! §380 F
While tracking that bug down, I found out incidentally that kerning is okay.
Also TeX correctly caught an error 0p for 0pt.
29 Fix bug in scan_spec (while instead of repeat). §404 L
30 Make the table entries for \hfill and \hskip consistent with the program
conventions. §1058 M
31 Disable unforeseen coercion: When scan_spec put hsize on save_stack, the value
changed from real to integer. §645 L
32 Use ‘*’ instead of ‘-1.0’ for running dimensions of rules in show_node_list. §176 I
33 Clear mem[head] to null in push_nest [in TeX82, this will be done by get_avail]. §216 D
A vrule link got clobbered because I forgot to do this.
34 Translate ASCII control codes to special form when displaying them. §48,68 I
Ligatures work, but show_node_list showed them funny.
35 Remember to clear parameters off save_stack in package routine. §1086 F
About eight hours today.
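
Entry 24 above turns on how a remainder behaves for a negative dividend. The following Python sketch is illustrative only: trunc_mod is a hypothetical helper that mimics the truncating behaviour the entry reports for SAIL (-1 mod 8 = -1), and mask7 stands in for ‘land 7’.

```python
# Illustrative sketch: a remainder that truncates toward zero (as entry 24
# reports for SAIL) versus masking with 7. Both helpers are hypothetical.
import math


def trunc_mod(a, n):
    """Remainder with the sign of the dividend, so trunc_mod(-1, 8) is -1."""
    return int(math.fmod(a, n))


def mask7(a):
    """Bitwise 'land 7' on a two's-complement integer."""
    return a & 7


for t in (2, 1, 0):
    print(t, trunc_mod(t - 1, 8), mask7(t - 1))
# 2 1 1
# 1 0 0
# 0 -1 7
```

Only the case t = 0 differs, which is the wrap-around case the change was meant to handle.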
13 Mar 1978
36 Introduce a new variable hang_first [later the sign of hang_after]. §849 D
36 ↦ 37 Simplify the new code, realizing that if hang_indent = 0 then hang_first is
irrelevant. §848 E
Time sharing is very slow today, so I’m mostly reading technical reports while
waiting three hours for compiler, editor, and loading routine.
I’m not counting this as debugging time!
(Came back in the evening.)
38 Spruce up the comments in the line_break routine, which appears to be almost
working. §813 P
39 Rethink the setting of best_line; it’s 1 too high in many cases. [The final line
of a paragraph was handled in a treacherous way.] §874 D
40 Compute proper initialization for prev_depth when beginning an \hbox with
a paragraph inside. [This refers to a special ‘paragraph box’ construction,
used when an hbox of specified size becomes overfull; TeX78 doesn’t have
the concept of internal vertical mode.] §1083 D
41 Also initialize tail in that case. §1083 D
42 Also put the result of line-breaking into the correct list. M
43 Fix a typo in the free_node routine (‘link’ not ‘llink’); by strange chance it had
been harmless until today. §130 T
44 Fix bug: post_line_break forgot to set adjust_tail. §889 F
45 Update act_width properly when looking for end of word while line breaking. §866 D
46 Repair the “tricky” part of get_node: I used the info field when I meant to say
llink. §127 B
Now the \corners macro of acphdr works! [See \setcornerrules in The
TeXbook, page 417.]

47 Reset contrib_tail properly in build_page. §995 D
48 Fix typo (- for +) in computation of page_total. §1004 T
49 Change the page-breaking logic: TeX reached fire_up with best_page_break =
null in one case, since the badness was too bad. §1005 S
50 Perform the operation delete_token_ref(top_mark) only when top_mark ≠ 0. §1012 M
51 Make scan_toks omit the initial { of an \output routine. §473 F
52 Insert a comma to make memory usage statistics look better. §639 I
About seven good hours of debugging today.
Tomorrow will be first-output day (I hope).
14 Mar 1978
(Came in evening after sleeping most of day, to get computer at better time.)
(Some day we will have personal computers and will live more normally.)
8:30pm, began to enter corrections to yesterday’s problems.
53 Issue an error message for non-character in filename or in font name. §771 I
54 Display ‘...’ for omitted stuff in show_context routine. §643 I
55 Watch out for the SAIL syntax ‘α + β lsh γ’; it doesn’t shift α + β left (only β). §464 L
That error was very hard to track down; it created a spurious link field and
sent hash[0] = \beta to the scanner!
I could have found this bug an hour sooner if I had looked at the correct stack
entries for name and token_type.
56 Show the correct page number when tracing pages before output is shipped. §638 D
57 Remember to nullify a box after using it. §1079 F
58 Issue an error message if \box255 isn’t consumed by the output routine. §1015 I
I’m having trouble with the BAIL debugger; it makes an illegal memory reference
and dies, when single-stepping past the entry to recursive procedures
hlist_out and vlist_out. So I have to reload and be careful to go thru these
procedures at high speed.
59 Fix bug in comment (memory parameter description said 2 not 5). §11 P
60 Fix typo in definition of rule output (said x, y not x0, y0). [This part of the
code went away when DVI files were introduced.] B
61 Correct the embarrassing bug in shellsort, where I said ‘≤ str[k]’ not ‘≤ t’.
[The first TeX had to sort all output by vertical position on page.] B
62 Make start_input set up job_name in the form needed by shipout; it uses
obsolete conventions. §532,537 M
63 Insert ( and ) into the SAIL macro definition of newstring. [This macro was
for pre-DVI output.] L
64 Unscramble the parameters of out_rule: The declaration was (x0, y0, x1, y1)
while the call was (x0, x1, y0, y1). M
4:30am, TeX’s first page is successfully output!
(It was ‘\titlepage \setcpage1 \corners \eject \end’.)
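
The precedence trap in entry 55 (and again in entry 176) can be made concrete with arbitrary numbers. This Python sketch is only an illustration: `<<` stands in for SAIL's ‘lsh’, the values are invented, and the SAIL reading is taken from the log entry rather than from any SAIL reference.

```python
# Illustrative sketch of the precedence trap in entry 55, using Python's <<
# for 'lsh'. The values are arbitrary.

alpha, beta, gamma = 1, 2, 4

intended = (alpha + beta) << gamma      # shift the whole sum left
sail_reading = alpha + (beta << gamma)  # only beta is shifted, as the entry describes

print(intended)      # 48
print(sail_reading)  # 33
```

In Python the unparenthesized form `alpha + beta << gamma` happens to parse as the intended one, because + binds more tightly than <<; writing the parentheses explicitly avoids depending on either language's rule.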
15 Mar 1978
10:30pm. Today I’m instrumenting the line-breaking routine and putting it
through a bunch of tests.
(The inserted instrumentation had bugs that won’t be mentioned here.)
65 Don’t abort the job when eq_destroy redefines a control sequence. §275 C
The first word of a paragraph won’t be hyphenated ... so be it!
66 Fix the typo in line_break that spoils the test for ‘letters in the same font’. §896 T
The effect of that typo was to suppress all hyphenation attempts.
25 ↦ 67 Replace the space at paragraph end by fillglue, not by zero. §816 B
68 Pack the hyphen character properly into its node. §582 L
69 Fix a typo (‘+’ for ‘+’) in the computation of break_width. §838 T
70 Change the \end maneuver; the present code doesn’t end the job, since I forgot
that back_input uses cur_tok. §1054 M
71 Add a parameter to try_break, since the width is different at a discretionary
hyphen. [This problem will be solved differently in TeX82, when discretionaries
become much more general.] §840 A
72 Bypass kern nodes in pre-hyphenation. §896 F
73 Supply code for the forgotten case ‘< "a"’ in pre-hyphenation. [This case was
later generalized to a test of lc_code.] §897 F
74 Change mem[q] to prev_break(q) in the reverse-linking loop. §878 B
(Such blunders. Am I getting feeble-minded?)
75 Introduce special logic for eject_penalty; I was wrong to think that forced
ejection was exactly like an infinitely negative penalty. §851 A
76 Use (1 + b)^2 - p^2 when computing demerits with p < 0. §859 A
6:30am. The line-breaking algorithm appears to be working fine and efficiently.
On small measures (about 20 characters per line), it gives overfull boxes
instead of spaced out ones. Surprising but satisfactory.
16 Mar 1978
9pm. The plan for tonight is to test page breaking and more paragraphing.
77 Insert ‘\topskip’ glue at beginning of page. §1001 G
78 Add ‘\pausing’ feature. §363 I
79 Fix discrepancy: In make_accent I called vpackage with a pointer to the first
list item, but vpackage itself assumes that the parameter is a pointer to that
pointer. [The vpackage of TeX82 will be different.] §668 M
I checked for other lapses like that. Result: 14 calls OK, 12 NG.
80 Create a new temporary list head location, hold_head, since there’s a case
where vpackage is improperly called with parameter temp_head. [At this
time vpackage uses temp_head to make a list of all insertions found.] §1014 M
11:30pm. The machine is tied up again.
81 Write code to handle charnodes in vlists; I forgot that I’d decided to allow
them. [Later I prohibited them again!] §669 F
82 Combine the page lists before pruning off glue in fire_up; otherwise the pruning
doesn’t go far enough. §1017 F
25 ↦ 83 Fix typo where line-breaking starts: ‘fill_glue’ should be ‘new_glue(fill_glue)’. §816 B
84 Add /q to xspool command (cosmetic change). [This changes a system
command that causes output to be printed on the Xerox Graphics Printer
(XGP), the progenitor of future laser printers; the /q option says that the
queue of printing requests should be displayed on the user’s terminal.] §642 I
85 Don’t write a form feed after the last page of output. §642 E
To fix this, I reorganized ship_out, and it became simpler.
86 Correct a typo (‘+’ for ‘+’) in the vlist_node case within hlist_out. [The output
routines were quite different at this time, because output went directly to
the XGP.] §622 T
87 Change the message ‘completed page’ to ‘Completed for page’. §638 I
48 ↦ 88 Fix yet another typo in the computation of page_total: My original code said
stretch(p) instead of stretch(q) (terrible). §1004 B
89 Document the dirty trick about bot_mark’s reference count. [That trick is,
fortunately, no longer useful.] §1016 P
90 Rethink the algorithm for contributing an insertion: The original code tests
for a page break after incrementing the totals but before the contrib_list
is updated. [TeX78 handles insertions in a hardwired manner that will be
greatly generalized in TeX82.] A

91 Fix get_node again: After the variable memory overflows, control falls through
to found instead of going to the overflow call. §125 F
I spent several hours tracking down that data structure bug!
92 Change new_line to next_line in yet another print command (see #7). B
75 ↦ 93 Amend the line-breaking algorithm: \break in paragraph doesn’t work with
really bad breaks. §851 A
A problem to be diagnosed tomorrow: Each time I run the test program, the
amount of memory in use grows by 13 cells not returned.
Seven hours tonight.
17 Mar 1978
94 Introduce dead_cycles to keep \end active until ship_out occurs. §1054 G
95 Don’t call line_break with an empty list. §1096 E
96 Take proper account of the (infinite) fillglue when computing the width of a
paragraph line preceding a display. §1146 S
97 Add a new parameter to hpack so that line_break won’t be called at the wrong
time. [This is for the soon-to-be-obsolete feature described in #40.] M
98 Give a warning message if there’s an \hfill in the middle of a paragraph;
fillglue upsets the line breaker, because floating-point calculations don’t have
sufficient accuracy. §869 L
I spent an hour looking for another bug in TeX, but the following one was in
METAFONT: The xgp_height data in fonts had been supplied wrong.
It took two hours to recompile 32 fonts with proto-METAFONT.
99 Make show_node_list and show_token_list more robust in the presence of
software bugs. §182 R
97 ↦ 100 Do not remove nodes with eject_penalty, when the new parameter to hpack is
true. D
97 ↦ 101 Put a fast exit into hpack; e.g., at glue nodes, test ‘if paragraphing ∧ (current
width is large)’. E
2am. I have to go to bed “early” tonight.
18 Mar 1978
3:30pm. (Saturday)
102 Add a parameter to check_mem (to suppress display unless needed). §167 I
103 Introduce a user-settable parameter \maxdepth, and pass it as a parameter to
vpackage. §668 G
I realized the need for \maxdepth while fixing insertions (see #90).
104 Introduce a user-settable parameter for line_break: The constant 2.0 in my
original algorithm becomes \jpar [later \tolerance], to be set like \tracing. §828 G
105 Reclaim the eject_penalty nodes removed during line-breaking. §879 D
(Those were the 13 extra nodes reported on Thursday.)
The init_align procedure worked right the first time!
Also init_row, init_col. But then...
106 Rethink the command codes: endv in a token list has too high a code for the
assumptions of get_next. §207 S
107 Add a prev_cmd variable for processing delimited macro parameters; the
original algorithm loses track of braces. [The rules will change slightly in TeX82,
and rbrace_ptr will take on a similar function.] §400 A
108 Make the get_next routine intercept & and \cr tokens. §342 S
I’d thought I could just put & and \cr into big_switch [i.e., in the stomach of
TeX, not the eyes]; that was a great big mistake.
109 Make more error checks on endv; e.g., it must not occur in a macro definition
or call. §780 R
108 ↦ 110 No, rethink alignments again; the new program still fails! §768 S
For the first time I can glimpse the hairiness of alignment in general (e.g.,
‘\halign{\u#\v&...’ when \u and \v are defined to include &’s and possible
alignments themselves).
I think there’s a “simple” solution, by considering only whether an alignment
is currently active (in §342).
11:30pm. Went to bed.
19 Mar 1978
Woke up with “better” idea on how to handle & and \cr.
(Namely, to consider a special kind of \def whose parameters don’t interrupt
on &’s and \cr’s.)
But replaced this by a much better idea (to introduce align_state).
11pm. Began to use computer. Performed major surgery (inserting align_state
and updating the associated routines and documentation).
111 Pop the alignment stacks in fin_align. §800 D
110 ↦ 112 Fix a (newly inserted) typo in show_context. §314 T
110 ↦ 113 Set align_state false when a live & or \cr is found. [Originally align_state was
of type boolean.] §789 D
114 Insert \cr when ‘}’ occurs prematurely in an alignment. §1132 I
115 Remember to record glue_stretch when packaging an unset node. §796 M
I had a mistake in acphdr definition of \quoteformat; also extra spaces.
My first test programs, used before today, were contrived to test macro expan-
sion, line-breaking, and page layout.
Next I’m using a test program based on Volume 2.
116 Make carriage-return, space, and tab equivalent for macro matching. §348 C
117 Omit the reference count node when displaying a mark. §176 F
118 Correct a silly slip: I wrote ‘type_displacement’ instead of ‘value_displacement’
when packing data in a penalty node. §158 B
119 Don’t go to build_page after seeing \noindent; TeX isn’t ready for that. [In
the original program, this was an instance of a bad goto.] §1091 M
I had undesired spaces coming thru the scanner in my macro definitions of
\tenpoint (see The TeXbook, page 414).
4am. TeX now knows enough to typeset page 1 of Volume 2!
Also it did its first “math formula” (namely ‘$X$’) without crucial error.
(Except that the italic correction was missing for some reason.)
120 Remember to decrement cur_level in fin_align. [The routines will eventually
become more general and use unsave here.] §800 D
121 Remember to increment cur_level in error corrections by handle_right_brace.
[A better procedure will be adopted later.] §1069 D
122 Fix a typo: (‘{’ instead of ‘}’) in error message for mmode + math_shift. §1065 T
99 ↦ 123 Make show_noad_list more robust and more like the new show_node_list. [The
routines will be combined in TeX82.] §690 R
124 Fix a typo in char_box: should say font_info_real. [In TeX78 a single array is
used for both real and integer; in TeX82 things will be scaled.] §554 L
125 Fix typos in the definitions of default_rule_thickness and big_op_spacing; they
shouldn’t start at mathex(7). §701 B
126 Reverse the before and after conventions in math nodes. §1196 B
I had them backwards; this turned hyphenation on just before math, and off
just after it!
Seven and a half hours debugging today. Got through the test program a little
more. But TeX blew up on ‘$Y+1$’; tomorrow I hope to find out why.

20 Mar 1978
8pm. I decided to work next on a super-hairy formula.
127 Change ‘\ascii’ to ‘\cc’ (character code). [This name will change again later,
to ‘\char’.] §265 C
128 Don’t bother to store a penalty node at the beginning of $$ when the
paragraph-so-far fits on a single line, since such a penalty has already been stored.
[These conventions will change later, and the \predisplaypenalty will
always be stored.] §1203 E
129 Avoid reference to tail in build_page, if nest_ptr > 0. §995 S
130 Correct a silly slip in math_comp (the exact opposite of what I did in #118). §1158 B
131 Rectify my mental lapse in make_fraction; I said nucleus instead of thickness. §743 B
132 Mask off the math class when scanning delimiters. §1160 F
133 Allow an optional space after \def{...}. [This decision will be retracted later.] §473 C
My test example is so complicated it causes the semantic stacks to overflow!
134 Don’t test for no pages output by looking at the channel status. §642 L
135 Fix typo in definition of \mathop (open_noad not op_noad). §1156 T
136 Rewrite fin_mlist, because ‘\left(...\above...\right)’ doesn’t parse
correctly; the \left goes into the numerator, the \right into the denominator. §1184 A
137 Correct the use of depth_threshold in print_subsidiary_data: Simple fields get
shown while others look empty. §692 B
78 ↦ 138 Return the carriage before showing the first line of a new file when pausing. §538 I
139 Fix bug: The call show_noad_list(mem[...]) should be show_noad_list(...), in
the incompleat_noad case of show_activities. §219 M
3am. The whole messy formula has been parsed correctly into a tree.
The easy part is done, now comes the harder part.
140 Don’t shift single characters down in make_op. §749 F
141 Make clean_box return a box (as its name implies), not an hlist. §720 D
Font info still isn’t quite right, it has the wrong value of quad.
142 Retain the italic correction when doing rebox; can make glue_set ≠ 0 a flag for
this. [A better solution will be adopted later.] §715 A
143 Fix the bug that makes rebox bomb out: value(p) should be value(mem[p]). §715 B
6am; ten hours today. TeX didn’t do $\pi\over2$ correctly, but was close.
I found that the rebox problem (#142) went away when I fixed the clean_box
problem (#141); but I will leave the extra stuff about glue_set ≠ 0 in the
program anyway, just for weird cases.
144 Omit extra levels of boxing when possible in clean_box. §721 E
(To do this, I need to face the rebox problem anyway.)
21 Mar 1978
10pm. The computer is rather heavily loaded tonight.
145 Don’t forget thickness when making a square-root sign (see #131). [The rule
thickness will later be derived from the character height.] §737 F
146 Define p local to the make_fraction routine. §743 L
Unwittingly using the global p was a disaster.
147 Don’t show the amount of glue_set when it’s zero. §186 I
142 ↦ 148 Make glue_set nonzero in the result of var_delimiter. §706 D
149 Fix bug: The math_glue function didn’t return any result. §716 F
150 Fix typo in char_box (c not w); this caused a subscripted P to come out the
same width as an unsubscripted P. [Later changes in the rules will move
this computation to §755.] §709 T
144 ↦ 151 Revise clean_box to do operations that are needed often because of the rebox
change. §720 P
152 Use the new clean_box to avoid a bug in \sqrt{\raise...}. §737 D
153 Change the definition of \not so that it’s a relation (which will butt against
the following relation). [All math symbols and Greek letters are defined in
INITEX at this time, not in a changeable format definition.] Q
154 Give error message ‘Large delimiter must be in mathex font’, instead of
calling confusion, since the error can occur. [This particular error is
impossible in TeX82.] §706 I
155 Change the use of p in var_delimiter; it isn’t always set when I say goto found. §706 F
Another font problem now surfaced: The mathex meta-font didn’t compute
TeX info in a machine-independent way. (It took two hours to correct this.)
156 Don’t forget to set type(p) in all relevant cases of var_delimiter. §708 D
157 Use the correct sign convention for shift_amount in hpackage. §653 B
158 Always kern by delta when there’s no superscript. §755 F
159 Declare space_table to be [0..6, 0..6] not [0..7, 0..7]; otherwise its entries are
preloaded into the wrong positions. [The space_table in TeX78 is 7 × 7; it
will become 8 × 8 in TeX82, represented as a string called math_spacing.] §764 M
160 Use a negative value, not zero, to represent a null delimiter. [Actually zero will
come back again later.] §685 C
127 ↦ 161 Change \cc to \char. §265 C
162 Don’t use tricky subtraction on packed data when changing q to an ord_noad
in mlist_to_hlist; subtraction isn’t always safe. §729 L
163 Fix two typos in the space_table (* for 0). §764 B
164 Initialize cur_size everywhere (I forgot it in two places). §703 F
165 Reset op_noad before resetting bin_noad. §728 A
166 Treat display_style + cramped the same as display_style inside make_op. §749 F
167 Shift the character correctly in the non-\displaystyle case of make_op. §749 B
Still another font problem: The italic corrections are wrong because the
corresponding array was declared real in proto-METAFONT (and italic corrections
were used in nonstandard way in mathex).
168 Use depth instead of height in var_delimiter. [Later, both were used.] §714 B
169 Skew the accents according to the font slant. [Soon retracted.] §741 Q
At this point I think nearly all the math routines have been exercised.
Tomorrow they should work!
Eight hours debugging today.
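
Entry 159 above describes table entries landing in the wrong positions when the declared bounds do not match the preloaded data. The Python sketch below shows one way such a mismatch can arise. It rests on an assumption that the log does not state, namely that the table is preloaded from a flat list of values laid out row by row; the lookup function is hypothetical.

```python
# Illustrative sketch for entry 159. Assumption (not stated in the log): the
# table is preloaded from a flat list of values laid out row by row.

ROWS = 7
# 49 intended entries, each labelled with the (row, column) it belongs to.
values = [(i, j) for i in range(ROWS) for j in range(ROWS)]


def lookup(flat, i, j, declared_width):
    """Return what is found at [i, j] when each row is assumed declared_width long."""
    index = i * declared_width + j
    return flat[index] if index < len(flat) else None


print(lookup(values, 1, 0, 7))  # (1, 0)
print(lookup(values, 1, 0, 8))  # (1, 1)
print(lookup(values, 6, 6, 8))  # None
```

With the matching width of 7 every lookup returns its own label; with a declared width of 8 each row is shifted further from where it belongs.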
22 Mar 1978
(Wednesday, but actually Thursday: I began at midnight because I was
proofreading a paper.)
I checked out the font access tables, slowly (i.e., all the \mathcode and
special-character name entries were catalogued).
169 ↦ 170 Do not consider slants after all in the math accent routine, since slanted math
letters are put differently into fonts. §741 Q
171 Don’t use q for two different things simultaneously in make_math_accent. §738 B
172 Fix bug in compact_list (I forgot to advance the loop variable). [This procedure
became unnecessary in TeX82.] F
173 Avoid conflict between var_delimiter and mlist_to_hlist, which want to use
temp_head simultaneously. §713 M
174 Fix bad typo in overbar routine (b for p). §705 T
Finally TeX got to after_math after dealing with that hairy formula...
175 Fix another bad typo: p for b this time. §1199 T
176 Insert more parentheses (twice) because of ‘lsh’ precedence in SAIL. §1199 L
36 ↦ 177 Use the new hanging-indentation conventions when formatting displayed
equations. §1199 M
178 Recompute penalties so that break is allowed after punct_noads. §761 Q

179 Center the large delimiters vertically. §749 F
180 Round all rule sizes (up) before drawing them. §589 Q
181 Provide more space over x in √x, and more space atop vincula. §705,737 Q
182 Make large delimiters large enough to cover formula height (important for
subscripts, superscripts). §762 Q
183 Insert /ntn=33 on XGP prompt message so that complex math won’t blow the
device driver. [See #84.] I
161 ↦ 184 Update the comment about the meaning of \char, since it can be used in math
mode. §208 P
Six hours today.
23 Mar 1978
11pm, Maundy Thursday.
104 ↦ 185 Make \tracing and \jpar follow block structure. §283 C
It took me two hours to enter yesterday’s corrections, because the changes were
so numerous.
186 Fix bad call on begin_token_list when marks are to be scanned. §396 M
Now the formula looks like it should, modulo problems in fonts.
187 Prevent an exponent from going below baseline + xheight/4. §758 Q
188 Change quad to math_quad when finishing a display (several places). §1199 B
189 Don’t use append_to_vlist when putting an \eqno box on a separate line, because
the page shouldn’t break at glue there. [Later, the append will be used but
preceded by an infinite penalty.] §1205 A
190 Increase the input stack_size; TeX may need to back up a lot. §11 S
191 Don’t assume that p always points to a glue node when a page is broken. §1017 A
192 Use epsilon in scan_spec (I had used a different small constant). [This was a
kludge to avoid the extra parameter later called exactly or additional.] §645 A
193 Introduce a new procedure scan_positive_length, to prevent negative or zero
lengths in scan_rule_spec. [This restrictive rule will be “overruled” later.] §463 R
194 Fix ridiculous bug in the leaders routine of vlist_out: I had the initialization
inside the loop! §635 B
195 Eliminate confusion between the two temp variables named h; one is real and
the other is integer. §629 L
196 Include forgotten case (leader_node) in hlist_out. [Type leader_node will be
absorbed into glue_node in TeX82.] §622 F
197 Don’t forget to compute x0 in variable horizontal rules. §624 F
Seven and a half hours today.
TeX seems to be ready to tackle my test file based on Volume 2.
198 Calculate y0 in horizontal rules as an integer number of pixels from the
baseline, so that the baseline doesn’t jump. §589 S
25 Mar 1978
2am Saturday. (Might as well drop Friday.)
185 ↦ 199 Make def_code consistent with the new \tracing conventions. [Many tracing
options are packed into a single parameter called \tracing.] §1233 D
185 ↦ 200 Don’t allow users to change nonexistent things like \catcode1000. §1232 R
110 ↦ 201 Reset align_state at beginning of init_align. §774 F
202 Don’t forget scan_left_brace after \noalign. §785 F
110 ↦ 203 Set cur_cs ← 0 in get_next, after \cr causes a switch to the ⟨v_j⟩ template. §342 F
Ouch, that was a big bad bug, which took me three hours to find (since I
thought TeX’s low-level scanning mechanism was working).
Note to myself: I knew it would be cleaner to define get_next so that it sets
cur_cs to zero every time it begins [i.e., in §341, where this change will in
fact be made in TeX82]. But I had avoided this on grounds of efficiency in
the inner loop. Well, now I have earned this tiny bit of efficiency.
204 Prohibit the first word of an unavailable node from becoming negative. [The
storage allocator of TeX78 uses a negative value to signify a node that is
available, just as ‘link = max_halfword’ will signal availability in TeX82.] §124 D
That was another bad one, it’s not my night.
At least I’m developing more subtle diagnostic techniques.
205 Remember to un-negate the top save_stack entry when handle_right_brace
finishes an insert_group. [This routine was completely revised in TeX82.] §1100 M
185 ↦ 206 Initialize \jpar [i.e., \tolerance]. §240 F
207 Correct the display of insertion nodes by show_node_list. §188 B
208 Prevent show_token_list from generating really long strings when in a loop. §292 R
209 Increase the reference count of bot_mark when vpackage finds it. [This was
later the job of fire_up.] §1016 D
210 Remember that the tokenlist for a mark ends with a }. §1101 M
211 Don’t let vpackage lose the top insert. (It fails when the very first item is a
\topinsert.) §1014 D
212 And when that stupid code is corrected, make it handle insertions
first-in-first-out. §1018 A
Seven hours today.
26 Mar 1978
Easter Sunday, will work till sunrise.
213 Add an ‘i’ feature to the error recovery routine. §87 I
214 Include a prompt. $87 I
215 Ignore space after \noalign{...}. §1133 C
Otherwise, things are going well tonight; I’m finding more bugs in my test
program than in TeX.
The ‘i’ feature is proving to be very helpful.
I increased the size of mem (now lo-mem-max = 3500, mem-max = 10000).
In fact I just needed to increase it again (now lo-mem-max = 4500, mem-max =
11000).
216 Make INITEX output mem-top for consistency checking. $1307 R
217 Calculate the size of delimiters by considering the enclosed formula’s distance
from the axis, not from the baseline. $762 A
I’m having trouble with a SAIL compiler bug; I must rearrange the program,
more or less at random, until it compiles correctly. I hope the bug isn’t more
severe than it appears.
210 ↦ 218 Don’t put a new group on save-stack if a null mark is expanded. [TeX82 will
remove the ‘}’ from the mark text.] §386 D
I had to redo the typewriter-style font since its width tables were wrong.
And I increased low-memory size again to 5500, then 6500.
Finally the entire test program was TeXed. Happy Easter! Six hours today.
27 Mar 1978
Beginning at 2:30am.
219 Move \vcenter processing to the first pass of mlist-to-hlist; otherwise the
height, depth, subscripts, etc., are way off. $733 A
220 Omit space after closing $$. §1200 C
Spacing is wrong in the formula y_1 + ··· + y_k; I have to rethink the use of three
dots.
221 Make conditional thin space available to user as \S. [Later will retract this.] $226 G
222 Introduce \dispaskip and \dispbskip [later called \abovedisplayshortskip
and \belowdisplayshortskip]. §226 Q
Reminder: I need to test line-breaking with embedded math formulas.
223 Make sure that interaction ≠ error-stop-mode in the ‘Whoa’ error [fatal-error]. §93 I
224 Fix a big mistake in the style-node routine (which points to a glue spec, not
to glue itself); somehow this didn’t cause trouble yesterday. [In TeX78, style
nodes double as placeholders for math glue like thin spaces.] §732 B
225 Make \fntfam obey group structure. [TeX78’s \fntfam operation is a combination
of TeX82’s \textfont, \scriptfont, and \scriptscriptfont.] §1234 C
At this point the test routine for Volume 2 works perfectly.
But I will change the page width in order to check harder cases.
178 ↦ 226 Disable automatic line breaks after punctuation in math (e.g., consider f(x, y)). §761 Q
227 Represent italic corrections as boxes, not glue, so that they won’t be broken.
[The \kern command doesn’t exist yet.] $1113 S
Eight hours today.
228 Fix a bug that just clobbered the memory: Call free-avail, not free-node, in
the ins-node case of vpackage. [This logic will change completely in TeX82.] §1019 B
29 Mar 1978
(Wednesday) Again beginning at 2:30am.
229 Put still more space above and below fraction lines in displayed formulas. $746 Q
189 H 230 Install an infinite penalty feature, which positively suppresses breaks; use it in
displayed formulas whose \eqno doesn’t fit. $1205 G
231 Call build-page after finishing a display; and don’t go to the \noindent routine
because of the next remark. $1200 F
232 Put \parskip glue just before a paragraph, not just after (since it interferes
with a penalty after). §1091 S
Although the test program gives correct output, it generates 46 locations of
variable-size memory and 280 of one-word memory that are not freed.
233 Recycle the ulists and vlists in fin-align. $801 F
25 H 234 Fix bug when deleting space at end of paragraph: delete-glue-ref (cur-node)
not delete-glue-ref (value (cur-node)). $816 M
There’s also a more mysterious type of uncollected garbage, a fraction-noad
corresponding to $p\choose$, an incompleat-noad not completed.
Couldn’t find that one, so I recompiled with #233 and #234 corrected.
Now it gains just 10 locations of variable-size memory and 7 of the other kind.
235 Extend search-mem to search eqtb also. §255 I
143 H 236 Fix bug in rebox when list-ptr(b) = 0. $715 D
The seven one-word nodes were generated by this bug; rebox put them onto a
linked list starting with mem[0], growing at the far end!
237 Remember to complete each incompleat-noad. §1184 D
This solved the other mystery. I had never noticed that my test output was
actually wrong: $p\choose k$ came out as ‘k’.
After these corrections, the test routine worked... I feel that TeX is now pretty
well debugged (except perhaps for error recovery); it’s time to celebrate!
1 Apr 1978
238 Don’t quit after file lookup fails. $530 I
2 Apr 1978
239 Add TEX-font-area, so that it’s easier to change the default library area associated
with a device. §514 P
3 Apr 1978
240 Insert parentheses again, to cope with the precedence of lsh when packing
data. (See #55 and #176.) §1114 L
I had never tried hmode + discretionary before!
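The precedence trap behind #240 can be reproduced in any language whose shift operator does not bind the way the programmer expects. The following is a minimal sketch in Python rather than SAIL; the field width and the function names are hypothetical, chosen only for illustration, and this is not the packing code of TeX78.

```python
# Hypothetical layout: a small type code stored above an 8-bit value field.
VALUE_BITS = 8


def pack_wrong(type_code, value):
    # In Python, + binds more tightly than <<, so this is parsed as
    # type_code << (VALUE_BITS + value), which is not what was intended.
    return type_code << VALUE_BITS + value


def pack_right(type_code, value):
    # Parentheses force the shift to happen before the addition.
    return (type_code << VALUE_BITS) + value


def unpack(word):
    # Recover (type_code, value) from a correctly packed word.
    return word >> VALUE_BITS, word & ((1 << VALUE_BITS) - 1)
```

With these definitions, pack_right(3, 5) round-trips through unpack, while pack_wrong(3, 5) shifts by 13 bits instead of 8.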
241 Remember that back-error requires cur-tok to be set. (Problem can arise
during error recovery on parameter #n with n out of range.) $476 M
4 Apr 1978
242 Add a deletion feature to the error routine. $88 I
5 Apr 1978
243 Reset space-factor after \/ [this was later rescinded] and after math in text. $1196 Q
10 Apr 1978
104 ↦ 244 Replace \jpar and \tracing by a new primitive \chpar for parameters. It
allows a user to change those quantities as well as the penalties for hyphens,
relations, binary ops, widows. §209 G
14 May 1978
Beginning to typeset a real book (Volume 2, second edition), not just a test.
245 Make math in text end with spacing as if it were followed by punctuation. [This
rule will soon be rescinded.] $760 Q
246 Insert \times into the hash table; I left it out by mistake. [It will eventually
move into plain.tex.] F
247 Change the names of Scandinavian accents from \o, \oslash, \Oslash to \a,
\o, \O. [This will also move to plain.] C
17 May 1978
248 Fix a silly bug that hasn’t been tweaked until today: ‘\halign to size’
[obsolete in TeX82] used vsize instead of hsize. §645 B
19 May 1978
249 Add a \topbaseline feature [later called \topskip]. $1001 G
245 H 250 Subtract the math spacing change of May 14. $760 Q
251 Skip past blanks in the scan-math procedure. [This blank-skipping will even-
tually go into scan-left-brace.] $403 A
252 Introduce a missing-brace routine [later generalized] to improve error recovery
in mmode + math-shift, when the top of save-stack isn’t a math-shift-group. §1065 I
253 Adjust the math spacing between closing parentheses and Ord, Op, Open,
Punct. §764 Q
254 Make the underline go further under. §735 Q
96 ↦ 255 Compute the proper natural width when a displayed equation follows a paragraph
whose fillglue has been deleted by line-break. §1146 S
20 May 1978
256 Fix the spurious value of prev-depth inside alignments. §775 A
257 Consider (and defeat) the following scenario: The u and v lists are built in
init-align using temp-head; then while scanning ‘\tabskip 3pt\rt{...}’
the macro \rt is expanded, clobbering temp-head. §779 S
That bug was more subtle than usual.
258 Add the parameter num3, so that the positioning of \atop can be different
from that for fractions. §700 Q
259 Add new parameters delim1 and delim2, so that \comb can use fixed size
delimiters, not computed as with \left. §748 Q
22 May 1978
221 H 260 Change \I to \I and introduce \I as the negative of \b. [Later obsolete.] $226 C
261 Fix the display of negative penalty nodes; show-node-list is confused when a
negative value has been packed into the middle of a word. $194 L
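Entry #261 concerns a signed quantity stored in a field in the middle of a packed word. The sketch below, in Python, shows why such a field needs sign extension when it is read back. The 16-bit width and the 10-bit offset are assumptions made for illustration, since the log does not give the TeX78 word layout, and all function names are hypothetical.

```python
FIELD_BITS = 16    # hypothetical field width
FIELD_SHIFT = 10   # hypothetical position of the field within the word
FIELD_MASK = (1 << FIELD_BITS) - 1


def pack(word, penalty):
    # Clear the field, then store the penalty in two's-complement form.
    cleared = word & ~(FIELD_MASK << FIELD_SHIFT)
    return cleared | ((penalty & FIELD_MASK) << FIELD_SHIFT)


def unpack_unsigned(word):
    # Naive extraction: a negative penalty comes back as a large positive number.
    return (word >> FIELD_SHIFT) & FIELD_MASK


def unpack_signed(word):
    # Sign-extend: if the top bit of the field is set, subtract 2**FIELD_BITS.
    raw = unpack_unsigned(word)
    if raw & (1 << (FIELD_BITS - 1)):
        return raw - (1 << FIELD_BITS)
    return raw
```

A display routine that uses the naive extraction would show a penalty of -50 as 65486, which is the kind of confusion the entry describes.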
Memory overflow just occurred with lo-mem-max = 7500 and mem-max =
16384. So I have to go to 15-bit pointers. (A problem on 32-bit machines?)
23 May 1978
262 Add a new parameter big-op-spacing5, for extra space above and below limits
of big displayed operators. $751 Q
263 Initialize incompleat-noad in $$\halign{...}$$. §775 F
That was another heretofore-untested operation. How much of the code has
not yet been exercised?
238 H 264 Close the file when doing lookup-failure recovery. $27 F
265 Improve the error recovery for ‘Extra &’. $792 I
266 The top piece must be calculated mod 128 in var-delimiter, to guarantee a
valid subscript range. [Obsolete in TeX82.] §546 R
252 H 267 Fix a blunder in new missing-brace code. $1065 B
262 H 268 Fix a blunder in new code for limits on display operators. $751 B
26 May 1978
269 Don’t insert a new penalty after an explicit penalty in math mode. $767 Q
The hash table overflowed; I ought to make it much bigger.
110 ↦ 270 Avoid possible bad memory references in alignment when there is erroneous
input after \cr. [Instead of extra-info, the value of cur-align in TeX78 is
negated, because we need only distinguish \cr from &.] §789 R
271 Make the dimension parameters like \hsize all global, so that they can be set
in the \output routine. $279 S
This led to major simplifications, also to major surgery.
[But it was a kludgy decision, overruled in TeX82.]
94 H 272 Don’t forget to set the type of the new null box in the \end routine. $1054 D
27 May 1978
The data overflowed memory again, both low and high, doing Section 3.3.2.
184 H 273 Mask off extra bits of \char in math mode, to avoid bad memory references. $1151 R
274 Zero out the negative \medmuskip in script styles. §732 B
29 May 1978
275 Be prepared to handle an undefined control sequence during get-x-token. (Can
fix this by brute force, using get-token instead of get-next.) $380 S
276 Correct the superscript shift when a single character is raised. $758 D
184 ↦ 277 Mask off all but 7 bits in \char routine, to avoid space-factor index out of
range. §435 R
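Entries #266, #273, and #277 all guard a table lookup by reducing the index to a valid range. A minimal Python sketch of the 7-bit mask mentioned in #277 follows, assuming a hypothetical 128-entry space-factor table whose contents are placeholders; the names are not taken from the TeX78 source.

```python
TABLE_SIZE = 128
# Placeholder contents; only the table length matters for this sketch.
space_factor_table = [1000] * TABLE_SIZE


def safe_index(char_code):
    # Keep only the low 7 bits, so the result is always in 0..127.
    return char_code & 0x7F


def lookup_space_factor(char_code):
    # The masked index can never run off the end of the table.
    return space_factor_table[safe_index(char_code)]
```

Masking changes which entry is consulted for an out-of-range code, but it guarantees that the reference stays inside the table.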
More memory capacity overflows.
22 ↦ 278 Fix TeX’s overflow stop so that I don’t have to wait for loading of the BAIL
debug routines. [System dependent.] §93 E
279 Remember to adjust the page number when a file page ends in mid-macro.
[System dependent.] $306 F
5 Jun 1978
280 Make sure that the arguments of positioning commands don’t overflow their
field size. $610 R
281 Report the excess amount when giving an overfull box warning. $666,677 I
7 Jun 1978
282 Use ≥ instead of > as termination criterion in var-delimiter. §714 Q
283 Disallow \eject in math mode. [In TeX78, \eject is distinct from \break; in
horizontal mode it includes TeX82’s ‘\vadjust{\break}’.] §1102 R
284 Don’t put too much clearance above \sqrt in text style. $737 Q
9 Jun 1978
110 H 285 Make align-state an integer variable, not boolean, so that \eqalign can be
within another \eqalign. $309 G
286 A \mark should expand its input. $1101 C
10 Jun 1978
287 Provide for preloading of fonts. $1320 E
288 Close the output file before switching to edit the input file with the ‘e’option. $84 L
289 Return adjustments found by hpack to free storage if they’re not used. [Later,
hpack will detach them only when they’re used.] $655 E
290 Strive for consistency between make-under and make-over. $735 Q
18 Jun 1978
236 ↦ 291 Fix a serious error in rebox (‘b’ instead of ‘list-ptr(b)’). §715 B
Strange that such a bug would now surface for the first time!
292 Remove \deg from INITEX, since macros suffice. C
293 Add an extra hyphenation penalty for two hyphenated lines in a row. $859 Q
19 Jun 1978
294 Introduce the ‘no-new-control-sequence’ switch. Among other things, this will
prevent an undefined control sequence following scan-math from clobbering
the save stack. $259 S
20 Jun 1978
295 Change the badness test ‘glue ≤ 0.0’ to ‘glue ≤ 0.0001’. [TeX82 will avoid such
problems by calculating badness without floating point arithmetic.] §99 L
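The change in #295 replaces an exact floating-point comparison with a tolerant one. The Python sketch below illustrates the underlying issue, assuming that glue stands for a residual amount that ought to be exactly zero. The function names are hypothetical; the threshold is the one quoted in the entry.

```python
EPSILON = 0.0001  # threshold quoted in entry #295


def is_zero_exact(glue):
    # Exact test: fails when rounding leaves a tiny positive residue.
    return glue <= 0.0


def is_zero_tolerant(glue):
    # Tolerant test: treats anything up to EPSILON as "no glue".
    return glue <= EPSILON


# Ten pieces of 0.1 should use up exactly 1.0, but binary rounding
# leaves a residue on the order of 1e-16.
residual = 1.0 - sum([0.1] * 10)
```

Here the exact test rejects residual even though it is zero for all practical purposes, and the tolerant test accepts it.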
296 Force badness to be at most lo1’. $108 R
297 Add end-template for better error recovery in alignments. $375 I
287 ↦ 298 Make INITEX more like the real TeX; my simple scheme for font preloading was
no good because it left thousands of ‘dead’ words in memory. §8 E
299 Economize disk space by using internal arrays in load modules that aren’t being
reinitialized. [System dependent.] E
300 Move the declaration of mem to the semantics module, so that the object code
will be more efficient. [System dependent. The code of TeX78 was divided
into separately compiled modules for syntax, semantics, output, extensions,
and general organization.] E
21 Jun 1978
Today I’m working on the user manual.
301 Disallow \input except in vertical mode. [I will change this in TeX82, treating
\input as a case of expansion.] §378 C
302 Add error recovery for endv and par-end occurring in math mode. $1047 I
303 Generalize \ifT to \if T. $506 G
22 Jun 1978
304 Preload the \bullet [later done by plain.tex]. F
256 ↦ 305 Get the correct prev-depth at the beginning of an alignment. §775 D
306 Change \eject so that it ejects only once. §1000 C
14 Jul 1978
307 Look in standard area if a file isn’t found in the user’s area. $537 I
308 Echo all online inputs in the transcript file. $71 I
19 Jul 1978
309 Equalize spacing when only one of numerator/denominator is big. §745 Q
310 Prevent subscript from getting too high above baseline. $757 Q
311 Avoid infinite loop when stack overflows: push-input should say ‘if input-ptr ≥
stack-size ∧ interaction = error-stop-mode’. §321 R
22 Jul 1978
312 Make \quad meaningful outside math mode. (All fonts must be generated
again!) $558 C
313 Show the nesting level at the end of show-activities. [But I decided not to do
this in TeX82.] §218 I
314 Put in \> [namely, \mskip\medmuskip; TeX78 already has \L, for conditional
\thinmuskip, as well as the negative amounts \<, \<I. Change the name
of vector accent from \> to \b. [Math spacing operators will become much
more general in TeX82.] §716 C
25 Jul 1978
94 H 315 Give the correct \hsize and \vsize to the null boxes created at \end. $1054 Q
94 H 316 And don’t “append” them. [Later this was changed, so that it would work
better with generalized output routines.] $1054 A
297 H 317 Remove the control sequence \endv, since error recovery is now better. $375 I
318 Define another mode of tracing: It says ‘OK’ and stops after \showlists. §1298 I
244 ↦ 319 Give better defaults to parameters. [Later done by plain.tex.] §209 Q
320 Allow more bits in the packed representation of \showboxdepth. $238 I
321 Scan past delimiters and/or dimensions when recovering from ambiguous frac-
tions. $1183 I
322 Reduce accent numbers modulo 128 or 512, depending on the mode. $1165 R
323 Include a warning, ‘(\end occurred on level ...)’. §1335 I
28 Jul 1978
(I’m writing Chapter 27 of the manual: ‘Recovery From Errors’.)
324 Improve the error message in scan-digit. [This procedure will change its name
to scan-eight-bit-int, when the number of registers increases from 10 to 256.] §433 I
325 Don’t report overfull boxes if they’re less than .1 point over. $666,677 I
326 Give the user extra chances to define the font, if read-font-info is unsuccessful. $560 I
327 Change default recovery for bad parameter number from #1 to ##, since #1
won’t always work and since ## is probably intended. $479 I
328 Omit the “Negative?” message on things like scan-char-num. $435 I
329 Improve error recovery when a large delimiter isn’t in family 3. [Obsolete.] I
330 Give a more appropriate error message when the input is ‘$\right’. $1192 I
Currently says ‘Missing $’!
331 Call back-input before the error message in back-error, not afterwards. §327 I
1 Aug 1978
332 Give an appropriate warning when there’s no input file and the user types ‘e’. $84 I
333 Increase the system pushdownlist size so that the manual will compile. [Pro-
cedures hlist-out and vlist-out can recurse deeply.] L
Yesterday I distributed 45 preliminary copies of the manual; today I took out
the “debugging hooks” and put TeX up as a system program.
2 Aug 1978
I’m typing Volume 2 again (currently in Section 4.2.2). Culture shock!
334 Introduce a \ragged parameter, to indicate a degree of raggedness. [Previ-
ously, ragged-right setting was performed when the \tolerance/100 was
odd! Eventually a better approach, with \rightskip and such things, will
be discovered.] $886 G
335 Omit the ‘widow penalty’ in one-line paragraphs. §890 Q
5 Aug 1978
336 Generalize \pageno to \count (digit). $236 G
285 ↦ 337 Update align-state when recovering from ‘Missing {’ and ‘Extra }’ errors.
§1069,1127 D
338 Show “runaway” tokens, making it easier to pinpoint an error. $306 I
22 Aug 1978
339 Add \predisplaypenalty. $1203 G
340 Clarify error messages; they should indicate when something has been inserted,
etc. $1064 I
23 Aug 1978
114 ↦ 341 Substitute ‘Extra }’ for the losing ‘Missing \cr’ error message. §1069 I
213 ↦ 342 Go past online insertions in show-context. §311 I
343 Exact no penalty for breaking one line before a display. §1145 Q
338 ↦ 344 Check for runaways at end of file. §362 I
345 Give error message when a macro argument begins with }. §395 I
24 Aug 1978
213 ↦ 346 Remove extra line-feed in show-context after printing insertions. [System
dependent.] §318 L
25 Aug 1978
347 Leave no glue at top of page, even after \eject. $997 Q
27 Aug 1978
348 Adopt Guy Steele’s new version of the TeX source files. [He has recently
made a copy and modified it by introducing compile-time switches for MIT
conventions as an alternative to SUAI. This is the first time that TeX is being
ported to another site; additional switches for PARC, TENEX, TOPS10, and
TOPS20 will be added later, using the Steele style.] P
1 Sep 1978
349 Don’t pass over leader nodes in the try-break background computation. [At
this time, leaders have not yet been unified with glue.] $837 Q
82 H 350 Prune away all penalties at the top of a page. $997 Q
4 Sep 1978
338 H 351 Include ‘\’ in error message about a runaway argument. $306 I
8 Sep 1978
I just remade all the fonts, with increased ligature field size.
350 H 352 Insert a necessary goto statement in the first branch of the new penalty routine
within build-page. §997 B
30 Sep 1978
338 H 353 Make the token list for runaway arguments meaningful outside of macro-call.
(I just had a runaway argument ending with ‘\1cm’,which turned out to be
the control sequence in hashtable location 0.) $371 M
354 Avoid infinite loop when recovering from $$ in restricted horizontal mode. $1138 R
355 Fix two hyphenation bugs related to -ages, -ers. [A completely new algorithm
for hyphenation will go into TeX82.] L
356 Add -est to hyphenation routine; also disable puz-zled and rat-tled, etc. Q
4 Oct 1978
357 Add new primitive \vtop. $1087 G
358 Treat implicit kerns properly after discretionary hyphens have been inserted. §914 Q
4 Nov 1978
359 Forget the half quad originally required at left and right when centering dis-
played equations without equation numbers. $1202 Q
11 Nov 1978
360 Don’t let the postamble come out empty. [This could occur if no fonts were
selected.] $642 R
15 Nov 1978
361 Allow optional space after digit in scan-int routine. §444 C
17 Nov 1978
362 Make the check-mem procedure slightly more robust. $167 R
20 Nov 1978
363 Make the \par in a \def match the \par that comes automatically with a blank
line. (Suggested by Terry Winograd.) $351 C
364 Add new parameter \mathsurround for spacing before and after math in text. $1196 G
365 Extend \advance to allow increase by other than unity. [At this time it applies
only to the ten \count registers, and it is called \advcount.] $1238 G
25 Nov 1978
366 Add a new primitive: \unskip. $1105 G
367 Add new primitives \uppercase and \lowercase. $1288 G
28 Nov 1978
338 H 368 Don’t let \mark and macro-call interfere with each other’s scanner-status. $306 M
369 Omit extra } after show-node-list shows a \mark, since the right brace is already
there. (See #210.) §176 M
370 Add a new primitive suggested by Terry Winograd: \xdef. $1218 G
29 Nov 1978
371 Delete a space following \else{...} also in the false case. [TeX78 uses braces,
not \fi, for conditionals.] S
320 H 372 Make \tracing set \showboxbreadth as advertised. $198 D
373 Account properly for kerns in width calculations of line-break. $866 F
364 H 374 Delete a math-node at the beginning of a line. $148 Q
339 ↦ 375 Guarantee that \predisplaypenalty=10000 will suppress page breaking before
a display. §1005 A
6 Dec 1978
376 Change the file opening statement to allow lines up to 150 characters long.
[System dependent.] L
16 Jan 1979
365 H 377 Initialize negative properly in the \advance routine with a \count as argument. $440 F
20 Jan 1979
378 Try to keep complex, buggy preambles of alignments from crashing the pro-
gram. $789 R
17 Feb 1979
376 ↦ 379 Give more detailed information when warning about a long line being broken.
[System dependent; the buffer size in TeX78 is very limited.] I
380 Declare p local to try-break, for the “rare” case code. [My original program
included the following comment: “This case can arise only in weird circum-
stances due to changing line lengths, and the code may in fact never be
executed.” Later Michael Plass will discover that variable line lengths require
an entirely different algorithm, using last-special-line.] §847 L
334 ↦ 381 Don’t omit the raggedness correction when the last line of paragraph has to
shrink. [Obsolete in TeX82.] F
22 Feb 1979
363 ↦ 382 Don’t forget to return from get-x-token after finding \par. §351 F
383 Add a new parameter: \lineskiplimit. $679 Q
384 Change the syntactic sugar: ‘\hbox par’ replaces ‘\hjust t o . . .Coverfull)’.
[This vastly improves on the old idea (see #40), but there still is no internal
vertical mode.] C
385 Introduce new names \hbox and \vbox for \hjust and \vjust. $1071 C
386 Add a new condition: \ifpos. [It will later be generalized to \ifnum and
\ifdim.] §513 G
387 Add vu and \varunit. [TeX82 will eventually allow arbitrary internal dimensions
as units of measure.] §453 G
312 ↦ 388 Add an em unit. §455 G
389 Legalize \hbox spread (negative dimension) [since scan-spec no longer uses
the sign as a flag]. §645 C
10 Mar 1979
370 ↦ 390 Make scan-toks expand \count during \xdef. [This will change later when
\the and \number are introduced.] §367 C
23 Mar 1979
391 Put only 100000 pt stretch at the end of a paragraph instead of 10000000000 pt.
[In TeX78, “infinite” glue is actually finite but large; in the language of
TeX82 we would say that \parfillskip, which is not yet user-settable, is
being changed to be like \hfil instead of like \hfill.] §816 Q
392 Treat the last line of a paragraph more consistently with the other lines (e.g.,
when \hfil appears in mid-paragraph), by effectively inserting inf-penalty
at the end. §816 Q
31 Mar 1979
393 Ensure that penalty nodes aren’t wiped out, in weird cases where breaks occur
at penalties that normally disappear. §879 S
27 Apr 1979
394 Correct the page number count when files begin with an empty page. [System
dependent.] A
395 Allow the math-code table to be changeable via \chcode. [In TeX82, \chcode
will split into \mathcode and \catcode.] §1232 G
332 ↦ 396 Don’t accept ‘e’ after an error message if not inputting from a file. §84 I
29 May 1979
397 Don’t call end-file-reading if you haven’t already invoked begin-file-reading; this
could happen when trying to recover from an error in start-input. §537 F
7 Jun 1979
306 H 398 Be sure to eject two pages, when \eject comes just at the time another break
is preferable (e.g., when the page has just become too full). $1005 A
27 Jun 1979
354 ↦ 399 Don’t say ‘You can’t do that in math mode’ when the user says ‘$$’ in
restricted horizontal mode! §1138 I
30 Jun 1979
400 Add wd, dp, ht dimension units. §455 G
307 ↦ 401 Don’t try the system area for file names whose area is explicitly indicated. §537 I
1 Jul 1979
402 Allow letters as (ASCII) numbers [without the ‘ marker introduced later]. $442 G
2 Jul 1979
403 Fix a \gdef bug: If the control sequence was never defined before [this later
became the restore-zero option], don’t remove it at group end. $282 F
16 Jul 1979
320 ↦ 404 Update show-noad-list to be like show-node-list. [The two routines, originally
separate, will be merged in TeX82.] §238 I
18 Jul 1979
405 Extend capacity from 32 fonts to 64 fonts if desired. §134 G
406 Add new extra-space parameter to all text fonts (requested by Frances Yao). §558 Q
407 Make each node-noad print properly in show-noad-list. §183 F
408 Make \jpar allow any break if it is 1000000 or more. [In TeX82, a \tolerance
of 10000 or more allows any break.] §851 Q
23 Jul 1979
409 Introduce new primitives \hfil, \vfil, \hfilneg, \vfilneg. §1058 E
410 Add \ifmode. $501 G
411 Add \firstmark. §1012,1016 G
412 Allow break at leaders (horizontal mode only). $149 C
25 Jul 1979
213 ↦ 413 Revise error so that online insertions work properly after end-of-file errors. §336 I
411 ↦ 414 Change ‘if first-mark ≠ 0’ to ‘if first-mark ≥ 0’ [because -1 is used to indicate
‘not yet given a value’]. §1012 B
28 Jul 1979
370 ↦ 415 Stop \xdef from expanding control sequences after \def’s. [This decision will
be rescinded later, after several more years of experience with macro expansion
will suggest better ways to cure the problem.] §366 C
416 Change symbolic printout for control symbols. [System dependent.] $49 I
308 H 417 Avoid linefeeds in the transcript file. [System dependent.] L
370 ↦ 418 Expand \topmark, etc., in \xdef. §366 C
4 Aug 1979
413 ↦ 419 Fix an error introduced recently: \par was suddenly omitted at end of page.
[System dependent.] B
11 Aug 1979
420 Change error messages that use SAIL characters not in standard ASCII. §360 P
28 Aug 1979
411 ↦ 421 Move the command ‘first-mark ← -1’ from vpackage to fire-up. §1012 D
403 ↦ 422 Correct a serious \gdef bug: Control sequences don’t obey a last-in-first-out
discipline, so TeX loses things from the hash table when deleting a control
sequence. §259 S
To fix this, I either need to restrict TeX (so that \gdef can be used inside a
group only for control sequences already defined on the outer level) or need
to change the hash table algorithm. Although all applications of TeX known
to me will agree to the former restriction, I’ve chosen the latter alternative,
because it gives me a chance to improve the language: Control sequences of
arbitrary length will now be recognized.
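One way a hash table can “lose things” when entries are removed out of last-in-first-out order is sketched below in Python. This is an illustration under stated assumptions, not a reconstruction: the log does not say which hashing scheme TeX78 used, so the open-addressing table with linear probing, the toy hash function, and every name here are hypothetical.

```python
class ProbingTable:
    """Toy open-addressing hash table with linear probing (assumed scheme)."""

    def __init__(self, size):
        self.slots = [None] * size

    def _probe(self, name):
        # Toy hash: names made of the same letters collide on purpose.
        n = len(self.slots)
        home = sum(map(ord, name)) % n
        for step in range(n):
            yield (home + step) % n

    def insert(self, name):
        for i in self._probe(name):
            if self.slots[i] is None or self.slots[i] == name:
                self.slots[i] = name
                return
        raise OverflowError("hash table full")

    def lookup(self, name):
        for i in self._probe(name):
            if self.slots[i] is None:
                return False          # an empty slot ends the search
            if self.slots[i] == name:
                return True
        return False

    def naive_delete(self, name):
        for i in self._probe(name):
            if self.slots[i] is None:
                return
            if self.slots[i] == name:
                self.slots[i] = None  # leaves a hole in the probe chain
                return


table = ProbingTable(8)
table.insert("ab")          # entered first
table.insert("ba")          # collides with "ab" and lands one slot later
table.naive_delete("ab")    # removed before "ba": not last-in-first-out
```

After the deletion, "ba" is still stored in the table, but a lookup stops at the hole left by "ab" and reports it missing. Deleting in strict last-in-first-out order ("ba" first) would never expose the problem, which matches the condition the entry describes.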
423 Make sure that unsave cannot call eq-destroy with a value from the upper part
of eqtb. $282 D
I noticed this long-standing bug while fixing #422. It had very low probability
of causing damage (e.g., it required a certain field of a floating-point number
to have a certain value), but it would have been devastating on the day it
first showed up!
29 Aug 1979
424 Call eq-destroy when a control sequence is \gdef ’ed after being \def ’ed. $283 F
418 H 425 Treat the first token consistently when \topmark and its cousins are expanded
in scan-toks. $477 F
Now I’ve checked things pretty carefully and I think TeX is “fully debugged.”
25 Jan 1980
338 H 426 Display runaway alignment preambles. $306 I
427 Introduce active characters (one-stroke control sequences). [I don’t yet go all
the way: The meanings of ‘x’ and ‘\x’ have to be identical.] §344 G
7 Feb 1980
314 H 428 Fix a glaring omission: Op space \> was never implemented in math mode! $716 F
25 Feb 1980
429 Add a new dimension ‘ex’ (for units of xheight). $455 G
3 Mar 1980
427 H 430 Allow the control sequence \: to be redefined [it was the ‘select font’ operator];
this allows the character : to be active. [Obsolete.] C
23 Mar 1980
An extend-TeX-for-the-eighties party:
431 Add a new \copy feature. §204 G
432 Add a new \unbox feature. §1110 G
433 Add a new \open feature [later \openout]. §1351 G
434 Add a new \send feature [later \write]. §1352 G
435 Add a new \leqno feature, requested by MDS. $1204 G
436 Add a new \ifdimen feature [later \ifdim]. $513 G
437 Make \(space) in vertical mode begin a paragraph. §1090 C
438 Add a new \font feature [replacing the silly previous convention that a font
must be defined when it is first selected]. $1256 G
439 Add new \parVal and \codeVal features [later \the (whatever)]. §413 G
427 ↦ 440 Don’t let active characters gobble the following space. §344 C
208 ↦ 441 Add a new parameter to govern amount of token list dumped. [Obsolete.] §295 G
442 Add a new \linebreak feature [later replaced by \break]. $831 G
25 Mar 1980
(Still working on the above, also thought of more.)
443 Add a new \mskip feature. §716 G
444 Add a new \newname feature (soon changed to \let). §1221 G
430 H 445 Allow any control sequence to be redefined. $275 G
446 Send the output to the user’s current file area, even when input comes from
elsewhere. $532 I
27 Mar 1980
447 Compute the xheight for accents in math mode from family 1, not family 3.
[Obsolete.] Q
28 Mar 1980
448 Increase minimum clearance between subscript and superscript. §759 Q
29 Mar 1980
222 H 449 When a display follows a display, the second should have the ‘shortskip’ glue. $1146 Q
4 Apr 1980
445 H 450 Look at current token meanings when trying to recognize \tabskip in alignment
preambles. $782 A
23 Apr 1980
451 Estimate the length of printed output, for the new priority feature on our XGP
device driver. [System dependent.] I
434 ↦ 452 Break long \send lines into pieces so that the file can be read in again. [System
dependent.] C
19 May 1980
182 H 453 Don’t make \left and \right delimiters too large; they need to be only 90%
of the enclosed size. [This eventually became \delimiterfactor.] $762 Q
21 May 1980
454 Add a new \pagebreak feature [later \vadjust{\break}]. §655 G
13 Jun 1980
Today I’m beginning to overhaul the line-breaking routine, and I’ll also install
miscellaneous goodies.
455 Allow a radical sign to be in different font positions. $737 G
456 Clear empty tokenlists off input stacks to allow deeper recursions (suggested
by Jim Boyce’s macros for chess positions). $325 E
457 Make \spaceskip and \parfillskip changeable. $1228 G
458 Add a new parameter \rfudge (per request of Zippel) [later \mag]. $288 G
459 Add a new parameter \loose [later \looseness]; now parameters are allowed
to take negative values. §875 G
460 Remove the variable just-par. [Obsolete; it was the real equivalent of an
integer]. E
14 Jun 1980
461 Install new line-breaking routines, including \parshape. (These major changes
are introduced as Michael Plass and I write our article.) $813 Q
462 Add a new parameter \exhyf [later \exhyphenpenalty]. $870 G
16 Jun 1980
444 H 463 Change conventions in eqtb so that glue is distinguishable from other equiva-
lents. $275 S
444 ↦ 464 Don’t expand \b in \xdef{\d\b{...}} after \let\d=\def. [Obsolete.] A
444 H 465 Avoid creating dead storage when doing unsave in certain regions. $275 D
17 Jun 1980
466 Allow negative dimensions in rules. $138 C
19 Jun 1980
463 H 467 Make the new test for glue at the outer level of show-eqtb. $252 B
27 Jun 1980
453 H 468 Don’t let \left and \right become too small for big matrices. [This eventually
became \delimitershortfall.] $762 Q
3 Aug 1980
469 Don’t move extra-wide, numbered equations flush left unless they begin with
glue. $1202 Q
15 Sep 1980
461 ↦ 470 Say ‘≥ fz’ instead of ‘> fz’ in the pre-hyphenation routine; I’d forgotten my
definition of fz [a variable used to test for a sequence of lowercase letters in
the same font]. $897 M
395 H 471 Check the range of the index in \chcode before saving the old value. $1232 R
18 Sep 1980
457 ↦ 472 Don’t forget to increase the reference count to \parfillskip, or it will mysteriously
vanish. §816 D
19 Sep 1980
412 H 473 Make leaders break like glue in both horizontal and vertical modes. $149 C
364 H 474 Make \mathsurround break properly at left and right end of lines. $879 Q
13 Oct 1980
461 ↦ 475 Remove spurious overfull boxes generated when the looseness criterion fails.
[Obsolete.] I
461 ↦ 476 Redesign the iteration for looseness; breakpoints were not chosen optimally. §875 A
461 ↦ 477 Avoid storing a lot of breakpoints when they are dominated by others. §836 E
366 ↦ 478 Don’t say ‘cur_node’ when you mean ‘mem[cur_node]’. §1105 B
461 ↦ 479 Prefer the oldest break to the youngest break when two break nodes have the
same total demerits. §836 Q
461 ↦ 480 Don’t make badness too big for floating-point calculations, when forced to make
an overfull box. [Obsolete.] L
10 Dec 1980
481 Make it impossible to get unmatched ‘}’ in a delimited macro argument. §392 R
482 Add new \topsep and \botsep features. [These are TeX78’s way to put space
at the edge of inserts, replaced in TeX82 by the \skip register corresponding
to an \insert class.] §1009 G
6 Jan 1981
483 Install new routines for reading the font metrics, using Ramshaw’s TFM files
instead of TFX files. $539 P
484 Abort after reporting 100 errors, if not pausing on errors. $82 I
485 Add new \spacefactor and \specskip and \skip primitives. [At this time we
write ‘\specskip3=10pt’ and ‘\skip3’ for what will become ‘\skip3=10pt’
and ‘\hskip\skip3’ in TeX82.] §1060 G
366 ↦ 486 \unskip is now allowed in internal vertical mode. §1105 G
26 Jan 1981
482 ↦ 487 Don’t say ‘mem[q]’ when you mean ‘q’. (See #143 and #478.) §1009 B
27 Feb 1981
417 ↦ 488 Put some linefeeds back into the transcript file, in order to prevent overprinting
in listings. [System dependent.] I
489 Add a new \dpenalty feature [later \postdisplaypenalty]. $1205 G
490 Add the dimension cc for European users. $458 G
491 Make scan-keyword match uppercase letters as alternatives to lowercase ones
(suggested by Barbara Beeton’s experiments with \uppercase). $407 C
492 Add nonstop mode so that overnight batch processing is possible. $73 I
2 Mar 1981
422 ↦ 493 Fix a still more serious \gdef bug: The generality of \gdef almost makes it
a crime to forget any control sequence names, ever! (The previous bug was
only the tip of an iceberg.) $259 S
494 Issue warning message at the end of a file page if nesting level isn’t zero. [System
dependent.] I
5 Mar 1981
495 Keep track of maximum memory usage, for statistical reporting. [Obsolete.] $125 I
350 ↦ 496 Prune away glue and penalties at top of page after marks, sends, inserts. §1000 Q
497 Allow \mark in horizontal mode. [Later it will be \vadjust{\mark...}.] §655 G
498 Allow optional space before a required left brace, e.g., \if AA {...}. [See
#251.] §403 C
499 Issue an incomplete \if error, to help catch a bad \if. §336 I
17 Mar 1981
494 ↦ 500 Omit the warning message at end of a file page unless the nesting level has
changed on that page. [System dependent.] I
310 ↦ 501 Fix the spacing when there is a very tall subscript with a superscript. §759 Q
20 Mar 1981
371 ↦ 502 Make space-eating after \else fully consistent between the true and false cases.
[Obsolete.] S
24 Mar 1981
496 ↦ 503 Change glue_spec_size to ins_spec_size in wpackage [where insertions are done].
[Obsolete.] B
5 Apr 1981
501 ↦ 504 Fix a typo ('+' instead of '-') in the new subscript code; this shifted certain
subscripts down instead of up. $759 B
18 Apr 1981
505 Make leaders with rules of specified size act like variable rules. $626,635 G
29 Apr 1981
461 ↦ 506 Don't consider badness > threshold at a line \break except in an emergency. §854 A
13 Jul 1981
402 ↦ 507 Allow other characters as numbers. §442 C
294 ↦ 508 Avoid dead storage if a no-new-controlsequence error occurs. [Obsolete.] §259 R
509 Add a new \ifx feature. §507 G
510 Add new features \xleaders and \cleaders. $626,635 G
14 Jul 1981
507 ↦ 511 Amend the new code for constants; the '.' in '.5' is thought to mean '056! §442 S
507 ↦ 512 And fix an egregious blunder in that code: New commands at the end of a
procedure are ignored when earlier statements exit via return. §442 L
4 Aug 1981
513 Accept alphabetic codes for all online error recovery options, instead of insisting
on control codes like line feed or form feed. [The original error-recovery codes
were suggested by the conventions of the SAIL compiler.] $84 P
514 Add a new \thebox feature [later \lastbox]. $1079 G
7 Aug 1981
515 Add fil, fill, and filll as units for glue stretching or shrinking. §454 G
516 Suppress the overfull box error when shrinkage amount is negative. $664 I
9 Aug 1981
517 Let unset boxes inherit the size of their parent in alignments. $810 Q
12 Apr 1982
518 Make INITEX dump out the font_dsize array needed by the new DVI output
module. $1322 F
1 May 1982
151 ↦ 519 Fix clean_box so that mlist_to_hlist cannot make link(q) = 0 and type(q) =
glue_node. §720 S
[That was the historic final change to TeX78. All subsequent entries in this log
refer to TeX82.]
28 Sep 1982
Here are the first changes made to the preliminary listing of TeX82 that was
published by the project earlier this month.
520 Insert the missing cases letter and other_char after x_token looks ahead. §1036 F
521 Change ‘\pause’ to ‘\pausing’. §236 C
522 Reset overfull-rule when determining tabskip glue. $804 D
523 Fix the logic for scanning \ifcase [in obsolete syntax; everything is still done
with braces since ‘\fi’ doesn’t exist yet]. §509 A
30 Sep 1982
524 Change “0.0” to “?.?” (suggested by DRF). §186 I
2 Oct 1982
525 Use conditional thin spacing next to ‘Inner’ noads. $764 Q
526 Make thick spaces conditional. $766 Q
4 Oct 1982
527 Increase trie-size from 7000 to 8000, because of Frank Liang’s improved (but
longer) hyphenation patterns. $11 P
6 Oct 1982
528 Change the string lengths to match the new TEX_format_default. §520 F
Version 0 of TeX is being released today!
8 Oct 1982
529 Fix a blunder: I decreased h mod a quarterword when it should have been
decreased mod trie-op-hash-size (HWT). $944 B
9 Oct 1982
530 Fix a typo (‘!’ not ‘&’) in the WEB documentation. $524 P
531 Remember to call initialize if a different format was preloaded (Max Diaz). $1337 F
Version 0.1 incorporates the above changes.
12 Oct 1982
532 Add the ‘\immediate’ feature, by popular request. §1375 G
Version 0.2 incorporates this (somewhat extensive) change.
13 Oct 1982
533 Introduce new WEB macros so that glue-ratio is more easily changed. $109 P
I began writing The TeXbook today: edited the old preface and searched in the
library for quotations.
14 Oct 1982
534 Change the type of hd to eight-bits; it’s not a quarterword (HWT). $649 B
535 Revise the optimization of DVI commands: It’s not always safe to eliminate pop
when the preceding byte is push, since DVI commands have variable length!
(Embarrassing oversight caught by DRF.) §601 S
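The hazard described in #535 can be sketched as follows. The opcode values 141 (push), 142 (pop), and 143 (right1, which takes a one-byte parameter) come from the DVI format; every function and class name here is hypothetical, and the "safe" variant (remembering where the most recent command began) is just one possible remedy, not necessarily the approach taken in §601.

```python
PUSH, POP, RIGHT1 = 141, 142, 143  # DVI opcodes


def naive_pop(buf: bytearray) -> None:
    """Unsafe peephole: inspects only the preceding byte."""
    if buf and buf[-1] == PUSH:
        del buf[-1]          # may delete a parameter byte that merely equals 141
    else:
        buf.append(POP)


class SafeWriter:
    """Hypothetical writer that tracks where the last command started."""

    def __init__(self) -> None:
        self.buf = bytearray()
        self.last_cmd_start = -1   # -1 means "unknown"

    def emit(self, opcode: int, *params: int) -> None:
        self.last_cmd_start = len(self.buf)
        self.buf.append(opcode)
        self.buf.extend(params)

    def pop(self) -> None:
        i = self.last_cmd_start
        # Cancel only if the last *command* (not just the last byte) was push.
        if i >= 0 and i == len(self.buf) - 1 and self.buf[i] == PUSH:
            del self.buf[i]
            self.last_cmd_start = -1
        else:
            self.emit(POP)


# "Move right by 141 units", then pop: the parameter byte looks like push.
naive = bytearray([RIGHT1, 141])
naive_pop(naive)

safe = SafeWriter()
safe.emit(RIGHT1, 141)
safe.pop()
```

In this sketch the naive routine strips the parameter of right1 and leaves a truncated command, while the tracking writer appends a genuine pop.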
15 Oct 1982
536 Test ‘prev_depth > ignore_depth’, not ‘≠’. §679 C
Version 0.3 incorporates the above changes.
16 Oct 1982
537 Omit definition of align-size; it’s never used (Bill Scherlis). $11 P
538 Inhibit error messages when packaging box 255. $1017 I
21 Oct 1982
539 Subtract width(q) from page_goal, don’t add it to page_so_far[1]. §1009 A
The comment in §982 is correct, and so was my first draft of this code; but when
desk checking the program some months after writing it, I introduced this
bug, believing that I was making the algorithm more elegant or something.
Version 0.4 incorporates the above changes.


22 Oct 1982
540 Increase the amount of lower (variable-size) memory from 12000 to 13000, since
the TEX program listing now needs about 11500. [At this time there still is
a fixed boundary between upper and lower memory.] $12 P
541 Add a new parameter \boxmaxdepth. §1086 G
Version 0.5 incorporates the above changes.
26 Oct 1982
542 Fix an off-by-one error caught by Gabi Kuper and HWT. (I forgot ‘+1’.) §1317 B
543 Fix the spacing of displayed control sequences: print-cs should base its decision
on cat_code(p - single_base), not cat_code(p). §262 B
The TRIP test detected this bug, but I didn’t notice.
27 Oct 1982
544 Set math-type before saying fetch(nucleus(q)), since fetching can have a side
effect. $752 S
28 Oct 1982
545 Install a major change: Fonts now have identifiers instead of code letters. Elim-
inate the ‘\:’ primitive, and give corresponding new features to ‘\the’. $209 G
Actually I began making these changes on October 26, but I needed two days
to debug them and to put Humpty Dumpty together again.
At this time I’m also drafting macros for typesetting The TeXbook.
The above changes have been incorporated into Version 0.6.
30 Oct 1982
After years of searching, I’ve finally found a definitive definition of the printer’s
point; and (unfortunately) my previous conjecture was wrong. The truth is
that 83 pc = 35 cm, exactly; so I am changing TeX to conform.
546 Revise unit definitions for the ‘real’ printer’s point. $458,617 C
Version 0.7 incorporates the above.
1 Nov 1982
Oops! Retract error #546, and retract TeX Version 0.7; the source of my
information about points was flaky after all. My original suppositions were
correct, as confirmed by NBS Circular 570.
4 Nov 1982
547 Revise the definition of dd, conforming to the definitive value shown me by
Chuck Bigelow. $458 C
545 ↦ 548 Introduce “frozen” copies of font identifiers, to be returned by \the\font, so
that font manipulation is more robust. $1257 R
5 Nov 1982
549 Reset looseness and paragraph shape when beginning a \vbox. $1083 D
6 Nov 1982
550 De-update align-state when braces are in constants. $442 D
551 Improve error recovery for bad alignments. $1127 I
Today I wrapped up Chapters 4 and 5.
8 Nov 1982
552 Give more power to \let: the right-hand side needn’t be a control sequence. $1221 G
553 Amend show_context to say ‘(base_ptr = input_ptr) ∨’; otherwise undefined
control sequences can be invisible in unusual cases (John Hobby). $312 I
554 Compute demerits more suitably by adding a penalty squared, instead of adding
penalties before squaring. $859 A
Previously a slightly loose hyphenated line followed by a decent line was con-
sidered worse than a decent hyphenated line followed by a quite loose line.
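The effect of change #554 can be illustrated with a small sketch. The two formulas follow the wording of the entry (penalty added before squaring versus penalty squared and added separately); the numeric values for the line penalty, hyphen penalty, and badness are assumptions chosen only for illustration, and the sketch covers nonnegative penalties only.

```python
def demerits_before(line_penalty: int, badness: int, penalty: int) -> int:
    """Old rule: add the penalty first, then square everything."""
    return (line_penalty + badness + penalty) ** 2


def demerits_after(line_penalty: int, badness: int, penalty: int) -> int:
    """New rule: square the penalty separately and add it."""
    return (line_penalty + badness) ** 2 + penalty ** 2


LINE_PENALTY = 10   # assumed value
HYPHEN = 50         # assumed penalty for a hyphenated break

# Each line is (badness, penalty at its break).
case_a = [(20, HYPHEN), (0, 0)]   # slightly loose hyphenated line, then a decent line
case_b = [(0, HYPHEN), (40, 0)]   # decent hyphenated line, then a quite loose line


def total(rule, lines):
    return sum(rule(LINE_PENALTY, badness, penalty) for badness, penalty in lines)


old_a, old_b = total(demerits_before, case_a), total(demerits_before, case_b)
new_a, new_b = total(demerits_after, case_a), total(demerits_after, case_b)
```

With these assumed numbers the old rule ranks case A as worse than case B, and the new rule reverses the ranking, which matches the behavior the note describes.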
10 Nov 1982
555 Save a bit of buffer space by declaring pool_file only in INITEX. §50 E
11 Nov 1982
556 Introduce a new context indicator to clarify TeX’s scanning state: A special
type called backed-up is distinguished from other kinds of inserted lists; it is
called ‘recently read’ or ‘to be read again’, while others are called ‘inserted’. $314 I
557 Append a comment, ‘treated as zero’, to the missing-number message. §446 I
558 Ignore the settings of \hfuzz or \vfuzz if \hbadness or \vbadness is less than
100. $666,677 I
13 Nov 1982
Major surgery on the program is planned for today, because of new ideas sug-
gested by correspondence with MDS and other macro writers.
559 Introduce a new \tokens register; this will be useful and easy to add, since
TeX already can handle \everypar and \output. §1227 G
560 Change get_x_token to get_token when scanning an optional space; then a
construction like \def\foo{...}\foo won’t complain that \foo is undefined. §443 C
This change was retracted when it was being debugged, because it could cause
endv to abort the job. Then it was re-established again when I found that
endv needed to be more robust anyway. [But it was eventually rescinded
again.]
561 Make \span mean ‘expand’ in a preamble. $782 G
562 Use three separate if tests instead of ‘∧’ in the inner loop of get_next, to gain
efficiency. $342 E
563 Introduce get-r-token so that assignments have uniform error messages and so
that frozen equivalents cannot be changed. $1215 R
I gave a few variables more mnemonic names as I made these changes.
564 Move conditional statements from the semantics (‘stomach’) part of TeX to
the syntax (‘mouth’) part, by introducing ‘\fi’. Also introduce \csname and
\endcsname. §372,489-500 C
This makes macros much more predictable and logical, but it is by far the most
drastic change ever made to TeX. The program began to come back to life
only after three days of solid hacking.
Several other things were cleaned up as part of this change because it is now
more natural to handle them differently. For example, a null control sequence
has now become more logical.
The result of all this is called Version 0.8.
18 Nov 1982
Today I resumed writing Chapter 8. Tomorrow I’m 2^14 days old!
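The age in this entry is presumably 2^14 = 16384 days, and the arithmetic can be checked with a few lines. The birth date used below (10 January 1938) is an assumption brought in from outside this log; the entry date is taken from the heading above.

```python
from datetime import date, timedelta

BIRTHDAY = date(1938, 1, 10)       # assumption: not stated in the log itself
ENTRY_DATE = date(1982, 11, 18)    # date of the log entry

# The day on which exactly 2**14 days have elapsed since the birthday.
day_16384 = BIRTHDAY + timedelta(days=2 ** 14)

print(day_16384.isoformat(), (day_16384 - ENTRY_DATE).days)
```

Under that assumption the 16384th day falls one day after the entry, consistent with "tomorrow".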
21 Nov 1982
565 Declare c as a local variable for hyphenation (DRF). $912 F
566 Omit the “first pass” and try hyphenations immediately, if \pretolerance is
negative (suggested by DRF). $863 E
567 Don’t ship out incredibly huge pages; they might foul up DVI files. §641 R
2 Dec 1982
568 Add new features \everymath and \everydisplay. $1139,1145 G
569 Add a new feature \futurelet. $1221 G
The changes above have been incorporated into Version 0.9 of TeX.
7 Dec 1982
570 Add a new \endinput primitive (suggested by FY). $362,378 G
8 Dec 1982
571 Try off_save, if \par occurs in restricted horizontal mode. (This avoids
embarrassment if TeX says ‘type a command or say \end’, then when you type
\end it says you can’t!) [However, I soon retracted this change.] §1094 I
21 Dec 1982
572 Redefine \relax so that its chr field exceeds 127. (This facilitates the test for
end in scan-file-name.) $265 A
566 ↦ 573 Call begin_diagnostic when omitting the first pass of line breaking. §863 F
574 Fix the logic of glue scanning: In \hskip-1pt plus2pt the minus should apply
only to the 1pt. §461 A
23 Dec 1982
575 Renumber the decimal codes in paragraph statistics for loose and tight lines;
they were ordered backwards. $817 I
576 Treat a paragraph that ends with leaders like a paragraph that ends with glue. §816 C
577 Allow commas as alternates to radix points, for Europeans. §438 C
578 Change \hangindent to a normal dimension parameter. [It had been a combi-
nation of \hangindent and \hangafter, with special syntax.] $247 C
579 Make \prevgraf accessible to users. $422,1244 G
580 Split \clubpenalty off from \widowpenalty. §890 G
I’m typing Chapter 14 while making these changes.
24 Dec 1982
581 Use back-input instead of goto reswitch when inserting \par, because \par
may have changed. $1095 S
25 Dec 1982
It’s 10pm after a very Merry Christmas!
582 Don’t prompt for a new file name if \openin doesn’t find a file. $1275 I
583 Add a new \jobname primitive. $472 G
584 Give the user a way to delete the dollar sign, when TeX decides to insert one. §1047 I
585 Allow optional equals after \parshape, and implement \the\parshape. $423,1248 C
26 Dec 1982
586 Add an if_line_field to the condition stack entries, so that more informative
error messages can be given. $489 I
549 ↦ 587 Introduce a normal_paragraph procedure, since initialization is needed also
within \insert, \vadjust, \valign, \output. $1070 D
27 Dec 1982
588 Give users access to \pagetotal and \pagegoal. (Analogous to #579 and
#585, but simpler.) §1245 G
589 Introduce \tracingpages, allowing users to see page-optimization calculations.
Also split \tracingparagraphs off from \tracingstats. $987,1005,1011 I
The changes above have been incorporated into Version 0.91 of TeX.
31 Dec 1982
590 Break the buildpage procedure into two parts, by extracting the section now
called fire-up. [This is necessary because some Pascal compilers, notably for
IBM mainframes, cannot deal with large procedures.] $1012 P
564 ↦ 591 Make \ifodd1\else legal by introducing if_code. §489 S
592 Improve alignments when columns don’t occur: Don’t append null boxes for
columns missing before \cr, and zero out the tabskip glue after nonpresent
columns. §802 Q
593 Make the error message about overfull alignment more intelligible. $801,804 I
The changes above have been incorporated into Version 0.92 of TeX82, which
was the last version of 1982, completed at 11:59pm on December 31.
3 Jan 1983
Today I’m beginning to write Chapter 15, and planning the \output routine
of plain.tex.
594 Change the logic of its-all-over; use max-dead-cycles instead of the fixed con-
stant 100. $1054 C
595 Don’t forget to pop-nest when an insert is empty. Also disallow optional space
after \insert n{...}. §1100 F
4 Jan 1983
541 ↦ 596 Use the \boxmaxdepth that’s declared inside a \vbox when packaging it. §1086 C
597 Rename \groupbegin and \groupend as \begingroup and \endgroup. $265 C
598 Make \deadcycles accessible to users. $1246 G
599 Base the split insertions on natural height plus depth, not on delta. $1010 Q
The changes above have been incorporated into Version 0.93.
6 Jan 1983
600 Add push_math to handle a case where I forgot to clear incompleat_noad. (This
long-standing bug was unearthed today by Phyllis Winkler.) $1136 D
588 ↦ 601 Add \pageshrink, etc., too. §1245 G
602 Introduce new parameters \floatingpenalty, \insertpenalties. Also adopt
a new internal representation of insertion nodes, so that \floatingpenalty,
\splittopskip and \splitmaxdepth can be stored with each insertion. $140,1008 G
7 Jan 1983
603 Improve the rules for entering new-line, in particular when the end-of-line
character is active. §343 Q
9 Jan 1983
604 Distinguish between implicit and explicit kerns. $155,896 Q
605 Change the name \ignorespace to \ignorespaces. $265 C
560 ↦ 606 Don’t omit a blank space after \def, \message, \mark, etc.; the previous
hodgepodge of rules is impossible to learn. §473 C
The above changes appear in Version 0.94.
12 Jan 1983
Beginning to write the chapters on math today.
607 Add a new feature: active characters in math mode. $1151 G
15 Jan 1983
608 Fix a surprise bug: ‘$1-$’ treated the - as binary. $729 A
609 Initialize space-factor inside discretionaries. $1117 D
16 Jan 1983
610 Fix an incredibly embarrassing bug: I forgot to update spotless in the error
routine! F
While fixing this, I decided to change spotless to a more general history vari-
able, as suggested by IBMers who want a return code. $76,82,1335
611 Replace two calls of confusion by attempts at error recovery, in places where
‘This can’t happen’ could actually happen. §1027,1372 I
18 Jan 1983
612 Introduce the normalize-selector routine to protect against startup anomalies
when the transcript file isn’t open. Also make open-log-file terminate in some
cases. $92,535 R
591 ↦ 613 Insert \relax, not a blank space, to cure infinite loop like \ifeof\fi (LL). §510 R
614 Change the old \limitswitch to \limits, \nolimits, and \displaylimits.
Incidentally, this fixes a bug in the former positioning of integral signs. $682,749 G
615 Give a \char in math mode its inherited \mathcode. §1151 C
525 ↦ 616 Make underline, overline, radical, vcenter, accent noads and {...} all revert
to type Ord instead of type Inner. Introduce a new primitive \mathinner.
(This fixes the spacing, which got worse in some ways after change #525.) §761 Q
I’m working on Appendix G today.
19 Jan 1983
617 Introduce a \mathchoice primitive. $1174 G
618 Move \input from the stomach to the eyes. $378 C
619 Introduce \chardef, analogous to \mathchardef. §1036,1224 C
620 Change \unbox to \unhbox and \unvbox; also add \unhcopy. §1110 G
621 Consider \spacefactor, \pagetotal, etc., as part of prefixed_command, even
though they are always global. §1211 C
20 Jan 1983
622 Switch modes when \hrule occurs in horizontal mode or \vrule in vertical.
§1090,1094 C
623 Add a new \globaldefs feature. $1211 G
21 Jan 1983
624 Optimize the code, in places where it’s important (based on frequency counts
of TeX usage accumulated during the past week): Introduce fast_get_avail
and fast_store_new_token; reduce procedure-call overhead in begin_token_list,
end_token_list, back_input, flush_node_list; change some tests from ‘if a ∧ b’
to ‘if a then if b’. §122,371 E
22 Jan 1983
625 Save space in math lists: Don’t insert penalties within restricted horizontal
mode; simplify trivial boxes. $721,1196 E
626 Fix a surprising oversight in the rebox routine: Ensure that b isn’t a vbox. $715 S
545 ↦ 627 Make \nullfont a primitive, so that cur_font always has a value. (This is
a dramatic improvement to TeX78, where a missing font was a fatal error
called ‘Whoa’!) §552 C
24 Jan 1983
586 ↦ 628 List all incomplete \if’s when the job ends. §1335 I
29 Jan 1983
629 Change initialization of align_state so that \halign\bgroup works. §777 C
30 Jan 1983
625 ↦ 630 Be sure to test ‘is_char_node(q)’ when checking for a trivial box. §721 D
By extraordinary coincidence, this bug was caught when somebody used font
number 11 (= kern-node) in the second character of a list of length 2!
631 Improve format for stats at end of run, as suggested by DRF. §1334 I
The changes above have been incorporated into Version 0.95.
632 Don’t ignore the space after a control symbol (except ‘\ ’). §354 C
633 Remove all trailing spaces at the right of input lines, so that there’s perfect
compatibility with IBM systems that extend short lines with spaces. §31 P
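A minimal sketch of the idea behind #633, assuming fixed-length records padded with blanks to 80 columns (the record length and the function name are illustrative assumptions, not details from the log):

```python
def input_ln(raw: str) -> str:
    """Return the line with all trailing spaces removed (interior spaces are kept)."""
    return raw.rstrip(" ")


short_line = "\\hbox{abc}"
padded_line = short_line.ljust(80)   # what a padding system might deliver

same = input_ln(padded_line) == input_ln(short_line)
```

After stripping, a padded record and the original short line are indistinguishable, so the same source behaves identically on both kinds of system.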
3 Feb 1983
634 Assume that a math_accent was intended, after giving an error message in the
case mmode + accent. §1165 I
635 Add new primitives \iftrue and \iffalse. $488 G
6 Feb 1983
636 Improve the accuracy of fixed-point arithmetic when calculating sizes for \left
and \right. (I had started by dividing delimiter_factor, not delta1, by 500.) §762 A
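The accuracy problem in #636 is easy to reproduce with integer arithmetic. The sketch assumes the goal is to scale a size delta1 by delimiter_factor/500 using only integer division; the sample values (10pt expressed in scaled points, and a factor of 901) are assumptions for illustration.

```python
def delta_first_attempt(delta1: int, delimiter_factor: int) -> int:
    """Divide the small factor first: most of its precision is lost."""
    return (delimiter_factor // 500) * delta1


def delta_corrected(delta1: int, delimiter_factor: int) -> int:
    """Divide the large size first: only a tiny remainder is lost."""
    return (delta1 // 500) * delimiter_factor


delta1 = 10 * 65536        # assumed: 10pt in scaled points
factor = 901               # assumed sample factor

exact = delta1 * factor / 500
first = delta_first_attempt(delta1, factor)
fixed = delta_corrected(delta1, factor)
```

Here 901 // 500 collapses to 1, so the first attempt returns delta1 unchanged, whereas dividing delta1 by 500 first stays within about 0.1% of the exact product.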
12 Feb 1983
637 Change the name \delimiterlimit to \delimitershortfall. $248 C
638 Make \abovewithdelims.. equivalent to \above; change the order of operands
so that delimiters precede the dimension. §1182 C
607 ↦ 639 Remove the kludgy math codes introduced earlier; make \fam a normal integer
parameter and allow \mathcode to equal 2^15. §1233 C
640 Don’t let \spacefactor become 2^15 or more. §1233,1243 R
I finished drafting Chapter 17 today.
14 Feb 1983
639 ↦ 641 Replace octal output (print_octal) by hexadecimal (print_hex) so that math
codes are clearer. §67 I
619 ↦ 642 Don’t forget char_given in the math_accent routine. §1124 F
17 Feb 1983
643 Switch modes when \halign occurs in horizontal mode, or \valign in vertical
mode. $1090,1094 C
18 Feb 1983
644 Add a new feature \tracingrestores. This requires a new procedure called
show_eqtb, whose code can be interspersed with the eqtb definitions. §252 I
25 Feb 1983
622 ↦ 645 Suggest using \leaders when the user tries a horizontal rule in restricted
horizontal mode. §1095 I
27 Feb 1983
646 Specify the range of source lines, when giving warning messages for underfull
or overfull boxes in alignments. $662,675 I
Why did it take me all day to type the middle part of Chapter 18?
4 Mar 1983
647 Introduce a new feature \xcr (suggested by LL). [Changed later to ‘\crcr’.] $785 G
631 ↦ 648 Subtract out TeX’s own string requirements from the stats. §1334 I
6 Mar 1983
649 Add new features \everyhbox and \everyvbox. $1083,1167 G
9 Mar 1983
650 Avoid accessing math-quad when the symbol fonts aren’t known to be present. $1199 R
533 ↦ 651 Introduce float and unfloat macros to aid portability (HWT). §109 P
652 Introduce new names \abovedisplayskip and \belowdisplayskip for the old
\dispskip; also \abovedisplayshortskip and \belowdisplayshortskip for
the old \dispaskip and \dispbskip. §226 C
10 Mar 1983
653 Unbundle \romannumeral from \number (suggested by FY). $468 C
12 Mar 1983
654 Ignore leading spaces in scan-keyword. $407 C
14 Mar 1983
631 ↦ 655 Use write and write_ln directly when printing stats. §1334 E
16 Mar 1983
602 ↦ 656 Refine the page-break cost function (introducing ‘deplorable’, which is not quite
‘awful_bad’), after suggestion by LL. §974,1005 Q
The changes above have been incorporated into Version 0.96.


18 Mar 1983
657 Add a new feature \everyjob suggested by FY. $1030 G
19 Mar 1983
658 Don’t treat left braces specially when showing macros. $294 I
659 Ignore blanks that would otherwise become undelimited arguments. §393 C
21 Mar 1983
660 Make \lastskip handle mu_glue as well as ordinary glue. §424 F
561 ↦ 661 Expand only one level in a preamble \span. §782 C
22 Mar 1983
662 Let a single # suffice in \tokens, \message, etc. (The previous rule, in which
## was always required as in macros, was a loser especially in \write where
you had to say ####!) §477 C
663 Require the keyword ‘to’ in \read. (This will avoid the common error of an
incomplete constant when no space appears before the \cs.) Also allow
terminal I/O as a default when a stream number is out of range. §482,1225,1370 C
26 Mar 1983
664 Replace \ifeven(countnumber) by \ifodd(number), for better consistency of
language. §504 C
564 ↦ 665 Introduce the change_if_limit, to overcome a big surprise bug relating to \if\if
aabc\fi. §497 S
Such examples show that cur_if might not be current, in my original
implementation.
28 Mar 1983
666 Tolerate non-characters as arguments to \if and \ifcat. §506 G
667 Change ‘absent’ to ‘void’, a better word. $487 C
668 Clear the shift-amount in \lastbox, since I don’t want to figure out what it
means in all cases. $1081 C
29 Mar 1983
669 Wake up the terminal before giving an error message. (This means a special
print-err procedure is introduced.) (Suggested by DRF.) $34,73 I
1 Apr 1983
Today I finished Chapter 21 (boxes) and began to draft Chapter 22 (align-
ments).
670 Allow periodic preambles in alignments. $793 G
671 Make \leaders line up according to the smallest enclosing box. $627,636 C
672 Allow hyphenation after whatsits (e.g., after items for an index). $896 Q
2 Apr 1983
673 Call build-page when \par occurs in vertical mode. $1094 Q
674 Clear aux in init-row, for tidyness. $786 C
4 Apr 1983
675 Let digits switch families in math mode. $232 C
7 Apr 1983
602 ↦ 676 Refine the test for not splitting an insertion. §1008 Q
8 Apr 1983
647 ↦ 677 Rename \xcr as \crcr, at LL’s request. §780 C
9 Apr 1983
Took a day off and had a chance to help print a sample page on a 150-year-old
letterpress in Murphys, California.
11 Apr 1983
678 Recover more sensibly after a runaway preamble. $339 I
12 Apr 1983
679 Make \read span several input lines, if necessary to get balanced braces. §482 C
14 Apr 1983
680 Fix a subtle bug found by JS: §882 can make q a char_node, so we need to test
‘if ¬is_char_node(q)’. [Actually I discovered much later that the real bug
was to omit ‘else’ at this point.] §881 S
15 Apr 1983
681 Make \uppercase and \lowercase apply to all characters, regardless of cate-
gory. $1289 C
7:30am. After working all night, I completed a draft of the manual thru Chapter
22, for distribution to volunteer readers.
5pm. The changes above have been incorporated into Version 0.97.
17 Apr 1983
682 Change ‘small_number’ to ‘0..65’ in the hyphenation routine (DRF). §901 R
683 Flush patterns in the input when the user tries \patterns outside of INITEX
(suggested by DRF). $1252 I
Tomorrow I fly to England, where I’ll lecture and write a paper about ‘Literate
Programming’ [Comp. J. 27 (1984), 97-111].
14 May 1983
663 ↦ 684 Improve the behavior of \read from terminal (suggested by Todd Allen at
Yale). [I’d forgotten to implement the extended stream numbers in #663.
Also, the prompt is now omitted if n < 0.] §484 I
18 May 1983
685 Restrict \write n to the transcript file only, if n < 0. $1350 I
686 Unify the syntax for registers and internal quantities. (Remove primitives called
‘\insthe’ and ‘\minusthe’; rename scan_the to scan_something_internal, and
change its interface accordingly; clean up command codes generally.) $209,413 C
687 Introduce new parameters \hoffset, \voffset. §617 G
24 May 1983
688 Introduce a new parameter \everycr (suggested by MDS). $774,799 G
Many macro writers and preliminary-manual readers have been requesting new
features; I’ll try to keep the language as concise and consistent as possible.
25 May 1983
689 Introduce \countdef, \dimendef, etc. (suggested by DRF long ago, easy now
in view of #686). $1224 G
690 Introduce \advance, \multiply, \divide (suggested by FY). $1240 G
691 Introduce \hyphenchar; this requires a new command assign_font_int, plus
minor changes to about 15 modules. $915 G
692 Introduce \skewchar (easy because of #691). $741 G
693 Introduce \noexpand. (I had difficulty thinking of how to implement this one!)
$358,369 G
694 Introduce \meaning. $296 G
695 Remove ‘dm’ and ‘vu’; allow the more general ‘.5\hsize’. §455 G
696 Change ‘\texinfo f n’ to ‘\fontdimen n f’. $578 C
27 May 1983
697 Add a new feature \afterassignment (suggested by ARK). $1269 G
698 Adjust the timing so that commands like ‘\chardef\xx=5\xx’ behave sensibly. §1224 C
28 May 1983
699 Ignore ‘\relax’ as if it were a space, in math mode and in a few other places
where \relax would otherwise be erroneous. $404 C
700 Improve \mathaccent spacing with respect to subscripts and superscripts (sug-
gested by HWT). $742 Q
30 May 1983
594 ↦ 701 Terminate a job only when dead_cycles = 0. §1054 C
The changes above constitute Version 0.98.
3 Jun 1983
I finished the draft of Chapter 23 (output routines) today.
702 Allow \mark and \insert and \vadjust in restricted horizontal mode, and
also in math mode. (This is a comparatively big change, triggered by the
fact that \mark in a display presently causes TeX to crash with ‘This can’t
happen’!) The global variable adjust-tail is introduced. $796,888,1085 G
6 Jun 1983
695 ↦ 703 Replace (and generalize) the previous uses of ht, wd, and dp in dimensions by
introducing the new control sequences \ht, \wd, and \dp. §1247 G
704 Display sub-parts of noads with the symbols ^ and _ instead of ( and [. §696 I
694 ↦ 705 Allow A..F in hex constants to be other_char as well as letter. §445 C
7 Jun 1983
654 ↦ 706 Remove an instance of (Scan optional space), since it’s now redundant. §457 E
707 Legalize \mkern\thinmuskip and \mkern5\thinmuskip. $456 C
708 Clean up the treatment of optional spaces in numerical specifications. §455 C
A construction like 2.5\space\space\dimen0 was previously valid after ‘plus’
or ‘minus’ only!
I’m obviously working on Chapter 24 today.
545 ↦ 709 Allow ‘\font’ as a (font identifier) for the current font. §577 C
623 ↦ 710 Don’t make \gdef global when global_defs < 0. §1218 C
711 Produce zero-glue as the outcome of \advance\spaceskip by-\spaceskip. $1229 E
712 Make \show do something appropriate for every possible token. $1294 I
559 ↦ 713 Replace the (single) \tokens parameter by an array of 256 token registers. §230 G
714 Allow \indent in math mode; also make \valign in math mode produce the
‘Missing $’ error. $1046,1093 C
715 Remove redundant code: There’s no need to check cur-group or call off-save
when starting alignments or equation numbers in displays. §1130,1142 E
8 Jun 1983
716 Disallow \openout-1 and \closeout-1. §1350 C
717 Disallow \lastbox in math mode. $1080 C
9 Jun 1983
718 Call back_error, not error, when \leaders aren’t followed by proper glue. §1078 I
719 Initialize for a possible paragraph, after \noalign in a \valign. $785 D
10 Jun 1983
720 Expand the optional space after an ASCII constant. $442 C
12 Jun 1983
721 Set space_factor ← 1000 after a rule or a constructed accent. §1056,1123 C
14 Jun 1983
722 Correct a serious blunder: Set disc_width ← 0 before testing if s is null (caught
by JS). $870 D
This is a real bug that existed since the beginning! It showed up on page 37 of
the Version 0 TRIP manual, but I didn’t notice the problem.
708 ↦ 723 Make optional spaces after (dimen) like those after (number). §448 C
568 ↦ 724 Insert every_display before calling build_page. §1145 C
648 ↦ 725 Report TeX’s capacity on overflow errors in a way that’s fully consistent with
other statistical reports. $42 I
17 Jun 1983
726 Make all \tracing decisions on the basis of ≥ versus <, not ≠ versus =. §581 C
Today I finished the draft of Chapter 27 (the last chapter)!
The changes above were released as Version 0.99 on June 19, 1983.
20 Jun 1983
727 Set \catcode‘\%=14 in INITEX. $232 C
587 ↦ 728 Call normal_paragraph when \par occurs in vertical mode. §1094 C
Once again I’m retiring about 8am and awaking about 4pm.
21 Jun 1983
558 ↦ 729 Don’t append an overfull rule solely because of \hbadness. §666 C
730 Don’t allow the glue-ratio of shrinking to be less than -1. $810,811 R
22 Jun 1983
653 H 731 Declare the parameter to print-roman-int to be of type integer, instead of
nonnegative-integer (found by Debby Clark). $69 B
690 H 732 Make the keyword ‘by’ optional (suggested by LL). $1236 C
24 Jun 1983
733 Say ‘preloaded’ when announcing format-ident . $1328 I
25 Jun 1983
734 Add extra boxes and glue to the output of alignment. [This thwarts possible
attempts at trickery by which system-dependent glue set values computed
by \span could have gotten into ‘&X’s registers by things like \valign and
\ v s p l i t . It also has the advantage of perfect accuracy in alignment of vertical
rules.] $809 R
735 Make leaders affect the height or width of the enclosing boxes. $656,671 C
Today I’m mainly installing a much-improved format for change files in WEB
programs (suggested by DRF).
28 Jun 1983
736 Permit \=skip in vertical mode when we know that it does nothing. $1106 C
1 Jul 1983
700 H 737 Avoid redundant boxes when things like ‘{\bf A)’ occur in math. $1186 E
738 Add a ‘scaled’feature to \font input. $1258 G
700 I-+ 739 Remember to correct delta when an accented box changes. $742 D
2 Jul 1983
740 Introduce bypass-eoln, to remove anomalous behavior on input files of length 1.
(Suggested by DRF after the problem was discovered by LL). $31 R
4 Jul 1983
741 Allow codes like --b as well as --B. 5352,355 G
678 D. E. KNUTH

742 Introduce new parameters \escapechar, \endlinechar,\defaulthyphenchar,


and \defaultskewchar, to make less dependent on the character set,
(This affects many modules, since a lot of error messages must be broken up
so that they use print-esc.) G
7 Jul 1983
743 Use a system-dependent function erstat when opening or closing files (suggested
by DRF). $27 P
11 J u l 1983
The computer is back u p after more than 50 hours down time (due to air
conditioning failure).
744 Show total glue in the output of \tracingpages. $985 I
745 Guard against insertion into an hbox. $993 R
746 Legalize the assignment (tokenvar)=(tokenvar). $1227 C
747 Introduce a new parameter \errhelp. $1283 I
623 H 748 Don’t forget to check global-defs when \tabskip is changed. $782 F
12 Jul 1983
749 Allow an \outer macro to appear after \string, \noexpand, and \meaning
(Todd Allen). $369,471 C
750 Make ‘\the’an expandable control sequence (i.e., move it from the stomach to
the throat); this cleans up several annoying glitches. $367 C
751 Allow \unhbox and \unhcopy in math mode if the box is void. $1110 c
13 J u l 1983
I lectured for four hours at the TUG meeting today after very little sleep!
16 Jul 1983
The following were suggested by TUG meeting discussions.
752 Round the value of default-mle more properly: It should be 26215. $463 L
700 H 753 Fix \mathaccent again; it’s still not right! The final height should be the
maximum of the height of accented letter without superscript and the height
of unaccented letter with superscript. $742 Q
754 Add a new feature \newlinechar. $59 G
755 Allow boxes and rules in discretionaries (suggested by somebody from Hewlett-
Packard). $1121 G
756 Show all token expansions, not just macros, when \tracingcommands. $367 I
757 Allow \char in a \hyphenation list. $935 c
758 Introduce a new feature \aftergroup;it can be implemented with save-stack. $326 G
759 Run the running dimensions to alignment boundaries (suggested by ARK). $806 C
17 Jul 1983
760 Zero out hyf values at the edges, so that weird pattern data cannot lead to
Pascal range checks. $965 R
761 Decrease the hc codes for hyphenation, so that code 127 cannot possibly be
matched. $937,962 R
672 H 762 Allow whatsits after hyphenatable words. $899 c
604 H 763 Represent an italic correction as an explicit kern. $1113 C
18 J u l 1983
764 Allow lowercase letters in file names. ‘$519 C
765 Change the message ‘No output file’ to: ‘No pages of output’. $642 I
766 Confirm that a quiet mode is being entered, when error interaction ends with
Q, R, or S (suggested by ARK). $86 I
Version 0.999 was finally installed today; a new program listing has been
printed.
THE ERRORS OF TEX 679

From now on, I plan to keep all section numbers unchanged.


I’m done writing Appendix H; beginning to revise Chapter 20.
25 Jul 1983
663 H 767 Allow space after ‘to’in the \read command (FY). $1215 C
To bed at Ipm today.
27 Jul 1983
665 I+ 768 Stack the current type of \ i f ; this precaution is necessary in general (FY). $498 S
To bed at 2pm today.
29 Jul 1983
769 Avoid putting a control sequence in the hash table when it occurs after \ i f x .
(Requested by Math Reviews people.) $507 E
Finished a version of The w b o o k lacking only Appendices D, E, and I, for
distribution to interested readers.
To bed at 10:30pm, planning to arise regularly at 6am for a change.
31 Jul 1983
766 ++ 770 Call update-terminal when going quiet (HWT). $86 I
1 Aug 1983
771 Don’t put an empty line at the end of an \input file! (This simplifies the rules
and the program, and also gets around a bug that occurred at the end of
files with end-line-char < 0.) $362 C
The changes above went into Version 0.9999, which wm widely distributed.
16 Aug 1983
665 772 Rectify a ridiculous gaffe: I initialized q every time the loop of change-if-limit
H

was performed! (Found by FY.) $497 B


648 H 773 Distinguish ‘string’ from ‘strings’ when reporting statistics. $1334 I
774 Introduce lx, to correct a bug in \xleader computations (found by FY). $627 A
20 Aug 1983
775 Don’t forget to apply \/ to ligatures! $1113 F
Today I began to read all previous issues of TUGboat, in preparation for Ap-
pendix D.
27 Aug 1983
776 Add debugging hack number 16, to help catch subtle data structure bugs. $1339 I
777 Remove redundant setting and resetting of name-in-progress . $531 E
778 Suppress \input during a font size spec; otherwise cur-name is clobbered
(found by MDS) . $1258 S
779 Introduce new conditionals \ifhbox and \ifvbox. $505 G
29 Aug 1983
750 H 780 Test for an empty list, if emptiness will mess up the data structure. (Found by
Todd Allen.) $478 D
781 Use fast-for-new-token for efficiency. $466 E
782 Say ‘has only’ instead of ‘has’. $579 I
These changes yield Version 0.99999, used only at Stanford.
30 Aug 1983
783 Make funny blank spaces showable. $298 C
31 Aug 1983
754 H 784 Make \newlinechar affect print-char, not just print. $58 C
680 D. E. KNUTH

4 Sep 1983
785 Add new features \lastkern, \lastpenalty, \unkern, \unpenalty. $424,996,1105 G
OK, Appendix D is finished!!
The above changes have been installed in Version 0.999999,
17 Sep 1983
548 H 786 Don’t bother making duplicate font identifiers; that was overkill, not really
needed. $1258 P
Will this be the historic last change to w?
18 Sep 1983
787 Correct a minor inconsistency, ‘display’not ‘displayed’. $211 I
20 Sep 1983
604 H 788 Treat the kerns inserted for accents as explicit kerns. $1125 C
20 Sep 1983
789 Change ‘log’to ‘transcript’in several messages. $535,1335 I
The index was finished today; I mailed the entire W b o o k East for final proof-
reading before publication.
1 Oct 1983
790 Prevent uninitialized trie positions in case of overflow (found by Bernd Schulze). $944 D
7 Oct 1983
Henceforth our weekly ‘w lunch’ meetings will be called ‘METRFONT lunch’.
DRF begins to produce The W b o o k on our A P S phototypesetter.
14 Oct 1983
633 w 791 Ignore spaces at the ends of lines also in TEX.PO0L (found by DRF). $52 P
792 Initialize the history variable at start-here (DRF). $1332 D
18 Oct 1983
793 Extend runaway to catch runaway text (suggested by FY). $306 I
794 Reset cur-cs after back-input, not after scanning the ‘=’ (found by FY). $1226 D
24 Oct 1983
638 H 795 Change the error recovery for bad delimiters, in accordance with the changed
syntax. (Found by Barry Smith.) $1183 I
9 Nov 1983
796 Optimize the code a bit more, based on empirical frequency data gathered
during September and October: In $45, use the fact that the result is almost
always true. In $380, delete ‘while true do’ since many compilers implement
that badly. Rewrite $852 to avoid calling badness in the most common
case. $45,380,852 E
3 Dec 1983
797 Don’t forget to call error after the message has been given (noticed by Gabi
Kuper) . $500 F
Version 1.0 released today incorporates all of the above.
9 Dec 1983
Dinner party with 36 guests to celebrate W’scoming of age.
2 Feb 1984
786 +P 798 Reinstall \font precautions that I thought were unnecessary. I overlooked
many problematic possibilities, like ‘C\font\a=x \global\a) \the\font’and
‘\font\a=x\font\b=x \let\b=\undefined \the\a’,etc. (Found by Mike Ur-
ban.) The new remedy involves removal of the font-dent array and putting
the identifiers into a frozen part of the hash table; so there’s a sprinkling of
THE ERRORS OF TEX 68 1

corrections in lots of modules. But basically the change is quite conservative,


so it shouldn’t spawn any new bugs (it says here). $222,267,1257 S
9 Feb 1984
799 Remove the possibility of double interrupt, in a scenario found by Clint Cuzzo. $1031 S
12 Feb 1984
800 Improve spacing in a formula like $(A, 0$. $764 Q
13 Feb 1984
801 Avoid a bad goto, as diagnosed by Clint Cuzzo and George O’Connor. (Must
not go directly to switch.) $346 A
802 Conserve string pool space by not storing file name in two guises (suggested by
DRF). $537 E
26 Feb 1984
803 Make scaled output look cleaner by printing fewer decimals whenever this in-
volves no loss of accuracy. (Suggested by METAFONT development.) $103 I
2 Mar 1984
804 Maintain 17-digit accuracy, not 16; now constants like ‘ .00000762939453126pt’
will round correctly. $452 R
16 Mar 1984
805 Plug a loophole that permitted recursion in get-next , by disallowing deletions
in chectouter-validity . $336 R
24 Mar 1984
806 Open the terminal before trying to wake it up, when the program starts bad. $1332 I
27 Mar 1984
807 Check that k < 63, to avoid the \patterns{xxx.. .xxxdxxxdxxxl anomaly
found by Jacques DBsarmCnien. $962 R
11 Apr 1984
808 Supply code for the missing case adjust-node in copy-node-list. $206 F
Yoicks, how could serious bugs like that have escaped detection?
11 Jun 1984
627 I-+ 809 Initialize char-base, etc., for null-font. (Found by Nick Briggs.) $552 D
810 Clear the buffer array initially (Briggs). $331 R
21 Jun 1984
811 Look ahead for ligature or kern after a \chardef’d item (DBsarmBnien). $1036 C
4 Jul 1984
812 Make the quarterword constraint explicit with a new ‘bad’ case (19). $111 R
7 Jul 1984
813 Optimize firm-up-the-line slightly, to be consistent with the METAFONT pro-
gram. $363 E
8 Jul 1984
814 Give additional diagnostics when \tracingmacros>l. $323 I
The changes above were incorporated in Version 1.1, released July 9, 1984.
27 Jul 1984
815 Say ‘see the t r a n s c r i p t f i l e ’ after handling offline \shov commands. (Sug-
gested by METAFONT.) $1298 I
20 Oct 1984
816 Allow ‘0’ in response to error prompts. $84 I
Those two changes led to Version 1.2.
682 D . E. KNUTH

25 Nov 1984
817 Don’t forget to check for null before looking at subfields of a node. (This was
“dirty Pascal,” with two quarterword 0’s read as a halfword.) $846 R
818 Ditto in another place! $939 R
819 Remove the fixed-at-compile-time partition between lower and upper memory.
$116,125,162 E
This major change in memory management completes Version 1.3, which was
published in preiiminary looseieaf form as ‘w: The Program’.
20 Dec 1984
820 Keep the nodesize field from overflowing if the lower part of memory is too
large. $125 R
That was another bug in existence fiom the beginning!
5 Jan 1985
821 Improve the missing-format-file error (DRF). $524 I
7 Jan 1985
822 Update the terminal right away so that the welcoming message will appear as
soon as possible (DRF). $61 I
23 Jan 1985
823 Convey more uncertainty in the help message at times of confusion. $95 I
824 Improve the history logic in the warning-issued case. $245 I
18 Feb 1985
810 H 825 Stick to standard Pascal: Don’t use first in a for loop. [Some procedures
“threaten” it globally, according to British Standard 6192, section 6.8.3.9.1
(Pointed out by CET.) $331 P
11 Apr 1985
826 Prevent nonexistent characters from being output by unusual combinations of
ligatures and hyphenation. $915 S
15 Apr 1985
819 ++ 827 Compute memory usage correctly in INITEX; the previous number was wrong
because of a WEB text macro without parentheses (DRF). $164 L
16 Apr 1985
828 Speed up flush-list by not calling free-avail (DRF). $123 E
17 Apr 1985
788 H 829 Introduce a special kind of kern for accent positioning; it must not disappear
after a line break. $837,879,1125 A
18 Apr 1985
755 ++ 830 Prevent \lastbox and \unkern from removing discretionary replacements.
$1081,1105 R
That completes Version 1.4.
26 Apr 1985
831 Don’t try m - a r e a if a nonstandard file area has been specified (DRF). $537 c
That was #401 in w 7 8 ; I never learn!
30 Apr 1985
754 ++ 832 Eliminate the limitation on \write length; the reason for it has disappeared
(Nancy Tuma) . $1370 C
8 May 1985
819 ++ 833 Allocate two words for the head of the active list (CET). $162 D
THE ERRORS OF TEX 683
11 May 1985
834 Change w t e m to wtem-ln after a bad beginning (Bill Gropp). $1332 I
806 H 835 Don’t open the terminal twice (CET). $1332 E
22 May 1985
836 Test for batch-mode after trying to open the transcript file, not before (DRF). $92 R
837 Be prepared for string pool overflow while reading the command line! (This
bug was first found in METAFONT, when it could occur more easily.) $525 R
7 Aug 1985
838 Fix a bug in \edef\fooC\iffalse\fi\the\toksO): l&$ should stay in the
loop when expanding non-\the. (Found by Dan Brotsky.) $478 A
The above changes were incorporated in Version 1.5.
27 Nov 1985
764 H 839 Make ‘plain’ a lowercase name, for consistency with the manual. $521 C
669 H 840 Wake up the terminal for \show commands. $1294,1297 I
The above changes were incorporated in Version 2.0, which was published as
Volume B of the Computers & Typesetting series.
15 Dec 1986
841 Punctuate the Poirot help message more carefully. $1283 I
28 Jan 1987
842 Make sure that mu-in-open doesn’t exceed 127 (DRF). $14 R
680 H 843 Don’t allow a \kern to be clobbered at the end of a pre-break list when a
discretionary break is taken. (A missing ‘else’was the source of the error,
diagnosed incorrectly before.) $881 D
844 Take account of discarded nodes when computing the background width after
a discretionary. $840 D
That was the first really serious bug detected for more than 17 months! I found
it while experimenting with right-to-left extensions.
a Version 2.1 was released on January 26, 1987.
5 Feb 1987
845 Remove cases in shorthand-def that cannot occur (found by Pat Monardo). $1224 E
14 Apr 1987
846 Improve robustness of data structure display when debugging (Ronald0 Am&).
$174,182 R
21 Apr 1987
847 Make the storage allocation algorithm more elegant and efficient. $127 E
22 Apr 1987
742 H 848 Calculate the empty-line condition properly when end-line-char is absent. $360 A
The previous three changes were found while I was teaching a class based on
Volume B; they led to Version 2.2.
28 Apr 1987
849 Avoid closing a file when T)$ knows that it isn’t open (JS). $560 E
3 Aug 1987
850 Clean up unfinished output if it’s necessary to jump-out (Klaus Gunterman). $642 S
That makes Version 2.3; subsequent version numbers won’t be logged here.
19 Aug 1987
851 Indent rules properly in cases like
\hangindent=lpt$$\halign{ ...\cr\noalign{\hrule))$$. $806 A
684 D. E. KNUTH

20 Aug 1987
852 Introduce co-backup because of cases like \hskip Opt plus lfil\ifdim (Alan
Guth). $366 S
9 Nov 1987
853 Change the calculation for number of leader boxes, so that it won’t be too
sensitive to roundoff error near exact multiples (M. F. Bridgland). $626 S
17 Nov 1987
854 Replace my stupid algorithm for fixed-point multiplication of negatives (W. G.
Sullivan). $572 A
12 Dec 1987
855 Fix a typo in the initialization of hyphenation tables (Peter Breitenlohner). $952 B
That error was almost completely harmless, thus undetectable, except if some
\kcode is 1 and no \patterns are given.
23 Dec 1987
564 H 856 Be more cautious when “relaxing” a previously undefined \csname;you might
be inside a group (CET). $372 S
20 Apr 1988
857 Make sure temp-head is well-formed whenever it can be printed in a LLrunaway”
message: Consider constructions like \outer\def\aOO\a\a (Silvio Levy). $391 S
24 Apr 1988
858 Avoid conflicting use of the string pool in constructions like \def\\#l{)\input
a\\b (Robert Messer). $260 S
10 May 1988
859 Amend the \patterns data structure when trie-mzn = 0 (Breitenlohner). $951,953 R
25 M a y 1988
860 Guarantee that trie-pointer cannot be out of range. $923 R
861 Avoid additional bugs like #858 in constructions like \input a\romannumerall,
etc. $464,465,470 S
862 Prevent similar string pool confusion that could occur during the processing of
**\input\romannumera16. $525 R
19 Jun 1988
819 H 863 Prevent a negative dividend from rounding upward, causing a loop (CET). $126 S
819 ++ 864 Adopt a smoother allocation strategy when memory is nearly gone (CET). $126 E
20 Jun 1988
852 H 865 Initialize cur-order, now that it’s being backed up (Tsunetoshi Hayashi). $439 D
6 Nov 1988
612 ++ 866 Disable fatal-error in prompt-input, so that open-log-file can use it safely (Tim
Morgan). $71 S
836 H 867 Force terminal output whenever open-log-file fails. $535 s
We’re ROW u p to Version 2.94; I sincerely hope all bugs have been found.
THE ERRORS OF TEX 685
REFERENCES
1. Piet Hein, Gmks, M I T Press, 1966.
2. C. Szkchy, Foundation Failures, Concrete Publications, London, 1961.
3. A. Endres, ‘An analysis of errors and their causes in system programs’, Proc. Int. Cbnf. Software
Eng., 1975, pp. 327-336.
4. Victor R. Basili and Barry T. Perricone, ‘Software errors and complexity: an empirical investigation’,
Communications of the ACM, 27, 42-52 (1984).
5. L. A. Belady and M. M. Lehman, ‘A model of large program development’, IBM Systems J., 15,
225-252 (1976).
6. Donald E. Knuth, TEX: The Program, Addison-Wesley, 1986.
7. Donald E. Knuth, ‘Literate programming’, The Computer Journal, 27, 97-1 11 (1984).
8. Donald E. Knuth, ‘The WEB system of structured documentation’, Stanford Computer Science Report
STAN-(3-980, September 1983.
9. Patrick Winston, Artifcia1 Intelligence: An MIT Perspectice, M I T Press, 1979.
10. Donald E. Knuth, ‘The letter S’, The Mathematical Intelligencer, 2, 114-122 (1980).
11. Donald E. Knuth, ‘Mathematical typography’, Bulletin of the h z e n k a n Mathematical Society (new
series) 1, 337-372 (1979).
12. Donald E. Knuth, Seminumerical Algorithms, second edition, Addison-Wesley, 1981.
13. Donald E. Knuth and Michael F. Plass, ‘Breaking paragraphs into lines’, Software-Practice and
Experience, 11, 1119-1184 (1981).
14. Donald E. Knuth, ‘The concept of a meta-font’, fisible Language, 16, 3-27 (1982).
15. Donald E. Knuth, The T ~ X h o kAddison-Wesley,
, 1984.
16. Barbara Beeton (ed), TEX and IIIETAI’OVT: Errata and Changes, 09 September 1983, distributed
with T C G h a t , 4 (1983).
17. Donald E. Knuth, TEX, a System for Technical Text, American Mathematical Society, 1979.
18. Donald E. Knuth, TEX and METM0,VT: AVew Directions in Typesetting, Digital Press, 1979.
19. David R. Fuchs and Donald E. Knuth, ‘Optimal prepaging and font caching’, AC’M Transactions on
Programming Languages and Systems, 7, 62-79 (1985).
20. Donald E. Knuth, ‘A torture test for TEX’, Stanford Computer Science Report STAY-C’S-1027,
November 1984.
21. Donald E. Knuth, ‘A torture test for METAFONT’, Stanford Computer Science Report STAY-C‘S-
1095, January 1986.
22. Donald E. Knuth, Sorting and Searching, Addison-Wesley, 1973.
23. Brian W. Kernighan and Lorinda L. Cherry, ‘A system for typesetting mathematics’, Communications
of the ACM, 18, 151-157 (1975).
24. Guy L. Steele Jr., Donald R. Woods, Raphael A. Finkel, Mark R. Crispin, Richard M. Stallman
and Geoffrey S. Goodfellow, Hacker’s Dictionary: A Guide to the World of Lliiards, Harper and
Row, 1983.
25. Donald E. Knuth, Fundamental Algorithms, Addison-Wesley, 1968.
26. Reinhard Budde, Christiane Floyd, Reinhard Keil-Slawik and Heinz Zullighoven, (eds) Software
Development and Reality Construction, in preparation.
George Forsythe and the
Development of Computer Science
by Donald E. Knuth

The sudden death of George Forsythe this spring was considered such combinatorial algorithms to be a part of
a serious loss to everyone associated with computing. numerical analysis [46, p. 7], and he regarded automatic
When we recall the many things he contributed to the programming as another branch [49, p. 655]. He began
field during his lifetime, we consider ourselves fortunate to foresee the less obvious implications of programming:
that computer science has had such an able leader. The use of practically any computing technique itself raises a
My purpose in this article is to review George For- number of mathematical problems. There is thus a very con-
sythe's contributions to the establishment of Computer siderable impact of computation on mathematics itself, and this
may be expected to influence mathematical research to an in-
Science as a recognized discipline. It is generally agreed creasing degree. [46, p. 5]
that he, more than any other man, is responsible for the The automatic computer really forces that precision of thinking
rapid development of computer science in the world's which is alleged to be a product of any study of mathematics.
colleges and universities. His foresight, combined with his [49, p. 655l
untiring efforts to spread the gospel of computing, have He also noticed that the rise of computers was being
had a significant and lasting impact; one might almost accompanied by an unprecedented demand for young
regard him as the Martin Luther of the Computer Re- mathematicians:
formation! The majority of our undergraduate mathematics majors are lured
Since George's publications express these ideas so at once into the marketplace, where they are greatly in demand
well, I believe the best way to summarize his work is to as servants of the fast-multiplying family of fast-multiplying com-
puters. [49, p. 651]
repeat many of the things he said, in his own words. This
article consists mainly of the quotations that particularly Therefore he began to argue th,at computers should
struck me as I reread his papers recently. Indeed, much of play a prominent role in undergraduate mathematics edu-
what follows belongs in a computer-science supplement cation. At this time he felt that only one new course was
to Bartlett's Familiar Quotations. needed for undergraduates, namely an introduction to
programming; he stressed that the best way to teach it
F r o m N u m e r i c a l A n a l y s i s to Computer Science would be to combine computer programming with the
George's early training and research in numerical traditional courses, instead of having separate training in
analysis was a good blend of theory and practice: numerical analysis. His paper "The Role of Numerical
The fact that the CPC was generally wrong when I knew the Analysis in an Undergraduate Program" [49] suggests
answer made me wonder what it was like for someone who didn't over 50 good ways to mix computing into other courses;
know what to expect. [76, p. 5] these suggestions ought to be required reading for all
Starting in 1948 he worked for the National Bureau of teachers today, since they are now perhaps even more
Standards' Institute for Numerical Analysis in Los Ange- relevant than they were in 1959. Indeed, the ad'aptation
les, California, where he did extensive programming for of traditional courses has been painfully slow (probably
the SWAC computer. In 1954 this Institute became part of because professors of the older generation have not
U.C.L.A., and he put a great deal of energy into the teach- wanted to dirty their hands with the newfangled ma-
ing of mathematics and numerical analysis. H e also worked chines) ; in 1970 Forsythe was still strongly urging math-
on nonnumerical problems, such as the tabulation of all ematics teachers to bend a little:
possible semigroups on four elements; at this time, he Compared with most undergraduate subjects, mathematics courses

721 Communications August 1972


of Volume 15
the ACM Number 8
are very easy to prepare for, because they change so slowly. The gradually been working its way into the English language,
computing part of it is probably the only part that changes much. but his influence was an important factor in the present
Why not devote time to learning that? [80, p. 23]
widespread acceptance of the term.
In 1961 we find him using the term "computer sci-
A brief digression into the history of computer sci-
ence" for the first time in his writing:
ence education seems appropriate at this point. Appar-
[Computers] are developing so rapidly that even computer scien- ently computing courses got started in universities largely
tists cannot keep up with them. It must be bewildering to most
mathematicians and engineers...In spite of the diversity of the because I B M donated about 100 "free" computers dur-
applications, the methods of attacking the difficult problems with ing the 1950s, with the stipulation that programming
computers show a great unity, and the name of Computer Sciences courses must be taught. This strategy made it possible
is being attached to the discipline as it emerges. It must be under-
stood, however, that this is still a young field whose structure is for computing to get its foot in the academic door. Natur-
still nebulous. The student will find a great many more problems ally there were many students and a few members of the
than answers. [59, p. 177] faculty who were intrigued and became involved. Engi-
He identified the "computer sciences" as the theory of neering departments, especially at schools like M.I.T.,
programming, numerical analysis, data processing, and Pennsylvania, and Illinois, where computers were being
the design of computer systems, and observed that the built, also had a head start. Many ideas were exchanged
latter three were better understood than the theory of during special summer school sessions at the University
programming, and more available in courses. of Michigan, and later the F o r d Foundation sponsored a
project there on the use of computers in engineering
T h e Establishment of Computer Science
education. A good survey of these developments has
By that time Forsythe knew that numerical analysis been given by H o w a r d E. Tompkins in Advances in Com-
was destined to be only a part of the computing milieu; a puters Vol. 4, Academic Press, New York, 1963, pp.
new discipline was crystallizing which cried out to be 135-168.
taught. H e had come to Stanford as a professor of math- But these early stages hardly represented computer
ematics in 1957, but now he and Professor John Herriot science as it is understood today, nor did m a n y people
wanted to hire colleagues interested in programming, regard it as the germ of a genuine discipline worthy of
artificial intelligence, and such topics, which are not con- study on a par with other subjects. I myself was a gradu-
sidered mathematics. Stanford's administration, espe- ate student in mathematics who enjoyed programming
cially Dean Bowker (who is now Chancellor at Berkeley), as a hobby; I had written two compilers, but I had no idea
also became convinced that computing is important; so that I would someday be teaching about data structures
George was able to found the Division of Computer and relating all this to mathematics. A few people, like
Science within the Mathematics Department in 1961. George Forsythe and Alan Perlis and Richard Hamming,
During that academic year he lectured on "Edu- had no such mental blocks. Louis Fein had also perceived
cational Implications of the Computer Revolution" at the eventual rise of computer science; he had recom-
Brown University: mended in 1957 that Stanford establish a Graduate
"Machine-held strings of binary digits can simulate a great many School of Computer Science, analogous to the H a r v a r d
kinds of things, of which numbers are just one kind. For example,
they can simulate automobiles on a freeway, chess pieces, elec- Business School. (cf. reference [B] below.)
trons in a box, musical notes, Russian words, patterns on a paper, George argued the case for computer science long
human cells, colors, electrical circuits, and so on. To think of a and loud, and he won; at Stanford he was in fact "the
computer as made up essentially of numbers is simply a carry-
over from the successful use of mathematical analysis in studying producer and director, author, scene designer, and cast-
models...Enough is known already of the diverse applications ing manager of this hit show." [A] Several more faGulty
of computing for us to recognize the birth of a coherent body of members were carefully selected, and the Division be-
technique, which I call computer science...Whether computers
are used for engineering design, medical data processing, com- came a separate academic department in January 1965.
posing music, or other purposes, the structure of computing is Since this was one of the first such departments, it
much the same. We are extremely short of talented people in this naturally came under very close scrutiny. Now we realize
field, and so we need departments, curricula, and research and
degree programs in computer science...I think of the Computer that eventually every university will have such a depart-
Science Department as eventually including experts in Program- ment. Although this development is inevitable in the long
ming, Numerical Analysis, Automata Theory, Data Processing, run, it will happen sooner than might be expected largely
Business Games, Adaptive Systems, Information Theory, Infor-
mation Retrieval, Recursive Function Theory, Computer Linguis- because George was such an effective spokesman, espe-
tics, etc., as these fields emerge in structure...Universities must respond [to the computer
revolution] with far-reaching changes in the educational structure. [60]

At this time there were comparatively few graduate computer science programs available in
American colleges; and they had other names, like Systems and Communication Sciences
(Carnegie), Computer and Information Sciences (University of Pennsylvania), Communication
Science (University of Michigan). Forsythe did not invent the term "computer science," which had

cially to mathematicians and to people in the government. Here are some important points he
has made, in addition to those quoted earlier:

The most valuable acquisitions in a scientific or technical education are the general-purpose
mental tools which remain serviceable for a lifetime. I rate natural language and mathematics
as the most important of these tools, and computer science as a third...The learning of
mathematics and computer science together has pedagogical advantages, for the basic concepts
of each reinforce the learning of the other. [71, p. 456-457]

The question "What can be automated?" is one of the most inspiring philosophical and practical
questions of contemporary civilization. [75, p. 92]

722 Communications of the ACM, August 1972, Volume 15, Number 8

The last sentence is taken from the introduction to an invited address on Computer Science and
Education at the IFIP Congress 1968; I wish I could quote the entire article.

Forsythe frequently stressed the value of experimental computer science, as well as the
theoretical:

To a modern mathematician, design seems to be a second-rate intellectual activity. But in the
most mathematical of the sciences, physics, the role of design is highly appreciated...If
experimental work can win half the laurels in physics, then good experimental work in computer
science must be rated very high indeed. [68, p. 4]

Intense Activity

The primary reason George's views have been so influential is that he continually poured so
much energy into all aspects of his work. One way to illustrate this is to focus on a
randomly-selected period of his life and to look more deeply into his daily activity; therefore
I studied his correspondence file for the months of January and February 1964.

At this time his Division of Computer Science contained two faculty members besides himself
(John Herriot and John McCarthy), plus two young "visiting assistant professors" for whom
regular appointments were being arranged (Gene Golub and Niklaus Wirth), and an instructor
(Harold Van Zoeren). As the correspondence shows he was actively trying to build up the
faculty, and I suspect that every computer scientist in America was approached at least twice
during the early 1960s with a potential offer of employment at Stanford! George was also the
director of Stanford's Computation Center, and a member of several national advisory panels and
committees. In addition, he had just been appointed editor of the Algorithms section of
Communications.

During this two-month period he wrote a total of 195 letters, which may be grouped as follows:

1. Recruiting faculty, 48 letters (including two addressed to me).
2. Algorithms section, editorial work, 43 letters.
3. Recommendations of policy to outside groups, 36 letters.
4. Departmental correspondence with graduate students, 35 letters.
5. Research interests, 11 letters.
6. Miscellaneous, 22 letters.

Many of these letters were two pages long; some were even longer.

Several letters described the current status of computer science at Stanford:

We are a bit separate from the Mathematics Department, and have responsibility for courses in
numerical analysis, programming, artificial intelligence, and any other areas of Computer
Science which we can manage. [January 3, 1964]

The role of the Computer Science Division is likely to be increasingly divergent from that of
Mathematics. It is important to acquire people with strong mathematics backgrounds, who are
nevertheless prepared to follow Computer Science into its new directions. [January 7, 1964]

We have a master's degree program with about 40 graduate students, and a number of students
headed for interdepartmental Ph.D.'s in Computer Science. [January 8, 1964]

One thing which enabled computer science to grow was that other universities could point to
Stanford's example. Conversely, George was able to make use of other universities' activities;
in a memo to the dean on January 30, he said:

I enclose copies of two letters...which indicate in and between the lines that [the University
of Wisconsin in] Madison is putting on a really major effort in Computer Science. They are even
calling it Computer Science at last!

On June 5, having been elected president of the ACM, George wrote:

Votes have strange outcomes in California. Goldwater and Forsythe.

By July 2, he was really feeling the increased responsibilities:

The pile of undigested mail on my desk is staggering.

His two years as ACM President were in general a rather happy and prosperous time for that
organization. He published regular letters to the members [65] in Communications, and these
letters are worth rereading today because in them he discussed many of his own feelings, as
well as ACM business. His letter in the March 1965 issue contains an excellent account of how
he grappled with the problems of a new Computer Science Department:

We must now turn our attention from the battle for recognition to the struggle to recognize the
identity of our new discipline...One of my personal concerns with our Computer Science
Department is to assess the future of numerical analysis...The core of Computer Science has
become and will remain a field of its own, concerned with the forefront of new ideas...I
conclude that the computer and information sciences badly need an association of people to
study them, improve them, and render them better understood and thus more useful.

But the intended introduction to his President's Letter for September 1965 had to be changed.
He had written:

I am delighted that you have voted to change our name to the Association for Computing and
Information Sciences...I think it gives a much clearer picture of who we are and what we do.

A two-thirds majority was necessary for such a name change, and the actual vote was only
3794-2203. I must confess that I was one of the 2203 who opposed making a change; this was one
of the few disagreements I ever had with George.

Algorithms

The major thing which distinguishes computer science from other disciplines is its emphasis on
algorithms, and in this field George Forsythe made several vital contributions. He inaugurated
a new area of scholarly work: refereeing and editing algorithms.

His point of view was nicely expressed in the "Forum on Algorithms" in Communications, April
1966:

There are few problems for which a good algorithm of probable permanent value is known...Small
details are of the greatest importance...The development of excellent algorithms requires a
long time, from discovery of a basic idea to the perfection of the method...A useful algorithm
is a substantial contribution to knowledge. Its publication constitutes an important piece of
scholarship. [67]
He was fond of pointing out how much remains to be done, since even the solution to
$ax^2+bx+c=0$ is at the frontier of well-understood problems:

Hardly anyone knows how to solve even a quadratic equation on a computer without unnecessarily
risking loss of precision or overflow or underflow! [68, p. 4]

As an indication of his behind-the-scenes activities, here are some more excerpts from letters
he wrote during January and February 1964:

The program is really in poor style, and I'm peeved with the referee for not saying so. You use
a switch and a mass of goto's where straightforward ALGOL would use conditional expressions.
You even goto "here" from the line above "here"!! [January 8, 1964]

I am sorry that refereeing increases the time between submittal and publication, but I am
confident that the net result of refereeing will be a large gain in the quality of our
algorithms. [January 13, 1964]

It is very hard to find matching begins and ends, which should be above each other or on the
same line. [January 29, 1964]

We are punching cards from the galleys and running them as a check. [February 24, 1964]

I believe that our algorithms must have enough substantial content to save a programmer at
least an hour's thought. [February 26, 1964]

At that time approximately 180 algorithms per year were being submitted to Communications.

George also contributed to ACM publications in other ways: In 1966 he became the first editor
of the Education department of Communications; he had been an editor of the Journal from 1955
to 1959; and he was chairman of the Editorial Board from 1960 to 1962.

The Permanence of Computer Science

How did George view these developments from a historical perspective? He set down his
long-range views in the following memorandum, written at Stanford in 1970 just after Edsger
Dijkstra had visited our department and stimulated some thought-provoking discussions:

My feeling since 1962 has been that, even if Computer Science should turn out to be fully
developed and fairly static by 1985, it will have been very important for universities to have
created Computer Science Departments in the years 1960-1970. For, given the departmental
structure of universities (which I deplore), I don't see how universities could otherwise have
got rolling on research in this area. And without this research, much of the quality of
computer usage in universities would be frozen at the level of Early FORTRAN.

I don't mean that I do in fact forecast an end to the development of computer science by 1985.
Being the study of computers, Computer Science can't begin to settle down until years after the
hardware developments level off. This is not yet in sight.

However, Dijkstra did set me to thinking about how long Computer Science will last. It may be
that its difficult applications (like robotry and problem solving) will move off into various
other disciplines. And the difficult problems in the core of Computer Science may get merged
into discrete mathematics, as mathematicians get interested in them.

On the other hand, there is very little evidence at present that mathematicians are taking any
interest at all in the important questions of the mathematical theory of computation. (Speed,
optimality, data structures, storage requirements, proofs of correctness, etc., etc.) If they
do not, then maybe these core computer scientists may absorb all of discrete mathematics
themselves into a still unnamed discipline.

Most of all, I'd like to see universities able to restructure themselves into task forces able
to attack new disciplines as they arise. [April 14, 1970]

He also had made another prediction:

In years to come we may expect a department of computer science to mix with departments of pure
mathematics, operations research, statistics, applied mathematics, and so on, inside a school
of mathematical sciences. We can hope for some weakening of the autonomy of individual
departments, and a concomitant strengthening of the ability of a university to found and carry
out interdisciplinary programs. [68, p. 6]

Conclusion

I have tried to summarize our debt to George Forsythe by quoting from the extensive writings in
which he expressed important ideas so clearly. But this is only part of the story. The
accompanying article by John Herriot describes the more personal side of George's life: his
selfless assistance and counsel to his students and colleagues, and the real qualities of
leadership for which we are especially grateful.

Since I am not competent to write about numerical analysis, I have not been able to describe
George's respected contributions to research. A short summary of this aspect of his work is
being prepared by A.S. Householder for publication in the SIAM Journal on Numerical Analysis
later this year. George knew that he would have to sacrifice much of the time he wanted to
spend on his main research interests, for the cause of Computer Science. He quipped:

In the past 15 years many numerical analysts have progressed from being queer people in
mathematics departments to being queer people in computer science departments! [71, p. 456]

Nevertheless he continued to stress the important connections between numerical analysis and
the other aspects of computer science.

He was a Fellow of both the British Computer Society and the American Association for the
Advancement of Science. He was a council member of the American Mathematical Society, 1960-63,
and a Trustee of the Society for Industrial and Applied Mathematics, 1971-72.

A bibliography of his publications appears below. In addition to these works, he published many
shorter book reviews, letters to the editor, etc. He was especially concerned about the need
for good books on computer science, so he served as the editor of Prentice-Hall's prestigious
Series in Automatic Computation, encompassing more than 75 titles. This is a
"fifteen-foot-shelf" to be compared with the books he listed in [26].

He took notes at every lecture he attended and kept them in beautifully organized files. This
material, together with his correspondence, has been deposited in the Stanford University
archives for the use of future historians.

Acknowledgments. Many people helped me prepare this article, especially John Herriot, Allen
Newell, Alan Perlis, Guynn Perry, and Alexandra I. Forsythe. I also wish to thank the
Mathematical Association of America, John Wiley and Sons, Ginn and Co., and the American
Society for Engineering Education, for permission to reprint copyrighted remarks from George
Forsythe's publications.

References

[A] Edward A. Feigenbaum, "A word entr'acte," Stanford University Computation Center
newsletter, Autumn quarter, 1965.
[B] Louis Fein, "The computer-related sciences (Synnoetics) at a university in the year 1975,"
Amer. Scientist 49 (1961), 149-168.

Bibliography of George Elmer Forsythe

BOOKS

Dynamic Meteorology (with Jörgen Holmboe and William Gustin). John Wiley, New York, 1945,
378 pp.
Bibliography of Russian Mathematics Books. Chelsea, New York, 1956, 106 pp.
Finite-Difference Methods for Partial Differential Equations (with Wolfgang Wasow). John Wiley,
New York, 1960, 444 pp. Translations into Russian (1963), Japanese (1968).
Computer Solution of Linear Algebraic Systems (with Cleve B. Moler), Prentice-Hall, Englewood
Cliffs, N.J., 1967, 153 pp. Translations into Russian (1969), Japanese (1969), German (1971).
[Another book, based on the notes from his introductory course at Stanford on numerical
methods, is partly finished.]

ARTICLES

1. Riesz summability methods of order r, for R(r) < 0. Duke Math. J. 8 (1941), 346-349.
2. Remarks on regularity of methods of summation (with A.C. Schaeffer). Bull. Amer. Math. Soc.
48 (1942), 863-865.
3. Cesàro summability of random variables. Duke Math. J. 10 (1943), 397-428.
4. Note on equivalent-potential temperature. Bull. Amer. Meteorol. Soc. 25 (1944), 149-151.
5. Remarks on the above paper by Neamtan. Bull. Amer. Meteorol. Soc. 25 (1944), 228-229.
6. Determination of absolute height and wind for aircraft operations. Hdqts. Army Air Forces
Weather Div. Rep. 708, June 1944, 69 pp. [author's name omitted].
7. A generalization of the thermal wind equation to arbitrary horizontal flow. Bull. Amer.
Meteorol. Soc. 26 (1945), 371-375.
8. Universal tables for reduction of pressure to sea level. Hdqts. Army Air Forces Weather Div.
Rep. 972, June 1945, 22 pp. [author's name omitted].
9. Aircraft weather reconnaissance (with R.B. Doremus). Hdqts. Army Air Forces Weather Service
Rep. 105-128-1, Sept. 1945, 218 pp. [authors' names omitted].
10. War-time developments in aircraft weather reconnaissance. Bull. Amer. Meteorol. Soc. 27
(1946), 160-163.
11. Discussion of E.V. Ashburn and L.L. Weiss's article on Vorticity. Trans. Amer. Geophys.
Union 27 (1946), 279-282.
12. Maximum density-altitude in the continental United States (with Morris S. Hendrickson).
Bull. Amer. Meteorol. Soc. 27 (1946), 576-579.
13. Speed of propagation of atmospheric waves with changing shape. J. Meteorol. 4 (1947), 67-69.
14. On Nörlund summability of random variables to zero. Bull. Amer. Math. Soc. 53 (1947),
302-313.
15. Exact particle trajectories for nonviscous flow in a plane with a constant Coriolis
parameter. J. Meteorol. 6 (1949), 337-346.
16. Solution of the telegrapher's equation with boundary conditions on only one characteristic.
J. Res. Natl. Bur. Stand. 44 (1950), 89-102.
17. Matrix inversion by a Monte Carlo method (with Richard A. Leibler). Math. Tables Aids
Comput. 4 (1950), 127-129. Correction in Math. Tables Aids Comput. 5 (1951), 55.
18. Gauss to Gerling on Relaxation. Math. Tables Aids Comput. 5 (1951), 255-258. (Translation,
with notes, of a letter by Gauss.)
19. New matrix transformations for obtaining characteristic vectors (with William Feller).
Quart. Appl. Math. 8 (1951), 325-331. [Presented at Proc. Int. Cong. Math., 1950.]
20. Second order determinants of Legendre polynomials. Duke Math. J. 18 (1951), 361-371.
21. Generation and testing of random digits at the National Bureau of Standards, Los Angeles.
Natl. Bur. Stand. Appl. Math. Series 12 (1951), 34-35.
22. Summary of John von Neumann's lecture, Various Techniques Used in Connection with Random
Digits. Natl. Bur. Stand. Appl. Math. Series 12 (1951), 36-38.
23. Theory of selected methods of finite matrix inversion and decomposition. Inst. for
Numerical Analysis Rep. 52-5, Natl. Bur. Standards, Los Angeles, (1951), 93 pp.
24. An extension of Gauss' transformation for improving the condition of systems of linear
equations (with Theodore S. Motzkin). Math. Tables Aids Comput. 6 (1952), 9-17.
25. Bibliographical survey of Russian mathematical monographs, 1930-1951. Natl. Bur. Stand.
Rep. 1628, Mar. 25, 1952, 64 pp. Supplement, Rep. 1628A, Dec. 12, 1952, 17 pp.
26. A numerical analyst's fifteen-foot shelf. Math. Tables Aids Comput. 7 (1953), 221-228.
27. Tentative classification of methods and bibliography on solving systems of linear
equations. Natl. Bur. Stand. Appl. Math. Series 29 (1953), 1-28.
28. Punched-card experiments with accelerated gradient methods for linear equations (with A.I.
Forsythe). Natl. Bur. Stand. Appl. Math. Series 39 (1954), 55-69.
29. Alternative derivations of Fox's escalator formulae for latent roots. Quart. J. Mech. Appl.
Math. 5 (1952), 191-195.
30. Solving linear algebraic equations can be interesting. Bull. Amer. Math. Soc. 59 (1953),
299-329.
31. Asymptotic lower bounds for the frequencies of polygonal membranes. Pacific J. Math. 4
(1954), 467-480.
32. Review of Householder, Principles of Numerical Analysis. Bull. Amer. Math. Soc. 60 (1954),
488-491.
33. Asymptotic lower bounds for the fundamental frequency of convex membranes. Pacific J. Math.
5 (1955), 691-702.
34. What are relaxation methods? In Modern Mathematics for the Engineer, E.F. Beckenbach (ed.),
McGraw-Hill, New York, 1956, pp. 428-447.
35. On best conditioned matrices (with E.G. Straus). Proc. Amer. Math. Soc. 6 (1955), 340-345.
[Presented at Proc. Int. Congress Math., Amsterdam, 1954.]
36. SWAC computes 126 distinct semigroups of order 4. Proc. Amer. Math. Soc. 6 (1955), 443-447.
37. The Souriau-Frame characteristic equation algorithm on a digital computer (with Louise W.
Straus). J. Math. Physics 34 (1955), 152-156.
38. Computing constrained minima with Lagrange multipliers. J. Soc. Indust. Appl. Math. 3
(1955), 173-178.
39. Relaxation methods. In Mathematical Theory of Elasticity, 2nd ed., Sec. 125, I.S.
Sokolnikoff (ed.), McGraw-Hill, New York, 1956, pp. 454-465.
40. Difference methods on a digital computer for Laplacian boundary value and eigenvalue
problems. Comm. Pure Appl. Math. 9 (1956), 425-434.
41. Selected references on use of high-speed computers for scientific computation. Math. Tables
Aids Comput. 10 (1956), 25-27.
42. Generation and use of orthogonal polynomials for data fitting with a digital computer. J.
Soc. Indust. Appl. Math. 5 (1957), 74-88.
43. The educational program in numerical analysis of the Department of Mathematics, U.C.L.A. In
The Computing Laboratory in the University (Preston C. Hammer, ed.), U. of Wisconsin Press,
1957, pp. 145-151.
44. Suggestions to students on talking about mathematics papers. Amer. Math. Monthly 64 (1957),
16-18.
45. The role of computers in high school science education. Computers and Automation 6 (Aug.
1957), 15-16.
46. Contemporary state of numerical analysis. In Numerical Analysis and Partial Differential
Equations (with Paul C. Rosenbloom), Surveys in Applied Math. 5, John Wiley, New York, 1958,
pp. 1-42.
47. SWAC experiments on the use of orthogonal polynomials for data fitting (with Marcia
Ascher). J. ACM 5 (1958), 9-21.
48. Singularity and near singularity in numerical analysis. Amer. Math. Monthly 65 (1958),
229-240.
49. The role of numerical analysis in an undergraduate program. Amer. Math. Monthly 66 (1959),
651-662.

50. Numerical methods for high-speed computers - a survey. Proc. WJCC, Mar. 3-5, 1959,
Institute for Radio Engineers, New York, pp. 249-254.
51. Bibliography on high school mathematics education. Computers and Automation 8 (May, 1959),
17-19.
52. Reprint of a note on rounding-off errors. SIAM Rev. 1 (1959), 66-67.
53. The cyclic Jacobi method for computing the principal values of a complex matrix (with P.
Henrici). Trans. Amer. Math. Soc. 94 (1960), 1-23.
54. Solution to problem E1398 (with G. Szegő). Amer. Math. Monthly 67 (1960), 696-697.
55. Review of Selfridge, On Finite Semigroups. Math. Computation 14 (1960), 204-207.
56. Remark on Algorithm 15 (with John G. Herriot). Comm. ACM 3 (1960), 602.
57. Crout with pivoting in ALGOL 60. Comm. ACM 3 (1960), 507-508.
58. Vectorcardiographic diagnosis with the aid of ALGOL (with J. von der Groeben and J.G.
Toole). Comm. ACM 5 (1962), 118-122.
59. Engineering students must learn both computing and mathematics. J. Eng. Educ. 52 (1961),
177-188.
60. Educational implications of the computer revolution. Applications of Digital Computers,
W.F. Freiberger and William Prager (eds.), Ginn, Boston, 1963, pp. 166-178.
61. Tests of Parlett's Algol eigenvalue procedure Eig. 3. Math. Comput. 18 (1964), 486-487.
62. Automatic grading programs (with Niklaus Wirth). Comm. ACM 8 (1965), 275-278.
63. On the stationary values of a second-degree polynomial on the unit sphere (with Gene H.
Golub). J. Soc. Indust. Appl. Math. 13 (1965), 1050-1068.
64. An undergraduate curriculum in numerical analysis. Comm. ACM 7 (Apr. 1964), 214-215.
65. President's Letters to the ACM Membership. Comm. ACM 7 (1964), 448, 507, 558, 633-634, 697;
8 (1965), 3, 143-144, 422-423, 541, 591, 727; 9 (1966), 1, 244, 325.
66. Solution to Problem 5334. Amer. Math. Monthly 72 (Nov. 1965), 1030.
67. Algorithms for scientific computation. Comm. ACM 9 (Apr. 1966), 255-256.
68. A university's educational program in Computer Science. Comm. ACM 10 (1967), 3-11.
69. Today's computational methods of linear algebra. SIAM Rev. 9 (1967), 489-515. Reprinted in
Studies in Numerical Analysis 1, Soc. Indus. Appl. Math., Philadelphia, 1968.
70. On the asymptotic directions of the s-dimensional optimum gradient method. Numerische
Mathematik 11 (1968), 57-76.
71. What to do till the computer scientist comes. Amer. Math. Monthly 75 (1968), 454-462.
[Winner of Lester R. Ford Award, 1969.]
72. Solving a quadratic equation on a computer. In The Mathematical Sciences, COSRIMS and
George Boehm (eds.), MIT Press, Cambridge, Mass., 1969, pp. 138-152.
73. Remarks on the paper by Dekker. In Constructive Aspects of the Fundamental Theorem of
Algebra, Bruno Dejon and Peter Henrici (eds.), Wiley-Interscience, New York, 1969, pp. 49-51.
74. What is a satisfactory quadratic equation solver? In Constructive Aspects of the
Fundamental Theorem of Algebra, Bruno Dejon and Peter Henrici (eds.), Wiley-Interscience, New
York, 1969, pp. 51-61.
75. Computer science and education. Proc. IFIP 68 Cong., 92-106.
76. Design - then and now. The Digest Record of the ACM-SIAM-IEEE 1969 Joint Conf. on
Mathematical and Computer Aids to Design, ACM, 1969, pp. 2-10.
77. Let's not discriminate against good work in design or experimentation. AFIPS 1969 SJCC,
Vol. 34, 1969, AFIPS Press, Montvale, N.J., pp. 538-539.
78. Pitfalls in computation, or why a math book isn't enough. Amer. Math. Monthly 77 (1970),
931-956. [Winner of Lester R. Ford Award, 1971.]
79. The maximum and minimum of a positive definite quadratic polynomial on a sphere are convex
functions of the radius. SIAM J. Appl. Math. 19 (1970), 551-554.
80. Computer science and mathematics. SIGCSE Bull. 2, 4 (Sept.-Oct. 1970), 20-23.
81. Recent references on solving elliptic partial differential equations by finite differences
or finite elements. SIGNUM Newsletter 6, 1 (Jan. 1971), pp. 32-56.
82. Variational study of nonlinear spline curves (with E.H. Lee). To appear in SIAM Review.
83. von Neumann's comparison method for random sampling from the normal and other
distributions. To appear in Math. of Computation.

Ph.D. Students

(A) Ph.D. in Mathematics with specialty in Numerical Analysis
(B) Interdepartmental Ph.D.
(C) Ph.D. in Computer Science

Eldon Hansen (Forsythe, 1960). On Jacobi methods and block-Jacobi methods for computing matrix
eigenvalues. (A)
James Ortega (Forsythe, 1962). An error analysis of Householder's method for the symmetric
eigenvalue problem. (A)
Betty Jane Stone (Forsythe, 1962). 1. Best possible ratios of certain matrix norms. 2. Lower
bounds for the eigenvalues of a fixed membrane. (A)
Beresford Parlett (Forsythe, 1962). Applications of Laguerre's method to the matrix eigenvalue
problem. (A)
Donald Fisher (Forsythe and Gilbarg, 1962). Calculation of subsonic cavities with sonic free
streamlines. (A)
Ramon E. Moore (Forsythe and McGregor, 1963). Interval arithmetic and automatic error analysis
in digital computing. (A)
Robert Causey (Forsythe, 1964). On closest normal matrices. (A)
Cleve B. Moler (Forsythe, 1965). Finite difference methods for the eigenvalues of Laplace's
operator. (A)
James Daniel (Forsythe and Schiffer, 1965). The conjugate gradient method for linear and
nonlinear operator equations. (A)
Donald W. Grace (Forsythe and Polya, 1965). Computer search for nonisomorphic convex
polyhedra. (B)
James M. Varah (Forsythe, 1966). The computation of bounds for the invariant subspaces of a
general matrix operator. (A)
Roger W. Hockney (Buneman, Forsythe, Golub, 1966). The computer simulation of anomalous plasma
diffusion and the numerical solution of Poisson's equation. (B)
Paul Richman (Forsythe and Herriot, 1968). 1. ε-Calculus. 2. Transonic fluid flow and the
approximation of the iterated integrals of a singular function. (C)
J. Alan George (Forsythe and Dorr, 1971). Computer implementation of the finite element
method. (C)
Richard P. Brent (Forsythe, Dorr, and Moler, 1971). Algorithms for finding zeros and extrema of
functions without calculating derivatives. (C)
David R. Stoutemyer (Forsythe, 1972). Numerical implementation of the Schwarz alternating
procedure for elliptic partial differential equations. (C)

Student Paper Competition Awards
On the following pages are the winning papers in the first annual ACM Communications Student
Paper Competition. We, the Student Editorial Committee, started work on this issue in January
1971, learning each step of the process as we went. The hardest part for us (other than waiting
for the first entry to arrive) was making the final decision not to publish a given paper; each
time we found ourselves delaying this decision, hoping to make it easier.

We are very pleased with the three winning papers; we hope that their depth and diversity will
encourage the professional world to seek student participation, and inspire students to
contribute their own ideas. In addition to the recognition received by having their papers
published, the authors' awards are:

First place. "Generating Parsers for Affix Grammars" by David R. Crowe of the University of
British Columbia. $250 cash, a trip to ACM 72 to receive the award in person, and a three-year
subscription to the ACM serial publication of his choice.

Second place. "Political Redistricting by Computer" by Robert E. Helbig, Patrick K. Orr, and
Robert R. Roediger of Washington University. $150 cash, and for each author a three-year
subscription to the ACM serial publication of his choice.

Third place. "An Extensible Editor for a Small Machine with Disk Storage" by Arthur J. Benjamin
of Brandeis University. $100 cash, and a three-year subscription to the ACM serial publication
of his choice.

(The number of papers that will be published and the pattern of awards may not be the same in
subsequent years.)

All of the refereeing of Competition papers was done by graduate students at various colleges
and universities. Our thanks are extended to the referees listed below (and a few others we may
have omitted) for their efforts in writing careful, detailed critiques of the papers. Both they
and the authors have learned from the work that went into the excellent two- and three-page
reports.

Susan Bloch
Ashok Chandra
Clark Crane
Robert Crawford
Alan Davis
Michael S. Doyle
Carl D. Farrell
Alan Filipski
Roger G. Frey
T. Furugori
Gunnar R. Grape
Michael Hanau
William H. Harrison
Alan B. Hayes
Robert Johnson
Linda Kaufman
Marc T. Kaufman
Gary Knott
Jean-Pierre Levy
Michael Manthey
William L. McKinney
Donald R. Oestreicher
Gary J. Pace
Gerry Purdy
Gabriele Ricci
Harry Saal
Michael Saunders
Daniel P. Siewiorek
David C. Smith
Edward Syrett
James W. Welsch
Nelson Wiederman

We are also grateful to those authors listed below whose papers we were unable to publish, but
whose efforts were good enough to make our decisions very difficult. Without the many months of
work they put into their papers, the Competition could not have been a success.

Wilfred S. Ageno, University of Hawaii
Todd Allen, University of Delaware
Wayne F. Bialas and David J. Decker, Clarkson College of Technology
Ronald J. Brachman, Princeton University
Donald Cohen, Carnegie-Mellon University
Patricia R. Cox and Cheryl J. Whitford, University of New Mexico
Ola-Olu Adeniyi Daini, Ohio Wesleyan University
Dennis J. Eaglestone, Arizona State University
William A. Gates, University of Wisconsin
Randall Glissmann, Northwestern University
Gary Gorsline, Blacksburg High School (Va.)
Joseph W. Guderjohn, University of Colorado
Jewell M. Harwood, State University College at Plattsburgh (N.Y.)
James R. Heath, Purdue University
Douglas H. Hoffman and Alan R. Schwartz, University of California (Santa Barbara)
Joan Marie Hrenko, North Carolina State University
Robert A. Kelley, Cubberley High School (Palo Alto, Calif.)
Gerard F. Lameiro, Colorado State University
David Misunas, Massachusetts Institute of Technology
Randall B. Neff, Rice University
John R. Odden, California Institute of Technology
Richard A. Page, San Jose State College
Donald C. Pierantozzi, Drexel University
Joseph P. Sambataro Jr., Fordham University
Lee J. Schemer, Massachusetts Institute of Technology
Thomas A. Schultz, Johns Hopkins University
Douglas R. Spence, Florida Institute of Technology
Edwin Thanhouser, Trinity University
William W. Thomas III, PMC Colleges (Pa.)
Mark Tomizawa, Kenwood High School (Chicago)
Ronald W. Van Orne Jr. and William H. Walker IV, US Air Force Academy
Nicholan F. Vitulli and David Woods, Colgate University

Finally, we wish to thank all those in the ACM who have made our job fun and interesting,
particularly Elliott Organick, Myrtle Kellington, M. Stuart Lynn and George Capsis.

As students at Stanford University, we wish to voice our gratitude for the encouragement and
inspiration we received from our late department chairman, George E. Forsythe, whom we all miss
so deeply.

1971-72 ACM Communications Student Editorial Committee:

Isu Fang, Chairman
Dennis P. Brown
Michael A. Malcolm
Stephen A. Ness
Richard L. Sites
Richard E. Sweet
Andrew S. Woyzbun

SOFTWARE-PRACTICE AND EXPERIENCE, VOL. 11, 1119-1184 (1981)

Breaking Paragraphs into Lines*


DONALD E. KNUTH AND MICHAEL F. PLASS
Computer Science Department, Stanford University, Stanford, California 94305, U.S.A.

SUMMARY
This paper discusses a new approach to the problem of dividing the text of a paragraph into
lines of approximately equal length. Instead of simply making decisions one line at a time,
the method considers the paragraph as a whole, so that the final appearance of a given line
might be influenced by the text on succeeding lines. A system based on three simple primitive
concepts called ‘boxes’, ‘glue’, and ‘penalties’ provides the ability to deal satisfactorily with
a wide variety of typesetting problems in a unified framework, using a single algorithm that
determines optimum breakpoints. The algorithm avoids backtracking by a judicious use
of the techniques of dynamic programming. Extensive computational experience confirms
that the approach is both efficient and effective in producing high-quality output. The paper
concludes with a brief history of line-breaking methods, and an appendix presents a simplified
algorithm that requires comparatively few resources.
KEY WORDS Typesetting Composition Linebreaking Justification Dynamic programming
Word processing Layout Spacing Box/glue/penalty algebra Shortest paths
TEX (Tau Epsilon Chi) History of printing

INTRODUCTION
One of the most important operations necessary when text materials are prepared for
printing or display is the task of dividing long paragraphs into individual lines. When
this job has been done well, people will not be aware of the fact that the words they
are reading have been arbitrarily broken apart and placed into a somewhat rigid and
unnatural rectangular framework; but if the job has been done poorly, readers will
be distracted by bad breaks that interrupt their train of thought. In some cases it
can be difficult to find suitable breakpoints; for example, the narrow columns often
used in newspapers allow for comparatively little flexibility, and the appearance of
mathematical formulas in technical text introduces special complications regardless
of the column width. But even in comparatively simple cases like the typesetting of
an ordinary novel, good line breaking will contribute greatly to the appearance and
desirability of the finished product. In fact, some authors actually write better material
when they are assured that it will look sufficiently beautiful when it appears in print.
The line-breaking problem is informally called the problem of ‘justification’, since it
is the ‘J’ of ‘H & J’ (hyphenation and justification) in today’s commercial composition
and word-processing systems. However, this tends to be a misnomer, because printers

*This research was supported in part by the National Science Foundation under grants IST-7921977
and MCS-7723738; by Office of Naval Research grant N00014-76-C-0330; by the IBM Corporation; and
by the Addison-Wesley Publishing Company. ‘TEX’ and ‘Tau Epsilon Chi’ are registered trademarks of the
American Mathematical Society.

0038-0644/81/111119-66$06.60    Received 25 December 1980
© 1981 by John Wiley & Sons, Ltd.    Revised 6 February 1981

have traditionally used justification to mean the process of taking an individual line of
type and adjusting its spacing to produce a desired length. Even when text is being
typeset with ragged right margins (therefore ‘unjustified’), it needs to be broken into
lines of approximately the same size. The job of adjusting spaces so that left and
right margins are uniformly straight is comparatively laborious when one must work
with metal type, so the task of typesetting a paragraph with last century’s technology
was conceptually a task of justification; nowadays, however, it is no trick at all for
computers to adjust the spacing as desired, so the line-breaking task dominates the
work. This shift in relative difficulty probably accounts for the shift in the meaning of
‘justification’; we shall use the term ‘line breaking’ in this paper to emphasize the fact
that the central problem of concern here is to find breakpoints.
The traditional way to break lines is analogous to what we ordinarily do when using
a typewriter: A bell rings (at least conceptually) when we approach the right margin,
and at that time we decide how best to finish off that line, without looking ahead to see
where the next line or lines might end. Once the typewriter carriage has been returned
to the left margin, we begin afresh without needing to remember anything about the
previous text except where the new line starts. Thus, we don’t have to keep track of
many things at once; such a system is ideally suited to human operation, and it also
leads to simple computer programs.
Book printing is different from typing primarily in that the spaces are of variable
width. Traditional practice has been to assign a minimum and maximum width to
interword spaces, together with a normal width representing the ideal situation. The
standard algorithm for line breaking (see, for example, Barnett¹, page 55) then proceeds
as follows: Keep appending words to the current line, assuming the normal spacing,
until reaching a word that does not fit. Break after this word, if it is possible to do
so without compressing the spaces to less than the given minimum; otherwise break
before this word, if it is possible to do so without expanding the spaces to more than
the given maximum. Otherwise hyphenate the offending word, putting as much of it
on the current line as will fit; if no suitable hyphenation points can be found, this may
result in a line whose spaces exceed the given maximum.
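The traditional procedure just described can be made concrete with a short Python sketch. It is only an illustration under simplifying assumptions: words are given by their widths alone, every interword space has the same minimum, normal, and maximum size, and the hyphenation step is omitted, so a line that would need hyphenation is merely reported as loose. The name `first_fit` and its parameters are hypothetical and do not come from the paper.

```python
def first_fit(word_widths, line_width, minimum, normal, maximum):
    """Break words into lines one line at a time (hyphenation omitted).

    Returns (lines, loose): lines is a list of lists of word indices, and
    loose lists the numbers of the lines whose spaces would have to exceed
    `maximum`, i.e. the places where the traditional method would hyphenate.
    """
    lines, loose = [], []
    current, total = [], 0               # words on the current line, and their width
    for i, w in enumerate(word_widths):
        gaps = len(current)              # spaces needed if word i joins the line
        if not current or total + w + gaps * normal <= line_width:
            current.append(i)            # the word fits with normal spacing
            total += w
        elif total + w + gaps * minimum <= line_width:
            current.append(i)            # break after word i by shrinking the spaces
            lines.append(current)
            current, total = [], 0
        else:
            # Break before word i; check whether the spaces stay within the maximum.
            if total + (gaps - 1) * maximum < line_width:
                loose.append(len(lines))
            lines.append(current)
            current, total = [i], w
    if current:
        lines.append(current)            # the last line is not stretched
    return lines, loose
```

Because each decision uses only the current line, the sketch needs to remember nothing about earlier text except where the new line starts, which is exactly the property the paragraph above attributes to the traditional method.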
There is no need to confine computers to such a simple procedure, since the data for
an entire paragraph is generally available in the computer’s memory. Experience has
shown that significant improvements are possible if the computer takes advantage of
its opportunity to ‘look ahead’ at what is coming later in the paragraph, before making
a final decision about where any of the lines will be broken. This not only tends to
avoid cases where the traditional algorithm has to resort to wide spaces, it also reduces
the number of hyphenations necessary. In other words, line breaking decisions provide
another example of the desirability of ‘late binding’ in computer software.
One of the principal reasons for using computers in typesetting is to save money, but
at the same time we don’t want the output to look cheaper. A properly programmed
computer should, in fact, be able to solve the line-breaking problem better than a
skilled typesetter could do by hand in a reasonable amount of time (unless we give this
person the liberty to change the wording in order to obtain a better fit). For example,
Duncan² studied the interword spacing of 958 lines that were manually typeset by a
“most respectable publishers’ printer” that he chose not to identify by name, and he
found that nearly 5% of the lines were quite loosely set; the spaces on those lines
exceeded 10 units (i.e., 5/9 of an em), and two of the lines even had spaces exceeding
13 units. We shall see later that a good line-breaking algorithm can do better than this.

Besides the avoidance of hyphens and wide spaces, we can improve on the traditional
line-breaking method by keeping the spaces nearly equal to the normal size, so that
they rarely approach the minimum or maximum limits. We can also try to avoid rapid
changes in the spacing of adjacent lines; we can make special efforts not to hyphenate
two lines in a row, and not to hyphenate the second-last line of a paragraph; we can
try to control the white space on the final line of the paragraph; and so on. Given any
mathematical way to rate the quality of a particular choice of breakpoints, we can ask
the computer to find breakpoints that optimize this function.
But how is the computer to solve such a problem efficiently? When a given paragraph
has n optional breakpoints, there are 2^n ways to break it into lines, and even the fastest
conceivable computers could not run through all such possibilities in a reasonable
amount of time. In fact, the job of breaking a paragraph as nicely as possible into
equal-size lines sounds suspiciously like the infamous bin-packing problem, which is
well known to be NP-complete.³ Fortunately, however, each line is to consist of
contiguous information from the paragraph, so the line-breaking problem is amenable
to the techniques of discrete dynamic programming⁴,⁵; this means there is a reasonably
efficient way to attack it. We shall see that the optimum breakpoints can be found
in practice with only about twice as much computation as needed by the traditional
algorithm; the new method is sometimes even faster than the old, when we consider
the time saved by not needing to hyphenate so often. Furthermore the new algorithm
is capable of doing other things like setting a paragraph one line longer or one line
shorter, in order to improve the layout of a page.

FORMULATING THE PROBLEM
Let us now state the line-breaking problem explicitly in mathematical terms. We
shall use the basic concepts and terminology of the TEX typesetting system⁶, but in
simplified form, since the complexities of general typesetting would obscure the main
principles of line breaking.
For the purposes of this paper, a paragraph is a sequence x1 x2 ... xm of m items,
where each individual item xi is either a box specification, a glue specification, or a
penalty specification.
• A box refers to something that is to be typeset: either a character from some font
of type, or a black rectangle such as a horizontal or vertical rule, or something
built up from several characters such as an accented letter or a mathematical
formula. The contents of a box may be extremely complicated, or they may be
extremely simple; the line-breaking algorithm does not peek inside a box to see
what it contains, so we may consider the boxes to be sealed and locked. As far as
we are concerned, the only relevant thing about a box is its width: When item xi of
a paragraph specifies a box, the width of that box is a real number wi representing
the amount of space that the box will occupy on a line. The width of a box may be
zero, and in fact it may also be negative, although negative widths must be used
with care and understanding according to the precise rules laid down below.
• Glue refers to blank space that can vary its width in specified ways; it is an elastic
mortar used between boxes in a typeset line. When item xi of a paragraph specifies
glue, there are three real numbers (wi, yi, zi) of importance to the line-breaking
algorithm:
wi is the ‘ideal’ or ‘normal’ width;
yi is the ‘stretchability’;
zi is the ‘shrinkability’.
For example, the space between words in a line is often specified by the values
wi = 1/3 em, yi = 1/6 em, zi = 1/9 em, where one em is the set size of the type being
used (approximately the width of an uppercase ‘M’ in classical type styles). The
actual amount of space occupied by this glue can be adjusted when justifying
a line to some desired width; if the normal width is too small, the adjustment
is proportional to yi, and if the normal width is too large the adjustment is
proportional to zi. The numbers wi, yi, and zi may be negative, subject to certain
natural restrictions explained later; for example, a negative value of wi indicates
a backspace. When yi = zi = 0, the glue has a fixed width wi. Incidentally, the
word ‘glue’ is perhaps not the best term, because it sounds a bit messy; a word
like ‘spring’ would be better, since metal springs expand or compress to fill up
space in essentially the way we want. However, we shall continue to say ‘glue’, a
term used since the early days of TEX (1977), because many people claim to like
it. A glob of glue is often called a skip by TEX users, and it seems preferable to
speak of boxes and skips rather than boxes and springs or boxes and glues. A
skip, by any other name, is of course the same abstract concept, embodied by the
three values (wi, yi, zi).
• Penalty specifications refer to potential places to end one line of a paragraph
and begin another, with a certain ‘aesthetic cost’ indicating how desirable or
undesirable such a breakpoint would be. When item xi of a paragraph specifies
a penalty, there is a number pi that helps us decide whether or not to end a
line at this point, as explained below. Intuitively, a high penalty pi indicates
a relatively poor place to break, while a negative value of pi stands for a good
breaking-off place. The penalty pi may also be +∞ or −∞, where ‘∞’ denotes
a large number that is infinite for practical purposes, although it really is finite;
in TEX, any penalty ≥ 1000 is treated as +∞, and any penalty ≤ −1000 is
treated as −∞. When pi = +∞, the break is strictly prohibited; when pi =
−∞, the break is mandatory. Penalty specifications also have widths wi, with the
following meaning: If a line break occurs at this place in the paragraph, additional
typeset material of width wi will be added to the line just before the break occurs.
For example, a potential place at which a word might be hyphenated would be
indicated by letting pi be the penalty for hyphenating there and letting wi be
the width of the hyphen. Penalty specifications are of two kinds, flagged and
unflagged, denoted by fi = 1 and fi = 0. The line-breaking algorithm we shall
discuss tries to avoid having two consecutive breaks at flagged penalties (e.g., two
hyphenations in a row).
Thus, box items are specified by one number wi, while glue items have three numbers
(wi, yi, zi) and penalty items have three numbers (wi, pi, fi). For simplicity, we shall
assume that a paragraph x1 ... xm is actually specified by six sequences, namely

t1 ... tm, where ti is the type of item xi, either ‘box’, ‘glue’, or ‘penalty’;
w1 ... wm, where wi is the width corresponding to xi;
y1 ... ym, where yi is the stretchability corresponding to xi if ti = ‘glue’,
otherwise yi = 0;
z1 ... zm, where zi is the shrinkability corresponding to xi if ti = ‘glue’,
otherwise zi = 0;
p1 ... pm, where pi is the penalty at xi if ti = ‘penalty’,
otherwise pi = 0;
f1 ... fm, where fi = 1 if xi is a flagged penalty, otherwise fi = 0.

Any fixed unit of measure can be used in connection with wi, yi, and zi; TEX uses
printers’ points, which are slightly less than 1/72 inch. In this paper we shall specify
all widths in terms of machine units equal to 1/18 em, assuming a particular size of
type, since the widths turn out to be integer multiples of this unit in many cases;
the numbers in our examples will be as simple as possible when expressed in terms
of machine units.
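One possible way to hold these items in a program is sketched below in Python. The field names t, w, y, z, p, f follow the six sequences defined above; the class `Item`, the constructors `box`, `glue`, and `penalty`, and the helper `six_sequences` are hypothetical names chosen for this illustration, not part of TEX.

```python
from dataclasses import dataclass

INF = float("inf")   # stands for the 'infinite' penalty or stretchability

@dataclass(frozen=True)
class Item:
    t: str          # type: 'box', 'glue', or 'penalty'
    w: float = 0    # width
    y: float = 0    # stretchability (glue only)
    z: float = 0    # shrinkability (glue only)
    p: float = 0    # penalty (penalty items only)
    f: int = 0      # 1 for a flagged penalty, otherwise 0

def box(w):
    return Item("box", w)

def glue(w, y, z):
    return Item("glue", w, y, z)

def penalty(w, p, f=0):
    return Item("penalty", w, p=p, f=f)

def six_sequences(items):
    """Split a paragraph into the six parallel sequences t, w, y, z, p, f."""
    return ([it.t for it in items], [it.w for it in items],
            [it.y for it in items], [it.z for it in items],
            [it.p for it in items], [it.f for it in items])

# An indentation box, the boxes for 'I' and 'n', an interword space, the box
# for 'o', and an optional hyphen, with widths in machine units of 1/18 em.
example = [box(18), box(6), box(10), glue(6, 3, 2), box(9), penalty(6, 50, 1)]
```

Defaulting the unused fields to zero mirrors the convention above that yi, zi, pi, and fi are zero for items of the other types.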
Perhaps the reader feels this is altogether too much mathematical machinery to
deal with something that is quite straightforward. However, each of the concepts
defined here must be dealt with somehow when breaking paragraphs into lines, and it is
important to give precise rules even for the comparatively simple job of setting straight
text. We shall see later that these primitive notions of boxes, glue, and penalties
will actually support a surprising variety of other line-breaking applications, so that a
careful attention to details will solve many other problems as a free bonus.
For the time being, it will be best to think of the simple application to straight
text material such as the typesetting of a paragraph in a newspaper or in a short story,
since this will help us internalize the abstract concepts represented by wi, yi, etc. A
typesetting system like TEX will put such an actual paragraph into the abstract form
we want in the following way:
(1) If the paragraph is to be indented, the first item x1 will be an empty box whose
width is the amount of indentation.
(2) Each word of the paragraph becomes a sequence of boxes for the characters of the
word, including punctuation marks that belong with that word. The widths wi
are determined by the fonts of type being used. Flagged penalty items are inserted
into these words wherever an acceptable hyphenation could be used to divide a
word at the end of a line. (Such hyphenation points do not need to be included
unless necessary, as we shall see later, but for the moment let us assume that all
of the permissible hyphenations have been specified.)
(3) There is glue between words, corresponding to the recommended spacing conven-
tions of the fonts of type in use. The glue might be different in different contexts;
for example, TEX will make the glue specifications following punctuation marks
slightly different from the normal interword glue.
(4) Explicit hyphens and dashes in the text will be followed by flagged penalty items
having width zero. This specifies a permissible line break after a hyphen or a
dash. Some style conventions also allow breaks before em-dashes, in which case
an unflagged width-zero penalty would precede the dash.
(5) At the very end of a paragraph, two items are appended so that the final line
will be treated properly. First comes a glue item x_{m-1} that specifies the white
space allowable at the right of the last line; then comes a penalty item xm with
pm = −∞ to force a break at the paragraph end. TEX ordinarily uses a ‘finishing
glue’ with w_{m-1} = 0, y_{m-1} = ∞ (actually 100000 points, which is finite but
large enough to behave like ∞), and z_{m-1} = 0; thus the normal space at the end
of a paragraph is zero but it can stretch a great deal. The net effect is that the
other spaces on the final line will shrink, if that line exceeds the desired measure;
otherwise the other spaces will remain essentially at their normal value (because
the finishing glue will do all the stretching necessary to fill up the end of the line).
More subtle choices of the finishing glue x_{m-1} will be discussed later.
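The five rules can be sketched in Python as follows. This is a simplified illustration rather than what TEX actually does: items are plain tuples `('box', w)`, `('glue', w, y, z)`, and `('penalty', w, p, f)`; the character widths come from a caller-supplied function `char_width`; the same glue is used after every word; optional hyphenation points are left out; and the flag on the final forced break is set to 0 as an assumption. All names here are hypothetical.

```python
INF = float("inf")

def paragraph_items(words, char_width, indent=18, space=(6, 3, 2),
                    hyphen_penalty=50):
    """Convert a list of words into box/glue/penalty items (rules 1 to 5)."""
    items = []
    if indent:
        items.append(("box", indent))                     # rule 1: indentation
    for n, word in enumerate(words):
        if n > 0:
            items.append(("glue",) + tuple(space))        # rule 3: interword glue
        for ch in word:
            items.append(("box", char_width(ch)))         # rule 2: one box per character
            if ch == "-":
                # rule 4: a flagged, zero-width penalty after an explicit hyphen
                items.append(("penalty", 0, hyphen_penalty, 1))
    items.append(("glue", 0, INF, 0))                     # rule 5: finishing glue
    items.append(("penalty", 0, -INF, 0))                 # rule 5: forced break (flag assumed 0)
    return items
```

For instance, `paragraph_items(["In", "olden"], width_of)` begins with the indentation box, the boxes for ‘I’ and ‘n’, an interword glue, and the box for ‘o’, in the same order as the item list of the Figure 1 paragraph, provided the hypothetical `width_of` returns the matching character widths.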
For example, let’s consider the paragraph of Figure 1, which is taken from Grimm’s
Fairy Tales.⁷ The five rules above convert the text into the following sequence of
601 items:
x1 = empty box for indentation        w1 = 18
x2 = box for ‘I’                      w2 = 6
x3 = box for ‘n’                      w3 = 10
x4 = glue for interword space         w4 = 6, y4 = 3, z4 = 2
x5 = box for ‘o’                      w5 = 9
...
x309 = box for ‘l’                    w309 =
x310 = box for ‘i’                    w310 =
x311 = box for ‘m’                    w311 = 15
x312 = box for ‘e’                    w312 =
x313 = box for ‘-’                    w313 = 6
x314 = penalty for explicit hyphen    w314 = 0, p314 = 50, f314 = 1
x315 = box for ‘t’                    w315 = 7
...
x592 = box for ‘y’                    w592 = 10
x593 = penalty for optional hyphen    w593 = 6, p593 = 50, f593 = 1
x594 = box for ‘t’                    w594 = 7
x595 = box for ‘h’                    w595 = 10
x596 = box for ‘i’                    w596 =
x597 = box for ‘n’                    w597 = 10
x598 = box for ‘g’                    w598 = 9
x599 = box for ‘.’                    w599 = 5
x600 = finishing glue                 w600 = 0, y600 = ∞, z600 = 0
x601 = forced break                   w601 = 0, p601 = −∞, f601 =
In this particular example, a penalty of 50 has been assessed for every line that ends
with a hyphen.
    In olden times when wishing still helped one, there
lived a king whose daughters were all beautiful; and
the youngest was so beautiful that the sun itself, which     −.881
has seen so much, was astonished whenever it shone in
her face. Close by the king’s castle lay a great dark
forest, and under an old lime-tree in the forest was a
well, and when the day was very warm, the king’s child
went out into the forest and sat down by the side of the
cool fountain; and when she was bored she took a              .965
golden ball, and threw it up on high and caught it; and
this ball was her favorite plaything.                         .001

Figure 1. An example paragraph that has been typeset by the ‘first-fit’ method. Small triangles
show permissible places to divide words with hyphens; the adjustment ratio for spaces appears
at the right of each line.


Optional hyphenation points have been indicated with triangles in Figure 1. It is
considered bad form to insert a hyphen unless at least two letters precede it and three
follow it; furthermore the syllable following a hyphen shouldn’t have a silent ‘e’, so
we do not admit a hyphenation like ‘sylla-ble’. Smooth reading also means that the
word fragment preceding a hyphen should be long enough that it can be pronounced
correctly, before the reader sees the completion of the word on the next line; thus, a
hyphenation like ‘pro-cess’ would be disturbing. This pronunciation rule accounts for
the fact that the second-last word of Figure 1 does not admit the potential hyphenation
‘fa-vorite’, since the fragment ‘fa-’ might well be the beginning of ‘fa-ther’ which is
pronounced quite differently.
The choice of proper hyphenation points is an important but difficult subject that
is beyond the scope of this paper. We shall not mention it further except to assume
that (a) such potential breakpoints are available to our line-breaking algorithm when
needed; (b) we prefer not to hyphenate when there is a way to avoid it without seriously
messing up the spacing.
The rules for breaking a paragraph into lines should be intuitively clear from this
example, but it is important to state them explicitly. We shall assume that every
paragraph ends with a forced break item xm (penalty −∞). A legal breakpoint in a
paragraph is a number b such that either (i) xb is a penalty item with pb < ∞, or (ii) xb
is a glue item and x_{b-1} is a box item. In other words, one can break at a penalty,
provided that the penalty isn’t ∞, or at glue, provided that the glue immediately
follows a box. These two cases are the only acceptable breakpoints. Note, for example,
that several glue items may appear consecutively, but it is possible to break only at
the first of them, and only if this one does not immediately follow a penalty item. A
penalty of ∞ can be inserted before glue to make it unbreakable.
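The two cases of this definition translate directly into a short test. The Python sketch below assumes, purely for illustration, that items are tuples `('box', w)`, `('glue', w, y, z)`, and `('penalty', w, p, f)`, and that positions are numbered from 1 as in the text; the function name `legal_breakpoints` is hypothetical.

```python
INF = float("inf")

def legal_breakpoints(items):
    """Return the 1-based positions b at which a line break is legal."""
    result = []
    for b, item in enumerate(items, start=1):
        kind = item[0]
        if kind == "penalty" and item[2] < INF:
            result.append(b)                  # case (i): a penalty that is not +infinity
        elif kind == "glue" and b > 1 and items[b - 2][0] == "box":
            result.append(b)                  # case (ii): glue immediately after a box
    return result
```

Glue that follows a penalty or another glue item is skipped, which matches the remark that only the first of several consecutive glue items can be a breakpoint.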
The job of line breaking consists of choosing legal breakpoints b1 < ... < bk, which
specify the ends of k lines into which the paragraph will be broken. Each penalty
item xi whose penalty pi is −∞ must be included among these breakpoints; thus, the
final breakpoint bk must be equal to m. For convenience we let b0 = 0, and we define
indices a1 < ... < ak to mark the beginning of the lines, as follows: The value of aj
is the smallest integer i between b_{j-1} and bj such that xi is a box item or a penalty
item with pi = −∞; if none of the xi in the range b_{j-1} < i < bj meet this criterion,
we let aj = bj. Then the jth line consists of all items xi for aj ≤ i < bj, plus item
x_{bj} if it is a penalty item. In other words we get the lines of the broken paragraph by
cutting it into pieces at the chosen breakpoints, then removing glue and penalty items
at the beginning of each resulting line.
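The cutting rule can be illustrated with a small Python sketch. It assumes items are tuples `('box', w)`, `('glue', w, y, z)`, and `('penalty', w, p, f)`, that positions are numbered from 1, and that `breaks` already holds legal breakpoints b1 < ... < bk ending at m; the name `split_lines` is hypothetical.

```python
INF = float("inf")

def split_lines(items, breaks):
    """Cut a paragraph at the chosen breakpoints and return the list of lines."""
    lines, prev = [], 0                          # prev plays the role of b_{j-1}; b_0 = 0
    for b in breaks:
        a = b                                    # default a_j = b_j when nothing qualifies
        for i in range(prev + 1, b):
            kind = items[i - 1][0]
            if kind == "box" or (kind == "penalty" and items[i - 1][2] == -INF):
                a = i                            # first item that may begin the line
                break
        line = [items[i - 1] for i in range(a, b)]   # items with a_j <= i < b_j
        if items[b - 1][0] == "penalty":
            line.append(items[b - 1])            # keep the penalty item at the break
        lines.append(line)
        prev = b
    return lines
```

Note that glue at a breakpoint disappears from both neighbouring lines: it is excluded from the line it ends, and it cannot begin the next one.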

DESIRABILITY CRITERIA

According to this definition of line breaking, there are 2^n ways to break a paragraph
into lines, if the paragraph has n legal breakpoints that aren’t forced. For example,
there are 129 legal breakpoints in the paragraph of Figure 1, not counting x601, so
it can be broken into lines in 2^129 ways, a number that exceeds 10^38. But of course
most of these choices are absurd, and we need to specify some criteria to separate
acceptable choices from the ridiculous ones. For this purpose we need to know (a) the
desired lengths of lines, and (b) the lengths of lines corresponding to each choice of
breakpoints, including the amount of stretchability and shrinkability that is present.
Then we can compare the desired lengths to the lengths actually obtained.
We shall assume that a list of desired lengths l1, l2, l3, ... is given; normally these
are all the same, but in general we might want lines of different lengths, as when fitting
text around an illustration. The actual length Lj of the jth line, after breakpoints have
been chosen as above, is computed in the following obvious way: We add together
the widths wi of all the box and glue items in the range aj ≤ i < bj, and we add w_{bj}
to this total if x_{bj} is a penalty item. The jth line also has a total stretchability Yj and
total shrinkability Zj, obtained by summing all of the yi and zi for glue items in the
range aj ≤ i < bj. Now we can compare the actual length Lj to the desired length lj
by seeing if there is enough stretchability or shrinkability to change Lj into lj; we
define the adjustment ratio rj of the jth line as follows:
If Lj = lj (a perfect fit), let rj = 0.
If Lj < lj (a short line), let rj = (lj − Lj)/Yj, assuming that Yj > 0; the value
of rj is undefined if Yj ≤ 0 in this case.
If Lj > lj (a long line), let rj = (lj − Lj)/Zj, assuming that Zj > 0; the value of rj
is undefined if Zj ≤ 0 in this case.
Thus, for example, rj = 1/3 if the total stretchability of line j is three times what would
be needed to expand the glue so that the line length would change from Lj to lj.
According to this definition of adjustment ratios, the jth line can be justified by
letting the width of all glue items xi on that line be

\[
\begin{cases}
w_i + r_j y_i, & \text{if } r_j \ge 0;\\
w_i + r_j z_i, & \text{if } r_j < 0.
\end{cases}
\]

For if we add up the total width of that line after such adjustments are made, we get
either Lj + rj Yj = lj or Lj + rj Zj = lj, depending on the sign of rj. This distributes
the necessary stretching or shrinking by amounts proportional to the individual glue
components yi or zi, as desired.
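The definitions of Lj, Yj, Zj, and rj, together with the justification rule, can be checked numerically with a short Python sketch. It assumes a line is a list of tuples `('box', w)`, `('glue', w, y, z)`, and `('penalty', w, p, f)` whose last item, if it is a penalty, is the one at the break; it also assumes all glue values are finite. The function names are hypothetical, and an undefined ratio is represented by `None`.

```python
def measure(line):
    """Return (L, Y, Z): natural length, total stretchability, total shrinkability."""
    L = sum(it[1] for it in line if it[0] in ("box", "glue"))
    if line and line[-1][0] == "penalty":
        L += line[-1][1]                 # width added when breaking at a penalty
    Y = sum(it[2] for it in line if it[0] == "glue")
    Z = sum(it[3] for it in line if it[0] == "glue")
    return L, Y, Z

def adjustment_ratio(L, Y, Z, desired):
    """Return the adjustment ratio, or None when it is undefined."""
    if L == desired:
        return 0.0                       # perfect fit
    if L < desired:
        return (desired - L) / Y if Y > 0 else None   # short line: stretch
    return (desired - L) / Z if Z > 0 else None       # long line: shrink

def glue_widths(line, r):
    """Widths of the glue items after justification with ratio r."""
    return [it[1] + r * (it[2] if r >= 0 else it[3])
            for it in line if it[0] == "glue"]
```

For a line with boxes of total width 43 and two glue items (6, 3, 2), the natural length is 55; a desired length of 58 gives r = 0.5 and glue widths of 7.5, so the justified line measures 43 + 15 = 58 as required.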
For example, the small numbers at the right of the individual lines in Figure 1 show
the values of rj in those lines. A negative ratio like −.881 in the third line means that
the spaces in that line are narrower than their ideal size; a fairly large positive ratio
like .965 in the third-last line indicates a very ‘loose’ fit.
Although there are 2^129 ways to break the paragraph of Figure 1 into lines, it turns
out that only 49 of these will result in breaks whose adjustment ratios rj do not
exceed 1 in absolute value; this means that the spaces between words after justification
will lie between wi − zi and wi + yi. Furthermore, only 30 of these 49 ways to make
‘nice’ breaks will do so without introducing hyphens. One of these ways is obtained by
moving ‘the’ from the eighth line down to the ninth.
Our main goal is to find a way to avoid choosing any breakpoints that lead to lines
in which the words are spaced very far apart, or in which they are very close together,
because such lines are distracting and harder to read.
We might therefore say that the line-breaking problem is to find breaks such that
|rj| ≤ 1 in each line, with the minimum number of hyphenations subject to this
condition. Such an approach was taken by Duncan et al.⁸ in the early 1960s, and
they obtained fairly good results. However, this criterion depends only on the values
wi − zi and wi + yi, not wi itself, so it does not use all the degrees of freedom present
in our data. Furthermore, such stringent conditions may not be possible to achieve; for
example, if each line of our example were to be 418 units wide, instead of the present
width of 421 units, there would be no way to set the text of Figure 1 without having at
least one very tight line (rj < −1) or at least one very loose line (rj > 1).

    In olden times when wishing still helped one, there
lived a king whose daughters were all beautiful; and
the youngest was so beautiful that the sun itself, which     −.881
has seen so much, was astonished whenever it shone            .444
in her face. Close by the king’s castle lay a great dark
forest, and under an old lime-tree in the forest was a
well, and when the day was very warm, the king’s child
went out into the forest and sat down by the side of
the cool fountain; and when she was bored she took a
golden ball, and threw it up on high and caught it;
and this ball was her favorite plaything.                     .001

Figure 2. The paragraph of Figure 1 when the ‘best-fit’ method has been used to find
successive breakpoints.

We can do a better job of line breaking if we deal with a continuously varying
criterion of quality, not simply the yes/no tests of whether |rj| ≤ 1 or not. Let us
therefore give a quantitative evaluation of the badness of the jth line by finding a
formula that is nearly zero when |rj| is small but grows rapidly when |rj| takes values
exceeding 1. Experience with TEX has shown that good results are obtained if we
define the badness of line j as follows:

\[
\beta_j = \begin{cases}
\infty, & \text{if } r_j \text{ is undefined or } r_j < -1;\\
100\,|r_j|^3, & \text{otherwise.}
\end{cases}
\]

Thus, for example, the individual lines of Figure 1 have badness ratings that are
approximately equal to 0, 7, 68, 18, 5, 0, 69, 72, 90, 49, 0, respectively. Note that a
line is considered to be ‘infinitely bad’ if rj < −1; this means that glue will never be
shrunk to less than wi − zi. However, values of rj > 1 are only finitely bad, so they
will be permitted if there is no better alternative.
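The badness formula is easy to evaluate directly. In the Python sketch below, an undefined adjustment ratio is represented by `None`, which is a convention of this illustration only; the function name `badness` is likewise hypothetical.

```python
INF = float("inf")

def badness(r):
    """Badness of a line with adjustment ratio r (None means 'undefined')."""
    if r is None or r < -1:
        return INF               # glue is never shrunk below its minimum
    return 100 * abs(r) ** 3     # nearly zero for small |r|, grows rapidly beyond 1
```

With the two ratios quoted for Figure 1, `badness(-0.881)` is about 68.4 and `badness(0.965)` is about 89.9, in agreement with the ratings 68 and 90 listed for the third and third-last lines.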
A slight improvement over the method used to produce Figure 1 leads to Figure 2.
Once again each line has been broken without looking ahead to the end of the paragraph
and without going back to reconsider previous choices, but this time each break was
chosen so as to minimize the ‘badness plus penalty’ of that line. In other words, when
choosing between alternative ways to end the jth line, given the ending of the previous
line, we obtain Figure 2 if we take the minimum possible value of βj + πj; here βj is
the badness as defined above, and πj is the amount of penalty p_{bj} if the jth line ends
at a penalty item, otherwise πj = 0. Figure 2 improves on Figure 1 by moving words
down from lines 4, 8, and 10 to the next line.
The method that produces Figure 1 might be called the ‘first-fit’ algorithm, and the
corresponding method for Figure 2 might be called the ‘best-fit’ algorithm. We have
seen that best-fit is superior to first-fit in this particular case, but other paragraphs can
be contrived in which first-fit finds a better solution; so a single example is not sufficient
to decide which method is preferable. In order to make an unbiased comparison of
the methods, we need to get some statistics on their ‘typical’ behavior. Therefore
300 experiments were performed, using the text of Figures 1 and 2, with line widths
ranging from 350 to 649 in unit steps; although the text for each experiment was the
same, the varying line widths made the problems quite different, since line-breaking
algorithms are quite sensitive to slight changes in the measurements. The ‘tightest’
and ‘loosest’ lines in each resulting paragraph were recorded, as well as the number of
hyphens introduced, and the comparisons came out as follows:

                         min rj    max rj    hyphens
first-fit < best-fit      69%       35%       12%
first-fit = best-fit      26%       50%       77%
first-fit > best-fit       5%       15%       11%

Thus, in 69% of the cases, the minimum adjustment ratio rj in the lines typeset
by first-fit was less than the corresponding value obtained by best-fit; the maximum
adjustment ratio in the first-fit lines was less than the maximum for best-fit about 35%
of the time; etc. We can summarize this data by saying that the first-fit method usually
typesets at least one line that is tighter than the tightest line set by best-fit, and it
also usually produces a line that is as loose or looser than the loosest line of best-fit.
The number of hyphens is about the same for both methods, although best-fit would
produce fewer if the penalty for hyphenation were increased. A more detailed study of
the experimental data shows that the superiority of best-fit is especially pronounced in
the cases where the lines are rather narrow.

    In olden times when wishing still helped one, there
lived a king whose daughters were all beautiful; and
the youngest was so beautiful that the sun itself, which     −.881
has seen so much, was astonished whenever it shone            .444
in her face. Close by the king’s castle lay a great dark
forest, and under an old lime-tree in the forest was
a well, and when the day was very warm, the king’s
child went out into the forest and sat down by the side
of the cool fountain; and when she was bored she took
a golden ball, and threw it up on high and caught it;
and this ball was her favorite plaything.                     .001

Figure 3. This is the ‘best possible’ way to break the lines in the paragraph of Figures 1
and 2, in the sense of fewest total ‘demerits’ as defined in the text.

We can actually do better than both of these methods by finding an ‘optimum’
way to choose the breakpoints. For example, Figure 3 shows how to improve on both
Figures 1 and 2 by making line 6 a bit looser, thereby avoiding a rather tight 7th line
and a fairly loose 10th line. This pattern of breakpoints was found by an algorithm
that will be discussed in detail below. It is globally optimum in the sense of having
fewest total ‘demerits’ over all choices of breakpoints, where the demerits assessed for
the j t h line are computed by the formula

Sj=
I (I +pj+nj)’+aj,
(I+pj)’-n;+aj,
(1 +Pj>’+aj,
if n j j O ;
if --o0<nj<o;
if nj = - m .
Here pj and nj are the badness rating and the penalty, as before; and aj is zero unless
both line j and the previous line ended on flagged penalty items, in which case aj is
the additional penalty assessed for consecutive hyphenated lines (e.g., 3000). We shall
say that we have found the best choice of breakpoints if we have minimized the sum
of Sj over all lines j.
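
The three cases of the demerits formula can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `demerits` and `total_demerits` and the argument names are ours, and the badness and penalty values are taken as already computed.

```python
import math

def demerits(badness, penalty, consecutive_flag_penalty=0.0):
    """Demerits S_j for one line, following the three-case formula.

    badness                  -- p_j, the badness rating of the line
    penalty                  -- n_j, the penalty at the chosen breakpoint
    consecutive_flag_penalty -- a_j, nonzero only when this line and the
                                previous one both end on flagged penalties
    """
    base = 1.0 + badness
    if penalty >= 0:
        return (base + penalty) ** 2 + consecutive_flag_penalty
    if penalty == -math.inf:
        # Forced break: the penalty itself is not counted.
        return base ** 2 + consecutive_flag_penalty
    # Finite negative penalty: a bonus that lowers the demerits.
    return base ** 2 - penalty ** 2 + consecutive_flag_penalty

def total_demerits(lines):
    """Sum of S_j over (badness, penalty[, a_j]) tuples, one per line."""
    return sum(demerits(*line) for line in lines)
```

For instance, `demerits(7, 50)` evaluates to 3364, which shows how strongly a hyphen penalty of 50 dominates a line of modest badness.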

The above formula for Sj is quite arbitrary, like our formula for pj, but it works well
in practice because it has the following desirable properties: (a) Minimizing the sum
of squares of badnesses not only tends to minimize the maximum badness per line, it
also provides secondary optimization; for example, when one particularly bad line is
inevitable, the other line breaks will also be optimized. (b) The demerit function Sj
increases as nj increases, except in the case nj = −∞ when we don’t need to consider
the penalty because such breaks are forced. (c) Adding 1 to pj instead of using the
badness pj by itself will minimize the total number of lines in cases where there are
breaks with approximately zero badness.
For example, the following table shows the respective demerits charged to the
individual lines of the paragraphs in Figures 1, 2, and 3:

           First fit    Best fit    Optimum fit
                   1           1              1
                  64          64             64
                4803        4803           4803
                 374          96             96
                  39          33             33
                   2           2           1274
                4958        4958             43
                5313          11            581
                8252           3            166
                2497         519              1
                   1           1              1
           ---------    --------    -----------
               26304       10491           7063

In the first-fit and best-fit methods, each line is likely to come out about as badly as
any other; but the optimum-fit method tends to have its bad cases near the beginning,
since there is less flexibility in the opening lines.
Figure 4 on the following page shows another comparison of the same three methods
on the same text, this time with a line width of 500 units. Note that the optimum
algorithm finds a solution that does not hyphenate any words, because of its ability
to ‘look ahead’; the other two methods, which proceed one line at a time, miss this
solution because they do not know that a slightly worse first line leads in this case to
fewer problems later on. The demerits per line in Figure 4 are

           First fit    Best fit    Optimum fit
                1734        1734           2357
                4692        4692              6
                3440        3440            938
                3066           9            212
                   3           1              1
                   1          22              2
                 276         210             27
                   5          24             10
                   1          10            476
                               1              1
           ---------    --------    -----------
               13218       10143           4030

In this example the 3440 demerits on the third line for ‘first fit’ and ‘best fit’ are
primarily due to the penalty of 50 for an inserted hyphen.
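
As a rough check on this claim, assume that no charge for consecutive hyphenated lines applies to that line (aj = 0) and take nj = 50; the first case of the formula for Sj can then be inverted to estimate the badness:

\[
(1 + p_j + 50)^2 = 3440
\;\Longrightarrow\;
p_j = \sqrt{3440} - 51 \approx 7.65,
\qquad
(1 + p_j)^2 \approx 8.65^2 \approx 75 .
\]

Under these assumptions the same line would be charged only about 75 demerits if the break carried no penalty.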

(a) In olden times when wishing still helped one, there lived a king
    whose daughters were all beautiful; and the youngest was so
    beautiful that the sun itself, which has seen so much, was aston-
    ished whenever it shone in her face. Close by the king’s castle lay
    a great dark forest, and under an old lime-tree in the forest was
    a well, and when the day was very warm, the king’s child went
    out into the forest and sat down by the side of the cool fountain;
    and when she was bored she took a golden ball, and threw it up
    on high and caught it; and this ball was her favorite plaything.

(b) In olden times when wishing still helped one, there lived a king
    whose daughters were all beautiful; and the youngest was so
    beautiful that the sun itself, which has seen so much, was aston-
    ished whenever it shone in her face. Close by the king’s castle
    lay a great dark forest, and under an old lime-tree in the forest
    was a well, and when the day was very warm, the king’s child
    went out into the forest and sat down by the side of the cool
    fountain; and when she was bored she took a golden ball, and
    threw it up on high and caught it; and this ball was her favorite
    plaything.

(c) In olden times when wishing still helped one, there lived a
    king whose daughters were all beautiful; and the youngest was
    so beautiful that the sun itself, which has seen so much, was
    astonished whenever it shone in her face. Close by the king’s
    castle lay a great dark forest, and under an old lime-tree in the
    forest was a well, and when the day was very warm, the king’s
    child went out into the forest and sat down by the side of the
    cool fountain; and when she was bored she took a golden ball,
    and threw it up on high and caught it; and this ball was her
    favorite plaything.

Figure 4. A somewhat wider setting of the same sample paragraph, by (a) the first-fit
method, (b) the best-fit method, and (c) the optimum-fit method. Notice the tight line
followed by a loose line at the beginning of examples (a) and (b), while no hyphenation
was needed in (c); on the other hand, (a) is one line shorter than (b) and (c).

The first-fit method found a way to set the paragraph of Figure 4 in only nine lines,
while the optimum-fit method yields ten. Publishers who prefer to save a little paper,
as long as the line breaks are fairly decent, might therefore prefer the first-fit solution
in spite of all its demerits. There are various ways to modify the specifications so that
the optimum-fit method will give more preference to short solutions; for example, the
stretchability of the glue on the final line could be decreased from its present huge
size to about the width of the line, thereby making the optimum algorithm prefer final
lines that are nearly full. We could also replace the constant ‘1’ in the definition of
demerits Sj by a variable parameter. The algorithm we shall describe below can in fact
be set up to produce the optimum solution having the minimum number of lines.
The text in these examples is quite straightforward, and we have been setting type
in reasonably wide columns; thus we have not been considering especially difficult or
unusual line-breaking problems. Yet we have seen that an optimizing algorithm can
produce noticeably better results even in such routine cases. The improved algorithm
will clearly be of significant value in more difficult situations, for example when
mathematical formulas are embedded in the text, or when the lines must be narrow as in
a newspaper.

    In the meantime it
    knocked a second
    time, and cried,
    “Princess, youngest
    princess, open the
    door for me. Do you
    not know what you
    said to me yesterday
    by the cool waters of
    the well? Princess,
    youngest princess,
    open the door for
    me!”

Figure 5. Here the best-fit method is unable to find a satisfactory way to
break the lines, with respect to justified setting, because the columns are
so narrow. For example, the third line contains only two spaces, and the
third-last line only one; these spaces would have to stretch considerably if
the lines were justified. The first line of this paragraph also illustrates the
‘sticking-out’ problem that arises in unjustified settings.

Anyone who is curious about the fate of the beautiful princess mentioned in Figures 1
through 4 can find the answer in Figure 6, which presents the whole story. The columns
in this example are unusually narrow, allowing only about 21 or 22 characters per
line; a width of about 35 characters is normal for newspapers, and magazines often
use columns about twice as wide as those in Figure 6. The line-at-a-time algorithms
cannot cope satisfactorily with such stringent restrictions, but Figure 6 shows that the
optimizing algorithm is able to break the text into reasonably equal lines.
Incidentally, our line-breaking criteria have been developed with justified text in
mind; but the algorithm has been used in Figure 6 to produce ragged right margins.
Another criterion of badness, which is based solely on the difference between the
desired length lj and the actual length Lj, should actually be used in order to get
the best breakpoints for ragged-right typesetting, and the space between words should
be allowed to stretch but not to shrink so that Lj never exceeds lj. Furthermore,
ragged-right typesetting should not allow words to ‘stick out’, i.e., to begin to the
right of where the following line ends (see the word ‘it’ in Figure 5). Thus, it turns
out that an algorithm intended for high quality line breaking in ragged-right formats
is actually a little bit harder to write than one for justified text, contrary to the
prevailing opinion that justification is more difficult. On the other hand, Figure 6
indicates that an algorithm designed for justification usually can be tuned to produce
adequate breakpoints when justification is suppressed.
The difficulties of setting narrow columns are illustrated in an interesting way by the
pattern of words
“Now, push your little golden plate nearer . . .”
that appears in the third-last paragraph of Figure 6. We don’t want to hyphenate any
of these words, for reasons stated earlier; and it turns out that all of the four-word
sequences containing the word ‘little’, namely
“Now, push your little
push your little golden
your little golden plate
little golden plate nearer

IN olden times when        the water. “Ah, old        delighted to see her       sitting by the well,
wishing still helped       water-splasher, is it      pretty plaything once      playing, my golden
one, there lived a king    you?” said she; “I         more, and she picked       ball fell into the
whose daughters were       am weeping for my          it up and ran away         water. And because
all beautiful; and the     golden ball, which has     with it. “Wait, wait,”     I cried so, the frog
youngest was so beau-      fallen into the well.”     said the frog. “Take       brought it out again
tiful that the sun it-     “Be quiet, and do not      me with you. I can’t       for me; and because
self, which has seen so    weep,” answered the        run as you can.” But       he so insisted, I prom-
much, was astonished       frog. “I can help you;     what did it avail him      ised him he should
whenever it shone in       but what will you give     to scream his croak,       be my companion, but
her face. Close by         me if I bring your         croak, after her, as       I never thought he
the king’s castle lay a    plaything up again?”       loudly as he could?        would be able to come
great dark forest, and     “Whatever you will         She did not listen to      out of his water. And
under an old lime-tree     have, dear frog,” said     it, but ran home and       now he is outside
in the forest was a        she; “my clothes, my       soon forgot the poor       there, and wants to
well, and when the         pearls and jewels, and     frog, who was forced       come in to see me.”
day was very warm,         even the golden crown      to go back into his        In the meantime
the king’s child went      that I am wearing.”        well again.                it knocked a sec-
out into the forest        The frog answered,         The next day when          ond time, and cried,
and sat down by            “I do not care for your    she had seated her-        “Princess, youngest
the side of the cool       clothes, your pearls       self at table with the     princess, open the
fountain; and when         and jewels, nor for        king and all the cour-     door for me. Do you
she was bored she          your golden crown;         tiers, and was eat-        not know what you
took a golden ball,        but if you will love       ing from her little        said to me yesterday
and threw it up on         me and let me be           golden plate, some-        by the cool waters
high and caught it;        your companion and         thing came creeping        of the well? Prin-
and this ball was her      play-fellow, and sit       splish splash, splish      cess, youngest prin-
favorite plaything.        by you at your little      splash, up the marble      cess, open the door
Now it so happened         table, and eat off your    staircase; and when        for me!”
that on one occasion       little golden plate,       it had got to the          Then said the king,
the princess’s golden      and drink out of your      top, it knocked at         “That which you have
ball did not fall into     little cup, and sleep in   the door and cried,        promised must you
the little hand that       your little bed-if you     “Princess, youngest        perform. Go and let
she was holding up         will promise me this       princess, open the         him in.” She went
for it, but on to the      I will go down below,      door for me.” She          and opened the door,
ground beyond, and         and bring you your         ran to see who was         and the frog hopped
it rolled straight into    golden ball up again.”     outside, but when          in and followed her,
the water. The king’s      “Oh yes,” said she,        she opened the door,       step by step, to her
daughter followed it       “I promise you all         there sat the frog         chair. There he sat
with her eyes, but         you wish, if you will      in front of it. Then       and cried, “Lift me
it vanished, and the       but bring me my ball       she slammed the door       up beside you.” She
well was deep, so          back again.” But she       to, in great haste,        delayed, until at last
deep that the bottom       thought, “How the          sat down to dinner         the king commanded
could not be seen. At      silly frog does talk!      again, and was quite       her to do it. Once the
this she began to cry,     All he does is sit in the  frightened. The king       frog was on the chair
and cried louder and       water with the other       saw plainly that her       he wanted to be on
louder, and could not      frogs, and croak. He       heart was beating vi-      the table, and when
be comforted. And          can be no companion        olently, and said, “My     he was on the table he
as she thus lamented       to any human being.”       child, what are you so     said, “Now, push your
someone said to her,       But the frog, when         afraid of? Is there per-   little golden plate
“What ails you, king’s     he had received this       chance a giant outside     nearer to me, that
daughter? You weep         promise, put his head      who wants to carry         we may eat together.”
so that even a stone       into the water and         you away?” “Ah, no,”       She did this, but it
would show pity.”          sank down; and in a        replied she. “It is no     was easy to see that
She looked round           short while he came        giant, it is a disgust-    she did not do it will-
to the side from           swimming up again          ing frog.”                 ingly. The frog en-
whence the voice           with the ball in his       “What does a frog          joyed what he ate, but
came, and saw a frog       mouth, and threw it        want with you?” “Ah,       almost every mouth-
stretching forth its       on the grass. The          dear father, yesterday     ful she took choked
big, ugly head from        king’s daughter was        as I was in the forest     her. At length he said,

are too long to fit in one line. Therefore the word ‘little’ will have to appear in a
line that contains only three words and two spaces, no matter what text precedes this
particular sequence.
The final paragraphs of the story present other difficulties, some of which involve
complex interactions spanning many lines of the text, making it impossible to find
breakpoints that would avoid occasional wide spacing if the text were justified. Figure 7
shows what happens when a portion of Figure 6 is, in fact, justified; this is the most
difficult part of the entire story, in which one of the lines in the optimum solution is
“I have eaten and          awoke them, a car-
am satisfied, now I        riage came driving
am tired; carry me         up with eight white
into your little room      horses, which had
and make your little       white ostrich feath-
silken bed ready, and      ers on their heads,
we will both lie down      and were harnessed
and go to sleep.”          with golden chains;
The king’s daugh-          and behind stood
ter began to cry, for      the young king’s ser-
she was afraid of the      vant Faithful Henry.
cold frog, which she       Faithful Henry had
did not like to touch,     been so unhappy
and which was now          when his master was
to sleep in her pretty,    changed into a frog,
clean little bed. But      that he had caused
the king grew angry        three iron bands to
and said, “He who          be laid round his
helped you when you        heart, lest it should
were in trouble ought      burst with grief and
not afterwards to be       sadness. The car-
despised by you.” So       riage was to conduct
she took hold of the       the young king into
frog with two fingers,     his kingdom. Faithful
carried him upstairs,      Henry helped them
and put him in a cor-      both in, and placed
ner. But when she was      himself behind again,
in bed he crept to her     and was full of joy
and said, “I am tired,     because of this de-
I want to sleep as well    liverance. And when
as you; lift me up or I    they had driven a part
will tell your father.”    of the way, the king’s
At this she was terri-     son heard a cracking
bly angry, and took        behind him as if some-
him up and threw him       thing had broken. So
with all her might         he turned round and
against the wall.          cried, “Henry, the
“Now, will you be          carriage is breaking.”
quiet, odious frog?”       “No, master, it is
said she. But when he      not the carriage. It
fell down he was no        is a band from my
frog but a king’s son      heart, that was put
with kind and beauti-      there in my great
ful eyes. He by her        pain when you were
father’s will was now      a frog and impris-
her dear companion         oned in the well.”
and husband. Then          Again and once again
he told her how he         while they were on
had been bewitched         their way something
by a wicked witch,         cracked, and each
and how no one could       time the king’s son
have delivered him         thought the carriage
from the well but          was breaking; but it
herself, and that to-      was only the bands
morrow they would          that were spring-
go together into his       ing from the heart
kingdom.                   of Faithful Henry
Then they went to          because his master
sleep, and next morn-      was set free and was
ing when the sun           so happy.

Figure 6. The tale of the Frog King, typeset with quite narrow lines and with ‘ragged right’
margins. The breakpoints were optimally chosen under the assumption that the lines would
be justified; a somewhat different criterion of optimality would have been more appropriate for
unjustified setting, yet the lines did turn out to be of approximately equal width. Quite a few
hyphenations were found to be desirable, since this increases the number of spaces per line and
aids justification, even though the penalty for hyphenation was increased from 50 to 5000 in
this example.

forced to stretch by the enormous factor 6.833. The only way to typeset that paragraph
without such wide spaces is to leave it unjustified (unless, of course, we change the
problem by altering the text or the line width or the minimum size of spaces).

FURTHER APPLICATIONS
Before we discuss the details of an optimizing algorithm, it is worthwhile to consider
more fully how the basic primitives of boxes, glue, and penalties allow us to solve a
wide variety of typesetting problems. Some of these applications are straightforward
extensions of the simple ideas used in Figures 1 to 4, while others seem at first to be
quite unrelated to the ordinary task of line breaking.

    and were harnessed
    with golden chains;
    and behind stood
    the young king’s ser-
    vant Faithful Henry.
    Faithful Henry had
    been so unhappy
    when his master was
    changed into a frog,

Figure 7. This portion of the story in Figure 6 is the most difficult to
handle, when we try to justify the second-last paragraph using such
narrow columns; even the optimum breakpoints result in wide spaces.

Combining paragraphs
If the desired line widths li are not all the same, we might want to typeset two
paragraphs with the second one starting in the list of line lengths where the first one leaves
off. This can be done simply by treating the two paragraphs as one, i.e., appending the
second to the first, assuming that each paragraph begins with an indentation and ends
with finishing glue and a forced break as mentioned above.

Patching
Suppose that a paragraph starts on page 100 of some book and continues on to
the next page, and suppose that we want to make a change to the first part of that
paragraph. We want to be sure that the last line of the new page 100 will end at the
right-hand margin just before the word that appears at the beginning of page 101, so
that page 101 doesn’t have to be redone. It is easy to specify this condition in terms
of our conventions, simply by forcing a line break (with penalty −∞) at the desired
place, and discarding the subsequent text. The ability of the optimum-fit algorithm
to ‘look ahead’ means that it will find a suitable way to patch page 100 whenever it
is possible to do so.
We can also force the altered part of the paragraph to have a certain number of
lines, k, by using the following trick: Set the desired length l_{k+1} of the (k + 1)st line
equal to some number θ that is different from the length of any other line. Then an
empty box of width θ that occurs between two forced-break penalty items will have to
be placed on line k + 1.

Punctuation in the margins
Some people prefer to have the right edge of their text look ‘solid’, by setting periods,
commas, and other punctuation marks (including inserted hyphens) in the right-hand
margin. For example, this practice is occasionally used in contemporary advertising.
It is easy to get inserted hyphens into the margin: We simply let the width of the
corresponding penalty item be zero. And it is almost as easy to do the same for periods
and other symbols, by putting every such character in a box of width zero and adding
the actual symbol width to the glue that follows. If no break occurs at this glue, the
accumulated width is the same as before; and if a break does occur, the line will be
justified as if the period or other symbol were not present.
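
The bookkeeping behind this trick can be illustrated with a small Python sketch. Everything named here is an assumption made for illustration: the `Box` and `Glue` tuples, the helper functions, the widths 30 and 3, and the use of glue(6, 3, 2) as the ordinary interword space.

```python
from typing import NamedTuple

class Box(NamedTuple):
    width: float
    text: str = ""

class Glue(NamedTuple):
    width: float
    stretch: float
    shrink: float

def hang_punctuation(word_width, mark, mark_width, space=Glue(6, 3, 2)):
    """Items for a word followed by a mark that may hang in the margin."""
    return [
        Box(word_width),   # the word itself
        Box(0, mark),      # the mark is typeset but counted as width zero
        # The mark's real width is moved into the glue that follows it.
        Glue(space.width + mark_width, space.stretch, space.shrink),
    ]

def natural_width(items, break_at_last_glue):
    """Sum of item widths; glue at which a break occurs is discarded."""
    if break_at_last_glue and isinstance(items[-1], Glue):
        items = items[:-1]
    return sum(item.width for item in items)
```

With a word of width 30 and a period of width 3, the unbroken sequence measures 30 + 0 + 9 = 39, exactly what word, period, and ordinary space would give; if the line breaks at the glue, only the 30 units of the word are counted and the period hangs in the margin.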

Avoiding ‘psychologically bad’ breaks


Since computers don’t know how to think, at least not yet, it is reasonable to wonder
if there aren’t some line breaks that a computer would choose but a human operator
might not, if they somehow don’t seem right. This problem does not arise very often
when straight text is being set, as in newspapers or novels, but it is quite common in
technical material. For example, it is psychologically bad to break before ‘x’ or ‘y’
in the sentence

A function of x is a rule that assigns a value y to every value of x.

A computer will have no qualms about breaking anywhere unless it is told not to; but a
human operator might well avoid bad breaks, perhaps even unconsciously.
Psychologically bad breaks are not easy to define; we just know they are bad. When
the eye journeys from the end of one line to the beginning of another, in the presence
of a bad break, the second word often seems like an anticlimax, or isolated from
its context. Imagine turning the page between the words ‘Chapter’ and ‘8’ in some
sentence; you might well think that the compositor of the book you are reading should
not have broken the text at such an illogical place.
During the first year of experience with TEX, the authors began to notice occasional
breaks that didn’t feel quite right, although the problem wasn’t felt to be severe enough
to warrant corrective action. Finally, however, it became difficult to justify our claim
that TEX has the world’s best line-breaking algorithm, when it would occasionally make
breaks that were semantically annoying; for example, the preliminary TEX manual6
has quite a few of these, and the first drafts of that manual were even worse.
As time went on, the authors grew more and more sensitive to psychologically bad
breaks, not only in the copy produced by TEX but also in other published literature,
and it became desirable to test the hypothesis that computers were really to blame.
Therefore a systematic investigation was made of the first 1000 line breaks in the ACM
Journal of 1960 (which was composed manually by a Monotype operator), compared
to the first 1000 line breaks in the ACM Journal of 1980 (which was typeset by one of
the best commercially available systems for mathematics, developed by Penta Systems
International). The final lines of paragraphs, and the lines preceding displays, were
not considered to be line breaks, since they are forced; only the texts of articles were
considered, not the bibliographies. A reader who wishes to try the same experiment
should find that the 1000th break in 1960 occurred on page 67, while in 1980 it occurred
on page 64. The results of this admittedly subjective procedure were a total of
    13 bad breaks in 1960,
    55 bad breaks in 1980.
In other words, there was more than a four-fold increase, from about 1% to a quite
noticeable 5.5%! Of course, this test is not absolutely conclusive, because the style of
articles in the ACM Journal has not remained constant, but it strongly suggests that
computer typesetting causes semantic degradation when it chooses breaks solely on the
basis of visual criteria.
Once this problem was identified, a systematic effort was made to purge all such
breaks from the second edition of Knuth’s book Seminumerical Algorithms, which
was the first large book to be typeset with TEX. It is quite easy to get the
line-breaking algorithm to avoid certain breaks by simply prefixing the glue item by a

penalty with pi = 999, say; then the bad break is chosen only in an emergency, when
there is no other good way to set the paragraph. It is possible to make the typist’s
job reasonably easy by reserving a special symbol (e.g., &) to be used instead of a
normal space between words whenever breaking is undesirable. Although this problem
has rarely been discussed in the literature, the authors subsequently discovered that
some typographers have a word for it: they call such spaces ‘auxiliary’. Thus there is
a growing awareness of the problem.
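
A keyboarding convention of this kind might be translated into box/glue/penalty items roughly as follows. This is a sketch under stated assumptions: the function `to_items`, the tuple layout, and the fixed `char_width` are hypothetical, while the penalty of 999 and the interword glue(6, 3, 2) are the values quoted in the text.

```python
import re

SPACE = ("glue", 6, 3, 2)               # ordinary interword glue
AUX_PENALTY = ("penalty", 0, 999, 0)    # discourages a break without forbidding it

def to_items(text, char_width=5):
    """Turn text with ' ' and '&' separators into box/glue/penalty items."""
    items = []
    # The capturing group makes re.split return the separators as well.
    for token in re.split(r"([ &])", text):
        if token == " ":
            items.append(SPACE)
        elif token == "&":
            # Auxiliary space: the glue is prefixed by a large penalty, so
            # the only legal break here is the penalty item itself.
            items.extend([AUX_PENALTY, SPACE])
        elif token:
            items.append(("box", char_width * len(token), token))
    return items
```

For example, `to_items("Theorem&A holds")` yields a box, a penalty of 999, glue, a box, glue, and a final box, so a break between ‘Theorem’ and ‘A’ costs 999 while the break before ‘holds’ is free.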
It may be useful to list the main kinds of contexts in which auxiliary spaces were
used in Seminumerical Algorithms, since that book ranges over a wide variety of
technical subjects. The following rules should prove to be helpful to compositors who are
keyboarding technical manuscripts into a computer.
1. Use auxiliary spaces in cross-references:
      Theorem&A    Algorithm&B    Chapter&3    Table&4    Programs E and&F
   Note that no & appears after ‘Programs’ in the last example, since it would be
   quite all right to have ‘E and F’ at the beginning of a line.
2. Use auxiliary spaces between a person’s forenames and between multiple surnames:
      Dr.&I.&J. Matrix    Luis&I. Trabb&Pardo    Peter Van&Emde&Boas
   A recent trend to avoid spaces altogether between initials may be largely a reaction
   against typical computer line-breaking algorithms! Note that it seems better to
   hyphenate a name than to break it between words; e.g., ‘Don-’ and ‘ald E. Knuth’
   is more tolerable than ‘Donald’ and ‘E. Knuth’. In a sense, rule 1 is a special
   case of rule 2, since we may regard ‘Theorem A’ as a name; another example is
   ‘register&X’.
3. Use auxiliary spaces for symbols in apposition with nouns:
      base&b    dimension&d    function&f(x)    string&s of length&l
   However, compare the last example with ‘string&s of length l&or more’.
4. Use auxiliary spaces for symbols in series:
      1,&2, or&3    a,&b, and&c    1,&2, ...,&n
5. Use auxiliary spaces for symbols as tightly-bound objects of prepositions:
      of&x    from 0 to&1    increase z by&1    in common with&m
   This does not apply with compound objects: For example, type ‘of u&and&v’.
6. Use auxiliary spaces to avoid breaking up mathematical phrases that are rendered
   in words:
      equals&n    less than&ε    mod&2    modulo&p^e    (given&X)
   Also type ‘If &is . . .’, ‘when x&grows’. Compare ‘is&15’ with ‘is 15&times the
   height’; and compare ‘for all large&n’ with ‘for all n&greater than&n0’.
7. Use auxiliary spaces when enumerating cases:
      (b)&Show that f(x) is (1)&continuous; (2)&bounded.

It would be nice to boil these seven rules down into one or two, and it would be even
nicer if the rules could be automated so that keyboarding could be done without them;
but subtle semantic considerations seem to be involved in many of these instances.
Most examples of psychologically bad breaks seem to occur when a single symbol or a
short group of symbols appears just before or after the break; one could do reasonably
well with an automatic scheme if it would associate large penalties with a break just
before a short non-word, and medium penalties with a break just after a short non-
word. Here ‘short non-word’ means a sequence of symbols that is not very long, yet long
enough to include instances like ‘exercise&15(b)’, ‘length&~2~”, ‘order&n/2’ followed by
punctuation marks; one should not simply consider patterns that have only one or two
symbols. On the other hand it is not so offensive to break before or after fairly long
sequences of symbols; e.g., ‘exercise 4.3.2-15’ needs no auxiliary space.
Many books on composition recommend against breaking just before the final word
of a paragraph, especially if that word is short; this can, of course, be done by using
an auxiliary space just before that last word, and the computer could insert this
automatically. Some books also give recommendations analogous to rule 2 above,
saying that compositors should try not to break lines in the middle of a person’s
name. But there is apparently only one book that addresses the other issues of
psychologically bad breaks, namely a nineteenth-century French manual by A. Frey, where
the following examples of undesirable breaks are mentioned (vol. 1, p. 110):

      Henri&IV    M.&Colin    1er&sept.    art.&25    20&fr.

It seems to be time to resurrect such old traditions of fine printing.


Recent experience of the authors indicates that it is not a substantial additional
burden to insert auxiliary spaces when entering a manuscript into a computer. The
careful use of such spaces may in fact lead to greater job satisfaction on the part of
the keyboard operator, since the quality of the output can be noticeably improved
with comparatively little work. It is comforting at times to know that the machine
needs your help.

Author lines
Most of the review notices published in Mathematical Reviews are signed with the
reviewer’s name and address, and this information is typeset flush right, i.e., at the
right-hand margin. If there is sufficient space to put such a name and address at the
right of the final line of the paragraph, the publishers can save space, and at the same
time the results look better because there are no strange gaps on the page. During
recent years the composition software used by the American Mathematical Society
was unable to do this operation, but the amount of money saved on paper made it
economical for them to pay someone to move the reviewer-name lines up by hand
wherever possible, applying scissors and (real) glue to the computer output.

    This is a case where the name and address fit in nicely
    with the review.          A. Reviewer (Ann Arbor, Mich.)
    But sometimes an extra line must be added.
                                        N. Bourbaki (Paris)

Figure 8. The MR problem.

Let us say that the ‘MR problem’ is to typeset the contents of a given box flush right
at the end of a given paragraph, with a space of at least w between the paragraph and
the box if they occur on the same line. This problem can be solved entirely in terms
of the box/glue/penalty primitives, as follows:

    (text of the given paragraph)
    penalty(0, ∞, 0)
    glue(0, 100000, 0)
    penalty(0, 50, 0)
    glue(w, 0, 0)
    box(0)
    penalty(0, ∞, 0)
    glue(0, 100000, 0)
    (the given box)
    penalty(0, −∞, 0)

The final penalty of −∞ forces the final line break with the given box flush right; the
two penalties of +∞ are used to inhibit breaking at the following glue items. Thus,
the above sequence reduces to two cases: whether or not to break at the penalty of 50.
If a break is taken there, the ‘glue(w, 0, 0)’ disappears, according to our rule that each
line begins with a box; the text of the paragraph preceding the penalty of 50 will be
followed by ‘glue(0, 100000, 0)’, which will stretch to fill the line as if the paragraph
had ended normally, and the given box on the final line will similarly be preceded by
‘glue(0, 100000, 0)’ to fill the gap at the left. On the other hand if no break occurs at
the penalty of 50, the net effect is to have the glues added all together, producing

    (text of the given paragraph)
    glue(w, 200000, 0)
    (the given box)

so that the space between the paragraph and the box is w or more. Whether the break is
chosen or not, the badness of the two final lines or the final line will be essentially zero,
because so much stretchability is present. Thus the relative cost differential separating
the two alternatives is almost entirely due to the penalty of 50. The optimum-fit
algorithm will choose the better alternative, based on the various possibilities it has
for setting the given paragraph; it might even make the given paragraph a little bit
tighter than its usual setting, if this works out best.

Ragged right margins


We observed in Figure 6 that an optimum line-breaking algorithm intended for
justified text does a fairly good job at making lines of nearly equal length even when
the lines aren’t justified afterwards. However, it is not hard to construct examples
in which the justification-oriented method makes bad decisions, since the amount of
deviation in line width is weighted by the amount of stretchability or shrinkability
that is present. A line containing many words, and therefore containing many spaces
between words, will not be considered problematical by the justification criteria even
if it is rather short or rather long, because there is enough glue present to stretch or
shrink gracefully to the correct size. Conversely, when there are few words in a line, the
algorithm will take pains to avoid comparatively small deviations. This is illustrated
in Figure 5, which actually reads better than the corresponding paragraph in Figure 6
(except for the word that sticks out on the first line); hyphens were inserted into the
paragraph of Figure 6 in order to create more interword space for justification.
Although the box/glue/penalty model appears at first glance to be oriented solely to
the problem of justified text, we shall now see that it is powerful enough to be adapted
to the analogous problem of unjustified typesetting: If the spaces between words are
handled in the right way, we can make things work out so that each line has the same
amount of stretchability, no matter how many words are on that line. The idea is to
let spaces between words be represented by the sequence
glue(0,18,0)
penalty(0,0,0)
glue(6,-18,0)
instead of the ‘glue(6,3,2)’ we used for justified typesetting. We may assume that there
is no break at the ‘glue(0,18,0)’ in the sequence, because it will always be at least as
good for the algorithm to break at the ‘penalty(0,0,0)’, when 18 units of stretchability
are present. If a break occurs at the penalty, there will be a stretchability of 18 units
on the line, and the ‘glue(6,-18,0)’ will be discarded after the break so that the next
line will begin flush left. On the other hand if no break occurs, the net effect is to have
glue(6,0,0), representing a normal space with no stretching or shrinking.
Note that the stretchability of -18 in the second glue item has no physical signifi-
cance, but it nicely cancels out the stretchability of +18 in the first glue item. Negative
stretchability has several interesting applications, so the reader should study this
example carefully before proceeding to the more elaborate constructions below.
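The cancellation can be checked mechanically. The following Python sketch is only an illustration: the tuple layout and the helper names net_glue and line_end_glue are assumptions made for this example, not notation from the paper. It adds glue componentwise when no break is taken, and keeps only the glue that precedes the break when one is taken, since the items after the break are discarded up to the next box.

```python
# Items are (kind, a, b, c): glue(width, stretch, shrink) or penalty(width, cost, flag).
# This is the ragged-right interword sequence quoted in the text.
SPACE = [("glue", 0, 18, 0), ("penalty", 0, 0, 0), ("glue", 6, -18, 0)]


def net_glue(items):
    """Componentwise sum of all glue items: the effect when no break is taken."""
    glues = [item for item in items if item[0] == "glue"]
    return tuple(sum(g[k] for g in glues) for k in (1, 2, 3))


def line_end_glue(items, index):
    """Glue left at the end of the line when the break is taken at items[index].

    Everything after the break is dropped here, because the sequence
    contains no box and discardable items vanish up to the next box.
    """
    return net_glue(items[:index])


no_break = net_glue(SPACE)          # (6, 0, 0): an ordinary fixed space
broken = line_end_glue(SPACE, 1)    # (0, 18, 0): 18 units of stretch end the line
```

The same two helpers can be applied to the other sequences in this section by changing the list of items.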
Optional hyphenations in unjustified text can be specified in a similar way; instead
of using ‘penalty(6,50,1)’ for an optional 6-unit hyphen having a penalty of 50, we
can use the sequence

penalty(0,∞,0)
glue(0,18,0)
penalty(6,500,1)
glue(0,-18,0).

The penalty has been increased here from 50 to 500, since hyphenations are not as
desirable in unjustified text. After the breakpoints have been chosen using the above
sequences for spaces and for optional hyphens, the individual lines should not actually
be justified, since a hyphen inserted by the ‘penalty(6,500,1)’ would otherwise appear
at the right margin.
It is not difficult to prove that this approach to ragged-right typesetting will never
lead to words that ‘stick out’ in the sense mentioned above; the total demerits are
reduced whenever a word that sticks out is moved to the following line.

Centered text
Occasionally we want to take some text that is too long to fit on one line and break
it into approximately equal-size parts, centering the parts on individual lines. This is
most often done when setting titles or captions, but it can also be applied to the text
of a paragraph, as shown in Figure 9.

In olden times when wishing still helped one, there lived a king
whose daughters were all beautiful; and the youngest was
so beautiful that the sun itself, which has seen so much, was
astonished whenever it shone in her face. Close by the king’s castle
lay a great dark forest, and under an old lime-tree in the forest was
a well, and when the day was very warm, the king’s child went
out into the forest and sat down by the side of the cool fountain;
and when she was bored she took a golden ball, and threw it up
on high and caught it; and this ball was her favorite plaything.
Figure 9. ‘Ragged-centered’ text: The optimum-fit algorithm will produce special effects like this,
when appropriate combinations of box/glue/penalty items are used for the spaces between words.

Boxes, glue, and penalties can perform this operation, in the following way: (a) At
the beginning of the paragraph, use ‘glue(0,18,0)’ instead of an indentation. (b) For
each space between words in the paragraph, use the sequence

glue(0,18,0)
penalty(0,0,0)
glue(6,-36,0)
box(0)
penalty(0,∞,0)
glue(0,18,0).

(c) End the paragraph with the sequence

glue(0,18,0)
penalty(0,-∞,0).

The tricky part of this method is part (b), which ensures that an optional break
at the ‘penalty(0,0,0)’ puts stretchability of 18 units at the end of one line and at
the beginning of the next. If no break occurs, the net effect will be glue(0,18,0) +
glue(6,-36,0) + glue(0,18,0) = glue(6,0,0), a fixed space of 6 units. The ‘box(0)’
contains no text and occupies no space; its function is to keep the ‘glue(0,18,0)’ from
disappearing at the beginning of a line. The ‘penalty(0,0,0)’ item could be replaced
by other penalties, to represent breakpoints that are more or less desirable. However,
this technique cannot be used together with optional hyphenation, since our box/glue/
penalty model is incapable of inserting optional hyphens anywhere except at the right
margin when lines are justified.
The construction used here essentially minimizes the maximum gap between the
margins and the text on any line; and subject to that minimum it essentially minimizes
the maximum gap on the remaining lines; and so forth. The reason is that our defini-
tions of ‘badness’ and ‘demerits’ reduce in this case so that the sum of demerits for
any choice of breakpoints is approximately proportional to the sum of the sixth powers
of the individual gaps.
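The sixth-power behaviour can be illustrated numerically. The sketch below assumes the definitions given earlier in the paper, namely badness 100|r|^3 and demerits (1 + badness + penalty)^2 for a nonnegative penalty, and it assumes that each centered line carries 36 units of stretchability (18 at each end); the function name demerits and the sample gaps are choices made for this example only.

```python
def demerits(gap, stretch=36.0, penalty=0.0):
    """Demerits of a centered line whose total white space is `gap` units."""
    r = gap / stretch                    # adjustment ratio of the line
    badness = 100.0 * abs(r) ** 3
    return (1.0 + badness + penalty) ** 2


# For large gaps the demerits approach 10^4 * (gap / 36)^6.
limit = 1e4 / 36.0 ** 6
ratios = [demerits(g) / g ** 6 for g in (36.0, 72.0, 144.0)]

# Splitting 72 units of white space evenly over two lines costs far less
# than an uneven split, which is why the largest gap tends to be minimized.
even = demerits(36.0) + demerits(36.0)
uneven = demerits(10.0) + demerits(62.0)
```

Because the sixth power grows so quickly, the line with the largest gap dominates the sum, so minimizing the sum behaves almost like minimizing the maximum gap.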

ALGOL-like languages
One of the most difficult tasks in technical typesetting is to get computer programs
to look right. In addition to the complications of mathematical formulas and a variety
of typestyles and spacing conventions, it is important to indent the lines suitably
in order to display the program structure. Sometimes a single statement must be
broken across several lines; sometimes a number of short statements should be grouped
together on a single line. Authors who attempt to publish programs in journals that
are not accustomed to computer science material soon discover that very few printing
establishments have the expertise necessary to handle ALGOL-like languages in a
satisfactory way.

const n = 10000;
var sieve, primes :
set of 2..n;
next, j : integer;
begin { initialize }
sieve := [2..n];
primes := [ ];
next := 2;
repeat { find next
prime }
while not (next in
sieve) do
next :=
succ(next);
primes :=
primes + [next];
j := next;
while j <= n do
{ eliminate }
begin sieve :=
sieve - [j];
j := j + next
end
until sieve = [ ]
end.

const n = 10000;
var sieve, primes : set of 2..n;
next, j : integer;
begin { initialize }
sieve := [2..n]; primes := [ ]; next := 2;
repeat { find next prime }
while not (next in sieve) do next := succ(next);
primes := primes + [next]; j := next;
while j <= n do { eliminate }
begin sieve := sieve - [j]; j := j + next
end
until sieve = [ ]
end.

Figure 10. These two settings of a sample PASCAL program were made from identical
input specifications in the box/glue/penalty model; in the first case the lines were set 100
points wide, and in the second case the width was 250 points. All of the line-breaking and
indentation was produced automatically by the optimum-fit algorithm, which has no specific
knowledge of PASCAL. Compilation of the PASCAL source code into boxes, glue, and
penalties was done mechanically.

Once again, the concepts of boxes, glue, and penalties come to the rescue: It turns out
that our line-breaking methods developed for ordinary text can be used without change
to do the typesetting of programs in ALGOL-like languages. For example, Figure 10
shows a typical program taken from the PASCAL manual¹¹ that has been typeset
assuming two different column widths. Although these two settings of the program do
not look very much alike, they both were made from exactly the same input, specified
in terms of boxes, glue, and penalties; the only difference was the specification of line
width. (The input text in this example was prepared by a computer program called
BLAISE¹², which will translate any PASCAL source text into a TEX file that can be
incorporated within other documents.)
The box/glue/penalty specifications that lead to Figure 10 involve constructions
similar to those we have seen above, but with some new twists; it will be sufficient for
our purposes merely to sketch the ideas instead of dwelling on the details. One key
point is that the breaks are chosen by the minimum-demerits criteria we have been
discussing, but the lines are not justified afterwards (i.e., the glue does not actually
stretch or shrink). The reason is that relations and assignment statements are processed
by TEX’s normal ‘math mode’, which allows line breaks to occur in various places but
without any special constructions particular to this application, so that justification
would have the undesirable effect of putting all such breaks at the right margin. The
fact that justification is suppressed actually turns out to be an advantage in this case,
since it means that we can insert glue stretching wherever we like, within a line, if it
affects the ‘badness’ formula in a desirable way.
Each line in the wider setting of Figure 10 is actually a ‘paragraph’ by itself, so it
is only the narrower setting that shows the line-breaking mechanism at work. Every
‘paragraph’ has a specified amount of indentation for its first line, corresponding to its
position in the program, as a given number t of ‘tab’ units; the paragraph is also given
a hanging indentation of t + 2 tab units. This means that all lines after the first are
required to be two tabs narrower than the first line, and they are shifted two tabs to
the right with respect to that line. In some cases (e.g., those lines beginning with ‘var’
or ‘while’) the offset is three tabs instead of two.
The paragraph begins with ‘glue(0,100000,0)’, which has the effect of providing
enough stretchability that the line-breaking algorithm will not wince too much at
breaks that do not square perfectly with the right margin, at least not on the first line.
Special breaks are inserted at places where TEX would not normally break in math
mode; e.g., the sequence

penalty(0,∞,0)
glue(0,100000,0)
penalty(0,50,0)
glue(0,-100000,0)
box(0)
penalty(0,∞,0)
glue(0,100000,0)

has been inserted just before ‘primes’ in the var declaration. This sequence allows
a break with penalty 50 to the next line, which begins with plenty of stretchability.
A similar construction is used between assignment statements, for example between
‘sieve := [2..n];’ and ‘primes := [ ]’, where the sequence is

penalty(0,∞,0)
glue(0,100000,0)
penalty(0,0,0)
glue(6+2w,-100000,0)
box(0)
penalty(0,∞,0)
glue(-2w,100000,0);

here w is the width of a tab unit. If a break occurs, the following line begins with
‘glue(-2w,100000,0)’, which undoes the effect of the hanging indentation and
effectively restores the state at the beginning of a paragraph. If no break occurs, the net
effect is ‘glue(6,100000,0)’, a normal space.
No automatic system can hope to find the best breaks in programs, since an under-
standing of the semantics will indicate that certain breaks make the program clearer
and reveal its symmetries better. However, dozens of experiments on a wide variety
of PASCAL source texts have shown that this approach is surprisingly effective; fewer
than 1% of the line-breaking decisions have been overridden by authors of the
programs in order to provide additional clarity.

A complex index
The final application of line breaking that we shall study is the most difficult one
that has so far been encountered by the authors; it was solved only after acquiring more
than two years of experience with more straightforward line-breaking tasks, since the
full power of the box/glue/penalty primitives was not immediately apparent. The task
is illustrated in Figure 11, which shows excerpts from a ‘Key Index’ in Mathematical
Reviews. Such an index now appears at the end of each volume, together with an
‘Author Index’ that has a similar format.
As in Figure 10, the examples in Figure 11 were generated by the same source input
that was typeset using different line widths, in order to indicate the various possibilities
of breakpoints. Each entry in the index consists of two parts, the name part and the
reference part, both of which might be too long to fit on a single line. If line breaks
occur in the name part, the individual lines are to be set with a ragged right margin,
but breaks in the reference part are to produce lines with a ragged left margin. The
two parts are separated by leaders, a row of dots that expands to fill the space between
them; leaders are introduced by a slight generalization of glue that typesets copies
of a given box into a given space, instead of leaving that space blank. A hanging
indentation is applied to all lines but the first, so that the first line of each entry is
readily identifiable. One of the goals in breaking such entries is to minimize the white
space that appears in ragged-right or ragged-left lines. A subsidiary goal is to minimize
the number of lines that contain the reference part; for example, if it is possible to fit
all of the references on one line, the line-breaking algorithm should do so. The latter
event might mean that a break occurs after the leaders, with the references starting
on a new line; in such a case the leaders should stop a fixed distance w1 from the right
margin. Furthermore, the ragged-right lines should all be at least a fixed distance w2
from the right margin, so that there is no chance of confusing part of the name with
part of the reference material. The individual boxes to be replicated in the leaders
are w3 units wide.

ACM Symposium on Principles of Programming
   Languages, Third (Atlanta, Ga., 1976), selected
   papers ..................................... *1858
ACM Symposium on Theory of Computing, Eighth
   Annual (Hershey, Pa., 1976) ........ 1879, 4813,
   5414, 6918, 6936, 6937, 6946, 6951, 6970, 7619,
   9605, 10148, 11676, 11687, 11692, 11710, 13869
Software .................................. See *1858

ACM Symposium on Principles of
   Programming Languages, Third
   (Atlanta, Ga., 1976), selected papers
   ................................. *1858
ACM Symposium on Theory of
   Computing, Eighth Annual
   (Hershey, Pa., 1976) ..........
   1879, 4813, 5414, 6918, 6936, 6937,
   6946, 6951, 6970, 7619, 9605, 10148,
   11676, 11687, 11692, 11710, 13869
Software ...................... See *1858

ACM Symposium
   on Principles of
   Programming
   Languages, Third
   (Atlanta, Ga., 1976),
   selected papers .... *1858
ACM Symposium on
   Theory of Computing,
   Eighth Annual
   (Hershey, Pa., 1976)
   ........ 1879, 4813, 5414,
   6918, 6936, 6937, 6946,
   6951, 6970, 7619, 9605,
   10148, 11676, 11687,
   11692, 11710, 13869
Software ......... See *1858

Figure 11. These three extracts from a ‘Key Index’ were all typeset from identical input,
with respective column widths of 225 points, 175 points, and 125 points. Note the combination
of ragged right and ragged left setting, and the ‘dot leaders’.

The ground rules are illustrated in Figure 11, where there is a hanging indentation
of 27 units, and w1 = 45, w2 = 9, w3 = 7.2; the digits are 9 units wide, and the
respective column widths are 405 units, 315 units, and 225 units. The entry for ‘Theory
of Computing’ shows three possibilities for the leader dots: They can share a line with
the end of the name part and the beginning of the reference part, or they can end a
line before the reference part or begin a line after the name part.
Here is how all this can be encoded with boxes, glue, and penalties: (a) Each blank
space in the name part is represented by the sequence

penalty(0,∞,0)
glue(w2,18,0)
penalty(0,0,0)
glue(6-w2,-18,2)

which yields ragged right margins and spaces that can shrink from 6 units to 4 units
if necessary. (b) The transition between name part and reference part is represented
by sequence (a) followed by

box(0)
penalty(0,∞,0)
leaders(3w3,100000,3w3)
glue(w1,0,0)
penalty(0,0,0)
glue(-w1,-18,0)
box(0)
penalty(0,∞,0)
glue(0,18,0).

(c) Each blank space in the reference part is represented by the sequence

penalty(0,999,0)
glue(6,-18,2)
box(0)
penalty(0,∞,0)
glue(0,18,0),
which yields ragged left margins and 6-unit to 4-unit spaces.
Parts (a) and (c) of this construction are analogous to things we have seen before;
the 999-point penalties in (c) tend to minimize the total number of lines occupied by
the reference part. The most interesting aspect of this construction is the transition
sequence (b), where there are four possibilities: If no line breaks occur in (b), the net
result is

(name part) glue(6,0,2) (leaders) (reference part),

which allows leader dots to appear between the name and reference parts on the current
line. If a line break occurs before the leaders, the net result is

(name part) glue(6,0,2)


(leaders) (reference part),

so that we have a break essentially like that after a blank space in the name part,
and the dot leaders begin the following line. If a line break occurs after the leaders,
the net result is
(name part) glue(6,0,2) (leaders) glue(w1,0,0)
glue(0,18,0) (reference part),
so that we have a break essentially like that after a blank space in the reference part but
without the penalty of 999; the leaders end w1 units from the right margin. Finally,
if breaks occur both before and after the leaders in (b), we have a situation that always
has more demerits than the alternative of breaking only before the leaders.
When the choice of breakpoints leaves room for at least 3w3 units of leaders, we
are sure to have at least two dots, but we might not have three dots since leader dots
on different lines are aligned with each other. The glue in other blank spaces on the
line with the leaders will shrink if there is less than 3w3 of space for the leaders, and
this tends to make it more likely that the leader dots will not disappear altogether;
however, in the worst case the space for leaders will shrink to zero, so there might
not be any dots visible. It would be possible to ensure that all the leaders contain at
least two dots, by simply setting the shrink component of the leader item in (b) to
zero. This would improve the appearance of the resulting output; but unfortunately
it would also increase the length of the author indexes by about 15 per cent, and such
an expense would probably be prohibitive.
A preliminary version of this construction has been used with TEX to prepare the
indexes of Mathematical Reviews since November, 1979. However, the items ‘box(0)
penalty(0,∞,0)’ were left out of (b), for compatibility with earlier indexes prepared by
other typesetting software; this means that the leaders disappear completely whenever
a break occurs just before them, and the resulting indexes have unfortunate gaps of
white space that spoil their appearance.

AN ALGEBRAIC APPROACH
The examples we have just seen show that boxes, glue, and penalties are quite versatile
primitives that allow a user to obtain a wide variety of effects without extending the
basic operations needed for ordinary typesetting. However, some of the constructions
may have seemed like ‘magic’; they work, but it isn’t clear how they were ever conceived
in the first place. We shall now study a fairly systematic way to deal with these
primitives in order to assess their full potentiality; this brief discussion is independent
of the remainder of the paper and can be omitted.
In the first place it is clear that

box(w) box(w’) = box(w + w’),


if we ignore the contents of the boxes and consider only the widths; only the widths
enter into the line-breaking criteria. This formula says that any two consecutive boxes
can be replaced by a single box without affecting the choice of breakpoints, since breaks
do not occur at box items. Similarly it is easy to verify that

glue(w,y,z) glue(w’,y’,z’) = glue(w+w’,y+y’,z+z’),


since there will be no break at glue(w’,y’,z’), and since a break at glue(w,y,z) is
equivalent to a break at glue(w+w’,y+y’,z+z’).
Under certain circumstances we can also combine two adjacent penalty items into a
single one; for example, if -∞ < p, p’ < +∞ we have

penalty(w,p,f) penalty(w,p’,f) = penalty(w, min(p,p’),f)

with respect to any optimal choice of breakpoints, since there are fewer demerits asso-
ciated with the smaller penalty. However, it is not always possible to replace the general
sequence ‘penalty(w,p, f) penalty(w’,p’,f’)’ by a single penalty item.
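The three combination rules can be expressed as a small rewriting step. The following Python sketch is an illustration under assumed conventions: the tuple encoding of items and the function names combine and simplify are inventions of this example, and only the rules stated above are implemented (two boxes, two glues, or two finite penalties sharing the same w and f).

```python
import math

# Items: ("box", w), ("glue", w, y, z), ("penalty", w, p, f).


def combine(a, b):
    """Merge two adjacent items into one equivalent item, or return None
    when none of the three rules quoted above applies."""
    if a[0] == "box" and b[0] == "box":
        return ("box", a[1] + b[1])
    if a[0] == "glue" and b[0] == "glue":
        return ("glue", a[1] + b[1], a[2] + b[2], a[3] + b[3])
    if a[0] == "penalty" and b[0] == "penalty":
        same_shape = a[1] == b[1] and a[3] == b[3]
        finite = all(-math.inf < item[2] < math.inf for item in (a, b))
        if same_shape and finite:
            return ("penalty", a[1], min(a[2], b[2]), a[3])
    return None


def simplify(items):
    """Merge neighbouring items from left to right wherever a rule applies."""
    out = []
    for item in items:
        merged = combine(out[-1], item) if out else None
        if merged is None:
            out.append(item)
        else:
            out[-1] = merged
    return out


example = simplify([
    ("box", 3), ("box", 4),
    ("glue", 0, 18, 0), ("glue", 6, -18, 0),
    ("penalty", 0, 50, 0), ("penalty", 0, 0, 0),
])
# example == [("box", 7), ("glue", 6, 0, 0), ("penalty", 0, 0, 0)]
```

Penalties of plus or minus infinity are deliberately left alone, since the text states the penalty rule only for finite values.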
We can assume without loss of generality that all box items are immediately followed
by a sequence of the form ‘penalty(0,∞,0) glue(w,y,z)’. For if the box is followed by
another box, we can combine the two; if it is followed by a penalty item with p < ∞,
we can insert ‘penalty(0,∞,0) glue(0,0,0)’; if it is followed by ‘penalty(w,∞,f)’ we can
assume that w = f = 0 and that the following item is glue; and if the box is followed
by glue, we can insert ‘penalty(0,∞,0) glue(0,0,0) penalty(0,0,0)’. Furthermore we can
delete any penalty item with p = ∞ if it is not immediately preceded by a box item.
Thus, any sequence of box/glue/penalty items can be converted into a ‘normal form’,
where each box is followed by a penalty of ∞, each penalty is followed by glue, and
each glue is either followed by a penalty < ∞ or by a box. We assume that there is
only one penalty -∞, and that it is the final item, since a forced line break effectively
separates a longer sequence into independent parts. It follows that the normal-form
sequences can be written
X1 X2 ... Xn penalty(w,-∞,f)

where each Xi is a sequence of items having the form


box(w) penalty(0,∞,0) glue(w’,y,z)
or the form

penalty(v,p,f) glue(w,y,z).


Let us use the notation bpg(w+w’,y,z) for the first of these two forms, noting
that it is a function of w+w’ rather than of w and w’ separately; and let us write
pg(v,p,f,w,y,z) for X’s of the second form. We can assume that the sequence of X’s
contains no two bpg’s in a row, since

bpg(w,y,z) bpg(w’,y’,z’) = bpg(w+w’,y+y’,z+z’).


Familiarity with this algebra of boxes, glue, and penalties makes it a fairly simple
matter to invent constructions for special applications like those listed above, whenever
such constructions are possible. For example, let us consider a generalization of the
problems arising in ragged-right, ragged-left, and ragged-centered text: We wish to
specify an optional break between words such that if no break occurs we will have
the sequence
(end of text1) glue(w1,y1,z1) (beginning of text2)
on one line, while if a break does occur we will have

(end of text1) glue(w2,y2,z2) penalty(w4,p,f)
glue(w3,y3,z3) (beginning of text2)
on two lines. A consideration of normal forms shows that the most general thing we
can do is to insert the sequence

bpg(w,y,z) pg(w4,p,f,w’,y’,z’) bpg(w”,y”,z”)

between text1 and text2, where no additional text is associated with the two inserted
bpg’s. Our job reduces therefore to determining appropriate values of w, y, z, w’, y’, z’,
w”, y”, z”, and these can be obtained immediately by solving the equations
w + w’ + w” = w1, y + y’ + y” = y1, z + z’ + z” = z1;
w = w2, y = y2, z = z2;
w” = w3, y” = y3, z” = z3.
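These equations are easy to solve by subtraction, and the result can be compared with the constructions given earlier. In the following Python sketch the function name optional_break and its parameter names are hypothetical; each glue is a (width, stretch, shrink) triple, and the middle glue is obtained as the no-break glue minus the glue wanted at the end of the first line and at the start of the second.

```python
def optional_break(no_break, line_end, line_start,
                   pen_width=0, pen_cost=0, pen_flag=0):
    """Return the three inserted pieces as tuples:
    the first bpg glue, the pg item (penalty fields followed by its glue),
    and the last bpg glue."""
    first = tuple(line_end)      # glue wanted at the end of the broken line
    last = tuple(line_start)     # glue wanted at the start of the next line
    # The three glues must add up to the no-break glue, component by component.
    middle = tuple(n - a - b for n, a, b in zip(no_break, first, last))
    return first, (pen_width, pen_cost, pen_flag) + middle, last


# Ragged right: a 6-unit space if unbroken, 18 units of stretch at line end.
ragged_right = optional_break((6, 0, 0), (0, 18, 0), (0, 0, 0))

# Ragged centered: 18 units of stretch on both sides of the break.
centered = optional_break((6, 0, 0), (0, 18, 0), (0, 18, 0))
```

The middle glue comes out as (6, -18, 0) in the first case and (6, -36, 0) in the second, which agrees with the sequences used above for ragged-right and centered text.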

Once a construction has been found in this way, it can be simplified by undoing
the process we have used to derive normal forms and by using other properties of
box/glue/penalty algebra. For example, we can always delete the penalty ∞ item in
a sequence like

if y ≥ 0 and z ≥ 0 and p < 0, since a break at the glue is always worse than a break
at the penalty p.

INTRODUCTION TO THE ALGORITHM
The general ideas underlying the optimum-fit algorithm for line breaking can probably
be understood best by considering an example. Figure 12 repeats the paragraph of
Figure 4(c) and includes little vertical marks to indicate ‘feasible breakpoints’ found
by the algorithm. A feasible breakpoint is a place where the text of the paragraph from
the beginning to this point can be broken into lines whose adjustment ratio does not
exceed a given tolerance; in the case of Figure 12, this tolerance was taken to be unity.
Thus, for example, there is a tiny mark after ‘fountain;’ since there is a way to set the
paragraph up to this point with ‘fountain;’ at the end of the 7th line and with none of
lines 1 to 7 having a badness exceeding 100 (cf. Figure 4(a)).
The algorithm proceeds by locating all of the feasible breakpoints and remembering
the best way to get to each one, in the sense of fewest total demerits. This is done
by keeping a list of ‘active’ breakpoints, representing all of the feasible breakpoints
that might be a candidate for future breaks. Whenever a potential breakpoint b is
encountered, the algorithm tests to see if there is any active breakpoint a such that
the line from a to b has an acceptable adjustment ratio. If so, b is a feasible breakpoint
and it is appended to the active list. The algorithm also remembers the identity of
the breakpoint a that minimizes the total demerits, when the total is computed from
the beginning of the paragraph, through a, to b. When an active breakpoint a is
encountered for which the line from a to b has an adjustment ratio less than -1 (i.e.,
when the line can’t be shrunk to fit the desired length), breakpoint a is removed from
the active list. Since the size of the active list is essentially bounded by the maximum
number of words per line, the running time of the algorithm is bounded by this
quantity (which usually is small) times the number of potential breakpoints.
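The procedure just described can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm as implemented in TEX: it assumes that a paragraph is a list of word widths separated by ‘glue(6,3,2)’, that breaks occur only between words, that the last line is followed by unlimited stretchability, and that demerits are (1 + 100|r|^3)^2 with no penalties; the names ratio, demerits, and break_lines are hypothetical.

```python
import math

SPACE = (6.0, 3.0, 2.0)  # interword glue(6,3,2): width, stretch, shrink


def ratio(widths, a, b, line_width, last):
    """Adjustment ratio of the line that holds words a .. b-1."""
    n_spaces = b - a - 1
    natural = sum(widths[a:b]) + SPACE[0] * n_spaces
    if natural > line_width:
        shrink = SPACE[2] * n_spaces
        return (line_width - natural) / shrink if shrink else -math.inf
    if last or natural == line_width:
        return 0.0  # assumption: the final line can always be filled out
    stretch = SPACE[1] * n_spaces
    return (line_width - natural) / stretch if stretch else math.inf


def demerits(r):
    return (1.0 + 100.0 * abs(r) ** 3) ** 2


def break_lines(widths, line_width, tolerance=1.0):
    """Return (breakpoints, total demerits), or None if no feasible setting exists."""
    n = len(widths)
    best = {0: (0.0, None)}   # breakpoint -> (fewest total demerits, predecessor)
    active = [0]              # breakpoint 0 is the start of the paragraph
    for b in range(1, n + 1):
        candidates = []
        for a in list(active):
            r = ratio(widths, a, b, line_width, last=(b == n))
            if r < -1.0:
                active.remove(a)       # the line from a can no longer be shrunk to fit
            elif r <= tolerance:
                candidates.append((best[a][0] + demerits(r), a))
        if candidates:
            best[b] = min(candidates)  # remember the best way to reach b
            active.append(b)
    if n not in best:
        return None
    breaks, b = [], n
    while b is not None:               # follow the stored pointers backwards
        breaks.append(b)
        b = best[b][1]
    return breaks[::-1], best[n][0]


print(break_lines([40, 44, 40, 44], 90.0))  # prints ([0, 2, 4], 2.0)
```

In the sample call each pair of words exactly fills a 90-unit line, so both lines have adjustment ratio 0 and contribute one demerit each. Returning None when the active list runs dry is a choice of this sketch; other ways of handling that case are possible.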
For example, when the algorithm begins to work on the paragraph in Figure 12,
there is only one active breakpoint, representing the beginning of the first line. It is
infeasible to have a line starting there and ending at ‘In’, or ‘olden’, . . . , or ‘lived’,
since the glue between words does not accumulate enough stretchability in such short
segments of the text; but after the next word ‘a’ is encountered, a feasible breakpoint
is found. Now there are two active breakpoints, the original one and the new one.
After the next word ‘king’, there are three active breakpoints; but after the next word
‘whose’, the algorithm sees that it is impossible to squeeze all of the text from the
beginning up to ‘whose’ on one line, so the initial breakpoint becomes inactive and
only two active ones remain.

In olden times when wishing still helped one, there lived a
king whose daughters were all beautiful; and the youngest was
so beautiful that the sun itself, which has seen so much, was
astonished whenever it shone in her face. Close by the king’s
castle lay a great dark forest, and under an old lime-tree in the
forest was a well, and when the day was very warm, the king’s
child went out into the forest and sat down by the side of the
cool fountain; and when she was bored she took a golden ball,
and threw it up on high and caught it; and this ball was her
favorite plaything.

Figure 12. Tiny vertical marks show ‘feasible breakpoints’ where it is possible to break
in such a way that no spaces need to stretch more than their given stretchability.

Skipping ahead, let us consider what happens when the algorithm considers the
potential break after ‘fountain;’. At this stage there are eight active breakpoints,
following the respective text boxes for ‘child’, ‘went’, ‘out’, ‘side’, ‘of’, ‘the’, ‘cool’,
and ‘foun-’. The line starting after ‘child’ and ending with ‘fountain;’ would be too
long to fit, so ‘child’ becomes inactive. Feasible lines are found from ‘went’ or ‘out’
to ‘fountain;’ and the demerits of those lines are 276 and 182, respectively; however,
the line from ‘went’ actually turns out to be preferable, since there are substantially
fewer total demerits from the beginning of the paragraph to ‘went’ than to ‘out’. Thus,
‘fountain;’ becomes a new active breakpoint. The algorithm stores a pointer back from
‘fountain;’ to ‘went’, meaning that the best way to get to a break after ‘fountain;’ is
to start with the best way to get to a break after ‘went’.
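The choice of the back pointer amounts to a single minimization. In the Python sketch below, the line demerits 276 and 182 are taken from the text, but the running totals up to ‘went’ and ‘out’ are hypothetical placeholders chosen only to illustrate the comparison; the text does not state them.

```python
# Demerits of the line that ends at 'fountain;', by starting breakpoint (from the text).
line_demerits = {"went": 276, "out": 182}

# Total demerits from the start of the paragraph to each breakpoint.
# These two numbers are hypothetical, for illustration only.
total_to = {"went": 1000, "out": 1500}

# The pointer goes to the predecessor that minimizes the accumulated total.
best_prev = min(line_demerits, key=lambda a: total_to[a] + line_demerits[a])
best_total = total_to[best_prev] + line_demerits[best_prev]
```

With these placeholder totals the pointer goes back to ‘went’, even though the line from ‘out’ has fewer demerits on its own.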
The computation of this algorithm can be represented pictorially by means of the
network in Figure 13, which shows all of the feasible breakpoints together with the
number of demerits charged for each feasible line between them. The object of the
algorithm is to compute the shortest path from the top of Figure 13 to the bottom,
using the demerit numbers as the ‘distances’ corresponding to individual parts of the
path. In this sense, the job of optimal line breaking is essentially a special case of the
problem of finding shortest paths in an acyclic network; the line-breaking algorithm is
slightly more complex only because it must construct the network at the same time as
it is finding the shortest path.
Notice that the best-fit algorithm can be described very easily in terms of a network
like Figure 13: it is the algorithm that simply chooses the shortest continuation at every
step. And the first-fit algorithm can be characterized as the method of always taking
the leftmost branch having a negative adjustment ratio (unless it leads to a hyphen,
in which case the rightmost non-hyphenated branch is chosen whenever there is a
feasible one). From these considerations we can readily understand why the optimum-
fit algorithm tends to do a much better job.
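The difference between the two strategies shows up even on a tiny network. The following Python sketch uses a hypothetical four-node network whose demerit values are invented for illustration (it is not Figure 13); best_fit always takes the cheapest next line, while optimum_fit computes the shortest path through the acyclic network by processing the nodes in topological order.

```python
# edges[a][b] = demerits of the line from breakpoint a to breakpoint b (hypothetical).
edges = {
    "start": {"A": 10, "B": 40},
    "A": {"end": 500},
    "B": {"end": 20},
    "end": {},
}
ORDER = ["start", "A", "B", "end"]  # a topological order of the network


def best_fit(edges, source, target):
    """Greedy choice: always take the continuation with the fewest demerits."""
    node, total, path = source, 0, [source]
    while node != target:
        nxt = min(edges[node], key=edges[node].get)
        total += edges[node][nxt]
        node = nxt
        path.append(node)
    return total, path


def optimum_fit(edges, order, source, target):
    """Shortest path in an acyclic network, relaxing edges in topological order."""
    dist = {source: (0, [source])}
    for a in order:
        if a not in dist:
            continue
        for b, d in edges[a].items():
            cand = dist[a][0] + d
            if b not in dist or cand < dist[b][0]:
                dist[b] = (cand, dist[a][1] + [b])
    return dist[target]


greedy = best_fit(edges, "start", "end")             # (510, ['start', 'A', 'end'])
optimal = optimum_fit(edges, ORDER, "start", "end")  # (60, ['start', 'B', 'end'])
```

The greedy method is lured by the cheap first line and pays for it on the second, whereas the shortest-path computation accepts a slightly worse first line to obtain a much better total.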
Sometimes there is no way to continue from one feasible breakpoint to any other.
This situation doesn’t occur in Figure 13, but it would be present below the word ‘so’
if we had not permitted hyphenation of ‘astonished’. In such cases the first-fit and
best-fit algorithms must resort to infeasible lines, while the optimum-fit algorithm can
usually find another way through the maze.
On the other hand, some paragraphs are inherently difficult, and there is no way to
break them into feasible lines. In such cases the algorithm we have described will find
that its active list dwindles until eventually there is no activity left; what should be
done in such a case? It would be possible to start over with a more tolerant attitude
toward infeasibility (a higher threshold value for the adjustment ratios). Alternatively,
TEX takes the attitude that the user wants to make some manual adjustment when
there is no way to meet the specified criteria, so the active list is forcibly prevented from
becoming empty by simply declaring a breakpoint to be feasible if it would otherwise
leave the active list empty. This results in an overset line and an error message that
encourages the user to take corrective action.

Figure 13. This network shows the feasible breakpoints and the number of demerits
charged when going from one breakpoint to another. The ‘shortest path’ from the top to
the bottom corresponds to the best way to typeset the paragraph, if we regard the demerits
as distances.
Figure 14 shows what happens when the algorithm allows quite loose lines to be
feasible; in this case a line is considered to be infeasible only if its adjustment ratio
exceeds 10 (so that there would be more than two ems of space between words).
Such a setting of the tolerances would be used by people who don’t want to make
manual adjustments to paragraphs that cannot be set well. The tiny marks that
indicate feasible breakpoints have varying lengths in this illustration, with longer marks

Figure 14. When the tolerance is raised to 10 times the stretchability, more breakpoints
become feasible, and there are many more possibilities to explore.

indicating places that can be reached via better paths; the tiny dots are for breakpoints
that are just barely feasible. Notice that all of the potential breakpoints in Figure 14
are marked, except for a few in the first two lines; so there are considerably more
feasible breakpoints here than there were in Figure 12, and the network corresponding
to Figure 13 will be much larger. There are 836,272,858 feasible ways to set the para-
graph when such wide spaces are tolerated, compared to only 81 ways in Figure 12.
However, the number of active nodes will not be significantly bigger in this case than
it was in Figure 12, because it is limited by the length of a line, so the algorithm
will not run too much more slowly even though its tolerance has been raised and the
number of possible settings has increased enormously. For example, after 'fountain;'
there are now 17 active breakpoints instead of the 8 present before, so the processing
takes only about twice as long although huge numbers of additional possibilities are
being taken into account.
When the threshold allows wide spacing, the algorithm is almost certain to find a
feasible solution, and it will report no errors to the user even though some rather loose
lines may have been necessary. The user who wants such error messages should set the
tolerance lower; this not only gives warnings when corrective action is needed, it also
improves the algorithm's efficiency.
One of the important things to note about Figure 14 is that breakpoints can become
feasible in completely different ways, leading up to different numbers of lines before the
breakpoint. For example, the word 'seen' is feasible both at the end of line 3:

'In olden. . . lived/a . . . young-/est . . . seen'

and at the end of line 4:

'In olden . . .helped/one, . . .were/all . . .beau-/tiful . . . seen',

although 'seen' was not a feasible break at all in Figure 12. The breaks that put 'seen'
at the end of line 3 have substantially fewer demerits than those putting it on line 4
(approximately 1.68 × 10^6 versus 1.28 × 10^11), so the algorithm will remember only
the former possibility. This is an application of the dynamic-programming 'principle
of optimality', which is responsible for the efficiency of our algorithm4: the optimum
breakpoints of a paragraph are always optimum for the subparagraphs they create.

The area of a
circle is a mean propor-
tional between any two regular
and similar polygons of which one
circumscribes it and the other is iso-
perimetric with it. In addition, the area
of the circle is less than that of any cir-
cumscribed polygon and greater than that
of any isoperimetric polygon. And further,
of these circumscribed polygons, the one
that has the greater number of sides has
a smaller area than the one that has
a lesser number; but, on the other
hand, the isoperimetric polygon
that has the greater num-
ber of sides is the
larger.
- Galileo Galilei (1638)

I
turn, in the
following treatises, to
various uses of those triangles
whose generator is unity. But I leave out
many more than I include; it is extraordinary how
fertile in properties this triangle is. Everyone can try his hand.
- Blaise Pascal (1654)

Figure 15. Examples of line breaking with lines of different sizes.

But the interesting thing is that this economy of storage would not be possible if the
future lines were not all of the same length, since differing line lengths might well
mean that it would be much better to put ‘seen’ on line 4 after all; for example, we
have mentioned a trick for forcing the algorithm to produce a given number of lines.
In the presence of varying line lengths, therefore, the algorithm would need to have
two separate list entries for an active breakpoint after the word ‘seen’. The computer
cannot simply remember the one with fewest total demerits, because the optimality
principle of dynamic programming would not be valid in such a case.
Figure 15 is an example of line breaking when the individual lengths are all different.
In such cases, the need to attach line numbers to breakpoints might mean that the
number of active breakpoints substantially exceeds the maximum number of words per
line, if the feasibility tolerance is set high; so it is desirable to set the tolerance low.
On the other hand, if the tolerance is set too low, there may be no way to break the
paragraph into lines having a desired shape. Fortunately, there is usually a happy
medium in which the algorithm has enough flexibility to find a good solution without
needing too much time and space. The data in Figure 16 shows, for example, that the

Figure 16. Details of the feasible breakpoints in the first example of Figure 15, showing
how the optimum solution was found.

algorithm did not have to do very much work to find an optimal solution for Galileo's
remarks on circles, when the adjustment ratio on each feasible line was required to be
2 or less; yet there was sufficient flexibility to make feasible solutions possible.
A good line-breaking method is especially important for technical typesetting, since
it is undesirable to break up mathematical formulas that appear in the text. Some of
the most difficult copy of this kind appears in Mathematical Reviews or in the answer
pages of The Art of Computer Programming, since the material in those publications
is often densely packed with formulas. Figure 17 shows a typical example from the
answer pages of Seminumerical Algorithms9, together with indications of the feasible
breaks when the adjustment ratios are constrained to be at most 1. Although some
feasible breakpoints occur in the middle of formulas, they are associated with penalties
that make them comparatively undesirable, so the algorithm was able to keep all of
the mathematics of this paragraph intact.

Figure 17. An example of the feasible breakpoints found by the algorithm in a paragraph
containing numerous mathematical formulas.

Figure 18. Paragraphs obtained when the ‘looseness’ parameter has been set to -1, 0,
+1, and +2. As in Figure 14, the spaces have been allowed to stretch up to two ems before
being considered infeasible. Loose settings like this are sometimes necessary to balance a
page, but of course the effects are not beautiful when one goes to extremes.

MORE BELLS AND WHISTLES


The optimization problem we have formulated is to find breakpoints that minimize the
total number of demerits, where the demerits of a particular line depend on its badness
(i.e., on how much its glue must stretch or shrink) and on a possible penalty associated
with its final breakpoint; additional demerits are also added when two consecutive lines
end with hyphens (i.e., end at penalty items with f = 1). Two years of experience
with such a model of the problem gave excellent results, except that a few paragraphs
showed up where further improvement was possible.
The first two lines of Figures 4(a) and 4(b) illustrate a potential source of visual
disturbance that is not accounted for in the model we have been discussing: These
paragraphs begin with a tight line (having r = -.741) immediately followed by a
loose line (having r = +.877). Although the two lines are not offensive in themselves,
the contrast between tight and loose makes them appear worse. Therefore TEX’s new
algorithm for line breaking recognizes four kinds of lines:

Class 0 (tight lines), where -1 ≤ r < -.5;
Class 1 (normal lines), where -.5 ≤ r ≤ +.5;
Class 2 (loose lines), where +.5 < r < +1;
Class 3 (very loose lines), where r ≥ +1.

Additional demerits are added when adjacent lines are not of the same or adjacent
classes, i.e., when a Class 0 line is preceded or followed by Class 2 or Class 3, or when
Class 1 is preceded or followed by Class 3.
This seemingly simple extension actually forces the algorithm to work harder, be-
cause a feasible breakpoint may now have to be entered into the active list up to four
times in order to preserve the dynamic-programming principle of optimality. For ex-
ample, if it is feasible to end at some point with both a Class 0 line and a Class 2 line,
we must remember both possibilities even though the Class 0 choice has more demerits,
because it might be desirable to follow this breakpoint with a tight line. On the other
hand, we need not remember the Class 0 possibility if its total demerits exceed those of
the Class 2 break plus the demerits for contrasting lines, since the Class 0 breakpoint
will never be optimum in such a case.
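The rule for deciding which fitness classes deserve their own active node can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name, the list layout (one best total per class, with infinity meaning "no feasible break of that class") and the sample demerit values are all invented for the example.

```python
INF = float("inf")

def classes_worth_keeping(best_totals, gamma):
    """Return the fitness classes whose best total demerits are not already
    beaten by the overall best total plus the contrast demerits gamma."""
    d_min = min(best_totals)
    return [c for c, d in enumerate(best_totals)
            if d < INF and d <= d_min + gamma]

# Hypothetical totals for classes 0..3 at one breakpoint, with gamma = 3000.
print(classes_worth_keeping([4000, INF, 1500, INF], 3000))  # both kept
print(classes_worth_keeping([5200, INF, 1500, INF], 3000))  # class 0 dropped
```

In the second call the Class 0 total exceeds the Class 2 total plus the contrast demerits, so no continuation could ever prefer it.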
More experience is needed to determine whether or not the additional computation
required by this extension is worthwhile. It is comforting for the user to know that the
line-breaking algorithm takes such refinements into account, but there is no point in
doing the extra work if the output is hardly ever improved.
Another extension to the algorithm is needed to raise it to the highest standards of
quality for hand composition: Sometimes it is desirable to set a paragraph so that it
comes out one line longer or shorter than its optimum length, because this will avoid
an isolated ‘widow line’ at the top or bottom of a page, or because it will make the
total number of lines even so that the material can be divided into two equal columns.
Although the paragraph itself will not be in its optimum form, the entire page will look
better, and the paragraph will be set as well as possible subject to the given constraints.
For example, one of the paragraphs in the story of Figure 6 has been set a line shorter
than its optimum length, so that all six columns come out equal.
The line-breaking algorithm we shall describe therefore has a ‘looseness’ parameter,
illustrated in Figure 18. The ‘looseness’ is an integer q such that the total number of
lines produced for the paragraph is as close as possible to q plus the optimum number,

without violating the conditions of feasibility. Figure 18 shows what happens to the
example paragraph of Figure 14 when q = -1, 0, +1, and +2, respectively. Values
of q < -1 would be the same as q = -1 since this paragraph cannot be squeezed any
further, and values of q > 5 would be the same as q = 5 since the paragraph can’t
be stretched to more than 15 lines without having at least one line whose adjustment
ratio exceeds 10. The user can get the optimum solution having fewest possible lines
by setting q to an extremely negative value like -100. When q ≠ 0, the feasible
breakpoints corresponding to different line numbers must all be remembered, even
when every line has the same length.
When the lines of a paragraph are fairly loose, we don’t want the last line to be
noticeably different, so it is undesirable to use a ‘finishing glue’ with almost infinite
stretchability as in our earlier remarks. The penalty for adjacent lines of contrasting
classes seems to work best in connection with looseness if the finishing glue at the
paragraph end is set to have a normal space equal to about half the total line width,
stretching to nearly the full width and shrinking to zero.

THE ALGORITHM ITSELF
Now let us get down to brass tacks and discuss the details of an optimum line-breaking
algorithm. We are given a paragraph x_1 . . . x_m described by items x_i =
(t_i, w_i, y_i, z_i, p_i, f_i) as explained earlier, where x_1 is a box item and x_m is a penalty
item specifying a forced break (p_m = -∞). We are also given a potentially infinite
sequence of positive line lengths l_1, l_2, . . . . There is a parameter α that gets added
to the demerits whenever there are two consecutive breakpoints with f = 1, and a
parameter γ that gets added to the demerits whenever two consecutive lines belong to
incompatible fitness classes. There is a threshold parameter ρ that is an upper bound
on the adjustment ratios. And there is a looseness parameter q.
A feasible sequence of breakpoints (b_1, . . ., b_k) is a legal choice of breakpoints such
that each of the k resulting lines has an adjustment ratio r_j ≤ ρ. If q = 0, the job of the
algorithm is to find a feasible sequence of breakpoints having the fewest total demerits.
If q ≠ 0, the job of the algorithm is somewhat more difficult to describe precisely; it
can be formulated as follows: Let k be the number of lines that the algorithm would
produce when q = 0. Then the algorithm finds a feasible sequence of k + q breakpoints
having fewest total demerits. However, if this is impossible, the value of q is decreased
by 1 (if q > 0) or increased by 1 (if q < 0) until a feasible solution is found. Sometimes
no feasible solution is possible even with q = 0; we will discuss this situation later after
seeing how the algorithm behaves in the normal case.
We have seen that it is occasionally useful to permit boxes, glue, and penalties to
have negative widths and even negative stretchability; but a completely unrestricted
use of negative values leads to unpleasant complications. For reasons of efficiency, it is
desirable to place two limitations on the paragraphs that will be treated:
• Restriction 1. Let M_b be the length of the minimum-length line from the beginning
of the paragraph to breakpoint b, namely the sum of all w_i - z_i taken over all
box and glue items x_i for 1 ≤ i < b, plus w_b if x_b is a penalty item. The paragraph
must have M_a ≤ M_b whenever a and b are legal breakpoints with a < b.
• Restriction 2. Let a and b be legal breakpoints with a < b, and assume that no x_i
in the range a < i < b is a box item or a forced break (penalty p_i = -∞). Then
either b = m, or x_{b+1} is a box item or a penalty with p_{b+1} < ∞.

Both of these restrictions are quite reasonable, as they are met by all known practical
applications. Restriction 2 seems peculiar at first glance, but we will see in a moment
why it is helpful.
Our algorithm has the following general outline, viewed from the top down:

⟨create an active node representing the beginning of the paragraph⟩;
for b := 1 to m do ⟨if b is a legal breakpoint⟩ then
  begin ⟨initialize the feasible breaks at b to the empty set⟩;
  ⟨for each active node a⟩ do
    begin ⟨compute the adjustment ratio r from a to b⟩;
    if r < -1 or ⟨b is a forced break⟩ then ⟨deactivate node a⟩;
    if -1 ≤ r ≤ ρ then ⟨record a feasible break from a to b⟩;
    end;
  ⟨if there is a feasible break at b⟩ then
    ⟨append the best such breaks as active nodes⟩;
  end;
⟨choose the active node with fewest total demerits⟩;
if q ≠ 0 then ⟨choose the appropriate active node⟩;
⟨use the chosen node to determine the optimum breakpoint sequence⟩

The meaning of the ad hoc Algol-like language used here should be self-evident. An
‘active node’ in this description refers to a record that includes information about a
breakpoint together with its fitness classification and the line number on which it ends.
We want to have a data structure that makes this algorithm efficient, and it is not
hard to design a reasonably good one, but there are two aspects in which some subtlety
pays off: The operation of computing the adjustment ratio, from a given active node a
to a given legal breakpoint b, should be made as simple as possible; and there should
be an easy way to determine which of the feasible breaks at b ought to be saved as
active nodes.
In the first place, the adjustment ratio depends on the total width, total stretch-
ability, and total shrinkability computed from the first box after one breakpoint to
the following breakpoint, and it would take too much time to compute these sums
over and over. We can avoid this by computing the sums from the beginning of the
paragraph to the current place, and subtracting two such sums to obtain the total of
what lies between them. Let (Σw)_b, (Σy)_b, and (Σz)_b denote the respective sums of all
the w_i, y_i, and z_i in the box and glue items x_i for 1 ≤ i < b. Then if a and b are legal
breakpoints with a < b, the width L_ab of a line from a to b and its stretchability Y_ab
and shrinkability Z_ab can be computed as follows:

    L_ab = (Σw)_b - (Σw)_after(a) + (w_b if t_b = ‘penalty’);
    Y_ab = (Σy)_b - (Σy)_after(a);
    Z_ab = (Σz)_b - (Σz)_after(a).

Here ‘after(a)’ is the smallest index i > a such that either i > m or x_i is a box item
or x_i is a penalty item that forces a break (p_i = -∞). These formulas hold even in
the degenerate case that after(a) > b, because of Restriction 2; in fact, Restriction 2
essentially stipulates that the relation ‘after(a) > b’ implies that (Σw)_b = (Σw)_after(a),
(Σy)_b = (Σy)_after(a), and (Σz)_b = (Σz)_after(a).
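The running-sum idea can be illustrated with a short Python sketch. The tuple layout (type, w, y, z, p), the helper names and the tiny sample paragraph are assumptions made for this example only; index 0 holds a dummy entry so that list positions agree with the 1-based items x_1 . . . x_m.

```python
INF = float("inf")

def prefix_sums(items):
    """sums[b] = (sum of w, y, z) over box and glue items x_i with 1 <= i < b."""
    m = len(items) - 1
    sums = [(0, 0, 0)] * (m + 2)
    w = y = z = 0
    for i in range(1, m + 1):
        sums[i] = (w, y, z)
        t, wi, yi, zi, _ = items[i]
        if t in ("box", "glue"):
            w, y, z = w + wi, y + yi, z + zi
    sums[m + 1] = (w, y, z)
    return sums

def after(items, a):
    """Smallest i > a that is past the end, a box, or a forced break."""
    m = len(items) - 1
    i = a + 1
    while i <= m:
        t, _, _, _, p = items[i]
        if t == "box" or (t == "penalty" and p == -INF):
            break
        i += 1
    return i

def line_measure(items, sums, a, b):
    """Width, stretchability and shrinkability of the line from a to b."""
    s_b, s_a = sums[b], sums[after(items, a)]
    extra = items[b][1] if items[b][0] == "penalty" else 0
    return (s_b[0] - s_a[0] + extra, s_b[1] - s_a[1], s_b[2] - s_a[2])

# Hypothetical paragraph: three boxes separated by glue, then a forced break.
ITEMS = [None,
         ("box", 10, 0, 0, 0), ("glue", 3, 1, 1, 0),
         ("box", 20, 0, 0, 0), ("glue", 3, 1, 1, 0),
         ("box", 5, 0, 0, 0), ("penalty", 0, 0, 0, -INF)]
SUMS = prefix_sums(ITEMS)
```

Each line measurement then costs two table lookups and three subtractions, regardless of how many items lie between the two breakpoints.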

From these considerations, we may conclude that each node a in the data structure
should contain the following fields:
position(a) = index of breakpoint represented by this node (0 = start of paragraph);
line(a) = number of the line ending at this breakpoint;
fitness(a) = fitness class of the line ending at this breakpoint;
totalwidth(a) = (Σw)_after(a), used to calculate adjustment ratios;
totalstretch(a) = (Σy)_after(a), used to calculate adjustment ratios;
totalshrink(a) = (Σz)_after(a), used to calculate adjustment ratios;
totaldemerits(a) = minimum total demerits up to this breakpoint;
previous(a) = pointer to the best node for the preceding breakpoint;
link(a) = pointer to the next node in the list.
Nodes become active when they are first created, and they become passive when they
are deactivated. The algorithm maintains global variables A and P, which point
respectively to the first node in the active list and the first node in the passive list.
The first step can therefore be fleshed out as follows:
⟨create an active node representing the beginning of the paragraph⟩ =
  begin A := new node (position = 0, line = 0, fitness = 1,
      totalwidth = 0, totalstretch = 0, totalshrink = 0,
      totaldemerits = 0, previous = Λ, link = Λ);
  P := Λ;
  end.
We also introduce global variables ΣW, ΣY, and ΣZ to represent (Σw)_b, (Σy)_b,
and (Σz)_b in the main loop of the algorithm, so that the operation ‘for b := 1 to m do
⟨if b is a legal breakpoint⟩ then ⟨main loop⟩’ takes the following form:
  ΣW := ΣY := ΣZ := 0;
  for b := 1 to m do
    if t_b = ‘box’ then ΣW := ΣW + w_b
    else if t_b = ‘glue’ then
      begin if t_{b-1} = ‘box’ then ⟨main loop⟩;
      ΣW := ΣW + w_b; ΣY := ΣY + y_b; ΣZ := ΣZ + z_b;
      end
    else if p_b ≠ +∞ then ⟨main loop⟩.
In the main loop itself, the operation ‘compute the adjustment ratio r from a to b’ can
now be implemented simply as follows:
  L := ΣW - totalwidth(a);
  if t_b = ‘penalty’ then L := L + w_b;
  j := line(a) + 1;
  if L < l_j then
    begin Y := ΣY - totalstretch(a);
    if Y > 0 then r := (l_j - L)/Y else r := ∞;
    end
  else if L > l_j then
    begin Z := ΣZ - totalshrink(a);
    if Z > 0 then r := (l_j - L)/Z else r := ∞;
    end
  else r := 0.
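The same computation can be written as a small Python function. This is a sketch whose names and sample numbers are chosen for illustration; like the pseudocode, it returns infinity whenever the line would need stretching or shrinking that the glue cannot supply.

```python
def adjustment_ratio(L, Y, Z, line_length):
    """Adjustment ratio of a line with natural width L, stretchability Y
    and shrinkability Z that must be set to the given line length."""
    if L < line_length:
        return (line_length - L) / Y if Y > 0 else float("inf")
    if L > line_length:
        return (line_length - L) / Z if Z > 0 else float("inf")
    return 0.0

print(adjustment_ratio(90, 20, 5, 100))   # line too short: stretch by half
print(adjustment_ratio(104, 20, 8, 100))  # line too long: shrink by half
```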

The other nonobvious problem we have to deal with is caused by the fact that
several nodes might correspond to a single breakpoint. We will never create two nodes
having the same values of (position, line, fitness), since the whole point of our dynamic
programming approach is that we need only remember the best possible way to get
to each feasible break position having a given line number and a given fitness class.
But it is not immediately clear how to keep track of the best ways that lead to a
given position, when that position can occur with different line numbers; we could,
for example, maintain a hash table with (line, fitness) as the key, but that would
be unnecessarily complicated. The solution is to keep the active list sorted by line
numbers: After looking at all the active nodes for line j, we can insert new active
nodes for line j + 1 into the list just before any active nodes for lines ≥ j + 1 that
we are about to look at next.
An additional complication is that we don’t want to create active nodes for different
line numbers when the line lengths are all identical, unless q ≠ 0, since this would
unnecessarily slow the algorithm down; the complexities of the general case should
not encumber the simple situations that arise most often. Therefore we assume that
an index j_0 is known such that all breaks at line numbers ≥ j_0 can be considered
equivalent. This index j_0 is determined as follows: If q ≠ 0, then j_0 = ∞; otherwise
j_0 is as small as possible such that l_j = l_{j+1} for all j > j_0. For example, if q = 0 and
l_1 = l_2 = l_3 ≠ l_4 = l_5 = · · ·, we let j_0 = 3, since it is unnecessary to distinguish a
breakpoint that ends line 3 from a breakpoint that ends line 4 at the same position, as
far as any subsequent lines are concerned.
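The determination of j_0 can be sketched in Python as follows. The representation of the line lengths is an assumption made for the example: a finite list l_1, . . ., l_n whose last entry is taken to repeat forever. The function name is hypothetical, and 0 is returned when all lengths are equal.

```python
def equivalence_index(lengths, q):
    """Smallest j0 such that l_j = l_{j+1} for all j > j0 (infinite if q != 0).
    lengths[j - 1] holds l_j; the last entry is assumed to repeat forever."""
    if q != 0:
        return float("inf")
    j0 = 0
    for j in range(1, len(lengths)):        # compare l_j with l_{j+1}
        if lengths[j - 1] != lengths[j]:
            j0 = j
    return j0

print(equivalence_index([5, 5, 5, 7, 7], 0))  # the example in the text
```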
For each position b and line number j, it is convenient to remember the best feasible
breakpoints having fitness classifications 0, 1, 2, 3 by maintaining four values D_0, D_1,
D_2, D_3, where D_c is the smallest known total of demerits that leads to a breakpoint at
position b and line j and class c. Another variable D = min(D_0, D_1, D_2, D_3) turns out
to be convenient as well, and we let A_c point to the active node a that leads to the best
value D_c. Thus the main loop takes the following slightly altered form:

begin a := A; preva := Λ;
loop: D_0 := D_1 := D_2 := D_3 := D := +∞;
  loop: nexta := link(a);
    ⟨compute the adjustment ratio r from a to b⟩;
    if r < -1 or p_b = -∞ then ⟨deactivate node a⟩ else preva := a;
    if -1 ≤ r ≤ ρ then
      begin ⟨compute demerits d and fitness class c⟩;
      if d < D_c then
        begin D_c := d; A_c := a; if d < D then D := d;
        end;
      end;
    a := nexta; if a = Λ then exit loop;
    if line(a) ≥ j and j < j_0 then exit loop;
  repeat;
  if D < ∞ then ⟨insert new active nodes for breaks from A_c to b⟩;
  if a = Λ then exit loop;
repeat;
if A = Λ then ⟨do something drastic since there is no feasible solution⟩;
end.

For a given position b, the inner loop of this code considers all nodes a having
equivalent line numbers, while the outer loop runs through all of the line numbers that
are not equivalent.
It is not difficult to derive a precise encoding of the operations that have been
abbreviated in these loops:
⟨compute demerits d and fitness class c⟩ =
  begin if p_b ≥ 0 then d := (1 + 100|r|^3 + p_b)^2
  else if p_b ≠ -∞ then d := (1 + 100|r|^3)^2 - p_b^2
  else d := (1 + 100|r|^3)^2;
  d := d + α · f_b · f_position(a);
  if r < -.5 then c := 0
  else if r ≤ .5 then c := 1
  else if r < 1 then c := 2 else c := 3;
  if |c - fitness(a)| > 1 then d := d + γ;
  d := d + totaldemerits(a);
  end;
⟨insert new active nodes for breaks from A_c to b⟩ =
  begin ⟨compute tw = (Σw)_after(b), ty = (Σy)_after(b), and tz = (Σz)_after(b)⟩;
  for c := 0 to 3 do if D_c ≤ D + γ then
    begin s := new node(position = b, line = line(A_c) + 1, fitness = c,
        totalwidth = tw, totalstretch = ty, totalshrink = tz,
        totaldemerits = D_c, previous = A_c, link = a);
    if preva = Λ then A := s else link(preva) := s;
    preva := s;
    end;
  end;
⟨compute tw = (Σw)_after(b), ty = (Σy)_after(b), and tz = (Σz)_after(b)⟩ =
  begin tw := ΣW; ty := ΣY; tz := ΣZ; i := b;
  loop: if i > m then exit loop;
    if t_i = ‘box’ then exit loop;
    if t_i = ‘glue’ then
      begin tw := tw + w_i; ty := ty + y_i; tz := tz + z_i;
      end
    else if p_i = -∞ and i > b then exit loop;
    i := i + 1;
  repeat;
  end;
⟨deactivate node a⟩ =
  begin if preva = Λ then A := nexta else link(preva) := nexta;
  link(a) := P; P := a;
  end;
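The demerit and fitness computation above can be mirrored in Python as a cross-check. This is a sketch under stated assumptions: the function and argument names are invented, the values of α and γ in the sample calls are arbitrary, and the class boundaries follow the four classes defined earlier (r below -.5, up to .5, below 1, and 1 or more).

```python
INF = float("inf")

def fitness_class(r):
    """Fitness class 0..3 of a line with adjustment ratio r."""
    if r < -0.5:
        return 0
    if r <= 0.5:
        return 1
    if r < 1:
        return 2
    return 3

def demerits_and_class(r, p_b, f_b, f_a, fitness_a, total_a, alpha, gamma):
    """Total demerits and fitness class for a break at b reached from node a.
    p_b, f_b: penalty and flag at b; f_a: flag at the position of node a."""
    badness = 100 * abs(r) ** 3
    if p_b >= 0:
        d = (1 + badness + p_b) ** 2
    elif p_b != -INF:
        d = (1 + badness) ** 2 - p_b ** 2
    else:
        d = (1 + badness) ** 2
    d += alpha * f_b * f_a              # two flagged breaks in a row
    c = fitness_class(r)
    if abs(c - fitness_a) > 1:          # contrasting adjacent lines
        d += gamma
    return d + total_a, c

print(demerits_and_class(1.0, 50, 1, 1, 3, 10, alpha=3000, gamma=3000))
```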
After the main loop has done its job, the active list will contain only nodes with
position = m, since x_m is a forced break. Thus, we can write
⟨choose the active node with fewest total demerits⟩ =
  begin a := b := A; d := totaldemerits(a);
  loop: a := link(a);
    if a = Λ then exit loop;
    if totaldemerits(a) < d then
      begin d := totaldemerits(a); b := a;
      end;
  repeat;
  k := line(b);
  end.
Now b is the chosen node and k is its line number. The subsequent processing for
q ≠ 0 is equally elementary:
⟨choose the appropriate active node⟩ =
  begin a := A; s := 0;
  loop: δ := line(a) - k;
    if q ≤ δ < s or s < δ ≤ q then
      begin s := δ; d := totaldemerits(a); b := a;
      end
    else if δ = s and totaldemerits(a) < d then
      begin d := totaldemerits(a); b := a;
      end;
    a := link(a); if a = Λ then exit loop;
  repeat;
  k := line(b);
  end.
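The two selection steps can be combined in a short Python sketch. The representation is an assumption for illustration: each surviving active node is reduced to a pair (line number, total demerits), and the sample values are invented.

```python
def choose_node(nodes, q):
    """Pick the final node: fewest demerits when q == 0, otherwise the node
    whose line count is as close as possible to (optimum + q), breaking
    ties by demerits. nodes is a list of (line, total_demerits) pairs."""
    best = min(nodes, key=lambda node: node[1])
    k = best[0]                 # optimum number of lines
    if q == 0:
        return best
    s, d = 0, best[1]           # s = looseness achieved so far
    for node in nodes:
        delta = node[0] - k
        if q <= delta < s or s < delta <= q:
            s, d, best = delta, node[1], node
        elif delta == s and node[1] < d:
            d, best = node[1], node
    return best

# Hypothetical final nodes of a paragraph whose optimum has 10 lines.
NODES = [(9, 500), (10, 100), (11, 300), (12, 900)]
print(choose_node(NODES, 1))    # one line longer than the optimum
```

Requests beyond what is feasible fall back to the nearest achievable line count, so a very negative q yields the shortest feasible setting.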
Now the desired sequence of k breakpoints is accessible from node b:
⟨use the chosen node to determine the optimum breakpoint sequence⟩ =
  for j := k down to 1 do
    begin b_j := position(b); b := previous(b);
    end.
(Another way to complete the processing, getting the lines in forward order from 1 to k
instead of from k to 1, appears in the appendix below.) If there is no garbage collection,
the algorithm concludes by deallocating all nodes on lists A and P.
Note that Restriction 1 makes it legitimate to deactivate a node when we discover
that r < -1, since r < -1 is equivalent to l_j < L_ab - Z_ab, therefore subsequent
breakpoints b' > b will have L_ab' - Z_ab' ≥ L_ab - Z_ab. Thus it is not difficult to
verify that the algorithm does indeed find an optimal solution: Given any sequence of
feasible breakpoints b_1 < · · · < b_k, we can prove by induction on j that the algorithm
constructs a node for a feasible break at b_j, with appropriate line numbers and fitness
classifications, having no more demerits than the given sequence does.
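A much-simplified rendition of the whole procedure may help to fix the ideas. The Python sketch below is not the algorithm as specified above: it assumes that every character is one unit wide, that all lines have the same length, and that there are no penalties, hyphens, fitness classes or looseness, so one node per position suffices. The glue values, the default tolerance and the function name are invented for the example, and the last line is given finishing glue of infinite stretchability.

```python
INF = float("inf")

def break_paragraph(text, width, rho=10.0, glue=(1.0, 1.0, 0.5)):
    """Return the lines of an optimum-fit setting of text (simplified)."""
    words = text.split()
    n = len(words)
    gw, gy, gz = glue                   # space width, stretch, shrink
    cw = [0]
    for word in words:                  # running sums of word widths
        cw.append(cw[-1] + len(word))
    # A node is (position, line, total demerits, previous node).
    active = [(0, 0, 0.0, None)]
    for b in range(1, n + 1):           # legal breakpoint after word b
        best, survivors = None, []
        for node in active:
            gaps = b - node[0] - 1
            L = cw[b] - cw[node[0]] + gaps * gw
            if L < width:
                Y = INF if b == n else gaps * gy   # finishing glue
                r = (width - L) / Y if Y > 0 else INF
            elif L > width:
                Z = gaps * gz
                r = (width - L) / Z if Z > 0 else INF
            else:
                r = 0.0
            if not (r < -1 or b == n):  # otherwise the node is deactivated
                survivors.append(node)
            if -1 <= r <= rho:          # feasible break from node to b
                d = (1 + 100 * abs(r) ** 3) ** 2 + node[2]
                if best is None or d < best[2]:
                    best = (b, node[1] + 1, d, node)
        active = survivors + ([best] if best else [])
        if not active:
            raise ValueError("no feasible solution at this tolerance")
    node = min(active, key=lambda nd: nd[2])
    breaks = []
    while node[3] is not None:          # follow the 'previous' pointers
        breaks.append(node[0])
        node = node[3]
    breaks.reverse()
    starts = [0] + breaks[:-1]
    return [" ".join(words[s:e]) for s, e in zip(starts, breaks)]

TEXT = ("In olden times when wishing still helped one, there lived a king "
        "whose daughters were all beautiful; and the youngest was so "
        "beautiful that the sun itself, which has seen so much, was "
        "astonished whenever it shone in her face.")
for line in break_paragraph(TEXT, 40):
    print(line)
```

Because word widths are positive, the minimum length of a line only grows as b advances, which is the property (Restriction 1) that makes it safe to drop a node as soon as r < -1.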
There is only one loose end remaining in the algorithm, namely the operation 'do
something drastic since there is no feasible solution'. As mentioned above, the TeX
system assumes that the user has chosen the tolerance threshold $\rho$ in such a way that
human intervention is desirable when this tolerance cannot be met. Another alternative
would be to have two thresholds and to try the algorithm first with threshold $\rho_0$,
which is lower than $\rho$, so the algorithm will generate comparatively few active nodes;
if there is no way to succeed at tolerance $\rho_0$, the algorithm could simply return all
nodes to free storage and try again with the actual threshold $\rho$. This dual-threshold
method will not always find the strictly optimum feasible solution, since it is possible
in unusual circumstances for the optimum solution to include a line whose adjustment
ratio exceeds $\rho_0$ while there is a non-optimum feasible solution meeting the tolerance
$\rho_0$; for practical purposes, however, this difference is negligible.
TeX uses a different sort of dual-threshold method. Since the task of word division
is nontrivial, TeX first tries to break a paragraph into lines without any discretionary
hyphens except those already present in the given text, using a tolerance threshold $\rho_1$.
If the algorithm fails to find a feasible solution, or if there is a feasible solution with
$q \ne 0$ but the desired looseness could not be satisfied ($\delta \ne q$), all nodes are returned
to free storage and TeX starts again using another tolerance $\rho_2$. During this second
pass, all words of five letters or more are submitted to TeX's hyphenation algorithm
before they are treated by the line-breaking algorithm. Thus, the user sets $\rho_1$ to the
limit of tolerance for paragraphs that can be completely broken without hyphenation,
and $\rho_2$ is set to the tolerance limit when hyphenation must be tried; possibly $\rho_1$ will be
slightly larger than $\rho_2$, but it might also be smaller, if hyphenation is not frowned on
too much. ( T E X users specify two integers, ‘jjpar’ = p: and ‘jpar’ = p z . ) In practice
$\rho_1$ and $\rho_2$ are usually equal to each other, or else $\rho_1$ is near 1 and $\rho_2 > 2$; alternatively,
one can take $\rho_2 = 0$ to effectively disallow hyphenation.
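
The two-pass strategy just described can be outlined in Python. This is a hedged sketch rather than TeX's implementation: `break_lines` is a hypothetical callable standing for the line-breaking algorithm of this paper, assumed to return a list of breakpoints or `None` when no feasible solution exists at the given tolerance, and the retry that TeX makes when the requested looseness cannot be satisfied is not modelled.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical signature: (words, tolerance, try_hyphenation) -> breakpoints or None
BreakFn = Callable[[Sequence[str], float, bool], Optional[List[int]]]


def two_pass_break(
    words: Sequence[str],
    break_lines: BreakFn,
    rho1: float,
    rho2: float,
) -> Tuple[Optional[List[int]], int]:
    """Return (breakpoints, number of the last pass that was run).

    Pass 1 uses tolerance rho1 and no discretionary hyphens beyond those
    already in the text; pass 2 uses tolerance rho2 with hyphenation.
    """
    result = break_lines(words, rho1, False)
    if result is not None:
        return result, 1
    # All nodes would be returned to free storage here before starting over.
    result = break_lines(words, rho2, True)
    return result, 2
```

A result of `(None, 2)` corresponds to the case in which both passes fail, which the caller must handle separately.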
When both passes fail, TeX continues by reactivating the node that was most
recently deactivated and treats it as if it were a feasible break leading to b. This situation
is actually detected in the routine ‘deactivate node a’, just after the last active node
has become passive:

if A = Λ and secondpass and D = ∞ and r < −1 then r := −1


The net result is to produce an ‘overfull box’ that sticks out into the right margin,
whenever no feasible sequence of line breaks is possible. As discussed above, some kind
of error indication is necessary, since the user is assumed to have set p to a value such
that further stretching is intolerable and requires manual intervention. An overfull box
is easier to provide than an underfull one, by the nature of the algorithm. The setting
of the overfull box will be as tight as possible, so that the user can easily see how to
devise appropriate corrective action such as a forced line break or hyphenation.

COMPUTATIONAL EXPERIENCE
The algorithm described in the previous section is rather complex, since it is intended
to apply to a wide variety of situations that arise in typesetting. A considerably
simpler procedure is possible for the special cases needed for word processors and
newspapers; the appendix to this paper gives details about such a stripped-down
version. Contrariwise, the algorithm in TeX is even more complex than the one we
have described, because TeX must deal with leaders, with footnotes or cross references
or page-break marks attached to lines, and with spacing both inside and immediately
outside of math formulas; the spacing that surrounds a formula is slightly different from
glue because it disappears when followed by a line break, but it does not represent a
legal breakpoint. (A complete description of TeX's algorithm will appear elsewhere.¹³)
Experience has shown that the general algorithm is quite efficient in practice, in spite
of all the things it must cope with.
So many parameters are present, it is impossible for anyone actually to experiment
with a large fraction of the possibilities. A user can vary the interword spacing and the
penalties for inserted hyphens, explicit hyphens, adjacent flagged lines, and adjacent
lines with incompatible fitness classifications; the tolerance threshold $\rho$ can also be
twiddled, not to mention the lengths of lines and the looseness parameter q. Thus
one could perform computational experiments for years and not have a completely
definitive idea about the behavior of this algorithm. Even with fixed parameters there
is a significant variation with respect to the kind of material being typeset; for example,
highly mathematical copy presents special problems. An interesting comparative study
of line breaking was made by Duncan et al.’, who considered sample texts from
Gibbon’s Decline and Fall versus excerpts from a story entitled Salar the Salmon; as
expected, Gibbon’s vocabulary forced substantially more hyphenated lines.
On the other hand, we have seen that the optimizing algorithm leads to better
line breaks even in children’s stories where the words are short and simple, as in
Grimm’s fairy tales. It would be nice to have a quantitative feeling for how much
extra computation is necessary to get this improvement in quality. Roughly speaking,
the computation time is proportional to the number of words of the paragraph, times
the average number of words per line, since the main loop of the computation runs
through the currently active nodes, and since the average number of words per line is
a reasonable estimate of the number of active nodes in all but the first few lines of a
paragraph (see Figures 12 and 14). On the other hand, there are comparatively few
active nodes on the first lines of a paragraph, so the performance is actually faster than
this rough estimate would indicate. Furthermore, the special-purpose algorithm in the
appendix runs in nearly linear time, independent of the line length, since it does not
need to run through all of the active nodes.
Detailed statistics were kept when TeX's first large production, Seminumerical
Algorithms’, was typeset using the procedure above. This 700-page book has a total of
5526 ‘paragraphs’ in its text and answer pages, if we regard displayed formulas as
separators between independent paragraphs. The 5526 paragraphs were broken into
a total of 21,057 lines, of which 550 (about 2.6 per cent) ended with hyphens. The
lines were usually 29 picas wide, which means 626.4 machine units in 10-point type and
about 677.19 machine units in 9-point type, roughly twelve or thirteen words per line.
The threshold values $\rho_1$ and $\rho_2$ were normally both set to $\sqrt[3]{2} \approx 1.26$, so the spaces
between words ranged from a minimum of 4 units to a maximum of $6 + 3\sqrt[3]{2} \approx 9.78$
units. The penalty for breaking after a hyphen was 50; the consecutive-hyphens and
adjacent-incompatibility demerits were $\alpha = \gamma = 3000$. The second (hyphenation) pass
was needed on only 279 of the paragraphs, i.e., about 5% of the time; a feasible solution
without hyphenation was found in the remaining 5247 cases. The second pass would
only try to hyphenate uncapitalized words of five or more letters, containing no accents,
ligatures, or hyphens, and it turned out that exactly 6700 words were submitted to the
hyphenation procedure. Thus the number of attempted hyphenations per paragraph
was approximately 1.2, only slightly more than needed by conventional nonoptimizing
algorithms, and this was not a significant factor in the running time.
The main contribution to the running time came, of course, from the main loop of the
algorithm, which was executed 274,102 times (about 50 times per paragraph, including
both passes lumped together when the second pass was needed). The total number of
break nodes created was 64,003 (about 12 per paragraph), including multiplicities for
the comparatively rare cases that different fitness classifications or line numbers needed
to be distinguished for the same breakpoint. Thus, about 23% of the legal breakpoints
turned out to be feasible ones, given these comparatively low values of $\rho_1$ and $\rho_2$. The
inner loop of the computation was performed 880,677 times; this is the total number

Figure 19. The adjustment ratios for interword spaces in a 700-page book.
(Histogram of r in bins of width 0.10, from −1.00 ≤ r < −0.95 to +1.15 ≤ r < +1.26.)

of active nodes examined when each legal breakpoint was processed, summed over all
legal breakpoints. Note that this amounts to about 160 active node examinations per
paragraph, and 3.2 per breakpoint, so the inner loop definitely dominates the running
time. If we assume that words are about five letters long, so that a legal break occurs
for every six characters of input text including the spaces between words, the algorithm
costs about half of an inner-loop step per character of input, plus the time to pass over
that character in the outermost loop.
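
The rounded ratios quoted in the last three paragraphs can be rechecked from the raw counts. The short Python calculation below uses only figures stated in the text; the variable names are illustrative, and the assumption that the main loop runs once per legal breakpoint is inferred from the way the 23% and 3.2 figures are derived.

```python
# Raw counts reported for the typesetting of Seminumerical Algorithms.
paragraphs = 5526
lines = 21057
hyphenated_lines = 550
second_pass_paragraphs = 279
hyphenation_attempts = 6700
main_loop_runs = 274102      # assumed: one run per legal breakpoint
break_nodes = 64003
inner_loop_runs = 880677

stats = {
    "hyphenated lines (%)": 100 * hyphenated_lines / lines,
    "paragraphs needing second pass (%)": 100 * second_pass_paragraphs / paragraphs,
    "hyphenation attempts per paragraph": hyphenation_attempts / paragraphs,
    "main-loop runs per paragraph": main_loop_runs / paragraphs,
    "break nodes per paragraph": break_nodes / paragraphs,
    "feasible breakpoints (%)": 100 * break_nodes / main_loop_runs,
    "inner-loop runs per paragraph": inner_loop_runs / paragraphs,
    "inner-loop runs per breakpoint": inner_loop_runs / main_loop_runs,
    # One legal break per six input characters (five letters plus a space).
    "inner-loop runs per character": inner_loop_runs / main_loop_runs / 6,
}

for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

The last entry comes out slightly above one half, which is the basis for the estimate of about half an inner-loop step per character of input.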
This source data was also used to establish the importance of the optional dominance
test ‘if $D_c \le D + \gamma$’ preceding the creation of a new node; without that test, the
algorithm was found to need about 25% more executions of the inner loop, because
so many unnecessary nodes were created.
And how about the output? Figure 19 shows the actual distribution of adjustment
ratios r in the 15,531 typeset lines of Seminumerical Algorithms, not counting the
5526 lines at the ends of paragraphs, for which $r \approx 0$. There was also one line with
$r \approx 1.8$ and one with $r \approx 2.2$ (i.e., a disgraceful spacing of 12.6 units); perhaps some
reader will be able to spot one or both of these anomalies some day. The average value
of $r$ over all 21,057 lines was 0.08, and the standard deviation was only 0.403; about
67% of the lines had word spaces varying between 5 and 7 units. Furthermore the
author believes that virtually none of the 15,531 line breaks are ‘psychologically bad’
in the sense mentioned above.
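
The unit measurements quoted here follow from the adjustment ratio once the interword glue is known. The sketch below assumes a normal space of 6 units that stretches by 3 and shrinks by 2; these values are not stated in this passage but are inferred from the quoted minimum of 4 units and maximum of $6 + 3\sqrt[3]{2}$ units. The function names are illustrative.

```python
def space_width(r: float, natural: float = 6.0,
                stretch: float = 3.0, shrink: float = 2.0) -> float:
    """Width of an interword space set with adjustment ratio r."""
    return natural + r * (stretch if r >= 0 else shrink)


def ratio_for_width(width: float, natural: float = 6.0,
                    stretch: float = 3.0, shrink: float = 2.0) -> float:
    """Inverse of space_width: the adjustment ratio giving this width."""
    diff = width - natural
    return diff / (stretch if diff >= 0 else shrink)


print(space_width(-1.0))            # tightest permitted spacing
print(space_width(2 ** (1 / 3)))    # spacing at the usual threshold
print(space_width(2.2))             # the worst line reported in the text
print(ratio_for_width(5.0), ratio_for_width(7.0))
```

Under these assumptions, word spaces between 5 and 7 units correspond to adjustment ratios between −0.5 and about +0.33.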
Anyone who has experience with typical English text knows that these statistics are
not only excellent, they are in fact too good to be true; no line-breaking algorithm can
achieve such stellar behavior without occasional assists from the author, who notices
that a slight change in wording will permit nicer breaks. Indeed, this is another source
of improved quality when an author is given composition tools like TeX to work with,
because a professional compositor does not dare mess around with the given wording
when setting a paragraph, while an author is happy to make changes that look better,
especially when such changes are negligible by comparison with changes that are found
to be necessary for other reasons when a draft is being proofread. An author knows
that there are many ways to say what he or she wants to say, so it is no trick at all to
make an occasional change of wording.
Theodore L. De Vinne, one of America’s foremost typographers at the turn of the
century, wrote¹⁴ that ‘when the author objects to [a hyphenation] he should be asked
to add or cancel or substitute a word or words that will prevent the breakage. . . .
Authors who insist on even spacing always, with sightly divisions always, do not
clearly understand the rigidity of types.’ Another interesting comment was made by
G. B. Shaw¹⁵: ‘In his own works, whenever [William Morris] found a line that justified
awkwardly, he altered the wording solely for the sake of making it look well in print.
When a proof has been sent to me with two or three lines so widely spaced as to make
a grey band across the page, I have often rewritten the passage so as to fill up the lines
better; but I am sorry to say that my object has generally been so little understood
that the compositor has spoilt all the rest of the paragraph instead of mending his
former bad work.’
The bias caused by Knuth’s tuning his manuscript to a particular line width makes
the statistics in Figure 19 inapplicable to the printer’s situation where a given text
must be typeset as it is. So another experiment was conducted in which the material
of Section 3.5 of Seminumerical Algorithms was set with lines 25 picas wide instead
of 29 picas. Section 3.5, which deals with the question ‘What is a random sequence?’,
was chosen because this section most closely resembles typical mathematics papers con-
taining theorems, proofs, lemmas, etc. In this experiment the optimum-fit algorithm
had to work harder than it did when the material was set to 29 picas, primarily because
the second pass was needed about thrice as often (49 times out of 273 paragraphs,
instead of 16 times); furthermore the second pass was much more tolerant of wide
spaces ($\rho_2 = 10$ instead of $\sqrt[3]{2}$), in order to guarantee that every paragraph could be
typeset without manual intervention. There were about 6 examinations of active nodes
per legal breakpoint encountered, instead of about 3, so the net effect of this change
in parameters was to nearly double the running time for line breaking. The reason for
such a discrepancy was primarily the combination of difficult mathematical copy and
a narrower column measure, rather than the ‘author tuning’, because when the same
text was set 35 picas wide the second pass was needed only 8 times.
It is interesting to observe the quality of the spacing obtained in this 25-pica experi-
ment, since it indicates how well the optimum-fit method can do without any human
intervention. Figure 20 shows what was obtained, together with the corresponding
statistics for the best-fit method when it was applied to the same data. About 800 line
breaks were involved in each case, not counting the final lines of paragraphs. The main
difference was that optimum-fit tended to put more lines into the range $0.5 \le r \le 1$,
while best-fit produced considerably more lines that were extremely spaced out. The
standard deviation of spacing was 0.53 (optimum-fit) versus 0.65 (best-fit); 24 of the
lines typeset by best-fit had spaces exceeding 12 units, while only 7 such bad lines were
produced by the optimum-fit method. An examination of these seven problematical
cases showed that three of them were due to long unbreakable formulas embedded in
the text, three were due to the rule that TeX does not try to hyphenate capitalized
words, and the other one was due to TeX's inability to hyphenate the word ‘reasonable’.
Cursory inspection of the output indicated that the main difference between best-fit
and optimum-fit, in the eyes of a casual reader, would be that the best-fit method not
only resorted to occasional wide spacing, it also tended to end substantially more lines
with hyphens: 119 by comparison with 80. An author who cares about spacing, and

Figure 20. The distribution of interword spaces found by the best line-at-a-time method,
compared to the distribution found by the best paragraph-at-a-time method, when difficult
mathematical copy is typeset without human intervention. (Two histograms of r, one of them
labeled 'Optimum fit', in bins of width 0.25 from −1.00 ≤ r < −0.75 to +2.00 ≤ r < ∞.)

who therefore will edit a manuscript until it can be typeset satisfactorily, would have
to do a significant amount of extra work in order to get the best-fit method to produce
decent results with such difficult copy, but the output of the optimum-fit method could
be made suitable with only a few author's alterations.

A HISTORICAL SUMMARY
We have now discussed most of the issues that arise in line breaking, and it is interesting
to compare the newfangled approaches to what printers have actually been doing
through the years. Medieval scribes, who prepared beautiful manuscripts by hand
before the days of printing, were generally careful to break lines so that the right-
hand margins would be nearly straight, and this practice was continued by the early
printers. Indeed, printers had to fill up each line of type with spaces anyway, so that
the individual letters wouldn't fall out of position while making impressions, and it
wasn't too much more difficult for a compositor to distribute the spaces between words
instead of putting them at the ends of lines.
One of the most difficult challenges faced by printers over the years has been the
typesetting of 'polyglot Bibles' (editions of the Bible in which the original languages
are set side by side with various translations), since special care is needed to keep
the versions of various languages synchronized with each other. Furthermore the fact
that several languages appear on each page means that the texts tend to be set with
narrower columns than usual; this, together with the fact that one dare not alter the
sacred words, makes the line-breaking problem especially difficult. We can get a good
idea of the early printers' approaches to line breaking by examining their polyglot
Bibles carefully.
The first polyglot Bible¹⁶,¹⁷,¹⁸ was produced in Spain by the eminent Cardinal
Jiménez de Cisneros, who reportedly spent 50,000 gold ducats to support the project.
It is generally called the Complutensian Polyglot, because it was prepared in Alcalá
de Henares, a city near Madrid whose old Roman name was Complutum. The printer,
Arnao Guillén de Brocar, devoted the years 1514-1517 to the production of this six-
volume set, and it is said that the Hebrew and Greek fonts he made for the occasion
are among the finest ever cut. His approach to justification was quite interesting and
unusual, as shown in Figure 21: Instead of justifying the lines by increasing the word
spaces, he inserted visible leaders to obtain solid blocks of copy with straight margins.

Figure 21. The opening verses of Genesis as typeset in the Complutensian Polyglot Bible;
the Latin words are keyed to the Hebrew, and leaders are used to fill out lines that would
otherwise be ragged right and ragged left. Greek and Chaldee (Aramaic) versions of the
text also appeared on the same page.

These leaders appear at the right of the Latin lines and at the left of the Hebrew lines.
He changed this style somewhat after gaining more experience: Starting at about the
46th chapter of Genesis, the Hebrew text was justified by word spaces, although the
leaders continued to appear in the Latin column. It is clear that straight margins were
considered strongly desirable at the time.
Brocar’s method of line breaking seems to be essentially a first-fit approach to the
Hebrew text; the corresponding Latin translation could then be set up rather easily,
since there were two lines of Latin for each line of Hebrew, and this gave plenty of room
for the Latin. In some cases when the Greek text was abnormally long by comparison
with the corresponding Hebrew (e.g., Exodus 38), Brocar set the Hebrew quite loosely,
so it is evident that he gave considerable attention to line breaking.
At about the same time, a polyglot version of the book of Psalms was being prepared
as a labor of love by Agostino Giustiniani of Genoa.¹⁹ This was the first polyglot book
actually to appear in print with each language in its own characters, although Origen’s
third-century Hexapla manuscript is generally considered to be the inspiration for all
of the later polyglot volumes. Giustiniani’s Psalter had eight columns: (1) The Hebrew
original; (2) A literal Latin rendition of (1); (3) The common Latin (Vulgate) version;
(4) The Greek (Septuagint) version; (5) The Arabic version; (6) The Chaldee version;
(7) A literal Latin translation of (6); (8) Notes. Since the Psalms are poems, all of the
columns except the last were set with ragged margins, and an interesting convention
was used to deal with the occasional line that was too wide to fit: A left parenthesis was
placed at the very end of the broken line, and the remainder of that line (preceded by
another left parenthesis) was placed flush with the margin of the preceding or following
line, wherever it would fit.
Only column (8) was justified, and it had a rather narrow measure of about 21 char-
acters per line. By studying this column we can conclude that Giustiniani did not take
great pains to make equal spacing by fiddling with the words. For example, Figure 22,

Figure 22. Part of Giustiniani’s commentary on the Psalms. The presence of a loose line
surrounded by two very tight lines indicates that the compositor did not go back to reset
previous lines when a problem arose.

which comes from the notes on Psalm 6, shows two very tight lines enclosing a very
loose one in the passage ‘scriptum est . . .quod qui’. If Giustiniani had been extremely
concerned about spacing he would have used the hyphenation ‘cog-nosces’; the other
potential solution, to move ‘ad’ up a line, would not have worked since there isn’t quite
room for ‘ad’ on the loose line. Notice that another aid to line breaking in Latin at
that time was to replace an m or n by a tilde on the previous vowel (e.g., ‘premiũ’
for premium and ‘mũdo’ for mundo); an extension to the box/glue/penalty algebra
would be needed to include such options in TeX’s line-breaking algorithm! It is not
clear why Giustiniani didn’t set ‘acceperũt’ on the third line, to save space, since he
had no room for the hyphen of ‘in-tellectum’; perhaps he didn’t have enough ũ’s left
in his type case.
Figure 23 shows some justified text from the Complutensian polyglot, taken from
the Latin translation of an early Aramaic translation of the original Hebrew. The
compositor was somewhat miraculously able to maintain this uniformly tight spacing
throughout the entire volume, by making use of abbreviations and frequent hyphena-
tions. Note that, as in Figure 22, the hyphen was omitted from a broken word when
there was no room for it; e.g., ‘diuisit’ has been divided without a hyphen.

Figure 23. Early printing of Latin texts featured uniformly tight spacing, obtained by
frequent use of abbreviations and word division. This sample comes from the same page
as Figure 21.

Figure 24. The Latin version of 1 Maccabees 2:32 from Plantin’s Royal Polyglot of Antwerp,
showing how the second-last line of a paragraph was spaced out in order to add a line.
(The copy is distorted at the right of this illustration, because the pages of this rare
book cannot be laid flat without harming its binding.)

The next great polyglot Bible was the Royal Polyglot of Antwerp,²⁰ produced during
1568-1572 by the outstanding printer Christophe Plantin. Numerous copies of the
Complutensian Polyglot had unfortunately been lost at sea, so King Philip II commissioned
a new edition that would also take advantage of recent scholarship. Plantin
was a pious man who was active in pacifist religious circles and anxious to undertake
the job; but when he had completed the work he described it as an ‘indescribable toil,
labor, and expense.’ On June 9, 1572, Plantin sent a letter to one of his friends, saying
‘I am astonished at what I undertook, a task I would not do again even if I received
12,000 crowns as a gift.’ But at least his work was widely appreciated: Lucas of Bruges,
writing in 1577, said that ‘the art of the printer has never produced anything nobler,
nor anything more splendid.’
Most of Plantin’s polyglot Bible was justified with fairly wide columns having about
42 characters per line, so it did not present especially difficult problems of line breaking.
But we can get some idea of his methods by studying the texts of the Apocrypha, which
were set with a narrower measure of about 27 characters per line. He arranged things
so that each column on a page would have about the same number of lines, even though
the individual columns were in different languages. Figure 24 shows an example of a
passage excerpted from a page where the Latin text was comparatively sparse, so the
paragraphs on that page needed to be rather loose. It appears that the entire page
was set first, then adjustments were made after the Latin column was found to be
too short; in this case the word ‘eos’ was brought down to make a new line and the
previous line was spaced out. Plantin’s compositor did not take the trouble to move
‘sab-’ down to that line, although such a transposition would have avoided a hyphen
without making the spacing any worse. The optimum solution would have been to
avoid this hyphenation and to hyphenate the previous line after ‘ad-’, thus achieving
fairly uniform spacing throughout.
The most accurate and complete of all polyglot Bibles was the London Polyglot,²¹
printed by Thomas Roycroft and others during the Cromwellian years 1653-1657.
This massive 8-volume work included texts in Hebrew, Greek, Latin, Aramaic, Syriac,
Arabic, Ethiopic, Samaritan, and Persian, all with accompanying Latin translations,
and it has been acclaimed as ‘the typographical achievement of the seventeenth cen-
tury.’ As in Plantin’s work shown in Figure 24, a paragraph that has been loosened will
often end with an unnecessarily tight hyphenated line followed by a loose line followed
by a one-word line; so it is clear that Roycroft’s compositors did not have time to do
complex adjustments of line breaks.
Hyphenations were clearly not frowned upon at the time, since about 40% of all
lines in the London Polyglot end with a hyphen, regardless of the column width. It
is not difficult to find pages on which hyphenated lines outnumber the others; and in
the Latin translation of the Aramaic version of Genesis 4:15, even the two-letter word
‘e-o’ was hyphenated! Such practice was not uncommon: for example, the Hamburg
Polyglot Bible²² of 1596 had more than 50% hyphens at the right margin. Both
Plantin’s polyglot and the notes of Giustiniani’s Psalter had hyphenation percentages
of about 40%, and the same was true of many medieval manuscripts. Thus it was
considered better to have the margins straight and to keep the spacing tight, rather
than to avoid word splits.
One of the first things that strikes a modern eye when looking at these old Bibles
is the treatment of punctuation. Note, for example, that no space appears after the
commas in Figure 22, and a space appears before as well as after one of the commas
in Figure 24. One can find all four possibilities of ‘space before/no space before’
and ‘space after/no space after’ in each of the Bibles mentioned so far, with respect
to commas, periods, colons, semicolons, and question marks, and with no apparent
preference between the four choices except that it was comparatively rare to put a
space before a period. Giustiniani and Plantin occasionally would insert spaces before
periods, but Roycroft apparently never did. Commas began to be treated like periods
in this respect about 1700, but colons and semicolons were generally both preceded and
followed by spaces until the 19th century. Such extra spaces were helpful in justifying,
of course, and it was also helpful to have the option of leaving out all of the space next
to a punctuation mark. Roycroft would in fact eliminate the space between words
when necessary, if the following word was capitalized (e.g., ‘dixitDeus’); apparently a
printer’s main goal was to keep the text unambiguously decipherable, while ease of
readability was only of secondary importance.
Knowledge about how to carry out the work of a trade like printing was originally
passed from masters to apprentices and not explained to the general public, so we can
only guess at what the early printers did by looking at their finished products. A trend
to put trade secrets into print was developing during the 17th century, however,²³
and a book about how to make books was finally written: Joseph Moxon’s Mechanick
Exercises,²⁴ published in 1683, was by forty years the earliest manual of printing in
any language. Although Moxon did not discuss rules for hyphenation and punctuation,
he gave interesting information about line breaking and justification.
‘If the Compositor is not firmly resolv’d to keep himself strictly to the Rules of
good Workmanship, he is now tempted to make Botches . . .’, namely bad line breaks,
according to Moxon. The normal ‘thick space’ between words, when beginning to make
up a line, was one-fourth of what Moxon called the body size (one em), and he also spoke
of ‘thin spaces’ that were one-seventh of the body size; thus, a printer who followed
this practice would deal mostly with spaces of 4.5 units and 2.57 units, although these
measurements were only approximate because of the primitive tools used at the time.
Moxon’s procedure for justifying a line whose natural width was too narrow was to
insert thin spaces between one or more words to ‘fill up the Measure pretty stiff,’ and if
necessary to go back through the line and do this again. ‘Strictly, good Workmanship
will not allow more [than the original space plus two thin spaces], unless the Measure
be so short, that by reason of few Words in a Line, necessity compells him to put more
Spaces between the Words. . . . These wide Whites are by Compositers (in way of
Scandal) call’d Pidgeon-holes. . . . And as Lines may be too much Spaced-out, so may
they be too close Set.’
Notice that Moxon’s justification procedure would normally leave uneven spacing
between words on the same line, since he inserts the thin spaces one by one. In fact,
such discrepancies were the norm in early printed books, which look something like
present-day attempts at justification on a typewriter or computer terminal with fixed-
width spacing. For example, the relative proportions in the spaces of the third line of
Plantin’s text in Figure 24 are approximately 8 : 12 : 5 : 9 : 4, and in the fifth line
of Giustiniani’s Figure 22 they are approximately 3 : 2 : 1. Moxon’s book itself (see

Figure 25. An excerpt from page 245 of Joseph Moxon’s ‘Mechanick Exercises,’ vol. 2, the
first book about how printing is done. Moxon is describing the process of making
corrections to pages that have already been typeset; the irregular spacing found
throughout his book is probably due in part to the fact that such corrections are
necessary.

Figure 25) shows extreme variations, frequently breaking the rules he had stated for
maximum and minimum spaces between words.
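
Moxon's procedure, as paraphrased above, can be simulated to see why it leaves uneven spaces. The Python sketch below is an interpretation, not a transcription of Moxon: it assumes 18 units per em (implied by the quoted thick space of 4.5 units), it inserts thin spaces from left to right, it stops as soon as less than one thin space of room remains, and it allows at most two thin spaces per gap. The name `moxon_justify` is hypothetical.

```python
from typing import List

EM = 18.0          # units per em; assumed, consistent with 4.5 = EM / 4
THICK = EM / 4     # the normal 'thick space', 4.5 units
THIN = EM / 7      # a 'thin space', about 2.57 units


def moxon_justify(word_widths: List[float], measure: float,
                  max_thin_per_gap: int = 2) -> List[float]:
    """Return the width of each interword gap after thin spaces have been
    inserted one at a time to fill out the measure."""
    gaps = [THICK] * (len(word_widths) - 1)
    remaining = measure - sum(word_widths) - sum(gaps)
    for _ in range(max_thin_per_gap):      # go back through the line if needed
        for i in range(len(gaps)):
            if remaining < THIN:           # no room for another thin space
                return gaps
            gaps[i] += THIN
            remaining -= THIN
    return gaps


# Four words with 6.5 units to spare: only the first two gaps are widened.
print(moxon_justify([40.0, 30.0, 50.0, 20.0], 160.0))
```

Because the extra space is added in whole thin spaces, gaps on the same line differ by multiples of about 2.57 units whenever the shortfall is not used up evenly.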
It would be nice to report that Moxon described a particular line-breaking algorithm,
like the first-fit or best-fit method, but in fact he never suggested any particular
procedure, nor did any of his successors until the computer age; this is not surprising,
since people were just expected to use their common sense instead of to obey some rigid
rules. Many of the breaks in Figure 25 can, however, be accounted for by assuming
an underlying first-fit algorithm. For example, the looseness on lines 1, 4, and 8 is
probably due to the long words at the beginning of lines 2, 5, and 9, since these long
words would not fit on the previous line unless they were hyphenated. On the other
hand, the extremely tight spacing on line 13 can best be explained by assuming that
one or more words had to be inserted to correct an error after the page had been set.
Thus we cannot satisfactorily infer the compositor’s procedure from the final copy; we
really need to see the first trial proofs. All we can conclude for certain is that there
was very little attempt to go back and reconsider the already-set lines unless it was
absolutely necessary to do so; for example, this paragraph would have been better if
the first line had ended with ‘can-’ and the second with ‘wherefore’.
Moxon’s compositor was, however, supposed to look ahead: ‘When in Composing he
comes near a Break [i.e., the end of a paragraph], he for some Lines before he comes to
it considers whether that Break will end with some reasonable White; If he finds it will,
he is pleas’d, but if he finds he shall have but a single Word in his Break, he either Sets
wide to drive a Word or two more into the Break-line, or else he Sets close to get in that
little Word, because a Line with only a little Word in it, shews almost like a White-line,
which unless it be properly plac’d, is not pleasing to a curious Eye.’
Another extract from a London printing manual25 is shown in Figure 26; this one
is from 1864 instead of 1683. Although the author says that the justifying spaces
are to be made as nearly equal as possible, whoever did the composition of his book
did not follow the instructions it contains! Only one of the fine books considered
above has spaces that look the same, namely the Complutensian Polyglot. In fact,
printers only rarely achieved truly uniform spacing until machines like the Monotype
and Linotype made the task easier towards the end of the nineteenth century; and these
new machines, with their emphasis on speed, changed the philosophy of justification
so much that the quality of line breaking decreased when the spacing became uniform:
It became too inconvenient for the compositor to go back and reconsider any of the
earlier line breaks of a paragraph, when he was expected to turn out so many more
ems of type per hour.

Figure 26. Printers do not always practice what they preach.

The line breaks in Figure 26 are fairly well done in spite of the uneven spacing, given
that the compositor wished to avoid hyphenations and the psychologically bad break
in the phrase ‘with j’; it would have been slightly better, however, to move the word
‘but’ down to the third-last line.
Probably the most beautiful spacing ever achieved in any typeset book appeared in
The Art of Spacing26 by Samuel A. Bartels (1926). This book was hand set by the
author, and it contains about 50 characters per line. There are no loose lines, and
no hyphenated words; the final line of each paragraph always fills at least 65% of the
column width, yet ends at least one em from the right margin. Bartels must have
changed his original wording many times in order to make this happen; the author as
compositor is clearly able to enhance the appearance of a book.
General-purpose computers were first applied to typesetting by Georges P. Bafour,
André R. Blanchard, and François H. Raymond in France, who applied for patents
on their invention in 1954. (They received French and British patents in 1955, and a
U.S. patent in 1956.27,28) This system gave special attention to hyphenation, and its
authors were probably the first to formulate the method of breaking one line at a time
in a systematic fashion. Figure 27 shows a specimen of their output, as demonstrated
at the Imprimerie Nationale in 1958. In this example the word ‘en’ was not included in
the second line because their scheme tended to favor somewhat loose lines: Each line
would contain as few characters as possible subject to the condition that the line was
feasible but the addition of the next K characters would not be feasible; here K was a
constant, and their method was based on a K-stage lookahead.
Michael P. Barnett began to experiment with computer typesetting at M.I.T. in 1961,
and the work of his group at the Cooperative Computing Laboratory was destined
to become quite influential in the U.S.A. For example, the TROFF system29 that
is now in use at many computer centers is a descendant of Barnett’s PC6 system’,
via other systems called RUNOFF and NROFF. Another line of descent is represented
by the PAGE-1, PAGE-2, and PAGE-3 systems, which have been used extensively in
the typesetting industry.30,31,32 All of these programs use the first-fit method of line
breaking that is described above.

Figure 27. This is a specimen of the output produced in 1958 by the first computer-
controlled typesetting system in which all of the line breaks were chosen automatically.

At about the same time that Barnett began his M.I.T. studies of computer typesetting,
another important university research project with similar goals was started by
John Duncan at the University of Newcastle-Upon-Tyne Computing Laboratory.
Line breaking was one of the first subjects studied intensively by this group, and they
developed a program that would find a feasible way to typeset a paragraph without
hyphenations, if any sequence of feasible breaks exists, given minimum and maximum
values for interword spaces. This program essentially worked by backtracking through
all possibilities, treating them in reverse lexicographic order (i.e., starting with the first
breakpoint $b_1$ as large as possible and using the same method recursively to find feasible
breaks $(b_2, b_3, \ldots)$ in the rest of the paragraph, then decreasing $b_1$ and repeating the
process if necessary). Thus it would either find the lexicographically largest feasible
sequence of breakpoints or it would conclude that none are feasible; in the latter case
hyphenation was attempted. This was the first systematic sequence of experiments to
deal with the line-breaking problem by considering a paragraph as a whole instead of
working line by line.
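
The backtracking search just described can be sketched in a few lines. The following is only an illustration under simplifying assumptions of our own, not the Newcastle program itself: every interword space shares one minimum and one maximum, the final line is allowed to fall short, and the names `feasible_breaks`, `line_ok`, and `search` are hypothetical.

```python
from typing import List, Optional


def feasible_breaks(widths: List[int], line_width: int,
                    min_space: int, max_space: int) -> Optional[List[int]]:
    """Return the lexicographically largest feasible sequence of breakpoints.

    A breakpoint b means "a line ends after word b" (1-based count of words),
    so the lines are widths[0:b1], widths[b1:b2], and so on.
    Returns None when no sequence of feasible breaks exists.
    """
    n = len(widths)

    def line_ok(start: int, end: int, is_last: bool) -> bool:
        # Words widths[start:end] form one line with (end - start - 1) spaces.
        gaps = end - start - 1
        total = sum(widths[start:end])
        if total + gaps * min_space > line_width:
            return False          # too long even with the tightest spacing
        # The last line need not be stretched out to the full width.
        return is_last or total + gaps * max_space >= line_width

    def search(start: int) -> Optional[List[int]]:
        if start == n:
            return []
        # Try the largest possible breakpoint first, then decrease it.
        for end in range(n, start, -1):
            if line_ok(start, end, end == n):
                rest = search(end)
                if rest is not None:
                    return [end] + rest
        return None               # caller must back up (or hyphenate)

    return search(0)
```

For instance, with word widths `[2, 2, 2, 5, 5]`, a line width of 10, and spaces between 1 and 6 units, the largest first breakpoint (after three words) leads to a dead end, and the search backs up to a break after two words. The running time can grow exponentially in unfavorable cases, since nothing is remembered about breakpoints already found to be dead ends.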
No distinction was made in these early experiments between one sequence of feasible
breakpoints and another; the only criterion was whether or not all interword spacing
could be confined to a certain range without requiring hyphenation. Duncan found
that when lines were 603 units wide, it was possible to avoid virtually all hyphenations
if spaces were allowed to vary between 3 and 12 units; with 405-unit lines, however,
hyphens were necessary about 3% of the time in order to keep within these fairly
generous limits, and when the line width decreased to 288 units the hyphenation
percentage rose to 12% or 16% depending on the difficulty of the copy being typeset.
More stringent intervals, such as the requirement of 4- to 9-unit spaces used in most
of the examples we have been considering above, were found to need more than 4%
hyphenations on 603-unit lines and 30% to 40% on 288-unit lines. However, these
numbers are higher than necessary because the Newcastle program did not search for
the best places to insert hyphens: Whenever it was unfeasible to set more than k lines,
the $(k+1)$st line was simply hyphenated and the process was restarted. One hyphen
generated by this method tends to spawn more in the same paragraph, since the first
line of a paragraph or of an artificially resumed paragraph is the most likely to require
hyphenation. Examples of the performance can be seen in the article where the method
was introduced’ (using spaces of 4 to 15 units for the first six pages and 4 to 12 units
for the rest), as well as in Duncan’s survey paper.2 These articles also discuss possible
refinements to the method, one idea being to try to avoid loose lines next to tight lines
in some unspecified manner, another being to try the method first with strict spacing
intervals and then to increase the tolerance before resorting to hyphenation.
Such refinements were carried considerably further by P. I. Cooper33 at Elliott
Automation, who developed a sophisticated experimental system for dealing with entire
paragraphs. Cooper’s system worked not only with minimum and maximum spacing
parameters, it also divided the permissible interword spaces into different sectors that
yielded different so-called ‘penalty scores’. Besides the penalties associated with the
spaces on individual lines, there were additional penalty scores based on the respective
spacing sectors of two consecutive lines, and the goal was to minimize the total penalty
needed to typeset a given paragraph. Thus, his model was rather similar to the TEX
model that we have been discussing, except that all spaces were equivalent to each
other and special problems like hyphenation were not treated.
Cooper said that his program ‘employs a mathematical technique known as “dynamic
programming” ’ to select the optimum setting. However, he gave no details, and from
the stated computer memory requirements it appears that his algorithm was only an
approximation to true dynamic programming in that it would retain just one optimum
sum-of-penalties for each breakpoint, not for each (breakpoint, sector) pair. Thus, his
algorithm was probably similar to the method given in the appendix below.
Unfortunately, Cooper’s method was ahead of its time; the consensus in 1966 was
that such additional computer time and memory space were prohibitively expensive.
Furthermore his method was evaluated only on the basis of how many hyphens it would
save, not on the better spacing it provided on non-hyphenated lines. For example, J. L.
Dolby’s notes on this paper34 compared Cooper’s procedure unfavorably to Duncan’s
since the Newcastle method removed the same number of hyphens with what appeared
to be a less complex program. In fact, Cooper himself undersold his scheme with
unusual modesty and caution when he spoke about it: He said ‘this investigation does
not support the view that [my approach] should be given a general and enthusiastic
recommendation. . . . It has to be admitted that an aesthetic improvement is neither
predictable nor measurable.’ His method was soon forgotten.
In retrospect we can see that the defect in Cooper’s otherwise admirable approach
was the way it dealt with hyphenation: No proper tradeoff between hyphenated lines
and feasible unhyphenated lines was made, and the method would be restarted after
every hyphen had to be inserted. Thus, the hyphens tended to cluster as in the
Newcastle experiments.
Another approach to line breaking has recently been investigated by A. M. Pringle
of Cambridge University, who devised a procedure called Juggle.35 This algorithm
uses the best-fit method without hyphenation until reaching a line that cannot be
accommodated; then it calls a recursive procedure pushback that attempts to move
a word from the offending line up into the previous text. If pushback fails to solve
the problem, another recursive routine pullon tries to move a word forward from the
previous text; hyphenation is attempted only if pullon fails too. Thus, Juggle attempts
to simulate the performance of a methodical super-conscientious workman in the good
olde days of hand composition. The recursive backtracking can, however, consume a lot
of time by comparison with a dynamic programming approach, and an optimum
sequence of line breaks is not generally achieved; for example, Figure 2 would be
obtained instead of Figure 3. Furthermore there are unusual cases in which feasible
solutions exist but Juggle will not find them; for example, it may be feasible to push
back two words but not one.
Hanan Samet has suggested another measure of optimality in his recent work on line
breaking.36 Since all methods for setting a paragraph in a given number of lines involve
the same total amount of blank space, he points out that the average interword space in
a paragraph is essentially independent of the breakpoints (if we ignore the fact that the
final line is different). Therefore he suggests that the variance of the interword spaces
should be minimized, and he proposes a ‘downhill’ algorithm that shifts words between
lines until no such local transformation further reduces the variance.
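
The quantity that such a ‘downhill’ procedure tries to reduce can be illustrated by a short sketch. It is not Samet’s program; the function name `space_variance` is hypothetical, and we assume that every justified line spreads its blank space equally over its interword gaps and that the final line is ignored, as in the remark above.

```python
from statistics import pvariance
from typing import List


def space_variance(widths: List[float], breaks: List[int],
                   line_width: float) -> float:
    """Variance of the interword spaces for a given choice of breakpoints.

    breaks[i] is the number of words that precede the end of line i, so the
    lines are widths[0:breaks[0]], widths[breaks[0]:breaks[1]], and so on.
    The last line is skipped because it is not justified.
    """
    spaces: List[float] = []
    start = 0
    for end in breaks[:-1]:
        gaps = end - start - 1
        if gaps > 0:
            # Each gap on a justified line gets an equal share of the blank space.
            per_gap = (line_width - sum(widths[start:end])) / gaps
            spaces.extend([per_gap] * gaps)
        start = end
    return pvariance(spaces) if spaces else 0.0
```

Comparing two three-line settings of the same words, say `space_variance(w, [3, 5, 6], 10)` against `space_variance(w, [2, 4, 6], 10)`, shows which setting a variance criterion prefers; a downhill method would keep moving single words across line boundaries as long as this value decreases.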
The first magazine publisher to develop computer aids to typesetting was Time Inc. of
New York City, whose line-breaking decisions went largely on-line in 1967. According
to comments made by H. D. Parks37 at the time, line breaks were determined one by
one using a variation of the first-fit algorithm that we might call ‘tight-fit’; this gives
the most words per line except that hyphenation is done only when necessary, and
it is equivalent to the first-fit method if the normal interword spacing is the same as
the minimum. The tight-fit method had previously been used on the IBM 1620 Type
Composition System demonstrated in 1963 (see Duncan,2 pages 159-160), and it is
reasonable to suppose that essentially the same method was carried over to the Time
group when they dedicated two IBM 360/40 computers to the typesetting task.38
Since the final copy in Time magazine has been edited and re-edited, and since
manual intervention and last-minute corrections will change line-breaking decisions, it
is impossible to deduce what algorithm is presently used for Time articles merely by
examining the printed pages; but it is tempting to speculate about how the optimum-
fit algorithm might improve the appearance of such publications. Figure 28 on the
next page shows an interesting example based on page 22 of Time magazine dated
June 23, 1980; Version A shows the published spacing and Version B shows what the
new algorithm would produce in the same circumstances. All letters of the text have
been replaced by n’s of the corresponding width, so that it is possible to concentrate
solely on the spacing; however, it should be pointed out that this device makes bad
spacing look more innocuous, since a reader isn’t so annoyingly distracted when no
semantic meaning is present anyway.
The most interesting thing about Figure 28 is that the final line of the first paragraph
was brought flush right in order to balance the inserted photograph properly; this
photograph actually carried over into the right-hand column. Version A shows how the
desired effect was achieved by stretching the final three lines, leaving large gaps that
surely caught the curious eye of many a reader; Version B shows how the optimizing
algorithm is magically able to look ahead and make things come out perfectly. Perhaps
even more important is the fact that Version B avoids the need for letterspacing that
spoiled the appearance of lines 6, 9, 10, 23, and 32 in Version A.
Letterspacing (the insertion of tiny spaces between the letters of a word so as to
make large interword spaces less prominent) could readily be incorporated into the
box/glue/penalty model, but it is almost universally denounced by typographers. For
example, De Vinne14 said that letterspacing is improper even when the columns are
so narrow that some lines must contain only a single word; Bruce Rogers39 said ‘it
is preferable to put all the extra space between the words even though the resultant
“holes” are distressing to the eye.’ Even one-fourth of a unit of space between letters
makes the word look noticeably different. According to the style rules of the U.S.
Congressional Record40, ‘In general, operators should avoid wide spacing. However,
no letterspacing is permitted.’ The optimum-fit algorithm therefore makes it possible
to comply more easily with existing laws.
The idea of applying dynamic programming to line breaking occurred to D. E. Knuth
in 1976, when Professor Leland Smith of Stanford’s music department raised a related
question that arises in connection with the layout of music on a page (see Clancy
and Knuth41). During a subsequent discussion with students in a problem-solving
seminar, someone pointed out that essentially the same idea would apply to the texts of
paragraphs as well as to music. The box/glue/penalty model was developed by Knuth
in April 1977 when the initial design of TEX was made, although it wasn’t clear at
that time whether a general optimizing algorithm could be implemented with enough
efficiency for practical use. Knuth was blissfully unaware of Cooper’s supposedly
unsuccessful experiments with dynamic programming, otherwise he might have rejected
the whole idea subconsciously before pursuing it at all.

Figure 28. This example is based on the spacing in a recent issue of Time magazine, but
all of the letters have been replaced by n’s of various widths. If the text were readable, the
line breaks in Version B would be less distracting than those in Version A.

During the summer of 1977, M. F. Plass introduced the idea of feasible breakpoints
into Knuth’s original algorithm in order to limit the number of active possibilities and
still find the optimum solution, unless the optimum was intolerably bad anyway. This
algorithm was implemented in the first complete version of TEX (March 1978), and
it appeared to work well. The unexpected power of the box/glue/penalty primitives
gradually became clear during the next two years of experience with TEX; and when
somewhat wild uses of negative parameters were discovered (as in the PASCAL and
Math Reviews examples discussed above), it was necessary to ferret out subtle bugs in
the original implementation.
Finally it became desirable to add more features to TEX’s line-breaking procedure,
especially an ability to vary the line widths with more flexibility than simple hanging
indentation. At this point a more fundamental defect in the 1978 implementation
became apparent, namely that it maintained at most one active node for each break-
point regardless of the fact that a single breakpoint might feasibly occur on different
lines; this meant that the algorithm could miss feasible ways to set a paragraph, in
the presence of sufficiently long hanging indentation. A new algorithm was therefore
developed in the spring of 1980 to replace TEX’s previous method; at that time the
refinements about looseness and adjacent-line mismatches were also introduced, so
that TEX now uses essentially the optimum-fit algorithm that we have discussed in
detail above.

PROBLEMS AND REFINEMENTS
One unfortunate restriction remains in TEX although it is not inherent in the
box/glue/penalty model: When a break occurs in the middle of a ligature (e.g., if
‘efficient’ becomes ‘ef-ficient’), the computation of character widths is more
complicated than usual. We must take into account not only the fact that a hyphen has
some width, but also the fact that ‘f’ followed by ‘fi’ is wider than ‘ffi’. The same
problem occurs when setting German text, where some compound words change their
spelling when they are hyphenated (e.g., ‘backen’ becomes ‘bak-ken’ and ‘Bettuch’
becomes ‘Bett-tuch’). TEX
does not permit such optional spelling variants; it will only insert an optional hyphen
character among other unchangeable characters. Manual intervention is necessary in
the rare cases when a more complicated break cannot be avoided.
It is interesting to consider what extension would be needed to make the optimum-
fit algorithm handle cases like the dropping of m’s and n’s in Figure 22. The badness
function of a line would then depend not only on its natural width, stretchability, and
shrinkability; it would also depend on the number of m’s and the number of n’s on
that line. A similar technique could be used to typeset biblical Hebrew, which is never
hyphenated: Hebrew fonts intended for sacred texts usually include wide variants of
several letters, so that individual characters on a line can be replaced by their wider
counterparts in order to avoid wide spaces between words. For example, there is a
super-extended aleph in addition to the normal one. An appropriate badness function
for the lines of such paragraphs would take account of the number of dual-width
characters present.
The most serious unanticipated problem that has arisen with respect to TEX’s
line-breaking procedure is the fact that floating-point arithmetic was used for all the
calculations of badness, demerits, etc., in the original implementations. This leads
to different results on different computers, since there is so much diversity in existing
floating-point hardware, and since there are often two choices of breakpoints having
almost the same total demerits. It is important to be able to guarantee that all versions
of TEX will set paragraphs identically, because the ability to proofread, edit, and
print a document at different sites is becoming significant. Therefore the ‘standard’
version of TEX, planned for release in 1982, will use fixed-point arithmetic for all of
its calculations.
Books on typography frequently discuss a problem that may be the most serious
consequence of loose typesetting, the occasional gaps of white space that are called
‘houndsteeth’ or ‘lizards’ or ‘rivers’. Such ugly patterns, which run up through a
sequence of lines and distract the reader’s eye, cannot be eliminated by a simple efficient
technique like dynamic programming. Fortunately, however, the problem almost never
arises when the optimum-fit algorithm is used, because the computer is generally able
to find a way to set the lines with suitably tight spacing. Rivers begin to be prevalent
only when the tolerance threshold $\rho$ has been set high for some reason, for example in
Figure 7 where an unusually narrow column is being justified, or in Figure 18(d) where
the paragraph is two lines longer than optimum. Another case that sometimes leads to
rivers arises when the text of a paragraph falls into a strictly mechanical pattern, as
when a newspaper lists all of the guests at a large dinner party. Extensive experience
with TEX has shown, however, that manual removal of rivers is almost never necessary
after the optimum-fit algorithm has been used.
The box/glue/penalty model applies in the vertical dimension as well as in the
horizontal, so TEX is able to make fairly intelligent decisions about where to start
each new page. The tricks we have discussed for such things as ragged-right setting
correspond to analogous vertical tricks for such things as ‘ragged-bottom’ setting.
However, the current implementation of TEX keeps each page in memory until it has
been output, so TEX cannot store an entire document and find strictly optimum page
breaks using the algorithm we have presented for line breaks. The ‘best-fit’ method is
therefore used to output one page at a time.
Experiments are now in progress with a two-pass version of TEX that does find
globally optimum page breaks. This experimental system will also help with the
positioning of illustrations as near as possible to where they are cited in the accom-
panying text, taking proper account of the fact that certain pages face each other.
Many of these issues can be resolved by extending the dynamic programming technique
and the box/glue/penalty model of this paper, but some closely related problems can
be shown to be NP complete.42

APPENDIX: A STRIPPED-DOWN ALGORITHM


Many applications of line breaking (e.g., in word processors) do not need all of the
machinery of the general optimizing algorithm described in the text above, and it
is possible to simplify the general procedure considerably while at the same time
decreasing its space and time requirements, provided that we are willing to simplify the
problem specifications and to tolerate less than optimal performance when hyphenation
is necessary. The ‘suboptimum-fit’ program below is good enough to discover the line
breaks of Figure 3 or Figure 4(c), but it will not handle some of the more complicated
examples. More precisely, the stripped-down program assumes that
a) Instead of the general box/glue/penalty model, the input is specified by a sequence
$w_1 \ldots w_n$ of nonnegative box widths representing the words of the paragraph and
the attached punctuation, together with a sequence of small integers $g_1 \ldots g_n$ that
specifies the type of space to be used between words. For example, we might
have $g_k = 1$ when a normal interword space follows the box of width $w_k$, while
$g_k = 2$ when there is to be no space since box $k$ ends with an explicit hyphen,
and $g_k = 3$ when box $k$ is the end of the paragraph. Other type codes might be
used after punctuation. Each type $g$ corresponds to three nonnegative numbers
$(x_g, y_g, z_g)$ representing respectively the normal spacing, the stretchability, and the
shrinkability of the corresponding type of space. For example, if types 1, 2, and 3
are used with the meanings just suggested, we might have

$(x_1, y_1, z_1) = (6, 3, 2)$ between words
$(x_2, y_2, z_2) = (0, 0, 0)$ after explicit hyphens or dashes
$(x_3, y_3, z_3) = (0, \infty, 0)$ to fill the final line

in terms of 1/18-em units, where $\infty$ stands for some large number. The width $w_1$
of the first box should include the blank space needed for paragraph indentation;
thus, the Grimm fairy tale example of Figure 1 would be represented by

$w_1, \ldots, w_n = 34, 42, 42, \ldots, 24, 39, 30, \ldots, 60, 79$
$g_1, \ldots, g_n = 1, 1, 1, \ldots, 1, 2, 1, \ldots, 1, 3$

corresponding to

‘␣In’, ‘olden’, ‘times’, . . . , ‘old’, ‘lime-’, ‘tree’, . . . , ‘favorite’, ‘plaything.’
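respectively, using widths from a typical roman font of type. The general input
sequences $w_1 \ldots w_n$ and $g_1 \ldots g_n$ can be expressed in the box/glue/penalty model
by the equivalent specification

$\mathrm{box}(w_1)\ \mathrm{glue}(x_{g_1}, y_{g_1}, z_{g_1})\ \ldots\ \mathrm{box}(w_n)\ \mathrm{glue}(x_{g_n}, y_{g_n}, z_{g_n})$

followed by ‘penalty(0, $-\infty$, 0)’ to finish the paragraph.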
b) All lines must have the same width $l$, and each $w_k$ is less than $l$.
c) No word will be hyphenated unless there is no way to set the paragraph without
violating minimum or maximum constraints on spacing. The minimum for type $g$
spaces is

$z'_g = x_g - z_g$

and the maximum is

$y'_g = x_g + \rho y_g,$

where $\rho$ is a positive tolerance that can be varied by the user. For example, if
$\rho = 2$ the maximum type $g$ space is $x_g + 2y_g$, the normal amount plus twice the
stretchability.
d) Hyphenation is performed only at the point where feasible line breaking becomes
impossible, even though it may be better to hyphenate an earlier word. Thus,
the general optimum-fit algorithm of the text will give substantially better results
when high-quality output is desired and hyphenation is frequently necessary.
e) No penalty is assessed for a tight line next to a loose line, or for consecutive
hyphenated lines, and the algorithm does not produce paragraphs that are longer
or shorter than the optimum length. (In other words, $\alpha = \gamma = q = 0$ in the
general algorithm.)
Under these restrictions, optimum breakpoints can be found with extra efficiency.
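
As a small worked instance of restriction (c), take the sample interword values $(x_1, y_1, z_1) = (6, 3, 2)$ suggested in restriction (a) and assume, purely for illustration, a tolerance of $\rho = 2$. The limits on a type 1 space then come out as follows:

```latex
\[
  z'_1 = x_1 - z_1 = 6 - 2 = 4,
  \qquad
  y'_1 = x_1 + \rho\, y_1 = 6 + 2 \cdot 3 = 12 .
\]
```

Under these assumed values a line is feasible only if every ordinary interword space on it can lie between 4 and 12 units, while a type 2 space stays fixed at 0 and the type 3 space that ends the paragraph can stretch as far as necessary.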
The suboptimum-fit algorithm manipulates two arrays:

$s_0\, s_1 \ldots s_{n+1}$,

where $s_k$ denotes the minimum sum of demerits leading to a break after box $k$, or
$s_k = \infty$ if there is no feasible way to break there; and

$p_1 \ldots p_{n+1}$,

where $p_k$ is meaningful only if $s_k < \infty$, in which case the best way to end a line at
box $k$ is to begin it with box $p_k + 1$. We also assume that

$w_{n+1} = 0;$

this represents an invisible box at the very end of the final line of the paragraph.
Besides the $4n + 4$ storage locations for $w_1 \ldots w_{n+1}$, $g_1 \ldots g_n$, $s_0 \ldots s_{n+1}$, and
$p_1 \ldots p_{n+1}$, and the memory required to hold the parameters $l$, $\rho$, and $(x_g, y'_g, z'_g)$ for
each type $g$, the stripped-down algorithm needs only a few miscellaneous variables:

$a$ = the beginning of the paragraph (normally 0, changed after hyphenation);
$k$ = the current breakpoint being considered;
$j$ = the breakpoint being considered as a predecessor of $k$;
$i$ = the leftmost breakpoint that could feasibly precede $k$;
$m$ = the number of active breakpoints (i.e., subscripts $j \ge i$ with $s_j < \infty$);
$\Sigma$ = the normal width of a line from $i$ to $k$;
$\Sigma_{\max}$ = the maximum feasible width of a line from $i$ to $k$;
$\Sigma_{\min}$ = the minimum feasible width of a line from $i$ to $k$;
$\Sigma'$ = the normal width of a line from $j$ to $k$;
$\Sigma'_{\max}$ = the maximum feasible width of a line from $j$ to $k$;
$\Sigma'_{\min}$ = the minimum feasible width of a line from $j$ to $k$;
$r$ = adjustment ratio from $j$ to $k$;
$d$ = total demerits from $a$ to . . . to $j$ to $k$;
$d'$ = minimum total demerits known from $a$ to . . . to $k$;
$j'$ = predecessor of $k$ that leads to $d'$ total demerits, if $d' < \infty$.
All of these variables are integers, except $r$, which will be a fraction in the range
$-1 \le r \le \rho$. The reader may verify the validity of the algorithm by verifying that
these interpretations of the variables remain invariant in key places as the program
proceeds.
Here now is the program, viewed from the ‘top down’:
      if m = 0 or k > n then exit loop;
      Σ := Σ + w_{k+1} + x_{g_k}; Σ_max := Σ_max + w_{k+1} + y'_{g_k};
      Σ_min := Σ_min + w_{k+1} + z'_{g_k};
      k := k + 1;
    repeat;
  if k > n then
    begin output(a, n+1); exit loop;
    end
  else begin ⟨try to hyphenate box k, then output from a to this break⟩;
    a := k − 1;
    end;
repeat.
The operation ‘advance i by 1’ is carried out only when $\Sigma_{\min} > l$, and this cannot
happen when $k = i + 1$ since $\Sigma_{\min} = w_k < l$ in such a case. Therefore the while
loop terminates; we have
⟨advance i by 1⟩ =
  begin if s_i < ∞ then m := m − 1;
  i := i + 1;
  Σ := Σ − w_i − x_{g_i}; Σ_max := Σ_max − w_i − y'_{g_i};
  Σ_min := Σ_min − w_i − z'_{g_i};
  end.
The inner loop of the suboptimal-fit program is simpler and faster than the corre-
sponding loop in the general optimum-fit algorithm because it does not consider active
breakpoints near k, only those that are approximately one line-width away:
⟨examine all feasible lines ending at k⟩ =
    begin j := i; Σ' := Σ; Σ'_max := Σ_max; Σ'_min := Σ_min; d' := ∞;
    while Σ'_max ≥ l do
        begin if s_j < ∞ then ⟨consider breaking from a to ... to j to k⟩;
        j := j+1;
        Σ' := Σ' - w_j - x_{g_j}; Σ'_max := Σ'_max - w_j - y_{g_j};
        Σ'_min := Σ'_min - w_j - z_{g_j};
        end;
    end.
Again we can conclude that the while loop must terminate, since it will not be executed
when k = j + 1. The innermost code is easily fleshed out:
⟨consider breaking from a to ... to j to k⟩ =
    begin if Σ' < l then r := ρ(l - Σ')/(Σ'_max - Σ')
    else if Σ' > l then r := (l - Σ')/(Σ' - Σ'_min)
    else r := 0;
    d := s_j + (1 + 100|r|^3)^2;
    if d < d' then
        begin d' := d; j' := j;
        end;
    end.
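The innermost step can be sketched in Python as follows. This is only an illustration of the two formulas above, not the authors' program: the function and parameter names are invented here, and the caller is assumed to guarantee width_min ≤ line_width ≤ width_max (as the surrounding loops do), so neither division can be by zero.

```python
def adjustment_ratio(width, width_max, width_min, line_width, rho):
    """Adjustment ratio r for a candidate line from j to k.

    width, width_max, width_min play the roles of Sigma', Sigma'_max and
    Sigma'_min; line_width is l and rho is the stretch limit.
    Assumes width_min <= line_width <= width_max (hypothetical precondition).
    """
    if width < line_width:
        # The line is too short and must stretch.
        return rho * (line_width - width) / (width_max - width)
    if width > line_width:
        # The line is too long and must shrink; r is negative.
        return (line_width - width) / (width - width_min)
    return 0.0


def demerits(s_j, r):
    """Total demerits through j to k: s_j + (1 + 100|r|^3)^2."""
    return s_j + (1 + 100 * abs(r) ** 3) ** 2
```

A caller would keep the smallest value of demerits(...) seen so far, together with the j that produced it, exactly as d' and j' are maintained above.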
When hyphenation is necessary, the algorithm goes into panic mode, first searching
for the last value of i that was feasible, then attempting to split word k. At this point
the line from i to k - 1 is too short, and from i to k it is too long, so there is hope
that hyphenation will succeed.

⟨try to hyphenate box k, then output from a to this break⟩ =
    begin loop: Σ := Σ + w_i + x_{g_i};
        Σ_max := Σ_max + w_i + y_{g_i}; Σ_min := Σ_min + w_i + z_{g_i};
        i := i-1;
        if s_i < ∞ then exit loop;
        repeat;
    output(a, i);
    ⟨split box k at the best place⟩;
    ⟨output the line up to the best split and adjust w_k for continuing⟩;
    end.
Let us suppose that there are h_k ways to split box k into two pieces, where the widths
of these pieces in the jth such split are w'_{kj} and w''_{kj}, respectively; here w'_{kj} includes
the width of an inserted hyphen. An auxiliary hyphenation algorithm is supposed to be
able to compute h_k and these piece widths on demand; this algorithm is invoked only
when we reach the routine ‘split box k at the best place’. If no hyphenation is desired
one can simply let h_k = 0, and the program below becomes much simpler. There are
h_k + 1 alternatives to be considered, including the alternative of not splitting at all,
and the choice can be made as follows:
⟨split box k at the best place⟩ =
    begin ⟨invoke hyphenation algorithm to compute h_k and the piece widths⟩;
    j' := 0; d' := ∞;
    for j := 1 to h_k do if Σ_min + w'_{kj} - w_k ≤ l then
        begin Σ' := Σ + w'_{kj} - w_k;
        if Σ' ≤ l then d := 10000ρ(l - Σ')/(100(Σ_max - Σ) + 1)
        else d := 10000(Σ' - l)/(100(Σ - Σ_min) + 1);
        if d < d' then
            begin d' := d; j' := j;
            end;
        end;
    end.
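The selection of the best split can be sketched in Python along the same lines. All names here are hypothetical; first_piece_widths stands for the widths w'_{k1}, ..., w'_{kh_k} (each including the hyphen), and the returned index 0 means "do not split", matching j' = 0 in the text.

```python
def best_split(first_piece_widths, w_k, sigma, sigma_max, sigma_min,
               line_width, rho):
    """Return the 1-based index of the best split of box k, or 0 for none.

    sigma, sigma_max, sigma_min describe the line from i to k with box k
    unsplit; replacing box k by a first piece changes the normal width by
    (piece width - w_k).
    """
    best_j = 0
    best_d = float("inf")
    for j, piece in enumerate(first_piece_widths, start=1):
        # Skip splits that cannot fit even with maximum shrinkage.
        if sigma_min + piece - w_k > line_width:
            continue
        width = sigma + piece - w_k
        if width <= line_width:
            d = 10000 * rho * (line_width - width) / (100 * (sigma_max - sigma) + 1)
        else:
            d = 10000 * (width - line_width) / (100 * (sigma - sigma_min) + 1)
        if d < best_d:
            best_d, best_j = d, j
    return best_j
```

With no candidate splits (an empty list, i.e. h_k = 0) the loop body never runs and the function returns 0, which is the simplification mentioned above.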
The final operation, ‘output the line up to the best split and adjust w_k for continuing’,
will only be sketched here since it is much easier to state it informally than to introduce
still more notation. If j' ≠ 0, so that hyphenation is to be performed, the program
outputs a line from box i+1 to box k inclusive, but with box k replaced by the
hyphenated piece of width w'_{kj'}; then w_k is replaced by the width of the other fragment,
namely w''_{kj'}. In the other case when j' = 0, the program simply outputs a line from
box i+1 to box k-1 inclusive.
One more loose end needs to be tightened up: The procedure ‘output(a,i)’ simply
goes through the p table determining the best line breaks from a to i and typesets
the corresponding lines. One way to do this without requiring extra memory space
is to reverse the relevant p-table entries so that they point to successors instead of
predecessors:
procedure output(integer a, i) =
    begin integer q, r, s; q := i; s := 0;
    while q ≠ a do
        begin r := p_q; p_q := s; s := q; q := r;
        end;
    while q ≠ i do
        begin ⟨output the line from box q+1 to box s, inclusive⟩;
        q := s; s := p_q;
        end;
    end.
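The pointer-reversal idea in this procedure can be tried out with a short Python sketch. It is an illustration under assumptions rather than the original program: the table p is taken to be a dict from breakpoint to predecessor, and instead of typesetting, each line is recorded as a (first box, last box) pair.

```python
def output_lines(p, a, i):
    """Walk the best breaks from a to i using in-place pointer reversal.

    p maps each breakpoint to its predecessor (hypothetical representation).
    The entries on the path from i back to a are overwritten so that they
    point to successors, which is why no extra list of breaks is needed.
    """
    q, s = i, 0
    while q != a:
        # Reverse one link: p[q] now points to the successor of q.
        r = p[q]
        p[q] = s
        s = q
        q = r
    lines = []
    while q != i:
        # s is the breakpoint following q, so the line holds boxes q+1..s.
        lines.append((q + 1, s))
        q = s
        s = p[q]
    return lines
```

For example, with predecessors p = {10: 7, 7: 3, 3: 0}, the call output_lines(p, 0, 10) yields the lines (1, 3), (4, 7), (8, 10) in reading order.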

In practice there is only a bounded amount of memory available for implementing this
algorithm, but arbitrarily long paragraphs can be handled if we make a minor change
suggested by Cooper [33]: When the number of words in a given paragraph exceeds some
maximum number n_max, apply the method to the first n_max words; then output all
but the final line and resume the method again, beginning with the copy carried over
from the line that was not output.
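
A minimal Python sketch of this windowing idea follows. The names are invented for illustration, and break_lines stands for any line-breaking routine that takes a list of words and returns a list of lines (each a list of words); the sketch assumes that n_max words always fill more than one line, and raises an error otherwise.

```python
def typeset_in_chunks(words, n_max, break_lines):
    """Break an arbitrarily long paragraph using at most n_max words at a time.

    break_lines is a caller-supplied (hypothetical) function: list of words
    in, list of lines out. All but the last line of each window are emitted;
    the last line is carried over and rejoined with the remaining words.
    """
    finished = []
    pending = list(words)
    while len(pending) > n_max:
        lines = break_lines(pending[:n_max])
        if len(lines) < 2:
            raise ValueError("n_max is too small: a window must span more than one line")
        finished.extend(lines[:-1])
        pending = lines[-1] + pending[n_max:]
    finished.extend(break_lines(pending))
    return finished
```

Because at least one full line is emitted per pass, the pending list shrinks each time and the loop terminates.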

ACKNOWLEDGEMENTS

We wish to thank Barbara Beeton of the American Mathematical Society for numerous
discussions about ‘real world’ applications; we also are grateful to James Eve of the
University of Newcastle-Upon-Tyne and Neil Wiseman of Cambridge University for
helping us obtain literature that was not readily available in California; and we thank
the librarians of the rare book rooms at Columbia University and Stanford University
for letting us study and photograph excerpts from polyglot Bibles. John Wiley & Sons
Limited have taken unusual care in typesetting this paper in exact accordance with
the line breaks and page breaks found by TEX.

REFERENCES
1. Michael P. Barnett, Computer Typesetting: Experiments and Prospects, M.I.T. Press, Cambridge,
Mass., 1965.
2. C. J. Duncan, ‘Look! No hands!’, The Penrose Annual 57, 121-168 (1964).
3. Michael R. Garey and David S. Johnson, Computers and Intractability, W. H. Freeman, San
Francisco, 1979.
4. Richard Bellman, Dynamic Programming, Princeton Univ. Press, Princeton, N.J., 1957.
5. M. Held and R. M. Karp, ‘The construction of discrete dynamic programming algorithms’, IBM
Systems J. 4, 136-147 (1965).
6. Donald E. Knuth, TEX and METAFONT: New Directions in Typesetting, American Mathematical
Society and Digital Press, Bedford, Massachusetts, 1979.
7. Jakob Ludwig Karl Grimm and Wilhelm Karl Grimm, ‘Der Froschkönig (The Frog King)’, in
Kinder- und Hausmärchen, first published in Berlin, 1812. For the history of this story see Heinz
Rölleke, Die älteste Märchensammlung der Brüder Grimm, Fondation Martin Bodmer,
Cologny-Genève, 1979, pp. 144-153.
8. C. J. Duncan, J. Eve, L. Molyneux, E. S. Page, and Margaret G. Robson, ‘Computer typesetting:
an evaluation of the problems,’ Printing Technology 7, 133-151 (1963).
9. Donald E. Knuth, Seminumerical Algorithms, Vol. 2 of The Art of Computer Programming, second
edition, Addison-Wesley, Reading, Massachusetts, 1981.
10. A. Frey, Manuel Nouveau de Typographie, Paris (1835), 2 vols.
11. Kathleen Jensen and Niklaus Wirth, PASCAL User Manual and Report, Heidelberg,
Springer-Verlag, 1975.
12. Donald E. Knuth, ‘BLAISE, a preprocessor for PASCAL,’ file BLAISE.DEK[up,doc] at SU-AI on
the ARPA network (March 1979). The program itself is on file BLAISE.SAI[tex,dek].
13. Donald E. Knuth, Tau Epsilon Chi: A System for Technical Text, book in preparation.
14. Theodore Low De Vinne, Correct Composition, Vol. 2 of The Practice of Typography, Century, New
York, 1901. The cited material appears on pages 138 and 206.
15. George Bernard Shaw, ‘On Modern Typography’, The Dolphin 4, 80-81 (1940).
16. T. H. Darlow and H. F. Moule, Historical Catalogue of the Printed Editions of Holy Scripture in the
library of The British and Foreign Bible Society, The Bible House, London, 1911.
17. Basil Hall, The Greatest Polyglot Bibles, The Book Club of California, San Francisco, 1966.
18. Jiménez de Cisneros, sponsor, Uetus testamentum multiplici lingua nunc primo impressum, Industria
Arnaldi Guillelmi de Brocario in Academia Complutensi, 1522. [The printing was completed in
1517, but papal permission to publish this book was delayed for several years.]
19. Aug. Giustiniani, Psalterium, Genoa, 1516.
20. Benedictus Arias Montanus, editor, Biblia Sacra Hebraice, Chaldaice, Græce, & Latine, Christoph.
Plantinus, Antwerp, 1569-1573.
21. Brianus Waltonus, editor, Biblia Sacra Polyglotta, Thomas Roycroft, London, 1657.
22. David Wolder, Biblia Sacra Græce, Latine & Germanice, Jacobus Lucius Juni., Hamburg, 1596.
23. Walter E. Houghton, Jr., ‘The History of Trades: its relation to seventeenth century thought,’
in Philip P. Wiener and Aaron Noland, eds., Roots of Scientific Thought, Basic Books, New York,
1957, pp. 354-381.
24. Joseph Moxon, Mechanick Exercises, J. Moxon, London, 1683. Reprinted by the Typothetae of
New York, 1896, with preface and notes by T. L. De Vinne; also reprinted by Oxford University
Press, London, 1958; but these reprints do not capture the full feeling of the original, with its
less sumptuous seventeenth-century workmanship. Quoted passages are from vol. 2, pp. 214-215,
226, 245, 248.
25. D. G. Berri, The Art of Printing, London, 1864.
26. Samuel A. Bartels, The Art of Spacing, The Inland Printer, Chicago, 1926.
27. G. P. Bafour, A. R. Blanchard, and F. H. Raymond, ‘Automatic Composing Machine,’ U.S. Patent
2762485, September 11, 1956. (See also British patent 771551 and French patent 1103000.)
28. G. Bafour, ‘A new method for text composition-The BBR System,’ Printing Technology 5,
no. 2, 1961, 65-75.
29. Joseph F. Ossanna, ‘NROFF/TROFF User’s Manual,’ Bell Telephone Laboratories Internal
memorandum, Murray Hill, New Jersey, 1975.
30. Paul E. Justus, ‘There is more to typesetting than setting type’, IEEE Trans. on Prof. Commun.
PC-15, 13-16, 18 (1972).
31. John Pierson, Computer Composition using PAGE-1, Wiley-Interscience, New York, 1972.
32. Information International, Inc., ‘PAGE-3 Composition Language,’ privately distributed. First
edition, October 31, 1975; second edition, October 20, 1976. The language is sometimes called
‘PAGE-III’ because of the company that created it.
33. P. I. Cooper, ‘The influence of program parameters on hyphenation frequency in a sophisticated
justification program,’ Advances in Computer Typesetting [Proceedings of the 1966 International
Computer Typesetting Conference], The Institute of Printing, London, 1967, 176-178, 211-212.
34. [Untitled] Moderators’ summaries of the papers presented at the International Computer
Typesetting Conference at the University of Sussex, The Institute of Printing, London, 1966.
35. Alison M. Pringle, ‘Justification with fewer hyphens,’ Rainbow Memo 170, University of
Cambridge Computer Laboratory, March 1980.
36. Hanan Samet, ‘Heuristics for the line division problem in computer justified text,’ preprint,
University of Maryland, 1980.
37. H. D. Parks, ‘Computerized processing of editorial copy’, Advances in Computer Typesetting
[Proceedings of the 1966 International Computer Typesetting Conference], The Institute of Printing,
London, 1967, 176-178, 211-212.
38. Herman Parks, contributions to the discussions, Proc. ASIS Workshop on Computer Composition,
American Society for Information Science, 1971, pp. 143-145, 151, 180-182.
39. Bruce Rogers, Paragraphs on Printing, William E. Rudge’s Sons, New York, 1943, p. 88.
40. U.S. Government Printing Office, Style Manual, Washington, D.C., 1973. The quote is from rule 22
(catch?).
41. Michael J. Clancy and Donald E. Knuth, ‘A programming and problem-solving seminar,’ report
STAN-CS-77-606, Computer Science Department, Stanford University, April 1977, 85-88.
42. Michael F. Plass, ‘Optimal pagination techniques for automatic typesetting systems,’ Ph.D. thesis,
Stanford University, June 1981.
Structured Programming with go to Statements

DONALD E. KNUTH
Stanford University, Stanford, California 94305

A consideration of several different examples sheds new light on the problem of creating
reliable, well-structured programs that behave efficiently. This study focuses
largely on two issues: (a) improved syntax for iterations and error exits, making it
possible to write a larger class of programs clearly and efficiently without go to
statements; (b) a methodology of program design, beginning with readable and correct,
but possibly inefficient programs that are systematically transformed if necessary into
efficient and correct, but possibly less readable code. The discussion brings out
opposing points of view about whether or not go to statements should be abolished;
some merit is found on both sides of this question. Finally, an attempt is made to
define the true nature of structured programming, and to recommend fruitful
directions for further study.
Keywords and phrases: structured programming, go to statements, language design,
event indicators, recursion, Boolean variables, iteration, optimization of programs,
program transformations, program manipulation systems, searching, Quicksort,
efficiency
CR categories: 4.0, 4.10, 4.20, 5.20, 5.5, 6.1 (5.23, 5.24, 5.25, 5.27)

You may go when you will go,
And I will stay behind.
--Edna St. Vincent Millay [66]
Most likely you go your way and I'll go mine.
--Song title by Bob Dylan [33]
Do you suffer from painful elimination?
--Advertisement, J. B. Williams Co.

This research was supported in part by the National Science Foundation under grant number
GJ 36473X, and by IBM Corporation.

Copyright © 1974, Association for Computing Machinery, Inc. General permission to republish,
but not for profit, all or part of this material is granted, provided that ACM's copyright notice is
given and that reference is made to this publication, to its date of issue, and to the fact that
reprinting privileges were granted by permission of the Association for Computing Machinery.

Computing Surveys, Vol. 6, No. 4, December 1974

CONTENTS

INTRODUCTION
1. ELIMINATION OF go to STATEMENTS
    Historical Background
    A Searching Example
    Efficiency
    Error Exits
    Subscript Checking
    Hash Coding
    Text Scanning
    A Confession
    Tree Searching
    Systematic Elimination
    Event Indicators
    Comparison of Features
    Simple Iterations
2. INTRODUCTION OF go to STATEMENTS
    Recursion Elimination
    Program Manipulation Systems
    Recursion vs. Iteration
    Boolean Variable Elimination
    Coroutines
    Quicksort: A Digression
    Axiomatics of Jumps
    Reduction of Complication
3. CONCLUSIONS
    Structured Programming
    With go to Statements
    Efficiency
    The Future
ACKNOWLEDGMENTS
APPENDIX
BIBLIOGRAPHY

INTRODUCTION

A revolution is taking place in the way we write programs and teach programming,
because we are beginning to understand the associated mental processes more deeply. It
is impossible to read the recent book Structured programming [17; 55] without having it
change your life. The reasons for this revolution and its future prospects have been aptly
described by E. W. Dijkstra in his 1972 Turing Award Lecture, "The Humble
Programmer" [27].
As we experience this revolution, each of us naturally is developing strong feelings one
way or the other, as we agree or disagree with the revolutionary leaders. I must admit
to being a non-humble programmer, egotistical enough to believe that my own opinions
of the current trends are not a waste of the reader's time. Therefore I want to express in
this article several of the things that struck me most forcefully as I have been thinking
about structured programming during the last year; several of my blind spots were
removed as I was learning these things, and I hope I can convey some of my excitement to
the reader. Hardly any of the ideas I will discuss are my own; they are nearly all the
work of others, but perhaps I may be presenting them in a new light. I write this
article in the first person to emphasize the fact that what I'm saying is just one man's
opinion; I don't expect to persuade everyone that my present views are correct.
Before beginning a more technical discussion, I should confess that the title of this
article was chosen primarily to generate attention. There are doubtless some readers
who are convinced that abolition of go to statements is merely a fad, and they may see
this title and think, "Aha! Knuth is rehabilitating the go to statement, and we can go
back to our old ways of programming again." Another class of readers will see the
heretical title and think, "When are die-hards like Knuth going to get with it?" I
hope that both classes of people will read on and discover that what I am really doing is
striving for a reasonably well balanced viewpoint about the proper role of go to
statements. I argue for the elimination of go to's in certain cases, and for their
introduction in others.
I believe that by presenting such a view I am not in fact disagreeing sharply with
Dijkstra's ideas, since he recently wrote the following: "Please don't fall into the trap of
believing that I am terribly dogmatical about [the go to statement]. I have the
uncomfortable feeling that others are making a religion out of it, as if the conceptual
problems of programming could be solved by a single trick, by a simple form of coding
discipline!" [29]. In other words, it seems that fanatical advocates of the New
Programming are going overboard in their strict enforcement of morality and purity in
programs. Sooner or later people are going to find that their beautifully-structured

programs are running at only half the speed--or worse--of the dirty old programs they
used to write, and they will mistakenly blame the structure instead of recognizing what is
probably the real culprit--the system overhead caused by typical compiler implementation
of Boolean variables and procedure calls. Then we'll have an unfortunate
counter-revolution, something like the current rejection of the "New Mathematics" in
reaction to its over-zealous reforms.
It may be helpful to consider a further analogy with mathematics. In 1904, Bertrand
Russell published his famous paradox about the set of all sets which aren't members
of themselves. This antinomy shook the foundations of classical mathematical reasoning,
since it apparently brought very simple and ordinary deductive methods into
question. The ensuing crisis led to the rise of "intuitionist logic", a school of thought
championed especially by the Dutch mathematician, L. E. J. Brouwer; intuitionism
abandoned all deductions that were based on questionable nonconstructive ideas. For a
while it appeared that intuitionist logic would cause a revolution in mathematics.
But the new approach angered David Hilbert, who was perhaps the leading mathematician
of the time; Hilbert said that "Forbidding a mathematician to make use of the
principle of the excluded middle is like forbidding an astronomer his telescope or a
boxer the use of his fists." He characterized the intuitionist approach as seeking "to
save mathematics by throwing overboard all that is troublesome.... They would chop
up and mangle the science. If we would follow such a reform as they suggest, we
could run the risk of losing a great part of our most valuable treasures" [80, pp. 98-99,
148-150, 154-157, 184-185, 268-270].
Something a little like this is happening in computer science. In the late 1960's we
witnessed a "software crisis", which many people thought was paradoxical because
programming was supposed to be so easy. As a result of the crisis, people are now
beginning to renounce every feature of programming that can be considered guilty by
virtue of its association with difficulties. Not only go to statements are being questioned;
we also hear complaints about floating-point calculations, global variables, semaphores,
pointer variables, and even assignment statements. Soon we might be restricted to
only a dozen or so programs that are sufficiently simple to be allowable; then we will
be almost certain that these programs cannot lead us into any trouble, but of
course we won't be able to solve many problems.
In the mathematical case, we know what happened: The intuitionists taught the other
mathematicians a great deal about deductive methods, while the other mathematicians
cleaned up the classical methods and eventually "won" the battle. And a revolution
did, in fact, take place. In the computer science case, I imagine that a similar thing
will eventually happen: purists will point the way to clean constructions, and others will
find ways to purify their use of floating-point arithmetic, pointer variables, assignments,
etc., so that these classical tools can be used with comparative safety.
Of course all analogies break down, including this one, especially since I'm not yet
conceited enough to compare myself to David Hilbert. But I think it's an amusing
coincidence that the present programming revolution is being led by another Dutchman
(although he doesn't have extremist views corresponding to Brouwer's); and I do
consider assignment statements and pointer variables to be among computer science's
"most valuable treasures!"
At the present time I think we are on the verge of discovering at last what programming
languages should really be like. I look forward to seeing many responsible experiments
with language design during the next few years; and my dream is that by 1984 we
will see a consensus developing for a really good programming language (or, more likely,
a coherent family of languages). Furthermore, I'm guessing that people will become
so disenchanted with the languages they are now using--even COBOL and FORTRAN--
that this new language, UTOPIA 84, will have a chance to take over. At present we are far
from that goal, yet there are indications that such a language is very slowly taking
shape.

Will UTOPIA 84, or perhaps we should call it NEWSPEAK, contain go to statements? At
the moment, unfortunately, there isn't even a consensus about this apparently trivial
issue, and we had better not be hung up on the question too much longer since there are
only ten years left.
I will try in what follows to give a reasonably comprehensive survey of the go to
controversy, arguing both pro and con, without taking a strong stand one way or the
other until the discussion is nearly complete. In order to illustrate different uses of go to
statements, I will discuss many example programs, some of which tend to negate the
conclusions we might draw from the others. There are two reasons why I have chosen to
present the material in this apparently vacillating manner. First, since I have the
opportunity to choose all the examples, I don't think it's fair to load the dice by
selecting only program fragments which favor one side of the argument. Second, and perhaps
most important, I tried this approach when I lectured on the subject at UCLA in
February, 1974, and it worked beautifully: nearly everybody in the audience had the
illusion that I was largely supporting his or her views, regardless of what those views
were!

1. ELIMINATION OF go to STATEMENTS

Historical Background

At the IFIP Congress in 1971 I had the pleasure of meeting Dr. Eiichi Goto of
Japan, who cheerfully complained that he was always being eliminated. Here is the
history of the subject, as far as I have been able to trace it.
The first programmer who systematically began to avoid all labels and go to
statements was perhaps D. V. Schorre, then of UCLA. He has written the following account
of his early experiences [85]:

    Since the summer of 1960, I have been writing programs in outline form, using
    conventions of indentation to indicate the flow of control. I have never found it
    necessary to take exception to these conventions by using go statements. I used to
    keep these outlines as original documentation of a program, instead of using
    flow charts... Then I would code the program in assembly language from the outline.
    Everyone liked these outlines better than the flow charts I had drawn before, which
    were not very neat--my flow charts had been nick-named "balloon-o-grams".

He reported that this method made programs easier to plan, to modify and to check out.
When I met Schorre in 1963, he told me of his radical ideas, and I didn't believe they
would work. In fact, I suspected that it was really his rationalization for not finding an
easy way to put labels and go to statements into his META-II subset of ALGOL [84], a
language which I liked very much except for this omission. In 1964 I challenged him to
write a program for the eight-queens problem without using go to statements, and he
responded with a program using recursive procedures and Boolean variables, very much
like the program later published independently by Wirth [96].
I was still not convinced that all go to statements could or should be done away
with, although I fully subscribed to Peter Naur's observations which had appeared
about the same time [73]. Since Naur's comments were the first published remarks
about harmful go to's, it is instructive to quote some of them here:

    If you look carefully you will find that surprisingly often a go to statement which
    looks back really is a concealed for statement. And you will be pleased to find how
    the clarity of the algorithm improves when you insert the for clause where it
    belongs. ... If the purpose [of a programming course] is to teach ALGOL
    programming, the use of flow diagrams will do more harm than good, in my opinion.

The next year we find George Forsythe also purging go to statements from
algorithms submitted to Communications of the ACM (cf. [53]). Incidentally, the second
example program at the end of the original ALGOL 60 report [72] contains four go to
statements, to labels named AA, BB, CC, and DD, so it is clear that the advantages of
ALGOL's control structures weren't fully perceived in 1960.
In 1965, Edsger Dijkstra published the following instructive remarks [21]:

    Two programming department managers from

    different countries and different backgrounds--the one mainly scientific, the other
    mainly commercial--have communicated to me, independently of each other and on
    their own initiative, their observation that the quality of their programmers was
    inversely proportional to the density of goto statements in their programs.... I have
    done various programming experiments ... in modified versions of ALGOL 60 in which
    the goto statement was abolished.... The latter versions were more difficult to make:
    we are so familiar with the jump order that it requires some effort to forget it! In
    all cases tried, however, the program without the goto statement turned out to be
    shorter and more lucid.

A few months later, at the ACM Programming Languages and Pragmatics Conference,
Peter Landin put it this way [59]:

    There is a game sometimes played with ALGOL 60 programs--rewriting them so as to
    avoid using go to statements. It is part of a more embracing game--reducing the
    extent to which the program conveys its information by explicit sequencing.... The
    game's significance lies in that it frequently produces a more "transparent"
    program--easier to understand, debug, modify, and incorporate into a larger program.

Peter Naur reinforced this opinion at the same meeting [74, p. 179].
The next chapter in the story is what many people regard as the first, because it made the
most waves. Dijkstra submitted a short article to Communications of the ACM,
devoted entirely to a discussion of go to statements. In order to speed publication, the
editor decided to publish Dijkstra's article as a letter, and to supply a new title, "Go to
statement considered harmful". This note [23] rapidly became well-known; it expressed
Dijkstra's conviction that go to's "should be abolished from all 'higher level'
programming languages (i.e., everything except, perhaps, plain machine code).... The go to
statement as it stands is just too primitive; it is too much an invitation to make a mess of
one's program." He encouraged looking for alternative constructions which may be
necessary to satisfy all needs. Dijkstra also recalled that Heinz Zemanek had expressed
doubts about go to statements as early as 1959; and that Peter Landin, Christopher
Strachey, C. A. R. Hoare and others had been of some influence on his thinking.
By 1967, the entire XPL compiler had been written by McKeeman, Horning, and
Wortman, using go to only once ([65], pp. 365-458; the go to is on page 385). In 1971,
Christopher Strachey [87] reported that "It is my aim to write programs with no labels.
I am doing quite well. I have got the operating system down to 5 labels and I am
planning to write a compiler with no labels at all." In 1972, an entire session of the ACM
National Conference was devoted to the subject [44; 60; 100]. The December, 1973,
issue of Datamation featured five articles about structured programming and
elimination of go to's [3; 13; 32; 64; 67]. Thus, it is clear that sentiments against go to
statements have been building up. In fact, the discussion has apparently caused some
people to feel threatened; Dijkstra once told me that he actually received "a torrent of
abusive letters" after publication of his article.
The tide of opinion first hit me personally in 1969, when I was teaching an introductory
programming course for the first time. I remember feeling frustrated on several
occasions, at not seeing how to write programs in the new style; I would run to Bob
Floyd's office asking for help, and he usually showed me what to do. This was the genesis
of our article [52] in which we presented two types of programs which did not submit
gracefully to the new prohibition. We found that there was no way to implement certain
simple constructions with while and conditional statements substituted for go to's,
unless extra computation was specified.
During the last few years several languages have appeared in which the designers
proudly announced that they have abolished the go to statement. Perhaps the most
prominent of these is BLISS [98], which originally replaced go to's by eight so-called
"escape" statements. And the eight weren't even enough; the authors wrote, "Our
mistake was in assuming that there is no need for a label once the go to is removed,"
and they later [99, 100] added a new statement "leave ⟨label⟩ with ⟨expression⟩"
which goes to the place after the statement identified by the ⟨label⟩. Other go to-less
languages for systems programming have


similarly introduced other statements which more computation and aren't really more
provide "equally powerful" alternative ways perspicuous. Therefore, this example has
to jump. been widely quoted in defense of the go to
In other words, it seems that there is wide- statement, and it is appropriate to scrutinize
spread agreement that go to statements are the problem carefully.
harmful, yet programmers and language Let's suppose that we've been forbidden
designers still feel the need for some euphe- to use go to statements, and that we want
mism that "goes to" without saying go to. to do precisely the computation specified in
Example 1 (using the obvious expansion of
A Searching Example such a for statement into assignments and
What are the reasons for this? In [52], Floyd a while iteration). If this means not only
and I gave the following example of a typical that we want the same results, but also that
program for which the ordinary capabilities we want to do the same operations in the
of while and if statements are inadequate. same order, the mission is impossible. But if
Let's suppose that we want to search a table we are allowed to weaken the conditions
A[1] . . . A[m] of distinct values, in order to just slightly, so that a relation can be tested
find where a given value x appears; if x is not twice in succession (assuming that it will
present in the table, we want to insert it as yield the same result each time, i.e., that it
an additional entry. Let's suppose further has no side-effects), we can solve the problem
that there is another array B, where B[,] as follows:
equals the number of times we have searched
Example la:
for the value A[i]. We might solve such a
problem as follows: i:=1;
w h i l e i < m a n d A[i] # x d o i :-- i + 1 ;
E x a m p l e 1: i f i > m t h e n ra : = i; A[i] : = x; B[i] ::= 0 fi;
B[i] : = B [ i ] + I ;
for i : = 1 s t e p 1 u n t i l m d o .
i f A[i] = x t h e n go t o f o u n d fi; The a n d operation used here stands for
n o t f o u n d : i : = r e + l ; m : = i; McCarthy's sequential conjunction operator
A[i] : = x; B[i] : = 0;
f o u n d : B[i] : = B [ i ] + I ; [62, p. 185]; i.e., "p a n d q" means "if p
t h e n q else false fl", so that q is not evalu-
(In the present article I shall use an ad hoc ated when p is false. Example la will do
programming language that is very similar exactly the same sequence of computations
to ALGOL60, with one exception: the symbol as Example 1, except for one extra compari-
fi is required as a closing bracket for all i f son of i with m (and occasionally one less
statements, so that begin and end aren't computation of m + 1). If the iteration in this
needed between t h e n and else. I don't while loop is performed a large number of
really like the looks of fi at the moment; but times, the extra comparison has a negligible
it is short, performs a useful function, and effect on the running time.
connotes finality, so I'm confidently hoping Thus, we can live without the go to in
that I'll get used to it. Alan Perlis has re- Example 1. But Example la is slightly less
marked that tl is a perfect example of a readable, in my opinion, as well as slightly
cryptic notation that can make program- slower; so it isn't clear what we have gained.
ming unnecessarily complicated for begin- Furthermore, if we had made Example 1
ners; yet I'm more comfortable with fi every more complicated, the trick of going to Ex-
time I write it. I still balk at spelling other ample la would no longer work. For ex-
basic symbols backwards, and so do most of ample, suppose we had inserted another
the people I know; a student's paper con- statement into the for loop, just before the
taining the code fragment "esae; c o m m e n t i f clause; then the relations i _< m and
bletch t n e m m o c ; " is a typical reaction to A[i] -- x wouldn't have been tested consecu-
this trend !) tively, and we couldn't in general have com-
There are ways to express Example 1 bined them with and.
without go to statements, but they require John Cooke told me an instructive story
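Editorial sketch (not part of the original article): Example 1a carries over
almost directly to a language whose `and` short-circuits, as Python's does.
The function name `search_or_insert` and the use of 0-based Python lists,
with `len(A)` standing in for m, are assumptions made only for this
illustration.

```python
def search_or_insert(A, B, x):
    """Find x in table A, inserting it if absent; bump its search count in B.

    Hypothetical helper: A and B are parallel 0-based lists, and len(A)
    plays the role of m. Returns the index at which x is stored.
    """
    i = 0
    # Short-circuit `and`: A[i] is never evaluated once i == len(A),
    # which is the behavior of the sequential conjunction in Example 1a.
    while i < len(A) and A[i] != x:
        i += 1
    if i == len(A):      # x was not present, so add a new entry
        A.append(x)
        B.append(0)
    B[i] += 1            # count this search
    return i
```

As in Example 1a, the bounds test is repeated once after the loop ends; that
repeated test is the price paid for removing the jump.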



Structured Programming with go to Statements • 267
relating to Example 1 and to the design of languages. Some PL/I programmers
were asked to do the stated search problem without using jumps, and they
came up with essentially the following two solutions:

    a)  DO I = 1 TO M WHILE A(I) ¬= X;
        END;
        IF I > M THEN
           DO; M = I; A(I) = X; B(I) = 0; END;
        B(I) = B(I) + 1;

    b)  FOUND = 0;
        DO I = 1 TO M WHILE FOUND = 0;
           IF A(I) = X THEN FOUND = 1;
        END;
        IF FOUND = 0 THEN
           DO; M = I; A(I) = X; B(I) = 0; END;
        B(I) = B(I) + 1;

Solution (a) is best, but since it involves a null iteration (with no
explicit statements being iterated) most people came up with Solution (b).
The instructive point is that Solution (b) doesn't work; there is a serious
bug which caused great puzzlement before the reason was found. Can the
reader spot the difficulty? (The answer appears on page 298.)

As I've said, Example 1 has often been used to defend the go to statement.
Unfortunately, however, the example is totally unconvincing in spite of the
arguments I've stated so far, because the method in Example 1 is almost
never a good way to search an array for x! The following modification to the
data structure makes the algorithm much better:

Example 2:

    A[m+1] := x; i := 1;
    while A[i] ≠ x do i := i+1;
    if i > m then m := i; B[i] := 1;
    else B[i] := B[i]+1 fi;

Example 2 beats Example 1 because it makes the inner loop considerably
faster. If we assume that the programs have been handcoded in assembly
language, so that the values of i, m, and x are kept in registers, and if we
let n be the final value of i at the end of the program, Example 1 will make
6n + 10 (+3 if not found) references to memory for data and instructions on
a typical computer, while the second program will make only 4n + 14 (+6 if
not found). If, on the other hand, we assume that these programs are
translated by a typical "90% efficient compiler" with bounds-checking
suppressed, the corresponding run-time figures are respectively about
14n + 5 and 11n + 21. (The appendix to this paper explains the ground rules
for these calculations.) Under the first assumption we save about 33% of the
run-time, and under the second assumption we save about 21%, so in both
cases the elimination of the go to has also eliminated some of the running
time.

Efficiency

The ratio of running times (about 6 to 4 in the first case when n is large)
is rather surprising to people who haven't studied program behavior
carefully. Example 2 doesn't look that much more efficient, but it is.
Experience has shown (see [46], [51]) that most of the running time in
non-IO-bound programs is concentrated in about 3% of the source text. We
often see a short inner loop whose speed governs the overall program speed
to a remarkable degree; speeding up the inner loop by 10% speeds up
everything by almost 10%. And if the inner loop has 10 instructions, a
moment's thought will usually cut it to 9 or fewer.

My own programming style has of course changed during the last decade,
according to the trends of the times (e.g., I'm not quite so tricky anymore,
and I use fewer go to's), but the major change in my style has been due to
this inner loop phenomenon. I now look with an extremely jaundiced eye at
every operation in a critical inner loop, seeking to modify my program and
data structure (as in the change from Example 1 to Example 2) so that some
of the operations can be eliminated. The reasons for this approach are that:
a) it doesn't take long, since the inner loop is short; b) the payoff is
real; and c) I can then afford to be less efficient in the other parts of my
programs, which therefore are more readable and more easily written and
debugged. Tools are being developed to make this critical-loop
identification job easy (see for example [46] and [82]).

Thus, if I hadn't seen how to remove one of the operations from the loop in
Example 1
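Editorial sketch (not part of the original article): the sentinel idea of
Example 2, rendered in Python. The name `sentinel_search` is hypothetical,
and the lists are 0-based with `len(A)` playing the role of m. Because a
Python list has no separate logical length, the sketch removes the sentinel
again when x was already present; Example 2 itself simply leaves A[m+1]
in place.

```python
def sentinel_search(A, B, x):
    """Search for x by first storing it just past the end of the table.

    Hypothetical helper using 0-based lists. Returns the index of x.
    """
    m = len(A)
    A.append(x)          # sentinel: guarantees that the loop terminates
    i = 0
    while A[i] != x:     # only one test per iteration, no bounds check
        i += 1
    if i == m:           # stopped at the sentinel, so x is a new entry
        B.append(1)
    else:                # x was already in the table: drop the sentinel
        A.pop()
        B[i] += 1
    return i
```

The loop body and its test are all that run n times, which is where the
saving over Example 1 comes from.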



268 • Donald E. Knuth

by changing to Example 2, I would probably (at least) have made the for loop
run from m to 1 instead of from 1 to m, since it's usually easier to test
for zero than to compare with m. And if Example 2 were really critical, I
would improve on it still more by "doubling it up" so that the machine code
would be essentially as follows.

Example 2a:

    A[m+1] := x; i := 1; go to test;
    loop: i := i+2;
    test: if A[i] = x then go to found fi;
        if A[i+1] ≠ x then go to loop fi;
        i := i+1;
    found: if i > m then m := i; B[i] := 1;
        else B[i] := B[i]+1 fi;

Here the loop variable i increases by 2 on each iteration, so we need to do
that operation only half as often as before; the rest of the code in the
loop has essentially been duplicated to make this work. The running time has
now been reduced to about 3.5n + 14.5 or 8.5n + 23.5 under our respective
assumptions--again this is a noticeable saving in the overall running speed,
if, say, the average value of n is about 20, and if this search routine is
performed a million or so times in the overall program. Such
loop-optimizations are not difficult to learn and, as I have said, they are
appropriate in just a small part of a program, yet they very often yield
substantial savings. (Of course if we want to improve on Example 2a still
more, especially for large m, we'll use a more sophisticated search
technique; but let's ignore that issue, at the moment, since I want to
illustrate loop optimization in general, not searching in particular.)

The improvement in speed from Example 2 to Example 2a is only about 12%, and
many people would pronounce that insignificant. The conventional wisdom
shared by many of today's software engineers calls for ignoring efficiency
in the small; but I believe this is simply an overreaction to the abuses
they see being practiced by penny-wise-and-pound-foolish programmers, who
can't debug or maintain their "optimized" programs. In established
engineering disciplines a 12% improvement, easily obtained, is never
considered marginal; and I believe the same viewpoint should prevail in
software engineering. Of course I wouldn't bother making such optimizations
on a one-shot job, but when it's a question of preparing quality programs, I
don't want to restrict myself to tools that deny me such efficiencies.

There is no doubt that the grail of efficiency leads to abuse. Programmers
waste enormous amounts of time thinking about, or worrying about, the speed
of noncritical parts of their programs, and these attempts at efficiency
actually have a strong negative impact when debugging and maintenance are
considered. We should forget about small efficiencies, say about 97% of the
time: premature optimization is the root of all evil.

Yet we should not pass up our opportunities in that critical 3%. A good
programmer will not be lulled into complacency by such reasoning, he will be
wise to look carefully at the critical code; but only after that code has
been identified. It is often a mistake to make a priori judgments about what
parts of a program are really critical, since the universal experience of
programmers who have been using measurement tools has been that their
intuitive guesses fail. After working with such tools for seven years, I've
become convinced that all compilers written from now on should be designed
to provide all programmers with feedback indicating what parts of their
programs are costing the most; indeed, this feedback should be supplied
automatically unless it has been specifically turned off.

After a programmer knows which parts of his routines are really important, a
transformation like doubling up of loops will be worthwhile. Note that this
transformation introduces go to statements--and so do several other loop
optimizations; I will return to this point later. Meanwhile I have to admit
that the presence of go to statements in Example 2a has a negative as well
as a positive effect on efficiency; a non-optimizing compiler will tend to
produce awkward code, since the contents of registers can't be assumed known
when a label is passed. When I computed the running times cited above by
looking at a typical compiler's
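Editorial sketch (not part of the original article): the "doubling up" of
Example 2a, rendered in Python, together with the arithmetic behind the
"about 12%" figure under the hand-coded assumption. Python has no go to, so
the two labels become `break` statements in an endless loop; the names
`doubled_search` and `saving_2_to_2a` are hypothetical, the list is 0-based,
and the sentinel is removed before returning.

```python
def doubled_search(A, x):
    """Return the index of x in A, or len(A) if x is absent.

    Hypothetical helper: two table entries are examined per pass, so the
    loop variable is advanced only half as often as in Example 2.
    """
    A.append(x)              # sentinel, as in Example 2
    i = 0
    while True:
        if A[i] == x:        # "test:" in Example 2a
            break
        if A[i + 1] == x:    # safe: A[i] != x implies i is before the sentinel
            i += 1
            break
        i += 2               # "loop:" in Example 2a
    A.pop()                  # restore the table
    return i


def saving_2_to_2a(n):
    """Relative saving of 3.5n + 14.5 over 4n + 14 memory references."""
    return 1 - (3.5 * n + 14.5) / (4 * n + 14)
```

For n = 20 the saving is roughly 10%, and it approaches 12.5% as n grows,
which is consistent with the figure quoted in the text.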



Structured Programming with go to Statements • 269

output for this example, I found that the improvement in performance was not
quite as much as I had expected.

Error Exits

For simplicity I have avoided a very important issue in the previous
examples, but it must now be faced. All of the programs we have considered
exhibit bad programming practice, since they fail to make the necessary
check that m has not gone out of range. In each case before we perform
"m := i" we should precede that operation by a test such as

    if m = max then go to memory overflow;

where max is an appropriate threshold value. I left this statement out of
the examples since it would have been distracting, but we need to look at it
now since it is another important class of go to statements: an error exit.
Such checks on the validity of data are very important, especially in
software, and it seems to be the one class of go to's that still is
considered ugly but necessary by today's leading reformers. (I wonder how
Val Schorre has managed to avoid such go to's during all these years.)

Sometimes it is necessary to exit from several levels of control, cutting
across code that may even have been written by other programmers; and the
most graceful way to do this is a direct approach with a go to or its
equivalent. Then the intermediate levels of the program can be written under
the assumption that nothing will go wrong.

I will return to the subject of error exits later.

Subscript Checking

In the particular examples given above we can, of course, avoid testing m
vs. max if we have dynamic range-checking on all subscripts of A. But this
usually aborts the program, giving us little or no control over the error
recovery; so we probably want to test m anyway. And ouch, what subscript
checking does to the inner loop execution times! In Example 2, I will
certainly want to suppress range-checking in the while clause since its
subscript can't be out of range unless A[m+1] was already invalid in the
previous line. Similarly, in Example 1 there can be no range error in the
for loop unless a range error occurred earlier. It seems senseless to have
expensive range checks in those parts of my programs that I know are clean.

In this respect I should mention Hoare's almost persuasive arguments to the
contrary [40, p. 18]. He points out quite correctly that the current
practice of compiling subscript range checks into the machine code while a
program is being tested, then suppressing the checks during production runs,
is like a sailor who wears his life preserver while training on land but
leaves it behind when he sails! On the other hand, that sailor isn't so
foolish if life vests are extremely expensive, and if he is such an
excellent swimmer that the chance of needing one is quite small compared
with the other risks he is taking. In the foregoing examples we typically
are much more certain that the subscripts will be in range than that other
aspects of our overall program will work correctly. John Cocke observes that
time-consuming range checks can be avoided by a smart compiler which first
compiles the checks into the program then moves them out of the loop. Wirth
[94] and Hoare [39] have pointed out that a well-designed for statement can
permit even a rather simple-minded compiler to avoid most range checks
within loops.

I believe that range checking should be used far more often than it
currently is, but not everywhere. On the other hand I am really assuming
infallible hardware when I say this; surely I wouldn't want to remove the
parity check mechanism from the hardware, even under a hypothetical
assumption that it was slowing down the computation. Additional memory
protection is necessary to prevent my program from harming someone else's,
and theirs from clobbering mine. My arguments are directed towards
compiled-in tests, not towards the hardware mechanisms which are really
needed to ensure reliability.

Hash Coding

Now let's move on to another example, based on a standard hashing technique
but other-
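Editorial sketch (not part of the original article): in languages with
exceptions, an error exit that cuts across several levels of control is
usually written by raising an exception, which can be read as one form of
the "go to or its equivalent" mentioned under Error Exits. All names here
(`TableOverflow`, `MAX_ENTRIES`, `insert_new`, `record_search`, `process`)
are hypothetical, and the threshold value is an arbitrary assumption.

```python
class TableOverflow(Exception):
    """Hypothetical stand-in for the 'memory overflow' error exit."""


MAX_ENTRIES = 4  # assumed threshold, playing the role of max


def insert_new(A, B, x):
    # "if m = max then go to memory overflow" becomes a raise.
    if len(A) == MAX_ENTRIES:
        raise TableOverflow(f"table full, cannot insert {x!r}")
    A.append(x)
    B.append(0)


def record_search(A, B, x):
    # Intermediate level: written as if nothing can go wrong.
    if x not in A:
        insert_new(A, B, x)
    B[A.index(x)] += 1


def process(values):
    A, B = [], []
    try:
        for v in values:
            record_search(A, B, v)
    except TableOverflow:
        return A, B, False   # the error exit lands here, two levels up
    return A, B, True
```

The intermediate routine `record_search` contains no overflow handling at
all, which is the property the text asks of the intermediate levels.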



270 • Donald E. Knuth
wise designed for the same application as the above. Here h(x) is a hash
function which takes on values between 1 and m; and x ≠ 0. In this case m is
somewhat larger than the number of items in the table, and "empty" positions
are represented by 0.

Example 3:

    i := h(x);
    while A[i] ≠ 0 do
        begin if A[i] = x then go to found fi;
        i := i-1; if i = 0 then i := m fi;
        end;
    not found: A[i] := x; B[i] := 0;
    found: B[i] := B[i]+1;

If we analyze this as we did Example 1, we see that the trick which led to
Example 2 doesn't work any more. Yet if we want to eliminate the go to we
can apply the idea of Example 1a by writing

    while A[i] ≠ 0 and A[i] ≠ x do . . .

and by testing afterwards which condition caused termination. This version
is perhaps a little bit easier to read; unfortunately it makes a redundant
test, which we would like to avoid if we were in a critical part of the
program.

Why should I worry about the redundant test in this case? After all, the
extra test whether A[i] was = 0 or = x is being made outside of the while
loop, and I said before that we should generally confine our optimizations
to inner loops. Here, the reason is that this while loop won't usually be a
loop at all; with a proper choice of h and m, the operation i := i - 1 will
tend to be executed very infrequently, often less than once per search on
the average [54, Section 6.4]. Thus, the entire program of Example 3, except
perhaps for the line labeled "not found", must be considered as part of the
inner loop, if this search process is a dominant part of the overall program
(as it often is). The redundant test will therefore be significant in this
case.

Despite this concern with efficiency, I should actually have written the
first draft of Example 3 without that go to statement, probably even using a
while clause written in an extended language, such as

    while A[i] ∉ {0, x} do . . .

since this formulation abstracts the real meaning of what is happening.
Someday there may be hardware capable of testing membership in small sets
more efficiently than if we program the tests sequentially, so that such a
program would lead to better code than Example 3. And there is a much more
important reason for preferring this form of the while clause: it reflects a
symmetry between 0 and x that is not present in Example 3. For example, in
most software applications it turns out that the condition A[i] = x
terminates the loop far more frequently than A[i] = 0; with this knowledge,
my second draft of the program would be the following.

Example 3a:

    i := h(x);
    while A[i] ≠ x do
        begin if A[i] = 0
            then A[i] := x; B[i] := 0;
                go to found;
            fi;
        i := i-1; if i = 0 then i := m fi;
        end;
    found: B[i] := B[i]+1;

This program is easy to derive from the go to-less form, but not from
Example 3; and it is better than Example 3. So, again we see the advantage
of delaying optimizations until we have obtained more knowledge of a
program's behavior.

It is instructive to consider Example 3a further, assuming now that the
while loop is performed many times per search. Although this should not
happen in most applications of hashing, there are other programs in which a
loop of the above form is present, so it is worth examining what we should
do in such circumstances. If the while loop becomes an inner loop affecting
the overall program speed, the whole picture changes; that redundant test
outside the loop becomes utterly negligible, but the test "if i = 0"
suddenly looms large. We generally want to avoid testing conditions that are
almost always false, inside a critical loop. Therefore, under these new
assumptions I would change the data structure by adding a new element
A[0] = 0 to the array and eliminating the test for i = 0 as follows.
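Editorial sketch (not part of the original article): a Python rendering of
the probing loop of Example 3a, in which the test A[i] = x (the more
frequent way out) guards the loop and the empty-slot case is handled inside
it. The name `hash_search` and the hash function passed as `h` are
hypothetical. To mirror the article's indexing, the lists have m + 1 slots
with slot 0 unused, 0 marks an empty position, and the sketch assumes x ≠ 0
and at least one empty position.

```python
def hash_search(A, B, x, h):
    """Search for x by linear probing, inserting it if an empty slot is hit.

    Hypothetical helper: A and B are lists of length m + 1 (slot 0 unused),
    and h maps x to a value between 1 and m. Returns the slot holding x.
    """
    m = len(A) - 1
    i = h(x)
    while A[i] != x:
        if A[i] == 0:        # empty position: x is new
            A[i] = x
            B[i] = 0
            break            # plays the role of "go to found"
        i -= 1               # probe the next lower position
        if i == 0:
            i = m            # wrap around to the top of the table
    B[i] += 1                # "found:" in Example 3a
    return i
```

A usage example under the assumed hash function `lambda x: x % 5 + 1` with
m = 5: inserting 7 fills slot 3, and inserting 12 (which also hashes to 3)
probes downward and fills slot 2.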



Structured Programming with go to Statements • 271

Example 3b:

    i := h(x);
    while A[i] ≠ x do
        if A[i] ≠ 0
        then i := i - 1
        else if i = 0
            then i := m;
            else A[i] := x; B[i] := 0;
                go to found;
            fi;
        fi;
    found: B[i] := B[i]+1;

The loop now is noticeably faster. Again, I would be unhappy with slow
subscript range checks if this loop were critical. Incidentally, Example 3b
was derived from Example 3a, and a rather different program would have
emerged if the same idea had been applied to Example 3; then a test
"if i = 0" would have been inserted outside the loop, at label "not found",
and another go to would have been introduced by the optimization process.

As in the first examples, the program in Example 3 is flawed in failing to
test for memory overflow. I should have done this, for example by keeping a
count, n, of how many items are nonzero. The "not found" routine should then
begin with something like "n := n+1; if n = m then go to memory overflow".

Text Scanning

The first time I consciously applied the top-down structured programming
methodology to a reasonably complex job was in the late summer of 1972, when
I wrote a program to prepare the index to my book Sorting and Searching
[54]. I was quite pleased with the way that program turned out (there was
only one serious bug), but I did use one go to statement. In this case the
reason was somewhat different, having nothing to do with exiting from loops;
I was exiting, in fact, from an if-then-else construction.

The following example is a simplified version of the situation I
encountered. Suppose we are processing a stream of text, and that we want to
read and print the next character from the input; however, if that character
is a slash ("/") we want to "tabulate" instead (i.e., to advance in the
output to the next tab-stop position on the current line); however, two
consecutive slashes means a "carriage return" (i.e., to advance in the
output to the beginning of the next line). After printing a period (".") we
also want to insert an additional space in the output. The following code
clearly does the trick.

Example 4:

    x := read char;
    if x = slash
    then x := read char;
        if x = slash
        then return the carriage;
            go to char processed;
        else tabulate;
        fi;
    fi;
    write char (x);
    if x = period then write char (space) fi;
    char processed:

An abstract program with similar characteristics has been studied by
Peterson et al. [77; Fig. 1(a)]. In practice we occasionally run into
situations where a sequence of decisions is made via nested if-then-else's,
and then two or more of the branches merge into one. We can manage such
decision-table tasks without go to's by copying the common code into each
place, or by defining it as a procedure, but this does not seem conceptually
simpler than to make go to a common part of the program in such cases. Thus
in Example 4 I could avoid the go to by copying "write char (x); if
x = period then write char (space) fi" into the program after "tabulate;"
and by making corresponding changes. But this would be a pointless waste of
energy just to eliminate a perfectly understandable go to statement: the
resulting program would actually be harder to maintain than the former,
since the action of printing a character now appears in two different
places. The alternative of declaring procedures avoids the latter problem,
but it is not especially attractive either. Still another alternative is:

Example 4a:

    x := read char;
    double slash := false;
    if x = slash
    then x := read char;
        if x = slash
        then double slash := true;
        else tabulate;
        fi;



272 • Donald E. Knuth

    fi;
    if double slash
    then return the carriage;
    else write char (x);
        if x = period then write char (space) fi;
    fi;

I claim that this is conceptually no simpler than Example 4; indeed, one can
argue that it is actually more difficult, because it makes the entire
routine aware of the "double slash" exception to the rules, instead of
dealing with it in one exceptional place.

A Confession

Before we go on to another example, I must admit what many readers already
suspect, namely, that I'm subject to substantial bias because I actually
have a vested interest in go to statements! The style for the series of
books I'm writing was set in the early 1960s, and it would be too difficult
for me to change it now; I present algorithms in my books using informal
English language descriptions, and go to or its equivalent is almost the
only control structure I have. Well, I rationalize this apparent anachronism
by arguing that: a) an informal English description seems advantageous
because many readers tell me they automatically read English, but skip over
formal code; b) when go to statements are used judiciously together with
comments stating nonobvious loop invariants, they are semantically
equivalent to while statements, except that indentation is missing to
indicate the structure; c) the algorithms are nearly always short, so that
accompanying flowcharts are able to illustrate the structure; d) I try to
present algorithms in a form that is most efficient for implementation, and
high-level structures often don't do this; e) many readers will get pleasure
from converting my semiformal algorithms into beautifully structured
programs in a formal programming language; and f) we are still learning much
about control structures, and I can't afford to wait for the final
consensus.

In spite of these rationalizations, I'm uncomfortable about the situation,
because I find others occasionally publishing examples of algorithms in "my"
style but without the important parenthesized comments and/or with
unrestrained use of go to statements. In addition, I also know of places
where I have myself used a complicated structure with excessively
unrestrained go to statements, especially the notorious Algorithm 2.3.3A for
multivariate polynomial addition [50]. The original program had at least
three bugs; exercise 2.3.3-14, "Give a formal proof (or disproof) of the
validity of Algorithm A", was therefore unexpectedly easy. Now in the second
edition, I believe that the revised algorithm is correct, but I still don't
know any good way to prove it; I've had to raise the difficulty rating of
exercise 2.3.3-14, and I hope someday to see the algorithm cleaned up
without loss of its efficiency.

My books emphasize efficiency because they deal with algorithms that are
used repeatedly as building blocks in a large variety of applications. It is
important to keep efficiency in its place, as mentioned above, but when
efficiency counts we should also know how to achieve it.

In order to make it possible to derive quantitative assessments of
efficiency, my books show how to analyze machine language programs; and
these programs are expressed in MIXAL, a symbolic assembly language that
explicitly corresponds one-for-one to machine language. This has its uses,
but there is a danger of placing too much stress on assembly code. Programs
in MIXAL are like programs in machine language, devoid of structure; or,
more precisely, it is difficult for our eyes to perceive the program
structure. Accompanying comments explain the program and relate it to the
global structure illustrated in flowcharts, but it is not so easy to
understand what is going on; and it is easy to make mistakes, partly because
we rely so much on comments which might possibly be inaccurate descriptions
of what the program really does. It is clearly better to write programs in a
language that reveals the control structure, even if we are intimately
conscious of the hardware at each step; and therefore I will be discussing a
structured assembly language called PL/MIX in the fifth volume of The Art of
Computer Programming. Such a language (analogous to Wirth's PL360 [95])
should really be supported by each manufacturer



Structured Programming with go to Statements • 273

for each machine in place of the old-fashioned structureless assemblers that
still proliferate.

On the other hand I'm not really unhappy that MIXAL programs appear in my
books, because I believe that MIXAL is a good example of a "quick and dirty
assembler", a genre of software which will always be useful in its proper
role. Such an assembler is characterized by language restrictions that make
simple one-pass assembly possible, and it has several noteworthy advantages
when we are first preparing programs for a new machine: a) it is a great
improvement over numeric machine code; b) its rules are easy to state; and
c) it can be implemented in an afternoon or so, thus getting an efficient
assembler working quickly on what may be very primitive equipment. So far I
have implemented six such assemblers, at different times in my life, for
machines or interpretive systems or microprocessors that had no existing
software of comparable utility; and in each case other constraints made it
impractical for me to take the extra time necessary to develop a good,
structured assembler. Thus I am sure that the concept of
quick-and-dirty-assembler is useful, and I'm glad to let MIXAL illustrate
what one is like. However, I also believe strongly that such languages
should never be improved to the point where they are too easy or too
pleasant to use; one must restrict their use to primitive facilities that
are easy to implement efficiently. I would never switch to a two-pass
process, or add complex pseudo-operations, macro-facilities, or even fancy
error diagnostics to such a language, nor would I maintain or distribute
such a language as a standard programming tool for a real machine. All such
ameliorations and refinements should appear in a structured assembler. Now
that the technology is available, we can condone unstructured languages only
as a bootstrap-like means to a limited end, when there are strong economic
reasons for not implementing a better system.

Tree Searching

But, I'm digressing from my subject of go to elimination in higher level
languages. A few weeks ago I decided to choose an algorithm at random from
my books, to study its use of go to statements. The very first example I
encountered [54, Algorithm 6.2.3C] turned out to be another case where
existing programming languages have no good substitute for go to's. In
simplified form, the loop where the trouble arises can be written as
follows.

Example 5:

    compare:
        if A[i] < x
        then if L[i] ≠ 0
            then i := L[i]; go to compare;
            else L[i] := j; go to insert fi;
        else if R[i] ≠ 0
            then i := R[i]; go to compare;
            else R[i] := j; go to insert fi;
        fi;
    insert: A[j] := x;
        L[j] := 0; R[j] := 0; j := j+1;

This is part of the well-known "tree search and insertion" scheme, where a
binary search tree is being represented by three arrays: A[i] denotes the
information stored at node number i, and L[i], R[i] are the respective node
numbers for the roots of that node's left and right subtrees; empty subtrees
are represented by zero. The program searches down the tree until finding an
empty subtree where x can be inserted; and variable j points to an
appropriate place to do the insertion. For convenience, I have assumed in
this example that x is not already present in the search tree.

Example 5 has four go to statements, but the control structure is saved from
obscurity because the program is so beautifully symmetric between L and R. I
know that these go to statements can be eliminated by introducing a Boolean
variable which becomes true when L[i] or R[i] is found to be zero. But I
don't want to test this variable in the inner loop of my program.

Systematic Elimination

A good deal of theoretical work has been addressed to the question of go to
elimination, and I shall now try to summarize the findings and to discuss
their relevance.

S. C. Kleene proved a famous theorem in 1956 [48] which says, in essence,
that the set
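Editorial sketch (not part of the original article): one way to write the
search-and-insert loop of Example 5 in Python without a go to and without a
separate Boolean flag, by exploiting the symmetry between L and R and
selecting the link array once per step. This is an illustration, not a
claim about how the algorithm should be coded. The name `tree_insert` is
hypothetical; the lists are used 1-based with slot 0 unused, 0 denotes an
empty subtree, the next free node number j is taken to be `len(A)`, and the
comparison direction follows Example 5 as printed (left when A[i] < x).

```python
def tree_insert(A, L, R, x, root=1):
    """Insert x, assumed not yet present, below the node numbered `root`.

    Hypothetical helper: A, L, R are parallel lists with slot 0 unused.
    Returns the node number given to x.
    """
    j = len(A)                        # next free node number
    i = root
    while True:
        # As printed in Example 5: follow L when A[i] < x, otherwise R.
        link = L if A[i] < x else R
        if link[i] == 0:              # empty subtree: attach the new node
            link[i] = j
            break
        i = link[i]                   # otherwise keep descending
    A.append(x)                       # "insert:" in Example 5
    L.append(0)
    R.append(0)
    return j
```

The loop still makes one test per level to detect the empty subtree; what
changes is that the two symmetric branches are merged by choosing `link`.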



274 • Donald E. Knuth

of all paths through any flowchart can be represented as a "regular
expression" R built up from the following operations:

    s          the single arc s of the flowchart
    R₁; R₂     concatenation (all paths consisting of a path of R₁
               followed by a path of R₂)
    R₁ ∪ R₂    union (all paths of either R₁ or R₂)
    R⁺         iteration (all paths of the form p₁; p₂; ...; pₙ for
               some n ≥ 1, where each pᵢ is a path of R)

These regular expressions correspond loosely to programs consisting of
statements in a programming language related by the three operations of
sequential composition, conditionals (if-then-else), and iterations (while
loops). Thus, we might expect that these three program control structures
would be sufficient for all programs. However, closer analysis shows that
Kleene's theorem does not relate directly to control structures; the problem
is only superficially similar. His result is suggestive but not really
applicable in this case.
The analogous result for control structures was first proved by G. Jacopini in
1966, in a paper written jointly with C. Böhm [8]. Jacopini showed, in effect,
that any program given, say, in flowchart form can be transformed
systematically into another program, which computes the same results and which
is built up from statements in the original program using only the three basic
operations of composition, conditional, and iteration, plus possible
assignment statements and tests on auxiliary variables. Thus, in principle,
go to statements can always be removed. A detailed exposition of Jacopini's
construction has been given by H. D. Mills [69].
Recent interest in structured programming has caused many authors to cite
Jacopini's result as a significant breakthrough and as a cornerstone of modern
programming technique. Unfortunately, these authors are unaware of the
comments made by Cooper in 1967 [16] and later by Bruno and Steiglitz [10],
namely, that from a practical standpoint the theorem is meaningless. Indeed,
any program can obviously be put into the "beautifully structured" form

    p := 1;
    while p > 0 do
      begin if p = 1 then perform step 1;
                p := successor of step 1 fi;
            if p = 2 then perform step 2;
                p := successor of step 2 fi;
            ...
            if p = n then perform step n;
                p := successor of step n fi;
      end.

Here the auxiliary variable p serves as a program counter representing which
box of the flowchart we're in, and the program stops when p is set to zero. We
have eliminated all go to's, but we've actually lost all the structure.
Jacopini conjectured in his paper that auxiliary variables are necessary in
general, and that the go to's in a program of the form

    L1: if B₁ then go to L2 fi;
        S₁;
        if B₂ then go to L2 fi;
        S₂;
        go to L1;
    L2: S₃;

cannot always be removed unless additional computation is done. Floyd and I
proved this conjecture with John Hopcroft's help [52]. Sharper results were
later obtained by Ashcroft and Manna [1], Bruno and Steiglitz [10], Kosaraju
[57], and Peterson, Kasami, and Tokura [77].
Jacopini's original construction was not merely the trivial flowchart
emulation scheme indicated above; he was able to salvage much of the given
flowchart structure if it was reasonably well-behaved. A more general
technique of go to elimination, devised by Ashcroft and Manna [1], made it
possible to capture still more of a given program's natural flow; for example,
their technique applied to Example 5 yields

Example 5a:

    t := true;
    while t do
      begin if A[i] < x
        then if L[i] ≠ 0 then i := L[i];
             else L[i] := j; t := false fi;
        else if R[i] ≠ 0 then i := R[i];
             else R[i] := j; t := false fi;
      end;
    A[j] := x;

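As an illustrative aside, the flowchart-emulation scheme described above can be sketched in a present-day language. The Python fragment below is only a sketch under stated assumptions: the two-box flowchart for Euclid's algorithm and the names gcd_emulated and gcd_structured are hypothetical and do not come from the original discussion. It shows how the auxiliary variable p acts as a program counter, and how the loop that is obvious in the structured version disappears from view in the emulated one.

```python
def gcd_emulated(a: int, b: int) -> int:
    """Euclid's algorithm as a two-box flowchart driven by a program counter p.

    Hypothetical illustration of the emulation scheme; not from the paper.
    """
    p = 1                      # p names the flowchart box we are in; 0 means stop
    while p > 0:
        if p == 1:             # box 1: test whether any work remains
            p = 2 if b != 0 else 0
        if p == 2:             # box 2: one division step, then back to box 1
            a, b = b, a % b
            p = 1
    return a


def gcd_structured(a: int, b: int) -> int:
    """The same computation with its loop structure left visible."""
    while b != 0:
        a, b = b, a % b
    return a


if __name__ == "__main__":
    print(gcd_emulated(12, 18), gcd_structured(12, 18))
```

Both functions compute the same values; the emulated form contains no jumps, yet a reader must trace the values of p to rediscover the single loop.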


Structured Programming with g o t o Statements • 275

But, in general, their technique may cause a program to grow exponentially in
size; and when error exits or other recalcitrant go to's are present, the
resulting programs will indeed look rather like the flowchart emulator
sketched above.
If such automatic go to elimination procedures are applied to badly structured
programs, we can expect the resulting programs to be at least as badly
structured. Dijkstra pointed this out already in [23], saying:

    The exercise to translate an arbitrary flow diagram more or less
    mechanically into a jumpless one, however, is not to be recommended.
    Then the resulting flow diagram cannot be expected to be more
    transparent than the original one.

In other words, we shouldn't merely remove go to statements because it's the
fashionable thing to do; the presence or absence of go to statements is not
really the issue. The underlying structure of the program is what counts, and
we want only to avoid usages which somehow clutter up the program. Good
structure can be expressed in FORTRAN or COBOL, or even in assembly language,
although less clearly and with much more trouble. The real goal is to
formulate our programs in such a way that they are easily understood.
Program structure refers to the way in which a complex algorithm is built up
from successively simpler processes. In most situations this structure can be
described very nicely in terms of sequential composition, conditionals, simple
iterations, and with case statements for multiway branches; undisciplined
go to statements make program structure harder to perceive, and they are often
symptoms of a poor conceptual formulation. But there has been far too much
emphasis on go to elimination instead of on the really important issues;
people have a natural tendency to set up an easily understood quantitative
goal like the abolition of jumps, instead of working directly for a
qualitative goal like good program structure. In a similar way, many people
have set up "zero population growth" as a goal to be achieved, when they
really desire living conditions that are much harder to quantify.
Probably the worst mistake any one can make with respect to the subject of
go to statements is to assume that "structured programming" is achieved by
writing programs as we always have and then eliminating the go to's. Most
go to's shouldn't be there in the first place! What we really want is to
conceive of our program in such a way that we rarely even think about go to
statements, because the real need for them hardly ever arises. The language in
which we express our ideas has a strong influence on our thought processes.
Therefore, Dijkstra [23] asks for more new language features--structures which
encourage clear thinking--in order to avoid the go to's temptations toward
complications.

Event Indicators
The best such language feature I know has recently been proposed by C. T. Zahn
[102]. Since this is still in the experimental stage, I will take the liberty
of modifying his "syntactic sugar" slightly, without changing his basic idea.
The essential novelty in his approach is to introduce a new quantity into
programming languages, called an event indicator (not to be confused with
concepts from PL/I or SIMSCRIPT). My current preference is to write his
event-driven construct in the following two general forms.

    A)  loop until ⟨event⟩₁ or ... or ⟨event⟩ₙ:
            ⟨statement list⟩₀;
        repeat;
        then ⟨event⟩₁ => ⟨statement list⟩₁;
             ...
             ⟨event⟩ₙ => ⟨statement list⟩ₙ;
        fi;

    B)  begin until ⟨event⟩₁ or ... or ⟨event⟩ₙ:
            ⟨statement list⟩₀;
        end;
        then ⟨event⟩₁ => ⟨statement list⟩₁;
             ...
             ⟨event⟩ₙ => ⟨statement list⟩ₙ;
        fi;

There is also a new statement, "⟨event⟩", which means that the designated
event has occurred: such a statement is allowed only


within ⟨statement list⟩₀ of an until construct which declares that event.
In form (A), ⟨statement list⟩₀ is executed repeatedly until control leaves the
construct entirely or until one of the named events occurs; in the latter
case, the statement list corresponding to that event is executed. The behavior
in form (B) is similar, except that no iteration is implied; one of the named
events must have occurred before the end is reached. The then ... fi part may
be omitted when there is only one event name.
The above rules should become clear after looking at what happens when
Example 5 above is recoded in terms of this new feature:

Example 5b:

    loop until left leaf hit or
               right leaf hit:
      if A[i] < x
      then if L[i] ≠ 0 then i := L[i];
           else left leaf hit fi;
      else if R[i] ≠ 0 then i := R[i];
           else right leaf hit fi;
      fi;
    repeat;
    then left leaf hit => L[i] := j;
         right leaf hit => R[i] := j;
    fi;
    A[j] := x; L[j] := 0; R[j] := 0; j := j+1;

Alternatively, using a single event,

Example 5c:

    loop until leaf replaced:
      if A[i] < x
      then if L[i] ≠ 0 then i := L[i]
           else L[i] := j; leaf replaced fi;
      else if R[i] ≠ 0 then i := R[i]
           else R[i] := j; leaf replaced fi;
      fi;
    repeat;
    A[j] := x; L[j] := 0; R[j] := 0; j := j+1;

For reasons to be discussed later, Example 5b is preferable to 5c.
It is important to emphasize that the first line of the construct merely
declares the event indicator names, and that event indicators are not
conditions which are being tested continually; ⟨event⟩ statements are simply
transfers of control which the compiler can treat very efficiently. Thus, in
Example 5c the statement "leaf replaced" is essentially a go to which jumps
out of the loop.
This use of events is, in fact, semantically equivalent to a restricted form
of go to statement, which Peter Landin discussed in 1965 [58] before most of
us were ready to listen. Landin's device has been reformulated by Clint and
Hoare [14] in the following way: Labels are declared at the beginning of each
block, just as procedures normally are, and each label also has a ⟨label body⟩
just as a procedure has a ⟨procedure body⟩. Within the block whose heading
contains such a declaration of label L, the statement go to L according to
this scheme means "execute the body of L, then leave the block". It is easy to
see that this is exactly the form of control provided by Zahn's event
mechanism, with the ⟨label body⟩s replaced by ⟨statement list⟩s in the
then ... fi postlude and with ⟨event⟩ statements corresponding to Landin's
go to. Thus, Clint and Hoare would have written Example 5b as follows.

    while true do
      begin label left leaf hit; L[i] := j;
            label right leaf hit; R[i] := j;
        if A[i] < x
        then if L[i] ≠ 0 then i := L[i];
             else go to left leaf hit fi;
        else if R[i] ≠ 0 then i := R[i];
             else go to right leaf hit fi;
      end;
    A[j] := x; L[j] := 0; R[j] := 0; j := j+1;

I believe the program reads much better in Zahn's form, with the ⟨label body⟩s
set in the code between that which logically precedes and follows.
Landin also allowed his "labels" to have parameters like any other procedures;
this is a valuable extension to Zahn's proposal, so I shall use events with
value parameters in several of the examples below.
As Zahn [102] has shown, event-driven statements blend well with the ideas of
structured programming by stepwise refinement. Thus, Examples 1 to 3 can all
be cast into the following more abstract form, using an event "found" with an
integer parameter:

    begin until found:
      search table for x and
        insert it if not present;
    end;
    then found (integer j) => B[j] := B[j]+1;
    fi;
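For readers who want to experiment, the control flow of Example 5b can be approximated in a language without event indicators. The Python sketch below is an assumption-laden illustration, not Zahn's or the author's notation: it models each event as a hypothetical exception class (LeftLeafHit, RightLeafHit), the loop body as a "while True" inside a try block, and the then ... fi postlude as the except clauses. The function name insert and its parameter order are likewise hypothetical. Exceptions are only one possible emulation, and a Python implementation does not treat them as cheaply as the simple jumps a compiler could generate for event statements.

```python
class LeftLeafHit(Exception):
    """Event: the search reached a node whose left link is empty."""


class RightLeafHit(Exception):
    """Event: the search reached a node whose right link is empty."""


def insert(A, L, R, i, j, x):
    """Attach key x as node j below the tree rooted at node i.

    A holds keys; L and R hold left and right links, with 0 as the null link.
    Follows the branching of Example 5b (A[i] < x goes left).
    Returns the next free node index, j + 1.
    """
    try:
        while True:                      # loop until left leaf hit or right leaf hit
            if A[i] < x:
                if L[i] != 0:
                    i = L[i]
                else:
                    raise LeftLeafHit    # the event statement "left leaf hit"
            else:
                if R[i] != 0:
                    i = R[i]
                else:
                    raise RightLeafHit   # the event statement "right leaf hit"
    except LeftLeafHit:                  # postlude for "left leaf hit"
        L[i] = j
    except RightLeafHit:                 # postlude for "right leaf hit"
        R[i] = j
    A[j] = x
    L[j] = 0
    R[j] = 0
    return j + 1


if __name__ == "__main__":
    A, L, R = [0] * 10, [0] * 10, [0] * 10
    A[1], j = 5, 2                       # node 1 is the root, holding key 5
    for key in (3, 8, 6):
        j = insert(A, L, R, 1, j, key)
    print(A[1:j], L[1:j], R[1:j])
```

As in Example 5b, the code that runs after an event is kept apart from the loop body, so each except clause can rely on the fact that the corresponding link of node i was found to be empty.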


This much of the program can be written before we have decided how to maintain
the table. At the next level of abstraction, we might decide to represent the
table as a sequential list, as in Example 1, so that "search table ..." would
expand into

    for i := 1 step 1 until m do
      if A[i] = x then found(i) fi;
    m := m+1; A[m] := x; found(m);

Note that this for loop is more disciplined than the one in our original
Example 1, because the iteration variable is not used outside the loop; it now
conforms to the rules of ALGOL W and ALGOL 68. Such for loops provide
convenient documentation and avoid common errors associated with global
variables; their advantages have been discussed by Hoare [39].
Similarly, if we want to use the idea of Example 2 we might write the
following code as the refinement of "search table ...":

    begin integer i;
      A[m+1] := x; i := 1;
      while A[i] ≠ x do i := i+1;
      if i > m then m := i; B[m] := 0 fi;
      found(i);
    end;

And finally, if we decide to use hashing, we obtain the equivalent of
Example 3, which might be written as follows.

    begin integer i;
      i := h(x);
      loop until present or absent:
        if A[i] = x then present fi;
        if A[i] = 0 then absent fi;
        i := i-1;
        if i = 0 then i := m fi;
      repeat;
      then present => found(i);
           absent => A[i] := x; found(i);
      fi;
    end;

The begin until ⟨event⟩ construct also provides a natural way to deal with
decision-table constructions such as the text-scanning application we have
discussed.

Example 4b:

    begin until normal character input
             or double slash:
      char x;
      x := read char;
      if x = slash
      then x := read char;
           if x = slash
           then double slash;
           else tabulate;
                normal character input (x);
           fi;
      else normal character input (x);
      fi;
    end;
    then normal character input (char x) =>
           write char (x);
           if x = period then write char (space) fi;
         double slash => return the carriage;
    fi;

This program states the desired actions a bit more clearly than any of our
previous attempts were able to do.
Event indicators handle error exits too. For example, we might write a program
as follows.

    begin until error or normal end:
      if m = max then error ('symbol table full') fi;
      ...
      normal end;
    end;
    then error (string E) =>
           print ('unrecoverable error,'; E);
         normal end =>
           print ('computation complete');
    fi;

Comparison of Features
Of course, event indicators are not the only decent alternatives to go to
statements that have been proposed. Many authors have suggested language
features which provide roughly equivalent facilities, but which are expressed
in terms of exit, jumpout, break, or leave statements. Kosaraju [57] has
proved that such statements are sufficient to express all programs without
go to's and without any extra computation, but only if an exit from
arbitrarily many levels of control is permitted.
The earliest language features of this kind (besides Landin's proposal)
provided essentially only one exit from a loop; this means that the code
appearing in the then ... fi postlude of our examples would be inserted into
the body itself before branching. (See Example 5c.) The separation of such
code as in Zahn's proposal is better, mainly because the body of the construct
corresponds to code that is written under different "invariant assumptions"
which are inoperative after a particular event has occurred.


Thus, each event corresponds to a particular set of assertions about the state
of the program, and the code which follows that event takes cognizance of
these assertions, which are rather different from the assertions in the main
body of the construct. (For this reason I prefer Example 5b to Example 5c.)
Language features allowing multiple exits have been proposed by G. V. Bochmann
[7], and independently by Shigo et al. [86]. These are semantically equivalent
to Zahn's proposals, with minor variations; but they express such semantics in
terms of statements that say "exit to ⟨label⟩". I believe Zahn's idea of event
indicators is an improvement on the previous schemes, because the
specification of events instead of labels encourages a better conception of
the program. The identifier given to a label is often an imperative verb like
"insert" or "compare", saying what action is to be done next, while the
appropriate identifier for an event is more likely to be an adjective like
"found". The names of events are very much like the names of Boolean
variables, and I believe this accounts for the popularity of Boolean variables
as documentation aids, in spite of their inefficiency.
Putting this another way, it is much better from a psychological standpoint to
write

    loop until found ... ; found; ... repeat

than to write

    search: while true do
      begin ... ; leave search; ... end.

The leave or exit statement is operationally the same, but intuitively
different, since it talks more about the program than about the problem.
The PL/I language allows programmer-defined ON-conditions, which are similar
in spirit to event indicators. A programmer first executes a statement
"ON CONDITION (identifier) block" which specifies a block of code that is to
be executed when the identified event occurs, and an occurrence of that event
is indicated by writing SIGNAL CONDITION (identifier). However, the analogy is
not very close, since control returns to the statement following the SIGNAL
statement after execution of the specified block of code, and the block may be
dynamically respecified.
Some people have suggested to me that events should be called "conditions"
instead, by analogy with Boolean expressions. However, that terminology would
tend to imply a relation which is continually being monitored, instead of a
happening. By writing "loop until yprime is near y: ..." we seem to be saying
that the machine should keep track of whether or not y and yprime are nearly
equal; a better choice of words would be an event name like "loop until
convergence established: ..." so that we can write "if abs(yprime - y) <
epsilon × y then convergence established". An event occurs when the program
has discovered that the state of computation has changed.

Simple Iterations
So far I haven't mentioned what I believe is really the most common situation
in which go to statements are needed by an ALGOL or PL/I programmer, namely a
simple iterative loop with one entrance and one exit. The iteration statements
most often proposed as alternatives to go to statements have been
"while B do S" and "repeat S until B". However, in practice, the iterations I
encounter very often have the form

    A: S;
       if B then go to Z fi;
       T; go to A;
    Z:

where S and T both represent reasonably long sequences of code. If S is empty,
we have a while loop, and if T is empty we have a repeat loop, but in the
general case it is a nuisance to avoid the go to statements.
A typical example of such an iteration occurs when S is the code to acquire or
generate a new piece of data, B is the test for end of data, and T is the
processing of that data. Another example is when the code preceding the loop
sets initial conditions for some iterative process; then S is a computation of
quantities involved in the test for convergence, B is the test for
convergence, and T is the adjustment of variables for the next iteration.
Dijkstra [29] aptly named this a loop


which is performed "n and a half times". The usual practice for avoiding
go to's in such loops is either to duplicate the code for S, writing

    S; while B̄ do begin T; S end;

where B̄ is the negation of relation B; or to figure out some sort of
"inverse" for T so that "T⁻¹; T" is equivalent to a null statement, and
writing

    T⁻¹; repeat T; S until B;

or to duplicate the code for B and to make a redundant test, writing

    repeat S; if B̄ then T fi; until B;

or its equivalent. The reader who studies go to-less programs as they appear
in the literature will find that all three of these rather unsatisfactory
constructions are used frequently.
I discussed this weakness of ALGOL in a letter to Niklaus Wirth in 1967, and
he proposed two solutions to the problem, together with many other instructive
ideas in an unpublished report on basic concepts of programming languages
[94]. His first suggestion was to write

    repeat begin S; when B exit; T; end;

and readers who remember 1967 will also appreciate his second suggestion,

    turn on begin S; when B drop out; T; end.

Neither set of delimiters was felt to be quite right, but a modification of
the first proposal (allowing one or more single-level exit statements within
repeat begin ... end) was later incorporated into an experimental version of
the ALGOL W language. Other languages such as BCPL and BLISS incorporated and
extended the exit idea, as mentioned above. Zahn's construction now allows us
to write, for example,

    loop until all data exhausted:
      S;
      if B then all data exhausted fi;
      T;
    repeat;

and this is a better syntax for the n + ½ problem than we have had previously.
On the other hand, it would be nicest if our language would provide a single
feature which covered all simple iterations without going to a rather "big"
construct like the event-driven scheme. When a programmer uses the simpler
feature he is thereby making it clear that he has a simple iteration, with
exactly one condition which is being tested exactly once each time around the
loop. Furthermore, by providing special syntax for this common case we make it
easier for a compiler to produce more efficient code, since the compiler can
rearrange the machine instructions so that the test appears physically at the
end of loop. (Many hours of computer time are now wasted each day executing
unconditional jumps to the beginning of loops.)
Ole-Johan Dahl has recently proposed a syntax which I think is the first real
solution to the n + ½ problem. He suggests writing the general simple
iteration defined above as

    loop: S; while B̄: T; repeat;

where, as before, S and T denote sequences of one or more statements separated
by semicolons. Note that as in two of our original go to-free examples, the
syntax refers to condition B̄ which represents staying in the iteration,
instead of condition B which represents exiting; and this may be the secret of
its success.
Dahl's syntax may not seem appropriate at first, but actually it reads well in
every example I have tried, and I hope the reader will reserve judgment until
seeing the examples in the rest of this paper. One of the nice properties of
his syntax is that the word repeat occurs naturally at the end of a loop
rather than at its beginning, since we read the actions of the program
sequentially. As we reach the end, we are instructed to repeat the loop,
instead of being informed that the text of the loop (not its execution) has
ended. Furthermore, the above syntax avoids ALGOL's use of the word do (and
also the more recent unnatural delimiter od); the word do as used in ALGOL has
never sounded quite right to native speakers of English, it has always been
rather quaint for us to say "do read (A[i])" or "do begin"! Another feature of
Dahl's proposals is that it is easily axiomatized along the lines


proposed by Hoare [37, 41]: 2. INTRODUCTION OF go to STATEMENTS


I
{P}SiQ} Now that I have discussed how to remove
{Q A B}T{P} go to statements, !I will turn around and
{P} loop: S; while B: T; repeat; {Q A ~ B} show why there are occasions when I actually
wish to insert them into a go to-less program.
(Here I am using braces around the asser- The reason is that I like well-documented
tions, as in Wirth's PASCAL language [97], programs very much, but I dislike inefficient
instead of following Hoare's original nota- ones; and there are some cases where I
tion "P {S} Q", since assertions are, by simply seem to need go to statements,
nature, parenthetical remarks.) despite the examples stated above.
The nicest thing about Dahl's proposal
is that it works also when S or T is empty, Recursion Elimination
so that we have a uniform syntax for all Such cases come to light primarily when I'm
three cases; the while and repeat state- trying to optimize a program (originally
ments found in ALGoL-like languages of the well-structured), often involving the removal
late 1960s are no longer needed. When S or of implicit or explicit recursion. For example,
T is empty, it is appropriate to delete the consider the following recursive procedure
preceding colon. Thus that prints the contents of a binary tree in
loop while B:
symmetric order. The tree is represented by
T; L, A, and R arrays as in Example 5, and the
repeat; recursive procedure is essentially the defini-
tion of symmetric order.
takes the place of "while B do b e g i n T
end;" and Example 6:
loop: procedure treeprint(O; integer t; value t;
S if t # 0
while B repeat; then treeprint(L[t]) ;
print (A[tl) ;
takes the place of "repeat S u n t i l B;". At treeprint (R[t]);
fi;
first glance these may seem strange, but
probably less strange than the w h i l e and This procedure may be regarded as a
r e p e a t statements did when we first learned model for a great many algorithms which
them. have the same structure, since tree traversal
If I were designing a programming lan- occurs in so many applications; we shall
guage today, my current preference would assume for now that printing is our goal,
be to use Dahl's mechanism for simple with the understanding that this is only one
iteration, plus Zahn's more general con- instance of a generM family of algorithms.
struct, plus a for statement whose syntax It is often useful to remove recursion
would be perhaps from an algorithm, because of important
economies of space or time, even though this
loop f o r l < i < n:
tends to cause some loss of the program's
S;
repeat; basic clarity. (And, of course, we might also
have to state our algorithm in a language
with appropriate extensions. These control like FORTRANor in a machine language that
structures, together with i f . . . t h e n - . . doesn't allow recursion.) Even when we use
else .-. fi, will comfortably handle all the ALGOL or P L / I , every compiler I know im-
examples discussed so far in this paper, poses considerable overhead on procedure
without any go to statements or loss of calls; this is to a certain extent inevitable
efficiency or clarity. Furthermore, none of because of the generMity of the parameter
these language features seems to encourage mechanisms, especially cM1 by name and the
overly-complicated program structure. maintenance of proper dynamic environ-

Computing Surveys, Vol. 6, No. 4, December 1974


Structured Programming with g o to 8~atements • 281

ments. When procedure calls occur in an above simplification makes q resume the
inner loop the overhead can slow a program caller of p. When q ffi p the argument is
down by a factor of two or more. But if we perhaps a bit subtle, but it's all right. (I'm
hand tailor our own implementation of not sure who originated this principle; I
recursion instead of relying on a general recall learning it from Gill's paper [34, p.
mechanism we can usually find worthwhile 183], and then seeing many instances of it in
simplifications, and in the process we occa- connection with top-do~vn compiler organiza-
sionally get a deeper insight into the original tion. Under certain conditions the BLms/ll
algorithm. compiler [101] is capable of discovering this
There has been a good deal published simplification. Incidentally, the converse of
about recursion elimination (especially in the the above principle is also true (see [52]):
work of Barron [4], Cooper [15], Manna and go to statements can always be eliminated
Waldinger [61], McCarthy [62], and Strong by declaring suitable procedures, each of
[88; 91]); but I'm amazed that very little of which calls another as its last action. This
this is about "down to earth" problems. I shows that procedure calls include go t o
have always felt that the transformation statements as a special case; it cannot be
from recursion to iteration is one of the most argued that procedures are conceptually
fundamental concepts of computer science, simpler than go to's, although some people
and that a student should learn it at about have made such a claim.)
the time he is studying data structures. This As a result of applying the above simplifi-
topic is the subject of Chapter 8 in my multi- cation, and adapting it in the obvious way
volume work; but it's only by accident that to the case of a procedure with one parame-
recursion wasn't Chapter 3, since it concep- ter, Example 6 becomes
tually belongs very early in the table of E x a m p l e 6a:
contents. The material just wouldn't fit com-
fortably into any of the earlier volumes; yet procedure treeprint(t); integer t; value ~;
there are many algorithms in Chapters 1-7 L:ift~0
that are recursions in disguise. Therefore it then treeprint(L[t]) ;
print(A[t]) ;
surprises me that the literature on recursion t : = R[t]; g o t o L;
removal is primarily concerned with "baby" fi;
examples like computing factorials or re-
versing lists, instead of with a sturdy toddler But we don't really want that g o t o , so we
like Example 6. might prefer to write the code as follows,
Now let's go to work on the above ex- using Dahl's syntax for iterations as ex-
ample. I assume, of course, that the reader plained above.
knows the standard way of implementing E x a m p l e 6b:
recursion with a stack [20], but I want to
make simplifications beyond this. Rule procedure treeprint(t); integer t; value t;
loop while t ~ 0:
number one for simplifying procedure calls treeprint (L[t]) ;
is: print (A [t]) ;
If the last action of procedure p before it re- t : = R[t];
turns is to call procedure q, simply go t o the repeat;
beginning of procedure q instead.
If our goal is to impress somebody, we
(We must forget for the time being that we might tell them that we thought of Example
don't like go to statements.) It is easy to 6b first, instead of revealing that we got it
confirm the validity of this rule, if, for sim- by straightforward simplification of the
plicity, we assume parameterless procedures. obvious program in Example 6.
For the operation of calling q is to put a re- There is still a recursive call in Example
turn address on the stack, then to execute q, 6b; and this time it's embedded in the pro-
then to resume p at the return address cedure, so it looks as though we have to go
specified, then to resume the caller of p. The to the general stack implementation. How-

Computing Surveys, Vo|. 6, No. 4, De~ember 1~4


282 • Donald E. Knuth

ever, the recursive call now occurs in only equally simple way to remove the recursion
one place, so we need not put a return without resorting to something like Example
address on the stack; only the local variable 6c. As I say, it was a shock when I first ran
t needs to be saved on each call. (This is across such an example. Later, Jim Horning
another simplification which occurs fre- confessed to me that he also was guilty, in
quently.) The program now takes the fol- the syntax-table-building program for the
lowing nonrecursive form. XPL system [65, p. 500], because XPL
Example 6c : doesn't allow recursion; see also [56]. Clearly
a now doctrine about sinful go to's is needed ,
p r o c e d u r e treeprint(t); i n t e g e r t; v a l u e t; some sort of "situation ethics".
b e g i n i n t e g e r s t a c k S; S := e m p t y ; The new morality that I propose may
L I : loop w h i l e t ~ 0:
< = t; t := L[t]; go to L1;
perhaps be stated thus: "Certain go t o
L2: t < = S; statements which arise in connection with
print (A[t]) ; well-understood transformations are accept-
t := R[t]; able, provided that the program documenta-
repeat; tion explains what the transformation was."
i f nonempty(S) t h e n g o t o L2 fi;
end. The use of four-letter words like goto can
occasionally be justified even in the best of
Here for simplicity I have extended ALGOL company.
to allow a "stack" data type, where S < = t This situation is very similar to what
means "push t onto S" and t < = S means people have commonly encountered when
"pop the top of S to t, assuming that S is proving a program correct. To demonstrate
nonempty". the validity of a typical program Q, it is
It is easy to see that Example 6c is equiva- usually simplest and best to prove that some
lent to Example 6b. The statement "go t o rather simple but less efficient program P is
L I " initiates the procedure, and control correct and then to prove that P can be
returns to the following statement (labeled transformed into Q by a sequence of valid
L2) when the procedure is finished. Although optimizations. I'm saying that a similar
Example 6c involves go to statements, their thing should be considered standard prac-
purpose is easy to understand, given the tice for all but the simplest software pro-
knowledge that we have produced Example grams: A programmer should create a pro-
6c by a mechanical, completely reliable gram P which is readily understood and
method for removing recursion. Hopkins well-documented, and then he should op-
[44] has given other examples where go to timize it into a program Q which is very effi-
at a low level supports high-level construc- cient. Program Q may contain go to state-
tions. ments and other low-level features, but the
But if you look at the above program transformation from P to Q should be ac-
again, you'll probably be just as shocked as complished by completely reliable and well-
I was when I first realized what has hap- documented "mechanical" operations.
pened. I had always thought that the use of At this point many readers will say, "But
go to statements was a bit sinful, say a he should only write P, and an optimizing
"venial sin"; but there was one kind of go to compiler will produce Q." To this I say,
that I certainly had been taught to regard "No, the optimizing compiler would have to
as a mortal sin, perhaps even unforgivable, be so complicated (much more so than any-
namely one which goes into the middle of an thing we have now) that it will in fact be
iteration! Example 6c does precisely that, unreliable." I have another alternative to
and it is perfectly easy to understand Exam- propose, a new class of software which will
ple 6c by comparing it with Example 6b. be far better.
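For readers who want to experiment with the stack transformation, here is a minimal Python sketch; it is an illustration and not code from the paper. Python has no go to, so the jump into the loop is replaced by two nested loops, which is only one possible restructuring. The list names A, L, and R follow the text, with index 0 standing for the empty tree; the seven-node sample tree and the function names are assumptions made for the example.

```python
# Node 0 is the empty tree; L[t] and R[t] are the left and right links,
# A[t] is the information stored in node t. The sample tree is hypothetical.
A = [None, "d", "b", "f", "a", "c", "e", "g"]
L = [0, 2, 4, 6, 0, 0, 0, 0]
R = [0, 3, 5, 7, 0, 0, 0, 0]


def treeprint_recursive(t, out):
    """Symmetric-order traversal, written recursively."""
    if t != 0:
        treeprint_recursive(L[t], out)
        out.append(A[t])
        treeprint_recursive(R[t], out)


def treeprint_stack(t, out):
    """The same traversal with an explicit stack instead of recursion."""
    S = []
    while True:
        while t != 0:
            S.append(t)      # S <= t
            t = L[t]
        if not S:            # nothing postponed: the traversal is complete
            break
        t = S.pop()          # t <= S
        out.append(A[t])
        t = R[t]
```

Calling both functions on root 1 with separate output lists should give the same sequence of node values, which is a quick way to check the transformation on small trees.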
In this particular case we can remove the
go to's without difficulty; but in general Program Manipulation Systems
when a recursive call is embedded in For 15 years or so I have been trying to
think of how to write a compiler that really
several complex levels of control, there is no produces top quality code. For example,

Computing Surveys, Vol; 6, No. 4, December 1974


Structured Programming with g o t o Statements • 283

most of the MIX programs in my books are restructuring of a program by combining


considerably more efficient than any of similar loops. I later discovered that program
today's most visionary compiling schemes manipulation is just part of a much more
would be able to produce. I've tried to study ambitious project undertaken by Cheatham
the various techniques that a hand-coder and Wegbreit [12]; another paper about
like myself uses, and to fit them into some source-code optimizations has also recently
systematic and automatic system. A few appeared [83]. Since LISP programs are easily
years ago, several students and I looked at a manipulated as LISP data objects, there has
typical sample of FORTRAN programs [51], also been a rather extensive development of
and we all tried hard to see how a machine similar ideas in this domain, notably by
could produce code that would compete Warren Teitelman (see [89, 90]). The time
with our best hand-optimized object pro- is clearly ripe for program-manipulation
grams. We found ourselves always running systems, and a great deal of further work
up against the same problem: the compiler suggests itself.
needs to be in a dialog with the programmer; The programmer using such a system will
it needs to know properties of the data, and write his beautifully-structured, but possibly
whether certain cases can arise, etc. And we inefficient, program P; then he will inter-
couldn't think of a good language in which actively specify transformations that make
to have such a dialog. it efficient. Such a system will be much more
For some reason we all (especially me) had powerful and reliable than a completely
a mental block about optimization, namely automatic one. We can also imagine the sys-
that we always regarded it as a behind-the- tem manipulating measurement statistics
scenes activity, to be done in the machine concerning how much of the total running
language, which the programmer isn't sup- time is spent in each statement, since the
posed to know. This veil was first lifted from programmer will want to know which parts
my eyes in the Fall of 1973, when I ran across of his program deserve to be optimized, and
a remark by Hoare [42] that, ideally, a how much effect an optimization will really
language should be designed so that an have. The original program P should be re-
optimizing compiler can describe its optimi- tained along with the transformation specifi-
zations in the source language. Of course! cations, so that it can be properly understood
Why hadn't I ever thought of it? and maintained as time passes. As I say, this
Once we have a suitable language, we will idea certainly isn't my own; it is so exciting
be able to have what seems to be emerging I hope that everyone soon becomes aware of
as the programming system of the future: an its possibilities.
interactive program-manipulation system, A "calculus" of program transformations
analogous to the many symbol-manipulation is gradually emerging, a set of operations
systems which are presently undergoing ex- which can be applied to programs without
tensive development and experimentation. rethinking the specific problem each time.
We are gradually learning about program I have already mentioned several of these
transformations, which are more complicated transformations: doubling up of loops (Ex-
than formula manipulations but really not ample 2a), changing final calls to go to's
very different. A program-manipulation sys- (Example 6a), using a stack for recursions
tem is obviously what we've been leading up (Example 6c), and combining disjoint loops
to, and I wonder why I never thought of it over the same range [18]. The idea of macro-
before. Of course, the idea isn't original with expansions in general seems to find its most
me; when I told Hoare, he said, "Exactly!" appropriate realization as part of a program
and referred me to a recent paper by Darling- manipulation system.
ton and Burstall [18]. Their paper describes Another well-known example is the re-
a system which removes some recursions moval of invariant subexpressions from
from a LISP-like language (curiously, without loops. We are all familiar with the fact that
introducing any go to's), and which also a program which includes such subexpres-
does some conversion of data structures sions is more readable than the corresponding
(from sets to lists or bit strings) and some program with invariant subexpressions

284 • Donald E. Knuth

moved out of their loops; yet we consciously Furthermore, there is a rather simple way
remove them when the running time of the to understand this program, by providing
program is important. suitable "loop invariants". At the beginning
Still another type of transformation occurs of the first (outer) loop, suppose the stack
when we go from high-level "abstract" data contents from top to bottom are t_n, ..., t_1
structures to low-level "concrete" ones (see for some n ≥ 0; then the procedure's re-
Hoare's chapter in [17] for numerous ex- maining duty is to accomplish the effect of
amples). In the case of Example 6c, we can
replace the stack by an array and a pointer, treeprint(t);
print(A[t_n]); treeprint(R[t_n]);
arriving at ...
Example 6d: print(A[t_1]); treeprint(R[t_1]); (*)

procedure treeprint(t); integer t; value t; In other words, the purpose of the stack


begin integer array S[1:n]; integer k; k := 0;
is to record postponed obligations to print
L1: loop while t ≠ 0: the A's and right subtrees of certain nodes.
k := k+1; S[k] := t;
t := L[t]; go to L1; Once this concept is grasped, the meaning
L2: t := S[k]; k := k-1; of the program is clear and we can even see
print(A[t]); how we might have written it without ever
t := R[t]; thinking of a recursive formulation or a
repeat; go to statement: The innermost loop ensures
if k ≠ 0 then go to L2 fi;
end. t = 0, and afterwards the program reduces
the stack, maintaining (*) as the condition
Here the programmer must specify a safe to be fulfilled, at key points in the outer loop.
value for the maximum stack size n, in order A careful programmer might notice a
to make the transformation legitimate. Al- source of inefficiency in this program: when
ternatively, he may wish to implement the L[t] = 0, we put t on the stack, then take it
stack by a linked list. This choice can usually off again. If there are n nodes in a binary
be made without difficulty, and it illustrates tree, about half of them, on the average, will
another area in which interaction is prefer- have L[t] = 0 so we might wish to avoid
able to completely automatic transforma- this extra computation. It isn't easy to do
tions. that to Example 6e without major surgery
on the structure; but it is easy to modify
Recursion vs. Iteration
Example 6c (or 6d), by simply bracketing
Before leaving the treeprint example, I would the source of inefficiency, including the go
like to pursue the question of go to elimina- to, and the label, and all.
tion from Example 6c, since this leads to
some interesting issues. It is clear that the Example 6f:
first go to is just a simple iteration, and a
little further study shows that Example 6c procedure treeprint(t); value t; integer t;
is just one simple iteration inside another, begin integer stack S; S := empty;
L1: loop while t ≠ 0:
namely (in Dahl's syntax) L3: if L[t] ≠ 0
Example 6e: then S <= t; t := L[t]; go to L1;
L2: t <= S;
procedure treeprint(t); integer t; value t; fi;
begin integer stack S; S := empty; print(A[t]);
loop: t := R[t];
loop while t ≠ 0: repeat;
S <= t; if nonempty(S) then go to L2 fi;
t := L[t]; end.
repeat;
while nonempty(S): Here we notice that a further simplification
t <= S; is possible: go to L1 can become go to L3
print(A[t]);
t := R[t]; because t is known to be nonzero.
repeat; An equivalent go to-free program analo-
end. gous to Example 6e is


Example 6g: found that the corresponding recursive ver-


procedure treeprint(t); value t; integer t;
sion took about 2.1 units of time per node
begin integer stack S; S := empty; using our ALGOL W compiler for the 360/67;
loop until finished: and the ratio was 1.16 using the SAIL com-
if t ≠ 0 piler for the PDP-10. (Incidentally, the
then relative run-times for Example 6f were 0.8
loop while L[t] ≠ 0:
S <= t; with ALGOL W, and 0.7 with SAIL. When
t := L[t]; subscript ranges were dynamically checked,
repeat; ALGOL W took 1.8 units of time per node for
else the nonrecursive version, and 2.8 with the
if nonempty(S) recursive version; SAIL's figures were 1.28
then t <= S;
else finished; and 1.34.)
fi;
fi; Boolean Variable Elimination
print(A[t]);
t := R[t]; Another important program transformation,
repeat; somewhat less commonly known, is the re-
end. moval of Boolean variables by code duplica-
tion. The following example is taken from
I derived this program by thinking of the Dijkstra's treatment [26, pp. 91-93] of
loop invariant (*) in Example 6e and acting
Hoare's "Quicksort" algorithm. The idea is
accordingly, not by trying to eliminate the to rearrange array elements A[m]... A[n] so
go to's from Example 6f. So I know this
that they are partitioned into two parts:
program is well-structured, and I therefore
The left part A[m] ... A[j-1], for some
haven't succeeded in finding an example of appropriate j, will contain all the elements
recursion removal where go to's are strictly less than some value, v; the right part
necessary. It is interesting, in fact, that our A[j+1] ... A[n] will contain all the elements
transformations originally intended for effi-
greater than v; and the element A[j] lying
ciency led us to new insights and to programs between these parts will be equal to v.
that still possess decent structure. However, Partitioning is done by scanning from the
I still feel that Example 6f is easier to under- left until finding an element greater than v,
stand than 6g, given that the reader is told
then scanning from the right until finding an
the recursive program it comes from and the
element less than v, then scanning from the
transformations that were used. The recur-
left again, and so on, moving the offending
sive program is trivially correct, and the
elements to the opposite side, until the two
transformations require only routine verifi-
scans come together; a Boolean variable up
cation; by contrast, a mental leap is needed
is used to distinguish the left scan from the
to invent (*). right.
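A go to-free traversal in the spirit of Example 6g can be tried out with the following Python sketch. It is an illustration under stated assumptions, not the paper's own code: the event "finished" is rendered with break, the lists A, L, and R follow the text's conventions (index 0 is the empty tree), and the sample tree and function name are hypothetical.

```python
# Hypothetical sample tree; node 0 is the empty tree.
A = [None, "d", "b", "f", "a", "c", "e", "g"]
L = [0, 2, 4, 6, 0, 0, 0, 0]
R = [0, 3, 5, 7, 0, 0, 0, 0]


def treeprint_6g(t, out):
    """Symmetric-order traversal that never stacks a node whose left link is 0."""
    S = []
    while True:                  # loop until finished
        if t != 0:
            while L[t] != 0:     # descend, postponing nodes with left subtrees
                S.append(t)
                t = L[t]
        else:
            if S:
                t = S.pop()      # resume a postponed node
            else:
                break            # the event "finished"
        out.append(A[t])         # print(A[t])
        t = R[t]
```

At the top of the outer loop the stack holds exactly the nodes whose value and right subtree are still owed, which is the obligation recorded by (*).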
Does recursion elimination help? Clearly
there won't be much gain in this example if Example 7 :
the print routine itself is the bottleneck. But
i := m; j := n;
let's replace print(A[t]) by v := A[j]; up := true;
i := i+1; B[i] := A[t]; loop:
if up
then if A[i] > v
i.e., instead of printing the tree, let's assume then A[j] := A[i]; up := false fi;
that we merely want to transfer its contents else if v > A[j]
to some other array B. Then we can expect then A[i] := A[j]; up := true fi;
to see an improvement. fi;
After making this change, I tried the re- if up then i := i+1 else j := j-1 fi;
while i < j repeat;
cursive Example 6 vs. the iterative Example A[j] := v;
6d on the two main ALGOL compilers avail-
able to me. Normalizing the results so that The manipulation and testing of up is
6d takes 1.0 units of time per node of the rather time-consuming here. We can, in
tree, with subscript checking suppressed, I general, eliminate a Boolean variable by


storing its current value in the program grammer would write for the examples, and
counter, i.e., by duplicating the program, the other with the object code produced by
letting one part of the text represent true a typical compiler that does only local op-
and the other part false, with jumps be- timizations. The assembly-language pro-
tween the two parts in appropriate places. grammer will keep i, j, v, and up in registers,
Example 7 therefore becomes while a typical compiler will not keep vari-
Example 7a: ables in registers from one statement to
another, except if they happen to be there
i := m; j := n; by coincidence. Under these assumptions,
v := A[j]; the asymptotic running time for an entire
loop: if A[i] > v
then A[j] := A[i]; go to upf fi; Quicksort program based on these routines
upt: i := i+1; will be
while i < j repeat; go to common; assembled compiled
loop: if v > A[j]
then A[i] := A[j]; go to upt fi; Example 7 20⅔ N ln N 55⅓ N ln N
upf: j := j-1; Example 7a 15⅓ N ln N 40 N ln N
while i < j repeat;
common: A[j] := v;
expressed in memory references to data and
instructions. So Example 7a saves more than
Note that again we have come up with a 25 % of the sorting time.
program which has jumps into the middle of I showed this example to Dijkstra, cau-
iterations, yet we can understand it since we tioning him that the go to leading into an
know that it came from a previously under- iteration might be a terrible shock. I was
stood program, by way of an understandable extremely pleased to receive his reply [31]:
transformation.
Of course this program is messier than the Your technique of storing the value of up in
the order counter is, of course, absolutely safe.
first, and we must ask again if the gain in I did not faint! I am in no sense "afraid" of a
speed is worth this cost. If we are writing a program constructed that way, but I cannot
sort procedure that will be used many times, consider it beautiful: it is really the same
we will be interested in the speed. The repetition with the same terminating condi-
average running time of Quicksort was tion, that "changes color" as the computation
proceeds.
analyzed by Hoare in his 1962 paper on the
subject [36], and it turns out that the body He went on to say that he looks forward to
of the loop in Example 7 is performed about the day when machines are so fast that we
2N ln N times while the statement up := won't be under pressure to optimize our
false is performed about ⅓ N ln N times, if programs; yet
we are sorting N elements. All other parts of
For the time being I could not agree more with
the overall sorting program (not shown your closing remarks: if the economies matter,
here) have a running time of order N or less, apply "disciplined optimalization" to a nice
so when N is reasonably large the speed of program, the correctness of which has been
the inner loop governs the speed of the entire established beyond reasonable doubt. Your
sorting process. (Incidentally, a recursive massaging of the program text is then no
longer trickery ad hoc, it is perfectly safe and
version of Quicksort will run just about as sound.
fast, since the recursion overhead is not
part of the inner loop. But in this case the It is hard for me to express the joy that this
removal of recursion is of great value for letter gave me; it was like having all my
another reason, because it cuts the auxiliary sins forgiven, since I need no longer feel
stack space requirement from order N to guilty about my optimized programs.
order log N.)
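The Boolean-elimination idea behind Examples 7 and 7a can be sketched in Python as follows. This is an illustration under assumptions rather than the paper's code: Python has no go to, so each jump between the two halves of the duplicated text becomes leaving one inner loop and entering the other, and the shared exit labeled "common" is written out at each place it is reached. The function names and the 0-based indexing are choices made for the example.

```python
def partition_flag(A, m, n):
    """Partition A[m..n] around v = A[n] using the Boolean variable up."""
    i, j = m, n
    v = A[j]
    up = True
    while True:
        if up:
            if A[i] > v:
                A[j] = A[i]
                up = False
        else:
            if v > A[j]:
                A[i] = A[j]
                up = True
        if up:
            i += 1
        else:
            j -= 1
        if not i < j:
            break
    A[j] = v
    return j


def partition_dup(A, m, n):
    """Same computation; the value of up is implied by which loop is running."""
    i, j = m, n
    v = A[j]
    while True:
        # Text that stands for up = true: scan from the left.
        while not A[i] > v:
            i += 1
            if not i < j:
                A[j] = v           # common exit
                return j
        A[j] = A[i]                # switch to the other half of the text
        # Text that stands for up = false: scan from the right.
        while True:
            j -= 1
            if not i < j:
                A[j] = v           # common exit
                return j
            if v > A[j]:
                A[i] = A[j]        # switch back to the first half
                break
        i += 1
        if not i < j:
            A[j] = v               # common exit
            return j
```

Both functions should leave the list in the same state and return the same index, so running them side by side on copies of the same data is an easy check that the duplication preserved behavior.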
Using these facts about inner loop times, Coroutines
we can make a quantitative comparison of Several of the people who read the first draft
Examples 7 and 7a. As with Example 1, it of this paper observed that Example 7a can
seems best to make two comparisons, one perhaps be understood more easily as the
with the assembly code that a decent pro- result of eliminating coroutine linkage instead


of Boolean variables. Consider the following coroutines, or by forming an equivalent


program: program which expresses the coroutine link-
age in terms of go to statements; it appears
Example 7b:
to be cumbersome (though not impossible)
coroutine move i; to do the job without using either feature.
loop: if A[i] > v
then A[j] := A[i];
resume move j; Quicksort: A Digression
fi; Dijkstra also sent another instructive ex-
i := i+1; ample in his letter [30]. He decided to create
while i < j repeat; the program of Example 7 from scratch, as
coroutine move j; if Hoare's algorithm had never been in-
loop: if v > A[j]
then A[i] := A[j]; vented, starting instead with modern ideas
resume move i; of semi-automatic program construction
fi; based on the following invariant relation:
j := j-1;
while i < j repeat; v = A[n] ∧
i := m; j := n; v := A[j]; ∀k(m ≤ k < i ⇒ A[k] ≤ v) ∧
call move i; ∀k(j < k < n ⇒ A[k] ≥ v).
A[j] := v;
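The tree-merging problem that this section attributes to Strachey can be sketched with Python generators, which are a restricted form of coroutine: a generator can only hand control back to its caller, not resume an arbitrary partner. The sketch below is an illustration under assumptions. Each tree is passed as a hypothetical tuple (root, A, L, R) with index 0 as the empty tree, the traversal is the stack-based one in generator form, and the standard-library function heapq.merge plays the role of the routine that alternates between the two traversals.

```python
import heapq


def symmetric_order(t, A, L, R):
    """Yield the A values of the tree rooted at t in symmetric order."""
    stack = []
    while True:
        while t != 0:
            stack.append(t)
            t = L[t]
        if not stack:
            return
        t = stack.pop()
        yield A[t]          # control goes back to the consumer here
        t = R[t]


def merge_trees(tree1, tree2):
    """Merge two trees whose values increase in symmetric order."""
    return list(heapq.merge(symmetric_order(*tree1), symmetric_order(*tree2)))


# Two hypothetical three-node trees: 1 < 3 < 5 and 2 < 4 < 6.
tree1 = (1, [None, 3, 1, 5], [0, 2, 0, 0], [0, 3, 0, 0])
tree2 = (1, [None, 4, 2, 6], [0, 2, 0, 0], [0, 3, 0, 0])
merged = merge_trees(tree1, tree2)
```

Each traversal keeps its own stack and its own position between values, which is the state that would otherwise have to be saved and restored by hand.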
The resulting program is unusual, yet per-
When a coroutine is "resumed", let's as- haps cleaner than Example 7:
sume that it begins after its own resume
i := m; j := n-1; v := A[n];
statement; and when a coroutine terminates, loop while i ≤ j:
let's assume that the most recent call state- if A[j] ≥ v then j := j-1;
ment is thereby completed. (Actual coroutine else A[i] :=: A[j]; i := i+1;
linkage is slightly more involved, see Chapter fi;
3 of [17], but this description will suffice for repeat;
if j < m then A[m] :=: A[n]; j := m fi;
our purposes.) Under these conventions,
Example 7b is precisely equivalent to Ex- Here ":=:" denotes the interchange (i.e.,
ample 7a. At the beginning of move i we swap) operation. At the conclusion of this
know that A[k] ≤ v for all k < i, and that program, the A array will be different than
i < j, and that {A[m], ..., A[j-1], A[j+1], before, but we will have the array parti-
..., A[n]} ∪ v is a permutation of the orig- tioned as desired for sorting (i.e., A[m] ... A[j]
inal contents of {A[m], ..., A[n]}; a similar are ≤ v and A[j+1] ... A[n] are ≥ v).
statement holds at the beginning of move j. Unfortunately, however, this "pure" pro-
This separation into two coroutines can be gram is less efficient than Example 7, and
said to make Example 7b conceptually sim- Dijkstra noted that he didn't like it very
pler than Example 7; but on the other hand, much himself. In fact, Quicksort is really
the idea of coroutines admittedly takes some quick in practice because there is a method
getting used to. that is even better than Example 7a: A good
Christopher Strachey once told me about Quicksort routine will have a faster inner
an example which first convinced him that loop which avoids most of the "i < j " tests.
coroutines provided an important control Dijkstra recently [31] sent me another ap-
structure. Consider two binary trees repre- proach to the problem, which leads to a
sented as in Examples 5 and 6, with their A much better solution. First we can abstract
array information in increasing order as we the situation by considering any notions
traverse the trees in symmetric order of their "small" and "large" so that: a) an element
nodes. The problem is to merge these two A A[i] is never both small and large simultane-
array sequences into one ordered sequence. ously; b) some elements might be neither
This requires traversing both trees more or small nor large; c) we wish to rearrange an
less asynchronously, in symmetric order, so array so that all small elements precede all
we'll need two versions of Example 6 running large ones; and d) there is at least one ele-
cooperatively. A conceptually simple solu- ment which is not small, and at least one
tion to this problem can be written with which is not large. Then we can write the


following program in terms of this abstrac- i := m-1; j := n; v := A[n];


tion. loop until pointers have met:
loop: i := i+1; while A[i] < v repeat;
Example 8: if i ≥ j then pointers have met; fi
A[j] := A[i];
i := m; j := n; loop: j := j-1; while A[j] > v repeat;
loop: if i ≥ j then j := i; pointers have met; fi
loop while A[i] is small: A[i] := A[j];
i := i+1; repeat; repeat;
loop while A[j] is large: A[j] := v;
j := j-1; repeat;
while i < j: At the conclusion of this routine, the
A[i] :=: A[j]; contents of A[m] ... A[n] have been per-
i := i+1; j := j-1; muted so that A[m] ... A[j-1] are ≤ v
repeat;
and A[j+1] ... A[n] are ≥ v and A[j] = v
At the beginning of the first (outer) loop and m < j < n. The assembled version will
we know that A[k] is not large for m ≤ k < i, make about 11N ln N references to memory
and that A[k] is not small for j < k < n; on the average, so this program saves 28 %
also that there exists a k such that i < k ≤ n of the running time of Example 7a.
and A[k] is not small, and a k such that When I first saw Example 8 I was cha-
m < k < j and A[k] is not large. The opera- grined to note that it was easier to prove
tions in the loop are easily seen to preserve than my program, it was shorter, and (the
these "invariant" conditions. Note that the crushing blow) it also seemed about 3%
inner loops are now extremely fast, and that faster, because it tested "i < j" only half
they are guaranteed to terminate; therefore as often. My first mathematical analysis of
the proof of correctness is simple. At the the average behavior of Example 8 indicated
conclusion of the outer loop we know that that the asymptotic number of comparisons
A[m] ... A[i-1] and A[j] are not large, that and exchanges would be the same, even
A[i] and A[j+1] ... A[n] are not small, and though the partitioned subfiles included all
that m < j < i < n. N elements instead of N - 1 as in the classical
Applying this to Quicksort, we can set Quicksort routine. But suddenly it occurred
v := A[n] and write to me that my new analysis was incorrect
because one of its fundamental assumptions
"A[i] < v" in place of "A[i] is small"
"A[j] > v" in place of "A[j] is large" breaks down: the elements of the two subfiles
after partitioning by Example 8 are not in
in the above program. This gives a very random order! This was a surprise, because
pretty algorithm, which is essentially equiva- randomness is preserved by the usual Quick-
lent to the method published by Hoare [38] sort routine. When the N keys are distinct,
in his first major application of the idea of v will be the largest element in the left subfile,
invariants, and discussed in his original and the mechanism of Example 8 shows
paper on Quicksort [36]. Note that since that v will tend to be near the left of that
v = A[n], we know that the first execution subfile. When that subfile is later partitioned,
of "loop while A[j] > v" will be trivial; it is highly likely that v will move to the
we could move this loop to the end of the extreme right of the resulting right sub-
outer loop just before the final repeat. This subfile. So that right sub-subfile will be
would be slightly faster, but it would make subject to a trivial partitioning by its largest
the program harder to understand, so I element; we have a subtle loss of efficiency
would hesitate to do it. on the third level of recursion. I still haven't
The Quicksort partitioning algorithm been able to analyze Example 8, but empiri-
actually given in my book [54] is better than cal tests have borne out my prediction that
Example 7a, but somewhat different from it is in fact about 15 % slower than the book
the program we have just derived. My algorithm.
version can be expressed as follows (assum- Therefore, there is no reason for anybody
ing that A[m-1] is defined and < A[n]):


though it is slightly cleaner looking than the Each of these programs leads to a Quick-
method in my book, it is noticeably slower, sort routine that makes about 10⅔ N ln N
and we have nothing to fear by using a memory references on the average; the
slightly more complicated method once it former is preferable (except on machines
has been proved correct. Beautiful algo- for which exchanges are clumsy), since it is
rithms are, unfortunately, not always the easier to understand. Thus I learned again
most useful. that I should always keep looking for im-
This is not the end of the Quicksort provements, even when I have a satisfactory
story (although I almost wish it was, since program.
I think the preceding paragraph makes an
important point). After I had shown Ex- Axiomatics of Jumps
ample 8 to my student, Robert Sedgewick, We have now discussed many different
he found a way to modify it, preserving transformations on programs; and there are
the randomness of the subfiles, thereby more which could have been mentioned (e.g.,
achieving both elegance and efficiency at the removal of trivial assignments as in [50,
the same time. Here is his revised program. exercise 1.1-3] or [54, exercise 5.2.1-33]).
Example 8a:
This should be enough to establish that a
program-manipulation system will have
i := m-1; j := n; v := A[n]; plenty to do.
loop: Some of these transformations introduce
loop: i := i+1; while A[i] < v repeat;
loop: j := j-1; while A[j] > v repeat;
go to statements that cannot be handled
while i < j: very nicely by event indicators, and in
A[i] :=: A[j]; general we might expect to find a few pro-
repeat; grams in which go to statements survive.
A[i] :=: A[n]; Is it really a formidable job to understand
(As in the previous example, we assume such programs? Fortunately this is not an
that A[m-1] is defined and < A[n], since insurmountable task, as recent work has
the j pointer might run off the left end.) shown. For many years, the go to statement
At the beginning of the outer loop the in- has been troublesome in the definition of
variant conditions are now correctness proofs and language semantics;
for example, Hoare and Wirth have pre-
m-1 ≤ i < j ≤ n;
A[k] ≤ v for m-1 ≤ k ≤ i;
A[k] ≥ v for j ≤ k ≤ n; [41] in which everything but real arithmetic
A[n] = v. and the go to is defined formally. Clint and
Hoare [14] have shown how to extend this
It follows that Example 8a ends with to event-indicator go to's (i.e., those which
A[m] ... A[i-1] ≤ v = A[i] ≤ A[i+1] ... A[n] don't lead into iterations or conditionals),
but they stressed that the general case
and m < i < n; hence a valid partition has appears to be fraught with complications.
been achieved. Just recently, however, Hoare has shown
Sedgewick also found a way to improve that there is, in fact, a rather simple way
the inner loop of the algorithm from my to give an axiomatic definition of go t o
book, namely: statements; indeed, he wishes quite frankly
i := m-1; j := n; v := A[n]; that it hadn't been quite so simple. For each
loop: label L in a program, the programmer should
loop: i := i+1; while A[i] < v repeat; state a logical assertion a(L) which is to be
A[j] := A[i]; true whenever we reach L. Then the axioms
loop: j := j-1; while A[j] > v repeat;
while i < j: {a(L)} go to L {false}
A[i] := A[j];
repeat; plus the rules of inference
if i ≠ j then j := j+1;
A[j] := v; {a(L)} S {P} ⊢ {a(L)} L: S {P}


are allowed in program proofs, and all and jump on overflow might be
properties of labels and go to's will follow
i f overflow
if the a(L) are selected intelligently. One then overflow := false; go to jump;
must, of course, carry out the entire proof else go to no op;
using the same assertion a(L) for each fi;
appearance of the label L, and some choices I still believe that this is the correct way to
of assertions will lead to more powerful write such a program.
results than others. Such situations aren't restricted to in-
Informally, a(L) represents the desired terpreters and simulators, although the
state of affairs at label L; this definition foregoing is a particularly dramatic example.
says essentially that a program is correct if Multiway branching is an important pro-
a(L) holds at L and before all "go to L" gramming technique which is all too often
statements, and that control never "falls replaced by an inefficient sequence of i f
through" a go to statement to the following tests. Peter Naur recently wrote me that he
text. Stating the assertions a(L) is analogous considers the use of tables to control program
to formulating loop invariants. Thus, it is flow as a basic idea of computer science that
not difficult to deal formally with tortuous has been nearly forgotten; but he expects it
program structure if it turns out to be will be ripe for rediscovery any day now. It
necessary; all we need to know is the "mean- is the key to efficiency in all the best com-
ing" of each label. pilers I have studied.
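The table-driven multiway branch used in this section's interpreter example can be sketched in Python as follows. This is an illustration under assumptions, not the paper's code: opcodes are hypothetical strings, operands are immediate values rather than fetched from memory, LIMIT is an invented word size that triggers the overflow flag, and each "go to" from one routine to another becomes a call made as the routine's last action.

```python
LIMIT = 100  # hypothetical word size: |accum| above this sets the overflow flag


class Machine:
    def __init__(self):
        self.accum = 0
        self.overflow = False
        self.tyme = 0          # simulated time, spelled as in the text
        self.pc = 0
        # The table that controls program flow: opcode -> routine.
        self.table = {
            "NOP": self.no_op,
            "ADD": self.add,
            "SUB": self.subtract,
            "JOV": self.jump_on_overflow,
            "JMP": self.jump,
        }

    def no_op(self, operand):
        self.pc += 1

    def add(self, operand):
        self.accum += operand
        if abs(self.accum) > LIMIT:
            self.overflow = True
        self.tyme += 1
        self.no_op(operand)            # go to no op

    def subtract(self, operand):
        self.add(-operand)             # operand := -operand; go to add

    def jump(self, operand):
        self.pc = operand

    def jump_on_overflow(self, operand):
        if self.overflow:
            self.overflow = False
            self.jump(operand)         # go to jump
        else:
            self.no_op(operand)        # go to no op

    def run(self, program):
        while 0 <= self.pc < len(program):
            opcode, operand = program[self.pc]
            self.table[opcode](operand)    # multiway branch through the table
        return self.accum
```

A program here is a list of (opcode, operand) pairs, for example [("ADD", 5), ("SUB", 2), ("NOP", 0)]. The subtract routine does a little work and then hands the simpler problem to add, which is the reduction of one case to another that the text describes.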
Some hints of this situation, where one
Reduction of Complication problem reduces to another, have occurred
There is one remaining use of go to for in previous examples of this paper. Thus,
which I have never seen a good replacement, after searching for x and discovering that
and in fact it's a situation where I still it is absent, the "not found" routine can
think go to is the right idea. This situation insert x into the table, thereby reducing the
typically occurs after a program has made a problem to the "found" case. Consider also
multiway branch to a rather large number our decision-table Example 4, and suppose
of different but related cases. A little com- that each period was to be followed by a
putation often suffices to reduce one case to carriage return instead of by an extra space.
another; and when we've reduced one problem Then it would be natural to reduce the
to a simpler one, the most natural thing is post-processing of periods to the return-
for our program to go to the routine which carriage part of the program. In each case, a
solves the simpler problem. go to would be easy to understand.
For example, consider writing an interpre- If we need to find a way to do this without
tive routine (e.g., a microprogrammed saying go to, we could extend Zahn's event
emulator), or a simulator of another com- indicator scheme so that some events are
puter. After decoding the address and fetch- allowed to happen in the t h e n . . , fl part
ing the operand from memory, we do a after we have begun to process other events.
multiway branch based on the operation This accommodates the above-mentioned
code. Let's say the operations include no-op, examples very nicely; but of course it can
add, subtract, jump on overflow, and uncon- be dangerous when misused, since it gives us
ditional jump. Then the subtract routine back all the power of go to. A restriction
might be which allows (statement list)~ to refer to
(event)j only for j > i would be less dan-
operand : = -- operand; g o t o a d d ; gerous.
With such a language feature, we can't
the add routine might be "fall through" a label (i.e., an event indi-
accum := accum -b operand; cator) when the end of the preceding code
tyme : = tyme ...I- 1; is reached; we must explicitly name each
go to no op; event when we go to its routine. ProI~fibiting

Computing Surveys, Vol. 6, No. 4, December 1974


Structured Programming with go to Statements • 291

"fall through" means forcing a programmer to write "go to common" just
before the label "common:" in Example 7a; surprisingly, such a change
actually makes that program more readable, since it makes the symmetry
plain. Also, the program fragment

    subtract: operand := - operand; go to add;
    add: accum := accum + operand;

seems to be more readable than if "go to add" were deleted. It is
interesting to ponder why this is so.

3. CONCLUSIONS

This has been a long discussion, and very detailed, but a few points stand
out. First, there are several kinds of programming situations in which go to
statements are harmless, even desirable, if we are programming in ALGOL or
PL/I. But secondly, new types of syntax are being developed that provide
good substitutes for these harmless go to's, and without encouraging a
programmer to create "logical spaghetti".

One thing we haven't spelled out clearly, however, is what makes some
go to's bad and others acceptable. The reason is that we've really been
directing our attention to the wrong issue, to the objective question of
go to elimination instead of the important subjective question of program
structure. In the words of John Brown [9], "The act of focusing our
mightiest intellectual resources on the elusive goal of go to-less programs
has helped us get our minds off all those really tough and possibly
unresolvable problems and issues with which today's professional programmer
would otherwise have to grapple." By writing this long article I don't want
to add fuel to the controversy about go to elimination, since that topic has
already assumed entirely too much significance; my goal is to lay that
controversy to rest, and to help direct the discussion towards more fruitful
channels.

Structured Programming

The real issue is structured programming, but unfortunately this has become
a catch phrase whose meaning is rarely understood in the same way by
different people. Everybody knows it is a Good Thing, but as McCracken [64]
has said, "Few people would venture a definition. In fact, it is not clear
that there exists a simple definition as yet." Only one thing is really
clear: Structured programming is not the process of writing programs and
then eliminating their go to statements. We should be able to define
structured programming without referring to go to statements at all; then
the fact that go to statements rarely need to be introduced as we write
programs should follow as a corollary.

Indeed, Dijkstra's original article [25] which gave Structured Programming
its name never mentions go to statements at all; he directed attention to
the critical question, "For what program structures can we give correctness
proofs without undue labor, even if the programs get large?" By correctness
proofs he explained that he does not mean formal derivations from axioms, he
means any sort of proof (formal or informal) that is "sufficiently
convincing"; and a proof really means an understanding. By program structure
he means data structure as well as control structure.

We understand complex things by systematically breaking them into
successively simpler parts and understanding how these parts fit together
locally. Thus, we have different levels of understanding, and each of these
levels corresponds to an abstraction of the detail at the level it is
composed from. For example, at one level of abstraction, we deal with an
integer without considering whether it is represented in binary notation or
two's complement, etc., while at deeper levels this representation may be
important. At more abstract levels the precise value of the integer is not
important except as it relates to other data.

Charles L. Baker mentioned this principle as early as 1957, as part of his
8-page review [2] of McCracken's first book on programming:

    Break the problem into small, self-contained subroutines, trying at all
    times to isolate the various sections of coding as much as possible
    . . . [then] the problem is reduced to many much smaller ones. The truth
    of this seems

292 • Donald E. Knuth

    very obvious to experienced coders, yet it is hard to put across to the
    newcomer.

Abstraction is easily understood in terms of BNF notation. A metalinguistic
category like ⟨assignment statement⟩ is an abstraction which is composed of
two abstractions (a ⟨left part list⟩ and an ⟨arithmetic expression⟩), each
of which is composed of abstractions such as ⟨identifier⟩ or ⟨term⟩, etc. We
understand the program syntax as a whole by knowing the structural details
that relate these abstract parts. The most difficult things to understand
about a program's syntax are the identifiers, since their meaning is passed
across several levels of structure. If all identifiers of an ALGOL program
were changed to random meaningless strings of symbols, we would have great
difficulty seeing what the type of a variable is and what the program means,
but we would still easily recognize the more local features, such as
assignment statements, expressions, subscripts, etc. (This inability for our
eyes to associate a type or mode with an identifier has led to what I
believe are fundamental errors of human engineering in the design of
ALGOL 68, but that's another story. My own notation for stacks in Example 6e
suffers from the same problem; it works in these examples chiefly because t
is lower case and S is upper case.) Larger nested structures are harder for
the eye to see unless they are indented, but indentation makes the structure
plain.

It would probably be still better if we changed our source language concept
so that the program wouldn't appear as one long string. John McCarthy says
"I find it difficult to believe that whenever I see a tree I am really
seeing a string of symbols." Instead, we should give meaningful names to the
larger constructs in our program that correspond to meaningful levels of
abstraction, and we should define those levels of abstraction in one place,
and merely use their names (instead of including the detailed code) when
they are used to build larger concepts. Procedure names do this, but the
language could easily be designed so that no action of calling a subroutine
is implied.

From these remarks it is clear that sequential composition, iteration, and
conditional statements present syntactic structures that the eye can readily
assimilate; but a go to statement does not. The visual structure of go to
statements is like that of flowcharts, except reduced to one dimension in
our source languages. In two dimensions it is possible to perceive go to
structure in small examples, but we rapidly lose our ability to understand
larger and larger flowcharts; some intermediate levels of abstraction are
necessary. As an undergraduate, in 1959, I published an octopus flowchart
which I sincerely hope is the most horribly complicated that will ever
appear in print; anyone who believes that flowcharts are the best way to
understand a program is urged to look at this example [49]. (See also
[32, p. 54] for a nice illustration of how go to's make a PL/I program
obscure, and see R. Lawrence Clark's hilarious spoof about linear
representation of flowcharts by means of a "come from statement" [13].)

I have felt for a long time that a talent for programming consists largely
of the ability to switch readily from microscopic to macroscopic views of
things, i.e., to change levels of abstraction fluently. I mentioned this
[55] to Dijkstra, and he replied [29] with an excellent analysis of the
situation:

    I feel somewhat guilty when I have suggested that the distinction or
    introduction of "different levels of abstraction" allow you to think
    about only one level at a time, ignoring completely the other levels.
    This is not true. You are trying to organize your thoughts; that is, you
    are seeking to arrange matters in such a way that you can concentrate on
    some portion, say with 90% of your conscious thinking, while the rest is
    temporarily moved away somewhat towards the background of your mind. But
    that is something quite different from "ignoring completely": you allow
    yourself temporarily to ignore details, but some overall appreciation of
    what is supposed to be or to come there continues to play a vital role.
    You remain alert for little red lamps that suddenly start flickering in
    the corners of your eye.

I asked Hoare for a short definition of structured programming, and he
replied that it is "the systematic use of abstraction to control a mass of
detail, and also a means of documentation which aids program design."


I hope that my remarks above have made the abstract concept of abstraction
clear; the second part of Hoare's definition (which was also stressed by
Dijkstra in his original paper [25]) states that a good way to express the
abstract properties of an unwritten piece of program often helps us to write
that program, and to "know" that it is correct as we write it.

Syntactic structure is just one part of the picture, and BNF would be
worthless if the syntactic constructs did not correspond to semantic
abstractions. Similarly, a good program will be composed in such a way that
each semantic level of abstraction has a reasonably simple relation to its
constituent parts. We noticed in our discussion of Jacopini's theorem that
every program can trivially be expressed in terms of a simple iteration
which simulates a computer; but that iteration has to carry the entire
behavior of the program through the loop, so it is worthless as a level of
abstraction.

An iteration statement should have a purpose that is reasonably easy to
state; typically, this purpose is to make a certain Boolean relation true
while maintaining a certain invariant condition satisfied by the variables.
The Boolean condition is stated in the program, while the invariant should
be stated in a comment, unless it is easily supplied by the reader. For
example, the invariant in Example 1 is that A[k] ≠ x for 1 ≤ k < i, and in
Example 2 it is the same, plus the additional relation A[m+1] = x. Both of
these are so obvious that I didn't bother to mention them; but in Examples
6e and 8, I stated the more complicated invariants that arose. In each of
those cases the program almost wrote itself once the proper invariant was
given. Note that an "invariant assertion" actually does vary slightly as we
execute statements of the loop, but it comes back to its original form when
we repeat the loop.

Thus, an iteration makes a good abstraction if we can assign a meaningful
invariant describing the local states of affairs as it executes, and if we
can describe its purpose (e.g., to change one state to another). Similarly,
an if ... then ... else ... fi statement will be a good abstraction if we
can state an overall purpose for the statement as a whole.

We also need well-structured data; i.e., as we write the program we should
have an abstract idea of what each variable means. This idea is also usually
describable as an invariant relation, e.g.: "m is the number of items in the
table" or "x is the search argument" or "L[t] is the number of the root node
of node t's left subtree, or 0 if this subtree is empty" or "the contents of
stack S are postponed obligations to do such and such".

Now let's consider the slightly more complex case of an event-driven
construct. This should also correspond to a meaningful abstraction, and our
examples show what is involved: For each event we give an (invariant)
assertion which describes the situation which must hold when that event
occurs, and for the loop until we also give an invariant for the loop. An
event statement typically corresponds to an abrupt change in conditions so
that a different assertion from the loop invariant is necessary.

An error exit can be considered well-structured for precisely this
reason--it corresponds to a situation that is impossible according to the
local invariant assertions; it is easiest to formulate assertions that
assume nothing will go wrong, rather than to make the invariants cover all
contingencies. When we jump out to an error exit we go to another level of
abstraction having different assumptions.

As another simple example, consider binary search in an ordered array using
the invariant relation A[i] < x < A[j]:

    loop while i+1 < j;
        k := (i+j) ÷ 2;
        if A[k] < x then i := k;
        else if A[k] > x then j := k;
             else cannot preserve the invariant fi;
        fi;
    repeat;

Upon normal exit from this loop, the conditions i+1 ≥ j and A[i] < x < A[j]
imply that A[i] < x < A[i+1], i.e., that x is not present. If the program
comes to "cannot preserve the invariant" (because x = A[k]), it wants to
go to another set of assumptions. The event-driven construct


provides a level at which it is appropriate to specify the other
assumptions.

Another good illustration occurs in Example 6g; the purpose of the main if
statement is to find the first node whose A value should be printed. If
there is no such t, the event "finished" has clearly occurred; it is better
to regard the if statement as having the stated abstract purpose without
considering that t might not exist.

With go to Statements

We can also consider go to statements from the same point of view; when do
they correspond to a good abstraction? We've already mentioned that go to's
do not have a syntactic structure that the eye can grasp automatically; but
in this respect they are no worse off than variables and other identifiers.
When these are given a meaningful name corresponding to the abstraction
(N.B. not a numeric label!), we need not apologize for the lack of syntactic
structure. And the appropriate abstraction itself is an invariant
essentially like the assertions specified for an event.

In other words, we can indeed consider go to statements as part of
systematic abstraction; all we need is a clearcut notion of exactly what it
means to go to each label. This should come as no great surprise. After all,
a lot of computer programs have been written using go to statements during
the last 25 years, and these programs haven't all been failures! Some
programmers have clearly been able to master structure and exploit it; not
as consistently, perhaps, as in modern-day structured programming, but not
inflexibly either. By now, many people who have never had any special
difficulty writing correct programs have naturally been somewhat upset after
being branded as sinners, especially when they know perfectly well what
they're doing; so they have understandably been less than enthusiastic about
"structured programming" as it has been advertised to them.

My feeling is that it's certainly possible to write well-structured programs
with go to statements. For example, Dijkstra's 1965 program about concurrent
process control [24] used three go to statements, all of which were
perfectly easy to understand; and I think at most two of these would have
disappeared from his code if ALGOL 60 had had a while statement. But go to
is hardly ever the best alternative now, since better language features are
appearing. If the invariant for a label is closely related to another
invariant, we can usually save complexity by combining those two into one
abstraction, using something other than go to for the combination.

There is also another problem, namely at what level of abstraction should we
introduce a label? This however is like the analogous problem for variables,
and the general answer is still unclear in both cases. Aspects of data
structure are often postponed, but sometimes variables are defined and
passed as "parameters" to other levels of abstraction. There seems to be no
clearcut idea as yet about a set of syntax conventions, relating to the
definition of variables, which would be most appropriate to structured
programming methodology; but for each particular problem there seems to be
an appropriate level.

Efficiency

In our previous discussion we concluded that premature emphasis on
efficiency is a big mistake which may well be the source of most programming
complexity and grief. We should ordinarily keep efficiency considerations in
the background when we formulate our programs. We need to be subconsciously
aware of the data processing tools available to us, but we should strive
most of all for a program that is easy to understand and almost sure to
work. (Most programs are probably only run once; and I suppose in such cases
we needn't be too fussy about even the structure, much less the efficiency,
as long as we are happy with the answers.)

When efficiencies do matter, however, the good news is that usually only a
very small fraction of the code is significantly involved. And when it is
desirable to sacrifice clarity for efficiency, we have seen that it is
possible to produce reliable programs that can be


maintained over a period of time, if we start with a well-structured program
and then use well-understood transformations that can be applied
mechanically. We shouldn't attempt to understand the resulting program as it
appears in its final form; it should be thought of as the result of the
original program modified by specified transformations. We can envision
program manipulation systems which will facilitate making and documenting
these transformations.

In this regard I would like to quote some observations made recently by
Pierre-Arnoul de Marneffe [19]:

    In civil engineering design, it is presently a mandatory concept known
    as the "Shanley Design Criterion" to collect several functions into one
    part . . . If you make a cross-section of, for instance, the German V-2,
    you find external skin, structural rods, tank wall, etc. If you cut
    across the Saturn-B moon rocket, you find only an external skin which is
    at the same time a structural component and the tank wall. Rocketry
    engineers have used the "Shanley Principle" thoroughly when they use the
    fuel pressure inside the tank to improve the rigidity of the external
    skin! . . . People can argue that structured programs, even if they work
    correctly, will look like laboratory prototypes where you can discern
    all the individual components, but which are not daily usable. Building
    "integrated" products is an engineering principle as valuable as
    structuring the design process.

He goes on to describe plans for a prototype system that will automatically
assemble integrated programs from well-structured ones that have been
written top-down by stepwise refinement.

Today's hardware designers certainly know the advantages of integrated
circuitry, but of course they must first understand the separate circuits
before the integration is done. The V-2 rocket would never have been
airborne if its designers had originally tried to combine all its functions.
Engineering has two phases, structuring and integration; we ought not to
forget either one, but it is best to hold off the integration phase until a
well-structured prototype is working and understood. As stated by Weinberg
[93], the former regimen of analysis/coding/debugging should be replaced by
analysis/coding/debugging/improving.

The Future

It seems clear that languages somewhat different from those in existence
today would enhance the preparation of structured programs. We will perhaps
eventually be writing only small modules which are identified by name as
they are used to build larger ones, so that devices like indentation, rather
than delimiters, might become feasible for expressing local structure in the
source language. (See the discussion following Landin's paper [59].)
Although our examples don't indicate this, it turns out that a given level
of abstraction often involves several related routines and data definitions;
for example, when we decide to represent a table in a certain way, we
simultaneously want to specify the routines for storing and fetching
information from that table. The next generation of languages will probably
take into account such related routines.

Program manipulation systems appear to be a promising future tool which will
help programmers to improve their programs, and to enjoy doing it. Standard
operating procedure nowadays is usually to hand code critical portions of a
routine in assembly language. Let us hope such assemblers will die out, and
we will see several levels of language instead: At the highest levels we
will be able to write abstract programs, while at the lowest levels we will
be able to control storage and register allocation, and to suppress
subscript range checking, etc. With an integrated system it will be possible
to do debugging and analysis of the transformed program using a higher level
language for communication. All levels will, of course, exhibit program
structure syntactically so that our eyes can grasp it.

I guess the big question, although it really shouldn't be so big, is whether
or not the ultimate language will have go to statements in its higher
levels, or whether go to will be confined to lower levels. I personally
wouldn't mind having go to in the highest level, just in case I really need
it; but I probably would never use it, if the general iteration and event
constructs suggested in this paper were present. As soon as people learn to
apply principles of abstraction


consciously, they won't see the need for go to, and the issue will just fade
away. On the other hand, W. W. Peterson told me about his experience
teaching PL/I to beginning programmers: He taught them to use go to only in
unusual special cases where if and while aren't right, but he found [78]
that "A disturbingly large percentage of the students ran into situations
that require go to's, and sure enough, it was often because while didn't
work well to their plan, but almost invariably because their plan was poorly
thought out." Because of arguments like this, I'd say we should, indeed,
abolish go to from the high-level language, at least as an experiment in
training people to formulate their abstractions more carefully. This does
have a beneficial effect on style, although I would not make such a
prohibition if the new language features described above were not available.
The question is whether we should ban it, or educate against it; should we
attempt to legislate program morality? In this case I vote for legislation,
with appropriate legal substitutes in place of the former overwhelming
temptations.

A great deal of research must be done if we're going to have the desired
language by 1984. Control structure is merely one simple issue, compared to
questions of abstract data structure. It will be a major problem to keep the
total number of language features within tight limits. And we must
especially look at problems of input/output and data formatting, in order to
provide a viable alternative to COBOL.

ACKNOWLEDGMENTS

I've benefited from a truly extraordinary amount of help while preparing
this paper. The individuals named provided me with a total of 144 pages of
single-spaced comments, plus six hours of conversation, and four computer
listings:

    Frances E. Allen        Ralph L. London
    Forest Baskett          Zohar Manna
    G. V. Bochmann          W. M. McKeeman
    Per Brinch Hansen       Harlan D. Mills
    R. M. Burstall          Peter Naur
    Vinton Cerf             Kjell Overholt
    T. E. Cheatham, Jr.     James Peterson
    John Cocke              W. Wesley Peterson
    Ole-Johan Dahl          Mark Rain
    Peter J. Denning        John Reynolds
    Edsger Dijkstra         Barry K. Rosen
    James Eve               E. Satterthwaite, Jr.
    K. Friedenbach          D. V. Schorre
    Donald I. Good          Jacob T. Schwartz
    Ralph E. Gorin          Richard L. Sites
    Leo Guibas              Richard Sweet
    C. A. R. Hoare          Robert D. Tennent
    Martin Hopkins          Niklaus Wirth
    James J. Horning        M. Woodger
    B. M. Leavenworth       William A. Wulf
    Henry F. Ledgard        Charles T. Zahn

These people unselfishly devoted hundreds of man-hours to helping me revise
the first draft; and I'm sorry that I wasn't able to reconcile all of their
interesting points of view. In many places I have shamelessly used their
suggestions without an explicit acknowledgment; this article is virtually a
joint paper with 30 to 40 co-authors! However, any mistakes it contains are
my own.

APPENDIX

In order to make some quantitative estimates of efficiency, I have counted
memory references for data and instructions, assuming a multiregister
computer without cache memory. Thus, each instruction costs one unit, plus
another if it refers to memory; small constants and base addresses are
assumed to be either part of the instruction or present in a register. Here
are the code sequences developed for the first two examples, assuming that a
typical assembly-language programmer or a very good optimizing compiler is
at work.


Example 1:

LABEL      INSTRUCTION       COST   TIMES
           r1 ← 1            1      1
           r2 ← m            2      1
           r3 ← x            2      1
           to test           1      1
loop:      A[r1] : r3        2      n-a
           to found if =     1      n-a
           r1 ← r1+1         1      n-1
test:      r1 : r2           1      n
           to loop if ≤      1      n
notfound:  m ← r1            2      a
           A[r1] ← r3        2      a
           B[r1] ← 0         2      a
found:     r4 ← B[r1]        2      1
           r4 ← r4+1         1      1
           B[r1] ← r4        2      1

Example 2:

LABEL      INSTRUCTION       COST   TIMES
           r2 ← m            2      1
           r3 ← x            2      1
           A[r2+1] ← r3      2      1
           r1 ← 0            1      1
loop:      r1 ← r1+1         1      n
           A[r1] : r3        2      n
           to loop if ≠      1      n
           r1 : r2           1      1
           to found if ≤     1      1
notfound:  m ← r1            etc. as in Example 1.
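
The COST and TIMES columns above can be tallied mechanically. The following Python sketch is an illustration, not part of the original analysis. It assumes that n counts executions of the loop test in Example 1 (and of the loop body in Example 2), that a is 1 when x is absent and 0 when it is found, that the notfound row "m ← r1" is executed a times like the two rows after it, and that "etc. as in Example 1" means the notfound and found sequences are reused unchanged. The names TAIL, ROWS_EXAMPLE_1, ROWS_EXAMPLE_2 and total_cost are hypothetical.

```python
# Each row is (cost per execution, times executed as a function of n and a).
# Assumed meanings: n = number of loop iterations, a = 1 if x is absent else 0.

TAIL = [  # "notfound" and "found" sequences, assumed identical in both examples
    (2, lambda n, a: a),  # notfound: m <- r1
    (2, lambda n, a: a),  # A[r1] <- r3
    (2, lambda n, a: a),  # B[r1] <- 0
    (2, lambda n, a: 1),  # found: r4 <- B[r1]
    (1, lambda n, a: 1),  # r4 <- r4 + 1
    (2, lambda n, a: 1),  # B[r1] <- r4
]

ROWS_EXAMPLE_1 = [
    (1, lambda n, a: 1),      # r1 <- 1
    (2, lambda n, a: 1),      # r2 <- m
    (2, lambda n, a: 1),      # r3 <- x
    (1, lambda n, a: 1),      # to test
    (2, lambda n, a: n - a),  # loop: A[r1] : r3
    (1, lambda n, a: n - a),  # to found if =
    (1, lambda n, a: n - 1),  # r1 <- r1 + 1
    (1, lambda n, a: n),      # test: r1 : r2
    (1, lambda n, a: n),      # to loop if <=
] + TAIL

ROWS_EXAMPLE_2 = [
    (2, lambda n, a: 1),  # r2 <- m
    (2, lambda n, a: 1),  # r3 <- x
    (2, lambda n, a: 1),  # A[r2+1] <- r3
    (1, lambda n, a: 1),  # r1 <- 0
    (1, lambda n, a: n),  # loop: r1 <- r1 + 1
    (2, lambda n, a: n),  # A[r1] : r3
    (1, lambda n, a: n),  # to loop if not equal
    (1, lambda n, a: 1),  # r1 : r2
    (1, lambda n, a: 1),  # to found if <=
] + TAIL


def total_cost(rows, n, a):
    """Sum COST * TIMES over all rows for the given n and a."""
    return sum(cost * times(n, a) for cost, times in rows)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, total_cost(ROWS_EXAMPLE_1, n, 0), total_cost(ROWS_EXAMPLE_2, n, 0))
```

Under these assumptions the tallies simplify to 6n + 3a + 10 memory references for Example 1 and 4n + 6a + 14 for Example 2, so the second version wins whenever the search runs for more than a few iterations.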


A traditional "90% efficient compiler" would render the first example as
follows:

Example 1:

LABEL      INSTRUCTION         COST   TIMES
           r1 ← 1              1      1
           to test             1      1
incr:      r1 ← i              2      n-1
           r1 ← r1+1           1      n-1
test:      r1 : m              2      n
           to notfound if >    1      n
           i ← r1              2      n-a
           r2 ← A[r1]          2      n-a
           r2 : x              2      n-a
           to found if =       1      n-a
           to incr             1      n-1
notfound:  r1 ← m              2      a
           r1 ← r1+1           1      a
           i ← r1              2      a
           m ← r1              2      a
           r1 ← x              2      a
           r2 ← i              2      a
           A[r2] ← r1          2      a
           B[r2] ← 0           2      a
found:     r1 ← i              2      1
           r2 ← B[r1]          2      1
           r2 ← r2+1           1      1
           B[r1] ← r2          2      1
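
The compiled sequence can be totalled symbolically rather than for particular inputs. The sketch below is an illustration only: it encodes each TIMES entry as coefficients of n, a and 1, assuming the garbled branch conditions read "if >" and "if =" and that the entry for "r2 ← A[r1]" is n-a. The names COMPILED_EXAMPLE_1 and symbolic_total are hypothetical.

```python
# TIMES entries as (coefficient of n, coefficient of a, constant term),
# so "n-a" is (1, -1, 0), "n-1" is (1, 0, -1), "a" is (0, 1, 0), "1" is (0, 0, 1).
N = (1, 0, 0)
N_MINUS_1 = (1, 0, -1)
N_MINUS_A = (1, -1, 0)
A = (0, 1, 0)
ONCE = (0, 0, 1)

COMPILED_EXAMPLE_1 = [
    (1, ONCE),       # r1 <- 1
    (1, ONCE),       # to test
    (2, N_MINUS_1),  # incr: r1 <- i
    (1, N_MINUS_1),  # r1 <- r1 + 1
    (2, N),          # test: r1 : m
    (1, N),          # to notfound if >
    (2, N_MINUS_A),  # i <- r1
    (2, N_MINUS_A),  # r2 <- A[r1]
    (2, N_MINUS_A),  # r2 : x
    (1, N_MINUS_A),  # to found if =
    (1, N_MINUS_1),  # to incr
    (2, A),          # notfound: r1 <- m
    (1, A),          # r1 <- r1 + 1
    (2, A),          # i <- r1
    (2, A),          # m <- r1
    (2, A),          # r1 <- x
    (2, A),          # r2 <- i
    (2, A),          # A[r2] <- r1
    (2, A),          # B[r2] <- 0
    (2, ONCE),       # found: r1 <- i
    (2, ONCE),       # r2 <- B[r1]
    (1, ONCE),       # r2 <- r2 + 1
    (2, ONCE),       # B[r1] <- r2
]


def symbolic_total(rows):
    """Return (coefficient of n, coefficient of a, constant) of sum(COST * TIMES)."""
    coeff_n = sum(cost * times[0] for cost, times in rows)
    coeff_a = sum(cost * times[1] for cost, times in rows)
    constant = sum(cost * times[2] for cost, times in rows)
    return coeff_n, coeff_a, constant


if __name__ == "__main__":
    cn, ca, c0 = symbolic_total(COMPILED_EXAMPLE_1)
    print(f"{cn}n + {ca}a + {c0}")
```

Read this way, the compiled code costs 14n + 8a + 5 memory references, more than twice the per-iteration cost that the same tally gives for the hand-coded Example 1 table.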

Answer to PL/I Problem, page 267.

The variable I is increased before FOUND is tested. One way to fix the
program is to insert "I = I - FOUND;" before the last statement.

BIBLIOGRAPHY

[1] ASHCROFT, EDWARD, AND MANNA, ZOHAR. "The translation of 'go to' programs
    to 'while' programs," Proc. IFIP Congress 1971 Vol. 1, North-Holland
    Publ. Co., Amsterdam, The Netherlands, 1972, 250-255.
[2] BAKER, CHARLES L. "Review of D. D. McCracken, Digital computer
    programming," Math. Comput. 11 (1957), 298-305.
[3] BAKER, F. TERRY, AND MILLS, HARLAN D. "Chief programmer teams,"
    Datamation 19, 12 (December 1973), 58-61.
[4] BARRON, D. W. Recursive techniques in programming, American Elsevier,
    New York, 1968, 64 pp.
[5] BAUER, F. L. "A philosophy of programming," University of London Special
    Lectures in Computer Science (October 1973): Lecture notes published by
    Math. Inst., Tech. Univ. of Munich, Germany.
[6] BERRY, DANIEL M. "Loops with normal and abnormal exits," Modeling and
    Measurement Note ~8, Computer Science Department, Univ. California, Los
    Angeles, Calif. 1974, 39 pp.
[7] BOCHMANN, G. V. "Multiple exits from a loop without the GOTO," Comm. ACM
    16, 7 (July 1973), 443-444.
[8] BÖHM, CORRADO AND JACOPINI, GIUSEPPE. "Flow-diagrams, Turing machines,
    and languages with only two formation rules," Comm. ACM 9, 5 (May 1966),
    366-371.


[9] BROWN, JOHN R. "In memoriam . . . ", un- guages, C. Boon [Ed.], Infotech State of the
published note, January 1974. Art Report 7, 1972, 217-232.
[10] BRUNO, J., AND STIEGLITZ, K. "The expres- [29] DIJKSTRA, E. W. personal communication,
sion of algorithms by charts," J. ACM 19, January 3, 1973.
3 (July 1972), 517-525. [30] DIJKSTRA, E. W. personal communication,
[11] BURKHARD, W. A. "Nonrecursive tree tra- November 19, 1973.
versal algorithms," in Proc. 7th Annual [31] DIJKSTRA, E. W. personal communication,
Princeton Conf. on Information Sciences and January 30, 1974.
Systems, Princeton Univ. Press, Princeton, [32] DONALDSON, JAMES R. "Structured pro-
N.J., 1973, 403-405. gramming," Datamation 19, 12 (December
[12] CHEATHAM, T. E., JR., AND WEGBREIT, BEN. 1973), 52-54.
"A laboratory for the study of automating [33] DYLAN, BOB. Blonde on blonde, record album
programming," in Proc. AFIPS 1972 Spring produced by Bob Johnston, Columbia Rec-
Joint Computer Conf., Vol. 40, AFIPS Press, ords, New York, March 1966, Columbia C2S
Montvale, N.J., 1972, 11-21. 841.
[13] CLARK, R. LAWRENCE. "A linguistic contri- [34] GILL, STANLEY. "Automatic computing: Its
bution to GOTO-less programming," Data- problems and prizes," Computer J. 8, 3
mation 19, 12 (December 1973), 62-63. (October 1965), 177-189.
[14] CLINT, M., AND HOARE, C. A. R. "Program [35] HENDERSON, P. AND SNOWDON, R. "An ex-
proving: jumps and functions," Acta Infor- periment in structured programming,"
matica 1, 3 (1972), 214-224. BIT 12, 1 (1972), 38-53.
[15] COOPER, D. C. "The equivalence of certain [36] HOARE, C. A. R. "Quicksort," Computer J.
computations," Computer J. 9, 1 (May 5, 1 (1962), 10-15.
1966), 45-52. [37] HOARE, C. A. R. "An axiomatic approach
[16] COOPER, D. C. "Böhm and Jacopini's re- to computer programming," Comm. ACM
duction of flow charts," Comm. ACM 10, 8 12, 10 (October 1969), 576-580, 583.
(August 1967), 463, 473. [38] HOARE, C. A. R. "Proof of a program:
[17] DAHL, O.-J., DIJKSTRA, E. W., AND HOARE, FIND," Comm. ACM 14, 1 (January 1971),
C. A. R. Structured programming, Academic 39-45.
Press, London, England, 1972, 220 pp. [39] HOARE, C. A. R. "A note on the for state-
[18] DARLINGTON, J., AND BURSTALL, R. M. "A ment," BIT 12, 3 (1972), 334-341.
system which automatically improves pro- [40] HOARE, C. A. R. "Prospects for a better
grams," in Proc. 3rd Interntl. Conf. on Arti- programming language," in High level lan-
ficial Intelligence, Stanford Univ., Stanford, guages, C. Boon [Ed.], Infotech State of
Calif., 1973, 479-485. the Art Report 7, 1972, 327-343.
[19] DE MARNEFFE, PIERRE-ARNOUL. "Holon [41] HOARE, C. A. R., AND WIRTH, N. "An axio-
programming: A survey," Universite de matic definition of the programming lan-
Liege, Service Informatique, Liege, Bel- guage PASCAL," Acta Informatica 2, 4
gium, 1973, 135 pp. (1973), 335-355.
[20] DIJKSTRA, E. W. "Recursive program- [42] HOARE, C. A. R. "Hints for programming
ming," Numerische Mathematik 2, 5 (1960), language design," COmputer Science re-
312-318. port STAN-CS-74-403, Stanford Univ.,
[21] DIJKSTRA,E. W. "Programming considered Stanford, Calif., January 1974, 29 pp.
as a human activity," in Proc. IFIP Con- [43] HOPKINS, M~R~IN E: "Computer aided
gress 1965, North-Holland Publ. Co., Am- software design," in ~oftware engineering
sterdam, The Netherlands, 1965, 213-217. techniques, J. N. Buxton and B. Randell
[22] DIJKSTRA,E. W. "A constructive approach [Eds.] NATO Scientific Affairs Division,
to the problem of program correctness," Brussels, Belgium, 1970, 99-101.
B I T 8, 3 (1968), 174-186. [44] HOPKINS, MARTIN E, "A case for the
[23] DIJKSTRA, E. W. "Go to statement con- GOTO," ProP. ACM Annual Conference
sidered harmful," Comm. ACM l l , 3 (March Boston, Mass., August 1972, 787-790.
1968), 147-148, 538, 541. [45] HULL, T. E. "Would you believe structured
[There are two instances of pages 147-148 FORTRAN?" SIGNUM Newsletter 8, 4
in this volume; the second 147-148 is rele- (October 1973), 13-16. ~
vant here.] [46] INGALLS, DAN. "The execution time pro-
[24] DIJKSTRA,E. W. "Solution of a problem in file as a programming tool," in Compiler
concurrent programming control," Comm. optimization, 2d Courant Computer Sci-
ACM 9, 9 (September 1968), 569. ence Symposium, Randall Rustin [Ed.],
[25] DIJKSTRA, E. W. "Structured program- Prentice-Hall, Englewood Cliffs, N. J.,
ming," in Software engineering techniques, 1972, 107-128.
J. N. Buxton and B. Randell [Eds.] NATO [471 KELLEY,ROBERT A., AND WAVrERS, JOHN
Scientific Affairs Division, Brussels, Bel- R. "APLGOL-2, a structured programming
gium, 1970, 84-88. system for APL," IBM Palo Alto Scientific
[26] DIJKSTRA,E. W. "EWD316: A short intro- Center report 320-3318 i(August 1973), 29 pp.
duction to the art of programming," Tech- [48] KLEENE, S. C. "Representation of events
nical University Eindhoven, The Nether- in nerve nets," in Automata ~tudies, C. E.
lands, August 1971, 97 pp. Shannon and J. McCarthy [Eds.],Princeton
[27] DIJKSTRA, E. W. "The humble program- University Press, Princeton, N.J., 1956, 3-
mer," Comm. ACM 15, 10 (October 1972), 40.
859-866. [49] KNUTH, DONALD E. "RUNCIBLE--Alge-
[28] DIJKSTRA, E. W. "Prospects for a better braie translation on a limited computer,"
programming language," in High level lan- Comm. A C M 2, 11 (November, 1959), 18-21.

Computing Surveys, Vol.i6, No. 4. December 1~q4


i
300 • Donald E . K n u t h

[There is a bug in the flowchart. The arc [66] MILLAY, EDnA ST. VINCENT. "Elaine"; el.
labeled "2" from the box labeled "0:" in Bartlett's Familiar Quotations.
the upper left corner should go to the box [67] MILLER, EDWARD F., JR., AND LINDAMOOD,
labeledR~ ffi 8003.] GEORGE E. "Structured programming: top-
[50] KNUTH,DONALDE. Fundamental algorithms, down approach," Datamation 19, 12 (De-
The art of computer programming, Vol. 1, cember 1973)i 55--57.
Addison-Wesley, Reading, Mass. 1968 2d [68] MILLS, H. D~ "Top-down programming in
ed., 1973, 634 pp • large systems," in Debugging techniques in
[51] KNUTH, DONALDE. "An empirical study of large systems, Randall Rustin [Ed.], Pren-
FORTRAN programs,!' Software--Practice tice-Hall, Englewood Cliffs, N. J., 1971, 41-
and Experience 1, 2 (April-June 1971), 105- 55.
133. [69] MILLS, H. D. "Mathematical foundations
[52] KNUTH, DONALDE., AND FLOYD,ROBERT W. for structured programming," report FSC
"Notes on avoiding 'go to' statements," 72-6012, IBM Federal Systems Division,
Information Processing Letters 1, 1 (Febru- Gaithersburg, Md. (February 1972), 62 pp.
ary 1971), 23-31, 177. [70] MILLS, H. D, "How to write correct pro-
[53] KNUTH, DONALD E. "George Forsythe and grams and know i t , " report FSC 73-5008,
the development of Computer Science," IBM Federal Systems Division, Gaithers-
Comm. ACM 15, 8 (August 1972), 721-726. burg, Md. (1973), 26 pp.
[54] KNUTH, DONALD E. Sorting and searching, [71] NASSI, I. R., AND AKKOYUNLU,E. A. "Veri-
The art of computer programming, Vol. 3, fication techniques for a hierarchy of con-
Addison-Wesley, Reading, Mass., 1973, 722 trol structures," Tech. report 26, Dept. of
Computer Science, State Univ. of New
[55] ~NUTH, DONALD E. "A review of 'struc- York, Stony Brook, New York (January
tured programming'," Stanford Computer 1974), 48 pp.
Science Department report STAN-CS-73- [72] NAUR, PETER [Ed.] "Report on the al-
371, Stanford Univ., Stanford, Calif., June gorithmic language ALGOL 60," Comm.
1973, 25 p p - ACM 3, 5 (May 1960), 299-314.
[56] KNUTH, DONALD E., AND SZWARCFITER, [73] NAUR,PETER. "Go to statements and good
JAYME L. "A structured program to gener- Algol style," BIT 3, 3 (1963), 204-208.
ate all topological sorting arrangements," [74] NAUR,PETER. "Program translation viewed
Information Processing Letters 2, 6 (April as a general data processing problem,"
1974) 153-157. Comm. ACM 9, 3 (March 1966), 176--179.
[57] KOSARAJU,S. RAO. "Analysis of structured [75] NAUR, PETER. "An experiment on program
rograms," Proe. Fifth Annual ACM Syrup. development," BIT 12, 3 (1972), 347-365.
heory of Computing, (May 1973), 240-252; {76] PAGER, D. "Some notes on speeding up
also in J. Computer and System Sciences, 9, certain loops by software, firmware, and
3 (December 1974). hard w are means, '~ in" Computers and auto-
[58] LANDIN, P. J. "A correspondence between mata, Jerome Fox lEd.f, John Wiley & Sons,
ALGOL 60 and Church's lambda-notation: 1yew x o r k 1972, 207-213; also in IEEE
part I , " Comm. ACM 8, 2 (February 1965), Trans. Computers, C-21, 1 (January 1972),
89-101. 97-100.
[59] LANDIN, P. J. "The next 700 programming
languages," Comm. ACM 9, 3 (March 1966), [77] PETERSON, W. W.; KASAMI, T.; AND TOK-
157-166. UEA, N. "On the capabilities of w h i l e , re-
[60] LEAVENWORTH, B. M. "Programming p e a t , and exit statements," Comm. ACM
with(out) the GOTO," Proc. ACM Annual 16, 8 (August 1973), 503-512.
Conference, Boston, Mass., August 1972, 782- [78] PETERSON,W. WESLEY. personal communi-
786. cation, April 2, 1974.
[61] MANNA, ZOHAR, AND WALDINGER, RICHARD [79] RAIN, MARK AND HOLAOER, PER. "The
J. "Towards automatic program synthesis," present most recent final word about labels
in Symposium on Semantics of Algorithmic in MARY," Machine Oriented Languages Bul-
Languages, Lecture Notes in Mathematics letin 1, Trondheim, Norway (October 1972),
188, E. Engeler [Ed.], Springer-Verlag, New 18-26.
York, 1971, 270-310. [80] REID, CONSTANCE.Hilbert, Springer-Verlag,
[62] McCARTHY, JOHN. "Reeursive functions New York, 1970, 290 pp.
of symbolic expressions and their compu- [81] REYNOLDS,JOHN. "Fundamentals of struc-
tation by machine, part I , " Comm. ACM 3, tured programming," Systems and Info.
4 (April 1960), 184-195. Set. 555 course notes, Syracuse Univ., Syra-
[63] MCCARTHY,JOHN. "Towards a mathemati- cuse, N.Y., Spring 1973.
cal science of computation," in Proc. IFIP [82] SATTERTHWAITE, E. H. "Debugging tools
Congress 1962, Munich, Germany, North- for high level languages," Software--Practice
Holland Publ. Co., Amsterdam, The Nether- and Experience 2, 3 (July-September 1972),
lands, 1963, 21-28. 197-217.
[64] McCRACKEN, DANIEL D. "Revolution in [83] SCHNECK, P. B., AND ANGEL, ELLINOR. " A
• rogranmaing," Datamation 19, 12 (Decem-
er 1973), 50-52.
FORTRAN to FORTRAN optimizing com-
piler," Computer J. 16, 4 (1973), 322-330.
[65] McKEEMAN, W. M.; HORNING, J. J.; AND [84] SCHORRE, D. V. "META-II--a syntax-di-
WORTMAN, D. B. A compiler generator, rected compiler writing language," Proc.
Prentice-Hall, Englewood Cliffs, N. J., ACM National Conference, Philadelphia,
1970, 527 pp. Pa., 1964, paper D1.3.

Computing Surveys, Vol. 6, No. 4, December 197/4


Structured Programming with g o to ,~tabynents • 801
[85] SCHORRE, D. V. "Improved organization improved prograr~ing: perf~mance,"
for procedural languages," Tech. memo Datamation 17, 11 (November 1972), 82--85.
TM 3086/002/00, Systems Development [94] WIRTH, N. ¢'On certain basic concepts of
Corp., Santa Monica, Calif., September 8, programming languages," Stanford Com-
1966, 8 pp. puter Science Report CS 65, Stanford, Calif.
[86] SHIGO, O.; SHIMOMURA, T.; FUJIBAYASHI (May 1967), 30pp.
S.; AND MAEJIMA, T. "SPOT: an experi- [95] WIRTH, N. " P L 360,i a programming lan-
mental system for structured programming" guage for the 360 computers," J. ACM 15,
(in Japanese), Conference Record, Informa- 1 (January 1968), 37-74.
tion Processing Society of Japan, 1973. [96] WIRTH, N. "Pro g,ra~ development by step-
]Translation available from the authors, wise refinement, Comm.ACM 14, 4 (April
Nippon Electric Company Ltd., Kawa- 1971), 221-227.
saki, Japan.] [97] WIRTH, N. "The programming language
[87] STRACHEY,C. "Varieties of programming Pascal," Aeta Info~natica 1, 1 (1971), 35-
language," in High level languages, C. Boon 63.
lEd.], Infotech State of the Art Report 7, [98] WULF, W. A. ; RUSSELL,D. B. ; ~ D HABER-
1972, 345-362. MANN, A. N. "BLISSi: A language for sys-
[~] STRONG,H. R. JR. "Translating recursion tems programming,'~ Comm. ACM 14, 12
equations into flowcharts," J. Computer and (December 1971), 780~-790.
System Sciences 5, 3 (June 1971), 254-285. [99] WULF, W. A. "Prog~nuning without the
[89] TEITELMAN,W. "Toward a programming goto," Information ~Pror~ssing 71, Prec.
laboratory," in Software Engineering Tech- IFIP Congress, Vol. 1, North-Holland
niques, J. N. Buxton and B. Randall [Eds.], Publ. Co., Amsterdam, The Netherlands,
NATO Scientific Affairs Division, Brussels, 1971, 408-413.
Belgium, 1970, 137-149,~ N [100] WULF, W. A. "A case against the GOTO,"
[90] TEITELMAN,W. et al. I TERLISPreference Prec. ACM 197~ Annual Conference, Bos-
manual," Xerox Pale Alto Research Center, ton, Mass. (August 1972), 791-797.
Pale Alto, Calif., and Bolt Beranek and [101] WULF, W. A.; JOHNSON,RIeH~Rn K.; WEIN-
Newman, Inc., 1974. STOCK, CHARLESP.; ANDHeRBs, STEVENO.
[91] WALKER,S. A., AND STRONG, I-I. a. "Char- "The design of an 4ptimizing compiler,"
acterizations of flowchartable recursions," Computer Science !Department report,
J. Computer and System Sciences 7, 4 (Au- Carnegie-Mellon Univ., Pittsburgh, Pa.,
gust 1973), 404-447. (December 1973), 103~pl~.
[92] WEGNER,EEF.RHARD. "Tree-structured pro- [102] ZAHN, CHARLEST. "A control statement for
grams," Comm. ACM 16, 11 (November natural top-down structured program-
1973), 704-705. ming," presented atl Symposium on Pro-
[93] WEINBERG,GERALDM. "The psychology of gramming Languages~ Parrs, 1974.

Computing Surveys,VoL 6, No. 4, December1974


Literate Programming

Donald E. Knuth
Computer Science Department, Stanford University, Stanford, CA 94305, USA

The author and his associates have been experimenting for the past several years with a program-
ming language and documentation system called WEB. This paper presents WEB by example, and
discusses why the new system appears to be an improvement over previous ones.

Submitted to THE COMPUTER JOURNAL

A. INTRODUCTION

The past ten years have witnessed substantial improvements in programming methodology.
This advance, carried out under the banner of “structured programming,” has led to programs
that are more reliable and easier to comprehend; yet the results are not entirely
satisfactory. My purpose in the present paper is to propose another motto that may be
appropriate for the next decade, as we attempt to make further progress in the state of
the art. I believe that the time is ripe for significantly better documentation of
programs, and that we can best achieve this by considering programs to be works of
literature. Hence, my title: “Literate Programming.”

Let us change our traditional attitude to the construction of programs: Instead of
imagining that our main task is to instruct a computer what to do, let us concentrate
rather on explaining to human beings what we want a computer to do.

The practitioner of literate programming can be regarded as an essayist, whose main
concern is with exposition and excellence of style. Such an author, with thesaurus in
hand, chooses the names of variables carefully and explains what each variable means. He
or she strives for a program that is comprehensible because its concepts have been
introduced in an order that is best for human understanding, using a mixture of formal
and informal methods that reïnforce each other.

I dare to suggest that such advances in documentation are possible because of the
experiences I’ve had during the past several years while working intensively on software
development. By making use of several ideas that have existed for a long time, and by
applying them systematically in a slightly new way, I’ve stumbled across a method of
composing programs that excites me very much. In fact, my enthusiasm is so great that I
must warn the reader to discount much of what I shall say as the ravings of a fanatic who
thinks he has just seen a great light.

Programming is a very personal activity, so I can’t be certain that what has worked for
me will work for everybody. Yet the impact of this new approach on my own style has been
profound, and my excitement has continued unabated for more than two years. I enjoy the
new methodology so much that it is hard for me to refrain from going back to every
program that I’ve ever written and recasting it in “literate” form. I find myself unable
to resist working on programming tasks that I would ordinarily have assigned to student
research assistants; and why? Because it seems to me that at last I’m able to write
programs as they should be written. My programs are not only explained better than ever
before; they also are better programs, because the new methodology encourages me to do a
better job. For these reasons I am compelled to write this paper, in hopes that my
experiences will prove to be relevant to others.

I must confess that there may also be a bit of malice in my choice of a title. During the
1970s I was coerced like everybody else into adopting the ideas of structured
programming, because I couldn’t bear to be found guilty of writing unstructured programs.
Now I have a chance to get even. By coining the phrase “literate programming,” I am
imposing a moral commitment on everyone who hears the term; surely nobody wants to admit
writing an illiterate program.

B. THE WEB SYSTEM

I hope, however, to demonstrate in this paper that the title is not merely wordplay. The
ideas of literate programming have been embodied in a language and a suite of computer
programs that have been developed at Stanford University during the past few years as
part of my research on algorithms and on digital typography. This language and its
associated programs have come to be known as the WEB system. My goal in what follows is
to describe the philosophy that underlies WEB, to present examples of programs in the WEB
language, and to discuss what may be the future implications of this work.

I chose the name WEB partly because it was one of the few three-letter words of English
that hadn’t already been applied to computers. But as time went on, I’ve become extremely
pleased with the name, because I think that a complex piece of software is, indeed, best
regarded as a web that has been delicately pieced together from simple materials. We
understand a complicated system by understanding its simple parts, and by understanding
the simple relations between those parts and their immediate neighbors. If we express a
program as a web of ideas, we can emphasize its structural properties in a natural and
satisfying way.

WEB itself is chiefly a combination of two other languages: (1) a document formatting
language and (2) a programming language. My prototype WEB system uses
TEX as the document formatting language and PASCAL as the programming language, but the
same principles would apply equally well if other languages were substituted. Instead of
TEX, one could use a language like Scribe or Troff; instead of PASCAL, one could use ADA,
ALGOL, LISP, COBOL, FORTRAN, APL, C, etc., or even assembly language. The main point is
that WEB is inherently bilingual, and that such a combination of languages proves to be
much more powerful than either single language by itself. WEB does not make the other
languages obsolete; on the contrary, it enhances them.

I naturally chose TEX to be the document formatting language, in the first WEB system,
because TEX is my own creation;¹ I wanted to acquire a lot of experience in harnessing
TEX to a variety of different tasks. I chose PASCAL as the programming language because
it has received such widespread support from educational institutions all over the world;
it is not my favorite language for system programming, but it has become a “second
language” for so many programmers that it provides an exceptionally effective medium of
communication. Furthermore WEB itself has a macro-processing ability that makes PASCAL’s
limitations largely irrelevant.

Document formatting languages are newcomers to the computing scene, but their use is
spreading rapidly. Therefore I’m confident that we will be able to expect each member of
the next generation of programmers to be familiar with a document language as well as a
programming language, as part of their basic education. Once a person knows both of the
underlying languages, there’s no trick at all to learning WEB, because the WEB user’s
manual is fewer than ten pages long.

A WEB user writes a program that serves as the source language for two different system
routines. (See Figure 1.) One line of processing is called weaving the web; it produces a
document that describes the program clearly and that facilitates program maintenance. The
other line of processing is called tangling the web; it produces a machine-executable
program. The program and its documentation are both generated from the same source, so
they are consistent with each other.

    WEB --WEAVE--> TEX --TEX--> DVI
    WEB --TANGLE--> PAS --PASCAL--> REL

Figure 1. Dual usage of a WEB file.

Let’s look at this process in slightly more detail. Suppose you have written a WEB
program and put it into a computer text file called COB.WEB (say). To generate hardcopy
documentation for your program, you can run the WEAVE processor; this is a system program
that takes the file COB.WEB as input and produces another file COB.TEX as output. Then
you run the TEX processor, which takes COB.TEX as input and produces COB.DVI as output.
The latter file, COB.DVI, is a “device-independent” binary description of how to typeset
the documentation, so you can get printed output by applying one more system routine to
this file.

You can also follow the other branch of Figure 1, by running the TANGLE processor; this
is a system program that takes the file COB.WEB as input and produces a new file COB.PAS
as output. Then you run the PASCAL compiler, which converts COB.PAS to a binary file
COB.REL (say). Finally, you can run your program by loading and executing COB.REL. The
process of “compile, load, and go” has been slightly lengthened to “tangle, compile,
load, and go.”

C. A COMPLETE EXAMPLE

Now it’s time for me to stop presenting general platitudes and to move on to something
tangible. Let us look at a real program that has been written in WEB. The numbered
paragraphs that follow are the actual output of a WEB file that has been “woven” into a
document; a computer has also generated the indexes that appear at the program’s end. If
my claims for the advantages of literate programming have any merit, you should be able
to understand the following description more easily than you could have understood the
same program when presented in a more conventional way. However, I am trying here to
explain the format of WEB documentation at the same time as I am discussing the details
of a nontrivial algorithm, so the description below is slightly longer than it would be
if it were written for people who already have been introduced to WEB.

Here, then, is the computer-generated output:

    Printing primes: An example of WEB . . . . . . . §1
    Plan of the program  . . . . . . . . . . . . . . §3
    The output phase . . . . . . . . . . . . . . . . §5
    Generating the primes  . . . . . . . . . . . . . §11
    The inner loop . . . . . . . . . . . . . . . . . §22
    Index  . . . . . . . . . . . . . . . . . . . . . §27

1. Printing primes: An example of WEB. The following program is essentially the same as
Edsger Dijkstra’s “first example of step-wise program composition,” found on pages 26–39
of his Notes on Structured Programming,² but it has been translated into the WEB
language.

[[Double brackets will be used in what follows to enclose comments relating to WEB
itself, because the chief purpose of this program is to introduce the reader to the WEB
style of documentation. WEB programs are always broken into small sections, each of which
has a serial number; the present section is number 1.]]

Dijkstra’s program prints a table of the first thousand prime numbers. We shall begin as
he did, by reducing the entire program to its top-level description. [[Every section in a
WEB program begins with optional commentary about that section, and ends with optional
program text for the section. For example, you are now reading part of the commentary in
§1, and the program text for §1 immediately follows the present paragraph. Program texts
are specifications of PASCAL programs; they either use PASCAL language directly, or they
use angle brackets to represent PASCAL code that appears
in other sections. For example, the angle-bracket notation ‘⟨Program to print . . .
numbers 2⟩’ is WEB’s way of saying the following: “The PASCAL text to be inserted here is
called ‘Program to print . . . numbers’, and you can find out all about it by looking at
section 2.” One of the main characteristics of WEB is that different parts of the program
are usually abbreviated, by giving them such an informal top-level description.]]

    ⟨Program to print the first thousand prime numbers 2⟩

2. This program has no input, because we want to keep it rather simple. The result of the
program will be to produce a list of the first thousand prime numbers, and this list will
appear on the output file.

Since there is no input, we declare the value m = 1000 as a compile-time constant. The
program itself is capable of generating the first m prime numbers for any positive m, as
long as the computer’s finite limitations are not exceeded.

[[The program text below specifies the “expanded meaning” of ‘⟨Program to print . . .
numbers 2⟩’; notice that it involves the top-level descriptions of three other sections.
When those top-level descriptions are replaced by their expanded meanings, a syntactically
correct PASCAL program will be obtained.]]

    ⟨Program to print the first thousand prime numbers 2⟩ ≡
      program print_primes(output);
        const m = 1000;
          ⟨Other constants of the program 5⟩
        var ⟨Variables of the program 4⟩
        begin ⟨Print the first m prime numbers 3⟩;
        end.
    This code is used in section 1.

3. Plan of the program. We shall proceed to fill out the rest of the program by making
whatever decisions seem easiest at each step; the idea will be to strive for simplicity
first and efficiency later, in order to see where this leads us. The final program may
not be optimum, but we want it to be reliable, well motivated, and reasonably fast.

Let us decide at this point to maintain a table that includes all of the prime numbers
that will be generated, and to separate the generation problem from the printing problem.

[[The WEB description you are reading once again follows a pattern that will soon be
familiar: A typical section begins with comments and ends with program text. The comments
motivate and explain noteworthy features of the program text.]]

    ⟨Print the first m prime numbers 3⟩ ≡
      ⟨Fill table p with the first m prime numbers 11⟩;
      ⟨Print table p 8⟩
    This code is used in section 2.

4. How should table p be represented? Two possibilities suggest themselves: We could
construct a sufficiently large array of boolean values in which the kth entry is true if
and only if the number k is prime; or we could build an array of integers in which the
kth entry is the kth prime number. Let us choose the latter alternative, by introducing
an integer array called p[1 .. m].

In the documentation below, the notation ‘p[k]’ will refer to the kth element of array p,
while ‘p_k’ will refer to the kth prime number. If the program is correct, p[k] will
either be equal to p_k or it will not yet have been assigned any value.

[[Incidentally, our program will eventually make use of several more variables as we
refine the data structures. All of the sections where variables are declared will be
called ‘⟨Variables of the program 4⟩’; the number ‘4’ in this name refers to the present
section, which is the first section to specify the expanded meaning of ‘⟨Variables of the
program⟩’. The note ‘See also . . .’ refers to all of the other sections that have the
same top-level description. The expanded meaning of ‘⟨Variables of the program 4⟩’
consists of all the program texts for this name, not just the text found in §4.]]

    ⟨Variables of the program 4⟩ ≡
      p: array [1 .. m] of integer;
        { the first m prime numbers, in increasing order }
    See also sections 7, 12, 15, 17, 23, and 24.
    This code is used in section 2.

5. The output phase. Let’s work on the second part of the program first. It’s not as
interesting as the problem of computing prime numbers; but the job of printing must be
done sooner or later, and we might as well do it sooner, since it will be good to have it
done. [[And it is easier to learn WEB when reading a program that has comparatively few
distracting complications.]]

Since p is simply an array of integers, there is little difficulty in printing the
output, except that we need to decide upon a suitable output format. Let us print the
table on separate pages, with rr rows and cc columns per page, where every column is ww
character positions wide. In this case we shall choose rr = 50, cc = 4, and ww = 10, so
that the first 1000 primes will appear on five pages. The program will not assume that m
is an exact multiple of rr · cc.

    ⟨Other constants of the program 5⟩ ≡
      rr = 50; { this many rows will be on each page in the output }
      cc = 4; { this many columns will be on each page in the output }
      ww = 10; { this many character positions will be used in each column }
    See also section 19.
    This code is used in section 2.

6. In order to keep this program reasonably free of notations that are uniquely
PASCALesque, [[and in order to illustrate more of the facilities of WEB,]] a few macro
definitions for low-level output instructions are introduced here. All of the
output-oriented commands in the remainder of the program will be stated in terms of five
simple primitives called print_string, print_integer, print_entry, new_line, and
new_page.

[[Sections of a WEB program are allowed to contain macro definitions between the opening
comments and the closing program text. The general format for each section is actually
tripartite: commentary, then definitions, then program. Any of the three parts may be
absent; for example, the present section contains no program text.]]
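The bracketed notes in §1 through §4 describe how a section name stands for its "expanded meaning": all of the program texts filed under that name, taken in order, with any nested section names expanded in turn. The following is a minimal Python sketch of that substitution only. It is not the TANGLE processor, which handles much more and emits PASCAL; the dictionary layout, the `<<...>>` spelling of the angle brackets, the placeholder body, and the names `SECTIONS` and `expand` are all assumptions made for illustration.

```python
import re

# Hypothetical stand-in for a WEB file. Each section name maps to the list of
# program texts filed under it; a second list entry models a '+≡' continuation.
# Angle brackets are spelled <<...>> here purely for convenience.
SECTIONS = {
    "Program to print the first thousand prime numbers": [
        "program print_primes(output);\n"
        "const m = 1000;\n"
        "<<Other constants of the program>>\n"
        "var <<Variables of the program>>\n"
        "begin <<Print the first m prime numbers>>;\n"
        "end."
    ],
    "Other constants of the program": ["rr = 50;\ncc = 4;\nww = 10;"],
    "Variables of the program": [
        "p: array [1 .. m] of integer;",   # text from the defining section
        "page_number: integer;",           # text appended by a later section
    ],
    # Placeholder body: the real text is developed in other sections.
    "Print the first m prime numbers": ["{ placeholder for the body }"],
}

SECTION_REF = re.compile(r"<<(.+?)>>")


def expand(name, table, active=()):
    """Return the expanded meaning of `name`: all of its texts, in order,
    with every nested section reference expanded recursively."""
    if name in active:
        raise ValueError("circular section reference: " + name)
    text = "\n".join(table[name])
    return SECTION_REF.sub(
        lambda ref: expand(ref.group(1), table, active + (name,)), text
    )


if __name__ == "__main__":
    print(expand("Program to print the first thousand prime numbers", SECTIONS))
```

Keeping a list of texts per name is what lets a later section add to an earlier one without the earlier section knowing about it; the circularity check is an extra safeguard of this sketch, not something the paper discusses.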
[[Simple macros simply substitute a bit of PASCAL 10. The first row will contain
code for an identifier. Parametric macros are similar, p[1], p[1 + rr ], p[1 + 2 ∗ rr ], . . . ;
but they also substitute an argument wherever ‘#’ oc-
curs in the macro definition. The first three macro def- a similar pattern holds for each value of the row offset .
initions here are parametric; the other two are simple.]] h Output a line of answers 10 i ≡
begin for c ← 0 to cc − 1 do
define print string (#) ≡ write (#) if row offset + c ∗ rr ≤ m then
{ put a given string into the output file } print entry (p[row offset + c ∗ rr ]);
define print integer (#) ≡ write (# : 1) new line ;
{ put a given integer into the output file, end
in decimal notation, using only as many This code is used in section 9.
digit positions as necessary }
define print entry (#) ≡ write (# : ww ) { like 11. Generating the primes. The remaining task
print integer , but ww character positions is to fill table p with the correct numbers. Let us do
are filled, inserting blanks at the left } this by generating its entries one at a time: Assuming
define new line ≡ write ln { advance to a new line that we have computed all primes that are j or less, we
in the output file } will advance j to the next suitable value, and continue
define new page ≡ page { advance to a new page doing this until the table is completely full.
in the output file } The program includes a provision to initialize the
variables in certain data structures that will be intro-
7. Several variables are needed to govern the output
duced later.
process. When we begin to print a new page, the
variable page number will be the ordinal number of that h Fill table p with the first m prime numbers 11 i ≡
page, and page offset will be such that p[page offset ] is h Initialize the data structures 16 i;
the first prime to be printed. Similarly, p[row offset ] while k < m do
will be the first prime in a given row. begin h Increase j until it is the next prime
[[Notice the notation ‘+ ≡’ below; this indicates that number 14 i;
the present section has the same name as a previous k ← k + 1; p[k] ← j;
section, so the program text will be appended to some end
text that was previously specified.]] This code is used in section 3.

⟨Variables of the program 4⟩ +≡
  page_number: integer; { one more than the number of pages printed so far }
  page_offset: integer; { index into p for the first entry on the current page }
  row_offset: integer; { index into p for the first entry in the current row }
  c: 0 .. cc; { runs through the columns in a row }

8. Now that appropriate auxiliary variables have been introduced, the
process of outputting table p almost writes itself.
⟨Print table p 8⟩ ≡
  begin page_number ← 1; page_offset ← 1;
  while page_offset ≤ m do
    begin ⟨Output a page of answers 9⟩;
    page_number ← page_number + 1;
    page_offset ← page_offset + rr ∗ cc;
    end;
  end
This code is used in section 3.

9. A simple heading is printed at the top of each page.
⟨Output a page of answers 9⟩ ≡
  begin print_string('The␣First␣');
  print_integer(m);
  print_string('␣Prime␣Numbers␣---␣Page␣');
  print_integer(page_number); new_line; new_line;
    { there's a blank line after the heading }
  for row_offset ← page_offset to page_offset + rr − 1
    do ⟨Output a line of answers 10⟩;
  new_page;
  end
This code is used in section 8.

12. We need to declare the two variables j and k that were just
introduced.
⟨Variables of the program 4⟩ +≡
  j: integer; { all primes ≤ j are in table p }
  k: 0 .. m; { this many primes are in table p }

13. So far we haven't needed to confront the issue of what a prime
number is. But everything else has been taken care of, so we must delve
into a bit of number theory now.
By definition, a number is called prime if it is an integer greater
than 1 that is not evenly divisible by any smaller prime number. Stating
this another way, the integer j > 1 is not prime if and only if there
exists a prime number $p_n < j$ such that j is a multiple of $p_n$.
Therefore the section of the program that is called '⟨Increase j until
it is the next prime number⟩' could be coded very simply: 'repeat
j ← j + 1; ⟨Give to j_prime the meaning: j is a prime number⟩;
until j_prime'. And to compute the boolean value j_prime, the following
would suffice: 'j_prime ← true; for n ← 1 to k do ⟨If p[n] divides j,
set j_prime ← false⟩'.

14. However, it is possible to obtain a much more efficient algorithm by
using more facts of number theory. In the first place, we can speed
things up a bit by recognizing that $p_1 = 2$ and that all subsequent
primes are odd; therefore we can let j run through odd values only. Our
program now takes the following form:
⟨Increase j until it is the next prime number 14⟩ ≡
  repeat j ← j + 2;
    ⟨Update variables that depend on j 20⟩;
    ⟨Give to j_prime the meaning: j is a prime number 22⟩;
  until j_prime
This code is used in section 11.
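The simple scheme of §13, combined with the odd-only stepping of §14, can be tried out directly. The following is a minimal Python sketch rather than the paper's PASCAL; the function name `first_primes_naive` and the use of a 0-based Python list in place of the table p[1..m] are assumptions made purely for illustration.

```python
def first_primes_naive(m):
    """Return the first m primes, testing each odd candidate against
    every prime found so far (the unrefined test of section 13)."""
    if m < 1:
        raise ValueError("m must be positive")
    p = [2]              # p[0] here plays the role of p[1] = 2 in the paper
    j = 1                # the next candidate will be j + 2 = 3
    while len(p) < m:
        while True:      # PASCAL's "repeat ... until j_prime"
            j += 2       # j runs through odd values only
            # j_prime is true when no earlier prime divides j
            j_prime = all(j % q != 0 for q in p)
            if j_prime:
                break
        p.append(j)
    return p
```

This version divides by every known prime, which is exactly the cost that the later refinements of the program set out to reduce.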

4 submitted to THE COMPUTER JOURNAL


LITERATE PROGRAMMING

15. The repeat loop in the previous section introduces a boolean
variable j_prime, so that it will not be necessary to resort to a goto
statement. (We are following Dijkstra,² not Knuth.³)
⟨Variables of the program 4⟩ +≡
  j_prime: boolean; { is j a prime number? }

16. In order to make the odd-even trick work, we must of course
initialize the variables j, k, and p[1] as follows.
⟨Initialize the data structures 16⟩ ≡
  j ← 1; k ← 1; p[1] ← 2;
See also section 18.
This code is used in section 11.

17. Now we can apply more number theory in order to obtain further
economies. If j is not prime, its smallest prime factor $p_n$ will be
$\sqrt{j}$ or less. Thus if we know a number ord such that

    $p[ord]^2 > j$,

and if j is odd, we need only test for divisors in the set
{p[2], ..., p[ord − 1]}. This is much faster than testing divisibility
by {p[2], ..., p[k]}, since ord tends to be much smaller than k.
(Indeed, when k is large, the celebrated "prime number theorem" implies
that the value of ord will be approximately $2\sqrt{k/\ln k}$.)
Let us therefore introduce ord into the data structure. A moment's
thought makes it clear that ord changes in a simple way when j
increases, and that another variable square facilitates the updating
process.
⟨Variables of the program 4⟩ +≡
  ord: 2 .. ord_max; { the smallest index ≥ 2 such that $p_{ord}^2 > j$ }
  square: integer; { square = $p_{ord}^2$ }

18. ⟨Initialize the data structures 16⟩ +≡
  ord ← 2; square ← 9;

19. The value of ord will never get larger than a certain value ord_max,
which must be chosen sufficiently large. It turns out that ord never
exceeds 30 when m = 1000.
⟨Other constants of the program 5⟩ +≡
  ord_max = 30; { $p_{ord\_max}^2$ must exceed $p_m$ }

20. When j has been increased by 2, we must increase ord by unity when
$j = p_{ord}^2$, i.e., when j = square.
⟨Update variables that depend on j 20⟩ ≡
  if j = square then
    begin ord ← ord + 1;
    ⟨Update variables that depend on ord 21⟩;
    end
This code is used in section 14.

21. At this point in the program, ord has just been increased by unity,
and we want to set square := $p_{ord}^2$. A surprisingly subtle point
arises here: How do we know that $p_{ord}$ has already been computed,
i.e., that ord ≤ k? If there were a gap in the sequence of prime
numbers, such that $p_{k+1} > p_k^2$ for some k, then this part of the
program would refer to the yet-uncomputed value p[k + 1] unless some
special test were made.
Fortunately, there are no such gaps. But no simple proof of this fact is
known. For example, Euclid's famous demonstration that there are
infinitely many prime numbers is strong enough to prove only that
$p_{k+1} \le p_1 \cdots p_k + 1$. Advanced books on number theory come
to our rescue by showing that much more is true; for example,
"Bertrand's postulate" states that $p_{k+1} < 2p_k$ for all k.
⟨Update variables that depend on ord 21⟩ ≡
  square ← p[ord] ∗ p[ord]; { at this point ord ≤ k }
See also section 25.
This code is used in section 20.

22. The inner loop. Our remaining task is to determine whether or not a
given integer j is prime. The general outline of this part of the
program is quite simple, using the value of ord as described above.
⟨Give to j_prime the meaning: j is a prime number 22⟩ ≡
  n ← 2; j_prime ← true;
  while (n < ord) ∧ j_prime do
    begin ⟨If p[n] is a factor of j, set j_prime ← false 26⟩;
    n ← n + 1;
    end
This code is used in section 14.

23. ⟨Variables of the program 4⟩ +≡
  n: 2 .. ord_max; { runs from 2 to ord when testing divisibility }

24. Let's suppose that division is very slow or nonexistent on our
machine. We want to detect nonprime odd numbers, which are odd multiples
of the set of primes $\{p_2, \ldots, p_{ord}\}$.
Since ord_max is small, it is reasonable to maintain an auxiliary table
of the smallest odd multiples that haven't already been used to show
that some j is nonprime. In other words, our goal is to "knock out" all
of the odd multiples of each $p_n$ in the set
$\{p_2, \ldots, p_{ord}\}$, and one way to do this is to introduce an
auxiliary table that serves as a control structure for a set of
knock-out procedures that are being simulated in parallel. (The
so-called "sieve of Eratosthenes" generates primes by a similar method,
but it knocks out the multiples of each prime serially.)
The auxiliary table suggested by these considerations is a mult array
that satisfies the following invariant condition: For 2 ≤ n < ord,
mult[n] is an odd multiple of $p_n$ such that mult[n] < j + 2$p_n$.
⟨Variables of the program 4⟩ +≡
  mult: array [2 .. ord_max] of integer; { runs through multiples of primes }

25. When ord has been increased, we need to initialize a new element of
the mult array. At this point $j = p[ord − 1]^2$, so there is no need
for an elaborate computation.
⟨Update variables that depend on ord 21⟩ +≡
  mult[ord − 1] ← j;

26. The remaining task is straightforward, given the data structures
already prepared. Let us recapitulate the current situation: The goal is
to test whether or not j is divisible by $p_n$, without actually
performing a division. We know that j is odd, and that mult[n]
D. E. KNUTH

is an odd multiple of $p_n$ such that mult[n] < j + 2$p_n$. If
mult[n] < j, we can increase mult[n] by 2$p_n$ and the same conditions
will hold. On the other hand if mult[n] ≥ j, the conditions imply that
j is divisible by $p_n$ if and only if j = mult[n].
⟨If p[n] is a factor of j, set j_prime ← false 26⟩ ≡
  while mult[n] < j do
    mult[n] ← mult[n] + p[n] + p[n];
  if mult[n] = j then j_prime ← false
This code is used in section 22.

27. Index. Every identifier used in this program is shown here together
with a list of the section numbers where that identifier appears. The
section number is underlined if the identifier was defined in that
section. However, one-letter identifiers are indexed only at their point
of definition, since such identifiers tend to appear almost everywhere.
[[An index like this is prepared automatically by the WEB software, and
it is appended to the final section of the program. However, underlining
of section numbers is not automatic; the user is supposed to mark
identifiers at their point of definition in the WEB source file.]]
This index also refers to some of the places where key elements of the
program are treated. For example, the entries for 'Output format' and
'Page headings' indicate where details of the output format are
discussed. Several other topics that appear in the documentation (e.g.,
'Bertrand's postulate') have also been indexed. [[Special instructions
within a WEB source file can be used to insert essentially anything into
the index.]]

Bertrand, Joseph, postulate: 21.
boolean: 15.
c: 7.
cc: 5, 7, 8, 10.
Dijkstra, Edsger: 1, 15.
Eratosthenes, sieve of: 24.
false: 13, 26.
integer: 4, 7, 12, 17, 24.
j: 12.
j_prime: 13, 14, 15, 22, 26.
k: 12.
Knuth, Donald E.: 15.
m: 2.
mult: 24, 25, 26.
n: 23.
new_line: 6, 9, 10.
new_page: 6, 9.
ord: 17, 18, 19, 20, 21, 22, 23, 24, 25.
ord_max: 17, 19, 23, 24.
output: 2, 6.
output format: 5, 9.
p: 4.
page: 6.
page headings: 9.
page_number: 7, 8, 9.
page_offset: 7, 8, 9.
prime number, definition of: 13.
print_entry: 6, 10.
print_integer: 6, 9.
print_primes: 2.
print_string: 6, 9.
row_offset: 7, 9, 10.
rr: 5, 8, 9, 10.
square: 17, 18, 20, 21.
true: 4, 13, 22.
WEB: 1.
write: 6.
write_ln: 6.
ww: 5, 6.

⟨Fill table p with the first m prime numbers 11⟩ Used in 3.
⟨Give to j_prime the meaning: j is a prime number 22⟩ Used in 14.
⟨If p[n] is a factor of j, set j_prime ← false 26⟩ Used in 22.
⟨Increase j until it is the next prime number 14⟩ Used in 11.
⟨Initialize the data structures 16, 18⟩ Used in 11.
⟨Other constants of the program 5, 19⟩ Used in 2.
⟨Output a line of answers 10⟩ Used in 9.
⟨Output a page of answers 9⟩ Used in 8.
⟨Print table p 8⟩ Used in 3.
⟨Print the first m prime numbers 3⟩ Used in 2.
⟨Program to print the first thousand prime numbers 2⟩ Used in 1.
⟨Update variables that depend on j 20⟩ Used in 14.
⟨Update variables that depend on ord 21, 25⟩ Used in 20.
⟨Variables of the program 4, 7, 12, 15, 17, 23, 24⟩ Used in 2.


D. HOW THE EXAMPLE WAS SPECIFIED

Everything reproduced above, from the table of contents preceding the
program to the indexes of identifiers and section names at the end, was
generated by applying the program WEAVE to a source file PRIMES.WEB
written in the WEB language. Let us now look at that file PRIMES.WEB, in
order to get an idea of what a WEB user actually types.
There's no need to show very much of PRIMES.WEB, however, because that
file is reflected quite faithfully by the formatted output. Figure 2
contains enough of the WEB source to indicate the general flavor; a
reader who is familiar with the rudiments of TEX will be able to
reconstruct all of PRIMES.WEB by looking only at the formatted output
and Figure 2.
Figure 2a starts with TEX commands (not shown in full) that make it
convenient to typeset double brackets [[...]] and to give special
typographic treatment to names like 'WEB' and 'PASCAL'. A WEB user
generally begins by declaring such special aspects of the document
format; for example, if nonstandard fonts of type are needed, they are
usually stated first. It may also be necessary to specify the correct
hyphenation of non-English words that appear in the document.
Then comes '@*', which starts the program proper. WEB uses the symbol
'@' as an escape character for special instructions to the WEAVE and
TANGLE processors. Everything between such special commands is either
expressed in TEX language or in PASCAL language, depending on the
context.
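Before turning to the WEB source itself, it may help to see the refinements of §§16–26 of the example program collected in one place. The following is a minimal Python sketch, not the paper's PASCAL: the function name `first_primes`, the trailing underscore in `ord_`, the explicit range checks, and the unused slot 0 in each list (kept so that the indices match the paper's 1-based tables) are all assumptions made for illustration.

```python
def first_primes(m, ord_max=30):
    """Return the first m primes using the division-free test of
    sections 16-26: ord, square, and the mult table of odd multiples."""
    if m < 1:
        raise ValueError("m must be positive")
    p = [0] * (m + 1)            # p[1..m]; slot 0 is unused
    mult = [0] * (ord_max + 1)   # mult[2..ord_max]; slots 0 and 1 unused
    j, k = 1, 1
    p[1] = 2                     # section 16
    ord_, square = 2, 9          # section 18
    while k < m:
        while True:              # repeat ... until j_prime (section 14)
            j += 2
            if j == square:      # section 20: j has reached p[ord]^2
                ord_ += 1
                if ord_ > ord_max:
                    # stands in for PASCAL's subrange check on ord
                    raise ValueError("ord_max is too small for this m")
                square = p[ord_] * p[ord_]   # section 21 (ord <= k holds)
                mult[ord_ - 1] = j           # section 25
            n, j_prime = 2, True             # section 22
            while n < ord_ and j_prime:
                while mult[n] < j:           # section 26
                    mult[n] += p[n] + p[n]   # step to the next odd multiple
                if mult[n] == j:
                    j_prime = False
                n += 1
            if j_prime:
                break
        k += 1
        p[k] = j
    return p[1:]
```

Under these assumptions, `first_primes(1000)[-1]` evaluates to 7919, the thousandth prime, and no division or remainder operation is performed anywhere in the loop.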

    \font\ninerm=cmr9
    \let\mc=\ninerm % medium caps
    \def\WEB{{\tt WEB}}
    \def\PASCAL{{\mc PASCAL}}
    \def\[{\ifhmode\ \fi$[\mkern-2mu[$}
    \def\]{$]\mkern-2mu]$\ }
      .
      .
      .
    \hyphenation{Dijk-stra}
    @* Printing primes: An example of \WEB.
    The following program is essentially the same
    as Edsger Dijkstra's @^Dijkstra, Edsger@>
    ``first example of step-wise program
    composition,'' found on pages 26--39
    of his {\sl Notes on Structured
    Programming},$^\Dijk$ but it has been
    translated into the \WEB\ language. @.WEB@>
    \[Double brackets will be used in what
    follows to enclose comments relating to \WEB\
      .
      .
      .
    an informal top-level description.\]
    @p @<Program to print the first thousand
    prime numbers@>

Figure 2a. The beginning of PRIMES.WEB.

Each section of the program begins either with '@ ' (i.e., at-sign and
space) or '@*' (i.e., at-sign and asterisk); WEB supplies the section
numbers automatically. The latter case, '@*', denotes a major section of
the program, for which a special title is given. This title will appear
in boldface type, and it will also appear in the table of contents, and
as a running headline on all pages of the woven documentation until
another major section begins. Each major section starts at the top of a
page. (Such page beginnings have been indicated by horizontal lines in
our example, because WEB's normal output format has been adapted to the
format of this journal. The output of WEAVE usually has a lot more white
space, and the individual lines of text are usually quite a bit wider.)
The lines that follow in Figure 2a show a few more WEB instructions:
'@^' marks the beginning of an index entry to be set in roman type; '@>'
marks the end of an argument to a WEB command; '@.' marks the beginning
of an index entry to be set in typewriter type; '@p' marks the beginning
of the PASCAL program; and '@<' marks the beginning of a top-level
description, i.e., of a section name in the WEB program.

    @ This program has no input, because we want
    to keep it rather simple. The result of the
    program will be to produce a list of the
    first thousand prime numbers, and this list
    will appear on the |output| file.

    Since there is no input, we declare the value
    |m=1000| as a compile-time constant. The
    program itself is capable of generating the
    first |m| prime numbers for any positive |m|,
    as long as the computer's finite limitations
    are not exceeded.
    \[The program text below specifies the
    ``expanded meaning'' of `\X2:Program to print
    $\ldots$ numbers\X'; notice that it involves
    the top-level descriptions of three other
    sections. When those top-level descriptions
    are replaced by their expanded meanings, a
    syntactically correct \PASCAL\ program will
    be obtained.\]

    @<Program to print...@>=
    program print_primes(output);
    const @!m=1000;
    @<Other constants of the program@>@;
    var @<Variables of the program@>@;
    begin @<Print the first |m| prime numbers@>;
    end.

Figure 2b. The WEB code that generated §2.

    @ In order to keep this program reasonably
    free of notations that are uniquely
    \PASCAL esque, \[and in order to illustrate
      .
      .
      .
    The first three macro definitions here are
    parametric; the other two are simple.\]
    @d print_string(#)==write(#)
    {put a given string into the |output| file}
    @d print_integer(#)==write(#:1)
    {put a given integer into the |output|
    file, in decimal notation, using only as
    many digit positions as necessary}
    @d print_entry(#)==write(#:ww)
    {like |print_integer|, but
    |ww| character positions are filled,
    inserting blanks at the left}
    @d new_line==write_ln
    {advance to a new line in the |output| file}
    @d new_page==page
    {advance to a new page in the |output| file}

Figure 2c. The WEB code that generated §6.

Figure 2b immediately follows Figure 2a in the WEB file. This material
is what generated §2 of the documentation, and it illustrates the
bilingual nature of WEB: The commentary at the beginning of each section
is typed in TEX language, and the program text at the end is typed in
PASCAL language.
Language-switching between TEX and PASCAL is occasionally desirable. For
example, when you refer to technical details about the program, you
usually want to describe them in PASCAL, hence you want WEAVE to format
them with the typographic conventions it uses for PASCAL programs.
Conversely, when you put comments in a PASCAL program, you want the text
of those comments to be formatted by TEX in the normal way. WEB files
use vertical bars to introduce PASCAL formatting in the midst of TEX
formatting; for example, Figure 2b says 'the |output| file' in order to
typeset 'the output file'.
The program text in Figure 2b begins with '@<' instead of with the '@p'
command used in Figure 2a, because the program text in §2 is the
expansion of

a specific top-level description. Notice that the top-level description
has been abbreviated to '@<Program to print...@>'. Since the names of
sections tend to be rather long, it is a nuisance to type them in full
each time; WEB allows you to type '...' after you have given enough text
to identify the remainder uniquely.
The '@!' operation in the program text of Figure 2b governs the
underlining of index entries. The '@;' specifies an invisible symbol
that has the effect of a semicolon in PASCAL syntax. Commands such as
these are comparatively unimportant, but they are available for
polishing up the final documentation when you want to maintain fine
control.
Figure 2c shows key portions of the WEB text that generated §6. Notice
that the command '@d' introduces a macro definition. All features of WEB
that appear in our example program are illustrated in Figures 2a, 2b,
and 2c; the remainder of PRIMES.WEB simply uses the same conventions
again and again. In fact, most of the WEB file is much simpler than the
examples shown here; Figure 2 has illustrated only the difficult parts.


E. THE TANGLED OUTPUT

Figure 3 shows the PASCAL program PRIMES.PAS that results when TANGLE is
applied to PRIMES.WEB. This program is not intended for human
consumption (it's only supposed to be readable by a PASCAL compiler), so
TANGLE does not go to great pains to produce a beautiful format. Notice
that underlines have been removed from the identifier names, and that
all of the letters have been converted to uppercase (except in strings);
TANGLE tries to produce a format that will be acceptable to a standard
PASCAL compiler.
TANGLE removes all of the commentary in the WEB file, but it inserts new
comments of its own. If for some reason you need to correlate the
tangled PASCAL code with the woven documentation, you can find the
program text for, say, §8 by looking between the comments '{8:}' and
'{:8}'.
A comparison of Figure 3 to Figure 2 should make it clear why the TANGLE
processor has acquired its name.

    {1:}{2:}PROGRAM PRINTPRIMES(OUTPUT);
    CONST M=1000;{5:}RR=50;CC=4;WW=10;{:5}{19:}
    ORDMAX=30;{:19}VAR{4:}
    P:ARRAY[1..M]OF INTEGER;{:4}{7:}
    PAGENUMBER:INTEGER;PAGEOFFSET:INTEGER;
    ROWOFFSET:INTEGER;C:0..CC;{:7}{12:}J:INTEGER;
    K:0..M;{:12}{15:}JPRIME:BOOLEAN;{:15}{17:}
    ORD:2..ORDMAX;SQUARE:INTEGER;{:17}{23:}
    N:2..ORDMAX;{:23}{24:}
    MULT:ARRAY[2..ORDMAX]OF INTEGER;{:24}
    BEGIN{3:}{11:}{16:}J:=1;K:=1;P[1]:=2;{:16}
    {18:}ORD:=2;SQUARE:=9;{:18};
    WHILE K<M DO BEGIN{14:}REPEAT J:=J+2;{20:}
    IF J=SQUARE THEN BEGIN ORD:=ORD+1;{21:}
    SQUARE:=P[ORD]*P[ORD];{:21}{25:}
    MULT[ORD-1]:=J;{:25};END{:20};{22:}N:=2;
    JPRIME:=TRUE;
    WHILE(N<ORD)AND JPRIME DO BEGIN{26:}
    WHILE MULT[N]<J DO MULT[N]:=MULT[N]+P[N]+P[N]
    ;IF MULT[N]=J THEN JPRIME:=FALSE{:26};N:=N+1;
    END{:22};UNTIL JPRIME{:14};K:=K+1;P[K]:=J;
    END{:11};{8:}BEGIN PAGENUMBER:=1;
    PAGEOFFSET:=1;
    WHILE PAGEOFFSET<=M DO BEGIN{9:}
    BEGIN WRITE('The First ');WRITE(M:1);
    WRITE(' Prime Numbers --- Page ');
    WRITE(PAGENUMBER:1);WRITELN;WRITELN;
    FOR ROWOFFSET:=PAGEOFFSET TO PAGEOFFSET+RR-1
    DO{10:}
    BEGIN FOR C:=0 TO CC-1 DO IF ROWOFFSET+C*RR<=
    M THEN WRITE(P[ROWOFFSET+C*RR]:WW);WRITELN;
    END{:10};PAGE;END{:9};
    PAGENUMBER:=PAGENUMBER+1;
    PAGEOFFSET:=PAGEOFFSET+RR*CC;END;END{:8}{:3};
    END.{:2}{:1}

Figure 3. PASCAL program generated from the WEB file.


F. THE WOVEN OUTPUT

I mentioned earlier that WEAVE is a program that converts a file like
PRIMES.WEB into a file PRIMES.TEX that is a syntactically correct source
file for TEX. Figure 4 gives a sampling of PRIMES.TEX, which is even
more unreadable than PRIMES.PAS. The instructions that cause TEX to
produce formatted PASCAL programs, with appropriate typefaces and
indentation, etc., are somewhat complex because they are supposed to
give decent results regardless of the page size.
There is no need to discuss Figure 4 further in the present paper,
because the details of "pretty printing" are not relevant to my main
theme. I have shown this much of PRIMES.TEX only to make the point that
it is nice to have a program like WEAVE to do all the formatting;
computer programs are not easy to typeset.


G. ADDITIONAL BELLS AND WHISTLES

A system like WEB can be successful only if it is capable of handling
large programs as well as small ones, and only if it is complete enough
to take care of all the practical requirements that arise when many
different kinds of programs are considered. A small example like
PRIMES.WEB is a satisfactory vehicle for illustrating the general ideas,
but it cannot be convincing as a demonstration of WEB's ability to
produce quality software in the "real world." My original design of WEB
in September, 1981, was followed by a year of extensive experiments, so
that by the time Version 1 was released in September, 1982, I could be
fairly confident that the language was reasonably complete. Since then
only one or two small extensions have proved to be necessary; and
although numerous enhancements can easily be imagined, I believe that a
useful stopping point for a working system called WEB83 has been
reached.
A full description of WEB83 appears in a Stanford report,⁴ which also
contains the complete WEB programs for WEAVE and TANGLE. The full
language contains only

a few features that do not show up in the PRIMES example considered
above:
1) There are facilities to override WEAVE's automatic formatting of
PASCAL programs. For example, it is possible to force a statement to
begin on a new line, or to force several statements to appear on the
same line, or to suggest a desirable breakpoint in the middle of a long
expression. In unusual cases, WEAVE must parse program fragments that
are not syntactically complete (for example, there may be a begin
without a matching end), so a WEB user must be given a chance to control
the results. Furthermore there is a facility for changing WEAVE's
formatting rules by declaring that a certain identifier should be
treated as a certain PASCAL reserved word, or by declaring that a
certain reserved word should be treated as an ordinary identifier.
2) There is a way to force TANGLE to omit a space between two adjacent
pieces of text, so that a name like 'x3' can be manufactured from 'x'
and '3'. Similarly, there is a way to pass an arbitrary sequence of
characters through TANGLE so that the same sequence will appear
"verbatim" in the PASCAL file; and there is a way to force
beginning-of-line in that file. The latter extensions have proved to be
necessary to deal with various nonstandard conventions of different
PASCAL compilers. When a comment in braces is sent to the PASCAL file,
TANGLE is careful not to introduce further braces inside the comment.
3) There are facilities for octal and hexadecimal constants in WEB
files. TANGLE converts such constants to decimal form; WEAVE gives them
an appropriate typographic treatment.
4) There is a facility for dealing with alphabetic constants. When a
program contains a double-quoted character like "A", TANGLE converts
this to an integer between 0 and 127 that equals the corresponding ASCII
code (in this case 65). The use of ASCII code facilitates the
construction of software that is readily portable from one machine to
another, independent of the actual character set in use.
5) Furthermore, if a double-quoted constant is a string of several
characters, like "cat", TANGLE converts it into a unique integer that is
128 or more. A special string pool file is written, containing all of
the strings that have been specially encoded in this way. I have used
this general mechanism only in large programs, but experience has shown
that it makes quite a nice substitute for the string-processing
capabilities that PASCAL lacks. (Incidentally, I noticed after several
months that a program needs to have some indication that the string-pool
file it is reading contains the same strings that TANGLE generated when
the program itself was tangled. Therefore a "check sum" is included in
the string pool file; each program is able to refer to its own check sum
and to compare it with the value in the file. This check-sum extension
was one of the last features to be added to WEB.)
6) The PRIMES example illustrates macros with parameters and macros
without parameters. WEB also allows "numeric" macros, which are small
integer constants; TANGLE is capable of doing simple arithmetic on such
constants. This feature of WEB was introduced specifically to overcome
PASCAL's unfortunate inability to do compile-time arithmetic. For
example, it is impossible to have a PASCAL array whose bounds are
'0 .. n − 1', or to write '20 + 3:' as the label of one of the cases in
'case x + y'; WEB's numeric macros make it possible for TANGLE to
preprocess such constants.

    \input webmac
    \font\ninerm=cmr9
      .
      .
      .
    syntactically correct \PASCAL\ program will
    be obtained.\]
    \Y\P$\4\X2:Program to print the first
    thousand prime numbers\X\S$\6
    \4\&{program}\1\ \37$\\{print\_primes}(%
    \\{output})$;\6
    \4\&{const} \37$\|m=1000$;\5
    \X5:Other constants of the program\X\6
    \4\&{var} \37\X4:Variables of the program\X\6
    \&{begin} \37\X3:Print the first \|m prime
    numbers\X;\6
    \&{end}.\par
    \U section~1.\fi
      .
      .
      .
    The first three macro definitions here are
    parametric; the other two are simple.\]
    \Y\P\D \37$\\{print\_string}(\#)\S\\{write}(%
    \#)$\C{put a given string into the %
    \\{output} file}\par
      .
      .
      .
    \inx
    \:{Bertrand, Joseph, postulate}, 21.
    \:\\{boolean}, 15.
      .
      .
      .
    \:\.{WEB}, 1.
    \:\\{write}, 6.
    \:\\{write\_ln}, 6.
    \:\\{ww}, \[5], 6.
    \fin
      .
      .
      .
    \:\X4, 7, 12, 15, 17, 23, 24:Variables of
    the program\X
    \U section~2.
    \con

Figure 4. TEX program generated from the WEB file.


H. OCCAM'S RAZOR

I would also like to mention several things that were intentionally left
out of WEB, since I have tried to keep the language as simple as I
could.
There are no "conditional macros," nor does TANGLE evaluate Boolean
expressions that might influence the output. I found that everything I
needed could be done satisfactorily by commenting out the optional code.
For example, a system program is often designed to gather statistics
about its own operation, but such

statistics-gathering is pointless unless someone is actually going to
use the results. In order to make the instrumentation code optional, I
include the word 'stat' just before any special code for statistics, and
'tats' just after such code; and I tell WEAVE to regard stat and tats as
if they were begin and end. But stat and tats are actually simple
macros. When I do want to gather the statistics, I define stat and tats
to be null; but in a production version of the software, I make stat
expand to '@{' and tats expand to '@}', where @{ and @} are special
braces that TANGLE does not remove. Thus the optional code appears as a
harmless comment in the PASCAL program.
WEB's macros are allowed to have at most one parameter. Again, I did
this in the interests of simplicity, because I noticed that most
applications of multiple parameters could in fact be reduced to the
one-parameter case. For example, suppose that you want to define
something like

    mac(#1,#2) == m[#1*r+#2]

which WEB doesn't permit. You can get essentially the same result with
two one-parameter macros

    mac_tail(#) == #]
    mac(#) == m[#*r+mac_tail

since, e.g., 'mac(a)(b)' will expand into 'm[a*r+b]'.
Here is another example that indicates some of the surprising generality
of one-parameter macros: Consider the two definitions

    define two_cases(#)==case j of
        1:#(1); 2:#(2); end
    define reset_file(#)==reset(file@&#)

where '@&' in the second definition is the concatenation operation that
pastes two texts together. You can now say

    two_cases(reset_file)

and the resulting PASCAL output will be

    case j of
    1:reset(file1);
    2:reset(file2);
    end

In other words, the name of one macro can usefully be a parameter to
another macro. This particular trick makes it possible to live with
PASCAL compilers that do not allow arrays of files.


I. PORTABILITY

One of the goals of my TEX research has been to produce portable
software, and the WEB system has been extremely helpful in this respect.
Although my own work is done on a DEC-10 computer with Stanford's
one-of-a-kind operating system, the software developed with WEB has
already been transported successfully to a wide variety of computers
made by other manufacturers (including IBM, Control Data, XEROX,
Hewlett-Packard), and to a variety of different operating systems for
those machines. To my knowledge, no other software of such complexity
has ever been transported to so many different machines. It seems likely
that TEX will soon be operating on all but the smallest of the world's
computer systems.
To my surprise, the main bottleneck to portability of the TEXware has
been the lack of suitable PASCAL compilers, because PASCAL has often
been implemented without system programming in mind. Anybody who has a
decent PASCAL compiler can install WEB (and all programs written in WEB)
without great difficulty, essentially as follows:
1) Start with the three files WEAVE.WEB, TANGLE.WEB, and TANGLE.PAS.
(The programs have not been copyrighted, so these files are not
difficult to obtain.)
2) Run TANGLE.PAS through your PASCAL compiler to get a working TANGLE
program.
3) Check your TANGLE by applying it to TANGLE.WEB; your output file
should match TANGLE.PAS.
4) Apply your TANGLE to the file WEAVE.WEB, obtaining WEAVE.PAS; then
apply PASCAL to WEAVE.PAS and you'll have a working WEAVE system.
5) The same process applies to any software written in WEB, notably to
TEX itself. (However, you need fonts and suitable output equipment in
order to make proper use of TEX; that may be another bottleneck.) Once
you have TEX working, you can apply WEAVE and TEX to your WEB files,
thereby getting program documents as illustrated above.
Notice that a TANGLE.PAS file is needed in order to get this
"bootstrapping" process started. If you have just WEAVE.WEB and
TANGLE.WEB, you can't do the first step.
However, anybody who has looked seriously into the question of software
portability will realize that my comments in the preceding paragraphs
have been oversimplified. I have glossed over some serious problems that
arise: Character sets are different; file naming conventions are
different; special conventions are needed to interact with a user's
terminal; data is packed differently on different machines;
floating-point arithmetic is always nonstandard and sometimes
nonexistent; users want "friendly" interaction with existing programs
for editing and spooling; etc., etc. Furthermore, many of the world's
PASCAL compilers are incredibly bizarre. Therefore it is quite naïve to
believe that a single program TANGLE.PAS could actually work on very
many different machines, or even that one single source file TANGLE.WEB
could be adequate; some system-dependent changes are inevitable.
The WEB system caters to system-dependent changes in a simple but
surprisingly effective way that I neglected to mention when I listed its
other features. Both TANGLE and WEAVE are designed to work with two
input files, not just one: In addition to a WEB source file like
TEX.WEB, there is also a "change file" TEX.CH that contains whatever
changes are needed to customize TEX for a particular system. (Similarly,
the source files WEAVE.WEB and TANGLE.WEB are accompanied by WEAVE.CH
and TANGLE.CH.)
Here's how change files work: Each change has the form "replace
$x_1 \ldots x_m$ by $y_1 \ldots y_n$," for some m ≥ 1 and n ≥ 0; here
$x_i$ and $y_j$ represent lines in the change file. The WEAVE and TANGLE
programs read data from the WEB input file until finding a line that
matches $x_1$; this line, and the m − 1 following lines, are replaced

by y1 ... yn. An error message is given if the m lines replaced did not match
x1 ... xm perfectly.

For example, the program PRIMES.WEB invokes a page procedure to begin a new page;
but page was not present in Wirth’s original PASCAL and it is defined rather
vaguely in the PASCAL standard. Therefore a system-dependent change may be needed
here. A change file PRIMES.CH could be made by copying the line

    @d new_page==page

from Figure 2c and specifying one or more appropriate replacement lines.

The program TANGLE itself contains about 190 sections, and a typical installation
will have to change about 15 of these. If you want to transport TANGLE to a new
environment, you therefore need to create a suitable file TANGLE.CH that modifies
15 or so parts of TANGLE.WEB. (Examples of TANGLE.CH are provided to all people who
receive TANGLE.WEB, so that each implementor has a model of what to do.) You need
to insert your changes by hand into TANGLE.PAS, until you have a TANGLE program
that works sufficiently well to support further bootstrapping. But you never
actually change the master file TANGLE.WEB.

This approach has two important advantages. First, the same master file TANGLE.WEB
is used by everybody, and it contains the basic logic of TANGLE that really defines
the essence of tangling. The system-dependent changes do not affect any of the
subtle parts of TANGLE’s control structures or data structures. Second, when the
official TANGLE has been upgraded to a newer version, a brand new TANGLE.WEB will
almost always work with the old TANGLE.CH, since changes are rarely made to the
system-dependent parts. In other words, this dual-input-file scheme works when the
WEB file is constant and the CH file is modified, and it also works when the CH
file is constant but the WEB file is modified.

Change files were added to WEB about three months after the system was initially
designed, based on our initial experiences with people who had volunteered to
participate in portability experiments. We realized about a year later that WEAVE
could be modified so that only the changed parts of a program would (optionally)
be printed; thus, it’s now possible to document the changes by listing only the
sections that are actually affected by the CH file that WEAVE has processed. We
also generalized the original format of CH files, which permitted only changes
that extended to the end of a section. These two important ideas were among the
final enhancements incorporated into WEB83.


J. PROGRAMS AS WEBS

When I first began to work with the ideas that eventually became the WEB system, I
thought that I would be designing a language for “top-down” programming, where a
top-level description is given first and successively refined. On the other hand I
knew that I often created major parts of programs in a “bottom-up” fashion,
starting with the definitions of basic procedures and data structures and
gradually building more and more powerful subroutines. I had the feeling that
top-down and bottom-up were opposing methodologies: one more suitable for program
exposition and the other more suitable for program creation.

But after gaining experience with WEB, I have come to realize that there is no
need to choose once and for all between top-down and bottom-up, because a program
is best thought of as a web instead of a tree. A hierarchical structure is
present, but the most important thing about a program is its structural
relationships. A complex piece of software consists of simple parts and simple
relations between those parts; the programmer’s task is to state those parts and
those relationships, in whatever order is best for human comprehension, not in
some rigidly determined order like top-down or bottom-up.

When I’m writing a longish program like TANGLE.WEB or WEAVE.WEB or TEX.WEB, I
invariably have strong feelings about what part of the whole should be tackled
next. For example, I’ll come to a point where I need to define a major data
structure and its conventions, before I’ll feel happy about going further. My
experiences have led me to believe that a person reading a program is, likewise,
ready to comprehend it by learning its various parts in approximately the order in
which it was written. The PRIMES.WEB example illustrates this principle on a small
scale; the decisions that Dijkstra made as he composed the original program [2]
appear in the WEB documentation in the same order.

Top-down programming gives you a strong idea of where you are going, but it forces
you to keep a lot of plans in your head; suspense builds up because nothing is
really nailed down until the end. Bottom-up programming has the advantage that you
continually wield a more and more powerful pencil, as more and more subroutines
have been constructed; but it forces you to postpone the overall program
organization until the last minute, so you might flounder aimlessly.

When I tear up the first draft of a program and start over, my second draft
usually considers things in almost the same order as the first one did. Sometimes
the “correct” order is top-down, sometimes it is bottom-up, and sometimes it’s a
mixture; but always it’s an order that makes sense on expository grounds.

Thus the WEB language allows a person to express programs in a “stream of
consciousness” order. TANGLE is able to scramble everything up into the
arrangement that a PASCAL compiler demands. This feature of WEB is perhaps its
greatest asset; it makes a WEB-written program much more readable than the same
program written purely in PASCAL, even if the latter program is well commented.
And the fact that there’s no need to be hung up on the question of top-down versus
bottom-up (since a programmer can now view a large program as a web, to be
explored in a psychologically correct order) is perhaps the greatest lesson I have
learned from my recent experiences.

Another surprising thing that I learned while using WEB was that traditional
programming languages had been causing me to write inferior programs, although I
hadn’t realized what I was doing. My original idea was that WEB would be merely a
tool for documentation, but I actually found that my WEB programs were better than
the programs I had been writing in other languages. How could this be?

Well, imagine that you are writing a small subroutine that updates part of a data
structure, and suppose that the updating takes only one or two lines of code. In
practical programs, there’s often something that can go wrong, if the user’s input
is incorrect, so the subroutine has to check that the input is correct before
doing the update. Thus, the subroutine has the general form

    procedure update;
    begin if ⟨input data is invalid⟩ then
        ⟨Issue an error message and try to recover⟩;
    ⟨Update the data structure⟩;
    end.

A subtle phenomenon occurs in traditional programming languages: While writing the
program for ‘⟨Issue an error message and try to recover⟩’, a programmer
subconsciously tries to get by with the fewest possible lines of code, since the
program for ‘⟨Update the data structure⟩’ is quite short. If an extensive error
recovery is actually programmed, the subroutine will appear to have error-message
printing as its main purpose. But the programmer knows that the error is really an
exceptional case that arises only rarely; therefore a lengthy error recovery
doesn’t look right, and most programmers will minimize it (without realizing that
they are doing so) in order to make the subroutine’s appearance match its intended
behavior. On the other hand when the same task is programmed with WEB, the purpose
of update can be shown quite clearly, and the possibility of error recovery can be
reduced to a mere mention when update is defined. When another section entitled
‘⟨Issue an error message and try to recover⟩’ is subsequently written, the whole
point of that section is to do the best error recovery, and it becomes quite
natural to write a better program as a result.

This fact (that WEB allows you to let each part of the program have its
appropriate size, without distorting the readability of other parts) means that
good programmers find their WEB programs better than their PASCAL programs, even
though their PASCAL programs once looked like the work of an expert.


K. STYLISTIC ISSUES

I found that my style of using WEB evolved quite a bit during the first year. The
general format, in which each section begins with commentary and ends with a
formal program fragment, is extremely versatile; you have the freedom to say
anything you want, yet you must make a decision about how you’ll do it. I imagine
that different programmers will converge to quite different styles, but I would
like to note down some of the things that have seemed to work best for me.

Consider first the question of macros versus section names. A named section, like
‘⟨Issue an error message and try to recover⟩’, is essentially the same as a
parameterless macro; WEB provides both. I prefer to use parameterless macros for
“small” things that can be embodied in a word or two, but named sections for
longer portions of the program that merit a fuller description.

I usually start the name of a section with an imperative verb, but I give a
declarative commentary at the beginning of a section. Thus, PRIMES.WEB says
‘8. Now that appropriate ... ⟨Print table p 8⟩ ≡ ...’; I wouldn’t do the opposite
and say ‘8. Print the table. ⟨Code for printing 8⟩ ≡ ...’.

The name of a section (enclosed in angle brackets) should be long enough to
encapsulate the essential characteristics of the code in that section, but it
should not be too verbose. I found very early that it would be a mistake to
include all of the assumptions about local and global variables in the name of
each section, even though such information would strictly be necessary to isolate
that section as an independent module. The trick is to find a balance between
formal and informal exposition so that a reader can grasp what is happening
without being overwhelmed with detail [5].

Another lesson I learned early in the game was that the name of a section should
explicitly mention any nonstandard control structures, even though its data
structures can often be left implied. Furthermore, if the control flow is properly
explained, you can avoid the usual errors associated with goto statements; such
statements can safely be introduced in a restrained but natural manner.

For example, §14 of the prime-printing example could be reprogrammed as follows,
using ‘loop’ as a macro abbreviation for ‘while true do’:

    ⟨Increase j until it is the next prime number 14⟩ ≡
      loop begin j ← j + 2;
        ⟨Update variables that depend on j 20⟩;
        ⟨If j is prime, goto found 22⟩;
        end;
    found:

With this change, §22 could become

    ⟨If j is prime, goto found 22⟩ ≡
      n ← 2;
      while n < ord do
        begin ⟨If p[n] is a factor of j, goto not_found 26⟩;
        n ← n + 1;
        end;
      goto found;
    not_found:

if §26 changes in the obvious way. The resulting program will be more efficient on
most machines; and I believe that it is actually easier to read and to write, in
spite of the fact that two goto statements appear, because the labels have been
used with appropriate interpretations of their abstract significance.

Of course, PASCAL makes it difficult to use goto statements, because Wirth decided
that labels should be numeric, and that they should be declared in advance. If I
were to introduce the goto statements as suggested, I would have to define numeric
macros found and not_found, and I would have to insert ‘label found, not_found’
into the program at the right place. Such extra work is a bit of a nuisance, but
it can be done in WEB without spoiling the exposition.

PASCAL has a few other misfeatures that prove to be inconvenient with respect to
WEB exposition. The worst of these is the inability to declare local variables
in the midst of a program or procedure. For example, a programmer often finds it
most natural to define an integer variable when a for loop is introduced, but the
rules of PASCAL insist that such a variable be declared rather far away from that
for loop. My WEB programs overcome this problem by having sections like ‘⟨Local
variables for xyzzy⟩’ whenever there’s a rather lengthy procedure ‘xyzzy’ whose
local variables should not be declared all at once. But when a procedure is short,
say only half a dozen sections long, there’s usually no harm in declaring its
local variables in PASCAL style, because the entire text of the procedure will
tend to appear on one or two adjacent pages of the documentation.

Another slightly awkward aspect of PASCAL is its treatment of semicolons. If you
look closely at the prime-number example, you’ll see that I had to be a bit
careful about where I put semicolons; sometimes they occur at the end of the
expanded text of a section, but usually they don’t. With a little self discipline,
a person can learn to do this quite satisfactorily, but it is a nuisance until you
get used to it.


L. ECONOMIC ISSUES

What does it cost to use WEB? Let’s look first at the lowest level, where computer
costs are considered, because it is easy to make quantitative statements at this
level. The running time to TANGLE a WEB file is approximately the same as the time
needed to compile the resulting PASCAL program; hence the extra preprocessing does
not cost much. Similarly, WEAVE doesn’t take long to produce a file for TEX.
However, TEX needs a comparatively large amount of time to typeset the final
document. For example, if we assume that each page requires four seconds, it will
take four minutes to produce a 60-page document. The running time for
WEAVE-plus-TEX is quite reasonable when you consider that your program is
effectively being converted into a fairly substantial booklet; but the costs are
sufficiently large to discourage remaking and reprinting such a booklet more than
once or twice a day. When a new program is being developed, it is therefore
customary to work with hardcopy documentation that is slightly obsolete, and to
read the WEB source file itself when up-to-date information is required; the
source file is sufficiently easy to read for such purposes.

The costs of WEB are more difficult to estimate at higher levels, but I have found
to my surprise that the total time of writing and debugging a WEB program is no
greater than the total time of writing and debugging an ALGOL or PASCAL program,
even though my WEB programs are much better, and even though I am putting
substantially more documentation into the programs. Therefore I have lately been
using WEB for all of my programming, even for one-off jobs that I write “for my
eyes only” just to explore occasional problems. The extra time I spend in
preparing additional commentary is regained because the debugging time is reduced.

In retrospect, the fact that a “literate” program takes much less time to debug is
not surprising, because the WEB language encourages a discipline that I was
previously unwilling to impose on myself. I had known for a long time that the
programs I construct for publication in a book, or the programs that I construct
in front of a class, have tended to be comparatively free of errors, because I am
forced to clarify my thoughts as I do the programming. By contrast, when writing
for myself alone, I have often taken shortcuts that proved later to be dreadful
mistakes. It’s harder for me to fool myself in such ways when I’m writing a WEB
program, because I’m in “expository mode” (analogous to classroom lecturing)
whenever a WEB is being spun. Ergo, less debugging time.

Now that I am writing all my programs in WEB, an unforeseen problem has, however,
arisen: I suddenly have a collection of programs that seem quite beautiful in my
own eyes, and I have a compelling urge to publish all of them so that everybody
can admire these works of art. A nice little 10-page program can easily be written
and debugged in an afternoon and evening; if I keep accumulating such gems, I’ll
soon run out of storage space, and my office will be encrusted with webs of my own
making. There is no telling what will happen if lots of other people catch WEB
fever and start foisting their creations on each other. I can already envision the
appearance of a new journal, to be entitled Webs, for the publication of literate
programs; I imagine that it will have a large backlog and a large group of
dedicated editors and referees.


M. RELATED WORK

Nothing about WEB is really new; I have simply combined a bunch of ideas that have
been in the air for a long time. I would like to summarize in the next few
paragraphs the things that had the greatest influence on my thinking as I put
those pieces together.

George Forsythe wrote in 1966 that “A useful algorithm is a substantial
contribution to knowledge. Its publication constitutes an important piece of
scholarship.” [6] His comments have always inspired me to strive for excellence in
programming, and they have played a major rôle in shaping my present view that it
is worthwhile to consider every program as a work of literature.

The design of WEB was influenced primarily by the pioneering work of Pierre-Arnoul
de Marneffe [7, 8], whose research on what he called “Holon Programming” has not
received the attention it deserves. His work was, in turn, inspired by Arthur
Koestler’s excellent treatise on the structure of complex systems and organisms
[9]; thus we have another connection between programming and literature. A
somewhat similar system was independently created by Edwin Towster [10].

I owe a great debt to Edsger Dijkstra, Tony Hoare, Ole-Johan Dahl, and Niklaus
Wirth for opening my eyes to the importance of abstraction in the reading and
writing of programs, and to Peter Naur for stressing the importance of a balance
between formal and informal methods.

Tony Hoare provided a special impetus for WEB when he suggested in 1978 that I
should publish my program for TEX. Since very few large-scale software systems
were available in the literature, he had been trying to promote the publication of
well-written programs.
Hoare’s suggestion was actually rather terrifying to me, and I’m sure he knew that
he was posing quite a challenge. As a professor of computer science, I was quite
comfortable publishing papers about toy problems that could be polished up nicely
and presented in an elegant manner; but I had no idea how to take a piece of real
software, with all the compromises necessary to make it useful to a large class of
people on a wide variety of systems, and to open it up to public scrutiny. How
could a supposedly respectable academic, like me, reveal the way he actually
writes large programs? And could a large program be made intelligible? My previous
attempts along these lines [11] were by now hopelessly out of date. I decided that
this would be a good time to try out de Marneffe’s ideas; furthermore, the TEX
system itself provided me with new tools for printing and format control, so I
suspected that it would be possible to obtain state-of-the-art documentation by
making proper use of typography.

It is interesting to reread some of the comments that Tony made ten years ago in
his keynote address to the first ACM symposium on Principles of Programming
Languages [12]:

    Documentation must be regarded as an integral part of the process of design
    and coding. A good programming language will encourage and assist the
    programmer to write clear, self-documenting code, and even perhaps to develop
    and display a pleasant style of writing.

He foresaw many future trends, but not the impending improvements in typesetting
quality:

    It is of course possible for a compiler or service program to expand the
    abbreviations, fill in the defaults, and make explicit the assumptions. But in
    practice, experience shows that it is very unlikely that the output of a
    computer will ever be more readable than its input, except in such trivial but
    important aspects as improved indentation.

Typographic formatting of computer programs has a long tradition, originating with
ALGOL and its immediate precursors. I’m not sure who made the first experiments,
but I believe that the lion’s share of the credit for developing excellent
programming-language typography belongs to two people: Peter Naur, who edited the
ALGOL 60 report [13] and gave special care to its presentation; and Myrtle
Kellington, who served for many years as executive editor of ACM publications and
set the standards that have been adopted by other journals. The computing
profession owes much to these people, who made published programs so much more
readable than they would otherwise have been; the magnitude of their contribution
can only be appreciated by people who submit computer programs to journals like
Acta Arithmetica whose editors are unfamiliar with computer science. Bill
McKeeman called attention to formatting issues when he published Algorithm 268,
“ALGOL 60 reference language editor,” in 1965 [14]. There has been a flowering of
such algorithms in recent years; the papers by Oppen [15] and by Rose and Welsh
[16] are particularly noteworthy.

I began to design WEB in the spring of 1979, when I constructed a prototype system
that was called DOC. Luis Trabb Pardo helped me to develop a suitable style of
exposition at that time; then Ignacio Zabala Salelles gave DOC a thorough test
when he prepared a full implementation of TEX in PASCAL. Zabala’s implementation
was successfully transported to many different computers [17–20], and this
experience was of immense value to me when I cast WEB into its present form in
1981. Since then many significant improvements have been suggested by my colleague
David R. Fuchs, and I have also benefited from the experiences of a large number
of outstanding people who volunteered to be guinea pigs for pre-released versions
of TEX. It’s impossible for me to name everyone who has helped, but I would like
to give special thanks to Arthur Samuel, Howard Trickey, Joe Weening, and Pierre
MacKay for important contributions. I’m fortunate indeed to share a working
environment with such stimulating people.

When I originally designed the WEB system, I spent about six weeks preparing the
files TANGLE.WEB and WEAVE.WEB, during which time I was continually changing the
language and trying different styles of exposition. (The programs were neither
long nor complicated, but this was rather intensive work, so I didn’t get much
else done during those six weeks. The first two weeks were actually spent drafting
the first ten per cent of what is now TEX.WEB.) Then I spent about six tedious
hours with a text editor, hand-simulating the behavior of TANGLE on TANGLE.WEB, so
that I had a program TANGLE.PAS that was ripe for debugging. At first I had to
correct errors both in TANGLE.WEB and TANGLE.PAS, but soon TANGLE was working well
enough that I needed only TANGLE.WEB as a source file. Then WEAVE.WEB could be
tangled and debugged too. The total time to create “Version 0” of the WEB system,
including the language design and the time to debug the programs and write a brief
manual for users, was about eight weeks; then enhancements were added at the rate
of about one per month for the next 18 months. As a result of this experience I
think it’s reasonable to state that a WEB-like system can be created from scratch
in a fairly short time, for some other pair of languages besides TEX and PASCAL,
by an expert system programmer who is conversant with both languages. Indeed, I
spoke about WEB on a recent visit to London and one of the people in the audience
decided to test this hypothesis; shortly afterwards I received an elegant report
from Harold Thimbleby, who had just constructed an excellent system called Cweb,
based on Troff/Nroff and C instead of TEX and PASCAL [21].


N. RETROSPECT AND PROSPECTS

Enthusiastic reports about new computer languages, by the authors of those
languages, are commonplace. Hence I’m well aware of the fact that my own
experiences cannot be extrapolated too far. I also realize that, whenever I have
encountered a problem with WEB, I’ve simply changed the system; other users of WEB
cannot operate under the same ground rules.

However, I believe that I have stumbled on a way of programming that produces
better programs that are more portable and more easily understood and maintained;
furthermore, the system seems to work with
large programs as well as with small ones. I’m pleased that my work on typography,
which began as an application of computers to another field, has come full circle
and become an application of typography to the heart of computer science; I like
to think of WEB as a neat “spinoff” of my research on TEX. However, all of my
experiences with this system have been highly colored by my own tastes, and only
time will tell if a large number of other people will find WEB to be equally
attractive and useful.

I made a conscious decision not to design a language that would be suitable for
everybody. My goal was to provide a tool for system programmers, not for high
school students or for hobbyists. I don’t have anything against high school
students and hobbyists, but I don’t believe every computer language should attempt
to offer all things to all people. A user of WEB needs to be good enough at
computer science that he or she is comfortable dealing with several languages
simultaneously. Since WEB combines TEX and PASCAL with a few rules of its own, WEB
programs can contain WEB syntax errors, TEX syntax errors, PASCAL syntax errors,
and algorithmic errors; in practice, all four types of errors occur, and a bit of
sophistication is needed to sort out which is which. Computer scientists tend to
be better at such things than other people. I have found that WEB programs can be
debugged rapidly in spite of the profusion of languages, but I’m sure that many
other intelligent people will find such a task difficult.

In other words, WEB seems to be specifically for the peculiar breed of people who
are called computer scientists. And I’m pretty sure that there are also a lot of
computer scientists who will not enjoy using WEB; some of us are glad that
traditional programming languages have comparatively primitive capabilities for
inserted comments, because such difficulties provide a good excuse for not
documenting programs well. Thus, WEB may be only for the subset of computer
scientists who like to write and to explain what they are doing. My hope is that
the ability to make explanations more natural will cause more programmers to
discover the joys of literate programming, because I believe it’s quite a pleasure
to combine verbal and mathematical skills; but perhaps I’m hoping for too much.
The fact that at least one paper has been written that is a syntactically correct
ALGOL 68 program [22] encourages me to persevere in my hopes for the future.
Perhaps we will even one day find Pulitzer prizes awarded to computer programs.

And what about the future of WEB? If the next year or so of trial use shows that a
lot of other people besides myself become “hooked” on this method of programming,
there will be many ways to incorporate the WEB philosophy into a really effective
programming environment. For example, it will be worthwhile to produce a unified
system that does both tangling and compiling, instead of using separate programs
as in Figure 1; and it will also be worthwhile to carry the unification one step
further, so that run-time debugging as well as syntactic debugging can be done
entirely in terms of the WEB source language. Furthermore, a WEB-like system could
be designed to incorporate additional modularization, so that it would be easier
to compile different parts of a program independently. The new generation of
graphic workstations makes it desirable to display selected program sections on
demand, by using TEX only on the sections that are of current interest, instead of
producing hardcopy for an entire document. And so on; a considerable amount of
additional research and development will be appropriate if the idea of literate
programming catches on.


Acknowledgements

The preparation of this paper was supported in part by the National Science
Foundation under grants IST-8201926 and MCS-8300984, and by the System Development
Foundation. ‘TEX’ is a trademark of the American Mathematical Society.

REFERENCES

1. D. E. Knuth, The TEXbook. Addison-Wesley, Reading, Mass., U.S.A. (1983).
2. O.-J. Dahl, E. W. Dijkstra, and C. A. R. Hoare, Structured Programming.
   Academic Press, London and New York (1972).
3. D. E. Knuth, Structured programming with go to statements. Computing Surveys 6,
   261–301 (1974).
4. D. E. Knuth, The WEB System of Structured Documentation. Stanford Computer
   Science Report CS980 (September 1983).
5. P. Naur, Formalization in program development. BIT 22, 437–453 (1982).
6. G. E. Forsythe, Algorithms for scientific computation. Communications of the
   ACM 9, 255–256 (1966).
7. P. A. de Marneffe, Holon Programming. Univ. de Liege, Service D’Informatique
   (December, 1973).
8. P. A. de Marneffe and D. Ribbens, Holon Programming, in A. Günther et al.
   (eds.), International Computing Symposium 1973, Amsterdam, North-Holland (1974).
9. A. Koestler, The Ghost in the Machine. New York, Macmillan (1968).
10. E. Towster, A convention for explicit declaration of environments and top-down
    refinement of data. IEEE Transactions on Software Engineering SE–5, 374–386
    (1979).
11. D. E. Knuth, Computer-drawn flow charts. Communications of the ACM 6, 555–563
    (1963).
12. C. A. R. Hoare, Hints on Programming Language Design. Stanford Computer Science
    Report CS403 (October 1973).
13. P. Naur (ed.) et al., Report on the algorithmic language ALGOL 60.
    Communications of the ACM 3, 299–314.
14. W. M. McKeeman, Algorithm 268. Communications of the ACM 8, 667–668 (1965).
15. D. Oppen, Prettyprinting. ACM Transactions on Programming Languages and
    Systems 2, 465–483 (1980).
16. G. A. Rose and J. Welsh, Formatted programming languages. Software: Practice &
    Experience 11, 651–669 (1981).
17. I. Zabala and L. Trabb Pardo, The status of the PASCAL implementation of TEX.
    TUGboat 1, 16–17 (1980).
18. I. Zabala, TEX-PASCAL and PASCAL compilers. TUGboat 2 (1), 11–12 (1981).
19. I. Zabala, Some feedback from PTEX installations. TUGboat 2 (2), 16–19 (1981).
20. I. A. Zabala, How portable is PASCAL? Draft of paper in preparation (1982).
21. H. Thimbleby, Cweb. Preprint, University of York (August 1983).
22. C. H. Lindsey, ALGOL 68 with fewer tears. The Computer Journal 15, 176–188
    (1972).

Received September 1983
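
A note on the change-file mechanism of Section I. The matching rule described there (scan the master file for a line equal to x1, replace that line and the m − 1 following lines by y1 ... yn, and report an error if the m lines do not match x1 ... xm exactly) can be sketched in a few lines. What follows is only an illustrative sketch in Python, not the code used by TANGLE or WEAVE. The names apply_changes and ChangeMismatch, the representation of a change as a pair of line lists, the exact-equality test for a "match", and the replacement line in the usage example are all assumptions made for the illustration.

```python
from typing import List, Sequence, Tuple

# Assumed representation: a change is (x_lines, y_lines), meaning
# "replace x1..xm by y1..yn" with m >= 1 and n >= 0.
Change = Tuple[Sequence[str], Sequence[str]]


class ChangeMismatch(Exception):
    """Raised when the master file does not contain x1..xm as expected."""


def apply_changes(web_lines: Sequence[str], changes: Sequence[Change]) -> List[str]:
    """Merge a master file with a list of changes, processed in order."""
    out: List[str] = []
    pos = 0
    for old, new in changes:
        if len(old) < 1:
            raise ValueError("each change must match at least one line (m >= 1)")
        # Copy master lines unchanged until one equals x1.
        while pos < len(web_lines) and web_lines[pos] != old[0]:
            out.append(web_lines[pos])
            pos += 1
        if pos == len(web_lines):
            raise ChangeMismatch(f"no remaining line matches {old[0]!r}")
        # The matched line and the m - 1 following lines must equal x1..xm.
        if list(web_lines[pos:pos + len(old)]) != list(old):
            raise ChangeMismatch(f"lines starting at {pos + 1} do not match the change")
        out.extend(new)          # emit y1..yn (possibly nothing, when n = 0)
        pos += len(old)          # skip the m replaced lines
    out.extend(web_lines[pos:])  # copy whatever follows the last change
    return out


# Usage: one substitution (m = 1, n = 1) and one deletion (m = 2, n = 0).
master = ["line A", "@d new_page==page", "line B", "line C"]
changes = [
    (["@d new_page==page"], ["@d new_page==do_nothing {hypothetical replacement}"]),
    (["line B", "line C"], []),
]
result = apply_changes(master, changes)
```

Because the changes are consumed in a single forward pass, the same list of changes keeps working against a revised master file as long as the lines it names are still present and unchanged, which is the property the dual-input-file scheme relies on.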

submitted to THE COMPUTER JOURNAL 15


Changes at the Journal of Algorithms
Donald E. Knuth, David S. Johnson, and Zvi Galil
Former Co-Editors

For several years we and many of our colleagues have become more and
more concerned about the fact that libraries are increasingly unable to afford
the prices being charged by commercial publishers of scientific journals. In
October of 2003, Don Knuth wrote a long letter to the editorial board of the
Journal of Algorithms, attempting to explain the current state of affairs as
comprehensively and accurately as he could. His letter has now been posted
online at

http://www-cs-faculty.stanford.edu/~knuth/joalet.pdf

and we hope it will be read also by everyone else who is concerned with
publication of computer-science journals. In response to Knuth’s letter, the
entire editorial board ultimately decided to resign from the Journal of Algo-
rithms in favor of launching a new journal to be called ACM Transactions
on Algorithms (see next page).
Elsevier, the publisher of the Journal of Algorithms, intends to continue
publishing the journal, and papers currently in the pipeline will continue to
be handled by the outgoing editors. Here is Elsevier’s official statement.

The Managing Editors and the Publisher announce that the Ed-
itorial Board of the Journal of Algorithms has resigned per Jan-
uary 1 of this year because of an unresolved dispute concerning
the commercial aspects of scientific publishing. Papers which
have been submitted prior to this date will be refereed in the
usual way and published in the course of this year and next year.
Papers submitted after this date will be forwarded to the new
Editorial Board, which will be appointed shortly. It is expected
that this transition will not result in any additional publication
delay.

A New Journal: ACM Transactions on Algorithms
Hal Gabow

On January 21, the ACM Publications Board approved a proposal for
a new journal dedicated to Algorithms, ACM Transactions on Algorithms
(TALG). TALG’s editorial board is the editorial board that resigned from
the Journal of Algorithms. The three Editors of JALG, Zvi Galil, David
Johnson and Don Knuth, are members of the new editorial board. I’m very
happy to take over from them as Editor-in-Chief of TALG.
At the time of this writing many details of TALG are still in preliminary
form, but it is certain that individuals and libraries will be able to subscribe at
an attractive price similar to those of other ACM journals. Announcements
of when TALG will begin accepting submissions will appear in various places
including the ACM website.

Call for NP-Completeness Results


David Johnson

To celebrate the launch of the new ACM Transactions on Algorithms I
am planning to revive my “NP-Completeness Column,” with the first column
since 1992 appearing in the inaugural issue of the new journal. This
column will survey the most important/interesting NP-hardness results that
have appeared in the past 12 years, and I would greatly welcome sugges-
tions as to which results should be included. You can send them to me at
dsj@research.att.com.

All Questions Answered
Donald Knuth

On October 5, 2001, at the Technische Universität München, Donald Knuth presented a lecture entitled “All Questions Answered”. The lecture drew an audience of around 350 people. This article contains the text of the lecture, edited by Notices senior writer and deputy editor Allyn Jackson.

Originally trained as a mathematician, Donald Knuth is renowned for his research in computer science, especially the analysis of algorithms. He is a prolific author, with 160 entries in MathSciNet. Among his many books is the three-volume series The Art of Computer Programming [TAOCP], for which he received the AMS Steele Prize for Exposition in 1986. The citation for the prize stated that TAOCP “has made as great a contribution to the teaching of mathematics for the present generation of students as any book in mathematics proper in recent decades.” The long awaited fourth volume is in preparation and some parts are available through Knuth’s website, http://www-cs-faculty.stanford.edu/~knuth/.

Knuth is the creator of the TeX and METAFONT languages for computer typesetting, which have revolutionized the preparation and distribution of technical documents in many fields, including mathematics. In 1978 he presented the AMS Gibbs Lecture entitled “Mathematical Typography”. The lecture was subsequently published in the Bulletin of the AMS [MT].

Knuth earned his Ph.D. in mathematics in 1963 from the California Institute of Technology under the direction of Marshall Hall. He has received the Turing Award from the Association for Computing Machinery (1974), the National Medal of Science (1979), the Adelsköld Medal from the Royal Swedish Academy of Sciences (1994), the Harvey Prize from the Technion of Israel (1995), the John von Neumann Medal from the Institute of Electrical and Electronics Engineers (1995), and the Kyoto Prize from the Inamori Foundation (1996). Since 1968 Knuth has been on the faculty of Stanford University, where he currently holds the title of Professor Emeritus of The Art of Computer Programming.

- Allyn Jackson

Knuth: In every class that I taught at Stanford, the last day was devoted to “all questions answered”. The students didn’t have to come to class if they didn’t want to, but if they did, they could ask any question on any subject except religion or politics or the final exam. I got the idea from Richard Feynman, who did the same thing in his classes at Caltech, and it was always interesting to see what the students really wanted to know. Today I’ll answer any question on any subject. Do we allow religion or politics? I don’t know. But there is no final exam to worry about. I’ll try to answer without taking too much time so that we can get a lot of questions in.

So, who wants to ask the first question?… Well, if there are no questions…[Knuth makes as if to leave.]

Question: There was a special report to the American president, the PITAC report [PITAC], containing some recommendations. I am wondering whether you would be willing to comment on the priorities outlined in these recommendations: better software engineering, building a teraflop

318 NOTICES OF THE AMS VOLUME 49, NUMBER 3


computer, improvements in the Internet including higher security and higher bandwidth, and the socio-economic impacts of managing information available via computer networks.

Knuth: I think that’s a brilliant solution of the problem of what to present to a president. But in fact what I would like to see is thousands of computer scientists let loose to do whatever they want. That’s what really advances the field. From my experience writing The Art of Computer Programming, if you asked me any year what was the most important thing that happened in computer science that year, I probably would have no answer for the question, but over five years’ time the whole field changes. Computer science is a tremendous collaboration of people from all over the world adding little bricks to a massive wall. The individual bricks are what make it work, and not the milestones.

Next question?

[Cartoon: NON SEQUITUR © 2001 Wiley Miller. Dist. By UNIVERSAL PRESS SYNDICATE. Reprinted with permission. All rights reserved.]

Question: Mathematicians say that God has the “Book of Proofs”, where all the most aesthetic proofs are written. Can you recommend some algorithms for the “Book of Algorithms”?

Knuth: That’s a nice question. It was Paul Erdős who promulgated the idea that God has a book containing the best mathematical proofs, and I guess my friend Günter Ziegler in Berlin has recently written about it [PFB].

I remember that mathematicians were telling me in the 1960s that they would recognize computer science as a mature discipline when it had 1,000 deep algorithms. I think we’ve probably reached 500. There are certainly lots of algorithms that I think have to be considered absolutely beautiful and immortal, in some sense. Two examples are the Euclid algorithm and a corresponding one that works in binary notation and that may have been developed independently in China, almost as early as Euclid’s algorithm was invented in Greece. In my books I am mostly concerned with the algorithms that are classical and that have been around for a long time. But still, every year we find brand new ideas that I think are going to remain forever.

Question: Do you have thoughts on quantum computing?

Knuth: Yes, but I don’t know a great deal about it. It’s quite a different paradigm from what I’m used to. It has lots of things in common with the kind of computing I know, but it’s also quite mysterious in that you have to get all the answers at the end; you don’t make a test and then have that determine what you do next. A lot of you have seen the movie Lola rennt (called Run Lola Run in the U.S.), in which the plot is played out three different times, with the outcome taking three different branches. Quantum computing is something like that: The world goes into many different branches, and we’re interested in the one where the outcome is the nicest.

I’m good at nonquantum computing myself, so it’s quite possible that if quantum computing takes over, I won’t be able to do the new stuff. My life’s work is with computers not because I’m interested so much in computation, but because I happen to be good at this kind of computing. Fortunately for me, I found that the thing I could do well was interesting to other people. I didn’t develop an ability to think about algorithms because I wanted to help people solve problems. Somehow, by the time I was a teenager, I had a peculiar way of thinking that made me good at programming. But I might not be good at quantum programming. It seems to be a different world from my own.

I’ll take a question from the back.

Question: I am working in theorem proving, and one of the most important papers is your paper “Simple word problems in universal algebra” [KB] from 1970, written with P. B. Bendix. I have two questions. The first is, do you still follow this area and what do you think of it? And the second is, who is and what became of P. B. Bendix?

Knuth: This work was published in 1970, but I actually did it in 1967 while I was at Caltech. It was a simple idea, but fortunately it’s turned out to be very widely applicable. The idea is to take a set of mathematical axioms and find all the implications of those axioms. If I have a certain set of axioms and you have a possible theorem, you ask, does this theorem follow from those axioms or not? I called my paper “Simple word problems in universal algebra”, and I said a problem was “simple” if my method could handle it. Now people have extended the method quite a lot, so that a lot more problems are “simple”. I think their work is beautiful.

The year 1967 was the most dramatic year of my life by far. I had no time for research. I had two children less than two years old; I had been scheduled to be a lecturer for ACM (Association for Computing Machinery) for three weeks; I had

MARCH 2002 NOTICES OF THE AMS 319


to give lectures in a NATO summer school in Copenhagen; I had to speak in a conference at Oxford; and so on. And I was getting the page proofs for The Art of Computer Programming, of which the first volume was being published in 1968. All of this was in addition to the classes I was teaching, and an attack of ulcers that put me in the hospital, and being an editor for twelve journals. That year I thought of two little ideas. One has become known as the Knuth-Bendix algorithm; the other one is known as attribute grammars [AG]. That was the most creative year of my life, and it was also the most hectic.

You asked about Peter Bendix. He was a sophomore in a class I taught at Caltech, “Introduction to Algebra”. Every student was supposed to do a class project, and Peter did his term paper on the implementation of the algorithm. He was a physics major. This was the time of the Vietnam War, and he became an objector. He went to Canada and worked as a high school teacher for about five years and later got a degree in physics. I found he was living near Stanford a couple of years ago, so I called him up and found out that he has had a fairly happy life in recent years.

In the 1960s, if I wrote a joint paper with my advisor Marshall Hall, it meant that he did the theory and I did the programming. But if I wrote a paper with anybody else, it meant that I did the theory and the other person did the programming. So Pete Bendix was a good programmer who implemented the method.

Question: It seems to me it’s easier to revise a book than the huge software programs we see day to day. How can we apply theory to improve software?

Knuth: Certainly errors in software are more difficult to fix than errors in books. In fact, my main conclusion after spending ten years of my life working on the TeX project is that software is hard. It’s harder than anything else I’ve ever had to do. While I was working on the TeX program, I was unable to do full-time teaching. Although I love teaching, I had to take a year off from it because there was just too much to keep in my head at one time. Writing a book is a little more difficult than writing a technical paper, but writing software is a lot more difficult than writing a book.

In my books I offer rewards for the first person who finds any particular error, and I must say that I’ve written more checks to people in Germany than in any other country in the world. I get letters from all over, but my German readers are the best nitpickers that I’ve ever had! In software I similarly pay for errors in the TeX and METAFONT programs. The reward was doubling every year: It started out at $2.56, then it went to $5.12, and so on, until it reached $327.68, at which time I stopped doubling. There has been no error reported in TeX since 1994 or 1995, although there is a rumor that somebody has recently found one. I’m going to have to look at it again in a year or two. I do everything in batch mode, by the way. I am going to look again at possible errors in TeX in, say, the year 2003.

I think letting users know that you welcome reports of errors is one important technique that could be used in the software industry. I think Microsoft should say, “You’ll get a check from Bill Gates every time you find an error.”

Question: What importance do you give to the design of efficient algorithms, and what emphasis do you suggest giving this area in the future?

Knuth: I think the design of efficient algorithms is somehow the core of computer science. It’s at the center of our field. Computers are incredibly fast now compared to what they were before, so for many problems there is no need to have an efficient algorithm. I can write programs that are in some sense extremely inefficient, but if it’s only going to take one second to get the answer, who cares? Still, some things we have to do millions or billions of times, and just knowing that the number of times is finite doesn’t tell us that we can handle it. So the number of problems that are in need of efficient algorithms is huge. For example, many problems are NP complete, and NP complete is just a small level of complexity. Therefore I see an almost infinite horizon for the need for efficient algorithms. And that makes me happy because those are the kinds of problems I like the best.
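The doubling reward schedule for TeX and METAFONT errors described on this page can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function name `reward_schedule` and the choice to work in integer cents are assumptions made here, not anything from the lecture. It starts at $2.56 and doubles until the stated cap of $327.68 is reached.

```python
def reward_schedule(start_cents=256, cap_cents=32768):
    """Return the reward amounts in cents, doubling from start to cap.

    Hypothetical helper for illustration; amounts are kept in integer
    cents so that the doubling is exact.
    """
    amounts = []
    amount = start_cents
    while amount <= cap_cents:
        amounts.append(amount)
        amount *= 2  # the reward doubled at each step
    return amounts


schedule = reward_schedule()
for cents in schedule:
    # Format integer cents as dollars and cents.
    print(f"${cents // 100}.{cents % 100:02d}")
```

In cents the endpoints are powers of two (256 is 2 to the 8th and 32768 is 2 to the 15th), so the quoted start and cap are exactly seven doublings apart.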



Question: You have a big interest in puzzles, including the “Tower of Hanoi” puzzle on more than 3 pegs. I won’t ask a harder question (what is the shortest solution?) because I am not sure everyone knows this puzzle. But I will ask a more philosophical question: Is it possible to show this can never be solved?

Knuth: Do people know the “Tower of Hanoi” problem? You have 3 pegs, and you have disks of different sizes. You’re supposed to transfer the disks from one peg to another, and the disks have to be sorted on each peg so that the biggest is always on the bottom. You can move only one disk at a time. Henry Dudeney invented the idea of generalizing this puzzle to more than 3 pegs, and the task of finding the shortest solution to the 4-peg problem has been an open question for more than a hundred years. The 3-peg problem is very simple; we teach it to freshmen.

But take another, more famous problem, the Goldbach conjecture in mathematics: Every even integer is the sum of two odd primes. Now, I think that’s a problem that’s never going to be solved. I think it might not even have a proof. It might be one of the unprovable theorems that Gödel showed exist. In fact, we now know that in some sense almost all correct statements about mathematics are unprovable. Goldbach’s conjecture is just, sort of, true because it can’t be false. There are so many ways to represent an even number as the sum of two odd numbers, that as the numbers grow the number of representations grows bigger and bigger. Take a $10^{10^{10}}$-digit even number, and imagine how many ways there are to write that as the sum of two odd integers. For an n-bit odd number, the chances are proportional to 1/n that it’s prime. How are all of those pairs of odd numbers going to be nonprime? It just can’t happen. But it doesn’t follow that you’ll find a proof, because the definition of primality is multiplicative, while Goldbach’s conjecture pertains to an additive property. So it might very well be that the conjecture happens to be true, but there is no rigorous way to prove it.

In the case of the 4-peg “Tower of Hanoi”, there are many, many ways to achieve what we think is the minimum number of moves, but we have no good way to characterize all those solutions. So that’s why I personally came to the conclusion that I was never going to be able to solve it, and I stopped working on it in 1972. But I spent a solid week working on it pretty hard.

Question: What are the five most important problems in computer science?

Knuth: I don’t like this “top ten” business. It’s the bottom ten that I like. I think you’ve got to go for the little things, the stones that make up the wall.

Question: You spent a lot of time on computer typesetting. What are your reflections on the impact of this work?

Knuth: I am extremely happy that my work was in the public domain and made it possible for people on all platforms to communicate with each other via the Internet. Especially now I’m thrilled by some recent projects. Two weeks ago I heard a great lecture by Bernd Wegner from the Technical University of Berlin about the plans for online journals by the European Mathematical Society. Such things would simply have been impossible without the open source software that came out of my work. So I’m extremely delighted this is helping to advance science.

I’m happy to see many books that look pretty good. Before I started my work, books on mathematics were looking worse and worse from year to year. It took a lot of skilled handwork to do it in the old system. The people who could do that were dying out, and high priority did not go to mathematical books. I never expected that TeX would take over the entire world of publishing. I’m not a very competitive person, and I did not want to take jobs away from anybody who was doing another way of printing. But I found that nobody wanted to do mathematical publishing well, so math was something I could improve without getting anybody upset about me being an upstart.

The downside is that I’m too sensitive to things now. I can’t go to a restaurant and order food



because I keep looking at the fonts on the menu. Five minutes later I realize that it’s also talking about food. If I had never thought about computer typesetting, I might have had a happier life in some ways.

Question: Can you give us an outline for computer science, some milestones for the next ten or twenty years?

Knuth: You’re asking for milestones again. There is a lot of interest in applications to biology because so many things have opened up in that domain, with chances to cure diseases. The fact that human beings are based on a discrete code means that people like you and me, who are good at discrete problems, are able to do relevant work for this area. The problems are very difficult and challenging, and that’s why I foresee an important future there.

But in all aspects of our field, I really don’t see any slowing down. Every time I think I’ve discovered something interesting, I look on the Internet and find that somebody else has done it too. So we have a field that at the moment still seems to be like a boiling kettle, where you can’t keep the lid on.

In the field of biology, I think we can confidently predict that it’s going to have rich problems to solve for at least 500 years. I can’t make that claim for computer science.

Question: What is the connection between mathematics and computer programming viewed as an art?

Knuth: Art is Kunst. The American movie Artificial Intelligence is called Kunstlicher Intelligenz in Germany; that is, artificial as well as artistic. I think of programming with beauty in mind, as being something elegant, something that you can be proud of for the way it fits together. Mathematics in the same way has elegance. Both fields, computing and mathematics, are different from other sciences because they are artificial; they are not in nature. They’re totally under our own control. We make up the axioms, and when we solve a problem, we can prove that we’ve solved it. No astronomer will ever know whether his theories of astronomy are correct. You can’t go up to the sun and measure it.

So these are my first thoughts on that connection. But there is a difference between mathematics and computer programming, and sometimes I can feel when I’m putting on one hat or the other. Some parts of me like mathematics, and some parts of me like emacs hacking. I think they go together okay, but I don’t see that they’re the same paradigm.

Question: What is the relationship between God and computers?

Knuth: In one of my books, 3:16 Bible Texts Illuminated [BTI], I used random sampling to study sixty different verses of the Bible and what people from all different religious persuasions and different centuries have said about those verses. I did the study at first on my own, and then I found it was interesting enough that I ought to make a book about it. I got sixty of the best artists in the world to illustrate the book, many of them in Germany. The artwork was exhibited twice in Germany, and in other countries around the world. It was also shown in the National Cathedral in Washington, DC. In that book I used methodology that computer scientists often use for understanding a complicated subject, to see if that method would give some insight into the Bible, which is a complicated subject. In the book, I don’t give answers. I just say I think it’s good that life should be an ongoing search. The journey is more important than the destination.

Question: Do you know whether “P equals NP” has been proved? I heard a rumor that it has.

Knuth: Which rumor did you hear?

Question: One from Russia.

Knuth: From Russia? That’s new to me. Well, I don’t think anybody has proved that P equals NP yet. But I know that Andy Yao has retired and hopes to solve the problem in the next five to ten years. He is inspired by Andrew Wiles, who



devoted several years to proving Fermat’s Last Theorem. They’re both Princeton people. Andy can do it if anybody can.

Three or four years ago, there was a paper in a Chinese journal of computer science and technology by a professor who claimed he could solve an NP-hard problem in polynomial time. The problem was about cliques, and he had a very clever way to represent cliques. The method was supposedly polynomial time, but it actually took something like $n^{12}$ steps, so you couldn’t even check it when n equals 5. So it was very hard to see the bug in his proof. I went to Stanford and sat down with our graduate students, and we needed a couple of hours before we found the flaw. I wrote the author a letter pointing out the error, and he wrote back a couple of months later, saying, “No, no, there is no error.” I decided not to pursue it any further. I had done my part. But I don’t believe it’s been solved. That’s the most mind-boggling problem facing theoretical computer science, and maybe all of science at the moment.

Question: What do you think of research in cryptographic algorithms? And what do you think of efforts by politicians today to put limits on cryptography research?

Knuth: Certainly the whole area of cryptographic algorithms has been one of the most active and exciting areas in computer science for the past ten years, and many of the results are spectacular and beautiful. I can’t claim that I’m good at that particular subject, though, because I can’t think of sneaky attacks myself. But the key problem is, what about the abuse of secure methods of communication? I don’t want criminals to use these methods to become better criminals.

I’m a religious person, and I think that God knows all my secrets, so I always feel that whatever I’m thinking is public knowledge in some way. I come from this kind of background. I don’t feel I have to encrypt everything I do. On the other hand, I would certainly feel quite differently if somebody started to use such openness against me, by stealing my bank accounts or whatever. So I am supportive of a high level of secrecy. But whether it should be impossible for the authorities to decode things even in criminal investigations, in extreme cases, there I tend to come down on the side of wanting to have some way to break some keys sometimes.

Question: Will we have intelligent machines sometime in the future? Should we have them?

Knuth: There have always been inflated estimates as to how soon we are going to have a machine that’s intelligent. I still see no signs of getting around the central problem of understanding what is cognition, what it means to think. Neurologists are making better measurements than they ever have before, but we are still so far from finding an answer that I can’t yet rank neuroscience as one of the most active fields of current work. Biology has been getting answers, with DNA and stem cells and so on. But with cognition we are still looking for the secret.

Some very thought-provoking books came out a year or two ago, one by Hans Moravec [Mo], and one by Ray Kurzweil [Ku]. Both of them are saying that in twenty or thirty years we are going to have machines smarter than humans. Some people were worried about that. My attitude is, if that’s true, more power to them. If they are smarter than us, so what? Then we can learn from them. But I see no signs that there are any breakthroughs around the corner.

Two weeks ago in Greece I was at the inauguration of a new book by Christos Papadimitriou, who is chairman of computer science at Berkeley. He published a novel in the Greek language called The Smile of Turing [Pa]. I don’t want to give away the story, but when it gets published in German or English, you’ll find that it has a very nice discussion of artificial intelligence and the Turing test for intelligence.

The most promising model of how the brain works that I’ve seen says that the brain is a dynamic genetic algorithm that operates all the time. As I



am talking to you now, your brains have a lot of competing theories about what I’m going to say. It’s the survival of the fittest, a continual battle among the competing theories. Some come to the surface and actually enter your consciousness, but the others are all there. Some kind of mating of concepts might be going on in our heads all the time. This model seems to have the right properties to account for how we can do what we do with the relatively slow response time that our neurons have. But I am by no means an expert on this.

Question: What is your thinking about software patents? There is a big discussion going on in Europe right now about whether software should be patentable.

Knuth: I’m against patents on things that any student should be expected to discover. There have been an awful lot of software patents in the U.S. for ideas that are completely trivial, and that bothers me a lot. There is an organization that has worked for many years to make patents on all the remaining trivial ideas and then make these available to everyone. The way patenting had been going was threatening to make the software industry stand still.

Algorithms are inherently mathematical things that should be as unpatentable as the value of π. But for something nontrivial, something like the interior point method for linear programming, there’s more justification for somebody getting a right to license the method for a short time, instead of keeping it a trade secret. That’s the whole idea of patents; the word patent means “to make public”.

I was trained in the culture of mathematics, so I’m not used to charging people a penny every time they use a theorem I proved. But I charge somebody for the time I spend telling them which theorem to apply. It’s okay to charge for services and customization and improvement, but don’t make the algorithms themselves proprietary.

There’s an interesting issue, though. Could you possibly have a patent on a positive integer? It is not inconceivable that if we took a million of the greatest supercomputers today and set them going, they could compute a certain 300-digit constant that would solve any NP-hard problem by taking the GCD of this constant with an input number, or by some other funny combination. This integer would require massive amounts of computation time to find, and if you knew that integer, then you could do all kinds of useful things. Now, is that integer really discovered by man? Or is it something that is God given? When we start thinking of complexity issues, we have to change our viewpoint as to what is in nature and what is invented.

Question: You have been writing checks to people who point out errors in your books. I have never heard of anyone cashing these checks. Do you know how much money you would be out of, if everyone suddenly cashed the checks?

Knuth: There’s one man who lives near Frankfurt who would probably have more than $1,000 if he cashed all the checks I’ve sent him. There’s a man in Los Gatos, California, whom I’ve never met, who cashes a check for $2.56 about once a month, and that’s been going on for some years now. Altogether I’ve written more than 2,000 checks over the years, and the average amount exceeds $8.00 per check. Even if everybody cashed their checks, it would still be more than worth it to me to know that my books are getting better.

References

[TAOCP] The Art of Computer Programming, by Donald E. Knuth. Volume 1: Fundamental Algorithms (third edition, Addison-Wesley, 1997). Volume 2: Seminumerical Algorithms (third edition, Addison-Wesley, 1997). Volume 3: Sorting and Searching (second edition, Addison-Wesley, 1998). Volume 4: Combinatorial Algorithms (in preparation).
[MT] Mathematical typography, by Donald E. Knuth. Bull. Amer. Math. Soc. (N.S.) 1 (1979), no. 2, 337–372. Reprinted in Digital Typography (Stanford, California: CSLI Publications, 1998), pp. 19–65.
[PITAC] President’s information technology advisory committee. See http://www.itrd.gov/ac/.
[PFB] Proofs from The Book, by Martin Aigner and Günter Ziegler. Second edition, Springer Verlag, 2001.
[KB] Simple word problems in universal algebras, by Peter B. Bendix and Donald Knuth. Computational Problems in Abstract Algebra, J. Leech, ed. (Oxford: Pergamon, 1970), pp. 263–297. Reprinted in Automation of Reasoning, Jörg H. Siekmann and Graham Wrightson, eds. (Springer, 1983), pp. 342–376.
[BTI] 3:16 Bible Texts Illuminated, by Donald E. Knuth. A-R Editions, Madison, Wisconsin, 1990.
[AG] Semantics of context-free languages, by Donald E. Knuth. Mathematical Systems Theory 2 (1968), 127–145. See also The genesis of attribute grammars, in Lecture Notes in Computer Science 461 (1990), 1–12.
[Pa] TO XAMOGELO TOY TOYRINGK (The Smile of Turing), by Christos Papadimitriou. Livani Publishers, Athens, Greece, 2001.
[Ku] The Age of Spiritual Machines: When Computers Exceed Human Intelligence, by Ray Kurzweil. Penguin USA, 2000.
[Mo] Robot: Mere Machine to Transcendent Mind, by Hans P. Moravec. Oxford University Press, 2000.

Photographs used in this article are courtesy of Andreas Jung, Technische Universität München.
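The lecture names two greatest-common-divisor algorithms as examples of beautiful, lasting algorithms: Euclid's algorithm and a corresponding one that works in binary notation. As a rough illustration only, the sketch below shows one common way to write each for nonnegative integers. The function names `gcd_euclid` and `gcd_binary` are chosen here for illustration, and the binary version is the usual shift-and-subtract formulation, which is assumed rather than taken from the lecture.

```python
def gcd_euclid(a, b):
    """Euclid's algorithm: repeatedly replace (a, b) by (b, a mod b)."""
    while b:
        a, b = b, a % b
    return a


def gcd_binary(a, b):
    """Binary GCD for nonnegative integers, using only shifts,
    parity tests, comparisons, and subtraction."""
    if a == 0:
        return b
    if b == 0:
        return a
    # Factor out the powers of two common to both numbers.
    shift = 0
    while ((a | b) & 1) == 0:
        a >>= 1
        b >>= 1
        shift += 1
    # Make a odd; any remaining factors of two in a are not shared.
    while (a & 1) == 0:
        a >>= 1
    while b:
        # Remove factors of two from b; they cannot divide the odd a.
        while (b & 1) == 0:
            b >>= 1
        # Keep a <= b, then subtract; the difference of two odds is even.
        if a > b:
            a, b = b, a
        b -= a
    # Restore the common powers of two.
    return a << shift


print(gcd_euclid(48, 18), gcd_binary(48, 18))
```

Both functions return the same value on the same inputs. The binary version avoids division entirely, which is the sense in which it "works in binary notation".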



Ancient
Babylonian
Algorithms
Donald E. Knuth
Stanford University

The early origins of mathematics are discussed, emphasizing those aspects which seem to be of
greatest interest from the standpoint of computer science. A number of old Babylonian tablets, many
of which have never before been translated into English, are quoted.

Key Words and Phrases: history of computation, Babylonian tablets, sexagesimal number system, sorting
CR Categories: 1.2

Copyright © 1972, Association for Computing Machinery, Inc. General permission to republish, but not
for profit, all or part of this material is granted, provided that reference is made to this
publication, to its date of issue, and to the fact that reprinting privileges were granted by
permission of the Association for Computing Machinery.
Author's address: Stanford University, Computer Science Department, Stanford, CA 94305. The
preparation of this paper was supported in part by the National Science Foundation, under grant
GJ-992.

One of the ways to help make computer science respectable is to show that it is deeply rooted in
history, not just a short-lived phenomenon. Therefore it is natural to turn to the earliest surviving
documents which deal with computation, and to study how people approached the subject nearly 4000
years ago. Archeological expeditions in the Middle East have unearthed a large number of clay tablets
which contain mathematical calculations, and we shall see that these tablets give many interesting
clues about the life of early "computer scientists."

Introduction to Babylonian Mathematics

The tablets in question come from the general area of Mesopotamia (present day Iraq), between the
Tigris and Euphrates rivers, centered more or less about the ancient city of Babylon (near
present-day Baghdad). They are covered with cuneiform (i.e. "wedge-shaped") script, a form of writing
which goes back to about 3000 B.C. The tablets of greatest mathematical interest were written about
the time of the Hammurabi dynasty, about 1800-1600 B.C., and we shall be primarily concerned with
texts that date from this so-called Old-Babylonian period.
It is well known that Babylonians worked in a sexagesimal, i.e. radix 60, number system, and that our
present sexagesimal units of hours, minutes, and seconds are vestiges of their system. But it is less
widely known that the Babylonians actually worked with floating-point sexagesimal numbers, using a
rather peculiar notation that did not include any exponent part. Thus, the two-digit number

    2,20

stood for 2 × 60 + 20 = 140, and for 2 + 20/60 = 2⅓,

671 Communications of the ACM, July 1972, Volume 15, Number 7
and for 2/60 + 20/3600, and in general for $140 \times 60^n$, where n is any integer.
At first sight this manner of representing numbers may look very awkward, but in fact it has
significant advantages when multiplication and division are involved. We use the same principle when
we do calculations by slide rule, performing the multiplications and divisions without regard to the
decimal point location and then supplying the appropriate power of 10 later. A Babylonian
mathematician computing with numbers that were meaningful to him could easily keep the appropriate
power of 60 in mind, since it is not difficult to estimate the range of a value within a factor of
60. A few instances have been found where addition was performed incorrectly because the radix points
were improperly aligned [7, p. 28], but such examples are surprisingly rare.
As an indication of the utility of this floating-point notation, consider the following table of
reciprocals:

     2   30        16   3,45        45     1,20
     3   20        18   3,20        48     1,15
     4   15        20   3           50     1,12
     5   12        24   2,30        54     1,6,40
     6   10        25   2,24        1      1
     8   7,30      27   2,13,20     1,4    56,15
     9   6,40      30   2           1,12   50
    10   6         32   1,52,30     1,15   48
    12   5         36   1,40        1,20   45
    15   4         40   1,30        1,21   44,26,40

Dozens of tablets containing this information have been found, some of which go back as far as the
"Ur III dynasty" of about 2250 B.C. There are also many multiplication tables which list the
multiples of these numbers; for example, division by 81 = 1,21 is equivalent to multiplying by
44,26,40, and tables of 44,26,40 × k for 1 ≤ k ≤ 20 and k = 30, 40, 50 were commonplace. Over two
hundred examples of multiplication tables have been catalogued.

Babylonian "Programming"

The Babylonian mathematicians were not limited simply to the processes of addition, subtraction,
multiplication, and division; they were adept at solving many types of algebraic equations. But they
did not have an algebraic notation that is quite as transparent as ours; they represented each
formula by a step-by-step list of rules for its evaluation, i.e. by an algorithm for computing that
formula. In effect, they worked with a "machine language" representation of formulas instead of a
symbolic language.
The flavor of Babylonian mathematics can best be appreciated by studying several examples. The
translations below attempt to render the words of the original texts as faithfully as possible into
good English, without extensive editorial interpretation. Several remarks have been added in
parentheses, to explain some of the things that were originally unstated on the tablets. All numbers
are presented Babylonian-style, i.e. without exponents, so the reader is warned that he will have to
supply an appropriate scale factor in his head; thus, it is necessary to remember that 1 might mean
60 and 15 might mean ¼.
The first example we shall discuss is excerpted from an Old-Babylonian tablet which was originally
about 5 × 8 × 1 inches in size. Half of it now appears in the British Museum, about one-fourth appears
in the Staatliche Museen, Berlin, and the other fourth has apparently been lost or destroyed over the
years. The original text appears in [3, pp. 193-199; 4, Tables 7, 8, 39, 40; and 8, pp. 11-21].

    A (rectangular) cistern.
    The height is 3,20, and a volume of 27,46,40 has been excavated.
    The length exceeds the width by 50. (The object is to find the length and the width.)
    You should take the reciprocal of the height, 3,20, obtaining 18.
    Multiply this by the volume, 27,46,40, obtaining 8,20. (This is the length times the width; the
      problem has been reduced to finding x and y, given that x − y = 50 and xy = 8,20. A standard
      procedure for solving such equations, which occurs repeatedly in Babylonian manuscripts, is
      now used.)
    Take half of 50 and square it, obtaining 10,25.
    Add 8,20, and you get 8,30,25. (Remember that the radix point position always needs to be
      supplied. In this case, 50 stands for 5/6 and 8,20 stands for 8⅓, taking into account the
      sizes of typical cisterns!)
    The square root is 2,55.
    Make two copies of this, adding (25) to the one and subtracting from the other.
    You find that 3,20 (namely 3⅓) is the length and 2,30 (namely 2½) is the width.
    This is the procedure.

The first step here is to divide 27,46,40 by 3,20; this is reduced to multiplication by the
reciprocal. The multiplication was done by referring to tables, probably by manipulating stones or
sand in some manner and then writing down the answer. The square root was also computed by referring
to tables, since we know that many tables of n vs. n² existed. Note that the rule for computing the
values of x and y such that x − y = d and xy = p was to form

    $\sqrt{(d/2)^2 + p} \pm (d/2)$.

The calculations described in Babylonian tablets are not merely the solutions to specific individual
problems: they actually are general procedures for solving a whole class of problems. The numbers
shown are merely included as an aid to exposition, in order to clarify the general method. This fact
is clear because there are numerous instances where a particular case of the general method reduces
to multiplying by 1; such a multiplication is explicitly carried out, in order to abide by the
general rules. Note also the stereotyped ending, "This is the procedure," which is commonly found at
the end of each section on a tablet. Thus the Babylonian procedures are genuine algorithms, and we
can commend the Babylonians for developing a nice way to ex-

plain an algorithm by example as the algorithm itself was being defined.
Here is another excerpt from the same tablet, this time involving only a linear equation:

    A cistern.
    The length (in cubits) equals the height (in gars, where 1 gar = 12 cubits).
    A certain volume of dirt has been excavated.
    The cross-sectional area (in square cubits) plus this volume (in cubic cubits) comes to 1,10
      (namely 1⅙).
    The length is 30 (namely ½ cubit). What is the width?
    You should multiply the length, 30, by 12, obtaining 6; this is the height (in cubits instead of
      gars).
    Add 1 to 6, and you get 7.
    The reciprocal of 7 does not exist; what will give 1,10 when multiplied by 7? 10 will.
    (Hence 10, namely ⅙, is the cross-sectional area in square cubits.)
    Take the reciprocal of 30, obtaining 2.
    Multiply 10 by 2, obtaining the width, 20 (namely ⅓ cubit).
    This is the procedure.

Note the interesting way in which the Babylonians disregarded units, blithely adding area to volume;
similar examples abound, showing that the numerical algebra was of primary importance to them, not
the physical or geometrical significance of the problems. At the same time they used conventional
units of measure (cubits, even "gars" and the understood relation between gars and cubits), in order
to set the scale factors for the parameters. And they "applied" their results to practical things
like cisterns, perhaps because this would make their work appear to be socially relevant.
In this problem it was necessary to divide by 7, but the reciprocal of 7 didn't appear on the tables
because it has no finite reciprocal. (There is an infinite repeating expansion
1/7 = 8,34,17,8,34,17,..., but we have no evidence that the Babylonians knew this.) In such cases
where the reciprocal table was of no avail, the text always says, in effect, "What shall I multiply
by a in order to obtain b?" and then the answer is given. This wording indicates that a
multiplication table is to be used backwards; for example, the calculation of 11,40 ÷ 35 = 20
[3, p. 329] could be read off from a multiplication table. For more difficult divisions, e.g.
1,26,40 ÷ 43,20 = 15 [3, p. 246; 5, p. 8], a slightly different wording was used, indicating perhaps
that a special division procedure was employed in such cases. At any rate we know that the
Babylonians were able to compute

    7 ÷ 2,6;    28,20 ÷ 17;    10,12,45 ÷ 40,51;

and so on. One Old-Babylonian table of reciprocals is known that gives reciprocals of irregular
numbers to three sexagesimal places, but it is not extremely accurate [3, p. 16].

Further Examples

We have noted that general algorithms were usually given, accompanied by a sample calculation. In
rare instances such as the following text (again from the British Museum), the style is somewhat
different [5, p. 19]:

    The sum of length, width, and diagonal is 1,10 and 7 is the area.
    What are the corresponding length, width, and diagonal?
    The quantities are unknown.
    1,10 times 1,10 is 1,21,40.
    7 times 2 is 14.
    Take 14 from 1,21,40 and 1,7,40 remains.
    1,7,40 times 30 is 33,50.
    By what should 1,10 be multiplied to obtain 33,50?
    1,10 times 29 is 33,50.
    29 is the diagonal.

    The sum of length, width, and diagonal is 12 and 12 is the area.
    What are the corresponding length, width, and diagonal?
    The quantities are unknown.
    12 times 12 is 2,24.
    12 times 2 is 24.
    Take 24 from 2,24 and 2 remains.
    2 times 30 is 1.
    By what should 12 be multiplied to obtain 1?
    12 times 5 is 1.
    5 is the diagonal.

    The sum of length, width, and diagonal is 1 and 5 is the area.
    Multiply length, width, and diagonal times length, width, and diagonal.
    Multiply the area by 2.
    Subtract the products and multiply what is left by one-half.
    By what should the sum of length, width, and diagonal be multiplied to obtain this product?
    The diagonal is the factor.

This text comes from the considerably later "Seleucid" period of Babylonian history (see below),
which may account for the difference in style. It treats a problem based on the rather remarkable
formula

    $d = \tfrac{1}{2}\bigl((l + w + d)^2 - 2A\bigr)/(l + w + d)$,

where

    $A = lw$ is the area of a rectangle,
    $d = \sqrt{l^2 + w^2}$ is the length of its diagonals.

(There is ample evidence from other texts that the Old-Babylonian mathematicians knew the so-called
Pythagorean theorem, over 1000 years before the time of Pythagoras.) The first two sections quoted
above work out the problem for the cases (l, w, d) = (20, 21, 29) and (3, 4, 5) respectively, but
without calculating l and w; we know from other texts that the solution to x + y = a, x² + y² = b was
well known in ancient times. The description of the calculation in these two sections is unusually
terse, not naming the quantities it is dealing with. On the other hand, the third section gives the
same procedure entirely without numbers. The reason for this may be the fact that the stated
parameters 1 and 5 cannot possibly correspond to the length + width + diagonal and the area,
respectively, of any rectangle, no matter what powers of 60 are attached! Viewed in this light,
teachers of computer science will recognize that the above text reads very much like the solution to
an examination in which an impossible problem has been posed. (Note also that the second section
follows the

general procedure, as stated in the third section, very faithfully when it comes to dividing 1 by 12,
instead of using the reciprocal of 12.)
Instances of algorithms without accompanying numbers are very rare; here is another one, this time
an Old-Babylonian text from the Louvre [4, p. 39; 8, p. 71]:

    Length and width is to be equal to the area.
    You should proceed as follows.
    Make two copies of one parameter.
    Subtract 1.
    Form the reciprocal.
    Multiply by the parameter you copied.
    This gives the width.

In other words, if x + y = xy, it is possible to compute y by the procedure $y = (x - 1)^{-1} x$. The
fact that no numbers are given made this passage particularly hard to decipher, and it was not
properly understood for many years (see [9, pp. 73-74]); hence we can see the advantages of numerical
examples.
The above procedure reads surprisingly like a program for a "stack machine" like the Burroughs B5500!
Note that both in this example and in the very first one we discussed we are told to make two copies
of some number; this indicates that actual numerical calculations generally destroyed the operands in
the process of finding a result. Similarly we find in other texts the instruction to "Keep this
number in your head" [6, pp. 50-51], a remarkable parallelism with today's notion that a computer
stores numbers in its "memory." In another place we read, in essence, "Replace the sum of length and
width by 30 times itself" [3, p. 114], an ancient version of the assignment statement "x := x/2".

Conditionals and Iterations

So far we have seen only "straight-line" calculations, without any branching or decision-making
involved. In order to construct algorithms that are really nontrivial from a computer scientist's
point of view, we need to have some operations that affect the flow of control.
But alas, there is very little evidence of this in the Babylonian texts. The only thing resembling a
conditional branch is implicit in the operation of division, where the calculation proceeds a little
differently if the reciprocal of the divisor does not appear in the table.
We don't find tests like "Go to step 4 if x < 0", because the Babylonians didn't have negative
numbers; we don't even find conditional tests like "Go to step 5 if x = 0", because they didn't treat
zero as a number either! Instead of having such tests, there would effectively be separate algorithms
for the different cases. (For example, see [3, pp. 312-314] for a case in which one algorithm is
step-by-step the same as another, but simplified since one of the parameters is zero.)
Nor are there many instances of iteration. The basic operations underlying the multiplication of
high-precision sexagesimal numbers obviously involve iteration, and these operations were clearly
understood by the Babylonian mathematicians; but the rules were apparently never written down. No
examples showing intermediate steps in multiplication have been found.
The following interesting example dealing with compound interest, taken from the Berlin Museum
collection, is one of the few examples of a "DO I = 1 TO N" in the Babylonian tablets that have been
excavated so far [3, pp. 353-365; 4, Tables 32, 56, 57; 5, p. 59; 8, pp. 118-120]:

    I invested 1 maneh of silver, at a rate of 12 shekels per maneh (per year, with interest
      apparently compounded every five years).
    I received, as capital plus interest, 1 talent and 4 manehs. (Here 1 maneh = 60 shekels, and
      1 talent = 60 manehs.)
    How many years did this take?
    Let 1 be the initial capital.
    Let 1 maneh earn 12 (shekels) interest in a 6 (= 360) day year.
    And let 1,4 be the capital plus interest.
    Compute 12, the interest, per 1 unit of initial capital, giving 12 as the interest rate.
    Multiply 12 by 5 years, giving 1.
    Thus in five years the interest will equal the initial capital.
    Add 1, the five-year interest, to 1, the initial capital, obtaining 2.
    Form the reciprocal of 2, obtaining 30.
    Multiply 30 by 1,4, the sum of capital plus interest, obtaining 32.
    Find the inverse of 2, obtaining 1. (The "inverse" here means the logarithm to base 2; in other
      problems it stands for the value of n such that a given value f(n) appears in some table.)
    Form the reciprocal of 2, obtaining 30.
    Multiply 30 by 30 (the latter 30 apparently stands for 32 − 2, for otherwise the 32 would never
      be used and the rest of the calculation would make no sense), obtaining 15 (= total interest
      without initial capital if the investment had been cashed in five years earlier).
    Add 1 to 15, obtaining 16.
    Find the inverse of 16, obtaining 4.
    Add the two inverses 4 and 1, obtaining 5.
    Multiply 5 by 5 years, obtaining 25.
    Add another 5 years, making 30.
    Thus, after the 30th year the initial capital and its interest will be 1,4.
    . . . (Here about 4 lines of the text have broken off. Apparently there is now a question of
      checking the previous answer.)
    . . . giving 12 as the interest rate.
    Multiply 12 by 5 years, giving 1.
    Thus in five years the interest will equal the initial capital.
    Add 1, the five-year interest, to 1, the initial capital, obtaining 2, the capital and its
      interest after the fifth year.
    Add 5 years to the 5 years, obtaining 10 years.
    Double 2, the capital and its interest, obtaining 4, the capital and its interest after the
      tenth year.
    Add 5 years to the 10 years, obtaining 15 years.
    Double 4, the capital and its interest, obtaining 8, the capital and its interest after the
      fifteenth year.
    Add 5 years to the 15 years, obtaining 20 years.
    Double 8, obtaining 16, the capital and its interest after the twentieth year.
    Add 5 years to the 20 years, obtaining 25 years.
    Double 16, the capital and its interest, obtaining 32, the capital and its interest after the
      twenty-fifth year.
    Add 5 years to the 25 years, obtaining 30 years.
    Double 32, the capital and its interest, obtaining 1,4, the capital and its interest after the
      thirtieth year.

This long-winded and rather clumsy procedure reads almost like a macro expansion!

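The checking pass of the compound-interest tablet quoted above ("Add 5 years ... Double ...") can be
sketched as a short loop. This is only an illustration in modern Python, not a rendering of the
tablet's wording: the function names `to_sexagesimal` and `compound_in_periods` are hypothetical, and
the yearly rate of 12/60 with interest added once per five-year period is taken from the
parenthetical remarks in the translation.

```python
from fractions import Fraction


def to_sexagesimal(n):
    """Write a non-negative integer as comma-separated base-60 digits."""
    digits = []
    while n:
        n, d = divmod(n, 60)
        digits.append(str(d))
    return ",".join(reversed(digits)) or "0"


def compound_in_periods(target, yearly_rate=Fraction(12, 60), period_years=5):
    """Add one period's interest at a time until the capital reaches target.

    Assumption: interest is simple within a period and is added to the
    capital only at the end of each period, as the translation suggests.
    """
    capital, years, steps = Fraction(1), 0, []
    while capital < target:
        years += period_years
        capital += capital * yearly_rate * period_years  # doubles when rate * period = 1
        steps.append((years, capital))
    return steps


steps = compound_in_periods(64)  # 1,4 in sexagesimal is 64 manehs
for years, capital in steps:
    print(f"after year {years}: {to_sexagesimal(int(capital))}")
```

The loop visits the same intermediate values as the tablet (2, 4, 8, 16, 32 and finally 1,4), which
is why the written-out text reads like an unrolled version of it.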
A more sophisticated example involving compound interest appears in another section of the Louvre
tablet quoted earlier. The same usurious rate of interest (20 percent per annum) occurs, but now
compounded annually:

    One kur (of grain) has been invested; after how many years will the interest be equal to the
      initial capital?
    You should proceed as follows.
    Compound the interest for four years.
    The combined total (capital + interest) exceeds 2 kur.
    What can the excess of this total over the capital plus interest for three years be multiplied
      by in order to give the four-year total minus 2?
    2,33,20 (months).
    From four years, subtract 2,33,20 (months), to obtain the desired number of full years and days.

Translated into decimal notation, the problem is to determine how long it would take to double an
investment. Since

    $1.728 = 1.2^3 < 2 < 1.2^4 = 2.0736$,

the answer lies somewhere between three and four years. The growth is linear in any one year, so the
answer is

    $\dfrac{1.2^4 - 2}{1.2^4 - 1.2^3} \times 12 = 2 + \dfrac{33}{60} + \dfrac{20}{3600}$

months less than four years. This is exactly what was computed [5, p. 63].
Note that here we have a problem with a nontrivial iteration, like a "WHILE" clause: The procedure is
to form powers of 1 + r, where r is the interest rate, until finding the first value of n such that
$(1 + r)^n \ge 2$; then calculate

    $12\bigl((1 + r)^n - 2\bigr)/\bigl((1 + r)^n - (1 + r)^{n-1}\bigr)$,

and the answer is that the original investment will double in n years minus this many months.
This procedure suggests that the Babylonians were familiar with the idea of linear interpolation.
Therefore the trigonometric tables in the famous "Plimpton tablet" [6, p. 38-41] were possibly used
to obtain sines and cosines in a similar way.

The Seleucids

Old-Babylonian mathematics has several other interesting aspects, but a more elaborate discussion is
beyond the scope of this paper. Very few tablets have been found that were written after 1600 B.C.,
until approximately 300 B.C. when Mesopotamia became part of the empire of Alexander the Great's
successors, the "Seleucids." A great number of tablets from the Seleucid era have been found, mostly
dealing with astronomy which was highly developed. A very few pure mathematical texts of this era
have also been found; these tablets indicate that the Old-Babylonian mathematical tradition did not
die out during the intervening centuries.
Indeed, some noticeable progress was made; for example, a symbol for zero was now used within
numbers, instead of the blank space that formerly appeared. The following excerpts from a text in the
Louvre Museum [3, pp. 96-103; 8, p. 76] indicate some of the other advances:

    From 1 to 10, sum the powers (literally the "ladder") of 2.
    The last term you add is 8,32.
    Subtract 1 from 8,32, obtaining 8,31.
    Add 8,31 to 8,32, obtaining the answer 17,3.

    The squares from 1 × 1 = 1 to 10 × 10 = 1,40; what is their sum?
    Multiply 1 by 20, namely by one-third, giving 20.
    Multiply 10 by 40, namely by two thirds, giving 6,40.
    6,40 plus 20 is 7.
    Multiply 7 by 55 (which is the sum of 1 through 10), obtaining 6,25.
    6,25 is the desired sum.

Here we have correct formulas for the sum of a geometric series

    $\sum_{k=0}^{n} 2^k = 2^n + (2^n - 1)$

and for the sum of a quadratic series

    $\sum_{k=1}^{n} k^2 = \bigl(\tfrac{1}{3} + \tfrac{2}{3}n\bigr) \sum_{k=1}^{n} k$.

These formulas have not been found in Old-Babylonian texts.
Moreover, this same Seleucid tablet shows an increased virtuosity in calculation; for example, the
roots to complicated equations like

    xy = 1,    x + y = 2,0,0,33,20

(solution: x = 1,0,45 and y = 59,15,33,20) are computed. Perhaps this problem was designed to
demonstrate the use of the new zero symbol.
An extremely impressive example of the Seleucid era calculating ability appears in another Louvre
Museum tablet [3, pp. 14-22]. It is a 6-place table of reciprocals, which begins thus:

    By the power of Anu and Antum, whatever I have made with my hands, let it remain intact.
    Reciprocal 1 is 1
    Reciprocal 1,0,16,53,53,20     59,43,10,50,52,48
    Reciprocal 1,0,40,53,20        59,19,34,13,7,30
    Reciprocal 1,0,45              59,15,33,20

and so on, ending with

    2,57,8,49,12                   20,19,19,34,45,35,48,8,53,20
    Reciprocal 2,57,46,40          20,15
    2,59,21,40,48,54               20,4,16,22,28,44,14,57,40,4,56,17,46,40
    Reciprocal 3 is 20
    First part; results for 1 and 2, incomplete.
    Table of Nidintum-Anu, son of Inakibit-Anu, son of Kuzu, priests of Anu and Antum in Uruk.
      Author Inakibit-Anu.

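The Seleucid arithmetic quoted above (the two sums, and the pair x, y with xy = 1) can be checked
with exact rational arithmetic. The sketch below is only a modern verification aid: the helper
`sexagesimal_value` is a hypothetical name, and the position of the radix point in each number is
supplied by hand, as the floating-point notation requires.

```python
from fractions import Fraction


def sexagesimal_value(digits, first_place=0):
    """Exact value of a digit list whose first digit has weight 60**first_place."""
    return sum(Fraction(d) * Fraction(60) ** (first_place - i)
               for i, d in enumerate(digits))


# Ten powers of 2, from 1 up to 8,32 (= 512): last term plus (last term - 1).
geometric = sum(2 ** k for k in range(10))
assert geometric == 2 ** 9 + (2 ** 9 - 1) == sexagesimal_value([17, 3], 1)

# Squares of 1..10: (1/3 + 10 * 2/3) times the sum 1 + 2 + ... + 10 = 55.
squares = sum(k * k for k in range(1, 11))
assert squares == (Fraction(1, 3) + 10 * Fraction(2, 3)) * 55 == sexagesimal_value([6, 25], 1)

# The reciprocal pair: x = 1;0,45 and y = 0;59,15,33,20.
x = sexagesimal_value([1, 0, 45])
y = sexagesimal_value([59, 15, 33, 20], -1)
assert x * y == 1
assert x + y == sexagesimal_value([2, 0, 0, 33, 20])

print(geometric, squares, x, y)  # prints: 1023 385 81/80 80/81
```

Using `Fraction` rather than floating point keeps every check exact, so the assertions confirm the
tablet's digits rather than an approximation of them.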
Apparently Inakibit-Anu (whom we shall call Inakibit for short) was the author of this remarkable
table; and his son made a copy. Another tablet or tablets, now lost, continued the table to numbers
beginning with 3, 4, ....
There are exactly 231 sexagesimal numbers of six digits (i.e. six sexagesimal places) or less which
have a finite reciprocal and which begin with 1 or 2. This table contains every one of them, without
exception. And 20 further entries, giving reciprocals of numbers that have more than six digits, are
also included. It is not clear how these 20 extra numbers were selected. (See the Appendix to this
paper for further discussion.)
How did Inakibit prepare this table? The simplest procedure would be to start with the pair of
numbers (1, 1) and then to go repeatedly from (x, y) to (2x, 30y), (3x, 20y), and (5x, 12y) until no
more numbers x of six or less digits are possible. (In fact this procedure can be simplified further
if we note that only those values x of the form $2^i 3^j 5^k$ need to be considered where either
i ≤ 1 or j = 0 or k = 0; other numbers are multiples of 60.) There is some evidence that this is
exactly what he did; for example, several tables are known that start with some pair of reciprocals
and then repeatedly apply one of these three operations [6, p. 13-16]. An even more convincing
argument for this hypothesis is that Inakibit's values for $3^{-22}$ and $3^{-23}$ are both wrong;
and most of the errors in $3^{-23}$ are accounted for if we assume that he calculated $3^{-23}$ from
the incorrect value of $3^{-22}$.
The complete table requires that 721 pairs (x, y) must be generated, and of course it is very
laborious to work with such high-precision numbers. Moreover, even after all these pairs (x, y) have
been computed, the work is far from done; it is still necessary to sort them into order! Inakibit's
table is the earliest known example of a large file that has been sorted; and this is one of the
reasons his work is so impressive, as anyone who has tried to sort over 700 cards by hand will
attest. To get some idea of the immensity of this task, consider that it takes many hours to sort 700
large numbers by hand nowadays; imagine how difficult it must have been to do this job in ancient
times! Yet Inakibit must have done it, since there is no simple way to generate such a table in
order. (As we might expect, he made a few mistakes; there are three pairs of lines which should be
interchanged to bring the table into perfect order.)
Thus, Inakibit seems to have the distinction of being the first man in history to solve a
computational problem that takes longer than one second of time on a modern electronic computer!

Suggestions for Further Reading

If you have been captivated by Babylonian mathematics, there are several good books on the subject
which give further interesting details. The short introductory text Episodes from the Early History
of Mathematics by A.A. Aaboe [1] can be recommended, as can B.L. van der Waerden's well-known
treatise Science Awakening [9]. Much of the deciphering of Babylonian mathematical texts was
originally due to Otto Neugebauer, who has written an authoritative popular account The Exact
Sciences in Antiquity [7]; see especially his fascinating discussion, pp. 59-63; 103-105, of the
problems that plague historical researchers in this field.
For more detailed study, it is fun to read the original source material. Neugebauer published the
texts of all known mathematical tablets, together with German translations, in a comprehensive series
of three volumes, during the period 1935-1937 [3, 4, 5]. A French edition of the texts [8] was
published in 1938 by F. Thureau-Dangin, an eminent Assyriologist. Then in 1945, Neugebauer and
A. Sachs published a supplementary volume [6], which includes all mathematical tablets discovered in
the meantime (mostly in American museums). The Neugebauer-Sachs volume is written in English, but
unfortunately these tablets are not quite as interesting as the ones in Neugebauer's original German
series. A list of developments since 1945 appears in [7, p. 49].
Most of the Babylonian mathematical tablets have never been translated into English. The translations
above have been made by comparing the German of [3, 4, 5] with the French [8]; but these two versions
actually differ in many details, so the Akkadian and Sumerian vocabularies published in [4, 8, 6]
have been consulted in an attempt to give an accurate rendition.
Since only a tiny fraction of the total number of clay tablets has survived the centuries, it is
obvious that we cannot pretend to understand the full extent of Babylonian mathematics. Neugebauer
points out that the job of discovering what they knew is something like trying to reconstruct all of
modern mathematics from a few pages that have been randomly torn out of the books in a modern
library. We can only place "lower bounds" on the scope of Babylonian achievements, and speculate
about what they did not know.
What about other ancient developments? The Egyptians were not bad at mathematics, and archeologists
have dug up some old papyri that are almost as old as the Babylonian tablets we have discussed. The
Egyptian method of multiplication, based essentially on the binary number system (although their
calculations were decimal, using something like Roman numerals), is especially interesting; but in
other respects, their use of awkward "unit fractions" left them far behind the Babylonians. Then came
the Greeks, with an emphasis on geometry but also on such things as Euclid's algorithm; the latter is
the oldest nontrivial algorithm which still is important to computer programmers. (See [7, 9] for the
history of Egyptian mathematics, and [1, 7, 9] for Greek mathematics. A free translation of Euclid's
algorithm in his own words, together with his incomplete proof of its correctness, appears in
[2, p. 294-296].) And then there are the Indians, and the Chinese; it is clear that much more can be
told.

Acknowledgment. I wish to thank Professor Abra-
ham Seidenberg for his courtesy in helping me obtain
copies of [3] and [8] when I needed them.

Appendix
The 20 additional entries included in Inakibit's table are somewhat mysterious. In 19 of the cases,
the number has a reciprocal with six digits or less; the exception is $3^{23}$ = 2,1,4,8,3,0,27,
whose reciprocal has 17 sexagesimal digits.
Let's say that a sexagesimal number is a Q-number if it has six or less digits, while its reciprocal
is finite and has more than six digits and begins with 1 or 2. There are 132 Q-numbers in all, only
19 of which appear in Inakibit's table. Five of these are $2^{17}$, $2^{23}$, $3^{11}$, 3TM, and
$3^{22}$; they constitute all Q-numbers of the forms $2^n$, $3^n$, or $5^n$, and it is likely that
such numbers appeared in special tables. However, the Q-number $6^{11}$ is not included, so it is not
simply a matter of perfect powers being included. The three-digit Q-numbers $2^1 3^{10}$ and
$2^2 3^9$ are excluded, so it is not a matter of including the smallest cases. The Q-numbers which do
appear, besides the five listed above, are $3^9 5^1$, $3^{10} 5^3$, $3^{11} 5^5$; $2^1 3^9 5^1$,
$2^1 3^{11} 5^2$, $2^1 3^{13} 5^3$ (but not $2^1 3^{15} 5^4$); 31851, $2^3 3^9$, $2^7 3^{10}$,
$2^{12} 3^{11}$, 2183TM, 2203~, 29259, 2'2452. It is perhaps noteworthy that 31153 does not appear,
but its multiple 3u5 ~ does.
Since so many Q-numbers are missing, we may conclude that Inakibit continued his table by giving the
reciprocals of all six-digit numbers up to 59,43,10,50,52,48, not taking advantage of symmetry. Hence
the complete table contained the reciprocals of at least 721 six-digit numbers, and it probably
filled three clay tablets in all.

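The generation procedure conjectured for Inakibit's table (start from the pair (1, 1), repeatedly
go to (2x, 30y), (3x, 20y), (5x, 12y), then sort) can be sketched as follows. This is a modern
illustration under stated assumptions, not a reconstruction of the scribe's working method: numbers
are held as integer mantissas with trailing sexagesimal zeros dropped, and the names `normalize`,
`digits`, `reciprocal_pairs`, and `scaled` are hypothetical helpers introduced here.

```python
from fractions import Fraction


def normalize(n):
    """Drop trailing sexagesimal zeros; the notation carries no exponent part."""
    while n % 60 == 0:
        n //= 60
    return n


def digits(n):
    """Sexagesimal digits of a positive integer, most significant first."""
    out = []
    while n:
        n, d = divmod(n, 60)
        out.append(d)
    return out[::-1]


def reciprocal_pairs(max_digits=6):
    """Map each regular mantissa x of at most max_digits digits to its reciprocal mantissa."""
    limit = 60 ** max_digits
    table = {1: 1}
    pending = [1]
    while pending:
        x = pending.pop()
        y = table[x]
        for a, b in ((2, 30), (3, 20), (5, 12)):
            nx, ny = normalize(x * a), normalize(y * b)
            if nx < limit and nx not in table:
                table[nx] = ny
                pending.append(nx)
    return table


def scaled(n):
    """Value of n read as a floating-point number in the range [1, 60)."""
    return Fraction(n, 60 ** (len(digits(n)) - 1))


table = reciprocal_pairs()
# Keep the entries that begin with 1 or 2, then sort them as the tablet does.
first_part = sorted((x for x in table if scaled(x) < 3), key=scaled)
for x in first_part[:4]:
    print(",".join(map(str, digits(x))), "  ", ",".join(map(str, digits(table[x]))))
```

The sort is a separate final step, which mirrors the observation in the text that the pairs cannot
simply be generated in order. The sketch covers only the numbers of six or less digits; it does not
attempt to reproduce the 20 additional entries discussed in this Appendix.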
References


1. Aaboe, Asger A. Episodes from the Early History of Mathematics. Random House, New York, 1964,
   133 pp.
2. Knuth, Donald E. Seminumerical Algorithms. Addison-Wesley, Reading, Mass., 1971 (second
   printing), 624 pp.
3. Neugebauer, O. Mathematische Keilschrift-Texte. In Quellen und Studien zur Geschichte der
   Mathematik, Astronomie, und Physik, Vol. A3, Pt. 1, 1935, 516 pp.
4. Neugebauer, O. Mathematische Keilschrift-Texte. In Quellen und Studien zur Geschichte der
   Mathematik, Astronomie, und Physik, Vol. A3, Pt. 2, 1935, 64 pp. plus 69 reproductions of
   tablets.
5. Neugebauer, O. Mathematische Keilschrift-Texte. In Quellen und Studien zur Geschichte der
   Mathematik, Astronomie, und Physik, Vol. A3, Pt. 3, 1937, 83 pp. plus 6 reproductions of
   tablets.
6. Neugebauer, O., and Sachs, A. Mathematical Cuneiform Texts. American Oriental Society, New
   Haven, Conn., 1945, 177 pp. plus 49 reproductions of tablets.
7. Neugebauer, O. The Exact Sciences in Antiquity. Brown U. Press, Providence, R.I., 1957 (second
   ed.), 240 pp. plus 14 photographic plates.
8. Thureau-Dangin, F. Textes Mathématiques Babyloniens. E.J. Brill, Leiden, The Netherlands, 1938,
   243 pp.
9. van der Waerden, B.L. Science Awakening. Tr. by Arnold Dresden. P. Noordhoff, Groningen, The
   Netherlands, 1954, 306 pp.

