The Collected Works of Professor Donald Knuth
Donald Ervin Knuth is an American computer scientist, mathematician, and professor emeritus at Stanford
University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer
science. He is the author of the multi-volume work The Art of Computer Programming.
Semantics of Context-Free Languages
by
DONALD E. KNUTH
California Institute of Technology
ABSTRACT
MATHEMATICAL SYSTEMS THEORY, Vol. 2, No. 2
Published by Springer-Verlag New York Inc.
the following context-free grammar:
$$
\begin{aligned}
B &\to 0\\
B &\to 1\\
L &\to B\\
L &\to LB\\
N &\to L\\
N &\to L\,.\,L
\end{aligned}
\tag{1.1}
$$
(Here the terminal symbols are ., 0, and 1; the nonterminal symbols are
B, L, and N, standing respectively for bit, list of bits, and number; and a
binary number is intended to be any string of terminal symbols which can
be obtained from N by application of the above productions.) This grammar
says in effect that a binary number is a sequence of one or more 0's
and 1's, optionally followed by a radix point and another sequence of one
or more 0's and 1's. Furthermore, the grammar assigns a certain tree structure
to each binary number; for example, the string 1101.01 receives the
following structure:
              N
           /  |  \
        L     .     L
       / \         / \
      L   B       L   B
     / \  |       |   |
    L   B 1       B   1        (1.2)
   / \  |         |
  L   B 0         0
  |   |
  B   1
  |
  1
$$
\begin{array}{ll}
B \to 0 & v(B) = 0\\
B \to 1 & v(B) = 1\\
L \to B & v(L) = v(B),\quad l(L) = 1\\
L_1 \to L_2 B & v(L_1) = 2v(L_2) + v(B),\quad l(L_1) = l(L_2) + 1\\
N \to L & v(N) = v(L)\\
N \to L_1\,.\,L_2 & v(N) = v(L_1) + v(L_2)/2^{l(L_2)}
\end{array}
\tag{1.3}
$$
(In the fourth and sixth rules subscripts have been used to distinguish
between occurrences of like nonterminals.) Here the semantic rules define
all of the attributes of a nonterminal in terms of the attributes of its
immediate descendants, so ultimately values are defined for each attribute.
The semantic rules are phrased in terms of notations which are assumed to
be already understood. Notice for example that the symbol "0" in the
semantic rule "v(B) = 0" is to be interpreted quite differently from the
symbol "0" in the production "B → 0"; the former denotes a mathematical
concept, the integer zero, while the latter denotes a written character which
has a certain elliptical shape. In a sense it is just coincidence that the two
symbols look the same.
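The rules of (1.3) use only synthesized attributes, so a single bottom-up recursion evaluates them. The following is a minimal Python sketch, not part of the paper: the function names are hypothetical, the parse is the left-recursive one of (1.1), and `fractions.Fraction` keeps the quotient v(L_2)/2^l(L_2) exact.

```python
from fractions import Fraction


def eval_B(bit: str) -> int:
    # B -> 0: v(B) = 0;  B -> 1: v(B) = 1
    return {"0": 0, "1": 1}[bit]


def eval_L(bits: str) -> tuple[int, int]:
    """Return (v(L), l(L)), parsing left-recursively as in (1.1)."""
    if len(bits) == 1:
        return eval_B(bits), 1                  # L -> B
    v2, l2 = eval_L(bits[:-1])                  # L1 -> L2 B
    return 2 * v2 + eval_B(bits[-1]), l2 + 1


def eval_N(text: str) -> Fraction:
    if "." not in text:
        return Fraction(eval_L(text)[0])        # N -> L
    left, right = text.split(".")
    v1, _ = eval_L(left)
    v2, l2 = eval_L(right)
    return v1 + Fraction(v2, 2 ** l2)           # N -> L1 . L2


print(eval_N("1101.01"))  # 53/4, i.e. 13.25
```

Each attribute of a node is computed only from its children's attributes, which is exactly why one bottom-up pass suffices here.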
The structure (1.2) may be augmented by showing the attributes at
each level:
N(v = 13.25)                                          (1.4)
  L(v = 13, l = 4)
    L(v = 6, l = 3)
      L(v = 3, l = 2)
        L(v = 1, l = 1)
          B(v = 1)
            1
        B(v = 1)
          1
      B(v = 0)
        0
    B(v = 1)
      1
  .
  L(v = 1, l = 2)
    L(v = 0, l = 1)
      B(v = 0)
        0
    B(v = 1)
      1
Thus "1101.01" means 13.25 (in decimal notation).
This manner of defining semantics for context-free languages is es-
sentially well known, since it has already been used by several authors. But
there is an important way to extend this method, and it is this extension
which will be of primary interest to us.
Suppose for example that we want to define the semantics of binary
notation in a different way corresponding more closely to the manner in
which we usually think of the notation. The leading "1" in "1101.01"
really denotes 8, although according to (1.4) it is ascribed the value 1.
Perhaps therefore it would be better to define the semantics in such a way that
(Here the semantic rules are listed using the convention that the right-hand
side of each equation is the definition of the left-hand side; thus,
"s(B) = s(L)" says that s(L) is to be evaluated first, then s(B) is defined to
have this same value.)
The important feature of grammar (1.5) is that some of the attributes
are defined for nonterminals which appear on the right side of the
corresponding production, while in (1.3) all attributes were defined when the
nonterminal appeared on the left side. Here we are using both synthesized
attributes (which are based on the attributes of the descendants of the
nonterminal symbol) and inherited attributes (which are based on the attributes
of the ancestors). Synthesized attributes are evaluated from the bottom up
in the tree structure, while inherited attributes are evaluated from the top
down. Grammar (1.5) contains the synthesized attributes v(B), v(L), l(L),
v(N) and also the inherited attributes s(B) and s(L), so the evaluation
involves going in both directions. The evaluated structure corresponding to
the string 1101.01 is
N(v = 13.25)
  L(v = 13, l = 4, s = 0)
    L(v = 12, l = 3, s = 1)
      L(v = 12, l = 2, s = 2)
        L(v = 8, l = 1, s = 3)
          B(v = 8, s = 3)
            1
        B(v = 4, s = 2)
          1
      B(v = 0, s = 1)
        0
    B(v = 1, s = 0)
      1
  .
  L(v = .25, l = 2, s = -2)
    L(v = 0, l = 1, s = -1)
      B(v = 0, s = -1)
        0
    B(v = .25, s = -2)
      1
Here it can be noted that the "length" attributes of the L's to the right of
the radix point must be evaluated from the bottom up before the "scale"
attributes can be evaluated (from the top down) and finally the "value"
attributes (from the bottom up).
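The three passes just described (lengths up, scales down, values up) can be sketched in Python. The rules of grammar (1.5) are not reproduced in this excerpt, so the sketch assumes rules consistent with the evaluated tree above: the integer part has scale 0, the fractional list has scale $-l(L_2)$, inside $L_1 \to L_2 B$ we take $s(L_2) = s(L_1) + 1$ and $s(B) = s(L_1)$, a 1 bit has $v(B) = 2^{s(B)}$ and a 0 bit has value 0, and values are summed upward. Class and function names are hypothetical.

```python
from fractions import Fraction


class ListNode:
    """An L node: L -> B when child is None, otherwise L1 -> L2 B.

    bit is the terminal ("0" or "1") under this node's own B.
    """

    def __init__(self, child, bit):
        self.child, self.bit = child, bit
        self.l = self.s = self.v = None


def build(bits):
    # Left-recursive parse as in (1.1): each new bit is the B of L1 -> L2 B.
    node = None
    for b in bits:
        node = ListNode(node, b)
    return node


def lengths(node):
    """Synthesized l(L), bottom-up."""
    node.l = 1 if node.child is None else lengths(node.child) + 1
    return node.l


def scales(node, s):
    """Inherited s(L), top-down; the node's own B gets s(B) = s(L)."""
    node.s = s
    if node.child is not None:
        scales(node.child, s + 1)  # s(L2) = s(L1) + 1


def values(node):
    """Synthesized v(L), bottom-up; v(B) = 2**s(B) for a 1 bit, else 0."""
    vb = Fraction(2) ** node.s if node.bit == "1" else Fraction(0)
    node.v = vb if node.child is None else values(node.child) + vb
    return node.v


def eval_N(text):
    left, _, right = text.partition(".")
    L1 = build(left)
    scales(L1, 0)                      # the integer part starts at scale 0
    total = values(L1)
    if right:
        L2 = build(right)
        scales(L2, -lengths(L2))       # length first, then scale, then value
        total += values(L2)
    return total


print(eval_N("1101.01"))  # 53/4, i.e. 13.25
```

The ordering inside the `if right:` branch is the point of the example: the inherited scale of the fractional list cannot be assigned until its synthesized length is known.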
Grammar (1.5) is probably not the "best possible" grammar for binary
notation, but it does seem to correspond better to our intuition than gram-
mar (1.3). (A grammar which agrees more exactly with our conventional
understanding of binary notation could be based on a different set of pro-
duction rules which would assign another structure to the string of bits at
the right of the radix point; then the "length" attribute, which is not really
relevant, would be unnecessary.)
Our interest in grammar (1.5) is not that it is an ideal definition of
binary notation, but rather that it shows an interaction between inherited
and synthesized attributes. It is not always obvious when semantic rules
such as those in (1.5) do not amount to a circular definition, because the
attributes are not evaluated in a single direction; an algorithm which tests
for circularity appears later in this paper.
The importance of inherited attributes is that they arise naturally in
practice and that they are "dual" to synthesized attributes in a straight-
forward manner. Although binary notation can be formulated using
nothing but synthesized attributes, there are many languages for which
such a restriction leads to a very awkward and unnatural definition of
semantics. Situations which involve a mixture of inherited and synthesized
attributes are essentially the same as the cases which have been most dif-
ficult to handle in previous formulations of semantic rules.
2. Formal properties. Let us now put the ideas of synthesized and in-
herited attributes into a more precise and more general setting.
Suppose we have a context-free grammar $G = (V, N, S, \mathcal{P})$, where $V$ is
the (finite) vocabulary of terminal and nonterminal symbols; $N \subseteq V$ is the
set of nonterminal symbols; $S \in N$ is the "start symbol", which appears on
the right-hand side of no production rule; and $\mathcal{P}$ is the set of production
rules. Semantic rules are added to $G$ in the following manner: To each
symbol $X \in V$ we associate a finite set $A(X)$ of attributes; $A(X)$ is partitioned
into two disjoint sets, the synthesized attributes $A_0(X)$ and the inherited
attributes $A_1(X)$. We require $A_1(S)$ to be empty (i.e., the start symbol $S$ has
no inherited attributes); similarly we require $A_0(X)$ to be empty if $X$ is a
terminal symbol. Each attribute $\alpha$ in $A(X)$ has a (possibly infinite) set of
possible values $V_\alpha$, from which one value will be selected (by means of the
semantic rules) for each appearance of $X$ in a derivation tree.
Let $\mathcal{P}$ consist of $m$ productions, and let the $p$-th production be
$$X_{p0} \to X_{p1} X_{p2} \cdots X_{pn_p}. \tag{2.2}$$
[Figure (3.2): directed dependency graphs, including $D_4$, whose vertices are attribute occurrences such as v(B), s(B), v(L_2), l(L_2), s(L_2).]
              X_{p0}
             /      \
        𝒯_1    ...    𝒯_{n_p}                 (3.3)
for some $p$, where $\mathcal{T}_j$ is a derivation tree with $X_{pj}$ as the label of the root,
for $1 \le j \le n_p$. In the former case we will say $\mathcal{T}$ is a derivation tree of type
0, and in the latter case we will say $\mathcal{T}$ is a derivation tree of type $p$; according
to the definition, $D(\mathcal{T})$ is obtained in this case from $D_p, D(\mathcal{T}_1), \ldots,
D(\mathcal{T}_{n_p})$ by identifying the vertices for attributes of $X_{pj}$ with the corresponding
vertices for the attributes of the root of $\mathcal{T}_j$ in $D(\mathcal{T}_j)$, $1 \le j \le n_p$.
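The gluing that produces $D(\mathcal{T})$ can be sketched in Python (3.9 or later, for the standard `graphlib` module). This is a minimal sketch for one given tree, not the grammar-wide circularity test developed in this paper. The arc lists are an assumption: the rules of (1.5) are not reproduced in this excerpt, so the arcs are inferred from the attribute values in the evaluated tree for 1101.01. Production labels and helper names are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

# Assumed local dependency graphs D_p. An arc ((i, a), (j, b)) means attribute b
# at position j is computed from attribute a at position i. Position 0 is the
# left-hand side; 1, 2, ... are the right-hand-side nonterminals (the terminal
# "." carries no attributes and is omitted).
RULES_15 = {
    "B->0": [],
    "B->1": [((0, "s"), (0, "v"))],
    "L->B": [((1, "v"), (0, "v")), ((0, "s"), (1, "s"))],
    "L->LB": [((1, "v"), (0, "v")), ((2, "v"), (0, "v")), ((1, "l"), (0, "l")),
              ((0, "s"), (1, "s")), ((0, "s"), (2, "s"))],
    "N->L": [((1, "v"), (0, "v"))],
    "N->L.L": [((1, "v"), (0, "v")), ((2, "v"), (0, "v")), ((2, "l"), (2, "s"))],
}


class Node:
    """A derivation-tree node labeled with the production applied at it."""

    def __init__(self, prod, *children):
        self.prod, self.children = prod, children


def dependency_graph(root, rules):
    """D(T): one copy of D_p per node, with the vertices for the attributes of
    X_pj identified with those of the root of the j-th subtree."""
    preds = {}  # vertex (node id, attribute) -> vertices it depends on
    stack = [root]
    while stack:
        node = stack.pop()
        places = (node,) + node.children
        for (i, a), (j, b) in rules[node.prod]:
            u, w = (id(places[i]), a), (id(places[j]), b)
            preds.setdefault(u, set())
            preds.setdefault(w, set()).add(u)
        stack.extend(node.children)
    return preds


def evaluation_order(root, rules):
    """A valid order for computing every attribute, or None if D(T) has a cycle."""
    try:
        return list(TopologicalSorter(dependency_graph(root, rules)).static_order())
    except CycleError:
        return None


def bits(text):
    """Left-recursive L subtree for a string of bits, as in (1.2)."""
    node = Node("L->B", Node("B->" + text[0]))
    for b in text[1:]:
        node = Node("L->LB", node, Node("B->" + b))
    return node


tree = Node("N->L.L", bits("1101"), bits("01"))
print(evaluation_order(tree, RULES_15) is not None)  # True: D(T) is acyclic

# A deliberately bad, hypothetical rule making l(L) depend on s(L) in L -> B
# closes the loop s(L2) -> ... -> l(L2) -> s(L2) in the fractional part.
bad = {**RULES_15, "L->B": RULES_15["L->B"] + [((0, "s"), (0, "l"))]}
print(evaluation_order(tree, bad))  # None
```

A topological order of $D(\mathcal{T})$, when one exists, is an order in which every attribute of the tree can be computed; a cycle means the semantic rules are circular for that tree.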
$$D_p[G_1, \ldots, G_{n_p}] \tag{3.4}$$
be the directed graph obtained from $D_p$ by adding an arc from $(X_{pj}, \alpha)$ to
$(X_{pj}, \alpha')$ whenever there is an arc from $\alpha$ to $\alpha'$ in $G_j$. For example, if we have
$G_1$ = [graph on the vertices v, l, s],  $G_2$ = [graph on the vertices v, s],
and if $D_4$ is the directed graph appearing in (3.2), then $D_4[G_1, G_2]$ is
there should be an arc from $\alpha$ to $\alpha'$ in $D'_p$." It is clear that this process must
ultimately terminate with no more arcs added, since only finitely many arcs
are possible in all.
In the case of grammar (1.5), this algorithm begins with
[Figure: the initial graphs $D'_4$, $D'_5$, $D'_6$, on vertices for the attributes v, l, s.]
and adds arcs until finally we have
[Figure: the completed graphs.]
(It is hoped that the reader will find this programming language sufficiently
self-explanatory that he understands it before any formal definition of the
language is given, although of course this is not necessary. The above pro-
gram is not intended as an example of good programming, rather as an ex-
ample of the features of the simple language considered in this section.)
Since every programming language must have a name, let us call the
language Turingol. Any well-formed Turingol program defines a program
for a Turing machine; let us say a Turing machine program consists of
a set $Q$ of "states";
a set $\Sigma$ of "symbols";
an "initial state" $q_0 \in Q$;
a "final state" $q_\infty \in Q$;
$$\ldots, a_{-3}, a_{-2}, a_{-1}, a_0, a_1, a_2, a_3, \ldots \tag{4.2}$$
Start symbol: p
Attributes:
Name of attribute   Type of value                                              Purpose
$Q$                 Set                                                        States of the program
$\Sigma$            Set                                                        Symbols of the program
$q_0$               Element of $Q$                                             Initial state
$q_\infty$          Element of $Q$                                             Final state
$\delta$            Function from $(Q - q_\infty) \times \Sigma$ into          Transition function
                    $\Sigma \times \{-1, 0, +1\} \times Q$
label               Function from strings of letters into elements of $Q$      State table for statement labels
symbol              Function from strings of letters into elements of $\Sigma$ Symbol table to tape symbols
follow              Element of $Q$                                             State immediately following statement or list of statements
d                   $\pm 1$                                                    Direction
text                String of letters                                          Identifier
start               Element of $Q$                                             State at the beginning of a statement or list of statements (an inherited attribute)
Productions and semantics: See Table 1.
Notice that two states correspond to each statement S: start (S) is the
state corresponding to the first instruction of the statement (if any), and it
is an inherited attribute of S; follow (S) is the state which "follows" the state-
ment, the state which is normally reached after the statement is executed.
In the case of a "go statement", however, the program does not transfer to
follow (S), since the action of the statement is to change control to another
place; follow (S) may be said to follow statement S "statically" or "textually",
not "dynamically" during a run of the program.
In Table 1, follow (S) is a synthesized attribute; it is possible to give
similar semantic rules in which follow (S) is inherited, although a less effi-
cient program would be obtained for null statements (see Rule 4.4). Simi-
larly, both start (S) and follow (S) could be synthesized attributes, but at the
expense of additional instructions in the Turing machine program for
statement lists (Rule 6.2).
This example would be somewhat simpler if we had used a less standard
definition of Turing machine instructions. The definition we have used
requires reading, printing, and shifting in each instruction, and also makes
the Turing machine into a kind of "one-plus-one-address computer" in
which each instruction specifies the location (state) of the next instruction.
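To make this instruction format concrete, here is a minimal Python simulator for a Turing machine program of the kind just defined: each instruction reads a symbol, prints one, shifts by $-1$, $0$, or $+1$, and names its successor state, until $q_\infty$ is reached. It is a sketch, not Turingol or its translation; the blank symbol, the state names, and the three-instruction example program are hypothetical.

```python
from collections import defaultdict

BLANK = " "  # hypothetical blank tape symbol; the text does not name one


def run(delta, q0, q_final, tape_input, max_steps=10_000):
    """Run a program given as a transition function and return the tape.

    delta maps (state, symbol) -> (symbol to print, shift, next state), with
    shift in {-1, 0, +1}. The two-way infinite tape ..., a_-1, a_0, a_1, ...
    is a dict indexed by integers, blank wherever nothing has been written.
    Returns the stretch of tape between the outermost non-blank cells.
    """
    tape = defaultdict(lambda: BLANK, enumerate(tape_input))
    q, pos, steps = q0, 0, 0
    while q != q_final:
        if steps == max_steps:
            raise RuntimeError("final state not reached within max_steps")
        printed, shift, q = delta[(q, tape[pos])]  # read, then choose action
        tape[pos] = printed                        # print
        pos += shift                               # shift
        steps += 1
    used = [i for i, a in tape.items() if a != BLANK]
    if not used:
        return ""
    return "".join(tape[i] for i in range(min(used), max(used) + 1))


# Hypothetical program: complement a string of bits, then halt on a blank.
flip = {
    ("q0", "0"): ("1", +1, "q0"),
    ("q0", "1"): ("0", +1, "q0"),
    ("q0", BLANK): (BLANK, 0, "q_inf"),
}
print(run(flip, "q0", "q_inf", "0110"))  # 1001
```

Because every instruction names its next state explicitly, the dictionary `flip` is the whole program; there is no implicit "next instruction" as in a conventional machine.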
Table 1.
The output of this program is 1, 2, 2, 3! Oversights such as this are not un-
expected when an algorithmic definition of a language is constructed; they
are less likely to occur when the methods of Section 4 are employed.
It appears to be reasonable to assert that none of the previous schemes
for formal definition of semantics could produce a definition of Turingol
that is as brief or as easy to comprehend as the definition given above; and
(although the details have not of course been worked out) it also appears
that ALGOL 60, Euler, Micro-ALGOL, and PL/I can be defined using the
methods of Section 4 in a manner which has advantages over the defini-
tions previously given. But of course the author cannot judge these things
impartially, and more experience is needed before these claims can be
substantiated.
Notice that semantic rules as given in this paper do not depend on any
particular form of syntactic analysis. In fact, they need not even be tied
down to specific forms of the syntax: All that the semantic rules depend on
is the name of the nonterminal symbol on the left of a production and the
names of the nonterminals on the right. Particular punctuation marks, and
the order in which the nonterminals appear on the right-hand side of any
production, are immaterial as far as the semantic rules are concerned.
Thus, the method of semantics considered here blends well with
McCarthy's idea [12, 13] of "abstract syntax".
When a syntax is ambiguous, in the sense that some strings of the
language have more than one derivation tree, the semantic rules give us
one "meaning" for each derivation tree. For example, suppose the rules
$$L_1 \to B L_2 \qquad v(L_1) = 2^{l(L_2)} v(B) + v(L_2),\quad l(L_1) = l(L_2) + 1$$
are added to grammar (1.3). Then the grammar becomes syntactically
ambiguous; but it still is semantically unambiguous since the attribute
v(N) has the same value over all derivation trees. On the other hand, if we
were to change production 5.2 of Turingol from S → I: S to S → S: I, the
grammar would become syntactically and semantically ambiguous.
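The claim that the extended grammar is syntactically ambiguous but semantically unambiguous can be checked by brute force. This Python sketch (helper names are hypothetical) enumerates every derivation tree of a bit list once $L_1 \to B L_2$ is allowed alongside $L \to B$ and $L_1 \to L_2 B$, applies the corresponding semantic rules on each tree, and collects the distinct values of v(N).

```python
from fractions import Fraction


def all_values(bits):
    """Yield (v(L), l(L)) once per derivation tree of `bits` when
    L1 -> B L2 is allowed alongside L -> B and L1 -> L2 B."""
    if len(bits) == 1:
        yield int(bits), 1                               # L -> B
        return
    for v2, l2 in all_values(bits[:-1]):                 # L1 -> L2 B
        yield 2 * v2 + int(bits[-1]), l2 + 1
    for v2, l2 in all_values(bits[1:]):                  # L1 -> B L2
        yield 2 ** l2 * int(bits[0]) + v2, l2 + 1


def all_N_values(text):
    """The set of values v(N) over all derivation trees of a binary number."""
    left, _, right = text.partition(".")
    lefts = {v for v, _ in all_values(left)}
    if not right:
        return {Fraction(v) for v in lefts}
    rights = {Fraction(v, 2 ** l) for v, l in all_values(right)}
    return {a + b for a in lefts for b in rights}


print(sum(1 for _ in all_values("1101")))  # 8 derivation trees for 1101
print(all_N_values("1101.01"))             # {Fraction(53, 4)}
```

Several trees per string (syntactic ambiguity) but a single value of v(N) (semantic unambiguity) is exactly the situation the text describes.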
REFERENCES
[7] EDGAR T. IRONS, Towards more versatile mechanical translators, Proc. Sympos. Appl. Math., Vol. 15, pp. 41-50, Amer. Math. Soc., Providence, R.I., 1963.
[8] DONALD E. KNUTH, The Art of Computer Programming, Vol. I, Addison-Wesley, 1968.
[9] P. J. LANDIN, The mechanical evaluation of expressions, Comp. J. 6 (1964), 308-320.
[10] P. J. LANDIN, A formal description of ALGOL 60, Formal Language Description Languages for Computer Programming, pp. 266-294, Proc. IFIP Working Conf., Vienna (1964), North Holland, 1966.
[11] P. J. LANDIN, A correspondence between ALGOL 60 and Church's lambda notation, Comm. ACM 8 (1965), 89-101, 158-165.
[12] JOHN MCCARTHY, A formal definition of a subset of ALGOL, Formal Language Description Languages for Computer Programming, pp. 1-12, Proc. IFIP Working Conf., Vienna (1964), North Holland, 1966.
[13] JOHN MCCARTHY and JAMES PAINTER, Correctness of a compiler for arithmetic expressions, Proc. Sympos. Appl. Math., Vol. 17, to appear, Amer. Math. Soc., Providence, R.I., 1967.
[14] ROBERT M. MCCLURE, TMG: A syntax directed compiler, Proc. ACM Nat. Conf. 20 (1965), 262-274.
[15] PL/I Definition Group of the Vienna Laboratory, Formal definition of PL/I, IBM Technical Report TR 25.071 (1966).
[16] NIKLAUS WIRTH and HELMUT WEBER, Euler: A generalization of ALGOL, and its formal definition, Comm. ACM 9 (1966), 11-23, 89-99, 878.
This year I had the chance and the honor of interviewing Professor Knuth. I’m proud, as a journalist and
FSM’s TeX-nician, to see it published in what I consider “my magazine”.
Prof. Knuth reading one of the magazines typeset by his program TeX. Photo by Jill Knuth, a graduate
of Flora Stone Mather College (FSM).
Donald E. Knuth, Professor Emeritus of the Art of Computer Programming, Professor of (Concrete)
Mathematics, creator of TeX and METAFONT, author of several fantastic books (such as Computers
and Typesetting, The Art of Computer Programming, Concrete Mathematics) and articles, recipient of the
Turing Award, Kyoto Prize, and other important awards; fellow of the Royal Society (I could keep
going). Is there anything you feel you have wanted to master and haven’t? If so, why?
Thanks for your kind words, but really I’m constantly trying to learn new things in hopes that I can then help
teach them to others. I also wish I was able to understand non-English languages without so much difficulty;
I’m often limited by my linguistic incompetence, and I want to understand people from other cultures and
other eras.
Your algorithms are well known and well documented (I’ll only quote, for brevity’s sake, the
Knuth-Morris-Pratt String Matching algorithm), which allows everyone to use, study and improve
upon them freely. If it wasn’t clear through your actions, in an interview with Dr. Dobb’s Journal, you
stated your opinion about software patents, which are forcing people to pay fees if they either want to
I mention patents in several parts of The Art of Computer Programming. For example, when discussing one of
the first sorting methods to be patented, I say this:
Alas, we have reached the end of the era when the joy of discovering a new algorithm was
satisfaction enough! Fortunately the oscillating sort isn’t especially good; let’s hope that
community-minded folks who invent the best algorithms continue to make their ideas freely
available.
I don’t have time to follow current developments in the patent scene; but I fear that things continue to get
worse. I don’t think I would have been able to create TeX if the present climate had existed in the 1970s.
On my recent trip to Europe, people told me that the EU had wisely decided not to issue software patents. But
a day or two before I left, somebody said the politicians in Brussels had suddenly reversed that decision. I
hope that isn’t true, because I think today’s patent policies stifle innovation.
However, I am by no means an expert on such things; I’m just a scientist who writes about programming.
The portion of the paper where Knuth answered question 2 of this interview. Please note that he
typographically quotes and typesets even when writing by hand: the quotation from TAOCP and the
word TeX.
So far you have written three volumes of The Art of Computer Programming, are working on the fourth,
are hoping to finish the fifth volume by 2010, and still plan to write volumes six and seven. Apart from
the Selected papers series, are there any other topics you feel you should write essays on, but haven’t got
time for? If so, can you summarize what subject you would write on?
I’m making slow but steady progress on volumes 4 and 5. I also have many notes for volumes 6 and 7, but
those books deal with less fundamental topics and I might find that there is little need for those books when I
get to that point.
I fear about 20 years of work are needed before I can bring TAOCP to a successful conclusion; and I’m 67
years old now; so I fondly hope that I will remain healthy and able to do a good job even as I get older and
more decrepit. Thankfully, at the moment I feel as good as ever.
A prime number of questions for the Professor Emeritus of the Art of Computer Programming
Interview with Donald E. Knuth
If I have time for anything else I would like to compose some music. Of course I don’t know if that would be
successful; I would keep it to myself if it isn’t any good. Still, I get an urge every once in a while to try, and
computers are making such things easier.
There are rumours that you started the TeX project because you were tired of seeing your manuscripts
mistreated by the American Mathematical Society. At the same time, you stated that you created it
after seeing the proofs of your book The Art of Computer Programming. Please, tell our readers briefly
what made you decide to start the project, which tools you used, and how many people you had at the
core of the TeX team.
No, the math societies weren’t to blame for the sorry state of typography in 1975. It was the fact that the
printing industry had switched to new methods, and the new methods were designed to be fine for magazines
and newspapers and novels but not for science. Scientists didn’t have any economic clout, so nobody cared if
our books and papers looked good or bad.
I tell the whole story in Chapter 1 of my book Digital Typography, which of course is a book I hope
everybody will read and enjoy.
The tools I used were home grown and became known as Literate Programming. I am enormously biased
about Literate Programming, which is surely the greatest thing since sliced bread. I continue to use it to write
programs almost every day, and it helps me get efficient, robust, maintainable code much more successfully
than any other way I know. Of course, I realize that other people might find other approaches more to their
liking; but wow, I love the tools I’ve got now. I couldn’t have written difficult programs like the MMIX
meta-simulator at all if I hadn’t had Literate Programming; the task would have been too difficult.
At the core of the TeX team I had assistants who read the code I wrote, and who prepared printer drivers and
interfaces and ported to other systems. I had two students who invented algorithms for hyphenation and line
breaking. And I had many dozens of volunteers who met every Friday for several hours to help me make
decisions. But I wrote every line of code myself.
Chapter 10 of my book Literate Programming explains why I think a first-generation project like this would
have flopped if I had tried to delegate the work.
Portrait of Donald E. Knuth by Alexandra Drofina. Commercial users should write to Yura Revich
(revich@computerra.ru) for permission
Maybe you feel that some of today’s technologies are still unsatisfactory. If you weren’t busy writing
your masterpieces, what technology would you try to revolutionize and in what way?
Well certainly I would try to work for world peace and justice. I tend to think of myself as a citizen of the
world; I am pleasantly excited when I see the world getting smaller and people of different cultures working
together and respecting their differences. Conversely I am distressed when I learn about deep-seated hatred or
when I see people exploiting others or shoving them around pre-emptively.
In what way could the desired revolution come about? Who knows… but I suspect that “Engineers Without
Borders” are closer than anybody else to a working strategy by which geeks like me could help.
Biography
Gianluca Pignalberi is Free Software Magazine's Compositor.
Copyright information
This article is made available under the "Attribution-NonCommercial-NoDerivs" Creative Commons License
3.0 available from http://creativecommons.org/licenses/by-nc-nd/3.0/.
Source URL:
http://www.freesoftwaremagazine.com/articles/interview_knuth
Dancing Links
Donald E. Knuth, Stanford University
remove x from the list; every programmer knows this. But comparatively few programmers
have realized that the subsequent operations
$$L[R[x]] \gets x, \qquad R[L[x]] \gets x \tag{2}$$
For example, Dijkstra’s recursive procedure for the queens problem kept the current
state in three global Boolean arrays, representing the columns, the diagonals, and the
reverse diagonals of a chessboard; Hitotumatu and Noshita’s program kept it in a doubly
linked list of available columns together with Boolean arrays for both kinds of diagonals.
When Dijkstra tentatively placed a queen, he changed one entry of each Boolean array
from true to false; then he made the entry true again when backtracking. Hitotumatu
and Noshita used (1) to remove a column and (2) to restore it again; this meant that they
could find an empty column without having to search for it. Each program strove to record
the state information in such a way that the placing and subsequent unplacing of a queen
would be efficient.
The beauty of (2) is that operation (1) can be undone by knowing only the value of x.
General schemes for undoing assignments require us to record the identity of the left-hand
side together with its previous value (see [11]; see also [25], pages 268–284). But in this
case only the single quantity x is needed, and backtrack programs often know the value
of x implicitly as a byproduct of their normal operation.
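A minimal Python sketch of the two operations on a circular doubly linked list stored in arrays L and R, as in the text. Operation (1) is taken to be $L[R[x]] \gets L[x]$, $R[L[x]] \gets R[x]$, the same pair of assignments used later when a column is covered; the class name and the header convention (index n) are illustrative choices, not from the paper.

```python
class CircularList:
    """Items 0..n-1 plus a header n, doubly linked through arrays L and R."""

    def __init__(self, n):
        self.head = n
        self.L = [(i - 1) % (n + 1) for i in range(n + 1)]
        self.R = [(i + 1) % (n + 1) for i in range(n + 1)]

    def remove(self, x):
        # Operation (1): L[R[x]] <- L[x], R[L[x]] <- R[x]. x keeps its own links.
        self.L[self.R[x]] = self.L[x]
        self.R[self.L[x]] = self.R[x]

    def restore(self, x):
        # Operation (2): L[R[x]] <- x, R[L[x]] <- x. Only x itself is needed.
        self.L[self.R[x]] = x
        self.R[self.L[x]] = x

    def items(self):
        out, x = [], self.R[self.head]
        while x != self.head:
            out.append(x)
            x = self.R[x]
        return out


lst = CircularList(5)
lst.remove(2)
lst.remove(3)
print(lst.items())   # [0, 1, 4]
lst.restore(3)       # undo in the reverse order of removal
lst.restore(2)
print(lst.items())   # [0, 1, 2, 3, 4]
```

Restoring in the reverse order of removal is what makes (2) sufficient, which fits backtracking naturally because backtracking undoes its choices last-in, first-out.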
We can apply (1) and (2) repeatedly in complex data structures that involve large
numbers of interacting doubly linked lists. The program logic that traverses those lists
and decides what elements should be deleted can often be run in reverse, thereby deciding
what elements should be undeleted. And undeletion restores links that allow us to continue
running the program logic backwards until we’re ready to go forward again.
This process causes the pointer variables inside the global data structure to execute an
exquisitely choreographed dance; hence I like to call (1) and (2) the technique of dancing
links.
The exact cover problem. One way to illustrate the power of dancing links is to consider
a general problem that can be described abstractly as follows: Given a matrix of 0s and
1s, does it have a set of rows containing exactly one 1 in each column?
For example, the matrix
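$$
\begin{pmatrix}
0 & 0 & 1 & 0 & 1 & 1 & 0\\
1 & 0 & 0 & 1 & 0 & 0 & 1\\
0 & 1 & 1 & 0 & 0 & 1 & 0\\
1 & 0 & 0 & 1 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 1 & 1 & 0 & 1
\end{pmatrix}
\tag{3}
$$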
has such a set (rows 1, 4, and 5). We can think of the columns as elements of a universe,
and the rows as subsets of the universe; then the problem is to cover the universe with
disjoint subsets. Or we can think of the rows as elements of a universe, and the columns as
subsets of that universe; then the problem is to find a collection of elements that intersect
each subset in exactly one point. Either way, it’s a potentially tough problem, well known
to be NP-complete even when each row contains exactly three 1s [13, page 221]. And it is
a natural candidate for backtracking.
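As a baseline before any clever search, this Python sketch checks every subset of rows of matrix (3) and confirms that rows 1, 4, and 5 form the only exact cover. It is plain exhaustive enumeration, exponential in the number of rows, and is not algorithm X; the function names are ours.

```python
from itertools import combinations

# Matrix (3), rows reported from 1 as in the text.
A = [
    [0, 0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 0, 1],
]


def is_exact_cover(matrix, rows):
    """True if the chosen rows have exactly one 1 in every column."""
    return all(sum(matrix[r][j] for r in rows) == 1
               for j in range(len(matrix[0])))


def brute_force(matrix):
    """Try every subset of rows; usable only for tiny inputs."""
    n = len(matrix)
    return [tuple(r + 1 for r in rows)
            for k in range(n + 1)
            for rows in combinations(range(n), k)
            if is_exact_cover(matrix, rows)]


print(brute_force(A))  # [(1, 4, 5)]
```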
Dana Scott conducted one of the first experiments on backtrack programming in 1958,
when he was a graduate student at Princeton University [34]. His program, written for the
IAS “MANIAC” computer with the help of Hale F. Trotter, produced the first listing of all
ways to place the 12 pentominoes into a chessboard leaving the center four squares vacant.
For example, one of the 65 solutions is shown in Figure 1. (Pentominoes are the case n = 5
of n-ominoes, which are connected n-square subsets of an infinite board; see [15]. Scott
was probably inspired by Golomb’s paper [14] and some extensions reported by Martin
Gardner [12].)
This problem is a special case of the exact cover problem. Imagine a matrix that
has 72 columns, one for each of the 12 pentominoes and one for each of the 60 cells of
the chessboard-minus-its-center. Construct all possible rows representing a way to place
a pentomino on the board; each row contains a 1 in the column identifying the piece, and
five 1s in the columns identifying its positions. (There are exactly 1568 such rows.) We can
name the first twelve columns F I L P N T U V W X Y Z, following Golomb’s recommended
names for the pentominoes [15, page 7], and we can use two digits ij to name the column
corresponding to rank i and file j of the board; each row is conveniently represented by
giving the names of the columns where 1s appear. For example, Figure 1 is the exact cover
corresponding to the twelve rows
I 11 12 13 14 15
N 16 26 27 37 47
L 17 18 28 38 48
U 21 22 31 41 42
X 23 32 33 34 43
W 24 25 35 36 46
P 51 52 53 62 63
F 56 64 65 66 75
Z 57 58 67 76 77
T 61 71 72 73 81
V 68 78 86 87 88
Y 74 82 83 84 85 .
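As a consistency check of this row encoding, the Python sketch below confirms that the twelve rows above mention each of the 72 column names exactly once: the 12 piece names and the 60 cells ij of the board without its center squares 44, 45, 54, 55. The names ROWS, COLUMNS, and is_exact_cover are ours.

```python
from collections import Counter

# The twelve rows for Figure 1: a piece name followed by five cells ij.
ROWS = """\
I 11 12 13 14 15
N 16 26 27 37 47
L 17 18 28 38 48
U 21 22 31 41 42
X 23 32 33 34 43
W 24 25 35 36 46
P 51 52 53 62 63
F 56 64 65 66 75
Z 57 58 67 76 77
T 61 71 72 73 81
V 68 78 86 87 88
Y 74 82 83 84 85"""

CENTER = {"44", "45", "54", "55"}
# 12 piece columns plus the 60 cells of the chessboard minus its center.
COLUMNS = list("FILPNTUVWXYZ") + [
    f"{i}{j}" for i in range(1, 9) for j in range(1, 9) if f"{i}{j}" not in CENTER
]


def is_exact_cover(rows_text):
    """True if every column name occurs exactly once across the rows."""
    counts = Counter(name for line in rows_text.splitlines() for name in line.split())
    return set(counts) == set(COLUMNS) and all(k == 1 for k in counts.values())


print(len(COLUMNS), is_exact_cover(ROWS))  # 72 True
```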
Solving an exact cover problem. The following nondeterministic algorithm, which I
will call algorithm X for lack of a better name, finds all solutions to the exact cover problem
defined by any given matrix A of 0s and 1s. Algorithm X is simply a statement of the
obvious trial-and-error approach. (Indeed, I can’t think of any other reasonable way to do
the job, in general.)
If A is empty, the problem is solved; terminate successfully.
Otherwise choose a column, c (deterministically).
Choose a row, r, such that A[r, c] = 1 (nondeterministically).
Include r in the partial solution.
For each j such that A[r, j] = 1,
delete column j from matrix A;
for each i such that A[i, j] = 1,
delete row i from matrix A.
Repeat this algorithm recursively on the reduced matrix A.
The nondeterministic choice of r means that the algorithm essentially clones itself into
independent subalgorithms; each subalgorithm inherits the current matrix A, but reduces
it with respect to a different row r. If column c is entirely zero, there are no subalgorithms
and the process terminates unsuccessfully.
The subalgorithms form a search tree in a natural way, with the original problem at
the root and with level k containing each subalgorithm that corresponds to k chosen rows.
Backtracking is the process of traversing the tree in preorder, “depth first.”
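Algorithm X can be transcribed almost line for line in Python if the matrix is held as a dictionary of sets instead of linked lists. This is a swapped-in representation, not the dancing-links implementation described later, and the helper names are ours; it picks a column with the fewest remaining 1s, the branching-minimizing choice discussed in this paper.

```python
def algorithm_x(X, Y, solution=None):
    """Yield every exact cover as a list of row labels.

    X maps each remaining column to the set of remaining rows with a 1 there;
    Y maps each row to its columns. X shrinks during the search and is fully
    restored once the generator is exhausted.
    """
    if solution is None:
        solution = []
    if not X:                                   # A is empty: solved
        yield list(solution)
        return
    c = min(X, key=lambda col: len(X[col]))     # deterministic choice of c
    for r in list(X[c]):                        # each r with A[r, c] = 1
        solution.append(r)                      # include r
        removed = select(X, Y, r)
        yield from algorithm_x(X, Y, solution)  # recurse on reduced A
        deselect(X, Y, r, removed)
        solution.pop()


def select(X, Y, r):
    """For each j with A[r, j] = 1: delete every row i with A[i, j] = 1, then column j."""
    removed = []
    for j in Y[r]:
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].remove(i)
        removed.append(X.pop(j))
    return removed


def deselect(X, Y, r, removed):
    """Undo select(X, Y, r) in the reverse order."""
    for j in reversed(Y[r]):
        X[j] = removed.pop()
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].add(i)


# Matrix (3): Y lists the columns (0..6) of each row, rows numbered from 1.
A = ["0010110", "1001001", "0110010", "1001000", "0100001", "0001101"]
Y = {r + 1: [j for j, bit in enumerate(row) if bit == "1"] for r, row in enumerate(A)}
X = {j: {r for r, cols in Y.items() if j in cols} for j in range(7)}
print([sorted(s) for s in algorithm_x(X, Y)])  # [[1, 4, 5]]
```

Each recursive call corresponds to one node of the search tree, and the generator visits them in preorder, which is the depth-first backtracking just described.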
Any systematic rule for choosing column c in this procedure will find all solutions,
but some rules work much better than others. For example, Scott [34] said that his initial
inclination was to place the first pentomino first, then the second pentomino, and so on;
this would correspond to choosing column F first, then column I, etc., in the corresponding
exact cover problem. But he soon realized that such an approach would be hopelessly slow:
There are 192 ways to place the F, and for each of these there are approximately 34 ways
to place the I. The Monte Carlo estimation procedure described in [24] suggests that the
search tree for this scheme has roughly $2 \times 10^{12}$ nodes! By contrast, the alternative of
choosing column 11 first (the column corresponding to rank 1 and file 1 of the board),
and in general choosing the lexicographically first uncovered column, leads to a search tree
with 9,015,751 nodes.
Even better is the strategy that Scott finally adopted [34]: He realized that piece X
has only 3 essentially different positions, namely centered at 23, 24, and 33. Furthermore,
if the X is at 33, we can assume that the P pentomino is not “turned over,” so that it takes
only four of its eight orientations. Then we get each of the 65 essentially different solutions
exactly once, and the full set of 8 × 65 = 520 solutions is easily obtained by rotation and
reflection. These constraints on X and P lead to three independent problems, with
of dancing links allows us to do this quite nicely; the search trees for Scott’s pentomino
problem then have only
10,421 nodes (X at 23);
12,900 nodes (X at 24);
14,045 nodes (X at 33, P not flipped),
respectively.
The dance steps. One good way to implement algorithm X is to represent each 1 in the
matrix A as a data object x with five fields L[x], R[x], U[x], D[x], C[x]. Rows of the matrix
are doubly linked as circular lists via the L and R fields (“left” and “right”); columns are
doubly linked as circular lists via the U and D fields (“up” and “down”). Each column
list also includes a special data object called its list header.
The list headers are part of a larger object called a column object. Each column object
y contains the fields L[y], R[y], U[y], D[y], and C[y] of a data object and two additional
fields, S[y] (“size”) and N[y] (“name”); the size is the number of 1s in the column, and the
name is a symbolic identifier for printing the answers. The C field of each object points
to the column object at the head of the relevant column.
The L and R fields of the list headers link together all columns that still need to be
covered. This circular list also includes a special column object called the root, h, which
serves as a master header for all the active headers. The fields U[h], D[h], C[h], S[h], and
N[h] are not used.
For example, the 0-1 matrix of (3) would be represented by the objects shown in
Figure 2, if we name the columns A, B, C, D, E, F, and G. (This diagram “wraps around”
toroidally at the top, bottom, left, and right. The C links are not shown because they
would clutter up the picture; each C field points to the topmost element in its column.)
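To make the layout concrete, here is a minimal Python sketch of the structure just described. It is an illustration under assumptions, not the paper's program: the names Node, Column, build, and headers are hypothetical, and matrix (3), which is not reproduced in this section, is assumed to have the rows (C, E, F), (A, D, G), (B, C, F), (A, D), (B, G), (D, E, G) from top to bottom. That assumption agrees with the column sizes in Figure 2 and with the solution printed later. New data objects are appended at the bottom of their columns and at the right end of their rows, so every circular list keeps the order of the matrix.

```python
class Node:
    """A data object: the four links L, R, U, D and the column pointer C."""

    def __init__(self, column=None):
        self.L = self.R = self.U = self.D = self
        self.C = column


class Column(Node):
    """A column object: a data object plus the S (size) and N (name) fields."""

    def __init__(self, name):
        super().__init__()
        self.C = self
        self.S = 0
        self.N = name


def build(names, rows):
    """Return the root h of the four-way-linked structure for a 0-1 matrix.

    names: column names, left to right; rows: for each row, the names of
    the columns in which it has a 1.
    """
    h = Column("h")  # the root; its U, D, C, S, N fields are not used
    cols = {}
    for name in names:  # append each header at the right end of h's list
        c = Column(name)
        c.L, c.R = h.L, h
        h.L.R = c
        h.L = c
        cols[name] = c
    for row in rows:
        first = None
        for name in row:
            c = cols[name]
            x = Node(c)
            x.U, x.D = c.U, c  # append x at the bottom of column c
            c.U.D = x
            c.U = x
            c.S += 1
            if first is None:
                first = x  # a one-element circular row list
            else:  # append x at the right end of the row
                x.L, x.R = first.L, first
                first.L.R = x
                first.L = x
    return h


def headers(h):
    """The column objects currently linked into the header list, left to right."""
    out, c = [], h.R
    while c is not h:
        out.append(c)
        c = c.R
    return out


MATRIX_3 = [list("CEF"), list("ADG"), list("BCF"), list("AD"), list("BG"), list("DEG")]
h = build("ABCDEFG", MATRIX_3)
print([(c.N, c.S) for c in headers(h)])
# [('A', 2), ('B', 2), ('C', 2), ('D', 3), ('E', 2), ('F', 2), ('G', 3)]
```

The sizes printed are the numbers shown under the headers in Figure 2.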
Our nondeterministic algorithm to find all exact covers can now be cast in the following
explicit, deterministic form as a recursive procedure search(k), which is invoked initially
with k = 0:
   If R[h] = h, print the current solution (see below) and return.
   Otherwise choose a column object c (see below).
   Cover column c (see below).
   For each r ← D[c], D[D[c]], . . . , while r ≠ c,
      set Oₖ ← r;
      for each j ← R[r], R[R[r]], . . . , while j ≠ r,
         cover column j (see below);
      search(k + 1);
      set r ← Oₖ and c ← C[r];
      for each j ← L[r], L[L[r]], . . . , while j ≠ r,
         uncover column j (see below).
   Uncover column c (see below) and return.
The operation of printing the current solution is easy: We successively print the rows
containing O₀, O₁, . . . , Oₖ₋₁, where the row containing data object O is printed by
printing N[C[O]], N[C[R[O]]], N[C[R[R[O]]]], etc.
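The printing rule needs only the R and C links of the chosen objects and the N fields of their columns. A minimal Python sketch, where make_row and row_names are hypothetical helper names: if the chosen object Oₖ is the element of row (C, E, F) that lies in column E, the row comes out as E F C, because printing starts in the column on which the search branched.

```python
class Column:
    """Only the N (name) field of a column object matters here."""

    def __init__(self, name):
        self.N = name


class Node:
    def __init__(self, column):
        self.C = column
        self.L = self.R = self


def make_row(columns):
    """Link one data object per column into a circular row list, left to right."""
    nodes = [Node(c) for c in columns]
    for i, x in enumerate(nodes):
        x.R = nodes[(i + 1) % len(nodes)]
        x.L = nodes[i - 1]
    return nodes


def row_names(o):
    """Return N[C[o]], N[C[R[o]]], N[C[R[R[o]]]], ... for the row containing o."""
    names = [o.C.N]
    x = o.R
    while x is not o:
        names.append(x.C.N)
        x = x.R
    return names


cols = {name: Column(name) for name in "CEF"}
c_obj, e_obj, f_obj = make_row([cols["C"], cols["E"], cols["F"]])
print(" ".join(row_names(e_obj)))  # E F C
```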
[Figure 2: the root h and the column headers A–G, with sizes 2, 2, 2, 3, 2, 2, 3.]
To choose a column object c, we could simply set c ← R[h]; this is the leftmost
uncovered column. Or if we want to minimize the branching factor, we could set s ← ∞
and then
   for each j ← R[h], R[R[h]], . . . , while j ≠ h,
      if S[j] < s set c ← j and s ← S[j].
Then c is a column with the smallest number of 1s. (The S fields are not needed unless
we want to minimize branching in this way.)
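Both rules fit in a few lines of Python. In this sketch the Column class keeps only the fields this step uses, and the helper names are hypothetical. The size lists come from Figure 2 (all columns present) and Figure 4 (only B, C, E, and F left). Because the comparison is a strict <, ties go to the leftmost column, which is why column B is chosen in the second case.

```python
import math


class Column:
    """Just the fields the column choice looks at: L, R, S and N."""

    def __init__(self, name, size):
        self.N, self.S = name, size
        self.L = self.R = self


def link_headers(columns):
    """Create a root h and link the given column objects into its circular list."""
    h = Column("h", 0)
    for c in columns:
        c.L, c.R = h.L, h
        h.L.R = c
        h.L = c
    return h


def choose_column(h, minimize=True):
    """Without the S heuristic, return R[h], the leftmost uncovered column.
    With it, scan left to right and replace the candidate only when a size is
    strictly smaller, so the result is the leftmost column of minimum size."""
    if not minimize:
        return h.R
    c, s = None, math.inf
    j = h.R
    while j is not h:
        if j.S < s:
            c, s = j, j.S
        j = j.R
    return c


def headers_with_sizes(names, sizes):
    return link_headers([Column(n, s) for n, s in zip(names, sizes)])


print(choose_column(headers_with_sizes("ABCDEFG", [2, 2, 2, 3, 2, 2, 3])).N)  # A
print(choose_column(headers_with_sizes("BCEF", [1, 2, 1, 2])).N)  # B
```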
The operation of covering column c is more interesting: It removes c from the header
list and removes all rows in c’s own list from the other column lists they are in.
   Set L[R[c]] ← L[c] and R[L[c]] ← R[c].
   For each i ← D[c], D[D[c]], . . . , while i ≠ c,
      for each j ← R[i], R[R[i]], . . . , while j ≠ i,
         set U[D[j]] ← U[j], D[U[j]] ← D[j],
         and set S[C[j]] ← S[C[j]] − 1.
Operation (1), which I mentioned at the outset of this paper, is used here to remove objects
in both the horizontal and vertical directions.
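Operation (1) deletes x from its list by setting L[R[x]] ← L[x] and R[L[x]] ← R[x]. Operation (2) puts it back by setting L[R[x]] ← x and R[L[x]] ← x. The trick is that x's own L and R fields are left alone. Here is a minimal Python sketch on a single circular list with a dummy head. The names remove, unremove, forward, and backward are hypothetical. It deletes two neighbors and restores them in the opposite order.

```python
class Item:
    def __init__(self, name):
        self.name = name
        self.L = self.R = self


def circular(names):
    """Build a circular doubly linked list with a dummy head item."""
    head = Item("head")
    for name in names:
        x = Item(name)
        x.L, x.R = head.L, head
        head.L.R = x
        head.L = x
    return head


def remove(x):
    """Operation (1): L[R[x]] <- L[x], R[L[x]] <- R[x]. x keeps its own links."""
    x.R.L = x.L
    x.L.R = x.R


def unremove(x):
    """Operation (2): L[R[x]] <- x, R[L[x]] <- x, using the links x kept."""
    x.R.L = x
    x.L.R = x


def forward(head):
    out, x = [], head.R
    while x is not head:
        out.append(x.name)
        x = x.R
    return out


def backward(head):
    out, x = [], head.L
    while x is not head:
        out.append(x.name)
        x = x.L
    return out


head = circular("abcd")
b = head.R.R
c = b.R
remove(b)
remove(c)
print(forward(head))  # ['a', 'd']
unremove(c)  # last removed, first restored
unremove(b)
print(forward(head), backward(head))  # ['a', 'b', 'c', 'd'] ['d', 'c', 'b', 'a']
```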
Finally, we get to the point of this whole algorithm, the operation of uncovering a given
column c. Here is where the links do their dance:
   For each i ← U[c], U[U[c]], . . . , while i ≠ c,
      for each j ← L[i], L[L[i]], . . . , while j ≠ i,
         set S[C[j]] ← S[C[j]] + 1,
         and set U[D[j]] ← j, D[U[j]] ← j.
   Set L[R[c]] ← c and R[L[c]] ← c.
Notice that uncovering takes place in precisely the reverse order of the covering operation,
using the fact that (2) undoes (1). (Actually we need not adhere so strictly to the principle
of “last done, first undone” in this case, since j could run through row i in any order. But
we must be careful to unremove the rows from bottom to top, because we removed them
from top to bottom. Similarly, it is important to uncover the columns of row r from right
to left, because we covered them from left to right.)
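Putting the pieces together, here is a compact Python transcription of search(k), cover, and uncover. It is a sketch, not Knuth's program. The function names are illustrative, and the list O plays the role of O₀, O₁, . . . , so k is its length. Because r and c are ordinary local variables, they need not be reloaded from Oₖ after the recursive call. Matrix (3) is again assumed to consist of the rows (C, E, F), (A, D, G), (B, C, F), (A, D), (B, G), (D, E, G). Running the search twice on the same structure also checks that uncovering restores every link and size.

```python
class Node:
    def __init__(self, column=None, name=None):
        self.L = self.R = self.U = self.D = self
        self.C = column if column is not None else self  # column objects point to themselves
        self.S, self.N = 0, name


def build(names, rows):
    h, cols = Node(name="h"), {}
    for name in names:  # header list, left to right
        c = cols[name] = Node(name=name)
        c.L, c.R = h.L, h
        h.L.R = c
        h.L = c
    for row in rows:
        first = None
        for name in row:
            c = cols[name]
            x = Node(c)
            x.U, x.D = c.U, c  # bottom of column c
            c.U.D = x
            c.U = x
            c.S += 1
            if first is None:
                first = x
            else:  # right end of the row
                x.L, x.R = first.L, first
                first.L.R = x
                first.L = x
    return h


def cover(c):
    c.R.L, c.L.R = c.L, c.R  # L[R[c]] <- L[c], R[L[c]] <- R[c]
    i = c.D
    while i is not c:  # rows of c, top to bottom
        j = i.R
        while j is not i:  # rest of row i, left to right
            j.D.U, j.U.D = j.U, j.D
            j.C.S -= 1
            j = j.R
        i = i.D


def uncover(c):
    i = c.U
    while i is not c:  # rows of c, bottom to top
        j = i.L
        while j is not i:  # rest of row i, right to left
            j.C.S += 1
            j.D.U = j.U.D = j
            j = j.L
        i = i.U
    c.R.L = c.L.R = c


def row_names(o):
    names, x = [o.C.N], o.R
    while x is not o:
        names.append(x.C.N)
        x = x.R
    return names


def search(h, O, solutions, minimize=True):
    if h.R is h:  # every column is covered: record the rows of O_0, ..., O_(k-1)
        solutions.append([row_names(o) for o in O])
        return
    c, s, j = h.R, float("inf"), h.R
    while minimize and j is not h:  # leftmost column of minimum size
        if j.S < s:
            c, s = j, j.S
        j = j.R
    cover(c)
    r = c.D
    while r is not c:
        O.append(r)  # O_k <- r, where k = len(O) before the append
        j = r.R
        while j is not r:
            cover(j.C)  # "cover column j" means cover the column C[j]
            j = j.R
        search(h, O, solutions, minimize)
        O.pop()  # r and c are locals here, so nothing needs reloading
        j = r.L
        while j is not r:
            uncover(j.C)
            j = j.L
        r = r.D
    uncover(c)


MATRIX_3 = [list("CEF"), list("ADG"), list("BCF"), list("AD"), list("BG"), list("DEG")]
h3 = build("ABCDEFG", MATRIX_3)
with_s, without_s = [], []
search(h3, [], with_s)
search(h3, [], without_s, minimize=False)  # reuses h3, which search restored
print(with_s)  # [[['A', 'D'], ['E', 'F', 'C'], ['B', 'G']]]
print(without_s)  # [[['A', 'D'], ['B', 'G'], ['C', 'E', 'F']]]
```

The first rows of each solution are printed from the branching column, so the output matches the two ways of printing the solution that are shown later in this section.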
[Figure 3: the links after column A has been covered; column sizes A–G are 2, 2, 2, 1, 2, 2, 2.]
Consider, for example, what happens when search(0) is applied to the data of (3) as
represented by Figure 2. Column A is covered by removing both of its rows from their
other columns; the structure now takes the form of Figure 3. Notice the asymmetry of the
links that now appear in column D: The upper element was deleted first, so it still points to
its original neighbors, but the other deleted element points upward to the column header.
Continuing search(0), when r points to the A element of row (A, D, G), we also cover
columns D and G. Figure 4 shows the status as we enter search(1); this data structure
represents the reduced matrix
       B C E F
      (0 1 1 1)
      (1 1 0 1).        (4)
Now search(1) will cover column B, and there will be no 1s left in column E. So
search(2) will find nothing. Then search(1) will return, having found no solutions, and
the state of Figure 4 will be restored. The outer level routine, search(0), will proceed to
convert Figure 4 back to Figure 3, and it will advance r to the A element of row (A, D).
Figure 4. The links after columns D and G in Figure 3 have been covered.
[Column sizes A–G: 2, 1, 2, 1, 1, 2, 1.]
A D
B G
C E F
A D
E F C
B G
if the shortest column is chosen at each step. (The first item printed in each row list is the
name of the column on which branching was done.) Readers who play through the action
of this algorithm on some examples will understand why I chose the title of this paper.
Efficiency considerations. When algorithm X is implemented in terms of dancing links,
let’s call it algorithm DLX. The running time of algorithm DLX is essentially proportional
to the number of times it applies operation (1) to remove an object from a list; this is also
the number of times it applies operation (2) to unremove an object. Let’s say that this
quantity is the number of updates. A total of 28 updates are performed during the solution
of (3) if we repeatedly choose the shortest column: 10 updates are made on level 0, 14 on
level 1, and 4 on level 2. Alternatively, if we ignore the S heuristic, the algorithm makes
the same 10 updates on level 0, then 16 updates on level 1 and 7 updates on level 2, for a
total of 33. But in the latter case each update will go noticeably faster, since the
statements S[C[j]] ← S[C[j]] ± 1 can be omitted; hence the overall running time will
probably be less. Of course we need to study larger examples before drawing any general
conclusions about the desirability of the S heuristic.
Figure 5. The search tree for one case of Scott’s pentomino problem.
A backtrack program usually spends most of its time on only a few levels of the search
tree (see [24]). For example, Figure 5 shows the search tree for the case X = 23 of Dana
Scott’s pentomino problem using the S heuristic; it has the following profile:
Level      Nodes            Updates   Updates per node
    0          1 ( 0%)        2,031 ( 0%)       2031.0
    1          2 ( 0%)        1,676 ( 0%)        838.0
    2         22 ( 0%)       28,492 ( 1%)       1295.1
    3         77 ( 1%)       77,687 ( 2%)       1008.9
    4        219 ( 2%)      152,957 ( 4%)        698.4
    5        518 ( 5%)      367,939 (10%)        710.3
    6      1,395 (13%)      853,788 (24%)        612.0
    7      2,483 (24%)      941,265 (26%)        379.1
    8      2,574 (25%)      740,523 (20%)        287.7
    9      2,475 (24%)      418,334 (12%)        169.0
   10        636 ( 6%)       32,205 ( 1%)         50.6
   11         19 ( 0%)          826 ( 0%)         43.5
Total     10,421 (100%)   3,617,723 (100%)       347.2
(The number of updates shown for level k is the number of times an element was removed
from a doubly linked list during the calculations between levels k − 1 and k. The 2,031
updates on level 0 correspond to removing column X from the header list and then removing
2030/5 = 406 rows from their other columns; these are the rows that overlap with the
placement of X at 23. A slight optimization was made when tabulating this data: Column c
was not covered and uncovered in trivial cases when it contained no rows.) Notice
that more than half of the nodes lie on levels ≥ 8, but more than half of the updates occur
on the way to level 7. Extra work on the lower levels has reduced the need for hard work
at the higher levels.
The corresponding statistics look like this when the same problem is run without the
ordering heuristic based on S fields:
Level      Nodes            Updates   Updates per node
    0          1 ( 0%)        2,031 ( 0%)       2031.0
    1          6 ( 0%)        5,606 ( 0%)        934.3
    2         24 ( 0%)       30,111 ( 0%)       1254.6
    3        256 ( 0%)      249,904 ( 1%)        976.2
    4        581 ( 1%)      432,471 ( 2%)        744.4
    5      1,533 ( 1%)    1,256,556 ( 7%)        819.7
    6      3,422 ( 3%)    2,290,338 (13%)        669.3
    7     10,381 (10%)    4,442,572 (25%)        428.0
    8     26,238 (25%)    5,804,161 (33%)        221.2
    9     46,609 (45%)    3,006,418 (17%)         64.5
   10     13,935 (14%)      284,459 ( 2%)         20.4
   11         19 ( 0%)       14,125 ( 0%)        743.4
Total    103,005 (100%)  17,818,752 (100%)       173.0
Each update involves about 14 memory accesses when the S heuristic is used, and about
8 accesses when S is ignored. Thus the S heuristic multiplies the total number of memory
accesses by a factor of approximately (14 × 3,617,723)/(8 × 17,818,752) ≈ 36% in this
example. The heuristic is even more effective in larger problems, because it tends to
reduce the total number of nodes by a factor that is exponential in the number of levels
while the cost of applying it grows only linearly.
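The 36% figure is simple arithmetic on the two Total rows. As a sketch, with the per-update costs of 14 and 8 memory accesses taken from the text as rough estimates:

```python
# Totals from the two tables above (the case X = 23 of Scott's problem)
updates_with_s, nodes_with_s = 3_617_723, 10_421
updates_without_s, nodes_without_s = 17_818_752, 103_005

# Rough per-update memory-access counts quoted in the text
ACCESSES_WITH_S, ACCESSES_WITHOUT_S = 14, 8

ratio = (ACCESSES_WITH_S * updates_with_s) / (ACCESSES_WITHOUT_S * updates_without_s)
node_reduction = nodes_without_s / nodes_with_s
print(f"memory accesses with S: {ratio:.1%} of those without S")  # 35.5%
print(f"nodes without S: {node_reduction:.1f} times as many")  # 9.9
```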
Assuming that the S heuristic is good in large trees but not so good in small ones,
I tried a hybrid scheme that uses S at low levels but not at high levels. This experiment
was, however, unsuccessful. If, for example, S was ignored after level 7, the statistics for
levels 8–11 were as follows:
Level      Nodes      Updates
    8     18,300    5,672,258
    9     28,624    2,654,310
   10      9,989      213,944
   11         19       10,179
And if the change was applied after level 8, the statistics were
Level      Nodes      Updates
    9     11,562    1,495,054
   10      6,113      148,162
   11         19        6,303
Therefore I decided to retain the S heuristic at all levels of algorithm DLX.
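The tables can quantify that decision, again assuming roughly 14 memory accesses per update with the S heuristic and 8 without it. The levels before the switch are identical in all three runs, so only the levels where the runs differ are compared. The dictionaries below simply copy the numbers printed above, and hybrid_penalty is a hypothetical helper.

```python
ACCESSES_WITH_S, ACCESSES_WITHOUT_S = 14, 8  # per-update estimates from the text

# Updates on the affected levels when S is used everywhere (first table above)
full_s = {8: 740_523, 9: 418_334, 10: 32_205, 11: 826}
# Hybrid runs reported above: S ignored after level 7, or after level 8
after_7 = {8: 5_672_258, 9: 2_654_310, 10: 213_944, 11: 10_179}
after_8 = {9: 1_495_054, 10: 148_162, 11: 6_303}


def hybrid_penalty(hybrid):
    """Memory accesses of the hybrid run divided by those of the S-everywhere
    run, summed over the levels where the two runs differ."""
    baseline = ACCESSES_WITH_S * sum(full_s[level] for level in hybrid)
    trial = ACCESSES_WITHOUT_S * sum(hybrid.values())
    return trial / baseline


for label, run in (("after level 7", after_7), ("after level 8", after_8)):
    print(f"S ignored {label}: {hybrid_penalty(run):.2f} times as many accesses")
# S ignored after level 7: 4.10 times as many accesses
# S ignored after level 8: 2.09 times as many accesses
```

Even with cheaper updates, the hybrid runs cost about two to four times as much on the lower levels.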
My trusty old SPARCstation 2 computer, vintage 1992, is able to perform approximately
0.39 mega-updates per second when working on large problems and maintaining the
S fields. The 120 MHz Pentium I computer that Stanford computer science faculty were
given in 1996 did 1.21 mega-updates per second, and my new 500 MHz Pentium III does
5.94. Thus the running time decreases as technology advances; but it remains essentially
proportional to the number of updates, which is the number of times the links do their
dance. Therefore I prefer to measure the performance of algorithm DLX by counting the
number of updates, not by counting the number of elapsed seconds.
Scott [34] was pleased to discover that his program for the MANIAC solved the pentomino
problem in about 3.5 hours. The MANIAC executed approximately 4000 instructions per
second, so this represented roughly 50 million instructions. He and H. F. Trotter
found a nice way to use the “bitwise-and” instructions of the MANIAC, which had 40-bit
registers. Their code, which executed about 50,000,000/(103,005+106,232+154,921) ≈ 140
instructions per node of the search tree, was quite efficient in spite of the fact that they
had to deal with about ten times as many nodes as would be produced by the ordering
heuristic. Indeed, the linked-list approach of algorithm DLX performs a total of
3,617,723 + 4,547,186 + 5,526,988 = 13,691,897 updates, or about 192 million memory
accesses at roughly 14 accesses per update; and it would never fit in the 5120-byte memory
of the MANIAC! From this standpoint the technique of dancing links is actually a step
backward from Scott’s 40-year-old
method, although of course that method works only for very special types of exact cover
problems in which simple geometric structure can be exploited.
The task of finding all ways to pack the set of pentominoes into a 6 × 10 rectangle is
more difficult than Scott’s 8 × 8 − 2 × 2 problem, because the backtrack tree for the 6 × 10
problem is larger and there are 2339 essentially different solutions [21]. In this case we
limit the X pentomino to the upper left quarter of the board; our linked-memory algorithm
generates 902,631 nodes and 309,134,131 updates (or 28,320,810 nodes and 4,107,105,935
updates without the S heuristic). This solves the problem in less than a minute on a
Pentium III; however, again I should point out that the special characteristics of pentominoes
allow a faster approach.
John G. Fletcher needed only ten minutes to solve the 6 × 10 problem on an IBM 7094
in 1965, using a highly optimized program that had 765 instructions in its inner loop [10].
The 7094 had a clock rate of 0.7 MHz, and it could access two 36-bit words in a single clock
cycle. Fletcher’s program required only about 600 × 700,000/28,320,810 ≈ 15 clock cycles
per node of the search tree; so it was superior to the bitwise method of Scott and Trotter,
and it remains the fastest algorithm known for problems that involve placing the twelve
pentominoes. (N. G. de Bruijn discovered an almost identical method independently;
see [7].)
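The comparisons with Scott and Trotter and with Fletcher are back-of-the-envelope arithmetic, and the sketch below just recomputes them. Like the text, it uses the 28,320,810-node tree that algorithm DLX builds without the S heuristic as a stand-in for the size of Fletcher's search. That is an approximation, not a measurement of his program.

```python
# Scott and Trotter: about 50 million MANIAC instructions for all three cases
maniac_nodes = 103_005 + 106_232 + 154_921
instructions_per_node = 50_000_000 / maniac_nodes

# Algorithm DLX on the same three cases, at roughly 14 memory accesses per update
dlx_updates = 3_617_723 + 4_547_186 + 5_526_988
dlx_accesses = 14 * dlx_updates

# Fletcher: ten minutes at 0.7 MHz, measured against the 28,320,810-node tree
# that DLX builds for the 6 x 10 problem without the S heuristic
fletcher_cycles_per_node = 10 * 60 * 700_000 / 28_320_810

print(maniac_nodes, round(instructions_per_node))  # 364158 137
print(dlx_updates, dlx_accesses)  # 13691897 191686558
print(round(fletcher_cycles_per_node, 1))  # 14.8
```

The text's "≈ 140" and "≈ 15" are these values rounded.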
With a few extensions to the 0-1 matrix for Dana Scott’s problem, we can solve the
more general problem of covering a chessboard with twelve pentominoes and one square
tetromino, without insisting that the tetromino occupy the center. This is essentially the
classic problem of Dudeney, who invented pentominoes in 1907 [9]. The total number of
such chessboard dissections has apparently never appeared in the literature; algorithm DLX
needs 1,526,279,783 updates to determine that it is exactly 16,146.
Many people have written about polyomino problems, including distinguished mathematicians
such as Golomb [15], de Bruijn [7, 8], Berlekamp, Conway and Guy [4]. Their
[Figure: four panels, with search statistics 92 solutions, 14,352,556 nodes, 1,764,631,796
updates; 100 solutions, 10,258,180 nodes, 1,318,478,396 updates; 20 solutions, 6,375,335
nodes, 806,699,079 updates; 0 solutions, 1,234,485 nodes, 162,017,125 updates.]
Applications to hexiamonds. In the late 1950s, T. H. O’Beirne introduced a pleasant
variation on polyominoes by substituting triangles for squares. He named the resulting
shapes polyiamonds: moniamonds, diamonds, triamonds, tetriamonds, pentiamonds, hexiamonds,
etc. The twelve hexiamonds were independently discovered by J. E. Reeve and
J. A. Tyrell [32], who found more than forty ways to arrange them into a 6 × 6 rhombus.
Figure 7 shows one such arrangement, together with some arrow dissections that I couldn’t
resist trying when I first learned about hexiamonds. The 6 × 6 rhombus can be tiled by
the twelve hexiamonds in exactly 156 ways. (This fact was first proved by P. J. Torbijn
[35], who worked without a computer; algorithm DLX confirms his result after making
37,313,405 updates, if we restrict the “sphinx” to only 3 of its 12 orientations.)
[Figure 7, panel statistics: 4 solutions, 6,677 nodes, 4,687,159 updates; 0 solutions,
7,603 nodes, 3,115,387 updates; 41 solutions, 35,332 nodes, 14,948,759 updates; 3 solutions,
5,546 nodes, 3,604,817 updates.]
O’Beirne was particularly fascinated by the fact that seven of the twelve hexiamonds
have different shapes when they are flipped over, and that the resulting 19 one-sided
hexiamonds have the correct number of triangles to form a hexagon: a hexagon of hexiamonds
(see Figure 8). In November of 1959, after three months of trials, he found a solution; and
two years later he challenged the readers of New Scientist to match this feat [28, 29, 30].
Meanwhile he had shown the puzzle to Richard Guy and his family. The Guys published
several solutions in a journal published in Singapore, where Richard was a professor
[17]. Guy, who has told the story of this fascinating recreation in [18], says that when
O’Beirne first described the puzzle, “Everyone wanted to try it at once. No one went to
bed for about 48 hours.”
A 19-level backtrack tree with many possibilities at each level makes an excellent
test case for the dancing links approach to covering, so I fed O’Beirne’s problem to my
program. I broke the general case into seven subcases, depending on the distance of the
hexagon piece from the center; furthermore, when that distance was zero, I considered two
subcases depending on the position of the “crown.” Figure 8 shows a representative of
each of the seven cases, together with statistics about the search. The total number of
updates performed was 134,425,768,494.
My goal was not only to count the solutions, but also to find arrangements that were
as symmetrical as possible—in response to a problem that was stated in Berlekamp, Guy,
and Conway’s book Winning Ways [4, page 788]. Let us define the horizontal symmetry of
a configuration to be the number of edges between pieces that also are edges between pieces
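in the left-right reflection of that configuration. The overall hexagon has 156 internal edges,
and 96 of them are internal non-edges, lying inside the 19 one-sided hexiamonds rather than
between them (five inside each of 18 pieces, and six inside the hexagon piece, whose six
triangles surround a common vertex). Therefore if an arrangement were perfectly symmetrical,
unchanged by left-right reflection, its horizontal symmetry would be 156 − 96 = 60. But no
such perfectly symmetric solution is possible.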
The vertical symmetry of a configuration is defined similarly, but with respect to top-
bottom reflection. A solution to the hexiamond problem is maximally symmetric if it has
the highest horizontal or vertical symmetry score, and if the smaller score is as large as
possible consistent with the larger score. Each of the solutions shown in Figure 8 is, in
fact, maximally symmetric in its class. (And so is the solution to Dana Scott’s problem
that is shown in Figure 1: It has vertical symmetry 36 and horizontal symmetry 30.)
The largest possible vertical symmetry score is 50; it is achieved in Figure 8(a), and in
seven other solutions obtained by independently rearranging three of its symmetrical
subparts. Four of the eight have a horizontal symmetry score of 32; the others have horizontal
symmetry 24. John Conway found these solutions by hand in 1964 and conjectured that
they were maximally symmetric overall. But that honor belongs uniquely to the solution
in Figure 8(f), at least by my definition, because Figure 8(f) has horizontal symmetry 52
and vertical symmetry 27. The only other ways to achieve horizontal symmetry 52 have
vertical symmetry scores of 20, 22, and 24. (Two of those other ways do, however, have
the surprising property that 13 of their 19 pieces are unchanged by horizontal reflection;
this is symmetry of entire pieces, not just of edges.)
After I had done this enumeration, I read Guy’s paper [18] for the first time and learned
that Marc M. Paulhus had already enumerated all solutions in May 1996 [31]. Good, our
independent computations would confirm the results. But no—my program found 124,519
solutions, while his had found 124,518! He reran his program in 1999 and now we agree.
[Figure 8, panels recovered in part: (a) hsym = 32, vsym = 50; 11,447 solutions, 20,737,702
nodes, 10,315,775,812 updates. (b) hsym = 51, vsym = 22; 7,549 solutions, 24,597,239 nodes,
12,639,698,345 updates. (c) hsym = 48, vsym = 30; 6,675 solutions, 17,277,362 nodes,
8,976,245,858 updates. Panels (d)–(g) are not recoverable.]
O’Beirne [29] also suggested an analogous problem for pentominoes, since there are
18 one-sided pentominoes. He asked if they can be put into a 9 × 10 rectangle, and
Golomb provided an example in [15, Chapter 6]. Jenifer Leech wrote a program to prove
that there are exactly 46 different ways to pack the one-sided pentominoes in a 3 × 30
rectangle; see [26]. Figure 9 shows a maximally symmetric example (which isn’t really
very symmetrical).
I set out to count the solutions to the 9 × 10 problem, figuring that an 18-stage exact cover
problem with six 1s per row would be simpler than a 19-stage problem with seven 1s per
row. But I soon found that the task would be hopeless, unless I invented a much better
algorithm. The Monte Carlo estimation procedure of [24] suggests that about 19 quadrillion
updates will be needed, with 64 trillion nodes in the search trees. If that estimate is correct,
I could have the result in a few months; but I’d rather try for a new Mersenne prime.
I do, however, have a conjecture about the solution that will have maximum horizontal
symmetry; see Figure 10.
A failed experiment. Special arguments based on “coloring” often give important in-
sights into tiling problems. For example, it is well known [5, pages 142 and 394] that if we
remove two cells from opposite corners of a chessboard, there is no way to cover the remain-
ing 62 cells with dominoes. The reason is that the mutilated chessboard has, say, 32 white
cells and 30 black cells, but each individual domino covers one cell of each color. If we
present such a covering problem to algorithm DLX, it makes 4,780,846 updates (and finds
13,922 ways to place 30 of the 31 dominoes) before concluding that there is no solution.
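Both the coloring argument and a brute-force confirmation are easy to check by computer. A minimal Python sketch with hypothetical helper names: color_counts reproduces the 32/30 split for the 8 × 8 board minus two opposite corners (index 0 counts cells with i + j even, the color of both removed corners). count_tilings is ordinary backtracking, not algorithm DLX. It is run on a 4 × 4 board so that it finishes instantly: the full 4 × 4 board has 36 domino tilings, and the mutilated one has none.

```python
def color_counts(n, removed=()):
    """Cells with i + j even and with i + j odd on an n x n board minus `removed`."""
    counts = [0, 0]
    for i in range(n):
        for j in range(n):
            if (i, j) not in removed:
                counts[(i + j) % 2] += 1
    return counts


def count_tilings(n, removed=()):
    """Count domino tilings by plain backtracking: always fill the first empty cell."""
    filled = [[(i, j) in removed for j in range(n)] for i in range(n)]

    def first_empty():
        for i in range(n):
            for j in range(n):
                if not filled[i][j]:
                    return i, j
        return None

    def go():
        cell = first_empty()
        if cell is None:
            return 1
        i, j = cell
        total = 0
        for a, b in ((i, j + 1), (i + 1, j)):  # horizontal, then vertical domino
            if a < n and b < n and not filled[a][b]:
                filled[i][j] = filled[a][b] = True
                total += go()
                filled[i][j] = filled[a][b] = False
        return total

    return go()


corners8 = ((0, 0), (7, 7))
print(color_counts(8, corners8))  # [30, 32]
print(count_tilings(4), count_tilings(4, ((0, 0), (3, 3))))  # 36 0
```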
The cells of the hexiamond-hexagon problem can be colored black and white in a
similar fashion: All triangles that point left are black, say, and all that point right are
white. Then fifteen of the one-sided hexiamonds cover three triangles of each color; but
the remaining four, namely the “sphinx” and the “yacht” and their mirror images, each
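have a four-to-two color bias. Since the hexagon contains equally many triangles of each
color, these biases must cancel; therefore every solution to the problem must put exactly
two of those four pieces into positions that favor black, and the other two into positions
that favor white.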
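Assuming, as the argument requires, that the hexagon has 57 black and 57 white triangles, the possible color patterns of the four biased pieces can be enumerated directly. In this Python sketch the piece labels are only illustrative. It finds that exactly six patterns keep the colors balanced, namely those with two pieces favoring black, which matches the six subproblems described next.

```python
from itertools import product

# The four biased pieces; the labels are illustrative
BIASED = ("sphinx", "mirror sphinx", "yacht", "mirror yacht")


def balanced_choices(unbiased=15):
    """Orientations (True = favors black) of the four biased pieces for which
    the black and white triangle counts come out equal."""
    good = []
    for favors_black in product((True, False), repeat=len(BIASED)):
        black = 3 * unbiased + sum(4 if f else 2 for f in favors_black)
        white = 3 * unbiased + sum(2 if f else 4 for f in favors_black)
        if black == white:
            good.append(favors_black)
    return good


choices = balanced_choices()
print(len(choices))  # 6
for c in choices:
    print([name for name, f in zip(BIASED, c) if f])
```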
I thought I’d speed things up by dividing the problem into six subproblems, one
for each way to choose the two pieces that will favor black. Each of the subproblems was
expected to have about 1/6 as many solutions as the overall problem, and each subproblem
was simpler because it gave four of the pieces only half as many options as before. Thus
I expected the subproblems to run up to 16 times as fast as the original problem, and I
expected the extra information about impossible correlations of piece placement to help
algorithm DLX make intelligent choices.
But this turned out to be a case where mathematics gave me bad advice. The overall
problem had 6675 solutions and required 8,976,245,858 updates (Figure 8(c)). The six
subproblems turned out to have respectively 955, 1208, 1164, 1106, 1272, and 970 solutions,
roughly as expected; but they each required between 1.7 and 2.2 billion updates, and the
total work to solve all six subproblems was 11,519,571,784 updates, about 28% more than the
undivided problem needed. So much for that bright idea.
Applications to tetrasticks. Instead of making pieces by joining squares or triangles
together, Brian Barwell [3] considered making them from line segments or sticks. He
called the resulting objects polysticks, and noted that there are 2 disticks, 5 tristicks, and
16 tetrasticks. The tetrasticks are especially interesting from a recreational standpoint; I
received an attractive puzzle in 1993 that was equivalent to placing ten of the tetrasticks
in a 4 × 4 square [1], and I spent many hours trying to psych it out.
Barwell proved that the sixteen tetrasticks cannot be assembled into any symmetrical
shape. But by leaving out any one of the five tetrasticks that have an excess of horizontal
or vertical line segments, he found ways to fill a 5 × 5 square. (See Figure 11.) Such puzzles
are quite difficult to do by hand, and he had found only five solutions at the time he wrote
his paper; he conjectured that fewer than a hundred solutions would actually exist. (The
set of all solutions was first found by Wiezorke and Haubrich [37], who invented the puzzle
independently after seeing [1].)
Polysticks introduce a new feature that is not present in the polyomino and polyiamond
problems: The pieces must not cross each other. For example, Figure 12 shows a
non-solution to the problem considered in Figure 11(c). Every line segment in the grid of
5 × 5 squares is covered, but the ‘V’ tetrastick crosses the ‘Z’.
We can handle this extra complication by generalizing the exact cover problem. Instead
of requiring all columns of a given 0-1 matrix to be covered by disjoint rows, we
will distinguish two kinds of columns: primary and secondary. The generalized problem
asks for a set of rows that covers every primary column exactly once and every secondary
column at most once.
The tetrastick problem of Figure 11(c) can be set up as a generalized cover problem
in a natural way. First we introduce primary columns F, H, I, J, N, O, P, R, S, U, V,
[Figure 11, panel statistics: 607 solutions, 2,681,188 nodes, 611,043,121 updates;
530 solutions, 3,304,039 nodes, 760,578,623 updates; 204 solutions, 1,779,356 nodes,
425,625,417 updates.]
For example, the two rows corresponding to the placement of V and Z in Figure 12
are
V H23 I33 H33 V43 I44 V44
Z H24 V33 I33 V32 H32
The common interior point I33 means that these rows cross each other. On the other hand,
I33 is not a primary column, because we do not necessarily need to cover it. The solution
in Figure 11(c) covers only the interior points I14, I21, I32, and I41.
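The generalized condition is easy to state as a checker. Here is a minimal Python sketch, where is_generalized_cover is a hypothetical name, applied to the two rows for V and Z quoted above. For the purpose of the check it treats the piece and segment columns in those rows as primary and the interior points I33 and I44 as secondary. The full column list of the Figure 11(c) problem is not reproduced here.

```python
from collections import Counter


def is_generalized_cover(chosen_rows, primary, secondary):
    """True if the chosen rows hit every primary column exactly once
    and every secondary column at most once."""
    hits = Counter(col for row in chosen_rows for col in row)
    return (all(hits[p] == 1 for p in primary)
            and all(hits[s] <= 1 for s in secondary))


V = "V H23 I33 H33 V43 I44 V44".split()
Z = "Z H24 V33 I33 V32 H32".split()
secondary = {"I33", "I44"}  # the interior points that occur in these two rows
primary = set(V + Z) - secondary  # piece names and line segments
print(Counter(V + Z)["I33"])  # 2
print(is_generalized_cover([V, Z], primary, secondary))  # False: V and Z cross
```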
Fortunately, we can solve the generalized cover problem by using almost the same
algorithm as before. The only difference is that we initialize the data structure by making
a circular list of the column headers for the primary columns only. The header for each
secondary column should have L and R fields that simply point to itself. The remainder
of the algorithm proceeds exactly as before, so we will still call it algorithm DLX.
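The change really is that small. The Python sketch below is illustrative, and its names are hypothetical. It follows the record layout described earlier in the paper, and build links only the primary headers into the root's list. A secondary header keeps L and R pointing to itself, so the header assignments in cover and uncover leave it unchanged, while its rows are still removed and restored. In the toy problem, column s may be left uncovered when it is secondary, which admits one more solution than when it is primary.

```python
class Node:
    """A data object (with its row index) or, when C is itself, a column object."""

    def __init__(self, column=None, row=None, name=None):
        self.L = self.R = self.U = self.D = self
        self.C = column if column is not None else self
        self.row, self.S, self.N = row, 0, name


def build(primary, secondary, rows):
    """Only primary headers join h's list; secondary headers stay self-linked."""
    h, cols = Node(name="h"), {}
    for name in primary:
        c = cols[name] = Node(name=name)
        c.L, c.R = h.L, h
        h.L.R = c
        h.L = c
    for name in secondary:
        cols[name] = Node(name=name)  # L[c] = R[c] = c
    for index, names in enumerate(rows):
        first = None
        for name in names:
            c = cols[name]
            x = Node(c, index)
            x.U, x.D = c.U, c  # bottom of column c
            c.U.D = x
            c.U = x
            c.S += 1
            if first is None:
                first = x
            else:  # right end of the row
                x.L, x.R = first.L, first
                first.L.R = x
                first.L = x
    return h


def cover(c):
    c.R.L = c.L  # for a self-linked secondary header these two lines change nothing
    c.L.R = c.R
    i = c.D
    while i is not c:
        j = i.R
        while j is not i:
            j.D.U = j.U
            j.U.D = j.D
            j.C.S -= 1
            j = j.R
        i = i.D


def uncover(c):
    i = c.U
    while i is not c:
        j = i.L
        while j is not i:
            j.C.S += 1
            j.D.U = j
            j.U.D = j
            j = j.L
        i = i.U
    c.R.L = c
    c.L.R = c


def solve(primary, secondary, rows):
    """All generalized covers, each given as a sorted list of row indices."""
    h = build(primary, secondary, rows)
    chosen, solutions = [], []

    def search():
        if h.R is h:  # every primary column is covered
            solutions.append(sorted(r.row for r in chosen))
            return
        c, s, j = h.R, float("inf"), h.R
        while j is not h:  # S heuristic, over primary columns only
            if j.S < s:
                c, s = j, j.S
            j = j.R
        cover(c)
        r = c.D
        while r is not c:
            chosen.append(r)
            j = r.R
            while j is not r:
                cover(j.C)
                j = j.R
            search()
            j = r.L
            while j is not r:
                uncover(j.C)
                j = j.L
            chosen.pop()
            r = r.D
        uncover(c)

    search()
    return solutions


ROWS = [["p", "s"], ["q", "s"], ["p"], ["q"]]
print(solve(["p", "q"], ["s"], ROWS))  # s secondary: [[0, 3], [1, 2], [2, 3]]
print(solve(["p", "q", "s"], [], ROWS))  # s primary: [[0, 3], [1, 2]]
```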
A generalized cover problem can be converted to an equivalent exact cover problem
if we simply append one row for each secondary column, containing a single 1 in that
column. But we are better off working with the generalized problem, because the generalized
algorithm is simpler and faster.
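The equivalence can be checked by brute force on a toy problem. In this Python sketch, whose function names are hypothetical and which takes exponential time, each generalized cover corresponds to exactly one exact cover: the one obtained by adding the singleton rows of the secondary columns it leaves uncovered. So both counts agree.

```python
from itertools import combinations


def to_exact_cover(rows, secondary):
    """Append one row with a single 1 for each secondary column."""
    return [list(row) for row in rows] + [[s] for s in secondary]


def subsets(rows):
    for k in range(len(rows) + 1):
        yield from combinations(rows, k)


def count_generalized(primary, secondary, rows):
    total = 0
    for subset in subsets(rows):
        hits = [col for row in subset for col in row]
        if (all(hits.count(p) == 1 for p in primary)
                and all(hits.count(s) <= 1 for s in secondary)):
            total += 1
    return total


def count_exact(columns, rows):
    return sum(
        sorted(col for row in subset for col in row) == sorted(columns)
        for subset in subsets(rows)
    )


ROWS = [["p", "s"], ["q", "s"], ["p"], ["q"]]
print(count_generalized(["p", "q"], ["s"], ROWS))  # 3
print(count_exact(["p", "q", "s"], to_exact_cover(ROWS, ["s"])))  # 3
```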
I decided to experiment with the subset of welded tetrasticks, namely those that do not
form a simple connected path because they contain junction points: F, H, R, T, X, Y. There
are ten one-sided welded tetrasticks if we add the mirror images of the unsymmetrical pieces
as we did for one-sided hexiamonds and pentominoes. And—aha—these ten tetrasticks can
be arranged in a 4 × 4 grid. (See Figure 13.) Only three solutions are possible, including
the two perfectly symmetric solutions shown. I’ve decided not to show the third solution,
which has the X piece in the middle, because I want readers to have the pleasure of finding
it for themselves.
There are fifteen one-sided unwelded tetrasticks, and I thought they would surely fit
into a 5 × 5 grid in a similar way; but this turned out to be impossible. The reason is that
if, say, piece I is placed vertically, four of the six pieces J, J′, L, L′, N, N′ must be placed
to favor the horizontal direction, and this severely limits the possibilities. In fact, I have
been unable to pack those fifteen pieces into any simple symmetrical shape; my best effort
so far is the “oboe” shown in Figure 14.
Figure 15. Do all 25 one-sided tetrasticks
fit in this shape?
I also tried unsuccessfully to pack all 25 of the one-sided tetrasticks into the Aztec
diamond pattern of Figure 15; but I see no way to prove that a solution is impossible. An
exhaustive search seems out of the question at the present time.
Applications to queens. Now we can return to the problem that led Hitotumatu and
Noshita to introduce dancing links in the first place, namely the N queens problem,
because that problem is actually a special case of the generalized cover problem in the
previous section. For example, the 4 queens problem is just the task of covering eight
primary columns (R0, R1, R2, R3, F0, F1, F2, F3) corresponding to ranks and files, while
using at most one element in each of the secondary columns (A0, A1, A2, A3, A4, A5, A6,
B0, B1, B2, B3, B4, B5, B6) corresponding to diagonals, given the sixteen rows
R0 F0 A0 B3
R0 F1 A1 B4
R0 F2 A2 B5
R0 F3 A3 B6
R1 F0 A1 B2
R1 F1 A2 B3
R1 F2 A3 B4
R1 F3 A4 B5
R2 F0 A2 B1
R2 F1 A3 B2
R2 F2 A4 B3
R2 F3 A5 B4
R3 F0 A3 B0
R3 F1 A4 B1
R3 F2 A5 B2
R3 F3 A6 B3 .
In general, the rows of the 0-1 matrix for the N queens problem are
Ri Fj A(i + j) B(N − 1 − i + j)
for 0 ≤ i, j < N . (Here Ri and Fj represent ranks and files of a chessboard; Ak and Bℓ
represent diagonals and reverse diagonals. The secondary columns A0, A(2N − 2), B0, and
B(2N − 2) each arise in only one row of the matrix so they can be omitted.)
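The rows follow a simple formula, so they are easy to generate. A short Python sketch, where queens_rows and its omit_unused flag are hypothetical names, prints the sixteen rows listed above for N = 4. It can optionally drop the four secondary columns that occur in only one row.

```python
def queens_rows(n, omit_unused=False):
    """Rows Ri Fj A(i + j) B(n - 1 - i + j) for 0 <= i, j < n.

    With omit_unused=True, drop the secondary columns A0, A(2n - 2), B0 and
    B(2n - 2), each of which occurs in only one row.
    """
    unused = {"A0", f"A{2 * n - 2}", "B0", f"B{2 * n - 2}"} if omit_unused else set()
    rows = []
    for i in range(n):
        for j in range(n):
            row = [f"R{i}", f"F{j}", f"A{i + j}", f"B{n - 1 - i + j}"]
            rows.append([col for col in row if col not in unused])
    return rows


for row in queens_rows(4):
    print(" ".join(row))
```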
When we apply algorithm DLX to this generalized cover problem, it behaves quite
differently from the traditional algorithms for the N queens problem, because it branches
sometimes on different ways to occupy a rank of the chessboard and sometimes on different
ways to occupy a file. Furthermore, we gain efficiency by paying attention to the order in
which primary columns of the cover problem are considered when those columns all have
the same S value (the same branching factor): It is better to place queens near the middle
of the board first, because central positions rule out more possibilities for later placements.
Consider, for example, the eight queens problem. Figure 16(a) shows an empty board,
with 8 possible ways to occupy each rank and each file. Suppose we decide to place a queen
in R4 and F7, as shown in Figure 16(b). Then there are five ways to cover F4; after choosing
R5 and F4, Figure 16(c), there are four ways to cover R3, and so on. At each stage we
choose the most constrained rank or file, using the “organ pipe ordering”
R4 F4 R3 F3 R5 F5 R2 F2 R6 F6 R1 F1 R7 F7 R0 F0
to break ties. Placing a queen in R2 and F3 after Figure 16(d) makes it impossible to
cover F2, so backtracking will occur even though only four queens have been tentatively
placed.
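One way to produce the organ-pipe ordering is to start at the middle index and alternate downward and upward. In this Python sketch the function names are hypothetical. Linking the headers in the returned order means that the choose-column loop, which replaces its candidate only on a strict <, prefers the most central rank or file among those with the fewest options. For N = 16 it starts R8 F8 R7 F7 R9 F9 and ends R0 F0, as in the ordering quoted later in this section.

```python
def organ_pipe(n):
    """Indices from the middle outward: n//2, n//2 - 1, n//2 + 1, n//2 - 2, ..."""
    m = n // 2
    return [m + d // 2 if d % 2 == 0 else m - (d + 1) // 2 for d in range(n)]


def header_order(n):
    """Column headers in organ-pipe order, each rank followed by its file."""
    return [f"{kind}{i}" for i in organ_pipe(n) for kind in ("R", "F")]


print(" ".join(header_order(8)))
# R4 F4 R3 F3 R5 F5 R2 F2 R6 F6 R1 F1 R7 F7 R0 F0
```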
Figure 16. Solving the 8 queens problem by treating ranks and files symmetrically.
[Board diagrams (a)–(d), which mark the attacked squares and the number of remaining
options for each rank and file, are not reproduced.]
The order in which header nodes are linked together at the start of algorithm DLX can
have a significant effect on the running time. For example, experiments on the 16 queens
problem show that the search tree has 312,512,659 nodes and requires 5,801,583,789
updates, if the ordering R0 R1 . . . R15 F0 F1 . . . F15 is used, while the organ-pipe ordering
R8 F8 R7 F7 R9 F9 . . . R0 F0 requires only about 54% as many updates. On the other
hand, the order in which individual elements of a row or column are linked together has
no effect on the algorithm’s total running time.
Here are some statistics observed when algorithm DLX solved small cases of the
N queens problem using organ-pipe order, without reducing the number of solutions by
taking symmetries of the board into account:
Here “R-nodes” and “R-Updates” refer to the results when we consider only R0, R1, . . . ,
R(N − 1) to be primary columns that need to be covered; columns Fj are secondary. In
this case the algorithm reduces to the usual procedure in which branching occurs only on
ranks of the chessboard. The advantage of mixing rows with columns becomes evident as
N increases, but I’m not sure whether the ratio of R-Updates to Updates will be unbounded
or approach a limit as N goes to infinity.
I should point out that special methods are known for counting the number of solutions
to the N queens problem without actually generating the queen placements [33].
Concluding remarks. Algorithm DLX, which uses dancing links to implement the “natural”
algorithm for exact cover problems, is an effective way to enumerate all solutions
to such problems. On small cases it is nearly as fast as algorithms that have been tuned
to solve particular classes of problems, like pentomino packing or the N queens problem,
where geometric structure can be exploited. On large cases it appears to run even faster
than those special-purpose algorithms, because of its ordering heuristic. And as computers
get faster and faster, we are of course tackling larger and larger cases all the time.
In this paper I have used the exact cover problem to illustrate the versatility of dancing
links, but I could have chosen many other backtrack applications in which the same ideas
apply. For example, the approach works nicely with the Waltz filtering algorithm [36];
perhaps this fact has subliminally influenced my choice of names. I recently used dancing
links together with a dictionary of about 600 common three-letter words of English to find
word squares such as
References
[1] 845 Combinations Puzzles: 845 Interestingly Combinations (Taiwan: R.O.C. Patent
66009). [There is no indication of the author or manufacturer. This puzzle, which
is available from www.puzzletts.com, actually has only 83 solutions. It carries a
Chinese title, “Dr. Dragon’s Intelligence Profit System.”]
[2] Harry Barris, Mississippi Mud (New York: Shapiro, Bernstein & Co., 1927).
[3] Brian R. Barwell, “Polysticks,” Journal of Recreational Mathematics 22 (1990), 165–
175.
[4] Elwyn R. Berlekamp, John H. Conway, and Richard K. Guy, Winning Ways for Your
Mathematical Plays 2 (London: Academic Press, 1982).
[5] Max Black, Critical Thinking (Englewood Cliffs, New Jersey: Prentice–Hall, 1946).
[Does anybody know of an earlier reference for the problem of the “mutilated chessboard”?]
[6] Ole-Johan Dahl, Edsger W. Dijkstra, and C. A. R. Hoare, Structured Programming
(London: Academic Press, 1972).
[7] N. G. de Bruijn, personal communication (9 September 1999): “. . . it was almost my
first activity in programming that I got all 2339 solutions of the 6 × 10 pentomino on
an IBM1620 in March 1963 in 18 hours. It had to cope with the limited memory of
that machine, and there was not the slightest possibility to store the full matrix . . .
But I could speed the matter up by having a very long program, and that one was
generated by means of another program.”
[8] N. G. de Bruijn, “Programmeren van de pentomino puzzle,” Euclides 47 (1971/72),
90–104.
[9] Henry Ernest Dudeney, “74.—The broken chessboard,” in The Canterbury Puzzles,
(London: William Heinemann, 1907), 90–92, 174–175.
[10] John G. Fletcher, “A program to solve the pentomino problem by the recursive use
of macros,” Communications of the ACM 8 (1965), 621–623.
[11] Robert W. Floyd, “Nondeterministic algorithms,” Journal of the ACM 14 (1967),
636–644.
[12] Martin Gardner, “Mathematical games: More about complex dominoes, plus the
answers to last month’s puzzles,” Scientific American 197, 6 (December 1957), 126–
140.
[13] Michael R. Garey and David S. Johnson, Computers and Intractability (San Francisco:
Freeman, 1979).
[14] Solomon W. Golomb, “Checkerboards and polyominoes,” American Mathematical
Monthly 61 (1954), 675–682.
[15] Solomon W. Golomb, Polyominoes, second edition (Princeton, New Jersey: Princeton
University Press, 1994).
[16] Solomon W. Golomb and Leonard D. Baumert, “Backtrack programming,” Journal
of the ACM 12 (1965), 516–524.
[17] Richard K. Guy, “Some mathematical recreations,” Nabla (Bulletin of the Malayan
Mathematical Society) 7 (1960), 97–106, 144–153.
[18] Richard K. Guy, “O’Beirne’s Hexiamond,” in The Mathemagician and Pied Puzzler,
edited by Elwyn Berlekamp and Tom Rodgers (Natick, Massachusetts: A. K. Peters,
1999), 85–96.
[19] Robert M. Haralick and Gordon L. Elliott, “Increasing tree search efficiency for
constraint satisfaction problems,” Artificial Intelligence 14 (1980), 263–313.
[20] Jenifer Haselgrove, “Packing a square with Y-pentominoes,” Journal of Recreational
Mathematics 7 (1974), 229.
[21] C. B. and Jenifer Haselgrove, “A computer program for pentominoes,” Eureka 23, 2
(Cambridge, England: The Archimedeans, October 1960), 16–18.
[22] Hirosi Hitotumatu and Kohei Noshita, “A technique for implementing backtrack
algorithms and its application,” Information Processing Letters 8 (1979), 174–175.
[23] George P. Jelliss, “Unwelded polysticks,” Journal of Recreational Mathematics 29
(1998), 140–142.
[24] Donald E. Knuth, “Estimating the efficiency of backtrack programs,” Mathematics of
Computation 29 (1975), 121–136.
[25] Donald E. Knuth, TEX: The Program (Reading, Massachusetts: Addison–Wesley,
1986).
[26] Jean Meeus, “Some polyomino and polyamond problems,” Journal of Recreational
Mathematics 6 (1973), 215–220.
[27] N. Metropolis and J. Worlton, “A trilogy of errors in the history of computing,”
Annals of the History of Computing 2 (1980), 49–59.
[28] T. H. O’Beirne, “Puzzles and Paradoxes 43: Pell’s equation in two popular problems,”
New Scientist 12 (1961), 260–261.
[29] T. H. O’Beirne, “Puzzles and Paradoxes 44: Pentominoes and hexiamonds,” New
Scientist 12 (1961), 316–317. [“So far as I know, hexiamond has not yet been put
through the mill on a computer; but this could doubtless be done.”]
[30] T. H. O’Beirne, “Puzzles and Paradoxes 45: Some hexiamond solutions: and an
introduction to a set of 25 remarkable points,” New Scientist 12 (1961), 379–380.
[31] Marc Paulhus, “Hexiamond Homepage,” http://www.math.ucalgary.ca/~paulhusm/hexiamond1.
[32] J. E. Reeve and J. A. Tyrell, “Maestro puzzles,” The Mathematical Gazette 45 (1961),
97–99.
[33] Igor Rivin, Ilan Vardi, and Paul Zimmermann, “The n-queens problem,” American
Mathematical Monthly 101 (1994), 629–639.
[34] Dana S. Scott, “Programming a combinatorial puzzle,” Technical Report No. 1 (Princeton,
New Jersey: Princeton University Department of Electrical Engineering, 10 June
1958), ii + 14 + 5 pages. [From page 10: “. . . the main problem in the program was
to handle several lists of indices that were continually being modified.”]
[35] P. J. Torbijn, “Polyiamonds,” Journal of Recreational Mathematics 2 (1969), 216–227.
[36] David Waltz, “Understanding line drawings of scenes with shadows,” in The Psychology
of Computer Vision, edited by P. Winston (New York: McGraw–Hill, 1975),
19–91.
[37] Bernhard Wiezorke and Jacques Haubrich, “Dr. Dragon’s polycons,” Cubism For Fun
33 (February 1994), 6–7.
Addendum. During November, 1999, Alfred Wassermann of Universität Bayreuth succeeded
in covering the Aztec diamond of Figure 15 with one-sided tetrasticks, using a
cluster of workstations running algorithm DLX. The 107 possible solutions, which are
quite beautiful, have been posted at http://did.mat.uni-bayreuth.de/wassermann/.
He subsequently enumerated the 10,440,433 solutions to the 9 × 10 one-sided pentomino
problem; many of these turn out to be more symmetric than the one in Figure 10.
Computer Science Department
Mathematical Writing
by
Donald E. Knuth, Tracy Larrabee, and Paul M. Roberts
This report is based on a course of the same name given at Stanford University during
autumn quarter, 1987. Here's the catalog description:
CS 209. Mathematical Writing. Issues of technical writing and the effective
presentation of mathematics and computer science. Preparation of theses,
papers, books, and "literate" computer programs. A term paper on a topic of
your choice; this paper may be used for credit in another course.
The first three lectures were a "minicourse" that summarized the basics. About two
hundred people attended those three sessions, which were devoted primarily to a discussion
of the points in §1 of this report. An exercise (§2) and a suggested solution (§3) were also
part of the minicourse.
The remaining 28 lectures covered these and other issues in depth. We saw many
examples of "before" and "after" from manuscripts in progress. We learned how to avoid
excessive subscripts and superscripts. We discussed the documentation of algorithms, computer
programs, and user manuals. We considered the process of refereeing and editing.
We studied how to make effective diagrams and tables, and how to find appropriate quotations
to spice up a text. Some of the material overlapped with what would be discussed
in writing classes offered by the English department, but the vast majority of the lectures
were devoted to issues that are specific to mathematics and/or computer science.
Guest lectures by Herb Wilf (University of Pennsylvania), Jeff Ullman (Stanford),
Leslie Lamport (Digital Equipment Corporation), Nils Nilsson (Stanford), Mary-Claire
van Leunen (Digital Equipment Corporation), Rosalie Stemer (San Francisco Chronicle),
and Paul Halmos (University of Santa Clara) were a special highlight, as each of these
outstanding authors presented their own perspectives on the problems of mathematical
communication.
This report contains transcripts of the lectures and copies of various handouts that
were distributed during the quarter. We think the course was able to clarify a surprisingly
large number of issues that play an important part in the life of every professional who
works in mathematical fields. Therefore we hope that people who were unable to attend
the course might still benefit from it, by reading this summary of what transpired.
The authors wish to thank Phyllis Winkler for the first-rate technical typing that
made these notes possible.
Caveat: These are transcripts of lectures, not a polished set of essays on the subject.
Some of the later lectures refer to mistakes in the notes of earlier lectures; we have decided
to correct some (but not all) of those mistakes before printing this report. References to
such no-longer-existent blunders might be hard to understand. Understand?
Videotapes of the class sessions are kept in the Mathematical & Computer Sciences
Library at Stanford.
The preparation of this report was supported in part by NSF grant CCR-8610181.
3. Don't use the symbols ∴, ⇒, ∀, ∃, ∄; replace them by the corresponding words.
(Except in works on logic, of course.)
4. The statement just preceding a theorem, algorithm, etc., should be a complete
sentence or should end with a colon.
Bad: We now have the following
Theorem. H(x) is continuous.
This is bad on three counts, including rule 2. It should be rewritten, for example, like
this:
Good: We can now prove the following result.
Theorem. The function H(x) defined in (5) is continuous.
Even better would be to replace the first sentence by a more suggestive motivation,
tying the theorem up with the previous discussion.
(like lowercase letters for elements of sets and uppercase for sets) are also useful.
15. Don't get carried away by subscripts, especially when dealing with a set that doesn't
need to be indexed; set element notation can be used to avoid subscripted subscripts.
For example, it is often troublesome to start out with a definition like "Let
$X = \{x_1, \ldots, x_n\}$" if you're going to need subsets of X, since the subset will have to be
defined as $\{x_{i_1}, \ldots, x_{i_m}\}$, say. Also you'll need to be speaking of elements $x_i$ and $x_j$
all the time. Don't name the elements of X unless necessary. Then you can refer to
elements x and y of X in your subsequent discussion, without needing subscripts; or
you can refer to $x_1$ and $x_2$ as specified elements of X.
16. Display important formulas on a line by themselves. If you need to refer to some of
these formulas from remote parts of the text, give reference numbers to all of the most
important ones, even if they aren't referenced.
17. Sentences should be readable from left to right without ambiguity. Bad examples:
"Smith remarked in a paper about the scarcity of data." "In the theory of rings,
groups and other algebraic structures are treated."
18. Small numbers should be spelled out when used as adjectives, but not when used as
names (i.e., when talking about numbers as numbers).
Bad: The method requires 2 passes.
Good: Method 2 is illustrated in Fig. 1; it requires 17 passes. The count was
increased by 2. The leftmost 2 in the sequence was changed to a 1.
19. Capitalize names like Theorem 1, Lemma 2, Algorithm 3, Method 4.
L(G,P) = . . .
$$u = 2^{u_2}\, 3^{u_3}\, 5^{u_5}\, 7^{u_7}\, 11^{u_{11}} \ldots = \prod_{p\ \mathrm{prime}} p^{u_p},$$
where the exponents $u_2, u_3, \ldots$ are uniquely determined nonnegative integers,
and where all but a finite number of the exponents are zero."
[The first quotation is from Carl Linderholm's neat satirical book Mathematics Made
Difficult; the second is from D. Knuth's Seminumerical Algorithms, Section 4.5.2.]
When in doubt, read The Art of Computer Programming for outstanding examples
of good style.
[That "as a joke. Humor is best used in technical writing when readers can understand
the joke only when they also understand a technical point that is being made. Here
is another example from Linderholm:
"... $0D = 0$ and $N0 = N$, which we may express by saying that 0 is
absorbing on the left and neutral on the right, like British toilet paper."
Try to restrict yourself to jokes that will not seem silly on second or third reading.
And don't overuse exclamation points!]
(1)
hence
$$c_i - c_j \ge k(b_j - b_i). \qquad (3)$$
This is impossible, since $c_i - c_j = k - 1$ is less than k, yet $b_j - b_i \ge 1$. It follows that
$(b_1, \ldots, b_n)$ must be an element of $A_n$. ∎
Note that the hypothesis $C \ne \emptyset$ is necessary in Lemma 1, for if C is empty the set
L(C, P) is also empty regardless of P.
[This was the "minor slip".]
BUT ... don't always use the first idea you think of. The proof above actually
commits another sin against mathematical exposition, namely the unnecessary use of proof
by contradiction. It would have been better to use a direct proof:
Let $(b_1, \ldots, b_n)$ be an arbitrary element of P, and let i and j be fixed subscripts with
$i < j$; we wish to prove that $b_i \ge b_j$. Since C is nonempty, it contains some element
$(c_1, \ldots, c_n)$. Now the vector $(c_1, \ldots, c_n) + k(b_1, \ldots, b_n)$ is an element of L(C, P) for all
$k \ge 0$, and by hypothesis it must therefore be an element of $A_n$. But this means that
$c_i + kb_i \ge c_j + kb_j$, i.e.,
$$c_i - c_j \ge k(b_j - b_i), \qquad (3)$$
for arbitrarily large k. Consequently $b_j - b_i$ must be zero or negative.
We have proved that $b_j - b_i \le 0$ for all $i < j$, so the vector $(b_1, \ldots, b_n)$ must be an
element of $A_n$. ∎
This form of the proof has other virtues too: It doesn't assume that the $b_i$'s are
integer-valued, and it doesn't require stating that $c_1 \ge \cdots \ge c_n$.
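A small numeric instance may make the role of inequality (3) concrete. The numbers below are hypothetical, and the sketch assumes, as the direct proof implies, that $A_n$ is the set of vectors with $a_1 \ge a_2 \ge \cdots \ge a_n$; take $n = 2$, $i = 1$, $j = 2$, an element c of C, and an element b of P with $b_1 < b_2$:

```latex
% Hypothetical values; A_2 is assumed to be {(a_1, a_2) : a_1 >= a_2}.
\begin{aligned}
c &= (3, 1), \qquad b = (0, 2), \qquad b_2 - b_1 = 2 \ge 1,\\
k &= (c_1 - c_2) + 1 = 3,\\
c + kb &= (3, 1) + 3\,(0, 2) = (3, 7) \notin A_2,\\
c_1 - c_2 &= 2 < 6 = k\,(b_2 - b_1).
\end{aligned}
```

Here $k(b_2 - b_1) = 2k$ exceeds $c_1 - c_2 = 2$ for every $k \ge 2$, so (3) fails for all large k; that failure is what both versions of the proof exploit.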
In certain instances, people did not understand what constitutes a proof. Fluency
in mathematics is important for Computer Science students but will not be taught
in this class.
Not all formulas are equations. Depending on the formula, the terms 'relation',
'definition', 'statement', or 'theorem' might be used.
When you use ellipses, such as $(p_1, \ldots, p_n)$, remember to put commas before and
after the three dots. When placing ellipses between commas the three dots belong
on the same level as the commas, but when the ellipsis is bracketed by symbols such
as '+' or '<' the dots should be at mid-level.
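In TEX the two dot heights are produced by different commands; a minimal sketch (the formulas are placeholders chosen for illustration):

```latex
% Dots on the baseline, between commas:
$(p_1, \ldots, p_n)$

% Dots at mid-level, between binary operators or relations:
$x_1 + \cdots + x_n$ and $a_1 < \cdots < a_n$
```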
Linebreaks in the middle of formulas are undesirable. There are ways to enforce
this with TEX (as well as other text formatting systems). People who use TEX and
wish to use the vertical bar and the empty set symbol in notation like '$\{c \mid c \in \emptyset\}$'
should be aware of the TEX commands \mid and \emptyset.
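A short fragment using the two commands named above, plus one common TEX way to keep an inline formula from being broken across lines; the set C and the formula a + b = c are placeholders:

```latex
% \mid spaces the bar like a relation; \emptyset is the empty-set symbol.
$\{\, c \mid c \in C \,\} \ne \emptyset$

% An extra pair of braces inside the math makes the formula a
% single subformula, so TeX will not break a line inside it.
${a + b = c}$
```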
Comments such as "We demonstrate the second conclusion by contradiction" and
"There must be a witness to the unsortedness of P" are useful because they tell the
reader what is going on or bring in new and helpful vocabulary.
Numbering all displayed formulas is usually a bad idea; number the important ones
only. Extraneous parentheses can also be distracting. For example, in the phrase
"let k be $(c_i - c_j) + 1$," the parentheses should be omitted.
You can overdo the use of any good tool. For instance, you could overuse typographic
tools by having 20 different fonts in one paper.
Two more topics were touched on (and are sure to be discussed further): the use of 'I' in
technical writing, and the use of past or present tense in technical writing.
Knuth says that Mary-Claire van Leunen defends the use of 'I' in scholarly articles, but
that he disagrees (unless the identity of the author is important to the reader). Knuth
likes the "teamwork" aspect of using 'we' to represent the author and reader together. If
there are multiple authors, they can either "revel in the ambiguity" of continuing to use
'we', or they can use added disambiguating text. If one author needs to be mentioned
separately, the text can say 'one of the authors (DEK)', or 'the first author', but not 'the
senior author'.
Knuth (hereafter known as Don) recommends that one of two approaches be used with
respect to tenses of verbs: Either use present tense throughout the entire paper, or write
sequentially. Sequential writing means that you say things like, "We saw this before. We
will see this later." The sequential approach is more appropriate for lengthy papers. You
can use it even more effectively by using words of duration: "We observed this long ago.
We saw the other thing recently. We will prove something else soon."
was one of the first sentences out of Don's mouth on the topic of the exact definition
of "Mathematical Writing." He admitted that such a contest was "probably not the
right topic for this course." However, a program (presumably even an iambic pentameter
program) is among the documents that he will accept as the course term paper. He
will accept articles for professional journals, chapters of books or theses, term papers for
other courses, computer programs, user manuals or parts thereof: anything that falls into
a definite genre where you have a specific audience in mind and the technical aspect is
significant.
We spent the rest of class continuing to examine the homework assignment. In the interest
of succinct notes, I have replaced many literal phrases by their generic equivalents. For
example, I might have replaced 'A > B' by '(relation)'. This time I have divided the comments
into two sets: those dealing with what I will call "form" (parentheses, capitalization,
fonts, etc.) and those dealing with "content" (wording, sentence construction, tense, etc.).
First, the comments concerning form:
Don't overdo the use of colons. While the colon in 'Define it as follows:' is fine, the
one in 'We have: (formula)' should be omitted since the formula just completes the
sentence. Some papers had more colons than periods.
Should the first word after a colon be capitalized? Yes, if the phrase following the
colon is a full sentence; No, if it is a sentence fragment. (This is not "yet" a standard
rule, but Don has been trying it for several years and he likes it.)
While too many commas will interfere with the smooth flow of a sentence, too
few can make a sentence difficult to read. As examples, a sentence beginning with
'Therefore' does not need the comma following 'therefore'. But 'Observe that if
(symbol) is (formula) then so is (symbol) because (reasoning)' at least needs a
comma before 'because'.
Putting too many things in parentheses is a stylistic thing that can get very tiring.
(When Don moves from his original, handwritten draft to a typed, computer-stored
Among the parentheses most in need of removal are nested parentheses. To this end,
it is better to write '(Definition 2)' than '(definition (2))'. Unfortunately, however,
you can't use the former if the definition was given in displayed formula (2). Then
it's probably best to think of a way to avoid the outer parentheses altogether.
In some cases your audience may expect nested parentheses. In this case (or in any
other case when you feel you must have them), should the outer pair be changed to
brackets (or curly-braces)? This was once the prevailing convention, but it is now
not only obsolete but potentially dangerous; brackets and curly braces have semantic
content for many scientific professionals. ("The world is short of delimiters," says
Paul Halmos introduced the handy convention of placing a box at the end of a
proof; this box serves the same function as the initials 'Q.E.D.'. If you use such a
box, it seems best to leave a space between it and the final period.
Try to make it clear where new paragraphs begin. When using displayed formulas,
this can become confusing unless you are careful.
Using notational or typographic conventions can be helpful to your readers (as long
as your convention is appropriate to your audience). Boldface symbols or arrows
over your vectors are each appropriate in the correct context. When using a raised
'st' in phrases such as 'the $i+1^{st}$ component', it's better to use roman type: '$i+1^{\rm st}$'.
Then it's clear that you aren't speaking of "1 raised to the power st."
Avoid "psychologically bad" line breaks. This is subjective, but you can catch many
such awkward breaks by not letting the final symbol lie on a line separate from the
rest of its sentence. If you are using TEX, a tilde (~) in place of a space will cause the
two symbols on either side of the tilde to be tied together. (Other text processors
also have methods to disallow line breaks at specific points.)
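For instance, ties in TEX source might look like this (the sentence is an illustrative placeholder):

```latex
% Each ~ is a space at which TeX will not break the line,
% so "Theorem 1" and "set X" always stay together.
By Theorem~1, the claim holds for every set~$X$.
```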
Some of us are much better at spelling than others of us. Those of us who are not
naturally wonderful spellers should learn to use spelling checkers.
Allowing formulas to get so long that they do not format well or are unnecessarily
confusing "violates the principle of 'name and conquer' that makes mathematics
readable." For example, '$v = c + u(c_i - c_j + 1)$' should be '$v = c + ku$, where
$k = c_i - c_j + 1$', if you're going to do a lot of formula manipulation in which
$(c_i - c_j + 1)$ remains as a unit.
Be stingy with your quotation marks. "Three cute things in quotes is a little too
cute."
Remember to minimize subscripts. For example, '$p_i$ is an element of P' could more
easily be 'p is an element of P'.
Remember to capitalize words like theorem and lemma in titles like Lemma 1 and
Theorem 23.
On the other hand, parallelism should be used when it is the point of the sentence.
Now the comments involving content:
Try to make sentences easily comprehensible from left to right. For example, "We
prove that (grunt) and (snort) implies (blah)." It would be better to write "We
prove that the two conditions (grunt) and (snort) imply (blah)." Otherwise it
seems at first that (grunt) and (snort) are being proved.
While guidelines have been given for the use of the word 'that', the final placement
must be dictated by cadence and clarity. Read your words aloud to yourself.
The word 'shall' seems to be a natural word for definitions to many mathematical
readers, but it is considered formal by younger members of the audience.
Be precise in your wording. If you mean "not nonincreasing," don't say "increasing";
the former means that $p_j < p_{j+1}$ for some j, while the latter means that $p_j < p_{j+1}$ for
all j.
Avoid passive voice. (My temptation to write, "Passive voice is bad," was overwhelming.)
For example, replace "It can be shown . . ." by "A proof shows . . . ."
Mixed tenses on the same subject are awkward. For example, "We assume now
(grunt), hoping to show a contradiction," is better than, "We assume now (grunt),
and will show that this leads to a contradiction."
Many people used the ungainly phrase "Assume by contradiction that (blah)." It
is better to say, "The proof that (blah) is by contradiction," and even better to say
"To prove (grunt), let us assume the opposite and see what happens."
In general, a conversational tone giving signposts and clearly written transition
paragraphs provides for pleasant reading. One especially easy-to-read proof contained
the phrases "The operative word is zero," "The lemma is half proved," and
"We divide the proof into two parts, first proving (blah) and then proving (grunt)."
You can give relations in two ways, either saying '$p_i < p_j$' or '$p_j > p_i$'. The latter is
for "people who are into dominance," Don says, but the former is much easier for a
reader to visualize after you've just said '$P = (p_1, p_2, \ldots, p_n)$ and $i < j$'. Similarly,
don't say '$i < j$ and $p_j > p_i$'; keep i and j in the same relative position.
Don opened class by saying that up until now he has been criticizing our writing; now he
will show us what he does to his own. Perhaps apropos showing us his own writing he
quoted Dijkstra: "A good teacher will teach his students the importance of style and how
to develop their own style, not how to mimic his."
First he showed us a letter from Bob Floyd. The letter opened by saying 'Don, Please stop
using so many exclamation points!' and closed with at least five exclamation points. After
receiving this letter he looked in The Art of Computer Programming and found about two
exclamation points per page. (Among the other biographical tidbits we learned at this
class were that Don went to secretarial school, types 80 words per minute, and once knew
two kinds of shorthand.)
Don is writing a book with Oren Patashnik and Ron Graham. The book is entitled
Concrete Mathematics and is to be used for CS 260. He showed us two copies of Chapter
Five of this book: one copy he called "Before" and one he called "After".
The Before copy actually came into existence long after the work on the book began. Oren
wrote several drafts using the LaTeX book style, and then the authors availed themselves
of the services of a book designer. The designer decided how wide the text was, what
fonts were to be used, what chapter headings looked like, and a host of other things. The
designer, at the authors' request, has left room in the inner margins for "graffiti". That is,
for informal snappy comments from the peanut gallery. (This idea was "stolen" from the
booklet Approaching Stanford.)
The After copy is just another formally typeset revision of the Before copy. Neither copy
has yet been through a professional copy editor. Having now mentioned copy editors and
book designers, Don said, "In these days of author self-publishing, we must not forget the
value of professionals." (Actually, the copy editor was first mentioned when an error in
punctuation was displayed on the screen.)
Upon receiving a question from the audience concerning how many times he actually
rewrites something, Don told us (part of) his usual rewrite sequence:
His first copy is written in pencil. Some people compose at a terminal, but Don says, "The
speed at which I write by hand is almost perfectly synchronized with the speed at which
I think. I type faster than I think so I have to stop, and that interrupts the flow."
In the process of typing his handwritten copy into the computer he edits his composition
for flow, so that it will read well at normal reading speed. Somewhere around here the
text gets TEXed, but the description of this stage was tangled up with the description of
the process of rewriting the composition. Of course, rewriting does not all occur at any
one stage. As Don said, "You see things in different ways on the different passes. Some
things look good in longhand but not in type."
While discussing his own revisions, he mentioned those of two other Computer Science
authors. Nils Nilsson had at least five different formal drafts of his "Non-Monotonic
Reasoning" chapter. Tony Hoare revised the algorithm in his paper on "Communicating
Sequential Processes" more than a dozen times over the course of two years.
Don discussed the labours of the book designer and showed us specimen "page plans" and
example pages. The former are templates for the page and show the exact dimensions of
margins, paragraphs, etc. His designer also suggested a novel scheme for equations: They
are to be indented much like paragraphs rather than being centered in the traditional
way. We also saw conventions for the display of algorithms and tables. Although Don
is doing his own typesetting, he is using the services of the designer and copy editor.
These professionals are well worth their keep, he said. Economists in the audience were
not surprised to hear that the prices of books bear almost no relation to their production
costs. Hardbacks are sometimes cheaper to produce than paperbacks. For those interested
in such things, Don recommended a paperback entitled One Book Six Ways (available in
the Bookstore) that describes the entire production process by means of actual documents.
Returning to the editing of his Concrete Maths text, Don went through more of the Before
and After pages he began to show us on Monday, picking out specific examples that
illustrate points of general interest.
He exhorted writers to try to put themselves in their readers' shoes: "Ask yourself what
the reader knows and expects to see next at some point in the text." Ideally, the finished
version reads so simply and smoothly that one would never suspect that it had been rewritten
at all. For example, part of the Concrete Math draft said
(Before) The general rule is (. . .) and it is particularly valuable because (. . .).
The transformation in (5.12) is called (. . .). It is easily proved since
(. . . and . . .).
[§7. PREPARING BOOKS FOR PUBLICATION (2)
Reading this at speed and in context made it clear that readers would be hanging on their
chairs wondering why the rule was true; so we should first tell them why, before stressing
the rule's significance:
(After) The general rule is (. . .) and it is easily proved since (. . . and . . .).
[new paragraph] Identity (5.12) is particularly valuable because (. . .).
It is called (. . .).
Don's favorite dictionary was of no help on the question of 'replace with' vs. 'replace by'.
The phrase 'by replacing . . . by . . .' is bad (due to the repetition), and 'by replacing . . .
with . . .' seems worse. In this case the solution is to choose another word: 'by changing
. . . to . . .'.
As a very general rule, try reading at speed. You will often get a much better sense of the
rhythm of the sentence than you did when you wrote it.
It is a bad idea to display false equations. The reader's eye is apt to alight upon them in
the text and treat them as gospel. It is much better to put them into the text, as in "So
the equation ' . . . ' is always false!"
Be sure that the antecedent of any pronoun that you use is clear. For example, the previous
paragraph has two sentences beginning 'It is . . . '; they are fine. But sometimes such a
sentence structure is troublesome because 'it' seems to be referring to an object under
discussion. For example,
(Before) Two things about the derivation are worthy of note. First, it's a great
convenience to be summing . . . .
(After) Two things about this derivation are worthy of note. First, we see again
the great convenience of summing . . . .
Towards the end of the editing process you will need to ensure that you don't have a page
break in the middle of a displayed formula. Often you'll simply have to think up something
else to say to fill up the page, thus pushing the displayed formula entirely onto the next
page. Try to think of this as a stimulus to research!
Let proofs follow the same order as definitions, e.g., where you have to deal with several
separate cases.
Hyphens, dashes, and minus signs are distinct and should not be used interchangeably.
The shortest is the hyphen. The next is the en-dash, as in 'lines 10–18'. Longer still is the
minus sign, used in formulae: '10 − 18 = −8'. The longest of all is the em-dash, used in
sentences.
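In TEX source the four marks are typed differently; a minimal sketch reusing the examples above (the first phrase is a placeholder):

```latex
% Hyphen: a single "-" in text.
a well-known result
% En-dash: two hyphens in text.
lines 10--18
% Minus sign: "-" inside math mode.
$10 - 18 = -8$
% Em-dash: three hyphens in text.
the longest of all---used in sentences
```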
When proofreading you may catch technical errors as well as stylistic errors. Think about
the mathematics too, not just the prose. For example, the book was discussing a purported
argument that $0^0$ should be undefined "because the functions $x^0$ and $0^x$ have different
limiting values when $x \to 0$". Don revised this statement to ". . . when x decreases to 0,"
because $0^x$ is undefined when $x \to 0$ through negative values.
The formula
(Before) $$\sum_{k \le m} \binom{r}{k} \left(k - \frac{r}{2}\right) = -\frac{m+1}{2} \binom{r}{m+1}$$
looks a bit confusing because of the minus sign on the right, so Don
changed it to
(After) $$\sum_{k \le m} \binom{r}{k} \left(\frac{r}{2} - k\right) = \frac{m+1}{2} \binom{r}{m+1}.$$
There may be many ways to write a formula; you have the freedom to select the best.
(This change also propagated into the subsequent text, where a reference to 'the factor
(k − r/2)' had to be changed to 'the factor (r/2 − k)'.)
Somebody saw an integral sign on that page and asked about the relative merits of
$\int_{-\infty}^{\infty} f(x)\,dx$ and $\int\limits_{x=-\infty}^{\infty} f(x)\,dx$.
Don said that putting limits above and below, instead of at the right, traded vertical space
for horizontal space, so it depends on how wide your formulas are: Both forms are used.
We continued to examine Before and After pages from the book of which Don is a
coauthor. The following points were made in reference to changes Don decided to make.
When long formulas don't fit, try to break the lines logically. In some cases the
author can even change some of the math (perhaps by introducing a new symbol) to
make the formula placement less jarring. Such a change is best made by the author,
since the choice of how to display a complex expression is an important part of any
mathematical exposition.
Sometimes moving a formula from embedded text to one separately displayed will
allow the formula to be more logically divided. The placement of the equals sign ( = )
is different for line breaks in the middle of displayed versus embedded formulas: The
break comes after the equals sign in an embedded formula, but before the equals
sign in a display.
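One way to follow the display convention in LaTeX is the align* environment from the amsmath package, where each continuation line begins with the relation; this sketch assumes amsmath is loaded, and the sum is just a familiar placeholder identity:

```latex
% Requires \usepackage{amsmath} in the preamble.
% The continuation line begins with "=", so the break falls
% before the equals sign, as recommended for displays.
\begin{align*}
  \sum_{k=1}^{n} k &= 1 + 2 + \cdots + n\\
                   &= \frac{n(n+1)}{2}.
\end{align*}
```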
While editing for flow, sentences can be broken up by changing semicolons to periods;
or if you want the sentences to join into a quickly moving blur, you can change
periods to semicolons. Breaking existing paragraphs into smaller paragraphs can
also be helpful here.
While making such changes make sure to preserve clarity. For example, make sure
that any sentences you create that begin with conjunctions are constructed clearly,
and that words like 'it' have clear antecedents. (Sentences that begin with the word
'And' are not always evil.)
Make sure your variable names are not misleading. Variable names that are too
similar to conceptually unrelated variables can be confusing. Systematic variable
renaming is one of the advantages of text editors.
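What makes such a renaming systematic, rather than a blind search-and-replace, is matching whole words only. A small Python sketch of the idea (the helper name rename_identifier is hypothetical, and unlike a real refactoring tool it knows nothing about strings or comments):

```python
import re


def rename_identifier(source: str, old: str, new: str) -> str:
    """Replace whole-word occurrences of `old` with `new`.

    \\b marks a word boundary, so `old` embedded inside a longer
    identifier (e.g. the x in max_x) is left alone.
    """
    pattern = r"\b" + re.escape(old) + r"\b"
    # A function replacement inserts `new` literally (no backslash escapes).
    return re.sub(pattern, lambda _match: new, source)


print(rename_identifier("x := x + max_x; y := x", "x", "count"))
# count := count + max_x; y := count
```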
We noted last time that present tense is correct for facts that are still true; but it
is okay to use past tense for "facts" that have turned out to be in error.
Exercises are some of the most difficult parts of a book to write. Since an exercise
has very little context, ambiguity can be especially deadly; a bit of carefully chosen
redundancy can be especially important. For this reason, exercises are also the
hardest technical writing to translate to other languages.
Copyright law has changed, making it technically necessary to give credit to all previously published exercises. Don says that crediting sources is probably sufficient (he doesn't plan to write every person referenced in the exercises for his new book, unless the publisher insists). Tracing the history of even well-known theorems can be difficult, because mathematicians have tended to omit citations. He recently
spent four hours looking through the collected works of Lagrange trying to find the
source of "Lagrange's inequality," but he was unsuccessful. Therefore he's not too
unhappy with the new law.
We can dispense with some of our rhetorical guidelines when writing the answers to
exercises. Answers that are quick and pithy, and answers that start with a symbol, are quite acceptable.
5. Webster's New World Speller/Divider. Don said that people who don't spell well find this book to be quite useful. [I saw no indication that he actually uses it, though.]
6. Roget's Thesaurus. This book is a synonym dictionary. Don says that he owns two,
one for home and one for his Stanford office, and he uses them in many different
ways: when he knows that a word exists but has forgotten it; when he wants to
avoid repetition; when he wants to define a new technical term or a new name for
a paper or program.
The issue of British versus American dialect came up. When writing for international audiences, should we use British or American spellings and conventions? Don says he agrees with the rule that Americans should write with their own spellings and the British should do the same. The two styles should be mixed only when, say, an American writes about the 'labor of the British Labour Party'. (Readers of these classnotes will now understand why TLL and PMR spell some words differently.)
Should this course have been named "Computer Scientifical Writing" or "Informatical
Writing" rather than "Mathematical Writing"? The Computer Science Department is
offering this class, but until now we have been talking about topics that are generally of
concern to all writers who use mathematics. Today we begin to discuss topics specific to
the writing of Computer Science.
We are not abandoning mathematical concerns; Don says that a technical typist in Computer Science must know all that a Math department typist must know plus quite a bit
more. He showed us two examples where mathematical journals had trouble presenting
programs, algorithms, or concrete mathematics in papers he wrote. In order to solve the
first problem, Don had to convince the typesetters at Acta Arithmetica to create "floor"
and "ceiling" functions by carving off small pieces of the metal type for square brackets.
The second problem had to do with typographic conventions for computer programs: The American Mathematical Monthly was using different fonts for the same symbol at different points in a procedure, and was interchangeably using ':=', ': =', and '=:' to represent an assignment symbol.
Stylistic conventions for programming languages originated with Algol 60. Prior to 1960,
FORTRAN and assembly languages were displayed using all uppercase letters in variable
width fonts that did not mix letters and numbers in a pleasant manner. Fortunately, Algol's
visual presentation was treated with more care: Myrtle Kellington of ACM worked from
the beginning with Peter Naur (editor of the Algol report) to produce a set of conventions
concerning, among other things, indentation and the treatment of reserved words.
Don found the prevailing variable-width fonts unacceptable for use in the displayed computer programs in Volume 1 of The Art of Computer Programming, and he insisted that he needed fixed-width type. The publishers initially said that it wasn't possible, but they eventually found a way to mix typewriter style with roman, bold, and italic.
[20 §10. PRESENTING ALGORITHMS]
Don says he had a difficult time trying to decide how to present algorithms. He could
have used a specific programming language, but he was afraid that such a choice would
alienate people (either because they hated the language or because they had no access to
the language). So he decided to write his algorithms in English.
His Algorithms are presented rather like Theorems with labeled steps; often they have accompanying (but very high-level) flow charts (a technique he first saw in Russian literature of the 1950s). The numbered steps have parenthetical remarks that we would call
comments; after 1968 these parenthetical remarks are often invariant relations that can be
used in a formal proof of program correctness.
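To make the style concrete, here is a Python transliteration modeled on Algorithm E (Euclid's algorithm) in Volume 1: each labeled step becomes a comment, and the invariant that would appear as a parenthetical remark becomes an assertion. This is only a sketch of the presentation style; the function name euclid_gcd is our own.

```python
import math


def euclid_gcd(m: int, n: int) -> int:
    """Greatest common divisor of two positive integers m and n."""
    if m <= 0 or n <= 0:
        raise ValueError("m and n must be positive integers")
    target = math.gcd(m, n)  # used only to check the invariant below
    while True:
        # (Invariant: gcd(m, n) has not changed since the start.)
        assert math.gcd(m, n) == target
        # E1. [Find remainder.] Divide m by n and let r be the remainder.
        r = m % n
        # E2. [Is it zero?] If r = 0, the algorithm terminates; n is the answer.
        if r == 0:
            return n
        # E3. [Reduce.] Set m <- n, n <- r, and go back to step E1.
        m, n = n, r


print(euclid_gcd(119, 544))  # 17
```

Since 0 < r < n whenever step E3 is reached, n strictly decreases and the loop must terminate.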
Don has received many letters complimenting him on his approach, but he says it is not
really successful. Explaining why, he said, "People keep saying, 'I'm going to present an
algorithm in Knuth's style,' and then they completely botch it by ignoring the conventions
I think are most important. This style must just be a personal style that works for me. So get a personal style that works for you." In recent papers he has used the pidgin Algol
style introduced by Aho, Hopcroft, and Ullman; but he will not change his style for the
yet-unfinished volumes of The Art of Computer Programming because he wants to keep
the entire series consistent.
Don says that a computer program is a piece of literature. ("I look forward to the day when a Pulitzer Prize will be given for the best computer program of the year.") He says that, apart from the benefit to be gained for the readers of our programs, he finds that treating programs in this manner actually helps to make them run smoothly on the computer. ("Because you get it right when you have to think about it that way.")
He gave us a reprint of "Programming Pearls" by Jon Bentley, from Communications of
the ACM 29 (May 1986), pages 364-369, and told us we had best read it by Wednesday
since it will be an important topic of discussion. Don, who was 'guest oyster' for this
installment of "Programming Pearls," warned us that "this represents the best thing to come out of the TeX project. If you don't like it, try to conceal your opinions until this course is over."
Bentley published that article only after Don had first published the idea of "literate programming" in the British Computer Journal. (Don says that he chose the term in
hopes of making the originators of the term "structured programming" feel as guilty when
they write illiterate programs as he is made to feel when he writes unstructured programs.)
When Bentley wanted to know why Don did not publish this in America, Don said that
Americans are illiterate and wouldn't care anyway. Bentley seems to have disagreed with
at least part of that statement. (As did many of his readers: The article was so popular
that there will now be three columns a year devoted to literate programming.)
As Don began explaining the "WEB" system, he restated two previously mentioned principles: The correct way to explain a complex thing is to break it into parts and then explain each part; and things should be explained twice (formally and informally). These two principles lead naturally to programs made up of modules that begin with text (informal
But the book fails to be comprehensible by novices. It fails because, as Don says, "If you are a person who has been in the field for a long time, you don't realize when you are using jargon." However, Don says that just because the AWK book fails to meet this goal does not mean that it isn't a good book. ("Perhaps the best book in Computer Science published this year.") He explains this by saying, "If you try to write for the novice, you will communicate with the experts; otherwise you communicate with nobody."
Don opened class with the good news that Mary-Claire van Leunen has agreed to help
read the term papers and drafts thereof, despite the fact that her name was incorrectly capitalized in last week's notes.
Returning to the subject of "Literate Programming," Don said that it takes a while to find
a new style to suit a new system like WEB. When he was trying to write the WEB program
in its own language he tore up his first 25 pages of code and started again, having finally
found a comfortable style. He digressed to talk about the vicious circle involved in writing
a program in its own language. To break it, he hand-simulated the program on itself to
produce a Pascal program that could then be used to compile WEB programs. The task was
eased because there is obviously no need for error-handling routines when dealing with code that you have to debug anyway. But there is also another kind of bootstrapping going on; you can evolve a style to write these programs only by sitting down and writing programs. Don told us that he wrote WEB in just two months, as it was never intended to be a polished product like TeX.
We spent the rest of the class looking at WEB programs that had been written by undergraduates doing independent research with Don during the Spring. We saw how they had (or had not) adapted to its style. Don said that he had got a lot of feedback and sometimes found it hard to be dispassionate about stylistic questions, but that some things were clearly wrong. He showed us an example that looked for all the world just like a Pascal program; the student had obviously not changed his ways of thinking or writing at
in WEB. It consists of almost 1400 modules. The guiding principle behind WEB is that each
module is introduced at the psychologically right moment. This means that the program
can be written in such a way as to motivate the reader, leaving TANGLE to sort everything
out later on. [The TANGLE processor converts WEB programs to Pascal programs.] After
all, we don't need to worry about motivating the compiler. (Don added the aside that
contrary to superstition, the machine doesn't spend most of its time executing those parts
of the code that took us the longest to write.) It seems to be true that the best way in
which to present program constructs to the reader is to use the same order in which the
creator of the program found himself making decisions about them. Don himself always
felt it was quite clear what had to be presented next, throughout the entire composition
of this huge program. There was at all points a natural order of exposition, and it seems
that the natural orderings for reading and writing are very much the same.
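The reordering that TANGLE performs can be imitated in a few lines. The sketch below is a toy, not TANGLE itself: it assumes a hypothetical notation loosely modeled on WEB, in which a module refers to another as @<Module name@>, and it ignores macros and the many other jobs the real processor does. The modules may be listed in any order (say, the order of exposition); expansion from the root module produces the order the compiler needs.

```python
import re

# Hypothetical toy notation: a module reference is written @<Module name@>.
MODULE_REF = re.compile(r"@<(.+?)@>")


def tangle(modules: dict[str, str], root: str) -> str:
    """Expand module references recursively, starting from `root`."""

    def expand(name: str, depth: int) -> str:
        # An acyclic chain of references is shorter than the number of
        # modules, so a deeper chain must contain a cycle.
        if depth > len(modules):
            raise ValueError(f"cyclic reference involving {name!r}")
        code = modules[name]  # KeyError if a module is used but never defined
        return MODULE_REF.sub(lambda m: expand(m.group(1).strip(), depth + 1), code)

    return expand(root, 0)


# Modules written in order of exposition, not in compiler order.
modules = {
    "Program": "program demo;\n@<Global variables@>\nbegin\n@<Print the numbers@>\nend.",
    "Print the numbers": "for i := 1 to 3 do writeln(i);",
    "Global variables": "var i: integer;",
}
print(tangle(modules, "Program"))
```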
The first student hadn't used this new flexibility at all; he had essentially just used WEB to
throw in comments here and there.
A general problem of exposition arose: How are we to describe the behavior of a computer
program? Do we see the program as essentially autonomous, "running itself," or are we
participants in the action? Our attitude to this determines whether we are going to say
'we insert the element in the heap' or 'it inserts the element . . . '. Don favours 'we'; at
any rate one should be consistent.
Students used descriptors and imperatives for the names of their modules; Don said he favours the latter, as in ⟨Store the word in the dictionary⟩, which works much better than ⟨Stores the word in the dictionary⟩. On the other hand, where a module is essentially a piece of text with a declarative function (a list of declarations, say), we should use a descriptor to name it: ⟨Procedures for sorting⟩.
Incidentally, it is natural to capitalize the first letter of a module name.
One student used the identifier 'FindinNewWord3'. This looks comparatively bad in print: Uppercase letters were not designed to appear immediately following lowercase ones. Since the use of compound nouns is almost inevitable, WEB provides a neat solution. It allows a short underscore to be used to conjoin words like get_word. (Since the Pascal compiler will not accept identifiers like this, TANGLE quietly removes the underscore.) Don told us that Jim Dunlap of Digitek, who made some of the best early compilers, invariably used identifiers forty-or-so characters long. The meaning was always quite clear although no comments appeared.
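Dropping underscores has one side effect worth checking: two names that differ only in underscores (or, since standard Pascal ignores letter case, only in case) become the same Pascal identifier. A minimal sketch of such a check; the helper tangle_identifiers is hypothetical and is not how the real TANGLE is implemented.

```python
def tangle_identifiers(names: list[str]) -> tuple[dict[str, str], list[tuple[str, str]]]:
    """Map each WEB-style name to a Pascal-acceptable one by dropping
    underscores, and report pairs of names that would collide."""
    mapping: dict[str, str] = {}
    first_seen: dict[str, str] = {}  # Pascal-level key -> first name producing it
    collisions: list[tuple[str, str]] = []
    for name in names:
        pascal = name.replace("_", "")
        key = pascal.lower()  # standard Pascal identifiers are case-insensitive
        if key in first_seen and first_seen[key] != name:
            collisions.append((first_seen[key], name))
        else:
            first_seen.setdefault(key, name)
        mapping[name] = pascal
    return mapping, collisions


mapping, collisions = tangle_identifiers(["get_word", "GetWord", "find_in_new_word"])
print(mapping["find_in_new_word"])  # findinnewword
print(collisions)  # [('get_word', 'GetWord')]
```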
Each module should contain an informal but clear description of what it actually does. A
play-by-play account of an algorithm, a simple stepping through of the process, does not
qualify. We are trying to convey an intuition of what is going on, so a high-level account
is much more helpful.
We saw several modules that were much too long. Don thinks that a dozen lines of code
is about the right length for a module. Often he simply recommended that the students
⟨Punt⟩ ≡
begin snap;
place;
kick;
end
⟨Punt⟩ ≡
snap;
place;
kick
comments.
Let the variables in the module title correspond to the local parameters in the module itself.
According to this student's comments, his algorithm uses 'tail recursion'. This is
an impressive phrase, helpful in the proper context; but unfortunately that is not
the kind of recursion his program uses.
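For readers unsure of the term: a recursive call is a tail call when it is the very last action of the function, so nothing remains to be done after it returns, and a compiler can then turn the recursion into a loop. A Python sketch contrasting the two kinds (note that CPython itself does not eliminate tail calls; the loop version shows what such an optimization would produce):

```python
def length_tail(xs: list, acc: int = 0) -> int:
    """Tail-recursive: the recursive call is the very last action."""
    if not xs:
        return acc
    return length_tail(xs[1:], acc + 1)


def length_plain(xs: list) -> int:
    """Recursive but NOT tail-recursive: the addition happens after
    the recursive call returns."""
    if not xs:
        return 0
    return 1 + length_plain(xs[1:])


def length_loop(xs: list) -> int:
    """The loop that tail-call elimination would derive from length_tail."""
    acc = 0
    while xs:
        xs, acc = xs[1:], acc + 1
    return acc
```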
However, Don did grant that his exposition was good and gave a nice intuition of the
functions of the modules.
We saw a second program by the same student. It had the usual sprinkling of "wicked whiches": 'which's that should have been 'that's. The purpose of the program was to "enforce" the triangle inequality on a table of data that specified the distances between pairs of large cities in the US. Don commented here that his project (from which these programs came) intends to publish interesting data sets so that researchers in different phases can replicate each other's results. He also observed that a program running on a table of "real data," as here (the actual "official" distances between the cities in question), is a lot more interesting than the same program running on "random data." Returning to the nitty-gritty of the program, Don observed that the student had made a good choice of variable names, for instance 'villain' for those parts of the data that were causing inconsistencies. This fitted in nicely with the later exposition; he could talk about 'cut throats' and so forth. (Don added that we nearly always find villainy pretty unamusing in real life.)
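The notes do not say how the student's program enforced the inequality. One standard approach, assumed here, is to lower every distance to the length of the shortest route through intermediate cities (the Floyd-Warshall algorithm); the entries that had to change are then the "villains" in the student's sense. A sketch, with hypothetical names:

```python
def enforce_triangle_inequality(d: list[list[float]]):
    """Return (closed, villains): `closed` lowers each distance d[i][j]
    to the shortest route through intermediate cities (Floyd-Warshall),
    and `villains` lists the (i, j) entries of d that had to change."""
    n = len(d)
    closed = [row[:] for row in d]  # work on a copy; d stays untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = closed[i][k] + closed[k][j]
                if via_k < closed[i][j]:
                    closed[i][j] = via_k
    villains = [(i, j) for i in range(n) for j in range(n) if closed[i][j] < d[i][j]]
    return closed, villains


# Three cities: the direct 0-2 distance (10) exceeds 0-1-2 (3 + 4 = 7).
table = [[0, 3, 10], [3, 0, 4], [10, 4, 0]]
closed, villains = enforce_triangle_inequality(table)
print(closed[0][2], villains)  # 7 [(0, 2), (2, 0)]
```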
Don declared that he didn't know what this was supposed to mean; it would be a lot better
to say 'Extra long messages can be seen if you make them move'.
It's good to have plenty of comments like 'Good luck!' and 'Enjoy!' scattered here and
there. But Don thought the phrase 'this system has been carefully redesigned not to bite'
hardly reassuring.
told us that at the moment all papers are re-typed by the publishers, except for one or two AI journals that have used TeX for several years. But next year a math journal will be adopting a policy in which the author's text is manipulated electronically throughout
the whole process. This should speed publication and reduce errors and costs.
Some of the notes in the galley were signed 'Ptr', that is 'printer', and asked 'OK?'. Don answers affirmatively by circling the 'OK'. At one point he was asked to sanction the insertion of a whole new sentence. Apparently he had made reference to Figure 14 before Figure 13, and his approval was sought to make an extra comment first about 'Figures 13-16'. (The extra comment was wrong but fixable.)
The publishers also insisted on more details in his bibliography. They wanted to know,
for example, exactly where and when a conference had taken place. Someone in the class
pointed out that Mary-Claire van Leunen recommends omitting the location of conferences.
Don replied that libraries often nowadays index conferences by city for those poor souls who
can remember nothing else about them; so such information was useful. He observed that
people have a great tendency to copy citation information blindly into their own papers,
and so errors propagate unchecked. When Elwyn Berlekamp wrote his book on coding
theory, he found that nearly half the information in bibliographies of papers was wrong!
Don wrapped up the galley proof discussion by showing us a few tables of (somewhat) standardized proof-readers' symbols.
Today Don spoke about the refereeing process. A paper submitted to an academic journal
is usually passed to one or more referees by the editor of the journal. Each referee is
intended to be an expert in the relevant field, and thus in a position to tell the editor whether or not the paper merits publication. Don pointed out that many of us will one day find our papers being subject to just this scrutiny; and some of us will certainly be asked to assess other people's papers ourselves.
Don talked about his now famous research on "The Toilet Paper Problem." This was first published in the MONTHLY, and as Don pointed out to the Editor in his cover letter, many of its readers probably keep their copies in the bathroom anyway. The editor (Halmos) replied a little gravely that "jokes are dangerous in our journal", and asked Don to think twice about the scatological references. Don did agree to change the section names (which originally continued the pun with such headings as 'An absorbing barrier', 'A process of elimination', and 'Residues') to innocuous equivalents, but kept the title intact. In justification of this, Don pointed out to the editor that two talks had already been given on his results under this title, and that the material had been widely circulated and discussed. "Your toilet paper is accepted," replied Halmos. Don confessed that he still has occasional doubts when he catches sight of the title amongst his papers, but the deed is done now.
Still, it did get reasonably good reviews, even in Russia.
Don showed us an article entitled 'Rules for Referees' by Forscher, published in Science
(October 15, 1965). These rules constitute a rather traditional view, Don said, and emphasize the legal rights and responsibilities of all concerned. Don thought that this seems a lot more oriented to the advancement of careers rather than of science as such; the right reason to publish is to build upon the results of others and provide a foundation for future research. It is a sad truth, said Don, that an editor can all too easily find himself spending a great deal of time dealing with those authors whose papers don't merit publication, for it is usually very hard to convince them of the fact. Rebuttals are followed by counter-rebuttals, and so on. But fortunately this doesn't happen so often that the whole business
of science gets bogged down.
The referee is conventionally regarded as a sort of "expert witness," whose task is to tell
the editor whether the paper deserves to be published or not. The first criterion should
be originality; is the material presented a genuine advance on previous work?
Don urged referees to see their primary responsibility as being to authors and readers, not
just to editors. Don himself decided long ago that he would put more of his efforts into
refereeing papers before their publication than into reviewing published papers. Don hoped
that he could thus do his bit to encourage high standards of writing in Computer Science
and help the field win respect. These days there are more good people to go around, both
in refereeing and reviewing.
In the 1960's Don circulated a list of 'Hints to referees' to try to encourage good practice.
He wanted to show us the list, but not a copy can be found. Don has written to some of
the people to whom he sent it, so it is possible that a copy will turn up before the end of
the quarter.
[§15. REFEREEING (1)]
Don disagreed with our guest speaker, Herb Wilf, who had said that he would tolerate more stylistic lapses in the Journal of Algorithms than in the MONTHLY. Authors, thought Don, should always be encouraged to do better; he could recall only a single occasion when, as a referee or editor, he could recommend no improvements at all. (The author in this case was George Collins writing for the ACM journal.) Let us publish journals to be proud of, he said. This was sadly not true of Computer Science in the early 60s. Some published results were just plain wrong; or a correct result was incorrectly proved; or a paper simply contained no results at all! Contrast this state of affairs, said Don, to the math journals that were published in the 20s and 30s: leafing through them at random we see a host of familiar names and theorems, because so much of what was written then was polished, significant, and worth reprinting in textbooks. The same could not be said of today's efforts; perhaps we have grown increasingly tolerant of substandard work.
Referees should try to be teachers, said Don. The author you criticize today will be writing
another paper tomorrow, so try to help him improve his writing. Unfortunately, referees will often be over-critical and make quite tasteless comments on papers, knowing that they do so under a cloak of anonymity. This only angers the author and he learns nothing. Try to supply constructive criticism, Don urged. These human issues are not discussed in Forscher's 'Rules'.
In addition, the referee can contribute to the technical quality of a paper by giving references to related work of which the author was apparently unaware, or improving the results. Don himself has contributed results anonymously to papers; more than one author has had to add a footnote: "My thanks to the referee for Theorems 4, 5, and 6." Don was always pleased to feel that by doing this the image of the journal was improved.
A journal should be seen as a source of wisdom, so let us be cooperative toward this end,
not legalistic.
How should one choose a journal to which to submit a paper? Don thought the answer is
to look for the one with the best referees, not the one with the least critical editor. After
all, an author presumably wants to know whether he has really made a contribution to his
field. So find a journal that has handled papers on related subjects.
Someone asked whether the letters that appear in journals are also refereed. Don said
that sometimes they are, sometimes not. There is often nothing to distinguish letters from
short papers.
Some journals do not use referees at all. Their readership must be willing to wade through
a great deal of nonsense. The ACM did at one time have plans to publish an unrefereed
journal, but to Don's relief that never came to fruition.
At this point Don confessed to a sneaky trick he had pulled way back in the 60s. At that time he had just begun to edit the programming languages material for the Communications of the ACM and the Journal of the ACM. He had no way of knowing which of his referees were any good, so in an effort to calibrate them he sent all of them a copy of the same paper and solicited their opinions. Don had already refereed the paper himself, of course, and found it a very badly written exposition of a very interesting algorithm (due to someone besides the author). As such, it was certainly worthy of the referees' study.
D. Knuth
1. Publish essentially as is; the only changes necessary are very simple typographical matters, which can be changed by the editor.
2. Publish after author's minor revision; the referee suggests points which must be changed before the paper meets the standards for publication.
3. Publish only if the author makes major revisions. (Perhaps the paper is much too long or is badly written. The revised paper will be refereed again.)
4. Reject. (There is nothing salvageable.)
The goals of a referee are to keep the quality of publication as high as possible and also to help the author to produce better papers in the future. Your referee's report should be designed to give the author the maximum benefit. Try to let every author put out the best paper he is capable of writing, yet not compromise on quality.
A paper rarely falls in category 1 above. Never put a paper in category 1 if you feel the author can do better, even if the paper as it stands is reasonably good! A paper should only be put into category 3 if the substance of the paper is considered significant enough to warrant the additional amount of labor to rewrite and reconsider the paper.
. . . to judge the publishability of the paper you certainly know what is good and what is bad, but the following brief list is included here:
(a) The paper should contribute to the state of the art and/or should be a good expository paper. If it is purely expository it should be clearly designated as such.
. . .
It is tempting to postpone refereeing by putting the paper aside for a few days. But it takes no longer to do it today than it will in a week's time. If you feel that you are for some reason unable to referee the paper, please return it immediately. . . . The referee's report is expected in no more than four weeks. Remember that the refereeing cycle is "critical path time" in the publication process.
Return the manuscript to the editor; please don't mark it up. You should submit the report in duplicate. Remember that one copy will be sent directly to the author; it is up to you whether you want to mention your name on it or not. If you desire, you may write an accompanying letter to the editor, which of course will not be passed on to the author. This letter, however, must not constitute the referee's report.
Lemma. There is a one-to-one correspondence between the set of all real numbers $\alpha$ and the set of all pairs $((n_k),(t_k))$, where $(n_k)_{k\ge1}$ is an increasing sequence of positive integers and $(t_k)_{k\ge1}$ is a sequence of real numbers.
Notation. The sequences $(n_k)$ and $(t_k)$ corresponding to $\alpha$ are called $(n_k^\alpha)$ and $(t_k^\alpha)$. The set of real numbers is called $R$.
. . . such that, if $S$ is any uncountable subset of $R$, we have $f_n(S) = R$ for all but finitely many $f_n$.
Proof. Let the countable set $A_\alpha$ consist of the real numbers
$$\alpha_1,\ \alpha_2,\ \alpha_3,\ \ldots\,.$$
If $\alpha$ is any real number, define an increasing sequence of positive integers $(l_k^\alpha)$ by letting $l_1^\alpha = n_1^{\alpha_1}$ and then, after $l_{k-1}^\alpha$ has been defined, letting $l_k^\alpha$ be the least integer in the sequence $(n_1^{\alpha_k}, n_2^{\alpha_k}, \ldots)$ that is greater than $l_{k-1}^\alpha$.
$$f_n(\alpha) = \begin{cases} t_j^{\alpha_k}, & \text{if } n = l_k^\alpha \text{ for some } k \ge 1, \text{ where } l_k^\alpha = n_j^{\alpha_k};\\ 0, & \text{otherwise.} \end{cases}$$
We will show that the sequence of functions $f_n$ satisfies the theorem, by proving that any set $S$ for which infinitely many $n$ have $f_n(S) \ne R$ must be countable.
Suppose, therefore, that $(n_k)$ is an increasing sequence of integers and that $(t_k)$ is a sequence of real numbers such that
$$t_k \notin f_{n_k}(S) \qquad \text{for all } k \ge 1,$$
and let $\beta$ be the real number corresponding to the pair $((n_k),(t_k))$, so that $n_k^\beta = n_k$ and $t_k^\beta = t_k$.
Let $\alpha$ be any real number $\ne \beta$ such that $\alpha \notin A_\beta$. We will prove that $\alpha \notin S$; this will prove the theorem, because all elements of $S$ must then lie in the countable set $A_\beta \cup \{\beta\}$.
By hypothesis, $\beta \in A_\alpha$. Hence we have $\beta = \alpha_k$ for some $k$. If we set $n = l_k^\alpha$ and let $j$ be the index with $n = n_j^\beta = n_j$, we know by the definition of $f_n$ that
$$f_n(\alpha) = t_j^\beta = t_j \notin f_{n_j}(S) = f_n(S),$$
so $\alpha$ cannot be in $S$.
[Here are additional excerpts from PMR's classnotes for October 23:] The homework
assignment is due a week from today, Don said; so do it as well as possible, and let's not
have any excuses!
proof in minor ways. For example, it's not necessary to have the hypothesis $\alpha \ne \beta$ to conclude that $\alpha \in A_\beta$ or $\beta \in A_\alpha$, because the existence of a family $A_\alpha$ that satisfies Sierpinski's more complicated hypothesis is equivalent to the existence of a family that satisfies the simplified one.
The grader objected to the last sentence in the first paragraph of my proof. He asks, "Has some 'initialization' of $L_\alpha$ been omitted?" He apparently wants $k = 1$ to be singled out as a special case, for more effective exposition. The sentence makes perfectly good sense to
me, but maybe there should be a concession to readers who are unaccustomed to empty
constraints.
Solution B introduces two nice techniques of a different kind. First, the lemma becomes
a sequence of ordered pairs instead of an ordered pair of sequences. Second, the need for
a notational correspondence between $\alpha$ and the corresponding sequence is avoided by just using English words, saying that one is the counterpart of the other. In other words, we can hold back in giving notations for a correspondence, since plain words are sufficient (even better at times).
Solution B also "factors" the proof into two parts, one that describes a subgoal (the crucial property that the functions $f_n$ will possess) and one that applies the coup de grace. Much less must be kept in mind when you read a factored proof, because the two pieces have a simple interface. Moreover, the reader is told that the proof is "essentially a diagonalization technique"; this gives an extremely helpful orientation. It is no wonder that the grader
found Solution B easier to understand than Solution A.
Solution C is by another student who found words superior to notation in this case.
Solution D cannot be shown in full because it contains seven illustrations, some of which
are in four colors. But the excerpts that are shown do capture its expository flavor.
A combination of the ideas from all these solutions would lead to a truly perspicacious
proof of Sierpinski's theorem.
Lemma. There is a one-to-one correspondence between the set of all real numbers $\alpha$ and the set of all pairs $(N, T)$, where $N$ is a countable set of integers and $T$ is a sequence of real numbers.
Notation. The set $N$ corresponding to $\alpha$ is called $N_\alpha$, and the sequence $T$ is called $(\alpha_1, \alpha_2, \ldots)$. The set of real numbers is called $R$.
Theorem. Assume that there is an uncountable family of countable subsets $A_\alpha$, one for each real number $\alpha$, with the property that either $\alpha \in A_\beta$ or $\beta \in A_\alpha$, for all real $\alpha$ and $\beta$. Then there exists a countable family $F$ of functions $f\colon R \to R$ such that, if $S$ is any uncountable subset of $R$, we have $f(S) = R$ for all but finitely many $f \in F$.
Proof. If $\alpha$ is any real number, we can construct a countable set of integers $L_\alpha$ as follows: For $k = 1, 2, \ldots$, let $\beta$ be the $k$th element of $A_\alpha$ in some enumeration of this countable set. Include in $L_\alpha$ any element of $N_\beta$ that's not already present in $L_\alpha$ because of the first $k - 1$ elements of $A_\alpha$.
Now let $F = \{f_1, f_2, \ldots\}$ be the countable set of functions defined for all real $\alpha$ as follows:
$$f_n(\alpha) = \begin{cases} \beta_n, & \text{if } n \in L_\alpha \text{ and } n \text{ corresponds to } \beta \in A_\alpha;\\ 0, & \text{if } n \notin L_\alpha. \end{cases}$$
We will show that $F$ satisfies the theorem, by proving that any given set $S \subseteq R$ is countable whenever $\{\, n \mid f_n(S) \ne R \,\}$ is infinite.
Let $S$ be a set such that $N = \{\, n \mid f_n(S) \ne R \,\}$ is infinite, and suppose that
$$t_n \notin f_n(S) \qquad \text{for all } n \in N.$$
Let $t_n = 0$ for $n \notin N$. By the lemma, there is a real number $\beta$ such that $N = N_\beta$ and $(t_1, t_2, \ldots) = (\beta_1, \beta_2, \ldots)$.
Let $\alpha$ be any real number such that $\alpha \notin A_\beta$. We will prove that $\alpha \notin S$; this will prove the theorem, because all elements of $S$ then must lie in the countable set $A_\beta$.
By hypothesis, $\beta \in A_\alpha$. Hence there is some $n \in L_\alpha$ corresponding to $\beta$, and $f_n(\alpha) = \beta_n = t_n \notin f_n(S)$; so $\alpha$ cannot be in $S$. ∎
Lemma. There is a one-to-one correspondence between the set of all real numbers Q and
the set of all sequences of ordered pairs ((n" t'))'> I , where the first component (n,) is an
increasing sequence of positive integers and the second component (t,) is a sequence of
real numbers.
We shall call the sequence of ordered pairs corresponding to Q the counterpart of Q, and
vice versa.
Theorem. Suppose that there exists a family of countably infinite subsets of the reals R.
denoted by (A")"ER, with the property that Q "I {3 implies either Q E .4.11 or {3 E .-I." .
Then there is a sequence of functions In : R R such that for any uncountable subset 5
-t
Proof: Using the existence of $(A_\alpha)_{\alpha \in \mathbb{R}}$, we first construct a sequence of functions $f_n$ with the property that for all $\alpha$, and for all $\beta \in A_\alpha$, there exists an ordered pair $(n, t)$ in the counterpart of $\beta$ such that $f_n(\alpha) = t$. The construction is essentially a diagonalization technique. For each $\alpha$, let the countable set $A_\alpha$ be enumerated as $A_\alpha = \{\beta_1, \beta_2, \beta_3, \ldots\}$.
Start with $(n_1, t_1)$ being the first ordered pair in the counterpart of $\beta_1$. Proceed inductively, and let $(n_k, t_k)$ be the first ordered pair in the counterpart of $\beta_k$ such that $n_k > n_{k-1}$. This selection can be made because the first component of the counterpart of $\beta_k$ is unbounded. Thus, we have constructed a sequence of ordered pairs $((n_k, t_k))_{k \ge 1}$ with $n_k$ increasing and each $(n_k, t_k)$ in the counterpart of $\beta_k$. Using this sequence, we then define the function $f_n$ by the rule
$$f_n(\alpha) = \begin{cases} t_k, & \text{if } n = n_k \text{ for some } k;\\ 0, & \text{otherwise.} \end{cases}$$
Indeed, $f_n$ is well-defined since $n_i \ne n_j$ for $i \ne j$. Moreover, the sequence $(f_n)$ has the desired property that for every $\alpha$ and every $\beta$ in $A_\alpha$, there is an ordered pair $(n, t)$ in the counterpart of $\beta$ such that $f_n(\alpha) = t$.
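The diagonal selection is concrete enough to run on a finite toy model. In the following Python sketch, the names `diagonal_selection` and `tabulate_f` are hypothetical, and each counterpart is truncated to a short finite list of pairs, whereas in the proof every counterpart is infinite. We walk through $\beta_1, \beta_2, \ldots$ and take from each counterpart the first pair whose integer exceeds the previously chosen one; the chosen pairs then give the nonzero values of $f_n(\alpha)$.

```python
from typing import Iterable, Iterator

Pair = tuple[int, float]  # (n, t): a positive integer and a real number

def diagonal_selection(counterparts: Iterable[list[Pair]]) -> Iterator[Pair]:
    """For the k-th counterpart (pairs listed with n increasing), yield the first
    pair (n_k, t_k) with n_k > n_{k-1}. In the proof the first components are
    unbounded, so such a pair always exists; if a finite toy prefix has no
    suitable pair, this sketch simply skips it."""
    last_n = 0  # every n is positive, so the first pair of beta_1 is always taken
    for pairs in counterparts:
        for n, t in pairs:
            if n > last_n:
                yield n, t
                last_n = n
                break

def tabulate_f(selected: list[Pair]) -> dict[int, float]:
    """f_n(alpha) = t_k when n = n_k; any n missing from the dict maps to 0."""
    return dict(selected)

# Hypothetical finite prefixes of the counterparts of beta_1, beta_2, beta_3.
counterparts = [
    [(1, 0.5), (2, 1.0)],
    [(1, 2.0), (3, -1.0)],
    [(2, 7.0), (5, 3.0)],
]
selected = list(diagonal_selection(counterparts))
f_alpha = tabulate_f(selected)
print(selected)           # [(1, 0.5), (3, -1.0), (5, 3.0)]
print(f_alpha.get(2, 0))  # 0, since 2 is not one of the chosen n_k
```

Because the chosen $n_k$ are strictly increasing, no integer is assigned two values, which is why $f_n$ is well-defined.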
Now we show that any subset $S$ of $\mathbb{R}$ for which infinitely many $n$ have $f_n(S) \ne \mathbb{R}$ must be countable, thereby proving the theorem. If $f_n(S) \ne \mathbb{R}$, then there exists a real $t \notin f_n(S)$. So if there are infinitely many $f_n$ such that $f_n(S) \ne \mathbb{R}$, then there is a sequence of ordered pairs $(n, t)$ with $n$ increasing and $t \notin f_n(S)$. Let the counterpart of this sequence of ordered pairs be $\beta$. Thus, every ordered pair $(n, t)$ in the counterpart of $\beta$ has $t \notin f_n(S)$.
Now consider any real $\alpha \notin A_\beta \cup \{\beta\}$. By the hypothesis, we must have $\beta \in A_\alpha$. We constructed the sequence $(f_n)$ in such a way that there is an ordered pair $(n, t)$ in the counterpart of $\beta$ with $f_n(\alpha) = t$. But by the choice of $\beta$, we have $t \notin f_n(S)$. Hence $f_n(\alpha) = t \notin f_n(S)$ implies $\alpha \notin S$. Therefore $S$ must be a subset of $A_\beta \cup \{\beta\}$, a countable set.
... If the real number $\alpha$ corresponds to the pair $((n_k), (t_k))$, then we call $(n_k)_{k \ge 1}$ the integer sequence of $\alpha$ and $(t_k)_{k \ge 1}$ the real sequence of $\alpha$.
... Proof. Note that a given real number $\alpha$ has associated with it both integer and real sequences, as well as a set of reals $A_\alpha = \{\alpha_1, \alpha_2, \alpha_3, \ldots\}$. We add to this list and construct an infinite set of integers $L_\alpha = \{l_1, l_2, l_3, \ldots\}$ in which each $l_i$ comes from the integer sequence of $\alpha_i$.
Solution D
As a step toward proving the Continuum Hypothesis, which states that there are no infinities between the countably infinite and the continuum, Sierpinski proposed the following theorem.
Suppose we have a function, special, that maps every real $\alpha$ to a countably infinite subset of the reals (Figure A). Now suppose we make the additional hypothesis that for any two reals $\alpha \ne \beta$, either $\alpha \in \text{special}(\beta)$ or $\beta \in \text{special}(\alpha)$ (Figure B). Then we can draw the following conclusion. There exists ...
[Figures A and B: spec(α) and special(·), each drawn as a set of marked points.]
Quotation. ... a writer expresses himself in quoting words that have been used
before because they give his meaning better than he can give it himself, or because
they are beautiful or witty, or because he expects them to touch a chord of
association in his readers, or because he wishes to show that he is learned and
well-read. Quotation due to the last motive is invariably ill-advised; the discerning
reader detects it and is contemptuous, the undiscerning is perhaps impressed, but
even then is at the same time repelled, pretentious quotation being the surest road
to tedium.
But woe to the author who always wants to instruct! The secret of being a bore is to tell everything.
A prince must never get caught up in details. He must think, and let others act and make them act: he is the soul, and not the arm.
Don's secret delight, he confessed today, is to "play a library as if it were a musical instrument." Using the resources of a great library to solve a specific problem: now that, to him, is real living. One of his favourite ways to spend an afternoon is amongst the labyrinthine archives, pursuing obscure cross-references, tracking down ancient and neglected volumes, all in the hope of finding the perfect quotation with which to open or conclude a chapter.
Don takes great pleasure in finding a really good aphorism with which to preface a piece
of writing. So many people have written so many neat things down the ages, he said, that
it behooves us to take every opportunity to pass them on. Don has been known to take
such a liking to a phrase that he has written an article to publish along with it.
So how are we to find that wonderfully apposite quotation with which to preface our term paper? Serendipity, said Don. Live a full and varied life, read widely, keep your eyes and ears open, live long and prosper. You will stumble across great quotations. For example, Webster defines 'bit' as "a boring tool"; Don was able to use this when introducing a computer science talk.
Sometimes one needs to go about the search more systematically. For example, Don's
TeXbook consists of 27 chapters, 10 appendices, and a preface. His format demands two
relevant quotations at the end of each of these. His METAFONT book posed exactly the
same problem. How did he go about it?
The first secret, he confided, is Bartlett. There are numerous dictionaries of quotations [filed under PN 6000 in the reference section of Green Library], of which Bartlett's Familiar Quotations is the most familiar. It was here, under the heading 'technique' in the index, that Don found a quote from Leonard Bacon deriding Technique as the death of true Art.
[§22. QUOTATIONS 47]
Now 'TEX', in Greek, means both 'technique' and 'art', so this seemed pretty appropriate for The TeXbook, where the (Greek) name TeX is explained.
When Bartlett fails, we can try the OED. This incomparable dictionary lists every word
along with contexts in which it has been used; very often it prints a memorable quotation
that incorporates the word in question. Likewise, we can turn to concordances of Shakespeare or Chaucer to find every single instance in which these authors used any given word.
Leafing through The TeXbook, Don picked out some of his favourites: Goethe on mathematicians (and why they are like Frenchmen); Paul Halmos telling us that the best notation is no notation (write mathematics as you would speak it!). Tacitus had something to say about the macro (or rather, about the ancient politician of that name).
A stiffer challenge was provided by a book that listed the METAFONT code defining each letter of the alphabet (as well as other symbols) in a certain typeface; Don had to come up with quotes for individual letters of the alphabet. No problem: James Thurber had proposed the abolition of 'O'; Ambrose Bierce had scathing things to say about 'M' in his famous Devil's Dictionary; Benjamin Franklin once wrote to Bodoni concerning the exact form of the letter 'T'; a technical report about statistical properties of the alphabet deliberately made no use of the letter 'E'.
Some of the best quotations are taken entirely out of context. The economist Leontief had
something to say about (economic) output; Don quoted him in his chapter on (computer) output. Galsworthy's comments on Expressionists found their way into his section on expressions.
In a pinch, said Don, quote yourself. You could even find someone famous and ask her to
say something (anything!) on such-and-such a subject. In another desperate case, Don
couldn't find anything much that had been said about fonts. No matter, he quoted the
explorer Pedro Font writing about something else entirely (the discovery of Palo Alto, as it
happens). If you are Don Knuth, you may even be able to quote Mary-Claire van Leunen
praising your use of quotation!
Computer technology now gives us another quote-locating resource. When Albert Camus' The Plague is available online, it will be a simple matter for this note-taker to find the part in which a writer agonizes for a week before putting a comma in a particular sentence, and then for another week before taking it out again; just search for occurrences of the word 'comma' in the text. Don used this technique to find quotations involving the word 'expression' in Grimm's Fairy Tales and Wuthering Heights, both of which are available on SAIL.
If any member of the class would like to demonstrate virtuosity at "playing the library," he could try to track down the quotation "God is in the details." Don rather identifies with God in this, but hasn't been able to track down the reference. A number of people have assured him that it originated with Mies van der Rohe, but despite reading all the works and contacting the two biographers of this architect, he has not been able to find it. Someone told him that Flaubert once wrote "Le bon Dieu est dans le détail" ("The good Lord is in the detail"). Don hasn't the patience to search exhaustively in Flaubert's voluminous publications, but he did try
In addition to showing us his letter to Gardner (and Mr. Gardner's sympathetic response)
he showed the class the original and the edited versions. Among SA's changes: changing all uses of 'we', transforming some long sentences into several short sentences, transforming some short sentences into one long sentence, removing commas (commas that Don found necessary), changing 'which's to 'that's, removing technical jargon, changing 'most common' to 'commonest', and introducing a few errors. (Don found many of the changes
means 'log to the base 10'; or to number theorists, when it means the natural log (base e).
Not the greatest copy-editors, Don sighed.
More recently, Don wrote for the October issue of the ACM Transactions on Graphics, and encountered some really shocking copy-editing. They changed '... data has to ...' to '... data have to ...'. Long ago Don was told that 'data' is really plural, but everywhere it is used both as a singular and as a plural, even in the reliably conservative ("antediluvian," chimed Mary-Claire) New York Times. Don thought it quite right to use it as a singular when referring to data as some kind of collective stuff. Don wrote and complained, and ACM did gracefully admit to and correct some straightforward mistakes, such as 'this number plus that number are equal to 63'. But where Don wrote 1000000 they substituted 1,000,000. Don objected that although this might be justified in text, his use is perfectly OK in a formula. Well then, they replied, write $10^6$. Fine, said Don, but what do I do when the number is 1234567? The IEEE standard here is to insert spaces, thus: 1 234 567. Don doesn't like this in formulae, but agrees that it may be useful in a high-precision context, such as numerical tables.
Don recalled a remark by George Forsythe that every scientist should try to write for a general audience (not just for other scientists) at least once in his life. Don has done this three times now, so feels that he's done his bit! He gave his first such lecture to a non-technical audience in Norway and found it surprisingly hard to understand their mindset. The problem is to make the talk interesting while conveying how it feels to a computer scientist to do computer science. The public probably imagine that mathematicians sit and factor polynomials all day, and that CS types design videogames. How to convey the soul of the subject to them? In this lecture, Don presented a sequence of algorithms for a search task. Since we all have to look up information in large tables or indexes now and then, he hoped the audience would have a clear intuition of the problem. Brute-force searching is clearly too slow; binary search is natural and powerful; hashing is better still, but very unintuitive to most people. Don was asked to write up his talk for a Norwegian magazine called Forskningsnytt, 'Research News' (a sort of Scientific Norwegian). In the course of doing so he learned enough of the language to write v and h instead of l and r to designate left and right sons in a tree structure. Dr. Ole Amble, a numerical analyst who was one of Norway's computer pioneers, helped Don with Norwegian style on this article, and got interested in search algorithms as a result. He asked Don whether there mightn't be a way to combine the advantages of binary search and hashing. Don at first told him "obviously not," but then realized what Amble meant ... alas, too late to include in the just-published Volume 3 of ACP. But this combination of methods made a nice conclusion to his SA paper, which was based on this Norwegian prototype.
It was in April of 1977 that Don's travails with SA prompted him to investigate typesetting
for himself; in May of that year he designed the first draft of TeX and spent his sabbatical
(and ten more years) perfecting it, putting Volume 4 of ACP on the back burner.
We had a few minutes left to look at other changes that SA made to Don's original
manuscript. In the first case we looked at, there seemed to be no reason for restructuring
a sentence to put Amble's name first instead of the motivation of his discovery. But Mary-Claire noted that SA always tries to stress the human contributions in science, sometimes at the expense of the ideas. Don also mentioned another surprising thing he learned about SA's editorial policy: they never display equations. (PMR knows at least one scientist who refuses to read SA for this very reason: 'How can you explain science without equations?')
All this prolific word production must have left him in verbal debt: When he finished
the book he tried to write a letter to Phyllis telling her how to type the book. He
couldn't. Except he must have eventually; the book is still in print and sells several
hundred copies a year (in seven languages).
"Hopeful" has had two senses ever since it first appeared in the language late in the 15th century. A person could be hopeful (expectant, eager, desirous); or a situation could be hopeful (promising, auspicious, bright).
As Wilf took the dais he pronounced this "a marvelous course." ("Taken earlier in my career it would have saved me and the world a lot of grief, mostly me.") The course topic is one of daily concern for him; apart from writing his own papers he edits two very different journals: the American Mathematical Monthly ("The MONTHLY") and the Journal of Algorithms.
The Journal of Algorithms was founded in 1980 by Wilf and Knuth and is a research journal. Results are reported there if they are new, if they are important, and if they are significant contributions to the field. If these conditions are met, a little leeway can be
given in the area of beautiful presentation. But the MONTHLY is an expository journal. It
is a home for excellent mathematical exposition. (It also seems to be a popular place to
send "proofs" of Fermat's Last Theorem.)
Though he told us that he feels "older without feeling wiser" and is uncomfortable setting
down rules for a human interaction that "involves part brain and part hormone system,"
he gave us several pointers.
Get the attention of your readers immediately. Snappy titles, arresting first
sentences, and lucid initial paragraphs are all methods of doing this.
Get everything up front. Tell your readers in plain English what you are going to
write about and let them decide for themselves whether or not they are interested.
("You can quintuple your readership if you will let them in on what it is that you
are doing.")
Remember that people scan papers when they read them. Potential readers
will skim looking for statements of theorems; if all of your text is discursive they will
have nothing to latch onto. Summarize your results using bold face ("or neon")
so that the page flippers can make an informed decision. Similarly, drop notational
abbreviations and convoluted references in the statements of theorems.
A little motivation is good, but readers don't like too much. Presenting
examples that do not yield desired results can be quite useful, but the technique
loses its charm after a small number of such examples. (Far from overdoing this
technique, many writers will introduce mysteriously convenient starting points for
He gave us the names of three books (not written by anyone in the room) that he considers
superb books of mathematics:
Wilf commented that all three of these books are quite dry, but Knuth objected (along
the same lines as those used by Hardy and Wright in their preface) and Wilf amended his
statement: Each of these books is very lean.
Discussing the change of his own writing style over time, he told us that when he was
younger he didn't have much self-esteem and stuck to established forms. Now that he
feels better about himself he has developed his own, much chattier, style. (Speaking of
chattiness, he is also a fan of the use of the first-person in technical writing.) He says he
aims to be chatty leading up to a proof, prove it in the "lean and mean" style that Rudin
would use, and then be chatty again after he finishes the proof.
The last things that Wilf discussed were two handouts (§28 and §29 below): "Enumeration of orbits of mappings under action of $C_n$, the cyclic group," and "Counting necklaces." Each handout discusses the same mathematical problem, solved the same way. "Enumeration of ..." takes a half page; "Counting Necklaces" takes four pages.
Some audience members will appreciate the half page of exposition that is condensed to
the word "evidently" in the shorter paper; some will merely be annoyed by it. As the
MONTHLY editor he gets letters from people who complain about the informal style creeping
into recent publications. "Mathematics is a serious business, not a comic pursuit," said
one such letter.
Finally, Wilf doesn't mean to say that either of the two approaches is superior ("They are
the two sides of the coin"); he means for us to examine each and decide what techniques
we want from each.
B. Nimble
nipulations.
The problem is to find $F(n,k)$, the number of different $n$-bead necklaces of at most $k$ colors (let's call them $(n,k)$ necklaces).
Among all $(n,k)$ necklaces we distinguish a subset that we will call the 'prime' necklaces. Say that a necklace is prime if it does not result from concatenating a number of repetitions of a shorter pattern. Thus, among the $(4,2)$ necklaces above, the first one is not prime because it results from stringing together 4 identical shorter strings (viz., 'A'). The third and sixth ones are also not prime, whereas the second, fourth and fifth ones are prime. Let $M(n,k)$ denote the number of prime $(n,k)$ necklaces (e.g., $M(4,2) = 3$).
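The definitions above can be checked by brute force. The following Python sketch (the helper names `canonical`, `is_prime_pattern`, and `count_necklaces` are hypothetical, chosen only for illustration) treats a necklace as the class of a string under rotation, names it by its lexicographically smallest rotation, and calls it prime when the string is not a repetition of a shorter pattern. It enumerates all $k^n$ strings, so it is only meant for small $n$ and $k$.

```python
from itertools import product

def canonical(s: str) -> str:
    """Smallest rotation of s; two strings form the same necklace iff these agree."""
    return min(s[i:] + s[:i] for i in range(len(s)))

def is_prime_pattern(s: str) -> bool:
    """True if s is not the concatenation of two or more copies of a shorter string."""
    n = len(s)
    return all(s != s[:d] * (n // d) for d in range(1, n) if n % d == 0)

def count_necklaces(n: int, k: int) -> tuple[int, int]:
    """Return (F(n, k), M(n, k)) by enumerating all k**n linear strings."""
    colors = "ABCDEFGHIJ"[:k]  # this sketch assumes k <= 10
    necklaces = {canonical("".join(p)) for p in product(colors, repeat=n)}
    primes = [s for s in necklaces if is_prime_pattern(s)]
    return len(necklaces), len(primes)

print(count_necklaces(4, 2))  # (6, 3): six (4, 2) necklaces, three of them prime
```

Primality is a property of the whole necklace, not of the chosen representative: a rotation of a periodic string is periodic with the same period.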
The reason for concentrating on these prime necklaces will now appear. We claim that every one of the $k^n$ possible linear strings of $n$ beads of $k$ colors can be constructed once and only once by such a cutting operation.
To prove that, let $w$ be such an $n$-string, and let $d$ be the smallest integer
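$$\sum_{d \mid n} d\, M(d,k) = k^n \qquad (n \ge 1).$$
By Möbius inversion,
$$M(n,k) = \frac{1}{n} \sum_{d \mid n} \mu(n/d)\, k^d, \qquad (1)$$
where $\mu$ is the Möbius function. Hence we have an explicit formula for the number of prime necklaces.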
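A minimal Python sketch of the explicit formula $M(n,k) = \frac{1}{n}\sum_{d \mid n} \mu(n/d)\,k^d$, assuming a naive trial-division Möbius function (adequate for small $n$ only; the helper names are hypothetical). The loop at the end checks the counting identity $\sum_{d \mid n} d\,M(d,k) = k^n$ for small cases.

```python
def mobius(n: int) -> int:
    """Mobius function by trial division: 0 if a squared prime divides n,
    otherwise (-1)**(number of distinct prime factors)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:  # one prime factor is left over
        result = -result
    return result

def divisors(n: int) -> list[int]:
    return [d for d in range(1, n + 1) if n % d == 0]

def prime_necklaces(n: int, k: int) -> int:
    """M(n, k) = (1/n) * sum over d | n of mu(n/d) * k**d."""
    total = sum(mobius(n // d) * k**d for d in divisors(n))
    assert total % n == 0  # a counting formula must give an integer
    return total // n

# Check sum_{d | n} d * M(d, k) == k**n for small n and k.
for n in range(1, 13):
    for k in range(1, 5):
        assert sum(d * prime_necklaces(d, k) for d in divisors(n)) == k**n

print(prime_necklaces(4, 2), prime_necklaces(6, 2))  # 3 9
```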
Unfortunately, that wasn't the question. The question was to find the
number of all different necklaces whether prime or not.
Fortunately, the latter number, $F(n,k)$, is easily obtainable from $M(n,k)$. Take a divisor $d$ of $n$ and a prime $(d,k)$ necklace. Cut it somewhere, and make $n/d$ copies of the resulting string. Concatenate them (sound familiar?) into a single $n$-string, but now (this part shouldn't sound familiar) tie its ends together. The result of this operation is a necklace, not just a linear string. Hence it doesn't matter where we make the cut: wherever the cut is made, the end result is the same necklace.
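The claim that the cut position does not matter can be checked mechanically. This Python sketch (function names hypothetical) names each necklace by its lexicographically smallest rotation, cuts a pattern of length $d$ at every position, repeats the cut string $n/d$ times, ties the ends, and collects the resulting necklaces; a single element in the result means every cut gave the same necklace.

```python
def canonical(s: str) -> str:
    """Smallest rotation of s, used as the name of the necklace containing s."""
    return min(s[i:] + s[:i] for i in range(len(s)))

def necklaces_from_cuts(pattern: str, n: int) -> set[str]:
    """Cut the necklace `pattern` at each position, repeat the cut string
    n // len(pattern) times, tie the ends, and collect the resulting necklaces."""
    d = len(pattern)
    assert n % d == 0, "d must divide n"
    results = set()
    for i in range(d):
        cut = pattern[i:] + pattern[:i]         # the linear string from cutting at i
        results.add(canonical(cut * (n // d)))  # n/d copies, ends tied together
    return results

print(necklaces_from_cuts("AAB", 6))  # {'AABAAB'}: every cut gives the same necklace
```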
Bottom line: $F(n,k) = \sum_{d \mid n} M(d,k)$.
Since we have an explicit formula (1) for $M$ and an explicit formula for $F$ in terms of $M$, we're finished, aren't we? Well, in a sense yes, but if we stick with it we'll find that simplifying the expression is half the fun. Where we are is that
$$F(n,k) = \sum_{d \mid n} M(d,k) = \sum_{d \mid n} \frac{1}{d} \sum_{d' \mid d} \mu(d/d')\, k^{d'}.$$
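The first step in the simplification process is to invoke the Law of Double Sums: 'Interchange Them'. Writing $d = t d'$, this gives
$$F(n,k) = \sum_{d' \mid n} \frac{k^{d'}}{d'} \sum_{t} [\,t d' \mid n\,]\, \frac{\mu(t)}{t}, \qquad \sum_{t} [\,t d' \mid n\,]\, \frac{\mu(t)}{t} = \sum_{t \mid (n/d')} \frac{\mu(t)}{t},$$
where the fine print has now reverted to the bottom of the ad.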
Next we want to relate this last sum to Euler's totient function, from the theory of numbers. The well-known evaluation of the Euler function in terms of the prime factorization of an integer $n$ is
$$\phi(n) = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right),$$
where the product is over prime divisors of $n$. If we multiply out all of the factors of the product we get an enormous alternating sum of reciprocals of various products of prime divisors of $n$. Those products run through precisely the square-free divisors of $n$, i.e., those in which no prime factor is repeated, and those are exactly the divisors of $n$ on which the Möbius function is nonzero. What that all boils down to is that
$$\frac{\phi(n)}{n} = \sum_{d \mid n} \frac{\mu(d)}{d}.$$
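The identity $\phi(n)/n = \sum_{d \mid n} \mu(d)/d$ is easy to confirm numerically with exact rational arithmetic. In this Python sketch (helper names hypothetical), $\phi$ is computed from the product formula, checked against the definition as a count of integers coprime to $n$, and then compared with the Möbius sum.

```python
from fractions import Fraction
from math import gcd

def prime_factors(n: int) -> set[int]:
    """Distinct prime divisors of n, by trial division."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def phi(n: int) -> int:
    """Euler's totient from the product formula phi(n) = n * prod_{p | n} (1 - 1/p)."""
    result = Fraction(n)
    for p in prime_factors(n):
        result *= 1 - Fraction(1, p)
    return int(result)  # exact: the product is always an integer

def mobius(n: int) -> int:
    """0 if a squared prime divides n, else (-1)**(number of distinct primes)."""
    ps = prime_factors(n)
    if any(n % (p * p) == 0 for p in ps):
        return 0
    return (-1) ** len(ps)

for n in range(1, 200):
    # The product formula agrees with the definition (count of j <= n coprime to n)...
    assert phi(n) == sum(1 for j in range(1, n + 1) if gcd(j, n) == 1)
    # ...and phi(n)/n equals the sum of mu(d)/d over the divisors d of n.
    assert Fraction(phi(n), n) == sum(Fraction(mobius(d), d) for d in range(1, n + 1) if n % d == 0)

print(phi(12), mobius(30))  # 4 -1
```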
If we substitute into the inner sum that we've been fussing over, and then
substitute back in (2), we get the final result that exactly
way. Let's try another simple case of the formula. This time, suppose $n$ is a prime number. The virtue of that assumption is that primes have only two divisors, so there are only two terms in the sum for $F(n,k)$. After recalling that $\phi(n) = n - 1$ if (and only if) $n$ is prime, we discover that
integer $k$ might be. Well, that is exactly Fermat's Little Theorem, and we got a
proof of it as a spinoff from a combinatorial formula. The main point is that
counting formulas must give integer answers, and if the answers don't look like
integers then we may have discovered something interesting.
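For a prime $p$ the two terms are $d = 1$ and $d = p$, which with $\phi(p) = p - 1$ and $\phi(1) = 1$ give $F(p,k) = ((p-1)k + k^p)/p = k + (k^p - k)/p$. This is the standard count of $k$-color necklaces of prime length; writing it in terms of $\phi$ assumes the final formula of the derivation has the form $\frac{1}{n}\sum_{d \mid n} \phi(n/d)\,k^d$. The following Python sketch (helper names hypothetical) checks, for small primes and color counts, that $p$ divides $k^p - k$, which is exactly what integrality of $F(p,k)$ requires.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % q for q in range(2, int(n**0.5) + 1))

def necklaces_prime_length(p: int, k: int) -> int:
    """F(p, k) = ((p - 1) * k + k**p) / p for prime p (terms d = 1 and d = p)."""
    numerator = (p - 1) * k + k**p
    assert numerator % p == 0
    return numerator // p

# Integrality of F(p, k) is the statement that p divides k**p - k (Fermat).
for p in filter(is_prime, range(2, 50)):
    for k in range(0, 30):
        assert (k**p - k) % p == 0
        assert necklaces_prime_length(p, k) == k + (k**p - k) // p

print(necklaces_prime_length(5, 2))  # 8 binary necklaces of length 5
```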
By the way, formula (3) must still be an integer even when $n$ isn't prime,
but it sure doesn't look like it. I wonder what that might mean...
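A quick experiment supports the integrality claim. This Python sketch assumes that formula (3) is the closed form $F(n,k) = \frac{1}{n}\sum_{d \mid n} \phi(n/d)\,k^d$ obtained from the substitution described above; it checks that the numerator is divisible by $n$ for composite $n$ as well, and that the quotient matches a brute-force count of necklaces. Helper names are hypothetical.

```python
from itertools import product
from math import gcd

def phi(n: int) -> int:
    """Euler's totient by direct count (adequate for small n)."""
    return sum(1 for j in range(1, n + 1) if gcd(j, n) == 1)

def closed_form_numerator(n: int, k: int) -> int:
    """Sum over d | n of phi(n/d) * k**d; the closed form divides this by n."""
    return sum(phi(n // d) * k**d for d in range(1, n + 1) if n % d == 0)

def brute_force_necklaces(n: int, k: int) -> int:
    """Count rotation classes of length-n words over k symbols."""
    seen = set()
    for word in product(range(k), repeat=n):
        seen.add(min(word[i:] + word[:i] for i in range(n)))
    return len(seen)

for n in range(1, 9):
    for k in range(1, 4):
        num = closed_form_numerator(n, k)
        assert num % n == 0  # an integer even when n is composite
        assert num // n == brute_force_necklaces(n, k)

print(closed_form_numerator(6, 2) // 6, brute_force_necklaces(6, 2))  # 14 14
```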
[§29. WILF'S OTHER EXTREME]
§30. Excerpts from class, November 18 [notes by PMR]
"No man but a blockhead ever wrote except for money. "
Today we got an entirely different perspective on the whole ball of wax. Don began his fortnight's sabbatical by turning the stage over to one of Computer Science's most prolific authors: Professor Jeff Ullman. A large crowd had gathered to hear Jeff's advice on "How to get rich by writing books". That title, he said, illustrates one of the principles of cover design: attract people with something that isn't in the book at all.
Jeff started by talking a bit about the pragmatics of publishing: how the money flows. He kicked off with a back-of-an-envelope calculation. A book is a megabyte of text. Jeff can write perhaps two or three kilobytes of first draft per hour, or about one kilobyte per hour of finished text. We can all train ourselves up to much the same performance, he asserted. So it takes around a thousand hours of labour to write a book. Now then, a typical CS text might sell for $40. A good book on a specialized topic, or a mediocre book on a general topic, might well sell 1500 copies in the US and 500 copies abroad. (These figures put the 200,000 copies of Don's ACP sold in the USSR into some perspective.) A 15% royalty is standard on domestic sales, with a rather lower rate for foreign sales. All in all, our talented specialist or so-so generalist can expect to net maybe $8000 over his book's lifetime of perhaps five years. Of course, fame as well as fortune is to be gained through publication, but Jeff dismissed such non-financial motivations as being beyond the scope of his talk.
"I told you to be a lawyer. Or a doctor," someone's mother was heard to whisper. But
Jeff forestalled a mass exodus to the GSB by going on to tell us how to make book-writing
a going concern. Firstly, he said, it's quite feasible to double the royalty rate. CS authors
have some leverage with publishers in that their books sell quite well; a publisher's costs
are very sublinear in the number of copies sold, so he can afford to pay a lot more for a
book that will sell 5000, instead of 2000, copies. What's more, a computer scientist often
keeps his publisher's costs down by preparing his own camera-ready copies. Jeff is happy
to tell you more about how to drive a hard bargain with your publisher: go and talk to
him about it!
Secondly, you need to aim for ten thousand domestic sales; say two thousand a year for five
years. That's 5-10% of the entire market in a topic like compilers or operating systems.
There's nothing off-the-wall about this, provided you find the right niche: Let yours be
the hardest book on the subject, or the easiest. Or the best. This wasn't so hard to do in
the early days of CS, when there was a big demand for textbooks but only a few authors;
it's certainly going to get harder as the field matures. If you're going for the big bucks,
advised Jeff, choose a young and booming field: biogenetics, perhaps.
Increase your royalties and sales, and your efforts can net you as much as a medium-grade
hooker's: say $100 per hour. Top-notch computer scientists should aspire to no less.
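Jeff's figures can be strung together in a few lines. This Python sketch assumes, as a simplification, that royalties are computed on the $40 list price, and it ignores foreign sales; the function names are hypothetical. It reproduces the roughly $8 per hour of the first scenario and about $120 per hour, the same order as the quoted $100, for a doubled 30% royalty on ten thousand domestic copies.

```python
def hours_to_write(book_bytes: int, finished_bytes_per_hour: int) -> float:
    """Labour needed at a given rate of finished text."""
    return book_bytes / finished_bytes_per_hour

def royalty_income(copies: int, price_dollars: int, royalty_percent: int) -> float:
    """Royalties on list price (a simplifying assumption for this sketch)."""
    return copies * price_dollars * royalty_percent / 100

hours = hours_to_write(1_000_000, 1_000)  # a megabyte at a kilobyte per hour
baseline_rate = 8_000 / hours             # Jeff's net estimate of $8000
improved = royalty_income(10_000, 40, 30) # doubled royalty on 10,000 domestic copies
improved_rate = improved / hours

print(hours, baseline_rate, improved_rate)  # 1000.0 8.0 120.0
```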
to consider?
This last remark brought Jeff ("I am not a lawyer") Ullman round to the tricky subject
of plagiarism. According to Prentice-Hall's Guide to Authors, imitation ceases to be the
sincerest form of flattery and becomes something much more culpable if a reasonable person
could not believe that you didn't have the other chap's book open in front of you as you
wrote yours. That said, remember that you can't copyright ideas as such, but only ways
of expressing them. Jeff shamelessly admits that his Compilers book borrowed another's
table of contents and the general front-to-back expository scheme.
Jeff showed us a suspicious case in which an author had written "Knuth has shown ...,"
and then went on to quote more-or-less verbatim from ACP. The coincidence of notation
is hardly conclusive, he said, but the identical use of italics is pretty damning.
Don here pointed out that his disciple had actually corrected a typo, for one sentence
was in fact the exact logical negation of the other. But this book contained much worse
examples of plagiarism: A dozen or so successive equations lifted straight from elsewhere.
In these notes, names have been suppressed to protect the guilty.
Someone asked about second and subsequent editions. Jeff said that these will still consume
a kilohour or so, although they'll go faster if you can use your earlier examples. But the
financial advantages are very real: People stop buying a book when it has been out for five
years, so publish a new edition and start the clock ticking again!
One person asked about writing survey papers: surely they will contain a lot of verbatim
quotes? There's no problem since the writer is not presenting the work as his own, Jeff
said. Besides, accusations of plagiarism hinge on financial loss, and no one writes technical
papers to make money. But be explicit in your quotation if you feel more comfortable
doing so.
Why don't expositions of CS make more use of analogy, asked someone, drawing an analogy
with physics texts (which are planted thick with analogy, metaphor, and simile). Jeff
thought it partly due to the nature of the subject, but encouraged us to use analogy where
we are sure that the reader will get the point.
Asked about progress on Parallel Computation, Jeff confessed that it may never be finished:
"That's another point about co-authors ...". Jeff left us, and Don, to reflect on his maxim:
The first thing Leslie told us was that he would restrict his advice to the writing of papers
(not books). "I have one thing to say about writing a paper for publication: Don't. The
market is flooded. Why add to the detritus?" After the appropriate dramatic pause, he
continued with, "But seriously folks, somebody has to write papers."
While we are asking ourselves if our own papers are worth writing, Leslie asks that we
keep in mind two bad reasons for writing a paper:
The first bad reason is "to have a long publications list." Leslie says he would like to think
that the people who are supposed to be impressed by a long publications list would be
more impressed with quality than quantity. Admitting that this might not always be the
case, he appealed to our own sense of integrity to police us where others' standards do not.
The second bad reason is "to have a paper published in a specific conference." Leslie
has known people whose need to insert papers in specific proceedings is greater than their
need to disseminate accurate information. This approach "sometimes leads to pretty sloppy
papers." He told us that he knows of one case where the authors of a conference paper
promised to send a correction, once they figured it out, to each conference participant.
Leslie recognizes one good reason to publish a paper: "You have done something that you
are excited about."
Just how excited can you be and yet not publish a paper? Leslie was once told: "Judge an
artist not by the quality of what is framed and hanging on the walls, but by the quality of
what's in the wastebasket." Similarly, Leslie thinks that we should be judged on the "best
thing that we have done that we decided not to publish."
Moving on to how we learn to write well, Leslie told us that learning to write is more like
learning to play the piano than like learning to type. While both typing and piano-playing
involve motor skills, a good pianist must spend much time studying music in its entirety;
he must spend more time away from the piano than in front of it. Correspondingly, we
should learn to write by reading. Leslie paid homage to Halmos and Knuth, but said that
they cannot match Fowles and Eliot: We should read great literature in order to learn
how to write good mathematical literature.
We must know what we want to present before we can present it well. As Leslie said,
"Bad writing comes from bad thinking, and bad thinking never produces good writing."
We must keep in mind what we are writing, and to whom.
The question of audience is closely related to where a paper, once written, should be
published. Appropriate places may be a Tech Report, a letter, a Journal, or the bottom
drawer of your desk. (Don't really throw anything out: it is good to have the record, even
if you don't publish your work.) How do we choose?
What characterizes a good first sentence? Leslie says to "avoid passive wimpiness," but
to be simple and direct. "Get right down to business." Of course, once you have hit your
readers in the gut with your first sentence, you can't let them down with your second.
Continuing in this vein, by induction, "When you come to sentence number 2079, you've
got to keep socking it to them." (He illustrated this by reading an arresting sentence from
the middle of Four Quartets by T. S. Eliot, choosing the sentence at random.)
Leslie finished his lecture by saying, "I am not T. S. Eliot. I need to pay more attention
to my writing. As do we all."
PROOF: Let $a$ and $b$ be two points in the interval with $a < b$. We must prove that $f(a) < f(b)$.
By the Mean Value Theorem, there is some $x$ in $(a, b)$ with
$$f'(x) = \frac{f(b) - f(a)}{b - a}.$$
PROOF: Let $a$ and $b$ be two points in the interval with $a < b$. We must prove that $f(a) < f(b)$.

Statement | Reason
1. There exists $x$ in $(a, b)$ with $f'(x) = \frac{f(b) - f(a)}{b - a}$. | 1. The Mean Value Theorem.
2. $f'(x) > 0$. | 2. By 1 and hypothesis.
3. $\frac{f(b) - f(a)}{b - a} > 0$. | 3. By 1 and 2.
4. $b - a > 0$. | 4. By choice of $a$ and $b$.
5. $f(b) > f(a)$. | 5. By 3 and 4.
Nils always tries to teach a course on a topic at the time that he is writing it up; it's
ridiculous to inflict your ramblings on the world unless you are prepared to do this, he
said.
Nils decided that since he had the whole book on-line, he would take a crack at publishing it
himself. He and his wife Karen set up the Tioga Publishing Company. One big advantage
of this cottage-industry approach is the ease with which the author can make changes
in subsequent editions. Karen went on to become a full-time publisher; Tioga's theme has
now changed from AI to nature and the environment. So Nils considers himself pretty well
"vertically integrated" in the world of books.
3. Read. Read a great deal; it'll sharpen your style and get your critical faculties
working.
4. Model the Reader. Déjà vu. This should be obvious, said Nils, but there's really
a lot to it. Ask yourself what the reader's primitives are, and write with them in mind.
In fact, the whole issue is so complex and important that Nils likes to operationalise it
with AI-type "dæmons." Any number of these have to be running in the background
as you write, catching errors and providing constructive criticism. You have to be
asking all the time: "How is the reader going to misunderstand me here?" You must
* "Easy reading is damned hard writing." - Nathaniel Hawthorne. (Just lucky to find
it. - DEK)
5. Master the Medium. You need a good vocabulary, though this needn't mean a
huge list of big words. There are issues other than pure language: indexes, tables,
graphs, and how to use them to best effect. As Don pointed out earlier, we can use
typography to make important distinctions, as with the typewriter font for logical
formulae.
In the future, said Nils, it's clear that reading and writing will be far more interactive
processes; the Media Lab is not all hype. It's not clear yet what will prove necessary or
useful; just as it took several centuries to invent the index, it will probably take us a long
time to identify the "stable points" offered by our new technology. We in the audience are
at the cutting edge of these experiments.
6. Master the Material. There's a lot of internal feedback involved in writing; one
comes to understand the material in a new way on trying to organise it for publication.
Nils drew this diagram:
[Diagram: Internal Model and Text, linked by an arrow labeled "writing".]
As Mary-Claire said on Wednesday, "How do I know what I mean until I hear what I say?"
Even Nils sometimes finds himself thinking "I don't believe that!" when he hears himself
lecture. I am reminded of the (true) story of a professor who was always seen to take a pad
of blank paper with him when he delivered a talk. When asked what it was for, he replied:
"Why, if I say anything good I'll want to write it down!" So go to lectures and classes,
give talks. All these things help modify your internal model and get things into shape.
"In a very real sense, the writer writes in order to teach himself, to understand
himself; the publishing of his ideas, though it brings gratifications, is a curious
anticlimax."
- Alfred Kazin
Ted Shortliffe did a great job with Mycin, Nils said. But with 20/20 hindsight he might
have done better to invent a simplified system for expository purposes. For example,
he could have demonstrated the backward-chaining techniques and only later dealt with
"certainty factors."
By using simple examples we can get ourselves on the winning side of the 80-20 rule: we
can convey 80% of the truth with only 20% of the difficulty. Mathematicians, of course,
like to go the other way: They never state a theorem in three dimensions if it can be
generalised to n. Such terse elegance can be painful for the reader.
8. Avoid Recycling. With on-line text and sophisticated editors (I refer to software,
not the mandarins behind Scientific American) it is very tempting to re-use portions
of old material. Resist the temptation. Almost certainly you are writing in a new
context, with a new emphasis. Hopefully you are older and wiser, and perhaps even a
better writer than you were when the old material was written. So do rewrite it; it's
worth the extra effort.
9. Aim for Excellence. You've got to keep shooting for perfection, even if you'll never
get there. What the Great have said on this:
"We are all apprentices in a craft where no-one ever becomes a master."
- Ernest Hemingway
"Ah, but a man's reach should exceed his grasp, or what's a Heaven for?"
- Robert Browning
"The message of these books is that, here in the 80s, 'good' is no longer good
enough. In today's business environment, 'good' is a word we use to describe
an employee whom we are about to transfer to a urinal-storage facility in the
Aleutian Islands. What we want, in our 80s business executive, is somebody
the title of her talk: "Calisthenics." Mary-Claire opened her talk by telling us a story.
Many years ago Mary-Claire was a frequent passenger on the Chicago bus system. The
neighborhood where she boarded her #5 bus was a gathering spot for "bummy guys."
All of these guys were interested in money: Some begged, others peddled. Among the
peddlers (hawking wares ranging from trenchcoats full of watches to freedom from the
peddler's presence) was a man whom Mary-Claire patronized quite regularly. He sold
pencil stubs (obviously collected from trash bins), but Mary-Claire said his patter was
charming enough to rate one or two purchases a week.
"These pencils are magic pencils," he would say. "Buy a magic pencil. Only 25
cents."
"What's a magic pencil?" would come the expected response.
"With this pencil, you can write the truth."
Inevitably, someone would pipe up, "But I can write lies with it."
"Oh, you can break the magic. But if you really believe, you can write the truth."
Mary-Claire sees this as the wonder and the motivation behind the craft of writing: If
you work hard, you can explain a new truth to someone you will never meet, perhaps to
someone who will live after you are dead.
Such a vocation requires preparation. The Composition Exercises that Mary-Claire has
given us (see § 36 below) were designed to help us become as strong as we can. Our readers
are more likely to be tolerant of a few weaknesses if they are surrounded and supported
by strength.
Mary-Claire has given these exercises to students before, but preparing this draft for
our class pushed her to really write the exercises. The copy that she referred to over
the TV monitors was slightly different from the copies that we have been given. Mary-Claire,
hoping that these differences represent improvements, invites us to suggest further
improvements to the draft. (She says that she might publish something that evolves from
this draft, but probably not soon: She is not a fast writer.)
The first set of exercises, labeled "Vocabulary," is designed to increase our command of
just that.
The first of the pair is an exercise that was done by little Greek boys: Taking a composition
and swapping all the old words (nouns, verbs, adjectives, and adverbs) for new ones. What
The second vocabulary exercise, the writing of a thesaurus entry, is best done over a
week. After several days of slowly adding to our set of synonyms, we should compare
our entry with an entry in our thesaurus. (Everyone needs at least one thesaurus and a
good unabridged dictionary. In addition to more than one kind of dictionary, Mary-Claire
recommends Sidney Landau's book Dictionaries to help us understand how best to use our
dictionaries.)
"Syntax," the next set of exercises, deals with syntactic mastery. Mary-Claire says that even
though people think about improving their vocabulary more often than about increasing
their command of syntax, improving our syntactic armory is more important. She says that
most of the time we will use our basic three to four thousand words; we must use them in
the most interesting way possible.
Speaking of using words in interesting ways, Mary-Claire has been reading the first draft
of our term papers. There must be room for some improvement there: Her first comment
was, "Nobody sits down to write a boring paper." How can we tell when something we
write is "syntactically impoverished"? She gathered some statistics that might help us get
the right idea.
One of her tricks was to study the first 10 complete sentences on the third page of every
paper. First she charted the average length of the 10 sentences: The averages varied from
15.6 to 24.4 words per sentence. Mary-Claire says that any of us with averages under
20 words per sentence are in the correct range for adult writing. (But the writer with the
24.4 average had better have pretty wonderful results, to compensate for the extra work
that it takes to read his paper.)
Sheer variation in sentence length is one indication of syntactic variation and appropriate
pacing. With 10 sentences we should be aiming for 9 or 10 different lengths. The samples
from our papers yielded 6 to 9 different lengths. The difference in word count between
the shortest and the longest sentence ranged from 17 to 37 words. The ideal chart of
sentence lengths should look like a bell curve centered around 15 to 18 words per
sentence.
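Mary-Claire's checks on the first ten sentences (average length, number of distinct lengths, and the gap between shortest and longest) are simple enough to automate. The following is a minimal Python sketch under stated assumptions: the function name `sentence_length_stats` is hypothetical, sentences are assumed to end at '.', '!' or '?' followed by whitespace, and a word is any whitespace-separated token. It is an illustration, not her method.

```python
import re
from statistics import mean


def sentence_length_stats(text: str, n: int = 10) -> dict:
    """Chart word counts for the first n sentences of text.

    Assumes a naive splitter: a sentence ends at '.', '!' or '?' followed by
    whitespace. Abbreviations such as "e.g." or "T. S. Eliot" will fool it.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences[:n]]
    if not lengths:
        raise ValueError("no sentences found")
    return {
        "lengths": lengths,                     # word count of each sentence
        "average": mean(lengths),               # under about 20 for adult prose
        "distinct": len(set(lengths)),          # aim for 9 or 10 out of 10
        "spread": max(lengths) - min(lengths),  # longest minus shortest
    }


if __name__ == "__main__":
    sample = ("Short. This one is a bit longer. "
              "And this sentence is the longest of the three.")
    print(sentence_length_stats(sample))
```

Run on a real page, the `lengths` list is the raw material for the bell-curve chart she describes; the naive splitter is the weakest part and would need care with abbreviations and initials.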
She asks us to note that we did not have enough short ("and punchy") sentences. A few
long sentences are also important. She said, "A well-constructed 46-word sentence is not
a difficult beast, but it had better not be your crucial point." We should remember
that we have a responsibility to emphasize and deemphasize our points to the reader; long
sentences are one method of deemphasizing a point.
Beyond the word counts, she looked at the templates used to construct our sentences.
For example, she found two writers who would appear to be similar if we just looked
at their sentence length average and variation, but who had quite different methods of
constructing their sentences. One of these writers used the same sentence construction for
almost every sentence (adverbial + subject + transitive-verb + object), and the other
used many different styles of construction. But the second writer was not free of flaws. He
had two sentences in a row with a full independent clause followed by a full parenthetical
Most writers are aware of how important the manual part of composition is: They have very
rigid restrictions on how they compose. ("Oh, I can only write on yellow pads with a
fountain pen.") Mary-Claire says we should be able to compose on a cocktail napkin.
While discussing the section labeled "Frozen sounds," Mary-Claire told us about reading
her students' own writing aloud to them. Some students were chagrined; others glowed.
She says we should form partnerships with other novice writers: Read and listen to each
other. But she cautions that a little goes a long way. If the writing is good, we can live on
that joy for quite a while; if the writing is bad, we won't be able to stand it for very long.
At this point in the lecture, Mary-Claire noticed that very few minutes remained. So her
comments on the final exercises were limited to those that she thought were the most
important.
Concerning the "Marks on paper" exercises, Mary-Claire quoted from E. M. Forster: "How
do I know what I mean till I see what I say?" We need to remember that writing is "the
most forgiving medium known to man." We can work on it until we get it right.
Rushing past the "Stance, voice, and tone" section, she told us that she borrowed
techniques from speech therapists, who ask patients to exaggerate their defects until they
understand just exactly what characterizes their defects. For example, she says, "If
anyone has ever told you that you are 'breezy,' write something truly off the wall."
She told us that the sections labeled "Observation," "Same as and different from," and
"Invention" are less important for us than for pure writers. Our discipline provides the
glue that writers with more freedom have to manufacture from scratch.
She warns us that the "Scansion" exercises are hard, but very important. She realizes that
she may have trouble convincing us that we need to write verse in order to learn to write
mathematics, but once again she says, "Trust me."
She reminded us that the "Précis" exercises were touched upon by Leslie Lamport in his
talk. At some point we cannot reduce the word count of a piece of prose without changing
the structure of that prose. (We should never change the meaning, but we will have
to dispense with some details.) This point comes at different percentages of decrease
depending on the flabbiness of the original text.
The final exercises she discussed, "Nearly real," are aptly named. They really are very
much like real writing. For instance, Mary-Claire says that "Writing a joke is exposition
at its purest. Things aren't funny unless they are well written."
She suggests that we try "Ben Franklin's exercise," rewriting a passage of someone else's
from memory and limited written hints, but that we try it with Don's writing. When we
have finished, what do we like better about Don's version? What do we like better about
our own?
Before the cameraman could shoo us out of the room, Mary-Claire reminded us once again
that these exercises are "very hard work." She closed with, "I hope they will serve you as
well as they have served me."
Composition Exercises
Unless you plan to do nothing else but composition exercises, there are enough here to last you for the
next decade. I had a wonderful student who did nearly all of them in a year, but he really did do nothing
else. Some of the exercises are quite deep, and you might easily be able to do them again and again in
different guises for the rest of your life. I've done all of them myself.
Many of these exercises tell you to take a passage of such-and-such a length and work some kind of
transformation on it. Whose passage should it be, your own or someone else's? Either; or rather, both.
You can learn different things by doing the exercise different ways. If a piece of your own writing is still
fresh enough in your mind so that you can remember what problems you were trying to solve as you
wrote it, the experience of working completely arbitrary changes on it can be exhilarating, not unlike
setting dollar bills on fire.
1. Replace.
Take a passage five pages long and replace at least three words in every sentence with others
that mean approximately the same thing. (Or: "Get into your hands a 1500-word portion of a
written work and swap out of every sentence a minimum of three words, substituting others
without changing what's being said." Or: "Select a longish section from something you've
been reading and change the vocabulary of every sentence without changing the
signification.")
2. Multiply.
Choose a word and write a thesaurus entry for it: all the words at every level of diction that
mean approximately the same thing. Compare your entry to the entries in which the word
actually appears in some real thesaurus.
Most writers like to have several good desk dictionaries and at least one good thesaurus. In addition,
you might like a book by Sidney Landau called Dictionaries. It has helped me understand how
dictionaries and thesauruses get made and thus how to use them better.
Syntax:
1. Transform.
Take a passage of five pages and transform every sentence so that it says approximately the
same thing in different syntax. Change the vocabulary as little as possible. (Or: "Taking a
five-page passage, transform every sentence to say an approximation of the same thing in
different syntax." Or: "Can you transform every sentence in a passage of five pages so that
approximately the same thing gets said in different syntax?")
Write a sentence containing a tight parallel: a pair of structures that match perfectly in the
number and kind of all their parts and subordinate structures. Push yourself till you can
construct sentences that contain tight parallels in which each member is fifteen or twenty
words long.
3. Be periodic.
4. Emphasize.
Recast a sentence so as to express the same meaning but emphasize a different point.
5. Write nonsense.
Write a paragraph of coherent, tightly structured nonsense: all the connectives and labels in
place but no meaning. (Made-up words not allowed.)
1. Copy I.
Copy out a passage of your own writing or someone else's with a pen; with a pencil; with a
crayon; first with your left hand and then with your right; with a manual typewriter; with a
word-processor.
2. Copy II.
Copy out a passage from something you like; from something you dislike; from something
you find difficult to read; from something you find laughably easy; from someone you'd like
to imitate; from someone completely unlike you; from something written a hundred years
ago; from something written last year; from something scrawled off in haste; from something
overwritten and finicky.
The first of these copying exercises is exploratory and interesting, and I certainly recommend that you do
it, but the second is in another class altogether. Copying as a means of close reading is an inexhaustible
source of information. Word-processing has temporarily confused writers about the connection between
their hands and their brains. What you do with your hands is the easy part. Use it to support the hard
part: what you do with your brain.
Frozen sounds:
1. Transcribe.
Make a tape recording of five minutes of radio news and transcribe it. Transcribe five
minutes of a public lecture; five minutes of dialog from a television show; five minutes of
ordinary conversation among three or four people. Talk extemporaneously into a tape
recorder for five minutes and transcribe that.
2. Listen.
Read aloud a page of your own writing. Ask a friend of yours to read it aloud. Ask a second
friend. Ask a stranger.
Learn to mutter aloud what you're writing as you write it. It's only a minor eccentricity, and there's no
more efficient way of checking for both cadence and tone.
Compose a paragraph with your eyes shut. Start over again from the beginning on a fresh
piece of paper as often as you like, but don't peek.
2. Tabulate.
Write a sentence with two conditionals ("If it rains and if Peter arrives on time ..."); rewrite it
Describe a picture in a single line that fits under the picture exactly; in two lines that fit
under the picture exactly.
1. Change stance.
nephew, for instance. Write a description of this morning's events as a letter to your spouse
or lover; rewrite the description as a letter to an old high-school teacher of yours; rewrite it
yet again as a report to an examining psychiatrist; to an anthropologist; to a police inspector;
to a reporter from People magazine.
2. Take both sides.
Write a vigorous, closely reasoned argument for some small household economy like
reusing plastic bags or turning mattresses; now write a vigorous, closely reasoned argument
against it.
3. Hyperbolize.
Describe your current dwelling as an unscrupulous realtor would; describe your most recent
meal in restaurant-menu prose; describe an object on your desk as if it were for sale by mail
order.
4. Euphemize.
Describe the symptoms of severe gastroenteritis accurately but without recourse to vivid
language. Describe human sexual intercourse in the diction of a knowledgeable prude.
Describe an employee's forced resignation for incompetence in language that attempts to
leave no opening for a libel suit.
5. Obfuscate.
Take a passage of simple prose and rewrite it so that the same ideas seem obscure and
difficult.
6. Pontificate.
Take a straightforward passage written in the first person and rewrite it so as to make the
author seem pompous and self-important.
Take an argumentative passage and inject it with doubts, quibbles, and hesitations.
8. Strengthen; vitiate.
Find a weak, flabby paragraph and rewrite it, inventing ideas and details where necessary, to
make it vigorous and strong. Now do the reverse: Find a strong paragraph and weaken it.
9. Change tone.
Look through a magazine or a newspaper for a sarcastic letter to the editor; rewrite it to
make the same point but without the sarcasm. Find a short factual piece; rewrite it as the
preamble to a petition asking for some action on the facts.
Observation:
2. Rethink.
Describe a favorite food by its appearance alone; describe only the sounds in the opening
credits for a movie; categorize and describe the objects on your desk by texture; by color.
3. Sensualize.
Choose an object and describe it by sight; by sound; by smell; by taste; by touch. Choose an
event from your daily life and describe it as a sensory experience.
4. Louis Agassiz's exercise.
Put a green leaf or a flower on a plate and describe it every day for two weeks. (The original
version used a fish and took two months.)
1. Compare.
Choose two unlike things and build a simile capturing some point of similarity between
them. Choose two similar things and explain how they differ.
2. Analogize.
Invent an extended analogy that would help an illiterate understand what a library is good
for; that would help a child understand getting fired from a job; that would help a city
dweller understand the agricultural year.
3. Differentiate.
Choose a ten- or fifteen-word entry in a thesaurus and explain how the words differ from
one another.
1. Combine words.
Choose two words at random from a dictionary and write a sentence that uses both of them;
choose three words at random and do the same.
2. Categorize.
Take twenty nouns at random from a dictionary and arrange them in categories. Write an
explanation of your scheme for arranging them.
Usually we don't need to do pure invention; we start from something, even if it's only "What I Did on My
Summer Vacation." Much of learning a discipline is learning how to do invention - how to recognize the
kinds of ideas that make that discipline go forward and how to get yourself into position to have such
ideas yourself.
Another part of learning a discipline is learning what you don't have to invent because it's already been
done for you. The forms for taking advantage of that backlog of ideas vary from one discipline to
another, but the underlying habits of thought are similar. The best set of exercises I have ever seen on
those habits is at the end of the section called "External Aids to Invention" in Edward P. J. Corbett's
Classical Rhetoric for the Modern Student.
Scansion:
1. Versify.
Render a newspaper story in couplets of iambic tetrameter; in triplets of dactylic hexameter.
Render a recipe into rhymes; into vers libre. Render an expository passage as a ballad. Render a
short argumentative passage as a villanelle.
2. Sonnetize.
Write a new sonnet every day for a week. (Be sure to throw these sonnets away.)
3. Explode.
Take a piece of metric verse and expand every line by one foot without altering the meaning,
from tetrameter to pentameter, for instance, or from pentameter to hexameter.
Verse-writing is to other composition exercises what lifting hundred-pound weights is to touching your
toes. But you must honor conventional rhyme and stress in order to get the benefit of writing verse;
otherwise you'll cheat yourself by writing near misses. For help on words like "villanelle" and
"hexameter," get a prosody handbook; get John Hollander's, and you'll find yourself reading it for fun.
1. Reduce.
Choose a passage and count the number of words in it. Reduce them by 5% without
changing the meaning; by a quarter; by half.
2. Abstract.
Nearly real:
1. Flip.
Take a paragraph and rewrite it so that the last sentence comes first and the first sentence
comes last. The middle will have to be completely rewritten, but try to change the first and
last sentences as little as possible.
2. Repace.
3. Crunch.
Write a 100-word description of some concrete physical object without using any adjectives
or adverbs; write a thousand-word description without any.
4. Unpack.
Take a metaphorical passage in either verse or prose and rewrite it as a series of flat-footed
prose comparisons.
5. Define.
Write dictionary definitions for a common word like "hand" or "mean" or "find." Compare
your definitions to those in several dictionaries.
6. Exemplify.
Choose an abstract noun like "precocity" or "fortitude" and describe three or more instances
of it. Impose an order on the instances and explain the order.
7. Explain.
Rewrite a simple sweater pattern to meet the needs of someone who has never knitted.
Rewrite a chocolate-cake recipe for someone who has never cooked. Rewrite directions on
how to set ignition points for someone who has never driven a car.
8. Instruct.
Write a set of instructions on how to draw some fairly complicated object without even
naming it or any of its parts: a Christmas tree with ornaments and a star on top, for
instance, or a house with a chimney, windows, doors, and foundation plantings.
10. Translate.
Buy a book in a language you don't know and a bilingual dictionary for the language.
Translate passages.
11. Ben Franklin's exercise.
Take a passage of someone else's, three or four pages long, and reduce it to a set of one- and
two-word hints to yourself about the contents, each written on a separate piece of paper.
Jumble the hints, put them in a box, and take them out again after three weeks. Arrange
them and reconstruct the passage. Compare your reconstruction to the original.
12. Push.
Write sentences at a deliberate pace for five minutes without repeating yourself, without
writing nonsense, and without stopping. Increase the time gradually till you can do this
exercise for twenty minutes.
Tropes:
In addition to doing all these exercises, my student and I also worked our way through Richard
Lanham's A Handlist of Rhetorical Terms, writing an example for every rhetorical figure listed.
Contrary to what I had expected, writing an example for every figure in Lanham turned out to be
quite shallow. I believe that merely thinking about our doing it will give you every bit as much
benefit as doing it yourself.
Books Mentioned
23 November 1987
Benjamin Franklin. The Autobiography.
Edward P. J. Corbett. Classical Rhetoric for the Modern Student. Oxford University Press, second edition, 1971.
John Hollander. Rhyme's Reason: A Guide to English Verse. Yale University Press, 1981.
Sidney I. Landau. Dictionaries: The Art and Craft of Lexicography. Charles Scribner's Sons, 1984.
Richard A. Lanham. A Handlist of Rhetorical Terms: A Guide for Students of English Literature. University of California Press, 1969.
During the whole of a dull, dark, and soundless day in the autumn of the year,
when the clouds hung oppressively low in the heavens, I had been passing alone,
on horseback, through a singularly dreary tract of country; and at length found
myself, as the shades of evening drew on, within view of the melancholy Terman
Engineering building.
- E. A. Poe (amended)
Don, like Mary-Claire, scans the pages of The New Yorker for choice malapropisms to
entertain us. In its columns the law firm of Choate, Hall, & Stewart had been rendered
as Choate, Hall, Ampersand, and Stewart, presumably by a journalist receiving dictation
over the telephone.
We also saw a splendid dangling participle from the same source:
"Flavor and texture of cooked okra are different from other vegetables. We usually
don't eat it raw, but in judging at fairs, I frequently taste a slice of a pod to check
maturity and condition. In soups, it is used as a thickening agent. When fried, I
love okra."
[When sober, can't stand the stuff. -The New Yorker]
Don announced that he had good news and bad news for us. He gave us the good news
first. Mary-Claire is to speak again on Wednesday. Also, Don finally got up the courage
to ask Paul Halmos to appear in our guest spot; he readily agreed and will speak next
Wednesday (9th December). This talk should be a fitting climax to the course. And a
week from today (Monday, 7th December) we will hear from Rosalie Sterner, a copy-editor
for The San Francisco Chronicle.
Having thus softened us up with these cheerful tidings, Don delivered the Bad News: The
first drafts of the term papers were, well . . . "their content was not one hundred percent
pleasing to your instructor." What makes a professor's life worthwhile? The knowledge
that he has succeeded in teaching something. In particular, there's a joy in the thought that
a student was able to do something that he couldn't have managed without the professor's
help. Don confessed that this joy did not run through him as he read our drafts. Indeed,
he could almost think that many of them were written before the class started. Have we
been relaxing too much, he wondered? Has our writing in fact changed at all? Have we
learnt nothing? Disturbing thoughts, he said.
Of the thirteen papers submitted, eleven were sprinkled with wicked whiches, at least two
in each. Don himself has been guilty of these in his time, and of course there is no-one like
a convert for rooting out heresy. But these are the 80s and we are supposed to be sensitised
to these things. And heaven knows, we've talked about this issue enough in class! So what
is he to think about this landslide of carelessness? Shaking his head, Don declared that
we left him with no alternative; he would have to resort to the ultimate sanction: a quiz.
In keeping with Honor Code protocols, Don left the room while we each wrote a
sentence that used a 'that' correctly where a 'which' would have been wrong, and another
Don returned. We spent a few minutes looking at the various examples that the class had
come up with, some correct and some incorrect. By and large, the class redeemed itself by
the creative solutions that were submitted:
All the students that know when to use 'which' and 'that' will pass the quiz. The
exam, which took place at the beginning of class, was not difficult.
A paper that uses two whiches improperly does not demonstrate that the author
hasn't learned anything. My first draft, which was written this summer, had a
million of them.
Beware of examples that are misleading. My term paper, which contains many
wicked whiches, is otherwise not too bad.
Is it not true, TLL asked of Mary-Claire, that people invariably get their whiches and
thats right when they speak? Mary-Claire replied that people almost never say 'which'
improperly in general speech; it's only when they feel under pressure that they resort to
this unnatural diction. So unnecessary use of 'which' really conveys a bad tone in your
writing; it makes you sound nervous. (Conversely, on paper we can often fool our audience
into thinking that we are a lot more comfortable than we really are.)
Don observed that all translations of the Bible are strewn with erroneous whiches. ("Thou
shalt not suffer a wicked which to live," he might have said.) A clamour of voices pointed
out that Fowler is quite clear about the rule. True, but it was never enforced until the late
iDs, Don countered. It seems particularly strange, he said, that the New English Bible
should commit this error, as its editors take great pride in the literary qualities of their
text. Mary-Claire resolved this mystery: Apparently our "oldest and closest allies" on that
far-off island regard this whole issue as unmitigated nonsense!
Don made a final plea to us: "You all keep your text online, so it's very very easy to locate
all your whiches and check them. Please don't cause your instructor any more pain on this
score!"
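Don's point that on-line text makes every "which" easy to find can be illustrated with a short Python sketch. It is an illustrative heuristic, not a tool mentioned in the lecture: it flags each line containing a "which" that does not follow a comma, an opening parenthesis, or a common preposition, on the assumption that such a "which" probably opens a restrictive clause where "that" is wanted. The function name `suspicious_whiches` and the preposition list are hypothetical. A question such as "Which is better?" will also be flagged, as will a "which" whose comma sits at the end of the previous line, so every hit still needs a human reading.

```python
import re

# Prepositions that commonly precede a correct restrictive 'which'
# ("the box in which it sits"). This list is an illustrative assumption.
PREPOSITIONS = {"about", "at", "by", "for", "from", "in", "into", "of",
                "on", "over", "through", "to", "under", "with"}


def suspicious_whiches(text: str) -> list:
    """Return (line number, line) pairs that may contain a wicked which.

    Heuristic only: a 'which' is skipped when it follows a comma or an
    opening parenthesis (likely nonrestrictive) or a preposition.
    Everything else is flagged for a human to check.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in re.finditer(r"\bwhich\b", line, flags=re.IGNORECASE):
            before = line[:match.start()].rstrip()
            if before.endswith((",", "(")):
                continue  # likely nonrestrictive, so 'which' is fine
            words = before.split()
            if words and words[-1].lower() in PREPOSITIONS:
                continue  # "in which", "of which", and so on
            hits.append((lineno, line))
            break  # one report per line is enough
    return hits


if __name__ == "__main__":
    draft = ("The paper which I wrote is long.\n"
             "My paper, which I wrote, is long.\n"
             "The box in which it sits is small.")
    for lineno, line in suspicious_whiches(draft):
        print(f"{lineno}: {line}")
```

On this sample only the first line is reported; the comma-preceded and preposition-preceded uses pass.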
Sneaky Don had saved one more item of good news to lighten our spirits after this
depressing interlude: A letter from Leonard Gillman, editor of the Sierpinski proof over which we
had laboured many moons ago. Professor Gillman was fulsome in his praise of our
suggestions, and is now working on an improved write-up. Particular credit went to Student B,
of course. Gillman became an Emeritus Professor three months ago; Don drools to think
of all the free time he must have.
We moved on. Don claimed to have discovered a new (?) rule only by seeing it broken in
three of the papers he read. It is this: The text should make sense if we read through it
omitting the titles of subsections. So, for example, don't say:
your article if you intended to use it. (The equals sign was invented by Robert Recorde
in The Whetstone of Witte (1557), but it did not come into general use until more than a
hundred years later. Descartes used '=' to mean something completely different.) The
moral: Ask yourself what background your readers share, and what they may or may not
have in common. "Be aware of what's diverse in your readership."
We saw a somewhat intimidating multi-part definition. It would become less formidable
to the reader if shortened. In this particular case, the expression
since W2 is used nowhere else. (In the Haddad-Schaffer paper, '*' means 'zero or more'
and '+' means 'one or more', but let's not worry about that here.) Try to be succinct,
said Don: "Less is more."
It is important to be consistent in your use of terms, and you need to be especially careful
about this when working with co-authors. In this paper, one writer talked about 'dominators'
and the other about 'parents', referring to the same concept. (Freudian slip?)
A related issue: Don't define terms that you never use. Don recalled Feynman's complaint
about New Maths: you are taught the symbols ∩ and ∪ in second grade, but you don't
use them in any nontrivial way for seven years.
Next came a tricky question of tenses. "Gabow and Tarjan[Gab83] show that for many
algorithms that had such a multiplicative factor in their worst-case complexities, the
multiplicative term can be removed." Here 'had' should be 'have'; an algorithm lives forever,
and its worst-case complexity is a timeless fact about it. However, the problem solved
by an algorithm can have different known complexities at different times; therefore 'had'
We talked about abbreviations for bibliographic references. Don didn't like the lack of
space before the bracket in " . . . Tarjan[Gab83] . . . "; neither does he like this kind of thing:
"In [Smith 80] it was shown . . . ". References should ideally be parenthetical; we should be
able to read the sentence ignoring them and still have it make sense (cf. subheadings). Some
citation styles write up names and dates in full, but this can get repetitious: " . . . Knuth
[Knuth83] has shown that . . . ". Don's paper on goto's was published first in ACM Computing
Surveys and later incorporated into a book. For this second printing he had to
make numerous changes to the sentences containing citations, because the originals would
look strange in the different context and format of the book. Oren Patashnik pointed out
that the Chicago Manual of Style recommends that you don't number references, lest you
have to make changes all through the text every time you insert a new one. This is less of
an issue when a system like TeX handles such things automatically, of course. The CMS
is full of such efficiency tips.
Too many commas can be a bad thing (bad things?). For example, consider this sentence:
"Our algorithm to recognize and label the graphs when given a directed graph, G, with
distinguished vertex s, can be summarized as follows." Remove the commas around 'G'
and put one after 'graphs'. As a rough guide, put a comma where a speaker would pause
to draw breath.
The word "loop" was ambiguous when first used; Don replaced it by "self-loop " .
A sentence in the paper began "If any Hi (j > 0 ) h as . . . " . In fact i t was known that Ho
satisfies the stated condition, so Don suggested that the authors simplify the statement by
omitting the j > 0 condition. Moral: Give a simple rule rather than an optimal one.
Elsewhere we saw " . . . all the Hi 's . . .". This is of course the standard way to form
the plural of a symbol, but you are going to get into trouble when you start also using
the construct Hj (that is, Hi primed) . A simple way to avoid the problem is just to
say: " . . . each Hi is . . , " . Alternatively, you might want to invent another name for the
concept , particularly if you are going to be using it time and time again. It's just not
elegant to have too many symbols crowded on the page. At one point the authors wrote
" . . . of Hi S descendants . . . " . This doesn't work at all; you do need an apostrophe for the
genitive (possessive) case.
Instead of " . . . the one vertex path . . . " , write " . . . the one-vertex path . . . " .
The preposition 'at' would be better than 'of' in " . . . vertex of distance < d . . . ".
Some authors have a disconcerting habit of using a lemma or theorem that is not proved
until later on in the book. This can leave the reader wondering whether someone hasn't
We have no spoken evidence from the 17th century, but Language Theorists believe that
writing and speech were very far apart. That is, they believe that no one's ideal was
to write the way that he sounded. Theorists cite two pieces of evidence to support this
claim: The first is that the Theorists themselves find it difficult to believe that, in the last
three centuries, spoken English has evolved as fast as it must have if the written language
and the spoken language originally matched. The second piece of evidence comes from
examining extant 17th century guidelines on writing or speaking effectively.
By examining samples of writing from the 19th century (particularly the everyday writing
that was used for communication rather than as examples of great literature), we can see
that the written language has evolved into one much closer to the spoken language. Language
Theoreticians of that time said that this evolution was good, but their admonitions
came after the direction of evolution was already evident. (We should remember that our
language belongs to millions of people. It cannot be controlled by the decrees of any one
person or group.) We now move on to our own century.
In 1906 H. W. Fowler and F. G. Fowler published The King's English (Oxford University
Press still has it in print). Here the brothers Fowler write down for the first time that
conversational rhythms are to be reflected in written English.
In 1926 H. W. Fowler published Modern English Usage (also still available from Oxford
University Press). In this book the surviving brother continues the explanation of the
relationship between spoken and written English, but he does so much more clearly.
While we are following the hot trail of our current subject, we should not lose sight of the
vast range of the contributions that Fowler made in this landmark book. Mary-Claire calls
Fowler the "great theoretician of the semicolon." Fowler saw the semicolon, which has no
spoken equivalent, as a structuring device that operates between the levels of the sentence
and the paragraph. This is just one example of how Fowler tried to utilize the graphical
nature of print to the advantage of written English.
Returning to the evolution of written English toward spoken English, let's examine how
people use which and that when they talk.
Speakers do not use which as a relative pronoun because speakers do not normally express
thoughts that are long enough to contain non-restrictive clauses: our spoken sentences
are shorter than our written ones. People do use which when they talk, but they use
non-referential whiches to introduce new thoughts that are tacked on to old thoughts.
Examples of this kind of usage seem strange when written down (because we don't use
non-referential whiches in written English), but they sound perfectly normal when heard:
I went sailing this weekend; which tells you why my nose is pink.
Fowler realized that written English would sound more like speech if the choice of relative
pronoun was uniquely determined by whether or not the clause it introduced was restrictive
or non-restrictive. He wrote several thousand words on this subject; here are a few of them:
A supposed, and misleading, distinction is that 'that' is the colloquial and 'which'
the literary relative. That is a false inference from an actual but misinterpreted
fact. It is a fact that the proportion of 'that's to 'which's is far higher in speech
than in writing; but the reason is not that the spoken 'that's are properly converted
into written 'which's. It is that the kind of clause properly begun with
'which' is rare in speech with its short detached sentences, but very common in
the more complex and continuous structure of writing, while the kind properly
begun with 'that' is equally necessary in both. This false inference, however,
tends to verify itself by persuading the writers who follow rules of thumb actually
to change the original 'that' of their thoughts into a 'which' for presentation in
print.
The two kinds of relative clause, to one of which 'that' and to the other of which
'which' is appropriate, are the defining and the non-defining; and if writers would
agree to regard 'that' as the defining relative pronoun, and 'which' as the non-defining,
there would be much gain both in lucidity and in ease. Some there
are who follow this principle now; but it would be idle to pretend that it is the
practice either of most or of the best writers.
There is no doubt that Fowler has had a significant influence on the English language. But
why is it that his effect on American English has been greater than on British English?
To answer that question, we move our focus to New York in the year 1925: Harold Ross
has just founded the New Yorker magazine.
Ross was a man who liked things to be clearly defined. He took Modern English Usage as
gospel. For decades the New Yorker had reliably influential prose, and for decades H. W.
Fowler's dictums were applied blindly to that prose. Mary-Claire was nearly nonplused as
she mentioned reading a collection of letters from a New Yorker editor to various literary
luminaries. ("I'm sorry, but we had to change all your whiches to thats," sounds rather
presumptuous when addressed to John Updike.) The New Yorker no longer treats Fowler
as divinely inspired, and they haven't since the 1950s, but that leaves close to three decades
of blind obedience to consider.
The lawn mower that is broken is in the garage. (Tells which one.)
The lawn mower, which is broken, is in the garage. (Adds a fact about
the only mower in question.)
Mary-Claire has a copy of the first edition of The Elements of Style, in which White uses
a 'which' for a 'that' (this has been changed in later editions). The line originally read:
The Elements of Style has many departures from guidelines presented by Fowler. It was
written for the American audience, and it was written for an audience without a high level
of grammatical sophistication. In contrast, as Mary-Claire said, "Fowler is rough going
for those of us whose Latin is weak and whose Greek is non-existent." Future editions of
Fowler may need prefaces explaining what adjectives, adverbs, and the like are. It is most
common for people to learn those terms when they learn their first non-native language
(though Latin is the only language to which the terms are perfectly suited).
Fortunately for native English speakers, there is a rule completely lacking in jargon that
we can use to determine whether a 'which' should be a 'that':
Mary-Claire attributed this rule to Leslie Lamport. Leslie says that his version of the rule
is actually:
If it sounds all right to replace a 'which' by a 'that', then Strunk & White say
replace it.
"Which and that are not in themselves very important. But tone is important,
and tone consists entirely of making these tiny, tiny choices. If you make enough
of them wrong (choices like which versus that), then you won't get your
maximum readership. The reader who has to read the stuff will go on reading
it, but with less attention, less commitment than you want. And the reader who
doesn't have to read will stop."
First, she reminded us of "Franklin's Exercise" from her previous lecture. She read us a
passage from The Autobiography of Benjamin Franklin where Franklin mentions it:
A question was once, somehow or other, started between Collins and me, of the
propriety of educating the female sex in learning, and their abilities for study.
He was of opinion that it was improper, and that they were naturally unequal to
it. I took the contrary side, perhaps a little for dispute's sake. He was naturally
more eloquent, had a ready plenty of words; and sometimes, as I thought, bore
me down more by his fluency than by the strength of his reasons. As we parted
without settling the point, and were not to see one another again for some time,
I sat down to put my arguments in writing, which I copied fair and sent to him.
He answered, and I replied. Three or four letters of a side had passed, when
my father happened to find my papers and read them. Without entering into
the discussion, he took occasion to talk to me about the manner of my writing;
observed that, though I had the advantage of my antagonist in correct spelling
and pointing (which I ow'd to the printing-house), I fell far short in elegance of
expression, in method and in perspicuity, of which he convinced me by several
instances. I saw the justice of his remarks, and thence grew more attentive to the
manner in writing, and determined to endeavor an improvement.
About this time I met with an odd volume of the Spectator. It was the third.
I had never before seen any of them. I bought it, read it over and over, and was
much delighted with it. I thought the writing excellent, and wished, if possible,
to imitate it. With this view I took some of the papers, and, making short hints
of the sentiment in each sentence, laid them by a few days, and then, without
looking at the book, try'd to compleat the papers again, by expressing each
hinted sentiment at length, and as fully as it had been expressed before, in any
suitable words that should come to hand. Then I compared my Spectator with
the original, discovered some of my faults, and corrected them. But I found I
wanted a stock of words, or a readiness in recollecting and using them, which I
thought I should have acquired before that time if I had gone on making verses;
since the continual occasion for words of the same import, but of different length,
to suit the measure, or of different sound for rhyme, would have laid me under
a constant necessity of searching for variety, and also have tended to fix that
variety in my mind, and make me master of it. Therefore I took some of the tales
and turned them into verse; and, after a time, when I had pretty well forgotten
the prose, turned them back again. I also sometimes jumbled my collections of
hints into confusion, and after some weeks endeavored to reduce them into the
best order, before I began to form the full sentences and compleat the paper.
This was to teach me method in the arrangement of thoughts. By comparing my
work afterwards with the original, I discovered many faults and amended them;
but I sometimes had the pleasure of fancying that, in certain particulars of small
Next, she reminded us of the verse-writing exercises that she had so highly recommended
during the previous lecture. She showed us the book that she and her student, Steven
Astrachan, had worked through together: A Prosody Handbook, by Shapiro and Beum.
She said what would have been even better was Rhyme's Reason by John Hollander.
Mary-Claire said we should all buy Hollander's book, and then she tried to make sure we
would by playing on the Computer Scientist's love of self-reference. This is Hollander's
description of one particular poetic form:
Mary-Claire told us that she once wrote out a recipe for making bagels in Alexandrine
couplets. It was a good exercise, and it was hard. She says that it was so hard that she
actually began to believe that the results would be intelligible (and interesting) to someone
else. She sent the recipe off to a food magazine and received "a truly astounded letter of
rejection." She cautioned us again that the verse exercises, useful as they are, "really are
only exercises."
The final book that she showed us was A Handlist of Rhetorical Terms, by Richard A.
Lanham. She said Lanham is the source for many of the great words she dazzles people
with. Some of the terms in Lanham's book are more useful than others; there are some
terms in the book that can only be represented in Greek syllabic verse.
Mary-Claire and Steven wrote out examples of each term in Lanham's book. Having
performed the exercise, Mary-Claire confidently told us that it was not profitable. She
"All the officer patients in the ward were forced to censor letters written by all
the enlisted-men patients, who were kept in residence in wards of their own. It
was a monotonous job, and Yossarian was disappointed to learn that the lives of
enlisted men were only slightly more interesting than the lives of officers. After
the first day he had no curiosity at all. To break the monotony, he invented
games. Death to all modifiers, he declared one day, and out of every letter that
passed through his hands went every adverb and every adjective. The next day he
made war on articles. He reached a much higher plane of creativity the following
day when he blacked everything in the letters but a, an and the. That erected
more dynamic intralinear tensions, he felt, and in just about every case left a
message far more universal."
- from Catch-22 by Joseph Heller
Don rewarded today's early birds with the chance to participate in a referendum. We
voted to decide the due-date for the term papers, Monday 14th or Wednesday 16th. UN
observers were not surprised to find the latter date was favoured by the populace; the only
surprise was that the vote was not quite unanimous. Very well then, said Don: All papers
to be handed to him, his secretary, or the TAs by 5pm (Pacific Standard Time) on Wednesday
16th December. (The real early birds were rewarded with some cookies that Sherry was
handing around. And very good they were too.)
It was of course too much to hope that we could get through the whole of a CS class
without computers rearing their ugly heads; today they did. Don's topic was computer
programs that are supposed to help us with our writing. Two such programs, style and diction,
are available on Navajo (a CSD Unix machine). These are relatively old programs. State-of-the-art
systems cost a lot of money, and so naturally Stanford doesn't have them. There
is a program called sexist, for example, which attempts to alert us to controversial word
usage. Don recalled the occasion when the (London) Times quoted him as saying that it
wasn't appropriate to talk about 'mother and daughter' nodes in a tree structure, and he
received a lot of irate mail as a result. People seem to be less uptight about such things
these days, he said.
The style program takes a piece of text and scores it according to 'readability'. The
analysis is very superficial, way below the level of human critiquing. However, said Don,
these programs are kind of fun. And they do provide an excuse to read the document
from another point of view. Even if the analysis is wrong it does prompt you to re-read
your prose, and this has to be a good thing. Don recalled Richard Feynman's anecdote
about his first day at Oak Ridge Laboratories: Having no idea what he was supposed to
be doing, Feynman pointed to a random symbol in the blueprints and said, "What about
this then?" A technician immediately agreed that Feynman had spotted a significant and
potentially dangerous oversight in the design.
How were α, β, and γ determined? The authors of each readability index simply look at
a large number of pieces of writing and assign them a grade level 'by eye', that is, they
estimate the age of the intended reader. Each piece of text is then characterised by three
real numbers: the average number of words per sentence, the average number of syllables
per word, and the subjective grade level. So each piece determines a single point in 3-space
(plotted against three orthogonal axes); the set of pieces determines a scatter of points in
3-space. Standard linear regression techniques are used to find the plane that is the "best
fit" for these points. The three parameters above define this plane.
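The fitting step just described can be sketched in a few lines. This sketch assumes the plane has the form grade = α·w + β·s + γ, where w is the average number of words per sentence and s the average number of syllables per word; the notes here do not reproduce any index's actual formula or coefficients, and the function name fit_readability_plane is hypothetical. It performs ordinary least squares (minimizing squared error in the grade direction) by solving the 3×3 normal equations directly, so it needs nothing beyond the standard library.

```python
from typing import List, Tuple

# One judged text: (words per sentence, syllables per word, grade assigned 'by eye').
Sample = Tuple[float, float, float]

def fit_readability_plane(samples: List[Sample]) -> Tuple[float, float, float]:
    """Least-squares fit of grade ~ alpha*w + beta*s + gamma (assumed linear form).

    Builds the normal equations (X^T X) p = X^T y, where each row of X is
    (w, s, 1), and solves them by Gaussian elimination with partial pivoting.
    """
    if len(samples) < 3:
        raise ValueError("need at least three samples to determine a plane")
    a = [[0.0] * 3 for _ in range(3)]  # accumulates X^T X
    b = [0.0] * 3                      # accumulates X^T y
    for w, s, grade in samples:
        row = (w, s, 1.0)
        for i in range(3):
            b[i] += row[i] * grade
            for j in range(3):
                a[i][j] += row[i] * row[j]
    # Forward elimination, swapping in the largest pivot for numerical stability.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("samples are degenerate (they do not span a plane)")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 3):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution yields (alpha, beta, gamma).
    p = [0.0] * 3
    for i in (2, 1, 0):
        p[i] = (b[i] - sum(a[i][j] * p[j] for j in range(i + 1, 3))) / a[i][i]
    return p[0], p[1], p[2]
```

With made-up samples that lie exactly on the plane grade = 0.4w + 12s − 16, for instance (10, 1.3, 3.6), (20, 1.5, 10.0), (15, 1.7, 10.4), and (25, 1.4, 10.8), the fit recovers α ≈ 0.4, β ≈ 12, γ ≈ −16. Real judged samples scatter around the plane, and the fit then returns the plane that minimizes the total squared error in grade.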
Someone asked whether we should be shooting for some specific grade level, and if so, what
level? Don replied that his usual aim is to minimise the level, although overdoing this will
defeat the purpose.
In addition to the raw scores, a variety of other parameters come out of a style analysis:
average length of sentences, percentage of sentences that are much shorter or longer than
the average, percentage of sentences that begin with various parts of speech, etc. The
program also attempts to classify sentences into types and tabulate their frequencies, as well
as telling us the percentages of nouns, adjectives, verbs (active or passive), etc. A sentence
is considered "passive" if a passive verb appears in it anywhere, even in a subclause.
Curiously, style classifies any sentence that begins 'It . . . ' or 'There . . . ' as an "expletive."
This seems a little strange to those of us who are old enough to remember Watergate. We
always thought that it was quite a different class of words that the transcribers of Tricky
Dicky's tapes felt the need to delete.
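A few of the statistics described above are easy to approximate, including the literal "It . . . / There . . ." expletive rule. This is a rough sketch, not the style program's code. It assumes a naive sentence splitter and whitespace word counts. It also treats "much shorter or longer than the average" as at least `margin` words away from the mean; `margin` is a hypothetical parameter, since the actual threshold style uses isn't given here.

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class SentenceStats:
    average_length: float     # mean words per sentence
    percent_short: float      # sentences at least `margin` words below the mean
    percent_long: float       # sentences at least `margin` words above the mean
    percent_expletive: float  # sentences opening with the word "It" or "There"

def split_sentences(text: str) -> List[str]:
    # Naive splitter: a sentence ends at '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def sentence_stats(text: str, margin: float = 10) -> SentenceStats:
    sentences = split_sentences(text)
    if not sentences:
        raise ValueError("no sentences found")
    lengths = [len(s.split()) for s in sentences]
    n = len(sentences)
    avg = sum(lengths) / n
    short = sum(1 for k in lengths if k <= avg - margin)
    long_ = sum(1 for k in lengths if k >= avg + margin)
    # The literal rule: any sentence beginning "It ..." or "There ..." counts as an
    # expletive; \b keeps words such as "Italy" or "Therefore" from matching.
    expletive = sum(1 for s in sentences if re.match(r"(It|There)\b", s))
    return SentenceStats(avg, 100 * short / n, 100 * long_ / n, 100 * expletive / n)
```

For "It is raining. There are clouds over the hill today. We stay home." the average is 13/3 words, and two of the three sentences are classified as expletives. That illustrates how blunt the rule is.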
Don's theological piece stood out as being pitched at a significantly lower grade level than
the other specimens. He was initially surprised by this, and double-checked the data
to make sure there was no mistake. But on reflection he concluded that we usually write
more obscurely when writing about our own field. The two versions of his binomial chapter
had very similar scores, despite their having been written twenty years apart. Church's
piece scored high. Don said that the statistics were misleading here; although Church's
sentences are quite long, they are not ugly but musical. Still they were not a special joy
for the reader.
The style output also noted a lot of passive voice in Church (perhaps not surprising in a
A companion program called diction operates on different lines. It has an internal
dictionary of 450 words and phrases that it deems 'questionable' and flags them, inviting the
writer to find an alternative way to express himself. For example, diction doesn't like
the word 'gratuitous', and flags its use as an error. Neither does it like the phrases
'number of' or 'due to'. Don noted that copy editors generally prefer 'because of' to 'due to'
in ordinary writing, and perhaps diction is overlooking the mathematical usage: "This
theorem, due to Cauchy, is used . . . ". In Don's book TeX: The Program, the copy-editor
changed all Don's 'due to's to 'owing to'; Don changed them all back again. But he
searched unsuccessfully for a reference to the mathematical usage in his dictionaries, so
he wondered aloud if he was completely out of line with the rest of the world. The class
unanimously reassured him that 'due to' was quite the elegant way to give credit for a
scientific innovation. Lexicographers are out of touch here.
The word 'very' is also on diction's list of suspects. Don recalled that someone had once
advised him thus: "Try changing all your 'very's to 'damn's and see what results. Don't
use 'very' unless you would happily use 'damn' in its place." Damn good advice!
The diction filter also objected to 'literally' and 'in fact', but partially redeemed itself by
catching a wicked 'which'. A sister program, explain, expands on diction's objections
and recommends improvements. For example, explain suggests that we write 'if' instead
of 'assuming that', and 'really' instead of 'actually'. In practice, users reportedly accept
about 50% of diction's suggestions. And that's as it should be: we've got to keep these
machines in their place.
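The core of a diction-style checker is a scan of the text for entries in a fixed phrase list. The sketch below uses a tiny hypothetical stand-in for the real 450-entry dictionary. It is seeded with phrases mentioned above and, where these notes give them, with explain's suggested replacements. The table contents, the name flag_phrases, and the matching rules (whole words, case-insensitive) are assumptions, not the actual program's behaviour.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical miniature phrase table (the real diction list has about 450 entries).
# Values are explain-style replacements; None means "flag it, but offer no suggestion".
QUESTIONABLE: Dict[str, Optional[str]] = {
    "assuming that": "if",
    "actually": "really",
    "due to": None,
    "number of": None,
    "gratuitous": None,
    "very": None,
    "literally": None,
    "in fact": None,
}

Hit = Tuple[int, str, Optional[str]]  # (character offset, matched text, suggestion)

def flag_phrases(text: str, table: Dict[str, Optional[str]] = QUESTIONABLE) -> List[Hit]:
    """Return every occurrence of a listed phrase, ordered by position in the text."""
    hits: List[Hit] = []
    for phrase, suggestion in table.items():
        # Whole-word, case-insensitive match; a space in a phrase matches any whitespace run.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in phrase.split()) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.start(), m.group(0), suggestion))
    hits.sort(key=lambda h: h[0])
    return hits

if __name__ == "__main__":
    sample = "This theorem, due to Cauchy, is very useful, assuming that x > 0."
    for offset, found, suggestion in flag_phrases(sample):
        hint = f"try '{suggestion}'" if suggestion else "consider rephrasing"
        print(f"{offset:3d}: '{found}' ({hint})")
```

On the sample sentence the sketch flags 'due to', 'very', and 'assuming that' (suggesting 'if'). Like diction itself, it cannot tell the mathematical "due to Cauchy" from the everyday usage, so the first hit is exactly the kind of suggestion a writer should feel free to reject.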
Today we heard from our penultimate guest speaker, Rosalie Sterner. Rosalie is a wire
features editor at the San Francisco Chronicle, teaches copy editing at Berkeley, and has
worked as a copy editor for the San Francisco Chronicle, the Kansas City Star, and Chicago
Daily News. So she wields an ultimate pen.
It's a sad truth, Rosalie said, that people who should be able to write well often can't. She
illustrated with a newspaper headline:
and a story that began: "Doing what he loved best, golf pro John Smith died while . . . " .
She told us about the occasion when a newspaper was having trouble fitting the word
'psychiatrist' into a headline, and resolved the problem simply by writing 'dentist' instead.
Rosalie went through a story filed by an experienced journalist, pointing out its good and
bad features and the changes she had made.
"Nine out of ten books bought in this century by the U.S. Library of Congress, one
of the great research libraries of the world, will self-destruct in ao to 50 years. "
the largest paper manufacturers . . . " . Whenever you see "he has heard" you can often
improve or delete it, Rosalie said. A similar case: "The reason for removing the spaces
from the list is that . . . " can be (better) written "We remove the spaces because . . . ".
Rosalie conceded that good writing is very difficult. We must strive to be clear, coherent,
accurate, and concise. This last is especially important, she said, and quoted Pascal:
"I have made this letter longer than usual because I lack the time to make it shorter."
Rosalie was pleased to note that the first drafts of our term papers were quite a bit better
than something else she had read recently: a report by a local software company. After
just a few weeks we are pushing out the envelope of Silicon Valley literacy! But many of our
sentences could be improved, she said, by cutting them shorter. Out with the semicolons,
in with the periods. Don't write one long sentence if you can say the same thing in two
short ones. A semicolon should be used only where the separated clauses have a very
close relationship, and even there a period is often better. She quoted William Zinsser
in his book On Writing Well as saying "The semicolon all too easily conveys 'a certain
Don started class by introducing Paul Halmos. Paul is a distinguished author, a professor
of mathematics at the University of Santa Clara, and a spicy and entertaining lecturer. As
Don said, "He brings our program of guest speakers to a triumphant conclusion."
Paul started his lecture by wondering why we had called him here. "I don't have anything
new to say," he said. "What I had to say has already been majorized by Don and
Mary-Claire." He said that even the act of talking about mathematical writing was difficult, by
comparison with the act of talking about mathematics itself. We don't have to remember
much about math, because we know its structure; we can develop and discover the material
as we talk about it. The structure of mathematical writing is much more elusive, so how
do we know what to say about it? Sure, Paul brought several pages of prepared notes to
class, but he claims that even those won't help him much.
Not that the subject of mathematical writing isn't important. Some mathematicians have
disdain for anything other than great theorems. "Anything else is beneath them." But
they are wrong. Mathematicians who merely think great theorems have no more done
their job than painters who merely think great paintings.
Paul has read our handouts, and he wants to make a few comments. He wants to have a
dialog with us; he admonished us to break in whenever we feel the urge.
He is going to drift in and out of many different topics but only after he has given us an
anchor and a rough outline. The anchor? Two basic rules:
Do organize material.
Do not distract the reader.
Turning first to Semantics, Paul spoke to us about the natural process of change inherent
in language and how it affects our word usage. Some changes are good; some changes are
bad. According to Paul, one of the most often discussed symptoms of that change is the
word 'hopefully'.
The most recent literary tradition, handed down to us by our grandparents, tells us that
'hopefully' means the exact opposite of 'hopelessly' :
"I don't have a chance in the world to be promoted," he growled hopelessly.
"My chances look good," his colleague grinned hopefully.
But another, impersonal use of 'hopefully' has become popular: an evasive form in which
one can say "Hopefully he won't be re-elected" instead of "I hope he won't." This conflicts
with the normal usage of other words that can end both -fully and -lessly. Although we
may think that interest rates will rise, we don't say "Thoughtfully interest rates will rise."
as much as it prohibits
Paul doesn't like the new usage, which he calls "illogical and ugly." The mere fact of change
is bothersome. But he realizes that his is only one vote, and he seems to be outnumbered.
On balance, it is perhaps a good change, one that might even make communication easier.
"The English language won't collapse if the other side wins." In fact, Paul says,
Paul sees other changes as needless and careless. It grates on his ears when he hears,
Of course some would say, Why do we need to reserve a special word for the random
destruction of one out of ten? Paul thinks muddying the meaning of the word is bad, but
he admits that it is harmless.
Other unneeded, and harmful, obfuscations should be discouraged. 'Imply' does not mean
'infer', and 'disinterested' does not mean 'uninterested'. To confuse these words is to lose
valuable distinctions. Tragically, the differences between these words are becoming so
confused that if we are writing for a large audience, and if we need to make use of the
distinctions, we probably shouldn't.
Evidence of bad changes can even be found in our handouts. In §4, one of the TAs (not the
one with the charming British accent) used 'reference' as a verb. Paul's response: "There
is no such verb, and if there were, it sure as hell wouldn't be transitive." How would it
sound to say "I quotationed the author"?
Barry Hayes pointed out that in Computer Science, 'reference' is a technical term used
as a verb. Technical terms like 'majorize' sometimes creep into our vocabularies. Don
supported him by saying that computer programmers "reference and de-reference things
all the time." Paul's response: "My condolences. You know, the French say English is
ruining their language. How the French feel about English is how I feel about that."
We moved on to Syntax. "Obviously," said Paul, "people approve of it; nobody uses
ungrammatical English on purpose." Syntax changes more slowly than semantics. However,
he once heard the following lovely sentence:
This has rhythm, it's communicative, it's personal; but of course it's not grammatical
English. Therefore it distracts the reader from what is actually being said. Here's another
non-made-up example:
Similarly, we say
He is the President of France,
but never
Him is the President of France.
Therefore we would not logically ask,
Whom is the President of France?
Simple, right? Well, there are more confusing cases too:
I don't know who is the President of France.
Or should it be 'I don't know whom is the President of France'? A grammatical push-pull
is involved here. (The nominative wins, and 'Who' is correct.)
Paul would like to stamp out abuses such as 'I hate whomever said that'. An attention to
logical rules of grammar helps us to clarify our own thinking in general.
Taking issue with part of our first handout, Paul says the rule "A preposition is a bad
word to end a sentence with" is "reactionary grammarian balderdash". Consider:
Palo Alto is a good place to live in.
Don Knuth is fun to have a drink with.
There aren't many people I would say that to.
All of these are examples of prepositions in "post position" that could only be ruined by
being made grammatically pure. (We have all heard Winston Churchill's famous statement
about "the sort of nonsense up with which I will not put.") Why should we do gymnastics
for sentences with only one preposition at the end? Paul gave us a famous sentence ending
with five prepositions:
What did you want to bring that book I didn't want to be read to out of up for?
See the section on "Quotations," which may be found elsewhere in this volume.
Paul was incensed. "Horrors," he said. "You see the illogic, don't you? There's no
reason for it. It's not a grammatical convention; it's a totally arbitrary typographer's
convention. The battle against this sort of stupidity can be won." He has succeeded in
getting his own books punctuated logically. Bob Floyd gave support by mentioning how
deadly such conventions are in a book about computer programming.
But then Don remarked that one of Paul's two main points was not to distract the reader.
Paul said, "And your implied, snide, argument?" "Well," said Don, "I guess I'm implying
that you think you're distracting only the copy editors and not the readers." "Yes, I believe
that's right, with respect to commas and quotation marks."
Mary-Claire asked, "Just how far are you willing to go in the direction of logic? Are you
willing to place periods outside the quotation marks in actual dialog that already has its
own punctuation?". Her example:
He said, "No.".
Paul said that if you push him in a corner he might go so far as to say "Yes.". And
Mary-Claire responded, "That's what I thought. Luckily there's not much dialog in the
sort of stuff you write.". (Paul conceded that he doesn't really have an ear for dialog and
doesn't have immediate plans to break into the world of fiction. He would love to write
a novel, some piece of literature that isn't expository, but he's not being held back by an
inability to punctuate.)
The second Symbols-related point that Paul wanted to bring to our attention was the
subject of written versus symbolic numerals. He gave us an example where 'one' could
either be a pronoun or a numeral, depending on the context:
What are we to do when x is one?
The sentence preceding that one may have been
The solutions of the equation are the singularities of the
function we are studying.
Or it may have been
Everything is clear when x is 2 or greater.
Another example (this time from Birkhoff & MacLane's classic text):
The first few positive primes are
2, 3, 5, 7, 11, . . . .
Any positive integer which is not one
He urges us to remove such ambiguities by using '1' when we want to speak of the numeral.
Rule #6 in § 1 suggests that we use 'we' to avoid passive voice. This use of 'we' is equivalent
to "the reader and I". Paul says that even better is to avoid both passive voice and the
use of 'we' through judicious use of imperative and indicative moods along with an outlying
kind of non-sentential phrase. For example,
becomes
Or,
Consequence: A implies B.
The latter technique can occasionally be used in a sequential manner,
Consequence 1: X. Consequence 2: Y.
Conclusion: Z.
Alternatively, here's an example of imperative mood:
Just say
Replace x by 7 throughout.
Paul finds this less distracting. Using 'we' is not a crime, but it adds an irrelevant dimension
that can often be replaced by something clearer and smoother.
He gave a lengthier example of a typical passage that shows how both 'we' and passive
voice can be avoided without sounding artificial:
(The example would be more effective, of course, if the (something)s were replaced by
meaty concepts, but that would distract us from the point at issue.)
* Passive, God forgive me, or at least not active; but this phrase is standard and
inoffensive to my ear.
Paul. The material is in linear order, but organization means much more than that. "The
plot of an exposition is rarely a straight line." Branches and alternative threads must be
woven together. Paul says he spends most of his writing time working on organization of the
material. He suggests that we look at Roget's Thesaurus, an encyclopedia, a do-it-yourself
article, and a good textbook, for increasingly complex examples of non-linearly-organized
presentations.
"Do organize," and "Do not distract." Except that all rules are made to be broken. When
you want to jar your readers, Paul suggests that you distract them by changing your
notation, screaming ungrammatical sentences, or being awkwardly repetitious.
His final words to the class were, "Anything that helps communication is good. Anything
that hurts is bad. And that's all I have to say."
The final lecture of CS 209 was partially devoted to course evaluation. (We were, no doubt,
harsh but fair.) Don told us that we would spend the last 40 minutes of class looking at
the notes of people who have been going over our handouts but haven't had a chance to
speak. (More course evaluations, perhaps?) Don said that he wanted to "end on a note of
honesty and truth."
The first comments that he addressed were from Nelson Blachman (father of course member
Nancy Blachman). Nelson is very interested in writing (he writes papers frequently), and
he took the time to suggest improvements to the first few handouts.
Don liked some of these suggestions, but he found others incompatible with his personal
style. He said, "The main thing that I get from this is that the style has to be your own.
You will write things that someone else will never write." Don says he has learned this
lesson well by writing an annual Christmas letter with his wife, Jill. "We get along 364
days of the year," he said, "but there is no way that we can write a sentence acceptable
to both of us." (They have solved the problem by writing alternate paragraphs.)
Among Nelson's suggestions were:
Changing 'the above proof' to 'the proof above'. Don agrees with this change
mostly because editors are forever calling him on it, but the original usage doesn't
sound terribly odd to him. Nelson says that 'above' and 'below' are two adjectives
that never precede the things they modify. Don thinks 'above' has become an
adjective, but 'below' hasn't (yet).
Changing ', i.e.' to '; i.e.'. Don says that that is a matter of taste and pacing.
Changing the spelling of 'hiccups' to 'hiccoughs'. Don's dictionaries preferred the
shorter spelling.
Changing 'depending on the usage, the terms this, that, or the other might be
used' to 'depending on the usage, the term this, that, or the other might be used'.
Don didn't see this as an improvement.
[§43. FINAL TRUTHS]
use of hopefully as a sentential adverb acceptable. Bob also provided several authoritative
quotations to support his objection to its use. (Don said that this is the main concern:
Using 'hopefully' raises hackles in many people, distracting them from what you're trying
to say; that's why he doesn't use it. But he thinks some of the documents that Bob uses to
support his position were probably written by the people that Mary-Claire was calling
ignorant in her essay.)
Tom Henzinger, who is Austrian, observed that the German language has a common
word 'hoffentlich' that corresponds precisely to the new English usage of 'hopefully'. This
reminded Don that he often needs words that the English language just doesn't have. For
example, we have hundreds of ways to say that Jane beat Jim, but we have few ways to
say that Jim lost to Jane. (And we have to use two words in the latter case where only
one is needed in the former.) Don said:
Our language often lacks verbs that correspond to "reflexive" relations. We have
an abundance of words like 'dominate' but none like 'dominate or equal to'. So
we must use long-winded phrases like 'less than or equal to'; sometimes, but not
often enough, we can say 'at most'.
Returning to Bob Floyd's comments, Bob sent Don several citations to support his claim
that exclamation points should be used only with actual exclamations or interjections.
Some examples: Ouch! Stop! Thief! Well, I'll be! To Don's surprise, none of the authorities
even mention that exclamation points can indicate surprise! Paul Halmos, speaking from
the peanut gallery today, told about a trick he has to get around this: You can put the
exclamation point in parentheses(!).* Then everybody is happy, because you've made an
exclamation of surprise.
Bob said, "Advice to always avoid splitting infinitives is unwise." Don agreed that split
infinitives can provide good emphasis and that rewrites can sound forced or awkward.
About not ending sentences with prepositions, Bob said, "You have no case, give up." Don
agreed, saying that he had not understood the issue. "Coming from Milwaukee, where half
the people speak English with a heavy dose of German, has made me oversensitive to
sentences that end funny." However, there is a problem with sentences ending with prepositions,
namely when they already have a structure that accommodates the preposition
in the middle:
Avoid such prepositions, which such sentences end with. The people who don't
like the rule against prepositions in post position would never think of writing
such sentences, so they probably have forgotten why the overly restrictive rule
was first formulated.
Bob next objected to Don's suggestion not to omit 'that'. Don admitted that there are
cases when leaving out 'that' produces a better sentence. For example, 'He said he was
going' is a better sentence than 'He said that he was going.' But in this example 'that'
is not needed as a grammatical help because the pronoun (in nominative case) keeps the
syntax clear. In technical writing we often have more complicated sentences, which can
benefit from the extra information that 'that' provides.
* Don was able to use that trick the next day in Chapter 8 of his book. (Who said this
course wasn't practical?) But he found that it was like an unusual word: You can't easily
use it twice in the same chapter.
Someone in the class mentioned a related issue: Should the word 'then' be used in sentences
like "If I get there early enough, (then) I will save you a seat." (Rosalie had suggested
that it should not.) Don says that there is a difference between technical writing and
newspaper writing, and he believes that well-placed 'then's can make a paper more easily
understood. In that particular sentence he would definitely leave out 'then'; but in
mathematical contexts (where the phrase after the comma is likely a mathematical statement) he
would definitely leave it in. Don says that our brains only have time to do simple parsing
when we are reading for speed and comprehension. As Paul Halmos said, "Anything that
helps communication is good."
The final subject that Don introduced was a behind-the-scenes discussion between
Mary-Claire, Don, and one of the class members: Dan Schroeder. Dan received the comments on
his term paper and objected to the claim that he had "wicked-whiches"; he gave involved
logical reasoning in support of why his whiches really should be whiches. Don said, "If you
have to think that long about the sentence, it is probably wrong." Mary-Claire said that
writers have to contend with overly sensitive readers like Don, who wince at all whiches
that aren't preceded by commas or prepositions.
In one place Dan did not place a comma before a which because he was concerned about
coincident commas. This is what Mary-Claire has to say about coincident commas:
Coincident commas are not a sign of bad construction, any more than the
coincidence of a final comma and a period, or a final comma and a semicolon, or
any other two marks of punctuation. Where two commas coincide, we write only
one. Where a comma and a period coincide, we write the period. Etc. Truly,
coincident punctuation is not a problem.
(Did you catch the coincident periods there?)
After this comment we were thrown from the room in order to make way for another class.
As always in this course, there was more to say than there was time to say it in.
(Not only does this rhyme and scan, it also works. In fact, it may be the fastest way to
do sideways addition on the GE635 and similar machines.)
Donald E. Knuth
Computer Science Department
Stanford University
Stanford, California 94305
I found Landau's remark [6, p. 883] that the first appearance of O known
to him was in Bachmann's 1894 book [1, p. 401]. In the same place, Landau
said that he had personally invented the o-notation while writing his
handbook about the distribution of primes; his original discussion of O
and o is in [6, pp. 59-63].
I could not find any appearances of Ω-notation in Landau's publications;
this was confirmed later when I discussed the question with George Pólya, who
told me that he was a student of Landau's and was quite familiar with his
writings. Pólya knew what Ω-notation meant, but never had used it in
his own work. (Like teacher, like pupil, he said.)
Since Ω-notation is so rarely used, my first three trips to the
library bore little fruit, but on my fourth visit I was finally able to
pinpoint its probable origin: Hardy and Littlewood introduced Ω in their
classic 1914 memoir [4, p. 225], calling it a "new" notation. They used
it also in their major paper on distribution of primes [5, see pp. 125ff],
but they apparently found little subsequent need for it in later works.
Unfortunately, Hardy and Littlewood didn't define Ω(f(n)) as I wanted
them to; their definition was a negation of o(f(n)), namely a function
whose absolute value exceeds Cf(n) for infinitely many n, when C is a
sufficiently small positive constant. For all the applications I have seen
so far in computer science, a stronger requirement (replacing "infinitely
many n" by "all large n") is much more appropriate.
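To make the difference between the two readings concrete, here is a small Python sketch. The example function g (equal to n on even n and to 1 on odd n) and the helper names are hypothetical, and a check over finitely many n can only illustrate, never prove, an asymptotic statement. This g exceeds n/2 for infinitely many n, so it satisfies the Hardy-Littlewood condition with f(n) = n, yet it fails the "all large n" requirement at every odd n.

```python
# Two readings of "Omega", illustrated on a finite range (evidence, not proof):
#   Hardy-Littlewood: |g(n)| > C*f(n) for infinitely many n
#   "all large n":    g(n) >= C*f(n) for every n >= n0


def g(n: int) -> int:
    """Hypothetical example: n on even arguments, 1 on odd arguments."""
    return n if n % 2 == 0 else 1


def f(n: int) -> int:
    return n


def exceeds_at(g, f, C, n0, n_max):
    """All n in [n0, n_max] with |g(n)| > C*f(n) (Hardy-Littlewood style hits)."""
    return [n for n in range(n0, n_max + 1) if abs(g(n)) > C * f(n)]


def holds_for_all(g, f, C, n0, n_max):
    """True if g(n) >= C*f(n) for every n in [n0, n_max]."""
    return all(g(n) >= C * f(n) for n in range(n0, n_max + 1))


hits = exceeds_at(g, f, 0.5, 10, 1000)
print(hits[:3], len(hits))                  # [10, 12, 14] 496: every even n
print(holds_for_all(g, f, 0.5, 10, 1000))   # False: odd n such as 11 fail
```

The hits keep recurring however far the range is extended, which is the finite shadow of "infinitely often"; the stronger test fails because g drops back to 1 at every odd n.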
After discussing this problem with people for several years, I have
come to the conclusion that the following definitions will prove to be
most useful for computer scientists:
$O(f(n))$ denotes the set of all $g(n)$ such that there exist positive
constants $C$ and $n_0$ with $|g(n)| \le Cf(n)$ for all $n \ge n_0$.
$\Omega(f(n))$ denotes the set of all $g(n)$ such that there exist positive
constants $C$ and $n_0$ with $g(n) \ge Cf(n)$ for all $n \ge n_0$.
$\Theta(f(n))$ denotes the set of all $g(n)$ such that there exist positive
constants $C$, $C'$, and $n_0$ with $Cf(n) \le g(n) \le C'f(n)$ for all
$n \ge n_0$.
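A minimal Python sketch of what the constants in these definitions do: it tests proposed witnesses (C, C', n0) over a finite range of n. The functions g and f and the helper names are hypothetical examples, and passing the test is evidence rather than proof; the actual argument is the algebra in the comments.

```python
# Test proposed witnesses for the O and Theta definitions over n0 <= n <= n_max.
# A finite check is evidence only; the proof is the algebra noted below.


def is_big_o_witness(g, f, C, n0, n_max):
    """True if |g(n)| <= C*f(n) for every n in [n0, n_max]."""
    return all(abs(g(n)) <= C * f(n) for n in range(n0, n_max + 1))


def is_theta_witness(g, f, C, C_prime, n0, n_max):
    """True if C*f(n) <= g(n) <= C_prime*f(n) for every n in [n0, n_max]."""
    return all(C * f(n) <= g(n) <= C_prime * f(n) for n in range(n0, n_max + 1))


def g(n):  # hypothetical running time
    return 3 * n * n + 5 * n


def f(n):
    return n * n


# 3n^2 <= 3n^2 + 5n always, and 3n^2 + 5n <= 4n^2 exactly when n >= 5.
print(is_theta_witness(g, f, 3, 4, 5, 10_000))   # True
print(is_theta_witness(g, f, 3, 4, 1, 10_000))   # False: n = 1..4 break the upper bound
# O bounds only |g(n)|, so a negative function such as -5n is still O(n^2).
print(is_big_o_witness(lambda n: -5 * n, f, 5, 1, 10_000))  # True
```

The last line reflects the absolute value signs that appear only in the definition of O.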
SIGACT News 20 Apr.-June 1976
have to break down and use o-notation when faced with a function for
which I can't prove anything stronger.
Note that there is a slight lack of symmetry in the above definitions
of O, Ω, and Θ, since absolute value signs are used on g(n) only in
the case of O. This is not really an anomaly, since O refers to a
neighborhood of zero while Ω refers to a neighborhood of infinity.
(Hardy's book on divergent series uses $O_L$ and $O_R$ when a one-sided
O-result is needed. Hardy and Littlewood [5] used $\Omega_L$ and $\Omega_R$ for
functions respectively < -Cf(n) and > Cf(n) infinitely often. Neither
of these has become widespread.)
The above notations are intended to be useful in the vast majority
of applications, but they are not intended to meet all conceivable needs.
For example, if you are dealing with a function like $(\log\log n)^{\cos n}$
you might want a notation for "all functions which oscillate between
log log n and 1/log log n where these limits are best possible". In
such a case, a local notation for the purpose, confined to the pages of
whatever paper you are writing at the time, should suffice; it isn't
necessary to worry about standard notations for a concept unless that
concept arises frequently.
I would like to close this letter by discussing a competing way to
denote the order of function growth. My library research turned up the
surprising fact that this alternative approach actually antedates the
O-notation itself. Paul du Bois-Reymond [2] used the relational notations
already in 1871, for positive functions f(n) and g(n), with the meaning
we can now describe as g(n) = o(f(n)) (or as f(n) = e(g(n))). Hardy's
interesting tract on "Orders of Infinity" [3] extends this by using also
the relations
f(n) ~ g(n)
~W
f(n) ~ g(n)
f(n) ~ g(n)
v
when $\lim_{n\to\infty} f(n)/g(n) = 1$. (Hardy's ~ notation may seem peculiar at
first, until you realize what he did with it; for example, he proved the
following nice theorem: "If f(n) and g(n) are any functions built up
recursively from the ordinary arithmetic operations and the exp and log
functions, we have exactly one of the three relations f(n) ≺ g(n),
f(n) ≍ g(n), or f(n) ≻ g(n).")
Hardy's excellent notation has become somewhat distorted over the years.
For example, Vinogradov [9] writes f(n) ≪ g(n) instead of f(n) ≼ g(n);
thus, Vinogradov is comfortable with the formula
$200n^2 \ll \binom{n}{2}$,
while I am not. In any event, such relational notations have intuitively
clear transitive properties, and they avoid the use of one-way equalities
which bother some people. Why, then, should they not replace O and the
new symbols Ω and Θ?
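Reading the formula as 200n² ≪ C(n, 2), with ≪ carrying the meaning of O, the relation is true, which is exactly what makes it look odd. The Python sketch below (the helper name is hypothetical) checks the explicit witnesses C = 800 and n0 = 2, which follow from n(n - 1)/2 ≥ n²/4 whenever n ≥ 2; the finite check only illustrates that algebra.

```python
from math import comb

# Witnesses for 200n^2 = O(C(n, 2)): for n >= 2, n(n-1)/2 >= n^2/4,
# hence 800 * C(n, 2) >= 200n^2.


def witness_holds(C: int, n0: int, n_max: int) -> bool:
    """True if 200n^2 <= C * C(n, 2) for every n in [n0, n_max]."""
    return all(200 * n * n <= C * comb(n, 2) for n in range(n0, n_max + 1))


print(witness_holds(800, 2, 10_000))   # True
print(witness_holds(799, 2, 10_000))   # False: fails already at n = 2
print(200 * 10**2, comb(10, 2))        # 20000 45, hardly "much less than"
```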
The main reason why O is so handy is that we can use it right in the
middle of formulas (and in the middle of English sentences, and in tables
which show the running times for a family of related algorithms, etc.).
The relational notations require us to transpose everything but the function
we are estimating to one side of an equation. (Cf. [7], p. 191.) Simple
derivations like
$(1 + H_n/n)^{H_n} = \exp\bigl(H_n \ln(1 + H_n/n)\bigr)$
s©propriate.
References
II May 1976
Dear Editor,
Dana Angluin
Oral History of Donald Knuth
Interviewed by:
Edward Feigenbaum
Edward Feigenbaum: My name is Edward Feigenbaum. I’m a professor at Stanford University, in the
Computer Science Department. I’m very pleased to have the opportunity to interview my colleague and
friend from 1968 on, Professor Don Knuth of the Computer Science Department. Don and I have
discussed the question of what viewers and readers of this oral history we think there are. We’re
orienting our questions and comments to several groups of you readers and viewers. First, the generally
intelligent and enlightened science-oriented person who has seen the field of computer science explode
in the past half century and would like to find out what is important, even beautiful, and what some of the
field’s problems have been. Second, the student of today who would like orientation and motivation
toward computer science as a field of scholarly work and application, much as Don and I had to do in the
1950s. And third, those of you who maybe are not yet born, the history of science scholar of a dozen or
100 years from now, who will want to know more about Donald Knuth, the scientist and programming
artist, who produced a memorable body of work in the decades just before and after the turn of the
millennium. Don and I share several things in our past in common. Actually, many things. We both went
to institutes of technology, I to Carnegie Institute of Technology, now Carnegie Mellon University, and
Don to Case Institute of Technology, now Case Western Reserve. We both encountered early computers
during that college experience. We both went on to take a first job at a university. And then our next job
was at Stanford University, in the new Computer Science Department, where we both stayed from the
earliest days of the department. I’d like to ask Don to describe his first encounter with a computer. What
led him into the world of computing? In my case, it was an IBM 701, learned from the manual. In Don’s
case, it was an IBM Type 650 that had been delivered to Case Institute of Technology. In fact, Don even
dedicated one of his major books to the IBM Type 650 computer. Don, what is the story of your discovery
of computing and your early work with this intriguing new artifact?
Donald Knuth: Okay. Thanks, Ed. I guess I want to add that Ed and I are doing a team thing here; next
week, I’ll be asking Ed the questions that he’s asking me today. We thought that it might be more fun for
both of us and, also for people who are listening or watching or reading this material, to see the
symmetrical approach instead of having a historian interviewing us. We’re colleagues, although we work
in different fields. We can give you some slants on the thing from people who sort of both have been
there. We’re going to be covering many years of the story today, so we can’t do too much in-depth. But
we also want to do a few things in depth, because the defining thing about computer science is that
computer science plunges into things at low levels as well as sticking on high level. Since we’re going to
cover so many topics, I’m sure that I won’t sleep tonight because I’ll be saying to myself, “Oh, I should’ve
said such and such when he asked me that question”. So I think Ed and I also are going to maybe add
another little thing to this oral interview, where we might want to add a page or two of afterthoughts that
come to us later, because then I don’t have to be so careful about answering every question that he asks
me now. The interesting thing will be not only the wrong answer that pops first in my mind, but also
maybe a slightly corrected thing. One of the stories of my life, as you’ll probably find out, is I try to get
things correct. I probably obsess about not making too many mistakes. Okay. Now, your question, Ed,
was how did I get into the computing business. When the computers were first built in the ‘40s I was ten
years old, so I certainly was not a pioneer in that sense. I saw my first computer in 1957, which is pretty
late in the history game as far as computers are concerned. On the other hand, programming was still
pretty much a new thing. There were, I don’t know, maybe 2,000 programmers in the world at that time.
I’m not sure how to figure it, but it was still fairly early from that point of view. I was a freshman in college.
Your question was: how did I get to be a physics student there in college. I grew up in Milwaukee,
Wisconsin. Those of you who want to do the math can figure out, I was born in 1938. My father was the
first person among all his ancestors who had gone to college. My mother was the first person in all of her
ancestors who had gone to a year of school to learn how to be a typist. There was no tradition in our
family of higher education at all. I think [that’s] typical of America at the time. My great-grandfather was a
blacksmith. My grandfather was a janitor, let’s say. The people were pretty smart. They could play
cards well, but they didn’t have an academic background. I don’t want to dwell on this too much, because
I find that there’s lots of discussion on the internet about the early part of my life. There’s a book called
Mathematical People, in which people asked me these questions at length -- how I got started. The one
thing that stands out most, probably, is when I was an eighth grader there was a contest run by a local TV
station, a company called Zeigler’s Giant Bar. They said, “How many words can you make out of the
letters in ‘Zeigler’s Giant Bar’?” Well, there’s a lot of letters there. I was kind of intrigued by this question,
and I had just seen an unabridged dictionary. So I spent two weeks going through the unabridged
dictionary, finding every word in that dictionary that could be made out of the letters in “Zeigler’s Giant
Bar”. I pretended that I had a stomachache, so I stayed home from school those two weeks. The bottom
line is that I found 4500 words that could be made, and the judges had only found 2500. I won the
contest, and I won Zeigler’s Giant Bars for everybody in my class, and also got to be on television and
everything. This was the first indication that I would obsess about problems that would take a little bit of
-- what do you call it? -- long attention span to solve. But my main interest in those days was music. I
almost became a music major when I went to college. Our high school was not very strong in science,
but I had a wonderful chemistry and physics teacher who inspired me. When I got the chance to go to
Case, looking back, it seems that the thing that really turned it was that Case was a challenge. It was
supposed to have a lot of meat. It wasn’t going to be easy. At the college where I had been admitted to
be a music major, the people, when I visited there, sort of emphasized how easy it was going to be there.
Instead of coasting, I think I was intrigued by the idea that Case was going to make me work hard. I was
scared that I was going to flunk out, but still I was ready to work. I worked especially hard as a freshman,
and then I coasted pretty much after that. In my freshman year, I started out and I found out that my
chemistry teacher knew a lot of chemistry, but he didn’t know physics or mathematics. My physics
teacher knew physics and chemistry, but he didn’t know much about mathematics. But my math teacher
knew all three things. I was very impressed by my math teacher. Then in my sophomore year in physics,
I had to take a required class of welding. I just couldn’t do welding, so I decided maybe I can’t be a
physicist. Welding was so scary. I’ve got these thousands of volts in this stuff that I’m carrying. I have to
wear goggles. I can’t have my glasses on, I can’t see what I’m doing, and I’m too tall. The table is way
down there. I’m supposed to be having these scary electrons shooting all over the place and still connect
X to Y. It was terrible. I was a miserable failure at welding. On the other hand, mathematics! In the
sophomore year for mathematicians, they give you courses that are what we now call discrete
mathematics, where you study logic and things that are integers instead of continuous quantities. I was
drawn to that. That was something, somehow, that had great appeal to me. Meanwhile, in order to
support myself, I had to work for the statisticians at Case. First this meant drawing graphs and sorting
cards. We had a fascinating machine where you put in punch cards and they fall into different piles, and
you can look at what comes out. Then I could plot the numbers on a graph, and get some salary from
this. Later on in my freshman year there arrived a machine that, at first, I could only see through the
glass window. They called it a computer. I think it was actually called the IBM 650 “Univac”. That was a
funny name, because Univac was a competing brand. One night a guy showed me how it worked, and
gave me a chance to look at the manual. It was love at first sight. I could sit all night with that machine
and play with it. Actually, to be exact, the first programs I wrote for the machine were not in machine
language but in a system called The Bell Interpreter System. It was something like this. You have an
instruction, and the instruction would say, “Add the number in cell 2 to the number in cell 15 and put the
result in cell 30.” We had instructions like that, a bunch of them. This was a simple way to learn
programming. In fact, I still believe that it might be the best way to teach people programming, instead of
teaching them what we call high-level language right now. Certainly, it’s something that you could easily
teach to a fourth or fifth grader who hasn’t had algebra yet, and get the idea of what a machine is. I was
pledging a fraternity, and one of my fraternity brothers didn’t want to do his homework assignment where
he was supposed to find the roots of a fifth-degree equation. I looked at some textbooks, and it told me
how to solve a fifth-degree equation. I programmed it in this Bell Interpretive Language. I wrote the
program. My memory is that it worked correctly the first time. I don’t know if it really gave the right
answers, but miraculously it ground out these numbers. My fraternity brother passed his course, I got
into the fraternity, and that was my first little program. Then I learned about the machine language inside
the 650. I wrote my first program for the 650 probably in the spring of my freshman year, and debugged it
at night. The first time I wrote the program, it was about 60 instructions long in machine language. It was
a program to find the prime factors of a number. The 650 was a machine that had decimal arithmetic with
ten-digit numbers. You could dial the numbers on the console of the machine. So you would dial a ten-
digit number, and my program would go into action. It would punch out cards that would say what are the
factors of this number that you dialed in there. The computer was a very slow computer. In order to do a
division instruction, it took five milliseconds. I don’t know, is that six orders of magnitude slower than
today’s machines, to do division? Of course, the way I did factoring was by division. To see if a number
was divisible by 13, I had to divide by 13. I divided by 15 as well, 17, 19. It would try to find everything
that divided. If I started out with a big ten-digit number that happened to be prime -- had no dividers at all
-- I think it would take 15 or 20 minutes for my program to decide. Not only did my program have about
60 or so instructions when I started, they were almost all wrong. When I finished, it was about 120, 130
instructions. I made more errors in this program than there were lines of code! One of the things that I
had to change, for example, that took a lot of work, was I had originally thought I could get all the prime
factors onto one card. But a card had 80 columns, and each number took ten digits. So I could only get
eight factors on a single card. Well, you take a number like 2 to the 32nd power, that’s going to take four
cards. Because it’s two times two times two times two [and so on]. I had to put in lots of extra stuff in my
program that would handle these cases. So my first program taught me a lot about the errors that I was
going to be making in the future, and also about how to find errors. That’s sort of the story of my life, is
making errors and trying to recover from them. Did I answer your question yet?
Feigenbaum: No.
Knuth: No.
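The factoring program Knuth describes can be sketched in modern Python. This is an illustrative reconstruction of the task, not of his IBM 650 code: the function names are hypothetical, and the card layout simply assumes eight 10-digit fields per 80-column card, as described in the interview. Trial division tests 2 and then every odd divisor (including composites such as 15, which never divide because their prime factors have already been removed), and 2 to the 32nd power does indeed need four cards.

```python
# Illustrative sketch: factor by trial division, then "punch" the factors as
# 10-digit fields, eight fields per 80-column card (assumed layout).


def prime_factors(n: int) -> list[int]:
    """Prime factors of n >= 1, with repetition, found by trial division."""
    factors = []
    while n % 2 == 0:
        factors.append(2)
        n //= 2
    d = 3
    while d * d <= n:          # divisors beyond sqrt(n) need not be tried
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 2                 # odd trial divisors: 3, 5, 7, 9, 11, 13, 15, ...
    if n > 1:
        factors.append(n)      # whatever remains is prime
    return factors


def punch_cards(factors: list[int], fields_per_card: int = 8) -> list[str]:
    """Format factors as zero-padded 10-digit fields, eight per card."""
    cards = []
    for i in range(0, len(factors), fields_per_card):
        chunk = factors[i:i + fields_per_card]
        cards.append("".join(f"{x:010d}" for x in chunk))
    return cards


print(prime_factors(1234567890))               # [2, 3, 3, 5, 3607, 3803]
print(len(punch_cards(prime_factors(2**32))))  # 4
```

Trying every odd divisor wastes some divisions on composites, but it keeps the loop trivial, which matters more on a machine where one division took milliseconds than elegance does.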
Feigenbaum: Don, a couple questions about your early career, before Case and at Case. It’s very
interesting that you mentioned the Zeigler’s Giant Bar, because it points to a really early interest in
combinatorics. Your intuition at combinatorics is one of the things that impresses so many of us. Why
combinatorics, and how did you get to that? Do you see combinatorics in your head in a different way
than the rest of us do?
Knuth: I think that there is something strange about my head. It’s clear that I have much better intuition
about discrete things than continuous things. In physics, for example, I could pass the exams and I could
do the problems in quantum mechanics, but I couldn’t intuit what I was doing. I didn’t feel right being able
to get an “A” on an exam without ever having the idea of how I would’ve thought of the questions that the
person made up solving the exam. But on the other hand, in my discrete math class, these were things
that really seemed part of me. There’s definitely something in how I had developed by the time I was a
teenager that made me understand discrete objects, like zeros and ones of course, or things that are
made out of alphabetical letters, much better than things like Fourier transforms or waves -- radio waves,
things like this. I can do these other things, but it’s like a dog standing on his hind legs. “Oh, look, the
dog can walk.” But no, he’s not walking; he’s just a dog trying to walk. That’s the way it is for me in a lot
of the continuous or more geometrical things. But when it comes to marks on papers, or integer numbers
like finding the prime factors of something, that’s a question that appealed to me more than even finding
the roots of a polynomial.
Feigenbaum: Don, question about that. Sorry to interject this question, behaving like a cognitive
psychologist.
Feigenbaum: Right. Herb Simon -- Professor Simon, of Carnegie Mellon University -- once did a set of
experiments that kind of separated thinkers into what he called “visualizers” and “symbolizers”. When
you do the combinatorics and discrete math that you do, which so amazes us guys who can’t do it that
well, are you actually visualizing what’s going on, or is it just pure symbol manipulation?
Knuth: Well, you know, I’m visualizing the symbols. To me, the symbols are reality, in a way. I take a
mathematical problem, I translate it into formulas, and then the formulas are the reality. I know how to
transform one formula into another. That should be the subtitle of my book Concrete Mathematics: How
to Manipulate Formulas. I’d like to talk about that a little. It started out… My cousin, Earl, who died, Earl
Goldschlager [ph?], he was an engineer, eventually went to Southern California, but I knew him in
Cleveland area. When I was in second grade he went to Case. He was one of the people who sort of
influenced me that it may be good to go to Case. When I was visiting him in the summer, he told me a
little bit about algebra. He said, “If you have two numbers, and you know that the sum of these numbers
is 100 and the difference of these numbers is 20, what are the two numbers?” He said, “You know how
you can solve this, Don? You can say X is one of the numbers and Y is one of the numbers. X plus Y is
100. X minus Y is 20. And how do you find those numbers?” he says. “Well, you add these two
equations together, and you get 2X = 120. And you subtract the equations from each other, and you get
2Y = 80. So X must be 60, and Y must be 40. Okay?” Wow! This was an “aha!” thing for me when I
was in second grade. I liked symbols in this form. The main thing that I enjoyed doing, in seventh grade,
was diagramming sentences. NPR had a show; a woman published a book about diagramming
sentences, “The Lost Art of Diagramming Sentences”, during the last year. This is where you take a
sentence of English and you find its structure. It says, “It’s a noun phrase followed by a verb phrase.”
Let’s take a sentence here, “How did you get to be a physics student?” Okay. It’s not a noun phrase
followed by a verb phrase; this is an imperative sentence. It starts with a verb. “How did you get…” It’s
very interesting, the structure of that sentence. We had a textbook that showed how to diagram simple
English sentences. The kids in our class, we would then try to apply this to sentences that weren’t in the
book, sentences that we would see in newspapers or in advertisements. We looked at the hymns in the
hymnal, and we couldn’t figure out how to diagram those. We spent hours and hours trying to figure this
out. But we thought about the structure of language, and trying to make these discrete tree structures out of
English sentences, in seventh grade. My friends and I, this turned us on. When we got to high school,
we breezed through all our English classes because we knew more than the teachers did. They had
never studied this diagramming. So I had this kind of interest in symbols and diagramming early on --
discrete things, early on. When I got into logic as a sophomore, and saw that mathematics involved a lot
of symbol manipulation, then that took me there. I see punched cards in this. I mean, holes in cards are
nice and discrete. The way a computer works is totally discrete. A computer has to stand on its hind legs
trying to do continuous mathematics. I have a feeling that a lot of the brightest students don’t go into
mathematics because -- curious thing -- they don’t need algebra at the level I did. I don’t think I was
smarter than the other people in my class, but I learned algebra first. A lot of very bright students today
don’t see any need for algebra. They see a problem, say, the sum of two numbers is 100 and the
difference is 20, they just sort of say, “Oh, 60 and 40.” They’re so smart they don’t need algebra. They
go on seeing lots of problems and they can just do them, without knowing how they do it, particularly.
Then finally they get to a harder problem, where the only way to solve it is with algebra. But by that time,
they haven’t learned the fundamental ideas of algebra. The fact that they were so smart prevented them
from learning this important crutch that I think turned out to be important for the way I approach a
problem. Then they say, “Oh, I can’t do math.” They do very well as biologists, doctors and lawyers.
Feigenbaum: You’re recounting your interest in the structure of languages very early. Seventh grade, I
think you said. That’s really interesting. Because among the people -- well, the word “computer science”
wasn’t used, but we would now call it “information technology” people -- your early reputation was in
programming languages and compilers. Were the seeds of that planted at Case? Tell us about that early
work. I mean, that’s how I got to know you first.
Knuth: Yeah, the seeds were planted at Case in the following way. First I learned about the 650. Then,
I’m not sure when it was but it probably was in the summer of my freshman year, where we got a program
from Carnegie -- where you were a student -- that was written by Alan Perlis and three other people.
Feigenbaum: “IT”.
Feigenbaum: Yeah, it was Perlis, [Harold] van Zoeren, and Joe Smith.
Knuth: In this program you would punch on cards an algebraic formula. You would say, “A = B + C.”
Well, in IT, you had to say, “X1 = X2 + X4.” Because you didn’t have a plus sign, you had to say, “A” for
the plus sign. So you had to say, “X1 Z X2 A X4.” No, “S,” I guess, was plus, and “A” was for absolute
value. But anyway, we had to encode algebra in terms of a small character set, a few letters. There
weren’t that many characters you could punch on a card. You punch this thing on a card, and you feed
the card into the machine. The lights spin around for a few seconds and then -- punch, punch, punch,
punch -- out come machine language instructions that set X1 equal to X2 + X4. Automatic programming
coming out of an algebraic formula. Well, this blew my mind. I couldn’t understand how this was
possible, to do this miracle where I had just these punches on the card. I could understand how to write a
program to factor numbers, but I couldn’t understand how to write a program that would convert algebra
into machine instructions.
Feigenbaum: It hadn’t yet occurred to you that the computer was a general symbol-manipulating device.
Knuth: No. No, that occurred to Lady Lovelace, but it didn’t occur to me. I’m slow to pick up on these
things, but then I persevere. So I got hold of the source code for IT. It couldn’t be too long, because the
650 had only 2,000 words of memory, and some of those words of memory had to be used to hold the
data as well as the instructions. It’s probably, I don’t know, 1,000 lines of code. The source code is not
hard to find. They published it in a report and I’ve seen it in several libraries. I’m pretty sure it went on the
internet long ago. I went through every line of that program. During the summer we had a family get-together
on a beach on Lake Erie. I would spend the time playing cards and playing tennis. But most of
the time I was going through this listing, trying to find out the miracle of how IT worked. Ok, it wasn’t
impossible after all. In fact, I thought of better ways to do it than were in that program. Since we’re in a
history museum, we should also mention that the program had originally been developed when Perlis
was at Purdue, before he went to Carnegie, with three other people there. I think maybe Smith and van
Zoeren came with Alan to Carnegie. But there was Sylvia Orgel and several other people at Purdue who
had worked on a similar project, for a different computer at Purdue. Purdue also produced another
compiler, a different one. It’s not as well-known as IT. But anyway, I didn’t know this at the time, either.
The code, once I saw how it happened, was inspiring to me. Also, the discipline of reading other people’s
programs was something good to learn early. Through my life I’ve had a love of reading source materials --
reading something that pioneers had written and trying to understand what their thought processes were
in order to write this out. Especially when they’re solving a problem I don’t know how to solve, because
this is the best way for me to put into my own brain how to get past stumbling blocks. At Case I also
remember looking at papers that Fermat had written in Latin in the 17th century, in order to understand
how that great number theorist approached problems. I have to rely on friends to help me get through
Sanskrit manuscripts and things now, but I still…. Just last month, I found, to my great surprise, that the
concept of orthogonal Latin squares, which we’ll probably talk about briefly later on, originated in North
Africa in the 13th century. Or was it the 14th century? I was looking at some historical documents and I
came across accounts of this Arabic thing. By reading it in French translation I was able to see that the
guy really had this concept, orthogonal Latin squares, that early. The previous earliest known example
was 1724. I love to look at the work of pioneers and try to get into their minds and see what’s happening.
Feigenbaum: One of the things worth observing -- it’s off the track but as long as we’re talking about
history -- is that our current generation, and generations of students, don’t even know the history of their
own field. They’re constantly reinventing things, or thoughtlessly disregarding things. We’re not just
talking about history going back in time hundreds of years. We’re talking about history going back a
dozen years, or two-dozen years.
Knuth: Yeah, I know. It’s such a common failing. I would say that’s my major disappointment with my
teaching career. I was not able to get across to any of my students this love for that kind of
scholarship, reading source material. I was a complete failure at passing this on to the people that I
worked with the most closely. I don’t know what I should’ve done. When I came to Stanford from
Caltech, I had been researching Pascal. I couldn’t find much about Pascal’s work in the Caltech library.
At Stanford, I found two shelves devoted to it. I was really impressed by that. Then I came to the
Stanford engineering library, and everything was in storage if it was more than five years old. It was a
basket case at that time, in the ’60s.
Knuth: I’ve got to restrain myself from telling too much about the early compiler. But anyway, after
IT, I have to mention that I had a job by this time at the Case Computing Center. I wasn’t just growing
grass for statisticians anymore. Case was one of the very few institutions in the country with a really
enlightened attitude that undergraduate students were allowed to touch the computers by themselves,
and also write software for the whole campus. Dartmouth was another place. There was a guy named
Fred Way who set the policy at Case, and instead of going the way most places go, which would hire
professionals to run their computer center, Case hired its own students to play with the machines and to
do the stuff everybody was doing. There were about a dozen of us there, and we turned out to be fairly
good contributors to the computing industry in the long range of things. I told all of my friends how this IT
compiler worked, and we got together and made our own greatly improved version the following year. It
was called RUNCIBLE. Every program in those days had to have an acronym and this was the Revised
Unified New Compiler Basic Language Extended, or something like this. We found a reason for the
name. But we added a million bells and whistles to IT, basically.
Knuth: All on the 2,000-word drum. Not only that, but we had four versions of our compiler. One of them
would compile to assembly language. One would compile directly into machine language. One version
would use floating point hardware. And one version would use a floating point attachment. If you changed
613 instructions, you would go from the floating point attachment to the floating point hardware version. If
you changed another 372 instructions, it would change from the assembly language version to the
machine language version. If we could figure out a way to save a line of code in the 373 instructions in
one version, then we had to figure out a way to correspondingly save another line of code in the other
version. Then we could have another instruction available to put in a new feature. So RUNCIBLE went
through the stages of software development that have since become familiar, where there is what they
call “creeping featurism”, where every user you see wants a new thing to be added to the software. Then
you put that in and pretty soon the thing gets… you have a harder and harder user manual. That is the
way software always has been. We got our experience of this. It was a group of us developing this;
about, I don’t know, eight of us worked together on different parts of it. But my friend, Bill Lynch, and I
did most of the parts that were the compiler itself. Other people were working on the subroutines that
would support the library, and things like that. Since I mentioned Bill Lynch, I should also, I guess... I
wrote a paper about the way RUNCIBLE worked inside, and it was published in the Communications of
the ACM during my senior year, because we had seen other articles in this journal that described
methods that were not as good as the ones that were in our compiler. So we thought, okay, let’s put it to
work. But I had no idea what scientific publishing was all about. I had only experienced magazines
before, and magazines don’t give credit for things, they just tell the news. So I wrote this article and it
explained how we did it in our compiler. But I didn’t mention Bill Lynch’s name or anything in the article. I
found out to my great surprise afterwards that I was getting credit for having invented these things, when
actually it was a complete team effort. Mostly other people, in fact. I had just caught a few bugs and
done a lot of things, but nothing really very original. I had to learn about scholarship, about scientific
publishing and things as part of this story. So we got this experience with users, and I also wrote the user
manual for this machine. I am an undergraduate. Case allows me to write the user manual for
RUNCIBLE, and it is used as a textbook in classes. Here I’ve got a class that I am taking; I can take a
class and I wrote the textbook for it already as an undergraduate. This meant that I had an unusual
visibility on campus, I guess. The truth is that Case was a really great college for undergraduates, and it
had superb teachers. But it did not have very strong standards for graduate studies. It was very difficult
to get admitted to the undergraduate program at Case, and a lot of people would flunk out. But in
graduate school it wasn’t so hard to get over. I noticed this, and I started taking graduate courses,
because there was no competition. This impressed my teachers -- “Oh, Knuth is taking graduate
courses” -- not realizing that this was the line of least resistance so that I could do other things like write
compilers as a student. I edited a magazine and things like that, and played in the band, and did lots of
activity. Now [to] the story, however: What about compilers? Well, I got a job at the end of my senior
year to write a compiler for Burroughs, who wanted to sell their drum machine to people who had IBM
650s. Burroughs had this computer called the 205, which was a drum machine that had 4000 words of
memory instead of 2000, and they needed a compiler for it. ALGOL was a new language at the time.
Somebody heard that I knew something about how to write compilers, and Thompson Ramo Wooldridge
[later TRW Inc.] had a consulting branch in Cleveland. They approached me early in my senior year and
said, “Don, we want to make a proposal to Burroughs Corporation that we will write them an ALGOL
compiler. Would you write it for us if we got the contract?” I believe what happened is that they made a
proposal to Burroughs that for $75,000 they would write an ALGOL compiler, and they would pay me
$5,000 for it, something like this. Burroughs turned it down. But meanwhile I had learned about the 205
machine language, and it was kind of appealing to me. So I made my own proposal to Burroughs. I said
I’ll write you an ALGOL compiler for $5,000, but I can’t implement all of ALGOL. I think I told them I can’t
implement all of ALGOL for this; I am just one guy. Let’s leave out procedures -- subroutines. Well, this
is a big hole in the language! Burroughs said, “No, no -- you got to put in procedures.” I said, “Okay, I
will put in procedures, but you got to pay me $5,500.” That’s what happened. They paid me $5,500,
which was a fairly good salary in those days. I think a college professor was making eight or nine
thousand dollars a year in those days. So between graduating from Case and going to Cal Tech, I
worked on this compiler. As I drove out to California, I drove 100 miles a day and I sat in a motel and
wrote code. I have now donated the coding form on which I wrote this code to the Computer History
Museum, and you can see exactly the code that I wrote. I debugged it, and it was Christmas time [when]
I had the compiler ready for Burroughs to use. So I was interested; by the end of the ‘60s I knew all the
code of two compilers. Then I learned about other projects. When I was in graduate school some
people came to me and said, “Don, how about writing software full time? Quit graduate school. Just
name your price. Write compilers for a living, and you will have a pretty good living.” That was my
second year of graduate school.
Knuth: I was at Cal Tech in the math department. There was no such thing as a computer science
department anywhere.
Knuth: I didn’t do physics. I switched into math after my sophomore year at Case, after flunking
welding. I switched into math. There were only seven of us math majors at Case. I went to Cal Tech,
and that’s another story we’ll get into soon. I’m in my second year at Cal Tech, and I was a consultant to
Burroughs. After finishing my compiler for Burroughs, I joined the Product Planning Department. The
Product Planning Department was largely composed of people who had written the best software ever
done in the world up to that time, which was a Burroughs ALGOL compiler for the 220 computer. That
was a great leap forward for software. It was the first software that used list processing and high level
data structures in an intelligent way. They took the ideas of Newell and Simon and applied them to
compilers. It ran circles around all the other things that we were doing. I wanted to get to know these
people, and they were by this time in the Product Planning Group, because Burroughs was doing its very
innovative machines that are the opposite of RISC. They tried to make the machine language look like
algebraic language. This group I joined at Burroughs as a consultant. So I had a programming hat when
I was outside of Cal Tech, and at Cal Tech I am a mathematician taking my grad studies. A startup
company, called Green Tree Corporation because green is the color of money, came to me and said,
“Don, name your price. Write compilers for us and we will take care of finding computers for you to
debug them on, and assistance for you to do your work. Name your price.” I said, “Oh, okay.
$100,000.”, assuming that this was… In that era this was not quite at Bill Gates’s level today, but it was
sort of out there. The guy didn’t blink. He said, “Okay.” I didn’t really blink either. I said, “Well, I’m not
going to do it. I just thought this was an impossible number.” At that point I made the decision in my life
that I wasn’t going to optimize my income; I was really going to do what I thought I could do for… well, I
don’t know. If you ask me what makes me most happy, number one would be somebody saying “I
learned something from you”. Number two would be somebody saying “I used your software”. But
number infinity would be… Well, no. Number infinity minus one would be “I bought your book”. It’s not
as good as “I read your book”, you know. Then there is “I bought your software”; that was not in my own
personal value. So that decision came up. I kept up with the literature about compilers. The
Communications of the ACM was where the action was. I also worked with people on trying to debug the
ALGOL language, which had problems with it. I published a few papers; “The Remaining Trouble
Spots in ALGOL 60” was one of the papers that I worked on. I chaired a committee called “Smallgol”
which was to find a subset of ALGOL that would work on small computers. I was active in programming
languages.
Knuth: No. There was a big European group, but this was mostly Americans. Gosh, I can’t remember.
We had about 20 people as co-authors of the paper. It was Smallgol 61? I don’t know. It was so long
ago I can’t remember. But all the authors are there.
Knuth: I was a graduate student, yeah. But this was my computing life.
Knuth: Oh, at Case they thought it was terrible that I even touched computers. The math professor said,
“Don’t dirty your hands with that.”
Knuth: No, first at Case. Cal Tech was one of the few graduate schools that did not have that opinion,
that I shouldn’t touch computers. I went to Cal Tech because they had this [strength] in combinatorics.
Their computing system was incredibly arcane, and it was terrible. I couldn’t run any programs at Cal
Tech. I mean, I would have to use punched paper tape. They didn’t even have punch cards, and their
computing system was horrible unless you went to JPL, Jet Propulsion Laboratory, which was quite a bit
off campus. There you would have to submit a job and then come back a day later. You couldn’t touch
the machines or anything. It was just hopeless. At Burroughs I could go into what they called the
fishbowl, which was the demonstration computer room, and I could run hands-on every night, and get
work done. There was a program that I had debugged one night at Burroughs that was solving a problem
that Marshall Hall, my thesis advisor, was interested in. It took more memory than the Burroughs
machine had, so I had to run it at JPL. Well, eight months later I had gotten the output from JPL and I
had also accumulated the listings that were 10 feet high in my office, because it’s a one- or two-day
turnaround time and then they give you a memory dump at the end of the run. Then you can say, “Oh, I’ll
change this and I’ll try another thing tomorrow.” It was incredibly inefficient, brain-damaged computing at
Cal Tech in the early ‘60s. But I kept up with the programming languages community and I became
editor of the programming languages section of the Communications of the ACM and the Journal of the
ACM in, I don’t know, ’64, ’65, something like that. I was not a graduate student, but I was just out of
graduate school in the ‘60s. That was definitely the part of computing that I did by far the most in, in
those days. Computing was divided into three categories. By the time I came to Stanford, you were
either a numerical analyst, or artificial intelligence, or programming language person. We had three
qualifying exams and there was a tripartite division of the field.
Feigenbaum: Don, just before we leave your thesis advisor: your thesis itself was in mathematics, not in
computing, right?
Knuth: Yes.
Feigenbaum: Tell us a little bit about that and what your thesis advisor’s influence on your work was at
the time.
Knuth: Yeah, because this is combinatorial, and it’s definitely an important part of the story.
Combinatorics was not an academic subject at Case. Cal Tech was one of the few places that had it as a
graduate course, and there were textbooks that began to be written. I believe at Stanford, for example,
George Dantzig introduced the first class in combinatorics probably about 1970. It was something that
was low on the totem pole in the mathematics world in those days. High on the totem pole was the
Bourbaki school from France, of highly abstract mathematics that was involved with higher orders of
infinities and things. I had colleagues at Cal Tech to whom I would say, “You and I intersect at countable
infinity, because I never think of anything that is more than countable infinity, and you never think of
anything that is less than countable infinity.” I mostly stuck to things that were finite in my own work. At
Case, when I’m a senior, we had a visiting professor, R. C. Bose from North Carolina, who was a very
inspiring lecturer. He was an extremely charismatic guy, and he had just solved a problem that became
front page news in the New York Times. It was to find orthogonal Latin squares. Now, today there is a
craze called Sudoku, but I imagine by the time people are watching this tape or listening to this tape that
craze will have faded away. An N-by-N Latin square is an arrangement of N letters in an N-by-N grid so that every row and
every column has all N of the letters. A pair of orthogonal Latin squares is where you have two Latin squares
with the property that if you superimpose them, so you have a symbol from the first and a symbol
from the second in each cell, the N squared cells you get have all N squared possibilities. Every combination occurs: A will
occur with A somewhere. A will occur with B somewhere. Z will occur with Z somewhere. A famous
paper, from 1783, I think, by Leonhard Euler had conjectured that it was impossible to find orthogonal Latin
squares that were 10 by 10, or 14 by 14, or 18 by 18, or 6 by 6 -- all the cases that were twice an odd
number. This conjecture was believed for 170 years, and even had been proved three times, but people
found holes in the proof. In 1959 R. C. Bose and two other people found that it was wrong, and they
constructed Latin squares that were 10 by 10 and 14 by 14. They showed that in all those cases except 6 by 6
it actually was possible to find orthogonal Latin squares. I met Bose. I was taking a class from him. It
was a graduate class, and I was taking graduate classes. He asked me if I could find some 12 by 12
orthogonal Latin squares. It sounded like an interesting program, so I wrote it up and I presented him
with the answer the next morning. He was happy and impressed, and we found five mutually orthogonal
Latin squares of order 12. That became a paper. Some interesting stories about that, that I won’t
go into. The main thing is that he was on the cutting edge of this research. I was at an undergraduate
place where we had great teaching, but we did not have cutting edge researchers. He could recommend
me to graduate school, and he could also tell me Marshall Hall is very good at combinatorics. He gives
me a good plug for going to Cal Tech. I had visited California with my parents on summer vacations, and
so when I applied to graduate school I applied to Stanford, Berkeley and Cal Tech, and no other places.
As it turned out, I got admitted to all three. I took Cal Tech because I knew that they had
a good combinatorial attitude there, which was not really true at Stanford. In fact, [at] Stanford I wouldn’t
have been able to study Latin squares at all. While we’re at it, I might as well mention that I got
fellowships. I got a National Science Foundation Fellowship, Woodrow Wilson Foundation Fellowship, to
come to these places, but they all had the requirement that you could not do anything except study as a
graduate student. I couldn’t be a consultant to Burroughs and also have an NSF fellowship. So I turned
down the fellowships. Marshall Hall was then my thesis advisor. He was a world class mathematician,
and had done, for a long time, pioneering work in combinatorics. He was my mentor. But it was a funny
thing, because I was so in awe of him that when I was in the same room with him I could not think
straight. I wouldn’t remember my name. I would write down what he was saying, and then I would go
back to my office so that I could figure it out. We couldn’t do joint research together in the same room.
We could do it back and forth. It was almost like farming my programs out to JPL to be run. But we did
collaborate on a few things. The one thing that we did the most on actually never got published,
however, because it turned out that it just didn’t lead to the solution. He thought he had a way to solve
the Burnside problem in group theory, but it didn’t pan out. After we did all the computation I learned a lot
in the process, but none of these programs have ever appeared in print or anything. It taught me how to
deal with tree structures inside a machine, and I used the techniques in other things over the years. He
also was an extremely good advisor, better than I was with my students. He would seem to keep
track of me to make sure I was not slipping. When I was working with my own graduate students, I was
pretty much in a mode where they would bug me instead of me bugging them. But he would actually
write me notes and say, Don, why don’t you do such and such? Now, I chose a thesis topic which was to
find a certain kind of what they call block designs. I will just say: symmetric block designs with parameter
Lambda equals 2. Anybody could look that up and find out what that means. I don’t want to explain it
now. At the time I did this, I believe there were six known designs of this form altogether. I had found a
new way to look at those designs, and so I thought maybe I’ll be able to find infinitely many more such
designs. They would be mostly of academic interest, although statisticians would argue that they could use
them somehow. But mostly, just, do they exist or not? This was the question. Purely intellectual
curiosity. That was going to be my thesis topic: to see if I could find lots of these elusive combinatorial
patterns. But one morning I was looking at another problem entirely, having to do with finite projective
geometry, and I got a listing from a guy at Princeton who had just computed 32 solutions to a problem
that I had been looking at with respect to a homework problem in my combinatorics class. He had found
that there are 32 solutions of Type A, and 32 solutions of Type B, to this particular problem. I said, hmm,
that’s interesting, because the 32 solutions of Type A, one of those was a well known construction. The
32 of Type B, nobody had ever found any Type B solutions before for the next higher up case. I
remember I had just gotten this listing from Princeton, and I was riding up on the elevator with Olga Todd,
one of our professors, and I said, “Mrs. Todd, I think I’m going to have a theorem in an hour. I’m going to
look at these two lists of 32 numbers. For every one on this page I am going to find a corresponding one
on this page. I am going to psych out the rule that explains why there happen to be 32 of each kind.”
Sure enough, an hour later I had seen how to get from each solution on the first page to the solution on
the second page. I showed this to Marshall Hall. He said, “Don, that’s your thesis. Don’t worry about this
Lambda equals 2 business. Write this up and get out of here.” So that became my thesis. And it is a
good thing, because since then only one more design with Lambda equals 2 has been discovered in the
history of the world. I might still be working on my thesis if I had stuck to that problem. I felt a little guilty
that I had solved my PhD problem in one hour, so I dressed it up with a few other chapters of stuff. The
whole thesis is 70-some pages long. I discovered that it is now on the internet, probably for people’s
curiosity, I suppose: what did he write about in those days? But of all the areas of mathematics that I’ve
applied to computer science, I would say the only area that I have never applied to computer science is
the one that I did my thesis in. It just was good training for me to exercise my brain cells.
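The orthogonality condition described above (superimpose two n-by-n Latin squares and check that all n² ordered pairs of symbols appear) is easy to state in code. The following Python sketch is illustrative only: the function names are hypothetical, and the cyclic construction it uses, L1[i][j] = (i + j) mod n and L2[i][j] = (2i + j) mod n, is a standard textbook construction that works only for odd n. It is not the method Bose and his colleagues used for the twice-odd orders.

```python
from itertools import product


def is_latin_square(square):
    """Return True if every row and every column of the n-by-n square
    contains each of its n symbols exactly once."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok


def are_orthogonal(a, b):
    """Superimpose two n-by-n Latin squares; they are orthogonal when
    all n*n ordered pairs (a[r][c], b[r][c]) are distinct."""
    n = len(a)
    pairs = {(a[r][c], b[r][c]) for r, c in product(range(n), repeat=2)}
    return len(pairs) == n * n


def cyclic_pair(n):
    """Hypothetical helper: textbook cyclic construction, valid for odd n.
    For even n the second square is not Latin, because 2i + j mod n
    repeats down each column."""
    l1 = [[(i + j) % n for j in range(n)] for i in range(n)]
    l2 = [[(2 * i + j) % n for j in range(n)] for i in range(n)]
    return l1, l2


if __name__ == "__main__":
    l1, l2 = cyclic_pair(5)
    print(are_orthogonal(l1, l2))  # True
```

For even n, such as 4, the second cyclic square is not even a Latin square, so other constructions or a search program, like the 12-by-12 one Knuth mentions, are needed; this sketch does not attempt such a search.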
Feigenbaum: Yeah. In fact for your colleagues, that is kind of a black hole in their knowledge of you
and understanding of you, is that thesis.
Knuth: The thesis, yeah. Well, I was going to say the reason that it is not used anymore is because
these designs turn out… Okay, we can construct them with all this pain and careful, deep analysis. But
it turned out later on that if we just work at random, we get even better results. So it was kind of pointless
from the point of view of that application, except for certain codes and things like that.
Feigenbaum: Don, just a footnote to that story. I intended this would come up later in the interview, but
it’s just so great a point to bring it in. When I’ve been advising graduate students, I tell them that the
really hard part of the thesis is finding the right problem. That’s at least half the problem.
Knuth: Yeah.
Feigenbaum: And then the other half is just doing it. And that’s the easy part of it. So I am not
impressed by this one hour. I mean, the hard part went into finding the problem, not in the solving of it.
We will get to, of course, the great piece of work that you did on The Art of Computer Programming. But
it’s always seemed to me that the researching and then writing the text of The Art of Computer
Programming was a problem generator for you. The way you and I have expressed it in the past is that
you were weaving a fabric and you would encounter holes in the fabric. Those would be the great
problems to solve, and that’s more than half the work. Once you find the problems you can go get at
them. Do you want to comment on that?
Knuth: Right. Well, yeah. We will probably comment on it more later, too. But I guess one of the
blessings and curses of the way I work is that I don’t have difficulty thinking of questions. I don’t have too
much difficulty in the problem generation phase -- what to work on. I have to actively suppress
stimulation so that I’m not working on too many things at once. But you can ask questions that are…
The hard thing, for me anyway, is not to find a problem, but to find a good problem. To find a problem
that has some juice to it. Something that is not just an isolated fact that happens to be true,
but something that will have spin-offs: once you’ve solved the problem, the techniques
are going to apply to many other things, or the result will be a link in a chain to other things. It’s not just
having a question that needs an answer. It’s very easy to… There’s a professor; I might as well mention
his name, although I don’t like to. It would be hard to mention the concept without somebody thinking of
his name. His name is [Florentin] Smarandache. I’ve never met him, but he generates problems by the
zillions. I’ve never seen one of them that I thought had any merit whatsoever. I mean, you can generate
sequences of numbers in various ways. You can cube them and remove the middle digit, or something
like this, and say, “Oh, is this prime?”, something like that. There are all kinds of ways of defining
sequences of numbers or patterns of things and then asking a question about them. But if one of my
students said “I want to work on this for a thesis”, I would have to say “this problem stinks”. So the hard
thing is not to come up with a problem, but to come up with a fruitful problem. Take the famous problem of
Fermat’s Last Theorem: can A to the N plus B to the N equal C to the N, for N greater than 2?
It has no applications. Suppose you found A, B and C; it doesn’t really matter to anything. But in the course of
working on this problem, people discovered beautiful things about mathematical structures that have
led to uncountably many practical applications as spin-offs. So that’s one. My thesis problem was
probably not extremely interesting in that sense either. It answered the question of
whether there existed projective geometries of certain orders that weren’t symmetrical. All the cases that
people had ever thought of were symmetrical, and I thought of unsymmetrical ways to do it. Well, so
what? But the technique that I used for it led to some insight and got around some other blocks that
people had in other parts of the theory. I have to worry about not getting bogged down in every question that I think
of, because otherwise I can’t move on and get anything out the door.
Feigenbaum: Don, we've gotten a little mixed up between the finishing of your thesis and your assistant
professorship at Caltech, but it doesn't matter. Around this time there were the embryonic beginnings of a
multi-volume work which you're known for, "The Art of Computer Programming." Could you tell us the
story about the beginning? Because soon it's going to be the middle of it, you were working on it so fast.
Knuth: This is, of course, really the story of my life, because I hope to live long enough to finish it. But I
may not, because it's turned out to be such a huge project. I got married in the summer of 1961, after
my first year of graduate school. My wife finished college, and I could use the money I had made -- the
$5000 on the compiler -- to finance a trip to Europe for our honeymoon. We had four months of wedded
bliss in Southern California, and then a man from Addison-Wesley came to visit me and said "Don, we
would like you to write a book about how to write compilers." The more I thought about it, I decided “Oh
yes, I've got this book inside of me.” I sketched out that day -- I still have the sheet of tablet paper on
which I wrote -- I sketched out 12 chapters that I thought ought to be in such a book. I told Jill, my wife, "I
think I'm going to write a book." As I say, we had four months of bliss, because the rest of our marriage
has all been devoted to this book. Well, we still have had happiness. But really, I wake up every morning
and I still haven't finished the book. So I try to -- I have to -- organize the rest of my life around this, as
one main unifying theme. The book was supposed to be about how to write a compiler. They had heard
about me from one of their editorial advisors, that I knew something about how to do this. The idea
appealed to me for two main reasons. One is that I did enjoy writing. In high school I had been editor of
the weekly paper. In college I was editor of the science magazine, and I worked on the campus paper as
copy editor. And, as I told you, I wrote the manual for that compiler that we wrote. I enjoyed writing,
number one. Also, Addison-Wesley were the people who were asking me to do this book; my favorite
textbooks had been published by Addison-Wesley. They had done the books that I loved the most as a
student. For them to come to me and say, “Would you write a book for us?”, and here I am just a
second-year graduate student -- this was a thrill. Another very important reason at the time was that I knew that
there was a great need for a book about compilers, because there were a lot of people who even in 1962
-- this was January of 1962 -- were starting to rediscover the wheel. The knowledge was out there, but it
hadn't been explained. The people who had discovered it, though, were scattered all over the world and
they didn't know of each other's work either, very much. I had been following it. Everybody I could think
of who could write a book about compilers, as far as I could see, they would only give a piece of the
fabric. They would slant it to their own view of it. There might be four people who could write about it, but
they would write four different books. I could present all four of their viewpoints in what I would think was
a balanced way, without any axe to grind, without slanting it towards something that I thought would be
misleading to the compiler writer for the future. I considered myself as a journalist, essentially. I could be
the expositor, the tech writer, that could do the job that was needed in order to take the work of these
brilliant people and make it accessible to the world. That was my motivation. Now, I didn’t have much
time to spend on it then; I just had this page of paper with 12 chapter headings on it. That's all I could do
while I was a consultant at Burroughs and doing my graduate work. I signed a contract, but they said "We
know it'll take you a while." I didn't really begin to have much time to work on it until 1963, my third year
of graduate school, as I was already finishing up my thesis. In the summer of '62, I guess I should
mention, I wrote another compiler. This was for Univac; it was a FORTRAN compiler. I sold my soul to
the devil, I guess you could say, for three months in the summer of 1962 to write a
FORTRAN compiler. I believe that the salary for that was $15,000, which was much more than an
assistant professor made in a year. I think assistant professors were getting eight or nine thousand in those days.
Feigenbaum: Well, when I started in 1960 at [University of California] Berkeley, I was getting $7,600 for
the nine-month year.
Knuth: Yeah, so you see it. I got $15,000 for a summer job in 1962 writing a FORTRAN compiler. One
day during that summer I was writing the part of the compiler that looks up identifiers in a hash table. The
method that we used is called linear probing. Basically you take the variable name that you want to look
up, you scramble it, like you square it or something like this, and that gives you a number between one
and, well, in those days it would have been between 1 and 1000, and then you look there. If you find it,
good; if you don't find it, go to the next place and keep on going until you either get to an empty place, or
you find the name you're looking for. It's called linear probing. There was a rumor that one of
Professor Feller's students at Princeton had tried to figure out how fast linear probing works and was
unable to succeed. This was a new thing for me. It was a case where I was doing programming, but I
also had a mathematical problem that would go into my other [job]. My winter job was being a math
student, my summer job was writing compilers. There was no mix. These worlds did not intersect at all
in my life at that point. So I spent one day during the summer while writing the compiler looking at the
mathematics of how fast linear probing works. I got lucky, and I solved the problem. I figured out
some math, and I kept two or three sheets of paper with me and I typed it up. [“Notes on ‘Open’
Addressing”, 7/22/63] I guess that's on the internet now, because this really became the genesis of my
main research work, which turned out not to be about compilers, but about what they call
the analysis of algorithms: you have a computer method and you find out quantitatively how good it is. I can
say, if I've got so many things to look up in the table, how long is linear probing going to take? It dawned on
me that this was just one of many algorithms that would be important, and each one would lead to a
fascinating mathematical problem. This was easily a good lifetime source of rich problems to work on.
Here I am then, in the middle of 1962, writing this FORTRAN compiler, and I had one day to do the
research and mathematics that changed my life and set the direction of my future research. But now I've gotten off the
topic of your original question.
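The table lookup described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Univac compiler's code: the default size of 1000 echoes "between 1 and 1000", the scrambling function is a hypothetical stand-in for "you square it or something like this", and the class and method names are invented for the example. Each call also reports how many places it examined, which is the quantity whose average the analysis is about.

```python
class LinearProbingTable:
    """Symbol table using open addressing with linear probing (illustrative sketch)."""

    def __init__(self, size=1000):
        self.size = size
        self.slots = [None] * size  # None marks an empty place

    def _hash(self, name):
        # Hypothetical "scramble": encode the name as an integer, square it,
        # drop the three lowest decimal digits, and reduce modulo the table size.
        key = int.from_bytes(name.encode("utf-8"), "big")
        return (key * key // 1000) % self.size

    def lookup_or_insert(self, name):
        """Return (slot, probes): where the name lives and how many places were examined.

        If the name is not present, it is stored in the first empty place found.
        """
        index = self._hash(name)
        for probes in range(1, self.size + 1):
            entry = self.slots[index]
            if entry is None:  # reached an empty place: the name is new, so store it
                self.slots[index] = name
                return index, probes
            if entry == name:  # found the name we were looking for
                return index, probes
            index = (index + 1) % self.size  # go to the next place, wrapping around at the end
        raise RuntimeError("table is full")
```

For a table whose load factor (fraction of occupied places) is α, the classical result of this analysis is that a successful search examines about ½(1 + 1/(1 - α)) places on average and an unsuccessful one about ½(1 + 1/(1 - α)²), so linear probing stays fast until the table is nearly full.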
Feigenbaum: We were talking about sort of the… You talked about the embryo of The Art of Computer Programming.
The compiler book morphed into The Art of Computer Programming, which became a seven-volume plan.
Knuth: Exactly. Anyway, I'm working on a compiler and I'm thinking about this. But after I finished
this summer job, I began to do things that related to the book. One of
the things I knew I had to have in the book was an artificial machine, because I'm writing a compiler book
but machines are changing faster than I can write books. I have to have a machine that I'm totally in
control of. I invented this machine called MIX, which was typical of the computers of 1962. In 1963 I
wrote a simulator for MIX so that I could write sample programs for it, and I taught a class at Caltech on
how to write programs in assembly language for this hypothetical computer. Then I started writing the
parts that dealt with sorting problems and searching problems, like the linear probing idea. I began to
write those parts of the book, which are part of a compiler. I had several hundred pages of notes
accumulating for those chapters of The Art of Computer Programming. Before I graduated, I had already
done quite a bit of writing on The Art of Computer Programming. I met George Forsythe about this time.
George was the man who inspired both of us [Knuth and Feigenbaum] to come to Stanford during the
'60s. George came down to Southern California for a talk, and he said, "Come up to Stanford. How
about joining our faculty?" I said "Oh no, I can't do that. I just got married, and I've got to finish this book
first." I said, "I think I'll finish the book next year, and then I can come up [and] start thinking about the
rest of my life, but I want to get my book done before my son is born." Well, John is now 40-some years
old and I'm not done with the book. Part of my problem is that I lack any good procedure for estimating
how long projects are going to take. I way underestimated how much needed to be written in this
book. Anyway, I started writing the manuscript, and I went merrily along writing pages of things that I
thought really needed to be said. Of course, it didn't take long before I started to discover a few
things of my own that weren't in any of the existing literature. I did have an axe to grind. The message
that I was presenting was in fact not going to be unbiased at all. It was going to be based on my own
particular slant on stuff, and that original reason for why I should write the book became impossible to
sustain. But the fact that I had worked on linear probing and solved the problem gave me a new unifying
theme for the book. I was going to base it around this idea of analyzing algorithms, and have some
quantitative ideas about how good methods were. Not just that they worked, but that they worked well:
this method worked 3 times better than that method, or 3.1 times better than that other method. Also, at this
time I was learning mathematical techniques that I had never been taught in school. I found that techniques
for solving problems of this kind were out there, but they just hadn't been emphasized openly. So my
book would also present a different kind of mathematics than was common in the curriculum at the time,
one that was very relevant to the analysis of algorithms. I went to the publishers, I went to Addison-Wesley, and
said "How about changing the title of the book from ‘The Art of Computer Programming’ to ‘The Analysis
of Algorithms’?" They said that would never sell; their focus group couldn't buy that one. I'm glad they stuck
to the original title, although I'm also glad to see that several books have now come out called “The
Analysis of Algorithms”, 20 years down the line. But in those days, the title The Art of Computer Programming
was very important to me because I was thinking of the aesthetics: the whole question of writing programs as
something that has artistic aspects in all senses of the word. One idea is “art” meaning artificial,
and the other “art” means fine art. All these are long stories, but I've got to cover them fairly quickly. I've got
The Art of Computer Programming started, and I'm working on my 12 chapters. I finish a rough draft
of all 12 chapters by, I think it was like 1965. I've got 3,000 pages of notes, including a very good
example of what you mentioned about seeing holes in the fabric. One of the most important chapters in
the book is parsing: going from somebody's algebraic formula and figuring out the structure of the
formula. Just the way I had done in seventh grade finding the structure of English sentences, I had to do
this with mathematical sentences. Chapter ten is all about parsing of context-free languages, [which] is
what we called them at the time. I covered what people had published about context-free languages and
parsing. I got to the end of the chapter and I said, well, you can combine these ideas and these ideas,
and all of a sudden you get a unifying thing which goes all the way to the limit. These other ideas had
sort of gone partway there. They would say “Oh, if a grammar satisfies this condition, I can do it
efficiently.” “If a grammar satisfies this condition, I can do it efficiently.” But now, all of a sudden, I saw
there was a way to say I can find the most general condition that can be done efficiently without looking
ahead to the end of the sentence. That you could make a decision on the fly, reading from left to right,
about the structure of the thing. That was just a natural outgrowth of seeing the different pieces of the
fabric that other people had put together, and writing it into a chapter for the first time. But this
general concept, well, I didn't feel that I had surrounded the concept. I knew that I had it, and I could
prove it, and I could check it, but I couldn't really intuit it all in my head. I knew it was right, but it was too
hard for me, really, to explain it well. So I didn't put it in The Art of Computer Programming. I thought it
was beyond the scope of my book. Textbooks don't have to cover everything; when you get to the harder
things, you have to go to the literature. My idea at that time [was]: I'm writing this book and I'm thinking
it's going to be published very soon, so for any little things I discover and put in the book, I didn't bother to
write a paper and publish it in a journal, because I figured it'd be in my book pretty soon anyway. Computer
science is changing so fast, my book is bound to be obsolete. It takes a year for it to go through editing,
and people drawing the illustrations, and then they have to print it and bind it and so on. I have to be a
little bit ahead of the state of the art if my book isn't going to be obsolete when it comes out. So I kept
most of the stuff I had to myself, these little ideas I had been coming up with. But when I got to this
idea of left-to-right parsing, I said "Well, here's something I don't really understand very well. I'll publish
this, let other people figure out what it is, and then they can tell me what I should have said." I published
that paper, I believe in 1965, after finishing my draft of the chapter, which didn't go as far as that
story, LR(k). Well, now textbooks of computer science start with LR(k) and take off from there. But I
want to give you an idea of…
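To make "a decision on the fly, reading from left to right" concrete, here is a minimal shift-reduce parser in Python for the toy grammar E → E + n | n. It is a sketch under stated assumptions, not the general construction in Knuth's paper: the parsing table was worked out by hand for this one grammar (one token of lookahead, SLR(1)-style), and the names ACTION, GOTO, RHS_LENGTH and lr_parse are hypothetical. At every step the parser consults only its stack of states and the next input symbol, never the end of the sentence, to decide whether to shift or reduce.

```python
# Toy grammar, productions numbered for the table:
#   1: E -> E + n
#   2: E -> n
# Augmented with S' -> E; "$" marks the end of the input.

# ACTION[state][lookahead] is ("shift", next_state), ("reduce", production) or ("accept",).
ACTION = {
    0: {"n": ("shift", 2)},
    1: {"+": ("shift", 3), "$": ("accept",)},
    2: {"+": ("reduce", 2), "$": ("reduce", 2)},
    3: {"n": ("shift", 4)},
    4: {"+": ("reduce", 1), "$": ("reduce", 1)},
}
# GOTO[state][nonterminal]: the state entered after a reduction to E.
GOTO = {0: {"E": 1}}
# Length of each right-hand side, used to pop the stack on a reduction.
RHS_LENGTH = {1: 3, 2: 1}


def lr_parse(tokens):
    """Parse tokens left to right; return the reductions made (a rightmost derivation in reverse)."""
    tokens = list(tokens) + ["$"]
    stack = [0]
    reductions = []
    pos = 0
    while True:
        state = stack[-1]
        lookahead = tokens[pos]
        action = ACTION[state].get(lookahead)
        if action is None:
            raise SyntaxError(f"unexpected {lookahead!r} at position {pos}")
        if action[0] == "shift":
            stack.append(action[1])
            pos += 1
        elif action[0] == "reduce":
            production = action[1]
            del stack[-RHS_LENGTH[production]:]  # pop one state per right-hand-side symbol
            stack.append(GOTO[stack[-1]]["E"])
            reductions.append(production)
        else:  # accept
            return reductions
```

For the input n + n + n the parser returns [2, 1, 1]: it reduces the first n to E, then folds in each "+ n" as soon as it has been read. A general LR(k) parser generator computes tables like this automatically for any grammar that admits them.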
Feigenbaum: Don, for historical reasons, tell the audience where the LR(k) paper was published so they
can go look it up.
Knuth: It was published in the journal called Information and Control, which has now changed its name
to Information and Computation. In those days, you can see why they called it Information and Control. It
was the journal that had had the best papers on parsing of languages at the time. It's a long paper, and
difficult. It's also reprinted in my book “Selected Papers on Computer Languages”, with a few corrections
to the original. In the original, I drew the trees with the root at the bottom. But everybody draws trees with
the root at the top now, so the reprint has trees drawn in a more modern notation. I'm trying to give the
flavor of the way things were in 1965. My son was born in the summer of '65, and I finished this work on
LR(k) at Christmastime in '65. Then I had, I think, one more chapter to write. But early in '66 I had all
3000 pages of the manuscript ready. I typed chapter one. My idea was -- I looked at these pages, which
were all hand-written, and I tried to guess from my handwriting how many words there were on a page.
I had chapter one and I typed it and I sent it to the
publishers, and they said "Don, what have you written? This book is going to be huge." I had actually
written them a letter earlier, while I was working on sorting. I wrote to the guy who signed me up; I signed a
contract with him, and by this time he had been promoted. No, I'm not sure about this, but anyway, I
wrote to him in '63 or '64 saying, "You know, as I'm working on this book on compilers, there are a few
things that deserve more complete treatment than a compiler writer needs to know. Do you mind if I add a
little bit here?" They said "Sure, Don, go right ahead. Whatever you think is good to write about, do it."
Then I sent them chapter one a few years later. By this time, I guess the guy's been promoted, and he's
saying "Oh my goodness, what are we going to do? Did you realize that this book is going to be more
than 2,000 pages long?", or something like this. No, I didn't. I had read a lot of books, and I thought I
understood about things. I had my typed pages there, and I was figuring five typed pages would go into
one page of text. It just looked that way to my eyes: you know, the letters in
a textbook are smaller. But I should have realized that the guys at the publishing house knew something
about books too. They told me "No, no, one and a half typed pages make a book [page]." I didn't
believe it. So I went back to my calculus book, which was an Addison-Wesley book, and I typed [a page of] it out.
Sure enough, they were absolutely right. It took one and a half pages. So the book was more than three times longer than I had thought. No
wonder it had taken me so long to get chapter one done! I'm sitting here with much, much more than I
thought I had. Meanwhile computer science hadn't been standing still, and I knew that more still had to
be written as I go. I went to Boston, and I happened to catch a glance at some notes that my editor had
written to himself for the meeting that we were going to have with his bosses, and one of the comments
on there was "Terrific cost bind" or something like that. Publishing houses all have their horror stories
about a professor who writes 12 volumes about the history of an egg, or something like this, and it never
sells, and it's just a terrible thing that they have a contract that they've signed. So they had to figure out
how to rescue something from this situation with this monster book coming. We thought at first we
would package it into three volumes instead of one. Then they sent out chapter one to a dozen readers
in a focus group, and they got comments on it. Well, the readers liked what they saw in that chapter, and
so at least I had some support from them. Then after a few more months we decided to package it. They
figured out that of the 12 chapters there were seven of them that would sell, and we could stick the other
five in some way that would make a fairly decent seven-volume set. That was what was finally
announced in 1966 or something: that it would come out in seven volumes. After typing chapter one I
typed chapter two, and so on. I kept working on it. All the time when I'm not teaching my classes at
Caltech, I'm typing up my notes and polishing the hand-written notes that I had made from these 3000
pages of rough draft. That sets the scene for the early days of The Art of Computer Programming.
Knuth: What happened is, I'm at Caltech. I'm a math professor. I'm teaching classes in algebra and
once in a while combinatorics at Caltech. Also one or two classes connected with computing, like sorting,
I think I might have taught one quarter. But most of the things I'm teaching at Caltech are orthogonal to
The Art of Computer Programming. My daughter is born in December of '66. I've got the entire
manuscript of volume one to the publisher, I think, during '66. I'm working on typing up chapters three
and four at the beginning of '67. I think this is approximately the way things stand. I was trying to finish
the book before my son was born in '65, and what happened is that I got… I'm seeing now that…
Volume one actually turned out to be almost 700 pages, which means about 1,000 typewritten pages. You can
see why I said that my blissful marriage wasn't quite so blissful, because I'm working on this a lot. I'm
doing most of it while watching the late late show on television. I also have some earplugs for when
the kids are screaming a little bit too much. Here I am, typing The Art of Computer Programming when
the babies are crying, although I did also change diapers and so on.
Feigenbaum: I think that what we need to do is talk about… This is December '66, when your daughter
was born.
Knuth: Yeah.
Feigenbaum: That leads sort of directly into this magical year of 1967, which didn't end so magically.
Let's continue on with 1967 in a moment.
Knuth: Okay.
Feigenbaum: Don, once you told me that 1967 was your most creative year. I'd like to get into it. You
also said you had only a very short time to do your research during that year, and the year didn't end so
well for you. Let's talk about that.
Knuth: Well, it's certainly a pivotal year in my life. You can see in retrospect why I think things were
building up to a crisis, because I was just working at high pitch all the time. I think I mentioned I was
editor of the programming languages sections of ACM Communications and the ACM Journal. I took the
editorial duties very seriously. A lot of people were submitting papers, and I would write long referee
reports in many cases, as well as discussing papers with the referees, among all the other things I had to be doing. I was a
consultant to Burroughs on innovative machines. I was consumed with getting The Art of Computer
Programming done, and I had children, and was being a father and a husband. I would start out every day and
I would say "Well, what am I going to accomplish today?" Then I would stay up until I finished it. I used
to be able to do this. When I was in high school and I was editor of the paper, I would do an all-nighter
every week when the paper came out. I would just go without sleep on those occasions. I was sort of
used to working in this mode, where I didn't realize I was punishing my body. We didn't have iPods and
things like that, but still I had the TV on. That was enough to kill the boredom while I had to do the typing
of a lot of material. Now, 1967 is when things came to a head. Also, it was time for me to make a
career decision. I was getting offers. I think I was offered full professorships at North Carolina in Chapel
Hill, and also at Purdue, I think. I had to make a decision as to what I should do. I was promoted to
Associate Professor at Caltech surprisingly early. The question is, where should I spend the rest of my
life? Should I be a mathematician? Should I be a computer scientist? By this time I had learned that
it was actually possible to do mathematical work as a computer scientist. I had analysis of algorithms
to do. What would be a permanent home? I visited Stanford. I gave a talk about my left-to-right parsing. I
discovered a theorem about it sitting in one of the student dormitories, Stern Hall, the night I gave the
lecture. I came up there, I liked George Forsythe very much, I liked the people that I met here very much.
I was thinking Stanford would be a nice place, but also there were other places too that I wanted to check
out carefully. I was also trying to think about what to do long-term for my permanent home. I don't like to
move. My model of my life was going to be that I was going to make one move in my lifetime to a place
where I had tenure, and I would stay there forever. I wanted to check all these things out, so I was
confronted with this aspect as well. I was signed up to be an ACM lecturer, ACM National Lecture
Program, for two or three weeks in February of 1967, which meant that I offered a list of three talks. The
ACM chapters and universities that wanted a speaker coordinated so that I had a schedule. I go
from city to city every day. You probably did the same thing about then.
Feigenbaum: Yep.
Knuth: Stanford and Berkeley were on this list, as well as quite a few schools in the east. That was
three weeks in February where I was giving talks about various topics, mostly programming languages.
When I'm at Caltech, I've got to be either preparing my class lectures, or typing my book and
getting it done. I'm in the middle of typing chapter four at this time, which is the second part of volume
two. I'm about, I don't know, one third of the way into volume two. That's why I don't have time to do
research. If I get a new idea, if I'm saying "Here's a problem that ought to be solved", when am I going to
do it? Maybe on the airplane. As you know, when you're a lecturer every day goes the same way. You
get up at your hotel, and you get on the plane. Somebody meets you at noon and you go out to lunch
and then they have small talk. They ask you the same questions; "Where are you going to be tomorrow,
Don", and so on. You give your lecture in the afternoon, there's a party in the evening, and then you go
to your hotel. The next morning you just go off to the next city. After three weeks of this, I was really not in
very good shape. I skipped out in one case. There was a snowstorm in Atlanta, so I skipped my talk in Atlanta
and I stayed an extra day. I'm trying to give you the flavor of this. But this trip in February also
turned out to be very fruitful, because one of my stops was at Cornell, where Peter Wegner was a visiting
professor. We went out for a hike that weekend to talk about the main topic in programming languages in
those days: how do you define the semantics of a programming language. What's a good way to
formalize the meaning of the sentences in that language? When someone writes a string of symbols, we
wanted to say exactly what that means, and do it in a way that we can prove interesting results about,
and make sure that we’ve translated it correctly. There were a lot of ideas floating in the air at the time. I
had been thinking of how I'm presenting it in The Art of Computer Programming. I said, well, you know,
there were two basic ways to do this. One is top down, where you have the context telling you what to
do. You start out and you say, “Oh, this is supposed to be a program. What does a program mean?”
Then a program tells the things inside the program what they're supposed to mean. The other is bottom
up, where you just start with one symbol, this is a number one, and say “this means one”, and then you
have a plus sign, and one plus two, and you build up from the bottom, and say “that means three”. So we
have a bottom-up version of semantics, and a top-down version of semantics. Peter Wegner says to me
"Don, why don't you use both top-down and bottom-up? Have the synthesized attributes from the bottom
up and the inherited attributes that come down from the environment." I said "Well, this is obviously
impossible. You get into circular reasoning. You can't define something in terms of itself." We were
talking about this, and after ten minutes I realized I was shouting to him, because I was realizing that he
was absolutely right. You could do it both ways, and define the things in a way that they would not
interfere with each other; that certain aspects of the meaning could come from the top, and other aspects
from the bottom, and that this actually made a beautiful combination.
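The combination of top-down and bottom-up semantics can be seen in the binary-number grammar (B → 0 | 1, L → B | L B, N → L | L . L) that Knuth later used in "Semantics of Context-Free Languages". The Python sketch below follows a variant in the spirit of that paper, in which each bit's weight comes from an inherited scale attribute s passed down the tree, while the value v and the length l are synthesized from below. The tuple-based tree encoding, the function names and the use of fractions.Fraction are assumptions made for this illustration; it is a hand-written evaluator for one grammar, not a general attribute-grammar system.

```python
from fractions import Fraction

# Grammar (left-recursive lists of bits):
#   B -> 0 | 1      L -> B | L B      N -> L | L . L
# Trees for L are tuples: ("L->B", bit) or ("L->LB", left_subtree, bit).


def build_list(bits):
    """Build the left-recursive L tree for a string of bits such as "1101"."""
    tree = ("L->B", int(bits[0]))
    for bit in bits[1:]:
        tree = ("L->LB", tree, int(bit))
    return tree


def length(tree):
    """Synthesized attribute l(L): the number of bits under this L node."""
    if tree[0] == "L->B":
        return 1  # L -> B:     l(L) = 1
    return length(tree[1]) + 1  # L1 -> L2 B: l(L1) = l(L2) + 1


def bit_value(bit, scale):
    """Synthesized v(B) given inherited s(B): 0 for B -> 0, 2**s(B) for B -> 1."""
    return Fraction(2) ** scale if bit == 1 else Fraction(0)


def value(tree, scale):
    """Synthesized v(L), given the inherited attribute s(L) = scale."""
    if tree[0] == "L->B":
        # L -> B: s(B) = s(L), v(L) = v(B)
        return bit_value(tree[1], scale)
    _, left, bit = tree
    # L1 -> L2 B: s(B) = s(L1), s(L2) = s(L1) + 1, v(L1) = v(L2) + v(B)
    return value(left, scale + 1) + bit_value(bit, scale)


def number_value(text):
    """Synthesized v(N) for a binary numeral such as "1101.01"."""
    if "." not in text:
        # N -> L: s(L) = 0, v(N) = v(L)
        return value(build_list(text), 0)
    whole, frac = text.split(".")
    frac_tree = build_list(frac)
    # N -> L1 . L2: s(L1) = 0, s(L2) = -l(L2), v(N) = v(L1) + v(L2)
    return value(build_list(whole), 0) + value(frac_tree, -length(frac_tree))
```

Here number_value("1101.01") returns Fraction(53, 4), that is 13.25. The fractional part shows why the two directions need not collide: the inherited s(L2) = -l(L2) depends on a synthesized attribute of the same node, but l never depends on s, so every attribute can still be computed in a definite order.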
Feigenbaum: Don, we were speaking about semantics of programming languages and you were
shouting at Peter Wegner.
Knuth: I’m shouting at Peter Wegner because it turns out that there’s a beautiful way to combine
the top-down and bottom-up approaches simultaneously when you’re defining semantics. This is
happening on a weekend as we’re hiking at Cornell in a beautiful park by frozen icicles. I can remember
the scene because this was kind of an “aha” moment that doesn’t happen to you very often in your life.
People tell me now no one’s allowed in that park in February because it’s too risky; you’re likely to
slip and hurt yourself. That was when all of a sudden it occurred to me that this might be possible. But I
don’t have time to do research. I have to go on and give more lectures. Well, I find myself the next week
at Stanford University speaking to the graduate students. I gave one of my regular lectures, and then
there was an hour where the students ask questions to the visitor. There was a student there named
Susan Graham, who of course turned out to be a very distinguished professor at Berkeley and editor of
Transactions on Programming Languages and Systems, and she asked me a question. “Don, what do
you think would be a good way to define the semantics of programming languages?” In the back of my mind
through that week I had been tossing around this idea that Peter and I had talked about the week before.
So I said, “Let’s try to sketch out a simple language and try to define its semantics”. On the blackboard,
in response to Susan’s questions, we would erase, and try things, and some things wouldn’t work. But
for the next 15 or 20 minutes I tried to write down something that I had never written down before, but it
was sort of in the back of my mind: how to define a very simple algebraic language and convert it into a
very simple machine language which we invented on the spot to be an abstract but very simple computer.
Then we would try to write out the formal semantics for this, so that I could write a few lines in this
algebraic language, and then we could parse it and see exactly what the semantics would be, which
would be the machine language program. Of course there must have been a lot of bugs in it, but this is
the way I had to do research at that time. I had a chance while I’m in front of the students to think about
the research problem that was just beginning to jell. Who knows how badly it was fouled up, but on the
other hand, being a teacher, that’s when you get your thoughts in order best. If you’re only talking to
yourself, you don’t organize your thoughts with as much discipline. It probably was also not a bad way to
do research. I didn’t get a chance to think about it when I got home to Caltech because I’m typing up The
Art of Computer Programming when I’m at home, and I’m being an editor, and I’m teaching my classes
the rest of the time at Caltech. Then in April I happened to be giving a lecture in Grenoble, and a
Frenchman, Louis Bolliet, asked me something about how one might define semantics, in another sort of
a bull session in Grenoble in France. That was my second chance to think about this problem, when I
was talking with him there. I was stealing time from the other things. That wasn’t the only thing going on
in ’67. I wasn’t only thinking of what to do with my future life, and editing journals and so on, I’m also
teaching a class at Caltech for sophomores. It’s an all year class, sort of an introduction to abstract
mathematics. While I was looking at a problem, we had a visitor at Caltech named Trevor-- what’s his
last name-- Evans, Trevor Evans. He and I were discussing how to work from axioms, and to prove
theorems from axioms. This is a basic thing in abstract mathematics. Somebody sets down an axiom,
like the associative law; it says that “ab” in parentheses, times “c”, is equal to “a” times “bc” in
parentheses: (ab)c = a(bc). That’s an axiom. I was looking at other axioms that were sort of random. I was
trying to teach the sophomores in the class how to do mini research problems, so I gave them axioms
which I called the “axioms of a grope.” They were supposed to develop “grope theory”; they were
supposed to grope for theorems. Of course, the well-developed mathematical theory is that of a “group”,
which I had been teaching them: the axioms of groups. One of them is the associative law. Another
axiom of groups is that an element times its inverse is equal to the identity. Another axiom is that the
identity times anything, identity times “X”, is “X”. So groups have axioms. We learned in the class how to
derive consequences of these axioms that weren’t exactly obvious at the beginning. So I said, okay, let’s
make a “grope.” The axiom for a grope is something like “x” times the quantity “yx” is equal to “y”, that is, x(yx) = y. I give
them this axiom, and I say to the class, what can you derive? Can you find all gropes that have five
elements? Can you prove any theorems about normal subgropes, or whatever it is? Make up a theory.
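The grope axiom as Knuth recalls it, x(yx) = y, is easy to test mechanically on small multiplication tables. The sketch below is our own illustration, not the class's work. It brute-forces every operation table on a tiny set, and it checks one hand-picked family, x∘y = −(x + y) mod n. That family satisfies the axiom because x∘(y∘x) = −(x − (y + x)) = y. The function names are hypothetical. Exhaustive search only works for very small n: a five-element set already has 5^25 tables, which is why a question like "find all gropes with five elements" calls for reasoning rather than enumeration.

```python
from itertools import product


def is_grope(table):
    """Check the grope axiom x*(y*x) == y for all elements, where table[a][b] is a*b."""
    n = len(table)
    return all(table[x][table[y][x]] == y for x in range(n) for y in range(n))


def find_gropes(n):
    """Brute-force every operation table on {0, ..., n-1}; only practical for tiny n."""
    found = []
    for entries in product(range(n), repeat=n * n):
        table = [list(entries[row * n:(row + 1) * n]) for row in range(n)]
        if is_grope(table):
            found.append(table)
    return found


def negated_sum_table(n):
    """Illustrative family x*y = -(x + y) mod n; then x*(y*x) = -(x - y - x) = y."""
    return [[(-(x + y)) % n for y in range(n)] for x in range(n)]


if __name__ == "__main__":
    print(is_grope(negated_sum_table(5)))  # True: a five-element grope
    print(len(find_gropes(2)))             # 2: XOR and its complement
    print(len(find_gropes(3)))             # scans all 3**9 = 19683 tables
```

One quick consequence shows up in the search results. The axiom forces every row and every column of a grope's table to be a permutation, so the table is a Latin square. This is the kind of small theorem the class was asked to "grope" for.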
As a class we came back in a week saying what theorems could you come up with. We tried to imagine
ourselves in the shoes of an inventor of a mathematical theory, starting with axioms. Well, Trevor Evans
was there and he showed me how to define what we called the “free grope,” which is the set of all… It
can be infinite, but you take all strings of letters, all formulas. Is it possible to tell whether one formula
can be proved equal to the other formula just by using this one axiom of the grope, “x” times “yx” equals
“y”? He showed me a very nice way to solve that problem, because he had been working on word
problems in what’s called universal algebra, the study of axiom systems. While I was looking at Trevor
Evans’ solution to this problem -- this problem arose in connection with my teaching of the class -- I
looked at Trevor Evans’ solution to this problem and I realized that I could develop an actual method that
would work with axioms in general, one that a machine could carry out without anybody having to think. The machine could
start out with the axioms of group theory, and after a small amount of computation, it could come up with
a set of 10 consequences of those axioms that would be enough to decide the word problem for free
groups. And the machine was doing it. We didn’t need a mathematician there to prove, to say, “Oh, now
try combining this formula and this formula.” With the technique I learned from Trevor Evans, and then
with a little extra twist that I put on it, I could set the machine going on axioms and it would automatically
know which consequences of these things, which things to plug in, would be potentially fruitful. If we were
lucky, like we were in the case of group theory axioms, it would finally get to the end and say, “Now,
there’s nothing more can be proved. I’ve got enough. I’ve got a complete set of reductions. If you apply
these reductions and none of them applies, you’ve got it.” It relates to AI techniques of expert systems, in
a way. This idea came to me as I’m teaching this basic math class. The students in this class were
supposed to do a term paper. In the third quarter, everybody worked on this. One of the best students in
the class, Peter Bendix, chose to do his term paper by implementing the algorithm that I had sketched on
the blackboard in one of the lectures at that time. So we could do experiments during the spring of ’67,
trying out a whole bunch of different kinds of axioms and seeing which ones the machine would solve and
which ones it would keep spinning and keep generating more and more reductions that seemed to go
without limit. We figured out in some cases how we could introduce new axioms that would bring the
whole thing back down again. So we’re doing a lot of experiments on that kind of thing. I don’t have time
to sit down at home and work out the theory for it, but I knew it had lots of possibilities. Here I had
attribute grammars coming up in February, and these reduction systems coming up in March, and I’m
supposed to be grinding out Volume Two of The Art of Computer Programming. The text of volume one
had gone to Addison-Wesley the previous year, and the copy editor had sent me back corrections and
told me, “Don, this isn’t good writing. You’ve got to change this,” and he’d teach me Addison-Wesley
house style. The page proofs started coming. I started going through galley proofs, but now it was time to
get page proofs for volume one. Volume one was published in January of 1968, but the page proofs
started to be available in the spring also.
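The Python sketch below makes the idea of a "complete set of reductions" concrete. It applies the ten rewrite rules usually cited as the complete system for group theory obtained by Knuth and Bendix (e·x → x, x⁻¹·x → e, (x·y)·z → x·(y·z), and so on). It only uses the finished rules; it does not implement the completion procedure that discovers them. The term encoding (tuples tagged `'*'` and `'i'`, the constant `'e'`) and all function names are our own assumptions. Because the rules are complete, two words are equal in every group exactly when their reduced forms coincide. That is how such a set decides the word problem.

```python
# Terms: 'e' is the identity, any other string is a generator,
# ('i', t) is the inverse of t, and ('*', s, t) is the product s*t.
E = 'e'


def inv(t):
    return ('i', t)


def mul(s, t):
    return ('*', s, t)


def is_op(t, op):
    return isinstance(t, tuple) and t[0] == op


def rewrite_root(t):
    """Apply one of the ten group reductions at the root of t; return None if none matches."""
    if is_op(t, 'i'):
        x = t[1]
        if x == E:                                  # i(e) -> e
            return E
        if is_op(x, 'i'):                           # i(i(x)) -> x
            return x[1]
        if is_op(x, '*'):                           # i(x*y) -> i(y)*i(x)
            return mul(inv(x[2]), inv(x[1]))
        return None
    if is_op(t, '*'):
        _, x, y = t
        if x == E:                                  # e*x -> x
            return y
        if y == E:                                  # x*e -> x
            return x
        if x == inv(y) or y == inv(x):              # i(x)*x -> e and x*i(x) -> e
            return E
        if is_op(x, '*'):                           # (x*y)*z -> x*(y*z)
            return mul(x[1], mul(x[2], y))
        if is_op(y, '*') and (x == inv(y[1]) or y[1] == inv(x)):
            return y[2]                             # i(x)*(x*z) -> z and x*(i(x)*z) -> z
    return None


def normalize(t):
    """Reduce the arguments first, then keep rewriting at the root until no rule applies."""
    if isinstance(t, tuple):
        t = (t[0],) + tuple(normalize(arg) for arg in t[1:])
    reduced = rewrite_root(t)
    return t if reduced is None else normalize(reduced)


def equal_in_groups(s, t):
    """Word problem: s = t holds in every group iff the normal forms coincide."""
    return normalize(s) == normalize(t)


print(normalize(mul(inv(mul('a', 'b')), 'a')))  # ('i', 'b')
```

For example, `equal_in_groups(inv(mul('a', 'b')), mul(inv('b'), inv('a')))` returns `True`. By contrast, `mul('a', 'b')` and `mul('b', 'a')` reduce to different normal forms, as they should in a non-commutative theory.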
Don Knuth: Right. There’s a conference in April in Norway on simulation languages; that was another of
the things that I’d been working on at Burroughs. We had a language called SOL, Simulation Oriented
Language, which was an improvement of the state-of-the-art in systems simulation, in what they called
discrete simulation languages. There was an international conference held in Norway by the people who
had invented the Simula language, which wasn’t very well known. They organized this conference and I
went to that, visiting Paris and Grenoble on my way because Maurice Nivat and I had also become
friends. His thesis was on theory of context-free grammars, and no one in France would read it. He
found a guy in America who would appreciate his work, so he came out and we spent some time together
in ’66 getting to know each other and talking about context-free grammar research. I visited him in Paris
and then I went to Grenoble, and then went to Norway for this conference on simulation languages where
I presented a paper about SOL, and learned about Simula, and so on. My parents and Jill’s parents are
taking care of our kids while we’re in Europe during this time in April. I’m scheduled in June to lecture at
a summer school in Copenhagen, an international summer school. I’m giving lectures about how to
parse, what’s called top-down parsing. “LL(k)” is the terminology that developed after these lectures. This
was a topic that I did put in my draft of Chapter 10. It was something that I understood well enough that I
didn’t have to publish it at the time. I gave it for the first time in these lectures in June in Copenhagen.
That was a one-week series of lectures with several lectures every day, five days, to be given there. The
summer school met for two weeks, and I was supposed to speak in the second week of that summer
school. All right. What happened then in May is I had a massive bleeding ulcer, and I was in the hospital.
My body gave out. I was just doing all this stuff, and it couldn’t take it. I learned about myself. I had a
wonderful doctor who showed me his textbook about ulcers. At that time they didn’t know that ulcers are
related to a bacterium. As far as they were concerned, it was just acid.
Q Stress.
Don Knuth: Yeah. People would get operations so that their stomachs wouldn’t produce so much acid,
and things like that. Anyway, he showed me his textbook, and his textbook described the typical ulcer
patient; what other people call the “Type A” personality. It just described me to a “T”, all of the things that
were there. I was an automaton, I think, basically. I had been, all my life, pretty much a test-taking
machine. You know, I saw a goal and I put myself to it, and I worked on it and pushed it through. I didn’t
say no to people when they said, “Don, can you do this for me?” At this point I saw, I could all of a
sudden get to understand, that I had this problem; that I shouldn’t try to do the impossible. The doctor, I
say he’s so wonderful because doctors usually talk down to patients and they keep their secrets to
themselves. But here he let me look at this textbook so I could know that he wasn’t just telling me
something to make me feel good. I had access to anything I wanted to know about my condition. So I
wrote a letter to my publisher, framed in black, saying, “I’m not going to be able to get the manuscript of
volume two to you this year. I’m sorry. I’m not supposed to work for the next three weeks.” In fact, you
can tell exactly where this was. I was writing in a part of volume two when the ulcer happened, when it
started to burst or whatever. I was working out the answer to a problem about greatest common divisors
that goes about in the middle of volume two. It was an exercise where the answer had a lot of cases to it,
so it takes about a page and a half to explain the answer. It was a problem that needed to be studied and
nobody had studied before, and I was working at it. All of a sudden, bingo. The reason you can find it is
if you look in the index to volume two under “brute force,” it refers you to a page, an answer page. I was
solving this problem by brute force, and so you look at that page, you can see exactly what exercise I was
working on. Then I put it away. I only solved half of the exercise before I could work on it again a few
weeks later. I went into the hospital. It wasn’t too bad, but the blood supply… I took iron pills and got
ready. I could still go to Copenhagen to give my lectures in June. However, the first week was supposed
to be lectures by Niklaus Wirth, and the second week was supposed to be lectures by me. But Klaus
had just gone on an around-the-world tour with his wife and had come down with dysentery in India and
was extremely ill, and had to cancel his lectures. So I was supposed to go on in the first week instead.
But I was stealing time so badly, I hadn’t really prepared my lectures. I said, oh, I have a week. I’ll go to
Copenhagen, listen to Klaus and I’ll prepare my lectures. I hadn’t prepared. So I’m talking about stuff that
has never been written down before, never been developed with the students. I get to Copenhagen with
one day to prepare for this week of lectures. Well, one thing in Copenhagen, there are wonderful parks all
over the city. I sat down under a big tree in one of those parks on the first day, and I thought of enough
things to say in my first two lectures. On the second day I gave the lectures, and I sat down under that
tree and I worked out the lectures for the next day. These lectures became my paper called “Top-Down
Syntax Analysis.” That was the story of the first part of June. The second part of June I’m going to a
conference in Oxford, one of the first conferences on discrete mathematics. There I’m presenting my
paper on the new method that I had discovered, now called the Knuth-Bendix algorithm, about the word
problems in universal algebra. After I finished my lectures at Copenhagen I had time to write the paper
that I was giving at Oxford the following week. There at Oxford, I meet a lot of other people and get more
stimulated about combinatorial research, which I can’t do. Come back to Caltech and I’m working as a
consultant as well. I resigned from ten editorial boards at this time. No more ACM Journal, no more
Communications. I gave up all of the editorships that I was on in order to cut down my work load. I started
working again on volume two where I left off at the time of the ulcer, but I would be careful to go to sleep
and keep a regular schedule. In the fall I went to a conference in Santa Barbara, a conference on
combinatorial mathematics. That was my first chance to be away from Caltech, away from my teaching
duties, away from having to type The Art of Computer Programming. That’s where I had three days to sit
on the beach and develop the theory of attribute grammars, this idea of top-down and bottom-up. I cut
out of the whole conference. I didn’t go to any of the talks. I just sat on the beach and worked on the
theory of attribute grammars. As it turned out, I wasn’t that interested in most of the talks, although I met
people that became lifelong friends at the meals and we talked about things off-line. But the formal talks
themselves, I was getting disappointed with mathematical talks. In most lectures on mathematics that I
heard in 1966 and ’67, I found myself sitting in the back row saying, “So what? So what?” Computer
science was becoming much more exciting to me. When I finally made my career decision as to where to
go, I had four main choices. One was stay at Caltech. They offered me full professor of mathematics. I
could go to Harvard as a full professor in applied science, which meant computer science. That was as
close as you could get to computer science there. At Harvard my job would have been to build up a
computer science department there. Harvard was, in Floyd’s term, an advanced backwater at that point
in time for computer science, and Caltech was as well. Because Caltech and Harvard are so good at
physics and chemistry and biology, they were thinking of computers because they can help chemists and
physicists and biologists. They didn’t think of it as having interesting problems of its own. At Stanford,
the best group of computer scientists in the world was already there, I knew that computer science had a
great future, the best students in the world were there to work with, and the program was already built up.
I could come to Stanford and be one of the boys and do computer science, instead of arguing for computer
science and trying to do barnstorming. Berkeley was the fourth place. I admired
Berkeley very much as probably the greatest all around institution for covering everything. Everything
Stanford covered it covered well, but it didn’t have a professor of Sanskrit, and Berkeley had a professor
of Sanskrit, that sort of thing. But I was worried about Berkeley because Ronald Reagan was governor.
Stanford was a private school and wouldn’t be subject to the whims of politicians so much as the
University of California. Stanford had this great other thing where the faculty can live on campus, so I
knew that I could come to Stanford and the rest of my life I would be able to bike to work; I wouldn’t have
to do any commuting. And Forsythe was a wonderful person, and all the group at Stanford were great,
and the students were the best. So it was almost a no-brainer, why I finally came to Stanford. My offer
from Stanford came through in February of ’68, which was the end. The other three had already come in
earlier, but I was waiting for Stanford before I made my final decision. In February of ’68 I finally got the
offer from Stanford. It was a month after volume one had been published, and George said, “Oh yes,
everybody’s all smiles now.”
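The attribute grammars Knuth worked out at Santa Barbara combine two kinds of information. Inherited attributes flow down the parse tree, and synthesized attributes flow up it. The sketch below follows the binary-number grammar of his paper "Semantics of Context-Free Languages" (B → 0 | 1, L → B | L B, N → L | L . L). It uses the variant that, as we recall it, adds an inherited scale s to the synthesized value v and length l. Each 1-bit contributes 2^s(B), the scale is passed downward, and the length computed bottom-up sets the scale of the fractional part through s(L2) = −l(L2). The tree encoding and function names are illustrative assumptions. Exact fractions avoid floating-point rounding.

```python
from fractions import Fraction

# Tree encoding (an illustrative assumption): ('L->B', bit) or ('L->LB', subtree, bit).


def build_list(bits):
    """Build the left-recursive tree for L -> B | L B from a nonempty string of bits."""
    tree = ('L->B', bits[0])
    for bit in bits[1:]:
        tree = ('L->LB', tree, bit)
    return tree


def length(L):
    """Synthesized l(L): l = 1 for L -> B and l(L1) = l(L2) + 1 for L1 -> L2 B (bottom-up)."""
    return 1 if L[0] == 'L->B' else length(L[1]) + 1


def bit_value(bit, scale):
    """v(B) = 0 for B -> 0 and v(B) = 2**s(B) for B -> 1, with s(B) inherited."""
    return Fraction(0) if bit == '0' else Fraction(2) ** scale


def list_value(L, scale):
    """v(L) given inherited s(L); the children get s(B) = s(L1) and s(L2) = s(L1) + 1."""
    if L[0] == 'L->B':
        return bit_value(L[1], scale)
    _, left, bit = L
    return list_value(left, scale + 1) + bit_value(bit, scale)


def number_value(text):
    """N -> L gives s(L) = 0; N -> L1 . L2 gives s(L1) = 0 and s(L2) = -l(L2)."""
    if '.' not in text:
        return list_value(build_list(text), 0)
    whole, frac = text.split('.')
    L1, L2 = build_list(whole), build_list(frac)
    return list_value(L1, 0) + list_value(L2, -length(L2))


print(number_value("1101.01"))  # 53/4, i.e. 13.25
```

In this version, the fractional part needs its length before it can assign scales to its bits. The bottom-up and top-down passes therefore depend on each other, which is the interplay the transcript describes.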
Ed Feigenbaum: Everyone was all smiles because they had gone out on a limb to offer you a full
professorship?
Don Knuth: No, because the committees were saying, “This guy is just 30 years old.” You know, I was
born in ’38 and this was January of ’68. But when they looked at the book, they said, “Oh, there’s some
credibility here.” That helped me. I got through ’67 and learned how to slack off a little bit, right? I’ve
always felt after that, hearing many other people’s stories of when they got the special insights that
turned out to be important in their research, that it was very rarely in a settled time of their life, when
they had comfortable living conditions and good... the word is escaping me now, but anyway, luxury:
a nice office space and good lighting and so forth. No, people are working in a garret, they’re
starving, they’ve got kids screaming, there’s a war going on or something. But that’s when they get a lot
of their most… almost every breakthrough idea. I’ve always wondered, if you wanted to set up a think
tank where you were going to get the most productivity out of your scientists, wouldn’t you have to, not
exactly torture them, but deprive them of things? It’s not sustainable. Still, looking back, that was a time
when I did as much science as I could, as well as try to fulfill all my other obligations.
Ed Feigenbaum: Don, to go back to the Stanford move. A couple of questions come up, because I was
around. I remember sitting in George Forsythe’s office, just a handful of us people considering the
appointment of this young guy from Caltech who had this wonderful outline of books. One of the things
that we were discussing was [that] Don Knuth wanted us to also hire Bob Floyd. It turns out that hiring
Bob Floyd was a wonderful idea. Bob Floyd was magnificent. But it hadn’t occurred to us until you
brought it up, and then we did it. Can you go into that story?
Don Knuth: Yeah, because Bob was a very special person to me throughout this period. As I said, I’d
been reading the literature about programming languages avidly. When I was asked to write a book
about it in ’62, I knew there were these people who had written nice papers, but nobody knew how to sort
out the wheat from the chaff. In the early days, like by 1964, my strong opinion was that only five good papers
about programming languages had ever been written, and four of them were by Bob Floyd. I met Bob the
first time in summer of ’62 when I was working on this Fortran compiler for Univac. At the end of the
summer I went to the ACM conference in Syracuse, New York, and Bob was there. We hit it off very well
right away. He was showing me his strange idea that you could prove a computer program correct,
something that had never occurred to me. I said I was a programmer in one room, and I was a
mathematician in another room. Mathematicians prove things. Programmers write code and they hope it
works, and they twiddle it until it works. But Bob’s saying, no, you don’t have to twiddle; you can take a
program and you can give a mathematical proof that it works. He was way ahead of me. There were very
few people who had ever conceived of putting those two worlds together at that time.
Don Knuth: McCarthy, exactly, right. John and Bob were probably… I don’t know if there was anybody
in Europe yet who had seen this right. Bob tells me his thoughts about this when I meet him in this
conference in Syracuse. Then I went to visit him a year later when I was in Massachusetts at the crisis
meeting with my publishers. He lived there, and I went and spent a couple of days in Topsfield where he
lived. We shared ideas about sorting. Then we had a really exciting correspondence over the next time
where letters go back and forth, each one trying to trump the other about coming up with a better idea
about something that’s now called sorting networks. Bob and I developed a theory of sorting networks
between us in the correspondence. We were thinking at the time, this looks like Leibniz writing to
Bernoulli in the old days of scientists trying to develop a new theory. We had a very exciting time working
on these letters. Every time I would send a letter off to Bob, thinking, “Okay, now this is the last result,” he
would come back with a brand new idea and make me work harder to come up with the next step in our
development of this theory. We weren’t talking only about programming languages; we were talking also
about a variety of algorithms. We found that we had lots of common interests. He came out to visit me a
couple of times in California, and I visited him. So when I was making my career decision, I said, “Hey
Bob, wouldn’t it be nice if we could both end up at the same place?” I wrote him a letter, probably the
same letter where I was describing to him my idea about left-to-right parsing. As soon as I discovered it, I
immediately wrote Bob a 12-page letter about left-to-right parsing. He comes back and says, “Oh, bravo, and
did you think about this,” and so on. So we had this
going on. Then at the beginning of ’67 I said, “You know, Bob, why don’t we think about trying to get into
the same place together? What is your take on the different places in the world?” At that time he was at
Carnegie. He had left Computer Associates and spent, I think, two years at Carnegie. He was enjoying it
there, and he was teaching and introducing new things into the curriculum there. He wrote me this letter
assessing all of the schools at the time, the way he thought their development of computer science was.
When I quoted him a minute ago saying Harvard was an advanced backwater, that comes out of that
letter where he was describing the way he looked at things. I had already mentioned that Stanford was my
current number one, though I wasn’t totally sure, and at the end of the letter he concurred. He said if I
would go there and he could go there, chances are he would go there, too. I
presented this to Forsythe, saying why don’t we try to make it a package deal. This meant they had to
give up two other hires to make room for us. They couldn’t get two new billets for us, and so it was a lot of
work on Stanford’s part, but it did develop. Except that you had to lose two other good people, but I think
Bob and I did all right for the department.
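The sorting networks Knuth and Floyd traded ideas about are fixed sequences of compare-exchange operations. The minimal sketch below applies such a network and checks it with the 0-1 principle: a comparator network sorts every input exactly when it sorts every sequence of 0s and 1s. The four-input, five-comparator network is a standard textbook example, chosen here as an assumption; it is not taken from their letters. The function names are hypothetical.

```python
from itertools import product

# A standard five-comparator network for four inputs (illustrative choice).
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def apply_network(network, values):
    """Run each compare-exchange (i, j), leaving the smaller value at position i."""
    out = list(values)
    for i, j in network:
        if out[i] > out[j]:
            out[i], out[j] = out[j], out[i]
    return out


def sorts_everything(network, n):
    """0-1 principle: the network sorts all inputs iff it sorts all 2**n inputs of 0s and 1s."""
    return all(apply_network(network, bits) == sorted(bits)
               for bits in product((0, 1), repeat=n))


print(apply_network(NETWORK_4, [3, 1, 4, 2]))  # [1, 2, 3, 4]
print(sorts_everything(NETWORK_4, 4))          # True
print(sorts_everything(NETWORK_4[:-1], 4))     # False: (0, 1, 0, 1) slips through
```

The 0-1 check replaces an infinite family of inputs with 2^n cases. That makes claims of the "now this is the last result" kind in such a correspondence easy to verify for small n.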
Ed Feigenbaum: Maybe that was your first great service to our department was recruiting Bob Floyd.
Don Knuth: Well, I don’t know. I did have to work a little bit the year after I got here. To my surprise they
had appointed him as an associate professor but me as a full professor. It was understandable because
he didn’t have a Ph.D. He had been a child prodigy, and I think he had gotten into graduate school at
something like age 17, and then dropped out to become a full time programmer. So he didn’t have the
academic credentials, although he had all the best papers in the field. I had to meet with the provost and
say it’s time to promote him to full professor. The thing that clinched it, this was 1969, was that he was
the only person who had been invited to give keynote addresses in two sessions of the International
Congress in Ljubljana.
Ed Feigenbaum: In ’71.
Feigenbaum: Don, maybe we could just say a little more about Bob [Floyd] and his life at Stanford.
Knuth: Right. As it turned out, when we got together we couldn’t collaborate quite as well as when we
were writing letters. I noticed this was true in other cases. Like sometimes I could advise my students
better when I was on sabbatical than when we were having weekly meetings. It’s not easy to work
face-to-face all the time; sometimes it works better offline than online. I told you my experience with Marshall
Hall -- that I couldn’t think in his presence. I have to confess that there are some women computer
scientists that when I’m in their presence, I think only of their brown eyes. I love their research, but I’m
wired in certain ways that mean that we should write our joint papers by mail, or by fe-mail. Anyway. We
did a lot of joint work in the early ’70s, but also it turned out that when Bob became chair of the
department… I’m not sure exactly when that was; probably right after my sabbatical.
Knuth: Yeah. I went on leave of absence for a year in Norway and then I came back and Bob was chair
of the department. He took that job extremely seriously, and worked on it to such an extent that he
couldn’t do any research very much at all during those three or four years when he was chair. I don’t
know how many years, five years.
Knuth: Okay, so it was four years. That included very detailed planning of all aspects of our new building.
When he came back, then he had two years of sabbatical. That’s one credit that you get. So there was a
break in our joint collaboration. Afterwards, he never quite caught up to the leading edge of the same
research topics that I was in. We would work on things occasionally, but not at all the way we had done
previously. We wrote a paper that we were quite pleased with at the end of the ’80s, but it was not the
kind of thing that we imagined originally, that would always be in each other’s backyard. In fact, I’m a
very bad coworker. You can’t count on me to do anything, because it takes me a while to finish stuff and
I think of something else. So how can anybody rely on me as being able to go with their agenda? Bob,
during the ’70s, came up with a lot of ideas, like his method for halftoning, for making gray-level pictures,
that is in all the printers of the world now. That was done completely independently. I didn’t even know
about it until a couple years after he had come up with these inventions. But I’m dedicating a book to
Bob. My collected works are being published in eight volumes. The seventh volume is selected papers
on design of algorithms. That one is dedicated to Bob Floyd, because a lot of the joint papers, joint work
we did, occurs in that volume. He was one of the few people in my life whom I really consider one of my
teachers, the gurus that inspired me.
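Floyd's halftoning method is known today as Floyd-Steinberg error diffusion. Each pixel is rounded to black or white, and the rounding error is pushed onto pixels not yet visited. The weights are 7/16 to the right, 3/16 below left, 5/16 below, and 1/16 below right. The minimal sketch below makes several assumptions: a grayscale image given as rows of floats in [0, 1], a fixed threshold of 0.5, and a plain left-to-right scan. Practical implementations often add serpentine scanning and work on integer pixel values. The function name is hypothetical.

```python
def floyd_steinberg(gray):
    """Error-diffusion halftoning: gray is rows of floats in [0, 1]; returns rows of 0/1 pixels."""
    h, w = len(gray), len(gray[0])
    img = [list(map(float, row)) for row in gray]  # working copy that accumulates diffused error
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = img[y][x]
            new = 1 if old >= 0.5 else 0
            out[y][x] = new
            err = old - new
            # Distribute the error to unvisited neighbours; error falling off the edge is dropped.
            if x + 1 < w:
                img[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1][x - 1] += err * 3 / 16
                img[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1][x + 1] += err * 1 / 16
    return out


print(floyd_steinberg([[0.4, 0.4]]))  # [[0, 1]]: the first pixel's error tips the second over 0.5
```

Because each pixel's error is carried forward instead of discarded, the local average of black and white dots tracks the original gray level. That is what makes this approach suitable for printers that can only place or omit ink.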
Feigenbaum: Don, I’m going to call that the end of your first period of Stanford. I wanted to move into
some questions about what I call your second Stanford period. This is very different. I’ve sort of
delineated this as a very different time. I saw you shifting gears, and I couldn’t believe what was
happening. You became, in a solitary way, the world’s greatest programmer. It was your engineering
phase. This was TeX and METAFONT. All of a sudden, you disappeared into just miles of code, and
fantastic coding ideas just pouring out, plus your engineering. We were in the new building and you were
running back and forth from your office to where this new printing machine was installed. You’d be
debugging it with your eyes and with your symbols and pulling your hair out of your head, because it
wasn’t working right, and all that. You were just what the National Academy of Engineering would call an
engineer. Tell me about that period in your life.
Knuth: Okay, well, it ties in with several things. There was a year that you didn’t see me when I was up
at McCarthy’s lab...
Knuth: …starting this. One of the first papers that I collaborated with Bob Floyd on in 1970 [had] to do
with avoiding go-to statements. There was a revolutionary new way to write programs that came along in
the ’70s, called structured programming. It was a different way than we were used to when I had done all
my compilers in the ’60s. Bob and I, in a lot of our earliest conversations at Stanford, were saying, “Let’s
get on the bandwagon for this. Let’s understand structured programming and do it right.” So one of our
first papers was to do what we thought was a better approach to this idea of structured programming than
some people had been taking. Some people had misunderstood that if you just get rid of go-to
statements you had a structured program. That’s like saying zero population growth; you have a
numerical goal, but you don’t change the structure. People were figuring out a way to write programs that
were just as messy as before, but without using the word “go-to” in them. We said no, no, no; here’s
what the real issues are. Bob and I were working on this. This is going on, and we’re teaching students
how to write programs at Stanford, but we had never really written more than textbook code ourselves in
this style. Here we are, being full professors, telling people how to do it, having never done it ourselves
except in really sterile cases without any real-world constraints. I probably was itching… Thank you for
calling me the world’s greatest programmer. I was always calling myself that in my head. I love
programming, and so I loved to think that I was doing it as well as anybody. But the fact is, the new way
of programming was something that I didn’t have time to put much effort into.
Feigenbaum: The emphasis in my comment was on the solitary. You were a single programmer doing
all this. No team.
Knuth: That’s right. As I said, it’s hard for me to have somebody else doing the drumming. I had to
march to my... I had The Art of Computer Programming, too. I could never be a reliable part of a team
that I wasn’t the head of, I guess. I did first have to get into that mode, because I was forced to. I was
chair of the committee at Stanford for our university reports. We put out lots and lots of reports from all
phases of the department through these years. We had a big mailing list. People were also trading their
reports with us. We had to have a massive bookkeeping system just to keep the correspondence, so that
the secretaries in charge of it could know who had paid for their reports, who we were sharing with. All
this administrative type of work had to be done. It seemed like just a small matter of programming to do
this. I had a grad student who volunteered to do this as his master’s project: to write a program that
would take care of all of the administrative chores of the Stanford tech reports distribution. He turned in
his term paper and I looked at it superficially and I gave him an A on it, and he graduated with his
master’s degree. A week later, the secretary called me up and said, “Don, we’re having a little trouble
with this program. Can you take a look at it for us?” The program was running up at the AI lab, which I
hadn’t visited very often. I went up there and took a look at the program. I got to page five of the
program and I said, “Hmmm. This is interesting. Let me make a copy of this page. I’m going to show it
to my class.” [It was] the first time I saw where you change one symbol on the page and you can make
the program run 50 times faster. He had misunderstood a sorting algorithm. I thought this was great.
Then I turned to the next page and he has a searching algorithm there for binary search. I said, “Oh, he
made a very interesting error here. I’ll make a copy of this page so I can show my class next time I teach
about the wrong way to do binary search.” Then I got to page eight or nine, and I realized that the way he
had written his program was hopelessly wrong. He had written a program that would only work on the
test case that he had used in his report for the master’s thesis, that was based on a database of size
three or something like this. If you increased the database to four, all the structures would break down. It
was the most weird thing. I would never conceive of it in my life. He would assume that the whole
database was being maintained by the text editor, and the text editor would generate an index, the way
the thing did. Anyway, it was completely hopeless. There was no way to fix the program. I thought I was
going to spend the weekend and give it to the secretary on Monday and she could work on it. There was
no way. I had to spend a month writing a program that summer -- I think it was probably ’75, ’76 -- to
cover up for my terrible error of giving this guy an A without seeing it. The report that he had written made it
look like his program was working. But it only worked on that one case. It was really pathetic. So I said,
“Okay, I’ll use structured programming. I’ll do it right. This is my chance to do structured programming.
I’ll get a learning experience out of it.” I got a good appreciation for administrative-type
programming. I used to think it was trivial, [but] there was a lot to it. After a month I had a structured
program that would do Stanford reports, and I could install that and get back to the rest of my life.
Meanwhile, I’d been up at the AI lab and I met the people up there. I got to know Leland Smith, who is a
great music professor. Leland Smith told me about a problem that he had. He was typesetting music.
He says, “I’ve got a piece of music and it maybe has 50 bars of music. I have to decide when to turn the
page. I know how many notes are in each bar of the music, and I know how much can fit on the page.
But I like to have the breaks come out right. Are there any algorithms that could work for this?” He
described the problem to me. He had the sequence of numbers, how many notes there are, and wanted to
find a way to break it into lines and pages in a decent way. I looked at the problem and said, “Hey
Leland, this is great. It’s a nice application of something we in computer science call the dynamic
programming algorithm (method). Look, here’s how dynamic programming can be used to solve this
problem.” Then I’m teaching Stanford’s problem seminar the next fall, and it came up in class. I would
show the students, “Look how we had this music problem, and we can solve it with dynamic
programming.” One of the students, I don’t remember who it was, raised his hand and said, “You know,
you could also use that for text, for printing books. You could say, instead of notes into bars, you could
also say you’ve got letters and words into lines, and make paragraphs choosing good line breaks that
way.” I said, “Hey, that’s cool. You’re right.” Then comes, in the mail, the proof sheets for the second
edition of volume two. I had changed a lot of pages in volume two of The Art of Computer Programming.
I got page proofs for the new edition. During the ’70s, printing technology changed drastically. Printing
was done with hot lead in the ’60s, but they switched over to using film in the ’70s. My whole book had
been completely retypeset with a different technology. The new fonts looked terrible! The subscripts
were in a different style from the large letters, for example, and the spacing was very bad. You can look
at books printed in the early ’70s and it turns out that if it wasn’t simple -- well, almost everything looked
atrocious in those days. I couldn’t stand to see my books so ugly. I spent all this time working on it, and
you can’t be proud of something that looks hopeless. I’m tearing out my hair. I went to Boston again and
they said, “Oh, well, we know these people in Poland. They can imitate the fonts that you had in the old
hot lead days. It’s probably not legal, but we can probably sneak it through without…” You know, the
copyright problems of the fonts. “They’ll try to do the best they can, and do better.” Then they come back
to me, at the beginning of ’77, with the new version done with these Polish fonts which are supposed to
solve the problem. They are just hopelessly bad. At the very same time, February of ’77, I’m on
Stanford’s comprehensive exam committee, and we’re deciding what the reading list is going to be for
next year’s comp. Pat Winston had just come out with a new book on artificial intelligence, and the proofs
of it were just being done at III Corporation [Information International, Incorporated] in Southern
California; at [Ed] Fredkin’s company. They had a new way of typesetting using lasers. All digital, all dots
of ink. Instead of photographic images and lenses, they were using algorithms, bits. I looked at these
galley proofs of Winston’s book. I knew it was just bits, but they looked gorgeous. They looked
absolutely as good as anything I’d ever seen printed by any method. By this time I was working at the AI
lab, where we had the Xerox Graphics Printer, which did bits at about 120 dots per inch. It looked
interesting, but it didn’t look beautiful by any stretch of the imagination. Here, with I think this was 1,000
dots per inch at III, you couldn’t tell the difference. It was like: I come from Wisconsin and in Wisconsin
we never eat margarine. Margarine was illegal to bring into the State of Wisconsin unless you didn’t color
it. I’m raised on butter. It’s the same thing here. With typography, I’m thinking: okay, digital typography
would have to be like margarine. It couldn’t be the real thing. But, no! Our eyes don’t see any difference
when you’ve got enough dots to the inch. A week later, I’m flying down with Les Earnest to Southern
California to III, and finding out what’s going on there. How can we get this machine and do it?
Meanwhile, I planned to have my sabbatical year in ’77-’78. I was going to spend my sabbatical year in
Chile.
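Knuth gives only the idea of applying dynamic programming to Leland Smith's page-turning problem (and the student's observation that the same method fits line breaking in text), not a precise formulation. The sketch below is one hypothetical version under stated assumptions: each item has a fixed width (notes per bar, or word lengths with spacing ignored), a line holds at most `capacity`, and the cost being minimized, the sum of squared leftover space on every line except the last, is a choice made for illustration. It is not TeX's actual line-breaking algorithm, and the function name is made up.

```python
def break_into_lines(widths: list[int], capacity: int) -> list[list[int]]:
    """Split `widths` into consecutive lines whose totals never exceed
    `capacity`, minimizing the sum of squared leftover space on every line
    except the last (a hypothetical cost function, chosen for illustration)."""
    if any(w > capacity for w in widths):
        raise ValueError("an item is wider than a whole line")
    n = len(widths)
    # best[i]: minimum cost of laying out items i..n-1
    # start_of_next[i]: where the following line begins in that optimum
    best = [0.0] * (n + 1)
    start_of_next = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        best[i] = float("inf")
        total = 0
        for j in range(i, n):  # try putting items i..j on one line
            total += widths[j]
            if total > capacity:
                break
            slack = capacity - total
            cost = 0 if j == n - 1 else slack * slack  # last line is free
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                start_of_next[i] = j + 1
    # Follow the recorded choices from the front to recover the lines.
    lines, i = [], 0
    while i < n:
        lines.append(widths[i:start_of_next[i]])
        i = start_of_next[i]
    return lines


print(break_into_lines([3, 2, 2, 5], 6))  # [[3], [2, 2], [5]]
```

On this input a greedy filler that packs each line as full as it can would produce [[3, 2], [2], [5]], with cost 1 + 16 = 17; the dynamic program looks at all break points together and finds [[3], [2, 2], [5]], with cost 9 + 4 = 13.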
Knuth: Yeah.
Feigenbaum: I don’t know if Fredkin was still involved with III at that time. But III never gets enough
credit for those really revolutionary ideas.
Feigenbaum: Not just those ideas, but the high speed graphics ideas.
Knuth: Oh yeah. That’s when I met Rich Sherpel [ph?] down there, and he was working on character
recognition problems. They had been doing it actually for a long time on microfilm, before doing
Winston’s book. This was the second generation. First they had been using the digital technology at
really high resolutions on microfilm. And so many other things [were] going on. Fredkin is a guy who--
Feigenbaum: Right at the beginning, Fredkin revolutionized film reading, using the PDP-1. Anyway, I
interrupted you. You were on your Chile [sabbatical].
Knuth: Ed’s life is ten times as interesting as mine. I’m sure that every time I hear more about Ed, it
adds just another… He’s an incredible person. We got to get 20 oral histories.
Feigenbaum: I think Ed may be a subject for one of these oral histories of the Computer History
Museum.
Knuth: Yeah, we’ve got to do it. Anyway, I cancelled my sabbatical plan for Chile. I wrote to them
saying I’m sorry; instead of working on volume four during my sabbatical, I’m going to work on
typography. I’ve got to solve this problem of getting typesetting right. It’s only zeros and ones. I can get
those dots on the page, and I’ve got to write this program. That’s when I became an engineer.
Feigenbaum: I’m going to let you go on with this, but I just wanted to ask a question in the middle here,
just related to myself, actually. How much of this motivation to do TeX related to your just wanting to get
back to being a programmer? Life was going on in too abstract a way, and you wanted to get back to
being a programmer and learning what the problems were, or the joy of programming.
Knuth: It’s a very interesting hypothesis, because really you can see that I had this. The way I
approached the CS reports problem the year before was an indication of this; that I did want to sink my
teeth into something other than a toy problem. It wasn’t real large, but it wasn’t real small either. It’s true
that I probably had this craving. But I had a stronger craving to finish volume four. I did sincerely believe
that it was only going to take me a year to do it.
Knuth: No, no, absolutely. You’re absolutely right. In 1975 and ’76, you can check it out. Look at the
Journal of the ACM. Look at the SIAM Journal on Computing. Look at, well, there’s also SIAM Review
and there’s math journals, combinatorial journals, Communications of the ACM, for that matter. You’ll
find more than half of those articles are things that belong in volume four. People were discovering things
right and left that I knew deserved to be done right in volume four. Volume four is about combinatorial
algorithms. Combinatorial algorithms was such a small topic in 1962 when I made that chapter seven of
my outline that [Ole-]Johan Dahl asked me, when I was in Norway, “How did you ever think of putting in a
chapter about combinatorial algorithms in 1962?” I said, “Well, the only reason was, that was the part I
thought was most fun.” I really enjoy writing, like this program for Bose that I did overnight. It was a
combinatorial program. So I had to have this chapter just for fun. But there was almost nothing known
about it at the time. People will talk about combinatorial algorithms nowadays [and] they usually use
“combinatorial” in a negative way. In a pejorative sense, instead of the way I look at it. They say, “Oh,
the combinatorial is going to kill you.” “Combinatorial” means “It’s exploding. You can’t handle it, it’s a
huge problem.” The way I look at it is, combinatorial means this is where you’ve got to use some art.
You’ve got to be really skillful, because one good idea can save you six orders of magnitude and make
your program run a million times faster. People are coming up with these ideas all the time. For me, the
combinatorial explosion was the explosion of research. Not the problems exploding, but the ideas were
exploding. So there’s that much more to cover. It’s true that in the back of my mind I’m also scared stiff
that I can’t write volume four anymore. So maybe I’m waiting for it to simmer down. Somebody did say to
me once, after I solved the problem of typesetting, maybe I would start to look at binding or something,
because I had to have some other reason [to delay]. I’ve certainly seen enough graduate student
procrastinators in my life. Maybe I was in denial.
Knuth: As far as I knew, though, it was going to take me a year. I was going to work and I was going to
enjoy having a year of writing this kind of a program. The program was going to be just for me and my
secretary, Phyllis; my super-secretary, Phyllis. I was going to teach her how to do it. She loved to do
technical typing. I could write my books and she could make them: dotting the i’s and crossing the t’s, the spit
and polish that she always gave my math papers when she typed them.
Knuth: Okay, those are great questions. It’s funny you’d ask that, because the number one thing
on my mind as I was walking into the building this morning was thinking about John Backus’s death. It
was a shock to me to learn about it yesterday, but then I was just thinking, “Oh yeah, he always wore
clothes like this.” [Motions to himself] Whenever I saw him he was wearing a denim jacket. I can say just
a few orthogonal things about the whole situation that strike me first. In the first place, when I was a
student we had no information at all about Fortran. I didn’t hear about it until after I had been using IT,
and I think after RUNCIBLE. It was, like, 1959 when people were coming out with something called
FORTRANSIT, which was a translator from Fortran to IT, so that people on the 650 could use the
Fortran language. You see, the IBM 650 was the world’s first mass-produced computer, the first time
there were more than 100 of any one kind of computer. Fortran was developed for the 704, of which there
were several dozen, but those were in the aircraft industry and so on. It was people who could afford a
much bigger kind of machine than the 650. It was a different world. You were lucky as a summer
student. You could see the other world, but I was more in the boondocks. I told you last time that priority
was so far from my mind when I wrote this article about RUNCIBLE for the Communications [of the ACM]
that I failed to mention any of the people who were working with me. We had this team at Case, but we
didn’t name any names in our story as to who came up with the improvements that we made, because it
wasn’t something that we knew anything about. But that might have been my naiveté as a college
undergraduate. The first time I learned about one-upmanship or something -- academic priority -- was, in
fact, from my teacher Bose, the man who worked on Latin squares. The reason he wanted me to get this
program working overnight is because he was in intense competition with another group in Canada that
was also trying to find Latin squares of order 12. It turned out it was approximately a dead heat between
the two groups. That amazed me, that there could be so much competition for being first at the time. It
wasn’t part of the culture that I grew up in. All I can report is that I was amazed later on to find out also,
when people are talking about the discovery of DNA and all this, how much passion went into these
things, because it just wasn’t something that I personally experienced. But it just might be, again, my
naiveté. I presented a paper at the ACM Conference in 1962 in Syracuse, which was the summer that I
wrote my Fortran compiler for the Univac solid-state machine. At the end of that summer I gave this
paper at the ACM called, “A History of Writing Compilers.” I guess I didn’t call it “The History of Writing
Compilers.” Basically I was trying to explain in my talk what I knew about the various developments that
had come up in the technology for writing compilers. Of course I mentioned Fortran, and the ways in
which they had dealt with the question, for example, of operator precedence. That was, if you write
A×B+C without parentheses, Fortran would recognize that as first multiply A by B, and then add C. I’ll go
back and give you a better example: A×B+C×D, written without any parentheses. Now what
would happen in Fortran is that Fortran would know that what mathematicians usually mean by that is that you
take A and multiply it by B, and you take C and multiply it by D, and then you add the two things together.
But IT wouldn’t do it that way. IT would require you to put in parentheses if you wanted that, and, if my
memory is correct, otherwise it would associate to the right. So it would take A times the quantity B plus
the quantity C times D [that is, A×(B+(C×D))]. So Fortran had to invent a way to do this. The way they did it was rather clever.
They replaced the times sign by right parenthesis times left parenthesis [“)×(”], and they replaced the plus sign
by two right parentheses plus two left parentheses [“))+((”]. Then they put a whole bunch of parentheses around
the whole thing. The result is that you had an expression that was fully parenthesized, but since you had
guarded the plus sign with two parentheses and the times with only one, the times was done first. It was
a clever idea. It’s just one of the things I mention in my paper in this Syracuse thing. Well, a reviewer of
my paper afterwards said, “He didn’t talk about the history of writing compilers. He just talked about the
history of him writing one particular compiler.” Well, if you look at my paper you’ll see it’s not a fair
criticism. The reviewer was undoubtedly teed off that I had not mentioned his compiler. I gave a history
of many ideas that were used in building compilers, but I didn’t give a history of what people had done in
compilers. As years went on, I got more interested in history. In 1962 I was 24 years old. It’s like Mark
Twain or somebody said, that “When you’re a teenager you think your parents are the stupidest people in
the world. Five years later you wonder how they could learn so much in five years.” You get more
interested in history and the overall thing. Well anyway, this criticism, that I hadn’t given a very
comprehensive history of compilers, weighed on my mind. So the next few years after ’62, I actually
started looking into the history of compilers, trying to get a real understanding as to who did what when,
and first, and so on. Where did the ideas come from before the little excrescences of the story that I
knew. In fact, the main lecture I was giving on my ACM lecture tour in 1967 was the real early history of
writing compilers. By that time I had gone through and I had studied Grace Hopper’s work, and I had
studied Backus’s work, and the Fortran 0, and the many developments in England and Russia and so on,
that had taken place in the earliest days. So the talk that I was giving when I’m making this nationwide
lecture tour is mostly this talk of redeeming myself for the very unbalanced view of the history of
compiler development that I had given in 1962. Later on I worked with my student Luis Trabb Pardo in
order to really do it right, because the Harvard University Press had asked me to edit a sourcebook on
computer science which was intended to print the documents from the early days that had come out
before their time. I had collected a lot of these early things. They’ve done this with many other fields:
sourcebook on logic, sourcebook on mathematics, sourcebook on chemistry, and that kind of thing.
Harvard had a big series. I was asked to do a sourcebook on computer science. In the course of this I
worked with Luis to get a really thorough history of programming languages, their early development. We
presented this as a paper at a big conference in Los Alamos in 1976. 1976 was the year everybody had
history on their minds, because it was the bicentennial of America. We had a big conference where almost all
the computer pioneers who were still living were assembled. People like [Konrad] Zuse came from Europe, and I met him there,
along with the people who had worked on the Colossus computers. All these pioneers were there.
The paper that I presented at that time was “The Early History of Computer Languages.” This was one of
the most difficult papers to write, in the sense of total amount of work expended, because what I
presented in this talk was 20 predecessors to Fortran. Not only was Fortran not number 1, but it was
number 21, basically. Although one of the 20 preceding Fortran was the preliminary specs of Fortran,
which wasn’t implemented, but people were using it in mockups and trial runs. Going to Zuse’s work, for
example, Zuse had a high-level language, his Plankalkül. Many, many other pioneers [attended]. I
brought that picture all together. I’m quite proud of the paper now, because of all the work I put into it. As
I was writing it I found out actually I only had 19 predecessors of Fortran. Just a week before the
conference I learned about another one at Livermore that had been developed. I went out to Livermore,
and right in my own backyard was one of the first. So there was a great amount of activity going on. IT
was part of this, for sure. But most of the people didn’t know [of] the existence of the others. Fortran
itself was strongly influenced by a compiler for the Whirlwind computer that John Backus learned about
when he went to a conference at MIT in 1954. Then John got his team together and did that. I had a
great admiration for John. I remember that the first time I came to Stanford, which was about 1964, was
when I first met him. We had corresponded. He and Barbara invited me to their house, and he also
introduced me to topless bars at the time. It was interesting as a nice phenomenon in San Francisco, you
know. We had a pleasant evening together. That was on the same trip that I visited Stanford at
Forsythe’s invitation. John and I always hit it off well, and I admired his breadth of interest in all these
things. But your question was mostly about priority. I think it cuts two ways. In the first place, I don’t like
to think of it as saying somebody did it before somebody else. That’s the popular interpretation of the
priority. But the opposite is where you just have an idea and you have no idea where it came from.
That’s very bad, I think, just to assume that ideas have no connection to each other or they didn’t spring
from somewhere. Because how are we going to get another idea tomorrow if we don’t have a lot of case
studies as to how ideas can germinate? I go out of my way in my books The Art of Computer
Programming to try to track down the sources of the concepts that we have in computer science.
Sometimes I tell people I only do this in order to make computer science respectable, to show that it’s not
a fly-by-night thing, but it’s deeply rooted in ancient history and so on. Well, of course, it’s nice to have
computer science a little bit respectable. We are the new kid on the block. But that’s not really the point.
The point is that really there were people who would’ve been computer scientists, if computers had been
around, that were living a hundred years ago. They just happened to have been born at the wrong time,
but they had the same kind of strange way of looking at things that I do. I can see that in their writings. I
was reading last year a manuscript from 14th century India, and I felt the guy was talking to me. I doubt if
any of his contemporaries really knew what he was, but here it was. I said to my wife, “This guy is a
computer scientist. I know exactly what’s going on because I went through the same kind of a thought
process when I was looking at a similar problem when I was younger.” So the idea of priority is more,
instead, really learning the human element of it. How somebody was able to combine ideas and then
make a non-obvious leap that would then influence somebody else. For this reason I love to read source
documents instead of [reading] somebody boiling down a source document. I boil it down myself in my
books. I try to recommend that. I try to give places so that people can check out the originals when they
can. We’d have much less of this cutthroat idea of competition in the field than I read about when I study
the novel by my friend who invented the birth control pill.
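The precedence trick Knuth credits to the first Fortran compiler can be tried out in a few lines. The sketch below is a simplified, hypothetical reconstruction, not the compiler's code: it handles only `+`, `*`, single-character or digit operands, and parentheses, and the tripling of parentheses already present in the input follows the usual textbook account of the technique rather than anything stated in this interview. The function names are made up. To show why the rewrite matters, `eval_right_to_left` models a parser with no notion of precedence that groups to the right, as Knuth recalls IT doing.

```python
import re


def fortran_parenthesize(expr: str) -> str:
    """Rewrite an expression so that * binds tighter than + even for a parser
    that knows nothing about precedence (simplified sketch of the trick)."""
    pieces = ["(("]                    # parentheses around the whole thing
    for ch in expr.replace(" ", ""):
        if ch == "+":
            pieces.append("))+((")     # plus guarded by two parentheses
        elif ch == "*":
            pieces.append(")*(")       # times guarded by only one
        elif ch == "(":
            pieces.append("(((")       # assumption: user parentheses stay outermost
        elif ch == ")":
            pieces.append(")))")
        else:
            pieces.append(ch)          # operand characters pass through
    pieces.append("))")
    return "".join(pieces)


def eval_right_to_left(expr: str) -> int:
    """Evaluate integer + and * with NO precedence, grouping to the right
    (2*3+4*5 is read as 2*(3+(4*5))), except where parentheses say otherwise."""
    tokens = re.findall(r"\d+|[()+*]", expr)
    pos = 0

    def operand() -> int:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = chain()
            pos += 1                   # skip the matching ")"
            return value
        return int(tok)

    def chain() -> int:
        nonlocal pos
        left = operand()
        if pos < len(tokens) and tokens[pos] != ")":
            op = tokens[pos]
            pos += 1
            right = chain()            # everything to the right groups first
            return left + right if op == "+" else left * right
        return left

    return chain()


print(fortran_parenthesize("A*B+C*D"))                      # ((A)*(B))+((C)*(D))
print(eval_right_to_left("2*3+4*5"))                        # 46, grouped as 2*(3+(4*5))
print(eval_right_to_left(fortran_parenthesize("2*3+4*5")))  # 26, i.e. (2*3)+(4*5)
```

Because `+` and `*` are both associative, the rewritten expression gives the right answer whichever way the parser groups operators of equal rank; extending the sketch to `-` and `/` would also require a parser that groups left to right.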
Knuth: Yeah, Carl Djerassi’s novel, “The Bourbaki Gambit”. Or something like this, right? It’s all about a
world of science that I don’t feel computer science inhabits. It’s a different…
Knuth: Yeah.
Feigenbaum: I was going to bring a quote from a chemist to this interview but I didn’t have it exactly right
so I didn’t do it. But in chemistry, the knife is sharp.
Knuth: Yeah. I worked on open source publishing a few years ago, and I was surprised to find out
that… I was looking at some of the general policy, and in other fields than computer science when you
submit your paper, you can list people that you don’t want to be referees of your paper. It blew my mind.
I said, “Why do you do this?” and he said, “Well, because they think the other guys are going to steal their
ideas.” <laughs> We share ideas. The whole Silicon Valley culture, the venture capitalists get together
for lunch every Tuesday and say, “These are the startups I’m thinking of starting,” and somebody will say,
“Well why don’t you change it a little bit?” They share ideas openly because they know that there’s a half-
life of these ideas, and in six months they’ll get better. The companies are even better because of it. The
biology community would never think of such a thing, of sharing their plans for new development. It is
quite a different culture. I think you’re right.
Feigenbaum: Don, just a small follow-up question for this. As you were speaking, it was bubbling in my
mind that in various sciences, including ours, the big prizes are sometimes given for what are considered
breakthrough ideas. So a young person can win a big prize. Sometimes it’s given for career
contributions. The Nobel Prizes are like this too. Sometimes a brilliant thing flashes up on the screen,
like the CT scan. The Nobel Prize was given to an EE guy for the CT scan, in medicine. But often the
prize is given to someone for a career’s worth of work. Do you think that we have breakthrough ideas in
computer science of that sort?
Knuth: Yeah, but I minimize their importance, in a sense. We do have landmark ideas that sort of all of
a sudden… Something like in the theoretical field, the idea of NP-completeness. All of a sudden we had
thousands of people inspired by this idea. But how many of them are there? It was interesting. I wrote a
letter to Allen Newell when I was starting to write “The Art of Computer Programming.” I think I wrote it to
him in 1963 or something like this. I said, “Allen, I’m struck by the fact that all good ideas in computer
science were invented before 1960 and we’ve just been rediscovering the wheel since then.” I’m not
sure, but sort of that was the thrust of my letter. Allen replied to me, “Oh no, Don, you’re suffering from
the bow wave phenomenon,” Or something like this. Then I had another conversation with Juris
Hartmanis, who was the head of the department at Cornell. Juris was a wonderful leading person in
Automata Theory, and he happened to have been a student of Marshall Hall as well. He was recruiting
me to come to Cornell at the same time I was considering Berkeley, and Stanford, and other things. He
visited me, and I visited Cornell with the serious idea of going there, and didn’t put it on my final list
because people don’t drop into Cornell the way they drop into Stanford. I would have to go to them, and I
don’t like traveling that much. But we had this conversation, and one of the questions that struck me, he
said, “Don, what was the most important new idea in computer science during the past year?” I couldn’t
think of a single thing. Say 1965, or something like that. I couldn’t think of any breakthrough. For the
next ten or so years, I asked myself the same thing at the end of every year. What was the breakthrough
that occurred this year? I couldn’t come up with anything. Almost never could come up with anything.
On the other hand, in ten years the whole field had changed. I realized that what it really is, it’s like a
great wall, where everybody’s contributing bricks to the wall, and each brick… In other words, it’s the
community enterprise that really has made it such a thriving field. I like to give credit to everybody who
puts in one of these bricks. Of course, we’ve got to have the major prizes in order to get into the
newspaper and things like this. But so many things go into this. The big breakthrough is not the real
story, although they’re wonderful when they occur. That’s my take on that.
Feigenbaum: Thanks, Don. We could sit here and discuss that endlessly. I’m going to resist doing that
because I’d like to get on to the moment when you are sitting in that little office in [Stanford’s Margaret]
Jacks Hall, and you decide that you and Phyllis need a better language in which to basically, essentially,
typeset your books. Listening to the early part of your conversation about this last week, it occurred to
me that one level below the surface, there’s something else about books. My wife just stopped being,
she was a trustee, a member of the Board of the San Francisco Center for the Book. She’s a book artist
and she loves books.
Feigenbaum: I get this feeling that there’s something about books inside you, inside your head, that you
absolutely love. Can you just tell us about your love affair with books?
Knuth: That goes very deep. My parents disobeyed the conventional wisdom by teaching me to read
before I went into kindergarten. All of their friends said, “No, he’s going to be bored in school,” but I was
the youngest member of the “bookworm club” in Milwaukee Public Library. I think I was two-and-a-half
years old, or something like this. The Milwaukee Journal ran a little blurb about it with my picture in it
because I was a member of the bookworm club at the library. I loved books from a child. In those days
there weren’t big drug problems and so on, and little kids could ride the streetcars downtown. I went
down to the library one day, and the lights went out in the library. I went over to the window so I could
see better the book I was reading. It didn’t occur to me the library was closing. My parents called a
couple hours later and said, “Where is he? Where’s our son?”, and the librarian found me in the book. I
have kind of a strange love affair with books going way back. In my undergraduate years, I think I
mentioned last time that a lot of my favorite textbooks were published by Addison-Wesley: the calculus
book that I had, the physics book that I had, the book on number theory that I had seen. Addison-
Wesley, for technical books, had a special thing. The president of the company had actually done
something that other publishers… I know you were an editor for McGraw-Hill. McGraw-Hill would farm
out their typesetting, but Addison-Wesley had its own house composition plant. Hans Wolf had his team
of people making the type, right next door to where the editorial offices were. It was the philosophy of the
company really to get a special house style, and really good designers, and they made their mark on it.
They also published the first book on computer science: Wilkes, Wheeler, and Gill in 1951, or something
like this. That was one of the very first books Addison-Wesley put out, at the time when it was a
struggling new company. I also had this thing about the appearance of books. I wanted my books to be
something whose appearance other readers would treasure, not just that there were some words in
there.
Feigenbaum: Let’s go back to the time when you were planning TeX, and Phyllis was in the outside
office, and the two of you needed a language.
Knuth: Right. Phyllis had been typing all of my technical papers. I have never seen her equal anywhere,
and I’ve met a lot of really good technical typists. She really loved it too, and so we had a fairly good
thing. She could read my handwriting. I always composed my manuscripts by hand. People ask me
about this. I might as well digress yet again. I also love keyboard things. I’ve been playing the piano for
ages. When I was in high school I learned how to run a stenograph machine, like court reporters use. I
went to Spencerian College for a summer class. I had the idea I’d get to college and I’m going to take
notes with a stenograph machine. I tried it for two weeks at Case before giving it up. We had been
taught shortcuts for how to say, “Dear Sir,” and “Yours very truly,” but we didn’t have any abbreviations
for chemistry and all these other things.
Knuth: But anyway, I’m fascinated by keyboards. I also took typing and I was a very good typist. I could
do 70 words a minute or something like this. I got myself a Russian typewriter with a Cyrillic keyboard so
that I could do my Russian homework in undergrad as well. I love keyboards. But I always compose my
manuscripts handwritten. The reason is that I type faster than I think. There’s a synchronization
problem. I can think of ideas at about the rate I can write them down with a pencil. But with typing I’m
going faster, so I have to sync, and my thoughts have to start up and stop again in a way that involves
more of my brain. As a college student I found I could write a letter home much faster by hand, much
faster than I could type it even though I’m a great typist. The synchronization was slowing down the total
thing. Phyllis and I had this nice, symbiotic relationship. She could read my handwriting, she knew when
to display a formula, make it look beautiful. You almost would think she knew more mathematics than I
did, sometimes, the way she would correct a formula of mine that didn’t look right to her. She would
change it and also get it right. When I’m learning that typesetting is a problem of zeros and ones -- just a
matter of programming to get the ink where it’s supposed to go -- my thought was definitely that this
would be something that I would make so that Phyllis would be able to take my handwritten manuscripts
and go from there. I used her as the model for the language that I was developing, and I also would be
able to understand it myself.
Knuth: No.
Knuth: That’s right. I knew that Bell Labs had a system where they had been using secretaries to
typeset. Bell Labs had the EQN system. I learned later that other people had developed systems where
they hire and train secretaries. I used the Bell Lab system, which I knew was a working system, where
somebody who wants the Greek letter Alpha types “A-l-p-h-a.” The guys in these commercial systems said, about the
letter Alpha, “Oh no, these secretaries can never learn that. They’re scared by any hint that it’s
Greek. They just know that it’s this symbol, and so they type QA for the letter Alpha, and that gives them
job satisfaction because they know this code that the mathematicians don’t know.” My philosophy was,
though, that I knew that Phyllis would like to write “A-l-p-h-a”. What went into the design was her as a
model, and the fact that I knew the secretaries at Bell Labs had a language that was in existence,
that it was something that secretaries could learn. At that time when I started TeX, some physics journals
were already being typeset with the EQN system from Bell Labs. It looked horrible -- the spacing was just
ugly -- but it was the first generation of this. But I knew that they had a language that the secretaries
could learn. All I had to do was tune up the aesthetics of the final product.
Feigenbaum: Don, I would like to ask you about the activities going on. You mentioned that TeX took
much longer than you had anticipated. You had anticipated a one-year project. You ended up with a ten-
year project. It kind of carves out a section of your life in which you were being an interface designer.
You were being a programmer. I use the term “programmer” because you yourself use it in bio material
on the web, that when you were doing TeX you were a programmer. Then there’s all the other things
going on, both with TeX, with fonts, with the rest of your life. Can you tell us about those three things?
Knuth: Okay.
Knuth: A life story. Okay, there are stories. The first part of it, I’m designing a language for my
secretary. This took place in sort of two all-nighters. I made a draft. I sat up at the AI lab one evening
and into the early morning hours, composing what I thought would be the specifications of a language. I
had already been playing around. I looked at my book and I found excerpts from several dozen pages
that I thought showed all the variety of things I needed in the book. Then I sat down and I thought, well, if I
were Phyllis, how would I like to key this in? What would be a reasonable format that would appeal to
Phyllis, and at the same time something that as a compiler writer I felt that I could translate into the book,
because TeX is another kind of compiler. Instead of going into machine language, you’re
going into words on a page. That’s a different output language, but it’s analogous in recognizing the
constructs that appear in the source file. So I went through and that day I drafted how I would typeset
those 12 sample segments in a language that I thought Phyllis would understand. I also wrote a
mini user’s manual for teaching this language. I wrote the draft of this one night, and I showed it to a
bunch of people for their comments. Then a few weeks later I went through the same thing again.
Fortunately, the Stanford AI lab, where I did this work, had a very good backup system. All of the files
that were on that computer for more than 20 years, stored on archival tapes, are now available
through the internet. Thanks to looking at these old so-called DART tapes, I found the drafts that I
made of TeX on those days when I did the design. Since I believe in source documents, as I said, I
published those in my book, “Digital Typography”, so that people could see what the raw thoughts were,
and all the mistakes, the words that were there at the very beginning. Just to give an idea of a design
process. Then I showed the second version of this design to two of my graduate students, and I said,
“Okay, implement this, please, this summer. That’s your summer job.” I thought I had specified a
language. I had to go away. I spent several weeks in China during the summer of 1977, and I had
various other obligations. I assumed that when I got back from my summer trips, I would be able to play
around with TeX and refine it a little bit. To my amazement, the students, who were outstanding
students, had not completed [it]. They had a system that was able to do about three lines of TeX. I
thought, “My goodness, what’s going on? I thought these were good students.” Well afterwards I
changed my attitude to saying, “Boy, they accomplished a miracle.” Because going from my
specification, which I thought was complete, they really had an impossible task, and they had succeeded
wonderfully with it. These students, by the way, [were] Michael Plass, who has gone on to be the brains
behind almost all of Xerox’s Docutech software and all kinds of things that are inside of typesetting devices
now, and Frank Liang, one of the key people for Microsoft Word. He did important mathematical things
as well as his hyphenation methods, which are now widely used in all languages. These guys were actually
doing great work, but I was amazed that they couldn’t do what I thought was just sort of a routine task.
Then I became a programmer in earnest, where I had to do it. The reason is when you’re doing
programming, you have to explain something to a computer, which is dumb. When you’re writing a
document for a human being to understand, the human being will look at it and nod his head and say,
“Yeah, this makes sense.” But then there’s all kinds of ambiguities and vagueness that you don’t realize
until you try to put it into a computer. Then all of a sudden, almost every five minutes as you’re writing the
code, a question comes up that wasn’t addressed in the specification. “What if this combination occurs?”
It just didn’t occur to the person writing the design specification. When you’re faced with implementation,
a person who has been delegated this job of working from a design would have to say, “Well hmm, I don’t
know what the designer meant by this.” If I hadn’t been in China they would’ve scheduled an
appointment with me and stopped their programming for a day. Then they would come in at the
designated hour and we would talk. They would take 15 minutes to present to me what the problem was,
and then I would think about it for a while, and then I’d say, “Oh yeah, do this.” Then they would go
home and they would write code for another five minutes and they’d have to schedule another
appointment. I’m probably exaggerating, but this is why I think Bob Floyd’s Chiron compiler never got
going. Bob worked many years on a beautiful idea for a programming language, where he designed a
language called Chiron, but he never touched the programming himself. I think this was actually the
reason that he had trouble with that project, because it’s so hard to do the design unless you’re faced
with the low-level aspects of it, explaining it to a machine instead of to another person. I think it was
Forsythe who said, “People have said traditionally that you don’t understand something
until you’ve taught it in a class. The truth is you don’t really understand something until you’ve taught it to
a computer, until you’ve been able to program it.” At this level, programming was absolutely important.
Feigenbaum: Could I stop you just a second? That’s exactly the same methodology that I learned from
Herb Simon and Al Newell at Carnegie, which is, it’s useless to spit out theories of human thinking unless
you can program them. You get every detail. You have to make a decision about every detail.
Knuth: Yeah, and they’re trying to come up with models of the brain and chess players and things like
this. It becomes very clear at this point.
Knuth: But also in every field. Composing music. I took a class in music theory during my sabbatical
year, my year in Princeton before coming to Stanford. The idea of music theory, you’re supposed to
decide whether or not certain combinations of notes are going to sound good or not. But if they had
presented it as a programming thing -- write a program that decides whether or not these notes are going
to sound good or not -- that would’ve focused the issue, the attention, so much more sharply. It’s a
dream that if I finish my “Art of Computer Programming,” one of the things I want to do before I die is to
spend time programming for musical composition, and see if I can come up with some good music that is
developed with computer aid. I feel that in order to really understand music, it’s going to help me to be
able to program that. Who knows?
Feigenbaum: Don, let me go back to the programming stage. I would wander into your office in Jacks
occasionally, and occasionally you would jump up and down and show me something. I remember one
day you were showing me something that had to do with paragraph formatting, where you had uncovered
a link between that and, I think, dynamic programming or some other kind of mathematical programming.
That was a very interesting story, which is told other places, but maybe you want to use that as an
example to illustrate the link between one part of your life and another.
Knuth: I’m not sure if I mentioned that. I was telling somebody about that in the last two weeks. I don’t
know if I mentioned it last week.
Knuth: Then in my class they said they could do this with the dynamic programming algorithm that I
used for music. It turned out to also work for English texts, and that was a revelation for my student. But
then when I got to actually programming it, I had to also organize it so that I could handle lots of text. I
had to develop a new data structure in order to take in the text of a paragraph and process it in
an efficient way. I had to introduce some ideas that are called “glue” and “penalties”, and figure out how
that glue should disappear at boundaries in certain cases and not in others. All these things would never
have occurred to me unless I was writing the program. Edsger Dijkstra gave this wonderful Turing lecture
early in the 70s called “The Humble Programmer.” One of the points he made early on in his talk was
that when they asked him in Holland what his job title was, he said, “Programmer,” and they said, “No,
that’s not a job title. You can’t do that; programmers are just coders.” Coders were seen as people who get
assignments, like the scribes you needed in the Middle Ages when you wanted somebody to write a document.
Dijkstra said he was proud to be a programmer. Unfortunately he changed his attitude completely, and I
think he wrote his last computer program in the 1980s. At this conference I went to in 1967 about
simulation languages, Chris Strachey was going around asking everybody at the conference what was the
last computer program they had written. This was 1967. Some of the people said, “I’ve never written a
computer program.” Others would say, “Oh yeah, here’s what I did last week.” I asked Edsger this
question when I visited him in Texas in the 90s and he said, “Don, I write programs now with pencil and
paper, and I execute them in my head.” He finds that a good enough discipline. I think he was mistaken
on that. He taught me a lot of things, but I really think that if he had continued... One of Dijkstra’s
greatest strengths was that he felt a strong sense of aesthetics, and he didn’t want to compromise his
notions of beauty, and they were intense. He visited me in the 1960s, when I had just come to
Stanford. I remember the conversation we had. It was in the first apartment, our little rented house,
before we had electricity in the house. We were sitting there in the dark, and he was telling me how he
had just learned about the specifications of the IBM System/360, and it made him so ill that his heart was
actually starting to flutter. He intensely disliked things that he didn’t consider clean to work with. So I can
see that he would have distaste for the languages that he had to work with on real computers. My
reaction to that was to design my own language, and then to make Pascal work well for me
in those days. But his response was to do everything only intellectually. So, programming. I happened
to look the other day. I wrote 35 programs in January, and 28 or 29 programs in February. These are
small programs, but I have a compulsion. I love to write programs and put things into them. I think of a
question that I want to answer, or I have part of my book where I want to present something. But I can’t
just present it by reading about it in a book. As I code it, it all becomes clear in my head. It’s just the
discipline. The fact that I have to translate my knowledge of this method into something that the machine
is going to understand just forces me to make that crystal-clear in my head. Then I can explain it to
somebody else infinitely better. The exposition is always better if I’ve implemented it, even though it’s
going to take me more time.
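Knuth’s remark earlier in this answer, that paragraph formatting could be handled by a dynamic programming algorithm, can be illustrated with a much-simplified model. This sketch is not TeX’s line-breaking algorithm, which works with boxes, glue, penalties, and demerits; it assumes fixed-width words, single-character spaces, no word wider than the line, and a cost equal to the square of each line’s leftover space, with the last line free. The function name `break_lines` is hypothetical.

```python
def break_lines(words: list[str], width: int) -> list[str]:
    """Choose line breaks that minimize the total squared slack of a paragraph.

    Simplified model (an assumption, not TeX's algorithm): each word is a
    fixed-width box, each interword space is exactly one character, no word
    is wider than `width`, and the last line may be short at no cost.
    """
    n = len(words)
    INF = float("inf")

    def cost(i: int, j: int) -> float:
        # Cost of setting words[i:j] together on one line.
        length = sum(len(w) for w in words[i:j]) + (j - i - 1)
        if length > width:
            return INF
        if j == n:
            return 0  # the paragraph's last line is not penalized
        return (width - length) ** 2

    # best[i]: minimal total cost of setting words[i:];
    # first_break[i]: index just past the last word of the first line in that setting.
    best = [0.0] * (n + 1)
    first_break = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        best[i] = INF
        for j in range(i + 1, n + 1):
            c = cost(i, j)
            if c == INF:
                break  # adding more words only makes the line longer
            if c + best[j] < best[i]:
                best[i] = c + best[j]
                first_break[i] = j

    # Walk the recorded breakpoints to assemble the lines.
    lines, i = [], 0
    while i < n:
        j = first_break[i]
        lines.append(" ".join(words[i:j]))
        i = j
    return lines


print(break_lines("aaa bb cc ddddd".split(), 6))  # ['aaa', 'bb cc', 'ddddd']
```

With width 6, the sketch chooses `aaa / bb cc / ddddd` at a total cost of 9 + 1 = 10, whereas filling each line greedily gives `aaa bb / cc / ddddd` at a cost of 16. Weighing the whole paragraph at once, rather than one line at a time, is what the dynamic-programming formulation provides.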
Feigenbaum: It’s not just the exposition. It’s the understanding. That’s why I don’t do theoretical AI. I
just can’t understand the thing from a theoretical point of view until I experiment with it.
Knuth: Yeah. That’s absolutely true. I’ve got to get another thought out of my mind though. That is,
early on in the TeX project I also had to do programming of a completely different type. I told you last
week that this was my first real exercise in structured programming, which was one of Dijkstra’s huge...
That’s one of the few breakthroughs in the history of computer science, in a way. He was actually
responsible for maybe two of the ten that I know. So I’m doing structured programming as I’m writing
TeX. I’m trying to do it right, the way I should’ve been writing programs in the 60s. Then I also got this
typesetting machine, which had, inside of it, a tiny 8080 chip or something. I’m not sure exactly. It was a
Zilog, or some very early Intel chip. Way before the 386s. A little computer with 8-bit registers and a
small number of things it could do. I had to write my own assembly language for this, because the
existing software for writing programs for this little micro thing was so bad. I actually had to write
thousands of lines of code for this, in order to control the typesetting. Inside the machine I had to control
a stepper motor, and I had to accelerate it. Every so often I had to give another [command] saying,
“Okay, now take a step,” and then continue downloading a font from the mainframe. I had six levels of
interrupts in this program. I remember talking to you at this time, saying, “Ed, I’m programming in
assembly language for an 8-bit computer,” and you said, “Yeah, you’ve been doing the same thing and it’s
fun again.” You know, you’ll remember. We’ll undoubtedly talk more about that when I have my turn
interviewing you in a week or so. This is another aspect of programming: that you also feel that you’re in
control and that there’s not a black box separating you. It’s not only the power, but it’s the knowledge of
what’s going on; that nobody’s hiding something. It’s also this aspect of jumping levels of abstraction. In
my opinion, the thing that computer scientists are best at is seeing things at many levels of detail: high
level, intermediate levels, and lowest levels. I know if I’m adding 1 to a certain number, that this is getting
me towards some big goal at the top. People enjoy most the things that they’re good at. Here’s a case
where you’re working on a machine that has only this 8-bit capability, but in order to do this you have to
go through levels: not only that machine, but also the next level up, the assembler, and then you
have a simulator in which you can help debug your programs, and you have higher level languages that
go through, and then you have the typesetting at the top. There are these six or seven levels all present
at the same time. A computer scientist is in heaven in a situation like this.
Feigenbaum: Don, to get back, I want to ask you about that as part of the next question. You went back
into programming in a really serious way. It took you, as I said before, ten years, not one year, and you
didn’t quit. As soon as you mastered one part of it, you went into Metafont, which is another big deal. To
what extent were you doing that because you needed to, what I might call expose yourself to, or upgrade
your skills in, the art that had emerged over the decade-and-a-half since you had done RUNCIBLE? And
to what extent did you do it just because you were driven to be a programmer? You loved programming.
Knuth: Yeah. I think your hypothesis is good. It didn’t occur to me at the time that I just had to program
in order to be a happy man. Certainly I didn’t find my other roles distasteful, except for fundraising. I
enjoyed every aspect of being a professor except dealing with proposals, which I did my share of, but that
was a necessary evil sort of in my own thinking, I guess. But the fact that now I’m still compelled to… I
wake up in the morning with an idea, and it makes my day to think of adding a couple of lines to my
program. Gives me a real high. It must be the way poets feel, or musicians and so on, and other
people, painters, whatever. Programming does that for me. It’s certainly true. But the fact that I had to
put so much time in it was not totally that, I’m sure, because it became a responsibility. It wasn’t just for
Phyllis and me, as it turned out. I started working on it at the AI lab, and people were looking at the
output coming out of the machine and they would say, “Hey, Don, how did you do that?” Guy Steele was
visiting from MIT that summer and he said, “Don, I want to port this to take it to MIT.” I didn’t have two
users. First I had 10, and then I had 100, and then I had 1000. Every time it went to another order of
magnitude I had to change the system, because it would almost match their needs but then they would
have very good suggestions as to something it wasn’t covering. Then when it went to 10,000 and when
it went to 100,000, the last stage was 10 years later when I made it friendly for the other alphabets of the
world, where people have accented letters and Russian letters. I had started out with only 7-bit codes. I
had so many international users by that time, I saw that was a fundamental error. I started out with the
idea that nobody would ever want to use a keyboard that could generate more than about 90 characters.
It was going to be too complicated. But I was wrong. So it [TeX] was a burden as well, in the sense that I
wanted to do a responsible job. I had actually consciously planned an end-game that would take me four
years to finish, and [then] not continue maintaining it and adding on, so that I could have something
where I could say, “And now it’s done and it’s never going to change.” I believe this is one aspect of
software that, not for every system, but for TeX, it was vital that it became something that wouldn’t be a
moving target after a while.
Feigenbaum: The books on TeX were a period. That is, you put a period down and you said, “This is it.”
Knuth: 1986 was it, in other words. Five volumes were published, “Computers and Typesetting,
Volumes A, B, C, D, and E”, and that was to be the end. Then in 1988 and 1989 we were changing
everything from 7-bit to 8-bit, which was a major rewrite, done with the help of volunteers all over the
world. But I still had to personally do everything myself in order to make sure that it wasn’t going to
diverge.
Feigenbaum: This was at the same time it was being ported over to personal computers?
Knuth: It was ported over to personal computers already in 1980. It was ported to 200 different
programming environments -- I’m considering the combination of operating system and language -- by
1981. TeX ’82 was the complete rewrite and incompatible break with TeX ’78. The original design, TeX
’78, had already been ported to 200 different environments before I did TeX ’82. We also made sure that
this could be ported.
Knuth: Yes. We worked on the porting environment. This was the genesis of literate programming.
One of the aspects of literate programming that doesn’t get top billing is the way it helps for porting a
system. It’s called the change file mechanism. I have my master files, and nobody is allowed to touch these.
It says at the top of the file, “Do not change this file unless you are D.E. Knuth.” I don’t know how many
D.E. Knuths there are in the world, but anyway I get to change the master file. But change files come
along. The change file starts out with a line saying, “Okay, now go to the first line in the master file that
matches this,” and then it quotes lines from the master file. When it comes to the end, then it says, “Now
replace those by these lines.” This turned out to be a very flexible mechanism. It also had extra features,
like you can include another change file in the midst of one change file. But anyway, there’s the master
files that I write, and you have everybody who’s porting it. You have hundreds of these change files.
Then I make a change to the master file, because I find a bug, or because I have to have a new feature
before TeX is frozen. Still, the change files need only very minor corrections. The error checking was
sufficiently good that you would usually find that the people who were porting it to another environment,
their ports would automatically work, even though I was changing the thing and they understood the port.
So that mechanism has worked well.
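The change-file idea Knuth describes (quote some lines of the master file, then give the lines that replace them) can be modeled in a few lines. This is a simplified sketch, not the actual behavior of WEB’s TANGLE and WEAVE: it borrows WEB’s `@x`/`@y`/`@z` markers but assumes exact line matching, changes listed in master-file order, and no included change files. The name `apply_change_file` is hypothetical.

```python
def apply_change_file(master: list[str], changes: list[str]) -> list[str]:
    """Return the master lines with every change from a WEB-style change file applied.

    Each change has the form:
        @x   lines quoted from the master file
        @y   lines that replace them
        @z   end of this change
    Simplifying assumptions: quoted lines must match the master exactly,
    changes appear in the same order as the lines they modify, and
    included change files are not supported.
    """
    # Parse the change file into (old_lines, new_lines) pairs.
    hunks: list[tuple[list[str], list[str]]] = []
    old: list[str] = []
    new: list[str] = []
    mode = None
    for line in changes:
        if line.startswith("@x"):
            old, new, mode = [], [], "old"
        elif line.startswith("@y"):
            mode = "new"
        elif line.startswith("@z"):
            hunks.append((old, new))
            mode = None
        elif mode == "old":
            old.append(line)
        elif mode == "new":
            new.append(line)
        # In this sketch, lines outside @x ... @z are treated as comments.

    result: list[str] = []
    pos = 0  # first master line not yet copied to the result
    for old, new in hunks:
        # Scan forward for the first place where the quoted lines match.
        start = pos
        while start + len(old) <= len(master) and master[start:start + len(old)] != old:
            start += 1
        if start + len(old) > len(master):
            raise ValueError(f"change does not match the master file: {old[:1]}")
        result.extend(master[pos:start])  # unchanged master lines
        result.extend(new)                # replacement lines from the change file
        pos = start + len(old)            # skip the lines that were replaced
    result.extend(master[pos:])
    return result


master = ["a", "b", "c", "d"]
changes = ["@x", "b", "@y", "B1", "B2", "@z", "a comment", "@x", "d", "@y", "D", "@z"]
print(apply_change_file(master, changes))  # ['a', 'B1', 'B2', 'c', 'D']
```

The master list is never modified; each port keeps only its own change file, which is why a fix to the master can flow through to every port as long as the quoted lines still match.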
Feigenbaum: Don, I wanted to, while we’re talking about TeX and this decade, bring in fonts. Font
design, your interest in the art of font design, bringing Chuck Bigelow to Stanford. All of that, and
Metafont as a program, and as a book.
Knuth: Yeah. Metafont. Wow, there’s so many layers here. I just received in the mail two days ago a
wonderful book by Hermann Zapf, who’s about to celebrate his 90th birthday. It tells the story of his life
and everything, and I’m just thinking about it because I met so many wonderful people. The graphic
designers are about the nicest people I’ve ever met in my life, and this came out of this group. It starts
out, actually, very briefly, at Stanford. Stanford has a wonderful professor, Matt Kahn, who taught a
course in basic design. Jill and I took his class -- audited his class -- in 1976, I think it was. I got to rub
shoulders with artists during this time. He also gave a lot of insight into the way artists do their wonderful
things. Then a few years later when I’m working on TeX, of course aesthetics is very important to me.
That’s why I didn’t like the Bell Labs system; otherwise I would’ve adopted it. I had to
have something that looked beautiful to me. Stanford has a wonderful collection of fine printing, called
the Gunst Collection. I went through and I absorbed the writings of type designers through the centuries,
and studied, and started to learn what makes good quality different from ordinary quality in published
books. That was during the earliest time working on TeX. Before the summer of ’77, like during May of
that year just before my sabbatical, I could probably mostly be found in the
Stanford Library reading about the history of letter forms. Before I went to China I had drafted the letters
for A to Z. I’m not sure if I had gotten into all the letters. I think I had probably 26 lower case and 26
upper case letters by the time I left for China. But I had to do fonts at the same time as TeX. It wasn’t
something [where] I can do TeX and then I can do fonts. It’s a chicken and egg problem. You can’t do
typesetting unless you have the fonts to work with. Structured programming gave me a different feeling
from programming the old way. A feeling of confidence that I didn’t have to debug something immediately
as I wrote it. Even more important, I didn’t have to mock-up the unwritten parts of the program. I didn’t
have to do any fast prototyping on something like this, because when you use structured programming
methodology, you have more confidence that it’s going to be right, that you don’t have to try it out first. In
fact, I wrote all of the code for TeX over a period of seven months, before I even typed it into a computer.
It wasn’t until March of 1978 when I spent three weeks debugging everything I had written up to that time.
Certainly you can imagine how I’m feeling in October, November, saying, “Hmm. I wonder if this is really
going to typeset a paragraph, if these data structures I have for dynamic programming are really going to
work.” Maybe I’m a little curious about it, but structured programming still was strong enough that I
thought, “No, no. If I’m going to try to minimize my total time, then why should I have to first debug my
prototype and then debug the real thing? Why don’t I just do all the debugging once and save total time?”
The same with fonts. I had to have fonts. I couldn’t debug TeX until I had the fonts. So it was all mixed up:
I’d work on one for a month, then go to the other for a month, and then come back. I thought fonts
were going to be easy. I had seen Butler Lampson playing around with fonts at Xerox PARC. He was
sitting at a terminal and he had a big letter “B.” I can sort of visualize it now. He was drawing splines
around the edge. In my art class I had done a project for Matt Kahn [that] taught me about
splines, so I knew how to program splines. I thought, okay, I’ll get the letters that are used in the old
edition of “The Art of Computer Programming”, and I’ll do like Butler did, and I’ll make my font. I was
going to go over to Xerox PARC and work with their equipment. They said, “Fine. Sure, Don. We’ll give
you an office over here. Of course, any fonts you design here become Xerox property. You won’t mind
that?” I said, “What? All I’m going to come out with [are] my measurements, a bunch of numbers. How
can you own those numbers? These are just integers. Numbers belong to God.” Well, this is a
debatable point. But they said anything I do there would belong to them. So I worked instead at the
Stanford AI lab, where we didn’t have cameras of anywhere near that precision. We had a TV
camera and a great amount of distortion. If you turned the light up just a tiny bit, the width of the
letters on the screen would grow by 25%. It was impossible to do any quality work through that. I had to
learn all kinds of tricks for getting around it. It became much more difficult to do fonts than I had expected.
You were saying the other day that a story has to have moments of tragedy as well as success. One of
the greatest disappointments in my whole life was the day that I received in the mail the new edition of
volume 2 of “The Art of Computer Programming,” which was typeset with my fonts and which was
supposed to be the crowning moment of my life when I had succeeded with the TeX project. I think it
was 1981, and I had gotten the best typesetting equipment, and I had written a program for the 8-bit
microprocessor inside, and it had 5,000 dots-per-inch, and all of the proofs that I had coming out looked
good on this machine. I went over to Addison-Wesley and they typeset it, and it came in a book. There
was the book, and it was in the familiar beige color covers. I opened the book up and I’m thinking oh, this
is going to be a nice moment. [But] this doesn’t look the same!
Knuth: I sent them film. It doesn’t look the same as my other books. I had volume 2, first edition. I had
volume 2, second edition. They were supposed to look the same. Everything I had known up to that
point was that they would look the same. All the measurements seemed to agree. But a lot of distortion
goes on, and our optic nerves aren’t linear. All kinds of things are happening. I wrote it up once; I said I
burned with disappointment. I mean, I really felt a hot flash where I went, “Ohhhhh!”
Feigenbaum: You were saying that you put so much effort into this and it wasn’t beautiful.
Knuth: It wasn’t that bad. Some people didn’t notice any difference at all, but the worst was the
numerals. The numbers 1, 2, 3, 4, 5 are really in a rather different style from letters, and they’re very
tricky. I didn’t realize that when browsing a book our eyes jump and focus on different parts, and one of
the things we focus on most, often when we’re using a book, is the page numbers. And the 2 was really
ugly. And the 6 -- there is something about the 6 that it’s just not a 6. And the 5’s! Anyway, I got to the
point where I was so upset. Some California highway signs -- the speed limit signs for 50 miles an
hour, or 25 miles an hour -- the 5 is really ugly. It looks like the 5 that I used to have. I couldn’t live in
Santa Rosa because they have lousy 5’s on their speed limit signs in Santa Rosa. It just reminds me of
this awful time. There was a time when I would be looking at all of the 2’s that I could see as I’m riding
a bus, or something like this, and how am I going to get this 2 to be right, because the numbers were the
worst of all. The letters were okay, but I’d seen the numbers, and I can’t read my book without seeing
these numbers. I’m looking up a page and I look in the index. Oh, yeah, I see, page 413. Then I have to
read all these numbers in order to get to page 413.
Knuth: Before.
Knuth: You see, it’s the context. Having it on a film… Ok, first of all, we’re working with the Xerox
Graphics Printer, which has a very low resolution. Everything has jaggies -- jagged edges -- in that
machine. I knew about this even before I started to go into typography. We had the Xerox Graphics
Printer and we were saying, “Oh, this is interesting, but it’s not a book.” Then I had the nice results from
Pat Winston’s book that looked like a book. That was professionally designed type; it wasn’t done by a
computer programmer. But now I was trying to match exactly the type that we had in the other [version].
I would debug my whole book looking at Xerox XGP proofs. Then I would go to my high-res machine,
this expensive typesetter in the basement, and [on] that machine it was certainly crisp, and I didn’t see
any jaggies in those. I had no indication that when this would actually then go to be printed on paper, the
ink gets a little distorted by the printing process, and even more so once it’s bound in a book that looked exactly…
It’s the context. It had to look right, and it didn’t at that time. I’m happy to say that I open my books now
and I like what I see.
Knuth: Even though they don’t exactly match 1968, the ways they differ are pleasing to me. But I had to…
So then I went to all the best type designers in the world. I had learned some of their names, and I was
able to invite them to participate in my research project, and I got to meet [them]. I could see, for
example, from some of the things he had written, that Hermann Zapf seemed to be a very open-minded
guy. So I wrote him a letter introducing myself and saying, “Would you be interested in spending
two weeks at Stanford?” And boy! He’s the absolute best in the world. In my apprenticeship he’s one of
my great teachers. As you mentioned Chuck Bigelow, Chuck was the dean of typography in America. I
worked to get some donations so that we would be able to hire Chuck and have a joint appointment with
the art department. I was glad to find out that after we had gone through the process of committees and
getting the appointments approved by two departments and everything, the week after he had accepted
our offer he received a MacArthur Prize Fellowship, which certainly enhanced my credibility too with the
art department. This was a big, new thing for them; we had never had a joint thing with the art
department before. I brought in Matthew Carter, who is definitely considered the leading type designer in
America. There was a great article about him in The New Yorker last year. He was out here for a quarter.
Many other visitors and industry leaders from around the world helped me at the time. Finally by 1986 I
was ready. I had type that I could be happy with. They said to me, “Don, that’s the normal five years’
apprenticeship as a type designer. That’s the way it goes.” Originally, I thought it was just going to be a
matter of making a few measurements and taking a few numbers, and that would be it.
Feigenbaum: That was the TeX story, the METAFONT story. Anything else going on during this time,
[in] the other parts of your life?
Knuth: Okay. I had to work so intensively on this software that I could not keep up my normal teaching
load at Stanford. I think three or four quarters… I’m not sure. Were you chair?
Knuth: ’81, yeah. So I had to approach you and say, “Can you give me a leave of absence this quarter
because I’m doing software?” Also then Nils [Nilsson] probably. Do you know who? No?
Knuth: Gene. Okay. Anyway, I missed three or four quarters during a period of four years, because I
found that writing software was much more difficult than anything else I had done in my life, in the
following sense. I had to keep so many things in my head at once. I couldn’t just put them down and
start something else. It really took over my life during this period. I used to sort of think there were
different kinds of tasks: writing a paper, writing a book, teaching a class, things like that. I could juggle all
of those simultaneously. But software was an order of magnitude harder. I couldn’t do that and still teach
a good Stanford class. Of course, I’m advising my grad students through all this period, and they’re doing
great theses related to typography. Mostly, not always. But the other parts of my life were largely on
hold. That includes The Art of Computer Programming. Except volume 2 was my big project, to get the
new edition of volume 2 done with TeX. In 1980 I spent several months just doing pure… There were
new developments in the algorithms that belong in volume 2, and I wrote a lot of new material for volume
2 during this period. But then in order to get TeX and METAFONT completely finished, that was the
focus. At Stanford we had a unique class taught in the spring of ’84 when the new version of METAFONT
was being done. I co-taught it with Chuck Bigelow and Richard Southall. Richard is not a type designer
but an expert in the interface between the designer and the actual final product. He’s a talented designer
but he’s not one of the leading designers. His main expertise is actually knowing what distortions you
have to make in order to get it to look right on the page. The three of us co-taught the class. The class
met three days a week, once by Chuck, once by Richard and once by me. The students in the class are
learning to design fonts at the same time. It was a great quarter doing this class, and it was all recorded
on videotape. Unfortunately the tapes were all erased, so we just have our memories of this class. My
life was pretty much typography. When it got to The Art of Computer Programming, every three months I
would take a look at the journals that had come in for those three months and I would scan the titles. For
each article I would say, “Oh, this belongs in volume 4, in a certain part.” I kept an index of them for a
while. The preprints that I received in the mail, I first started putting into a
box. All my preprints had been organized well for volume 4, into 32 compartments. But then they were
starting to overflow, so then I had X1, which just had overflow from all the compartments, and X2 and X3.
I got up to X15 of these preprints. Then I gave up on that and I started putting them into a big box in a
room in my house. And then the box overflowed and there was a big pile on the floor.
Feigenbaum: Yeah. I remember visiting you in your study when it was just a chaos of piles.
Knuth: Yeah. So in 1993, I think it was, I finally attacked the pile. I went through and I had accumulated,
I think it was, 14 linear feet of material that I had just been saying “someday get to this for volume 4.” I
think it took me a year to go through all of that and organize it and get ready to write the real volume 4
after all this time. So I put that on hold. Then before 1994 I had to get ready to, well, I’m retiring. We’ll
probably get into my third Stanford period. But typography was it for the early part of the ’80s. Then I
started doing a lot of mathematical research in the late part of the ’80s, analysis of algorithms, my real
life’s work.
Feigenbaum: I’d like to do that, to move on to the third period. You’ve already mentioned one of them,
the retirement issue, and let’s talk about that. The second one you mentioned quite early on, which is the
birth in your mind of literate programming, and that’s another major development. Before I quit my little
monologue here I also would like to talk about random graphs, because I think that’s a stunning story that
needs to be told. Let’s talk about either the retirement or literate programming.
Knuth: I’m glad you brought up literate programming, because it was in my mind the greatest spinoff of
the TeX project. I’m not the best person to judge, but in some ways, certainly for my own life, the
main plus I got out of the TeX project was that I learned a new way to program. I love programming, but I
really love literate programming. The idea of literate programming is that I’m talking to, I’m writing a
program for, a human being to read rather than a computer to read. It’s still a program and it’s still doing
the stuff, but I’m a teacher to a person. I’m addressing my program to a thinking being, but I’m also being
exact enough so that a computer can understand it as well. And that made me think. I’m not sure if I
mentioned last week, but I think I did mention last week, that the genesis of literate programming was that
Tony Hoare was interested in publishing source code for programs. This was a challenge, to find a way to
do this, and literate programming was my answer to this question. That is, if I had to take a large
program like TeX or METAFONT, fairly large, it’s 500 or 600 pages of a book -- how would you do that? The
answer was to present it as sort of a hypertext, where you have a lot of simple things connected in simple
ways in order to understand the whole. Once I realized that this was a good way to write programs, then I
had this strong urge to go through and take every program I’d ever written in my life and make it literate.
It’s so much better than the next best way, I can’t imagine trying to write a program any other way. On the
other hand, the next best way is good enough that people can write lots and lots of very great programs
without using literate programming. So it’s not essential that they do. But I do have the gut feeling that if
some company would start using literate programming for all of its software that I would be much more
inclined to buy that software than any other.
Feigenbaum: Just a couple of things about that that you have mentioned to me in the past. One is your
feeling that programs can be beautiful, and therefore they ought to be read like poetry. The other one is a
heuristic that you told me about, which is if you want to get across an idea, you got to present it two ways:
a kind of intuitive way, and a formal way, and that fits in with literate programming.
Knuth: Right.
Knuth: Yeah. That’s the key idea that I realized as I’m writing The Art of Computer Programming, the
textbook. That the key to good exposition is to say everything twice, or three times, where I say
something informally and formally. The reader gets to lodge it in his brain in two different ways, and they
reinforce each other. All the time I’m giving in my textbooks I’m saying not only that I’m… Well, let’s see.
I’m giving a formula, but I’m also interpreting the formula as to what it’s good for. I’m giving a definition,
and immediately I apply the definition to a simple case, so that the person learns not only the output of
the definition -- what it means -- but also to internalize, using it once in your head. Describing a computer
program, it’s natural to say everything in the program twice. You say it in English, what the goals of this
part of the program are, but then you say in your computer language -- in the formal language, whatever
language you’re using, if it’s LISP or Pascal or Fortran or whatever, C, Java -- you give it in the computer
language. You alternate between the informal and the formal. Literate programming enforces this idea.
It has very interesting effects. I find that, for example, writing a system program, I did examples with
literate programming where I took device drivers that I received from Sun Microsystems. They had device
drivers for one of my printers, and I rewrote the device driver so that I could combine my laser printer with
a previewer that would get exactly the same raster image. I took this industrial strength software and I
redid it as a literate program. I found out that the literate version was actually a lot better in several other
ways that were completely unexpected to me, because it was more robust. When you’re writing a
subroutine in the normal way, a good system program, a subroutine, is supposed to check that its
parameters make sense, or else it’s going to crash the machine. If they don’t make sense it tries to do a
reasonable error recovery from the bad data. If you’re writing the subroutine in the ordinary way, just
start the subroutine, and then all the code. Then at the end, if you do a really good job of this testing and
error recovery, it turns out that your subroutine ends up having 30 lines of code for error recovery and
checking, and five lines of code for what the real purpose of the subroutine is. It doesn’t look right to you.
You’re looking at the subroutine and it looks like the purpose of the subroutine is to write certain error
messages out, or something like this. Since it doesn’t quite look right, a programmer, as he’s writing it, is
suddenly unconsciously encouraged to minimize the amount of error checking that’s going on, and get it
done in some elegant fashion so that you can see what the real purpose of the subroutine is in these five
lines. Okay. But now with literate programming, you start out, you write the subroutine, and you put a line
in there to say, “Check for errors,” and then you do your five lines. The subroutine looks good. Now you
turn the page. On the next page it says, “Check for errors.” Now you’re encouraged. As you’re writing
the next page, it looks really right to do a good job of checking for errors. This kind of thing happened over and
over again when I was looking at the industrial software. This is part of what I meant by some of the
effects of it. But the main point of being able to combine the informal and the formal means that a human
being can understand the code much better than just looking at one or the other, or just looking at an
ordinary program with sprinkled comments. It’s so much easier to maintain the program. In the
comments you also explain what doesn’t work, or any subtleties. Or you can say, “Now note the
following. Here is the tricky part in line 5, and it works because of this.” You can explain all of the things
that a maintainer needs to know. I’m the maintainer too, but after a year I’ve forgotten totally what I was
thinking when I wrote the program. All this goes in as part of the literate program, and makes the
program easier to debug, easier to maintain, and better in quality. It does better error messages and
things like that, because of the other effects. That’s why I’m so convinced that literate programming is a
great spinoff of the TeX project.
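The “Check for errors” effect Knuth describes can be illustrated with a toy tangler: a subroutine section that only names its error checks, a separate section that spells them out, and a function that splices the two into one ordinary program. This is a minimal Python sketch under stated assumptions, not WEB or CWEB: the `<<name>>` reference syntax and the `*` root-section name are borrowed from noweb, and `WEB`, `tangle`, and `copy_row` are hypothetical names chosen for illustration.

```python
import re

# A toy "web": each named section holds code and may refer to another section
# on a line of its own as <<name>>. Syntax borrowed from noweb (an assumption),
# not WEB's own notation.
WEB = {
    "*": (
        "def copy_row(src, dst, width):\n"
        "    <<Check for errors>>\n"
        "    dst[:width] = src[:width]\n"
        "    return dst\n"
    ),
    # "On the next page": the error checks get a section of their own.
    "Check for errors": (
        "if width < 0:\n"
        "    raise ValueError('negative width')\n"
        "if len(src) < width or len(dst) < width:\n"
        "    raise ValueError('row shorter than width')\n"
    ),
}

# A reference line: optional indentation, then <<section name>>.
REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")


def tangle(name, web, indent=""):
    """Expand section `name` into a list of lines, replacing every <<ref>> line
    with the referenced section, indented to match the line it replaces."""
    lines = []
    for line in web[name].splitlines():
        match = REF.match(line)
        if match:
            lines.extend(tangle(match.group(2), web, indent + match.group(1)))
        else:
            lines.append(indent + line)
    return lines


program = "\n".join(tangle("*", WEB))
print(program)

# Run the tangled text to show that it is ordinary Python.
namespace = {}
exec(program, namespace)
print(namespace["copy_row"]([1, 2, 3], [0, 0, 0], 2))  # [1, 2, 0]
```

The subroutine section stays as short as its real purpose, while the checks live in a section where thoroughness costs nothing in readability. A real tangler would also report undefined or circular section references, which this sketch does not attempt.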
Feigenbaum: Just one other comment. As you describe this, it’s the kind of programming methodology
you wish were being used on, let’s say, the complex system that controls an aircraft. But Boeing isn’t
using it.
Knuth: Yeah. Well, some companies do, but the small ones. Hewlett-Packard had a group in Boise that
was sold on it for a while. I keep getting… I got a letter from Korea not so long ago. The guy says he
thinks it’s wonderful; he just translated the CWEB manual into Korean. A lot of people like it, but it doesn’t
take over. It doesn’t get to a critical mass. I think the reason is that a lot of people don’t enjoy writing the
English parts. A lot of good programmers don’t enjoy writing the English parts. Two percent of the
world’s population is born to be programmers. I don’t know what percent is born to be writers, but you
have to be in the intersection in order to be really happy with literate programming. I tried it with Stanford
students. I had seven undergraduates. We did a project leading to the Stanford GraphBase. Six of the
seven did very well with it, and the seventh one hated it.
Feigenbaum: Don, I want to get on to other topics, but you mentioned GWEB. Can you talk about WEB
and GWEB, just because we’re trying to be complete?
Knuth: Yeah. It’s CWEB. The original WEB language was invented before the [world wide] web of the
internet, but it was the only pronounceable three-letter acronym that hadn’t been used at the time. It
described nicely the hypertext idea, which now is why we often refer to the internet as a web too. CWEB
is the version that Silvio Levy ported from the original Pascal. English and Pascal was WEB. English and
C is CWEB. Now it works also with C++. Then there’s FWEB for Fortran, and there’s noweb that works
with any language. There are all kinds of spinoffs. There’s the one for Lisp. People have written books
where they have their own versions of CWEB too. I got this wonderful book from Germany a year ago
that goes through the entire MP3 standard. The book is not only a textbook that you can use in an
undergraduate course, but it’s also a program that will read an MP3 file. The book itself will tell exactly
what’s in the MP3 file, including its header and its redundancy check mechanism, plus all the ways to
play the audio, and algorithms for synthesizing music. All of it is part of a textbook, all part of a literate
program. In other words, I see the idea isn’t dying. But it’s just not taking over.
Feigenbaum: We’ve been moving toward the third Stanford period, which
includes the work on literate programming even though that originated earlier. There was another event
that you told me about which you described as probably your best contribution to mathematics, the
subject of random graphs. It involved a discovery story which I think is very interesting. If you could sort
of walk us through random graphs and what this discovery was.
Knuth: Well, let me try to set the scene and connect it to the past a little bit. We finished the TeX project.
The climax of that was 1986, although I did have to come back into it later on to make it more world
friendly. But after 1986, that was a sabbatical year for me, so it was also a time when I spent the whole
year in Boston. It was the year I gave to my wife as her sabbatical. It was 25 years of marriage; I thought
I could help her for one year, and she’s been helping me for all the rest. That was a break. I came back
to Stanford after that, and I plunged into what I consider my main life’s work, analysis of algorithms.
That’s a very mathematical thing, and so instead of having font design visitors to my project, I had great
algorithmic analysts to my project, especially Philippe Flajolet from Paris. I started working on some
powerful mathematical approaches to analysis of algorithms that were unheard of in the ’60s when I
started the field. We were excited about these developments and able to analyze a lot more algorithms
that previously were untouchable. Also other visitors, like Boris Pittel and so on. I had good research
funding to do work on analysis of algorithms. In fact I brought in the TeX project originally as just a minor
thing on my contract. “Say, by the way, we’re going to write these technical papers and we need a
publishing method to present our work, so I’ll spend a little time on typography.” That lasted only a year,
and then I got special funding for working on TeX. But throughout that time I also was doing a little bit of
support, with graduate students and visitors, doing analysis of algorithms. This became a major thing
again in the late ’80s. I found on the web one of my progress reports from 1987 listing ten
accomplishments of that year. I had to say that I don’t know if any other year was as fruitful as that year,
as far as my project was concerned anyway. It was certainly in full swing again, finally, after the work on
typography from 1977 to 1986. So here I am in math mode, and thriving on the beauties of this subject.
The main glory of it then occurred after the new ideas had started to gel. We started to see the deeper
implications. As you learn the new techniques you apply them to new problems that were previously
unreachable. One of the problems that was out there that was fascinating is the study of random graphs.
Graphs are one of the main focuses of volume 4, all the combinatorial algorithms, because they’re
ubiquitous in applications. A lot of times in order to understand what an algorithm is doing, you see what
it would do if you applied it to random data of various kinds. Yesterday at our computer forum Pat Hanrahan
was telling me how many people he knows that are working with random graphs to study the internet, and
so on. One of the simplest models of random graphs is one that also the physicists had been interested
in for many years. It connects to so-called Bose-Einstein statistics, they tell me, although I don’t really
understand that much about that part of physics. This model is very simple. We start out with N points
that are totally disconnected from each other. These points don’t exist in three-dimensional space. They
exist just as N objects in any number of dimensions. Initially there’s no connection at all between any
objects. But you can imagine that somebody draws two random objects, totally at random. Close your
eyes, find one, and each one with equal probability, 1/N. Then find another one and then put a
connection between those two. “Zap.” Those two are now joined. Okay. Now we have N-2 objects that
are still independent, but two of them are connected together. Do it again, and maybe you’ll connect two
others. After you do it a few more times you might find that these two are together, and these two are
together, but then you will hook them together and we’ll get four. Or we might have two that get a third; a
guy goes with them. Eventually we build up trees of things, meaning that they’re hooked together but they
don’t have cycles. There are no loops. Everything in a tree is connected to everything else in the tree, but
there’s only one way to get from each one to each other one. There are no loops. But we keep on adding.
This random process keeps going on, adding more and more connections, one at a time. Eventually
cycles occur. If we keep on going on and on and on and on, eventually everything is going to be
connected to everything else directly. This is called the evolution of random graphs. We can ask, at any
point in time, what does the random graph look like after we’ve added M connections to these N points?
What does it look like? Paul Erdos and Alfred Renyi had proved in 1960 that an amazing thing happens
as we add these connections. When M gets to a value which is approximately one half of N, all of a
sudden a “big bang” occurs, where comparatively little connection was true before the big bang,
compared to a lot after the big bang. The statistics are something like this. Say that M, the number of
edges, is equal to lambda over 2 times N. If M is N over 2, that is, if we add half as many edges as
there are points, then lambda is 1. If lambda is 10, then I’ve added 5N connections. Now consider a
large value of N. If lambda is less than 1, then almost surely the graph consists mostly of trees, and the
largest component is of size something like the logarithm of N. It’s almost totally dispersed. If lambda is
equal to 1, almost surely there is a component of size N to the two thirds power; if N is a million, a
component of size approximately 10,000. It goes from log N size trees to a connected part that’s big, of
size N to the two thirds. If lambda is greater than 1, it’s proportional to N, not N to the two thirds. So there
is this jump from very small components to one proportional to N. If lambda is just under 1, say
0.999999, you still only get log N. If lambda is 1.000001, you get N. There is this bang that’s occurring, and the question…
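The double jump is easy to watch in a small simulation. This sketch is an assumption-laden illustration, not the Erdos and Renyi proof: both endpoints of each new connection are drawn uniformly (a self-loop or a repeated pair is allowed and simply adds no new connection), components are tracked with a union-find structure, and the hypothetical function `largest_component` reports the largest component after M = lambda * N / 2 edges for lambda below, at, and above 1.

```python
import random


def largest_component(n, m, seed=0):
    """Add m random connections to n isolated points; return the size of the
    largest connected component."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n  # component sizes, valid at roots

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    largest = 1
    for _ in range(m):
        # Each endpoint is drawn with probability 1/n.
        ru, rv = find(rng.randrange(n)), find(rng.randrange(n))
        if ru == rv:
            continue  # both ends already in one component: a cycle, no growth
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        parent[rv] = ru  # hang the smaller component under the larger
        size[ru] += size[rv]
        largest = max(largest, size[ru])
    return largest


if __name__ == "__main__":
    n = 100_000
    for lam in (0.5, 1.0, 2.0):
        m = int(lam * n / 2)  # M = lambda * N / 2
        print(f"lambda = {lam}: largest component {largest_component(n, m)} of {n}")
```

Every call uses the same seed, so the three runs are one evolving graph observed at three moments and the sizes can only grow. Below lambda = 1 the largest component stays tiny, above it a constant fraction of all points is connected, and at lambda = 1 the size is on the order of N^(2/3) (about 2,154 for N = 100,000), with large fluctuations from seed to seed.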
Knuth: Discontinuity, a double jump. People who studied the Erdos and Renyi model, and physicists,
could study it from the point of view of starting from zero and going up to lambda equals 1, and then their
equations would blow up at lambda equals 1. Or they could study the later stages, with larger lambda, and
let lambda come down towards 1, and there the equations blow up. Okay? Now a Russian man in St.
Petersburg had noticed, to his surprise, that actually there was some similarity between the blow-ups
from the top and the blow-ups from the bottom. What we proposed to study was what happens in the
middle, and center on the middle, if possible. I guess we’ll continue the story later.
Feigenbaum: Don, we’re at the discontinuity point, and you’re about to explore both sides of that point,
and the story’s going to get really interesting here.
Knuth: Well, I hope so, but at about this time, Dick Karp at Berkeley was also interested in the evolution
of random graphs, and this explosion phenomenon. It relates in a vague way to computer algorithms,
because if we have data that has a lot of connections in it, then we would want to use a different kind of
data structure to represent it in the computer, and certain strategies would work a lot better. Dick Karp had
shown that, for example, if we want to take the transitive closure of a binary relation, you use a different
method, or if you want to update the consequences of adding a new thing, depending on how big the
graph is, you want to choose a different strategy. So this becomes a problem in the analysis of
algorithms as well as in physics. He had a couple of his graduate students do a simulation and try to grow
a lot of random graphs and see what happened. What we heard from this simulation -- it actually
turned out we misunderstood it -- seemed to imply that the Berkeley students had found the following:
as the graph is growing and getting more and more connections,
the graph first gets to a point where it has one cycle. It’s not just trees; one of the
components has an extra edge in it, more than needed to connect things together. Not only are the
things connected, but also there’s another edge making a cycle. Eventually there will be two cycles, and
three cycles, and things like this, and there’ll be more things happening. What we thought the Berkeley
group had discovered was that there almost never was a case where two of the connected components
of the graph would have cycles. In other words, as we’re adding edges, components merged together;
things that used to be apart become one. You might think that actually in a graph if we have a left
component and a right component, the left component might get a cycle and the right component might
get a cycle, and then they might merge later. But in the Berkeley experiments, it seemed, this almost
never happened. Instead, whichever component first got a cycle, it was the only one that had cycles later
on. Others would merge into it, but none of these other components would grow their own cycles first.
They weren’t big enough to have cycles. We thought, well, if this is true, this would also have
implications for data structures and algorithms. We could design our algorithms so that they could have
one place for the cycle guy, and one place for the other ones. We could have our data structure and say,
well, here’s where the cycles are, and here’s where the trees are, and then we could do faster updating.
So we set out, really, not originally to understand everything about the way the graph goes through this
critical point. Our original goal was to just try to prove what we thought the Berkeley group had found
empirically, this phenomenon that there’s sort of almost always only one main component, or one main
component that has cycles.
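The data structure the Berkeley result seemed to suggest, one place for the components with cycles and another for the trees, can be sketched as a union-find structure that also keeps the set of roots whose components contain a cycle. `CycleTrackingForest` and its methods are hypothetical names; this illustrates the idea and is not code from Karp's group or from Knuth.

```python
class CycleTrackingForest:
    """Union-find over n points that also remembers which components contain
    a cycle, so "where the cycles are" can be read off without a scan."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.cyclic_roots = set()  # roots of components that contain a cycle

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def add_edge(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            # The new edge closes a cycle inside an existing component.
            self.cyclic_roots.add(ru)
            return
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        # The merged component has a cycle if either part had one.
        had_cycle = ru in self.cyclic_roots or rv in self.cyclic_roots
        self.cyclic_roots.discard(rv)
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]
        if had_cycle:
            self.cyclic_roots.add(ru)

    def components_with_cycles(self):
        return len(self.cyclic_roots)


g = CycleTrackingForest(6)
for u, v in [(0, 1), (1, 2), (2, 0)]:  # a triangle on points 0, 1, 2
    g.add_edge(u, v)
for u, v in [(3, 4), (4, 5), (5, 3)]:  # a triangle on points 3, 4, 5
    g.add_edge(u, v)
print(g.components_with_cycles())  # 2: two separate components with cycles
g.add_edge(2, 3)                   # the two cyclic components merge
print(g.components_with_cycles())  # 1
```

Each new edge costs two `find` calls and constant extra work, so the number of components with cycles is always at hand without rescanning the graph.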
Feigenbaum: Don, can I interrupt you just a second to ask a question? What puzzles me, and maybe
puzzles the audience, is how often analysts, mathematical analysts, do empirical experiments to
discover things. Is that a usual thing, or was it special in this case?
Knuth: It’s a fast-growing area in mathematics. The Journal of Experimental Mathematics was founded
by Silvio Levy less than ten years ago. He was my co-author with CWEB, but he’s very broad. It’s
because computers are now there, so we can now do empirical studies with mathematics. It’s not too
common. My professor, Marshall Hall, was sort of famous for his observation about combinatorial things;
he best expressed the wisdom of the 1960s at the time by saying that when you’re doing
mathematics it’s nice to do a bunch of experiments with pencil and paper. If some problem has a
parameter N associated with it, you can usually go up to some value of N, like N=10 or something, by
hand. Then with the computer you can go on with N=11. Combinatorial problems tend to grow faster, to
the point where the computer can go beyond the hand thing. But then you can’t go to N=12, because
that’s already too much, because the problem is growing so much. So he says computers were good for
going one case beyond what you could do by hand. But now computers are better by orders of
magnitude than they were then, and also the tools that we have now for examining mathematical things
are much better, the software that we have.
Feigenbaum: If this journal is only ten years old, this work that you were doing around 1990 must’ve
been very much an early kind of a pioneering thing.
Knuth: Well, it was, actually. I guess there was another story associated with that, and that is I did
empirical studies on the first cycle that occurs with a random graph. There was the paper that I wrote just
previous to the one, the work I did with Philippe Flajolet. We first developed the theory, and then we
wanted to have a section at the end of the paper that validated [it] experimentally, so we could see how
big the graph had to be before the asymptotics would kick in. A lot of graph problems actually behave
differently when the size is small. Our theorems we knew were true when N gets up to larger than the
size of the universe, but how did we actually know, if N is a million, is our theory correct? So I ran
experiments, sort of as a last phase of writing the previous paper, in order to test the thing in practice for
small values, since our mathematics was entirely concentrated on the case where N is getting very large,
the size of the graph is getting very large. I ran the program over Christmas vacation. I think I let it run a
little longer than I intended, I think because of timesharing, nobody else was using it at the Christmas
vacation. I didn’t realize, but a week later I got a bill from Betty Scott for $60,000 of computer time, which
was way more than I had in my budget of my research grant. I refused to pay it, basically. I said, “I’m
sorry, I have to declare bankruptcy.” The worst part of the story is that I found out, 15 years later, that I
had a bug in my program and all the answers were wrong, all the $60,000 of calculation. What we had to
report in our paper was that actually our theory didn’t seem to be very relevant for the small values of N.
And Professor, our stat professor -- oh, what’s his name? I see him in front of me, but I don’t know -- he
was looking at our data and he figured out another algorithm by which he could calculate things by hand.
He knew that our answers were wrong. Sure enough, all this money that I wasted on this empirical
calculation, no wonder it didn’t agree with our theory, because my program was, in fact, wrong. In the
reprint of that paper on the first cycles, which came out in my Collected Papers on Discrete Mathematics,
I think it is -- I don’t remember which of mine -- I recomputed this table with a correct program. Of course,
it only took five minutes on a modern computer. But with the SAIL [Stanford Artificial Intelligence Lab]
computer we got a whopping bill. So it wasn’t very usual to do empirical calculations at that time, and it
was at Berkeley that the guys did them.
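The kind of check Knuth describes, running the random process many times to see whether asymptotic theory already describes modest sizes, can be sketched as a Monte Carlo experiment on the first cycle. The model details are assumptions of this sketch: each edge joins two distinct points chosen uniformly, a repeated pair counts as a cycle of length 2, and `first_cycle_edge` and `experiment` are hypothetical names. It reports simulated averages only and makes no claim about the values in the published table.

```python
import random
import statistics


def first_cycle_edge(n, rng):
    """Return how many edges have been added to n isolated points (n >= 2)
    when the first cycle appears. Each edge joins two distinct points."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = 0
    while True:
        u = rng.randrange(n)
        v = rng.randrange(n - 1)
        if v >= u:
            v += 1  # skip u, so v is uniform over the other n - 1 points
        edges += 1
        ru, rv = find(u), find(v)
        if ru == rv:
            return edges  # u and v were already connected: a cycle closes
        parent[ru] = rv


def experiment(sizes, trials, seed=1):
    """Average first-cycle edge count, divided by n, for each n in sizes."""
    rng = random.Random(seed)
    return {
        n: statistics.mean(first_cycle_edge(n, rng) for _ in range(trials)) / n
        for n in sizes
    }


if __name__ == "__main__":
    for n, ratio in experiment([10, 100, 1_000, 10_000], trials=100).items():
        print(f"n = {n:>6}: first cycle after about {ratio:.3f} * n edges")
```

Each edge that does not close a cycle merges two components, so the first cycle must appear by edge n; every ratio therefore lies between 2/n and 1. Watching how the ratios settle as n grows is what such an experiment is for.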
Feigenbaum: Let’s go back to Berkeley. You had interpreted, but probably misinterpreted, the Berkeley
numbers.
Knuth: That’s right. The Berkeley numbers were telling us there might be this giant component
phenomenon, that the seed is planted very early, and then it stays with the thing. That was our original
motivation for studying the… What we finally found out was a good explanation of the Big Bang, but our
motivation -- we didn’t start out in saying, “I’m going to solve this problem.” That would’ve been a
hopeless problem. That would’ve been too much, even for an optimist like me, to say I was going to
tackle that problem. It just turned out that we stumbled on the answer. But in the course of looking at it,
we did find a way to slow down the Big Bang, and that’s not too hard to understand. Let’s imagine again
that we’re watching this graph evolve. Every graph as it evolves finally gets to a point that a cycle
appears in one component. Okay, we have one component containing a cycle. Now then it comes to a
point where there are two cycles in the whole graph. There are two possibilities. Either the two cycles
are in the same component, or one cycle in this component, one cycle in another one. So there’s a fork
in the road. It goes one direction or the other. Then when the third cycle appears, we have three
possibilities. We could have all three in one component, or we could have one and two, or we could have
one, one and one. And so on. You could draw an abbreviated history if you just look at which
components have cycles. There’s a branching diagram, and every evolving graph follows some
path through this diagram. The Berkeley experiment, as we understood it, was that almost always we were on
the upper line of this path. Almost always there’s only one component that contains cycles. These other
possibilities are there, but rare. We developed tools of complex analysis that I had mostly learned from
Philippe Flajolet. It got to the point where I could prove that it wasn’t almost always happening on the top
line, because at the very first branch, if I’m not mistaken, the odds were 72 to 5 that it would take the first
branch, but 5 cases out of 77, you’d take the bottom branch. The chance isn’t going to zero, but most of the
time it takes the top branch. But then maybe those two will join together and will get up to the top branch
again. We started to have more mathematics so we could find the first branch, in the two cases. A few
more days later, we could extend that. We could say, “Oh, what happens when the two go into three, and
three go into four?” We were getting peculiar numbers, but we could calculate these probabilities by a
long sequence of steps; a lot of calculus, a lot of Mathematica -- or Macsyma, I guess it was at that time -
- using the symbolic algebra systems to grind out these strange probabilities. The truth actually turned
out that the Berkeley experiments had sampled the graph. Say you have a million nodes. They would
sample it after you had a thousand edges, and then you print out what’s the state then. What about
1,100 edges, 1,200 edges? I’m sorry, the critical point occurs at 1/2 N log N. They would sample it at
periodic times, but they wouldn’t sample it [exactly] at the state where you get the first cycle, the second
cycle, as in our mathematics. What we were seeing was the graph at intervals of a certain number of
seconds. The truth is that these deviations from the top line disappear very quickly. There’ll be a
brief instant of time where there’s two [cycles], where you’re not on the top line, but then it jumps back up
to the top line again. If you’re only sampling the graph at intervals of time, you almost never see the case
where you’re not on the top line. That’s why we misunderstood what we thought the Berkeley
experiments contained. We actually were able to prove sort of a climactic theorem, to get an exact
probability that it stays on the top line throughout and never ever has more than one cycle. The number
was something like 5π/12 [the published value is 5π/18, about 0.873]. Amazing. No, it’s got to be less than one, but it equals a small rational
multiple of π. That was the exact probability of staying on the top line. It’s kind of amazing that the
number π would occur in this connection. So we had these new mathematical tools. What it finally gave
us was a way to look at the Big Bang from the center of the process, and none of our equations blow up.
We’re able to slow down the Big Bang and watch it happen, by means of this new scale of
measurement saying, “Look at it after there’s a certain number of cycles in the graph.”
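The bookkeeping behind the branching diagram can be sketched in a few lines. This is a minimal Python illustration, assuming a graph that grows by one uniformly random new edge at a time: a union-find structure tracks how many independent cycles each component holds, and at every edge that closes a cycle it records how those cycles are split among components, so (1, 1) means two components with one cycle each and (2,) means both cycles sit in one component. The function names `cycle_partitions` and `random_edges` are hypothetical. This only simulates the process being described; it is not the complex-analysis method used to derive the exact probabilities, and the published analysis works with a finer notion than "contains a cycle", so it should not be expected to reproduce those constants.

```python
import random


def cycle_partitions(n, edges):
    """Process edges in order on vertices 0..n-1.

    Each time an edge closes a cycle, record how the cycles created so far
    are split among components, as a tuple sorted from largest to smallest.
    """
    parent = list(range(n))
    size = [1] * n
    cycles = {}  # root -> number of independent cycles in that component

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    history = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            # Both ends already connected: this edge closes one more cycle.
            cycles[ru] = cycles.get(ru, 0) + 1
            history.append(tuple(sorted(cycles.values(), reverse=True)))
        else:
            # Union by size. Merging two cycle-bearing components fuses their
            # cycles into one component (a jump back toward the "top line").
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
            moved = cycles.pop(rv, 0)
            if moved:
                cycles[ru] = cycles.get(ru, 0) + moved
    return history


def random_edges(n, m, seed=None):
    """Return m distinct random edges on n vertices; requires m <= n*(n-1)//2."""
    rng = random.Random(seed)
    seen = set()
    edges = []
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        key = (min(u, v), max(u, v))
        if key not in seen:
            seen.add(key)
            edges.append(key)
    return edges


if __name__ == "__main__":
    # Two triangles that are later joined, then one more cycle-closing edge.
    demo = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (0, 3), (1, 4)]
    print(cycle_partitions(6, demo))  # [(1,), (1, 1), (3,)]
    n = 10_000
    print(cycle_partitions(n, random_edges(n, n // 2, seed=1))[:5])
```

Recording the state only at cycle-closing edges mirrors the "count the cycles" time scale described above; sampling after a fixed number of edges instead would tend to miss the brief excursions off the top line.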
Knuth: I had these numbers now. They were numbers like 5/77. I wrote these numbers down, and they
just looked like really crazy numbers. Then one day I decided to take the series; it’s a power series, X +
(5/77)X², or something like this. You have a sequence of powers of X with weird rational numbers
attached to each power. I realized that the mathematics that we were developing actually would simplify
if we weren’t using those numbers, but instead forming the exponential of these power series. You take
e^f(X) instead of f(X). I used Macsyma to calculate e^f(X), and it has rational coefficients, too. But one of
those rational coefficients was something like 23023. Or 17017, or something. It wasn’t just a random
number. You play with numbers [and] you know that 23023 is 23 x 7 x 11 x 13, because 7 x 11 x 13 is
1,001, and that happens to involve a lot of small prime numbers. So here’s this number with a lot of small
prime factors appearing. If we didn’t take the exponential, the numbers just looked crazy; they didn’t
have small prime factors, they didn’t have any nice mathematical redeeming features. But after I took
the exponential, all of a sudden the numbers that I was looking at looked like old friends. They were
something that, you know, there had to be a reason for it. God didn’t want these numbers just to be
there. There had to be some mathematical reason. You could say that’s “stumbling on” something. An
hour later I could see the pattern for all of the numbers, because now it was all small prime factors and I
could guess what the next one in the series is. Before having this combination, it was impossible. The
funny story is that I made this discovery in the middle of the night, about 4:00. I could explain why it’s
5/77ths and everything, and I could draw the diagram of the transition of every graph as it goes through
the beginning of the Big Bang. Bill Gates was visiting Stanford the next day, and they were trying to
impress him so that he would donate some money to build a new house for the computer science
department. They asked me to meet with him in the morning. I’m not sure if we had ever met before. I
know he says now that he had studied my books rather hard when he was at Harvard. I was all filled with
the enthusiasm about having seen a pattern in these numbers. I drew on the blackboard the branching
structure of the first moments of the Big Bang, and I put my rational numbers on there, and I put my
formula involving 6N factorial, or anyway all the pattern that I had noticed. Later on, Carolyn Tajnai, who
was walking him around between the buildings, said to me, “Don, can you recreate for me what you put
on the blackboard that day? Because Bill was really enthusiastic about this.” The next day he wrote a
check to Stanford for, I don’t know, $10 million or something like this. I always use this story if somebody
says, “Who says theoretical computer science has no value?”
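The step of replacing a series f(X) by e^f(X) and then looking for small prime factors can be sketched with exact rational arithmetic. This is a minimal Python illustration, not the Macsyma computation from the story: it uses the standard recurrence obtained by differentiating g = e^f, namely n·g[n] = Σ k·f[k]·g[n−k], and a plain trial-division factorizer. The names `series_exp` and `prime_factors` are hypothetical, and since the interview describes the actual series only loosely ("or something like this"), the example uses simple series whose exponentials are well known.

```python
from fractions import Fraction


def series_exp(f, n_terms):
    """Coefficients g[0..n_terms-1] of exp(f(X)) for a power series f with f[0] == 0.

    Differentiating g = exp(f) gives g' = f' * g, i.e.
    n * g[n] = sum_{k=1..n} k * f[k] * g[n-k].
    """
    if f and f[0] != 0:
        raise ValueError("f must have zero constant term")
    f = [Fraction(c) for c in f] + [Fraction(0)] * max(0, n_terms - len(f))
    g = [Fraction(1)] + [Fraction(0)] * (n_terms - 1)
    for n in range(1, n_terms):
        g[n] = sum((k * f[k] * g[n - k] for k in range(1, n + 1)), Fraction(0)) / n
    return g


def prime_factors(m):
    """Trial division; returns the prime factors of a positive integer with multiplicity."""
    factors, p = [], 2
    while p * p <= m:
        while m % p == 0:
            factors.append(p)
            m //= p
        p += 1
    if m > 1:
        factors.append(m)
    return factors


if __name__ == "__main__":
    # exp(X) has coefficients 1/n!.
    print(series_exp([0, 1], 5))
    # exp(X + X^2/2) counts involutions: coefficients 1, 1, 1, 2/3, 5/12, ...
    print(series_exp([0, 1, Fraction(1, 2)], 5))
    # The "old friends" from the story: both numbers contain 1001 = 7 * 11 * 13.
    print(prime_factors(23023))  # [7, 11, 13, 23]
    print(prime_factors(17017))  # [7, 11, 13, 17]
```

Using `Fraction` keeps every coefficient exact, which matters here: the pattern only shows up in the exact numerators and denominators, not in floating-point approximations.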
Feigenbaum: Great story. Here’s a question which kind of wraps up the Stanford… Well, you were
going to talk about your retirement, and then I was going to ask you about volumes five through seven.
Knuth: Okay. I was afraid you’d ask me about volumes five through seven, so I’ll talk about retirement. That’s
really when we’re getting into phase 3 of Stanford. Phase 4 will be retirement, then, because phase 2
was certainly intensive software work, and then phase 3 was back into intensive analysis for “Art of...”
The next phase is going into retirement. As I said, I had my sabbatical year in Boston in ’86. That was to
be the climax and finishing of the TeX project. Then I come back from sabbatical and get back to speed
and so on. One of the things that happens when you come back from sabbatical is people will say to you,
“Oh, aren’t you glad to be home?” And, you know, I say, “Yeah, it’s nice”, and all this. But I found that it
wasn’t. I wasn’t really as happy as I let on. I mean, I was certainly enjoying this research that I was
doing, but I wasn’t making any progress at all on Volume 4. I’m doing this work, giant component, Big
Bang type of explorations, and I’m learning all of these things. But at the end of the year, how much more
had been done? I’ve still got this 11 feet of preprints stacked up in my closet that I haven’t touched,
because I had to put that all on hold for the TeX project. I figured the thing that I’m going to be able to do
best for the world is going to be to finish “The Art of Computer Programming”. I can do cutting-edge
research, but maybe I shouldn’t be just enjoying myself on this, but I should be getting stuff out the door
that’s going to be “The Art of Computer Programming,” which I had promised to write in 1962, and here it
is late 1980s. After two years, I started thinking about it during the summer of ’88, as to what I should be
doing with my life. At this point, see, I’m 50 years old. I was born in ’38, this is 1988. I decided that I
didn’t need money anymore. I didn’t need my Stanford salary. I had enough money in the bank. I didn’t
get any money from the TeX project -- that’s in the public domain -- but “The Art of Computer
Programming,” you know, [is] selling by the thousands every year all the time. So I can afford to do
whatever I want with my life. I don’t have to be employed. I can do what’s the best way to use whatever
gifts I have to put out. I decided that I really wanted to do “The Art of Computer Programming,” and get
this done. The only way to do it was to stop being a professor full time. I really had to be a writer full
time. I wrote a memo to Nils Nilsson, who was our chair, saying, “Nils, I’ve decided that after two more
years I would like to go on leave of absence and never come back.” I would love to continue an affiliation
with Stanford whereby I would be giving occasional lectures, but I think the thing I really want to do is
write “The Art of Computer Programming.” I don’t like the idea of a professor who just spends all his time
writing books and getting paid for being a professor, so I shouldn’t call myself a full-time
professor any more. I shouldn’t be drawing my Stanford salary. I’m going to be doing only the books, except for
occasional things. I’d like to be five percent time to keep participating in things, but I’ll never get my
books done unless I can really put full time into that. If I’m only going to make one day’s worth of progress
out of every 365, it’s going to take an estimated several centuries to finish at this rate. I wrote this letter to
Nils, and then we had meetings with the Dean, Gibbons, and the provost, who you know is Jim Rosse.
They thought maybe they could find a donor to Stanford who’d like to endow a professorship for
somebody who writes “The Art of Computer Programming.” They didn’t find that, but they did say that we
could have an amicable way to achieve this. It looked like in a year-and-a-half we’d be able to find
someone who would take over my role as leading the analysis of algorithms activities in the department.
Unfortunately, that never happened. We never found a senior person to take over what I was doing. But
as of January 1990 I went on leave of absence. They allowed the leave of absence to continue until I
was 55 years old and I could officially retire with a pension. I didn’t get any buy-out or anything like this,
like people are talking about now, but I do get some of my health insurance and so on through Stanford.
This is the kind of retirement that I worked out. I was able to also create my own title. I’m “Professor
Emeritus of The Art of Computer Programming”, with a capital “T” in “The Art of Computer Programming.”
I love that title. So starting at age 55, which is officially the beginning of ’93, I was Professor Emeritus.
The arrangement was that I give occasional lectures, which I’m now giving about three a year. We were
hoping for more, like six a year, but it’s on the average three because I’m out of town a lot. I have an
office and a secretary, and I’m on campus a lot. But I don’t have to raise funds for research projects.
Unfortunately I don’t have direct work with graduate students like I did before, and I don’t have regular
teaching. I enjoyed those things very much, and I think that the students that I had, I’m proud of every
one of them. The thing is, “The Art of Computer Programming” is something I have to do my best at.
Feigenbaum: Let me ask you a little easier question than Volume 5 through 7. Where are you in
Volume 4?
Knuth: So, Volume 4: I’m on page 12 of Volume 4 right now, although I’ve already written 400 pages
that come starting after about page 50. I’ve got a lot of it under my belt now, but you know as a computer
programmer you don’t write the initialization first. I’m at the point now I’m ready to write the initialization
to Volume 4.
Feigenbaum: But this volume must make you particularly nervous, because it’s on combinatorial
algorithms.
Knuth: It has [been] subject to combinatorial explosion, so it will [be] Volume 4A, 4B, and 4C, possibly 4D.
I’m sure it won’t get up to 4Z, but there will be sub-volumes to Volume 4 because of the huge growth in
combinatorial algorithms. By the way, while it’s in my mind, let me add something, because it relates to a question you
asked me last week and I didn’t think of a response at the time. It was something about being an
engineer versus being a scientist, or something like this. The way I tended to phrase that is the relation
between theory and practice in my life. I always thought that the best way to sum up my professional
work is that it has been a mixture of theory and practice, almost equally. The theory that I do gives me
the vocabulary and the ways to do practical things that can make giant steps instead of small steps when
I’m doing a practical problem, because of the theory that I know. The practice that I do makes me able to
consider better, more robust theories, theories that are richer than if they’re just purely inspired by other
theories. There’s this basic symbiotic relationship between those things that’s probably central to the
whole thing. At least four times in my life when I was asked to give kind of a philosophical talk about the
way I look at my professional work, the title was always “Theory and Practice.” I think the first time I did this
you were chair of the department, and I had just gotten the “Fletcher Jones” professorship. That was the
title, and I was asked to speak for five minutes on my life as I get this endowed chair at Stanford. My title
was “Theory and Practice.” I remember that in that talk I gave a kind of a spoof. I started out and I said,
“Well, I’ve written so many pages of books, and I’ve published so many papers, and I’ve had so many
students.” I gave a lot of the numerical statistics, and I said, “And that just about sums me up. So now
that I’ve got this chair, I’m going to follow the advice of the Fonz and ‘Sit on it.’” I remember I had made a
pretty compelling case for why I was tired and ready to ‘sit on it.’ I scared you to the point where you
were really sweating blood there. Then, of course, in the next sentence I said, “And of course, you know
that this is impossible, and that I couldn’t possibly do this.” And “Whew!” I could see you, you know,
doing this. [Showing relief] That was the first time I gave a talk about “Theory and Practice.” I gave another
one; the next one was actually very interesting. It was given in the Theater of Epidaurus in Greece, the
best preserved ancient theater. It was the keynote speech for the European Association for Theoretical
Computer Science. They had their annual meeting in Greece that year. Greece is the place for
philosophy, and also the words “theory” and “practice” both come from Greek words. So naturally I
decided I would speak on “Theory and Practice” in Greece, and I could speak in this temple of Greek
culture giving this talk. Melina Mercouri was the Greek Minister of Culture, and she introduced me in the
speech. It was a great moment of my life to summarize the roles, the tradeoffs between the two. At that
time I was working on TeX; it was early ’80s. My main message to the theorists is, “Your life is only half
there unless you also get nurtured by practical work.” And, I said, “Software is hard.” My experience with
TeX taught me to have much more admiration for the colleagues that are devoting most of their life to
software than I had previously done, because I didn’t realize how much more bandwidth of my brain was
being taken up by that work than it was when I was doing just theoretical work.
Feigenbaum: While we have just a moment left, if that Greek lecture was written up, do you know where
it is, so the audience could go look it up?
Knuth: I have a book called, “Selected Papers on Computer Science.” George Forsythe told me early
on when I came to Stanford, he said, “Don, sometimes in your life you’re going to be speaking not to
professionals, but you’re going to be talking to a much more general audience. It’s always scary to do
that, because you don’t understand… It’s easier to give a speech to somebody that’s exactly like you
than to somebody who has a different way of thinking.” When I wrote for Scientific American or
something -- every once in a while I would write something that was not addressed to somebody in my
own field. This book, 200 pages or something, contains all of the papers that I wrote in this way. Three
or four versions of, or takes on, “Theory and Practice,” including the Fonz Winkler one, are reprinted
in that volume. Thanks for asking.
Feigenbaum: Okay. You’ve reviewed for us what you might call the chronologically-oriented themes of your
career. Pre-Stanford, first Stanford period, and so on, until your retirement -- your pseudo-retirement, I
should say. Cutting through all these are other kinds of themes that touch on many different points in
the chronological explanation of your life. In my field, I really call these the heuristics of leading a career.
In fact, I told you once that I felt that one of the bad decisions I made in my career was leaving what was
then Carnegie Tech a year too early, before I learned all I had to learn from Herb Simon. I don’t mean
learning the material. Not the content, but the heuristics of leading a life. Could you talk a little bit about
that? If a Ph.D. program is kind of a research training apprenticeship where the students learn these
heuristics, what are they learning from you?
Knuth: I have some slants that I would tend to emphasize. Other professors would emphasize other
slants. I don’t have a monopoly on wisdom of this kind. The kind of things that I would tend to
emphasize are not just doing trendy stuff. In fact, I’d probably overemphasize that. If something is really
popular, I tend to think maybe I should back off. I tell myself and my students: really go with your own
aesthetics, what you think is important. Not what you think other people think you want to do, but what
you really want to do yourself. That’s sort of been a guiding heuristic all the way through. When I was
working on typography, it wasn’t fashionable for a computer science professor to do typography, but I
thought it was important and a beautiful subject. So what? In fact, other people told me that they’re so
glad that I put a few years into it because that made it academically respectable and now they
could work on it themselves. They sort of were afraid to do it themselves. But all the way through, when
my books came out, they weren’t usually copies of any other books. They always were something that
hadn’t been fashionable to do, but they corresponded to my own perception of what ought to be done.
Also, your word “heuristics” triggers another thing in my mind, and that is George Polya. Polya wrote this
great book called “Mathematics and Plausible Reasoning.” Of course, I know heuristics is a great word for
you because you had the “Heuristic Programming Project” and all these things. Heuristic, meaning
discovery. Polya also inspired me. I had the great fortune to get to know him because he was a Stanford
professor. He came to my house many times and I spent a lot of time with him. One of the things that
cuts across also many years is he had an approach to teaching that he called, “Let Us Teach Guessing.”
When he was teaching his classes, he’s not saying, “Memorize this, guys.” He’s saying, “Here’s a
problem. How do you solve it?”, with the idea that the students are going to make mistakes and then
they’re going to learn how to recover from mistakes, as well as making guesses. These are important
heuristics for my life, both in the teaching aspect and in the research aspect. Let me talk about the
teaching aspect first. Polya gave a demonstration lecture that was filmed at Stanford, and I saw it when I
was still at Caltech. He presents the students with a problem, with something like, “You have
a piece of cheese and you cut it with four strokes. How many pieces are you going to get?” Something
like this. Then he has the students try to analyze this. He started out by looking at simpler problems,
where it’s on a plane instead of in three dimensions, and you only take two cuts; things like this. At the
end of the hour he has all the students understanding not only the solution to this problem, but also
having taken it apart and discovered the solution themselves. That’s what goes into their brain, because
then they can solve other problems later on. I adopted this as a model for my own classes, already at
Caltech. Whenever I taught a class that had a decent textbook, I would devote the class time to problem
solving as a group, instead of reading to them or lecturing to them about what’s in the book. I would
assume that they could read the book on their own. They come to class, we do things that aren’t in the
book. We take a problem that’s similar to ones in the book and we try to work on it, almost like a
language class. I go down the row and, “It’s your turn, your turn, your turn.” People soon learned that if
they make a mistake, we all do, and we recover. I’d give a rule that nobody’s allowed to speak more than
twice in the hour, so that everybody participates. My teaching assistants would take notes, so that the
students could concentrate on what was going on instead of having to worry about having their notes
right so they couldn’t listen fully. The teaching assistant’s notes would then be typed up later on by
Phyllis and distributed to everyone. So we could record these sessions in the class as to things that
aren’t in the book, and how to recover from errors. I kept that style of teaching all the way through until I
retired. That was a great source of pleasure. I could use it except in the cases where there was no
textbook available. In my own research, this idea of guessing is also very important. When I read a
technical paper, I don’t turn the page until I try to guess what’s on the next page. Or, [say] the guy writing
the paper is going to state a problem. Before I look any further, I’ll say, “Now how would I solve this
problem?” I fail almost always. But I’m ready now to read the solution to the problem, so then I turn the
page and I see. But by this time I’m ready for what’s happening. When I work on “The Art of Computer
Programming,” over a period of 40 years I’ve gathered together dozens of papers on the same subject. I
haven’t read any of them yet except to know that they belong to this subject. Then I read those papers,
the first two papers extremely slowly, following this rule: “Don’t turn the page until you’ve tried to solve
the problem yourself.” With this method, I can then cover dozens of papers. The last ones, I’m
ready for. I just look at what’s a little different from what I’ve already learned to absorb.
That’s been a key heuristic in my own research, based on guessing.
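Polya’s strategy of first solving the flat, two-cut version turns the cheese question into a short recurrence. This is a minimal Python sketch, assuming the cuts are straight planes in general position (the usual form of this classic puzzle; the interview does not give the exact wording of Polya’s problem). The newest cut is itself divided by the earlier cuts into as many regions as those cuts make one dimension lower, and each of those regions splits one existing piece in two. The function name `max_pieces` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def max_pieces(dims, cuts):
    """Most pieces obtainable from `cuts` straight cuts of a `dims`-dimensional solid.

    Assumes the cuts are in general position. The newest cut is split by the
    earlier cuts into max_pieces(dims - 1, cuts - 1) regions, and each of those
    regions divides one existing piece in two.
    """
    if dims == 0 or cuts == 0:
        return 1
    return max_pieces(dims, cuts - 1) + max_pieces(dims - 1, cuts - 1)


if __name__ == "__main__":
    # The warm-up: two cuts of a flat (two-dimensional) cheese.
    print(max_pieces(2, 2))  # 4
    # The three-dimensional question with four cuts.
    print(max_pieces(3, 4))  # 15
```

The recursion makes the "simpler problem first" idea literal: the three-dimensional answer is built from the two-dimensional one, which is built from the one-dimensional one.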
Feigenbaum: That’s a really interesting story. In fact, my little footnote to that is that I called my own
project, the “Heuristic Programming Project” because I didn’t want to infringe on John McCarthy’s term,
“Artificial Intelligence.” Stanford Artificial Intelligence Laboratory. Everyone knew what programming
was, but no one knew what heuristics were. When they asked me, I would just quote Polya. I’d say,
“Polya says heuristic is the art of good guessing.”
Feigenbaum: Anyway, I wanted to ask you a little bit about the process of gathering up the literature and
writing it up in “The Art of Computer Programming” that you’ve been doing. To go back to Artificial
Intelligence, part of the program is a problem solver, but then there’s the part we don’t understand very
well, which is the problem generator. I’ve always thought of “The Art of Computer Programming” as
some kind of a problem generator for you. In fact, I’ve been jealously thinking of that. As you begin to
put things together, you see the holes.
Knuth: Yeah. The main perk that I get from working on “The Art of Computer Programming” is that I get
first crack at a lot of really natural research problems, because I’m the only person so far who has read the
papers of two authors who didn’t know of each other’s existence. I can see where this guy’s ideas fit in
with this guy’s ideas. They’re both working on the same problem, but they don’t realize it because they
have different vocabularies, very often. Artificial intelligence people, you know, have a _________
algorithm or something like this. The electrical engineers are working on a problem with a different
vocabulary, a different slant on it, but they’re thinking of something else. The people in operations
research are thinking of another way. Each person will take the problem and solve it in one respect.
Person B will solve a similar problem in another respect. I get to be the one who solves problem A in
respect to B and vice versa. Often these problems are natural and unify the subject. They tie the
problems in with even more parts of the subject, which make more of a pattern instead of having page 1,
page 2, page 3. Somehow it’s a network instead of a branching structure. Then there are also the other
problems that I can’t solve. Those make good research problems. I usually know somebody in the world
that I can suggest it to, and then science advances that way. But I get first crack at it. If it’s an easy one,
then I have a chance at it. It’s fun to do this. The danger is I have to know when to stop. If I couldn’t go
on, if I had to solve a problem before I turned to the next problem, I’d never get to the end of The Art.
Knuth: That’s right. Very good. It’s a lot of work writing “The Art of Computer Programming,” but the big
benefit is this chance to see patterns that other people didn’t have the opportunity to see, because they
just didn’t spend 40 years gathering the material the way I did.
Feigenbaum: I wanted to ask you, again it’s a heuristics question, but it has to do with another
qualitative aspect of picking problems and finding their solutions, which is the aesthetic that you
mentioned. You mentioned that you had an aesthetic, that other people have aesthetics. You’ve
mentioned to me in the past some various criteria that you use in your aesthetic. Do you want to mention
any of those?
Knuth: Okay. For example, when I’m writing a computer program, I could have different aesthetics. I
could say that the program should be the fastest possible, right? Or it could be the one that uses the
smallest amount of memory. Or the one that takes up the smallest number of keystrokes to type. Or the
one that’s easiest to explain to a student. Or the one that’s hardest to explain to a student. There are
lots of different measures that you can apply to a program. Or to anything; to a piece of literature, music,
whatever. You can say, “My goal is to make this best for teenagers” or whatever it is. Somehow you
have an audience in mind, or some criterion. All artists are trying to optimize some constraints or other
that you have in your mind, as to what you consider most beautiful or most important in this particular
piece of work. In the combinatorial work I’m doing now in volume 4, the main goal tends to be speed,
because we have these problems that involve zillions of cases. Every time we can save 100
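nanoseconds, if we’re doing it a billion times, that’s an hour [strictly, a billion times 100 nanoseconds is 100 seconds; it takes about 36 billion repetitions to add up to an hour]. We look for things like that because we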
know that everything we do is going to have a large payoff in that way. But other programs, I just want it
to be elegant in a way that hangs together; somebody can read it and smile. There are so many different
criteria of it. But in all cases, the thing that turns me on is the beauty of it and the style that goes with it.
Dijkstra had a great remark about teaching programming. I find style important in programming. Like the
style in IT, Perlis’ program, was not great. The program worked, but it was sort of bumpy. Another
program I read when I was in my first year of programming was the SOAP II assembler by Stan Poley at
IBM. It was a symphony. It was smooth. Every line of code did two things. It was like seeing a grand
master playing chess. That’s the first time I got a turn-on saying, “You can write a beautiful program.”
I’ve mentioned that several times, because it did have an important effect on my life. I’m worried about
the present state of programming. Programmers now are supposed to mostly just use libraries.
Programmers aren’t allowed to do their own thing from scratch anymore. They’re supposed to have
reusable code that somebody else has written. There’s a bunch of things on the menu and you choose
from these when you put them together. Where’s the fun in that? Where’s the beauty of that? It’s very
hard, [but] we have to figure out a way that we can make programming interesting for the next generation
of programmers, that it’s not going to be just a matter of reading a manual and plugging in the parameters
in the right order to get stuff. I’ve got to say something else, too, that pops into my mind. I saw a review
a year or so ago in Computing Reviews. Someone had written a book, something about tricks of the
trade, or something like this. It was somebody telling how to use machines efficiently by using some of
the less well-known instructions in the machine. The reviewer of the book hated this. He said, “If I ever
caught any of my programmers using anything in this book, I’d fire them.” Of course I immediately went
to the library and got out the book, because this was the book for me. My attitude is, if there’s a method
that works well and it’s not commonly known to students, let’s not stop using it. Let’s teach the students
how to use this so that it’s understandable and it can be used in the next generation. But this guy, he
was saying, “No, no. We already understand all the possible good ways to write programs. I’m not going
to let anybody write for me using anything subtle.”
Feigenbaum: Yeah, that was the kind of thing that I was telling you. Bob Bemer would come down
when I was a graduate student and tell us about these tricks, like unintended side effects of instructions.
“The designer never intended this but you can do this with it.”
Knuth: Of course, I told you about when I was writing RUNCIBLE and we were saving one line of code
here because we can use one constant for two different purposes. [In the 650] you could store a data
address and you could store an instruction address. You could actually put one constant in there, and you
could store it with one thing and it would zero out one field and store another one. So we could save;
instead of having two constants we could have one, all kinds of stuff like that. That is terrible
programming. I don’t recommend it at all. If you have a machine that has only 2000 words of memory,
okay. But I’m not recommending tricks just because they’re tricks. Although if your aesthetic is to cram
something in small, like you’re writing something for Game Boy or something, and you can put ten extra
features in there without increasing the size of the cartridge, okay, that’s fine. But [for] most of the things,
it’s much more important to have stuff that is not tricky to the point of breaking whenever you make a
slight change to something else. With literate programming you can document this stuff very carefully, to
warn people against it, but still it’s not great [or] to be recommended. But the fact is, a computer doesn’t
slow down when it gets to a part of the program that was harder to understand. The computer doesn’t
say, “Oh, I don’t understand what I’m doing here,” and then go faster like a human being does. So
there’s no reason for us not to put subtle tricks in our programs, unless we can’t document them well enough
for the person who’s going to have to modify the program to fathom them.
Feigenbaum: Don, I wanted to ask you about another word that you have used, and lots of scientists
use the word, difficult to define, but the word is “taste.” Good taste in problems, good taste in finding
problems, good taste in solving problems. Do you want to say anything about good taste?
Knuth: Well, there’s no accounting for taste. I was going to mention how Dijkstra was talking about
style. That is, you want to teach your students that they should have taste, but you don’t want to tell them
to have the same taste as you. You try to give them the idea of taste. You can imagine a music
composer. If Beethoven or Stravinsky or somebody would take on students, would they be a great
teacher if they told them to compose exactly like they did? Or if they said, “Here’s an example of a strong
style. Now develop your own.” That’s what you really want to do. My feeling is it’s important to have
taste driving yourself and to try to refine your taste, but you can’t impose it on somebody else. There’s no
absolute way for me to know that what I believe is beautiful is going to appeal to somebody else. Still, if I
am trying to define beauty by what other people think is beautiful, rather than by what I do, I think I’m making a mistake.
That’s why I was talking about trendy stuff a minute ago.
Feigenbaum: The other issue that you’ve talked about in the past is exercising some control of your
problem selection by knowing what it is you don’t know. Any words of wisdom about that?
Knuth: Well, the best way to learn what you don’t know is to try to program it, as we were saying. Well,
not exactly. Words of wisdom? I don’t know. I often learn what I don’t know by trying to program it for a
computer. But I also found, as when trying to translate something written in another language, that if I try
to put it in my own words, then I realize what I don’t know. If I read somebody else’s translation, I don’t get
as much out of it as if I take a text and try to put it in my own words. This exercise of being a teacher, or in
some way putting yourself into it, is the best way for me to discover what I don’t understand. You can
think you understand something until you try to program it, or do some other thing where you are not
just repeating something but actually processing it.
Feigenbaum: When you discover some of these holes that need to be plugged, some of them are easy
to solve, and some of them you just don’t know what the answer is.
Knuth: Yeah. I remember the first time in my life when I spent more than 10 hours on a problem and
actually got the answer. When you start out in life, when you start doing something that you don’t know,
you think of a question and then you answer it. At first you discover that, oh, the Greeks had already
done that. Then you learn a few more things, and you ask some more questions, and you say, oh yeah,
this was done in the 17th century, the 18th century, the 19th century. Finally you get to the point where, for
the first time in your life, you discover something that, as far as you know, nobody had discovered before.
Then you’re asking questions and you don’t get anywhere with them. You have to go on to the next
question. I do remember a time when all of a sudden… Up until that day, if I couldn’t get it in the
first hour, I didn’t get it even if I spent a week on it. But here was a time when I had actually worked
on something more than 10 hours and I did get the solution. That was a big moment in my life, realizing
that I could go that far. What I do now, though, is give myself an hour on these problems, and then
I say, “Well no, I’ll have to pass that on to somebody else,” and send a letter to somebody who might do it.
The exception is when I think I’m almost there, when I think, “Well, maybe in another five minutes I’ll get
the answer.” Then another hour later, if I still think I’m five minutes from the answer, I keep going at it.
Sometimes I’m still trapped in this mode for a week, but not too often anymore. Just in the past week I sent
off two problems to other people that I thought would be worth their attention, that they might like.
Feigenbaum: I’d like to switch to the personal Don Knuth. We at Stanford know the personal Don
Knuth. The people watching this video or the scholars of 50 years from now may know the professional
and mathematical Knuth, but they won’t have the privilege of knowing the personal Don Knuth. So I
wanted to ask you a few questions that just relate to the Don that we know and love. In your
various biographies, you always end by telling the reader that your avocation is music, and
if I had to write out my biography like that, I would also say my avocation is music. I get as
much of a thrill every week from going to the Stanford choruses as from anything else that happens in the week.
But your musical background is way more extensive than mine. Could you tell us something about the
role of music in your life, and whether there is any connection with your work? What role has music
played in influencing your work?
Knuth: Okay. Well, music is certainly one of my greatest loves. We were just talking a minute ago
about taste. I don’t like all kinds of music. Like everybody, I have certain music that really touches me
deeply, and other kinds that I’m not really enthused about. For example, I spent an hour and a half last
night playing through the score of South Pacific. More than half of the songs in there I find really
beautiful. On the other hand, if I were a professor of music, I would have to find a way to distance myself
from opera, because I’ve given opera a good shot many times, and I’ve seen excellent performances, but
it has never turned me on. So everybody has their taste. My own musical tastes are fairly eclectic. I
love jazz, I love to play things by Dave Brubeck, but other kinds of jazz don’t seem to work very well for
me. I like Beatles music. I don’t get too thrilled by some kinds of hip-hop and so on. Every
generation also has its own favorite kind of music. My taste must be partly due to the records that my
father played when I was growing up. Things like Brahms’ symphonies are deeply
satisfying to me now too, by their familiarity, by what I learned. My father was a musician. He was a
church organist, and a pretty good one. He played at the Chicago World’s Fair in the ’30s before he got
married. I started piano lessons probably when I was five years old. Throughout high school I was the
accompanist for the chorus, for the choir, and I played in the band. I wanted to play bassoon, but that
was taken, so I played tuba -- the sousaphone. Those were the two instruments that you didn’t have to
own yourself. The school owned the tuba and the bassoons, and our family was poor. We didn’t have
money to buy instruments. My dad earned enough money to buy a piano by teaching piano lessons
himself. I did then get into the band as well as the keyboard music. I took a year of organ lessons when
I was in high school, from my piano teacher. As I mentioned, I almost became a music major in college.
I went into physics, but if I had gone to _________ University, I would’ve been a music major there. I
started looking at arranging. I made arrangements for our school band. When I got into college I wrote
the music for a five-minute skit that our fraternity put on. It was called “Nebbishland.” That was when
nebbishes were popular in greeting cards. I don’t know if anybody in the future will know what nebbishes
were, but one of the lines in there was “We’re all on the verge of insanity.” It might bring back some
memories anyway. Nebbishes. “I’m a nebbish and a nebbish isn’t snobbish.” I’ll probably put this great
musical piece of mine into the final volume of my collected papers, which is going to be called “Selected
Papers on Fun and Games.” There’ll be a little bit of music in there that I did for fun over the years.
During the ’60s, at the church where I was going, I was a member of the choir. I had mentioned to the
choir director and organist that I had taken a year of organ lessons when I was younger. He knew that I
had some keyboard skills. If we needed a harpsichord accompaniment or something, I could help
out while he directed, or I could go to the console and play while he was directing. One
Saturday I got a phone call from his wife saying, “My husband has just come down with a detached retina
in his eye.” In those days, the only way to cure this was for him to sit still for six months with a pack
holding his head steady. She said, “Don, did you say that you knew a little bit about the organ? Can you
play on Sunday and be our temporary organist?” That’s what happened. For six months I was the
organist at our church in Pasadena. Fortunately Pasadena was the home of some of America’s best
organists. There was a famous teacher, Clarence Mader, and five of his students are still located in
the Pasadena area. If you look at the National Recitalists of the American Guild of Organists, five of them
are from Pasadena. There are others from the East and all around, but we had a very good concentration
of them. So I joined the American Guild of Organists and got to see some excellent musicians. At
that time I learned something about the literature of the organ. I thought, hey, it would be cool in the future
to be a college professor with an interest in organ. If I had 40 more years to look at this music,
there were some neat organ pieces so good that I could never get tired of them, and I could learn to
play them. When I had my year in Princeton between Caltech and Stanford, I took organ at the Westminster
Choir College. I had a teacher there and I had some other classes there at the college as part of my
year. My teacher, Mary Krimmel, taught me a lot about how to perform. Also, I had made some friends in
Pasadena who had an organ in their own house, and that seemed kind of interesting to me. My father
also had an organ, an electronic one, in our house in Milwaukee. When Jill
and I were planning our dream house to be built on the Stanford campus, we decided that we would have
two special rooms in the house. One was my music room, with space for
an organ, and one was her art room, a studio, where she’d have good lighting for working on her art
projects. We couldn’t afford to put in an organ at the beginning, but the architect made sure that there
would be enough bracing in the floor to handle several tons of weight and there was a nice 16-foot ceiling
so that we would have room for a good organ. I spent the next few years thinking about what kind of an
organ would be good to have in the home. Peter Naur in Copenhagen introduced me to five great organ
builders in Denmark. During the year that I spent in Norway, I also visited him for a week and talked to
some of the world’s greatest organ makers, whom he could introduce me to in Denmark. I found out,
though, that I couldn’t buy a Danish organ with any reasonable economic certainty, because in
Denmark they don’t give you a fixed price on it. The Danish labor contracts are tied to the rate
of inflation. I would have to give them a blank check and say, well, whatever it costs, I’ll pay.
What happened then is that I also talked to American organ builders. I found a very fine one whose shop
is near UCLA, and we hit it off very well. I started going around to all the organs in the Bay Area
and all the Stanford organs, listening to each pipe and each note and making notes, and then worked
with the builder, Pete Seeker [ph?], down in the Los Angeles area. They built an
organ for my house. It’s a nice company that builds about four organs a year, and
I still haven’t seen another house organ that I would rather have. It’s designed to
be enjoyed by the person playing it rather than by an audience, but it really has a lot of variety of
tone.
Feigenbaum: Why don't you continue with your discussion of the organ?
Knuth: My main hobby had then turned into a focus on organs, and this had lots of interesting
little side stories. I'll give you a few. In the first place, I'm making the arrangements for this organ in my
home just at the time when I'm finishing volume three. I have a few jokes in the indexes to my books.
Some of them haven't been discovered yet; people did find one of them in The TeXbook.
There's one that I think, if you look under "ten"… No, no, I'm sorry. If you look under, oh, what's her
name? She was the star of the movie "10." Oh, goodness, you know the movie I'm talking about?
Knuth: Anyway, she's this beautiful woman, and I put her name in the index of the book. If you look
there, it will list every place the number ten appears in The TeXbook; you can find it indexed that way.
In volume four I have a place in the index where it says "pun resisted." It refers you to a page, and you're
supposed to figure out what pun I could have made on that page but didn't. I have fun with
my indexes. I try to make them useful, but it takes me six weeks to write them, so I have to do something
to amuse myself during those six weeks. In volume three, if you look under "royalties, use of," you get to a
page that has a picture of organ pipes on it, because this is what allowed me to get an organ in my
house. In those days it cost $35,000, which is what other people on the block had paid for their whole house.
You can't believe it now, but that was true. That's one little story. That entry was actually put in before the
organ was built, but I had to sign a contract some years in advance. Through the years, then, the fact
that I can play organ has given me entry to lots of the great organs of the world. I don't have to be a great
organist, I just have to be pretty good for a computer scientist. Then the leading computer scientist, my
host wherever I am, Mexico City or Paris or wherever, will know somebody who knows an organist. Then
I get introduced, and they'll take me over there, and I get to play on the organ. I've played on the world's
best organs. I played on the largest organ too. I got to a point where I had given plenty of
lectures, and I couldn't accept any more invitations to travel to give a lecture. But a guy in Philadelphia
wrote to me, and he said, "Don, we really want you to lecture at Drexel University." He says, "Now about organs."
He said, "If you come I'll let you play on the Wanamaker organ, which was the largest musical
instrument in the world. Then we can go to Eaton Hall, and then we can go to Benjamin Franklin…this
old American organ," and so on. He arranged four great organs for me to play in the Philadelphia area,
so naturally I went to speak at Drexel University.
Knuth: The last time I was in Paris I got to play on a really great, unique organ. I had two hours to play
on it. When I went to Israel, I could play on the organ in the Mormon Center, and so on. This fall I'll be
playing in Bordeaux. I played an organ in Zurich last year, or a year and a half ago. I don't have to be a great
organist in order to have these opportunities. I just have to be a computer scientist who is not too bad. I
don't play in public very much. The one exception, really, was at the University of Waterloo about five
years ago. They have an organ professor there, and he and I put on a concert of organ duets -- music
written for two performers at one or two organs. I practiced with him several times for this. That was the
highlight of my organ playing, where we put this on. The music was, I guess, broadcast a couple of times
on Canadian radio as well. I got to work with a really fine organist. On my web page there's a reference
to the program that we played, some very interesting music.
Feigenbaum: Yeah, I was going to say -- if they broadcast it, you might have tapes, and you can put
them on the website.
Knuth: Yeah, I think I might have a tape in my collection somewhere. Good idea to try to put it on the
Internet.
Feigenbaum: Don, let me move on to something which is important in your life and which we all know
about, though the world didn't really know much about it until you published the book "3:16": namely,
your religious beliefs, and your studies of religion and religious thought. Do you want to say anything about
that?
Knuth: Well, yeah. This is the Computer History Museum, but it is part of my…
Knuth: That's right. The thing is, I think computer science is wonderful, but it's not everything.
Throughout my life I've been in a very loving religious community. My mother sang in the church choir
for 60 years, though that wasn't her career, and my father dedicated his life to being an
educator in the Lutheran school system. I went to Lutheran schools, including a Lutheran high school,
before I went to college. I come from a Midwestern Lutheran German background that has set the scene
for my life. Something I've come to appreciate is that Luther was a theologian who said you
don't have to close your mind. You keep questioning. You never know the answer. You don't just blindly
believe something. Also, he had ways of combining the intellectual with faith. That's
part of my background. I had a lot of exposure to it when I was young. I'm also a scientist. On Sundays I
would study with other people of our church on aspects of the Bible and other topics in religion. I got this
strange idea that maybe -- the Bible is a complicated subject -- maybe I could study the Bible the way a
scientist would do it, by using random sampling. Like a Gallup poll. You have a complicated thing and
you want to look at a small number of samples. You talk to 1000 people and you try to find out what the
sentiment is in the United States about something. I thought, well, what if I did this with the Bible? This
is a complicated book. There have been tens of thousands of books written about the Bible. Instead of
somebody else telling me what parts of the Bible to look at, what if I just chose parts that were selected in
an unbiased way? I was doing this also with other things. I wrote a paper… About this same time we
had this conference which was a pilgrimage to Khwarizm [now Khiva, in Uzbekistan] . The word
algorithm means “from Khwarizm”; it's an Arabic word. We went to Khwarizm, and I gave a talk there
trying to analyze what is the difference between a mathematician and a computer scientist. We did this
by looking at page 100 of many books. We sampled the works of mathematicians to find out, what do
mathematicians do? From an AI point of view we tried to say what would we have to program in the
computer in order to reproduce page 100 of these books. It's this idea of sampling. I was using it also for
grading term papers. A student gives me a 50-page paper. I don't have time to read 40 of these papers
and get my grades done. I only have a week to do the grading. So I would look at parts of the term
paper. The student wouldn't know which parts I was going to look at, but I would look at parts of them
and I would use that to assess the quality of the whole. I got in trouble with this Master's thesis about the
CS bookkeeping problem, that we talked about last week. But anyway, I'm using sampling. So I said, let
me do it to the Bible too. I wanted to have a rule. I would study this with my friends at our church in
Menlo Park. We would, as a group, discuss randomly chosen parts of the Bible. The rule I decided on
was we were going to study Chapter 3, Verse 16 of every book of the Bible. Genesis 3:16, Exodus 3:16,
and it ends with Revelation 3:16. The reason is that if any part of the Bible is known by its number, it's
John 3:16. It's a verse that people put up on Super Bowl Sunday, and it's supposed to
be a capsule summary of the gospel. A lot of people knew it; 3:16 was a catchy phrase in people's
minds. I said, "Okay, we all know John 3:16. Nobody knows what Genesis 3:16 is." Well, it turned out
very interesting. It's about women's liberation. Exodus 3:16, and so on. I mentioned Peter Wegner last
week. Well, his wife Judith Wegner is a great scholar of women's issues in the Hebrew scriptures. She
couldn't believe it, but three of the verses that I chose are key verses in her own studies. It just turned out
to be a really strange coincidence. Isaiah 3:16 talks about women strutting. Anyway, it's very funny. It was
just serendipity, but in fact there's a nice joke about it. Somebody called it a “cross section” of the
scriptures because of the cross in Christian theology. I did this with my friends in Menlo Park. Actually I
had announced that we were going to meet the next Sunday and we were going to study Genesis 3:16,
Exodus 3:16. Then I came down with an attack of kidney stones, and I was in the Stanford Hospital. We
couldn't meet for our first session of this group. But I looked and I was in hospital room 316. So I said,
"Whoa. Well, God wants me to continue with this project."
Knuth: So we went through, and the class grew in interest all the way through. It could have been a real
dud if all these had been really boring, but it didn't happen that way. Interest was sustained all the way through, and
people got inspired. Some of the women in the class were very good at calligraphy, and they would take
these verses and they would write them beautifully, and we'd put them up in front of us as we're studying
the things. I had this experience in the late '70s where sampling gave some insight into the complicated
thing called the Bible. All of a sudden I get this “Aha!” moment in the middle of the night after I met
Hermann Zapf and a whole bunch of other experts on letter forms. I'm working on the TeX project in the
early '80s and I said, "Boy, this class that we did on the 3:16 turned out to be really interesting for us. It
would be interesting to other people too. We could make it into a book. What if I asked Hermann if he
would do a few pages of a book for me like the women in the class had been doing?" He was sort of the
dean of all the calligraphers of the world. He's a god to the calligraphers, and he knows all the
calligraphers everywhere. I didn't really dream of asking him to do pages, but I asked him to do the cover. I
said, "Hermann, I got this strange idea for a book on 3:16. Can you make for me the most beautiful 3
that's ever been drawn in the history of mankind, and the most beautiful colon, and the most beautiful 1
and 6 to go with it, and make it for the cover of the book?" He sends me back a letter. He says, "Don, this
is wonderful," and he also gives me sketches of a couple other verses that he looked at in his German
Bible. He says, "Don, I know the best calligraphers in every country of the world. We could invite them,
each one, to do a page." So he's on the bandwagon. I got him, and everybody loves him. To make a
long story short, while I'm on my sabbatical year in Boston I'm also spending several hours at the Boston
Public Library looking up what all the greatest writers about the Bible have said through the centuries
about Genesis 3:16, et cetera, et cetera. I made my own translations of these verses. This ties in with
your question a minute ago about how do you know what you don't understand? I'm thinking if I really
want to understand Genesis 3:16 I shouldn't read somebody else's translation, but I should look at the
Hebrew words and how those words have been used in other parts of Genesis and so on; how other
people have translated these. I copied out 60 different translations of each verse, so I knew that in my
own translation every mistake I made had also been made by at least ten other people. Then I made my
translation, and I sent out a letter -- Hermann and I both signed it -- commissioning the best artists all over
the world to do these verses as a page in the book. While I'm in Boston these artworks started coming.
It's like getting a Christmas present every day, with these beautiful mailing tubes and all of that. I mean,
the calligraphers also write beautiful letters: "Dear Don" and things like this. There are many stories about
individual pages. That's why I know I can say that the graphic artists are the best people in the
world, because Jill and I have met most of these people in subsequent years. We didn't know very many
of them, of course, in the beginning. Then I had to write the actual text to go with it. I could go to Harvard
Theological Library, and the Boston Public Library, and I spent a few days at Yale Divinity School, and
the Graduate Theological Union at Berkeley has a great library. Here in Menlo Park we've got an
excellent library in St. Patrick's Seminary. And all the theological literature is well indexed, so that for,
say, Jonah 3:16, you can see what articles in the theological literature have been written that refer to
Jonah 3:16. So I'm not just having a cross-section of the Bible, but of all the secondary literature about the
Bible. There are all these tens of thousands of books. I can just look at a few parts of them that are
relevant to this thing, and I can crack open books that I never would have looked at before. For example,
John Calvin wrote 90 volumes about theology. But I'm a Lutheran. Why should I ever read any of these? But,
no. Now I look at a few pages of John Calvin. He wrote about Genesis 3:16. Okay, good. I find out he's
got some insights that none of the other people had. I get to appreciate John Calvin. I get to appreciate
St. Patrick. I get to appreciate people from the early days of Christianity, different people in the 17th century,
18th century, 19th century, 20th century, all the different streams: atheists, Jews. Not too many Muslims,
but there was some connection with India and so on that came up. It turned out to be really interesting.
This idea of sampling turned out to be a good time-efficient way to get into a complicated subject. The
result was that I actually got overconfident. I started to feel that I knew more about
the Bible than I actually had any right to, because I was studying less than 1/500th of the Bible. But
the thing is, people have this idea. There's a classical definition that a liberal education means you know
everything about something and something about everything. Now I had reached the point where there were a
few things that I knew everything about. I mean, I had 60 pegs of things that I had researched, and I had
found out just about everything that had been written about these small parts of the Bible; I had them
surrounded. There was nothing vague, so everything else in the Bible sort of could be tacked onto
something solid. It gave me more of a secure feeling that I understood Bible scholarship than I really
did. But it really shows that this methodology has a lot of merit as long as you don't bias it in a particular
way. It turned out to be an educational experience for me. I met these wonderful artists, and their work
was shown all around the world. It was supported by the National Endowment for the Arts, and it got into
many countries. It was shown in the Guinness Museum in Dublin, and great places like that. I saw
some of the work still on exhibit in San Francisco in February.
Knuth: The original deal was with the artists that they retained possession of the work, and I was paying
them some money for the reproduction rights. What eventually developed was that the collection was so
good, there was a strong feeling that it ought to be kept intact. So we wrote to the artists asking whether it
was okay if the San Francisco Public Library kept it in their Harrison Collection. They have the world's best
collection of calligraphy, and they would like to accession these works into their collection. The artists
agreed to this, so that's what happened.
Knuth: Yeah. So I had to come out of the closet, saying, "Oh, I'm going to write a book about the Bible."
Well, Isaac Asimov did this. I mentioned this to somebody for the first time when I was living in Boston
during my sabbatical year. There was an ACM SIGCSE convention there -- the computer science
education group -- and I mentioned that I was spending my time at the library looking up these Bible
verses. I thought they would say, "Oh, gosh, you're over the hill now, Don." But surprisingly, people
didn't really laugh at me to my face too much. It's something that I never would talk about in a Stanford
class, but this is a part of my life that is integrated with it.
Feigenbaum: I'd like to see if we can bring this full circle by getting into, finally, two aspects of your
career looking back a little bit and looking ahead a little bit. We know you, and the world knows you,
pretty much -- and you've said it yourself I think on the web somewhere -- that you are pretty much a lone
wolf. In fact, I think you even said it last week in this interview.
Feigenbaum: You operate by yourself. We all know that. We leave you alone. You've cut yourself off
from email. You work in your study for long hours. Two questions about that. Is that a myth? Because
you keep talking about all the places you've traveled to and all the people you see. And the other is, how
do you feel about working with collaborators?
Knuth: Okay. I think I mentioned last week that the trouble… I enjoy working with collaborators, but I
don't think they would enjoy working with me, because I'm very unreliable. I march to my own drummer
and I can't be counted on to meet deadlines because I always underestimate things. I'm not a great
co-worker. Also I'm very bad at delegating. That's why I resisted any opportunity to be the chair
of the department. I knew that I would just be awful at it. I'd have to do everything myself. I have no good way to
work with somebody else on tasks that I can do myself. I'm just unable to. It's a huge skill that I lack.
With the TeX project I think it was important, however, that I didn't delegate the writing of the code, as we
said before. I needed to be the programmer on a first-generation project. I needed to write a manual
too. Other people write manuals their own way, but I had to write a manual
myself. If I had delegated that, I wouldn't have realized that some parts were impossible to explain, and I changed
them as I wrote the manual, realizing that what was there was stupid. So I was the tech writer of
the project. I was a user of the project. I had to use TeX in order to typeset volume two. As I was
typesetting volume two, I kept track of the changes that I made to TeX along the way. It
turned out to be almost a perfect straight line: every four pages I typed, I got a new idea for how to improve TeX.
For the first 500 pages of that 700-page book, I got a new idea every four pages. The last 200 pages
were sheer boredom and I didn't do it. With 500 pages, if I hadn't been the user, I would not have had
such a good system. I had to live with it before I gave it to somebody else. These are cases in my life
where I think it's a good thing I didn't delegate. Then again, with the TeX project, once it was there, once
I had this prototype out there, we were getting more and more users. Then we would have, every
Friday, for two or three hours, a community meeting of several dozen people discussing questions,
issues, problems with TeX, how to make it better, how to adapt it to their problems. Anybody coming to
Stanford who knew about this could join our Friday sessions. Also there was quite a team of volunteers
associated with the project. There I was working together with the group, but I still insisted on being the
final filter on the stuff. Now, I should have mentioned something more about the giant component work that we did. I
said that Philippe Flajolet and Boris Pittel were involved with it. But what also happened is that I got it
to a point where I couldn't prove some of the main theorems. I met Svante Janson, one of the greatest
mathematicians, a Swedish guy. I was visiting Norway, so I went up to Uppsala to show him my work on
this. He got enthusiastic about it, and he saw how to get me to the next level on some things that I was
stumped on. Then he had a visitor from Poland who was there, so it turned out that our paper was
published under four authors. He was the leading author because of alphabetical order, and so that's a
joint work. Svante and Tomasz [Luczak] and myself and Boris all worked on the drafts of this. It's a giant
paper too. It filled a whole issue of a journal, I don't know, 120 pages or something like this. We went
through many, many drafts of this, all working together on it. So it's not that I can't ever work with
anyone. As for being in the same room with a person, I think I mentioned that I couldn't think when I was with
Marshall Hall, and so on.
Knuth: There was a guy in Princeton who was my office mate, and he and I were perfectly tuned to each
other. Ed Bender. Ed and I, I mean, he could start a sentence and then I would know. Then we would
work on a problem and I would take it as far as I could. Then I'd be stuck and then he would know how to
do the next thing. Then he would be stuck and then I could take it over. Once in a while you find
somebody where you can really do this online interaction, and the synchronization problem is nil, and it
works very well. But I found that actually terrifying, because it meant we had a responsibility to
invent science whenever we were together. I had already promised to do so many other things that if I got
more stimulation it would kill me. I have to finish The Art of Computer Programming, and all these other
things. So part of my being a loner is in order to fulfill the responsibilities that I have already
accumulated. And I know that I'm not that great at fitting in with somebody else's agenda; I've
got too strong opinions of my own as to what I have to do. On the other hand, I came to Stanford so that I
could collaborate, so that there would be a music department that I could be with. Caltech didn't have a
music department. At Stanford I could be the chair of a Ph.D. thesis committee, I could chair the oral
examinations of music students, and students of German, and things like this. I like to come to a place
where there are people who aren't clones of myself that I can learn from. Whenever I am stumped on
something, I can turn to them, and they can help me learn this. I need people to help me read German
and French and Japanese and Russian things, Sanskrit, and so on, when I get into a historical question
where I don't have a translation handy. It's very important to me to collaborate in that way with all kinds
of parts of the university, but not so often on a long-term project.
Feigenbaum: I think, in all of that, there's probably -- I don't want to go into it here because it's your
interview not mine -- but I think there are some deep issues there having to do with problem solving and
concentration. When you're into something really deep, like piecing together 14 different articles on one
subject to try to make sense of them, it's straining all the limited human information processing abilities,
and you really can't stand a lot of input. Too many symbols change context.
Knuth: Yeah, it's a bandwidth question. It's easier for my left brain to communicate with my right brain
than for me to communicate with another person's brain.
Feigenbaum: Don, there's a little last thing I want to talk about. This is a little bit of a paradox. Well not
so much; you said you like collaboration, so it's not really a paradox. But you say, I think somewhere on
your website or else in one of your publications, that you're predicting that the future of computer science
will be in terms of contributing pairs of people.
Knuth: Yeah.
Feigenbaum: One a computer scientist, and one somebody from another discipline. I'd like to have you
talk about that, particularly in view of the fact that you did string search algorithms, and yet you did not
collaborate with a biologist; those people have the most intensive string search problems that there are
today.
Knuth: Right. Well, to take them in last-in first-out order: the biologists didn't have those problems in
1972. The human genomes, now we've got all this data, but there wasn't such data then. Certainly if I
was doing the work now, it would be a different thing. But this pairwise thing is a notion that I have that
might be way off the wall. I didn't limit it to computer scientists and X. I was viewing it as a university as a
whole, including humanities, medicine, everything. I'm saying knowledge in the world is exploding, and
there are so many things now, that trying to look at the way a university might be 100 years from now
compared to the way universities have evolved up to this point, in the following way. Up until this point
we had subjects, and a person would identify themselves with what I call the vertices of a graph, where
one vertex would be mathematics. Another vertex would be biology. Another vertex would be computer
science, a new vertex on the block. There would be a physics vertex, and so on. Then, okay, there was
biophysics and things. There was English, and Spanish, Latin. But people identified themselves as
vertices, because these were the specialties. You could sort of live in that vertex, and you would be able
to understand most of the lectures that were given by your colleagues. The subjects were getting bigger
and bigger, but, still, we used to be able to have a computer science colloquium every week and
everybody would come and we would know. But knowledge is growing and growing to the point where
nobody can say they know all of mathematics, certainly. But there's also so much interdisciplinary
work now, where we see a mathematician can study the printing industry and see that some of the ideas
of dynamic programming apply to book publishing. Wow! There's interactions galore wherever you look.
You mentioned the electrical engineer who gets a Nobel Prize for medicine because he can do CT
scanning, or whatever. My model of the way the future might go is that people wouldn't identify
themselves with vertices, but rather with edges, with the connections between. Each person is a bridge.
Each person is a bridge between two other areas, and that they identify themselves by the two sub-
specialties that they happen to have a talent for. Then it's more of a network than a group of
departments. It doesn't mean that I'm a loner, but that I'm communicating with the other people who are
branches in the adjacent fields. This is the context in which that remark came up.
Knuth: …world. We're going to find that most of the people we talk to are people that have one foot in
the same place as we do.
Knuth: Yeah.
Feigenbaum: Don, thank you for sharing all of this with everyone. Not only everyone now, but everyone
50, 100, 200 years from now.
Knuth: Well, thank you for directing it all this way. I hope I can do half as well when I have to sit in your
shoes next time.
END OF INTERVIEW
Guest Editor: Peter Neumann
The Complexity of Songs
DONALD E. KNUTH
Every day brings new evidence that the concepts of computer science are applicable to areas of life which have little or nothing to do with computers. The purpose of this survey paper is to demonstrate that important aspects of popular songs are best understood in terms of modern complexity theory.
It is known [3] that almost all songs of length $n$ require a text of length $\sim n$. But this puts a considerable space requirement on one's memory if many songs are to be learned; hence, our ancient ancestors invented the concept of a refrain [14]. When the song has a refrain, its space complexity can be reduced to $cn$, where $c < 1$, as shown by the following lemma.
LEMMA 1.
Let $S$ be a song containing $m$ verses of length $V$ and a refrain of length $R$, where the refrain is to be sung first, last, and between adjacent verses. Then the space complexity of $S$ is $(V/(V+R))\,n + O(1)$ for fixed $V$ and $R$ as $m \to \infty$.
PROOF. The length of $S$ when sung is
$$n = R + (V+R)m \tag{1}$$
while its space complexity is
$$c = R + Vm. \tag{2}$$
By the Distributive Law and the Commutative Law [4],
$$\begin{aligned} c &= n - (V+R)m + mV \\ &= n - Vm - Rm + Vm \\ &= n - Rm. \end{aligned} \tag{3}$$
The lemma follows. □
(It is possible to generalize this lemma to the case of verses of differing lengths $V_1, V_2, \ldots, V_m$, provided that the sequence $(V_k)$ satisfies a certain smoothness condition. Details will appear in a future paper.)
A significant improvement on Lemma 1 was discovered in medieval European Jewish communities, where an anonymous composer was able to reduce the complexity to $O(\sqrt{n})$. His song "Ehad Mi Yode'a" or "Who Knows One?" is still traditionally sung near the end of the Passover ritual, reportedly in order to keep the children awake [6]. It consists of a refrain and 13 verses $v_1, \ldots, v_{13}$, where $v_k$ is followed by $v_{k-1} \ldots v_2 v_1$ before the refrain is repeated; hence $m$ verses of text lead to $\frac{1}{2}m^2 + O(m)$ verses of singing. A similar song called "Green Grow the Rushes O" or "The Dilly Song" is often sung in western Britain at Easter time [1], but it has only twelve verses (see [1], where Breton, Flemish, German, Greek, Medieval Latin, Moldavian, and Scottish versions are cited).
The coefficient of $\sqrt{n}$ was further improved by a Scottish farmer named O. MacDonald, whose construction¹ appears in Lemma 2.
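Lemma 1's final step is left implicit: from (1), $m = (n-R)/(V+R)$, and substituting this into $c = n - Rm$ from (3) gives $c = (Vn + R^2)/(V+R) = \frac{V}{V+R}\,n + O(1)$. The minimal Python sketch below checks that arithmetic and also counts the verses sung by a cumulative song of the "Ehad Mi Yode'a" kind. The function names and the sample lengths $V = 4$, $R = 3$ are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def refrain_song(m: int, V: int, R: int) -> tuple[int, int]:
    """Sung length n and space complexity c for the song of Lemma 1.

    m verses of length V; a refrain of length R is sung first, last,
    and between adjacent verses (so m + 1 times in all).
    """
    n = R + (V + R) * m  # equation (1)
    c = R + V * m        # equation (2): memorize one refrain and m verses
    return n, c


def lemma1_exact(n: int, V: int, R: int) -> Fraction:
    """c = (V*n + R*R)/(V + R), obtained from c = n - R*m and equation (1)."""
    return Fraction(V * n + R * R, V + R)


def cumulative_song(m: int) -> int:
    """Verses sung by a cumulative song such as "Ehad Mi Yode'a".

    Stanza k sings v_k, v_{k-1}, ..., v_1, i.e. k verses, so m verses of
    text produce 1 + 2 + ... + m = m(m + 1)/2 verses of singing.
    """
    return sum(range(1, m + 1))


if __name__ == "__main__":
    V, R = 4, 3  # illustrative lengths, not taken from any real song
    for m in (10, 100, 1000):
        n, c = refrain_song(m, V, R)
        assert c == n - R * m == lemma1_exact(n, V, R)
        # the ratio c/n approaches V/(V+R) as m grows
        print(m, n, c, float(Fraction(c, n)), V / (V + R))
    print(cumulative_song(13), cumulative_song(12))
```

Since memorizing $m$ verses yields about $m^2/2$ verses of singing, the text needed grows like $\sqrt{2n}$, which is the $O(\sqrt{n})$ behavior described above.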
The research reported here was supported in part by the National Institute of Wealth under grant $262,144.
¹ Actually MacDonald's priority has been disputed by some scholars; Peter Kennedy ([8], p. 676) claims that "I Bought Myself a Cock" and similar farmyard songs are actually much older.
©1984 ACM 0001-0782/84/0400-0344 75¢
…lish. It requires only $O(m)$ space to define $T_k$ for all $k < 10^m$, since we can define
$$T_{q \cdot 10^m + r} = T_q \text{ 'times 10 to the' } T_m \text{ 'plus' } T_r \tag{14}$$
for $1 \le q \le 9$ and $0 \le r < 10^m$. Therefore the songs $S_k$ defined by
$$S_0 = \epsilon, \qquad S_k = V_k S_{k-1} \quad \text{for } k \ge 1 \tag{15}$$
have length $n \asymp k \log k$, but the schema which defines them has length $O(\log k)$; the result follows. □
Theorem 1 was the best result known until recently³, perhaps because it satisfied all practical requirements for song generation with limited memory space. In fact, 99 bottles of beer usually seemed to be more than sufficient in most cases.
However, the advent of modern drugs has led to demands for still less memory, and the ultimate improvement of Theorem 1 has consequently just been announced:
THEOREM 2.
There exist arbitrarily long songs of complexity $O(1)$.
PROOF: (due to Casey and the Sunshine Band). Consider the songs $S_k$ defined by (15), but with
$$V_k = \text{'That's the way,'}\; U \;\text{'I like it,'}\; U \tag{16}$$
$$U = \text{'uh huh,' 'uh huh'}$$
for all $k$. □
It remains an open problem to study the complexity of nondeterministic songs.
³ The chief rival for this honor was "This old man, he played m, he played knick-knack...".
Acknowledgment. I wish to thank J. M. Knuth and J. S. Knuth for suggesting the topic of this paper.
REFERENCES
1. Rev. S. Baring-Gould, Rev. H. Fleetwood Sheppard, and F. W. Bussell, Songs of the West (London: Methuen, 1905), 23, 160-161.
2. Oscar Brand, Singing Holidays (New York: Alfred Knopf, 1957), 68-69.
3. G. J. Chaitin, "On the length of programs for computing finite binary sequences: Statistical considerations," J. ACM 16 (1969), 145-159.
4. G. Chrystal, Algebra, an Elementary Textbook (Edinburgh: Adam and Charles Black, 1886), Chapter 1.
5. A. Dörrer, Tiroler Fasnacht (Wien, 1949), 480 pp.
6. Encyclopedia Judaica (New York: Macmillan, 1971), v. 6 p. 503; The Jewish Encyclopedia (New York: Funk and Wagnalls, 1903); articles on Ehad Mi Yode'a.
7. U. Jack, "Logarithmic growth of verses," Acta Perdix 15 (1826), 1-65535.
8. Peter Kennedy, Folksongs of Britain and Ireland (New York: Schirmer, 1975), 824 pp.
9. Norman Lloyd, The New Golden Song Book (New York: Golden Press, 1955), 20-21.
10. N. Picker, "Once señores brincando al mismo tiempo," Acta Perdix 12 (1825), 1009.
11. ben shahn, a partridge in a pear tree (New York: the museum of modern art, 1949), 28 pp. (unnumbered).
12. Cecil J. Sharp, ed., One Hundred English Folksongs (Boston: Oliver Ditson, 1916), xlii.
13. Christopher J. Shaw, "that old favorite, A p i a p t / a Christmastime algorithm," with illustrations by Gene Oltan, Datamation 10, 12 (December 1964), 48-49. Reprinted in Jack Moshman, ed., Faith, Hope and Parity (Washington, D.C.: Thompson, 1966), 48-51.
14. Gustav Thurau, Beiträge zur Geschichte und Charakteristik des Refrains in der französischen Chanson (Weimar: Felber, 1899), 47 pp.
15. Marcel Vigneras, ed., Chansons de France (Boston: D.C. Heath, 1941), 52 pp.
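The naming schema (14) in the proof of Theorem 1 is itself a small recursive program. The Python sketch below implements it literally, always splitting $k = q \cdot 10^m + r$ with leading digit $q$ and remainder $r < 10^m$. The base names $T_0, \ldots, T_9$ are not shown in this excerpt, so using the English digit names here is an assumption, and the function name `T` is hypothetical. Ten fixed strings plus one rule name every nonnegative $k$, which illustrates how the defining schema can stay short while the names it produces keep growing.

```python
# Hypothetical base case: T_0..T_9 are the English digit names (an assumption;
# the paper's definition of the base names is not part of this excerpt).
DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def T(k: int) -> str:
    """Name of k built with schema (14):
    T_{q*10^m + r} = T_q 'times 10 to the' T_m 'plus' T_r,
    where 1 <= q <= 9 and 0 <= r < 10^m.
    """
    if k < 0:
        raise ValueError("k must be nonnegative")
    if k < 10:
        return DIGIT_NAMES[k]
    m = len(str(k)) - 1          # k has m + 1 decimal digits
    q, r = divmod(k, 10 ** m)    # leading digit q (1..9), remainder r < 10^m
    return f"{T(q)} times 10 to the {T(m)} plus {T(r)}"


if __name__ == "__main__":
    for k in (7, 12, 99, 100, 345):
        print(k, "->", T(k))
```

Following (14) literally, the rule always appends 'plus' $T_r$, even when $r = 0$.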
Twenty Questions for Donald Knuth
By Donald E. Knuth
May 20, 2014
Check informit.com/knuth throughout 2014 to purchase Vol 3-4A eBooks as they become
available. If you want email notifications, send an email to taocp@awl.com.
1. Jon Bentley, researcher: What a treat! The last time I had an opportunity like this was at
the end of your data structures class at Stanford in June, 1974. On the final day, you opened
the floor so that we could ask any question on any topic, barring only politics and religion. I still
vividly remember one question that was asked on that day: "Among all the programs you've
written, of which one are you most proud?"
Your answer (as I approximately recall it, four decades later) described a compiler that you
wrote for a minicomputer with 1024 available bytes of memory. Your first draft was 1029 bytes
long, but you eventually had it up and running and debugged at 1023 bytes. You said that you
were particularly proud of cramming so much functionality into so little memory.
My query today is a slight variant on that venerable question. Of all the programs that you've
written, what are some of which you are most proud, and why?
Don Knuth: I'd like to ask you the same! But that's something like asking parents to name their
favorite children.
1 of 13 01/07/2015 01:54 PM
Twenty Questions for Donald Knuth | | InformIT http://www.informit.com/articles/article.aspx?p=...
change the world, and because they led to many friendships. Furthermore they've made these
eBooks possible: I'm enormously happy that the work I did more than 30 years ago has
miraculously survived many changes of technology, and that the 3,000 pages of TAOCP now
look so great on a little tablet—even after zooming.
While I was preparing for Volume 4 of TAOCP in the 90s, I wrote several dozen short routines
using what you and I know as "literate programming." Those little essays have been packaged
into The Stanford GraphBase (1994), and I still enjoy using and modifying them. My favorite is
the implementation of Tarjan's beautiful algorithm for strong components, which appears on
pages 512–519 of that book.
I have to admit some pride also in the implementation of IEEE floating-point arithmetic that
appears in my book MMIXware (1999), as well as that book's metasimulator for MMIX, in which I
explain many principles of advanced pipelined computers from the ground up.
Literate programming continues to be one of the greatest joys of my life. In fact, I find myself
writing roughly two programs per week, on average, both large and small, as I draft new
material for the next volumes of TAOCP.
2. Dave Walden, Users Group: Might you publish the original 3,000-page version of
TAOCP (before the decision to change it into seven volumes), as a historical artifact of your
view of the state of the art of algorithms and their analysis circa 1965? I think lots of people
would like to see this.
Don Knuth: Scholars can look at the handwritten pages that led to Volumes 1–3 by going to
the Stanford Archives, and all of the remaining pages will be deposited there eventually. I see
little value in making those drafts more generally available—although some of the material
about baseball that I decided not to use is pretty cool. Archives from the real pioneers of
computer science, who wrote in the 40s and 50s, should be published first.
I do try to retain the youthful style of the original, in the pages that I write today, except where
my first draft was embarrassingly naïve or corny. I've also learned when to say "that" instead of
"which," thanks in part to Guy Steele's tutelage.
3. Charles Leiserson, MIT: TAOCP shows a great love for computer science, and in particular,
for algorithms and discrete mathematics. But love is not always easy. When writing this series,
when did you find yourself reaching deepest into your emotional reservoir to overcome a
difficult challenge to your vision?
Don Knuth: Again, Charles, I'd like to ask you exactly the same question!
For me, I guess, the hardest thing has always been to figure out what to cut. And I obviously
haven't been very successful at that, in spite of much rewriting.
The most difficult technical challenge was to write the metasimulator for MMIX. I needed to do
that behind the scenes, in order to shape what actually appears in the books, and it was surely
the toughest programming task that I've ever faced. Without the methodology of literate
programming, I don't think I could have finished that job successfully.
Many of the "starred" mathematical sections also stretched me pretty far. Overall, however,
after working on TAOCP for more than fifty years, I can't think of any aspect of the activity
where the effort of writing wasn't amply repaid by what I learned while doing it.
4. Dennis Shasha, NYU: How does a beautiful algorithm compare to a beautiful theorem? In
other words, what would be your criteria of beauty for each?
Don Knuth: Beauty has many aspects, of course, and is in the eye of the beholder. Some
theorems and algorithms are beautiful to me because they have many different applications;
some because they do powerful things with severely limited resources; some because they
involve aesthetically pleasing patterns; some because they have a poetic purity of concept.
For example, I mentioned Tarjan's algorithm for strong components. The data structures that he
devised for this problem fit together in an amazingly beautiful way, so that the quantities you
need to look at while exploring a directed graph are always magically at your fingertips. And his
algorithm also does topological sorting as a byproduct.
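To make the remark about Tarjan's algorithm concrete, here is a minimal recursive Python sketch of the standard formulation; it is not the CWEB code from The Stanford GraphBase. The `index` and `low` tables and the explicit stack are the quantities that stay at hand during the depth-first search, and components are completed sinks first, so reversing the output gives a topological order of the component graph. Representing the graph as a dict of successor lists is an assumption, and Python's recursion limit restricts this version to modest graphs.

```python
def tarjan_scc(graph: dict[str, list[str]]) -> list[list[str]]:
    """Strongly connected components of a directed graph.

    graph maps each vertex to the list of its successors.  Components are
    returned in the order the search completes them, which is a reverse
    topological order of the component graph (sink components first).
    """
    index: dict[str, int] = {}  # depth-first discovery number of each vertex
    low: dict[str, int] = {}    # smallest discovery number known reachable
    stack: list[str] = []       # visited vertices not yet in a finished component
    on_stack: set[str] = set()
    components: list[list[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:          # tree edge: explore w first
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:         # w is visited but not yet in a component
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.remove(w)
                component.append(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return components


if __name__ == "__main__":
    g = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"]}
    print(tarjan_scc(g))  # [['e', 'd'], ['c', 'b', 'a']]
```

On an acyclic graph every component is a single vertex, so `reversed(tarjan_scc(dag))` lists the vertices in topological order: that is the byproduct mentioned above.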
It's even possible sometimes to prove a beautiful theorem by exhibiting a beautiful algorithm.
Look, for instance, at Theorem 5.1.4D and/or Corollary 7H in TAOCP.
5. Mark Taub, Pearson: Does the emergence of "apps" (small, single-function, networked
programs) as the dominant programming paradigm today impact your plans in any way for
future material in TAOCP?
Don Knuth: People who write apps use the ideas and paradigms that are already present in
the first volumes. And apps make use of ever-growing program libraries, which are intimately
related to TAOCP. Users of those libraries ought to know something about what goes on inside.
Future volumes will probably be even more "app-likable," because I've been collecting tons of
fascinating games and puzzles that illustrate programming techniques in especially instructive
and appealing ways.
6. Radia Perlman, Intel: (1) What is not in the books that you wish you'd included? (2) If you'd
been born 200 years ago, what kind of career might you imagine you'd have had?
Don Knuth: (1) Essentially everything that I want to include is either already in the existing
volumes or planned for the future ones. Volume 4B will begin with a few dozen pages that
introduce certain newfangled mathematical techniques, which I didn't know about when I wrote
the corresponding parts of Volume 1. (Those pages are now viewable from my website in
beta-test form, under the name "mathematical preliminaries redux.") I plan to issue similar
gap-filling "fascicles" when future volumes need to refer to recently invented material that
ultimately belongs in Volume 3, say.
(2) Hey, what a fascinating question—I don't think anybody else has ever asked me that before!
If I'd been born in 1814, the truth is that I would almost certainly have had a very limited
education, coupled with hardly any access to knowledge. My own male ancestors from that era
were all employed as laborers, on farms that they didn't own, in what is now called northern
Germany.
But I suppose you have a different question in mind. What if I had been one of the few people
with a chance to get an advanced education, and who also had some flexibility to choose a
career?
All my life I've wanted to be a teacher. In fact, when I was in first grade, I wanted to teach first
grade; in second grade, I wanted to teach second; and so on. I ended up as a college teacher.
Thus I suppose that I'd have been a teacher, if possible.
To continue this speculation, I have to explain about being a geek. Fred Gruenberger told me
long ago that about 2% of all college students, in his experience, really resonated with
computers in the way that he and I did. That number stuck in my mind, and over the years I was
repeatedly able to confirm his empirical observations. For instance, I learned in 1977 that the
University of Illinois had 11,000 grad students, of whom 220 were CS majors!
Thus I came to believe that a small percentage of the world's population has somehow
acquired a peculiar way of thinking, which I happen to share, and that such people happened to
discover each other's existence after computer science had acquired its name.
For simplicity, let me say that people like me are "geeks," and that geeks comprise about 2% of
the world's population. I know of no explanation for the rapid rise of academic computer science
departments—which went from zero to one at virtually every college and university between
1965 and 1975—except that they provided a long-needed home where geeks could work
together. Similarly, I know of no good explanation for the failure of many unsuccessful software
projects that I've witnessed over the years, except for the hypothesis that they were not
entrusted to geeks.
So who were the geeks of the early 19th century? Beginning a little earlier than 1814, I'd maybe
like to start with Abel (1802); but he's been pretty much claimed by the mathematicians. Jacobi
(1804), Hamilton (1805), Kirkman (1806), De Morgan (1806), Liouville (1809), Kummer (1810),
and China's Li Shanlan (1811) are next; I'm listing "mathematicians" whose writings speak
rather directly to the geek in me. Then we get precisely to your time period, with Catalan (1814)
and Sylvester (1814), Boole (1815), Weierstraß (1815), and Borchardt (1817). I would have
enjoyed the company of all these people, and with luck I might have done similar things.
By the way, the first person in history whom I'd classify as "100% geek" was Alan Turing. Many
of his predecessors had strong symptoms of our disease, but he was totally infected.
7. Tony Gaddis, author: Do you remember a specific moment when you discovered the joy of
programming, and decided to make it your life's work?
Don Knuth: During the summer of 1957, between my freshman and sophomore years at Case
Tech in Cleveland, I was allowed to spend all night with an IBM 650, and I was totally hooked.
But there was no question of viewing that as a "life's work," because I knew of nobody with
such a career. Indeed, as mentioned above, my life's work was to be a teacher. I did write a
compiler manual in 1958, which by chance was actually used as the textbook for one of my
classes in 1959(!). Still, programming was for me primarily a hobby at first, after which it
became a way to support myself while in grad school.
8. Robert Sedgewick, Princeton: Don, I remember some years ago that you took the position
that you weren't trying to reach everyone with your books—knowing that they would be
particularly beneficial to people with a certain interest and aptitude who enjoy programming and
exploring its relationship to mathematics. But lately I've been wondering about your current
thoughts on this issue. It took a long time for society to realize the benefits of teaching
everyone to read; now the question before us is whether everyone should learn to program.
What do you think?
Don Knuth: I suppose all college professors think that their subject ought to be taught to
everybody in the world. In this regard I can't help quoting from a wonderful paper that John
Hammersley wrote in 1968:
Just for the fun of getting his reactions, I asked an eminent scholar of English Literature
what educational benefits might lie in the study of goliardic verse, Erse curses, and runic
erotica. 'A working background of goliardic verse would be more than helpful to anyone
hoping to have some modest facility in his own mother tongue', he declared; and with that
he warmed to his subject and to the poverties of unlettered science, so that it was some
minutes before I could steer him back to the Erse curses, about which he seemed a good
deal less enthusiastic. 'Really', he said, 'that sort of thing isn't my subject at all. Of course, I
applaud breadth of vocabulary; and you never know when some seemingly useless piece
of knowledge may not turn out to be of cardinal practical importance. I could certainly
envisage a situation in which they might come in very handy indeed'. 'And runic erotica?'
'Not extant'. (Was it only my fancy that heard a note of faint regret in his reply?) Certainly
the higher flights of scholarship can add savour; but does the man-in-the-street have the
time and the pertinacity and the intellectual digestion for them?
But your question asks about everybody. I still think many years will have to go by before I
would recommend that my own highly intelligent wife, son, and daughter should learn to
program, much less that everybody else I know should do so.
Nick Trefethen told me a few years back that he had just visited his son's high school in Oxford,
which is one of the best anywhere, and learned that not a single student knew how to program!
Britain is now beginning to change that, indeed at a more rapid pace than in America. Yet such
a revolution almost surely needs to take place over a generation or more. Where are the
teachers going to come from?
My own experience is with the subset of college students who are sufficiently interested in
programming that they expect it to become an integral part of their life. TAOCP is essentially for
specialists. I've primarily been writing it for geeks, not for a general audience, because
somebody has to write books that aren't for dummies. (By a "dummy" I mean a smart non-geek.
That's a much larger market, and very important; but it's not my target audience, and general
On the other hand, believe it or not, I try to explain everything in my books by imagining a
non-specialist reader. My goal is to be jargon-free whenever possible; I especially try to avoid
terms from higher mathematics that tend to frighten the programmer-on-the-street. Whenever
possible I try to translate results from the theoretical literature into a language that high-school
students could understand.
I know that my books still aren't terribly easy to fathom, even for geeks. But I could have made
them much, much harder.
9. Barbara Steele: What was the conversion process, and what tools did you use, to convert
your print books to eBooks?
Don Knuth: I knew that these volumes would not work especially well as eBooks unless they
were converted by experts. Fortunately I received some prize money in 2011, which could be
used to pay for professional help. Therefore I was able to achieve the kind of quality that I
envisioned, without delaying my work on future volumes, by letting the staff at Mathematical
Sciences Publishers in Berkeley (MSP) handle all of the difficult stuff.
My principal goal was to make the books easily searchable—and that's a much more
challenging problem than it seems, if you want to do it right. Secondarily, I wanted to let readers
easily click on the number of any exercise or equation or illustration or table or algorithm, etc.,
and to jump to that exercise; also to jump readily between an exercise and its answer.
The people at MSP wrote special software that converts my source text into suitable input
to other software that creates pdf files. I don't know the details, except that they use "change
files" analogous to those used in WEB and CWEB. I've checked the results pretty carefully, and I
couldn't be more pleased. Moreover, they've designed things so that it won't be hard for me to
make changes next year, as readers discover bugs in the present editions.
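For readers who have not met them, the "change files" of WEB and CWEB hold groups of the form `@x`, lines to match, `@y`, replacement lines, `@z`, and each group patches the master source in order without editing it. The Python sketch below is a simplified model of that idea, not MSP's software (whose details the answer says Knuth does not know). The function names are hypothetical, and matching is simplified to exact equality of the whole `@x` block.

```python
def parse_changes(change_text: str) -> list[tuple[list[str], list[str]]]:
    """Split a WEB-style change file into (old_lines, new_lines) pairs.

    Each change is '@x', lines to match, '@y', replacement lines, '@z'.
    Lines outside such groups are treated as comments and ignored.
    """
    changes = []
    state = None
    old: list[str] = []
    new: list[str] = []
    for line in change_text.splitlines():
        if state is None and line.startswith("@x"):
            state, old, new = "old", [], []
        elif state == "old" and line.startswith("@y"):
            state = "new"
        elif state == "new" and line.startswith("@z"):
            if not old:
                raise ValueError("a change must match at least one line")
            changes.append((old, new))
            state = None
        elif state == "old":
            old.append(line)
        elif state == "new":
            new.append(line)
    if state is not None:
        raise ValueError("unterminated change (missing @y or @z)")
    return changes


def apply_changes(master_text: str, change_text: str) -> str:
    """Apply changes in order; each must match later in the master file."""
    master = master_text.splitlines()
    out: list[str] = []
    i = 0
    for old, new in parse_changes(change_text):
        # Copy master lines until the whole 'old' block matches at position i.
        while i < len(master) and master[i:i + len(old)] != old:
            out.append(master[i])
            i += 1
        if i >= len(master):
            raise ValueError(f"change starting {old[0]!r} does not match")
        out.extend(new)
        i += len(old)
    out.extend(master[i:])
    return "\n".join(out)
```

Because the master file is never edited, the same source can be combined with different change files; that separation is one plausible reason such a design keeps later corrections manageable.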
(My style of writing tends to maximize the number of opportunities to make mistakes, hence I
would be fooling myself if I thought that the books were now perfect. Therefore it has always
been important to keep future errata in mind. The production staff at Addison-Wesley has been
consistently wonderful in the way they allow me to correct about fifty pages every year in each
volume.)
10. Silvio Levy, MSP: Could you comment on the differences between the print, pdf, ePUB,
etc., editions of TAOCP? What would you say is gained or lost with each?
Don Knuth: The printed versions weigh a lot more, but they don't need battery power or a
tether to electricity. They are always there; I don't have to turn them on, and I can have them all
open at once.
I can scribble in the margins (and elsewhere) of the print versions, and I can highlight text in
different colors. Ten years from now I expect analogous features will be commonly available for
eBooks.
I'm used to flipping pages and finding my way around a regular book, much more so than in an
eBook; but my grandchildren might have the opposite reaction.
The great advantage of an eBook is the reader's ability to search exhaustively. What fun it is to
look for all occurrences of a random word like 'game', or for a random word fragment like 'gam'
or 'ame', and find lots of cool material that I don't recall having written. The search feature on
these books works even better than I had a right to hope for.
The index in a printed book has the advantage of being more focused. But that index also
appears in the eBook, and in the eBook you can even click in the index to get to the cited
pages.
Today's eBook readers are often inconvenient for setting bookmarks and going back to where
you were a couple of minutes ago, especially after you click on an Internet link and then want to
go back to reading. But that software will surely improve, and so will today's electronic devices.
In the future I look forward to curated eBooks that have additional notes by experts—and
possibly even graffiti in the style of Concrete Mathematics—somewhat analogous to the
"director's comments" and other extras found on the DVDs for films. One could select different
subsets of these comments when reading.
11. Peter Gordon, Addison-Wesley (retired): If the full range of today's eBook features and
functionalities had been available when TAOCP was first published, would you have written
those volumes very differently?
Don Knuth: Well, I don't think I would have gotten very far at all. I would have had to think
about doing everything in color, and with interactive figures, tables, equations, and exercises. A
single person cannot use the "full range" of features that eBooks potentially have.
But by limiting myself to what can be presented well in black-and-white type, on printed pages
of a fixed size, I was fortunately able to complete 3,000 pages over a period of 50 years.
12. Udi Manber, Google: The early volumes of TAOCP established computer programming as
computer science. They introduced the necessary rigor. This was at the time when computers
were used mostly for numerical applications. Today, most applications are related to people
—social interaction, search, entertainment, and so on. Rigor is rarely used in the development
of these applications. Speed is not always the most important factor, and "correctness" is rarely
even defined. Do you have any advice on how to develop a new computer science that can
introduce rigor to these new applications?
Don Knuth: The numerical computations that were somewhat central when computer science
was born are by no means gone; they continue to grow, year by year. Of course, they now
represent a much smaller piece of the pie, but I don't believe in concentrating too much on the
big pieces.
As a user of products from Google and Adobe and other corporations, I know that a
tremendous amount of rigor goes into the manipulation of map data, transportation data, pixel
data, linguistic data, metadata, and so on. Furthermore, much of that processing is done with
distributed and decentralized algorithms that require more rigor than anybody ever thought of in
the 60s.
So I can't say that rigor has disappeared from the computer science scene. I do wish, however,
that Google's and Adobe's and Apple's programmers would learn rigorously how to keep their
systems from crashing my home computers, when I'm not using Linux.
In general I agree with you that there's no decrease in the need for rigor, rather an increase in
the number of kinds of rigor that are important. The fact that correctness can't be defined on the
"bottom line" should not lull people into thinking that there aren't intermediate levels within
every nontrivial system where correctness is crucial. Robustness and quality are compromised
by every weak link.
On the other hand, I certainly don't think that everything should be mathematized, nor that
everything that involves computers is properly a subdiscipline of computer science. Many parts
of important software systems do not require the special talents of geeks; quite the contrary.
Ideally, many disciplines collaborate, because a wide variety of orthogonal skill sets is a
principal reason why life is such a joy. Vive la différence.
Indeed, I myself follow the path of rigor only partway: Rarely do I ever give a formal proof that
any of my programs are correct, once I've constructed an informal proof that convinces me. I
have no real interest, for example, in defining exactly what it would mean for TeX to be correct,
or for verifying formally that my implementation of that 550-page program is free of bugs. I know
that anomalous results are possible when users try to specify pages that are a mile wide, or
constants that involve a trillion zeros, etc. I've taken care to avoid catastrophic crashes, but I
don't check every addition operation for possible overflow.
There's even a fundamental gap in the foundations of my main mathematical specialty, the
analysis of algorithms. Consider, for example, a computer program that sorts a list of numbers
into order. Thanks to the work of Floyd, Hoare, and others, we have formal definitions of
semantics, and tools by which we can verify that sorting is indeed always achieved. My job is to
go beyond correctness, to an analysis of such things as the program's running time: I write
down a recurrence, say, which is supposed to represent the average number of comparisons
made by that program on random input data. I'm 100% sure that my recurrence correctly
describes the program's performance, and all of my colleagues agree with me that the
recurrence is "obviously" valid. Yet I have no formal tools by which I can prove that my
recurrence is right. I don't really understand my reasoning processes at all! My student Lyle
Ramshaw began to create suitable foundations in his thesis (1979), but the problem seems
inherently difficult. Nevertheless, I don't lose any sleep over this situation.
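As a concrete illustration of the kind of recurrence described above, here is a minimal Python sketch. The sorting program is our choice, not Knuth's: a plain quicksort that takes the first key as its pivot. Its average number of comparisons on random input is usually written as C_n = n − 1 + (2/n)(C_0 + ⋯ + C_{n−1}). The script cannot prove that the recurrence describes the program, which is exactly the gap mentioned above. It only checks, for small n, that the recurrence agrees with an exhaustive average over all n! orderings.

```python
from fractions import Fraction
from itertools import permutations


def quicksort_comparisons(keys):
    """Comparisons made by a simple quicksort that takes the first key as pivot.

    Each non-pivot key is charged one comparison against the pivot; the two
    list comprehensions below stand in for a single partitioning pass.
    """
    if len(keys) <= 1:
        return 0
    pivot, rest = keys[0], keys[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x > pivot]
    return len(rest) + quicksort_comparisons(smaller) + quicksort_comparisons(larger)


def recurrence(n_max):
    """C_n = (n - 1) + (2/n) * (C_0 + ... + C_{n-1}), with C_0 = C_1 = 0."""
    c = [Fraction(0)] * (n_max + 1)
    for n in range(2, n_max + 1):
        c[n] = (n - 1) + Fraction(2, n) * sum(c[:n])
    return c


def brute_force_average(n):
    """Exact average over all n! orderings of n distinct keys."""
    perms = list(permutations(range(n)))
    total = sum(quicksort_comparisons(list(p)) for p in perms)
    return Fraction(total, len(perms))


if __name__ == "__main__":
    c = recurrence(7)
    for n in range(8):
        print(n, c[n], brute_force_average(n))
```

Agreement for small n is evidence, not proof; for example both sides give 8/3 when n = 3.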
13. Al Aho, Columbia: We all know that the Turing Machine is a universal model for
sequential computation.
But let's consider reactive distributed systems that maintain an ongoing interaction with their
environment—systems like the Internet, cloud computing, or even the human brain. Is there a
universal model of computation for these kinds of systems?
Don Knuth: I'm not strong on logic, so TAOCP treads lightly on this sort of thing. The TAOCP
model of computation, discussed on pages 4–8 of Volume 1, considers "reactive processes,"
a.k.a. "computational methods," which correspond to single processors. I've long planned to
discuss recursive coroutines and other cooperative processes in Chapter 8, after I finish
Chapter 7. The beautiful model of context-free parsing via semiautonomous agents, in Floyd's
great survey paper of 1964, has strongly influenced my thinking in this regard.
I'd like to see extensions of the set-theoretic model of computation at the beginning of Volume 1
to the things you mention. They might well shed light on the subject.
But fully distributed processes are well beyond the scope of my books and my own ability to
comprehend them. For a long time I've thought that an understanding of the way ant colonies
are able to perform incredibly organized tasks might well be the key to an understanding of
human cognition. Yet the ants that invade my house continually baffle me.
14. Guy Steele, Oracle Labs: Don, you and I are both interested in program analysis: What
can one know about an algorithm without actually executing it? Type theory and Hoare logic are
two formalisms for that sort of reasoning, and you have made great contributions to using
mathematical tools to analyze the execution time of algorithms. What do you think are
interesting currently open problems in program analysis?
Don Knuth: Guy, I'm sure you aren't really against the idea of program execution. You and I
both like to know things about programs and to execute them. Often the execution contradicts
our supposed knowledge.
The quest for better ways to verify programs is one of the famous grand challenges of computer
science. And as I said to Udi, I'm particularly rooting for better techniques that will avoid
crashes.
Just now I'm writing the part of Volume 4B that discusses algorithms for satisfiability, a problem
of great industrial importance. Almost nothing is known about why the heuristics in modern
solvers work as well as they do, or why they fail when they do. Most of the techniques that have
turned out to be important were originally introduced for the wrong reasons!
If I had my druthers, I wish people like you would put a lot of effort into a problem of which I've
only recently become aware: The programmers of today's multithreaded machines need new
kinds of tools that will make linked data structures much more cache-friendly. One can in many
cases start up auxiliary parallel threads whose sole purpose is to anticipate the memory
accesses that the main computational threads will soon be needing, and to preload such data
into the cache. However, the task of setting this up is much too daunting, at present, for an
ordinary programmer like me.
15. Robert Tarjan, Princeton: What do you see as the most promising directions for future
work in algorithm design and analysis? What interesting and important open problems do you
see?
Don Knuth: My current draft about satisfiability already mentions 25 research problems, most
of which are not yet well known to the theory community. Hence many of them might well be
answered before Volume 4B is ready. Open problems pop up everywhere and often. But your
question is, of course, really intended to be much more general.
In general I'm looking for more focus on algorithms that work fast with respect to problems
whose size, n, is feasible. Most of today's literature is devoted to algorithms that are
asymptotically great, but they are helpful only when n exceeds the size of the universe.
In one sense such literature makes my life easier, because I don't have to discuss those
methods in TAOCP. I'm emphatically not against pure research, which significantly sharpens
our abilities to deal with practical problems and which is interesting in its own right. So I
sometimes play asymptotic games. But I sure wouldn't mind seeing a lot more algorithms that I
could also use.
For instance, I've been reading about algorithms that decide whether or not a given graph G
belongs to a certain class. Is G, say, chordal? You and others discovered some great
algorithms for the chordality and minimum fillin problems, early on, and an enormous number of
extremely ingenious procedures have subsequently been developed for characterizing the
graphs of other classes. But I've been surprised to discover that very few of these newer
algorithms have actually been implemented. They exist only on paper, and often with details
only sketched.
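For readers who want something they can actually run, the sketch below tests chordality with maximum cardinality search (MCS), due to Tarjan and Yannakakis. It visits the vertices so that each new vertex has as many already-visited neighbours as possible. The graph is chordal exactly when, for every vertex, those previously visited neighbours form a clique. This is a quadratic-time version written for clarity, not the linear-time one, and the adjacency-set input format is an assumption.

```python
def is_chordal(adj):
    """Chordality test by maximum cardinality search (MCS).

    adj maps each vertex to the set of its neighbours (undirected graph).
    The reverse of the MCS visiting order is a perfect elimination ordering
    exactly when the graph is chordal, so we check that each vertex's
    previously visited neighbours are pairwise adjacent.
    """
    weight = {v: 0 for v in adj}   # number of already-visited neighbours
    visited = set()
    for _ in range(len(adj)):
        # pick an unvisited vertex with the most visited neighbours
        v = max((u for u in adj if u not in visited), key=lambda u: weight[u])
        earlier = [u for u in adj[v] if u in visited]
        for i, x in enumerate(earlier):
            for y in earlier[i + 1:]:
                if y not in adj[x]:
                    return False       # earlier neighbours are not a clique
        visited.add(v)
        for u in adj[v]:
            if u not in visited:
                weight[u] += 1
    return True


if __name__ == "__main__":
    square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
    chorded = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
    print(is_chordal(square), is_chordal(chorded))  # False True
```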
Two years ago I needed an algorithm to decide whether G is a so-called comparability graph,
and was disappointed by what had been published. I believe that all of the supposedly "most
efficient" algorithms for that problem are too complicated to be trustworthy, even if I had a year
to implement one of them.
Thus I think the present state of research in algorithm design misunderstands the true nature of
efficiency. The literature exhibits a dangerous trend in contemporary views of what deserves to
be published.
Another issue, when we come down to earth, is the efficiency of algorithms on real computers.
As part of the Stanford GraphBase project I implemented four algorithms to compute minimum
spanning trees of graphs, one of which was the very pretty method that you developed with
Cheriton and Karp. Although I was expecting your method to be the winner, because it
examines much of the data only half as often as the others, it actually came out two to three
times worse than Kruskal's venerable method. Part of the reason was poor cache interaction,
but the main cause was a large constant factor hidden by O notation.
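Kruskal's method, the baseline in that comparison, is short enough to show in full. The sketch below sorts the edges by weight and uses a union-find structure to skip any edge that would close a cycle. It is a generic textbook version in Python, not the Stanford GraphBase implementation, and the (weight, u, v) edge format is an assumption.

```python
def kruskal(n, edges):
    """Minimum spanning forest of an undirected graph on vertices 0..n-1.

    edges is a list of (weight, u, v) tuples. Returns (total_weight, tree_edges).
    """
    parent = list(range(n))

    def find(x):
        # follow parent links to the root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, tree = 0, []
    for w, u, v in sorted(edges):          # lightest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                       # u and v lie in different trees
            parent[ru] = rv
            total += w
            tree.append((u, v, w))
    return total, tree


if __name__ == "__main__":
    edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3), (5, 1, 3)]
    print(kruskal(4, edges))  # (6, [(0, 1, 1), (2, 3, 2), (1, 2, 3)])
```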
16. Frank Ruskey, University of Victoria: Could you comment on the importance of working
on unimportant problems? My sense is that computer science research, funding, and academic
hiring is becoming more and more focused on short-term problems that have at their heart an
economic motivation. Do you agree with this assessment, is it a bad trend, and do you see a
way to mitigate it?
Similarly, could you comment on the demise of the individual researcher? So many papers that
I see published these days have multiple authors. Five-author papers are routine. But when I
dig into the details it seems that often only one or two have contributed the fresh ideas; the
others are there because they are supervisors, or financial contributors, or whatever. I'm pretty
sure that Euler didn't publish any papers with five co-authors. What is the reason for this trend,
how does it interfere with trying to establish a history of ideas, and what can be done to reverse
it?
Don Knuth: I was afraid somebody was going to ask a question related to economics. I've
never understood anything about that subject. I don't know why people spend money to buy
things. I'm willing to believe that some economists have enough wisdom to keep the world
running some of the time, but their reasons are beyond me.
I just write books. I try to tell stories that seem to be important, at least for geeks. I've never
bothered to think about marketing, or about what might sell, except when my publishers ask me
to answer questions as I'm doing now!
Three years ago I published Selected Papers on Fun and Games, a 750-page book that is
entirely devoted to unimportant problems. In many ways the fact that I was able to live during a
time in the history of the world when such a book could be written has given me even more
satisfaction than I get when seeing the currently healthy state of TAOCP.
I've reached an age where I can fairly be described as a "grumpy old man," and perhaps that is
why I strongly share your concern for the alarming trends that you bring up. I'm profoundly
upset when people rate the quality of my work by measuring the extent to which it affects Wall
Street.
Regarding joint authorship, you are surely right about Euler in the 18th century. In fact I can't
think of any two-author papers in mathematics, until Hardy and Littlewood began working
together at the beginning of the 20th century.
In my own case, two of my earliest papers were joint because the other authors did the theory
and I wrote computer programs to validate it. Two other papers were related to the ALGOL
language, and done together with ACM committees. In a number of others, written while I was
at Caltech, I did the theory and my student co-authors wrote computer programs to validate it.
There was one paper with Mike Garey, Ron Graham, and David Johnson, in which they did the
theory and my role was to explain what they did. You and I wrote a joint paper in 2004, related
to recursive coroutines, in which we shared equally.
The phenomenon of hyperauthorship still hasn't infected computer science as much as it has
hit physics and biology, where I've read that Thomson-Reuters indexed more than 200 papers
having 1,000 authors or more, in a single recent year! When I cite a paper in TAOCP, I like to
mention all of the authors, and to give their full names in the index. That policy will become
impossible if CS publication practices follow in the footsteps of those fields.
Collaborative work is exhilarating, and it's wonderful when new results are obtained that
wouldn't have been discovered by individuals working alone. But as you say, authors should be
authors, not hangers-on.
You mention the history of ideas. To me the method of discovery tends to be more important
than the identification of the discoverers. Still, credit should be given where credit is due;
conversely, credit shouldn't be given where credit isn't due.
I suppose the multiple-author anomalies are largely due to poor policies related to financial
rewards. Unenlightened administrators seem to base salaries and promotions on publication
counts.
What can we do? As I say, I'm incompetent to deal with economics. I've gone through life
refusing to go along with a crowd, and bucking trends with which I disagree. I've often declined
to have my name added to a paper. But I suppose I've had a sheltered existence; young people
may be forced to bow to peer pressure.
17. Andrew Binstock, Dr. Dobb's: At the ACM Turing Centennial in 2012, you stated that you
were becoming convinced that P = NP. Would you be kind enough to explain your current
thinking on this question, how you came to it, and whether this growing conviction came as a
surprise to you?
Don Knuth: As you say, I've come to believe that P = NP, namely that there does exist an
integer M and an algorithm that will solve every n-bit problem belonging to the class NP in
n^M elementary steps.
Some of my reasoning is admittedly naïve: It's hard to believe that P ≠ NP and that so many
brilliant people have failed to discover why. On the other hand if you imagine a number M that's
finite but incredibly large—like say the number 10↑↑↑↑3 discussed in my paper on "coping
with finiteness"—then there's a humongous number of possible algorithms that do n^M bitwise or
addition or shift operations on n given bits, and it's really hard to believe that all of those
algorithms fail.
My main point, however, is that I don't believe that the equality P = NP will turn out to be helpful
even if it is proved, because such a proof will almost surely be nonconstructive. Although I think
M probably exists, I also think human beings will never know such a value. I even suspect that
nobody will even know an upper bound on M.
Mathematics is full of examples where something is proved to exist, yet the proof tells us
nothing about how to find it. Knowledge of the mere existence of an algorithm is completely
different from the knowledge of an actual algorithm.
For example, RSA cryptography relies on the fact that one party knows the factors of a number,
but the other party knows only that factors exist. Another example is that the game of N × N
Hex has a winning strategy for the first player, for all N. John Nash found a beautiful and
extremely simple proof of this theorem in 1952. But Wikipedia tells me that such a strategy is
still unknown when N = 9, despite many attempts. I can't believe anyone will ever know it when
N is 100.
More to the point, Robertson and Seymour have proved a famous theorem in graph theory: Any
class 𝒞 of graphs that is closed under taking minors has a finite number of minor-minimal
graphs. (A minor of a graph is any graph obtainable by deleting vertices, deleting edges, or
shrinking edges to a point. A minor-minimal graph H for 𝒞 is a graph whose smaller minors all
belong to 𝒞 although H itself doesn't.) Therefore there exists a polynomial-time algorithm to
decide whether or not a given graph G belongs to 𝒞: The algorithm checks that G doesn't contain
any of 𝒞's minor-minimal graphs as a minor.
But we don't know what that algorithm is, except for a few special classes 𝒞, because the set
of minor-minimal graphs is often unknown. The algorithm exists, but it's not known to be
discoverable in finite time.
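One of the few classes where this forbidden-minor algorithm is completely explicit is the class of forests. Deleting or contracting edges of a forest leaves a forest, and a graph has the triangle K3 as a minor exactly when it contains a cycle. So K3 is the only minor-minimal graph for this class, and the membership test reduces to cycle detection. The sketch below is our own example (the edge-list input is an assumption). It does that check with union-find.

```python
def has_k3_minor(n, edges):
    """True if the simple graph on vertices 0..n-1 has K3 as a minor.

    For simple graphs this is the same as containing a cycle, which
    union-find detects: an edge whose endpoints are already connected
    closes a cycle.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return True
        parent[ru] = rv
    return False


def is_forest(n, edges):
    # Forests form a minor-closed class whose only minor-minimal graph is K3.
    return not has_k3_minor(n, edges)


if __name__ == "__main__":
    print(is_forest(3, [(0, 1), (1, 2)]))                  # True
    print(is_forest(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # False
```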
This consequence of Robertson and Seymour's theorem definitely surprised me, when I
learned about it while reading a paper by Lovász. And it tipped the balance, in my mind, toward
the hypothesis that P = NP.
The moral is that people should distinguish between known (or knowable) polynomial-time
algorithms and arbitrary polynomial-time algorithms. People might never be able to implement a
polynomial-time-worst-case algorithm for satisfiability, even though P happens to equal NP.
Don Knuth: Besides economics, I was also afraid that somebody would ask me about the
future, because I'm a notoriously bad prophet. I'll take a shot at your question anyway.
Assuming 100 years of sustainable civilization, I'm fairly sure that a large percentage of
theorems (maybe even 38.1966%) will be discovered with computer aid, and that a nontrivial
percentage (maybe 0.7297%) will have computer-verified proofs that cannot be understood by
mortals.
A few months ago, however, I tried unsuccessfully to do a similar thing. I had a 5,000-step
mechanically discovered proof that the edges of a smallish flower snark graph cannot be
3-colored, and I wanted to psych out how the machine had come up with it. Although I gave up
after a couple of days, I do think it would be possible to devise new tools for the study of
computer proofs in order to identify the "aha moments" therein.
hours, but there still were 20 million steps in the proof. I see no way at present for human
beings to understand more than the first few thousand of those steps.
19. Scott Aaronson, MIT: Would you recommend to other scientists to abandon the use of
email, as you have done?
Don Knuth: My own situation is unusual, because I do my best work when I'm not interrupted. I
eat, sleep, and write content, more-or-less as a recluse who spends considerable time reading
archives and other people's code. As I say on my home page, most people need to keep on top
of things, but my role is to get to the bottom of things.
So I don't recommend a no-email policy to people who thrive on communication. And I actually
take advantage of others in this respect (either shamelessly or shamefully, I'm not sure which),
by pestering them with random questions, even though I don't want anybody to pester
me—except about the one topic that I happen to be zooming in on at any particular time.
I do welcome email that reports bugs in TAOCP, because I always try to correct them as soon
as possible.
Other unsolicited messages go to the bit bucket in the sky, otherwise known as /dev/null.
20. J. H. Quick, blogger: Why is this multi-interview called "twenty questions," when only 19
questions were asked?
Incidentally, the eVolumes of TAOCP contain some 4,500 questions, and almost as many
answers.
INFORMATION AND CONTROL 8, 607-639 (1965)
610 KNUTH
is not of bounded right context, since the handle in both acⁿd and bcⁿd
is "d"; yet this grammar is certainly LR(0). A more interesting example is
Here the terminal strings are {aⁿbcⁿ}, and the b must be reduced to S
or A according as n is even or odd. This is another LR(0) grammar
which fails to be of bounded right context.
In Section III we will give further examples and will discuss the
relevance of these concepts to the grammar for ALGOL 60. Section IV
contains a proof that the existence of k, such that a given grammar is
LR(k), is recursively undecidable.
Ginsburg and Greibach (1965) have defined the notion of a deterministic
language; we show in Section V that these are precisely the
languages for which there exists an LR(k) grammar, and thereby we
obtain a number of interesting consequences.
II. ANALYSIS OF LR(k) GRAMMARS
Given a grammar 𝒢 and an integer k ≥ 0, we will now give two ways
to test whether 𝒢 is LR(k) or not. We may assume as usual that 𝒢 does
not contain useless productions, i.e., for any A in I there are terminal
strings α, β, γ such that S ⇒* αAγ ⇒* αβγ.
The first method of testing is to construct another grammar 𝒢′ which
reflects all possible configurations of a handle and k characters to its
right. The intermediate symbols of 𝒢′ will be [A; α], where α is a k-letter
string on T ∪ {⊣}; and also [p], where p is the number of a production in
𝒢. The terminal symbols of 𝒢′ will be I ∪ T ∪ {⊣}.
For convenience we define H_k(α) to be the set of all k-letter strings β
over T ∪ {⊣} such that α ⇒* βγ with respect to 𝒢 for some γ; this is
the set of all possible initial strings of length k derivable from α.
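The sets H_k can be computed by a routine fixed-point iteration. The sketch below is one way to do it in Python. The grammar encoding (a dict from each intermediate symbol to a list of right-hand sides, with any other symbol treated as terminal) and the example grammar are assumptions made for illustration. Each intermediate symbol accumulates the length-k prefixes of the terminal strings it derives, keeping shorter strings when a derivation ends early. H_k(α) then keeps only results of full length k, which is why α is normally padded with ⊣ᵏ.

```python
END = "⊣"  # end-of-input marker


def concat_k(seq, first, k):
    """Length-k truncations of the terminal strings derivable from seq.

    Strings shorter than k are complete derivations; first maps each
    intermediate symbol to its current set of such truncations.
    """
    result = {()}
    for sym in seq:
        options = first[sym] if sym in first else {(sym,)}
        result = {(u + v)[:k] for u in result for v in options}
    return result


def first_k(grammar, k):
    """Fixed-point computation of the truncated strings for each symbol."""
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                new = concat_k(rhs, first, k) - first[lhs]
                if new:
                    first[lhs] |= new
                    changed = True
    return first


def h_k(grammar, alpha, k):
    """H_k(alpha): the k-letter strings beta such that alpha =>* beta gamma."""
    first = first_k(grammar, k)
    return {w for w in concat_k(alpha, first, k) if len(w) == k}


# Hypothetical example grammar: E -> E + T | T,  T -> a
EXAMPLE = {"E": [("E", "+", "T"), ("T",)], "T": [("a",)]}

if __name__ == "__main__":
    print(h_k(EXAMPLE, ("E", END, END), 2))
```

With this toy grammar, H_2(E⊣⊣) comes out as the two strings a⊣ and a+, returned as tuples.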
Let the pth production of 𝒢 be
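$$ [S; \dashv^k] \Rightarrow^* \theta\,[p] \ \text{and}\ [S; \dashv^k] \Rightarrow^* \theta\varphi\,[q] \ \text{implies}\ \varphi = \varepsilon \ \text{and}\ p = q. \tag{15} $$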
But 𝒢′ is a regular grammar, and well-known methods exist for testing
Condition (15) in regular grammars. (Basically one first transforms 𝒢′
so that all of its productions have the form Qᵢ → aQⱼ, and then if Q₀ =
[S; ⊣ᵏ], one can systematically prepare a list of all pairs (i, j) such that
there exists a string α for which Q₀ ⇒* αQᵢ and Q₀ ⇒* αQⱼ.)
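The list of pairs (i, j) described in the parenthesis amounts to a reachability search over pairs of states that read the same symbols. A minimal sketch, assuming the transformed grammar is given as a dict mapping each state Qᵢ to its productions Qᵢ → aQⱼ written as (a, Qⱼ) tuples. The state names in the example are hypothetical.

```python
from collections import deque


def same_string_pairs(transitions, q0):
    """All pairs (i, j) such that some string alpha gives both
    Q0 =>* alpha Q_i and Q0 =>* alpha Q_j.

    transitions maps a state Q_i to a list of (a, Q_j) tuples, one per
    production Q_i -> a Q_j. The empty string yields the pair (Q0, Q0).
    """
    start = (q0, q0)
    seen = {start}
    queue = deque([start])
    while queue:
        i, j = queue.popleft()
        for a, i_next in transitions.get(i, ()):
            for b, j_next in transitions.get(j, ()):
                # both derivations must consume the same symbol
                if a == b and (i_next, j_next) not in seen:
                    seen.add((i_next, j_next))
                    queue.append((i_next, j_next))
    return seen


if __name__ == "__main__":
    # hypothetical states and productions
    t = {"Q0": [("x", "Q1"), ("x", "Q2")], "Q1": [("y", "Q3")], "Q2": [("z", "Q3")]}
    print(sorted(same_string_pairs(t, "Q0")))
```

A breadth-first search is used only for definiteness; any graph search over the pair states produces the same list.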
When k = 2, the grammar 𝒢′ corresponding to (2) is
TRANSLATION FROM LEFT TO RIGHT 615
(We thus have added to 𝒮 all productions we might begin to work on,
in addition to those we are already working on.)
Compute 𝒮′ by Eq. (18) and then compute the new set 𝒮ₙ₊₁ as follows:
properly take care of the most general case, this method is necessarily
complicated, for all of the relevant information must be saved. The
structure of this general method should shed some light on the important
special cases which arise when the LR(k) grammar is of a simpler
type.
We will not give a formal proof that this parsing method works, since
the reader may easily verify that each step preserves the assertions we
made about the state sets and the stack. The construction of all possible
state sets that can arise will terminate since there are finitely many of
these. The grammar will be LR(k) unless the Z sets of Eqs. (19)-(20)
are not disjoint for some possible state set. The parsing method just
described will terminate since any string in the language has a finite derivation,
and each execution of Step 2 either finds a step in the derivation
or reduces the length of the string not yet examined.
III. EXAMPLES
Now let us give three examples of applications to some nontrivial
languages. Consider first the grammar
In Table I, the symbol 21⊣ stands for the state [2, 1; ⊣], and 41ab
stands for two states [4, 1; a] and [4, 1; b]. "Shift" means "perform the
shift left operation" mentioned in Step 2; "reduce p" means "perform
the transformation (21) with production p." The first lines of Table I
TABLE I
PARSING METHOD FOR GRAMMAR (26)
01⊣     ⊣       stop
22⊣     ⊣       reduce 2
43ab    a, b    reduce 4
63⊣     ⊣       reduce 6
84ab    a, b    reduce 8
are formed as follows: Given the initial state 𝒮 = {00⊣}, we must form
𝒮′ according to Eq. (18). Since X₀₁ = B and X₀₂ = ⊣, we must include
10⊣ and 20⊣ in 𝒮′. Since X₂₁ = L and X₂₂ = R, we must include 30ab and
40ab in 𝒮′ (a and b being the possible initial characters of R⊣). Since
X₄₁ = L and X₄₂ = N, we must, similarly, include 30ab and 40ab in 𝒮′;
but these have already been included, and so 𝒮′ is completely
determined. Now Z = {a} in this case, so the only possibility in Step 2 is to
have Y₁ = a and shift. Step 3 is more interesting; if we ever get to
Step 3 with 𝒮ₙ = 𝒮 (this includes later events when a reduction (21) has
been performed), there are three possibilities for Xₙ₊₁. These are
determined by the seven states in 𝒮′, and the righthand column is merely
an application of Eq. (23).
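The step that builds 𝒮′ is what is now usually called a closure. Eq. (18) and grammar (26) are not reproduced in this excerpt, so the sketch below follows only the verbal description above, with k = 1 by default and a small hypothetical expression grammar in place of (26). Whenever a state [p, j; α] has an intermediate symbol as its next symbol X_{p,j+1}, every production q for that symbol is added as [q, 0; β] for each β in H_k of the rest of production p followed by α.

```python
END = "⊣"  # end marker, standing in for the paper's ⊣


def truncated(seq, first, k):
    """Length-k truncations of the terminal strings derivable from seq."""
    result = {()}
    for sym in seq:
        options = first[sym] if sym in first else {(sym,)}
        result = {(u + v)[:k] for u in result for v in options}
    return result


def first_sets(prods, k):
    """Fixed point: truncated terminal strings derivable from each nonterminal."""
    first = {lhs: set() for lhs, _ in prods}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in prods:
            new = truncated(rhs, first, k) - first[lhs]
            if new:
                first[lhs] |= new
                changed = True
    return first


def closure(prods, states, k=1):
    """Add states [q, 0; beta] until nothing new appears.

    prods is a list of (lhs, rhs) pairs indexed by production number p;
    a state [p, j; alpha] is the tuple (p, j, alpha), alpha being a k-tuple.
    """
    first = first_sets(prods, k)
    result = set(states)
    work = list(states)
    while work:
        p, j, alpha = work.pop()
        rhs = prods[p][1]
        if j < len(rhs) and rhs[j] in first:   # X_{p,j+1} is intermediate
            follow = truncated(rhs[j + 1:] + alpha, first, k)
            lookaheads = {w for w in follow if len(w) == k}
            for q, (lhs, _) in enumerate(prods):
                if lhs == rhs[j]:
                    for beta in lookaheads:
                        state = (q, 0, beta)
                        if state not in result:
                            result.add(state)
                            work.append(state)
    return result


# Hypothetical grammar, not grammar (26): production 0 is S -> E ⊣.
PRODS = [
    ("S", ("E", END)),        # 0
    ("E", ("E", "+", "T")),   # 1
    ("E", ("T",)),            # 2
    ("T", ("a",)),            # 3
]

if __name__ == "__main__":
    for state in sorted(closure(PRODS, {(0, 0, (END,))})):
        print(state)
```

For this toy grammar the closure of {00⊣} happens to contain seven states: 00⊣, 10⊣, 20⊣, 10+, 20+, 30⊣ and 30+.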
An important shortcut has been taken in Table I. Although it is
possible to go into the state set "51⊣ 71b", we have no entry for that
set; this happens because 51⊣ 71b is contained in 51⊣ 71ab. A procedure
for a given state set must be valid for any of its subsets. (This implies less
error detection in Step 2, but we will soon justify that.) It is often
possible to take the union of several state sets for which the parsing
action does not conflict, thereby considerably shortening the parsing
algorithm generated by the construction of Section II.
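The union shortcut amounts to merging the Step 2 actions of several state sets whenever no lookahead string calls for two different actions. A minimal sketch, assuming each state set's actions are stored as a dict from lookahead to action. The specific actions in the example are invented for illustration.

```python
def merge_state_sets(*tables):
    """Merge the Step 2 actions of several state sets.

    Each table maps a lookahead string to an action such as "shift" or
    ("reduce", p). Returns the combined table, or None if two tables
    prescribe different actions for the same lookahead.
    """
    merged = {}
    for table in tables:
        for lookahead, action in table.items():
            if merged.setdefault(lookahead, action) != action:
                return None      # conflicting actions: the sets cannot be merged
    return merged


if __name__ == "__main__":
    # invented actions for two compatible state sets
    print(merge_state_sets({"a": ("reduce", 7)}, {"a": ("reduce", 7), "b": ("reduce", 7)}))
```

Because the merged table agrees with each original table wherever that table is defined, it remains valid for every state set that went into the union, matching the subset remark above.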
When only one possibility occurs in Step 2 there is no need to test
the validity of Y₁ ⋯ Yₖ; for example in Table I line 1 there is no need
to make sure Y₁ = a. One need do no error detection until an attempt
to shift Y₁ = ⊣ left of the vertical line occurs. At this point the stack
will contain "𝒮₀ S 𝒮₁ | ⊣ᵏ" if and only if the input string was well-formed;
for we know a well-formed string will be parsed, and (by definition!)
a malformed string cannot possibly be reduced to "S⊣ᵏ" by
applying the productions in reverse. Thus, any or all error detection
may be saved until the end. (When k = 0, ⊣ must be appended at the
right in order to do this delayed error check.)
One could hardly write a paper about parsing without considering the
traditional example of arithmetic expressions. The following grammar is
typical:
This grammar has the terminal alphabet {a, −, *, (, ), ⊣}; for example,
the string "a − (−a * a − a) ⊣" belongs to the language. Table II shows
how our construction would produce a parsing method. In line 10, the
notation "4, 5, 6" appearing in the X column means rules 4, 5, and 6
apply to this state set also. Such "factoring" of rules is another way to
simplify the parsing routine produced by our construction, and the
reader will undoubtedly see other ways to simplify Table II.
By means of our construction it is possible to determine exactly what
information about the string being parsed is known at any given time.
Because of this detailed knowledge, it will be possible to study how much
of the information is not really essential (i.e., how much is redundant)
and thereby determine the "best possible" parsing method for a grammar,
in some sense. The two simplifications already mentioned (delayed
error checking, taking unions of compatible state sets) are simplifications
of this kind, and more study is needed to analyze this problem further.
In many cases it will not be necessary to store the state sets 𝒮ᵢ in the
stack, since the states 𝒮ᵣ which are used in the latter part of Step 2 can
often be determined by examining a few of the X's at the top of the
stack. Indeed, this will always be true if we have a bounded right context
grammar, as defined in Section I. Both grammars (26) and (27)
are of bounded context.
From Table I we can see how to recover the necessary state set information
without storing it in the stack. We need only consider those
state sets which have at least one intermediate character in the "Xₙ₊₁"
column, for otherwise the state set is never used by the parser. Then it is
immediately clear from Table I that {00⊣} is always at the bottom of
the stack, {21⊣, 41ab} is always to the right of L, {61⊣, 81ab} is always
to the right of b, and {62⊣, 82ab} is always to the right of N.
Grammar (27) is related to the definition of arithmetic expressions in
the ALGOL 60 language, and it is natural to ask whether ALGOL 60 is
an LR(k) language. The answer is a little difficult because the definition
of this language (see Naur (1963)) is not done completely in terms of
productions; there are "comment conventions" and occasional informal
explanations. The grammar cannot be LR(k) because it has a number
of syntactic ambiguities; for example, we have the production
⟨open string⟩ → ⟨open string⟩ ⟨open string⟩
which is always ambiguous. Another type of ambiguity arises in the
parsing of ⟨identifier⟩ as ⟨actual parameter⟩.
TABLE II
PARSING METHOD FOR GRAMMAR (27)
There are eight ways to do this:
(actual parameter} --~ (array identifier} --~ (identifier}
(actual parameter --~ (switch identifier} --~ (identifier)
(actual parameter --* (procedure identifier} --* (identifier}
(actual parameter -+ (expression} --~ (designational expression}
(identifier}
(actual parameter (expression} --~ (Boolean expression}
(variable} ~ (identifier}
(actual parameter --~ (expression} --~ (Boolean expression}
(function designator) ~ (identifier}
: (actual parameter --~ (expression} --~ (arithmetic expression}
(variable} ~ (identifier}
(actual parameter} --* (expression} --+ (arithmetic expression}
(function designator) ~ (identifier}
These syntactic ambiguities reflect bona fide semantic ambiguities,
if the identifier in question is a formal parameter to a procedure, for it is
then impossible to determine what sort of identifier will be the actual
argument in the absence of specifications. At the time the ALGOL 60
report was written, of course, the whole question of syntactic ambiguity
was just emerging, and the authors of that document naturally made
little attempt to avoid such ambiguities. In fact, the differentiation
between array identifiers, switch identifiers, etc. in this example was done
intentionally, to provide explanation along with the syntax (referring
to identifiers which have been declared in a certain way). In view of this,
a ninth alternative
⟨actual parameter⟩ → ⟨string⟩ → ⟨formal parameter⟩ → ⟨identifier⟩
might also have been included in the ALGOL 60 syntax (since section
4.7.5.1 specifically allows formal parameters whose actual parameter is a
string to be used as actual parameters, and this event is not reflected in
any of the eight possibilities above). The omission of this ninth alternative
is significant, since it indicates the philosophy of the ALGOL 60 report
the Tag problem (see Cocke and Minsky (1964)) but no apparent simple
connection. We can, however, prove that the partial correspondence
problem is recursively unsolvable, using methods analogous to those
devised by Floyd (1964b) for dealing with the ordinary correspondence
problem and using the determinacy of Turing machines.
For this purpose, let us use the definition and notation for Turing machines
as given in Post (1947); we will construct a partial correspondence
problem for any Turing machine and any initial configuration. The
characters used in our partial correspondence problem will be
$$ q_i,\ \bar q_i,\ S_j,\ \bar S_j,\ h,\ \bar h, \qquad 1 \le i \le R,\ 0 \le j \le m. $$
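If the initial configuration is
$$ S_{j_1} S_{j_2} \cdots S_{j_{k-1}}\, q_i\, S_{j_k} \cdots S_{j_l}, $$
the pair of strings
$$ (\bar h,\ \bar h\, h\, S_{j_1} \cdots S_{j_{k-1}}\, q_i\, S_{j_k} \cdots S_{j_l}\, h) \tag{28} $$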
will enter into our partial correspondence problem. We also add the
pairs
$$ (\bar h, h),\ (h, \bar h),\ (S_j, \bar S_j),\ (\bar S_j, S_j),\ (\bar q_i, q_i), \qquad 1 \le i \le R,\ 0 \le j \le m. \tag{29} $$
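Finally, we give pairs determined by the quadruples of the Turing machine:
$$
\begin{array}{ll}
\text{Form of quadruple} & \text{Corresponding pairs, } 0 \le t \le m \\
q_i S_j L q_l & (h\,q_i S_j,\ \bar h\,\bar q_l \bar S_0 \bar S_j),\ (S_t q_i S_j,\ \bar q_l \bar S_t \bar S_j) \\
q_i S_j R q_l & (q_i S_j h,\ \bar S_j \bar q_l \bar S_0 \bar h),\ (q_i S_j S_t,\ \bar S_j \bar q_l \bar S_t) \\
q_i S_j S_k q_l & (q_i S_j,\ \bar q_l \bar S_k)
\end{array}
\tag{30}
$$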
Now it is easy to see that these corresponding pairs will simulate the
behavior of the Turing machine. Since the pair (28) is the only pair
having the same initial character, and since the pairs in (30) are the
only ones involving any $q_i$ in the lefthand string, the only possible
strings which can be initial substrings of both $\alpha_{i_1}\alpha_{i_2}\cdots$ and
$\beta_{i_1}\beta_{i_2}\cdots$ are initial substrings of
$$ \bar h\, \alpha_0\, \bar\alpha_1\, \alpha_2\, \bar\alpha_3\, \alpha_4 \cdots, \tag{31} $$
where $\alpha_0$, $\alpha_1$, $\alpha_2$, etc. represent the successive stages of the Turing
machine's tape (with h's placed at either end, and where $\bar\alpha$ is an obvious
notation signifying the "barring" of each letter of $\alpha$). For these pairs,
therefore, the partial correspondence problem has an affirmative answer if
and only if the Turing machine never halts. And the problem of telling if a
Turing machine will ever halt is, of course, well known to be recursively
unsolvable.
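For any fixed p, the question "do some index sequences give α- and β-strings whose first p characters agree?" is decidable. Only finitely many pairs of prefixes of length at most p can arise. What the construction above shows to be unsolvable is whether the answer is yes for every p. Below is a minimal Python sketch of the fixed-p search. It assumes the problem as it is used here: a list of (α, β) string pairs, indexed from 0. The function name `agree_up_to` is hypothetical.

```python
from collections import deque


def agree_up_to(pairs, p):
    """Return a shortest index sequence whose alpha- and beta-concatenations
    both reach length p and agree on their first p characters, or None.

    pairs is a list of (alpha, beta) strings; indices into it are 0-based.
    """
    start = ("", "")
    seen = {start}
    queue = deque([(start, [])])
    while queue:
        (a, b), seq = queue.popleft()
        if len(a) >= p and len(b) >= p:
            return seq
        for i, (alpha, beta) in enumerate(pairs):
            # Only the first p characters matter, so truncate; this keeps the
            # state space finite and guarantees that the search terminates.
            na, nb = (a + alpha)[:p], (b + beta)[:p]
            k = min(len(na), len(nb))
            if na[:k] != nb[:k]:
                continue  # the two strings already disagree
            if (na, nb) not in seen:
                seen.add((na, nb))
                queue.append(((na, nb), seq + [i]))
    return None


print(agree_up_to([("ab", "a"), ("c", "bc")], 3))  # [0, 1]
print(agree_up_to([("a", "b")], 1))                # None
```

Breadth-first search returns a shortest witness. The unsolvable question corresponds to asking whether `agree_up_to` succeeds for every p, which no finite number of calls can settle in general.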
We will apply this result to LR(k) grammars as follows:
THEOREM. The problem of deciding, for a given grammar $\mathcal{G}$, whether or
not there exists a $k \ge 0$ such that $\mathcal{G}$ is LR(k), is recursively unsolvable.
This theorem is in contrast to the results of Section II, where we
showed the problem to be solvable when k is also given. To prove this
theorem we will reduce the partial correspondence problem to the LR(k)
problem for a particular class of grammars.
Let $(\alpha_1, \beta_1), \ldots, (\alpha_n, \beta_n)$ be pairs of strings entering into the partial
correspondence problem, and let
$X_1, X_2, \ldots, X_n, +$
be $n + 1$ characters distinct from those appearing among the α's and
β's. Let $\mathcal{G}$ be the following grammar:
$S \to A,\quad S \to B,\quad A \to X_i + \alpha_i,\quad B \to X_i + \beta_i,$
$A \to X_i A \alpha_i,\quad B \to X_i B \beta_i, \qquad 1 \le i \le n. \qquad (32)$
Thus $L(\mathcal{G}) = \{X_{i_m}\cdots X_{i_1} + \alpha_{i_1}\cdots\alpha_{i_m}\} \cup \{X_{i_m}\cdots X_{i_1} + \beta_{i_1}\cdots\beta_{i_m}\}$.
We will show $\mathcal{G}$ is LR(k) for some k if and only if the partial correspondence problem has a negative answer. If the answer is affirmative, then for every p we have sentential forms $X_{i_m}\cdots X_{i_1} + \alpha_{i_1}\cdots\alpha_{i_m}$ and $X_{i_m}\cdots X_{i_1} + \beta_{i_1}\cdots\beta_{i_m}$ in which the first p characters following "+" agree. The handle must include the "+" sign. However, if q is the maximum length of the strings $\alpha_i, \beta_i$, the $p - q$ characters following the handle do not tell us whether the production $A \to X_{i_1} + \alpha_{i_1}$ or $B \to X_{i_1} + \beta_{i_1}$ is to be applied. Hence the grammar is not LR(p − q); since p is arbitrary, it is not LR(k) for any k. On the other hand, suppose the answer to the partial correspondence problem is negative. Then there is a p for which, knowing $(i_1, \ldots, i_m)$ and the first p characters of $\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_m}\dashv$ or $\beta_{i_1}\beta_{i_2}\cdots\beta_{i_m}\dashv$, we can distinguish whether it is a string of α's or a string of β's. Therefore $\mathcal{G}$ is in fact a bounded context grammar.
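The derivations used in this proof can be carried out mechanically. The sketch below builds the two sentences of grammar (32) for an index sequence and counts how many symbols after "+" they share. A long shared stretch is exactly what keeps a bounded lookahead from choosing between $A \to X_{i_1} + \alpha_{i_1}$ and $B \to X_{i_1} + \beta_{i_1}$. The helper names `derive` and `agreement_after_plus`, and the spelling of $X_i$ as the token "X{i}", are illustrative assumptions.

```python
def derive(pairs, seq, side):
    """Derive a sentence of grammar (32).

    side 0 uses S -> A and the alphas; side 1 uses S -> B and the betas.
    pairs[i - 1] = (alpha_i, beta_i); seq = [i_1, ..., i_m] with 1-based indices.
    X_i is written as the single token "X{i}"; other symbols are characters.
    """
    nt = "AB"[side]
    form = [nt]                                   # S -> A  or  S -> B
    for i in reversed(seq[1:]):                   # A -> X_i A alpha_i, for i_m down to i_2
        pos = form.index(nt)
        form[pos:pos + 1] = [f"X{i}", nt, *pairs[i - 1][side]]
    i1 = seq[0]                                   # finally A -> X_{i_1} + alpha_{i_1}
    pos = form.index(nt)
    form[pos:pos + 1] = [f"X{i1}", "+", *pairs[i1 - 1][side]]
    return form


def agreement_after_plus(pairs, seq):
    """Number of leading symbols after "+" that the A- and B-sentences share."""
    a, b = derive(pairs, seq, 0), derive(pairs, seq, 1)
    start = a.index("+") + 1   # the X-prefix, hence the "+" position, is the same in both
    n = 0
    for x, y in zip(a[start:], b[start:]):
        if x != y:
            break
        n += 1
    return n


pairs = [("ab", "a"), ("c", "bc")]
print(derive(pairs, [1, 2], 0))              # ['X2', 'X1', '+', 'a', 'b', 'c']
print(derive(pairs, [1, 2], 1))              # ['X2', 'X1', '+', 'a', 'b', 'c']
print(agreement_after_plus(pairs, [1, 2]))   # 3
```

For these particular pairs the two sentences coincide completely, so the grammar is even ambiguous. The proof only needs arbitrarily long agreements.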
628 KNUTH
type (ii) $Aq_i \to q_j$ if $q_i$ is not final, $Aq_i \to \bar q_j$ if $q_i$ is final, $A\bar q_i \to \bar q_j$.
type (iii) $Aq_i \to ABq_j$ if $q_i$ is not final, $Aq_i \to AB\bar q_j$ if $q_i$ is final, $A\bar q_i \to AB\bar q_j$.
One easily verifies that (39) cannot occur, and the same set of strings
is accepted; basically we get into a state $\bar q_j$ if the current string has been
accepted, and then we do not accept the string again, but return to an
unbarred state when the next rule of type (i) is used.
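This modification of the DPDA can be written as a rule-rewriting pass. The sketch below rests on several assumptions. The tuple encoding of rules and the name `add_barred_states` are hypothetical. Type (i) rules are not shown in this excerpt, so here they are assumed to lead back to an unbarred state, as the preceding paragraph describes.

```python
# Hypothetical tuple encoding of the DPDA rules:
#   ("read", A, i, a, j)  stands for  A q_i a -> A q_j     (type (i))
#   ("pop",  A, i, j)     stands for  A q_i   -> q_j       (type (ii))
#   ("push", A, i, B, j)  stands for  A q_i   -> A B q_j   (type (iii))
# In the modified machine a state is a pair (index, barred).


def add_barred_states(rules, final):
    """Return (new_rules, new_final) after adding a barred copy of every state."""
    new_rules = []
    for rule in rules:
        kind, A, i = rule[0], rule[1], rule[2]
        if kind == "read":
            a, j = rule[3], rule[4]
            # Reading a character returns to an unbarred state (assumed for type (i)).
            new_rules.append(("read", A, (i, False), a, (j, False)))
            new_rules.append(("read", A, (i, True), a, (j, False)))
        elif kind == "pop":
            j = rule[3]
            # Leaving a final state without reading input marks the string as accepted.
            new_rules.append(("pop", A, (i, False), (j, i in final)))
            new_rules.append(("pop", A, (i, True), (j, True)))
        elif kind == "push":
            B, j = rule[3], rule[4]
            new_rules.append(("push", A, (i, False), B, (j, i in final)))
            new_rules.append(("push", A, (i, True), B, (j, True)))
        else:
            raise ValueError(f"unknown rule kind: {kind!r}")
    # Only unbarred copies of the original final states accept.
    new_final = {(i, False) for i in final}
    return new_rules, new_final
```

Each original rule yields two rules, one for the unbarred and one for the barred copy of its source state. The machine therefore remains deterministic whenever the original was.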
Once the DPDA has been modified to meet these assumptions, let it
have the states $q_0, \ldots, q_r$; we are ready to construct a grammar for
the language it accepts. We begin by defining the languages $L_{iAt}$ for
$0 \le i, t \le r$ and for all intermediates A of the DPDA:
$L_{iAt} = \{\alpha \mid Aq_i\alpha \xrightarrow{*} Aq \to q_t \text{ for some } q\} \qquad (40)$
where no step in the derivation represented by "$\xrightarrow{*}$" affects the A appearing at the left.
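Definition (40) can be tested for a particular string by running the deterministic machine from a stack that contains only A. The sketch makes several assumptions. Rules are encoded as hypothetical tuples: ("read", A, i, a, j) for $Aq_i a \to Aq_j$, ("pop", A, i, j) for $Aq_i \to q_j$, and ("push", A, i, B, j) for $Aq_i \to ABq_j$. The rule set is assumed deterministic. The name `in_L` is illustrative, and the case of t "blank" is not handled.

```python
def in_L(rules, i, A, t, alpha, max_steps=10_000):
    """Decide whether alpha is in L_{iAt}: A q_i alpha =>* A q -> q_t,
    with no earlier step touching the A at the bottom of the stack."""
    read = {(r[1], r[2], r[3]): r[4] for r in rules if r[0] == "read"}
    pop = {(r[1], r[2]): r[3] for r in rules if r[0] == "pop"}
    push = {(r[1], r[2]): (r[3], r[4]) for r in rules if r[0] == "push"}
    stack, q, pos = [A], i, 0
    for _ in range(max_steps):          # guard against endless epsilon moves
        top = stack[-1]
        if (top, q) in pop:
            if len(stack) == 1:          # this step removes the A at the left
                return pos == len(alpha) and pop[(top, q)] == t
            stack.pop()
            q = pop[(top, q)]
        elif (top, q) in push:
            B, q = push[(top, q)]
            stack.append(B)
        elif pos < len(alpha) and (top, q, alpha[pos]) in read:
            q = read[(top, q, alpha[pos])]
            pos += 1
        else:
            return False                 # the DPDA is stuck
    return False


rules = [("read", "A", 0, "a", 1), ("push", "A", 1, "B", 2),
         ("read", "B", 2, "b", 3), ("pop", "B", 3, 4), ("pop", "A", 4, 5)]
print(in_L(rules, 0, "A", 5, "ab"))   # True
print(in_L(rules, 0, "A", 5, "a"))    # False
```

The bottom A may be popped only as the very last step, after all of α has been read. This is how the sketch enforces the side condition that no earlier step affects that A.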
Construct the following productions for all rules (38) of the DPDA:
Rule        Productions for
Now remove all useless productions from $\mathcal{G}$, i.e., those which can never
appear in a derivation of a terminal string starting from $L_{0\vdash}$. We claim
the resulting grammar $\mathcal{G}$ is LR(1). This result could be proved using
either of the constructions in Section II, where the state sets have a
rather simple form, but for purposes of exposition we will give here a
more intuitive explanation which shows the connection between the
operation of the DPDA and the parsing process.
Consider any string $\alpha\dashv$ where α is accepted by the DPDA, and
consider the step-by-step behavior of the DPDA as it processes α. At
the same time we will be building a partial derivation tree which reflects
all of the information known at a given stage of the parse. The nodes of
this partial tree will contain symbols [i, A, *], which means that in the
only possible parsing of the string the intermediate $L_{iAt}$, for some $t =
0, 1, \ldots, r$ or t "blank", must fill that position. We will be "at" some
node [i, A, *] of the tree, meaning this particular node below the handle
is of interest, and at the same time the DPDA will contain the con-
figuration $\cdots Aq_i \cdots$.
All of this can be clarified by considering an example, so we will consider the following "random" DPDA:
Rules of DPDA        Productions of $\mathcal{G}$ (useless ones deleted)
            . . .
      c   [4, A, *]
       \   /
      [2, A, *]
          \
     a   [1, A, *]
      \   /
     [2, A, *]                (45)
         \
    a   [1, A, *]
     \   /
    [2, A, *]
        \
   a   [1, ⊢, *]
    \   /
   [0, ⊢, *]
We are now "at" node [4, A, *], signified by the three dots above it. At
this point the DPDA uses the rule $Aq_4 \to q_6$ and we transform the top
of tree (45) to
(Thus, two handles are recognized and then removed from the tree.)
Then the DPDA uses the rule $Aq_6 \to q_2$ and (46) becomes
[Tree (47)]
by reducing three more handles. When the rule $Aq_2 b \to Aq_3$ is next ap-
          b   [3, A, *]
           \   /
 L_{2A2}   [2, A, *]
       \   /
      [1, A, *]
    a   /
     \ /                  (48)
   [2, A, *]
       \
    a  [1, ⊢, *]
     \  /
   [0, ⊢, *]
Now $q_3$ is a final state and the next character is "⊣", so we complete
the parsing; (48) becomes
          b   L_{3A}
           \   /
 L_{2A2}   L_{2A}
       \   /
       L_{1A}
    a   /
     \ /                  (49)
    L_{2A}
       \
    a  L_{1⊢}
     \  /
    L_{0⊢}
Having worked the example, we can consider the general case. Suppose
the DPDA is in the configuration $\cdots CAq_i a \cdots$, and suppose we are
at node [i, A, *] of the tree. If $q_i$ is a final state and $a = $ "⊣", by condi-
tion (39) we must now complete the parsing, so we proceed to replace
each [i, A, *] in the tree by $L_{iA}$ until the root is reached (as in going from
(48) to (49)). If $q_i$ is not final or $a \ne$ "⊣", there are three cases de-
pending on the pair $Aq_i$:
Case (i). The DPDA contains a rule of the form $Aq_i a \to Aq_j$. Then
the only possible parse must occur by changing
from    [i, A, *]    to    a   [j, A, *]
                            \   /
                           [i, A, *]

[Diagrams for Cases (ii) and (iii)]
as we did while building tree (45).
Cases (i), (ii), (iii) are mutually exclusive by the definition of DPDA,
and the arguments are justified by the fact that our tree represents all
possible productions of the grammar that could conceivably work.
Notice that in the parsing we actually have almost an LR(0) grammar,
since it was necessary to look at the character following the handle only
when $q_i$ was a final state, to see if the next character is "⊣" or not.
As a consequence of our two theorems, we find a language can be
generated by an LR(k) grammar if and only if it is deterministic, if and
only if it can be generated by an LR(1) grammar.
The theorem cannot be improved to "LR(0) grammar", since ob-
REFERENCES
COCKE, J., AND MINSKY, M. (1964), Universality of Tag systems with P = 2. J. Assoc. Comput. Mach. 11, 15-20.
EARLEY, J. (1964), "Generating Productions from BNF" (preliminary report). Carnegie Institute of Technology.
EICKEL, J. (1964), Generation of parsing algorithms for Chomsky type 2 languages. Tech. Hoch. München, Ber. 6401.
FLOYD, R. W. (1963), Syntactic analysis and operator precedence. J. Assoc. Comput. Mach. 10, 316-333.
FLOYD, R. W. (1964a), Bounded context syntactic analysis. Commun. Assoc. Comput. Mach. 7, 62-66.
FLOYD, R. W. (1964b), "New Proofs of Old Theorems in Logic and Formal Linguistics." Computer Associates, Inc., Wakefield, Massachusetts.
GINSBURG, S., AND GREIBACH, S. (1965), "Deterministic Context-Free Languages" (preliminary report). Am. Math. Soc. Not. 12, 246, 367.
IRONS, E. T. (1964), "Structural connections" in formal languages. Commun. Assoc. Comput. Mach. 7, 67-71.
LYNCH, W. C. (1963), "Ambiguities in BNF Languages." Thesis, Univ. of Wisconsin.
NAUR, P., ed. (1963), Revised Algol 60 report. Commun. Assoc. Comput. Mach. 6, 1-17.
PAUL, M. (1962), A general processor for certain formal languages. Proc. Symp. Symbolic Languages in Data Processing, Rome, 1962. Gordon and Breach, New York.
POST, E. L. (1947), Recursive unsolvability of a problem of Thue. J. Symbolic Logic 12, 1-11.
SOFTWARE-PRACTICE AND EXPERIENCE, VOL. 19(7), 607-685 (JULY 1989)
DONALD E. KNUTH
Computer Science Department, Stanford University, Stanford, California 94305, U.S.A.
SUMMARY
This paper is a case study of program evolution. The author kept track of all changes made to
TEX during a period of ten years, including the changes made when the original program was
first debugged in 1978. The log book of these errors, numbering more than 850 items, appears
as an appendix to this paper. The errors have been classified into fifteen categories for purposes
of analysis, and some of the noteworthy bugs are discussed in detail. The history of the TEX
project can teach valuable lessons about the preparation of highly portable software and the
maintenance of programs that aspire to high standards of reliability.
KEY WORDS Errors Debugging TEX Program evolution Language design True confessions
INTRODUCTION
I make mistakes. I always have, and I probably always will. But I like to think that I
learn something, every time I go astray. In fact, one of my favourite poems consists
of the following lines by Piet Hein:¹
I am writing this paper on 5 May 1987, exactly ten years since I began to work
intensively on software systems for typesetting. I have certainly learned a lot during
those ten years, judging from the number of mistakes I made; and I would like to
share what I have learned with other people who are developing software. The best
way to do this, as far as I know, is to present a list of all the errors that were corrected
in TEX while it was being developed, and to attempt to analyse those errors.
When I mentioned my plan for this paper to Paul M. B. Vitányi, he told me about
a best-selling book that his grand-uncle had written for civil engineers, devoted entirely
to descriptions of foundation work that had proved to be defective. The preface to that
book² says
TYPES OF ERROR
Some people undoubtedly think that everything I did on TEX was an error, from start
to finish. But I shall consider only a limited class of errors here, based on the log books
I kept while I was developing the program. Whenever I made a change, I noted it
down for future reference, and it is these changes that I shall discuss in detail. Edited
forms of my log books appear in the appendix below.
I guess I could say that this paper is about ‘changes’, not ‘errors’, because many of
the changes were made in order to introduce new features rather than to correct
malfunctions. However, new features are necessary only when a design is deficient (or
at least non-optimal). Hence, I will continue to say that each change represents an
error, even though I know that no complex system will ever be error-free in this
extended sense.
The errors in my log books have each been assigned to one of fifteen general
categories for purposes of analysis:
A, an algorithm awry. Here my original method proved to be incorrect or inad-
equate, so I needed to change the procedure. For example, error no. 212 fixed
THE ERRORS OF TEX 609
a problem in which footnotes appeared on a page backwards: the last footnote
came out first.
B, a blunder or botch. Here I knew what I ought to do, but I wrote something
else that was syntactically correct (sort of a mental typo). For example, in error
no. 126 I wrote ‘before’ when I meant ‘after’ and vice versa. I was thinking so
much of the Big Picture that I did not have enough brainpower left to get the
small details right.
C, a clean-up for consistency or clarity. Here I changed the rules of the language
to make things easier to remember and/or more logical. Sometimes this was just
a surface change to TEX’s ‘syntactic sugar’, as in error no. 16 where I decided
that \input would be a better name than \require.
D, a data structure debacle. Here I did not properly update the representation of
information to preserve the appropriate invariants. For example, in error no. 105
I failed to return nodes to available memory when they were no longer accessible.
E, an efficiency enhancement. Here I changed the program so that it would run
faster; the existing code was correct but slow. For example, in error no. 287 I
decided to give TEX the ability to preload fount information, since it took a
while to read thirty short files at the beginning of every run.
F, a forgotten function. Here I did not remember to do everything I had intended,
when I actually got around to writing a particular part of the code. It was a
simple error of omission, rather than commission. For example, in error no. 11
and again in no. 172 I had a loop of the form while p ≠ null do, and I forgot to
advance the pointer p inside the loop! This seems to be one of my favourite
mistakes: I often forget the most obvious things.
G, a generalization or growth of ability. Here I realized that some extension of the
existing specifications was desirable. For example, error no. 303 generalized my
original primitive command ‘\ifT⟨char⟩’ (which tested if a given character was
‘T’ or not) to the primitive ‘\if⟨char⟩⟨char⟩’ (which tested if two given characters
were equal). Eventually, in no. 666, I decided to generalize further and allow
‘\if⟨token⟩⟨token⟩’.
I, an interactive improvement. Here I made TEX respond better to the user’s
needs. Sometimes I saw how to help TEX identify and recover from errors in
the documents it was processing. I also kept searching for better ways to
communicate the reasons underlying TEX’s behaviour, by making diagnostic
information available in symbolic form. For example, error no. 54 introduced
‘. . . ’ into the display of context lines so that users could easily tell when
information was truncated.
L, a language liability. Here I misused or misunderstood the programming language
or system hardware I was working with. For example, in error no. 24 I wanted
to reduce a counter modulo 8, so I wrote t := (t - 1) mod 8; this unfortunately
made t negative because of the way mod was defined. Sometimes I forgot the
precedence of operators, etc.
M, a mismatch between modules. Here I forgot the conventions I had built into a
subroutine when I actually got around to using that subroutine. For example,
in error no. 64 I had a macro with four parameters (x₀, y₀, x₁, y₁) that define a
rectangle; but when I used it, I gave the parameters in different order, (x₀, x₁,
y₀, y₁). Such ‘interface errors’ included cases when a procedure had unwanted
side-effects (such as clobbering a global variable) that I failed to take into
610 D. E. KNUTH
account. Some mismatches (such as incorrect data types) were caught by the
compiler and not entered in my log.
P, a promotion of portability. Here I changed the organization or documentation
of the program; this affected only a person who would try to read or modify the
code, not a person who tried to run it. For example, in error no. 59, one of my
comments about how to set the size of memory had ‘1’ where I meant to say
‘5’. (Most changes of this kind were not recorded in my log; I noted only the
noteworthy ones.)
Q, a quest for quality. Here I changed the specifications of what the program should
output from given input, when I learned how to improve the typographic
appearance of the output. For example, error no. 187 changed TEX’s behaviour
when typesetting formulae that have an unusually complex superscript; as a
result, TEX now produces
- 1
instead of e+.
detected by the compiler as a syntax error, I did not log it, because bad syntax
can easily be corrected.
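Three of the bug categories above are easy to reproduce in miniature: F (the loop that never advances), L (a mod that goes negative), and M (arguments in the wrong order). The Python sketch below is purely illustrative and none of it is code from TEX, which was written in SAIL and later in Pascal. All names are hypothetical. The truncating mod is an assumption about "the way mod was defined" in error no. 24.

```python
import math
from dataclasses import dataclass
from typing import Optional


# Category F (cf. errors no. 11 and 172): a "while p != null" loop must advance p.
@dataclass
class Node:
    value: int
    link: Optional["Node"] = None


def list_sum(p: Optional[Node]) -> int:
    total = 0
    while p is not None:
        total += p.value
        p = p.link  # forgetting this line makes the loop run forever
    return total


# Category L (cf. error no. 24): a mod that truncates toward zero can go negative.
def truncating_mod(a: int, b: int) -> int:
    """Remainder with the sign of the dividend, computed via math.fmod."""
    return int(math.fmod(a, b))


def decrement_mod8(t: int) -> int:
    """Decrement a counter modulo 8 without ever producing a negative value."""
    return truncating_mod(t + 7, 8)


# Category M (cf. error no. 64): positional arguments given in the wrong order.
@dataclass(frozen=True)
class Rect:
    x0: int
    y0: int
    x1: int
    y1: int


if __name__ == "__main__":
    print(list_sum(Node(1, Node(2, Node(3)))))  # 6
    print(truncating_mod(0 - 1, 8))             # -1: the buggy (t - 1) mod 8 with t = 0
    print(decrement_mod8(0))                    # 7
    wrong = Rect(0, 10, 5, 20)                  # caller meant (x0, x1, y0, y1) = (0, 10, 5, 20)
    right = Rect(x0=0, x1=10, y0=5, y1=20)      # keyword arguments make the order explicit
    print(wrong.y0, right.y0)                   # 10 5
```

Python's own `%` takes the sign of the divisor, so `(0 - 1) % 8` is 7. The sketch therefore uses `math.fmod` to mimic a truncating mod.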
Nine of these categories (A, B, D, F, L, M, R, S, T) represent ‘bugs’; such errors
absolutely had to be corrected. The other six categories (C, E, G, I, P, Q) represent
‘enhancements’; I could have refused to consider the existing situation erroneous. As
remarked earlier, I am considering all items in the log to be indications of error. But
there is a significant difference between errors of these two kinds: I felt guilty when
fixing the bugs, but I felt virtuous when making the enhancements.
My classification of errors into fifteen categories is ad hoc, but at the moment it is
the best way I can think of to make sense out of my experiences. Some of the bug
categories refer to simple flaws in the basic mechanics of programming: writing the
right thing but typing it wrong (T); thinking the right thing but writing it wrong (B);
knowing the right thing but forgetting to think it (F); imperfectly knowing the tools
(L) or the specifications (M). Such bugs are easy to fix once they have been identified.
Categories A and D represent the next level of difficulty, as we get into technical
aspects of what programming is all about. (As Niklaus Wirth has said, Algorithms +
Data Structures = Programs.) Category R covers the special situation in which we
want a program to survive even when its input is incorrect. Finally, category S accounts
for higher-level surprises; these are the subtle bugs that result from complex interactions
between different parts of a system. Thus the nine types of bugs have a somewhat
logical structure. The remaining six categories, namely cleanliness (C), efficiency (E), generalization (G), interaction (I), portability (P) and quality (Q), seem to provide a reasonable
way to classify the various kinds of enhancements that were made to TEX during
its development.
My classification scheme relies more on essential functionality than on the external
form of the program. Thus it is not easy to use my statistics about the number of
errors per category to answer questions such as ‘How many bugs were due to improper
use of goto statements?’ Such questions are interesting to teachers of programming,
but I no longer think that they are extremely important. If I had indexed my errors
by syntactic categories, I would have found that error nos. 45, 91, 119, 155, 231, 352,
354, 419, 523, 581 and 801 could be ascribed to my use or abuse of goto; also no. 512
could be added to this list, since return and goto are analogous. Thus we can conclude
from my experience with TEX that goto statements can indeed be harmful. On the
other hand we must balance this fact with the realization that bad gotos account for
only 1.4 per cent of my errors; we must identify other culprits if we’re going to do
away with the other 98.6 per cent. Sure enough, several other errors were caused by
lapses in my use of other control structures: a case statement got me in trouble in
no. 21; a while confused me in no. 29; if-then-else led me astray in nos. 467, 471, 680
and 843. (See also nos. 796 and 845, where efficiency of control was important.) I
conclude that every feature of a programming language can be harmful, if it is misused.
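The 1.4 per cent figure can be checked against the error numbers listed above. This small sketch assumes a total of 850 logged errors. The log actually has "more than 850 items", so the computed share is an upper bound.

```python
# Error numbers ascribed to goto in the paragraph above, plus no. 512 (a return).
goto_errors = [45, 91, 119, 155, 231, 352, 354, 419, 523, 581, 801, 512]
logged_total = 850  # assumption: the log has "more than 850 items", so this is a lower bound

share = 100 * len(goto_errors) / logged_total
print(f"{len(goto_errors)} of at least {logged_total} errors: at most {share:.1f} per cent")
```

This prints "12 of at least 850 errors: at most 1.4 per cent", which agrees with the figure in the text.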
Some of the errors noted in my log book were much more devastating than others.
In certain cases the changes were far-reaching, affecting dozens of different parts of
the program; several days of ‘hacking’ were necessary before such changes had been
made and verified. For example, change no. 110 required major surgery to the program,
because my original ideas were incapable of handling aligned tables inside of aligned
tables. On the other hand, some of my errors were only venial sins, and some of the
changes were merely twiddles; for example, no. 87 simply improved the wording of a
diagnostic message. Although the log does not give an explicit weighting to the errors,
the ‘heavy’ errors tend to cancel with the ‘light’ ones, so we can still get a reasonable
insight into the stability of the program if we calculate, say, the number of errors
logged per year.
CHRONOLOGY
The development of TEX has taken place over a period of ten years, and the lessons
I learned can best be understood when they are put into the context of the other things
I was doing during that time. Typography has many facets, hence TEX itself was only
one of the projects I decided to work on. The two most significant companion systems
were METAFONT (a system for typeface design) and Computer Modern (a family of
typefaces defined in terms of the METAFONT language); these programs had to be
debugged just as TEX did, and their debugging logs show a similar development
history. I also needed a dozen or so utility routines to support TEX and METAFONT;
the most notable of these are TANGLE and WEAVE, which constitute the WEB system of
structured documentation.
Beginnings
The genesis of TEX probably took place on 1 February 1977, when I first chanced
to see the output of a high-resolution typesetting machine. I was told that this fine
typography (the galley proofs of a book by Winston,’ which our faculty was considering
for inclusion in an exam syllabus) was produced by entirely digital methods; yet I
could see no difference between the digital type and ‘real’ type. Therefore I realized
that a central aspect of printing had been reduced to bit manipulation. As a computer
scientist, I could not resist the challenge of improving print quality by manipulating
those bits better. Therefore my diary entry for 8 February says that, already at that
time, I began discussing the possibility of new typesetting software with people at
Stanford’s Artificial Intelligence Lab. By 13 February I had changed my plan to spend
a forthcoming sabbatical year in South America; instead of travelling to an exotic place
and working on Volume 4 of The Art of Computer Programming, I had decided to stay
at Stanford and work on digital typography.
I mentioned earlier that the design of TEX was begun on 5 May 1977. A week later,
I wrote a draft report containing what I thought was a pretty complete design, and I
stayed up until 5 a.m. typing it into the computer. The problem of typesetting seemed
quite straightforward, so I soon started thinking about founts instead; I spent the next
45 days writing a program that was destined to evolve into METAFONT. By 28 June,
I had 25 lower-case letters in various styles that looked reasonably good to me at the
time; and three days later I figured out how to handle the 26th letter, which required
some new ideas.”
I went back to thinking about TEX on 3 July. Several people had made thoughtful
comments on my earlier draft, and I prepared a thoroughly revised language definition
after two weeks of further study. (This included two days of working with dictionaries
in order to develop an algorithm for hyphenation of English.) The resulting document,
I thought, was a reasonably complete specification of a language for typesetting, and
I left it in the capable hands of two graduate students who were my research assistants
that summer (Frank Liang and Michael Plass). Their job was to implement TEX while
I flew off for a visit to China. I returned on 25 August and had just one day to meet
with them before leaving on another three-week trip. On 14 September I returned and
they presented me with a sheet of paper that had been typeset by their proto-TEX
program! They had implemented only about 15 per cent of the language, and they
had used data structures that were not general enough or efficient enough to support
the remaining 85 per cent; but they had chosen their subset wisely, so that a small
test program could run from start to finish. Hence it was easy for me to imagine what
a complete system would entail.
Now it was time for Liang and Plass to go back to school, and time for my sabbatical
year to begin. I started coding the ‘final version of TEX’ (or so I thought) on 16
September, and immediately I discovered that their summer work represented a truly
heroic achievement. Although I had thought that my specification of TEX was quite
complete, I encountered loose ends every 15 minutes or so when I was actually faced
with writing the code. I soon realized that if I had been in my students’ shoes (having
to implement this language when the author was completely unreachable), I would
have thrown up my hands in despair; important policy decisions had to be made at
every turn.
That was the first big lesson I learned during my work with TEX: the designer of
a new kind of system must participate fully in the implementation. Even if I had been
available for consultation with my students, they would have had to come to me so
often with questions that the work would have dragged on forever. I can imagine them
having to spend a half hour or so explaining each particular problem to me, and we
would have needed literally hundreds of those meetings. Now I knew why other
projects I had heard about, in which the language designer had decided not to be the
compiler writer, had failed.
By 14 October I had coded all of TEX except for the parts that typeset mathematics,
and except for the routines that convert from TEX’s internal representation into codes
for an output device. At this point I had to leave for three weeks of travel in Europe.
This European trip had been planned long before, so it was mostly unrelated to
typesetting; but I did have some interesting discussions about curve-drawing with
mathematicians I met in Oberwolfach, Germany, and in Oslo, Norway. I also was able
to arrange a visit to the headquarters of Monotype Corporation in Redhill, England.
After returning, I spent November finishing the numerals, upper-case letters, and
punctuation marks of the first-draft Computer Modern types. I needed to have a
complete fount because I had been invited to give a lecture about this work to the
American Mathematical Society, and I did not want to have only lower-case examples
to show. I prepared the AMS lecture” during December and presented it in January,
so I did not have a chance to resume the coding of TEX until 14 January. But finally
I was able to write the following in my diary on 9 February 1978:
Finished the TEX programs including all loose ends and got them all compiled
without syntax errors (4 a.m.).
TEX was the first fairly large program I had written since 1970; so it was my first
non-trivial ‘structured program’, in the sense that I wrote it while consciously applying
the methodology I had learned in the early 1970s from Dijkstra, Hoare, Dahl and
others. I found that structured programming greatly increased my confidence in the
correctness of the code, while the code still existed only on paper. Therefore I could
wait until the whole program was written, before trying to debug any of it. This saved
a lot of time, because I did not have to prepare ‘dummy’ versions of non-existent
modules while testing modules that were already written; I could test everything in its
final environment. Of course I had a few qualms in January about whether my code
from September would really work; but that gave me more of an incentive to finish
the whole thing sooner.
Even on 10 February, when TEX had been compiled and was ready to be tested, I
did not feel any compelling need to try it immediately. I knew that the program was
fairly readable and ‘informally proved correct’, so I spent the next month making italic,
greek, script, symbols and large delimiter founts. My test program for TEX required
those founts, so I did not want to start testing until everything was in place. Again, I
knew I was saving time by not having to prepare prototypes that would merely simulate
the real thing; structured programming gave me the courage to wait until the whole
system was ready. I finished the large symbols on 8 March, and I happily penned the
following in my diary on 9 March:
My log book for errors in TEX began that next day, 10 March; the debugging
process will be discussed below. By 29 March I had decided that TEX was essentially
working,
I began tuning up the founts and drafting ideas for a user manual; then I spent a few
days at Alphatype Corporation in Illinois, from whom Stanford had decided to purchase
a phototypesetter. From 11 April to 11 May I took time off from typography to work
on dozens of updates to Seminumerical Algorithms, which is Volume 2 of The Art of
Computer Programming;¹² I wanted to incorporate new research results into that text,
which was to be TEX’s first big application. Then on 14 May I began to get TEX
running again; proof copies of pages iv to 8 of Volume 2 came out of our Xerox
Graphics Printer on 15 May.
My work was cut out for me during the next weeks: I became a production user of
TEX, typing the manuscript of Volume 2. This proved to be an invaluable experience,
as explained below. By the time my sabbatical year ended, on 24 September, I had
finished the typing up to page 441 of that 700-page book. Improvements to TEX kept
occurring to me all during that time, of course, except during a month-long vacation
trip with my family. (Even on vacation I kept seeing founts everywhere and thinking
about how to draw such letterforms by computer. I spent one morning sitting by one
of the trails in the Grand Canyon designing the algebraic notation for METAFONT; my
founts had previously been written in a primitive macro language and compiled directly
into machine code, not interpreted.) I also spent three weeks that summer writing the
first manual for TEX.
Although my sabbatical year was over, I kept working on typography in odd moments
between classes in the autumn; the text of Volume 2 was completed on the morning
of 15 November. On 17 November I began writing METAFONT, and my diary entry
for 31 December 1978 was this:
Other people had begun to use TEX in August of 1978, and I was surprised to see
how fast the system was propagating. I spent my spare time during the first three
months of 1979 thinking about how to make TEX available in Pascal form. (The
original program was written in SAIL, a language that was available on only a few
computers.) During this period I began to experiment with the typesetting of Pascal
programs; I wrote a program called BLAISE that converted Pascal source code into a
file for pretty-printing. BLAISE soon developed into a system called DOC for
structured documentation, completed on 31 March, 1979; programs in DOC format
could be converted either to Pascal or to TEX. Luis Trabb Pardo and Ignacio Zabala
subsequently used DOC to prepare a highly portable version of TEX in Pascal, completed
in April of 1980.
About this time I learned another big lesson: writing software is much harder than
writing books. I could not simultaneously teach classes well and finish what needed to
be done on typography. So I asked to be excused from teaching in the spring of 1979;
my diary for March 22 said,
Now my obligations are fairly well cleared away and it’s back to the stalled
research on TEX.
(It turned out that I was able to teach during only 13 of the 21 academic quarters
between my sabbatical years in that period. I continued to supervise graduate students,
but I gave no classroom lectures during 1983 when the work on TEX and METAFONT
was at its peak; I also missed three months in 1982, 1984 and 1985. I really enjoy
teaching, but I could not see any way to finish the TEX project without relinquishing
almost all of my other duties.)
On 1 April 1979, I returned to METAFONT, which had been written but not
debugged. METAFONT began to work on 28 April. Then I began to design software
for the Alphatype machine; that took about three months. During the summer I wrote
the METAFONT manual, which gave me further experience with TEX. And TEX also
received an important stimulus from the American Mathematical Society that summer,
when several people (including Barbara Beeton and Michael Spivak) were given the
opportunity to spend some time at Stanford developing TEX macros. The AMS people
introduced me to several important applications, such as the indexes to Mathematical
Reviews, which stretched TEX to its limits and led to substantial improvements.
Endings
By 14 August 1979, I felt that TEX was essentially complete and fairly stable. I
lectured that evening to about 100 participants of the Western Institute for Computer
Science in Santa Cruz, telling about my experiences developing and debugging the
program. At that time my log book of errors had accumulated 420 items; little did I
know that the final total would be more than twice that! But already I knew that I
had learned a lot by keeping the log, and I must have been enthusiastic because I
lectured from 7:30 to 9:30 p.m. (The audience was equally enthusiastic: they kept
asking me questions until 11:30 p.m. So I resolved to write a paper about the errors
of TEX, and at last I am able to do so.)
I devoted the last months of 1979 and the first months of 1980 to Computer Modern,
which needed to be rewritten in terms of the new METAFONT. Then I needed to
update Volume 2 again (computer science marches inexorably forward) until I had
finally finished producing camera-ready copy on our Alphatype. This was the goal I
had hoped to achieve during my sabbatical year; I reached it at 2 a.m. on 29 July
1980, about two years late. During the rest of 1980 I wrote papers about what I
thought were the most novel ideas in TEX¹³ and in METAFONT.¹⁴
But my research on TEX was by no means finished. About 50 people from all over
the U.S.A. met at Stanford on 22 February 1980, and established the TEX User
Group (TUG). I asked them if they would mind my cleaning up the language in
several upward-incompatible ways, even though this would make the user manual and
their existing computer files obsolete; and nobody objected to such changes! Soon
TUG grew dramatically, under the able chairmanship of Richard Palais, and it became
international. I realized that I could not disappoint all these people by leaving TEX
in its current state and returning immediately to work on subsequent volumes of The
Art of Computer Programming.
I needed to work out a better 'endgame strategy', and it soon became clear what
ought to be done: the original versions of TEX and METAFONT should be scrapped,
once they had served their purpose of accumulating enough user experience to indicate
what such languages ought to be. New versions of TEX and METAFONT should be
written, designed to last a long time and to be highly portable between computers and
typesetting devices of all kinds. Moreover, these new programs should be published,
because TEX was making it possible to improve the state of the art of program
documentation. I decided to do my best to produce a stable system and to explain all
I knew about it, so that other people could take it over and maintain it if it proved to
be important. This way I could return to other pursuits in good conscience, knowing
that if MY typographic research had any merit it would be carried on by others in
whatever ways would prove to be necessary.
So that was my new goal; I thought I could achieve it in one or two more years.
The original TEX program was renamed TEX78, and the new one was to be called
TEX82.
Classes and miscellaneous chores kept me too busy to do much else during the first
half of 1981, but I began to write TEX82 on 22 August. By 9 September I realized
that the DOC system needed to be completely revised, so I spent two months replacing
it by a much better system called WEB. Since then my programming language of choice
has been WEB (which, unlike DOC, was written in its own language). After a month in
Europe, I was able to resume writing TEX82 on 1 December 1981. The draft of
TEX82 was completed on 29 June 1982; as before, I wrote the entire program before
trying to run any of it.
Meanwhile I had other problems to worry about. When my new copy of Seminumerical
Algorithms arrived in January 1981, I had expected to be filled with joy at the
consummation of so much hard work. Instead, I burned with disappointment, as I
realized that I still had a great deal to learn about founts. The early Computer Modern
typefaces were not at all what I had hoped to achieve, when I first saw them in print.
They had looked reasonably good at low resolution, so I had blithely assumed that
high resolution would be much better. Not so. My education in typefaces was barely
beginning. Later in 1981 I met Richard Southall, a professor of type design who had
exactly the expertise I was lacking; so I invited him to visit Stanford. We spent the
entire month of April 1982 working about 16 hours a day, revising Computer Modern
from A to z.
I debugged TEX82 in the summer of 1982, then began to write the new manual,
called The TEXbook, in October. The first manual had been written hastily and
finished in 21 days, but I wanted The TEXbook to meet much higher standards.
Therefore I was not able to finish it until a full year later.
It was during this period, October 1982 to October 1983, that TEX became a mature
system. I had to rethink every aspect of its design as I rewrote the manual. Fortunately
I was aided by a wonderful group of knowledgeable volunteers, who would meet with
me for two or three hours every Friday noon and we would discuss the trade-offs of every
important decision. The diverse backgrounds of these people provided an important
counterweight to my one-sided views. Finally, on 9 December 1983, I decided that
the first phase of my endgame strategy was complete; I gratefully hosted a coming-of-age
party for TEX, with 36 guests of honour, at the Fuki-Sushi restaurant in Palo
Alto.
The rest is history. I wrote METAFONT in WEB between December 1983 and July
1984; I wrote The METAFONTbook between August 1984 and October 1985, taking off
five months (February to July) to rewrite Computer Modern in terms of the new
METAFONT. I began another sabbatical year in October 1985, just after the TEX project
disbanded. Finally, after adding a few more finishing touches, I was able to celebrate
the long-planned completion of my ‘endgame’ on 21 May 1986, when my publishers
sponsored a reception at the Computer Museum in Boston; that was the day I first
saw the five hardcover volumes of Computers & Typesetting, the books that summarize
my nine years of work on TEX, METAFONT and Computer Modern.
Another year has gone by and I would like to report that TEX has proved to be 100
per cent correct. But I cannot, not yet. For I stumbled across a hidden TEX anomaly
last January. And I have just been teaching a course about software development based
on the internal structure of TEX; students in the class have noticed a few things that
should be improved. So I suppose there is still at least one bug lurking there. I plan
to hold off publishing this paper until another year or so has gone by, so that I will
have more reason to believe that my log book of errors is complete.
CONTENTS OF THE LOG BOOKS
As I said, the appendix to this paper reproduces the entire list of errors that I kept as
TEX was evolving. The best way to comprehend how TEX evolved is to peruse this
list. The first 519 items refer to the original program TEX78, which was written in
SAIL, from the time I began to debug it to the time I stopped maintaining it. The
remaining items, numbered 520-849 (as of May 1987), refer to the ‘real’ program
TEX82, which was written in WEB. I did not keep any record of errors removed during
the hectic period when TEX82 was being debugged, but items 520 and following
include every change that was made to TEX82 after it passed its first test. The
differences between TEX78 and TEX82, seen from a user’s standpoint, have been listed
elsewhere.¹⁶
I have tried to edit the log entries so that they can be understood in terms of the
published listing⁶ of TEX82. For example,
is entry no. 15. My original log entry referred to case ‘[font]’ in ‘eqdestroy’ using
SAIL syntax, but I have changed to Pascal syntax in the edited log. Similarly, the 1978
identifier font eventually became setfont, so I have adopted the published equivalent.
TEX82 contains a procedure called eq_destroy in §275 of the program, and this
procedure is quite similar to the eqdestroy of TEX78; so I have supplied §275 as a
program reference. (It turns out that eq_destroy no longer needs a ‘setfont:’ subcase,
but it did in 1978.) The ‘F’ after §275 means that this was a bug of type F, a forgotten
function.
Changes to a program often spawn other changes later. I have tried to indicate that
phenomenon in the appendix by prefixing the number of a prior error when it was an
important part of the reason for a subsequent error. Thus no. 67 is
Error no. 25 was logged when I had been surprised to find a space at the end of TEX’s
internal representation of a paragraph. I had 'cured' the problem by converting the
space from a normal interword space to a space of width zero. But that was not good
enough, since it was possible for TEX to try breaking a line at the zero-width space.
A better solution was to replace the space by the glue that is always added to fill out
the end of a paragraph.
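A minimal Python sketch of the cure for errors 25 and 67, assuming a paragraph is held as a list of hypothetical `Item` records with made-up glue dimensions. The infinite penalty is not mentioned in this account; it follows the shape of TEX82's published end-of-paragraph code, where a trailing glue item becomes a penalty of 10000 that is followed by the \parfillskip glue. All names and values below are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    kind: str      # "char", "glue" or "penalty" (hypothetical node kinds)
    value: object  # a character, a (width, stretch, shrink) triple, or a penalty

INTERWORD = Item("glue", (3.33, 1.67, 1.11))       # invented interword space
PAR_FILL = Item("glue", (0.0, float("inf"), 0.0))  # invented stand-in for \parfillskip
NO_BREAK = Item("penalty", 10_000)                  # 10000 means 'never break here'

def finish_paragraph(items: list[Item]) -> list[Item]:
    """Return a copy of the paragraph whose trailing glue is replaced by an unbreakable fill."""
    result = list(items)
    if result and result[-1].kind == "glue":
        result.pop()  # the space that came from the end of the last input line
    # Even zero-width glue would still be a legal breakpoint (the trouble in no. 67),
    # so forbid a break here and then append the glue that fills out the last line.
    result.append(NO_BREAK)
    result.append(PAR_FILL)
    return result
```

Because the fill glue follows a penalty of 10000, the line breaker never sees a legal breakpoint after the last word.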
Figure 1 shows a time chart of the first 519 log entries, the errors of TEX78. There
is a burst of activity right near the beginning, since I logged the first 237 errors during
the three weeks of initial debugging. Thus the main line in Figure 1, which shows the
cumulative number of errors as a function of time, is nearly horizontal at the beginning.
But it is nearly vertical at the end, since only 13 changes were made during the last
year of TEX78’s activity.
Another line also appears in Figure 1: it represents the total number of different
pages I typeset with TEX78 as I was experimenting with the first version. The dotted
line in July 1978 stands for the 200 pages of the first TEX manual, and the dotted line
in June 1979 stands for the 100 pages of the first METAFONT manual; the remaining
solid lines stand for the 700 pages of Volume 2 and some experiments with DOC.
Figure 1 shows that four different phases can be distinguished in the development
of TEX78. First came the debugging phase (Phase 0), already mentioned. Then came
a longer period of time (Phase 1) when I typeset several hundred pages of Volume 2
and the first user manual; this experience suggested many amendments to my original
design. Then TEX suddenly had more than one user, and different kinds of errors
began to show up. New users find new bugs. This coming-out phase (Phase 2) included
small bursts of changes when I faced new applications: a suite of difficult test cases
posed by the American Mathematical Society, then the application to Pascal formatting,
then the complex index to Mathematical Reviews. Finally there was Phase 3, when
changes were made in anticipation of a future TEX82; I wanted several new ideas to
be well tested before I programmed the ‘ultimate’ TEX.
[Figures 1 and 2 not reproduced. Recoverable labels: a time axis running from March 1978 to March 1982, with daily entries for 12 to 29 March 1978 marking the debugging topics (data structures and memory management; syntax and error recovery; basic typesetting primitives; output; paragraphing; page breaking; alignment; math typesetting; a ‘realistic’ test program); ‘Phase 1: First applications’ (user manual; Manuals 0 to 4); the AMS test cases, Pascal typesetting, METAFONT and the AMS index demo; ‘Phase 3: Global users’ (new line-breaking algorithm); the TEX installers’ workshop (1981); and ‘Begin coding TEX82’ (August 1981). The horizontal axis counts errors so far, from 0 to 500.]
I could set a break-point and continue at high speed until coming to new material.
Watching the program execute itself in this ‘dynamic order’ has always been insightful
for me, after I have desk-checked it in the ‘static order’ of my original code.
Figure 2 shows that I got through the program initialization the first day; then I
was gradually able to check out the routines for basic data management, parsing and
error reporting. On the fourth day TEX began to combine boxes and glue, and there
was visible output on the fifth day. During the following three days I tested the
algorithms for breaking paragraphs into lines and breaking lines into pages. All this
went rather smoothly; I had already logged 101 errors during this first week, but all
of the problems were comparatively minor oversights, to be expected in any program
of this size.
On the ninth day I tackled alignment of tables, and got a big shock: my original
algorithms were quite wrong. I had greatly misunderstood this aspect of TEX, because
I had greatly underestimated the complications of nested alignments. (The log mentions
some of the puzzlement and frustration I felt at the time.) I wrestled with alignment
for two days before finding a solution.
Then I looked at the last remaining part of TEX, the code for typesetting mathemat-
ics; this took another four days. (Well, the ‘days’ were nights actually; I worked during
the night to avoid delays due to time-sharing.) Finally I had seen essentially all of
TEX in operation, and I could let it run at full speed instead of relying on single-step
mode. I spent six more days helping TEX get through its first test data; finally the
test was passed. Whew! The debugging phase was over, 18 days and 237 log-book
entries after it began.
I kept track of how long this process took, so that I’d be better able to estimate the
duration of future programming projects. Table I gives the figures.
The total debugging time, 132 h, was extremely encouraging to me, because it was
much less than the 41 days it had taken me to write the program. Previously I had
needed to devote about 70 per cent of program development time to debugging, but
now the figure had dropped to about 30 per cent. I considered this to be a tremendous
victory for structured programming, since my programming time had also decreased
from what it had been with old habits. Later, with the WEB system, I noticed even
further gains in productivity.
How big was TEX at the time? I estimated this by counting the number of semicolons
(4857) and the number of occurrences of the SAIL reserved words comment (480) and
else (223). Since I always put semicolons before end, the total number of statements
in the program could be computed as
Table I
700-page book is not one of life’s greatest pleasures, but the regular appearance of
nice-looking pages kept me happy. The jagged line in Figure 2 shows my progress in
terms of pages typeset versus errors in the TEX log; a similar (even more jagged) line
appears in Figure 1, showing pages typeset as a function of time.
The most striking thing about the jagged line in Figure 2 is that it is almost straight.
Ideas about how to improve TEX kept occurring to me quite regularly as I typed the
manuscript. Between 13 May and 22 June I processed about 250 pages, and added 69
new entries to the log. Those 69 entries included 29 ‘bugs’ and 40 ‘enhancements’;
thus, I thought of a new way to improve TEX at a regular rate of about one enhancement
for every six pages typed.
I mentioned earlier my firm conviction that I could not have correctly delegated the
coding of TEX to another person; I had to be doing it myself, because writing a new
sort of program implies continually revising the specifications. Similarly, I could not
have correctly delegated these initial typing experiments to another person. I had to
put myself in the rôle of a regular user; there is no substitute for such experience,
when a new system is being designed.
But at the time I was not thinking about creating a system that would be used
widely; I was designing TEX primarily for my own use. The idea that TEX could or
should be generalized to other applications besides The Art of Computer Programming
dawned on me only gradually, as people kept noticing what I was doing and expressing
an interest in it.
John McCarthy observed during this period that TEX was doing a reasonable job
with respect to traditional mathematical copy, but he suspected that I would have a
tough time typesetting a book about TEX itself. ‘That will be the real test’, he said,
‘because you’ll have to shut off many of TEX’s automatic features in order to handle
problems of self-reference’.
In July I succumbed to John’s challenge and prepared a user manual for TEX. Sure
enough, this experience helped me identify quite a few weaknesses in the existing
design, things that I probably wouldn’t have noticed if I had confined my attention to
The Art of Computer Programming alone. Again I thought of enhancements at the rate
of about one for every six or seven pages, as I wrote the manual; but these were not
really occasioned by defects in TEX’s ability to be self-referential, as John had predicted.
The new enhancements came about because the process of manual-writing forced me
to think about TEX as a whole, in a new way. The perspective of a teacher/expositor
helped me to notice several inconsistencies and shortcomings.
Thus, I came to the conclusion that the designer of a new system must not only be
the implementor and the first large-scale user; the designer should also write the first
user manual. The separation of any of these four components would have hurt TEX
significantly. If I had not participated fully in all these activities, literally hundreds of
improvements would never have been made, because I would never have thought of
them or perceived why they were important.
code. At first I had fairly rigid ideas about how much space to put in certain places,
about how much penalty to charge for certain line breaks, about how to interpret
various characters in the input, and even about where to find certain characters in
founts. One by one, starting already at change no. 104, these things became parameters
that could be changed by users who had different requirements and/or different
preferences.
THE REAL TEX
I had vastly underestimated the complexities and subtleties of typesetting when I had
naively expected to work out a complete system for myself during a single sabbatical
year. By 1980 it became clear that I had acquired almost a moral obligation to advance
the art and science of typography in a more substantial way. I realized that I could
never be happy with the monster I had created unless I started over and built an
entirely new system, using the experience I had gained from TEX78.
I began writing the new system in the summer of 1981, and I decided to call it
TEX82 because I knew it would take a year to complete. Once again I could not
delegate the job to an associate; I wanted to rethink every detail of TEX, and I wanted
to have a thorough taste of ‘literate programming’ before I dared to inflict such ideas
on others. I wanted to produce truly portable software that would have a chance to
serve for many years as a reliable component of larger systems. I wanted TEX82 to
justify the confidence that people were placing in TEX78, which was getting more
praise than it deserved.
Figure 3 shows the development of TEX82, starting at the moment I decided that
it was essentially bug-free; this illustration uses the same time-warp strategy as Figure 2.
[Figure 3 not reproduced. Recoverable labels: a time axis from November 1982 through 1985; Version 1.0; ‘Phase 2: Global users’ (1984); Version 1.3; Version 2.0; ‘Phase 3: Convergence’; horizontal axis ticks at 100, 200 and 300.]
TEST PROGRAMS
Since 1960 I have had extremely good luck with a method of testing that may deserve
to be better known: instead of using a normal, large application to test a software
system, I generally get best results by writing a test program that no sane user would
ever think of writing. My test programs are intended to break the system, to push it
to its extreme limits, to pile complication on complication, in ways that the system
programmer never consciously anticipated. To prepare such test data, I get into the
meanest, nastiest frame of mind that I can manage, and I write the nastiest code I can
think of; then I turn around and embed that in even nastier constructions that are
almost obscene. The resulting test program is so crazy that I could not possibly explain
to anybody else what it is supposed to do; nobody else would care! But such a program
proves to be an admirable way to flush the bugs out of software.
In one of my early experiments, I wrote a small compiler for Burroughs Corporation,
using an interpretive language specially devised for the occasion. I rigged the interpreter
so that it would count how often each instruction was interpreted; then I tested the
new system by compiling a large user application. To my surprise, this big test case
did not really test much; it left more than half of the frequency counts sitting at zero!
Most of my code could have been completely messed up, yet this application would
have worked fine. So I wrote a nasty, artificially contrived program as described above,
and of course I detected numerous new bugs while doing so. Still, I discovered that
10 per cent of the code had not been exercised by the new test. I looked at the
remaining zeros and said, ‘Shucks, my source code wasn’t nasty enough, it overlooked
some special cases I had forgotten about’. It was easy to add a few more statements,
until eventually I had constructed a test routine that invoked all but one of the
instructions in the compiler. (And I proved that the remaining instruction would never
be executed in any circumstances, so I took it out.)
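The frequency-count trick is easy to imitate. The Python sketch below uses a hypothetical six-instruction stack machine (it has nothing to do with the Burroughs interpreter, whose details are not given here): every interpreted instruction increments a counter, and any opcode still at zero after all test programs have run marks code that no test exercised.

```python
from collections import Counter

# Hypothetical instruction set for a toy interpreter.
OPCODES = ("push", "add", "mul", "neg", "jz", "halt")

def run(program, counts):
    """Execute `program` (a list of tuples), tallying every opcode interpreted."""
    stack, pc = [], 0
    while pc < len(program):
        op, *args = program[pc]
        counts[op] += 1
        pc += 1
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "mul":
            stack.append(stack.pop() * stack.pop())
        elif op == "neg":
            stack.append(-stack.pop())
        elif op == "jz":  # pop a value; jump to args[0] if it was zero
            if stack.pop() == 0:
                pc = args[0]
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack

def never_executed(test_programs):
    """Return the opcodes that no test program exercised (the 'remaining zeros')."""
    counts = Counter({op: 0 for op in OPCODES})
    for program in test_programs:
        run(program, counts)
    return [op for op in OPCODES if counts[op] == 0]

typical = [("push", 2), ("push", 3), ("add",)]
contrived = [("push", 0), ("jz", 4), ("push", 9), ("mul",),
             ("push", 5), ("neg",), ("halt",), ("push", 1)]
print(never_executed([typical]))             # ['mul', 'neg', 'jz', 'halt']
print(never_executed([typical, contrived]))  # ['mul']
```

The contrived program deliberately jumps over its own `mul`, so one zero survives; at that point the choice is the one described above: write a nastier test, or show that the instruction can never be reached.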
I used such ‘torture tests’ to debug three compilers during the 1960s. In each case
very few bugs were ever discovered after the tests had been passed, so the methodology
was quite effective. But when I debugged TEX78, my test program was quite tame by
comparison, except when I was first testing the mathematics routines (20-23 March).
I guess I was not trying as hard as usual to make TEX a bullet-proof system, because
I was still thinking of myself as TEX’s main user. My original test program for TEX78
was written with an ‘I hope it works’ attitude, rather than ‘I bet I can make it fail’. I
suppose I would have found several dozen of the bugs that showed up later (such as
nos. 240 and 263) if I had stuck to the torture-test methodology. Still, considering my
mood at the time, I suppose it was a good idea to have a test program that would look
like real typography; I did not know what TEX should do until I could judge the
aesthetic quality of its output.
At any rate, my first test program was based on a sampling of material from Vol-
ume 2. I went through that book and boiled it down to five pages that illustrated just
about every kind of typographical difficulty to be found in the entire volume. (The
output of this test program can be seen in another paper, where David Fuchs and I
used the same test data to study some algorithms for fount management.)
Years later, when TEX82 was ready to be debugged, I understood pretty clearly
what the program was supposed to do, so I could then apply the superior torture-test
methodology. My test program was called TRIP; I spent about five days preparing the
first draft of TRIP in July 1982. Here, for example, is a relatively tame part of the
original TRIP code:
\def\gobble#1{} \floatingpenalty 100
\everypar{A\insert200{\baselineskip400pt\splittopskip\count15pt
\hbox{\vadjust{\penalty999}}\hbox to -10pt{}}\showthe\pagetotal
\showthe\pagegoal\advance\count15by1\mark{\the\count15}%
\splitmaxdepth-1pt\par\gobble}% abort every paragraph abruptly
\def\weird#1{\csname\expandafter\gobble\string#1
\string\csname\endcsname}\message{\the\output\weird\one}
(Please do not ask me what it means.) Since then I have probably spent at least 200
hours modifying and maintaining TRIP, but I consider that time well spent, and I
think TRIP is one of the most significant products of the TEX project. The reason
is that the TRIP test has detected extremely subtle bugs in hundreds of implementations
of TEX, bugs that would have been almost impossible to track down in any other way.
TEX82, with its TRIP test, has proved to be much more reliable than any of the Pascal
compilers it has been compiled with. In fact, I believe it is fair to say that TEX82 has
helped to flush out at least one previously unknown compiler bug whenever it has been
ported to a new machine or tried on a compiler that has not seen TEX before! These
compiler errors were detectable because of the TRIP test. Later I developed a similar
test program for METAFONT, called TRAP, and it too has helped to exorcise dozens
of compiler bugs.
A single test program cannot detect all possible mistakes. For example, TEX might
terminate with a ‘fatal error’ in several ways, only one of which can happen on any
particular run. Furthermore, TRIP runs almost automatically, so it does not test all of
TEX’s capability for on-line interaction. But TRIP does exercise almost all of TEX’s
code, and it does so in tricky combinations that tend to fail if any part of TEX is
damaged. Therefore it has proved to be a great time-saver: whenever I modify TEX,
I simply check that the results of the TRIP test have changed appropriately.
The only difficulty with the TRIP methodology is that I must check the output
myself to see if it is correct. Sometimes I need to spend several hours before I have
determined the appropriate output; and I am fallible. So TEX might give the wrong
answer without my being aware of it. This happened in bug nos. 543 and 722, when
I learned to my surprise that TEX had never before done the correct thing with TRIP.
A system utility for comparing files suffices now to convince me that incremental
changes to TEX or TRIP cause the correct incremental changes to the TRIP test output;
but when I began debugging, I needed to verify by hand that thousands of lines of
output were accurate.
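The text does not say which comparison utility was used. As a stand-in, Python's standard `difflib.unified_diff` can play the same role: compare the test output from before and after a modification, and check that every changed line was one you predicted. The function names and the idea of an explicit `expected` set are illustrative assumptions, not part of the TRIP procedure as described.

```python
import difflib

def trip_changes(old_output: str, new_output: str) -> list[str]:
    """Unified diff between two runs of a test; an empty list means 'no change'."""
    return list(difflib.unified_diff(
        old_output.splitlines(), new_output.splitlines(),
        fromfile="before", tofile="after", lineterm=""))

def changed_lines(diff: list[str]) -> list[str]:
    """Keep only the removed/added lines, skipping the two file-header lines."""
    return [line for line in diff[2:] if line.startswith(("-", "+"))]

def unexpected_changes(old_output: str, new_output: str, expected: set[str]) -> list[str]:
    """Changed lines that were not predicted in advance (ideally none)."""
    return [line for line in changed_lines(trip_changes(old_output, new_output))
            if line not in expected]
```

After an incremental change, one writes down the output lines that should change, and any line reported by `unexpected_changes` needs to be examined by hand.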
I should mention that I also believe in the merit of formal and informal correctness
proofs. I generally try to prove my programs correct, informally, by stating appropriate
invariants in my documentation and checking at my desk that those relations are
preserved. But I can make mistakes in proofs and in specifying the conditions for
correctness, just as I make mistakes in programming; therefore I do not rely entirely
on correctness proofs, nor do I rely entirely on empirical test routines such as TRIP.
I mentioned before that each of the errors listed in the appendix refers where possible
to its approximate location in the program listing of TEX82. It is natural to wonder
whether the errors are uniformly interspersed throughout the code, or if certain parts
were particularly vulnerable. Figure 4 shows the actual distribution. No part of the
program has come through unscathed (or, shall we rather say, unimproved), but some
parts have seen significantly more action. The boxes to the left of the vertical lines in
Figure 4 represent ‘bugs’ (A, B, D, F, L, M, R, S, T), whereas the boxes to the right
represent ‘enhancements’ (C, E, G, I, P, Q). The most unstable parts of TEX78 were
the parts I understood least when I began to write the code, namely mathematical
formatting and alignment. The most unstable parts of TEX82 were the parts that
differed most from TEX78 (the conditional instructions and other aspects of macro
expansion; also the increased user access to registers and internal quantities used in
TEX’s decision-making).
I should mention why hyphenation is almost never mentioned in the log of TEX78.
Although I said earlier that TEX78 was entirely written before any of it was tested,
that is not quite true. The hyphenation algorithm was quite independent of everything
else and easily isolated from the code, so I had written and debugged it separately
during three days in October 1977. (There is obviously no advantage to testing
independent programs simultaneously; that leads only to confusion. But the rest of TEX
was highly interdependent, and it could not easily be run when any of the parts
[Figure 4 not reproduced. Its axis follows the sections of the TEX82 listing from §0 to §1400, labelled in order: input/output, strings; error handling; data structures for semantics; basic operations on data; the hash table; data structures for syntax; low-level parsing; macro expansion; medium-level parsing; conditionals; file name scanning; font data; binary output; data structures for math; math typesetting; alignment; line breaking; line breaking, continued; hyphenation; page breaking; the chief executive; building boxes; building lists; building math formulas; assigning to user registers; miscellaneous; initialization; extensions.]
A, Algorithmic anomalies
I decided from the beginning that the algorithms of TEX would be in the public
domain. But if I were to change my mind and charge a fee for my services in inventing
them, I would probably request the highest price for a comparatively innocuous-looking
group of statements now found in sections 851 and 854 of the program. This precise
sequence of logical tests, used to control when a line break is being forced because
there is no ‘feasible’ alternative, has the essential form
    if a₁ ∨ a₂ then
        if a₃ ∧ a₄ ∧ a₅ ∧ a₆ then u₁
        else if a₇ then u₂ else u₃
    else u₄
and most of the appropriate boolean conditions aᵢ were discovered only with great
difficulty. The program now warns any readers who seek to improve TEX to ‘think
thrice before daring to make any changes here’. Some indications of my struggles with
this particular logic appear in error nos. 75, 93 and 506.
TEX’s line-breaking algorithm determines the optimum sequence of breaks for each
paragraph, in the sense that the total ‘demerits’ are minimized over all feasible sequences
of breaks. The original algorithm was fairly simple, but it continued to evolve as I
fiddled with the formula used to calculate demerits. Demerits are based on the ‘badness’
b of the line (which measures how loose or tight the spacing is) and the ‘penalty’ p for
the break (which may be at a hyphen or within a mathematical formula). A penalty
might be negative to indicate a good break. The original formula for demerits in TEX78
was

$$D = \max(b + p,\ 0)^2$$
This was later improved to

$$D = \begin{cases} (1 + b + p)^2, & \text{if } p \ge 0,\\ (1 + b)^2 - p^2, & \text{if } p < 0. \end{cases}$$
The extra constant 1 was used to encourage paragraphs with fewer lines; the subtraction
of p² when p < 0 gave fewer demerits to good breaks. This improved formula was
published on page 1128 of the article on line-breaking by Knuth and Plass.¹³ The first
draft of TEX82 added an obvious generalization to the improved formula by introducing
a \linepenalty parameter, l, to replace the constant 1. A further improvement was
made in change no. 554, when I realized that better results would be obtained by
computing demerits as follows:

$$D = \begin{cases} (l + b)^2 + p^2, & \text{if } p \ge 0,\\ (l + b)^2 - p^2, & \text{if } p < 0. \end{cases}$$
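The three demerit formulas can be transcribed directly. To show how they feed the minimization of total demerits, the Python sketch below adds a deliberately small dynamic program that chooses the breaks. Everything beyond the formulas is an assumption made for illustration: each interword space has natural width 1 with stretch 0.5 and shrink 0.33, badness is the simplified 100r³, every break has penalty 0, the tolerance is 200 (plain TEX's default), and l defaults to 10 (plain TEX's \linepenalty). The sketch tries every pair of breakpoints instead of maintaining TEX82's list of active breakpoints.

```python
import math

def demerits_tex78(b, p):
    """Original TEX78 formula: D = max(b + p, 0)^2."""
    return max(b + p, 0) ** 2

def demerits_knuth_plass(b, p):
    """Improved formula: the constant 1 favours fewer lines; p < 0 rewards good breaks."""
    return (1 + b + p) ** 2 if p >= 0 else (1 + b) ** 2 - p ** 2

def demerits_tex82(b, p, l=10):
    """Form adopted in change no. 554; l plays the role of \\linepenalty."""
    return (l + b) ** 2 + p ** 2 if p >= 0 else (l + b) ** 2 - p ** 2

def badness(natural, target, stretch, shrink):
    """Simplified 100*r**3 badness; None if the line cannot shrink enough."""
    if natural == target:
        return 0.0
    if natural < target:
        return 100 * ((target - natural) / stretch) ** 3 if stretch > 0 else math.inf
    if natural - shrink > target:
        return None
    return 100 * ((natural - target) / shrink) ** 3

def break_paragraph(widths, target, space=(1.0, 0.5, 0.33), tolerance=200, l=10):
    """Return lines (lists of word indices) with least total demerits, or None."""
    n = len(widths)
    w, y, z = space
    best = [math.inf] * (n + 1)  # best[j]: least demerits for setting words[0:j]
    prev = [-1] * (n + 1)        # prev[j]: start of the last line in that optimum
    best[0] = 0.0
    for j in range(1, n + 1):          # a line ends just after word j-1
        for i in range(j - 1, -1, -1):  # ... and starts at word i
            gaps = j - i - 1
            natural = sum(widths[i:j]) + gaps * w
            if j == n and natural <= target:
                b = 0.0  # last line: the fill glue absorbs any slack
            else:
                b = badness(natural, target, gaps * y, gaps * z)
            if b is None:
                break  # starting earlier only makes the line wider
            if b > tolerance or best[i] == math.inf:
                continue  # infeasible line, or no feasible way to reach word i
            d = best[i] + demerits_tex82(b, 0, l)
            if d < best[j]:
                best[j], prev[j] = d, i
    if best[n] == math.inf:
        return None
    lines, j = [], n
    while j > 0:
        lines.append(list(range(prev[j], j)))
        j = prev[j]
    return lines[::-1]
```

With l = 10 every line costs at least 100 demerits, so among otherwise equal layouts the one with fewer lines wins; this is the job the constant 1 did in the earlier formula.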
B, Blunders
A typical blunder, among the 50 or so errors of class B in the appendix, is illustrated
by error nos. 7 and 92. I had declared two symbolic constants in my program, new_line
(for one of the three states of TEX’s lexical scanner) and next_line (for the sequence of
ASCII codes carriage-return and line-feed, needed in SAIL output conventions).
Although the meanings were quite dissimilar, the names were quite similar; therefore
I confused them in my mind. The compiler did not detect any syntax error, because
both were legal in an output statement, so I had to detect and correct the bugs myself.
I could have avoided these errors by using a name like cr_lf instead of next_line; but
that sounds too jargony. A better alternative would have been new_line_state instead
of new_line.
D, Data disasters
My most striking error in data-structure updating was no. 630, which crept in when
I made change no. 625. The error needs a bit of background information before I can
explain it: using an idea of Luis Trabb Pardo, I was able to save one bit in each node
of TEX’s main data structures by putting the nodes for which the bit would have been set
(the so-called charnodes) into the upper part of the mem array, all other nodes into the
lower part. (It was very important to save this bit, because I needed at least 32
additional bits in every charnode.) One of the aspects of change no. 625 was to optimize
my data structure for representing mathematical subformulae that consist of a single
letter. I could recognize and simplify such a subformula by looking for a list that
consisted of precisely two elements, namely a charnode followed by a ‘kern node’ (for
an ‘italic correction’). A kern node is identified by (a) not being a charnode, i.e. not
having a high memory address, and (b) having the subfield type = 11.
I forgot to test condition (a). But my program still worked in almost every case,
because unsuitable lists of length 2 are rare as subformulae, and because the type
subfield of a charnode records a fount number. Amazingly, however, within one week
of my installing change no. 625, some user happened to create a mathematical list of
length 2 in which the second element was a character from fount number 11!
This example demonstrates that I was lucky to have a wide variety of users. Still,
such a bug might survive for years before it would cause trouble for anybody.
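The two-part test can be sketched in a toy memory model. Everything here is hypothetical: HI_MEM_MIN stands for the boundary above which charnodes live, and type_field maps an address to the contents of its type subfield (a fount number for a charnode, a node type otherwise). Setting check_a=False reproduces the forgotten condition (a).

```python
HI_MEM_MIN = 1000  # hypothetical: charnodes occupy addresses >= HI_MEM_MIN
KERN_TYPE = 11     # the type subfield value that identifies a kern node


def is_char_node(p: int) -> bool:
    """A charnode is recognized purely by its high memory address."""
    return p >= HI_MEM_MIN


def is_single_letter(lst: list[int], type_field: dict[int, int], check_a: bool = True) -> bool:
    """Recognize [charnode, italic-correction kern]; check_a=False reproduces the bug."""
    if len(lst) != 2 or not is_char_node(lst[0]):
        return False
    second = lst[1]
    if check_a and is_char_node(second):  # condition (a): a kern is not a charnode
        return False
    return type_field[second] == KERN_TYPE  # condition (b): type subfield equals 11


type_field = {1100: 3, 1200: 11, 300: 11}  # address 1200 holds a character from fount 11
print(is_single_letter([1100, 300], type_field))                  # True: a genuine kern
print(is_single_letter([1100, 1200], type_field))                 # False
print(is_single_letter([1100, 1200], type_field, check_a=False))  # True: the bug
```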
F, Forgetfulness
As I am writing this paper, I am trying to remember all the points I wanted to
explain about TEX’s evolution. Probably I will forget something, as I did when I was
writing the program for TEX.
Usually a bug of class F was easily noticed when I first looked at the corresponding
part of the code, with my walk-through-in-execution-order method of debugging. But
I would like to mention two of the F errors that were among the most difficult to find.
Both of them occurred in routines that had worked correctly the first few times they
were exercised; indeed, these routines had been called hundreds of times, with perfect
results, so I no longer suspected that they could be the source of any trouble.
Error no. 91 occurred in the memory allocation subroutine, the first time I ran out
of memory. That subroutine had the general form
The bug is obvious: I forgot to say ‘goto ovfl’ just before the label ‘found:’. And it is
also obvious why this bug was hard to find: I had lost my suspicions that this subroutine
could fail, but when it did fail it allocated one node right in the middle of another.
My linked data structure was therefore destroyed, but its defective fields did not cause
trouble until several hundred additional operations had been performed by the parts
of the program where I was still looking for bugs.
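The subroutine itself is not reproduced here, but the shape of the mistake can be imitated in Python, which has no goto: a search loop that should divert to an overflow exit when it finds nothing, but instead falls through to the code that would follow the label ‘found:’. The free-list model (a list of [start, length] blocks, with nodes taken from the top of a block) and the function names are assumptions for illustration, not TEX’s actual layout.

```python
def get_node_buggy(free_blocks: list[list[int]], size: int) -> int:
    """Take `size` words from the top of the first big-enough block (buggy)."""
    for block in free_blocks:  # each block is [start, length]
        if block[1] >= size:
            break
    # BUG: if no block was big enough we still arrive here ("found:"), as if
    # 'goto ovfl' had been forgotten; `block` is then simply the last block tried.
    block[1] -= size
    return block[0] + block[1]


def get_node(free_blocks: list[list[int]], size: int) -> int:
    """Same search, but an unsuccessful search exits to the overflow case."""
    for block in free_blocks:
        if block[1] >= size:
            break
    else:  # the loop finished without 'break': nothing fits
        raise MemoryError("memory overflow")  # the missing 'goto ovfl'
    block[1] -= size
    return block[0] + block[1]


blocks = [[10, 3]]  # only 3 free words, at addresses 10..12
print(get_node_buggy(blocks, 5))  # 8: a "node" at 8..12, overlapping words 8 and 9
print(blocks)                     # [[10, -2]]: the free list is now corrupt too
```

Nothing fails at the moment of the bad allocation; the damage shows up only later, when whatever owned words 8 and 9 is used again.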
Error no. 203 was even more difficult to find; it lurked in TEX’s get-next routine,
the subroutine that is executed far more than any other. Whenever TEX is ready to
see another token of input, get-next comes into action. Therefore, by the time I had
corrected 200 errors, get-next had probably obtained the correct next token more than
100,000 times; I considered it rock-solid reliable.
Since get-next is part of TEX’s ‘inner loop’, I had wanted it to be efficient. Indeed,
I learned later that the very first statement of get-next, ‘cur-cs ← 0’, is performed more
often than any other single statement of TEX82. (Empirical tests covering a period of
more than a year show that ‘cur-cs ← 0’ was performed more than 1.4 billion times on
Stanford’s SUAI computer. The get-avail routine, which is next in importance, was
invoked only about 438 million times.) Knowing that get-next was critical, I had tried
to avoid performing ‘cur-cs ← 0’ in my first implementations, in cases where I knew
that the value of cur-cs would not be examined by the consumers of get-next’s tokens.
In fact, I knew that cur-cs would be irrelevant in the vast majority of cases. (But I
also knew, and forgot, Hoare’s dictum that premature optimization is the root of all
evil in programming.)
Well, you can almost guess the rest. When I corrected my serious misunderstanding
of alignments, error nos. 108 and 110, I introduced a new case in get-next, and that
new case filled my thoughts so much that I forgot to worry about the ‘cur-cs ← 0’
operation. Still, no harm was done unless cur-cs was actually being looked at; TEX
would not fail unless \cr occurred in an alignment having a special sort of template
that required back-up in the parser. As before, the effect of this error was buried in a
data structure, where it remained hidden until much later. I found the bug only by
temporarily inserting new code that continually monitored the integrity of the data
structures. (Such code later became a standard diagnostic feature of TEX82; it can be
seen, for example, in section 167.)
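A toy version of such an integrity monitor, sketched in Python under the assumption of a memory made of numbered words, with used and free regions given as (start, size) pairs. It checks only that every region lies within bounds and that no word is claimed by two regions, which is exactly the damage an allocator placing one node inside another would cause. The name check_mem and its signature are hypothetical stand-ins, not the procedure in section 167.

```python
def check_mem(mem_size: int, used: list[tuple[int, int]],
              free: list[tuple[int, int]]) -> list[str]:
    """Return a description of every inconsistency found (empty list: memory looks sane)."""
    problems: list[str] = []
    owner: dict[int, str] = {}  # address -> description of the region that claims it
    for label, regions in (("used", used), ("free", free)):
        for start, size in regions:
            name = f"{label} region at {start}"
            if size <= 0 or start < 0 or start + size > mem_size:
                problems.append(f"{name} (size {size}) is out of bounds")
                continue
            for addr in range(start, start + size):
                if addr in owner:
                    problems.append(f"{name} overlaps {owner[addr]} at address {addr}")
                    break  # one report per region is enough
                owner[addr] = name
    return problems


print(check_mem(100, [(0, 10), (20, 5)], [(10, 10), (25, 75)]))  # []
print(check_mem(100, [(0, 10)], [(8, 4)]))
```

Calling a checker like this after every operation is slow, but it moves the point of detection from hundreds of operations later to the moment the damage occurs.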
L, Language lossage
Some of my errors (nos. 98, 295, 296, 480) were due to the fact that algorithms
involving floating-point numbers sometimes fail because of round-off errors. (I have
assigned these errors to class L instead of class A, although it was a close call.) TEX82
was designed to be portable so that it gives essentially identical results on all computers;
therefore I avoided floating-point calculations in critical parts of the new program.
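A small Python illustration of the hazard and of the scaled-integer alternative: floating-point addition depends on the order of evaluation, whereas sums of integers that represent multiples of 2⁻¹⁶ are exact as long as nothing overflows. The UNITY constant and the to_scaled helper are assumptions for this sketch.

```python
from fractions import Fraction

UNITY = 1 << 16  # 65536: one unit when values are stored as multiples of 2**-16


def to_scaled(x: Fraction) -> int:
    """Round an exact rational to the nearest multiple of 1/UNITY."""
    return round(x * UNITY)


# Floating point: the result depends on how the additions are grouped.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

# Scaled integers: exact, so every grouping (and every machine) agrees.
a, b, c = (to_scaled(Fraction(n, 10)) for n in (1, 2, 3))
print(a, b, c)                     # 6554 13107 19661
print((a + b) + c == a + (b + c))  # True
```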
Two other errors in my log belong unambiguously to class L: in nos. 63 and 827, I
failed to insert parentheses into a macro definition. As a result, when I used the macro
with text replacement, any frequent user of macros can guess what happened. (Namely,
in no. 827, I had declared the macro
hi-mem-stat-min = mem-top - 13
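The effect of a missing pair of parentheses is easy to reproduce with a naive text-substitution macro processor. This Python sketch is hypothetical: the expand helper, the underscore spelling of the names and the value 30000 are invented for illustration. It shows how a body without parentheses changes meaning once it is pasted into a larger expression.

```python
import re


def expand(expr: str, macros: dict[str, str]) -> str:
    """Replace each macro name by its body, purely textually."""
    for name, body in macros.items():
        expr = re.sub(rf"\b{re.escape(name)}\b", lambda _m: body, expr)
    return expr


unsafe = {"hi_mem_stat_min": "mem_top-13"}
safe = {"hi_mem_stat_min": "(mem_top-13)"}
env = {"mem_top": 30000}  # hypothetical value

expr = "mem_top-hi_mem_stat_min"  # intended value: 13
print(expand(expr, unsafe), eval(expand(expr, unsafe), {}, env))  # mem_top-mem_top-13 -13
print(expand(expr, safe), eval(expand(expr, safe), {}, env))      # mem_top-(mem_top-13) 13
```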
M, Mismatches
When I write a program I tend to forget the exact specifications of its subroutines.
One of my frequent flubs is to blur the distinction between an object and a pointer to
that object. In TEX82, for example, I noticed when I got to error no. 79 that I had
called vpackage (p, . . . ) where p pointed to the first node of a vlist, whereas in the
declaration of vpackage I had assumed parameters of the form (h, . . . ) where h points
to a list header; thus link(h), not h itself, was assumed to point to the first list item.
The compiler did not catch the error because both h and link(h) were of type pointer.
While fixing this bug it occurred to me that vpackage was an oft-used subroutine
and that I might have made the same mistake more than once. So I looked closely at
each of the 26 places I had called vpackage, and the results proved that I was remarkably
inconsistent: I had specified a list head 14 times, and a direct pointer 12 times!
(Fortunately there was not a 13-13 split; that would have been unlucky.)
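The two calling conventions are easy to confuse because both arguments have the same type. In this drastically simplified Python stand-in (Node and this vpackage are hypothetical; the real routine computes box dimensions), the function expects a list head, so passing a direct pointer silently skips the first item instead of failing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    height: int
    link: Optional["Node"] = None


def vpackage(h: Node) -> int:
    """Total height of a vlist; h must be a list HEAD, so link(h) is the first item."""
    total, p = 0, h.link
    while p is not None:
        total += p.height
        p = p.link
    return total


first = Node(10, Node(20, Node(30)))
head = Node(0, first)   # header node whose link points at the first item
print(vpackage(head))   # 60: the intended calling convention
print(vpackage(first))  # 50: a direct pointer quietly drops the first item
```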
This error reminded me that I should always check the entire program whenever I
notice a mistake; failures tend to recur. In fact, several errors of TEX82 (nos. 803, 813,
815, 837) were first noticed when I was debugging similar portions of METAFONT.
R, Robustness
Most of the changes of type R were introduced to keep TEX from crashing when
users supply input that does not obey the rules. But some of the Rs in the log are
intended to keep TEX alive even when other parts of TEX are failing, because of my
programming errors or because somebody else is trying to produce a new modification
of TEX.
Thus, for example, in nos. 99 and 123, I redesigned two of my procedures so that
they would produce a symbolic printout of given data structures in memory even when
those data structures were malformed. I made it possible to obtain meaningful output
from arbitrary bit configurations in memory, so that while debugging TEX I could
interactively look at garbage and guess how it might have arisen.
One of the most recent changes to TEX, no. 846, has the same flavour: the parameter
to show-node-list was redeclared to be of type integer instead of type pointer, because
buggy calls on show-node-list might not supply a valid pointer.
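A minimal sketch of such a defensive printer, under hypothetical assumptions: memory is a Python list whose entries are (kind, link) pairs, 0 is the null pointer, and the pointer argument is an arbitrary integer that is validated before use, in the spirit of declaring the parameter as integer rather than pointer.

```python
NULL = 0  # null pointer in this toy memory


def show_node_list(mem: list, p: int, max_nodes: int = 20) -> list[str]:
    """Describe the list starting at p without trusting any pointer or node."""
    lines: list[str] = []
    visited: set[int] = set()
    while p != NULL:
        if not isinstance(p, int) or not 0 < p < len(mem):
            lines.append(f"CLOBBERED pointer {p!r}")
            break
        if p in visited:
            lines.append(f"CYCLE back to {p}")
            break
        if len(visited) >= max_nodes:
            lines.append("etc.")
            break
        visited.add(p)
        entry = mem[p]
        if not (isinstance(entry, tuple) and len(entry) == 2):
            lines.append(f"[{p}] unrecognizable data {entry!r}")
            break
        kind, nxt = entry
        lines.append(f"[{p}] {kind}")
        p = nxt
    return lines


mem = [None, ("hlist", 2), ("glue", 3), ("kern", 0)]
print(show_node_list(mem, 1))                 # ['[1] hlist', '[2] glue', '[3] kern']
print(show_node_list([None, ("a", 99)], 1))   # ['[1] a', 'CLOBBERED pointer 99']
```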
S, Surprises
The most serious errors were those due to my global misunderstandings of how the
system fits together. The final error in TEX78 was of type S, and I suppose the final
error of TEX82 will be yet another surprise.
Let me mention just two of these. The first is extremely embarrassing, but it makes
a good story. TEX produces DVI files as output, where DVI stands for Device
Independent. The DVI language is like a machine language, consisting of 8-bit instruction
codes followed in certain cases by arguments to the instructions. Two of the
simplest instructions of DVI language are push (code 141) and pop (code 142). It turns
out that TEX might output push followed immediately by pop in various circumstances,
and this needlessly clutters up the DVI file; so I decided to optimize things a bit by
checking to see whether the final byte in my output buffer was push before TEX would
output a pop. If so, I could cancel both instructions. This technique even made it
possible to detect and cancel long redundant sequences such as push push pop push
push pop pop pop. Naturally, I checked to see that the buffer had not been entirely
cancelled out when I tested for such an optimization. (I was not 100 per cent stupid.)
But I failed to realize that the byte just preceding pop might just happen to be 141
(the binary code for push) when it was the final operand byte of some other instruction.
Ouch!
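The trap can be reproduced in a few lines of Python. Push (141) and pop (142) are as stated above; set1 (128), a real DVI opcode that is followed by a one-byte character code, supplies the unlucky operand. The DviWriter class is a hypothetical sketch of one safe fix (remember where each command begins), not the code TEX actually uses.

```python
PUSH, POP, SET1 = 141, 142, 128  # DVI opcodes; set1 is followed by a 1-byte character code


def pop_buggy(buf: bytearray) -> None:
    """Emit pop, cancelling a push by inspecting only the final byte (the bug)."""
    if buf and buf[-1] == PUSH:
        del buf[-1]  # may delete an operand byte that merely equals 141!
    else:
        buf.append(POP)


class DviWriter:
    """Safer sketch: remember where every command starts; cancel only a push command."""

    def __init__(self) -> None:
        self.buf = bytearray()
        self.starts: list[int] = []  # buffer offsets at which each command begins

    def command(self, opcode: int, operands: bytes = b"") -> None:
        self.starts.append(len(self.buf))
        self.buf.append(opcode)
        self.buf.extend(operands)

    def pop(self) -> None:
        last_is_push = (
            bool(self.starts)                         # buffer not entirely cancelled
            and self.starts[-1] == len(self.buf) - 1  # final command is one byte long
            and self.buf[-1] == PUSH                  # and that byte is a push opcode
        )
        if last_is_push:
            del self.buf[-1]
            self.starts.pop()
        else:
            self.command(POP)


buf = bytearray([PUSH, SET1, 141])  # push; set1 with character code 141
pop_buggy(buf)
print(list(buf))  # [141, 128]: the character code was "cancelled" instead of a push

w = DviWriter()
w.command(PUSH)
w.command(SET1, bytes([141]))
w.pop()
print(list(w.buf))  # [141, 128, 141, 142]
```

Because the writer still cancels pairs one at a time, a sequence such as push push pop push push pop pop pop collapses to nothing, just as the optimization intended.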
The other S bug I want to discuss is truly an example of global misunderstanding,
because it arose in connection with my misperceptions about \global definitions in
TEX documents. Users can define control sequences such as \abc inside a TEX
‘group’, which is essentially a ‘block’ in the sense of Algol scope rules. At the end of
a group, local definitions are rescinded and control sequences revert to the meanings
they had at the beginning of the group. In my first implementation of TEX78 I went
even further: If \abc was defined inside a group but not before the group had begun,
I actually removed \abc from the hash table when the group ended.
There is one exception, however, to TEX’s local scope rules (and it is usually the
exceptions that lead to surprises). Users can state that a definition is \global; this
means that the new definition will survive at the end of the current group, unless it
has been globally redefined again. Therefore my implementation removed control
sequences from the hash table at group endings only when they had not been globally
defined.
That caused bug no. 422, which was identical to one of the first serious bugs I had
ever encountered when learning to program in the 1950s: deletions from an ‘open’ hash
table might make other keys inaccessible, unless the deletions occur in LIFO order,
or unless the deletion algorithm takes special precautions to relocate keys in the table.
(See my book Sorting and Searching,²² pp. 526-527, where I say, in italics, ‘The
obvious way to delete records from a scatter table doesn’t work.’) Alas, I had deleted
the control-sequence records in the ‘obvious way’ in TEX78, not realizing that global
definitions destroyed the LIFO order.
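The failure is easy to see in a tiny Python model of open addressing with linear probing. The class, its method names and the forced-collision hash function are hypothetical, chosen only to make two keys share a probe sequence. Removing keys in the reverse of the order they were inserted is harmless here; removing an older key while a newer colliding key remains leaves a hole that ends the newer key’s search too early.

```python
from typing import Callable, Iterator, Optional


class LinearProbeTable:
    """Open addressing with linear probing and the 'obvious' deletion."""

    def __init__(self, size: int, hash_fn: Callable[[str], int]):
        self.slots: list[Optional[tuple[str, object]]] = [None] * size
        self.hash_fn = hash_fn

    def _probe(self, key: str) -> Iterator[int]:
        i = self.hash_fn(key) % len(self.slots)
        for _ in range(len(self.slots)):
            yield i
            i = (i + 1) % len(self.slots)

    def insert(self, key: str, value: object) -> None:
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None or slot[0] == key:
                self.slots[i] = (key, value)
                return
        raise OverflowError("hash table full")

    def lookup(self, key: str) -> Optional[object]:
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:  # an empty slot ends the search
                return None
            if slot[0] == key:
                return slot[1]
        return None

    def delete_obvious(self, key: str) -> None:
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:
                return
            if slot[0] == key:
                self.slots[i] = None  # leaves a hole in other keys' probe sequences
                return


t = LinearProbeTable(8, lambda k: 0)  # every key collides at slot 0
t.insert("\\abc", 1)
t.insert("\\xyz", 2)                  # lands in slot 1, after \abc
t.delete_obvious("\\abc")
print(t.lookup("\\xyz"))              # None: \xyz is still stored but unreachable
```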
To fix bug no. 422, I could not patch the definition procedure by using Algorithm
6.4R from my book,²² because the organization of TEX did not allow for relocation of
keys. So I needed to change the hash table algorithm from linear probing to chaining,
which supports arbitrary deletions. This change was not as painful as it might have
been at this late date (August 1979), because I had needed an excuse anyway to
overcome my initial hash table design. In order to keep the original implementation
simple, I had decided to require that control sequence names be essentially unique
when restricted to their first six letters. Such a restriction was quite reasonable when
I was to be the only user of TEX; but it was becoming intolerable when the number
of users began to grow into the thousands. Therefore change no. 422 not only altered the
hash discipline, it also changed the entire representation mechanism so that identifiers of
arbitrary length could be accommodated.
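By contrast, separate chaining keeps each bucket as its own list, so any key can be removed without disturbing the others, and full keys of any length are compared. A minimal Python sketch (the class and its methods are hypothetical, not TEX’s data structure):

```python
from typing import Callable, Optional


class ChainedTable:
    """Separate chaining: each bucket is its own list, so any key can be deleted."""

    def __init__(self, size: int = 7, hash_fn: Callable[[str], int] = hash):
        self.buckets: list[list[tuple[str, object]]] = [[] for _ in range(size)]
        self.hash_fn = hash_fn

    def _bucket(self, key: str) -> list[tuple[str, object]]:
        return self.buckets[self.hash_fn(key) % len(self.buckets)]

    def insert(self, key: str, value: object) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def lookup(self, key: str) -> Optional[object]:
        for k, v in self._bucket(key):
            if k == key:  # whole names are compared, not just a prefix
                return v
        return None

    def delete(self, key: str) -> None:
        bucket = self._bucket(key)
        bucket[:] = [(k, v) for k, v in bucket if k != key]


t = ChainedTable(hash_fn=lambda k: 0)  # force every key into one chain
for i, name in enumerate(["\\abc", "\\abcdefgh", "\\abcdefgi"]):
    t.insert(name, i)
t.delete("\\abc")                      # delete the oldest key first
print(t.lookup("\\abcdefgh"), t.lookup("\\abcdefgi"))  # 1 2
```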
And that was not the end of the story. Another year and a half went by before I
realized (in no. 493) that TEX allows declarations like
    \def\abc{...}
    \global\def\xyz{...\abc...}
within a group. In such cases I could not eliminate \abc from the hash table at the
end of the group, because a reference to \abc still survived within \xyz. I finally
decided not to delete anything from the hash table (although I did provide a mechanism
to prevent unwanted keys from ever getting in; see nos. 294 and 769).
How did such serious bugs remain undetected for so long? They lay dormant because
normal usage of TEX does not require complicated interactions between local and
global definitions in groups. Most formatting is simpler than this; even complex books
such as The Art of Computer Programming and the TEX manual itself do not need such
generality. But if I had used the TRIP test methodology in the early days, I would
have found and corrected the local/global problems right at the start. This experience
suggests that all software systems be subjected to the meanest, nastiest torture tests
imaginable; otherwise they will almost certainly continue to exhibit bugs for years after
they have begun to produce satisfactory results in large applications.
T, Typographical trivia
The typographical errors of TEX were not especially significant, but I will mention
two of them (nos. 69 and 86), where my original SAIL code looked like this:
C, Clean-ups
The stickiest issue in TEX has always been the treatment of blank spaces. Users
tend to insert spaces in their computer files so that the files look nice, but document
processors must also treat spaces as objects that appear in the final output. Therefore,
when you see documents nowadays that have been prepared by systems other than
TEX, you often find cases where double spaces appear incorrectly between words; and
when you see documents prepared with TEX, you run into cases where a necessary
space between words has disappeared. I kept searching for rules that would be simple
enough to be easily learned, yet natural enough that they could be applied almost
unconsciously. I finally concluded that no such rules existed, and I opted for the best
compromise I could find.
Several of the log entries refer to the question of optional spaces after a macro
definition. In no. 133, I decided to ignore a space that appears there; this was prompted
by experiences recorded in my comments following nos. 115 and 119. But no. 133 caused
a timing problem in no. 560, because the macro definition had not been fully processed
when TEX wanted to check for the optional space; if the user invoked the macro
immediately, instead of putting a space there, TEX was not ready to respond. Finally
in no. 606 I came to the conclusion that TEX users will best be able to keep their sanity
if I do not ignore spaces after definitions; then dozens of similar-appearing cases all
have consistent rules.
(See also no. 220, for space after $$; nos. 361, 708, 720 and 723, for space after
constants; no. 440, for space after active characters; and no. 632, for space after ‘\Y.)
G, Generalizations
TEX continued to grow new capabilities as people would present me with new
applications. When I could not handle the new problem nicely with the existing TEX,
I would usually end up changing the system. (But I kept the changes minimal, because
I always wanted to finish and get on with other things. More about that later.)
Such generalizations were often built incrementally on the shoulders of their prede-
cessors. For example, the original TEX78 had \output and \mark and macro
definitions, which scanned and remembered lists of tokens, but there was no good way
to assign a list of tokens to a ‘token list variable’ without causing macro expansion.
Then TEX82 added a feature called \everypar, which Arthur Keller had long been
lobbying for. One day I noticed that I could solve a user’s problem in a tricky way by
temporarily using \everypar to store a list of tokens. This was quite different from
the intended use of \everypar, of course; so I introduced a new primitive operation
called \tokens for such purposes (no. 559). Later, \everypar spawned several
descendants called \everymath and \everydisplay (no. 568), \everyhbox and
\everyvbox (no. 649), \everyjob (no. 657), \everycr (no. 688). I eventually found
applications where \tokens was not enough by itself and I needed to borrow one of
the \every features temporarily to do some non-standard hackery. So I finally replaced
\tokens by an array of 256 registers called \toks (no. 713), analogous to TEX’s
existing arrays of registers for integers, dimensions, boxes and glue. TEX82 also
acquired the ability to make assignments between different kinds of token-list variables
(no. 746). In such ways I tried to keep the design ‘orthogonal’ as the language grew.
Of course every language designer likes to keep a language simple by applying
Occam’s razor. I was pleased to discover early in 1977 that simple primitive operations
involving boxes, glue and penalties could account for many of the fundamental oper-
ations of typesetting. This was a real unification of basic principles, and it turned out
to be even better when I realized that the concepts of ordinary line-breaking applied
also to tasks that seemed much harder.¹³ But I also fooled myself into thinking that
TEX had fewer primitives than it really did, by ‘overloading’ operations that were
essentially independent and calling them single features.
For example, my original design of TEX78 would break paragraphs into lines by
ignoring all lines whose badness exceeded 200. Later (no. 104) I made this threshold
value user-settable by introducing a new primitive called \jpar. Setting \jpar=2
was something like setting \tolerance=200 in TEX82; but I also included a peculiar
new convention: if \jpar was odd, the paragraphs would be set with ragged right
margins, otherwise they would be justified to the full width!
Thus, in my attempt to minimize primitives, I had loaded two independent ideas
onto a single parameter. I had also packed half a dozen different kinds of diagnostic
output into a single number called \tracing (see no. 199), whose binary digits were
examined individually when TEX was deciding whether to trace parts of its operations.
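What such overloading means for the program can be sketched in Python. Both helpers are hypothetical: decode_jpar assumes the whole value serves as the badness threshold while its parity selects ragged right, and the bit numbering used by tracing_enabled is invented, since the actual assignments of \tracing are not given here.

```python
def decode_jpar(jpar: int) -> tuple[int, bool]:
    """Split an overloaded parameter into (badness threshold, ragged right?)."""
    return jpar, jpar % 2 == 1


def tracing_enabled(tracing: int, bit: int) -> bool:
    """Test one binary digit of a packed diagnostic word (bit numbering hypothetical)."""
    return ((tracing >> bit) & 1) == 1


print(decode_jpar(200))  # (200, False): justified
print(decode_jpar(201))  # (201, True): asking for ragged right also moves the threshold
print([b for b in range(6) if tracing_enabled(0b100101, b)])  # [0, 2, 5]
```

Under these assumptions the two ideas cannot be adjusted independently, which is the coupling that separate parameters avoid.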
Then I began to see the need for more user-settable numbers, and I shuddered to
think at the resultant multiplicity of new primitives. So I replaced both \jpar and
I, Interactions
About 15 per cent of the errors in the TEX log have been classified type I. The
main issue in such cases is to help users identify and recover from errors in their source
programs, and this is always problematical because there are so many ways to make
mistakes. ‘When your error is due to misunderstanding rather than mistyping, ... TEX
can only explain what looks wrong from its own viewpoint; such an explanation is
bound to be mysterious unless you understand the machine’s attitude’.¹⁵ Which you
don’t.
Still, I kept trying to make TEX respond more productively, and every such change
was logged as an ‘error’ in my original design. The most memorable error of this type
was probably no. 213, when I first realized how nice it would be if I could insert a
token or two that TEX could read immediately, instead of aborting a run and starting
from scratch. (This was soon followed by no. 242, when deletion of tokens was also
allowed in response to an error message.) I would never have thought of these
improvements if I had not participated in the implementation and testing of TEX, and
I have often wished for similar features in the compilers I have used since. This one
feature must have saved me hundreds of hours as a TEX user during recent years.
Another improvement in interaction did not occur to me until several months and
several hundred pages of output later. Error no. 338 records the blessed day when I
gave TEX the ability to track ‘runaways’, parts of the program that were being processed
in the wrong mode because of missing right delimiters. (Further refinements to that
change were logged as entry nos. 344, 426 and 793.) Without such provisions, errors
that TEX could not have detected until long after their appearance would have been
much harder to track down.
There was another significant improvement in interaction that never made it into
my error log, because I included it in the original TEX82 without ever putting it into
TEX78. This is the short-display procedure, for showing the contents of ‘overfull boxes’
and such things in an abbreviated form easily understood by novice users. The
short-display idea was invented by Ralph Stromquist, who installed it in his early
version of TEX at the University of Wisconsin.
P, Portability
The first changes of type P were simply enhancements to the comments in my SAIL
program, but the advent of WEB made it possible for TEX to become truly independent
of the machine and operating system it was being run on.
Change no. 633 is perhaps the most instructive class-P modification: I decided to
guarantee compatibility between DEC-like systems (which break the source file into
lines according to the appearance of ASCII carriage-return characters) and IBM-like
systems (which have fixed-length source lines reminiscent of 80-column cards),* in the
following way: whenever TEX reads a line of input, on any system, it automatically
removes all blank spaces that appear at the right end. The presence or absence of such
blanks therefore cannot influence the behaviour of TEX in any way. An ASCII file
whose lines are at most 80 characters long (as defined by carriage returns, with or
without blanks in front of those carriage returns) can be converted to a file of 80-
character records that will produce identical results with TEX, simply by padding each
line with blanks.
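The equivalence can be checked mechanically. In this Python sketch, input_line and to_fixed_records are hypothetical names; the first strips blank spaces (and only blank spaces) at the right end, as described above, and the second simulates an IBM-like system by padding every line to 80 characters.

```python
def input_line(raw: str) -> str:
    """Remove all blank spaces at the right end of a line of input."""
    return raw.rstrip(" ")


def to_fixed_records(lines: list[str], width: int = 80) -> list[str]:
    """Simulate fixed-length records by padding each line with blanks."""
    for line in lines:
        if len(line) > width:
            raise ValueError(f"line longer than {width} characters: {line!r}")
    return [line.ljust(width) for line in lines]


dec_lines = ["\\hbox{abc}", "x = y  ", ""]  # variable-length lines, one with trailing blanks
ibm_lines = to_fixed_records(dec_lines)
print(all(len(r) == 80 for r in ibm_lines))                            # True
print([input_line(r) for r in ibm_lines] == [input_line(s) for s in dec_lines])  # True
```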
Change no. 791 carried no. 633 to its logical conclusion.
From the beginning, I wanted TEX to produce documents of the highest possible
typographical quality. The time had come when computer-produced output no longer
needed to settle for being only ‘pretty good’; I wanted to equal or exceed the quality
of the best books ever printed by photographic methods.
As Kernighan and Cherry have said, ‘The main difficulty is in finding the right
numbers to use for esthetically pleasing positioning. ... Much of this time has gone
into two things: fine-tuning (what is the most esthetically pleasing space to use between
the numerator and denominator of a fraction?), and changing things found deficient
by our users (shouldn’t a tilde be a delimiter?)’.²³
I too had trouble with numerators and denominators: change no. 229 increased the
amount of space surrounding the bar line in displayed fractions, and I should have
made a similar change to fractions in text. (Page 68 of the new Volume 2 turned out
to be extremely ugly because of badly spaced fractions.) TEX82 was able to improve
the situation because of my experiences with TEX78, but even today I must take
special precautions in my TEX documents to get certain square roots to look right.
*Paradoxically, DEC has also introduced the VMS operating system, which has fixed-length lines that can include
troublesome carriage-returns. But that is another story.
to TEX whatsoever. I wanted TEX to produce the highest quality, sure, but I wanted
to achieve that with the minimum amount of work on my part.
At the end of almost every day between 29 March 1978 and 29 March 1980, I felt
that TEX78 was a complete system, containing no bugs and needing no further
enhancements. At the end of almost every day since 9 September 1982, I have felt
that TEX82 was a complete system, containing no bugs and needing no further
enhancements. Each of the subsequent steps in the evolution of TEX has been viewed
not as an evolutionary step towards a vague distant goal, but rather as the final
evolutionary step towards the finally reached goal! Yet, over time, TEX has changed
dramatically as a result of many such ‘final steps’.
Was this horizon-limiting attitude harmful, or was it somehow a blessing in disguise?
I am pleased to see that TEX actually kept getting simpler as it kept growing, because
the new features blended with the old ones. I was constantly bombarded by ideas for
extensions, and I was constantly turning a deaf ear to everything that did not fit well
with TEX as I conceived it at the time. Thus TEX converged, rather than diverged,
to its final form. By acting as an extremely conservative filter, and by believing that
the system was always complete, I was perhaps able to save TEX from the ‘creeping
featurism’ that destroys systems whose users are allowed to introduce a patchwork of
loosely connected ideas.
If I had time to spend another ten years developing a system with the same aims as
TEX (if I were to start all over again from scratch, without any considerations of
compatibility with existing systems), I could no doubt come up with something that
is marginally better. But at the moment I cannot think of any big improvements. The
best such system I can envision today would still look very much like TEX82; so I
think this particular case study in program evolution has proved to be successful.
Of course I do not mean to imply that all problems of typography have been solved.
Far from it! There still are countless important issues to be studied, relating especially
to the many classes of documents that go far beyond what I ever intended TEX to
handle.
CONCLUSIONS
My purpose in this paper has been to describe what I think are the most significant
aspects of the experiences I had while developing TEX, basing this on a study of more
than 800 errors that I noted down in log books over the years. I have tried to interpret
many specific facts and observations in a sufficiently general way that readers may
understand how to apply similar concepts to other software developments.
In Volume 1 of The Art of Computer Programming, I wrote:
Well, I hope that my error log in the appendix below, especially the first 237 items
(which relate specifically to debugging), will be useful somehow to people who study
the debugging process.
But if you ask whether keeping such a log has helped me learn how to reduce the
number of future errors, my answer has to be no. I kept a similar log for errors in
METAFONT, and there was no perceivable reduction. I continue to make the same
kinds of mistakes.
What have I really learned, then? I think I have learned, primarily, to have a better
sense of balance and proportion. I now understand the complexities of a medium-size
software system, and the ways in which it can be expected to evolve. I now understand
that there are so many kinds of errors, we cannot stamp them out by systematically
eliminating everything that might be ‘considered harmful’. I now understand enough
about my propensity to err that I can accept it as a fact of life; I can now be convinced
more easily of my fallacy when I have made a mistake. Indeed, I now strive energetically
to find faults in my own work, even though it would be much easier to look for
assurances that everything is OK. I now look forward to making (and correcting)
hundreds of future errors as I write Volume 4 of The Art of Computer Programming.
ADDENDUM: FIFTEEN MONTHS MORE
As I mentioned above, I began to write this paper in May 1987, but I decided to wait
before publication until more time had gone by. Then I could present a ‘complete’
and ‘final’ record of TEX’s errors.
Now it is September 1988, and I have decided to bring this paper to a possibly
premature conclusion, because I am scheduled to present it at a conference. TEX
still has not shown encouraging signs of becoming quiescent; indeed, sixteen more
entries have entered the error log since May 1987, including three as recent as June
1988. Therefore it still is not the right moment to manufacture TEX on a chip!
All errors known to me as of 1 September 1988 are now included in the appendix
to this paper; the total has now reached 865. I plan to publish a brief note ten years
from now, bringing the list to its absolutely final form.
I have been paying a reward to everyone who discovers new bugs in TEX, and
doubling the amount every year. Last December I made two payments of $40.96 each,
and my chequebook has been hit for five $81.92 payments in recent months. I am
desperately hoping that this incentive to discover the final bugs will produce them
before I am unable to pay the promised amount. (Surely in 1998 I won’t be writing
cheques for $83,886.08?)
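The figures are consistent: treating $81.92 as the 1988 amount and working in whole cents, ten annual doublings up to 1998 give 8192 × 2¹⁰ = 8,388,608 cents. A short Python check (the function names are hypothetical):

```python
def reward_cents(year: int, base_year: int = 1988, base_cents: int = 8192) -> int:
    """Reward that doubles every year, anchored at $81.92 in 1988 (kept in cents)."""
    if year >= base_year:
        return base_cents << (year - base_year)
    return base_cents >> (base_year - year)


def dollars(cents: int) -> str:
    """Format a number of cents as a dollar amount with thousands separators."""
    return f"${cents // 100:,}.{cents % 100:02d}"


for year in (1987, 1988, 1998):
    print(year, dollars(reward_cents(year)))
# 1987 $40.96
# 1988 $81.92
# 1998 $83,886.08
```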
As I expected, half of the most recent errors have fallen into the surprise (S)
category, even though surprises, by definition, are unexpected. But one of the others
(error no. 854) was perhaps the most surprising of all, because it was the result of a
terrible algorithm by a person who certainly should have known better (me). I wanted
to multiply the two’s-complement fixed-point number
by the positive quantity $Z/2^{16}$, where $Z$ is an integer, $2^{26} \le Z < 2^{27}$, obtaining an
answer of the form $P/2^{16}$, where $P$ is an integer, $|P| < 2^{31}$; all intermediate quantities
in the calculation were required to be less than $2^{31}$ in absolute value. My program did
this by computing
C ← 16 * Z;
Z ← Z div 16;
P ← ((a3 * Z) div 256 + a2 * Z) div 256 + a1 * Z − C;
Z ← Z div 16;
P ← ((a3 * Z) div 256 + a2 * Z) div 256 + (a1 − 256) * Z
The new version also avoids problems on certain computers when O0 and O1 are
negative; that was error no. 863. (Of course, when TEX is this close to running out of
memory, it probably will not survive much longer anyway. I am grasping at straws.
But I might as well grasp intelligently.)
ACKNOWLEDGEMENTS
I have already mentioned that the TEX project has had hundreds of volunteers who
helped to guide me through all these developments. Their names can be found in the
rosters of the TEX Users Group; I couldn’t possibly list them all here. Luis Trabb
Pardo and David R. Fuchs were my ‘right-hand men’ for TEX78 and TEX82, respectively.
The project received generous financial backing from several independent sources,
notably the System Development Foundation, the U.S. National Science Foundation,
and the Office of Naval Research. The material on which this report has been based
is now housed in the Stanford University Archives; I wish to thank the archivist,
Roxanne L. Nilan, for her friendly co-operation. The preparation of this paper has
been supported by U. S. National Science Foundation grant CCR-86-10181. Thanks
are due to the referee who helped me to remove errors not from TEX but from this
paper. And above all, I want to thank my wife, Jill, for ten years of exceptional
tolerance; software development is much more demanding than the other things I
usually do. Jill also helped me to design the format for the appendix that follows.
10 Mar 1978
1 Rename a few external variables to make their first six letters unique. L
2 Initialize escape-char to -1, not 0 [it will be set to the first character input]. §240 D
3 Fix bug: The test ‘id < ‘200’ was supposed to distinguish one-letter identifiers
from longer (packed) ones, but negative values of id also pass this test. §356 L
4 Fix bug: I wrote ‘while α ∧ (β ∨ γ)’ when I meant ‘while (α ∧ β) ∨ γ’. §259 B
5 Initialize the input routines in INITEX [at this time a short, separate program
not under user control], in case errors occur. §1337 R
6 Don’t initialize mem in INITEX; it wastes time. §164 E
7 Change ‘new-line’ [which denotes a lexical scanning state] to ‘next-line’ [which
denotes carriage-return and line-feed] in print commands. B
8 Include additional test ‘mem[p] ≠ 0 ∧’ in check-mem. §168 F
9 Fix inconsistency between the eq-level conventions of macro-def and eq-define. §277 M
About six hours of debugging time today.
INITEX appears to work, and the test routine got through start-input, chcode
[the TeX78 command for assigning a cat-code], get-next, and back-input the
first time.
11 Mar 1978
10 Insert space before ‘(’ on terminal when opening a new file. §537 I
11 Put ‘p ← link(p)’ into the loop of show-token-list, so that it doesn’t loop
forever. §292 F
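#11 is the classic linked-list traversal that forgets to advance its pointer. The Python sketch below models mem as two parallel arrays, info and link, with 0 standing for the null pointer; these names and the limit argument are assumptions for illustration, not TeX78’s actual declarations.

```python
NULL = 0  # hypothetical null pointer; index 0 is never a real node here


def show_token_list(info: list, link: list, p: int, limit: int = 1000) -> list:
    """Return the info fields of the list that starts at node p."""
    shown = []
    while p != NULL and len(shown) < limit:
        shown.append(info[p])
        p = link[p]  # the step #11 adds: p <- link(p); without it p never changes
    return shown


# A three-node list 1 -> 2 -> 3 -> null.
info = [0, 65, 66, 67]
link = [0, 2, 3, NULL]
print(show_token_list(info, link, 1))  # [65, 66, 67]
```

The limit parameter is an extra safeguard in the spirit of the later robustness changes (#99): it bounds the output even if a bug has made the list circular.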
12 Shift the last item found by scan-toks into the info field. [With SAIL all packing
of fields was done by arithmetic operations, not by the compiler.] §474 L
12 ↦ 13 Fix the previous bugfix: I shifted by the wrong amount. §474 B
14 Add a feature that prints a warning when the end of a file page occurs within
a macro definition or call. [System dependent.] §336 I
Unintended bugs in my test routine [a format intended eventually to typeset
The Art of Computer Programming] helped check out the error recovery
mechanisms. For example, I had ‘\Zftf#]’ instead of ‘\lft.C##]’ inside a
macro, and three cases of improper { and } nesting.
15 Add the forgotten case ‘set-font:’ to eq-destroy. §275 F
16 Change \require to \input. §376 C
17 Add code for the case cur-cmd = 0 [later known as the case ‘t ≥ cs-token-flag’]
when scanning a token list. §357 F
That’s the first “big” error I’ve spotted so far.
18 Introduce a ‘d’ option in the error routine, to facilitate debugging. §84 I
19 Assign a floating-point constant ignore-depth to prev-depth, instead of assigning
the integer constant flag [since prev-depth is type real in TeX78]. §215 L
20 Improve the readability and spacing of show-node-list output. §182,187 I
21 Set the variable v before using the case construction in show-node-list, because
there’s one case where v didn’t receive a value [as part of the field unpacking]. §182 F
About seven hours today.
12 Mar 1978
One hour to enter yesterday’s corrections and recompile.
At this point TeX correctly located further unintended syntax errors in acphdr
[the test file].
22 Insert debug-help into succumb, giving a chance to look at memory before the
system dies. §93 I
23 Use eq-destroy wherever necessary in unsave. §283 D
THE ERRORS OF TEX 645
24 Change ‘t ← (t − 1) mod 8’ to ‘t ← (t − 1) land 7’ in id-name, since SAIL
has −1 mod 8 = −1. [At this time, id-name is a routine that unpacks
control sequence names, according to a scheme that will become obsolete
after change #422.] L
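#24 turns on the difference between a remainder that keeps the sign of the dividend and a bitwise mask. The Python sketch below emulates the SAIL behavior reported there with math.fmod (Python’s own % operator already gives 7 for -1 % 8, so it would hide the bug). It assumes two’s-complement integers, as on the PDP-10, where ‘land 7’ agrees with reduction modulo 8.

```python
import math


def sail_mod(a: int, b: int) -> int:
    """Remainder with the sign of the dividend, as #24 reports for SAIL's mod."""
    return int(math.fmod(a, b))


t = 0
print(sail_mod(t - 1, 8))  # -1: outside the range 0..7 that id-name expects
print((t - 1) & 7)         # 7: masking the low three bits wraps around correctly
```

For non-negative arguments the two operations agree; they part ways only when the counter steps below zero, which is exactly the wrap-around case in id-name.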
25 Remove the space that appears at end of paragraph. (I hadn’t anticipated
that.) §816 S
26 Throw away unwanted line-feed after getting a carriage-return in response to
in-chr-w [a system routine for input from the terminal]. §83 L
27 Delete spurious call to flush-list in end-token-list. §324 B
Why did I make such a silly mistake?
28 Fix bug in get-x-token: I forgot to say ‘macro-call’ (which is the main point of
that routine)! §380 F
While tracking that bug down, I found out incidentally that kerning is okay.
Also TeX correctly caught an error Op for Opt.
29 Fix bug in scan-spec (while instead of repeat). §404 L
30 Make the table entries for \hfill and \hskip consistent with the program
conventions. §1058 M
31 Disable unforeseen coercion: When scan-spec put hsize on save-stack, the value
changed from real to integer. §645 L
32 Use ‘*’ instead of ‘-1.0’ for running dimensions of rules in show-node-list. §176 I
33 Clear mem[head] to null in push-nest [in TeX82, this will be done by get-avail]. §216 D
A vrule link got clobbered because I forgot to do this.
34 Translate ASCII control codes to special form when displaying them. §48,68 I
Ligatures work, but show-node-list showed them funny.
35 Remember to clear parameters off save-stack in package routine. §1086 F
About eight hours today.
13 Mar 1978
36 Introduce a new variable hang-first [later the sign of hang-after]. §849 D
36 ↦ 37 Simplify the new code, realizing that if hang-indent = 0 then hang-first is
irrelevant. §848 E
Time sharing is very slow today, so I’m mostly reading technical reports while
waiting three hours for compiler, editor, and loading routine.
I’m not counting this as debugging time!
(Came back in the evening.)
38 Spruce up the comments in the line-break routine, which appears to be almost
working. §813 P
39 Rethink the setting of best-line; it’s 1 too high in many cases. [The final line
of a paragraph was handled in a treacherous way.] §874 D
40 Compute proper initialization for prev-depth when beginning an \hbox with
a paragraph inside. [This refers to a special ‘paragraph box’ construction,
used when an hbox of specified size becomes overfull; TeX78 doesn’t have
the concept of internal vertical mode.] §1083 D
41 Also initialize tail in that case. §1083 D
42 Also put the result of line-breaking into the correct list. M
43 Fix a typo in the free-node routine (‘link’ not ‘llink’); by strange chance it had
been harmless until today. §130 T
44 Fix bug: post-line-break forgot to set adjust-tail. §889 F
45 Update act-width properly when looking for end of word while line breaking. §866 D
46 Repair the “tricky” part of get-node: I used the info field when I meant to say
llink. §127 B
Now the \corners macro of acphdr works! [See \setcornerrules in The
TeXbook, page 417.]
646 D. E. KNUTH
75 Introduce special logic for eject-penalty; I was wrong to think that forced
ejection was exactly like an infinitely negative penalty. §851 A
76 Use (1 + b)² − p² when computing demerits with p < 0. §859 A
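The rule in #76 can be sketched in Python as follows. Only the p < 0 case comes from the entry; adding p² when p > 0, and the line_penalty parameter that generalizes the constant 1, follow the later TeX82 convention and are assumptions here. Forced breaks are assumed to be handled separately, as #75 requires, so p is taken to be finite.

```python
def demerits(b: int, p: int, line_penalty: int = 1) -> int:
    """Demerits for a line of badness b that ends at a break with penalty p."""
    d = (line_penalty + b) ** 2
    if p > 0:
        d += p * p  # a positive penalty discourages the break
    elif p < 0:
        d -= p * p  # #76: a negative penalty must make the break more attractive
    return d


print(demerits(10, 0))   # 121
print(demerits(10, 5))   # 146
print(demerits(10, -5))  # 96
```

Squaring p loses its sign, so a formula that always added p² would penalize exactly the breaks a negative penalty is meant to encourage; subtracting p² when p < 0 restores the intended direction.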
6:30am. The line-breaking algorithm appears to be working fine and efficiently.
91 Fix get-node again: After the variable memory overflows, control falls through
to found instead of going to the overflow call. §125 F
I spent several hours tracking down that data structure bug!
92 Change new-line to next-line in yet another print command (see #7). B
75 ↦ 93 Amend the line-breaking algorithm: \break in paragraph doesn’t work with
really bad breaks. §851 A
A problem to be diagnosed tomorrow: Each time I run the test program, the
amount of memory in use grows by 13 cells not returned.
Seven hours tonight.
17 Mar 1978
94 Introduce dead-cycles to keep \end active until ship-out occurs. §1054 G
95 Don’t call line-break with an empty list. §1096 E
96 Take proper account of the (infinite) fill glue when computing the width of a
paragraph line preceding a display. §1146 S
97 Add a new parameter to hpack so that line-break won’t be called at the wrong
time. [This is for the soon-to-be-obsolete feature described in #40.] M
98 Give a warning message if there’s an \hfill in the middle of a paragraph;
fill glue upsets the line breaker, because floating-point calculations don’t have
sufficient accuracy. §869 L
I spent an hour looking for another bug in TeX, but the following one was in
METAFONT: The xgp-height data in fonts had been supplied wrong.
It took two hours to recompile 32 fonts with proto-METAFONT.
99 Make show-node-list and show-token-list more robust in the presence of software
bugs. §182 R
97 ↦ 100 Do not remove nodes with eject-penalty, when the new parameter to hpack is
true. D
97 ↦ 101 Put a fast exit into hpack; e.g., at glue nodes, test ‘if paragraphing ∧ (current
width is large)’. E
2am. I have to go to bed “early” tonight.
18 Mar 1978
3:30pm. (Saturday)
102 Add a parameter to check-mem (to suppress display unless needed). §167 I
103 Introduce a user-settable parameter \maxdepth, and pass it as a parameter to
vpackage. §668 G
I realized the need for \maxdepth while fixing insertions (see #90).
104 Introduce a user-settable parameter for line-break: The constant 2.0 in my
original algorithm becomes \jpar [later \tolerance], to be set like \tracing. §828 G
105 Reclaim the eject-penalty nodes removed during line-breaking. §879 D
(Those were the 13 extra nodes reported on Thursday.)
The init-align procedure worked right the first time!
Also init-row, init-col. But then...
106 Rethink the command codes: endv in a token list has too high a code for the
assumptions of get-next. §207 S
107 Add a prev-cmd variable for processing delimited macro parameters; the original
algorithm loses track of braces. [The rules will change slightly in TeX82,
and rbrace-ptr will take on a similar function.] §400 A
108 Make the get-next routine intercept & and \cr tokens. §342 S
I’d thought I could just put & and \cr into big-switch [i.e., in the stomach of
TeX, not the eyes]; that was a great big mistake.
109 Make more error checks on endv; e.g., it must not occur in a macro definition
or call. §780 R
108 ↦ 110 No, rethink alignments again; the new program still fails! §768 S
For the first time I can glimpse the hairiness of alignment in general (e.g.,
‘\halign{\u#\v&...’ when \u and \v are defined to include &’s and possible
alignments themselves).
I think there’s a “simple” solution, by considering only whether an alignment
is currently active (in §342).
11:30pm. Went to bed.
19 Mar 1978
Woke up with “better” idea on how to handle & and \cr.
(Namely, to consider a special kind of \def whose parameters don’t interrupt
on &’s and \cr’s.)
But replaced this by a much better idea (to introduce align-state).
11pm. Began to use computer. Performed major surgery (inserting align-state
and updating the associated routines and documentation).
111 Pop the alignment stacks in fin-align. §800 D
110 ↦ 112 Fix a (newly inserted) typo in show-context. §314 T
110 ↦ 113 Set align-state false when a live & or \cr is found. [Originally align-state was
of type boolean.] §789 D
114 Insert \cr when ‘}’ occurs prematurely in an alignment. §1132 I
115 Remember to record glue-stretch when packaging an unset node. §796 M
I had a mistake in acphdr definition of \quoteformat; also extra spaces.
My first test programs, used before today, were contrived to test macro expan-
sion, line-breaking, and page layout.
Next I’m using a test program based on Volume 2.
116 Make carriage-return, space, and tab equivalent for macro matching. §348 C
117 Omit the reference count node when displaying a mark. §176 F
118 Correct a silly slip: I wrote ‘type-displacement’ instead of ‘value-displacement’
when packing data in a penalty node. §158 B
119 Don’t go to build-page after seeing \noindent; TeX isn’t ready for that. [In
the original program, this was an instance of a bad goto.] §1091 M
I had undesired spaces coming thru the scanner in my macro definitions of
\tenpoint (see The TeXbook, page 414).
4am. TeX now knows enough to typeset page 1 of Volume 2!
Also it did its first “math formula” (namely ‘$X$’) without crucial error.
(Except that the italic correction was missing for some reason.)
120 Remember to decrement cur-level in fin-align. [The routines will eventually
become more general and use unsave here.] §800 D
121 Remember to increment cur-level in error corrections by handle-right-brace.
[A better procedure will be adopted later.] §1069 D
122 Fix a typo: (‘{’ instead of ‘}’) in error message for mmode + math-shift. §1065 T
99 ↦ 123 Make show-noad-list more robust and more like the new show-node-list. [The
routines will be combined in TeX82.] §690 R
124 Fix a typo in char-box: should say font-info-real. [In TeX78 a single array is
used for both real and integer; in TeX82 things will be scaled.] §554 L
125 Fix typos in the definitions of default-rule-thickness and big-op-spacing; they
shouldn’t start at mathex(7). §701 B
126 Reverse the before and after conventions in math nodes. §1196 B
I had them backwards; this turned hyphenation on just before math, and off
just after it!
Seven and a half hours debugging today. Got through the test program a little
more. But blew up on ‘$Y+l$’; tomorrow I hope to find out why.
20 Mar 1978
8pm. I decided to work next on a super-hairy formula.
127 Change ‘\ascii’ to ‘\cc’ (character code). [This name will change again later,
to ‘\char’.] §265 C
128 Don’t bother to store a penalty node at the beginning of $$ when the
paragraph-so-far fits on a single line, since such a penalty has already been stored.
[These conventions will change later, and the \predisplaypenalty will always
be stored.] §1203 E
129 Avoid reference to tail in build-page, if nest-ptr > 0. §995 S
130 Correct a silly slip in math-comp (the exact opposite of what I did in #118). §1158 B
131 Rectify my mental lapse in make-fraction; I said nucleus instead of thickness. §743 B
132 Mask off the math class when scanning delimiters. §1160 F
133 Allow an optional space after \def{...}. [This decision will be retracted later.] §473 C
My test example is so complicated it causes the semantic stacks to overflow!
134 Don’t test for no pages output by looking at the channel status. §642 L
135 Fix typo in definition of \mathop (open-noad not op-noad). §1156 T
136 Rewrite fin-mlist, because ‘\left(...\above...\right)’ doesn’t parse correctly;
the \left goes into the numerator, the \right into the denominator. §1184 A
137 Correct the use of depth-threshold in print-subsidiary-data: Simple fields get
shown while others look empty. §692 B
78 ↦ 138 Return the carriage before showing the first line of a new file when pausing. §538 I
139 Fix bug: The call show-noad-list(mem[...]) should be show-noad-list(...), in
the incompleat-noad case of show-activities. §219 M
3am. The whole messy formula has been parsed correctly into a tree.
The easy part is done, now comes the harder part.
140 Don’t shift single characters down in make-op. §749 F
141 Make clean-box return a box (as its name implies), not an hlist. §720 D
Font info still isn’t quite right; it has the wrong value of quad.
142 Retain the italic correction when doing rebox; can make glue-set ≠ 0 a flag for
this. [A better solution will be adopted later.] §715 A
143 Fix the bug that makes rebox bomb out: value(p) should be value(mem[p]). §715 B
6am; ten hours today. TeX didn’t do $\pi\over2$ correctly, but was close.
I found that the rebox problem (#142) went away when I fixed the clean-box
problem (#141); but I will leave the extra stuff about glue-set ≠ 0 in the
program anyway, just for weird cases.
144 Omit extra levels of boxing when possible in clean-box. §721 E
(To do this, I need to face the rebox problem anyway.)
21 Mar 1978
10pm. The computer is rather heavily loaded tonight.
145 Don’t forget thickness when making a square-root sign (see #131). [The rule
thickness will later be derived from the character height.] §737 F
146 Define p local to the make-fraction routine. §743 L
Unwittingly using the global p was a disaster.
147 Don’t show the amount of glue-set when it’s zero. §186 I
142 ↦ 148 Make glue-set nonzero in the result of var-delimiter. §706 D
149 Fix bug: The math-glue function didn’t return any result. §716 F
150 Fix typo in char-box (c not w); this caused a subscripted P to come out the
same width as an unsubscripted P. [Later changes in the rules will move
this computation to §755.] §709 T
144 ↦ 151 Revise clean-box to do operations that are needed often because of the rebox
change. §720 P
152 Use the new clean-box to avoid a bug in \sqrt{\raise...}. §737 D
153 Change the definition of \not so that it’s a relation (which will butt against
the following relation). [All math symbols and Greek letters are defined in
INITEX at this time, not in a changeable format definition.] Q
154 Give error message ‘Large delimiter must be in mathex font’, instead of
calling confusion, since the error can occur. [This particular error is impossible
in TeX82.] §706 I
155 Change the use of p in var-delimiter; it isn’t always set when I say goto found. §706 F
Another font problem now surfaced: The mathex meta-font didn’t compute
TeX info in a machine-independent way. (It took two hours to correct this.)
156 Don’t forget to set type(p) in all relevant cases of var-delimiter. §708 D
157 Use the correct sign convention for shift-amount in hpackage. §653 B
158 Always kern by delta when there’s no superscript. §755 F
159 Declare space-table to be [0..6, 0..6] not [0..7, 0..7]; otherwise its entries are
preloaded into the wrong positions. [The space-table in TeX78 is 7 × 7; it
will become 8 × 8 in TeX82, represented as a string called math-spacing.] §764 M
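One plausible mechanism for #159 is sketched below in Python. It assumes that the preloaded constants are written as one flat list and stored in row-major order, which is an assumption about how the SAIL preload worked; under that assumption, 49 values intended for a 7 × 7 table drift into the wrong cells when the table is declared 8 × 8.

```python
def preload(flat: list, rows: int, cols: int) -> list:
    """Fill a rows x cols table, row by row, from a flat list of constants."""
    table = [[None] * cols for _ in range(rows)]
    for k, value in enumerate(flat):
        table[k // cols][k % cols] = value
    return table


# 49 constants written for a 7 x 7 table; each is labeled with its intended cell.
flat = [f"{r},{c}" for r in range(7) for c in range(7)]

right = preload(flat, 7, 7)   # declared [0..6, 0..6]
wrong = preload(flat, 8, 8)   # declared [0..7, 0..7] by mistake
print(right[1][0])  # 1,0
print(wrong[1][0])  # 1,1  (the entry meant for cell 1,0 landed at 0,7)
```

Only the first row survives intact; every later entry is displaced by one more position per row, which is why the symptom looks like scattered wrong spacings rather than one obviously broken row.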
160 Use a negative value, not zero, to represent a null delimiter. [Actually zero will
come back again later.] §685 C
127 ↦ 161 Change \cc to \char. §265 C
162 Don’t use tricky subtraction on packed data when changing q to an ord-noad
in mlist-to-hlist; subtraction isn’t always safe. §729 L
163 Fix two typos in the space-table (* for 0). §764 B
164 Initialize cur-size everywhere (I forgot it in two places). §703 F
165 Reset op-noad before resetting bin-noad. §728 A
166 Treat display-style cramped the same as display-style inside make-op. §749 F
167 Shift the character correctly in the non-\displaystyle case of make-op. §749 B
Still another font problem: The italic corrections are wrong because the corresponding
array was declared real in proto-METAFONT (and italic corrections
were used in nonstandard way in mathex).
168 Use depth instead of height in var-delimiter. [Later, both were used.] §714 B
169 Skew the accents according to the font slant. [Soon retracted.] §741 Q
At this point I think nearly all the math routines have been exercised.
Tomorrow they should work!
Eight hours debugging today.
22 Mar 1978
(Wednesday, but actually Thursday: I began at midnight because I was proofreading
a paper.)
I checked out the font access tables, slowly (i.e., all the \mathcode and special-
character name entries were catalogued).
169 ↦ 170 Do not consider slants after all in the math accent routine, since slanted math
letters are put differently into fonts. §741 Q
171 Don’t use q for two different things simultaneously in make-math-accent. §738 B
172 Fix bug in compact-list (I forgot to advance the loop variable). [This procedure
became unnecessary in TeX82.] F
173 Avoid conflict between var-delimiter and mlist-to-hlist, which want to use
temp-head simultaneously. §713 M
174 Fix bad typo in overbar routine (b for p). §705 T
Finally TeX got to after-math after dealing with that hairy formula...
175 Fix another bad typo: p for b this time. §1199 T
176 Insert more parentheses (twice) because of ‘lsh’ precedence in SAIL. §1199 L
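#176 (and later #240) are operator-precedence slips when packing fields with shifts. SAIL’s own precedence table is not reproduced here; instead, this Python sketch shows the analogous trap in Python, where + binds more tightly than <<. The layout (an 8-bit low field) and the function names are hypothetical.

```python
def pack_wrong(hi: int, lo: int) -> int:
    # Parses as hi << (8 + lo), because + binds tighter than << in Python.
    return hi << 8 + lo


def pack_right(hi: int, lo: int) -> int:
    # Explicit parentheses force the shift to happen before the addition.
    return (hi << 8) + lo


print(pack_right(3, 5))  # 773
print(pack_wrong(3, 5))  # 24576
```

The unparenthesized version often produces a plausible-looking large number rather than an error, which is why such slips tend to surface only later, as corrupted fields.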
36 ↦ 177 Use the new hanging-indentation conventions when formatting displayed
equations. §1199 M
178 Recompute penalties so that break is allowed after punct-noads. §761 Q
more or less at random, until it compiles correctly. I hope the bug isn’t more
severe than it appears.
210 ↦ 218 Don’t put a new group on save-stack if a null mark is expanded. [TeX82 will
remove the ‘}’ from the mark text.] §386 D
I had to redo the typewriter-style font since its width tables were wrong.
And I increased low-memory size again to 5500, then 6500.
Finally the entire test program was TeXed. Happy Easter! Six hours today.
27 Mar 1978
Beginning at 2:30am.
219 Move \vcenter processing to the first pass of mlist-to-hlist; otherwise the
height, depth, subscripts, etc., are way off. §733 A
220 Omit space after closing $$. §1200 C
Spacing is wrong in the formula Y₁ + ⋯ + Yₖ; I have to rethink the use of three
dots.
221 Make conditional thin space available to user as \S. [Later will retract this.] §226 G
222 Introduce \dispaskip and \dispbskip [later called \abovedisplayshortskip
and \belowdisplayshortskip]. §226 Q
Reminder: I need to test line-breaking with embedded math formulas.
223 Make sure that interaction ≠ error-stop-mode in the ‘Whoa’ error [fatal-error]. §93 I
224 Fix a big mistake in the style-node routine (which points to a glue spec, not
to glue itself); somehow this didn’t cause trouble yesterday. [In TeX78, style
nodes double as placeholders for math glue like thin spaces.] §732 B
225 Make \fntfam obey group structure. [TeX78’s \fntfam operation is a combination
of TeX82’s \textfont, \scriptfont, and \scriptscriptfont.] §1234 C
At this point the test routine for Volume 2 works perfectly.
But I will change the page width in order to check harder cases.
178 ↦ 226 Disable automatic line breaks after punctuation in math (e.g., consider f(x,y)). §761 Q
227 Represent italic corrections as boxes, not glue, so that they won’t be broken.
[The \kern command doesn’t exist yet.] §1113 S
Eight hours today.
228 Fix a bug that just clobbered the memory: Call free-avail, not free-node, in
the ins-node case of vpackage. [This logic will change completely in TeX82.] §1019 B
29 Mar 1978
(Wednesday) Again beginning at 2:30am.
229 Put still more space above and below fraction lines in displayed formulas. §746 Q
189 ↦ 230 Install an infinite penalty feature, which positively suppresses breaks; use it in
displayed formulas whose \eqno doesn’t fit. §1205 G
231 Call build-page after finishing a display; and don’t go to the \noindent routine
because of the next remark. §1200 F
232 Put \parskip glue just before a paragraph, not just after (since it interferes
with a penalty after). §1091 S
Although the test program gives correct output, it generates 46 locations of
variable-size memory and 280 of one-word memory that are not freed.
233 Recycle the ulists and vlists in fin-align. §801 F
25 ↦ 234 Fix bug when deleting space at end of paragraph: delete-glue-ref(cur-node)
not delete-glue-ref(value(cur-node)). §816 M
There’s also a more mysterious type of uncollected garbage, a fraction-noad
corresponding to $p\choose$, an incompleat-noad not completed.
Couldn’t find that one, so I recompiled with #233 and #234 corrected.
Now it gains just 10 locations of variable-size memory and 7 of the other kind.
235 Extend search-mem to search eqtb also. §255 I
143 ↦ 236 Fix bug in rebox when list-ptr(b) = 0. §715 D
The seven one-word nodes were generated by this bug; rebox put them onto a
linked list starting with mem[0], growing at the far end!
237 Remember to complete each incompleat-noad. §1184 D
This solved the other mystery. I had never noticed that my test output was
actually wrong: $p\choose k$ came out as ‘k’.
After these corrections, the test routine worked... I feel that TeX is now pretty
well debugged (except perhaps for error recovery); it’s time to celebrate!
1 Apr 1978
238 Don’t quit after file lookup fails. §530 I
2 Apr 1978
239 Add TeX-font-area, so that it’s easier to change the default library area
associated with a device. §514 P
3 Apr 1978
240 Insert parentheses again, to cope with the precedence of lsh when packing
data. (See #55 and #176.) §1114 L
I had never tried hmode + discretionary before!
241 Remember that back-error requires cur-tok to be set. (Problem can arise
during error recovery on parameter #n with n out of range.) §476 M
4 Apr 1978
242 Add a deletion feature to the error routine. §88 I
5 Apr 1978
243 Reset space-factor after \/ [this was later rescinded] and after math in text. §1196 Q
10 Apr 1978
104 ↦ 244 Replace \jpar and \tracing by a new primitive \chpar for parameters. It
allows a user to change those quantities as well as the penalties for hyphens,
relations, binary ops, widows. §209 G
14 May 1978
Beginning to typeset a real book (Volume 2, second edition), not just a test.
245 Make math in text end with spacing as if it were followed by punctuation. [This
rule will soon be rescinded.] §760 Q
246 Insert \times into the hash table; I left it out by mistake. [It will eventually
move into plain.tex.] F
247 Change the names of Scandinavian accents from \o, \oslash, \Oslash to \a,
\o, \O. [This will also move to plain.] C
17 May 1978
248 Fix a silly bug that hasn’t been tweaked until today: ‘\halign to size’
[obsolete in TeX82] used vsize instead of hsize. §645 B
19 May 1978
249 Add a \topbaseline feature [later called \topskip]. §1001 G
245 ↦ 250 Subtract the math spacing change of May 14. §760 Q
251 Skip past blanks in the scan-math procedure. [This blank-skipping will eventually
go into scan-left-brace.] §403 A
252 Introduce a missing-brace routine [later generalized] to improve error recovery
in mmode + math-shift, when the top of save-stack isn’t a math-shift-group. §1065 I
253 Adjust the math spacing between closing parentheses and Ord, Op, Open,
Punct. §764 Q
254 Make the underline go further under. §735 Q
96 ↦ 255 Compute the proper natural width when a displayed equation follows a paragraph
whose fill glue has been deleted by line-break. §1146 S
20 May 1978
256 Fix the spurious value of prev-depth inside alignments. §775 A
257 Consider (and defeat) the following scenario: The u and v lists are built in
init-align using temp-head; then while scanning ‘\tabskip 5pt\rt{...}’
the macro \rt is expanded, clobbering temp-head. §779 S
That bug was more subtle than usual.
258 Add the parameter num3, so that the positioning of \atop can be different
from that for fractions. §700 Q
259 Add new parameters delim1 and delim2, so that \comb can use fixed size
delimiters, not computed as with \left. §748 Q
22 May 1978
221 ↦ 260 Change \I to \I and introduce \I as the negative of \b. [Later obsolete.] §226 C
261 Fix the display of negative penalty nodes; show-node-list is confused when a
negative value has been packed into the middle of a word. §194 L
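#261 concerns a negative value stored in a field in the middle of a packed word. One standard remedy, sketched here in Python, is to mask the value to the field width on store and to sign-extend it on fetch. The entry does not say how TeX78 laid out the word, so the field width, position, and function names below are hypothetical.

```python
FIELD_BITS = 12                # hypothetical width of a field in the middle of a word
SHIFT = 6                      # hypothetical position of that field
MASK = (1 << FIELD_BITS) - 1


def pack(word: int, value: int) -> int:
    """Store a signed value in the middle field, keeping the other bits intact."""
    word &= ~(MASK << SHIFT)                   # clear the old field
    return word | ((value & MASK) << SHIFT)    # two's-complement bits, cut to width


def unpack_unsigned(word: int) -> int:
    """Read the field back as an unsigned number (the naive approach)."""
    return (word >> SHIFT) & MASK


def unpack_signed(word: int) -> int:
    """Read the field back and sign-extend it."""
    raw = (word >> SHIFT) & MASK
    if raw & (1 << (FIELD_BITS - 1)):          # top bit set: value was negative
        return raw - (1 << FIELD_BITS)
    return raw


w = pack(0b111111, -50)        # the low six bits belong to a neighboring field
print(unpack_unsigned(w))      # 4046
print(unpack_signed(w))        # -50
print(w & 0b111111)            # 63: the neighboring field survives
```

Reading the field as unsigned turns a small negative penalty into a large positive number, which is the kind of confusion #261 describes in show-node-list.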
Memory overflow just occurred with lo-mem-max = 7500 and mem-max =
16384. So I have to go to 15-bit pointers. (A problem on 32-bit machines?)
23 May 1978
262 Add a new parameter big-op-spacing5, for extra space above and below limits
of big displayed operators. §751 Q
263 Initialize incompleat-noad in $$\halign{...}$$. §775 F
That was another heretofore-untested operation. How much of the code has
not yet been exercised?
238 ↦ 264 Close the file when doing lookup-failure recovery. §27 F
265 Improve the error recovery for ‘Extra &’. §792 I
266 The top piece must be calculated mod 128 in var-delimiter, to guarantee a
valid subscript range. [Obsolete in TeX82.] §546 R
252 ↦ 267 Fix a blunder in new missing-brace code. §1065 B
262 ↦ 268 Fix a blunder in new code for limits on display operators. §751 B
26 May 1978
269 Don’t insert a new penalty after an explicit penalty in math mode. §767 Q
The hash table overflowed; I ought to make it much bigger.
110 ↦ 270 Avoid possible bad memory references in alignment when there is erroneous
input after \cr. [Instead of extra-info, the value of cur-align in TeX78 is
negated, because we need only distinguish \cr from &.] §789 R
271 Make the dimension parameters like \hsize all global, so that they can be set
in the \output routine. §279 S
This led to major simplifications, also to major surgery.
[But it was a kludgy decision, overruled in TeX82.]
94 ↦ 272 Don’t forget to set the type of the new null box in the \end routine. §1054 D
27 May 1978
The data overflowed memory again, both low and high, doing Section 3.3.2.
184 ↦ 273 Mask off extra bits of \char in math mode, to avoid bad memory references. §1151 R
274 Zero out the negative \medmuskip in script styles. §732 B
29 May 1978
275 Be prepared to handle an undefined control sequence during get-x-token. (Can
fix this by brute force, using get-token instead of get-next.) §380 S
276 Correct the superscript shift when a single character is raised. §758 D
184 ↦ 277 Mask off all but 7 bits in \char routine, to avoid space-factor index out of
range. §435 R
More memory capacity overflows.
22 ↦ 278 Fix TeX’s overflow stop so that I don’t have to wait for loading of the BAIL
debug routines. [System dependent.] §93 E
279 Remember to adjust the page number when a file page ends in mid-macro.
[System dependent.] §306 F
5 Jun 1978
280 Make sure that the arguments of positioning commands don’t overflow their
field size. §610 R
281 Report the excess amount when giving an overfull box warning. §666,677 I
7 Jun 1978
282 Use ≥ instead of > as termination criterion in var-delimiter. §714 Q
283 Disallow \eject in math mode. [In TeX78, \eject is distinct from \break; in
horizontal mode it includes TeX82’s ‘\vadjust{\break}’.] §1102 R
284 Don’t put too much clearance above \sqrt in text style. §737 Q
9 Jun 1978
110 ↦ 285 Make align-state an integer variable, not boolean, so that \eqalign can be
within another \eqalign. §309 G
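The point of #285 is that a boolean can record only whether some alignment is active, not how many. This Python sketch is a simplified model, not TeX’s actual align-state bookkeeping: the abstract tokens "open" and "close" stand for entering and leaving an \eqalign, and the function names are hypothetical.

```python
def states_with_flag(tokens: list) -> list:
    active = False
    states = []
    for tok in tokens:
        if tok == "open":
            active = True
        elif tok == "close":
            active = False  # closing the inner alignment also "closes" the outer one
        states.append(active)
    return states


def states_with_counter(tokens: list) -> list:
    depth = 0
    states = []
    for tok in tokens:
        if tok == "open":
            depth += 1
        elif tok == "close":
            depth -= 1
        states.append(depth > 0)
    return states


# An inner \eqalign closes, then an & that belongs to the outer one arrives.
seq = ["open", "open", "close", "&", "close"]
print(states_with_flag(seq))     # [True, True, False, False, False]
print(states_with_counter(seq))  # [True, True, True, True, False]
```

With the flag, the & after the inner alignment looks as if it occurs outside any alignment; the counter still knows that the outer \eqalign is open.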
286 A \mark should expand its input. §1101 C
10 Jun 1978
287 Provide for preloading of fonts. §1320 E
288 Close the output file before switching to edit the input file with the ‘e’ option. §84 L
289 Return adjustments found by hpack to free storage if they’re not used. [Later,
hpack will detach them only when they’re used.] §655 E
290 Strive for consistency between make-under and make-over. §735 Q
18 Jun 1978
236 ↦ 291 Fix a serious error in rebox (‘b’ instead of ‘list-ptr(b)’). §715 B
Strange that such a bug would now surface for the first time!
292 Remove \deg from INITEX, since macros suffice. C
293 Add an extra hyphenation penalty for two hyphenated lines in a row. §859 Q
19 Jun 1978
294 Introduce the ‘no-new-control-sequence’ switch. Among other things, this will
prevent an undefined control sequence following scan-math from clobbering
the save stack. §259 S
20 Jun 1978
295 Change the badness test ‘glue ≤ 0.0’ to ‘glue ≤ 0.0001’. [TeX82 will avoid such
problems by calculating badness without floating point arithmetic.] §99 L
296 Force badness to be at most lo1’. $108 R
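#295 anticipates TeX82’s move away from floating-point badness. As a sketch of that idea, here is a Python transcription of an integer approximation to 100(t/s)³, capped at 10000, in the style of TeX82’s badness routine (§108). Treat the particular constants as illustrative; they are not offered as a verified copy of the TeX82 code, and the cap of 10000 is not a reading of the garbled bound in #296.

```python
INF_BAD = 10000  # "infinitely bad"


def badness(t: int, s: int) -> int:
    """Approximate 100 * (t / s) ** 3 using only integer arithmetic.

    t is the amount of stretch (or shrink) needed, s the amount available.
    """
    if t == 0:
        return 0
    if s <= 0:
        return INF_BAD
    if t <= 7230584:                # keeps t * 297 below 2**31
        r = (t * 297) // s          # r is about 297 * t / s
    elif s >= 1663497:
        r = t // (s // 297)
    else:
        r = t                       # t / s is so large that the result is INF_BAD
    if r > 1290:                    # beyond this, r**3 would exceed 2**31 - 1
        return INF_BAD
    return (r * r * r + 2**17) // 2**18   # 297**3 / 2**18 is about 100; 2**17 rounds


print(badness(10, 10))  # 100
print(badness(20, 10))  # 800
print(badness(50, 10))  # 10000
```

Because every step is an integer operation, the result no longer depends on how a particular machine rounds floating-point numbers, which is the problem the 0.0001 threshold in #295 was papering over.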
297 Add end-template for better error recovery in alignments. §375 I
287 ↦ 298 Make INITEX more like the real TeX; my simple scheme for font preloading was
no good because it left thousands of ‘dead’ words in memory. §8 E
299 Economize disk space by using internal arrays in load modules that aren’t being
reinitialized. [System dependent.] E
300 Move the declaration of mem to the semantics module, so that the object code
will be more efficient. [System dependent. The code of TeX78 was divided
into separately compiled modules for syntax, semantics, output, extensions,
and general organization.] E
21 Jun 1978
Today I’m working on the user manual.
301 Disallow \input except in vertical mode. [I will change this in TeX82, treating
\input as a case of expansion.] §378 C
302 Add error recovery for endv and par-end occurring in math mode. §1047 I
303 Generalize \ifT to \if T. §506 G
22 Jun 1978
304 Preload the \bullet [later done by plain.tex]. F
256 ↦ 305 Get the correct prev-depth at the beginning of an alignment. §775 D
306 Change \eject so that it ejects only once. §1000 C
14 Jul 1978
307 Look in standard area if a file isn’t found in the user’s area. §537 I
308 Echo all online inputs in the transcript file. §71 I
19 Jul 1978
309 Equalize spacing when only one of numerator/denominator is big. §745 Q
310 Prevent subscript from getting too high above baseline. §757 Q
311 Avoid infinite loop when stack overflows: push-input should say ‘if input-ptr ≥
stack-size ∧ interaction = error-stop-mode’. §321 R
22 Jul 1978
312 Make \quad meaningful outside math mode. (All fonts must be generated
again!) §558 C
313 Show the nesting level at the end of show-activities. [But I decided not to do
this in TeX82.] §218 I
314 Put in \> [namely, \mskip\medmuskip; TeX78 already has \L, for conditional
\thinmuskip, as well as the negative amounts \<, \<]. Change the name
of vector accent from \> to \b. [Math spacing operators will become much
more general in TeX82.] §716 C
25 Jul 1978
94 ↦ 315 Give the correct \hsize and \vsize to the null boxes created at \end. §1054 Q
94 ↦ 316 And don’t “append” them. [Later this was changed, so that it would work
better with generalized output routines.] §1054 A
297 ↦ 317 Remove the control sequence \endv, since error recovery is now better. §375 I
318 Define another mode of tracing: It says ‘OK’ and stops after \showlists. §1298 I
244 ↦ 319 Give better defaults to parameters. [Later done by plain.tex.] §209 Q
320 Allow more bits in the packed representation of \showboxdepth. §238 I
321 Scan past delimiters and/or dimensions when recovering from ambiguous
fractions. §1183 I
322 Reduce accent numbers modulo 128 or 512, depending on the mode. §1165 R
323 Include a warning, ‘(\end occurred on level ...)’. §1335 I
28 Jul 1978
(I’m writing Chapter 27 of the manual: ‘Recovery From Errors’.)
324 Improve the error message in scan-digit. [This procedure will change its name
to scan-eight-bit-int, when the number of registers increases from 10 to 256.] §433 I
325 Don’t report overfull boxes if they’re less than .1 point over. §666,677 I
326 Give the user extra chances to define the font, if read-font-info is unsuccessful. §560 I
327 Change default recovery for bad parameter number from #1 to ##, since #1
won’t always work and since ## is probably intended. §479 I
328 Omit the “Negative?” message on things like scan-char-num. §435 I
329 Improve error recovery when a large delimiter isn’t in family 3. [Obsolete.] I
330 Give a more appropriate error message when the input is ‘$\right’. §1192 I
Currently says ‘Missing $’!
331 Call backinput before the error message in backerror, not afterwards. §327 I
1 Aug 1978
332 Give an appropriate warning when there’s no input file and the user types ‘e’. §84 I
333 Increase the system pushdownlist size so that the manual will compile. [Pro-
cedures hlist-out and vlist-out can recurse deeply.] L
Yesterday I distributed 45 preliminary copies of the manual; today I took out
the “debugging hooks” and put TeX up as a system program.
2 Aug 1978
I’m typing Volume 2 again (currently in Section 4.2.2). Culture shock!
334 Introduce a \ragged parameter, to indicate a degree of raggedness. [Previ-
ously, ragged-right setting was performed when the \tolerance/100 was
odd! Eventually a better approach, with \rightskip and such things, will
be discovered.] §886 G
335 Omit the ‘widow penalty’ in one-line paragraphs. §890 Q
5 Aug 1978
336 Generalize \pageno to \count⟨digit⟩. §236 G
285 ↦ 337 Update align-state when recovering from ‘Missing {’ and ‘Extra }’ errors.
§1069,1127 D
338 Show “runaway” tokens, making it easier to pinpoint an error. §306 I
22 Aug 1978
339 Add \predisplaypenalty. §1203 G
340 Clarify error messages; they should indicate when something has been inserted,
etc. §1064 I
23 Aug 1978
114 ↦ 341 Substitute ‘Extra }’ for the losing ‘Missing \cr’ error message. §1069 I
213 ↦ 342 Go past online insertions in show-context. §311 I
343 Exact no penalty for breaking one line before a display. §1145 Q
338 ↦ 344 Check for runaways at end of file. §362 I
345 Give error message when a macro argument begins with }. §395 I
24 Aug 1978
213 ↦ 346 Remove extra line-feed in show-context after printing insertions. [System
dependent.] §318 L
25 Aug 1978
347 Leave no glue at top of page, even after \eject. §997 Q
27 Aug 1978
348 Adopt Guy Steele’s new version of the TeX source files. [He has recently
made a copy and modified it by introducing compile-time switches for MIT
conventions as an alternative to SUAI. This is the first time that TeX is being
ported to another site; additional switches for PARC, TENEX, TOPS10, and
TOPS20 will be added later, using the Steele style.] P
1 Sep 1978
349 Don’t pass over leader nodes in the try-break background computation. [At
this time, leaders have not yet been unified with glue.] §837 Q
82 ↦ 350 Prune away all penalties at the top of a page. §997 Q
4 Sep 1978
338 ↦ 351 Include ‘\’ in error message about a runaway argument. §306 I
8 Sep 1978
I just remade all the fonts, with increased ligature field size.
350 ↦ 352 Insert a necessary goto statement in the first branch of the new penalty routine
within build-page. §997 B
30 Sep 1978
338 ↦ 353 Make the token list for runaway arguments meaningful outside of macro-call.
(I just had a runaway argument ending with ‘\1cm’, which turned out to be
the control sequence in hashtable location 0.) §371 M
354 Avoid infinite loop when recovering from $$ in restricted horizontal mode. §1138 R
355 Fix two hyphenation bugs related to -ages, -ers. [A completely new algorithm
for hyphenation will go into TeX82.] L
356 Add -est to hyphenation routine; also disable puz-zled and rat-tled, etc. Q
4 Oct 1978
357 Add new primitive \vtop. §1087 G
358 Treat implicit kerns properly after discretionary hyphens have been inserted. §914 Q
4 Nov 1978
359 Forget the half quad originally required at left and right when centering dis-
played equations without equation numbers. §1202 Q
11 Nov 1978
360 Don’t let the postamble come out empty. [This could occur if no fonts were
selected.] §642 R
15 Nov 1978
361 Allow optional space after digit in scan-int routine. §444 C
17 Nov 1978
362 Make the check-mem procedure slightly more robust. §167 R
20 Nov 1978
363 Make the \par in a \def match the \par that comes automatically with a blank
line. (Suggested by Terry Winograd.) §351 C
364 Add new parameter \mathsurround for spacing before and after math in text. §1196 G
365 Extend \advance to allow increase by other than unity. [At this time it applies
only to the ten \count registers, and it is called \advcount.] §1238 G
25 Nov 1978
366 Add a new primitive: \unskip. §1105 G
367 Add new primitives \uppercase and \lowercase. §1288 G
28 Nov 1978
338 ↦ 368 Don’t let \mark and macro-call interfere with each other’s scanner-status. §306 M
369 Omit extra } after show-node-list shows a \mark, since the right brace is already
there. (See #210.) §176 M
370 Add a new primitive suggested by Terry Winograd: \xdef. §1218 G
29 Nov 1978
371 Delete a space following \else{...} also in the false case. [TeX78 uses braces,
not \fi, for conditionals.] S
320 ↦ 372 Make \tracing set \showboxbreadth as advertised. §198 D
373 Account properly for kerns in width calculations of line-break. §866 F
364 ↦ 374 Delete a math-node at the beginning of a line. §148 Q
339 ↦ 375 Guarantee that \predisplaypenalty=10000 will suppress page breaking before
a display. §1005 A
6 Dec 1978
376 Change the file opening statement to allow lines up to 150 characters long.
[System dependent.] L
16 Jan 1979
365 ↦ 377 Initialize negative properly in the \advance routine with a \count as argument. §440 F
20 Jan 1979
378 Try to keep complex, buggy preambles of alignments from crashing the pro-
gram. §789 R
17 Feb 1979
376 ↦ 379 Give more detailed information when warning about a long line being broken.
[System dependent; the buffer size in TeX78 is very limited.] I
380 Declare p local to try-break, for the “rare” case code. [My original program
included the following comment: “This case can arise only in weird circum-
stances due to changing line lengths, and the code may in fact never be
executed.” Later Michael Plass will discover that variable line lengths re-
quire an entirely different algorithm, using last-special-line.] §847 L
334 ↦ 381 Don’t omit the raggedness correction when the last line of paragraph has to
shrink. [Obsolete in TeX82.] F
22 Feb 1979
363 ↦ 382 Don’t forget to return from get-x-token after finding \par. §351 F
383 Add a new parameter: \lineskiplimit. §679 Q
384 Change the syntactic sugar: ‘\hbox par’ replaces ‘\hjust to ...{overfull}’.
[This vastly improves on the old idea (see #40), but there still is no internal
vertical mode.] C
385 Introduce new names \hbox and \vbox for \hjust and \vjust. §1071 C
386 Add a new condition: \ifpos. [It will later be generalized to \ifnum and
\ifdim.] §513 G
387 Add vu and \varunit. [TeX82 will eventually allow arbitrary internal dimen-
sions as units of measure.] §453 G
312 ↦ 388 Add an em unit. §455 G
389 Legalize \hbox spread (negative dimension) [since scan-spec no longer uses
the sign as a flag]. §645 C
10 Mar 1979
370 ↦ 390 Make scan-toks expand \count during \xdef. [This will change later when
\the and \number are introduced.] §367 C
23 Mar 1979
391 Put only 100000 pt stretch at the end of a paragraph instead of 10000000000 pt.
[In TeX78, “infinite” glue is actually finite but large; in the language of
TeX82 we would say that \parfillskip, which is not yet user-settable, is
being changed to be like \hfil instead of like \hfill.] §816 Q
392 Treat the last line of a paragraph more consistently with the other lines (e.g.,
when \hfil appears in mid-paragraph), by effectively inserting inf-penalty
at the end. §816 Q
31 Mar 1979
393 Ensure that penalty nodes aren’t wiped out, in weird cases where breaks occur
at penalties that normally disappear. §879 S
27 Apr 1979
394 Correct the page number count when files begin with an empty page. [System
dependent.] A
395 Allow the math-code table to be changeable via \chcode. [In TeX82, \chcode
will split into \mathcode and \catcode.] §1232 G
332 ↦ 396 Don’t accept ‘e’ after an error message if not inputting from a file. §84 I
29 May 1979
397 Don’t call end-file-reading if you haven’t already invoked begin-file-reading; this
could happen when trying to recover from an error in start-input. §537 F
7 Jun 1979
306 ↦ 398 Be sure to eject two pages, when \eject comes just at the time another break
is preferable (e.g., when the page has just become too full). §1005 A
27 Jun 1979
354 ↦ 399 Don’t say ‘You can’t do that in math mode’ when the user says ‘$$’ in
restricted horizontal mode! §1138 I
30 Jun 1979
400 Add wd, dp, ht dimension units. §455 G
307 ↦ 401 Don’t try the system area for file names whose area is explicitly indicated. §537 I
1 Jul 1979
402 Allow letters as (ASCII) numbers [without the ‘ marker introduced later]. §442 G
2 Jul 1979
403 Fix a \gdef bug: If the control sequence was never defined before [this later
became the restore-zero option], don’t remove it at group end. §282 F
16 Jul 1979
320 ↦ 404 Update show-noad-list to be like show-node-list. [The two routines, originally
separate, will be merged in TeX82.] §238 I
18 Jul 1979
405 Extend capacity from 32 fonts to 64 fonts if desired. §134 G
406 Add new extra-space parameter to all text fonts (requested by Frances Yao). §558 Q
407 Make each node-noad print properly in show-noad-list. §183 F
408 Make \jpar allow any break if it is 1000000 or more. [In TeX82, a \tolerance
of 10000 or more allows any break.] §851 Q
23 Jul 1979
409 Introduce new primitives \hfil, \vfil, \hfilneg, \vfilneg. §1058 E
410 Add \ifmode. §501 G
411 Add \firstmark. §1012,1016 G
412 Allow break at leaders (horizontal mode only). §149 C
25 Jul 1979
213 ↦ 413 Revise error so that online insertions work properly after end-of-file errors. §336 I
411 ↦ 414 Change ‘if first-mark ≠ 0’ to ‘if first-mark ≥ 0’ [because -1 is used to indicate
‘not yet given a value’]. §1012 B
28 Jul 1979
370 ↦ 415 Stop \xdef from expanding control sequences after \def’s. [This decision will
be rescinded later, after several more years of experience with macro expansion
will suggest better ways to cure the problem.] §366 C
416 Change symbolic printout for control symbols. [System dependent.] §49 I
308 ↦ 417 Avoid linefeeds in the transcript file. [System dependent.] L
370 ↦ 418 Expand topmark, etc., in \xdef. §366 C
4 Aug 1979
413 ↦ 419 Fix an error introduced recently: \par was suddenly omitted at end of page.
[System dependent.] B
11 Aug 1979
420 Change error messages that use SAIL characters not in standard ASCII. §360 P
28 Aug 1979
411 ↦ 421 Move the command ‘first-mark ← -1’ from wpackage to fire-up. §1012 D
403 ↦ 422 Correct a serious \gdef bug: Control sequences don’t obey a last-in-first-out
discipline, so TeX loses things from the hash table when deleting a control
sequence. §259 S
To fix this, I either need to restrict (so that \gdef can be used inside a
group only for control sequences already defined on the outer level) or need
to change the hash table algorithm. Although all applications of TeX known
to me will agree to the former restriction, I’ve chosen the latter alternative,
because it gives me a chance to improve the language: Control sequences of
arbitrary length will now be recognized.
423 Make sure that unsave cannot call eq-destroy with a value from the upper part
of eqtb. §282 D
I noticed this long-standing bug while fixing #422. It had very low probability
of causing damage (e.g., it required a certain field of a floating-point number
to have a certain value), but it would have been devastating on the day it
first showed up!
29 Aug 1979
424 Call eq-destroy when a control sequence is \gdef’ed after being \def’ed. §283 F
418 ↦ 425 Treat the first token consistently when \topmark and its cousins are expanded
in scan-toks. §477 F
Now I’ve checked things pretty carefully and I think TeX is “fully debugged.”
25 Jan 1980
338 ↦ 426 Display runaway alignment preambles. §306 I
427 Introduce active characters (one-stroke control sequences). [I don’t yet go all
the way: The meanings of ‘x’ and ‘\x’ have to be identical.] §344 G
7 Feb 1980
314 ↦ 428 Fix a glaring omission: Op space \> was never implemented in math mode! §716 F
25 Feb 1980
429 Add a new dimension ‘ex’ (for units of xheight). §455 G
3 Mar 1980
427 ↦ 430 Allow the control sequence \: to be redefined [it was the ‘select font’ operator];
this allows the character : to be active. [Obsolete.] C
23 Mar 1980
An extend-TeX-for-the-eighties party:
431 Add a new \copy feature. §204 G
432 Add a new \unbox feature. §1110 G
433 Add a new \open feature [later \openout]. §1351 G
434 Add a new \send feature [later \write]. §1352 G
435 Add a new \leqno feature, requested by MDS. §1204 G
436 Add a new \ifdimen feature [later \ifdim]. §513 G
437 Make \⟨space⟩ in vertical mode begin a paragraph. §1090 C
438 Add a new \font feature [replacing the silly previous convention that a font
must be defined when it is first selected]. §1256 G
439 Add new \parVal and \codeVal features [later \the⟨whatever⟩]. §413 G
427 ↦ 440 Don’t let active characters gobble the following space. §344 C
208 ↦ 441 Add a new parameter to govern amount of token list dumped. [Obsolete.] §295 G
442 Add a new \linebreak feature [later replaced by \break]. §831 G
25 Mar 1980
(Still working on the above, also thought of more.)
443 Add a new \mskip feature. §716 G
444 Add a new \newname feature (soon changed to \let). §1221 G
430 ↦ 445 Allow any control sequence to be redefined. §275 G
446 Send the output to the user’s current file area, even when input comes from
elsewhere. §532 I
27 Mar 1980
447 Compute the xheight for accents in math mode from family 1, not family 3.
[Obsolete.] Q
28 Mar 1980
448 Increase minimum clearance between subscript and superscript. §759 Q
29 Mar 1980
222 ↦ 449 When a display follows a display, the second should have the ‘shortskip’ glue. §1146 Q
4 Apr 1980
445 ↦ 450 Look at current token meanings when trying to recognize \tabskip in alignment
preambles. §782 A
23 Apr 1980
451 Estimate the length of printed output, for the new priority feature on our XGP
device driver. [System dependent.] I
434 ↦ 452 Break long \send lines into pieces so that the file can be read in again. [System
dependent.] C
19 May 1980
182 ↦ 453 Don’t make \left and \right delimiters too large; they need to be only 90%
of the enclosed size. [This eventually became \delimiterfactor.] §762 Q
21 May 1980
454 Add a new \pagebreak feature [later \vadjust{\break}]. §655 G
13 Jun 1980
Today I’m beginning to overhaul the line-breaking routine, and I’ll also install
miscellaneous goodies.
455 Allow a radical sign to be in different font positions. §737 G
456 Clear empty tokenlists off input stacks to allow deeper recursions (suggested
by Jim Boyce’s macros for chess positions). §325 E
457 Make \spaceskip and \parfillskip changeable. §1228 G
458 Add a new parameter \rfudge (per request of Zippel) [later \mag]. §288 G
459 Add a new parameter \loose [later \looseness]; now parameters are allowed
to take negative values. §875 G
460 Remove the variable just-par. [Obsolete; it was the real equivalent of an
integer]. E
14 Jun 1980
461 Install new line-breaking routines, including \parshape. (These major changes
are introduced as Michael Plass and I write our article.) §813 Q
462 Add a new parameter \exhyf [later \exhyphenpenalty]. §870 G
16 Jun 1980
444 ↦ 463 Change conventions in eqtb so that glue is distinguishable from other equiva-
lents. §275 S
444 ↦ 464 Don’t expand \b in \xdef{\d\b{...}} after \let\d=\def. [Obsolete.] A
444 ↦ 465 Avoid creating dead storage when doing unsave in certain regions. §275 D
17 Jun 1980
466 Allow negative dimensions in rules. §138 C
19 Jun 1980
463 ↦ 467 Make the new test for glue at the outer level of show-eqtb. §252 B
27 Jun 1980
453 ↦ 468 Don’t let \left and \right become too small for big matrices. [This eventually
became \delimitershortfall.] §762 Q
3 Aug 1980
469 Don’t move extra-wide, numbered equations flush left unless they begin with
glue. §1202 Q
15 Sep 1980
461 ↦ 470 Say ‘≥ fz’ instead of ‘> fz’ in the pre-hyphenation routine; I’d forgotten my
definition of fz [a variable used to test for a sequence of lowercase letters in
the same font]. §897 M
395 ↦ 471 Check the range of the index in \chcode before saving the old value. §1232 R
18 Sep 1980
457 ↦ 472 Don’t forget to increase the reference count to \parfillskip, or it will myste-
riously vanish. §816 D
19 Sep 1980
412 ↦ 473 Make leaders break like glue in both horizontal and vertical modes. §149 C
364 ↦ 474 Make \mathsurround break properly at left and right end of lines. §879 Q
13 Oct 1980
461 ↦ 475 Remove spurious overfull boxes generated when the looseness criterion fails.
[Obsolete.] I
461 ↦ 476 Redesign the iteration for looseness; breakpoints were not chosen optimally. §875 A
461 ↦ 477 Avoid storing a lot of breakpoints when they are dominated by others. §836 E
366 ↦ 478 Don’t say ‘cur-node’ when you mean ‘mem[cur-node]’. §1105 B
461 ↦ 479 Prefer the oldest break to the youngest break when two break nodes have the
same total demerits. §836 Q
461 ↦ 480 Don’t make badness too big for floating-point calculations, when forced to make
an overfull box. [Obsolete.] L
10 Dec 1980
481 Make it impossible to get unmatched ‘}’ in a delimited macro argument. §392 R
482 Add new \topsep and \botsep features. [These are TeX78’s way to put space
at the edge of inserts, replaced in TeX82 by the \skip register corresponding
to an \insert class.] §1009 G
6 Jan 1981
483 Install new routines for reading the font metrics, using Ramshaw’s TFM files
instead of TFX files. §539 P
484 Abort after reporting 100 errors, if not pausing on errors. §82 I
485 Add new \spacefactor and \specskip and \skip primitives. [At this time we
write ‘\specskip3=10pt’ and ‘\skip3’ for what will become ‘\skip3=10pt’
and ‘\hskip\skip3’ in TeX82.] §1060 G
366 ↦ 486 \unskip is now allowed in internal vertical mode. §1105 G
26 Jan 1981
482 ↦ 487 Don’t say ‘mem[q]’ when you mean ‘q’. (See #143 and #478.) §1009 B
27 Feb 1981
417 ↦ 488 Put some linefeeds back into the transcript file, in order to prevent overprinting
in listings. [System dependent.] I
489 Add a new \dpenalty feature [later \postdisplaypenalty]. §1205 G
490 Add the dimension cc for European users. §458 G
491 Make scan-keyword match uppercase letters as alternatives to lowercase ones
(suggested by Barbara Beeton’s experiments with \uppercase). §407 C
492 Add nonstop mode so that overnight batch processing is possible. §73 I
2 Mar 1981
422 ↦ 493 Fix a still more serious \gdef bug: The generality of \gdef almost makes it
a crime to forget any control sequence names, ever! (The previous bug was
only the tip of an iceberg.) §259 S
494 Issue warning message at the end of a file page if nesting level isn’t zero. [System
dependent.] I
5 Mar 1981
495 Keep track of maximum memory usage, for statistical reporting. [Obsolete.] §125 I
350 ↦ 496 Prune away glue and penalties at top of page after marks, sends, inserts. §1000 Q
497 Allow \mark in horizontal mode. [Later it will be \vadjust{\mark...}.] §655 G
498 Allow optional space before a required left brace, e.g., \if AA {...}. [See
#251.] §403 C
499 Issue an incomplete \if error, to help catch a bad \if. §336 I
17 Mar 1981
494 ↦ 500 Omit the warning message at end of a file page unless the nesting level has
changed on that page. [System dependent.] I
310 ↦ 501 Fix the spacing when there is a very tall subscript with a superscript. §759 Q
20 Mar 1981
371 ↦ 502 Make space-eating after \else fully consistent between the true and false cases.
[Obsolete.] S
24 Mar 1981
496 ↦ 503 Change glue-spec-size to ins-spec-size in wpackage [where insertions are done].
[Obsolete.] B
5 Apr 1981
501 ↦ 504 Fix a typo ('+' instead of '-') in the new subscript code; this shifted certain
subscripts down instead of up. §759 B
18 Apr 1981
505 Make leaders with rules of specified size act like variable rules. §626,635 G
29 Apr 1981
461 ↦ 506 Don't consider badness > threshold at a line \break except in an emergency. §854 A
13 Jul 1981
402 ↦ 507 Allow other characters as numbers. §442 C
294 ↦ 508 Avoid dead storage if a no-new-controlsequence error occurs. [Obsolete.] §259 R
509 Add a new \ifx feature. §507 G
510 Add new features \xleaders and \cleaders. §626,635 G
14 Jul 1981
507 ↦ 511 Amend the new code for constants; the ‘.’ in ‘.5’ is thought to mean '056! §442 S
507 ↦ 512 And fix an egregious blunder in that code: New commands at the end of a
procedure are ignored when earlier statements exit via return. §442 L
4 Aug 1981
513 Accept alphabetic codes for all online error recovery options, instead of insisting
on control codes like line feed or form feed. [The original error-recovery codes
were suggested by the conventions of the SAIL compiler.] §84 P
514 Add a new \thebox feature [later \lastbox]. §1079 G
7 Aug 1981
515 Add fil, fill, and filll as units for glue stretching or shrinking. §454 G
516 Suppress the overfull box error when shrinkage amount is negative. §664 I
9 Aug 1981
517 Let unset boxes inherit the size of their parent in alignments. §810 Q
12 Apr 1982
518 Make INITEX dump out the font-dsize array needed by the new DVI output
module. §1322 F
1 May 1982
151 ↦ 519 Fix clean-box so that mlist-to-hlist cannot make link(q) = 0 and type(q) =
glue-node. §720 S
[That was the historic final change to TeX78. All subsequent entries in this log
refer to TeX82.]
28 Sep 1982
Here are the first changes made to the preliminary listing of TeX82 that was
published by the project earlier this month.
520 Insert the missing cases letter and other-char after x-token looks ahead. §1036 F
521 Change ‘\pause’ to ‘\pausing’. §236 C
522 Reset overfull-rule when determining tabskip glue. §804 D
523 Fix the logic for scanning \ifcase [in obsolete syntax: everything is still done
with braces since ‘\fi’ doesn’t exist yet]. §509 A
30 Sep 1982
524 Change “0.0” to “?.?” (suggested by DRF). §186 I
2 Oct 1982
525 Use conditional thin spacing next to ‘Inner’ noads. §764 Q
526 Make thick spaces conditional. §766 Q
4 Oct 1982
527 Increase trie-size from 7000 to 8000, because of Frank Liang’s improved (but
longer) hyphenation patterns. §11 P
6 Oct 1982
528 Change the string lengths to match the new TEX-format-default. §520 F
Version 0 of TeX is being released today!
8 Oct 1982
529 Fix a blunder: I decreased h mod a quarterword when it should have been
decreased mod trie-op-hash-size (HWT). §944 B
9 Oct 1982
530 Fix a typo (‘!’ not ‘&’) in the WEB documentation. §524 P
531 Remember to call initialize if a different format was preloaded (Max Diaz). §1337 F
Version 0.1 incorporates the above changes.
12 Oct 1982
532 Add the ‘\immediate’ feature, by popular request. §1375 G
Version 0.2 incorporates this (somewhat extensive) change.
13 Oct 1982
533 Introduce new WEB macros so that glue-ratio is more easily changed. §109 P
I began writing The TeXbook today: edited the old preface and searched in the
library for quotations.
14 Oct 1982
534 Change the type of hd to eight-bits; it’s not a quarterword (HWT). §649 B
535 Revise the optimization of DVI commands: It’s not always safe to eliminate pop
when the preceding byte is push, since DVI commands have variable length!
(Embarrassing oversight caught by DRF.) §601 S
15 Oct 1982
536 Test ‘prev-depth > ignore-depth’, not ‘≠’. §679 C
Version 0.3 incorporates the above changes.
16 Oct 1982
537 Omit definition of align-size; it’s never used (Bill Scherlis). §11 P
538 Inhibit error messages when packaging box 255. §1017 I
21 Oct 1982
539 Subtract width(q) from page-goal, don’t add it to page-so-far[1]. §1009 A
The comment in §982 is correct, and so was my first draft of this code; but when
desk checking the program some months after writing it, I introduced this
bug, believing that I was making the algorithm more elegant or something.
Previously a slightly loose hyphenated line followed by a decent line was con-
sidered worse than a decent hyphenated line followed by a quite loose line.
10 Nov 1982
555 Save a bit of buffer space by declaring pool-file only in INITEX. §50 E
11 Nov 1982
556 Introduce a new context indicator to clarify TeX’s scanning state: A special
type called backed-up is distinguished from other kinds of inserted lists; it is
called ‘recently read’ or ‘to be read again’, while others are called ‘inserted’. §314 I
557 Append a comment, ‘treated as zero’, to the missing-number message. §446 I
558 Ignore the settings of \hfuzz or \vfuzz if \hbadness or \vbadness is less than
100. §666,677 I
13 Nov 1982
Major surgery on the program is planned for today, because of new ideas sug-
gested by correspondence with MDS and other macro writers.
559 Introduce a new \tokens register; this will be useful and easy to add, since
TeX already can handle \everypar and \output. §1227 G
560 Change get-x-token to get-token when scanning an optional space; then a con-
struction like \def\foo{...}\foo won’t complain that \foo is undefined. §443 C
This change was retracted when it was being debugged, because it could cause
endv to abort the job. Then it was re-established again when I found that
endv needed to be more robust anyway. [But it was eventually rescinded
again.]
561 Make \span mean ‘expand’ in a preamble. §782 G
562 Use three separate if tests instead of ‘∧’ in the inner loop of get-next, to gain
efficiency. §342 E
563 Introduce get-r-token so that assignments have uniform error messages and so
that frozen equivalents cannot be changed. §1215 R
I gave a few variables more mnemonic names as I made these changes.
564 Move conditional statements from the semantics (‘stomach’) part of TeX to
the syntax (‘mouth’) part, by introducing ‘\fi’. Also introduce \csname and
\endcsname. §372,489–500 C
This makes macros much more predictable and logical, but it is by far the most
drastic change ever made to TeX. The program began to come back to life
only after three days of solid hacking.
Several other things were cleaned up as part of this change because it is now
more natural to handle them differently. For example, a null control sequence
has now become more logical.
The result of all this is called Version 0.8.
18 Nov 1982
Today I resumed writing Chapter 8. Tomorrow I’m 2¹⁴ days old!
21 Nov 1982
565 Declare c as a local variable for hyphenation (DRF). §912 F
566 Omit the “first pass” and try hyphenations immediately, if \pretolerance is
negative (suggested by DRF). §863 E
567 Don’t ship out incredibly huge pages; they might foul up DVI files. §641 R
2 Dec 1982
568 Add new features \everymath and \everydisplay. §1139,1145 G
569 Add a new feature \futurelet. §1221 G
The changes above have been incorporated into Version 0.9 of TeX.
7 Dec 1982
570 Add a new \endinput primitive (suggested by FY). §362,378 G
8 Dec 1982
571 Try off-save, if \par occurs in restricted horizontal mode. (This avoids em-
barrassment if TeX says ‘type a command or say \end’, then when you type
\end it says you can’t!) [However, I soon retracted this change.] §1094 I
21 Dec 1982
572 Redefine \relax so that its chr field exceeds 127. (This facilitates the test for
end in scan-file-name.) §265 A
566 ↦ 573 Call begin-diagnostic when omitting the first pass of line breaking. §863 F
574 Fix the logic of glue scanning: In \hskip-1pt plus2pt the minus should apply
only to the 1pt. §461 A
23 Dec 1982
575 Renumber the decimal codes in paragraph statistics for loose and tight lines;
they were ordered backwards. §817 I
576 Treat a paragraph that ends with leaders like a paragraph that ends with glue. §816 C
577 Allow commas as alternates to radix points, for Europeans. §438 C
578 Change \hangindent to a normal dimension parameter. [It had been a combi-
nation of \hangindent and \hangafter, with special syntax.] §247 C
579 Make \prevgraf accessible to users. §422,1244 G
580 Split \clubpenalty off from \widowpenalty. §890 G
I’m typing Chapter 14 while making these changes.
24 Dec 1982
581 Use back-input instead of goto reswitch when inserting \par, because \par
may have changed. §1095 S
25 Dec 1982
It’s 10pm after a very Merry Christmas!
582 Don’t prompt for a new file name if \openin doesn’t find a file. §1275 I
583 Add a new \jobname primitive. §472 G
584 Give the user a way to delete the dollar sign, when TeX decides to insert one. §1047 I
585 Allow optional equals after \parshape, and implement \the\parshape. §423,1248 C
26 Dec 1982
586 Add an if-line-field to the condition stack entries, so that more informative
error messages can be given. §489 I
549 ↦ 587 Introduce a normal-paragraph procedure, since initialization is needed also
within \insert, \vadjust, \valign, \output. §1070 D
27 Dec 1982
588 Give users access to \pagetotal and \pagegoal. (Analogous to #579 and
#585, but simpler.) §1245 G
589 Introduce \tracingpages, allowing users to see page-optimization calculations.
Also split \tracingparagraphs off from \tracingstats. §987,1005,1011 I
The changes above have been incorporated into Version 0.91 of TeX.
31 Dec 1982
590 Break the build-page procedure into two parts, by extracting the section now
called fire-up. [This is necessary because some Pascal compilers, notably for
IBM mainframes, cannot deal with large procedures.] §1012 P
564 ↦ 591 Make \ifodd1\else legal by introducing if-code. §489 S
592 Improve alignments when columns don’t occur: Don’t append null boxes for
columns missing before \cr, and zero out the tabskip glue after nonpresent
columns. §802 Q
593 Make the error message about overfull alignment more intelligible. §801,804 I
The changes above have been incorporated into Version 0.92 of TeX82, which
was the last version of 1982, completed at 11:59pm on December 31.
3 Jan 1983
Today I’m beginning to write Chapter 15, and planning the \output routine
of plain.tex.
594 Change the logic of its-all-over; use max-dead-cycles instead of the fixed con-
stant 100. §1054 C
595 Don’t forget to pop-nest when an insert is empty. Also disallow optional space
after \insert n {...}. §1100 F
4 Jan 1983
541 ↦ 596 Use the \boxmaxdepth that’s declared inside a \vbox when packaging it. §1086 C
597 Rename \groupbegin and \groupend as \begingroup and \endgroup. §265 C
598 Make \deadcycles accessible to users. §1246 G
599 Base the split insertions on natural height plus depth, not on delta. §1010 Q
The changes above have been incorporated into Version 0.93.
6 Jan 1983
600 Add push-math to handle a case where I forgot to clear incompleat-noad. (This
long-standing bug was unearthed today by Phyllis Winkler.) §1136 D
588 ↦ 601 Add \pageshrink, etc., too. §1245 G
602 Introduce new parameters \floatingpenalty, \insertpenalties. Also adopt
a new internal representation of insertion nodes, so that \floatingpenalty,
\splittopskip and \splitmaxdepth can be stored with each insertion. §140,1008 G
7 Jan 1983
603 Improve the rules for entering new-line, in particular when the end-of-line
character is active. §343 Q
9 Jan 1983
604 Distinguish between implicit and explicit kerns. §155,896 Q
605 Change the name \ignorespace to \ignorespaces. §265 C
560 ↦ 606 Don’t omit a blank space after \def, \message, \mark, etc.; the previous hodge-
podge of rules is impossible to learn. §473 C
The above changes appear in Version 0.94.
12 Jan 1983
Beginning to write the chapters on math today.
607 Add a new feature: active characters in math mode. $1151 G
15 Jan 1983
608 Fix a surprise bug: ‘$1-$’ treated the - as binary. $729 A
609 Initialize space-factor inside discretionaries. $1117 D
16 Jan 1983
610 Fix an incredibly embarrassing bug: I forgot to update spotless in the error
routine! F
While fixing this, I decided to change spotless to a more general history vari-
able, as suggested by IBMers who want a return code. $76,82,1335
611 Replace two calls of confusion by attempts at error recovery, in places where
‘This can’t happen’could actually happen. $1027,1372 I
18 Jan 1983
612 Introduce the normalize-selector routine to protect against startup anomalies
when the transcript file isn’t open. Also make open-log-file terminate in some
cases. $92,535 R
672 D. E. KNUTH
591 ↦ 613 Insert \relax, not a blank space, to cure infinite loop like \ifeof\fi (LL). §510 R
614 Change the old \limitswitch to \limits, \nolimits, and \displaylimits.
Incidentally, this fixes a bug in the former positioning of integral signs. §682,749 G
615 Give a \char in math mode its inherited \mathcode. §1151 C
525 ↦ 616 Make underline, overline, radical, vcenter, accent noads and {...} all revert
to type Ord instead of type Inner. Introduce a new primitive \mathinner.
(This fixes the spacing, which got worse in some ways after change #525.) §761 Q
I’m working on Appendix G today.
19 Jan 1983
617 Introduce a \mathchoice primitive. §1174 G
618 Move \input from the stomach to the eyes. §378 C
619 Introduce \chardef, analogous to \mathchardef. §1036,1224 C
620 Change \unbox to \unhbox and \unvbox; also add \unhcopy. §1110 G
621 Consider \spacefactor, \pagetotal, etc., as part of prefixed-command, even
though they are always global. §1211 C
20 Jan 1983
622 Switch modes when \hrule occurs in horizontal mode or \vrule in vertical.
§1090,1094 C
623 Add a new \globaldefs feature. §1211 G
21 Jan 1983
624 Optimize the code, in places where it’s important (based on frequency counts
of TeX usage accumulated during the past week): Introduce fast-get-avail
and fast-store-new-token; reduce procedure-call overhead in begin-token-list,
end-token-list, back-input, flush-node-list; change some tests from ‘if a ∧ b’
to ‘if a then if b’. §122,371 E
22 Jan 1983
625 Save space in math lists: Don’t insert penalties within restricted horizontal
mode; simplify trivial boxes. §721,1196 E
626 Fix a surprising oversight in the rebox routine: Ensure that b isn’t a vbox. §715 S
545 ↦ 627 Make \nullfont a primitive, so that cur-font always has a value. (This is
a dramatic improvement to TeX78, where a missing font was a fatal error
called ‘Whoa’!) §552 C
24 Jan 1983
586 ↦ 628 List all incomplete \if’s when the job ends. §1335 I
29 Jan 1983
629 Change initialization of align-state so that \halign\bgroup works. §777 C
30 Jan 1983
625 ↦ 630 Be sure to test ‘is-char-node(q)’ when checking for a trivial box. §721 D
By extraordinary coincidence, this bug was caught when somebody used font
number 11 (= kern-node) in the second character of a list of length 2!
631 Improve format for stats at end of run, as suggested by DRF. §1334 I
The changes above have been incorporated into Version 0.95.
632 Don’t ignore the space after a control symbol (except ‘\ ’). §354 C
633 Remove all trailing spaces at the right of input lines, so that there’s perfect
compatibility with IBM systems that extend short lines with spaces. §31 P
3 Feb 1983
634 Assume that a math-accent was intended, after giving an error message in the
case mmode+accent. §1165 I
635 Add new primitives \iftrue and \iffalse. §488 G
6 Feb 1983
636 Improve the accuracy of fixed-point arithmetic when calculating sizes for \left
and \right. (I had started by dividing delimiter-factor, not delta1, by 500.) §762 A
12 Feb 1983
637 Change the name \delimiterlimit to \delimitershortfall. §248 C
638 Make \abovewithdelims.. equivalent to \above; change the order of operands
so that delimiters precede the dimension. §1182 C
607 ↦ 639 Remove the kludgy math codes introduced earlier; make \fam a normal integer
parameter and allow \mathcode to equal 2¹⁵. §1233 C
640 Don’t let \spacefactor become 2¹⁵ or more. §1233,1243 R
I finished drafting Chapter 17 today.
14 Feb 1983
639 ↦ 641 Replace octal output (print-octal) by hexadecimal (print-hex) so that math
codes are clearer. §67 I
619 ↦ 642 Don’t forget char-given in the math-accent routine. §1124 F
17 Feb 1983
643 Switch modes when \halign occurs in horizontal mode, or \valign in vertical
mode. §1090,1094 C
18 Feb 1983
644 Add a new feature \tracingrestores. This requires a new procedure called
show-eqtb, whose code can be interspersed with the eqtb definitions. §252 I
25 Feb 1983
622 ↦ 645 Suggest using \leaders when the user tries a horizontal rule in restricted
horizontal mode. §1095 I
27 Feb 1983
646 Specify the range of source lines, when giving warning messages for underfull
or overfull boxes in alignments. §662,675 I
Why did it take me all day to type the middle part of Chapter 18?
4 Mar 1983
647 Introduce a new feature \xcr (suggested by LL). [Changed later to ‘\crcr’.] §785 G
631 ↦ 648 Subtract out TeX’s own string requirements from the stats. §1334 I
6 Mar 1983
649 Add new features \everyhbox and \everyvbox. §1083,1167 G
9 Mar 1983
650 Avoid accessing math-quad when the symbol fonts aren’t known to be present. §1199 R
533 ↦ 651 Introduce float and unfloat macros to aid portability (HWT). §109 P
652 Introduce new names \abovedisplayskip and \belowdisplayskip for the old
\dispskip; also \abovedisplayshortskip and \belowdisplayshortskip for
the old \dispaskip and \dispbskip. §226 C
10 Mar 1983
653 Unbundle \romannumeral from \number (suggested by FY). §468 C
12 Mar 1983
654 Ignore leading spaces in scan-keyword. §407 C
14 Mar 1983
631 ↦ 655 Use write and write-ln directly when printing stats. §1334 E
16 Mar 1983
602 ↦ 656 Refine the page-break cost function (introducing ‘deplorable’, which is not quite
‘awful-bad’), after suggestion by LL. §974,1005 Q
27 May 1983
697 Add a new feature \afterassignment (suggested by ARK). §1269 G
698 Adjust the timing so that commands like ‘\chardef\xx=5\xx’ behave sensibly. §1224 C
28 May 1983
699 Ignore ‘\relax’ as if it were a space, in math mode and in a few other places
where \relax would otherwise be erroneous. §404 C
700 Improve \mathaccent spacing with respect to subscripts and superscripts
(suggested by HWT). §742 Q
30 May 1983
594 ↦ 701 Terminate a job only when dead-cycles = 0. §1054 C
The changes above constitute Version 0.98.
3 Jun 1983
I finished the draft of Chapter 23 (output routines) today.
702 Allow \mark and \insert and \vadjust in restricted horizontal mode, and
also in math mode. (This is a comparatively big change, triggered by the
fact that \mark in a display presently causes TeX to crash with ‘This can’t
happen’!) The global variable adjust-tail is introduced. §796,888,1085 G
6 Jun 1983
695 ↦ 703 Replace (and generalize) the previous uses of ht, wd, and dp in dimensions by
introducing the new control sequences \ht, \wd, and \dp. §1247 G
704 Display sub-parts of noads with the symbols ^ and _ instead of ( and [. §696 I
694 ↦ 705 Allow A..F in hex constants to be other-char as well as letter. §445 C
7 Jun 1983
654 ↦ 706 Remove an instance of ⟨Scan optional space⟩, since it’s now redundant. §457 E
707 Legalize \mkern\thinmuskip and \mkern5\thinmuskip. §456 C
708 Clean up the treatment of optional spaces in numerical specifications. §455 C
A construction like 2.5\space\space\dimen0 was previously valid after ‘plus’
or ‘minus’ only!
I’m obviously working on Chapter 24 today.
545 ↦ 709 Allow ‘\font’ as a ⟨font identifier⟩ for the current font. §577 C
623 ↦ 710 Don’t make \gdef global when global-defs < 0. §1218 C
711 Produce zero-glue as the outcome of \advance\spaceskip by -\spaceskip. §1229 E
712 Make \show do something appropriate for every possible token. §1294 I
559 ↦ 713 Replace the (single) \tokens parameter by an array of 256 token registers. §230 G
714 Allow \indent in math mode; also make \valign in math mode produce the
‘Missing $’ error. §1046,1093 C
715 Remove redundant code: There’s no need to check cur-group or call off-save
when starting alignments or equation numbers in displays. §1130,1142 E
8 Jun 1983
716 Disallow \openout-1 and \closeout-1. §1350 C
717 Disallow \lastbox in math mode. §1080 C
9 Jun 1983
718 Call back-error, not error, when \leaders aren’t followed by proper glue. §1078 I
719 Initialize for a possible paragraph, after \noalign in a \valign. §785 D
10 Jun 1983
720 Expand the optional space after an ASCII constant. §442 C
12 Jun 1983
721 Set space-factor ← 1000 after a rule or a constructed accent. §1056,1123 C
14 Jun 1983
722 Correct a serious blunder: Set disc-width ← 0 before testing if s is null (caught
by JS). §870 D
This is a real bug that existed since the beginning! It showed up on page 37 of
the Version 0 TRIP manual, but I didn’t notice the problem.
708 ↦ 723 Make optional spaces after ⟨dimen⟩ like those after ⟨number⟩. §448 C
568 ↦ 724 Insert every-display before calling build-page. §1145 C
648 ↦ 725 Report TeX’s capacity on overflow errors in a way that’s fully consistent with
other statistical reports. §42 I
17 Jun 1983
726 Make all \tracing decisions on the basis of ≥ versus <, not ≠ versus =. §581 C
Today I finished the draft of Chapter 27 (the last chapter)!
The changes above were released as Version 0.99 on June 19, 1983.
20 Jun 1983
727 Set \catcode‘\%=14 in INITEX. §232 C
587 ↦ 728 Call normal-paragraph when \par occurs in vertical mode. §1094 C
Once again I’m retiring about 8am and awaking about 4pm.
21 Jun 1983
558 ↦ 729 Don’t append an overfull rule solely because of \hbadness. §666 C
730 Don’t allow the glue-ratio of shrinking to be less than -1. §810,811 R
22 Jun 1983
653 ↦ 731 Declare the parameter to print-roman-int to be of type integer, instead of
nonnegative-integer (found by Debby Clark). §69 B
690 ↦ 732 Make the keyword ‘by’ optional (suggested by LL). §1236 C
24 Jun 1983
733 Say ‘preloaded’ when announcing format-ident. §1328 I
25 Jun 1983
734 Add extra boxes and glue to the output of alignment. [This thwarts possible
attempts at trickery by which system-dependent glue set values computed
by \span could have gotten into TeX’s registers by things like \valign and
\vsplit. It also has the advantage of perfect accuracy in alignment of vertical
rules.] §809 R
735 Make leaders affect the height or width of the enclosing boxes. §656,671 C
Today I’m mainly installing a much-improved format for change files in WEB
programs (suggested by DRF).
28 Jun 1983
736 Permit \unskip in vertical mode when we know that it does nothing. §1106 C
1 Jul 1983
700 ↦ 737 Avoid redundant boxes when things like ‘{\bf A}’ occur in math. §1186 E
738 Add a ‘scaled’ feature to \font input. §1258 G
700 ↦ 739 Remember to correct delta when an accented box changes. §742 D
2 Jul 1983
740 Introduce bypass-eoln, to remove anomalous behavior on input files of length 1.
(Suggested by DRF after the problem was discovered by LL). §31 R
4 Jul 1983
741 Allow codes like ^^b as well as ^^B. §352,355 G
4 Sep 1983
785 Add new features \lastkern, \lastpenalty, \unkern, \unpenalty. §424,996,1105 G
OK, Appendix D is finished!!
The above changes have been installed in Version 0.999999.
17 Sep 1983
548 ↦ 786 Don’t bother making duplicate font identifiers; that was overkill, not really
needed. §1258 P
Will this be the historic last change to TeX?
18 Sep 1983
787 Correct a minor inconsistency, ‘display’ not ‘displayed’. §211 I
20 Sep 1983
604 ↦ 788 Treat the kerns inserted for accents as explicit kerns. §1125 C
20 Sep 1983
789 Change ‘log’ to ‘transcript’ in several messages. §535,1335 I
The index was finished today; I mailed the entire TeXbook East for final
proofreading before publication.
1 Oct 1983
790 Prevent uninitialized trie positions in case of overflow (found by Bernd Schulze). §944 D
7 Oct 1983
Henceforth our weekly ‘TeX lunch’ meetings will be called ‘METAFONT lunch’.
DRF begins to produce The TeXbook on our APS phototypesetter.
14 Oct 1983
633 ↦ 791 Ignore spaces at the ends of lines also in TEX.POOL (found by DRF). §52 P
792 Initialize the history variable at start-here (DRF). §1332 D
18 Oct 1983
793 Extend runaway to catch runaway text (suggested by FY). §306 I
794 Reset cur-cs after back-input, not after scanning the ‘=’ (found by FY). §1226 D
24 Oct 1983
638 ↦ 795 Change the error recovery for bad delimiters, in accordance with the changed
syntax. (Found by Barry Smith.) §1183 I
9 Nov 1983
796 Optimize the code a bit more, based on empirical frequency data gathered
during September and October: In §45, use the fact that the result is almost
always true. In §380, delete ‘while true do’ since many compilers implement
that badly. Rewrite §852 to avoid calling badness in the most common
case. §45,380,852 E
3 Dec 1983
797 Don’t forget to call error after the message has been given (noticed by Gabi
Kuper). §500 F
Version 1.0 released today incorporates all of the above.
9 Dec 1983
Dinner party with 36 guests to celebrate TeX’s coming of age.
2 Feb 1984
786 ↦ 798 Reinstall \font precautions that I thought were unnecessary. I overlooked
many problematic possibilities, like ‘{\font\a=x \global\a} \the\font’ and
‘\font\a=x\font\b=x \let\b=\undefined \the\a’, etc. (Found by Mike Urban.)
The new remedy involves removal of the font-ident array and putting
the identifiers into a frozen part of the hash table; so there’s a sprinkling of
25 Nov 1984
817 Don’t forget to check for null before looking at subfields of a node. (This was
“dirty Pascal,” with two quarterword 0’s read as a halfword.) §846 R
818 Ditto in another place! §939 R
819 Remove the fixed-at-compile-time partition between lower and upper memory.
§116,125,162 E
This major change in memory management completes Version 1.3, which was
published in preliminary looseleaf form as ‘TeX: The Program’.
20 Dec 1984
820 Keep the nodesize field from overflowing if the lower part of memory is too
large. §125 R
That was another bug in existence from the beginning!
5 Jan 1985
821 Improve the missing-format-file error (DRF). §524 I
7 Jan 1985
822 Update the terminal right away so that the welcoming message will appear as
soon as possible (DRF). §61 I
23 Jan 1985
823 Convey more uncertainty in the help message at times of confusion. §95 I
824 Improve the history logic in the warning-issued case. §245 I
18 Feb 1985
810 ↦ 825 Stick to standard Pascal: Don’t use first in a for loop. [Some procedures
“threaten” it globally, according to British Standard 6192, section 6.8.3.9.]
(Pointed out by CET.) §331 P
11 Apr 1985
826 Prevent nonexistent characters from being output by unusual combinations of
ligatures and hyphenation. §915 S
15 Apr 1985
819 ↦ 827 Compute memory usage correctly in INITEX; the previous number was wrong
because of a WEB text macro without parentheses (DRF). §164 L
16 Apr 1985
828 Speed up flush-list by not calling free-avail (DRF). §123 E
17 Apr 1985
788 ↦ 829 Introduce a special kind of kern for accent positioning; it must not disappear
after a line break. §837,879,1125 A
18 Apr 1985
755 ↦ 830 Prevent \lastbox and \unkern from removing discretionary replacements.
§1081,1105 R
That completes Version 1.4.
26 Apr 1985
831 Don’t try TEX-area if a nonstandard file area has been specified (DRF). §537 C
That was #401 in TeX78; I never learn!
30 Apr 1985
754 ↦ 832 Eliminate the limitation on \write length; the reason for it has disappeared
(Nancy Tuma). §1370 C
8 May 1985
819 ↦ 833 Allocate two words for the head of the active list (CET). §162 D
11 May 1985
834 Change wterm to wterm-ln after a bad beginning (Bill Gropp). §1332 I
806 ↦ 835 Don’t open the terminal twice (CET). §1332 E
22 May 1985
836 Test for batch-mode after trying to open the transcript file, not before (DRF). §92 R
837 Be prepared for string pool overflow while reading the command line! (This
bug was first found in METAFONT, when it could occur more easily.) §525 R
7 Aug 1985
838 Fix a bug in \edef\foo{\iffalse\fi\the\toks0}: TeX should stay in the
loop when expanding non-\the. (Found by Dan Brotsky.) §478 A
The above changes were incorporated in Version 1.5.
27 Nov 1985
764 ↦ 839 Make ‘plain’ a lowercase name, for consistency with the manual. §521 C
669 ↦ 840 Wake up the terminal for \show commands. §1294,1297 I
The above changes were incorporated in Version 2.0, which was published as
Volume B of the Computers & Typesetting series.
15 Dec 1986
841 Punctuate the Poirot help message more carefully. §1283 I
28 Jan 1987
842 Make sure that max-in-open doesn’t exceed 127 (DRF). §14 R
680 ↦ 843 Don’t allow a \kern to be clobbered at the end of a pre-break list when a
discretionary break is taken. (A missing ‘else’ was the source of the error,
diagnosed incorrectly before.) §881 D
844 Take account of discarded nodes when computing the background width after
a discretionary. §840 D
That was the first really serious bug detected for more than 17 months! I found
it while experimenting with right-to-left extensions.
Version 2.1 was released on January 26, 1987.
5 Feb 1987
845 Remove cases in shorthand-def that cannot occur (found by Pat Monardo). §1224 E
14 Apr 1987
846 Improve robustness of data structure display when debugging (Ronaldo Amá).
§174,182 R
21 Apr 1987
847 Make the storage allocation algorithm more elegant and efficient. §127 E
22 Apr 1987
742 ↦ 848 Calculate the empty-line condition properly when end-line-char is absent. §360 A
The previous three changes were found while I was teaching a class based on
Volume B; they led to Version 2.2.
28 Apr 1987
849 Avoid closing a file when TeX knows that it isn’t open (JS). §560 E
3 Aug 1987
850 Clean up unfinished output if it’s necessary to jump-out (Klaus Gunterman). §642 S
That makes Version 2.3; subsequent version numbers won’t be logged here.
19 Aug 1987
851 Indent rules properly in cases like
\hangindent=1pt$$\halign{...\cr\noalign{\hrule}}$$. §806 A
20 Aug 1987
852 Introduce co-backup because of cases like \hskip 0pt plus 1fil\ifdim (Alan
Guth). §366 S
9 Nov 1987
853 Change the calculation for number of leader boxes, so that it won’t be too
sensitive to roundoff error near exact multiples (M. F. Bridgland). §626 S
17 Nov 1987
854 Replace my stupid algorithm for fixed-point multiplication of negatives (W. G.
Sullivan). §572 A
12 Dec 1987
855 Fix a typo in the initialization of hyphenation tables (Peter Breitenlohner). §952 B
That error was almost completely harmless, thus undetectable, except if some
\lccode is 1 and no \patterns are given.
23 Dec 1987
564 ↦ 856 Be more cautious when “relaxing” a previously undefined \csname; you might
be inside a group (CET). §372 S
20 Apr 1988
857 Make sure temp-head is well-formed whenever it can be printed in a “runaway”
message: Consider constructions like \outer\def\aOO\a\a (Silvio Levy). §391 S
24 Apr 1988
858 Avoid conflicting use of the string pool in constructions like \def\\#1{}\input
a\\b (Robert Messer). §260 S
10 May 1988
859 Amend the \patterns data structure when trie-max = 0 (Breitenlohner). §951,953 R
25 May 1988
860 Guarantee that trie-pointer cannot be out of range. §923 R
861 Avoid additional bugs like #858 in constructions like \input a\romannumeral1,
etc. §464,465,470 S
862 Prevent similar string pool confusion that could occur during the processing of
**\input\romannumeral6. §525 R
19 Jun 1988
819 ↦ 863 Prevent a negative dividend from rounding upward, causing a loop (CET). §126 S
819 ↦ 864 Adopt a smoother allocation strategy when memory is nearly gone (CET). §126 E
20 Jun 1988
852 ↦ 865 Initialize cur-order, now that it’s being backed up (Tsunetoshi Hayashi). §439 D
6 Nov 1988
612 ↦ 866 Disable fatal-error in prompt-input, so that open-log-file can use it safely (Tim
Morgan). §71 S
836 ↦ 867 Force terminal output whenever open-log-file fails. §535 S
We’re now up to Version 2.94; I sincerely hope all bugs have been found.
REFERENCES
1. Piet Hein, Gmks, M I T Press, 1966.
2. C. Szkchy, Foundation Failures, Concrete Publications, London, 1961.
3. A. Endres, ‘An analysis of errors and their causes in system programs’, Proc. Int. Cbnf. Software
Eng., 1975, pp. 327-336.
4. Victor R. Basili and Barry T. Perricone, ‘Software errors and complexity: an empirical investigation’,
Communications of the ACM, 27, 42-52 (1984).
5. L. A. Belady and M. M. Lehman, ‘A model of large program development’, IBM Systems J., 15,
225-252 (1976).
6. Donald E. Knuth, TEX: The Program, Addison-Wesley, 1986.
7. Donald E. Knuth, ‘Literate programming’, The Computer Journal, 27, 97-1 11 (1984).
8. Donald E. Knuth, ‘The WEB system of structured documentation’, Stanford Computer Science Report
STAN-(3-980, September 1983.
9. Patrick Winston, Artifcia1 Intelligence: An MIT Perspectice, M I T Press, 1979.
10. Donald E. Knuth, ‘The letter S’, The Mathematical Intelligencer, 2, 114-122 (1980).
11. Donald E. Knuth, ‘Mathematical typography’, Bulletin of the h z e n k a n Mathematical Society (new
series) 1, 337-372 (1979).
12. Donald E. Knuth, Seminumerical Algorithms, second edition, Addison-Wesley, 1981.
13. Donald E. Knuth and Michael F. Plass, ‘Breaking paragraphs into lines’, Software-Practice and
Experience, 11, 1119-1184 (1981).
14. Donald E. Knuth, ‘The concept of a meta-font’, fisible Language, 16, 3-27 (1982).
15. Donald E. Knuth, The T ~ X h o kAddison-Wesley,
, 1984.
16. Barbara Beeton (ed), TEX and IIIETAI’OVT: Errata and Changes, 09 September 1983, distributed
with T C G h a t , 4 (1983).
17. Donald E. Knuth, TEX, a System for Technical Text, American Mathematical Society, 1979.
18. Donald E. Knuth, TEX and METM0,VT: AVew Directions in Typesetting, Digital Press, 1979.
19. David R. Fuchs and Donald E. Knuth, ‘Optimal prepaging and font caching’, AC’M Transactions on
Programming Languages and Systems, 7, 62-79 (1985).
20. Donald E. Knuth, ‘A torture test for TEX’, Stanford Computer Science Report STAY-C’S-1027,
November 1984.
21. Donald E. Knuth, ‘A torture test for METAFONT’, Stanford Computer Science Report STAY-C‘S-
1095, January 1986.
22. Donald E. Knuth, Sorting and Searching, Addison-Wesley, 1973.
23. Brian W. Kernighan and Lorinda L. Cherry, ‘A system for typesetting mathematics’, Communications
of the ACM, 18, 151-157 (1975).
24. Guy L. Steele Jr., Donald R. Woods, Raphael A. Finkel, Mark R. Crispin, Richard M. Stallman
and Geoffrey S. Goodfellow, Hacker’s Dictionary: A Guide to the World of Lliiards, Harper and
Row, 1983.
25. Donald E. Knuth, Fundamental Algorithms, Addison-Wesley, 1968.
26. Reinhard Budde, Christiane Floyd, Reinhard Keil-Slawik and Heinz Zullighoven, (eds) Software
Development and Reality Construction, in preparation.
George Forsythe and the
Development of Computer Science
by Donald E. Knuth
The sudden death of George Forsythe this spring was a serious loss to everyone associated with computing. When we recall the many things he contributed to the field during his lifetime, we consider ourselves fortunate that computer science has had such an able leader.
My purpose in this article is to review George Forsythe's contributions to the establishment of Computer Science as a recognized discipline. It is generally agreed that he, more than any other man, is responsible for the rapid development of computer science in the world's colleges and universities. His foresight, combined with his untiring efforts to spread the gospel of computing, has had a significant and lasting impact; one might almost regard him as the Martin Luther of the Computer Reformation!
Since George's publications express these ideas so well, I believe the best way to summarize his work is to repeat many of the things he said, in his own words. This article consists mainly of the quotations that particularly struck me as I reread his papers recently. Indeed, much of what follows belongs in a computer-science supplement to Bartlett's Familiar Quotations.
From Numerical Analysis to Computer Science
George's early training and research in numerical analysis was a good blend of theory and practice:
The fact that the CPC was generally wrong when I knew the answer made me wonder what it was like for someone who didn't know what to expect. [76, p. 5]
Starting in 1948 he worked for the National Bureau of Standards' Institute for Numerical Analysis in Los Angeles, California, where he did extensive programming for the SWAC computer. In 1954 this Institute became part of U.C.L.A., and he put a great deal of energy into the teaching of mathematics and numerical analysis. He also worked on nonnumerical problems, such as the tabulation of all possible semigroups on four elements; at this time, he considered such combinatorial algorithms to be a part of numerical analysis [46, p. 7], and he regarded automatic programming as another branch [49, p. 655]. He began to foresee the less obvious implications of programming:
The use of practically any computing technique itself raises a number of mathematical problems. There is thus a very considerable impact of computation on mathematics itself, and this may be expected to influence mathematical research to an increasing degree. [46, p. 5]
The automatic computer really forces that precision of thinking which is alleged to be a product of any study of mathematics. [49, p. 655]
He also noticed that the rise of computers was being accompanied by an unprecedented demand for young mathematicians:
The majority of our undergraduate mathematics majors are lured at once into the marketplace, where they are greatly in demand as servants of the fast-multiplying family of fast-multiplying computers. [49, p. 651]
Therefore he began to argue that computers should play a prominent role in undergraduate mathematics education. At this time he felt that only one new course was needed for undergraduates, namely an introduction to programming; he stressed that the best way to teach it would be to combine computer programming with the traditional courses, instead of having separate training in numerical analysis. His paper "The Role of Numerical Analysis in an Undergraduate Program" [49] suggests over 50 good ways to mix computing into other courses; these suggestions ought to be required reading for all teachers today, since they are now perhaps even more relevant than they were in 1959. Indeed, the adaptation of traditional courses has been painfully slow (probably because professors of the older generation have not wanted to dirty their hands with the newfangled machines); in 1970 Forsythe was still strongly urging mathematics teachers to bend a little:
Compared with most undergraduate subjects, mathematics courses
SUMMARY
This paper discusses a new approach to the problem of dividing the text of a paragraph into
lines of approximately equal length. Instead of simply making decisions one line at a time,
the method considers the paragraph as a whole, so that the final appearance of a given line
might be influenced by the text on succeeding lines. A system based on three simple primitive
concepts called ‘boxes’, ‘glue’, and ‘penalties’ provides the ability to deal satisfactorily with
a wide variety of typesetting problems in a unified framework, using a single algorithm that
determines optimum breakpoints. The algorithm avoids backtracking by a judicious use
of the techniques of dynamic programming. Extensive computational experience confirms
that the approach is both efficient and effective in producing high-quality output. The paper
concludes with a brief history of line-breaking methods, and an appendix presents a simplified
algorithm that requires comparatively few resources.
KEY WORDS Typesetting Composition Linebreaking Justification Dynamic programming
Word processing Layout Spacing Box/glue/penalty algebra Shortest paths
TEX (Tau Epsilon Chi) History of printing
INTRODUCTION
One of the most important operations necessary when text materials are prepared for
printing or display is the task of dividing long paragraphs into individual lines. When
this job has been done well, people will not be aware of the fact that the words they
are reading have been arbitrarily broken apart and placed into a somewhat rigid and
unnatural rectangular framework; but if the job has been done poorly, readers will
be distracted by bad breaks that interrupt their train of thought. In some cases it
can be difficult to find suitable breakpoints; for example, the narrow columns often
used in newspapers allow for comparatively little flexibility, and the appearance of
mathematical formulas in technical text introduces special complications regardless
of the column width. But even in comparatively simple cases like the typesetting of
an ordinary novel, good line breaking will contribute greatly to the appearance and
desirability of the finished product. In fact, some authors actually write better material
when they are assured that it will look sufficiently beautiful when it appears in print.
The line-breaking problem is informally called the problem of ‘justification’, since it
is the ‘J’ of ‘H & J’ (hyphenation and justification) in today’s commercial composition
and word-processing systems. However, this tends to be a misnomer, because printers
have traditionally used justification to mean the process of taking an individual line of
type and adjusting its spacing to produce a desired length. Even when text is being
typeset with ragged right margins (therefore ‘unjustified’), it needs to be broken into
lines of approximately the same size. The job of adjusting spaces so that left and
right margins are uniformly straight is comparatively laborious when one must work
with metal type, so the task of typesetting a paragraph with last century’s technology
was conceptually a task of justification; nowadays, however, it is no trick at all for
computers to adjust the spacing as desired, so the line-breaking task dominates the
work. This shift in relative difficulty probably accounts for the shift in the meaning of
‘justification’; we shall use the term ‘line breaking’ in this paper to emphasize the fact
that the central problem of concern here is to find breakpoints.
*This research was supported in part by the National Science Foundation under grants IST-7921977
and MCS-7723738; by Office of Naval Research grant N00014-76-C-0330; by the IBM Corporation; and
by the Addison-Wesley Publishing Company. ‘TeX’ and ‘Tau Epsilon Chi’ are registered trademarks of the
American Mathematical Society.
The traditional way to break lines is analogous to what we ordinarily do when using
a typewriter: A bell rings (at least conceptually) when we approach the right margin,
and at that time we decide how best to finish off that line, without looking ahead to see
where the next line or lines might end. Once the typewriter carriage has been returned
to the left margin, we begin afresh without needing to remember anything about the
previous text except where the new line starts. Thus, we don’t have to keep track of
many things at once; such a system is ideally suited to human operation, and it also
leads to simple computer programs.
Book printing is different from typing primarily in that the spaces are of variable
width. Traditional practice has been to assign a minimum and maximum width to
interword spaces, together with a normal width representing the ideal situation. The
standard algorithm for line breaking (see, for example, Barnett¹, page 55) then proceeds
as follows: Keep appending words to the current line, assuming the normal spacing,
until reaching a word that does not fit. Break after this word, if it is possible to do
so without compressing the spaces to less than the given minimum; otherwise break
before this word, if it is possible to do so without expanding the spaces to more than
the given maximum. Otherwise hyphenate the offending word, putting as much of it
on the current line as will fit; if no suitable hyphenation points can be found, this may
result in a line whose spaces exceed the given maximum.
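The traditional procedure just described can be sketched in a few lines. This is a minimal illustration that rests on assumptions not in the text:

- each word is represented only by its width;
- every interword space shares one normal/minimum/maximum specification;
- the hyphenation step is omitted.

Because there is no hyphenation, when neither break is acceptable the sketch simply breaks before the offending word and leaves a loose line. The names SpaceSpec and traditional_breaks are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpaceSpec:
    normal: float   # ideal interword width
    minimum: float  # narrowest allowed interword width
    maximum: float  # widest allowed width (not enforced: no hyphenation here)


def traditional_breaks(words: List[str], widths: List[float],
                       line_width: float, space: SpaceSpec) -> List[List[str]]:
    """Greedy, one-line-at-a-time breaking without hyphenation."""
    lines = []
    start, n = 0, len(words)
    while start < n:
        # Append words at normal spacing until reaching one that does not fit.
        end = start + 1
        length = widths[start]
        while end < n and length + space.normal + widths[end] <= line_width:
            length += space.normal + widths[end]
            end += 1
        if end < n:
            # Try to break *after* the offending word words[end]: allowed if
            # the line fits when every space is compressed to the minimum.
            boxes = sum(widths[start:end + 1])
            gaps = end - start
            if boxes + gaps * space.minimum <= line_width:
                end += 1
            # Otherwise break *before* it; the real procedure would hyphenate
            # when even maximum spacing cannot fill the line, which we skip.
        lines.append(words[start:end])
        start = end
    return lines
```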
There is no need to confine computers to such a simple procedure, since the data for
an entire paragraph is generally available in the computer’s memory. Experience has
shown that significant improvements are possible if the computer takes advantage of
its opportunity to ‘look ahead’ at what is coming later in the paragraph, before making
a final decision about where any of the lines will be broken. This not only tends to
avoid cases where the traditional algorithm has to resort to wide spaces, it also reduces
the number of hyphenations necessary. In other words, line breaking decisions provide
another example of the desirability of ‘late binding’ in computer software.
One of the principal reasons for using computers in typesetting is to save money, but
at the same time we don’t want the output to look cheaper. A properly programmed
computer should, in fact, be able to solve the line-breaking problem better than a
skilled typesetter could do by hand in a reasonable amount of time (unless we give this
person the liberty to change the wording in order to obtain a better fit). For example,
Duncan2 studied the interword spacing of 958 lines that were manually typeset by a
“most respectable publishers’ printer” that he chose not to identify by name, and he
found that nearly 5% of the lines were quite loosely set; the spaces on those lines
exceeded 10 units (i.e., 10/18 of an em), and two of the lines even had spaces exceeding
13 units. We shall see later that a good line-breaking algorithm can do better than this.
BREAKING PARAGRAPHS INTO LINES 1121
Besides the avoidance of hyphens and wide spaces, we can improve on the traditional
line-breaking method by keeping the spaces nearly equal to the normal size, so that
they rarely approach the minimum or maximum limits. We can also try to avoid rapid
changes in the spacing of adjacent lines; we can make special efforts not to hyphenate
two lines in a row, and not to hyphenate the second-last line of a paragraph; we can
try to control the white space on the final line of the paragraph; and so on. Given any
mathematical way to rate the quality of a particular choice of breakpoints, we can ask
the computer to find breakpoints that optimize this function.
But how is the computer to solve such a problem efficiently? When a given paragraph
has n optional breakpoints, there are 2^n ways to break it into lines, and even the fastest
conceivable computers could not run through all such possibilities in a reasonable
amount of time. In fact, the job of breaking a paragraph as nicely as possible into
equal-size lines sounds suspiciously like the infamous bin-packing problem, which is
well known to be NP-complete. Fortunately, however, each line is to consist of
contiguous information from the paragraph, so the line-breaking problem is amenable
to the techniques of discrete dynamic programming; this means there is a reasonably
efficient way to attack it. We shall see that the optimum breakpoints can be found
in practice with only about twice as much computation as needed by the traditional
algorithm; the new method is sometimes even faster than the old, when we consider
the time saved by not needing to hyphenate so often. Furthermore the new algorithm
is capable of doing other things like setting a paragraph one line longer or one line
shorter, in order to improve the layout of a page.
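The dynamic-programming remark can be made concrete with a small sketch. This is not the algorithm developed in this paper. It assumes a simplified model:

- a paragraph is a list of word widths separated by identical interword glue (w, y, z);
- every line has the same target width;
- there are no penalties or hyphens;
- each line costs (1 + badness)^2, with badness 100|r|^3 as defined later in this paper.

Each line is a contiguous run of words. So the cheapest setting of the first j words is the cheapest setting of some shorter prefix plus one more line, and roughly n^2 line evaluations replace 2^n possibilities. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def badness(natural: float, stretch: float, shrink: float, target: float) -> float:
    """100|r|^3 for the adjustment ratio r, or math.inf if r is undefined or < -1."""
    if natural == target:
        return 0.0
    if natural < target:
        if stretch <= 0:
            return math.inf
        r = (target - natural) / stretch
    else:
        if shrink <= 0:
            return math.inf
        r = (target - natural) / shrink
        if r < -1:
            return math.inf
    return 100 * abs(r) ** 3


def optimum_breaks(widths: List[float], line_width: float,
                   glue: Tuple[float, float, float]) -> List[int]:
    """Return line-end indices (exclusive) minimizing the sum of (1 + badness)^2."""
    w, y, z = glue
    n = len(widths)
    best = [math.inf] * (n + 1)  # best[j]: least total cost for words[0:j]
    prev = [-1] * (n + 1)        # prev[j]: start of the last line in that setting
    best[0] = 0.0
    for i in range(n):
        if best[i] == math.inf:
            continue
        natural = 0.0
        for j in range(i + 1, n + 1):  # candidate line = words[i:j]
            gaps = j - i - 1
            natural += widths[j - 1] + (w if gaps > 0 else 0)
            if natural - gaps * z > line_width:
                # Cannot shrink enough; with nonnegative widths and w >= z,
                # adding more words only makes it worse.
                break
            if j == n and natural <= line_width:
                beta = 0.0  # last line: assume its slack is absorbed freely
            else:
                beta = badness(natural, gaps * y, gaps * z, line_width)
            if beta == math.inf:
                continue
            cost = best[i] + (1 + beta) ** 2
            if cost < best[j]:
                best[j], prev[j] = cost, i
    if best[n] == math.inf:
        raise ValueError("no feasible set of breakpoints in this model")
    ends, j = [], n
    while j > 0:  # walk back through the chosen line starts
        ends.append(j)
        j = prev[j]
    return ends[::-1]
```

Take widths [2, 2, 2, 2], a target of 7 and glue (1, 1, 1). The sketch prefers a slightly tight three-word first line (r = -0.5, badness 12.5) to a loose two-word line (r = 2, badness 800).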
FORMULATING THE PROBLEM
Let us now state the line-breaking problem explicitly in mathematical terms. We
shall use the basic concepts and terminology of the TEX typesetting system6, but in
simplified form, since the complexities of general typesetting would obscure the main
principles of line breaking.
For the purposes of this paper, a paragraph is a sequence x1 x2 ... xm of m items,
where each individual item xi is either a box specification, a glue specification, or a
penalty specification.
• A box refers to something that is to be typeset: either a character from some font
of type, or a black rectangle such as a horizontal or vertical rule, or something
built up from several characters such as an accented letter or a mathematical
formula. The contents of a box may be extremely complicated, or they may be
extremely simple; the line-breaking algorithm does not peek inside a box to see
what it contains, so we may consider the boxes to be sealed and locked. As far as
we are concerned, the only relevant thing about a box is its width: When item xi of
a paragraph specifies a box, the width of that box is a real number wi representing
the amount of space that the box will occupy on a line. The width of a box may be
zero, and in fact it may also be negative, although negative widths must be used
with care and understanding according to the precise rules laid down below.
• Glue refers to blank space that can vary its width in specified ways; it is an elastic
mortar used between boxes in a typeset line. When item xi of a paragraph specifies
glue, there are three real numbers (wi, yi, zi) of importance to the line-breaking
1122 DONALD E. KNUTH AND MICHAEL F. PLASS
algorithm:
wi is the ‘ideal’ or ‘normal’ width;
yi is the ‘stretchability’;
zi is the ‘shrinkability’.
For example, the space between words in a line is often specified by the values
wi = 1/3 em, yi = 1/6 em, zi = 1/9 em, where one em is the set size of the type being
used (approximately the width of an uppercase ‘M’ in classical type styles). The
actual amount of space occupied by this glue can be adjusted when justifying
a line to some desired width; if the normal width is too small, the adjustment
is proportional to yi, and if the normal width is too large the adjustment is
proportional to zi. The numbers wi, yi, and zi may be negative, subject to certain
natural restrictions explained later; for example, a negative value of wi indicates
a backspace. When yi = zi = 0, the glue has a fixed width wi. Incidentally, the
word ‘glue’ is perhaps not the best term, because it sounds a bit messy; a word
like ‘spring’ would be better, since metal springs expand or compress to fill up
space in essentially the way we want. However, we shall continue to say ‘glue’, a
term used since the early days of TEX (1977), because many people claim to like
it. A glob of glue is often called a skip by TEX users, and it seems preferable to
speak of boxes and skips rather than boxes and springs or boxes and glues. A
skip, by any other name, is of course the same abstract concept, embodied by the
three values (wi, yi, zi).
• Penalty specifications refer to potential places to end one line of a paragraph
and begin another, with a certain ‘aesthetic cost’ indicating how desirable or
undesirable such a breakpoint would be. When item xi of a paragraph specifies
a penalty, there is a number pi that helps us decide whether or not to end a
line at this point, as explained below. Intuitively, a high penalty pi indicates
a relatively poor place to break, while a negative value of pi stands for a good
breaking-off place. The penalty pi may also be +∞ or -∞, where ‘∞’ denotes
a large number that is infinite for practical purposes, although it really is finite;
in TEX, any penalty ≥ 1000 is treated as +∞, and any penalty ≤ -1000 is
treated as -∞. When pi = +∞, the break is strictly prohibited; when pi = -∞,
the break is mandatory. Penalty specifications also have widths wi, with the
following meaning: If a line break occurs at this place in the paragraph, additional
typeset material of width wi will be added to the line just before the break occurs.
For example, a potential place at which a word might be hyphenated would be
indicated by letting pi be the penalty for hyphenating there and letting wi be
the width of the hyphen. Penalty specifications are of two kinds, flagged and
unflagged, denoted by fi = 1 and fi = 0. The line-breaking algorithm we shall
discuss tries to avoid having two consecutive breaks at flagged penalties (e.g., two
hyphenations in a row).
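One way to represent the three kinds of items in code is sketched below. It uses Python dataclasses; the class and field names are hypothetical and are not TEX's internal representation. Following the TEX convention just described, penalties of 1000 or more act as +∞ and penalties of -1000 or less act as -∞.

```python
from dataclasses import dataclass
from typing import Union

INFINITY = 1000  # TEX convention: |penalty| >= 1000 acts as infinite


@dataclass(frozen=True)
class Box:
    width: float  # w_i; may be zero or even negative


@dataclass(frozen=True)
class Glue:
    width: float    # w_i: ideal or normal width
    stretch: float  # y_i: stretchability
    shrink: float   # z_i: shrinkability


@dataclass(frozen=True)
class Penalty:
    width: float           # w_i: material added just before a break here
    penalty: float         # p_i: aesthetic cost of breaking here
    flagged: bool = False  # f_i = 1 (True) for flagged penalties, e.g. hyphens

    def prohibits_break(self) -> bool:
        return self.penalty >= INFINITY

    def forces_break(self) -> bool:
        return self.penalty <= -INFINITY


Item = Union[Box, Glue, Penalty]

# A possible hyphenation point; the hyphen width 6 is a hypothetical value.
hyphen_break = Penalty(width=6, penalty=50, flagged=True)
```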
Thus, box items are specified by one number wi, while glue items have three numbers
(wi, yi, zi) and penalty items have three numbers (wi, pi, fi). For simplicity, we shall
assume that a paragraph x1 ... xm is actually specified by six sequences, namely the
item types (box, glue, or penalty) together with w1 ... wm, y1 ... ym, z1 ... zm,
p1 ... pm, and f1 ... fm.
Any fixed unit of measure can be used in connection with wi, yi, and zi; TEX uses
printers’ points, which are slightly less than 1/72 inch. In this paper we shall specify
all widths in terms of machine units equal to 1/18 em, assuming a particular size of
type, since the widths turn out to be integer multiples of this unit in many cases;
the numbers in our examples will be as simple as possible when expressed in terms
of machine units.
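As a quick check of this units convention, the interword glue of 1/3 em, 1/6 em, and 1/9 em mentioned earlier becomes whole numbers of 1/18-em machine units. The sketch uses exact fractions. The helper name is hypothetical. The value of 72.27 points per inch is TEX's definition of the printer's point.

```python
from fractions import Fraction

UNITS_PER_EM = 18                      # one machine unit = 1/18 em
POINTS_PER_INCH = Fraction(7227, 100)  # TEX's printer's point is 1/72.27 inch


def em_to_units(em: Fraction) -> Fraction:
    return em * UNITS_PER_EM


# (w, y, z) of the interword glue, converted from ems to machine units.
interword_glue = tuple(em_to_units(Fraction(1, d)) for d in (3, 6, 9))
print(tuple(int(u) for u in interword_glue))  # prints (6, 3, 2)

# A printer's point is slightly less than 1/72 inch.
assert 1 / POINTS_PER_INCH < Fraction(1, 72)
```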
Perhaps the reader feels this is altogether too much mathematical machinery to
deal with something that is quite straightforward. However, each of the concepts
defined here must be dealt with somehow when breaking paragraphs into lines, and it is
important to give precise rules even for the comparatively simple job of setting straight
text. We shall see later that these primitive notions of boxes, glue, and penalties
will actually support a surprising variety of other line-breaking applications, so that a
careful attention to details will solve many other problems as a free bonus.
For the time being, it will be best to think of the simple application to straight
text material such as the typesetting of a paragraph in a newspaper or in a short story,
since this will help us internalize the abstract concepts represented by wi, yi, etc. A
typesetting system like TEX will put such an actual paragraph into the abstract form
we want in the following way:
(1) If the paragraph is to be indented, the first item x1 will be an empty box whose
width is the amount of indentation.
(2) Each word of the paragraph becomes a sequence of boxes for the characters of the
word, including punctuation marks that belong with that word. The widths wi
are determined by the fonts of type being used. Flagged penalty items are inserted
into these words wherever an acceptable hyphenation could be used to divide a
word at the end of a line. (Such hyphenation points do not need to be included
unless necessary, as we shall see later, but for the moment let us assume that all
of the permissible hyphenations have been specified.)
(3) There is glue between words, corresponding to the recommended spacing conven-
tions of the fonts of type in use. The glue might be different in different contexts;
for example, TEX will make the glue specifications following punctuation marks
slightly different from the normal interword glue.
(4) Explicit hyphens and dashes in the text will be followed by flagged penalty items
having width zero. This specifies a permissible line break after a hyphen or a
dash. Some style conventions also allow breaks before em-dashes, in which case
an unflagged width-zero penalty would precede the dash.
(5) At the very end of a paragraph, two items are appended so that the final line
will be treated properly. First comes a glue item x_{m-1} that specifies the white
space allowable at the right of the last line; then comes a penalty item xm with
specify the ends of k lines into which the paragraph will be broken. Each penalty
item xi whose penalty pi is -∞ must be included among these breakpoints; thus, the
final breakpoint bk must be equal to m. For convenience we let b0 = 0, and we define
indices a1 ≤ ... ≤ ak to mark the beginning of the lines, as follows: The value of aj
is the smallest integer i between b_{j-1} and bj such that xi is a box item or a penalty
item with pi = -∞; if none of the xi in the range b_{j-1} < i < bj meet this criterion,
we let aj = bj. Then the jth line consists of all items xi for aj ≤ i < bj, plus item
x_{bj} if it is a penalty item. In other words we get the lines of the broken paragraph by
cutting it into pieces at the chosen breakpoints, then removing glue and penalty items
at the beginning of each resulting line.
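The rule for turning breakpoints into lines can be sketched directly. This illustration makes several assumptions:

- items use hypothetical Box/Glue/Penalty classes;
- item xi is stored at Python index i - 1;
- breakpoints are the 1-based indices b1 < ... < bk, with bk = m;
- a penalty counts as forced (-∞) when it is -1000 or less, following the TEX convention mentioned earlier.

The function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union

INFINITY = 1000  # penalties <= -INFINITY are treated as forced breaks


@dataclass
class Box:
    width: float


@dataclass
class Glue:
    width: float
    stretch: float
    shrink: float


@dataclass
class Penalty:
    width: float
    penalty: float
    flagged: bool = False


Item = Union[Box, Glue, Penalty]


def starts_line(item: Item) -> bool:
    """True for items that may begin a line: boxes and forced penalties."""
    return isinstance(item, Box) or (
        isinstance(item, Penalty) and item.penalty <= -INFINITY)


def split_lines(items: List[Item], breaks: List[int]) -> List[List[Item]]:
    """Cut x1..xm at 1-based breakpoints b1 < ... < bk and return the lines."""
    lines = []
    prev = 0  # b0 = 0
    for b in breaks:
        # a_j: first i with b_{j-1} < i < b_j whose item starts a line, else b_j.
        a = next((i for i in range(prev + 1, b) if starts_line(items[i - 1])), b)
        line = items[a - 1:b - 1]              # x_a, ..., x_{b-1}
        if isinstance(items[b - 1], Penalty):  # include x_b if it is a penalty
            line.append(items[b - 1])
        lines.append(line)
        prev = b
    return lines
```

When the break falls at a glue item, that glue belongs to neither line: it is excluded from the line it ends, and the next line starts at the following box.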
DESIRABILITY CRITERIA
According to this definition of line breaking, there are 2^n ways to break a paragraph
into lines, if the paragraph has n legal breakpoints that aren’t forced. For example,
there are 129 legal breakpoints in the paragraph of Figure 1, not counting the forced
break at its end, so it can be broken into lines in 2^129 ways, a number that exceeds 10^38. But of course
most of these choices are absurd, and we need to specify some criteria to separate
acceptable choices from the ridiculous ones. For this purpose we need to know (a) the
desired lengths of lines, and (b) the lengths of lines corresponding to each choice of
breakpoints, including the amount of stretchability and shrinkability that is present.
Then we can compare the desired lengths to the lengths actually obtained.
We shall assume that a list of desired lengths l1, l2, l3, ... is given; normally these
are all the same, but in general we might want lines of different lengths, as when fitting
text around an illustration. The actual length Lj of the jth line, after breakpoints have
been chosen as above, is computed in the following obvious way: We add together
the widths wi of all the box and glue items in the range aj ≤ i < bj, and we add w_{bj}
to this total if x_{bj} is a penalty item. The jth line also has a total stretchability Yj and
total shrinkability Zj, obtained by summing all of the yi and zi for glue items in the
range aj ≤ i < bj. Now we can compare the actual length Lj to the desired length lj
by seeing if there is enough stretchability or shrinkability to change Lj into lj; we
define the adjustment ratio rj of the jth line as follows:
If Lj = lj (a perfect fit), let rj = 0.
If Lj < lj (a short line), let rj = (lj - Lj)/Yj, assuming that Yj > 0; the value
of rj is undefined if Yj ≤ 0 in this case.
If Lj > lj (a long line), let rj = (lj - Lj)/Zj, assuming that Zj > 0; the value of rj
is undefined if Zj ≤ 0 in this case.
Thus, for example, rj = 1/3 if the total stretchability of line j is three times what would
be needed to expand the glue so that the line length would change from Lj to lj.
According to this definition of adjustment ratios, the jth line can be justified by
letting the width of all glue items xi on that line be
$$
\text{width of } x_i =
\begin{cases}
w_i + r_j y_i, & \text{if } r_j \ge 0;\\
w_i + r_j z_i, & \text{if } r_j < 0.
\end{cases}
$$
For if we add up the total width of that line after such adjustments are made, we get
either Lj + rj Yj = lj or Lj + rj Zj = lj, depending on the sign of rj. This distributes
the necessary stretching or shrinking by amounts proportional to the individual glue
components yi or zi, as desired.
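The adjustment ratio and the resulting glue widths can be computed as follows. This sketch uses hypothetical function names. None stands for the undefined cases, and glue items are plain (w, y, z) tuples.

```python
from typing import List, Optional, Tuple

GlueSpec = Tuple[float, float, float]  # (w, y, z)


def adjustment_ratio(L: float, Y: float, Z: float, l: float) -> Optional[float]:
    """r_j for natural length L, total stretch Y, total shrink Z, desired length l."""
    if L == l:
        return 0.0                             # perfect fit
    if L < l:
        return (l - L) / Y if Y > 0 else None  # short line: stretch
    return (l - L) / Z if Z > 0 else None      # long line: shrink (r < 0)


def glue_widths(glue: List[GlueSpec], r: float) -> List[float]:
    """Justified widths: w + r*y when r >= 0, w + r*z when r < 0."""
    if r >= 0:
        return [w + r * y for (w, y, z) in glue]
    return [w + r * z for (w, y, z) in glue]
```

Consider a line whose boxes total 100 units and which contains three interword glues (6, 3, 2), so L = 118, Y = 9, Z = 6. With a desired length of 121, the line needs 3 units of stretch out of 9 available, so r = 1/3 and each space becomes 7 units. With a desired length of 115, r = -1/2 and each space becomes 5 units.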
For example, the small numbers at the right of the individual lines in Figure 1 show
the values of rj in those lines. A negative ratio like -.881 in the third line means that
the spaces in that line are narrower than their ideal size; a fairly large positive ratio
like .965 in the third-last line indicates a very ‘loose’ fit.
Although there are 2^129 ways to break the paragraph of Figure 1 into lines, it turns
out that only 49 of these will result in breaks whose adjustment ratios rj do not
exceed 1 in absolute value; this means that the spaces between words after justification
will lie between wi - zi and wi + yi. Furthermore, only 30 of these 49 ways to make
‘nice’ breaks will do so without introducing hyphens. One of these ways is obtained by
moving ‘the’ from the eighth line down to the ninth.
Our main goal is to find a way to avoid choosing any breakpoints that lead to lines
in which the words are spaced very far apart, or in which they are very close together,
because such lines are distracting and harder to read.
We might therefore say that the line-breaking problem is to find breaks such that
|rj| ≤ 1 in each line, with the minimum number of hyphenations subject to this
condition. Such an approach was taken by Duncan et al. in the early 1960s, and
they obtained fairly good results. However, this criterion depends only on the values
wi - zi and wi + yi, not wi itself, so it does not use all the degrees of freedom present
in our data. Furthermore, such stringent conditions may not be possible to achieve; for
example, if each line of our example were to be 418 units wide, instead of the present
width of 421 units, there would be no way to set the text of Figure 1 without having at
least one very tight line (rj < -1) or at least one very loose line (rj > 1).
[Figure 2. The paragraph of Figure 1 when the ‘best-fit’ method has been used to find successive breakpoints.]
We can do a better job of line breaking if we deal with a continuously varying
criterion of quality, not simply the yes/no tests of whether |rj| ≤ 1 or not. Let us
therefore give a quantitative evaluation of the badness of the jth line by finding a
formula that is nearly zero when rj is small but grows rapidly when |rj| takes values
exceeding 1. Experience with TEX has shown that good results are obtained if we
define the badness of line j as follows:
$$
\beta_j =
\begin{cases}
\infty, & \text{if } r_j \text{ is undefined or } r_j < -1;\\
100\,|r_j|^3, & \text{otherwise.}
\end{cases}
$$
Thus, for example, the individual lines of Figure 1 have badness ratings that are
approximately equal to 0, 7, 68, 18, 5, 0, 69, 72, 90, 49, 0, respectively. Note that a
line is considered to be ‘infinitely bad’ if rj < -1; this means that glue will never be
shrunk to less than wi -zi. However, values of rj >1 are only finitely bad, so they
will be permitted if there is no better alternative.
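The badness function is a one-liner. The sketch below uses a hypothetical name and represents an undefined ratio as None. It also checks two ratios quoted for Figure 1: r = -.881 gives a badness of about 68, and r = .965 gives about 90, matching the third and third-last ratings listed above.

```python
import math
from typing import Optional


def badness(r: Optional[float]) -> float:
    """beta_j = 100|r_j|^3, or infinity when r_j is undefined or below -1."""
    if r is None or r < -1:
        return math.inf
    return 100 * abs(r) ** 3


print(round(badness(-0.881)), round(badness(0.965)))  # prints 68 90
```

Because badness is infinite only for r < -1, a line may be stretched beyond its stated stretchability (r > 1) at a finite cost, but glue is never shrunk below wi - zi.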
A slight improvement over the method used to produce Figure 1 leads to Figure 2.
Once again each line has been broken without looking ahead to the end of the paragraph
and without going back to reconsider previous choices, but this time each break was
chosen so as to minimize the ‘badness plus penalty’ of that line. In other words, when
choosing between alternative ways to end the jth line, given the ending of the previous
line, we obtain Figure 2 if we take the minimum possible value of βj + πj; here βj is
the badness as defined above, and πj is the amount of penalty p_{bj} if the jth line ends
at a penalty item, otherwise πj = 0. Figure 2 improves on Figure 1 by moving words
down from lines 4, 8, and 10 to the next line.
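The best-fit rule can be sketched in the same simplified setting assumed for the dynamic-programming sketch:

- words of given widths separated by identical glue (w, y, z);
- no penalties, so πj = 0 and best-fit simply minimizes βj for each line in turn;
- a final line whose slack costs nothing.

The function names are hypothetical. Ties are resolved in favor of the longer line, which is only one of several reasonable choices.

```python
import math
from typing import List, Tuple


def badness(natural: float, stretch: float, shrink: float, target: float) -> float:
    """100|r|^3, or math.inf if r is undefined or r < -1."""
    if natural == target:
        return 0.0
    total = stretch if natural < target else shrink
    if total <= 0:
        return math.inf
    r = (target - natural) / total
    return math.inf if r < -1 else 100 * abs(r) ** 3


def best_fit(widths: List[float], line_width: float,
             glue: Tuple[float, float, float]) -> List[int]:
    """Choose each line's end greedily, minimizing that line's badness only."""
    w, y, z = glue
    n, start, ends = len(widths), 0, []
    while start < n:
        choice, choice_beta = None, math.inf
        natural = 0.0
        for j in range(start + 1, n + 1):  # candidate line = words[start:j]
            gaps = j - start - 1
            natural += widths[j - 1] + (w if gaps > 0 else 0)
            if natural - gaps * z > line_width:
                break  # too long even with maximum shrinking
            if j == n and natural <= line_width:
                beta = 0.0  # final line: slack is free
            else:
                beta = badness(natural, gaps * y, gaps * z, line_width)
            if beta < math.inf and beta <= choice_beta:
                choice, choice_beta = j, beta  # '<=' favors the longer line
        if choice is None:
            raise ValueError("no acceptable line starting at word %d" % start)
        ends.append(choice)
        start = choice
    return ends
```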
The method that produces Figure 1 might be called the ‘first-fit’ algorithm, and the
corresponding method for Figure 2 might be called the ‘best-fit’ algorithm. We have
seen that best-fit is superior to first-fit in this particular case, but other paragraphs can
be contrived in which first-fit finds a better solution; so a single example is not sufficient
to decide which method is preferable. In order to make an unbiased comparison of
the methods, we need to get some statistics on their ‘typical’ behavior. Therefore
300 experiments were performed, using the text of Figures 1 and 2, with line widths
ranging from 350 to 649 in unit steps; although the text for each experiment was the
same, the varying line widths made the problems quite different, since line-breaking
algorithms are quite sensitive to slight changes in the measurements. The ‘tightest’
and ‘loosest’ lines in each resulting paragraph were recorded, as well as the number of
hyphens introduced, and the comparisons came out as follows:
                        min rj    max rj    hyphens
first-fit < best-fit      69%       35%       12%
first-fit = best-fit      26%       50%       77%
first-fit > best-fit       5%       15%       11%
Thus, in 69% of the cases, the minimum adjustment ratio rj in the lines typeset
by first-fit was less than the corresponding value obtained by best-fit; the maximum
adjustment ratio in the first-fit lines was less than the maximum for best-fit about 35%
of the time; etc. We can summarize this data by saying that the first-fit method usually
typesets at least one line that is tighter than the tightest line set by best-fit, and it
also usually produces a line that is as loose or looser than the loosest line of best-fit.
The number of hyphens is about the same for both methods, although best-fit would
produce fewer if the penalty for hyphenation were increased. A more detailed study of
the experimental data shows that the superiority of best-fit is especially pronounced in
the cases where the lines are rather narrow.
We can actually do better than both of these methods by finding an ‘optimum’
way to choose the breakpoints. For example, Figure 3 shows how to improve on both
Figures 1 and 2 by making line 6 a bit looser, thereby avoiding a rather tight 7th line
and a fairly loose 10th line. This pattern of breakpoints was found by an algorithm
that will be discussed in detail below. It is globally optimum in the sense of having
fewest total ‘demerits’ over all choices of breakpoints, where the demerits assessed for
the jth line are computed by the formula
$$
\delta_j =
\begin{cases}
(1 + \beta_j + \pi_j)^2 + \alpha_j, & \text{if } \pi_j \ge 0;\\
(1 + \beta_j)^2 - \pi_j^2 + \alpha_j, & \text{if } -\infty < \pi_j < 0;\\
(1 + \beta_j)^2 + \alpha_j, & \text{if } \pi_j = -\infty.
\end{cases}
$$
Here βj and πj are the badness rating and the penalty, as before; and αj is zero unless
both line j and the previous line ended on flagged penalty items, in which case αj is
the additional penalty assessed for consecutive hyphenated lines (e.g., 3000). We shall
say that we have found the best choice of breakpoints if we have minimized the sum
of δj over all lines j.
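The demerits formula translates directly into code. This sketch uses hypothetical names and assumes the TEX convention from earlier, so any penalty of -1000 or less counts as πj = -∞. The printed values reproduce the first two entries of the demerits table for Figure 1. Those lines have badness 0 and 7, which is consistent with πj = 0.

```python
NEG_INFINITY_PENALTY = -1000  # TEX: penalties <= -1000 act as -infinity


def demerits(beta: float, pi: float, alpha: float = 0.0) -> float:
    """delta_j from badness beta_j, penalty pi_j and extra flagged penalty alpha_j."""
    if pi >= 0:
        return (1 + beta + pi) ** 2 + alpha
    if pi > NEG_INFINITY_PENALTY:
        return (1 + beta) ** 2 - pi ** 2 + alpha
    return (1 + beta) ** 2 + alpha  # forced break: the penalty itself is ignored


print(demerits(0, 0), demerits(7, 0))  # prints 1.0 64.0
```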
The above formula for δj is quite arbitrary, like our formula for βj, but it works well
in practice because it has the following desirable properties: (a) Minimizing the sum
of squares of badnesses not only tends to minimize the maximum badness per line, it
also provides secondary optimization; for example, when one particularly bad line is
inevitable, the other line breaks will also be optimized. (b) The demerit function δj
increases as πj increases, except in the case πj = -∞ when we don’t need to consider
the penalty because such breaks are forced. (c) Adding 1 to βj instead of using the
badness βj by itself will minimize the total number of lines in cases where there are
breaks with approximately zero badness.
For example, the following table shows the respective demerits charged to the
individual lines of the paragraphs in Figures 1, 2, and 3:
Line    First fit    Best fit    Optimum fit
  1           1           1             1
  2          64          64            64
  3        4803        4803          4803
  4         374          96            96
  5          39          33            33
  6           2           2          1274
  7        4958        4958            43
  8        5313          11           581
  9        8252           3           166
 10        2497         519             1
 11           1           1             1
Total     26304       10491          7063
In the first-fit and best-fit methods, each line is likely to come out about as badly as
any other; but the optimum-fit method tends to have its bad cases near the beginning,
since there is less flexibility in the opening lines.
Figure 4 on the following page shows another comparison of the same three methods
on the same text, this time with a line width of 500 units. Note that the optimum
algorithm finds a solution that does not hyphenate any words, because of its ability
to ‘look ahead’; the other two methods, which proceed one line at a time, miss this
solution because they do not know that a slightly worse first line leads in this case to
fewer problems later on. The demerits per line in Figure 4 are
Line    First fit    Best fit    Optimum fit
  1        1734        1734          2357
  2        4692        4692             6
  3        3440        3440           938
  4        3066           9           212
  5           3           1             1
  6           1          22             2
  7         276         210            27
  8           5          24            10
  9           1          10           476
 10           -           1             1
Total     13218       10143          4030
In this example the 3440 demerits on the third line for ‘first fit’ and ‘best fit’ are
primarily due to the penalty of 50 for an inserted hyphen.
[Figure 4. A somewhat wider setting of the same sample paragraph, by (a) the first-fit method, (b) the best-fit method, and (c) the optimum-fit method. Notice the tight line followed by a loose line at the beginning of examples (a) and (b), while no hyphenation was needed in (c); on the other hand, (a) is one line shorter than (b) and (c).]
The first-fit method found a way to set the paragraph of Figure 4 in only nine lines,
while the optimum-fit method yields ten. Publishers who prefer to save a little paper,
as long as the line breaks are fairly decent, might therefore prefer the first-fit solution
in spite of all its demerits. There are various ways to modify the specifications so that
the optimum-fit method will give more preference to short solutions; for example, the
stretchability of the glue on the final line could be decreased from its present huge
size to about the width of the line, thereby making the optimum algorithm prefer final
lines that are nearly full. We could also replace the constant ‘1’ in the definition of
demerits δj by a variable parameter. The algorithm we shall describe below can in fact
be set up to produce the optimum solution having the minimum number of lines.
[Figure 5. Here the best-fit method is unable to find a satisfactory way to break the lines, with respect to justified setting, because the columns are so narrow. For example, the third line contains only two spaces, and the third-last line only one; these spaces would have to stretch considerably if the lines were justified. The first line of this paragraph also illustrates the ‘sticking-out’ problem that arises in unjustified settings. (The column begins: “In the meantime it / knocked a second / time, and cried, ...”)]
The text in these examples is quite straightforward, and we have been setting type
in reasonably wide columns; thus we have not been considering especially difficult or
unusual line-breaking problems. Yet we have seen that an optimizing algorithm can
produce noticeably better results even in such routine cases. The improved algorithm
will clearly be of significant value in more difficult situations, for example when math-
ematical formulas are embedded in the text, or when the lines must be narrow as in
a newspaper.
Anyone who is curious about the fate of the beautiful princess mentioned in Figures 1
through 4 can find the answer in Figure 6, which presents the whole story. The columns
in this example are unusually narrow, allowing only about 21 or 22 characters per
line; a width of about 35 characters is normal for newspapers, and magazines often
use columns about twice as wide as those in Figure 6. The line-at-a-time algorithms
cannot cope satisfactorily with such stringent restrictions, but Figure 6 shows that the
optimizing algorithm is able to break the text into reasonably equal lines.
Incidentally, our line-breaking criteria have been developed with justified text in
mind; but the algorithm has been used in Figure 6 to produce ragged right margins.
Another criterion of badness, which is based solely on the difference between the
desired length lj and the actual length Lj, should actually be used in order to get
the best breakpoints for ragged-right typesetting, and the space between words should
be allowed to stretch but not to shrink so that Lj never exceeds lj. Furthermore,
ragged-right typesetting should not allow words to ‘stick out’, i.e., to begin to the
right of where the following line ends (see the word ‘it’ in Figure 5). Thus, it turns
out that an algorithm intended for high quality line breaking in ragged-right formats
is actually a little bit harder to write than one for justified text, contrary to the
prevailing opinion that justification is more difficult. On the other hand, Figure 6
indicates that an algorithm designed for justification usually can be tuned to produce
adequate breakpoints when justification is suppressed.
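The ‘sticking out’ condition mentioned above can be tested mechanically. The sketch assumes a ragged-right setting in which each line is a list of word widths with every space at its normal width. A word sticks out if it begins to the right of where the following line ends. The names and the fixed-space assumption are illustrative only; this is not the paper's ragged-right badness criterion, which the text does not spell out.

```python
from typing import List, Tuple


def sticking_out(lines: List[List[float]], space: float) -> List[Tuple[int, int]]:
    """Return (line, word) index pairs for words that stick out.

    lines: word widths of each line of a ragged-right setting, with every
    interword space set to its normal width `space` (no stretching or shrinking).
    """
    def length(words: List[float]) -> float:
        return sum(words) + space * max(len(words) - 1, 0)

    found = []
    for j in range(len(lines) - 1):
        next_end = length(lines[j + 1])  # where the following line ends
        x = 0.0                          # left edge of the current word
        for k, wd in enumerate(lines[j]):
            if x > next_end:
                found.append((j, k))
            x += wd + space
    return found
```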
The difficulties of setting narrow columns are illustrated in an interesting way by the
pattern of words
“Now, push your little golden plate nearer . . .”
that appears in the third-last paragraph of Figure 6. We don’t want to hyphenate any
of these words, for reasons stated earlier; and it turns out that all of the four-word
sequences containing the word ‘little’, namely
“Now, push your little
push your little golden
your little golden plate
little golden plate nearer
1132 DONALD E. KNUTH AND MICHAEL F. PLASS
are too long to fit in one line. Therefore the word ‘little’ will have to appear in a
line that contains only three words and two spaces, no matter what text precedes this
particular sequence.
The final paragraphs of the story present other difficulties, some of which involve
complex interactions spanning many lines of the text, making it impossible to find
breakpoints that would avoid occasional wide spacing if the text were justified. Figure 7
shows what happens when a portion of Figure 6 is, in fact, justified; this is the most
difficult part of the entire story, in which one of the lines in the optimum solution is
forced to stretch by the enormous factor 6.833. The only way to typeset that paragraph
without such wide spaces is to leave it unjustified (unless, of course, we change the
problem by altering the text or the line width or the minimum size of spaces).
BREAKING PARAGRAPHS INTO LINES 1133
[Figure 6. The tale of the Frog King, typeset with quite narrow lines and with ‘ragged right’ margins. The breakpoints were optimally chosen under the assumption that the lines would be justified; a somewhat different criterion of optimality would have been more appropriate for unjustified setting, yet the lines did turn out to be of approximately equal width. Quite a few hyphenations were found to be desirable, since this increases the number of spaces per line and aids justification, even though the penalty for hyphenation was increased from 50 to 5000 in this example.]
FURTHER APPLICATIONS
Before we discuss the details of an optimizing algorithm, it is worthwhile to consider
more fully how the basic primitives of boxes, glue, and penalties allow us to solve a
Combining paragraphs
If the desired line widths $l_i$ are not all the same, we might want to typeset two
paragraphs with the second one starting in the list of line lengths where the first one leaves
off. This can be done simply by treating the two paragraphs as one, i.e., appending the
second to the first, assuming that each paragraph begins with an indentation and ends
with finishing glue and a forced break as mentioned above.
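As a minimal sketch of this idea, assume a hypothetical Python encoding in which `Box`, `Glue`, and `Penalty` mirror the box(w), glue(w,y,z), and penalty(w,p,f) items. The helper `paragraph` is illustrative only (the 18-unit indentation box is an assumption, and word widths come from a caller-supplied function); it ends each paragraph with penalty(0,∞,0), glue(0,100000,0), and a forced break. `combine` then simply concatenates the lists, so line numbers, and hence the line lengths consulted, carry on from one paragraph into the next.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    width: float


@dataclass(frozen=True)
class Glue:
    width: float
    stretch: float
    shrink: float


@dataclass(frozen=True)
class Penalty:
    width: float
    penalty: float
    flagged: int = 0


def paragraph(words, word_width, indent=18.0):
    """Hypothetical encoder: an indentation box, words separated by
    glue(6,3,2), then finishing glue and a forced break."""
    items = [Box(indent)]
    for i, word in enumerate(words):
        if i > 0:
            items.append(Glue(6, 3, 2))
        items.append(Box(word_width(word)))
    items += [Penalty(0, math.inf), Glue(0, 100000, 0), Penalty(0, -math.inf)]
    return items


def combine(*paragraphs):
    """Treat several paragraphs as one item list. Each must already end
    with a forced break, so the paragraph boundaries survive."""
    combined = []
    for p in paragraphs:
        last = p[-1]
        if not (isinstance(last, Penalty) and last.penalty == -math.inf):
            raise ValueError("each paragraph must end with a forced break")
        combined.extend(p)
    return combined
```

With this encoding, `combine(first, second)` hands the line breaker a single list, which it sets against a single list of line lengths $l_1, l_2, \ldots$.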
Patching
Suppose that a paragraph starts on page 100 of some book and continues on to
the next page, and suppose that we want to make a change to the first part of that
paragraph. We want to be sure that the last line of the new page 100 will end at the
right-hand margin just before the word that appears at the beginning of page 101, so
that page 101 doesn’t have to be redone. It is easy to specify this condition in terms
of our conventions, simply by forcing a line break (with penalty -∞) at the desired
place, and discarding the subsequent text. The ability of the optimum-fit algorithm
to ‘look ahead’ means that it will find a suitable way to patch page 100 whenever it
is possible to do so.
We can also force the altered part of the paragraph to have a certain number of
lines, $k$, by using the following trick: Set the desired length $l_{k+1}$ of the $(k+1)$st line
equal to some number $\delta$ that is different from the length of any other line. Then an
empty box of width $\delta$ that occurs between two forced-break penalty items will have to
be placed on line $k+1$.
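Both tricks can be expressed as edits to the item list. This sketch assumes a hypothetical `Box`/`Glue`/`Penalty` encoding; `patch` and `force_k_lines` are illustrative helper names, not part of any published interface. A line holding only box($\delta$) contains no glue, so it can be set without infinite badness only on a line whose length is exactly $\delta$; that is why $\delta$ must differ from every other line length.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    width: float


@dataclass(frozen=True)
class Glue:
    width: float
    stretch: float
    shrink: float


@dataclass(frozen=True)
class Penalty:
    width: float
    penalty: float
    flagged: int = 0


FORCED = Penalty(0, -math.inf)


def patch(items, cut):
    """Keep items[:cut] and force a break there, discarding the rest of
    the paragraph (the text already set on the following page)."""
    return list(items[:cut]) + [FORCED]


def force_k_lines(items, line_lengths, k, delta):
    """Append an empty box of width delta between two forced breaks and
    set l_{k+1} = delta, so the preceding text must occupy k lines."""
    if delta in line_lengths[:k] + line_lengths[k + 1:]:
        raise ValueError("delta must differ from every other line length")
    lengths = list(line_lengths)
    lengths[k] = delta  # 0-based index k holds the length of line k+1
    return patch(items, len(items)) + [Box(delta), FORCED], lengths
```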
Punctuation in the margins
Some people prefer to have the right edge of their text look ‘solid’, by setting periods,
commas, and other punctuation marks (including inserted hyphens) in the right-hand
margin. For example, this practice is occasionally used in contemporary advertising.
It is easy to get inserted hyphens into the margin: We simply let the width of the
corresponding penalty item be zero. And it is almost as easy to do the same for periods
and other symbols, by putting every such character in a box of width zero and adding
the actual symbol width to the glue that follows. If no break occurs at this glue, the
accumulated width is the same as before; and if a break does occur, the line will be
justified as if the period or other symbol were not present.
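The sketch below applies both rules to an item list, again assuming a hypothetical `Box`/`Glue`/`Penalty` encoding in which boxes also carry their text. The set `PUNCTUATION` is an assumption, and hyphen penalties are recognized by their flag, as in ‘penalty(6,50,1)’. The helper `natural_width` measures a line that starts at the first item and breaks at a given index, which lets us check that the total is unchanged when no break occurs at the glue, while a break there leaves the period outside the measured line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    width: float
    text: str = ""


@dataclass(frozen=True)
class Glue:
    width: float
    stretch: float
    shrink: float


@dataclass(frozen=True)
class Penalty:
    width: float
    penalty: float
    flagged: int = 0


PUNCTUATION = {".", ",", ";", ":"}  # hypothetical set of hanging characters


def hang_punctuation(items):
    """Give each punctuation box that is followed by glue zero width and
    add its width to that glue; give flagged (hyphen) penalties zero width."""
    out = []
    for i, item in enumerate(items):
        nxt = items[i + 1] if i + 1 < len(items) else None
        prev = items[i - 1] if i > 0 else None
        if isinstance(item, Box) and item.text in PUNCTUATION and isinstance(nxt, Glue):
            out.append(Box(0.0, item.text))
        elif isinstance(item, Glue) and isinstance(prev, Box) and prev.text in PUNCTUATION:
            out.append(Glue(item.width + prev.width, item.stretch, item.shrink))
        elif isinstance(item, Penalty) and item.flagged:
            out.append(Penalty(0.0, item.penalty, item.flagged))
        else:
            out.append(item)
    return out


def natural_width(items, end):
    """Width of a line from items[0] up to a break at index end. Glue at
    the break is dropped; a penalty counts only if broken at."""
    total = sum(x.width for x in items[:end] if not isinstance(x, Penalty))
    if end < len(items) and isinstance(items[end], Penalty):
        total += items[end].width
    return total
```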
A computer will have no qualms about breaking anywhere unless it is told not to; but a
human operator might well avoid bad breaks, perhaps even unconsciously.
Psychologically bad breaks are not easy to define; we just know they are bad. When
the eye journeys from the end of one line to the beginning of another, in the presence
of a bad break, the second word often seems like an anticlimax, or isolated from
its context. Imagine turning the page between the words ‘Chapter’ and ‘8’ in some
sentence; you might well think that the compositor of the book you are reading should
not have broken the text at such an illogical place.
During the first year of experience with TeX, the authors began to notice occasional
breaks that didn’t feel quite right, although the problem wasn’t felt to be severe enough
to warrant corrective action. Finally, however, it became difficult to justify our claim
that TeX has the world’s best line-breaking algorithm, when it would occasionally make
breaks that were semantically annoying; for example, the preliminary TeX manual⁶
has quite a few of these, and the first drafts of that manual were even worse.
As time went on, the authors grew more and more sensitive to psychologically bad
breaks, not only in the copy produced by TeX but also in other published literature,
and it became desirable to test the hypothesis that computers were really to blame.
Therefore a systematic investigation was made of the first 1000 line breaks in the ACM
Journal of 1960 (which was composed manually by a Monotype operator), compared
to the first 1000 line breaks in the ACM Journal of 1980 (which was typeset by one of
the best commercially available systems for mathematics, developed by Penta Systems
International). The final lines of paragraphs, and the lines preceding displays, were
not considered to be line breaks, since they are forced; only the texts of articles were
considered, not the bibliographies. A reader who wishes to try the same experiment
should find that the 1000th break in 1960 occurred on page 67, while in 1980 it occurred
on page 64. The results of this admittedly subjective procedure were a total of
13 bad breaks in 1960,
55 bad breaks in 1980.
In other words, there was more than a four-fold increase, from about 1% to a quite
noticeable 5.5%! Of course, this test is not absolutely conclusive, because the style of
articles in the ACM Journal has not remained constant, but it strongly suggests that
computer typesetting causes semantic degradation when it chooses breaks solely on the
basis of visual criteria.
Once this problem was identified, a systematic effort was made to purge all such
breaks from the second edition of Knuth’s book Seminumerical Algorithms, which
was the first large book to be typeset with TeX. It is quite easy to get the
line-breaking algorithm to avoid certain breaks by simply prefixing the glue item by a
penalty with $p_i = 999$, say; then the bad break is chosen only in an emergency, when
there is no other good way to set the paragraph. It is possible to make the typist’s
job reasonably easy by reserving a special symbol (e.g., &) to be used instead of a
normal space between words whenever breaking is undesirable. Although this problem
has rarely been discussed in the literature, the authors subsequently discovered that
some typographers have a word for it: they call such spaces ‘auxiliary’. Thus there is
a growing awareness of the problem.
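A minimal sketch of how typed ‘&’ symbols might be turned into items, assuming a hypothetical `Box`/`Glue`/`Penalty` encoding, the interword glue(6,3,2) used for justified text, and an illustrative `char_width` of 5 units per character. Following the convention, as we understand it, that glue is a legal breakpoint only when it immediately follows a box, the glue after penalty(0,999,0) cannot itself be chosen, so the only break at an auxiliary space is the costly one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    width: float
    text: str = ""


@dataclass(frozen=True)
class Glue:
    width: float
    stretch: float
    shrink: float


@dataclass(frozen=True)
class Penalty:
    width: float
    penalty: float
    flagged: int = 0


AUX = "&"  # reserved symbol typed instead of a normal space


def encode(text, char_width=5.0):
    """Turn typed text into boxes and glue; each '&' becomes the usual
    interword glue prefixed by penalty(0, 999, 0)."""
    items = []
    for i, chunk in enumerate(text.split(" ")):
        if i > 0:
            items.append(Glue(6, 3, 2))
        for j, word in enumerate(chunk.split(AUX)):
            if j > 0:
                items += [Penalty(0, 999), Glue(6, 3, 2)]
            items.append(Box(char_width * len(word), word))
    return items
```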
It may be useful to list the main kinds of contexts in which auxiliary spaces were
used in Seminumerical Algorithms, since that book ranges over a wide variety of technical
subjects. The following rules should prove to be helpful to compositors who are
keyboarding technical manuscripts into a computer.
1. Use auxiliary spaces in cross-references:
Theorem&A   Algorithm&B   Chapter&3   Table&4   Programs E and&F
Note that no & appears after ‘Programs’ in the last example, since it would be
quite all right to have ‘E and F’ at the beginning of a line.
2. Use auxiliary spaces between a person’s forenames and between multiple surnames:
I.&J. Matrix   Luis&I. Trabb&Pardo   Peter Van&Emde&Boas
A recent trend to avoid spaces altogether between initials may be largely a reaction
against typical computer line-breaking algorithms! Note that it seems better to
hyphenate a name than to break it between words; e.g., ‘Don-’ and ‘ald E. Knuth’
is more tolerable than ‘Donald’ and ‘E. Knuth’. In a sense, rule 1 is a special
case of rule 2, since we may regard ‘Theorem A’ as a name; another example is
‘register&X’.
3. Use auxiliary spaces for symbols in apposition with nouns:
base&b   dimension&d   function&f(x)   string&s of length&l
However, compare the last example with ‘string&s of length l or more’.
4. Use auxiliary spaces for symbols in series:
1,&2, or&3   a,&b, and&c   1,&2, ...,&n
5. Use auxiliary spaces for symbols as tightly-bound objects of prepositions:
of&x   from 0 to&1   increase z by&1   in common with&m
This does not apply with compound objects: For example, type ‘of u&and&v’.
6. Use auxiliary spaces to avoid breaking up mathematical phrases that are rendered
in words:
equals&n   less than&ε   mod&2   modulo&p^e   (given&X)
Also type ‘If &is. . .’, ‘when x&grows’. Compare ‘is&15’ with ‘is 15&times the
height’; and compare ‘for all large&n’ with ‘for all n greater than&n₀’.
7. Use auxiliary spaces when enumerating cases:
(b)&Show that f(x) is (1)&continuous; (2)&bounded.
It would be nice to boil these seven rules down into one or two, and it would be even
nicer if the rules could be automated so that keyboarding could be done without them;
but subtle semantic considerations seem to be involved in many of these instances.
Most examples of psychologically bad breaks seem to occur when a single symbol or a
short group of symbols appears just before or after the break; one could do reasonably
well with an automatic scheme if it would associate large penalties with a break just
before a short non-word, and medium penalties with a break just after a short non-
word. Here ‘short non-word’ means a sequence of symbols that is not very long, yet long
enough to include instances like ‘exercise&15(b)’, ‘length&~2~”, ‘order&n/2’ followed by
punctuation marks; one should not simply consider patterns that have only one or two
symbols. On the other hand it is not so offensive to break before or after fairly long
sequences of symbols; e.g., ‘exercise 4.3.2-15’ needs no auxiliary space.
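The automatic scheme suggested above might be sketched as follows. The length limit `MAX_SHORT` and the penalty values `LARGE` and `MEDIUM` are hypothetical choices, not values given here; a token counts as a word when it is alphabetic after surrounding punctuation is stripped. With a limit of 6 characters, ‘15(b).’ and ‘n/2,’ count as short non-words while ‘4.3.2-15’ does not.

```python
import string

MAX_SHORT = 6   # hypothetical: a "short" token has at most 6 characters
LARGE = 500     # hypothetical penalty for a break just before a short non-word
MEDIUM = 100    # hypothetical penalty for a break just after a short non-word


def is_short_nonword(token):
    """True for short tokens that are not ordinary words, e.g. '8', 'n/2,'."""
    core = token.strip(string.punctuation)
    return len(token) <= MAX_SHORT and not core.isalpha()


def space_penalties(tokens):
    """Penalty to attach to the space between tokens[i] and tokens[i + 1]."""
    result = []
    for before, after in zip(tokens, tokens[1:]):
        p = 0
        if is_short_nonword(after):
            p = max(p, LARGE)
        if is_short_nonword(before):
            p = max(p, MEDIUM)
        result.append(p)
    return result
```

Taking the larger of the two penalties, rather than their sum, is one arbitrary design choice among several; the text leaves the combination unspecified.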
Many books on composition recommend against breaking just before the final word
of a paragraph, especially if that word is short; this can, of course, be done by using
an auxiliary space just before that last word, and the computer could insert this
automatically. Some books also give recommendations analogous to rule 2 above,
saying that compositors should try not to break lines in the middle of a person’s
name. But there is apparently only one book that addresses the other issues of
psychologically bad breaks, namely a nineteenth-century French manual by A. Frey, where
the following examples of undesirable breaks are mentioned (vol. 1, p. 110):
Author lines
Most of the review notices published in Mathematical Reviews are signed with the
reviewer’s name and address, and this information is typeset flush right, i.e., at the
right-hand margin. If there is sufficient space to put such a name and address at the
right of the final line of the paragraph, the publishers can save space, and at the same
time the results look better because there are no strange gaps on the page. During
recent years the composition software used by the American Mathematical Society
was unable to do this operation, but the amount of money saved on paper made it
economical for them to pay someone to move the reviewer-name lines up by hand
wherever possible, applying scissors and (real) glue to the computer output.
Let us say that the ‘MR problem’ is to typeset the contents of a given box flush right
at the end of a given paragraph, with a space of at least w between the paragraph and
the box if they occur on the same line. This problem can be solved entirely in terms
of the box/glue/penalty primitives, as follows:
The final penalty of -∞ forces the final line break with the given box flush right; the
two penalties of +∞ are used to inhibit breaking at the following glue items. Thus,
the above sequence reduces to two cases: whether or not to break at the penalty of 50.
If a break is taken there, the ‘glue(w,0,0)’ disappears, according to our rule that each
line begins with a box; the text of the paragraph preceding the penalty of 50 will be
followed by ‘glue(0,100000,0)’, which will stretch to fill the line as if the paragraph
had ended normally, and the given box on the final line will similarly be preceded by
‘glue(0,100000,0)’ to fill the gap at the left. On the other hand if no break occurs at
the penalty of 50, the net effect is to have the glues added all together, producing
glue(w,200000,0),
so that the space between the paragraph and the box is w or more. Whether the break is
chosen or not, the badness of the two final lines or the final line will be essentially zero,
because so much stretchability is present. Thus the relative cost differential separating
the two alternatives is almost entirely due to the penalty of 50. The optimum-fit
algorithm will choose the better alternative, based on the various possibilities it has
for setting the given paragraph; it might even make the given paragraph a little bit
tighter than its usual setting, if this works out best.
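The display giving this sequence is not reproduced here, so the following sketch writes down one item sequence reconstructed from the description above; treat the exact order as our reading of the prose rather than a quotation. Using a hypothetical `Box`/`Glue`/`Penalty` encoding, it checks the two cases: with no break at the penalty of 50 the glue sums to glue(w,200000,0), and after a break there glue(w,0,0) is discarded so that box(0) begins the next line.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    width: float


@dataclass(frozen=True)
class Glue:
    width: float
    stretch: float
    shrink: float


@dataclass(frozen=True)
class Penalty:
    width: float
    penalty: float
    flagged: int = 0


def mr_tail(w, given_box):
    """Items appended to a paragraph so that given_box ends up flush right,
    either on the last line (at least w away) or on a line of its own.
    Reconstructed from the prose; hypothetical helper name."""
    return [
        Penalty(0, math.inf),      # inhibit a break at the next glue
        Glue(0, 100000, 0),        # fills the line if we break below
        Penalty(0, 50),            # the only real choice: break here or not
        Glue(w, 0, 0),             # minimum separation; vanishes after a break
        Box(0),                    # keeps the following glue at line start
        Penalty(0, math.inf),      # inhibit a break at the next glue
        Glue(0, 100000, 0),        # pushes the given box flush right
        given_box,
        Penalty(0, -math.inf),     # forced final break
    ]


def net_glue(items):
    """Sum of all glue among these items when no break occurs."""
    gs = [x for x in items if isinstance(x, Glue)]
    return Glue(sum(g.width for g in gs), sum(g.stretch for g in gs),
                sum(g.shrink for g in gs))


def start_of_next_line(items, i):
    """Items surviving a break at index i: glue and penalties are
    discarded up to the first box (or a forced break)."""
    rest = items[i + 1:]
    while rest and not isinstance(rest[0], Box):
        if isinstance(rest[0], Penalty) and rest[0].penalty == -math.inf:
            break
        rest = rest[1:]
    return rest
```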
algorithm will take pains to avoid comparatively small deviations. This is illustrated
in Figure 5, which actually reads better than the corresponding paragraph in Figure 6
(except for the word that sticks out on the first line); hyphens were inserted into the
paragraph of Figure 6 in order to create more interword space for justification.
Although the box/glue/penalty model appears at first glance to be oriented solely to
the problem of justified text, we shall now see that it is powerful enough to be adapted
to the analogous problem of unjustified typesetting: If the spaces between words are
handled in the right way, we can make things work out so that each line has the same
amount of stretchability, no matter how many words are on that line. The idea is to
let spaces between words be represented by the sequence
glue(0,18,0)
penalty(0,0,0)
glue(6,-18,0)
instead of the ‘glue(6,3,2)’ we used for justified typesetting. We may assume that there
is no break at the ‘glue(0,18,0)’ in the sequence, because it will always be at least as
good for the algorithm to break at the ‘penalty(0,0,0)’, when 18 units of stretchability
are present. If a break occurs at the penalty, there will be a stretchability of 18 units
on the line, and the ‘glue(6,-18,0)’ will be discarded after the break so that the next
line will begin flush left. On the other hand if no break occurs, the net effect is to have
glue(6,0,0), representing a normal space with no stretching or shrinking.
Note that the stretchability of -18 in the second glue item has no physical signifi-
cance, but it nicely cancels out the stretchability of +18 in the first glue item. Negative
stretchability has several interesting applications, so the reader should study this
example carefully before proceeding to the more elaborate constructions below.
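A small check of the ragged-right space, assuming a hypothetical `Box`/`Glue`/`Penalty` encoding and illustrative word widths. It confirms that an unbroken space nets out to glue(6,0,0), and that a line ending at any of these penalties carries exactly 18 units of stretch whether it holds one word or three.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    width: float


@dataclass(frozen=True)
class Glue:
    width: float
    stretch: float
    shrink: float


@dataclass(frozen=True)
class Penalty:
    width: float
    penalty: float
    flagged: int = 0


def ragged_space():
    """Interword space for ragged-right setting."""
    return [Glue(0, 18, 0), Penalty(0, 0), Glue(6, -18, 0)]


def words(*widths):
    """Hypothetical helper: boxes of the given widths joined by ragged spaces."""
    items = []
    for i, w in enumerate(widths):
        if i > 0:
            items += ragged_space()
        items.append(Box(w))
    return items


def net_glue(items):
    gs = [x for x in items if isinstance(x, Glue)]
    return Glue(sum(g.width for g in gs), sum(g.stretch for g in gs),
                sum(g.shrink for g in gs))


def line_stretch(items, start, end):
    """Total stretch on a line from items[start] up to a break at items[end]."""
    return sum(x.stretch for x in items[start:end] if isinstance(x, Glue))
```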
Optional hyphenations in unjustified text can be specified in a similar way; instead
of using ‘penalty(6,50,1)’ for an optional 6-unit hyphen having a penalty of 50, we
can use the sequence
penalty(0,∞,0)
glue(0,18,0)
penalty(6,500,1)
glue(0,-18,0).
The penalty has been increased here from 50 to 500, since hyphenations are not as
desirable in unjustified text. After the breakpoints have been chosen using the above
sequences for spaces and for optional hyphens, the individual lines should not actually
be justified, since a hyphen inserted by the ‘penalty(6,500,1)’ would otherwise appear
at the right margin.
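The role of each item in the hyphen sequence can be checked with a sketch that lists legal breakpoints, assuming a hypothetical `Box`/`Glue`/`Penalty` encoding and the convention, as we understand it, that glue is a legal breakpoint only when it immediately follows a box. The leading penalty(0,∞,0) leaves the flagged penalty as the only place to break, and when no break is taken the glue cancels to nothing between the syllables.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    width: float


@dataclass(frozen=True)
class Glue:
    width: float
    stretch: float
    shrink: float


@dataclass(frozen=True)
class Penalty:
    width: float
    penalty: float
    flagged: int = 0


def optional_hyphen():
    """Ragged-right optional 6-unit hyphen with penalty 500."""
    return [Penalty(0, math.inf), Glue(0, 18, 0), Penalty(6, 500, 1), Glue(0, -18, 0)]


def legal_breaks(items):
    """Indices where a break may occur: penalties below +inf, and glue
    that immediately follows a box."""
    out = []
    for i, x in enumerate(items):
        if isinstance(x, Penalty) and x.penalty < math.inf:
            out.append(i)
        elif isinstance(x, Glue) and i > 0 and isinstance(items[i - 1], Box):
            out.append(i)
    return out


def net_glue(items):
    gs = [x for x in items if isinstance(x, Glue)]
    return Glue(sum(g.width for g in gs), sum(g.stretch for g in gs),
                sum(g.shrink for g in gs))
```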
It is not difficult to prove that this approach to ragged-right typesetting will never
lead to words that ‘stick out’ in the sense mentioned above; the total demerits are
reduced whenever a word that sticks out is moved to the following line.
Centered text
Occasionally we want to take some text that is too long to fit on one line and break
it into approximately equal-size parts, centering the parts on individual lines. This is
most often done when setting titles or captions, but it can also be applied to the text
of a paragraph, as shown in Figure 9.
In olden times when wishing still helped one, there lived a king
whose daughters were all beautiful; and the youngest was
so beautiful that the sun itself, which has seen so much, was
astonished whenever it shone in her face. Close by the king’s castle
lay a great dark forest, and under an old lime-tree in the forest was
a well, and when the day was very warm, the king’s child went
out into the forest and sat down by the side of the cool fountain;
and when she was bored she took a golden ball, and threw it up
on high and caught it; and this ball was her favorite plaything.
Figure 9. ‘Ragged-centered’ text: The optimum-fit algorithm will produce special effects like this,
when appropriate combinations of box/glue/penalty items are used for the spaces between words.
Boxes, glue, and penalties can perform this operation, in the following way: (a) At
the beginning of the paragraph, use ‘glue(0,18,0)’ instead of an indentation. (b) For
each space between words in the paragraph, use the sequence
glue(0,18,0)
penalty(0,0,0)
glue(6,-36,0)
box(0)
penalty(0,∞,0)
glue(0,18,0).
(c) At the end of the paragraph, use the sequence
glue(0,18,0)
penalty(0,-∞,0).
The tricky part of this method is part (b), which ensures that an optional break
at the ‘penalty(0,0,0)’ puts stretchability of 18 units at the end of one line and at
the beginning of the next. If no break occurs, the net effect will be
glue(0,18,0) + glue(6,-36,0) + glue(0,18,0) = glue(6,0,0), a fixed space of 6 units. The ‘box(0)’
contains no text and occupies no space; its function is to keep the ‘glue(0,18,0)’ from
disappearing at the beginning of a line. The ‘penalty(0,0,0)’ item could be replaced
by other penalties, to represent breakpoints that are more or less desirable. However,
this technique cannot be used together with optional hyphenation, since our box/glue/penalty
model is incapable of inserting optional hyphens anywhere except at the right
margin when lines are justified.
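The construction can be checked item by item. This sketch assumes a hypothetical `Box`/`Glue`/`Penalty` encoding and illustrative word widths; `centered_paragraph` builds the opening glue, the interword sequence, and the closing glue(0,18,0) with its forced break. After a break at the penalty(0,0,0), the glue(6,-36,0) is discarded up to box(0), so each line carries 18 units of stretch at each end.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    width: float


@dataclass(frozen=True)
class Glue:
    width: float
    stretch: float
    shrink: float


@dataclass(frozen=True)
class Penalty:
    width: float
    penalty: float
    flagged: int = 0


def centered_paragraph(widths):
    items = [Glue(0, 18, 0)]                        # instead of an indentation
    for i, w in enumerate(widths):
        if i > 0:                                    # each interword space
            items += [Glue(0, 18, 0), Penalty(0, 0), Glue(6, -36, 0),
                      Box(0), Penalty(0, math.inf), Glue(0, 18, 0)]
        items.append(Box(w))
    items += [Glue(0, 18, 0), Penalty(0, -math.inf)]  # end of paragraph
    return items


def first_kept(items, i):
    """Index of the first item kept after a break at i: glue and penalties
    are discarded up to the next box (assumed to exist)."""
    j = i + 1
    while j < len(items) and not isinstance(items[j], Box):
        j += 1
    return j


def stretch(items, start, end):
    return sum(x.stretch for x in items[start:end] if isinstance(x, Glue))
```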
The construction used here essentially minimizes the maximum gap between the
margins and the text on any line; and subject to that minimum it essentially minimizes
the maximum gap on the remaining lines; and so forth. The reason is that our defini-
tions of ‘badness’ and ‘demerits’ reduce in this case so that the sum of demerits for
any choice of breakpoints is approximately proportional to the sum of the sixth powers
of the individual gaps.
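A short numerical check of the sixth-power claim, assuming (as we read the earlier definitions) a badness of about $100r^3$ for adjustment ratio $r$ and demerits of about the square of the badness, with the additive line penalty ignored. Since each ragged-centered line carries 18 units of stretch at each end, $r$ is proportional to the total gap on the line, so demerits grow as the sixth power of that gap. `STRETCH_PER_LINE` and the function names are illustrative.

```python
import math

STRETCH_PER_LINE = 36  # 18 units at each end of a ragged-centered line


def badness(slack):
    """Approximate badness 100*r**3 for adjustment ratio r = slack/stretch."""
    r = slack / STRETCH_PER_LINE
    return 100 * r ** 3


def demerits(slack):
    """Demerits taken as the square of the badness (line penalty and
    breakpoint penalties are ignored in this sketch)."""
    return badness(slack) ** 2


def total_demerits(slacks):
    return sum(demerits(s) for s in slacks)
```

Because the sixth power grows so quickly, doubling one gap multiplies its demerits by 64, so reducing the largest gap dominates any comparison; this is why the optimum behaves approximately like minimizing the maximum gap first.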
ALGOL-like languages
One of the most difficult tasks in technical typesetting is to get computer programs
to look right. In addition to the complications of mathematical formulas and a variety
[Figure 10. These two settings of a sample PASCAL program were made from identical input specifications in the box/glue/penalty model; in the first case the lines were set 100 points wide, and in the second case the width was 250 points. All of the line-breaking and indentation was produced automatically by the optimum-fit algorithm, which has no specific knowledge of PASCAL. Compilation of the PASCAL source code into boxes, glue, and penalties was done mechanically.]
Once again, the concepts of boxes, glue, and penalties come to the rescue: It turns out
that our line-breaking methods developed for ordinary text can be used without change
to do the typesetting of programs in ALGOL-like languages. For example, Figure 10
shows a typical program taken from the PASCAL manual that has been typeset
assuming two different column widths. Although these two settings of the program do
not look very much alike, they both were made from exactly the same input, specified
in terms of boxes, glue, and penalties; the only difference was the specification of line
width. (The input text in this example was prepared by a computer program called
BLAISE¹², which will translate any PASCAL source text into a TeX file that can be
incorporated within other documents.)
The box/glue/penalty specifications that lead to Figure 10 involve constructions
similar to those we have seen above, but with some new twists; it will be sufficient for
our purposes merely to sketch the ideas instead of dwelling on the details. One key
point is that the breaks are chosen by the minimum-demerits criteria we have been
discussing, but the lines are not justified afterwards (i.e., the glue does not actually
stretch or shrink). The reason is that relations and assignment statements are processed
by TeX’s normal ‘math mode’, which allows line breaks to occur in various places but
without any special constructions particular to this application, so that justification
would have the undesirable effect of putting all such breaks at the right margin. The
fact that justification is suppressed actually turns out to be an advantage in this case,
since it means that we can insert glue stretching wherever we like, within a line, if it
affects the ‘badness’ formula in a desirable way.
Each line in the wider setting of Figure 10 is actually a ‘paragraph’ by itself, so it
is only the narrower setting that shows the line-breaking mechanism at work. Every
‘paragraph’ has a specified amount of indentation for its first line, corresponding to its
position in the program, as a given number t of ‘tab’ units; the paragraph is also given
a hanging indentation of t + 2 tab units. This means that all lines after the first are
required to be two tabs narrower than the first line, and they are shifted two tabs to
the right with respect to that line. In some cases (e.g., those lines beginning with ‘var’
or ‘while’) the offset is three tabs instead of two.
The paragraph begins with ‘glue(0,100000,0)’, which has the effect of providing
enough stretchability that the line-breaking algorithm will not wince too much at
breaks that do not square perfectly with the right margin, at least not on the first line.
Special breaks are inserted at places where TeX would not normally break in math
mode; e.g., the sequence
penalty(0,∞,0)
glue(0,100000,0)
penalty(0,50,0)
glue(0,-100000,0)
box(0)
penalty(0,∞,0)
glue(0,100000,0)
has been inserted just before ‘primes’ in the var declaration. This sequence allows
a break with penalty 50 to the next line, which begins with plenty of stretchability.
A similar construction is used between assignment statements, for example between
‘sieve := [2..n];’ and ‘primes := [ ]’, where the sequence is
penalty(0,∞,0)
glue(0,100000,0)
penalty(0,0,0)
glue(6+2w,-100000,0)
box(0)
penalty(0,∞,0)
glue(-2w,100000,0);
here w is the width of a tab unit. If a break occurs, the following line begins with
‘glue(-2w,100000,0)’, which undoes the effect of the hanging indentation and
effectively restores the state at the beginning of a paragraph. If no break occurs, the net
effect is ‘glue(6,100000,0)’, a normal space.
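Both program sequences can be checked the same way. This sketch assumes a hypothetical `Box`/`Glue`/`Penalty` encoding; the tab width of 10 units used in the check is purely illustrative. Without a break the first sequence nets to glue(0,100000,0) and the second to glue(6,100000,0); after a break in the second, the surviving glue(-2w,100000,0) cancels two tabs of hanging indentation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    width: float


@dataclass(frozen=True)
class Glue:
    width: float
    stretch: float
    shrink: float


@dataclass(frozen=True)
class Penalty:
    width: float
    penalty: float
    flagged: int = 0


def optional_break(p=50):
    """Extra breakpoint inside math mode (e.g. before 'primes')."""
    return [Penalty(0, math.inf), Glue(0, 100000, 0), Penalty(0, p),
            Glue(0, -100000, 0), Box(0), Penalty(0, math.inf), Glue(0, 100000, 0)]


def statement_break(w):
    """Break between statements; w is the width of one tab unit."""
    return [Penalty(0, math.inf), Glue(0, 100000, 0), Penalty(0, 0),
            Glue(6 + 2 * w, -100000, 0), Box(0), Penalty(0, math.inf),
            Glue(-2 * w, 100000, 0)]


def net_glue(items):
    gs = [x for x in items if isinstance(x, Glue)]
    return Glue(sum(g.width for g in gs), sum(g.stretch for g in gs),
                sum(g.shrink for g in gs))


def next_line(items, i):
    """Items kept after a break at index i: leading glue and penalties
    are discarded up to the first box."""
    j = i + 1
    while j < len(items) and not isinstance(items[j], Box):
        j += 1
    return items[j:]
```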
No automatic system can hope to find the best breaks in programs, since an under-
standing of the semantics will indicate that certain breaks make the program clearer
and reveal its symmetries better. However, dozens of experiments on a wide variety
of PASCAL source texts have shown that this approach is surprisingly effective; fewer
than 1% of the line-breaking decisions have been overridden by authors of the
programs in order to provide additional clarity.
A complex index
The final application of line breaking that we shall study is the most difficult one
that has so far been encountered by the authors; it was solved only after acquiring more
than two years of experience with more straightforward line-breaking tasks, since the
full power of the box/glue/penalty primitives was not immediately apparent. The task
is illustrated in Figure 11, which shows excerpts from a ‘Key Index’ in Mathematical
Reviews. Such an index now appears at the end of each volume, together with an
‘Author Index’ that has a similar format.
As in Figure 10, the examples in Figure 11 were generated by the same source input
that was typeset using different line widths, in order to indicate the various possibilities
of breakpoints. Each entry in the index consists of two parts, the name part and the
reference part, both of which might be too long to fit on a single line. If line breaks
occur in the name part, the individual lines are to be set with a ragged right margin,
but breaks in the reference part are to produce lines with a ragged left margin. The
two parts are separated by leaders, a row of dots that expands to fill the space between
them; leaders are introduced by a slight generalization of glue that typesets copies
of a given box into a given space, instead of leaving that space blank. A hanging
indentation is applied to all lines but the first, so that the first line of each entry is
readily identifiable. One of the goals in breaking such entries is to minimize the white
space that appears in ragged-right or ragged-left lines. A subsidiary goal is to minimize
the number of lines that contain the reference part; for example, if it is possible to fit
all of the references on one line, the line-breaking algorithm should do so. The latter
event might mean that a break occurs after the leaders, with the references starting
on a new line; in such a case the leaders should stop a fixed distance $w_1$ from the right
margin. Furthermore, the ragged-right lines should all be at least a fixed distance $w_2$
from the right margin, so that there is no chance of confusing part of the name with
part of the reference material. The individual boxes to be replicated in the leaders
are $w_3$ units wide.
1144 DONALD E. KNUTH AND MICHAEL F. PLASS
The ground rules are illustrated in Figure 11, where there is a hanging indentation
of 27 units, and w1 = 45, w2 = 9, w3 = 7.2; the digits are 9 units wide, and the
respective column widths are 405 units, 315 units, and 225 units. The entry for ‘Theory
of Computing’ shows three possibilities for the leader dots: They can share a line with
the end of the name part and the beginning of the reference part, or they can end a
line before the reference part or begin a line after the name part.
Here is how all this can be encoded with boxes, glue, and penalties: (a) Each blank
space in the name part is represented by the sequence
penalty(0, ∞, 0)
glue(w2, 18, 0)
penalty(0, 0, 0)
glue(6 - w2, -18, 2),
which yields ragged right margins and spaces that can shrink from 6 units to 4 units
if necessary. (b) The transition between name part and reference part is represented
by the sequence
penalty(0, ∞, 0)
glue(w2, 18, 0)
penalty(0, 0, 0)
glue(6 - w2, -18, 2)
box(0)
penalty(0, ∞, 0)
leaders(3w3, 100000, 3w3)
glue(w1, 0, 0)
penalty(0, 0, 0)
glue(-w1, -18, 0)
box(0)
penalty(0, ∞, 0)
glue(0, 18, 0).
(c) Each blank space in the reference part is represented by the sequence
penalty(0, 999, 0)
glue(6, -18, 2)
box(0)
penalty(0, ∞, 0)
glue(0, 18, 0),
which yields ragged left margins and 6-unit to 4-unit spaces.
Parts (a) and (c) of this construction are analogous to things we have seen before;
the 999-point penalties in (c) tend to minimize the total number of lines occupied by
the reference part. The most interesting aspect of this construction is the transition
sequence (b), where there are four possibilities: If no line breaks occur in (b), the net
result is
(name part) glue(6, 0, 2) (leaders) glue(0, 0, 0) (reference part),
which allows leader dots to appear between the name and reference parts on the current
line. If a line break occurs before the leaders, the net result is
(name part) glue(w2, 18, 0)
(leaders) glue(0, 0, 0) (reference part),
so that we have a break essentially like that after a blank space in the name part,
and the dot leaders begin the following line. If a line break occurs after the leaders,
the net result is
(name part) glue(6, 0, 2) (leaders) glue(w1, 0, 0)
glue(0, 18, 0) (reference part),
so that we have a break essentially like that after a blank space in the reference part but
without the penalty of 999; the leaders end w1 units from the right margin. Finally,
if breaks occur both before and after the leaders in (b), we have a situation that always
has more demerits than the alternative of breaking only before the leaders.
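The four cases can be checked mechanically. The Python sketch below is an illustration under stated assumptions, not the authors' code. Items are modeled by a hypothetical `Item` class. The breaking rule is the usual one: glue, leaders, and penalties that follow a break are discarded up to the next box, and every penalty used here has zero width. The hypothetical `describe` helper merges adjacent glue and ignores penalties and empty boxes, so that each printed line corresponds to a net result in the discussion above. The values w1 = 45, w2 = 9, w3 = 7.2 are the ones quoted for Figure 11.

```python
from dataclasses import dataclass

INF = float("inf")
W1, W2, W3 = 45, 9, 7.2   # values quoted for Figure 11

@dataclass(frozen=True)
class Item:
    kind: str          # "box", "glue", "penalty", or "leaders"
    w: float = 0       # natural width
    y: float = 0       # stretchability
    z: float = 0       # shrinkability
    p: float = 0       # penalty value (penalty items only)
    text: str = ""     # label of a content box; "" means box(0)

def box(text=""): return Item("box", text=text)
def glue(w, y, z): return Item("glue", w, y, z)
def penalty(w, p, f=0): return Item("penalty", w, p=p)   # flag f is not needed here
def leaders(w, y, z): return Item("leaders", w, y, z)

# Sequence (b), framed by a name box and a reference box.
TRANSITION = [
    box("(name part)"),
    penalty(0, INF), glue(W2, 18, 0), penalty(0, 0), glue(6 - W2, -18, 2),
    box(), penalty(0, INF), leaders(3 * W3, 100000, 3 * W3), glue(W1, 0, 0),
    penalty(0, 0), glue(-W1, -18, 0), box(), penalty(0, INF), glue(0, 18, 0),
    box("(reference part)"),
]
BEFORE_LEADERS, AFTER_LEADERS = 3, 9   # indices of the two penalty(0, 0, 0) items

def split_lines(items, breaks):
    """Break at the given penalty indices; discard glue, leaders and penalties
    that follow a break, up to the next box."""
    lines, current, discarding = [], [], False
    for i, it in enumerate(items):
        if i in breaks:
            lines.append(current)      # the breaking penalty has width 0 here
            current, discarding = [], True
        elif discarding and it.kind != "box":
            continue
        else:
            discarding = False
            current.append(it)
    lines.append(current)
    return lines

def describe(line):
    """Render a line, merging runs of glue and skipping penalties and empty boxes."""
    out, acc = [], None

    def flush():
        nonlocal acc
        if acc is not None:
            out.append(f"glue({acc.w:g}, {acc.y:g}, {acc.z:g})")
            acc = None

    for it in line:
        if it.kind == "glue":
            acc = it if acc is None else glue(acc.w + it.w, acc.y + it.y, acc.z + it.z)
        elif it.kind == "leaders" or (it.kind == "box" and it.text):
            flush()
            out.append(it.text or "(leaders)")
    flush()
    return " ".join(out)

def net_result(breaks):
    return [describe(line) for line in split_lines(TRANSITION, breaks)]

for breaks in (set(), {BEFORE_LEADERS}, {AFTER_LEADERS}, {BEFORE_LEADERS, AFTER_LEADERS}):
    print(sorted(breaks), net_result(breaks))
```

A related experiment is to drop the box(0) and penalty(0, ∞, 0) that precede the leaders (`TRANSITION[:5] + TRANSITION[7:]`) and break at index 3. The leaders are then discarded together with the glue, which is the effect described for the earlier Mathematical Reviews indexes.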
When the choice of breakpoints leaves room for at least 3w3 units of leaders, we
are sure to have at least two dots, but we might not have three dots since leader dots
on different lines are aligned with each other. The glue in other blank spaces on the
line with the leaders will shrink if there is less than 3w3 of space for the leaders, and
this tends to make it more likely that the leader dots will not disappear altogether;
however, in the worst case the space for leaders will shrink to zero, so there might
not be any dots visible. It would be possible to ensure that all the leaders contain at
least two dots, by simply setting the shrink component of the leader item in (b) to
zero. This would improve the appearance of the resulting output; but unfortunately
it would also increase the length of the author indexes by about 15 per cent, and such
an expense would probably be prohibitive.
A preliminary version of this construction has been used with TEX to prepare the
indexes of Mathematical Reviews since November, 1979. However, the items ‘box(0)
penalty(0, ∞, 0)’ were left out of (b), for compatibility with earlier indexes prepared by
other typesetting software; this means that the leaders disappear completely whenever
a break occurs just before them, and the resulting indexes have unfortunate gaps of
white space that spoil their appearance.
AN ALGEBRAIC APPROACH
The examples we have just seen show that boxes, glue, and penalties are quite versatile
primitives that allow a user to obtain a wide variety of effects without extending the
basic operations needed for ordinary typesetting. However, some of the constructions
may have seemed like ‘magic’; they work, but it isn’t clear how they were ever conceived
in the first place. We shall now study a fairly systematic way to deal with these
primitives in order to assess their full potentiality; this brief discussion is independent
of the remainder of the paper and can be omitted.
In the first place it is clear that
penalty(w, p, f) penalty(w, p′, f) ≡ penalty(w, min(p, p′), f)
with respect to any optimal choice of breakpoints, since there are fewer demerits associated
with the smaller penalty. However, it is not always possible to replace the general
sequence ‘penalty(w, p, f) penalty(w′, p′, f′)’ by a single penalty item.
We can assume without loss of generality that all box items are immediately followed
by a sequence of the form ‘penalty(0, ∞, 0) glue(w, y, z)’. For if the box is followed by
another box, we can combine the two; if it is followed by a penalty item with p < ∞,
we can insert ‘penalty(0, ∞, 0) glue(0, 0, 0)’; if it is followed by ‘penalty(w, ∞, f)’
we can assume that w = f = 0 and that the following item is glue; and if the box is followed
by glue, we can insert ‘penalty(0, ∞, 0) glue(0, 0, 0) penalty(0, 0, 0)’. Furthermore we can
delete any penalty item with p = ∞ if it is not immediately preceded by a box item.
Thus, any sequence of box/glue/penalty items can be converted into a ‘normal form’,
where each box is followed by a penalty of ∞, each penalty is followed by glue, and
each glue is either followed by a penalty < ∞ or by a box. We assume that there is
only one penalty −∞, and that it is the final item, since a forced line break effectively
separates a longer sequence into independent parts. It follows that the normal-form
sequences can be written
X1 X2 … Xn penalty(w, −∞, f)
(end of text1) glue(w2, y2, z2)   [break with penalty p and flag f]
glue(w3, y3, z3) (beginning of text2)
on two lines. A consideration of normal forms shows that the most general thing we
can do is to insert the sequence
between text1 and text2, where no additional text is associated with the two inserted
bpg’s. Our job therefore reduces to determining appropriate values of w, y, z, w′, y′, z′,
w″, y″, z″, and these can be obtained immediately by solving the equations
$$w + w' + w'' = w_1, \qquad y + y' + y'' = y_1, \qquad z + z' + z'' = z_1;$$
$$w = w_2, \qquad y = y_2, \qquad z = z_2;$$
$$w'' = w_3, \qquad y'' = y_3, \qquad z'' = z_3.$$
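The solution is immediate: the first and last inserted glue items copy the targets, and the middle one takes up the difference, for example w′ = w1 − w2 − w3. Here is a minimal Python sketch, assuming that the three inserted glue items are the only unknowns. The function name and the packing of each glue as a (width, stretch, shrink) triple are illustrative choices, not notation from the text. Applied to the blank spaces of the index example above, it recovers glue(6 − w2, −18, 2) for part (a) and glue(6, −18, 2) for part (c).

```python
def solve_insertion(g1, g2, g3):
    """Return the glue triples (w, y, z), (w', y', z'), (w'', y'', z'') of the inserted
    sequence, given the glue g1 = (w1, y1, z1) wanted when no break occurs, the glue
    g2 = (w2, y2, z2) that should end the first line, and the glue g3 = (w3, y3, z3)
    that should begin the next line."""
    first = tuple(g2)                                          # w = w2, y = y2, z = z2
    last = tuple(g3)                                           # w'' = w3, y'' = y3, z'' = z3
    middle = tuple(a - b - c for a, b, c in zip(g1, g2, g3))   # w' = w1 - w - w'', etc.
    return first, middle, last

# Part (a) with w2 = 9: unbroken glue(6, 0, 2); glue(9, 18, 0) ends a line.
print(solve_insertion((6, 0, 2), (9, 18, 0), (0, 0, 0)))
# → ((9, 18, 0), (-3, -18, 2), (0, 0, 0))
# Part (c): glue(0, 18, 0) begins the next line instead.
print(solve_insertion((6, 0, 2), (0, 0, 0), (0, 18, 0)))
# → ((0, 0, 0), (6, -18, 2), (0, 18, 0))
```

In both cases a zero glue left over by the general form can be dropped when the construction is simplified, as the next paragraph describes.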
Once a construction has been found in this way, it can be simplified by undoing
the process we have used to derive normal forms and by using other properties of
box/glue/penalty algebra. For example, we can always delete the penalty ∞ item in
a sequence like
if y ≥ 0 and z ≥ 0 and p < 0, since a break at the glue is always worse than a break
at the penalty p.
INTRODUCTION TO THE ALGORITHM
The general ideas underlying the optimum-fit algorithm for line breaking can probably
be understood best by considering an example. Figure 12 repeats the paragraph of
Figure 4(c) and includes little vertical marks to indicate ‘feasible breakpoints’ found
by the algorithm. A feasible breakpoint is a place where the text of the paragraph from
the beginning to this point can be broken into lines whose adjustment ratio does not
exceed a given tolerance; in the case of Figure 12, this tolerance was taken to be unity.
Thus, for example, there is a tiny mark after ‘fountain;’ since there is a way to set the
paragraph up to this point with ‘fountain;’ at the end of the 7th line and with none of
lines 1 to 7 having a badness exceeding 100 (cf. Figure 4(a)).
The algorithm proceeds by locating all of the feasible breakpoints and remembering
the best way to get to each one, in the sense of fewest total demerits. This is done
by keeping a list of ‘active’ breakpoints, representing all of the feasible breakpoints
that might be candidates for future breaks. Whenever a potential breakpoint b is
encountered, the algorithm tests to see if there is any active breakpoint a such that
the line from a to b has an acceptable adjustment ratio. If so, b is a feasible breakpoint
and it is appended to the active list. The algorithm also remembers the identity of
the breakpoint a that minimizes the total demerits, when the total is computed from
the beginning of the paragraph, through a, to b. When an active breakpoint a is
encountered for which the line from a to b has an adjustment ratio less than −1 (i.e.,
when the line can’t be shrunk to fit the desired length), breakpoint a is removed from
the active list. Since the size of the active list is essentially bounded by the maximum
number of words per line, the running time of the algorithm is bounded by this
quantity (which usually is small) times the number of potential breakpoints.
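A compact Python sketch of this active-list scheme follows. It is a simplification under stated assumptions, not TEX's implementation:

- there is a single line width;
- there are no fitness classes and no α or γ terms;
- badness is taken as 100|r|³, inside an assumed demerits formula;
- an overfull line with no shrinkability gets r = −∞, so that its starting node is dropped.

The `Item` class, the `paragraph` helper and the breakpoint indices are hypothetical.

```python
import math
from dataclasses import dataclass

INF = math.inf

@dataclass(frozen=True)
class Item:
    kind: str              # "box", "glue", or "penalty"
    width: float
    stretch: float = 0.0
    shrink: float = 0.0
    penalty: float = 0.0

def demerits(r, p):
    """Assumed demerits formula built on badness 100|r|^3."""
    b = 100 * abs(r) ** 3
    if p >= 0:
        return (1 + b + p) ** 2
    if p > -INF:
        return (1 + b) ** 2 - p ** 2
    return (1 + b) ** 2

def optimum_fit(items, line_width, tolerance=1.0):
    """Return the chosen breakpoint indices, or None if no feasible setting exists."""
    n = len(items)
    sw, sy, sz = [0.0], [0.0], [0.0]      # sums over box and glue items before index i
    for it in items:
        counted = it.kind != "penalty"
        sw.append(sw[-1] + (it.width if counted else 0.0))
        sy.append(sy[-1] + (it.stretch if counted else 0.0))
        sz.append(sz[-1] + (it.shrink if counted else 0.0))

    def after(a):
        """First box or forced break after breakpoint a (-1 is the paragraph start)."""
        i = a + 1
        while i < n and items[i].kind != "box" and items[i].penalty != -INF:
            i += 1
        return i

    best = {-1: (0.0, None)}   # breakpoint -> (fewest total demerits, previous breakpoint)
    active = [-1]
    for b, it in enumerate(items):
        legal = (it.kind == "glue" and b > 0 and items[b - 1].kind == "box") or (
            it.kind == "penalty" and it.penalty < INF)
        if not legal:
            continue
        p = it.penalty if it.kind == "penalty" else 0.0
        candidates = []
        for a in list(active):
            start = after(a)
            L = sw[b] - sw[start] + (it.width if it.kind == "penalty" else 0.0)
            if L < line_width:
                Y = sy[b] - sy[start]
                r = (line_width - L) / Y if Y > 0 else INF
            elif L > line_width:
                Z = sz[b] - sz[start]
                r = (line_width - L) / Z if Z > 0 else -INF   # cannot shrink enough
            else:
                r = 0.0
            if r < -1 or p == -INF:
                active.remove(a)              # a cannot start any later feasible line
            if -1 <= r <= tolerance:
                candidates.append((best[a][0] + demerits(r, p), a))
        if candidates:                        # b is feasible: keep only its best route
            best[b] = min(candidates)
            active.append(b)

    if n - 1 not in best:
        return None
    breaks, b = [], n - 1
    while b != -1:                            # follow the 'previous' pointers back
        breaks.append(b)
        b = best[b][1]
    return breaks[::-1]

def paragraph(word_widths, space=(6, 3, 2)):
    """Hypothetical helper: boxes separated by glue, ending with
    penalty(0, ∞, 0) glue(0, ∞, 0) penalty(0, −∞, 1) (the flag is ignored)."""
    items = []
    for k, w in enumerate(word_widths):
        if k:
            items.append(Item("glue", *space))
        items.append(Item("box", w))
    items += [Item("penalty", 0, penalty=INF), Item("glue", 0, INF),
              Item("penalty", 0, penalty=-INF)]
    return items

print(optimum_fit(paragraph([15, 15, 15, 15]), 36))   # → [3, 9]
```

Each feasible breakpoint keeps only its best predecessor, so the active list stays small, as the text argues.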
For example, when the algorithm begins to work on the paragraph in Figure 12,
there is only one active breakpoint, representing the beginning of the first line. It is
infeasible to have a line starting there and ending at ‘In’, or ‘olden’, . . . , or ‘lived’,
since the glue between words does not accumulate enough stretchability in such short
segments of the text; but after the next word ‘a’ is encountered, a feasible breakpoint
is found. Now there are two active breakpoints, the original one and the new one.
After the next word ‘king’, there are three active breakpoints; but after the next word
‘whose’, the algorithm sees that it is impossible to squeeze all of the text from the
beginning up to ‘whose’ on one line, so the initial breakpoint becomes inactive and
only two active ones remain.
Skipping ahead, let us consider what happens when the algorithm considers the
potential break after ‘fountain;’. At this stage there are eight active breakpoints,
following the respective text boxes for ‘child’, ‘went’, ‘out’, ‘side’, ‘of’, ‘the’, ‘cool’,
and ‘foun-’. The line starting after ‘child’ and ending with ‘fountain;’ would be too
long to fit, so ‘child’ becomes inactive. Feasible lines are found from ‘went’ or ‘out’
to ‘fountain;’ and the demerits of those lines are 276 and 182, respectively; however,
the line from ‘went’ actually turns out to be preferable, since there are substantially
fewer total demerits from the beginning of the paragraph to ‘went’ than to ‘out’. Thus,
‘fountain;’ becomes a new active breakpoint. The algorithm stores a pointer back from
‘fountain;’ to ‘went’, meaning that the best way to get to a break after ‘fountain;’ is
to start with the best way to get to a break after ‘went’.
Figure 12. Tiny vertical marks show ‘feasible breakpoints’ where it is possible to break
in such a way that no spaces need to stretch more than their given stretchability.
The computation of this algorithm can be represented pictorially by means of the
network in Figure 13, which shows all of the feasible breakpoints together with the
number of demerits charged for each feasible line between them. The object of the
algorithm is to compute the shortest path from the top of Figure 13 to the bottom,
using the demerit numbers as the ‘distances’ corresponding to individual parts of the
path. In this sense, the job of optimal line breaking is essentially a special case of the
problem of finding shortest paths in an acyclic network; the line-breaking algorithm is
slightly more complex only because it must construct the network at the same time as
it is finding the shortest path.
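Every edge in such a network points downward, so the shortest path can be found by dynamic programming over the nodes in topological order. The Python sketch below contrasts that approach with the greedy rule the next paragraph attributes to best fit, which always takes the cheapest continuation. The four-node network and its demerit values are hypothetical; they were chosen only so that the two rules disagree, and they are not taken from Figure 13.

```python
def shortest_path(edges, order, source, target):
    """Fewest-demerit path in an acyclic network; `order` lists the nodes so that
    every edge goes from an earlier node to a later one."""
    dist, prev = {source: 0}, {}
    for u in order:
        if u not in dist:
            continue                          # unreachable from the source
        for v, cost in edges.get(u, ()):
            if v not in dist or dist[u] + cost < dist[v]:
                dist[v] = dist[u] + cost
                prev[v] = u
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

def greedy_path(edges, source, target):
    """Best-fit analogue: always follow the cheapest outgoing edge."""
    total, path = 0, [source]
    while path[-1] != target:
        v, cost = min(edges[path[-1]], key=lambda e: e[1])
        total += cost
        path.append(v)
    return total, path

# Hypothetical network: node -> list of (successor, demerits).
EDGES = {"start": [("A", 10), ("B", 50)], "A": [("end", 500)], "B": [("end", 20)]}
ORDER = ["start", "A", "B", "end"]
print(shortest_path(EDGES, ORDER, "start", "end"))   # → (70, ['start', 'B', 'end'])
print(greedy_path(EDGES, "start", "end"))            # → (510, ['start', 'A', 'end'])
```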
Notice that the best-fit algorithm can be described very easily in terms of a network
like Figure 13: it is the algorithm that simply chooses the shortest continuation at every
step. And the first-fit algorithm can be characterized as the method of always taking
the leftmost branch having a negative adjustment ratio (unless it leads to a hyphen,
in which case the rightmost non-hyphenated branch is chosen whenever there is a
feasible one). From these considerations we can readily understand why the optimum-
fit algorithm tends to do a much better job.
Sometimes there is no way to continue from one feasible breakpoint to any other.
This situation doesn’t occur in Figure 13, but it would be present below the word ‘so’
if we had not permitted hyphenation of ‘astonished’. In such cases the first-fit and
best-fit algorithms must resort to infeasible lines, while the optimum-fit algorithm can
usually find another way through the maze.
On the other hand, some paragraphs are inherently difficult, and there is no way to
break them into feasible lines. In such cases the algorithm we have described will find
that its active list dwindles until eventually there is no activity left; what should be
done in such a case? It would be possible to start over with a more tolerant attitude
toward infeasibility (a higher threshold value for the adjustment ratios). Alternatively,
TEX takes the attitude that the user wants to make some manual adjustment when
there is no way to meet the specified criteria, so the active list is forcibly prevented from
becoming empty by simply declaring a breakpoint to be feasible if it would otherwise
leave the active list empty. This results in an overset line and an error message that
encourages the user to take corrective action.
Figure 13. This network shows the feasible breakpoints and the number of demerits
charged when going from one breakpoint to another. The ‘shortest path’ from the top to
the bottom corresponds to the best way to typeset the paragraph, if we regard the demerits
as distances.
Figure 14 shows what happens when the algorithm allows quite loose lines to be
feasible; in this case a line is considered to be infeasible only if its adjustment ratio
exceeds 10 (so that there would be more than two ems of space between words).
Such a setting of the tolerances would be used by people who don’t want to make
manual adjustments to paragraphs that cannot be set well. The tiny marks that
indicate feasible breakpoints have varying lengths in this illustration, with longer marks
indicating places that can be reached via better paths; the tiny dots are for breakpoints
that are just barely feasible. Notice that all of the potential breakpoints in Figure 14
are marked, except for a few in the first two lines; so there are considerably more
feasible breakpoints here than there were in Figure 12, and the network corresponding
to Figure 13 will be much larger. There are 836,272,858 feasible ways to set the
paragraph when such wide spaces are tolerated, compared to only 81 ways in Figure 12.
Figure 14. When the tolerance is raised to 10 times the stretchability, more breakpoints
become feasible, and there are many more possibilities to explore.
However, the number of active nodes will not be significantly bigger in this case than
it was in Figure 12, because it is limited by the length of a line, so the algorithm
will not run too much more slowly even though its tolerance has been raised and the
number of possible settings has increased enormously. For example, after 'fountain;'
there are now 17 active breakpoints instead of the 8 present before, so the processing
takes only about twice as long although huge numbers of additional possibilities are
being taken into account.
When the threshold allows wide spacing, the algorithm is almost certain to find a
feasible solution, and it will report no errors to the user even though some rather loose
lines may have been necessary. The user who wants such error messages should set the
tolerance lower; this not only gives warnings when corrective action is needed, it also
improves the algorithm's efficiency.
One of the important things to note about Figure 14 is that breakpoints can become
feasible in completely different ways, leading up to different numbers of lines before the
breakpoint. For example, the word 'seen' is feasible both at the end of line 3:
although 'seen' was not a feasible break at all in Figure 12. The breaks that put 'seen'
at the end of line 3 have substantially fewer demerits than those putting it on line 4
(approximately $1.68 \times 10^{6}$ versus $1.28 \times 10^{10}$), so the algorithm will remember only
the former possibility. This is an application of the dynamic-programming 'principle
of optimality', which is responsible for the efficiency of our algorithm⁴: the optimum
breakpoints of a paragraph are always optimum for the subparagraphs they create.
The area of a
circle is a mean propor-
tional between any two regular
and similar polygons of which one
circumscribes it and the other is iso-
perimetric with it. In addition, the area
of the circle is less than that of any cir-
cumscribed polygon and greater than that
of any isoperimetric polygon. And further,
of these circumscribed polygons, the one
that has the greater number of sides has
a smaller area than the one that has
a lesser number; but, on the other
hand, the isoperimetric polygon
that has the greater num-
ber of sides is the
larger.
- Galileo Galilei (1638)
I
turn, in the
following treatises, to
various uses of those triangles
whose generator is unity. But I leave out
many more than I include; it is extraordinary how
fertile in properties this triangle is. Everyone can try his hand.
- Blaise Pascal (1654)
But the interesting thing is that this economy of storage would not be possible if the
future lines were not all of the same length, since differing line lengths might well
mean that it would be much better to put ‘seen’ on line 4 after all; for example, we
have mentioned a trick for forcing the algorithm to produce a given number of lines.
In the presence of varying line lengths, therefore, the algorithm would need to have
two separate list entries for an active breakpoint after the word ‘seen’. The computer
cannot simply remember the one with fewest total demerits, because the optimality
principle of dynamic programming would not be valid in such a case.
Figure 15 is an example of line breaking when the individual lengths are all different.
In such cases, the need to attach line numbers to breakpoints might mean that the
number of active breakpoints substantially exceeds the maximum number of words per
line, if the feasibility tolerance is set high; so it is desirable to set the tolerance low.
On the other hand, if the tolerance is set too low, there may be no way to break the
paragraph into lines having a desired shape. Fortunately, there is usually a happy
medium in which the algorithm has enough flexibility to find a good solution without
needing too much time and space. The data in Figure 16 shows, for example, that the
algorithm did not have to do very much work to find an optimal solution for Galileo's
remarks on circles, when the adjustment ratio on each feasible line was required to be
2 or less; yet there was sufficient flexibility to make feasible solutions possible.
[Figure 16: breakpoints in the first example of Figure 15.]
A good line-breaking method is especially important for technical typesetting, since
it is undesirable to break up mathematical formulas that appear in the text. Some of
the most difficult copy of this kind appears in Mathematical Reviews or in the answer
pages of The Art of Computer Programming, since the material in those publications
is often densely packed with formulas. Figure 17 shows a typical example from the
answer pages of Seminumerical Algorithms⁹, together with indications of the feasible
breaks when the adjustment ratios are constrained to be at most 1. Although some
feasible breakpoints occur in the middle of formulas, they are associated with penalties
that make them comparatively undesirable, so the algorithm was able to keep all of
the mathematics of this paragraph intact.
Figure 17. An example of the feasible breakpoints found by the algorithm in a paragraph
containing numerous mathematical formulas.
Figure 18. Paragraphs obtained when the ‘looseness’ parameter has been set to −1, 0,
+1, and +2. As in Figure 14, the spaces have been allowed to stretch up to two ems before
being considered infeasible. Loose settings like this are sometimes necessary to balance a
page, but of course the effects are not beautiful when one goes to extremes.
without violating the conditions of feasibility. Figure 18 shows what happens to the
example paragraph of Figure 14 when q = −1, 0, +1, and +2, respectively. Values
of q < −1 would be the same as q = −1 since this paragraph cannot be squeezed any
further, and values of q > 5 would be the same as q = 5 since the paragraph can’t
be stretched to more than 15 lines without having at least one line whose adjustment
ratio exceeds 10. The user can get the optimum solution having fewest possible lines
by setting q to an extremely negative value like −100. When q ≠ 0, the feasible
breakpoints corresponding to different line numbers must all be remembered, even
when every line has the same length.
When the lines of a paragraph are fairly loose, we don’t want the last line to be
noticeably different, so it is undesirable to use a ‘finishing glue’ with almost infinite
stretchability as in our earlier remarks. The penalty for adjacent lines of contrasting
classes seems to work best in connection with looseness if the finishing glue at the
paragraph end is set to have a normal space equal to about half the total line width,
stretching to nearly the full width and shrinking to zero.
THE ALGORITHM ITSELF
Now let us get down to brass tacks and discuss the details of an optimum line-breaking
algorithm. We are given a paragraph $x_1 \ldots x_m$ described by items
$x_i = (t_i, w_i, y_i, z_i, p_i, f_i)$ as explained earlier, where $x_1$ is a box item and $x_m$ is a penalty
item specifying a forced break ($p_m = -\infty$). We are also given a potentially infinite
sequence of positive line lengths $l_1, l_2, \ldots$. There is a parameter $\alpha$ that gets added
to the demerits whenever there are two consecutive breakpoints with $f = 1$, and a
parameter $\gamma$ that gets added to the demerits whenever two consecutive lines belong to
incompatible fitness classes. There is a threshold parameter $\rho$ that is an upper bound
on the adjustment ratios. And there is a looseness parameter $q$.
A feasible sequence of breakpoints $(b_1, \ldots, b_k)$ is a legal choice of breakpoints such
that each of the $k$ resulting lines has an adjustment ratio $r_j \le \rho$. If $q = 0$, the job of the
algorithm is to find a feasible sequence of breakpoints having the fewest total demerits.
If $q \ne 0$, the job of the algorithm is somewhat more difficult to describe precisely; it
can be formulated as follows: Let $k$ be the number of lines that the algorithm would
produce when $q = 0$. Then the algorithm finds a feasible sequence of $k + q$ breakpoints
having fewest total demerits. However, if this is impossible, the value of $q$ is decreased
by 1 (if $q > 0$) or increased by 1 (if $q < 0$) until a feasible solution is found. Sometimes
no feasible solution is possible even with $q = 0$; we will discuss this situation later after
seeing how the algorithm behaves in the normal case.
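The rule for adjusting $q$ can be written as a short loop. In this Python sketch, `best_with_lines` is a hypothetical solver that returns the fewest total demerits over feasible settings with a given number of lines, or `None` if there is none. The fake solver in the example accepts 9 to 15 lines. That range mirrors the figures quoted earlier for the paragraph of Figure 18, where q = −1 and q = 5 are the extremes, so there $k = 10$.

```python
from typing import Callable, Optional, Tuple

def apply_looseness(k: int, q: int,
                    best_with_lines: Callable[[int], Optional[float]]) -> Tuple[int, float]:
    """Return (number of lines, total demerits) under the looseness rule: try k + q
    lines, and move q one step toward 0 each time that is infeasible."""
    while True:
        result = best_with_lines(k + q)
        if result is not None:
            return k + q, result
        if q == 0:
            raise ValueError("no feasible solution even with q = 0")
        q += -1 if q > 0 else 1

# Hypothetical solver: only settings with 9 to 15 lines are feasible.
fake = lambda n: float(n) if 9 <= n <= 15 else None
print(apply_looseness(10, 7, fake))      # → (15, 15.0)
print(apply_looseness(10, -100, fake))   # → (9, 9.0)
```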
We have seen that it is occasionally useful to permit boxes, glue, and penalties to
have negative widths and even negative stretchability; but a completely unrestricted
use of negative values leads to unpleasant complications. For reasons of efficiency, it is
desirable to place two limitations on the paragraphs that will be treated:
• Restriction 1. Let $M_b$ be the length of the minimum-length line from the beginning
of the paragraph to breakpoint $b$, namely the sum of all $w_i - z_i$ taken over all
box and glue items $x_i$ for $1 \le i < b$, plus $w_b$ if $x_b$ is a penalty item. The paragraph
must have $M_a \le M_b$ whenever $a$ and $b$ are legal breakpoints with $a < b$.
• Restriction 2. Let $a$ and $b$ be legal breakpoints with $a < b$, and assume that no $x_i$
in the range $a < i < b$ is a box item or a forced break (penalty $p_i = -\infty$). Then
either $b = m$, or $x_{b+1}$ is a box item or a penalty with $p_{b+1} < \infty$.
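Restriction 1 can be checked directly from its definition. In this small Python sketch, items are (kind, width, shrink, penalty) tuples, a representation chosen only for illustration, and indices start at 0 instead of 1. Legal breakpoints are glue immediately after a box, or a penalty below ∞. Checking adjacent pairs is enough, because ≤ is transitive.

```python
INF = float("inf")

def min_lengths(items):
    """Yield (b, M_b) for each legal breakpoint b."""
    total = 0.0                     # sum of w_i - z_i over box and glue items before b
    for b, (kind, w, z, p) in enumerate(items):
        if kind == "glue" and b > 0 and items[b - 1][0] == "box":
            yield b, total
        elif kind == "penalty" and p < INF:
            yield b, total + w      # a penalty break adds its own width w_b
        if kind != "penalty":
            total += w - z

def satisfies_restriction_1(items):
    """True if M_a <= M_b for all legal breakpoints a < b."""
    ms = [m for _, m in min_lengths(items)]
    return all(x <= y for x, y in zip(ms, ms[1:]))

ok = [("box", 10, 0, 0), ("glue", 6, 2, 0), ("box", 10, 0, 0), ("penalty", 0, 0, -INF)]
# A glue of width -30 after a penalty pulls later minimum lengths backwards.
bad = [("box", 10, 0, 0), ("glue", 5, 0, 0), ("box", 10, 0, 0), ("penalty", 0, 0, 0),
       ("glue", -30, 0, 0), ("box", 5, 0, 0), ("glue", 0, 0, 0), ("penalty", 0, 0, -INF)]
print(satisfies_restriction_1(ok))    # → True
print(satisfies_restriction_1(bad))   # → False
```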
Both of these restrictions are quite reasonable, as they are met by all known practical
applications. Restriction 2 seems peculiar at first glance, but we will see in a moment
why it is helpful.
Our algorithm has the following general outline, viewed from the top down:
The meaning of the ad hoc Algol-like language used here should be self-evident. An
‘active node’ in this description refers to a record that includes information about a
breakpoint together with its fitness classification and the line number on which it ends.
We want to have a data structure that makes this algorithm efficient, and it is not
hard to design a reasonably good one, but there are two aspects in which some subtlety
pays off: The operation of computing the adjustment ratio, from a given active node a
to a given legal breakpoint b, should be made as simple as possible; and there should
be an easy way to determine which of the feasible breaks at b ought to be saved as
active nodes.
In the first place, the adjustment ratio depends on the total width, total stretch-
ability, and total shrinkability computed from the first box after one breakpoint to
the following breakpoint, and it would take too much time to compute these sums
over and over. We can avoid this by computing the sums from the beginning of the
paragraph to the current place, and subtracting two such sums to obtain the total of
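what lies between them. Let $(\Sigma w)_b$, $(\Sigma y)_b$, and $(\Sigma z)_b$ denote the respective sums of all
the $w_i$, $y_i$, and $z_i$ in the box and glue items $x_i$ for $1 \le i < b$. Then if $a$ and $b$ are legal
breakpoints with $a < b$, the width $L_{ab}$ of a line from $a$ to $b$ and its stretchability $Y_{ab}$
and shrinkability $Z_{ab}$ can be computed as follows:
$$L_{ab} = (\Sigma w)_b - (\Sigma w)_{\mathrm{after}(a)} + \begin{cases} w_b, & \text{if } t_b = \text{penalty};\\ 0, & \text{otherwise};\end{cases}$$
$$Y_{ab} = (\Sigma y)_b - (\Sigma y)_{\mathrm{after}(a)}; \qquad Z_{ab} = (\Sigma z)_b - (\Sigma z)_{\mathrm{after}(a)}.$$
Here ‘after(a)’ is the smallest index $i > a$ such that either $i > m$ or $x_i$ is a box item
or $x_i$ is a penalty item that forces a break ($p_i = -\infty$). These formulas hold even in
the degenerate case that $\mathrm{after}(a) > b$, because of Restriction 2; in fact, Restriction 2
essentially stipulates that the relation ‘after(a) > b’ implies that $(\Sigma w)_b = (\Sigma w)_{\mathrm{after}(a)}$,
$(\Sigma y)_b = (\Sigma y)_{\mathrm{after}(a)}$, and $(\Sigma z)_b = (\Sigma z)_{\mathrm{after}(a)}$.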
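These formulas translate directly into a prefix-sum computation. In this Python sketch, items are (kind, w, y, z, p) tuples, a representation chosen only for illustration. Indices start at 0, so −1 stands for the beginning of the paragraph, and "past the end" plays the role of $i > m$. The helper names and the sample item list are hypothetical.

```python
INF = float("inf")

def prefix_sums(items):
    """W[b], Y[b], Z[b] = (Σw)_b, (Σy)_b, (Σz)_b: totals over the box and glue items
    with index < b."""
    W, Y, Z = [0.0], [0.0], [0.0]
    for kind, w, y, z, _ in items:
        counted = kind != "penalty"
        W.append(W[-1] + (w if counted else 0.0))
        Y.append(Y[-1] + (y if counted else 0.0))
        Z.append(Z[-1] + (z if counted else 0.0))
    return W, Y, Z

def after(items, a):
    """Smallest i > a that is past the end, a box item, or a forced break."""
    i = a + 1
    while i < len(items) and items[i][0] != "box" and items[i][4] != -INF:
        i += 1
    return i

def line_measures(items, sums, a, b):
    """(L_ab, Y_ab, Z_ab) for a line from breakpoint a to breakpoint b."""
    W, Y, Z = sums
    s = after(items, a)
    L = W[b] - W[s] + (items[b][1] if items[b][0] == "penalty" else 0.0)
    return L, Y[b] - Y[s], Z[b] - Z[s]

ITEMS = [("box", 10, 0, 0, 0), ("glue", 6, 3, 2, 0), ("box", 20, 0, 0, 0),
         ("glue", 6, 3, 2, 0), ("box", 5, 0, 0, 0), ("penalty", 3, 0, 0, 50),
         ("box", 7, 0, 0, 0), ("glue", 6, 3, 2, 0), ("box", 4, 0, 0, 0),
         ("penalty", 0, 0, 0, -INF)]
SUMS = prefix_sums(ITEMS)
print(line_measures(ITEMS, SUMS, -1, 3))   # → (36.0, 3.0, 2.0)
print(line_measures(ITEMS, SUMS, 3, 5))    # → (8.0, 0.0, 0.0), including w_b = 3
print(line_measures(ITEMS, SUMS, 5, 9))    # → (17.0, 3.0, 2.0)
```

Each line measurement costs only a few subtractions once the sums are stored, which is why the nodes defined next store the totals up to after(a).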
From these considerations, we may conclude that each node a in the data structure
should contain the following fields:
position(a) = index of breakpoint represented by this node (0 = start of paragraph);
line(a) = number of the line ending at this breakpoint;
fitness(a) = fitness class of the line ending at this breakpoint;
totalwidth(a) = $(\Sigma w)_{\mathrm{after}(a)}$, used to calculate adjustment ratios;
totalstretch(a) = $(\Sigma y)_{\mathrm{after}(a)}$, used to calculate adjustment ratios;
totalshrink(a) = $(\Sigma z)_{\mathrm{after}(a)}$, used to calculate adjustment ratios;
totaldemerits(a) = minimum total demerits up to this breakpoint;
previous(a) = pointer to the best node for the preceding breakpoint;
link(a) = pointer to the next node in the list.
Nodes become active when they are first created, and they become passive when they
are deactivated. The algorithm maintains global variables A and P, which point
respectively to the first node in the active list and the first node in the passive list.
The first step can therefore be fleshed out as follows:
⟨create an active node representing the beginning of the paragraph⟩ ≡
begin A := new node(position = 0, line = 0, fitness = 1,
        totalwidth = 0, totalstretch = 0, totalshrink = 0,
        totaldemerits = 0, previous = Λ, link = Λ);
  P := Λ;
end.
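In Python the node record and the initial lists might look as follows. The sketch uses a dataclass, and `None` plays the role of Λ. The `deactivate` and `nodes` helpers are illustrative assumptions. In particular, pushing a deactivated node onto the front of the passive list is one convenient choice; the order of that list does not matter.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    position: int                      # index of the breakpoint (0 = start of paragraph)
    line: int                          # number of the line ending at this breakpoint
    fitness: int                       # fitness class of that line
    totalwidth: float                  # (Σw)_after(a)
    totalstretch: float                # (Σy)_after(a)
    totalshrink: float                 # (Σz)_after(a)
    totaldemerits: float               # minimum total demerits up to this breakpoint
    previous: Optional["Node"] = None  # best node for the preceding breakpoint
    link: Optional["Node"] = None      # next node in the active or passive list

def start_lists():
    """The initial active node; the passive list starts out empty."""
    A = Node(position=0, line=0, fitness=1, totalwidth=0.0, totalstretch=0.0,
             totalshrink=0.0, totaldemerits=0.0)
    P = None
    return A, P

def deactivate(A, P, pred, node):
    """Unlink node from the active list (pred is its predecessor, or None if node is
    the head A) and push it onto the passive list. Returns the new heads (A, P)."""
    if pred is None:
        A = node.link
    else:
        pred.link = node.link
    node.link = P
    return A, node

def nodes(head):
    while head is not None:
        yield head
        head = head.link

A, P = start_lists()
A.link = Node(5, 1, 2, 30.0, 6.0, 4.0, 9.0, previous=A)
A, P = deactivate(A, P, None, A)
print([n.position for n in nodes(A)], [n.position for n in nodes(P)])   # → [5] [0]
```

The deactivated start node still sits at the end of a `previous` pointer. This is why passive nodes are kept rather than discarded.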
We also introduce global variables ΣW, ΣY, and ΣZ to represent $(\Sigma w)_b$, $(\Sigma y)_b$,
and $(\Sigma z)_b$ in the main loop of the algorithm, so that the operation ‘for b := 1 to m do
if b is a legal breakpoint then ⟨main loop⟩’ takes the following form:
ΣW := ΣY := ΣZ := 0;
for b := 1 to m do
  if t_b = ‘box’ then ΣW := ΣW + w_b
  else if t_b = ‘glue’ then
    begin if t_{b−1} = ‘box’ then ⟨main loop⟩;
    ΣW := ΣW + w_b; ΣY := ΣY + y_b; ΣZ := ΣZ + z_b;
    end
  else if p_b ≠ +∞ then ⟨main loop⟩.
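A direct Python transcription of this loop is sketched below. `main_loop` is a callback standing in for ⟨main loop⟩; that callback, and the (kind, w, y, z, p) tuple representation, are assumptions of the sketch. Indices start at 0, so the test on the preceding item is guarded by `b > 0`. As in the pseudocode, a glue item's own values are added only after ⟨main loop⟩ has seen the totals up to $b$.

```python
INF = float("inf")

def scan(items, main_loop):
    """Keep running totals ΣW, ΣY, ΣZ and call main_loop(b, ΣW, ΣY, ΣZ) at each legal
    breakpoint b; the totals cover only the box and glue items before b."""
    sum_w = sum_y = sum_z = 0.0
    for b, (kind, w, y, z, p) in enumerate(items):
        if kind == "box":
            sum_w += w
        elif kind == "glue":
            if b > 0 and items[b - 1][0] == "box":
                main_loop(b, sum_w, sum_y, sum_z)
            sum_w += w
            sum_y += y
            sum_z += z
        elif p != INF:                # a penalty of +∞ is never a breakpoint
            main_loop(b, sum_w, sum_y, sum_z)

items = [("box", 10, 0, 0, 0), ("glue", 6, 3, 2, 0), ("box", 20, 0, 0, 0),
         ("penalty", 0, 0, 0, INF), ("glue", 6, 3, 2, 0), ("box", 5, 0, 0, 0),
         ("penalty", 0, 0, 0, -INF)]
calls = []
scan(items, lambda *args: calls.append(args))
print(calls)   # → [(1, 10.0, 0.0, 0.0), (6, 47.0, 6.0, 4.0)]
```

The glue at index 4 follows a penalty rather than a box, so it is not a breakpoint. Its values still enter the running totals.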
In the main loop itself, the operation ‘compute the adjustment ratio r from a to b’ can
now be implemented simply as follows:
L := ΣW − totalwidth(a);
if t_b = ‘penalty’ then L := L + w_b;
j := line(a) + 1;
if L < l_j then
  begin Y := ΣY − totalstretch(a);
  if Y > 0 then r := (l_j − L)/Y else r := ∞;
  end
else if L > l_j then
  begin Z := ΣZ − totalshrink(a);
  if Z > 0 then r := (l_j − L)/Z else r := ∞;
  end
else r := 0.
The other nonobvious problem we have to deal with is caused by the fact that
several nodes might correspond to a single breakpoint. We will never create two nodes
having the same values of (position, line, fitness), since the whole point of our dynamic
programming approach is that we need only remember the best possible way to get
to each feasible break position having a given line number and a given fitness class.
But it is not immediately clear how to keep track of the best ways that lead to a
given position, when that position can occur with different line numbers; we could,
for example, maintain a hash table with (line, fitness) as the key, but that would
be unnecessarily complicated. The solution is to keep the active list sorted by line
numbers: After looking at all the active nodes for line j, we can insert new active
nodes for line j + 1 into the list just before any active nodes for lines > j + 1 that
we are about to look at next.
An additional complication is that we don’t want to create active nodes for different
line numbers when the line lengths are all identical, unless $q \ne 0$, since this would
unnecessarily slow the algorithm down; the complexities of the general case should
not encumber the simple situations that arise most often. Therefore we assume that
an index $j_0$ is known such that all breaks at line numbers $> j_0$ can be considered
equivalent. This index $j_0$ is determined as follows: If $q \ne 0$, then $j_0 = \infty$; otherwise
$j_0$ is as small as possible such that $l_j = l_{j+1}$ for all $j > j_0$. For example, if $q = 0$ and
$l_1 = l_2 = l_3 \ne l_4 = l_5 = \cdots$, we let $j_0 = 3$, since it is unnecessary to distinguish a
breakpoint that ends line 3 from a breakpoint that ends line 4 at the same position, as
far as any subsequent lines are concerned.
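A small Python sketch of how $j_0$ might be computed. It assumes, for illustration only, that the line lengths arrive as a finite list whose last entry repeats for all later lines; `equivalence_index` is a hypothetical name.

```python
import math


def equivalence_index(lengths: list[float], q: int) -> float:
    """Return j0, the smallest index with l_j == l_{j+1} for every j > j0.

    lengths[j - 1] holds l_j; the last listed length is assumed to repeat
    for all later lines. A nonzero looseness q forces j0 = infinity.
    """
    if q != 0:
        return math.inf
    # Scan downward for the last j whose successor has a different length.
    for j in range(len(lengths) - 1, 0, -1):
        if lengths[j - 1] != lengths[j]:
            return j
    return 0
```

With lengths `[30, 30, 30, 25]` (that is, $l_1 = l_2 = l_3 \ne l_4 = l_5 = \cdots$) and q = 0 this returns 3, matching the example in the text.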
For each position b and line number j, it is convenient to remember the best feasible
breakpoints having fitness classifications 0, 1, 2, 3 by maintaining four values $D_0$, $D_1$,
$D_2$, $D_3$, where $D_c$ is the smallest known total of demerits that leads to a breakpoint at
position b and line j and class c. Another variable $D = \min(D_0, D_1, D_2, D_3)$ turns out
to be convenient as well, and we let $A_c$ point to the active node a that leads to the best
value $D_c$. Thus the main loop takes the following slightly altered form:
begin a := A; preva := Λ;
loop: D_0 := D_1 := D_2 := D_3 := D := +∞;
  loop: nexta := link(a);
    ⟨compute the adjustment ratio r from a to b⟩;
    if r < −1 or p_b = −∞ then ⟨deactivate node a⟩ else preva := a;
    if −1 ≤ r ≤ ρ then
      begin ⟨compute demerits d and fitness class c⟩;
      if d < D_c then
        begin D_c := d; A_c := a; if d < D then D := d;
        end;
      end;
    a := nexta; if a = Λ then exit loop;
    if line(a) ≥ j and j < j_0 then exit loop;
  repeat;
  if D < ∞ then ⟨insert new active nodes for breaks from A_c to b⟩;
  if a = Λ then exit loop;
repeat;
if A = Λ then ⟨do something drastic since there is no feasible solution⟩;
end.
For a given position b, the inner loop of this code considers all nodes a having
equivalent line numbers, while the outer loop runs through all of the line numbers that
are not equivalent.
It is not difficult to derive a precise encoding of the operations that have been
abbreviated in these loops:
⟨compute demerits d and fitness class c⟩ =
begin if p_b ≥ 0 then d := (1 + 100|r|³ + p_b)²
  else if p_b ≠ −∞ then d := (1 + 100|r|³)² − p_b²
  else d := (1 + 100|r|³)²;
d := d + α·f_b·f_{position(a)};
if r < −.5 then c := 0
  else if r ≤ .5 then c := 1
  else if r ≤ 1 then c := 2 else c := 3;
if |c − fitness(a)| > 1 then d := d + γ;
d := d + totaldemerits(a);
end;
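For experimentation, the demerits rule above might be transcribed into Python as follows. This is a sketch: the function and parameter names are hypothetical, the flags $f_b$ and $f_{position(a)}$ are 0 or 1, and α and γ are passed in explicitly (the statistics reported later in the paper used α = γ = 3000).

```python
import math


def fitness_class(r: float) -> int:
    """0 = tight, 1 = decent, 2 = loose, 3 = very loose (thresholds as above)."""
    if r < -0.5:
        return 0
    if r <= 0.5:
        return 1
    if r <= 1:
        return 2
    return 3


def demerits(r: float, p_b: float, f_b: int, f_a: int, fitness_a: int,
             totaldemerits_a: float, alpha: float, gamma: float) -> tuple[float, int]:
    """Return (d, c) for the line from active node a to breakpoint b,
    with d already including totaldemerits(a)."""
    badness_term = 1 + 100 * abs(r) ** 3
    if p_b >= 0:
        d = (badness_term + p_b) ** 2
    elif p_b != -math.inf:
        d = badness_term ** 2 - p_b ** 2
    else:                            # forced break: penalty does not count
        d = badness_term ** 2
    d += alpha * f_b * f_a           # both breaks flagged (e.g. two hyphens)
    c = fitness_class(r)
    if abs(c - fitness_a) > 1:       # visually incompatible adjacent lines
        d += gamma
    return d + totaldemerits_a, c
```

For instance, a loose line (r = 1) ending at a flagged penalty of 50, following a flagged tight line with 10 accumulated demerits, costs 151² + 3000 + 3000 + 10 = 28811.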
⟨insert new active nodes for breaks from A_c to b⟩ =
begin ⟨compute tw = (Σw)after(b), ty = (Σy)after(b), and tz = (Σz)after(b)⟩;
for c := 0 to 3 do if D_c ≤ D + γ then
  begin s := new node(position = b, line = line(A_c) + 1, fitness = c,
        totalwidth = tw, totalstretch = ty, totalshrink = tz,
        totaldemerits = D_c, previous = A_c, link = a);
  if preva = Λ then A := s else link(preva) := s;
  preva := s;
  end;
end;
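The bookkeeping for $D_c$ and $A_c$ in the main loop, together with the optional dominance test $D_c \le D + \gamma$ used when inserting new nodes, can be sketched in Python as follows; the helper names are hypothetical, and active nodes are treated as opaque values.

```python
import math


def record_candidate(best_d: list[float], best_a: list, c: int, d: float, a) -> None:
    """Main-loop update: keep D_c (best_d[c]) and A_c (best_a[c]) for class c."""
    if d < best_d[c]:
        best_d[c] = d
        best_a[c] = a


def classes_to_insert(best_d: list[float], gamma: float) -> list[int]:
    """Fitness classes that pass the test D_c <= D + gamma, where D = min D_c."""
    D = min(best_d)
    if D == math.inf:   # no feasible break reached b from this group of lines
        return []
    return [c for c, d_c in enumerate(best_d) if d_c <= D + gamma]
```

A class whose best total exceeds the overall best by more than γ cannot win later, because switching fitness class costs at most γ extra demerits; skipping it avoids creating a useless node.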
⟨compute tw = (Σw)after(b), ty = (Σy)after(b), and tz = (Σz)after(b)⟩ =
begin tw := ΣW; ty := ΣY; tz := ΣZ; i := b;
loop: if i > m then exit loop;
  if t_i = ‘box’ then exit loop;
  if t_i = ‘glue’ then
    begin tw := tw + w_i; ty := ty + y_i; tz := tz + z_i;
    end
  else if p_i = −∞ and i > b then exit loop;
  i := i + 1;
repeat;
end;
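The totals are taken after b because glue and penalties immediately following a break disappear, so the width of the next line should start counting at the first box. A Python sketch, with a hypothetical `Item` tuple and 1-based item numbering as in the text:

```python
import math
from typing import NamedTuple


class Item(NamedTuple):
    kind: str              # 'box', 'glue', or 'penalty'
    width: float
    stretch: float = 0.0
    shrink: float = 0.0
    penalty: float = 0.0


def totals_after(items: list[Item], b: int, sum_w: float, sum_y: float,
                 sum_z: float) -> tuple[float, float, float]:
    """(Σw)after(b), (Σy)after(b), (Σz)after(b): fold in the glue discarded
    after a break at b, so widths measured from this node exclude it."""
    tw, ty, tz = sum_w, sum_y, sum_z
    i = b
    while i <= len(items):
        x = items[i - 1]
        if x.kind == 'box':
            break
        if x.kind == 'glue':
            tw += x.width
            ty += x.stretch
            tz += x.shrink
        elif x.penalty == -math.inf and i > b:   # a later forced break
            break
        i += 1
    return tw, ty, tz
```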
⟨deactivate node a⟩ =
begin if preva = Λ then A := nexta else link(preva) := nexta;
link(a) := P; P := a;
end;
After the main loop has done its job, the active list will contain only nodes with
position = m, since $x_m$ is a forced break. Thus, we can write
⟨choose the active node with fewest total demerits⟩ =
begin a := b := A; d := totaldemerits(a);
loop: a := link(a);
  if a = Λ then exit loop;
  if totaldemerits(a) < d then
    begin d := totaldemerits(a); b := a;
    end;
repeat;
k := line(b);
end.
Now b is the chosen node and k is its line number. The subsequent processing for
$q \ne 0$ is equally elementary:
⟨choose the appropriate active node⟩ =
begin a := A; s := 0;
loop: δ := line(a) − k;
  if q ≤ δ < s or s < δ ≤ q then
    begin s := δ; d := totaldemerits(a); b := a;
    end
  else if δ = s and totaldemerits(a) < d then
    begin d := totaldemerits(a); b := a;
    end;
  a := link(a); if a = Λ then exit loop;
repeat;
k := line(b);
end.
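Both selection steps (fewest demerits, then the looseness adjustment when $q \ne 0$) can be mirrored in Python over a plain list of final active nodes rather than a linked list. The `Node` class here is a hypothetical stand-in holding only the two fields these steps read.

```python
from dataclasses import dataclass


@dataclass
class Node:
    line: int
    totaldemerits: float


def fewest_demerits(active: list[Node]) -> Node:
    """Final active node with the fewest total demerits (first one on ties)."""
    return min(active, key=lambda n: n.totaldemerits)


def choose_with_looseness(active: list[Node], q: int) -> Node:
    """Move the line count k + delta as close to k + q as possible without
    overshooting q; among equal deltas prefer fewer total demerits."""
    b = fewest_demerits(active)
    k, d, s = b.line, b.totaldemerits, 0
    for a in active:
        delta = a.line - k
        if q <= delta < s or s < delta <= q:
            s, d, b = delta, a.totaldemerits, a
        elif delta == s and a.totaldemerits < d:
            d, b = a.totaldemerits, a
    return b
```

If the best paragraph has 10 lines and 9-, 11- and 12-line settings are also feasible, then q = 1 selects the 11-line setting, while q = 5 settles for the longest available, 12 lines.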
Now the desired sequence of k breakpoints is accessible from node b:
⟨use the chosen node to determine the optimum breakpoint sequence⟩ =
for j := k downto 1 do
  begin b_j := position(b); b := previous(b);
  end.
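In Python the backward walk is a short loop over the `previous` pointers; this sketch assumes a hypothetical minimal `Node` with only the fields it reads.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    position: int
    line: int
    previous: Optional["Node"] = None


def breakpoint_sequence(b: Node) -> list[int]:
    """Follow previous pointers from the chosen node; return [b_1, ..., b_k]."""
    k = b.line
    result = [0] * k
    for j in range(k, 0, -1):      # j = k down to 1, as in the loop above
        result[j - 1] = b.position
        b = b.previous
    return result
```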
(Another way to complete the processing, getting the lines in forward order from 1 to k
instead of from k to 1, appears in the appendix below.) If there is no garbage collection,
the algorithm concludes by deallocating all nodes on lists A and P.
Note that Restriction 1 makes it legitimate to deactivate a node when we discover
that $r < -1$, since $r < -1$ is equivalent to $l_j < L_{ab} - Z_{ab}$; therefore subsequent
breakpoints $b' > b$ will have $L_{ab'} - Z_{ab'} \ge L_{ab} - Z_{ab} > l_j$. Thus it is not difficult to
verify that the algorithm does indeed find an optimal solution: Given any sequence of
feasible breakpoints $b_1 < \cdots < b_k$, we can prove by induction on j that the algorithm
constructs a node for a feasible break at $b_j$ with appropriate line numbers and fitness
classifications, having no more demerits than the given sequence does.
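The equivalence can be spelled out for the only case in which the adjustment-ratio computation yields $r < -1$, namely a line that must shrink ($L_{ab} > l_j$ with $Z_{ab} > 0$); the final inequality is the monotonicity that the text attributes to Restriction 1:

$$
r=\frac{l_j-L_{ab}}{Z_{ab}}<-1
\iff l_j-L_{ab}<-Z_{ab}
\iff l_j<L_{ab}-Z_{ab}\le L_{ab'}-Z_{ab'}\qquad(b'>b).
$$

Hence for any later $b'$ the line from $a$ is still too wide even when fully shrunk: its ratio is below $-1$ if $Z_{ab'} > 0$, and $\infty$ otherwise, so it can never be feasible.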
There is only one loose end remaining in the algorithm, namely the operation ‘do
something drastic since there is no feasible solution’. As mentioned above, the TEX
system assumes that the user has chosen the tolerance threshold $\rho$ in such a way that
human intervention is desirable when this tolerance cannot be met. Another alternative
would be to have two thresholds and to try the algorithm first with threshold $\rho_0$,
which is lower than $\rho$, so the algorithm will generate comparatively few active nodes;
if there is no way to succeed at tolerance $\rho_0$, the algorithm could simply return all
nodes to free storage and try again with the actual threshold $\rho$. This dual-threshold
method will not always find the strictly optimum feasible solution, since it is possible
in unusual circumstances for the optimum solution to include a line whose adjustment
ratio exceeds $\rho_0$ while there is a non-optimum feasible solution meeting the tolerance
$\rho_0$; for practical purposes, however, this difference is negligible.
TEX uses a different sort of dual-threshold method. Since the task of word division
is nontrivial, TEX first tries to break a paragraph into lines without any discretionary
hyphens except those already present in the given text, using a tolerance threshold $\rho_1$.
If the algorithm fails to find a feasible solution, or if there is a feasible solution with
$q \ne 0$ but the desired looseness could not be satisfied ($\delta \ne q$), all nodes are returned
to free storage and TEX starts again using another tolerance $\rho_2$. During this second
pass, all words of five letters or more are submitted to TEX’s hyphenation algorithm
before they are treated by the line-breaking algorithm. Thus, the user sets $\rho_1$ to the
limit of tolerance for paragraphs that can be completely broken without hyphenation,
and $\rho_2$ is set to the tolerance limit when hyphenation must be tried; possibly $\rho_1$ will be
slightly larger than $\rho_2$, but it might also be smaller, if hyphenation is not frowned on
too much. (TEX users specify two integers, ‘jjpar’ and ‘jpar’, which determine $\rho_1$ and $\rho_2$
respectively.) In practice $\rho_1$ and $\rho_2$ are usually equal to each other, or else $\rho_1$ is near 1
and $\rho_2 \ge 2$; alternatively, one can take $\rho_2 = 0$ to effectively disallow hyphenation.
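The control flow of this two-pass scheme can be sketched in Python. Everything here is an assumption for illustration: `break_lines` stands for a complete line breaker (such as the algorithm above) that reports failure, including unmet looseness, by returning `None`, and `hyphenate` stands for a routine that inserts discretionary hyphens into a word; neither is TEX’s actual interface. The eligibility filter follows the second-pass rule reported in the statistics below (uncapitalized words of five or more letters, no accents or hyphens); ligatures are not modeled.

```python
from typing import Callable, Optional

Breaks = Optional[list[int]]   # None means: no acceptable solution


def eligible_for_hyphenation(word: str) -> bool:
    """Uncapitalized, unaccented, hyphen-free words of five or more letters."""
    return len(word) >= 5 and word.isascii() and word.isalpha() and word.islower()


def two_pass_break(words: list[str],
                   break_lines: Callable[[list[str], float], Breaks],
                   hyphenate: Callable[[str], str],
                   rho1: float, rho2: float) -> tuple[Breaks, int]:
    """Try tolerance rho1 without new hyphens, then rho2 after hyphenation.
    Returns (breaks, number of the pass that produced them)."""
    first = break_lines(words, rho1)
    if first is not None:
        return first, 1
    hyphenated = [hyphenate(w) if eligible_for_hyphenation(w) else w for w in words]
    return break_lines(hyphenated, rho2), 2
```

If the second pass also returns `None`, the caller is in the ‘do something drastic’ situation described next.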
When both passes fail, TEX continues by reactivating the node that was most
recently deactivated and treats it as if it were a feasible break leading to b. This situation
is actually detected in the routine ‘deactivate node a’, just after the last active node
has become passive:
COMPUTATIONAL EXPERIENCE
The algorithm described in the previous section is rather complex, since it is intended
to apply to a wide variety of situations that arise in typesetting. A considerably
simpler procedure is possible for the special cases needed for word processors and
newspapers; the appendix to this paper gives details about such a stripped-down
version. Contrariwise, the algorithm in TEX is even more complex than the one we
have described, because TEX must deal with leaders, with footnotes or cross references
or page-break marks attached to lines, and with spacing both inside and immediately
outside of math formulas; the spacing that surrounds a formula is slightly different from
glue because it disappears when followed by a line break, but it does not represent a
legal breakpoint. (A complete description of TEX’s algorithm will appear elsewhere.¹³)
Experience has shown that the general algorithm is quite efficient in practice, in spite
of all the things it must cope with.
So many parameters are present that it is impossible for anyone actually to experiment
with a large fraction of the possibilities. A user can vary the interword spacing and the
penalties for inserted hyphens, explicit hyphens, adjacent flagged lines, and adjacent
lines with incompatible fitness classifications; the tolerance threshold $\rho$ can also be
twiddled, not to mention the lengths of lines and the looseness parameter q. Thus
one could perform computational experiments for years and not have a completely
definitive idea about the behavior of this algorithm. Even with fixed parameters there
is a significant variation with respect to the kind of material being typeset; for example,
highly mathematical copy presents special problems. An interesting comparative study
of line breaking was made by Duncan et al.’, who considered sample texts from
Gibbon’s Decline and Fall versus excerpts from a story entitled Salar the Salmon; as
expected, Gibbon’s vocabulary forced substantially more hyphenated lines.
On the other hand, we have seen that the optimizing algorithm leads to better
line breaks even in children’s stories where the words are short and simple, as in
Grimm’s fairy tales. It would be nice to have a quantitative feeling for how much
extra computation is necessary to get this improvement in quality. Roughly speaking,
the computation time is proportional to the number of words of the paragraph, times
the average number of words per line, since the main loop of the computation runs
through the currently active nodes, and since the average number of words per line is
a reasonable estimate of the number of active nodes in all but the first few lines of a
paragraph (see Figures 12 and 14). On the other hand, there are comparatively few
active nodes on the first lines of a paragraph, so the performance is actually faster than
this rough estimate would indicate. Furthermore, the special-purpose algorithm in the
appendix runs in nearly linear time, independent of the line length, since it does not
need to run through all of the active nodes.
Detailed statistics were kept when TEX’s first large production, Seminumerical Algorithms’,
was typeset using the procedure above. This 700-page book has a total of
5526 ‘paragraphs’ in its text and answer pages, if we regard displayed formulas as
separators between independent paragraphs. The 5526 paragraphs were broken into
a total of 21,057 lines, of which 550 (about 2.6 per cent) ended with hyphens. The
lines were usually 29 picas wide, which means 626.4 machine units in 10-point type and
696 machine units in 9-point type, roughly twelve or thirteen words per line.
The threshold values $\rho_1$ and $\rho_2$ were normally both set to $\sqrt[3]{2} \approx 1.26$, so the spaces
between words ranged from a minimum of 4 units to a maximum of $6 + 3\sqrt[3]{2} \approx 9.78$
units. The penalty for breaking after a hyphen was 50; the consecutive-hyphens and
adjacent-incompatibility demerits were $\alpha = \gamma = 3000$. The second (hyphenation) pass
was needed on only 279 of the paragraphs, i.e., about 5% of the time; a feasible solution
without hyphenation was found in the remaining 5247 cases. The second pass would
only try to hyphenate uncapitalized words of five or more letters, containing no accents,
ligatures, or hyphens, and it turned out that exactly 6700 words were submitted to the
hyphenation procedure. Thus the number of attempted hyphenations per paragraph
was approximately 1.2, only slightly more than needed by conventional nonoptimizing
algorithms, and this was not a significant factor in the running time.
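The unit arithmetic in this paragraph can be checked with a few lines of Python. Two assumptions are inferred rather than stated: a machine unit is 1/18 em (which reproduces the 626.4 figure for 10-point type), and an interword space has natural width 6 units, stretch 3 and shrink 2 (which reproduces the 4-unit minimum and the $6 + 3\sqrt[3]{2}$ maximum). The function names are hypothetical.

```python
UNITS_PER_EM = 18      # inferred: 29 picas = 626.4 units in 10-point type
POINTS_PER_PICA = 12


def machine_units(picas: float, type_size_pt: float) -> float:
    """Line width in machine units, taking one em = type_size_pt points."""
    return picas * POINTS_PER_PICA * UNITS_PER_EM / type_size_pt


def interword_space(r: float, normal: float = 6.0, stretch: float = 3.0,
                    shrink: float = 2.0) -> float:
    """Width of an interword space (in units) at adjustment ratio r."""
    return normal + r * (stretch if r >= 0 else shrink)
```

Under these assumptions, spaces of 5 to 7 units correspond to $-0.5 \le r \le 1/3$, and a 12.6-unit space corresponds to r = 2.2.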
The main contribution to the running time came, of course, from the main loop of the
algorithm, which was executed 274,102 times (about 50 times per paragraph, including
both passes lumped together when the second pass was needed). The total number of
break nodes created was 64,003 (about 12 per paragraph), including multiplicities for
the comparatively rare cases that different fitness classifications or line numbers needed
to be distinguished for the same breakpoint. Thus, about 23% of the legal breakpoints
turned out to be feasible ones, given these comparatively low values of p1 and p2. The
inner loop of the computation was performed 880,677 times; this is the total number
of active nodes examined when each legal breakpoint was processed, summed over all
legal breakpoints. Note that this amounts to about 160 active node examinations per
paragraph, and 3.2 per breakpoint, so the inner loop definitely dominates the running
time. If we assume that words are about five letters long, so that a legal break occurs
for every six characters of input text including the spaces between words, the algorithm
costs about half of an inner-loop step per character of input, plus the time to pass over
that character in the outermost loop.
Figure 19. The adjustment ratios for interword spaces in a 700-page book. [Histogram of r in bins of width 0.1, from −1.00 ≤ r < −0.95 to +1.15 ≤ r < +1.26.]
This source data was also used to establish the importance of the optional dominance
test ‘if $D_c \le D + \gamma$’ preceding the creation of a new node; without that test, the
algorithm was found to need about 25% more executions of the inner loop, because
so many unnecessary nodes were created.
And how about the output? Figure 19 shows the actual distribution of adjustment
ratios r in the 15,531 typeset lines of Seminumerical Algorithms, not counting the
5526 lines at the ends of paragraphs, for which r ≈ 0. There was also one line with
r ≈ 1.8 and one with r ≈ 2.2 (i.e., a disgraceful spacing of 12.6 units); perhaps some
reader will be able to spot one or both of these anomalies some day. The average value
of r over all 21,057 lines was 0.08, and the standard deviation was only 0.403; about
67% of the lines had word spaces varying between 5 and 7 units. Furthermore the
author believes that virtually none of the 15,531 line breaks are ‘psychologically bad’
in the sense mentioned above.
Anyone who has experience with typical English text knows that these statistics are
not only excellent, they are in fact too good to be true; no line-breaking algorithm can
achieve such stellar behavior without occasional assists from the author, who notices
that a slight change in wording will permit nicer breaks. Indeed, this is another source
of improved quality when an author is given composition tools like TEX to work with,
because a professional compositor does not dare mess around with the given wording
when setting a paragraph, while an author is happy to make changes that look better,
especially when such changes are negligible by comparison with changes that are found
to be necessary for other reasons when a draft is being proofread. An author knows
that there are many ways to say what he or she wants to say, so it is no trick at all to
make an occasional change of wording.
Theodore L. De Vinne, one of America’s foremost typographers at the turn of the
century, wrote¹⁴ that ‘when the author objects to [a hyphenation] he should be asked
to add or cancel or substitute a word or words that will prevent the breakage. . .
Authors who insist on even spacing always, with sightly divisions always, do not
clearly understand the rigidity of types.’ Another interesting comment was made by
G. B. Shaw¹⁵: ‘In his own works, whenever [William Morris] found a line that justified
awkwardly, he altered the wording solely for the sake of making it look well in print.
When a proof has been sent to me with two or three lines so widely spaced as to make
a grey band across the page, I have often rewritten the passage so as to fill up the lines
better; but I am sorry to say that my object has generally been so little understood
that the compositor has spoilt all the rest of the paragraph instead of mending his
former bad work.’
The bias caused by Knuth’s tuning his manuscript to a particular line width makes
the statistics in Figure 19 inapplicable to the printer’s situation where a given text
must be typeset as it is. So another experiment was conducted in which the material
of Section 3.5 of Seminumerical Algorithms was set with lines 25 picas wide instead
of 29 picas. Section 3.5, which deals with the question ‘What is a random sequence?’,
was chosen because this section most closely resembles typical mathematics papers
containing theorems, proofs, lemmas, etc. In this experiment the optimum-fit algorithm
had to work harder than it did when the material was set to 29 picas, primarily because
the second pass was needed about thrice as often (49 times out of 273 paragraphs,
instead of 16 times); furthermore the second pass was much more tolerant of wide
spaces ($\rho_2 = 10$ instead of $\sqrt[3]{2}$), in order to guarantee that every paragraph could be
typeset without manual intervention. There were about 6 examinations of active nodes
per legal breakpoint encountered, instead of about 3, so the net effect of this change
in parameters was to nearly double the running time for line breaking. The reason for
such a discrepancy was primarily the combination of difficult mathematical copy and
a narrower column measure, rather than the ‘author tuning’, because when the same
text was set 35 picas wide the second pass was needed only 8 times.
It is interesting to observe the quality of the spacing obtained in this 25-pica experi-
ment, since it indicates how well the optimum-fit method can do without any human
intervention. Figure 20 shows what was obtained, together with the corresponding
statistics for the best-fit method when it was applied to the same data. About 800 line
breaks were involved in each case, not counting the final lines of paragraphs. The main
difference was that optimum-fit tended to put more lines into the range $0.5 \le r \le 1$,
while best-fit produced considerably more lines that were extremely spaced out. The
standard deviation of spacing was 0.53 (optimum-fit) versus 0.65 (best-fit); 24 of the
lines typeset by best-fit had spaces exceeding 12 units, while only 7 such bad lines were
produced by the optimum-fit method. An examination of these seven problematical
cases showed that three of them were due to long unbreakable formulas embedded in
the text, three were due to the rule that TEX does not try to hyphenate capitalized
words, and the other one was due to TEX’s inability to hyphenate the word ‘reasonable’.
Cursory inspection of the output indicated that the main difference between best-fit
and optimum-fit, in the eyes of a casual reader, would be that the best-fit method not
only resorted to occasional wide spacing, it also tended to end substantially more lines
with hyphens: 119 by comparison with 80. An author who cares about spacing, and
who therefore will edit a manuscript until it can be typeset satisfactorily, would have
to do a significant amount of extra work in order to get the best-fit method to produce
decent results with such difficult copy, but the output of the optimum-fit method could
be made suitable with only a few author's alterations.
Figure 20. The distribution of interword spaces found by the best line-at-a-time method, compared to the distribution found by the best paragraph-at-a-time method, when difficult mathematical copy is typeset without human intervention. [Two histograms of r, one labelled 'Optimum fit', in bins of width 0.25 from −1.00 ≤ r < −0.75 to +2.00 ≤ r < +∞.]
A HISTORICAL SUMMARY
We have now discussed most of the issues that arise in line breaking, and it is interesting
to compare the newfangled approaches to what printers have actually been doing
through the years. Medieval scribes, who prepared beautiful manuscripts by hand
before the days of printing, were generally careful to break lines so that the right-
hand margins would be nearly straight, and this practice was continued by the early
printers. Indeed, printers had to fill up each line of type with spaces anyway, so that
the individual letters wouldn't fall out of position while making impressions, and it
wasn't too much more difficult for a compositor to distribute the spaces between words
instead of putting them at the ends of lines.
One of the most difficult challenges faced by printers over the years has been the
typesetting of 'polyglot Bibles' (editions of the Bible in which the original languages
are set side by side with various translations), since special care is needed to keep
the versions of various languages synchronized with each other. Furthermore the fact
that several languages appear on each page means that the texts tend to be set with
narrower columns than usual; this, together with the fact that one dare not alter the
sacred words, makes the line-breaking problem especially difficult. We can get a good
idea of the early printers' approaches to line breaking by examining their polyglot
Bibles carefully.
The first polyglot Bible¹⁶,¹⁷,¹⁸ was produced in Spain by the eminent Cardinal
Jiménez de Cisneros, who reportedly spent 50,000 gold ducats to support the project.
It is generally called the Complutensian Polyglot, because it was prepared in Alcalá
de Henares, a city near Madrid whose old Roman name was Complutum. The printer,
Arnao Guillen de Brocar, devoted the years 1514-1517 to the production of this
six-volume set, and it is said that the Hebrew and Greek fonts he made for the occasion
are among the finest ever cut. His approach to justification was quite interesting and
unusual, as shown in Figure 21: Instead of justifying the lines by increasing the word
spaces, he inserted visible leaders to obtain solid blocks of copy with straight margins.
These leaders appear at the right of the Latin lines and at the left of the Hebrew lines.
He changed this style somewhat after gaining more experience: Starting at about the
46th chapter of Genesis, the Hebrew text was justified by word spaces, although the
leaders continued to appear in the Latin column. It is clear that straight margins were
considered strongly desirable at the time.
Brocar’s method of line breaking seems to be essentially a first-fit approach to the
Hebrew text; the corresponding Latin translation could then be set up rather easily,
since there were two lines of Latin for each line of Hebrew, and this gave plenty of room
for the Latin. In some cases when the Greek text was abnormally long by comparison
with the corresponding Hebrew (e.g., Exodus 38), Brocar set the Hebrew quite loosely,
so it is evident that he gave considerable attention to line breaking.
At about the same time, a polyglot version of the book of Psalms was being prepared
as a labor of love by Agostino Giustiniani of Genoa.¹⁹ This was the first polyglot book
actually to appear in print with each language in its own characters, although Origen’s
third-century Hexapla manuscript is generally considered to be the inspiration for all
of the later polyglot volumes. Giustiniani’s Psalter had eight columns: (1) The Hebrew
original; (2) A literal Latin rendition of (1); (3) The common Latin (Vulgate) version;
(4) The Greek (Septuagint) version; (5) The Arabic version; (6) The Chaldee version;
(7) A literal Latin translation of (6); (8) Notes. Since the Psalms are poems, all of the
columns except the last were set with ragged margins, and an interesting convention
was used to deal with the occasional line that was too wide to fit: A left parenthesis was
placed at the very end of the broken line, and the remainder of that line (preceded by
another left parenthesis) was placed flush with the margin of the preceding or following
line, wherever it would fit.
Only column (8) was justified, and it had a rather narrow measure of about 21 char-
acters per line. By studying this column we can conclude that Giustiniani did not take
great pains to make equal spacing by fiddling with the words. For example, Figure 22,
which comes from the notes on Psalm 6, shows two very tight lines enclosing a very
loose one in the passage ‘scriptum est . . .quod qui’. If Giustiniani had been extremely
concerned about spacing he would have used the hyphenation ‘cog-nosces’; the other
potential solution, to move ‘ad’ up a line, would not have worked since there isn’t quite
room for ‘ad’ on the loose line. Notice that another aid to line breaking in Latin at
that time was to replace an m or n by a tilde on the previous vowel (e.g., ‘premiũ’
for premium and ‘mũdo’ for mundo); an extension to the box/glue/penalty algebra
would be needed to include such options in TEX’s line-breaking algorithm! It is not
clear why Giustiniani didn’t set ‘acceperũt’ on the third line, to save space, since he
had no room for the hyphen of ‘in-tellectum’; perhaps he didn’t have enough ũ’s left
in his type case.
Figure 23 shows some justified text from the Complutensian polyglot, taken from
the Latin translation of an early Aramaic translation of the original Hebrew. The
compositor was somewhat miraculously able to maintain this uniformly tight spacing
throughout the entire volume, by making use of abbreviations and frequent hyphena-
tions. Note that, as in Figure 22, the hyphen was omitted from a broken word when
there was no room for it; e.g., ‘diuisit’ has been divided without a hyphen.
The next great polyglot Bible was the Royal Polyglot of Antwerp,²⁰ produced during
1568-1572 by the outstanding printer Christophe Plantin. Numerous copies of the
Complutensian Polyglot had unfortunately been lost at sea, so King Philip II
commissioned a new edition that would also take advantage of recent scholarship. Plantin
was a pious man who was active in pacifist religious circles and anxious to undertake
the job; but when he had completed the work he described it as an ‘indescribable toil,
labor, and expense.’ On June 9, 1572, Plantin sent a letter to one of his friends, saying
‘I am astonished at what I undertook, a task I would not do again even if I received
12,000 crowns as a gift.’ But at least his work was widely appreciated: Lucas of Bruges,
writing in 1577, said that ‘the art of the printer has never produced anything nobler,
nor anything more splendid.’
Most of Plantin’s polyglot Bible was justified with fairly wide columns having about
42 characters per line, so it did not present especially difficult problems of line breaking.
But we can get some idea of his methods by studying the texts of the Apocrypha, which
were set with a narrower measure of about 27 characters per line. He arranged things
so that each column on a page would have about the same number of lines, even though
the individual columns were in different languages. Figure 24 shows an example of a
passage excerpted from a page where the Latin text was comparatively sparse, so the
paragraphs on that page needed to be rather loose. It appears that the entire page
was set first, then adjustments were made after the Latin column was found to be
too short; in this case the word ‘eos’ was brought down to make a new line and the
previous line was spaced out. Plantin’s compositor did not take the trouble to move
‘sab-’ down to that line, although such a transposition would have avoided a hyphen
without making the spacing any worse. The optimum solution would have been to
avoid this hyphenation and to hyphenate the previous line after ‘ad-’, thus achieving
fairly uniform spacing throughout.
The most accurate and complete of all polyglot Bibles was the London Polyglot,
printed by Thomas Roycroft and others during the Cromwellian years 1653-1657.
This massive 8-volume work included texts in Hebrew, Greek, Latin, Aramaic, Syriac,
Arabic, Ethiopic, Samaritan, and Persian, all with accompanying Latin translations,
and it has been acclaimed as ‘the typographical achievement of the seventeenth cen-
tury.’ As in Plantin’s work shown in Figure 24, a paragraph that has been loosened will
often end with an unnecessarily tight hyphenated line followed by a loose line followed
by a one-word line; so it is clear that Roycroft’s compositors did not have time to do
complex adjustments of line breaks.
Hyphenations were clearly not frowned upon at the time, since about 40% of all
lines in the London Polyglot end with a hyphen, regardless of the column width. It
is not difficult to find pages on which hyphenated lines outnumber the others; and in
the Latin translation of the Aramaic version of Genesis 4:15, even the two-letter word
‘e-o’ was hyphenated! Such practice was not uncommon: for example, the Hamburg
Polyglot Bible²² of 1596 had more than 50% hyphens at the right margin. Both
Plantin’s polyglot and the notes of Giustiniani’s Psalter had hyphenation percentages
of about 40%, and the same was true of many medieval manuscripts. Thus it was
considered better to have the margins straight and to keep the spacing tight, rather
than to avoid word splits.
One of the first things that strikes a modern eye when looking at these old Bibles
is the treatment of punctuation. Note, for example, that no space appears after the
commas in Figure 22, and a space appears before as well as after one of the commas
in Figure 24. One can find all four possibilities of ‘space before/no space before’
and ‘space after/no space after’ in each of the Bibles mentioned so far, with respect
to commas, periods, colons, semicolons, and question marks, and with no apparent
preference between the four choices except that it was comparatively rare to put a
space before a period. Giustiniani and Plantin occasionally would insert spaces before
periods, but Roycroft apparently never did. Commas began to be treated like periods
in this respect about 1700, but colons and semicolons were generally both preceded and
followed by spaces until the 19th century. Such extra spaces were helpful in justifying,
of course, and it was also helpful to have the option of leaving out all of the space next
to a punctuation mark. Roycroft would in fact eliminate the space between words
when necessary, if the following word was capitalized (e.g., ‘dixitDeus’); apparently a
printer’s main goal was to keep the text unambiguously decipherable, while ease of
readability was only of secondary importance.
Knowledge about how to carry out the work of a trade like printing was originally
passed from masters to apprentices and not explained to the general public, so we can
only guess at what the early printers did by looking at their finished products. A trend
to put trade secrets into print was developing during the 17th century, however,23
and a book about how to make books was finally written: Joseph Moxon’s Mechanick Exercises,
published in 1683, was by forty years the earliest manual of printing in
any language. Although Moxon did not discuss rules for hyphenation and punctuation,
he gave interesting information about line breaking and justification.
‘If the Compositor is not firmly resolv’d to keep himself strictly to the Rules of
good Workmanship, he is now tempted to make Botches. . .’, namely bad line breaks,
according to Moxon. The normal ‘thick space’ between words, when beginning to make
up a line, was one-fourth of what Moxon called the body size (one em), and he also spoke
of ‘thin spaces’ that were one-seventh of the body size; thus, a printer who followed
this practice would deal mostly with spaces of 4.5 units and 2.57 units, although these
measurements were only approximate because of the primitive tools used at the time.
Moxon’s procedure for justifying a line whose natural width was too narrow was to
insert thin spaces between one or more words to ‘fill up the Measure pretty stiff,’ and if
necessary to go back through the line and do this again. ‘Strictly, good Workmanship
will not allow more [than the original space plus two thin spaces], unless the Measure
be so short, that by reason of few Words in a Line, necessity compells him to put more
Spaces between the Words. . . . These wide Whites are by Compositers (in way of
Scandal) call’d Pidgeon-holes. . . . And as Lines may be too much Spaced-out, so may
they be too close Set.’
Notice that Moxon’s justification procedure would normally leave uneven spacing
between words on the same line, since he inserts the thin spaces one by one. In fact,
such discrepancies were the norm in early printed books, which look something like
present-day attempts at justification on a typewriter or computer terminal with fixed-
width spacing. For example, the relative proportions in the spaces of the third line of
Plantin’s text in Figure 24 are approximately 8 : 12 : 5 : 9 : 4, and in the fifth line
of Giustiniani’s Figure 22 they are approximately 3 : 2 : 1. Moxon’s book itself (see
BREAKING PARAGRAPHS INTO LINES 1171
Figure 25) shows extreme variations, frequently breaking the rules he had stated for
maximum and minimum spaces between words.
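The unevenness follows directly from the way Moxon inserts thin spaces one at a time. A minimal sketch of his procedure is given below, assuming an em of 18 units (so a thick space is 4.5 units and a thin space 18/7 ≈ 2.57 units), left-to-right passes over the gaps, and a stop once the remaining shortfall is smaller than one thin space. The function name moxon_justify and these scanning details are illustrative assumptions; Moxon himself gave no such precise rule.

```python
from fractions import Fraction

EM = 18                    # one em = 18 units, as in the measurements above
THICK = Fraction(EM, 4)    # Moxon's normal "thick space": 4.5 units
THIN = Fraction(EM, 7)     # Moxon's "thin space": about 2.57 units


def moxon_justify(word_widths, measure, max_thins_per_gap=2):
    """Return (interword spaces, unfilled width) for one line, Moxon-style.

    Every gap starts with a thick space. Passes are then made over the line
    from left to right, adding one thin space to each gap while the shortfall
    can still absorb a whole thin space; at most max_thins_per_gap passes are
    made, since Moxon allowed no more than two thin spaces per gap.
    """
    gaps = len(word_widths) - 1
    spaces = [THICK] * gaps
    shortfall = measure - sum(word_widths) - sum(spaces)
    for _ in range(max_thins_per_gap):
        for i in range(gaps):
            if shortfall >= THIN:
                spaces[i] += THIN
                shortfall -= THIN
    return spaces, shortfall


spaces, left = moxon_justify([30, 40, 25, 50, 35, 20], 230)
print([float(s) for s in spaces], float(left))
```

In this hypothetical line only the first two gaps receive a thin space (99/14 ≈ 7.07 units each) while the other three stay at 4.5 units, which is exactly the kind of uneven spacing seen in the early books.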
It would be nice to report that Moxon described a particular line-breaking algorithm,
like the first-fit or best-fit method, but in fact he never suggested any particular
procedure, nor did any of his successors until the computer age; this is not surprising,
since people were just expected to use their common sense instead of to obey some rigid
rules. Many of the breaks in Figure 25 can, however, be accounted for by assuming
an underlying first-fit algorithm. For example, the looseness on lines 1, 4, and 8 is
probably due to the long words at the beginning of lines 2, 5, and 9, since these long
words would not fit on the previous line unless they were hyphenated. On the other
hand, the extremely tight spacing on line 13 can best be explained by assuming that
one or more words had to be inserted to correct an error after the page had been set.
Thus we cannot satisfactorily infer the compositor’s procedure from the final copy; we
really need to see the first trial proofs. All we can conclude for certain is that there
was very little attempt to go back and reconsider the already-set lines unless it was
absolutely necessary to do so; for example, this paragraph would have been better if
the first line had ended with ‘can-’ and the second with ‘wherefore’.
Moxon’s compositor was, however, supposed to look ahead: ‘When in Composing he
comes near a Break [i.e., the end of a paragraph], he for some Lines before he comes to
it considers whether that Break will end with some reasonable White; If he finds it will,
he is pleas’d, but if he finds he shall have but a single Word in his Break, he either Sets
wide to drive a Word or two more into the Break-line, or else he Sets close to get in that
little Word, because a Line with only a little Word in it, shews almost like a White-line,
which unless it be properly plac’d, is not pleasing to a curious Eye.’
Another extract from a London printing manual25 is shown in Figure 26; this one
is from 1864 instead of 1683. Although the author says that the justifying spaces
are to be made as nearly equal as possible, whoever did the composition of his book
did not follow the instructions it contains! Only one of the fine books considered
above has spaces that look the same, namely the Complutensian Polyglot. In fact,
printers only rarely achieved truly uniform spacing until machines like the Monotype
and Linotype made the task easier towards the end of the nineteenth century; and these
new machines, with their emphasis on speed, changed the philosophy of justification
so much that the quality of line breaking decreased when the spacing became uniform:
It became too inconvenient for the compositor to go back and reconsider any of the
earlier line breaks of a paragraph, when he was expected to turn out so many more
ems of type per hour.
The line breaks in Figure 26 are fairly well done in spite of the uneven spacing, given
that the compositor wished to avoid hyphenations and the psychologically bad break
in the phrase ‘with j’; it would have been slightly better, however, to move the word
‘but’ down to the third-last line.
Probably the most beautiful spacing ever achieved in any typeset book appeared in
The Art of Spacing26 by Samuel A. Bartels (1926). This book was hand set by the
author, and it contains about 50 characters per line. There are no loose lines, and
no hyphenated words; the final line of each paragraph always fills at least 65% of the
column width, yet ends at least one em from the right margin. Bartels must have
changed his original wording many times in order to make this happen; the author as
compositor is clearly able to enhance the appearance of a book.
General-purpose computers were first applied to typesetting by Georges P. Bafour,
André R. Blanchard, and François H. Raymond in France, who applied for patents
on their invention in 1954. (They received French and British patents in 1955, and a
U.S. patent in 1956.27, 28) This system gave special attention to hyphenation, and its
authors were probably the first to formulate the method of breaking one line at a time
in a systematic fashion. Figure 27 shows a specimen of their output, as demonstrated
at the Imprimerie Nationale in 1958. In this example the word ‘en’ was not included in
the second line because their scheme tended to favor somewhat loose lines: Each line
would contain as few characters as possible subject to the condition that the line was
feasible but the addition of the next K characters would not be feasible; here K was a
constant, and their method was based on a K-stage lookahead.
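The K-stage lookahead rule can be made concrete with a small model. The sketch below assumes equal-width characters, measures a line as its characters plus one space per gap, and calls a line feasible when its length lies between min_len and max_len. It then ends each line at the earliest word boundary where the line is feasible but adding the next k characters would make it too long. The function name, the handling of the final line, and the fallback used when no boundary satisfies the rule are assumptions of this sketch, not details of the French system.

```python
def lookahead_breaks(words, min_len, max_len, k):
    """Break a paragraph with a K-character lookahead rule (illustrative model).

    Characters are treated as equal-width units and a line's length counts
    its characters plus one space per gap. A line is feasible when
    min_len <= length <= max_len. Each line ends at the earliest word
    boundary where it is feasible but adding the next k characters would
    make it too long, which favors loose rather than tight lines.
    """
    lines, start = [], 0
    while start < len(words):
        length, chosen, last_fit = -1, None, start
        for end in range(start, len(words)):
            length += 1 + len(words[end])   # a space before every word but the first
            if length > max_len:
                break
            last_fit = end
            if end == len(words) - 1:
                chosen = end                # the rest of the paragraph fits: final line
                break
            if length >= min_len and length + k > max_len:
                chosen = end                # earliest feasible break under the lookahead
                break
        if chosen is None:
            chosen = last_fit               # fallback (our assumption): longest fit
        lines.append(" ".join(words[start:chosen + 1]))
        start = chosen + 1
    return lines


print(lookahead_breaks(["aaaa", "bb", "c", "dd", "eeee"], 6, 12, 4))
```

With k = 4 the first line stops after ‘c’ even though ‘dd’ would still fit, illustrating the tendency toward somewhat loose lines; with k = 0 the rule never fires and the fallback reduces to ordinary greedy filling.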
Michael P. Barnett began to experiment with computer typesetting at M.I.T. in 1961,
and the work of his group at the Cooperative Computing Laboratory was destined
to become quite influential in the U.S.A. For example, the TROFF system29 that
is now in use at many computer centers is a descendant of Barnett’s PC6 system1,
via other systems called RUNOFF and NROFF. Another line of descent is represented
by the PAGE-1, PAGE-2, and PAGE-3 systems, which have been used extensively in
the typesetting industry.30, 31, 32 All of these programs use the first-fit method of line
breaking that is described above.
At about the same time that Barnett began his M.I.T. studies of computer typeset-
ting, another important university research project with similar goals was started by
John Duncan at the University of Newcastle-Upon-Tyne Computing Laboratory.
Line breaking was one of the first subjects studied intensively by this group, and they
developed a program that would find a feasible way to typeset a paragraph without
hyphenations, if any sequence of feasible breaks exists, given minimum and maximum
values for interword spaces. This program essentially worked by backtracking through
all possibilities, treating them in reverse lexicographic order (i.e., starting with the first
breakpoint b_1 as large as possible and using the same method recursively to find feasible
breaks (b_2, b_3, . . .) in the rest of the paragraph, then decreasing b_1 and repeating the
process if necessary). Thus it would either find the lexicographically largest feasible
sequence of breakpoints or it would conclude that none are feasible; in the latter case
hyphenation was attempted. This was the first systematic sequence of experiments to
deal with the line-breaking problem by considering a paragraph as a whole instead of
working line by line.
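The backtracking search just described can be sketched as follows. A break sequence is represented as the list of indices of the last word on each line, and a justified line is feasible when all its interword spaces can lie between min_space and max_space. The treatment of the final line (it only has to fit at minimum spacing) and the function name newcastle_breaks are assumptions of this sketch; the Newcastle program's actual data representation is not described here.

```python
def newcastle_breaks(widths, line_width, min_space, max_space):
    """Return the lexicographically largest feasible break sequence, or None.

    A break sequence is a list of word indices, each the last word on a line.
    A justified line is feasible when its interword spaces can all lie in
    [min_space, max_space]; the final line (an assumption of this sketch) only
    has to fit with minimum spacing, since it is not justified.
    """
    n = len(widths)

    def feasible(i, j):
        gaps, text = j - i, sum(widths[i:j + 1])
        if text + gaps * min_space > line_width:
            return False                        # too long even when tight
        if j == n - 1:
            return True                         # final line: no stretching needed
        if gaps == 0:
            return text == line_width           # a lone word cannot be spaced out
        return text + gaps * max_space >= line_width

    def search(i):
        # Try the next break as late as possible, then back up one word at a time.
        for j in range(n - 1, i - 1, -1):
            if feasible(i, j):
                if j == n - 1:
                    return [j]
                rest = search(j + 1)
                if rest is not None:
                    return [j] + rest
        return None                             # the point where hyphenation was attempted

    return search(0)


print(newcastle_breaks([10] * 5, 34, 2, 8))
```

Because every feasible sequence is treated as equally good, the search stops at the first one it meets; in the worst case the backtracking takes exponential time, which is one reason later methods turned to dynamic programming.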
No distinction was made in these early experiments between one sequence of feasible
breakpoints and another; the only criterion was whether or not all interword spacing
could be confined to a certain range without requiring hyphenation. Duncan found
that when lines were 603 units wide, it was possible to avoid virtually all hyphenations
if spaces were allowed to vary between 3 and 12 units; with 405-unit lines, however,
hyphens were necessary about 3% of the time in order to keep within these fairly
generous limits, and when the line width decreased to 288 units the hyphenation
percentage rose to 12% or 16% depending on the difficulty of the copy being typeset.
More stringent intervals, such as the requirement of 4- to 9-unit spaces used in most
of the examples we have been considering above, were found to need more than 4%
hyphenations on 603-unit lines and 30% to 40% on 288-unit lines. However, these
numbers are higher than necessary because the Newcastle program did not search for
the best places to insert hyphens: Whenever it was unfeasible to set more than k lines,
the (k + 1)st line was simply hyphenated and the process was restarted. One hyphen
generated by this method tends to spawn more in the same paragraph, since the first
line of a paragraph or of an artificially resumed paragraph is the most likely to require
hyphenation. Examples of the performance can be seen in the article where the method
was introduced8 (using spaces of 4 to 15 units for the first six pages and 4 to 12 units
for the rest), as well as in Duncan’s survey paper.2 These articles also discuss possible
refinements to the method, one idea being to try to avoid loose lines next to tight lines
in some unspecified manner, another being to try the method first with strict spacing
intervals and then to increase the tolerance before resorting to hyphenation.
Such refinements were carried considerably further by P. I. Cooper33 at Elliott
Automation, who developed a sophisticated experimental system for dealing with entire
paragraphs. Cooper’s system worked not only with minimum and maximum spacing
parameters, it also divided the permissible interword spaces into different sectors that
yielded different so-called ‘penalty scores’. Besides the penalties associated with the
spaces on individual lines, there were additional penalty scores based on the respective
spacing sectors of two consecutive lines, and the goal was to minimize the total penalty
needed to typeset a given paragraph. Thus, his model was rather similar to the TeX
model that we have been discussing, except that all spaces were equivalent to each
other and special problems like hyphenation were not treated.
Cooper said that his program ‘employs a mathematical technique known as “dynamic
programming”’ to select the optimum setting. However, he gave no details, and from
the stated computer memory requirements it appears that his algorithm was only an
approximation to true dynamic programming in that it would retain just one optimum
sum-of-penalties for each breakpoint, not for each (breakpoint, sector) pair. Thus, his
algorithm was probably similar to the method given in the appendix below.
Unfortunately, Cooper’s method was ahead of its time; the consensus in 1966 was
that such additional computer time and memory space were prohibitively expensive.
Furthermore his method was evaluated only on the basis of how many hyphens it would
save, not on the better spacing it provided on non-hyphenated lines. For example, J. L.
Dolby’s notes on this paper34 compared Cooper’s procedure unfavorably to Duncan’s
since the Newcastle method removed the same number of hyphens with what appeared
to be a less complex program. In fact, Cooper himself undersold his scheme with
unusual modesty and caution when he spoke about it: He said ‘this investigation does
not support the view that [my approach] should be given a general and enthusiastic
recommendation. . . . It has to be admitted that an aesthetic improvement is neither
predictable nor measurable.’ His method was soon forgotten.
In retrospect we can see that the defect in Cooper’s otherwise admirable approach
was the way it dealt with hyphenation: No proper tradeoff between hyphenated lines
and feasible unhyphenated lines was made, and the method would be restarted after
every hyphen had to be inserted. Thus, the hyphens tended to cluster as in the
Newcastle experiments.
Another approach to line breaking has recently been investigated by A. M. Pringle
of Cambridge University, who devised a procedure called Juggle.35 This algorithm
uses the best-fit method without hyphenation until reaching a line that cannot be
accommodated; then it calls a recursive procedure pushback that attempts to move
a word from the offending line up into the previous text. If pushback fails to solve
the problem, another recursive routine pullon tries to move a word forward from the
previous text; hyphenation is attempted only if pullon fails too. Thus, Juggle attempts
to simulate the performance of a methodical super-conscientious workman in the good
olde days of hand composition. The recursive backtracking can, however, consume a lot
of time by comparison with a dynamic programming approach, and an optimum
sequence of line breaks is not generally achieved; for example, Figure 2 would be
obtained instead of Figure 3. Furthermore there are unusual cases in which feasible
solutions exist but Juggle will not find them; for example, it may be feasible to push
back two words but not one.
Hanan Samet has suggested another measure of optimality in his recent work on line
breaking.36 Since all methods for setting a paragraph in a given number of lines involve
the same total amount of blank space, he points out that the average interword space in
a paragraph is essentially independent of the breakpoints (if we ignore the fact that the
final line is different). Therefore he suggests that the variance of the interword spaces
should be minimized, and he proposes a ‘downhill’ algorithm that shifts words between
lines until no such local transformation further reduces the variance.
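Samet's criterion and a downhill search of this kind can be sketched as follows. The paragraph is described by counts[i], the number of words on line i, so the number of lines stays fixed. The variance is taken over every interword space of the justified (non-final) lines, and the only local moves are shifting one word between adjacent lines. The feasibility test (no space below min_space, final line fits at minimum spacing) and all function names are assumptions of this sketch rather than details of Samet's procedure.

```python
from statistics import pvariance


def interword_spaces(widths, counts, line_width):
    """Spaces on every justified (non-final) line, one entry per gap.

    counts[i] is the number of words on line i; returns None if some
    justified line has a single word, which has no gap to adjust.
    """
    spaces, start = [], 0
    for c in counts[:-1]:
        if c < 2:
            return None
        natural = sum(widths[start:start + c])
        spaces += [(line_width - natural) / (c - 1)] * (c - 1)
        start += c
    return spaces


def space_variance(widths, counts, line_width, min_space):
    """Variance of the interword spaces, or None if the setting is infeasible."""
    spaces = interword_spaces(widths, counts, line_width)
    last = counts[-1]
    last_fits = sum(widths[-last:]) + (last - 1) * min_space <= line_width
    if not spaces or min(spaces) < min_space or not last_fits:
        return None
    return pvariance(spaces)


def downhill(widths, counts, line_width, min_space):
    """Shift single words between adjacent lines while the variance drops."""
    counts = list(counts)
    best = space_variance(widths, counts, line_width, min_space)
    improved = True
    while improved:
        improved = False
        for i in range(len(counts) - 1):
            for delta in (1, -1):   # pull a word up to line i, or push one down
                trial = list(counts)
                trial[i] += delta
                trial[i + 1] -= delta
                if min(trial) < 1:
                    continue
                v = space_variance(widths, trial, line_width, min_space)
                if v is not None and (best is None or v < best):
                    counts, best, improved = trial, v, True
    return counts, best


print(downhill([10] * 7, [2, 4, 1], 50, 2))
```

Starting from a very uneven split of seven equal words, one move balances the two justified lines exactly. Like any downhill method, this stops at a local minimum, so it need not find the global optimum.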
The first magazine publisher to develop computer aids to typesetting was Time Inc. of
New York City, whose line-breaking decisions went largely on-line in 1967. According
to comments made by H. D. Parks37 at the time, line breaks were determined one by
one using a variation of the first-fit algorithm that we might call ‘tight-fit’; this gives
the most words per line except that hyphenation is done only when necessary, and
it is equivalent to the first-fit method if the normal interword spacing is the same as
the minimum. The tight-fit method had previously been used on the IBM 1620 Type
Composition System demonstrated in 1963 (see Duncan,2 pages 159-160), and it is
reasonable to suppose that essentially the same method was carried over to the Time
group when they dedicated two IBM 360/40 computers to the typesetting task.38
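The relationship between first-fit and tight-fit stated above can be illustrated with a single greedy routine. It adds words to the current line as long as the line still fits when every gap is given space units. This sketch assumes a simplified reading in which first-fit packs lines at the normal interword space and tight-fit packs them at the minimum (fully shrunk) space, and it ignores hyphenation. The function name and the sample numbers are hypothetical.

```python
def greedy_breaks(widths, line_width, space):
    """Greedy one-line-at-a-time breaking without hyphenation.

    Words are added to the current line while its width, computed with
    `space` units in every gap, still fits in line_width. With the normal
    interword space this models first-fit; with the minimum space it models
    tight-fit. A single word wider than the line is set on its own line.
    Returns the lines as lists of word indices.
    """
    lines, current, width = [], [], 0
    for i, w in enumerate(widths):
        needed = w if not current else width + space + w
        if current and needed > line_width:
            lines.append(current)           # break before word i
            current, width = [i], w
        else:
            current.append(i)
            width = needed
    if current:
        lines.append(current)
    return lines


normal, minimum = 6, 4                      # hypothetical spaces in 1/18-em units
print(greedy_breaks([10, 10, 10, 10], 38, normal))   # first-fit
print(greedy_breaks([10, 10, 10, 10], 38, minimum))  # tight-fit
```

When normal equals minimum the two calls coincide, which is the equivalence noted in the text; otherwise tight-fit places more words on each line.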
Since the final copy in Time magazine has been edited and re-edited, and since
manual intervention and last-minute corrections will change line-breaking decisions, it
is impossible to deduce what algorithm is presently used for Time articles merely by
examining the printed pages; but it is tempting to speculate about how the optimum-
fit algorithm might improve the appearance of such publications. Figure 28 on the
next page shows an interesting example based on page 22 of Time magazine dated
June 23, 1980; Version A shows the published spacing and Version B shows what the
new algorithm would produce in the same circumstances. All letters of the text have
been replaced by n’s of the corresponding width, so that it is possible to concentrate
solely on the spacing; however, it should be pointed out that this device makes bad
spacing look more innocuous, since a reader isn’t so annoyingly distracted when no
semantic meaning is present anyway.
The most interesting thing about Figure 28 is that the final line of the first paragraph
was brought flush right in order to balance the inserted photograph properly; this
photograph actually carried over into the right-hand column. Version A shows how the
desired effect was achieved by stretching the final three lines, leaving large gaps that
surely caught the curious eye of many a reader; Version B shows how the optimizing
algorithm is magically able to look ahead and make things come out perfectly. Perhaps
even more important is the fact that Version B avoids the need for letterspacing that
spoiled the appearance of lines 6, 9, 10, 23, and 32 in Version A.
Letterspacing (the insertion of tiny spaces between the letters of a word so as to
make large interword spaces less prominent) could readily be incorporated into the
box/glue/penalty model, but it is almost universally denounced by typographers. For
example, De Vinne14 said that letterspacing is improper even when the columns are
so narrow that some lines must contain only a single word; Bruce Rogers39 said ‘it
is preferable to put all the extra space between the words even though the resultant
“holes” are distressing to the eye.’ Even one-fourth of a unit of space between letters
makes the word look noticeably different. According to the style rules of the U.S.
Congressional Record40, ‘In general, operators should avoid wide spacing. However,
no letterspacing is permitted.’ The optimum-fit algorithm therefore makes it possible
to comply more easily with existing laws.
The idea of applying dynamic programming to line breaking occurred to D. E. Knuth
in 1976, when Professor Leland Smith of Stanford’s music department raised a related
question that arises in connection with the layout of music on a page (see Clancy
and Knuth41). During a subsequent discussion with students in a problem-solving
seminar, someone pointed out that essentially the same idea would apply to the texts of
paragraphs as well as to music. The box/glue/penalty model was developed by Knuth
in April 1977 when the initial design of TeX was made, although it wasn’t clear at
that time whether a general optimizing algorithm could be implemented with enough
efficiency for practical use. Knuth was blissfully unaware of Cooper’s supposedly
Figure 28. This example is based on the spacing in a recent issue of Time magazine, but
all of the letters have been replaced by n’s of various widths. If the text were readable, the
line breaks in Version B would be less distracting than those in Version A.
PROBLEMS AND REFINEMENTS
One unfortunate restriction remains in TeX, although it is not inherent in the box/
glue/penalty model: When a break occurs in the middle of a ligature (e.g., if ‘efficient’
becomes ‘ef-ficient’), the computation of character widths is more complicated than
usual. We must take into account not only the fact that a hyphen has some width, but
also the fact that ‘f’ followed by ‘fi’ is wider than ‘ffi’. The same problem occurs when
setting German text, where some compound words change their spelling when they are
hyphenated (e.g., ‘backen’ becomes ‘bak-ken’ and ‘Bettuch’ becomes ‘Bett-tuch’). TeX
does not permit such optional spelling variants; it will only insert an optional hyphen
character among other unchangeable characters. Manual intervention is necessary in
the rare cases when a more complicated break cannot be avoided.
It is interesting to consider what extension would be needed to make the optimum-
fit algorithm handle cases like the dropping of m’s and n’s in Figure 22. The badness
function of a line would then depend not only on its natural width, stretchability, and
shrinkability; it would also depend on the number of m’s and the number of n’s on
that line. A similar technique could be used to typeset biblical Hebrew, which is never
hyphenated: Hebrew fonts intended for sacred texts usually include wide variants of
several letters, so that individual characters on a line can be replaced by their wider
counterparts in order to avoid wide spaces between words. For example, there is a
super-extended aleph in addition to the normal one. An appropriate badness function
for the lines of such paragraphs would take account of the number of dual-width
characters present.
The most serious unanticipated problem that has arisen with respect to TeX’s
line-breaking procedure is the fact that floating-point arithmetic was used for all the
calculations of badness, demerits, etc., in the original implementations. This leads
to different results on different computers, since there is so much diversity in existing
floating-point hardware, and since there are often two choices of breakpoints having
almost the same total demerits. It is important to be able to guarantee that all versions
of TeX will set paragraphs identically, because the ability to proofread, edit, and
print a document at different sites is becoming significant. Therefore the ‘standard’
version of TeX, planned for release in 1982, will use fixed-point arithmetic for all of
its calculations.
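To show how a badness computation can avoid floating point entirely, here is a sketch modeled on the integer badness routine in the later TeX82 source (tex.web). It approximates 100(t/s)^3 using only integer operations, so every machine produces the same result. The constants are reproduced from that approach and the value 10000 stands for "infinitely bad"; both should be checked against the published source before being relied on.

```python
INF_BAD = 10000  # badness reported for lines that cannot be set acceptably


def badness(t, s):
    """Integer approximation to 100 * (t / s) ** 3, with no floating point.

    t : how much the line must stretch (or shrink), t >= 0
    s : the total stretchability (or shrinkability) available
    """
    if t == 0:
        return 0
    if s <= 0:
        return INF_BAD
    # r approximates alpha * t / s with alpha**3 about 100 * 2**18
    # (297**3 is roughly 99.94 * 2**18); the branches keep the products
    # below 2**31, as they must be on 32-bit machines.
    if t <= 7230584:
        r = (t * 297) // s
    elif s >= 1663497:
        r = t // (s // 297)
    else:
        r = t
    if r > 1290:                            # 1290**3 < 2**31 < 1291**3
        return INF_BAD
    return (r * r * r + 2**17) // 2**18     # r**3 / 2**18, rounded to nearest


print(badness(100, 100), badness(200, 100), badness(50, 100))
```

The results (100, 800 and 12 in the calls above) differ from the exact values 100, 800 and 12.5 only by rounding, but they are identical on every computer, which is the property the standard version requires.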
Books on typography frequently discuss a problem that may be the most serious
consequence of loose typesetting, the occasional gaps of white space that are called
‘houndsteeth’ or ‘lizards’ or ‘rivers’. Such ugly patterns, which run up through a
sequence of lines and distract the reader’s eye, cannot be eliminated by a simple efficient
technique like dynamic programming. Fortunately, however, the problem almost never
arises when the optimum-fit algorithm is used, because the computer is generally able
to find a way to set the lines with suitably tight spacing. Rivers begin to be prevalent
only when the tolerance threshold ρ has been set high for some reason, for example in
Figure 7 where an unusually narrow column is being justified, or in Figure 18(d) where
the paragraph is two lines longer than optimum. Another case that sometimes leads to
rivers arises when the text of a paragraph falls into a strictly mechanical pattern, as
when a newspaper lists all of the guests at a large dinner party. Extensive experience
with TeX has shown, however, that manual removal of rivers is almost never necessary
after the optimum-fit algorithm has been used.
The box/glue/penalty model applies in the vertical dimension as well as in the
horizontal, so TeX is able to make fairly intelligent decisions about where to start
each new page. The tricks we have discussed for such things as ragged-right setting
correspond to analogous vertical tricks for such things as ‘ragged-bottom’ setting.
However, the current implementation of TeX keeps each page in memory until it has
been output, so TeX cannot store an entire document and find strictly optimum page
breaks using the algorithm we have presented for line breaks. The ‘best-fit’ method is
therefore used to output one page at a time.
Experiments are now in progress with a two-pass version of TeX that does find
globally optimum page breaks. This experimental system will also help with the
positioning of illustrations as near as possible to where they are cited in the accom-
panying text, taking proper account of the fact that certain pages face each other.
Many of these issues can be resolved by extending the dynamic programming technique
and the box/glue/penalty model of this paper, but some closely related problems can
be shown to be NP-complete.42
in terms of 1/18-em units, where ∞ stands for some large number. The width $w_1$
of the first box should include the blank space needed for paragraph indentation;
thus, the Grimm fairy tale example of Figure 1 would be represented by
$w_1, \ldots, w_n = 34, 42, 42, \ldots, 24, 39, 30, \ldots, 60, 79$
$g_1, \ldots, g_n = 1, 1, 1, \ldots, 1, 2, 1, \ldots, 1, 3$
corresponding to
respectively, using widths from a typical roman font of type. The general input
sequences $w_1 \ldots w_n$ and $g_1 \ldots g_n$ can be expressed in the box/glue/penalty model
by the equivalent specification
$z'_g = x_g - z_g$
and the maximum is
$y'_g = x_g + \rho y_g$,
where ρ is a positive tolerance that can be varied by the user. For example, if
ρ = 2 the maximum type-g space is $x_g + 2y_g$, the normal amount plus twice the
stretchability.
d) Hyphenation is performed only at the point where feasible line breaking becomes
impossible, even though it may be better to hyphenate an earlier word. Thus,
the general optimum-fit algorithm of the text will give substantially better results
when high-quality output is desired and hyphenation is frequently necessary.
e) No penalty is assessed for a tight line next to a loose line, or for consecutive
hyphenated lines, and the algorithm does not produce paragraphs that are longer
or shorter than the optimum length. (In other words, α = γ = q = 0 in the
general algorithm.)
Under these restrictions, optimum breakpoints can be found with extra efficiency.
where $s_k$ denotes the minimum sum of demerits leading to a break after box k, or
$s_k = \infty$ if there is no feasible way to break there; and
where $p_k$ is meaningful only if $s_k < \infty$, in which case the best way to end a line at
box k is to begin it with box $p_k + 1$. We also assume that
this represents an invisible box at the very end of the final line of the paragraph.
Besides the 4n + 4 storage locations for $w_1 \ldots w_{n+1}$, $g_1 \ldots g_n$, $s_0 \ldots s_{n+1}$, and
$p_1 \ldots p_{n+1}$, and the memory required to hold the parameters l, ρ, and $(x_g, y'_g, z'_g)$ for
each type g, the stripped-down algorithm needs only a few miscellaneous variables:
end.
Again we can conclude that the while loop must terminate, since it will not be executed
when k = j + 1. The innermost code is easily fleshed out:
⟨consider breaking from j to k⟩ =
begin if Σ < l then r := ρ(l - Σ)/(Σmax - Σ)
  else if Σ > l then r := (l - Σ)/(Σ - Σmin)
  else r := 0;
  d := s[j] + (1 + 100|r|^3)^2;
  if d < d′ then
    begin d′ := d; j′ := j;
    end;
end.
When hyphenation is necessary, the algorithm goes into panic mode, first searching
for the last value of i that was feasible, then attempting to split word k. At this point
the line from i to k - 1 is too short, and from i to k it is too long, so there is hope
that hyphenation will succeed.
while q ≠ i do
  begin ⟨output the line from box q + 1 to box s, inclusive⟩;
  q := s; s := p[q];
  end;
end.
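The pieces of the stripped-down algorithm can be assembled into a small runnable sketch. It keeps one value s[k] and one pointer p[k] per breakpoint, scores a line by (1 + 100|r|^3)^2 as in the innermost code above, and follows the pointers backwards to recover the lines. The following are assumptions of the sketch rather than details of the appendix: it uses zero-based indices, stores (x, y, z) per glue type instead of the precomputed minimum and maximum, lets the final line stay at natural width when it is short (standing in for the invisible final box), and returns None instead of entering panic mode for hyphenation. The names optimum_fit and types are hypothetical.

```python
import math


def optimum_fit(w, g, types, l, rho):
    """Optimum-fit line breaking without hyphenation (a sketch).

    w     : box widths w[0..n-1]
    g     : g[m] is the glue type of the space after box m (m = 0..n-2)
    types : glue type -> (x, y, z) = (normal width, stretch, shrink)
    l     : line width;  rho : tolerance, so a space may grow to x + rho*y
    Returns a list of (first_box, last_box) pairs, or None if no feasible
    setting exists (the point at which hyphenation would be tried).
    """
    n = len(w)
    s = [math.inf] * (n + 1)   # s[k]: least demerits for the first k boxes, break after box k-1
    p = [0] * (n + 1)          # p[k]: the best such last line starts at box p[k]
    s[0] = 0.0
    for k in range(1, n + 1):
        nat = mx = mn = 0.0    # the line's natural, maximum and minimum widths
        for j in range(k - 1, -1, -1):          # candidate line: boxes j..k-1
            nat += w[j]
            mx += w[j]
            mn += w[j]
            if j < k - 1:                       # include the space after box j
                x, y, z = types[g[j]]
                nat += x
                mx += x + rho * y
                mn += x - z
            if mn > l:
                break                           # adding boxes only makes it longer
            if s[j] == math.inf:
                continue
            if nat > l:
                r = (l - nat) / (nat - mn)      # shrink: -1 <= r < 0
            elif k == n or nat == l:
                r = 0.0                         # exact fit, or final line (fill absorbs slack)
            elif mx < l:
                continue                        # cannot stretch far enough
            else:
                r = rho * (l - nat) / (mx - nat)  # stretch: 0 < r <= rho
            d = s[j] + (1 + 100 * abs(r) ** 3) ** 2
            if d < s[k]:
                s[k], p[k] = d, j
    if s[n] == math.inf:
        return None
    lines, k = [], n
    while k > 0:                                # follow the p pointers backwards
        lines.append((p[k], k - 1))
        k = p[k]
    return lines[::-1]


print(optimum_fit([10] * 6, [1] * 5, {1: (6, 3, 2)}, 42, 1))
```

Like the appendix method (and, the text suggests, Cooper's program), this retains only one optimum sum per breakpoint; each inner loop stops as soon as the minimum width exceeds l, so the work per breakpoint is bounded by the number of words that can fit on a line.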
In practice there is only a bounded amount of memory available for implementing this
algorithm, but arbitrarily long paragraphs can be handled if we make a minor change
suggested by Cooper33: When the number of words in a given paragraph exceeds some
maximum number $n_{\max}$, apply the method to the first $n_{\max}$ words; then output all
but the final line and resume the method again, beginning with the copy carried over
from the line that was not output.
ACKNOWLEDGEMENTS
We wish to thank Barbara Beeton of the American Mathematical Society for numerous
discussions about ‘real world’ applications; we also are grateful to James Eve of the
University of Newcastle-Upon-Tyne and Neil Wiseman of Cambridge University for
helping us obtain literature that was not readily available in California; and we thank
the librarians of the rare book rooms at Columbia University and Stanford University
for letting us study and photograph excerpts from polyglot Bibles. John Wiley & Sons
Limited have taken unusual care in typesetting this paper in exact accordance with
the line breaks and page breaks found by TeX.
REFERENCES
1. Michael P. Barnett, Computer Typesetting: Experiments and Prospects, M.I.T. Press, Cambridge,
Mass., 1965.
2. C. J. Duncan, ‘Look! No hands!’, The Penrose Annual 57, 121-168 (1964).
3. Michael R. Garey and David S. Johnson, Computers and Intractability, W. H. Freeman, San
Francisco, 1979.
4. Richard Bellman, Dynamic Programming, Princeton Univ. Press, Princeton, N.J., 1957.
5. M. Held and R. M. Karp, ‘The construction of discrete dynamic programming algorithms’, IBM
Systems J. 4, 136-147 (1965).
6. Donald E. Knuth, TeX and METAFONT: New Directions in Typesetting, American Mathematical
Society and Digital Press, Bedford, Massachusetts, 1979.
7. Jakob Ludwig Karl Grimm and Wilhelm Karl Grimm, ‘Der Froschkonig (The Frog King)’, in
Kinder- und Hausmarchen, first published in Berlin, 1812. For the history of this story see Heinz
Rolleke, Die Altese Marchensammlung der Bruder Grimm, Fondation Martin Bodmer, Cologny-
G e n b e , 1979, pp. 144-153.
8. C . J. Duncan, J . Eve, L. Molyneux, E. S. Page, and Margaret G. Robson, ‘Computer typesetting:
an evaluation of the problems,’ Printing Technology 7 , 133-151 (1963).
9. Donald E. Knuth, Seminumerical Algorithms, Vol. 2 of The A r t of Computer Programming, second
edition, Addison-Wesley, Reading, Massachusetts, 1981.
10. A. Frey, Manuel Nouveau de Typographie, Paris (1835), 2 vols.
11. Kathleen Jensen and Niklaus Wirth, P A S C A L User Manual and Report, Heidelberg, Springer-
Verlag, 1975.
12. Donald E. Knuth, ‘BLAISE, a preprocessor for PASCAL,’ file BLAISE.DEK[up,doc] at SU-AI on
the ARPA network (March 1979). The program itself is on file BLAISE.SAI[tex,dek].
13. Donald E. Knuth, Tau Epsilon Chi: A System for Technical Text, book in preparation.
14. Theodore Low De Vinne, Correct Composition, Vol. 2 of The Practice of Typography, Century, New
York, 1901. The cited material appears on pages 138 and 206.
15. George Bernard Shaw, ‘On Modern Typography’, The Dolphin 4, 80-81 (1940).
16. T. H. Darlow and H. F. Moule, Historical Catalogue of the Printed Editions of Holy Scripture in the library of The British and Foreign Bible Society, The Bible House, London, 1911.
17. Basil Hall, The Greatest Polyglot Bibles, The Book Club of California, San Francisco, 1966.
18. Jiménez de Cisneros, sponsor, Vetus testamentum multiplici lingua nunc primo impressum, Industria Arnaldi Guillelmi de Brocario in Academia Complutensi, 1522. [The printing was completed in 1517, but papal permission to publish this book was delayed for several years.]
19. Aug. Giustiniani, Psalterium, Genoa, 1516.
20. Benedictus Arias Montanus, editor, Biblia Sacra Hebraice, Chaldaice, Græce, & Latine, Christoph. Plantinus, Antwerp, 1569-1573.
21. Brianus Waltonus, editor, Biblia Sacra Polyglotta, Thomas Roycroft, London, 1657.
22. David Wolder, Biblia Sacra Græce, Latine & Germanice, Jacobus Lucius Juni., Hamburg, 1596.
23. Walter E. Houghton, Jr., ‘The History of Trades: its relation to seventeenth century thought,’ in Philip P. Wiener and Aaron Noland, eds., Roots of Scientific Thought, Basic Books, New York, 1957, pp. 354-381.
24. Joseph Moxon, Mechanick Exercises, J. Moxon, London, 1683. Reprinted by the Typothetae of New York, 1896, with preface and notes by T. L. De Vinne; also reprinted by Oxford University Press, London, 1958; but these reprints do not capture the full feeling of the original, with its less sumptuous seventeenth-century workmanship. Quoted passages are from vol. 2, pp. 214-215, 226, 245, 248.
25. D. G. Berri, The Art of Printing, London, 1864.
26. Samuel A. Bartels, The Art of Spacing, The Inland Printer, Chicago, 1926.
27. G. P. Bafour, A. R. Blanchard, and F. H. Raymond, ‘Automatic Composing Machine,’ U.S. Patent 2762485, September 11, 1956. (See also British patent 771551 and French patent 1103000.)
28. G. Bafour, ‘A new method for text composition-The BBR System,’ Printing Technology 5, no. 2, 1961, 65-75.
29. Joseph F. Ossanna, ‘NROFF/TROFF User’s Manual,’ Bell Telephone Laboratories Internal memorandum, Murray Hill, New Jersey, 1975.
30. Paul E. Justus, ‘There is more to typesetting than setting type’, IEEE Trans. on Prof. Commun. PC-15, 13-16, 18 (1972).
31. John Pierson, Computer Composition using PAGE-1, Wiley-Interscience, New York, 1972.
32. Information International, Inc., ‘PAGE-3 Composition Language,’ privately distributed. First edition, October 31, 1975; second edition, October 20, 1976. The language is sometimes called ‘PAGE-III’ because of the company that created it.
33. P. I. Cooper, ‘The influence of program parameters on hyphenation frequency in a sophisticated justification program,’ Advances in Computer Typesetting [Proceedings of the 1966 International Computer Typesetting Conference], The Institute of Printing, London, 1967, 176-178, 211-212.
34. [Untitled] Moderators’ summaries of the papers presented at the International Computer Typesetting Conference at the University of Sussex, The Institute of Printing, London, 1966.
35. Alison M. Pringle, ‘Justification with fewer hyphens,’ Rainbow Memo 170, University of Cambridge Computer Laboratory, March 1980.
36. Hanan Samet, ‘Heuristics for the line division problem in computer justified text,’ preprint, University of Maryland, 1980.
37. H. D. Parks, ‘Computerized processing of editorial copy’, Advances in Computer Typesetting [Proceedings of the 1966 International Computer Typesetting Conference], The Institute of Printing, London, 1967, 176-178, 211-212.
38. Herman Parks, contributions to the discussions, Proc. ASIS Workshop on Computer Composition, American Society for Information Science, 1971, pp. 143-145, 151, 180-182.
39. Bruce Rogers, Paragraphs on Printing, William E. Rudge’s Sons, New York, 1943, p. 88.
40. U.S. Government Printing Office, Style Manual, Washington, D.C., 1973. The quote is from rule 22 (catch?).
41. Michael J. Clancy and Donald E. Knuth, ‘A programming and problem-solving seminar,’ report STAN-CS-77-606, Computer Science Department, Stanford University, April 1977, 85-88.
42. Michael F. Plass, ‘Optimal pagination techniques for automatic typesetting systems,’ Ph.D. thesis, Stanford University, June 1981.
Structured Programming with go to Statements
DONALD E. KNUTH
Stanford University, Stanford, California 94305
A consideration of several different examples sheds new light on the problem of creating reliable, well-structured programs that behave efficiently. This study focuses largely on two issues: (a) improved syntax for iterations and error exits, making it possible to write a larger class of programs clearly and efficiently without go to statements; (b) a methodology of program design, beginning with readable and correct, but possibly inefficient programs that are systematically transformed if necessary into efficient and correct, but possibly less readable code. The discussion brings out opposing points of view about whether or not go to statements should be abolished; some merit is found on both sides of this question. Finally, an attempt is made to define the true nature of structured programming, and to recommend fruitful directions for further study.
Keywords and phrases: structured programming, go to statements, language design, event indicators, recursion, Boolean variables, iteration, optimization of programs, program transformations, program manipulation systems, searching, Quicksort, efficiency
CR categories: 4.0, 4.10, 4.20, 5.20, 5.5, 6.1 (5.23, 5.24, 5.25, 5.27)
Copyright © 1974, Association for Computing Machinery, Inc. General permission to republish,
but not for profit, all or part of this material is granted, provided that ACM's copyright notice is
given and that reference is made to this publication, to its date of issue, and to the fact that reprint-
ing privileges were granted by permission of the Association for Computing Machinery.
Will UTOPIA 84, or perhaps we should call it NEWSPEAK, contain go to statements? At the moment, unfortunately, there isn't even a consensus about this apparently trivial issue, and we had better not be hung up on the question too much longer since there are only ten years left.
I will try in what follows to give a reasonably comprehensive survey of the go to controversy, arguing both pro and con, without taking a strong stand one way or the other until the discussion is nearly complete. In order to illustrate different uses of go to statements, I will discuss many example programs, some of which tend to negate the conclusions we might draw from the others. There are two reasons why I have chosen to present the material in this apparently vacillating manner. First, since I have the opportunity to choose all the examples, I don't think it's fair to load the dice by selecting only program fragments which favor one side of the argument. Second, and perhaps most important, I tried this approach when I lectured on the subject at UCLA in February, 1974, and it worked beautifully: nearly everybody in the audience had the illusion that I was largely supporting his or her views, regardless of what those views were!
1. ELIMINATION OF go to STATEMENTS
Historical Background
At the IFIP Congress in 1971 I had the pleasure of meeting Dr. Eiichi Goto of Japan, who cheerfully complained that he was always being eliminated. Here is the history of the subject, as far as I have been able to trace it.
The first programmer who systematically began to avoid all labels and go to statements was perhaps D. V. Schorre, then of UCLA. He has written the following account of his early experiences [85]:
    Since the summer of 1960, I have been writing programs in outline form, using conventions of indentation to indicate the flow of control. I have never found it necessary to take exception to these conventions by using go statements. I used to keep these outlines as original documentation of a program, instead of using flow charts... Then I would code the program in assembly language from the outline. Everyone liked these outlines better than the flow charts I had drawn before, which were not very neat--my flow charts had been nick-named "balloon-o-grams".
He reported that this method made programs easier to plan, to modify and to check out.
When I met Schorre in 1963, he told me of his radical ideas, and I didn't believe they would work. In fact, I suspected that it was really his rationalization for not finding an easy way to put labels and go to statements into his META-II subset of ALGOL [84], a language which I liked very much except for this omission. In 1964 I challenged him to write a program for the eight-queens problem without using go to statements, and he responded with a program using recursive procedures and Boolean variables, very much like the program later published independently by Wirth [96].
I was still not convinced that all go to statements could or should be done away with, although I fully subscribed to Peter Naur's observations which had appeared about the same time [73]. Since Naur's comments were the first published remarks about harmful go to's, it is instructive to quote some of them here:
    If you look carefully you will find that surprisingly often a go to statement which looks back really is a concealed for statement. And you will be pleased to find how the clarity of the algorithm improves when you insert the for clause where it belongs. . . . If the purpose [of a programming course] is to teach ALGOL programming, the use of flow diagrams will do more harm than good, in my opinion.
The next year we find George Forsythe also purging go to statements from algorithms submitted to Communications of the ACM (cf. [53]). Incidentally, the second example program at the end of the original ALGOL 60 report [72] contains four go to statements, to labels named AA, BB, CC, and DD, so it is clear that the advantages of ALGOL's control structures weren't fully perceived in 1960.
In 1965, Edsger Dijkstra published the following instructive remarks [21]:
    Two programming department managers from
similarly introduced other statements which provide "equally powerful" alternative ways to jump.
In other words, it seems that there is widespread agreement that go to statements are harmful, yet programmers and language designers still feel the need for some euphemism that "goes to" without saying go to.
A Searching Example
What are the reasons for this? In [52], Floyd and I gave the following example of a typical program for which the ordinary capabilities of while and if statements are inadequate. Let's suppose that we want to search a table A[1] . . . A[m] of distinct values, in order to find where a given value x appears; if x is not present in the table, we want to insert it as an additional entry. Let's suppose further that there is another array B, where B[i] equals the number of times we have searched for the value A[i]. We might solve such a problem as follows:
Example 1:
    for i := 1 step 1 until m do
        if A[i] = x then go to found fi;
    not found: i := m+1; m := i;
        A[i] := x; B[i] := 0;
    found: B[i] := B[i]+1;
(In the present article I shall use an ad hoc programming language that is very similar to ALGOL 60, with one exception: the symbol fi is required as a closing bracket for all if statements, so that begin and end aren't needed between then and else. I don't really like the looks of fi at the moment; but it is short, performs a useful function, and connotes finality, so I'm confidently hoping that I'll get used to it. Alan Perlis has remarked that fi is a perfect example of a cryptic notation that can make programming unnecessarily complicated for beginners; yet I'm more comfortable with fi every time I write it. I still balk at spelling other basic symbols backwards, and so do most of the people I know; a student's paper containing the code fragment "esac; comment if bletch tnemmoc;" is a typical reaction to this trend!)
There are ways to express Example 1 without go to statements, but they require more computation and aren't really more perspicuous. Therefore, this example has been widely quoted in defense of the go to statement, and it is appropriate to scrutinize the problem carefully.
Let's suppose that we've been forbidden to use go to statements, and that we want to do precisely the computation specified in Example 1 (using the obvious expansion of such a for statement into assignments and a while iteration). If this means not only that we want the same results, but also that we want to do the same operations in the same order, the mission is impossible. But if we are allowed to weaken the conditions just slightly, so that a relation can be tested twice in succession (assuming that it will yield the same result each time, i.e., that it has no side-effects), we can solve the problem as follows:
Example 1a:
    i := 1;
    while i ≤ m and A[i] ≠ x do i := i+1;
    if i > m then m := i; A[i] := x; B[i] := 0 fi;
    B[i] := B[i]+1;
The and operation used here stands for McCarthy's sequential conjunction operator [62, p. 185]; i.e., "p and q" means "if p then q else false fi", so that q is not evaluated when p is false. Example 1a will do exactly the same sequence of computations as Example 1, except for one extra comparison of i with m (and occasionally one less computation of m + 1). If the iteration in this while loop is performed a large number of times, the extra comparison has a negligible effect on the running time.
Thus, we can live without the go to in Example 1. But Example 1a is slightly less readable, in my opinion, as well as slightly slower; so it isn't clear what we have gained. Furthermore, if we had made Example 1 more complicated, the trick of going to Example 1a would no longer work. For example, suppose we had inserted another statement into the for loop, just before the if clause; then the relations i ≤ m and A[i] = x wouldn't have been tested consecutively, and we couldn't in general have combined them with and.
John Cocke told me an instructive story
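Examples 1 and 1a can be sketched in Python as follows. The sketch assumes:

- A and B are ordinary 0-based lists; the text indexes from 1.
- Appending to the lists stands in for "m := i; A[i] := x; B[i] := 0".
- The function names are hypothetical.

Python has no go to. `search_insert_1` therefore uses `break` with a `for ... else` clause, itself one of the euphemisms for a jump. `search_insert_1a` relies on Python's short-circuit `and`, which behaves like the sequential conjunction described above.

```python
from typing import List


def search_insert_1(A: List[int], B: List[int], x: int) -> int:
    """Example 1 in Python: `break` stands in for "go to found"."""
    for i, a in enumerate(A):
        if a == x:
            break              # found
    else:                      # loop ran to completion: "not found"
        i = len(A)
        A.append(x)
        B.append(0)
    B[i] += 1                  # found: B[i] := B[i] + 1
    return i


def search_insert_1a(A: List[int], B: List[int], x: int) -> int:
    """Example 1a in Python: a while loop guarded by a short-circuit `and`."""
    m = len(A)
    i = 0
    # A[i] is evaluated only when i < m, like the sequential conjunction.
    while i < m and A[i] != x:
        i += 1
    if i == m:                 # corresponds to "if i > m" with 1-based indexing
        A.append(x)
        B.append(0)
    B[i] += 1
    return i
```

Both functions return the index of x and leave A and B in the same state. They differ only in how the search loop is exited.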
    DO; M = I; A(I) = X; B(I) = 0; END;
Solution (a) is best, but since it involves a null iteration (with no explicit statements being iterated) most people came up with Solution (b). The instructive point is that Solution (b) doesn't work; there is a serious bug which caused great puzzlement before the reason was found. Can the reader spot the difficulty? (The answer appears on page 298.)
As I've said, Example 1 has often been used to defend the go to statement. Unfortunately, however, the example is totally unconvincing in spite of the arguments I've stated so far, because the method in Example 1 is almost never a good way to search an array for x! The following modification to the data structure makes the algorithm much better:
Example 2:
    A[m+1] := x; i := 1;
    while A[i] ≠ x do i := i+1;
    if i > m then m := i; B[i] := 1;
    else B[i] := B[i]+1 fi;
Example 2 beats Example 1 because it makes the inner loop considerably faster. If we assume that the programs have been handcoded in assembly language, so that the values of i, m, and x are kept in registers, and if we let n be the final value of i at the end of the program, Example 1 will make 6n + 10 (+3 if not found) references to memory for data and instructions on a typical computer, while the second program will make only 4n + 14 (+6 if not found).
If, on the other hand, we assume that these
Efficiency
The ratio of running times (about 6 to 4 in the first case when n is large) is rather surprising to people who haven't studied program behavior carefully. Example 2 doesn't look that much more efficient, but it is. Experience has shown (see [46], [51]) that most of the running time in non-IO-bound programs is concentrated in about 3% of the source text. We often see a short inner loop whose speed governs the overall program speed to a remarkable degree; speeding up the inner loop by 10% speeds up everything by almost 10%. And if the inner loop has 10 instructions, a moment's thought will usually cut it to 9 or fewer.
My own programming style has of course changed during the last decade, according to the trends of the times (e.g., I'm not quite so tricky anymore, and I use fewer go to's), but the major change in my style has been due to this inner loop phenomenon. I now look with an extremely jaundiced eye at every operation in a critical inner loop, seeking to modify my program and data structure (as in the change from Example 1 to Example 2) so that some of the operations can be eliminated. The reasons for this approach are that: a) it doesn't take long, since the inner loop is short; b) the payoff is real; and c) I can then afford to be less efficient in the other parts of my programs, which therefore are more readable and more easily written and debugged. Tools are being developed to make this critical-loop identification job easy (see for example [46] and [82]).
Thus, if I hadn't seen how to remove one of the operations from the loop in Example 1
by changing to Example 2, I would probably (at least) have made the for loop run from m to 1 instead of from 1 to m, since it's usually easier to test for zero than to compare with m. And if Example 2 were really critical, I would improve on it still more by "doubling it up" so that the machine code would be essentially as follows.
Example 2a:
    A[m+1] := x; i := 1; go to test;
    loop: i := i+2;
    test: if A[i] = x then go to found fi;
        if A[i+1] ≠ x then go to loop fi;
        i := i+1;
    found: if i > m then m := i; B[i] := 1;
        else B[i] := B[i]+1 fi;
Here the loop variable i increases by 2 on each iteration, so we need to do that operation only half as often as before; the rest of the code in the loop has essentially been duplicated to make this work. The running time has now been reduced to about 3.5n + 14.5 or 8.5n + 23.5 under our respective assumptions--again this is a noticeable saving in the overall running speed, if, say, the average value of n is about 20, and if this search routine is performed a million or so times in the overall program. Such loop-optimizations are not difficult to learn and, as I have said, they are appropriate in just a small part of a program, yet they very often yield substantial savings. (Of course if we want to improve on Example 2a still more, especially for large m, we'll use a more sophisticated search technique; but let's ignore that issue, at the moment, since I want to illustrate loop optimization in general, not searching in particular.)
The improvement in speed from Example 2 to Example 2a is only about 12%, and many people would pronounce that insignificant. The conventional wisdom shared by many of today's software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by penny-wise-and-pound-foolish programmers, who can't debug or maintain their "optimized" programs. In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering. Of course I wouldn't bother making such optimizations on a one-shot job, but when it's a question of preparing quality programs, I don't want to restrict myself to tools that deny me such efficiencies.
There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified. It is often a mistake to make a priori judgments about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail. After working with such tools for seven years, I've become convinced that all compilers written from now on should be designed to provide all programmers with feedback indicating what parts of their programs are costing the most; indeed, this feedback should be supplied automatically unless it has been specifically turned off.
After a programmer knows which parts of his routines are really important, a transformation like doubling up of loops will be worthwhile. Note that this transformation introduces go to statements--and so do several other loop optimizations; I will return to this point later. Meanwhile I have to admit that the presence of go to statements in Example 2a has a negative as well as a positive effect on efficiency; a non-optimizing compiler will tend to produce awkward code, since the contents of registers can't be assumed known when a label is passed. When I computed the running times cited above by looking at a typical compiler's
output for this example, I found that the improvement in performance was not quite as much as I had expected.
Error Exits
For simplicity I have avoided a very important issue in the previous examples, but it must now be faced. All of the programs we have considered exhibit bad programming practice, since they fail to make the necessary check that m has not gone out of range. In each case before we perform "m := i" we should precede that operation by a test such as
    if m = max then go to memory overflow;
where max is an appropriate threshold value. I left this statement out of the examples since it would have been distracting, but we need to look at it now since it is another important class of go to statements: an error exit. Such checks on the validity of data are very important, especially in software, and it seems to be the one class of go to's that still is considered ugly but necessary by today's leading reformers. (I wonder how Val Schorre has managed to avoid such go to's during all these years.)
Sometimes it is necessary to exit from several levels of control, cutting across code that may even have been written by other programmers; and the most graceful way to do this is a direct approach with a go to or its equivalent. Then the intermediate levels of the program can be written under the assumption that nothing will go wrong.
I will return to the subject of error exits later.
Subscript Checking
In the particular examples given above we can, of course, avoid testing m vs. max if we have dynamic range-checking on all subscripts of A. But this usually aborts the program, giving us little or no control over the error recovery; so we probably want to test m anyway. And ouch, what subscript checking does to the inner loop execution times! In Example 2, I will certainly want to suppress range-checking in the while clause since its subscript can't be out of range unless A[m+1] was already invalid in the previous line. Similarly, in Example 1 there can be no range error in the for loop unless a range error occurred earlier. It seems senseless to have expensive range checks in those parts of my programs that I know are clean.
In this respect I should mention Hoare's almost persuasive arguments to the contrary [40, p. 18]. He points out quite correctly that the current practice of compiling subscript range checks into the machine code while a program is being tested, then suppressing the checks during production runs, is like a sailor who wears his life preserver while training on land but leaves it behind when he sails! On the other hand, that sailor isn't so foolish if life vests are extremely expensive, and if he is such an excellent swimmer that the chance of needing one is quite small compared with the other risks he is taking. In the foregoing examples we typically are much more certain that the subscripts will be in range than that other aspects of our overall program will work correctly. John Cocke observes that time-consuming range checks can be avoided by a smart compiler which first compiles the checks into the program then moves them out of the loop. Wirth [94] and Hoare [39] have pointed out that a well-designed for statement can permit even a rather simple-minded compiler to avoid most range checks within loops.
I believe that range checking should be used far more often than it currently is, but not everywhere. On the other hand I am really assuming infallible hardware when I say this; surely I wouldn't want to remove the parity check mechanism from the hardware, even under a hypothetical assumption that it was slowing down the computation. Additional memory protection is necessary to prevent my program from harming someone else's, and theirs from clobbering mine. My arguments are directed towards compiled-in tests, not towards the hardware mechanisms which are really needed to ensure reliability.
Hash Coding
Now let's move on to another example, based on a standard hashing technique but other-
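Examples 2 and 2a, together with the overflow test discussed under Error Exits, can be sketched in Python as follows. The sketch makes these assumptions:

- A and B are 1-based lists of length max_m + 2. Slot 0 is unused and slot m+1 is kept free for the sentinel x.
- Each function returns the new value of m, because Python integers cannot be updated in place.
- `MemoryOverflow` is a hypothetical exception standing in for "go to memory overflow".

The sketch shows only the control structure. Interpreted Python will not reproduce the memory-reference counts quoted in the text.

```python
from typing import List, Tuple


class MemoryOverflow(Exception):
    """Hypothetical stand-in for the label "memory overflow"."""


def _record(B: List[int], i: int, m: int, max_m: int) -> int:
    """Common tail of Examples 2 and 2a; returns the new value of m."""
    if i > m:                  # we stopped at the sentinel: x is not in A[1..m]
        if m == max_m:         # error exit, tested before "m := i"
            raise MemoryOverflow(f"table already holds {max_m} entries")
        B[i] = 1
        return i               # m := i
    B[i] += 1
    return m


def search_2(A: List[int], B: List[int], m: int, x: int,
             max_m: int) -> Tuple[int, int]:
    """Example 2: the sentinel A[m+1] = x lets the loop make a single test."""
    A[m + 1] = x
    i = 1
    while A[i] != x:
        i += 1
    return i, _record(B, i, m, max_m)


def search_2a(A: List[int], B: List[int], m: int, x: int,
              max_m: int) -> Tuple[int, int]:
    """Example 2a: the loop body is doubled, so i advances by 2 per pass."""
    A[m + 1] = x
    i = 1
    while True:
        if A[i] == x:          # test: if A[i] = x then go to found
            break
        if A[i + 1] == x:      # the duplicated test on A[i+1]
            i += 1
            break
        i += 2                 # loop: i := i + 2
    return i, _record(B, i, m, max_m)
```

In the doubled loop, `while True` with `break` takes the place of the labels loop, test and found. The sentinel guarantees that neither A[i] nor A[i+1] is read beyond index m+1.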
272 Donald E. Knuth
for each machine in place of the old-fashioned structureless assemblers that still proliferate.
On the other hand I'm not really unhappy that MIXAL programs appear in my books, because I believe that MIXAL is a good example of a "quick and dirty assembler", a genre of software which will always be useful in its proper role. Such an assembler is characterized by language restrictions that make simple one-pass assembly possible, and it has several noteworthy advantages when we are first preparing programs for a new machine: a) it is a great improvement over numeric machine code; b) its rules are easy to state; and c) it can be implemented in an afternoon or so, thus getting an efficient assembler working quickly on what may be very primitive equipment. So far I have implemented six such assemblers, at different times in my life, for machines or interpretive systems or microprocessors that had no existing software of comparable utility; and in each case other constraints made it impractical for me to take the extra time necessary to develop a good, structured assembler. Thus I am sure that the concept of quick-and-dirty-assembler is useful, and I'm glad to let MIXAL illustrate what one is like. However, I also believe strongly that such languages should never be improved to the point where they are too easy or too pleasant to use; one must restrict their use to primitive facilities that are easy to implement efficiently. I would never switch to a two-pass process, or add complex pseudo-operations, macro-facilities, or even fancy error diagnostics to such a language, nor would I maintain or distribute such a language as a standard programming tool for a real machine. All such ameliorations and refinements should appear in a structured assembler. Now that the technology is available, we can condone unstructured languages only as a bootstrap-like means to a limited end, when there are strong economic reasons for not implementing a better system.
Tree Searching
But, I'm digressing from my subject of go to elimination in higher level languages. A few weeks ago I decided to choose an algorithm at random from my books, to study its use of go to statements. The very first example I encountered [54, Algorithm 6.2.3C] turned out to be another case where existing programming languages have no good substitute for go to's. In simplified form, the loop where the trouble arises can be written as follows.
Example 5:
    compare:
        if A[i] < x
        then if L[i] ≠ 0
            then i := L[i]; go to compare;
            else L[i] := j; go to insert fi;
        else if R[i] ≠ 0
            then i := R[i]; go to compare;
            else R[i] := j; go to insert fi;
        fi;
    insert: A[j] := x;
        L[j] := 0; R[j] := 0; j := j+1;
This is part of the well-known "tree search and insertion" scheme, where a binary search tree is being represented by three arrays: A[i] denotes the information stored at node number i, and L[i], R[i] are the respective node numbers for the roots of that node's left and right subtrees; empty subtrees are represented by zero. The program searches down the tree until finding an empty subtree where x can be inserted; and variable j points to an appropriate place to do the insertion. For convenience, I have assumed in this example that x is not already present in the search tree.
Example 5 has four go to statements, but the control structure is saved from obscurity because the program is so beautifully symmetric between L and R. I know that these go to statements can be eliminated by introducing a Boolean variable which becomes true when L[i] or R[i] is found to be zero. But I don't want to test this variable in the inner loop of my program.
Systematic Elimination
A good deal of theoretical work has been addressed to the question of go to elimination, and I shall now try to summarize the findings and to discuss their relevance.
S. C. Kleene proved a famous theorem in 1956 [48] which says, in essence, that the set
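The tree search and insertion of Example 5 can be sketched in Python as follows. The sketch makes these assumptions:

- A, L and R are 1-based lists, with 0 meaning "empty subtree".
- The tree is nonempty and rooted at node 1, and the lists have room for index j.
- `tree_insert` is a hypothetical name.

Following the test as written in Example 5, the search moves to L[i] when A[i] < x. The four go to's become `continue` and `break`. This avoids the Boolean flag the text mentions, but it still uses a structured form of jump.

```python
from typing import List


def tree_insert(A: List[int], L: List[int], R: List[int], j: int, x: int) -> int:
    """Insert x (assumed absent) below root node 1; return the next free index."""
    i = 1                      # start at the root
    while True:                # plays the role of the label "compare:"
        if A[i] < x:           # the text's test: continue in the L subtree
            if L[i] != 0:
                i = L[i]
                continue       # go to compare
            L[i] = j
            break              # go to insert
        else:
            if R[i] != 0:
                i = R[i]
                continue       # go to compare
            R[i] = j
            break              # go to insert
    # "insert:" fill in the new node j
    A[j] = x
    L[j] = 0
    R[j] = 0
    return j + 1
```

Here is a usage example. Start with A[1] = 50 as the root and j = 2, then insert 30, 70 and 60 in turn. Node 3 (70) becomes L[1], node 2 (30) becomes R[1], and node 4 (60) becomes R[3].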
But, in general, their technique may cause a program to grow exponentially in size; and when error exits or other recalcitrant go to's are present, the resulting programs will indeed look rather like the flowchart emulator sketched above.

If such automatic go to elimination procedures are applied to badly structured programs, we can expect the resulting programs to be at least as badly structured. Dijkstra pointed this out already in [23], saying:

    The exercise to translate an arbitrary flow diagram more or less mechanically into a jumpless one, however, is not to be recommended. Then the resulting flow diagram cannot be expected to be more transparent than the original one.

In other words, we shouldn't merely remove go to statements because it's the fashionable thing to do; the presence or absence of go to statements is not really the issue. The underlying structure of the program is what counts, and we want only to avoid usages which somehow clutter up the program. Good structure can be expressed in FORTRAN or COBOL, or even in assembly language, although less clearly and with much more trouble. The real goal is to formulate our programs in such a way that they are easily understood.

Program structure refers to the way in which a complex algorithm is built up from successively simpler processes. In most situations this structure can be described very nicely in terms of sequential composition, conditionals, simple iterations, and case statements for multiway branches; undisciplined go to statements make program structure harder to perceive, and they are often symptoms of a poor conceptual formulation. But there has been far too much emphasis on go to elimination instead of on the really important issues; people have a natural tendency to set up an easily understood quantitative goal like the abolition of jumps, instead of working directly for a qualitative goal like good program structure. In a similar way, many people have set up "zero population growth" as a goal to be achieved, when they really desire living conditions that are much harder to quantify.

Probably the worst mistake anyone can make with respect to the subject of go to statements is to assume that "structured programming" is achieved by writing programs as we always have and then eliminating the go to's. Most go to's shouldn't be there in the first place! What we really want is to conceive of our program in such a way that we rarely even think about go to statements, because the real need for them hardly ever arises. The language in which we express our ideas has a strong influence on our thought processes. Therefore, Dijkstra [23] asks for more new language features (structures which encourage clear thinking) in order to avoid the go to's temptations toward complications.

Event Indicators

The best such language feature I know has recently been proposed by C. T. Zahn [102]. Since this is still in the experimental stage, I will take the liberty of modifying his "syntactic sugar" slightly, without changing his basic idea. The essential novelty in his approach is to introduce a new quantity into programming languages, called an event indicator (not to be confused with concepts from PL/I or SIMSCRIPT). My current preference is to write his event-driven construct in the following two general forms.

    A) loop until ⟨event⟩₁ or ... or ⟨event⟩ₙ:
           ⟨statement list⟩₀;
       repeat;
       then ⟨event⟩₁ => ⟨statement list⟩₁;
            ...
            ⟨event⟩ₙ => ⟨statement list⟩ₙ;
       fi;

    B) begin until ⟨event⟩₁ or ... or ⟨event⟩ₙ;
           ⟨statement list⟩₀;
       end;
       then ⟨event⟩₁ => ⟨statement list⟩₁;
            ...
            ⟨event⟩ₙ => ⟨statement list⟩ₙ;
       fi;

There is also a new statement, "⟨event⟩", which means that the designated event has occurred: such a statement is allowed only
within ⟨statement list⟩₀ of an until construct which declares that event.

In form (A), ⟨statement list⟩₀ is executed repeatedly until control leaves the construct entirely or until one of the named events occurs; in the latter case, the statement list corresponding to that event is executed. The behavior in form (B) is similar, except that no iteration is implied; one of the named events must have occurred before the end is reached. The then ... fi part may be omitted when there is only one event name.

The above rules should become clear after looking at what happens when Example 5 above is recoded in terms of this new feature:

Example 5b:

    loop until left leaf hit or
               right leaf hit:
        if A[i] < x
        then if L[i] ≠ 0 then i := L[i];
             else left leaf hit fi;
        else if R[i] ≠ 0 then i := R[i];
             else right leaf hit fi;
        fi;
    repeat;
    then left leaf hit => L[i] := j;
         right leaf hit => R[i] := j;
    fi;
    A[j] := x; L[j] := 0; R[j] := 0; j := j + 1;

Alternatively, using a single event,

Example 5c:

    loop until leaf replaced:
        if A[i] < x
        then if L[i] ≠ 0 then i := L[i]
             else L[i] := j; leaf replaced fi;
        else if R[i] ≠ 0 then i := R[i]
             else R[i] := j; leaf replaced fi;
        fi;
    repeat;
    A[j] := x; L[j] := 0; R[j] := 0; j := j + 1;

For reasons to be discussed later, Example 5b is preferable to 5c.

It is important to emphasize that the first line of the construct merely declares the event indicator names, and that event indicators are not conditions which are being tested continually; ⟨event⟩ statements are simply transfers of control which the compiler can treat very efficiently. Thus, in Example 5c the statement "leaf replaced" is essentially a go to which jumps out of the loop.

This use of events is, in fact, semantically equivalent to a restricted form of go to statement, which Peter Landin discussed in 1965 [58] before most of us were ready to listen. Landin's device has been reformulated by Clint and Hoare [14] in the following way: Labels are declared at the beginning of each block, just as procedures normally are, and each label also has a ⟨label body⟩ just as a procedure has a ⟨procedure body⟩. Within the block whose heading contains such a declaration of label L, the statement go to L according to this scheme means "execute the body of L, then leave the block". It is easy to see that this is exactly the form of control provided by Zahn's event mechanism, with the ⟨label body⟩s replaced by ⟨statement list⟩s in the then ... fi postlude and with ⟨event⟩ statements corresponding to Landin's go to. Thus, Clint and Hoare would have written Example 5b as follows.

    while true do
    begin label left leaf hit; L[i] := j;
          label right leaf hit; R[i] := j;
        if A[i] < x
        then if L[i] ≠ 0 then i := L[i];
             else go to left leaf hit fi;
        else if R[i] ≠ 0 then i := R[i];
             else go to right leaf hit fi;
    end;
    A[j] := x; L[j] := 0; R[j] := 0; j := j + 1;

I believe the program reads much better in Zahn's form, with the ⟨label body⟩s set in the code between that which logically precedes and follows.

Landin also allowed his "labels" to have parameters like any other procedures; this is a valuable extension to Zahn's proposal, so I shall use events with value parameters in several of the examples below.

As Zahn [102] has shown, event-driven statements blend well with the ideas of structured programming by stepwise refinement. Thus, Examples 1 to 3 can all be cast into the following more abstract form, using an event "found" with an integer parameter:

    begin until found:
        search table for x and
        insert it if not present;
    end;
    then found (integer j) => B[j] := B[j] + 1;
    fi;
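Python has no event indicators, but the control flow of Example 5b can be imitated with exceptions standing in for ⟨event⟩ statements and `except` clauses standing in for the then ... fi postlude. This is only an analogy: Python exceptions are much costlier than the plain jumps a compiler could generate for Zahn's construct. The parallel lists A, L, R with 0 meaning "no child" follow Example 5; the root index 1, the name `tree_insert`, and the two exception classes are assumptions made for this sketch. As in Example 5b as printed, a key x larger than A[i] is sent toward L[i].

```python
class LeftLeafHit(Exception):
    """Event: the search ran off the L side of node i (hypothetical name)."""


class RightLeafHit(Exception):
    """Event: the search ran off the R side of node i (hypothetical name)."""


def tree_insert(A, L, R, x, j):
    """Insert key x into free slot j, mirroring Example 5b; returns j + 1.

    Assumption: the root is node 1 and the tree is nonempty.
    """
    i = 1                                  # assumed root index
    try:
        while True:                        # loop until left leaf hit or right leaf hit:
            if A[i] < x:
                if L[i] != 0:
                    i = L[i]
                else:
                    raise LeftLeafHit      # event statement: leaves the loop at once
            else:
                if R[i] != 0:
                    i = R[i]
                else:
                    raise RightLeafHit
    except LeftLeafHit:                    # then left leaf hit => L[i] := j;
        L[i] = j
    except RightLeafHit:                   #      right leaf hit => R[i] := j;
        R[i] = j
    A[j], L[j], R[j] = x, 0, 0             # A[j] := x; L[j] := 0; R[j] := 0;
    return j + 1                           # j := j + 1;
```

Each handler runs with the knowledge that its event occurred, which is the property the text uses to prefer Example 5b over 5c.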
Thus, each event corresponds to a particular set of assertions about the state of the program, and the code which follows that event takes cognizance of these assertions, which are rather different from the assertions in the main body of the construct. (For this reason I prefer Example 5b to Example 5c.)

Language features allowing multiple exits have been proposed by G. V. Bochmann [7], and independently by Shigo et al. [86]. These are semantically equivalent to Zahn's proposals, with minor variations; but they express such semantics in terms of statements that say "exit to ⟨label⟩". I believe Zahn's idea of event indicators is an improvement on the previous schemes, because the specification of events instead of labels encourages a better conception of the program. The identifier given to a label is often an imperative verb like "insert" or "compare", saying what action is to be done next, while the appropriate identifier for an event is more likely to be an adjective like "found". The names of events are very much like the names of Boolean variables, and I believe this accounts for the popularity of Boolean variables as documentation aids, in spite of their inefficiency.

Putting this another way, it is much better from a psychological standpoint to write

    loop until found: ... ; found; ... repeat

than to write

    search: while true do
        begin ... ; leave search; ... end.

The leave or exit statement is operationally the same, but intuitively different, since it talks more about the program than about the problem.

The PL/I language allows programmer-defined ON-conditions, which are similar in spirit to event indicators. A programmer first executes a statement "ON CONDITION (identifier) block" which specifies a block of code that is to be executed when the identified event occurs, and an occurrence of that event is indicated by writing SIGNAL CONDITION (identifier). However, the analogy is not very close, since control returns to the statement following the SIGNAL statement after execution of the specified block of code, and the block may be dynamically respecified.

Some people have suggested to me that events should be called "conditions" instead, by analogy with Boolean expressions. However, that terminology would tend to imply a relation which is continually being monitored, instead of a happening. By writing "loop until yprime is near y: ..." we seem to be saying that the machine should keep track of whether or not y and yprime are nearly equal; a better choice of words would be an event name like "loop until convergence established: ..." so that we can write "if abs(yprime - y) < epsilon × y then convergence established". An event occurs when the program has discovered that the state of computation has changed.

Simple Iterations

So far I haven't mentioned what I believe is really the most common situation in which go to statements are needed by an ALGOL or PL/I programmer, namely a simple iterative loop with one entrance and one exit. The iteration statements most often proposed as alternatives to go to statements have been "while B do S" and "repeat S until B". However, in practice, the iterations I encounter very often have the form

    A: S;
       if B then go to Z fi;
       T; go to A;
    Z:

where S and T both represent reasonably long sequences of code. If S is empty, we have a while loop, and if T is empty we have a repeat loop, but in the general case it is a nuisance to avoid the go to statements.

A typical example of such an iteration occurs when S is the code to acquire or generate a new piece of data, B is the test for end of data, and T is the processing of that data. Another example is when the code preceding the loop sets initial conditions for some iterative process; then S is a computation of quantities involved in the test for convergence, B is the test for convergence, and T is the adjustment of variables for the next iteration.

Dijkstra [29] aptly named this a loop
which is performed "n and a half times". The usual practice for avoiding go to's in such loops is either to duplicate the code for S, writing

    S; while B̄ do begin T; S end;

where B̄ is the negation of relation B; or to figure out some sort of "inverse" for T so that "T⁻¹; T" is equivalent to a null statement, and writing

    T⁻¹; repeat T; S until B;

or to duplicate the code for B and to make a redundant test, writing

    repeat S; if B̄ then T fi; until B;

or its equivalent. The reader who studies go to-less programs as they appear in the literature will find that all three of these rather unsatisfactory constructions are used frequently.

I discussed this weakness of ALGOL in a letter to Niklaus Wirth in 1967, and he proposed two solutions to the problem, together with many other instructive ideas in an unpublished report on basic concepts of programming languages [94]. His first suggestion was to write

    repeat begin S; when B exit; T; end;

and readers who remember 1967 will also appreciate his second suggestion,

    turn on begin S; when B drop out; T; end.

Neither set of delimiters was felt to be quite right, but a modification of the first proposal (allowing one or more single-level exit statements within repeat begin ... end) was later incorporated into an experimental version of the ALGOL W language. Other languages such as BCPL and BLISS incorporated and extended the exit idea, as mentioned above. Zahn's construction now allows us to write, for example,

    loop until all data exhausted:
        S;
        if B then all data exhausted fi;
        T;
    repeat;

and this is a better syntax for the n + ½ problem than we have had previously.

On the other hand, it would be nicest if our language would provide a single feature which covered all simple iterations without going to a rather "big" construct like the event-driven scheme. When a programmer uses the simpler feature he is thereby making it clear that he has a simple iteration, with exactly one condition which is being tested exactly once each time around the loop. Furthermore, by providing special syntax for this common case we make it easier for a compiler to produce more efficient code, since the compiler can rearrange the machine instructions so that the test appears physically at the end of the loop. (Many hours of computer time are now wasted each day executing unconditional jumps to the beginning of loops.)

Ole-Johan Dahl has recently proposed a syntax which I think is the first real solution to the n + ½ problem. He suggests writing the general simple iteration defined above as

    loop; S; while B̄: T; repeat;

where, as before, S and T denote sequences of one or more statements separated by semicolons. Note that as in two of our original go to-free examples, the syntax refers to condition B̄ which represents staying in the iteration, instead of condition B which represents exiting; and this may be the secret of its success.

Dahl's syntax may not seem appropriate at first, but actually it reads well in every example I have tried, and I hope the reader will reserve judgment until seeing the examples in the rest of this paper. One of the nice properties of his syntax is that the word repeat occurs naturally at the end of a loop rather than at its beginning, since we read the actions of the program sequentially. As we reach the end, we are instructed to repeat the loop, instead of being informed that the text of the loop (not its execution) has ended. Furthermore, the above syntax avoids ALGOL's use of the word do (and also the more recent unnatural delimiter od); the word do as used in ALGOL has never sounded quite right to native speakers of English; it has always been rather quaint for us to say "do read (A[i])" or "do begin"! Another feature of Dahl's proposals is that it is easily axiomatized along the lines
[...]
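Languages without Dahl's loop ... while ...: ... repeat usually express the n + ½ loop as an unbounded loop with a single exit in the middle. The Python sketch below uses the convergence example described earlier: S computes a new estimate yprime, the exit test B is "convergence established" (yprime is within epsilon × y of y), and T adjusts y for the next round. Newton's method for square roots and the name `newton_sqrt` are our own choices for illustration, not taken from the paper.

```python
def newton_sqrt(a, eps=1e-12):
    """Square root of a > 0 by Newton's method, written as an n + 1/2 loop.

    Shape: loop; S; while not B: T; repeat;
      S  computes the new estimate yprime,
      B  is "convergence established",
      T  adjusts y for the next iteration.
    """
    if a <= 0:
        raise ValueError("a must be positive")
    y = max(a, 1.0)                    # initial conditions, set before the loop
    while True:                        # loop;
        yprime = (y + a / y) / 2       #   S: quantities needed by the test
        if abs(yprime - y) < eps * y:  #   leave as soon as B holds
            break
        y = yprime                     #   T: adjust y for the next iteration
    return yprime                      # repeat;
```

Neither S nor the test is duplicated, and B is evaluated exactly once per iteration, which is the property the text asks of a simple-iteration construct; the cost is that the exit is buried inside the loop body instead of being announced by the syntax.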
When procedure calls occur in an inner loop the overhead can slow a program down by a factor of two or more. But if we hand tailor our own implementation of recursion instead of relying on a general mechanism we can usually find worthwhile simplifications, and in the process we occasionally get a deeper insight into the original algorithm.

There has been a good deal published about recursion elimination (especially in the work of Barron [4], Cooper [15], Manna and Waldinger [61], McCarthy [62], and Strong [88; 91]); but I'm amazed that very little of this is about "down to earth" problems. I have always felt that the transformation from recursion to iteration is one of the most fundamental concepts of computer science, and that a student should learn it at about the time he is studying data structures. This topic is the subject of Chapter 8 in my multivolume work; but it's only by accident that recursion wasn't Chapter 3, since it conceptually belongs very early in the table of contents. The material just wouldn't fit comfortably into any of the earlier volumes; yet there are many algorithms in Chapters 1-7 that are recursions in disguise. Therefore it surprises me that the literature on recursion removal is primarily concerned with "baby" examples like computing factorials or reversing lists, instead of with a sturdy toddler like Example 6.

Now let's go to work on the above example. I assume, of course, that the reader knows the standard way of implementing recursion with a stack [20], but I want to make simplifications beyond this. Rule number one for simplifying procedure calls is:

    If the last action of procedure p before it returns is to call procedure q, simply go to the beginning of procedure q instead.

(We must forget for the time being that we don't like go to statements.) It is easy to confirm the validity of this rule, if, for simplicity, we assume parameterless procedures. For the operation of calling q is to put a return address on the stack, then to execute q, then to resume p at the return address specified, then to resume the caller of p. The above simplification makes q resume the caller of p. When q = p the argument is perhaps a bit subtle, but it's all right. (I'm not sure who originated this principle; I recall learning it from Gill's paper [34, p. 183], and then seeing many instances of it in connection with top-down compiler organization. Under certain conditions the BLISS/11 compiler [101] is capable of discovering this simplification. Incidentally, the converse of the above principle is also true (see [52]): go to statements can always be eliminated by declaring suitable procedures, each of which calls another as its last action. This shows that procedure calls include go to statements as a special case; it cannot be argued that procedures are conceptually simpler than go to's, although some people have made such a claim.)

As a result of applying the above simplification, and adapting it in the obvious way to the case of a procedure with one parameter, Example 6 becomes

Example 6a:

    procedure treeprint(t); integer t; value t;
    L: if t ≠ 0
       then treeprint(L[t]);
            print(A[t]);
            t := R[t]; go to L;
       fi;

But we don't really want that go to, so we might prefer to write the code as follows, using Dahl's syntax for iterations as explained above.

Example 6b:

    procedure treeprint(t); integer t; value t;
        loop while t ≠ 0:
            treeprint(L[t]);
            print(A[t]);
            t := R[t];
        repeat;

If our goal is to impress somebody, we might tell them that we thought of Example 6b first, instead of revealing that we got it by straightforward simplification of the obvious program in Example 6.

There is still a recursive call in Example 6b; and this time it's embedded in the procedure, so it looks as though we have to go to the general stack implementation. However, the recursive call now occurs in only one place, so we need not put a return address on the stack; only the local variable t needs to be saved on each call. (This is another simplification which occurs frequently.) The program now takes the following nonrecursive form.

Example 6c:

    procedure treeprint(t); integer t; value t;
    begin integer stack S; S := empty;
    L1: loop while t ≠ 0:
            S ⇐ t; t := L[t]; go to L1;
    L2:     t ⇐ S;
            print(A[t]);
            t := R[t];
        repeat;
        if nonempty(S) then go to L2 fi;
    end.

Here for simplicity I have extended ALGOL to allow a "stack" data type, where S ⇐ t means "push t onto S" and t ⇐ S means "pop the top of S to t, assuming that S is nonempty".

It is easy to see that Example 6c is equivalent to Example 6b. The statement "go to L1" initiates the procedure, and control returns to the following statement (labeled L2) when the procedure is finished. Although Example 6c involves go to statements, their purpose is easy to understand, given the knowledge that we have produced Example 6c by a mechanical, completely reliable method for removing recursion. Hopkins [44] has given other examples where go to at a low level supports high-level constructions.

But if you look at the above program again, you'll probably be just as shocked as I was when I first realized what has happened. I had always thought that the use of go to statements was a bit sinful, say a "venial sin"; but there was one kind of go to that I certainly had been taught to regard as a mortal sin, perhaps even unforgivable, namely one which goes into the middle of an iteration! Example 6c does precisely that, and it is perfectly easy to understand Example 6c by comparing it with Example 6b.

In this particular case we can remove the go to's without difficulty; but in general when a recursive call is embedded in several complex levels of control, there is no equally simple way to remove the recursion without resorting to something like Example 6c. As I say, it was a shock when I first ran across such an example. Later, Jim Horning confessed to me that he also was guilty, in the syntax-table-building program for the XPL system [65, p. 500], because XPL doesn't allow recursion; see also [56]. Clearly a new doctrine about sinful go to's is needed: some sort of "situation ethics".

The new morality that I propose may perhaps be stated thus: "Certain go to statements which arise in connection with well-understood transformations are acceptable, provided that the program documentation explains what the transformation was." The use of four-letter words like goto can occasionally be justified even in the best of company.

This situation is very similar to what people have commonly encountered when proving a program correct. To demonstrate the validity of a typical program Q, it is usually simplest and best to prove that some rather simple but less efficient program P is correct and then to prove that P can be transformed into Q by a sequence of valid optimizations. I'm saying that a similar thing should be considered standard practice for all but the simplest software programs: A programmer should create a program P which is readily understood and well-documented, and then he should optimize it into a program Q which is very efficient. Program Q may contain go to statements and other low-level features, but the transformation from P to Q should be accomplished by completely reliable and well-documented "mechanical" operations.

At this point many readers will say, "But he should only write P, and an optimizing compiler will produce Q." To this I say, "No, the optimizing compiler would have to be so complicated (much more so than anything we have now) that it will in fact be unreliable." I have another alternative to propose, a new class of software which will be far better.

Program Manipulation Systems

For 15 years or so I have been trying to think of how to write a compiler that really produces top quality code. For example,
[...]
moved out of their loops; yet we consciously remove them when the running time of the program is important.

Still another type of transformation occurs when we go from high-level "abstract" data structures to low-level "concrete" ones (see Hoare's chapter in [17] for numerous examples). In the case of Example 6c, we can replace the stack by an array and a pointer, arriving at

Example 6d:
[...]

Furthermore, there is a rather simple way to understand this program, by providing suitable "loop invariants". At the beginning of the first (outer) loop, suppose the stack contents from top to bottom are tₙ, ..., t₁ for some n ≥ 0; then the procedure's remaining duty is to accomplish the effect of

    treeprint(t);
    print(A[tₙ]); treeprint(R[tₙ]);
    ...
    print(A[t₁]); treeprint(R[t₁]);        (*)
[...]
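The chain of transformations from Example 6 through Example 6b to a stack-based version can be checked mechanically by running all three on the same tree. The Python sketch below assumes the representation used by the examples (parallel lists A, L, R, with 0 meaning an empty subtree) and collects the printed keys in a list instead of calling print. The last function keeps the saved values of t in a preallocated list with an explicit pointer `sp`, in the spirit of replacing the stack by an array and a pointer; the text of Example 6d is missing from this copy, so that function is our own rendering, and all three function names are hypothetical.

```python
def treeprint_6(t, A, L, R):
    """Example 6: the obvious recursive symmetric-order traversal."""
    out = []

    def visit(t):
        if t != 0:
            visit(L[t])
            out.append(A[t])       # print(A[t])
            visit(R[t])            # last action is a call: candidate for rule one

    visit(t)
    return out


def treeprint_6b(t, A, L, R):
    """Example 6b: the final call became 't := R[t]' plus an iteration."""
    out = []

    def visit(t):
        while t != 0:              # loop while t != 0:
            visit(L[t])            #   treeprint(L[t]);
            out.append(A[t])       #   print(A[t]);
            t = R[t]               #   t := R[t];

    visit(t)
    return out


def treeprint_6d(t, A, L, R):
    """No recursion: saved values of t live in an array indexed by sp."""
    out = []
    stack = [0] * len(A)           # a tree with len(A) - 1 nodes never needs more
    sp = 0                         # number of values currently saved
    while True:
        while t != 0:              # L1: loop while t != 0:
            stack[sp] = t          #   S <= t (push)
            sp += 1
            t = L[t]               #   t := L[t]; go to L1;
        if sp == 0:                # if nonempty(S) then go to L2 fi;
            return out
        sp -= 1                    # L2: t <= S (pop)
        t = stack[sp]
        out.append(A[t])           #   print(A[t]);
        t = R[t]                   #   t := R[t]; repeat;
```

At the top of the outer `while True`, the entries stack[sp-1], ..., stack[0] play the role of tₙ, ..., t₁ in the invariant (*).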
storing its current value in the program counter, i.e., by duplicating the program, letting one part of the text represent true and the other part false, with jumps between the two parts in appropriate places. Example 7 therefore becomes

Example 7a:

    i := m; j := n;
    v := A[j];
    loop: if A[i] > v
            then A[j] := A[i]; go to upf fi;
    upt:  i := i + 1;
          while i < j repeat; go to common;
    loop: if v > A[j]
            then A[i] := A[j]; go to upt fi;
    upf:  j := j - 1;
          while i < j repeat;
    common: A[j] := v;

Note that again we have come up with a program which has jumps into the middle of iterations, yet we can understand it since we know that it came from a previously understood program, by way of an understandable transformation.

Of course this program is messier than the first, and we must ask again if the gain in speed is worth this cost. If we are writing a sort procedure that will be used many times, we will be interested in the speed. The average running time of Quicksort was analyzed by Hoare in his 1962 paper on the subject [36], and it turns out that the body of the loop in Example 7 is performed about 2N ln N times while the statement up := false is performed about ⅓N ln N times, if we are sorting N elements. All other parts of the overall sorting program (not shown here) have a running time of order N or less, so when N is reasonably large the speed of the inner loop governs the speed of the entire sorting process. (Incidentally, a recursive version of Quicksort will run just about as fast, since the recursion overhead is not part of the inner loop. But in this case the removal of recursion is of great value for another reason, because it cuts the auxiliary stack space requirement from order N to order log N.)

Using these facts about inner loop times, we can make a quantitative comparison of Examples 7 and 7a. As with Example 1, it seems best to make two comparisons, one with the assembly code that a decent programmer would write for the examples, and the other with the object code produced by a typical compiler that does only local optimizations. The assembly-language programmer will keep i, j, v, and up in registers, while a typical compiler will not keep variables in registers from one statement to another, except if they happen to be there by coincidence. Under these assumptions, the asymptotic running time for an entire Quicksort program based on these routines will be

                    assembled        compiled
    Example 7       20⅔ N ln N       55⅓ N ln N
    Example 7a      15⅓ N ln N       40 N ln N

expressed in memory references to data and instructions. So Example 7a saves more than 25% of the sorting time.

I showed this example to Dijkstra, cautioning him that the go to leading into an iteration might be a terrible shock. I was extremely pleased to receive his reply [31]:

    Your technique of storing the value of up in the order counter is, of course, absolutely safe. I did not faint! I am in no sense "afraid" of a program constructed that way, but I cannot consider it beautiful: it is really the same repetition with the same terminating condition, that "changes color" as the computation proceeds.

He went on to say that he looks forward to the day when machines are so fast that we won't be under pressure to optimize our programs; yet

    For the time being I could not agree more with your closing remarks: if the economies matter, apply "disciplined optimalization" to a nice program, the correctness of which has been established beyond reasonable doubt. Your massaging of the program text is then no longer trickery ad hoc, it is perfectly safe and sound.

It is hard for me to express the joy that this letter gave me; it was like having all my sins forgiven, since I need no longer feel guilty about my optimized programs.

Coroutines

Several of the people who read the first draft of this paper observed that Example 7a can perhaps be understood more easily as the result of eliminating coroutine linkage instead
[...]
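Example 7a keeps the Boolean up in the program counter. A language without go to can express the same idea with two inner loops, one for each value of up, so that which loop is currently running records the state. The Python sketch below is our reading of Example 7a under that interpretation, not Knuth's code: indices are 0-based and inclusive, `partition_7a` and `quicksort_7a` are hypothetical names, and the recursive driver exists only to exercise the partition (recursion overhead lies outside the inner loop).

```python
def partition_7a(A, m, n):
    """Partition A[m..n] around v = A[n] by moving a "hole"; returns v's final index.

    The upward loop corresponds to up = true, the downward loop to up = false.
    """
    i, j = m, n
    v = A[j]                     # position j is now a hole
    while i < j:
        while i < j:             # upward phase (up = true)
            if A[i] > v:
                A[j] = A[i]      # fill the hole at j; the hole moves to i
                j -= 1           # go to upf
                break
            i += 1               # upt: i := i + 1
        while i < j:             # downward phase (up = false)
            if v > A[j]:
                A[i] = A[j]      # fill the hole at i; the hole moves to j
                i += 1           # go to upt
                break
            j -= 1               # upf: j := j - 1
    A[j] = v                     # common: A[j] := v  (here i == j)
    return j


def quicksort_7a(A, m=0, n=None):
    """Sort A[m..n] in place; a simple recursive driver for the sketch above."""
    if n is None:
        n = len(A) - 1
    if m < n:
        p = partition_7a(A, m, n)
        quicksort_7a(A, m, p - 1)
        quicksort_7a(A, p + 1, n)
```

Elements left of the returned index are at most v and elements to its right are at least v, which is all the surrounding Quicksort needs.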
though it is slightly cleaner looking than the method in my book, it is noticeably slower, and we have nothing to fear by using a slightly more complicated method once it has been proved correct. Beautiful algorithms are, unfortunately, not always the most useful.

This is not the end of the Quicksort story (although I almost wish it was, since I think the preceding paragraph makes an important point). After I had shown Example 8 to my student, Robert Sedgewick, he found a way to modify it, preserving the randomness of the subfiles, thereby achieving both elegance and efficiency at the same time. Here is his revised program.

Example 8a:

    i := m - 1; j := n; v := A[n];
    loop:
        loop: i := i + 1; while A[i] < v repeat;
        loop: j := j - 1; while A[j] > v repeat;
    while i < j:
        A[i] :=: A[j];
    repeat;
    A[i] :=: A[n];

(As in the previous example, we assume that A[m - 1] is defined and ≤ A[n], since the j pointer might run off the left end.) At the beginning of the outer loop the invariant conditions are now

    m - 1 ≤ i < j ≤ n;
    A[k] ≤ v for m - 1 ≤ k ≤ i;
    A[k] ≥ v for j ≤ k ≤ n;
    A[n] = v.

It follows that Example 8a ends with

    A[m] ... A[i - 1] ≤ v = A[i] ≤ A[i + 1] ... A[n]

and m ≤ i ≤ n; hence a valid partition has been achieved.

Sedgewick also found a way to improve the inner loop of the algorithm from my book, namely:

    i := m - 1; j := n; v := A[n];
    loop:
        loop: i := i + 1; while A[i] < v repeat;
        A[j] := A[i];
        loop: j := j - 1; while A[j] > v repeat;
    while i < j:
        A[i] := A[j];
    repeat;
    if i ≠ j then j := j + 1;
    A[j] := v;

Each of these programs leads to a Quicksort routine that makes about 10⅔ N ln N memory references on the average; the former is preferable (except on machines for which exchanges are clumsy), since it is easier to understand. Thus I learned again that I should always keep looking for improvements, even when I have a satisfactory program.

Axiomatics of Jumps

We have now discussed many different transformations on programs; and there are more which could have been mentioned (e.g., the removal of trivial assignments as in [50, exercise 1.1-3] or [54, exercise 5.2.1-33]). This should be enough to establish that a program-manipulation system will have plenty to do.

Some of these transformations introduce go to statements that cannot be handled very nicely by event indicators, and in general we might expect to find a few programs in which go to statements survive. Is it really a formidable job to understand such programs? Fortunately this is not an insurmountable task, as recent work has shown. For many years, the go to statement has been troublesome in the definition of correctness proofs and language semantics; for example, Hoare and Wirth have presented an axiomatic definition of PASCAL [41] in which everything but real arithmetic and the go to is defined formally. Clint and Hoare [14] have shown how to extend this to event-indicator go to's (i.e., those which don't lead into iterations or conditionals), but they stressed that the general case appears to be fraught with complications.

Just recently, however, Hoare has shown that there is, in fact, a rather simple way to give an axiomatic definition of go to statements; indeed, he wishes quite frankly that it hadn't been quite so simple. For each label L in a program, the programmer should state a logical assertion α(L) which is to be true whenever we reach L. Then the axioms

    {α(L)} go to L {false}

plus the rules of inference

    {α(L)} S {P} ⊢ {α(L)} L: S {P}
[...]
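The invariant conditions stated for Example 8a can be checked at run time. In the Python sketch below (0-based lists; `partition_8a` and `quicksort_8a` are hypothetical names), the sentinel requirement A[m - 1] ≤ A[n] from the text is met by temporarily prepending -inf, and the assertions at the top of the outer loop restate the four invariant conditions. Keeping pending subfiles on an explicit stack instead of recursing is our choice, not part of Example 8a. Each check costs time proportional to the subfile, so this is a study aid rather than a production sort.

```python
def partition_8a(A, m, n):
    """Partition A[m..n] around v = A[n] following Example 8a; returns i.

    Precondition from the text: A[m-1] exists and A[m-1] <= A[n].
    Result: A[m..i-1] <= v = A[i] <= A[i+1..n].
    """
    assert m >= 1 and A[m - 1] <= A[n]
    i, j, v = m - 1, n, A[n]
    while True:
        # Invariant conditions at the beginning of the outer loop:
        assert m - 1 <= i < j <= n and A[n] == v
        assert all(A[k] <= v for k in range(m - 1, i + 1))
        assert all(A[k] >= v for k in range(j, n + 1))
        i += 1
        while A[i] < v:              # loop: i := i+1; while A[i] < v repeat;
            i += 1
        j -= 1
        while A[j] > v:              # loop: j := j-1; while A[j] > v repeat;
            j -= 1
        if i >= j:                   # while i < j:
            break
        A[i], A[j] = A[j], A[i]      # A[i] :=: A[j];
    A[i], A[n] = A[n], A[i]          # A[i] :=: A[n];
    return i


def quicksort_8a(A):
    """Sort a list of numbers in place using partition_8a (sketch)."""
    B = [float("-inf")] + list(A)    # assumed sentinel at index 0
    pending = [(1, len(A))]          # explicit stack of subfiles
    while pending:
        m, n = pending.pop()
        if m < n:
            p = partition_8a(B, m, n)
            pending.append((m, p - 1))   # B[m-1] is still a valid sentinel
            pending.append((p + 1, n))   # B[p] = v bounds this subfile below
    A[:] = B[1:]
```

Each subfile inherits a valid sentinel: the left part keeps its old A[m - 1], and the right part is bounded below by the element just placed at position p.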
are allowed in program proofs, and all properties of labels and go to's will follow if the α(L) are selected intelligently. One must, of course, carry out the entire proof using the same assertion α(L) for each appearance of the label L, and some choices of assertions will lead to more powerful results than others.

Informally, α(L) represents the desired state of affairs at label L; this definition says essentially that a program is correct if α(L) holds at L and before all "go to L" statements, and that control never "falls through" a go to statement to the following text. Stating the assertions α(L) is analogous to formulating loop invariants. Thus, it is not difficult to deal formally with tortuous program structure if it turns out to be necessary; all we need to know is the "meaning" of each label.

Reduction of Complication

There is one remaining use of go to for which I have never seen a good replacement, and in fact it's a situation where I still think go to is the right idea. This situation typically occurs after a program has made a multiway branch to a rather large number of different but related cases. A little computation often suffices to reduce one case to another; and when we've reduced one problem to a simpler one, the most natural thing is for our program to go to the routine which solves the simpler problem.

For example, consider writing an interpretive routine (e.g., a microprogrammed emulator), or a simulator of another computer. After decoding the address and fetching the operand from memory, we do a multiway branch based on the operation code. Let's say the operations include no-op, add, subtract, jump on overflow, and unconditional jump. Then the subtract routine might be

    operand := -operand; go to add;

the add routine might be

    accum := accum + operand;
    tyme := tyme + 1;
    go to no op;

and jump on overflow might be

    if overflow
    then overflow := false; go to jump;
    else go to no op;
    fi;

I still believe that this is the correct way to write such a program.

Such situations aren't restricted to interpreters and simulators, although the foregoing is a particularly dramatic example. Multiway branching is an important programming technique which is all too often replaced by an inefficient sequence of if tests. Peter Naur recently wrote me that he considers the use of tables to control program flow as a basic idea of computer science that has been nearly forgotten; but he expects it will be ripe for rediscovery any day now. It is the key to efficiency in all the best compilers I have studied.

Some hints of this situation, where one problem reduces to another, have occurred in previous examples of this paper. Thus, after searching for x and discovering that it is absent, the "not found" routine can insert x into the table, thereby reducing the problem to the "found" case. Consider also our decision-table Example 4, and suppose that each period was to be followed by a carriage return instead of by an extra space. Then it would be natural to reduce the post-processing of periods to the carriage-return part of the program. In each case, a go to would be easy to understand.

If we need to find a way to do this without saying go to, we could extend Zahn's event indicator scheme so that some events are allowed to happen in the then ... fi part after we have begun to process other events. This accommodates the above-mentioned examples very nicely; but of course it can be dangerous when misused, since it gives us back all the power of go to. A restriction which allows ⟨statement list⟩_i to refer to ⟨event⟩_j only for j > i would be less dangerous.

With such a language feature, we can't "fall through" a label (i.e., an event indicator) when the end of the preceding code is reached; we must explicitly name each event when we go to its routine. Prohibiting "fall through" means forcing a programmer to write "go to common" just before the label "common:" in Example 7a; surprisingly, such a change actually makes that program more readable, since it makes the symmetry plain. Also, the program fragment

    subtract: operand := -operand; go to add;
    add:      accum := accum + operand;

seems to be more readable than if "go to add" were deleted. It is interesting to ponder why this is so.

3. CONCLUSIONS

This has been a long discussion, and very detailed, but a few points stand out. First, there are several kinds of programming situations in which go to statements are harmless, even desirable, if we are programming in ALGOL or PL/I. But secondly, new types of syntax are being developed that provide good substitutes for these harmless go to's, and without encouraging a programmer to create "logical spaghetti".

One thing we haven't spelled out clearly, however, is what makes some go to's bad and others acceptable. The reason is that we've really been directing our attention to the wrong issue, to the objective question of go to elimination instead of the important subjective question of program structure. In the words of John Brown [9], "The act of focusing our mightiest intellectual resources on the elusive goal of go to-less programs has helped us get our minds off all those really tough and possibly unresolvable problems and issues with which today's professional programmer would otherwise have to grapple." By writing this long article I don't want to add fuel to the controversy about go to elimination, since that topic has already assumed entirely too much significance; my goal is to lay that controversy to rest, and to help direct the discussion towards more fruitful channels.

Structured Programming

The real issue is structured programming, but unfortunately this has become a catch phrase whose meaning is rarely understood in the same way by different people. Everybody knows it is a Good Thing, but as McCracken [64] has said, "Few people would venture a definition. In fact, it is not clear that there exists a simple definition as yet." Only one thing is really clear: Structured programming is not the process of writing programs and then eliminating their go to statements. We should be able to define structured programming without referring to go to statements at all; then the fact that go to statements rarely need to be introduced as we write programs should follow as a corollary.

Indeed, Dijkstra's original article [25] which gave Structured Programming its name never mentions go to statements at all; he directed attention to the critical question, "For what program structures can we give correctness proofs without undue labor, even if the programs get large?" By correctness proofs he explained that he does not mean formal derivations from axioms, he means any sort of proof (formal or informal) that is "sufficiently convincing"; and a proof really means an understanding. By program structure he means data structure as well as control structure.

We understand complex things by systematically breaking them into successively simpler parts and understanding how these parts fit together locally. Thus, we have different levels of understanding, and each of these levels corresponds to an abstraction of the detail at the level it is composed from. For example, at one level of abstraction, we deal with an integer without considering whether it is represented in binary notation or two's complement, etc., while at deeper levels this representation may be important. At more abstract levels the precise value of the integer is not important except as it relates to other data.

Charles L. Baker mentioned this principle as early as 1957, as part of his 8-page review [2] of McCracken's first book on programming:

    Break the problem into small, self-contained subroutines, trying at all times to isolate the various sections of coding as much as possible . . . [then] the problem is reduced to many much smaller ones. The truth of this seems
    very obvious to experienced coders, yet it is hard to put across to the newcomer.

Abstraction is easily understood in terms of BNF notation. A metalinguistic category like ⟨assignment statement⟩ is an abstraction which is composed of two abstractions (a ⟨left part list⟩ and an ⟨arithmetic expression⟩), each of which is composed of abstractions such as ⟨identifier⟩ or ⟨term⟩, etc. We understand the program syntax as a whole by knowing the structural details that relate these abstract parts. The most difficult things to understand about a program's syntax are the identifiers, since their meaning is passed across several levels of structure. If all identifiers of an ALGOL program were changed to random meaningless strings of symbols, we would have great difficulty seeing what the type of a variable is and what the program means, but we would still easily recognize the more local features, such as assignment statements, expressions, subscripts, etc. (This inability for our eyes to associate a type or mode with an identifier has led to what I believe are fundamental errors of human engineering in the design of ALGOL 68, but that's another story. My own notation for stacks in Example 6e suffers from the same problem; it works in these examples chiefly because t is lower case and S is upper case.) Larger nested structures are harder for the eye to see unless they are indented, but indentation makes the structure plain.

It would probably be still better if we changed our source language concept so that the program wouldn't appear as one long string. John McCarthy says "I find it difficult to believe that whenever I see a tree I am really seeing a string of symbols." Instead, we should give meaningful names to the larger constructs in our program that correspond to meaningful levels of abstraction, and we should define those levels of abstraction in one place, and merely use their names (instead of including the detailed code) when they are used to build larger concepts. Procedure names do this, but the language could easily be designed so that no action of calling a subroutine is implied.

From these remarks it is clear that sequential composition, iteration, and conditional statements present syntactic structures that the eye can readily assimilate; but a go to statement does not. The visual structure of go to statements is like that of flowcharts, except reduced to one dimension in our source languages. In two dimensions it is possible to perceive go to structure in small examples, but we rapidly lose our ability to understand larger and larger flowcharts; some intermediate levels of abstraction are necessary. As an undergraduate, in 1959, I published an octopus flowchart which I sincerely hope is the most horribly complicated that will ever appear in print; anyone who believes that flowcharts are the best way to understand a program is urged to look at this example [49]. (See also [32, p. 54] for a nice illustration of how go to's make a PL/I program obscure, and see R. Lawrence Clark's hilarious spoof about linear representation of flowcharts by means of a "come from statement" [13].)

I have felt for a long time that a talent for programming consists largely of the ability to switch readily from microscopic to macroscopic views of things, i.e., to change levels of abstraction fluently. I mentioned this [55] to Dijkstra, and he replied [29] with an excellent analysis of the situation:

    I feel somewhat guilty when I have suggested that the distinction or introduction of "different levels of abstraction" allow you to think about only one level at a time, ignoring completely the other levels. This is not true. You are trying to organize your thoughts; that is, you are seeking to arrange matters in such a way that you can concentrate on some portion, say with 90% of your conscious thinking, while the rest is temporarily moved away somewhat towards the background of your mind. But that is something quite different from "ignoring completely": you allow yourself temporarily to ignore details, but some overall appreciation of what is supposed to be or to come there continues to play a vital role. You remain alert for little red lamps that suddenly start flickering in the corners of your eye.

I asked Hoare for a short definition of structured programming, and he replied that it is "the systematic use of abstraction to control a mass of detail, and also a means of documentation which aids program design."
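As a rough illustration of the interpreter example under Reduction of Complication, the sketch below models the multiway branch on the operation code as a lookup in a dispatch table (in the spirit of Naur's remark about tables controlling program flow), and lets each routine finish by handing control to the routine for the simpler case, which is what the fragments shown earlier express with go to. Everything here is a hypothetical Python model rather than code from the paper: the `Machine` fields, the `WORD_LIMIT` overflow rule, the opcode strings, and `run` are assumptions made for the example, and a function call stands in for the jump.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

WORD_LIMIT = 2 ** 15  # hypothetical word size, chosen only so overflow can occur


@dataclass
class Machine:
    """Hypothetical state of the simulated computer."""
    accum: int = 0
    tyme: int = 0          # time counter, named as in the add routine
    pc: int = 0            # index of the next instruction
    overflow: bool = False


def no_op(m: Machine, operand: int) -> None:
    m.pc += 1                          # continue with the next instruction


def jump(m: Machine, operand: int) -> None:
    m.pc = operand                     # unconditional jump


def add(m: Machine, operand: int) -> None:
    m.accum += operand
    if abs(m.accum) >= WORD_LIMIT:     # hypothetical overflow rule
        m.overflow = True
    m.tyme += 1
    no_op(m, operand)                  # "go to no op"


def subtract(m: Machine, operand: int) -> None:
    add(m, -operand)                   # "operand := -operand; go to add"


def jump_on_overflow(m: Machine, operand: int) -> None:
    if m.overflow:
        m.overflow = False
        jump(m, operand)               # "go to jump"
    else:
        no_op(m, operand)              # "go to no op"


# The multiway branch on the operation code becomes a table lookup.
DISPATCH: Dict[str, Callable[[Machine, int], None]] = {
    "noop": no_op, "add": add, "sub": subtract,
    "jov": jump_on_overflow, "jmp": jump,
}


def run(program: List[Tuple[str, int]], max_steps: int = 1000) -> Machine:
    """Execute (opcode, operand) pairs until pc leaves the program."""
    m = Machine()
    for _ in range(max_steps):
        if not 0 <= m.pc < len(program):
            break
        opcode, operand = program[m.pc]
        DISPATCH[opcode](m, operand)
    return m
```

For instance, `run([("add", 5), ("sub", 2), ("noop", 0)])` leaves `accum == 3`, `tyme == 2` and `pc == 3`. Python does not eliminate tail calls, so each reduction here costs a stack frame; the chains are only a few calls deep, so that is harmless in this sketch, but it is one concrete difference between calling the simpler routine and jumping to it.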
I hope that my remarks above have made the abstract concept of abstraction clear; the second part of Hoare's definition (which was also stressed by Dijkstra in his original paper [25]) states that a good way to express the abstract properties of an unwritten piece of program often helps us to write that program, and to "know" that it is correct as we write it.

Syntactic structure is just one part of the picture, and BNF would be worthless if the syntactic constructs did not correspond to semantic abstractions. Similarly, a good program will be composed in such a way that each semantic level of abstraction has a reasonably simple relation to its constituent parts. We noticed in our discussion of Jacopini's theorem that every program can trivially be expressed in terms of a simple iteration which simulates a computer; but that iteration has to carry the entire behavior of the program through the loop, so it is worthless as a level of abstraction.

An iteration statement should have a purpose that is reasonably easy to state; typically, this purpose is to make a certain Boolean relation true while maintaining a certain invariant condition satisfied by the variables. The Boolean condition is stated in the program, while the invariant should be stated in a comment, unless it is easily supplied by the reader. For example, the invariant in Example 1 is that A[k] ≠ x for 1 ≤ k < i, and in Example 2 it is the same, plus the additional relation A[m+1] = x. Both of these are so obvious that I didn't bother to mention them; but in Examples 6e and 8, I stated the more complicated invariants that arose. In each of those cases the program almost wrote itself once the proper invariant was given. Note that an "invariant assertion" actually does vary slightly as we execute statements of the loop, but it comes back to its original form when we repeat the loop.

Thus, an iteration makes a good abstraction if we can assign a meaningful invariant describing the local states of affairs as it executes, and if we can describe its purpose (e.g., to change one state to another). Similarly, an if ... then ... else ... fi statement will be a good abstraction if we can state an overall purpose for the statement as a whole.

We also need well-structured data; i.e., as we write the program we should have an abstract idea of what each variable means. This idea is also usually describable as an invariant relation, e.g., "m is the number of items in the table" or "x is the search argument" or "L[t] is the number of the root node of node t's left subtree, or 0 if this subtree is empty" or "the contents of stack S are postponed obligations to do such and such".

Now let's consider the slightly more complex case of an event-driven construct. This should also correspond to a meaningful abstraction, and our examples show what is involved: For each event we give an (invariant) assertion which describes the situation which must hold when that event occurs, and for the loop until we also give an invariant for the loop. An event statement typically corresponds to an abrupt change in conditions so that a different assertion from the loop invariant is necessary.

An error exit can be considered well-structured for precisely this reason: it corresponds to a situation that is impossible according to the local invariant assertions; it is easiest to formulate assertions that assume nothing will go wrong, rather than to make the invariants cover all contingencies. When we jump out to an error exit we go to another level of abstraction having different assumptions.

As another simple example, consider binary search in an ordered array using the invariant relation A[i] < x < A[j]:

    loop while i+1 < j;
        k := (i+j) ÷ 2;
        if A[k] < x then i := k;
        else if A[k] > x then j := k;
        else cannot preserve the invariant fi;
        fi;
    repeat;

Upon normal exit from this loop, the conditions i+1 ≥ j and A[i] < x < A[j] imply that A[i] < x < A[i+1], i.e., that x is not present. If the program comes to "cannot preserve the invariant" (because x = A[k]), it wants to go to another set of assumptions. The event-driven construct
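The binary search loop above carries over almost line for line into Python. In this minimal sketch the fictitious end values A[0] = -∞ and A[n+1] = +∞ are an assumption added so that the invariant A[i] < x < A[j] holds before the first iteration, and returning from the function stands in for the jump to "another set of assumptions" when x = A[k]; the function name and its 1-based result are likewise choices made for illustration.

```python
import math
from typing import Optional, Sequence


def binary_search(a: Sequence[float], x: float) -> Optional[int]:
    """Return the 1-based position of x in the sorted sequence a, or None.

    A[1..n] is modelled with fictitious entries A[0] = -inf and
    A[n+1] = +inf, so the invariant A[i] < x < A[j] holds at the start
    for any finite x (an assumption of this sketch).
    """
    A = [-math.inf, *a, math.inf]
    i, j = 0, len(A) - 1
    while i + 1 < j:
        # Invariant: A[i] < x < A[j]
        k = (i + j) // 2          # i < k < j, so the interval always shrinks
        if A[k] < x:
            i = k
        elif A[k] > x:
            j = k
        else:
            return k              # x == A[k]: "cannot preserve the invariant"
    # Normal exit: i + 1 >= j and A[i] < x < A[j], so j == i + 1 and x is absent.
    return None
```

For example, `binary_search([1, 3, 5, 7], 5)` returns 3, while searching the same list for 4 ends with i = 2 and j = 3 and returns `None`.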
consciously, they won't see the need for go to, and the issue will just fade away. On the other hand, W. W. Peterson told me about his experience teaching PL/I to beginning programmers: He taught them to use go to only in unusual special cases where if and while aren't right, but he found [78] that "A disturbingly large percentage of the students ran into situations that require go to's, and sure enough, it was often because while didn't work well to their plan, but almost invariably because their plan was poorly thought out." Because of arguments like this, I'd say we should, indeed, abolish go to from the high-level language, at least as an experiment in training people to formulate their abstractions more carefully. This does have a beneficial effect on style, although I would not make such a prohibition if the new language features described above were not available. The question is whether we should ban it, or educate against it; should we attempt to legislate program morality? In this case I vote for legislation, with appropriate legal substitutes in place of the former overwhelming temptations.

A great deal of research must be done if we're going to have the desired language by 1984. Control structure is merely one simple issue, compared to questions of abstract data structure. It will be a major problem to keep the total number of language features within tight limits. And we must especially look at problems of input/output and data formatting, in order to provide a viable alternative to COBOL.

ACKNOWLEDGMENTS

I've benefited from a truly extraordinary amount of help while preparing this paper. The individuals named provided me with a total of 144 pages of single-spaced comments, plus six hours of conversation, and four computer listings:

Frances E. Allen, Forest Baskett, G. V. Bochmann, Per Brinch Hansen, R. M. Burstall, Vinton Cerf, T. E. Cheatham, Jr., John Cocke, Ole-Johan Dahl, Peter J. Denning, Edsger Dijkstra, James Eve, K. Friedenbach, Donald I. Good, Ralph E. Gorin, Leo Guibas, C. A. R. Hoare, Martin Hopkins, James J. Horning, B. M. Leavenworth, Henry F. Ledgard, Ralph L. London, Zohar Manna, W. M. McKeeman, Harlan D. Mills, Peter Naur, Kjell Overholt, James Peterson, W. Wesley Peterson, Mark Rain, John Reynolds, Barry K. Rosen, E. Satterthwaite, Jr., D. V. Schorre, Jacob T. Schwartz, Richard L. Sites, Richard Sweet, Robert D. Tennent, Niklaus Wirth, M. Woodger, William A. Wulf, Charles T. Zahn.

These people unselfishly devoted hundreds of man-hours to helping me revise the first draft; and I'm sorry that I wasn't able to reconcile all of their interesting points of view. In many places I have shamelessly used their suggestions without an explicit acknowledgment; this article is virtually a joint paper with 30 to 40 co-authors! However, any mistakes it contains are my own.

APPENDIX

In order to make some quantitative estimates of efficiency, I have counted memory references for data and instructions, assuming a multiregister computer without cache memory. Thus, each instruction costs one unit, plus another if it refers to memory; small constants and base addresses are assumed to be either part of the instruction or present in a register. Here are the code sequences developed for the first two examples, assuming that a typical assembly-language programmer or a very good optimizing compiler is at work.
Label       Instruction          Cost   Times executed

Example 1:
            r1 ← 1                1      1
            r2 ← m                2      1
            r3 ← x                2      1
            to test               1      1
loop:       A[r1] : r3            2      n-a
            to found if =         1      n-a
            r1 ← r1+1             1      n-1
test:       r1 : r2               1      n
            to loop if ≤          1      n
notfound:   m ← r1                2      a
            A[r1] ← r3            2      a
            B[r1] ← 0             2      a
found:      r4 ← B[r1]            2      1
            r4 ← r4+1             1      1
            B[r1] ← r4            2      1

Example 2:
            r2 ← m                2      1
            r3 ← x                2      1
            A[r2+1] ← r3          2      1
            r1 ← 0                1      1
loop:       r1 ← r1+1             1      n
            A[r1] : r3            2      n
            to loop if ≠          1      n
            r1 : r2               1      1
            to found if ≤         1      1
notfound:   m ← r1                etc. as in Example 1.

Example 1:
            r1 ← 1                1      1
            to test               1      1
incr:       r1 ← i                2      n-1
            r1 ← r1+1             1      n-1
test:       r1 : m                2      n
            to notfound if >      1      n
            i ← r1                2      n-a
            r2 ← A[r1]            2      n-a
            r2 : x                2      n-a
            to found if =         1      n-a
            to incr               1      n-1
notfound:   r1 ← m                2      a
            r1 ← r1+1             1      a
            i ← r1                2      a
            m ← r1                2      a
            r1 ← x                2      a
            r2 ← i                2      a
            A[r2] ← r1            2      a
            B[r2] ← 0             2      a
found:      r1 ← i                2      1
            r2 ← B[r1]            2      1
            r2 ← r2+1             1      1
            B[r1] ← r2            2      1
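Under the appendix's cost model (one unit per instruction, plus one more if it refers to memory), the total cost of a listing is the sum, over its rows, of cost times execution count. The Python sketch below does that bookkeeping for the two complete Example 1 listings, writing each count as coefficients of n, a, and 1. The count a for the row "m ← r1" of the first listing is an assumption (the other notfound rows carry a), and the totals in the comments, 6n + 3a + 10 and 14n + 8a + 5, are sums worked out here from the tables rather than figures quoted from the text.

```python
from typing import List, Tuple

# A count is (coefficient of n, coefficient of a, constant term),
# so (1, -1, 0) stands for n - a and (0, 0, 1) for "once".
Count = Tuple[int, int, int]
Row = Tuple[int, Count]  # (cost in units, execution count)

ONCE: Count = (0, 0, 1)
N: Count = (1, 0, 0)
N_MINUS_A: Count = (1, -1, 0)
N_MINUS_1: Count = (1, 0, -1)
A_TIMES: Count = (0, 1, 0)

# First Example 1 listing: the index stays in register r1.
EXAMPLE_1: List[Row] = [
    (1, ONCE), (2, ONCE), (2, ONCE), (1, ONCE),      # r1<-1, r2<-m, r3<-x, to test
    (2, N_MINUS_A), (1, N_MINUS_A),                  # loop: A[r1]:r3, to found if =
    (1, N_MINUS_1),                                  # r1<-r1+1
    (1, N), (1, N),                                  # test: r1:r2, to loop if <=
    (2, A_TIMES), (2, A_TIMES), (2, A_TIMES),        # notfound: m<-r1, A[r1]<-r3, B[r1]<-0
    (2, ONCE), (1, ONCE), (2, ONCE),                 # found: r4<-B[r1], r4<-r4+1, B[r1]<-r4
]

# Second Example 1 listing: i is stored in memory and reloaded.
EXAMPLE_1_MEMORY: List[Row] = [
    (1, ONCE), (1, ONCE),                            # r1<-1, to test
    (2, N_MINUS_1), (1, N_MINUS_1),                  # incr: r1<-i, r1<-r1+1
    (2, N), (1, N),                                  # test: r1:m, to notfound if >
    (2, N_MINUS_A), (2, N_MINUS_A),                  # i<-r1, r2<-A[r1]
    (2, N_MINUS_A), (1, N_MINUS_A),                  # r2:x, to found if =
    (1, N_MINUS_1),                                  # to incr
    (2, A_TIMES), (1, A_TIMES), (2, A_TIMES),        # notfound: r1<-m, r1<-r1+1, i<-r1
    (2, A_TIMES), (2, A_TIMES), (2, A_TIMES),        # m<-r1, r1<-x, r2<-i
    (2, A_TIMES), (2, A_TIMES),                      # A[r2]<-r1, B[r2]<-0
    (2, ONCE), (2, ONCE), (1, ONCE), (2, ONCE),      # found: r1<-i, r2<-B[r1], r2<-r2+1, B[r1]<-r2
]


def total_cost(rows: List[Row]) -> Count:
    """Sum cost * count over all rows; returns the coefficients of (n, a, 1)."""
    return (
        sum(cost * count[0] for cost, count in rows),
        sum(cost * count[1] for cost, count in rows),
        sum(cost * count[2] for cost, count in rows),
    )


print(total_cost(EXAMPLE_1))         # (6, 3, 10), i.e. 6n + 3a + 10
print(total_cost(EXAMPLE_1_MEMORY))  # (14, 8, 5), i.e. 14n + 8a + 5
```

With n = 5 and a = 1, say, the first listing costs 6·5 + 3 + 10 = 43 units and the second 14·5 + 8 + 5 = 83 units.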
[9] BROWN, JOHN R. "In memoriam . . .", unpublished note, January 1974.
[10] BRUNO, J., AND STEIGLITZ, K. "The expression of algorithms by charts," J. ACM 19, 3 (July 1972), 517-525.
[11] BURKHARD, W. A. "Nonrecursive tree traversal algorithms," in Proc. 7th Annual Princeton Conf. on Information Sciences and Systems, Princeton Univ. Press, Princeton, N.J., 1973, 403-405.
[12] CHEATHAM, T. E., JR., AND WEGBREIT, BEN. "A laboratory for the study of automating programming," in Proc. AFIPS 1972 Spring Joint Computer Conf., Vol. 40, AFIPS Press, Montvale, N.J., 1972, 11-21.
[13] CLARK, R. LAWRENCE. "A linguistic contribution to GOTO-less programming," Datamation 19, 12 (December 1973), 62-63.
[14] CLINT, M., AND HOARE, C. A. R. "Program proving: jumps and functions," Acta Informatica 1, 3 (1972), 214-224.
[15] COOPER, D. C. "The equivalence of certain computations," Computer J. 9, 1 (May 1966), 45-52.
[16] COOPER, D. C. "Böhm and Jacopini's reduction of flow charts," Comm. ACM 10, 8 (August 1967), 463, 473.
[17] DAHL, O.-J., DIJKSTRA, E. W., AND HOARE, C. A. R. Structured programming, Academic Press, London, England, 1972, 220 pp.
[18] DARLINGTON, J., AND BURSTALL, R. M. "A system which automatically improves programs," in Proc. 3rd Interntl. Conf. on Artificial Intelligence, Stanford Univ., Stanford, Calif., 1973, 479-485.
[19] DE MARNEFFE, PIERRE-ARNOUL. "Holon programming: A survey," Universite de Liege, Service Informatique, Liege, Belgium, 1973, 135 pp.
[20] DIJKSTRA, E. W. "Recursive programming," Numerische Mathematik 2, 5 (1960), 312-318.
[21] DIJKSTRA, E. W. "Programming considered as a human activity," in Proc. IFIP Congress 1965, North-Holland Publ. Co., Amsterdam, The Netherlands, 1965, 213-217.
[22] DIJKSTRA, E. W. "A constructive approach to the problem of program correctness," BIT 8, 3 (1968), 174-186.
[23] DIJKSTRA, E. W. "Go to statement considered harmful," Comm. ACM 11, 3 (March 1968), 147-148, 538, 541. [There are two instances of pages 147-148 in this volume; the second 147-148 is relevant here.]
[24] DIJKSTRA, E. W. "Solution of a problem in concurrent programming control," Comm. ACM 9, 9 (September 1968), 569.
[25] DIJKSTRA, E. W. "Structured programming," in Software engineering techniques, J. N. Buxton and B. Randell [Eds.], NATO Scientific Affairs Division, Brussels, Belgium, 1970, 84-88.
[26] DIJKSTRA, E. W. "EWD316: A short introduction to the art of programming," Technical University Eindhoven, The Netherlands, August 1971, 97 pp.
[27] DIJKSTRA, E. W. "The humble programmer," Comm. ACM 15, 10 (October 1972), 859-866.
[28] DIJKSTRA, E. W. "Prospects for a better programming language," in High level languages, C. Boon [Ed.], Infotech State of the Art Report 7, 1972, 217-232.
[29] DIJKSTRA, E. W. personal communication, January 3, 1973.
[30] DIJKSTRA, E. W. personal communication, November 19, 1973.
[31] DIJKSTRA, E. W. personal communication, January 30, 1974.
[32] DONALDSON, JAMES R. "Structured programming," Datamation 19, 12 (December 1973), 52-54.
[33] DYLAN, BOB. Blonde on blonde, record album produced by Bob Johnston, Columbia Records, New York, March 1966, Columbia C2S 841.
[34] GILL, STANLEY. "Automatic computing: Its problems and prizes," Computer J. 8, 3 (October 1965), 177-189.
[35] HENDERSON, P., AND SNOWDON, R. "An experiment in structured programming," BIT 12, 1 (1972), 38-53.
[36] HOARE, C. A. R. "Quicksort," Computer J. 5, 1 (1962), 10-15.
[37] HOARE, C. A. R. "An axiomatic approach to computer programming," Comm. ACM 12, 10 (October 1969), 576-580, 583.
[38] HOARE, C. A. R. "Proof of a program: FIND," Comm. ACM 14, 1 (January 1971), 39-45.
[39] HOARE, C. A. R. "A note on the for statement," BIT 12, 3 (1972), 334-341.
[40] HOARE, C. A. R. "Prospects for a better programming language," in High level languages, C. Boon [Ed.], Infotech State of the Art Report 7, 1972, 327-343.
[41] HOARE, C. A. R., AND WIRTH, N. "An axiomatic definition of the programming language PASCAL," Acta Informatica 2, 4 (1973), 335-355.
[42] HOARE, C. A. R. "Hints for programming language design," Computer Science report STAN-CS-74-403, Stanford Univ., Stanford, Calif., January 1974, 29 pp.
[43] HOPKINS, MARTIN E. "Computer aided software design," in Software engineering techniques, J. N. Buxton and B. Randell [Eds.], NATO Scientific Affairs Division, Brussels, Belgium, 1970, 99-101.
[44] HOPKINS, MARTIN E. "A case for the GOTO," Proc. ACM Annual Conference, Boston, Mass., August 1972, 787-790.
[45] HULL, T. E. "Would you believe structured FORTRAN?" SIGNUM Newsletter 8, 4 (October 1973), 13-16.
[46] INGALLS, DAN. "The execution time profile as a programming tool," in Compiler optimization, 2d Courant Computer Science Symposium, Randall Rustin [Ed.], Prentice-Hall, Englewood Cliffs, N.J., 1972, 107-128.
[47] KELLEY, ROBERT A., AND WALTERS, JOHN R. "APLGOL-2, a structured programming system for APL," IBM Palo Alto Scientific Center report 320-3318 (August 1973), 29 pp.
[48] KLEENE, S. C. "Representation of events in nerve nets," in Automata Studies, C. E. Shannon and J. McCarthy [Eds.], Princeton University Press, Princeton, N.J., 1956, 3-40.
[49] KNUTH, DONALD E. "RUNCIBLE--Algebraic translation on a limited computer," Comm. ACM 2, 11 (November 1959), 18-21.
[There is a bug in the flowchart. The arc labeled "2" from the box labeled "0:" in the upper left corner should go to the box labeled R~ = 8003.]
[50] KNUTH, DONALD E. Fundamental algorithms, The art of computer programming, Vol. 1, Addison-Wesley, Reading, Mass., 1968; 2d ed., 1973, 634 pp.
[51] KNUTH, DONALD E. "An empirical study of FORTRAN programs," Software--Practice and Experience 1, 2 (April-June 1971), 105-133.
[52] KNUTH, DONALD E., AND FLOYD, ROBERT W. "Notes on avoiding 'go to' statements," Information Processing Letters 1, 1 (February 1971), 23-31, 177.
[53] KNUTH, DONALD E. "George Forsythe and the development of Computer Science," Comm. ACM 15, 8 (August 1972), 721-726.
[54] KNUTH, DONALD E. Sorting and searching, The art of computer programming, Vol. 3, Addison-Wesley, Reading, Mass., 1973, 722 pp.
[55] KNUTH, DONALD E. "A review of 'structured programming'," Stanford Computer Science Department report STAN-CS-73-371, Stanford Univ., Stanford, Calif., June 1973, 25 pp.
[56] KNUTH, DONALD E., AND SZWARCFITER, JAYME L. "A structured program to generate all topological sorting arrangements," Information Processing Letters 2, 6 (April 1974), 153-157.
[57] KOSARAJU, S. RAO. "Analysis of structured programs," Proc. Fifth Annual ACM Symp. Theory of Computing (May 1973), 240-252; also in J. Computer and System Sciences 9, 3 (December 1974).
[58] LANDIN, P. J. "A correspondence between ALGOL 60 and Church's lambda-notation: part I," Comm. ACM 8, 2 (February 1965), 89-101.
[59] LANDIN, P. J. "The next 700 programming languages," Comm. ACM 9, 3 (March 1966), 157-166.
[60] LEAVENWORTH, B. M. "Programming with(out) the GOTO," Proc. ACM Annual Conference, Boston, Mass., August 1972, 782-786.
[61] MANNA, ZOHAR, AND WALDINGER, RICHARD J. "Towards automatic program synthesis," in Symposium on Semantics of Algorithmic Languages, Lecture Notes in Mathematics 188, E. Engeler [Ed.], Springer-Verlag, New York, 1971, 270-310.
[62] McCARTHY, JOHN. "Recursive functions of symbolic expressions and their computation by machine, part I," Comm. ACM 3, 4 (April 1960), 184-195.
[63] McCARTHY, JOHN. "Towards a mathematical science of computation," in Proc. IFIP Congress 1962, Munich, Germany, North-Holland Publ. Co., Amsterdam, The Netherlands, 1963, 21-28.
[64] McCRACKEN, DANIEL D. "Revolution in programming," Datamation 19, 12 (December 1973), 50-52.
[65] McKEEMAN, W. M.; HORNING, J. J.; AND WORTMAN, D. B. A compiler generator, Prentice-Hall, Englewood Cliffs, N.J., 1970, 527 pp.
[66] MILLAY, EDNA ST. VINCENT. "Elaine"; cf. Bartlett's Familiar Quotations.
[67] MILLER, EDWARD F., JR., AND LINDAMOOD, GEORGE E. "Structured programming: top-down approach," Datamation 19, 12 (December 1973), 55-57.
[68] MILLS, H. D. "Top-down programming in large systems," in Debugging techniques in large systems, Randall Rustin [Ed.], Prentice-Hall, Englewood Cliffs, N.J., 1971, 41-55.
[69] MILLS, H. D. "Mathematical foundations for structured programming," report FSC 72-6012, IBM Federal Systems Division, Gaithersburg, Md. (February 1972), 62 pp.
[70] MILLS, H. D. "How to write correct programs and know it," report FSC 73-5008, IBM Federal Systems Division, Gaithersburg, Md. (1973), 26 pp.
[71] NASSI, I. R., AND AKKOYUNLU, E. A. "Verification techniques for a hierarchy of control structures," Tech. report 26, Dept. of Computer Science, State Univ. of New York, Stony Brook, New York (January 1974), 48 pp.
[72] NAUR, PETER [Ed.] "Report on the algorithmic language ALGOL 60," Comm. ACM 3, 5 (May 1960), 299-314.
[73] NAUR, PETER. "Go to statements and good Algol style," BIT 3, 3 (1963), 204-208.
[74] NAUR, PETER. "Program translation viewed as a general data processing problem," Comm. ACM 9, 3 (March 1966), 176-179.
[75] NAUR, PETER. "An experiment on program development," BIT 12, 3 (1972), 347-365.
[76] PAGER, D. "Some notes on speeding up certain loops by software, firmware, and hardware means," in Computers and automata, Jerome Fox [Ed.], John Wiley & Sons, New York, 1972, 207-213; also in IEEE Trans. Computers C-21, 1 (January 1972), 97-100.
[77] PETERSON, W. W.; KASAMI, T.; AND TOKURA, N. "On the capabilities of while, repeat, and exit statements," Comm. ACM 16, 8 (August 1973), 503-512.
[78] PETERSON, W. WESLEY. personal communication, April 2, 1974.
[79] RAIN, MARK, AND HOLAGER, PER. "The present most recent final word about labels in MARY," Machine Oriented Languages Bulletin 1, Trondheim, Norway (October 1972), 18-26.
[80] REID, CONSTANCE. Hilbert, Springer-Verlag, New York, 1970, 290 pp.
[81] REYNOLDS, JOHN. "Fundamentals of structured programming," Systems and Info. Sci. 555 course notes, Syracuse Univ., Syracuse, N.Y., Spring 1973.
[82] SATTERTHWAITE, E. H. "Debugging tools for high level languages," Software--Practice and Experience 2, 3 (July-September 1972), 197-217.
[83] SCHNECK, P. B., AND ANGEL, ELLINOR. "A FORTRAN to FORTRAN optimizing compiler," Computer J. 16, 4 (1973), 322-330.
[84] SCHORRE, D. V. "META-II--a syntax-directed compiler writing language," Proc. ACM National Conference, Philadelphia, Pa., 1964, paper D1.3.
Donald E. Knuth
Computer Science Department, Stanford University, Stanford, CA 94305, USA
The author and his associates have been experimenting for the past several years with a program-
ming language and documentation system called WEB. This paper presents WEB by example, and
discusses why the new system appears to be an improvement over previous ones.
TeX as the document formatting language and PASCAL as the programming language, but the same principles would apply equally well if other languages were substituted. Instead of TeX, one could use a language like Scribe or Troff; instead of PASCAL, one could use ADA, ALGOL, LISP, COBOL, FORTRAN, APL, C, etc., or even assembly language. The main point is that WEB is inherently bilingual, and that such a combination of languages proves to be much more powerful than either single language by itself. WEB does not make the other languages obsolete; on the contrary, it enhances them.

I naturally chose TeX to be the document formatting language, in the first WEB system, because TeX is my own creation;¹ I wanted to acquire a lot of experience in harnessing TeX to a variety of different tasks. I chose PASCAL as the programming language because it has received such widespread support from educational institutions all over the world; it is not my favorite language for system programming, but it has become a “second language” for so many programmers that it provides an exceptionally effective medium of communication. Furthermore WEB itself has a macro-processing ability that makes PASCAL’s limitations largely irrelevant.

Document formatting languages are newcomers to the computing scene, but their use is spreading rapidly. Therefore I’m confident that we will be able to expect each member of the next generation of programmers to be familiar with a document language as well as a programming language, as part of their basic education. Once a person knows both of the underlying languages, there’s no trick at all to learning WEB, because the WEB user’s manual is fewer than ten pages long.

A WEB user writes a program that serves as the source language for two different system routines. (See Figure 1.) One line of processing is called weaving the web; it produces a document that describes the program clearly and that facilitates program maintenance. The other line of processing is called tangling the web; it produces a machine-executable program. The program and its documentation are both generated from the same source, so they are consistent with each other.

Figure 1. Dual usage of a WEB file. (WEAVE converts WEB to TEX, which the TeX processor converts to DVI; TANGLE converts WEB to PAS, which the PASCAL compiler converts to REL.)

Let’s look at this process in slightly more detail. Suppose you have written a WEB program and put it into a computer text file called COB.WEB (say). To generate hardcopy documentation for your program, you can run the WEAVE processor; this is a system program that takes the file COB.WEB as input and produces another file COB.TEX as output. Then you run the TeX processor, which takes COB.TEX as input and produces COB.DVI as output. The latter file, COB.DVI, is a “device-independent” binary description of how to typeset the documentation, so you can get printed output by applying one more system routine to this file.

You can also follow the other branch of Figure 1, by running the TANGLE processor; this is a system program that takes the file COB.WEB as input and produces a new file COB.PAS as output. Then you run the PASCAL compiler, which converts COB.PAS to a binary file COB.REL (say). Finally, you can run your program by loading and executing COB.REL. The process of “compile, load, and go” has been slightly lengthened to “tangle, compile, load, and go.”

C. A COMPLETE EXAMPLE

Now it’s time for me to stop presenting general platitudes and to move on to something tangible. Let us look at a real program that has been written in WEB. The numbered paragraphs that follow are the actual output of a WEB file that has been “woven” into a document; a computer has also generated the indexes that appear at the program’s end. If my claims for the advantages of literate programming have any merit, you should be able to understand the following description more easily than you could have understood the same program when presented in a more conventional way. However, I am trying here to explain the format of WEB documentation at the same time as I am discussing the details of a nontrivial algorithm, so the description below is slightly longer than it would be if it were written for people who already have been introduced to WEB. Here, then, is the computer-generated output:

Printing primes: An example of WEB . . . §1
Plan of the program . . . §3
The output phase . . . §5
Generating the primes . . . §11
The inner loop . . . §22
Index . . . §27

1. Printing primes: An example of WEB. The following program is essentially the same as Edsger Dijkstra’s “first example of step-wise program composition,” found on pages 26–39 of his Notes on Structured Programming,² but it has been translated into the WEB language.

[[Double brackets will be used in what follows to enclose comments relating to WEB itself, because the chief purpose of this program is to introduce the reader to the WEB style of documentation. WEB programs are always broken into small sections, each of which has a serial number; the present section is number 1.]]

Dijkstra’s program prints a table of the first thousand prime numbers. We shall begin as he did, by reducing the entire program to its top-level description.

[[Every section in a WEB program begins with optional commentary about that section, and ends with optional program text for the section. For example, you are now reading part of the commentary in §1, and the program text for §1 immediately follows the present paragraph. Program texts are specifications of PASCAL programs; they either use PASCAL language directly, or they use angle brackets to represent PASCAL code that appears
2 submitted to THE COMPUTER JOURNAL
LITERATE PROGRAMMING
in other sections. For example, the angle-bracket notation ‘⟨Program to print . . . numbers 2⟩’ is WEB’s way of saying the following: “The PASCAL text to be inserted here is called ‘Program to print . . . numbers’, and you can find out all about it by looking at section 2.” One of the main characteristics of WEB is that different parts of the program are usually abbreviated, by giving them such an informal top-level description.]]

⟨Program to print the first thousand prime numbers 2⟩

2. This program has no input, because we want to keep it rather simple. The result of the program will be to produce a list of the first thousand prime numbers, and this list will appear on the output file.

Since there is no input, we declare the value m = 1000 as a compile-time constant. The program itself is capable of generating the first m prime numbers for any positive m, as long as the computer’s finite limitations are not exceeded.

[[The program text below specifies the “expanded meaning” of ‘⟨Program to print . . . numbers 2⟩’; notice that it involves the top-level descriptions of three other sections. When those top-level descriptions are replaced by their expanded meanings, a syntactically correct PASCAL program will be obtained.]]

⟨Program to print the first thousand prime numbers 2⟩ ≡
  program print_primes(output);
  const m = 1000;
    ⟨Other constants of the program 5⟩
  var ⟨Variables of the program 4⟩
  begin ⟨Print the first m prime numbers 3⟩;
  end.
This code is used in section 1.

3. Plan of the program. We shall proceed to fill out the rest of the program by making whatever decisions seem easiest at each step; the idea will be to strive for simplicity first and efficiency later, in order to see where this leads us. The final program may not be optimum, but we want it to be reliable, well motivated, and reasonably fast.

Let us decide at this point to maintain a table that includes all of the prime numbers that will be generated, and to separate the generation problem from the printing problem.

[[The WEB description you are reading once again follows a pattern that will soon be familiar: A typical section begins with comments and ends with program text. The comments motivate and explain noteworthy features of the program text.]]

⟨Print the first m prime numbers 3⟩ ≡
  ⟨Fill table p with the first m prime numbers 11⟩;
  ⟨Print table p 8⟩
This code is used in section 2.

4. How should table p be represented? Two possibilities suggest themselves: We could construct a sufficiently large array of boolean values in which the kth entry is true if and only if the number k is prime; or we could build an array of integers in which the kth entry is the kth prime number. Let us choose the latter alternative, by introducing an integer array called p[1 .. m].

In the documentation below, the notation ‘p[k]’ will refer to the kth element of array p, while ‘$p_k$’ will refer to the kth prime number. If the program is correct, p[k] will either be equal to $p_k$ or it will not yet have been assigned any value.

[[Incidentally, our program will eventually make use of several more variables as we refine the data structures. All of the sections where variables are declared will be called ‘⟨Variables of the program 4⟩’; the number ‘4’ in this name refers to the present section, which is the first section to specify the expanded meaning of ‘⟨Variables of the program⟩’. The note ‘See also . . .’ refers to all of the other sections that have the same top-level description. The expanded meaning of ‘⟨Variables of the program 4⟩’ consists of all the program texts for this name, not just the text found in §4.]]

⟨Variables of the program 4⟩ ≡
  p: array [1 .. m] of integer; { the first m prime numbers, in increasing order }
See also sections 7, 12, 15, 17, 23, and 24.
This code is used in section 2.

5. The output phase. Let’s work on the second part of the program first. It’s not as interesting as the problem of computing prime numbers; but the job of printing must be done sooner or later, and we might as well do it sooner, since it will be good to have it done. [[And it is easier to learn WEB when reading a program that has comparatively few distracting complications.]]

Since p is simply an array of integers, there is little difficulty in printing the output, except that we need to decide upon a suitable output format. Let us print the table on separate pages, with rr rows and cc columns per page, where every column is ww character positions wide. In this case we shall choose rr = 50, cc = 4, and ww = 10, so that the first 1000 primes will appear on five pages. The program will not assume that m is an exact multiple of rr · cc.

⟨Other constants of the program 5⟩ ≡
  rr = 50; { this many rows will be on each page in the output }
  cc = 4; { this many columns will be on each page in the output }
  ww = 10; { this many character positions will be used in each column }
See also section 19.
This code is used in section 2.

6. In order to keep this program reasonably free of notations that are uniquely PASCALesque, [[and in order to illustrate more of the facilities of WEB,]] a few macro definitions for low-level output instructions are introduced here. All of the output-oriented commands in the remainder of the program will be stated in terms of five simple primitives called print_string, print_integer, print_entry, new_line, and new_page.

[[Sections of a WEB program are allowed to contain macro definitions between the opening comments and the closing program text. The general format for each section is actually tripartite: commentary, then definitions, then program. Any of the three parts may be absent; for example, the present section contains no program text.]]
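The rule stated in §4, that the expanded meaning of a section name is the concatenation of every program text given for that name, can be illustrated with a minimal Python sketch. It assumes a much-simplified model of WEB: section names are plain strings without numbers, a reference is written ⟨name⟩, the first text for a name plays the role of a ‘≡’ section and later texts the role of ‘+≡’ sections, and references are replaced recursively. The names `sections`, `define`, and `expand` are hypothetical; this is an illustration of the idea, not TANGLE’s actual algorithm.

```python
import re

# Hypothetical, much-simplified model of WEB's named sections: each name maps
# to the program texts given for it, in order. The first text corresponds to
# a '≡' section; later ones correspond to '+≡' sections with the same name.
sections: dict[str, list[str]] = {}

# A reference to another section, written here as ⟨name⟩ (names only, no numbers).
REF = re.compile(r"⟨(.+?)⟩")


def define(name: str, text: str) -> None:
    """Record one program text for `name`; repeated names are appended."""
    sections.setdefault(name, []).append(text)


def expand(name: str) -> str:
    """Return the expanded meaning of `name`: all of its texts joined in order,
    with each ⟨reference⟩ replaced recursively by that section's expansion."""
    body = "\n".join(sections[name])
    return REF.sub(lambda match: expand(match.group(1)), body)


if __name__ == "__main__":
    define("Program", "program print_primes(output);\nvar ⟨Variables⟩\nbegin end.")
    define("Variables", "p: array [1..m] of integer;")   # like section 4
    define("Variables", "j: integer;")                   # like a later '+≡' section
    print(expand("Program"))
```

In this sketch an undefined name raises `KeyError`, and a section that refers to itself would recurse until Python gives up; a real tangler would need to report both situations as errors.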
D. E. KNUTH
[[Simple macros simply substitute a bit of PASCAL code for an identifier. Parametric macros are similar, but they also substitute an argument wherever ‘#’ occurs in the macro definition. The first three macro definitions here are parametric; the other two are simple.]]

  define print_string(#) ≡ write(#) { put a given string into the output file }
  define print_integer(#) ≡ write(# : 1) { put a given integer into the output file, in decimal notation, using only as many digit positions as necessary }
  define print_entry(#) ≡ write(# : ww) { like print_integer, but ww character positions are filled, inserting blanks at the left }
  define new_line ≡ write_ln { advance to a new line in the output file }
  define new_page ≡ page { advance to a new page in the output file }

7. Several variables are needed to govern the output process. When we begin to print a new page, the variable page_number will be the ordinal number of that page, and page_offset will be such that p[page_offset] is the first prime to be printed. Similarly, p[row_offset] will be the first prime in a given row.

[[Notice the notation ‘+≡’ below; this indicates that the present section has the same name as a previous section, so the program text will be appended to some text that was previously specified.]]

⟨Variables of the program 4⟩ +≡
  page_number: integer; { one more than the number of pages printed so far }
  page_offset: integer; { index into p for the first entry on the current page }
  row_offset: integer; { index into p for the first entry in the current row }
  c: 0 .. cc; { runs through the columns in a row }

8. Now that appropriate auxiliary variables have been introduced, the process of outputting table p almost writes itself.

⟨Print table p 8⟩ ≡
  begin page_number ← 1; page_offset ← 1;
  while page_offset ≤ m do
    begin ⟨Output a page of answers 9⟩;
    page_number ← page_number + 1;
    page_offset ← page_offset + rr ∗ cc;
    end;
  end
This code is used in section 3.

9. A simple heading is printed at the top of each page.

⟨Output a page of answers 9⟩ ≡
  begin print_string('The␣First␣');
  print_integer(m);
  print_string('␣Prime␣Numbers␣---␣Page␣');
  print_integer(page_number); new_line; new_line; { there’s a blank line after the heading }
  for row_offset ← page_offset to page_offset + rr − 1 do ⟨Output a line of answers 10⟩;
  new_page;
  end
This code is used in section 8.

10. The first row will contain
  p[1], p[1 + rr], p[1 + 2 ∗ rr], . . . ;
a similar pattern holds for each value of the row_offset.

⟨Output a line of answers 10⟩ ≡
  begin for c ← 0 to cc − 1 do
    if row_offset + c ∗ rr ≤ m then print_entry(p[row_offset + c ∗ rr]);
  new_line;
  end
This code is used in section 9.

11. Generating the primes. The remaining task is to fill table p with the correct numbers. Let us do this by generating its entries one at a time: Assuming that we have computed all primes that are j or less, we will advance j to the next suitable value, and continue doing this until the table is completely full.

The program includes a provision to initialize the variables in certain data structures that will be introduced later.

⟨Fill table p with the first m prime numbers 11⟩ ≡
  ⟨Initialize the data structures 16⟩;
  while k < m do
    begin ⟨Increase j until it is the next prime number 14⟩;
    k ← k + 1; p[k] ← j;
    end
This code is used in section 3.

12. We need to declare the two variables j and k that were just introduced.

⟨Variables of the program 4⟩ +≡
  j: integer; { all primes ≤ j are in table p }
  k: 0 .. m; { this many primes are in table p }

13. So far we haven’t needed to confront the issue of what a prime number is. But everything else has been taken care of, so we must delve into a bit of number theory now.

By definition, a number is called prime if it is an integer greater than 1 that is not evenly divisible by any smaller prime number. Stating this another way, the integer j > 1 is not prime if and only if there exists a prime number $p_n < j$ such that j is a multiple of $p_n$. Therefore the section of the program that is called ‘⟨Increase j until it is the next prime number⟩’ could be coded very simply: ‘repeat j ← j + 1; ⟨Give to j_prime the meaning: j is a prime number⟩; until j_prime’. And to compute the boolean value j_prime, the following would suffice: ‘j_prime ← true; for n ← 1 to k do ⟨If p[n] divides j, set j_prime ← false⟩’.

14. However, it is possible to obtain a much more efficient algorithm by using more facts of number theory. In the first place, we can speed things up a bit by recognizing that $p_1 = 2$ and that all subsequent primes are odd; therefore we can let j run through odd values only. Our program now takes the following form:

⟨Increase j until it is the next prime number 14⟩ ≡
  repeat j ← j + 2;
    ⟨Update variables that depend on j 20⟩;
    ⟨Give to j_prime the meaning: j is a prime number 22⟩;
  until j_prime
This code is used in section 11.
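As a cross-check on sections 8 through 14, here is a hypothetical Python transliteration; it is a sketch, not the PASCAL program. The names `fill_table` and `format_pages` are ours, Python’s 0-based lists are shifted by one relative to p[1 .. m], a form-feed string stands in for new_page, and the primality test is the simple one of §13 (trial division by every prime found so far) combined with the odd-only stepping of §14, not the faster refinement developed in later sections.

```python
# Hypothetical Python transliteration of sections 8-14; not Knuth's code.
M, RR, CC, WW = 1000, 50, 4, 10   # m (section 2) and rr, cc, ww (section 5)


def fill_table(m: int) -> list[int]:
    """Return the first m primes (section 11), letting j run through odd
    values only (section 14) and testing each candidate against every prime
    found so far (the simple test of section 13)."""
    p = [2]            # p[0] here plays the role of p[1] = 2
    j = 1
    while len(p) < m:  # len(p) plays the role of k
        while True:    # repeat ... until j_prime
            j += 2
            if all(j % q != 0 for q in p):
                break
        p.append(j)
    return p[:m]       # also covers m = 0


def format_pages(p: list[int], rr: int = RR, cc: int = CC, ww: int = WW) -> list[str]:
    """Lay out table p as in sections 8-10: a heading and a blank line, then rr
    rows per page in which row r holds p[r], p[r + rr], p[r + 2*rr], ...;
    entries beyond m are skipped, so m need not be a multiple of rr * cc."""
    m = len(p)
    lines: list[str] = []
    page_number, page_offset = 1, 1                  # 1-based, as in the PASCAL text
    while page_offset <= m:
        lines.append(f"The First {m} Prime Numbers --- Page {page_number}")
        lines.append("")                             # blank line after the heading
        for row_offset in range(page_offset, page_offset + rr):
            lines.append("".join(
                # like print_entry: right-justified in ww positions; -1 for 0-based lists
                f"{p[row_offset + c * rr - 1]:{ww}d}"
                for c in range(cc)
                if row_offset + c * rr <= m
            ))
        lines.append("\f")                           # stands in for new_page
        page_number += 1
        page_offset += rr * cc
    return lines


if __name__ == "__main__":
    print("\n".join(format_pages(fill_table(M))))
```

With m = 1000 this layout yields five pages of 50 rows each, and a call such as `format_pages(fill_table(7), 2, 2, 3)` shows how the last page is left partly empty when m is not a multiple of rr · cc.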
15. The repeat loop in the previous section introduces a boolean variable j_prime, so that it will not be necessary to resort to a goto statement. (We are following Dijkstra,² not Knuth.³)

⟨Variables of the program 4⟩ +≡
  j_prime: boolean; { is j a prime number? }

16. In order to make the odd-even trick work, we must of course initialize the variables j, k, and p[1] as follows.

famous demonstration that there are infinitely many prime numbers is strong enough to prove only that $p_{k+1} \le p_1 p_2 \cdots p_k + 1$. Advanced books on number theory come to our rescue by showing that much more is true; for example, “Bertrand’s postulate” states that $p_{k+1} < 2p_k$ for all k.

⟨Update variables that depend on ord 21⟩ ≡
  square ← p[ord] ∗ p[ord]; { at this point ord ≤ k }
See also section 25.
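The fragment of §21 above keeps square equal to p[ord] ∗ p[ord], and its comment says that ord ≤ k at that point, so p[ord] is already in the table. Since the sections that define ord are not reproduced here, the Python sketch below rests on an assumption of ours: that ord is the smallest index ≥ 2 with p[ord]² > j, so that an odd j need only be tested against p[2], …, p[ord − 1] (any composite j has a prime factor whose square is at most j). The function name `first_primes` and the starting value square = 9, chosen because p[2] = 3 is not yet known when the loop starts, are likewise our own. Bertrand’s postulate, quoted above, is presumably what guarantees that p[ord] has been computed whenever ord is advanced.

```python
def first_primes(m: int) -> list[int]:
    """Return the first m primes using an ord/square bound (a sketch under the
    assumptions stated above, not the program's actual sections 16-22)."""
    if m < 1:
        return []
    p = [0, 2]                    # p[1] = 2; p[0] is unused so indices match p[1 .. m]
    j, ord_, square = 1, 2, 9     # invariant: ord_ is the smallest index >= 2 with p[ord_]**2 > j
    while len(p) - 1 < m:         # len(p) - 1 plays the role of k
        while True:               # repeat ... until j_prime
            j += 2                # odd candidates only
            if j == square:       # j has reached p[ord_]**2, so advance ord_
                ord_ += 1         # (named ord_ to avoid shadowing Python's built-in ord)
                square = p[ord_] * p[ord_]   # p[ord_] is already in the table here
            # j is odd, so p[1] = 2 is skipped; a composite j has a factor among p[2] .. p[ord_-1]
            if all(j % p[n] != 0 for n in range(2, ord_)):
                break
        p.append(j)
    return p[1:]


if __name__ == "__main__":
    primes = first_primes(1000)
    print(primes[:10], primes[-1])
```

The `all(...)` call plays the part of the inner loop of §22; it stops at the first divisor it finds, much as the goto version of that section discussed later jumps to not_found.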
statistics-gathering is pointless unless someone is actually going to use the results. In order to make the instrumentation code optional, I include the word ‘stat’ just before any special code for statistics, and ‘tats’ just after such code; and I tell WEAVE to regard stat and tats as if they were begin and end. But stat and tats are actually simple macros. When I do want to gather the statistics, I define stat and tats to be null; but in a production version of the software, I make stat expand to ‘@{’ and tats expand to ‘@}’, where @{ and @} are special braces that TANGLE does not remove. Thus the optional code appears as a harmless comment in the PASCAL program.

WEB’s macros are allowed to have at most one parameter. Again, I did this in the interests of simplicity, because I noticed that most applications of multiple parameters could in fact be reduced to the one-parameter case. For example, suppose that you want to define something like

  mac(#1,#2) == m[#1*r+#2]

which WEB doesn’t permit. You can get essentially the same result with two one-parameter macros

  mac_tail(#) == #]
  mac(#) == m[#*r+mac_tail

since, e.g., ‘mac(a)(b)’ will expand into ‘m[a*r+b]’.

Here is another example that indicates some of the surprising generality of one-parameter macros: Consider the two definitions

  define two_cases(#)==case j of
    1:#(1); 2:#(2); end
  define reset_file(#)==reset(file@&#)

where ‘@&’ in the second definition is the concatenation operation that pastes two texts together. You can now say

  two_cases(reset_file)

and the resulting PASCAL output will be

  case j of
  1:reset(file1);
  2:reset(file2);
  end

In other words, the name of one macro can usefully be a parameter to another macro. This particular trick makes it possible to live with PASCAL compilers that do not allow arrays of files.

I. PORTABILITY

One of the goals of my TeX research has been to produce portable software, and the WEB system has been extremely helpful in this respect. Although my own work is done on a DEC-10 computer with Stanford’s one-of-a-kind operating system, the software developed with WEB has already been transported successfully to a wide variety of computers made by other manufacturers (including IBM, Control Data, XEROX, Hewlett-Packard), and to a variety of different operating systems for those machines. To my knowledge, no other software of such complexity has ever been transported to so many different machines. It seems likely that TeX will soon be operating on all but the smallest of the world’s computer systems.

To my surprise, the main bottleneck to portability of the TeXware has been the lack of suitable PASCAL compilers, because PASCAL has often been implemented without system programming in mind. Anybody who has a decent PASCAL compiler can install WEB (and all programs written in WEB) without great difficulty, essentially as follows:

1) Start with the three files WEAVE.WEB, TANGLE.WEB, and TANGLE.PAS. (The programs have not been copyrighted, so these files are not difficult to obtain.)
2) Run TANGLE.PAS through your PASCAL compiler to get a working TANGLE program.
3) Check your TANGLE by applying it to TANGLE.WEB; your output file should match TANGLE.PAS.
4) Apply your TANGLE to the file WEAVE.WEB, obtaining WEAVE.PAS; then apply PASCAL to WEAVE.PAS and you’ll have a working WEAVE system.
5) The same process applies to any software written in WEB, notably to TeX itself. (However, you need fonts and suitable output equipment in order to make proper use of TeX; that may be another bottleneck.)

Once you have TeX working, you can apply WEAVE and TeX to your WEB files, thereby getting program documents as illustrated above.

Notice that a TANGLE.PAS file is needed in order to get this “bootstrapping” process started. If you have just WEAVE.WEB and TANGLE.WEB, you can’t do the first step.

However, anybody who has looked seriously into the question of software portability will realize that my comments in the preceding paragraphs have been oversimplified. I have glossed over some serious problems that arise: Character sets are different; file naming conventions are different; special conventions are needed to interact with a user’s terminal; data is packed differently on different machines; floating-point arithmetic is always nonstandard and sometimes nonexistent; users want “friendly” interaction with existing programs for editing and spooling; etc., etc. Furthermore, many of the world’s PASCAL compilers are incredibly bizarre. Therefore it is quite naïve to believe that a single program TANGLE.PAS could actually work on very many different machines, or even that one single source file TANGLE.WEB could be adequate; some system-dependent changes are inevitable.

The WEB system caters to system-dependent changes in a simple but surprisingly effective way that I neglected to mention when I listed its other features. Both TANGLE and WEAVE are designed to work with two input files, not just one: In addition to a WEB source file like TEX.WEB, there is also a “change file” TEX.CH that contains whatever changes are needed to customize TeX for a particular system. (Similarly, the source files WEAVE.WEB and TANGLE.WEB are accompanied by WEAVE.CH and TANGLE.CH.)

Here’s how change files work: Each change has the form “replace $x_1 \ldots x_m$ by $y_1 \ldots y_n$,” for some $m \ge 1$ and $n \ge 0$; here $x_i$ and $y_j$ represent lines in the change file. The WEAVE and TANGLE programs read data from the WEB input file until finding a line that matches $x_1$; this line, and the $m - 1$ following lines, are replaced
by $y_1 \ldots y_n$. An error message is given if the m lines replaced did not match $x_1 \ldots x_m$ perfectly.

For example, the program PRIMES.WEB invokes a page procedure to begin a new page; but page was not present in Wirth’s original PASCAL and it is defined rather vaguely in the PASCAL standard. Therefore a system-dependent change may be needed here. A change file PRIMES.CH could be made by copying the line

  @d new_page==page

from Figure 2c and specifying one or more appropriate replacement lines.

The program TANGLE itself contains about 190 sections, and a typical installation will have to change about 15 of these. If you want to transport TANGLE to a new environment, you therefore need to create a suitable file TANGLE.CH that modifies 15 or so parts of TANGLE.WEB. (Examples of TANGLE.CH are provided to all people who receive TANGLE.WEB, so that each implementor has a model of what to do.) You need to insert your changes by hand into TANGLE.PAS, until you have a TANGLE program that works sufficiently well to support further bootstrapping. But you never actually change the master file TANGLE.WEB.

This approach has two important advantages. First, the same master file TANGLE.WEB is used by everybody, and it contains the basic logic of TANGLE that really defines the essence of tangling. The system-dependent changes do not affect any of the subtle parts of TANGLE’s control structures or data structures. Second, when the official TANGLE has been upgraded to a newer version, a brand new TANGLE.WEB will almost always work with the old TANGLE.CH, since changes are rarely made to the system-dependent parts. In other words, this dual-input-file scheme works when the WEB file is constant and the CH file is modified, and it also works when the CH file is constant but the WEB file is modified.

Change files were added to WEB about three months after the system was initially designed, based on our initial experiences with people who had volunteered to participate in portability experiments. We realized about a year later that WEAVE could be modified so that only the changed parts of a program would (optionally) be printed; thus, it’s now possible to document the changes by listing only the sections that are actually affected by the CH file that WEAVE has processed. We also generalized the original format of CH files, which permitted only changes that extended to the end of a section. These two important ideas were among the final enhancements incorporated into WEB83.

J. PROGRAMS AS WEBS

When I first began to work with the ideas that eventually became the WEB system, I thought that I would be designing a language for “top-down” programming, where a top-level description is given first and successively refined. On the other hand I knew that I often created major parts of programs in a “bottom-up” fashion, starting with the definitions of basic procedures and data structures and gradually building more and more powerful subroutines. I had the feeling that top-down and bottom-up were opposing methodologies: one more suitable for program exposition and the other more suitable for program creation.

But after gaining experience with WEB, I have come to realize that there is no need to choose once and for all between top-down and bottom-up, because a program is best thought of as a web instead of a tree. A hierarchical structure is present, but the most important thing about a program is its structural relationships. A complex piece of software consists of simple parts and simple relations between those parts; the programmer’s task is to state those parts and those relationships, in whatever order is best for human comprehension—not in some rigidly determined order like top-down or bottom-up.

When I’m writing a longish program like TANGLE.WEB or WEAVE.WEB or TEX.WEB, I invariably have strong feelings about what part of the whole should be tackled next. For example, I’ll come to a point where I need to define a major data structure and its conventions, before I’ll feel happy about going further. My experiences have led me to believe that a person reading a program is, likewise, ready to comprehend it by learning its various parts in approximately the order in which it was written. The PRIMES.WEB example illustrates this principle on a small scale; the decisions that Dijkstra made as he composed the original program² appear in the WEB documentation in the same order.

Top-down programming gives you a strong idea of where you are going, but it forces you to keep a lot of plans in your head; suspense builds up because nothing is really nailed down until the end. Bottom-up programming has the advantage that you continually wield a more and more powerful pencil, as more and more subroutines have been constructed; but it forces you to postpone the overall program organization until the last minute, so you might flounder aimlessly.

When I tear up the first draft of a program and start over, my second draft usually considers things in almost the same order as the first one did. Sometimes the “correct” order is top-down, sometimes it is bottom-up, and sometimes it’s a mixture; but always it’s an order that makes sense on expository grounds.

Thus the WEB language allows a person to express programs in a “stream of consciousness” order. TANGLE is able to scramble everything up into the arrangement that a PASCAL compiler demands. This feature of WEB is perhaps its greatest asset; it makes a WEB-written program much more readable than the same program written purely in PASCAL, even if the latter program is well commented. And the fact that there’s no need to be hung up on the question of top-down versus bottom-up—since a programmer can now view a large program as a web, to be explored in a psychologically correct order—is perhaps the greatest lesson I have learned from my recent experiences.

Another surprising thing that I learned while using WEB was that traditional programming languages had been causing me to write inferior programs, although I hadn’t realized what I was doing. My original idea was that WEB would be merely a tool for documentation, but I actually found that my WEB programs were better than the programs I had been writing in other languages. How could this be?
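Returning for a moment to the change files described in Section I: the rule “replace $x_1 \ldots x_m$ by $y_1 \ldots y_n$” can be sketched in Python as follows. This is an illustration under stated assumptions, not the code of WEAVE or TANGLE. The function `apply_changes` and the exception `ChangeError` are hypothetical, each change is given as a pair of line lists rather than in the real change-file syntax, changes are assumed to be applied in order with each search resuming after the previous replacement, and lines must match exactly.

```python
class ChangeError(Exception):
    """Raised when a change cannot be applied (hypothetical error type)."""


def apply_changes(web: list[str], changes: list[tuple[list[str], list[str]]]) -> list[str]:
    """Apply changes of the form 'replace x1 .. xm by y1 .. yn' (m >= 1, n >= 0)
    to the lines of a WEB file, returning the merged lines."""
    out: list[str] = []
    i = 0                                         # index of the next unread WEB line
    for xs, ys in changes:
        if not xs:
            raise ChangeError("each change must replace at least one line (m >= 1)")
        while i < len(web) and web[i] != xs[0]:   # copy lines until one matches x1
            out.append(web[i])
            i += 1
        if i == len(web):
            raise ChangeError(f"no line matches {xs[0]!r}")
        if web[i:i + len(xs)] != xs:              # x1 and the m - 1 following lines must agree
            raise ChangeError(f"the {len(xs)} lines starting at line {i + 1} do not match")
        out.extend(ys)                            # ... and are replaced by y1 .. yn
        i += len(xs)
    out.extend(web[i:])                           # the rest of the WEB file is copied unchanged
    return out


if __name__ == "__main__":
    primes_web = ["(earlier lines of PRIMES.WEB)", "@d new_page==page", "(later lines)"]
    # Hypothetical replacement line; the real one depends on the local PASCAL system.
    primes_ch = [(["@d new_page==page"], ["@d new_page==my_page_break"])]
    print("\n".join(apply_changes(primes_web, primes_ch)))
```

Because the WEB lines and the change lists are separate inputs, either one can be revised without touching the other, and a change that no longer matches the WEB file produces an error rather than being silently skipped.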
Well, imagine that you are writing a small subroutine that updates part of a data structure, and suppose that the updating takes only one or two lines of code. In practical programs, there’s often something that can go wrong, if the user’s input is incorrect, so the subroutine has to check that the input is correct before doing the update. Thus, the subroutine has the general form

  procedure update;
  begin if ⟨input data is invalid⟩ then
    ⟨Issue an error message and try to recover⟩;
  ⟨Update the data structure⟩;
  end.

A subtle phenomenon occurs in traditional programming languages: While writing the program for ‘⟨Issue an error message and try to recover⟩’, a programmer subconsciously tries to get by with the fewest possible lines of code, since the program for ‘⟨Update the data structure⟩’ is quite short. If an extensive error recovery is actually programmed, the subroutine will appear to have error-message printing as its main purpose. But the programmer knows that the error is really an exceptional case that arises only rarely; therefore a lengthy error recovery doesn’t look right, and most programmers will minimize it (without realizing that they are doing so) in order to make the subroutine’s appearance match its intended behavior. On the other hand when the same task is programmed with WEB, the purpose of update can be shown quite clearly, and the possibility of error recovery can be reduced to a mere mention when update is defined. When another section entitled ‘⟨Issue an error message and try to recover⟩’ is subsequently written, the whole point of that section is to do the best error recovery, and it becomes quite natural to write a better program as a result.

This fact—that WEB allows you to let each part of the program have its appropriate size, without distorting the readability of other parts—means that good programmers find their WEB programs better than their PASCAL programs, even though their PASCAL programs once looked like the work of an expert.

K. STYLISTIC ISSUES

I found that my style of using WEB evolved quite a bit during the first year. The general format, in which each section begins with commentary and ends with a formal program fragment, is extremely versatile; you have the freedom to say anything you want, yet you must make a decision about how you’ll do it. I imagine that different programmers will converge to quite different styles, but I would like to note down some of the things that have seemed to work best for me.

Consider first the question of macros versus section names. A named section, like ‘⟨Issue an error message and try to recover⟩’, is essentially the same as a parameterless macro; WEB provides both. I prefer to use parameterless macros for “small” things that can be embodied in a word or two, but named sections for longer portions of the program that merit a fuller description.

I usually start the name of a section with an imperative verb, but I give a declarative commentary at the beginning of a section. Thus, PRIMES.WEB says ‘8. Now that appropriate . . . ⟨Print table p 8⟩ ≡ . . .’; I wouldn’t do the opposite and say ‘8. Print the table. ⟨Code for printing 8⟩ ≡ . . .’.

The name of a section (enclosed in angle brackets) should be long enough to encapsulate the essential characteristics of the code in that section, but it should not be too verbose. I found very early that it would be a mistake to include all of the assumptions about local and global variables in the name of each section, even though such information would strictly be necessary to isolate that section as an independent module. The trick is to find a balance between formal and informal exposition so that a reader can grasp what is happening without being overwhelmed with detail.⁵

Another lesson I learned early in the game was that the name of a section should explicitly mention any nonstandard control structures, even though its data structures can often be left implied. Furthermore, if the control flow is properly explained, you can avoid the usual errors associated with goto statements; such statements can safely be introduced in a restrained but natural manner.

For example, §14 of the prime-printing example could be reprogrammed as follows, using ‘loop’ as a macro abbreviation for ‘while true do’:

⟨Increase j until it is the next prime number 14⟩ ≡
  loop begin j ← j + 2;
    ⟨Update variables that depend on j 20⟩;
    ⟨If j is prime, goto found 22⟩;
    end;
  found:

With this change, §22 could become

⟨If j is prime, goto found 22⟩ ≡
  n ← 2;
  while n < ord do
    begin ⟨If p[n] is a factor of j, goto not_found 26⟩;
    n ← n + 1;
    end;
  goto found;
  not_found:

if §26 changes in the obvious way. The resulting program will be more efficient on most machines; and I believe that it is actually easier to read and to write, in spite of the fact that two goto statements appear, because the labels have been used with appropriate interpretations of their abstract significance.

Of course, PASCAL makes it difficult to use goto statements, because Wirth decided that labels should be numeric, and that they should be declared in advance. If I were to introduce the goto statements as suggested, I would have to define numeric macros found and not_found, and I would have to insert ‘label found, not_found’ into the program at the right place. Such extra work is a bit of a nuisance, but it can be done in WEB without spoiling the exposition.

PASCAL has a few other misfeatures that prove to be inconvenient with respect to WEB exposition. The worst of these is the inability to declare local variables
12 submitted to THE COMPUTER JOURNAL
in the midst of a program or procedure. For example, a programmer often finds it most natural to define an integer variable when a for loop is introduced, but the rules of PASCAL insist that such a variable be declared rather far away from that for loop. My WEB programs overcome this problem by having sections like ‘⟨Local variables for xyzzy⟩’ whenever there’s a rather lengthy procedure ‘xyzzy’ whose local variables should not be declared all at once. But when a procedure is short, say only half a dozen sections long, there’s usually no harm in declaring its local variables in PASCAL style, because the entire text of the procedure will tend to appear on one or two adjacent pages of the documentation.

Another slightly awkward aspect of PASCAL is its treatment of semicolons. If you look closely at the prime-number example, you’ll see that I had to be a bit careful about where I put semicolons; sometimes they occur at the end of the expanded text of a section, but usually they don’t. With a little self-discipline, a person can learn to do this quite satisfactorily, but it is a nuisance until you get used to it.

L. ECONOMIC ISSUES

What does it cost to use WEB? Let’s look first at the lowest level, where computer costs are considered, because it is easy to make quantitative statements at this level. The running time to TANGLE a WEB file is approximately the same as the time needed to compile the resulting PASCAL program; hence the extra preprocessing does not cost much. Similarly, WEAVE doesn’t take long to produce a file for TeX. However, TeX needs a comparatively large amount of time to typeset the final document. For example, if we assume that each page requires four seconds, it will take four minutes to produce a 60-page document. The running time for WEAVE-plus-TeX is quite reasonable when you consider that your program is effectively being converted into a fairly substantial booklet; but the costs are sufficiently large to discourage remaking and reprinting such a booklet more than once or twice a day. When a new program is being developed, it is therefore customary to work with hardcopy documentation that is slightly obsolete, and to read the WEB source file itself when up-to-date information is required; the source file is sufficiently easy to read for such purposes.

The costs of WEB are more difficult to estimate at higher levels, but I have found to my surprise that the total time of writing and debugging a WEB program is no greater than the total time of writing and debugging an ALGOL or PASCAL program, even though my WEB programs are much better, and even though I am putting substantially more documentation into the programs. Therefore I have lately been using WEB for all of my programming, even for one-off jobs that I write “for my eyes only” just to explore occasional problems. The extra time I spend in preparing additional commentary is regained because the debugging time is reduced.

In retrospect, the fact that a “literate” program takes much less time to debug is not surprising, because the WEB language encourages a discipline that I was previously unwilling to impose on myself. I had known for a long time that the programs I construct for publication in a book, or the programs that I construct in front of a class, have tended to be comparatively free of errors, because I am forced to clarify my thoughts as I do the programming. By contrast, when writing for myself alone, I have often taken shortcuts that proved later to be dreadful mistakes. It’s harder for me to fool myself in such ways when I’m writing a WEB program, because I’m in “expository mode” (analogous to classroom lecturing) whenever a WEB is being spun. Ergo, less debugging time.

Now that I am writing all my programs in WEB, an unforeseen problem has, however, arisen: I suddenly have a collection of programs that seem quite beautiful in my own eyes, and I have a compelling urge to publish all of them so that everybody can admire these works of art. A nice little 10-page program can easily be written and debugged in an afternoon and evening; if I keep accumulating such gems, I’ll soon run out of storage space, and my office will be encrusted with webs of my own making. There is no telling what will happen if lots of other people catch WEB fever and start foisting their creations on each other. I can already envision the appearance of a new journal, to be entitled Webs, for the publication of literate programs; I imagine that it will have a large backlog and a large group of dedicated editors and referees.

M. RELATED WORK

Nothing about WEB is really new; I have simply combined a bunch of ideas that have been in the air for a long time. I would like to summarize in the next few paragraphs the things that had the greatest influence on my thinking as I put those pieces together.

George Forsythe wrote in 1966 that “A useful algorithm is a substantial contribution to knowledge. Its publication constitutes an important piece of scholarship.” [6] His comments have always inspired me to strive for excellence in programming, and they have played a major rôle in shaping my present view that it is worthwhile to consider every program as a work of literature.

The design of WEB was influenced primarily by the pioneering work of Pierre-Arnoul de Marneffe [7, 8], whose research on what he called “Holon Programming” has not received the attention it deserves. His work was, in turn, inspired by Arthur Koestler’s excellent treatise on the structure of complex systems and organisms [9]; thus we have another connection between programming and literature. A somewhat similar system was independently created by Edwin Towster [10].

I owe a great debt to Edsger Dijkstra, Tony Hoare, Ole-Johan Dahl, and Niklaus Wirth for opening my eyes to the importance of abstraction in the reading and writing of programs, and to Peter Naur for stressing the importance of a balance between formal and informal methods.

Tony Hoare provided a special impetus for WEB when he suggested in 1978 that I should publish my program for TeX. Since very few large-scale software systems were available in the literature, he had been trying to promote the publication of well-written programs.
Hoare’s suggestion was actually rather terrifying to me, and I’m sure he knew that he was posing quite a challenge. As a professor of computer science, I was quite comfortable publishing papers about toy problems that could be polished up nicely and presented in an elegant manner; but I had no idea how to take a piece of real software, with all the compromises necessary to make it useful to a large class of people on a wide variety of systems, and to open it up to public scrutiny. How could a supposedly respectable academic, like me, reveal the way he actually writes large programs? And could a large program be made intelligible? My previous attempts along these lines [11] were by now hopelessly out of date. I decided that this would be a good time to try out de Marneffe’s ideas; furthermore, the TeX system itself provided me with new tools for printing and format control, so I suspected that it would be possible to obtain state-of-the-art documentation by making proper use of typography.

It is interesting to reread some of the comments that Tony made ten years ago in his keynote address to the first ACM symposium on Principles of Programming Languages [12]:

Documentation must be regarded as an integral part of the process of design and coding. A good programming language will encourage and assist the programmer to write clear, self-documenting code, and even perhaps to develop and display a pleasant style of writing.

He foresaw many future trends, but not the impending improvements in typesetting quality:

It is of course possible for a compiler or service program to expand the abbreviations, fill in the defaults, and make explicit the assumptions. But in practice, experience shows that it is very unlikely that the output of a computer will ever be more readable than its input, except in such trivial but important aspects as improved indentation.

Typographic formatting of computer programs has a long tradition, originating with ALGOL and its immediate precursors. I’m not sure who made the first experiments, but I believe that the lion’s share of the credit for developing excellent programming-language typography belongs to two people: Peter Naur, who edited the ALGOL 60 report [13] and gave special care to its presentation; and Myrtle Kellington, who served for many years as executive editor of ACM publications and set the standards that have been adopted by other journals. The computing profession owes much to these people, who made published programs so much more readable than they would otherwise have been; the magnitude of their contribution can only be appreciated by people who submit computer programs to journals like Acta Arithmetica whose editors are unfamiliar with computer science. Bill McKeeman called attention to formatting issues when he published Algorithm 268, “ALGOL 60 reference language editor,” in 1965 [14]. There has been a flowering of such algorithms in recent years; the papers by Oppen [15] and by Rose and Welsh [16] are particularly noteworthy.

I began to design WEB in the spring of 1979, when I constructed a prototype system that was called DOC. Luis Trabb Pardo helped me to develop a suitable style of exposition at that time; then Ignacio Zabala Salelles gave DOC a thorough test when he prepared a full implementation of TeX in PASCAL. Zabala’s implementation was successfully transported to many different computers [17–20], and this experience was of immense value to me when I cast WEB into its present form in 1981. Since then many significant improvements have been suggested by my colleague David R. Fuchs, and I have also benefited from the experiences of a large number of outstanding people who volunteered to be guinea pigs for pre-released versions of TeX. It’s impossible for me to name everyone who has helped, but I would like to give special thanks to Arthur Samuel, Howard Trickey, Joe Weening, and Pierre MacKay for important contributions. I’m fortunate indeed to share a working environment with such stimulating people.

When I originally designed the WEB system, I spent about six weeks preparing the files TANGLE.WEB and WEAVE.WEB, during which time I was continually changing the language and trying different styles of exposition. (The programs were neither long nor complicated, but this was rather intensive work, so I didn’t get much else done during those six weeks. The first two weeks were actually spent drafting the first ten per cent of what is now TEX.WEB.) Then I spent about six tedious hours with a text editor, hand-simulating the behavior of TANGLE on TANGLE.WEB, so that I had a program TANGLE.PAS that was ripe for debugging. At first I had to correct errors both in TANGLE.WEB and TANGLE.PAS, but soon TANGLE was working well enough that I needed only TANGLE.WEB as a source file. Then WEAVE.WEB could be tangled and debugged too. The total time to create “Version 0” of the WEB system, including the language design and the time to debug the programs and write a brief manual for users, was about eight weeks; then enhancements were added at the rate of about one per month for the next 18 months. As a result of this experience I think it’s reasonable to state that a WEB-like system can be created from scratch in a fairly short time, for some other pair of languages besides TeX and PASCAL, by an expert system programmer who is conversant with both languages. Indeed, I spoke about WEB on a recent visit to London and one of the people in the audience decided to test this hypothesis; shortly afterwards I received an elegant report from Harold Thimbleby, who had just constructed an excellent system called Cweb, based on Troff/Nroff and C instead of TeX and PASCAL [21].

N. RETROSPECT AND PROSPECTS

Enthusiastic reports about new computer languages, by the authors of those languages, are commonplace. Hence I’m well aware of the fact that my own experiences cannot be extrapolated too far. I also realize that, whenever I have encountered a problem with WEB, I’ve simply changed the system; other users of WEB cannot operate under the same ground rules.

However, I believe that I have stumbled on a way of programming that produces better programs that are more portable and more easily understood and maintained; furthermore, the system seems to work with
large programs as well as with small ones. I’m pleased that my work on typography, which began as an application of computers to another field, has come full circle and become an application of typography to the heart of computer science; I like to think of WEB as a neat “spinoff” of my research on TeX. However, all of my experiences with this system have been highly colored by my own tastes, and only time will tell if a large number of other people will find WEB to be equally attractive and useful.

I made a conscious decision not to design a language that would be suitable for everybody. My goal was to provide a tool for system programmers, not for high school students or for hobbyists. I don’t have anything against high school students and hobbyists, but I don’t believe every computer language should attempt to offer all things to all people. A user of WEB needs to be good enough at computer science that he or she is comfortable dealing with several languages simultaneously. Since WEB combines TeX and PASCAL with a few rules of its own, WEB programs can contain WEB syntax errors, TeX syntax errors, PASCAL syntax errors, and algorithmic errors; in practice, all four types of errors occur, and a bit of sophistication is needed to sort out which is which. Computer scientists tend to be better at such things than other people. I have found that WEB programs can be debugged rapidly in spite of the profusion of languages, but I’m sure that many other intelligent people will find such a task difficult.

In other words, WEB seems to be specifically for the peculiar breed of people who are called computer scientists. And I’m pretty sure that there are also a lot of computer scientists who will not enjoy using WEB; some of us are glad that traditional programming languages have comparatively primitive capabilities for inserted comments, because such difficulties provide a good excuse for not documenting programs well. Thus, WEB may be only for the subset of computer scientists who like to write and to explain what they are doing. My hope is that the ability to make explanations more natural will cause more programmers to discover the joys of literate programming, because I believe it’s quite a pleasure to combine verbal and mathematical skills; but perhaps I’m hoping for too much. The fact that at least one paper has been written that is a syntactically correct ALGOL 68 program [22] encourages me to persevere in my hopes for the future. Perhaps we will even one day find Pulitzer prizes awarded to computer programs.

And what about the future of WEB? If the next year or so of trial use shows that a lot of other people besides myself become “hooked” on this method of programming, there will be many ways to incorporate the WEB philosophy into a really effective programming environment. For example, it will be worthwhile to produce a unified system that does both tangling and compiling, instead of using separate programs as in Figure 1; and it will also be worthwhile to carry the unification one step further, so that run-time debugging as well as syntactic debugging can be done entirely in terms of the WEB source language. Furthermore, a WEB-like system could be designed to incorporate additional modularization, so that it would be easier to compile different parts of a program independently. The new generation of graphic workstations makes it desirable to display selected program sections on demand, by using TeX only on the sections that are of current interest, instead of producing hardcopy for an entire document. And so on; a considerable amount of additional research and development will be appropriate if the idea of literate programming catches on.

Acknowledgements

The preparation of this paper was supported in part by the National Science Foundation under grants IST-8201926 and MCS-8300984, and by the System Development Foundation. ‘TeX’ is a trademark of the American Mathematical Society.
REFERENCES
1. D. E. Knuth, The TeXbook. Addison-Wesley, Reading, Mass., U.S.A. (1983).
2. O.-J. Dahl, E. W. Dijkstra, and C. A. R. Hoare, Structured Programming. Academic Press, London and New York (1972).
3. D. E. Knuth, Structured programming with go to statements. Computing Surveys 6, 261–301 (1974).
4. D. E. Knuth, The WEB System of Structured Documentation. Stanford Computer Science Report CS980 (September 1983).
5. P. Naur, Formalization in program development. BIT 22, 437–453 (1982).
6. G. E. Forsythe, Algorithms for scientific computation. Communications of the ACM 9, 255–256 (1966).
7. P. A. de Marneffe, Holon Programming. Univ. de Liege, Service D’Informatique (December, 1973).
8. P. A. de Marneffe and D. Ribbens, Holon Programming, in A. Günther et al. (eds.), International Computing Symposium 1973, Amsterdam, North-Holland (1974).
9. A. Koestler, The Ghost in the Machine. New York, Macmillan (1968).
10. E. Towster, A convention for explicit declaration of environments and top-down refinement of data. IEEE Transactions on Software Engineering SE–5, 374–386 (1979).
11. D. E. Knuth, Computer-drawn flow charts. Communications of the ACM 6, 555–563 (1963).
12. C. A. R. Hoare, Hints on Programming Language Design. Stanford Computer Science Report CS403 (October 1973).
13. P. Naur (ed.) et al., Report on the algorithmic language ALGOL 60. Communications of the ACM 3, 299–314.
14. W. M. McKeeman, Algorithm 268. Communications of the ACM 8, 667–668 (1965).
15. D. Oppen, Prettyprinting. ACM Transactions on Programming Languages and Systems 2, 465–483 (1980).
16. G. A. Rose and J. Welsh, Formatted programming languages. Software: Practice & Experience 11, 651–669 (1981).
17. I. Zabala and L. Trabb Pardo, The status of the PASCAL implementation of TeX. TUGboat 1, 16–17 (1980).
18. I. Zabala, TeX-PASCAL and PASCAL compilers. TUGboat 2 (1), 11–12 (1981).
19. I. Zabala, Some feedback from PTeX installations. TUGboat 2 (2), 16–19 (1981).
20. I. A. Zabala, How portable is PASCAL? Draft of paper in preparation (1982).
21. H. Thimbleby, Cweb. Preprint, University of York (August 1983).
22. C. H. Lindsey, ALGOL 68 with fewer tears. The Computer Journal 15, 176–188 (1972).

Received September 1983
For several years we and many of our colleagues have become more and
more concerned about the fact that libraries are increasingly unable to afford
the prices being charged by commercial publishers of scientific journals. In
October of 2003, Don Knuth wrote a long letter to the editorial board of the
Journal of Algorithms, attempting to explain the current state of affairs as
comprehensively and accurately as he could. His letter has now been posted
online at
http://www-cs-faculty.stanford.edu/~knuth/joalet.pdf
and we hope it will be read also by everyone else who is concerned with
publication of computer-science journals. In response to Knuth’s letter, the
entire editorial board ultimately decided to resign from the Journal of Algo-
rithms in favor of launching a new journal to be called ACM Transactions
on Algorithms (see next page).
Elsevier, the publisher of the Journal of Algorithms, intends to continue
publishing the journal, and papers currently in the pipeline will continue to
be handled by the outgoing editors. Here is Elsevier’s official statement.
The Managing Editors and the Publisher announce that the Ed-
itorial Board of the Journal of Algorithms has resigned per Jan-
uary 1 of this year because of an unresolved dispute concerning
the commercial aspects of scientific publishing. Papers which
have been submitted prior to this date will be refereed in the
usual way and published in the course of this year and next year.
Papers submitted after this date will be forwarded to the new
Editorial Board, which will be appointed shortly. It is expected
that this transition will not result in any additional publication
delay.
A New Journal: ACM Transactions on Algorithms
Hal Gabow
All Questions Answered
Donald Knuth
On October 5, 2001, at the Technische Universität München, Donald Knuth presented a lecture entitled “All Questions Answered”. The lecture drew an audience of around 350 people. This article contains the text of the lecture, edited by Notices senior writer and deputy editor Allyn Jackson.

Originally trained as a mathematician, Donald Knuth is renowned for his research in computer science, especially the analysis of algorithms. He is a prolific author, with 160 entries in MathSciNet. Among his many books is the three-volume series The Art of Computer Programming [TAOCP], for which he received the AMS Steele Prize for Exposition in 1986. The citation for the prize stated that TAOCP “has made as great a contribution to the teaching of mathematics for the present generation of students as any book in mathematics proper in recent decades.” The long awaited fourth volume is in preparation and some parts are available through Knuth’s website, http://www-cs-faculty.stanford.edu/~knuth/.

Knuth is the creator of the TeX and METAFONT languages for computer typesetting, which have revolutionized the preparation and distribution of technical documents in many fields, including mathematics. In 1978 he presented the AMS Gibbs Lecture entitled “Mathematical Typography”. The lecture was subsequently published in the Bulletin of the AMS [MT].

Knuth earned his Ph.D. in mathematics in 1963 from the California Institute of Technology under the direction of Marshall Hall. He has received the Turing Award from the Association for Computing Machinery (1974), the National Medal of Science (1979), the Adelsköld Medal from the Royal Swedish Academy of Sciences (1994), the Harvey Prize from the Technion of Israel (1995), the John von Neumann Medal from the Institute of Electrical and Electronics Engineers (1995), and the Kyoto Prize from the Inamori Foundation (1996). Since 1968 Knuth has been on the faculty of Stanford University, where he currently holds the title of Professor Emeritus of The Art of Computer Programming.

Allyn Jackson

Knuth: In every class that I taught at Stanford, the last day was devoted to “all questions answered”. The students didn’t have to come to class if they didn’t want to, but if they did, they could ask any question on any subject except religion or politics or the final exam. I got the idea from Richard Feynman, who did the same thing in his classes at Caltech, and it was always interesting to see what the students really wanted to know. Today I’ll answer any question on any subject. Do we allow religion or politics? I don’t know. But there is no final exam to worry about. I’ll try to answer without taking too much time so that we can get a lot of questions in.

So, who wants to ask the first question?… Well, if there are no questions…[Knuth makes as if to leave.]

Question: There was a special report to the American president, the PITAC report [PITAC], containing some recommendations. I am wondering whether you would be willing to comment on the priorities outlined in these recommendations: better software engineering, building a teraflop
The early origins of mathematics are discussed, emphasizing those aspects which seem to be of greatest interest from the standpoint of computer science. A number of old Babylonian tablets, many of which have never before been translated into English, are quoted.

Key Words and Phrases: history of computation, Babylonian tablets, sexagesimal number system, sorting

CR Categories: 1.2

One of the ways to help make computer science respectable is to show that it is deeply rooted in history, not just a short-lived phenomenon. Therefore it is natural to turn to the earliest surviving documents which deal with computation, and to study how people approached the subject nearly 4000 years ago. Archeological expeditions in the Middle East have unearthed a large number of clay tablets which contain mathematical calculations, and we shall see that these tablets give many interesting clues about the life of early “computer scientists.”
the answer lies somewhere between three and four years. The growth is linear in any one year, so the answer is

$$\frac{1.2^4 - 2}{1.2^4 - 1.2^3} \times 12 = 2 + \frac{33}{60} + \frac{20}{3600}$$

months less than four years. This is exactly what was computed [5, p. 63].

Note that here we have a problem with a nontrivial iteration, like a “WHILE” clause: The procedure is to form powers of $1 + r$, where $r$ is the interest rate, until finding the first value of $n$ such that $(1 + r)^n \ge 2$; then calculate

$$12\,\frac{(1 + r)^n - 2}{(1 + r)^n - (1 + r)^{n-1}},$$

and the answer is that the original investment will double in $n$ years minus this many months.

This procedure suggests that the Babylonians were familiar with the idea of linear interpolation. Therefore the trigonometric tables in the famous “Plimpton tablet” [6, pp. 38–41] were possibly used to obtain sines and cosines in a similar way.

$$\sum_{k=0}^{n} 2^k = 2^n + (2^n - 1)$$

and for the sum of a quadratic series

$$\sum_{k=1}^{n} k^2 = \left(\frac{1}{3} + \frac{2}{3}\,n\right) \sum_{k=1}^{n} k.$$

These formulas have not been found in Old-Babylonian texts.

Moreover, this same Seleucid tablet shows an increased virtuosity in calculation; for example, the roots to complicated equations like

$$xy = 1, \qquad x + y = 2,0,0,33,20$$

(solution: x = 1,0,45 and y = 59,15,33,20) are computed. Perhaps this problem was designed to demonstrate the use of the new zero symbol.

An extremely impressive example of the Seleucid era calculating ability appears in another Louvre Museum tablet [3, pp. 14–22]. It is a 6-place table of reciprocals, which begins thus:

By the power of Anu and Antum, whatever I have made with my hands, let it remain intact.
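The doubling-time procedure described above can be sketched with exact rational arithmetic. It forms powers of 1 + r until the first n with (1 + r)^n ≥ 2, then interpolates linearly inside year n. This is a modern illustration, not a transcription of any tablet. The helper names `doubling_time`, `to_sexagesimal`, `geometric_sum_holds` and `square_sum_holds` are hypothetical. The 20% rate (r = 1/5) is an assumption inferred from the worked figures 1.2³ and 1.2⁴. The last two helpers check the two Seleucid series formulas for small n.

```python
from fractions import Fraction


def doubling_time(r: Fraction) -> tuple[int, Fraction]:
    """Hypothetical helper: return (n, months) such that money growing by the
    factor 1 + r per year doubles in n years minus `months` months."""
    if r <= 0:
        raise ValueError("interest rate must be positive")
    growth = 1 + r
    previous, current, n = Fraction(1), growth, 1
    # The "WHILE" clause: form powers of 1 + r until (1 + r)**n >= 2.
    while current < 2:
        previous, current, n = current, current * growth, n + 1
    # Linear interpolation within year n:
    # 12 * ((1 + r)^n - 2) / ((1 + r)^n - (1 + r)^(n - 1))
    months = 12 * (current - 2) / (current - previous)
    return n, months


def to_sexagesimal(x: Fraction, max_places: int = 8) -> str:
    """Hypothetical helper: write a nonnegative rational as whole;d1,d2,...
    in base 60, stopping after max_places digits if it does not terminate."""
    whole = x.numerator // x.denominator
    frac = x - whole
    digits = []
    while frac and len(digits) < max_places:
        frac *= 60
        d = frac.numerator // frac.denominator
        digits.append(d)
        frac -= d
    if not digits:
        return str(whole)
    return f"{whole};" + ",".join(str(d) for d in digits)


def geometric_sum_holds(n: int) -> bool:
    # sum_{k=0}^{n} 2^k == 2^n + (2^n - 1)
    return sum(2**k for k in range(n + 1)) == 2**n + (2**n - 1)


def square_sum_holds(n: int) -> bool:
    # sum_{k=1}^{n} k^2 == (1/3 + 2n/3) * sum_{k=1}^{n} k
    lhs = sum(k * k for k in range(1, n + 1))
    rhs = (Fraction(1, 3) + Fraction(2 * n, 3)) * sum(range(1, n + 1))
    return lhs == rhs


n, months = doubling_time(Fraction(1, 5))   # assumed rate: 20 percent per year
print(n, months, to_sexagesimal(months))    # 4 23/9 2;33,20
print(all(geometric_sum_holds(k) for k in range(20)),
      all(square_sum_holds(k) for k in range(1, 20)))  # True True
```

Exact fractions are used rather than floats, so the result 23/9 months can be compared directly with 2 + 33/60 + 20/3600, the value computed on the tablet.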
Appendix
The 20 additional entries included in Inakibit's table are somewhat mysterious. In 19 of the cases, the number has a reciprocal with six digits or less; the exception is $3^{23}$ = 2,1,4,8,3,0,27, whose reciprocal has 17 sexagesimal digits.
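The facts about $3^{23}$ stated above can be checked mechanically. The sketch below writes an integer in base 60 and expands the reciprocal of a number whose only prime factors are 2, 3 and 5, so that the sexagesimal expansion terminates. It also checks the reciprocal pair from the Seleucid equation xy = 1 quoted earlier. That check assumes x = 1,0,45 is read as 1;0,45 and y = 59,15,33,20 as 0;59,15,33,20, which is an assumed placement of the sexagesimal point. The helper names `int_to_base60`, `from_base60` and `reciprocal_digits` are hypothetical.

```python
from fractions import Fraction


def int_to_base60(n: int) -> list[int]:
    """Sexagesimal digits of a nonnegative integer, most significant first."""
    digits = []
    while n:
        n, d = divmod(n, 60)
        digits.append(d)
    return digits[::-1] or [0]


def from_base60(digits: list[int], point: int = 0) -> Fraction:
    """Value of a digit list with `point` digits before the sexagesimal point
    (assumed convention: point=0 reads the digits as 0;d1,d2,...)."""
    value = Fraction(0)
    for d in digits:
        value = value * 60 + d
    return value / Fraction(60) ** (len(digits) - point)


def reciprocal_digits(n: int) -> list[int]:
    """Significant sexagesimal digits of 1/n, for n > 1 whose only prime
    factors are 2, 3, and 5 (so that the expansion terminates)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    m = n
    for p in (2, 3, 5):
        while m % p == 0:
            m //= p
    if m != 1:
        raise ValueError(f"1/{n} does not terminate in base 60")
    frac, digits = Fraction(1, n), []
    while frac:
        frac *= 60
        d = frac.numerator // frac.denominator
        digits.append(d)
        frac -= d
    while digits[0] == 0:  # drop leading zeros; the last digit is never 0
        digits.pop(0)
    return digits


print(int_to_base60(3**23))              # [2, 1, 4, 8, 3, 0, 27]
print(len(reciprocal_digits(3**23)))     # 17

x = from_base60([1, 0, 45], point=1)     # 1;0,45
y = from_base60([59, 15, 33, 20])        # 0;59,15,33,20
print(x * y, x + y == from_base60([2, 0, 0, 33, 20], point=1))  # 1 True
```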
Let's say that a sexagesimal number is a Q-number if it has six or less digits, while its reciprocal is finite and has more than six digits and begins with 1 or 2. There are 132 Q-numbers in all, only 19 of which appear in Inakibit's table. Five of these are $2^{17}$, $2^{23}$, $3^{11}$, 3TM, and $3^{22}$; they constitute all Q-numbers of the forms $2^n$, $3^n$, or $5^n$, and it is likely that such numbers appeared in special tables. However, the Q-number $6^{11}$ is not included, so it is not simply a matter of perfect powers being included. The three-digit Q-numbers $2 \cdot 3^{10}$ and $2^2 \cdot 3^9$ are excluded, so it is not a matter of including the smallest cases. The Q-numbers which do appear,
besides the five listed above, are 3951, 3105a, 31155; 213951, 2131'52,
213135a (but not 2131554); 31851, 2339, 2731°, 212311, 2183TM, 2203~, 29259,
2'2452. It is perhaps noteworthy that 31153 does not appear, but its
multiple 3u5 ~ does.
Since so many Q-numbers are missing, we may conclude that
Inakibit continued his table by giving the reciprocals of all six-
digit numbers up to 59,43,10,50,52,48, not taking advantage of
symmetry. Hence the complete table contained the reciprocals of
at least 721 six-digit numbers, and it probably filled three clay
tablets in all.
References
1. Aaboe, Asger A. Episodes from the Early History of Mathematics. Random House, New York, 1964, 133 pp.
2. Knuth, Donald E. Seminumerical Algorithms. Addison-Wesley, Reading, Mass., 1971 (second printing), 624 pp.
3. Neugebauer, O. Mathematische Keilschrift-Texte. In Quellen und Studien zur Geschichte der Mathematik, Astronomie, und Physik, Vol. A3, Pt. 1, 1935, 516 pp.
4. Neugebauer, O. Mathematische Keilschrift-Texte. In Quellen und Studien zur Geschichte der Mathematik, Astronomie, und Physik, Vol. A3, Pt. 2, 1935, 64 pp. plus 69 reproductions of tablets.
5. Neugebauer, O. Mathematische Keilschrift-Texte. In Quellen und Studien zur Geschichte der Mathematik, Astronomie, und Physik, Vol. A3, Pt. 3, 1937, 83 pp. plus 6 reproductions of tablets.
6. Neugebauer, O., and Sachs, A. Mathematical Cuneiform Texts. American Oriental Society, New Haven, Conn., 1945, 177 pp. plus 49 reproductions of tablets.
7. Neugebauer, O. The Exact Sciences in Antiquity. Brown U. Press, Providence, R.I., 1957 (second ed.), 240 pp. plus 14 photographic plates.
8. Thureau-Dangin, F. Textes Mathématiques Babyloniens. E.J. Brill, Leiden, The Netherlands, 1938, 243 pp.
9. van der Waerden, B.L. Science Awakening. Tr. by Arnold Dresden. P. Noordhoff, Groningen, The Netherlands, 1954, 306 pp.