
100 YEARS OF
MATH MILESTONES
The Pi Mu Epsilon Centennial Collection
Stephan Ramon Garcia Steven J. Miller
2010 Mathematics Subject Classification. Primary 00A08, 00A30, 00A35, 05-01, 11-01,
30-01, 54-01, 60-01.
For additional information and updates on this book, visit

www.ams.org/bookpages/mbk-121
Library of Congress Cataloging-in-Publication Data

Names: Garcia, Stephan Ramon, author. | Miller, Steven J., 1974- author.
Title: 100 years of math milestones : the Pi Mu Epsilon centennial collection / Stephan Ramon
Garcia, Steven J. Miller.
Other titles: One hundred years of math milestones | Pi Mu Epsilon centennial collection
Description: Providence, Rhode Island : American Mathematical Society, [2019] | Includes bibli-
ographical references and indexes.
Identifiers: LCCN 2019000982 | ISBN 9781470436520 (alk. paper)
Subjects: LCSH: Mathematics–United States–History. | Pi Mu Epsilon. | AMS: General – General
and miscellaneous specific topics – Philosophy of mathematics. msc | General – General and
miscellaneous specific topics – Methodology of mathematics, didactics. msc | Combinatorics
– Instructional exposition (textbooks, tutorial papers, etc.). msc | Number theory – Instruc-
tional exposition (textbooks, tutorial papers, etc.). msc | Functions of a complex variable –
Instructional exposition (textbooks, tutorial papers, etc.). msc | General topology – Instruc-
tional exposition (textbooks, tutorial papers, etc.). msc | Probability theory and stochastic
processes – Instructional exposition (textbooks, tutorial papers, etc.). msc
Classification: LCC QA27.U5 G37 2019 | DDC 510.9–dc23
LC record available at https://lccn.loc.gov/2019000982
Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting
for them, are permitted to make fair use of the material, such as to copy select pages for use
in teaching or research. Permission is granted to quote brief passages from this publication in
reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication
is permitted only under license from the American Mathematical Society. Requests for permission
to reuse portions of AMS publication content are handled by the Copyright Clearance Center. For
more information, please visit www.ams.org/publications/pubpermissions.
Send requests for translation rights and licensed reprints to reprint-permission@ams.org.

© 2019 by the authors. All rights reserved.
Printed in the United States of America.

∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at https://www.ams.org/
10 9 8 7 6 5 4 3 2 1 24 23 22 21 20 19
Stephan Ramon Garcia dedicates this book to his wife, Gizem Karaali, and
their children, Reyhan and Altay. Thanks also go to his parents for their constant
support and affection.
Steven Miller dedicates this book with thanks to his many colleagues and stu-
dents who assisted in writing this book, to his in-laws Jeffrey and Judy Gelfand for
providing a hospitable environment where many of these entries were written and
edited, and to his friends at Pi Mu Epsilon (especially Harold Reiter, a previous
editor of the Problem Section) for their support of this project.
Contents
Preface xi
Notation xiii
1913. Paul Erdős 1
1914. Martin Gardner 7
1915. General Relativity and the Absolute Differential Calculus 11
1916. Ostrowski’s Theorem 17

1917. Morse Theory, but Really Cantor 21
1918. Georg Cantor 27
1919. Brun’s Theorem 33
1920. Waring’s Problem 39
1921. Mordell’s Theorem 45
1922. Lindeberg Condition 51
1923. The Circle Method 57
1924. The Banach–Tarski Paradox 61
1925. The Schrödinger Equation 67
1926. Ackermann’s Function 71
1927. William Lowell Putnam Mathematical Competition 75

1928. Random Matrix Theory 79
1929. Gödel’s Incompleteness Theorems 85
1930. Ramsey Theory 89
1931. The Ergodic Theorem 95
1932. The 3x + 1 Problem 101
1933. Skewes’s Number 107
1934. Khinchin’s Constant 113
1935. Hilbert’s Seventh Problem 117
1936. Alan Turing 121
1937. Vinogradov’s Theorem 127
1938. Benford’s Law 131
1939. The Power of Positive Thinking 137
1940. A Mathematician’s Apology 141
1941. The Foundation Trilogy 145
1942. Zeros of ζ(s) 151
1943. Breaking Enigma 157
1944. Theory of Games and Economic Behavior 163
1945. The Riemann Hypothesis in Function Fields 169
1946. Monte Carlo Method 175
1947. The Simplex Method 181
1948. Elementary Proof of the Prime Number Theorem 187
1949. Beurling’s Theorem 193
1950. Arrow’s Impossibility Theorem 199

1951. Tennenbaum’s Proof of the Irrationality of √2 205
1952. NSA Founded 209
1953. The Metropolis Algorithm 215
1954. Kolmogorov–Arnold–Moser Theorem 221
1955. Roth’s Theorem 227
1956. The GAGA Principle 233
1957. The Ross Program 235
1958. Smale’s Paradox 241
1959. QR Decomposition 247
1960. The Unreasonable Effectiveness of Mathematics 251
1961. Lorenz’s Nonperiodic Flow 257
1962. The Gale–Shapley Algorithm and the Stable Marriage Problem 263
1963. Continuum Hypothesis 269

1964. Principles of Mathematical Analysis 275
1965. Fast Fourier Transform 281
1966. Class Number One Problem 287
1967. The Langlands Program 293
1968. Atiyah–Singer Index Theorem 299
1969. Erdős Numbers 305
1970. Hilbert’s Tenth Problem 311
1971. Society for American Baseball Research 317
1972. Zaremba’s Conjecture 323
1973. Transcendence of e Centennial 329
1974. Rubik’s Cube 333
1975. Szemerédi’s Theorem 339
1976. Four Color Theorem 345
1977. RSA Encryption 351
1978. Mandelbrot Set 357
1979. TeX 363
1980. Hilbert’s Third Problem 369
1981. The Mason–Stothers Theorem 375
1982. Two Envelopes Problem 381
1983. Julia Robinson 385
1984. 1984 391
1985. The Jones Polynomial 395
1986. Sudokus and Look and Say 401
1987. Primes, the Zeta Function, Randomness, and Physics 407
1988. Mathematica 413
1989. PROMYS 421
1990. The Monty Hall Problem 427
1991. arXiv 433
1992. Monstrous Moonshine 439

1993. The 15-Theorem 445

1994. AIM 451
1995. Fermat’s Last Theorem 457
1996. Great Internet Mersenne Prime Search (GIMPS) 463
1997. The Nobel Prize of Merton and Scholes 469

1998. The Kepler Conjecture 475
1999. Baire Category Theorem 481
2000. R 487
2001. Colin Hughes Founds Project Euler 493
2002. PRIMES in P 499
2003. Poincaré Conjecture 505
2004. Primes in Arithmetic Progression 511
2005. William Stein Developed Sage 519
2006. The Strong Perfect Graph Theorem 525
2007. Flatland 531
2008. 100th Anniversary of the t-Test 537

2009. 100th Anniversary of Brouwer’s Fixed-Point Theorem 543
2010. Carmichael Numbers 549
2011. 100th Anniversary of Egorov’s Theorem 555
2012. National Museum of Mathematics 561
Index of People 565
Index 571
Preface
In 2013, the second named author had the honor of succeeding Ashley Ahlin
and Harold Reiter as the editor of the Problem Department of the ΠME Journal.
This event essentially coincided with the 100th anniversary of Pi Mu Epsilon, so
Miller thought it would be fun and appropriate to recognize this milestone in some
way. Many others agreed. For example, Mike Pinter, from Belmont University in
Nashville, Tennessee, proposed the base-16 celebratory equation
    PMEMATH
  + SOCIETY
  ---------
    HUNDRED
(which was used in the Spring 2014 issue). Many readers submitted correct solu-
tions, the first being Jessica Lehr of Elizabethtown College. We leave the task of
determining all possible solutions as a fun exercise for you.
Being still somewhat young, energetic, and new to the job, while also gravely
worried about finding enough good problems for issue after issue (not yet aware of
the excellent submissions that would consistently arrive), Miller decided to celebrate
with one hundred problems related to important mathematical milestones of the
past century. Since one hundred is a large number of problems relative to the normal
operation of the Problem Department (there are typically five or six problems per
issue), he asked many colleagues for contributions. This resulted in four centennial
articles, which appeared in The Pi Mu Epsilon Journal in 2013–2014 (13 (2013),
no. 9, 513–534; 13 (2014), no. 10, 577–608; 14 (2014), no. 1, 65–99; and 14 (2014),
no. 2, 100–134).
The four articles were well received and there was strong interest in converting
them into a book. The first named author came on board early in the process
as a collaborator. Every entry was either expanded jointly by us from the four
centennial articles or simply written anew. The second option was an essential
step in converting the collection from a series of disjointed problems into a unified
whole. We have used the original descriptions as springboards to introduce a variety
of mathematical ideas, techniques, and applications. Whenever possible, we have
quoted primary sources. Concepts are often introduced early on and then threaded
through and expanded upon in later entries. The final result is a tour through much
of mathematics, with an emphasis on beauty, big ideas, and interesting problems.
There are several influential collections of problems that have motivated and
guided mathematics. Hilbert’s problems and the Clay Millennium Problems are
notable examples. We have a different emphasis here. Pi Mu Epsilon is an un-
dergraduate mathematics honor society and thus, in addition to being important,
the problems must be accessible to students. Although some of them do require
analysis or algebra, number theory or probability, as a whole we hope they will be
appealing to energetic and enthusiastic math majors of all stripes. We wanted to
create a collection that would motivate people who are still trying to decide what
to do with their lives, as well as those who already have.
No list can be complete and there are far too many items to celebrate. This book
necessarily misses many old favorites. It is largely a reflection of the personal tastes
and inclinations of the two authors. Accessibility counted far more than importance
in breaking the many ties, and thus the collection below is well represented with
problems that are somewhat recreational but also serve as springboards to great
mathematics.
We thank all the people who have helped us over the last several years. This in-
cludes the problem proposers, James M. Andrews and Avery T. Carr, who helped
edit some of the original collection of problems; Miles C. Fippinger, who helped
with some of the initial organization; and Ben Logsdon, who carefully read an early
draft. We owe particular gratitude to Zachary Glassman, who made numerous
Tikz drawings for some of the earlier entries. We learned many Tikz tricks and
techniques from him, without which many of the remaining illustrations would not
have been possible. In addition, we are greatly indebted to Yo Akiyama, Kather-
ine Blake, Paula Burkhardt-Guim, Max Chao-Haft, Amina Diop, Alexandre Gue-
ganic, Mark Hay, Bjørn Kjos-Hanssen, Forest Kobayashi, Scott Duke Kominers,
Jeffrey Lagarias, David Lee, Clayton Mizgerd, José Muñoz-López, Giebien Na, Carl
Pomerance, Harald Schilly, Zachary Siegel, Lily Shao, William A. Stein, Hong Suh,
Alexander Summers, James Tener, Gabe Udell, and Hunter Wieman for spotting
numerous mistakes, typos, and errors throughout the book or suggesting various
improvements to the text. The first named author also thanks Kathy Sheldon for
her considerable logistical support.
We were fortunate to work with a terrific staff at AMS (Marcia Almeida, Brian
Bartling, John Brady, Sergei Gelfand, Eriko Hironaka, Arlene O’Sean, and Court-
ney Rose), whose tireless efforts from the start of this project years ago to the
careful reading of the final draft greatly enhanced the book before you.
Although we are no longer young or energetic, it has been a fun and enlighten-
ing experience working on so many diverse topics and with so many distinguished
people. Read on, enjoy, and for those of you who someday aspire to be the Problem
Editor for PME, here is some useful advice: start assembling the next hundred
problems today!

Claremont, CA
May 2, 2019

Williams College
Williamstown, MA 01267
Carnegie Mellon University
Pittsburgh, PA 15213
May 2, 2019
Notation
• ∅: empty set
• |A|: cardinality of a set A
• $(\ldots)_b$: number in base-b
• log x: base-e logarithm of x
• $\log_b x$: base-b logarithm of x
• a|b: a divides b
• $\lfloor x \rfloor$: greatest integer function
• gcd: greatest common divisor
• lcm: least common multiple
• a ≡ b (mod m): congruence modulo m
• $\prod_{i=1}^{n} a_i$: product of $a_1, a_2, \ldots, a_n$
• N: the set {1, 2, 3, . . .} of natural numbers
• Z: the set {. . . , −2, −1, 0, 1, 2, . . .} of integers
• Q: the set of rational numbers
• R: the set of real numbers
• C: the set of complex numbers
• Re z: real part of the complex number z
• Im z: imaginary part of the complex number z
• ≅: equinumerosity (p. 28)
• f ∼ g: asymptotic equivalence (p. 33)
• π(x): the number of primes at most x (p. 33)
• Li(x): (offset) logarithmic integral of x (p. 107)
1913
Paul Erdős
Introduction
How many contacts do you have in your cell phone? How many friends do you
have on Facebook? Over the course of his life, Paul Erdős (1913–1996) published
over 1,500 mathematical papers with more than 500 different people. These are
staggering numbers, and it is fitting to begin with a problem related to him. He
worked in many fields, especially in combinatorics and number theory, often using
probabilistic methods. The Erdős number (see the problem from 1969 for more
details) measures a mathematician’s collaborative distance from Erdős; the famous
“Six Degrees of Kevin Bacon” game is based upon it.
Erdős is best known for solving difficult problems and making profound con-
jectures, as opposed to developing new theories. Many conjectures he formulated
remain open. Some have small cash prizes associated with them to attract attention
and encourage further investigation. One of his most famous conjectures deals with
finding arithmetic progressions contained in a given set of integers. An arithmetic
progression is a (finite or infinite) sequence of integers, such as 4, 9, 14, 19, 24, whose
terms differ by a fixed amount.
Let N = {1, 2, 3, . . .} denote the set of natural numbers, let
A = {n ∈ N : n is divisible by a prime congruent to 3 mod 4},
and let B = N\A. The first few primes congruent to 3 modulo 4 are
3, 7, 11, 19, 23, 31, 43, 47, 59, 67, 71, 79, 83, 103, 107, 127, 131, 139, . . . ,
so
A = {3, 6, 7, 9, 11, 12, 14, 15, 18, 19, 21, 22, 23, 24, . . .}
and
B = {1, 2, 4, 5, 8, 10, 13, 16, 17, 20, 25, . . .}.
Examining the numbers that are at most 25, we see A contains a lot of arithmetic
progressions, from short ones of length three (for example, 7, 11, 15) to long ones
of length seven (for example, 3, 6, 9, 12, 15, 18, 21). However, it is harder to find
progressions among the elements of B at most 25. A little work turns up many
of length three (for example, 2, 5, 8 or 4, 10, 16 or 1, 13, 25), but we do not have a
progression as long as seven. What happens if we look at the full sets A and B? Do
you think that there are arithmetic progressions of length $\ell$ for any finite $\ell$? Why
might the two sets behave differently?
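The observations about A and B up to 25 can be checked by brute force. The following is a minimal sketch, not part of the original problem; the function names `is_in_A` and `longest_progression` are made up for illustration, and the search is only practical for small bounds.

```python
def is_in_A(n):
    """Return True if n is divisible by a prime congruent to 3 mod 4."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            if p % 4 == 3:
                return True
            while n % p == 0:  # strip this prime factor and keep going
                n //= p
        p += 1
    # Whatever remains is 1 or a single prime factor.
    return n > 1 and n % 4 == 3


def longest_progression(numbers):
    """Return one longest arithmetic progression contained in `numbers`."""
    members = set(numbers)
    top = max(members)
    best = []
    for start in sorted(members):
        for step in range(1, top - start + 1):
            run = [start]
            while run[-1] + step in members:
                run.append(run[-1] + step)
            if len(run) > len(best):
                best = run
    return best


bound = 25
A = [n for n in range(1, bound + 1) if is_in_A(n)]
B = [n for n in range(1, bound + 1) if not is_in_A(n)]
print("longest in A:", longest_progression(A))
print("longest in B:", longest_progression(B))
```

Raising `bound` gives a feel for how the two sets behave, although no finite computation settles the question for the full sets.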
Centennial Problem 1913

Proposed by Craig Corsi and Steven J. Miller, Williams College.
Erdős suspected that any set of natural numbers that is “not too sparse” contains
“lots” of arithmetic progressions. More specifically, he conjectured that if
S ⊆ N and the reciprocal sum
$$\sum_{s \in S} \frac{1}{s}$$
diverges, then S contains arithmetic progressions of any given finite length. Currently
$5,000 is offered for the proof of the conjecture.
1913: Comments
More on the Erdős conjecture. It is important to note that the Erdős
conjecture is not an “if and only if” statement. A set of natural numbers may
contain arbitrarily long arithmetic progressions and have a convergent reciprocal
sum. An example is
1, 10, 11, 100, 101, 102, 1000, 1001, 1002, 1003, 1004, 10001, . . . , 10005, . . . .
Erdős’s conjecture only asserts that a divergent reciprocal sum is sufficient to en-
sure the existence of arbitrarily long arithmetic progressions. It is not a necessary
condition.
Notable progress on Erdős’s problem includes the celebrated Green–Tao the-
orem (see the 2004 entry), which states that the primes contain arbitrarily long
arithmetic progressions. That the sum of the reciprocals of the primes diverges is
an old result of Leonhard Euler (1707–1783); see p. 4 for a proof. Even though
the Green–Tao theorem is a special case of Erdős’s more general conjecture, it is a
profound one. It shows that a set of natural numbers as seemingly erratic as the
primes enjoys some occasional semblance of regularity.
While the proof of the Green–Tao theorem is beyond the scope of this book,
we can look at some famous sequences whose reciprocal sums converge, to see if
Erdős’s conjecture is reasonable. Two well-known examples are
$$\sum_{n=1}^{\infty} \frac{1}{2^n} = 1 \quad\text{and}\quad \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6};$$
see the notes for 1919 for a proof of the second identity.
Suppose there is a three-term arithmetic progression in the powers of 2, say
$2^a < 2^b < 2^c$. Since the two gaps between the three terms are the same,
$$2^b - 2^a = 2^c - 2^b,$$
or equivalently
$$2^{b+1} = 2^c + 2^a = 2^a (2^{c-a} + 1).$$
Since $b > a$, the left-hand side is divisible by a higher power of 2 than the right-hand
side, a contradiction. Thus, the longest arithmetic progression in this sequence is
of length 2 (which is not impressive). Now for perfect squares.
Arithmetic progressions of perfect squares. Imagine that we have a
three-term arithmetic progression of perfect squares, say
$$a^2 < b^2 < c^2.$$
Then there is a common difference $d$ so that
$$a^2 = b^2 - d \quad\text{and}\quad c^2 = b^2 + d.$$
Therefore,
$$a^2 + c^2 = 2b^2,$$
in which $a < b < c$. If $(a, b, c)$ is one such solution, then so is $(ma, mb, mc)$ for
$m = 1, 2, \ldots$. Thus, we only need to find a single primitive solution, in which
$\gcd(a, b, c) = 1$, in order to conclude that the set of perfect squares has infinitely
many arithmetic progressions of length three. A quick computer search turns up
several primitive solutions, such as (1, 5, 7), (7, 13, 17), and (7, 17, 23), which lead
to the arithmetic progressions
1, 25, 49 with d = 24,
49, 169, 289 with d = 120,
49, 289, 529 with d = 240.
What about quadruples of perfect squares in arithmetic progressions? See the notes
for the 2004 entry for the answer.
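The "quick computer search" mentioned above might look like the following sketch. It is one possible brute-force search, not the authors' program; the function name and the search bound are illustrative choices.

```python
from math import gcd, isqrt


def primitive_square_progressions(limit):
    """List primitive triples (a, b, c) with a < b < c, a^2 + c^2 = 2 b^2, b <= limit."""
    found = []
    for b in range(1, limit + 1):
        for a in range(1, b):
            c_squared = 2 * b * b - a * a
            c = isqrt(c_squared)
            # Keep the triple only if c is an integer and the triple is primitive.
            if c * c == c_squared and gcd(gcd(a, b), c) == 1:
                found.append((a, b, c))
    return found


for a, b, c in primitive_square_progressions(17):
    print(f"{a*a}, {b*b}, {c*c} with d = {b*b - a*a}")
```

With the bound 17 the search returns exactly the three triples quoted in the text; larger bounds produce further primitive solutions.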
Primes in arithmetic progressions. Erdős remarked that one does not have
to believe in a supreme being to be a mathematician, but one has to believe in The
Book, in which are collected the most elegant “aha” proofs of results. Martin Aigner
(1942– ) and Günter M. Ziegler (1963– ) compiled a beautiful approximation of The
Book [1]. The first chapter gives six proofs of the infinitude of primes, including
the shocking topological proof of Hillel Furstenberg (1935– ); see the 1955 entry.
After proving the infinitude of the primes, it is natural to study primes in
arithmetic progressions. This is a fascinating subject and a terrific window into
mathematics. For example, consider the following two statements (recall that two
integers a and b are relatively prime if gcd(a, b) = 1).
(a) Given two relatively prime natural numbers a and b, there is a prime congruent
to b modulo a (that is, there is a prime p such that p − b is a multiple of a).¹
(b) Given two relatively prime natural numbers a and b, there are infinitely many
primes congruent to b modulo a.
For instance, if a = 1000 and b = 123, then (a) asserts that there is at least one
prime ending in 123 (one such example is 1123). On the other hand, (b) says that
there are infinitely many primes ending in 123; this is much more difficult to prove.
Except it is not! The only way that we currently know to prove that there exists
a prime congruent to b modulo a is to show there are infinitely many such primes
(and hence there must be at least one). Proving that there are infinitely many
such primes is difficult; Peter Gustav Lejeune Dirichlet (1805–1859) succeeded in
the 1830s by introducing and developing properties of L-functions (generalizations
of the Riemann zeta function; see the 1967 entry).
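Statement (a) is easy to test numerically for a specific pair such as a = 1000 and b = 123, even though proving it in general requires Dirichlet's machinery. The sketch below uses a plain sieve of Eratosthenes; the function names and the bound 10000 are illustrative assumptions, and a finite search says nothing about statement (b).

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: return all primes p <= n (assumes n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # Cross out the multiples of p starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def primes_in_class(a, b, bound):
    """Primes p <= bound with p congruent to b modulo a."""
    return [p for p in primes_up_to(bound) if p % a == b % a]


ending_in_123 = primes_in_class(1000, 123, 10_000)
print(ending_in_123)
```

The first prime found is 1123, the example given in the text.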
¹ If a and b are not relatively prime, there can be at most one prime congruent to b modulo
a, and this happens precisely when b is prime. Thus, this case is uninteresting.
Let us look at elementary approaches to finding primes in arithmetic progressions.
Euclid’s proof of the infinitude of the primes goes like this. If there are
only finitely many primes $p_1, p_2, \ldots, p_n$, then division of $N = p_1 p_2 \cdots p_n + 1$ by
any $p_i$ leaves a remainder of 1. Thus, any prime factor of $N$ is not on our list, a
contradiction.
It is natural to ask if this argument extends to arithmetic progressions. Here
is a proof that there are infinitely many primes congruent to 3 (mod 4). If there
are only finitely many (say $p_1 = 3$ and $p_2, p_3, \ldots, p_n$), consider the natural number
$M = 4 p_2 p_3 \cdots p_n + 3$. It is not divisible by 2, by 3, or by any other prime in
our list. On the other hand, $M$ is congruent to 3 (mod 4). Since any product of primes
congruent to 1 (mod 4) is congruent to 1 (mod 4), it follows that $M$ has a prime factor
congruent to 3 (mod 4); that is, $M$ is divisible by one of the primes $p_1, p_2, \ldots, p_n$,
a contradiction.
How far can this method be pushed? See [5]; since this paper is hard to find, the
argument was reproduced in [6].
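Read constructively, the argument above is a recipe: given any finite list of primes congruent to 3 (mod 4) that starts with 3, the number M has a prime factor congruent to 3 (mod 4) that is not on the list. The following sketch runs that recipe a few times. The function names are hypothetical, and trial division is used only because the numbers involved are small; M grows quickly, so this is not a practical way to generate primes.

```python
def smallest_factor_3_mod_4(m):
    """Smallest prime factor of the odd number m that is 3 mod 4, or None."""
    p = 3
    while p * p <= m:
        if m % p == 0:
            if p % 4 == 3:
                return p
            while m % p == 0:  # discard prime factors that are 1 mod 4
                m //= p
        p += 2
    # What is left is 1 or a single prime.
    return m if m % 4 == 3 else None


def next_prime_3_mod_4(primes):
    """Given primes = [3, p2, ..., pn], return a new prime congruent to 3 mod 4."""
    product = 1
    for p in primes[1:]:  # skip p1 = 3, exactly as in the proof
        product *= p
    return smallest_factor_3_mod_4(4 * product + 3)


def generate(count):
    """Start from [3] and repeatedly apply the construction."""
    primes = [3]
    while len(primes) < count:
        primes.append(next_prime_3_mod_4(primes))
    return primes


print(generate(5))
```

Starting from the list [3], the first few values of M are 7, 31, 871 = 13 · 67, and 58159 = 19 · 3061, so the primes produced do not appear in increasing order.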
Sum of the reciprocals of the primes. Most mathematicians suspected
that the primes contain arbitrarily long arithmetic progressions. Why? Because
they believed in Erdős’s conjecture and because Euler proved in 1737 that the sum
of the reciprocals of the primes diverges. Their faith was rewarded in 2004 when
Ben Green (1977– ) and Terence Tao (1975– ) proved this special case of Erdős’s
conjecture [3]. While we cannot yet prove Erdős’s conjecture, there is a beautiful
elementary proof of Euler’s result [2] that uses an idea similar to Euclid’s proof of
the infinitude of the primes.
Let $p_n$ denote the $n$th prime number and suppose toward a contradiction that
$\sum_{n=1}^{\infty} \frac{1}{p_n}$ converges. Since the tail end of a convergent series tends to zero, let $K$
be so large that
$$\sum_{j=K+1}^{\infty} \frac{1}{p_j} < \frac{1}{2}.$$
Let $Q = p_1 p_2 \cdots p_K$ and note that none of the numbers
$$Q + 1,\ 2Q + 1,\ 3Q + 1,\ \ldots$$
is divisible by any of the primes $p_1, p_2, \ldots, p_K$. Now observe that
$$\sum_{n=1}^{N} \frac{1}{nQ + 1} \le \sum_{m=0}^{\infty} \Bigl( \sum_{j=K+1}^{\infty} \frac{1}{p_j} \Bigr)^{m} < \sum_{m=0}^{\infty} \Bigl( \frac{1}{2} \Bigr)^{m} = 2$$
for $N \ge 1$; the reason for the first inequality is due to the fact that the sum in the
middle, when expanded term-by-term, includes every term on the left-hand side.
This is a contradiction, since the series $\sum_{n=1}^{\infty} \frac{1}{nQ+1}$ diverges.
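The divergence proved above is extremely slow, which is easy to see numerically. The sketch below sums 1/p over the primes up to a few bounds and prints log log x next to each value for comparison. The comparison relies on Mertens' classical estimate, a standard fact not stated in this entry. The function name and bounds are illustrative.

```python
from math import log


def prime_reciprocal_sum(bound):
    """Return the sum of 1/p over all primes p <= bound (assumes bound >= 1)."""
    sieve = bytearray([1]) * (bound + 1)
    sieve[0:2] = b"\x00\x00"
    total = 0.0
    for n in range(2, bound + 1):
        if sieve[n]:
            total += 1.0 / n
            # Cross out multiples of the prime n (an empty slice if n*n > bound).
            sieve[n * n :: n] = bytearray(len(range(n * n, bound + 1, n)))
    return total


for exponent in range(1, 7):
    x = 10 ** exponent
    print(f"x = 10^{exponent}: sum = {prime_reciprocal_sum(x):.4f}, log log x = {log(log(x)):.4f}")
```

No amount of computation of this kind proves divergence; it only illustrates how slowly the partial sums grow.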
Bibliography
[1] M. Aigner and G. M. Ziegler, Proofs from The Book, 6th ed., see corrected reprint of the 1998
original [MR1723092]; including illustrations by Karl H. Hofmann, Springer, Berlin, 2018.
MR3823190
[2] J. A. Clarkson, On the series of prime reciprocals, Proc. Amer. Math. Soc. 17 (1966), 541,
DOI 10.2307/2035210. MR0188132
[3] B. Green and T. Tao, The primes contain arbitrarily long arithmetic progressions, Ann. of
Math. (2) 167 (2008), no. 2, 481–547, DOI 10.4007/annals.2008.167.481. MR2415379
[4] P. Hoffman, The man who loved only numbers: The story of Paul Erdős and the search for
mathematical truth, Hyperion Books, New York, 1998. MR1666054
[5] M. Ram Murty, Primes in certain arithmetic progressions, Journal of the Madras University
(1988), 161–169.
[6] M. R. Murty and N. Thain, Prime numbers in certain arithmetic progressions, Funct. Approx.
Comment. Math. 35 (2006), 249–259, DOI 10.7169/facm/1229442627. MR2271617
1914
Martin Gardner
Introduction
Few twentieth-century mathematical authors have written on such diverse sub-
jects as Martin Gardner (1914–2010), whose books, numbering over seventy, cover
not only numerous fields of mathematics but also literature, philosophy, pseudo-
science, religion, and magic. He is best known as a recreational mathematician, due
to the accessible and entertaining manner in which he wrote. This is an important
role and should not be overlooked or minimized, as it both draws people to study
mathematics and helps with public awareness and appreciation.
In the introduction to his first book of puzzles, Hexaflexagons, Probability Para-
doxes, and the Tower of Hanoi, he wrote:
There is not much difference between the delight a novice experiences
in cracking a clever brain teaser and the delight a mathematician experiences
in mastering a more advanced problem. Both look on beauty
bare—that clean, sharply defined, mysterious, entrancing order that
underlies all structure.
A philosophy major at the University of Chicago, Gardner worked as a reporter, a
yeoman in the Navy, and a writer for a children’s magazine before writing his first
article for Scientific American in 1956. The publisher enjoyed the article and asked
Gardner to turn it into a monthly puzzle column. The column ran for over twenty-
five years and spawned fifteen books, reaching and inspiring countless mathematical
hobbyists.
There are many problems Gardner popularized that would make excellent ad-
ditions to this collection. A good problem has to have many features: in addition to
being interesting, there should be something wonderful about the solution, some-
thing that forces you to stop, smile, and marvel at the beauty of the argument.
We decided upon a fun geometry question that meets these criteria. Its solution
highlights a powerful method: passing from simple cases to the general case. Before
getting to the question, let us look at a few examples of this method.
(a) Most students remember that $\cos(x + y)$ involves $\cos x \cos y$ and $\sin x \sin y$, but
do we add them or subtract? Let us suppose that
$$\cos(x + y) = a \cos x \cos y + b \sin x \sin y$$
and refine our guess; as long as the formula is of this general shape we can
determine $a$ and $b$ from special cases. When investigating special cases, try
the simplest. For example, if we take $x = y = 0$, then we see $a = 1$. Setting
$x = y = \pi/2$ gives $-1 = a \cdot 0^2 + b \cdot 1^2$, so $b = -1$. We remembered enough of
the formula to figure out the rest.¹
(b) The sum of the zeroth powers of the first $n$ positive integers is $n$. The sum of
the first powers is
$$1 + 2 + \cdots + n = \tfrac{1}{2} n^2 + \tfrac{1}{2} n;$$
this is a familiar exercise in mathematical induction. In fact, there are similar
formulas for sums of squares, cubes, and so forth. In general,
$$1^k + 2^k + \cdots + n^k = P_k(n), \qquad (1914.1)$$
in which $P_k(n)$ is a polynomial in $n$ of degree $k + 1$ with rational coefficients.
If you remember that the sum of the $k$th powers of the first $n$ positive integers has
this form, then you can evaluate (1914.1) for $n = 0, 1, 2, \ldots, k + 1$ and solve the
resulting linear equations for the unknown coefficients of $P_k(n)$.
(c) The method of undetermined coefficients from differential equations is another
example of this technique. When confronted with a differential equation, such
as
$$y''(t) + y(t) = 2te^t, \qquad (1914.2)$$
one makes an educated guess about the form of a particular solution $y_p(t)$.
Frequently, our guess has several undetermined coefficients (hence the name)
that we try to find. In (1914.2), we have the unknown function and its second
derivative on the left-hand side and a linear polynomial times an exponential
on the right. This suggests that we substitute $y_p(t) = ate^t + be^t$ into (1914.2)
and attempt to solve for the constants $a$, $b$. In this case, solving the resulting
linear equations for $a$ and $b$ yields the particular solution $y_p(t) = te^t - e^t$.
(d) We end with an example from complex analysis. If you have never seen this
before, it is a great way to discover the Cauchy–Riemann equations. Consider
a complex function $f(z)$, in which $z = x + iy$ and $i^2 = -1$. We can write
$f(z) = u(x, y) + iv(x, y)$ for two real-valued functions $u$ and $v$. The partial
derivatives of $u$ and $v$ enjoy a simple relationship of the form $u_x = a v_y$ and
$u_y = b v_x$ (where $u_x = \partial u/\partial x$ and similarly for the rest), with one of $a$, $b$ equal
to 1 and the other equal to −1. The difficulty is remembering where the minus
sign goes. To figure out the correct signs, take $f(z) = z^2$. For this function,
$$f(z) = (x + iy)^2 = (x^2 - y^2) + i(2xy);$$
thus, $u(x, y) = x^2 - y^2$ and $v(x, y) = 2xy$. Going through the calculations, we
see that $u_x = 2x$ and $v_y = 2x$, so $u_x = v_y$. A similar calculation shows that
$u_y = -v_x$.
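Example (b) translates directly into a small computation: evaluate the power sum at n = 0, 1, ..., k + 1 and solve the resulting linear system for the coefficients of the polynomial. The sketch below does this with exact rational arithmetic from Python's standard library. The function name and the Gauss-Jordan elimination are illustrative choices, not something prescribed by the text.

```python
from fractions import Fraction


def power_sum_coefficients(k):
    """Coefficients [c_0, ..., c_{k+1}] of P_k(n) = c_0 + c_1 n + ... + c_{k+1} n^(k+1)."""
    size = k + 2
    # One equation per sample point n = 0, 1, ..., k + 1:
    #   c_0 + c_1 n + ... + c_{k+1} n^(k+1) = 1^k + 2^k + ... + n^k
    rows = []
    for n in range(size):
        rhs = sum((Fraction(j) ** k for j in range(1, n + 1)), Fraction(0))
        rows.append([Fraction(n) ** e for e in range(size)] + [rhs])
    # Gauss-Jordan elimination over the rationals.
    for col in range(size):
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [x / pivot_value for x in rows[col]]
        for r in range(size):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][size] for i in range(size)]


for k in range(4):
    print(k, [str(c) for c in power_sum_coefficients(k)])
```

For k = 1 this recovers the coefficients 0, 1/2, 1/2 of the formula displayed in (b). The same guess-the-form-then-solve pattern underlies examples (a), (c), and (d).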
The idea above can be used to do more than just recover a forgotten formula;
it can help us discover something new. If we can show that a solution has to have
a certain form, then we can often determine the answer by investigating a special
case. This idea comes into play in our problem, where it turns out that a large class
of configurations all lead to the same answer. Thus, if we can solve the problem for
the simplest configuration, then we can solve it for all configurations! Of course,
it is often hard to show that all the different possibilities lead to the same answer
and that we need only deal with one case. Fortunately, this idea is still useful even
if we cannot prove the equivalence since we can use it as a starting point to guess
the correct solution.

Figure 1. The sphere with the removed cylinder.

¹ Along the same lines as the dictum on p. 187, the use of complex numbers provides a more
powerful method. Euler’s formula implies that $\cos(x + y) + i \sin(x + y) = e^{i(x+y)} = e^{ix} e^{iy} =
(\cos x + i \sin x)(\cos y + i \sin y) = (\cos x \cos y - \sin x \sin y) + i(\cos x \sin y + \sin x \cos y)$. Compare real
and imaginary parts to obtain the addition formulas for cosine and sine.

Proposed by Byron Perpetua, Williams College.
The following problem, which he popularized in the 1950s, is classic Gardner:
easily stated and solvable without advanced techniques, yet challenging and surpris-
ing. Take a solid sphere and drill a cylindrical hole 6 inches long through its center
(this means that the height of the cylinder is 6 inches; the caps on the bottom and
top, which are removed from the sphere when we drill our hole, are not counted);
see Figure 1. What is the remaining volume of the sphere? One approach is tedious
and slow; the other is clever and skips several computations. Hint: although the
problem seems to be missing necessary information, it would not be posed unless
it had a unique solution. While some effort is required to prove that all possible
realizations lead to the same answer, there is a particularly simple case that you
can solve by inspection.
1914: Comments
Solution to the problem. If we have a rough idea of the answer, checking a
special case can help us determine it precisely. Let us use this idea to attack the
problem from Gardner’s column. We therefore assume that the answer is indepen-
dent of the radius of the given sphere since that information is not given to us.
What would be a good choice for the radius of the sphere? An excellent option is
to have the diameter of the sphere equal 6, so the volume of the removed cylinder
is zero! The remaining volume is then that of the entire sphere of radius 3, namely
$\frac{4}{3}\pi \cdot 3^3 = 36\pi$ cubic inches. If instead of choosing the diameter to be 6 we
considered the general case, we would have to argue as in Figure 2. This is certainly
possible, but it is not fun.
Figure 2. Analysis of the removed cylinder from the sphere (figure labels: $\sqrt{R^2 - 9}$, $R - 3$, $R$, $3$).
Some calculus will yield the correct answer, but there is a simpler way!
Of course, the difficulty of this problem is proving that the answer is indepen-
dent of the radius of the initial sphere. However, if you are willing to accept this
fact (which is implicit in the formulation of the problem), we just need to find the
answer in one special case. We might as well choose the case that is the simplest.
This is a truly powerful method and it is well worth mastering.
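For readers who would like to see the independence of the radius without doing the calculus, here is a small numerical sketch. It assumes the standard formulas for the volume of a cylinder and of a spherical cap (height $h$, sphere radius $R$: $\pi h^2 (3R - h)/3$); the function name is our own.

```python
import math


def remaining_volume(R, hole_length=6.0):
    """Volume left after drilling a centered hole of the given length
    through a solid sphere of radius R (requires R >= hole_length / 2)."""
    half = hole_length / 2
    if R < half:
        raise ValueError("the sphere is too small for a hole of this length")
    sphere = 4.0 / 3.0 * math.pi * R**3
    # The removed cylinder has radius sqrt(R^2 - half^2) and height hole_length.
    cylinder = math.pi * (R**2 - half**2) * hole_length
    # Each removed cap has height R - half.
    cap_height = R - half
    cap = math.pi * cap_height**2 * (3 * R - cap_height) / 3
    return sphere - cylinder - 2 * cap


for R in (3.0, 4.0, 10.0, 100.0):
    print(R, remaining_volume(R), 36 * math.pi)
```

Every radius yields the same value, $36\pi$, which is exactly what the special case of diameter 6 predicts.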
Bibliography
[1] M. Gardner, Hexaflexagons, probability paradoxes, and the Tower of Hanoi, New Martin Gard-
ner Mathematical Library, vol. 1, Cambridge University Press, Cambridge; Mathematical As-
sociation of America, Washington, DC, 2008. Martin Gardner’s first book of mathematical
puzzles and games; second edition of The Scientific American book of mathematical puzzles
and diversions. MR2444876
[2] E. Peres, Martin Gardner: the mathematical jester, Mathematical lives, Springer, Berlin, 2011,
pp. 217–220, DOI 10.1007/978-3-642-13606-1_31. MR2743951
[3] J. J. O’Connor and E. F. Robertson, Martin Gardner, MacTutor History of Mathematics,
http://www-history.mcs.st-and.ac.uk/Biographies/Gardner.html.
1915
General Relativity
and the Absolute Differential Calculus
Introduction
Gregorio Ricci-Curbastro (1853–1925) developed a branch of mathematics
known as the absolute differential calculus in his study of geometrical quantities and
physical laws that are invariant under general coordinate transformations. The con-
cept of a tensor first appeared in Ricci’s work, although a restricted form of tensors
had been previously introduced in vector analysis. In 1901, Ricci and his student,
Tullio Levi-Civita (1873–1941), published a complete account of the methods of
absolute differential calculus and their applications [12]. Their work was a natural
extension of the mathematics of curved surfaces introduced by Gauss and devel-
oped by Riemann and others, and of the vector analysis developed by Gibbs and
Heaviside.
Albert Einstein’s special theory of relativity deals with the study of the dy-
namics of matter and light in frames of reference that move uniformly with respect
to each other, the so-called inertial frames. Those quantities that are invariant
under the (Lorentz) transformation from one frame to another are of fundamental
importance. They include the invariant interval between two events $(ct)^2 - x^2$, the
energy-momentum invariant $E^2 - (pc)^2$, and the frequency-wave number invariant
$\omega^2 - (kc)^2$. Here $c$ denotes the speed of light in free space. The special theory is
formulated in a gravity-free universe.
Ten years after introducing his special theory of relativity, Einstein (1879–1955)
published his crowning achievement, the general theory of relativity [6, 7]. This is
a theory of space-time and dynamics in the presence of gravity. The essential
mathematical methods used in the general theory are differential geometry and
the absolute differential calculus (which Einstein referred to as tensor analysis).
Einstein devoted more than five years to mastering the necessary mathematical
techniques. He corresponded with Levi-Civita, asking for his advice on applications
of tensor analysis.
A tensor is a set of functions, fixed in a coordinate system, that transform
under a change of the coordinate system according to definite rules. Each tensor
component in a given coordinate system is a linear, homogeneous function of the
components in another system. If there are two tensors with components that
are equal when both are written in one coordinate system, then they are equal
in all coordinate systems; these tensors are invariant under a transformation of
the coordinates [14]. Physical laws are true in their mathematical forms for all
observers in their own frames of reference (coordinate systems) and therefore the
laws are necessarily formulated in terms of tensors.
Figure 1. Gravitational lensing (figure labels: observed position, actual position, Sun during eclipse, observer).
Einstein’s belief that matter generates a curvature of space-time led him to

the notion that space-time is Riemannian, that is, locally Euclidean. The entire
curved surface can be approximated by tiling with flat frames. Einstein assumed
that in such locally flat regions, in which there is no appreciable gradient in the
gravitational field, a freely falling observer experiences all physical aspects of special
relativity; the effects of gravity are thereby locally removed. This assumption is
known as the principle of equivalence.
In special relativity, the energy-momentum invariant is of fundamental impor-
tance. It involves energy E, momentum p, and rest mass m:
$$E^2 - (pc)^2 = (mc^2)^2.$$
Einstein proposed that in general relativity, it is mass/energy that is responsible for
the curvature. He introduced the stress-energy tensor, well known in physics, to be
the quantity related to the curvature. He proposed that the relationship between
them is the simplest possible; they are proportional to each other [12]:
Curvature tensor = k (Stress-Energy tensor),
in which the constant k is chosen so that the equation agrees with Newton’s law
of gravity for the motion of low-velocity objects in weak gravitational fields ($k =
8\pi G/c^4$, in which $G$ is Newton’s constant).
In 1907, Einstein [5] combined his principle of equivalence with the theory
of special relativity (1905) and predicted that clocks run at different rates in a
gravitational potential, and that light rays bend in a gravitational field; see Figure
1. This work predated his introduction of the theory of general relativity (1915).
In general relativity, objects falling in a gravitational field are not being acted upon
by a gravitational force (in the Newtonian sense). Rather, they are moving along
geodesics (distance-minimizing paths) in the warped space-time that surrounds
massive objects. The observed deflection of light beams near the sun is a test of
the principle of equivalence. Tests of general relativity are an active part of research
in physics and astronomy. The problem below is related to one of these tests; for a
review of early tests of gravitational theory see [10].
The Schwarzschild line element, in the region of a spherical mass $M$ (obtained
as an exact solution of the Einstein field equations) is, in polar coordinates,
$$ds^2 = c^2\left(1 - \frac{2GM}{rc^2}\right)dt^2 - \left(1 - \frac{2GM}{rc^2}\right)^{-1} dr^2 - r^2(d\theta^2 + \sin^2\theta \, d\phi^2).$$
If $\chi = 2GM/rc^2$ is small, then the coefficient $(1 - \chi)^{-1}$ of $dr^2$ in the Schwarzschild
line element can be replaced by the leading terms of its binomial expansion, $1 + \chi$, to give
the “weak field” line element
$$ds_W^2 = (1 - \chi)(c \, dt)^2 - (1 + \chi) \, dr^2 - r^2(d\theta^2 + \sin^2\theta \, d\phi^2).$$
At the surface of the sun, the value of $\chi$ is $4.2 \times 10^{-6}$, so that the weak-field
approximation is valid for all gravitational phenomena in our solar system.
Consider a beam of light traveling radially in the weak field of a mass $M$. Then
$$ds_W^2 = 0 \ \text{(a light-like interval)} \quad \text{and} \quad d\theta^2 + \sin^2\theta \, d\phi^2 = 0,$$
which gives
$$0 = (1 - \chi)(c \, dt)^2 - (1 + \chi) \, dr^2.$$
The “velocity” of the light $v_L = dr/dt$, as determined by observers far from the
gravitational influence of $M$, is therefore
$$v_L = c \sqrt{(1 - \chi)/(1 + \chi)} < c$$
since $\chi > 0$. Observers in free fall near $M$ have $\chi = 0$ and hence measure the
speed of light to be $c$. Expanding the term $\sqrt{(1 - \chi)/(1 + \chi)}$ to first order in
$\chi = 2GM/rc^2$ provides the approximation
$$v_L(r) \approx c(1 - 2GM/rc^2 + \cdots).$$
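A quick numerical sanity check of this first-order expansion may be helpful. The sketch below compares $\sqrt{(1 - \chi)/(1 + \chi)}$ with $1 - \chi$; the function names are our own, and the sample values of $\chi$ are chosen only for illustration.

```python
import math


def light_speed_ratio(chi):
    """Exact weak-field ratio v_L / c = sqrt((1 - chi) / (1 + chi))."""
    return math.sqrt((1 - chi) / (1 + chi))


def first_order_ratio(chi):
    """First-order approximation v_L / c ~ 1 - chi."""
    return 1 - chi


for chi in (1e-2, 1e-4, 4.2e-6):
    exact = light_speed_ratio(chi)
    approx = first_order_ratio(chi)
    # The discrepancy is of second order in chi (roughly chi^2 / 2).
    print(chi, exact, approx, exact - approx)
```

The error shrinks like $\chi^2$, so for values of $\chi$ as small as those in our solar system the first-order term is all that matters.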
In geometrical optics, the refractive index $n$ of a material is $n = c/v_{\text{medium}}$, in
which $v_{\text{medium}}$ is the speed of light in the medium. We introduce the concept of
the refractive index of space-time $n_G(r)$ at a point $r$ in the gravitational field of a
mass $M$:
$$n_G(r) = c/v_L(r) \approx 1 + 2GM/rc^2.$$
The value of nG (r) increases as r decreases. This effect can be interpreted as an
increase in the “density” of space-time as M is approached.
As a plane wave of light approaches a spherical mass, those parts of the wave
front nearest the mass are slowed down more than those parts farthest from the
mass. The speed of the wave front is no longer constant along its surface, and
therefore the normal to the surface must be deflected. The deflection of a plane
wave of light by a spherical mass M of radius R, as it travels through space-time,
can be calculated in the weak-field approximation.

Proposed by Frank W. K. Firk, Yale University.
Show that in the weak-field approximation the total deflection $\Delta\alpha$ equals
$4GM/Rc^2$. This is Einstein’s famous prediction on the bending of light in a
gravitational field.
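To get a feel for the size of this effect, one can evaluate $\chi = 2GM/Rc^2$ and the deflection $4GM/Rc^2$ for a light ray grazing the sun. The sketch below is only an illustration: the physical constants are approximate values that we assume here (they do not come from the text), and the function names are our own.

```python
import math

# Approximate SI values; these are assumptions for illustration only.
G = 6.674e-11        # Newton's constant, m^3 kg^-1 s^-2
C_LIGHT = 2.998e8    # speed of light, m/s
M_SUN = 1.989e30     # mass of the sun, kg
R_SUN = 6.957e8      # radius of the sun, m


def chi(mass, r):
    """Weak-field parameter chi = 2GM / (r c^2)."""
    return 2 * G * mass / (r * C_LIGHT**2)


def deflection_arcsec(mass, radius):
    """Total deflection 4GM / (R c^2), converted from radians to arcseconds."""
    radians = 4 * G * mass / (radius * C_LIGHT**2)
    return math.degrees(radians) * 3600


print(chi(M_SUN, R_SUN))
print(deflection_arcsec(M_SUN, R_SUN))
```

With these inputs, $\chi$ at the solar surface comes out near $4.2 \times 10^{-6}$ and the deflection is roughly 1.75 arcseconds.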
1915: Comments
There are several nice points worth isolating from this problem and remarking
on. First, when a new theory is conjectured in the sciences, we test it to see
whether or not it can explain current observations. In the case of the general
theory of relativity, this was spectacularly done by its explanation of the anomalous
precession of the perihelion of Mercury; Isaac Asimov (1920–1992) has a beautiful
article on this [1]. If one is
lucky, the theory also predicts new phenomena. A terrific example of such a theory
is Bohr’s model for the hydrogen atom, which not only explained the observed
spectral lines but also predicted others previously unseen. Scientists before Einstein,
using Newtonian physics and particle models for light, posited a deflection of light
passing near a massive object. But Einstein obtained a much different value for
this deflection, which experiments then verified. Speaking of gravitational lensing,
did you know that the number of images produced by n coplanar point lenses is
at most 5n − 5? This was proved in 2008 using complex dynamics and harmonic
function theory [9].
The second great lesson here is that the usefulness of mathematics is not always
apparent. When asked about the utility of a new invention, Benjamin Franklin
(1706–1790) remarked, “What is the use of a new-born child?” The differential
geometry that underlies Einstein’s theories was not developed for relativity, but
it was available and could be used when the proper situation arose. While it can
take decades or more for some mathematics to find applications, such connections
often arise to the surprise of many of the involved parties. The 1940 entry involves
G. H. Hardy’s classic book, A Mathematician’s Apology; the reader is encouraged
to jump to that entry and reflect, while reading the excerpt, on the fact that many
of Hardy’s results have found a home in modern cryptography (and even in biology
[2]). That said, for those who would like a more down-to-earth answer here is one:
Einstein’s general theory of relativity is essential for the Global Positioning System
(GPS) to function properly and accurately [13].
Finally, it is important to remember that the jury is always out and we should
constantly explore additional ways to test a theory. It often takes decades or longer
to fully explore all the predictions and verify the results of these experiments. To
this end, there have been some exciting recent developments in the field of rela-
tivity. The Laser Interferometer Gravitational-Wave Observatory (LIGO) recently
announced [11] that they have verified another prediction of Einstein’s general the-
ory: the existence of gravitational waves. Of course, with monumental discoveries
such as this, one must wait for the results to be confirmed. To give the reader a
sense of how delicate these measurements are, researchers are looking for effects on
the order of one part in $10^{21}$. One article put this in perspective by saying this is
equivalent to squishing our galaxy to the height of a human [8].
Bibliography
[1] I. Asimov, The planet that wasn’t, The Magazine of Fantasy and Science Fiction (1975), May.
http://geobeck.tripod.com/frontier/planet.htm.
[2] H. E. Christenson and S. R. Garcia, G. H. Hardy: mathematical biologist, J. Humanist. Math.
5 (2015), no. 2, 96–102, DOI 10.5642/jhummath.201502.08.
http://scholarship.claremont.edu/jhm/vol5/iss2/8. MR3378780
[3] P. A. M. Dirac, General theory of relativity, reprint of the 1975 original, Princeton Landmarks
in Physics, Princeton University Press, Princeton, NJ, 1996. MR1373868
[4] A. Einstein, On the electrodynamics of moving bodies, Annalen der Physik 17 (1905), 891–921.
http://www.fourmilab.ch/etexts/einstein/specrel/www/. For more of Einstein’s papers
from this time period, see http://www.loc.gov/rr/scitech/SciRefGuides/einstein.html.
[5] A. Einstein, Über das Relativitätsprinzip und die aus demselben gezogene Folgerungen,
Jahrbuch Rad. 4 (1907), 410.
http://www.relativitycalculator.com/pdfs/Einstein_1907_Comprehensive_Essay_PartsI_II_III.pdf.
[6] A. Einstein, The foundation of the general theory of relativity, Annalen der Physik (1916).
http://web.archive.org/web/20060831163721/http://www.alberteinstein.info/gallery/pdf/CP6Doc30_English_pp146-200.pdf.
[7] A. Einstein, The meaning of relativity, reprint of the 1956 edition, Princeton University Press,
Princeton, NJ, 1988. MR1042572
[8] C. Hanna, What happens when LIGO texts you to say it’s detected one of Einstein’s
predicted gravitational waves, The Conversation, February 11, 2016.
http://theconversation.com/what-happens-when-ligo-texts-you-to-say-its-detected-one-of-einsteins-predicted-gravitational-waves-53259.
[9] D. Khavinson and G. Neumann, From the fundamental theorem of algebra to astrophysics: a
“harmonious” path, Notices Amer. Math. Soc. 55 (2008), no. 6, 666–675. MR2431564
[10] D. F. Lawden, An introduction to tensor calculus, relativity and cosmology, 3rd ed., John
Wiley & Sons, Ltd., Chichester, 1982. MR665917
[11] LIGO, Gravitational Waves Detected 100 Years After Einstein’s Prediction, LIGO News
Release, February 11, 2016. https://www.ligo.caltech.edu/news/ligo20160211.
[12] M. M. G. Ricci and T. Levi-Civita, Méthodes de calcul différentiel absolu et leurs applications
(French), Math. Ann. 54 (1900), no. 1-2, 125–201, DOI 10.1007/BF01454201. MR1511109
[13] T. Van Flandern, What the Global Positioning System tells us about relativity, in Open
Questions in Relativistic Physics (edited by F. Selleri), Apeiron (1998), 81–90.
[14] C. M. Will, in General Relativity (edited by S. W. Hawking and W. Israel), Chapter 2,
Cambridge University Press, 1979.
1916
Ostrowski’s Theorem
Introduction
The absolute value function gives the magnitude of a real or complex number.
However, there are other ways to define the “size” of a number. An absolute value
on a field $F$ is a real-valued function $|\cdot|$ that satisfies
(a) $|x| \ge 0$,
(b) $|x| = 0$ if and only if $x = 0$,
(c) $|xy| = |x||y|$, and
(d) $|x + y| \le |x| + |y|$.
József Kürschák proposed these axioms in 1912, although Kurt Hensel (1861–1941)
had started related research in 1897.
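The standard absolute value on the field $\mathbb{Q}$ of rational numbers is
$$|x|_0 = \begin{cases} x & \text{if } x \ge 0, \\ -x & \text{if } x < 0. \end{cases}$$
Another example is the trivial absolute value, defined by
$$|x| = \begin{cases} 1 & \text{if } x \ne 0, \\ 0 & \text{if } x = 0. \end{cases}$$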
There is an important type of absolute value on $\mathbb{Q}$ that leads to a notion of the
size of a number that is related to its arithmetic properties. Given a prime $p$ and a
nonzero rational number $x$, we may write $x = p^n a/b$, in which $n, a, b \in \mathbb{Z}$ and $a, b, p$
are pairwise relatively prime. The p-adic absolute value of $x$ is
$$|x|_p = \begin{cases} 0 & \text{if } x = 0, \\ p^{-n} & \text{if } x \ne 0 \text{ and } x = p^n a/b \text{ as above.} \end{cases}$$
The beautiful Artin–Whaples product formula
$$|x|_0 \prod_{p \text{ prime}} |x|_p = 1, \qquad x \in \mathbb{Q} \setminus \{0\}, \tag{1916.1}$$
relates the standard absolute value to all of the p-adic absolute values.
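The definitions above are easy to experiment with. Here is a minimal sketch using exact rational arithmetic; the function names are our own, and the trial-division helper is only intended for small inputs.

```python
from fractions import Fraction


def prime_divisors(n):
    """Primes dividing the nonzero integer n, found by trial division."""
    n = abs(n)
    primes = []
    d = 2
    while d * d <= n:
        if n % d == 0:
            primes.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        primes.append(n)
    return primes


def padic_abs(x, p):
    """p-adic absolute value of the rational number x, for a prime p."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    n = 0  # exponent of p in x = p^n * a/b
    num, den = abs(x.numerator), x.denominator
    while num % p == 0:
        num //= p
        n += 1
    while den % p == 0:
        den //= p
        n -= 1
    return Fraction(p) ** (-n)


def product_formula(x):
    """|x|_0 times the product of |x|_p over all primes p, for nonzero x.
    Only primes dividing the numerator or denominator contribute a factor != 1."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("the product formula requires x != 0")
    result = abs(x)
    for p in prime_divisors(x.numerator * x.denominator):
        result *= padic_abs(x, p)
    return result


x = Fraction(12, 35)
print([(p, padic_abs(x, p)) for p in (2, 3, 5, 7)])
print(product_formula(x))
```

For $x = 12/35$ the p-adic absolute values at 2, 3, 5, 7 are $1/4$, $1/3$, $5$, $7$, and multiplying them by $|x|_0 = 12/35$ gives exactly 1.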
We say that two absolute values $|\cdot|_1$ and $|\cdot|_2$ on a field $F$ are equivalent if there
is a $c > 0$ so that $|x|_1 = |x|_2^c$ for all $x \in F$. In 1916, Alexander Ostrowski (1893–1986)
proved what is now known as Ostrowski’s theorem: each absolute value on the
rational numbers is equivalent to the trivial absolute value, the standard absolute
value, or a p-adic absolute value. In other words, we have a complete description
of all possible ways to generalize the notion of “size” for rational numbers so that
the four axioms above hold.
The standard absolute value on $\mathbb{Q}$ is Archimedean; that is, for each $x \ne 0$ there
is an $N \in \mathbb{N}$ so that $|nx|_0 > 1$ for all $n \ge N$. In contrast, the p-adic absolute values
are non-Archimedean. Since the Archimedean property is, in a sense, “natural,”
one might use Ostrowski’s theorem to argue that the standard absolute value is the
most natural possible absolute value one can endow Q with.

Proposed by David Burt and Steven J. Miller, Williams College.
A number field is a finite field extension of $\mathbb{Q}$, such as $\mathbb{Q}[\sqrt{-5}] = \{a + b\sqrt{-5} :
a, b \in \mathbb{Q}\}$. Observe that unique factorization fails in its ring of integers $\mathbb{Z}[\sqrt{-5}]$ since
$$2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5})$$
and none of the factors divide any of the others in $\mathbb{Z}[\sqrt{-5}]$. Are there notions of
absolute values in this context? If so, what are they?
1916: Comments
The p-adic numbers. Each absolute value on $\mathbb{Q}$ defines a metric. The standard
metric on $\mathbb{Q}$ is
$$d_0(x, y) = |x - y|_0.$$
On the other hand, each prime number $p$ gives rise to the p-adic metric on $\mathbb{Q}$:
$$d_p(x, y) = |x - y|_p.$$
The real number system is the completion of Q with respect to the standard metric.
In the same way, for each prime p we complete Q with respect to the p-adic metric
and obtain the p-adic number system Qp . Just as the completion of Q with respect
to the standard metric is a field (namely R), one can show that Qp is a field. What
do the elements of Qp look like?
First let us examine how Z, the set of integers, sits inside of Q3 ; see Figure 1.
Modulo 3, the integers come in precisely three flavors: an integer is congruent to
exactly one of 0, 1, or 2 modulo 3. If $x \equiv y \pmod{3}$, then they are “pretty close”
to each other in $\mathbb{Q}_3$ since $3 \mid (x - y)$ and hence $|x - y|_3 \le \frac{1}{3}$. If $x \equiv y \pmod{9}$, then
they are even closer since $9 \mid (x - y)$ and hence $|x - y|_3 \le \frac{1}{9}$. Continuing in this
fashion, a famous fractal (a Sierpiński triangle; see the 1963 entry) emerges. This is
suggested by Figure 1(d). To picture how Z sits inside of Q3 , imagine iterating this
process “downward” infinitely many times; to picture Q3 itself, imagine iterating
“upward” too!
If you are baffled and confused, then we have done our job. The p-adic number
system is strange. For instance, the p-adic metric is an example of an ultrametric.
An ultrametric is a metric that satisfies the strong triangle inequality
d(x, z) ≤ max{d(x, y), d(y, z)}.
One consequence of this is that every triangle in $\mathbb{Q}_p$ is isosceles: if $|x - y|_p \ne
|z - y|_p$, then
$$|x - z|_p = \max\{|x - y|_p, |z - y|_p\}.$$
Even more baffling is the fact that every point in a p-adic open disk is a center of
that open disk! Try to prove these results.
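Before attempting a proof, it can be reassuring to test these claims on a finite sample. The sketch below checks the strong triangle inequality and the isosceles property for the 3-adic metric on a range of integers; it is evidence, not a proof, and the function names are our own.

```python
from fractions import Fraction
from itertools import product


def padic_abs_int(m, p):
    """p-adic absolute value of the integer m, for a prime p."""
    if m == 0:
        return Fraction(0)
    n = 0
    while m % p == 0:
        m //= p
        n += 1
    return Fraction(1, p**n)


def check_ultrametric(p, points):
    """Check, for every triple of points, that
    (1) d(x, z) <= max(d(x, y), d(y, z)), and
    (2) if d(x, y) != d(y, z), then d(x, z) equals the larger of the two."""
    for x, y, z in product(points, repeat=3):
        dxy = padic_abs_int(x - y, p)
        dyz = padic_abs_int(y - z, p)
        dxz = padic_abs_int(x - z, p)
        if dxz > max(dxy, dyz):
            return False
        if dxy != dyz and dxz != max(dxy, dyz):
            return False
    return True


print(check_ultrametric(3, range(-20, 21)))
```

Every triple of sample points passes both checks, so each such triangle has two sides of equal 3-adic length.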
Figure 1. Depiction of the integers in $\mathbb{Q}_3$. Panels: (a) location in $\mathbb{Q}_3$ of the integers congruent to 0, 1, 2 (mod 3); (b) location of the integers congruent to 0, 1, . . . , 8 (mod 9); (c) location of the integers congruent to 0, 1, . . . , 26 (mod 27); (d) location of the integers congruent to 0, 1, . . . , 80 (mod 81).
One of the most common mistakes in mathematics is to use a formula without
checking to see if its requirements are satisfied. If we ignore the fact that $2 > 1$,
then the geometric series formula suggests that
$$1 + 2 + 2^2 + 2^3 + 2^4 + \cdots = \frac{1}{1 - 2} = -1. \tag{1916.2}$$
This seems absurd: how can the sum of infinitely many positive numbers be nega-
tive? It cannot, if you insist on using the standard metric on Q. It does however
make sense 2-adically since
$$\left| \sum_{n=0}^{N-1} 2^n - (-1) \right|_2 = \left| \frac{1 - 2^N}{1 - 2} + 1 \right|_2 = \left| 1 - (1 - 2^N) \right|_2 = \left| 2^N \right|_2 = 2^{-N},$$
which tends to zero as $N \to \infty$. So the partial sums of our series converge to −1
with respect to the 2-adic metric. Thus, (1916.2) makes sense in Q2 ! From this
analysis we can extract an important lesson: whether or not something converges
depends on what we mean by “converges”. Can you show that
$$1 + 3 + 3^2 + 3^3 + 3^4 + \cdots = -\frac{1}{2}$$
in $\mathbb{Q}_3$? What is the closure of $\mathbb{Q} \cap (0, \infty)$ in $\mathbb{Q}_p$? Is it all of $\mathbb{Q}_p$?
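The convergence of these partial sums can be watched directly. The following sketch computes the p-adic distance between the $N$th partial sum of $1 + p + p^2 + \cdots$ and the value $1/(1 - p)$ suggested by the geometric series formula; it is a numerical illustration rather than a proof, and the function names are our own.

```python
from fractions import Fraction


def padic_abs(x, p):
    """p-adic absolute value of the rational number x, for a prime p."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    n = 0
    num, den = abs(x.numerator), x.denominator
    while num % p == 0:
        num //= p
        n += 1
    while den % p == 0:
        den //= p
        n -= 1
    return Fraction(p) ** (-n)


def distance_to_limit(p, N):
    """p-adic distance between 1 + p + ... + p^(N-1) and 1/(1 - p)."""
    partial_sum = Fraction(sum(p**n for n in range(N)))
    candidate_limit = Fraction(1, 1 - p)
    return padic_abs(partial_sum - candidate_limit, p)


for N in (1, 5, 10):
    print(N, distance_to_limit(2, N), distance_to_limit(3, N))
```

In both cases the distance after $N$ terms is $p^{-N}$, which tends to zero even though the partial sums grow without bound in the usual sense.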
Now that we have played around with p-adic arithmetic a little, a more concrete
description of $\mathbb{Q}_p$ is at hand. Each p-adic number can be expressed as an
infinite series of the form $\sum_{n=N}^{\infty} a_n p^n$ with each $a_n \in \{0, 1, 2, \ldots, p - 1\}$ for some
integer $N$ (the series converges with respect to the p-adic metric). At this point,
manipulating p-adic numbers is analogous to handling decimal expansions of real
numbers. Instead of powers $10^n$ with $n$ running from $N$ (usually positive) to $-\infty$,
we have powers $p^n$ with $n$ running from $N$ (potentially negative) to $+\infty$.
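As an illustration of such expansions, the sketch below computes the first few digits $a_0, a_1, a_2, \ldots$ of a rational number whose denominator is not divisible by $p$ (so that $N = 0$ suffices). The function name is our own, and the three-argument `pow` with exponent −1 assumes Python 3.8 or later.

```python
from fractions import Fraction


def padic_digits(x, p, count):
    """First `count` p-adic digits a_0, a_1, ... of the rational x.
    Assumes p is prime and does not divide the denominator of x."""
    x = Fraction(x)
    if x.denominator % p == 0:
        raise ValueError("denominator must not be divisible by p")
    digits = []
    for _ in range(count):
        # The digit is x reduced modulo p, using a modular inverse of the denominator.
        a = (x.numerator * pow(x.denominator, -1, p)) % p
        digits.append(a)
        # x - a is divisible by p; divide it out and continue with the next digit.
        x = (x - a) / p
    return digits


print(padic_digits(5, 3, 4))                 # 5 = 2 + 1*3
print(padic_digits(-1, 2, 6))                # -1 = 1 + 2 + 4 + ... in Q_2
print(padic_digits(Fraction(-1, 2), 3, 6))   # -1/2 = 1 + 3 + 9 + ... in Q_3
```

The last two lines recover the digit strings consisting entirely of ones, matching the two geometric series discussed above.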
Of course, Qp is good for something besides mathematical parlor tricks. For
instance, the famous Hasse–Minkowski local-global principle for quadratic forms
(see the 1966 and 1993 entries) asserts that a quadratic form with rational
coefficients has a nontrivial rational zero if and only if it has a nontrivial zero in $\mathbb{R}$
and in each $\mathbb{Q}_p$. The unity suggested by the product formula (1916.1) is no illusion!
Bibliography
[1] J. E. Holly, Pictures of ultrametric spaces, the p-adic numbers, and valued fields, Amer. Math.
Monthly 108 (2001), no. 8, 721–728, DOI 10.2307/2695615.
https://www.colby.edu/math/faculty/Faculty_files/hollydir/Holly01.pdf. MR1865659
[2] A. Ostrowski, Über einige Lösungen der Funktionalgleichung ψ(x) · ψ(y) = ψ(xy) (German),
Acta Math. 41 (1916), no. 1, 271–284, DOI 10.1007/BF02422947.
http://link.springer.com/article/10.1007%2FBF02422947. MR1555153
[3] W. Stein, Introduction to Algebraic Number Theory, May 2005,
http://wstein.org/129-05/notes/129.pdf.
1917
Morse Theory, but Really Cantor
Introduction
Marston Morse (1892–1977) was inspired by the work of Jacques Hadamard
(1865–1963), Henri Poincaré (1854–1912), and his advisor George Birkhoff (1884–
1944). In choosing a topic for his thesis, he wished to combine the fields of analysis
and geometry, a theme that continued throughout his life’s work. An entire branch
of mathematics, Morse theory, is named after him.
The shortest path between two points in a plane is a straight line segment, and
straight lines have constant slope. Now consider two points on a surface. The
analogue for the straight line is a curve called a geodesic. The analogue for constant
slope is that the tangent vectors to the curve remain parallel as they are transported
along the curve. For example, on a sphere the geodesic between two points is the
arc of the great circle going through them; see Figure 1. Morse often focused on
surfaces with negative curvature, such as the “pair of pants” in Figure 2(a). In
his 1917 thesis he proved the existence of certain types of nonperiodic geodesics on
surfaces of negative curvature; for more information, see Morse’s article [2].
On a less happy note, 1917 was the year when Georg Cantor (1845–1918)
entered the sanatorium in which he ultimately died. We have a lot of things to say
about the work of Cantor, so much so that he has snuck into this entry even though
the entire 1918 entry is devoted to him!
Figure 1. Geodesics on the sphere are great circles.
Figure 2. If you have infinitely many pairs of pants, you could try this at home. (a) A topological pair of pants. (b) Add a pair of pants to each “leghole” and continue indefinitely.
Define a sequence of subsets $C_n$ of $[0, 1]$ according to the following scheme:
$$C_0 = [0, 1],$$
$$C_1 = [0, \tfrac{1}{3}] \cup [\tfrac{2}{3}, 1],$$
$$C_2 = [0, \tfrac{1}{9}] \cup [\tfrac{2}{9}, \tfrac{1}{3}] \cup [\tfrac{2}{3}, \tfrac{7}{9}] \cup [\tfrac{8}{9}, 1],$$
and so forth. For $n \in \mathbb{N}$, the set $C_n$ is obtained from $C_{n-1}$ by removing the middle
third of every closed interval contained in $C_{n-1}$; see Figure 3. The Cantor set is
$$C = \bigcap_{n=0}^{\infty} C_n,$$
which is what is “left over” after removing the open intervals $(\tfrac{1}{3}, \tfrac{2}{3})$, $(\tfrac{1}{9}, \tfrac{2}{9})$, $(\tfrac{7}{9}, \tfrac{8}{9})$, . . .
from $[0, 1]$. To pass from stage $C_{n-1}$ to $C_n$ we remove $2^{n-1}$ open intervals of length
$1/3^n$. Thus, the total length of the omitted intervals is
$$\sum_{n=1}^{\infty} \frac{2^{n-1}}{3^n} = 1.$$
What remains behind, namely C, has Lebesgue measure zero; that is, it has zero
length. The Cantor set is a fractal, a set that demonstrates self-similarity: it con-
tains infinitely many scaled copies of itself. Moreover, C is a compact, uncountable
(see the 1918 and 1999 entries), nowhere dense, and totally disconnected subset of
[0, 1].
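The construction is easy to carry out with exact fractions. The sketch below builds the list of closed intervals making up $C_n$ and compares the length removed so far with the partial sums of the series above; the function names are our own.

```python
from fractions import Fraction


def cantor_stage(n):
    """Closed intervals (a, b) whose union is C_n, listed from left to right."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined.append((a, a + third))   # keep the left third
            refined.append((b - third, b))   # keep the right third
        intervals = refined
    return intervals


def total_length(intervals):
    """Sum of the lengths of the given intervals."""
    return sum(b - a for a, b in intervals)


for n in range(6):
    removed = 1 - total_length(cantor_stage(n))
    series = sum(Fraction(2 ** (k - 1), 3**k) for k in range(1, n + 1))
    print(n, len(cantor_stage(n)), removed, series)
```

At stage $n$ there are $2^n$ intervals of total length $(2/3)^n$, so the removed length $1 - (2/3)^n$ agrees with the partial sums and tends to 1.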
Two metric spaces $(X, d_X)$ and $(Y, d_Y)$ are homeomorphic if there is a continuous
bijection $f : X \to Y$ whose inverse $f^{-1} : Y \to X$ is continuous; the functions $f$
and $f^{-1}$ are homeomorphisms. The notion of homeomorphism provides an equivalence
relation among metric spaces. One often considers homeomorphic metric
spaces to be “the same.”
Figure 3. The sets $C_0, C_1, \ldots, C_6$.
The following problem is related, perhaps tangentially, to the work of both
Morse and Cantor. It concerns a specific surface of negative curvature that lies
at the intersection of analysis and topology, a recurring theme in much of Morse’s
work [4, p. 29].

Proposed by Joanne Snow, Colleen Hoover, and Steven Broad at Saint
Mary’s College.
Let $C \subset \mathbb{R} \subset \mathbb{R}^2$ be the Cantor set, embedded in $\mathbb{R}^2$. Show that $\mathbb{R}^2 \setminus C$ is
homeomorphic to the surface pictured in Figure 2(b).
1917: Comments
The Brunn–Minkowski theorem and Cantor dust. It seems appropriate
to spend a few pages discussing some little-known, but extremely interesting, prop-
erties of the Cantor set. Much of this can be found in [3]. A famous result that
combines arithmetic properties of sets with topological and measure-theoretic properties
is the Brunn–Minkowski theorem. Let $n \in \mathbb{N}$, let $A$ and $B$ be two nonempty,
compact subsets of $\mathbb{R}^n$, and let
$$A + B = \{a + b : a \in A, \ b \in B\}.$$
Then
$$(m(A))^{1/n} + (m(B))^{1/n} \le (m(A + B))^{1/n},$$
in which $m(S)$ denotes the Lebesgue measure of $S$.¹ For $n = 1$, we have
$$m(A) + m(B) \le m(A + B) \tag{1917.1}$$
for any nonempty, compact subsets $A, B \subseteq \mathbb{R}$. How large can the gap between the
left- and right-hand sides of the inequality be? It turns out that $m(A + B)$ can be
made arbitrarily large, even if $m(A) = m(B) = 0$. This is a consequence of the fact
that
$$C + C = [0, 2]. \tag{1917.2}$$
Figure 4. The set $C \times C$ is sometimes called Cantor dust.
To prove (1917.2), it suffices to show that for each $b \in [0, 2]$, the line $\ell$ defined
by $y = -x + b$ intersects the Cantor dust $C \times C$; see Figure 4. Indeed, if $\ell$ contains
a point $(x_0, y_0) \in C \times C$, then $b = x_0 + y_0 \in C + C$. The Cantor dust can be
constructed iteratively in a manner similar to the Cantor set. Start with a unit
square, subdivide it into 9 squares with side length 1/3, remove the central “plus
sign” consisting of 5 squares, and iterate the process.
Since the slope of the line is −1, it intersects at least one of the four corner
squares in stage one; call it $S_1$. Similarly, $\ell$ intersects one of the corner squares of
$S_1$ in stage two; call it $S_2$. Continuing in this manner, we obtain a decreasing sequence
of closed squares $S_1 \supset S_2 \supset \cdots$. Consequently, $\bigcap_{n=1}^{\infty} S_n$ consists of a single point
$(x_0, y_0) \in C \times C$.² We have $(x_0, y_0) \in \ell$ since $\ell$ is closed and contains points that
are arbitrarily close to $(x_0, y_0)$. Returning to (1917.1), we may let $\alpha > 0$ and let
$A = B = \{\frac{\alpha}{2} c : c \in C\}$ be a scaled copy of the Cantor set, so that $m(A) = m(B) = 0$
and $m(A + B) = \alpha$.
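The identity $C + C = [0, 2]$ can be made plausible by checking the finite stages of the construction. The sketch below verifies, with exact fractions, that the sums of the intervals making up $C_n$ cover $[0, 2]$ without gaps. This is a check of $C_n + C_n = [0, 2]$ for small $n$ only, not a proof for $C$ itself, and the function names are our own.

```python
from fractions import Fraction


def cantor_stage(n):
    """Closed intervals (a, b) whose union is C_n, listed from left to right."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined.append((a, a + third))
            refined.append((b - third, b))
        intervals = refined
    return intervals


def sumset_is_full(n):
    """Return True if C_n + C_n equals [0, 2]."""
    stage = cantor_stage(n)
    # [a, b] + [c, d] = [a + c, b + d]; sweep the resulting intervals left to right.
    pieces = sorted((a + c, b + d) for a, b in stage for c, d in stage)
    reach = Fraction(0)
    for lo, hi in pieces:
        if lo > reach:
            return False  # a gap: no sum lands strictly between reach and lo
        reach = max(reach, hi)
    return reach == 2


print([sumset_is_full(n) for n in range(5)])
```

By contrast, $C_n$ itself has total length $(2/3)^n$, so it leaves most of $[0, 1]$ uncovered even though the sumset fills all of $[0, 2]$.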
The Cantor surjection theorem. Another surprising result about the Can-
tor set is the Cantor surjection theorem. If (X, d) is a compact metric space, then
there is a continuous surjection f : C → X (see the 1918 entry). Thus, every
compact metric space is the continuous image of the Cantor set. This does not
contradict anything about connectedness: although the continuous image of a connected
set is connected, the continuous image of a disconnected set can very well
be connected.
¹ Think of $m(S)$ as the “length” of $S$. That is, until you read the notes to the 1924 entry!
² In a complete metric space, the intersection of a sequence $A_1 \supseteq A_2 \supseteq \cdots$ of nested compact
sets whose diameters $\sup\{d(x, y) : x, y \in A_n\}$ tend to zero is a singleton.
The proof of the Cantor surjection theorem, while not difficult in principle,
is notationally cumbersome. The basic premise is that one uses compactness to
dissect $X$ into $2^{k_1}$ nonempty compact subsets of diameter less than 1/2 for some
$k_1$, then one dissects the resulting “pieces” into $2^{k_2}$ pieces of diameter less than 1/4,
and so forth. By labeling things cleverly, one obtains a dyadic filtration of X with
which each x ∈ X can be assigned at least one address using a binary string. This
is done in a manner so that the strings corresponding to x and y agree for more
and more initial bits the closer that x and y are; this guarantees the continuity of
our function. Each binary string specifies a point in the Cantor set; the sequence
of zeros and ones specifies whether one stays in the left- or right-hand interval at
each stage in the Cantor set construction.
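The last step, passing from a binary string to a point of the Cantor set, can be illustrated concretely. In the sketch below a finite string of bits selects the left (0) or right (1) interval at each stage, and we return the left endpoint of the selected interval, which is the number whose ternary digits are twice the given bits. The function name is our own, and a finite string only approximates the infinite address used in the proof.

```python
from fractions import Fraction


def cantor_point(bits):
    """Left endpoint of the stage-len(bits) Cantor interval selected by `bits`
    (0 = left interval, 1 = right interval at each stage)."""
    return sum(Fraction(2 * b, 3 ** (k + 1)) for k, b in enumerate(bits))


print(cantor_point([0, 1]))        # 2/9
print(cantor_point([1, 0]))        # 2/3
print(cantor_point([1, 1]))        # 8/9

# Strings that agree in their first two bits give points within 1/9 of each other.
print(abs(cantor_point([1, 0, 1, 1]) - cantor_point([1, 0, 0, 0])))
```

Strings that share a long initial segment land in the same small interval of the construction, which is the mechanism behind the continuity claim above.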
A remarkable consequence of the Cantor surjection theorem is that the car-
dinality of a compact metric space cannot exceed that of R. That is, if (X, d) is
a compact metric space, then there is an injection $g : X \to \mathbb{R}$ (that is, $g$ is
one-to-one). For instance, it is impossible to endow the powerset $\mathcal{P}(\mathbb{R})$ of $\mathbb{R}$ (see the
comments for the 1918 entry) with a metric so that it becomes a compact metric
space.
Peano curves. A byproduct of the Cantor surjection theorem is the existence
of Peano curves, that is, curves whose images have nonempty interiors. To be more
specific, if $K$ is a nonempty, compact, convex subset of $\mathbb{R}^n$, then there exists a
continuous surjection $f : [0, 1] \to K$. Here is a sketch of the proof. Let $g : C \to K$
be the continuous surjection from the Cantor surjection theorem and let
$$f(x) = \begin{cases} g(x) & \text{if } x \in C, \\ (1 - t)g(a) + t g(b) & \text{if } x = (1 - t)a + tb \in (a, b), \end{cases}$$
in which (a, b) denotes a gap interval from the construction of the Cantor set; that
is, (a, b) ⊆ [0, 1]\C and a, b ∈ C. Since K is convex and g(a), g(b) ∈ K, it follows
that (1 − t)g(a) + tg(b) ∈ K for t ∈ [0, 1]. Moreover, f is continuous since g is
continuous and f extends g linearly on each gap interval. Voilà!
Returning to the concept of homeomorphism, we should mention that C is
homeomorphic to $C \times C$. This is a consequence of the Moore–Kline theorem: if
$X \ne \emptyset$ is a compact, perfect (closed with no isolated points), totally disconnected
metric space, then $X$ is homeomorphic to $C$.
Bibliography
[1] H. M. Morse, A One-to-One Representation of Geodesics on a Surface of Negative Curvature,
Amer. J. Math. 43 (1921), no. 1, 33–51, DOI 10.2307/2370306.
http://www.jstor.org/stable/2370306. MR1506428
[2] H. M. Morse, Recurrent geodesics on a surface of negative curvature, Trans. Amer. Math. Soc.
22 (1921), no. 1, 84–100, DOI 10.2307/1988844.
http://www.ams.org/journals/tran/1921-022-01/S0002-9947-1921-1501161-8/S0002-9947-1921-1501161-8.pdf. MR1501161
[3] C. C. Pugh, Real mathematical analysis, 2nd ed., Undergraduate Texts in Mathematics,
Springer, Cham, 2015. MR3380933
[4] M. Spivak, A comprehensive introduction to differential geometry. Vol. One, Published by M.
Spivak, Brandeis Univ., Waltham, Mass., 1970. MR0267467
1918
Georg Cantor
Introduction
It is known that there are an infinite number of worlds, simply because
there is an infinite amount of space for them to be in. However, not
every one of them is inhabited. Therefore, there must be a finite
number of inhabited worlds. Any finite number divided by infinity is
as near to nothing as makes no odds, so the average population of all
the planets in the Universe can be said to be zero. From this it follows
that the population of the whole Universe is also zero, and that any
people you may meet from time to time are merely the products of a
deranged imagination. [1]
While we hold Douglas Adams (1952–2001), author of the famed Hitchhiker’s
Guide to the Galaxy “trilogy,” in the highest regard, “this argument isn’t worth a
pair of fetid dingo’s kidneys.” Find at least three things wrong with his argument!
The most influential mathematician to study infinity was Georg Cantor. His
work was so mind-blowing that he even managed to appropriate territory in our
1917 entry. Before getting into Cantor’s theory of cardinality and some of its jaw-
dropping consequences, let us first warm up with a few infinity-related paradoxes.
Imagine that every second, you are given two numbers that you add to your
(initially empty) collection. The first pair is 1, 2, the second pair is 3, 4, and so on.
After receiving each pair of numbers, you must discard exactly one number from
your collection. Let us examine two strategies for handling this situation.
(a) Every time you receive a pair of numbers, you discard the odd one. Thus, the
number 2n arrives in round n and remains in your collection in all successive
rounds. You are eventually left with the infinite set {2, 4, 6, . . .}.
(b) Every time you receive a pair of numbers, you discard the lowest number in
your collection. Thus, the natural number n arrives in round ⌊(n + 1)/2⌋ and is
removed in round n. You are eventually left with the empty set ∅.
In both scenarios, you discard exactly one number in each round. How can they lead
to two such different outcomes?
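A finite simulation cannot settle a question about the limit, but it does make the two patterns visible. The sketch below is only an illustration; the function name `run_rounds` and the strategy labels are our own and do not come from the text.

```python
def run_rounds(rounds, strategy):
    """Play the game for finitely many rounds and return the collection."""
    collection = set()
    for r in range(1, rounds + 1):
        collection.update({2 * r - 1, 2 * r})    # round r delivers 2r-1 and 2r
        if strategy == "discard_odd":
            collection.remove(2 * r - 1)          # strategy (a)
        else:
            collection.remove(min(collection))    # strategy (b)
    return collection


print(sorted(run_rounds(10, "discard_odd")))      # the even numbers 2, ..., 20
print(sorted(run_rounds(10, "discard_lowest")))   # the numbers 11, ..., 20
```

After N rounds both collections have N elements. Under (a) every even number, once received, stays forever; under (b) each fixed n is gone once round n has been played.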
This next paradox is from the final book of Galileo Galilei (1564–1642), Dis-
courses and Mathematical Demonstrations Relating to Two New Sciences (1638).
Let S = {1, 4, 9, 16, . . .} denote the set of perfect squares. Galileo’s paradox is the
apparent contradiction that although S is “much smaller than N,” the function
n ↦ n² exhibits a one-to-one correspondence between S and N. How can this be?
A function f : A → B is injective (one-to-one) if “distinct inputs are sent to
distinct outputs”, that is, if f(a) = f(a′) implies a = a′. We say that f : A → B is
surjective (onto) if every element of B is “hit by f ”, that is, if f (A) = B. Two sets
A and B are equinumerous if there is a one-to-one and onto function f : A → B.
[Figure 1. Panels: (a) Injective and surjective. (b) Not injective and not surjective. (c) Surjective and not injective. (d) Injective and not surjective. Of the four functions depicted, only (a) is a bijection.]
Such a function is called a bijection; see Figure 1. This relationship between A and
B is denoted A ≅ B; we also say that A and B have the same cardinality.
One of the most important properties of the symbol ≅ is that it is an equivalence
relation. In other words, it “behaves like an equal sign” in the sense that it is
reflexive (A ≅ A), symmetric (A ≅ B implies that B ≅ A), and transitive (A ≅ B
and B ≅ C implies that A ≅ C). Can you prove this?
We say that A is finite if A = ∅ or A ≅ {1, 2, . . . , n} for some n ∈ N. For finite
sets, A ≅ B just means that “A and B have the same number of elements.”
We say that A is infinite if A is not finite, and countable if A is finite or A ≅ N.
In fact, A is infinite if and only if it has a proper subset B such that A ≅ B. For
instance, Galileo noted that S is a proper subset of N and S ≅ N.
An infinite set A is countable if and only if its elements can be enumerated
a₁, a₂, a₃, . . . without repetition. Indeed, if A is so enumerable, then the function
f : N → A defined by f(n) = aₙ is a bijection. Conversely, each bijection f : N → A
gives rise to an enumeration f (1), f (2), f (3), . . . of A.
Even though Z = {. . . , −2, −1, 0, 1, 2, . . .} has ellipses going in two directions,
it is countable since 0, 1, −1, 2, −2, 3, −3, 4, −4, . . . is an enumeration of Z. In fact,
an explicit bijection f : N → Z is
\[
f(n) =
\begin{cases}
\dfrac{n}{2} & \text{if $n$ is even},\\[6pt]
-\dfrac{n-1}{2} & \text{if $n$ is odd}.
\end{cases}
\]
Can you use a similar idea to prove that the union of two countable sets is countable?
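The bijection between N and Z can be tried out directly. This is a small sketch under the convention N = {1, 2, 3, . . .}; the names `nat_to_int` and `int_to_nat` are ours.

```python
def nat_to_int(n):
    """f(n) = n/2 for even n and -(n-1)/2 for odd n, with n = 1, 2, 3, ..."""
    if n < 1:
        raise ValueError("n must be a natural number (n >= 1)")
    return n // 2 if n % 2 == 0 else -((n - 1) // 2)


def int_to_nat(z):
    """Inverse map: positive z comes from 2z, and z <= 0 comes from 1 - 2z."""
    return 2 * z if z > 0 else 1 - 2 * z


print([nat_to_int(n) for n in range(1, 10)])  # 0, 1, -1, 2, -2, 3, -3, 4, -4
```

Having an explicit inverse is one way to see that the map is both injective and surjective.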
[Figure 2. Diagram illustrating an enumeration of N².]
What about N² = N × N? It is countable too! Consider the function f : N² → N
defined by $f(a, b) = 2^{a-1}(2b - 1)$. If f(a, b) = f(c, d), then
\[
2^{a-1}\underbrace{(2b-1)}_{\text{odd}} = 2^{c-1}\underbrace{(2d-1)}_{\text{odd}}.
\]
The fundamental theorem of arithmetic ensures that a = c and b = d, so f is
injective. Given n ∈ N, we may write $n = 2^{a-1}(2b - 1)$ for some a, b ∈ N and hence
f(a, b) = n. Thus, f is surjective and we conclude that N² ≅ N. See Figure 2 for a
different approach (a “proof without words”).
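Both directions of this pairing are easy to compute: strip off the factors of 2 to recover a, and what remains is the odd number 2b − 1. The following sketch uses our own function names.

```python
def pair_to_nat(a, b):
    """f(a, b) = 2^(a-1) * (2b - 1) for natural numbers a, b >= 1."""
    return 2 ** (a - 1) * (2 * b - 1)


def nat_to_pair(n):
    """Invert f by separating the power of 2 from the odd part of n >= 1."""
    a = 1
    while n % 2 == 0:
        n //= 2
        a += 1
    return a, (n + 1) // 2


print([nat_to_pair(n) for n in range(1, 9)])
```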
A similar argument shows that Q is countable too; see Figure 3. Another
approach is to first show that the map
\[
x \mapsto \frac{1}{2\lfloor x \rfloor - x + 1},
\]
applied repeatedly starting from 1, generates the Calkin–Wilf sequence
1/1, 1/2, 2/1, 1/3, 3/2, 2/3, 3/1, 1/4, 4/3, . . ., which is
an enumeration of the positive rational numbers. At this point, one might suspect
that every infinite set is countable. How boring would that be?
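The recursion for the Calkin–Wilf sequence can be run in exact arithmetic. In this sketch the rule x ↦ 1/(2⌊x⌋ − x + 1) is applied starting from 1; the function name `calkin_wilf` is ours.

```python
from fractions import Fraction
from math import floor


def calkin_wilf(count):
    """Return the first `count` terms obtained by iterating x -> 1/(2*floor(x) - x + 1)."""
    x = Fraction(1)
    terms = []
    for _ in range(count):
        terms.append(x)
        x = 1 / (2 * floor(x) - x + 1)
    return terms


print(", ".join(str(q) for q in calkin_wilf(9)))
```

Checking that the first thousand terms are distinct is of course not a proof that every positive rational appears exactly once, but it is reassuring.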
[Figure 3. Diagram illustrating an enumeration of Q. Quotients m/n that are undefined or that have appeared previously in the enumeration are discarded; these are in red.]
One of Cantor’s most brilliant insights is that uncountable sets exist. One
example is R. In fact, it is impossible to find a surjective function f : N → [0, 1).
Here is Cantor’s classic (1891) diagonal argument [3]. Recall that each real number
in [0, 1) can be written uniquely as a sequence 0.d₁d₂d₃ . . . of decimal digits dᵢ that
does not eventually terminate in all 9’s. Given a function f : N → [0, 1), consider
the list f(1), f(2), . . .. Write the decimal representations of these numbers in an
array
\[
\begin{array}{l}
f(1) = 0.d_{11}d_{12}d_{13}d_{14}d_{15}\ldots\\
f(2) = 0.d_{21}d_{22}d_{23}d_{24}d_{25}\ldots\\
f(3) = 0.d_{31}d_{32}d_{33}d_{34}d_{35}\ldots\\
f(4) = 0.d_{41}d_{42}d_{43}d_{44}d_{45}\ldots\\
f(5) = 0.d_{51}d_{52}d_{53}d_{54}d_{55}\ldots\\
\qquad\vdots
\end{array}
\]
and consider the number c = 0.c₁c₂c₃ . . . ∈ [0, 1), in which
\[
c_n =
\begin{cases}
4 & \text{if } d_{nn} \neq 4,\\
7 & \text{if } d_{nn} = 4.
\end{cases}
\tag{1918.1}
\]
For each n = 1, 2, . . ., the nth digit of c differs from the nth digit of f(n). Because
c uses only the digits 4 and 7, its expansion does not end in all 9’s, so it is the unique
expansion of c. Since c ≠ f(n) for every n, the function f : N → [0, 1) is not a surjection.¹
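The diagonal rule (1918.1) is completely explicit, so it can be applied to any finite piece of a list. In the sketch below each listed number is represented by a string of its decimal digits after the decimal point; the sample rows and the function name `diagonal_escape` are made up for illustration.

```python
def diagonal_escape(rows):
    """Apply rule (1918.1): the nth digit is 7 if the nth digit of row n is 4, else 4."""
    return "".join("7" if row[n] == "4" else "4" for n, row in enumerate(rows))


rows = ["1415926", "4444444", "7182818", "0000000", "3333433", "1234567", "9999994"]
c = diagonal_escape(rows)
print("0." + c)
# Digit n of c never matches digit n of row n.
print(all(c[n] != rows[n][n] for n in range(len(rows))))
```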
A shocking consequence of the uncountability of R is that there are many more
irrational numbers than rational numbers. Indeed, if the set Qᶜ of irrational
numbers were countable, then R = Q ∪ Qᶜ would be the union of two countable
sets and hence be countable, which is not the case.
¹ Why can this argument not be used to prove that Q is uncountable? Even if each of
f(1), f(2), . . . is rational, there is no way to guarantee that c has an eventually repeating
decimal expansion (a real number is rational if and only if it has an eventually repeating decimal
expansion).
An algebraic number is a complex number that is a root of a polynomial with
integer coefficients. The set A of all algebraic numbers includes the rationals and
numbers such as
\[
2^{1/3}, \qquad i = \sqrt{-1}, \qquad\text{and}\qquad \sqrt{\sqrt{3} + \sqrt{5}};
\tag{1918.2}
\]
these are roots of
\[
x^3 - 2, \qquad x^2 + 1, \qquad\text{and}\qquad x^8 - 16x^4 + 4,
\]
respectively. The degree of the integer polynomial with least degree for which an
algebraic number α is a root is called the degree of α. For instance, the numbers
(1918.2) have degrees 3, 2, and 8, respectively. One can show that the set of all
algebraic numbers is countable² and hence most real numbers are transcendental
(not algebraic). For more information about transcendental numbers, see the 1935
and 1955 entries.
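A quick floating-point check confirms that the three numbers are roots of the three polynomials. This is only a numerical sanity check, not a proof, and it reads the third example as the nested radical √(√3 + √5), which is the number that the polynomial x⁸ − 16x⁴ + 4 calls for; the dictionary layout is our own.

```python
import math

# Each entry: (approximate value of the number, polynomial it should annihilate).
candidates = {
    "2^(1/3)": (2 ** (1 / 3), lambda x: x**3 - 2),
    "i": (1j, lambda x: x**2 + 1),
    "sqrt(sqrt(3) + sqrt(5))": (
        math.sqrt(math.sqrt(3) + math.sqrt(5)),
        lambda x: x**8 - 16 * x**4 + 4,
    ),
}

residuals = {name: abs(poly(value)) for name, (value, poly) in candidates.items()}
for name, residual in residuals.items():
    print(f"{name}: |p(x)| = {residual:.2e}")
```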
If all of this has not blown your mind, then maybe our next major revelation
will. If A is a set, then the powerset P(A) of A is the set of all subsets of A. For
example, if A = {a, b, c}, then
\[
\mathcal{P}(A) = \bigl\{\varnothing, \{a\}, \{b\}, \{c\}, \{a,b\}, \{b,c\}, \{a,c\}, A\bigr\}.
\]
Cantor’s powerset theorem asserts that if S is any set, then there does not exist
a surjection (let alone a bijection) f : S → P(S). Since s ↦ {s} furnishes an
injection from S into P(S) (so P(S) is “at least as big as S”), Cantor’s theorem
tells us that “P(S) is of a strictly larger cardinality than S.” Starting with S = N
and iterating the preceding result reveals that there are “infinitely many levels of
infinity”! If that did not blow your mind, then please consider a pan galactic gargle
blaster³ or two.
Here is the proof of Cantor’s theorem. Suppose toward a contradiction that
f : S → P(S) is a surjection. For each x ∈ S, we have f(x) ⊆ S and hence either
x ∈ f(x) or x ∉ f(x). Let
\[
E = \{x \in S : x \notin f(x)\}.
\]
Since f is a surjection, there exists a z ∈ S such that f(z) = E. However,
\[
z \in E \iff z \notin f(z) \iff z \notin E;
\]
the first equivalence is from the definition of E, the second since f(z) = E. This
contradiction shows that no such f exists.
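For a small finite set one can simply try every function f : S → P(S) and watch the set E escape each time. The sketch below does this by brute force; the helper names are ours, and the search is only feasible for very small S.

```python
from itertools import combinations, product


def powerset(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def diagonal_set(f):
    """E = {x : x not in f(x)} for a function f given as a dict."""
    return frozenset(x for x in f if x not in f[x])


def every_map_misses_diagonal(s):
    """Check, for every f : s -> P(s), that E is not in the image of f."""
    items = sorted(s)
    for images in product(powerset(s), repeat=len(items)):
        f = dict(zip(items, images))
        if diagonal_set(f) in images:
            return False
    return True


print(every_map_misses_diagonal({"a", "b", "c"}))  # examines all 8^3 = 512 maps
```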

Proposed by Stephan Ramon Garcia, Pomona College.⁴
A chain of subsets of N is a collection C ⊆ P(N) that is totally ordered by the
relation ⊆; that is, if A, B ∈ C, then either A ⊆ B or B ⊆ A. We refer to such
subsets A and B as links in the chain C. Does there exist a chain C of subsets of
N that has uncountably many links? For the solution, see the comments below.

² First show that $A = \bigcup_{n=0}^{\infty} A_n$, in which Aₙ denotes the set of algebraic numbers of degree
n. Show that each Aₙ is countable (how many degree n polynomials with integer coefficients are
there?) and, as a consequence, that A is countable.
³ The effect is similar to “having your brains smashed in by a slice of lemon wrapped round
a large gold brick” [1].
⁴ The original proposed problem, due to Steven Miller of Williams College, dealt with functions
taking on only transcendental values. This has been moved to the 1955 entry.
1918: Comments
A common misconception. Cantor’s diagonal method is often described as
existential and nonconstructive. This is incorrect, since it can be used to produce
a real number that is not on the given list. For instance, when Cantor’s method
is applied to a list of all algebraic numbers, in some specified order, it produces
the digits of a transcendental number. For more on these issues, see the excellent
article by Gray [4].
Solution to the problem. There are many potential false starts to the problem.
If your solution involves the word “next” or “first”, then it is probably incorrect!
Since N ≅ Q, we may as well replace N with Q to see if that makes things
easier. For each x ∈ R, let
\[
A_x = \{q \in \mathbb{Q} : q < x\}.
\]
The function f : R → P(Q) defined by f(x) = Aₓ is an injection since the density
of Q in R implies that x < y if and only if $A_x \subsetneq A_y$. It follows that the collection
\[
f(\mathbb{R}) = \{A_x : x \in \mathbb{R}\}
\]
is uncountable and linearly ordered by ⊆. Now let g : N → Q be a bijection and let
\[
B_x = \{n \in \mathbb{N} : g(n) < x\}.
\]
The collection
\[
C = \{B_x : x \in \mathbb{R}\}
\]
is an uncountable chain of subsets of N.
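The links Bₓ are infinite sets, but a finite window of each can be computed once an enumeration g of Q is fixed. The sketch below assumes one particular enumeration (0, then the Calkin–Wilf sequence with alternating signs); that choice and the names `enumerate_rationals` and `link` are ours, and any bijection g would do.

```python
from fractions import Fraction
from math import floor


def enumerate_rationals(count):
    """First `count` terms of 0, 1, -1, 1/2, -1/2, 2, -2, ... (no repetitions)."""
    out = [Fraction(0)]
    x = Fraction(1)
    while len(out) < count:
        out.extend([x, -x])
        x = 1 / (2 * floor(x) - x + 1)   # next Calkin-Wilf term
    return out[:count]


def link(x, g):
    """Finite window of B_x = {n : g(n) < x}, with n counted from 1."""
    return {n for n, q in enumerate(g, start=1) if q < x}


g = enumerate_rationals(200)
xs = [-1.5, 0, 0.3, 2 ** 0.5, 3]
links = [link(x, g) for x in xs]
# Larger x gives a larger link, so the windows are nested.
print([len(b) for b in links])
```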
Bibliography
[1] D. Adams, The Restaurant at the End of the Universe, Pan Books, 1980.
[2] G. Cantor, Ueber eine Eigenschaft des Inbegriffs aller reellen algebraischen Zahlen (German),
J. Reine Angew. Math. 77 (1874), 258–262, DOI 10.1515/crll.1874.77.258. MR1579605
[3] G. Cantor, Über eine elementare Frage der Mannigfaltigkeitslehre, Jahresbericht der
Deutschen Mathematiker-Vereinigung 1 (1891), 75–78.
[4] R. Gray, Georg Cantor and transcendental numbers, Amer. Math. Monthly 101 (1994), no. 9,
819–832, DOI 10.2307/2975129. http://www.jstor.org/stable/2975129. MR1300488
[5] J. J. O’Connor and E. F. Robertson, Georg Ferdinand Ludwig Philipp Cantor, MacTutor
History of Mathematics, http://www-history.mcs.st-andrews.ac.uk/Biographies/Cantor.
html.
1919
Brun’s Theorem
Introduction
One of the most tantalizing conjectures in number theory is the twin prime
conjecture. It asserts that there are infinitely many pairs of primes that differ by
2; a prime in such a pair is a twin prime. Examples of such pairs are 5 and 7,
29 and 31, and $2{,}996{,}863{,}034{,}895 \times 2^{1{,}290{,}000} \pm 1$ (the numbers in the last example
have 388,342 digits when fully written out [11]). More generally, given any even
k ∈ N, are there infinitely many pairs of primes whose elements differ by k? This
is Polignac’s conjecture.
Although both conjectures remain open, there has been remarkable progress
over the past 100 years, culminating in the 2013 proof of Yitang Zhang (1955– )
that there is some even number k ≤ 70,000,000 such that infinitely many pairs
of primes differ by k. This result has been improved and generalized by many
authors, especially James Maynard (1987– ), Terence Tao, and the Polymath8
project [2, 8, 9]. It is now known that there are infinitely many pairs of primes that
differ by at most 246.
One of the earliest results in the field is due to Viggo Brun (1885–1978), who
proved in 1919 that the sum of the reciprocals of the twin primes converges. Com-
pare this to Euler’s result that the sum of the reciprocals of the primes diverges
(see p. 4). Thus, in a qualitative sense, the twin primes are far more sparse than
the primes. The value of Brun’s sum,
\[
B = \left(\frac{1}{3} + \frac{1}{5}\right) + \left(\frac{1}{5} + \frac{1}{7}\right) + \left(\frac{1}{11} + \frac{1}{13}\right) + \cdots,
\tag{1919.1}
\]
which is at least 1.83 and less than 2.347 [4], is Brun’s constant. The search for
a good approximation to it led Thomas Nicely of Lynchburg College to discover a
floating-point arithmetic error in Intel’s Pentium processor [6]. This led to a $475
million loss for Intel, demonstrating the power of pure mathematics!
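Partial sums of Brun’s series are easy to compute, although they creep toward B extremely slowly. The sketch below uses a basic sieve of Eratosthenes; the function names and the cutoff 10⁵ are our own choices.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes; returns the list of primes <= limit (limit >= 2)."""
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if flags[n]]


def brun_partial_sum(limit):
    """Sum of 1/p + 1/(p+2) over twin prime pairs (p, p+2) with p + 2 <= limit."""
    primes = primes_up_to(limit)
    prime_set = set(primes)
    return sum(1 / p + 1 / (p + 2) for p in primes if p + 2 in prime_set)


for limit in (10**2, 10**3, 10**4, 10**5):
    print(limit, round(brun_partial_sum(limit), 6))
```

Every partial sum is a lower bound for B; the hard part, as the bounds quoted above suggest, is controlling the tail.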
Unfortunately, the convergence of Brun’s series does not resolve the twin prime
conjecture since there are many infinite collections of natural numbers that have
convergent reciprocal sums. The perfect squares are an example, since
\[
\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}
\tag{1919.2}
\]
as Euler showed in 1734 (see the notes for a derivation). Since any finite sum of
rational numbers is rational, if one could show that Brun’s constant were irrational,
then one would have a proof of the twin prime conjecture!
In what follows, log x denotes the base-e logarithm of x. We say that functions
f and g are asymptotically equivalent, denoted f ∼ g, if $\lim_{x\to\infty} f(x)/g(x) = 1$.
Let π(x) denote the number of primes at most x. For example, π(10.5) = 4 since
2, 3, 5, 7 ≤ 10.5. Similarly, we let π₂(x) denote the number of twin primes at most x.
The celebrated prime number theorem states that π(x) ∼ x/ log x; that is,
\[
\lim_{x\to\infty} \frac{\pi(x)}{x/\log x} = 1;
\]
see the 1933 and 1948 entries. Consequently, for any C > 1,
\[
\pi(x) \le \frac{Cx}{\log x}
\]
for sufficiently large x. In contrast, one can show that there is a constant D > 0
such that
\[
\pi_2(x) \le \frac{Dx}{(\log x)^2}
\]
for sufficiently large x. The smallest constant known to work here is D = 4.5 [12].
[Figure 1. Plot of the prime counting function π(x) (top) versus the twin prime counting function π₂(x) (bottom). The prime number theorem ensures that π(x) grows like x/ log x; the first Hardy–Littlewood conjecture asserts that π₂(x) grows like a constant times x/(log x)².]
A refinement of the twin prime conjecture is the Hardy–Littlewood conjecture
(twin primes), which suggests that x/(log x)² is the appropriate benchmark function
for the twin primes. The conjecture is
\[
\pi_2(x) \sim 2C_2 \int_2^x \frac{dt}{(\log t)^2},
\tag{1919.3}
\]
in which
\[
C_2 = \prod_{p \ge 3} \frac{p(p-2)}{(p-1)^2} = 0.660161815\ldots
\tag{1919.4}
\]
is the twin primes constant [3]; see Figure 1. A simpler expression that is asymptotically
equivalent to (1919.3) is 2C₂x/(log x)². See the comments for the 2005 entry
for information about the Bateman–Horn conjecture, a wide-ranging generalization
of the Hardy–Littlewood conjecture.
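The two benchmark functions can be compared with actual counts for modest x. In the sketch below, `count_twin_pairs` counts pairs (p, p + 2) with p + 2 ≤ x, which is an assumption about how to read π₂ in (1919.3); the function names are ours and the constant C₂ is truncated to the digits given in (1919.4).

```python
import math

TWIN_PRIMES_CONSTANT = 0.660161815  # truncated value of C_2 from (1919.4)


def prime_flags(limit):
    """flags[n] == 1 exactly when n is prime, for 0 <= n <= limit (limit >= 1)."""
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return flags


def count_primes(limit):
    return sum(prime_flags(limit))


def count_twin_pairs(limit):
    flags = prime_flags(limit)
    return sum(1 for p in range(3, limit - 1) if flags[p] and flags[p + 2])


for x in (10**3, 10**4, 10**5):
    pnt = x / math.log(x)
    simple_hl = 2 * TWIN_PRIMES_CONSTANT * x / math.log(x) ** 2
    print(x, count_primes(x), round(pnt), count_twin_pairs(x), round(simple_hl))
```

For x this small the simpler expression 2C₂x/(log x)² undershoots noticeably; asymptotic equivalence says nothing about how quickly the ratio approaches 1.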

Proposed by Stephan Ramon Garcia, Pomona College, and Steven J.
Miller, Williams College.
Let N_twin be the set of all natural numbers whose only prime factors are twin
primes. Thus, N_twin contains 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 25, but not 2, 4, 6, 8, 10,
12, 14, 16, 18, 20, 22, 23, 24. Note that 1 ∈ N_twin because the set of all primes that
divide 1, namely the empty set, is a subset of P_twin, the set of twin primes! Does
\[
S = \sum_{n \in N_{\mathrm{twin}}} \frac{1}{n}
\tag{1919.5}
\]
converge or diverge? If it converges, approximate the sum.

If this one is too hard to start out with, here is a closely related problem that
is a little easier. Let A denote the set of all natural numbers that do not have a
“9” in their decimal representations. In other words,
A = {1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, . . .}. (1919.6)
Does
\[
\sum_{n \in A} \frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{8} + \frac{1}{10} + \cdots
\]
converge or diverge? To put this another way: have we removed enough terms from
the harmonic series to obtain a series that converges?
1919: Comments
The Basel problem. In 1644, Pietro Mengoli (1626–1686) posed the famous
Basel problem: evaluate
\[
1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \frac{1}{25} + \cdots.
\]
This was solved by Euler in 1734, who provided the formula (1919.2); the problem
is named after his hometown. There are now dozens of proofs of Euler’s result. We
present a 2015 proof by Samuel G. Moreno, which is also one of the shortest [5].
It simplifies an earlier argument of Eberhard L. Stark [10]. We require the mean
value theorem for integrals: if f : [a, b] → R is continuous and g : [a, b] → R is
Riemann integrable and nonnegative, then there is a c ∈ (a, b) so that
\[
\int_a^b f(x)g(x)\,dx = f(c)\int_a^b g(x)\,dx.
\]
We start by proving the well-known formula
\[
\frac{1}{2} + \sum_{k=1}^{n} \cos kx = \frac{\sin\bigl((n+\tfrac{1}{2})x\bigr)}{2\sin\frac{x}{2}}
\tag{1919.7}
\]
from Fourier analysis. Euler’s formula
\[
e^{ix} = \cos x + i\sin x,
\]
in which i² = −1, implies that
\[
\cos x = \frac{e^{ix} + e^{-ix}}{2} \qquad\text{and}\qquad \sin x = \frac{e^{ix} - e^{-ix}}{2i}.
\]
Convert the sum on the left-hand side of (1919.7) into a sum of complex exponentials,
use the finite geometric series summation formula
\[
1 + r + \cdots + r^{n} = \frac{1 - r^{n+1}}{1 - r},
\]
and appeal to the exponential representation of the sine function to complete the
proof of (1919.7).
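Before working through the algebra, it can be reassuring to test (1919.7) numerically at a few points. The sketch below compares the two sides in floating point; the function names and sample points are ours, and x must avoid multiples of 2π, where the right-hand side is undefined.

```python
import math


def left_side(n, x):
    """1/2 + cos(x) + cos(2x) + ... + cos(nx)."""
    return 0.5 + sum(math.cos(k * x) for k in range(1, n + 1))


def right_side(n, x):
    """sin((n + 1/2) x) / (2 sin(x/2)); requires sin(x/2) != 0."""
    return math.sin((n + 0.5) * x) / (2 * math.sin(x / 2))


for n in (1, 5, 20):
    for x in (0.1, 1.0, 2.5, 3.0):
        print(n, x, abs(left_side(n, x) - right_side(n, x)))
```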
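Now multiply both sides of (1919.7) by x² − 2πx and integrate by parts over
[0, π] to obtain
\begin{align*}
-\frac{\pi^3}{3} + \sum_{k=1}^{n} \frac{2\pi}{k^2}
&= \int_0^{\pi} \underbrace{(x - 2\pi)\,\frac{x/2}{\sin(x/2)}}_{u}\;\underbrace{\sin\bigl((n+1/2)x\bigr)\,dx}_{dv}\\
&= \underbrace{(x - 2\pi)\,\frac{x/2}{\sin(x/2)}}_{u}\;\underbrace{\frac{-\cos\bigl((n+1/2)x\bigr)}{n+1/2}}_{v}\,\Bigg|_{0}^{\pi}
 - \int_0^{\pi} \underbrace{\frac{-\cos\bigl((n+1/2)x\bigr)}{n+1/2}}_{v}\;\underbrace{\frac{du}{dx}\,dx}_{du}\\
&= \frac{-2\pi}{n+1/2} + \frac{\cos\bigl((n+1/2)\xi_n\bigr)}{n+1/2}\int_0^{\pi}\frac{du}{dx}\,dx
 \qquad \xi_n \in [0,\pi]\\
&= \frac{-2\pi + \bigl(u(\pi) - u(0)\bigr)\cos\bigl((n+1/2)\xi_n\bigr)}{n+1/2}\\
&= \frac{-2\pi + (2\pi - \pi^2/2)\cos\bigl((n+1/2)\xi_n\bigr)}{n+1/2}.
\end{align*}
Let n → ∞ and obtain
\[
-\frac{\pi^3}{3} + \sum_{k=1}^{\infty} \frac{2\pi}{k^2}
= \lim_{n\to\infty} \frac{-2\pi + (2\pi - \pi^2/2)\cos\bigl((n+1/2)\xi_n\bigr)}{n+1/2} = 0,
\]
which is equivalent to (1919.2). This completes the proof.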
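The last display in the derivation also gives an explicit bound: the quantity −π³/3 + 2π(1 + 1/4 + · · · + 1/n²) has absolute value at most (2π + |2π − π²/2|)/(n + 1/2). The sketch below checks this numerically; the function names are ours.

```python
import math


def basel_gap(n):
    """-pi^3/3 + 2*pi * (1 + 1/4 + ... + 1/n^2), which should tend to 0."""
    return -math.pi**3 / 3 + 2 * math.pi * sum(1 / k**2 for k in range(1, n + 1))


def gap_bound(n):
    """Bound read off from the final expression in the derivation."""
    return (2 * math.pi + abs(2 * math.pi - math.pi**2 / 2)) / (n + 0.5)


for n in (1, 10, 100, 1000):
    print(n, basel_gap(n), gap_bound(n))
```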
Solution to the second problem. Before tackling the first (and more diffi-
cult) problem, let us address the second: the series converges. One way to interpret
this result is: “most natural numbers have 9’s in them.” Big numbers have lots of
digits, and hence a high probability of having a 9 in them somewhere. Since most
numbers are “big,” we expect that the set (1919.6) omits most natural numbers.
Let us try to make this precise.
The sum of the terms with single-digit denominators is
\[
1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{8} < 9 \cdot 1 = 9.
\]
The sum of the terms with 2-digit denominators is
\[
\frac{1}{10} + \frac{1}{11} + \cdots + \frac{1}{88} < 9^2 \cdot \frac{1}{10},
\]
since there are 9² ways of getting an ordered pair of digits from the set {0, 1, 2, 3, 4, 5,
6, 7, 8} and since 1/10 is the largest summand in the group. Similarly, the sum of the
terms with 3-digit denominators is less than 9³/10², and so forth. Thus, the series
converges¹ and
\[
\sum_{n \in A} \frac{1}{n} < 9\left(1 + \frac{9}{10} + \left(\frac{9}{10}\right)^{2} + \cdots\right) = \frac{9}{1 - 9/10} = 90.
\]
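The block-by-block estimate can be checked directly for the first few digit lengths. The sketch below counts the k-digit numbers with no digit 9 and sums their reciprocals; the names `nine_free` and `block` are ours. The exact count is 8 · 9^(k−1) (the leading digit cannot be 0), which is below the cruder 9^k used in the argument.

```python
def nine_free(n):
    """True if the decimal representation of n contains no digit 9."""
    return "9" not in str(n)


def block(k):
    """(count, reciprocal sum) over the k-digit numbers with no digit 9."""
    members = [n for n in range(10 ** (k - 1), 10**k) if nine_free(n)]
    return len(members), sum(1 / n for n in members)


running_total = 0.0
for k in range(1, 5):
    count, total = block(k)
    running_total += total
    print(k, count, round(total, 4), "bound:", 9**k / 10 ** (k - 1))
print("partial sum through 4 digits:", round(running_total, 4))
```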
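Solution to the first problem. Let us get back to the first problem. The
sum in (1919.5) can be written as
\[
S = \prod_{p \in P_{\mathrm{twin}}} (1 - 1/p)^{-1}
  = \prod_{p \in P_{\mathrm{twin}}} \left(1 + \frac{1}{p} + \frac{1}{p^2} + \cdots\right).
\]
To see this, multiply the right-hand side of the preceding equation term-by-term
and use the fundamental theorem of arithmetic (this is permissible by Mertens’s
theorem; see the 1933 entry). Consequently,
\begin{align*}
\log S &= \sum_{p \in P_{\mathrm{twin}}} \log \frac{1}{1 - 1/p}
        = \sum_{p \in P_{\mathrm{twin}}} \left(\frac{1}{p} + \frac{1}{2p^2} + \frac{1}{3p^3} + \cdots\right) \tag{1919.8}\\
&= \sum_{p \in P_{\mathrm{twin}}} \frac{1}{p} + \frac{1}{2}\sum_{p \in P_{\mathrm{twin}}} \frac{1}{p^2} + \frac{1}{3}\sum_{p \in P_{\mathrm{twin}}} \frac{1}{p^3} + \cdots\\
&\le \sum_{p \in P_{\mathrm{twin}}} \frac{1}{p} + \frac{1}{2\cdot 3}\sum_{p \in P_{\mathrm{twin}}} \frac{1}{p} + \frac{1}{3\cdot 3^2}\sum_{p \in P_{\mathrm{twin}}} \frac{1}{p} + \cdots\\
&= \left(B - \tfrac{1}{5}\right)\sum_{n=1}^{\infty} \frac{1}{n3^{n-1}}
 = 3\left(B - \tfrac{1}{5}\right)\log\tfrac{3}{2} < \infty,
\end{align*}
so the series that defines S converges. The appearance of B − 1/5 in place of Brun’s
constant is due to the fact that 1/5 occurs twice in the sum (1919.1) that defines B.
From (1919.8) we obtain B − 1/5 < log S, so
\[
5.10 \approx e^{B - \frac{1}{5}} < S \le \left(\frac{3}{2}\right)^{3(B - \frac{1}{5})} \approx 13.62,
\]
since 1.83 ≤ B ≤ 2.347 [4].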
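The numerical endpoints can be reproduced in a few lines: the lower estimate uses B ≥ 1.83 and the upper estimate uses B ≤ 2.347. The sketch also checks the series identity 1/(1·3⁰) + 1/(2·3¹) + 1/(3·3²) + · · · = 3 log(3/2) that appears in the last step; the variable names are ours.

```python
import math

B_LOW, B_HIGH = 1.83, 2.347          # bounds on Brun's constant quoted from [4]

lower = math.exp(B_LOW - 1 / 5)                  # e^(B - 1/5) with the smallest B
upper = 1.5 ** (3 * (B_HIGH - 1 / 5))            # (3/2)^(3(B - 1/5)) with the largest B
series = sum(1 / (n * 3 ** (n - 1)) for n in range(1, 60))

print(round(lower, 2), round(upper, 2))
print(series, 3 * math.log(1.5))
```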
¹ To be more specific, $\sum_{n \in A} \frac{1}{n}$ is a series of positive terms for which the partial sums are
bounded above by 90. Thus, the monotone sequence property ensures that $\sum_{n \in A} \frac{1}{n}$ converges.
Bibliography
[1] V. Brun, La série 1/5 + 1/7 + 1/11 + 1/13 + 1/17 + 1/19 + 1/29 + 1/31 + 1/41 + 1/43
+ 1/59 + 1/61 + ..., où les dénominateurs sont nombres premiers jumeaux est convergente
ou finie, Bulletin des Sciences Mathématiques 43 (1919), 100–104, 124–128.
http://gallica.bnf.fr/ark:/12148/bpt6k486270d.
[2] J. Maynard, Small gaps between primes, Ann. of Math. (2) 181 (2015), no. 1, 383–413, DOI
10.4007/annals.2015.181.1.7. MR3272929
[3] G. H. Hardy and J. E. Littlewood, Some problems of ‘Partitio numerorum’; III: On the
expression of a number as a sum of primes, Acta Math. 44 (1923), no. 1, 1–70, DOI
10.1007/BF02403921. MR1555183
[4] D. Klyve, Explicit bounds on twin primes and Brun’s Constant, Thesis (Ph.D.)–Dartmouth
College, ProQuest LLC, Ann Arbor, MI, 2007. MR2712414
[5] S. G. Moreno, A one-sentence and truly elementary proof of the Basel problem,
http://arxiv.org/abs/1502.07667.
[6] T. Nicely, Pentium FDIV flaw (2011), http://www.trnicely.net/pentbug/pentbug.html.
[7] T. R. Nicely, Enumeration to 10^14 of the twin primes and Brun’s constant, Virginia J. Sci. 46
(1995), no. 3, 195–204. See also http://www.trnicely.net/twins/twins2.html. MR1401560
[8] D. H. J. Polymath, New equidistribution estimates of Zhang type, Algebra Number Theory
8 (2014), no. 9, 2067–2199, DOI 10.2140/ant.2014.8.2067. MR3294387
[9] D. H. J. Polymath, Variants of the Selberg sieve, and bounded intervals containing many
primes, Res. Math. Sci. 1 (2014), Art. 12, 83, DOI 10.1186/s40687-014-0012-7. MR3373710
[10] E. L. Stark, Application of a mean value theorem for integrals to series summation, Amer.
Math. Monthly 85 (1978), no. 6, 481–483, DOI 10.2307/2320072. MR0476932
[11] The Prime Pages, The List of Largest Known Primes Home Page,
http://primes.utm.edu/primes/.
[12] J. Wu, Chen’s double sieve, Goldbach’s conjecture and the twin prime problem, Acta Arith.
114 (2004), no. 3, 215–273, DOI 10.4064/aa114-3-2. MR2071082
1920
Waring’s Problem
Introduction
Godfrey Harold Hardy and John Edensor Littlewood wrote a series of influential
papers concerning additive problems in number theory. The first paper in this
series, published in 1920, addressed Waring’s problem [1]. For each k ≥ 2, is there
an s = s(k) such that every natural number is a sum of at most s perfect kth powers?
This problem, posed in 1770 by Edward Waring (1736–1798), is closely related
to several other famous problems in number theory. In 1769, Euler suggested that
for k ≥ 3, it is impossible to write a kth power as a sum of fewer than k nonzero kth
powers. The case k = 3 is Fermat’s last theorem for the exponent 3, now known
to be true; see the 1995 entry. Euler’s conjecture was disproved in 1966 by Leon J.
Lander and Thomas R. Parkin, who showed that
\[
27^5 + 84^5 + 110^5 + 133^5 = 144^5.
\]
What about k = 4? In 1986, Noam Elkies (1966– ) constructed an infinite
sequence of counterexamples, the smallest of which is
\[
2{,}682{,}440^4 + 15{,}365{,}639^4 + 18{,}796{,}760^4 = 20{,}615{,}673^4.
\]
The smallest possible counterexample to Euler’s conjecture with k = 4 was provided
by Roger Frye in 1988:
\[
95{,}800^4 + 217{,}519^4 + 414{,}560^4 = 422{,}481^4.
\]
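Since Python integers have arbitrary precision, the three counterexamples can be confirmed exactly. The data layout and the function name `holds` below are ours.

```python
# (exponent k, summands, base of the kth power on the right-hand side)
COUNTEREXAMPLES = [
    (5, (27, 84, 110, 133), 144),
    (4, (2682440, 15365639, 18796760), 20615673),
    (4, (95800, 217519, 414560), 422481),
]


def holds(k, summands, total):
    """Exact integer check that the kth powers of the summands add up to total^k."""
    return sum(t**k for t in summands) == total**k


for k, summands, total in COUNTEREXAMPLES:
    print(k, summands, total, holds(k, summands, total))
```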
The case k = 2 of Waring’s problem was settled in 1770 by Joseph-Louis Lagrange
(1736–1813), who proved that every natural number is a sum of four squares. For
instance,
\[
2 = 1^2 + 1^2 + 0^2 + 0^2 \qquad\text{and}\qquad 7 = 2^2 + 1^2 + 1^2 + 1^2.
\]
A key ingredient in Lagrange’s proof is the four-square identity:
\begin{align*}
(a_1^2 + a_2^2 + a_3^2 + a_4^2)(b_1^2 + b_2^2 + b_3^2 + b_4^2)
&= (a_1b_1 - a_2b_2 - a_3b_3 - a_4b_4)^2 + (a_1b_2 + a_2b_1 + a_3b_4 - a_4b_3)^2\\
&\quad + (a_1b_3 - a_2b_4 + a_3b_1 + a_4b_2)^2 + (a_1b_4 + a_2b_3 - a_3b_2 + a_4b_1)^2.
\end{align*}
This identity is not as “magical” as it seems; see the notes for a derivation.
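The four-square identity is a polynomial identity, so it can be spot-checked with exact integer arithmetic. The sketch below also combines the representations of 2 and 7 given above into a representation of 14; the function names are ours.

```python
import random


def four_square_product(a, b):
    """The four expressions that are squared on the right-hand side of the identity."""
    a1, a2, a3, a4 = a
    b1, b2, b3, b4 = b
    return (
        a1 * b1 - a2 * b2 - a3 * b3 - a4 * b4,
        a1 * b2 + a2 * b1 + a3 * b4 - a4 * b3,
        a1 * b3 - a2 * b4 + a3 * b1 + a4 * b2,
        a1 * b4 + a2 * b3 - a3 * b2 + a4 * b1,
    )


def norm(v):
    """Sum of the squares of the entries of v."""
    return sum(t * t for t in v)


rng = random.Random(1920)
for _ in range(1000):
    a = [rng.randint(-50, 50) for _ in range(4)]
    b = [rng.randint(-50, 50) for _ in range(4)]
    assert norm(a) * norm(b) == norm(four_square_product(a, b))

# 7 = 2^2 + 1^2 + 1^2 + 1^2 and 2 = 1^2 + 1^2 + 0^2 + 0^2 combine to give 14.
print(four_square_product((2, 1, 1, 1), (1, 1, 0, 0)))
```

This is why it suffices to prove the four-square theorem for primes: products take care of themselves.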
Do three squares suffice? No, because 7 cannot be written as a sum of three
squares (try it). Adrien-Marie Legendre (1752–1833) proved that a natural number
can be represented as the sum of three perfect squares if and only if it is not of the
form $4^j(8k + 7)$ with integers j, k ≥ 0. Thus,
every natural number at most 100 except for
7, 15, 23, 28, 31, 39, 47, 55, 60, 63, 71, 79, 87, 92, 95
can be written as a sum of three squares.
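The list of exceptions up to 100 can be regenerated in two independent ways: by brute-force search for three squares, and by testing for the form 4^j(8k + 7). The sketch below does both; the function names are ours.

```python
from math import isqrt


def is_sum_of_three_squares(n):
    """Brute-force search for n = a^2 + b^2 + c^2 with 0 <= a <= b."""
    for a in range(isqrt(n) + 1):
        for b in range(a, isqrt(n - a * a) + 1):
            rest = n - a * a - b * b
            c = isqrt(rest)
            if c * c == rest:
                return True
    return False


def has_excluded_form(n):
    """True if n >= 1 equals 4^j (8k + 7) for some integers j, k >= 0."""
    while n % 4 == 0:
        n //= 4
    return n % 8 == 7


exceptions = [n for n in range(1, 101) if not is_sum_of_three_squares(n)]
print(exceptions)
```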
The finiteness of s(k) for all k ≥ 2 was not shown until the work of David
Hilbert (1862–1943) in 1909. For k = 1, 2, 3, 4, 5, 6, 7, the optimal values of s are
1, 4, 9, 19, 37, 73, 143. For instance, each positive integer is the sum of 19 fourth
powers, and there are some positive integers for which 18 fourth powers do not
suffice. For most values of k, we still do not know the optimal value of s.
Hilbert’s proof is existential; as originally stated it does not provide a bound on
how many kth powers are needed. This was remedied by Hardy and Littlewood in
their masterful paper, in which they further develop the circle method, introduced
by Hardy and Srinivasa Ramanujan (1887–1920) in 1916–1917 to analyze the par-
tition function; see the 1923 entry. This approach involves a delicate analysis of ex-
ponential sums, which we now sketch in the more modern trigonometric-polynomial
formulation.
If we attempt to write integers as a sum of kth powers, we might attempt to
use the generating function
\[
f(x) = \sum_{n=0}^{\infty} e^{2\pi i n^k x},
\]
in which $e^{ix} = \cos x + i \sin x$ (this is Euler’s formula). Unfortunately, the series
above does not converge since its terms are each of unit magnitude and hence do
not tend to zero.¹ We can avoid convergence problems altogether by considering
the truncated sum
\[
f_N(x) = \sum_{n=0}^{N} e^{2\pi i n^k x}.
\]
There is now a free parameter N involved; we choose N to be the number we are
trying to represent as a sum of kth powers. There is no danger in doing so since if
we are trying to express N as a sum of kth powers, none of the summands can be
larger than N .
The great insight is to consider
\begin{align*}
f_N(x)^s &= \left(\sum_{n=0}^{N} e^{2\pi i n^k x}\right)^{s} \tag{1920.1}\\
&= \sum_{n_1=0}^{N} e^{2\pi i n_1^k x} \cdots \sum_{n_s=0}^{N} e^{2\pi i n_s^k x}\\
&= \sum_{0 \le n_1, \ldots, n_s \le N} e^{2\pi i (n_1^k + \cdots + n_s^k)x}\\
&= \sum_{m=0}^{sN^k} a(m; s, N)\, e^{2\pi i m x}, \tag{1920.2}
\end{align*}
in which a(m; s, N) is the number of ways of writing $m = n_1^k + \cdots + n_s^k$ with each
$n_i \le N$. This can be deduced by expanding the product (1920.1) and collecting
terms involving $e^{2\pi i m x}$ for each m.
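Expanding the sth power and collecting terms is the same as multiplying polynomials, so the coefficients a(m; s, N) can be computed by repeated convolution and compared with a direct count. This is only a sketch for small parameters; the function names are ours, and the count is over ordered s-tuples (n₁, . . . , nₛ) with 0 ≤ nᵢ ≤ N, as in the expansion.

```python
from collections import Counter
from itertools import product


def power_sum_counts(k, s, N):
    """coeffs[m] = a(m; s, N), obtained by multiplying out f_N(x)^s term by term."""
    base = [0] * (N**k + 1)
    for n in range(N + 1):
        base[n**k] = 1                      # one term e^{2 pi i n^k x} per n
    coeffs = [1]
    for _ in range(s):
        longer = [0] * (len(coeffs) + len(base) - 1)
        for i, c in enumerate(coeffs):
            if c:
                for j, b in enumerate(base):
                    if b:
                        longer[i + j] += c
        coeffs = longer
    return coeffs


def brute_force_counts(k, s, N):
    """Count ordered s-tuples directly, grouped by the value of the power sum."""
    return Counter(sum(n**k for n in tup) for tup in product(range(N + 1), repeat=s))


coeffs = power_sum_counts(2, 4, 5)
brute = brute_force_counts(2, 4, 5)
print(all(coeffs[m] == brute[m] for m in range(len(coeffs))))
print(power_sum_counts(2, 4, 7)[7])   # ordered ways to write 7 as four squares of 0..7
```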
¹ The terms of a series that converges must tend to zero. However, a series whose terms tend
to zero need not converge. Consider the harmonic series 1 + 1/2 + 1/3 + · · · .
To solve Waring’s problem, we just need to show that if s is sufficiently large,
then a(N; s, N) ≠ 0 for all N. Fortunately, we can isolate a(N; s, N) in (1920.2),
which is the number of ways of writing N as a sum of exactly s perfect kth powers:
\[
a(N; s, N) = \int_0^1 f_N(x)^s e^{-2\pi i N x}\,dx.
\tag{1920.3}
\]
This is because
\begin{align*}
\int_0^1 f_N(x)^s e^{-2\pi i N x}\,dx
&= \int_0^1 \sum_{m=0}^{sN^k} a(m; s, N)\, e^{2\pi i m x} e^{-2\pi i N x}\,dx\\
&= \sum_{m=0}^{sN^k} a(m; s, N) \int_0^1 e^{2\pi i (m-N)x}\,dx\\
&= \sum_{m=0}^{sN^k} a(m; s, N) \int_0^1 \bigl[\cos\bigl(2\pi(m-N)x\bigr) + i\sin\bigl(2\pi(m-N)x\bigr)\bigr]\,dx.
\end{align*}
The integrals of the sine terms are all zero; the cosine integral vanishes unless
m − N = 0, in which case the integral is 1. Thus, the preceding sum collapses to
(1920.3).
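For small parameters the integral (1920.3) can be evaluated numerically. The sketch below replaces the integral by an average over M = sN^k + 1 equally spaced points; for a trigonometric polynomial whose frequencies are all smaller than M in absolute value this average equals the integral, up to rounding. That shortcut and the function names are our own choices, not the method of the text.

```python
import cmath


def f_N(x, k, N):
    """Truncated exponential sum: sum over n = 0..N of e^{2 pi i n^k x}."""
    return sum(cmath.exp(2j * cmath.pi * n**k * x) for n in range(N + 1))


def a_via_integral(k, s, N):
    """Approximate (1920.3) by averaging the integrand over M equally spaced points."""
    M = s * N**k + 1     # exceeds every frequency |m - N| that occurs in the integrand
    total = sum(
        f_N(t / M, k, N) ** s * cmath.exp(-2j * cmath.pi * N * t / M)
        for t in range(M)
    )
    return total / M


print(a_via_integral(2, 4, 7))   # close to 4: the ordered rearrangements of 4 + 1 + 1 + 1
```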
We must still show that the integral (1920.3) is nonzero. This is done by
splitting the domain of integration into two pieces, the set M of major arcs (where
the integrand is large) and the set m of minor arcs (where the integrand is small).
The terms “arc” and “circle method” originate with Hardy and Littlewood, who
formulated the method in terms of power series on the unit disk in the complex
plane. The modern treatment recasts things in terms of truncated exponential
sums and the wrapped interval [0, 1). Since fN (x) is highly oscillatory, we expect
it to exhibit massive amounts of cancellation for most values of x. For which x is
there strong reinforcement? It turns out that if x = a/q is a rational number whose
denominator q is small relative to N , then fN (x) is large. If one can show that the
integrals over M and m are of different orders of magnitude as N → ∞, we win.
See [6] for details of this calculation and [5] for a general introduction to the circle
method.

Proposed by Steven J. Miller, Williams College.
Often a related problem is significantly easier to attack than the original. This
is the case for the well-studied easier Waring’s problem, due to Wright [8]. Given
a positive integer k, is there a ν(k) such that every integer can be written as
\[
\varepsilon_1 n_1^k + \cdots + \varepsilon_{\nu(k)} n_{\nu(k)}^k,
\]
in which $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{\nu(k)} \in \{-1, 0, 1\}$? Solve the easier Waring’s problem and show
that $\nu(k) \le 2^{k-1} + \tfrac{1}{2}k!$.
1920: Comments
The four-square identity. Before giving the solution to the problem, let us
return to the four-square identity. Let
\[
z_1 = a_1 + ia_2, \qquad z_2 = a_3 + ia_4, \qquad w_1 = b_1 + ib_2, \qquad\text{and}\qquad w_2 = b_3 + ib_4.
\]
To obtain the four-square identity, compute the determinants of both sides of
\[
\begin{bmatrix} z_1 & -z_2\\ \overline{z_2} & \overline{z_1} \end{bmatrix}
\begin{bmatrix} w_1 & -w_2\\ \overline{w_2} & \overline{w_1} \end{bmatrix}
=
\begin{bmatrix}
z_1 w_1 - z_2 \overline{w_2} & -z_1 w_2 - z_2 \overline{w_1}\\
\overline{z_2} w_1 + \overline{z_1}\,\overline{w_2} & -\overline{z_2} w_2 + \overline{z_1}\,\overline{w_1}
\end{bmatrix}.
\tag{1920.4}
\]
The determinants of the two matrices on the left-hand side of (1920.4) are
\[
|z_1|^2 + |z_2|^2 = a_1^2 + a_2^2 + a_3^2 + a_4^2 \qquad\text{and}\qquad |w_1|^2 + |w_2|^2 = b_1^2 + b_2^2 + b_3^2 + b_4^2,
\]
respectively. The determinant of the matrix on the right-hand side of (1920.4)
is the right-hand side of the four-square identity. The multiplicative property of
determinants completes the proof. What is really going on here? That involves
quaternions, another story altogether.
Solution to the problem. Here is a quick sketch of the solution; for more
details see [6]. The idea is to exploit the fact that both sums and differences
are allowed. This allows us to overshoot our target number, then fall back down
through subtractions. For f : N → N, let
\[
(\Delta f)(x) = f(x+1) - f(x),
\]
$(\Delta^{(2)} f)(x) = (\Delta(\Delta f))(x)$, and so forth. Induction confirms that
\[
(\Delta^{(r)} f)(x) = \sum_{\ell=0}^{r} (-1)^{r-\ell} \binom{r}{\ell} f(x+\ell)
\]
for r = 1, 2, . . . and that
\[
\Delta^{(k-1)} x^k = k!\,x + d_k,
\]
in which $d_k$ is an integer that depends on k but not on x. Thus,
\[
k!\,x + d_k = \sum_{\ell=0}^{k-1} (-1)^{k-1-\ell} \binom{k-1}{\ell} (x+\ell)^k
\]
is the sum and difference of at most
\[
\sum_{\ell=0}^{k-1} \binom{k-1}{\ell} = 2^{k-1}
\]
kth powers. Given N, write $N - d_k = k!\,z + w$, in which |w| ≤ k!/2. Since $1 = 1^k$, we
see that w is at worst the sum or difference of at most $\tfrac{1}{2}k!$ kth powers. Consequently,
N is a sum and difference of at most $2^{k-1} + \tfrac{1}{2}k!$ kth powers. What is the optimal
number of summands?
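The argument is constructive, so it can be turned into a procedure that produces signed kth powers summing to a given integer. The sketch below follows the steps above; the function name `easier_waring` and the output format (a list of (sign, base) pairs) are ours. The bases z + ℓ may be zero or negative, which is harmless since (−m)^k = ±m^k.

```python
from math import comb, factorial


def easier_waring(N, k):
    """Return (sign, base) pairs with the sum of sign * base**k equal to N."""
    # d_k is the (k-1)th difference of x^k evaluated at x = 0.
    d = sum((-1) ** (k - 1 - l) * comb(k - 1, l) * l**k for l in range(k))
    kf = factorial(k)
    z, w = divmod(N - d, kf)              # N - d_k = k! z + w with 0 <= w < k!
    if w > kf // 2:                       # recentre the remainder so that |w| <= k!/2
        w -= kf
        z += 1
    # k! z + d_k as a signed sum of kth powers, each repeated binomial(k-1, l) times.
    terms = [((-1) ** (k - 1 - l), z + l) for l in range(k) for _ in range(comb(k - 1, l))]
    # The remainder w is made up of |w| copies of +1^k or -1^k.
    terms += [(1 if w > 0 else -1, 1)] * abs(w)
    return terms


terms = easier_waring(100, 3)
print(terms, sum(sign * base**3 for sign, base in terms))
```

The number of terms never exceeds 2^(k−1) + k!/2, matching the bound in the problem, but it is usually far from optimal.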
Bibliography
[1] G. H. Hardy and J. E. Littlewood, Some problems of “Partitio numerorum”, I: A new solu-
tion of Waring’s problem, Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen,
Mathematisch-Physikalische Klasse (1920), 33-54. https://eudml.org/doc/59073.
[2] G. H. Hardy and J. E. Littlewood, Some problems of ‘Partitio Numerorum’: IV. The singular
series in Waring’s Problem and the value of the number G(k), Math. Z. 12 (1922), no. 1, 161–
188, DOI 10.1007/BF01482074. http://link.springer.com/article/10.1007%2FBF01482074.
MR1544511
[3] G. H. Hardy and J. E. Littlewood, Some problems of ‘Partitio numerorum’ (VI): Further
researches in Waring’s Problem, Math. Z. 23 (1925), no. 1, 1–37, DOI 10.1007/BF01506218.
http://link.springer.com/article/10.1007%2FBF01506218. MR1544728
[4] G. H. Hardy and S. Ramanujan, Asymptotic Formulae in Combinatory Analysis, Proc.
London Math. Soc. (2) 17 (1918), 75–115, DOI 10.1112/plms/s2-17.1.75. http://plms.
oxfordjournals.org/content/s2-17/1/75.full.pdf+html. MR1575586
[5] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword
by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[6] M. B. Nathanson, Additive number theory: The classical bases, Graduate Texts in Mathemat-
ics, vol. 164, Springer-Verlag, New York, 1996. MR1395371
[7] P. Pollack, On Hilbert’s solution of Waring’s problem, Cent. Eur. J. Math. 9 (2011), no. 2,
294–301, DOI 10.2478/s11533-011-0009-z. MR2772425
[8] E. M. Wright, An Easier Waring’s Problem, J. London Math. Soc. 9 (1934), no. 4, 267–272,
DOI 10.1112/jlms/s1-9.4.267. MR1574875
1921
Mordell’s Theorem
Introduction
An elliptic curve is a plane curve E determined by an equation of the form
$$y^2 = x^3 + ax + b, \tag{1921.1}$$
in which $a$ and $b$ are fixed integers and the discriminant
$$\Delta = -16(4a^3 + 27b^2)$$
is nonzero. The nonvanishing of the discriminant ensures that E has no cusps, self-
intersections, or isolated points; see Figure 1. Elliptic curves have many fascinating
properties; we can barely scratch the surface of this important topic.
Figure 1. Panels: (a) $a = -2$, $b = 0$, $\Delta = 512$; (b) $a = -1$, $b = 1$, $\Delta = -368$; (c) $a = -3$, $b = 3$, $\Delta = -2160$; (d) $a = 0$, $b = 2$, $\Delta = -1728$; (e) $a = -3$, $b = 2$, $\Delta = 0$; (f) $a = 0$, $b = 0$, $\Delta = 0$. Panels (a)–(d) are elliptic curves. If $\Delta > 0$, the curve is
disconnected; if $\Delta < 0$, the curve is connected. If $\Delta = 0$, then the
curve is not an elliptic curve: (e) has a self-intersection and (f) has
a cusp.
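The discriminants quoted for the six panels of Figure 1 can be recomputed directly from the formula for $\Delta$. The following short Python check is a sketch; the function name `discriminant` and the dictionary `panels` are illustrative.

```python
def discriminant(a, b):
    """Discriminant of y^2 = x^3 + a x + b."""
    return -16 * (4 * a**3 + 27 * b**2)

# (a, b) for panels (a)-(f) of Figure 1.
panels = {"a": (-2, 0), "b": (-1, 1), "c": (-3, 3),
          "d": (0, 2), "e": (-3, 2), "f": (0, 0)}

for label, (a, b) in panels.items():
    delta = discriminant(a, b)
    kind = "elliptic curve" if delta != 0 else "singular (not an elliptic curve)"
    print(f"({label}) a={a}, b={b}: discriminant {delta}, {kind}")
```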
Figure 2. Addition of points on two elliptic curves: (a) $a = -3$, $b = 3$; (b) $a = -2$, $b = 1$.
Let $E$ be an elliptic curve. A point $(x, y) \in E$ with rational coordinates is
a rational point of $E$. Amazingly, the set $E(\mathbb{Q})$ of rational points on $E$ (which
includes a “point at infinity”; see footnote 1) can be endowed with the structure of an abelian
group. Here is a quick, if imprecise, explanation; see [8, 9] for complete details.
Let $P = (x_1, y_1)$ and $Q = (x_2, y_2)$ be distinct rational points on $E$. The straight
line $y = \alpha x + \beta$ that connects $P$ and $Q$ has rational coefficients $\alpha$ and $\beta$. It intersects
$E$ in a third point, $R = (x_3, y_3)$, which is also rational. Indeed,
$$(\alpha x + \beta)^2 = x^3 + ax + b \tag{1921.2}$$
is a cubic equation in $x$ with rational coefficients and two known rational roots, $x_1$
and $x_2$. The third root, $x_3$, must also be rational; this can be seen by expanding
the left-hand side of
$$(x - x_1)(x - x_2)(x - x_3) = 0$$
and comparing it to the cubic obtained in (1921.2). Thus, $y_3 = \alpha x_3 + \beta$ is rational
as well. The sum of $P$ and $Q$ is defined to be the reflection, with respect to the
$x$-axis, of $R$; see Figure 2. To be more specific,

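$$P + Q = \left( \left( \frac{y_2 - y_1}{x_2 - x_1} \right)^2 - x_1 - x_2,\; -\frac{y_2 - y_1}{x_2 - x_1}\, x_3 - \frac{y_1 x_2 - y_2 x_1}{x_2 - x_1} \right), \tag{1921.3}$$
in which $x_3$ denotes the first coordinate of $P + Q$.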
It is not easy to show that addition in E(Q) is associative; have a look at (1921.3)
and try to prove that (P + Q) + R = P + (Q + R)! Although brute force works, a
higher-level approach is to use the Riemann–Roch theorem; see the 1945 entry.
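The chord construction can be tried out with exact rational arithmetic. The sketch below is an illustration, not a full group law: it assumes the two points have distinct $x$-coordinates (so it does not handle $P + P$ or the point at infinity), and the names `on_curve` and `add_distinct` are hypothetical.

```python
from fractions import Fraction

def on_curve(P, a, b):
    """True if P = (x, y) satisfies y^2 = x^3 + a x + b."""
    x, y = P
    return y * y == x**3 + a * x + b

def add_distinct(P, Q):
    """Chord addition of two points with distinct x-coordinates."""
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:
        raise ValueError("this sketch only handles x1 != x2")
    alpha = Fraction(y2 - y1, x2 - x1)            # slope of the chord
    beta = Fraction(y1 * x2 - y2 * x1, x2 - x1)   # intercept of the chord
    x3 = alpha**2 - x1 - x2                       # third root of the cubic
    return (x3, -(alpha * x3 + beta))             # reflect R across the x-axis

a, b = -3, 3                                      # curve (a) of Figure 2
P = (Fraction(1), Fraction(1))
Q = (Fraction(-2), Fraction(-1))
S = add_distinct(P, Q)
print(S, on_curve(S, a, b))
```

Because every step uses `Fraction`, the check that $P + Q$ lies on the curve is exact rather than approximate.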
In 1921–1922 Louis Mordell (1888–1972) proved that $E(\mathbb{Q})$ is a finitely generated
abelian group. That is, $E(\mathbb{Q})$ is isomorphic to $\mathbb{Z}^r \oplus T$ for some nonnegative
integer $r$ and some finite abelian group $T$. The number $r$ is the (geometric)
rank of $E(\mathbb{Q})$ and the group $T$ is the torsion subgroup of $E(\mathbb{Q})$. Barry Mazur
(1937– ) proved that $T$ must be of the form $\mathbb{Z}/n\mathbb{Z}$ for $1 \le n \le 10$ or $n = 12$, or $\mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2n\mathbb{Z}$
for $n \in \{1, 2, 3, 4\}$ [2, 3]. Moreover, each of these groups occurs infinitely often as
the torsion subgroup of an elliptic curve.

Footnote 1. To be more precise, one instead considers the curve $y^2 z = x^3 + axz^2 + bz^3$ in projective
space; the point at “infinity” is the equivalence class of $(0, 1, 0)$.
It is possible for $r = 0$ to occur. In fact, it is conjectured that 50% of elliptic
curves have rank 0 and 50% have rank 1 (in a probabilistic sense that we cannot
make precise here). For instance, one can show that the elliptic curve $E$ defined by
$y^2 = x^3 - x$ has only four rational points:
$$E(\mathbb{Q}) = \{ (0, 0), (1, 0), (-1, 0), \infty \};$$
see [7, Ex. 1.1]. In this case, $\operatorname{rank} E = 0$ and $E(\mathbb{Q})$ is isomorphic to $\mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$.
On the other hand, elliptic curves of much larger rank are known. One of the largest known ranks belongs to the curve
$$y^2 + xy + y = x^3 - 120039822036992245303534619191166796374\,x + 504224992484910670010801799168082726759443756222911415116.$$
This can be put in “standard form” (1921.1) with a change of variables, although
the coefficients become even larger. The rank of this curve is at least 24; the actual
rank is unknown (it is suspected to be exactly 24).
In addition to being of theoretical interest, elliptic curves play an important
role in cryptography, factorization algorithms, and primality testing; see [10] and
the references in [7]. Their group structure is far richer than the group structure
of the multiplicative groups $(\mathbb{Z}/pq\mathbb{Z})^\times$, in which $p$ and $q$ are distinct primes, that
are used in RSA (see the entry for 1977).

Proposed by Stephan Ramon Garcia, Pomona College, and Steven J.
We sketched a geometric definition of addition in E(Q). However, we glossed
over some important details. Think about the following questions geometrically. If
P ∈ E(Q), what is P + P ? What is the additive identity in E(Q)? What is the
inverse −P of P in E(Q)? Why is addition in E(Q) associative?
Now consider the equation
$$y^2 = x^4 + ax^3 + bx^2 + cx + d.$$
For which choices of coefficients a, b, c, d is there an analogous definition of adding
two rational points and obtaining a rational point? Or any definition for adding
two rational points? More generally, what if we replace a quartic with a fixed
polynomial of higher degree?
1921: Comments
The Birch and Swinnerton-Dyer conjecture. The celebrated Birch and
Swinnerton-Dyer conjecture, one of the Clay Millennium Problems (see the com-
ments for the 2000 entry), concerns the rank of elliptic curves. The 2014 Fields
Medal of Manjul Bhargava (1974– ) was awarded in part for work related to this
problem. The conjecture states that the geometric rank of an elliptic curve equals
its analytic rank. What does this mean?
We may consider the equation (1921.1) modulo some prime $p$; for technical
reasons, $p$ should not divide $\Delta$. The number of points on the elliptic curve modulo
$p$ is
$$N_p = 1 + \left| \left\{ (x, y) \in (\mathbb{Z}/p\mathbb{Z})^2 : y^2 = x^3 + ax + b \right\} \right|,$$
in which the $+1$ is added for the “point at infinity.” Helmut Hasse (1898–1979)
proved that
$$|N_p - (p + 1)| \le 2\sqrt{p} \tag{1921.4}$$
for every prime $p$ that does not divide $\Delta$. The Hasse–Weil L-function of the elliptic
curve $E$ is the function
$$L(E, s) = \prod_{\substack{p \in \mathbb{P} \\ p \nmid \Delta}} \left( 1 - \frac{(p + 1) - N_p}{p^s} + \frac{p}{p^{2s}} \right)^{-1} \times \prod_{\substack{p \in \mathbb{P} \\ p \mid \Delta}} \ell_p(E, s)^{-1}$$
of the complex variable $s$, in which $\ell_p(E, s)$ is a certain polynomial in $p^{-s}$ that does
not vanish at $s = 1$. Hasse’s bound (1921.4) ensures that the product that defines
$L(E, s)$ converges absolutely and locally uniformly on the half plane $\operatorname{Re} s > \frac{3}{2}$.
Consequently, $L(E, s)$ is an analytic function of $s$ on this region.
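For small primes, the point count $N_p$ and Hasse's bound can be checked by brute force. The following Python sketch assumes the curve $y^2 = x^3 - x + 1$ (panel (b) of Figure 1, with $\Delta = -368 = -2^4 \cdot 23$); the function names are illustrative.

```python
def count_points(a, b, p):
    """N_p: solutions of y^2 = x^3 + a x + b modulo p, plus the point at infinity."""
    # sqrt_count[v] = number of y in Z/pZ with y^2 = v (mod p)
    sqrt_count = [0] * p
    for y in range(p):
        sqrt_count[y * y % p] += 1
    return 1 + sum(sqrt_count[(x**3 + a * x + b) % p] for x in range(p))

def hasse_holds(a, b, p):
    """Check |N_p - (p + 1)| <= 2 sqrt(p) using integer arithmetic only."""
    t = count_points(a, b, p) - (p + 1)
    return t * t <= 4 * p

a, b = -1, 1
for p in [3, 5, 7, 11, 13, 101]:   # primes that do not divide the discriminant
    print(p, count_points(a, b, p), hasse_holds(a, b, p))
```

Squaring both sides of the inequality avoids floating-point square roots, so the comparison is exact.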
The famed Taniyama–Shimura conjecture ensures that L(E, s) has an analytic
continuation to C satisfying a certain functional equation analogous to that satisfied
by the Riemann zeta function (see the 1928, 1933, 1939, 1942, 1945, 1967, and
1987 entries). The analytic rank of E is the order of the zero of L(E, s) at s = 1.
The simplest version of the Birch and Swinnerton-Dyer conjecture asserts that the
geometric and analytic ranks of an elliptic curve are equal. See [4, 5, 8] for more on
L-functions associated to elliptic curves. See [1, 11] and the references therein for
results towards the Birch and Swinnerton-Dyer conjecture and the distribution of
ranks.
Bibliography
[1] M. Bhargava and A. Shankar, Ternary cubic forms having bounded invariants, and the ex-
istence of a positive proportion of elliptic curves having rank 0, Ann. of Math. (2) 181
(2015), no. 2, 587–621, DOI 10.4007/annals.2015.181.2.4. http://annals.math.princeton.
edu/2015/181-2/p04. MR3275847
[2] B. Mazur, Modular curves and the Eisenstein ideal, Inst. Hautes Études Sci. Publ. Math.
47 (1977), 33–186 (1978). http://link.springer.com/article/10.1007%2FBF02684339.
MR488287
[3] B. Mazur, Rational isogenies of prime degree (with an appendix by D. Goldfeld), Invent.
Math. 44 (1978), no. 2, 129–162, DOI 10.1007/BF01390348. http://link.springer.com/
article/10.1007%2FBF01390348. MR482230
[4] A. W. Knapp, Elliptic curves, Mathematical Notes, vol. 40, Princeton University Press,
[5] Á. Lozano-Robledo, Elliptic curves, modular forms, and their L-functions, Student Math-
ematical Library, vol. 58, American Mathematical Society, Providence, RI; Institute for
Advanced Study (IAS), Princeton, NJ, 2011. IAS/Park City Mathematical Subseries.
MR2757255
[6] L. J. Mordell, On the rational solutions of the indeterminate equations of the third and fourth
degrees, Proc Cam. Phil. Soc. 21 (1922).
[7] K. Rubin and A. Silverberg, Ranks of elliptic curves, Bull. Amer. Math. Soc. (N.S.) 39 (2002),
no. 4, 455–474, DOI 10.1090/S0273-0979-02-00952-7. MR1920278
[8] J. H. Silverman, The arithmetic of elliptic curves, Graduate Texts in Mathematics, vol. 106,
Springer-Verlag, New York, 1986. MR817210
[9] J. H. Silverman and J. Tate, Rational points on elliptic curves, Undergraduate Texts in
Mathematics, Springer-Verlag, New York, 1992. MR1171452
[10] L. C. Washington, Elliptic curves: Number theory and cryptography, 2nd ed., Discrete Math-
ematics and its Applications (Boca Raton), Chapman & Hall/CRC, Boca Raton, FL, 2008.
MR2404461
[11] A. Wiles, The Birch and Swinnerton-Dyer conjecture, The millennium prize problems, Clay
Math. Inst., Cambridge, MA, 2006, pp. 31–41. http://www.claymath.org/sites/default/
files/birchswin.pdf. MR2238272
1922
Lindeberg Condition
Introduction
This year celebrates a milestone in the history of the central limit theorem, one
of the most important results in probability theory. We first introduce some of the
key concepts. A continuous random variable $X$ has density $f_X$ if
(a) $f_X(x) \ge 0$,
(b) $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$, and
(c) $P(a \le X \le b) = \int_a^b f_X(x)\,dx$,
in which P (a ≤ X ≤ b) denotes the probability that X takes on a value in the closed
interval [a, b]. This leads to one of the most important applications of integration:
it allows us to compute probabilities.
The $n$th moment of a random variable $X$ with density $f_X$, also called the
expected value of $X^n$, is
$$E[X^n] = \int_{-\infty}^{\infty} x^n f_X(x)\,dx.$$
The two most important moments are the mean
$$\mu_X = E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx$$
and the variance (the second centered moment of $f_X$)
$$\sigma_X^2 = E[(X - \mu_X)^2] = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx = E[X^2] - E[X]^2.$$
When the random variable X is clear from context, we often simplify the notation
and write μ, σ, and f in place of μX , σX , and fX , respectively.
The mean is the average value of a random variable. The standard deviation,
the square root of the variance, is the natural scale to measure fluctuations from the
mean. If you assign units to the random variable, say meters, then the mean and
the standard deviation are both in meters while the variance is in meters-squared.
If we want to have confidence intervals about a measurement, then the units of the
uncertainty should be the same as the units of the random variable. This is why
the standard deviation is the natural quantity considered in many problems.
There are many densities that arise in theory and applications. The normal
distribution occupies a central role in the subject; we will see why shortly. A random
variable $X$ is normally distributed with mean $\mu$ and variance $\sigma^2$ if its density is
$$\frac{e^{-(x - \mu)^2 / 2\sigma^2}}{\sqrt{2\pi\sigma^2}};$$
see Figure 1 for two representative plots. We also call this density a Gaussian or a
bell curve.
Figure 1. Plots of Gaussians with variance 1 and $\frac{1}{2}$.
We can standardize a random variable $X$ by passing to $(X - \mu_X)/\sigma_X$, which
has mean 0 and variance 1. The central limit theorem says that for appropriate
random variables $X_i$, if we set
$$Y_n = X_1 + \cdots + X_n,$$
then
$$Z_n = (Y_n - \mu_{Y_n})/\sigma_{Y_n}$$
converges to being normally distributed as $n \to \infty$. For example, draw each $X_i$
from a uniform random variable on $[-\frac{1}{2}, \frac{1}{2}]$. Then
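$$f_{Z_4}(z) = \begin{cases}
\dfrac{1}{54\sqrt{3}} \left( \sqrt{3} z^3 + 18 z^2 + 36\sqrt{3} z + 72 \right) & \text{if } -2\sqrt{3} < z \le -\sqrt{3}, \\[1ex]
\dfrac{1}{18\sqrt{3}} \left( -\sqrt{3} z^3 - 6 z^2 + 12 \right) & \text{if } -\sqrt{3} < z < 0, \\[1ex]
\dfrac{2}{3\sqrt{3}} & \text{if } z = 0, \\[1ex]
\dfrac{1}{18\sqrt{3}} \left( \sqrt{3} z^3 - 6 z^2 + 12 \right) & \text{if } 0 < z < \sqrt{3}, \\[1ex]
\dfrac{1}{54\sqrt{3}} \left( -\sqrt{3} z^3 + 18 z^2 - 36\sqrt{3} z + 72 \right) & \text{if } \sqrt{3} \le z < 2\sqrt{3}.
\end{cases}$$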
Although the preceding formula is exact, it is hard to work with. The central limit
theorem permits us to approximate this exact, but hard to manipulate, expression
with the density of a normal distribution; see Figure 2. In particular, observe how
rapid the convergence is.
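To get a feel for how close the exact piecewise density above (the standardized sum of four uniform variables) already is to the standard normal, one can evaluate both on a grid. The sketch below uses only the standard library; the grid spacing and the names `f_z4` and `normal_pdf` are arbitrary choices for illustration.

```python
import math

SQRT3 = math.sqrt(3.0)

def f_z4(z):
    """Piecewise cubic density of the standardized sum of four uniforms on [-1/2, 1/2]."""
    t = abs(z)  # the density is an even function of z
    if t < SQRT3:
        return (SQRT3 * t**3 - 6 * t**2 + 12) / (18 * SQRT3)
    if t < 2 * SQRT3:
        return (-SQRT3 * t**3 + 18 * t**2 - 36 * SQRT3 * t + 72) / (54 * SQRT3)
    return 0.0

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Largest pointwise gap between the two densities on a grid over [-4, 4].
grid = [k / 1000 for k in range(-4000, 4001)]
max_gap = max(abs(f_z4(z) - normal_pdf(z)) for z in grid)

# Midpoint-rule sanity checks: total mass and variance should both be close to 1.
h = 0.001
mids = [-4 + (k + 0.5) * h for k in range(8000)]
total = sum(f_z4(z) for z in mids) * h
variance = sum(z * z * f_z4(z) for z in mids) * h
print(max_gap, total, variance)
```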
Figure 2. Plots of normalized sums $Z_n$ of $n$ copies of the uniform random variable on $[-\frac{1}{2}, \frac{1}{2}]$ versus the standard normal: (a) $Z_1$, (b) $Z_2$, (c) $Z_4$, (d) $Z_8$.
The central limit theorem has a long and rich history. It has been a perennial
quest to find the weakest possible conditions that suffice to ensure convergence to
normality. Typically the first version students encounter has the random variables
independent and identically distributed, and the even moments
$$m_{2k} = \int_{-\infty}^{\infty} x^{2k} f_X(x)\,dx$$
growing sufficiently slowly so that
$$\sum_{k=0}^{\infty} \frac{m_{2k}}{(2k)!} t^{2k}$$
converges for all $|t| < \delta$ for some $\delta > 0$. This comes from a desire to have the
moment generating function
$$M_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx$$
converge for $t$ in an open neighborhood of the origin. Unfortunately, the moment
generating function is not necessarily well-defined, even for some common random
variables. This occurs, for example, with the Cauchy random variable, whose density is
$$\frac{1}{\pi} \cdot \frac{1}{1 + x^2}.$$
Thus, one often studies the closely related characteristic function
$$\varphi_X(t) = E[e^{itX}] = \int_{-\infty}^{\infty} e^{itx} f_X(x)\,dx,$$
essentially the Fourier transform of $f_X$, which always exists.

In 1922 Jarl Lindeberg (1876–1932) proved that a certain set of conditions
on the Xi forces convergence to a normal distribution. Specifically, consider the
following situation.
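Central Limit Theorem: Let $X_k$ be independent random variables on a probability space, and
assume that the means $\mu_{X_k}$ and variances $\sigma_{X_k}^2$ exist and are finite. Let
$$I(|X_k - \mu_{X_k}| \ge \epsilon s_n) = \begin{cases} 1 & \text{if } |X_k - \mu_{X_k}| \ge \epsilon s_n, \\ 0 & \text{otherwise,} \end{cases}$$
and let $E[\cdot]$ denote expectation relative to the underlying probability space. If $s_n^2 = \sum_{k=1}^{n} \sigma_{X_k}^2$ and for all $\epsilon > 0$ we have
$$\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{k=1}^{n} E\left[ (X_k - \mu_{X_k})^2 \, I(|X_k - \mu_{X_k}| \ge \epsilon s_n) \right] = 0,$$
then $Z_n$ converges to a Gaussian. If we additionally assume $\max_k \sigma_{X_k}^2 / s_n^2 \to 0$,
then this condition is also necessary.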
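As a concrete instance, the Lindeberg ratio can be written in closed form for independent uniform variables on $[-\frac{1}{2}, \frac{1}{2}]$, which have mean 0 and variance $\frac{1}{12}$. The sketch below is one worked case, not a general-purpose checker; the function name is hypothetical, and the truncated second moment is computed from $2\int_c^{1/2} x^2\,dx = \frac{2}{3}(\frac{1}{8} - c^3)$.

```python
import math

def lindeberg_ratio_uniform(n, eps):
    """Lindeberg ratio for n i.i.d. uniform variables on [-1/2, 1/2]."""
    s_n = math.sqrt(n / 12)           # s_n^2 = n / 12
    c = eps * s_n                     # truncation level eps * s_n
    if c >= 0.5:                      # |X_k| almost surely stays below the level
        return 0.0
    # E[X^2 I(|X| >= c)] = 2 * integral from c to 1/2 of x^2 dx
    tail = (2.0 / 3.0) * (0.125 - c**3)
    return n * tail / s_n**2

for n in [1, 10, 100, 1000]:
    print(n, lindeberg_ratio_uniform(n, eps=0.1))
```

Since the variables are bounded, the ratio is exactly zero once $\epsilon s_n \ge \frac{1}{2}$, so the condition holds trivially in this example.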

Instead of caring about the sum $Y_n = X_1 + \cdots + X_n$, suppose that we only
care about its value modulo 1; that is, we are concerned with $Y_n - \lfloor Y_n \rfloor$, in which
$\lfloor \cdot \rfloor$ denotes the greatest integer function. This expression cannot converge to a
Gaussian since it takes values only in $[0, 1)$. What do you expect this sum to converge
to? What is the most general set of conditions required to ensure convergence?
1922: Comments
A useful trick. By taking logarithms we can convert questions about products
of positive random variables to questions about sums of related random variables. If $Y_i =
\log_B X_i$, then to understand the distribution of the product $X_1 \cdots X_n$ it suffices
to determine the distribution of $Y_1 + \cdots + Y_n$ and then exponentiate. In many
situations Lindeberg’s conditions hold for the $Y_i$ and as $n$ tends to infinity the sum
is approximately a Gaussian. Since we have not standardized this sum, we expect
the variance of the Gaussian to tend to infinity as we add more and more terms.
Given a positive real number $r$, we may write it uniquely as $r = S_{10}(r) \cdot 10^{k(r)}$,
where $S_{10}(r) \in [1, 10)$ is the significand and $k(r)$ is an integer. Computing the
sum $Y_1 + \cdots + Y_n$ modulo 1, in which $Y_i = \log_{10} X_i$, is equivalent to determining
the significand of the product $X_1 \cdots X_n$; the integer part of the significand is the first nonzero digit of this
product. See [4, 6] for proofs that the sum modulo 1 converges to the
uniform distribution, as well as applications to Benford’s law; see the 1938 entry. In
addition to being of theoretical interest, such probabilistic digit laws are frequently
used in a variety of fields. The Internal Revenue Service (IRS) uses them to detect
tax fraud and computer scientists use them to optimize systems architecture.
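The logarithm trick is easy to simulate. The sketch below assumes, purely for illustration, that each $X_i$ is uniform on $(0, 1]$; it sums $\log_{10} X_i$, reduces modulo 1, and compares the resulting leading-digit frequencies with the Benford probabilities $\log_{10}(1 + 1/d)$. The function name, the number of factors, and the seed are arbitrary choices.

```python
import math
import random

def leading_digit_frequencies(n_factors=30, trials=20000, seed=0):
    """Empirical leading-digit frequencies of products of uniform(0, 1] variables."""
    rng = random.Random(seed)
    counts = [0] * 10
    for _ in range(trials):
        # Y_1 + ... + Y_n with Y_i = log10(X_i); 1 - random() lies in (0, 1].
        s = sum(math.log10(1.0 - rng.random()) for _ in range(n_factors))
        frac = s - math.floor(s)               # the sum modulo 1
        digit = min(int(10.0 ** frac), 9)      # integer part of the significand
        counts[digit] += 1
    return [c / trials for c in counts]

freqs = leading_digit_frequencies()
for d in range(1, 10):
    print(d, round(freqs[d], 4), round(math.log10(1 + 1 / d), 4))
```

Working with the sum of logarithms rather than the product itself avoids floating-point underflow when many small factors are multiplied.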
Bernoulli random variables. The linearity of expectation frequently simplifies
the evaluation of a sum. We say that $X$ is a Bernoulli random variable
with parameter $p \in [0, 1]$ if $X$ assumes the value 1 with probability $p$ and 0 with
probability $1 - p$. If $X_1, \ldots, X_n$ are independent Bernoulli random variables with parameter $p$, then
$$S_n = X_1 + \cdots + X_n$$
is a binomial random variable (with parameters $p$ and $n$). This random variable
assumes values in $\{0, 1, \ldots, n\}$, and the probability that $S_n$ assumes the value $k$ is
$$\binom{n}{k} p^k (1 - p)^{n-k}.$$
To calculate the mean of $S_n$ from the definition, we need to evaluate
$$E[S_n] = \sum_{k=0}^{n} k \binom{n}{k} p^k (1 - p)^{n-k},$$
which is a tedious exercise. However, the linearity of expectation provides an easier
evaluation. Since each $X_k$ is a Bernoulli random variable with parameter $p$,
$$E[X_k] = 1 \cdot p + 0 \cdot (1 - p) = p,$$
and hence
$$E[S_n] = E[X_1 + \cdots + X_n] = E[X_1] + \cdots + E[X_n] = \underbrace{p + p + \cdots + p}_{n \text{ times}} = np.$$
The central limit theorem ensures that, when appropriately normalized, $S_n$ approaches
a Gaussian distribution; see Figure 3.
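The "tedious" sum and the shortcut $np$ can be compared exactly with rational arithmetic. The following sketch uses a hypothetical helper `binomial_mean` and the arbitrary choice $p = \frac{3}{10}$.

```python
from fractions import Fraction
from math import comb

def binomial_mean(n, p):
    """E[S_n] computed directly from the definition of expectation."""
    return sum(k * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))

p = Fraction(3, 10)
for n in [1, 5, 20, 100]:
    # The two columns should agree exactly: the direct sum and n * p.
    print(n, binomial_mean(n, p), n * p)
```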
Figure 3. Convergence of a binomial random variable to a Gaussian: (a) $n = 10$, (b) $n = 20$, (c) $n = 50$, (d) $n = 100$.

Bibliography
[1] P. Billingsley, Probability and measure, 3rd ed., A Wiley-Interscience Publication, Wiley Se-
ries in Probability and Mathematical Statistics, John Wiley & Sons, Inc., New York, 1995.
MR1324786
[2] J. W. Lindeberg, Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung
(German), Math. Z. 15 (1922), no. 1, 211–225, DOI 10.1007/BF01494395.
http://link.springer.com/article/10.1007%2FBF01494395. MR1544569
[3] W. Feller, An introduction to probability theory and its applications. Vol. II., Second edition,
John Wiley & Sons, Inc., New York-London-Sydney, 1971. MR0270403
[4] S. J. Miller (ed.), Benford’s Law: theory and applications, Princeton University Press, Prince-
ton, NJ, 2015. MR3408774
[5] S. J. Miller, The probability lifesaver: All the tools you need to understand chance, Princeton
Lifesaver Study Guide, Princeton University Press, Princeton, NJ, 2017. MR3585480
[6] S. J. Miller and M. J. Nigrini, The modulo 1 central limit theorem and Benford’s law for
products, Int. J. Algebra 2 (2008), no. 1-4, 119–130. http://arxiv.org/pdf/math/0607686v2.
MR2417189
1923
The Circle Method
Introduction
In 1923 G. H. Hardy and J. E. Littlewood published a landmark paper [3] that
further developed the celebrated circle method; see the 1920 entry. This method
is well represented in the early years of this book. This is no accident, since it has
enjoyed spectacular success in resolving difficult problems in number theory. Their
paper attacked many questions in additive number theory, including the ternary
Goldbach conjecture, the twin prime conjecture, and the distribution of admissible
tuples of primes. The first two problems are discussed in the 1937 and 1919 entries,
respectively. We discuss the third problem below.
The expressions n, n + 2, and n + 4 are simultaneously prime if and only if
n = 3; this yields the primes 3, 5, and 7. Indeed, n, n+2, and n+4 are congruent to
n, n + 2, and n + 1 modulo 3, respectively. Therefore, exactly one of these numbers
is divisible by 3, which leads to only one prime triple, (3, 5, 7). This congruence
obstruction prevents (n, n + 2, n + 4) from being a triple of primes infinitely often.
On the other hand, there is no congruence obstruction that prevents n and n + 2
from being simultaneously prime. The twin prime conjecture suggests that this
occurs infinitely often.
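The congruence obstruction is easy to observe numerically. The sketch below uses simple trial division, which is adequate for the small, arbitrarily chosen search limit; the names `is_prime`, `triples`, and `twins` are illustrative.

```python
def is_prime(n):
    """Trial division; adequate for the small range used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

limit = 10_000
# Values of n below the limit with n, n + 2, n + 4 all prime.
triples = [n for n in range(2, limit)
           if is_prime(n) and is_prime(n + 2) and is_prime(n + 4)]
# Values of n below the limit with n and n + 2 both prime.
twins = [n for n in range(2, limit) if is_prime(n) and is_prime(n + 2)]
print(triples)
print(len(twins))
```

Only one triple appears, while twin prime pairs keep turning up as the limit grows, in line with the absence of a congruence obstruction for $(n, n + 2)$.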
Hardy and Littlewood conjectured that if there are no congruence obstructions
that prevent a particular $k$-tuple of primes from occurring, then there are infinitely
many such $k$-tuples. They also gave asymptotic formulas for the expected number
of $k$-tuples of primes below a certain threshold. The main term is the product of
a function asymptotic to $x/(\log x)^k$ and a constant that depends on the $k - 1$
neighbor differences and vanishes if there is a congruence obstruction. For example,
they predicted that the number $\pi_2(x)$ of twin primes at most $x$ obeys
$$\pi_2(x) \sim 2 \prod_{p \ge 3} \frac{p(p - 2)}{(p - 1)^2} \int_2^x \frac{dt}{(\log t)^2}$$
as $x \to \infty$ (see the comments for the 2005 entry for information about the Bateman–Horn
conjecture, a broad generalization). The formula does a phenomenal job; the
number of twin primes at most $10^{16}$ is 10,304,195,697,298, which differs from the
Hardy–Littlewood prediction by 3,142,802. Although the error might seem large,
it is small in terms of percentages. The ratio of the error to the true value is about
$3 \times 10^{-7}$ (and we believe the ratio gets smaller the higher up we go). To put this
in the proper perspective, MapQuest lists it as 2,990.1 miles from Fenway Park in
Boston to Dodger Stadium in Los Angeles. A similarly precise measurement here
would correspond to an error of about 5 feet (about a third of the length of a typical
car).
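The prediction can be tested at a much smaller scale on a laptop. The sketch below makes two simplifying assumptions that are not part of the conjecture itself: the infinite product is truncated at a cutoff (`prime_cutoff`), and the integral is approximated by the midpoint rule (`steps` subintervals). All function and parameter names are illustrative.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]

def twin_prime_count(x):
    """Number of primes p with p + 2 also prime and p + 2 <= x."""
    ps = set(primes_up_to(x))
    return sum(1 for p in ps if p + 2 in ps)

def hardy_littlewood(x, prime_cutoff=10**6, steps=200_000):
    """Truncated product times a midpoint-rule estimate of the integral."""
    product = 1.0
    for p in primes_up_to(prime_cutoff)[1:]:      # skip p = 2
        product *= p * (p - 2) / (p - 1) ** 2
    h = (x - 2) / steps
    integral = sum(h / math.log(2 + (j + 0.5) * h) ** 2 for j in range(steps))
    return 2 * product * integral

x = 10**5
print(twin_prime_count(x), hardy_littlewood(x))
```

Even at this modest height the relative error is only a few percent, and the text's figures indicate how much sharper the agreement becomes by $10^{16}$.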
Ben Green and Terence Tao extended a seminal result of Endre Szemerédi
(1940– ) and established that the primes contain arbitrarily long arithmetic progressions
[2]; see the 1975 and 2004 entries. That is, given $\ell$, there exist $a$ and $b$
so that $an + b$ is prime for $n = 1, 2, \ldots, \ell$. This differs from Dirichlet’s theorem on
primes in arithmetic progressions (see the 1913 entry), in which one fixes relatively
prime $a$ and $b$ and then concludes that the set $\{an + b : n = 1, 2, \ldots\}$ contains infinitely
many primes (along with many composite numbers). The Green–Tao theorem
is a consequence of the more far-reaching Hardy–Littlewood $k$-tuple conjecture,
which is currently beyond our reach.

Let us consider a significantly easier problem. Prove, without using the Green–Tao
theorem, that there are infinitely many triples of primes of the form
$$(p, p + 2m_p, p + 4m_p),$$
in which $m_p$ is a constant that depends upon $p$. Here the difference between neighboring
primes can depend on the first prime in the sequence. That is, we do not
require that the triples have the same common differences. For example, the triples
(11, 17, 23), which has a common difference of 6, and (29, 41, 53), which has a common
difference of 12, are acceptable.
More generally, consider a set $A$ of positive integers and let
$$A(x) = |A \cap \{1, 2, \ldots, x\}|.$$
For the set of prime numbers, $A(x) \sim x/\log x$ by the prime number theorem. For
the set of perfect squares, $A(x) \sim \sqrt{x}$. The quotient $A(x)/x$ can be interpreted
as the density of the set $A$ in $\{1, 2, \ldots, x\}$. The hope is that if the density of $A$
is sufficiently high as $x \to \infty$, then $A$ should contain infinitely many triples in
arithmetic progression. On the other hand, if the density remains low, then there
may not be infinitely many triples.
(a) Find a function $g(x)$ such that if $A(x) \ge g(x)$, then there are infinitely many
triples $(n, n + m_n, n + 2m_n)$ in $A$, in which $m_n$ depends on $n$.
(b) Find a function $h(x)$ such that if $A(x) \le h(x)$, then it is possible that $A$ does
not contain infinitely many such triples.
(c) Of course, one could take g(x) = x and h(x) = 2. While the analysis is easy in
these extreme cases, the goal is to find the best functions possible. What are
the smallest g(x) and the largest h(x) that we can take and still successfully
resolve the problem?
(d) If you cannot find a function that yields infinitely many triples, can you at
least ensure the existence of one, or find a special sequence so that there are
no triples?
1923: Comments
The power of counting. The centennial problem illustrates another impor-
tant idea: the power of counting arguments. Brun proved his theorem on the
convergence of the sum of the reciprocals of the twin primes by showing there is a
$C > 0$ such that
$$\pi_2(x) \le Cx \left( \frac{\log \log x}{\log x} \right)^2$$
for sufficiently large $x$. Brun’s estimate permits us to obtain an upper bound on the
number of the twin primes in each interval of the form $[2^n, 2^{n+1})$. The reciprocals
of such primes are at most $1/2^n$, from which one can show that the sum of the
reciprocals of the twin primes converges.
Suppose that $p$ is an odd prime, $p \nmid a$, and $p \nmid b$. For a given integer $c$, are there any
points $(x, y)$ on the “ellipse”
$$ax^2 + by^2 \equiv c \pmod{p}?$$
Although the expression appears familiar, we are working modulo $p$ and things
become hard to visualize. The answer involves a beautiful counting argument.
First observe that there are exactly $\frac{p+1}{2}$ distinct values modulo $p$ assumed by $x^2$,
namely 0 and $\frac{p-1}{2}$ nonzero values. This is because
$$x^2 \equiv y^2 \pmod{p} \iff x \equiv y \pmod{p} \text{ or } x \equiv -y \pmod{p}.$$
By hypothesis, $a$ is invertible modulo $p$ and hence $ax^2$ also assumes $\frac{p+1}{2}$ distinct
values modulo $p$. Similarly, $-by^2 + c$ assumes $\frac{p+1}{2}$ distinct values as well. If there
did not exist $x, y$ such that $ax^2 \equiv -by^2 + c \pmod{p}$, then there would be at least
$$\frac{p+1}{2} + \frac{p+1}{2} = p + 1$$
distinct residue classes modulo $p$, which is impossible. Thus, $ax^2 + by^2 \equiv c \pmod{p}$
has a solution.
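Both halves of the counting argument can be confirmed by exhaustive search for small odd primes. The sketch below is a brute-force check rather than an efficient solver; the function names are illustrative.

```python
def square_values(p):
    """The set of residues x^2 mod p."""
    return {x * x % p for x in range(p)}

def solve_ellipse(a, b, c, p):
    """Return some (x, y) with a x^2 + b y^2 = c (mod p), or None if there is none."""
    for x in range(p):
        for y in range(p):
            if (a * x * x + b * y * y - c) % p == 0:
                return (x, y)
    return None

for p in [3, 5, 7, 11, 13]:
    # Exactly (p + 1) / 2 residues are squares modulo p.
    assert len(square_values(p)) == (p + 1) // 2
    # Every admissible (a, b, c) has at least one solution.
    for a in range(1, p):
        for b in range(1, p):
            for c in range(p):
                assert solve_ellipse(a, b, c, p) is not None
print("a solution exists in every case tested")
```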
The partition function. Although generally attributed to Hardy and Littlewood,
the basic ideas of the circle method originated in Hardy and Ramanujan’s
work on the partition function a few years earlier [4]. The partition function $p(n)$
counts the number of ways to write $n$ as a sum of nonincreasing positive integers. For
example, $p(4) = 5$ since
$$4 = 3 + 1 = 2 + 2 = 2 + 1 + 1 = 1 + 1 + 1 + 1.$$
Hardy and Ramanujan investigated properties of the generating function
$$\sum_{n=0}^{\infty} p(n) z^n = \prod_{k=1}^{\infty} \frac{1}{1 - z^k} \tag{1923.1}$$
to study the asymptotic behavior of $p(n)$. To see why (1923.1) is correct, expand
each term in the product as a geometric series: $(1 - z^k)^{-1} = \sum_{j=0}^{\infty} z^{jk}$. Then
multiply the product of these series term-by-term. When the terms are gathered
together, the coefficient of $z^n$ will be $p(n)$. If multiplying together infinitely many
infinite series makes you feel queasy, look at the derivation of the Euler product
formula in the 1933 entry. We give a rigorous derivation there that has a similar flavor.
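The term-by-term multiplication can be carried out on truncated power series. In the sketch below (the function name is illustrative), only the factors with $k \le N$ are used, since the remaining factors cannot affect coefficients of degree at most $N$.

```python
def partition_numbers(N):
    """Coefficients p(0), ..., p(N) of the product of 1/(1 - z^k), truncated at degree N."""
    coeffs = [1] + [0] * N              # start from the constant series 1
    for k in range(1, N + 1):
        # Multiplying by 1/(1 - z^k) = 1 + z^k + z^(2k) + ... is a running
        # sum with stride k over the coefficient list.
        for n in range(k, N + 1):
            coeffs[n] += coeffs[n - k]
    return coeffs

print(partition_numbers(10))
```

The same double loop is the standard dynamic program for counting partitions, so the generating function and the combinatorial definition visibly agree.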
In 1918, Hardy and Ramanujan proved that
$$p(n) \sim \frac{1}{4n\sqrt{3}} \exp\left( \pi \sqrt{\frac{2n}{3}} \right);$$
that is, the quotient of these two expressions tends to 1 as $n \to \infty$. Ramanujan
also discovered some fascinating divisibility properties of $p(n)$. He showed that
$$p(5k + 4) \equiv 0 \pmod{5},$$
$$p(7k + 5) \equiv 0 \pmod{7}, \text{ and}$$
$$p(11k + 6) \equiv 0 \pmod{11},$$
for $k = 0, 1, 2, \ldots$. Until recently, only a few other “simple” congruence relations
of the type Ramanujan provided had been discovered. In retrospect, this is not
surprising since we now know, for example, that
p(711647853449k + 485138482133) ≡ 0 (mod 13)
and
p(28995244292486005245947069k + 28995221336976431135321047) ≡ 0 (mod 29);
see [5, 8]. The numbers involved in these expressions are so large that they could
not have been found by computation; they required a deep understanding of the
theory of modular forms. Ken Ono (1968– ), who proved that such congruences
exist for every prime modulus, is particularly fond of his discovery that
p(4063467631k + 30064597) ≡ 0 (mod 31);
it appears at the top of his homepage. For more information about the partition
function and its properties, the reader is strongly encouraged to read the expository
article [1] and the references therein.
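Ramanujan's three congruences and the Hardy–Ramanujan asymptotic can both be examined numerically for moderate $n$. The sketch below recomputes $p(n)$ from the generating function; the bound $N = 2000$ and the function names are arbitrary illustrative choices, and the check is of course no substitute for a proof.

```python
import math

def partition_numbers(N):
    """p(0), ..., p(N) via the truncated product of 1/(1 - z^k)."""
    coeffs = [1] + [0] * N
    for k in range(1, N + 1):
        for n in range(k, N + 1):
            coeffs[n] += coeffs[n - k]
    return coeffs

def hardy_ramanujan(n):
    """Leading-order asymptotic exp(pi sqrt(2n/3)) / (4 n sqrt(3))."""
    return math.exp(math.pi * math.sqrt(2 * n / 3)) / (4 * n * math.sqrt(3))

N = 2000
p = partition_numbers(N)

# Ramanujan's congruences, checked for every index up to N.
for modulus, residue in [(5, 4), (7, 5), (11, 6)]:
    assert all(p[n] % modulus == 0 for n in range(residue, N + 1, modulus))

# Ratio of p(n) to the asymptotic; it should drift toward 1 as n grows.
for n in [10, 100, 1000, 2000]:
    print(n, p[n] / hardy_ramanujan(n))
```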
Bibliography
[1] S. Ahlgren and K. Ono, Addition and counting: the arithmetic of partitions, Notices Amer.
Math. Soc. 48 (2001), no. 9, 978–984. http://www.ams.org/notices/200109/fea-ahlgren.
pdf. MR1854533
Math. (2) 167 (2008), no. 2, 481–547, DOI 10.4007/annals.2008.167.481. arXiv:math.NT/
0404188. MR2415379
[3] G. H. Hardy and J. E. Littlewood, Some problems of ‘Partitio numerorum’; III: On
the expression of a number as a sum of primes, Acta Math. 44 (1923), no. 1, 1–
MR1555183
[4] G. H. Hardy and S. Ramanujan, Asymptotic Formulae in Combinatory Analysis, Proc. London
Math. Soc. (2) 17 (1918), 75–115, DOI 10.1112/plms/s2-17.1.75. MR1575586
[5] F. Johansson, Efficient implementation of the Hardy-Ramanujan-Rademacher formula, LMS
J. Comput. Math. 15 (2012), 341–359, DOI 10.1112/S1461157012001088. MR2988821
[7] M. B. Nathanson, Additive number theory: The classical bases, Graduate Texts in Mathemat-
ics, vol. 164, Springer-Verlag, New York, 1996. MR1395371
[8] K. Ono, Distribution of the partition function modulo m, Ann. of Math. (2) 151 (2000), no. 1,
293–307, DOI 10.2307/121118. MR1745012
1924
The Banach–Tarski Paradox
Introduction
A mathematical paradox often leads to a reevaluation of underlying assump-
tions. A good paradox can greatly influence the development of a subject. The
Banach–Tarski paradox is one of the best: there exists a partition of the unit ball
in $\mathbb{R}^3$ into a finite number of subsets that can be rearranged, using rigid motions,
to yield two identical copies of the original ball. In other words, you can dissect an
orange and reassemble it into two full-sized oranges; see Figure 1.
This is impossible in the real world—hence the word “paradox”. Whereas real
oranges are made up of atoms and are cut with a knife, mathematical oranges are
made up of infinitely many points that can be partitioned into extremely complicated
sets. We can choose which subset each point should belong to, with no regard
for nearby points (that is, ignoring continuity). One should not think of the pieces
involved as solid pieces that can be handled like everyday objects. It is best to
think of the pieces as, perhaps, “dense gas clouds.” The actual construction is
more subtle, but it does involve making infinitely many arbitrary choices. This is
permitted by the axiom of choice, a topic that deserves a healthy digression; see
the 1940 and 1999 entries.
Figure 1. The Banach–Tarski paradox asserts that one can partition
a ball in $\mathbb{R}^3$ into a finite number of disjoint sets that can be
rearranged, using rigid motions, to form two balls identical to the
original. Raphael Robinson (1911–1995) proved that this can be
accomplished with five pieces; fewer than five pieces will not suffice
[3].
Stefan Banach (1892–1945) and Alfred Tarski (1901–1983) actually proved
much more. For $n \ge 3$, given any two bounded subsets $E$ and $F$ of $\mathbb{R}^n$ having
nonempty interior, there are partitions $E = \bigcup_{i=1}^{N} E_i$ and $F = \bigcup_{i=1}^{N} F_i$ into disjoint
sets so that the sets $E_i$ and $F_i$ are congruent (in the geometric sense) for
$i = 1, 2, \ldots, N$. So for $n \ge 3$, it is impossible to find a finitely additive, normalized
“volume function,” defined on all subsets of $\mathbb{R}^n$, that is invariant under rigid
motions. This defeats any attempt to assign a “volume” to all subsets of $\mathbb{R}^n$.
We briefly sketch the main ideas behind “doubling the ball” in $\mathbb{R}^3$. Let $SO_3 =
SO_3(\mathbb{R})$ denote the group of all $3 \times 3$ real orthogonal matrices with determinant 1.
That is, $SO_3$ is the set of rigid motions of $\mathbb{R}^3$ that fix the origin and preserve orientation.
The crucial observation is that $SO_3$ contains a subgroup that is “isomorphic
to the free group on two generators.” In less technical language, $SO_3$ contains two
matrices $A$ and $B$ for which there are no nontrivial relationships between $A$, $A^{-1}$,
$B$, and $B^{-1}$, apart from $A^{-1}A = I$, $BB^{-1} = I$, $B^{-1}AA^{-1}B = I$, and so forth. For
example,
$$A = \begin{bmatrix} \frac{3}{5} & -\frac{4}{5} & 0 \\ \frac{4}{5} & \frac{3}{5} & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{3}{5} & -\frac{4}{5} \\ 0 & \frac{4}{5} & \frac{3}{5} \end{bmatrix}$$
are such matrices (the proof is nontrivial and involves elements of Diophantine
approximation). These matrices induce rotations by $\theta = \tan^{-1}(\frac{4}{3})$, which is not a
rational multiple of $\pi$, with respect to the $z$- and $x$-axes, respectively. Let $\Gamma \subset SO_3$
denote the group generated by $A$ and $B$; it consists of words in $A$, $A^{-1}$, $B$, and $B^{-1}$,
such as $AB^2B^{-2}A^{-1}A^2B$. This word is not reduced because further cancellation is
possible; it reduces to $A^2B$.
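Word reduction and the two rotation matrices can be explored with exact arithmetic. The sketch below adopts a convention that is not from the text: uppercase letters stand for $A$ and $B$, and lowercase letters for their inverses. It verifies that a word and its reduced form give the same matrix; it does not attempt to prove that the group is free.

```python
from fractions import Fraction as F

A = [[F(3, 5), F(-4, 5), 0], [F(4, 5), F(3, 5), 0], [0, 0, 1]]
B = [[1, 0, 0], [0, F(3, 5), F(-4, 5)], [0, F(4, 5), F(3, 5)]]
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

def matmul(X, Y):
    """Product of two 3 x 3 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(X):
    return [list(row) for row in zip(*X)]

# For a rotation matrix the inverse is the transpose.
LETTERS = {"A": A, "a": transpose(A), "B": B, "b": transpose(B)}

def reduce_word(word):
    """Cancel adjacent inverse pairs; lowercase letters denote inverses."""
    stack = []
    for letter in word:
        if stack and stack[-1] == letter.swapcase():
            stack.pop()
        else:
            stack.append(letter)
    return "".join(stack)

def evaluate(word):
    """Multiply out the matrices named by a word, left to right."""
    M = IDENTITY
    for letter in word:
        M = matmul(M, LETTERS[letter])
    return M

word = "ABBbbaAAB"   # A B^2 B^-2 A^-1 A^2 B
print(reduce_word(word))
print(evaluate(word) == evaluate(reduce_word(word)))
```

Using `Fraction` entries keeps every matrix product exact, so equality of matrices is a genuine test rather than a floating-point comparison.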
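Let $w(A)$ be the set of all reduced words that begin with $A$, and so on, and let
$$\Gamma_1 = w(A) \cup \{I, A^{-1}, A^{-2}, A^{-3}, \ldots\},$$
$$\Gamma_2 = w(A^{-1}) \setminus \{A^{-1}, A^{-2}, A^{-3}, \ldots\},$$
$$\Gamma_3 = w(B), \text{ and}$$
$$\Gamma_4 = w(B^{-1}).$$
We have the paradoxical decomposition
$$\Gamma = \Gamma_1 \cup \Gamma_2 \cup \Gamma_3 \cup \Gamma_4 = \Gamma_1 \cup A\Gamma_2 = \Gamma_3 \cup B\Gamma_4,$$
which facilitates “doubling the ball.”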
Let $S^2 = \{x \in \mathbb{R}^3 : \|x\| = 1\}$ denote the surface of the unit sphere in $\mathbb{R}^3$;
the exponent 2 denotes that $S^2$ is a “two-dimensional manifold” (a microscopic
observer on the surface of the sphere will think that they are in $\mathbb{R}^2$; see the 2003
entry for more information). Define a relation on $S^2$ by saying that $x \sim y$ if there
is a $C \in \Gamma$ so that $Cx = y$. This is an equivalence relation: $\sim$ is reflexive since
$I \in \Gamma$; $\sim$ is symmetric since $\Gamma$ is closed under inversion; $\sim$ is transitive since
$\Gamma$ is closed under multiplication (and matrix multiplication is associative).
The relation $\sim$ partitions $S^2$ into equivalence classes, which are necessarily
disjoint. Form a set $X$ by selecting one element from each equivalence class. Since
$S^2 = \Gamma X$, we have the decompositions
\[
  S^2 \mathrel{\text{“=”}} \Gamma_1 X \cup \Gamma_2 X \cup \Gamma_3 X \cup \Gamma_4 X
\]
and
\[
  S^2 \mathrel{\text{“=”}} \Gamma_1 X \cup A\Gamma_2 X = \Gamma_3 X \cup B\Gamma_4 X.
\]
Now join each point on the sphere to 0 and consider the corresponding decomposi-
tions to obtain the Banach–Tarski paradox for the unit ball in R3 . The quotation
marks indicate that these equalities are not entirely legitimate. We have not dis-
cussed the countably many points of S 2 (the “poles” of some rotation) that are
fixed by some element of Γ. Nor have we discussed what happens to 0. We do not
explore the details further, although one can see that the introduction of additional
“pieces” in some elaborate manner is involved.

Proposed by Stephen Bigelow, University of California, Santa Barbara.
Perhaps the most fruitful legacy of the Banach–Tarski paradox is in group
theory. Inspired by the paradox, John von Neumann (1903–1957) defined amenable
groups, which are groups that do not admit a paradoxical decomposition. More
precisely, a discrete group is amenable if and only if it has a finitely additive,
left-invariant probability measure. Such a measure gives a reasonable notion of
“volume,” exactly the concept that the Banach–Tarski paradox appears to violate.
The Thompson group F was introduced by Richard Thompson in 1965
and has unusual properties that make it a good source of counterexamples. It can
be defined as the group of piecewise-linear homeomorphisms from the unit interval to
itself for which all nondifferentiable points are dyadic rational numbers and all slopes
are powers of two. Is the Thompson group amenable?
The question of its amenability is still controversial. A preprint by E. T.
Shavgulidze [4] claims to show it is amenable, and one by Azer Akhmedov [1]
claims to show it is not. The consensus seems to be that both preprints contain
serious gaps, and the correct answer is not clear.
1924: Comments
Why three dimensions? The Banach–Tarski paradox dealt with a solid ball
in R3 . What happens if we work with a disk in R2 ? Recall that SO3 contains
a subgroup isomorphic to the free group on two generators. The group SO2 of
2 × 2 orthogonal matrices with determinant 1 does not contain such a subgroup.
Indeed, SO2 is the group of all rotations of R2 around the origin. Rotations in R2
commute with each other, so SO2 is an abelian (commutative) group. It is too nice
for something like the “paradoxical decomposition” required for the Banach–Tarski
paradox to occur.
The problem of measure. Here is a closely related paradox that helped
launch the field of measure theory. Suppose that we wish to define a function
$m : \mathcal{P}(\mathbb{R}) \to [0, \infty]$ that assigns a “length” to each subset of $\mathbb{R}$ (recall that
$\mathcal{P}(\mathbb{R})$ is the set of all subsets of $\mathbb{R}$; see p. 31). Our intuition tells us such a
function should satisfy the following three axioms:

(a) $m\big((a, b)\big) = b - a$ if $a < b$.
(b) If $A_n$ is a sequence of disjoint subsets of $\mathbb{R}$, then
\[
  \sum_{n=1}^{\infty} m(A_n) = m\Big( \bigcup_{n=1}^{\infty} A_n \Big).
\]
(c) If $x + A = \{x + a : a \in A\}$, then $m(x + A) = m(A)$ for all $x$ in $\mathbb{R}$.

A function m : P(R) → [0, ∞] that satisfies these properties would be funda-
mental to analysis, topology, and geometry. Unfortunately, no such function exists!
Suppose toward a contradiction that m : P(R) → [0, ∞] satisfies (a), (b), and (c).
One can show that the relation
\[
  x \sim y \iff x - y \in \mathbb{Q}
\]
on the open interval $(0, 1)$ is an equivalence relation. Thus, $(0, 1)$ is the disjoint
union of equivalence classes $E_\alpha$, in which $\alpha$ runs over some index set $I$. Let
$S \subseteq (0, 1)$ contain exactly one element $s_\alpha$ from each equivalence class $E_\alpha$, let
$r_1, r_2, r_3, \ldots$ be an enumeration of $\mathbb{Q} \cap (-1, 1)$, and let $S_n = r_n + S$.
Because $\sim$ partitions $(0, 1)$ into equivalence classes, each $x$ in $(0, 1)$ belongs to
some $E_\alpha$. Since $S$ contains a representative from each equivalence class, there is
an $s_\alpha \in S$ so that $x \sim s_\alpha$. Thus, $x - s_\alpha = r_n$ for some $n$ since $|x - s_\alpha| < 1$. We
conclude that
\[
  x = r_n + s_\alpha \in S_n \subseteq \bigcup_{n=1}^{\infty} S_n,
\]
and hence
\[
  (0, 1) \subseteq \bigcup_{n=1}^{\infty} S_n.
\]
If $x \in S_i \cap S_j$, then
\[
  x = r_i + s_\alpha = r_j + s_\beta
\]
for some $s_\alpha, s_\beta \in S$. Consequently,
\[
  s_\alpha - s_\beta = r_j - r_i \in \mathbb{Q},
\]
so $s_\alpha \sim s_\beta$. Since $S$ contains exactly one element from each equivalence class,
$s_\alpha = s_\beta$, from which $r_i = r_j$ and $S_i = S_j$ follow. Thus, the $S_n$ are disjoint.
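A consequence of axiom (b) is that $m$ is a monotone set function: $A \subseteq B \implies
m(A) \leq m(B)$. Since $S_n \subseteq (-1, 2)$ for all $n$,
\[
  (0, 1) \subseteq \bigcup_{n=1}^{\infty} S_n \subseteq (-1, 2)
\]
and hence
\[
  1 = m\big((0, 1)\big) \leq m\Big( \bigcup_{n=1}^{\infty} S_n \Big) \leq m\big((-1, 2)\big) = 3.
\]
The translation invariance of $m$ ensures that $m(S_n) = m(S)$ for all $n$, so axiom (b)
yields
\[
  1 \leq \sum_{n=1}^{\infty} m(S) \leq 3.
\]
Every term of this series is the same number $m(S)$, so the sum is $0$ if $m(S) = 0$
and $+\infty$ if $m(S) > 0$. Neither value lies between 1 and 3, so the preceding is
impossible. Thus, no such $m$ exists.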

The set S constructed above is an example of a Vitali set, named after Giuseppe
Vitali (1875–1932). These sets are so strange that there is no reasonable way to
define their “length.” Their construction relies upon the axiom of choice; see the
1940 and 1999 entries for more about this fascinating, and somewhat controversial,
axiom of set theory.
Bibliography
[1] A. Akhmedov, A new metric criterion for non-amenability III: Non-amenability of R. Thomp-
son’s group F , http://arxiv.org/abs/0902.3849.
[2] S. Banach and A. Tarski, Sur la décomposition des ensembles de points en parties
respectivement congruentes, Fund. Math. 6 (1924), 244–277.
http://matwbn.icm.edu.pl/ksiazki/fm/fm6/fm6127.pdf
[3] R. M. Robinson, On the decomposition of spheres, Fund. Math. 34 (1947), 246–260,
DOI 10.4064/fm-34-1-246-260. http://matwbn.icm.edu.pl/ksiazki/fm/fm34/fm34125.pdf.
MR0026093
[4] E. T. Shavgulidze, About amenability of subgroups of the group of diffeomorphisms of the
interval, http://arxiv.org/abs/0906.0107.
[5] S. Wagon, The Banach-Tarski paradox, with a foreword by Jan Mycielski, Encyclopedia of
Mathematics and its Applications, vol. 24, Cambridge University Press, Cambridge, 1985.
MR803509
1925
The Schrödinger Equation
Introduction
Probably the second most famous equation in physics¹ is Newton’s second law:
$F = ma$. Here $F$ is the force acting on a body, $m$ is the mass of the body, and $a$ is
the acceleration (the second derivative of position).
The analogue for quantum mechanical systems is the Schrödinger equation,
formulated in 1925 by Erwin Schrödinger (1887–1961). It is
\[
  i\hbar \frac{\partial}{\partial t} \Psi = \hat{H}\Psi,
\]
in which $i^2 = -1$, $\hbar = h/2\pi$ ($h$ is Planck’s constant), $\hat{H}$ is the Hamiltonian operator
of the system, and $\Psi$ is the wave function that governs the system. To explain the
mathematics behind the Schrödinger equation with any sense of rigor would occupy
the remaining pages of this book.²
In the quantum-mechanical setting, the eigenvalues $E$ of the time-independent
Schrödinger equation $E\Psi = \hat{H}\Psi$ are the energy levels of the corresponding quantum
system. In this eigenvalue problem, $\hat{H}$ is an unbounded, selfadjoint operator
on a Hilbert space (pardon the jargon). If one knows the eigenvalues of a given
Schrödinger operator, one often wants to predict how these eigenvalues are affected
by a slight modification of the original operator. Although this is far too complicated
to address here, we can discuss the finite-dimensional setting.
Let $M_n(\mathbb{C})$ denote the set of n × n complex matrices. We say that $A \in M_n(\mathbb{C})$
is selfadjoint if $A = A^*$, that is, if $A$ equals its conjugate transpose (physicists tend
to use $A^\dagger$ instead of $A^*$). A selfadjoint matrix $A \in M_n(\mathbb{C})$ has only real eigenvalues,
denoted by
\[
  \lambda_1(A) \geq \lambda_2(A) \geq \cdots \geq \lambda_n(A)
\]
and repeated according to multiplicity, along with a corresponding orthonormal
basis of eigenvectors. This is a special case of the spectral theorem [3]. How do the
eigenvalues of a selfadjoint matrix behave under a small perturbation?
Suppose that $E \in M_n(\mathbb{C})$ is a positive semidefinite matrix of rank one. That is,
$E = ee^* \in M_n(\mathbb{C})$ for some nonzero column vector $e \in \mathbb{C}^n$. Then the eigenvalues of
$A$ interlace with those of $A + E$ and $A - E$; that is, each eigenvalue of $A$ is at most
the corresponding eigenvalue of $A + E$ and at least the corresponding eigenvalue of
$A - E$. For example, if
\[
  A = \begin{bmatrix} -4 & 0 & 2 & 4 \\ 0 & -4 & 1 & 2 \\ 2 & 1 & 4 & -2 \\ 4 & 2 & -2 & -4 \end{bmatrix}
  \quad\text{and}\quad
  E = \begin{bmatrix} 1 & 2 & 1 & 0 \\ 2 & 4 & 2 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
    = \begin{bmatrix} 1 \\ 2 \\ 1 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 0 \end{bmatrix},
\]
then adding $E$ to $A$ increases each of its eigenvalues. Subtracting $E$ from $A$ decreases
each of its eigenvalues. This is illustrated in Figure 1. This sort of eigenvalue
interlacing result is the tip of the iceberg; for more information see [4].

Figure 1. The eigenvalues of A − E, A, and A + E interlace.

¹ The most famous is undoubtedly $E = mc^2$; see the entry for 1915. See Episode 2 of the
1972 Doctor Who serial The Time Monster for the more dubious $E = mc^3$.
² It took the genius of John von Neumann to put quantum mechanics on a firm mathematical
foundation; see the entries for 1924, 1931, 1944, 1946 for more about him.
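The interlacing claim for this particular pair can be checked numerically. The sketch below is only an illustration under our own assumptions: it uses the classical cyclic Jacobi rotation method (our choice, not a method from the text) so that it needs nothing beyond the standard library, and the function name `jacobi_eigenvalues` is hypothetical.

```python
import math

def jacobi_eigenvalues(M, sweeps=100, tol=1e-24):
    """Eigenvalues of a real symmetric matrix, largest first (cyclic Jacobi)."""
    n = len(M)
    a = [[float(v) for v in row] for row in M]
    for _ in range(sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < tol:           # off-diagonal part is numerically zero
            break
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                tau = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, tau) / (abs(tau) + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                for k in range(n):      # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):      # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

A = [[-4, 0, 2, 4], [0, -4, 1, 2], [2, 1, 4, -2], [4, 2, -2, -4]]
e = [1, 2, 1, 0]
E = [[ei * ej for ej in e] for ei in e]     # rank-one matrix e e^T
plus = [[A[i][j] + E[i][j] for j in range(4)] for i in range(4)]
minus = [[A[i][j] - E[i][j] for j in range(4)] for i in range(4)]

lam_minus, lam, lam_plus = (jacobi_eigenvalues(M) for M in (minus, A, plus))
for lo, mid, hi in zip(lam_minus, lam, lam_plus):
    print(f"{lo:9.4f} <= {mid:9.4f} <= {hi:9.4f}")
```

Each printed row lists the i-th largest eigenvalue of A − E, A, and A + E, so the inequalities can be read off directly.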
There are lots of other beautiful results like this that are not typically covered in
an undergraduate linear algebra course. What if we have three selfadjoint matrices
$A, B, C \in M_n(\mathbb{C})$ that satisfy $A + B = C$? How are the eigenvalues of $A$, $B$, and $C$
related? Taking the trace of this equation indicates that
\[
  \sum_{i=1}^{n} \lambda_i(A) + \sum_{i=1}^{n} \lambda_i(B) = \sum_{i=1}^{n} \lambda_i(C).
\]
However, there are many other nontrivial relationships between the eigenvalues of
$A$, $B$, and $C$. For instance, if $n = 2$, then
\begin{align*}
  \lambda_1(C) &\leq \lambda_1(A) + \lambda_1(B), \\
  \lambda_2(C) &\leq \lambda_1(A) + \lambda_2(B), \text{ and} \\
  \lambda_2(C) &\leq \lambda_2(A) + \lambda_1(B).
\end{align*}
For larger values of $n$, more and more inequalities emerge. The story was only
completed in 1999, with the resolution of the famous Horn conjecture by Alexander
Klyachko [1], and by Allen Knutson (1969– ) and Terence Tao [2].

Proposed by Stephan Ramon Garcia, Pomona College.
Why does a planet spin on an axis? For instance, Venus spins on its axis once
every 5,832 hours, Earth every 24 hours, and Mars every 25 hours. What does this
have to do with linear algebra?
1925: Comments
Stone’s theorem and the solution to the problem. Stone’s theorem is
a seminal result in the mathematical formulation of quantum mechanics. It says
that a strongly continuous, one-parameter semigroup $t \mapsto U(t)$ of unitary operators
on a Hilbert space $H$ is of the form $U(t) = \exp(itA)$, in which $A$ is a (potentially
unbounded) selfadjoint operator on $H$. In physical terms, the time evolution of a
quantum system is obtained by exponentiating its Hamiltonian. If that sounds like
a lot of technical jargon, you are right. Stone’s theorem is usually not covered until
a second course in functional analysis.³
We content ourselves here with a finite-dimensional manifestation of Stone’s
theorem that explains why a body in three-dimensional space spins on some axis.
Consider a planet in space. Fix a coordinate system so that the center of the
planet is always at $0 \in \mathbb{R}^3$. With its center now fixed and free from external forces,
the planet’s movement is governed by a continuous, one-parameter semigroup of
matrices denoted $t \mapsto U(t)$, in which $U(t)$ is a 3 × 3 real matrix for each time $t$.
The position of a point $x$ on the planet’s surface at time $t$ is $U(t)x$. The position of
$x$ at time $s + t$ is $U(s + t)x$. This is the same as the position of the point $U(t)x$ at
time $s$, namely $U(s)U(t)x$. Consequently, $U(s + t)x = U(s)U(t)x$ for each $x \in \mathbb{R}^3$.
This is the semigroup condition: $U(s + t) = U(s)U(t)$.
Consider further properties of the matrices $U(t)$. Since the planet is not deformed
as time passes, the matrices involved induce rigid motions of Euclidean
space. These are the real orthogonal matrices, which are characterized by the
condition $U^T U = I$. If $U$ is real orthogonal, then
\[
  1 = \det I = \det(U^T U) = \det(U^T) \det U = (\det U)^2,
\]
so $\det U = \pm 1$. The sign of the determinant reveals whether the linear transformation
induced by $U$ is orientation preserving (+1) or reversing (−1). The orientation
of our planet does not change with time, so we insist that each $U(t)$ has determinant 1.
We know that $U(0) = I$, since $U(s) = U(s + 0) = U(s)U(0)$ for all $s \in \mathbb{R}$ and
$U(s)$ is invertible. The derivative of $U(t)$ at time $t = 0$ is
\[
  S = \lim_{t \to 0} \frac{U(t) - I}{t}.
\]
Stone’s theorem (which works in a much more general setting) says that the preceding
limit exists and that $U(t) = \exp(St)$, in which
\[
  \exp(A) = \sum_{n=0}^{\infty} \frac{1}{n!} A^n
\]
is the matrix exponential function. Since $U(t)^T = U(t)^{-1}$ for $t \geq 0$, differentiating
$U(t)^T U(t) = I$ at $t = 0$ gives $S^T + S = 0$; that is, $S$ is skew symmetric: $S^T = -S$.
In particular, $U(t) = \exp(itA)$, in which $A = -iS$ is selfadjoint:
\[
  A^* = (-iS)^* = iS^* = iS^T = -iS = A.
\]
We say that $A$ is the infinitesimal generator of the semigroup $U(t)$. Since $S$ is 3 × 3
and skew symmetric,
\[
  \det S = \det(S^T) = \det(-S) = (-1)^3 \det S = -\det S. \tag{1925.1}
\]
Thus, $\det S = 0$ and there is a nonzero $x \in \mathbb{R}^3$ such that $Sx = 0$. Therefore,
\[
  U(t)x = \exp(tS)x = \sum_{n=0}^{\infty} \frac{1}{n!} t^n S^n x
        = x + \sum_{n=1}^{\infty} \frac{1}{n!} t^n S^n x = x + 0 = x.
\]
3 Interesting historical tidbit: mathematician Marshall H. Stone (1903–1989) was the son of
Harlan F. Stone (1872–1946), Chief Justice of the Supreme Court from 1941 to 1946.
That is, the point x is fixed by each U (t). Furthermore, x generates a one-
dimensional subspace span{x} of R3 that is fixed by each U (t). In other words,
span{x} is an axis of rotation for our planet.
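A small numerical experiment illustrates the argument. The sketch below rests on our own assumptions: the skew-symmetric matrix S is built from an arbitrarily chosen vector (a, b, c) = (1, 2, 2), the matrix exponential is approximated by a truncated power series, and the helper names `matmul` and `expm` are hypothetical. It checks that each U(t) = exp(tS) is orthogonal and fixes the null vector of S.

```python
def matmul(M, N):
    n = len(M)
    return [[sum(M[i][k] * N[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(S, t, terms=40):
    """Truncated power series for exp(tS); adequate for small matrices and moderate t."""
    n = len(S)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]       # current term (tS)^k / k!
    tS = [[t * v for v in row] for row in S]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, tS)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

# Illustrative choice: S x = 0 for x = (a, b, c), since S acts as a cross product with x.
a, b, c = 1.0, 2.0, 2.0
S = [[0.0, -c, b], [c, 0.0, -a], [-b, a, 0.0]]
x = [a, b, c]

for t in (0.3, 1.0, 2.0):
    U = expm(S, t)
    Ux = [sum(U[i][j] * x[j] for j in range(3)) for i in range(3)]
    UtU = matmul([list(col) for col in zip(*U)], U)
    print(t, [round(v, 9) for v in Ux], round(UtU[0][0], 9))
```

In each line the image U(t)x agrees with x to rounding error, so span{x} is the axis of rotation for this semigroup.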
The fact that our model takes place in three dimensions is crucial. In an
even-dimensional universe, the planet need not have any axis of rotation. The
computation (1925.1) would only yield the useless deduction $\det S = \det S$. Here
is a 2 × 2 semigroup of real orthogonal matrices:
\[
  U(t) = \begin{bmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{bmatrix}.
\]
Their eigenvalues are $e^{\pm it} = \cos t \pm i \sin t$ and they have no common (real) eigenvector.
Although $U(t)$ rotates $\mathbb{R}^2$ counterclockwise around the origin through an angle of $t$,
there is no nonzero vector that is fixed by each $U(t)$.
Bibliography
[1] A. A. Klyachko, Stable bundles, representation theory and Hermitian operators, Selecta Math.
(N.S.) 4 (1998), no. 3, 419–445, DOI 10.1007/s000290050037. MR1654578
[2] A. Knutson and T. Tao, Honeycombs and sums of Hermitian matrices, Notices Amer. Math.
Soc. 48 (2001), no. 2, 175–186. MR1811121
[3] S. R. Garcia and R. A. Horn, A Second Course in Linear Algebra, Cambridge University Press,
2017.
[4] R. A. Horn and C. R. Johnson, Matrix analysis, 2nd ed., Cambridge University Press, Cam-
bridge, 2013. MR2978290
[5] D. Hilbert, Über das Unendliche, Math. Ann. 95 (1926), 161–190.
http://link.springer.com/article/10.1007%2FBF01206605. See also
http://www.ams.org/journals/bull/1902-08-10/S0002-9904-1902-00923-3/S0002-9904-1902-00923-3.pdf.
[6] E. Schrödinger, An undulatory theory of the mechanics of atoms and molecules, Phys. Rev.
28 (1926), no. 6, 1049–1070. http://journals.aps.org/pr/abstract/10.1103/PhysRev.28.1049.
1926
Ackermann’s Function
Introduction
In 1926 David Hilbert published an article on infinity [2], at that time still a
controversial topic, in which he famously declared “no one will drive us from the
paradise which Cantor created for us” (see the 1918 entry for a brief introduction
to Cantor’s theory of cardinality). In this important paper, Hilbert also described
a function discovered by his student, Wilhelm Ackermann (1896–1962).
Ackermann was trying to unify arithmetic operations on natural numbers. Just
as addition is repeated counting, multiplication is repeated addition, and exponentiation
is repeated multiplication, one can continue to iterate each successive operation
to produce an even faster-growing one. Ackermann defined his function $\varphi$ of
three variables recursively in such a way that
\[
  \varphi(a, b, 0) = a + b, \quad \varphi(a, b, 1) = a \cdot b, \quad \varphi(a, b, 2) = a^b, \quad
  \varphi(a, b, 3) = \underbrace{a^{a^{\cdot^{\cdot^{\cdot^{a}}}}}}_{b \text{ times}},
\]
and so on. In particular, $\varphi$ grows astronomically as its arguments increase. The
significance is that $\varphi$ is computable, but only by using tricks like double recursion,
unbounded loops, or the operator “the least n such that.” Functions that can
be computed in a more direct manner, without resort to such devices, are called
primitive recursive.
Later authors simplified the definition but kept the spirit. The cleanest version
is due to Raphael Robinson:
\[
  A(i, j) = \begin{cases}
    j + 1 & \text{if } i = 0, \\
    A(i - 1, 1) & \text{if } i > 0 \text{ and } j = 0, \\
    A(i - 1, A(i, j - 1)) & \text{if } i > 0 \text{ and } j > 0.
  \end{cases}
\]
To get an idea of how fast this function grows, note that $A(2, 3) = 9$, $A(3, 3) = 61$,
and $A(4, 3)$ has about $10^{20{,}000}$ decimal digits. The enormity of $A(5, 3)$ is scarcely
conceivable.
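The two-variable recursion translates almost directly into code. The sketch below is one possible rendering, with a function name of our own choosing; it replaces the nested calls by an explicit stack of pending first arguments so that the interpreter's recursion limit is not an issue. It is only practical for very small arguments.

```python
def ackermann(i, j):
    """Two-variable Ackermann function A(i, j), evaluated with an explicit stack."""
    stack = [i]                 # pending first arguments, innermost call on top
    while stack:
        i = stack.pop()
        if i == 0:
            j = j + 1                       # A(0, j) = j + 1
        elif j == 0:
            stack.append(i - 1)             # A(i, 0) = A(i - 1, 1)
            j = 1
        else:
            stack.append(i - 1)             # outer call A(i - 1, .)
            stack.append(i)                 # inner call A(i, j - 1), evaluated first
            j = j - 1
    return j

print(ackermann(2, 3), ackermann(3, 3))     # the two small values quoted in the text
```

Attempting `ackermann(4, 3)` is hopeless: as noted above, even the number of digits of the answer is astronomically large.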
Because Ackermann’s function (in whatever incarnation) grows very rapidly,
one can form a kind of “inverse” function, α, that grows so slowly that for all
practical purposes it is constant. This function turns out to play an important
role in the analysis of algorithms. For example, although there is no linear-time
algorithm for managing a sequence of “union” and “find” operations on a collection
of n disjoint sets, Robert Tarjan (1948– ) found a data structure such that these
operations can be performed in time O(n · α(n)) [5].
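For readers who have not met the “union” and “find” operations, here is a minimal sketch of a common textbook form of such a data structure: a forest with union by rank and path compression. Whether this is exactly the variant analyzed in [5] is an assumption on our part, and the class and method names are ours.

```python
class DisjointSets:
    """Disjoint-set forest with union by rank and path compression."""

    def __init__(self, n):
        self.parent = list(range(n))    # each element starts as its own root
        self.rank = [0] * n             # upper bound on the height of each tree

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:   # path compression: point nodes at the root
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False                # already in the same set
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx            # attach the shallower tree to the deeper one
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

ds = DisjointSets(6)
ds.union(0, 1)
ds.union(2, 3)
ds.union(1, 3)
print(ds.find(0) == ds.find(2), ds.find(4) == ds.find(0))
```

Both heuristics are cheap to implement, and it is their combination that is usually credited with the nearly linear running time involving the inverse function α.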

Proposed by Jerrold Grossman, Oakland University.
Here is a problem about a modification (pun intended) of the Ackermann function.
For each $n > 2$, define
\[
  A_n : \{0, 1, 2, \ldots\} \times \{0, 1, 2, \ldots, n - 1\} \to \{0, 1, 2, \ldots, n - 1\}
\]
by
\[
  A_n(i, j) = \begin{cases}
    j + 1 \pmod{n} & \text{if } i = 0, \\
    A_n(i - 1, 1) & \text{if } i > 0 \text{ and } j = 0, \\
    A_n(i - 1, A_n(i, j - 1)) & \text{if } i > 0 \text{ and } j > 0.
  \end{cases}
\]
If you make a table of its values for small $i$ and $j$, and for various small values of
$n$, you will find that $A_n(i, j)$ quickly becomes constant. For example, $A_{13}(i, j) = 9$
for all $j$ once $i \geq 6$. Prove or disprove that this behavior happens for all $n$.
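Making such a table by hand is tedious, so here is a sketch that builds it row by row. The idea, which is our own organization of the recursion rather than anything prescribed by the problem, is that row i depends only on row i − 1, so each row can be filled from left to right in n steps. The function names are hypothetical.

```python
def next_row(prev, n):
    """Row i of the table of A_n, computed from row i - 1."""
    row = [prev[1]]                         # A_n(i, 0) = A_n(i - 1, 1)
    for j in range(1, n):
        row.append(prev[row[j - 1]])        # A_n(i, j) = A_n(i - 1, A_n(i, j - 1))
    return row

def rows(n, count):
    """The first `count` rows (i = 0, 1, ..., count - 1) of the table of A_n."""
    row = [(j + 1) % n for j in range(n)]   # A_n(0, j) = j + 1 (mod n)
    out = [row]
    for _ in range(count - 1):
        row = next_row(row, n)
        out.append(row)
    return out

table = rows(13, 8)
for i, row in enumerate(table):
    print(i, row)
```

For n = 13 the rows with i ≥ 6 consist entirely of 9s, in agreement with the example above. Changing the first argument of `rows` lets you experiment with other moduli.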
1926: Comments
Euler’s power tower. Before discussing the solution, let us digress a bit on
a few other interesting expressions with a recursive flavor. Under certain circumstances,
the function
\[
  f(x) = x^{x^{x^{\cdot^{\cdot^{\cdot}}}}} \tag{1926.1}
\]
can be made sense of. First of all, the preceding denotes the limit of the sequence
$a_1, a_2, \ldots$ defined by $a_1 = x$ and $a_{n+1} = x^{a_n}$ for $n = 1, 2, \ldots$. That is, we always
group exponents “from the top down”:
\[
  x^{x^x} \text{ means } x^{(x^x)}, \text{ not } (x^x)^x = x^{(x^2)}.
\]
Euler showed in 1783 that the expression that defines $f(x)$ converges if
\[
  0.06598\ldots = e^{-e} < x < e^{1/e} = 1.4446\ldots;
\]
see Figure 1. Since $\sqrt{2} = 1.4142\ldots$,
\[
  s = \sqrt{2}^{\sqrt{2}^{\sqrt{2}^{\cdot^{\cdot^{\cdot}}}}}
\]
is well-defined and nonnegative. Since $s = \sqrt{2}^{\,s}$, we have
\[
  s^2 = (\sqrt{2}^{\,s})^2 = (2^{s/2})^2 = 2^s.
\]
Consequently, $2 \log s = s \log 2$ and hence
\[
  \frac{\log s}{s} = \frac{\log 2}{2}.
\]
This equation has two solutions, $s = 2$ and $s = 4$, so we need a bound on $s$.
Induction shows that $1 \leq a_n < 2$ for all $n$: we have $a_1 = \sqrt{2} < 2$, and $a_n < 2$
implies $a_{n+1} = \sqrt{2}^{\,a_n} < \sqrt{2}^{\,2} = 2$. Thus, $1 \leq s \leq 2$. Since $\log x / x$ is strictly
increasing on $[1, e]$, we conclude that $s = 2$.
Assuming convergence, a similar approach can be used to evaluate
\[
  r = \sqrt{2 + \sqrt{2 + \sqrt{2 + \sqrt{2 + \cdots}}}}.
\]
Since $r = \sqrt{2 + r}$, we have $r^2 = 2 + r$ and hence $(r - 2)(r + 1) = 0$. Since $r = -1$
is impossible, we must have $r = 2$.
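Both limits are easy to observe numerically by iterating the defining recursions. The sketch below uses a small helper of our own, `iterate`, and an arbitrary iteration count of 200; it is a sanity check in floating-point arithmetic, not a proof of convergence.

```python
import math

def iterate(step, start, count):
    """Apply `step` to `start` a total of `count` times."""
    value = start
    for _ in range(count):
        value = step(value)
    return value

root2 = math.sqrt(2)

# Power tower: a_1 = sqrt(2), a_{n+1} = sqrt(2) ** a_n.
s = iterate(lambda a: root2 ** a, root2, 200)

# Nested radical: r_1 = sqrt(2), r_{n+1} = sqrt(2 + r_n).
r = iterate(lambda a: math.sqrt(2 + a), root2, 200)

print(s, r)     # both values are numerically indistinguishable from 2
```

The tower converges noticeably more slowly than the radical, because the derivative of the map at the fixed point 2 is log 2 ≈ 0.69 for the tower but only 1/4 for the radical.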
Figure 1. Graph of Euler’s iterated exponential function (1926.1).
A continued fraction. How can we justify these sorts of computations? Let
us work through an example in detail. Consider the sequence defined by $x_1 = 1$
and $x_{n+1} = 1 + 1/(1 + x_n)$ for $n \geq 1$. That is,
\[
  x_1 = 1, \quad x_2 = 1 + \frac{1}{2}, \quad x_3 = 1 + \cfrac{1}{2 + \cfrac{1}{2}}, \quad
  x_4 = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2}}},
\]
and so forth. Induction confirms that $1 \leq x_n \leq 2$ and hence
\begin{align*}
  |x_{n+1} - x_n|
    &= \left| \Big( 1 + \frac{1}{1 + x_n} \Big) - \Big( 1 + \frac{1}{1 + x_{n-1}} \Big) \right| \\
    &= \left| \frac{1}{1 + x_n} - \frac{1}{1 + x_{n-1}} \right|
     = \frac{|x_n - x_{n-1}|}{(1 + x_n)(1 + x_{n-1})} \\
    &\leq \frac{|x_n - x_{n-1}|}{(1 + 1)(1 + 1)} = \frac{1}{4} |x_n - x_{n-1}| \\
    &\leq \cdots \leq \frac{1}{4^{n-1}} |x_2 - x_1| = \frac{1}{2 \cdot 4^{n-1}}
\end{align*}
for $n \geq 2$. Consequently, the limit
\[
  L = \lim_{n \to \infty} x_n = x_1 + \lim_{n \to \infty} \sum_{k=2}^{n} (x_k - x_{k-1})
    = x_1 + \sum_{k=2}^{\infty} (x_k - x_{k-1})
\]
exists since the series involved converges absolutely. Since $L = \lim_{n \to \infty} x_{n+1}$, it
follows¹ that $L = 1 + 1/(1 + L)$. Thus, $L^2 - 2 = 0$, from which it follows that
$L = \sqrt{2}$ since $L \geq 0$. We write this as an infinite continued fraction:
\[
  \sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}}}.
\]
See the 1931, 1934, and 1972 entries for more on continued fractions.
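The sequence and the geometric bound on consecutive differences can be checked with exact rational arithmetic. The sketch below is an illustration only; the function name `convergents` and the choice of twelve terms are ours.

```python
from fractions import Fraction

def convergents(count):
    """The first `count` terms of x_1 = 1, x_{n+1} = 1 + 1 / (1 + x_n), as exact fractions."""
    x = Fraction(1)
    out = [x]
    for _ in range(count - 1):
        x = 1 + 1 / (1 + x)
        out.append(x)
    return out

xs = convergents(12)
for n in range(1, len(xs)):
    gap = abs(xs[n] - xs[n - 1])            # |x_{n+1} - x_n| in the text's indexing
    bound = Fraction(1, 2 * 4 ** (n - 1))   # the bound 1 / (2 * 4^(n-1))
    print(xs[n], float(xs[n]), gap <= bound)
```

The fractions 3/2, 7/5, 17/12, … that appear are the familiar rational approximations to the square root of 2, and every comparison with the bound prints `True`.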
¹ Recall that we may interchange limits with continuous functions. Here we are using the fact
that $f(t) = 1 + t$ and $g(t) = 1/t$ are continuous functions of the real variable $t \neq 0$.
Solution to the problem. It turns out that there is exactly one counterexample
for $n < 4{,}000{,}000$, namely $n = 1969$. In this case, the values $A_{1969}(2i, \cdot)$ are
$(1698, 0, 0, 0, 0, 0, \ldots)$, and the values $A_{1969}(2i + 1, \cdot)$ are $(0, 1698, 0, 1698, 0, 1698, \ldots)$
for all $i \geq 4$ [3]. It is not known whether there are other counterexamples.
Bibliography
[1] W. Ackermann, Zum Hilbertschen Aufbau der reellen Zahlen (German), Math. Ann.
99 (1928), no. 1, 118–133, DOI 10.1007/BF01459088.
http://eretrandre.org/rb/files/Ackermann1928_126.pdf. MR1512441
[2] D. Hilbert, Über das Unendliche (German), Math. Ann. 95 (1926), no. 1, 161–
MR1512272
[3] J. Froemke and J. W. Grossman, Unsolved Problems: A Mod-n Ackermann Function, or
What’s So Special About 1969?, Amer. Math. Monthly 100 (1993), no. 2, 180–183, DOI
10.2307/2323780. MR1542281
[4] R. M. Robinson, Recursion and double recursion, Bull. Amer. Math. Soc. 54 (1948), 987–993,
DOI 10.1090/S0002-9904-1948-09121-2.
http://www.math.ntnu.no/emner/MA2301/2010h/robinson_doublerec.pdf. MR0026976
[5] R. E. Tarjan, Efficiency of a good but not linear set union algorithm, J. Assoc. Comput.
Mach. 22 (1975), 215–225, DOI 10.1145/321879.321884.
http://ecommons.library.cornell.edu/handle/1813/5942. MR0458996
1927
William Lowell Putnam Mathematical Competition
Introduction
Many a problem solver will be aware of the William Lowell Putnam Mathematical
Competition, a North American undergraduate contest administered by
the Mathematical Association of America. It was founded in 1927 by Elizabeth
Lowell Putnam¹ (1862–1935) in honor of her late husband, William Lowell Putnam
(1861–1923), who firmly believed in the virtues of academic rivalry between
universities. Among the many unwritten traditions of the Putnam exam is that
every exam should have at least one problem that uses the year number as part of
a problem statement or its solution. So, it is a fitting twist that the Putnam exam
is the subject of this section.
Joseph Gallian (1942– ) wrote a fabulous overview of the Putnam exam’s history,
milestones, statistics, and trivia [2]. Offered every year since 1938 (except in
1943–1945), the Putnam exam has roots that include a math competition, also
sponsored by Elizabeth Lowell Putnam, held in 1933 between ten Harvard students and
ten West Point cadets. The cadets both won the team contest and had the top
individual score. Earlier Putnam exams featured problems in areas closer to the
introductory technical undergraduate curriculum, such as calculus, differential
equations, or geometry; in more recent years, each year’s twelve problems are
characterized by a recognizable blend of topics that also includes linear algebra, some
abstract algebra, combinatorics, and number theory (or even an occasional advanced
topic on harder questions).
The five most successful contestants each year are named Putnam Fellows, one
of whom is also awarded a fellowship for graduate study at Harvard; eight persons so
far have been a Putnam Fellow the maximum possible four times. Other substantial
monetary team and individual prizes are given, and an Elizabeth Lowell Putnam
prize may be awarded to one female contestant. The original intent to boost team
spirit and provide an avenue for students to fight for their institution’s glory in an
academic subject helps one to understand the peculiar ranking system, in which
every participating institution must designate a three-person team in advance, and
the team ranking is obtained by adding the team members’ individual ranks (rather
than their scores). Since higher scores are obtained by far fewer students, a
university whose three team members each solve seven problems will usually rank
higher than one where two brilliant team members each solve nine problems and the
third solves three.
1 Her brother was astronomer Percival Lowell (1855–1916), who predicted the location of
Pluto and popularized the erroneous theory that Mars bore canals that indicated the presence of
intelligent life.
The Putnam exam has been called “the hardest math test in the world” [1, 3].
This is not without reason: the median score has budged above 1 point out of 120 in
only four years since 1999 and then never above 3, while fully 62.6% of 2006 entrants
scored 0. A student must make substantial progress toward an actual solution to
receive any points for a problem; checking small examples or stating some immediate
conclusions typically does not make the cut. Each of the 12 problems is graded on
a scale from 0 to 10 points, with the only scores allowed being 0, 1, 2, 8, 9, and
10. Thus, the grader must decide whether the problem is essentially solved or
not. A submission that solves one of the two main cases or one that contains the
structure of a full solution but has a serious flaw might get 1 or 2 points. On
the other hand, a submission that contains all the ingredients of a full solution
but neglects to check a minor subcase might get 8 or 9 points; the full mark of
10 points is reserved for essentially perfect solutions. The first round of grading
currently occurs in December at Santa Clara University. Imagine several dozen
mathematicians tackling the collective output of over 4,000 competitors from over
500 colleges on twelve problems one paper at a time over the span of four days.
Undergraduate students solve problems in several competitions around the
world, such as the annual Schweitzer competition in Hungary, the Jarník competition
in central Europe, the famous competitions at Moscow’s and Kiev’s Mech-Mats,
or the International Mathematics Competition for University Students [5], an annual
contest held in Europe that has also seen participation from several American
universities. While many Putnam stars were successful in the high school IMOs,
the two contests retain distinct mathematical profiles.
Opinions differ on the extent to which the Putnam exam or other competi-
tions mimic the mathematical research experience or are somehow reflective of the
student’s research aptitude; see several Putnam Fellows’ perspectives in [1]. The
Putnam exam was, of course, never designed for such use. Five Fellows, namely
Richard Feynman (1918–1988), John Milnor (1931– ), David Mumford (1937– ),
Daniel Quillen (1940–2011), and Kenneth Wilson (1936–2013), have been subse-
quently recognized with a Fields Medal or a Nobel Prize, and many dozens more
have become distinguished mathematicians at top universities and research insti-
tutes. Notable Putnam competitors include many Abel Prize winners, MacArthur
Fellows, AMS and MAA presidents, members of the National Academy of Sciences,
as well as many winners of the Morgan Prize for undergraduate research. Many oth-
ers have chosen entirely different careers, and many top-notch mathematicians have
never taken or particularly enjoyed contest mathematics. Ask mathematics gradu-
ate program admissions chairs or hedge fund managers and many will tell you that,
while neither a prerequisite nor a guarantee of success, a candidate’s good showing
on the Putnam exam gets their attention. Putnam problems test a specific kind of
ingenuity over technical mastery and are sometimes seen as occupying a universe
of their own, but here as in Hamming we must remember that Putnam problems
“were not on the stone tablets that Moses brought down from Mt. Sinai” [4]. They
are composed by a committee of working mathematicians designated by the MAA,
and so their evolution over time perhaps reflects our collective style and taste.
What makes a good Putnam problem? Bruce Reznick (1953– ) has writ-
ten with charm and detail about writing for the Putnam exam [8]. André Weil
(1906–1998), paraphrasing the English poet Housman who had used an example of
a fox-terrier hunting for a rat to explain why he cannot define poetry, famously
quipped: “when I smell number-theory I think I know it, and when I smell some-
thing else I think I know it too” [9]. He then proceeded to argue that analytic
number theory is not number theory, but this is a subject for another article. Put-
nam takers and experienced problem-solvers will similarly spot a juicy problem. It
will be accessible but not trivial, challenging but not impossibly so. It will relate
to important mathematics, but with an unexpected twist. It will make you smile
and, in Reznick’s words, whistle in your mind like a catchy tune.
We propose the following Putnam problem for your whistling enjoyment. It
appeared as problem A3 in the 2013 exam and requires nothing beyond calculus.
Do not just solve it and bask in your glory. Make yourself a hot beverage, relate the
solution to your other mathematical experiences, continue the story that it tells;
you will have new problems of your own in no time.

Proposed by Djordje Milićević, Bryn Mawr College.
Suppose that the real numbers $a_0, a_1, \ldots, a_n$ and $x$, in which $0 < x < 1$, satisfy
\[
  \frac{a_0}{1 - x} + \frac{a_1}{1 - x^2} + \cdots + \frac{a_n}{1 - x^{n+1}} = 0.
\]
Prove that there exists a real number $y$ with $0 < y < 1$ such that
\[
  a_0 + a_1 y + \cdots + a_n y^n = 0. \tag{1927.1}
\]
1927: Comments
A hint for the problem. There are numerous collections, both in print and
online, with solutions and commentaries on the Putnam problems. For example, see
[6]. Before looking at the solution, you are strongly encouraged to try it yourself.
Here is a hint: suppose that
\[
  a_0 + a_1 y + \cdots + a_n y^n \neq 0 \tag{1927.2}
\]
for all $0 < y < 1$. If this occurs, then the intermediate value theorem implies that
$a_0 + a_1 y + \cdots + a_n y^n$ has the same sign for all $0 < y < 1$.
Eigenvalues and the intermediate value theorem. Never underestimate
the power of some of those first-semester calculus theorems. Here is a cute application
of the intermediate value theorem to linear algebra. Let $A$ be a real n × n
matrix, in which $n$ is odd. Then the characteristic polynomial $p_A$ of $A$ is monic
and of odd degree. Consequently,
\[
  \lim_{x \to +\infty} p_A(x) = +\infty \quad\text{and}\quad \lim_{x \to -\infty} p_A(x) = -\infty,
\]
and hence $p_A$ assumes both positive and negative values on $\mathbb{R}$. The intermediate
value theorem ensures that $p_A$ has a real zero; that is, $A$ has a real eigenvalue. This
argument fails if $n$ is even. For example, the eigenvalues of
\[
  A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
\]
are $\pm i$.
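The intermediate value theorem is constructive enough to compute with: bisection locates the promised real eigenvalue. The sketch below treats only the 3 × 3 case and rests on our own choices. The function names are hypothetical, the characteristic polynomial is evaluated by cofactor expansion, and the search interval uses the fact that every eigenvalue has modulus at most the largest absolute row sum.

```python
def char_poly_3x3(A, x):
    """Evaluate det(xI - A) for a 3 x 3 matrix by cofactor expansion."""
    m = [[(x if i == j else 0.0) - A[i][j] for j in range(3)] for i in range(3)]
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def real_eigenvalue(A, tol=1e-12):
    """Bisection for a real zero of the (monic, cubic) characteristic polynomial."""
    bound = 1.0 + max(sum(abs(v) for v in row) for row in A)
    lo, hi = -bound, bound          # p(lo) < 0 < p(hi) because p is monic of odd degree
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if char_poly_3x3(A, mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# A rotation about the z-axis: its only real eigenvalue is 1.
R = [[0.6, -0.8, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]
print(real_eigenvalue(R))
```

The same bisection has nothing to work with for the 2 × 2 example above, whose characteristic polynomial x² + 1 is positive everywhere.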
Solution to the problem. Suppose toward a contradiction that (1927.2)
holds for $0 < y < 1$. Then the intermediate value theorem ensures that
$a_0 + a_1 y + \cdots + a_n y^n$ has the same sign for $0 < y < 1$. Without loss of generality, we
assume that
\[
  a_0 + a_1 y + \cdots + a_n y^n > 0
\]
for $0 < y < 1$. Then
\[
  a_0 x^k + a_1 x^{2k} + \cdots + a_n x^{(n+1)k} > 0
\]
for all $0 < x < 1$ and $k = 1, 2, \ldots$. Continuity ensures that $a_0 + a_1 + \cdots + a_n \geq 0$.
Since $|x| < 1$, we have
\[
  a_0 \sum_{k=0}^{\infty} x^k + a_1 \sum_{k=0}^{\infty} x^{2k} + \cdots + a_n \sum_{k=0}^{\infty} x^{(n+1)k} > 0
\]
and hence
\[
  \frac{a_0}{1 - x} + \frac{a_1}{1 - x^2} + \cdots + \frac{a_n}{1 - x^{n+1}} > 0
\]
by the geometric series summation formula. Since this is a contradiction, we conclude
that there is a $y \in (0, 1)$ so that (1927.1) holds.
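A concrete instance may help in relating the two conditions. In the sketch below, which is purely illustrative, we pick x = 1/2 and a_0 = a_1 = 1, solve the hypothesis for a_2, and then locate a root of the resulting polynomial in (0, 1). The function names are ours, and the grid scan is a heuristic assumption: it can miss roots at which the polynomial does not change sign.

```python
from fractions import Fraction

def complete_coefficients(a_head, x):
    """Append the last coefficient a_n so that sum_j a_j / (1 - x^(j+1)) = 0."""
    n = len(a_head)
    partial = sum(a / (1 - x ** (j + 1)) for j, a in enumerate(a_head))
    return a_head + [-partial * (1 - x ** (n + 1))]

def poly(coeffs, y):
    return sum(a * y ** j for j, a in enumerate(coeffs))

def root_in_unit_interval(coeffs, grid=1000, steps=60):
    """Scan (0, 1) for a sign change, then bisect; None if the scan finds none."""
    pts = [Fraction(k, grid) for k in range(1, grid)]
    for lo, hi in zip(pts, pts[1:]):
        if poly(coeffs, lo) * poly(coeffs, hi) <= 0:
            for _ in range(steps):
                mid = (lo + hi) / 2
                if poly(coeffs, lo) * poly(coeffs, mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            return float(lo + hi) / 2
    return None

x = Fraction(1, 2)
coeffs = complete_coefficients([Fraction(1), Fraction(1)], x)
y = root_in_unit_interval(coeffs)
print(coeffs, y)
```

Here the hypothesis forces a_2 = −35/12, and the polynomial 1 + y − (35/12)y² indeed vanishes at a point of (0, 1), namely y = (12 + √1824)/70 ≈ 0.78.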
Bibliography
[1] G. L. Alexanderson, How Putnam fellows view the competition, MAA Focus, December 2004,
14-15, http://www.maa.org/sites/default/files/pdf/pubs/dec04.pdf
[2] J. A. Gallian, The first sixty-six years of the Putnam competition, American Mathematical
Monthly 111 (2004), 691–699. See also http://www.d.umn.edu/~jgallian/putnam.pdf.
[3] L. Grossman, Crunching the numbers, Time, December 16, 2002,
http://content.time.com/time/magazine/article/0,9171,400000,00.html.
[4] R. W. Hamming, The unreasonable effectiveness of mathematics, Amer. Math. Monthly 87
(1980), no. 2, 81–90, DOI 10.2307/2321982. MR559142
[5] International Mathematics Competition for University Students, http://www.imc-math.org.uk.
[6] K. S. Kedlaya, B. Poonen, and R. Vakil, The William Lowell Putnam Mathematical Compe-
tition, 1985–2000, Problems, solutions, and commentary, MAA Problem Books Series, Math-
ematical Association of America, Washington, DC, 2002. MR1933844
[7] K. S. Kedlaya, The Putnam archive, http://kskedlaya.org/putnam-archive/.
[8] B. Reznick, Some thoughts on writing for the Putnam, http://www.math.uiuc.edu/~reznick/
putnam.pdf.
[9] A. Weil, Essais historiques sur la théorie des nombres (French), L’Enseignement
Mathématique, Université de Genève, Geneva, 1975. Extrait de l’Enseignement Math. 20
(1974); Monographie No. 22 de L’Enseignement Mathématique. MR0389725
1928
Random Matrix Theory
Introduction
Random matrix theory is, as expected, the study of randomly chosen matrices.
What is not immediately apparent is why it should so beautifully model such diverse
phenomena as energy levels in nuclear physics, zeros of the Riemann zeta function
(which encode information about the primes; see the 1942 entry), and stopping
times of bus routes, to name just a few! While the subject began with the 1928
paper of John Wishart (1898–1956) [24], for many people the most exciting dates
come later, in the 1950s, 1970s and 1990s.
In the 1950s Eugene Wigner (1902–1995) had the fruitful insight that systems
of random matrices could accurately predict properties of heavy nuclei [18–22]. In
a classical mechanics course one learns how to solve, in closed form, problems that
involve just one or two point masses. Once we have three bodies, chaos sets in and
closed-form solutions typically do not exist. Now imagine how much more daunting
the task is with heavy nuclei, in which there are hundreds of protons and neutrons
interacting under far more complicated forces than gravity. In quantum mechanics,
this is represented as $H\Psi_n = E_n \Psi_n$, in which $H$ is the Hamiltonian of the system
and $\Psi_n$ are the energy eigenstates with eigenvalues $E_n$; see the 1925 entry. While
this reduces quantum mechanics to linear algebra, there is a twist. The matrices
are infinite¹ and the entries are unknown. These sorts of problems are beyond the
techniques learned in an undergraduate linear algebra class.
Wigner’s idea, for which he earned a Nobel Prize, is that the complicated
interactions actually help us. Rather than trying to find the eigenvalues of the
operator associated to our physical system, he looked at many random matrices,
diagonalized them, weighted the observed eigenvalue spectra by the probability of
choosing those matrices, and then averaged over a family of matrices. The hope,
which has been borne out time and time again in experiments and theories, is that
a “typical” system is close to average. A good way to view this universality is
to see it as a sort of central limit theorem; see the 1922 entry. We first establish
some notation and then give a simple version of his result below; it has since been
greatly generalized and extended. See [4–7, 16, 17] for some of the recent successes
and surveys, which have greatly weakened the assumptions needed on the random
variables. While we concentrate on real symmetric matrices, variants have been
proved in many other settings.
¹ To be more precise, they are unbounded selfadjoint operators on an infinite-dimensional
Hilbert space. There are additional wrinkles too. On infinite-dimensional vector spaces, linear
operators need not have any eigenvalues. Consider the operator $T : C[0, 1] \to C[0, 1]$ defined
by $(Tf)(x) = x f(x)$. Here $C[0, 1]$ denotes the complex vector space of continuous functions
$f : [0, 1] \to \mathbb{C}$. If $Tf = \lambda f$ for some $\lambda \in \mathbb{C}$, then $(x - \lambda) f(x) = 0$ for all
$x \in [0, 1]$, so $f(x) = 0$ whenever $x \neq \lambda$. Since $f$ is continuous, it must be the zero
function. Thus, no $\lambda \in \mathbb{C}$ is an eigenvalue of $T$!
An $N \times N$ real symmetric matrix $A$ has $N$ real eigenvalues, repeated according
to multiplicity, which we label
\[
  \lambda_1(A) \geq \lambda_2(A) \geq \cdots \geq \lambda_N(A).
\]
Fix a probability distribution $p$ with mean 0, variance 1, and finite higher moments.
We consider the ensemble of $N \times N$ real symmetric matrices, in which the
independent entries² are independent (in the probabilistic sense), identically distributed
random variables drawn from $p$. The probability that we choose a matrix $A$ whose
$(i, j)$ entry is in $[\alpha_{ij}, \alpha_{ij} + \varepsilon]$ is
\[
  \prod_{1 \leq i \leq j \leq N} \int_{\alpha_{ij}}^{\alpha_{ij} + \varepsilon} p(a_{ij}) \, da_{ij}.
\]
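A key tool in understanding the eigenvalues is the eigenvalue trace lemma, which
states
\[
  \operatorname{tr} A = \sum_{i=1}^{N} a_{ii} = \lambda_1(A) + \cdots + \lambda_N(A)
\]
and, more generally,
\[
  \operatorname{tr}(A^k) = \sum_{1 \leq i_1, \ldots, i_k \leq N} a_{i_1 i_2} a_{i_2 i_3} \cdots a_{i_{k-1} i_k} a_{i_k i_1}
  = \sum_{i=1}^{N} \lambda_i(A)^k,
\]
in which the indices $i_1, \ldots, i_k$ range independently over $1, 2, \ldots, N$.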
The importance of this lemma is that it allows us to pass from the matrix elements
(which we know) to the eigenvalues (which we want to understand). Determining
the moments of the eigenvalues yields information on their distribution. For
example, taking $k = 2$ implies that the expected value of $\operatorname{tr}(A^2)$ is $N^2$ and hence
the average square of an eigenvalue is $N$. Thus, a typical eigenvalue should be of
order $\sqrt{N}$. This simple calculation suggests the normalization we shall see in Wigner's
semicircle law.
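The $k = 2$ calculation can be checked by simulation. Since $\operatorname{tr}(A^2) = \sum_{i,j} a_{ij}^2$ and each entry has variance 1, the expected value is $N^2$. The sketch below is only an illustration under stated assumptions: the entries are taken to be standard normal, the helper names are hypothetical, and the 2 × 2 eigenvalues come from the quadratic formula rather than from any library routine.

```python
import random


def random_symmetric(n, rng):
    """n x n real symmetric matrix; entries on and above the diagonal are i.i.d. N(0, 1)."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a[i][j] = a[j][i] = rng.gauss(0.0, 1.0)
    return a


def trace_of_square(a):
    """tr(A^2) written as the sum over all index pairs (i, j) of a_ij * a_ji."""
    n = len(a)
    return sum(a[i][j] * a[j][i] for i in range(n) for j in range(n))


def eigenvalues_2x2(a):
    """Closed-form eigenvalues of a 2 x 2 real symmetric matrix, largest first."""
    mean = (a[0][0] + a[1][1]) / 2
    radius = (((a[0][0] - a[1][1]) / 2) ** 2 + a[0][1] ** 2) ** 0.5
    return mean + radius, mean - radius


rng = random.Random(1928)

# Trace lemma on a 2 x 2 sample: tr(A^k) equals the sum of k-th powers of the eigenvalues.
small = random_symmetric(2, rng)
lam1, lam2 = eigenvalues_2x2(small)
trace_gap_1 = abs(small[0][0] + small[1][1] - (lam1 + lam2))
trace_gap_2 = abs(trace_of_square(small) - (lam1 ** 2 + lam2 ** 2))

# Average of tr(A^2) / N^2 over many samples; the text predicts a value near 1.
N, TRIALS = 30, 200
ratio = sum(trace_of_square(random_symmetric(N, rng)) for _ in range(TRIALS)) / (TRIALS * N * N)
```

Dividing the expected trace $N^2$ by the $N$ eigenvalues gives an average squared eigenvalue of $N$, which is where the $\sqrt{N}$ scale comes from.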
The last item we need is the empirical spectral measure of $A$, denoted by $\mu_A$.
We divide by $2\sqrt{N}$ to normalize each eigenvalue; the factor 2 in $2\sqrt{N}$ ensures that
we ultimately wind up with a circle instead of an ellipse. We write
\[
  \mu_A(x) = \frac{1}{N} \sum_{i=1}^{N} \delta\!\left( x - \frac{\lambda_i(A)}{2\sqrt{N}} \right),
\]
in which $\delta$ is the Dirac delta functional. One can view $\delta(x - a)$ as a unit point mass
at $a$. If $f$ is a suitably nice function, then
\[
  \int_{-\infty}^{\infty} f(x) \, \delta(x - a) \, dx = f(a).
\]
The way to mathematically realize "delta functions" in a rigorous way is to regard
them as linear functionals on suitable spaces of functions. For example, $\delta(x - a)$ is
the linear functional that sends a function $f$ to the scalar $f(a)$.
Wigner's Semicircle Law: Consider the ensemble $E(N, p)$ of $N \times N$ real symmetric
matrices where the independent entries are independent, identically distributed
random variables drawn from a distribution $p$ with mean 0, variance 1, and finite
higher moments. As $N \to \infty$, for almost all $A \in E(N, p)$, the empirical spectral
measure $\mu_A$ converges to the density of the semicircle, $f_{sc}(x) = \frac{2}{\pi} \sqrt{1 - x^2}$ if $|x| \leq 1$
and 0 otherwise.

² For instance, the (1, 2) and (3, 4) entries are independent, in the sense that neither
determines the other. On the other hand, the (1, 2) and (2, 1) entries each determine the other
since the matrices involved are symmetric.

Figure 1. Eigenvalues of a 1,000 × 1,000 random real (nonsymmetric) matrix, with entries
drawn independently from the uniform distribution on [−1, 1].
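One inexpensive way to see the semicircle law emerge, without computing any eigenvalues, is to compare moments. By the eigenvalue trace lemma, the $k$th moment of $\mu_A$ equals $\operatorname{tr}(A^k) / (N (2\sqrt{N})^k)$. A short calculation (not carried out in the text) shows that the second and fourth moments of $\frac{2}{\pi}\sqrt{1 - x^2}$ on $[-1, 1]$ are $1/4$ and $1/8$. The sketch below assumes standard normal entries and a single 60 × 60 sample; the function names are hypothetical.

```python
import random


def random_symmetric(n, rng):
    """n x n real symmetric matrix; entries on and above the diagonal are i.i.d. N(0, 1)."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a[i][j] = a[j][i] = rng.gauss(0.0, 1.0)
    return a


def normalized_moments(a):
    """Second and fourth moments of the eigenvalues of A / (2 sqrt(N)), computed from traces."""
    n = len(a)
    # Entries of A^2; since A is symmetric, so is A^2.
    square = [[sum(a[i][k] * a[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    trace_2 = sum(square[i][i] for i in range(n))
    # tr(A^4) = tr((A^2)^2) is the sum of the squares of the entries of the symmetric matrix A^2.
    trace_4 = sum(x * x for row in square for x in row)
    scale = 2 * n ** 0.5
    return trace_2 / (n * scale ** 2), trace_4 / (n * scale ** 4)


second, fourth = normalized_moments(random_symmetric(60, random.Random(1928)))
# Semicircle predictions: second moment 1/4, fourth moment 1/8.
```

Agreement of two moments is only suggestive, of course; the theorem concerns convergence of the whole measure as $N \to \infty$.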
Wigner’s work was expanded by Freeman Dyson (1923– ) [2, 3], whom we will
meet again shortly, and many others. Physical reasons often require the matrices
involved to be real symmetric (this ensures that the eigenvalues are real). Modulo
such constraints, researchers mostly considered matrices in which the free entries
were chosen independently from a fixed distribution. See Figures 1 and 2 for ex-
amples.
Fast forward to the 1970s. The Riemann zeta function, defined for $\operatorname{Re} s > 1$ by
\[
  \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p \text{ prime}} \left( 1 - \frac{1}{p^s} \right)^{-1},
\]
can be meromorphically continued to the entire complex plane with a simple pole
at $s = 1$; see the 1933 entry for a proof of the remarkable product formula above.
Using complex analysis, one can show that the zeros of the completed zeta func-
tion are intimately connected to many properties of the primes. Hugh Montgomery
(1944– ) was working on the pair correlation problem [14], trying to understand the
distribution of differences of pairs of zeta zeros. While visiting the Institute for Ad-
vanced Study at Princeton, he relayed what he had found to Dyson, who remarked
that the same behavior is seen in the eigenvalues of certain ensembles of random
matrices. Additional support was later provided by the numerical investigations of
Andrew Odlyzko (1949– ) on the zeros of ζ(s); see [15] and the 1987 entry.
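The product formula for $\zeta(s)$ quoted above can be sanity-checked numerically at a point where the value is known in closed form. The sketch below uses $s = 2$ and the classical evaluation $\zeta(2) = \pi^2/6$, which is a standard fact not stated in this entry; the truncation points and function names are arbitrary illustrative choices.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, limit + 1, p):
                sieve[m] = False
    return [p for p, flag in enumerate(sieve) if flag]


def zeta_partial_sum(s, terms):
    """Partial sum of the Dirichlet series: sum of n^(-s) for n = 1, ..., terms."""
    return sum(1.0 / n ** s for n in range(1, terms + 1))


def euler_partial_product(s, limit):
    """Partial Euler product over the primes p <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product *= 1.0 / (1.0 - p ** (-s))
    return product


S = 2
series_value = zeta_partial_sum(S, 100_000)
product_value = euler_partial_product(S, 10_000)
exact_value = math.pi ** 2 / 6
```

Both truncations approach the same limit from below, since each omits only positive terms of the full series.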
Figure 2. Histogram of eigenvalues of a 2,500 × 2,500 random real symmetric matrix,
with entries drawn independently from the uniform distribution on [−1, 1].
From that moment, number theory, random matrix theory, and physics had a
lot to say to each other. These subjects continue to drive each other. New questions
emerged in the 1990s with the work of Nick Katz (1943– ) and Peter Sarnak (1953– )
[11], expanding the universe of matrix families relevant to number theory. For more
information on random matrix theory and its connection to number theory, see the
books [12, 13] and the survey articles [1, 21, 22]. See also the entry from 1960 for
an entertaining look at Wigner’s views on the role of mathematics in physics.

Let $f$ be a "nice" probability distribution with mean 0, variance 1, and finite
higher moments. For example, $f$ could be the standard normal distribution
$f(x) = e^{-x^2/2} / \sqrt{2\pi}$, or $f$ could be the uniform distribution on $[-\sqrt{3}, \sqrt{3}]$.
Consider the family of real symmetric matrices
\[
  \left\{ \begin{bmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{bmatrix} :
  a_{11}, a_{12}, a_{22} \text{ i.i.d. random variables with density } f \right\};
\]
this means that the entries $a_{11}$, $a_{12}$, and $a_{22}$ are chosen independently of each
other, and the probability that $a_{ij} \in [\alpha, \beta]$ is $\int_{\alpha}^{\beta} f(x) \, dx$. Calculate the
probability that such a randomly chosen matrix has its largest eigenvalue in the interval
$[A, B]$. What about its smallest eigenvalue? What about the difference between its
eigenvalues?
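Before attempting an exact calculation, it can help to estimate the answers by simulation. The sketch below is not a solution to the problem. It assumes $f$ is the standard normal density, uses the closed form $\frac{a_{11} + a_{22}}{2} \pm \sqrt{\left(\frac{a_{11} - a_{22}}{2}\right)^2 + a_{12}^2}$ for the eigenvalues, and the interval $[0.5, 1.5]$, the seed, and the function names are arbitrary choices made for illustration.

```python
import random


def eigenvalues_2x2(a11, a12, a22):
    """Eigenvalues (largest, smallest) of the symmetric matrix [[a11, a12], [a12, a22]]."""
    mean = (a11 + a22) / 2
    radius = (((a11 - a22) / 2) ** 2 + a12 ** 2) ** 0.5
    return mean + radius, mean - radius


def estimate(lower, upper, trials=100_000, seed=1928):
    """Monte Carlo estimates for standard normal entries.

    Returns P(largest eigenvalue in [lower, upper]),
    P(smallest eigenvalue in [-upper, -lower]), and the mean eigenvalue gap.
    """
    rng = random.Random(seed)
    hits_largest = hits_smallest = 0
    total_gap = 0.0
    for _ in range(trials):
        big, small = eigenvalues_2x2(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
        hits_largest += lower <= big <= upper
        hits_smallest += -upper <= small <= -lower
        total_gap += big - small
    return hits_largest / trials, hits_smallest / trials, total_gap / trials


p_largest, p_smallest_mirror, mean_gap = estimate(0.5, 1.5)
```

Because the normal density is even, $A$ and $-A$ have the same distribution, so the first two estimates should nearly agree. This symmetry is a useful check on any exact formula for the smallest eigenvalue.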
1928: Comments
Some connections. It is worth briefly mentioning the remarkable similarities
we see in such diverse systems. In the 1922 entry, we saw another example of
different systems converging to similar behavior. For more on this phenomenon see
the 1960 entry on Wigner’s paper The Unreasonable Effectiveness of Mathematics
in the Natural Sciences [23], in which we deliberately chose to focus on quantities
related to the characters from this year’s entry.
Analogues of the eigenvalue trace lemma arise in number theory. There we
wish to understand the zeros of the Riemann zeta function (or other related
L-functions); here the zeros play a role analogous to that played by eigenvalues, and
the coefficients of the L-functions correspond to the entries of the matrices. The
analysis frequently begins through some explicit formula, which allows us to pass
from what we understand (the coefficients) to what we wish to understand (the
zeros). These correspondences are useless, of course, if one cannot execute the
averaging on at least one side. For random matrix theory, this is done through
integration and combinatorics; much of the difficulty in number theory is that we
do not have similarly powerful results for the expressions involved.
For further reading. The references below contain many great introductions
to random matrix theory. These include a general interest article [10], short survey
articles [1, 8, 11], textbooks [9, 12], and many of the original papers in nuclear
physics and number theory [2, 18–24].
Bibliography
[1] J. B. Conrey, L-functions and random matrices, Mathematics unlimited—2001 and
beyond, Springer, Berlin, 2001, pp. 331–352. http://arxiv.org/pdf/math/0005300.pdf?
origin=publication_detail. MR1852163
[2] F. Dyson, Statistical theory of the energy levels of complex systems: I, II, III, J. Mathemat-
ical Phys. 3 (1962), 140-156, 157-165, 166-175, http://scitation.aip.org/content/aip/
journal/jmp/3/1/10.1063/1.1703773, http://scitation.aip.org/content/aip/journal/
jmp/3/1/10.1063/1.1703774, http://scitation.aip.org/content/aip/journal/jmp/3/1/
10.1063/1.1703773.
[3] F. J. Dyson, The threefold way. Algebraic structure of symmetry groups and en-
sembles in quantum mechanics, J. Mathematical Phys. 3 (1962), 1199–1215, DOI
10.1063/1.1703863. http://scitation.aip.org/content/aip/journal/jmp/3/6/10.1063/1.
1703863. MR0177643
[4] L. Erdős, Universality of Wigner random matrices: a survey of recent results, http://arxiv.
org/abs/1004.0861.
[5] L. Erdős, B. Schlein, and H.-T. Yau, Semicircle law on short scales and delocalization of
eigenvectors for Wigner random matrices, Ann. Probab. 37 (2009), no. 3, 815–852, DOI
10.1214/08-AOP421. MR2537522
[6] L. Erdős, B. Schlein, and H.-T. Yau, Local semicircle law and complete delocalization
for Wigner random matrices, Comm. Math. Phys. 287 (2009), no. 2, 641–655, DOI
10.1007/s00220-008-0636-9. MR2481753
[7] L. Erdős, B. Schlein, and H.-T. Yau, Wegner estimate and level repulsion for Wigner ran-
dom matrices, Int. Math. Res. Not. IMRN 3 (2010), 436–479, DOI 10.1093/imrn/rnp136.
MR2587574
[8] F. W. K. Firk and S. J. Miller, Nuclei, primes and the random matrix connection, Symmetry
1 (2009), no. 1, 64–105, DOI 10.3390/sym1010064. http://arxiv.org/pdf/0909.4914.pdf.
MR2756142
[9] P. J. Forrester, Log-gases and random matrices, London Mathematical Society Monographs
Series, vol. 34, Princeton University Press, Princeton, NJ, 2010. MR2641363
[10] B. Hayes, The spectrum of Riemannium, American Scientist 91 (2003), no. 4, 296-300.
[11] N. M. Katz and P. Sarnak, Zeroes of zeta functions and symmetry, Bull. Amer. Math. Soc.
(N.S.) 36 (1999), no. 1, 1–26, DOI 10.1090/S0273-0979-99-00766-1. MR1640151
[12] M. L. Mehta, Random matrices, 2nd ed., Academic Press, Inc., Boston, MA, 1991.
MR1083764
[14] H. L. Montgomery, The pair correlation of zeros of the zeta function, Analytic number theory
(Proc. Sympos. Pure Math., Vol. XXIV, St. Louis Univ., St. Louis, Mo., 1972), Amer. Math.
Soc., Providence, R.I., 1973, pp. 181–193. http://www-personal.umich.edu/~hlm/paircor1.
pdf. MR0337821
[15] A. M. Odlyzko, On the distribution of spacings between zeros of the zeta function, Math.
Comp. 48 (1987), no. 177, 273–308, DOI 10.2307/2007890. http://www.ams.org/journals/
mcom/1987-48-177/S0025-5718-1987-0866115-0/. MR866115
[16] B. Schlein, Spectral Properties of Wigner Matrices, Proceedings of the Conference QMath
11, Hradec Kralove, 2010.
[17] T. Tao and V. Vu, Random matrices: universality of local eigenvalue statistics, Acta Math.
206 (2011), no. 1, 127–204, DOI 10.1007/s11511-011-0061-3. MR2784665
[18] E. Wigner, On the statistical distribution of the widths and spacings of nuclear resonance
levels, Proc. Cambridge Philo. Soc. 47 (1951), 790-798. http://journals.cambridge.org/
abstract_S0305004100027237.
[19] E. Wigner, Characteristic vectors of bordered matrices with infinite dimensions, Ann.
of Math. 2 (1955), no. 62, 548-564. http://www.jstor.org/stable/1970079?seq=1#
page_scan_tab_contents.
[20] E. Wigner, Statistical properties of real symmetric matrices, in Canadian Mathematical Con-
gress Proceedings, University of Toronto Press, Toronto, 1957, 174–184.
[21] E. P. Wigner, Characteristic vectors of bordered matrices with infinite dimensions. II, Ann.
of Math. (2) 65 (1957), 203–207, DOI 10.2307/1969956. MR0083848
[22] E. P. Wigner, On the distribution of the roots of certain symmetric matrices, Ann. of Math.
(2) 67 (1958), 325–327, DOI 10.2307/1970008. http://www.jstor.org/stable/1970008?
seq=1#page_scan_tab_contents. MR0095527
[23] E. P. Wigner, The unreasonable effectiveness of mathematics in the natural sciences [Comm.
Pure Appl. Math. 13 (1960), 1–14; Zbl 102, 7], Mathematical analysis of physical systems,
Van Nostrand Reinhold, New York, 1985, pp. 1–14. https://www.dartmouth.edu/~matc/
MathDrama/reading/Wigner.html. MR824292
[24] J. Wishart, The generalized product moment distribution in samples from a normal multi-
variate population, Biometrika 20 A (1928), 32-52. http://www.jstor.org/stable/2331939?
seq=1#page_scan_tab_contents.
1929
Gödel’s Incompleteness Theorems
Introduction
This statement is false. If it is false, then it is true; likewise, if it is true, then it
is false. This paradox, commonly known as the liar’s paradox, has been attributed
to Eubulides of Miletus (4th century BCE). If you are confused, you are not the only
one. The liar’s paradox was used to disable an android in the Star Trek episode I,
Mudd (1967) and a sentient mainframe in the Doctor Who serial The Green Death
(1973).
In a similar vein, Bertrand Russell (1872–1970) dealt the death blow to "naive
set theory" in 1901 when he observed that the definition of the "set" $R = \{x : x \notin x\}$
ensures that $R \in R$ if and only if $R \notin R$, which is absurd. Thus, $R$ is not a set;
it is too big to be treated as a set in a logically sound manner. This is Russell's
paradox.
Needless to say, the invalidity of naive set theory¹ was a great disappointment.
No one was more disappointed than Gottlob Frege (1848–1925), who was just
finishing his would-be masterpiece Grundgesetze der Arithmetik, which purported to
derive the laws of arithmetic from supposedly logical axioms. As Frege admitted:
A scientist can hardly meet with anything more undesirable than to
have the foundation give way just as the work is finished. I was put
in this position by a letter from Mr. Bertrand Russell when the work
was nearly through the press.
Mathematicians were forced to reevaluate the foundations of their discipline. Sets
would have to be treated in a rigorous manner. The rules would have to be explicitly
stated so that contradictions would not occur in axiomatic set theory.
The Zermelo–Fraenkel axioms (ZF) are a list of eight or so axioms,² depending
upon the particular formulation, that codify the properties of sets. For the most
part they assert things that most mathematicians take for granted (for example,
unions of sets exist). Here are a couple of the axioms.

Axiom of Foundation. $\forall x \, \bigl( x \neq \emptyset \implies \exists y \in x \, (y \cap x = \emptyset) \bigr)$.
This axiom prevents a set from being an element of itself.³
¹ More specifically, the general comprehension principle, which asserts that given any
property, there exists a set that consists of all objects having that property.
² Technically, some of them are axiom schema; the distinction is not important for us.
³ If $A$ is a set, then so is $\{A\}$ (this requires the axiom of pairing, which asserts that if $A$
and $B$ are sets, then so is $\{A, B\}$; let $A = B$ to see that $\{A\}$ is a set). The axiom of foundation
ensures that there is an element of $\{A\}$ that is disjoint from $\{A\}$. The only element of $\{A\}$ is $A$,
so $A$ and $\{A\}$ are disjoint. Thus, $A \notin A$.

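Axiom of Infinity. $\exists X \, \bigl( \emptyset \in X \wedge \forall x \in X \, ((x \cup \{x\}) \in X) \bigr)$.
This ensures that an infinite set exists; the set $X$ described by the axiom contains
\[
  \emptyset, \quad \emptyset \cup \{\emptyset\}, \quad (\emptyset \cup \{\emptyset\}) \cup \{\emptyset \cup \{\emptyset\}\}, \quad \ldots,
\]
which can be used to define the natural numbers $0, 1, 2, \ldots$. Is Zermelo–Fraenkel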
set theory the ultimate answer to the foundational crisis in mathematics? In short,
no. We have not even brought up the axiom of choice or the continuum hypothesis
yet; see the entries for 1924, 1940, 1963, 1964, and 1999.
Self-reference, seen in the liar’s paradox and Russell’s paradox, lies at the heart
of the celebrated first incompleteness theorem of Kurt Gödel (1906–1978).⁴ A set
of axioms is consistent if there does not exist a statement S such that both S
and its negation ¬S are provable from the axioms (that is, the axioms are not
self-contradictory). The first incompleteness theorem says that any “sufficiently
complicated” axiomatic system is either incomplete (not all true statements in that
system can be proved in that system) or inconsistent (self-contradictory). The
second incompleteness theorem states that no “sufficiently complicated” axiomatic
system (this includes ZF) can prove its own consistency.
Around the turn of the 20th century, David Hilbert initiated a program that
aimed to show that all of mathematics can be derived from a set of self-evident
axioms. This program was pursued in earnest by Bertrand Russell and Alfred North
Whitehead (1861–1947), who authored the imposing Principia Mathematica. This
was an ambitious task; it took over 300 pages to establish that 1 + 1 = 2.
Gödel’s theorems show that Hilbert’s program is doomed. If ZF is consistent,
then there are true statements that can be expressed in the language of ZF, but
not proved in ZF. Moreover, we cannot even hope to use ZF to prove that ZF
is consistent. Indeed, if ZF could be used to prove that ZF is consistent, then
the second incompleteness theorem would ensure that ZF is inconsistent! There
is nothing special about ZF. Any other sufficiently complicated system of axioms
would be plagued by the same issues.

⁴ Technically, we are a little early. The completeness theorem for first-order logic was the
subject of Gödel's 1929 thesis; the incompleteness theorems actually date from 1931.

To obtain the hereditary base-$b$ representation of a natural number, write it
in base $b$, then expand the exponents in base $b$, and continue until the process
terminates. Since $266 = 2^8 + 2^3 + 2^1$, the hereditary base-2 representation of 266 is
\[
  266 = 2^{2^{2+1}} + 2^{2+1} + 2.
\]
The Goodstein sequence $G_b(n)$ of a natural number $n$ is obtained as follows.
(a) Let $G_2(n) = n$.
(b) Write $G_b(n)$ in hereditary base-$b$ notation.
(c) Obtain $G_{b+1}(n)$ from $G_b(n)$ by first replacing all occurrences of the base $b$ by
$b + 1$, then subtracting 1 from the end result.
If $n = 4$, then $G_2(4) = 2^2 = 4$, $G_3(4) = 3^3 - 1 = 26$, and so forth. The first few
values of $G_b(266)$ are
\begin{align*}
  G_2(266) &= 2^{2^{2+1}} + 2^{2+1} + 2 = 266, \\
  G_3(266) &= (3^{3^{3+1}} + 3^{3+1} + 3) - 1 \approx 4.4 \times 10^{38}, \\
  G_4(266) &= (4^{4^{4+1}} + 4^{4+1} + 2) - 1 \approx 3.2 \times 10^{616}, \\
  G_5(266) &= (5^{5^{5+1}} + 5^{5+1} + 1) - 1 \approx 2.5 \times 10^{10921}, \\
  G_6(266) &= 6^{6^{6+1}} + 6^{6+1} - 1 \approx 3.5 \times 10^{217832}; \tag{1929.1}
\end{align*}
see [8]. For $n \geq 4$, a few calculations quickly suggest that $\lim_{b \to \infty} G_b(n) = \infty$.
However, this is completely misleading. Prove that every Goodstein sequence
terminates with 0.
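The definition above translates directly into a short recursive program, which is a convenient way to experiment with small cases. This is a minimal sketch using Python's arbitrary-precision integers; the names `bump` and `goodstein` are hypothetical, and the indexing follows the convention of the text, in which $G_2(n) = n$.

```python
def bump(n, b):
    """Rewrite n in hereditary base-b notation and replace every b by b + 1."""
    result, exponent = 0, 0
    while n > 0:
        n, digit = divmod(n, b)
        if digit:
            # Exponents are themselves expanded hereditarily, hence the recursion.
            result += digit * (b + 1) ** bump(exponent, b)
        exponent += 1
    return result


def goodstein(n, b):
    """Return G_b(n), where G_2(n) = n and G_{b+1}(n) = bump(G_b(n), b) - 1."""
    value = n
    for base in range(2, b):
        if value == 0:
            break  # the sequence has terminated
        value = bump(value, base) - 1
    return value


sequence_for_3 = [goodstein(3, b) for b in range(2, 8)]
sequence_for_4 = [goodstein(4, b) for b in range(2, 8)]
```

The sequence starting at 3 reaches 0 almost immediately, while the sequence starting at 4 grows for an astronomically long time before it turns around, which is why numerical experiments are so misleading here.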
1929: Comments
Although we do not provide the complete solution here, the following example
illustrates the main idea; see [1] for more details. Let us return to the case $n = 266$
discussed above. In the line corresponding to base $b$, replace every occurrence of $b$
in the hereditary base-$b$ expansion of $G_b(266)$ with the first infinite ordinal $\omega$;
see Figure 1. This yields
\begin{align*}
  H_2(266) &= \omega^{\omega^{\omega+1}} + \omega^{\omega+1} + \omega, \\
  H_3(266) &= \omega^{\omega^{\omega+1}} + \omega^{\omega+1} + 2, \\
  H_4(266) &= \omega^{\omega^{\omega+1}} + \omega^{\omega+1} + 1, \\
  H_5(266) &= \omega^{\omega^{\omega+1}} + \omega^{\omega+1}, \\
  H_6(266) &= \omega^{\omega^{\omega+1}} + \omega^{\omega} \cdot 5 + \omega^{5} \cdot 5 + \cdots + \omega \cdot 5 + 5,
\end{align*}
and so forth. The structure of $H_6(266)$ looks different than expected because
(1929.1) is not the hereditary base-6 expansion of $G_6(266)$; the "ones" term
cannot be $-1$. Instead,
\[
  6^{6^{6+1}} + 6^{6+1} - 1 = 6^{6^{6+1}} + 5 \cdot 6^6 + 5 \cdot 6^5 + \cdots + 5 \cdot 6 + 5.
\]
This leads to a strictly decreasing sequence $H_b(266)$ of countable ordinals. It turns
out that any strictly decreasing sequence of ordinal numbers is finite, so $H_b(266)$,
and hence $G_b(266)$, terminates with 0.
Although Goodstein’s theorem is a statement about natural numbers and their
properties, it cannot be proved without “infinitary” means; some form of transfinite
mathematics is required to prove Goodstein’s theorem. The Kirby–Paris theorem
(1982) implies that Goodstein’s theorem is independent of Peano arithmetic (PA).
In other words, Goodstein’s theorem can neither be proved nor disproved in PA.
One can think of Peano arithmetic as the system ZFCfin obtained from ZFC
(see the 1963, 1964, and 1969 entries) if the axiom of infinity ("there exists an
infinite set") is replaced by its negation ("all sets are finite"). PA is sufficient for
almost all familiar statements of elementary number theory. For instance, Euclid's
theorem on the infinitude of the primes can be proved without any reference to
infinite sets. "For each prime p there exists a prime q such that p < q" expresses
the infinitude of primes without discussing infinite sets. One does not need to "hold
in one's hand" the set of all primes in order to prove Euclid's theorem.

Figure 1. Graphical depiction of an initial segment of the ordinal numbers. The "set of
all ordinal numbers" is well-ordered and has the property that a strictly decreasing
sequence of ordinals is finite. However, the "set of all ordinal numbers" is too big to be a
set; this leads to the Burali–Forti paradox. But that is a story for another time.
Gödel’s first incompleteness theorem ensures that if Peano arithmetic is con-
sistent (as most people believe), then there are true statements about the integers
that cannot be proved in PA. Goodstein’s theorem is one such statement.
If you want more information about the foundational crisis and its main char-
acters and you want it in the form of a graphic novel, then [2] is for you. If
you want your Gödel with a serving of M. C. Escher (1898–1972) and J. S. Bach
(1685–1750), then you need the acclaimed book [6]. Another great choice is [3],
particularly for debunking the numerous pseudoscientific assertions often ascribed
to Gödel’s theorems.
Bibliography
[1] A. E. Caicedo, Goodstein’s function (English, with English and Spanish summaries), Rev.
Colombiana Mat. 41 (2007), no. 2, 381–391. MR2585906
[2] A. Doxiadis and C. H. Papadimitriou, Logicomix: An epic search for truth, character design
and drawings by Alecos Papadatos, color by Annie Di Donna, Bloomsbury Press, New York,
2009. MR2884886
[3] T. Franzén, Gödel’s theorem: An incomplete guide to its use and abuse, A K Peters, Ltd.,
Wellesley, MA, 2005. MR2146326
[4] K. Gödel, Über die Vollständigkeit des Logikkalküls, Doctoral dissertation, University of Vi-
enna, 1929.
[5] R. L. Goodstein, On the restricted ordinal theorem, J. Symbolic Logic 9 (1944), 33–41, DOI
10.2307/2268019. https://projecteuclid.org/euclid.jsl/1183391360. MR0010515
[6] D. R. Hofstadter, Gödel, Escher, Bach: an eternal golden braid, Basic Books, Inc., Publishers,
New York, 1979. MR530196
[7] L. Kirby and J. Paris, Accessible independence results for Peano arithmetic, Bull. Lon-
don Math. Soc. 14 (1982), no. 4, 285–293, DOI 10.1112/blms/14.4.285. http://blms.
oxfordjournals.org/content/14/4/285.full.pdf. MR663480
[8] Wolfram MathWorld, Goodstein Sequence, http://mathworld.wolfram.com/
GoodsteinSequence.html.
1930
Ramsey Theory
Introduction
There are many questions that could, in principle, be settled by a computation.
However, some of these problems are so far beyond the realm of practical compu-
tation that we may never know the answer; see the 1933 entry and the comments
for the 1992 entry for other examples of this phenomenon. A great source of such
problems is Ramsey theory, named after Frank Plumpton Ramsey (1903–1930), an
area of mathematics that studies how large a collection of objects must be to ensure
the emergence of a desired property.
The seminal problem in Ramsey theory is the determination of the Ramsey
number R(m, n), which is defined as follows. Imagine there is a long-expected
party with N people, and in any pair of two people either both know each other
or neither knows the other. Then R(m, n) is the smallest N which guarantees that
there are either (a) at least m people that all know each other or (b) at least n people
such that none of these n people know each other. It is known that R(3, 3) = 6 and
R(4, 4) = 18.
Ramsey theory's mantra is "complete disorder is impossible": any large, seemingly
disordered, structure should contain a smaller, highly ordered substructure.
Unfortunately, there are often so many cases to investigate that these sorts of
problems cannot be solved by brute force.

Figure 1. Since R(3, 3) = 6, in any party of six or more there are at least three people
who all know each other (blue lines) or at least three people who are mutual strangers
(red lines). In the configuration illustrated, both of these situations occur: people A,
C, E all know each other and B, D, F are mutual strangers.

Figure 2. The van der Waerden number W(2, 3) is at least 8. The coloring of
{1, 2, . . . , 8} pictured at the top has no monochromatic arithmetic progression of
length three. It is possible to show that W(2, 3) = 9. For instance, appending a red 9
to the original coloring yields the monochromatic progression 1, 5, 9; appending a
blue 9 yields the monochromatic progression 3, 6, 9.

For example, we may associate a graph to the party problem (see Figure 1), with
vertices representing people and an edge connecting people who know each other.
The number of graphs on $N$ labeled vertices is $2^{\binom{N}{2}} = 2^{N(N-1)/2}$, which already
exceeds $10^{200}$ for $N = 40$. According to
Paul Erdős:
Suppose aliens invade the earth and threaten to obliterate it in a year’s
time unless human beings can find the Ramsey number for red five
and blue five. We could marshal the world’s best minds and fastest
computers, and within a year we could probably calculate the value.
If the aliens demanded the Ramsey number for red six and blue six,
however, we would have no choice but to launch a preemptive attack.
Erdős’s quote is about R(5, 5) (which is between 43 and 49) and R(6, 6) (which is
between 102 and 165).
A famous theorem in the subject is due to Bartel Leendert van der Waerden
(1903–1996). Given c ≥ 2 colors and a natural number n, there is a natural number
W (c, n) such that if N ≥ W (c, n) and we paint the integers 1, 2, . . . , N with these
colors, there is a length n arithmetic progression in {1, 2, . . . , N }, each element of
which has the same color. It is known that W (2, 3) = 9 (see Figure 2), W (2, 4) = 35,
W(2, 5) = 178, and W(2, 6) = 1132. Most other values of W(c, n) are unknown,
although bounds exist. For instance, the novel approach to Szemerédi's theorem
(see the 1975 entry) developed by Fields Medalist Timothy Gowers (1963– ) yields
the upper bound
\[
  W(c, n) \leq 2^{2^{c^{2^{2^{n+9}}}}}.
\]
Although W(c, n) grows rapidly, it is hoped that Gowers's bound is overkill. A cash
prize of $1,000 was offered by Ronald Graham (1935– ) for a proof that
\[
  W(2, n) < 2^{n^2};
\]
see [6] for a list of problems in Ramsey theory with cash prizes attached.
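The smallest case, W(2, 3) = 9, is small enough to confirm by exhaustive search: there are only $2^9 = 512$ two-colorings of {1, 2, . . . , 9}. The sketch below is a brute-force illustration with hypothetical function names; the same approach becomes hopeless very quickly as the parameters grow.

```python
from itertools import product


def has_monochromatic_ap(coloring, length=3):
    """coloring[i] is the color of the integer i + 1; look for a one-colored progression."""
    n = len(coloring)
    for start in range(1, n + 1):
        # The last term start + (length - 1) * step must not exceed n.
        for step in range(1, (n - start) // (length - 1) + 1):
            terms = [start + k * step for k in range(length)]
            if len({coloring[t - 1] for t in terms}) == 1:
                return True
    return False


def every_coloring_has_ap(n, colors=2, length=3):
    """True if every coloring of {1, ..., n} contains a monochromatic progression."""
    return all(has_monochromatic_ap(c, length) for c in product(range(colors), repeat=n))


# One progression-free coloring of {1, ..., 8}: colors alternate in blocks of two.
WITNESS = (0, 0, 1, 1, 0, 0, 1, 1)
forced_at_9 = every_coloring_has_ap(9)
forced_at_8 = every_coloring_has_ap(8)
```

The witness coloring shows that eight integers are not enough, and the exhaustive check shows that nine always are.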

Proposed by Joel Spencer, NYU, James M. Andrews, University of
Memphis, and Steven J. Miller, Williams College, based on the 1953
Putnam Mathematical Examination.
Six points are in general position in $\mathbb{R}^3$ (no three on a line, no four in a plane).
The fifteen line segments joining them in pairs are drawn and then painted, some
segments red, some blue. Prove that some triangle has all its sides the same color.
1930: Comments
Some cautionary tales. Before giving the solution to this problem, let us
digress a bit on the dangers of extrapolating from limited data. For instance, the
Ramsey numbers R(n, n) are known only for n = 1, 2, 3, 4; it is hard to surmise
what R(5, 5) should be based on this information. Here are a couple cautionary
tales about careless extrapolation.
Moser's circle problem¹ asks for the maximum number $f(n)$ of regions into
which a circle can be partitioned by connecting $n$ points on the circle with chords.
A couple quick sketches confirm that
\[
  f(1) = 1, \quad f(2) = 2, \quad f(3) = 4, \quad f(4) = 8, \quad \text{and} \quad f(5) = 16;
\]
see Figure 3. This limited data suggests that $f(n) = 2^{n-1}$ for all $n$. However, it
turns out that $f(6) = 31$; see Figure 4. The correct general answer,
\[
  f(n) = \frac{1}{24} (n^4 - 6n^3 + 23n^2 - 18n + 24),
\]
can be obtained by induction or combinatorial topology [3, 11].
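The quartic formula above is easy to tabulate against the tempting guess $2^{n-1}$. The sketch below also compares it with the equivalent binomial form $\binom{n}{4} + \binom{n}{2} + 1$, a well-known restatement that is not given in the text; the function names are hypothetical.

```python
from math import comb


def moser_regions(n):
    """Maximum number of regions, from the quartic formula in the text."""
    return (n ** 4 - 6 * n ** 3 + 23 * n ** 2 - 18 * n + 24) // 24


def moser_regions_binomial(n):
    """The same count written as C(n, 4) + C(n, 2) + 1 (standard restatement)."""
    return comb(n, 4) + comb(n, 2) + 1


table = [(n, moser_regions(n), 2 ** (n - 1)) for n in range(1, 9)]
# The second and third columns agree for n = 1, ..., 5 and part ways at n = 6.
```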
The previous conjecture failed at n = 6. Here is an even more striking example.
Let $p(n) = n^2 + n + 41$ and consider its values
41, 43, 47, 53, 61, 71, 83, 97, 113, 131, 151, 173, 197, 223, 251,
281, 313, 347, 383, 421, 461, 503, 547, 593, 641, 691, 743, 797,
853, 911, 971, 1033, 1097, 1163, 1231, 1301, 1373, 1447, 1523,
1601, . . .
for $n = 0, 1, 2, \ldots$. Do you see a pattern? They are all prime! Or at least
$p(0), p(1), \ldots, p(39)$ are; we intentionally left off the composite number
$p(40) = 1681 = 41^2$. This shows that even a few dozen cases do not a theorem make. This
amazing polynomial was discovered by Euler in 1772; see the 1983 entry for an even
more amazing "prime generating polynomial."
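The claim about Euler's polynomial takes only a few lines to verify. The sketch below uses plain trial division, which is more than adequate for numbers of this size; the function names are hypothetical.

```python
def is_prime(m):
    """Trial division; fine for the small values considered here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def euler_poly(n):
    """Euler's polynomial p(n) = n^2 + n + 41."""
    return n * n + n + 41


# Smallest n >= 0 for which p(n) is composite.
first_composite = next(n for n in range(1000) if not is_prime(euler_poly(n)))
```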
In 1919, George Pólya (1887–1985) suggested that most natural numbers have
an odd number of prime factors [12]. For instance, 108 = 22 · 33 has 2 + 3 = 5
prime factors. The Liouville lambda function λ(n) is +1 if n has an even number of
prime factors and −1 if n has an odd number of prime factors. Pólya’s conjecture
states that
$$L(n) = \sum_{i=1}^{n} \lambda(i) \le 0$$
for n = 2, 3, 4, . . .. Numerical evidence suggests the truth of the conjecture. In fact,
it holds for all n < 906,150,257, which is the smallest counterexample to Pólya’s
1 The problem appears in a paper of Leo Moser (1921–1970) and W. Bruce Ross [10], so
perhaps the “Moser–Ross circle problem” would be more appropriate.

Figure 3. Illustration of Moser’s circle problem for small n. Here n points on a circle yield chords that determine at most $2^{n-1}$ regions for n = 1, 2, 3, 4, 5.

Figure 4. Illustration of Moser’s circle problem for n = 6. The pattern $f(n) = 2^{n-1}$ that held for n = 1, 2, 3, 4, 5 fails for n = 6. At most f(6) = 31 regions are determined.
conjecture. When C. Brian Haselgrove (1926–1964) disproved the conjecture in

1958, he proved the existence of a counterexample in the vicinity of $1.845 \times 10^{361}$
[8].
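To see how convincing the numerical evidence for Pólya's conjecture looks, one can tabulate the summatory function L(n) for modest n. The sketch below is an illustration only; the function name `liouville_summatory` and the sieve-based approach are our choices, and the range checked is far below the first counterexample.

```python
from math import isqrt


def liouville_summatory(limit: int) -> list[int]:
    """Return a list L with L[n] = lambda(1) + ... + lambda(n) for 1 <= n <= limit."""
    # Smallest-prime-factor sieve.
    spf = list(range(limit + 1))
    for q in range(2, isqrt(limit) + 1):
        if spf[q] == q:
            for m in range(q * q, limit + 1, q):
                if spf[m] == m:
                    spf[m] = q
    lam = [0] * (limit + 1)
    lam[1] = 1
    L = [0] * (limit + 1)
    L[1] = 1
    for n in range(2, limit + 1):
        # Removing one prime factor flips the parity of the factor count.
        lam[n] = -lam[n // spf[n]]
        L[n] = L[n - 1] + lam[n]
    return L


if __name__ == "__main__":
    L = liouville_summatory(100_000)
    print(all(L[n] <= 0 for n in range(2, 100_001)))
```

Every n in this range satisfies L(n) ≤ 0, which is exactly the kind of evidence that turned out to be misleading.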
Now for our final warning about naive reliance on numerical data (see also the
notes for 1957). One can show that

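$$\int_0^\infty \frac{\sin t}{t}\cdot\frac{\sin(t/101)}{t/101}\cdot\frac{\sin(t/201)}{t/201}\cdots\frac{\sin\bigl(t/(100n+1)\bigr)}{t/(100n+1)}\,dt = \frac{\pi}{2}$$
for $n = 1, 2, \ldots, 9.8 \times 10^{42}$. However, it is possible to show that the integral is
unequal to π/2 for all $n > 7.4 \times 10^{43}$ [1].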
Solution to the problem. The following is from [4]. Let P be any one of
the six points. Five of the line segments end at P , and of these at least three, say
P Q, P R, and P S, must have the same color, say blue. If any of the segments QR,
RS, SQ is blue, we obtain a blue triangle; if not, QRS will be a red triangle. The
proposer, Joel Spencer (1946– ), was given this problem while in high school. After
many days he had a proof, but only after going through a good many of the $2^{15} = 32{,}768$
cases!
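The brute-force route is painless for a computer. The following sketch checks all two-colorings of the 15 segments joining six points, and also shows that five points are not enough. The function names are ours, and colors are encoded as 0 and 1.

```python
from itertools import combinations, product


def has_mono_triangle(n: int, coloring: dict) -> bool:
    """True if some triangle on vertices 0..n-1 has all three sides the same color."""
    return any(
        coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]
        for a, b, c in combinations(range(n), 3)
    )


def every_coloring_has_mono_triangle(n: int) -> bool:
    """Exhaustively test every red/blue coloring of the segments joining n points."""
    edges = list(combinations(range(n), 2))
    for colors in product((0, 1), repeat=len(edges)):
        if not has_mono_triangle(n, dict(zip(edges, colors))):
            return False
    return True


if __name__ == "__main__":
    print(every_coloring_has_mono_triangle(6))  # six points always work
    print(every_coloring_has_mono_triangle(5))  # five points do not
```

The pigeonhole argument above replaces this exhaustive search with three sentences.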
Bibliography
[1] J. C. Baez, Patterns that eventually fail, https://johncarlosbaez.wordpress.com/2018/09/
20/patterns-that-eventually-fail/.
[2] A. Carr, Party at Ramsey’s, http://blogs.ams.org/mathgradblog/2013/05/11/
mathematics/.
[3] J. H. Conway and R. K. Guy, The book of numbers, Copernicus, New York, 1996. MR1411676
[4] A. M. Gleason, R. E. Greenwood, and L. M. Kelly, The William Lowell Putnam Mathemati-
cal Competition: Problems and solutions: 1938–1964, Mathematical Association of America,
Washington, D.C., 1980. MR588757
[5] R. L. Graham and J. H. Spencer, Ramsey Theory, Scientific American (July 1990), 112-117.
http://www.math.ucsd.edu/~ronspubs/90_06_ramsey_theory.pdf.
[6] R. L. Graham, Some of my favorite problems in Ramsey theory, INTEGERS 7 (2007), no. 2,
#A15.
[7] W. T. Gowers, A new proof of Szemerédi’s theorem, Geom. Funct. Anal. 11 (2001), no. 3,
465–588, DOI 10.1007/s00039-001-0332-9. MR1844079
[8] C. B. Haselgrove, A disproof of a conjecture of Pólya, Mathematika 5 (1958), 141–145, DOI
10.1112/S0025579300001480. MR0104638
[9] B. M. Landman and A. Robertson, Ramsey theory on the integers, Student Mathematical
Library, vol. 24, American Mathematical Society, Providence, RI, 2004. MR2020361
[10] L. Moser and W. B. Ross, Mathematical Miscellany, Math. Mag. 23 (1949), no. 2, 109–114.
MR1570450
[11] The On-Line Encyclopedia of Integer Sequences, A000127 (Maximal number of regions ob-
tained by joining n points around a circle by straight lines. Also number of regions in 4-space
formed by n − 1 hyperplanes), http://oeis.org/A000127.
[12] G. Pólya, Verschiedene Bemerkungen zur Zahlentheorie, Jahresbericht der Deutschen
Mathematiker-Vereinigung 28 (1919), 31–40.
[13] F. P. Ramsey, On a Problem of Formal Logic, Proc. London Math. Soc. (1930), s2-30,
no. 1, 264-286. https://londmathsoc.onlinelibrary.wiley.com/doi/abs/10.1112/plms/
s2-30.1.264.
1931
The Ergodic Theorem
Introduction
A discrete dynamical system consists of a set of states X and a function T :
X → X. Given a system in state x ∈ X, let T (x) be the state of the system one
unit of time later. If the system starts in state x, it next moves to T (x), then to
$T^2(x) = T(T(x))$, and so forth. If A is a set of states and
$$\chi_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A \end{cases}$$
is the characteristic function (also called the indicator function) of A, then
$$\sum_{i=1}^{n} \chi_A(T^i(x))$$
counts the number of visits of x to A up to and including time n. The time average
of visits of x to A is
$$\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \chi_A(T^i(x)),$$
if this limit exists.
Suppose that we have a notion of “size” m(A) for subsets A of X. We insist
that this “measure” is normalized so that m(X) = 1. Then m(A) can be thought
of as the relative size of A in X (see the notes for 1924 for some of the hazards of
naive measure theory). We also assume that T : X → X is measure preserving, in
the sense that $m(T^{-1}(A)) = m(A)$ for every measurable set A. That is, although
T mixes and rearranges points of X, the size of A is unchanged after an application
of T . For instance, X could be a batch of (incompressible) cookie dough and T
could be the act of kneading the dough once (in some prescribed manner) for one
minute. A particular handful A of dough might be warped, stretched, or cut, but
the volume occupied by $A, T(A), T^2(A), \ldots$ is always the same.
A consequence of Ludwig Boltzmann (1844–1906) and Josiah Willard Gibbs’s
investigations in statistical mechanics was the ergodic hypothesis. A version of the
ergodic hypothesis states that the time average of a system should equal the space
average, m(A). To see what this means, we consider a simple example. An irrational
rotation is a function T : [0, 1) → [0, 1) of the form T (x) = x + θ (mod 1), in which
θ is a fixed irrational real number. By x + θ (mod 1), we refer to the fractional part
$x + \theta - \lfloor x + \theta \rfloor$ of x + θ. For example, if x = 0.5 and $\theta = \sqrt{2} = 1.414\ldots$, then
x + θ (mod 1) = 0.914 . . .. The term “rotation” stems from the fact that the wrapped
interval [0, 1) is topologically the same as a circle. From this perspective, addition
of θ modulo 1 corresponds to a rotation of the circle through an angle of 2πθ. It is
possible to show that the ergodic hypothesis holds for this example. For instance,
the average amount of time that $T(0), T^2(0), \ldots$ spends in an interval [a, b] equals
the length b − a of that interval; see Figure 1.

Figure 1. $T(0), T^2(0), \ldots, T^{100}(0)$ for $\theta = \sqrt{2}$, $e$, and $\pi$, respectively.
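The equality of time and space averages for an irrational rotation can be observed numerically. The sketch below is an illustration under stated assumptions: it uses floating-point arithmetic, the function name `visit_frequency` is ours, and the interval and step count are arbitrary choices.

```python
import math


def visit_frequency(theta: float, a: float, b: float, steps: int, x0: float = 0.0) -> float:
    """Fraction of the iterates T(x0), ..., T^steps(x0) that land in [a, b],
    where T(x) = x + theta (mod 1)."""
    x = x0
    hits = 0
    for _ in range(steps):
        x = (x + theta) % 1.0
        if a <= x <= b:
            hits += 1
    return hits / steps


if __name__ == "__main__":
    # The time average should be close to the length b - a = 0.3.
    print(visit_frequency(math.sqrt(2), 0.2, 0.5, 100_000))
```

Replacing θ by a rational number such as 1/4 makes the orbit periodic, and the frequencies then need not match the interval length.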
In 1931 John von Neumann [6], followed shortly by the sooner-to-publish George
Birkhoff [2], proved that time averages exist and equal the space averages for
measure-preserving systems satisfying a condition called ergodicity. A set E is
invariant if T (x) ∈ E if and only if x ∈ E. Ergodicity means that the only invari-
ant sets for T are those that differ from ∅ or X by a set of measure zero. If E is
invariant and x starts in E, then all of its iterates stay in E and no point outside
E visits E. That means that if T were not ergodic, there would exist sets E and
E c , both of positive measure, for which the dynamics of T on E will be totally
unrelated to the dynamics of T on E c . In other words, one could decompose the
dynamical system into two independent, simpler systems.
We can now state the Birkhoff ergodic theorem: for all measurable sets A there
exists a set of measure zero N so that
$$\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \chi_A(T^i(x)) = m(A) \quad \text{for all } x \text{ outside } N.$$
An important part of the theorem is that the limit exists. In fact, once we know
the limit exists, using standard results from analysis it is possible to show the
limit equals the measure of A. An immediate consequence is that for all sets A of
positive measure, every point of X (outside a set of measure zero) visits the set A,
and furthermore, the visits are with the “right frequency.” Thus, we know a lot
about the orbit of almost every point.
This theorem has had a strong influence in analysis and has many consequences.
For example, it can be used to prove Weyl’s uniform distribution property1 and the
law of large numbers2 from probability. The ergodic theorem is in fact a bit more
general: one can replace χA by any Lebesgue-integrable function f , and then m(A)
is replaced by the integral of f . The theorem proved by von Neumann is similar
to Birkhoff’s but the convergence is in the norm of a Hilbert space in which the
functions reside. For an introduction and proof the reader may consult [7]. Further
historical details and current developments can be found in [1].
1 If α is irrational, then the set $\{n\alpha \pmod 1\}_{n=1}^{\infty}$ is equidistributed in [0, 1].
2 Let $X_1, \ldots, X_n$ be independent random variables drawn from a common distribution with
mean μ and let $\overline{X}_n = (X_1 + \cdots + X_n)/n$ denote the sample mean. Then $\overline{X}_n$ converges in
probability to μ. A sequence of random variables $S_n$ converges in probability to a random variable
S (which in our case will be the constant μ) if for every $\epsilon > 0$ we have $\lim_{n\to\infty} P(|S_n - S| > \epsilon) = 0$.

Proposed by Cesar E. Silva, Williams College.
In 1988 Jean Bourgain (1954– ) proved that for every square-integrable function
f , the time average along polynomial times exists outside a set of measure zero [3].
In other words,
$$\frac{1}{n} \sum_{i=1}^{n} f(T^{p(i)}(x)) \quad \text{converges for almost all } x,$$
for any polynomial p with integer coefficients. When all powers of T are ergodic,
it follows that this limit equals the integral that is expected. It is reasonable to
ask what happens when the function f is merely integrable, even in the case of the
squares: $p(i) = i^2$. It was shown recently by Buczolich and Mauldin [4] that the
theorem for the squares fails when f is only assumed to be integrable. This proof
has been extended recently by P. LaVictoire. It would be interesting to find simpler
proofs of all of these results.
1931: Comments
Continued fractions. We briefly discuss a connection between the ergodic
theorem and continued fractions; see [5] and the references therein, as well as
the 1934 and 1972 entries. Each real number x has a unique continued fraction
expansion
$$x = a_0(x) + \cfrac{1}{a_1(x) + \cfrac{1}{a_2(x) + \cfrac{1}{a_3(x) + \cdots}}}, \qquad (1931.1)$$
in which the positive integers $a_i(x)$ are the continued fraction digits of x. For
typographical reasons we write $x = [a_0; a_1, a_2, \ldots]$ or $x = [a_1, a_2, \ldots]$ if $a_0 = 0$.
How are the $a_i(x)$ computed? First, let $a_0(x) = \lfloor x \rfloor$, the greatest integer at most
x. Next, let $a_1(x) = \lfloor 1/(x - a_0(x)) \rfloor$ and so forth. The continued fraction (1931.1)
is finite if and only if x is rational. It is eventually periodic if and only if x is a
quadratic irrational. For an x chosen uniformly at random in [0, 1), what is the
probability as n → ∞ that the nth digit is k?
The answer is the beautiful Gauss–Kuzmin theorem, due to Gauss and Rodion
Kuzmin (1891–1949). It says that for almost all x the probability converges to

$$\log_2\left(1 + \frac{1}{k(k+2)}\right);$$
see [5] for a proof, which is an expanded version of the argument in the classic
book by Aleksandr Khinchin (1894–1959). The beauty stems from the clear, simple
formula. The problem with the Gauss–Kuzmin theorem is that we do not know
much about the exceptional set. For example, although we believe that cubic
irrationals follow the Gauss–Kuzmin distribution, we do not know for sure. Some
specific numbers, such as $e^{1/n}$ for n = 1, 2, . . ., fail dramatically.
Figure 2. Histograms of the first 10,000 iterates of the Gauss map versus the probability density function (PDF) of the Gauss measure for fixed x: (a) x = π, (b) x = e.
A central ingredient in the proof of the Gauss–Kuzmin theorem is the Gauss

map G. This is the ergodic transformation on [0, 1) defined by
$$G(x) = \frac{1}{x} - \left\lfloor \frac{1}{x} \right\rfloor.$$
We can use G to obtain the digits of x’s continued fraction expansion as follows.
Given x ∈ [0, 1), set $a_0 = 0$, $a_1 = \lfloor 1/x \rfloor$, and $a_{n+1} = \lfloor 1/G^n(x) \rfloor$ for n ≥ 1. The reader
is strongly encouraged to choose a few different x and to look at the sequence
$\{G^n(x)\}_{n=1}^{\infty}$ with respect to the Gauss measure γ, defined by
$$\gamma(A) = \frac{1}{\log 2} \int_A \frac{dx}{1 + x}.$$
We compare the distribution of iterates of the Gauss map applied to π and e and
the Gauss measure in Figure 2.
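Readers who want to experiment with the Gauss map can start from the sketch below. It is an illustration only: it works with exact rational inputs (Python's `Fraction`) so that floating-point error does not corrupt the digits, the function names are ours, and setting G(0) = 0 is a convention we adopt so that rational inputs terminate.

```python
from fractions import Fraction
from math import floor


def gauss_map(x: Fraction) -> Fraction:
    """G(x) = 1/x - floor(1/x) for 0 < x < 1; we set G(0) = 0 by convention."""
    if x == 0:
        return Fraction(0)
    inv = 1 / x
    return inv - floor(inv)


def continued_fraction_digits(x: Fraction, max_digits: int = 50) -> list[int]:
    """Return [a0, a1, a2, ...], reading each digit off as floor(1/y) and then applying G."""
    a0 = floor(x)
    digits = [a0]
    y = x - a0  # fractional part, lies in [0, 1)
    while y != 0 and len(digits) <= max_digits:
        digits.append(floor(1 / y))
        y = gauss_map(y)
    return digits


if __name__ == "__main__":
    print(continued_fraction_digits(Fraction(355, 113)))
    print(continued_fraction_digits(Fraction(415, 93)))
```

For irrational inputs one would feed in a high-precision rational approximation and trust only the leading digits.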
Note the stark difference in behavior; the fit is excellent for π but so bad for e
that we cannot show the observed and Gauss–Kuzmin predictions on the same plot.
Both of these numbers are transcendental, but they have very different properties.
The first few digits of their continued fraction expansions are
π = [3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2, 1, 1, 15, 3, 13, 1,
4, 2, 6, 6, 99, 1, 2, 2, 6, 3, 5, 1, 1, 6, 8, 1, 7, 1, 2, 3, 7, 1, 2, 1, 1, 12, 1, 1, 1, 3, 1, 1,
8, 1, 1, 2, 1, 6, 1, 1, 5, 2, 2, 3, 1, 2, 4, 4, 16, 1, 161, 45, 1, 22, 1, 2, 2, 1, 4, 1, 2, 24,
1, 2, 1, 3, 1, 2, 1, 1, 10, . . . ]
and
e = [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, 1, 10, 1, 1, 12, 1, 1, 14, 1, 1, 16, 1, 1, 18, 1, 1, 20,
1, 1, 22, 1, 1, 24, 1, 1, 26, 1, 1, 28, 1, 1, 30, 1, 1, 32, 1, 1, 34, 1, 1, 36, 1, 1, 38, 1, 1,
40, 1, 1, 42, 1, 1, 44, 1, 1, 46, 1, 1, 48, 1, 1, 50, 1, 1, 52, 1, 1, 54, 1, 1, 56, 1, 1, 58,
1, 1, 60, 1, 1, 62, 1, 1, 64, 1, 1, 66, 1, . . . ].
Notice that two out of every three continued-fraction digits of e equal 1 and the
others form an arithmetic progression. A pattern in π’s continued-fraction digits
has never been found.
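The mismatch between e and the Gauss–Kuzmin prediction can be quantified directly from the pattern just described. The sketch below is illustrative: `e_digits` generates digits from the observed pattern 1, 2m, 1 (it does not compute e), and both function names are ours.

```python
from math import log2


def gauss_kuzmin(k: int) -> float:
    """Limiting probability that a continued-fraction digit equals k."""
    return log2(1 + 1 / (k * (k + 2)))


def e_digits(count: int) -> list[int]:
    """First `count` digits a1, a2, ... of e = [2; 1, 2, 1, 1, 4, 1, 1, 6, ...]."""
    digits: list[int] = []
    m = 1
    while len(digits) < count:
        digits.extend([1, 2 * m, 1])
        m += 1
    return digits[:count]


if __name__ == "__main__":
    sample = e_digits(3000)
    print("observed frequency of digit 1:", sample.count(1) / len(sample))
    print("Gauss-Kuzmin prediction:      ", gauss_kuzmin(1))
```

The digit 1 occurs with frequency 2/3 for e, whereas the Gauss–Kuzmin theorem predicts roughly 0.415 for almost every x.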
Bibliography
[1] V. Bergelson, Some historical comments and modern questions around the ergodic theorem,
Dynamics of Complex Systems, Research Institute for Math. Sciences, Kyoto, 2004, 1–11.
https://people.math.osu.edu/bergelson.1/vb_Kyoto8Nov04.pdf.
[2] G. Birkhoff, Proof of the ergodic theorem, Proc. Nat. Acad. Sci, USA 17 (1931), 656–660.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1076138/.
[3] J. Bourgain, On the maximal ergodic theorem for certain subsets of the integers, Israel J. Math.
61 (1988), no. 1, 39–72, DOI 10.1007/BF02776301. http://link.springer.com/article/10.
1007%2FBF02776301. MR937581
[4] Z. Buczolich and R. D. Mauldin, Divergent square averages, Ann. of Math. (2) 171 (2010),
no. 3, 1479–1530, DOI 10.4007/annals.2010.171.1479. http://annals.math.princeton.edu/
wp-content/uploads/annals-v171-n3-p02-p.pdf. MR2680392
[6] J. von Neumann, Proof of the quasi-ergodic hypothesis, Proc. Nat. Acad. Sci., USA 18 (1932),
70–82. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1076162/.
[7] C. E. Silva, Invitation to ergodic theory, Student Mathematical Library, vol. 42, American
Mathematical Society, Providence, RI, 2008. MR2371216
1932
The 3x + 1 Problem
Introduction
The Collatz function T : N → N is defined by
$$T(x) = \begin{cases} x/2 & \text{if } x \text{ is even}, \\ 3x + 1 & \text{if } x \text{ is odd}. \end{cases}$$
Now pick a seed, a natural number n, and consider the corresponding Collatz sequence
$n, T(n), T^2(n), \ldots$, in which $T^k(n)$ denotes the k-fold iterate $T(T(\cdots(T(n))))$.
This is also called the orbit of n under T . For example, n = 21 yields the Collatz
sequence
21, 64, 32, 16, 8, 4, 2, 1, 4, 2, 1, 4, 2, 1, ...
and n = 24 provides
24, 12, 6, 3, 10, 5, 16, 8, 4, 2, 1, 4, 2, 1, ....
Both sequences eventually settle down to the repeating pattern 4, 2, 1, 4, 2, 1, . . ., a

periodic orbit of period three.
It appears that for every initial seed n, the Collatz sequence eventually reaches
the number 1; that is, every Collatz sequence ends with 4, 2, 1, 4, 2, 1, . . .. Proving
this is the famed 3x + 1 problem (or the 3x + 1 conjecture), often credited to Lothar
Collatz (1910–1990). It goes by an astounding number of other names as well:
Ulam’s conjecture, Kakutani’s problem, the Thwaites conjecture, and the Syracuse
problem. We are not going to debate the origins of this problem and are content
to call it the 3x + 1 problem.
One way to visualize the 3x + 1 problem is with a directed graph; see Figure
1. Each natural number is a vertex in the Collatz graph and there is an arrow
from j to k whenever T (j) = k. The 3x + 1 conjecture asserts that no matter
which vertex you start on, following the arrows in the Collatz graph always leads
to 4, 2, 1, 4, 2, 1, . . ..
Some seeds take a long time to reach 1. For example, n = 27 requires 111
iterations. Its Collatz sequence climbs all the way up to 9,232 before coming back
down (see the notes for 1929 for an example of another sequence that demonstrates
this sort of behavior). This highlights one of our main obstacles: there is no simple
way to predict how high the Collatz sequence for a given seed reaches. As of 2015,
the 3x + 1 conjecture has been verified for all seeds less than $2^{60}$. Although this is
an overwhelming amount of numerical evidence, it is not a proof (see the notes for
1930 for examples of misleading computations).
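The examples above are easy to reproduce. Here is a minimal sketch; the function names are ours, and the loop terminates only for seeds whose orbit reaches 1, which is precisely what the conjecture asserts.

```python
def collatz_step(x: int) -> int:
    """One application of T: halve even numbers, send odd x to 3x + 1."""
    return x // 2 if x % 2 == 0 else 3 * x + 1


def collatz_orbit(n: int) -> list[int]:
    """Orbit n, T(n), T^2(n), ... stopped at the first occurrence of 1."""
    orbit = [n]
    while orbit[-1] != 1:
        orbit.append(collatz_step(orbit[-1]))
    return orbit


if __name__ == "__main__":
    orbit = collatz_orbit(27)
    print("iterations:", len(orbit) - 1, "peak:", max(orbit))
```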
Figure 1. The map T : N → N induces a directed graph on the natural numbers. A portion of this graph is represented here. The 3x + 1 problem suggests that any starting point eventually leads to the cycle 1, 4, 2, 1, 4, 2, . . ..
One can prove that the distribution of digits converges, in an appropriate sense,
to Benford’s law on digit bias (see the 1938 entry and [4, 7]). That is, if you take
a large starting seed and look at all the iterates until it hits the cycle 4, 2, 1, then
with probability tending to 1 the digit distribution converges to Benford’s law.
One can make this precise: given any tolerance, the number of starting seeds in an
interval [1, X] that are more than this tolerance away from Benford tends to zero
as X → ∞.
How can one disprove the 3x + 1 conjecture? There are two ways in which
the conjecture could be false. There might be a seed whose Collatz sequence is
unbounded. Or there might be a periodic orbit other than 4, 2, 1 (it is known that
there are no other periodic orbits of length 100,000,000 or less [3]).
Some have described the 3x + 1 problem as a Soviet conspiracy to slow down
American mathematics since so many people tried working on it, tempted by its
apparent simplicity. Paul Erdős said that mathematics is not yet ready to address
questions such as the 3x + 1 problem.

Proposed by Jeffrey Lagarias, University of Michigan.
Here we consider the original function G : Z → Z, defined by
G(3n) = 2n, G(3n + 1) = 4n + 1, and G(3n + 2) = 4n + 3,
that Collatz wrote down on July 1, 1932. It is a permutation of Z and its inverse
is given by
$G^{-1}(2n) = 3n$, $G^{-1}(4n + 1) = 3n + 1$, and $G^{-1}(4n + 3) = 3n + 2$.
One can show that G maps N onto N, so it induces a permutation of N too. One
finds that G(1) = 1 is a fixed point, that G(2) = 3 and G(3) = 2 form a periodic
orbit of period 2, and that
G(4) = 5, G(5) = 7, G(7) = 9, G(9) = 6, and G(6) = 4
form a periodic orbit of period 5.
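The claims about G can be checked mechanically. The sketch below is an illustration; the names `G`, `G_inv`, and `cycle` are ours, and Python's floor-based `divmod` makes the same formulas work for negative integers.

```python
def G(m: int) -> int:
    """Collatz's original map: 3n -> 2n, 3n+1 -> 4n+1, 3n+2 -> 4n+3."""
    n, r = divmod(m, 3)
    return (2 * n, 4 * n + 1, 4 * n + 3)[r]


def G_inv(m: int) -> int:
    """Inverse map: 2n -> 3n, 4n+1 -> 3n+1, 4n+3 -> 3n+2."""
    if m % 2 == 0:
        return 3 * (m // 2)
    n, r = divmod(m, 4)
    return 3 * n + 1 if r == 1 else 3 * n + 2


def cycle(start: int, max_len: int = 1000):
    """Return the periodic orbit through `start`, or None if none is found
    within max_len steps (which proves nothing about longer orbits)."""
    orbit = [start]
    x = G(start)
    while x != start and len(orbit) < max_len:
        orbit.append(x)
        x = G(x)
    return orbit if x == start else None


if __name__ == "__main__":
    print(cycle(1), cycle(2), cycle(4))
```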
(a) What happens for n = 8? Computation indicates that the forward orbit
$\{G^k(n) : k \ge 0\}$ of n = 8 includes numbers larger than $10^{400}$. But is the orbit
infinite? This question is the original Collatz problem and it has been proposed
independently several times, starting with Murray S. Klamkin (1921–2004) in
1963 [2]. It too is unsolved and could be as hopeless as the 3x + 1 problem.
(b) For N = 1, 2, . . ., let
$$S_N = \{n \in [1, N] : G^k(8) = n \text{ for some } k \in \mathbb{Z}\}.$$
It is conjectured that
$$\lim_{N\to\infty} \frac{|S_N|}{N} = 0. \qquad (1932.1)$$
Probabilistic models suggest that $|S_N| = O(\log N)$ as N → ∞ and computer
experiments support this. So there seems to be “room to spare” in trying to
establish (1932.1). Nevertheless, this problem seems difficult. The reader is
warned.
(c) Consider the full forward and backward orbit of n = 8:
$$S_\infty = \{n \in \mathbb{N} : G^k(8) = n \text{ for some } k \in \mathbb{Z}\}.$$
Disprove that there are only finitely many natural numbers that are not in $S_\infty$.
This assertion sounds simple to resolve and it is much weaker than (1932.1).
Nevertheless, it is an open problem and may be as intractable as the 3x + 1
problem.
1932: Comments
A heuristic approach. When stuck on a difficult conjecture, one can try to
give heuristic arguments for or against its validity. To simplify our model, we omit
the troublesome +1 in the definition of the Collatz function. Since half of the even
numbers are divisible by 2 and not by 4, and a fourth are divisible by 4 and not by 8,
and so on, we consider the functions $H_2(x) = 3x/2$, $H_4(x) = 3x/4$, $H_8(x) = 3x/8$,
and so forth. Our heuristic approximation to the Collatz function is denoted H;
it is obtained by applying $H_{2^k}$ with probability $1/2^k$ for k = 1, 2, . . .. The hope is
that this related problem is easier to analyze and that its behavior will shed light
on the original problem.
It is more appropriate to consider the expected value of log H(x) since there
are products involved. According to our model,1
$$\begin{aligned}
E[\log H(x)] &= \sum_{k=1}^{\infty} \frac{1}{2^k} \log H_{2^k}(x) = \sum_{k=1}^{\infty} \frac{\log(3x/2^k)}{2^k} \\
&= \log x + \log 3 - \sum_{k=1}^{\infty} \frac{k \log 2}{2^k} = \log x + \log(3/4) \\
&< \log x.
\end{aligned}$$
Consequently, iterating H once decreases the size of the expected outcome. Re-
peated iterations should continue to decrease. Not only does such an argument
lead to heuristic support for the 3x + 1 conjecture, it also suggests roughly how
many steps one needs to iterate until we reach 1. Since each iteration tends to
replace x with $\tfrac{3}{4}x$, the expected number of iterations m should satisfy $(3/4)^m x = 1$;
that is,
$$m \approx \frac{\log x}{\log(4/3)}.$$
Numerical data strongly supports this rate; see [5, 6] for more on these ideas.
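The expectation computed above can be confirmed numerically by truncating the series. The sketch below is only a sanity check of the arithmetic in the random model; the function names and the truncation at 200 terms are our choices.

```python
from math import log


def expected_log_change(terms: int = 200) -> float:
    """Truncation of E[log H(x)] - log x = sum over k of 2^(-k) * log(3 / 2^k)."""
    return sum((log(3) - k * log(2)) / 2**k for k in range(1, terms + 1))


def predicted_steps(x: float) -> float:
    """Heuristic number of iterations m solving (3/4)^m * x = 1."""
    return log(x) / log(4 / 3)


if __name__ == "__main__":
    print(expected_log_change(), log(3 / 4))
    print(predicted_steps(10**6))
```

Note that one iteration of H models an odd step together with all of the halvings that follow it, so the prediction is not directly comparable with a raw count of applications of T.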
The idea of replacing a deterministic problem with a random one is applicable in
many other settings. One can do this with prime numbers to build intuition about
a host of problems. However, one must be careful. Just as the 3x + 1 problem has
some structure that is lost in the conversion to a random model, the actual sequence
of primes has additional structure not present in random analogues. While random
models are useful, they sometimes give the wrong answer in certain regimes.
Lychrel numbers. We end with an example that leads to another simply

stated open problem. Consider the function L : N → N defined by L(n) = n+R(n),
in which R(n) is the number formed by reversing the decimal representation of n.
For instance, L(89) = 89 + 98 = 187,
$L^2(89) = L(187) = 187 + 781 = 968$,
and so forth. This leads to the following sequence:
89, 187, 968, 1837, 9218, 17347, 91718, 173437, 907808, 1716517,
8872688, 17735476, 85189247, 159487405, 664272356, 1317544822,
3602001953, 7193004016, 13297007933, 47267087164, 93445163438,
176881317877, 955594506548, 1801200002107, 8813200023188. . . .
The number $L^{24}(89)$ is the palindrome 8,813,200,023,188; it is the same read forward
or backward. Most natural numbers eventually appear to reach a palindrome after
repeated applications of L.
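The reverse-and-add process takes only a few lines to implement. The following sketch uses string reversal for R(n); the function names and the step cap are ours.

```python
def reverse_and_add(n: int) -> int:
    """L(n) = n + R(n), where R(n) reverses the decimal digits of n."""
    return n + int(str(n)[::-1])


def is_palindrome(n: int) -> bool:
    s = str(n)
    return s == s[::-1]


def steps_to_palindrome(n: int, max_steps: int = 1000):
    """Return (number of applications of L, palindrome reached), or None if
    no palindrome appears within max_steps applications."""
    for step in range(1, max_steps + 1):
        n = reverse_and_add(n)
        if is_palindrome(n):
            return step, n
    return None


if __name__ == "__main__":
    print(steps_to_palindrome(89))
```

A return value of `None` only says that the cap was reached; it proves nothing about whether a palindrome ever appears.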
A Lychrel number is a natural number for which this process never yields a
palindrome. Brute force computations show that no n ≤ 195 is a Lychrel number,
but no one is sure about 196 (this leads to an alternative name for this iteration:
the 196-algorithm). Nobody knows whether Lychrel numbers exist, but 196 sure
looks like a strong candidate:

196, 887, 1675, 7436, 13783, 52514, 94039, 187088, 1067869,
10755470, 18211171, 35322452, 60744805, 111589511, 227574622,
454050344, 897100798, 1794102596, 8746117567, 16403234045,
70446464506, 130992928913, 450822227944, 900544455998. . . .

1 To sum $\sum_{k=1}^{\infty} k/2^k$, differentiate the identity $\sum_{n=0}^{\infty} z^n = (1 - z)^{-1}$, valid for |z| < 1, multiply
the result by z, and obtain $\sum_{n=1}^{\infty} n z^n = z/(1 - z)^2$. Then substitute z = 1/2.
Over a billion iterates have been computed without reaching a palindrome. Exten-
sive computation suggests that the following integers are Lychrel numbers [9]:
196, 295, 394, 493, 592, 689, 691, 788, 790, 879, 887, 978, 986,
1495, 1497, 1585, 1587, 1675, 1677, 1765, 1767, 1855, 1857, 1945,
1947, 1997, 2494, 2496, 2584, 2586, 2674, 2676, 2764, 2766, 2854,
2856, 2944, 2946, 2996, 3493, 3495, 3583, 3585, 3673, 3675.
Curiously, Lychrel numbers are known to exist in other bases. For example, in
binary the number 10110 (which is 22 in decimal) is a Lychrel number. Can you
prove it?
Bibliography
[1] R. K. Guy, Don’t try to solve these problems!, Amer. Math. Monthly 90 (1983), 35–41. http://
www.jstor.org/discover/10.2307/2975688?uid=3739256&uid=2&uid=4&sid=21102550539183.
[2] M. S. Klamkin, Problem 63-13∗ , SIAM Review 5 (1963), 275–276.
[3] L. Halbeisen and N. Hungerbühler, Optimal bounds for the length of rational Collatz cycles,
Acta Arith. 78 (1997), no. 3, 227–239, DOI 10.4064/aa-78-3-227-239. MR1432018
[4] A. V. Kontorovich and S. J. Miller, Benford’s law, values of L-functions and the 3x + 1
problem, Acta Arith. 120 (2005), no. 3, 269–297, DOI 10.4064/aa120-3-4. http://arxiv.org/
pdf/math/0412003v2. MR2188844
[5] J. C. Lagarias, The 3x + 1 problem and its generalizations, Amer. Math. Monthly 92 (1985),
no. 1, 3–23, DOI 10.2307/2322189. MR777565
[6] J. C. Lagarias (ed.), The ultimate challenge: the 3x + 1 problem, American Mathematical
Society, Providence, RI, 2010. MR2663745
[7] J. C. Lagarias and K. Soundararajan, Benford’s law for the 3x + 1 function, J. London Math.
Soc. (2) 74 (2006), no. 2, 289–303, DOI 10.1112/S0024610706023131. http://arxiv.org/pdf/
math/0509175.pdf. MR2269630
[8] H. L. Montgomery and K. Soundararajan, Primes in short intervals, Comm. Math. Phys. 252
(2004), no. 1-3, 589–617, DOI 10.1007/s00220-004-1222-4. MR2104891
[9] The On-Line Encyclopedia of Integer Sequences, A023108 (Positive integers which appar-
ently never result in a palindrome under repeated applications of the function f (x) =
x + (x with digits reversed), http://oeis.org/A023108.
1933
Skewes’s Number
Introduction
For a few decades, Skewes’s number held the record as the largest finite number
to meaningfully appear in a mathematical research paper. Let π(x) denote the
number of primes at most x and let
$$\operatorname{Li}(x) = \int_2^x \frac{dt}{\log t} \qquad (1933.1)$$
denote the offset logarithmic integral function. One version of the prime number
theorem (see the 1913 and 1919 entries) says that
$$\lim_{x\to\infty} \frac{\pi(x)}{\operatorname{Li}(x)} = 1.$$
This is illustrated in Figure 1. The logarithmic integral gives a better approximation
to π(x) than x/ log x, which is used in other formulations of the prime number
theorem; see Table 1.
Figure 1. Graphs of Li(x) versus π(x) on various scales: (a) x ≤ 100, (b) x ≤ 1,000, (c) x ≤ 10,000, (d) x ≤ 100,000.
Table 1. The logarithmic integral Li(x) is a better approximation to the prime-counting function π(x) than is x/ log x. The entries in the table have been rounded to the nearest integer.

x             π(x)        Li(x)       x/log x
1,000         168         177         145
10,000        1,229       1,245       1,086
100,000       9,592       9,629       8,686
1,000,000     78,498      78,627      72,382
10,000,000    664,579     664,917     620,421
100,000,000   5,761,455   5,762,208   5,428,681
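The first rows of Table 1 can be reproduced with a short script. This is a sketch under stated assumptions: π(x) comes from a basic sieve of Eratosthenes, and Li(x) is approximated with composite Simpson's rule, which is our choice of quadrature and not necessarily how the table was produced.

```python
from math import isqrt, log


def prime_count(x: int) -> int:
    """pi(x): number of primes at most x, via the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, isqrt(x) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytes(len(range(p * p, x + 1, p)))
    return sum(sieve)


def Li(x: float, intervals: int = 100_000) -> float:
    """Integral of 1/log t from 2 to x by composite Simpson's rule (intervals must be even)."""
    h = (x - 2) / intervals
    total = 1 / log(2) + 1 / log(x)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) / log(2 + i * h)
    return total * h / 3


if __name__ == "__main__":
    for x in (1_000, 10_000, 100_000):
        print(x, prime_count(x), round(Li(x)), round(x / log(x)))
```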
For all practically computable values of x, the function li(x) = Li(x) + li(2), in which li(2) ≈ 1.045 is the principal value of the integral of 1/ log t over [0, 2],
satisfies li(x) > π(x). Based upon overwhelming numerical evidence, it was con-
jectured that this held for all x. In 1914, John Edensor Littlewood (1885–1977)
showed that li(x) − π(x) changes sign infinitely many times. Littlewood asked one
of his students, a South African named Stanley Skewes (1899–1988), to compute
how high one must go to find the first integer $s_0$ for which $\pi(s_0) > \operatorname{li}(s_0)$. Assuming
the truth of the Riemann hypothesis,1 Skewes proved in 1933 that
$$s_0 < e^{e^{e^{79}}}.$$
In 1955, he showed that if the Riemann hypothesis is false, then
$$s_0 < e^{e^{e^{e^{7.705}}}}.$$
Both of these extraordinary numbers are sometimes referred to as Skewes’s number .
While much progress has been made, the best upper bounds on $s_0$ are still on the
order of $e^{728}$ (or about $10^{316}$). It seems hopeless to expect the first sign change to
be found by computer.
Since Skewes’s second bound is larger than the first, we can conclude that
li(x)−π(x) changes sign somewhere before exp(exp(exp(exp(7.705)))). Why? There
are two cases. Either the Riemann hypothesis is true or it is false, and Skewes
covered both cases! Voilà! For another striking example of this sort of “magical”
reasoning, see the 1935 entry.
Are we overlooking a third possibility? Could the Riemann hypothesis (see the
1942 and 1945 entries) be undecidable, say in ZFC (Zermelo–Fraenkel set theory
with the axiom of choice)? If it is false, then it must be provably false in ZFC.
Why? Because it is known to be equivalent, under ZFC, to various elementary
statements about natural numbers. Let
$$H_n = 1 + \frac{1}{2} + \cdots + \frac{1}{n}$$
denote the nth harmonic number. In 2002, Lagarias showed that the statement
$$\text{“for each } n \ge 1,\quad \sum_{d \mid n} d \le H_n + e^{H_n} \log H_n\text{”}$$
1 The Riemann hypothesis, one of the seven Clay Millennium Problems (see the comments for
the 2000 entry), is one of the most important open problems in mathematics. Its veracity would
have numerous applications throughout number theory and cryptography. It’s going to take a
while to build up to! See below and the entries for 1942, 1945, 1948, 1967, and 1987.
is equivalent to the Riemann hypothesis [3]. Thus, if the Riemann hypothesis (RH)
is false, there is a natural number n for which the preceding inequality is violated
and hence there is a finite computation that disproves the Riemann hypothesis. On
the other hand, if RH is undecidable in ZFC, then it is true (but just not provable
in ZFC; see the 1929 entry on Gödel’s work). Why? If the RH were undecidable in
ZFC, then no natural number n violating Lagarias’s condition exists (the existence
of such an n would lead to a quick proof of the falsehood of the Riemann hypothesis).
Thus, if the RH is undecidable in ZFC, then Lagarias’s condition holds, so the
RH is true (just not provable). See the 1924, 1929, and 1963 entries for more
information on axiom systems, and the 1987 entry for connections between the
Riemann hypothesis and counting primes.
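Lagarias's criterion turns the Riemann hypothesis into a statement that can be tested one integer at a time. The sketch below checks the inequality over a small range using floating-point arithmetic; the function name is ours, and a check of this kind is evidence only (recall the cautionary tales in the notes for 1930).

```python
from math import exp, log


def first_lagarias_violation(limit: int):
    """Return the smallest n <= limit whose divisor sum exceeds
    H_n + exp(H_n) * log(H_n), or None if there is no such n."""
    # sigma[m] accumulates the sum of the divisors of m.
    sigma = [0] * (limit + 1)
    for d in range(1, limit + 1):
        for m in range(d, limit + 1, d):
            sigma[m] += d
    H = 0.0
    for n in range(1, limit + 1):
        H += 1 / n
        if sigma[n] > H + exp(H) * log(H):
            return n
    return None


if __name__ == "__main__":
    print(first_lagarias_violation(10_000))
```

Finding any violation would amount to the finite computation, described above, that disproves the Riemann hypothesis.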

Let
$$e^{\uparrow n}(x) = \exp(\exp(\cdots \exp(\exp(x)))),$$
in which there are n iterated exponentials. Thus, Skewes’s 1955 result is the bound
$s_0 \le e^{\uparrow 4}(7.705)$. If we were to write this as $10^y$, what would y equal? More
generally, if
$$e^{\uparrow n}(x) = 10^{f(x;n)},$$
how fast does f grow with n? With x? The functions $e^{\uparrow n}(x)$ are also known as
iterated towers. For more rapidly growing quantities, see the 1926 and 1992 entries.
1933: Comments
A proof technique. Skewes’s arguments use a powerful proof technique:
break the problem into an exhaustive set of cases, where in each case you have
additional facts at your disposal. For another example of this approach, see the
1935 entry.
Term-by-term multiplication and Mertens’s theorem. Here are some
facts about infinite series that we will need shortly. Suppose that $\sum_{n=0}^{\infty} a_n$ and
$\sum_{n=0}^{\infty} b_n$ are two convergent series of complex numbers. Naively multiplying the
two series term-by-term suggests that
$$\begin{aligned}
\Bigl(\sum_{i=0}^{\infty} a_i\Bigr)\Bigl(\sum_{j=0}^{\infty} b_j\Bigr) &= (a_0 + a_1 + a_2 + \cdots)(b_0 + b_1 + b_2 + \cdots) \\
&= a_0 b_0 + (a_0 b_1 + a_1 b_0) + (a_0 b_2 + a_1 b_1 + a_2 b_0) + \cdots \\
&= \sum_{n=0}^{\infty} c_n,
\end{aligned}$$
in which $c_n = \sum_{k=0}^{n} a_k b_{n-k}$. The series $\sum_{n=0}^{\infty} c_n$ is the Cauchy product of $\sum_{n=0}^{\infty} a_n$
and $\sum_{n=0}^{\infty} b_n$. This term-by-term multiplication of series is permissible if both of
the series involved are absolutely convergent.2 This is used implicitly in calculus,
complex variables, and differential equations whenever power series methods are
involved.
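If both series are conditionally convergent (convergent but not absolutely convergent),
then their Cauchy product series can diverge. An example is furnished by
$a_n = b_n = \frac{(-1)^n}{\sqrt{n+1}}$. The alternating series test confirms that $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$
converge. However,
$$\begin{aligned}
|c_n| &= \Bigl|\sum_{k=0}^{n} a_k b_{n-k}\Bigr| = \sum_{k=0}^{n} \frac{1}{\sqrt{(k+1)(n-k+1)}} \\
&\ge \sum_{k=0}^{n} \frac{1}{\sqrt{(\frac{n}{2}+1)^2}} = \sum_{k=0}^{n} \frac{1}{\frac{n}{2}+1} = \sum_{k=0}^{n} \frac{2}{n+2} \\
&= (n+1)\,\frac{2}{n+2} = \frac{2n+2}{n+2}
\end{aligned}$$
does not tend to zero, so $\sum_{n=0}^{\infty} c_n$ diverges.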
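The lower bound on the terms of this Cauchy product can be observed numerically. The sketch below is an illustration; the helper names `a` and `cauchy_term` are ours, and a small tolerance absorbs floating-point rounding.

```python
from math import sqrt


def a(n: int) -> float:
    """a_n = b_n = (-1)^n / sqrt(n + 1)."""
    return (-1) ** n / sqrt(n + 1)


def cauchy_term(n: int) -> float:
    """c_n = sum of a_k * a_(n-k) for k = 0, ..., n."""
    return sum(a(k) * a(n - k) for k in range(n + 1))


if __name__ == "__main__":
    # |c_n| stays at or above (2n + 2)/(n + 2) >= 1, so c_n cannot tend to zero.
    for n in (0, 1, 10, 100):
        print(n, abs(cauchy_term(n)), (2 * n + 2) / (n + 2))
```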
Mertens’s theorem, due to Franz Mertens (1840–1927), ensures that if at least
one of the two series involved is absolutely convergent, then term-by-term
multiplication is permissible. To be more specific, if $\sum_{n=0}^{\infty} a_n = A$ and $\sum_{n=0}^{\infty} b_n = B$
are convergent series of complex numbers, at least one of which is absolutely
convergent, then their Cauchy product series $\sum_{n=0}^{\infty} c_n$ converges to AB. Proving
Mertens’s theorem is a good exercise in analysis. Here is a sketch. Let $A_n$, $B_n$,
and $C_n$ be the nth partial sums of the three series involved and consider the identity
$C_n = A_n B + \sum_{i=0}^{n} (B_i - B)a_{n-i}$. Since $A_n \to A$, the key is to show that
$\sum_{i=0}^{n} (B_i - B)a_{n-i} \to 0$ as n → ∞.
The Riemann zeta function and the Euler product formula. In homage
to Riemann, who wrote s = σ + it to denote his complex variable, we follow him
and use the letter s below to refer to a complex number. The Riemann hypothesis
concerns the location of the complex zeros of the Riemann zeta function
\[
\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}, \tag{1933.2}
\]
which is defined initially for Re s > 1. It might at first appear strange to call
(1933.2) by such a fancy name. Indeed, (1933.2) is the familiar p-series from
calculus. However, the Riemann zeta function is the critical function that links
analysis and number theory. In particular, the deepest properties of the prime
numbers are encoded in the Riemann zeta function.
The connection between the innocuous looking Riemann zeta function and the
prime numbers is furnished by the Euler product formula. If Re s > 1, then
\[
\sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p\ \mathrm{prime}} \Bigl(1 - \frac{1}{p^s}\Bigr)^{-1}. \tag{1933.3}
\]
2 A series $\sum_{n=0}^{\infty} a_n$ is absolutely convergent if $\sum_{n=0}^{\infty} |a_n|$ converges. Absolute convergence
implies convergence, but the converse is not true. The alternating harmonic series
$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}$ converges to log 2, but the harmonic series $\sum_{n=1}^{\infty} \frac{1}{n}$ diverges.
Since quite a few of our entries (1928, 1942, 1945, 1967, and 1987) involve the
Riemann zeta function, we can take the liberty to develop the topic slowly and
deliberately.
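If p is a fixed prime number and Re s > 1, then the series
\[
\sum_{n=0}^{\infty} \frac{1}{(p^n)^s} = \sum_{n=0}^{\infty} \frac{1}{p^{ns}} = \sum_{n=0}^{\infty} \Bigl(\frac{1}{p^s}\Bigr)^{n} = \Bigl(1 - \frac{1}{p^s}\Bigr)^{-1}
\]
converges absolutely since $|1/p^s| < 1$. By Mertens’s theorem,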
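\[
\Bigl(1 - \frac{1}{2^s}\Bigr)^{-1} \Bigl(1 - \frac{1}{3^s}\Bigr)^{-1}
  = \Bigl(1 + \frac{1}{2^s} + \frac{1}{4^s} + \frac{1}{8^s} + \cdots\Bigr) \Bigl(1 + \frac{1}{3^s} + \frac{1}{9^s} + \frac{1}{27^s} + \cdots\Bigr)
  = 1 + \frac{1}{2^s} + \frac{1}{3^s} + \frac{1}{4^s} + \frac{1}{6^s} + \frac{1}{8^s} + \frac{1}{9^s} + \frac{1}{12^s} + \cdots,
\]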
in which the last sum includes terms corresponding exactly to those numbers whose
prime factorizations involve only 2 or 3. Since Re s > 1, the preceding series is
absolutely convergent. Similarly,
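\[
\Bigl(1 - \frac{1}{2^s}\Bigr)^{-1} \Bigl(1 - \frac{1}{3^s}\Bigr)^{-1} \Bigl(1 - \frac{1}{5^s}\Bigr)^{-1}
  = 1 + \frac{1}{2^s} + \frac{1}{3^s} + \frac{1}{4^s} + \frac{1}{5^s} + \frac{1}{6^s} + \frac{1}{8^s} + \frac{1}{9^s} + \frac{1}{10^s} + \frac{1}{12^s} + \frac{1}{15^s} + \cdots,
\]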
in which the sum involves those numbers whose only prime factors are 2, 3, or 5,
and so forth. Since the tail end of a convergent series tends to zero,

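\[
\Bigl| \sum_{n=1}^{\infty} \frac{1}{n^s} - \prod_{\substack{p\ \mathrm{prime} \\ p \le N}} \Bigl(1 - \frac{1}{p^s}\Bigr)^{-1} \Bigr|
  \le \sum_{n=N}^{\infty} \Bigl|\frac{1}{n^s}\Bigr| \to 0
\]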
as N → ∞. This establishes the Euler product formula (1933.3).
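The Euler product formula can be checked numerically. The sketch below is illustrative only (the helper names are ours): it compares the truncated product over primes p ≤ N with the known value ζ(2) = π²/6, taking s = 2 as a convenient test case.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes <= n (for n >= 1)."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(range(i * i, n + 1, i))
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def euler_product(s, N):
    """Truncated Euler product: product over primes p <= N of (1 - p^(-s))^(-1)."""
    result = 1.0
    for p in primes_up_to(N):
        result *= 1.0 / (1.0 - p ** (-s))
    return result

if __name__ == "__main__":
    for N in (10, 100, 1000, 10000):
        print(N, euler_product(2, N), math.pi ** 2 / 6)
```

The truncated product always lies below ζ(2), and the gap is at most the tail $\sum_{n \ge N} 1/n^2$, in line with the inequality above.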

We get Euclid’s theorem on the infinitude of the primes as a corollary. If
there were only finitely many primes, then the right-hand side of (1933.3) would
converge to a finite limit as s → 1+ . However, the left-hand side of (1933.3) diverges
as s → 1+ since its terms tend to those of the harmonic series.
Bibliography
[1] T. M. Apostol, Introduction to analytic number theory, Undergraduate Texts in Mathematics,
Springer-Verlag, New York-Heidelberg, 1976. MR0434929
[2] H. Davenport, Multiplicative number theory, 3rd ed., revised and with a preface by Hugh
L. Montgomery, Graduate Texts in Mathematics, vol. 74, Springer-Verlag, New York, 2000.
MR1790423
[3] J. C. Lagarias, An elementary problem equivalent to the Riemann hypothesis, Amer. Math.
Monthly 109 (2002), no. 6, 534–543, DOI 10.2307/2695443. MR1908008
[5] S. Skewes, On the difference π(x) − li(x) (I), J. London Math. Soc. 8 (1933), no. 4, 277–283,
DOI 10.1112/jlms/s1-8.4.277. MR1573970
[6] S. Skewes, On the difference π(x) − li x. II, Proc. London Math. Soc. (3) 5 (1955), 48–70, DOI
10.1112/plms/s3-5.1.48. MR0067145
1934
Khinchin’s Constant
Introduction
Each irrational real number x has a unique infinite continued fraction expansion
\[
x = a_0(x) + \cfrac{1}{a_1(x) + \cfrac{1}{a_2(x) + \cfrac{1}{a_3(x) + \cdots}}},
\]
in which the ai (x) are the continued fraction digits of x and a1 (x), a2 (x), . . . are
positive integers (see the 1931 and 1972 entries or [6] and the references therein for
more details). For instance,
\[
\pi = 3 + \cfrac{1}{7 + \cfrac{1}{15 + \cfrac{1}{1 + \cfrac{1}{292 + \cdots}}}},
\]
which we write as
π = [3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1,
84, 2, 1, 1, 15, 3, 13, 1, 4, 2, 6, 6, 99, 1, 2, 2, 6, 3, 5, 1, 1, 6, 8, 1,
7, 1, 2, 3, 7, 1, 2, 1, 1, 12, 1, 1, 1, 3, 1, 1, 8, 1, 1, 2, 1, 6, 1, 1, 5,
2, 2, 3, 1, 2, 4, 4, 16, 1, 161, 45, 1, 22, 1, 2, 2, 1, 4, 1, 2, 24,. . . ].
Truncating this expansion after a few steps provides excellent rational approxima-
tions to π:
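\[
3 + \frac{1}{7} = \frac{22}{7} = 3.142857\ldots,
\]
\[
3 + \cfrac{1}{7 + \cfrac{1}{15}} = \frac{333}{106} = 3.141509\ldots, \quad\text{and}
\]
\[
3 + \cfrac{1}{7 + \cfrac{1}{15 + \cfrac{1}{1}}} = \frac{355}{113} = 3.141592\ldots.
\]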
These approximations are accurate to 2, 4, and 6 decimal places, respectively.
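The convergents can be generated with the standard recurrences $p_k = a_k p_{k-1} + p_{k-2}$ and $q_k = a_k q_{k-1} + q_{k-2}$. The following sketch (the function name is ours) applies them to the first few continued fraction digits of π listed above.

```python
from fractions import Fraction

def convergents(digits):
    """Return the convergents p_k/q_k of the continued fraction [a0; a1, a2, ...]."""
    p_prev, p = 1, digits[0]
    q_prev, q = 0, 1
    result = [Fraction(p, q)]
    for a in digits[1:]:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        result.append(Fraction(p, q))
    return result

if __name__ == "__main__":
    for c in convergents([3, 7, 15, 1, 292]):
        print(c, float(c))
```

Exact rational arithmetic via `Fraction` avoids any rounding in the numerators and denominators.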

Continued fractions provide an alternative to base-dependent expansions, such
as binary or decimal expansions. Since they are base-independent, there is a pos-
sibility that the continued fraction digits might have some deep meaning. In 1934,
Aleksandr Khinchin proved that, for almost every real number x, the geometric
mean of the first n digits in the continued fraction expansion of x converges to the
same constant K as n → ∞:

\[
\lim_{n\to\infty} \sqrt[n]{a_1(x) a_2(x) \cdots a_n(x)} = K. \tag{1934.1}
\]
That means that, for every ε > 0, the set of real numbers x for which (1934.1) fails
can be covered by countably many open intervals of total length < ε. The constant
K is called Khinchin’s constant; it is given by
\[
K = \prod_{r=1}^{\infty} \Bigl(1 + \frac{1}{r(r+2)}\Bigr)^{\log_2 r} = 2.6854520010653064453\ldots.
\]
It is not known whether K is rational, algebraic irrational, or transcendental. Besides
examples contrived for the purpose, we do not know a “natural” example of
an x for which the geometric mean of the ai (x) converges to K. However, numerical
experiments suggest that π, γ, and Khinchin’s constant itself are likely candidates
(the geometric mean for e diverges). Since you are dying to know, the continued
fraction expansion of Khinchin’s constant is
K = [2; 1, 2, 5, 1, 1, 2, 1, 1, 3, 10, 2, 1, 3, 2, 24, 1, 3, 2, 3, 1, 1,
1, 90, 2, 1, 12, 1, 1, 1, 1, 5, 2, 6, 1, 6, 3, 1, 1, 2, 5, 2, 1, 2, 1, 1, 4,
1, 2, 2, 3, 2, 1, 1, 4, 1, 1, 2, 5, 2, 1, 1, 3, 29, 8, 3, 1, 4, 3, 1, 10,
50, 1, 2, 2, 7, 6, 2, 2, 16, 4, 4, 2, 2, 3, 1, 1, 7, 1, 5, 1, 2, 1, 5, 3, 1,
1, 1, 2, 2, 2, 1, 13, 11, 770, 1, 4, 2, 1, 14, 1, 14, 2, 1, 6, 1, 1, 1, 9,
2, 53, 1, 2, 2, 1, 9, 5, 6, 2, 1, 2, 1, 5, 4, 1, 234, 7, 1, 1, 4, 3, 19,
3, 1, 10, 18, 8, 24, 1, 12, 1, 1, 10, 3, 2, 1, 32, 112, 5, 1, 1, 3, 2, 5,
1, 2, 1, 3, 2, 1, 2, 1, 1, 2, 2, 4, 1, 6, 4, 1, 2, 1, 8, 2, 1, 4, 2, 1, 1,
11, 1, 1, 1, 5, 3, 4, 2, 6, 2, 1, 2, 1, 1, 19, 1, 38, 2, 1, 1, 4, 6, 2, 50,
2, 1, 1, 2, 1, 4, 1, 5, 1, 2, 8, 13, 1, 2, 1, 1, 9, 1, 6, 3, 6, 1, 4, 2, 1,
272, 1, 1, 1, 1, 4, 1, 21, 3, 1, 2, 87, 1, 8, 1, 2, 3, 2, 1, 1, 2, 3, 16,
1, 5, 3, 5, 1, 1, 1, 10, 11, 45, 2, 331, 2, 1, 2, 1, 4, 1, 2, 2, 1, 3, 1,
1, 3, 1, 2, 2, 1, 13, 1, 3, 3, 2, 4, 4, 1, 4, 40,1, 9, 1, 4, 1, 1, 1,. . . ].
Does x = K satisfy (1934.1)?
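The infinite product that defines K converges slowly, but a direct numerical evaluation already reproduces the first few decimal places. This is a rough sketch, not a high-precision method; it sums logarithms of the factors $(1 + 1/(r(r+2)))^{\log_2 r}$ up to a cutoff R that we choose arbitrarily.

```python
import math

def khinchin_partial(R):
    """Partial product for Khinchin's constant, using factors r = 1, ..., R."""
    log_K = sum(
        math.log2(r) * math.log1p(1.0 / (r * (r + 2)))
        for r in range(1, R + 1)
    )
    return math.exp(log_K)

if __name__ == "__main__":
    for R in (10**2, 10**3, 10**4, 10**5):
        print(R, khinchin_partial(R))
```

Every factor is at least 1, so the partial products increase toward K from below; the remaining error is roughly of size $(\log_2 R)/R$.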

Proposed by Jake Wellens, Caltech.
This problem explores some consequences of the conjectured transcendence
of K. Assume that K is transcendental and let x be a quadratic irrational (an
algebraic irrational of degree 2; see p. 30). Prove that for any such x, the geometric
mean $\sqrt[n]{a_1(x) a_2(x) \cdots a_n(x)}$ does not converge to K.
1934: Comments
Solution to the problem. A quadratic irrational has a continued fraction
expansion that is eventually periodic (try to prove it). Thus, the geometric mean
of its continued fraction digits converges to the ℓth root of a product of integers, in
which ℓ denotes the length of the period. Consequently, the limit of the geometric
means is either rational or algebraic irrational, and hence not transcendental.
This solves the proposed problem.
e and Khinchin’s constant. While we cannot give an example of a number
for which the geometric mean of its continued fraction digits converges to K, we
can give a transcendental number for which it diverges, namely e. Its continued
fraction expansion is
e = [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, 1, 10, 1, 1, 12, 1, 1, 14, 1,
1, 16, 1, 1, 18, 1, 1, 20, 1, 1, 22, 1, 1, 24, 1, 1, 26, 1, 1, 28, 1, 1,
30, 1, 1, 32, 1, 1, 34, 1, 1, 36, 1, 1, 38, 1, 1, 40, 1, 1, 42, 1, 1, 44,
1, 1, 46, 1, 1, 48, 1, 1, 50, 1, 1, 52, 1, 1, 54, 1, 1, 56, 1, 1, 58, 1,
1, 60, 1, 1, 62, 1, 1, 64, 1, 1, 66, 1, . . . ].
Since it will not affect the limiting behavior, let us change the first 2 to a 1, so that
our string of digits is 1, 1, 2, 1, 1, 4, 1, 1, 6, . . .. If we look at the geometric mean of
the first 3n digits we have
\[
\bigl(1^{2n}\, 2^{n}\, n!\bigr)^{1/3n} = 2^{1/3} (n!)^{1/3n}.
\]
Since Stirling’s formula$^1$ states that
\[
n! \approx \sqrt{2\pi n}\, \Bigl(\frac{n}{e}\Bigr)^{n},
\]
the geometric mean of the first 3n digits is comparable to $(2n/e)^{1/3}$, which diverges
to infinity.
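The divergence is slow but visible numerically. The sketch below (function names are ours) builds the modified digit string 1, 1, 2, 1, 1, 4, . . . , 1, 1, 2n and compares its geometric mean with the closed form $(2^n n!)^{1/3n}$ and with the Stirling estimate $(2n/e)^{1/3}$.

```python
import math

def modified_digits(n):
    """The first 3n digits 1, 1, 2, 1, 1, 4, ..., 1, 1, 2n of the modified string."""
    digits = []
    for k in range(1, n + 1):
        digits.extend([1, 1, 2 * k])
    return digits

def geometric_mean(digits):
    """Geometric mean, computed through logarithms to avoid overflow."""
    return math.exp(sum(math.log(d) for d in digits) / len(digits))

def closed_form(n):
    """(2^n n!)^(1/(3n)), using lgamma(n + 1) = log(n!)."""
    return math.exp((n * math.log(2) + math.lgamma(n + 1)) / (3 * n))

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        gm = geometric_mean(modified_digits(n))
        print(n, gm, closed_form(n), (2 * n / math.e) ** (1 / 3))
```

The three columns agree more and more closely in ratio while growing without bound, as the text predicts.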
Nonsimple continued fractions for π and e. Why does e have a “nice”
continued fraction expansion while π does not? Maybe we are looking at things
the wrong way. Instead of considering simple continued fraction expansions, in
which all the numerators are 1’s, the situation drastically changes if we allow them
to vary. One example, which restores balance between these two fundamental
constants, is Brouncker’s formula:
\[
\frac{4}{\pi} = 1 + \cfrac{1^2}{2 + \cfrac{3^2}{2 + \cfrac{5^2}{2 + \cfrac{7^2}{2 + \cdots}}}}.
\]
Of course, e has amazing expansions as well, such as
\[
e = 2 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{2}{3 + \cfrac{3}{4 + \cdots}}}};
\]
see the 1972 entry for a derivation of this formula. This is just the beginning of
the story; see [1] and the references therein for the Rogers–Ramanujan continued
fraction.
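Both nonsimple expansions can be evaluated from the bottom up after truncating them at a finite depth. The following sketch is only a numerical illustration; the truncation rule (starting the innermost denominator at 2 for Brouncker’s formula and at depth + 1 for e) is our own choice.

```python
import math

def brouncker(depth):
    """Truncation of 1 + 1^2/(2 + 3^2/(2 + 5^2/(2 + ...))), which tends to 4/pi."""
    tail = 2.0
    for k in range(depth, 0, -1):
        tail = 2.0 + (2 * k + 1) ** 2 / tail
    return 1.0 + 1.0 / tail

def e_continued_fraction(depth):
    """Truncation of 2 + 1/(1 + 1/(2 + 2/(3 + 3/(4 + ...)))), which tends to e."""
    tail = float(depth + 1)
    for k in range(depth, 0, -1):
        tail = k + k / tail
    return 2.0 + 1.0 / tail

if __name__ == "__main__":
    print(brouncker(5000), 4 / math.pi)
    print(e_continued_fraction(20), math.e)
```

The expansion for e converges extremely quickly, whereas Brouncker’s formula converges about as slowly as the Leibniz series for π/4.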
1 First note that n! = n(n − 1) · · · 2 · 1, so log(n!) = log n + log(n − 1) + · · · + log 2 + log 1.
We then approximate the sum with an integral and find $\log(n!) \approx \int_1^n \log x \, dx = n \log n - (n - 1)$,
with error on the order of half the sum of the first and last terms. Exponentiating yields a rough
approximation to n!. There are numerous elementary proofs of Stirling’s formula; see [2] and [5].
Bibliography
[1] B. C. Berndt, H. H. Chan, S.-S. Huang, S.-Y. Kang, J. Sohn, and S. H. Son, The
Rogers-Ramanujan continued fraction, Continued fractions and geometric function the-
ory (CONFUN) (Trondheim, 1997), J. Comput. Appl. Math. 105 (1999), no. 1-2, 9–
24, DOI 10.1016/S0377-0427(99)00033-3. http://www.sciencedirect.com/science/article/
pii/S0377042799000333. MR1690576
[2] C. L. Frenzen, A New Elementary Proof of Stirling’s Formula, Math. Mag. 68 (1995), no. 1,
55–58. https://www.maa.org/sites/default/files/269138004440.pdf. MR1573069
[3] A. Khintchine, Metrische Kettenbruchprobleme (German), Compositio Math. 1 (1935),
361–382. http://archive.numdam.org/ARCHIVE/CM/CM_1935__1_/CM_1935__1__361_0/
CM_1935__1__361_0.pdf. MR1556899
[4] A. Ya. Khinchin, Continued fractions, The University of Chicago Press, Chicago, Ill.-London,
1964. MR0161833
1935
Hilbert’s Seventh Problem
Introduction
Our problem collection is inspired, as are so many other collections, by the
problems David Hilbert proposed in his keynote address at the International Con-
gress of Mathematicians in 1900; see [1]. These problems were meant to chart
important directions for research in the 20th century. A solution to any of Hilbert’s
problems brings instant fame and membership in “The Honors Class” [3].
Here is a curious warmup to one of Hilbert’s problems. We claim that there
are irrational numbers α and β so that $\alpha^\beta$ is rational. To show this, we consider
\[
\gamma = \sqrt{2}^{\sqrt{2}}.
\]
There are two possibilities.
(a) If γ is rational, take $\alpha = \beta = \sqrt{2}$. In this case, α and β are irrational and
$\alpha^\beta = \gamma$ is rational (by assumption).
(b) If γ is irrational, take $\alpha = \sqrt{2}^{\sqrt{2}}$ and $\beta = \sqrt{2}$. Then
\[
\alpha^\beta = \bigl(\sqrt{2}^{\sqrt{2}}\bigr)^{\sqrt{2}} = \sqrt{2}^{\sqrt{2}\sqrt{2}} = \sqrt{2}^{\,2} = 2.
\]
In this case, α and β are irrational and $\alpha^\beta$ is rational.

Since both cases lead to the desired conclusion, the proof is finished. If you are
like most people, the preceding proof will leave you feeling unsatisfied. The proof
is correct, but it does not indicate which of the two possibilities is true. This is a
quintessential example of an existential proof ; it proves the existence of α and β
without indicating specific values of α and β that are guaranteed to work.
Hilbert’s seventh problem is the following. Let α and β be algebraic numbers
with β irrational. Prove that $\alpha^\beta$ is transcendental whenever α ≠ 0, 1. Problems
along these lines have a long and storied history. For example, in 1748 Leonhard
Euler proposed that if α ≠ 0, 1 is rational and β is an irrational algebraic number,
then $\alpha^\beta$ is irrational. This is a weak version of Hilbert’s seventh problem.
In 1934 Alexandr Gelfond (1906–1968) and Theodor Schneider (1911–1988) in-
dependently resolved Hilbert’s problem in the affirmative. If we invoke the Gelfond–
Schneider theorem, then we can immediately deduce that the Gelfond–Schneider
constant
\[
2^{\sqrt{2}} \approx 2.665144142690225188650297\ldots
\]
is transcendental. Thus, its square root, γ, cannot be algebraic; that is, it is case
(b) in the proof above that is correct. The transcendence of the Gelfond–Schneider
constant was first established in 1930 by Rodion Kuzmin.
Of course, no discussion along these lines would be complete without mentioning
Euler’s formula
\[
e^{ix} = \cos x + i \sin x,
\]
in which $i^2 = -1$. Setting x = π yields the marvelous relation
\[
e^{i\pi} + 1 = 0
\]
between the five most important constants in mathematics (0, 1, π, e, i). Among
other things, a transcendental power of a transcendental number can be rational.
Currently, it is unknown if either of the sum e + π or the product eπ is transcendental, but we do
know that at least one of them is; see the comments for the 1973 entry.

Proposed by Jesse Freeman and Steven J. Miller, Williams College.
The problems below trace the development of the theory of transcendental
numbers and highlight the power of the Gelfond–Schneider theorem.
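(a) Euler: For α ∈ C, let $B_\alpha = \{\beta \in \mathbb{C} : \alpha^\beta \in \mathbb{Q}\}$. Show that for an algebraic
irrational γ,
\[
B_\gamma \subsetneq \bigcup_{\alpha \in \mathbb{Q}} B_\alpha. \tag{1935.1}
\]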
You may use the Gelfond–Schneider theorem. Describe the union on the right-
hand side of (1935.1) and investigate the algebraic structure of Bγ .
(b) Cantor: Use the fundamental theorem of algebra to prove that the set of
algebraic numbers is countable (see the footnote on p. 31 for an outline of the
proof). Since R is uncountable, this shows that almost all real numbers are
transcendental. Although this argument proves that almost all numbers are
transcendental, it does not provide an explicit example of a transcendental
number.
(c) Liouville: Suppose α is an algebraic number of degree d > 1 (see p. 30 for the
definition). Liouville’s theorem asserts that there exists a positive constant
C(α) such that for any rational number a/b,
\[
\Bigl|\alpha - \frac{a}{b}\Bigr| > \frac{C(\alpha)}{b^d}. \tag{1935.2}
\]
We say α ∈ R is a Liouville number if for every positive integer n there are
integers a and b with b > 1 such that
\[
0 < \Bigl|\alpha - \frac{a}{b}\Bigr| < \frac{1}{b^n}.
\]
The result above implies that all Liouville numbers are transcendental; how-
ever, not all transcendental numbers are Liouville numbers. Show that the
set of Liouville numbers in the interval [−1, 1] has measure zero.1 See the
notes below for a proof of Liouville’s theorem and the explicit construction of
a transcendental number.
1 That is, for every ε > 0, the set of Liouville numbers in [−1, 1] can be covered by countably
many open intervals of total length < ε.

(d) Gelfond/Schneider/Hilbert: Using the Gelfond–Schneider theorem, show
that if the ratio of two angles in an isosceles triangle is algebraic and irrational,
then the ratio between the sides opposite those angles is transcendental.
1935: Comments
Proof of Liouville’s theorem. Suppose that α ∈ R is a root of
\[
f(x) = c_d x^d + c_{d-1} x^{d-1} + \cdots + c_1 x + c_0,
\]
in which the coefficients are integers and $c_d \ne 0$. Since f has only finitely many
roots, there is a δ > 0 so that f(x) ≠ 0 whenever 0 < |x − α| ≤ δ. Write
f(x) = (x − α)g(x), in which g is a polynomial of degree d − 1. Since g is continuous,
there is an M > 0 such that |g(x)| ≤ M for |x − α| ≤ δ.
Suppose that a, b ∈ Z, b > 1, and 0 < |α − a/b| ≤ δ. Then g(a/b) ≠ 0 and
hence
\[
\frac{a}{b} - \alpha = \frac{f(a/b)}{g(a/b)} = \frac{c_d (\frac{a}{b})^d + \cdots + c_1 (\frac{a}{b}) + c_0}{g(a/b)}
  = \frac{c_d a^d + c_{d-1} a^{d-1} b + \cdots + c_0 b^d}{b^d\, g(a/b)}.
\]
The numerator is an integer which is nonzero since f(a/b) ≠ 0, so
\[
\Bigl|\alpha - \frac{a}{b}\Bigr| \ge \frac{1}{M b^d}
\]
whenever 0 < |α − a/b| ≤ δ. On the other hand, if |α − a/b| > δ, then
\[
\Bigl|\alpha - \frac{a}{b}\Bigr| > \frac{\delta}{b^d}
\]
since b ≥ 1. Consequently, $0 < C(\alpha) < \min\{\delta, \frac{1}{M}\}$ ensures that
\[
\Bigl|\alpha - \frac{a}{b}\Bigr| > \frac{C(\alpha)}{b^d}.
\]
This concludes the proof of Liouville’s theorem.
A specific transcendental number. We are in a position to prove, without
heavy machinery like the Gelfond–Schneider theorem, that a single, specific number
is transcendental. We claim that Liouville’s constant
\[
\lambda = \sum_{n=1}^{\infty} \frac{1}{10^{n!}} = 0.11000100000000000000000100000\ldots
\]
is transcendental; this number was “cooked up” exactly for this purpose. It is
irrational since its decimal expansion is not eventually repeating. Thus, if λ is
algebraic, its degree is at least 2. So suppose toward a contradiction that λ is an
algebraic number of degree d ≥ 2. If n > d, then consider the nth partial sum
\[
\sum_{j=1}^{n} \frac{1}{10^{j!}} = \frac{a}{b}
\]
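of the series defining λ. Putting things over a common denominator, we find that
the preceding is a rational number with denominator $b = 10^{n!}$. Thus,
\[
\Bigl|\lambda - \frac{a}{b}\Bigr| = \frac{1}{10^{(n+1)!}} + \frac{1}{10^{(n+2)!}} + \frac{1}{10^{(n+3)!}} + \cdots
\]
\[
= \frac{1}{10^{(n+1)!}} \Bigl(1 + \frac{1}{10^{(n+2)!-(n+1)!}} + \frac{1}{10^{(n+3)!-(n+1)!}} + \cdots\Bigr)
\]
\[
\le \frac{1}{10^{(n+1)!}} \Bigl(1 + \frac{1}{10^{(n+1)!(n+1)}} + \frac{1}{10^{(n+1)!(n+2)}} + \cdots\Bigr)
\]
\[
< \frac{1}{10^{(n+1)!}} \Bigl(1 + \frac{1}{10} + \frac{1}{10^2} + \cdots\Bigr) = \frac{1}{10^{(n+1)!}} \cdot \frac{1}{1 - \frac{1}{10}}
  < \frac{2}{10^{(n+1)!}}.
\]
Liouville’s theorem ensures that for n > d,
\[
0 < \frac{C(\lambda)}{b^d} = \frac{C(\lambda)}{10^{n!d}} < \Bigl|\lambda - \frac{a}{b}\Bigr| < \frac{2}{10^{(n+1)!}}
\]
and hence
\[
0 < \frac{C(\lambda)}{2} < \frac{10^{n!d}}{10^{(n+1)!}} = \frac{10^{n!d}}{10^{n!(n+1)}} = 10^{n!(d-n-1)} \to 0
\]
as n → ∞. This is a contradiction, so λ must be transcendental.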
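The key estimate $0 < \lambda - a/b < 2/10^{(n+1)!}$ can be checked with exact rational arithmetic. In this sketch (names are ours) the partial sum with six terms stands in for λ itself, which is an assumption made purely for computability; the neglected tail is smaller than $2/10^{5040}$.

```python
from fractions import Fraction
from math import factorial

def liouville_partial(n):
    """Exact n-th partial sum of Liouville's constant: sum_{j=1}^{n} 10^(-j!)."""
    return sum(Fraction(1, 10 ** factorial(j)) for j in range(1, n + 1))

# Stand-in for lambda (assumption): the 6-term partial sum.
REFERENCE = liouville_partial(6)

def gap_is_small(n):
    """Check 0 < lambda - a/b < 2 / 10^((n+1)!) with lambda replaced by REFERENCE."""
    gap = REFERENCE - liouville_partial(n)
    return 0 < gap < Fraction(2, 10 ** factorial(n + 1))

if __name__ == "__main__":
    for n in range(1, 5):
        s = liouville_partial(n)
        print(n, s.denominator == 10 ** factorial(n), gap_is_small(n))
```

The printed denominators confirm that the nth partial sum, in lowest terms, has denominator $10^{n!}$, which is the value of b used in the argument.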
Bibliography
[1] D. Hilbert, Über das Unendliche, Math. Ann. 95 (1926), 161–190. http://link.springer.
10/S0002-9904-1902-00923-3/S0002-9904-1902-00923-3.pdf.
[3] B. H. Yandell, The honors class: Hilbert’s problems and their solvers, A K Peters, Ltd., Natick,
MA, 2002. MR1880187
[4] Wikipedia, Gelfond–Schneider theorem, https://en.wikipedia.org/wiki/Gelfond-
Schneider_theorem.
[5] Wikipedia, Liouville number, https://en.wikipedia.org/wiki/Liouville_number.
1936
Alan Turing
Introduction
Besides cracking codes at Bletchley Park during World War II and pioneering
the field of artificial intelligence, Alan Turing (1912–1954) might be best known
for his eponymous model of computation, the Turing machine (see Figure 1). The
machine features an infinite tape partitioned into squares and a moving head that
overlooks a single square at each moment in time. Squares start out blank but can
also contain symbols from a finite alphabet. The head can read symbols from and
write symbols to the tape. It also occupies one of n states-of-mind, which we simply
call states. These states serve as the machine’s memory. Computation occurs as
follows: the head reads a symbol from its current square, writes a new symbol to
the square (it might be the same symbol or a blank), and moves either to the left
or to the right while also (potentially) changing its state. The alphabet, states, and
transition rules constitute a finite description of a Turing machine.
Figure 1. Depiction of a Turing machine: a sensor reads, writes, or erases the
symbol on a moving tape whose squares are labeled S1 through S8.
In [4], Turing defined a universal machine, one that can take the description of
another Turing machine as input and then simulate that Turing machine. It is the
first example of the now ubiquitous virtual machine. Turing also used his machine
to define computable numbers, which are real numbers whose decimal values can
be written down successively, with each additional digit appearing after a finite
number of steps. These machines do not halt, but they always make progress. Most
modern treatments of Turing machines deal with computable functions instead of
computable numbers. In this scenario the computation begins with a tape initialized
with some finite input. What remains on the tape after the machine halts is the
output. Thus, computable functions are functions that can be computed by a
Turing machine in a finite number of steps. Unlike the machines writing computable
numbers, these machines always halt. A classic function that is not computable
asks whether given the description of a Turing machine, will that machine halt on
every input? This is called the halting problem and it remains a natural gateway
into the study of computability.
Though Stephen Kleene (1909–1994), Alonzo Church (1903–1995), and Emil
Leon Post (1897–1954) had already developed models of computation that were
equivalent in power, the Turing machine was the first to convince Kurt Gödel (see
the 1929 entry) of what it truly meant to be an algorithm. That this really is the
correct definition of mechanical computability was established beyond any doubt
by Turing. Indeed, the Turing machine has remained the model of choice when
explaining, extending, or developing new concepts in computability and complexity
theory.

Proposed by Brent Heeringa, Williams College.
Suppose we restrict our attention to Turing machines with n states and one
additional HALT state, which tells the machine to immediately cease computation.
In addition, suppose these machines are only allowed to read and write 0’s and
1’s, with 0’s serving as the blank symbol, so the tape is initially all 0’s. Let Σ(n)
be the maximum number of 1’s appearing on the tape after any n-state Turing
machine halts; Σ(n) is called the busy beaver function and any n-state, halting
Turing machine achieving Σ(n) is called a busy beaver . It is clear that Σ(n) is well-
defined because there are only a finite number of n-state halting Turing machines
over the binary alphabet {0, 1}. It is known that Σ(3) = 6 and Σ(4) = 13, but
the exact value of Σ(5) is unknown (it is at least 4,098). As a warm-up, show that
Σ(3) = 6. Then show that, in general, Σ(n) is not computable. Can you find any
upper or lower bounds on its growth rate?
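For the warm-up, it helps to simulate candidate machines. The simulator below is a minimal sketch under conventions that are our assumptions: the machine starts in a state called "A" on an all-zero tape, and the transition into HALT still writes and moves. The two transition tables are the commonly cited 2-state and 3-state machines; running the 3-state one demonstrates the lower bound Σ(3) ≥ 6, though it does not prove the matching upper bound.

```python
from collections import defaultdict

def run(machine, max_steps=10_000):
    """Simulate a binary Turing machine; return (ones on tape, steps, halted?)."""
    tape = defaultdict(int)                 # blank squares read as 0
    pos, state, steps = 0, "A", 0
    while state != "HALT" and steps < max_steps:
        write, move, state = machine[(state, tape[pos])]
        tape[pos] = write
        pos += 1 if move == "R" else -1
        steps += 1
    return sum(tape.values()), steps, state == "HALT"

# (state, symbol read) -> (symbol to write, head move, next state)
TWO_STATE = {
    ("A", 0): (1, "R", "B"), ("A", 1): (1, "L", "B"),
    ("B", 0): (1, "L", "A"), ("B", 1): (1, "R", "HALT"),
}
THREE_STATE = {
    ("A", 0): (1, "R", "B"), ("A", 1): (1, "R", "HALT"),
    ("B", 0): (0, "R", "C"), ("B", 1): (1, "R", "B"),
    ("C", 0): (1, "L", "C"), ("C", 1): (1, "L", "A"),
}

if __name__ == "__main__":
    print(run(TWO_STATE))
    print(run(THREE_STATE))
```

The `max_steps` cutoff is essential: since the halting problem is undecidable, a simulator cannot know in general whether a machine that is still running will ever stop.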
1936: Comments
Turing and Enigma. It would be a disservice of the highest rank not to
mention the valuable work Turing and his colleagues performed for the British
government in cracking the German Enigma encryption; see Figures 2 and 3. To put
these contributions in perspective, estimates of their worth range from shortening
the war by two to four years, to turning the tide to an Allied victory. Turing was
one of the driving forces in cracking the supposedly uncrackable codes. For more
on these efforts see the 1943 entry.
Figure 2. An Enigma machine.
Much of Turing’s work during the war was classified and was kept classified for
a variety of reasons afterwards. However, with the passage of time the need for such
security lessened, and much of his work is now publicly available; see [5]. Sadly,
Turing, who was a homosexual, was prosecuted for “gross indecency” and forced to
undergo chemical castration. He committed suicide at the age of 41. Speaking in
2009, British Prime Minister Gordon Brown (1951– ) apologized:

Thousands of people have come together to demand justice for Alan
Turing and recognition of the appalling way he was treated. While
Turing was dealt with under the law of the time and we can’t put the
clock back, his treatment was of course utterly unfair and I am pleased
to have the chance to say how deeply sorry I and we all are for what
happened to him . . . . So on behalf of the British government, and all
those who live freely thanks to Alan’s work I am very proud to say:
we’re sorry, you deserved so much better.
Alan Turing was officially pardoned by Elizabeth II (1926– ) in 2014.
Figure 3. Detail of an Enigma machine.
Cryptography. An entry on Alan Turing seems like the appropriate place
to begin discussing cryptography. This is a topic that we will return to every
now and then; see the 1943, 1952, and 1977 entries. Suppose that Alice and Bob
wish to communicate and that they want to prevent an eavesdropper, Eve, from
understanding their conversation. Any information that Alice and Bob wish to
exchange can be encoded using numbers. For instance, the American Standard
Code for Information Interchange (ASCII) is a standard method for converting
symbols into numerical equivalents. There are $256 = 2^8$ symbols that can be
represented in ASCII. For example, the string ASCII corresponds (in decimal) to
65 83 67 73 73 and (in binary) to
01000001 01010011 01000011 01001001 01001001.
Each symbol is represented by eight bits, that is, a sequence of eight 0’s and 1’s.
These transmitted segments can be augmented to ensure a more accurate trans-
mission. For instance, the seven bits 0100110 that Alice wants to send might be
augmented to 01001101. The additional 1 is a checksum bit; it means that there
is an odd number of ones in 0100110. If Bob receives 01001100, then he knows
an error has occurred and he can request that Alice resend the block. There are,
of course, many more effective and fascinating error-detecting methods that have
been developed over the years.
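The encoding and the parity check described above can be sketched in a few lines. The function names are ours; the parity convention follows the text, in which the extra bit is 1 exactly when the seven data bits contain an odd number of ones.

```python
def to_decimal(text):
    """Decimal ASCII codes of the characters in `text`."""
    return [ord(ch) for ch in text]

def to_bits(text):
    """Eight-bit binary ASCII codes, separated by spaces."""
    return " ".join(format(ord(ch), "08b") for ch in text)

def add_parity(bits):
    """Append a checksum bit: 1 if `bits` contains an odd number of ones, else 0."""
    return bits + str(bits.count("1") % 2)

def parity_ok(block):
    """A received block passes if its total number of ones is even."""
    return block.count("1") % 2 == 0

if __name__ == "__main__":
    print(to_decimal("ASCII"))
    print(to_bits("ASCII"))
    print(add_parity("0100110"))      # what Alice sends
    print(parity_ok("01001101"))      # received intact
    print(parity_ok("01001100"))      # one bit flipped in transit
```

A single parity bit detects any odd number of flipped bits but cannot locate the error, and it misses an even number of flips.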
If Alice and Bob share a common key beforehand, then there are many methods
they can use to encrypt their data. For instance, the National Institute of Standards
and Technology (NIST) adopted the Data Encryption Standard (DES) in 1976 and
the Advanced Encryption Standard (AES) in 2001. Since this is our first expedition
into cryptography, we discuss a simple technique that dates back to antiquity: the
Caesar cipher .
For the sake of readability and simplicity, we do not consider a blank space as
a character. Alice replaces each letter in the plaintext
HERE IS A MESSAGE ENCRYPTED WITH THE CAESAR
CIPHER USING THE KEY FIVE.
with the letter that occurs k places after it (with “wraparound”). We say that k is
the key that is used to encrypt the message. With k = 5, Alice sends the ciphertext
MJWJ NXFR JXXF LJJS HWDU YJIB NYMY MJHF JXFW
HNUM JWZX NSLY MJPJ DKNA JPMN.
Table 1. Frequency of letters (in percent) in English.
A   8.064    J  0.112    S  6.382
B   1.537    K  0.625    T  9.025
C   2.689    L  4.102    U  2.786
D   4.329    M  2.501    V  1.026
E  12.886    N  6.985    W  2.119
F   2.448    O  7.378    X  0.169
G   1.963    P  1.703    Y  1.806
H   6.099    Q  0.106    Z  0.097
I   6.906    R  6.157
to Bob. Observe that Alice padded the cipher text with nonsense to ensure that the
blocks are of uniform size. For Alice and Bob to use the Caesar cipher, they must
first share the key k (see the 1977 entry for an encryption method that eliminates
the need to share a secret key before communicating).
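Alice’s encryption step can be reproduced with a short sketch. The helper names are ours, and the three padding letters PMN are copied from the ciphertext above rather than generated.

```python
def caesar(text, k):
    """Shift each letter k places forward (with wraparound); drop everything else."""
    shifted = []
    for ch in text.upper():
        if "A" <= ch <= "Z":
            shifted.append(chr((ord(ch) - ord("A") + k) % 26 + ord("A")))
    return "".join(shifted)

def blocks(s, size=4):
    """Group a string into space-separated blocks of the given size."""
    return " ".join(s[i:i + size] for i in range(0, len(s), size))

PLAINTEXT = "HERE IS A MESSAGE ENCRYPTED WITH THE CAESAR CIPHER USING THE KEY FIVE."

if __name__ == "__main__":
    ciphertext = caesar(PLAINTEXT, 5) + "PMN"   # PMN: padding taken from the text
    print(blocks(ciphertext))
    print(caesar(ciphertext, -5))               # decryption is a shift by -k
```

Decryption is the same function with the key negated, so Bob needs nothing beyond k.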
Eve can use frequency analysis to decipher an intercepted message, even though
she does not know k. For example, the letter E is the most common letter in English;
see Table 1. The uncommon letter J occurs often in the ciphertext, which suggests
that E is replaced by J. Thus, Eve guesses that k = 5 (the distance between E and
J) and obtains the plaintext message.
As its name suggests, this method of encryption was used by Julius Caesar
(100–44 BCE). Although the Caesar cipher is easily broken, in a time when most
of the population was illiterate and mathematically unsophisticated, it provided
adequate security. The following was encrypted with the Caesar cipher.
HSSN HBSP ZKPC PKLK PUAV AOYL LWHY AZVU LVMD OPJO AOLI
LSNH LPUO HIPA AOLH XBPA HUPH UVAO LYAO VZLD OVPU AOLP
YVDU SHUN BHNL HYLJ HSSL KJLS AZPU VBYN HBSZ AOLA OPYK
HSSA OLZL KPMM LYMY VTLH JOVA OLYP USHU NBHN LJBZ AVTZ
HUKS HDZA OLYP CLYN HYVU ULZL WHYH ALZA OLNH BSZM YVTA
OLHX BPAH UPAO LTHY ULHU KAOL ZLPU LZLW HYHA LAOL TMYV
TAOL ILSN HLXY
Use frequency analysis to determine possible keys and then decipher the message.
See below for the answer.
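Eve’s attack can be sketched as follows. This is an illustrative approach, not the only one: it ranks the ciphertext letters by frequency and, for each, proposes the key that would map E to that letter. The function names are ours. Readers who want to attempt the exercise by hand should do so before running the code.

```python
from collections import Counter

CIPHERTEXT = """
HSSN HBSP ZKPC PKLK PUAV AOYL LWHY AZVU LVMD OPJO AOLI
LSNH LPUO HIPA AOLH XBPA HUPH UVAO LYAO VZLD OVPU AOLP
YVDU SHUN BHNL HYLJ HSSL KJLS AZPU VBYN HBSZ AOLA OPYK
HSSA OLZL KPMM LYMY VTLH JOVA OLYP USHU NBHN LJBZ AVTZ
HUKS HDZA OLYP CLYN HYVU ULZL WHYH ALZA OLNH BSZM YVTA
OLHX BPAH UPAO LTHY ULHU KAOL ZLPU LZLW HYHA LAOL TMYV
TAOL ILSN HLXY
"""

def letters_only(text):
    return "".join(ch for ch in text.upper() if "A" <= ch <= "Z")

def candidate_keys(ciphertext):
    """Keys ordered by the guess 'this frequent ciphertext letter encrypts E'."""
    ranked = [ch for ch, _ in Counter(letters_only(ciphertext)).most_common()]
    return [(ord(ch) - ord("E")) % 26 for ch in ranked]

def decrypt(ciphertext, k):
    """Shift every letter k places backward, with wraparound."""
    return "".join(
        chr((ord(ch) - ord("A") - k) % 26 + ord("A"))
        for ch in letters_only(ciphertext)
    )

if __name__ == "__main__":
    for k in candidate_keys(CIPHERTEXT)[:3]:
        print(k, decrypt(CIPHERTEXT, k)[:40])
```

Printing the first few dozen letters for each of the top candidates is usually enough to spot the key that yields readable English.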
Bibliography
[1] G. Boolos and R. Jeffrey, Computability and Logic (third edition), Cambridge University Press,
1999.
[2] K. Gödel, Undecidable Diophantine propositions, in Collected Works III (from the 1930s),
164–175.
[3] T. Radó, On non-computable functions, Bell System Tech. J. 41 (1962), 877–884, DOI
10.1002/j.1538-7305.1962.tb00480.x. MR0133229
[4] A. M. Turing, On Computable Numbers, with an Application to the Entscheidungsprob-
lem, Proc. London Math. Soc. (2) 42 (1936), no. 3, 230–265, DOI 10.1112/plms/s2-
42.1.230. https://academic.oup.com/plms/article-abstract/s2-42/1/230/1491926?
redirectedFrom=fulltext. MR1577030
[5] A. M. Turing, The Applications of Probability to Cryptography, http://arxiv.org/pdf/1505.
04714v2.pdf.
[6] C. Teuscher (ed.), Alan Turing: life and legacy of a great thinker, papers from the Conference
“Turing Day: Computing Science 90 Years from the Birth of Alan Mathison Turing” held
at the École Polytechnique Fédérale de Lausanne, Lausanne, June 28, 2002, Springer-Verlag,
Berlin, 2004. MR2106942
Answer: The key is 7. The message is: “All Gaul is divided into three parts, one
of which the Belgae inhabit, the Aquitani another, those who in their own language
are called Celts, in our Gauls, the third. All these differ from each other in language,
customs and laws. The river Garonne separates the Gauls from the Aquitani; the Marne
and the Seine separate them from the Belgae.” These are the famous opening lines of
Julius Caesar’s Commentarii de Bello Gallico (Commentary on the Gallic War). The
final XY in the ciphertext is padding.
1937
Vinogradov’s Theorem
Introduction
Although we normally view primes from a multiplicative perspective, there are
many interesting additive questions to investigate. A famous conjecture, due to
Christian Goldbach (1690–1764), is that every even number greater than four is
the sum of two primes; see Figure 1. This is the binary Goldbach conjecture; it
is significantly harder than the ternary Goldbach conjecture: every odd number at
least seven is the sum of three primes.
A major advance towards the proof of the ternary conjecture was made by Ivan
Matveyevich Vinogradov (1891–1983) in 1937, who proved that there is a constant
C such that every odd number at least C is the sum of three primes. Thus, the
ternary Goldbach conjecture is reduced to a finite computation: show that every
odd number less than C is a sum of three primes. Unfortunately, the value of C
produced by Vinogradov’s proof is too large for practical computation: it was over
$10^{1000}$.
Figure 1. Number of ways (vertical axis) to represent an even
number at most 10,000 (horizontal axis) as a sum of two primes.
Numerical evidence such as this suggests that Goldbach’s conjecture
is true.
In 2013, the ternary Goldbach conjecture was proved by Harald Andrés Helfgott
(1977– ), who brought C down to $10^{27}$, well within the range checkable by
computers. These approaches all use the circle method (see the 1920 and 1923
entries), which converts the problem to estimating integrals of exponential sums.
For example, the number of ways an integer N can be written as the sum of three
primes is
\[
\int_0^1 \Bigl( \sum_{\substack{p \le N \\ p\ \mathrm{prime}}} e^{2\pi i p x} \Bigr)^{3} e^{-2\pi i N x} \, dx. \tag{1937.1}
\]
To see this, expand the sum and integrate term-by-term. We obtain a sum of terms
of the form
\[
\int_0^1 e^{2\pi i (p_1 + p_2 + p_3 - N) x} \, dx,
\]
which equals 0 if $p_1 + p_2 + p_3 - N \ne 0$ (by the periodicity of the integrand) and 1 if
$p_1 + p_2 + p_3 - N = 0$ (since the integrand is the constant function 1). Consequently,
each representation N = p1 + p2 + p3 as a sum of three primes adds 1 to (1937.1),
so the integral (1937.1) counts the number of ways to write N as a sum of three
primes.
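The counting identity can be tested on small N. In the sketch below we replace the integral over [0, 1] by an average over the M = 3N + 1 equally spaced points x = t/M; this discretization is our own device, and it is exact here because every exponent $p_1 + p_2 + p_3 - N$ has absolute value less than M. Note that the count is of ordered triples.

```python
import cmath
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes <= n (for n >= 1)."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(range(i * i, n + 1, i))
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def count_by_orthogonality(N):
    """Discrete analogue of (1937.1): average S(x)^3 e^{-2 pi i N x} over x = t/M."""
    primes = primes_up_to(N)
    M = 3 * N + 1
    total = 0j
    for t in range(M):
        x = t / M
        S = sum(cmath.exp(2j * cmath.pi * p * x) for p in primes)
        total += S ** 3 * cmath.exp(-2j * cmath.pi * N * x)
    return round((total / M).real)

def count_by_brute_force(N):
    """Number of ordered triples of primes (p1, p2, p3) with p1 + p2 + p3 = N."""
    primes = primes_up_to(N)
    prime_set = set(primes)
    return sum(1 for p1 in primes for p2 in primes if (N - p1 - p2) in prime_set)

if __name__ == "__main__":
    for N in (7, 9, 15, 21, 30):
        print(N, count_by_orthogonality(N), count_by_brute_force(N))
```

For large N this direct evaluation is hopeless, which is why the circle method instead estimates the integrand on carefully chosen arcs.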
The Goldbach problems boil down to determining if an integral, which happens
to be integer valued, is nonzero. Unfortunately, the integral (1937.1) is fiendishly
difficult to analyze. To date, all approaches to understanding these sums are highly
technical.

Perhaps if we are willing to allow more primes we can prove a related result
more easily. Or, even more generally, let us consider writing integers as sums and
differences of primes.
(a) Can you prove, in an elementary manner, whether or not there is a finite
integer r such that every odd number is the sum and difference of at most r
primes? For example, if r = 4, we could consider quantities of the form
p1 + p2 + p3 + p4 , p1 + p2 + p3 − p4 , p1 + p2 − p3 − p4 , and p1 − p2 − p3 − p4 .
(b) Prove that if for any even n you knew there was at least one pair of primes
differing by n, then you could take r = 2,013 in the original problem. Can you
get a better value of r than 2,013?
(c) Here is an easier version: find a function f (x) that does not grow too rapidly
such that every integer exceeding 4 and at most x is the sum of at most f (x)
primes. Helfgott’s work shows that the constant function f (x) = 4 works.
1937: Comments
Recent developments. The Hardy–Littlewood k-tuple conjecture (see the
1923 entry) implies that for every even number $n$ there is a positive constant $C_n$,
which can be explicitly written down in terms of functions of the prime factors of
$n$, such that the number of pairs of primes of the form $(p, p + n)$ with $p \le x$ is
asymptotic to $C_n x/(\log x)^2$. In particular, for each even $n$ the conjecture predicts
infinitely many pairs of primes (p, p + n). Although this has not been proved for
any n, the landscape has changed dramatically in recent years. In 2013, Yitang
Zhang proved that there is some n ≤ 70,000,000 for which there are infinitely many
pairs of primes (p, p + n). Subsequent work has lowered seventy million to 246; see
the 1919 entry. Also see the comments for the 2005 entry for information about
the more general Bateman–Horn conjecture.
Bertrand’s postulate. In 1843, Joseph Bertrand (1822–1900) conjectured

that for any integer n > 3, there is at least one prime in the interval (n, 2n).
Pafnuty Chebyshev (1821–1894) gave a proof in 1852 by obtaining nontrivial upper
and lower bounds for π(x), the number of primes at most x (see the entries for 1913,
1919, and 1948). For all sufficiently large $n$, Bertrand’s postulate is now a consequence
of the prime number theorem (proved in 1896) since
\[
  \lim_{n\to\infty} \frac{\pi(2n)}{\pi(n)}
  = \lim_{n\to\infty} \frac{2n/\log 2n}{n/\log n}
  = 2 \lim_{n\to\infty} \frac{\log n}{\log n + \log 2} = 2.
\]
That is, there are approximately twice as many primes in the interval $(0, 2n)$ as
there are in the interval $(0, n)$. But there is a simpler proof that does not rely on
such heavy machinery.
In 1932, the nineteen-year-old Paul Erdős (see the 1913 entry) gave a beautiful
elementary proof of Bertrand’s postulate [2]. Our presentation is based upon that
given in [1]. Erdős first obtained the estimate

\[
  \prod_{p \le x} p \le 4^{x-1}
\]
for real x ≥ 2; here and henceforth, the subscript p refers to a prime number. He
next examined the prime divisors of the central binomial coefficient

\[
  \binom{2n}{n} = \frac{(2n)!}{(n!)^2}. \tag{1937.2}
\]
Erdős then showed that no prime power dividing (1937.2) exceeds $2n$, that primes
$p > \sqrt{2n}$ appear at most once in the factorization of (1937.2), and that primes $p$
satisfying $\tfrac{2}{3} n < p \le n$ do not divide (1937.2) at all. This last remark is the key
to his argument. To see why it is true, observe that $3p > 2n$ for $n, p \ge 3$ implies
that $p$ and $2p$ are the only multiples of $p$ among the factors $1, 2, \ldots, 2n$ of $(2n)!$,
so that $p$ divides $(2n)!$ exactly twice, and that $p$ divides $(n!)^2$
exactly twice. Consequently,

\[
  \frac{4^n}{2n} \le \binom{2n}{n} \le \prod_{p \le \sqrt{2n}} 2n \cdot \prod_{\sqrt{2n} < p \le \frac{2}{3} n} p \cdot \prod_{n < p \le 2n} p,
\]
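the lower bound being obtained by noting that (1937.2) is the largest of the
$2n + 1$ terms in the binomial expansion of $(1 + 1)^{2n} = 2^{2n} = 4^n$ (combine the two end
terms, each equal to 1, into a single summand to get $2n$ summands, none of which
exceeds (1937.2)). Suppose toward a
contradiction that there is no prime in the interval $(n, 2n]$. Then,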
\[
  4^n \le (2n)^{1+\sqrt{2n}} \prod_{\sqrt{2n} < p \le \frac{2}{3} n} p \le (2n)^{1+\sqrt{2n}}\, 4^{\frac{2}{3} n},
\]
which is false for n ≥ 467 (the original argument gives n ≥ 4,000; we have used
modern computation). This reduces the proof of Bertrand’s postulate to a finite
computation, which is easily accomplished.
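The finite computation can be sketched as follows. This is a minimal illustration, assuming we simply test every n from 4 up to the bound 4,000 mentioned above; the function names are hypothetical and the approach (a sieve plus a binary search) is one convenient choice, not the method of the original papers.

```python
from bisect import bisect_right


def primes_up_to(limit):
    """Return the sorted list of primes <= limit (sieve of Eratosthenes)."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return [i for i in range(limit + 1) if is_prime[i]]


def first_bertrand_failure(lo, hi):
    """Return the first n in [lo, hi] with no prime in (n, 2n), or None."""
    primes = primes_up_to(2 * hi)
    for n in range(lo, hi + 1):
        idx = bisect_right(primes, n)  # index of the first prime exceeding n
        if idx == len(primes) or primes[idx] >= 2 * n:
            return n
    return None


if __name__ == "__main__":
    # None means every n in the range has a prime strictly between n and 2n.
    print(first_bertrand_failure(4, 4000))
```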
Solution to problem (c). Let $f(x) = \log_2 x$ for $x \ge 2$. We claim that
each integer $n > 4$ is the sum of at most $f(n)$ primes. Brute force shows that if
$4 \le n \le 100$, then $n$ is a sum of at most $\log_2 n$ primes. Consider now $n > 100$. If $n$
is prime, then we are done. If not and $n$ is even, then there is a prime $p_1 \in (n/2, n)$
such that $n - p_1 < n/2$. If $n$ is odd, then there is a prime $p_1 \in (\frac{n+1}{2}, n + 1)$ such
that $n - p_1 < n/2$. Thus, in each case we are left with decomposing an integer less
than $n/2$.
• Assume $4 \le n - p_1 < n/2$. By induction we need at most $\log_2(n/2)$ primes, so
adding in the prime $p_1$ means that we need at most $\log_2(n/2) + 1 = \log_2 n$ primes
to decompose $n$, completing the proof in this case.
• What if $1 \le n - p_1 \le 3$? That means there is some prime $p_1$ such that
$n \in \{p_1 + 1, p_1 + 2, p_1 + 3\}$. Notice the last two cases yield $n$ as a sum of two primes,
so we need only consider $n = p_1 + 1$. Rewrite this as $n = (p_1 - 1) + 2$ and then
decompose $p_1 - 1$. Arguing as above there is a prime $p_2$ such that
\[
  p_1 - 1 - p_2 < (p_1 - 1)/2.
\]
If $p_1 - 1 - p_2 \ge 2$, we are done as before. If $p_1 - 1 - p_2 = 1$, then we note that
\[
  n = (p_1 - 1) + 2 = (p_2 + 1) + 2 = p_2 + 3
\]
expresses $n$ as a sum of two primes.
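The brute-force base case for 4 ≤ n ≤ 100 can be reproduced with a short dynamic program. This is a sketch under the assumption that repeated primes are allowed in a sum (as in 4 = 2 + 2); the function name is hypothetical.

```python
import math


def min_prime_summands(limit):
    """best[n] = fewest primes (repeats allowed) summing to n, or inf if none."""
    primes = [p for p in range(2, limit + 1)
              if all(p % d for d in range(2, int(p ** 0.5) + 1))]
    best = [math.inf] * (limit + 1)
    best[0] = 0  # the empty sum
    for n in range(2, limit + 1):
        for p in primes:
            if p > n:
                break
            best[n] = min(best[n], best[n - p] + 1)
    return best


if __name__ == "__main__":
    best = min_prime_summands(100)
    # Report any n in [4, 100] that would need more than log2(n) primes.
    print([n for n in range(4, 101) if best[n] > math.log2(n)])
```

An empty list confirms the base case used in the induction.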
Bibliography
[1] M. Aigner and G. M. Ziegler, Proofs from The Book, 6th ed., see corrected reprint of the
1998 original [MR1723092]; including illustrations by Karl H. Hofmann, Springer, Berlin,
2018. MR3823190
[2] P. Erdős, Beweis eines Satzes von Tschebyschef, Acta Sci. Math. (Szeged) 5 (1930–1932), 194–
198.
[3] D. Goldston, Zhang’s theorem on bounded gaps between primes, http://www.aimath.org/
news/primegaps70m/.
[4] H. Helfgott, Major arcs for Goldbach’s theorem, http://arxiv.org/abs/1305.2897.
[5] H. Helfgott, The ternary Goldbach conjecture, http://valuevar.wordpress.com/2013/07/
02/the-ternary-goldbach-conjecture/.
[6] D. H. J. Polymath, New equidistribution estimates of Zhang type, Algebra Number Theory
8 (2014), no. 9, 2067–2199, DOI 10.2140/ant.2014.8.2067. MR3294387
[7] D. H. J. Polymath, Variants of the Selberg sieve, and bounded intervals containing many
primes, Res. Math. Sci. 1 (2014), Art. 12, 83, DOI 10.1186/s40687-014-0012-7. MR3373710
[8] Terence Tao, Online reading seminar for Zhang’s “bounded gaps between primes”, http://
terrytao.wordpress.com/2013/06/04/online-reading-seminar-for-zhangs-bounded-
gaps-between-primes/.
[9] I. M. Vinogradov, The method of trigonometrical sums in the theory of numbers, translated
from the Russian, revised and annotated by K. F. Roth and Anne Davenport; reprint of the
1954 translation, Dover Publications, Inc., Mineola, NY, 2004. MR2104806
[10] Y. Zhang, Bounded gaps between primes, Ann. of Math. (2) 179 (2014), no. 3, 1121–1174,
DOI 10.4007/annals.2014.179.3.7. MR3171761
1938
Benford’s Law
Introduction
Calculate the first N Fibonacci numbers
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, . . .
for as large an N as you can. For d = 1, 2, . . . , 9, what percentage have first digit d?
While it is natural to think that each digit is equally likely to occur, this guess is
totally wrong. For N sufficiently large, about 30% of the first N Fibonacci numbers
start with a 1, while only about 4.6% begin with a 9. In general, the probability
of the first digit being $d$ is $\log_{10} \frac{d+1}{d}$. The Fibonacci numbers are not an isolated
oddity in this respect. Many mathematical and natural data sets exhibit this bias,
which is known as Benford’s law; see Figure 1 and Table 1.
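The opening experiment is easy to run. The sketch below tallies the first digits of the first N Fibonacci numbers and prints them beside the Benford prediction; the function name and the default N are illustrative choices, not taken from the text.

```python
import math
from collections import Counter


def fibonacci_first_digit_frequencies(count):
    """Return {d: fraction of the first `count` Fibonacci numbers starting with d}."""
    a, b = 1, 1
    tally = Counter()
    for _ in range(count):
        # Exact integer arithmetic; the 10,000th Fibonacci number has about
        # 2,090 digits, below Python's default int-to-str conversion limit.
        tally[int(str(a)[0])] += 1
        a, b = b, a + b
    return {d: tally[d] / count for d in range(1, 10)}


if __name__ == "__main__":
    observed = fibonacci_first_digit_frequencies(10_000)
    for d in range(1, 10):
        predicted = math.log10((d + 1) / d)
        print(f"{d}: observed {observed[d]:.3f}, Benford {predicted:.3f}")
```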
One interesting application of Benford’s law is to detect tax fraud (also image
fraud, voter fraud, medical fraud, and so forth). It is so successful because peo-
ple are lousy random number generators: they do not put in enough of the right
patterns. For example, if we toss a fair coin 100 times, most people know there
should be about 50 heads and 50 tails, but they typically do not know what the
longest run of heads or tails should be, or how many alternations between runs of
heads and runs of tails should occur (see [13] for a very readable introduction to
some surprising results on the longest run). The same is true in creating fake data
entries; people are more likely to spread out the leading digit equally from 1 to 9,
or concentrate near 5, in the mistaken belief that this makes the data look more
plausible. There is now a vast literature on Benford’s law and its applications. It
surfaces in accounting, computer science, dynamical systems, economics, finance,
geology, medicine, number theory, physics, psychology, statistics, and astronomy.
See [3, 4, 12] for introductions to the theory and many of its applications and [2]
for a searchable website on Benford publications.
Table 1. Comparison of Fibonacci, $2^n$, California, and GDP data from Figure 1 versus the Benford prediction.

First digit     1      2      3      4      5      6      7      8      9
Benford       30.1%  17.6%  12.5%   9.7%   7.9%   6.7%   5.8%   5.1%   4.6%
Fibonacci     30.1%  17.6%  12.5%   9.7%   7.9%   6.7%   5.8%   5.1%   4.6%
$2^n$         30.1%  17.6%  12.5%   9.7%   7.9%   6.7%   5.8%   5.1%   4.6%
California    30.9%  17.6%  13.8%   9.3%   7.5%   6.7%   5.8%   4.8%   3.7%
GDP           31.3%  20.8%  10.4%   8.9%   9.9%   5.2%   3.6%   5.2%   4.7%

Figure 1. Many data sets exhibit the bias described by Benford’s law.
(a) First digits of the first 10,000 Fibonacci numbers.
(b) First digits of $2^n$ for $n = 1, 2, \ldots, 10{,}000$.
(c) First digits of the populations of all cities in California.
(d) First digits of GDP (in dollars) for all 192 UN member countries for which data is available.

Benford’s law of digit bias says that there is no bias, provided that we look at
the data the right way. Suppose that $\{\log_{10} x_n\}$ is equidistributed modulo 1 for
some data set $\{x_n\}$. This means that for any $[a, b] \subseteq [0, 1]$,
\[
  \lim_{N\to\infty} \frac{|\{n \le N : \log_{10} x_n \pmod 1 \in [a, b]\}|}{N} = b - a.
\]
We claim that any such data set satisfies Benford’s law. Let $x > 0$ and write
\[
  x = 10^{u+v} = 10^u 10^v,
\]
in which $u = \lfloor \log_{10} x \rfloor$ and $v \in [0, 1)$ is the fractional part of $\log_{10} x$; that is,
\[
  v = \log_{10} x - \lfloor \log_{10} x \rfloor.
\]
Since $10^u$ is an integer power of 10, it follows that the leading digit of $x$ is
determined entirely by $10^v$. Because $10^v \in [1, 10)$, the probability that $10^v$ has leading
digit $d$ is the probability that $10^v \in [d, d + 1)$; that is, $v \in [\log_{10} d, \log_{10}(d + 1))$.
The equidistribution hypothesis on the data set ensures the probability that
$v \in [\log_{10} d, \log_{10}(d + 1))$ is
\[
  \log_{10}(d + 1) - \log_{10} d = \log_{10} \frac{d + 1}{d}.
\]
This is the prediction of Benford’s law.

The sequences $\{2^n\}$ and $\{3^n\}$ are both Benford; what about the sequence
$\{2^m 3^n\}$ (write the numbers in increasing order: 1, 2, 3, 4, 6, 8, 9, . . .)? More generally,
is $\{p^m q^n\}$ Benford for $p$ and $q$ distinct primes?
1938: Comments
The Kronecker–Weyl theorem. Although we will not spoil the problem,
we should at least explain why $\{2^n\}$ and $\{3^n\}$ obey Benford’s law. The Kronecker–Weyl
theorem asserts that $n\xi$ is equidistributed modulo 1 if $\xi$ is irrational; see
Figure 2 and the 1931 entry. Consequently, if $\xi = \log_{10} \alpha$ is irrational, then
$x_n = n\xi = \log_{10}(\alpha^n)$ is equidistributed modulo 1 and hence the sequence $\{\alpha^n\}$ obeys
Benford’s law. Since $\log_{10} 2$ and $\log_{10} 3$ are irrational, we conclude that $\{2^n\}$ and
$\{3^n\}$ are Benford. A bit more work shows that $\{e^n\}$ and $\{\pi^n\}$ are Benford too.
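The link between leading digits and fractional parts can be made concrete. The following sketch reads off the first digit of α^n from the fractional part of n log10 α, as in the argument above; it uses floating-point arithmetic, so it is an approximation that could in principle misclassify a value sitting extremely close to a digit boundary. The names are hypothetical.

```python
import math
from bisect import bisect_right

# THRESHOLDS[d - 1] = log10(d): a fractional part v in [log10 d, log10(d + 1))
# corresponds to leading digit d.
THRESHOLDS = [math.log10(d) for d in range(1, 10)]


def leading_digit_of_power(alpha, n):
    """Leading digit of alpha**n, from the fractional part of n*log10(alpha)."""
    v = (n * math.log10(alpha)) % 1.0
    return bisect_right(THRESHOLDS, v)  # number of thresholds <= v


def digit_frequencies(alpha, N):
    """Fraction of alpha**1, ..., alpha**N with each leading digit 1..9."""
    counts = [0] * 9
    for n in range(1, N + 1):
        counts[leading_digit_of_power(alpha, n) - 1] += 1
    return [c / N for c in counts]


if __name__ == "__main__":
    for alpha in (2, 3):
        print(alpha, [round(f, 3) for f in digit_frequencies(alpha, 10_000)])
```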
The version of the Kronecker–Weyl theorem we used above states that nξ is
equidistributed modulo 1; it says nothing about how rapidly the equidistribution
sets in. This can be remedied by a more involved analysis that takes into account
how “irrational” a number is. A real number α has irrationality type κ if κ is the
supremum of all γ such that

\[
  \liminf_{q\to\infty} q^{\gamma+1} \min_{p} \Bigl| \alpha - \frac{p}{q} \Bigr| = 0.
\]
Roth’s theorem (see the 1955 entry) ensures that every algebraic irrational is of type
1. See [7] for more details on irrationality types, [9] for applications to Benford’s
law, and [8, Thm. 3.3, p. 124] for details connecting the irrationality type to the
convergence rate.
Figure 2. Plots of $\xi, 2\xi, \ldots, 100\xi \pmod 1$ for $\xi = \pi, \pi^2, e, \gamma, K$,
in which γ is the Euler–Mascheroni constant (see (1942.5) in the
1942 entry) and K is Khinchin’s constant (see the 1934 entry).
Powers of 2’s and 3’s. Here is another interesting question about 2 and 3. Is
\[
  S = \{3^n/2^m : 1 \le m, n < \infty\}
\]
dense in the positive real numbers? To handle this question, we need Kronecker’s
approximation theorem [6, Thm. 440], which asserts that if $\beta > 0$ is irrational,
$\alpha \in \mathbb{R}$, and $\delta > 0$, then there are $n, m \in \mathbb{N}$ so that $|n\beta - \alpha - m| < \delta$. Let $\xi, \varepsilon > 0$
and note that $\beta = \log_2 3 > 0$ is irrational. By the continuity of $f(x) = 2^x$ at $\log_2 \xi$,
there exists $\delta > 0$ such that
\[
  |\log_2 x - \log_2 \xi| < \delta \implies |x - \xi| < \varepsilon. \tag{1938.1}
\]
Kronecker’s theorem with $\beta = \log_2 3$ and $\alpha = \log_2 \xi$ now yields $n, m \in \mathbb{N}$ so that
\[
  \Bigl| \log_2 \frac{3^n}{2^m} - \log_2 \xi \Bigr| = |n \log_2 3 - \log_2 \xi - m| < \delta.
\]
In light of (1938.1), it follows that $|3^n/2^m - \xi| < \varepsilon$, and thus $S$ is dense in the
positive real numbers. This answer, along with many similar results, can be found
in [5].
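The density argument is constructive enough to try on a computer. The sketch below searches for n and m with |3^n/2^m − ξ| < ε by choosing, for each n, the integer m nearest to n log2 3 − log2 ξ; the function name and the search cap `max_n` are assumptions of this illustration, and the final comparison is done in exact rational arithmetic.

```python
import math
from fractions import Fraction


def approximate_by_power_ratio(xi, eps, max_n=10_000):
    """Find (n, m) with n, m >= 1 and |3**n / 2**m - xi| < eps.

    Raises ValueError if nothing is found for n <= max_n (a cap chosen
    for this sketch; the theorem guarantees a solution exists).
    """
    xi, eps = Fraction(xi), Fraction(eps)
    beta = math.log2(3)
    alpha = math.log2(float(xi))
    for n in range(1, max_n + 1):
        m = round(n * beta - alpha)  # nearest integer to n*beta - alpha
        if m < 1:
            continue
        if abs(Fraction(3 ** n, 2 ** m) - xi) < eps:
            return n, m
    raise ValueError("no approximation found; try a larger max_n")


if __name__ == "__main__":
    n, m = approximate_by_power_ratio(Fraction(5), Fraction(1, 100))
    print(n, m, float(Fraction(3 ** n, 2 ** m)))
```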
Benford’s law and powers of π and e. A glance at Figure 2 suggests that
$n\pi \pmod 1$ and $n\pi^2 \pmod 1$ are not as “random” as $ne \pmod 1$. Perhaps a similar
behavior is seen in $\pi^n$ versus $e^n$? To test this, we calculate the chi-square test
statistic¹ to see how well Benford’s law fits the first digits of $\pi^n$ and $e^n$ for $n \le N$ for
$N$ up to 1,000. If we simulate data randomly from the Benford probabilities, then
approximately 95% of the time we should observe a chi-square value of 15.507 or
lower. We plot the results in Figure 3, where for convenience we plot the logarithm
of the chi-square value.

¹ If the probability of observing a leading digit of $d$ is $p_d$ and we have $N$ observations, the
chi-square statistic (with 8 degrees of freedom) is $\chi^2 = \sum_{d=1}^{9} (\mathrm{Obs}_d - N p_d)^2 / (N p_d)$, where $N$ is
the number of observations and $\mathrm{Obs}_d$ is the number with leading digit $d$.

Figure 3. Logarithm of the chi-square statistic for the Benford test of $e^n$ (red) and $\pi^n$ (blue) for $n \le N$ versus $N$.
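The test statistic from the footnote can be computed along the following lines. This is a sketch, assuming first digits are obtained from floating-point fractional parts of n log10(base), which is adequate for n up to 1,000; the function names are hypothetical.

```python
import math

# Benford probabilities p_1, ..., p_9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit_counts(base, N):
    """Counts of leading digits 1..9 among base**1, ..., base**N."""
    counts = [0] * 9
    log_base = math.log10(base)
    for n in range(1, N + 1):
        v = (n * log_base) % 1.0          # fractional part of log10(base**n)
        digit = min(9, int(10 ** v))      # 10**v lies in [1, 10)
        counts[digit - 1] += 1
    return counts


def chi_square_statistic(counts):
    """Sum over d of (Obs_d - N p_d)^2 / (N p_d), with N = total count."""
    N = sum(counts)
    return sum((obs - N * p) ** 2 / (N * p) for obs, p in zip(counts, BENFORD))


if __name__ == "__main__":
    for name, base in (("e", math.e), ("pi", math.pi)):
        for N in (250, 500, 1000):
            stat = chi_square_statistic(first_digit_counts(base, N))
            print(f"{name}^n, N = {N}: chi-square = {stat:.3f}")
```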
Two items are immediately apparent. First, for most $N$ the chi-square values
for $\pi^n$ are significantly larger than those of $e^n$. Second, there seems to be an almost
periodic behavior in the amplitude of the chi-square values for $\pi^n$, with a period of
approximately 175 (and the amplitude getting smaller in subsequent periods).
The latter is not a coincidence. While many people have made it a matter
of personal pride to memorize and be able to recite digits of $\pi$ on demand, very
few can do this feat for $\pi^2$, and almost no one for even higher powers. This is a
shame, as that knowledge would be useful here. If we go far down with our powers,
we eventually come to $\pi^{175}$ and notice that it is approximately $1.0028 \cdot 10^{87}$. In
other words, every time we increase the exponent $n$ by 175 we almost return to
our original value padded by 87 zeros. Almost. If we returned to the same leading
digits (just with an extra 87 zeros at the end), we would have periodic, non-Benford
behavior. The slight difference eventually pushes us to Benford behavior, but very
slowly (as can be seen by the slow decay in the maximum amplitudes); this is what
we mean by the irrationality of the number controlling the behavior. The fact that
a large power of $\pi$ is almost a large power of 10 produces the peculiar behavior
exhibited in Figure 3.
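The near-coincidence between the 175th power of π and a power of 10 can be confirmed with logarithms. A minimal sketch, with a hypothetical helper name:

```python
import math


def scientific_form_of_power(base, n):
    """Return (mantissa, exponent) with base**n ≈ mantissa * 10**exponent."""
    log_value = n * math.log10(base)
    exponent = math.floor(log_value)
    mantissa = 10 ** (log_value - exponent)  # lies in [1, 10)
    return mantissa, exponent


if __name__ == "__main__":
    mantissa, exponent = scientific_form_of_power(math.pi, 175)
    print(f"pi^175 is roughly {mantissa:.4f} * 10^{exponent}")
```

The fractional part of 175 log10 π is small but not zero, which is why the first digits drift slowly rather than repeating exactly.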
Bibliography
[1] F. Benford, The law of anomalous numbers, Proceedings of the American Philosophical So-
ciety 78 (1938), 551–572. http://www.jstor.org/discover/10.2307/984802?uid=3739552&
uid=2&uid=4&uid=3739256&sid=21103164625091.
[2] A. Berger and T. P. Hill, Benford online bibliography, http://www.benfordonline.net.
[3] A. Berger and T. P. Hill, A basic theory of Benford’s law, Probab. Surv. 8 (2011), 1–126,
DOI 10.1214/11-PS175. MR2846899
[4] A. Berger and T. P. Hill, An introduction to Benford’s law, Princeton University Press,
[5] B. Brown, M. Dairyko, S. R. Garcia, B. Lutz, and M. Someck, Four quotient set gems, Amer.
Math. Monthly 121 (2014), no. 7, 590–599, DOI 10.4169/amer.math.monthly.121.07.590.
MR3229105
[6] G. H. Hardy and E. M. Wright, An introduction to the theory of numbers, 6th ed., revised by
D. R. Heath-Brown and J. H. Silverman; with a foreword by Andrew Wiles, Oxford University
Press, Oxford, 2008. MR2445243
[7] M. Hindry and J. H. Silverman, Diophantine geometry: An introduction, Graduate Texts in
Mathematics, vol. 201, Springer-Verlag, New York, 2000. MR1745599
[8] L. Kuipers and H. Niederreiter, Uniform distribution of sequences, Pure and Applied
Mathematics, Wiley-Interscience [John Wiley & Sons], New York-London-Sydney, 1974.
MR0419394
[9] A. V. Kontorovich and S. J. Miller, Benford’s law, values of L-functions and the 3x + 1
problem, Acta Arith. 120 (2005), no. 3, 269–297, DOI 10.4064/aa120-3-4. http://arxiv.
org/abs/math/0412003. MR2188844
[10] S. J. Miller and M. J. Nigrini, The modulo 1 central limit theorem and Benford’s law for
products, Int. J. Algebra 2 (2008), no. 1-4, 119–130. MR2417189
[11] S. J. Miller (editor), The Theory and Applications of Benford’s Law, Princeton University
Press, 2015.
[12] R. A. Raimi, The first digit problem, Amer. Math. Monthly 83 (1976), no. 7, 521–538.
MR0410850
[13] M. F. Schilling, The longest run of heads, College Math. J. 21 (1990), no. 3, 196–207, DOI
10.2307/2686886. MR1070635
1939
The Power of Positive Thinking
Introduction
A student doing a homework problem has an enormous advantage over a re-
searcher: the problem is known to be solvable. This is especially true in undergrad-
uate and beginning graduate classes, in which assignments are meant to reinforce
lessons and help students learn techniques. It is hard to overstate how important
this is. It is a huge psychological boost to know a solution exists (let alone having
a sense of what methods will be useful in finding it).
There are many anecdotes and studies of people who were unaware of the diffi-
culty of a problem and who then proceeded to make great progress. The following
story and its variants have circulated for years and are the subject of this year’s
entry. We will meet the protagonist, George Dantzig (1914–2005), again in the
1947 entry. The quote below is from a 1986 interview [1]. He was asked why his
Ph.D. was on a statistics topic when he had taken so few statistics courses.
It happened because during my first year at Berkeley I arrived late
one day at one of Neyman’s classes. On the blackboard there were two
problems that I assumed had been assigned for homework. I copied
them down. A few days later I apologized to Neyman1 for taking so
long to do the homework—the problems seemed to be a little harder
to do than usual. I asked him if he still wanted it. He told me to
throw it on his desk. I did so reluctantly because his desk was covered
with such a heap of papers that I feared my homework would be lost
there forever. About six weeks later, one Sunday morning about eight
o’clock, Anne and I were awakened by someone banging on our front
door. It was Neyman. He rushed in with papers in hand, all excited:
“I’ve just written an introduction to one of your papers. Read it so I
can send it out right away for publication.” For a minute I had no idea
what he was talking about. To make a long story short, the problems
on the blackboard that I had solved thinking they were homework were
in fact two famous unsolved problems in statistics. That was the first
inkling I had that there was anything special about them.
Later in the interview he discusses how the story found its way into sermons.
The origin of that minister’s sermon can be traced to another Lutheran
minister, the Reverend Schuler of the Crystal Cathedral in Los Angeles.
Several years ago he and I happened to have adjacent seats on an
airplane. He told me his ideas about thinking positively, and I told him
my story about the homework problems and my thesis. A few months
later I received a letter from him asking permission to include my story
1 Jerzy Neyman (1894–1981) was Dantzig’s eventual thesis advisor.
in a book he was writing on the power of positive thinking. Schuler’s

published version was a bit garbled and exaggerated but essentially
correct. The moral of his sermon was this: If I had known that the
problems were not homework but were in fact two famous unsolved
problems in statistics, I probably would not have thought positively,
would have become discouraged, and would never have solved them.

Find the statements of the two problems Dantzig solved, read papers, and
believe in yourself when confronted with challenges in the future. To start you on
your journey, one of the papers is available at [2].
1939: Comments
The birthday problem. Another candidate for this year’s topic is the birth-
day problem. In 1939 Richard von Mises (1883–1953) posed the following problem,
which is a staple in most probability courses. How many people must there be in
a room before there is at least a 50% chance that two people share a birthday?
We give a quick discussion of this problem; see [4] for an expanded treatment and
additional questions.
The first step is to interpret what is going on. Normally people assume that all
birthdays are equally likely (and no one is born on February 29th). This assumption
is not always met. Malcolm Gladwell (1963– ) has a beautifully humorous passage
in his book Outliers [3], in which he investigates the distribution of birthdays among
Canadian junior hockey players. What often happens is that the young kids who
just miss the cutoff for a program are now the oldest and hence likely to be among
the biggest players. This is a tremendous advantage and this makes them look like
better players. They then get more attention, get on to special teams, and the
difference grows. In a telling passage, Gladwell substitutes the birthdays for the
players’ names:
It no longer sounds like the championship of Canadian junior hockey.
It now sounds like a strange sporting ritual for teenage boys born
under the astrological signs Capricorn, Aquarius, and Pisces. March
11 starts around one side of the Tigers’ net, leaving the puck for his
teammate January 4, who passes it to January 22, who flips it back
to March 12, who shoots point-blank at the Tigers’ goalie, April 27.
April 27 blocks the shot, but it’s rebounded by Vancouver’s March 6.
He shoots! Medicine Hat defensemen February 9 and February 14 dive
to block the puck while January 10 looks on helplessly. March 6 scores!
Back to the birthday problem. We assume that there are 365 days in each year
and that all days are equally likely. We use the law of complementary probability:
the probability that an event happens is one minus the probability that it does not
happen. The probability that among n people we have n different birthdays is

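\[
  q_n = \Bigl(1 - \frac{0}{365}\Bigr)\Bigl(1 - \frac{1}{365}\Bigr) \cdots \Bigl(1 - \frac{n-1}{365}\Bigr) = \prod_{k=0}^{n-1} \Bigl(1 - \frac{k}{365}\Bigr).
\]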
Indeed, the first person can have any birthday, the next person must avoid that
first birthday, then the subsequent person must miss those two days, and so on.
As we saw in the 1920 and 1934 entries, it is often profitable to take the
logarithm of a product. Thus, we consider

\[
  \log q_n = \sum_{k=0}^{n-1} \log\Bigl(1 - \frac{k}{365}\Bigr).
\]
If we choose $N$ so that $q_N \le 1/2$, then $1 - q_N \ge 1/2$; that is, the probability that
a birthday is shared among $N$ people is $\ge 1/2$. For small $x$, we use the Taylor
approximation $\log(1 - x) \approx -x$ and obtain
\[
  \log(1/2) \approx -\sum_{k=0}^{N-1} \frac{k}{365} = -\frac{(N-1)N}{2 \cdot 365} \approx -\frac{(N - 1/2)^2}{2 \cdot 365}
\]
and hence
\[
  N \approx \sqrt{-2 \cdot 365 \log(1/2)} + \frac{1}{2} = \sqrt{365 \log 4} + \frac{1}{2} = 22.994\ldots.
\]
Most people unfamiliar with the problem significantly underestimate the chance;
the probability is about 70% if there are 30 people, 89% with 40, and 97% with
50. More generally, if there were $D$ days in the year, we need about $\sqrt{D \log 4} + \frac{1}{2}$
people to have a 50% chance of at least one shared birthday.
How close is this approximation? Very close: the probability that among $n$
people at least two share the same birthday is $\ge 50\%$ if $n \ge 23$. This is sometimes
called the birthday paradox since the answer is strikingly different from the answer
to a seemingly similar problem: how many people are needed before there is a 50%
chance that someone shares my birthday? We need $N$ so large that
\[
  \Bigl(1 - \frac{1}{365}\Bigr)^{N} \le \frac{1}{2};
\]
this occurs first for $N = 253$ (if we had $D$ days in a year, we would find $N \approx D \log 2$).
The reason the two answers disagree by so much is that in one version any two
people may agree, while in the other someone must agree with a predetermined
person. Note the sharp difference in behavior: the first answer grows like $D^{1/2}$,
whereas the second grows linearly with $D$. In addition to being a source of
revenue for probability professors betting their students on the odds two members
in the class share a birthday, an interesting application is the birthday attack to
find collisions of hash functions in cryptography (see [5] and the references therein).
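Both versions of the problem, and the square-root approximation, can be checked directly. The sketch below assumes 365 equally likely birthdays, as in the discussion above; the function names are hypothetical.

```python
import math


def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    q = 1.0  # probability that all n birthdays are distinct
    for k in range(n):
        q *= 1 - k / days
    return 1 - q


def smallest_group_for_shared_birthday(target=0.5, days=365):
    """Smallest n with P(some pair shares a birthday) >= target."""
    n = 1
    while prob_shared_birthday(n, days) < target:
        n += 1
    return n


def smallest_group_matching_fixed_birthday(target=0.5, days=365):
    """Smallest N with P(someone among N matches one fixed birthday) >= target."""
    n = 1
    while 1 - (1 - 1 / days) ** n < target:
        n += 1
    return n


if __name__ == "__main__":
    print(smallest_group_for_shared_birthday())        # any two people agree
    print(smallest_group_matching_fixed_birthday())    # agree with a fixed person
    print(math.sqrt(365 * math.log(4)) + 0.5)          # approximation from the text
    for n in (30, 40, 50):
        print(n, round(prob_shared_birthday(n), 3))
```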
The zeta function and relatively prime integers. Now that we have de-
veloped a bit of the theory behind the Riemann zeta function (see the 1928 and
1933 entries) here is another probability gem that we cannot resist. What is the
probability that two randomly chosen integers a, b are relatively prime?
To begin, let us note that gcd(a, b) = 1 if and only if a and b have no prime
factors in common. In other words, no prime number p divides both a and b. There
is only a 1/4 chance that both $a$ and $b$ are divisible by 2. Therefore there is a
\[
  1 - \frac{1}{4} = 1 - \frac{1}{2^2}
\]
chance that 2 is not a common divisor of $a$ and $b$. Similarly, there is a
\[
  1 - \frac{1}{9} = 1 - \frac{1}{3^2}
\]
chance that 3 is not a common divisor of $a$ and $b$. In particular, the chance that
neither 2 nor 3 is a common factor of $a$ and $b$ is
\[
  \Bigl(1 - \frac{1}{2^2}\Bigr)\Bigl(1 - \frac{1}{3^2}\Bigr).
\]
Proceeding in this manner for all primes, the Euler product formula (1933.3) and
the solution (1919.2) to the Basel problem suggest the probability that a and b
share no common prime factors is
\[
  \prod_{p \text{ prime}} \Bigl(1 - \frac{1}{p^2}\Bigr) = \frac{1}{\zeta(2)} = \frac{6}{\pi^2} = 0.6079\ldots \approx 60.8\%.
\]
This probabilistic result can be seen in actual computations. For example, in
10 seconds on a desktop computer, Mathematica generated $10^6$ random pairs $(a, b)$
of integers belonging to the interval $[-10^{16}, 10^{16}]$ and computed $\gcd(a, b)$. Of these
pairs, approximately 0.6074 ≈ 60.7% satisfied $\gcd(a, b) = 1$. This is remarkably
close to the true value $6/\pi^2 \approx 60.8\%$.
Is our preceding reasoning sound? After all, there is no uniform probability
measure on the integers. Indeed, if every integer had the same probability of being
selected, then these infinitely many identical probabilities would have to sum
to 1, which is impossible since such a sum is either 0 or infinite (a similar argument arose at the end of the comments
portion of the 1924 entry). Thus, we need to be careful about what is meant by
“randomly chosen integers.” A more precise version of the problem (for which our
answer is the correct one) is, “What is the limit as N → ∞ of the probability that
two randomly chosen integers a, b with |a|, |b| ≤ N are relatively prime?”
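The precise version of the question lends itself to an exact count rather than a random simulation. The sketch below is an illustration under a simplifying assumption: it enumerates only positive pairs 1 ≤ a, b ≤ N, which has the same limiting proportion as the symmetric range in the text. The function name is hypothetical.

```python
import math


def coprime_fraction(N):
    """Fraction of pairs (a, b) with 1 <= a, b <= N and gcd(a, b) = 1."""
    coprime = sum(
        1
        for a in range(1, N + 1)
        for b in range(1, N + 1)
        if math.gcd(a, b) == 1
    )
    return coprime / N ** 2


if __name__ == "__main__":
    for N in (10, 100, 1000):
        print(N, coprime_fraction(N))
    print("6/pi^2 =", 6 / math.pi ** 2)
```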
Bibliography
[1] D. J. Albers and C. Reid, An interview with George B. Dantzig: the father of linear program-
ming, College Math. J. 17 (1986), no. 4, 293–314, DOI 10.2307/2686279. http://www.jstor.
org/stable/2686279. MR856311
[2] G. B. Dantzig, On the non-existence of tests of “Student’s” hypothesis having power functions
independent of σ, Ann. Math. Statistics 11 (1940), 186–192, DOI 10.1214/aoms/1177731912.
http://projecteuclid.org/download/pdf_1/euclid.aoms/1177731912. MR0002082
[3] M. Gladwell, Outliers: The story of success (reprint edition), Back Bay Books, 2011.
[5] R. Niebuhr, P.-L. Cayrel, and J. Buchmann, Improving the efficiency of generalized birth-
day attacks against certain structured cryptosystems, published in WCC 2011—Workshop on
coding and cryptography (2011), 163–172. https://www.cdc.informatik.tu-darmstadt.de/
reports/reports/GBA-final2.pdf.
1940
A Mathematician’s Apology
Introduction
One of the most important parts of an academic’s job is mentoring the next gen-
eration. Some have written extensively to share the lessons they have learned. One
of the most prolific is Steven G. Krantz (1951– ), whose titles include
A Mathematician’s Survival Guide: Graduate School and Early Career Develop-
ment; A Primer of Mathematical Writing: Being a Disquisition on Having Your
Ideas Recorded, Typeset, Published, Read and Appreciated; How to Teach Mathematics;
A TeX Primer for Scientists; and The Survival of a Mathematician: From
Tenure to Emeritus. These books give a nice sample of the issues, challenges, and
rewards that lie ahead (the last is available online [9]; all can be purchased for
reasonable amounts).
Although there are many authors and texts to mention, this entry highlights
Godfrey Harold Hardy’s A Mathematician’s Apology, first published in 1940 [7].
While many books discuss the challenges and rewards of being a mathematician, his
work is a reflection on his life and whether or not it was well spent. Mathematically
it surely was, since he was responsible for numerous advances and new techniques.
Regarding his life, Hardy considered it to be a success in terms of the happiness
and comfort that he found, but the question remained as to the “triviality” of his
life. He resolved it accordingly:
The case for my life. . . is this: that I have added something to knowl-
edge, and helped others to add more; and that these somethings have
a value which differs in degree only, and not in kind, from that of the
creations of the great mathematicians, or of any of the other artists,
great or small, who have left some kind of memorial behind them.
Because of the influence of Hardy’s writing and work, we devote the entire entry
to him. This is not meant to imply that there were no significant results proved in
1940. One natural candidate is Kurt Gödel’s proof [5] of the relative consistency of
the axiom of choice with the Zermelo–Fraenkel axioms of set theory; see the entry
from 1963 for the rest of the story.
A well-known passage from the Apology proclaims:
I have never done anything “useful.” No discovery of mine has made,
or is likely to make, directly or indirectly, for good or ill, the least differ-
ence to the amenity of the world. . . . Judged by all practical standards,
the value of my mathematical life is nil; and outside mathematics it is
trivial anyhow.
Then it might come as a surprise that Hardy is best known to the world for his work
in genetics. His fame stems from a condescending letter to the editor in Science
on the stability of genotype distributions from one generation to the next [6]; see
Figure 1. The result was independently found by the German physician Wilhelm
Weinberg (1862–1937) and is now known as the Hardy–Weinberg law; see [1] for
more details.

Figure 1. Hardy’s note in Science in which he lays out what is now known as the Hardy–Weinberg law [6].
During a lecture by Reginald Crundall Punnett (1875–1967) of Punnett square
fame, the statistician Udny Yule (1871–1951) asked about the behavior of the ratio
of dominant to recessive traits over time. Why does the population not tend towards
the dominant trait over time? Punnett brought the problem to his friend and cricket
companion Hardy; see [3, 4] for more details.
Using only “mathematics of the multiplication-table type”, under natural con-
ditions Hardy proved that there is an equilibrium at which the ratio of different
genotypes remains constant over time. The mathematical content of the letter can
be summarized in one line:
\[
  (p + q)^2 = p^2 + 2pq + q^2.
\]
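The equilibrium can be seen in a few lines of code. This is a sketch of the standard textbook model, assuming random mating, no selection, and a single gene with two alleles; the function name and the sample frequencies are illustrative and do not come from Hardy's note.

```python
def next_generation(AA, Aa, aa):
    """Genotype frequencies after one round of random mating.

    AA, Aa, aa are the current genotype frequencies and must sum to 1.
    """
    p = AA + Aa / 2  # frequency of the dominant allele
    q = aa + Aa / 2  # frequency of the recessive allele
    return p * p, 2 * p * q, q * q


if __name__ == "__main__":
    generation = (0.5, 0.2, 0.3)  # arbitrary starting frequencies
    for step in range(4):
        print(step, tuple(round(x, 6) for x in generation))
        generation = next_generation(*generation)
```

After one generation the frequencies are p², 2pq, q², and applying the map again leaves them unchanged, since p² + pq = p(p + q) = p.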
The following passage from his note gives a good sense of its tone.
I am reluctant to intrude in a discussion concerning matters of which I
have no expert knowledge, and I should have expected the very simple
point which I wish to make to have been familiar to biologists. . . .
There is not the slightest foundation for the idea that a dominant
character should show a tendency to spread over a whole population,
or that a recessive should tend to die out. [6]
In an obituary of Hardy, Edward Charles Titchmarsh (1899–1963) states that Hardy
“attached little weight to it” [12]. However, its prevalence in introductory biology
texts demonstrates the importance of the Hardy–Weinberg law. One commentator

conjectured:
It must have embarrassed him that his mathematically most trivial
paper is not only far and away his most widely known, but has been
of such distastefully practical value. He published this paper not in
the obvious place, Nature, but across the Atlantic in Science. Why?
. . . I would like to think that he didn’t want it to be seen by his math-
ematician colleagues. [2]

Read the masters! Pull up Riemann’s original paper [11] or some article in a
field that strikes your fancy. Read the rest of A Mathematician’s Apology or other
similar books. Browse some math blogs. We are fortunate to live in a time when
the only cost of posting and publishing certain types of information is the time
it takes to write it. The AMS has a great blog for graduate students at http://
blogs.ams.org/mathgradblog/. Many people make career decisions by following
paths of least resistance; really think about what you want to do. Do not just go
with the flow; make as informed a decision as you can.
1940: Comments
More about Hardy. Hardy lived through World War I and the Apology was
written at the start of the Second World War. Much of his pride in the uselessness
of his work stemmed from the fact that he was not contributing to violence and
war.
But here I must deal with a misconception. It is sometimes suggested
that pure mathematicians glory in the uselessness of their work. If the
theory of numbers could be employed for any practical and obviously
honorable purpose, if it could be turned directly to the furtherance of
human happiness or the relief of human suffering. . . then surely neither
Gauss nor any other mathematician would have been so foolish as to
decry or regret such applications. But science works for evil as well as
for good (and particularly, of course, in time of war). . . . [7]
Interestingly, what seems useless and pure in one era can become useful and applied
a short time later. Hardy’s own work provides an excellent example, where much of
elementary number theory (as well as advanced results on L-functions) now plays
an important role in cryptography; see the 1921 and 1977 entries.
Of course, Hardy is perhaps best known (in the mathematical community at
any rate) for his collaborations with Littlewood and Ramanujan. On this Hardy
says:
I still say to myself when I am depressed and find myself forced to listen
to pompous and tiresome people, “Well, I have done one thing you
could never have done, and that is to have collaborated with Littlewood
and Ramanujan on something like equal terms.” [7]
The 2015 film “The Man Who Knew Infinity,” based upon the outstanding biogra-
phy of Ramanujan by Robert Kanigel (1946– ) [8], depicts some of Hardy’s many
quirks and his working relationship with the great Ramanujan; see Figure 2.
Figure 2. A scene from the 2015 movie “The Man Who Knew
Infinity.” S. Ramanujan (left) speaks with J. E. Littlewood (right)
as G. H. Hardy (middle) observes. Ramanujan, Littlewood, and
Hardy are played by Dev Patel (1990– ), Toby Jones (1966– ), and
Jeremy Irons (1948– ), respectively.
Bibliography
[1] H. E. Christenson and S. R. Garcia, G. H. Hardy: mathematical biologist, J. Humanist. Math.
5 (2015), no. 2, 96–102, DOI 10.5642/jhummath.201502.08. http://scholarship.claremont.
edu/cgi/viewcontent.cgi?article=1273&context=jhm. MR3378780
[2] J. F. Crow, Eighty years ago: the beginnings of population genetics, Genetics 119 (1988), no. 3,
473–476.
[3] A. W. F. Edwards, G. H. Hardy (1908) and Hardy–Weinberg equilibrium, Genetics
179 (2008), no. 3, 1143–1150. http://genetics.org/content/179/3/1143.
[4] C. R. Fletcher, G. H. Hardy—applied mathematician, Bull. Inst. Math. Appl. 16 (1980),
no. 2-3, 61–67. MR576086
[5] K. Gödel, The Consistency of the Continuum Hypothesis, Annals of Mathematics Studies,
no. 3, Princeton University Press, Princeton, N. J., 1940. MR0002514
[6] G. H. Hardy, Mendelian proportions in a mixed population, Science 28 (1908), 49–50. http://
www.esp.org/foundations/genetics/classical/hardy.pdf.
[7] G. H. Hardy, A mathematician’s apology, with a foreword by C. P. Snow; reprint of the 1967
edition, Canto, Cambridge University Press, Cambridge, 1992. MR1148590
[8] R. Kanigel, The man who knew infinity: A life of the genius Ramanujan, Charles Scribner’s
Sons, New York, 1991. MR1113890
[9] S. G. Krantz, The survival of a mathematician: From tenure-track to emeritus, Ameri-
can Mathematical Society, Providence, RI, 2009. http://www.math.wustl.edu/~sk/books/
newsurv.pdf. MR3309302
[10] R. C. Punnett, Early days of genetics, Heredity 4 (1950), no. 1, 1–10.
[11] B. Riemann, On the number of prime numbers less than a given quantity, Monatsberichte der
Königlich Preußischen Akademie der Wissenschaften zu Berlin, 1859. http://www.claymath.
org/sites/default/files/ezeta.pdf.
[12] E. C. Titchmarsh, Obituary: Godfrey Harold Hardy (1877–1947), Obit. Notices Roy. Soc.
London 6 (1949), 447–461 (1 plate). MR0037796
1941
The Foundation Trilogy
Introduction
On August 1, 1941, Isaac Asimov visited John Campbell, editor of Astounding
Science Fiction. The meeting led to the creation of the Foundation series, one of
the most influential science-fiction series of all time. The story is modeled on the
celebrated The History of the Decline and Fall of the Roman Empire by Edward
Gibbon (1737–1794) and tells the story of how the Galactic Empire will fall and
30,000 years of anarchy will reign before a new empire arises.1 Hari Seldon develops
the mathematical theory of psychohistory. Inspired by statistical mechanics, the
Foundation series postulates that it is possible to mathematically predict the general
behavior of galactic populations with high precision (despite the fact that it is
impossible to predict the behavior of specific individuals). While it is too late
to stop the fall, Hari and his colleagues analyze the equations and take steps to
minimize its impact, so that a new empire will rise after just a thousand years.
Asimov is but one of many science-fiction writers whose work has inspired
scientists and engineers. NASA seriously considered adopting the Star Trek logo;
while that never happened, the first shuttle was named Enterprise.
Of course, this is not meant to imply that science fiction always gets the math
right. In the 1989 Star Trek: The Next Generation episode The Royale, Captain
Jean-Luc Picard claims that Fermat’s last theorem is still unresolved after 800
years;2 it was proved by Andrew Wiles (1953– ) in 1994 (see the 1995 entry). The
2010 Doctor Who episode The Eleventh Hour is notable for conflating anecdotes
about the mathematicians Pierre de Fermat (1607–1665) and Évariste Galois (1811–
1832). On the other hand, the 1981 Doctor Who story Logopolis and the 1982 story
Castrovalva (named after an M. C. Escher lithograph) involve mathematics, in a
vague but fascinating sense, as part of the plot.

One of the most famous quotes in Asimov’s original trilogy is “A circle has
no end.” In case you are not familiar with the story, we will not spoil it for you
by divulging its meaning in the work. While a circle has no end, it does have
a perimeter and an area. Consider the following generalization. Find the area
enclosed by the ellipse
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \]
Now find its perimeter.
The first problem is often covered in multivariable calculus. The second prob-
lem has been studied by many mathematicians and its solution touches on many
fields. This conundrum illustrates that for many problems the boundary is harder
to deal with than the interior.
1 Can you think of a Roman Emperor who was captured in battle? Can you find a Fields
Medalist with that middle name?
2 Although the 1995 Star Trek: Deep Space 9 episode Facets refers to Wiles’s proof.
1941: Comments
Elliptical reasoning. If we let u = x/a and v = y/b, then the equation of the
ellipse, in uv-space, becomes the equation of the unit circle centered at the origin;
the area element dx dy becomes ab du dv. This change of variables yields
\[ \iint_{(x/a)^2 + (y/b)^2 \le 1} 1 \, dx \, dy = \iint_{u^2 + v^2 \le 1} 1 \cdot ab \, du \, dv = \pi ab \]
since the area of the unit circle is π. If a = b = r, then the area is $\pi r^2$, as expected.
A similar calculation shows that the volume of the ellipsoid
\[ (x/a)^2 + (y/b)^2 + (z/c)^2 \le 1 \]
is $\frac{4}{3}\pi abc$, since the unit ball has volume $\frac{4}{3}\pi$. Computing the perimeter of an ellipse is a different story; see [1] for a
discussion and solution.
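As a rough numerical companion, the following Python sketch compares the closed-form area πab with a midpoint-rule estimate, and approximates the perimeter from the standard arc-length integral $4\int_0^{\pi/2} \sqrt{a^2 \sin^2 t + b^2 \cos^2 t}\, dt$ for the parametrization (a cos t, b sin t). The function names and step counts are our own illustrative choices; this brute-force quadrature is not the method discussed in [1].

```python
import math


def ellipse_area(a, b):
    """Closed-form area of the region (x/a)^2 + (y/b)^2 <= 1."""
    return math.pi * a * b


def ellipse_area_numeric(a, b, steps=20_000):
    """Midpoint-rule estimate of 4 * integral_0^a b*sqrt(1 - (x/a)^2) dx."""
    h = a / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += b * math.sqrt(1.0 - (x / a) ** 2)
    return 4.0 * total * h


def ellipse_perimeter_numeric(a, b, steps=20_000):
    """Midpoint-rule estimate of the arc-length integral over one quadrant, times 4."""
    h = (math.pi / 2) / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += math.sqrt((a * math.sin(t)) ** 2 + (b * math.cos(t)) ** 2)
    return 4.0 * total * h


print(ellipse_area(3, 2), ellipse_area_numeric(3, 2))
print(ellipse_perimeter_numeric(3, 2))
```

For a = b = r the perimeter estimate reduces to 2πr, which is a convenient sanity check. For a ≠ b the integral has no elementary closed form, which is one way to see why the boundary is harder than the interior.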
Fourier series. Of course, 1941 witnessed many mathematical innovations
that are worthy of our attention. We focus on a famous theorem of Norbert Wiener
(1894–1964) about absolutely convergent Fourier series. Before tackling Wiener’s
theorem, we need to talk about Fourier series. This is a subject that every mathe-
matics student should learn about. Under certain circumstances one can approxi-
mate a function f : [−π, π] → R by the partial sums of its Fourier series
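\[ \frac{a_0}{\sqrt{2}} + \sum_{n=1}^{\infty} (a_n \cos nt + a_{-n} \sin nt), \tag{1941.1} \]
in which the Fourier coefficients are given by
\[ a_{-n} = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \sin nt \, dt, \qquad a_0 = \frac{1}{\pi} \int_{-\pi}^{\pi} \frac{f(t)}{\sqrt{2}} \, dt, \]
and
\[ a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \cos nt \, dt \]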
for n ∈ N. The motivation stems from the study of waves. A 2π-periodic function
f : R → R can be regarded as a function f : [−π, π] → R since the values of f
on [−π, π] determine the values of f everywhere. Under certain circumstances, one
hopes to express f as a superposition of simple sine and cosine waves. The integrals
defining the “amplitudes” an act as “filters” that isolate the component of f that
has “frequency” n.
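The "filter" interpretation can be tried numerically. The sketch below estimates the coefficients in the normalization of (1941.1) with a midpoint rule; the function name, the sign convention for passing n (positive for cosine, negative for sine), and the sample signal are our own assumptions for illustration.

```python
import math


def fourier_coefficient(f, n, steps=20_000):
    """Midpoint-rule estimate of a_n in the normalization of (1941.1).

    n > 0 gives the cosine coefficient a_n, n < 0 gives the sine
    coefficient a_{-|n|}, and n = 0 gives a_0 (note the 1/sqrt(2) weight).
    """
    h = 2 * math.pi / steps
    total = 0.0
    for k in range(steps):
        t = -math.pi + (k + 0.5) * h
        if n > 0:
            weight = math.cos(n * t)
        elif n < 0:
            weight = math.sin(-n * t)
        else:
            weight = 1 / math.sqrt(2)
        total += f(t) * weight
    return total * h / math.pi


def signal(t):
    """A sample wave: constant 1, plus frequency 2 (cosine) and frequency 4 (sine)."""
    return 1 + 3 * math.cos(2 * t) + 5 * math.sin(4 * t)


for n in (0, 2, 3, -4):
    print(n, round(fourier_coefficient(signal, n), 6))
```

Each integral isolates exactly one component of the sample wave and returns (numerically) zero for frequencies that are absent. The constant term is recovered as $a_0/\sqrt{2}$.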
A typical result in the area, used all the time by electrical engineers, is the
following. Let f : R → R be a periodic function with period 2π. Suppose that
f and f′ are both piecewise continuous on [−π, π] and that f(−π) = f(π) and
f′(−π) = f′(π). If f is continuous at t, then the Fourier series (1941.1) converges
to f(t). If f has a jump discontinuity at t, then the Fourier series (1941.1) converges
to the midpoint $\frac{1}{2}(f(t^+) + f(t^-))$ of the gap [4, 2.3.10]. This result is of practical
value, since it ensures that “nice” waves can be studied using sines and cosines.
Figure 1. The graph of the square-wave function (1941.2) and the Fourier approximation
$\frac{\pi}{2} + 2 \sin t + \frac{2}{3} \sin 3t + \frac{2}{5} \sin 5t + \frac{2}{7} \sin 7t + \frac{2}{9} \sin 9t$.
Leibniz’s series for π/4. Here is a cute example of Fourier series in action.
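Consider the square-wave function f : [−π, π] → R defined by
\[ f(t) = \begin{cases} 0 & \text{if } -\pi < t < 0, \\ \pi & \text{if } 0 < t < \pi, \\ \frac{\pi}{2} & \text{if } t = 0 \text{ or } t = \pm\pi. \end{cases} \tag{1941.2} \]
Since f and f′ are piecewise continuous, one can show that
\[ f(t) = \frac{a_0}{\sqrt{2}} + \sum_{n=1}^{\infty} (a_n \cos nt + a_{-n} \sin nt) = \frac{\pi}{2} + 2 \sum_{n=1}^{\infty} \frac{\sin[(2n-1)t]}{2n-1} \]
for all t; see Figure 1. Since $f(\frac{\pi}{2}) = \pi$, it follows that
\[ \pi = \frac{\pi}{2} + 2 \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{2n-1}, \]
which can be rearranged to yield the famous series
\[ 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots = \frac{\pi}{4}, \]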
discovered in 1674 by Gottfried Wilhelm Leibniz (1646–1716).
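The convergence can be watched numerically. This short Python sketch evaluates partial sums of the square-wave series at t = π/2 (where f is continuous) and at the jump t = 0, and also sums Leibniz's series directly. The function names and term counts are illustrative assumptions.

```python
import math


def square_wave_partial_sum(t, terms):
    """pi/2 + 2 * sum_{n=1}^{terms} sin((2n-1)t)/(2n-1)."""
    return math.pi / 2 + 2 * sum(
        math.sin((2 * n - 1) * t) / (2 * n - 1) for n in range(1, terms + 1)
    )


def leibniz_partial_sum(terms):
    """1 - 1/3 + 1/5 - ... with the given number of terms."""
    return sum((-1) ** (n + 1) / (2 * n - 1) for n in range(1, terms + 1))


print(square_wave_partial_sum(math.pi / 2, 10_000))  # slowly approaches pi
print(square_wave_partial_sum(0.0, 10_000))          # the midpoint pi/2 of the jump
print(4 * leibniz_partial_sum(100_000))              # slowly approaches pi
```

The convergence is slow: the error of Leibniz's series after N terms is roughly 1/(4N), so it is a charming formula but a poor way to compute π.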
Gelfand’s proof of Wiener’s 1/f theorem. A familiar theme in this book
is that everything real is better complex. Let
\[ \mathbb{T} = \{z \in \mathbb{C} : |z| = 1\} = \{e^{it} : t \in [-\pi, \pi]\} \]
denote the unit circle in the complex plane. By identifying the interval [−π, π] with
T, we may regard a 2π-periodic complex-valued function as a function f : T → C.
This is the natural setting for the study of Fourier series, in which one attempts to
find series expansions of the form $\sum_{n \in \mathbb{Z}} c_n e^{int}$ for $f(e^{it})$, in which
\[ c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) e^{-int} \, dt. \]
The advantage of this approach is that we can work entirely with exponential
functions, which are much easier to deal with than sines and cosines. For instance,
do you remember the trigonometric identities for cos(x + y) and sin(x + y)? If not,
see the footnote on p. 8. However, you certainly know that $e^{x+y} = e^x e^y$.
Wiener’s 1/f theorem asserts that if f : T → C has an absolutely convergent
Fourier series $f(e^{it}) = \sum_{n \in \mathbb{Z}} a_n e^{int}$ and if f does not vanish on T, then 1/f has
an absolutely convergent Fourier series. The original proof by Norbert Wiener
(1894–1964) from 1932 was delicate and technical (around 100 pages) [7]. It is
often described as a tour-de-force of “hard analysis.” Using the theory of Banach
algebras, Israel Gelfand (1913–2009) gave a “soft” proof of Wiener’s theorem in
1941 [5] that requires only a few pages.
To discuss Gelfand’s proof, we need to view the problem through the lens of
Banach algebras [2]. The Wiener algebra W is the set of all functions f : T → C
of the form $f(e^{it}) = \sum_{n \in \mathbb{Z}} a_n e^{int}$ for which
\[ \|f\|_W = \sum_{n \in \mathbb{Z}} |a_n| < \infty. \]
It turns out that W is closed under pointwise addition and multiplication. It can be
endowed with the norm $\|\cdot\|_W$ and metric $d_W(f, g) = \|f - g\|_W$, with respect to which
it is a complete metric space. In particular, W consists of those functions whose
Fourier series are absolutely convergent. Each f ∈ W is continuous, since it is the
uniform limit on T of the continuous functions $\sum_{n=-N}^{N} a_n e^{int}$.
The Wiener algebra is an example of a commutative Banach algebra. That
is, it is a complete (in the topological sense) normed vector space endowed with
addition and a commutative multiplication that satisfy several natural axioms (for
instance, the norm is submultiplicative: $\|fg\|_W \le \|f\|_W \|g\|_W$).
The general theory of commutative Banach algebras, developed by Gelfand,
tells us that f ∈ W is invertible in W if and only if χ(f) ≠ 0 for every character χ
of W. In this context, a character is a nonzero multiplicative linear function χ : W → C.
That is, χ is a nonzero complex-valued linear map on W that satisfies χ(fg) = χ(f)χ(g)
for all f, g ∈ W. Gelfand showed that characters on commutative Banach algebras
are contractive; so $|\chi(f)| \le \|f\|_W$ for all f ∈ W. In particular, every character on
W is continuous since $|\chi(f) - \chi(g)| = |\chi(f - g)| \le \|f - g\|_W$.
Let $z = e^{it}$ and observe that $z^n \in W$ and $\|z^n\|_W = 1$ for all n ∈ Z. Suppose
that χ : W → C is a character and that χ(z) = λ. Then
\[ |\lambda|^n = |\lambda^n| = |\chi(z^n)| \le \|z^n\|_W = 1 \]
for all n ∈ Z, so |λ| = 1. Thus, for each character χ, there is a unique $e^{i\alpha} \in \mathbb{T}$ so
that $\chi(z) = e^{i\alpha}$. Consequently,
\[ \chi(z^n) = \chi(z)^n = e^{in\alpha} \]
for all n ∈ Z. Thus, χ simply evaluates the function $z^n$ at $e^{i\alpha}$. If $f = \sum_{n \in \mathbb{Z}} c_n z^n \in W$,
then the continuity of χ ensures that
\[ \chi(f) = \sum_{n \in \mathbb{Z}} c_n \chi(z^n) = \sum_{n \in \mathbb{Z}} c_n e^{in\alpha} = f(e^{i\alpha}). \]
If f does not vanish on T, then χ(f) ≠ 0 for every character χ of W. Thus, f is
invertible in W, so 1/f ∈ W as claimed.
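A toy computation may make the Wiener algebra less abstract. In the sketch below, an element of W with finitely many nonzero coefficients is stored as a dictionary {n: a_n}. We check that the norm is submultiplicative, that evaluation at a point of T is multiplicative, and that f = 1 − z/2 (which does not vanish on T) has the absolutely summable inverse $\sum_{n \ge 0} 2^{-n} z^n$. The dictionary representation, the function names, and the choice of f are our own illustrative assumptions; nothing here reproduces Gelfand's argument.

```python
import cmath


def wiener_norm(f):
    """Sum of the absolute values of the Fourier coefficients."""
    return sum(abs(c) for c in f.values())


def multiply(f, g):
    """Pointwise product of two trigonometric polynomials (convolve coefficients)."""
    product = {}
    for m, a in f.items():
        for n, b in g.items():
            product[m + n] = product.get(m + n, 0) + a * b
    return product


def evaluate(f, alpha):
    """The character 'evaluate at e^{i alpha}' applied to f."""
    return sum(c * cmath.exp(1j * n * alpha) for n, c in f.items())


f = {0: 1.0, 1: -0.5}                         # f = 1 - z/2, nonvanishing on T
g = {n: 0.5 ** n for n in range(40)}          # truncated geometric series for 1/f
fg = multiply(f, g)

print(wiener_norm(fg) <= wiener_norm(f) * wiener_norm(g))      # submultiplicative
print(abs(evaluate(fg, 0.7) - evaluate(f, 0.7) * evaluate(g, 0.7)))
error = dict(fg)
error[0] -= 1                                 # fg - 1
print(wiener_norm(error))                     # equals 2**-40 for this truncation
```

Here the inverse can be written down by hand. The content of Wiener's theorem is that such an absolutely summable inverse exists for every nonvanishing f in W, even when no formula is available.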
Bibliography
[1] S. Adlaj, An eloquent formula for the perimeter of an ellipse, Notices Amer. Math. Soc.
59 (2012), no. 8, 1094–1099, DOI 10.1090/noti879. http://www.ams.org/notices/201208/
rtx120801094p.pdf. MR2985810
[2] W. Arveson, A short course on spectral theory, Graduate Texts in Mathematics, vol. 209,
Springer-Verlag, New York, 2002. MR1865513
[3] I. Asimov, Foundation, Gnome Press, 1951.
[4] R. Bhatia, Fourier series, reprint of the 1993 edition [Hindustan Book Agency, New Delhi,
MR1657675], Classroom Resource Materials Series, Mathematical Association of America,
Washington, DC, 2005. MR2108537
[5] I. M. Gelfand, Normierte Ringe, Mat. Sbornik N.S. 9 (1941), no. 51, 3–24. http://www.mathnet.
ru/links/c2c9f3ffc009b4ac303540de01718e1e/sm6046.pdf.
[6] D. J. Newman, A simple proof of Wiener’s 1/f theorem, Proc. Amer. Math. Soc. 48
(1975), 264–265, DOI 10.2307/2040730. http://www.ams.org/journals/proc/1975-048-01/
S0002-9939-1975-0365002-8/. MR0365002
[7] N. Wiener, Tauberian theorems, Ann. of Math. (2) 33 (1932), 1-100. http://www.jstor.org/
stable/1968102?origin=crossref&seq=1#page_scan_tab_contents.
1942
Zeros of ζ(s)
Introduction
The Riemann zeta function is perhaps the most important function in number
theory; see the 1928, 1933, 1939, 1942, 1945, 1967, and 1987 entries. It is initially
defined for Re s > 1 by the series
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}. \]
The Euler product formula (1933.3) is the product representation
\[ \zeta(s) = \prod_{p \text{ prime}} \left(1 - \frac{1}{p^s}\right)^{-1}, \]
also valid for Re s > 1; see the 1933 entry for a proof. Although Euler and others
studied the zeta function first, it is named after Georg Friedrich Bernhard Riemann
(1826–1866) because of his 1859 masterpiece that relates the distribution of the
zeros of ζ(s) to the fine properties of the prime-counting function π(x) [8].
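The agreement between the series and the Euler product can be checked numerically at s = 2, where both should approach $\pi^2/6$. The following is a minimal sketch with a simple sieve; the function names and truncation limits are our own choices.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # Cross out the multiples p*p, p*p + p, ... of the prime p.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(2, limit + 1) if sieve[n]]


def zeta_series(s, terms):
    """Partial sum of sum_{n >= 1} n^{-s}; meaningful only for Re s > 1."""
    return sum(n ** (-s) for n in range(1, terms + 1))


def zeta_euler_product(s, prime_limit):
    """Partial Euler product over primes p <= prime_limit; also needs Re s > 1."""
    product = 1.0
    for p in primes_up_to(prime_limit):
        product *= 1 / (1 - p ** (-s))
    return product


print(zeta_series(2, 100_000))
print(zeta_euler_product(2, 10_000))
print(math.pi ** 2 / 6)
```

Both truncations agree with $\pi^2/6$ to several decimal places. Neither expression is usable for Re s ≤ 1, which is exactly the difficulty the text turns to next.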
The Euler product formula confirms that the zeta function has no zeros in the
half plane Re s > 1. However, neither the series nor the product representation
given above converges if Re s ≤ 1. So what do we mean by the zeros of ζ(s)?
To resolve this issue and to understand Riemann’s contribution, we must discuss
analytic continuation.
An analytic function is a differentiable function f : U → C defined on a
nonempty, connected open set U ⊆ C. By “differentiable,” we mean that
\[ f'(z_0) = \lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0} \]
exists for every $z_0 \in U$. This is the complex version of the single-variable calculus
definition. For instance, the zeta function is analytic on Re s > 1 with derivative
\[ \zeta'(s) = -\sum_{n=1}^{\infty} \frac{\log n}{n^s}. \]
An analytic continuation of an analytic function f : U → C is an analytic function
g : V → C, defined on an open set V that contains U, so that f and g agree on U.
That is, g is an analytic “extension” of f to the larger set V.
An example of analytic continuation involves the geometric series. The sum-
mation formula
\[ \sum_{n=0}^{\infty} z^n = \frac{1}{1-z} \tag{1942.1} \]
is valid for |z| < 1.¹ There is an important asymmetry in (1942.1): the series
converges only for |z| < 1, whereas the function $(1 - z)^{-1}$ is defined for all z ≠ 1.
Thus, $(1 - z)^{-1}$ provides an analytic continuation of $\sum_{n=0}^{\infty} z^n$ from the open disk
|z| < 1 to the much larger region C\{1}.
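The asymmetry is easy to observe numerically. In this small sketch (the function name and sample points are our own), the partial sums track $(1 - z)^{-1}$ inside the unit disk and blow up outside it, even though $(1 - z)^{-1}$ itself is perfectly well defined there.

```python
def geometric_partial_sum(z, terms):
    """1 + z + z^2 + ... + z^(terms - 1), for real or complex z."""
    total, power = 0, 1
    for _ in range(terms):
        total += power
        power *= z
    return total


inside = 0.5 + 0.25j   # |z| < 1: the series converges
outside = 2            # |z| > 1: the series diverges

print(geometric_partial_sum(inside, 200), 1 / (1 - inside))
print(geometric_partial_sum(outside, 10), 1 / (1 - outside))
```

At z = 2 the partial sums are $2^N - 1$, which have nothing to do with the value $(1 - 2)^{-1} = -1$ assigned by the analytic continuation.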
Obtaining an analytic continuation of the zeta function is more difficult. We
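first construct an analytic continuation to Re s > 0. Observe that, for Re s > 1,
\begin{align*}
\zeta(s) - \frac{1}{s-1} &= \sum_{n=1}^{\infty} n^{-s} - \int_1^{\infty} x^{-s} \, dx = \sum_{n=1}^{\infty} n^{-s} - \sum_{n=1}^{\infty} \int_n^{n+1} x^{-s} \, dx \\
&= \sum_{n=1}^{\infty} \int_n^{n+1} \left( n^{-s} - x^{-s} \right) dx \\
&= \sum_{n=1}^{\infty} s \int_n^{n+1} \int_n^x y^{-1-s} \, dy \, dx. \tag{1942.2}
\end{align*}
Since
\[ \left| s \int_n^{n+1} \int_n^x y^{-1-s} \, dy \, dx \right| \le |s| \, n^{-1-\operatorname{Re} s}, \]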
it follows that the series (1942.2) converges absolutely for Re s > 0 and uniformly on
each compact subset of that half-plane. Each summand is an analytic function of s, so (1942.2) provides
an analytic continuation of $\zeta(s) - (s - 1)^{-1}$ to the half-plane Re s > 0; the presence
of the term $(s - 1)^{-1}$ on the left-hand side ensures that ζ(s) has a simple pole at
s = 1 with residue 1. That is, near the point s = 1, the zeta function behaves like
the function $(s - 1)^{-1}$.
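The series (1942.2) can be summed numerically, which gives a (slow) way to evaluate the continued zeta function for 0 < Re s < 1. The sketch below uses the elementary closed form $\int_n^{n+1} x^{-s}\,dx = \frac{(n+1)^{1-s} - n^{1-s}}{1-s}$ for each summand; the function name and the number of terms are our own assumptions, and the reference value ζ(1/2) ≈ −1.4603545 is quoted from standard tables rather than from the text.

```python
import math


def zeta_continued(s, terms=20_000):
    """Approximate zeta(s) for Re s > 0, s != 1, by truncating (1942.2).

    Each summand is n^{-s} - integral_n^{n+1} x^{-s} dx, with the integral
    evaluated in closed form. The terms decay like n^{-1 - Re s}, so
    convergence is slow when Re s is small.
    """
    total = 0
    for n in range(1, terms + 1):
        integral = ((n + 1) ** (1 - s) - n ** (1 - s)) / (1 - s)
        total += n ** (-s) - integral
    return 1 / (s - 1) + total


print(zeta_continued(2), math.pi ** 2 / 6)  # agrees with the series where both make sense
print(zeta_continued(0.5))                  # roughly -1.46, inside the critical strip
print(zeta_continued(0.5 + 14.1347j))       # complex arguments are accepted as well
```

The truncation error at s = 1/2 is of order $1/(2\sqrt{N})$ after N terms, so this is a demonstration of the continuation rather than a practical algorithm.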
The next, and most complicated, step is to show that the zeta function satisfies
the functional equation
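\[ \zeta(s) = 2^s \pi^{s-1} \sin\left(\frac{\pi s}{2}\right) \Gamma(1 - s) \, \zeta(1 - s), \tag{1942.3} \]
in which
\[ \Gamma(s) = \frac{e^{-\gamma s}}{s} \prod_{n=1}^{\infty} \left(1 + \frac{s}{n}\right)^{-1} e^{s/n} \tag{1942.4} \]
is the gamma function and
\[ \gamma = \lim_{N \to \infty} \left( \sum_{n=1}^{N} \frac{1}{n} - \log N \right) \approx 0.5772156\ldots \tag{1942.5} \]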
is the Euler–Mascheroni constant. For the sake of brevity, we omit this step. Since
the product (1942.4) is an analytic function on C\{0, −1, −2, −3, . . .}, the functional
equation (1942.3) permits us to define ζ(s) for Re s ≤ 0: if Re s ≤ 0, then Re(1 − s) ≥ 1, so
every factor on the right-hand side of (1942.3) is already defined (the simple pole of ζ(1 − s)
at s = 0 is canceled by the zero of sin(πs/2) there). Thus, we have
obtained an analytic continuation of the zeta function to C\{1}.
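Both formulas are easy to test for real arguments. The sketch below truncates the product (1942.4) and compares it with Python's built-in math.gamma, then uses the functional equation (1942.3) to obtain ζ(−1) and ζ(−2) from known values on the right of the critical strip. The function names, the truncation length, and the quoted value ζ(3) ≈ 1.2020569 are our own inputs, not part of the text.

```python
import math

EULER_GAMMA = 0.5772156649015329  # the constant in (1942.5)


def gamma_product(s, terms=100_000):
    """Truncation of the product (1942.4) for real s not in {0, -1, -2, ...}."""
    value = math.exp(-EULER_GAMMA * s) / s
    for n in range(1, terms + 1):
        value *= math.exp(s / n) / (1 + s / n)
    return value


def zeta_via_functional_equation(s, zeta_at_one_minus_s):
    """Right-hand side of (1942.3) for real s, given the value zeta(1 - s)."""
    return (
        2 ** s
        * math.pi ** (s - 1)
        * math.sin(math.pi * s / 2)
        * math.gamma(1 - s)
        * zeta_at_one_minus_s
    )


print(gamma_product(0.5), math.sqrt(math.pi))                # Gamma(1/2) = sqrt(pi)
print(zeta_via_functional_equation(-1, math.pi ** 2 / 6))    # zeta(-1) = -1/12
print(zeta_via_functional_equation(-2, 1.2020569031595942))  # a trivial zero, up to rounding
```

The value at s = −2 vanishes because sin(πs/2) does, which is the mechanism behind the zeros at the negative even integers.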
The product representation (1942.4) of the gamma function and (1942.3) ensure
that ζ has zeros at −2, −4, −6, . . . These are the trivial zeros of the zeta function.
Any remaining zeros must be in the critical strip
\[ \{s \in \mathbb{C} : 0 < \operatorname{Re} s < 1\}. \]
These are the nontrivial zeros of the zeta function. It turns out that the nontrivial
zeros govern the main terms in our error estimates for π(x). Neglecting some
logarithmic factors, if
\[ \theta = \sup\{\operatorname{Re} s : 0 < \operatorname{Re} s < 1, \ \zeta(s) = 0\}, \]
then the maximum deviation² |π(x) − Li(x)| from the prediction of the prime number
theorem is essentially of size at most $x^{\theta}$. Thus, the nontrivial zeros of the zeta
function have an enormous influence in number theory: they control the large-scale
distribution of the prime numbers.
1 The radius of convergence of the series $\sum_{n=0}^{\infty} z^n$ is 1. What students of calculus do not
often realize is that the “radius” referred to is the radius of the disk |z| < 1 in the complex plane.
To a few decimal places, these are the first twenty nontrivial zeros that lie in
the upper half-plane:
0.5 + 14.1347i, 0.5 + 21.0220i, 0.5 + 25.0109i, 0.5 + 30.4249i, 0.5 + 32.9351i,
0.5 + 37.5862i, 0.5 + 40.9187i, 0.5 + 43.3271i, 0.5 + 48.0052i, 0.5 + 49.7738i,
0.5 + 52.9703i, 0.5 + 56.4462i, 0.5 + 59.3470i, 0.5 + 60.8318i, 0.5 + 65.1125i,
0.5 + 67.0798i, 0.5 + 69.5464i, 0.5 + 72.0672i, 0.5 + 75.7047i, 0.5 + 77.1448i.
Notice a pattern? Numerical calculations have confirmed that the first $10^{13}$ nontrivial
zeros lie on the critical line Re s = 1/2; see Figure 1. The Riemann hypothesis,
one of the seven Clay Millennium Problems, asserts that the nontrivial zeros all lie
on the critical line. Riemann wrote in [8]:
. . . and it is very probable that all roots are real.3 Certainly one would
wish for a stricter proof here; I have meanwhile temporarily put aside
the search for this after some fleeting futile attempts, as it appears
unnecessary for the next objective of my investigation.
The Riemann hypothesis, which was one of Hilbert’s problems [10] (see the 1935,
1963, 1970, 1980, and 1983 entries), is considered by many mathematicians to be
the most important open problem in mathematics.
In 1914, Godfrey Harold Hardy (see the 1920, 1923, and 1940 entries) proved
there are infinitely many nontrivial zeros on the critical line. However, he was
unable to ascertain whether a positive proportion of them are on the critical line.
The situation changed in 1942, when Atle Selberg (1917–2007) showed that a small,
but positive, proportion of the zeros of ζ(s) are on the critical line; see the 1948
entry. A major advance came in 1974 with the work of Norman Levinson (1912–
1975), who proved more than a third of these zeros are on the line. The best results
today are around 40%; there is still a long way to go. Even if we can prove that
100% of the zeros are on the critical line, that still would be insufficient to prove
the Riemann hypothesis. There could still be infinitely many zeros in the critical
strip that do not lie on the critical line. This is meant in the same sense that “100%
of natural numbers are not perfect squares.” The proportion of natural numbers
at most x that are not perfect squares is approximately $(x - \sqrt{x})/x = 1 - 1/\sqrt{x}$,
which tends to one as x → ∞.
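The "100%" phenomenon is easy to tabulate. In this small sketch (the function name is ours), the number of perfect squares in {1, 2, . . . , x} is the integer square root of x, so the proportion of non-squares is computed exactly.

```python
import math


def proportion_non_squares(x):
    """Fraction of the integers 1, 2, ..., x that are not perfect squares."""
    return (x - math.isqrt(x)) / x


for exponent in (2, 4, 6, 12):
    x = 10 ** exponent
    print(x, proportion_non_squares(x))
```

The proportion approaches 1 even though there are infinitely many perfect squares. In the same way, a theorem that 100% of the nontrivial zeros lie on the critical line would still leave room for infinitely many exceptions.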
It is still unknown whether or not there is a c < 1 such that all nontrivial zeros
of the zeta function have real part at most c; the Riemann hypothesis is equivalent
to being able to take c = 1/2 (the nontrivial zeros are symmetric about the line
Re s = 1/2). The best results are zero-free regions where how far to the left of the
line Re s = 1 we can go tends to zero rapidly with the height t, giving regions where
\[ \zeta(\sigma + it) \neq 0 \quad \text{if } \sigma > 1 - A (\log |t|)^{-r_1} (\log \log |t|)^{-r_2} \]
for some positive constants $A, r_1, r_2$.
Figure 1. The nontrivial zeros of the Riemann zeta function lie
in the critical strip 0 < Re s < 1. The Riemann hypothesis asserts
that they all lie on the critical line Re s = 1/2.
2 Here Li(x) denotes the offset logarithmic integral function (1933.1).
3 Riemann was considering a variant of the zeta function, for which the corresponding
conjecture is that the zeros are real.

What is wrong with the following “proof” of the Riemann hypothesis?⁴
(a) For each prime p let $h_p(s) = (1 - p^{-2s})^{-1} / (1 - p^{-s})^{-1}$. Note that $h_p(s)$ is
never zero or infinity for Re s > 0.
(b) Let $\zeta_2(s) = h_2(s)\zeta(s)$. The analytic continuation of $\zeta_2(s)$ is simply $h_2(s)$ times
the analytic continuation of ζ(s). Furthermore, $\zeta_2(s)$ and ζ(s) have the same
zeros for Re s > 0. Observe that
\[ \zeta_2(s) = \left(1 - 2^{-2s}\right)^{-1} \prod_{\substack{p \text{ prime} \\ p \ge 3}} \left(1 - p^{-s}\right)^{-1}. \]
(c) Similarly set $\zeta_3(s) = h_3(s)\zeta_2(s)$, and observe that $\zeta_3(s)$ and $\zeta_2(s)$ (and hence
also ζ(s)) have the same zeros in the region Re s > 0. Note that
\[ \zeta_3(s) = \left(1 - 2^{-2s}\right)^{-1} \left(1 - 3^{-2s}\right)^{-1} \prod_{\substack{p \text{ prime} \\ p \ge 5}} \left(1 - p^{-s}\right)^{-1}. \]
(d) We continue this process, working initially in the region Re s > 2 so that all
the products involved converge uniformly. We let $\zeta_\infty(s)$ be the limit of $\zeta_p(s)$
as p → ∞. This limit exists and equals ζ(2s) for Re s > 2.
(e) Since ζ(2s) has an analytic continuation that does not vanish for Re s > 1/2
(because ζ(s) does not vanish if Re s > 1), each $\zeta_p(s)$ does not vanish for
Re s > 1/2. Since all these functions have the same zeros in this region, none
of them vanish for Re s > 1/2. Thus, ζ(s) does not vanish in this region and
the Riemann hypothesis is true.
4 This is not a valid proof, nor can it be salvaged.
1942: Comments
Solution to the problem. The approach sketched above is fundamentally
flawed. The error is that the analytic continuation of the limit is not necessarily
the limit of the analytic continuation. Moreover, there is no hope of salvaging the
argument above. If instead of replacing each prime with its square we used its
cube, we would then deduce that ζ(s) has no zeros for Re s > 1/3. However, this
is impossible since the zeta function has infinitely many zeros on the critical line.
Bibliography
[1] E. Bombieri, Problems of the millennium: the Riemann hypothesis, Clay Mathematics In-
stitute, http://www.claymath.org/sites/default/files/official_problem_description.
pdf.
[2] Clay Mathematics Institute, Millennium problems, http://www.claymath.org/millennium-
problems.
[3] H. Davenport, Multiplicative number theory, 2nd ed., revised by Hugh L. Montgomery, Grad-
uate Texts in Mathematics, vol. 74, Springer-Verlag, New York-Berlin, 1980. MR606931
[4] H. M. Edwards, Riemann’s zeta function, Pure and Applied Mathematics, Vol. 58, Aca-
demic Press [A subsidiary of Harcourt Brace Jovanovich, Publishers], New York-London,
1974. MR0466039
[5] G. H. Hardy, Sur les zéros de la fonction ζ(s), Comp. Rend. Acad. Sci. 158 (1914), 1012–
1014.
[6] H. Iwaniec and E. Kowalski, Analytic number theory, American Mathematical Society
Colloquium Publications, vol. 53, American Mathematical Society, Providence, RI, 2004.
MR2061214
[7] N. Levinson, More than one third of zeros of Riemann’s zeta-function are on σ = 1/2,
Advances in Math. 13 (1974), 383–436, DOI 10.1016/0001-8708(74)90074-7. MR0564081
[8] G. F. B. Riemann, Über die Anzahl der Primzahlen unter einer gegebenen Grösse, Monats-
ber. Königl. Preuss. Akad. Wiss. Berlin, Nov. 1859, 671–680. http://www.maths.tcd.ie/pub/
HistMath/People/Riemann/Zeta/EZeta.pdf.
[9] A. Selberg, Contributions to the theory of the Riemann zeta-function, Arch. Math. Naturvid.
48 (1946), no. 5, 89–155. MR0020594
[10] Wikipedia, Hilbert’s problems, http://en.wikipedia.org/wiki/Hilbert’s_problems.
1943
Breaking Enigma
Introduction
One group of mathematicians played a crucial role in the Allied victory in
World War II: the codebreakers. The German Army encrypted its communications
with Enigma machines, typewriter-like devices (see Figures 2 and 3 on pp. 123 and
124, respectively) that produce a fiendishly complicated code. The Polish Cipher
Bureau developed the strategies to break the Enigma code in the early 1930s, but
the largest codebreaking operation was British, headquartered at Bletchley Park,
a Victorian manor northwest of London. The top-secret Bletchley Park project,
codenamed “Ultra,” is legendary. It employed mathematicians, linguists, chess
masters, academics, composers, and puzzle experts. Recruiters once asked the
Daily Telegraph to organize a crossword competition and then secretly offered jobs
to the winners. One of the leaders of Ultra was Alan Turing, the mathematician
and pioneer of theoretical computer science whom we met in the 1936 entry.
Mathematically speaking, the Enigma machine generates a permutation τ ∈
S26 of the 26 letters of the alphabet. Here Sn denotes the symmetric group on n
symbols. The permutation τ changes with each keystroke. Typing one letter sends
an electric current through scrambling mechanisms—a plugboard, then a set of
rotors, then a reflector, then back through the rotors and the plugboard—causing
a different letter to light up. It also turns the rotors so that the next letter will be
scrambled differently.
The scramblers are wired as follows: the plugboard has one plug for each letter
and ten pairs of letters wired together. It defines a permutation π, which is a
product of ten two-cycles. The rotors are rotating wheels with a circle of twenty-
six brass pins on one side and twenty-six electrical contacts on the other. The wiring
from contacts to pins gives a fixed permutation ρ. Depending on the position of the
rotor, this permutation is conjugated by a power of the 26-cycle α = (1 2 3 . . . 26).
The reflector has twenty-six electrical contacts, connected in pairs by thirteen wires.
It gives a fixed permutation σ, a product of thirteen 2-cycles. Altogether, the
permutation τ is
\[ \pi^{-1} (\alpha^{-i_1} \rho_1 \alpha^{i_1})^{-1} (\alpha^{-i_2} \rho_2 \alpha^{i_2})^{-1} (\alpha^{-i_3} \rho_3 \alpha^{i_3})^{-1} \, \sigma \, (\alpha^{-i_3} \rho_3 \alpha^{i_3}) (\alpha^{-i_2} \rho_2 \alpha^{i_2}) (\alpha^{-i_1} \rho_1 \alpha^{i_1}) \pi, \]
where $i_1, i_2, i_3$, which represent the positions of rotors 1, 2, and 3, vary. Since each
permutation τ is a conjugate of σ, it follows that τ is also a product of thirteen
2-cycles and that $\tau^{-1} = \tau$. Thus, a message can be encrypted and decrypted by
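Enigma machines with the same settings. The operator could choose ten pairs
of letters to connect in the plugboard, three out of five exchangeable rotors in
any order, and twenty-six initial positions for each rotor. The plugboard alone admits
150,738,274,937,250 settings; combined with the 60 ordered choices of rotors and the
$26^3 = 17{,}576$ initial positions, this gives a total of 158,962,555,217,826,360,000
initial settings for the machine.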
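The count of initial settings can be reproduced with exact integer arithmetic. The sketch below assumes the counting conventions described above (ten unordered, disjoint plugboard pairs; an ordered choice of three rotors out of five; twenty-six starting positions per rotor). The function names are ours, and refinements such as ring settings are deliberately ignored.

```python
import math


def plugboard_settings(pairs=10, letters=26):
    """Ways to pick `pairs` disjoint unordered pairs from `letters` letters."""
    unpaired = letters - 2 * pairs
    return math.factorial(letters) // (
        math.factorial(unpaired) * math.factorial(pairs) * 2 ** pairs
    )


def rotor_orders(chosen=3, available=5):
    """Ordered selections of `chosen` rotors from `available` rotors."""
    return math.perm(available, chosen)


def rotor_positions(chosen=3, letters=26):
    """Independent starting positions for each installed rotor."""
    return letters ** chosen


plugboard = plugboard_settings()
total = plugboard * rotor_orders() * rotor_positions()
print(f"{plugboard:,}")  # 150,738,274,937,250
print(f"{total:,}")      # 158,962,555,217,826,360,000
```

Almost all of the size comes from the plugboard. The rotor choices and positions contribute only the factor 60 · 17,576 = 1,054,560.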
The vast number of initial settings makes the Enigma code almost unbreakable,
but it does have weaknesses. Since τ is a product of thirteen 2-cycles, no letter is
ever encoded as itself. A codebreaker can look for common words and phrases in the
encrypted text and rule them out if any letters match. German messages also had
various common formats that made them easier to guess. Furthermore, the Allied
spies captured parts of Enigma machines, decrypted messages, and information
about initial settings. All this was just enough to break the code. By 1943, British
Intelligence was able to decrypt most Enigma codes without knowing the initial
settings of the machine. This capability was kept utterly secret; the Nazis never
knew. Winston Churchill (1874–1965) later told George VI (1895–1952), “It was
thanks to Ultra that we won the war.”
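The two structural facts used above, that τ is its own inverse and that no letter is ever encoded as itself, can be checked on a toy model. The sketch below builds τ as a conjugate of a fixed-point-free involution σ, following the product formula for τ. The wirings are random stand-ins (not the historical rotor or reflector wirings), letters are numbered 0 to 25, and every function name is our own.

```python
import random

N = 26  # letters are represented by the numbers 0, 1, ..., 25


def compose(*perms):
    """Product of permutations written left to right; the rightmost factor acts first."""
    result = list(range(N))
    for perm in reversed(perms):
        result = [perm[x] for x in result]
    return result


def inverse(perm):
    inv = [0] * N
    for x, y in enumerate(perm):
        inv[y] = x
    return inv


def shift(i):
    """The i-th power of the 26-cycle alpha, acting as x -> x + i (mod 26)."""
    return [(x + i) % N for x in range(N)]


def random_pairing(rng, pairs):
    """A product of `pairs` disjoint 2-cycles on randomly chosen letters."""
    letters = list(range(N))
    rng.shuffle(letters)
    perm = list(range(N))
    for k in range(pairs):
        a, b = letters[2 * k], letters[2 * k + 1]
        perm[a], perm[b] = b, a
    return perm


def enigma_permutation(plugboard, rotors, reflector, positions):
    """tau = F^{-1} sigma F, where F = R3 R2 R1 pi and Rk = alpha^{-ik} rho_k alpha^{ik}."""
    stages = [plugboard]
    for rho, i in zip(rotors, positions):
        stages.append(compose(shift(-i), rho, shift(i)))
    forward = compose(*reversed(stages))  # the plugboard acts first
    return compose(inverse(forward), reflector, forward)


rng = random.Random(1943)
plugboard = random_pairing(rng, 10)   # ten 2-cycles
reflector = random_pairing(rng, 13)   # thirteen 2-cycles, no fixed points
rotors = [rng.sample(range(N), N) for _ in range(3)]

tau = enigma_permutation(plugboard, rotors, reflector, positions=(3, 17, 8))
print(all(tau[tau[x]] == x for x in range(N)))  # tau is an involution
print(all(tau[x] != x for x in range(N)))       # no letter encodes as itself
```

Both checks succeed for every choice of wirings and rotor positions, because conjugation preserves cycle type. This is the convenience for operators and the weakness for codebreakers described above.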

Proposed by Ian Whitehead, University of Minnesota.
In honor of the Bletchley Park crossword contest, here is a cryptography-
themed cryptic crossword (see Figure 1), jointly written with Joey McGarvey. As in
all cryptic crosswords, each clue contains a regular definition and a pun/anagram/
wordplay hint. You must figure out how to parse the clue.
1943: Comments
Derangements. There are n! permutations of {1, 2, . . . , n}. A permutation
is a derangement if no element ends up where it started. Thus, (2 4 3 5 1) is not
a derangement since 3 is fixed, but (2 3 5 1 4) is. Let pn denote the fraction of
permutations of {1, 2, . . . , n} that are derangements. Does limn→∞ pn exist? If it
exists, is it large (close to 1) or small (close to 0)? Think about this before reading
on.
To determine $p_n$, we compute 1 minus the probability that at least one number
is fixed. Let $A_{i_1,i_2,\ldots,i_k}$ denote the number of permutations that fix the distinct
natural numbers $i_1, i_2, \ldots, i_k \le n$; these permutations may fix other numbers as
well, so long as $i_1, i_2, \ldots, i_k$ are fixed. Then $A_{i_1,i_2,\ldots,i_k} = (n - k)!$. The principle of
inclusion-exclusion ensures that the number of permutations that fix at least one
of {1, 2, . . . , n} is
\[ \sum_{i_1=1}^{n} A_{i_1} - \sum_{i_1<i_2} A_{i_1,i_2} + \sum_{i_1<i_2<i_3} A_{i_1,i_2,i_3} - \cdots + (-1)^{n-1} A_{1,2,\ldots,n}. \]
Since the number of ways to select $i_1 < i_2 < \cdots < i_k$ is $\binom{n}{k}$, the preceding equals
\begin{align*}
&\frac{n!}{1!(n-1)!}(n-1)! - \frac{n!}{2!(n-2)!}(n-2)! + \frac{n!}{3!(n-3)!}(n-3)! - \cdots + (-1)^{n-1} \frac{n!}{n!\,0!}\, 0! \\
&\qquad = n! \left( \frac{1}{1!} - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{n-1} \frac{1}{n!} \right).
\end{align*}
Divide the above by n! (there are n! total permutations of {1, 2, . . . , n}), subtract
the result from 1, and obtain
\[ p_n = \frac{1}{0!} - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^n \frac{1}{n!} = \sum_{k=0}^{n} \frac{(-1)^k}{k!}. \]
Figure 1. Cryptography-themed crossword. See Figure 2 for the solutions.
Since
\[ e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots \]
for all real x, we see that
\[ \lim_{n \to \infty} p_n = e^{-1} \approx 36.79\%. \]
Looked at another way, for large n a random permutation of the set {1, 2, . . . , n}
has a 63.21% chance of keeping at least one element fixed.
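The formula for $p_n$ can be compared with a brute-force count for small n. The following sketch uses exact rational arithmetic; the function names are ours, and the brute-force enumeration is practical only for small n since it visits all n! permutations.

```python
import math
from fractions import Fraction
from itertools import permutations


def derangement_fraction_bruteforce(n):
    """Fraction of permutations of {0, ..., n-1} with no fixed point, by enumeration."""
    total = 0
    deranged = 0
    for perm in permutations(range(n)):
        total += 1
        if all(perm[i] != i for i in range(n)):
            deranged += 1
    return Fraction(deranged, total)


def derangement_fraction_formula(n):
    """The alternating sum sum_{k=0}^{n} (-1)^k / k! from inclusion-exclusion."""
    return sum(Fraction((-1) ** k, math.factorial(k)) for k in range(n + 1))


for n in range(1, 8):
    exact = derangement_fraction_bruteforce(n)
    print(n, exact, exact == derangement_fraction_formula(n))

print(float(derangement_fraction_formula(20)), math.exp(-1))
```

The alternating series converges extremely fast: already for n = 7 the fraction agrees with 1/e to four decimal places.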
Figure 2. Solutions to the cryptography-themed crossword (Figure 1).
Deception and misdirection. Cryptography is too vast a subject to cover

fully in a short entry, but we would be remiss if we did not at least briefly mention
the importance of deception and misdirection. One of the most important examples
concerns the Battle of Midway between America and Japan in World War II. The
U.S. victory there has been described as the turning point in the Pacific War; Japan
lost all four of its carriers engaged in the battle, while the U.S. lost only one of its
three.
Allied cryptographers had broken much of Japan’s code and knew that a major
attack was being planned; this provided an opportunity for Americans to position
their fleet to surprise their enemy. The target was listed only as “AF”. There was
reason to believe this meant Midway, but if it were another location, the American
fleet could be unavailable for the true engagement. It was essential to determine
if the target was Midway. Commander Joseph J. Rochefort (1900–1976) and his
team devised a truly elegant way to verify the target. Through a secure channel
they told Midway to broadcast that it had a broken water pump and would be
short on fresh water. Shortly afterward they intercepted a Japanese message which
declared that “AF” was short on water. This trick convinced Admiral Chester W.
Nimitz (1885–1966) to commit his forces. For another great example of deception,
look up Operation Fortitude, the Allied deception to convince the Germans that
the invasion of Europe would be at Calais and not the true site, Normandy.
Bibliography
[1] M. Cozzens and S. J. Miller, The mathematics of encryption: An elementary introduc-
tion, Mathematical World, vol. 29, American Mathematical Society, Providence, RI, 2013.
MR3098499
[2] F. H. Hinsley and A. Stripp (eds.), Codebreakers: The inside story of Bletchley Park, The
Clarendon Press, Oxford University Press, New York, 1993. MR1243675
[3] A. R. Miller, The Cryptographic Mathematics of Enigma, NSA Pamphlet, 2001.

[4] L. Mundy, Code Girls: The Untold Story of the American Women Code Breakers of World
War II, Hachette, 2017.
[5] S. Singh, The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptog-
raphy, Fourth Estate, 1999.
[6] W. Trappe and L. C. Washington, Introduction to cryptography with coding theory, 2nd ed.,
Pearson Prentice Hall, Upper Saddle River, NJ, 2006. MR2372272
[7] Wikipedia, Operation Fortitude, https://en.wikipedia.org/wiki/Operation_Fortitude.
[8] J. Wilcox, Solving the Enigma—History of the Cryptanalytic Bombe, NSA Pamphlet, 2001.
1944
Theory of Games and Economic Behavior
Introduction
In 1944 John von Neumann and Oskar Morgenstern (1901–1997) published
Theory of Games and Economic Behavior [7], the seminal book in the field of
game theory. Since its publication, game theory has steadily become more widely
used both within and across disciplines. It is now a leading analytical tool in
microeconomic theory, formal political theory, evolutionary biology, and ecology
[5, 6]. It is even used in the study of literature and philosophy [2]. One can argue
that game theory is one of the great success stories of modern applied mathematics.
One of the central problems in the field is the determination of equilibrium
strategies for rational participants. These range from existence questions (does an
equilibrium exist and, if so, of what type?) to normative ones (what is the optimal
equilibrium?). Enormous progress was made in 1950. In his Princeton mathematics
dissertation [3], John Forbes Nash Jr. (1928–2015) proved that in a large class of
noncooperative games, an equilibrium exists in which no player has an incentive to
change his behavior. These points are now called Nash equilibria.
Nash’s biography [4] was turned into a movie by the same name, A Beautiful
Mind, which won the Oscar for Best Picture in 2002. The movie dramatized the
scene in which Nash thought of the idea for his thesis. Mathematicians, however,
might get more of a kick out of a different scene, described in the book but left
out of the movie, in which Nash visited von Neumann’s office to share his idea.
The book reports that the meeting was short and ended with von Neumann saying
“That’s trivial, you know. That’s just a fixed-point theorem.” The idea won Nash
a Nobel Prize in 1994.1
Sadly, Nash and his wife, Alicia, died in a car crash in 2015 after returning from
Norway, where Nash had just received the famed Abel Prize. The driver of their
taxi lost control and the couple was ejected from the vehicle (they were not wearing
seat belts). The driver, who was wearing a seat belt, sustained non-life-threatening
injuries.
1 This is not to say that von Neumann did not know what he was talking about. One of
the leading mathematicians of the 20th century, von Neumann inspired the work of Fields
Medalists Alain Connes (1947– ) in 1982 and Vaughan F. R. Jones (1952– ) in 1990, both of
whom worked on von Neumann algebras (see the 1985 entry). He also made seminal contributions
to the foundations of mathematics (see the 1924 entry), ergodic theory (1931 entry), operator
theory, numerical analysis (1926 entry), quantum mechanics, hydrodynamics, fluid dynamics, and
statistics. He was also one of the founders of computer science and he was a key member of the
Manhattan Project.

Proposed by Daniel F. Stone, Bowdoin College, and Steven J. Miller,
Williams College.
A set of (two or more) individuals is in Nash equilibrium if each individual’s
strategy is optimal, holding the others’ strategies fixed, that is, if no single player
has an incentive to unilaterally change his plan of action. Unfortunately, the movie
botched the illustration of this concept. Nash discovers the idea in a bar when he
is with four male friends, and four brunette women and one blonde enter. Nash’s
(supposed) insight is that they should resist their temptation to each pursue the
blonde. He suggests instead, “What if no one goes for the blonde. . . We don’t get
in each other’s way. . . That’s the only way we win.” He says this while imagining
his four friends matching up with the four brunettes, with the blonde ignored, and
himself excluded.
To follow the movie’s simplistic, dated structure, assume that matching with
a brunette yields a positive payoff (for the male matched) and matching with the
blonde a higher payoff. If a male matches with no one, then his payoff is zero.
Assume also, as described in the movie, that if the male does not match with the
first female he pursues, then he matches with no one and that the probability of
matching with a female pursued by n males is 1/n. Suppose Nash is not a player
in the game, as in the scene he pictures in the movie, so there are just four males,
four brunettes, and one blonde. Why is the situation Nash pictures (in which
each male pursues and matches with a brunette) not a Nash equilibrium? Under
what conditions on the payoffs would it indeed (contrary to Nash’s claim in the
movie) be a Nash equilibrium for each male to pursue the blonde (and thus for
three males to fail to match)? How might it matter (or not) if the males chose
their actions simultaneously or sequentially? Finally, return to simultaneous play
by the males, and suppose the blonde might be more interested in some males than
others. She can smile at a male to signal this interest, but her smile might also
be incidental. Suppose a smile indicates a 0.75 chance of liking the male smiled at
the most (otherwise, she likes each equally). Is it still possible for it to be a Nash
equilibrium for each male to pursue the blonde? If so, under what conditions?
1944: Comments
The Borsuk–Ulam theorem. As von Neumann noted, the heavy lifting in
the theory of Nash equilibria is done by a fixed-point theorem. To be more specific,
if $T : X \to X$ is a function, then $x_f \in X$ is a fixed point of T if $T(x_f) = x_f$. The
most famous fixed-point theorem is undoubtedly Brouwer’s fixed-point theorem;
see the 2009 entry. We describe a few simple fixed-point theorems below.
Our first fixed-point theorem involves only calculus. We claim that if T :
[0, 1] → [0, 1] is continuous, then it has at least one fixed point; see Figure 1.
Consider the function $\Delta : [0, 1] \to [-1, 1]$ defined by $\Delta(x) = T(x) - x$. Since
$T(0) \ge 0$ and $T(1) \le 1$, we have $\Delta(0) \ge 0$ and $\Delta(1) \le 0$. If either
$\Delta(0)$ or $\Delta(1)$ vanishes, then we are done. Consequently, we may suppose that
$\Delta(0) > 0$ and $\Delta(1) < 0$. Then the intermediate value theorem ensures that there
is an $x_f \in (0, 1)$ such that $\Delta(x_f) = 0$. In other words, $T(x_f) = x_f$.

Figure 1. The function $T : [0, 1] \to [0, 1]$ given by
$T(x) = \sin^2\!\left(\frac{1}{x^4 + 0.0712}\right) \cos\!\left(\sin^2(x + 2)\right)$
has nine fixed points. Each intersection of its graph with the diagonal line y = x is a fixed point
of T.
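
The intermediate value argument is constructive enough to run. The sketch below brackets the sign changes of Δ(x) = T(x) − x on a grid and refines each bracket by bisection. The formula used for T is our reading of the Figure 1 caption (an assumption, since the caption is hard to parse), and the names `fixed_points` and `bisect` and the grid size are illustrative choices only.

```python
import math


def T(x):
    # Our reading of the Figure 1 caption (an assumption):
    # T(x) = sin^2(1 / (x^4 + 0.0712)) * cos(sin^2(x + 2)).
    return math.sin(1.0 / (x**4 + 0.0712)) ** 2 * math.cos(math.sin(x + 2.0) ** 2)


def delta(x):
    """Delta(x) = T(x) - x; its zeros are exactly the fixed points of T."""
    return T(x) - x


def bisect(f, a, b, tol=1e-12):
    """Locate a zero of f in [a, b], assuming f(a) and f(b) have opposite signs."""
    fa = f(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = f(m)
        if fm == 0.0:
            return m
        if (fa < 0) == (fm < 0):
            a, fa = m, fm  # the sign change lies in the right half
        else:
            b = m  # the sign change lies in the left half
    return 0.5 * (a + b)


def fixed_points(f, grid=20000):
    """Bracket sign changes of f on a uniform grid over [0, 1], then refine each."""
    xs = [i / grid for i in range(grid + 1)]
    return [bisect(f, a, b) for a, b in zip(xs, xs[1:]) if f(a) * f(b) < 0]


points = fixed_points(delta)
print(len(points), [round(p, 4) for p in points])
```

A grid search like this can miss zeros at which the graph merely touches the diagonal without crossing it, so it gives a lower bound on the number of fixed points rather than a proof.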
A similar argument yields the one-dimensional version of the Borsuk–Ulam
theorem: there are two antipodal points on the Earth’s equator that have the same
temperature. Let $S^n = \{x \in \mathbb{R}^{n+1} : \|x\| = 1\}$ denote the n-dimensional unit
sphere. The full Borsuk–Ulam theorem asserts that if $T : S^n \to \mathbb{R}^n$ is continuous,
then there is an $x \in S^n$ such that $T(-x) = T(x)$.
Imagine that the Earth is the unit sphere in $\mathbb{R}^3$; the equator is identified with
$S^1$. If $x \in \mathbb{R}^3$ is on the equator, then so is $-x$; see Figure 2. If $T(x)$ denotes the
temperature at x, then $\Delta(x) = T(x) - T(-x)$ measures the temperature difference
between antipodal points $x$ and $-x$ on the equator. Since $\Delta(-x) = -\Delta(x)$, for a given $x_0$ on the equator,
either $\Delta(x_0) = \Delta(-x_0) = 0$, in which case we are done, or $\Delta(x_0)$ and $\Delta(-x_0)$ have
opposite signs. Assuming that T, and hence Δ, is continuous, the intermediate
value theorem ensures that somewhere on the equator between $x_0$ and $-x_0$, the
function Δ assumes the value 0. This yields a pair of antipodal points with the
same temperature.
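
To see the equatorial argument in action, one can parametrize the equator by an angle θ, so that the antipode of θ is θ + π, and bisect on the temperature difference. The temperature profile below is entirely hypothetical; any continuous 2π-periodic function would serve. The helper name `equal_antipodes` is ours.

```python
import math


def temperature(theta):
    # Hypothetical continuous, 2*pi-periodic temperature profile (an assumption).
    return (
        15.0
        + 10.0 * math.sin(theta)
        + 4.0 * math.cos(3.0 * theta + 1.0)
        + 2.0 * math.sin(2.0 * theta)
    )


def delta(theta):
    """Temperature difference between the point at angle theta and its antipode."""
    return temperature(theta) - temperature(theta + math.pi)


def equal_antipodes(theta0=0.0, tol=1e-12):
    """Bisect on [theta0, theta0 + pi], where delta takes opposite signs at the ends."""
    a, b = theta0, theta0 + math.pi
    fa = delta(a)
    if fa == 0.0:
        return a
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = delta(m)
        if fm == 0.0:
            return m
        if (fa < 0) == (fm < 0):
            a, fa = m, fm
        else:
            b = m
    return 0.5 * (a + b)


theta = equal_antipodes()
print(theta, temperature(theta), temperature(theta + math.pi))
```

The bisection works from any starting angle precisely because the difference changes sign when a point is swapped with its antipode.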
Contraction mapping principle. A more general, and constructive, fixed-point
theorem is the contraction mapping principle. Suppose that X is a complete
metric space with metric d and that T : X → X is a function for which there exists
a constant α ∈ [0, 1) such that
d(T (x), T (y)) ≤ αd(x, y) for all x, y ∈ X. (1944.1)
Then T is a uniformly strict contraction. The contraction mapping principle as-
serts that T has a unique fixed point xf . Moreover, it also ensures that for each
x0 ∈ X, the sequence defined by xn = T (xn−1 ) for n = 1, 2, . . . converges to xf .
This important consequence of the contraction mapping principle means that we
can start with any $x_0 \in X$ and be guaranteed that the sequence $x_n$ of iterates
converges to the unique fixed point guaranteed by the theorem. The method of
Picard iteration² for solving differential equations is based upon this idea; see [1].

Figure 2. Illustration of the one-dimensional Borsuk–Ulam Theorem.
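
The iteration $x_n = T(x_{n-1})$ is simple to try out. As an illustration of our own choosing (not taken from the text), take T = cos on X = [0, 1]: cosine maps [0, 1] into itself, and by the mean value theorem it satisfies (1944.1) with α = sin 1 ≈ 0.84. The function name `iterate` and the stopping rule are assumptions of this sketch.

```python
import math


def iterate(T, x0, tol=1e-12, max_steps=10_000):
    """Apply T repeatedly until successive iterates differ by less than tol.

    Returns the last iterate and the number of steps taken."""
    x = x0
    for step in range(1, max_steps + 1):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next, step
        x = x_next
    raise RuntimeError("no convergence within max_steps")


# Different starting points in [0, 1] all lead to the same fixed point.
for x0 in (0.0, 0.5, 1.0):
    x_f, steps = iterate(math.cos, x0)
    print(x0, x_f, steps)
```

Every starting point yields the same limit, the unique solution of cos x = x in [0, 1], as the principle predicts.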
Who cares about uniqueness? Consider the initial value problem
$$y'(t) = \sqrt{y(t)}, \qquad y(0) = 0, \tag{1944.2}$$
on [0, ∞). We regard the differential equation (1944.2) as a law or model that
governs some system that evolves with respect to the variable t, which represents
time. The initial condition y(0) = 0 gives us some initial data from which we wish
to predict the future, that is, find a function y : [0, ∞) → R that satisfies (1944.2).
One solution to (1944.2) is $y(t) = t^2/4$. However, it is not the only one since
$$y_\tau(t) = \begin{cases} 0 & \text{if } t \le \tau, \\ \tfrac{1}{4}(t - \tau)^2 & \text{if } t > \tau \end{cases}$$
also solves (1944.2) for any τ ≥ 0 (a short argument with the definition of the
derivative confirms that $y_\tau(t)$ is differentiable at $t = \tau$). Thus, our model is
useless for predicting the future since it says that y might remain zero forever, or
it might suddenly start increasing quadratically at any moment!
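
A quick numerical sanity check makes the failure of uniqueness concrete. The sketch below compares a central-difference derivative of the delayed solution with the square root of the solution itself, for several delays τ. The names `y_tau` and `residual`, the step size h, and the sample points are our own choices.

```python
import math


def y_tau(t, tau):
    """The delayed solution: zero up to time tau, then a shifted parabola."""
    return 0.0 if t <= tau else 0.25 * (t - tau) ** 2


def residual(t, tau, h=1e-6):
    """Central-difference estimate of y'(t) minus sqrt(y(t)); near zero for a solution."""
    slope = (y_tau(t + h, tau) - y_tau(t - h, tau)) / (2.0 * h)
    return slope - math.sqrt(y_tau(t, tau))


sample_times = [0.25 * k for k in range(21)]  # t = 0, 0.25, ..., 5
for tau in (0.0, 1.0, 2.5):
    worst = max(abs(residual(t, tau)) for t in sample_times)
    print(tau, worst)
```

Each delay gives a residual that is zero up to discretization and rounding error, even at t = τ, so all of these functions satisfy the same differential equation and the same initial condition.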
Bibliography
[1] W. Boyce and R. DiPrima, Elementary differential equations and boundary value problems
(seventh edition), John Wiley & Sons, 2000.
[2] M. S.-Y. Chwe, Jane Austen, game theorist, Princeton University Press, Princeton, NJ, 2014.
MR3222736
[3] J. F. Nash Jr, Non-cooperative games, Thesis (Ph.D.)–Princeton University, ProQuest
LLC, Ann Arbor, MI, 1950. http://www.princeton.edu/mudd/news/faq/topics/Non-Cooperative_Games_Nash.pdf. MR2938064
[4] S. Nasar, A beautiful mind, Simon & Schuster, New York, 1998. MR1631630
2 The mathematician Charles Émile Picard (1856–1941) was a distant ancestor of the famed
Jean-Luc Picard (2305 – ?). Although this is supported by neither canonical nor noncanonical
sources, we would like to “make it so.”
[5] T. C. Schelling, The Strategy of Conflict, Oxford University Press, 1960.

[6] J. Maynard Smith, The theory of games and the evolution of animal conflicts, J. Theoret.
Biol. 47 (1974), no. 1, 209–221, DOI 10.1016/0022-5193(74)90110-6. MR0444115
[7] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, Princeton
University Press, Princeton, New Jersey, 1944. MR0011937
1945
The Riemann Hypothesis in Function Fields
Introduction
Suppose that F is a field. If α is the root of a polynomial with coefficients in F,
then α is algebraic over F. For instance, $i = \sqrt{-1}$ is algebraic over R since it is a
root of $x^2 + 1$. We say that the degree of the extension field C = R(i) is two, since
C = {a + bi : a, b ∈ R} is a two-dimensional vector space over R. If an extension
field F(α) is finite-dimensional over F, then it is a finite extension of F. A finite
extension of the rational numbers is a number field. The quadratic number fields
$\mathbb{Q}(\sqrt{d})$ for various d are an important class of examples; see the 1966 entry.
If α is not the root of a polynomial with coefficients in F, then α is transcen-
dental over F. For example, π and e are known to be transcendental over Q; see
the 1918, 1934, 1935, 1955, and 1973 entries for more information. If α is transcen-
dental over F, then no nontrivial F-linear combination of the powers of α equals
zero. That is, a nonzero polynomial in α cannot be reduced to elements of F. Con-
sequently, the extension field F(α) obtained from F by including α resembles the
field of rational functions, with coefficients in F, in the “variable” α. That is,
$$F(\alpha) = \left\{ \frac{p(\alpha)}{q(\alpha)} : p, q \text{ are polynomials} \right\}.$$
A function field in one variable over F is a field K, containing F and at least one
element x that is transcendental over F, such that K is a finite algebraic extension
of F(x). For instance, we can consider the subfield K of F(x, y) generated by two
transcendental elements x and y that satisfy the defining equation $y^2 = x^5 + 1$.
Here F(x, y) denotes the set of all quotients of polynomials in x and y. Such a field
is said to have transcendence degree one over F. A function field in one variable
over a finite field of constants is a global function field .
It is often easier to prove results for function fields than number fields. This
is one reason for their popularity: they provide a nice testing ground to explore
what may be true for number fields. For instance, Fermat’s last theorem is famously
difficult to prove (see the 1995 entry) but its function-field analogue is a consequence
of the Mason–Stothers theorem (see the 1981 entry), which itself has an elementary
proof that boils down to a careful study of polynomials and their derivatives. The
Riemann hypothesis (see the 1942 entry) is no exception to this rule. Weil proved
it in the function field setting in the 1940s; we are still waiting for a proof of the
Riemann hypothesis in the classical case.
Since the Riemann zeta function plays such a useful role in understanding the
prime numbers, we wish to define an analogue for function fields: the zeta function
of a global function field K over Fq , the finite field with q elements. We must first
introduce some notation; for more on these objects see [8]. A prime in K is a
discrete valuation ring R with maximal ideal P such that F ⊂ R and the quotient
field of R is equal to K. The group of divisors of K, denoted DK , is the free abelian
group generated by the primes. For $A \in D_K$, we define the norm of A, denoted
$N(A)$, to be $q^{\deg A}$. The zeta function of K is
$$\zeta_K(s) = \sum_{A \ge 0} \frac{1}{N(A)^s} = \prod_{P \text{ prime in } K} \left(1 - \frac{1}{N(P)^s}\right)^{-1}, \qquad \operatorname{Re} s > 1.$$
Like the Riemann zeta function, ζK (s) can be analytically continued to a larger
domain (see the 1942 entry). The Riemann hypothesis over global function fields
is a theorem, first proved by André Weil in the 1940s.
The Riemann Hypothesis for Function Fields: Let K be a global function
field over $\mathbb{F}_q$. All the roots of $\zeta_K(s)$ lie on the line Re s = 1/2.
The result above was first conjectured for hyperelliptic function fields by Emil
Artin (1898–1962) in his thesis. The simplest case (elliptic curves; see the 1921
entry) was proved by Helmut Hasse. The first proof of the general result was
published by Weil in 1948. Weil presented two proofs of this theorem. The first
used the geometry of algebraic surfaces and the theory of correspondences. The
second used the theory of abelian varieties; see [13, 14]. The whole project required
revisions in the foundations of algebraic geometry since he needed these theories
to be valid over arbitrary fields, not just algebraically closed fields in characteristic
zero. In the early seventies, Fields Medalist Enrico Bombieri (1940– ) obtained
a more elementary proof, building upon important work of Sergei Aleksandrovich
Stepanov (1941– ).

Proposed by Julio Andrade, IHÉS.
Let K be a global function field in one variable with a finite constant field $\mathbb{F}_q$
with q elements. Suppose that the genus of K is g. Prove that there is a polynomial
$L_K(u) \in \mathbb{Z}[u]$ of degree 2g such that
$$\zeta_K(s) = \frac{L_K(q^{-s})}{(1 - q^{-s})(1 - q^{1-s})}.$$
You will need to use the Riemann–Roch theorem. For more details about the genus
of a function field and the Riemann–Roch theorem see [9].
1945: Comments
Special values of the Riemann zeta function. Since we have discussed the
Riemann zeta function in this entry, now is a good time to explore some more of its
intriguing properties. While reading the results below ask yourself: do analogous
statements hold in the function field setting?
The Riemann zeta function can be evaluated at the even positive integers in
closed form. The first few values are
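$$\zeta(2) = \frac{\pi^2}{6}, \qquad \zeta(10) = \frac{\pi^{10}}{93555},$$
$$\zeta(4) = \frac{\pi^4}{90}, \qquad \zeta(12) = \frac{691\pi^{12}}{638512875},$$
$$\zeta(6) = \frac{\pi^6}{945}, \qquad \zeta(14) = \frac{2\pi^{14}}{18243225},$$
$$\zeta(8) = \frac{\pi^8}{9450}, \qquad \zeta(16) = \frac{3617\pi^{16}}{325641566250};$$
see the 1919 entry for an evaluation of ζ(2). In fact, Euler showed that ζ(2k) is a
rational multiple of $\pi^{2k}$ for k = 1, 2, . . .. On the other hand, the exact value of
$$\zeta(3) = \sum_{n=1}^{\infty} \frac{1}{n^3} = 1.2020569031595942854\ldots$$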
is unknown. Fortune and glory await the mathematician who provides a closed-
form evaluation of ζ(3). The most significant result in this direction is due to
Roger Apéry (1916–1994), who proved that ζ(3) is irrational in 1979 [1] (see the
1979 entry).
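
The rational multiples in the table can be regenerated from Euler's formula $\zeta(2k) = (-1)^{k+1} B_{2k} (2\pi)^{2k} / (2\,(2k)!)$, in which $B_{2k}$ is a Bernoulli number. The formula is standard but is not stated in this entry, so treat the following as a side calculation; the function names are ours, and the Bernoulli numbers are computed with the convention $B_1 = -1/2$.

```python
from fractions import Fraction
from math import comb, factorial


def bernoulli_numbers(m):
    """Return [B_0, ..., B_m] from the recurrence
    sum_{j=0}^{k} C(k+1, j) B_j = 0 for k >= 1, with B_0 = 1."""
    B = [Fraction(1)]
    for k in range(1, m + 1):
        B.append(-sum(comb(k + 1, j) * B[j] for j in range(k)) / (k + 1))
    return B


def zeta_even_coefficient(k):
    """Rational r such that zeta(2k) = r * pi**(2k), by Euler's formula."""
    B = bernoulli_numbers(2 * k)
    return (-1) ** (k + 1) * B[2 * k] * Fraction(2 ** (2 * k - 1), factorial(2 * k))


for k in range(1, 9):
    print(2 * k, zeta_even_coefficient(k))
```

The numerator 691 in ζ(12) and 3617 in ζ(16) come directly from the numerators of $B_{12}$ and $B_{16}$.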
The Riemann zeta function and arithmetic functions. To further highlight
the significance of the Riemann zeta function, let us investigate some of its
connections with arithmetic functions from number theory. The divisor function
τ(n) counts the number of divisors of n. Thus, τ(n) = 2 if and only if n is a prime.
If p and q are distinct primes, then $\tau(p^2 q) = 6$ since the divisors of $p^2 q$ are $1$, $p$, $p^2$,
$q$, $pq$, and $p^2 q$. Moreover, $\tau(p^2)\tau(q) = 3 \cdot 2 = 6 = \tau(p^2 q)$. This is not a coincidence,
for τ is a multiplicative function. This means that τ(mn) = τ(m)τ(n) whenever m
and n are relatively prime.
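If Re s > 1, then we may square the zeta function and obtain
$$\begin{aligned}
\zeta^2(s) &= \left(\sum_{n=1}^{\infty} \frac{1}{n^s}\right)^{2} \\
&= \left(1 + \frac{1}{2^s} + \frac{1}{3^s} + \frac{1}{4^s} + \cdots\right)\left(1 + \frac{1}{2^s} + \frac{1}{3^s} + \frac{1}{4^s} + \cdots\right) \\
&= 1 + \left(\frac{1}{2^s} + \frac{1}{2^s}\right) + \left(\frac{1}{3^s} + \frac{1}{3^s}\right) + \left(\frac{1}{4^s} + \frac{1}{2^s}\cdot\frac{1}{2^s} + \frac{1}{4^s}\right) + \cdots \\
&= 1 + \frac{2}{2^s} + \frac{2}{3^s} + \frac{3}{4^s} + \frac{2}{5^s} + \frac{4}{6^s} + \cdots \\
&= \sum_{n=1}^{\infty} \frac{\tau(n)}{n^s};
\end{aligned}$$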
term-by-term multiplication is permissible here since both series involved are ab-
solutely convergent (see p. 110). This suggests that we might extract information
about the divisor function from knowledge of ζ(s). Experience has taught number
theorists that almost any interesting arithmetic function can be expressed in terms
of the zeta function.
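
The bookkeeping behind the squared series can be mimicked directly: every pair (a, b) contributes one unit to the coefficient of $1/(ab)^s$. The sketch below builds those coefficients up to a cutoff and compares them with a direct divisor count. The function names and the cutoff of 200 are illustrative choices.

```python
def square_of_ones(limit):
    """Coefficients of (sum 1/n^s)^2 up to n = limit.

    Each ordered pair (a, b) with a * b <= limit adds 1 to the coefficient of 1/(a*b)^s."""
    coeff = [0] * (limit + 1)
    for a in range(1, limit + 1):
        for b in range(1, limit // a + 1):
            coeff[a * b] += 1
    return coeff


def tau(n):
    """Number of positive divisors of n, counted directly."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


coeff = square_of_ones(200)
print(coeff[1:7])
print(all(coeff[n] == tau(n) for n in range(1, 201)))
```

The first six coefficients reproduce the numerators 1, 2, 2, 3, 2, 4 that appear in the expansion.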
A famous open problem in this area is the Dirichlet divisor problem. It asks
for the infimum over all α such that
$$\sum_{j=1}^{n} \tau(j) = n \log n + (2\gamma - 1)n + O(n^{\alpha}),$$
in which γ is the Euler–Mascheroni constant (1942.5). Dirichlet himself showed
that $\alpha = \tfrac{1}{2}$ works, so the infimum is at most $\tfrac{1}{2}$; a simple proof can be found in [2].
In particular, the average value of τ(n) tends to log n + 2γ − 1:
$$\lim_{n\to\infty} \left( \frac{1}{n} \sum_{k=1}^{n} \tau(k) - (\log n + 2\gamma - 1) \right) = 0.$$
On the other hand, Edmund Landau (1916) showed that the infimum must be $\ge \tfrac{1}{4}$.
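
The size of the error term is easy to explore numerically. The sketch below uses the elementary identity $\sum_{j \le n} \tau(j) = \sum_{d \le n} \lfloor n/d \rfloor$, which counts pairs (d, m) with dm ≤ n; this identity is not stated in the text. The numerical value of γ and the function names are assumptions of the sketch.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant, to double precision


def divisor_sum(n):
    """Sum of tau(j) for j <= n, counted as the number of pairs (d, m) with d*m <= n."""
    return sum(n // d for d in range(1, n + 1))


def main_term(n):
    """The main term n log n + (2*gamma - 1) n from the Dirichlet divisor problem."""
    return n * math.log(n) + (2.0 * EULER_GAMMA - 1.0) * n


for n in (10**2, 10**4, 10**6):
    error = divisor_sum(n) - main_term(n)
    # Dirichlet's result says error / sqrt(n) stays bounded.
    print(n, error, error / math.sqrt(n))
```

Tabulating the error for a few values of n suggests how much smaller than $\sqrt{n}$ it typically is, which is what the open problem is about.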

It is customary to write $\tau(n) = \sum_{d \mid n} 1$, in which the subscript $d \mid n$ indicates that
the sum runs over all of the positive divisors of n. This suggests the generalization
$$\sigma_k(n) = \sum_{d \mid n} d^k,$$
which sums the kth powers of the divisors of n. If k = 0, then σ0 = τ . The case
k = 1 is also special; we write σ = σ1 and refer to this as the sigma function (or the
sum of divisors function). Like the τ function, the functions σk are multiplicative.
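For Re s > 2, we have
$$\begin{aligned}
\zeta(s)\zeta(s-1) &= \left(\sum_{n=1}^{\infty} \frac{1}{n^s}\right)\left(\sum_{m=1}^{\infty} \frac{1}{m^{s-1}}\right) \\
&= \left(\sum_{n=1}^{\infty} \frac{1}{n^s}\right)\left(\sum_{m=1}^{\infty} \frac{m}{m^s}\right) \\
&= \left(1 + \frac{1}{2^s} + \frac{1}{3^s} + \frac{1}{4^s} + \cdots\right)\left(1 + \frac{2}{2^s} + \frac{3}{3^s} + \frac{4}{4^s} + \cdots\right) \\
&= 1 + \left(\frac{1}{2^s} + \frac{2}{2^s}\right) + \left(\frac{1}{3^s} + \frac{3}{3^s}\right) + \left(\frac{1}{4^s} + \frac{2}{4^s} + \frac{4}{4^s}\right) + \cdots \\
&= \sum_{n=1}^{\infty} \frac{\sigma(n)}{n^s},
\end{aligned}$$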
which reveals a connection between σ and ζ. In a similar vein, Srinivasa Ramanujan
derived the identity
$$\sum_{n=1}^{\infty} \frac{\sigma_a(n)\sigma_b(n)}{n^s} = \frac{\zeta(s)\zeta(s-a)\zeta(s-b)\zeta(s-a-b)}{\zeta(2s-a-b)} \tag{1945.1}$$
that Albert Ingham (1900–1967) used in 1930 to provide a quick proof of the prime
number theorem [7]. Ramanujan’s formula reduces to the curious formula
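$$\sum_{n=1}^{\infty} \frac{(\tau(n))^2}{n^s} = \frac{\zeta^4(s)}{\zeta(2s)}$$
when a = b = 0. Another appealing formula of Ramanujan is
$$\sigma(n) = \frac{\pi^2 n}{6} \sum_{q=1}^{\infty} \frac{c_q(n)}{q^2},$$
in which
$$c_q(n) = \sum_{\substack{a=1 \\ \gcd(a,q)=1}}^{q} e^{2\pi i a n/q}$$
is a Ramanujan sum [10].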
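
Ramanujan's formula for σ(n) can be tested by truncating the series. In the sketch below the Ramanujan sum is evaluated through cosines, since the imaginary parts cancel in pairs; the truncation point of 1,000 terms and the function names are our own choices, so the output is only an approximation.

```python
import math


def ramanujan_sum(q, n):
    """c_q(n): the sum of exp(2*pi*i*a*n/q) over 1 <= a <= q with gcd(a, q) = 1.

    The value is always an integer, so the floating-point sum is rounded."""
    total = sum(
        math.cos(2.0 * math.pi * a * n / q)
        for a in range(1, q + 1)
        if math.gcd(a, q) == 1
    )
    return round(total)


def sigma_via_ramanujan(n, terms=1000):
    """Truncation of (pi^2 n / 6) * sum_{q >= 1} c_q(n) / q^2 after `terms` terms."""
    series = sum(ramanujan_sum(q, n) / q**2 for q in range(1, terms + 1))
    return math.pi**2 * n / 6.0 * series


def sigma(n):
    """Sum of the positive divisors of n, computed directly."""
    return sum(d for d in range(1, n + 1) if n % d == 0)


for n in (1, 6):
    print(n, sigma(n), sigma_via_ramanujan(n))
```

For n = 1 the Ramanujan sum reduces to the Möbius function, and the formula becomes the familiar statement that $\sum \mu(q)/q^2 = 6/\pi^2$.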
Bibliography
[1] R. Apéry, Irrationalité de ζ(2) et ζ(3) (French), Luminy Conference on Arithmetic, Astérisque
61 (1979), 11–13. MR3363457
[3] E. Artin, Quadratische Körper im Gebiete der höheren Kongruenzen I and II, Math. Z.
19 (1924), 153-296. https://eudml.org/doc/167773 and https://eudml.org/doc/167774.
[4] E. Bombieri, Problems of the millennium: the Riemann hypothesis, Clay Mathematics Institute,
http://www.claymath.org/sites/default/files/official_problem_description.pdf.
[5] E. Bombieri, Counting points on curves over finite fields (d’après S. A. Stepanov),
Séminaire Bourbaki, 25ème année (1972/1973), Exp. No. 430, Lecture Notes in Math.,
Vol. 383, Springer, Berlin, 1974, pp. 234–241. http://link.springer.com/chapter/10.1007%2FBFb0057311. MR0429903
[6] G. Lejeune Dirichlet, Sur l’usage des séries infinies dans la théorie des nombres (French), J.
Reine Angew. Math. 18 (1838), 259–274, DOI 10.1515/crll.1838.18.259. MR1578191
[7] A. E. Ingham, Note on Riemann’s zeta-Function and Dirichlet’s L-Functions, J. London
Math. Soc. 5 (1930), no. 2, 107–112, DOI 10.1112/jlms/s1-5.2.107. MR1574211
[8] K. Ireland and M. Rosen, A classical introduction to modern number theory, 2nd ed., Grad-
uate Texts in Mathematics, vol. 84, Springer-Verlag, New York, 1990. MR1070716
[9] C. Moreno, Algebraic curves over finite fields, Cambridge Tracts in Mathematics, vol. 97,
Cambridge University Press, Cambridge, 1991. MR1101140
[10] S. Ramanujan, On certain trigonometrical sums and their applications in the theory of num-
bers [Trans. Cambridge Philos. Soc. 22 (1918), no. 13, 259–276], Collected papers of Srini-
vasa Ramanujan, AMS Chelsea Publ., Providence, RI, 2000, pp. 179–199. MR2280864
[11] P. Sarnak, Problems of the millennium: the Riemann hypothesis (2004), Clay Mathematics
Institute, http://www.claymath.org/sites/default/files/sarnak_rh_0.pdf.
[12] A. Weil, On the Riemann hypothesis in function fields, Proc. Nat. Acad. Sci. U. S. A. 27
(1941), 345–347. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1078336/. MR0004242
[13] A. Weil, Sur les courbes algébriques et les variétés qui s’en déduisent (French), Actualités
Sci. Ind., no. 1041 = Publ. Inst. Math. Univ. Strasbourg 7 (1945), Hermann et Cie., Paris,
1948. MR0027151
[14] A. Weil, Variétés abéliennes et courbes algébriques (French), Actualités Sci. Ind., no. 1064 =
Publ. Inst. Math. Univ. Strasbourg 8 (1946), Hermann & Cie., Paris, 1948. MR0029522
1946
Monte Carlo Method
Introduction
While today it is hard to gaze around a room without seeing a computer,
be it in a smartphone or a thermostat, the situation was different during World
War II. Computers were in their infancy. They were rare, expensive, and big.
Early computers could fill an entire room and they had enormous power demands.
A major leap came when people realized that they could be used for more than
computing exact answers to specific problems. They can be used to approximate
the answers to difficult problems through extensive simulations. This led to what
is now called the Monte Carlo method (the name refers to the famous casino in
Monaco).
The first thoughts and attempts I made to practice [the Monte Carlo
method] were suggested by a question which occurred to me in 1946
as I was convalescing from an illness and playing solitaires. The ques-
tion was what are the chances that a Canfield solitaire laid out with
52 cards will come out successfully? After spending a lot of time try-
ing to estimate them by pure combinatorial calculations, I wondered
whether a more practical method than “abstract thinking” might not
be to lay it out say one hundred times and count the number of success-
ful plays. This was already possible to envisage with the beginning of
the new era of fast computers, and I immediately thought of problems
of neutron diffusion and other questions of mathematical physics, and
more generally how to change processes described by certain differen-
tial equations into an equivalent form interpretable as a succession of
random operations. – Stanislaw Ulam (1909–1984) [2]
Monte Carlo techniques are now used to approximate the solution to numerous
problems. Rather than finding exact answers, one can simulate millions of cases and
use that information to obtain an excellent approximation to the correct answer. An
early application was to nuclear reactions, in which scientists would approximate
both the trajectories of neutrons and the numbers released in each collision. A
more down-to-earth example involves integration. In calculus, students learn how
to compute areas by integration. Instructors work hard to find functions that have
nice antiderivatives; a general function does not have a closed-form expression for
its integral (see the 1968 and 1976 entries). For instance, the definite integrals
$$\int_0^1 \sqrt{1 + x^3}\,dx, \qquad \int_2^{47} \frac{dx}{\ln x}, \qquad \text{and} \qquad \int_0^1 e^{-x^2}\,dx$$
cannot be computed directly using antiderivatives; none of the integrands are
derivatives of elementary functions. They can, however, be computed quickly and
accurately using Monte Carlo integration.
Suppose that we want to find the area of a region R in $\mathbb{R}^2$. For simplicity
suppose that R is a subset of the unit square $[0, 1]^2$. Then choose N points in
$[0, 1]^2$ uniformly at random. Whatever fraction lies in R is our approximation to
the area. The central limit theorem (see the 1922 entry) ensures that this is a
good approximation, and for large N, it gives us bounds on our error. Figure 1
illustrates the Monte Carlo method for computing the area bounded by the ellipse
$x^2 + 4y^2 = 1$.
As another example, one can use the Monte Carlo method to approximate the
definite integral $\int_0^1 e^{-x^2}\,dx$. Take N random points $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$
in $[0, 1]^2$, count how many of them satisfy $y_i \le e^{-x_i^2}$, then divide this number by
N to obtain an approximation to the integral.
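
The hit-counting recipe translates almost word for word into code. The following is a minimal sketch under our own assumptions: the seed, the sample size of 200,000, and the helper name `hit_fraction` are arbitrary, and the ellipse is handled through its quarter lying in the unit square (so the estimate is multiplied by 4).

```python
import math
import random


def hit_fraction(inside, n_points, rng):
    """Fraction of n_points uniform random points in [0, 1]^2 for which inside(x, y) holds."""
    hits = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if inside(x, y):
            hits += 1
    return hits / n_points


rng = random.Random(1946)  # fixed seed so that runs are repeatable

# Area under the curve y = exp(-x^2) over [0, 1].
integral = hit_fraction(lambda x, y: y <= math.exp(-x * x), 200_000, rng)

# The quarter of the ellipse x^2 + 4y^2 = 1 with x, y >= 0 lies in the unit square.
quarter_ellipse = hit_fraction(lambda x, y: x * x + 4.0 * y * y <= 1.0, 200_000, rng)

print("integral of exp(-x^2) on [0, 1]:", integral)
print("area of the ellipse:", 4.0 * quarter_ellipse)
```

For comparison, the integral can be written as $\tfrac{\sqrt{\pi}}{2}\operatorname{erf}(1) \approx 0.7468$ in terms of the (non-elementary) error function, and the ellipse has area π/2.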

Another famous example of the Monte Carlo philosophy, which actually predates
the method by many years, can be seen in Buffon’s needle problem. The
following problem was posed by Georges-Louis Leclerc (1707–1788), Comte de Buffon.
Given infinitely many parallel lines exactly d units apart, independently drop
N needles of length ℓ, and count how many times the needles intersect the lines;
see Figure 2. If ℓ ≤ d, then one can show that as N → ∞ the expected number of
hits tends to
$$\frac{2\ell}{\pi d}\,N. \tag{1946.1}$$
The claimed answer is reasonable: if we rescale the separation of the lines by r
and the length of the needle by r, the percentage of hits should not change (this
is equivalent to passing from meters to feet, say). Also, in the limit as d → ∞ the
percentage of tosses resulting in a hit tends to zero, while as ℓ → ∞ we expect to
cross more and more lines each toss.
There are many ways to prove this result. One is a direct, brute-force calcula-
tion, putting a probability measure on the space of needle tosses. Without loss of
generality, by symmetry we can parametrize the space by saying the center of the
rod lands on the x-axis (and there is a line at x = 0) somewhere between −d/2 and
d/2 and the angle the needle makes with the x-axis is θ ∈ [0, 2π). Actually, we can
exploit symmetry even more: it suffices to assume the center lands between 0 and
d/2 and that the angle is between 0 and π/2. The result now follows from direct
integration, which you are encouraged to set up and do.
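
Before doing the integral, it can be reassuring to simulate the needle tosses. The sketch below follows the parametrization just described (position of the center between two adjacent lines, plus an angle) for a needle no longer than the line spacing. The seed, the sample size, and the function name `buffon_hits` are our own; note that the simulation itself uses π to draw angles, so it is a check of (1946.1) rather than an independent way to compute π.

```python
import math
import random


def buffon_hits(n_needles, length, spacing, rng):
    """Count needles that cross one of the vertical lines x = k * spacing.

    Assumes length <= spacing, so a needle crosses at most one line."""
    hits = 0
    for _ in range(n_needles):
        center = rng.uniform(0.0, spacing)  # center position between two adjacent lines
        theta = rng.uniform(0.0, math.pi)  # angle the needle makes with the x-axis
        half_span = 0.5 * length * abs(math.cos(theta))  # horizontal half-extent
        if center - half_span <= 0.0 or center + half_span >= spacing:
            hits += 1
    return hits


rng = random.Random(1777)  # fixed seed so that runs are repeatable
n_needles, length, spacing = 200_000, 0.5, 1.0
hits = buffon_hits(n_needles, length, spacing, rng)

print("observed hit rate: ", hits / n_needles)
print("predicted hit rate:", 2.0 * length / (math.pi * spacing))
```

With needles of length 1/2 and lines 1 unit apart, the predicted hit rate is 1/π, or a little under 32%.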
There is a truly remarkable and elegant derivation of (1946.1) that completely
avoids integration. You can find the complete details in [1], although we encourage
you to ponder the following sketch and make it rigorous since there are a lot of
powerful techniques involved. First, observe that the answer only depends on the
ratio ℓ/d; rescaling both by the same amount is effectively the same as just changing
the units. Next, the answer is linear in ℓ; the expected number of intersections from
two sticks of length $\ell_1$ and $\ell_2$ is the same as using one stick of length $\ell_1 + \ell_2$. Finally,
the answer is linear in N (if we double the number of tosses, we double the number
of expected hits). Putting all of this together we see there is some constant c such
that the expected number of hits is $c\,\frac{\ell}{d}\,N$.
Figure 1. Monte Carlo method for computing the area A bounded by the
ellipse $x^2 + 4y^2 = 1$. Panels: (a) N = 100, A ≈ 1.24; (b) N = 500, A ≈ 1.504;
(c) N = 1,000, A ≈ 1.576; (d) N = 5,000, A ≈ 1.6016; (e) N = 10,000, A ≈ 1.5976;
(f) N = 20,000, A ≈ 1.5838. The true area is π/2 = 1.570796 . . .; see the comments
for the 1941 entry for the derivation.

Figure 2. Throwing 1,000 needles of length 1/2 on an array of
vertical lines spaced 1 unit apart. There are 318 hits, yielding an
approximation π ≈ 1,000/318 = 3.14465. This differs from π ≈ 3.14159,
for a relative error of approximately 0.00097. Equivalently, with
1,000 tosses we have two digits of decimal accuracy.
All that remains is to determine c. Our solution uses a powerful method: if we
can find c for one specific choice of ℓ and d, we know c for all ℓ and d; see the 1914
entry for more on this method. Let us find c when ℓ = πd. To do this, we imagine
that instead of tossing rods we toss a circle¹ with diameter d. No matter how the
circle lands, it intersects the vertical lines exactly twice (most of the time it will be
the same line twice, but if it lands just right, it will be touched at the extremes by
two adjacent lines). Thus, with a perimeter of πd we have two intersections, so if
we toss the circle N times, we expect 2N intersections. Thus,
$$c\,\frac{\pi d}{d}\,N = 2N \implies c = \frac{2}{\pi},$$
exactly as we had before.
1 You might object to our tossing a circle when the problem is about tossing rods; however,
we may approximate the circle as a regular n-gon with many small sides. As the number of sides
tends to infinity, the circle corresponds to lots of little rods falling at all angles equally.

One of the most important steps in the Monte Carlo method is the ability to
choose numbers randomly. You may be surprised by how hard it is to generate
a “random” sequence of points. Frequently one generates a sequence of quasi-random
points through a deterministic process, which is often good enough for
applications. A popular, early method is the von Neumann middle square digits
method, described with some nice references in the “random numbers” section of [3].
Given an n-digit natural number, square it to get a 2n-digit number (padding with
leading zeros if necessary). Our random
number is the middle n digits. We then square that, take the middle n digits of
the new product, and obtain our next “random” number. Continuing this process
generates our pseudo-random sequence of numbers. For example, if we start with
4321, our next number is 6710 since $4321^2 = 18671041$. Since $6710^2 = 45024100$,
the middle digits are 0241, so our next number is 241.
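
The middle-square rule is a few lines of code. In the sketch below the square is padded with leading zeros to 2n digits before the middle n digits are extracted; for odd n the "middle" is not symmetric, and taking the block that starts at position n // 2 is our own convention. The function names are illustrative.

```python
def middle_square_step(value, n_digits):
    """One step of the middle-square method for n_digits-digit numbers."""
    squared = str(value * value).zfill(2 * n_digits)  # pad the square to 2n digits
    start = n_digits // 2  # assumed convention for where the middle block begins
    return int(squared[start:start + n_digits])


def middle_square_sequence(seed, n_digits, count):
    """The seed followed by `count` further middle-square iterates."""
    values = [seed]
    for _ in range(count):
        values.append(middle_square_step(values[-1], n_digits))
    return values


print(middle_square_sequence(4321, 4, 2))  # [4321, 6710, 241]
```

Running the sequence further from various seeds is a quick way to see how soon the values start to repeat.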
This process cannot generate numbers uniformly at random, even if we restrict
ourselves to numbers from 0 to 10n −1. The reason is simple: this process generates
a periodic sequence! After at most 10n − 1 terms we have a repeat, at which point
the pattern cycles since all future terms are completely determined by the preceding
value.
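The middle square method and its eventual periodicity can be sketched in a few lines. This is a minimal illustration that assumes n is even (so that "the middle n digits" of a 2n-digit string is unambiguous) and pads squares with leading zeros; the function names `middle_square_step` and `tail_and_period` are hypothetical helpers, not part of any library.

```python
def middle_square_step(value, n):
    """Square an n-digit value, pad to 2n digits, return the middle n digits.

    Assumes n is even, so the middle n digits start at index n // 2.
    """
    squared = str(value * value).zfill(2 * n)
    start = n // 2
    return int(squared[start:start + n])


def tail_and_period(seed, n):
    """Return (steps before the cycle starts, length of the cycle)."""
    seen = {}  # value -> step at which it first appeared
    value, step = seed, 0
    while value not in seen:
        seen[value] = step
        value = middle_square_step(value, n)
        step += 1
    return seen[value], step - seen[value]


print(middle_square_step(4321, 4))   # 6710, as in the text
print(middle_square_step(6710, 4))   # 241, as in the text
print(tail_and_period(4321, 4))

# Exhaustive search over all two-digit seeds (n = 2).
periods_n2 = sorted({tail_and_period(seed, 2)[1] for seed in range(100)})
print(periods_n2)
```

For small n an exhaustive loop over all seeds is feasible; for larger n one could instead sample seeds at random and tabulate the observed periods, in the Monte Carlo spirit of this entry.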
For each n, what is the shortest period? The longest? How many of the $10^n$
initial seeds have the shortest (or longest) period? Can you give an example? If you
cannot solve this problem exactly, can you approximate the answer using Monte
Carlo techniques?
1946: Comments
How many trials? One of the most natural questions to ask when doing a
Monte Carlo simulation is, “How many trials N are needed for a given accuracy?” In
many settings the convergence is fast, with the error on the order of $1/\sqrt{N}$. Let us
explore approximating the area A of a region R contained in the unit square $[0, 1]^2$.
Let $X_1, X_2, \ldots, X_N$ be independent, identically distributed random variables, in
which each is 1 with probability A and 0 with probability 1 − A (each $X_n$ is a
Bernoulli random variable; see the comments for the 1922 entry). The fraction
\[ Y_N = \frac{X_1 + \cdots + X_N}{N} \]
has expected value A and variance
\[ \operatorname{Var} Y_N = \frac{1}{N^2} \sum_{n=1}^{N} \operatorname{Var} X_n = \frac{1}{N^2} \sum_{n=1}^{N} A(1 - A) = \frac{A(1 - A)}{N}. \]
As N → ∞, the central limit theorem (see the 1922 entry) ensures that the
random variable $Y_N$ converges to being normally distributed with mean A and
standard deviation
\[ \sqrt{\frac{A(1 - A)}{N}}. \]
The greatest uncertainty is when A = 1/2, so the standard deviation is always at
most $1/(2\sqrt{N})$. The probability that the observed estimate is off by more than $2/\sqrt{N}$
is bounded by the probability of being more than four standard deviations from
the mean, which is approximately 0.0000633425. If instead we asked about being
within $3/\sqrt{N}$, the probability of failing decreases to at most $1.973176 \cdot 10^{-9}$.
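The two tail probabilities quoted above can be reproduced with the complementary error function, and the $2/\sqrt{N}$ bound can be tried on a concrete region. The sketch below is illustrative only: the choice of region (the quarter disk $x^2 + y^2 \le 1$, of area π/4), the seed, and the helper names are assumptions made for this example.

```python
import math
import random


def two_sided_normal_tail(k):
    """P(|Z| > k) for a standard normal Z, via the complementary error function."""
    return math.erfc(k / math.sqrt(2.0))


def estimate_area(inside, trials, seed=0):
    """Fraction of uniform random points in the unit square that land in the region."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if inside(rng.random(), rng.random()))
    return hits / trials


def in_quarter_disk(x, y):
    return x * x + y * y <= 1.0


print(two_sided_normal_tail(4))  # about 6.33e-05
print(two_sided_normal_tail(6))  # about 1.97e-09

N = 10_000
estimate = estimate_area(in_quarter_disk, N)
error = abs(estimate - math.pi / 4.0)
print(estimate, error, 2.0 / math.sqrt(N))
```

With N = 10,000 the bound $2/\sqrt{N}$ equals 0.02, so the estimate of π/4 should be correct to within 0.02 except with probability of roughly $6 \cdot 10^{-5}$.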
Bibliography
[1] M. Aigner and G. M. Ziegler, Proofs from The Book, 6th ed., see corrected reprint of the 1998
original [MR1723092]; including illustrations by Karl H. Hofmann, Springer, Berlin, 2018.
MR3823190
[2] R. Eckhardt, Stan Ulam, John von Neumann, and the Monte Carlo method, with contribu-
tions by Tony Warnock, Gary D. Doolen and John Hendricks; Stanislaw Ulam 1909–1984,
Los Alamos Sci. 15, Special Issue (1987), 131–137. http://library.lanl.gov/cgi-bin/
getfile?00326867.pdf. MR935772
[3] N. Metropolis, The beginning of the Monte Carlo method, Stanislaw Ulam 1909–1984,
getfile?00326866.pdf. MR935771
[4] N. Metropolis and S. Ulam, The Monte Carlo method, J. Amer. Statist. Assoc. 44 (1949),
335–341. http://www.jstor.org/stable/2280232. MR0031341
[5] Wikipedia, Monte Carlo method, http://en.wikipedia.org/wiki/Monte_Carlo_method.
1947
The Simplex Method
Introduction
There are many important problems for which an algorithm to find a solution
exists but has a prohibitively long run time that limits its practical value. One
example is integer factorization: given an integer N , write it as a product of primes.
We give one solution below without any attempt to improve its efficiency.
• Step 1: Initialize Factors(N ) to be the empty set; as the name suggests, we will
store the factors of N here. Let M = N and n = 2 and continue to Step 2.
• Step 2: If n divides M , then append n to Factors(N ), replace M with M/n,
and continue to Step 3. If n does not divide M , then let n = n + 1; if n = M ,
then append n to Factors(N ) and go to Step 4, else repeat this step.
• Step 3: If M > 1, then set n = 2 and repeat Step 2, else go to Step 4.
• Step 4: Print Factors(N ) and stop.
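The four steps above can be transcribed almost literally. This is a sketch under the assumption that N ≥ 2 (the steps as written do not terminate for N = 1); the function name `factor_naive` is hypothetical, and the list is returned rather than printed so that it can be inspected.

```python
def factor_naive(N):
    """Literal transcription of Steps 1-4; assumes N >= 2."""
    if N < 2:
        raise ValueError("this sketch assumes N >= 2")
    factors = []          # Step 1: Factors(N) starts empty
    M, n = N, 2
    while True:
        if M % n == 0:    # Step 2: n divides M
            factors.append(n)
            M //= n
            if M > 1:     # Step 3: restart the search from n = 2
                n = 2
                continue
            return factors  # Step 4
        n += 1            # Step 2: n does not divide M
        if n == M:
            factors.append(n)
            return factors  # Step 4


print(factor_naive(12))  # [2, 2, 3]
print(factor_naive(97))  # [97]
```

When N is prime, the loop increments n all the way up to N, which is the source of the slowness discussed next.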
This algorithm is painfully slow since it requires us to check all numbers up to
N as potential divisors. We can make many improvements, although none of these
yields a practical algorithm. Once we find an n that divides M, we should see how
many times n divides M; this would save us from having to return to n = 2 each
time we restart Step 2. Next, we can notice that any composite number has a prime
factor at most its square root, and hence once $n > \sqrt{M}$ we know the remaining
cofactor M is prime. Finally, if we are able to store
the earlier prime numbers, we need only check n that are prime. Even if we do all
of these, however, we still have to check all primes at most $\sqrt{N}$. The prime number
theorem tells us that the number of primes at most x is approximately x/ log x. If
N is around $10^{406}$, we need to check about $2 \times 10^{200}$ numbers. This is well beyond
what modern computers can do.
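A sketch of trial division with the first two improvements (dividing out each factor completely and stopping once n exceeds the square root of the remaining cofactor) follows. Instead of storing earlier primes, this sketch makes the cheaper substitution of testing 2 and then odd candidates only; that substitution and the name `factor_trial_division` are assumptions of the example. The last lines reproduce the estimate of roughly $2 \times 10^{200}$ primes below $10^{203}$.

```python
import math


def factor_trial_division(N):
    """Trial division: divide out each factor fully, stop at sqrt of the cofactor."""
    factors = []
    M, n = N, 2
    while n * n <= M:
        while M % n == 0:      # divide out n as many times as possible
            factors.append(n)
            M //= n
        n += 1 if n == 2 else 2  # after 2, test odd candidates only
    if M > 1:                  # whatever remains has no factor <= sqrt(M): it is prime
        factors.append(M)
    return factors


print(factor_trial_division(360))     # [2, 2, 2, 3, 3, 5]
print(factor_trial_division(1001))    # [7, 11, 13]

# Prime number theorem estimate x / log x at x = sqrt(10**406) = 10**203.
primes_to_check = 10**203 / (203 * math.log(10))
print(primes_to_check)
```

Even with these savings the work grows like the square root of N, which is exponential in the number of digits of N.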
While factorization is easy to do in principle, in practice the “natural” approach
is too slow to be useful. It is a major open problem to find a fast way to factor
numbers. If such an algorithm existed, then encryption schemes such as RSA
(described in the 1977 entry) would be insecure. Interestingly, while we cannot
quickly factor a number, we can quickly tell if a number is prime (see the 2002
entry).
Our topic for this year concerns a different problem for which a fast algorithm is
available. Linear programming is a beautiful subject that is a natural outgrowth of
linear algebra. In linear algebra we try to solve systems of linear equations, such as
Ax = b. In linear programming we have a constraint matrix A and are now looking
for a solution to Ax = b that maximizes the profit $c^T x$, in which c is fixed. Initially
one allows inequalities in the linear system of constraints. By introducing additional
variables we can replace all the inequalities with equalities. We also require each
component of x to be nonnegative (it is a nice exercise to show that we may always
do this, though we may need to introduce some additional variables); doing so
allows us to put our linear programming problem into a standard, canonical form.
For example, one of the earliest successes in the subject concerns the diet prob-
lem. Here the entries of x are constrained to be nonnegative, with xk equal to the
amount of product k consumed. Each food provides a different amount of essen-
tial vitamins and minerals, and we wish to find the cheapest diet that will keep
us alive while ensuring that we get the minimum daily recommended allowance of
each nutrient. See [1] for a humorous recounting of the meeting between linear
programming and the diet problem.
One of the first theorems proved in the subject concerns the candidates for our
solution. We say x is feasible if it solves Ax = b. It turns out that the space of
feasible solutions has many nice properties. We call a solution x of the constraints
a basic solution if the columns of A corresponding to the nonzero entries of x
are linearly independent. It turns out that if there is an optimal solution to our
problem, then that optimal solution is a basic solution. Moreover, there are only
finitely many basic solutions. Thus, we need only check all the basic solutions to
find the optimal solution.
The problem is that naively searching the set of basic solutions is impractical
for large, real-world problems. Let A be an m × n matrix, so that $x \in \mathbb{R}^n$. We
assume that n > m, since otherwise the system Ax = b is overdetermined. If every
subset of at most m columns of A is linearly independent, then the number of basic
solutions is at most
\[ \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{m}. \]
For m, n large this is approximately $n^m/m!$. To get some feel for how quickly
this expression grows, if n = 10,000 and m = 100, then the number of candidates
exceeds $10^{241}$.
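Because Python integers have arbitrary precision, the bound above can be evaluated exactly. The following short sketch (with a hypothetical helper name `max_basic_solutions`) sums the binomial coefficients for n = 10,000 and m = 100 and reports the size of the result.

```python
import math


def max_basic_solutions(n, m):
    """Sum of C(n, k) for k = 0, ..., m, computed with exact integers."""
    return sum(math.comb(n, k) for k in range(m + 1))


count = max_basic_solutions(10_000, 100)
print(len(str(count)))   # number of decimal digits in the bound
print(count > 10**241)   # True
```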
We need an efficient way to navigate the set of basic solutions. Fortunately,
there is such an approach. It is called the simplex algorithm and it was introduced
by George Dantzig (whom we met in the 1939 entry) in 1947. His procedure
and later generalizations allow us to solve many real-world problems in reasonable
amounts of time on everyday laptops.

Building on the success of the simplex algorithm, it is natural to consider other
generalizations of linear programming and ask if they too can be solved efficiently.
The first natural candidate is to replace the word “linear” with “quadratic.” Un-
fortunately, while quadratic objective functions can often be handled, to date we
still require the constraints to be linear.
To see why, we first consider another generalization. Instead of requiring the
solution vector x to have nonnegative real entries, let us require it to have nonnega-
tive integral entries. This is an extremely important class of optimization problems.
When the entries are restricted to 0 or 1, we can interpret the components as bi-
nary indicator variables. Do we have a plane leaving from Albany to Charlotte
at 2:45pm? Do we show The Lego Movie on our biggest screen at 10:30am? If
we are trying to solve the traveling salesman problem (what is the route of least
distance through a given set of cities?), is the fifth leg of our trip from Boston to
Rochester? These examples should convey the importance of solving binary integer
programming problems.
Prove that if we could modify the simplex method to handle problems with
quadratic constraints, then we could solve all integer programming problems! For
those familiar with the P versus NP problem (see the comments for the 2000 entry),
this would prove P equals NP.
1947: Comments
Overview of the simplex method. While we cannot describe the simplex
method in its full glory and prove why it works in a short introduction, we can at
least sketch what it is and give a sense of why it should work. Suppose that we
wish to solve the canonical linear programming problem: minimize $c^T x$ subject to
Ax = b for an m×n matrix A, in which the entries of x and b are nonnegative (if an
entry of b were negative, we could multiply the corresponding row of A and that
entry of b by −1). We also make the assumption that the problem is nondegenerate,
in the following sense.
Suppose that the m rows of A are linearly independent. If the rows are not
linearly independent, then either we cannot solve Ax = b or at least one of the
rows is unnecessary. We also assume b is not a linear combination of fewer than
m columns of A. If b is a combination of fewer than m columns, this will create a
technical difficulty in the simplex method. Fortunately this is a weak condition: if
we change some of the entries of b by small amounts (less than $10^{-10}$, for example),
this should suffice to break the degeneracy.
The simplex method has two phases.
• Phase I: Find a basic feasible solution (or prove that none exists).
• Phase II: Given a basic feasible solution, find a basic optimal solution (or prove
that none exists). If no optimal solution exists, Phase II produces a sequence of
feasible solutions with cost $c^T x$ that tends to minus infinity.
The idea of the proof seems absurd at first: we start by assuming we can do
Phase II, use that to do Phase I, and then use Phase I to do Phase II. The reason
this argument is not circular is that the input of Phase II is a basic feasible solution.
If we have a problem for which we have one such solution, then we can run through
Phase II. Instead of the original problem, we instead consider a related one for which
we can find a basic feasible solution by inspection. It is to this related problem that
we apply Phase II to determine whether or not there is a basic feasible solution to
the original problem; if there is, we then use that solution as an input in applying
Phase II to the original problem.
We proceed by appending the m × m identity matrix to A to form the new
matrix $A' = [A \; I]$ and consider the following new canonical linear programming
problem: minimize $z_1 + \cdots + z_m$ subject to $A'(x_1, \ldots, x_n, z_1, \ldots, z_m)^T = b$ with
$x_i, z_j \ge 0$. By construction we can find a basic feasible solution: set each $x_i = 0$
and set $z_j = b_j$.
Now that we have a basic feasible solution to this related problem, we can
apply Phase II. The cost cannot go to negative infinity, since the sum of the zj ’s
is at least zero. Thus, there is an optimal solution and there are two cases. If the
sum is zero, then we have found a feasible solution to our original problem (as the
only way the sum vanishes is if each zj = 0). If the sum is positive, then at least
one of the zj ’s is nonzero. This proves that there cannot be a feasible solution to
the original problem, for if there were, that would correspond to a solution with all
zj = 0 and hence lower cost.
For a full analysis of Phase II, see [2]; we summarize the main ideas here
through an application mentioned earlier: the diet problem. The main idea is that
the space of feasible solutions is given by the intersection of regions above or below
hyperplanes arising from the constraints. The optimal solution, if it exists, is either
in the interior or on the boundary. One then shows that if this minimum value
is attained inside, that same value is attained somewhere on the boundary and
thus it suffices to investigate these points. Consequently, the search for an optimal
solution is reduced to the “faces” of our boundary. The power of the simplex
method is that it also gives a very efficient way to flow to the optimal solution. We
give an example of one such path in Figure 1 and illustrate what is happening by
looking at a two-dimensional example.
Consider the diet problem, in which we have two foods with two nutrients (iron
and protein).
• The first food costs $20 per unit. Each unit contains 30 units of iron and 15 units
of protein.
• The second food costs $2 per unit. Each unit contains 5 units of iron and 10
units of protein.
Assume we need 60 units of iron and 70 units of protein daily to remain alive.
If we buy x1 units of the first and x2 units of the second, the constraints become
30x1 + 5x2 ≥ 60, (iron)
15x1 + 10x2 ≥ 70, (protein)
x1 , x2 ≥ 0.
The first two constraints reflect how much iron and protein we consume, ensuring
we meet the minimum requirements. The third constraint prevents us from eating
negative quantities. We want to minimize the cost C, which is given by
C = 20x1 + 2x2 .
We illustrate the search for the optimal solution in Figure 2. Changing the
value of C shifts the cost line up or down; thus different values generate a family of
parallel cost lines. Any two points on one of these lines have the same cost. If we
are at an interior point in the feasible region, we can flow down the cost line going
through it until we reach the boundary without changing the cost. Thus, there
cannot be a diet at an interior point cheaper than all the boundary diets. Next, we
can shift the cost line down and to the left and lower the cost. Doing so lands us at
one of the three vertices (unless the slope of the cost line equals the slope of one of
the boundary lines, but even in that case we would still have the value at a vertex
equal to the minimal cost). These arguments generalize to a large class of linear
programming problems and show that the optimal solution occurs at a boundary;
all that remains is to find a method to quickly reach such a point.

Figure 1. An example of a feasible space for a linear programming problem and a
path generated by the simplex algorithm to reach an optimum solution. The set of
solutions is a convex polytope, and an optimum solution (if it exists) must be one
of the vertices.

Figure 2. An illustration of the simplex method for the two-food diet problem.
(a) The region of feasible solutions. (b) Searching for the cheapest diet.
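For the two-food diet problem the vertices can simply be enumerated. The sketch below is a brute-force check, not the simplex method: it intersects every pair of boundary lines, keeps the feasible intersection points, and compares their costs using exact rational arithmetic. The names `LINES`, `is_feasible`, `cost`, and `vertices` are hypothetical helpers for this example.

```python
from fractions import Fraction
from itertools import combinations

# Each boundary line is (a1, a2, rhs), meaning a1*x1 + a2*x2 = rhs.
LINES = [
    (30, 5, 60),   # iron
    (15, 10, 70),  # protein
    (1, 0, 0),     # x1 = 0
    (0, 1, 0),     # x2 = 0
]


def is_feasible(x1, x2):
    return (30 * x1 + 5 * x2 >= 60 and 15 * x1 + 10 * x2 >= 70
            and x1 >= 0 and x2 >= 0)


def cost(x1, x2):
    return 20 * x1 + 2 * x2


def vertices():
    """Feasible intersection points of pairs of boundary lines (Cramer's rule)."""
    found = []
    for (a, b, e), (c, d, f) in combinations(LINES, 2):
        det = a * d - b * c
        if det == 0:
            continue  # parallel lines never meet
        x1 = Fraction(e * d - b * f, det)
        x2 = Fraction(a * f - e * c, det)
        if is_feasible(x1, x2):
            found.append((x1, x2))
    return found


for v in vertices():
    print(v, cost(*v))
best = min(vertices(), key=lambda v: cost(*v))
print("cheapest vertex:", best, "cost:", cost(*best))
```

The enumeration finds three vertices, (0, 12), (10/9, 16/3), and (14/3, 0), and the cheapest is (0, 12) at a cost of $24: the second food alone covers both requirements. Enumeration like this is hopeless at scale, which is why an efficient path between vertices matters.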
Bibliography
[1] G. Dantzig, The diet problem, Interfaces 20 (1990), no. 4, 43–47.
[2] J. Franklin, Methods of mathematical economics: Linear and nonlinear programming, fixed-
point theorems, Undergraduate Texts in Mathematics, Springer-Verlag, New York-Berlin, 1980.
MR602694
[3] S. J. Miller, Mathematics of optimization: how to do things faster, Pure and Applied Under-
graduate Texts, vol. 30, American Mathematical Society, Providence, RI, 2017. MR3729274
[4] Wikipedia, Simplex algorithm, https://en.wikipedia.org/wiki/Simplex_algorithm.
1948
Elementary Proof of the Prime Number Theorem
Introduction
The prime number theorem states that the number of primes at most x, denoted
π(x), is asymptotic to x/ log x:
\[ \lim_{x \to \infty} \frac{\pi(x)}{x/\log x} = 1; \]
see the 1919 and 1933 entries. First conjectured in the 1790s, it was not proved
until almost 100 years later, when Jacques Hadamard and Charles Jean de la Vallée-
Poussin (1866–1962) independently established it in 1896. They both used complex
analysis to understand the distribution of zeros of the Riemann zeta function (see
the 1928, 1933, 1939, 1942, 1945, 1967, and 1987 entries).
Since the prime number theorem is a statement about integers and not about
complex analysis, these proofs were unsatisfactory to some. It felt unnatural to use
complex numbers to study primes.1 However, it was commonly believed that com-
plex analysis or other similarly “deep” methods were needed to prove it. According
to G. H. Hardy (see the 1940 entry):
No elementary proof of the prime number theorem is known, and one
may ask whether it is reasonable to expect one. Now we know that the
theorem is roughly equivalent to a theorem about an analytic function,
the theorem that the Riemann zeta function has no roots on a certain
line. A proof of such a theorem, not fundamentally dependent on the
theory of functions, seems to me extraordinarily unlikely.
It took almost fifty years for an elementary proof (that is, one that does not
rely on complex analysis) to be found. This was done by Paul Erdős [3] (see the
1913 entry) and Atle Selberg [9] in 1948. The story of who contributed what and
when, and who should receive what credit, has been the subject of many heated
discussions. Dorian Goldfeld (1947– ), who knew the players involved, has written
a good description of what happened [5]. See also [6] for a motivated account of
the proof.
The term “elementary” should not be confused with “easy.” The elementary
proofs of the prime number theorem are longer, more technical, and provide less
accurate estimates about π(x) than the complex analysis proofs do. In fact, the
classical approach is still preferred in most textbooks. We devote the remainder of
this entry to discussing the traditional, complex analysis approach.
1 A famous, humorous dictum is that the shortest path between two statements involving real
numbers is through the complex plane; that is certainly the case here.
In Riemann’s seminal 1859 paper [8], he showed that knowledge of the zeros of
the zeta function yields information about π(x); see the 1942 entry. The fact that
the zeta function enjoys the Euler product representation (1933.3) suggests the use
of logarithmic derivatives. Recall that the logarithmic derivative of f is
\[ (\log f)' = \frac{f'}{f}. \]
The logarithmic derivative of a product is a sum of logarithmic derivatives:
\[ \frac{(fg)'}{fg} = \frac{f'g + fg'}{fg} = \frac{f'}{f} + \frac{g'}{g}; \]
the same holds true for products with three or more factors. With appropriate
limit arguments, one can use this technique to study the Euler product (1933.3)
representation of the zeta function. The following theorem from complex analysis
permits us to pass from knowledge of the logarithmic derivative of a function to
knowledge of the number and location of the roots of that function.
Theorem: Let Ω be a nonempty, connected open set in C and let γ be a simple
closed curve in Ω with its interior in Ω. Let f : Ω → C be analytic with no zeros
on γ and let g : Ω → C be analytic. Then f has finitely many zeros in the interior
of γ and
\[ \frac{1}{2\pi i} \int_{\gamma} \frac{f'(s)}{f(s)} \, g(s) \, ds = \sum_{f(\rho) = 0} g(\rho), \tag{1948.1} \]
in which ρ runs over the zeros of f inside γ, repeated according to multiplicity.

Up to lower order terms, when we integrate $f'(s)/f(s) = \zeta'(s)/\zeta(s)$ times $g(s) = x^s/s$
along the line² Re s = 2, we find (as is customary in number theory, p always denotes
a prime number)
\[ \sum_{p \le x} \log p = x - \sum_{\zeta(\rho) = 0} \frac{x^{\rho}}{\rho}. \tag{1948.2} \]
Some care is needed in writing down the sum so that it converges (this is typically
done by summing the zeros in complex-conjugate pairs). As remarked in the 1942
entry, the analytic continuation of ζ(s) has only a single pole³, which is simple and
at s = 1 with residue 1; this is responsible for the $x = x^1/1$ term in (1948.2). The
remaining terms come from the zeros of ζ(s). One can show that these zeros have
real part at most 1 without too much trouble; this follows from the convergence of
the Euler product (1933.3).
The prime number theorem asserts that
\[ \sum_{p \le x} 1 \sim \frac{x}{\log x}, \]
which partial summation (the discrete analogue of integration by parts) confirms
is equivalent to
\[ \sum_{p \le x} \log p \sim x. \]
The left-hand side of this expression is precisely the left-hand side of (1948.2). The
crucial step in the standard proof of the prime number theorem is to show that if
ζ(ρ) = 0, then Re ρ < 1. See [7] for a more fleshed out sketch of this proof or see
[3, 6] for full details.

² The line Re s = 2 is not a simple closed curve. However, it is when viewed as a curve that
passes through ∞. Suitable limit arguments and the Riemann sphere model of the complex plane
(see the 1956 entry) are required to push this through.
³ A pole of f is an isolated singularity $s_0$ around which f behaves like a constant times
$(s - s_0)^{-k}$ for some natural number k. The pole is simple if k = 1. The residue of f at a simple
pole $s_0$ is $\lim_{s \to s_0} (s - s_0) f(s)$.
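The two equivalent forms of the prime number theorem can be examined numerically. The sketch below is an illustration, not evidence of the theorem: it sieves the primes up to an arbitrarily chosen x = 100,000 and compares the prime count with x/ log x and the sum of log p with x. The helper name `primes_up_to` is hypothetical.

```python
import math


def primes_up_to(x):
    """Sieve of Eratosthenes: list of all primes p <= x (assumes x >= 2)."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            # Cross out the multiples of p starting at p*p.
            sieve[p * p::p] = bytearray(len(range(p * p, x + 1, p)))
    return [n for n in range(x + 1) if sieve[n]]


x = 100_000
primes = primes_up_to(x)
pi_x = len(primes)
theta_x = sum(math.log(p) for p in primes)

print(pi_x)                          # 9592
print(pi_x / (x / math.log(x)))      # ratio for the counting form
print(theta_x / x)                   # ratio for the weighted form
```

At this range the weighted ratio is already within one percent of 1, while the counting ratio is still about ten percent too large, which reflects how slowly π(x)/(x/ log x) tends to 1.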

Martin Aigner and Günter M. Ziegler’s wonderful tribute to Paul Erdős (the
hero of our 1913 entry) Proofs from The Book [1] gives six different proofs of the
infinitude of primes (Euclid’s theorem). These include Euclid’s proof (see p. 4), as
well as ones using Fermat numbers, Mersenne numbers, and topology (see the 1955
entry).
(a) One of our favorite proofs is that the irrationality of $\pi^2$ implies there are
infinitely many primes. Prove this claim. Deduce from it a lower bound for
the number of primes at most x. Hint: Use the fact that $\zeta(2) = \pi^2/6$; see the
1919 entry for a derivation.
(b) For another approach that uses the Riemann zeta function, compare the series
(1933.2) and product (1933.3) representations of ζ(s) as $s \to 1^+$. The solution
can be found on p. 111.
(c) Use the fact that $\sum_{n \le x} 1/n \sim \log x$ to obtain the estimate
\[ \sum_{p \le x} \frac{1}{p} \sim \log\log x \]
as x → ∞.
1948: Comments
Mertens’s theorem. In 1874, a little more than 20 years before the prime
number theorem was proved, Franz Mertens proved that
\[ \lim_{x \to \infty} \Bigl( \sum_{p \le x} \frac{1}{p} - \log\log x - M \Bigr) = 0, \tag{1948.3} \]
in which M = 0.2614972128476 . . . is the Meissel–Mertens constant. In fact, he
showed that
\[ \sum_{p \le x} \frac{1}{p} = \log\log x + M + O\Bigl( \frac{1}{\log x} \Bigr). \]
For a proof of Mertens’s theorem, see almost any textbook on analytic number
theory (see [7] or [10]).
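Mertens’s theorem is easy to probe numerically. The following sketch sums 1/p over the primes up to an arbitrarily chosen x = 100,000, subtracts log log x, and compares the result with the value of M quoted above; the sieve helper `primes_up_to` and the constant name `MEISSEL_MERTENS` are assumptions of this example.

```python
import math

MEISSEL_MERTENS = 0.2614972128476  # value quoted in the text, truncated


def primes_up_to(x):
    """Sieve of Eratosthenes: list of all primes p <= x (assumes x >= 2)."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, x + 1, p)))
    return [n for n in range(x + 1) if sieve[n]]


x = 100_000
reciprocal_sum = sum(1.0 / p for p in primes_up_to(x))
difference = reciprocal_sum - math.log(math.log(x))

print(reciprocal_sum)
print(difference, MEISSEL_MERTENS)
```

The printed difference should be close to M, comfortably inside the O(1/ log x) error term, which at this x is on the order of 0.09.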
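The asymptotic growth rate log log x remains the same even if we sum over the
reciprocals of all prime powers. To be more specific,
\[ \sum_{p^a \le x} \frac{1}{p^a} = \log\log x + O(1), \]
in which $p^a$ denotes a prime power and O(1) a quantity that remains bounded as
x → ∞. Indeed, Mertens’s theorem ensures that
\begin{align*}
\sum_{p^a \le x} \frac{1}{p^a}
  &= \sum_{p \le x} \frac{1}{p} + \sum_{\substack{p^a \le x \\ a \ge 2}} \frac{1}{p^a}
   \le \log\log x + O(1) + \sum_{n \ge 2} \sum_{k \ge 2} \frac{1}{n^k} \\
  &= \log\log x + O(1) + \sum_{n \ge 2} \frac{1}{n^2} \Bigl( 1 + \frac{1}{n} + \frac{1}{n^2} + \cdots \Bigr) \\
  &= \log\log x + O(1) + \sum_{n \ge 2} \frac{1}{n^2} \cdot \frac{1}{1 - 1/n} \\
  &= \log\log x + O(1).
\end{align*}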
A zero-free region for the zeta function. Let us sketch the proof that
$\zeta(s) \ne 0$ whenever s = σ + it with σ, t ∈ R and σ ≥ 1. This is very weak progress
towards the Riemann hypothesis, although it is enough to obtain the prime number
theorem (though with a poor error term). First, use the series definition (1933.2)
to prove that $\zeta(\sigma) \ne 0$ if σ > 1. Then use the Euler product formula (1933.3) to
show that $\zeta(\sigma + it) \ne 0$ if σ > 1.
We are left with the case σ = 1. This was originally independently proved by
Hadamard and de la Vallée-Poussin in 1896; fill in the details of Mertens’s elegant
proof from a few years later by proving the following statements.
(a) 3 + 4 cos θ + cos 2θ ≥ 0. (Hint: Consider $(\cos\theta + 1)^2$.)
(b) For s = σ + it, $\log \zeta(s) = \sum_{p} \sum_{k=1}^{\infty} \frac{p^{-k\sigma}}{k} e^{-itk \log p}$.
(c) $\operatorname{Re} \log \zeta(s) = \sum_{p} \sum_{k=1}^{\infty} \frac{p^{-k\sigma}}{k} \cos(t \log p^k)$.
(d) 3 log ζ(σ) + 4 Re log ζ(σ + it) + Re log ζ(σ + 2it) ≥ 0.
(e) $\zeta(\sigma)^3 \, |\zeta(\sigma + it)^4 \, \zeta(\sigma + 2it)| \ge 1$.
(f) If ζ(1 + it) = 0, then as σ decreases to 1 from above, |ζ(σ + it)| < A(σ − 1)
for some A.
(g) Since $\zeta(\sigma) \sim (\sigma - 1)^{-1}$ (because ζ(s) has a simple pole of residue 1 at s = 1) and
ζ(σ + 2it) is bounded as σ → 1 (the only pole of ζ(s) is at s = 1), the preceding
implies that if ζ(1 + it) = 0, then as σ → 1, $\zeta(\sigma)^3 \, |\zeta(\sigma + it)^4 \, \zeta(\sigma + 2it)| \to 0$.
Since the product must be at least 1, this proves $\zeta(1 + it) \ne 0$.
The key to Mertens’s proof is the positivity of the trigonometric expression in (a).
Bibliography
[1] M. Aigner and G. M. Ziegler, Proofs from The Book, 6th ed., see corrected reprint of the
1998 original [MR1723092]; including illustrations by Karl H. Hofmann, Springer, Berlin,
2018. MR3823190
[2] D. Burt, S. Donow, S. J. Miller, M. Schiffman, and B. Wieland, Irrationality measure and
lower bounds for π(X), Pi Mu Epsilon J. 14 (2017), no. 7, 421–429. http://arxiv.org/pdf/
0709.2184.pdf. MR3726946
[3] P. Erdős, On a new method in elementary number theory which leads to an elementary
proof of the prime number theorem, Proc. Nat. Acad. Sci. U. S. A. 35 (1949), 374–384, DOI
10.1073/pnas.35.7.374. MR0029411
[4] H. Furstenberg, On the infinitude of primes, Amer. Math. Monthly 62 (1955), 353, DOI
10.2307/2307043. MR0068566
[5] D. Goldfeld, The elementary proof of the prime number theorem: an historical perspective,
http://www.math.columbia.edu/~goldfeld/ErdosSelbergDispute.pdf.
[6] N. Levinson, A motivated account of an elementary proof of the prime number theorem,
Amer. Math. Monthly 76 (1969), 225–245, DOI 10.2307/2316361. http://www.maa.org/
sites/default/files/images/upload_library/22/Ford/NormanLevinson.pdf. MR0241372
[8] G. F. B. Riemann, Über die Anzahl der Primzahlen unter einer gegebenen Grösse, Monats-
ber. Königl. Preuss. Akad. Wiss. Berlin (1859), 671-680. http://www.maths.tcd.ie/pub/
HistMath/People/Riemann/Zeta/EZeta.pdf.
[9] A. Selberg, An elementary proof of the prime-number theorem, Ann. of Math. (2)
50 (1949), 305–313, DOI 10.2307/1969455. http://www.jstor.org/stable/1969455?seq=1#
page_scan_tab_contents. MR0029410
[10] T. Tao, Mertens’ theorems, https://terrytao.wordpress.com/2013/12/11/mertens-theorems/.
1949
Beurling’s Theorem
Introduction
The study of linear transformations between finite-dimensional spaces is the
purview of linear algebra. Analysis rarely enters into the discussion because it is
possible to show any two norms on the vector space $\mathbb{R}^n$ are essentially the same.
To be more specific, they give rise to the same open sets, closed sets, convergent
sequences, continuous functions, and so forth. The study of linear transformations
between infinite-dimensional normed vector spaces is called operator theory. There
are lots of things that can go wrong when one steps up to the infinite-dimensional
setting; we examine a few of them below. The interplay between linear algebra
and analysis is one of the great appeals of the subject.
A norm on a vector space V is a function $\|\cdot\| : V \to [0, \infty)$ that satisfies
(a) $\|v\| = 0$ if and only if v = 0,
(b) $\|cv\| = |c| \, \|v\|$ for all v ∈ V and all scalars c, and
(c) $\|u + v\| \le \|u\| + \|v\|$ for all u, v ∈ V.
The Euclidean norm $\|x\| = (x_1^2 + x_2^2 + \cdots + x_n^2)^{1/2}$ on $\mathbb{R}^n$ is an example, as are
\[ \|x\|_1 = |x_1| + |x_2| + \cdots + |x_n| \quad\text{and}\quad \|x\|_\infty = \max\{ |x_1|, |x_2|, \ldots, |x_n| \}, \tag{1949.1} \]
in which $x = (x_1, x_2, \ldots, x_n)$; see Figure 1.
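The three norms are simple to compute, and doing so makes "essentially the same" concrete: on $\mathbb{R}^n$ they satisfy the standard chain of inequalities $\|x\|_\infty \le \|x\| \le \|x\|_1 \le n \|x\|_\infty$, so each norm is controlled by a constant multiple of the others. The sketch below uses hypothetical helper names and an arbitrarily chosen vector.

```python
import math


def norm_1(x):
    return sum(abs(t) for t in x)


def norm_2(x):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in x))


def norm_inf(x):
    return max(abs(t) for t in x)


x = (3.0, -4.0, 12.0)
n = len(x)
print(norm_1(x), norm_2(x), norm_inf(x))  # 19.0 13.0 12.0

# Comparison inequalities that hold for every vector in R^n.
print(norm_inf(x) <= norm_2(x) <= norm_1(x) <= n * norm_inf(x))  # True
```

The factor n in the last inequality is what fails to have an analogue in infinite dimensions.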

Consider the space C[a, b] of continuous, real-valued functions f : [a, b] → R,
endowed with the norm
\[ \|f\|_\infty = \sup_{x \in [a,b]} |f(x)|. \]
This induces a metric $d_\infty(f, g) = \|f - g\|_\infty$ with respect to which C[a, b] is a
complete metric space (a sequence converges with respect to $d_\infty$ if and only if it
converges uniformly). Let P denote the subspace of all polynomial functions; it
is infinite dimensional since it contains polynomials of every finite degree. The
celebrated Weierstrass approximation theorem (see the comments) asserts that P
is dense in C[a, b]: for each f ∈ C[a, b], there is a sequence of polynomials $p_n$ that
converges to f with respect to $d_\infty$. In finite-dimensional spaces, this sort of thing
is impossible: a proper subspace cannot be dense in the whole space. For example,
a plane through the origin cannot be dense in $\mathbb{R}^3$.
Figure 1. (a) $\{x \in \mathbb{R}^2 : \|x\|_1 \le 1\}$ (b) $\{x \in \mathbb{R}^2 : \|x\|_\infty \le 1\}$.
The closed unit balls for the norms (1949.1) on $\mathbb{R}^2$ have
corners. The corners are extreme points: they do not lie in any
open line segment that joins two points of the closed ball. Here
$e_1 = (1, 0)$ and $e_2 = (0, 1)$.

An often used, but underappreciated, result in linear algebra is: if A, B are
n × n matrices and AB = I, then BA = I; that is, a right inverse is a left inverse,
and vice versa. This fails miserably in infinite dimensions. If
\[
A = \begin{bmatrix}
0 & 1 & 0 & 0 & \cdots \\
0 & 0 & 1 & 0 & \cdots \\
0 & 0 & 0 & 1 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix}
0 & 0 & 0 & 0 & \cdots \\
1 & 0 & 0 & 0 & \cdots \\
0 & 1 & 0 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix},
\tag{1949.2}
\]
then one can verify that AB = I and $BA \ne I$. However, we should be a bit more
formal about the space upon which these matrices act.
Consider the complex vector space $\ell^2(\mathbb{N})$ of all complex, square-summable infinite
sequences $x = (x_0, x_1, \ldots)$; the norm on $\ell^2(\mathbb{N})$ is $\|x\| = \bigl( \sum_{n=0}^{\infty} |x_n|^2 \bigr)^{1/2}$. The
matrices A and B from (1949.2) operate on $x \in \ell^2(\mathbb{N})$ as follows:
\[ A(x_0, x_1, \ldots) = (x_1, x_2, \ldots) \quad\text{and}\quad B(x_0, x_1, \ldots) = (0, x_0, x_1, \ldots). \]
That is, A is the backward shift operator and B is the forward shift operator. In
an interesting twist, observe that A is onto but not one-to-one, and B is one-to-one
but not onto. Linear algebra tells us that both of these situations are impossible
for a linear transformation from $\mathbb{R}^n$ to itself.
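The behavior of the two shifts can be seen on finitely supported sequences. In the sketch below a tuple stands for an infinite sequence whose remaining entries are all zero; that representation and the function names are assumptions of the example, chosen so that no truncation of the operators themselves is needed.

```python
def backward_shift(x):
    """A: drop the first entry, (x0, x1, x2, ...) -> (x1, x2, ...)."""
    return x[1:]


def forward_shift(x):
    """B: insert a zero in front, (x0, x1, ...) -> (0, x0, x1, ...)."""
    return (0,) + x


x = (1, 2, 3)  # stands for (1, 2, 3, 0, 0, ...)

print(backward_shift(forward_shift(x)))  # (1, 2, 3): AB acts as the identity
print(forward_shift(backward_shift(x)))  # (0, 2, 3): BA erases the first entry

# A is not one-to-one: it sends the nonzero sequence (1, 0, 0, ...) to zero.
print(backward_shift((1,)))              # (): the zero sequence
```

Every output of B starts with 0, so B misses any sequence with a nonzero first entry; this is the failure of surjectivity noted above.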
At this point we insist that our subspaces are topologically closed; this avoids
some peculiarities that would take us too far afield. Critical to the understanding
of any linear transformation is a careful study of its invariant subspaces. These
are subspaces that are mapped into themselves by an operator. For instance, each
one-dimensional invariant subspace of an operator is spanned by an eigenvector.
A complete understanding of the invariant subspaces of an operator on $\mathbb{C}^n$ reveals
its Jordan canonical form. What are the invariant subspaces of the forward shift
operator? In other words, what are the (topologically closed) subspaces of $\ell^2(\mathbb{N})$
that B maps into themselves?
The answer to this question, found by Arne Beurling (1905–1986) [3], requires
a radical change of perspective. Instead of $\ell^2(\mathbb{N})$, we must consider the related
Hardy space $H^2$ (named after G. H. Hardy; see the 1940 entry), which consists of
complex power series $f(z) = \sum_{n=0}^{\infty} a_n z^n$ for which
\[ \|f\| = \Bigl( \sum_{n=0}^{\infty} |a_n|^2 \Bigr)^{1/2} \]
is finite. Each function $f \in H^2$ is analytic (see p. 151) on the open unit disk
D = {z ∈ C : |z| < 1}. The spaces $\ell^2(\mathbb{N})$ and $H^2$ are fundamentally the same; they
are relabelled versions of each other. The shift operator on $\ell^2(\mathbb{N})$ is essentially the
same as the operator that maps f(z) to zf(z): multiplication by z shifts the Taylor
coefficients of f.
Beurling’s theorem asserts that the nontrivial invariant subspaces for the forward
shift operator are all of the form $uH^2 = \{uf : f \in H^2\}$, in which u is an
inner function. What is an inner function? An inner function is a bounded analytic
function on D whose boundary values on the unit circle have absolute value 1
“almost everywhere.” For example, a Möbius transformation (see the 1956 entry)
of the form $u(z) = (a - z)/(1 - \overline{a} z)$ with a ∈ D is an inner function, as are finite
products of such functions. An important factorization theorem asserts that each inner
function factors as
\[ e^{i\gamma} z^N \prod_{n=1}^{\infty} \frac{|z_n|}{z_n} \, \frac{z_n - z}{1 - \overline{z_n} z} \, \exp\Bigl( -\int_{-\pi}^{\pi} \frac{e^{it} + z}{e^{it} - z} \, d\mu(e^{it}) \Bigr), \]
in which γ ∈ [0, 2π) is a real constant, N ≥ 0, $z_1, z_2, \ldots$ is a (possibly finite or
vacuous) list of points in D that satisfy the Blaschke condition $\sum_{n=1}^{\infty} (1 - |z_n|) < \infty$
(this ensures that the infinite product converges on D), and dμ is a nonnegative
singular measure [6].
This is a lot to digest! The point to take away is that Beurling’s theorem provides
an unexpected link between operator theory and complex analysis. Moreover,
the solution to a concrete problem about a specific linear transformation boils down
to a deep factorization theorem for a certain class of analytic functions. For an even
more unexpected link between two areas of mathematics, see the 1985 entry.

Give a complete description of the (topologically closed) invariant subspaces
for the Volterra integration operator $[Vf](x) = \int_0^x f(t) \, dt$ on C[0, 1]. In particular,
show that V has no eigenvalues.
1949: Comments
Left and right inverses. Here are two proofs that left and right inverses
coincide for n × n matrices [4]. Both involve finite dimensionality in a crucial way
and avoid the unnecessary use of determinants. You may wish to consider at which
point the proofs break down for the infinite matrices in (1949.2).
(a) Let $A, B \in M_n$. If $AB = I$, then $A(Bx) = x$ for all $x \in \mathbb{R}^n$. Thus,
columnspace $A = \mathbb{R}^n$ and hence nullspace $A = \{0\}$ by the dimension theorem.
Let $I - BA = [x_1 \; x_2 \; \ldots \; x_n]$ be written columnwise. Then,
\[ [Ax_1 \; Ax_2 \; \ldots \; Ax_n] = A[x_1 \; x_2 \; \ldots \; x_n] = A(I - BA) = A - (AB)A = A - IA = 0, \]
so each $x_i$ belongs to nullspace $A = \{0\}$; that is, $x_1 = x_2 = \cdots = x_n = 0$. Thus, $I - BA = 0$ and $BA = I$.
(b) Let $A, B \in M_n$ and suppose that $AB = I$. The $n^2 + 1$ matrices $I, A, A^2, \ldots, A^{n^2}$
in $M_n$ are linearly dependent, so there is a nonzero polynomial $p$ of degree at most $n^2$
such that $p(A) = 0$. Write $p(z) = cz^j f(z)$, in which $j \geq 0$, $c \in \mathbb{C}$ is nonzero,
and $f(z) = a_k z^k + a_{k-1} z^{k-1} + \cdots + a_1 z + a_0$ has $a_0 \neq 0$. Since $AB = I$ implies
that $A^j B^j = I$, and since polynomials in $A$ commute with each other,
\[ 0 = p(A)B^j = c f(A) A^j B^j = c f(A), \]
so $f(A) = 0$. Since $AB = I$, we find that
\[
\begin{aligned}
0 = f(A)B &= (a_k A^k + a_{k-1} A^{k-1} + \cdots + a_1 A + a_0 I)B \\
&= a_k A^k B + a_{k-1} A^{k-1} B + \cdots + a_1 AB + a_0 B \\
&= (a_k A^{k-1} + a_{k-1} A^{k-2} + \cdots + a_1 I) + a_0 B.
\end{aligned}
\]
Since $a_0 \neq 0$, it follows that $B$ is a polynomial in $A$, so $BA = AB = I$.
Weierstrass approximation theorem. While a proof of Beurling’s theorem
would take us too far afield (see [5] for a modern approach), we can outline an ele-
gant proof of the Weierstrass approximation theorem that is due to Sergei Bernstein
(1880–1968) [2].
It suffices to consider $[a, b] = [0, 1]$ since composition with the linear function $\varphi : [a, b] \to [0, 1]$
defined by $\varphi(x) = (x - a)/(b - a)$ maps polynomials to polynomials and so does composition with its
inverse. Let $f \in C[0, 1]$. The $n$th Bernstein polynomial for $f$ is
\[ (B_n f)(x) = \sum_{k=0}^{n} f\Big(\frac{k}{n}\Big) \binom{n}{k} x^k (1 - x)^{n-k}; \tag{1949.3} \]
it is a polynomial of degree at most $n$. Let $x \in [0, 1]$ and consider a coin flip with
probability of heads $x$ and tails $1 - x$, respectively. The probability of $k$ heads in $n$
trials is $P(k, n) = \binom{n}{k} x^k (1 - x)^{n-k}$. What do the Bernstein polynomials (1949.3)
represent? Think of $f \in C[0, 1]$ as the “payoff function” for a coin tossing game: if
there are $k$ heads in $n$ trials, then you win $f(k/n)$ dollars. The expected winnings
are $(B_n f)(x)$. For large $n$, we expect $k \approx nx$ heads, so
\[ (B_n f)(x) = \text{Expected winnings after $n$ tosses} \approx f\Big(\frac{k}{n}\Big) \approx f(x). \]
The uniform continuity of $f$ on $[0, 1]$ and some probability theory ensure that this
informal reasoning can be pushed through.
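The coin-tossing heuristic can be tested numerically. The following Python sketch evaluates the Bernstein polynomial (1949.3) directly from its definition and estimates the uniform error on a grid. The function names, the sample payoff function, and the grid size are our own illustrative choices; the experiment only suggests the convergence and does not prove it.

```python
from math import comb

def bernstein(f, n, x):
    """Evaluate the nth Bernstein polynomial (B_n f)(x) from (1949.3)."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

def max_error(f, n, grid_size=201):
    """Largest value of |(B_n f)(x) - f(x)| over an evenly spaced grid in [0, 1]."""
    xs = [i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in xs)

def payoff(x):
    """A continuous but non-smooth sample payoff (an arbitrary choice)."""
    return abs(x - 0.5)

# The estimated uniform error should shrink as n grows.
errors = [max_error(payoff, n) for n in (10, 40, 160)]
print(errors)
```

The errors decrease slowly for this payoff function because of its corner at $x = 1/2$; smoother payoff functions are approximated faster.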
The Müntz–Szász theorem. Recall that the span of a set of vectors is the
collection of all finite linear combinations of elements of that set. The Weierstrass
approximation theorem says that $\operatorname{span}\{1, x, x^2, \ldots\}$ is dense in $C[a, b]$. This suggests
the following question.
Let $0 = \lambda_0 < \lambda_1 < \lambda_2 < \cdots$. What are necessary and sufficient
conditions for $\operatorname{span}\{1, x^{\lambda_1}, x^{\lambda_2}, \ldots\}$ to be dense in $C[a, b]$?
This question has a precise and elegant answer, due independently to Herman
Müntz (1884–1956) [9] and Otto Szász (1884–1952) [11]. Let
\[ S = \overline{\operatorname{span}\{1, x^{\lambda_1}, x^{\lambda_2}, \ldots\}} \]
denote the closure of $\operatorname{span}\{1, x^{\lambda_1}, x^{\lambda_2}, \ldots\}$ in $C[a, b]$, in which $a > 0$.
(a) If $\sum_{n=1}^{\infty} \frac{1}{\lambda_n} = \infty$, then $S = C[a, b]$.
(b) If $\sum_{n=1}^{\infty} \frac{1}{\lambda_n} < \infty$ and if $\lambda \notin \{\lambda_n\}_{n=0}^{\infty}$, then $x^{\lambda} \notin S$ (so $S$ is not dense in $C[a, b]$).
The proof of the Müntz–Szász theorem is beyond the scope of this course. Its key
ingredients are the Hahn–Banach theorem and Riesz representation theorem from
functional analysis and the Blaschke characterization of the zero sets of bounded
analytic functions on the unit disk. A proof can be found in [8].
Here are two curious corollaries of the Müntz–Szász theorem. Let a > 0.
(a) If $C[a, b] = \overline{\operatorname{span}\{1, x^{\lambda_1}, x^{\lambda_2}, \ldots\}}$, then there is an infinite subsequence of the
$\lambda_n$ that can be removed from the collection $\{1, x^{\lambda_1}, x^{\lambda_2}, \ldots\}$ so that the span
of the new collection is also dense in $C[a, b]$.
(b) $\operatorname{span}\{1, x^2, x^3, x^5, x^7, x^{11}, x^{13}, x^{17}, x^{19}, \ldots\}$ is dense in $C[a, b]$. This follows
from Euler’s proof that the sum of the reciprocals of the primes diverges;
see the notes for the 1913 entry.
Solution to the problem. Now for the answer to our question, which (in the
$L^2[0, 1]$ Hilbert space setting) was first asked by Israel Gelfand [7]. The invariant
subspaces of the Volterra integration operator are all of the form $\{f \in C[0, 1] :
f(x) = 0 \text{ for } x \in [0, a)\}$ for some $a \in [0, 1]$. Thus, the invariant subspaces form
an uncountable, linearly ordered chain of subspaces of $C[0, 1]$. This result was
established in 1949 by Shmuel Agmon (1922– ) [1], with later proofs given by
several other authors (the most influential proof being that of Donald Sarason
(1933–2017) [10]).
Bibliography
[1] S. Agmon, Sur un problème de translations (French), C. R. Acad. Sci. Paris 229 (1949),
540–542. MR0031110
[2] S. N. Bernstein, Démonstration du théorème de Weierstrass, fondée sur le calcul des
probabilités, Commun. Soc. Math. Kharkow (2) 13, 1–2.
[3] A. Beurling, On two problems concerning linear transformations in Hilbert space, Acta Math.
81 (1948), 17, DOI 10.1007/BF02395019. MR0027954
[4] S. R. Garcia and R. A. Horn, A Second Course in Linear Algebra, Cambridge Mathematical
Textbooks, Cambridge University Press, 2017.
[5] S. R. Garcia, J. Mashreghi, and W. T. Ross, Introduction to model spaces and their oper-
ators, Cambridge Studies in Advanced Mathematics, vol. 148, Cambridge University Press,
Cambridge, 2016. MR3526203
[6] J. B. Garnett, Bounded analytic functions, 1st ed., Graduate Texts in Mathematics, vol. 236,
Springer, New York, 2007. MR2261424
[7] I. M. Gelfand, A problem (Russian), Uspehi Matem. Nauk, 5 (1938), 233.
[8] P. D. Lax, Functional analysis, Pure and Applied Mathematics (New York), Wiley-
Interscience [John Wiley & Sons], New York, 2002. MR1892228
[9] C. H. Müntz, Über den Approximationssatz von Weierstrass, H. A. Schwarz’s Festschrift
(1914), 303–31.
[10] D. Sarason, A remark on the Volterra operator, J. Math. Anal. Appl. 12 (1965), 244–246,
DOI 10.1016/0022-247X(65)90035-1. MR0192355
[11] O. Szász, Über die Approximation stetiger Funktionen durch lineare Aggregate von Potenzen
(German), Math. Ann. 77 (1916), no. 4, 482–496, DOI 10.1007/BF01456964. MR1511875
1950
Arrow’s Impossibility Theorem
Introduction
Kenneth J. Arrow (1921–2017) was awarded the Nobel Prize in Economics
in 1972. Among the contributions cited in the prize committee’s statement was
the “possibility theorem” from his doctoral dissertation on voting theory that was
published as the book Social Choice and Individual Values [1–3]. Arrow set out
to determine the best election procedure and narrowed the set of all procedures
by requiring them to satisfy a number of desirable properties. These properties
were called axioms because they represented what Arrow believed were, in some
sense, the most natural properties that an election procedure should satisfy. Arrow
showed that no election procedure satisfies the axioms, which we describe below,
when two or more voters decide among three or more candidates. That is, the
axioms are inconsistent (see the notes for the 1924 entry for another example of
inconsistent axioms). His result is now referred to as Arrow’s impossibility theorem.
Assume that each of m ≥ 2 voters can rank order n ≥ 3 candidates, listing
them from most preferred to least preferred. An election procedure aggregates the
voters’ rankings and produces a societal ranking of the candidates. A version of
Arrow’s theorem from 1963 (the second edition of [1]) says that there is no election
procedure that satisfies the following three axioms.
• Pareto condition: If every voter prefers A over B, then the group ranks A above
B.
• Nondictatorship: There is not a single voter who is able to determine the group’s
rankings (that is, there is no dictator).
• Independence of Irrelevant Alternatives (or IIA): The societal ranking between
candidates A and B should only depend on the voters’ preferences for A and B.
The third axiom perhaps requires a bit more explanation. It asserts that for
a society to rank A and B, it is irrelevant to factor in how the voters rank other
candidates. For example, suppose that several voters change the relative rankings
of B and C. This should not affect how the society ranks A and B. It may, of
course, affect how the society ranks B and C.
To appreciate Arrow’s theorem, we need to go back to the beginnings of voting
theory. Nicolas de Condorcet (1743–1794), whose full name was Marie Jean Antoine
Nicolas de Caritat, Marquis de Condorcet, was a French mathematician, political
scientist, and philosopher. Although he published several papers on differential
and integral calculus, in mathematics he is most famous for observing a fundamen-
tal paradox in voting theory, which we describe below. He died in prison under
mysterious circumstances (some suggest poison) during the French revolution.
A candidate is the Condorcet winner if the candidate defeats every other can-
didate in a pairwise election (by being preferred by more than half of the voters
to every other candidate in a head-to-head competition). However, not every col-
lection of voters’ preferences has a Condorcet winner. In his 1785 paper Essai sur
l’application de l’analyse à la probabilité des décisions rendues à la pluralité des
voix, Condorcet made a remarkable observation. Suppose that three voters have
the following preferences for candidates A, B, and C:
                 voter 1   voter 2   voter 3
First choice        A         B         C
Second choice       B         C         A
Third choice        C         A         B
Suppose that C is removed from consideration. Then we have the voter preferences
                 voter 1   voter 2   voter 3
First choice        A         B         A
Second choice       B         A         B
Consequently, A would defeat B in a pairwise election (denoted $A \succ B$) because A
would receive two first-choice votes (from voters 1 and 3) and B would receive only
one such vote (from voter 2). Similarly, B would defeat C in a pairwise election
and C would defeat A. Notationally, this is represented by
\[ A \succ B \succ C \succ A \]
and is referred to as a Condorcet cycle. A Condorcet cycle can involve more candidates.
For example, we might have
\[ A \succ B \succ D \succ E \succ C \succ A. \]
If a Condorcet cycle contains all of the candidates in an election, then that election
does not have a Condorcet winner.
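The pairwise comparisons above are easy to automate. Here is a minimal Python sketch, assuming that each ballot is a tuple listing every candidate from most preferred to least preferred; the function names and the second sample profile are ours and are meant only as an illustration.

```python
def pairwise_winner(ballots, x, y):
    """Return the candidate preferred by more ballots in a head-to-head
    contest between x and y, or None if the contest is tied."""
    x_votes = sum(1 for ballot in ballots if ballot.index(x) < ballot.index(y))
    y_votes = len(ballots) - x_votes
    if x_votes > y_votes:
        return x
    if y_votes > x_votes:
        return y
    return None

def condorcet_winner(ballots):
    """Return the candidate who wins every pairwise contest, or None."""
    candidates = ballots[0]
    for c in candidates:
        if all(pairwise_winner(ballots, c, d) == c for d in candidates if d != c):
            return c
    return None

# Condorcet's three-voter profile from the table above.
cycle_ballots = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]
print(pairwise_winner(cycle_ballots, "A", "B"))  # A defeats B
print(condorcet_winner(cycle_ballots))           # no Condorcet winner

# A hypothetical profile in which A defeats both B and C.
clear_ballots = [("A", "B", "C"), ("A", "C", "B"), ("B", "A", "C")]
print(condorcet_winner(clear_ballots))
```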
An election procedure satisfies the Condorcet winner criterion (CWC) if the
following holds:
If a Condorcet winner exists, then the election procedure always has
the Condorcet winner ranked first.
A weaker and easily accessible version of Arrow’s impossibility theorem requires just
two axioms, IIA and the Condorcet winner criterion (CWC), but also supposes that
the election procedure returns a top-ranked candidate; see [5, p. 343] for details.

Proposed by Michael Jones, Mathematical Reviews.
It is possible to show that an election procedure that satisfies IIA and CWC
cannot return a single, top-ranked candidate for the three-voter Condorcet cycle
above. This idea can be extended. In an election between $n$ candidates, a set
$\mathcal{C}$ of candidates is a top cycle if every candidate in $\mathcal{C}$ defeats every candidate not in
$\mathcal{C}$ in pairwise contests and if there is a Condorcet cycle among all candidates in $\mathcal{C}$.
For example, the Condorcet cycle for the three-voter, three-candidate case above is
a top cycle.
For three candidates, there are two possible top cycles that involve all three
candidates: $A \succ B \succ C \succ A$ and $A \succ C \succ B \succ A$. For $n$-candidate elections, how
many top cycles are possible?
1950: Comments
Penrose–Banzhaf power index. In the discussion above we gave each voter
the same weight. However, this is frequently not the case in practice. Consider the
following two situations for a private firm. In the first, Adams owns 90% of the
stock, Buchanan owns 8%, and Cleveland has the remaining 2%. In the second,
Adams has 45%, Buchanan 35%, and Cleveland 20%. How much are their shares
worth in each case, assuming that a plan is adopted whenever more than 50% of
the stock supports it?
Adams effectively controls the company in the first setting, since she can do
whatever she wants and the other two cannot outvote her. The second case is more
interesting, since any two of the three suffice to control the company. Thus, in this
setting, it is reasonable to say that each shareholder’s stake is worth a third of the company.
More generally, we define the Penrose–Banzhaf power index (named after Lionel
Penrose (1898–1972) [7] and John F. Banzhaf III (1940– ) [4]) as follows. A winning
coalition is a set of voters that is sufficient to pass a measure. That is, if every
member of a winning coalition votes “yes,” then the measure is guaranteed to pass.
A swing vote is a vote that a particular winning coalition needs; without that
“yes” vote, the coalition is no longer sufficient to pass the measure. The power index of
an individual is the fraction of all possible swing votes that they cast, that is, their
share of all swing votes in all winning coalitions.
In our first example, there are four winning coalitions (any coalition with
Adams), and Adams casts every swing vote, so Adams’s power index is 1 while
Buchanan and Cleveland both get 0. In our second example, there are four winning
coalitions (any coalition with at least two voters), and each voter has the same
number of swing votes, so they all have power index 1/3. This index is useful in
evaluating many real-world voting schemes, such as the United States Electoral
College or the European Union.
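For small examples the power index can be computed by listing every coalition. The following Python sketch does this for the two stock-ownership examples. It assumes that a coalition wins when its combined weight strictly exceeds the quota (here 50 percent); the function name and the data layout are our own choices.

```python
from fractions import Fraction
from itertools import combinations

def banzhaf(weights, quota):
    """Penrose-Banzhaf power index for a weighted voting game.

    weights maps each voter to a weight; a coalition is winning when its
    total weight is strictly greater than quota (an assumption that matches
    the "over 50%" rule in the text). Assumes at least one swing vote exists.
    """
    voters = list(weights)
    swings = {v: 0 for v in voters}
    for size in range(1, len(voters) + 1):
        for coalition in combinations(voters, size):
            total = sum(weights[v] for v in coalition)
            if total <= quota:
                continue  # not a winning coalition
            for v in coalition:
                # v casts a swing vote if the coalition loses without v
                if total - weights[v] <= quota:
                    swings[v] += 1
    all_swings = sum(swings.values())
    return {v: Fraction(swings[v], all_swings) for v in voters}

first = banzhaf({"Adams": 90, "Buchanan": 8, "Cleveland": 2}, 50)
second = banzhaf({"Adams": 45, "Buchanan": 35, "Cleveland": 20}, 50)
print(first)
print(second)
```

The enumeration visits all $2^m - 1$ nonempty coalitions of $m$ voters, so this direct approach is practical only for small electorates.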
Voting in Venice. The most convoluted voting system that the authors are
aware of was the method used by the Republic of Venice to elect its Doge; see
Figure 1. It was established in 1268 and remained in use until the ignominious
fall of the Most Serene Republic in 1797. The details and particulars are so mind-
bogglingly complicated that we have no choice but to quote John Julius Norwich
(1929–2018), an authority on the subject [6, p. 166]. Surely we would get the finer
points incorrect otherwise!
Figure 1. The 17th-century Basilica di Santa Maria della Salute in Venice.
On the day appointed for the election, the youngest member of the
Signoria was to pray in St Mark’s; then on leaving the Basilica, he was
to stop the first boy he met and take him to the Doges’ Palace, where
the Great Council, minus those of its members who were under thirty,
was to be in full session. This boy, known as the ballotino, would
have the duty of picking the slips of paper from the urn during the
drawing of lots. By the first of such lots, the Council chose thirty of
their own number. The second was used to reduce the thirty to nine,
and the nine would then vote for forty, each of whom was to receive at
least seven nominations. The forty would then be reduced, again by
lot, to twelve, whose task was to vote for twenty-five, of whom each
this time required nine votes. The twenty-five were in turn reduced to
another nine; the nine voted for forty-five, with a minimum of seven
votes each, and from these the ballotino picked out the names of eleven.
The eleven now voted for forty-one—nine or more votes each—and it
was these forty-one who were to elect the Doge. They first attended
Mass, and individually swore an oath that they would act honestly
and uprightly, for the good of the Republic. They were then locked in
secret conclave in the Palace, cut off from all contact or communication
with the outside world and guarded by a special force of sailors, day
and night, until their work was done.
So much for the preliminaries; now the election itself could begin.
Each elector wrote the name of his candidate on a paper and dropped
it in the urn; the slips were then removed and read, and a list drawn
up of all the names proposed, regardless of the number of nominations
for each. A single slip for each name was now placed in another urn,
and one drawn. If the candidate concerned was present, he retired
together with any other elector who bore the same surname, and the
remainder proceeded to discuss his suitability. He was then called back
to answer questions or to defend himself against any accusations. A
ballot followed. If he obtained the required twenty-five votes, he was
declared Doge; otherwise a second name was drawn, and so on.
With a system so tortuously involved as this, it may seem remark-
able that anyone was ever elected at all.
Solution to the Centennial Problem. A top cycle requires three or more
candidates. For a set of $k$ candidates, there are $(k - 1)!$ circular permutations of
the $k$ candidates. Hence, there are $(k - 1)!$ top cycles for a set of $k$ candidates. For
$n$-candidate elections, the number of possible top cycles is
\[ \sum_{k=3}^{n} \binom{n}{k} (k - 1)! = \sum_{k=3}^{n} \frac{n!\,(k - 1)!}{k!\,(n - k)!} = \sum_{k=3}^{n} \frac{n!}{k\,(n - k)!}. \]
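As a sanity check, the count can be compared with a brute-force enumeration for small $n$. The Python sketch below treats a top cycle as a cyclic ordering of a set of at least three candidates and identifies orderings that differ only by rotation; the function names are ours.

```python
from itertools import combinations, permutations
from math import comb, factorial

def top_cycles_formula(n):
    """Number of top cycles from the sum over k = 3, ..., n."""
    return sum(comb(n, k) * factorial(k - 1) for k in range(3, n + 1))

def top_cycles_brute_force(n):
    """Count cyclic orderings of all candidate subsets of size at least 3."""
    seen = set()
    for k in range(3, n + 1):
        for subset in combinations(range(n), k):
            for order in permutations(subset):
                # Rotate so that the smallest candidate comes first; two
                # orderings describe the same cycle exactly when they agree
                # after this normalization.
                i = order.index(min(order))
                seen.add(order[i:] + order[:i])
    return len(seen)

for n in range(3, 7):
    print(n, top_cycles_formula(n), top_cycles_brute_force(n))
```

For $n = 3$ both counts equal 2, in agreement with the two top cycles on three candidates listed in the problem statement.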
Bibliography
[1] K. J. Arrow, Social Choice and Individual Values, Cowles Commission Monograph No. 12,
John Wiley & Sons, Inc., New York, N. Y.; Chapman & Hall, Ltd., London, 1951. MR0039976
[2] K. J. Arrow, Social Choice and Individual Values (second edition), Cowles Foundation Mono-
graphs Series 12, Yale University Press, 1963.
[3] K. J. Arrow, Social Choice and Individual Values (third edition), Cowles Foundation Mono-
graphs Series 12, Yale University Press, 2012. http://www.jstor.org/stable/j.ctt1nqb90.
[4] J. F. Banzhaf, Weighted voting doesn’t work: a mathematical analysis, Rutgers Law Review
19 (1965), no. 2, 317–343. http://heinonline.org/HOL/Page?handle=hein.journals/rutlr19&div=19&g_sent=1&collection=journals.
[5] COMAP, For All Practical Purposes (ninth edition), W. H. Freeman and Company, 2013.
[6] J. J. Norwich, A History of Venice, Vintage Books, 1989.
[7] L. Penrose, The elementary statistics of majority voting, Journal of the Royal Statistical
Society 109 (1946), no. 1, 53–57. http://www.jstor.org/stable/2981392?seq=1#page_scan_tab_contents.
1951
Tennenbaum’s Proof of the Irrationality of $\sqrt{2}$
Introduction
There are now hundreds of proofs of the irrationality of $\sqrt{2}$. Perhaps the most
familiar is the following:
Suppose toward a contradiction that $\sqrt{2} = a/b$, in which $a, b$ are relatively
prime integers and $b \neq 0$. Squaring the preceding equation, we
obtain $2b^2 = a^2$. This shows that $a^2$ is even, so $a$ is even too. Write
$a = 2c$ with $c$ an integer, so that $2b^2 = (2c)^2 = 4c^2$. Thus,
$b^2 = 2c^2$. This shows that $b^2$, and hence $b$ itself, is even. Thus, 2
divides both $a$ and $b$, which is a contradiction. We conclude that $\sqrt{2}$
is irrational.
It is worth noting that we used the fact that 2 is a prime number. Indeed, if $p$ is
a prime number that divides a perfect square $a^2$, then $p$ divides $a$ itself. This does
not hold in general since, for example, 4 divides $36 = 6^2$, but 4 does not divide 6.
Sometime in the 1950s Stanley Tennenbaum came up with the following geometric
gem (see Figure 1). Suppose toward a contradiction that $\sqrt{2} = a/b$, in
which the positive integer $b$ is as small as possible. Consider a square with sides
of length $a$, and draw squares of side length $b$ in the upper-left and lower-right
corners. Since $a^2 = 2b^2$, the area of the two squares of side length $b$ equals that of
the large square of side length $a$. The figure suggests that these two squares miss
two small squares with side length $a - b$ and double count a square with side length
$2b - a$. Consequently, the double counted region must have the same area as the
two missing squares; that is, $(2b - a)^2 = 2(a - b)^2$. Thus, $\sqrt{2} = (2b - a)/(a - b)$
and a little more work shows $0 < a - b < b$. This contradicts the minimality of $b$,
so $\sqrt{2}$ is irrational. For a discussion of the history of Tennenbaum’s proof, see [4].
Figure 1. Illustration of Tennenbaum’s proof of the irrationality of $\sqrt{2}$.
Figure 2. A misleading “proof by picture.” The large triangles
have the same area, so subtracting the areas of the four congruent
pieces from each large triangle yields the same area. Thus, 0 = 1.
Is Tennenbaum’s proof valid? One must always be wary about “proofs by
picture.” There are many appealing visual “proofs” that are wrong; see Figure 2.
Fortunately, the geometric intuition used in Tennenbaum’s proof can be formalized.
It is instructive to see what the fundamental ingredients are and how the proof can
proceed without the use of diagrams. Suppose toward a contradiction that $\sqrt{2}$ is
rational and let $b$ be the smallest natural number so that $\sqrt{2} = a/b$ for some integer
$a$. More explicitly, the well-ordering principle ensures that
\[ b = \min\{n \in \mathbb{N} : \sqrt{2}\,n \in \mathbb{Z}\} \]
exists. We claim that $a > b$; if not, then we obtain a contradiction:
\[ a \leq b \implies \underbrace{2b^2 = a^2}_{\text{since } \sqrt{2} = a/b} \leq b^2 \implies 2 \leq 1. \]
Since $2 > 1$, we must have $a > b$. A few algebraic manipulations lead us to another
representation of $\sqrt{2}$ as a quotient of integers:
\[ \sqrt{2} = \sqrt{2} \cdot \frac{\sqrt{2} - 1}{\sqrt{2} - 1} = \frac{2 - \sqrt{2}}{\sqrt{2} - 1} = \frac{2 - \frac{a}{b}}{\frac{a}{b} - 1} = \frac{2b - a}{a - b}. \tag{1951.1} \]
Since $a - b > 0$ and $\sqrt{2} > 0$, it follows from the preceding that $2b - a > 0$ and hence
\[ 0 < a - b < b. \]
Because (1951.1) contradicts the minimality of $b$, we conclude that $\sqrt{2}$ is irrational.
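The descent step is easy to experiment with. The Python sketch below applies the substitution $(a, b) \mapsto (2b - a, a - b)$ from (1951.1) to the good rational approximation $17/12$ and checks the identity $(2b - a)^2 - 2(a - b)^2 = -(a^2 - 2b^2)$, which shows that the step sends a solution of $a^2 = 2b^2$ to another solution. The function names and the starting pair are our own illustrative choices.

```python
def descend(a, b):
    """Tennenbaum's step: replace the fraction a/b by (2b - a)/(a - b)."""
    return 2 * b - a, a - b

def defect(a, b):
    """Measure how far a/b is from a square root of 2; zero iff a^2 = 2 b^2."""
    return a * a - 2 * b * b

# Start from 17/12, for which 17^2 - 2 * 12^2 = 1, and descend until the
# denominator can shrink no further.
chain = [(17, 12)]
while chain[-1][1] > 1:
    chain.append(descend(*chain[-1]))

for a, b in chain:
    print(a, b, defect(a, b))
```

The denominators 12, 5, 2, 1 strictly decrease while the defect alternates between 1 and −1 and never reaches 0. A genuine solution with defect 0 would produce an infinite strictly decreasing sequence of positive denominators, which is impossible.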

Tennenbaum’s construction is beautiful and gives the irrationality of $\sqrt{2}$. Steven
J. Miller and David Montague used similar geometric arguments to get the
irrationality of $\sqrt{3}$, $\sqrt{5}$, $\sqrt{6}$, and $\sqrt{10}$ [3]. Can you geometrically prove the irrationality
of $\sqrt{7}$ or $\sqrt[3]{2}$?
1951: Comments
The square root of 2 has been called the Rome of mathematics, for all roads
lead to it [2, p. 207]. Here are some lesser-known proofs for your enjoyment.
Linear representation of the gcd. We begin by exploring a consequence of
the Euclidean algorithm: if $a, b$ are two nonzero integers, then there exist integers
$x$ and $y$ such that
\[ \gcd(a, b) = ax + by; \]
see [5]. In other words, the greatest common divisor of $a$ and $b$ is an integral linear
combination of $a$ and $b$. In the language of algebra, this says that the ring $\mathbb{Z}$ is a
principal ideal domain. We can use this property of the integers to give a quick
proof that $\sqrt{n}$ is irrational if the natural number $n$ is not a perfect square. Suppose
that $\sqrt{n} = a/b$, in which $a, b$ are relatively prime. Then there exist $x, y \in \mathbb{Z}$ so that
$1 = ax + by$. Since $\sqrt{n}\,a = bn$ and $\sqrt{n}\,b = a$, it follows that
\[ \sqrt{n} = \sqrt{n}(ax + by) = (\sqrt{n}\,a)x + (\sqrt{n}\,b)y = bnx + ay \]
is an integer. This contradicts the hypothesis that $n$ is not a perfect square, so $\sqrt{n}$
is irrational.
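The integers $x$ and $y$ can be found by running the Euclidean algorithm while keeping track of how each remainder is built from $a$ and $b$. Here is a minimal Python sketch of this extended Euclidean algorithm, assuming positive inputs; the function name is our own.

```python
from math import gcd

def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) = a*x + b*y for positive a, b."""
    # Invariant: old_r = a*old_x + b*old_y and r = a*x + b*y.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

g, x, y = extended_gcd(240, 46)
print(g, x, y, 240 * x + 46 * y)
```

The pair $(x, y)$ is not unique: adding a multiple of $b/\gcd(a, b)$ to $x$ and subtracting the same multiple of $a/\gcd(a, b)$ from $y$ gives another valid representation.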
Analytic proof. Although this proof appears unnecessarily complicated, it
contains techniques that can be recycled to great effect in the theory of Diophantine
approximation. Assume toward a contradiction that $\sqrt{2} = p/q$, in which $p, q$ are
integers and $q \geq 1$. Let
\[ e_n = (\sqrt{2} - 1)^n \]
and observe that
\[ 0 < \sqrt{2} - 1 < \tfrac{1}{2}. \tag{1951.2} \]
Indeed, arithmetic confirms that (1951.2) is equivalent to $1 < 2 < 9/4$, so we can
establish (1951.2) without the use of a calculator or decimal expansions. It follows
from (1951.2) and the definition of $e_n$ that
\[ 0 < e_n < \frac{1}{2^n} \tag{1951.3} \]
for all $n \in \mathbb{N}$. Now observe that for each $n \in \mathbb{N}$ there exist integers $a_n, b_n$ such that
\[ e_n = a_n + b_n \sqrt{2}. \tag{1951.4} \]
Although this statement can be proved by induction, it also follows from the binomial
theorem:
\[ e_n = (\sqrt{2} - 1)^n = \sum_{k=0}^{n} \binom{n}{k} (\sqrt{2})^k (-1)^{n-k}. \]
Since the binomial coefficients $\binom{n}{k}$ are integers and since $(\sqrt{2})^k$ is either an integer
or an integer times $\sqrt{2}$, the desired formula (1951.4) follows. From (1951.4) and
the assumption that $\sqrt{2}$ is rational, we have
\[ e_n = a_n + b_n \sqrt{2} = a_n + b_n \frac{p}{q} = \frac{a_n q + b_n p}{q} = \frac{c_n}{q}, \]
in which $c_n$ is an integer. Since $e_n > 0$, it follows that $c_n \geq 1$ and hence $e_n \geq 1/q$.
In light of (1951.3), we find that
\[ \frac{1}{q} \leq e_n < \frac{1}{2^n} \]
for every $n \in \mathbb{N}$. However, the resulting inequality $2^n < q$ fails for sufficiently large
$n$. This contradiction shows that $\sqrt{2}$ is irrational.
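The integers $a_n$ and $b_n$ in (1951.4) can be generated without expanding the binomial sum. Multiplying $a + b\sqrt{2}$ by $\sqrt{2} - 1$ gives $(2b - a) + (a - b)\sqrt{2}$, which yields a simple recurrence. The Python sketch below uses this recurrence (the inductive route rather than the binomial theorem) and compares the result with a floating-point evaluation of $e_n$; the function name is ours.

```python
from math import sqrt

def coefficients(n):
    """Return integers (a_n, b_n) with (sqrt(2) - 1)**n = a_n + b_n * sqrt(2)."""
    a, b = 1, 0  # (sqrt(2) - 1)**0 = 1 + 0 * sqrt(2)
    for _ in range(n):
        # (a + b*sqrt(2)) * (sqrt(2) - 1) = (2b - a) + (a - b)*sqrt(2)
        a, b = 2 * b - a, a - b
    return a, b

for n in range(1, 9):
    a, b = coefficients(n)
    e_n = (sqrt(2) - 1) ** n
    # The last two columns illustrate the bound 0 < e_n < 1/2**n from (1951.3).
    print(n, a, b, e_n, 1 / 2**n)
```

The coefficients grow quickly in absolute value while $e_n$ itself shrinks, which is exactly the tension that the proof exploits.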
Proof by overkill. Our final gem is not self-contained, but it is worth the
setup (if only for its humor value) [1]. The following result was first conjectured
by Pierre de Fermat in 1637 and remained one of the most famous open problems
in mathematics until its resolution in 1994 by Andrew Wiles. Its proof requires
some of the most sophisticated and powerful tools of modern number theory; see
the 1995 entry.
Theorem (Fermat’s Last Theorem). There do not exist natural numbers $a$, $b$, $c$,
$n$ that satisfy $a^n + b^n = c^n$ if $n \geq 3$.¹
We are now in a position to prove that $\sqrt[n]{2}$ is irrational for $n = 3, 4, 5, \ldots$.
Suppose that $n \geq 3$ and $\sqrt[n]{2} = a/b$, in which $a, b \in \mathbb{N}$. Raise both sides to the $n$th
power and obtain $a^n = 2b^n = b^n + b^n$, in violation of Fermat’s last theorem.² It
has been remarked sarcastically that Fermat’s last theorem is not strong enough to
handle the case $n = 2$.
Bibliography
[1] R. Ehrenborg, An observation, Amer. Math. Monthly 110 (2003), no. 5, 423.
[2] S. R. Garcia, S. J. Miller, 100 Years of Math Milestones: The Pi Mu Epsilon Centennial
Collection, American Mathematical Society, 2019.
[3] S. J. Miller and D. Montague, Picturing irrationality, Math. Mag. 85 (2012), no. 2, 110–114,
DOI 10.4169/math.mag.85.2.110. http://arxiv.org/pdf/0909.4913.pdf. MR2910300
[4] P. Tennenbaum (personal communication), http://web.williams.edu/Mathematics/sjmiller/public_html/math/papers/Neode_Tennenbaum.PDF.
¹ If $n = 2$, then there are many solutions, such as $3^2 + 4^2 = 5^2$ and $5^2 + 12^2 = 13^2$.
² The author of [1] attributes this observation to William Henry Schultz, an undergraduate
at UNC Charlotte.
1952
NSA Founded
Introduction
The National Security Agency (NSA) is a high-technology organization on the
frontiers of communications and data processing. Created by President Harry S.
Truman (1884–1972) in 1952, the NSA coordinates, directs, and performs highly
specialized activities to protect U.S. information systems. Moreover, it also pro-
duces vital foreign intelligence information for U.S. policy makers and the U.S.
military. Virtually every mathematical discipline finds some application within the
NSA. In addition to cryptology, mathematicians at the NSA work on problems in
signal analysis, speech processing, coding theory, data compression, communication
networks, computer security, and other areas.
The NSA is the country’s largest employer of mathematicians and provides op-
portunities for both summer internships and full-time employment. The Director’s
Summer Program (DSP) and the CASASP (Cryptanalysis and Signals Analysis
Summer Program) are summer internships open to undergraduate students ma-
joring in mathematics. The Graduate Mathematics Program (GMP) is a summer
internship available to graduate students. Additionally, the NSA hires full-time
mathematicians and statisticians year round at every degree level. For more information
about these opportunities and how to apply, visit https://www.nsa.gov/careers/.
Figure 1. The NSA headquarters in Fort Meade, Maryland. Image public domain from Wikipedia.

Proposed by NSA Cryptomathematics Institute.
Pretend that you are a cryptanalyst at Bletchley Park during World War
II, assigned to work on the decryption of strategic communications between Ger-
man High Command and Army Group commanders. The underlying teleprinter
code is a 32-symbol alphabet that represents the five-bit quantities 00000 through
11111. These teleprinter communications are encrypted with a device called the
Schlüsselzusatzgeraet 1940, code named TUNNY by the British.
Little is known about the TUNNY machine. It is believed to be a key-additive
machine, which means that the ciphertext is the XOR of the plaintext and key.
TUNNY messages begin with a 12-letter message indicator which has led to spec-
ulation that the encipherment process of the TUNNY machine involves 12 wheels.
Further analysis of the message indicators collected over the last several months
shows that the first eleven indicators take on 25 distinct values (every letter except
J) while the last indicator only takes on 23 distinct values.
Recently, a significant breakthrough against TUNNY was obtained. Two mes-
sages with the same indicator were intercepted and it soon became apparent that
they were slight variations on the same message. Arduous work by a colleague
succeeded in reading this message. This confirmed that the TUNNY machine is
key-additive and produced a stretch of over 4,000 key characters. Your job is to use
these key characters to determine, as much as possible, how the TUNNY machine
operates. The key file is reproduced below with base-32 encoding 0-9,A-V.
K5FSVEJ238VE49QT2DQP4P28J8ST6PGJ69GJPAA0MK99U9DM243OH8PN514N80G3
8Q58FTFHCUH61IVPOR2TGBITUTIV56KU56TVQ0V71CV8TMFSPM2HGOP228FGTSIM
3TSP8KHA7GG34QTCN83O7OP8JOJ450VS5J60EPMDUV70QPE5SHHS88A6OPIONQ08
V1KDULE9JG9KN4RLCKUL51MJ3AMMGBQDSMDUJESP0Q36G48E7SH2LA41TT29G4C5
NOF4TSIHC3I1OJ8JJDUPNO6UF2TUDURELMBQHHKDR7Q3CMOG44JEHADQUPID567O
P2N02I92H6DIFPCJKJTR30HQ718SH2PS3OB8RM6K4BCT08VC8NUPJ4B0365MHEQ8
1GJ4FG5LQ4AE3PCF1QL030NAE9PG0D4OSSRKDAVGJS1GBMQVVK7G919NLM7ETIBC
PA7OF3D1JAQ71UME1IHOEVR9MRLJEEB4C17LIC6TK9MGQHO1MHOHGJ25MPF6OU9E
LO4RVHVGGKVS5ANK2QOAPQJAH9DU4R5Q63878JQ5DUAKV4NQJCVOVC9KJC7T8GCM
J5TM9TVG76T194MT05SLKU20HFCVBTT6LQ18N80J0BIFCA5FBUPEFGBDRRCR0P2H
ALQ1PGI4EDRHIKKCNVQDM3FFA94EECOLEM3R2N4BMDOV2TUPEFTMKH3KHGLOFSLS
N26BE6SV762N3F8RA15TN8RO9A1IJ819HU0IL8NORDPCHD25MT81QT8JA0G3HHO1
APTKS8MJDH86TFQNFJU8SNJ83MLAFKR4PCFGF2MKK5OBCO3SR0RQ9QLEPUES0F4L
1LIPD6P8SK7RC9FDR6RTC0UCNFIF4R65KF2AIFMFSGM57TSJL9DQJLLQ0BJL27MK
RS9KIAL9EVIHQTKPEDOTO4LQ4FLDIL6T25916DVRABS88T7T8USI0ULUBGQQC7MU
DPNE45JUEOL6P9ERO78RO3UJ8JSF6DJ49AJO65IL6DDEHHHSTQ25K7K2SN2I7STQ
5DC8VCBC4PDHM46TG56T5BB02VSR75I2DIG73NGT87UIAMF47VS86EL035HV52NE
HM1M72J092NG1ALMVVQRFS73K6O9ATGDAPQ1JHGQGU5J8JITG5C9JGJ49GFD580N
3121EBGRCFSJE8USS9KE7RA4O7Q6QG9UUM1Q4HK83K4G326OHSE50NOS4H6TM5I5
4LA8S74FLO4AFCDIALAF8338K5TG3KRKP6J5GJ2A491HU66JHVCAML0J4BI34761
2QE2L610RNM699MFG74CCUVCJKMB3K7Q09U9FUUSPSDKF0TK74P2R8SNDRCDNPUC
1L0K78N05SN2H5MFJ7N8BE6I6A9EKTM96LNPO7L85UNC3SO9MT6HMDKLMLFJCJ9I
70LU34O4U1UPAG79VD2MUP8DQS07874A096FCVB2QV2QOF0R2JGBM5UF6NN0FGN0
2P0R4TM93TPBIPB0VE8627ENR5NB0J23S3QK5CNO5EVC5U03L92G6SD858CPQD8H
3D1QED6VL6L4UEBMPKUUVNSGI5OVVSG2JKB0RI92LM98H9UQM338B9ADGRADLAIP
AROKCE74NL6PT9VSPABODIBVSMM4JCFSLD2OM9OLRQKQF0OROV89ILE5JJ80R8LD
6MUL69QE4B2GB2L30I29O5A9A90PUD0VM94RTNHIESP69QBED62R59G3ITNNBU1F
1I49ILEOCS7CRON2LUFKCFR969O4C8JGULTJB4RSTGRPSI48R8K42CSNRILAJLLE
740TF8GLS3MLL65SB0F8BCHMK1GKCD87I1ONETA5UF6Q9ELET75SR1H1AG3EJSL6
UU4PUD6SB1G16PALODQG2FI18NDMD0I1LS04VARF8K1FBV3CTQ1VGRNBFA9IPUB2
PI12VTQDSREHLUGMU1B2Q95QJA5Q92DKFO7DATD3DU6CI91SQ90H6DQLOHB8O3UL
07SJ4H65IDULCEU523I587U5CP25U8HJ9LQDER4BCHMHDJJA3DEE7S7C352TM9EO
5DRCFSD07GF2KNA530RI962646RHNG3ST43QSR2108CRC4VDF4B0DGBSR49GN09I
395RC1C8BCFSLPOD6P6M8JSBDPT54EU6BATF7TMTCTE5M7IV4RNO46JS7CVA5UM7
D9RMA83GH5EGK7AJ98JT5A1GA4NOK5PE8G4KF4H27AJKJSJQ5AHI1EJR8EIHOG3G
1AR4DATRR0NGBKQ056A3JOBIBNVJ4NCCUD2VCDAKAPS1QHQ58LMOFFDVO1SI03A4
38NG2Q1EPPJEQ5SN2HK3TB1MDL7ME9G5ALM4HC745A6T8ATKT0BGR8JCDOFSR4HS
E6G2TSAK83S5HQM7C7EFO54EDPF9KNAP8BO7OPALULAACDUJGOODIPCK7HGVO38P
U95HSNVK9FFDSD2C0QR49UPADO3GS25N7A125AJG6IHM7K92L21HQ9VUNP5UVSB8
HSBOHOBMU5C1BHV8FKR3L47GVCL8BODIR831TQL4DUALA5K1PFS1ESAQLVOFTLV3
GHELA5U1CLI961GPUPDFU90JC3G129A2UO2C30U45PNCTVBI3QPR3EJVIUCLOFT4
CIV2TMJKTMF8K9IN47GNCU7FSSJ1SDA5OE0O56LEFRM3T64GCQAHSFPNGU9EM1S7
8NC12KA1ATS5ATUDST6C6N6RUHVR95QHA1UPKCIKUP3O2HVC93P1H4JB3E9U5E61
1A9E7UH66TL76HI5Q0PIRCNKSTU5ALQ9H37JLH9KK5EDAA087S10NTMTDP6D9MF4
H6D6TS90IBFTOCJLFGLKA1CRQ1GJBK15R7VNHSVVFBGT68O7MI7T0BOF44RLUKML
C14DGJGU9OO4PQ3OF8Q3DFS51FMK83JCH4RP1L65OLH2KHFCFOJ8A06T6TK9QTGB
FAOPIRNM5MPE48M21ENR0M0KHNM2HEUHPRE16VAVA3GV47J6HHNKDIRKRGN2DICH
JLSG6G3H132PUDBP92HA5D9DA9PCJD0QHGBNE9OJ69VSFCNAPU1R4AMB808I4H69
GAFSDA59HO4N3MIHGUF3HKPCTALMILDBMKJ755JB0N63S70NACFTKB229RDFIR07
G3GJ1FGRNEJP6JR64OU5AMHDM9A9698PQ9ACDGUD9V2TSKOGNDC3RT85I70R02J2
JM6120KJOJCHCLVG59L0IP7PKCU543IL0BSJCPQDIPNAPCULC5RE1QJ792HLVRER
4TOJIDC0JUVVMB44U98J9NKJOK31NG78PRU2LHNMFSK6V492GJ9G5TN83A1SJC6L
7TU9GQCTJ9LTADM56PGQVEHUPEFQT9QR9EVCC6TCGTDR0BUVB2V0L6NI7VETN102
THBOPUDOKD6P34G8KRUPGFAPQJH0MKBOL2FN4MSLSP942QHNQ9H9BO4JMJPL08V1
MFP3D2RK9IFU9MD0N5E4IL4NV0KBDSFCLEBN2N2TA96LB9NQ8DL65D8PU4T70CP7
CQSCCL4PI5GPIPUP09UJM2K2IO8RCDIJJ7O7AUQ9RS5IRU671CGO0PIJC03KJ85E
8C7O4TN5GF6T8B0V9KJCU5G44HI4RI5TIP9E2DORVILGGF7QMVUSC6JAVQD2DAV4
DM96TCB0GAEH15PH9QVUT2V8DHUNSFNL8V7P1J98JG3KS9QTORMHH9LBU830NO3O
5CN0BLPHTU37IL52KADKBC31A50N37MB6R77URIUHE1I5GNBGNO32JKB0J6FLA6S
5Q165EPBGV8FOR0QGDDODT75N2KR29A36E2TB03CH46SQOH0FGH27SLNR2D7IKMK
N5SG8VGBER340K197B6C9BHI83N3C2LM9RGCRK9QB5G7CFGOBDRGSLIRE548143C
1QV81QOR6MLAAB9PN6STR8L062FGLGTDJ03Q5T7165SP6DUTODJM2T54P6PQD9BU
NUQBVP5M00J4VDDCS4LQLIAKV07KV4PCFK0JJ7GJQ90RUO55DG7A5U1E77A2SN6N
VE6URPHND4VD67EHNGLKB0RG5G2P47M947474LQ87P7SI4M52RIBGFP8PM08IN3Q
1952: Comments
TUNNY business. Believe it or not, this is essentially how the solution of
the TUNNY problem progressed in 1941–1942. On August 30, 1941, two TUNNY
messages with the indicator HQIBPEXEZMUG were intercepted. Colonel John Tiltman
recognized that these messages were isologous, meaning that they were repeats of
the same underlying plaintexts. Since the two messages had numerous typographi-
cal errors and extraneous spaces, the messages soon got out of sync with each other
which allowed Tiltman to read both messages and recover a stretch of about 4,000
key characters. In January 1942, he made a cryptanalytic breakthrough (you will
need to find it for yourself if you wish to decrypt the message) that allowed him to
ascertain the entire inner workings of the TUNNY machine.
Determining how TUNNY encipherment worked was just a single step towards
producing an ability to read TUNNY messages from an intercepted cipher. An
excellent account of Bletchley Park’s success against the TUNNY machine can be
found in [1]. A technical report on TUNNY, written by the codebreakers themselves
and containing a wealth of details about the machine and its exploitation, was
declassified in 2000 and is available at [3].
The Vigenère cipher. Let us look back at a more elementary cryptographic
method. We discussed the Caesar cipher in the 1936 entry. The Vigenère cipher
is a modification of the Caesar cipher in which the shift amount changes based
upon a pattern dictated by a keyword [2]. It is named after Blaise de Vigenère
(1523–1596), who wrote a book on cryptography entitled Traicté des Chiffres in
1586. However, the method appeared earlier in a 1553 book by Giovan Battista
Bellaso (1505–?). Here is how it works.
Alice and Bob agree that their keyword is PIMUEPSILON. This yields the shift
pattern 15 8 12 20 4 15 18 8 11 14 13 since P is fifteen letters after A, I is
eight letters after A, and so forth; see below.
A B C D E F G H I J K L M
0 1 2 3 4 5 6 7 8 9 10 11 12
N O P Q R S T U V W X Y Z
13 14 15 16 17 18 19 20 21 22 23 24 25
Alice encrypts the plaintext (omitting spaces)

HERE IS A MESSAGE ENCRYPTED WITH THE VIGENERE CIPHER USING
THE KEY PIMUEPSILON LET US HOPE THAT EVE IS NOT ABLE TO
FIGURE THIS ONE OUT
with the agreed-upon shift pattern and sends Bob the ciphertext
WMDY MHSU PGFP OQYR RJGA HRSE UNLI ZMGW TTVQ LIRA XSSE
JAUH KIZM VSLE QYOI EKQW CAAM FOWW GXPH UPBQ PIXK VZHN
QTQN SUAO FFRI PUMS CWWF HADE
Eve’s eavesdropping task is now much more formidable.
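The encryption step can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `vigenere` is ours, and we assume the convention A = 0, B = 1, ..., Z = 25 for the shifts, which is the convention that reproduces the ciphertext printed above.

```python
from itertools import cycle

A = ord("A")

def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Shift each letter of text by the matching key letter (A=0, ..., Z=25)."""
    sign = -1 if decrypt else 1
    # Spaces and punctuation are dropped, as in the example in the text.
    letters = [c for c in text.upper() if c.isalpha()]
    out = []
    for c, k in zip(letters, cycle(key.upper())):
        shift = sign * (ord(k) - A)
        out.append(chr((ord(c) - A + shift) % 26 + A))
    return "".join(out)

cipher = vigenere("HERE IS A MESSAGE", "PIMUEPSILON")
print(cipher)                                        # WMDYMHSUPGFPOQ
print(vigenere(cipher, "PIMUEPSILON", decrypt=True))  # HEREISAMESSAGE
```

The first fourteen letters of the output agree with the start of the ciphertext Alice sends to Bob; decryption simply reverses the sign of each shift.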
In the 1863 book Die Geheimschriften und die Dechiffrir-kunst (Cryptography
and the Art of Decryption), Friedrich Kasiski (1805–1881) outlined what is now
known as the Kasiski method for breaking the Vigenère cipher. However, Charles
Babbage (1791–1871) had privately developed the method earlier and used it to
decipher several messages encrypted with the Vigenère cipher. Although for many
years the Vigenère cipher was considered secure, it is now easily broken. The
following ciphertext is encrypted with a Vigenère cipher. Try to decode it.
JSUIK GDPQB JTKSF IQTWI ZHWLC HZIJO YIGBF VADWI FIVMP UWLGT
DQQIV ZKUHR GFHGK DKITV RTGTX PNMLX YWOPP CWTTC YIVCR HKTIM
EIOXV CGIEW SLGFL JVVLW VSAXK GOSPU HRQIJ LVVHV GGKSW ZGFGS
QSWUK MPTRV OOWMQ BISTM RYVCE ICPOI VCIMP RWLGZ HRIHK SHHKI
MSBAK HKXJO WCQIF EPRRJ TSTYG BFCCB DPAGL WCBGL QDHJW ZOCHW
QHVVH OGMZS TRDXV VDXRC LRVHK IFSFV ADWMQ BLWUW PTNSD RFGWV
CWJLV TRVYO UHICR HNIFO
For the solution, see the comments in the 1954 entry (we have other things to talk
about in the 1953 entry).
Bibliography
[1] J. Copeland (editor), Colossus: The Secrets of Bletchley Park’s Codebreaking Computers,
Oxford University Press, 2006.
MR3098499
[3] I. J. Good, D. Michie, and G. Timms, General report on TUNNY with emphasis on statistical
methods, www.alanturing.net.
[4] Wikipedia, Lorenz cipher, https://en.wikipedia.org/wiki/Lorenz_cipher.
1953
The Metropolis Algorithm
Introduction
Suppose, for the sake of simplicity, that Middle Earth is divided into three
demographic regions: Gondor, Rohan, and Mordor (sorry to disappoint fans of
Eriador, Forodwaith, Rhovanion, Rhûn, Harad, and so forth). Each year, 5% of
the residents of Gondor move to Rohan and 5% move to Mordor (property values
in Mordor are low and the climate is warm and dry). Of the residents of Rohan,
15% of them move to Gondor and 10% move to Mordor each year. Finally, 10% of
the residents of Mordor move to Gondor and 5% move to Rohan each year. What
percentage of the population will reside in each of the three regions after a long
period of time? What role does the original population distribution play?
Because of the complicated interactions between the three regions, it is difficult
to track the flow of residents throughout the system for more than one or two years
without some sort of visual or algebraic aid. We can make a diagram of our system
to present the data in an intuitive format; see Figure 1. Each node represents one
of the three regions of Middle Earth, and the arrows indicate the flow of residents
between regions. The sum of the outgoing arrows from each node in the diagram
must be 1 since the entire population of a given region needs to be accounted for.
For example, we know that 15% and 10% of Rohan residents move to Gondor and
Mordor, respectively, during a given year. What happens to the remaining 75% of
them? They stay put and remain in Rohan.
If the initial population is split in a 40 : 60 : 0 ratio among Gondor,
Rohan, and Mordor, we define the initial probability vector $p_0 = [0.4\ \ 0.6\ \ 0]^T$.
More generally, we call any vector whose entries are nonnegative and sum to 1 a
probability vector. We wish to find the probability vectors $p_1, p_2, \ldots$ that describe
the population distributions in years 1, 2, and so on. Moreover, we would like
to evaluate $\lim_{n\to\infty} p_n$, if it exists, since this probability vector will reveal the
eventual state of our system (can you explain why the limit of probability vectors
is a probability vector?).
We compile our data in the transition matrix
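\[
A = \begin{bmatrix} 0.90 & 0.15 & 0.10 \\ 0.05 & 0.75 & 0.05 \\ 0.05 & 0.10 & 0.85 \end{bmatrix}
  = \begin{bmatrix} \frac{9}{10} & \frac{3}{20} & \frac{1}{10} \\[2pt] \frac{1}{20} & \frac{3}{4} & \frac{1}{20} \\[2pt] \frac{1}{20} & \frac{1}{10} & \frac{17}{20} \end{bmatrix},
\]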
each column of which is a probability vector that describes the outgoing arrows from
the corresponding node in Figure 1. This matrix governs the population flow in
Middle Earth from one year to the next.

Figure 1. Diagram for the Middle Earth problem. The sum of the outgoing arrows from each node in the diagram is 1.

The population after one year is described by the probability vector
\[
Ap_0 = \begin{bmatrix} 0.90 & 0.15 & 0.10 \\ 0.05 & 0.75 & 0.05 \\ 0.05 & 0.10 & 0.85 \end{bmatrix}
\begin{bmatrix} 0.4 \\ 0.6 \\ 0 \end{bmatrix}
= \begin{bmatrix} 0.45 \\ 0.47 \\ 0.08 \end{bmatrix}.
\]
Thus, after one year 45%, 47%, and 8% of the population live in Gondor, Rohan,
and Mordor, respectively. What happens after two years? This is where matrix
arithmetic comes in: it keeps track of the complicated interactions between the three
regions over multiple years. The state of Middle Earth in year two is described by
\[
p_2 = Ap_1 = A^2 p_0 = \begin{bmatrix} 0.4835 \\ 0.379 \\ 0.1375 \end{bmatrix}.
\]
That is, we can apply $A$ to the year-one result, $p_1$, to obtain the answer, or we
can use $A^2$ to jump straight from $p_0$ to the year-two result $A^2 p_0$. In general,
$p_n = A^n p_0$. Observe that the populations of the three regions appear to stabilize
rapidly; see Figure 2(a).
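The year-by-year computation is easy to reproduce. The following sketch uses exact rational arithmetic so that no rounding enters; the helper names `step` and `evolve` are our own, and the matrix and starting vector are the ones given in the text.

```python
from fractions import Fraction as F

# Transition matrix from the text; column j lists where residents of region j go.
A = [
    [F(9, 10), F(3, 20), F(1, 10)],
    [F(1, 20), F(3, 4), F(1, 20)],
    [F(1, 20), F(1, 10), F(17, 20)],
]

def step(A, p):
    """Return the matrix-vector product A p."""
    return [sum(a * x for a, x in zip(row, p)) for row in A]

def evolve(A, p0, n):
    """Return p_n = A^n p_0 by applying A to the vector n times."""
    p = list(p0)
    for _ in range(n):
        p = step(A, p)
    return p

p0 = [F(2, 5), F(3, 5), F(0)]   # Gondor, Rohan, Mordor
p1 = evolve(A, p0, 1)
p2 = evolve(A, p0, 2)
print([float(x) for x in p1])   # year one
print([float(x) for x in p2])   # year two
print([round(float(x), 4) for x in evolve(A, p0, 50)])  # long-run behavior
```

Changing `p0` to any other probability vector and rerunning the last line is a quick way to explore how little the starting distribution matters after many years.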
What happens if we change the initial settings? Suppose that $p_0 = [0.3\ \ 0\ \ 0.7]^T$;
that is, 30% of the initial population resides in Gondor and 70% in Mordor. Figure
2(b) suggests that the populations stabilize at the same levels as before. Moreover,
the convergence again appears to be very rapid. Can we explain this behavior?
Since $p_n = A^n p_0$ describes the relative population levels of the three regions
after $n$ years have elapsed, we wish to evaluate $\lim_{n\to\infty} A^n p_0$, assuming this limit
exists. How can we compute it? This is where linear algebra comes in: we attempt
to diagonalize $A$. If we can write $A = SDS^{-1}$, in which $D$ is diagonal, then
\[
A^n = \underbrace{(SDS^{-1})(SDS^{-1}) \cdots (SDS^{-1})}_{n \text{ times}} = SD^n S^{-1}.
\]

Figure 2. The relative populations of Gondor, Rohan, and Mordor stabilize rapidly regardless of the initial probability vector $p_0$. Panels: (a) $p_0 = [0.4\ \ 0.6\ \ 0]^T$; (b) $p_0 = [0.3\ \ 0\ \ 0.7]^T$.

Since $D$ is diagonal, $\lim_{n\to\infty} D^n$ should be easy to compute and we obtain the limit
\[
L = \lim_{n\to\infty} A^n = S \Bigl( \lim_{n\to\infty} D^n \Bigr) S^{-1}.
\]
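A little linear algebra of the eigenvalue-eigenvector sort tells us that
\[
\underbrace{\begin{bmatrix} \frac{9}{10} & \frac{3}{20} & \frac{1}{10} \\[2pt] \frac{1}{20} & \frac{3}{4} & \frac{1}{20} \\[2pt] \frac{1}{20} & \frac{1}{10} & \frac{17}{20} \end{bmatrix}}_{A}
=
\underbrace{\begin{bmatrix} \frac{13}{7} & -1 & 1 \\[2pt] \frac{4}{7} & 0 & -2 \\[2pt] 1 & 1 & 1 \end{bmatrix}}_{S}
\underbrace{\begin{bmatrix} 1 & 0 & 0 \\[2pt] 0 & \frac{4}{5} & 0 \\[2pt] 0 & 0 & \frac{7}{10} \end{bmatrix}}_{D}
\underbrace{\begin{bmatrix} \frac{7}{24} & \frac{7}{24} & \frac{7}{24} \\[2pt] -\frac{3}{8} & \frac{1}{8} & \frac{5}{8} \\[2pt] \frac{1}{12} & -\frac{5}{12} & \frac{1}{12} \end{bmatrix}}_{S^{-1}}.
\]
In particular, the eigenvalues of $A$ are $1$, $\frac{4}{5}$, and $\frac{7}{10}$ and
\[
\lim_{n\to\infty} D^n = \lim_{n\to\infty}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & (\frac{4}{5})^n & 0 \\ 0 & 0 & (\frac{7}{10})^n \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
\]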
We conclude from the preceding that

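\[
L = \lim_{n\to\infty} A^n = S \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} S^{-1}
= \begin{bmatrix} \frac{13}{24} & \frac{13}{24} & \frac{13}{24} \\[2pt] \frac{1}{6} & \frac{1}{6} & \frac{1}{6} \\[2pt] \frac{7}{24} & \frac{7}{24} & \frac{7}{24} \end{bmatrix}.
\]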
The rate of convergence is determined by how fast $D^n$ converges, which is
determined by how quickly $(4/5)^n$ and $(7/10)^n$ tend to zero. These quantities tend to
zero exponentially fast, which explains the rapid convergence seen in Figure 2.
The columns of $L = \lim_{n\to\infty} A^n$ are each equal to
$p_\infty = [\frac{13}{24}\ \ \frac{1}{6}\ \ \frac{7}{24}]^T$. Regardless
of the initial probability vector $p_0$, it follows that
\[
\lim_{n\to\infty} p_n = \lim_{n\to\infty} A^n p_0 = Lp_0 = p_\infty
\]
since the sum of the entries of p0 is 1. Consequently, in the long run we expect
that about 54% of residents will be in Gondor, 17% in Rohan, and 29% in Mordor,
regardless of the initial population distribution. This limiting behavior is reflected
in Figure 2. Furthermore, the population distribution determined by $p_\infty$ is stable
in the sense that
\[
Ap_\infty = A \Bigl( \lim_{n\to\infty} A^n p_0 \Bigr) = \lim_{n\to\infty} A^{n+1} p_0 = p_\infty;
\]
that is, it does not change with the passage of time; see Figure 3.

Figure 3. Much like the elves of Middle Earth, the population distribution determined by $p_\infty = [\frac{13}{24}\ \ \frac{1}{6}\ \ \frac{7}{24}]^T$ is unchanged by the passage of time.

Figure 4. Diagram for the hot potato problem.
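The diagonalization and the stability of $p_\infty$ can be checked exactly with rational arithmetic. The sketch below is only a verification aid: the entries of `S` and `S_inv` are our reading of the factorization $A = SDS^{-1}$ stated above, and the helpers `diag` and `matmul` are hypothetical names introduced for the example.

```python
from fractions import Fraction as F

A = [[F(9, 10), F(3, 20), F(1, 10)],
     [F(1, 20), F(3, 4), F(1, 20)],
     [F(1, 20), F(1, 10), F(17, 20)]]
# Columns of S are eigenvectors of A for the eigenvalues 1, 4/5, 7/10.
S = [[F(13, 7), F(-1), F(1)],
     [F(4, 7), F(0), F(-2)],
     [F(1), F(1), F(1)]]
S_inv = [[F(7, 24), F(7, 24), F(7, 24)],
         [F(-3, 8), F(1, 8), F(5, 8)],
         [F(1, 12), F(-5, 12), F(1, 12)]]

def diag(*entries):
    """Return the diagonal matrix with the given diagonal entries."""
    n = len(entries)
    return [[entries[i] if i == j else F(0) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Return the matrix product X Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

D = diag(F(1), F(4, 5), F(7, 10))
L = matmul(matmul(S, diag(F(1), F(0), F(0))), S_inv)   # limit of A^n
p_inf = [F(13, 24), F(1, 6), F(7, 24)]

print(matmul(S, S_inv) == diag(F(1), F(1), F(1)))             # S_inv inverts S
print(matmul(matmul(S, D), S_inv) == A)                       # A = S D S^{-1}
print(all([row[j] for row in L] == p_inf for j in range(3)))  # columns of L
print([sum(a * x for a, x in zip(row, p_inf)) for row in A] == p_inf)  # A p_inf = p_inf
```

Each line prints `True`, confirming that every column of the limit matrix equals $p_\infty$ and that $p_\infty$ is fixed by $A$.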
Unfortunately, the preceding approach does not always work. Suppose that
Alice and Bob are playing “hot potato.” Each round, Alice throws everything she
has to Bob and vice versa; see Figure 4. The transition matrix that governs the
system is
\[
A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
\]
In this case, $\lim_{n\to\infty} A^n$ does not exist since
\[
A^n = \begin{cases} I & \text{if $n$ is even}, \\ A & \text{if $n$ is odd}. \end{cases}
\]
Our previous method does not work since the eigenvalues of $A$ are $1$ and $-1$; the
expression $(-1)^n$ does not tend to a limit as $n \to \infty$. Nevertheless, there is a stable
equilibrium for the system. What is it? If Alice and Bob each start with half a
potato, then they will each have half a potato in all subsequent rounds.
What if the transition probabilities depend upon time? What if there are
infinitely many different states? While it is wonderful to solve problems exactly
and with explicit parameter dependence, for many real-world problems this is not
remotely feasible. This year honors the Metropolis algorithm. It and various gener-
alizations have led to the explosive growth of Markov chain Monte Carlo (MCMC)
algorithms, which have revolutionized subjects such as statistical physics, Bayesian
inference, theoretical computer science, and financial mathematics by giving us the
ability to simulate almost anything in real time. For example we can simulate a
baseball game as a Markov chain. These results and applications are a natural
extension of the Monte Carlo methods from the 1946 entry.
A Markov chain is a random sequence of states, each of whose probabilities
depend iteratively on the previous state. Nicholas Metropolis (1915–1999) and his
colleagues realized in 1953 that Markov chains could be run on then-new electronic
computers to converge to, and hence sample from, a probability distribution of
interest [3]. The following problem concerns a Markov chain with an infinite number
of states.

Proposed by Jeffrey Rosenthal, University of Toronto.
Consider the special case in which the set of possible states is $\mathbb{Z}$, the set of
integers. Let $\{p_i\}_{i\in\mathbb{Z}}$ be a positive probability distribution on $\mathbb{Z}$, that is, a collection
of real numbers $p_i > 0$ such that $\sum_{i\in\mathbb{Z}} p_i = 1$. In the notation of our previous
examples, one might think of the $p_i$ as the entries of an infinitely long (in two
directions) probability vector $p_\infty$.
Let $\{a_{i,j}\}_{i,j\in\mathbb{Z}}$ be Markov chain transition probabilities, so $a_{i,j}$ equals the
probability, given that the state at time $n$ equals $i$, that the state at time $n + 1$ equals
$j$. One can think of $A = [a_{i,j}]$ as an infinitely large transition matrix. Can we find
simple transition probabilities $a_{i,j}$ such that the chain “converges to $p_\infty$”?
The answer is yes! For $i \in \mathbb{Z}$, set
\[
\begin{aligned}
a_{i,i+1} &= \frac{1}{2} \min\Bigl\{ 1, \frac{p_{i+1}}{p_i} \Bigr\}, \\
a_{i,i-1} &= \frac{1}{2} \min\Bigl\{ 1, \frac{p_{i-1}}{p_i} \Bigr\}, \\
a_{i,i} &= 1 - a_{i,i+1} - a_{i,i-1},
\end{aligned}
\]
with $a_{i,j} = 0$ otherwise. Then this Markov chain is easily run on a computer and
has good convergence properties as the following problem shows (for an animated
version see [7]).
Show that the Markov chain above
(a) is irreducible; that is, for any $i, j \in \mathbb{Z}$ there are $m \in \mathbb{N}$ and $k_1, k_2, \ldots, k_m \in \mathbb{Z}$
such that $a_{i,k_1} > 0$ and $a_{k_m,j} > 0$ and $a_{k_n,k_{n+1}} > 0$ for $1 \le n \le m - 1$;
(b) is aperiodic; that is, there is at least one $i \in \mathbb{Z}$ with $a_{i,i} > 0$;
(c) is reversible; that is, $p_i a_{i,j} = p_j a_{j,i}$ for all $i, j \in \mathbb{Z}$;
(d) leaves $p_\infty$ stationary; that is, $\sum_{i\in\mathbb{Z}} p_i a_{i,j} = p_j$ for all $j \in \mathbb{Z}$ [Hint: Use (c)];
(e) converges to $p_\infty$ as described above. Hint: This follows from (a), (b), and (d)
by the standard Markov chain convergence theorem [5, Sect. 1.8].
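To see the chain in action, here is a minimal simulation sketch. The target distribution $p_i = 2^{-|i|}/3$ is our own illustrative choice (any positive distribution on the integers would do), and the function names are hypothetical. Note that the transition rule only ever uses ratios of the $p_i$, so the normalizing constant 3 is never needed by the chain itself.

```python
import random
from collections import Counter

def weight(i: int) -> float:
    """Unnormalized target: p_i is proportional to 2^(-|i|) (illustrative choice)."""
    return 2.0 ** (-abs(i))

def transition(i: int) -> dict:
    """Return {j: a_{i,j}} for the three possible moves from state i."""
    up = 0.5 * min(1.0, weight(i + 1) / weight(i))
    down = 0.5 * min(1.0, weight(i - 1) / weight(i))
    return {i + 1: up, i - 1: down, i: 1.0 - up - down}

def run_chain(steps: int, start: int = 0, seed: int = 0) -> Counter:
    """Run the chain and count how often each state is visited."""
    rng = random.Random(seed)
    state, visits = start, Counter()
    for _ in range(steps):
        probs = transition(state)
        state = rng.choices(list(probs), weights=list(probs.values()))[0]
        visits[state] += 1
    return visits

STEPS = 200_000
visits = run_chain(STEPS)
for i in range(-3, 4):
    # Empirical visit frequency next to the exact target probability.
    print(i, round(visits[i] / STEPS, 3), round(weight(i) / 3, 3))
```

The empirical frequencies should track the target probabilities closely, which is the content of part (e). Part (c) can also be spot-checked numerically by comparing `weight(i) * transition(i)[i + 1]` with `weight(i + 1) * transition(i + 1)[i]`.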
1953: Comments
The interesting posts [6, 9] give a nice background of the history of Markov
chains, some surprising examples, and code to explore. Andrey Markov (1856–1922)
introduced the chains now named for him in 1913 while performing an analysis of
the sequence of consonants and vowels in the work of the Russian writer Alexander
Pushkin (1799–1837). In particular, he found that he could create state diagrams
in which the transition probability to the next letter depended only on the previous
two letters.
In the intervening years these ideas have been successfully extended and ap-
plied to numerous other problems. There are many readable accounts of the history
of these algorithms [1, 8]. Motivation for these extensions and improvements ranges
from studying the behavior of neutrons in fissile material to estimating the prob-
ability that certain solitaire games are winnable. The applications are almost as
varied. One reason for this is that we can use these ideas to estimate integrals and
areas; it is desirable to be able to determine areas since these frequently correspond
to probabilities. For more on this see the 1946 entry.
Bibliography
[1] D. B. Hitchcock, A history of the Metropolis-Hastings algorithm, Amer. Statist. 57 (2003),
no. 4, 254–257, DOI 10.1198/0003130032413. http://www.jstor.org/stable/pdf/30037292.
pdf. MR2037852
[2] N. Metropolis, The beginning of the Monte Carlo method, Stanislaw Ulam 1909–1984,
getfile?15-12.pdf. MR935771
[3] N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller, Equations of state
calculations by fast computing machines, J. Chem. Phys. 21 (1953), 1087–1091.
[4] N. Metropolis and S. Ulam, The Monte Carlo method, J. Amer. Statist. Assoc. 44 (1949),
335–341. http://www.jstor.org/stable/pdf/2280232.pdf. MR0031341
[5] J. R. Norris, Markov chains, Cambridge Series in Statistical and Probabilistic Mathematics,
vol. 2, reprint of 1997 original, Cambridge University Press, Cambridge, 1998. http://www.
statslab.cam.ac.uk/~james/Markov/. MR1600720
[6] O. Pavlyk, Centennial of Markov Chains, Mathematica Algorithm R&D, http://blog.
wolfram.com/2013/02/04/centennial-of-markov-chains/.
[7] http://probability.ca/jeff/java/rwm.html
[8] C. Robert and G. Casella, A short history of Markov chain Monte Carlo: subjective recollec-
tions from incomplete data, Statist. Sci. 26 (2011), no. 1, 102–115, DOI 10.1214/10-STS351.
http://arxiv.org/pdf/0808.2902.pdf. MR2849912
[9] A. Smith, Surprising examples of Markov chains, http://mathoverflow.net/q/252671.
1954
Kolmogorov–Arnold–Moser Theorem
Introduction
One can regard perturbation theory as a collection of various methods for ob-
taining approximate solutions to difficult problems based upon exact solutions to
closely related, but simpler, problems. Applied mathematicians and scientists use
the tools of perturbation theory to infer information about problems that model
dynamical systems under the influence of gravitational or quantum forces. For ex-
ample, the planet Neptune was discovered in 1846 as a result of calculations made
by the French mathematician Urbain Le Verrier (1811–1877) and mathematician-
astronomer John Couch Adams (1819–1892), based on the perturbations of the
planet Uranus due to the gravitational influence of the then-unknown Neptune.
It was a momentous day in the history of science when mathematicians told as-
tronomers where to point their telescopes to see the first new planet discovered
since 1781, sixty-five years earlier.
Around the turn of the 20th century, Henri Poincaré, expanding on work on
the “problem of small denominators” by astronomer Charles-Eugène Delaunay
(1816–1872), first postulated that small perturbations can have large effects on a
dynamical system. In popular culture, this is known as “chaos” or the “butterfly ef-
fect.” The “problem of small denominators” refers to issues arising from potentially
small quantities that appear in the denominators of the formal Fourier series con-
structed to solve the problem. These can cause convergence issues in the proposed
perturbative series, a problem solved with the advent of KAM theory [8].
The Kolmogorov–Arnold–Moser theorem concerns the behavior of systems under
small perturbations; see [3, 4, 8]. The first results are due to Andrey
Kolmogorov (1903–1987) in 1954; they were extended in 1962 by Jürgen
Moser (1928–1999) and further developed by Vladimir Arnold (1937–2010) a year
later. Essentially, the Kolmogorov–Arnold–Moser theorem provides criteria under
which a system of partial differential equations has little “chaotic” behavior under
small perturbations.
One of the most important examples arises from physics. In the Hamiltonian
formulation we have position variables q = (q1 , q2 , . . . , qn ), momentum variables
p = (p1 , p2 , . . . , pn ), the Hamiltonian function H(p, q) (which often corresponds to
the total energy of the system), and the time evolution given by
\[
\frac{dp}{dt} = -\partial_q H
\]
and
\[
\frac{dq}{dt} = \partial_p H.
\]
One then studies how the solutions evolve over time. In this setting, KAM theory
states that for sufficiently small perturbations the new behavior should be close to
that of the unperturbed system.
See [1] for a nice perspective written fifty years after Kolmogorov’s work. We
can see some of the issues by looking at an example from that paper: complex
linearization. Consider a map F (z) = λz + f (z) for some nonzero λ ∈ C. We wish
to find a function Φ such that
(Φ ◦ F )(z) = λΦ(z).
If $f$ has a series expansion, then we can formally find a series expansion for $\Phi$. If
$0 < |\lambda| \neq 1$, the series for $\Phi$ converges, but issues arise if $|\lambda| = 1$. In that case
we may write $\lambda = e^{2\pi i \alpha}$ for some $\alpha \in \mathbb{R}$, and the behavior depends on how well
approximable α is by rational numbers. We elaborate on this in the associated
problem and encourage the interested reader to consult [1].

Proposed by Avery T. Carr, Emporia State University, and Steven J.
A key ingredient in KAM theory is the irrationality type of certain parameters.
This concept measures how well one can approximate a given number by rational
numbers. In this problem, we need a notion of how well approximable an irrational
number is by rationals. We can get as good of an approximation as desired simply
by taking more and more decimal digits; thus, our notion cannot simply be how far
our rational approximation is from the original number. It is fruitful to measure
the “cost” of a rational approximation by the size of the denominator used. This
is a reasonable notion, since a large improvement using a small denominator is
more impressive than a large improvement obtained with a larger denominator.
For example, if we want π to 6 decimal places we could use
\[
\frac{31415926}{10000000} = 3.1415926;
\]
however,
\[
\frac{355}{113} = 3.14159292\ldots
\]
also gives us 6 decimal places of accuracy while having a much smaller denominator.
Such unusually good rational approximations can be found with continued fractions;
see [5–7] and the entries for 1931, 1934, and 1955.
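As a small illustration of how continued fractions produce such approximations, the sketch below computes the first few convergents of π. It is an assumption-laden example rather than a definitive implementation: the function name `convergents` is ours, and π is replaced by its double-precision value, which agrees with π closely enough that the first several convergents are the true ones.

```python
import math
from fractions import Fraction

def convergents(x: Fraction, count: int):
    """Yield up to `count` continued-fraction convergents of x."""
    h_prev, h = 0, 1   # numerators h_{n-2}, h_{n-1}
    k_prev, k = 1, 0   # denominators k_{n-2}, k_{n-1}
    for _ in range(count):
        a = math.floor(x)                    # next partial quotient
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        yield Fraction(h, k)
        remainder = x - a
        if remainder == 0:                   # x was rational; expansion ends
            return
        x = 1 / remainder

pi_approx = Fraction(math.pi)   # exact rational value of the floating-point pi
for c in convergents(pi_approx, 5):
    error = abs(float(c) - math.pi)
    print(c, f"error = {error:.3g}", f"1/q^2 = {1 / c.denominator**2:.3g}")
```

The fourth convergent is 355/113, and in every row the error is smaller than $1/q^2$, in line with the discussion above.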
An irrational number $\alpha$ is of type $(K, \nu)$ (for positive $K, \nu$) if
\[
\left| \alpha - \frac{p}{q} \right| > \frac{K}{q^{\nu}}
\]
for all integers $p, q$. In other words, we cannot approximate $\alpha$ too well by rationals.
The following problem assumes some familiarity with measure theory; for a brief
introduction to these ideas see [7, Appendix A.5].
(a) Prove that for any irrational $\alpha$ there exist infinitely many relatively prime
pairs of integers $p, q$ such that
\[
\left| \alpha - \frac{p}{q} \right| < \frac{1}{q^2}.
\]
This is known as Dirichlet’s approximation theorem. It implies that every
irrational number can be approximated fairly well.
(b) Consider all irrational numbers in $[0, 1]$ of type $(1, 2 + \epsilon)$ for a fixed $\epsilon > 0$.
What is the measure of such numbers? More generally, what is the measure
of all irrational numbers in $[0, 1]$ that are of type $(K, 2 + \epsilon)$ for a fixed $\epsilon > 0$
(so $K$ is allowed to vary)?
1954: Comments
In the 1938 entry we saw how the irrationality of α affected the rate of con-
vergence of the sequence {αn } to Benford’s law. The results of this year provide
another example of irrationality in action and serve as an excellent bridge to the
1955 entry on Roth’s theorem.
Dirichlet’s approximation theorem. The standard solution to problem (a)
can be found in [7] and many other number theory books. The proof proceeds by
Dirichlet’s box principle (the pigeonhole principle): if we place n + 1 pigeons in n
boxes, at least one box must have two pigeons. We may assume that 0 < α < 1.
For each Q, partition [0, 1) into Q intervals of length 1/Q:
[0, 1/Q), [1/Q, 2/Q), . . . , [(Q − 1)/Q, 1).
Consider qα (mod 1) for 0 ≤ q ≤ Q, the fractional parts of qα. Each must lie in one
of the Q bins above. Since we have Q + 1 fractional parts, at least one bin must
contain two of them, say q1 α (mod 1) and q2 α (mod 1). Moreover, these fractional
parts must be distinct because α is irrational.
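This implies there are integers $p_1, p_2$ such that $q_1 \alpha - p_1$ and $q_2 \alpha - p_2$ are in the
same bin. Consequently, the absolute value of their difference is less than $1/Q$:
\[
|(q_1 \alpha - p_1) - (q_2 \alpha - p_2)| < \frac{1}{Q}.
\]
Let $p = p_1 - p_2$ and $q = q_1 - q_2$, where we label the two elements so that $q_1 > q_2$.
Then $0 < q \le Q$ (in particular $q$ is nonzero since we have selected two distinct
elements from the same bin) and, dividing by $q$, the preceding yields
\[
\left| \alpha - \frac{p}{q} \right| < \frac{1}{qQ} \le \frac{1}{q^2}.
\]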
Moreover, there are infinitely many distinct relatively prime pairs (p, q) with this
property since Q can be arbitrarily large.
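The pigeonhole argument is constructive, and it can be turned into a short search. The sketch below is our own illustration (the name `dirichlet_pair` is hypothetical), and it uses floating-point arithmetic, so it should be trusted only for moderate values of $Q$.

```python
import math

def dirichlet_pair(alpha: float, Q: int):
    """Return (p, q) with 1 <= q <= Q and |q*alpha - p| < 1/Q.

    Follows the proof: drop the fractional parts of 0*alpha, ..., Q*alpha
    into Q bins of width 1/Q and stop at the first bin that is hit twice.
    """
    seen = {}  # bin index -> the multiplier q that landed there first
    for q1 in range(Q + 1):
        frac = (q1 * alpha) % 1.0
        b = min(int(frac * Q), Q - 1)      # guard against rounding up to Q
        if b in seen:
            q2 = seen[b]
            q = q1 - q2                     # positive since q1 > q2
            p = math.floor(q1 * alpha) - math.floor(q2 * alpha)
            return p, q
        seen[b] = q1
    raise AssertionError("unreachable: Q + 1 values cannot avoid Q bins")

alpha = math.sqrt(2)
for Q in (10, 100, 1000):
    p, q = dirichlet_pair(alpha, Q)
    print(Q, (p, q), abs(alpha - p / q), 1 / q**2)
```

For each $Q$ the printed error is smaller than $1/q^2$. Letting $Q$ grow produces new pairs, which is how the proof obtains infinitely many approximations.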
Hurwitz’s approximation theorem. Are there any irrationals that are hard
to approximate by rationals, or that are particularly easy? The golden ratio
\[
\varphi = \frac{1 + \sqrt{5}}{2}
\]
is among the hardest to approximate. Hurwitz’s theorem states that for any
irrational number $\alpha$ there are infinitely many relatively prime pairs $(p, q)$ such that
\[
\left| \alpha - \frac{p}{q} \right| < \frac{1}{\sqrt{5}\, q^2}
\]
and that the constant $1/\sqrt{5}$ is best possible in the sense that if $\alpha = \varphi$, then for
each $C < 1/\sqrt{5}$ there are only finitely many relatively prime pairs $(p, q)$ such that
\[
\left| \varphi - \frac{p}{q} \right| < \frac{C}{q^2}.
\]
Back to cryptography. Now back to some unfinished business. We ran out
of room in our 1952 entry before we could discuss the decryption of a message
encoded with the Vigenère cipher (p. 212) and we wanted to give the 1953 entry
some room to breathe. Let us return to our decryption problem.
First, we look for repeated strings in the ciphertext, the longer the better. For
example, the strings FVADW and VVH each appear twice in the ciphertext, which
we reproduce below.
JSUIK GDPQB JTKSF IQTWI ZHWLC HZIJO YIGBF VADWI FIVMP UWLGT
DQQIV ZKUHR GFHGK DKITV RTGTX PNMLX YWOPP CWTTC YIVCR HKTIM
EIOXV CGIEW SLGFL JVVLW VSAXK GOSPU HRQIJ LVVHV GGKSW ZGFGS
QSWUK MPTRV OOWMQ BISTM RYVCE ICPOI VCIMP RWLGZ HRIHK SHHKI
MSBAK HKXJO WCQIF EPRRJ TSTYG BFCCB DPAGL WCBGL QDHJW ZOCHW
QHVVH OGMZS TRDXV VDXRC LRVHK IFSFV ADWMQ BLWUW PTNSD RFGWV
CWJLV TRVYO UHICR HNIFO
It would be unusual for strings to appear multiple times in a short piece of ciphertext
purely by chance. The starting points of the two occurrences of VVH are separated
by 116 = 4 × 29 characters and the starting points of the two occurrences of FVADW
are separated by 244 = 4 × 61 characters. Since the greatest common divisor of
these distances is four, we suspect that the key has length four. Why? Because we
expect that common plaintext words or phrases (such as THE) occasionally occur
some multiple of four letters apart and are therefore encrypted identically.
If we look at every fourth letter of the ciphertext, we obtain
JKQKQ ZCJGA FPGQK GKTGN YPTVK EVEGV VKPQV GWGWP OQTVC VPGIH
MKJQP TGCAC QWHVG TVRVF AQUNF CVYIN
The letters Q, V, G, and W occur often in this string. The relevant shift most likely
maps common letters to these letters. The letter frequency table on p. 125 and
some trial and error lead us to believe that the keyword starts with C, since shifting
by −2 maps the letters above to the common letters O, T, E, and U, respectively.
The characters in the ciphertext at positions 2, 6, 10, 14, . . . are
SGBST HHOBD IUTIU FDVTM WCCCT ICWFV SGUIV GZSUT OBMCP CRZHH
SHOIR SBBGB DZWVM RVCHS DBWSG WTOCI
The letters B, C, I, H, Z, and S appear often in this string. The relevant shift most
likely maps common letters to the letters above. Shifting by −14 yields the common
letters N, O, U, T, L, and E, respectively. This makes us suspect that the second letter
of the keyword is O.
Some more frequency analysis and inspired guesswork provides the keyword
CODE. The decoded message is:
HERE IS A LONG PIECE OF TEXT THAT WE HAVE ENCRYPTED USING THE
FAMOUS VIGENERE CIPHER. HOPEFULLY IT WILL NOT PROVE TOO DIFFICULT
TO DECIPHER. IF THIS TEXT IS LONG ENOUGH, THERE SHOULD BE ENOUGH
INFORMATION FOR YOU TO BE ABLE TO FIND THE LENGTH OF THE KEY.
WITH THAT YOU CAN DO FREQUENCY ANALYSIS AND HOPEFULLY FIND THE
KEY WORD. AT THAT POINT THE DECRYPTION IS SIMPLE AND STRAIGHTFORWARD.
GOOD LUCK.
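The steps above can be retraced in code. The following sketch is an illustration only, with hypothetical helper names: it locates the repeated strings, takes the greatest common divisor of the gaps to guess the key length, extracts every fourth letter for frequency analysis, and finally decrypts with the keyword, assuming the convention A = 0, ..., Z = 25 for shifts.

```python
from collections import Counter
from functools import reduce
from itertools import cycle
from math import gcd

CIPHERTEXT = (
    "JSUIK GDPQB JTKSF IQTWI ZHWLC HZIJO YIGBF VADWI FIVMP UWLGT "
    "DQQIV ZKUHR GFHGK DKITV RTGTX PNMLX YWOPP CWTTC YIVCR HKTIM "
    "EIOXV CGIEW SLGFL JVVLW VSAXK GOSPU HRQIJ LVVHV GGKSW ZGFGS "
    "QSWUK MPTRV OOWMQ BISTM RYVCE ICPOI VCIMP RWLGZ HRIHK SHHKI "
    "MSBAK HKXJO WCQIF EPRRJ TSTYG BFCCB DPAGL WCBGL QDHJW ZOCHW "
    "QHVVH OGMZS TRDXV VDXRC LRVHK IFSFV ADWMQ BLWUW PTNSD RFGWV "
    "CWJLV TRVYO UHICR HNIFO"
).replace(" ", "")

def gaps(text: str, pattern: str) -> list:
    """Distances between the starts of consecutive occurrences of pattern."""
    starts = [i for i in range(len(text) - len(pattern) + 1)
              if text.startswith(pattern, i)]
    return [b - a for a, b in zip(starts, starts[1:])]

def decrypt(text: str, key: str) -> str:
    """Undo a Vigenere encryption by subtracting the key letters modulo 26."""
    return "".join(chr((ord(c) - ord(k)) % 26 + ord("A"))
                   for c, k in zip(text, cycle(key)))

distances = gaps(CIPHERTEXT, "VVH") + gaps(CIPHERTEXT, "FVADW")
key_length = reduce(gcd, distances)
print(distances, key_length)                 # [116, 244] 4

first_stream = CIPHERTEXT[0::key_length]     # letters enciphered with key letter 1
print(Counter(first_stream).most_common(4))  # candidates for frequency analysis

print(decrypt(CIPHERTEXT, "CODE")[:16])      # HEREISALONGPIECE
```

Guessing each key letter amounts to repeating the frequency count on `CIPHERTEXT[1::4]`, `CIPHERTEXT[2::4]`, and `CIPHERTEXT[3::4]`.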
Bibliography
[1] H. W. Broer, KAM theory: the legacy of A. N. Kolmogorov’s 1954 paper. Comment on: “The
general theory of dynamic systems and classical mechanics” (French) [in Proceedings of the
International Congress of Mathematicians, Amsterdam, 1954, Vol. 1, 315–333, Erven P.
Noordhoff N.V., Groningen, 1957; MR0097598], Bull. Amer. Math. Soc. (N.S.) 41 (2004),
no. 4, 507–521, DOI 10.1090/S0273-0979-04-01009-2. http://www.ams.org/journals/bull/
2004-41-04/S0273-0979-04-01009-2/S0273-0979-04-01009-2.pdf. MR2083638
[2] L. Chierchia and J. N. Mather, Kolmogorov–Arnold–Moser theory, Scholarpedia 5 (2010),
no. 9, 2123. http://www.scholarpedia.org/article/Kolmogorov-Arnold-Moser_theory.
[3] CORNELLCAST, Small denominators: adventures through the looking glass, http://www.
cornell.edu/video/john-milnor-small-denominators.
[4] H. S. Dumas, The KAM story: A friendly introduction to the content, history, and signif-
icance of classical Kolmogorov-Arnold-Moser theory, World Scientific Publishing Co. Pte.
Ltd., Hackensack, NJ, 2014. MR3222196
[5] G. H. Hardy and E. M. Wright, An introduction to the theory of numbers, 5th ed., The
Clarendon Press, Oxford University Press, New York, 1979. MR568909
[6] A. Ya. Khinchin, Continued fractions, The University of Chicago Press, Chicago, Ill.-London,
1964. MR0161833
[8] C. E. Wayne, An introduction to KAM theory, Dynamical systems and probabilistic methods
in partial differential equations (Berkeley, CA, 1994), Lectures in Appl. Math., vol. 31, Amer.
Math. Soc., Providence, RI, 1996, pp. 3–29. http://math.bu.edu/people/cew/preprints/
introkam.pdf. MR1363023
[9] Wikipedia, Kolmogorov–Arnold–Moser theorem, http://en.wikipedia.org/wiki/
Kolmogorov-Arnold-Moser_theorem.
[10] Wikipedia, Discovery of Neptune, http://en.wikipedia.org/wiki/Discovery_of_Neptune.
[11] Wikipedia, Perturbation theory, http://en.wikipedia.org/wiki/Perturbation_theory.
1955
Roth’s Theorem
Introduction
A real number is either rational or irrational; determining which is the case is
often a difficult matter (see the 1935 entry). Can one irrational number be “more
irrational” than another?
The irrationality measure $\mu(\alpha)$ of a real number $\alpha$ is the least upper bound of
the set of real $r > 0$ such that
\[
0 < \left| \alpha - \frac{p}{q} \right| < \frac{1}{q^r} \tag{1955.1}
\]
has infinitely many solutions with relatively prime integers $p, q$ and $q > 0$. Dirichlet’s
approximation theorem asserts that $\mu(\alpha) \ge 2$ for all irrational $\alpha$; see Table
1. To be more specific, Dirichlet proved that (1955.1) has infinitely many solutions
when $r = 2$. For such $p, q$, the error in the approximation $\alpha \approx p/q$ is much smaller
than one has a right to expect since consecutive rational numbers with denominator
$q$ are at a distance of $1/q$ from each other. An error bounded above by $1/q^2$
seems like too much to ask for. Such excellent approximations can be produced
with truncated continued fraction expansions (see the 1931 entry).
On the other hand, each rational α has μ(α) = 1. To see this, write α = a/b
in lowest terms and let δ > 0. Suppose that (1955.1) has infinitely many solutions
with $r = 1 + \delta$. Then
\[
0 < \left| \frac{aq - bp}{bq} \right| < \frac{1}{q^{1+\delta}}.
\]
Multiply by $bq$ and observe that $aq - bp$ is a nonzero integer, so
\[
1 \le |aq - bp| < \frac{b}{q^{\delta}}.
\]
Since the right-hand side of the preceding tends to zero as q → ∞, the preceding
inequalities can be satisfied by only finitely many pairs p, q of relatively prime inte-
gers. Thus, there is a profound gap between the irrationality measures of rationals
(exactly 1) and irrationals (≥ 2).
One can show that almost every real number has irrationality measure 2; this
includes the constant $e$.$^1$ For other numbers, such as $\pi$, we have only upper bounds:
$\mu(\pi) \le 7.6063$ [4]. On the other hand, there are numbers whose irrationality
measure is infinite. This includes Liouville’s constant $\sum_{n=1}^{\infty} 10^{-n!}$, the first explicit
transcendental number discovered; see the 1935 entry.
$^1$ The convenient form of the continued fraction expansion for $e$ (see the 1931 entry) makes
it possible to show that $\mu(e) = 2$. The erratic nature of the continued fraction expansion for $\pi$,
on the other hand, does not permit a similar evaluation of $\mu(\pi)$.
Table 1. Rational approximations of π.
p/q            p/q (numeric)    |p/q − π|        1/q^2           1/q^2 (numeric)
3              3.00000000000    0.14             1               1.0
22/7           3.14285714286    0.0013           1/49            0.020
333/106        3.14150943396    0.000083         1/11236         0.000089
355/113        3.14159292035    2.67 × 10^−7     1/12769         0.000078
103993/33102   3.14159265301    5.78 × 10^−10    1/1095742404    9.13 × 10^−10
104348/33215   3.14159265392    3.32 × 10^−10    1/1103236225    9.06 × 10^−10
208341/66317   3.14159265347    1.22 × 10^−10    1/4397944489    2.27 × 10^−10
Roth’s theorem, for which Klaus Friedrich Roth (1925–2015) was awarded a
Fields Medal in 1958, states that $\mu(\alpha) = 2$ for every irrational algebraic real number
$\alpha$. That is, for every $\epsilon > 0$, the inequality
\[
\left| \alpha - \frac{p}{q} \right| < \frac{1}{q^{2+\epsilon}}
\]
has only finitely many solutions with relatively prime integers p, q and q > 0. Thus,
an irrational algebraic real number cannot have many “extremely good” rational
approximations.
The origins of this work go back to Joseph Liouville (1809–1882), who proved in
1844 that $\mu(\alpha) \le d$ for an algebraic number of degree $d \ge 2$; see the 1935 entry for a
proof of this result. In 1909, Axel Thue (1863–1922) improved this to $d/2 + 1 + \epsilon$ for
every $\epsilon > 0$. This bound was reduced to $2\sqrt{d}$ by Carl Ludwig Siegel (1896–1981) in
1921 and to $\sqrt{2d}$ by mathematician-physicist Freeman Dyson in 1947 (see the 1928
entry). Siegel had conjectured that $\mu(\alpha) = 2$ for all algebraic irrational numbers;
this was finally proved by Roth in 1955.
Due to the recent explosion of work in additive combinatorics [5], the phrase
“Roth’s theorem” now often refers to Roth’s theorem on arithmetic progressions
(1953), which asserts that if A ⊆ Z has positive upper density, meaning that
\[ \limsup_{N \to \infty} \frac{|A \cap [-N, N]|}{2N} > 0, \]
then A contains infinitely many arithmetic progressions of length three; see the 1913
entry. This is the first nontrivial case of Szemerédi’s theorem; see the 1975 entry.
To avoid confusion, Roth’s theorem on Diophantine approximation is sometimes
referred to as the Thue–Siegel–Roth theorem.

Find a one-to-one function f : [0, 1] → [0, 1] such that f (x) is always transcen-
dental. Can you find a continuous function that does this? If so, can you make
your function differentiable?
1955: Comments
Hint for the problem. Consider the binary expansion
\[ x = \sum_{n=1}^{\infty} \frac{b_n(x)}{2^n}, \qquad b_n(x) \in \{0, 1\}, \]
of x ∈ [0, 1). The expansion is unique unless x is a nonzero dyadic rational (one whose
denominator is a power of 2). Such a number has
two binary expansions; take the infinite one. Motivated by Liouville’s construction
of a transcendental number (see the 1935 entry), define
\[ M(x) = \sum_{n=1}^{\infty} 10^{-(b_n(x)+1)\,n!}. \]
Prove that M(x) is always transcendental. What properties does M have? Is it
continuous? Strictly increasing? One-to-one?
The Flint Hills series. Here is another difficult question related to Diophantine
approximation. Does the Flint Hills series
\[ \sum_{n=1}^{\infty} \frac{1}{n^3 \sin^2 n} \tag{1955.2} \]
converge? In case you were wondering, the nomenclature refers to Flint Hills,
Kansas [6, Chapter 25]. One suspects that the $n^3$ in the denominator should force
the series to converge. However, sin n gets close to zero every now and then. For
example, the exceptionally good rational approximation π ≈ 355/113 means that
sin 355 = −0.000030144 . . . is dangerously close to sin 113π = 0. This results in a
big jump: the 354th partial sum of (1955.2) is approximately 4.8 and the 355th
partial sum is approximately 29.4. Figures 1 and 2 illustrate this sort of behavior.
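The jump at n = 355 is easy to reproduce numerically. The following is a small sketch in double precision; the function name `flint_hills_partial_sum` is illustrative, and ordinary floating-point arithmetic is assumed to be accurate enough for this many terms.

```python
import math

def flint_hills_partial_sum(terms):
    """Sum 1/(n^3 sin^2 n) for n = 1, ..., terms in double precision."""
    return sum(1.0 / (n ** 3 * math.sin(n) ** 2) for n in range(1, terms + 1))

s354 = flint_hills_partial_sum(354)
s355 = flint_hills_partial_sum(355)
print(f"sin(355)          = {math.sin(355):.9f}")
print(f"354th partial sum = {s354:.3f}")
print(f"355th partial sum = {s355:.3f}")
print(f"size of the jump  = {s355 - s354:.3f}")
```

The single term with n = 355 contributes far more than the first 354 terms combined, because sin² 355 is of order 10^−9.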
Figure 1. First 75 partial sums of the Flint Hills series (1955.2). The nth partial
sum tends to be significantly larger than the (n − 1)st whenever n is the numerator
of an unusually accurate rational approximation to π.
Figure 2. First 400 partial sums of the Flint Hills series (1955.2). The huge jump
between the 354th and 355th partial sums is evident.
As of 2018, it is unknown whether the Flint Hills series converges. Its relevance
to this entry stems from the fact that its convergence would imply that μ(π) ≤ 2.5
[1]. That would be a huge improvement over the best known result, μ(π) ≤ 7.6063.
Furstenberg’s proof of Euclid’s theorem. The year 1955 is also notable
for Hillel Furstenberg’s remarkable topological proof of the infinitude of the primes
(Euclid’s theorem) [2]. The essence of Furstenberg’s proof was later highlighted by
Idris D. Mercer [3] in 2009 when he provided a variant of the proof without the use
of point-set topology.
Let X be a set. A collection τ of subsets of X is a topology on X if
(a) ∅, X ∈ τ ,
(b) τ is closed under arbitrary unions,
(c) τ is closed under finite intersections.
A set X endowed with a topology τ is a topological space. The elements of τ are
open sets; a subset of X is a closed set if its complement is open. A base β for X
is a subset β of τ such that each element of τ is a union of elements of β. If this is
the case, one says that “β generates the topology τ on X.”
Now for Furstenberg’s proof of Euclid’s theorem. Consider the topology τ on
X = Z generated by the collection of all infinite arithmetic progressions
\[ B_{a,n} = \{a + kn : k \in \mathbf{Z}\}. \]
Thus, each $B_{a,n}$ is open and each open set in Z is a union of some collection of
infinite arithmetic progressions. Since
\[ B_{a,n}^{c} = \bigcup_{j=1}^{n-1} B_{a+j \,(\mathrm{mod}\ n),\, n} \]
is a finite union of open sets, it follows that each $B_{a,n}^{c}$ is open. Consequently, each
$B_{a,n}$ is closed (in addition to being open).
Suppose toward a contradiction that there are only finitely many primes. Since
the finite union of closed sets is closed,² we conclude that
\[ A = \bigcup_{p \text{ prime}} B_{0,p} \]
is closed. Since every integer except −1 and 1 is a multiple of some prime, it follows
that {−1, 1} = Z\A is a nonempty open set that contains no infinite arithmetic
progression. This is impossible, since every nonempty open set is a union of infinite
arithmetic progressions. This contradiction shows that there are infinitely many prime numbers.
Much of the preceding material on Furstenberg’s proof was originally contained
in the 1948 entry, whose problem was written by James M. Andrews. The original
problem asked: Is the Furstenberg topology Hausdorff? Is it regular? Is it normal?
Here is a quick explanation of the terminology for those who have not taken a
course in point-set topology. A topological space X is
(a) Hausdorff if whenever x, y ∈ X are distinct points, there are disjoint open sets
U, V ⊂ X with x ∈ U and y ∈ V ,
(b) regular if whenever A ⊂ X is closed and x ∈ X\A, there are disjoint open sets
U, V ⊂ X with x ∈ U and A ⊂ V , and
(c) normal if whenever A, B ⊂ X are disjoint closed sets, there are disjoint open
sets U, V ⊂ X with A ⊂ U and B ⊂ V .
Bibliography
[1] M. A. Alekseyev, On convergence of the Flint Hills series,
https://arxiv.org/pdf/1104.5100.pdf
[2] H. Furstenberg, On the infinitude of primes, Amer. Math. Monthly 62 (1955), 353, DOI
10.2307/2307043. MR0068566
[3] I. D. Mercer, On Furstenberg’s proof of the infinitude of primes, Amer. Math. Monthly 116
(2009), no. 4, 355–356, DOI 10.4169/193009709X470218. MR2503321
[4] V. Kh. Salikhov, On the irrationality measure of π (Russian), Uspekhi Mat. Nauk 63 (2008),
no. 3(381), 163–164, DOI 10.1070/RM2008v063n03ABEH004543; English transl., Russian
Math. Surveys 63 (2008), no. 3, 570–572. MR2483171
[5] T. Tao and V. H. Vu, Additive combinatorics, paperback edition [of MR2289012], Cambridge
Studies in Advanced Mathematics, vol. 105, Cambridge University Press, Cambridge, 2010.
MR2573797
[6] C. A. Pickover, The mathematics of Oz: Mental gymnastics from beyond the edge, Cambridge
University Press, Cambridge, 2002. MR1936664
² This follows from de Morgan’s law $\left(\bigcup_{i=1}^{n} S_i\right)^{c} = \bigcap_{i=1}^{n} S_i^{c}$ and axiom (c).
1956
The GAGA Principle
Introduction
In calculus one encounters a vast array of “transcendental” functions such as
$e^x$, $\sin x$, and $\log x$. In multivariable calculus (with functions) and differential
geometry (with smooth maps), the abundance of “transcendental” functions and
maps becomes even more pronounced. In 1956, it was shown by Jean-Pierre Serre
(1926– ), who had been awarded the Fields Medal in 1954, that in the setting of
complex variables, under a compactness hypothesis many “transcendental-looking”
geometric and function-theoretic constructions are algebraic from an appropriate
point of view and, moreover, that such an “algebraization” of the analytic construc-
tion is unique.
This result explained many earlier known special cases and was of fundamental
importance in the development of algebraic and complex analytic geometry. Not
only did it justify the role of transcendental methods in the solution of algebraic
problems admitting a sufficiently geometric flavor, it also inspired the profound
work of Alexander Grothendieck (1928–2014) and many others during the revolu-
tion that swept through algebraic geometry in the 1960s.
Serre’s method of proof was sufficiently robust that it was later generalized to
apply to geometric constructions over the p-adic numbers instead of C, and this
generalization is a ubiquitous tool in contemporary algebraic number theory. His
1956 paper is titled “Géométrie algébrique et géométrie analytique”, or GAGA for
short, and the phrase “GAGA principle” expresses the idea that in the presence
of compactness, certain analytic constructions in geometry over C not only admit
an algebraic description (which is already quite striking) but in fact an essentially
unique one.

Proposed by Brian Conrad, Stanford University.
This problem develops the classical content of Serre’s theorem in the one-
dimensional case and assumes familiarity with undergraduate complex analysis.
Let f be a meromorphic function on C. It is meromorphic at ∞ if f (1/z) is mero-
morphic at 0.
(a) Prove that every rational function is meromorphic at ∞.
(b) Prove that if f is meromorphic at ∞, then f is a rational function. Deduce
that if a holomorphic automorphism f : C → C is meromorphic at ∞, then
f (z) = az + b for some a ∈ C× and b ∈ C. Hint: Show that f has only finitely
many zeros and poles in C. Use this to reduce to the case in which f has no
such zeros or poles. By studying the zero or pole order of f (1/z) at z = 0, get
to a case where Liouville’s theorem can be applied.
1956: Comments
Fermat’s last theorem. Since J. P. Serre has played an important role in
many entries in this collection, it is worth mentioning another here which we explore
in greater detail in the 1995 entry: Fermat’s last theorem.
Fermat’s last theorem states that if n ≥ 3, then there are no solutions to
$a^n + b^n = c^n$ in natural numbers a, b, c. Although Pierre de Fermat claimed to
have a remarkably simple proof almost four hundred years ago, the only known
proof uses an enormous amount of machinery from 20th-century mathematics. The
following summary is painfully short, and the interested reader is encouraged to
peruse the references from the 1995 entry.
In the 1960s Yves Hellegouarch (1936– ) considered what could be done if a
solution (a, b, c) existed for some n. He associated the elliptic curve
\[ y^2 = x(x - a^n)(x + b^n) \]
to this solution and saw that it would have some special properties. Later, in
the 1980s, Gerhard Frey (1944– ) explored these curves again and proposed
that such curves would not be “modular.” However, it was believed that all el-
liptic curves are modular (one interpretation is that there is a weight-2 cuspidal
newform associated to the curve). Serre noticed a mistake in Frey’s proof of the
nonmodularity of his curves; the statement needed to repair it (the epsilon conjecture) was proved by Ken
Ribet (1948– ) in 1986. The proof of Fermat’s last theorem follows from showing all
semistable elliptic curves over Q are modular, something accomplished by Andrew
Wiles with assistance from Richard Taylor (1962– ) in the 1990s.
A word about fonts. A convention going back to the influential books of
Nicholas Bourbaki (a pseudonym under which a group of mathematicians, mostly
French, published a series of influential textbooks known for their high level of
abstraction) in the 1930s is that canonical mathematical structures should be denoted
with a boldface font, including various number systems such as $\mathbf{Z}$, $\mathbf{Q}$, $\mathbf{R}$,
and $\mathbf{C}$. Since such boldface is hard to replicate in handwriting, Kunihiko Kodaira
(1915–1997) proposed the variants $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$ when writing by hand. Paradoxically,
with the advent of modern mathematical typesetting, these latter fonts
became more widespread in typesetting than the boldface fonts they were invented
to replicate. Both Conrad and Serre (whose work is featured here) feel strongly
that only boldface should be used in the typography for these number systems, so
we have followed that convention here.
Bibliography
[1] J.-P. Serre, Géométrie algébrique et géométrie analytique (French), Ann. Inst. Fourier, Greno-
ble 6 (1955), 1–42. MR0082175
[2] Wikipedia, Algebraic and analytic geometry,
http://en.wikipedia.org/wiki/Algebraic_geometry_and_analytic_geometry.
1957
The Ross Program
Introduction
The Ross Mathematics Program is an intensive residential summer program for
talented high school students. Arnold Ross (1906–2002) founded the program at the
University of Notre Dame in 1957. He later moved it to the Ohio State University
in 1964. Although Dr. Ross stepped down in 2000, the Ross Program continues
to run, involving about seventy-five first-year students every summer. The central
goal of the program is to train students to think like mathematicians and to write
convincing, logical proofs of their mathematical observations. Ross chose number
theory as the vehicle for this learning process. Starting from the axioms for the ring
of integers, Ross participants analyze topics such as modular arithmetic, Euclid’s
algorithm, quadratic reciprocity, and the existence of primitive roots. They also
consider analogues of those ideas in other contexts such as the Gaussian integers and
the ring of polynomials over Z/pZ. Further information about the Ross Program is
posted at http://www.math.osu.edu/ross. The problems below are taken from
some of the Ross problem sets.

Proposed by Daniel Shapiro, The Ohio State University.
Let gcd(a, b) denote the greatest common divisor of integers a and b. The
sequence $2^n - 1$ enjoys a curious property:
\[ \gcd(2^m - 1, 2^n - 1) = 2^{\gcd(m,n)} - 1. \]
We give this property a name: a sequence $\{A_n\}_{n \geq 1}$ of positive integers has the gcd
property if $\gcd(A_m, A_n) = A_{\gcd(m,n)}$ for every pair of indices m, n.
Problem 1. Show that the following sequences have the gcd property:
(a) the constant sequence $C_n = r$, in which r ∈ N is fixed,
(b) the linear sequence $L_n = rn$, in which r ∈ N is fixed,
(c) for fixed c, k ∈ N, the sequence
\[ E(k, c)_n = \begin{cases} c & \text{if } n \text{ is a multiple of } k, \\ 1 & \text{otherwise,} \end{cases} \]
(d) for fixed relatively prime a, b ∈ N with a > b, the sequence $R_n = a^n - b^n$,
(e) the Fibonacci numbers, defined by $F_1 = F_2 = 1$ and $F_{n+2} = F_{n+1} + F_n$.
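Before attempting proofs, it can help to test the gcd property numerically. The sketch below checks each sequence from Problem 1 on a finite range of indices, which is evidence and not a proof. The helper names and the sample parameters (r = 5, r = 3, k = 4, c = 6, a = 3, b = 2) are arbitrary choices made for illustration, and a and b are taken to be relatively prime.

```python
from math import gcd

def has_gcd_property(seq, limit):
    """Check gcd(A_m, A_n) == A_gcd(m, n) for all 1 <= m, n <= limit."""
    return all(gcd(seq(m), seq(n)) == seq(gcd(m, n))
               for m in range(1, limit + 1)
               for n in range(1, limit + 1))

def fibonacci(n):
    """Return F_n, where F_1 = F_2 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

examples = {
    "2^n - 1": lambda n: 2 ** n - 1,
    "constant, r = 5": lambda n: 5,
    "linear, r = 3": lambda n: 3 * n,
    "E(4, 6)": lambda n: 6 if n % 4 == 0 else 1,
    "3^n - 2^n": lambda n: 3 ** n - 2 ** n,
    "Fibonacci": fibonacci,
}

for name, seq in examples.items():
    print(name, has_gcd_property(seq, 30))

# A sequence without the property, for contrast: A_n = n^2 + 1.
print("n^2 + 1", has_gcd_property(lambda n: n * n + 1, 30))
```

The final line prints False, since gcd(A_2, A_3) = gcd(5, 10) = 5 while A_1 = 2.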
Problem 2. For a sequence $\{b_n\}_{n \geq 1}$ of positive integers, define
\[ B_n = \prod_{d \mid n} b_d. \]
For example,
\[ B_2 = b_1 b_2, \quad B_4 = b_1 b_2 b_4, \quad \text{and} \quad B_6 = b_1 b_2 b_3 b_6. \]
If $\gcd(b_m, b_n) = 1$ whenever m ≠ n, show that $\{B_n\}$ has the gcd property.
(a) Which $\{b_n\}$ produce sequences $\{B_n\}$ with the gcd property?
(b) Does every $\{B_n\}$ with the gcd property arise from some (unique) integer sequence
$\{b_n\}$?
1957: Comments
Cyclotomic polynomials. The second problem is related to the factorization
\[ x^n - 1 = \prod_{d \mid n} \Phi_d(x), \]
in which $\Phi_d(x)$ denotes the dth cyclotomic polynomial. To be more specific, $\Phi_d(x)$
is the monic (leading coefficient 1) polynomial whose roots are the primitive dth
roots of unity. The primitive dth roots of unity are $\exp(2\pi i j/d)$, in which $i^2 = -1$,
j ∈ {1, 2, . . . , d}, and gcd(j, d) = 1; see Figure 1.
Figure 1. The sixth roots of unity are the vertices of an equilateral hexagon
inscribed in the unit circle in the complex plane. The primitive sixth roots of unity
are $e^{\pi i/3} = \frac{1}{2} + \frac{\sqrt{3}}{2} i$ and $e^{5\pi i/3} = \frac{1}{2} - \frac{\sqrt{3}}{2} i$, denoted by red dots above.
Although they are defined in terms of their roots, which are certain complex
roots of unity, cyclotomic polynomials have only integer coefficients. The first few
cyclotomic polynomials are
\begin{align*}
\Phi_1(x) &= x - 1, \\
\Phi_2(x) &= x + 1, \\
\Phi_3(x) &= x^2 + x + 1, \\
\Phi_4(x) &= x^2 + 1, \\
\Phi_5(x) &= x^4 + x^3 + x^2 + x + 1, \\
\Phi_6(x) &= x^2 - x + 1, \\
\Phi_7(x) &= x^6 + x^5 + x^4 + x^3 + x^2 + x + 1, \\
\Phi_8(x) &= x^4 + 1, \\
\Phi_9(x) &= x^6 + x^3 + 1, \\
\Phi_{10}(x) &= x^4 - x^3 + x^2 - x + 1.
\end{align*}
Taking $b_d = \Phi_d(2)$ in Problem 2 gives $B_n = 2^n - 1$, a sequence of integers with the gcd property.

Before proceeding to another gcd-related gem (see the 1951 and 1977 entries
for more applications of the gcd), we first want to dispel a natural conjecture about
cyclotomic polynomials. A glance at the first hundred or so cyclotomic polynomials
suggests that their coefficients are always −1, 0, or 1. This is false, since
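\begin{align*}
\Phi_{105}(x) = {} & x^{48} + x^{47} + x^{46} - x^{43} - x^{42} - 2x^{41} - x^{40} - x^{39} + x^{36} + x^{35} + x^{34} \\
& + x^{33} + x^{32} + x^{31} - x^{28} - x^{26} - x^{24} - x^{22} - x^{20} + x^{17} + x^{16} + x^{15} \\
& + x^{14} + x^{13} + x^{12} - x^{9} - x^{8} - 2x^{7} - x^{6} - x^{5} + x^{2} + x + 1.
\end{align*}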
When one realizes that 105 = 3 · 5 · 7 is the smallest number that is the product of
three distinct odd primes, it becomes slightly more reasonable to expect that the
first counterexample might take so long to materialize.
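The factorization of $x^n - 1$ into cyclotomic polynomials also gives a way to compute them: divide $x^n - 1$ by $\Phi_d(x)$ for each proper divisor d of n. The following sketch does this with exact integer arithmetic. The representation (coefficient tuples, constant term first) and the function names are choices made here for illustration.

```python
from functools import lru_cache
from math import prod

def divide_exact(numerator, monic_divisor):
    """Exact quotient of integer polynomials given as coefficient sequences,
    constant term first. The divisor must be monic and divide the numerator."""
    remainder = list(numerator)
    quotient = [0] * (len(remainder) - len(monic_divisor) + 1)
    for shift in range(len(quotient) - 1, -1, -1):
        coeff = remainder[shift + len(monic_divisor) - 1]
        quotient[shift] = coeff
        for i, d in enumerate(monic_divisor):
            remainder[shift + i] -= coeff * d
    assert not any(remainder), "divisor does not divide numerator"
    return quotient

@lru_cache(maxsize=None)
def cyclotomic(n):
    """Coefficients of Phi_n: divide x^n - 1 by Phi_d for each proper divisor d."""
    poly = [-1] + [0] * (n - 1) + [1]  # x^n - 1
    for d in range(1, n):
        if n % d == 0:
            poly = divide_exact(poly, cyclotomic(d))
    return tuple(poly)

def evaluate(poly, x):
    """Evaluate a coefficient sequence (constant term first) at x."""
    return sum(c * x ** i for i, c in enumerate(poly))

# Every coefficient of Phi_n lies in {-1, 0, 1} for n < 105 ...
small = all(set(cyclotomic(n)) <= {-1, 0, 1} for n in range(1, 105))
# ... but Phi_105 contains the coefficient -2 (at x^7 and x^41).
print(small, min(cyclotomic(105)), cyclotomic(105)[7], cyclotomic(105)[41])

# The values Phi_d(2) multiply to 2^n - 1; here n = 12.
print(prod(evaluate(cyclotomic(d), 2) for d in range(1, 13) if 12 % d == 0))
```

The last line prints 4095, which is $2^{12} - 1$.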
Invisible forests. Imagine that there is a slender tree planted at each lattice
point (x, y) ∈ Z2 and pretend that you are at the origin (0, 0). How many lattice
points can you “see” from the origin? Which ones are blocked by trees? Are there
arbitrarily large portions of the forest that are not visible from the origin? See [2]
for interesting generalizations of this problem.
If $\gcd(x, y) = g \neq 1$, then $x = gx'$ and $y = gy'$ for some $(x', y') \in \mathbf{Z}^2$, so that
the tree planted at $(x', y')$ “blocks” our view of $(x, y) = g(x', y')$. In general, a
lattice point (x, y) is visible from the origin if and only if gcd(x, y) = 1; see Figures
2 and 3. Based upon this, one can show that the proportion of lattice points visible
from the origin is $6/\pi^2 \approx 60.8\%$; see the notes to the 1939 entry.
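The visibility criterion gcd(x, y) = 1 makes the proportion easy to estimate by brute force. The sketch below counts visible points in a finite box around the origin. The function name and the box size 200 are arbitrary, and the finite count only approximates the limiting value.

```python
from math import gcd, pi

def visible_fraction(radius):
    """Fraction of lattice points (x, y) != (0, 0) with |x|, |y| <= radius
    that are visible from the origin, that is, with gcd(x, y) == 1."""
    visible = total = 0
    for x in range(-radius, radius + 1):
        for y in range(-radius, radius + 1):
            if x == 0 and y == 0:
                continue  # the observer stands here
            total += 1
            if gcd(x, y) == 1:
                visible += 1
    return visible / total

print(visible_fraction(200), 6 / pi ** 2)
```

The two printed numbers agree to about two decimal places.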
The following result is from [1, Thm. 5.29]: the set of lattice points visible from
the origin contains arbitrarily large square gaps. That is, given any positive integer
n, there exists a lattice point (a, b) such that none of the lattice points (a + j, b + k)
with 0 < j, k ≤ n is visible from the origin.
Figure 2. Visible lattice points in the region [−10, 10] × [−10, 10].
Figure 3. Visible lattice points in the region [−50, 50] × [−50, 50].
The proof is an elegant use of prime numbers and the Chinese remainder theorem.
Given n > 0, form the n × n matrix
\[ \begin{bmatrix} 2 & 3 & \cdots & p_n \\ p_{n+1} & p_{n+2} & \cdots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{(n-1)n+1} & p_{n(n-1)+2} & \cdots & p_{n^2} \end{bmatrix} \]
whose first row consists of the first n primes, whose second row consists of the next
n primes, and so on. Let $r_j$ be the product of the primes in the jth row and let
$c_j$ denote the product of the primes in the jth column. Since none of the primes
$p_1, p_2, \ldots, p_{n^2}$ can lie in two rows or two columns simultaneously, it follows that
\[ \gcd(r_j, r_k) = \gcd(c_j, c_k) = 1 \]
whenever j ≠ k. The Chinese remainder theorem asserts that the system of congruences
\[ x \equiv -1 \pmod{r_1}, \quad x \equiv -2 \pmod{r_2}, \quad \ldots, \quad x \equiv -n \pmod{r_n} \]
has a unique solution a modulo $r_1 r_2 \cdots r_n$. Similarly, the system
\[ y \equiv -1 \pmod{c_1}, \quad y \equiv -2 \pmod{c_2}, \quad \ldots, \quad y \equiv -n \pmod{c_n} \]
has a solution b that is unique modulo $c_1 c_2 \cdots c_n$. Observe now that
\[ r_1 r_2 \cdots r_n = c_1 c_2 \cdots c_n = 2 \cdot 3 \cdot 5 \cdots p_{n^2}. \]
Consider the square with corners at (a, b) and (a + n, b + n). Any lattice point
inside of this square can be written in the form (a + j, b + k), in which 0 < j, k < n
(the points with either j = n or k = n lie on the boundary of the square). Since
\[ a \equiv -j \pmod{r_j} \quad \text{and} \quad b \equiv -k \pmod{c_k} \]
by the definition of a and b, it follows that $r_j \mid (a + j)$ and $c_k \mid (b + k)$. Thus, the
prime number at the intersection of row j and column k divides a + j and b + k.
Consequently, gcd(a + j, b + k) ≠ 1 and hence (a + j, b + k) is not visible from the
origin if 0 < j, k ≤ n. This proves that there exists a square of $n^2$ lattice points
that are not visible from the origin.
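The construction in the proof can be carried out directly for small n. The following is a sketch under the assumption of Python 3.8 or later (for `math.prod` and the modular inverse `pow(m, -1, n)`); the helper names `first_primes`, `crt`, and `hidden_square` are illustrative.

```python
from math import gcd, prod

def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def crt(residues, moduli):
    """Solve x = residues[i] (mod moduli[i]) for pairwise coprime moduli."""
    x, modulus = 0, 1
    for r, m in zip(residues, moduli):
        # Shift x by a multiple of the current modulus so that x = r (mod m).
        t = ((r - x) * pow(modulus, -1, m)) % m
        x += modulus * t
        modulus *= m
    return x % modulus

def hidden_square(n):
    """Return (a, b) such that no point (a + j, b + k), 1 <= j, k <= n, is visible."""
    primes = first_primes(n * n)
    rows = [primes[i * n:(i + 1) * n] for i in range(n)]
    row_products = [prod(row) for row in rows]
    col_products = [prod(rows[i][j] for i in range(n)) for j in range(n)]
    a = crt([-j for j in range(1, n + 1)], row_products)
    b = crt([-k for k in range(1, n + 1)], col_products)
    return a, b

a, b = hidden_square(3)
print(a, b)
print(all(gcd(a + j, b + k) > 1 for j in range(1, 4) for k in range(1, 4)))
```

The coordinates grow quickly with n, since a and b are determined only modulo the product of the first n² primes.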
Bibliography
[2] E. H. Goins, P. E. Harris, B. Kubik, and A. Mbirika, Lattice point visibility on
generalized lines of sight, Amer. Math. Monthly 125 (2018), no. 7, 593–601, DOI
10.1080/00029890.2018.1465760. MR3836421
[3] D. Goss (editor), Arnold Ross Memorial Issue, Journal of Number Theory 110 (2005), no. 1.
In particular, see Arnold Ephraim Ross (1906–2002), pp. 1–2.
http://www.sciencedirect.com/science/journal/0022314X/110/1.
[4] M. Dziemiańczuk and W. Bajguz, On GCD-morphic sequences, 2008,
http://arxiv.org/abs/0802.1303.
1958
Smale’s Paradox
Introduction
There are many remarkable results in topology that are counterintuitive. One
of the most famous is the subject of our 1924 entry, the Banach–Tarski paradox.
It asserts that the three-dimensional unit ball can be partitioned into finitely many
disjoint subsets that can be rearranged using rigid motions to form two identical
unit balls. This appears to violate our notion of volume.
Smale’s paradox is another strange result about supposedly familiar objects.
Imagine a sphere composed of a material that can pass through itself. With-
out puncturing or creasing the material, is it possible to turn the sphere inside
out? Stephen Smale (1930– ) shocked the mathematical world in 1958 when he
proved that sphere eversion is possible [2]. However, his proof was difficult to
distill into an explicit regular homotopy. It was through the work of many oth-
ers, including Arnold Shapiro (1921–1962) and Bernard Morin (1931– ), that the
first concrete geometric representation of a sphere eversion emerged. In particu-
lar, William Thurston (1946–2012) discovered a clever explicit construction, now
known as Thurston’s corrugations. Using the methods of Thurston’s corrugations,
the sphere is corrugated and then the top and the bottom of the sphere are pulled
through each other without creasing due to the geometry of the corrugations which
permits the “turning.” An excellent introduction to the topic, including an anima-
tion of the eversion, is available online at [3].
Smale’s paradox belongs in the “a video is worth a thousand words” category,
so we make no attempt to provide the details here. However, we can introduce
some other interesting topological ideas here that are not so exotic.
The Möbius strip is perhaps most students’ first brush with topology. It is a
peculiar surface that is obtained by gluing two opposite ends of a flexible, rect-
angular strip with a half twist; see Figure 1. The Möbius strip is an example of
a two-dimensional manifold with boundary. This means that a tiny observer who
lives on the surface of the Möbius strip, but not on the boundary curve, could be
forgiven for thinking that she lives in R2 . The observer would not be able to deduce,
based upon purely local observations, that the universe was curved in some way.
Nor would she be able to deduce that the Möbius strip is nonorientable: there are
no “front” and “back” sides to the Möbius strip. It has only one side. A torus can
be described in a similar manner; see Figure 2. Unlike the Möbius strip, a torus is
orientable: it has an inside and an outside (as Homer Simpson could tell you).
Figure 1. Construction of a Möbius strip from a square of flexible material.
(left) A square of flexible material that has its left- and right-hand edges identified
with opposite orientations. (right) Aligning the edges of the square so that the
arrows agree results in a Möbius strip.
What is a practical application of the Möbius strip? Large conveyor belts are often
fashioned into Möbius strips to ensure that the entire surface wears evenly. The
Möbius strip was independently discovered by August Ferdinand Möbius (1790–1868)
and Johann Benedict Listing (1808–1882) in 1858. Unfortunately for Listing,
the surface eventually became known as the Möbius strip despite the fact that
Möbius’s discovery of the surface became known only after his death. However,
Listing was the first to use the term topology. According to Listing:
By topology we mean the doctrine of the modal features of objects,
or of the laws of connection, of relative position and of succession of
points, lines, surfaces, bodies and their parts, or aggregates in space,
always without regard to matters of measure or quantity.
Although he used the term as early as 1836, Listing’s 1847 book Vorstudien zur
Topologie firmly cemented the word in German mathematics. English speakers
continued to use the term analysis situs until the late 1920s when “topology” came
into popular use [1].

Proposed by Avery T. Carr, Emporia State University, and James M.
Andrews, University of Memphis.
Smale’s unexpected result raises the question of whether other shapes can be
everted. Consider a circle governed by the same rules as the sphere from Smale’s
paradox. The circle is composed of a material that can pass through itself but
cannot be punctured or creased. Is it possible to evert the circle? What about a
torus? More generally, what about a hypersphere in n dimensions? Hint: Look up
the Whitney–Graustein theorem.
Figure 2. Construction of a torus from a square of flexible material. (left) A square
of flexible material with opposite edges identified with the same orientation.
(middle) Aligning the left and right edges so that the arrows agree results in a
hollow cylinder. (right) Aligning the top and bottom of the cylinder so that the
arrows agree results in a torus.
1958: Comments
Möbius trip. As Figure 1 suggests, we can describe a Möbius strip as a
parametrized surface. The parametrization
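\begin{align*}
x(u, v) &= \left(1 + \frac{v}{2} \cos\frac{u}{2}\right) \cos u, \\
y(u, v) &= \left(1 + \frac{v}{2} \cos\frac{u}{2}\right) \sin u, \\
z(u, v) &= \frac{v}{2} \sin\frac{u}{2},
\end{align*}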
maps the rectangle [0, 2π] × [−1, 1] in uv-space onto a Möbius strip in xyz-space
with width 1 and whose central circle has radius 1.
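The half twist can be seen directly from the formulas: the edge u = 0 of the rectangle is glued to the edge u = 2π with v replaced by −v. The short sketch below checks this numerically, along with the stated width. The function names are illustrative, and `math.dist` assumes Python 3.8 or later.

```python
import math

def mobius(u, v):
    """Point on the strip for u in [0, 2*pi] and v in [-1, 1]."""
    radius = 1 + (v / 2) * math.cos(u / 2)
    return (radius * math.cos(u),
            radius * math.sin(u),
            (v / 2) * math.sin(u / 2))

def seam_gap(v):
    """Distance between the image of (0, v) and the image of (2*pi, -v)."""
    return math.dist(mobius(0.0, v), mobius(2 * math.pi, -v))

def width(u):
    """Length of the cross-section of the strip at parameter u."""
    return math.dist(mobius(u, -1.0), mobius(u, 1.0))

print(max(seam_gap(v) for v in (-1.0, -0.5, 0.0, 0.25, 1.0)))  # essentially zero
print([round(width(u), 12) for u in (0.0, 1.0, 2.0, 3.0)])     # all equal to 1
```

Setting v = 0 gives (cos u, sin u, 0), the central circle of radius 1.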
The boundary of a Möbius strip is, topologically speaking, a circle. In the
formation of the Möbius strip from our square of flexible material (Figure 1), the
upper and lower edges are joined into one continuous curve. Indeed, trace the edge
of a Möbius strip and you will find that it is a single curve; see Figure 3.
Imagine that we live in a high-dimensional space, or that we had a flexible
material that could pass through itself. What would happen if we took a disk and
glued its edge to the boundary of a Möbius strip? We cannot accomplish this in
$\mathbf{R}^3$ without self-intersections, but we could accomplish this in $\mathbf{R}^5$, which gives us
enough wiggle room. See the notes for the 2003 entry for more information.
Figure 3. The boundary of a Möbius strip is a single continuous curve that can be
continuously deformed into a circle.
Figure 4. Construction of a Klein bottle from a square of flexible material. (left) A
square of flexible material so that the left and right edges have the same orientation
and so that the top and bottom edges have opposite orientations. (middle) Aligning
the left- and right-hand edges so that the arrows agree results in a hollow cylinder.
(right) The result is the Klein bottle, a nonorientable surface. It has no inside and
no outside.
The Klein bottle. Another popular topological item is the Klein bottle; see
Figure 4. This peculiar bottle was first described in 1882 by Felix Klein (1849–1925).
Like the Möbius strip, it is nonorientable. You should definitely not use
it to store liquids since it has no inside or outside! You can make a Klein bottle
by gluing together two Möbius strips along their boundaries. Unfortunately, the
resulting object cannot be realized in $\mathbf{R}^3$ without self-intersections, although $\mathbf{R}^4$
will do nicely. For example,
\begin{align*}
x(u, v) &= (3 + \cos u) \cos v, \\
y(u, v) &= (3 + \cos u) \sin v, \\
z(u, v) &= \sin u \cos(v/2), \\
w(u, v) &= \sin u \sin(v/2),
\end{align*}
maps the square [0, 2π] × [0, 2π] in uv-space onto a Klein bottle in xyzw-space
without self-intersection.
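The edge identifications of the square can be checked numerically from these formulas. In the sketch below, the edges u = 0 and u = 2π are glued with the same orientation, while the edges v = 0 and v = 2π are glued with u reversed. The grid test at the end is only a spot check of injectivity on sample points, not a proof, and all names are illustrative.

```python
import math

TWO_PI = 2 * math.pi

def klein(u, v):
    """Point in xyzw-space for (u, v) in the square [0, 2*pi] x [0, 2*pi]."""
    radius = 3 + math.cos(u)
    return (radius * math.cos(v),
            radius * math.sin(v),
            math.sin(u) * math.cos(v / 2),
            math.sin(u) * math.sin(v / 2))

samples = [0.0, 0.4, 1.1, 2.5, 3.9, 5.7]

# Same orientation: (0, v) and (2*pi, v) give the same point.
same = max(math.dist(klein(0.0, v), klein(TWO_PI, v)) for v in samples)
# Opposite orientation: (u, 0) and (2*pi - u, 2*pi) give the same point.
flipped = max(math.dist(klein(u, 0.0), klein(TWO_PI - u, TWO_PI)) for u in samples)
print(same, flipped)  # both essentially zero

# Spot check: distinct grid points of the half-open square give distinct images.
steps = 24
images = {tuple(round(c, 9) for c in klein(TWO_PI * i / steps, TWO_PI * j / steps))
          for i in range(steps) for j in range(steps)}
print(len(images) == steps * steps)
```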
Bibliography
[1] J. J. O’Connor and E. F. Robertson, Johann Benedict Listing, MacTutor History of Mathe-
matics, http://www-history.mcs.st-and.ac.uk/Biographies/Listing.html.
[2] S. Smale, A classification of immersions of the two-sphere, Trans. Amer. Math. Soc. 90
(1958), 281–290, DOI 10.2307/1993205. http://www.maths.ed.ac.uk/~aar/papers/smale5.pdf.
MR0104227
[3] YouTube, Outside In, http://www.youtube.com/watch?v=wO61D9x6lNY.
1959
QR Decomposition
Introduction
The QR decomposition is a phenomenally useful matrix factorization that was
independently discovered by John G. F. Francis (1934– ) in 1959 [3, 4] and Vera
Kublanovskaya (1920–2012) in 1961 [9]; see [7] for a detailed history. Suppose that
$A \in M_{m \times n}(\mathbf{R})$ and m ≥ n; that is, suppose that A has at most as many columns
as rows. Then we may factor A = QR, in which $Q \in M_{m \times n}(\mathbf{R})$ has orthonormal
columns and $R \in M_n(\mathbf{R})$ is upper triangular and has nonnegative diagonal entries.
The QR algorithm is an iterative algorithm, based upon repeated QR decompo-
sitions, that is used to quickly and accurately compute eigenvalues. The standard
approach, taught in most introductory linear algebra courses, is to compute the
characteristic polynomial
\[ p_A(z) = \det(zI - A) \]
of $A \in M_n$ and then find its roots. Due to its reliance on determinants, this method
is terribly inefficient for large matrices. Moreover, there is no general formula in radicals
for the roots of a polynomial of degree five or more. According to
mathematician-writer-journalist Barry Cipra (1952– ) [1]:
Eigenvalues are arguably the most important numbers associated with
matrices—and they can be the trickiest to compute. It’s relatively
easy to transform a square matrix into a matrix that’s “almost” up-
per triangular, meaning one with a single extra set of nonzero entries
just below the main diagonal.¹ But chipping away those final nonzeros,
without launching an avalanche of error, is nontrivial. The QR
algorithm is just the ticket. Based on the QR decomposition, which
writes A as the product of an orthogonal matrix Q and an upper triangular
matrix R, this approach iteratively changes $A_i = QR$ into
$A_{i+1} = RQ$, with a few bells and whistles for accelerating convergence
to upper triangular form. By the mid-1960s, the QR algorithm had
turned once-formidable eigenvalue problems into routine calculations.
The QR algorithm has rightly been hailed as one of the ten most important algo-
rithms of the 20th century [1, 2]; see also the 1965 entry.
How does one compute a QR decomposition? We outline here the method of
Householder reflections, named after Alston Scott Householder (1904–1993). See
[5] for more details and corresponding results about complex matrices; see [6] for
explicit algorithms and numerical considerations. Let $w \in \mathbf{R}^n$ be nonzero. The n × n
Householder matrix
\[ U_w = I - \frac{2 w w^{*}}{\|w\|^2} \tag{1959.1} \]
reflects vectors in $\mathbf{R}^n$ across the (n − 1)-dimensional hyperplane that is orthogonal
to w; see Figure 1. Since $U_w$ preserves the norm of each vector that it acts on,
it is an orthogonal matrix. Moreover, $U_w^{-1} = U_w$ since a reflection is self-inverse.
Consequently, Householder matrices are well suited for numerical computation:
they are simple to define (1959.1), numerically stable, and easy to invert.
Figure 1. Action of a 3 × 3 Householder matrix $U_w$ on $\mathbf{R}^3$.
¹ Such a matrix is called an upper Hessenberg matrix. It is possible to bring a square matrix
into upper Hessenberg form through the use of Householder transformations; see [5].
Let $A = [a_1\ a_2\ \ldots\ a_n] \in M_{m \times n}(\mathbf{R})$, and suppose for the sake of simplicity
that none of the columns of A are zero. Find an orthogonal matrix² $U_1 \in M_m(\mathbf{R})$
so that $U_1 a_1$ equals $\|a_1\|$ times the first standard basis vector in $\mathbf{R}^m$. Then
\[ U_1 A = \begin{bmatrix} \|a_1\| & \star \\ 0 & A' \end{bmatrix}, \qquad A' \in M_{(m-1) \times (n-1)}(\mathbf{R}), \tag{1959.2} \]
in which $\star$ denotes entries that are of no interest to us. The same principle applies
to the smaller matrix $A'$ now. Iterating this procedure n times, one obtains a
sequence of orthogonal matrices $U_1, U_2, \ldots, U_n \in M_m(\mathbf{R})$ so that
\[ \underbrace{U_n \cdots U_2 U_1}_{U} A = \begin{bmatrix} R \\ 0_{(m-n) \times n} \end{bmatrix}, \]
in which $U \in M_m(\mathbf{R})$ is orthogonal and $R \in M_n(\mathbf{R})$ is upper-triangular and has
nonnegative diagonal entries. Let $V = U^{T}$ and partition $V = [Q\ Q']$, in which
$Q \in M_{m \times n}(\mathbf{R})$. Since V is an orthogonal matrix, Q has orthonormal columns and
\[ A = V \begin{bmatrix} R \\ 0 \end{bmatrix} = [Q\ Q'] \begin{bmatrix} R \\ 0 \end{bmatrix} = QR + Q'0 = QR. \]
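The procedure just outlined can be sketched in a few lines of plain Python. This is an illustrative implementation under stated assumptions, not the algorithm of any particular library: matrices are lists of rows, the reflections are taken in the form σU_w with w = y − σx and y equal to ‖x‖ times the first standard basis vector (so that the diagonal of R is nonnegative), and the name `householder_qr` is hypothetical. See [6] for production-quality algorithms.

```python
import math

def householder_qr(A):
    """Return (Q, R) with A = QR for an m x n matrix A (a list of rows, m >= n).

    Q is m x n with orthonormal columns; R is n x n upper triangular
    with nonnegative diagonal entries.
    """
    m, n = len(A), len(A[0])
    T = [[float(entry) for entry in row] for row in A]  # becomes U_n ... U_1 A
    V = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k in range(n):
        # x is the part of column k on and below the diagonal.
        x = [T[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # nothing to eliminate in this column
        sigma = 1.0 if x[0] <= 0 else -1.0
        w = [-sigma * t for t in x]
        w[0] += norm_x  # w = y - sigma * x, where y = ||x|| e_1
        ww = sum(t * t for t in w)

        def reflect(vec):
            """Apply sigma * U_w to a vector of length m - k."""
            scale = 2.0 * sum(a * b for a, b in zip(w, vec)) / ww
            return [sigma * (a - scale * b) for a, b in zip(vec, w)]

        # Left-multiply the trailing block of T by sigma * U_w.
        for j in range(k, n):
            column = reflect([T[i][j] for i in range(k, m)])
            for i in range(k, m):
                T[i][j] = column[i - k]
        # Right-multiply V by the same (symmetric) reflection, so V = U_1 ... U_k.
        for i in range(m):
            V[i][k:] = reflect(V[i][k:])
    Q = [row[:n] for row in V]
    R = [[T[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return Q, R

Q, R = householder_qr([[1, 2], [3, 4], [5, 6]])
print(R)
```

The choice of σ follows the sign rule that avoids subtracting nearly equal numbers when w is formed, which is the numerical point of the generalization mentioned in the footnote.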
² One can always find a Householder matrix that takes a given vector to another given vector
with the same norm. To improve numerical stability, it is useful to consider a slight generalization.
Suppose that $x, y \in \mathbf{R}^n$ and $\|x\| = \|y\| \neq 0$. Let σ = 1 if x · y ≤ 0 and σ = −1 if x · y > 0. Then
$\sigma U_{y - \sigma x} \in M_n(\mathbf{R})$ is a real orthogonal matrix that maps x to y; see [5].

Proposed by Stephan R. Garcia, Pomona College.
Let $A = [a_1\ a_2\ \ldots\ a_n] \in M_n(\mathbf{R})$, in which $a_1, a_2, \ldots, a_n \in \mathbf{R}^n$. Use the QR
decomposition to prove Hadamard’s inequality
\[ |\det A| \leq \|a_1\| \, \|a_2\| \cdots \|a_n\|. \tag{1959.3} \]
1959: Comments
Gram–Schmidt in the real world. It is customary in elementary linear
algebra courses to teach students how to orthogonalize a list of vectors with the
Gram–Schmidt process. While there is some merit to this (for instance, the Gram–
Schmidt process can be used to provide an easy proof that every finite-dimensional
inner product space has an orthonormal basis), students should be warned that the
Gram–Schmidt process is numerically unstable and hence unreliable in practice.
The QR decomposition, because of its reliance on orthogonal matrices, is stable
and yields much better results.
If A = [a1 a2 . . . an ] ∈ Mm×n (R) has linearly independent columns (this im-
plies that m ≥ n), then the columns of the matrix Q = [q1 q2 . . . qn ] ∈ Mm×n (R)
from the QR decomposition are orthonormal and they have the property that
span{a1 , a2 , . . . , ar } = span{q1 , q2 , . . . , qr }
for r = 1, 2, . . . , n.
Hadamard matrices. Jacques Hadamard first proved his eponymous inequal-
ity in 1893 [8]; it is related to a fascinating open problem in combinatorics. If each
entry of A ∈ Mn (R) is −1 or 1, then Hadamard’s inequality (1959.3) tells us that
\[ |\det A| \le n^{n/2}. \]
A matrix for which equality holds is a Hadamard matrix of order n. It is possible to
show that the order of a Hadamard matrix must be 1, 2, or a multiple of 4. Some
Hadamard matrices of small order are
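\[
[1], \qquad
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad
\begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1 \\
1 & -1 & 1 & -1
\end{bmatrix},
\]
and
\[
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -1 & -1 & 1 & -1 & 1 & 1 & -1 \\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
1 & -1 & 1 & -1 & 1 & -1 & 1 & -1
\end{bmatrix}.
\]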
As these examples suggest, a Hadamard matrix must be a multiple of an orthogonal
matrix; that is, it must have orthogonal rows and orthogonal columns. The famed
Hadamard conjecture asserts that a Hadamard matrix of order 4k exists for every
positive integer k; the smallest permissible order for which no Hadamard matrix is
presently known is 668.
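The defining properties of a Hadamard matrix are easy to test by machine. The following sketch (the function names `is_hadamard` and `det` are our own, not from any library) checks that a matrix has entries ±1 and mutually orthogonal rows, and computes the determinant exactly over the rationals so that the equality case of Hadamard's inequality can be confirmed for the order-4 example above.

```python
from fractions import Fraction

def is_hadamard(H):
    """True if H is square, has entries +1 or -1, and has mutually orthogonal rows."""
    n = len(H)
    if any(len(row) != n or any(x not in (-1, 1) for x in row) for row in H):
        return False
    return all(
        sum(H[i][k] * H[j][k] for k in range(n)) == (n if i == j else 0)
        for i in range(n) for j in range(n)
    )

def det(M):
    """Exact determinant by Gaussian elimination with rational arithmetic."""
    A = [[Fraction(x) for x in row] for row in M]
    n, d = len(A), Fraction(1)
    for c in range(n):
        # Find a row with a nonzero pivot in column c.
        p = next((r for r in range(c, n) if A[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d                      # a row swap flips the sign
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d

H4 = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]]
```

Here `is_hadamard(H4)` holds and `abs(det(H4))` equals 4^{4/2} = 16, whereas a ±1 matrix whose rows are not orthogonal falls strictly below the bound.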
A determinantal inequality. We conclude with a beautiful determinantal
inequality for positive semidefinite matrices. Recall that A ∈ Mn (R) is positive
semidefinite if A is symmetric and its eigenvalues are nonnegative (the eigenvalues
of a real symmetric matrix are always real). This is equivalent to
\[ A = B^{T} B \tag{1959.4} \]
for some B = [b1 b2 . . . bn ] ∈ Mm×n (R). This decomposition highlights one of
the main applications of positive semidefinite matrices. Since
\[ a_{ij} = b_i^{T} b_j = b_i \cdot b_j \]
for 1 ≤ i, j ≤ n in (1959.4), the entries of A measure the correlations between
the vectors b1 , b2 , . . . , bn ∈ Rm . In this context, positive semidefinite matrices
frequently arise in statistics. Taking B to be square (m = n), which is always
possible, Hadamard’s inequality yields
\[
|\det A| = |\det(B^{T} B)| = |\det(B^{T}) \det B| = |\det B|^2
\le \|b_1\|^2 \|b_2\|^2 \cdots \|b_n\|^2
= a_{11} a_{22} \cdots a_{nn}
\]
since $a_{ii} = b_i \cdot b_i = \|b_i\|^2 \ge 0$ for i = 1, 2, . . . , n. Thus, the absolute value of the
determinant of a positive semidefinite matrix is bounded above by the product of
its diagonal entries. If A is merely symmetric, but not positive semidefinite, then
the preceding inequality fails. Consider
\[ A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \]
for which | det A| = 1 and a11 = a22 = 0. For more information about positive
semidefinite matrices and their properties, see [5].
Bibliography
[1] B. A. Cipra, The best of the 20th century: editors name top 10 algorithms, SIAM News
33 (2000).
[2] J. Dongarra and F. Sullivan, The top 10 algorithms, Comput. Sci. Eng. 2 (2000), 22–23.
[3] J. G. F. Francis, The QR transformation: a unitary analogue to the LR transformation. I,
Comput. J. 4 (1961/1962), 265–271, DOI 10.1093/comjnl/4.3.265. MR0130111
[4] J. G. F. Francis, The QR transformation. II, Comput. J. 4 (1961/1962), 332–345, DOI
10.1093/comjnl/4.4.332. MR0137289
[6] G. H. Golub and C. F. Van Loan, Matrix computations, 4th ed., Johns Hopkins Studies in the
Mathematical Sciences, Johns Hopkins University Press, Baltimore, MD, 2013. MR3024913
[7] G. Golub and F. Uhlig, The QR algorithm: 50 years later its genesis by John Francis and
Vera Kublanovskaya and subsequent developments, IMA J. Numer. Anal. 29 (2009), no. 3,
467–485, DOI 10.1093/imanum/drp012. MR2520155
[8] J. Hadamard, Résolution d’une question relative aux déterminants, Bulletin des Sciences
Mathématiques 17 (1893), 240–246.
[9] V.N. Kublanovskaya, On some algorithms for the solution of the complete eigenvalue problem,
USSR Computational Mathematics and Mathematical Physics 1 (1963), no. 3, 637–657
1960
The Unreasonable Effectiveness of Mathematics
Introduction
This year honors a groundbreaking, influential article by Eugene Wigner [12],
the Nobel laureate in physics whose work in random matrix theory eventually led
to astonishing connections between the seemingly diverse fields of number theory
and nuclear physics; see the 1928 entry. In his article, Wigner discusses the use of
mathematics in physics:1
A possible explanation of the physicist’s use of mathematics to formu-
late his laws of nature is that he is a somewhat irresponsible person.
As a result, when he finds a connection between two quantities which
resembles a connection well-known from mathematics, he will jump at
the conclusion that the connection is that discussed in mathematics
simply because he does not know of any other similar connection. It is
not the intention of the present discussion to refute the charge that the
physicist is a somewhat irresponsible person. Perhaps he is. However,
it is important to point out that the mathematical formulation of the
physicist’s often crude experience leads in an uncanny number of cases
to an amazingly accurate description of a large class of phenomena.
This shows that the mathematical language has more to commend it
than being the only language which we can speak; it shows that it is,
in a very real sense, the correct language.
Mathematics is so ubiquitous in physics that the American Journal of Physics
asked, “Does any piece of mathematics exist for which there is no application what-
soever in physics? ” To this, physicist Dwight E. Neuenschwander (1952– ) re-
sponded:
While constructing such a “useless” piece of mathematics would be the
delight of a mathematical purist, it seems we physicists have always
managed to foil this lofty goal. It seems that even the most esoteric
mathematical inventions of the human mind are eventually used to
model physical systems. Why that should be true is of course a deep
and fascinating question. [9]
The catchphrase “unreasonable effectiveness” has spawned innumerable imitators
and it is difficult to catalogue them all. Some of the most influential were discussed
by economist K. Vela Velupillai (1947– ) [11]:
Eugene Wigner’s Richard Courant Lecture in the Mathematical Sci-
ences, delivered at New York University on 11 May 1959, was titled,
1 The repeated use of “his” and “he” to refer to a generic physicist is regrettable.
picturesquely and, perhaps, with intentional impishness The Unreasonable
Effectiveness of Mathematics in the Natural Sciences [12]. Twenty
years later, another distinguished scientist, Richard W. Hamming, gave
an invited lecture to the Northern California Section of the Mathe-
matical Association of America with the slightly truncated title The
Unreasonable Effectiveness of Mathematics [5]. A decade or so later,
Stefan Burr tried a different variant of Wigner’s title by organising a
short course on The Unreasonable Effectiveness of Number Theory [2].
Another decade elapsed before Arthur Lesk, a distinguished molecular
biologist at Cambridge, gave a lecture at the Isaac Newton Institute
for Mathematical Sciences at Cambridge University where yet another
twist to the Wigner theme was added: The Unreasonable Effectiveness
of Mathematics in Molecular Biology [8].2
The words “unreasonable” and “effectiveness” are often slightly modified to fit the
author’s point. For example, there is The Reasonable Ineffectiveness of Research
in Mathematics Education [7]. In The Reasonable Effectiveness of Mathematics in
Economics [3], Frank J. Fabozzi (1948– ) and Sergio M. Focardi tell us:
In a nutshell, we believe that the reason that mathematics is only
reasonably effective in economics is because we apply mathematics to
study large engineered artefacts (i.e., economies or financial markets),
that have been designed to allow a lot of freedom so as to encour-
age change and innovation. The level of unpredictability and control
is clearly different when considering systems governed by immutable
natural laws as opposed to artefacts constructed by humans. Some
systems, such as economies or financial markets, are prone to crises.
Mathematics does a reasonably good job in describing these systems.
But the mathematics involved is not that of physics: It is the mathe-
matics of learning and complexity.
Mathematics is often called the language of the universe. However, some dis-
pute how far this universe extends beyond physics and astronomy and how much is
actually needed to describe the world and make significant contributions; see the ar-
ticle [13] by biologist Edward Osborne Wilson (1929– ). Wigner’s article influenced
even those who profoundly disagree with him. For example, Israel Gelfand, who
worked both in pure mathematics (see the 1941 entry) and mathematical biology,
said:
Eugene Wigner wrote a famous essay on the unreasonable effectiveness
of mathematics in natural sciences. He meant physics, of course. There
is only one thing which is more unreasonable than the unreasonable
effectiveness of mathematics in physics, and this is the unreasonable
ineffectiveness of mathematics in biology.
The engineer Derek Abbott (1960– ) wrote the influential rebuttal The Reasonable
Ineffectiveness of Mathematics [1], in which he writes:
Science is a modern form of alchemy that produces wealth by pro-
ducing the understanding for enabling valuable products from base
ingredients. Science is merely functional alchemy that has had a few
incorrect assumptions fixed, but has in its arrogance replaced them
2 The punctuation and citation style has been slightly modified.

with more insidious ones. The real world of nature has the uncanny
habit of surprising us; it has always proven to be a lot stranger than
we give it credit for. Mathematics is a product of the imagination that
sometimes works on simplified models of reality. Platonism is a viral
form of philosophical reductionism that breaks apart holistic concepts
into imaginary dualisms. . . . Mathematics is a human invention for de-
scribing patterns and regularities. It follows that mathematics is then
a useful tool in describing regularities we see in the universe. The re-
ality of the regularities and invariances, which we exploit, may be a
little rubbery, but as long as they are sufficiently rigid on the scales of
interest to humans, then it bestows a sense of order.
Certainly many mathematicians would disagree with Abbott’s account!

Proposed by Stanislav Molchanov and Harold Reiter, UNC Charlotte.
The following four problems illustrate Wigner’s principle that a single mathematical
idea often appears in several different areas.
Problem 1. The Catalan numbers are defined for integers n ≥ 0 by

\[ C_n = \frac{1}{n+1}\binom{2n}{n}. \tag{1960.1} \]
The first several Catalan numbers are
1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, 208012, . . . .
Prove that Cn is always an integer.
Problem 2. The probability density
\[
p(x) = \begin{cases}
\dfrac{1}{2\pi}\sqrt{4 - x^2} & \text{if } |x| \le 2, \\[4pt]
0 & \text{otherwise,}
\end{cases}
\]
arises in Wigner’s semicircle law, which he proposed for the description of the
spectra of heavy atomic nuclei. Show that its moments are
\[
\frac{1}{2\pi}\int_{-2}^{2} x^{n} \sqrt{4 - x^2}\, dx =
\begin{cases}
C_{n/2} & \text{if $n$ is even,} \\
0 & \text{if $n$ is odd.}
\end{cases}
\]
Problem 3. A tree is a graph in which any two vertices are connected by exactly
one path. An ordered tree is a rooted tree in which the children of each vertex
are given a fixed left-to-right order. Show that Cn is the number of nonisomorphic
ordered trees with n + 1 vertices; see Figure 1.
Problem 4. Suppose that we must multiply n ≥ 2 symbols a1 , a2 , . . . , an using
a binary but not necessarily associative operation b(x, y). Consequently, we must
keep track of order. We are interested in the number of structurally different ways
we can combine the symbols, and not the number of different ways we can then
input the n objects into the possibilities. If we let Sn−1 denote the number of
different structures we can use to multiply n symbols using our binary operation
n − 1 times, then S1 = 1 since the only way to combine two symbols is b(a1 , a2 );
we do not count b(a2 , a1 ) since it is structurally the same as b(a1 , a2 ).
Figure 1. Two rooted trees on 23 vertices. The root vertices are
highlighted in red. If we had to choose names for the trees, they
would be Telperion the Silver and Laurelin the Golden.
Similarly, S2 = 2 since we have only two structurally different approaches:

b(a1 , b(a2 , a3 )) and b(b(a1 , a2 ), a3 ).
A little more work shows that S3 = 5:
b(b(a1 , a2 ), b(a3 , a4 )), b(b(b(a1 , a2 ), a3 ), a4 ), b(a1 , b(b(a2 , a3 ), a4 )),
b(b(a1 , b(a2 , a3 )), a4 ), and b(a1 , b(a2 , b(a3 , a4 ))).
Show that Sn = Cn .
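Before attempting a proof of Problem 4, it may help to gather numerical evidence. The sketch below counts bracketings by splitting at the outermost application of b; the function names `structures` and `catalan` are our own, and the recursion is one natural way to organize the count rather than a prescribed solution. Note the index shift: `structures(n)` counts ways to combine n symbols, so it corresponds to S_{n−1} in the notation of the problem.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def structures(n):
    """Number of structurally different ways to combine n symbols with a binary operation."""
    if n == 1:
        return 1                        # a single symbol needs no operation
    # The outermost b(., .) has k symbols in its left slot and n - k in its right slot.
    return sum(structures(k) * structures(n - k) for k in range(1, n))

def catalan(n):
    """Catalan number from the closed form (1960.1)."""
    return comb(2 * n, n) // (n + 1)

# Compare S_n = structures(n + 1) with C_n for small n.
agreement = all(structures(n + 1) == catalan(n) for n in range(12))
```

Agreement for small n is of course not a proof, but the recursion used here is a useful hint about how a proof might proceed.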
1960: Comments
Catalan numbers. There is a wealth of interesting facts known about the
Catalan numbers. First of all, they are named after the French-Belgian mathemati-
cian Eugène Charles Catalan (1814–1894), who does not appear to be Catalonian.
Nevertheless, the term “Catalonian” has been used by a few authors to refer to
subjects related to the Catalan numbers [4, p. 254] (at least the authors think it a
good idea and are not above flagrant self-reference). The Catalan numbers appear
in many different places in mathematics; over fifty such occurrences are discussed
in [10].
It turns out that Cn is the number of ways to write n left parentheses and n
right parentheses so that, as we move from left to right, we never see more right
parentheses than left parentheses. We see that C1 = 1 since the only possible
arrangement is (). Similarly, C2 = 2 since there are only two permissible configu-
rations: ()() and (()). For n = 3, we have exactly five options:
((())), (()()), (())(), ()(()), and ()()().
Thus, C3 = 5. See the comments for the 2008 entry for the asymptotic rate of
growth of the Catalan numbers.
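The parenthesis interpretation is easy to explore by brute force. In the following sketch, the helper name `balanced` is our own; it builds every admissible string by adding one parenthesis at a time and never allowing more right parentheses than left ones.

```python
from math import comb

def balanced(n):
    """All strings of n '(' and n ')' in which no prefix has more ')' than '('."""
    out = []

    def extend(s, opened, closed):
        if opened == n and closed == n:
            out.append(s)
            return
        if opened < n:                  # we may still open a parenthesis
            extend(s + "(", opened + 1, closed)
        if closed < opened:             # closing is allowed only if something is open
            extend(s + ")", opened, closed + 1)

    extend("", 0, 0)
    return out

print(balanced(3))   # ['((()))', '(()())', '(())()', '()(())', '()()()']
print([len(balanced(n)) == comb(2 * n, n) // (n + 1) for n in range(1, 9)])
```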
Another interesting interpretation of Cn is that it is the number of “staircase
walks” from (0, 0) to (n, n) that never rise above the main diagonal; that is, k ≤ j
whenever (j, k) is on our path. Such a path is called a Dyck path, in honor of
Walther von Dyck (1856–1934); see Figure 2.
Figure 2. There are C4 = 14 Dyck paths of order 4.
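The count of staircase walks can be checked with a short dynamic program. This sketch assumes that each step moves one unit to the right or one unit up, and that a walk is admissible when it stays on or below the main diagonal; the name `dyck_paths` is ours.

```python
def dyck_paths(n):
    """Count walks from (0, 0) to (n, n) by unit steps right/up that stay on or below y = x."""
    # ways[x][y] = number of admissible walks from (0, 0) to (x, y); entries with y > x stay 0.
    ways = [[0] * (n + 1) for _ in range(n + 1)]
    ways[0][0] = 1
    for x in range(n + 1):
        for y in range(x + 1):          # only points with y <= x are admissible
            if x > 0:
                ways[x][y] += ways[x - 1][y]   # last step was to the right
            if y > 0:
                ways[x][y] += ways[x][y - 1]   # last step was up
    return ways[n][n]

print(dyck_paths(4))   # 14
```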
Bibliography
[1] D. Abbott, The reasonable ineffectiveness of mathematics, Proceedings of the IEEE, Vol.
101, no. 10, October 2013.
[2] S. A. Burr (ed.), The unreasonable effectiveness of number theory, papers from the American
Mathematical Society Short Course held in Orono, Maine, August 6–7, 1991, Proceedings of
Symposia in Applied Mathematics, vol. 46, American Mathematical Society, Providence, RI,
1992. MR1195838
[3] S. M. Focardi and F. J. Fabozzi, The reasonable effectiveness of mathematics in economics,
American Economist 1 (2010), no. 55, 19–30.
[4] S. R. Garcia and S. J. Miller, 100 Years of Math Milestones: The Pi Mu Epsilon Centennial
Collection, American Mathematical Society, 2019.
[5] R. W. Hamming, The unreasonable effectiveness of mathematics, Amer. Math. Monthly 87
(1980), no. 2, 81–90, DOI 10.2307/2321982. MR559142
[6] A. Harvey, The Reasonable Effectiveness of Mathematics in the Physical Sciences, Relativity
and Gravitation, 43 (2011), 3057–3064.
[7] J. Kilpatrick, The reasonable ineffectiveness of research in mathematics education, For the
Learning of Mathematics 2 (1981), no. 2, 22–29.
[8] A. M. Lesk, The unreasonable effectiveness of mathematics in molecular biology, Math. In-
telligencer 22 (2000), no. 2, 28–37, DOI 10.1007/BF03025372. MR1764266
[9] D. E. Neuenschwander, Does any piece of mathematics exist for which there is no application
whatsoever in physics?, Amer. J. Phys. 63 (1996), 63.
[10] R. P. Stanley, Enumerative combinatorics. Vol. 2, with a foreword by Gian-Carlo Rota and
appendix 1 by Sergey Fomin, Cambridge Studies in Advanced Mathematics, vol. 62, Cam-
bridge University Press, Cambridge, 1999. MR1676282
[11] K. V. Velupillai, The unreasonable ineffectiveness of mathematics in economics, Cambridge
Journal of Economics 29 (2005), 849–872.
[12] E. P. Wigner, The unreasonable effectiveness of mathematics in the natural sciences [Comm.
Pure Appl. Math. 13 (1960), 1–14; Zbl 102, 7], Mathematical analysis of physical systems,
Van Nostrand Reinhold, New York, 1985, pp. 1–14.
https://www.dartmouth.edu/~matc/MathDrama/reading/Wigner.html. MR824292
[13] E. O. Wilson, Great Scientist = Good at Math: E. O. Wilson shares a secret: Discoveries
emerge from ideas, not number-crunching, Wall Street Journal (online).
http://www.wsj.com/articles/SB10001424127887323611604578398943650327184.
1961
Lorenz’s Nonperiodic Flow
Introduction
There is a certain “continuity principle” that underlies much familiar mathe-
matics: if one jiggles parameters a little bit, then the final answer should only change
by a small amount. For example, the roots of a quadratic polynomial ax² + bx + c,
in which a ≠ 0, are given by
\[ \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \]
As long as we avoid a = 0, the (possibly complex) roots vary continuously with the
parameters (a, b, c). Similarly, the area and perimeter of a polygon vary continu-
ously with the placement of its vertices.
Beginning with the work of Henri Poincaré on the orbits of planets, this gen-
eral principle began to be questioned. A milestone in our understanding of chaotic
behavior is the work of Edward Lorenz (1917–2008). His seminal paper Determin-
istic Nonperiodic Flow, published in 1963 (but based on work started in 1961),
introduced the notion of “sensitive dependence on initial conditions.” This refers
to when minute changes to initial conditions drastically affect long-term behavior.
In an attempt to study the weather, Lorenz considered the deterministic system
\[
\frac{dx}{dt} = \sigma(y - x), \qquad
\frac{dy}{dt} = x(\rho - z) - y, \qquad
\frac{dz}{dt} = xy - \beta z,
\]
in which x is proportional to the rate of convection, y to the horizontal temperature
variation, z to the vertical temperature variation, and σ, ρ, and β are parameters.
He wanted to rerun some calculations from an intermediate point. When he fed
the output from a previous run into the computer, the system behaved in a totally
different manner than it had before.
How could this occur in a deterministic system? Lorenz’s printer only displayed
three digits of the output, while the computer code worked internally with six. The
resulting loss of precision changed the initial conditions slightly and permitted the
two computations to make radically different long-term predictions. Many people
are familiar with the butterfly effect, a phrase which insinuates that the flap of a
butterfly’s wings may eventually cause (or prevent) the onset of a tornado hundreds
of miles away. Long-term weather forecasting may be impossible since we can never
know all the parameter values with perfect accuracy.
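Sensitive dependence is easy to observe numerically. The sketch below integrates the system above with a classical fourth-order Runge–Kutta step for two initial conditions that differ by one part in a million. Several choices here are our assumptions rather than anything stated in the text: the parameter values σ = 10, ρ = 28, β = 8/3 (the values commonly used for this system), the step size, the starting point, and the function names.

```python
import math

def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system; the parameter values are assumed."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """One classical Runge-Kutta step of size dt."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = lorenz(state)
    k2 = lorenz(shifted(state, k1, dt / 2))
    k3 = lorenz(shifted(state, k2, dt / 2))
    k4 = lorenz(shifted(state, k3, dt))
    return tuple(s + dt * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def max_separation(p, q, dt=0.01, steps=4000):
    """Largest distance between the two trajectories over the whole run."""
    worst = math.dist(p, q)
    for _ in range(steps):
        p, q = rk4_step(p, dt), rk4_step(q, dt)
        worst = max(worst, math.dist(p, q))
    return worst

print(max_separation((1.0, 1.0, 1.0), (1.000001, 1.0, 1.0)))
```

Although the two starting points agree to six decimal places, the reported separation is comparable to the size of the region in which the trajectories move, which is the phenomenon Lorenz encountered when he re-entered rounded output.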
The following problem shows that a tiny difference in the initial trajectory of a
billiard ball can have qualitative effects on the long-time orbits of the initial points.
Furthermore, if one imagines this rectangle is slightly compressed to have concave
sides, then a tiny difference in the initial slope has exponentially large effects on
the long-time orbits of the initial points, a property of a dynamical system known
as sensitive dependence on initial conditions.

Proposed by Craig Corsi and Steven J. Miller, Williams College.
Imagine playing billiards, in which the billiards table is the unit square [0, 1] ×
[0, 1] in R2 , and the ball is a point. You place the ball at (0, 0), the lower-left corner
of the table, and strike the ball at some angle θ ∈ (0, π/2) relative to the positive
x-axis. Assume that there is no friction and the boundary of the square is perfectly
elastic. Then the ball will continue to bounce off the walls of the table forever. For
instance, if θ = π/4, then the ball will bounce back and forth between the lower-left
and upper-right corners of the table.
Let xθ (t) represent the position of the ball at time t.
(a) For any N ∈ N, show that if θ ≠ φ, then there exists a t > N such that
|xθ (t) − xφ (t)| > 1/2.
(b) For any θ ∈ (0, π/2), show that either (i) the number of points on the edge of
the billiards table hit by the ball is finite or (ii) any line segment contained in
the boundary of the unit square, however small, is hit infinitely often by the
ball; see Figure 1.
(c) Classify all angles for which (i) occurs in (b).
Figure 1. Two billiard-ball trajectories starting at (0, 0) on a frictionless,
perfectly elastic, square billiard table represented by the
unit square [0, 1] × [0, 1]. (top) The slope of the initial trajectory
is the rational number 27/10 = 2.7. (bottom) The slope of the
initial trajectory is the irrational number e ≈ 2.71828. The small
difference in initial conditions leads to profound differences in the
eventual behavior of the system.
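To experiment with the billiard problem, one can compute the position of the ball directly. The sketch below uses the standard "unfolding" idea, which is our choice of technique and is not taken from the problem statement: instead of reflecting the ball at a wall, reflect the table, so that the trajectory becomes a straight line, and then fold each coordinate back into [0, 1]. It also assumes the ball moves at unit speed; the names `fold` and `position` are hypothetical.

```python
import math

def fold(s):
    """Reflect s >= 0 back into [0, 1]; this is a triangle wave of period 2."""
    s = s % 2.0
    return s if s <= 1.0 else 2.0 - s

def position(theta, t):
    """Position at time t of a unit-speed ball launched from (0, 0) at angle theta."""
    return (fold(t * math.cos(theta)), fold(t * math.sin(theta)))

# With theta = pi/4 the ball shuttles between opposite corners.
print(position(math.pi / 4, math.sqrt(2)))       # approximately (1, 1)
print(position(math.pi / 4, 2 * math.sqrt(2)))   # approximately (0, 0)
```

Evaluating `position` for two nearby angles at a range of large times gives a quick way to explore part (a) before proving it.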
1961: Comments
Newton’s method. Here is a particularly nice example of a chaotic system
that sets the stage for our 1964 and 1978 entries.
Newton’s method is a powerful algorithm that constructs a sequence of real
numbers that rapidly converges to a zero of a given polynomial. For example,
\[ f(x) = x^2 - 3 \]
has the zeros x = ±√3. Arithmetic tells us that 1 < √3 < 2 and we suspect that
√3 lies closer to 2 than 1. Let x0 = 2 be our initial guess for the numerical value
of √3; it is not a particularly good guess, but this matters not because Newton’s
method is incredibly effective. Construct the tangent line to the graph of f (x)
at the point (x0 , f (x0 )); that is,
\[ y - f(x_0) = f'(x_0)(x - x_0). \]
We suspect that the point x1 at which the tangent line intersects the x-axis should
be a decent approximation to the zero √3 of f ; see Figure 2. Set (x, y) = (x1 , 0) in
the preceding and obtain
\[ x_1 = x_0 - \frac{f(x_0)}{f'(x_0)} = 2 - \frac{f(2)}{f'(2)} = \frac{7}{4} = 1.75. \]
Our succeeding approximations are generated by the recursion
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. \]
Figure 2. First step of Newton’s method to compute a root of
x² − 3. The initial guess x0 = 2 provides the approximation x1 =
7/4 = 1.75, which is already remarkably close to √3 = 1.73205 . . ..
Start with the initial approximation x0 = 2 and obtain
\[
\begin{aligned}
x_1 &= \frac{7}{4} = 1.75, \\
x_2 &= \frac{97}{56} = 1.73214\ldots, \\
x_3 &= \frac{18817}{10864} = 1.732050810\ldots,
\end{aligned}
\]
at which point we might as well stop, since
\[ \sqrt{3} = 1.732050807\ldots. \]
Thus, only three iterations of Newton’s method are required to get seven digits of
accuracy; see the 1964 entry for more information about the computation of square
roots and some results about the rate of convergence.
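The three iterates above can be reproduced exactly with rational arithmetic. In this sketch the function name `newton_sqrt3` is our own; it applies the Newton recursion to f(x) = x² − 3, whose derivative is 2x.

```python
from fractions import Fraction

def newton_sqrt3(x0, steps):
    """Return [x0, x1, ..., x_steps] for Newton's method applied to f(x) = x^2 - 3."""
    xs = [Fraction(x0)]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - (x * x - 3) / (2 * x))   # x - f(x)/f'(x)
    return xs

xs = newton_sqrt3(2, 3)
print(xs[1], xs[2], xs[3])   # 7/4 97/56 18817/10864
print(float(xs[3]))
```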
Newton fractals. In the preceding example, one can show that
\[
\lim_{n\to\infty} x_n =
\begin{cases}
\sqrt{3} & \text{if } x_0 > 0, \\
-\sqrt{3} & \text{if } x_0 < 0.
\end{cases}
\]
Figure 3. Newton fractal for f (z) = z^6 − 1.
The seed x0 = 0 is a poor initial choice: since f ′(0) = 0, the tangent line at
(0, f (0)) is horizontal, never meets the x-axis, and the iteration is undefined.
Things become much more interesting if we use
polynomials of higher degree and permit the use of complex numbers.
Consider the complex polynomial
\[ f(z) = z^6 - 1, \]
whose roots are the vertices of a regular hexagon inscribed in the unit circle
|z| = 1; see the figure on p. 236. For almost every complex number z, the sequence
obtained from Newton’s method with initial seed z converges to one of the six
roots. But which one? Assign a color to each of the six roots of f . Now paint each
initial seed z according to which root the seed iterates to under Newton’s method.
The resulting image (see Figure 3) is a Newton fractal. Other polynomials yield
similarly enchanting images; see Figure 4. For a wealth of information about chaos
and fractals, see [3].
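The coloring rule just described can be sketched in a few lines. The names `ROOTS` and `basin`, the iteration cap, and the tolerance are our own choices; a picture like the figures is obtained by evaluating `basin` on a grid of seeds and assigning one color per returned index.

```python
import cmath

# The six roots of z^6 - 1, indexed counterclockwise starting from 1.
ROOTS = [cmath.exp(2j * cmath.pi * k / 6) for k in range(6)]

def basin(z, max_iter=100, tol=1e-9):
    """Index of the root that Newton's method reaches from seed z, or None if none is reached."""
    for _ in range(max_iter):
        if z == 0:                       # f'(0) = 0, so the Newton step is undefined
            return None
        z = z - (z**6 - 1) / (6 * z**5)  # Newton step for f(z) = z^6 - 1
        for k, r in enumerate(ROOTS):
            if abs(z - r) < tol:
                return k
    return None

print([basin(2 * r) for r in ROOTS])   # [0, 1, 2, 3, 4, 5]
```

The printed list reflects the sixfold rotational symmetry of the Newton map for this polynomial: a seed on the ray through a root converges to that root. The intricate structure of the fractal appears near the boundaries between basins, where nearby seeds reach different roots.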
Figure 4. Newton fractal for f (z) = z^5 − 1.

Bibliography
[1] E. N. Lorenz, Deterministic Nonperiodic Flow, Journal of Atmospheric Sciences 20 (1963),
130–141. http://eaps4.mit.edu/research/Lorenz/Deterministic_63.pdf
[2] E. N. Lorenz, How much better can weather prediction become?, Technology Review
Jul/Aug 1969, 39–49. http://eaps4.mit.edu/research/Lorenz/How Much Better Can
Weather Prediction 1969.pdf
[3] H.-O. Peitgen, H. Jürgens, and D. Saupe, Chaos and fractals: New frontiers of science, 2nd
ed., with a foreword by Mitchell J. Feigenbaum, Springer-Verlag, New York, 2004. MR2031217
1962
The Gale–Shapley Algorithm and the Stable Marriage Problem
Introduction
In a seminal 1962 paper [4], David Gale (1921–2008) and Lloyd Shapley (1923–
2016) initiated the formal study of stable matchings. In 2012 the Nobel Prize in
Economics was given to Shapley and Alvin Roth (1951– ) “for the theory of stable
allocations and the practice of market design” (Gale had the misfortune of passing
away in 2008, thus rendering him ineligible for a Nobel Prize).
One of the most important applications of these ideas is to the National Resident
Matching Program (NRMP), which matches hospitals and medical students for
residencies. In 1998, the NRMP changed the matching algorithm in response to
concerns of fairness. Finding stable matchings that meet various fairness criteria
remains challenging and depends upon a careful study of intricate relationships in
posets imposed on multiple stable matchings.
Suppose that we have two groups of the same size: proposers and acceptors.
Each proposer must be matched with an acceptor; see Figure 1. We collect the
preferences for proposers and acceptors in two preference matrices, one for each of
Figure 1. (top) Each proposer (gray) has one or more desired
acceptors (red). (bottom) A compatible matching. This is a much
harder problem once ranked preferences are involved. That is the
context of the Gale–Shapley algorithm.
the groups. A matching is stable if no two parties prefer each other to their assigned
partners. In a stable matching, no two parties have a reason to switch partners.
The Gale–Shapley algorithm is an efficient proposal algorithm that, given two
preference matrices, finds stable matchings. These are often called stable marriages
because one of the original applications provided by Gale and Shapley was the
matching of n men to n women (although this makes one question who the consumer
of such an algorithm would be). The worst case complexity of the algorithm is
O(n²), which means that the number of steps needed is at worst proportional to
the square of the size of each group. Moreover, the Gale–Shapley algorithm always
returns at least one stable matching, and at most two of them no matter what set
of preferences are given. Unlike our description of the powerful simplex method
(see the 1947 entry), the Gale–Shapley algorithm is simple enough for us to explain
in detail.
Suppose we have a group of n men and a group of n women who want to be
matched.1 Let the men, in turn, propose to the women, each of whom either rejects
the proposals she receives or breaks off a previous engagement if a better proposal
comes along. Here is the Gale–Shapley algorithm.
(a) In the first round, each man proposes to the woman he prefers the most.
Each woman considers all the proposals she receives. She provisionally accepts
the proposal coming from the man she ranks highest among those who have
proposed to her and rejects all the other proposals.
(b) Each unengaged man now proposes to the woman he prefers among the women
he has not previously asked to marry him (once a woman rejects a man he
never asks her again), regardless of whether or not she has provisionally ac-
cepted a proposal. Each woman considers all the proposals she receives and
provisionally accepts the proposal coming from the man she ranks highest
among those who have proposed to her. She rejects all the other proposals.
(c) We keep repeating step (b), with the unengaged men asking and the women
provisionally accepting, until all men are provisionally engaged. At this point
all the provisional engagements become permanent and we have obtained a
matching between the men and women.
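Steps (a) through (c) can be sketched as follows. This is an illustrative sketch rather than a reference implementation: the function names and the dictionary-based input format are our own, and unmatched proposers are processed one at a time from a queue instead of in synchronized rounds, a common variation that produces the same matching as the round-based description.

```python
from collections import deque

def gale_shapley(proposer_prefs, acceptor_prefs):
    """Each prefs dict maps a name to a list ranked from most to least preferred.
    Returns a dict mapping every proposer to the acceptor it ends up with."""
    rank = {a: {p: i for i, p in enumerate(prefs)} for a, prefs in acceptor_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}   # index of the next acceptor to ask
    engaged_to = {}                                # acceptor -> provisional proposer
    free = deque(proposer_prefs)
    while free:
        p = free.popleft()
        a = proposer_prefs[p][next_choice[p]]      # best acceptor p has not yet asked
        next_choice[p] += 1
        current = engaged_to.get(a)
        if current is None:
            engaged_to[a] = p                      # a provisionally accepts
        elif rank[a][p] < rank[a][current]:
            engaged_to[a] = p                      # a trades up and releases current
            free.append(current)
        else:
            free.append(p)                         # a rejects p
    return {p: a for a, p in engaged_to.items()}

def is_stable(matching, proposer_prefs, acceptor_prefs):
    """True if no proposer and acceptor prefer each other to their assigned partners."""
    partner_of = {a: p for p, a in matching.items()}
    for p, prefs in proposer_prefs.items():
        for a in prefs:
            if a == matching[p]:
                break                              # remaining acceptors are ranked lower by p
            ranking = acceptor_prefs[a]
            if ranking.index(p) < ranking.index(partner_of[a]):
                return False                       # p and a would both rather switch
    return True

men = {"A": ["X", "Y", "Z"], "B": ["X", "Z", "Y"], "C": ["Y", "X", "Z"]}
women = {"X": ["B", "A", "C"], "Y": ["A", "C", "B"], "Z": ["A", "B", "C"]}
print(gale_shapley(men, women))   # {'B': 'X', 'A': 'Y', 'C': 'Z'}
```

Calling `gale_shapley(women, men)` runs the algorithm with the roles exchanged, which is how the second possible output mentioned above arises. In this small instance both runs happen to pair the same people.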
The proof that this algorithm always results in a stable matching is construc-
tive. Once a woman provisionally accepts a proposal, she can only stay the same or
trade-up; she is never unmatched. If a man has been unsuccessful, then he proposes
to someone new. Since there are the same number of men and women, there must
be at least one available woman who has not received any offers and thus must
accept his. Each man remains paired with a woman he prefers unless that woman
receives a better offer, and every woman is given the option of choosing among the
men that prefer her.
The Gale–Shapley algorithm finds at most two stable matchings because the
matching that results from having one group do the proposing may differ
from the matching obtained when the other group does the proposing. If two
distinct stable matchings can be returned by the algorithm, each is optimal for the
group doing the proposing. For example, if the men propose, each man will fare at
least as well as he would in the matching obtained by having the women propose.
1 We tried unsuccessfully to rephrase the algorithm in a gender-neutral manner. It became
increasingly difficult to understand because the words “its” and “their” could refer to either party.
Current attention is focused on what happens when there are many more than
two stable matchings possible. In these cases the additional matchings must be
found with other algorithms. Christine Cheng and her colleagues recently proved
that a nice relationship holds for local and global aspects of the set of matchings,
given that all the stable matchings for a particular problem instance have been
found. This work involves the following two concepts.
Global Median Matching (GMM): Impose a partial ordering on the set of stable
matchings according to the rule that one matching is better than another if every
proposer (or symmetrically every acceptor) receives at least as good a partner in
the former matching as in the latter matching. The resulting poset terminates at
one end in the proposer-optimal matching and at the other end in the acceptor-
optimal matching. A GMM matching is a matching that lies a median number of
steps from these extreme matchings.
Local Median Matching (LMM): Consider for each proposer (and similarly for
each acceptor) the ordered set of all the rankings of the partners it is assigned in all
the stable matchings. An LMM is a matching that assigns all the people a partner
that lies at the median of their ordered sets.
The surprising result is that not only do GMMs and LMMs exist, but there is
always at least one GMM and LMM that are identical. Therefore, if one accepts
these local and global measures of fairness as valid, both measures can be satisfied
by a single stable matching.

Proposed by Paul Kehle, Hobart and William Smith Colleges.
What is the problem? The problem is that in some cases, in addition to a co-
inciding GMM and LMM solution, other stable matchings are arguably fairer. How
can we characterize stable-matching problems according to whether the GMM/LMM
matching is the fairest of them all? What other measure of fairness should we use to
select a matching when the GMM/LMM matching leaves something to be desired?
Consider the set of stable matchings in Figure 2. Which one is fairest, and why?
How does your measure of fairness connect with the GMM and LMM measures?
1962: Comments
Kidney transplants. One of the powers of the Gale–Shapley algorithm is
its flexibility. If we can formulate a real-world problem in terms of assignments,
then the algorithm may be applicable. For example, the Gale–Shapley algorithm
can be used in diverse situations such as college admissions, scheduling tasks on
processors, matching kidneys with patients, internet search engine auctions, speed
dating, and pairing students to schools. For many problems in the world there are
other algorithms that could run faster or yield better solutions, but it is good to
know that a stable matching exists. Moreover, the algorithm often runs fast enough
to resolve the problem.
The great insight in many of these situations is that we can find solutions
using market-like situations without money changing hands. For example, consider
kidney transplants. Initially most transplants came from deceased donors.
However, it is possible for a living person to donate one of their kidneys. This greatly
increases the available supply, but many people are understandably hesitant to
donate one of their kidneys. Moreover, not any kidney can go to any patient; there
are compatibility issues that must be addressed.
Figure 2. This stable-matching instance for n = 8 has 16 stable matchings.
They form a partially ordered set that reveals a hierarchy of matchings. A
line between two matchings means that the matching with the larger Roman
numeral is one in which each of the proposers has a partner it prefers at least as
much as the partner it has in the matching with the smaller Roman numeral.
This ordering is therefore transitive; since XVI is better than XV and because
XV is better than XI, we see that XVI is better than XI even though no direct
line is drawn between these two matchings. Note however that XV is not better
than V, even though the average preference of the proposers in XV, 1.875, is
much lower than the average in V, 2.625 (examine A’s and G’s preferences).
This “better than” ordering is reversed for the acceptors’ perspective: lines
between matchings indicate that each of the acceptors has at least as good or
better a partner in the matching denoted by the smaller of the two Roman
numerals. Image courtesy of Paul Kehle.
Figure 3. An opportunity exists here for both Hober and Peter to
receive kidneys from outside of their families. Hari’s kidneys are
incompatible with Hober; Petra’s kidneys are incompatible with
Peter. However, Hari can donate to Peter and Petra can donate
to Hober.
Imagine two families in which someone needs a kidney, say Hari and Hober
in one, and Petra and Peter in another. Hober and Peter both need kidneys,
but unfortunately Hari’s kidney is incompatible with Hober. Similarly, Petra’s
is incompatible with Peter. However, Hari’s would work in Peter and Petra’s in
Hober. This opens up the opportunity for a trade that helps both families; see
Figure 3. Now Peter and Hober can declare: “Kidneys! I’ve got new kidneys! I
don’t like the colour.”
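The two-family trade described above can be phrased as a search for pairs of donor/patient pairs whose donors are cross-compatible. The sketch below is only a brute-force illustration of that idea on the four people named in the text; the function name `two_way_swaps` and the data layout are assumptions, and it is not meant to describe how any actual exchange program operates.

```python
from itertools import combinations


def two_way_swaps(pairs, compatible):
    """Find two-way exchanges among incompatible donor/patient pairs.

    pairs      -- list of (donor, patient) tuples, one per family
    compatible -- set of (donor, patient) tuples that are medically compatible
    """
    swaps = []
    for (d1, p1), (d2, p2) in combinations(pairs, 2):
        # A swap works when each donor can give to the other family's patient.
        if (d1, p2) in compatible and (d2, p1) in compatible:
            swaps.append(((d1, p2), (d2, p1)))
    return swaps


pairs = [("Hari", "Hober"), ("Petra", "Peter")]
compatible = {("Hari", "Peter"), ("Petra", "Hober")}

swaps = two_way_swaps(pairs, compatible)
print(swaps)
```

With more families one would also look for longer cycles of donations; the pairwise check here covers only the simplest case.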
Before the Gale–Shapley algorithm was applied in the early 2000s, only about
twenty transplants per year were from living donors. Now thousands of such trans-
plants have been performed successfully. For more information about this life-saving
application of mathematics, see [6, 7, 9].
Algorithmic bias. Concerns about the ever-present use of algorithms in modern
society have recently begun to surface. It would take us too far afield to explore
the rapidly developing conversation about algorithmic bias here, so we content
ourselves with a quote from Cathy O’Neil (1972– ), a mathematician whose experience in
the finance sector left her with grave concerns about the application of supposedly
“fair” algorithms.
The math-powered applications powering the data economy were based
on choices made by fallible human beings. Some of these choices were
no doubt made with the best intentions. Nevertheless, many of these
models encoded human prejudice, misunderstanding, and bias into the
software systems that increasingly managed our lives. Like gods, these
mathematical models were opaque, their workings invisible to all but
the highest priests in their domain: mathematicians and computer
scientists. Their verdicts, even when wrong or harmful, were beyond

dispute or appeal. And they tended to punish the poor and the op-
pressed in our society, while making the rich richer [8].
Bibliography
[1] C. T. Cheng, Understanding the generalized median stable matchings, Algorithmica 58 (2010),
no. 1, 34–51, DOI 10.1007/s00453-009-9307-2. http://link.springer.com/article/10.1007
%2Fs00453-009-9307-2. MR2658099
[2] C. T. Cheng and A. Lin, Stable roommates matchings, mirror posets, median graphs, and
the local/global median phenomenon in stable matchings, SIAM J. Discrete Math. 25 (2011),
no. 1, 72–94, DOI 10.1137/090750299. http://epubs.siam.org/doi/abs/10.1137/090750299.
MR2765702
[3] C. Cheng, E. McDermid, and I. Suzuki, A unified approach to finding good stable matchings
in the hospitals/residents setting, Theoret. Comput. Sci. 400 (2008), no. 1-3, 84–99, DOI
10.1016/j.tcs.2008.02.014. MR2424344
[4] D. Gale and L. S. Shapley, College Admissions and the Stability of Marriage, Amer. Math.
Monthly 69 (1962), no. 1, 9–15, DOI 10.2307/2312726. http://www.econ.ucsb.edu/~tedb/
Courses/Ec100C/galeshapley.pdf. MR1531503
[5] D. Gusfield and R. W. Irving, The stable marriage problem: structure and algorithms, Foun-
dations of Computing Series, MIT Press, Cambridge, MA, 1989. MR1021242
[6] A. Hern, Trading kidneys, repugnant markets and stable marriages win the Nobel Prize
in Economics, NewStatesman, October 15, 2012. http://www.newstatesman.com/blogs/
economics/2012/10/trading-kidneys-repugnant-markets-and-stable-marriages-win-
nobel-prize-econo.
[7] K. Luong, Matching theory: kidney allocation, Health Policy and Economics, UWOMJ 82,
no.1, Spring 2013. http://www.uwomj.com/wp-content/uploads/2013/10/v82no1_6.pdf.
[8] C. O’Neil, Weapons of math destruction: How big data increases inequality and threatens
democracy, Crown, New York, 2016. MR3561130
[9] Reuters, Alvin Roth Transformed Kidney Donation System, Reuters, October 15,
2012. http://forward.com/news/breaking-news/164327/alvin-roth-transformed-kidney-
donation-system/.
1963
Continuum Hypothesis
Introduction
In our 1918 entry, we introduced Cantor’s theory of cardinality and its shocking
implication that there are multiple levels of infinity. Recall that $A \cong B$ means that
there is a one-to-one and onto function f : A → B. For finite sets, $A \cong B$ if and
only if A and B have the same number of elements. Cantor’s brilliant insight was
to extend this definition to infinite sets. His classic diagonal argument (see p. 29)
reveals that no one-to-one correspondence between N and R exists; that is, N and
R represent two different levels of infinity.
Since N is a subset of R, it is natural to consider what happens “in between”
N and R. The continuum hypothesis (CH) asserts that if N ⊆ A ⊆ R, then either
$A \cong N$ or $A \cong R$; that is, there are no “intermediate infinities” between those of
the natural numbers and the real numbers.
Cantor believed the continuum hypothesis to be true and he spent years at-
tempting to prove it, without success. David Hilbert, one of the greatest math-
ematicians of all time, placed it first on his list of twenty-three open questions
presented to the International Congress of Mathematicians, held in Paris in 1900
(for more about Hilbert’s problems, see the 1935, 1970, and 1980 entries). Hilbert
opened his historic address with these words:
Who among us would not be happy to lift the veil behind which is
hidden the future; to gaze at the coming developments of our science
and at the secrets of its development in the centuries to come? What
will be the ends toward which the spirit of future generations of math-
ematicians will tend? What methods, what new facts will the new
century reveal in the vast and rich field of mathematical thought?
So is the continuum hypothesis true or false? To this day, nobody has been
able to prove it. Neither has anyone been able to disprove it. Nevertheless, the
problem has been resolved! How can this be?
In 1940, Kurt Gödel proved that CH cannot be disproved from the traditionally
accepted axioms of set theory [5]. Specifically, he showed that CH cannot be
disproved using the Zermelo–Fraenkel (ZF) axioms (see the 1929 entry) or the
Zermelo–Fraenkel axioms augmented with the axiom of choice (AC). This extended
axiom system is denoted ZFC; see the comments for the 1964 and 1969 entries.
In 1963, Paul Cohen (1934–2007) introduced the powerful forcing technique
and demonstrated that CH cannot be proved in ZFC [1, 2, 9]. Thus, the continuum
hypothesis is neither provable nor disprovable from the standard axioms of set
theory. Cohen won the prestigious Fields Medal in 1966 for this achievement.
See [9] for a remembrance of Paul Cohen; the second named author is one of his
mathematical grandchildren.
Of course, the results of Gödel and Cohen assume that ZFC is consistent. The
issue of whether or not ZFC is consistent is another story altogether; see the 1929
entry. To some extent, whether CH is “true” or “false” is a matter of opinion
since it can neither be proved nor disproved in ZFC. One can add either CH or its
negation to ZFC and obtain two different versions of mathematics, one in which CH
is “true” and one in which CH is “false.” Each is as valid as the other, although, as
Gödel showed, neither system can prove its own consistency. This situation seems
bizarre, although it becomes easier to understand if we study a similar occurrence
in classical geometry; see the comments for this entry.

Cardinality is a blunt instrument that ignores the topological properties of a
set. For example, R and the Cantor set (see the 1917 entry) are equinumerous but
“feel” totally different. One approach to distinguishing self-similar subsets of $R^n$ is
the notion of fractal dimension.
A square, which is a “two-dimensional object,” consists of four scaled copies
of itself, each of which has been reduced by a factor of two. It also consists of
nine scaled copies of itself, each of which has been reduced by a factor of three.
The relationship between these numbers is $p = r^d$, in which p is the number of
pieces in the dissection, r is the reduction factor, and d = 2 is the “dimension”
of the square. Something similar works for a cube, which we regard as a “three-
dimensional object” because a similar equation holds with d = 3; see Figure 1.
(a) Explain why the “fractal dimension” of the Cantor set C is $\log_3 2 \approx 0.6309298$.
The Cantor set shows that there is a set of fractal dimension strictly between
0 and 1; see the notes for the construction of a set of fractal dimension strictly
between 1 and 2.
Figure 1. A cube consists of (left) 8 copies of itself, each scaled

down by a factor of 2; (center) 27 copies of itself, each scaled
down by a factor of 3; (right) 64 copies of itself, each scaled down
by a factor of 4.
(b) If you have two sets of positive fractal dimension $d_1 \neq d_2$, can you always
construct a set whose fractal dimension is strictly between $d_1$ and $d_2$?
1963: Comments
Self-similarity. The Cantor set is self-similar: it is composed of two scaled
copies of itself, each of which has been shrunk by a factor of three. The power-law
relation between the number of pieces p, the reduction factor r, and the fractal
dimension d is $p = r^d$; that is,
$$d = \frac{\log p}{\log r}.$$
At each stage of the Cantor set construction, the number of pieces is doubled and
each is shrunk by a factor of 3. Thus, the fractal dimension of the Cantor set is
$$\frac{\log 2}{\log 3} \approx 0.63093.$$
What about a fractal whose dimension is between 1 and 2? Take a solid equilateral
triangle, subdivide it into four congruent equilateral triangles, and remove the
central one. Iterate this process on each remaining triangle to obtain the Sierpiński
triangle (Figure 2), named after Wacław Sierpiński (1882–1969). Observe that the
Sierpiński triangle resembles our diagram of the 3-adic integers; see the 1916 entry.
This fractal is composed of three scaled copies of itself, each of which has been
shrunk by a factor of two. Thus, its fractal dimension is
$$\frac{\log 3}{\log 2} \approx 1.58496,$$
which is strictly between 1 and 2. For more information about fractals, see the
1961 and 1978 entries and [8].
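The relation d = log p / log r is easy to evaluate for the shapes mentioned in this entry. The following short sketch does so; the helper name `similarity_dimension` and the dictionary of examples are assumptions made for illustration only.

```python
import math


def similarity_dimension(pieces, reduction):
    """Solve pieces = reduction**d for d, i.e. d = log(pieces) / log(reduction)."""
    return math.log(pieces) / math.log(reduction)


# (number of scaled copies p, reduction factor r) for shapes from the text
examples = {
    "Cantor set": (2, 3),
    "Sierpinski triangle": (3, 2),
    "square": (4, 2),
    "cube": (27, 3),
}

for name, (p, r) in examples.items():
    print(f"{name}: d = {similarity_dimension(p, r):.5f}")
```

The square and the cube recover the familiar dimensions 2 and 3, while the two fractals land strictly between consecutive integers.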
Figure 2. Sierpiński triangle construction iterated eight times.

Figure 3. Illustration of Euclid’s fifth postulate. Since 0 < α +

β < π2 , one expects the red and blue lines to eventually intersect.
The parallel postulate. How does classical geometry relate to the indepen-
dence of the continuum hypothesis? The story begins around 2,300 years ago,
when Euclid of Alexandria (in modern Egypt) wrote the Elements, a monumental
treatise on geometry and related topics. The Elements was an attempt to build
geometry in a logical and rigorous manner from a few basic axioms. Although Eu-
clid’s book contains some mistakes and a few hidden assumptions, it is nonetheless
a magnificent intellectual achievement.
After defining everything from circles to isosceles triangles to rhomboids, Euclid
presents his five postulates (axioms):
(a) A straight line segment can be drawn joining any two points.
(b) Any straight line segment can be extended indefinitely in a straight line.
(c) Given any straight line segment, a circle can be drawn having the segment as radius
and one endpoint as center.
(d) All right angles are congruent.
(e) If two lines are drawn which intersect a third in such a way that the sum of the inner
angles on one side is less than two right angles, then the two lines inevitably must
intersect each other on that side if extended far enough.
The fifth postulate sticks out: it seems too complicated to accept as an axiom.
Perhaps with sufficient work we can deduce it from the remaining axioms? Euclid
himself must have been unsatisfied with his fifth postulate since he held off from
using it until his twenty-ninth theorem (Proposition I.29). For over 2,000 years
mathematicians tried unsuccessfully to prove that the fifth postulate followed from
the other postulates and definitions.
They all failed for a subtle reason: it is impossible to prove or disprove the
fifth postulate, given only the truth of the other four! This is because we can
produce two distinct versions of geometry, one in which the fifth postulate is true
and another in which it is false. If you assume that Euclid’s fifth postulate is
true, then your geometry is just plain-old plane geometry. If you assume that
Euclid’s fifth postulate is false, then you are studying hyperbolic geometry, a type
of curved geometry. The existence of curved geometries is not surprising to us in the
21st century, since we are used to hearing of relativity and “curved space-time.”
However, this was once an extremely radical thought. Indeed, the philosopher
Immanuel Kant (1724–1804) went so far as to say that “Euclidean geometry is the
inevitable necessity of thought.”
Figure 4. The parallel postulate fails in the Poincaré disk model
of hyperbolic geometry. Given a line that does not contain p,
there are infinitely many hyperbolic lines through p that are parallel
to the given line.
The fifth postulate is often called the parallel postulate because it is equivalent
to Playfair’s axiom:
In a plane, given a line and a point not on it, at most one line parallel
to the given line can be drawn through the point.
The Poincaré disk model of the hyperbolic plane is a geometry in which Euclid’s
first four postulates hold, but the parallel postulate fails; see Figure 4. The “points”
in this geometry are elements of an open disk. The “lines” are the diameters of the
disk together with the arcs of circles that intersect the boundary circle orthogonally.
See [6] for the whole story behind Euclidean and non-Euclidean geometry.
Bibliography
[1] P. Cohen, The independence of the continuum hypothesis, Proc. Nat. Acad. Sci. U.S.A.
50 (1963), 1143–1148, DOI 10.1073/pnas.50.6.1143. http://www.ncbi.nlm.nih.gov/pmc/
articles/PMC221287/. MR0157890
[2] P. J. Cohen, The independence of the continuum hypothesis. II, Proc. Nat. Acad. Sci. U.S.A. 51
(1964), 105–110, DOI 10.1073/pnas.51.1.105. http://www.ncbi.nlm.nih.gov/pmc/articles/
PMC300611/. MR0159745
[3] T. Y. Chow, A beginner’s guide to forcing, Communicating Mathematics: A Conference in
Honor of Joseph A. Gallian’s 65th Birthday, Contemporary Mathematics 479, 25–40. http://
[4] L. Gillman, Two classical surprises concerning the axiom of choice and the continuum hypothe-
sis, Amer. Math. Monthly 109 (2002), no. 6, 544–553, DOI 10.2307/2695444. http://www.maa.
org/sites/default/files/pdf/upload_library/22/Ford/Gillman544-553.pdf. MR1908009
[5] K. Gödel, The Consistency of the Continuum Hypothesis, Annals of Mathematics Studies, no.
3, Princeton University Press, Princeton, N. J., 1940. MR0002514
[6] R. Hartshorne, Geometry: Euclid and beyond, Undergraduate Texts in Mathematics, Springer-
Verlag, New York, 2000. MR1761093
10/S0002-9904-1902-00923-3/S0002-9904-1902-00923-3.pdf.
[8] H.-O. Peitgen, H. Jürgens, and D. Saupe, Chaos and fractals: New frontiers of science, 2nd
ed., with a foreword by Mitchell J. Feigenbaum, Springer-Verlag, New York, 2004. MR2031217
[9] P. Sarnak (ed.), Remembering Paul Cohen, Notices of the AMS 57 (2010), no. 7, 824–838.
1964
Principles of Mathematical Analysis
Introduction
One of the most important contributions someone can make to mathematics
is to encourage others to join and thrive in the field. Although there are many
ways to do this, one way is through quality writing. A good textbook can circle
the globe, edition after edition, reaching many generations. For example, Euclid’s
Elements remained in use for almost 2,000 years; see notes for the 1963 entry.
One of the most prestigious prizes honoring such work is the Leroy P. Steele
Prize for Mathematical Exposition. It was first given in 1993 to Walter Rudin
(1921–2010) for his enormously influential books Principles of Mathematical Anal-
ysis [4] and Real and Complex Analysis [5]. These books have been used around
the world for decades and have influenced countless mathematicians. They have
survived into many editions. In fact, the reason this is the entry for 1964 and not
1953 is that this year marks the publication of the second edition of Principles.
Many mathematicians profess their love for these books because of the challeng-
ing problems at the end of each chapter. On a personal note, the second named
author remembers using the third edition of Principles as a sophomore at Yale.
At the time he was on the fence between mathematics and physics. The joy of
wrestling with Rudin’s problems finally pushed him into the math camp.
Principles is such an omnipresent classic that one can hardly imagine the time,
shortly after it was published, when it was just another new real analysis textbook.
The original 1953 review from the Bulletin of the American Mathematical Soci-
ety compared three contemporary real analysis books: Real Functions by Casper
Goffman (1913–2006), H. P. Thielman’s Theory of Functions of Real Variables,
and Rudin’s Principles of Mathematical Analysis; see Figure 1. The reviewer,
M. E. Munroe, concluded:
Rudin’s book is definitely the smoothest. He lists his theorems in the

most effective order for facilitating his arguments, and he invariably
comes up with extremely neat proofs. [2]
Rudin’s books do have their detractors. His style, which was typical for the era,
is terse. As generations of students have lamented, illustrations are notoriously ab-
sent from Principles. The first named author suspects that for each student turned
on to mathematics by Principles’ style, another few are turned away. Perhaps the
widespread use of Principles is one of the main reasons why real analysis is so
frequently viewed as the “sink or swim” course by mathematics majors.
Figure 1. Comparison of Rudin’s Principles of Mathematical Analysis versus
two competitors in the November 1953 issue of the Bulletin of
the American Mathematical Society [2].
As Herbert Wilf (1931–2012), who took undergraduate analysis at MIT under Rudin, said:
This course is famous for being our rite of passage. Our hazing cere-
mony. If you want to join the club, then here is the hurdle that you
have to jump over. [6]
One Goodreads reviewer had the following humorous take:
I have mixed feelings about this book. How to describe it. . . ok, let’s
talk kung-fu movies. So there’s a standard trope in martial arts movies
where the young apprentice shows up at the stoop of the Old Master
and says, “teach me to fight”. And the Old Master decides that instead
of doing the obvious thing and having our hapless padawan practice
something reasonable like, you know, punching techniques, the Old
Master tells the aspirant to do a series of incomprehensible and difficult
tasks. Carrying the Old Master up and down the mountains. Knitting
sweaters while hanging upside-down over hot coals. Doing the Old
Master’s laundry. And so on. Usually, it’s never clear if the training is
difficult because Sensei is trying to impart some kind of deeper wisdom
or if he’s really just a resentful old jerk who takes pleasure in making
young students suffer.
Principles of Mathematical Analysis is the Old Master. It is com-
pletely uncompromising—no diagrams, the proofs are often opaque,
the definitions unmotivated—and the book carries more than a whiff
of that sadistic strain in math education that sees formal rigor and a
lack of justification as a kind of intellectual machismo. [3]

All these years later, I still remember Problems 16, 17, and 18 from Chapter 3
of Principles. This was my first introduction to Newton’s method, and I remember
being amazed at being able to prove how rapidly convergence set in when computing
square roots in Problem 16. Problem 17 involved a significantly slower method for
finding square roots and Problem 18 was the generalization of Problem 16 to pth
roots. I went to the office of my professor, Peter Jones (1952– ), to talk about these
further. Although the problem below is somewhat standard, I’ve chosen to use that
because of the impact these three problems had on me. I strongly urge any reader
not familiar with these books to pick up a copy, read on, and try the exercises.
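Exercise #16, Chapter 3 (third edition). Fix a positive number α. Choose
$x_1 > \sqrt{\alpha}$, and define $x_2, x_3, x_4, \ldots$ by the recursion formula
$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{\alpha}{x_n}\right).$$
(a) Prove that $\{x_n\}$ decreases monotonically and that $\lim x_n = \sqrt{\alpha}$.
(b) Put $\varepsilon_n = x_n - \sqrt{\alpha}$, and show that
$$\varepsilon_{n+1} = \frac{\varepsilon_n^2}{2x_n} < \frac{\varepsilon_n^2}{2\sqrt{\alpha}}$$
so that, setting $\beta = 2\sqrt{\alpha}$,
$$\varepsilon_{n+1} < \beta \left(\frac{\varepsilon_1}{\beta}\right)^{2^n} \quad (n = 1, 2, 3, \ldots).$$
(c) This is a good algorithm for computing square roots, since the recursion
formula is simple and the convergence is extremely rapid. For example, if α = 3
and $x_1 = 2$, show that $\varepsilon_1/\beta < 1/10$ and that therefore
$$\varepsilon_5 < 4 \cdot 10^{-16}, \qquad \varepsilon_6 < 4 \cdot 10^{-32}.$$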
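The numbers in part (c) can be observed experimentally (this is no substitute for the proof the exercise asks for). The sketch below runs the recursion for α = 3 and x_1 = 2 using Python's `decimal` module at 100 significant digits, an arbitrary choice that is assumed to be ample for six iterates; the function name `newton_sqrt_iterates` is likewise just an illustrative label.

```python
from decimal import Decimal, getcontext

getcontext().prec = 100  # working precision in significant digits (assumed ample)


def newton_sqrt_iterates(alpha, x1, steps):
    """Return [x_1, ..., x_steps] for x_{n+1} = (x_n + alpha / x_n) / 2."""
    alpha = Decimal(alpha)
    xs = [Decimal(x1)]
    for _ in range(steps - 1):
        x = xs[-1]
        xs.append((x + alpha / x) / 2)
    return xs


alpha = Decimal(3)
root = alpha.sqrt()                 # high-precision reference value of sqrt(3)
beta = 2 * root
xs = newton_sqrt_iterates(3, 2, 6)
errors = [x - root for x in xs]     # errors[n - 1] is epsilon_n

for n, eps in enumerate(errors, start=1):
    # Bound from part (b), reindexed: eps_n < beta * (eps_1 / beta)**(2**(n - 1)) for n >= 2.
    bound = beta * (errors[0] / beta) ** (2 ** (n - 1))
    print(f"n={n}  eps_n={eps:.3e}  bound={bound:.3e}")
```

The printed errors roughly square at each step, which is the "extremely rapid" convergence the exercise refers to.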
1964: Comments
The axiom of choice. We were so busy in the 1963 entry discussing the
continuum hypothesis that we never had a chance to say anything substantial about
the axiom of choice! That is a much more exciting topic than debating the merits
of Rudin’s Principles of Mathematical Analysis.
In the proof of the existence of Vitali sets and in the derivation of the Banach–
Tarski paradox (see the 1924 entry), we had an equivalence relation on a set. We
produced a new set by selecting one element from each equivalence class. This step
implicitly appeals to the axiom of choice (AC):
Axiom of Choice. If $\{X_\alpha\}_{\alpha \in I}$ is a nonempty collection of
nonempty sets, then there is an $f : I \to \bigcup_{\alpha \in I} X_\alpha$ such that
$f(\alpha) \in X_\alpha$ for all $\alpha \in I$.
This axiom, stated above in familiar mathematical terminology (as opposed to

purely symbolically), is not one of the axioms of Zermelo–Fraenkel set theory (ZF).
The axiom system obtained by augmenting ZF with AC is abbreviated ZFC.
The function f “chooses” one element f (α) ∈ Xα for each α ∈ I. This addi-
tional axiom of set theory is needed whenever one needs to make infinitely many
choices without a definite procedure in place to do so. Think of each “choice” as a
“step” in a proof. Finitely many choices can be made in a proof of finite length. If
infinitely many choices must be made, then we need AC to accomplish this in “one
step” unless there is a definite procedure to make the choices automatically.
The axiom of choice is used implicitly in statements such as the following:
Suppose that X1 , X2 , . . . are nonempty sets. Let x1 , x2 , . . . be a

sequence such that xn ∈ Xn for n = 1, 2, . . ..
Without further knowledge about the sets Xn , the axiom of choice is required
to assert that the sequence x1 , x2 , . . . exists. What do we mean by “further
knowledge”? For example, AC is not required for the following statement:
Suppose that X1 , X2 , . . . are nonempty subsets of N. Let x1 , x2 , . . .

be the sequence defined by xn = min Xn for n = 1, 2, . . ..
Here we have used the fact that N is well-ordered : a nonempty subset of N contains
a smallest element. This does not require the axiom of choice because we have
provided a definite rule for producing each xn .
Suppose that a caterpillar with infinitely many pairs of legs is getting dressed.
It has infinitely many pairs of shoes and infinitely many pairs of socks. For each pair
of legs, the caterpillar can put on the left shoe first, then the right. The caterpillar
is unable to wear its socks without the axiom of choice, since infinitely many choices
need to be made without the aid of a procedure for making the selection. Since the
socks are indistinguishable, an arbitrary choice must be made for each pair.
For more about the axiom of choice, see the comments for the 1999 entry.
Cauchy functional equation. In 1905, Georg Hamel (1877–1954) used the

axiom of choice to prove that not every solution f : R → R to the Cauchy functional
equation
f (x + y) = f (x) + f (y)
is of the form f (x) = cx for some c ∈ R. Can you prove this?
A surprising isomorphism. Another cute application of the axiom of choice

is the following: the groups (R, +) and ($R^2$, +) are isomorphic. That is, there is a
bijection φ : R → $R^2$ such that φ(x + y) = φ(x) + φ(y) for all x, y ∈ R. Can you
prove this?
Bibliography
[1] G. Hamel, Eine Basis aller Zahlen und die unstetigen Lösungen der Funktionalgleichung:
f (x + y) = f (x) + f (y) (German), Math. Ann. 60 (1905), no. 3, 459–462, DOI
10.1007/BF01457624. MR1511317
[2] M. E. Munroe, Book Review: Real functions // Book Review: Principles of mathematical
analysis // Book Review: Theory of functions of real variables, Bull. Amer. Math. Soc. 59
(1953), no. 6, 572–577, DOI 10.1090/S0002-9904-1953-09765-8. MR1565532
[3] M. Needham, Review of Principles of Mathematical Analysis, https://www.goodreads.com/
review/show/1271096254?book_show_action=true&from_review_page=1.
[4] W. Rudin, Principles of mathematical analysis, McGraw-Hill Book Company, Inc., New York-
Toronto-London, 1953. MR0055409
[5] W. Rudin, Real and complex analysis, McGraw-Hill Book Co., New York-Toronto, Ont.-
London, 1966. MR0210528
[6] H. S. Wilf, Epsilon sandwiches, https://www.math.upenn.edu/~wilf/website/MAASpeech.
1965
Fast Fourier Transform
Introduction
There are many important milestones in our efforts to find better and faster
ways to solve problems. One of the most important is the fast Fourier trans-
form (FFT), developed in 1965 by James William Cooley (1926–2016) and John
Tukey (1915–2000). It is (unintentionally) based upon tools first developed by Carl
Friedrich Gauss in 1805 to calculate the coefficients in a trigonometric expansion
related to the trajectories of two asteroids. The FFT has had a tremendous impact
upon the engineering community, particularly in the field of digital signal process-
ing.
A discrete periodic function with period n can be thought of as a function
whose domain is the cyclic group Z/nZ. Such functions arise naturally not only
in abstract algebra and number theory, but also in many real-world applications.
For example, a real- or complex-valued function on Z/nZ can be regarded as the
discretization of a continuous, periodic function; see Figure 1. The discrete Fourier
transform (DFT) of f : Z/nZ → C is the function $\hat{f}$ : Z/nZ → C defined by
$$\hat{f}(j) = \frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} f(k) e^{-2\pi i jk/n},$$
in which $i^2 = -1$. The DFT is used to analyze the strength of the “signal” f at
various frequencies. The normalization $1/\sqrt{n}$ is not universal: 1 and 1/n are also
used. Since periodic functions arise anytime there are waves or vibrations, the DFT
is used to analyze everything from radio waves to earthquakes.
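The definition can be transcribed almost literally into code. The sketch below evaluates the sum directly, with the $1/\sqrt{n}$ normalization used in this entry; the function name `dft` and the representation of f as a Python list `[f(0), ..., f(n-1)]` are assumptions made for illustration.

```python
import cmath
import math


def dft(values):
    """Direct evaluation of the DFT sum with 1/sqrt(n) normalization.

    values[k] plays the role of f(k); the result at index j is f-hat(j).
    Each of the n outputs is a sum of n terms, so this takes about n**2 steps.
    """
    n = len(values)
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(values[k] * cmath.exp(-2j * cmath.pi * j * k / n)
                    for k in range(n))
        for j in range(n)
    ]


constant = dft([1, 1, 1, 1])   # a constant signal has only the frequency j = 0
impulse = dft([1, 0, 0, 0])    # a single spike contains every frequency equally
print(constant)
print(impulse)
```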
The FFT reduces the number of computations required to compute the discrete
Fourier transform from $O(n^2)$ to $O(n \log n)$. Since $n \log n$ tends to infinity much
more slowly than $n^2$, the FFT provides a huge savings when n is large. This
illustrates that a problem which appears to require a certain amount of time or
effort may be susceptible to a more clever, faster approach; see the 2002 entry for
another striking example. Our problem for this year involves such a problem: how
fast can one multiply two matrices? We must be specific about how we measure
the speed of an algorithm. Addition is somewhat faster than multiplication on
a computer, so one often counts the number of multiplications required by an
algorithm as a measure of its approximate runtime. In particular, one wants to
know how the algorithm performs as the size of the input increases.
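Returning to the DFT for a moment, the source of the $O(n \log n)$ count can be seen in a short sketch. The version below is the radix-2 special case, which assumes that n is a power of two: a length-n transform is assembled from the transforms of the even-indexed and odd-indexed samples. The function names `fft` and `_fft_unnormalized` are illustrative, and the $1/\sqrt{n}$ scaling follows the convention of this entry.

```python
import cmath
import math


def _fft_unnormalized(values):
    """Recursive radix-2 step: split into even- and odd-indexed samples."""
    n = len(values)
    if n == 1:
        return list(values)
    even = _fft_unnormalized(values[0::2])
    odd = _fft_unnormalized(values[1::2])
    half = n // 2
    out = [0j] * n
    for j in range(half):
        twiddle = cmath.exp(-2j * cmath.pi * j / n) * odd[j]
        out[j] = even[j] + twiddle          # frequency j
        out[j + half] = even[j] - twiddle   # frequency j + n/2
    return out


def fft(values):
    """DFT with 1/sqrt(n) normalization; len(values) must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    scale = 1 / math.sqrt(n)
    return [scale * v for v in _fft_unnormalized(values)]


# A pure wave of frequency m should produce a single spike of height sqrt(n).
n, m = 8, 3
signal = [cmath.exp(2j * cmath.pi * m * k / n) for k in range(n)]
spectrum = fft(signal)
print([round(abs(c), 6) for c in spectrum])
```

Each level of the recursion does work proportional to n, and there are $\log_2 n$ levels, which accounts for the $n \log n$ total.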
(a) A periodic function with period 1.
(b) A discretization (with 100 sample points) over one period of the periodic
function above.
Figure 1. To analyze a periodic function, one can sample the

function over one period at n evenly spaced points. The resulting
discretized function can be regarded as a function on Z/nZ.

Proposed by Steven J. Miller, Williams College, and Bree Yeates, Em-
poria State University.
If A and B are n × n matrices, then there are $n^2$ entries in the product AB. Each
entry apparently requires n multiplications and n − 1 additions to compute. Thus,
computing AB can be done with $n^3$ multiplications. Show that we can cleverly
group terms and compute the four entries in the product of two 2 × 2 matrices with
just seven multiplications (and 18 additions).
1965: Comments
Matrix multiplication. The method suggested by the centennial problem
can be iterated to provide an algorithm for multiplying two n × n matrices with
only $O(n^{2.8074})$ multiplications. The exponent $\log_2 7 \approx 2.8074$, which improves
upon the $\log_2 8 = 3$ provided by the naive algorithm, reflects the fact that only seven
multiplications (instead of eight) are required with each iteration. This method of
matrix multiplication is known as the Strassen algorithm, due to Volker Strassen
(1936– ) [6]. Since its introduction in 1969, there have been many incremental
improvements; see Figure 2. The current world record is an $O(n^{2.3728639})$ algorithm
due to François Le Gall in 2014 [3].
Figure 2. Best known exponents for matrix multiplication
(extrapolated from the image
https://commons.wikimedia.org/wiki/File:Bound_on_matrix_multiplication_omega_over_time.svg
which is in the public domain).
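For readers who have already attempted the centennial problem, here is a sketch of one standard grouping that achieves seven multiplications and eighteen additions or subtractions for 2 × 2 matrices. It is the usual textbook presentation of Strassen's products, not necessarily the grouping the proposers had in mind, and the names `strassen_2x2` and `m1`, ..., `m7` are only labels.

```python
def strassen_2x2(a, b):
    """Multiply 2 x 2 matrices (lists of rows) using 7 multiplications."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b

    # Seven products; forming their factors costs 10 additions/subtractions.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # Recombining costs 8 more additions/subtractions, for 18 in total.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


product = strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)
```

Nothing in the formulas uses commutativity of the entries, so the entries may themselves be matrix blocks; applying the same recipe recursively to blocks is what produces the exponent $\log_2 7$.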
For the most part these algorithms are only of theoretical interest since their
numerical stability is inferior to that of the naive method. Moreover, the constants
hidden by the Big-O notation can be prohibitively large. On the other hand, the
Strassen algorithm can be used effectively over finite fields, in which numerical
accuracy is irrelevant because the computations are performed exactly [7].
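Fourier matrix. Let ζ = exp(2πi/n). The matrix representation of the n-point
DFT with respect to the standard basis of $C^n$ is the complex conjugate of
$$F_n = \frac{1}{\sqrt{n}} \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \zeta & \zeta^2 & \cdots & \zeta^{n-1} \\ 1 & \zeta^2 & \zeta^4 & \cdots & \zeta^{2(n-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \zeta^{n-1} & \zeta^{2(n-1)} & \cdots & \zeta^{(n-1)^2} \end{bmatrix}. \tag{1965.1}$$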
Table 1. The eigenvalues of the n × n Fourier matrix Fn depend

upon n (mod 4).
n +1 −1 −i i
4k k+1 k k k−1
4k + 1 k+1 k k k
4k + 2 k+1 k+1 k k
4k + 3 k+1 k+1 k+1 k
This is the Fourier matrix of order n. It is unitary, meaning that $F_n^{-1} = F_n^*$,
in which $F_n^*$ is the conjugate transpose of $F_n$. Some computations with finite
geometric series confirm that $F_n^2$ is a permutation matrix (with rows and columns
indexed from 0, its $(j, l)$ entry is 1 if $j + l \equiv 0 \pmod{n}$ and 0 otherwise) whose
square is I, and hence $F_n^4 = I$. Thus, the eigenvalues of
$F_n$ are among 1, −1, i, −i; see Table 1. Since the trace of a matrix is the sum of its
eigenvalues, repeated according to multiplicity, the multiplicities of the eigenvalues
of $F_n$ can be deduced from the evaluation of the quadratic Gauss sum
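$$\sum_{k=0}^{n-1} \zeta^{k^2} =
\begin{cases}
(1+i)\sqrt{n} & \text{if } n \equiv 0 \pmod 4,\\
\sqrt{n} & \text{if } n \equiv 1 \pmod 4,\\
0 & \text{if } n \equiv 2 \pmod 4,\\
i\sqrt{n} & \text{if } n \equiv 3 \pmod 4.
\end{cases}$$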
The preceding formula is not at all obvious! Although the magnitude of the sum
can be found relatively easily, its argument is much harder to pin down. As Gauss
confided to Wilhelm Olbers (1758–1840) in 1805 [4]:
. . . the determination of the sign, is exactly what has tortured me all
the time. This shortcoming spoiled everything else that I found; and
hardly a week passed during the last four years where I have not made
this or that vain attempt to untie that knot—especially vigorously
during recent times. But all this brooding and searching was in vain,
sadly I had to put the pen down again. Finally, a few days ago, it has
been achieved—but not by my cumbersome search, rather through
God’s good grace, I am tempted to say. As the lightning strikes the
riddle was solved; I myself would be unable to point to a guiding thread
between what I knew before, what I had used in my last attempts, and
what made it work. Curiously enough the solution now appears to me
to be easier than many other things that have not detained me as many
days as this one years, and surely noone whom I will once explain the
material will get an idea of the tight spot into which this problem had
locked me for so long.
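Returning to the evaluation of the quadratic Gauss sum stated before Gauss's letter: although the proof is hard, the formula is easy to test numerically. The following Python sketch is only a sanity check in floating-point arithmetic, not a proof; the function names `gauss_sum` and `predicted` are ours.

```python
import cmath
import math


def gauss_sum(n):
    """Return sum_{k=0}^{n-1} zeta^(k^2), where zeta = exp(2*pi*i/n)."""
    # Reducing k^2 modulo n first keeps the argument of exp small and accurate.
    return sum(cmath.exp(2j * math.pi * ((k * k) % n) / n) for k in range(n))


def predicted(n):
    """The closed form for the Gauss sum, which depends only on n mod 4."""
    root = math.sqrt(n)
    return [(1 + 1j) * root, root, 0, 1j * root][n % 4]


for n in range(1, 41):
    assert abs(gauss_sum(n) - predicted(n)) < 1e-9

print("closed form confirmed numerically for n = 1, ..., 40")
```

Dividing the Gauss sum by $\sqrt{n}$ gives the trace of $F_n$, which is how the sum enters the eigenvalue count.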
Horner’s method. The FFT and Strassen algorithm provide much more
rapid methods for performing important computations than the naive approaches
suggested by the definitions. Although we refrain from discussing the technical
details of these algorithms, we can at least discuss a simpler algorithm for the rapid
evaluation of polynomials. This hints at the sort of creative thinking and clever
repackaging that is often required to “beat” the approach suggested by definitions.
Although it appears that evaluating
$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$
requires
$$1 + 2 + \cdots + n = \tfrac{1}{2} n(n + 1) = O(n^2)$$
multiplications and $n$ additions, there is a faster approach. Horner’s method, named
after William George Horner (1786–1837), was known to Chinese mathematicians
over 2,000 years ago. It computes $f(x)$ as follows:
$$f(x) = a_0 + x(a_1 + x(a_2 + \cdots + x(a_{n-1} + a_n x)\cdots)).$$
This requires only $n$ multiplications and $n$ additions, an order of magnitude savings
over the naive method.
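The nested form translates directly into a short loop. Here is a minimal Python sketch; the function name `horner` and the convention that the coefficients are listed from $a_0$ up to $a_n$ are our own choices.

```python
def horner(coeffs, x):
    """Evaluate a_0 + a_1*x + ... + a_n*x^n, where coeffs = [a_0, a_1, ..., a_n].

    Works from the innermost parenthesis outward, using exactly n
    multiplications and n additions.
    """
    result = coeffs[-1]                 # start with a_n
    for a in reversed(coeffs[:-1]):     # then a_{n-1}, ..., a_1, a_0
        result = result * x + a
    return result


# f(x) = 1 + 2x + 3x^2 at x = 2 is 1 + 4 + 12 = 17.
print(horner([1, 2, 3], 2))
```

The same loop works unchanged for floating-point numbers, fractions, or residues modulo an integer, since it uses only addition and multiplication.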
We can sometimes do even better if the polynomial has a special form. For
example, the evaluation of $x^n$ can be done in at most $2 \log_2 n$ steps; see the 1977
and 1996 entries, which concern applications of fast multiplication to encryption
and the search for large prime numbers.
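One standard way to reach the $2 \log_2 n$ bound is repeated squaring, which reads off the binary digits of $n$. The sketch below is our own illustration (the name `power` and the multiplication counter are not from the text). It counts only genuine multiplications, so the trivial first step "multiply by 1" is skipped.

```python
def power(x, n):
    """Return (x**n, number of multiplications used) for an integer n >= 1.

    Repeated squaring: scan the binary digits of n from least significant
    to most significant, squaring a running base at each step.
    """
    result = None      # no factor collected yet
    base = x           # holds x^(2^j) at step j
    mults = 0
    while n > 0:
        if n & 1:                      # this binary digit is 1
            if result is None:
                result = base          # first factor: no multiplication needed
            else:
                result *= base
                mults += 1
        n >>= 1
        if n:                          # more digits remain: square the base
            base *= base
            mults += 1
    return result, mults


value, count = power(3, 13)
print(value == 3 ** 13, count)
```

If $n$ has $L$ binary digits, the loop performs $L - 1$ squarings and at most $L - 1$ further multiplications, for a total of at most $2(L - 1) \leq 2 \log_2 n$.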
Bibliography
[1] C. Burrus, Fast Fourier Transforms. http://cnx.org/content/col10550/1.22/
[2] M. T. Heideman, D. H. Johnson, and C. S. Burrus, Gauss and the history of the fast
Fourier transform, Arch. Hist. Exact Sci. 34 (1985), no. 3, 265–277, DOI 10.1007/BF00348431.
MR815154
[3] F. Le Gall, Powers of tensors and fast matrix multiplication, ISSAC 2014—Proceedings of the
39th International Symposium on Symbolic and Algebraic Computation, ACM, New York,
2014, pp. 296–303, DOI 10.1145/2608628.2608664. MR3239939
[4] S. J. Patterson, Gauss sums, The shaping of arithmetic after C. F. Gauss’s Disquisitiones
arithmeticae, Springer, Berlin, 2007, pp. 505–528, DOI 10.1007/978-3-540-34720-0_19. MR2284818
[5] J. M. Pollard, The fast Fourier transform in a finite field, Math. Comp. 25 (1971), 365–374,
DOI 10.2307/2004932. http://www.ams.org/journals/mcom/1971-25-114/S0025-5718-1971-
0301966-0/S0025-5718-1971-0301966-0.pdf. MR0301966
[6] V. Strassen, Gaussian elimination is not optimal, Numer. Math. 13 (1969), 354–356, DOI
10.1007/BF02165411. MR0248973
[7] Wikipedia, Matrix multiplication algorithm,
https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm.
1966
Class Number One Problem
Introduction
A binary, integral quadratic form is a function $Q : \mathbb{Z}^2 \to \mathbb{Z}$ of the form
$$Q(x, y) = ax^2 + bxy + cy^2, \tag{1966.1}$$
in which a, b, c are integers. We require that a and c be nonzero to avoid trivialities
and we often drop the adjectives “binary” and “integral” in what follows. Despite
their simple appearance, quadratic forms have an incredibly rich structure. Carl
Friedrich Gauss developed much of the theory of quadratic forms in his landmark
book Disquisitiones Arithmeticae.
Two quadratic forms $Q$ and $Q'$ are equivalent if
$$Q'(x, y) = Q(\alpha x + \beta y, \gamma x + \delta y),$$
in which $\alpha, \beta, \gamma, \delta \in \mathbb{Z}$ and $\alpha\delta - \beta\gamma = \pm 1$; that is, $(\alpha x + \beta y, \gamma x + \delta y)$ is related to
$(x, y)$ by a $2 \times 2$ integer matrix whose determinant is 1 or −1. The discriminant of
(1966.1) is
$$D = b^2 - 4ac.$$
One can show that equivalent forms share the same discriminant. We denote the
number of equivalence classes of quadratic forms with discriminant D by h(D);
this is the class number of D (see Table 1). For the sake of simplicity, we assume
throughout the following that D < 0 and a > 0, in which case Q is positive definite:
Q(x, y) > 0 for all (x, y) ∈ Z² other than (0, 0).
Gauss proved that h(D) is always finite and discovered that the set of equiva-
lence classes of quadratic forms of discriminant D forms an abelian group of order
h(D) under a complicated operation now known as “Gauss composition.” This
was illuminated by Fields Medalist Manjul Bhargava, who discovered many higher-
order composition laws. In particular, the composition of quadratic forms can now
be conveniently represented with so-called Bhargava cubes [2].
Gauss’s legendary class number one problem asserts that $D > 0$ satisfies $h(-D) = 1$
if and only if
$$D \in \{3, 4, 7, 8, 11, 12, 16, 19, 27, 28, 43, 67, 163\}.$$
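For small discriminants, the class number can be computed by listing reduced forms. The Python sketch below is an illustration under stated assumptions, not a reproduction of Gauss's method: it uses the standard convention of counting only primitive forms (those with gcd(a, b, c) = 1) up to proper equivalence (determinant +1), which is the convention under which the list above holds. It relies on the classical fact that each such class contains exactly one reduced form, meaning $-a < b \le a \le c$, with $b \ge 0$ whenever $a = c$. The function name `class_number` is ours.

```python
from math import gcd, isqrt


def class_number(D):
    """Count reduced primitive forms ax^2 + bxy + cy^2 of discriminant D < 0.

    Assumes D is congruent to 0 or 1 modulo 4 (otherwise no form exists).
    A reduced form satisfies -a < b <= a <= c, and b >= 0 if a == c; these
    conditions force 3a^2 <= |D|, so the search is finite.
    """
    count = 0
    for a in range(1, isqrt((-D) // 3) + 1):
        for b in range(-a + 1, a + 1):
            numerator = b * b - D          # equals 4ac when b^2 - 4ac = D
            if numerator % (4 * a):
                continue
            c = numerator // (4 * a)
            if c < a:
                continue
            if a == c and b < 0:
                continue
            if gcd(a, gcd(b, c)) != 1:     # skip imprimitive forms
                continue
            count += 1
    return count


gauss_list = [3, 4, 7, 8, 11, 12, 16, 19, 27, 28, 43, 67, 163]
print(all(class_number(-d) == 1 for d in gauss_list))
```

For example, for $D = -23$ the reduced forms are $x^2 + xy + 6y^2$ and $2x^2 \pm xy + 3y^2$, so `class_number(-23)` returns 3.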
It is more convenient these days to treat things in terms of imaginary quadratic
fields $\mathbb{Q}[\sqrt{-D}]$ instead of quadratic forms. Consequently, it suffices to consider
only square-free $D$ since removing a perfect-square divisor of $D$ results in the same
field. In this context, $\mathbb{Q}[\sqrt{-D}]$ is said to have class number one if its “ideal class
group” is trivial. This occurs if and only if the ring of integers in $\mathbb{Q}[\sqrt{-D}]$ is a
Table 1. Class numbers for the first 100 imaginary quadratic fields $\mathbb{Q}[\sqrt{-D}]$.
The entries with $h(-D) = 1$ correspond to Gauss’s list (1966.2).
D h(−D) D h(−D) D h(−D) D h(−D) D h(−D) D h(−D)

1 1 33 4 66 8 97 4 131 5 165 8
2 1 34 4 67 1 101 14 133 4 166 10
3 1 35 2 69 8 102 4 134 14 167 11
5 2 37 2 70 4 103 5 137 8 170 12
6 2 38 6 71 7 105 8 138 8 173 14
7 1 39 4 73 4 106 6 139 3 174 12
10 2 41 8 74 10 107 3 141 8 177 4
11 1 42 4 77 8 109 6 142 4 178 8
13 2 43 1 78 4 110 12 143 10 179 5
14 4 46 4 79 5 111 8 145 8 181 10
15 2 47 5 82 4 113 8 146 16 182 12
17 4 51 2 83 3 114 8 149 14 183 8
19 1 53 6 85 4 115 2 151 7 185 16
21 4 55 4 86 10 118 6 154 8 186 12
22 2 57 4 87 6 119 10 155 4 187 2
23 3 58 2 89 12 122 10 157 6 190 4
26 6 59 3 91 2 123 2 158 8 191 13
29 6 61 6 93 4 127 5 159 10 193 4
30 4 62 8 94 8 129 12 161 16 194 20
31 3 65 8 95 8 130 4 163 1 195 4
unique factorization domain. An equivalent form of Gauss’s conjecture claims that
$\mathbb{Q}[\sqrt{-D}]$ with $D > 0$ has class number one if and only if
$$D \in \{1, 2, 3, 7, 11, 19, 43, 67, 163\}. \tag{1966.2}$$
In 1966, Alan Baker (1939–2018) and Harold Stark (1939– ) independently

proved Gauss’s conjecture. Their methods were strikingly different: Baker used
his theory of logarithmic forms [1], whereas Stark studied L-functions and certain
Diophantine equations [6]. Stark’s approach was similar to an attempt of Kurt
Heegner (1893–1965) from 1952 [4], which contained a gap that prevented its general
acceptance for several years [7]. Baker was awarded the Fields Medal in 1970,
in part for his work on the class number one problem; see the 1935 entry for
information about Baker’s work on Hilbert’s seventh problem.

Proposed by Kyle Pratt, Brigham Young University.
Show that the quadratic form $Q(x, y) = x^2 + 7y^2$ represents infinitely many
primes. The first few primes of this form are
7, 11, 23, 29, 37, 43, 53, 67, 71, 79, 107, 109, 113, 127, 137, 149, 151, 163, 179.
Hint: For an odd prime $p$, show that $p = x^2 + 7y^2$ if and only if $p = 7$ or $(-7/p) = 1$, in
which $(\cdot/p)$ denotes the Legendre symbol. Then use the fact that $h(-28) = 1$. See
the notes for a complete solution.
1966: Comments
Primes of the form $x^2 + dy^2$. The study of primes of the form $x^2 + dy^2$ has a
long and storied history [3]. Fermat showed that a prime $p$ is of the form $x^2 + y^2$ if
and only if $p = 2$ or $p \equiv 1 \pmod 4$. Here is a short explanation. Since $2 = 1^2 + 1^2$,
it suffices to consider odd primes $p$. If $p \equiv 1 \pmod 4$, then the method discussed
in the comments for the 1923 entry implies that $p$ is the sum of two squares. On the
other hand, any square is congruent to 0 or 1 (mod 4). Thus, a sum of two squares
cannot be congruent to 3 (mod 4).
Similar criteria are available for many other small values of d:
• $p = x^2 + 2y^2$ iff $p = 2$ or $p \equiv 1, 3 \pmod 8$.
• $p = x^2 + 3y^2$ iff $p = 3$ or $p \equiv 1 \pmod 3$.
• $p = x^2 + 5y^2$ iff $p = 5$ or $p \equiv 1, 9 \pmod{20}$.
• $p = x^2 + 6y^2$ iff $p \equiv 1, 7 \pmod{24}$.
• $p = x^2 + 7y^2$ iff $p = 7$ or $p$ is odd and $p \equiv 1, 2, 4 \pmod 7$.
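Congruence criteria of this kind are easy to test by brute force. The following Python sketch compares each criterion with a direct search for $x$ and $y$, for all primes below a bound. The helper names and the `CRITERIA` table are ours; the table spells out two edge cases explicitly, namely that $5 = 0^2 + 5 \cdot 1^2$ is represented when $d = 5$, and that $p = 2$ is not of the form $x^2 + 7y^2$ even though $2 \equiv 2 \pmod 7$. Agreement on finitely many primes is evidence, not a proof.

```python
from math import isqrt


def is_prime(p):
    """Trial division; adequate for the small range used here."""
    return p >= 2 and all(p % q for q in range(2, isqrt(p) + 1))


def is_represented(p, d):
    """True if p = x^2 + d*y^2 for some integers x, y (brute-force search)."""
    for y in range(isqrt(p // d) + 1):
        rest = p - d * y * y
        if isqrt(rest) ** 2 == rest:
            return True
    return False


# One congruence criterion per value of d.
CRITERIA = {
    1: lambda p: p == 2 or p % 4 == 1,
    2: lambda p: p == 2 or p % 8 in (1, 3),
    3: lambda p: p == 3 or p % 3 == 1,
    5: lambda p: p == 5 or p % 20 in (1, 9),
    6: lambda p: p % 24 in (1, 7),
    7: lambda p: p == 7 or (p != 2 and p % 7 in (1, 2, 4)),
}


def check(limit=2000):
    """Compare each criterion with brute force for all primes below limit."""
    for d, criterion in CRITERIA.items():
        for p in filter(is_prime, range(2, limit)):
            assert is_represented(p, d) == criterion(p), (d, p)
    return True


print(check())
print([p for p in range(2, 180) if is_prime(p) and is_represented(p, 7)])
```

The final line lists the primes below 180 of the form $x^2 + 7y^2$, for comparison with the list in the problem statement.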
Sums of squares. Let $r(n)$ denote the number of decompositions of $n$ as the
sum of two squares. We count decompositions as distinct even if they differ only in
sign or order. For instance, since
$$5 = (\pm 2)^2 + (\pm 1)^2 = (\pm 1)^2 + (\pm 2)^2,$$
we say that $r(5) = 8$. If $p$ is prime, then one can show that
$$r(p) =
\begin{cases}
4 & \text{if } p = 2,\\
8 & \text{if } p \equiv 1 \pmod 4,\\
0 & \text{if } p \equiv 3 \pmod 4.
\end{cases}$$
More generally, if $n = 2^a m_1 m_2$, in which $m_1$ is a product of primes of the form
$4k + 1$ and $m_2$ is a product of primes of the form $4k + 3$, then
$$r(n) =
\begin{cases}
0 & \text{if } m_2 \text{ is not a perfect square},\\
4d(m_1) & \text{if } m_2 \text{ is a perfect square}.
\end{cases}$$
Here $d(m_1)$ denotes the number of divisors of $m_1$. Evidently, the behavior of $r(n)$
is erratic; see Figure 1.

Figure 1. Plot of $r(n)$ for $1 \le n \le 1{,}000{,}000$. The maximum
value of $r(n)$ in this range belongs to $n = 801{,}125 = 5^3 \cdot 13 \cdot 17 \cdot 29$.

On the other hand, the arithmetic mean
$$A_n = \frac{r(0) + r(1) + \cdots + r(n)}{n+1}$$
of $r(n)$ does something remarkable:
$$\lim_{n \to \infty} A_n = \pi. \tag{1966.3}$$
Why does (1966.3) hold? First, observe that $r(n)$ equals the number of points in
$\mathbb{Z}^2$ that lie on the circle $x^2 + y^2 = n$ with center $(0, 0)$ and radius $\sqrt{n}$. Consequently,
$$r(0) + r(1) + \cdots + r(n) \tag{1966.4}$$
is the number of points in $\mathbb{Z}^2$ that lie in the disk
$$D_n = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 \le n\}$$
of radius $\sqrt{n}$ centered at the origin. Thus, (1966.3) says that for large $n$, the
expression (1966.4) is approximately equal to the area $\pi n$ of the disk $D_n$.
For each point of $D_n \cap \mathbb{Z}^2$, we associate the square of area 1 of which it forms
the lower left-hand corner.¹ The area of the region $R_n$ that is formed by the union
of these squares is (1966.4). In certain places $R_n$ extends past the boundary of
$D_n$ while in other places $D_n$ extends beyond $R_n$. Since the squares have diagonal
$\sqrt{2}$, it follows that the region $R_n$ is contained in the disk of radius $\sqrt{n} + \sqrt{2}$ centered
at the origin. On the other hand, the region $R_n$ contains the disk of radius $\sqrt{n} - \sqrt{2}$
centered at the origin; see Figure 2.
Consequently,
$$\pi(\sqrt{n} - \sqrt{2})^2 \le r(0) + r(1) + \cdots + r(n) \le \pi(\sqrt{n} + \sqrt{2})^2,$$
from which it follows that
$$\pi + \frac{\pi - 2\pi\sqrt{2n}}{n+1} \le A_n \le \pi + \frac{\pi + 2\pi\sqrt{2n}}{n+1}.$$
Take the limit as $n \to \infty$ to obtain (1966.3). In fact, the preceding inequalities tell
us that $A_n = \pi + O(1/\sqrt{n})$, so the convergence is relatively slow. For example,
$A_{1{,}000{,}000} = 3.141545858\ldots$, which is only accurate to four decimal places.
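The lattice-point interpretation also gives a practical way to compute $A_n$ without evaluating $r(k)$ for every $k$. In the Python sketch below (function names are ours), `r` counts representations directly from the definition, while `lattice_points` counts the points of $\mathbb{Z}^2$ in the disk $D_n$ one vertical line at a time: for each integer $x$ with $x^2 \le n$ there are $2\lfloor\sqrt{n - x^2}\rfloor + 1$ admissible values of $y$.

```python
from math import isqrt, pi


def r(n):
    """Number of (x, y) in Z^2 with x^2 + y^2 = n, counting signs and order."""
    count = 0
    m = isqrt(n)
    for x in range(-m, m + 1):
        y_squared = n - x * x
        y = isqrt(y_squared)
        if y * y == y_squared:
            count += 1 if y == 0 else 2    # y and -y coincide only when y = 0
    return count


def lattice_points(n):
    """Number of (x, y) in Z^2 with x^2 + y^2 <= n, i.e. r(0) + ... + r(n)."""
    m = isqrt(n)
    return sum(2 * isqrt(n - x * x) + 1 for x in range(-m, m + 1))


def mean_r(n):
    """The arithmetic mean A_n = (r(0) + ... + r(n)) / (n + 1)."""
    return lattice_points(n) / (n + 1)


print(r(5), r(801125))
for n in (10**2, 10**4, 10**6):
    print(n, mean_r(n), mean_r(n) - pi)
```

The printed differences shrink roughly like $1/\sqrt{n}$, in line with the error estimate above.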
1 Any other corner, or even the center of the square, would work too. The important thing is
to pick a convention and remain consistent.

Figure 2. The region $R_n$ is contained inside of the disk of radius
$\sqrt{n} + \sqrt{2}$ that is centered at the origin. It contains the disk of
radius $\sqrt{n} - \sqrt{2}$ that is centered at the origin.
Solution to the problem. Fix an odd prime $p \neq 7$. It suffices to show that
$$p = x^2 + 7y^2 \iff (-7/p) = 1.$$
Indeed, quadratic reciprocity ensures that $(-7/p) = 1$ if and only if $p \equiv 1, 2, 4 \pmod 7$
and Dirichlet’s theorem on primes in arithmetic progressions yields infinitely
many primes congruent to 1, 2, or 4 modulo 7; see the 1913 entry.
Suppose that $p = x^2 + 7y^2$; in particular, this implies that $\gcd(x, p) = \gcd(y, p) = 1$.
Since $x^2 = p - 7y^2$, it follows that
$$x^2 \equiv -7y^2 \pmod p \quad\text{and hence}\quad (xy^{-1})^2 \equiv -7 \pmod p,$$
so that $(-7/p) = 1$. Conversely, suppose that $(-7/p) = 1$. Then there is a $b'$ such
that $-7 \equiv (b')^2 \pmod p$, which implies that
$$-28 \equiv (2b')^2 \pmod{4p}.$$
In particular, observe that $-28$ is the discriminant of $Q(x, y) = x^2 + 7y^2$. Then
$$b = 2b' \quad\text{and}\quad c = \frac{b^2 + 28}{4p}$$
are integers. Since $h(-28) = 1$ and because the discriminant of the quadratic form
$$Q'(x, y) = px^2 + bxy + cy^2$$
is $-28$, it follows that $Q'$ is equivalent to $Q$. Since $Q'(1, 0) = p$, it follows that $Q$
also represents $p$; that is, $p$ is of the form $x^2 + 7y^2$.
Bibliography
[1] A. Baker, Linear forms in the logarithms of algebraic numbers. IV, Mathematika 15 (1968),
204–216, DOI 10.1112/S0025579300002588. MR0258756
[2] M. Bhargava, Higher composition laws. I. A new view on Gauss composition, and
quadratic generalizations, Ann. of Math. (2) 159 (2004), no. 1, 217–250,
DOI 10.4007/annals.2004.159.217. MR2051392
[3] D. A. Cox, Primes of the form x2 +ny 2 : Fermat, class field theory, and complex multiplication,
2nd ed., Pure and Applied Mathematics (Hoboken), John Wiley & Sons, Inc., Hoboken, NJ,
2013. MR3236783
[4] K. Heegner, Diophantische Analysis und Modulfunktionen (German), Math. Z. 56 (1952),
227–253, DOI 10.1007/BF01174749. MR0053135
[5] D. Shanks, On Gauss’s class number problems, Math. Comp. 23 (1969), 151–163,
DOI 10.2307/2005064. http://www.ams.org/journals/mcom/1969-23-105/S0025-5718-1969-
0262204-1/S0025-5718-1969-0262204-1.pdf. MR0262204
[6] H. Stark, On complex quadratic fields with class number equal to one, Trans. Amer. Math.
Soc. 122 (1966), 112–119, DOI 10.2307/1994504. http://www.ams.org/journals/tran/1966-
122-01/S0002-9947-1966-0195845-4/S0002-9947-1966-0195845-4.pdf. MR0195845
[7] H. M. Stark, On the “gap” in a theorem of Heegner, J. Number Theory 1 (1969), 16–27, DOI
10.1016/0022-314X(69)90023-7. MR0241384
[8] H. M. Stark, A complete determination of the complex quadratic fields of class-number one,
Michigan Math. J. 14 (1967), 1–27.
[9] H. M. Stark, The Gauss class-number problems, Analytic number theory, Clay Math. Proc.,
vol. 7, Amer. Math. Soc., Providence, RI, 2007, pp. 247–256. http://www.claymath.org/
publications/Gauss_Dirichlet/stark.pdf. MR2362205
1967
The Langlands Program
Introduction
A handwritten letter from Robert Langlands (1936– ) to André Weil begins
modestly:
While trying to formulate clearly the question I was asking you before
Chern’s talk I was led to two more general questions. Your opinion of
these questions would be appreciated. I have not had a chance to think
over these questions seriously and I would not ask them except as the
continuation of a casual conversation. I hope you will treat them with
the tolerance they require at this stage. After I have asked them I will
comment briefly on their genesis. [5]
This 1967 letter was a tour de force, a manifesto that would shape the next half-
century (and more) of number theory. The main characters in Langlands’s drama
are automorphic forms: functions on a topological space that are invariant under
a discrete group of symmetries (the actual definition is much longer and more
technical). There are two crucial supporting characters.
First we have Galois representations; these are homomorphisms $\operatorname{Gal}(\overline{\mathbb{Q}}/\mathbb{Q}) \to \operatorname{GL}_n(\mathbb{C})$,
in which $\overline{\mathbb{Q}}$ is the algebraic closure of $\mathbb{Q}$ and $\operatorname{GL}_n(\mathbb{C})$ is the group of $n \times n$
invertible complex matrices. This Galois group is one of the richest objects in algebraic
number theory and describing its representations is a complicated problem.
We also have L-functions, of which the simplest example is the Riemann zeta
function:
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p \text{ prime}} \left(1 - \frac{1}{p^s}\right)^{-1}. \tag{1967.1}$$
The equality of the sum and the product is the famed Euler product formula; see the
1933 entry. Every L-function can be written as a product over primes in this way.
They extend meromorphically to s ∈ C, with a certain symmetry with respect to
s → 1 − s. Miraculously, L-functions encode all kinds of data, from the distribution
of prime numbers to the number of points on algebraic varieties.
The Langlands program, to put it roughly, aims to show that behind every
Galois representation or L-function there is an automorphic form. For example,
suppose that we have an L-function
$$L(s) = \prod_{p \text{ prime}} \left(1 - \frac{\alpha_p}{p^s}\right)^{-1} \left(1 - \frac{\beta_p}{p^s}\right)^{-1},$$
in which $\alpha_p, \beta_p \in \mathbb{C}$ and $\alpha_p \beta_p = 1$. This kind of L-function might come from an
elliptic curve, a representation $\operatorname{Gal}(\overline{\mathbb{Q}}/\mathbb{Q}) \to \operatorname{GL}_2(\mathbb{C})$, a modular form, or a more
mysterious Maass form. In the case of an elliptic curve (see the 1921 entry), $\alpha_p$
and $\beta_p$ are functions of the number of points on the curve modulo $p$. Langlands
conjectured that not only do these L-functions have automorphic forms behind
them, but so too do the “symmetric power L-functions”
$$L_r(s) = \prod_{p \text{ prime}} \prod_{i=0}^{r} \left(1 - \frac{\alpha_p^i \beta_p^{r-i}}{p^s}\right)^{-1}.$$
Just the convergence of the symmetric powers implies two famous conjectures:
the Ramanujan conjecture (all $\alpha_p, \beta_p$ are on the unit circle) and the Sato–Tate
conjecture (they are equidistributed on the circle).
The Langlands program encompasses a vast range of conjectures and theo-
rems, more than one person could ever prove. For example, class field theory is
the simplest case of the Langlands program. Andrew Wiles’s proof of Fermat’s last
theorem? That is part of the next simplest case. There have been huge break-
throughs on the Langlands program since 1967, such as the proof of the so-called
fundamental lemma by Ngô Bao Châu (1972– ). He received the Fields Medal in
2010 for this result. However, we will almost certainly be working on the Langlands
program for years to come.

Proposed by Ian Whitehead, Macalester College.
In this problem we will show that Langlands’s conjecture for symmetric power
L-functions implies the Ramanujan conjecture. Consider one factor of the product
of symmetric power L-functions $L_0(s) L_2(s) L_4(s) \cdots L_{2m}(s)$:
$$\prod_{r=0}^{m} \prod_{i=0}^{2r} \left(1 - \alpha^i \beta^{2r-i} x\right)^{-1},$$
in which we have substituted $\alpha, \beta$ for $\alpha_p, \beta_p$, and $x$ for $p^{-s}$. Assume that $\alpha\beta = 1$
and $\alpha + \beta \in \mathbb{R}$. Prove that this expands as a power series in $x$ with positive real
coefficients. (Hint: Take a logarithm first.) This fact, together with Langlands’s
conjecture, implies that the series converges for $x < 1/p$, regardless of $m$. Conclude
that $|\alpha| = |\beta| = 1$.
1967: Comments
It is impossible to do the Langlands program justice in a few short paragraphs
and hence we make no attempt to do so. Instead we focus on a couple of tangential
results that are of a more elementary nature.
Euclid’s theorem revisited. We derived the Euler product formula (1967.1)
in the notes to the 1933 entry. In the notes for the 1919 entry, we showed that
$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6},$$
an old result due to Leonhard Euler (see also the 1939 and 1973 entries). Put these
two results together and obtain
$$\prod_{p \text{ prime}} \left(1 - \frac{1}{p^2}\right)^{-1} = \frac{\pi^2}{6}.$$
Since $\pi^2/6$ is irrational, the preceding product must include infinitely many terms;
that is, there are infinitely many primes. This provides another proof of Euclid’s
theorem. Armed with the finiteness of the irrationality measure of $\pi^2/6$, one can
modify this proof and obtain lower bounds on the prime counting function [2].
We should be more careful, however. The irrationality of $\pi$ does not automatically
imply that $\pi^2$ is irrational. For example, $\sqrt{2}$ is irrational (see the 1951 entry),
but its square is an integer. Fortunately, $\pi$ is transcendental (proved by Lindemann
in 1882) and hence $\pi^2$ is irrational. Indeed, if $\pi^2$ were rational, then $\pi$ would be
algebraic since it is a root of $x^2 - \pi^2$, which is assumed to have rational coefficients.
The field of algebraic numbers. Fundamental to the Langlands program
is $\overline{\mathbb{Q}}$, the algebraic closure of $\mathbb{Q}$. One can show that $\overline{\mathbb{Q}}$ = A, the set of algebraic
numbers that we encountered in the 1918 entry. Recall that an algebraic number
is a complex number that is a zero of a polynomial with rational coefficients. Con-
sequently, we are asserting that A is an algebraically closed field: any polynomial
with coefficients in A has a root in A. It is not even clear that A is a field. For
example, why are the sum and product of algebraic numbers algebraic?
The proof that the algebraic numbers form a field is standard fare for abstract
algebra texts. However, there is a concrete proof that uses only linear algebra. This
has the added benefit of exposing students to the notion of tensor products, before
they are introduced in a graduate algebra class as bifunctors on monoidal categories
subject to certain coherence conditions; that is, as “abstract nonsense.” It is much
better to see concrete examples with matrices before diving into whatever Figure 1
entails!
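[Figure 1: a commutative pentagon with vertices $((A \otimes B) \otimes C) \otimes D$, $(A \otimes (B \otimes C)) \otimes D$,
$A \otimes ((B \otimes C) \otimes D)$, $A \otimes (B \otimes (C \otimes D))$, and $(A \otimes B) \otimes (C \otimes D)$, and arrows labeled
$\alpha_{A,B,C} \otimes 1_D$, $\alpha_{A,B \otimes C,D}$, $1_A \otimes \alpha_{B,C,D}$, $\alpha_{A \otimes B,C,D}$, and $\alpha_{A,B,C \otimes D}$.]
Figure 1. The pentagon diagram from the formal definition of
an abstract tensor product. If such a diagram was your first introduction
to tensor products, then we feel sorry for you.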
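The Kronecker product of an $m \times n$ matrix $A = [a_{ij}]$ and a $p \times q$ matrix $B$ is
the $mp \times nq$ matrix
$$A \otimes B =
\begin{bmatrix}
a_{11}B & a_{12}B & \cdots & a_{1n}B \\
a_{21}B & a_{22}B & \cdots & a_{2n}B \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1}B & a_{m2}B & \cdots & a_{mn}B
\end{bmatrix}.$$
For example, if
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 5 & 6 \end{bmatrix},$$
then
$$A \otimes B = \begin{bmatrix} B & 2B \\ 3B & 4B \end{bmatrix} = \begin{bmatrix} 5 & 6 & 10 & 12 \\ 15 & 18 & 20 & 24 \end{bmatrix}.$$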
This is a concrete instance of a tensor product. You should verify that the Kronecker
product enjoys the following properties:
(a) (A ⊗ B)(C ⊗ D) = AC ⊗ BD,
(b) c(A ⊗ B) = (cA) ⊗ B = A ⊗ (cB),
(c) (A + B) ⊗ C = A ⊗ C + B ⊗ C,
(d) A ⊗ (B + C) = A ⊗ B + A ⊗ C,
(e) A ⊗ (B ⊗ C) = (A ⊗ B) ⊗ C,
in which A, B, C, D are matrices and c is a scalar [3, Sect. 3.6]. Eigenvalues and
eigenvectors are particularly compatible with Kronecker products. From this stems
our current interest in them. If Ax = λx and By = μy, then
(A ⊗ B)(x ⊗ y) = (Ax) ⊗ (By) = (λx) ⊗ (μy) = λμ(x ⊗ y)
and
[(A ⊗ I) + (I ⊗ B)](x ⊗ y) = (A ⊗ I)(x ⊗ y) + (I ⊗ B)(x ⊗ y)
= Ax ⊗ y + x ⊗ By
= λx ⊗ y + μx ⊗ y
= (λ + μ)(x ⊗ y).
Thus, if λ and μ are eigenvalues of A and B, respectively, then λμ is an eigenvalue
of A ⊗ B and λ + μ is an eigenvalue of A ⊗ I + I ⊗ B.
Let
$$f(z) = z^n + c_{n-1} z^{n-1} + c_{n-2} z^{n-2} + \cdots + c_1 z + c_0$$
be a polynomial of degree at least two. The companion matrix of $f$ is
$$C_f =
\begin{bmatrix}
0 & 0 & \cdots & 0 & -c_0 \\
1 & 0 & \cdots & 0 & -c_1 \\
0 & 1 & \cdots & 0 & -c_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & -c_{n-1}
\end{bmatrix}.$$
Induction and cofactor expansion along the top row of $zI - C_f$ reveal that $f$ is
the characteristic polynomial of $C_f$ [3, p. 200]. Consequently, a complex number is
algebraic if and only if it is an eigenvalue of a matrix with rational entries. This is

all the equipment we need to show that the set A of algebraic numbers is a field.
Since A is a subset of the field of complex numbers, it suffices to show that A is
closed under addition, multiplication, and inversion. Let $\alpha, \beta \in$ A and suppose that
$p(\alpha) = q(\beta) = 0$, in which $p$ and $q$ are monic polynomials with rational coefficients.
Then $\alpha, \beta$ are eigenvalues of the rational matrices $C_p$ and $C_q$, respectively. This
means that $\alpha\beta$ is an eigenvalue of the rational matrix $C_p \otimes C_q$, so $\alpha\beta$ is algebraic.
Similarly, $\alpha + \beta$ is an eigenvalue of the rational matrix $C_p \otimes I + I \otimes C_q$ and hence $\alpha + \beta$
is algebraic. What about inversion? If $\alpha \neq 0$ is algebraic and $p$ is a polynomial
with rational coefficients such that $p(\alpha) = 0$, then $1/\alpha$ is a root of the rational
polynomial $z^{\deg p}\, p(z^{-1})$.
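The argument is constructive, so it can be carried out by machine. The Python sketch below builds the companion matrices of $z^2 - 2$ and $z^2 - 3$, forms the two Kronecker combinations, and computes their characteristic polynomials in exact rational arithmetic. All function names are ours. To compute characteristic polynomials we use the Faddeev–LeVerrier recursion, which is a convenience we have swapped in and is not part of the argument in the text; any exact method for the characteristic polynomial would serve.

```python
from fractions import Fraction


def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def companion(c):
    """Companion matrix of z^n + c[n-1] z^(n-1) + ... + c[1] z + c[0]."""
    n = len(c)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        C[i][n - 1] = -c[i]          # last column holds -c_0, ..., -c_{n-1}
        if i > 0:
            C[i][i - 1] = 1          # ones on the subdiagonal
    return C


def char_poly(A):
    """Coefficients of det(zI - A), highest degree first (Faddeev-LeVerrier)."""
    n = len(A)
    M = [[Fraction(0)] * n for _ in range(n)]
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        # M_k = A M_{k-1} + (previous coefficient) * I
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AMk = matmul(A, M)
        coeffs.append(-sum(AMk[i][i] for i in range(n)) / k)
    return coeffs


Cp = companion([-2, 0])   # z^2 - 2, with roots +-sqrt(2)
Cq = companion([-3, 0])   # z^2 - 3, with roots +-sqrt(3)
I2 = identity(2)

sum_matrix = add(kron(Cp, I2), kron(I2, Cq))   # sqrt(2) + sqrt(3) is an eigenvalue
product_matrix = kron(Cp, Cq)                  # sqrt(2) * sqrt(3) is an eigenvalue

print([int(c) for c in char_poly(sum_matrix)])
print([int(c) for c in char_poly(product_matrix)])
```

The first list encodes $z^4 - 10z^2 + 1$, a rational polynomial that vanishes at $\sqrt{2} + \sqrt{3}$; the second encodes $z^4 - 12z^2 + 36 = (z^2 - 6)^2$, which vanishes at $\sqrt{6}$.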
Bibliography
[1] D. Bump, Automorphic forms and representations, Cambridge Studies in Advanced Mathe-
matics, vol. 55, Cambridge University Press, Cambridge, 1997. MR1431508
[2] D. Burt, S. Donow, S. J. Miller, M. Schiffman, and B. Wieland, Irrationality measure and
lower bounds for π(X), Pi Mu Epsilon J. 14 (2017), no. 7, 421–429. https://arxiv.org/abs/
0709.2184. MR3726946
[3] S. R. Garcia and R. A. Horn, A Second Course in Linear Algebra, Cambridge University Press,
2017.
[4] S. Gelbart, An elementary introduction to the Langlands program, Bull. Amer. Math. Soc.
(N.S.) 10 (1984), no. 2, 177–219, DOI 10.1090/S0273-0979-1984-15237-6. http://www.ams.
org/journals/bull/1984-10-02/S0273-0979-1984-15237-6/S0273-0979-1984-15237-6.
pdf. MR733692
[5] R. Langlands, Letter to André Weil, Institute for Advanced Study. http://publications.
ias.edu/rpl/paper/43.
[6] R. P. Langlands, Problems in the theory of automorphic forms, Lectures in modern analysis
and applications, III, Lecture Notes in Math., Vol. 170, Springer, Berlin, 1970, pp. 18–61.
MR0302614
1968
Atiyah–Singer Index Theorem
Introduction
In 1968, Michael Atiyah (1929–2019) and Isadore Singer (1924– ) established
what is now known as the Atiyah–Singer index theorem, a remarkable result that
connects topology and analysis [2, 3]. In 2004, the Norwegian Academy of Science
and Letters awarded the Abel Prize to Atiyah and Singer for this work (the inaugu-
ral award went to Jean-Pierre Serre, whom we met in our 1956 entry). The award
citation proclaims:
The Atiyah-Singer index theorem is one of the great landmarks of
twentieth-century mathematics, influencing profoundly many of the
most important later developments in topology, differential geometry
and quantum field theory. Its authors, both jointly and individu-
ally, have been instrumental in repairing a rift between the worlds of
pure mathematics and theoretical particle physics, initiating a cross-
fertilization which has been one of the most exciting developments of
the last decades.
We describe the world by measuring quantities and forces that
vary over time and space. The rules of nature are often expressed by
formulas involving their rates of change, that is, differential equations.
Such formulas may have an “index”, the number of solutions of the
formulas minus the number of restrictions which they impose on the
values of the quantities being computed. The index theorem calculates
this number in terms of the geometry of the surrounding space. [12]
It is also worth noting that Atiyah was knighted in 1983, in part for the index
theorem. Singer, who is American, is not eligible for knighthood.
Although the precise statement of the Atiyah–Singer index theorem is beyond
the scope of this book, we can describe an elementary result that is of a similar
spirit. The rank-nullity theorem from linear algebra states that if A is an m × n
complex matrix, then
n = rank A + nullity A.
Similarly,
m = rank A∗ + nullity A∗ ,
in which A∗ denotes the conjugate transpose of A. Since rank A = rank A∗ , it

follows that
m − n = nullity A∗ − nullity A. (1968.1)

This relates the “topological” quantity m − n, the difference in dimensions of the
underlying range and domain spaces, to an “analytic” quantity, the kernel dimensions
of A and A∗, which measure the “sizes” of the solution sets to Ax = 0 and
A∗x = 0, respectively. See the notes for a discussion of the Toeplitz index theorem,
a deeper result in the same vein.
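Identity (1968.1) can be checked on examples with a few lines of exact arithmetic. The following Python sketch is an illustration only: it is restricted to matrices with rational entries, for which the conjugate transpose is just the transpose, and the function names are ours.

```python
from fractions import Fraction


def rank(M):
    """Rank of a rational matrix, by Gaussian elimination in exact arithmetic."""
    rows = [[Fraction(x) for x in row] for row in M]
    found = 0                                   # number of pivots found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(found, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                            # no pivot in this column
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for i in range(found + 1, len(rows)):
            factor = rows[i][col] / rows[found][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[found])]
        found += 1
    return found


def nullity(M):
    """Dimension of the kernel: number of columns minus the rank."""
    return len(M[0]) - rank(M)


def transpose(M):
    return [list(col) for col in zip(*M)]


A = [[1, 2, 3],
     [2, 4, 6]]            # m = 2 rows, n = 3 columns, rank 1
m, n = len(A), len(A[0])

print(m - n, nullity(transpose(A)) - nullity(A))
```

Both printed numbers agree: perturbing the entries of A may change the two nullities individually, but their difference is pinned to m − n, which depends only on the shape of the matrix.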
The Atiyah–Singer index theorem paved new paths connecting physical theo-
ries such as string theory with pure abstractions found in topology. Much of physics
is concerned with differential equations and the index theorem answers some fun-
damental general questions about them. The overview provided on the occasion of
the Abel Prize states:
The Atiyah–Singer index theorem is a purely mathematical result. It
tells us that a fundamental question in analysis, namely how many
solutions there are to a system of differential equations, has a concrete
answer in topology. This insight provides a short-cut to getting to
know whether such solutions exist or not. The theorem is valuable,
because it connects analysis and topology in a beautiful and insightful
way. It is practical, because it explains how the manifold applications
there are of mathematical analysis can make good use of the spatial,
or topological, structure that underlies the problem at hand. [15]
The Atiyah–Singer index theorem displays an unexpected connection between
two seemingly unrelated branches of mathematics (see the 1985 entry for an even
more remarkable story of a completely unexpected connection between disparate
parts of mathematics). In a similar spirit, the following concrete, but extremely
difficult problem, illustrates a surprising connection between the composition of
polynomials and the classification of finite simple groups (see the 1992 and 2004
entries). Indeed, it is surprising that one of the deepest theorems in mathematics
is necessary to solve the problem.

A polynomial f (x) ∈ C[x] is indecomposable if f (x) = u(v(x)) for u, v ∈ C[x]
implies that u or v is a linear polynomial. Suppose that f, g are indecompos-
able polynomials and that f (x) − g(y) can be factored in C[x, y]. Prove that
g(x) = f (ax + b) or deg f = deg g ∈ {7, 11, 13, 15, 21, 31}. Prove that each of
these possibilities occurs. Hint: Use the classification of finite simple groups [7, 8]
(see the comments for the 2004 entry).
1968: Comments
A temporal anomaly. Although the index theorem appears here in the 1968
entry, Atiyah was awarded a Fields Medal in 1966 because he
[d]id joint work with Hirzebruch in K-theory; proved jointly with
Singer the index theorem of elliptic operators on complex manifolds;
worked in collaboration with Bott to prove a fixed point theorem re-
lated to the “Lefschetz formula.”1
1 The quote refers to Friedrich Hirzebruch (1927–2012), Raoul Bott (1923–2005), and Solomon
Lefschetz (1884–1972).
How did this occur? The original announcement of the index theorem dates to
1963 [1] and the results had undergone many years of peer review and study by the
community before the final papers [2, 3] appeared in print in 1968.
Toeplitz index theorem. A somewhat more elementary, although still highly
nontrivial, index theorem is the Toeplitz index theorem. This requires a little bit
of setup. Consider the Hardy space $H^2$, which consists of complex power series
$f(z) = \sum_{n=0}^{\infty} a_n z^n$ for which
$$\|f\| = \left( \sum_{n=0}^{\infty} |a_n|^2 \right)^{1/2}$$
is finite; see the 1949 entry. Each function $f \in H^2$ is analytic (see p. 151) on the
open unit disk $\mathbb{D}$ and has a boundary function
$$f(\zeta) = \sum_{n=0}^{\infty} a_n \zeta^n = \lim_{r \to 1^-} \sum_{n=0}^{\infty} a_n (r\zeta)^n \tag{1968.2}$$
that exists for almost all $\zeta$ on the unit circle $\mathbb{T}$ [10]. For example,
$$f(z) = \sum_{n=0}^{\infty} \frac{z^n}{n+1}$$
belongs to $H^2$ (its norm is the square root of $\sum_{n=1}^{\infty} 1/n^2 = \pi^2/6$; see the 1919
entry), but (1968.2) diverges for $\zeta = 1$ since it is the harmonic series. However,
such points are the exception, rather than the rule: the radial limit (1968.2) exists
generically.
Suppose now that we have a suitable function² $g : \mathbb{T} \to \mathbb{C}$ that can be decomposed
as a complex Fourier series
$$g(\zeta) = \sum_{n \in \mathbb{Z}} b_n \zeta^n.$$
Its Riesz projection is
$$Pg = \sum_{n=0}^{\infty} b_n \zeta^n;$$
that is, we remove the negatively indexed summands. If $g$ is nice enough, then we
can regard $Pg$ as an element of $H^2$ by replacing $\zeta \in \mathbb{T}$ with $z \in \mathbb{D}$. For example,
$$P\big(2\cos(\arg \zeta)\big) = P(\zeta + \zeta^{-1}) = P(\zeta) = z.$$
If $\varphi : \mathbb{T} \to \mathbb{C}$ is continuous, then the Toeplitz operator $T_\varphi : H^2 \to H^2$ with
symbol $\varphi$ is defined by
$$T_\varphi f = P(\varphi f).$$
Since $\varphi$ is likely not analytic, its Fourier series will probably involve both positively
and negatively indexed terms. To compute $T_\varphi f$, we multiply the Fourier series for
$\varphi$ and $f$ term-by-term and then apply the Riesz projection, which removes any
negatively indexed terms that result.
2 The technical hypothesis here is that $g$ belongs to $L^2(\mathbb{T})$, the space of complex-valued
functions on $\mathbb{T}$ that are square-integrable with respect to Lebesgue measure.

Figure 1. Several curves and their winding numbers about a
point (the curves pictured are labeled −1, 0, and 2). The Toeplitz index
theorem relates this quantity to the index of a Toeplitz operator.
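The analytic index of $T_\varphi$ is
$$\operatorname{ind} T_\varphi = \dim \ker T_\varphi^* - \dim \ker T_\varphi,$$
in which $T_\varphi^* = T_{\overline{\varphi}}$ is the adjoint operator of $T_\varphi$. If $\varphi : \mathbb{T} \to \mathbb{C}$ is continuous and
does not pass through $z$, then its winding number about $z$ is
$$\operatorname{ind}_\varphi(z) = \frac{1}{2\pi i} \int_\varphi \frac{d\zeta}{\zeta - z}.$$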
Those who have not learned complex analysis might be surprised to learn that this
quantity is an integer and that it counts the number of times that φ encircles z;
see Figure 1. The Toeplitz index theorem asserts that if $\varphi : \mathbb{T} \to \mathbb{C}$ does not pass
through the origin, then
\[
\operatorname{ind} T_\varphi = \operatorname{ind}_\varphi(0).
\]
The preceding result relates the “analytic index” of a Toeplitz operator to its “topological
index.” This is one of the seminal results in the theory of $C^*$-algebras, a
field that can largely be thought of as “noncommutative point-set topology.” See
[6] for a good introduction to the subject.
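The integrality of the winding number can be explored numerically. The following Python sketch approximates the winding number of a closed curve by adding up the small changes in argument between consecutive sample points. The function name `winding_number`, the sample count, and the assumption that the curve is given as a function on the unit circle are all illustrative choices. The sketch also assumes that the curve stays away from the base point and is sampled finely enough that consecutive arguments differ by less than $\pi$.

```python
import cmath
import math


def winding_number(phi, z=0j, samples=4096):
    """Estimate the winding number about z of the closed curve
    zeta -> phi(zeta), where zeta runs once around the unit circle."""
    total = 0.0
    prev = phi(1 + 0j) - z
    for k in range(1, samples + 1):
        zeta = cmath.exp(2j * math.pi * k / samples)
        cur = phi(zeta) - z
        # The phase of the ratio is the signed change in argument for this step.
        total += cmath.phase(cur / prev)
        prev = cur
    return round(total / (2 * math.pi))
```

For instance, `winding_number(lambda zeta: zeta**2)` traces the unit circle twice counterclockwise, while `winding_number(lambda zeta: 1 / zeta)` traces it once clockwise.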
Risch algorithm. The year 1968 is also notable for the introduction of the
Risch algorithm, developed by Robert Henry Risch (1939– ) [13,14]. This algorithm
determines whether a given function has an elementary antiderivative. If it has such
an antiderivative, the Risch algorithm produces it. Calculus students worldwide
depend on variants of the algorithm whenever they appeal to Wolfram Alpha to
do their homework. Information about computer implementations of the Risch
algorithm can be found in [5, 9].
What is an elementary function? We say that $f(x)$ is elementary if it can be
obtained from the field of complex rational functions in $x$ by adjoining a finite number
of nested exponentials, logarithms, and algebraic functions. The trigonometric
and hyperbolic functions are elementary, as are their inverses. For example,
\[
2\cos x = e^{ix} + e^{-ix}
\]
by Euler’s formula, so $\cos x$ is elementary. What about the inverse cosine? Write
the preceding equation as
\[
e^{2ix} - 2e^{ix}\cos x + 1 = 0
\]
and use the quadratic formula to reveal
\[
e^{ix} = \frac{2\cos x \pm \sqrt{4\cos^2 x - 4}}{2} = \cos x \pm \sqrt{\cos^2 x - 1}.
\]
In what follows, we gloss over some technical issues, such as the precise definition
of the complex logarithm. By convention, we select the plus sign in the preceding
equation. Substitute $x = \cos^{-1} z$ and obtain
\[
\cos^{-1} z = -i \log\big(z + \sqrt{z^2 - 1}\big).
\]
This demonstrates that $\cos^{-1} z$ is an elementary function.
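As a numerical sanity check, the logarithmic formula can be compared with a library inverse cosine. The sketch below assumes the principal branches of the square root and logarithm as implemented in Python’s `cmath` module, and it is only meant for real $z$ strictly between $-1$ and $1$; the function name `arccos_via_log` is hypothetical.

```python
import cmath
import math


def arccos_via_log(z):
    """Evaluate -i * log(z + sqrt(z^2 - 1)) using cmath's principal branches."""
    return -1j * cmath.log(z + cmath.sqrt(z * z - 1))


# Compare with math.acos at a few sample points in (-1, 1).
for z in (0.5, -0.3, 0.9):
    print(z, arccos_via_log(z), math.acos(z))
```

The two values printed on each line should agree up to rounding error, with a negligible imaginary part in the first.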
Some well-known functions that do not have elementary antiderivatives are
$1/\log x$, which arises in the prime number theorem (see the 1919, 1933, and 1948
entries), $\cos(x^2)$ and $\sin(x^2)$, which arise in the Fresnel integrals from optics, and
$e^{-x^2}$, which arises in the central limit theorem (see the 1922 entry). A particularly
compelling example was found by Manuel Bronstein (1963–2005), who observed that
\[
f(x) = \frac{x}{\sqrt{x^4 + 10x^2 - 96x - 71}} \tag{1968.3}
\]
has the elementary antiderivative
\[
F(x) = -\frac{1}{8} \ln\Big( (x^6 + 15x^4 - 80x^3 + 27x^2 - 528x + 781)\sqrt{x^4 + 10x^2 - 96x - 71}
- (x^8 + 20x^6 - 128x^5 + 54x^4 - 1408x^3 + 3124x^2 + 10001) \Big) + C
\]
but that substituting 72 in place of 71 in (1968.3) results in a function whose
antiderivative is not elementary.
Bibliography
[1] M. F. Atiyah and I. M. Singer, The index of elliptic operators on compact manifolds, Bull.
Amer. Math. Soc. 69 (1963), 422–433, DOI 10.1090/S0002-9904-1963-10957-X. MR0157392
[2] M. F. Atiyah and I. M. Singer, The index of elliptic operators. I, Ann. of Math. (2) 87 (1968),
484–530, DOI 10.2307/1970715. http://www.jstor.org/stable/1970715. MR0236950
[3] M. F. Atiyah and I. M. Singer, The index of elliptic operators. III, Ann. of Math.
(2) 87 (1968), 546–604, DOI 10.2307/1970717. http://www.jstor.org/stable/1970717.
MR0236952
[4] M. Bronstein, Integration of elementary functions, J. Symbolic Comput. 9 (1990), no. 2,
117–173, DOI 10.1016/S0747-7171(08)80027-2. MR1056841
[5] M. Bronstein, Symbolic Integration Tutorial, http://www-sop.inria.fr/cafe/Manuel.Bronstein/publications/issac98.pdf.
[6] K. R. Davidson, C*-algebras by example, Fields Institute Monographs, vol. 6, American
Mathematical Society, Providence, RI, 1996. MR1402012
[7] W. Feit, Some consequences of the classification of finite simple groups, The Santa Cruz
Conference on Finite Groups (Univ. California, Santa Cruz, Calif., 1979), Proc. Sympos.
Pure Math., vol. 37, Amer. Math. Soc., Providence, R.I., 1980, pp. 175–181. MR604576
[8] M. Fried, Exposition on an arithmetic-group theoretic connection via Riemann’s existence
theorem, The Santa Cruz Conference on Finite Groups (Univ. California, Santa Cruz, Calif.,
1979), Proc. Sympos. Pure Math., vol. 37, Amer. Math. Soc., Providence, R.I., 1980, pp. 571–
602. MR604636
[9] K. O. Geddes, S. R. Czapor, and G. Labahn, Algorithms for computer algebra, Kluwer
Academic Publishers, Boston, MA, 1992. MR1256483
[10] J. Mashreghi, Representation theorems in Hardy spaces, London Mathematical Society
Student Texts, vol. 74, Cambridge University Press, Cambridge, 2009. MR2500010
[11] R. B. Melrose, The Atiyah-Patodi-Singer index theorem, Research Notes in Mathematics,
vol. 4, A K Peters, Ltd., Wellesley, MA, 1993. http://www.maths.ed.ac.uk/~aar/papers/melrose.pdf. MR1348401
[12] Norwegian Academy of Science and Letters, 2004 Abel Prize Citation, http://www.abelprize.no/c53865/binfil/download.php?tid=53806
[13] R. H. Risch, The problem of integration in finite terms, Trans. Amer. Math. Soc. 139 (1969),
167–189, DOI 10.2307/1995313. MR0237477
[14] R. H. Risch, The solution of the problem of integration in finite terms, Bull. Amer. Math.
Soc. 76 (1970), 605–608, DOI 10.1090/S0002-9904-1970-12454-5. MR0269635
[15] J. Rognes, On the Atiyah–Singer index theorem, http://www.abelprize.no/c53865/binfil/
download.php?tid=53804.
1969
Erdős Numbers
Introduction
The most prolific mathematical researcher of the 20th century was Paul Erdős.
He wrote over 1,500 articles with around 500 different coauthors. Mathematicians
started to think of him as the center of the research collaboration world. In 1969
Casper Goffman wrote a whimsical article in which he described a measure of
distance from Erdős in terms of mathematical collaborations [6]:
• Paul Erdős has Erdős number 0;

• a person who published a joint paper with Erdős has Erdős number 1;
• a person who published a paper with a person with Erdős number n but who
does not qualify for a smaller Erdős number has Erdős number n + 1;
• a person with no such path to Erdős has Erdős number ∞.
Currently, over 11,000 people have Erdős number 2 and nearly every practicing
mathematician has Erdős number 6 or less [9]. Most nonmathematicians have
Erdős number ∞ (simply because most people have never coauthored a research
article of any type), although there are many exceptions since researchers in physics,
economics, computer science, and other fields can often be linked to Erdős in a finite
number of steps.
From a mathematical point of view, Erdős numbers are distances in the grand
“collaboration graph.” The vertices of this graph are researchers and an edge is
present between every pair of researchers who have published together. A small
portion of this graph is depicted in Figure 1. The collaboration graph is just one
example of a large social network; other examples include Facebook and Twitter.
Research into the structure and dynamics of social networks has reached a feverish
pace in the past several years [10]. Much of that work deals with how graphs can
evolve randomly, a topic pioneered by Erdős and his collaborators decades ago [4].
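Since Erdős numbers are graph distances, they can be computed by a breadth-first search from the Erdős vertex. The following Python sketch builds the collaboration graph from a list of papers, each given as a list of author names. The function name `erdos_numbers` and the sample data are hypothetical and do not describe any real publications.

```python
from collections import deque
import math


def erdos_numbers(papers, root="Erdős"):
    """Return a dict mapping each author to their distance from `root`
    in the collaboration graph (math.inf if no path exists)."""
    graph = {}
    for authors in papers:
        for a in authors:
            # Every pair of coauthors on a paper is joined by an edge.
            graph.setdefault(a, set()).update(b for b in authors if b != a)
    dist = dict.fromkeys(graph, math.inf)
    if root not in graph:
        return dist
    dist[root] = 0
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if dist[v] == math.inf:  # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Made-up example: A wrote with Erdős, B and C wrote with A, D and E only with each other.
papers = [["Erdős", "A"], ["A", "B", "C"], ["D", "E"]]
print(erdos_numbers(papers))
```

Breadth-first search visits vertices in order of increasing distance, which is why the first time an author is reached already gives the smallest possible Erdős number.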
Erdős himself wrote a short paper in 1972 in which it is shown that the more
restrictive collaboration graph, in which only two-author papers are considered,
cannot be drawn in the plane without its edges crossing [2]. To be more specific,
he attributed the observation to Andrzej Schinzel (1937– ):
I communicated this problem to Schinzel, who proved that G(M ) [the

restricted collaboration graph] is not planar by showing that G(M )
contains a K(3, 3)—that is, a complete bipartite graph of 6 vertices
(with three vertices of each color and the 9 edges connecting black to
white in all possible ways). The white vertices are Chowla, Mahler,
and Schinzel; the black ones are Davenport, Erdős, Lewis; the simple
task of finding the 9 relevant papers can be left to the reader.

Figure 1. Partial collaboration graph illustrating some of the authors’
links to Paul Erdős. Both authors have Erdős number 2.
They also have multiple three-edge paths to Erdős. Prior to the
publication of this book, the authors had collaboration distance 2.
After the publication of this book they will be connected by the
dashed edge and hence have collaboration distance 1.
[Vertices shown: Christopher N. B. Hammond, Harold S. Shapiro, Mihai Putinar,
Alex Kontorovich, William T. Ross, Aviezri Fraenkel, David Sherman, Curtis Cooper,
Gary Weiss, Hang Chen, Stephan Garcia, Paul Erdős, Steven J. Miller, Florian Luca,
M. Ram Murty, Zoltán Füredi, Frank Morgan.]

Proposed by Jerrold Grossman, Oakland University.
Suppose that in a group of at least three people, each pair has precisely one
common friend. Prove that there is always someone who is everybody’s friend, and
describe the structure of this “friendship graph.” Paul Erdős solved this problem
in a paper with Alfréd Rényi (1921–1970) and Vera Sós (1930– ) in 1966 [5].
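Before attempting a proof, it can help to test the hypothesis on small examples. The Python sketch below only checks whether a given friendship relation satisfies the condition that every pair of people has precisely one common friend; it does not solve the problem. The function name `has_friendship_property` and the dictionary representation are illustrative assumptions.

```python
from itertools import combinations


def has_friendship_property(friends):
    """friends: dict mapping each person to the set of their friends
    (assumed symmetric, with nobody listed as their own friend).
    Returns True if every pair of people has exactly one common friend."""
    return all(
        len(friends[a] & friends[b]) == 1
        for a, b in combinations(friends, 2)
    )


# Three mutual friends satisfy the condition; four people in a cycle do not.
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
square = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(has_friendship_property(triangle), has_friendship_property(square))
```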
1969: Comments
Erdős–Bacon numbers. A much more selective group consists of those who have a
finite Erdős–Bacon number. Your Erdős–Bacon number is the sum of your Erdős
number and your Bacon number. The Bacon number is similar to the Erdős number:
just replace “Paul Erdős” with “Kevin Bacon” and “research papers” with
“movie roles.” If you have never appeared in a movie, then your Bacon number is
infinite. Thus, it is hard to have a finite Erdős–Bacon number.
The first named author’s former senior thesis student, Vincent Selhorst-Jones,
has one of the lowest Erdős–Bacon numbers (5) on record; see Figure 2. He appeared
in American Sniper (2014) with Joel Lambert, who appeared in Patriots Day (2016)
with Kevin Bacon. Thus, Vincent has Bacon number 2 (since he never appeared in a
movie with Kevin Bacon, he does not have a 1). As an undergraduate mathematics
major at Pomona College, Vincent coauthored a paper [7] with the first named
author, who has Erdős number 2; see Figure 1. This gives Vincent Erdős number 3.
Thus, Vincent has Erdős–Bacon number 3 + 2 = 5.

Figure 2. The actor and former mathematics major Vincent
Selhorst-Jones has Erdős–Bacon number five. Photo courtesy of
Vincent Selhorst-Jones.
Wetzel’s problem. In 1963, Paul Erdős provided a stunning solution to the

following problem, first posed by John E. Wetzel (1932– ):
If {fα } is a family of distinct analytic functions (on some fixed domain)
such that for each z the set of values {fα (z)} is countable, is the family
itself countable?
Erdős proved that an affirmative answer to Wetzel’s problem is equivalent to the
negation of the continuum hypothesis [3] (see [1] for a detailed exposition of Erdős’s
proof and the 1963 entry for more information about the continuum hypothesis).
Taken together, Erdős’s solution and Cohen’s proof of the independence of the
continuum hypothesis render Wetzel’s problem undecidable in ZFC. Upon hearing
of Erdős’s solution, Wetzel wrote to his advisor, Halsey Royden (1928–1993) and
said:
Erdős has showed that the answer to a question I asked in my disser-
tation is closely tied to the continuum hypothesis! So once again a
natural analysis question has grown horns.
Erdős begins his paper with “[i]n the Ann Arbor Problem Book, Wetzel asked
(under the date December, 1962) the following question. . . .” One minor quibble is
that Wetzel says that “I have never visited the University of Michigan; I’ve never
even been to Ann Arbor.” Simple enough, we can consult the Ann Arbor Problem
Book to figure out what happened. However, Peter Duren (1935– ), who was a
professor at the University of Michigan in 1962, tells us:
The Secretary of the Math Club acted as guardian of the book, and
both locals and visitors were invited to look through it. Unfortunately,
the book was lost during the Christmas break of 1962–63, on the streets
of Chicago. The man then serving as Secretary of the Math Club
had carried the book (or books) with him when he drove to Chicago
and had left it in his car overnight. Someone broke into the car and
set it on fire, and the Math Club book was lost (among other items,
including the car). . . . What Paul Erdős called the Ann Arbor Problem
Book must have been the Math Club book. But his reference can’t be
checked, since the original entries for December 1962 no longer exist.
Interestingly the University of Illinois at Urbana-Champaign, where Wetzel was
a professor for many years, has its own problem book that contains several remarks
on Wetzel’s problem by multiple authors and an entry written in Erdős’s distinctive
handwriting; see Figure 3. The history of Wetzel’s problem is no less interesting
than its solution; for details and references see [8].

Figure 3. Excerpt from the University of Illinois at Urbana-Champaign
“Boneyard Book” (image in the public domain).
Erdős’s distinctive handwriting is evident in the fourth entry.
Bibliography
[1] M. Aigner and G. M. Ziegler, Proofs from The Book, 3rd ed., including illustrations by Karl
H. Hofmann, Springer-Verlag, Berlin, 2004. MR2014872
[2] P. Erdős, Mathematical Notes: On the Fundamental Problem of Mathematics, Amer. Math.
Monthly 79 (1972), no. 2, 149–150, DOI 10.2307/2316535. MR1536622
[3] P. Erdős, An interpolation problem associated with the continuum hypothesis, Michigan Math.
J. 11 (1964), 9–10. MR0168482
[4] P. Erdős and A. Rényi, On the evolution of random graphs (English, with Russian summary),
Magyar Tud. Akad. Mat. Kutató Int. Közl. 5 (1960), 17–61. http://www.renyi.hu/~p_erdos/1961-15.pdf. MR0125031
[5] P. Erdős, A. Rényi, and V. T. Sós, On a problem of graph theory, Studia Sci. Math. Hungar.
1 (1966), 215–235. MR0223262
[6] C. Goffman, Mathematical Notes: And What Is Your Erdős Number?, Amer. Math. Monthly
76 (1969), no. 7, 791, DOI 10.2307/2317868. MR1535523
[7] S. R. Garcia, V. Selhorst-Jones, D. E. Poore, and N. Simon, Quotient sets and
Diophantine equations, Amer. Math. Monthly 118 (2011), no. 8, 704–711, DOI
10.4169/amer.math.monthly.118.08.704. MR2843990
[8] S. R. Garcia and A. L. Shoemaker, Wetzel’s problem, Paul Erdős, and the continuum
hypothesis: a mathematical mystery, Notices Amer. Math. Soc. 62 (2015), no. 3, 243–247 (in
Part II of the Erdős retrospective).
[9] J. W. Grossman, The Erdős Number Project, www.oakland.edu/enp.
[10] M. Newman, A.-L. Barabási, and D. J. Watts (eds.), The structure and dynamics of
networks, Princeton Studies in Complexity, Princeton University Press, Princeton, NJ, 2006.
MR2352222
1970
Hilbert’s Tenth Problem
Introduction
A Diophantine equation is an equation of the form
\[
p(x_1, x_2, \ldots, x_n) = 0, \tag{1970.1}
\]
in which p is a polynomial with integer coefficients and only integer solutions are
sought. Such equations have intrigued mathematicians from the dawn of the subject
to the present day. Here are just a few well-known examples.
An early example arises in the Pythagorean theorem, which asserts that
\[
a^2 + b^2 = c^2
\]
for a right triangle with sides $a$ and $b$ and hypotenuse $c$. Since this relationship
can be rewritten as $a^2 + b^2 - c^2 = 0$, it is of the form (1970.1). As evidence in
favor of the old dictum that one should not trust a scarecrow whose certification
comes from an unscrupulous degree mill, the theorem is apparently contradicted by
the nonsense the scarecrow utters upon receiving his Th.D. (Doctor of Thinkology)
diploma in The Wizard of Oz :
The sum of the square roots of any two sides of an isosceles triangle is
equal to the square root of the remaining side. [8]
Fictional scarecrows are not alone in botching the Pythagorean theorem: Major
League Baseball messed it up as well (see the comments for the 1971 entry). See
the comments for this year for a proof of the theorem.
Another famous Diophantine equation is the Fermat equation
\[
x^n + y^n = z^n,
\]
in which $n \geq 3$. Pierre de Fermat conjectured in 1637 that the equation has no
solutions in positive integers. This is a fiendishly difficult problem that took over
three centuries to solve (see the 1981 and 1995 entries for more about Fermat’s last
theorem).
In 1900, the Second International Congress of Mathematicians was held in
Paris. David Hilbert gave an influential keynote address that set out what he
thought were the most important problems in mathematics [2]. This list has moti-
vated and shaped the course of mathematical research ever since. The first, third,
and seventh of Hilbert’s problems are discussed in the 1963, 1980, and 1935 entries,
respectively. Hilbert’s tenth problem was:

Given a Diophantine equation with any number of unknown quantities
and with rational integral numerical coefficients: To devise a process
according to which it can be determined in a finite number of opera-
tions whether the equation is solvable in rational integers.
In 1970 Yuri Matiyasevich (1947– ) completed a chain of ideas developed by many

mathematicians, including Julia Robinson (1919–1985), Martin Davis (1928– ),
and Hilary Putnam (1926–2016), that proved Hilbert’s tenth problem is unsolvable
[5]. That is, there does not exist an algorithm to determine whether an arbitrary
Diophantine equation has an integer solution.
We now turn to the other end of the spectrum: a problem for which a unique
solution can be shown to exist and for which there is a simple algorithm
to find it. It involves the sequence of Fibonacci numbers
\[
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, \ldots, \tag{1970.2}
\]
a key ingredient in Matiyasevich’s attack on Hilbert’s tenth problem. Edouard
Zeckendorf (1901–1983) proved that every positive integer can be written uniquely
as a sum of nonconsecutive Fibonacci numbers (we exclude 0 and omit the first 1
in (1970.2)). For example,
42 = 34 + 8,
221 = 144 + 55 + 21 + 1, and
1,701 = 1,597 + 89 + 13 + 2.
We use a greedy algorithm to obtain these decompositions. Given n, subtract the

largest Fibonacci number at most n; call it Fk . If n = Fk , then we are done.
Otherwise, subtract the largest Fibonacci number at most n − Fk . The number
subtracted off at this step cannot be Fk−1 since otherwise we could have subtracted
Fk+1 = Fk + Fk−1 in the first step. Since this contradicts the maximality of Fk , we
conclude that consecutive Fibonacci numbers are never used in the decomposition.
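The greedy procedure translates directly into code. The following Python sketch is one possible rendering; the function name `zeckendorf` is an illustrative choice, and the Fibonacci numbers are taken to be 1, 2, 3, 5, 8, ... as in the convention above (0 and the first 1 are omitted).

```python
def zeckendorf(n):
    """Return the greedy decomposition of a positive integer n
    as a list of Fibonacci numbers in decreasing order."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    fibs = [1, 2]
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    parts = []
    # Scan from the largest Fibonacci number down, subtracting whenever possible.
    for f in reversed(fibs):
        if f <= n:
            parts.append(f)
            n -= f
    return parts


for m in (42, 221, 1701):
    print(m, zeckendorf(m))
```

The three printed decompositions can be compared with the examples displayed above.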
It turns out that Zeckendorf’s theorem provides another characterization of
the Fibonacci numbers: they are the unique sequence of positive integers such that
every natural number can be written uniquely as a sum of nonconsecutive terms.
Without the restriction that the Fibonacci numbers involved are nonconsecutive,
many more representations arise. For example, since 34 = 21 + 13 and 8 = 5 + 3,
we may write
42 = 34 + 5 + 3
= 21 + 13 + 8
= 21 + 13 + 5 + 3,
and so forth. Among all possible decompositions of a positive integer as a sum

of Fibonacci numbers, one can show that none have fewer summands than the
Zeckendorf decomposition; see the 1980 entry.
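To experiment with the claim about the number of summands, one can enumerate every way of writing an integer as a sum of Fibonacci numbers. The Python sketch below assumes that the summands are distinct and drawn from 1, 2, 3, 5, 8, ...; the function name `fibonacci_sums` is hypothetical.

```python
def fibonacci_sums(n):
    """List every way to write n as a sum of distinct Fibonacci numbers
    from 1, 2, 3, 5, 8, ..., each written in decreasing order."""
    fibs = [1, 2]
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    results = []

    def extend(remaining, index, chosen):
        if remaining == 0:
            results.append(chosen)
            return
        # Try each Fibonacci number at or below position `index` as the next summand.
        for i in range(index, -1, -1):
            if fibs[i] <= remaining:
                extend(remaining - fibs[i], i - 1, chosen + [fibs[i]])

    extend(n, len(fibs) - 1, [])
    return results


decompositions = fibonacci_sums(42)
print(decompositions)
print(min(decompositions, key=len))
```

The shortest decomposition found for 42 is the Zeckendorf decomposition 34 + 8.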

Here is an outline for another proof of Zeckendorf’s theorem. We phrase it as a
cookie problem, though it is more commonly referred to as a stars and bars problem:
how many ways are there to divide $C$ identical cookies among $P$ distinguishable
people? This is equivalent to counting the integer solutions of
\[
x_1 + x_2 + \cdots + x_P = C, \qquad x_1, x_2, \ldots, x_P \geq 0.
\]
Find an elementary proof that the number of solutions to the preceding is
\[
\binom{C + P - 1}{P - 1}.
\]
Can you use this to prove Zeckendorf’s theorem?
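A brute-force count is a quick way to gain confidence in the formula before looking for a proof. The Python sketch below enumerates all nonnegative integer solutions for small values and compares the count with the binomial coefficient; the function name `count_distributions` is an illustrative choice.

```python
from itertools import product
from math import comb


def count_distributions(cookies, people):
    """Count solutions of x_1 + ... + x_P = C in nonnegative integers
    by checking every tuple with entries between 0 and C."""
    return sum(
        1
        for xs in product(range(cookies + 1), repeat=people)
        if sum(xs) == cookies
    )


for C in range(6):
    for P in range(1, 5):
        print(C, P, count_distributions(C, P), comb(C + P - 1, P - 1))
```

The enumeration grows quickly with $C$ and $P$, so it is only suitable for small cases.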
1970: Comments
Statistical properties of Zeckendorf decompositions. The combinatorial
interpretation suggested by the problem can be used not only to prove Zeckendorf’s
theorem, but also to obtain statistical results about Zeckendorf decompositions. For
example, if we look at all integers in $[F_n, F_{n+1})$, then the number of summands in
the Zeckendorf decomposition becomes normally distributed as $n \to \infty$ [4]. As a
consequence, one can obtain the following curious results of Lekkerkerker [3]. The
average number of summands used to represent integers in $[F_n, F_{n+1})$ is
\[
\frac{5 - \sqrt{5}}{10}\, n - \frac{2}{5} = \frac{1}{1 + \varphi^2}\, n - \frac{2}{5}
\]
and the variance in the number of summands is
\[
\frac{1}{5\sqrt{5}}\, n - \frac{2}{25} = \frac{\varphi}{5(\varphi + 2)}\, n - \frac{2}{25},
\]
in which
\[
\varphi = \tfrac{1}{2}(1 + \sqrt{5}) = 1.618\ldots
\]
denotes the golden ratio. The appearance of the golden ratio is not surprising in
light of Binet’s formula; see the comments for the 2002 entry. For another angle on
statistical properties of the Fibonacci numbers, see the 1938 entry.
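The linear growth of the average can be observed empirically. The Python sketch below computes the mean number of Zeckendorf summands over an interval between consecutive Fibonacci numbers; the function names are hypothetical, and the intervals are specified by their endpoints rather than by an index $n$ to avoid committing to an indexing convention. Moving from one such interval to the next should raise the mean by roughly $(5 - \sqrt{5})/10 \approx 0.276$.

```python
def zeckendorf_length(n):
    """Number of summands in the greedy (Zeckendorf) decomposition of n >= 1."""
    fibs = [1, 2]
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    count = 0
    for f in reversed(fibs):
        if f <= n:
            count += 1
            n -= f
    return count


def mean_summands(lo, hi):
    """Average number of Zeckendorf summands over the integers in [lo, hi)."""
    return sum(zeckendorf_length(m) for m in range(lo, hi)) / (hi - lo)


# 1597, 2584, 4181 are consecutive Fibonacci numbers.
slope = mean_summands(2584, 4181) - mean_summands(1597, 2584)
print(slope, (5 - 5 ** 0.5) / 10)
```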
A Fibonacci tiling. The Fibonacci identity
\[
F_1^2 + F_2^2 + \cdots + F_n^2 = F_n F_{n+1} \tag{1970.3}
\]
has an appealing geometric interpretation; see Figure 1. The left-hand side of
(1970.3) can be interpreted as the sum of the areas of $n$ squares used to dissect a
rectangle of size
\[
F_n (F_n + F_{n-1}) = F_n F_{n+1},
\]
which yields the desired formula.

Figure 1. Squares of Fibonacci size tile the plane. In addition,
this suggests the formula $F_0^2 + F_1^2 + \cdots + F_n^2 = F_n F_{n+1}$.
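The sum-of-squares identity is easy to confirm numerically for small $n$. The Python sketch below uses the indexing $F_0 = 0$, $F_1 = F_2 = 1$; the function names are illustrative.

```python
def fibonacci(n):
    """Return the list [F_0, F_1, ..., F_n] with F_0 = 0 and F_1 = 1."""
    fibs = [0, 1]
    for _ in range(n - 1):
        fibs.append(fibs[-1] + fibs[-2])
    return fibs[: n + 1]


def check_identity(n):
    """Check that F_1^2 + ... + F_n^2 equals F_n * F_{n+1}."""
    fibs = fibonacci(n + 1)
    return sum(f * f for f in fibs[1 : n + 1]) == fibs[n] * fibs[n + 1]


print(all(check_identity(n) for n in range(1, 40)))
```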
A proof of the Pythagorean theorem. If you are reading this book, chances
are that you have taken a good number of sophisticated mathematics courses. How-
ever, a surprising number of mathematics majors cannot prove the Pythagorean
theorem off the top of their heads! We shall remedy that here; see Figure 2 for
an elegant “proof by picture.” There are now hundreds of proofs known. Even
US President James A. Garfield (1831–1881) got in on the action when he was a
member of the US House of Representatives [1]. A readable account of the history
behind the Pythagorean theorem is [7].

Figure 2. Proof of the Pythagorean theorem. (a) The area of the large square
equals that of the two small squares ($a^2 + b^2$) plus that of the four triangles.
(b) The area of the large square equals that of the central square ($c^2$) plus that
of the four triangles. The total area in (a) is $a^2 + b^2 + 4(\tfrac{1}{2}ab)$. The total area
in (b) is $c^2 + 4(\tfrac{1}{2}ab)$. Equating these expressions yields $a^2 + b^2 = c^2$.
Bibliography
[1] J. A. Garfield, Pons Asinorum, The New England Journal of Education 3 (1876), no. 14, 161.
[2] D. Hilbert, Über das Unendliche (German), Math. Ann. 95 (1926), no. 1, 161–190,
DOI 10.1007/BF01206605. http://www.ams.org/journals/bull/1902-08-10/S0002-9904-1902-00923-3/S0002-9904-1902-00923-3.pdf. MR1512272
[3] C. G. Lekkerkerker, Voorstelling van natuurlyke getallen door een som van getallen van
Fibonacci, Simon Stevin 29 (1952), 190–195.
[4] M. Koloğlu, G. S. Kopp, S. J. Miller, and Y. Wang, On the number of summands in Zeckendorf
decompositions, Fibonacci Quart. 49 (2011), no. 2, 116–130. MR2801798
[5] Ju. V. Matijasevič, The Diophantineness of enumerable sets (Russian), Dokl. Akad. Nauk
SSSR 191 (1970), 279–282. MR0258744
[6] Y. Matijasevich, My collaboration with Julia Robinson, Math. Intelligencer 14 (1992), no. 4,
38–45, DOI 10.1007/BF03024472. MR1188142
[7] E. Maor, The Pythagorean theorem: A 4,000-year history, Princeton University Press, Prince-
ton, NJ, 2007. MR2316578
[8] Scarecrow (from the Wizard of Oz), https://www.youtube.com/watch?v=uCOxU2rKLas.
1971
Society for American Baseball Research
Introduction
The Society for American Baseball Research (SABR), founded in Cooperstown,
New York, by Bob Davids (1926–2002) in 1971, has many objectives, one of which
is to encourage and aid the application of mathematics and statistics to the analysis
of baseball. The term sabermetrics, derived from the acronym SABR, refers to the
statistical study of baseball (usually with the aim of improving a team’s perfor-
mance). Sabermetricians have created an alphabet soup of acronyms to describe
new metrics for measuring player performance (VORP, WAR, OPS, and so forth).
Other sports have since followed baseball’s lead. For example, exotic acronyms
such as TS%, PER, PPP, USG%, and APM are now bandied about on basketball
websites. The current dominance of the NBA’s Golden State Warriors is often
partly attributed to their wholehearted embrace of data analytics.
It is important to know what to measure. For example, walks were originally
viewed as errors by the pitcher and not a positive event by the batter. This led
to an enormous undervaluation of walks, now remedied by the consideration of
on-base percentage. Since the annual revenues in Major League Baseball (MLB)
and other professional sports are measured in the billions, there is a lot at stake.
A team that has a better understanding of which statistics truly matter can as-
semble a better team for less money. This can translate into World Series rings
and increased revenue. Most teams now have sabermetricians helping with player
selection and strategy. Moneyball [6], by Michael Lewis (1960– ), is an excellent
popular account of how the Oakland A’s applied these principles and, with a rela-
tively small budget, fielded competitive teams that routinely reached the playoffs.
See also [4] for applications of mathematics in sports.

Only seven times in MLB history has a team had four consecutive batters
hit home runs: the Milwaukee Braves (1961), the Cleveland Indians (1963), the
Minnesota Twins (1964), the Los Angeles Dodgers (2006), the Boston Red Sox
(2007), the Chicago White Sox (2008), and the Arizona Diamondbacks (2010).
Estimate the probability that some team performs this feat during the season.
What is wrong with just raising the average home run frequency to the fourth
power?
1971: Comments
Predicting unlikely events. The importance of the chosen problem extends
far beyond baseball: how do we estimate the probability of an unlikely event? One
approach is through simulation; see the 1946 entry on the Monte Carlo method.
However, we need the ability to run many trials and gather a lot of data. For Monte
Carlo-type methods to be useful in baseball, we would need to be able to simulate
games accurately. Such programs do exist and they often use Markov chains; see
the 1953 entry. Consult [2, 9] for an introduction to Markov chains and [1, 3, 8] for
some applications to baseball.
Another approach is to count how many situations have existed in which the
desired event could have occurred and how many of these situations led to the
outcome. Such an approach does an excellent job for events that occur frequently,
such as hits or stolen bases, or even coming back to win after being down by four
runs after six innings. It is much harder to apply this method if there are few
occurrences.
In a playoff matchup, two teams compete in a best-of-seven series; this means
that the first team to win four games advances. Prior to 2004 (when the Boston
Red Sox achieved the feat), no team in Major League Baseball had ever come back
to win a series after trailing 3-0. However, such opportunities for an epic comeback
had arisen only 24 times before then (as of January 1, 2018, a team has trailed 3-0 only 34 times). If each
team has an equal chance of winning a game, then we should expect the team down
3-0 to complete the comeback one out of every sixteen times. Of course, it is too
simplistic to think that each team has an equal chance: perhaps the team that is
up 3-0 is just much better than the other team.
In Figure 1, we plot the probability of having no teams, at most one team, and
at most two teams come back to win a best-of-seven series after being down 3-0 if
there are n teams in that situation. There is an enormous difference if we drop the
hypothesis that each team in a series is equally likely to win any given game. If we
assume that the losing team has only a 40% chance of winning each game, then the
number of teams expected to complete an epic comeback drops dramatically.
To compute the probabilities in Figure 1, we first find the chance that one team
comes back after being down 3-0 in games. Assuming they win each individual
game with probability $p$, the chance they win the next four is just $p^4$. Thus, the
probability they do not come back is $1 - p^4$. If there are $n$ teams that find themselves
in a 3-0 hole, the probability that none come back is just $(1 - p^4)^n$, while exactly
one team comes back with probability
\[
\binom{n}{1} (p^4)(1 - p^4)^{n-1}
\]
and exactly two happens with probability
\[
\binom{n}{2} (p^4)^2 (1 - p^4)^{n-2}.
\]
To get the probabilities of at most 0, 1, or 2 teams winning a series we just sum
the corresponding probabilities.
Figure 1. Probability (vertical axis) of having no teams (blue), at
most one team (yellow), and at most two teams (green) come back
to win a best-of-seven series after being down 3-0 when there have
been $n$ teams in that position (horizontal axis) and the trailing
team has a probability $p$ of winning each game. Panels: (a) $p = 0.5$; (b) $p = 0.4$.
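The binomial computation behind the figure can be sketched in a few lines of Python. The function name `prob_at_most` is hypothetical, and the model assumes, as in the discussion above, that the trailing team wins each game independently with the same probability $p$.

```python
from math import comb


def prob_at_most(k, n, p):
    """Probability that at most k of n teams trailing 3-0 complete the comeback,
    when a trailing team wins each game independently with probability p."""
    q = p ** 4  # chance that a single team wins four straight games
    return sum(comb(n, j) * q ** j * (1 - q) ** (n - j) for j in range(k + 1))


# Compare the two scenarios in the figure for n = 34 teams.
for p in (0.5, 0.4):
    print(p, [prob_at_most(k, 34, p) for k in (0, 1, 2)])
```

Lowering $p$ from 0.5 to 0.4 makes a single comeback considerably less likely, so the probability of seeing few or no comebacks rises.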
Which team wins? What is the probability that one team beats another?
The goal is to obtain a formula that allows you to assess the contributions of your
players to winning. Such knowledge can then be used to determine where you need
to build. Is it more valuable to improve your offense or your pitching? How much
should you pay for a hitter that is a little bit better than your current player?
More generally, the answer relies on techniques that can be applied to
a variety of problems.
One of the most commonly used formulas is the Pythagorean won-loss formula,
due to Bill James (1949– ), which dates back to the 1970s. To give a sense of
its value, it is one of the few statistics often used in scoreboards or expanded
scoreboards online (frequently denoted X-WL for expected won-loss). If RS denotes
the average number of runs scored by a team per game, and RA the average number
of runs they allow, James postulated that a good approximation to their winning
percentage (number of wins divided by number of games) would be
\[
\frac{\mathrm{RS}^2}{\mathrm{RS}^2 + \mathrm{RA}^2}.
\]
The exponent 2 was chosen to simplify the computations and led to the name since
the sum of the squares in the denominator looks similar to the sum of squares in
the Pythagorean theorem. Nowadays the 2 is replaced by a parameter γ, whose
best fit value in baseball is close to, but a little less than, 2. In 2006, the second
named author provided a theoretical justification for why this formula should be
an excellent predictor. He used elementary probability theory to model the runs
scored and allowed as being drawn from independent Weibull distributions; see
[5, 7]. One of the great values of the Pythagorean expectation is that it allows a
team to estimate the benefit it would receive from adding a hitter who generates
10 more runs versus signing a pitcher who allows 10 fewer.
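The comparison between adding a hitter and adding a pitcher can be sketched as follows. The function name `pythagorean_expectation` and the run totals are hypothetical; season totals are used in place of per-game averages, which does not change the ratio when the number of games is the same.

```python
def pythagorean_expectation(runs_scored, runs_allowed, gamma=2.0):
    """Estimated winning percentage RS^gamma / (RS^gamma + RA^gamma)."""
    rs = runs_scored ** gamma
    ra = runs_allowed ** gamma
    return rs / (rs + ra)


# Made-up team: 750 runs scored and 700 runs allowed over a season.
baseline = pythagorean_expectation(750, 700)
better_hitter = pythagorean_expectation(760, 700)   # 10 more runs scored
better_pitcher = pythagorean_expectation(750, 690)  # 10 fewer runs allowed
print(baseline, better_hitter, better_pitcher)
```

The `gamma` parameter allows the exponent 2 to be replaced by a fitted value, as described above.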
The nonexistence of baseball. We claim that baseball does not exist. To be
more specific, we prove that the official rules of Major League Baseball specify an
impossible geometric construction. According to the Major League Baseball 2017
Official Rules:
Home base shall be marked by a five-sided slab of whitened rubber.
It shall be a 17-inch square with two of the corners removed so that
one edge is 17 inches long, two adjacent sides are 8½ inches and the
remaining two sides are 12 inches and set at an angle to make a point.
It shall be set in the ground with the point at the intersection of the
lines extending from home base to first base and to third base; with
the 17-inch edge facing the pitcher’s plate, and the two 12-inch edges
coinciding with the first and third base lines. The top edges of home
base shall be beveled and the base shall be fixed in the ground level
with the ground surface.
Since “the infield shall be a 90-foot square,” it follows that home base contains
a right triangle with hypotenuse 17 and side lengths 12, 12; see Figure 2. This
contradicts the Pythagorean theorem since
\[
12^2 + 12^2 = 288 \quad \text{and} \quad 17^2 = 289.
\]
Consequently, home base does not exist, from which it follows that baseball does
not exist.

Figure 2. The specifications for home plate require the existence
of a right triangle with sides 12, 12, 17 (the edges of the plate are labeled
17, 8.5, 8.5, 12, and 12). This is prohibited by the
Pythagorean theorem.
Bibliography
[1] J. Beamer, Introducing Markov chains, The Hardball Times, November 26, 2007. https://
www.fangraphs.com/tht/introducing-markov-chains/.
[2] E. Behrends, Introduction to Markov chains: With special emphasis on rapid mixing, Advanced
Lectures in Mathematics, Friedr. Vieweg & Sohn, Braunschweig, 2000. MR1730905
[3] B. Bukiet, E. R. Harold, and J. L. Palacios, Markov Chain Approach to Baseball, Operations
Research 45 (1997), 14–23. https://pubsonline.informs.org/doi/abs/10.1287/opre.45.1.
14.
[4] J. A. Gallian (ed.), Mathematics and sports, The Dolciani Mathematical Expositions, vol. 43,
Mathematical Association of America, Washington, DC, 2010. MR2766424
[5] S. J. Miller, T. Corcoran, J. Gossels, V. Luo, and J. Porfilio, Pythagoras at the
bat, Social networks and the economics of sports, Springer, Cham, 2014, pp. 89–
113, DOI 10.1007/978-3-319-08440-4_6. https://web.williams.edu/Mathematics/sjmiller/
public_html/math/papers/MillerEtAl_Pythagoras.pdf. MR3307909
[6] M. Lewis, Moneyball: The Art of Winning an Unfair Game, W. W. Norton & Company, 2004.
[7] S. J. Miller, A derivation of James’ Pythagorean projection, By The Numbers – The Newsletter
of the SABR Statistical Analysis Committee 16 (February 2006), no. 1, 17–22 and Chance Mag-
azine 20 (Winter 2007), no. 1, 40–48; expanded version available at https://web.williams.
edu/Mathematics/sjmiller/public_html/math/papers/PythagWonLoss_Paper.pdf.
[8] M. D. Pankin, Baseball as a Markov Chain, http://www.pankin.com/markov/intro.htm.
[9] D. Stansbury, A Brief Introduction to Markov Chains, The Clever Machine: Topics in Compu-
tational Neuroscience & Machine Learning, September 24, 2012. https://theclevermachine.
wordpress.com/2012/09/24/a-brief-introduction-to-markov-chains/.
1972
Zaremba’s Conjecture
Introduction
What is the best way to numerically integrate a function of several variables?
One method is to compute the average value of the function over a large number of
sample points. The 1946 entry described the Monte Carlo method, in which sample
points are selected at random. However, it is often desirable to use a deterministic
approach, that is, one that does not depend upon random choices.
Suppose for the sake of simplicity that we wish to numerically integrate a real-valued smooth function of two variables over the unit square $[0, 1]^2$ in $\mathbb{R}^2$. In 1971, Stanislaw Zaremba (1903–1990) suggested using sample points
\[ \left\{ \left( \frac{n}{q}, \frac{np}{q} \right) \pmod 1 : 1 \le n \le q \right\}, \]
in which $\gcd(p, q) = 1$. In other words, he considered the orbit of $(1/q, p/q)$ under repeated addition modulo 1; this may remind you of Figure 1 in the 1961 entry.
Zaremba noticed that the quality of the sampling depends upon how small the partial quotients $a_0, a_1, \ldots, a_k$ are in the (finite) continued fraction expansion
\[ \frac{p}{q} = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}}; \]
see Figure 1 (for more information about continued fractions, see [3,5] and the 1931, 1934, and 1955 entries). There is no loss of generality in assuming that $1 \le p < q$, in which case $a_0 = 0$ and we write
\[ \frac{p}{q} = [a_1, a_2, \ldots, a_k]. \]
For a given $q$, can we select a $p$ so that $a_1, a_2, \ldots, a_k$ are as small as possible? In 1972, Zaremba conjectured that this “height” of the partial quotients can be bounded by an absolute constant, independent of the sample size $q$ [6]. In particular, he conjectured that one can always select $p$ so that $\max\{a_1, a_2, \ldots, a_k\} \le 5$; see Table 1. Zaremba’s conjecture is our problem for this year.
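The partial quotients of $p/q$ come directly from the Euclidean algorithm, so the conjecture is easy to explore by brute force. The sketch below is only an illustration: the function names `partial_quotients` and `zaremba_witness` are ours, and the search simply tries every numerator in increasing order.

```python
from math import gcd

def partial_quotients(p, q):
    """Return [a1, ..., ak] with p/q = [a1, ..., ak], assuming 1 <= p < q."""
    quotients = []
    numerator, denominator = q, p  # p/q = 1/(q/p), so expand q/p
    while denominator:
        a, remainder = divmod(numerator, denominator)
        quotients.append(a)
        numerator, denominator = denominator, remainder
    return quotients

def zaremba_witness(q, bound=5):
    """Return the smallest p coprime to q whose partial quotients are all <= bound."""
    for p in range(1, q):
        if gcd(p, q) == 1 and max(partial_quotients(p, q)) <= bound:
            return p
    return None  # no numerator works for this q and this bound

print(partial_quotients(1191, 2383))  # [2, 1191]
print(partial_quotients(1678, 2383))  # [1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2]
print(zaremba_witness(2383))
```

The first two calls reproduce the expansions quoted for $q = 2{,}383$ in Figure 1. A return value of `None` from `zaremba_witness(q)` would be a counterexample to the conjecture with $A = 5$.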

Figure 1. Zaremba noticed that if $p/q = [a_1, a_2, \ldots, a_k]$ has only small partial quotients $a_1, a_2, \ldots, a_k$, then $\{(n/q, np/q) \pmod 1 : 1 \le n \le q\}$ provides a good sampling of the unit square. For the prime $q = 2{,}383$, we examine the samplings that arise from $p = 1{,}191$ and $p = 1{,}678$.
(a) $1191/2383 = [2, 1191]$ yields a poor sampling of the unit square.
(b) $1678/2383 = [1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2]$ yields a good sampling of the unit square.

Proposed by Alex Kontorovich, Rutgers University.
For $A > 0$, let $D_A$ be the set of all positive integers $q$ for which there exists a $p \in \{1, 2, \ldots, q\}$ with $\gcd(p, q) = 1$ such that the finite continued fraction expansion $p/q = [a_1, a_2, \ldots, a_k]$ has $\max\{a_1, a_2, \ldots, a_k\} \le A$. Prove that there exists an $A > 1$ so that $D_A = \mathbb{N}$.
Bonus points. Prove that A = 5 suffices.
Extra bonus points. Prove that A = 2 suffices if a finite number of integers are
allowed to be omitted.
1972: Comments
A continued fraction expansion for e. We follow [4, Sect. 3.8] and derive the beautiful continued fraction expansion¹
\[ e = 2 + \cfrac{2}{2 + \cfrac{3}{3 + \cfrac{4}{4 + \cfrac{5}{5 + \cdots}}}} \tag{1972.1} \]
for Euler’s constant $e = 2.71828\ldots$. First, substitute $x = -1$ in the power series expansion
\[ e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots \]
and obtain
\[ \frac{1}{e} = 1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \cdots, \]
¹ Since the numerators in (1972.1) are not all 1, this is not a “simple” continued fraction of the sort that we have been considering.

Table 1. Evidence in favor of Zaremba’s conjecture. For $q = 2, 3, \ldots, 101$, we find a corresponding $p$ so that none of the partial quotients in the continued fraction expansion of $p/q$ exceed $A = 5$.

p/q    [a1, ..., ak]    | p/q    [a1, ..., ak]    | p/q    [a1, ..., ak]      | p/q     [a1, ..., ak]
1/2    [2]              | 5/27   [5, 2, 2]        | 9/52   [5, 1, 3, 2]       | 16/77   [4, 1, 4, 3]
1/3    [3]              | 5/28   [5, 1, 1, 2]     | 10/53  [5, 3, 3]          | 17/78   [4, 1, 1, 2, 3]
1/4    [4]              | 5/29   [5, 1, 4]        | 17/54  [3, 5, 1, 2]       | 14/79   [5, 1, 1, 1, 4]
1/5    [5]              | 7/30   [4, 3, 2]        | 12/55  [4, 1, 1, 2, 2]    | 17/80   [4, 1, 2, 2, 2]
5/6    [1, 5]           | 7/31   [4, 2, 3]        | 13/56  [4, 3, 4]          | 14/81   [5, 1, 3, 1, 2]
2/7    [3, 2]           | 7/32   [4, 1, 1, 3]     | 10/57  [5, 1, 2, 3]       | 17/82   [4, 1, 4, 1, 2]
3/8    [2, 1, 2]        | 7/33   [4, 1, 2, 2]     | 11/58  [5, 3, 1, 2]       | 16/83   [5, 5, 3]
2/9    [4, 2]           | 9/34   [3, 1, 3, 2]     | 11/59  [5, 2, 1, 3]       | 19/84   [4, 2, 2, 1, 2]
3/10   [3, 3]           | 6/35   [5, 1, 5]        | 11/60  [5, 2, 5]          | 16/85   [5, 3, 5]
2/11   [5, 2]           | 11/36  [3, 3, 1, 2]     | 11/61  [5, 1, 1, 5]       | 15/86   [5, 1, 2, 1, 3]
5/12   [2, 2, 2]        | 7/37   [5, 3, 2]        | 11/62  [5, 1, 1, 1, 3]    | 16/87   [5, 2, 3, 2]
3/13   [4, 3]           | 7/38   [5, 2, 3]        | 11/63  [5, 1, 2, 1, 2]    | 17/88   [5, 5, 1, 2]
3/14   [4, 1, 2]        | 7/39   [5, 1, 1, 3]     | 11/64  [5, 1, 4, 2]       | 16/89   [5, 1, 1, 3, 2]
4/15   [3, 1, 3]        | 7/40   [5, 1, 2, 2]     | 12/65  [5, 2, 2, 2]       | 17/90   [5, 3, 2, 2]
3/16   [5, 3]           | 9/41   [4, 1, 1, 4]     | 25/66  [2, 1, 1, 1, 3, 2] | 16/91   [5, 1, 2, 5]
3/17   [5, 1, 2]        | 11/42  [3, 1, 4, 2]     | 12/67  [5, 1, 1, 2, 2]    | 17/92   [5, 2, 2, 3]
5/18   [3, 1, 1, 2]     | 8/43   [5, 2, 1, 2]     | 13/68  [5, 4, 3]          | 16/93   [5, 1, 4, 3]
4/19   [4, 1, 3]        | 13/44  [3, 2, 1, 1, 2]  | 13/69  [5, 3, 4]          | 33/94   [2, 1, 5, 1, 1, 2]
9/20   [2, 4, 2]        | 8/45   [5, 1, 1, 1, 2]  | 13/70  [5, 2, 1, 1, 2]    | 17/95   [5, 1, 1, 2, 3]
4/21   [5, 4]           | 11/46  [4, 5, 2]        | 15/71  [4, 1, 2, 1, 3]    | 17/96   [5, 1, 1, 1, 5]
5/22   [4, 2, 2]        | 9/47   [5, 4, 2]        | 17/72  [4, 4, 4]          | 17/97   [5, 1, 2, 2, 2]
4/23   [5, 1, 3]        | 11/48  [4, 2, 1, 3]     | 13/73  [5, 1, 1, 1, 1, 2] | 17/98   [5, 1, 3, 4]
5/24   [4, 1, 4]        | 9/49   [5, 2, 4]        | 13/74  [5, 1, 2, 4]       | 17/99   [5, 1, 4, 1, 2]
7/25   [3, 1, 1, 3]     | 9/50   [5, 1, 1, 4]     | 13/75  [5, 1, 3, 3]       | 19/100  [5, 3, 1, 4]
5/26   [5, 5]           | 11/51  [4, 1, 1, 1, 3]  | 13/76  [5, 1, 5, 2]       | 18/101  [5, 1, 1, 1, 1, 3]
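which can be rewritten as
\[ 1 - \frac{1}{e} = \frac{1}{1} - \frac{1}{1 \cdot 2} + \frac{1}{1 \cdot 2 \cdot 3} - \frac{1}{1 \cdot 2 \cdot 3 \cdot 4} + \cdots. \tag{1972.2} \]
This is a convergent series of the form
\[ \frac{1}{x_1} - \frac{1}{x_1 x_2} + \frac{1}{x_1 x_2 x_3} - \frac{1}{x_1 x_2 x_3 x_4} + \cdots, \tag{1972.3} \]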
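for which a remarkable algebraic manipulation is available. Observe that
\[ \frac{1}{x_1} - \frac{1}{x_1 x_2} = \cfrac{1}{x_1 + \cfrac{x_1}{x_2 - 1}} \]
and use this to obtain
\begin{align*}
\frac{1}{x_1} - \frac{1}{x_1 x_2} + \frac{1}{x_1 x_2 x_3}
&= \frac{1}{x_1} - \frac{x_3 - 1}{x_1 x_2 x_3} \\
&= \frac{1}{x_1} - \frac{1}{x_1 \left( \frac{x_2 x_3}{x_3 - 1} \right)} \\
&= \cfrac{1}{x_1 + \cfrac{x_1}{\left( \frac{x_2 x_3}{x_3 - 1} \right) - 1}}
 \qquad \text{(by the preceding identity, with $x_2$ replaced by $\tfrac{x_2 x_3}{x_3 - 1}$)} \\
&= \cfrac{1}{x_1 + \cfrac{x_1}{x_2 - 1 + \cfrac{x_2}{x_3 - 1}}}.
\end{align*}
Proceed by induction and get
\[ \frac{1}{x_1} - \frac{1}{x_1 x_2} + \frac{1}{x_1 x_2 x_3} - \frac{1}{x_1 x_2 x_3 x_4} + \cdots
 = \cfrac{1}{x_1 + \cfrac{x_1}{x_2 - 1 + \cfrac{x_2}{x_3 - 1 + \cfrac{x_3}{x_4 - 1 + \cdots}}}}. \]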
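Apply this to (1972.2) with $x_n = n$ and obtain
\[ 1 - \frac{1}{e} = \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{2}{2 + \cfrac{3}{3 + \cdots}}}}. \]
Since
\[ 1 - \frac{1}{e} = \cfrac{1}{1 + \cfrac{1}{e - 1}}, \]
we see that
\[ \cfrac{1}{1 + \cfrac{1}{e - 1}} = \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{2}{2 + \cfrac{3}{3 + \cdots}}}}, \]
from which (1972.1) follows.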
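A quick numerical experiment makes the expansion (1972.1) believable. The sketch below truncates the continued fraction at a chosen depth and evaluates it from the innermost term outward. The function name and the truncation rule (replacing the tail by the integer `depth`) are our own choices for illustration.

```python
import math

def e_from_continued_fraction(depth):
    """Evaluate 2 + 2/(2 + 3/(3 + ... + depth/depth)) from the inside out."""
    tail = float(depth)              # innermost denominator, truncated at `depth`
    for n in range(depth - 1, 1, -1):
        tail = n + (n + 1) / tail    # n + (n+1)/(n+1 + ...)
    return 2 + 2 / tail

for depth in (3, 5, 10, 20):
    approx = e_from_continued_fraction(depth)
    print(depth, approx, abs(approx - math.e))
```

The error shrinks very quickly with the depth, which reflects the factorial decay of the terms in the series (1972.2).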
Status of the problem. In 2011, Alex Kontorovich (1980– ) and Jean Bourgain almost proved Zaremba’s conjecture [1]. To be more specific, they showed that
\[ \lim_{n \to \infty} \frac{|D_{50} \cap \{1, 2, \ldots, n\}|}{n} = 1. \]
In other words, almost all natural numbers appear as the denominator of a finite continued fraction whose partial quotients are bounded by 50. In 2015, the same result was established with $D_5$ in place of $D_{50}$ [2]. Thus, Zaremba’s original conjecture with $A = 5$ is now known to be “almost” true in the sense that those natural numbers that do not belong to $D_5$ have density zero.
Bibliography
[1] J. Bourgain and A. Kontorovich, On Zaremba’s conjecture, Ann. of Math. (2) 180 (2014), no. 1,
137–196, DOI 10.4007/annals.2014.180.1.3. https://arxiv.org/pdf/1107.3776. MR3194813
[2] S. Huang, An improvement to Zaremba’s conjecture, Geom. Funct. Anal. 25 (2015), no. 3,
860–914, DOI 10.1007/s00039-015-0327-6. MR3361774
[4] D. Perkins, ϕ, π, e & i, MAA Press, 2017.
[5] A. J. van der Poorten, Notes on continued fractions and recurrence sequences, Number theory
and cryptography (Sydney, 1989), London Math. Soc. Lecture Note Ser., vol. 154, Cambridge
Univ. Press, Cambridge, 1990, pp. 86–97. MR1055401
[6] S. K. Zaremba, La méthode des “bons treillis” pour le calcul des intégrales multiples (French,
with English summary), Applications of number theory to numerical analysis (Proc. Sym-
pos., Univ. Montreal, Montreal, Que., 1971), Academic Press, New York, 1972, pp. 39–119.
MR0343530
1973
Transcendence of e Centennial
Introduction
Let α be a complex number. If there exists a polynomial $p(x)$ of positive degree with integer coefficients such that $p(\alpha) = 0$, then α is an algebraic number. If no such polynomial exists, then α is transcendental. Thus, all rational numbers are algebraic, as are $\sqrt{2}$, $i = \sqrt{-1}$, and
"
√ √
5 + 3 + 1 + 2.
However, not every algebraic number can be written in terms of integers, rational operations, and root extractions. Students of Galois theory know that the Abel–Ruffini theorem says that there is no formula analogous to the quadratic formula that can provide the roots of every polynomial of degree five. For example, $x^5 - x - 1$ has roots that are algebraic but not expressible in terms of radicals.
It is often difficult to prove that a specific number is transcendental, although
we can quickly show that most real (or complex) numbers are transcendental. Georg
Cantor proved that the set of algebraic numbers is countable (see the footnote con-
cerning this on p. 31 in the 1918 entry). Since the set of real numbers is uncountable
(see the 1918 and 1999 entries for proofs), it follows that real transcendental num-
bers exist and, moreover, that most real numbers are transcendental.
In 1844, Joseph Liouville proved a theorem that can be used to construct specific transcendental numbers. For example, he proved that Liouville’s constant
\[ \lambda = \sum_{n=1}^{\infty} \frac{1}{10^{n!}} = 0.11000100000000000000000100000\ldots \]
is transcendental; see the comments for the 1935 entry for complete details. This did not, however, shed any light on the status of the famous constants e and π. Charles Hermite (1822–1901) established the transcendence of Euler’s constant (Figure 1)
\[ e = 2.7182818284590452353602874713526624977572470936999\ldots \]
in 1873 and Ferdinand von Lindemann (1852–1939) proved the transcendence of π in 1882.
Figure 1. Euler’s constant $e$ can be defined as the unique value for which $\int_1^e \frac{dx}{x} = 1$.

We provide a slick modern proof of e’s transcendence in the notes below. Since transcendental numbers are irrational, it also establishes the irrationality of e. However, the following simple proof that e is irrational is too good to pass up. Let
\[ I_n = \int_0^1 x^n e^x \, dx, \]
which is positive. Then use integration by parts and induction to show that there are integers $a_n$ and $b_n$ such that
\[ I_n = a_n + b_n e, \qquad n = 0, 1, 2, \ldots. \]
Suppose toward a contradiction that $e = p/q$ for some natural numbers $p$ and $q$. Then
\[ I_n = a_n + b_n \frac{p}{q} = \frac{a_n q + b_n p}{q} \ge \frac{1}{q} \]
since the numerator $a_n q + b_n p$ is a positive integer. On the other hand,
\[ \frac{1}{q} \le I_n = \int_0^1 x^n e^x \, dx \le e \int_0^1 x^n \, dx = \frac{e}{n + 1} \to 0. \]
This contradiction implies that e is irrational.
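The integers $a_n$ and $b_n$ can be generated explicitly. Integration by parts gives $I_0 = e - 1$ and $I_n = e - n I_{n-1}$ for $n \ge 1$, which is the recurrence used in the sketch below. The function names are ours, and the series used as an independent check comes from integrating the power series of $e^x$ term by term.

```python
import math

def integral_coefficients(n):
    """Return integers (a_n, b_n) with I_n = a_n + b_n * e."""
    a, b = -1, 1                      # I_0 = e - 1
    for k in range(1, n + 1):
        a, b = -k * a, 1 - k * b      # I_k = e - k * I_{k-1}
    return a, b

def integral_series(n, terms=40):
    """Evaluate I_n by integrating the power series of e^x term by term."""
    return sum(1 / (math.factorial(k) * (n + k + 1)) for k in range(terms))

for n in range(6):
    a, b = integral_coefficients(n)
    print(n, (a, b), a + b * math.e, integral_series(n), math.e / (n + 1))
```

Each printed row shows the integer pair, two evaluations of $I_n$, and the upper bound $e/(n+1)$. The floating-point value of `a + b * math.e` loses accuracy for large `n` because the integers grow like $n!$, so this is a check for small `n` only.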
See the 1935 and 1955 entries, along with the comments for the 1918, 1934, 1938, and 1967 entries, for more information about algebraic and transcendental numbers.

Proposed by Steven J. Miller, Williams College. Prove that at least one of
e + π and eπ is transcendental.1
1 When a version of this entry was published in the Pi Mu Epsilon journal, the following prob-
lem was used: “Find a 1-to-1, increasing function f : [0, 1] → R such that f (x) is transcendental
for all x.” This problem has been moved to the 1955 entry.
1973: Comments
Transcendence of e. The following proof, which can be found in [2, Thm. 12.45], involves a small amount of complex analysis, or at least some familiarity with complex integration (see [4, Thm. 5.4.2] for another proof). If $f(x)$ is a polynomial with $\deg f = m$, then define
\[ I(z) = \int_0^z f(\zeta) e^{z - \zeta} \, d\zeta. \tag{1973.1} \]
Repeated integration by parts yields
\[ I(z) = e^z \sum_{j=0}^{m} f^{(j)}(0) - \sum_{j=0}^{m} f^{(j)}(z). \tag{1973.2} \]
Let $F(x)$ denote the polynomial obtained from $f$ by replacing each coefficient of $f$ with its absolute value. Since the inequality
\[ |e^{z - \zeta}| \le e^{|z - \zeta|} \le e^{|z|} \]
holds for $\zeta = tz$ with $t \in [0, 1]$, it follows from (1973.1) that
\[ |I(z)| \le |z| e^{|z|} F(|z|). \]
Suppose toward a contradiction that e is algebraic. Then there are integers $q_0, q_1, \ldots, q_n$ with $q_0 \ne 0$ and $\gcd(q_0, q_1, \ldots, q_n) = 1$ so that
\[ q_0 + q_1 e + q_2 e^2 + \cdots + q_n e^n = 0. \tag{1973.3} \]
Let
\[ f(x) = x^{p-1} (x - 1)^p \cdots (x - n)^p, \tag{1973.4} \]
in which $p$ is a large prime number. Let $I(z)$ denote (1973.1) and let
\[ J = q_0 I(0) + q_1 I(1) + \cdots + q_n I(n). \tag{1973.5} \]
Then (1973.2) and (1973.3) ensure that
\[ J = - \sum_{j=0}^{m} \sum_{k=0}^{n} q_k f^{(j)}(k), \]
in which
\[ m = \deg f = (n + 1)p - 1. \]
The definition (1973.4) tells us that $f^{(j)}(k) = 0$ if $j < p$ and $k > 0$ or if $j < p - 1$ and $k = 0$. Consequently $p!$ divides $f^{(j)}(k)$ for all $j, k$ except for $j = p - 1$ and $k = 0$, in which case we have
\[ f^{(p-1)}(0) = (p - 1)! \, (-1)^{np} (n!)^p. \]
It follows that $f^{(p-1)}(0)$ is a nonzero integer that is divisible by $(p - 1)!$ but not $p!$ whenever $p > n$. Let $p > \max\{n, |q_0|\}$. Then $J$ is an integer that is divisible by $(p - 1)!$ but not by $p!$, since $p!$ divides every term of the double sum except $q_0 f^{(p-1)}(0)$. In particular, $J \ne 0$ and hence $|J| \ge (p - 1)!$. Since
\[ F(k) \le (2n)^m \qquad \text{for } 0 \le k \le n, \]
it follows from (1973.4) and (1973.5) that
\[ |J| \le |q_1| e F(1) + \cdots + |q_n| n e^n F(n) \le c^p, \]
in which $c$ is a constant that is independent of $p$. Therefore,
\[ (p - 1)! \le |J| \le c^p \]
and hence
\[ 1 \le c \, \frac{c^{p-1}}{(p - 1)!} \to 0. \]
This contradiction proves that e is transcendental.
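The divisibility claims about the derivatives of $f$ are the heart of the argument, and they can be tested with exact integer arithmetic for small parameters. The sketch below represents polynomials as coefficient lists (constant term first). All function names are ours, and the choice $n = 2$, $p = 3$ is only an example.

```python
from math import factorial

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (constant term first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_derivative(a):
    """Differentiate a polynomial given as a coefficient list."""
    return [i * c for i, c in enumerate(a)][1:] or [0]

def poly_eval(a, x):
    """Evaluate a polynomial at x by Horner's rule."""
    value = 0
    for c in reversed(a):
        value = value * x + c
    return value

def hermite_polynomial(n, p):
    """Coefficients of f(x) = x^(p-1) (x-1)^p ... (x-n)^p from (1973.4)."""
    f = [0] * (p - 1) + [1]           # x^(p-1)
    for k in range(1, n + 1):
        for _ in range(p):
            f = poly_mul(f, [-k, 1])  # multiply by (x - k)
    return f

def derivative_table(n, p):
    """Map (j, k) to the j-th derivative of f at k, for 0 <= j <= deg f, 0 <= k <= n."""
    g = hermite_polynomial(n, p)
    degree = len(g) - 1
    table = {}
    for j in range(degree + 1):
        for k in range(n + 1):
            table[(j, k)] = poly_eval(g, k)
        g = poly_derivative(g)
    return table

n, p = 2, 3
table = derivative_table(n, p)
print(table[(p - 1, 0)], factorial(p - 1) * (-1) ** (n * p) * factorial(n) ** p)
print(all(v % factorial(p) == 0 for jk, v in table.items() if jk != (p - 1, 0)))
```

The first line compares the exceptional value $f^{(p-1)}(0)$ with the closed form, and the second checks that $p!$ divides every other entry. Trying other primes $p > n$ is a simple way to gain confidence in the general statement.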
Solution to the problem. The solution is simpler than one might suspect
since it has nothing to do with special properties of e and π, or with any mysterious
relationships between them. Given two transcendental numbers α and β, at least
one of α + β and αβ is transcendental. Here is the explanation. Suppose that there
are transcendental numbers α and β such that α + β and αβ are both algebraic.
Since the sum and product of algebraic numbers are algebraic (see the notes for the 1967 entry), it follows that
\[ (\alpha + \beta)^2 \quad\text{and}\quad 4\alpha\beta \]
are algebraic. Therefore,
\[ (\alpha - \beta)^2 = (\alpha + \beta)^2 - 4\alpha\beta \]
is algebraic too. Since square roots of algebraic numbers are algebraic (prove it!), we deduce that $\alpha - \beta$ is algebraic and hence
\[ \alpha = \frac{1}{2} \bigl[ (\alpha + \beta) + (\alpha - \beta) \bigr] \]
is algebraic. This contradiction shows that at least one of $\alpha + \beta$ and $\alpha\beta$ is transcendental.
In the context of our problem, we know that at least one of $e + \pi$ and $e\pi$ is transcendental. Possibly both of them are. As of 2019, we still do not know. The same goes for $e/\pi$, $\pi - e$, $\pi^\pi$, and $e^e$, although we do know that $\pi + e^\pi$ and $\pi e^\pi$ are both transcendental [5].
Bibliography
[1] E. B. Burger and R. Tubbs, Making transcendence transparent: An intuitive approach to
classical transcendental number theory, Springer-Verlag, New York, 2004. MR2077395
[2] B. Fine, A. Gaglione, A. Moldenhauer, G. Rosenberger, and D. Spellman, Algebra and number
theory: A selection of highlights, De Gruyter Textbook, De Gruyter, Berlin, 2017. MR3727130
[3] S. Lang, Algebra, 3rd ed., Springer-Verlag, 2002.
[5] Yu. V. Nesterenko, Modular functions and transcendence questions (Russian, with Russian
summary), Mat. Sb. 187 (1996), no. 9, 65–96, DOI 10.1070/SM1996v187n09ABEH000158;
English transl., Sb. Math. 187 (1996), no. 9, 1319–1348. MR1422383
[6] D. Richeson, The transcendence of e, Division by Zero blog, September 28, 2010. https://
divisbyzero.com/2010/09/28/the-transcendence-of-e/.
[7] R. Schwartz, Transcendence of e, online notes adopted from Section 5.2 of Herstein’s Topics
in Algebra, http://www.math.brown.edu/~res/M154/e.pdf.
1974
Rubik’s Cube
Introduction
In 1974, Ernő Rubik (1944– ) invented the Magic Cube (as it was initially called
in his native Hungary), a mechanical puzzle now known around the world as the
Rubik’s Cube [3]. It is easy to scramble the cube with just a few turns; figuring out
how to restore the six faces takes much more work (one solution is presented in the
comments below). Although the Rubik’s Cube has
\[ 43{,}252{,}003{,}274{,}489{,}856{,}000 = 2^{27} \times 3^{14} \times 5^3 \times 7^2 \times 11 \]
possible states, it can always be solved in 20 moves or fewer, a fact only established in
2010 [1]. At the first World Championships in 1982, winner Minh Thai (1966– ) of
the United States won with a best time of 22.95 seconds. The current world record
belongs to Feliks Zemdegs (1995– ) of Australia, who clocked in at 4.22 seconds [4].
The best average time over five solves is Zemdegs’s astounding 5.80 seconds.
The mathematics of the Rubik’s Cube is inherently noncommutative in nature:
the order of operations matters. For example, fix an orientation of the cube, rotate the front face by 90° clockwise, then rotate the right face by 90° clockwise:
[Figure: the cube before the turns, after F, and after F followed by R.]
Call these operations F and R, respectively. Now take a similarly oriented cube and perform these steps in the reverse order:
[Figure: the cube before the turns, after R, and after R followed by F.]
Since we have obtained two different configurations, $FR \ne RF$.

The Rubik’s Cube group is the group generated by the symbols U, D, L, R, F, B (for Up, Down, Left, Right, Front, and Back, respectively) and their inverses, subject to the natural relations imposed by the cube itself. For example, $U^4 = D^4 = L^4 = R^4 = F^4 = B^4 = I$, in which $I$ denotes the identity element (that is, do nothing) of the Rubik’s Cube group. This algebraically encapsulates the fact that turning any face of the cube four times returns the cube to its original state. Other relations are more subtle, such as $(R^2 U^2)^6 = I$ and $(R U^2 D^{-1} B D^{-1})^{1260} = I$.
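Relations of this kind can be explored with a small simulation. The sketch below is one possible model, not a standard library: it tracks each of the 54 stickers by the position of its cubie and its outward normal, and it treats a clockwise quarter turn of a face as a rotation of that layer by 90° about the face's outward normal. The coordinate convention (x toward R, y toward U, z toward F) and all function names are our assumptions.

```python
from itertools import product

# Outward unit normals; x points Right, y points Up, z points Front.
FACES = {
    "R": (1, 0, 0), "L": (-1, 0, 0),
    "U": (0, 1, 0), "D": (0, -1, 0),
    "F": (0, 0, 1), "B": (0, 0, -1),
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def quarter_turn(v, n):
    """Rotate v by 90 degrees clockwise as seen from outside along the normal n."""
    c, d = cross(v, n), dot(v, n)
    return tuple(ci + d * ni for ci, ni in zip(c, n))

def solved_cube():
    """Map each sticker slot (cubie position, outward normal) to a face letter."""
    state = {}
    for pos in product((-1, 0, 1), repeat=3):
        for face, n in FACES.items():
            if dot(pos, n) == 1:
                state[(pos, n)] = face
    return state

def turn(state, face, quarter_turns=1):
    """Turn one face clockwise the given number of quarter turns (-1 means counterclockwise)."""
    n = FACES[face]
    for _ in range(quarter_turns % 4):
        new_state = dict(state)
        for (pos, normal), colour in state.items():
            if dot(pos, n) == 1:  # sticker belongs to the turning layer
                new_state[(quarter_turn(pos, n), quarter_turn(normal, n))] = colour
        state = new_state
    return state

def perform(state, moves):
    """Apply a sequence of (face, quarter_turns) pairs from left to right."""
    for face, k in moves:
        state = turn(state, face, k)
    return state

def order(moves):
    """Number of repetitions of `moves` needed to return a solved cube to solved."""
    solved = solved_cube()
    state, count = perform(solved, moves), 1
    while state != solved:
        state, count = perform(state, moves), count + 1
    return count

FR = perform(solved_cube(), [("F", 1), ("R", 1)])
RF = perform(solved_cube(), [("R", 1), ("F", 1)])
print(FR != RF)
print(order([("U", 1)]), order([("R", 2), ("U", 2)]))
print(order([("R", 1), ("U", 2), ("D", -1), ("B", 1), ("D", -1)]))
```

The `order` function terminates for every move sequence because the cube has only finitely many states, which is exactly the phenomenon behind part (a) of the problem below.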

Proposed by Alan Chang, Princeton University.
(a) Start with a solved Rubik’s Cube. Prove that every finite sequence of turns,
if repeated enough times, will get you back to the solved state.
(b) Observe that each face of the Rubik’s Cube has two “cuts” (in order to produce
three layers). We say that a Rubik’s Cube “has cuts at 1/3 and 2/3.” If
you want to turn a face of the cube, you must turn along one of these cuts.
Similarly, a 4 × 4 × 4 cube has cuts at 1/4, 2/4, and 3/4. Suppose instead
that we have a cube that has a cut at α for every α ∈ [0, 1]. Is it true that
any finite sequence of moves, if repeated enough times, will get you back to a
solved state?
Acknowledgements: This problem would not have been possible without the help of
the second named author of this book, who suggested looking at an infinite variation of
(a), a dinner discussion with a group of SMALL ’13 REU students at Williams College,
and Scott Sicong Zhang, who helped simplify the proof of (b).
1974: Comments
How to solve a cube. Before giving the solution to the centennial problem,
we might as well provide a solution to the Rubik’s Cube itself! Although the
following method is far from the fastest, it is relatively simple and relies only on
a small number of algorithms. The first named author of this book and economist
(and Pomona alum) Xan Vongsathorn have coached dozens of students through
their first solves with the method below. The second named author uses a similar
approach (online tutorials of his are available at [2]). Speed cubers know dozens of
additional algorithms and have different approaches to the cube entirely.
Figure 1. The six faces of a Rubik’s Cube.
The Rubik’s Cube has six faces, which we refer to as U , D, L, R, F , and B

(for Up, Down, Left, Right, Front, and Back, respectively). Unlike the preceding
discussion, we do not attach these letters to particular colors or insist upon fixing
a certain orientation of the cube. In Figure 1 we call the orange face F because we
are holding the cube so that the orange face is in front of us. The green face is R
because it is on our right.
The letters U, D, L, R, F, B also describe turning a face 90◦ clockwise from the
perspective of someone looking at the face head on. For example, U means “turn
the U face clockwise (seen from above) by 90◦ ” and D means “turn the D face
clockwise (from the perspective of someone looking at the bottom of the cube).”
We use $F^{-1}$ to refer to a counterclockwise quarter turn of the F face, and similarly for the other faces. An algorithm is a specific sequence of turns, such as $F R F^{-1} R^{-1}$. This algorithm asks us to execute $F$, then $R$, then $F^{-1}$, then $R^{-1}$.
The first step is to make a white cross on the U face:
[Figure: a good and a bad example of the white cross.]
Somewhere on the cube are four corners with a white sticker on them. You want to get them on the white (U) face without messing up the white cross. The two other colors on the white corners will need to match their surroundings:
[Figure: a good and a bad example of a placed white corner.]
You can use these algorithms to move a corner into position.
[Figure: three corner configurations, each viewed with the F face in front and labeled with one of the following algorithms.]
$F D F^{-1}$
$R^{-1} D D R D R^{-1} D^{-1} R$
$R^{-1} D^{-1} R$
If a white corner is in the top layer but not in the correct position, use one of the preceding algorithms to move it into the bottom layer. Then proceed as above.
Now flip the cube over so that the white side is on the D face. If possible, turn the U layer until you are in a position to apply one of these algorithms:
$(U^{-1} F^{-1} U F)(U R U^{-1} R^{-1})$
$(U R U^{-1} R^{-1})(U^{-1} F^{-1} U F)$
[Figure: for each algorithm, the cube before and after, viewed with the F face in front.]
If the desired edge is not in the top layer, then it is in the middle layer. Use one of the algorithms above to swap the edge that is stuck in the middle layer with an edge from the upper layer. Now proceed as above. With these two algorithms, you can solve all four of the middle layer edge pieces.
We now want to make a yellow cross on the U face. If you already have a yellow cross, you can move on to the next phase. If not, apply the algorithm
$F U R U^{-1} R^{-1} F^{-1}$
one, two, or three times to make the cross.
[Figure: three before-and-after views of the U face from above (B at the top, L on the left, R on the right, F at the bottom), one for each application of the algorithm.]
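We now need to put the yellow edges in the correct locations:
[Figure: a good and a bad example of the yellow edges.]
Here are algorithms to swap two or three adjacent edge pieces:
$(R U R^{-1} U)(R U U R^{-1}) U$
$(R U R^{-1} U)(R U U R^{-1})$
[Figure: for each algorithm, a view of the U face from above (B at the top, L on the left, R on the right, F at the bottom) showing the edges that move.]
You might need to swap two opposite edges. In that case, apply one of the preceding algorithms and reevaluate the situation.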
Now we need to move the corners to the right locations. We will worry about their orientations later.
[Figure: a good and a bad example of the corner locations.]
Use the algorithm
$(U R U^{-1} L^{-1})(U R^{-1} U^{-1} L)$
one or more times to permute the corners until they are in the correct locations.
This is the last step! It is the most complicated. Every piece should be in the right location, but the yellow corners may not all have yellow stickers facing up. The algorithm
$(R^{-1} D^{-1} R D)(R^{-1} D^{-1} R D)$
rotates the Up-Front-Right (UFR) corner counterclockwise.
[Figure: the cube before and after the algorithm, viewed with the F face in front.]
If you do it twice in a row, it will rotate the UFR corner counterclockwise twice, which is the same as rotating it clockwise. This algorithm has strange side effects on the bottom two layers, unless you do it three times in a row. Of course, if you do it three times in a row, your corner piece will rotate counterclockwise three times, ending up back where it started! The trick is to rotate the U face in between executions of the algorithm.
If you need to rotate the U F R corner, apply the algorithm until the corner is
properly oriented, that is, yellow side up. This will require one repetition if it needs
to be rotated counterclockwise, and two if clockwise.
Then rotate U until another misoriented corner is in the U F R position, and
repeat the algorithm until the new corner is properly oriented. Repeat this until
all the yellow corners have been properly oriented. This step always requires that
you repeat the algorithm some multiple of three times. If you are successful, the
bottom two layers will be restored and the cube solved!
Solution to the centennial problem. (a) Start with a solved cube and let $M$ be a finite sequence of turns. There are only $r = 2^{27} \cdot 3^{14} \cdot 5^3 \cdot 7^2 \cdot 11$ possible states of the cube. The pigeonhole principle ensures that among the $r + 1$ states $M, M^2, M^3, \ldots, M^{r+1}$ there are two that are identical. That is, there are exponents $s > t$ so that $M^s = M^t$. Thus, $M^{s-t} = I$; that is, repeating $M$ a total of $s - t$ times returns the cube to the solved state.
(b) We cannot use the pigeonhole principle directly since there are infinitely
many states. However, each finite sequence of turns involves only finitely many cuts.
These cuts, along with their reflections, divide the cube into an n × n × n cube for
some n. By a proof similar to (a), any finite sequence of turns on a solved n × n × n
cube eventually returns to the solved state after sufficiently many iterations.
Bibliography
[1] Cube20, God’s Number is 20, http://www.cube20.org/.
[2] S. J. Miller, Talks on solving the 2 × 2 × 2 and 3 × 3 × 3 cubes, https://youtu.be/PKZ7pxFyYu0
and https://youtu.be/FO1kOU-3Blw.
[3] Rubik’s, Home of the Rubik’s Cube, http://www.rubiks.com/.
[4] World Cube Association, https://www.worldcubeassociation.org/.
1975
Szemerédi’s Theorem
Introduction
An arithmetic progression is a finite sequence of integers, such as 4, 9, 14, 19, 24,
whose consecutive terms differ by a fixed amount; see the 1913 entry. We say that
a subset of the natural numbers is AP-rich if it contains arbitrarily long arithmetic
progressions. For example, the set of even numbers is AP-rich. Is the set
A = {1, 2, 3, 5, 6, 7, 10, 11, 13, 14, 15, 17, 19, 21, 22, 23, 26, 29, . . . } (1975.1)
of square-free natural numbers AP-rich? What about the set
B = {1, 4, 9, 16, 25, 36, . . .}
of perfect squares? Or the set
C = {2, 3, 5, 7, 11, 13, 17, 19, 23, . . .}
of prime numbers?
Although each of these three sets is infinite, the ways in which they sit inside
of the natural numbers are different. The square-free numbers appear omnipresent,
whereas the perfect squares seem sparsely distributed. The prime numbers are
somewhere in between. To capture this intuitive idea, we introduce the notion of
natural density (or simply density):
\[ d(S) = \lim_{n \to \infty} \frac{|S \cap \{1, 2, \ldots, n\}|}{n}. \tag{1975.2} \]
For example, one can show that
\[ d(A) = \frac{6}{\pi^2} = 0.607927\ldots; \]
see the notes for the 1939 entry. Consequently, one might say that “A contains about 60.8% of the natural numbers.” The perfect squares are much sparser, since
\[ d(B) = \lim_{n \to \infty} \frac{\sqrt{n} + 1}{n} \le \lim_{n \to \infty} \frac{2\sqrt{n}}{n} = 0. \]
Similarly, the prime number theorem (see the 1919 and 1948 entries) ensures that
\[ d(C) = \lim_{n \to \infty} \frac{\pi(n)}{n} = \lim_{n \to \infty} \frac{n / \log n}{n} = \lim_{n \to \infty} \frac{1}{\log n} = 0. \]
Natural density confirms, in a quantitative manner, that there are a lot more square-free natural numbers than perfect squares or primes. One can make such statements more precise by studying the asymptotic behavior of the quotient that appears in (1975.2). For the sets A, B, C above, the quotient is asymptotic to $6/\pi^2$, $1/\sqrt{n}$, and $1/\log n$, respectively.
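These densities are easy to estimate empirically. The sketch below counts the square-free numbers, perfect squares, and primes up to a cutoff with a simple sieve. The function name, the dictionary keys, and the cutoff of 10,000 are our choices for illustration.

```python
from math import isqrt, log, pi

def density_counts(n):
    """Count square-free numbers, perfect squares, and primes in {1, ..., n}."""
    square_free = [True] * (n + 1)
    is_prime = [True] * (n + 1)
    is_prime[0] = False
    if n >= 1:
        is_prime[1] = False
    for d in range(2, isqrt(n) + 1):
        for m in range(d * d, n + 1, d * d):
            square_free[m] = False        # m is divisible by the square d^2
        if is_prime[d]:
            for m in range(d * d, n + 1, d):
                is_prime[m] = False       # m is a proper multiple of the prime d
    return {
        "square_free": sum(square_free[1:]),
        "squares": isqrt(n),
        "primes": sum(is_prime),
    }

n = 10_000
counts = density_counts(n)
print(counts["square_free"] / n, 6 / pi**2)
print(counts["squares"] / n, 1 / n**0.5)
print(counts["primes"] / n, 1 / log(n))
```

Each printed pair compares the observed quotient from (1975.2) with its asymptotic prediction. The agreement for the primes improves only slowly, as one expects from the prime number theorem.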
In practice, natural density is too restrictive. Indeed, there are subsets of the natural numbers for which the limit (1975.2) is undefined. Can you find an example? Of greater use is the notion of upper density
\[ \overline{d}(S) = \limsup_{n \to \infty} \frac{|S \cap \{1, 2, \ldots, n\}|}{n}, \]
which always exists.
In 1953, Klaus Friedrich Roth, whom we met in our 1955 entry, proved that any subset of the natural numbers with positive upper density contains infinitely many three-term arithmetic progressions [13]. Paul Erdős (see the 1913 entry) and Pál Turán (1910–1976) had conjectured in 1936 that every subset of the naturals with positive upper density is AP-rich [7].¹ To be more precise, the original wording is less direct and they attribute much of the conjecture to George Szekeres (1911–2005):
More generally, he [Szekeres] has conjectured that, if we denote by $r_l(N)$ the maximum number of integers less than or equal to $N$ such that no $l$ of them form an arithmetic progression, then, for any $k$, and any prime $p$,
\[ r_p\!\left( \frac{(p - 2)p^k + 1}{p - 1} \right) = (p - 1)^k. \]
An immediate and very interesting consequence of this conjecture would be that for every $k$ there is an infinity of $k$ combinations of primes forming an arithmetic progression.
A major step occurred in 1969 when Endre Szemerédi extended Roth’s theorem
to four-term arithmetic progressions [18]. In 1975, Szemerédi proved the Erdős–
Turán conjecture in its entirety [19]. For this, and many other results, he received
the Abel Prize in 2012.
Armed with Szemerédi’s theorem, we can assert that the set of square-free
natural numbers is AP-rich. This is not at all obvious. However, even Szemerédi’s
theorem does not address whether the perfect squares or the prime numbers, which
both have density zero, are AP-rich. Elementary arguments show that the set of
perfect squares contains infinitely many arithmetic progressions of length three (see
the 1913 entry). More sophisticated methods confirm that there is no arithmetic
progression in the perfect squares of length four (see the 2004 entry). The Green–
Tao theorem, for which Hillel Furstenberg’s ergodic-theoretic proof of Szemerédi’s
theorem was a crucial ingredient [8], asserts that the primes are indeed AP-rich.
See the 2004 entry for more about the Green–Tao theorem [12] and some of its
extensions [14, 15].
We say that $S \subseteq \mathbb{N}$ is additively large if for some sequence of “intervals”
\[ I_n = \{a_n + 1, a_n + 2, \ldots, a_n + \ell_n\} \subseteq \mathbb{N} \]
with lengths $|I_n| = \ell_n \to \infty$, the following holds:
\[ d(S; I_n) = \limsup_{n \to \infty} \frac{|S \cap I_n|}{|I_n|} > 0. \tag{1975.3} \]
One can prove that Szemerédi’s theorem implies the ostensibly stronger statement that every additively large set is AP-rich.
1 In Hungarian, their names are Erdős Pál and Turán Pál, respectively.
What about geometric progressions? A geometric progression is a finite sequence of integers, such as 6, 12, 24, 48, 96, so that the ratio of consecutive terms is constant. We say a subset of $\mathbb{N}$ is GP-rich if it contains arbitrarily long geometric progressions. Are sets with positive upper density GP-rich? No. The set of square-free numbers, which has density $6/\pi^2$, provides a counterexample. It cannot contain a length-three geometric progression $a, ar, ar^2$ since the third term cannot be square free.
Nowadays, it is customary to view Szemerédi’s theorem as a density version of van der Waerden’s theorem [16], a seminal result of Ramsey theory that implies that for any finite partition
\[ \mathbb{N} = \bigcup_{i=1}^{r} C_i, \tag{1975.4} \]
at least one of the $C_i$ is AP-rich (see the 1930 entry). It is also true that one of the $C_i$ is GP-rich: consider the restriction of the coloring (1975.4) to the set $\{2^n : n \in \mathbb{N}\}$ and then apply van der Waerden’s theorem to the exponents.
If one hopes that a partition result from Ramsey theory should have a density version, we need a new notion of largeness that is geared towards the multiplicative structure of $\mathbb{N}$. The additive semigroup of natural numbers $(\mathbb{N}, +)$ has the single generator 1 since
\[ k = \underbrace{1 + 1 + \cdots + 1}_{k \text{ times}}. \]
On the other hand, the multiplicative semigroup $(\mathbb{N}, \times)$ has infinitely many generators: they are the prime numbers. This distinction complicates matters.
For each $j \in \mathbb{N}$, let $N_n^{(j)}$ be an increasing sequence of natural numbers. Let $a_n$ be a sequence in $\mathbb{N}$, let $p_1, p_2, \ldots$ be an enumeration of the primes, and let
\[ F_n = \{ a_n p_1^{i_1} p_2^{i_2} \cdots p_n^{i_n} : 0 \le i_j \le N_n^{(j)},\ 1 \le j \le n \}. \]
For $A \subseteq \mathbb{N}$, the upper multiplicative density with respect to the family $F_n$ is
\[ d_\times(A; F_n) = \limsup_{n \to \infty} \frac{|A \cap F_n|}{|F_n|}. \]
Observe that $d_\times(A; F_n)$ is invariant with respect to multiplication and division in the sense that
\[ d_\times(A; F_n) = d_\times(kA; F_n) = d_\times(A/k; F_n), \]
in which
\[ kA = \{ka : a \in A\} \quad\text{and}\quad A/k = \{b : kb \in A\}. \]
The sets Fn are best viewed as multiplicative counterparts of the family of intervals
that appear in (1975.3). We say that A ⊆ N is multiplicatively large if d× (A; Fn ) > 0
for some sequence Fn as defined above.
Vitaly Bergelson (1950– ) proved that any multiplicatively large set is GP-rich
[3]. This can be viewed as the multiplicative analogue of Szemerédi’s theorem. In
light of van der Waerden’s theorem, for any finite partition (1975.4), at least one Ci
is simultaneously AP-rich and GP-rich. It turns out, surprisingly, that the notion of
multiplicative largeness admits a density version of this result: any multiplicatively
large set is AP-rich [3].
Suppose that S is a syndetic set in (N, +), that is, a set with the property that
finitely many of its shifts
k + S = {k + s : s ∈ S}
cover N. Equivalently, S is syndetic if it has bounded gaps in the sense that there
exists a g ∈ N so that
{a, a + 1, a + 2, . . . , a + g} ∩ S ≠ ∅
for all a ∈ N.

Proposed by Vitaly Bergelson, The Ohio State University.
Are syndetic subsets of the natural numbers GP-rich?
1975: Comments
Divisibility chains. The set (1975.1) of square-free numbers, while free of
geometric progressions, has a lot of multiplicative structure. Each prime number is
square free, and the product of any finite number of distinct primes is square free.
Thus, A contains
p1 , p1 p2 , p1 p2 p3 , . . . ,
for any sequence p1 , p2 , . . . of distinct primes. It turns out that any set of positive
upper logarithmic density (a notion slightly stronger than that of upper density)
contains an infinite divisibility chain, that is, a sequence x1 , x2 , . . . for which each
term divides the next [6]. On the other hand, there is a set of positive upper density
for which no element divides any other element [5].
Ramanujan’s constant. If someone told you that a particular computation
had produced the number
262,537,412,640,768,743.999999999999,
it would be reasonable to assume that the correct result is actually the integer
262,537,412,640,768,744
and that the string of twelve 9’s beyond the decimal point is the byproduct of round-
off error or some other inaccuracy introduced through numerical computation.
In 1975, Martin Gardner (see the 1914 entry) played a famous April Fool’s
joke on the mathematical community when he claimed in his Scientific American
column that Srinivasa Ramanujan had conjectured that $\exp(\pi\sqrt{163})$ was an integer
[9] (Gardner fessed up about the joke in [10]). Although $\exp(\pi\sqrt{163})$ is not exactly
an integer, it is remarkably close since
\[
  e^{\pi\sqrt{163}} = 262{,}537{,}412{,}640{,}768{,}743.9999999999992500725971981856888\ldots.
\]
This amazing near-miss had already been noted in 1859 by Charles Hermite, whom
we met in our 1973 entry. One should keep in mind that few readers in 1975 would
have been able to detect this ruse. Personal computers did not yet exist and desktop
calculators did not have the ability to deal with such large numbers or work with
such great precision. On the other hand, the first named author just computed
1,000 digits of $e^{\pi\sqrt{163}}$ on a late 2013 iMac in just 0.000038 seconds. One million
digits only took 3.367 seconds. How far we have come!
The origin of this spectacular “almost integer” lies with the theory of the j-
invariant; see the 1992 entry. If τ is a quadratic irrational number with positive
imaginary part, then j(τ ) is an algebraic integer (an algebraic number that is the
root of a monic polynomial with integer coefficients) whose degree is the class
number of the quadratic field Q(τ). Consequently, if Q(τ) has class number one,
then j(τ) is an algebraic integer of degree 1, that is, an integer in the usual sense
of the word. For $\tau = \sqrt{-d}$ with d square free (to ensure that $\mathbb{Q}(\sqrt{-d}) \neq \mathbb{Q}$), this
occurs if and only if d is a Heegner number. These are 1, 2, 3, 7, 11, 19, 43, 67, 163;
see the 1966 entry. One can show that
\[
  e^{\pi\sqrt{d}} \approx -j\!\left(\frac{1 + \sqrt{-d}}{2}\right) + 744,
\]
in which the first term on the right-hand side is an integer if d = 163. Other, less
spectacular, near-integer identities hold for the largest remaining Heegner numbers:
\[
  e^{\pi\sqrt{67}} = 147{,}197{,}952{,}743.99999\ldots,
\]
\[
  e^{\pi\sqrt{43}} = 884{,}736{,}743.9997\ldots.
\]
For an explanation of the mathematics behind “Ramanujan’s constant”, see [11].
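The near-integers quoted above can be reproduced with nothing more than Python's standard library. The following is a minimal sketch, not the computation the authors ran: the function names are hypothetical, and because the `decimal` module has no built-in value of π, the sketch assumes Machin's formula π = 16 arctan(1/5) − 4 arctan(1/239) evaluated in integer arithmetic.

```python
from decimal import Decimal, localcontext


def _arctan_inverse(x, scale):
    """Approximate scale * arctan(1/x) with the alternating Taylor series,
    using only integer arithmetic (hypothetical helper)."""
    power = scale // x          # scale / x^(2k+1), starting with k = 0
    total = power
    x_squared = x * x
    k, sign = 1, -1
    while power:
        power //= x_squared
        total += sign * (power // (2 * k + 1))
        sign = -sign
        k += 1
    return total


def exp_pi_sqrt(d, digits=60):
    """Return exp(pi * sqrt(d)) as a Decimal with `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        scale = 10 ** (digits + 10)  # ten guard digits for pi
        scaled_pi = 16 * _arctan_inverse(5, scale) - 4 * _arctan_inverse(239, scale)
        pi = Decimal(scaled_pi) / Decimal(scale)
        return (pi * Decimal(d).sqrt()).exp()


if __name__ == "__main__":
    for d in (43, 67, 163):
        print(d, exp_pi_sqrt(d))
```

Since the exponent is roughly 40, a relative error in π is passed almost unchanged to the result, which is why the sketch carries guard digits; the last few printed digits should not be trusted.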
Another Erdős–Turán conjecture? Erdős’s conjecture on arithmetic progressions
(see the 1913 entry) states that
\[
  \sum_{a \in A} \frac{1}{a} \text{ diverges} \implies A \text{ is AP-rich}. \tag{1975.5}
\]
The conjecture is often confusingly referred to as the Erdős–Turán conjecture, which
more properly refers to the original conjecture proved by Szemerédi. There is also
the Erdős–Turán conjecture on additive bases, which is something else entirely!
Status of the centennial problem. At present, it is not known whether any
syndetic set contains a pair of the form $\{a, ar^2\}$. See [1, 2] for discussion and some
equivalent forms of this problem.
Bibliography
[1] M. Beiglböck, V. Bergelson, N. Hindman, and D. Strauss, Multiplicative structures
in additively large sets, J. Combin. Theory Ser. A 113 (2006), no. 7, 1219–
1242, DOI 10.1016/j.jcta.2005.11.003. http://www.sciencedirect.com/science/article/
pii/S0097316505002141. MR2259058
[2] M. Beiglböck, V. Bergelson, N. Hindman, and D. Strauss, Some new results in multiplicative
and additive Ramsey theory, Trans. Amer. Math. Soc. 360 (2008), no. 2, 819–847, DOI
10.1090/S0002-9947-07-04370-X. http://www.ams.org/journals/tran/2008-360-02/S0002-
9947-07-04370-X/S0002-9947-07-04370-X.pdf. MR2346473
[3] V. Bergelson, Multiplicatively large sets and ergodic Ramsey theory, Probability in mathemat-
ics, Israel J. Math. 148 (2005), 23–40, DOI 10.1007/BF02775431. http://link.springer.
com/article/10.1007%2FBF02775431. MR2191223
[4] V. Bergelson and A. Leibman, Polynomial extensions of van der Waerden’s and Szemerédi’s
theorems, J. Amer. Math. Soc. 9 (1996), no. 3, 725–753, DOI 10.1090/S0894-0347-96-00194-4.
MR1325795
[5] A. S. Besicovitch, On the density of certain sequences of integers, Math. Ann. 110 (1935),
no. 1, 336–341, DOI 10.1007/BF01448032. MR1512943
[6] H. Davenport and P. Erdős, On sequences of positive integers, Acta Arith. 2 (1936), 147–151.
[7] P. Erdős and P. Turán, On Some Sequences of Integers, J. London Math. Soc. 11 (1936),
261–264. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.101.8225.
[8] H. Furstenberg, Ergodic behavior of diagonal measures and a theorem of Szemerédi on
arithmetic progressions, J. Analyse Math. 31 (1977), 204–256, DOI 10.1007/BF02813304.
MR0498471
[9] M. Gardner, Mathematical Games: Six Sensational Discoveries that Somehow or Another
Have Escaped Public Attention, Sci. Amer. 232 (1975), no. 4, 127–131.
[10] M. Gardner, Mathematical Games: On Tessellating the Plane with Convex Polygons, Sci.
Amer. 232 (1975), no. 7, 12–117.
[11] B. J. Green, The Ramanujan Constant: An Essay on Elliptic Curves, Complex Multiplication
and Modular Forms, http://people.maths.ox.ac.uk/greenbj/papers/ramanujanconstant.
pdf.
Math. (2) 167 (2008), no. 2, 481–547, DOI 10.4007/annals.2008.167.481. MR2415379
[13] K. F. Roth, On certain sets of integers, J. London Math. Soc. 28 (1953), 104–109, DOI
10.1112/jlms/s1-28.1.104. MR0051853
[14] T. Tao and T. Ziegler, The primes contain arbitrarily long polynomial progressions, Acta
Math. 201 (2008), no. 2, 213–305, DOI 10.1007/s11511-008-0032-5. MR2461509
[15] J. Pintz, Patterns of primes in arithmetic progressions, Number theory—Diophantine prob-
lems, uniform distribution and applications, Springer, Cham, 2017, pp. 369–379. MR3676411
[16] B. L. van der Waerden, Beweis einer Baudetschen Vermutung, Nieuw. Arch. Wisk. 15 (1927),
212–216.
[17] Wikipedia, https://en.wikipedia.org/wiki/Heegner_number.
[18] E. Szemerédi, On sets of integers containing no four elements in arithmetic progression, Acta
Math. Acad. Sci. Hungar. 20 (1969), 89–104, DOI 10.1007/BF01894569. MR0245555
[19] E. Szemerédi, On sets of integers containing no k elements in arithmetic progression, Col-
lection of articles in memory of Juriı̆ Vladimirovič Linnik, Acta Arith. 27 (1975), 199–245,
DOI 10.4064/aa-27-1-199-245. MR0369312
1976
Four Color Theorem
Introduction
The four color theorem states that every planar map can be colored with four
colors in such a way that no two adjacent countries share the same color; see Figure
1. However, we should be precise about what this means. First of all, each country
must be connected. For example, the United States does not count because Alaska
and Hawaii are not connected to the lower forty-eight states. Second, we do not
consider countries that touch “at corners” to be adjacent. Thus, Arizona and
Colorado do not share a border as far as we are concerned; neither do Utah and
New Mexico. Finally, we prohibit countries with infinitely long boundaries since
otherwise one can construct bizarre maps that require more than four colors [8].
The year 1976 marked the end of the long search for a (correct) proof of the four
color theorem, which was initially conjectured in 1852 by Francis Guthrie (1831–
1899). The conjecture was prompted by his attempt to color a map of English
counties. Today most people know the theorem in the form “no more than four
colors are needed to color a map.” Despite this common understanding of the
theorem, cartographers claim that it does not matter since there is no reason to
limit the number of colors used. Moreover, only three colors are needed for most
maps that arise in practice. Despite its pragmatic insignificance, the four color
theorem has great historical importance.
Figure 1. Four colorings of the counties in two US states: (a) Georgia, (b) Ohio.
To make the problem more precise, one converts statements about maps into
statements about graphs. Assign each country a vertex. Place an edge between
two vertices if and only if the two corresponding countries share a common border.
This permits us to phrase the four color theorem in terms of graph theory: the
vertices of any graph that can be drawn in the plane without edge crossings can be
colored with at most four colors so that no two adjacent vertices share the same
color.
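The graph-theoretic formulation is easy to experiment with. The sketch below is a plain backtracking search, not the Appel–Haken method; the function name and the adjacency-dictionary representation are assumptions made for illustration. The sample graph is a wheel (a hub joined to a cycle of five vertices), which is planar and needs all four colors.

```python
def color_graph(adjacency, num_colors):
    """Return a dict vertex -> color in range(num_colors) such that adjacent
    vertices receive different colors, or None if no such coloring exists.
    `adjacency` maps each vertex to the set of its neighbors."""
    # Color high-degree vertices first; this tends to prune the search early.
    vertices = sorted(adjacency, key=lambda v: -len(adjacency[v]))
    coloring = {}

    def extend(i):
        if i == len(vertices):
            return True
        v = vertices[i]
        used = {coloring[u] for u in adjacency[v] if u in coloring}
        for color in range(num_colors):
            if color not in used:
                coloring[v] = color
                if extend(i + 1):
                    return True
                del coloring[v]
        return False

    return dict(coloring) if extend(0) else None


# Wheel graph: hub "h" joined to every vertex of the 5-cycle 0-1-2-3-4-0.
wheel = {i: {(i - 1) % 5, (i + 1) % 5, "h"} for i in range(5)}
wheel["h"] = set(range(5))

if __name__ == "__main__":
    print(color_graph(wheel, 3))  # no proper coloring with three colors
    print(color_graph(wheel, 4))  # a proper coloring with four colors
```

Backtracking takes exponential time in the worst case, so this is only suitable for small maps.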
The four color theorem has the dubious honor of having been “proved” twice
before 1976. Proofs by Alfred Kempe (1849–1922) in 1879 and by Peter Guthrie
Tait (1831–1901) in 1880 each stood unchallenged for 11 years before fatal flaws
were found. It is much easier to prove that five colors suffice [7]; see [9, Chapter
19] for details.
It was not until 1976 that mathematicians again claimed to have a proof of
the elusive theorem. Kenneth Appel (1932–2013) and Wolfgang Haken (1928– ) at
the University of Illinois proved the four color theorem with computer assistance,
through which they reduced the problem to 1,936 special cases, each of which was
checked by computer. This was greeted with controversy by the mathematical
community (see also the 1998 entry on the Kepler conjecture). Is a proof valid
if it is so long and computationally intensive that no human can understand it
in totality? Although the theorem has since been verified by the Coq interactive
theorem prover [6], there are some who still find the prospect of computer-aided
proofs unsettling. Perhaps a more elegant, humanly understandable proof of the
four color theorem exists. Try to find it!

Proposed by Alexandra Jensen, Steven J. Miller, and Pamela Mishkin,
Williams College.
We know that four colors suffice to color a planar map so that no two countries
with a common border share the same color. What if we add the constraint that no
color is used too often? For what p ∈ [25, 100] does a four coloring exist that uses
each color for at most p% of the countries? The four color theorem says we may
take p = 100 and the pigeonhole principle tells us we cannot have p < 25. What if
we only require at most p% of each color when there are at most N regions?
1976: Comments
Heawood conjecture. The four color theorem tells us that we can color any
planar map using at most four colors. What about map colorings on the torus,
the Klein bottle (see the 1958 entry), or other surfaces? Percy J. Heawood (1861–
1955), who spent most of his career attempting to prove the four color theorem and
found the fatal flaw in Kempe’s 1879 proof, conjectured in 1890 that the minimum
number of colors required to color any map on a two-dimensional surface S is
\[
  \left\lfloor \frac{7 + \sqrt{49 - 24\chi}}{2} \right\rfloor, \tag{1976.1}
\]
in which χ denotes the Euler characteristic of S [7]. To compute χ, triangulate S
and use the formula
\[
  \chi(S) = v - e + f,
\]
in which v denotes the number of vertices, e the number of edges, and f the number
of faces in the triangulation. It turns out that any triangulation of S produces the
same value; that is, the Euler characteristic is a topological invariant of S.¹ For
example, the five Platonic solids are all homeomorphic (see p. 22) to a sphere and
all have χ = 2; see Figure 2 and Table 1. Substituting this into (1976.1) suggests
that any map on a sphere can be colored with at most four colors.

Table 1. Computation of the Euler characteristics of the five Platonic solids. Here
v denotes the number of vertices, e the number of edges, and f the number of faces
of the solid. Since all five solids are homeomorphic, their Euler characteristics
are equal.

  S               v     e     f     χ
  tetrahedron     4     6     4     2
  cube            8    12     6     2
  octahedron      6    12     8     2
  dodecahedron   20    30    12     2
  icosahedron    12    30    20     2
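Both formulas are simple enough to check mechanically. The following sketch (function names are hypothetical) recomputes the last column of Table 1 and evaluates (1976.1) in exact integer arithmetic; it uses the fact that replacing the square root by its integer part does not change the floor of (7 + √(49 − 24χ))/2.

```python
from math import isqrt


def euler_characteristic(v, e, f):
    """Euler characteristic v - e + f of a polyhedral surface."""
    return v - e + f


def heawood_number(chi):
    """Value of formula (1976.1) for a surface with Euler characteristic chi <= 2."""
    return (7 + isqrt(49 - 24 * chi)) // 2


# (v, e, f) for the five Platonic solids, as in Table 1.
PLATONIC = {
    "tetrahedron": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron": (12, 30, 20),
}

if __name__ == "__main__":
    for name, (v, e, f) in PLATONIC.items():
        print(name, euler_characteristic(v, e, f))
    print("sphere (chi = 2):", heawood_number(2))
    print("torus or Klein bottle (chi = 0):", heawood_number(0))
```

The formula returns 7 for χ = 0, which is the correct count for the torus but, as the text goes on to explain, one more than the Klein bottle actually needs.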
What is the status of the Heawood conjecture? Technically, it was disproved in
1934 when Philip Franklin (1898–1965) proved that any map on the Klein bottle (for
which χ = 0) can be colored with only six colors, as opposed to the seven predicted
by the conjecture [5]. This bound is tight since the Franklin graph (Figure 3) can
be embedded on the surface of the Klein bottle and the resulting map cannot be
colored with fewer than six colors. Morally speaking, however, the conjecture is
true 100% of the time since Gerhard Ringel (1919–2008) and John W. T. Youngs
(1910–1970) proved that it holds for all surfaces other than the Klein bottle [10].
For example, any map on the torus (which has χ = 0) can be colored with only
seven colors, and this is minimal; see Figure 4.
¹ It is important to note that nonhomeomorphic surfaces may have the same Euler
characteristic. For example, the torus and the Klein bottle both have Euler
characteristic zero. They are not homeomorphic since, for example, they have different
fundamental groups ($\mathbb{Z}^2$ for the torus and $\langle a, b : ab = b^{-1}a \rangle$ for the
Klein bottle). We refrain from further discussion since that would take us too far afield.
Figure 2. The five Platonic solids along with (v, e, f), in which v denotes the
number of vertices, e the number of edges, and f the number of faces:
(a) tetrahedron (4, 6, 4), (b) cube (8, 12, 6), (c) octahedron (6, 12, 8),
(d) dodecahedron (20, 30, 12), (e) icosahedron (12, 30, 20). The surface of each
Platonic solid is homeomorphic to a two-dimensional sphere. Since the Euler
characteristic of a surface is a topological invariant, v − e + f = 2 for all five
surfaces. Readers who prefer the terminology d4, d6, d8, d12, and d20,
respectively, for these objects gain 100 experience points.
Figure 3. The Franklin graph can be embedded on the surface of the Klein bottle.
The resulting map cannot be colored with fewer than six colors. Since Franklin
proved that every map on a Klein bottle can be colored with at most six colors,
this example shows that his bound is sharp.
Figure 4. The map at left can be wrapped onto the surface of a torus. This example
shows that not every map on the torus can be colored with fewer than seven colors.
Bibliography
[1] K. Appel and W. Haken, Every planar map is four colorable. I. Discharging, Illinois J.
Math. 21 (1977), no. 3, 429–490. http://www.projecteuclid.org/euclid.ijm/1256049011.
MR0543792
[2] K. Appel, W. Haken, and J. Koch, Every planar map is four colorable. II. Reducibility, Illinois
J. Math. 21 (1977), no. 3, 491–567. http://projecteuclid.org/euclid.ijm/1256049012.
MR0543793
[3] K. Appel and W. Haken, The solution of the four-color-map problem, Sci. Amer. 237 (1977),
no. 4, 108–121, 152, DOI 10.1038/scientificamerican1077-108. MR0543796
[4] K. Appel and W. Haken, Every planar map is four colorable, with the collaboration of
J. Koch, Contemporary Mathematics, vol. 98, American Mathematical Society, Providence,
RI, 1989. MR1025335
[5] P. Franklin, A six color problem, J. Math. Phys. 13 (1934), 363–379.
[6] G. Gonthier, Formal proof—the four-color theorem, Notices Amer. Math. Soc. 55 (2008),
no. 11, 1382–1393. http://www.ams.org/notices/200811/tx081101382p.pdf. MR2463991
[7] P. J. Heawood, Map-colour theorems, Quarterly Journal of Mathematics, Oxford 24 (1890),
332–338.
[8] H. Hudson, Four colors do not suffice, Amer. Math. Monthly 110 (2003), no. 5, 417–423.
[10] G. Ringel and J. W. T. Youngs, Solution of the Heawood map-coloring problem, Proc. Nat.
Acad. Sci. U.S.A. 60 (1968), 438–445, DOI 10.1073/pnas.60.2.438. MR0228378
[11] R. Thomas, An update on the four-color theorem, Notices Amer. Math. Soc. 45 (1998), no. 7,
848–859. http://www.ams.org/notices/199807/thomas.pdf. MR1633714
[12] Wikipedia, Four color theorem, http://en.wikipedia.org/wiki/Four_color_theorem.
1977
RSA Encryption
Introduction
Alice and Bob wish to communicate without letting an eavesdropper, Eve,
understand their conversation. Any information that they wish to exchange can be
encoded with numbers (see the comments for the 1936 entry). Instead of sending
one large number that represents an entire message, information is typically broken
up into smaller blocks of fixed size. Thus, Alice and Bob want to securely send
and receive nonnegative integers less than or equal to a fixed threshold while Eve
is eavesdropping. Moreover, they need to do this without first exchanging a secret
key for their code: otherwise Eve will know the key!
The RSA cryptosystem, invented by Ronald Rivest (1947– ), Adi Shamir
(1952– ), and Leonard Adleman (1945– ) in 1977 and, independently, by Clifford
Cocks (1950– ) of the UK intelligence agency GCHQ (Government Communications
Headquarters) in 1973, addresses this issue (Cocks’s work remained classified until
1997). Eve can listen to the entire RSA-encrypted communication and she will be
unable to decipher it! Without algorithms such as RSA, modern e-commerce would
be impossible: we can buy things online without meeting the seller in person to
agree on a secret key for the transaction. To perform this amazing feat, Alice and
Bob require some number theory.
To describe the RSA cryptosystem, we need Euler’s generalization of Fermat’s
little theorem. Fermat’s little theorem tells us that
\[
  a^{p-1} \equiv 1 \pmod{p}
\]
if p is prime and gcd(a, p) = 1; see the 2002 entry. Let φ(n) denote the number
of j ∈ {1, 2, . . . , n} that are relatively prime to n. For example, φ(15) = 8 since
there are eight numbers, namely 1, 2, 4, 7, 8, 11, 13, 14, in the specified range that
are relatively prime to 15. The function φ is called the Euler totient function. It is
multiplicative, in the sense that φ(mn) = φ(m)φ(n) if m and n are relatively prime.
For example, φ(15) = φ(3)φ(5) = 2 · 4 = 8. Moreover, φ(p) = p − 1 whenever p is
prime, since 1, 2, . . . , p − 1 are relatively prime to p. Euler’s theorem states that if
gcd(a, n) = 1, then
\[
  a^{\varphi(n)} \equiv 1 \pmod{n}.
\]
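For small n these statements can be checked by brute force. The sketch below counts residues directly from the definition; the function name `phi` is an assumption for illustration, and the approach is far too slow for the moduli used in practice.

```python
from math import gcd


def phi(n):
    """Euler totient: the number of j in {1, ..., n} with gcd(j, n) = 1."""
    return sum(1 for j in range(1, n + 1) if gcd(j, n) == 1)


if __name__ == "__main__":
    print(phi(15), phi(3) * phi(5))  # multiplicativity for coprime arguments
    # Euler's theorem modulo 15: a^phi(15) leaves remainder 1 for each unit a.
    units = [a for a in range(1, 15) if gcd(a, 15) == 1]
    print(units)
    print([pow(a, phi(15), 15) for a in units])
```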
We are now ready to state the RSA algorithm.

RSA algorithm.
• Alice secretly selects distinct large primes p and q. Their product n = pq is her
enciphering modulus.
• Alice picks a public key (also called an encryption key) e. This is a positive
integer such that gcd(e, φ(n)) = 1. She knows n = pq, so she can compute
φ(n) = φ(p)φ(q) = (p − 1)(q − 1) and check if gcd(e, φ(n)) = 1 rapidly via the
Euclidean algorithm.
• Alice’s private key (also called a decryption key) d is the inverse of e (mod φ(n)).
Thus, de = jφ(n) + 1 for some integer j.
• Alice makes n and e known to the public. She does not disclose p, q, or d.
• To send the message M ∈ {1, 2, . . . , n} to Alice, Bob computes¹ $E \equiv M^e \pmod{n}$.
He sends E to Alice.
• Alice recovers M from E as follows:²
\[
  E^d \equiv (M^e)^d \equiv M^{de} \equiv M^{j\varphi(n)+1} \equiv (M^{\varphi(n)})^j M \equiv M \pmod{n}.
\]
Since n and e are publicly available, anyone can send messages to Alice. Only
she can decrypt these messages because only she knows the private key d. Here
is an example. Alice selects secret primes p = 7,919 and q = 9,733. Then n =
pq = 77,075,627 and φ(n) = (p − 1)(q − 1) = 77,057,976. Alice chooses e = 47
and checks that gcd(47, φ(n)) = 1. The multiplicative inverse of 47 (mod φ(n)) is
d = 68,860,319. Bob wants to send the message M = 12,345 to Alice. He computes
\[
  E \equiv M^e = (12{,}345)^{47} \equiv 18{,}269{,}972 \pmod{n}
\]
and sends this to Alice, who receives it and computes
\[
  E^d = (18{,}269{,}972)^{68{,}860{,}319} \equiv 12{,}345 \pmod{n}.
\]
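The worked example can be replayed in a few lines. This is a toy sketch using the primes from the text, which are far too small for real use; `power_mod` is a hypothetical helper that spells out repeated squaring (Python's built-in three-argument `pow` does the same job), and `pow(e, -1, phi_n)` for the modular inverse assumes Python 3.8 or later.

```python
from math import gcd


def power_mod(base, exponent, modulus):
    """Compute base**exponent mod modulus by repeated squaring."""
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                 # this binary digit of the exponent is 1
            result = (result * base) % modulus
        base = (base * base) % modulus   # square for the next binary digit
        exponent >>= 1
    return result


# Alice's key generation, with the numbers from the example above.
p, q = 7919, 9733
n = p * q
phi_n = (p - 1) * (q - 1)
e = 47
assert gcd(e, phi_n) == 1
d = pow(e, -1, phi_n)                    # inverse of e modulo phi(n)

# Bob encrypts, Alice decrypts.
M = 12345
E = power_mod(M, e, n)
recovered = power_mod(E, d, n)

if __name__ == "__main__":
    print("n =", n, "phi(n) =", phi_n, "d =", d)
    print("ciphertext E =", E)
    print("recovered M =", recovered)
```

Repeated squaring needs only about log2(d) multiplications modulo n, which is why the exponent 68,860,319 poses no difficulty.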
Suppose that Eve wants to find M , knowing only E and Alice’s public informa-
tion, n and e. She needs Alice’s private key d, so Eve must solve de ≡ 1 (mod φ(n)).
To do this, Eve needs to know φ(n) = (p − 1)(q − 1). Since
(p − 1)(q − 1) = pq − p − q + 1 = n − (p + q) + 1,
knowing φ(n) is equivalent to knowing p + q. However, knowing p + q is equivalent
to knowing p and q since the roots of
\[
  (x - p)(x - q) = x^2 - (p + q)x + pq = x^2 - (p + q)x + n,
\]
namely p and q, can be found by the quadratic formula. Thus, finding φ(n) =
(p − 1)(q − 1) is as hard as factoring n = pq.
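The reduction just described is short enough to write out. In this sketch (the function name is hypothetical) the sum p + q is read off from n and φ(n), and the quadratic formula is applied with an exact integer square root.

```python
from math import isqrt


def factor_from_phi(n, phi_n):
    """Recover the primes p <= q from n = p*q and phi(n) = (p - 1)(q - 1)."""
    s = n - phi_n + 1                 # s = p + q
    discriminant = s * s - 4 * n      # equals (q - p)^2
    if discriminant < 0:
        raise ValueError("inconsistent input")
    r = isqrt(discriminant)
    if r * r != discriminant:
        raise ValueError("phi_n is not (p - 1)(q - 1) for n = p*q")
    return (s - r) // 2, (s + r) // 2


if __name__ == "__main__":
    # The modulus and totient from the worked example above.
    print(factor_from_phi(77075627, 77057976))
```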
The security of RSA is based upon the assumption that it is hard to factor large
numbers (even though it is easy to multiply them). If a method for fast factoriza-
tion were to be found, then RSA would cease to be secure. Peter Shor (1959– )
found such an algorithm for fast factorization, but it requires a quantum computer.
Although quantum computers have so far only been able to factor relatively small
numbers, the potential exists for them to one day factor RSA moduli. Other
cryptographic systems, such as lattice-based methods, are believed to be more secure
against quantum-computer attacks.
¹ Although exponentiating M modulo n appears to be a daunting task, it can be done
rapidly by repeated squaring and modular reduction; see the 2002 entry.
² One can prove that $E^d \equiv M \pmod{n}$ even if gcd(M, n) ≠ 1.

Rivest, Shamir, and Adleman formed RSA Laboratories to market and fur-
ther develop applications of the RSA cryptosystem, which was granted U.S. Patent
4,405,829. In 1991, the company announced fifty-four factoring challenges to en-
courage cryptographic research and to monitor the state of contemporary factoring
algorithms and technology.
Each challenge number is the product of two large primes. These RSA challenge
numbers were generated by an isolated computer, with no access to the internet,
whose hard drive was immediately destroyed. Thus, we can be certain that if
someone presents a factorization of an RSA challenge number, there was no cheating
involved. Cash prizes were offered, ranging from $1,000 to $200,000. The challenge
was officially closed in 2007, although many people continue to try to factor the
RSA numbers.
As of 2017, the smallest unfactored RSA challenge number is RSA-230
17969491597941066732916128449573246156367561808012600070888
91883553172646034149093349337224786865075523085586419992922
18144366847228740520652579374956943483892631711525225256544
10980819170611742509702440718010364831638288518852689,
which has 230 digits. The largest of the challenge numbers is RSA-2048, which has
617 decimal digits (2048 bits). Without a major advance in quantum computation,
RSA-2048 will probably never be factored.
The smallest RSA challenge number is RSA-100
15226050279225333605356183781326374297180681149613806886579
08494580122963258952897654000350692006139.
This 100-digit number was factored less than a month after the challenge began.
Find the factors yourself!
1977: Comments
Poor choices and Pollard’s p − 1 algorithm. The security of RSA rests
on the assumption that factoring n = pq is computationally infeasible. However,
there are some choices of p and q that render n susceptible to certain factorization
algorithms. Suppose that p − 1 has only small prime factors. For instance, the
prime p = 614,657 is “large” but $p - 1 = 614{,}656 = 2^8 \cdot 7^4$ has only “small” prime
factors. In this situation, Pollard’s p − 1 algorithm might be able to factor n in a
reasonable amount of time. In what follows, we do not require that n is a product
of two distinct primes.
The starting point of Pollard’s algorithm is the observation that if p − 1 does
not have any large prime factors, then (p − 1)|k! for some small k. For example, if
p = 181, then
\[
  p - 1 = 180 = 2^2 \cdot 3^2 \cdot 5
\]
contains only small prime factors and p − 1 divides 6! = 720 = 180 · 4. On the
other hand, if p = 179, then p − 1 = 178 = 2 · 89 has a relatively large prime factor.
Because of this, p − 1 does not divide k! for k = 1, 2, . . . , 88, although it divides 89!.
Suppose that p is a prime factor of n and (p − 1)|k!. Then k! = (p − 1)r for
some r ∈ N and Fermat’s little theorem yields
\[
  2^{k!} = 2^{(p-1)r} = (2^{p-1})^r \equiv 1^r \equiv 1 \pmod{p},
\]
so $p \mid (2^{k!} - 1)$. Although other bases may be used, the base 2 is preferred in practice
since exponentiation with base 2 is particularly amenable to computation.
Let $m_k \equiv 2^{k!} - 1 \pmod{n}$ with $1 \le m_k \le n$. Since $m_k$ and $2^{k!} - 1$ differ by a
multiple of n, we have
\[
  \gcd(m_k, n) = \gcd(2^{k!} - 1, n) \ge p.
\]
If n does not divide $2^{k!} - 1$, then $\gcd(m_k, n)$ is a proper divisor of n. In the
preceding, we insisted that $m_k$ is the least positive residue of $2^{k!} - 1$ modulo n since
$m_k = 0$ implies that $\gcd(m_k, n) = n$ and hence we do not obtain a proper factor of
n.
To implement Pollard’s algorithm, fix a threshold K and compute $\gcd(m_k, n)$
for k = 2, 3, . . . , K and hope that a proper divisor of n is found. Observe that
\[
  m_k \equiv 2^{k!} - 1 \equiv (2^{(k-1)!})^k - 1 \equiv (m_{k-1} + 1)^k - 1 \pmod{n},
\]
so the $m_k$ can be computed iteratively without computing k!. This shortcut is
important, since the rapid growth of k! prevents the direct evaluation of $m_k$.
Here is an example. If n = 26,016,619, then
\[
\begin{aligned}
  2^{2!} &\equiv 4 \pmod{n}, & m_2 &= 3, & \gcd(m_2, n) &= 1,\\
  2^{3!} &\equiv 4^3 \equiv 64 \pmod{n}, & m_3 &= 63, & \gcd(m_3, n) &= 1,\\
  2^{4!} &\equiv 64^4 \equiv 16{,}777{,}216 \pmod{n}, & m_4 &= 16{,}777{,}215, & \gcd(m_4, n) &= 1,\\
  2^{5!} &\equiv 16{,}777{,}216^5 \equiv 6{,}730{,}144 \pmod{n}, & m_5 &= 6{,}730{,}143, & \gcd(m_5, n) &= 1,\\
  2^{6!} &\equiv 6{,}730{,}144^6 \equiv 14{,}067{,}788 \pmod{n}, & m_6 &= 14{,}067{,}787, & \gcd(m_6, n) &= 1,\\
  2^{7!} &\equiv 14{,}067{,}788^7 \equiv 20{,}137{,}005 \pmod{n}, & m_7 &= 20{,}137{,}004, & \gcd(m_7, n) &= 5{,}419,
\end{aligned}
\]
so 5,419 | n. In fact, n = pq, in which p = 5,419 and q = 4,801 are prime. Neither
\[
  p - 1 = 5{,}418 = 2 \cdot 3^2 \cdot 7 \cdot 43 \quad\text{nor}\quad q - 1 = 4{,}800 = 2^6 \cdot 3 \cdot 5^2
\]
divides 7! = 5,040. That is, the Pollard p − 1 method was successful before our
initial analysis predicted that it should be. This is because $2^{k!} - 1$ might be divisible
by p by chance, as opposed to being divisible by p because k! is a multiple of p − 1.
This is the case here, since $2^{7!} - 1$ happens to be divisible by p.
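The iteration above translates almost line for line into code. The following is a minimal sketch of the basic method only; the function name and return convention are assumptions, and practical implementations add refinements (such as a second stage) that are not shown here.

```python
from math import gcd


def pollard_p_minus_1(n, K, base=2):
    """Search for a proper divisor of n by computing gcd(base^(k!) - 1, n)
    for k = 2, ..., K. Returns (divisor, k) on success and None otherwise."""
    x = base % n                  # x = base^(1!) mod n
    for k in range(2, K + 1):
        x = pow(x, k, n)          # now x = base^(k!) mod n
        g = gcd(x - 1, n)
        if 1 < g < n:
            return g, k
    return None


if __name__ == "__main__":
    # The example from the text: the factor 5,419 appears at k = 7.
    print(pollard_p_minus_1(26016619, 10))
    # n = 181 * 179: here 181 - 1 = 180 divides 6!, while 179 - 1 = 2 * 89.
    print(pollard_p_minus_1(181 * 179, 10))
```

Raising the previous residue to the k-th power at each step is exactly the shortcut $m_k \equiv (m_{k-1} + 1)^k - 1$, so k! itself is never formed.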
If Alice is careful in selecting her primes p and q, she can prevent Eve from
factoring her RSA modulus n = pq using Pollard’s p − 1 algorithm. Let $p_0, q_0$ be
large primes. Then let p and q be even larger primes of the form
\[
  p = i p_0 + 1 \quad\text{and}\quad q = j q_0 + 1.
\]
Dirichlet’s theorem on primes in arithmetic progressions ensures that there are
infinitely many such primes; see the comments for the 1913 entry. By construction,
$p - 1 = i p_0$ and $q - 1 = j q_0$ have the large prime factors $p_0$ and $q_0$, respectively.
This prevents Eve from applying the Pollard p − 1 algorithm effectively.
Answer to the problem. The factorization of RSA-100 is
37975227936943673922808872755445627854565536638199
× 40094690950920881030683735292761468389214899724061
This was found in 1991 by Mark Manasse (1958– ) and Arjen K. Lenstra (1956– )
[3].
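Checking a claimed factorization is much easier than finding one. The short sketch below multiplies the two factors quoted above using Python's arbitrary-precision integers and compares the product with RSA-100; the constant names are chosen here for illustration.

```python
# RSA-100 and its two prime factors, transcribed from the text above.
RSA_100 = int(
    "15226050279225333605356183781326374297180681149613806886579"
    "08494580122963258952897654000350692006139"
)
P = 37975227936943673922808872755445627854565536638199
Q = 40094690950920881030683735292761468389214899724061

if __name__ == "__main__":
    print("digits in RSA-100:", len(str(RSA_100)))
    print("P * Q equals RSA-100:", P * Q == RSA_100)
```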
Bibliography
[1] R. Rivest, A. Shamir, and L. Adleman, US Patent 4,405,829 (1977). http://www.google.com/
patents/US4405829.
[2] R. L. Rivest, A. Shamir, and L. Adleman, A method for obtaining digital signatures and public-
key cryptosystems, Comm. ACM 21 (1978), no. 2, 120–126, DOI 10.1145/359340.359342.
https://people.csail.mit.edu/rivest/Rsapaper.pdf. MR700103
[3] RSA Laboratories, RSA Honor Roll, http://www.ontko.com/pub/rayo/primes/hr_rsa.txt
[4] Wikipedia, Pollard’s p − 1 algorithm, https://en.wikipedia.org/wiki/Pollard’s_p_-
_1_algorithm.
[5] Wikipedia, RSA Factoring Challenge, http://en.wikipedia.org/wiki/RSA_Factoring_Challenge.
[6] Wikipedia, Shor’s Algorithm, http://en.wikipedia.org/wiki/Shor’s_algorithm.
1978
Mandelbrot Set
Introduction
The Mandelbrot set is an example of a fractal, a mathematical object that
possesses a great deal of self-similarity. It is constructed as follows. For each
complex number c, form the sequence $z_{n;c}$, in which
\[
  z_{0;c} = c \quad\text{and}\quad z_{n+1;c} = z_{n;c}^2 + c.
\]
The simplest pictures of the Mandelbrot set are obtained by coloring a point c black
if the sequence defined above is bounded and white otherwise; see Figure 1. For
finer detail, we can color points c whose sequences $z_{n;c}$ appear unbounded based
upon how many iterations are needed to exceed a fixed, large threshold; see Figure
2. One can zoom in on the Mandelbrot set and obtain a variety of beautiful and
bewildering images; see Figure 3 and the links at [9].
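The coloring rule just described is usually called an escape-time computation. Here is a minimal sketch; the function name, the iteration cap of 100, and the escape radius 2 are assumptions made for illustration (a finite cap can only suggest, never prove, that a sequence is bounded).

```python
def escape_time(c, max_iter=100, radius=2.0):
    """Iterate z -> z*z + c starting from z = c, as in the text. Return the
    index of the first term with |z| > radius, or None if no term among the
    first max_iter exceeds it (the point is then drawn black)."""
    z = c
    for n in range(max_iter):
        if abs(z) > radius:
            return n
        z = z * z + c
    return None


if __name__ == "__main__":
    # A coarse text rendering: '#' marks points whose sequence appears bounded.
    for row in range(-10, 11):
        y = row / 8
        line = ""
        for col in range(-32, 13):
            x = col / 16
            line += "#" if escape_time(complex(x, y)) is None else "."
        print(line)
```

Coloring each escaping point by the returned index, instead of a single "." character, gives the banded pictures of the kind shown in Figure 2.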
Figure 1. The first visualization of the Mandelbrot set was produced in 1978 by
Robert W. Brooks (1952–2002) and J. Peter Matelski [1]. Image public domain.
Figure 2. The Mandelbrot set.
One of the most important things to address with any iterative problem is
the existence and classification of fixed points. If w is a fixed point of the map
$p(z) = z^2 + c$, in which c is a constant, then p(w) = w; that is,
\[
  w^2 - w + c = 0.
\]
This yields two fixed points (which coincide if $c = \tfrac14$):
\[
  w = \frac{1 \pm \sqrt{1 - 4c}}{2}.
\]
The magnitude of
\[
  p'(w) = 2w = 1 \pm \sqrt{1 - 4c}
\]
determines the nature of the fixed point w. If $|p'(w)| < 1$, then w is an attracting
fixed point and values that start out close to w will iterate toward w. If $|p'(w)| > 1$,
then w is a repelling fixed point and values that start out close to w will iterate away
from w. If $|p'(w)| = 1$, then the situation is more complicated and the argument of
the complex number $p'(w)$ comes into play.
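The classification above is easy to explore numerically. In this sketch the function names and the tolerance used to detect the borderline case are assumptions; the multiplier is computed as |2w|, the absolute value of the derivative of z² + c at w.

```python
import cmath


def fixed_points(c):
    """The two solutions of w*w - w + c = 0 (equal when c = 1/4)."""
    root = cmath.sqrt(1 - 4 * c)
    return (1 + root) / 2, (1 - root) / 2


def classify(w, tol=1e-12):
    """Classify a fixed point w of z -> z*z + c by the size of |2w|."""
    multiplier = abs(2 * w)
    if multiplier < 1 - tol:
        return "attracting"
    if multiplier > 1 + tol:
        return "repelling"
    return "neutral"


if __name__ == "__main__":
    for c in (0, 0.25, -0.75, complex(-1, 0.5)):
        for w in fixed_points(c):
            print(c, w, classify(w))
```

For c = 0, for instance, the fixed points are 1 (repelling) and 0 (attracting), in agreement with the behavior of repeated squaring.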
What about polynomials of higher degree? If p is a polynomial of degree n,
then p(w) = w means that w is a zero of the polynomial h(z) = p(z) − z, which has
degree at most n. The fundamental theorem of algebra asserts that a polynomial
of degree n has exactly n zeros, counted according to multiplicity, in the complex
plane. Thus, p has at most n fixed points. What if the polynomial p is replaced
with a slightly more exotic function?
Figure 3. Several close-up images of the Mandelbrot set.

Let p(z) be a complex polynomial of degree n. How many fixed points can
$\overline{p(z)}$ have? That is, how many roots can the equation $\overline{p(z)} = z$ have? At most n?
Infinitely many? Or something in between?
1978: Comments
Space Invaders. A strong contender for this year’s topic was the video game
Space Invaders¹ [10]. Created by Tomohiro Nishikado (1944– ) and released in
1978, this mega-blockbuster game revolutionized the industry. Interestingly, one of
the defining features of the game was due to hardware limitations. In the game,
alien ships are attacking the Earth. As more and more of them are destroyed, the
remaining ships move faster and faster until the last few ships move at incredible
speeds. This feature was due to a computational bottleneck. The fewer the number
of ships that need to be drawn, the faster the computer could display them!
Nishikado decided that he liked this and incorporated it into the game.
¹ A common misconception is that the line “And the space he invades he gets by on you”
from the 1981 Rush song Tom Sawyer is “And the space invaders get by on you.” Certainly, the
second is the more amusing interpretation.
A continuous, nowhere-differentiable function. Self-similarity is a key
ingredient in the construction of the blancmange function, a continuous,
nowhere-differentiable function f : R → R; see Figure 4. Since the original construction is
due to Teiji Takagi (1875–1960) [8], this function is also called the Takagi function.
The first step is to prove that if f : R → R is differentiable at x, then
\[
  \lim_{n \to \infty} \frac{f(v_n) - f(u_n)}{v_n - u_n} = f'(x)
\]
whenever $u_n$, $v_n$ are sequences such that
(a) $u_n \le x < v_n$ for all n ∈ N,
(b) $u_n < v_n$ for all n ∈ N, and
(c) $\lim_{n \to \infty} (v_n - u_n) = 0$.
To do this, use the definition of the derivative as a limit of difference quotients and
argue that it suffices to consider the case f (x) = 0.
Given x ∈ R, let g(x) denote the distance from x to the nearest integer. The
graph of g(x) looks like a “sawtooth wave” with each “tooth” of height 1/2 and
width 1; see Figure 5. Use the Weierstrass M-test to prove that the function
f : R → R defined by
\[
  f(x) = \sum_{n=0}^{\infty} \frac{g(2^n x)}{2^n} \tag{1978.1}
\]
is continuous and bounded. Since g(x) is periodic with period 1, it follows that
$g(2^n x)$ is periodic with period $1/2^n$. If x is a dyadic rational number (that is, its
denominator is a power of 2, say $2^n$), then $2^k x$ is an integer whenever k ≥ n and hence
$g(2^k x) = 0$ for all k ≥ n.
Figure 4. Graph of the blancmange function on [0, 1]. This function is continuous,
but nowhere differentiable.
Figure 5. Graphs of the summands $g(x)$, $\tfrac12 g(2x)$, $\tfrac14 g(4x)$, $\tfrac18 g(8x)$
for n = 1, 2, 3, 4.
Fix x ∈ R. For each n ∈ N, let
\[
  u_n = \frac{m_n}{2^n} \quad\text{and}\quad v_n = \frac{m_n + 1}{2^n}
\]
be dyadic rational numbers that satisfy
\[
  u_n \le x < v_n \quad\text{and}\quad v_n - u_n = \frac{1}{2^n}.
\]
By the preceding remarks, the series for $f$ reduces to
\[
  \frac{f(v_n) - f(u_n)}{v_n - u_n}
    = \sum_{k=0}^{n-1} \frac{g(2^k v_n) - g(2^k u_n)}{2^k v_n - 2^k u_n}. \tag{1978.2}
\]
However,
\[
  2^k u_n = 2^{k-n}\, 2^n u_n = 2^{k-n} m_n
  \quad\text{and}\quad
  2^k v_n = 2^{k-n} (m_n + 1)
\]
for some $m_n \in \mathbb{Z}$. Since $2^{k-n} \le \tfrac12$ for $k < n$, this means that $g$ is linear on the
interval $[2^k u_n, 2^k v_n]$. Thus, each of the difference quotients on the right side of
(1978.2) is $\pm 1$ (depending on whether $m_n$ is even or odd). In other words,
\[
  \frac{f(v_n) - f(u_n)}{v_n - u_n} = \sum_{k=0}^{n-1} \pm 1 \tag{1978.3}
\]
for some sequence of signs $\pm$. Since the terms of a convergent series must tend to
zero, it follows that (1978.3) does not tend to a finite limit as $n \to \infty$. By the
first step, applied to the function $f$ defined in (1978.1), we conclude that $f'(x)$ does not exist.
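The argument can be checked with exact arithmetic. The following is a minimal sketch, not part of the original text: the helper names `g`, `f_dyadic`, and `difference_quotient` are illustrative choices, and the sample point x = 1/3 is arbitrary. It evaluates the difference quotients in (1978.2) and shows that they are integers that change by exactly ±1 from one n to the next, so they cannot converge.

```python
from fractions import Fraction
from math import floor

def g(x):
    """Distance from x to the nearest integer (exact for Fractions)."""
    frac = x - floor(x)
    return min(frac, 1 - frac)

def f_dyadic(x, n):
    """Exact value of f at a dyadic rational x = m / 2**n.

    Terms with k >= n vanish because 2**k * x is then an integer.
    """
    return sum(g(2**k * x) / 2**k for k in range(n))

def difference_quotient(x, n):
    """(f(v_n) - f(u_n)) / (v_n - u_n) with u_n <= x < v_n, v_n - u_n = 2**-n."""
    m = floor(x * 2**n)
    u, v = Fraction(m, 2**n), Fraction(m + 1, 2**n)
    return (f_dyadic(v, n) - f_dyadic(u, n)) / (v - u)

x = Fraction(1, 3)  # arbitrary sample point
quotients = [difference_quotient(x, n) for n in range(1, 13)]
print(quotients)
```

Each quotient is a sum of n terms equal to ±1, so its parity matches that of n; this is the numerical counterpart of (1978.3).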
Answer to the problem. Let $p(z)$ and $q(z)$ be polynomials with $\deg p = n$,
$\deg q = m$, and $m < n$. What is the maximum number of zeros of
\[
  h(z) = p(z) - \overline{q(z)}\,?
\]
Terence Sheil-Small conjectured in 1992 that the sharp upper bound was $n^2$. This
is indeed the case if $m = n$ or $m = n - 1$, as his student A. S. Wilmshurst proved
[11]. What if $m < n - 1$? Wilmshurst conjectured that if $m = 1$, that is,
\[
  h(z) = p(z) - \overline{z},
\]
then the number of zeros of $h$ is at most $3n - 2$. This was proved in 2002 by
Dmitry Khavinson (1956– ) and Grzegorz Świa̧tek using techniques from complex
dynamics [4]; see [3] for an elegant exposition of this result and an application to
gravitational lensing (also see the 1915 entry). The sharpness of the upper bound
$3n - 2$ was proved in 2008 by Lukas Geyer [2].
Bibliography
[1] R. Brooks and J. P. Matelski, The dynamics of 2-generator subgroups of PSL(2, C), Riemann
surfaces and related topics: Proceedings of the 1978 Stony Brook Conference (State Univ.
New York, Stony Brook, N.Y., 1978), Ann. of Math. Stud., vol. 97, Princeton Univ. Press,
Princeton, N.J., 1981, pp. 65–71. MR624805
[2] L. Geyer, Sharp bounds for the valence of certain harmonic polynomials, Proc. Amer. Math.
Soc. 136 (2008), no. 2, 549–555, DOI 10.1090/S0002-9939-07-08946-0. MR2358495
[3] D. Khavinson and G. Neumann, From the fundamental theorem of algebra to astrophysics: a
“harmonious” path, Notices Amer. Math. Soc. 55 (2008), no. 6, 666–675. MR2431564
[4] D. Khavinson and G. Świa̧tek, On the number of zeros of certain harmonic polynomials,
Proc. Amer. Math. Soc. 131 (2003), no. 2, 409–414, DOI 10.1090/S0002-9939-02-06476-6.
MR1933331
[5] B. Mandelbrot, Fractal aspects of the iteration of z → λz(1 − z) for complex λ, z, Annals of
the New York Academy of Sciences 357, 249–259.
[6] B. B. Mandelbrot, The fractal geometry of nature, Schriftenreihe für den Referenten [Series
for the Referee], W. H. Freeman and Co., San Francisco, Calif., 1982. MR665254
[7] Team Fresh, Last Lights On—Mandelbrot fractal zoom to 6.066 e228 (2^760). http://vimeo.com/12185093.
[8] T. Takagi, A simple example of the continuous function without derivative, Proc. Phys. Math.
Japan, 1 (1901), 176–177.
[9] Wikipedia, Mandelbrot set, http://en.wikipedia.org/wiki/Mandelbrot_set.
[10] Wikipedia, Space invaders, http://en.wikipedia.org/wiki/Space_Invaders.
[11] A. S. Wilmshurst, The valence of harmonic polynomials, Proc. Amer. Math. Soc. 126 (1998),
no. 7, 2077–2081, DOI 10.1090/S0002-9939-98-04315-9. MR1443416
1979
TeX
Introduction
This entry honors two fundamental contributions of computer science to mathematics.
First, there is the creation of the TeX typesetting system by Donald Knuth
(1938– ), which was released in 1978. Second, there are off-by-one errors (in which
a loop intended to be executed n times is inadvertently executed n − 1 or n + 1
times), which is why this entry is listed under 1979. Purists will be happy to know,
however, that Knuth was honored with the National Medal of Science in 1979.
Donald Knuth is perhaps best known for his monumental, encyclopedic, and
stunningly readable series The Art of Computer Programming. Begun in 1962
while he was a graduate student at Caltech, the project continues to this day, with
volume 4A published in 2011 and several remaining volumes in preparation. While
preparing a second edition of Volume 2, Knuth was dismayed with the quality of
the typesetting done by the publisher. He realized that digital typesetting can be
boiled down to 0’s and 1’s: is a pixel black or white? Knuth saw this as a problem
amenable to computer science and set out to design his own system.
Knuth estimated he could have his digital-typesetting system ready in six
months. Instead, it was almost ten years before TeX was released. It was called
version 3. The next version was 3.1, which was followed by version 3.14, and so
forth. The current version is 3.14159265. This unusual numbering system suggests
that later versions are only incrementally different from previous ones and that TeX
has essentially stabilized.
TeX is used extensively in the publishing world. Almost every contemporary
paper in mathematics and computer science is typeset using a system based on TeX,
including this book! Most mathematicians use LaTeX, a document-preparation
system written in TeX that includes many predefined commands that would be
tedious to deal with in “raw” TeX. For example, the LaTeX source
\begin{equation}\label{eq:ZetaAgain1979}
\zeta(s) \ = \ \sum_{n=1}^\infty \frac{1}{n^s}
\end{equation}
produces (1979.1) below. The formula for the Riemann zeta function is enclosed in
an equation environment with the label eq:ZetaAgain1979 attached in case we
need to refer to it at some point.
Although TeX has many features that distinguish it from other digital typesetting
and publishing systems, we focus on only two points of interest here.
First, TeX is a programming language. The user writes a program that describes
both the content and layout of the document. The program is then interpreted
by TeX and produces the desired output. This design choice means that
TeX is extraordinarily flexible and customizable. The price is that TeX and related
systems can be hard for beginners to pick up. Fortunately there are many templates
available online. By looking at existing code and compiled documents you
can learn over 90% of what you need fairly quickly, and then search the web or ask
experts for the rest. For example, the second named author maintains TeX templates
(for papers and presentations) at
http://web.williams.edu/Mathematics/sjmiller/public_html/math/handouts/latex.htm.
The website also has a link
to a YouTube video that goes through writing simple articles with these templates.
The second point is that TeX uses sophisticated algorithms to lay out text.
Consider the problem of breaking a paragraph into justified lines. Each line must
begin at the left margin and end at the right margin, and there can be neither too
much nor too little space between words. Line breaks are allowed only between
words and, if necessary, inside a word at a known hyphenation point.
How would you solve this problem? The solution used in most digital typeset-
ting systems and word processors is a greedy strategy. We consider the words of
the paragraph one at a time, adding them to the current line. When the current
line is full, it is added to the page, and we begin adding words to the next line.
This approach is fast since it considers each word only once, but it can lead to
unappealing results because it never changes its mind about lines that have already
been added to the page. For example, the greedy algorithm may put vastly different
amounts of space between words on different lines, which looks terrible.
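The greedy strategy just described can be sketched in a few lines. This is an illustrative toy, not the code of any particular word processor: it assumes fixed-width characters, a single space between words, and a hypothetical function name `greedy_break`; hyphenation is ignored.

```python
def greedy_break(words, width):
    """Fill each line with as many words as fit, never revisiting earlier lines."""
    lines, current, length = [], [], 0
    for word in words:
        # Length of the current line if this word were appended to it.
        needed = len(word) if not current else length + 1 + len(word)
        if current and needed > width:
            lines.append(current)          # the line is full: commit it
            current, length = [word], len(word)
        else:
            current.append(word)
            length = needed
    if current:
        lines.append(current)
    return lines

lines = greedy_break("aaa bb cc ddddd".split(), width=6)
for line in lines:
    print(" ".join(line))
```

With width 6 the first line is packed tightly while the second holds only "cc", leaving four units of slack: the uneven spacing the text warns about.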

Proposed by James Wilcox, University of Washington.
Instead of looking one word at a time, TeX tries hard to optimize things to
produce a good-looking paragraph. To do so, it uses a notion called badness, which
is computed using complex rules that are designed to penalize ugly layouts. For
example, we wish to penalize paragraphs that contain lines with too much or too
little space between words. Given a definition of badness, the problem is now to
minimize badness over all possible sets of line breaks. A naive implementation
of this approach would consider exponentially many choices, but it is possible to
do better. Give a quadratic-time algorithm (in the number of lines) for finding
the optimal set of line breaks. For a detailed discussion of TeX’s line breaking
algorithm, see [4].
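One possible approach is dynamic programming. The sketch below is not TeX's actual algorithm, which works with boxes, glue, and penalties [4]. It assumes a deliberately simplified badness (the square of the leftover space on every line except the last), fixed-width characters, and that every word fits on a line; `optimal_break` is a hypothetical name. The double loop makes the running time quadratic in the number of words.

```python
def optimal_break(words, width):
    """Minimize total badness, where a line's badness is (unused width)**2
    and the last line is free.  best[i] is the least badness for words[:i]."""
    n = len(words)
    best = [0] + [float("inf")] * n
    prev = [0] * (n + 1)                 # prev[i]: start index of the last line
    for i in range(1, n + 1):
        length = -1                      # cancels the space counted before the first word
        for j in range(i - 1, -1, -1):   # candidate last line is words[j:i]
            length += len(words[j]) + 1
            if length > width:
                break
            cost = 0 if i == n else (width - length) ** 2
            if best[j] + cost < best[i]:
                best[i], prev[i] = best[j] + cost, j
    lines, i = [], n
    while i > 0:                         # walk the back-pointers
        lines.append(words[prev[i]:i])
        i = prev[i]
    return lines[::-1]

lines = optimal_break("aaa bb cc ddddd".split(), width=6)
for line in lines:
    print(" ".join(line))
```

On this input the optimizer leaves "aaa" alone on the first line so that "bb cc" can share the second, trading a little slack early for a more even paragraph.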
1979: Comments
Apéry’s constant. The year 1979 also saw Roger Apéry’s proof that ζ(3) is
irrational. Here ζ denotes the Riemann zeta function, defined by
\[
  \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} \tag{1979.1}
\]
for Re s > 1; see the 1928, 1933, 1939, 1942, 1945, 1967, and 1987 entries for
more information. As a consequence of Apéry’s result, some people refer to ζ(3) as
Apéry’s constant.
It has long been known that $\zeta(k)$ is a rational multiple of $\pi^k$ if $k \ge 2$ is even¹
(see the 1919 and 1945 entries); the values of $\zeta(k)$ for odd $k \ge 3$ remain largely
mysterious. To fifty decimal places,
\[
  \zeta(3) = 1.2020569031595942853997381615114499907649862923405\ldots.
\]
Is this a rational multiple of $\pi^3$? If so, the numbers involved must be enormous
since otherwise an explicit formula for $\zeta(3)$ would have been found long ago. Lots of
mathematicians have studied $\zeta(3)$. For example, Srinivasa Ramanujan discovered
the curious representation
\[
  \zeta(3) = \frac{7}{180}\pi^3 - 2 \sum_{k=1}^{\infty} \frac{1}{k^3 (e^{2\pi k} - 1)}.
\]
Although we do not have a closed-form expression for $\zeta(3)$, at least we know that it
is irrational. Moreover, $\zeta(k)$ is irrational for infinitely many odd $k$ [2] and at least
one of $\zeta(5), \zeta(7), \zeta(9), \zeta(11)$ is irrational [9].
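As a quick numerical illustration (a sketch in double precision, with hypothetical function names), the defining series (1979.1) at s = 3 can be compared with Ramanujan's representation. The direct sum is corrected by the integral estimate 1/(2N²) for the tail, which is an assumption of this sketch rather than something stated in the text.

```python
import math

def zeta3_direct(terms=100_000):
    """Partial sum of (1979.1) at s = 3 plus an integral estimate of the tail."""
    partial = math.fsum(1.0 / n**3 for n in range(1, terms + 1))
    return partial + 1.0 / (2 * terms**2)

def zeta3_ramanujan(terms=10):
    """Ramanujan's representation; the series converges extremely quickly."""
    series = math.fsum(1.0 / (k**3 * (math.exp(2 * math.pi * k) - 1))
                       for k in range(1, terms + 1))
    return 7 * math.pi**3 / 180 - 2 * series

print(zeta3_direct())
print(zeta3_ramanujan())
```

Ten terms of the Ramanujan series already agree with the value quoted above to the limits of floating-point precision, whereas the direct sum needs the tail correction to come close.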
Proof of Apéry’s theorem. The following argument of Stephen D. Miller²
(1974– ) [7] simplifies the proof of Frits Beukers (1953– ) [3]. We begin with a few
integrals:
\[
  \int_0^1\!\!\int_0^1 \frac{s^a t^a}{1 - st}\,ds\,dt = \sum_{n=1}^{\infty} \frac{1}{(n+a)^2}, \tag{1979.2}
\]
\[
  \int_0^1\!\!\int_0^1 \frac{s^a t^a \log(st)}{1 - st}\,ds\,dt = -2 \sum_{n=1}^{\infty} \frac{1}{(n+a)^3}, \tag{1979.3}
\]
\[
  \int_0^1\!\!\int_0^1 \frac{s^a t^a \log t}{1 - st}\,ds\,dt = - \sum_{n=1}^{\infty} \frac{1}{(n+a)^3}.
\]
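If $a \ne b$, then
\[
  \int_0^1\!\!\int_0^1 \frac{s^a t^b}{1 - st}\,ds\,dt
    = \frac{1}{b-a} \sum_{n=1}^{\infty} \left( \frac{1}{n+a} - \frac{1}{n+b} \right), \tag{1979.4}
\]
\[
  \int_0^1\!\!\int_0^1 \frac{s^a t^b \log s}{1 - st}\,ds\,dt
    = \frac{1}{a-b} \sum_{n=1}^{\infty} \frac{1}{(n+a)^2}
      + \frac{1}{(a-b)^2} \sum_{n=1}^{\infty} \left( \frac{1}{n+a} - \frac{1}{n+b} \right), \tag{1979.5}
\]
and
\[
  \int_0^1\!\!\int_0^1 \frac{s^a t^b \log t}{1 - st}\,ds\,dt
    = \frac{1}{b-a} \sum_{n=1}^{\infty} \frac{1}{(n+b)^2}
      + \frac{1}{(a-b)^2} \sum_{n=1}^{\infty} \left( \frac{1}{n+b} - \frac{1}{n+a} \right),
\]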
¹ If we permit nonpositive values of $k$ and consider the analytic continuation of $\zeta$ to $\mathbb{C} \setminus \{1\}$,
then $\zeta(0) = -1/2$ and $0 = \zeta(-2) = \zeta(-4) = \cdots$; see the 1942 entry.
² No relation of the second named author.
and
\[
  \int_0^1\!\!\int_0^1 \frac{s^a t^b \log(st)}{1 - st}\,ds\,dt
    = \frac{1}{a-b} \sum_{n=1}^{\infty} \left( \frac{1}{(n+a)^2} - \frac{1}{(n+b)^2} \right). \tag{1979.6}
\]
The formulas (1979.2) and (1979.4) follow by expanding
\[
  \frac{1}{1 - st} = \sum_{n=0}^{\infty} s^n t^n
\]
and integrating. The others follow by differentiating (1979.2) and (1979.4) with
respect to $a$ and $b$.
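As a worked instance of these two steps, here is the computation behind (1979.2) and (1979.3). It assumes $a > -1$ so that each integral converges, and that the sum may be interchanged with the integral and with the derivative; neither assumption is spelled out in the text.

```latex
\begin{align*}
\int_0^1\!\!\int_0^1 \frac{s^a t^a}{1-st}\,ds\,dt
  &= \sum_{k=0}^{\infty} \int_0^1 s^{a+k}\,ds \int_0^1 t^{a+k}\,dt
   = \sum_{k=0}^{\infty} \frac{1}{(a+k+1)^2}
   = \sum_{n=1}^{\infty} \frac{1}{(n+a)^2}, \\
\frac{\partial}{\partial a}\bigl(s^a t^a\bigr) &= s^a t^a \log(st),
\qquad
\frac{d}{da} \sum_{n=1}^{\infty} \frac{1}{(n+a)^2}
   = -2 \sum_{n=1}^{\infty} \frac{1}{(n+a)^3}.
\end{align*}
```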
If $p(s,t)$ is a polynomial of degree $n$ with integral coefficients, then (1979.3)
and (1979.6) imply
\[
  \int_0^1\!\!\int_0^1 \frac{p(s,t) \log(st)}{1 - st}\,ds\,dt = \frac{a_n + b_n \zeta(3)}{d_n^3}, \tag{1979.7}
\]
in which $a_n, b_n, d_n \in \mathbb{Z}$ and $d_n = \operatorname{lcm}\{1, 2, \ldots, n\}$. We claim that $d_n \le e^{1.01 n}$ for
sufficiently large $n$. Indeed,
\[
  d_n = \prod_{p \le n} p^{k(p)},
\]
in which
\[
  k(p) = \left\lfloor \frac{\log n}{\log p} \right\rfloor \le \frac{\log n}{\log p}
\]
is the exponent of the highest power of $p$ that divides a number at most $n$. The prime number
theorem ensures that
\[
  \log d_n = \sum_{p \le n} k(p) \log p \le \pi(n) \log n \sim n,
\]
which proves the claim.
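The growth of $d_n$ is easy to observe numerically. The sketch below uses illustrative helper names and naive primality testing, which is adequate only for small n. It computes $d_n$ both directly and through the prime-power product, then prints $\log(d_n)/n$, which should hover near 1.

```python
import math

def lcm_upto(n):
    """d_n = lcm{1, 2, ..., n}, built up one factor at a time."""
    d = 1
    for k in range(2, n + 1):
        d = d * k // math.gcd(d, k)
    return d

def lcm_via_prime_powers(n):
    """The same number as the product of p**k(p) over primes p <= n."""
    d = 1
    for p in range(2, n + 1):
        if all(p % q for q in range(2, math.isqrt(p) + 1)):   # p is prime
            power = p
            while power * p <= n:        # largest power of p not exceeding n
                power *= p
            d *= power
    return d

for n in (10, 100, 1000):
    print(n, math.log(lcm_upto(n)) / n)
```

The printed ratios give a feel for the claim $d_n \le e^{1.01 n}$; they are evidence, not a proof.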

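Consider
\[
  P_n(s) = \frac{1}{n!} \frac{d^n}{ds^n} (s - s^2)^n,
\]
which is a polynomial with integral coefficients. Since
\[
  \int_0^1 \frac{dt}{1 - (1 - x)t} = - \frac{\log(x)}{1 - x} \tag{1979.8}
\]
and $P_n(1 - s) = (-1)^n P_n(s)$, it follows (apply (1979.8) with $x = st$, then substitute $s \mapsto 1 - s$) that
\begin{align*}
  \int_0^1\!\!\int_0^1 \frac{P_n(s) P_n(t) \log(st)}{1 - st}\,ds\,dt
    &= - \int_0^1\!\!\int_0^1\!\!\int_0^1 \frac{P_n(s) P_n(t)}{1 - (1 - st)u}\,ds\,dt\,du \tag{1979.9} \\
    &= - (-1)^n \int_0^1\!\!\int_0^1\!\!\int_0^1 \frac{P_n(s) P_n(t)}{1 - (1 - (1 - s)t)u}\,ds\,dt\,du.
\end{align*}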
We next claim that for $s, t \in (0, 1)$ fixed,
\[
  \int_0^1 \frac{du}{1 - (1 - (1 - s)t)u} = \int_0^1 \frac{du}{(1 - (1 - u)s)(1 - (1 - t)u)}.
\]
Indeed, a partial fraction expansion implies that
\[
  \frac{1}{(1 - (1 - u)s)(1 - (1 - t)u)}
    = \frac{1}{1 - (1 - s)t} \left( \frac{s}{1 - (1 - u)s} + \frac{1 - t}{1 - (1 - t)u} \right),
\]
and hence
\begin{align*}
  \int_0^1 \frac{du}{(1 - (1 - u)s)(1 - (1 - t)u)}
    &= \frac{1}{1 - (1 - s)t} \left( -s\,\frac{\log(1 - s)}{s} + (1 - t)\,\frac{\log(t)}{t - 1} \right) \\
    &= - \frac{\log(t(1 - s))}{1 - (1 - s)t}.
\end{align*}
Use (1979.8) with $x = (1 - s)t$ and observe that the two integrals are equal.
The preceding argument and (1979.7) ensure that
\begin{align*}
  (-1)^n \int_0^1\!\!\int_0^1\!\!\int_0^1 \frac{P_n(s) P_n(t)}{1 - (1 - (1 - s)t)u}\,ds\,dt\,du
    &= (-1)^n \int_0^1\!\!\int_0^1\!\!\int_0^1 \frac{P_n(s) P_n(t)}{(1 - (1 - u)s)(1 - (1 - t)u)}\,ds\,dt\,du \\
    &= \int_0^1\!\!\int_0^1\!\!\int_0^1 \frac{P_n(s) P_n(t)}{(1 - (1 - u)s)(1 - tu)}\,ds\,dt\,du
\end{align*}
is of the form
\[
  \frac{a_n + b_n \zeta(3)}{d_n^3}.
\]
Integrating by parts $n$ times with respect to each of the variables $s$ and $t$ yields
\[
  \int_0^1\!\!\int_0^1\!\!\int_0^1 \frac{P_n(s) P_n(t)}{(1 - (1 - u)s)(1 - tu)}\,ds\,dt\,du
    = \int_0^1\!\!\int_0^1\!\!\int_0^1 \frac{(s - s^2)^n (t - t^2)^n (u - u^2)^n}{\bigl((1 - (1 - u)s)(1 - tu)\bigr)^{n+1}}\,ds\,dt\,du.
\]
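The nonnegative function
\[
  f(s, t, u) = \frac{s(1 - s)\,t(1 - t)\,u(1 - u)}{(1 - (1 - u)s)(1 - tu)} \tag{1979.10}
\]
vanishes on the boundary of $[0, 1] \times [0, 1] \times [0, 1]$ and attains its maximum at
\[
  (s, t, u) = \bigl(2 - \sqrt{2},\ 2 - \sqrt{2},\ \tfrac12\bigr),
\]
where
\[
  f\bigl(2 - \sqrt{2},\ 2 - \sqrt{2},\ \tfrac12\bigr) = 17 - 12\sqrt{2}.
\]
Thus,
\[
  \frac{a_n + b_n \zeta(3)}{d_n^3} = O\bigl( (17 - 12\sqrt{2})^n \bigr).
\]
Since $d_n = O(e^{1.01 n})$ and
\[
  e^{3.03} (17 - 12\sqrt{2}) \approx 0.60927,
\]
it follows that $|a_n + b_n \zeta(3)| = O(0.61^n)$. Moreover, $a_n + b_n \zeta(3)$ is nonzero
because of the positivity of the integrand (1979.10). Suppose, toward a contradiction,
that $\zeta(3) = p/q$ with $p, q$ positive integers. Then $q(a_n + b_n \zeta(3)) = q a_n + p b_n$ is a
nonzero integer and hence
\[
  1 \le q\,|a_n + b_n \zeta(3)| = O(0.61^n).
\]
This contradiction proves that $\zeta(3)$ is irrational.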
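The two numerical facts used at the end of the proof can be spot-checked. The sketch below is a coarse grid search with an arbitrary resolution of 60 steps per axis, so it is evidence rather than proof. It evaluates the function (1979.10) over the open cube, compares the largest value found with $17 - 12\sqrt{2}$, and recomputes the constant 0.60927.

```python
import math

def f(s, t, u):
    """The function (1979.10)."""
    return (s * (1 - s) * t * (1 - t) * u * (1 - u)
            / ((1 - (1 - u) * s) * (1 - t * u)))

target = 17 - 12 * math.sqrt(2)

steps = 60                                   # grid resolution (arbitrary choice)
grid = [i / steps for i in range(1, steps)]  # interior points only
grid_max, grid_argmax = max(
    (f(s, t, u), (s, t, u)) for s in grid for t in grid for u in grid
)

peak = f(2 - math.sqrt(2), 2 - math.sqrt(2), 0.5)
ratio = math.exp(3.03) * target

print(grid_max, grid_argmax)
print(peak, target, ratio)
```

The grid maximum sits just below $17 - 12\sqrt{2}$, near s = t ≈ 0.586 and u = 0.5, and `ratio` reproduces the contraction factor that drives the contradiction.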
Bibliography
[1] R. Apéry, Irrationalité de ζ(2) et ζ(3) (French), Luminy Conference on Arithmetic, Astérisque
61 (1979), 11–13. MR3363457
[2] K. Ball and T. Rivoal, Irrationalité d’une infinité de valeurs de la fonction zêta aux entiers
impairs (French), Invent. Math. 146 (2001), no. 1, 193–207, DOI 10.1007/s002220100168.
MR1859021
[3] F. Beukers, A note on the irrationality of ζ(2) and ζ(3), Pi: A Source Book, Springer-Verlag,
2000, 434–438. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.458.2222&rep=rep1&type=pdf.
[4] D. E. Knuth and M. F. Plass, Breaking paragraphs into lines, Software: Practice and Experience
11 (1981), no. 11.
[5] D. E. Knuth, The art of computer programming. Vol. 1, Fundamental algorithms, 3rd ed. [of
MR0286317], Addison-Wesley, Reading, MA, 1997. MR3077152
[6] D. E. Knuth, The art of computer programming, http://www-cs-faculty.stanford.edu/~uno/taocp.html.
[7] S. D. Miller, An Easier Way to Show ζ(3) ∉ Q. http://sites.math.rutgers.edu/~sdmiller/simplerzeta3.pdf.
[8] The TeX User Group, History of TeX, http://www.tug.org/whatis.html.
[9] V. V. Zudilin, One of the numbers ζ(5), ζ(7), ζ(9), ζ(11) is irrational (Russian), Uspekhi Mat.
Nauk 56 (2001), no. 4(340), 149–150, DOI 10.1070/RM2001v056n04ABEH000427; English
transl., Russian Math. Surveys 56 (2001), no. 4, 774–776. MR1861452
[10] W. Zudilin, Apéry’s theorem. Thirty years after, Int. J. Math. Comput. Sci. 4 (2009), no. 1,
9–19. https://arxiv.org/abs/math/0202159. MR2598496
1980
Hilbert’s Third Problem
Introduction
The Wallace–Bolyai–Gerwien theorem states that two rectilinear figures are
equidecomposable if and only if they have the same area. For example, if a square
and an equilateral triangle have the same area, then they can be dissected into
a finite number of polygonal pieces so that one figure can be rearranged into the
other; see Figure 1. The hypothesis that the original figures and the resulting pieces
are rectilinear is necessary (see the 1924 entry on the Banach–Tarski paradox).
The history of the theorem is convoluted. According to the detailed history
set forth in [3], the problem was posed in 1807 by the Scottish mathematician
William Wallace¹ (1768–1843). John Lowry arrived at the first proof of what is
now known as the Wallace–Bolyai–Gerwien theorem in 1814, although sadly his
contribution now appears largely unheralded and we are unable to find out much
information about him. The Hungarian mathematician Wolfgang Farkas Bolyai
(1775–1856) independently proved the result in 1832, followed shortly thereafter by
Paul Gerwien. Little is known about Gerwien, save that he was a “lieutenant in the
Prussian 22nd Infantry Regiment and instructor in the Royal Prussian Cadet Corps
in the early 1830s” and that he published two papers and an analytic geometry
textbook in the 1830s [7]. As for Farkas Bolyai, he is well known for the stern
warning to his son János Bolyai (1802–1860) about Euclid’s parallel postulate (see
the comments from the 1963 entry): “do not try the parallels in that way. . . . I have
measured that bottomless night, and all the light and all the joy of my life went
out there” [11].
Does the Wallace–Bolyai–Gerwien theorem have an analogue for solids in three
dimensions? At the International Congress of Mathematicians in Paris in 1900,
David Hilbert presented a host of problems to inspire and guide mathematicians
in the new century [8] (see the 1935 and 1970 entries). His third problem concerns
polyhedra, the analogues of polygons in three dimensions. Hilbert asked if two
polyhedra with equal volumes can always be dissected into finitely many polyhedra
so that one of the original polyhedra can be rearranged to form the other.
The problem was quickly dispatched by Max Dehn (1878–1952) in 1901. He
introduced a polyhedral invariant, the Dehn invariant, such that two polyhedra
are equivalent under dissection if and only if they have the same volume and same
Dehn invariant. Dehn proved that the cube and the regular tetrahedron have different
Dehn invariants and hence they are not equidecomposable [6]. More turns out to
be true. In 1980 (hence the topic for this year’s entry), Hans Debrunner showed
that if a polyhedron tiles three-dimensional space, then its Dehn invariant is zero
[5]. Since regular tetrahedra have nonzero Dehn invariants, they cannot tile $\mathbb{R}^3$. See [2, 9]
and the references therein for a readable account of the method.

¹ We are unsure whether he was related to his more famous namesake.

Figure 1. A square can be dissected and rearranged to form an equilateral triangle of equal area. Image public domain.
Dehn was not the first to solve Hilbert’s third problem. That honor goes to
Ludwik Antoni Birkenmajer (1855–1929), who solved it in 1882 for a math contest
held by the Academy of Arts and Sciences of Kraków [3]. The competitors were
challenged with the following:
Given any two tetrahedra with equal volumes, subdivide one of them
by means of planes, if it is possible, into the smallest possible number
of pieces that can be rearranged so as to form the other tetrahedron. If
this cannot be done at all or can be done only with certain restrictions,
then prove the impossibility or specify precisely those restrictions.
This is Hilbert’s third problem! Although it was judged at the time to be correct,
Birkenmajer’s solution was never published. It disappeared from history until it
was recently rediscovered and reevaluated; a modern appraisal deems it valid [3].

Problems on packing and tiling go back to antiquity. In On the Heavens,
Aristotle (384–322 BCE) asserted [1, Book 3, Sec. 8]:
It is agreed that there are only three plane figures which can fill a
space, the triangle, the square, and the hexagon, and only two solids,
the pyramid [tetrahedron] and the cube.
However, regular tetrahedra do not completely fill space around a point.² This leads
to the following problem, which is unsolved. How many nonoverlapping congruent
regular tetrahedra can touch a point in $\mathbb{R}^3$?

² The regular tetrahedra do not have to be congruent; similarity is enough. This is because
the condition involves only a small neighborhood of the point in question.

One can show that 20 tetrahedra can touch at a point. This can be done in
such a way that the 20 opposite faces of these tetrahedra (not touching the point)
lie on the 20 faces of a regular icosahedron, whose centroid is the point at which the
tetrahedra touch. We can get an upper bound on how many tetrahedra can touch
by determining the solid angle subtended by a regular tetrahedron and dividing it
into a full solid angle 4π ≈ 12.56 steradians. In this way, it is found that there is
room for at most 22 tetrahedra to touch at a point. Is the answer 20, 21, or 22?
No one knows. The answer is suspected to be 20. Can one even rule out 22?
The problem can be turned into a two-dimensional problem by intersecting a
neighborhood of the point in question with a small sphere. How many equilateral
spherical triangles, with all angles arccos(1/3) (about 71 degrees) can be packed on
the surface of a unit sphere without overlap?
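The upper bound of 22 comes from a one-line computation. The sketch below assumes Girard's formula for the area of a spherical triangle (angle sum minus π), which the text does not state explicitly. It computes the solid angle at a vertex of a regular tetrahedron and divides it into 4π.

```python
import math

# Dihedral angle of a regular tetrahedron; also the angle of each spherical triangle.
alpha = math.acos(1 / 3)

# Girard's formula: area of a spherical triangle on the unit sphere
# equals (sum of its angles) - pi.  This is the solid angle at a vertex.
omega = 3 * alpha - math.pi

upper_bound = math.floor(4 * math.pi / omega)

print(math.degrees(alpha))      # the angle quoted as "about 71 degrees"
print(omega, 4 * math.pi / omega, upper_bound)
```

The quotient falls strictly between 22 and 23, which is why 22 is the best bound available from an area count alone.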
1980: Comments
Origin of the problem. The first written instance of the centennial problem
appears to be the paper of Lagarias and Chuanming Zong [10, p. 1545], in which
the number 20 is suggested as the correct answer. However, the problem seems
to have circulated in the community for many years. Paul Sally (1933–2013) told
Lagarias that he encountered the problem at Lincoln Labs in 1958.
A tiling problem. Here is a problem that has an elegant solution using in-
variants (an invariant is a quantity that is unchanged throughout a process). Is it
possible to tile a chessboard with two opposite corners removed (Figure 2(b)) with
1 × 2 dominoes (Figure 2(a))? If we assign white the number −1 and black the
number 1, then the sum of the values in any figure tiled by dominoes is zero. This
is an invariant: this value is the same regardless of how many dominoes are used
or where they are placed. Since the sum of the values in the modified chessboard
is −2, no such tiling is possible. See the 2003 entry for a more difficult variation on
this problem.
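The invariant can be verified mechanically. In this sketch the coloring convention (the square at row 0, column 0 is black) and the choice of removed corners are assumptions made for illustration.

```python
def value(r, c):
    """+1 for a black square, -1 for a white one; (0, 0) is taken to be black."""
    return 1 if (r + c) % 2 == 0 else -1

board = [(r, c) for r in range(8) for c in range(8)]
removed = {(0, 0), (7, 7)}               # two opposite corners, both black

full_sum = sum(value(r, c) for r, c in board)
mutilated_sum = sum(value(r, c) for r, c in board if (r, c) not in removed)

# Every placement of a domino, horizontal or vertical, covers values summing to 0.
domino_sums = (
    {value(r, c) + value(r, c + 1) for r in range(8) for c in range(7)}
    | {value(r, c) + value(r + 1, c) for r in range(7) for c in range(8)}
)

print(full_sum, mutilated_sum, domino_sums)
```

Since every domino contributes 0 while the mutilated board sums to −2, no arrangement of dominoes can cover it.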
Zombies and monovariants. Invariants permeate much of mathematics and
science. In physics one sees this with conservation of energy, momentum, and
angular momentum. A related and also useful concept is that of a monovariant,
a quantity whose value can change in only one direction throughout a series of
transformations.
Suppose that the world is an n × n chessboard and that each square is occupied
either by a zombie or a person. If a square is occupied by a zombie, then it remains
occupied by a zombie in all subsequent rounds. If a square has at least two edges
that border zombie-infested squares, then the person in that square becomes a
zombie in the next round. If not, then the person remains as they were. We iterate
this process round after round; see Figure 3 for a sample evolution. For a given
configuration, will the zombie infection spread and infect everyone, or will some
people survive?
Some configurations lead to universal infestation. For example, if each square
is occupied by a zombie, then the zombies have already won. As another example,
a checkerboard pattern of zombies and people also leads to universal infestation.
Both of these initial configurations have on the order of $n^2$ cells that are initially
infected. Can we reduce this to approximately $n$ initial zombies and still end with
a complete takeover? Yes. One can show that infecting the main diagonal suffices
and this requires only $n$ zombies to start with.

Figure 2. Is it possible to tile a chessboard that has two opposite corners removed with 2 × 1 dominoes? (a) A 1 × 2 domino. (b) A chessboard with two opposite corners removed.

Figure 3. The spread of an infection: (a) Round 0, (b) Round 1, (c) Round 2, (d) Round 3. After the third round no more cells will be infected. Thus, seven people survive.

Figure 4. Two of the possibilities that can happen when a cell is infected. (a) The perimeter of the infected area remains unchanged. (b) The perimeter of the infected area decreases by four.
Can the undead rule the world if there are only n − 1 zombies at the beginning?
No! One way to see this is to introduce the following monovariant. At time t,
let P (t) denote the perimeter of the infected area. One can show that P (t) is
nonincreasing; two cases are shown in Figure 4. Since the perimeter of the n × n
board is 4n and since the maximum possible perimeter of a configuration with n − 1
infected squares is 4(n − 1), it follows that at least one person will be safe if the
zombie apocalypse commences with only n − 1 zombies. We leave it as an exercise
to the reader to determine exactly how many people will be safe.
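A small simulation makes the monovariant visible. This is a sketch with hypothetical helper names and an arbitrary board size of n = 6. It runs the infection from the full main diagonal and from a diagonal with one zombie missing, recording the perimeter P(t) after every round.

```python
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def perimeter(infected):
    """Number of unit edges separating infected cells from everything else."""
    return sum((r + dr, c + dc) not in infected
               for r, c in infected for dr, dc in STEPS)

def spread(start, n):
    """Run the infection to completion; return the final set and P(0), P(1), ..."""
    infected = set(start)
    history = [perimeter(infected)]
    while True:
        fresh = {(r, c) for r in range(n) for c in range(n)
                 if (r, c) not in infected
                 and sum((r + dr, c + dc) in infected for dr, dc in STEPS) >= 2}
        if not fresh:
            return infected, history
        infected |= fresh
        history.append(perimeter(infected))

n = 6
full, p_full = spread({(i, i) for i in range(n)}, n)        # n zombies
short, p_short = spread({(i, i) for i in range(n - 1)}, n)  # n - 1 zombies

print(len(full), p_full)
print(len(short), p_short)
```

Starting from the full diagonal the perimeter stays at 4n until the whole board is infected; with one zombie fewer it starts at 4(n − 1) and can never reach 4n, so some squares remain safe.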
Zeckendorf revisited. For another example of a monovariant, we return to
Zeckendorf’s theorem (see the 1970 entry). By defining the right monovariant, one
can show that among all decompositions of a natural number as a sum of Fibonacci
numbers, none have fewer summands than the Zeckendorf decomposition. Given
a decomposition of $n$ as a sum of Fibonacci numbers (we use the convention here
that $F_1 = 1$, $F_2 = 2$, and $F_{k+1} = F_k + F_{k-1}$), consider the sum of the indices of the
terms that appear in the decomposition. If two adjacent summands $F_k$ and $F_{k-1}$
appear, we do not increase the index sum by replacing them with $F_{k+1}$. If $F_k$ is
used twice in the decomposition, then use the identity
\[
  2F_k = F_{k-2} + F_{k-1} + F_k = F_{k-2} + F_{k+1}
\]
to replace the two occurrences of $F_k$ with $F_{k-2} + F_{k+1}$, which has a smaller index
sum. These two processes can occur only finitely many times. When the procedure
terminates, there can be no repeats or adjacencies in the decomposition. Thus, we
have a Zeckendorf decomposition. See [4] for generalizations.
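The minimality claim can be tested by brute force for small n. The sketch below uses illustrative function names, and the exhaustive comparison is a check, not the monovariant proof. It computes the Zeckendorf decomposition greedily and compares its length with the fewest Fibonacci summands found by a coin-change style search that allows repeated summands.

```python
def fibs_upto(n):
    """Fibonacci numbers 1, 2, 3, 5, ... (F_1 = 1, F_2 = 2) not exceeding n."""
    fibs = [1, 2]
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    return [f for f in fibs if f <= n]

def zeckendorf(n):
    """Greedy decomposition: repeatedly take the largest Fibonacci number that fits."""
    parts = []
    for f in reversed(fibs_upto(n)):
        if f <= n:
            parts.append(f)
            n -= f
    return parts

def fewest_summands(n):
    """Least number of Fibonacci summands (repeats allowed) adding up to n."""
    coins = fibs_upto(n)
    best = [0] + [n + 1] * n
    for total in range(1, n + 1):
        best[total] = 1 + min(best[total - c] for c in coins if c <= total)
    return best[n]

print(zeckendorf(100))
print(all(len(zeckendorf(n)) == fewest_summands(n) for n in range(1, 300)))
```

The identity used in the text can be seen in small cases such as 2·5 = 2 + 8, that is, $2F_4 = F_2 + F_5$.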
Bibliography
[1] Aristotle, On the Heavens, http://classics.mit.edu/Aristotle/heavens.html.
[2] M. Aigner and G. M. Ziegler, Proofs from The Book, 3rd ed., including illustrations by Karl
H. Hofmann, Springer-Verlag, Berlin, 2004. MR2014872
[3] D. Ciesielska and K. Ciesielski, Equidecomposability of polyhedra: a solution of Hilbert’s
third problem in Kraków before ICM 1900, Math. Intelligencer 40 (2018), no. 2, 55–63, DOI
10.1007/s00283-017-9748-4. https://doi.org/10.1007/s00283-017-9748-4. MR3814621
[4] K. Cordwell, M. Hlavacek, C. Huynh, S. J. Miller, C. Peterson, and Y. N. T. Vu, On summand
minimality of generalized Zeckendorf decompositions, https://arxiv.org/abs/1608.08764.
[5] H. E. Debrunner, Über Zerlegungsgleichheit von Pflasterpolyedern mit Würfeln (German),
Arch. Math. (Basel) 35 (1980), no. 6, 583–587 (1981), DOI 10.1007/BF01235384. MR604258
[6] M. Dehn, Ueber den Rauminhalt, Mathematische Annalen 55 (1901), no. 3, 465–478.
http://gdz.sub.uni-goettingen.de/dms/load/img/?PID=GDZPPN002258633.
[7] G. N. Frederickson, Dissections: plane & fancy, Cambridge University Press, Cambridge,
1997. MR1735254
com/article/10.1007%2FBF01206605. See also http://www.ams.org/journals/bull/1902-
08-10/S0002-9904-1902-00923-3/S0002-9904-1902-00923-3.pdf.
[9] L. A. Krasilnikova, Hilbert’s Third Problem (A Story of Threes), MIT Admissions Blog,
February 25, 2015, http://mitadmissions.org/blogs/entry/hilberts-third-problem-a-story-of-threes
and http://sciencecow.mit.edu/me/hilberts_third_problem.pdf.
[10] J. C. Lagarias and C. Zong, Mysteries in packing regular tetrahedra, Notices Amer. Math. Soc.
59 (2012), no. 11, 1540–1549, DOI 10.1090/noti918. http://www.ams.org/notices/201211/rtx121101540p.pdf. MR3027108
[11] J. J. O’Connor and E. F. Robertson, Farkas Wolfgang Bolyai, MacTutor History of Mathematics,
http://www-groups.dcs.st-and.ac.uk/history/Biographies/Bolyai_Farkas.html.
[12] Wikipedia, Dehn invariant, https://en.wikipedia.org/wiki/Dehn_invariant.
[13] Wikipedia, Hilbert’s third problem, https://en.wikipedia.org/wiki/Hilbert's_third_problem.
[14] Wolfram MathWorld, Dissection, http://mathworld.wolfram.com/Dissection.html.
1981
The Mason–Stothers Theorem
Introduction
In 1981, Walter Wilson Stothers (1946–2009) proved a remarkable theorem
about polynomials [10], later discovered independently by Richard C. Mason [3].
Although broader generalizations exist, we state it here for polynomials over the
complex numbers for the sake of simplicity. The Mason–Stothers theorem states
that if a, b, c are relatively prime polynomials, not all of which are constant, and
a + b = c, then
max{deg a, deg b, deg c} ≤ deg rad(abc) − 1, (1981.1)
in which $\operatorname{rad} f$ denotes the radical of $f$, that is, the product of the distinct irreducible
factors of $f$. For example, $\operatorname{rad}\bigl(x^3 (x + 1)^2\bigr) = x(x + 1)$. Since the field of complex
numbers is algebraically closed,
\[
  \deg \operatorname{rad}(abc) = \text{number of distinct roots of } abc.
\]
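To see (1981.1) on a concrete triple, the sketch below does exact polynomial arithmetic over the rationals. The helper names and the example $x^2 + (2x + 1) = (x + 1)^2$ are illustrative choices, not taken from the text. It relies on the standard fact that, in characteristic zero, the number of distinct roots of $f$ equals $\deg f - \deg \gcd(f, f')$.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (coefficients are stored lowest degree first)."""
    while p and p[-1] == 0:
        p = p[:-1]
    return p

def poly(*coeffs):
    return trim([Fraction(c) for c in coeffs])

def deg(p):
    return len(p) - 1                      # the zero polynomial gets degree -1

def add(p, q):
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return trim([x + y for x, y in zip(p, q)])

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return trim(out)

def deriv(p):
    return trim([k * c for k, c in enumerate(p)][1:])

def rem(p, q):
    """Remainder of p on division by the nonzero polynomial q."""
    p = list(p)
    while len(p) >= len(q):
        factor, shift = p[-1] / q[-1], len(p) - len(q)
        for i, c in enumerate(q):
            p[shift + i] -= factor * c
        p = trim(p)
    return p

def gcd(p, q):
    while q:
        p, q = q, rem(p, q)
    return p

def deg_rad(p):
    """deg rad(p) = number of distinct roots = deg p - deg gcd(p, p')."""
    return deg(p) - deg(gcd(p, deriv(p)))

a, b = poly(0, 0, 1), poly(1, 2)           # a = x^2, b = 2x + 1
c = add(a, b)                              # c = (x + 1)^2
abc = mul(mul(a, b), c)
print(max(deg(a), deg(b), deg(c)), deg_rad(abc) - 1)
```

For this triple both sides of (1981.1) equal 2, so the inequality holds with equality here.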
What is the importance of the Mason–Stothers theorem? The centennial prob-
lem for this year is to prove the polynomial version of Fermat’s last theorem! If
that is not motivation enough, perhaps an integer analogue of the Mason–Stothers
theorem will interest you. Why should such an analogue exist? As students of
abstract algebra know, there are a great many similarities between integers and
polynomials. For example, they both form rings and they both enjoy unique fac-
torization into irreducibles. The integers have the primes as their basic building
blocks and the polynomials have the linear polynomials as theirs.
Here is a first attempt at an integer analogue of (1981.1). Suppose that $a, b, c$
are relatively prime integers and $a + b = c$. A naive generalization of (1981.1) is
\[
  \max\{|a|, |b|, |c|\} \le \operatorname{rad}(abc), \tag{1981.2}
\]
in which $\operatorname{rad}(abc)$ denotes the product of the distinct prime factors of $abc$. For
example, $\operatorname{rad}(200) = \operatorname{rad}(2^3 \cdot 5^2) = 2 \cdot 5 = 10$. Unfortunately, (1981.2) is false, even
if we replace $\operatorname{rad}(abc)$ with $K \operatorname{rad}(abc)$ for some large $K > 0$ [1]. Let $p \ge 2K$ be a
large prime and let
\[
  a = 1, \quad b = 2^{p(p-1)} - 1, \quad\text{and}\quad c = 2^{p(p-1)}.
\]
Then Euler’s generalization of Fermat’s little theorem ensures that $p^2$ divides $b$ and
hence the strengthened (1981.2) implies that
\[
  c \le K \operatorname{rad}(abc) \le \frac{2bK}{p} < \frac{2Kc}{p} \le c,
\]
which is absurd. Consequently, an integral generalization of the Mason–Stothers
theorem must be more subtle.
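The counterexample is easy to explore for small primes. The sketch below uses naive trial division, so it is only practical for the few values of p shown; `rad` and `example` are hypothetical helper names. It confirms that $p^2$ divides $b$ and that $c$ exceeds $\operatorname{rad}(abc)$ by a growing factor.

```python
def rad(n):
    """Product of the distinct prime factors of n, found by trial division."""
    n = abs(n)
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            result *= p
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        result *= n
    return result

def example(p):
    """Return (c, rad(abc)) for a = 1, b = 2**(p(p-1)) - 1, c = 2**(p(p-1))."""
    c = 2 ** (p * (p - 1))
    a, b = 1, c - 1
    assert b % (p * p) == 0      # Euler: p^2 divides b when p is an odd prime
    return c, rad(a * b * c)

for p in (3, 5, 7):
    c, r = example(p)
    print(p, c, r, c / r)
```

For p = 3 this gives c = 64 against rad(abc) = 42, so (1981.2) already fails.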
The celebrated abc-conjecture states that for any $\epsilon > 0$, there exists a constant
$\kappa_\epsilon$ so that if $a, b, c$ are relatively prime integers and $a + b = c$, then
\[
  \max\{|a|, |b|, |c|\} \le \kappa_\epsilon \operatorname{rad}(abc)^{1+\epsilon}.
\]
It was posed independently in 1985 by David Masser (1948– ) [4] and in 1988 by
Joseph Oesterlé (1954– ) [6].
Although the Mason–Stothers theorem has several short proofs (see the com-
ments below for the proofs of Noah Snyder (1980– ) [9] and Joseph H. Silverman
(1955– ) [8]), the abc-conjecture is much more stubborn. It is considered to be one
of the most important open problems in number theory [1].
In 2012, Shinichi Mochizuki (1969– ), a respected mathematician previously
best known for proving a conjecture of Alexander Grothendieck, released a series of
four papers on “Inter-universal Teichmüller theory,” totaling over 500 pages, which
claim to contain a proof of the abc-conjecture [5]. As of 2019, there is no universal
consensus about the validity of the proof.

Proposed by Jeffery Paul Wheeler, University of Pittsburgh.
Use the Mason–Stothers theorem to prove a polynomial version of Fermat’s last
theorem: if $x, y, z$ are relatively prime polynomials, not all of which are constant,
then $x^n + y^n = z^n$ has no solutions with $n \ge 3$.
1981: Comments
Snyder’s proof of the Mason–Stothers theorem. In 2000, Noah Snyder
provided a simple proof of the Mason–Stothers theorem [9, 11]. Let a, b, c be rela-
tively prime polynomials, not all of which are constant, and suppose that a+b+c = 0
(it is more convenient to work with this symmetric version instead of a + b = c).
Then a, b, c are pairwise relatively prime since any polynomial that divides two of
a, b, c divides the third. Since
a + b + c = 0,
the three Wronskians
W (a, b) = ab − a b, (1981.3)

W (b, c) = bc − b c = b(−a − b ) − b (−a − b) = ab − a b,
W (c, a) = ca − c a = (−a − b)a − (−a − b )a = ab − a b
are equal. Let
W = W (a, b) = W (b, c) = W (c, a)
denote their common value. We claim that W is not the zero polynomial. Without
loss of generality, suppose that a' ≠ 0; that is, a is not constant. If W = 0, then
ab' = a'b and hence a divides a' since gcd(a, b) = 1. However, this contradicts the
fact that deg a > deg a'. Thus, W is not identically zero.
The various formulas for W ensure that gcd(a, a'), gcd(b, b'), and gcd(c, c')
each divide W. Since these three polynomials are pairwise relatively prime, W is
divisible by their product and hence
\[ \deg\gcd(a, a') + \deg\gcd(b, b') + \deg\gcd(c, c') \le \deg W. \tag{1981.4} \]
However,
\[
\left.
\begin{aligned}
\deg\gcd(a, a') &\ge \deg a - (\text{number of distinct roots of } a), \\
\deg\gcd(b, b') &\ge \deg b - (\text{number of distinct roots of } b), \\
\deg\gcd(c, c') &\ge \deg c - (\text{number of distinct roots of } c).
\end{aligned}
\right\} \tag{1981.5}
\]
From (1981.3), we have
deg W ≤ deg a + deg b − 1.
Putting (1981.4) and (1981.5) together, simplifying, and using the relative primality
of a, b, c, yields
deg c ≤ (number of distinct roots of a) + (number of distinct roots of b)

+ (number of distinct roots of c) − 1
= (number of distinct roots of abc) − 1
= deg rad(abc) − 1.
By symmetry, the same argument provides identical bounds for deg a and deg b. This
yields (1981.1), as desired.
Silverman’s proof of the Mason–Stothers theorem. Joseph H. Silverman

(1955– ) proved a more general result that, when distilled, provides an elegant proof
of the Mason–Stothers theorem [8]. We follow the presentation in [1]. Suppose that
π : C ∪ {∞} → C ∪ {∞} is a rational function; that is, π(t) = f (t)/g(t), in which
f, g are polynomials. Since the Riemann sphere C∪{∞} has genus 0, the Riemann–
Hurwitz formula from the theory of Riemann surfaces [2] tells us

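\[ 2\deg\pi = 2 + \sum_{z \in \mathbb{C}\cup\{\infty\}} \bigl(\deg\pi - |\pi^{-1}(z)|\bigr), \tag{1981.6} \]
in which deg π = max{deg f, deg g} and
\[ \pi^{-1}(z) = \{w \in \mathbb{C}\cup\{\infty\} : \pi(w) = z\} = \{w \in \mathbb{C}\cup\{\infty\} : f(w) - zg(w) = 0\}. \]
In particular,
\[ |\pi^{-1}(z)| \le \deg\pi, \]
with equality unless f(w) − zg(w) has a repeated root.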
If a, b, c are relatively prime polynomials, not all of which are constant, and
a + b = c, then let π = a/c. Every term on the right-hand side of (1981.6) is
nonnegative and hence

\[ 2\deg\pi \ge 2 + \sum_{z \in \{0, 1, \infty\}} \bigl(\deg\pi - |\pi^{-1}(z)|\bigr). \]
Observe the following.

• If π(∞) ≠ 0, then $\pi^{-1}(0)$ is the set of distinct roots of a.
• If π(∞) ≠ 1, then $\pi^{-1}(1)$ is the set of distinct roots of b.
• If π(∞) ≠ ∞, then $\pi^{-1}(\infty)$ is the set of distinct roots of c.
Since ∞ belongs to at most one of these sets, we get
deg π ≤ (number of distinct roots of abc) − 1.
Since deg π = max{deg a, deg c} and b = c − a, we obtain
\[ \max\{\deg a, \deg b, \deg c\} \le \deg\operatorname{rad}(abc) - 1. \]
This concludes the proof of the Mason–Stothers theorem.
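A small worked example, added here as an illustration and not taken from the original text, shows that the bound cannot be improved in general. The polynomials below are our own choice; they come from the familiar parametrization of Pythagorean triples.

```latex
\[
  a = (2t)^2 = 4t^2, \qquad b = (t^2 - 1)^2, \qquad c = (t^2 + 1)^2,
  \qquad a + b = c .
\]
% a, b, c are relatively prime: their roots are 0, \pm 1, and \pm i, respectively.
\[
  \operatorname{rad}(abc) = t\,(t - 1)(t + 1)(t - i)(t + i) = t\,(t^2 - 1)(t^2 + 1),
  \qquad \deg \operatorname{rad}(abc) = 5 .
\]
\[
  \max\{\deg a, \deg b, \deg c\} = 4 = \deg \operatorname{rad}(abc) - 1 .
\]
```

Equality holds here, so the "− 1" in the Mason–Stothers bound is sharp for this triple. The same identity also shows that nonconstant, relatively prime polynomials can satisfy $x^2 + y^2 = z^2$, which is why the polynomial Fermat problem restricts to n ≥ 3.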
Fermat’s last theorem and the abc-conjecture. What does the abc-conjec-
ture have to say about the Fermat equation
\[ x^n + y^n = z^n, \tag{1981.7} \]
in which x, y, z are natural numbers with gcd(x, y, z) = 1 and n ≥ 3? We follow
the presentation in [1]. The case n = 3 was handled by Leonhard Euler in 1770, so
we suppose that n ≥ 4. Let
\[ a = x^n, \quad b = y^n, \quad \text{and} \quad c = z^n, \]
and observe that
\[ \operatorname{rad}(abc) = \operatorname{rad}(x^n y^n z^n) = \operatorname{rad}(xyz) \le xyz \le z^3. \]
For each ε > 0, the abc-conjecture provides a constant $\kappa_\varepsilon$ so that
\[ z^n = \max\{a, b, c\} \le \kappa_\varepsilon \operatorname{rad}(abc)^{1+\varepsilon} \le \kappa_\varepsilon (z^3)^{1+\varepsilon} = \kappa_\varepsilon z^{3+3\varepsilon}. \]
If ε < 1/3, then the exponent on the right-hand side is less than 4 and hence the
preceding inequality has only finitely many solutions. Since x, y ≤ z, it follows that
(1981.7) has only finitely many solutions.
This result was proved for n ≥ 5 (without the abc-conjecture) by Gerd Faltings
(1954– ) as a consequence of his proof of Mordell’s conjecture (a curve of genus
greater than 1 over the field of rational numbers has only finitely many rational
points). Faltings earned the Fields Medal in 1986 for that result.
Solution to the problem. Suppose that x, y, z are relatively prime polynomials,
not all of which are constant, such that
\[ x^n + y^n = z^n \]
for some n ≥ 3. The Mason–Stothers theorem with $a = x^n$, $b = y^n$, and $c = z^n$
ensures that
\begin{align*}
n \deg x &\le n \max\{\deg x, \deg y, \deg z\} \\
&= \max\{\deg x^n, \deg y^n, \deg z^n\} \\
&\le \deg\operatorname{rad}(x^n y^n z^n) - 1 \\
&= \deg\operatorname{rad}(xyz) - 1 \\
&\le \deg x + \deg y + \deg z - 1,
\end{align*}
and similarly for n deg y and n deg z. Adding the three inequalities gives
\[ n(\deg x + \deg y + \deg z) \le 3(\deg x + \deg y + \deg z) - 3 \]
and hence
\[ n \le 3 - \frac{3}{\deg x + \deg y + \deg z} < 3. \]
Thus, the polynomial version of the Fermat equation has no solutions for n ≥ 3, in
which not all the polynomials involved are constant.
Bibliography
[1] A. Granville and T. J. Tucker, It’s as easy as abc, Notices Amer. Math. Soc. 49 (2002),
no. 10, 1224–1231. http://www.ams.org/notices/200210/fea-granville.pdf. MR1930670
[2] G. A. Jones and D. Singerman, Complex functions: An algebraic and geometric viewpoint,
Cambridge University Press, Cambridge, 1987. MR890746
[3] R. C. Mason, Diophantine equations over function fields, London Mathematical Society Lec-
ture Note Series, vol. 96, Cambridge University Press, Cambridge, 1984. MR754559
[4] D. W. Masser, Open problems, Proceedings of the Symposium on Analytic Number Theory.
London: Imperial College, 1985.
[5] S. Mochizuki, http://www.kurims.kyoto-u.ac.jp/~motizuki/top-english.html.
[6] J. Oesterlé, Nouvelles approches du “théorème” de Fermat (French), Astérisque 161-162
(1988), Exp. No. 694, 4, 165–186 (1989). Séminaire Bourbaki, Vol. 1987/88. MR992208
[7] P. Ribenboim, 13 Lectures on Fermat’s Last Theorem, Springer, 1979.
[8] J. H. Silverman, The S-unit equation over function fields, Math. Proc. Cambridge Philos.
Soc. 95 (1984), no. 1, 3–4, DOI 10.1017/S0305004100061235. MR727073
[9] N. Snyder, An alternate proof of Mason’s theorem, Elem. Math. 55 (2000), no. 3, 93–94, DOI
10.1007/s000170050074. http://cr.yp.to/bib/2000/snyder.pdf. MR1781918
[10] W. W. Stothers, Polynomial identities and Hauptmoduln, Quart. J. Math. Oxford Ser. (2)
32 (1981), no. 127, 349–370, DOI 10.1093/qmath/32.3.349. http://qjmath.oxfordjournals.
org/content/32/3/349.extract. MR625647
[11] Wikipedia, Mason–Stothers theorem, https://en.wikipedia.org/wiki/Mason-Stothers_theorem.
1982
Two Envelopes Problem
Introduction
The debate about whether the natural numbers and the primes are built into
the universe or whether they are human constructs has raged for centuries. In
A Mathematician’s Apology, G. H. Hardy (see the 1920, 1923, and 1940 entries)
asserts:
317 is a prime, not because we think so, or because our minds are
shaped in one way rather than another, but because it is, because
mathematical reality is built that way.
We make no attempt to wade into these deep waters here: you are welcome to
consider Figures 1 and 2 and draw your own conclusions.
Despite his legendary aversion to applicable mathematics, Hardy helped to so-
lidify some of the theoretical underpinnings of probability theory. Famed probabilist
Persi Diaconis (1945– ) wrote [1]:
Despite a true antipathy to the subject, Hardy contributed deeply
to modern probability. His work with Ramanujan begat probabilistic
number theory. His work on Tauberian theorems and divergent series
has probabilistic proofs and interpretations. Finally, Hardy spaces are
a central ingredient in stochastic calculus. . . .
I want to argue that Hardy had no knowledge of probability the-
ory and indeed had a genuine antipathy to the subject. To begin with,
Hardy loved clear rigorous argument. At the time he worked, the
mathematical underpinnings of probability were a vague mess. . . it was
only in 1933 that Kolmogorov gave a measure theoretic interpretation
of probability; a random variable was defined as a measurable func-
tion. Then one could see that early workers in probability; Bernoulli,
Laplace, Gauss, Chebychev, Markov were doing mathematics after all.
The naive approach to probability is full of pitfalls and paradoxes. It took
many years for the theory to be established on firm foundations. We have seen
several examples of paradoxes throughout this book. Each one provides a valuable
opportunity for further work: it means there is something incomplete or incompat-
ible with our view of mathematics. We began with paradoxes related to the notion
of infinity in the 1918 entry. Then we encountered the Banach–Tarski paradox in
the 1924 entry, which challenged our understanding of what area and volume are.
We continued with the liar’s paradox and Russell’s paradox in the 1929 entry and
discussed related issues in set theory and logic. This is just the start! We have
many other paradoxes in the entries before this, as well as a few more ahead (see
the Monty Hall problem in the 1990 entry).
Figure 1. Six balls can be arranged to form a rectangular array,

each side of which consists of two or more balls. That is, six is a
composite number. The arrangements above are physical demon-
strations of the factorizations 3 × 2 = 6 and 2 × 3 = 6. Are these
statements about the physical universe itself or about a human
construct? Would the result be the same if six physical balls were
used on the other side of the galaxy? Would we suddenly find our-
selves unable to arrange six balls in a rectangular array as above?
Figure 2. Seven balls cannot be arranged into a rectangular ar-

ray, each side of which consists of two or more balls. That is, seven
is a prime number. Is the primality of seven built into the uni-
verse? Is it a human construct? Is it conceivable that, in another
time or place, seven balls could be placed in a rectangular array
(each side of which consists of two or more balls)?
For this year, we look at the two envelopes problem. A player can choose be-
tween two closed, identically constructed envelopes. The envelopes are labeled A
and B, respectively. Both envelopes contain money, although one of them contains
twice as much as the other. The one that contains more money cannot be deter-
mined without opening the envelopes. You initially choose envelope A but do not
open it. You are permitted to switch envelopes indefinitely until a final decision is
made. Which envelope should you open?
Where is the paradox? Let us examine the expected value of switching the
envelopes. For example, envelopes A and B each have a probability of 1/2 to
contain the greater amount. Suppose that we choose A and let x > 0 denote the
amount of money in the envelope. Should we switch? Since B has an equal chance
of having either value, half the time it should contain half as much as A, namely
x/2, and half the time it should contain twice, namely 2x. Thus, the expected
amount of money in B is

\[ \frac{1}{2}(2x) + \frac{1}{2}\cdot\frac{x}{2} = \frac{5}{4}x. \tag{1982.1} \]
Since this is larger than x, the amount in A, we should switch. Of course, the
same argument applies to B as well. Therefore, we should continue to switch back
and forth indefinitely since the expected return $(5/4)^n x$ after n switches tends to
infinity! In principle, this suggests that one can place $1 and $2 into two different
envelopes, juggle them for several minutes, then open one of the envelopes with the
expectation of receiving at least a billion dollars.
It is not entirely clear who first came up with the two envelopes problem. A variant
of it appeared in a 1953 recreational mathematics book by Maurice Kraitchik
(1882–1957), who considered a wager between two rich men who wished to determine
whose necktie was more expensive. In the same year, Littlewood stated
another variant and credited it to Schrödinger (see the 1925 entry). What is beyond
a doubt is that the problem was popularized in 1982 by Scientific American writer
and puzzle enthusiast Martin Gardner [3]; see the 1914 entry.
The following problem was first proposed by Olle Häggström in 2013 [2]. It is
similar to Newcomb’s paradox, a variant of the two envelopes problem.

Proposed by Avery T. Carr, Emporia State University, and Steven J.
An intelligent donor has prepared two boxes for you: a big one and a small one.
The small one contains $1,000. The big one contains either $1,000,000 or nothing.
You have a choice between accepting both boxes or just the big box. It seems
obvious that you should accept both boxes, since that gives you $1,000 regardless
of the content of the big box. However, the donor has tried to predict whether you
will pick one box or two boxes. If the prediction is that you pick just the big box,
then it contains $1,000,000. If the prediction is that you pick both boxes, then the
big box is empty. The donor has performed the same experiment with many other
people and has predicted correctly 90% of the time. What should you do?
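One way to start thinking about the problem is to compute expected payoffs under an explicit assumption. The sketch below is ours, not the book's: it assumes that the donor's 90% track record can be read as the probability that the prediction matches your actual choice, whichever choice you make. Other readings of the problem (for instance, arguing that the contents of the big box are already fixed) lead to the opposite recommendation, so this is one side of the debate and not a resolution.

```python
from fractions import Fraction


def expected_payoffs(accuracy=Fraction(9, 10), small=1_000, big=1_000_000):
    """Expected winnings for each strategy, ASSUMING the prediction matches
    the player's actual choice with probability `accuracy`."""
    # One box: the big box is full exactly when the prediction was correct.
    one_box = accuracy * big
    # Two boxes: the big box is full only when the prediction was wrong.
    two_boxes = small + (1 - accuracy) * big
    return one_box, two_boxes


one_box, two_boxes = expected_payoffs()
print(f"one box: ${int(one_box):,}; two boxes: ${int(two_boxes):,}")
```

Under this assumption, taking only the big box has the larger expected payoff; the tension with the "take both, you get $1,000 more regardless" argument is exactly what makes the problem a paradox.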
1982: Comments
Resolution of the two envelopes problem. Unlike many of the other para-
doxes that we have encountered, the issue for the two envelopes problem is easily
highlighted and explained. Let x denote the amount in the lesser of the two en-
velopes. Then the total amount of money in the two envelopes is 3x = x + 2x
and this cannot change. If A contains x dollars, then you gain x by switching. If
A contains 2x dollars, then you lose x by switching. Therefore, the expected gain
from switching is
\[ \frac{1}{2}x + \frac{1}{2}(-x) = 0, \]
as expected. The issue with (1982.1) is that the terms 2x and x/2 are conditioned
upon whether envelope B contains more or less money than A. Thus, a more
complicated argument involving conditional probability is required to pursue that
line of reasoning.
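The resolution can also be checked empirically. The following simulation is a minimal sketch under our own assumptions (the function name, the fixed smaller amount x, and the seeded random generator are all illustrative choices): it fills the envelopes with x and 2x, hands the player one at random, and compares the average payoff of keeping versus switching.

```python
import random


def simulate_two_envelopes(trials=100_000, x=1.0, seed=0):
    """Average payoff of always keeping vs. always switching."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    keep_total = 0.0
    switch_total = 0.0
    for _ in range(trials):
        envelopes = [x, 2 * x]
        rng.shuffle(envelopes)      # envelope A is envelopes[0], B is envelopes[1]
        keep_total += envelopes[0]
        switch_total += envelopes[1]
    return keep_total / trials, switch_total / trials


keep, switch = simulate_two_envelopes()
print(f"keep: {keep:.3f}, switch: {switch:.3f}")
```

Both averages come out close to 3x/2, in agreement with the zero expected gain computed above; nothing resembling the factor of 5/4 from (1982.1) appears.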
Bibliography
[1] P. Diaconis, G. H. Hardy and probability???, Bull. London Math. Soc. 34 (2002), no. 4,
385–402, DOI 10.1112/S002460930200111X. https://doi.org/10.1112/S002460930200111X.
MR1897417
[2] O. Häggström, Paradoxes in Probability Theory (Book Review), Notices of the AMS 3 (2013),
329–331.
[3] M. Gardner, Aha! Gotcha: Paradoxes to Puzzle and Delight, W. H. Freeman & Co, 1982.
[4] Wikipedia, Two envelopes problem, http://en.wikipedia.org/wiki/Two_envelopes_problem.
1983
Julia Robinson
Introduction
Julia Robinson was the first woman to become president of the American Math-
ematical Society (1983–1984). She shared her passion for mathematics with her
sister and biographer, Constance Reid, who said about Robinson:
She herself, in the normal course of events, would never have considered
recounting the story of her own life. As far as she was concerned, what
she had done mathematically was all that was significant. [8]
“Significant” is a fitting word indeed when speaking about the magnitude of Robin-
son’s mathematical accomplishments, especially regarding her contributions to the
eventual resolution of Hilbert’s tenth problem (see the 1970 entry).
In the early years of the 20th century, David Hilbert proposed twenty-three
problems that would shape mathematics for decades to come [3]. One of the un-
derlying themes of the list was the question of decidability. Given a mathematical
problem that falls into a certain class, is there a general algorithm that can solve
every problem in the class?
For his tenth problem, Hilbert considered the general solvability of Diophantine
equations. These are equations of the form
\[ P(x_1, x_2, \ldots, x_n) = 0, \]
in which P is a polynomial with integer coefficients and the unknowns $x_1, x_2, \ldots, x_n$
are integers. Hilbert’s tenth problem is the following: is there an algorithm that
can determine the solvability of an arbitrary Diophantine equation?
Over the span of several decades, Robinson and her collaborators, Martin Davis
and Hilary Putnam, proved that if there is at least one “Diophantine relation of
exponential growth,” then no such solvability algorithm exists. They later established
that if there is a Diophantine equation $P(a, b, c, x_1, x_2, \ldots, x_m) = 0$ that has a solution in
integers $x_1, x_2, \ldots, x_m$ if and only if $a = b^c$, then the necessary growth condition is
met. Robinson later simplified this criterion to involve only two parameters a, b. In
1970, Yuri Matiyasevich discovered an equation that satisfied this criterion. This
answered Hilbert’s tenth problem in the negative (see the 1970 entry).
Decision problems are found in many areas of mathematics, such as combi-
natorics and graph theory. One particularly important problem is the traveling
salesman problem: given a set of cities and the distances between them, find the
shortest possible route that visits each city once and returns to the city of origin.
This is known to be an “NP-complete problem” (strictly speaking, its decision version is),
which, without getting technical, means that no efficient algorithm to solve it is known
and that none exists unless P = NP.
A graph G is a set of vertices, represented by dots, that are connected by edges,

represented by line segments that connect exactly two dots (see the 2006 entry). It
is often fruitful to assign a weight to each edge. For example, these might represent
the physical distance between two cities. The sum of the weighted edges of a path
in G is the total weight of the path. In this context, the traveling salesman problem
asks one to find the path of least weight that traverses all of the vertices once.

Proposed by Avery T. Carr, Emporia State University.
A cycle in a graph G is a closed path that traverses a set of vertices and passes
through each exactly once. A Hamiltonian cycle of G is a cycle that contains every
vertex of G; see Figure 1. Is there an efficient algorithm to find a Hamiltonian cycle
(provided at least one exists) of minimum weight for an arbitrary, edge-weighted
graph G? This is an equivalent abstract version of the traveling salesman problem.
A positive solution would have notable implications in several industries, including
logistics and computer engineering.
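To see why efficiency is the crux of the question, it helps to look at the obvious inefficient method. The sketch below is our own illustration (the function name, the edge encoding via `frozenset`, and the four-city example are assumptions, not from the text): it fixes a starting vertex and tries every ordering of the remaining vertices, so a graph with n vertices requires examining (n − 1)! candidate cycles.

```python
from itertools import permutations


def min_hamiltonian_cycle(weights):
    """Brute-force search for a minimum-weight Hamiltonian cycle.

    `weights` maps frozenset({u, v}) to the weight of the edge uv.
    Returns (cost, cycle), or (None, None) if no Hamiltonian cycle exists.
    """
    vertices = sorted({v for edge in weights for v in edge})
    start, rest = vertices[0], vertices[1:]
    best_cost, best_cycle = None, None
    for order in permutations(rest):          # (n - 1)! orderings
        cycle = (start, *order, start)
        cost = 0
        for u, v in zip(cycle, cycle[1:]):
            w = weights.get(frozenset((u, v)))
            if w is None:                      # missing edge: not a cycle in G
                break
            cost += w
        else:
            if best_cost is None or cost < best_cost:
                best_cost, best_cycle = cost, cycle
    return best_cost, best_cycle


# A hypothetical complete graph on four cities.
example = {
    frozenset("AB"): 1, frozenset("BC"): 2, frozenset("CD"): 1,
    frozenset("DA"): 2, frozenset("AC"): 5, frozenset("BD"): 5,
}
print(min_hamiltonian_cycle(example))
```

This is fine for four cities but hopeless for forty, since 39! is roughly 2 × 10^46. The problem asks whether something fundamentally better than this kind of exhaustive search is possible.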
1983: Comments
Diophantine sets and the prime numbers. We say that $S \subseteq \mathbb{N}^j$ is a
Diophantine set if there is a Diophantine equation
\[ P(x_1, x_2, \ldots, x_j, y_1, y_2, \ldots, y_k) = 0 \]
so that $(n_1, n_2, \ldots, n_j) \in S$ if and only if there is an $(m_1, m_2, \ldots, m_k) \in \mathbb{N}^k$ so that
\[ P(n_1, n_2, \ldots, n_j, m_1, m_2, \ldots, m_k) = 0. \]
For example, the set {1, 4, 9, 16, . . .} is Diophantine since we may let
\[ P(x, y) = y^2 - x. \]
Indeed, x is a perfect square if and only if $x = y^2$ for some y ∈ N, that is, if and
only if P(x, y) = 0 for some y ∈ N. Similarly, the arithmetic progression {a + b, 2a + b, 3a + b, . . .}
is Diophantine, as witnessed by the polynomial P(x, y) = ay + b − x.
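The definition can be made concrete with a short search. This is a minimal sketch under our own assumptions: the function names are illustrative, and the search over the witness y is cut off at a bound that happens to suffice for these two examples (a witness, if it exists, is at most x). For a general Diophantine set no such bound is available, which is the heart of Hilbert's tenth problem.

```python
def in_diophantine_set(P, x, search_bound):
    """True if P(x, y) == 0 for some natural number y <= search_bound."""
    return any(P(x, y) == 0 for y in range(1, search_bound + 1))


def square_poly(x, y):
    """P(x, y) = y^2 - x, which witnesses the perfect squares."""
    return y * y - x


def progression_poly(a, b):
    """P(x, y) = a*y + b - x, which witnesses {a + b, 2a + b, 3a + b, ...}."""
    return lambda x, y: a * y + b - x


squares = [x for x in range(1, 21) if in_diophantine_set(square_poly, x, x)]
progression = [x for x in range(1, 21)
               if in_diophantine_set(progression_poly(3, 2), x, x)]
print(squares)       # members of {1, 4, 9, 16, ...} up to 20
print(progression)   # members of {5, 8, 11, ...} up to 20
```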
Is the set of prime numbers Diophantine? Surely the primes are random and
unpredictable enough that they could never be encapsulated by a single polynomial,
right? In 1976, James P. Jones, Daihachiro Sato, Hideo Wada, and Douglas Wiens
found such a polynomial. They write:
Martin Davis, Yuri Matijasevich, Hilary Putnam and Julia Robinson
have proven that every recursively enumerable set is Diophantine, and
hence that the set of prime numbers is Diophantine. . . it follows that
the set of prime numbers is representable by a polynomial formula. In
this article such a prime representing polynomial will be exhibited in
explicit form. [5]
Figure 1. Hamiltonian cycles (right) for the edge graphs of the (a) cube,
(b) octahedron, and (c) dodecahedron.
The set of prime numbers, they show, is precisely the set of positive values assumed
by the polynomial
\begin{align*}
&P(a, b, c, \ldots, x, y, z) \\
&\quad = (k + 2)\bigl\{1 - [wz + h + j - q]^2 - [(gk + 2g + k + 1)(h + j) + h - z]^2 \\
&\qquad - [16(k + 1)^3(k + 2)(n + 1)^2 + 1 - f^2]^2 - [2n + p + q + z - e]^2 \\
&\qquad - [e^3(e + 2)(a + 1)^2 + 1 - o^2]^2 - [(a^2 - 1)y^2 + 1 - x^2]^2 \\
&\qquad - [16r^2y^4(a^2 - 1) + 1 - u^2]^2 - [n + \ell + v - y]^2 \\
&\qquad - [(a^2 - 1)\ell^2 + 1 - m^2]^2 - [ai + k + 1 - \ell - i]^2 \\
&\qquad - [((a + u^2(u^2 - a))^2 - 1)(n + 4dy)^2 + 1 - (x + cu)^2]^2 \\
&\qquad - [p + \ell(a - n - 1) + b(2an + 2a - n^2 - 2n - 2) - m]^2 \\
&\qquad - [q + y(a - p - 1) + s(2ap + 2a - p^2 - 2p - 2) - x]^2 \\
&\qquad - [z + p\ell(a - p) + t(2ap - p^2 - 1) - pm]^2\bigr\},
\end{align*}
which is of degree 25 and has 26 variables. An interesting corollary is that if p
is a prime number, then there is a computation that confirms the primality of p
that involves only 87 additions and multiplications. Indeed, one only need exhibit
natural numbers a, b, c, d, . . . , x, y, z so that P (a, b, c, . . . , x, y, z) = p. Of course,
finding such numbers is no easy task.
Mills’s constant. The existence of a single polynomial that encodes the prime
numbers is shocking. What about a simple formula that produces only prime
numbers? In 1947, William H. Mills proved the existence of a constant A (called
Mills’s constant) so that
\[ \lfloor A^{3^n} \rfloor \]
is prime for n = 0, 1, 2, . . .. Since there are actually uncountably many values of A
with this property, the term “Mills’s constant” is mildly inappropriate, especially
since Mills himself did not specify a precise numerical value for A. Assuming the
truth of the Riemann hypothesis, the smallest possible Mills’s constant begins
1.3063778838630806904686144926026057129167845851567136 . . . .
It was calculated to 6,850 decimal places by Chris K. Caldwell and Yuanyou Cheng
in 2005 [2]. It is unknown whether this constant is rational or irrational.
The proof of Mills’s result indicates how one might go about constructing such
a constant A. Unfortunately, one needs to know a lot about the distribution of the
prime numbers to compute A and hence Mills’s result is not a practical method for
producing primes.
Here is the proof. Let $p_n$ denote the nth prime number. Using knowledge about
the rate of growth of the Riemann zeta function on the critical line 1/2 + it, Albert
Ingham showed in 1937 that there is a constant K so that
\[ p_{n+1} - p_n < K p_n^{5/8} \]
for all n ∈ N [4]. (Roger C. Baker (1947– ), Glyn Harman (1956– ), and János Pintz (1950– )
showed that the exponent can be lowered from 5/8 = 0.625 to 0.525 [1]. Assuming the
Riemann hypothesis, this can be reduced even further and the constant K made explicit.)
We use Ingham's bound to show that if $N \ge K^8$, then there is a prime p so
that
\[ N^3 < p < (N + 1)^3. \tag{1983.1} \]
To see this, let $p_n$ be the largest prime less than $N^3$. Then
\begin{align*}
N^3 &< p_{n+1} \\
&< p_n + K p_n^{5/8} \\
&< N^3 + K N^{15/8} \\
&\le N^3 + N^2 \\
&< (N + 1)^3 - 1,
\end{align*}
as desired.
If $P_0 \ge K^8$ is prime, then (1983.1) permits us to find a sequence $P_0, P_1, P_2, \ldots$
of primes so that
\[ P_n^3 < P_{n+1} < (P_n + 1)^3 - 1. \tag{1983.2} \]
Define
\[ u_n = P_n^{3^{-n}} \quad \text{and} \quad v_n = (P_n + 1)^{3^{-n}}. \]
Then perform a few computations based upon (1983.2) to verify that
\[ u_n < u_{n+1} < v_{n+1} < v_n \]
for all n. In particular, the sequence $u_n$ is increasing and bounded above by $v_0$ and
is therefore convergent. Define
\[ A = \lim_{n \to \infty} u_n \]
and observe that $u_n < A < v_n$, and hence
\[ P_n < A^{3^n} < P_n + 1 \]
for all n. Thus,
\[ \lfloor A^{3^n} \rfloor = P_n \]
is prime for n = 0, 1, 2, . . ..
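The construction in the proof can be imitated numerically. The sketch below is ours and departs from the proof in two labeled ways. First, it ignores the requirement $P_0 \ge K^8$ and simply starts from the prime 2, always taking the next prime after the cube of the previous one; this is the sequence usually quoted in connection with the smallest Mills's constant. Second, it uses the common indexing in which the kth prime of the sequence (counting from 1) is $\lfloor A^{3^k} \rfloor$, which is shifted by one from the indexing in the proof.

```python
def is_prime(n):
    """Trial division; adequate for the few terms computed here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def next_prime_after(n):
    candidate = n + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def mills_primes(count, first=2):
    """Primes P with each term the smallest prime exceeding the cube of the last."""
    primes = [first]
    while len(primes) < count:
        primes.append(next_prime_after(primes[-1] ** 3))
    return primes


primes = mills_primes(4)
# The k-th term (k = 1, 2, ...) satisfies P < A^(3^k) < P + 1,
# so P^(1 / 3^k) is a lower estimate for A.
estimate = primes[-1] ** (1 / 3 ** len(primes))
print(primes, estimate)
```

The fifth term already has 29 digits, so trial division stops being practical almost immediately. This matches the remark above: one needs to know the primes in order to compute A, not the other way around.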
Bibliography
[1] R. C. Baker, G. Harman, and J. Pintz, The difference between consecutive primes. II, Proc.
London Math. Soc. (3) 83 (2001), no. 3, 532–562, DOI 10.1112/plms/83.3.532. MR1851081
[2] C. K. Caldwell and Y. Cheng, Determining Mills’ constant and a note on Honaker’s problem,
J. Integer Seq. 8 (2005), no. 4, Article 05.4.1, 9. MR2165330
com/article/10.1007%2FBF01206605. See also http://www.ams.org/journals/bull/1902-
08-10/S0002-9904-1902-00923-3/S0002-9904-1902-00923-3.pdf.
[4] A. E. Ingham, On the difference between consecutive primes, Quart. J. Math. Oxford 8 (1937),
255–266.
[5] J. P. Jones, D. Sato, H. Wada, and D. Wiens, Diophantine representation of the set of prime
numbers, Amer. Math. Monthly 83 (1976), no. 6, 449–464, DOI 10.2307/2318339. https://
www.jstor.org/stable/2318339. MR0414514
[6] Y. Matiyasevich, My Collaboration with Julia Robinson, http://logic.pdmi.ras.ru/~yumat/
personaljournal/collaborationjulia/index.html.
[7] W. H. Mills, A prime-representing function, Bull. Amer. Math. Soc. 53 (1947), 604, DOI
10.1090/S0002-9904-1947-08849-2. MR0020593
[8] C. Reid, The autobiography of Julia Robinson, College Math. J. 17 (1986), no. 1, 3–21,
DOI 10.2307/2686866. https://www.maa.org/sites/default/files/pdf/upload_library/
22/Polya/07468342.di020720.02p00912.pdf. MR827630
[9] C. Reid, Being Julia Robinson’s sister, Notices Amer. Math. Soc. 43 (1996), no. 12, 1486–
1492. http://www.ams.org/notices/199612/reid.pdf. MR1416722
[10] Wikipedia, Hilbert’s tenth problem, http://en.wikipedia.org/wiki/Hilbert's_tenth_problem.
[11] Wikipedia, Julia Robinson. http://en.wikipedia.org/wiki/Julia_Robinson.
[12] C. Wood, Julia Robinson and Hilbert’s Tenth Problem (film review), Notices of the American
Mathematical Society (2008), 573-575. http://www.ams.org/notices/200805/tx080500573p.
pdf.
1984
1984
Introduction
The year is the title of this entry. The other entries of this work honor mathe-
maticians or mathematical events; in a sense, this year honors math itself. Here 1984
refers to the classic dystopian novel, 1984, by George Orwell (1903–1950). Written
thirty-five years prior to 1984, it describes a world at perpetual war in which the
three major governments manipulate and control their populations. Some of the
methods of control are centuries old, such as informants, constant surveillance, and
fear. Others are either new or are given a clearer expression than before, such
as Newspeak (the language of Oceania, designed to limit freedom of thought by
restricting what can be discussed).
One of the most famous passages of the novel involves the “equation”
2 + 2 = 5,
which is false (unless one works modulo 1, as one does in certain Diophantine
approximation problems; see the 1922, 1931, 1938, and 1972 entries). The protag-
onist, Winston Smith, is thinking about Big Brother, the rule of the party, and
“alternative facts”:
In the end the Party would announce that two and two made five,
and you would have to believe it. It was inevitable that they should
make that claim sooner or later: the logic of their position demanded
it. Not merely the validity of experience, but the very existence of
external reality, was tacitly denied by their philosophy. The heresy
of heresies was common sense. And what was terrifying was not that
they would kill you for thinking otherwise, but that they might be
right. For, after all, how do we know that two and two make four? Or
that the force of gravity works? Or that the past is unchangeable? If
both the past and the external world exist only in the mind, and if the
mind itself is controllable. . . what then?
A few paragraphs later, the chapter ends with Winston thinking:

Freedom is the freedom to say that two plus two make four. If that is
granted, all else follows.
The Star Trek: The Next Generation episode Chain of Command, Part II fea-
tures a striking homage to 1984 that mirrors the powerful scene in which Winston
Smith is tortured by O’Brien (Orwell does not provide the character with a first
name).1 A Cardassian torturer shines four bright lights on a physically restrained

Jean-Luc Picard, who is pressured most unpleasantly over many days to say that
he sees five lights. Picard’s captor gives up on obtaining useful information from
Picard and focuses only on breaking his spirit. Fortunately, a diplomatic solution
is found by the higher-ups and Picard’s release is ordered, although not before he
almost gives in. As he is ushered out of the torture chamber he shouts at his former
captor, “There are four lights!”
In honor of Winston’s quote about everything following from the freedom to
say 2 + 2 = 4, this year’s problem is the famous four fours puzzle: given four fours
and the unlimited use of a finite set of mathematical operations, which natural
numbers are constructible? For example,
44 − 44 = 0, 44/44 = 1, and 4/4 + 4/4 = 2.
More creatively, we have
4! + 4! + 4/4 = 49.
There are many variants of the puzzle. The following is from [6].
Here’s a brain teaser! Can you (with the help of your calculator, as
needed) “build” all the whole numbers between 1 and 100 using only
four 4’s? Use only the + - × / ( ) . ∧ 2 = and 4 keys on your calculator;
4! = 4 × 3 × 2 × 1 is allowed, along with repeating decimal 4 (.4444 . . .).
(All the whole numbers up to 120 have been “built” with just four
4’s—how many can you find?)

To make the problem even more interesting, let us add a scoring component.
Assign a cost of 1 unit to the four basic binary operations (addition, subtraction,
multiplication, and division). Assign a cost of 2 units for exponentiation, factorials,
and nth roots. Continue along these lines until you have assigned a value to
all the operations you are allowed to use. Classify all numbers of cost at most C.
Given some integer n, is there a bound on the minimal cost to represent it?
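A restricted version of the scoring problem can be explored by exhaustive search. The sketch below is ours and makes several simplifying assumptions that should be read as such: only the four binary operations are allowed (each at cost 1, as suggested above), concatenating fours into 44, 444, or 4444 is treated as free, and exponentiation, factorials, roots, and decimals are left out entirely. It records, for every value reachable with exactly four fours, the minimum cost of an expression producing it.

```python
from fractions import Fraction
from itertools import product


def four_fours_costs():
    """Map each value buildable from exactly four 4s to its minimum cost."""
    # best[k] maps a value to the cheapest cost of building it from k fours.
    best = {k: {} for k in range(1, 5)}
    for k in range(1, 5):
        # Concatenation (4, 44, 444, 4444) is assumed to cost nothing.
        best[k][Fraction(int("4" * k))] = 0
        for i in range(1, k):
            for (x, cx), (y, cy) in product(best[i].items(), best[k - i].items()):
                cost = cx + cy + 1             # one more binary operation
                results = [x + y, x - y, x * y]
                if y != 0:
                    results.append(x / y)      # exact rational division
                for r in results:
                    if cost < best[k].get(r, cost + 1):
                        best[k][r] = cost
    return best[4]


costs = four_fours_costs()
naturals = sorted(int(v) for v in costs if v.denominator == 1 and v >= 0)
print(naturals[:20])
print({n: costs[Fraction(n)] for n in (0, 1, 2)})
```

Using exact fractions avoids floating-point surprises with division. Extending the search to unary operations such as factorials requires some cap on how often they may be applied, since otherwise the set of reachable values is no longer finite.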
1984: Comments
So Long, and Thanks for All the Fish. The fourth book, So Long, and
Thanks for All the Fish, in Douglas Adams’s heralded “Hitchhiker’s guide trilogy”
was released in 1984. In the first book, The Hitchhiker’s Guide to the Galaxy, the
supercomputer Deep Thought (after a seven and a half million year long calcula-
tion) produced the “Answer to the Ultimate Question of Life, The Universe, and
Everything”: 42. This unhelpful response prompted the construction of a much
more sophisticated computer that would find out what the Ultimate Question ac-
tually was.
1 This is an inversion of the “O’Brien must suffer” theme (as fans refer to it) in Star Trek:
Deep Space Nine scripts. The character Miles O’Brien (2383– ), a noncommissioned everyman
whom audience members could relate to, was often subjected to various physical and emotional
tortures.
At the end of the second book, The Restaurant at the End of the Universe, after
ten million additional years, the new computer (we will not spoil its identity) reveals
that the Ultimate Question is, “What do you get if you multiply six by nine?”
Unfortunately, an error was unintentionally introduced into the computation that
rendered the result meaningless. Or did it?
Most readers will agree that 6 × 9 = 54. However, it is true that
6 × 9 = 42
if the computations are carried out in base 13 since
\[ 54 = 4 \cdot 13 + 2 \cdot 1 = (42)_{13}. \]
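The base conversion is easy to confirm by machine. This short check is our own addition; `to_base` is an illustrative helper, and `int(text, base)` is Python's built-in parser for numerals in a given base.

```python
def to_base(n, base):
    """Digits of the nonnegative integer n in the given base, most significant first."""
    digits = []
    while n:
        n, d = divmod(n, base)
        digits.append(d)
    return digits[::-1] or [0]


print(to_base(6 * 9, 13))   # digits of 54 in base 13
print(int("42", 13))        # the base-13 numeral 42 as an ordinary integer
```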
Adams has claimed that this was unintentional. On the other hand, the title “42”
of the 2007 Doctor Who episode is an intentional simultaneous homage to Douglas
Adams and the television show “24,” along with a reference to the approximate
running time of the episode (which proceeds in “real time”).
Bieberbach conjecture. Ludwig Bieberbach (1886–1982) is best known for

the conjecture that bears his name and for being a terrible person: he was a pas-
sionate Nazi who even wore a Nazi uniform while giving an examination [4].

Suppose that $f(z) = \sum_{n=0}^{\infty} a_n z^n$ defines an analytic function on the open unit
disk D in the complex plane. A sufficient condition for this is
\[ \limsup_{n \to \infty} |a_n|^{1/n} \le 1, \]
which ensures that the radius of convergence of the power series that defines f is
greater than or equal to 1. If f is one-to-one on D, what can be said about the
growth of the Taylor coefficients $a_n$? Since $f(0) = a_0$, it makes sense to normalize
f so that f(0) = 0; otherwise, $a_0$ could be any complex number. Then $f'(0) = a_1$
and there are two possibilities. If $a_1 = 0$, then f'(0) = 0 and basic complex analysis
tells us that f is not one-to-one on any neighborhood of the origin. Thus, we may
assume that $a_1 \ne 0$. In this case, we may divide f by $a_1$ and, without loss of
generality, we may as well assume that $a_1 = 1$:
\[ f(z) = z + a_2 z^2 + a_3 z^3 + \cdots. \]
Such a one-to-one analytic function is called a schlicht function.
Suppose that f is schlicht. The Bieberbach conjecture, first posed in 1916,
states that $|a_n| \le n$ for n ≥ 2 and, moreover, that if equality is attained for some
n, then
\[ f(z) = \frac{z}{(1 - \alpha z)^2} = \sum_{n=1}^{\infty} n \alpha^{n-1} z^n \tag{1984.1} \]
for some constant α of absolute value one [1]. The function
\[ k(z) = \frac{z}{(1 - z)^2} = z + 2z^2 + 3z^3 + \cdots \]
is the Koebe function; see Figure 1. The extremal functions (1984.1) are just
rotations of the Koebe function in the following sense: $f(z) = \alpha^{-1} k(\alpha z)$.
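For readers who want to see where these coefficients come from, here is a short derivation. It is our own addition and uses only the geometric series, valid for |z| < 1.

```latex
\[
  \frac{1}{1 - z} = \sum_{n=0}^{\infty} z^n
  \quad\Longrightarrow\quad
  \frac{1}{(1 - z)^2} = \frac{d}{dz}\,\frac{1}{1 - z} = \sum_{n=1}^{\infty} n z^{n-1}
  \quad\Longrightarrow\quad
  k(z) = \frac{z}{(1 - z)^2} = \sum_{n=1}^{\infty} n z^n .
\]
% Rotation by a unimodular constant \alpha:
\[
  \alpha^{-1} k(\alpha z)
  = \alpha^{-1} \cdot \frac{\alpha z}{(1 - \alpha z)^2}
  = \frac{z}{(1 - \alpha z)^2}
  = \sum_{n=1}^{\infty} n \alpha^{n-1} z^n .
\]
```

Since |α| = 1, the nth coefficient $n\alpha^{n-1}$ has absolute value exactly n, so every rotation of the Koebe function attains equality in the Bieberbach bound for all n at once.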
Figure 1. The Koebe function $k(z) = z/(1 - z)^2$, named after Paul Koebe
(1882–1945), is a one-to-one map from D onto the
complement of the half-line $(-\infty, -\tfrac{1}{4}]$ on the real axis.
The Bieberbach conjecture was finally proved in 1984 by Louis de Branges

(1932– ) [2, 3]. Because of several unfortunate incidents in the past in which
de Branges had claimed erroneous solutions to major open problems, his claims
were greeted with some skepticism. He traveled to Leningrad in 1984, where sev-
eral mathematicians worked through the proof and declared it sound. Overnight,
de Branges went from being a pariah of sorts to a superstar. He was showered with
various prizes, ranging from the Ostrowski Prize (1989) to the Steele Prize (1994).
Bibliography
[1] L. Bieberbach, Über die Koeffizienten derjenigen Potenzreihen, welche eine schlichte Abbil-
dung des Einheitskreises vermitteln, Sitzungsber. Preuss. Akad. Wiss. Phys-Math. Kl. (1916),
940–955.
[2] L. de Branges, A proof of the Bieberbach conjecture, Acta Math. 154 (1985), no. 1-2, 137–
MR772434
[3] L. de Branges, Underlying concepts in the proof of the Bieberbach conjecture, Proceedings
of the International Congress of Mathematicians, Vol. 1, 2 (Berkeley, Calif., 1986), Amer.
Math. Soc., Providence, RI, 1987, pp. 25–42. http://www.mathunion.org/ICM/ICM1986.1/
Main/icm1986.1.0025.0042.ocr.pdf. MR934213
[4] J. J. O’Connor and E. F. Robertson, Ludwig Georg Elias Moses Bieberbach, Mac-
Tutor History of Mathematics, http://www-history.mcs.st-andrews.ac.uk/Biographies/
Bieberbach.html.
[5] G. Orwell, Nineteen Eighty-Four: A novel, Secker & Warburg, 1949.
[6] Texas Instruments Incorporated, The Great International Math on Keys Book, Texas Instru-
ments, 1976.
1985
The Jones Polynomial
Introduction
A knot is an embedding of a circle in three-dimensional space. We consider here
only tame knots; these are knots that can be physically realized with a string or
rope that has a nonzero thickness. Given two knots, how do we determine if they
are equivalent? That is, can we manipulate one of them into the other without
cutting? For example, is the trefoil knot (Figure 1(b)) equivalent to the unknot
(Figure 1(a))?
Knot theorists try to solve these sorts of problems by associating a mathemat-
ical object (an invariant) to each knot in such a way that equivalent knots are
assigned the same invariant. Thus, two knots with different invariants are truly
different knots: neither can be manipulated into the other (on the other hand, two
knots with the same invariant might turn out to be inequivalent). One desires knot
invariants that are simple to compute and compare. So far, nobody has come up
with a simple invariant that can distinguish between all nonequivalent knots.
Figure 1. (a) The unknot. (b) The trefoil knot. The trefoil knot cannot be
deformed into the unknot without passing through itself. The Jones polynomials
of these two knots are $1$ and $-t^{-4} + t^{-3} + t^{-1}$, respectively. Since these
polynomials are different, the two knots are not equivalent.
Although knots have been used since ancient times, their mathematical study
began with Gauss’s development of linking numbers in 1833. Physicists
were interested in knots for a period in the 1800s, but the modern study of knots
only took off in the early 20th century. Max Dehn (see the 1980 entry), James
Waddell Alexander (1888–1971), and Kurt Reidemeister (1893–1971) were early
contributors. In particular, Alexander discovered the first knot polynomial [2]. The
so-called Alexander polynomial of a knot is a Laurent polynomial (negative powers
of the variable are permitted) with integer coefficients that is a knot invariant: two
equivalent knots share the same Alexander polynomial.
Figure 2. Several knots and their Jones polynomials:
$t^{-2} - t^{-1} + 1 - t + t^2$; $t^3 + t^5 - t^8$;
$t^{-5} - 2t^{-4} + 2t^{-3} - 2t^{-2} + 2t^{-1} - 1 + t$; and $t^2 + t^4 - t^5 + t^6 - t^7$.
Since their Jones polynomials are different, these knots are mutually nonequivalent.
The year 1985 marked the publication of the explosive paper “A polynomial
invariant for knots via von Neumann algebras” by Vaughan F. R. Jones [7]. This
paper introduced the Jones polynomial of a knot, a Laurent polynomial invariant
that is distinct from the Alexander polynomial and that could settle problems
that were impervious to previous methods; see Figure 2. It also exposed links
between knot theory and physics that revitalized interest in the subject. Soon
afterwards, a variety of other invariants, such as the HOMFLY polynomial [4], were
discovered. Jones’s work had ushered in a new age in knot theory. See [1, Ch. 6]
for an introduction to the Alexander, Jones, and HOMFLY polynomials and how
to compute them.
The existence of the Jones polynomial was not the most surprising part of the
paper [7]. The most stunning portion of the title is “von Neumann algebras,” a
highly technical and abstract branch of operator theory (think linear algebra in
infinite-dimensional spaces with a hefty dose of analysis) with no initially apparent
connections to low-dimensional topology at all. One could hardly have predicted
deep links between two more seemingly disparate parts of mathematics! It is like
claiming that techniques from the theory of large cardinals (transfinite numbers
that are so large they require additional axioms beyond ZFC) could be used to
solve open problems in biostatistics. Most mathematicians prior to Jones would
have dismissed a connection between von Neumann algebras and knot theory as
wild fantasy. For his amazing discovery, Jones was awarded the Fields Medal in
1990.

Proposed by Chad Wiley, Emporia State University.
The Jones polynomial of the unknot is the constant polynomial 1. Are there
any nontrivial knots which also have this property? Surprisingly, despite all the
research that has been done, we still do not know the answer. A more accessible
problem would be to show, perhaps using Rolfsen’s tables [8], that a nontrivial knot
with Jones polynomial 1 must have at least 11 crossings. In fact, it can be shown
that such a knot would need to have at least 18 crossings; see [3] for details.
1985: Comments
Knot theory in other dimensions. It is only in three dimensions that knot
theory is interesting. In two dimensions, there is essentially only the unknot (the
unit circle) since a knot cannot cross itself. In four dimensions and higher, there
is too much freedom to manipulate knots. One can show that any knot in four
dimensions can be untangled to obtain the unknot; see Figure 3 for an intuitive
explanation of this phenomenon.
Where do von Neumann algebras come in? Knot theorists rapidly built
on the Jones polynomial and developed more direct constructions. They have
since discovered many new invariants without the use of von Neumann algebras.
Consequently, information about the technical details of Jones’s work is hard to
come by in the knot theory literature.
Figure 3. We can represent the position of a knot in four dimensions
with three spatial dimensions and one “color dimension.”
Two portions of the knot that are different colors can slide past
each other. Thus, even complicated knots like the Stevedore knot
above can be unknotted in four dimensions.
We attempt to sketch, in broad strokes, some of the details behind Jones’s
discovery. We thank James Tener for his assistance in this endeavor. First of all, a
von Neumann algebra is a collection of bounded linear operators on a (typically
infinite-dimensional) Hilbert space that satisfies several desirable algebraic and
topological axioms. As a finite-dimensional analogue, one might think of the set
of n × n complex matrices as a model, although do not get too comfortable with
that analogy! A factor is a von Neumann algebra that is highly noncommutative,
in the sense that its center (the set of all things that commute with everything in
the algebra) consists only of multiples of the identity operator. In 1936, Francis
Murray (1911–1996) and von Neumann introduced the theory of von Neumann al-
gebras (“rings of operators”) and showed that factors come in three basic flavors,
called Type I, Type II, and Type III.
Certain factors of Type II possess a well-behaved “trace” that behaves like the
trace on the $n \times n$ matrices; these factors are called $\mathrm{II}_1$ factors. A factor that is
contained in another factor is, not surprisingly, a subfactor. Given a $\mathrm{II}_1$ subfactor
$M_{-1} \subseteq M_0$, Jones’s basic construction produces a new factor $M_1$ that is generated
by $M_0$ and an orthogonal projection $e_1$ [6]. One then iterates this procedure to get
the Jones tower
\[ M_{-1} \subseteq M_0 \subseteq M_1 \subseteq M_2 \subseteq \cdots \]
and a sequence of orthogonal projections $e_j \in M_j$ such that $M_j$ is generated by
$M_{j-1}$ and $e_j$. One then considers the finite-dimensional algebras $TL_n$ generated by
$\{e_1, e_2, \ldots, e_n\}$. Each such algebra inherits a positive trace from the $\mathrm{II}_1$ factors. A key
property of this trace is the “Markov property”: $\operatorname{tr}(x e_n) = \tau \operatorname{tr}(x)$ if $x \in TL_{n-1}$,
in which $\tau$ is the reciprocal of the “index” of the subfactor. The projections satisfy
the Temperley–Lieb(–Jones) relations
• $e_j^2 = e_j$,
• $e_j e_k e_j = \tau e_j$ when $|j - k| = 1$, and
• $e_j e_k = e_k e_j$ when $|k - j| \ge 2$.
One cannot have an algebra with these relations and a positive trace unless the
index $\tau^{-1}$ satisfies
\[ \tau^{-1} \in \{4\cos^2(\pi/n) : n = 3, 4, \ldots\} \cup [4, \infty); \]
moreover, each of these values does occur. The fact that the spectrum of permissible
index values has both a discrete and a continuous component is one of the most startling
aspects of Jones’s paper on subfactors [6].
The Temperley–Lieb(–Jones) relations are reminiscent of the relations that
define the braid group:
\[ B_n = \langle \sigma_1, \sigma_2, \ldots, \sigma_n : \sigma_i \sigma_j = \sigma_j \sigma_i \text{ if } |i - j| \ge 2,\ \sigma_i \sigma_{i+1} \sigma_i = \sigma_{i+1} \sigma_i \sigma_{i+1} \rangle. \]
It turns out that $TL_n$ contains a representation of the braid group $B_n$, with
generators $(t + 1)e_j - 1$, where $\tau^{-1} = 2 + t + t^{-1}$. This formula gives a family of
representations $\pi_t$ of $B_n$ (parametrized by $t \neq -1$) with special traces defined on
them. However, for the rest of the story the positivity is not crucial, only the
algebras $TL_n$ with their special trace, which can be defined for any complex number
$\tau$. At this point, Jones noticed the similarity between the Temperley–Lieb relations
and the braid group relations and talked to Joan Birman (1927– ) about it [5].
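The role of the relation between $\tau$ and $t$ can be seen in a short computation. The following is our own consistency check, not a reproduction of Jones’s argument. Write $a = t + 1$ and $g_j = a e_j - 1$ (both abbreviations are ours), and use only the relations $e_j^2 = e_j$ and $e_j e_k e_j = \tau e_j$ for $|j - k| = 1$:

```latex
\begin{align*}
g_1 g_2 g_1 &= (a e_1 - 1)(a e_2 - 1)(a e_1 - 1) \\
  &= a^3 e_1 e_2 e_1 - a^2 (e_1 e_2 + e_2 e_1) - a^2 e_1^2 + 2a e_1 + a e_2 - 1 \\
  &= (a^3 \tau - a^2 + 2a)\, e_1 + a e_2 - a^2 (e_1 e_2 + e_2 e_1) - 1, \\
g_2 g_1 g_2 &= (a^3 \tau - a^2 + 2a)\, e_2 + a e_1 - a^2 (e_1 e_2 + e_2 e_1) - 1, \\
g_1 g_2 g_1 - g_2 g_1 g_2 &= a\,(a^2 \tau - a + 1)\,(e_1 - e_2).
\end{align*}
```

The difference vanishes when $a^2 \tau = a - 1$, that is, when $(t + 1)^2 \tau = t$, which for $t \neq 0$ is the condition $\tau^{-1} = 2 + t + t^{-1}$. Generators that are far apart commute because the corresponding projections do.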
Alexander’s theorem says that any knot can be described as the closure of a
braid. This process is not injective, but Markov’s theorem says that two braids
result in the same link if and only if one can be obtained from the other by two
types of moves. The first move is conjugation in the braid group, so to obtain
link invariants from representations of the braid group, one needs a trace on the
representation $\pi$. Thus, $g \mapsto \operatorname{tr}(\pi(g))$ gives a function on the braid group that is invariant
under conjugation. The key to getting link invariants is to have a trace that is also
invariant under the second Markov move.
According to Jones’s paper, he realized the connection between the second
Markov move and the Markov property of the trace as a result of the conversations
with Birman. Once multiplied by an appropriate normalizing factor, his trace was indeed
invariant under the second Markov move and hence gave an invariant of links. The
normalizing factor uses the “writhe” of the link, which requires an orientation,
so the Jones polynomial is actually an invariant of oriented links. One can then
prove many of the properties of the Jones polynomial. For example, it is a Laurent
polynomial in $t^{1/2}$. Since the Jones polynomial can distinguish between the trefoil
knot and its mirror image, it was no mere variant of the Alexander polynomial; it
was indeed something novel.
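Comparing invariants is a mechanical task. The sketch below stores each Laurent polynomial as a dictionary from exponents to coefficients, using the polynomials listed in Figures 1 and 2. The names are ours, and two standard facts that the text does not state are assumed: the Jones polynomial of a mirror image is obtained by replacing $t$ with $t^{-1}$, and the Jones polynomial of any knot takes the value 1 at $t = 1$.

```python
# A Laurent polynomial in t is stored as {exponent: coefficient}.
UNKNOT = {0: 1}
TREFOIL = {-4: -1, -3: 1, -1: 1}  # -t^-4 + t^-3 + t^-1, from Figure 1
FIGURE_2 = [
    {-2: 1, -1: -1, 0: 1, 1: -1, 2: 1},
    {3: 1, 5: 1, 8: -1},
    {-5: 1, -4: -2, -3: 2, -2: -2, -1: 2, 0: -1, 1: 1},
    {2: 1, 4: 1, 5: -1, 6: 1, 7: -1},
]

def mirror(poly: dict[int, int]) -> dict[int, int]:
    """Substitute t -> 1/t (assumed rule for the mirror image of a knot)."""
    return {-exponent: coeff for exponent, coeff in poly.items()}

def evaluate(poly: dict[int, int], t: float) -> float:
    """Evaluate the Laurent polynomial at a nonzero number t."""
    return sum(coeff * t ** exponent for exponent, coeff in poly.items())

all_polys = [UNKNOT, TREFOIL] + FIGURE_2
# Different polynomials certify that the knots are pairwise inequivalent.
assert len({frozenset(p.items()) for p in all_polys}) == len(all_polys)
# The trefoil differs from its mirror image.
assert mirror(TREFOIL) != TREFOIL
# Sanity check on the transcription: each polynomial equals 1 at t = 1.
assert all(evaluate(p, 1) == 1 for p in all_polys)
```

The first polynomial of Figure 2 is unchanged by `mirror`, so the Jones polynomial alone cannot separate that knot from its mirror image.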
Quantum topology. We cannot resist sharing an enlightening exchange from

MathOverflow [9]. One user asked, “Why should I care about the Jones polyno-
mial?” A particularly erudite answer was provided by Jonny Evans, who responded:
Operator algebras had grown out of an attempt to formalise quantum
mechanics/QFT, and it was surprising that a knot invariant should
appear naturally in a completely different subject. At around the
same time, other manifold invariants (Donaldson invariants, instanton

Floer homology) appeared that were also inspired by constructions in
physics. These other manifold invariants had very definite topological
consequences, solving huge open questions in 4-dimensional topology.
Witten showed that all of these invariants (including the Jones
polynomial) can be obtained formally by performing path integrals.
For example, roughly speaking, the Jones polynomial can be obtained
by looking at all possible connections A on a suitable bundle on your
3-manifold, taking the trace of the holonomy of A around your knot,
multiplying by $e^{i\,CS(A)}$ where $CS$ is the Chern–Simons invariant of $A$,
and then integrating the result (over the infinite-dimensional space of
all connections, which is a kind of path integral). Whether or not you
care about physics, that is a pretty cool way to define an invariant.
This led mathematicians to study topological quantum field the-
ories, and the associated manifold invariants. In particular, Khovanov
was led to discover his homological refinement of the Jones polynomial,
which is extraordinarily useful as a knot invariant. . . .
See [10] for Witten’s down-to-earth exposition of the Jones polynomial and its
relation to quantum mechanics.
Bibliography
[1] C. C. Adams, The knot book: An elementary introduction to the mathematical theory of
knots, revised reprint of the 1994 original, American Mathematical Society, Providence, RI,
2004. MR2079925
[2] J. W. Alexander, Topological invariants of knots and links, Trans. Amer. Math. Soc. 30
(1928), no. 2, 275–306, DOI 10.2307/1989123. MR1501429
[3] O. T. Dasbach and S. Hougardy, Does the Jones polynomial detect unknottedness?, Ex-
periment. Math. 6 (1997), no. 1, 51–56. http://www.or.uni-bonn.de/~hougardy/paper/
does_the.pdf. MR1464581
[4] P. Freyd, D. Yetter, J. Hoste, W. B. R. Lickorish, K. Millett, and A. Ocneanu, A new polyno-
mial invariant of knots and links, Bull. Amer. Math. Soc. (N.S.) 12 (1985), no. 2, 239–246,
DOI 10.1090/S0273-0979-1985-15361-3. MR776477
[5] A. Jackson and L. Traynor, Interview with Joan Birman, Notices Amer. Math. Soc. 54 (2007),
no. 1, 20–29. http://www.ams.org/notices/200701/fea-birman.pdf. MR2275922
[6] V. F. R. Jones, Index for subfactors, Invent. Math. 72 (1983), no. 1, 1–
MR696688
[7] V. F. R. Jones, A polynomial invariant for knots via von Neumann algebras, Bull. Amer.
Math. Soc. (N.S.) 12 (1985), no. 1, 103–111, DOI 10.1090/S0273-0979-1985-15304-2. http://
www.ams.org/journals/bull/1985-12-01/S0273-0979-1985-15304-2/. MR766964
[8] The Knot Atlas, The Rolfsen Knot Table, http://katlas.org/wiki/The_Rolfsen_Knot_Table.
[9] MathOverflow, Why should I care about the Jones polynomial?, https://mathoverflow.net/
questions/304486/why-should-i-care-about-the-jones-polynomial.
[10] E. Witten, Jones polynomial, https://www.ias.edu/ideas/2011/witten-knots-quantum-
theory.
1986
Sudokus and Look and Say
Introduction
Long ago movie theaters had double features, at which you could see two films
for the price of one. The Astor Theatre in Melbourne opened in 1936. It is one of
the few places in the world where one can still catch a double feature. In honor of
its 50th anniversary, we present a mathematical double feature: two “recreational”
math topics for the price of one!
Almost everyone has heard about Sudokus. Their rise to popularity began in
1986 with the puzzle company Nikoli in Japan. Since then, they have become so
ubiquitous that they now share space with crossword puzzles in newspapers and
airline magazines. One is presented with a partially filled 9 × 9 grid, which is
subdivided into nine blocks of size 3 × 3. The goal is to fill in the empty boxes
with digits in such a way that each row and column is a permutation of 1, 2, . . . , 9.
Moreover, each block must contain each of 1, 2, . . . , 9 exactly once. Figure 1 is a
good example; see Figure 2 in the comments for the solution.
2 . . 7 6 . . . .
6 . 1 . . 3 . . .
4 9 . . 2 . . . .
3 2 . . . . 5 9 .
1 . 5 . . . 3 . 7
. 6 9 . . . . 1 4
. . . . 1 . . 5 2
. . . 9 . . 6 . 1
. . . . 4 2 . . 3
Figure 1. A Sudoku challenge; dots mark empty cells.
Sudokus involve a lot of terrific mathematics. The first natural question to
ask is how many distinct Sudoku puzzles there are. For example, if we switch all
1’s and 9’s, we obtain a puzzle that looks different, but that is fundamentally the
same. There are other transformations that can be performed: rotate the puzzle
by 90 degrees, reflect across the diagonal, and so forth. Up to symmetries, there
are 5,472,730,538 essentially different completed Sudoku grids [8].
An implicit rule, observed by Sudoku puzzle creators, is that there must be
exactly one solution to each puzzle. What is the minimal number of clues that
must be given in order to uniquely determine how a Sudoku grid is filled? The
answer, 17, was obtained in 2012 by Gary McGuire, Bastian Tugemann, and Gilles
Civario who wrote [5]:
The Sudoku minimum number of clues problem is the following ques-
tion: what is the smallest number of clues that a Sudoku puzzle can
have (and lead to a unique solution)? For several years it had been
conjectured that the answer is 17. We have performed an exhaustive
computer search for 16-clue Sudoku puzzles, and did not find any, thus
proving that the answer is indeed 17. In this article we describe our
method and the actual search. As a part of this project we developed a
novel way for enumerating hitting sets. The hitting set problem is com-
putationally hard; it is one of Richard Karp’s 21 classic NP-complete
problems. A standard backtracking algorithm for finding hitting sets
would not be fast enough to search for a 16-clue Sudoku puzzle exhaus-
tively, even at today’s supercomputer speeds. To make an exhaustive
search possible, we designed an algorithm that allowed us to efficiently
enumerate hitting sets of a suitable size.
Our example (Figure 1) has 30 clues and is therefore much simpler than the worst
case scenario: a Sudoku with only 17 clues. For more information about Sudoku,
see [4, 8, 10].
For our second feature, consider the famous see and say sequence (or look and
say sequence) introduced by John Horton Conway (1937– ) in 1986. The first few
terms are
1, 11, 21, 1211, 111221, 312211, 13112221, 1113213211. (1986.1)
The pattern is not immediately obvious because we are used to looking for patterns
that arise from mathematical processes. However, (1986.1) is generated linguis-
tically. It is created by the process suggested by its name. The first number is
“one 1”, so the second number is 11. The second number is “two 1’s”, so the third
number is 21, and so on. Can you show that no digit other than 1, 2, or 3 appears
in the sequence?
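The rule is easy to mechanize, which makes it pleasant to experiment with. The following is a minimal sketch; the function names are ours. It reads each term run by run and checks, for the first 25 terms only, that no digit other than 1, 2, or 3 shows up. This is evidence, not a proof.

```python
from itertools import groupby

def look_and_say(term: str) -> str:
    """Describe `term` run by run: '1211' is 'one 1, one 2, two 1s' -> '111221'."""
    return "".join(f"{len(list(run))}{digit}" for digit, run in groupby(term))

def first_terms(count: int, seed: str = "1") -> list[str]:
    """The first `count` terms of the look and say sequence starting at `seed`."""
    terms = [seed]
    while len(terms) < count:
        terms.append(look_and_say(terms[-1]))
    return terms

print(", ".join(first_terms(8)))
# Numerical evidence for the exercise: only the digits 1, 2, 3 occur so far.
assert set("".join(first_terms(25))) <= set("123")
```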
Conway and his colleagues proved a number of remarkable facts about the
sequence (1986.1). The following is from the abstract of a talk on the subject given
by Alex Kontorovich at Columbia on March 23, 2004:
He [Conway] found that the sequence decomposed into certain recur-
ring strings. Categorizing these 92 strings and labeling them by the
atoms of the periodic table (from Hydrogen to Uranium), Conway was
able to prove that the asymptotic length of the sequence grows expo-
nentially, where the growth factor (now known as Conway’s constant)
is found by computing the largest eigenvalue of a 92 × 92 transition
matrix. Even more remarkable is the Cosmological Theorem, which
states that regardless of the starting string, every Look and Say se-
quence will eventually decay into a compound of these 92 atoms, in a
bounded number of steps. Conway writes that, although two indepen-
dent proofs of the Cosmological Theorem were verified, they were lost
in writing! It wasn’t until a decade later that Doron Zeilberger’s pa-
per (coauthored with his computer, Shalosh B. Ekhad) gave a tangible
proof of the theorem. We will discuss this weird and wonderful chem-
istry, and some philosophical consequences. The only prerequisite is
basic linear algebra.
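The exponential growth is visible in a direct experiment, without the 92 × 92 matrix. The sketch below (our own, with hypothetical function names) prints the ratio of the lengths of consecutive terms; after a few dozen steps the ratios settle near 1.30, in line with the commonly quoted value of roughly 1.3036 for Conway’s constant.

```python
from itertools import groupby

def next_term(term: str) -> str:
    """One look and say step."""
    return "".join(f"{len(list(run))}{digit}" for digit, run in groupby(term))

def length_ratios(steps: int, seed: str = "1") -> list[float]:
    """Ratios len(next term) / len(current term) for the first `steps` steps."""
    ratios, term = [], seed
    for _ in range(steps):
        following = next_term(term)
        ratios.append(len(following) / len(term))
        term = following
    return ratios

ratios = length_ratios(45)
print(f"ratio after 45 steps: {ratios[-1]:.4f}")
```

This only estimates the growth factor; the eigenvalue computation described in the abstract is what pins it down.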
Many variants of Conway’s sequence have been analyzed. Some use different
starting numbers, others use binary, and still others count the total number of digits
instead of the numbers of digits in blocks. See [1–3, 9] for more information about
the “look and say” sequence and its variations.

Proposed by Steven J. Miller and Samuel Tripp, Williams College.
Instead of a $9 \times 9$ Sudoku, one can consider $n^2 \times n^2$ Sudoku puzzles. How does
the minimum number of clues required to impose a unique solution grow with n?
Can you find any lower bounds? Any upper bounds?
Let us now consider variants of the look and say sequence.
(a) What if we say things backwards? For example, instead of saying “two three”
for 33 we say “three two”? There is no difference for 1, 22, or 333, but there is
a difference for 33. If we start with 1, then the first few terms of the sequence
are
1, 11, 12, 1121, 122111, 112213, 12221131.
Each of these is the reverse of the corresponding term in the original sequence
(1986.1). Does this pattern hold forever?
(b) What if whenever we have just one of a number, we just write that num-
ber? In this case, if we start with 1, then the sequence produced is 1, 1, 1, . . ..
Starting with 11 yields the sequence 11, 21, 21, 21, . . .. Starting with 112 yields
112, 212, 212, 212, . . .. If we start with a finite string composed of 1’s, 2’s, and
3’s, does the sequence produced eventually stabilize?
(c) Consider 3-digit substrings of the sequence (1986.1). Prove that 333 will never
be found as a 3-digit substring of any term. Find three other such 3-digit
substrings that never appear.
(d) Show that if d ≥ 4 does not occur in the first two terms of the sequence, then
d never occurs.
There are of course many other problems you could study; see [7].
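Variants (a) and (b) can be explored by changing a single line of the basic rule. The sketch below is meant for experimentation and settles none of the questions; the function names are ours.

```python
from itertools import groupby

def runs(term: str):
    """Yield (digit, run_length) for each maximal run of equal digits."""
    for digit, run in groupby(term):
        yield digit, len(list(run))

def say_backwards(term: str) -> str:
    """Variant (a): write the digit first, then the number of times it occurs."""
    return "".join(f"{digit}{count}" for digit, count in runs(term))

def say_tersely(term: str) -> str:
    """Variant (b): a digit that occurs just once in a run is written as itself."""
    return "".join(digit if count == 1 else f"{count}{digit}"
                   for digit, count in runs(term))

def iterate(rule, seed: str, count: int) -> list[str]:
    """The first `count` terms obtained by applying `rule` repeatedly to `seed`."""
    terms = [seed]
    while len(terms) < count:
        terms.append(rule(terms[-1]))
    return terms

print(iterate(say_backwards, "1", 7))
print(iterate(say_tersely, "112", 4))
```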
1986: Comments
An algorithm for Sudokus. The Sudoku-solving approach that we suggest
below is not the fastest, but it connects to our 1947 entry on linear programming.
See [6] and the references therein for more information about algorithms and linear
programming.
Let $X = [x_{i,j}]_{i,j=1}^{9}$ be the $9 \times 9$ matrix that represents the unique solution to
the Sudoku puzzle. Some of the $x_{i,j}$ (hopefully at least 17 of them!) are given and
we must find the rest. We have the conditions that in each row, each column, and
each of the nine $3 \times 3$ blocks, each digit $1, 2, \ldots, 9$ appears exactly once. Linear
programming can solve problems such as these, although the additional restriction
that our entries are integers makes things more difficult.
A lot of packages exist for solving binary integer programs, which require
that the variables only assume the values 0 or 1. We can modify our approach by
choosing variables $x_{i,j,d}$ for $1 \le i, j, d \le 9$ such that
\[ x_{i,j,d} = \begin{cases} 1 & \text{if } x_{i,j} = d, \\ 0 & \text{otherwise.} \end{cases} \]
How many constraints on the $x_{i,j,d}$ do we have? If $I$ is the set of locations for which
we are given initial values, then we have $|I|$ conditions. However, this is dwarfed
by what remains. Each row, column, and $3 \times 3$ block has exactly one of each of
the nine digits. The fact that we require exactly one 5 in the first row yields the
constraint
\[ x_{1,1,5} + x_{1,2,5} + \cdots + x_{1,9,5} = 1. \]
This yields 81 constraints for the rows, 81 for the columns, and 81 for the blocks
(some may be redundant or unnecessary due to the placement of the initial values).
So we have about 240 constraints (243 before any are discarded). However, we do not want the
nonzero values to correspond to the same cell and hence we add 81 more constraints
\[ \sum_{d=1}^{9} x_{i,j,d} = 1, \qquad 1 \le i, j \le 9, \]
which gives us around 320 constraints (324 in all).
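The bookkeeping can be checked directly. The sketch below is ours and does not solve anything: it builds the four families of constraints as lists of index triples, counts them, and verifies that the 0–1 encoding of the completed grid in Figure 2 satisfies every one of them. Rows and columns are numbered from 0 in the code, unlike in the text.

```python
from itertools import product

# The completed grid from Figure 2, one string per row.
SOLUTION = [
    "238769145", "651483729", "497125836",
    "324671598", "185294367", "769358214",
    "973816452", "542937681", "816542973",
]

def sudoku_constraints() -> list[list[tuple[int, int, int]]]:
    """Each constraint lists the (row, col, digit) variables that must sum to 1."""
    cells, digits = range(9), range(1, 10)
    constraints = []
    for r, d in product(cells, digits):            # digit d exactly once in row r
        constraints.append([(r, c, d) for c in cells])
    for c, d in product(cells, digits):            # digit d exactly once in column c
        constraints.append([(r, c, d) for r in cells])
    for br, bc, d in product(range(3), range(3), digits):  # once in each block
        constraints.append([(3 * br + i, 3 * bc + j, d)
                            for i in range(3) for j in range(3)])
    for r, c in product(cells, cells):             # exactly one digit in each cell
        constraints.append([(r, c, d) for d in digits])
    return constraints

def indicator(grid: list[str]) -> dict[tuple[int, int, int], int]:
    """The 0-1 variables x[r, c, d] = 1 exactly when cell (r, c) holds digit d."""
    return {(r, c, d): int(int(grid[r][c]) == d)
            for r in range(9) for c in range(9) for d in range(1, 10)}

constraints = sudoku_constraints()
x = indicator(SOLUTION)
print(len(constraints))
print(all(sum(x[v] for v in con) == 1 for con in constraints))
```

A binary integer programming package would take the same constraints, together with the clue conditions, as its input.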

We can reduce the number of constraints by instead replacing $\{1, 2, \ldots, 9\}$ with
\[ S = \{1, 10, 100, \ldots, 10^8\}. \]
As before, the constraint
\[ \sum_{d \in S} x_{i,j,d} = 1 \]
ensures that exactly one of the $x_{i,j,d}$ is nonzero; we need 81 such constraints to make
sure we choose exactly one element of $S$ for each grid location. The constraint on
the $i$th row is now
\[ \sum_{d \in S} \sum_{j=1}^{9} d \cdot x_{i,j,d} = 111{,}111{,}111. \]
Our choice of $S$ ensures that the only way a row can sum to 111,111,111 is to have
exactly one element in the $i$th row that equals 1, exactly one that equals 10, and so
forth. This reduces the number of constraints by a huge amount, leaving us around
100 constraints to contend with (81 for the cells and 9 each for the rows, columns, and blocks).
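The claim about the sum 111,111,111 can be confirmed by brute force, since there are only 24,310 ways to choose nine elements of $S$ with repetition allowed. The sketch below is ours; it lists every such choice with the required sum.

```python
from itertools import combinations_with_replacement

TARGET = 111_111_111
POWERS = [10 ** k for k in range(9)]  # the set S = {1, 10, ..., 10^8}

def row_fillings_hitting_target() -> list[tuple[int, ...]]:
    """All multisets of nine elements of S whose sum is 111,111,111."""
    return [
        combo
        for combo in combinations_with_replacement(POWERS, 9)
        if sum(combo) == TARGET
    ]

solutions = row_fillings_hitting_target()
print(len(solutions))                     # number of admissible multisets
print(solutions[0] == tuple(POWERS))      # is it "each power exactly once"?
```

The reason behind the outcome is that nine summands can never produce a carry, so every decimal digit of the total simply counts how often the corresponding power was used.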
2 3 8 7 6 9 1 4 5
6 5 1 4 8 3 7 2 9
4 9 7 1 2 5 8 3 6
3 2 4 6 7 1 5 9 8
1 8 5 2 9 4 3 6 7
7 6 9 3 5 8 2 1 4
9 7 3 8 1 6 4 5 2
5 4 2 9 3 7 6 8 1
8 1 6 5 4 2 9 7 3
Figure 2. The answer to the Sudoku challenge.
Bibliography
[1] J. H. Conway, The weird and wonderful chemistry of audioactive decay, Eureka
46 (1986), 5-18. http://graphics8.nytimes.com/packages/pdf/crossword/GENIUS AT PLAY
Eureka Article.pdf.
[2] S. B. Ekhad and D. Zeilberger, Proof of Conway’s lost cosmological theorem, Elec-
tron. Res. Announc. Amer. Math. Soc. 3 (1997), 78–82, DOI 10.1090/S1079-6762-97-
00026-7. http://www.ams.org/journals/era/1997-03-11/S1079-6762-97-00026-7/S1079-
6762-97-00026-7.pdf. MR1461977
[3] Ó. Martı́n, Look-and-say biochemistry: exponential RNA and multistranded DNA, Amer.
Math. Monthly 113 (2006), no. 4, 289–307, DOI 10.2307/27641915. http://web.archive.
org/web/20061224154744/http://www.uam.es/personal_pdi/ciencias/omartin/Biochem.
PDF. MR2211756
[4] Math Explorer’s Club, The Math Behind Sudoku: References, http://www.math.cornell.
edu/~mec/Summer2009/Mahmood/References.html.
[5] G. McGuire, B. Tugemann, and G. Civario, There is no 16-clue Sudoku: solving the
Sudoku minimum number of clues problem via hitting set enumeration, Exp. Math. 23
(2014), no. 2, 190–217, DOI 10.1080/10586458.2013.870056. http://arxiv.org/abs/1201.
0749. MR3223774
[7] C. Rivera, Puzzle 657: Look and say sequence, http://www.primepuzzles.net/puzzles/
puzz_657.htm.
[8] E. Russell and F. Jarvis, There are 5472730538 essentially different Sudoku grids. . . and the
Sudoku symmetry group, Mathematical Spectrum 39 (2006), 54–58.
[9] Wikipedia, Look and Say, http://en.wikipedia.org/wiki/Look-and-say_sequence.
[10] Wikipedia, Sudoku, http://en.wikipedia.org/wiki/Sudoku.
1987
Primes, the Zeta Function, Randomness, and Physics
Introduction
In the 1942 entry, we saw that the Riemann zeta function
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} \]
can be analytically continued from the half-plane $\operatorname{Re} s > 1$ to $\mathbb{C} \setminus \{1\}$, with a simple
pole at $s = 1$ and with zeros at the negative even integers $-2, -4, \ldots$. The nontrivial
zeros of the zeta function lie in the critical strip $0 < \operatorname{Re} s < 1$; the Riemann
hypothesis asserts that these all lie on the vertical line $\operatorname{Re} s = \tfrac{1}{2}$ [3].
The Euler product formula (1933.3) suggests a profound relationship between
the zeta function and the prime numbers. We suggested in the 1939 entry that the
location of the nontrivial zeros determines the large-scale behavior of the primes.
This profound link between the continuous (complex analysis) and discrete (prime
numbers) has long fascinated mathematicians. The primes dance to the tune played
by the zeros of an analytic function!
The classical methods of analytic number theory have not yet produced a proof
of the Riemann hypothesis. There is a general opinion among experts that a new
approach is needed. One idea that has spurred a huge amount of research in the
last several decades is the Hilbert–Pólya conjecture, which says that the Riemann
hypothesis is true because there is an unbounded selfadjoint operator H on some
Hilbert space so that the eigenvalues of
\[ \tfrac{1}{2} I + iH \]
are the nontrivial zeros of the zeta function (the eigenvalues of a selfadjoint
operator are real). Moreover, some expect that $H$ is the Schrödinger operator (see the
1925 entry) corresponding to some quantum system.
Although the conjecture first appeared in print in 1973 [8], it was originally pro-
posed by George Pólya sometime during 1912–1914. Hilbert’s role in the conjecture
is less clear:
David Hilbert did not work in the central areas of analytic number the-
ory, but his name has become known for the Hilbert–Pólya conjecture
for reasons that are anecdotal. [14]
In the early 1980s, Andrew Odlyzko investigated the provenance of the conjecture.
His correspondence with Pólya and Olga Taussky-Todd (1906–1995), who worked
with Hilbert in Göttingen, makes an interesting read [10].
The Hilbert–Pólya conjecture suggests a connection with random matrix theory

(see the 1928 entry). While random matrix theory began in the 1920s and was
known to be relevant to nuclear physics since the 1950s, it was not until a fortuitous
meeting at the Institute for Advanced Study in the early 1970s that connections to
number theory emerged. The mathematical physicist Freeman Dyson was talking
with Hugh Montgomery and inquired about his recent work. Montgomery was
looking at the pair correlation of zeros of the Riemann zeta function, and when
he showed the sine kernel answer he found, Dyson remarked that one sees similar
behavior in nuclear physics and random matrix theory. Thus began numerous
productive conversations between the two communities.
The connection with random matrix theory suggests that we should approach
the prime numbers from a probabilistic viewpoint (see the comments below for an
explanation of the Cramér random model of the prime numbers).
The sieve of Eratosthenes is an elementary method for producing every prime
number up to a given threshold. For example, suppose that we wish to find all of
the primes at most 100. First cross out every multiple of 2, other than 2 itself.
Then cross out every multiple of 3, except for 3. The number 4 has already been
crossed out, so we ignore it. We proceed to cross out every multiple of 5 and so
forth. What remains when the procedure terminates is a list of the primes below
100:
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97.
Although the sieve is agonizingly slow and unsuitable for practical use, it does
provide a deterministic method to produce all prime numbers in a given range.
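The procedure just described translates almost word for word into code. The following is a minimal sketch with names of our own choosing; it makes no attempt at the usual optimizations.

```python
def primes_up_to(limit: int) -> list[int]:
    """Sieve of Eratosthenes: cross out the proper multiples of each survivor."""
    if limit < 2:
        return []
    is_candidate = [True] * (limit + 1)
    is_candidate[0] = is_candidate[1] = False
    for n in range(2, limit + 1):
        if is_candidate[n]:
            # n survived every earlier pass, so it is prime;
            # cross out 2n, 3n, ... but not n itself.
            for multiple in range(2 * n, limit + 1, n):
                is_candidate[multiple] = False
    return [n for n, alive in enumerate(is_candidate) if alive]

print(primes_up_to(100))
```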
In order to understand the extent to which conjectures such as the Riemann
hypothesis or the twin prime conjecture are outcomes of general sieving procedures,
David Hawkins proposed a probabilistic version of the sieve of Eratosthenes [4, 5].
Start with the list of integers greater than 1, and perform an infinite series of passes
through the list; each pass produces what is called a Hawkins prime. On the first
pass, we identify 2 as the first element in the list. We call $p_1 = 2$ the first Hawkins
prime. Then go through the remainder of the list and cross out each element with
probability $1/p_1 = 1/2$. On the $k$th pass, we look for the first element in the list
that is greater than $p_{k-1}$ and that has not been crossed out. We declare it to be
$p_k$ and cross out every subsequent integer with probability $1/p_k$.
For example, 3 might be crossed out on the first pass while 4 is not. In this case
$p_2 = 4$ is the second Hawkins prime! The Hawkins primes do not share all of the
properties of ordinary primes; they only provide a rough model for the large-scale
distribution of the primes.
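One run of the random sieve can be simulated in a few lines. The following is a sketch under two assumptions of ours: the infinite list is truncated at a finite `limit`, and the crossing-out decisions are independent of one another. The function name and the seeding are also our own.

```python
import random

def hawkins_primes(limit: int, seed: int = 0) -> list[int]:
    """One random run of the Hawkins sieve on the integers 2, ..., limit."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    survivors = list(range(2, limit + 1))
    found = []
    while survivors:
        p = survivors[0]       # first integer that has not been crossed out
        found.append(p)
        # Each later survivor is crossed out independently with probability 1/p.
        survivors = [m for m in survivors[1:] if rng.random() >= 1 / p]
    return found

print(hawkins_primes(100, seed=1))
print(hawkins_primes(100, seed=2))
```

Different seeds give different lists, all of which begin with 2; comparing the number of Hawkins primes up to `limit` across many seeds is a natural first experiment.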

Proposed by Andrew Odlyzko, University of Minnesota.
The Hawkins sieve can be simulated on a computer and studied. Many proper-
ties of primes, both proven and conjectured, can be shown to hold with probability
one for Hawkins primes. Can one come up with other random sieves that can
be analyzed rigorously and produce numbers that more closely resemble ordinary
primes?
1987: Comments
The explicit formula. In our previous entries on the Riemann zeta function
and the Riemann hypothesis, we alluded to the connection between the location of
the nontrivial zeros of the zeta function and the distribution of the prime numbers.
We address that issue here.
Let
\[ \psi(x) = \sum_{p^k \le x} \log p = \sum_{p \le x} \log p + \sum_{k=2}^{\infty} \sum_{p^k \le x} \log p \tag{1987.1} \]
and note that the main term in $\psi(x)$ comes from the first summand. Believe it or
not, the $\log p$ appears throughout the preceding for convenience. The reason has to
do with the Euler product representation of the zeta function and the $\log p$ terms
that arise upon taking logarithmic derivatives. Indeed, we have
\[ \frac{\zeta'(s)}{\zeta(s)} = -\sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}, \]
in which
\[ \Lambda(n) = \begin{cases} \log p & \text{if } n = p^k \text{ for some prime } p \text{ and some } k \ge 1, \\ 0 & \text{otherwise,} \end{cases} \]
is the von Mangoldt function, named after Hans Carl Friedrich von Mangoldt (1854–
1925). This computation is the first step in many delicate contour integrations; see
[7, Rem. 2.3.21 & Ch. 3] for details.
Let $\rho$ denote a typical zero of $\zeta(s)$ in the critical strip. Then $0 < \operatorname{Re} \rho < 1$; the
Riemann hypothesis is the statement that all such $\rho$ have $\operatorname{Re} \rho = \tfrac{1}{2}$. If $x$ is not a
prime power, then a hefty dose of complex analysis yields an explicit formula that
relates the sum (1987.1) over the prime numbers to a sum over the zeros of $\zeta(s)$:
\[ \psi(x) = x - \sum_{\rho} \frac{x^{\rho}}{\rho} - \frac{\zeta'(0)}{\zeta(0)} - \frac{1}{2} \log\Bigl(1 - \frac{1}{x^2}\Bigr). \]
There is a small technicality here. If $\zeta(\rho) = 0$, then $\zeta(\overline{\rho}) = 0$. In order to have the
preceding sum converge, we group the terms corresponding to $\rho$ and $\overline{\rho}$ together.
The Riemann hypothesis implies that
\[ |\psi(x) - x| \le \frac{1}{8\pi} \sqrt{x}\, \log^2 x, \qquad x \ge 74. \tag{1987.2} \]
The square root comes from the assumption that $|x^{\rho}| = x^{1/2}$; the extra logarithmic
factor appears for technical reasons. Through partial summation, one can use
(1987.2) to conclude that there is a constant $C$ such that
\[ |\pi(x) - \operatorname{Li}(x)| \le C x^{1/2} \log x, \]
in which $\pi(x)$ denotes the number of primes at most $x$ and
\[ \operatorname{Li}(x) = \int_2^x \frac{dt}{\log t} \sim \frac{x}{\log x} \]
denotes the offset logarithmic integral.
These arguments can be reversed: if π(x) is sufficiently close to Li(x), then
the Riemann hypothesis is true. One shows that the existence of a zero with real
part greater than 1/2 leads to a violation of the proposed bound on |π(x) − Li(x)|
(if ζ(ρ) = 0, then ζ(1 − ρ) = 0; hence we may assume an exception to the Riemann
hypothesis has real part greater than 1/2).
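The comparison between π(x) and Li(x) is easy to explore numerically. The following Python sketch is only an illustration under our own assumptions: the helper names `prime_count` and `offset_li` are hypothetical, π(x) comes from a basic sieve of Eratosthenes, and Li(x) is approximated by Simpson's rule rather than by a special-function library.

```python
import math

def prime_count(x: int) -> int:
    """pi(x) for an integer x >= 2, via a sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            # Cross out p^2, p^2 + p, ... in one slice assignment.
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return sum(sieve)

def offset_li(x: float, steps: int = 200_000) -> float:
    """Li(x) = integral of dt / log t from 2 to x (composite Simpson; steps must be even)."""
    h = (x - 2) / steps
    total = 1 / math.log(2) + 1 / math.log(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) / math.log(2 + i * h)
    return total * h / 3

if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5, 10**6):
        gap = abs(prime_count(x) - offset_li(x))
        # The last column compares the gap with sqrt(x) log x.
        print(x, prime_count(x), round(offset_li(x), 2), gap / (math.sqrt(x) * math.log(x)))
```

A table of this kind proves nothing, of course, but it shows how small the gap is compared with $x^{1/2} \log x$ in the range a laptop can reach.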
How big is the nth prime? We have seen that the zeros of the Riemann zeta
function govern the large-scale distribution of the prime numbers. For example, the
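prime number theorem is a consequence of the fact that ζ(1 + it) ≠ 0 for t ∈ R.
This famous theorem asserts that
\[
\lim_{x \to \infty} \frac{\pi(x)}{x/\log x} = 1.
\]
Since $\pi(p_n) = n$, we substitute $q = p_n$, do a bit of calculus, and obtain
\[
\begin{aligned}
\lim_{n \to \infty} \frac{n \log n}{p_n}
&= \lim_{n \to \infty} \frac{\pi(p_n) \log p_n}{p_n} \cdot \frac{\log n}{\log p_n} \\
&= \lim_{n \to \infty} \frac{\log n}{\log p_n} \\
&= \lim_{q \to \infty} \frac{\log \pi(q)}{\log q} \\
&= \lim_{q \to \infty} \frac{\log\Bigl(\dfrac{\pi(q)}{q/\log q}\Bigr) + \log q - \log\log q}{\log q} \\
&= \lim_{q \to \infty} \Bigl( \frac{\log 1}{\log q} + 1 - \frac{\log\log q}{\log q} \Bigr) \\
&= 1.
\end{aligned}
\]
Thus, $p_n$ is asymptotic to $n \log n$.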
A more precise estimate is due to Michele Cipolla (1880–1947), who proved
that
\[
n(\log n + \log\log n - 1) < p_n < n(\log n + \log\log n)
\]
for sufficiently large n [2]. In fact, he showed that
\[
p_n = n \Bigl[ \log n + \log\log n - 1 + \sum_{k=1}^{m} \frac{(-1)^{k+1}\, T_k(\log\log n)}{k \log^k n} \Bigr]
+ O\Bigl( \frac{n (\log\log n)^{m+1}}{\log^{m+1} n} \Bigr),
\]
in which $T_k$ is a monic polynomial of degree k with rational coefficients, the first of
which are
\[
T_1(x) = x - 2 \quad \text{and} \quad T_2(x) = x^2 - 6x + 11.
\]
For example, Cipolla’s formula with m = 1 predicts that the ten millionth prime
number is in the neighborhood of 179,464,275. The ten millionth prime number is
179,424,673. Not too shabby! The historical progression and the current state of
the art on estimating pn are discussed in [1].
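Cipolla's expansion is simple to evaluate. Here is a hedged Python sketch (the function name `cipolla` is ours, and only the two polynomials $T_1$ and $T_2$ quoted above are built in, so the truncation order m is limited to 0, 1, or 2).

```python
import math

def cipolla(n: int, m: int) -> float:
    """Cipolla's approximation to the n-th prime, truncated after m correction terms."""
    if m not in (0, 1, 2):
        raise ValueError("only T_1 and T_2 are available in this sketch")
    log_n = math.log(n)
    x = math.log(log_n)  # x = log log n
    T = [lambda t: t - 2, lambda t: t * t - 6 * t + 11]  # T_1 and T_2
    total = log_n + x - 1
    for k in range(1, m + 1):
        total += (-1) ** (k + 1) * T[k - 1](x) / (k * log_n ** k)
    return n * total

if __name__ == "__main__":
    n = 10**7
    for m in (0, 1, 2):
        # Compare each truncation with the true value 179,424,673 quoted in the text.
        print(m, round(cipolla(n, m)), round(cipolla(n, m)) - 179_424_673)
```

Running the loop for several values of m is a quick way to see how much each additional term of the expansion buys.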
The Cramér model. Based on the prime number theorem, Harald Cramér
(1893–1985) proposed a simple probabilistic model of the prime numbers that often
leads to decent predictions [9, 12] (see the comments for the 1975 entry for another
application of heuristic reasoning to the primes). The prime number theorem tells
us that the number of primes at most x is asymptotic to x/ log x or, somewhat
more accurately, Li(x). Consequently, for fixed ε > 0 and large x we expect about
\[
\frac{x + \varepsilon x}{\log(x + \varepsilon x)} - \frac{x - \varepsilon x}{\log(x - \varepsilon x)} \sim \frac{2 \varepsilon x}{\log x}
\]
primes in the interval [x − εx, x + εx]. Dividing by the length 2εx of the interval,
it follows that the probability that a natural number in the vicinity of x is prime
is roughly 1/ log x. For n ≥ 2, let $X_n$ be the random variable that is 1 with
probability 1/ log n and 0 otherwise. Since this is a heuristic argument that cannot
yield rigorous results, we may be a little imprecise. For example, 1/ log 2 > 1 and
hence we should omit the prime 2 from our considerations. However, this does not
come back to bite us and we ignore such issues.
Let
\[
R_N = X_2 + X_3 + \cdots + X_N
\]
denote the number of “random primes” at most N. What is the expected number
$E[R_N]$ of primes at most N? According to our model and the linearity of
expectation, we have
\[
E[R_N] = \sum_{n=2}^{N} E[X_n] = \sum_{n=2}^{N} \frac{1}{\log n} \sim \mathrm{Li}(N) \sim \pi(N).
\]
This is not surprising, since we designed our model based on the hypothesis that
there should be π(x) primes at most x: $E[R_N]$ better be asymptotic to π(N)!
Since the random variables $X_2, X_3, \ldots$ are independent, the variance of their
sum is the sum of their variances. Consequently,¹
\[
\begin{aligned}
\mathrm{Var}(R_N) &= \sum_{n=2}^{N} \mathrm{Var}(X_n) \\
&= \sum_{n=2}^{N} \bigl( E[X_n^2] - E[X_n]^2 \bigr) \\
&= \sum_{n=2}^{N} \frac{1}{\log n} - \sum_{n=2}^{N} \frac{1}{\log^2 n} \\
&\sim \sum_{n=2}^{N} \frac{1}{\log n} \\
&\sim \mathrm{Li}(N) \sim \frac{N}{\log N}.
\end{aligned}
\]
The standard deviation of $R_N$ is therefore asymptotic to $N^{1/2} / \sqrt{\log N}$. Thus, if
the primes are “random” in the sense of the Cramér model, we should expect that
π(x) behaves like Li(x), with an error on the order of $x^{1/2}$ (ignoring constants and
logarithmic factors). This is what the Riemann hypothesis predicts! Although the
Cramér model does not always give exactly the right answer, it often does a decent
job. It certainly beats having to prove the Riemann hypothesis.
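The Cramér model is easy to simulate. The sketch below is our own illustration, not the authors' code: the names `cramer_count`, `expected_count`, and `standard_deviation` are hypothetical, and we start at n = 3 to sidestep the fact that 1/ log 2 > 1.

```python
import math
import random

def cramer_count(N: int, rng: random.Random) -> int:
    """One sample of R_N: each n in [3, N] is a 'random prime' with probability 1/log n."""
    return sum(rng.random() < 1 / math.log(n) for n in range(3, N + 1))

def expected_count(N: int) -> float:
    """E[R_N] in the model: the sum of 1/log n over 3 <= n <= N."""
    return sum(1 / math.log(n) for n in range(3, N + 1))

def standard_deviation(N: int) -> float:
    """Square root of Var(R_N) = sum of p(1 - p) with p = 1/log n."""
    return math.sqrt(sum((1 - 1 / math.log(n)) / math.log(n) for n in range(3, N + 1)))

if __name__ == "__main__":
    rng = random.Random(1987)  # fixed seed so that runs are repeatable
    N = 10**5
    samples = [cramer_count(N, rng) for _ in range(5)]
    print("samples:", samples)
    print("mean:", round(expected_count(N), 1), "sd:", round(standard_deviation(N), 1))
```

Each sample should land within a few standard deviations of the mean, which is the model's version of an error term of size roughly $N^{1/2}$.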
¹ We did not apply the central limit theorem because, while our random variables are
independent, they are not identically distributed. One can use the Lyapunov central limit
theorem in this context [13].
Bibliography
[1] C. Axler, New estimates for the n-th prime number, https://arxiv.org/pdf/1706.03651.pdf
[2] M. Cipolla, La determinazione assintotica dell’n^{imo} numero primo, Rend. Accad. Sci.
Fis-Mat. Napoli 3 (1902), no. 8, 132–166.
[3] J. B. Conrey, The Riemann hypothesis, Notices Amer. Math. Soc. 50 (2003), no. 3, 341–353.
http://www.ams.org/notices/200303/fea-conrey-web.pdf. MR1954010
[4] D. Hawkins, The random sieve, Math. Mag. 31 (1957/1958), 1–3, DOI 10.2307/3029322.
MR0099321
[5] J. Lorch and G. Ökten, Primes and probability: the Hawkins random sieve, Math. Mag.
80 (2007), no. 2, 112–119, DOI 10.1080/0025570x.2007.11953464.
http://www.cs.bsu.edu/homepages/jdlorch/mathmag116-123-lorch.pdf. MR2301878
[8] H. L. Montgomery, The pair correlation of zeros of the zeta function, Analytic number theory
(Proc. Sympos. Pure Math., Vol. XXIV, St. Louis Univ., St. Louis, Mo., 1972), Amer. Math.
Soc., Providence, R.I., 1973, pp. 181–193. MR0337821
[9] H. L. Montgomery and K. Soundararajan, Beyond pair correlation, Paul Erdős and his
mathematics, I (Budapest, 1999), Bolyai Soc. Math. Stud., vol. 11, János Bolyai Math. Soc.,
Budapest, 2002, pp. 507–514. MR1954710
[10] A. Odlyzko, Correspondence about the origins of the Hilbert-Polya Conjecture,
http://www.dtc.umn.edu/~odlyzko/polya/index.html.
[11] A. M. Odlyzko, On the distribution of spacings between zeros of the zeta function, Math.
Comp. 48 (1987), no. 177, 273–308, DOI 10.2307/2007890.
http://www.ams.org/journals/mcom/1987-48-177/S0025-5718-1987-0866115-0/. MR866115
[12] T. Tao, 254A, Supplement 4: Probabilistic models and heuristics for the primes (optional),
https://terrytao.wordpress.com/tag/cramers-random-model/.
[13] Wikipedia, Lyapunov CLT, https://en.wikipedia.org/wiki/Central_limit_theorem#Lyapunov_CLT.
[14] Wikipedia, Hilbert–Pólya conjecture, https://en.wikipedia.org/wiki/Hilbert-Polya
conjecture.
1988
Mathematica
Introduction
On June 23, 1988, Mathematica 1.0 was launched. What is Mathematica?
Wolfram Mathematica (usually termed Mathematica) is a modern

technical computing system spanning all areas of technical computing—
including neural networks, machine learning, image processing, geom-
etry, data science, visualizations, and others. The system is used in
many technical, scientific, engineering, mathematical, and computing
fields. It was conceived by Stephen Wolfram and is developed by Wol-
fram Research of Champaign, Illinois. [13]
That describes what Mathematica is now; it was not always so all-encompassing. It
was originally focused, as its name implies, on mathematics and it still does math
well. Many of the illustrations in this book were produced with Mathematica, as
were many of the tables of data we have presented. For more on what Mathematica
can do, check out the demonstrations page [11].
Mathematica, even from its beginnings, was capable of complex symbolic ma-
nipulations of the sort appreciated by calculus students everywhere. Indeed, the
“computational knowledge engine” Wolfram Alpha, which is consulted by millions
of calculus students every day, is based in part on Mathematica. For example,
suppose that we wish to compute the partial fraction expansion of
\[
f(x) = \frac{1}{x^2 (x - 1)^3 (x + 1)}.
\]
This is the sort of grueling symbolic computation that is well suited for the com-
puter. We enter
Apart[1/(x^2 (x - 1)^3 (x + 1))]
into Mathematica (perhaps using its more appealing, modern interface) and imme-
diately receive the answer
1/(2 (-1+x)^3)-5/(4 (-1+x)^2)+17/(8 (-1+x))-1/x^2-2/x-1/(8 (1+x)).
There are, of course, more elegant ways to receive the output. However, this is
what users in the late 1980s would have seen on their screens. A nifty feature of
recent versions of Mathematica is the ability to get output in LaTeX (see the 1979
entry). A simple cut-and-paste from the Mathematica window provides the LaTeX
source for the answer
\[
f(x) = -\frac{1}{x^2} - \frac{5}{4(x - 1)^2} + \frac{1}{2(x - 1)^3} - \frac{2}{x} - \frac{1}{8(x + 1)} + \frac{17}{8(x - 1)}.
\]
Integrals are also easily conquered (see the 1968 entry for information about the
Risch algorithm for symbolic integration). The command
Integrate[Exp[-x^2], {x, -Infinity, Infinity}]
yields the answer Sqrt[Pi] and hence tells us that
\[
\int_{-\infty}^{\infty} e^{-x^2}\, dx = \sqrt{\pi}.
\]
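A symbolic answer like this can be sanity-checked numerically. The following Python sketch is an assumption-laden stand-in for what a computer algebra system does internally: it simply applies the trapezoid rule on [−10, 10], outside of which the integrand is negligibly small. The name `gaussian_integral` is ours.

```python
import math

def gaussian_integral(a: float = 10.0, steps: int = 2000) -> float:
    """Trapezoid-rule approximation to the integral of exp(-x^2) over [-a, a]."""
    h = 2 * a / steps
    total = math.exp(-a * a)  # the two endpoint terms, each weighted by 1/2
    for i in range(1, steps):
        x = -a + i * h
        total += math.exp(-x * x)
    return total * h

if __name__ == "__main__":
    print(gaussian_integral(), math.sqrt(math.pi))
```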
Of course, Mathematica is not only the backbone of Wolfram Alpha and the
hidden savior of calculus students everywhere. It has long been used for serious
mathematical research. Both authors have used Mathematica computations in their
own research, particularly in number theory, linear algebra, complex analysis, and
statistics. Its flexibility is remarkable:
It is often said that the release of Mathematica marked the beginning of
modern technical computing. Ever since the 1960s individual packages
had existed for specific numerical, algebraic, graphical and other tasks.
But the visionary concept of Mathematica was to create once and for
all a single system that could handle all the various aspects of technical
computing in a coherent and unified way. The key intellectual advance
that made this possible was the invention of a new kind of symbolic
computer language that could for the first time manipulate the very
wide range of objects involved in technical computing using only a
fairly small number of basic primitives. [12]
It does not take much imagination to see how computing software could be
useful in applied mathematics or statistics research. How can Mathematica be
used in pure mathematics research? Suppose that we wanted to explore the prime
numbers. A first step might be to examine the first 100 of them. The command
Table[Prime[n], {n, 1, 100}]
produces the output
{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61,
67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137,
139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211,
223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283,
293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379,
383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461,
463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541}
from which we can observe several patterns. Aside from the anomalous primes 2
and 5, each prime number appears to end in 1, 3, 7, or 9. This is explained easily
enough: numbers that end in 0, 2, 4, 5, 6, 8 are divisible by 2 or by 5. Do the primes
have a favorite, among the final digits 1, 3, 7, and 9? The command
Table[Length[
Select[Prime[Range[1000000]],
Mod[#, 10] == {1, 3, 7, 9}[[i]] &]], {i, 1, 4}]
produces
{249934, 250110, 250014, 249940}
This tells us that among the first 1,000,000 primes, there are 249,934 that end in
1, 250,110 that end in 3, and so forth. The split looks remarkably even.
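The same experiment can be run without Mathematica. Here is a rough Python equivalent, offered as a sketch: the helper names `first_primes` and `last_digit_counts` are our own, and the sieve bound uses the classical estimate $p_n < n(\log n + \log\log n)$ for n ≥ 6.

```python
import math
from collections import Counter

def first_primes(count: int) -> list[int]:
    """Return the first `count` primes using a sieve of Eratosthenes."""
    if count < 6:
        limit = 15  # the first six primes are all below 15
    else:
        limit = int(count * (math.log(count) + math.log(math.log(count)))) + 1
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if sieve[n]][:count]

def last_digit_counts(count: int) -> list[int]:
    """Among the first `count` primes, count those ending in 1, 3, 7, and 9."""
    tally = Counter(p % 10 for p in first_primes(count))
    return [tally[d] for d in (1, 3, 7, 9)]

if __name__ == "__main__":
    # Increase the argument to 1_000_000 to mirror the Mathematica command above.
    print(last_digit_counts(100_000))
```

The four counts always add up to two less than the number of primes examined, since 2 and 5 are the only primes that end in other digits.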
Trying a few different bases would reveal similarly equitable splits. From this, it
is a short step to conjecturing Dirichlet’s theorem on primes in arithmetic progres-
sions (see the 1913 entry). A more detailed analysis could also reveal Chebyshev’s
bias (there are usually slightly more primes of the form 4k + 3 than 4k + 1 up to
a given threshold) [2, 7], or perhaps even the recent observation of Robert Lemke
Oliver and Kannan Soundararajan (1973– ) that the primes have some definite
thoughts about who they sit next to [3, 5]:
Lemke Oliver and Soundararajan saw that in the first billion primes,
a 1 is followed by a 1 about 18% of the time, by a 3 or a 7 each 30% of
the time, and by a 9 22% of the time. They found similar results when
they started with primes that ended in 3, 7 or 9: variation, but with
repeated last digits the least common. The bias persists but slowly
decreases as numbers get larger. [4]
Simply put, computational power can reveal hidden patterns in classical objects
that could not be guessed at otherwise. This can lead to new conjectures about the
observed behavior and eventually new theorems. See the comments for an example
of this method of discovery.
Computer algebra systems can be used to produce startling identities that can
later be verified. This is similar to proofs by induction: you know the answer already
and need only to justify it. As a great example, the Mathematica commands
Sum[Binomial[n, k]^2, {k, 0, n}]
Sum[k Binomial[n, k]^2, {k, 0, n}]
Sum[k^2 Binomial[n, k]^2, {k, 0, n}]
Sum[k^3 Binomial[n, k]^2, {k, 0, n}]
yield the outputs
Binomial[2 n, n]
1/2 n Binomial[2 n, n]
n^2 Binomial[-2 + 2 n, -1 + n]
1/2 n^2 (1 + n) Binomial[-2 + 2 n, -1 + n]
These are the identities
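\[
\begin{aligned}
\sum_{k=0}^{n} \binom{n}{k}^2 &= \binom{2n}{n}, \\
\sum_{k=0}^{n} k \binom{n}{k}^2 &= \frac{1}{2}\, n \binom{2n}{n}, \\
\sum_{k=0}^{n} k^2 \binom{n}{k}^2 &= n^2 \binom{2n - 2}{n - 1}, \\
\sum_{k=0}^{n} k^3 \binom{n}{k}^2 &= \frac{1}{2}\, n^2 (n + 1) \binom{2n - 2}{n - 1}.
\end{aligned}
\]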

Can you prove the identities above by induction? Can you prove them combi-
natorially? Can you prove them using generating functions? Can you find closed
forms for
\[
\sum_{k=0}^{n} k^4 \binom{n}{k}^2
\]
and
\[
\sum_{k=0}^{n} k^5 \binom{n}{k}^2 ?
\]
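Before hunting for proofs, it is reassuring to test the four identities numerically, and a table of the next two sums is a natural starting point for guessing closed forms. The Python sketch below is our own illustration (the names `weighted_sum` and `identities_hold` are hypothetical); it uses exact integer arithmetic throughout.

```python
from math import comb

def weighted_sum(n: int, r: int) -> int:
    """Sum of k^r * C(n, k)^2 over 0 <= k <= n."""
    return sum(k**r * comb(n, k) ** 2 for k in range(n + 1))

def identities_hold(n: int) -> bool:
    """Check the four identities above for one n >= 1 (cleared of fractions)."""
    central = comb(2 * n, n)
    shifted = comb(2 * n - 2, n - 1)
    return (weighted_sum(n, 0) == central
            and 2 * weighted_sum(n, 1) == n * central
            and weighted_sum(n, 2) == n**2 * shifted
            and 2 * weighted_sum(n, 3) == n**2 * (n + 1) * shifted)

if __name__ == "__main__":
    print(all(identities_hold(n) for n in range(1, 50)))
    # Raw data for the two open-ended questions (r = 4 and r = 5).
    for n in range(1, 8):
        print(n, weighted_sum(n, 4), weighted_sum(n, 5))
```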
1988: Comments
Wilf–Zeilberger algorithm. We have seen that computers can spit out novel
identities. It would be better if they could provide humanly understandable proofs
of those identities too. In 1990, Herbert Wilf and Doron Zeilberger (1950– ) came
up with an algorithm to do just that [6, 8, 9].
Twin primes and their biases. Here is a true story about how a few Math-
ematica computations led to a new discovery. Recall that a primitive root modulo
n is a generator of the multiplicative group $(\mathbb{Z}/n\mathbb{Z})^{\times}$. For example, 2 is a primitive
root modulo 5 since $2^1, 2^2, 2^3, 2^4 \equiv 2, 4, 3, 1 \pmod{5}$, respectively. If p is prime,
then a theorem of Gauss ensures that $(\mathbb{Z}/p\mathbb{Z})^{\times}$ has exactly φ(p − 1) primitive roots,
in which φ denotes the Euler totient function (see the 1977 entry).
One day last year, in Professor Stephan Garcia’s Number Theory and
Cryptography class, the lesson took a surprising turn. To make a point
about the use of seemingly random patterns in cryptography, Garcia
had just flashed onto the screen a chart of the first 100 [actually 20]
prime numbers and all of their primitive roots. . . . [10]
Needless to say, the chart (Table 1) was produced by Mathematica. The command
PrimitiveRootList[p] provides a list of the primitive roots of p.
Looking at the chart, Elvis Kahoro ’20 noticed something interesting
about pairs of primes known as “twins”—primes that differ by exactly
two, such as 29 and 31 [apart from 3 and 5]. The smaller of the pair
always seemed to have as many or more primitive roots than the larger
of the two. He wondered if that was always true.
“So I just asked what I thought was a random question,” Kahoro
recalls. It was the kind of curious question he was known for asking all
through his school years, sometimes with unfortunate results. “Some
teachers would get mad at me for asking so many questions that led
us off the topic,” he remembers.
But Garcia took the first-year student’s question seriously. And
the next day, the professor called Kahoro to his office, where he’d been
doing some number-crunching on his computer [with Mathematica].
“It turns out that Elvis’s conjecture is false, but in an astound-
ingly interesting way,” Garcia explains. “There are only two counter-
examples below 10,000. And bigger number-crunching indicated that
his conjecture seemed to be correct 98 percent of the time [see Figure
1].”
Garcia and a frequent collaborator, Florian Luca, then found a
theoretical explanation for the phenomenon, resulting in a paper titled
“Primitive root bias for twin primes [1],” to be published in the journal
Experimental Mathematics, with Kahoro listed as a co-author.
“What I’ve taken away from this,” Kahoro says, “is never to be
afraid to ask questions in class, because you never know where they’ll
lead.” [10]
Table 1. Lists of primitive roots for the first 20 primes.

 p    primitive roots modulo p
 2    1
 3    2
 5    2, 3
 7    3, 5
11    2, 6, 7, 8
13    2, 6, 7, 11
17    3, 5, 6, 7, 10, 11, 12, 14
19    2, 3, 10, 13, 14, 15
23    5, 7, 10, 11, 14, 15, 17, 19, 20, 21
29    2, 3, 8, 10, 11, 14, 15, 18, 19, 21, 26, 27
31    3, 11, 12, 13, 17, 21, 22, 24
37    2, 5, 13, 15, 17, 18, 19, 20, 22, 24, 32, 35
41    6, 7, 11, 12, 13, 15, 17, 19, 22, 24, 26, 28, 29, 30, 34, 35
43    3, 5, 12, 18, 19, 20, 26, 28, 29, 30, 33, 34
47    5, 10, 11, 13, 15, 19, 20, 22, 23, 26, 29, 30, 31, 33, 35, 38, 39, 40, 41, 43, 44, 45
53    2, 3, 5, 8, 12, 14, 18, 19, 20, 21, 22, 26, 27, 31, 32, 33, 34, 35, 39, 41, 45, 48, 50, 51
59    2, 6, 8, 10, 11, 13, 14, 18, 23, 24, 30, 31, 32, 33, 34, 37, 38, 39, 40, 42, 43, 44, 47, 50, 52, 54, 55, 56
61    2, 6, 7, 10, 17, 18, 26, 30, 31, 35, 43, 44, 51, 54, 55, 59
67    2, 7, 11, 12, 13, 18, 20, 28, 31, 32, 34, 41, 44, 46, 48, 50, 51, 57, 61, 63
71    7, 11, 13, 21, 22, 28, 31, 33, 35, 42, 44, 47, 52, 53, 55, 56, 59, 61, 62, 63, 65, 67, 68, 69
Figure 1. The horizontal axis denotes the number of twin primes.
The vertical axis is the ratio of twin prime pairs (p, p+2) for which
p has more primitive roots than p + 2. The ratio hangs stubbornly
near 98%.
For more information about twin primes, see the 1919 and 1923 entries.
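The experiment in this story can be repeated in any language. Below is a small Python sketch, written for illustration only: the function names are our own stand-ins (Python has no built-in analogue of PrimitiveRootList), everything is done by trial division, and the twin prime comparison leans on the count φ(p − 1) of primitive roots mentioned above instead of listing the roots.

```python
def prime_factors(n: int) -> list[int]:
    """Distinct prime factors of n >= 1, by trial division."""
    factors, d = [], 2
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def is_prime(n: int) -> bool:
    return n >= 2 and prime_factors(n) == [n]

def totient(n: int) -> int:
    """Euler's totient function."""
    result = n
    for q in prime_factors(n):
        result -= result // q
    return result

def primitive_roots(p: int) -> list[int]:
    """Primitive roots modulo a prime p: g qualifies if g^((p-1)/q) != 1 for each prime q | p - 1."""
    qs = prime_factors(p - 1)
    return [g for g in range(1, p) if all(pow(g, (p - 1) // q, p) != 1 for q in qs)]

def twin_counterexamples(limit: int) -> list[tuple[int, int]]:
    """Twin primes (p, p + 2) with 5 <= p and p + 2 < limit where p has FEWER primitive roots."""
    return [(p, p + 2) for p in range(5, limit - 2)
            if is_prime(p) and is_prime(p + 2) and totient(p - 1) < totient(p + 1)]

if __name__ == "__main__":
    print(primitive_roots(29), primitive_roots(31))
    print(twin_counterexamples(10_000))
```

The first line of output can be compared against Table 1; the second lists the twin prime pairs below 10,000 that violate the conjectured pattern.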
Bibliography
[1] S. R. Garcia, F. Luca, and E. Kahoro, Primitive root bias for twin primes, Experimental
Mathematics, in press. https://www.tandfonline.com/doi/full/10.1080/10586458.2017.1360809.
[2] A. Granville and G. Martin, Prime number races, Amer. Math. Monthly 113 (2006), no. 1,
1–33, DOI 10.2307/27641834. MR2202918
[3] E. Klarreich, Mathematicians Discover Prime Conspiracy,
https://www.quantamagazine.org/mathematicians-discover-prime-conspiracy-20160313.
[4] E. Lamb, Peculiar pattern found in ‘random’ prime numbers: last digits of nearby primes have
‘anti-sameness’ bias, Nature (online),
https://www.nature.com/news/peculiar-pattern-found-in-random-prime-numbers-1.19550.
[5] R. J. Lemke Oliver and K. Soundararajan, Unexpected biases in the distribution of
consecutive primes, Proc. Natl. Acad. Sci. USA 113 (2016), no. 31, E4446–E4454,
DOI 10.1073/pnas.1605366113. http://www.pnas.org/content/pnas/113/31/E4446.full.pdf.
MR3624386
[6] P. Paule and M. Schorn, A Mathematica Version of Zeilberger’s Algorithm for Proving
Binomial Coefficient Identities, J. Symbolic Computation 11 (1994), 1–25.
[7] M. Rubinstein and P. Sarnak, Chebyshev’s bias, Experiment. Math. 3 (1994), no. 3, 173–197.
MR1329368
[8] H. S. Wilf, Computer programs from the book “A = B”, and related programs,
https://www.math.upenn.edu/~wilf/progs.html.
[9] H. S. Wilf and D. Zeilberger, Towards computerized proofs of identities, Bull. Amer. Math.
Soc. (N.S.) 23 (1990), no. 1, 77–83. https://projecteuclid.org/euclid.bams/1183555718.
[10] Staff writer, How to Advance Mathematics By Asking the Right Questions, Pomona College
Magazine, Spring 2018, 20-21.
http://magazine.pomona.edu/2018/spring/how-to-advance-mathematics-by-asking-the-right-questions/.
[11] Wolfram, Wolfram Demonstrations Project, http://demonstrations.wolfram.com/.
[12] Wolfram, The Mathematica Book (Mathematica 5 Documentation, 2003),
http://reference.wolfram.com/legacy/v5/TheMathematicaBook/FrontMatter/0.2.1.html
[13] Wikipedia, Wolfram Mathematica, https://en.wikipedia.org/wiki/Wolfram_Mathematica.
1989
PROMYS
Introduction
In 1989, David Fried and Glenn H. Stevens (1953– ), graduates of Arnold Ross’s
Secondary Science Training Program (see the 1957 entry), cofounded PROMYS
(Programs in Mathematics for Young Scientists). Since then, over 1,000 students
have gone through the program. Currently about 80 high school students each
year come to Boston University for six weeks of challenging mathematics. They
are mentored by top graduate students and faculty drawn from all over the world.
Programs like PROMYS play a key role in exciting students to pursue mathemat-
ics and teaching older students how to mentor, design classes, and develop research
programs. In addition to standard classes and challenging problems, students par-
ticipate in research and attend advanced lectures on topics ranging from “The
Schoenflies Conjecture and Morse Theory” to “Statistical Inference and Modeling
the Unseen: How Bayesian statistics powers Google’s voice search.”
The second named author spoke at PROMYS several times. In 2009, he gave a
talk on heuristics and ballpark estimates. Informal argumentation is an important
skill for aspiring mathematicians to develop. The centennial problem for this year
concerns an application of heuristic reasoning to an old problem in number theory.
The Fermat numbers are defined by
\[
F_n = 2^{2^n} + 1.
\]
The first several of these are
\[
F_0 = 3, \quad F_1 = 5, \quad F_2 = 17, \quad F_3 = 257, \quad F_4 = 65{,}537. \tag{1989.1}
\]
Notice a pattern? The first three are prime, and a little work shows that F3 and
F4 are prime too. What about F5 = 4,294,967,297? Is it a Fermat prime as
well? The Fermat numbers grow so rapidly that things soon get beyond the realm
of computation. For example, F10 has 309 digits! Pierre de Fermat conjectured
that each Fn is prime, although he was unable to prove this. What does heuristic
reasoning suggest?

Give a heuristic argument for or against the existence of infinitely many Fermat
primes. Does your prediction agree or disagree with the numerical evidence?
1989: Comments
Why the weird exponent? Some authors consider $2 = 2^0 + 1$ a Fermat
prime because of their preference for the formula $2^n + 1$ [2]. However, this is not
widely adhered to. Why is it that we search for primes of the form $2^{2^n} + 1$ instead
of $2^n + 1$? We start with the identity
\[
x^n - 1 = (x - 1)(x^{n-1} + x^{n-2} + \cdots + x + 1), \tag{1989.2}
\]
which can be confirmed by induction. Then replace x with x/y, multiply by $y^n$,
and obtain
\[
x^n - y^n = (x - y)(x^{n-1} + x^{n-2} y + \cdots + x y^{n-2} + y^{n-1}).
\]
If n is odd, then we set y = −1 and get
\[
x^n + 1 = (x + 1)(x^{n-1} - x^{n-2} + \cdots - x + 1).
\]
Suppose that $2^m + 1$ is prime, in which $m = 2^k n$ and n is odd. If $x = 2^{2^k}$, then
\[
\begin{aligned}
2^m + 1 &= 2^{2^k n} + 1 \\
&= x^n + 1 \\
&= (x + 1)(x^{n-1} - x^{n-2} + \cdots - x + 1) \\
&= (2^{2^k} + 1)\bigl(2^{2^k (n-1)} - 2^{2^k (n-2)} + \cdots - 2^{2^k} + 1\bigr).
\end{aligned}
\]
The factor $2^{2^k} + 1$ is definitely larger than 1 and it is smaller than $2^m + 1$, unless
n = 1 (in which case the second factor is 1). Thus, the exponent m must be a
power of 2 in order for $2^m + 1$ to be prime.
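The argument is constructive: it names an explicit divisor of $2^m + 1$. A few lines of Python make this concrete. This is a sketch of our own, and the function name `fermat_type_factor` is hypothetical.

```python
def fermat_type_factor(m: int) -> int:
    """For m = 2^k * n with n odd, return the divisor 2^(2^k) + 1 of 2^m + 1."""
    k = (m & -m).bit_length() - 1  # exponent of the largest power of 2 dividing m
    return 2 ** (2 ** k) + 1

if __name__ == "__main__":
    for m in range(1, 13):
        N, d = 2**m + 1, fermat_type_factor(m)
        # The factor is proper exactly when m is not a power of 2.
        print(m, N, d, N // d, "proper" if d < N else "trivial")
```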
Fermat’s conjecture. The state of Fermat’s conjecture is so well known that
we can hardly keep things a secret: Fermat was wrong. Moreover, the heuristic
argument discussed below suggests that he was way off base. In 1732, Leonhard
Euler disproved Fermat’s conjecture when he computed the prime factorization
\[
F_5 = 4{,}294{,}967{,}297 = 641 \times 6{,}700{,}417.
\]
Although this was an impressive computational feat at the time, a modern computer
factors F5 faster than the blink of an eye. The prime factorizations of the Fermat
numbers seem to involve some large primes (this is partially explained by the Euler–
Lucas theorem described at the end of the following section). For example, a few
seconds on a desktop computer reveals the prime factorizations
\[
\begin{aligned}
F_6 &= 274177 \times 67280421310721, \\
F_7 &= 59649589127497217 \times 5704689200685129054721, \\
F_8 &= 1238926361552897 \\
&\quad \times 93461639715357977769163558199606896584051237541638188580280321.
\end{aligned}
\]
Prime factorizations are known for only a few more Fermat numbers. No Fermat
primes besides the original five (1989.1) have been found.
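Finding these factors is hard; confirming them is not. The Python sketch below (our own illustration, with the hypothetical names `FACTORIZATIONS` and `check`) multiplies the factors back together with exact integer arithmetic. It also tests the congruence from the Euler–Lucas theorem that comes up later in these comments: each factor of $F_n$ listed here should leave remainder 1 upon division by $2^{n+2}$.

```python
# Factorizations of F_5, ..., F_8 as quoted in the text.
FACTORIZATIONS = {
    5: (641, 6700417),
    6: (274177, 67280421310721),
    7: (59649589127497217, 5704689200685129054721),
    8: (1238926361552897,
        93461639715357977769163558199606896584051237541638188580280321),
}

def fermat(n: int) -> int:
    return 2 ** (2 ** n) + 1

def check(n: int) -> bool:
    """True if the listed factors multiply to F_n and each is 1 modulo 2^(n+2)."""
    p, q = FACTORIZATIONS[n]
    return p * q == fermat(n) and all((r - 1) % 2 ** (n + 2) == 0 for r in (p, q))

if __name__ == "__main__":
    for n in FACTORIZATIONS:
        print(n, check(n))
```

Note that this only confirms the products; it does not by itself show that the individual factors are prime.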
Euclid’s theorem revisited. We can use the Fermat numbers to provide
another proof of the infinitude of the primes [1, Ch. 1]. Begin by observing that
\[
\begin{aligned}
F_n - 2 &= (2^{2^n} + 1) - 2 = 2^{2^n} - 1 \\
&= (2^{2^{n-1}} - 1)(2^{2^{n-1}} + 1) \\
&= (2^{2^{n-1}} - 1) F_{n-1} \\
&= (2^{2^{n-2}} - 1)(2^{2^{n-2}} + 1) F_{n-1} \\
&= (2^{2^{n-2}} - 1) F_{n-2} F_{n-1} \\
&\;\;\vdots \\
&= F_0 F_1 \cdots F_{n-1}.
\end{aligned}
\]
In light of this, $F_m$ divides $F_n - 2$ whenever m < n. Consequently, any common
divisor of $F_m$ and $F_n$ divides
\[
F_n - \underbrace{F_0 F_1 \cdots F_{n-1}}_{\text{divisible by } F_m} = 2.
\]
Since Fermat numbers are odd, the preceding tells us that $\gcd(F_m, F_n) = 1$. Thus,
the Fermat numbers $F_0, F_1, F_2, \ldots$ are pairwise relatively prime and hence their
prime factorizations yield infinitely many distinct primes. In fact, this proves that
there are at least n primes at most $2^{2^n} + 1$.
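Both the product formula and the coprimality claim can be spot-checked in a few lines. The following Python sketch is ours (the names `fermat`, `product_identity_holds`, and `pairwise_coprime` are made up for the example); the range of n is kept small only because $F_n$ doubles in length at every step.

```python
from math import gcd, prod

def fermat(n: int) -> int:
    return 2 ** (2 ** n) + 1

def product_identity_holds(n: int) -> bool:
    """Check that F_0 F_1 ... F_{n-1} = F_n - 2."""
    return prod(fermat(i) for i in range(n)) == fermat(n) - 2

def pairwise_coprime(limit: int) -> bool:
    """Check gcd(F_m, F_n) = 1 for all 0 <= m < n < limit."""
    return all(gcd(fermat(m), fermat(n)) == 1
               for n in range(limit) for m in range(n))

if __name__ == "__main__":
    print(all(product_identity_holds(n) for n in range(1, 12)))
    print(pairwise_coprime(12))
```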
The ordered list of prime factors of the Fermat numbers begins with
3, 5, 17, 257, 641, 65537, 114689, 274177, 319489, 974849, 2424833,
6700417, 13631489, 26017793, 45592577, 63766529, 167772161,
825753601, 1214251009, 6487031809, 70525124609, 190274191361,
646730219521, 2710954639361, 2748779069441, 4485296422913,
6597069766657,
according to [3]. How can we be sure of this? What if a large Fermat number is
divisible by a small prime? A result of Euler, later improved by Édouard Lucas
(1842–1891), asserts that every prime factor of $F_n$ is of the form $k \cdot 2^{n+2} + 1$ when
n ≥ 2. Thus, the size of the smallest prime factor of $F_n$ tends to increase rapidly with n. For
example, we can be sure that no Fermat number has a prime factor strictly between
257 and 641 since we have the prime factorizations of all $F_n$ for n = 0, 1, 2, . . . , 11.
Constructible polygons. The ancient Greeks developed methods to construct
regular (equilateral and equiangular) n-gons with straightedge and compass
for any n ≥ 3 of the form $2^i 3^j 5^k$, in which i ≥ 0 and j, k ∈ {0, 1}. Thus, they could
construct regular n-gons for n = 3, 4, 5, 6, 8, 10, 12, 15, 16, . . . . Can all regular n-gons
be constructed by straightedge and compass? This question vexed mathematicians
for two thousand years.
In 1796, at the age of nineteen, Carl Friedrich Gauss shocked the mathematical
world when he proved that the regular 17-gon was constructible. Folklore holds
that Gauss wanted the heptadecagon inscribed on his tombstone, although this was
regrettably not carried out. Gauss provided the first new constructible regular n-gon
since ancient times, a remarkable feat. Moreover, he also provided sufficient
conditions for the constructibility of a regular n-gon. The constructibility of the
heptadecagon boils down to the fact that
\[
16 \cos\frac{2\pi}{17} = -1 + \sqrt{17} + \sqrt{34 - 2\sqrt{17}} + 2\sqrt{17 + 3\sqrt{17} - \sqrt{170 + 38\sqrt{17}}}
\]
is expressible in terms of integers and square roots [7]. From a Cartesian perspec-
tive, straightedge and compass constructions involve finding the intersections of
lines or circles with other lines or circles in $\mathbb{R}^2$. Thus, one only considers systems
of equations of degree one or two; this leads to expressions that involve rational
numbers and nested square roots.
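A nested radical like the one for 16 cos(2π/17) is easy to mistype, so a floating-point comparison is a worthwhile safeguard. The Python sketch below is our own check (the function names are hypothetical); agreement to machine precision is evidence for the formula, not a proof of it.

```python
import math

def cosine_side() -> float:
    """Left-hand side: 16 cos(2 pi / 17)."""
    return 16 * math.cos(2 * math.pi / 17)

def radical_side() -> float:
    """Right-hand side: the expression built from integers and square roots."""
    s = math.sqrt(17)
    return (-1 + s + math.sqrt(34 - 2 * s)
            + 2 * math.sqrt(17 + 3 * s - math.sqrt(170 + 38 * s)))

if __name__ == "__main__":
    print(cosine_side(), radical_side())
```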
In 1837, Pierre Wantzel (1814–1848) completed the proof of what is now known
as the Gauss–Wantzel theorem. It states that for n ≥ 3, a regular n-gon is constructible
with straightedge and compass if and only if $n = 2^k p_1 p_2 \cdots p_r$, in which
k ≥ 0 and $p_1, p_2, \ldots, p_r$ are distinct Fermat primes (either type of factor may be
absent). Consequently, the regular 7- and 9-gons are nonconstructible, whereas the
regular 10- and 17-gons are constructible; see Figure 1.
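The Gauss–Wantzel criterion translates directly into a test. Here is a minimal Python sketch under one explicit assumption: it treats 3, 5, 17, 257, and 65537 as the complete list of Fermat primes, which is only known to be the list of Fermat primes found so far. The names `FERMAT_PRIMES` and `is_constructible` are ours.

```python
# Assumption: the five known Fermat primes are the only ones we need to consider.
FERMAT_PRIMES = (3, 5, 17, 257, 65537)

def is_constructible(n: int) -> bool:
    """Gauss-Wantzel test: n >= 3 equals a power of 2 times distinct Fermat primes."""
    if n < 3:
        return False
    while n % 2 == 0:  # strip the power of 2
        n //= 2
    for p in FERMAT_PRIMES:
        if n % p == 0:
            n //= p  # each Fermat prime may appear at most once
    return n == 1

if __name__ == "__main__":
    print([n for n in range(3, 101) if is_constructible(n)])
```

For instance, n = 9 fails because the Fermat prime 3 would have to be used twice, while n = 15 = 3 · 5 passes.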
Heuristic argument. The prime number theorem asserts that the number
of primes at most x is roughly x/ log x. Thus, the density of primes at most x is
about 1/ log x. We therefore model the primes as a random process, in which the
probability that a natural number n is prime is 1/ log n (see the comments for the
1987 entry). Consider the random variable

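\[
X_n =
\begin{cases}
1 & \text{if } n \text{ is prime}, \\
0 & \text{otherwise.}
\end{cases}
\]
The expected number of Fermat primes is
\[
E[X_{F_0} + \cdots + X_{F_N}] = E[X_{F_0}] + \cdots + E[X_{F_N}]
\]
by the linearity of expectation. Since
\[
E[X_{F_n}] = \frac{1}{\log F_n} = \frac{1}{\log(2^{2^n} + 1)} \le \frac{1}{2^n \log 2},
\]
the expected number of Fermat primes is at most
\[
\frac{1}{\log 2} \sum_{n=0}^{\infty} \frac{1}{2^n} = \frac{2}{\log 2} \approx 2.89 < 3. \tag{1989.3}
\]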
Thus, we expect that there are only finitely many Fermat primes. A more sophis-
ticated argument comes to the same conclusion [2].
Our estimate is reasonably close to the presently observed number (five). What
causes the discrepancy? First of all, this is a heuristic argument that proves nothing:
our model could be completely wrong. However, well-composed heuristic arguments
often do point us in the right direction (see the 1987 entry). A more likely culprit is
the bias introduced by small primes. The largest contributions to the sum (1989.3)
come from the smallest Fermat numbers. In this range, the large-scale predictions
afforded by the prime number theorem are swamped by small-scale fluctuations.
For example, the prime number theorem predicts that there are 2/ log 2 ≈ 2.89
primes at most 2, which is absurd.
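The upper bound (1989.3) replaces each term 1/ log $F_n$ by something slightly larger. It is instructive to sum the terms themselves. The Python sketch below is our own illustration (the name `expected_fermat_primes` is hypothetical); it truncates the series after 20 terms, which is harmless because the terms shrink by a factor of about 2 at each step.

```python
import math

def expected_fermat_primes(terms: int = 20) -> float:
    """Partial sum of 1 / log(F_n) for n = 0, ..., terms - 1, where F_n = 2^(2^n) + 1."""
    # math.log accepts arbitrarily large Python integers, so F_n is used exactly.
    return sum(1 / math.log(2 ** (2 ** n) + 1) for n in range(terms))

if __name__ == "__main__":
    print(expected_fermat_primes(), 2 / math.log(2))
```

The heuristic count comes out somewhat below the bound 2/ log 2, so the comparison with the five known Fermat primes is, if anything, slightly less flattering than (1989.3) suggests.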
Figure 1. Regular n-gons: (a) n = 7 (nonconstructible), (b) n = 9 (nonconstructible),
(c) n = 10 (constructible), (d) n = 17 (constructible). The regular n-gon with n ≥ 3 is
constructible with straightedge and compass if and only if $n = 2^k p_1 p_2 \cdots p_r$, in which
k ≥ 0 and the $p_1, p_2, \ldots, p_r$ are distinct Fermat primes (either type
of factor may be absent).
Bibliography
[1] M. Aigner and G. M. Ziegler, Proofs from The Book, 4th ed., Springer-Verlag, Berlin, 2010.
MR2569612
[2] K. D. Boklan and J. H. Conway, Expect at most one billionth of a new Fermat prime!, Math.
Intelligencer 39 (2017), no. 1, 3–5, DOI 10.1007/s00283-016-9644-3.
https://arxiv.org/pdf/1605.01371.pdf. MR3620166
[3] The On-Line Encyclopedia of Integer Sequences, A023394 (Prime factors of Fermat numbers),
http://oeis.org/A023394.
[5] PROMYS, PROMYS: Program in Mathematics for Young Scientists, http://www.promys.org/.
[6] Wikipedia, Constructible polygon, https://en.wikipedia.org/wiki/Constructible_polygon.
[7] Wikipedia, Heptadecagon, https://en.wikipedia.org/wiki/Heptadecagon.
1990
The Monty Hall Problem
Introduction
Although it rose to national prominence after Marilyn vos Savant (1946– )
presented it in a 1990 Parade magazine column [3], the famed Monty Hall problem
first appeared in 1975 when it was posed by Steve Selvin (1941– ) in The American
Statistician [4]. His presentation also explains the origin of the problem’s name:
It is “Let’s Make a Deal”, a famous TV show starring Monte Hall.¹
Monte Hall: One of the three boxes labeled A, B, and C contains
the keys to that new 1975 Lincoln Continental. The other two are
empty. If you choose the box containing the keys, you win the car.
Contestant: Gasp!
Monte Hall: Select one of these boxes.
Contestant: I’ll take box B.
Monte Hall: Now box A and box C are on the table and here is box
B (contestant grips box B tightly). It is possible the car keys are in
that box! I’ll give you $100 for the box.
Contestant: No, thank you.
Monte Hall: How about $200?
Contestant: No!
Audience: No!!
Monte Hall: Remember that the probability of your box containing
the keys to the car is 1/3 and the probability of your box being empty
is 2/3. I’ll give you $500.
Audience: No!!
Contestant: No, I think I’ll keep this box.
Monte Hall: I’ll do you a favor and open one of the remaining boxes
on the table (he opens box A). It’s empty! (Audience: applause). Now
either box C or your box B contains the car keys. Since there are two
boxes left, the probability of your box containing the keys is now 1/2.
I’ll give you $1000 cash for your box.
WAIT!!!!
Is Monte right? The contestant knows that at least one of the
boxes on the table is empty. He now knows it was box A. Does this
knowledge change his probability of having the box containing the keys
¹ Monty Hall was the stage name of Monte Halparin (1921–2017). Although Selvin’s puzzle is
universally referred to as the “Monty Hall problem,” it is interesting to note that Selvin (perhaps
unintentionally) spelled the stage name “Monty” as “Monte,” which is the host’s actual first name.
Figure 1. The contestant is presented with three doors. Behind
one of them is a valuable prize. The other two doors conceal nothing
of value. The contestant selects a door, say 1. The host opens
another door, say 2, and shows that it does not conceal the prize.
Thus, the prize is either behind door 1 or 3. Is the contestant
better off switching from door 1 to door 3?
from 1/3 to 1/2? One of the boxes on the table has to be empty. Has
Monte done the contestant a favor by showing him which of the two
boxes was empty? Is the probability of winning the car 1/2 or 1/3?
In most contemporary formulations of the problem, the contestant chooses one
of three doors. One door conceals a valuable prize. Behind the other two doors are
goats, which are presumed to be worthless; see Figure 1. The host opens one of the
other doors and reveals a goat. He gives the contestant the chance to switch to the
remaining door. Should the contestant switch?
How can switching doors possibly help? Each door initially has a 1/3 chance
of holding the prize. After the host opens one of the doors, we know that one of
the two remaining doors conceals the prize. Thus, the chance that either holds the
prize is 1/2. Is this correct? See the comments below for the answer!
Our problem for this year, which also appeared in 1990, is due to philosopher
Arnold Zuboff [7]. It is called the sleeping beauty problem and it is still the source
of spirited arguments. What do you think the answer is?

Proposed by Adam Elga, Princeton University.
Some researchers put you to sleep for two days. While you are asleep, they briefly
wake you up either once or twice, depending on the toss of a fair coin (heads once;
tails twice). After each waking, they put you back to sleep with a drug that makes
you forget that waking. When you are first awakened, to what degree ought you
believe that the outcome of the coin toss is heads?
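To see why the problem provokes arguments, it can help to simulate the experiment and count in two different ways. The sketch below is only an illustration under stated assumptions: the function name `simulate` and its parameters are hypothetical, and the code does not decide which of the two frequencies deserves to be called a "degree of belief". It merely reports the fraction of experiments in which the coin lands heads and the fraction of awakenings that occur in a heads experiment.

```python
import random


def simulate(trials=100_000, seed=0):
    """Run the sleeping beauty experiment many times.

    Returns a pair:
      (fraction of experiments with heads,
       fraction of awakenings that happen after heads).
    """
    rng = random.Random(seed)
    heads_experiments = 0
    heads_awakenings = 0
    total_awakenings = 0
    for _ in range(trials):
        heads = rng.random() < 0.5
        wakings = 1 if heads else 2  # heads: woken once; tails: woken twice
        total_awakenings += wakings
        if heads:
            heads_experiments += 1
            heads_awakenings += 1
    return heads_experiments / trials, heads_awakenings / total_awakenings


per_experiment, per_awakening = simulate()
print(per_experiment, per_awakening)
```

The first number comes out close to 1/2 and the second close to 1/3. Which of these the question is asking for is exactly what is disputed.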
1990: Comments
Resolution of the Monty Hall problem. One good way to build intuition
for the answer is to write a computer program and simulate millions of games.
Computational results can quickly provide evidence for or against a particular an-
swer. Without loss of generality we may assume the contestant always chooses the
first door. Here is an example of such a program in Mathematica (see the 1988
entry).2
success = 0; (* initialize number of successes to 0 *)
For[n = 1, n <= 1000000, n++, (* do one million trials *)
  (* randomly choose which door the prize is behind *)
  prizelocation = RandomInteger[{1, 3}];
  (* the contestant always picks Door 1 *)
  (* if prize is behind Door 1 host randomly opens Door 2 or Door 3 *)
  (* if prize is behind Door 2 host must open Door 3 *)
  (* if prize is behind Door 3 host must open Door 2 *)
  If[prizelocation == 1, hostopen = RandomInteger[{2, 3}]];
  If[prizelocation == 2, hostopen = 3];
  If[prizelocation == 3, hostopen = 2];
  (* switch to whatever remaining door the host did not open *)
  If[hostopen == 2, choosedoor = 3];
  If[hostopen == 3, choosedoor = 2];
  (* if prize is behind our new door, increase success counter by 1 *)
  If[choosedoor == prizelocation, success = success + 1];
];
(* we now print out the success rate *)
Print["By switching success rate is ", 100.0 * success/1000000, "%."];
What is the final result? The output after 1,000,000 trials is
By switching success rate is 66.6787%.
Although this is not a formal proof, it strongly suggests that switching doors leads
to victory two thirds of the time.
Most people arrive at the incorrect answer; they think that switching doors
should have no impact. Where did we go wrong? What were our hidden assump-
tions? Note that Monty Hall always reveals a goat; in other words, if you choose
door 1 and the prize is behind door 2, he will never open door 2 and show the prize.
Thus, the host does not have complete freedom if there is a goat behind your door;
this happens two thirds of the time. In this case, one of the other two doors hides
a goat while the other hides the prize. Since he cannot reveal the prize, he must
choose the door with the goat; if you switch, you win. If the prize is behind your
door, then the host can open either door. In this case, which happens one third of
the time, you lose if you switch.
All of this agrees with Selvin’s original case-by-case analysis of the problem [4].
He also presented a conditional probability based justification of the 2/3 answer
since many readers were not convinced by his first argument [5]. However, we
think that his original argument is convincing enough; see Table 1.
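The case-by-case analysis can also be carried out exactly rather than by simulation. The following is a minimal sketch, assuming that the keys and the contestant's choice are each uniformly distributed over the three boxes and that the host chooses uniformly when he has two empty boxes to pick from (the last assumption does not affect the overall answer). The function name `switch_win_probability` is hypothetical.

```python
from fractions import Fraction

BOXES = (1, 2, 3)


def switch_win_probability():
    """Exact probability of winning by switching, summed over all cases."""
    win = Fraction(0)
    for keys in BOXES:
        for choice in BOXES:
            # Boxes the host may open: not the chosen one, not the one with keys.
            openable = [b for b in BOXES if b != choice and b != keys]
            for opened in openable:
                # Probability of this (keys, choice, opened) scenario.
                weight = Fraction(1, 3) * Fraction(1, 3) / len(openable)
                # The only box left to switch to.
                switched = next(b for b in BOXES if b not in (choice, opened))
                if switched == keys:
                    win += weight
    return win


print(switch_win_probability())
```

Under these assumptions the function returns the fraction 2/3, in agreement with the simulation.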
2 RandomInteger[{a,b}] selects an integer uniformly at random from {a, a + 1, . . . , b}.
Table 1. The analysis of the Monty Hall problem presented by
Selvin in 1975 shows that switching doors results in victory 2/3 of
the time [4].

Keys are   Contestant    Monty Hall   Contestant      Final
in box     chooses box   opens box    switches        result
1          1             2 or 3       1 for 2 or 3    lose
1          2             3            2 for 1         win
1          3             2            3 for 1         win
2          1             3            1 for 2         win
2          2             1 or 3       2 for 1 or 3    lose
2          3             1            3 for 2         win
3          1             2            1 for 3         win
3          2             1            2 for 3         win
3          3             1 or 2       3 for 1 or 2    lose

A discussion of extreme cases can also point you in the right direction. Instead
of three doors, suppose that there are a million doors. You make your pick and the
host opens 999,998 doors in quick succession, revealing goat after goat after goat.
There are now just two doors left. Do you really want to keep your first choice or
would you want to switch? Does it seem plausible that both of the remaining doors
are equally likely to hide the prize, or does it seem clear that the host was careful
to avoid showing you the prize?
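The many-doors thought experiment is easy to test numerically. The sketch below assumes the natural generalization in which the host, who knows where the prize is, opens all but one of the unchosen doors and never reveals the prize. The names `switch_wins` and `estimate` are hypothetical helpers, not taken from any library.

```python
import random


def switch_wins(n_doors, rng):
    """Play one game with n_doors doors; return True if switching wins."""
    prize = rng.randrange(n_doors)
    choice = rng.randrange(n_doors)
    if choice != prize:
        # The host must leave the prize door closed.
        other = prize
    else:
        # The host leaves an arbitrary goat door closed.
        other = rng.randrange(n_doors - 1)
        if other >= choice:
            other += 1
    return other == prize


def estimate(n_doors, trials=100_000, seed=0):
    """Estimate the probability that switching wins with n_doors doors."""
    rng = random.Random(seed)
    return sum(switch_wins(n_doors, rng) for _ in range(trials)) / trials


print(estimate(3), estimate(1_000_000, trials=10_000))
```

With n doors, switching loses only when the first pick was correct, which happens with probability 1/n. The estimates should therefore be near 2/3 for three doors and very close to 1 for a million doors.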
See [2] for more information about the Monty Hall problem and its history.
Getting Erdős’s goat. If you guessed that switching doors would not have an
effect on the Monty Hall problem, you would be in good company. Andrew Vazsonyi
(1916–2003) relates the following anecdote about the legendary Paul Erdős (see the
1913 entry) [6].
. . . I told the problem to the late Paul Erdős, one of the most famous
mathematicians of the century, when he visited my home in 1995.
Erdős was considered by number theorists as one of the greatest experts
in probability theory. In a conversation about the use of probability
theory in decision making, I mentioned the goats and Cadillac problem
and the answer to Erdős, fully expecting us to move onto the next
subject. But, to my surprise, Erdős said, “No, that is impossible, it
should make no difference.”
Needless to say, whether it is a Cadillac or a Lincoln Continental at stake is irrele-
vant to the problem. The two argued over the problem for a while before Vazsonyi
became frustrated with the situation:
He wanted a straightforward explanation with no decision trees. I gave
up at this point, because I have no common sense explanation. . . .
So I told Erdős, “You don’t know about decision trees so you can’t
understand the solution. Put on your earphones, listen to your music,
and stop bothering me.” (When Erdős appeared in my house, the first
thing he did was unpack his radio and start listening to classical music.
The radio blasted from 5:00 am to midnight. He didn’t seem to be able
to live without it.)
An hour later Erdős came back really irritated. “What’s the mat-
ter with you? Why aren’t you telling me the reason why I should
switch?” I said that I was sorry, but I didn’t have a common sense
explanation and only the decision tree analysis convinces me.
Eventually Erdős was convinced by a numerical simulation, much like the one we
performed above.
Erdős objected that he still did not understand the reason why, but
was reluctantly convinced that I was right. A few days after he left,
he telephoned to say that Ron Graham of AT&T explained to him the
reasoning behind the answer and that now he understood. He pro-
ceeded to tell me the reasoning but I couldn’t fathom his explanation.
See the comments for the 1992 entry for more information about Ron Graham.
The ugly side of mathematics. Although Marilyn vos Savant, who claims
to have the world’s highest IQ, presented the correct answer in her column, she
received a large number of spiteful and derogatory responses [3]. Here are some of
the totally unnecessary personal attacks that were leveled against her:
Maybe women look at math problems differently than men.3
May I suggest that you obtain and refer to a standard textbook on
probability before you try to answer a question of this type again?
You blew it, and you blew it big! Since you seem to have difficulty
grasping the basic principle at work here, I’ll explain. After the host
reveals a goat, you now have a one-in-two chance of being correct.
Whether you change your selection or not, the odds are the same.
There is enough mathematical illiteracy in this country, and we don’t
need the world’s highest IQ propagating more. Shame!
Bibliography
[1] A. Elga, Self-locating belief and the sleeping beauty problem, Analysis 60 (2000), no. 2, 143–147.
http://www.princeton.edu/~adame/papers/sleeping/sleeping.pdf.
[2] J. Rosenhouse, The Monty Hall problem: The remarkable story of math’s most contentious
brainteaser, Oxford University Press, Oxford, 2009. MR2543995
[3] M. vos Savant, Game Show Problem. http://marilynvossavant.com/game-show-problem/.
[4] S. Selvin, A problem in probability, American Statistician 29 (1975), no. 1, 67.
[5] S. Selvin, On the Monty Hall problem, American Statistician 29 (1975), no. 3, 134.
[6] A. Vazsonyi, Which door has the Cadillac?, Decision Line (1999), Dec./Jan., 17–19.
[7] A. Zuboff, One self: The logic of experience, Inquiry: An Interdisciplinary Journal of Philoso-
phy 33 (1990), no. 1, 39–68.
3 This reader later doubled down and wrote back again: “I still think you’re wrong. There is
such a thing as female logic.”

1991
arXiv
Introduction
This year’s problem honors the founding of the arXiv (http://arxiv.org/) in
1991 by Paul Ginsparg (1955– ). Authors frequently post preliminary versions of
articles on the arXiv, often long before they appear in print. Scientific ideas can be
shared and disseminated in almost real time. Hundreds of new papers are added
to the collection every day.
Although the arXiv originally started as a repository for physics papers, it now
hosts over 1.5 million articles in physics, mathematics, computer science, statistics,
and other fields; see Figures 1, 2, and 3. It provides researchers all over the world
immediate access to newly written papers. The arXiv is now generously supported
by Cornell University, the Simons Foundation, and various member institutions.
Most papers that are posted to the arXiv are eventually submitted for publication
in peer-reviewed research journals. However, there are a few exceptional
cases. For example, Grigori Perelman (1966– ) posted three papers on the arXiv
in 2002–2003 that resolved the longstanding Poincaré conjecture [5–7]. Although
he never submitted these papers to journals, experts in the field were still able to
read them and verify his results. Perelman was offered the Fields Medal in 2006,
although he declined (see the 2003 entry).
Although the arXiv was once flooded with fallacious proofs of famous conjec-
tures, they appear less frequently than they did in its early days. In 2004, the
arXiv instituted an endorsement system that requires established authors to vouch
for newcomers before they are allowed to post articles. However, papers that claim
proofs of the Riemann hypothesis, the twin prime conjecture, and so forth occa-
sionally appear. If you are brave, look a few of them up. Find the mistakes in the
proofs (there is almost certainly at least one in each paper) or satisfy yourself that
there are none.

Check every day for new posts to the arXiv. Spend ten minutes each day for a
month skimming the titles of papers, the names and affiliations of the authors, and
the abstracts. Click on and read the papers that sound interesting! Keep abreast
of what is happening in your favorite subfields. Get to know who the players are
and what they study. Get a sense of what topics are popular.
Figure 1. Number of submissions posted to the arXiv month-by-month
since its inception in August 1991. Data available at
https://arxiv.org/stats/monthly_submissions.
Figure 2. Left: New arXiv submissions per year for high energy
physics, condensed matter physics, astrophysics, other physics,
mathematics, computer science, electrical engineering / systems
science, statistics, quantitative biology, and quantitative finance /
economics. Right: Fractional submission rates for each subject
area. Source: https://arxiv.org/help/stats/2017_by_area/index.
Figure 3. The cumulative data as a function of time. Source:
https://arxiv.org/help/stats/2017_by_area/index.
1991: Comments
Reinventing the wheel. While it is easy to publish material online, there is
also the need to verify that what you read is correct and to make sure that what
you are doing is truly original work. A cautionary tale concerns Mary M. Tai, a
medical researcher who published a much talked-about article in a 1994 issue of
Diabetes Care [9]. The stated objective of the article is:
To develop a mathematical model for the determination of total areas
under curves from various metabolic studies.
This is because “estimation of total areas under curves of metabolic studies has
become an increasingly popular tool for evaluating results from clinical trials as
well as research investigations.” All of this should immediately set off some red
flags, since this sounds like basic integral calculus. The main result of the paper is
the trapezoidal rule for numerical integration.
In Tai’s Model, the total area under a curve is computed by dividing
the area under the curve between two designated values on the X-
axis (abscissas) into small segments (rectangles and triangles) whose
areas can be accurately calculated from their respective geometrical
formulas. The total sum of these individual areas thus represents the
total area under the curve.
There were at least a few knowledgeable practitioners in the discipline who set
the story straight. Jane H. Monaco and Randy L. Anderson of the Department of
Public Health Sciences at the Wake Forest University medical school wrote [2]:
We were disturbed to read the article by M. M. Tai. . . . The formula
given is simply the trapezoidal rule, published in many beginning cal-
culus texts. . . it is our understanding that the trapezoidal rule was
known to Isaac Newton in the 17th century.
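For readers who have not seen it written out, the trapezoidal rule takes only a few lines of code. The sketch below is an illustration, not a reconstruction of the formula as printed in the article. The function name `trapezoid_area` is hypothetical, and the sample readings are invented purely to show the call.

```python
def trapezoid_area(xs, ys):
    """Approximate the area under a curve sampled at points (xs[i], ys[i]).

    Each pair of neighboring samples contributes the area of a trapezoid:
    width times the average of the two heights.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two samples, with matching lengths")
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )


# Hypothetical glucose-tolerance style readings (minutes, concentration).
times = [0, 30, 60, 120]
levels = [90, 150, 120, 100]
print(trapezoid_area(times, levels))  # 3600 + 4050 + 6600 = 14250.0
```

A trapezoid is a rectangle with a triangle on top, which is why a sum of "rectangles and triangles" gives the same result.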
In a response to one of her critics, Tai wrote [10]:
The concept behind it is obviously common sense, and one does not
have to consult the trapezoid rule to figure it out. The trapezoid rule
is really not Nobel Prize material. . . I never thought of publishing the
model as a great discovery or accomplishment.
Of course, this response invites the question of why it proved necessary to publish
the result at all. Tai continued to refer to “Tai’s model,” even after numerous
readers pointed out that it is the trapezoidal rule:
According to Merriam Webster’s Dictionary, a model can be defined
as “a type of design or product;” “a description used to visualize some-
thing that cannot be directly observed;” or “a system of postulates,
data, and inferences presented as a mathematical description of an en-
tity. . . .” Even if Tai’s model were based on the trapezoid rule concept,
according to the definition of a model, I have worked out a “design”
(mathematical expression) for the “structure units” (individual areas)
on my own. In other words, I have presented the original concept into a
functioning mathematical description that can be easily observed and
applied.” [10]
Needless to say, every calculus book in existence presents the trapezoidal rule in a
manner that can easily be applied!
Fashionable nonsense. Jacques Lacan (1901–1981) was an incredibly influential
psychoanalyst whose ideas have left their mark on post-structuralist philoso-
phy, critical theory, linguistics, and even film theory. His ideas are taken seriously
in many circles of the academy. Lacan’s work in applied topology speaks for itself:
This diagram [a Möbius strip] can be considered the basis of a sort
of essential inscription at the origin, in the knot which constitutes
the subject. This goes much further than you may think at first,
because you can search for the sort of surface able to receive such
inscriptions. You can perhaps see that the sphere, that old symbol
for totality, is unsuitable. A torus, a Klein bottle, a cross-cut surface,
are able to receive such a cut. And this diversity is very important
as it explains many things about the structure of mental disease. If
one can symbolize the subject by this fundamental cut, in the same
way one can show that a cut on a torus corresponds to the neurotic
subject, and on a cross-cut surface to another sort of mental disease.
[3, pp. 192–193]
On the imaginary unit i, whose square is −1, he says:
Thus, by calculating that signification according to the algebraic method
used here, namely:
\[ \frac{S\ (\text{signifier})}{s\ (\text{signified})} = s\ (\text{the statement}), \quad \text{with } S = (-1), \text{ produces } s = \sqrt{-1}. \]
. . . Thus the erectile organ comes to symbolize the place of jouissance,
not in itself, or even in the form of an image, but as a part lacking
in the desired image: that is why it is equivalent to the $\sqrt{-1}$ of the
signification produced above, of the jouissance that it restores by the
coefficient of its statement to the function of the lack of signifier −1.
[4, pp. 316–320]
This work was addressed by physicist Alan Sokal (1955– ) in a famous 1996 hoax;
see the 1996 entry and the book Fashionable Nonsense: Postmodern Intellectuals’
Abuse of Science [8].
Bibliography
[1] arXiv preprint server, http://arxiv.org/.
[2] J. H. Monaco and R. L. Anderson, Tai’s formula is the trapezoidal rule, Diabetes Care
17 (1994), no. 10, 1224.
[3] J. Lacan, Of structure as an inmixing of an otherness prerequisite to any subject whatever,
The Languages of Criticism and the Sciences of Man (edited by R. Macksey and E. Donato),
Johns Hopkins Press, 1970, 186–200.
[4] J. Lacan, The subversion of the subject and the dialectic of desire in the Freudian unconscious,
Écrits: A Selection (translated by A. Sheridan), Norton, 1977, 292–325.
[5] G. Perelman, The entropy formula for the Ricci flow and its geometric applications,
https://arxiv.org/abs/math/0211159.
[6] G. Perelman, Ricci flow with surgery on three-manifolds,
https://arxiv.org/abs/math/0303109.
[7] G. Perelman, Finite extinction time for the solutions to the Ricci flow on certain
three-manifolds, https://arxiv.org/abs/math/0307245.
[8] A. Sokal and J. Bricmont, Fashionable Nonsense: Postmodern Intellectuals’ Abuse of Science,
Picador, 1999.
[9] M. M. Tai, A mathematical model for the determination of total area under glucose tolerance
and other metabolic curves, Diabetes Care 17 (1994), no. 2, 152–154,
http://care.diabetesjournals.org/content/17/2/152.
[10] M. M. Tai, Reply from Mary Tai, Diabetes Care 17 (1994), no. 10, 1226–1228.
1992
Monstrous Moonshine
Introduction
A subgroup N of a finite group G is normal if $N = gNg^{-1}$ for each g ∈ G. If N
is normal in G, then the quotient group G/N is well-defined. If N is nontrivial, then
G/N has smaller order than G itself and is, in principle, more tractable to study.
A simple group is a group that contains no proper, nontrivial normal subgroups.
For example, cyclic groups of prime order are simple, as are the alternating groups
$A_n$ for n ≥ 5. The finite simple groups are the building blocks of all finite groups
in the sense that they cannot be “broken down” further.
The classification of finite simple groups, a monumental human achievement
spanning thousands of journal pages and hundreds of articles, was completed in
2004 (see the comments for the 2004 entry and the problem for 1968). In short,
the finite simple groups fall into eighteen infinite families along with twenty-six
so-called “sporadic groups,” the largest of which is the monster group, M, which
has order
\[ 2^{46} \cdot 3^{20} \cdot 5^{9} \cdot 7^{6} \cdot 11^{2} \cdot 13^{3} \cdot 17 \cdot 19 \cdot 23 \cdot 29 \cdot 31 \cdot 41 \cdot 47 \cdot 59 \cdot 71 \approx 8.08 \times 10^{53}. \]
To put things in perspective, there are
\[ 26! \approx 4.03 \times 10^{26} \]
permutations of the English alphabet. This is less than the square root of the order
of the monster group! There are approximately $10^{50}$ atoms in the Earth [3], still
comfortably less than the number of elements in M. Of course, there are useful
numbers that dwarf even the order of the monster group; see the comments below
on Graham’s number.
The monster group is notoriously difficult to compute with. How does one
describe its elements? The smallest faithful representation of M over the complex
field has dimension 196,883. This means that the smallest dimension n for which
there is an injective homomorphism$^1$ $\varphi : M \to GL_n(\mathbb{C})$ is n = 196,883. In contrast,
the alternating group $A_{44}$ on 44 letters is larger than the monster group, since
\[ |A_{44}| = 44!/2 \approx 1.33 \times 10^{54}. \]
However, we can represent any permutation on 44 letters faithfully using 44 × 44
permutation matrices. Even more extreme, the dihedral group of order 2n (no
matter how large n is) can be faithfully represented using 2 × 2 matrices: write down
the linear transformations that represent the action of the group on the regular
n-gon in $\mathbb{R}^2$. This tells us that the monster group is really quite complicated. We
simply (pun intended) cannot accurately encode it using relatively small matrices.
1 Thus, M is isomorphic to its image $\varphi(M)$ in $GL_n(\mathbb{C})$, the group of n × n invertible complex
matrices. This reduces the study of the abstract group M to the study of a collection of matrices,
a general process that can be helpful for computations with finite groups.
Figure 1. Plot of Re j(τ) for −3/2 ≤ Re τ ≤ 3/2 and 0 < Im τ ≤ 2.
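The size comparisons above are easy to confirm with exact integer arithmetic. The following sketch multiplies out the prime factorization of the order of the monster group and compares it with 26!, with the rough count of $10^{50}$ atoms in the Earth, and with 44!/2. The variable names are hypothetical.

```python
from math import factorial

# Prime factorization of the order of the monster group: prime -> exponent.
MONSTER_FACTORIZATION = {
    2: 46, 3: 20, 5: 9, 7: 6, 11: 2, 13: 3, 17: 1, 19: 1,
    23: 1, 29: 1, 31: 1, 41: 1, 47: 1, 59: 1, 71: 1,
}

monster_order = 1
for prime, exponent in MONSTER_FACTORIZATION.items():
    monster_order *= prime ** exponent

alphabet_permutations = factorial(26)
a44_order = factorial(44) // 2

print(f"|M|   = {monster_order:.3e}")
print(f"26!   = {alphabet_permutations:.3e}")
print(f"|A44| = {a44_order:.3e}")
print("26! squared is below |M|:", alphabet_permutations ** 2 < monster_order)
print("10^50 is below |M|:      ", 10 ** 50 < monster_order)
print("|M| is below |A44|:      ", monster_order < a44_order)
```

Python integers have arbitrary precision, so the three comparisons are exact. Only the printed scientific notation is rounded.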
The term “monstrous moonshine” refers to a connection between the monster
group and the j-invariant of Felix Klein (1849–1925), a remarkable function on the
upper half-plane in C that is related to non-Euclidean geometry, elliptic curves,
and analytic number theory; see Figure 1. To explain this connection requires a bit
of setup.
A function f meromorphic on the upper half-plane is (elliptic) modular if it
satisfies
\[ f(\tau) = f\!\left( \frac{a\tau + b}{c\tau + d} \right) \]
whenever Im τ > 0 and a, b, c, d are integers with ad − bc = 1, and it enjoys a
Laurent series expansion of the form
\[ f(q) = \sum_{n=-m}^{\infty} a(n) q^n, \]
in which $q = e^{2\pi i \tau}$. One can show that every rational function of j is a modular
function and, conversely, that every modular function is a rational function of j.
In 1978, John McKay (1939– ) observed that the first few coefficients in the
expansion
\begin{align*}
j(q) = q^{-1} + 744 &+ 196{,}884q + 21{,}493{,}760q^2 + 864{,}299{,}970q^3 \\
&+ 20{,}245{,}856{,}256q^4 + 333{,}202{,}640{,}600q^5 + \cdots
\end{align*}
are expressible as integral linear combinations of the dimensions $r_n$ of the irreducible
representations$^2$ of the monster group M. For example,
\begin{align*}
1 &= r_1, \\
196{,}884 &= r_1 + r_2, \\
21{,}493{,}760 &= r_1 + r_2 + r_3, \\
864{,}299{,}970 &= 2r_1 + 2r_2 + r_3 + r_4, \\
20{,}245{,}856{,}256 &= 3r_1 + 3r_2 + r_3 + 2r_4 + r_5 \\
&= 2r_1 + 3r_2 + 2r_3 + r_4 + r_6, \\
333{,}202{,}640{,}600 &= 5r_1 + 5r_2 + 2r_3 + 3r_4 + 2r_5 + r_7 \\
&= 4r_1 + 5r_2 + 3r_3 + 2r_4 + r_5 + r_6 + r_7,
\end{align*}
and so forth [10]. The numbers involved are so large3 that one suspects that
these identities cannot be mere coincidence. In 1979, John Horton Conway and
Simon P. Norton (1952– ) coined the term “monstrous moonshine” to reflect both
the monster group and the (apparent) improbability of such a connection. The
conjectured connection between the j-invariant and the monster group led to the
discovery of several analogous relationships between modular functions and group
theory.
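McKay's identities can be checked directly once the dimensions $r_1, \ldots, r_7$ are known. The values used below are not given in the text. They are the seven smallest irreducible character degrees of the monster group as listed in standard references, and are included here as an assumption. The names `R`, `DECOMPOSITIONS`, and `evaluate` are hypothetical.

```python
# Assumed values (from standard tables, not from the text): dimensions of the
# seven smallest irreducible representations of the monster group.
R = {
    1: 1,
    2: 196_883,
    3: 21_296_876,
    4: 842_609_326,
    5: 18_538_750_076,
    6: 19_360_062_527,
    7: 293_553_734_298,
}

# (coefficient of j, {index n: multiplicity of r_n}) for each identity above.
DECOMPOSITIONS = [
    (1, {1: 1}),
    (196_884, {1: 1, 2: 1}),
    (21_493_760, {1: 1, 2: 1, 3: 1}),
    (864_299_970, {1: 2, 2: 2, 3: 1, 4: 1}),
    (20_245_856_256, {1: 3, 2: 3, 3: 1, 4: 2, 5: 1}),
    (20_245_856_256, {1: 2, 2: 3, 3: 2, 4: 1, 6: 1}),
    (333_202_640_600, {1: 5, 2: 5, 3: 2, 4: 3, 5: 2, 7: 1}),
    (333_202_640_600, {1: 4, 2: 5, 3: 3, 4: 2, 5: 1, 6: 1, 7: 1}),
]


def evaluate(multiplicities):
    """Evaluate an integral linear combination of the dimensions r_n."""
    return sum(count * R[n] for n, count in multiplicities.items())


for coefficient, multiplicities in DECOMPOSITIONS:
    print(coefficient, evaluate(multiplicities) == coefficient)
```

Every line should report `True`. Note that the constant term 744 of j plays no role in these identities.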
Richard Borcherds (1959– ) proved the Conway–Norton conjectures in 1992
and earned a Fields Medal for this work. One of the main elements of his argument
was the construction of a $\mathbb{Z}^2$-graded Lie algebra on which M acts. As a result of his
proof, the relationship between the two mathematical objects is now understood
as follows: there is a vertex operator algebra called the moonshine module, first
explicitly constructed by Igor Frenkel (1952– ), James Lepowsky (1944– ), and Arne
Meurman (1956– ), that has M as its automorphism group and the j-invariant as
its graded dimension function. The underlying similarities of the two seemingly
unrelated topics come from conformal field theory (field theory that is invariant
under conformal transformations), a theory that is used in modeling statistical
mechanics, string theory, and condensed matter physics.

Proposed by Blake Mackall and Steven J. Miller, Williams College.
It is fascinating that the particular number
$2^{46} \cdot 3^{20} \cdot 5^{9} \cdot 7^{6} \cdot 11^{2} \cdot 13^{3} \cdot 17 \cdot 19 \cdot 23 \cdot 29 \cdot 31 \cdot 41 \cdot 47 \cdot 59 \cdot 71$
corresponds to the size of an interesting group. There are
fifteen primes that appear in its factorization:
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 41, 47, 59, 71. (1992.1)
How many distinct products of powers of these fifteen numbers exist that yield a
number within a factor of 100 of the Monster’s size? What if we instead allow
ourselves to use all primes at most 71?
2 There are certain dimensions, $r_n$, for which one can find a homomorphism $\varphi : M \to GL_{r_n}(\mathbb{C})$
that cannot be “decomposed” in the sense that the only subspaces of $\mathbb{C}^{r_n}$ that are invariant under
every $\varphi(g)$ are {0} and $\mathbb{C}^{r_n}$ itself.
3 The coefficients grow rapidly. One can use the circle method (see the 1923 entry) to show
that the coefficient of $q^n$ in the Laurent series expansion for j(q) is asymptotic to
$\dfrac{e^{4\pi\sqrt{n}}}{\sqrt{2}\, n^{3/4}}$.
1992: Comments
Numbers with fixed prime factors. What can be said about the sequence
$1 = n_1 < n_2 < \cdots$ of natural numbers whose prime factors are among the list
(1992.1)? The sequence begins promisingly enough:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39,
40, 41, 42, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57, 58, 59,
60, 62, 63, 64, 65, 66, 68, 69, 70, 71, 72, 75, 76, 77, 78, 80, 81, 82,
84, 85, 87, 88, 90, 91, 92, 93, 94, 95, 96, 98, 99, 100
but skips a few numbers. Does it contain most of the natural numbers?
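The list above can be regenerated by stripping the fifteen primes out of each candidate and checking whether anything is left. This is a minimal sketch using trial division, which is adequate for small limits. The names `MONSTER_PRIMES` and `smooth_numbers` are hypothetical.

```python
MONSTER_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 41, 47, 59, 71)


def smooth_numbers(limit, primes=MONSTER_PRIMES):
    """Return all n in [1, limit] whose prime factors all lie in `primes`."""
    result = []
    for n in range(1, limit + 1):
        remainder = n
        for p in primes:
            while remainder % p == 0:
                remainder //= p
        if remainder == 1:  # nothing left over, so every prime factor is allowed
            result.append(n)
    return result


terms = smooth_numbers(100)
skipped = [n for n in range(1, 101) if n not in terms]
print(len(terms), "terms up to 100; skipped:", skipped)

# The proportion of natural numbers in the sequence shrinks as the limit grows.
for limit in (100, 10_000, 1_000_000):
    print(limit, len(smooth_numbers(limit)) / limit)
```

Up to 100 the sequence has 88 terms and skips 37, 43, 53, 61, 67, 73, 74, 79, 83, 86, 89, and 97. The printed proportions for larger limits give a first indication of the answer to the question just posed.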
Axel Thue (1863–1922) proved that if one starts with any finite set of primes,
then $\lim_{i \to \infty} (n_{i+1} - n_i) = \infty$; that is, the gaps between terms in the sequence tend
to infinity [7]. In particular, the sequence above contains relatively few natural
numbers in the big scheme of things. A more quantitative version is due to Robert
Tijdeman (1943– ), who proved that there is a constant C, which depends only
upon the initial (finite) list of primes, such that
\[ n_{i+1} - n_i > \frac{n_i}{(\log n_i)^C} \]
for $n_i \geq 3$ [8].
Graham’s number. Consider the following problem in Ramsey theory. An
n-dimensional hypercube has $2^n$ vertices. For example, a two-dimensional hypercube
is a square, which has four vertices. A three-dimensional hypercube is a cube, in
the traditional sense, which has eight vertices, and so forth; see Figure 2. Connect
each pair of vertices and obtain a complete graph on $2^n$ vertices. Now assign one of
two colors to each edge. What is the smallest dimension d for which any such two-
coloring contains a monochromatic, complete subgraph on four coplanar vertices
(see Figure 3)?
It is known that d ≥ 13. In unpublished work, Ronald Graham established an
upper bound for d that makes the size of the monster group pale in comparison. This
number, now known as Graham’s number, was popularized in Martin Gardner’s
Scientific American column (see the 1914 entry) in November 1977 [5].
Figure 2. The vertices of an n-dimensional hypercube (projected
into two-dimensional Euclidean space) for (a) n = 2, (b) n = 3, and (c) n = 4.
Figure 3. Illustration of the Ramsey theory problem from which
Graham’s number arises (image public domain): (a) a 2-coloring of a
three-dimensional cube; (b) a monochromatic 4-vertex coplanar complete subgraph.
We start with Donald Knuth’s up-arrow notation for positive integers [11]; this
is related to Ackermann’s function from the 1926 entry. We can view multiplication
as iterated addition: ab is b copies of a under addition. Along these lines, a ↑ b is
defined to be $a^b$, that is, b copies of a under multiplication. We can define
\[ a \uparrow\uparrow b = \underbrace{a \uparrow (a \uparrow (\cdots \uparrow a))}_{b - 1 \text{ up arrows}}, \]
so that
\[ 3 \uparrow\uparrow 2 = 3^3 = 27 \quad \text{and} \quad 3 \uparrow\uparrow 3 = 3^{3^3} = 7{,}625{,}597{,}484{,}987, \tag{1992.2} \]
and so forth. But why stop there? We can define
\[ a \uparrow\uparrow\uparrow b = \underbrace{a \uparrow\uparrow (a \uparrow\uparrow (\cdots \uparrow\uparrow a))}_{b - 1 \text{ double up arrows}}, \]
and so forth. Graham’s number involves 64 layers of ever larger up-arrowing:
\[ a \uparrow^{n} b =
\begin{cases}
a^b & \text{if } n = 1, \\
1 & \text{if } n \geq 1 \text{ and } b = 0, \\
a \uparrow^{n-1} (a \uparrow^{n} (b - 1)) & \text{otherwise.}
\end{cases} \]
As (1992.2) suggests, $g_1 = 3 \uparrow\uparrow\uparrow\uparrow 3$ is already outrageously huge. If we let
$g_n = 3 \uparrow^{g_{n-1}} 3$ for n ≥ 2, then Graham’s number is $g_{64}$. The following description gives
a rough idea of its magnitude:
Graham’s number is much larger than many other large numbers such
as Skewes’s number. . . it is so large that the observable universe is far
too small to contain an ordinary digital representation of Graham’s
number, assuming that each digit occupies one Planck volume, pos-
sibly the smallest measurable space. But even the number of digits
in this digital representation of Graham’s number would itself be a
number so large that its digital representation cannot be represented

in the observable universe. Nor even can the number of digits of that
number—and so forth, for a number of times far exceeding the total
number of Planck volumes in the observable universe. [9]
A video explanation, by Graham himself, is [6]. See the 1930 entry to learn more
about Ramsey theory and the 1933 entry for information about Skewes’s number.
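The recursive definition of up-arrow notation given earlier translates almost word for word into code. The sketch below is meant only for very small arguments. The function name `up_arrow` and its argument order are hypothetical choices. Anything as large as 3 ↑↑ 4, let alone $g_1$, is far beyond what this (or any) direct evaluation can reach.

```python
def up_arrow(a, n, b):
    """Return a followed by n up arrows, then b, using the recursive definition.

    n = 1 is ordinary exponentiation; b = 0 gives 1; otherwise the value is
    a (n - 1 arrows) applied to (a (n arrows) (b - 1)).
    """
    if n == 1:
        return a ** b
    if b == 0:
        return 1
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))


print(up_arrow(3, 1, 3))  # 3^3 = 27
print(up_arrow(3, 2, 2))  # 3^3 = 27
print(up_arrow(3, 2, 3))  # 3^(3^3) = 7625597484987
print(up_arrow(2, 3, 3))  # 2 ↑↑ (2 ↑↑ 2) = 2 ↑↑ 4 = 65536
```

The first three outputs match (1992.2). Do not try `up_arrow(3, 2, 4)`: it would require computing 3 raised to the power 7,625,597,484,987.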
Bibliography
[1] R. E. Borcherds, Monstrous moonshine and monstrous Lie superalgebras, Invent. Math. 109
(1992), no. 2, 405–444, DOI 10.1007/BF01232032. MR1172696
[2] J. H. Conway and S. P. Norton, Monstrous moonshine, Bull. London Math. Soc. 11 (1979),
no. 3, 308–339, DOI 10.1112/blms/11.3.308.
http://blms.oxfordjournals.org/content/11/3/308.full.pdf+html. MR554399
[3] Dr. FermiGuy, Physics Questions People Ask Fermilab,
http://www.fnal.gov/pub/science/inquiring/questions/atoms.html.
[4] T. Gannon, Monstrous moonshine: the first twenty-five years, Bull. London Math. Soc.
38 (2006), no. 1, 1–33, DOI 10.1112/S0024609305018217.
http://arxiv.org/pdf/math/0402345v2.pdf. MR2201600
[5] M. Gardner, Mathematical games, Scientific American 237 (1977), November, 18–28.
[6] R. Graham and B. Haran, How big is Graham’s number?,
https://www.youtube.com/watch?v=GuigptwlVHo.
[7] A. Thue, Selected mathematical papers, with an introduction by Carl Ludwig Siegel and a
biography by Viggo Brun; edited by Trygve Nagell, Atle Selberg, Sigmund Selberg, and Knut
Thalberg, Universitetsforlaget, Oslo, 1977. MR0460050
[8] R. Tijdeman, On integers with many small prime factors, Compositio Math. 26 (1973),
319–330. MR0325549
[9] Wikipedia, Graham’s number, https://en.wikipedia.org/wiki/Graham's_number.
[10] Wikipedia, Monstrous moonshine, http://en.wikipedia.org/wiki/Monstrous_moonshine.
[11] Wikipedia, Knuth’s up-arrow notation,
https://en.wikipedia.org/wiki/Knuth's_up-arrow_notation.
1993
The 15-Theorem
Introduction
Lagrange’s four-square theorem, proved by Joseph-Louis Lagrange in 1770,
says that every positive integer is the sum of four perfect squares (in which zero is
considered a square). For example, 1993 is expressible as a sum of four squares in
many different ways, such as
\begin{align*}
1993 &= 44^2 + 7^2 + 2^2 + 2^2 = 42^2 + 12^2 + 7^2 + 6^2 \\
&= 33^2 + 30^2 + 2^2 + 0^2 = 24^2 + 24^2 + 21^2 + 20^2 \\
&= 43^2 + 12^2 + 0^2 + 0^2 = 32^2 + 22^2 + 22^2 + 1^2.
\end{align*}
Lagrange’s theorem was refined in 1834 by Carl Gustav Jacob Jacobi (1804–1851),
who proved what is now known as Jacobi’s four-square theorem: the number $r_4(n)$
of representations
\[ n = a^2 + b^2 + c^2 + d^2, \]
in which a, b, c, d are integers, is
\[ r_4(n) =
\begin{cases}
24 \displaystyle\sum_{\substack{m \mid n \\ m \text{ odd}}} m & \text{if $n$ is even}, \\[2ex]
8 \displaystyle\sum_{m \mid n} m & \text{if $n$ is odd}.
\end{cases} \]
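Jacobi's formula counts ordered representations with signs, so (a, b, c, d) and (−a, b, c, d) are different when a ≠ 0. A brute-force count makes this convention concrete and lets one test the formula for small n. The sketch below is an illustration under that convention. The names `r4_brute` and `r4_jacobi` are hypothetical.

```python
from math import isqrt


def r4_brute(n):
    """Count integer solutions (a, b, c, d) of a^2 + b^2 + c^2 + d^2 = n."""
    # r2[s] = number of integer pairs (a, b) with a^2 + b^2 = s, for s <= n.
    r2 = [0] * (n + 1)
    bound = isqrt(n)
    for a in range(-bound, bound + 1):
        for b in range(-bound, bound + 1):
            s = a * a + b * b
            if s <= n:
                r2[s] += 1
    # Split n as (a^2 + b^2) + (c^2 + d^2) and multiply the pair counts.
    return sum(r2[k] * r2[n - k] for k in range(n + 1))


def r4_jacobi(n):
    """Number of representations predicted by Jacobi's four-square theorem."""
    divisors = [m for m in range(1, n + 1) if n % m == 0]
    if n % 2 == 1:
        return 8 * sum(divisors)
    return 24 * sum(m for m in divisors if m % 2 == 1)


print(all(r4_brute(n) == r4_jacobi(n) for n in range(1, 200)))
print(r4_brute(1993), r4_jacobi(1993))
```

Since 1993 is an odd prime, the formula gives 8(1 + 1993) = 15,952 representations, which is why the six shown above are only a small sample.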
There are several ways to generalize Lagrange’s four-square theorem. One

might consider a different number of squares. We considered sums of two squares
in the 1966 entry and refer the reader there for further details. In light of Lagrange’s
theorem, we focus on sums of three squares. Adrien-Marie Legendre (1752–1833)
proved that a natural number n is the sum of three squares if and only if n is not
of the form $4^i(8j + 7)$. Thus, any number not on the list
7, 15, 23, 28, 31, 39, 47, 55, 60, 63, 71, 79, 87, 92, 95, 103, 111,
112, 119, 124, 127, 135, 143, 151, 156, 159, 167, 175, 183, 188,
191, 199, 207, 215, 220, 223, 231, 239, 240, 247, 252, 255, 263,
271, 279, 284, 287, 295, 303, 311, 316, 319, 327, 335, 343, . . .
is the sum of three squares [7]. In fact, Gauss later proved that if $n \geq 5$ is square
free, then the number $r_3(n)$ of representations $n = x^2 + y^2 + z^2$, in which $x, y, z$ are
integers, is
\[
r_3(n) =
\begin{cases}
12h(-4n) & \text{if } n \equiv 1, 2, 5, 6 \pmod{8}, \\
24h(-n) & \text{if } n \equiv 3 \pmod{8}, \\
0 & \text{if } n \equiv 7 \pmod{8},
\end{cases}
\]
in which $h(x)$ denotes the class number of $x$ [10] (see the 1966 entry for more about
class numbers).
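Legendre's criterion can be compared against a direct search. The sketch below is illustrative only (the helper names are ours, and it assumes n ≥ 1): one function tests whether n has the excluded form 4^i(8j + 7), the other searches for a representation as a sum of three squares.

```python
from math import isqrt


def excluded_by_legendre(n: int) -> bool:
    """Return True if n >= 1 has the form 4^i (8j + 7)."""
    while n % 4 == 0:
        n //= 4
    return n % 8 == 7


def is_sum_of_three_squares(n: int) -> bool:
    """Brute-force search for n = x^2 + y^2 + z^2 with 0 <= x <= y."""
    for x in range(isqrt(n) + 1):
        for y in range(x, isqrt(n - x * x) + 1):
            rest = n - x * x - y * y
            z = isqrt(rest)
            if z * z == rest:
                return True
    return False


if __name__ == "__main__":
    print([n for n in range(1, 100) if excluded_by_legendre(n)])
```

The two tests should disagree on every n ≥ 1: a number is a sum of three squares exactly when it is not excluded.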
Instead of focusing only on sums of squares, we can learn even more by studying
quadratic forms in several variables. We take our inspiration from the identity
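\[ x_1^2 + x_2^2 + x_3^2 + x_4^2 = x^T I x, \tag{1993.1} \]
in which $x = [x_1\ x_2\ x_3\ x_4]^T \in \mathbb{R}^4$ and $I$ denotes the $4 \times 4$ identity matrix. Other
quadratic forms arise if we replace $I$ with a more general matrix $A$. For example,
the most general binary quadratic form is
\[
a x_1^2 + b x_1 x_2 + c x_2^2 = [x_1\ x_2]
\begin{bmatrix} a & \frac{b}{2} \\ \frac{b}{2} & c \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.
\]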
We say that a quadratic form $Q(x) = x^T A x$ is universal if $Q$ represents every
natural number; that is, for each $n \in \mathbb{N}$, there is an integer vector $x$ so that
$Q(x) = n$. For example, Lagrange’s four-square theorem asserts that the form
(1993.1) is universal. On the other hand, $x_1^2 + x_2^2 + x_3^2$ is not universal because it
does not represent 7.
In 1993, John Horton Conway and William Schneeberger proved the 15-theorem.
This remarkable result asserts that if
\[ Q(x) = x^T A x \tag{1993.2} \]
is a quadratic form with positive definite, integral matrix A, then Q represents all
positive integers if and only if it represents the numbers 1, 2, . . . , 15 [12]. In fact,
one can replace this list with
1, 2, 3, 5, 6, 7, 10, 14, 15.
The restriction that $A$ has integer entries is nontrivial. For example, the quadratic
form
\[
x_1^2 + x_1 x_2 + x_2^2 = [x_1\ x_2]
\begin{bmatrix} 1 & \frac{1}{2} \\ \frac{1}{2} & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\tag{1993.3}
\]
has integer coefficients, but its corresponding matrix has noninteger off-diagonal
entries. The original proof of the 15-theorem was not published, although Fields
Medalist Manjul Bhargava gave a simpler proof in 2000 [1].
In 1916, Ramanujan provided a list of fifty-five “diagonal” quaternary quadratic forms (1993.2)
that he claimed exhausts the positive-definite, universal forms in four variables [9].

Figure 1. The $n$th triangular number $T_n = n(n + 1)/2$ is the
number of balls in a triangular array whose base consists of $n$
balls.

To be more specific, they correspond to diagonal matrices $A$ with diagonal

(1, 1, 1, d), 1 ≤ d ≤ 7,
(1, 1, 2, d), 2 ≤ d ≤ 14,
(1, 1, 3, d), 3 ≤ d ≤ 6,
(1, 2, 2, d), 2 ≤ d ≤ 7,
(1, 2, 3, d), 3 ≤ d ≤ 10,
(1, 2, 4, d), 4 ≤ d ≤ 14,
(1, 2, 5, d), 5 ≤ d ≤ 10.
The 15-theorem can be used to show that Ramanujan was almost correct: his only
mistake was the erroneous inclusion of the quadratic form
\[ x_1^2 + 2x_2^2 + 5x_3^2 + 5x_4^2, \]
which omits the value 15. Thus, the tuple (1, 2, 5, 5) should not have been included
in his list. Nevertheless, Ramanujan (as always) displayed remarkable foresight and
was well ahead of his time.
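The 15-theorem turns the check of Ramanujan's list into a finite computation. The sketch below is ours, not the authors': the names `RANGES`, `CRITICAL`, and `represents` are hypothetical. It rebuilds the fifty-five diagonals from the table above and tests each form against the nine critical values 1, 2, 3, 5, 6, 7, 10, 14, 15.

```python
from math import isqrt

# Critical values from the 15-theorem.
CRITICAL = (1, 2, 3, 5, 6, 7, 10, 14, 15)

# Ramanujan's families (a, b, c, d) with the stated ranges for d.
RANGES = {
    (1, 1, 1): (1, 7),
    (1, 1, 2): (2, 14),
    (1, 1, 3): (3, 6),
    (1, 2, 2): (2, 7),
    (1, 2, 3): (3, 10),
    (1, 2, 4): (4, 14),
    (1, 2, 5): (5, 10),
}


def represents(diag, n):
    """True if n = a x^2 + b y^2 + c z^2 + d w^2 has an integer solution."""
    a, b, c, d = diag
    for x in range(isqrt(n // a) + 1):
        rest_x = n - a * x * x
        for y in range(isqrt(rest_x // b) + 1):
            rest_y = rest_x - b * y * y
            for z in range(isqrt(rest_y // c) + 1):
                rest = rest_y - c * z * z
                if rest % d == 0:
                    w = isqrt(rest // d)
                    if w * w == rest // d:
                        return True
    return False


forms = [prefix + (d,) for prefix, (lo, hi) in RANGES.items()
         for d in range(lo, hi + 1)]
# Map each form to the critical values it fails to represent.
failures = {}
for form in forms:
    missed = [n for n in CRITICAL if not represents(form, n)]
    if missed:
        failures[form] = missed

if __name__ == "__main__":
    print(len(forms), failures)
```

Only the diagonal (1, 2, 5, 5) should appear in `failures`, with 15 as its single missed value; by the 15-theorem the remaining fifty-four forms are universal.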

Proposed by Scott Duke Kominers, Harvard University.
(a) The $n$th triangular number is $T_n = n(n + 1)/2$; see Figure 1. Prove that every
positive integer can be represented in the form $T_p + T_q + T_r$.
(b) Prove that every positive integer can be represented in the form $p\bar{p} + 3q\bar{q}$, in
which $p = p_1 + p_2\sqrt{-2}$ ($p_1, p_2 \in \mathbb{Z}$) and $q = q_1 + q_2\sqrt{-2}$ ($q_1, q_2 \in \mathbb{Z}$) are
algebraic integers in the quadratic field $\mathbb{Q}(\sqrt{-2})$.
(c) Characterize the set of natural numbers that are not of the form $x^2 + y^2 + 10z^2$.
1993: Comments
The 290-theorem. In 2008, Bhargava and Jonathan P. Hanke proved the
290-theorem, which asserts that a quadratic form (1993.2) with integer coefficients
and positive definite matrix A is universal if and only if it assumes the values
1, 2, . . . , 290 [12]. As before, we can replace this list by something smaller:

1, 2, 3, 5, 6, 7, 10, 13, 14, 15, 17, 19, 21, 22, 23, 26, 29, 30, 31,
34, 35, 37, 42, 58, 93, 110, 145, 203, 290.
The 290-theorem handles quadratic forms, such as (1993.3), that the 15-theorem
does not address.
Triangular numbers. Do not feel bad if you have trouble with part (a) of
the centennial problem. It is a famous result of Gauss from 1796 and it is closely
related to the difficult problem of representing an integer as a sum of three squares.
Indeed, if
\[ n = T_p + T_q + T_r, \]
then a little algebra shows that
\[ 8n + 3 = (2p + 1)^2 + (2q + 1)^2 + (2r + 1)^2. \]
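A small experiment makes the link with three squares concrete. This is a sketch under our own assumptions: the function names are hypothetical and we allow T_0 = 0 as a summand. It searches for a representation n = T_p + T_q + T_r with p ≤ q ≤ r, after which the identity for 8n + 3 can be confirmed directly.

```python
from math import isqrt


def triangular(k: int) -> int:
    """The k-th triangular number T_k = k(k + 1)/2, with T_0 = 0."""
    return k * (k + 1) // 2


def three_triangular(n: int):
    """Return (p, q, r) with 0 <= p <= q <= r and T_p + T_q + T_r = n, or None."""
    p = 0
    while 3 * triangular(p) <= n:
        q = p
        while triangular(p) + 2 * triangular(q) <= n:
            rest = n - triangular(p) - triangular(q)
            # rest is triangular exactly when 8 * rest + 1 is a perfect square.
            s = isqrt(8 * rest + 1)
            if s * s == 8 * rest + 1:
                return p, q, (s - 1) // 2
            q += 1
        p += 1
    return None


if __name__ == "__main__":
    for n in (1, 7, 1993):
        p, q, r = three_triangular(n)
        print(n, (p, q, r),
              8 * n + 3 == (2 * p + 1) ** 2 + (2 * q + 1) ** 2 + (2 * r + 1) ** 2)
```

The search is no substitute for Gauss's proof; it only confirms the statement for the values tried.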
There are many things that can be said about triangular (and more generally,
polygonal) numbers [11]. The reader is invited to deduce the correct definition of
an n-polygonal number; look at Figure 1 for inspiration. If you wish to check your
answer, the generating function for the n-polygonal numbers is

\[ G_n(x) = \frac{x\bigl((n - 3)x + 1\bigr)}{(1 - x)^3}. \]
Fermat’s polygonal number theorem, stated by Fermat in 1638, asserts that for
$k = 3, 4, \ldots$, each natural number $n$ is the sum of $k$ $k$-polygonal numbers (as usual,
zero summands are permitted). The case $k = 3$ is Gauss’s theorem and the $k = 4$
case is Lagrange’s four-square theorem. Fermat’s theorem, for which he gave no
proof, was finally proved by Cauchy in 1813 [14].
Ramanujan’s ternary quadratic form. Ramanujan observed several curious
properties of the quadratic form $x^2 + y^2 + 10z^2$ that appears in part (c) of
the centennial problem [9]. The even numbers that are not represented by this
quadratic form are of the form $4^j(16k + 6)$. Moreover, the odd numbers
3, 7, 21, 31, 33, 43, 67, 79, 87, 133, 217, 219, 223, 253, 307, 391, . . .
that are not represented by this form are not easily characterized [13]. Two other
numbers, 679 and 2,719, were later added to Ramanujan’s list. In 1997, Ken Ono
and Kannan Soundararajan (1973– ) conjectured that the odd natural numbers
that are not of the form $x^2 + y^2 + 10z^2$ are
3, 7, 21, 31, 33, 43, 67, 79, 87, 133, 217, 219, 223, 253, 307, 391, 679, 2719.
More importantly, they showed that if the generalized Riemann hypothesis is true,
then their conjecture holds [8, Thm. 3].
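The list of odd exceptions can be reproduced by a sieve over a finite range. The sketch below is illustrative: the function name and the bound of 3000 are our choices. It marks every value of x^2 + y^2 + 10z^2 up to the bound and reports the odd numbers left unmarked.

```python
from math import isqrt


def odd_exceptions(limit: int) -> list:
    """Odd n <= limit that are not of the form x^2 + y^2 + 10 z^2."""
    represented = bytearray(limit + 1)
    for z in range(isqrt(limit // 10) + 1):
        for y in range(isqrt(limit - 10 * z * z) + 1):
            base = 10 * z * z + y * y
            # By symmetry in x and y it suffices to take x >= y.
            for x in range(y, isqrt(limit - base) + 1):
                represented[base + x * x] = 1
    return [n for n in range(1, limit + 1, 2) if not represented[n]]


if __name__ == "__main__":
    print(odd_exceptions(3000))
```

A finite search of this kind supports the conjectured list but cannot establish that no further exceptions occur beyond the bound.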
Positivity. In engineering applications one often encounters multivariable
functions that one hopes are nonnegative. For example, is
\[ f(x, y) = 5x^2 + x^2y^2 + 6xy + 4y^2 \]
nonnegative for all real $x, y$? Yes, because
\[ f(x, y) = (2x + y)^2 + (xy)^2 + 2y^2 + (x + y)^2 \]
happens to be a sum of squares. These sums-of-squares decompositions have recently
found applications in the field of self-driving cars. Briefly, the idea is to
encode the journey so that if a certain function is nonnegative, then there are no
collisions [3]. If the function can be represented as a sum of squares of polynomials,
then it is nonnegative and the path is a safe one. To be practical for such real-world
problems, it is not enough to be able to compute that the polynomial has such a
decomposition; we must be able to rapidly certify that such a decomposition exists.
The main idea is to replace computationally slow semidefinite programming
problems with a series of linked linear programming problems (see the 1947 entry).
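The decomposition in the example can be certified exactly. In the sketch below (ours, with hypothetical function names) we read the polynomial as 5x^2 + x^2y^2 + 6xy + 4y^2, which is what the displayed sum of squares expands to. Both sides have degree at most 2 in each variable, so agreement on a 3 × 3 grid of integer points already forces the two polynomials to coincide; we check a larger grid for good measure.

```python
def f(x, y):
    """The polynomial from the example, read as 5x^2 + x^2 y^2 + 6xy + 4y^2."""
    return 5 * x**2 + x**2 * y**2 + 6 * x * y + 4 * y**2


def sum_of_squares(x, y):
    """The claimed decomposition (2x + y)^2 + (xy)^2 + 2y^2 + (x + y)^2."""
    return (2 * x + y) ** 2 + (x * y) ** 2 + 2 * y**2 + (x + y) ** 2


def identity_holds(radius: int = 5) -> bool:
    """Compare both sides exactly on the integer grid [-radius, radius]^2."""
    grid = range(-radius, radius + 1)
    return all(f(x, y) == sum_of_squares(x, y) for x in grid for y in grid)


if __name__ == "__main__":
    print(identity_holds())
```

This verifies a given certificate. Finding such a certificate for a general polynomial is the harder task that the semidefinite and linear programming methods address.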
Bibliography
[1] M. Bhargava, On the Conway-Schneeberger fifteen theorem, Quadratic forms and their ap-
plications (Dublin, 1999), Contemp. Math., vol. 272, Amer. Math. Soc., Providence, RI,
2000, pp. 27–37, DOI 10.1090/conm/272/04395. http://www.maths.ed.ac.uk/~aar/books/
dublin.pdf. MR1803359
[2] J. H. Conway, Universal quadratic forms and the fifteen theorem, Quadratic forms and their
applications (Dublin, 1999), Contemp. Math., vol. 272, Amer. Math. Soc., Providence, RI,
2000, pp. 23–26, DOI 10.1090/conm/272/04394. http://www.maths.ed.ac.uk/~aar/books/
dublin.pdf. MR1803358
[3] K. Hartnett, A classical math problem gets pulled into self-driving cars, Quanta Maga-
zine, May 23, 2018. https://www.quantamagazine.org/a-classical-math-problem-gets-
pulled-into-the-modern-world-20180523/
[4] M.-H. Kim, Recent developments on universal forms, Algebraic and arithmetic theory of
quadratic forms, Contemp. Math., vol. 344, Amer. Math. Soc., Providence, RI, 2004, pp. 215–
228, DOI 10.1090/conm/344/06218. MR2058677
[5] S. D. Kominers, On universal binary Hermitian forms, Integers 9 (2009), A02,
6, DOI 10.1515/INTEG.2009.002. http://www.emis.de/journals/INTEGERS/papers/j2/j2.
pdf. MR2475630
[6] I. Niven, H. S. Zuckerman, and H. L. Montgomery, An Introduction to the Theory of Numbers,
Wiley, 2008.
[7] The On-Line Encyclopedia of Integer Sequences, A004215 (numbers that are the sum of 4
but no fewer nonzero squares), https://oeis.org/A004215.
[8] K. Ono and K. Soundararajan, Ramanujan’s ternary quadratic form, Invent. Math. 130
(1997), no. 3, 415–454, DOI 10.1007/s002220050191. http://link.springer.com/article/
10.1007%2Fs002220050191. MR1483991
[9] S. Ramanujan, On the expression of a number in the form $ax^2 + by^2 + cz^2 + du^2$, Proc. Camb.
Phil. Soc. 19 (1916), 11–21.
[10] Wolfram MathWorld, Sum of squares function, http://mathworld.wolfram.com/
SumofSquaresFunction.html.
[11] Wolfram MathWorld, Polygonal number, http://mathworld.wolfram.com/PolygonalNumber.html.
[12] Wikipedia, 15 and 290 theorems, https://en.wikipedia.org/wiki/15_and_290_theorems.
[13] Wikipedia, Ramanujan’s ternary quadratic form, https://en.wikipedia.org/wiki/
Ramanujan’s_ternary_quadratic_form.
[14] Wikipedia, Fermat polygonal number theorem, https://en.wikipedia.org/wiki/
Fermat_polygonal_number_theorem.
1994
AIM
Introduction
In 1994 John Fry (1944– ), cofounder of the Fry’s Electronics chain, funded the
creation of AIM, the American Institute of Mathematics¹ [1]. AIM was located in
Palo Alto, California, for many years before moving to its present location in San
Jose. The institute’s stated mission is:
To advance mathematical knowledge through collaboration, to broaden
participation in the mathematical endeavor, and to increase the aware-
ness of the contributions of the mathematical sciences to society.
Since 2002, AIM has been one of eight institutions that are part of the National
Science Foundation’s Mathematical Sciences Institute Program [5]. The others are:
• Institute for Advanced Study (IAS) in Princeton, NJ,
• Institute for Computational and Experimental Research in Mathematics (ICERM)
in Providence, RI,
• Institute for Mathematics and its Applications (IMA) in Minneapolis, MN,
• Institute for Pure and Applied Mathematics (IPAM) in Los Angeles, CA,
• Mathematical Biosciences Institute (MBI) in Columbus, OH,
• Mathematical Sciences Research Institute (MSRI) in Berkeley, CA,
• Statistical and Applied Mathematical Sciences Institute (SAMSI) in Research
Triangle Park, NC.
These institutes bring together mathematicians and foster long-term collaborations.
One of AIM’s most effective and popular methods for nurturing collaborative work
is the SQuaREs program:
The purpose of AIM’s research program called SQuaREs (Structured
Quartet Research Ensembles) is to allow a dedicated group of four to
six mathematicians to spend a week at AIM in San Jose, California,
with the possibility of returning in following years. A SQuaRE could
arise as a followup to an AIM workshop, or it could be a freestanding
activity. AIM will provide both the research facilities and the financial
support for each SQuaRE group.
There are so many good questions arising from work at AIM that it is hard to
select just one. We have chosen an easily stated problem with a long and storied history.
Moreover, it connects not only to Hilbert’s tenth problem (see the 2005 entry)
and Sage (see the 2005 entry), but it also forms a segue into Fermat’s last theorem,
the topic of our next entry. See https://aimath.org/news/congruentnumbers/
for more information.

¹ Full disclosure: the first named author has served on the human resources board of AIM
since 2008. Both authors have led workshops at AIM over the years.

Which positive integers $n$ are the area of a right triangle with rational sides?
In other words, solve the system of equations
\[ a^2 + b^2 = c^2 \quad\text{and}\quad \tfrac{1}{2}ab = n, \tag{1994.1} \]
in which $a, b, c$ are rational and $n$ is a positive integer.
1994: Comments
A trillion triangles. An $n \geq 1$ for which (1994.1) has a rational solution
$(a, b, c)$ is a congruent number. The centennial problem above is the famed congruent
number problem. The first few congruent numbers are
5, 6, 7, 13, 14, 15, 20, 21, 22, 23, 24, 28, 29, 30, 31, 34, 37, 38,
39, 41, 45, 46, 47, 52, 53, 54, 55, 56, 60, 61, 62, 63, 65, 69, 70, 71,
77, 78, 79, 80, 84, 85, 86, 87, 88, 92, 93, 94, 95, 96, 101, 102, 103,
109, 110, 111, 112, 116, 117, 118, 119, 120, 124, 125, 126 [6].
For example, 5 is the area of the right triangle with sides
\[ (a, b, c) = \left( \frac{20}{3}, \frac{3}{2}, \frac{41}{6} \right). \]
Although early Islamic mathematicians identified the congruent numbers
5, 6, 14, 15, 21, 30, 34, 65, 70, 110, 154, 190, 210, 221, 231, 246, 290, 390, 429, 546,
they missed many of the examples above [2]. It is not easy to determine whether
a given number is congruent or not. The first congruent number omitted in the
second list, 7, is congruent because it is the area of the right triangle with sides

\[ (a, b, c) = \left( \frac{24}{5}, \frac{35}{12}, \frac{337}{60} \right). \]
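Both triangles can be checked with exact rational arithmetic. A minimal sketch follows; the helper name `right_triangle_area` is ours.

```python
from fractions import Fraction as F


def right_triangle_area(a, b, c):
    """Return the area ab/2 if a^2 + b^2 = c^2 holds exactly, else None."""
    if a * a + b * b != c * c:
        return None
    return a * b / 2


if __name__ == "__main__":
    print(right_triangle_area(F(20, 3), F(3, 2), F(41, 6)))
    print(right_triangle_area(F(24, 5), F(35, 12), F(337, 60)))
```

Using `Fraction` rather than floating point matters here: the Pythagorean condition is tested for exact equality.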
Where does AIM come in? In 2009, a team of mathematicians supported by
AIM succeeded in determining all of the congruent numbers up to one trillion [2].
Long story short: there are 3,148,379,694 of them in that range. An AIM press
release declared [1]:
Mathematicians from North America, Europe, Australia, and South
America have resolved the first one trillion cases of an ancient mathe-
matics problem. The advance was made possible by a clever technique
for multiplying large numbers. The numbers involved are so enormous
that if their digits were written out by hand they would stretch to the
moon and back. The biggest challenge was that these numbers could
not even fit into the main memory of the available computers, so the
researchers had to make extensive use of the computers’ hard drives.
Two teams, each using different software and hardware, arrived at the same
results (one group used Sage, the focus of our 2005 entry). A critical role was played
by the fast Fourier transform (see the 1965 entry), which can be used to multiply
two n-bit numbers in O(n log n log log n) time.
Congruent numbers and Pythagorean triples. Every right triangle with
rational sides gives rise to infinitely many congruent numbers. For example, the
(3, 4, 5)-triangle, which has area 6, gives rise to right triangles with side lengths
$(3k, 4k, 5k)$ and area $6k^2$ for $k = 1, 2, \ldots$. Are there infinitely many congruent
numbers whose associated triangles are not similar?
The substitution $x = a/c$ and $y = b/c$ provides a correspondence (a bijection up to
scaling) between rational solutions $(a, b, c)$ with $c \neq 0$ to $a^2 + b^2 = c^2$ and rational solutions to
\[ x^2 + y^2 = 1. \tag{1994.2} \]
The preceding equation has the solution (1, 0), from which we can construct all
other rational solutions; see Figure 1. Consider the line through (1, 0) with slope $t$;
that is,
\[ y = tx - t. \]
Substitute this into (1994.2) and obtain
\[ (1 + t^2)x^2 - 2t^2 x = 1 - t^2. \]
The quadratic formula implies that $x = 1$, which leads to the known solution (1, 0)
of (1994.2), or
\[ x = \frac{t^2 - 1}{t^2 + 1}, \]
which leads to
\[ \left( \frac{t^2 - 1}{t^2 + 1}, \frac{-2t}{t^2 + 1} \right). \]

Figure 1. Parametrizing the rational solutions to $x^2 + y^2 = 1$. The figure marks the points $(1, 0)$ and $(x(t), y(t))$.
This is a rational solution to (1994.2) if and only if $t$ is rational. If we set $t = m/n$,
in which $m, n$ are integers, clear the resulting denominators, and discard signs, we obtain
Pythagorean triples
\[ (m^2 - n^2, 2mn, m^2 + n^2) \]
and congruent numbers $mn(m^2 - n^2)$. In particular, if $n = 1$ and $m = p$ is prime,
we have a triple of the form
\[ (p^2 - 1, 2p, p^2 + 1) \]
and associated congruent number
\[ p(p^2 - 1) = (p - 1)p(p + 1). \]
Moreover, no two such triples are rational multiples of each other and hence we
have a family of congruent numbers, no two of which are obtained from similar
right triangles.
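The parametrization above can be sketched in a few lines. The function names are ours, and the second coordinate is computed directly from the line equation y = tx − t rather than from a closed form.

```python
from fractions import Fraction


def point_on_circle(t: Fraction):
    """Second intersection of the line y = t x - t with the unit circle."""
    x = (t * t - 1) / (t * t + 1)
    y = t * x - t
    return x, y


def triple_and_area(m: int, n: int):
    """Pythagorean triple (m^2 - n^2, 2mn, m^2 + n^2) and its triangle's area."""
    a, b, c = m * m - n * n, 2 * m * n, m * m + n * n
    return (a, b, c), a * b // 2


if __name__ == "__main__":
    x, y = point_on_circle(Fraction(2, 1))
    print(x, y, x * x + y * y)
    for p in (2, 3, 5, 7):
        print(p, triple_and_area(p, 1))
```

For m > n ≥ 1 the area equals mn(m^2 − n^2); with n = 1 and m = p it is the product (p − 1)p(p + 1) of three consecutive integers.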
Congruent numbers and elliptic curves. There is a beautiful connection
between congruent numbers and elliptic curves [3]. For $n \geq 1$, the maps
\[
(a, b, c) \mapsto \left( \frac{nb}{c - a}, \frac{2n^2}{c - a} \right)
\quad\text{and}\quad
(x, y) \mapsto \left( \frac{x^2 - n^2}{y}, \frac{2nx}{y}, \frac{x^2 + n^2}{y} \right)
\]
provide bijections between the solution sets of (1994.1) and
\[ y^2 = x^3 - n^2 x, \tag{1994.3} \]
in which $y \neq 0$.² Moreover, these maps send rational solutions to rational solutions.
Thus, a positive rational number $n$ is a congruent number if and only if the elliptic
curve (1994.3) has a rational point with $y \neq 0$. The AIM press release tells us:
In 1982 Jerrold Tunnell of Rutgers University made significant progress
by exploiting the connection (first used by Heegner) between congruent
numbers and elliptic curves, mathematical objects for which there is
a well-established theory. He found a simple formula for determining
whether or not a number is a congruent number. This allowed the
first several thousand cases to be resolved very quickly. One issue
is that the complete validity of his formula (therefore also the new
computational result) depends on the truth of a particular case of
one of the outstanding problems in mathematics known as the Birch
and Swinnerton-Dyer conjecture. That conjecture is one of the seven
Millennium Prize Problems posed by the Clay Math Institute with a
prize of one million dollars. [1]
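Returning to the maps between right triangles of area n and points on the curve (1994.3) given before the quotation, here is a minimal sketch with exact rational arithmetic. The function names are ours; the test case is the triangle of area 5 from earlier in this entry.

```python
from fractions import Fraction as F


def triangle_to_curve(a, b, c, n):
    """Send a right triangle (a, b, c) of area n to a point on y^2 = x^3 - n^2 x."""
    return n * b / (c - a), 2 * n * n / (c - a)


def curve_to_triangle(x, y, n):
    """Inverse map, defined for points with y != 0."""
    return (x * x - n * n) / y, 2 * n * x / y, (x * x + n * n) / y


def on_curve(x, y, n) -> bool:
    """Check y^2 = x^3 - n^2 x exactly."""
    return y * y == x**3 - n * n * x


if __name__ == "__main__":
    n = 5
    x, y = triangle_to_curve(F(20, 3), F(3, 2), F(41, 6), n)
    print(x, y, on_curve(x, y, n))
    print(curve_to_triangle(x, y, n))
```

Here the triangle (20/3, 3/2, 41/6) corresponds to the point (45, 300), and indeed 300^2 = 45^3 − 25 · 45 = 90000.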
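What is Tunnell’s theorem? Let
\[
\begin{aligned}
A_n &= \{(x, y, z) \in \mathbb{Z}^3 \mid n = 2x^2 + y^2 + 32z^2\}, \\
B_n &= \{(x, y, z) \in \mathbb{Z}^3 \mid n = 2x^2 + y^2 + 8z^2\}, \\
C_n &= \{(x, y, z) \in \mathbb{Z}^3 \mid n = 8x^2 + 2y^2 + 64z^2\}, \text{ and} \\
D_n &= \{(x, y, z) \in \mathbb{Z}^3 \mid n = 8x^2 + 2y^2 + 16z^2\}.
\end{aligned}
\]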
² The solutions $(0, 0)$, $(n, 0)$, and $(-n, 0)$ to (1994.3) correspond to $a = c$, which is not attainable
by a right triangle with sides $(a, b, c)$.

If $n$ is a square-free congruent number, then $2|A_n| = |B_n|$ if $n$ is odd and $2|C_n| = |D_n|$ if $n$
is even. Moreover, if the Birch and Swinnerton-Dyer conjecture is true for curves
of the form (1994.3), then a square-free $n$ is a congruent number whenever the corresponding
equality holds [7, 9].
How does this help? Since every term in the expressions that define the sets $A_n, B_n, C_n, D_n$ is
nonnegative for all $x, y, z \in \mathbb{Z}$, only finitely many triples need to be tested, so the
cardinalities $|A_n|, |B_n|, |C_n|, |D_n|$ can be found
through an exhaustive search. For example, a short computation confirms that
$|A_{41}| = 16$ and $|B_{41}| = 32$. Assuming the Birch and Swinnerton-Dyer conjecture,
we conclude (rightly) that 41 is a congruent number. On the other hand, $|A_{43}| =
|B_{43}| = 12$, so 43 is not a congruent number.
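The exhaustive search is short enough to write out. The sketch below is ours (the function names are hypothetical) and it assumes that n is square-free, as Tunnell's theorem requires. A value of True from `passes_tunnell` proves nothing unconditionally; it shows that n is congruent only if the Birch and Swinnerton-Dyer conjecture holds for the curve (1994.3), whereas False does prove that n is not congruent.

```python
from math import isqrt


def count_reps(n: int, a: int, b: int, c: int) -> int:
    """Number of (x, y, z) in Z^3 with a x^2 + b y^2 + c z^2 = n."""
    total = 0
    z_max = isqrt(n // c)
    for z in range(-z_max, z_max + 1):
        rest_z = n - c * z * z
        y_max = isqrt(rest_z // b)
        for y in range(-y_max, y_max + 1):
            rest = rest_z - b * y * y
            if rest % a:
                continue
            x = isqrt(rest // a)
            if x * x == rest // a:
                # x = 0 gives one solution; otherwise x and -x both work.
                total += 1 if x == 0 else 2
    return total


def passes_tunnell(n: int) -> bool:
    """Tunnell's equality for square-free n: 2|A_n| = |B_n| or 2|C_n| = |D_n|."""
    if n % 2 == 1:
        return 2 * count_reps(n, 2, 1, 32) == count_reps(n, 2, 1, 8)
    return 2 * count_reps(n, 8, 2, 64) == count_reps(n, 8, 2, 16)


if __name__ == "__main__":
    for n in (41, 43):
        print(n, count_reps(n, 2, 1, 32), count_reps(n, 2, 1, 8), passes_tunnell(n))
```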
Bibliography
[1] AIM, A trillion triangles, https://aimath.org/news/congruentnumbers/.
[2] R. Bradshaw, W. B. Hart, D. Harvey, G. Tornaria, and M. Watkins, Congruent number theta
coefficients to $10^{12}$, http://homepages.warwick.ac.uk/~masfaw/congruent.pdf.
[3] K. Conrad, The congruent number problem, Harvard College Mathematical Review 2 (2008),
no. 2, 58–73.
[4] S. J. Miller, Extending the Pythagorean formula, talk online at http://youtu.be/
idIHcgapMG4 (slides at https://web.williams.edu/Mathematics/sjmiller/public_html/
math/talks/GeneralizingPythagoras.pdf).
[5] National Science Foundation, Mathematical sciences institutes, https://mathinstitutes.org/
institutes/.
[6] The On-Line Encyclopedia of Integer Sequences, A003273 (Congruent numbers: positive in-
tegers n for which there exists a right triangle having area n and rational sides), https://
oeis.org/A003273.
[7] J. B. Tunnell, A classical Diophantine problem and modular forms of weight 3/2, Invent.
Math. 72 (1983), no. 2, 323–334, DOI 10.1007/BF01389327. MR700775
[8] Wikipedia, Congruent number, https://en.wikipedia.org/wiki/Congruent_number.
[9] Wikipedia, Tunnell’s theorem, https://en.wikipedia.org/wiki/Tunnell’s_theorem.
1995
Fermat’s Last Theorem
Introduction
In 1637, Pierre de Fermat wrote the following statement in the margin of his
copy of Diophantus’s Arithmetica (Figure 1):
Cubum autem in duos cubos, aut quadratoquadratum in duos quadra-
toquadratos & generaliter nullam in infinitum ultra quadratum potes-
tatem in duos eiusdem nominis fas est dividere cuius rei demonstra-
tionem mirabilem sane detexi. Hanc marginis exiguitas non caperet.
In English, this reads
It is impossible to separate a cube into two cubes, or a fourth power into
two fourth powers, or in general, any power higher than the second,
into two like powers. I have discovered a truly marvelous proof of this,
which this margin is too narrow to contain.
Although it appears unlikely that Fermat found a simple and correct proof,¹ the
conjecture became known as Fermat’s last theorem. In modern terminology it states
that if $n \geq 3$, then there are no solutions in natural numbers $x, y, z$ to
\[ x^n + y^n = z^n. \tag{1995.1} \]
Although various special cases of Fermat’s last theorem were handled over the
years, a complete proof remained elusive (in contrast, Fermat’s last theorem for
polynomials is significantly easier; see the 1981 entry). Many mathematicians,
great and small, chipped away and some proved various special cases. The great
David Hilbert excused himself by saying, “Before beginning I should have to put
in three years of intensive study, and I haven’t that much time to squander on a
probable failure” (however, he must have squandered a little time on it, since he
found a new proof in the case n = 4).
The year 1995 saw the publication of papers by Andrew Wiles (1953– ) [12] and
by Richard Taylor (1962– ) and Wiles [11] that finally put Fermat’s last theorem
to rest. The big announcement came in 1993 during a series of lectures delivered
by Wiles at the Isaac Newton Institute in Cambridge. However, a serious issue was
soon found that threatened to undermine his proof. He teamed up with Taylor, his
former student, and they eventually succeeded in filling the gap. Their work built
upon the foundations laid by several generations of mathematicians that connected
the problem to the theory of elliptic curves (see the 1921 entry and the comments
for the 1956 entry). While Fermat’s result has held mathematicians’ interest for
centuries, the method of proof was at least as important as the final result since it
yielded many important results in active areas of research.

¹ Fermat proved the special case $n = 4$. If he had been in possession of a complete proof,
this would not have been necessary. He probably never thought that people would obsess over a
comment he made to himself in the margin of a book.

Figure 1. (left) Fermat found himself unable to write his proof
in the space next to Problem II.8 of the 1621 edition of Diophantus’s
Arithmetica (this is not Fermat’s copy). (right) The 1670
edition of Diophantus’s Arithmetica, prepared by Clément-Samuel
Fermat after the death of his father. The statement of Fermat’s
last theorem is near the bottom third of the page. Images in the
public domain.
Where does one start on such a difficult and imposing problem? First observe that
if $(x, y, z) \in \mathbb{N}^3$ is a solution to (1995.1), then
\[ (x^{n/d})^d + (y^{n/d})^d = (z^{n/d})^d \]
whenever $d$ divides $n$. Thus, we obtain solutions in natural numbers to the Fermat
equation with exponent $d$. There are solutions to (1995.1) if $n = 1$ and $n = 2$, so these
exponents are of no help. However, every $n \geq 3$ is divisible by 4 or by an odd prime, so
it suffices to show that there are no solutions if $n = 4$ or if $n$ is an odd prime. This
is a significant reduction!
The case n = 3 was handled by Euler in 1770, although many independent
proofs followed over the years. The case n = 5 was dispatched by Legendre and
Dirichlet around 1825. Gabriel Lamé (1795–1870) settled the case n = 7 in 1839,
followed shortly thereafter by a proof of Victor-Amédée Lebesgue² (1791–1875).
² Not to be confused with Henri Lebesgue (1875–1941) of measure and integration fame.
Lamé’s proof made use of the clever identity
\[
\begin{aligned}
&(x + y + z)^7 - (x^7 + y^7 + z^7) \\
&\quad = 7(x + y)(x + z)(y + z)\bigl[ (x^2 + y^2 + z^2 + xy + xz + yz)^2 + xyz(x + y + z) \bigr].
\end{aligned}
\]
However, such ad hoc methods appeared unlikely to permit the conjecture to be
proved for larger and larger odd prime exponents.
A major breakthrough occurred in 1849 when Ernst Kummer (1810–1893)
proved Fermat’s last theorem for so-called “regular” primes. In brief, if $p \geq 3$
is prime and $\zeta_p$ is a primitive $p$th root of unity, then the class number of the $p$th
cyclotomic field $\mathbb{Q}(\zeta_p)$ is a positive integer that measures the extent to which unique
prime factorization fails in $\mathbb{Z}[\zeta_p]$ (we encountered a similar notion in the 1966 entry
in the context of imaginary quadratic fields). A prime $p$ is regular if it does not
divide the class number of $\mathbb{Q}(\zeta_p)$. Kummer used Lamé’s factorization
\[ z^p - y^p = \prod_{j=1}^{p} (z - \zeta_p^j y) \]
and studied the ideals generated by the $z - \zeta_p^j y$ in $\mathbb{Z}[\zeta_p]$.

Kummer also found an elementary characterization of regular primes in terms
of the Bernoulli numbers $B_n$. These are defined by
\[ B_0 = 1 \quad\text{and}\quad \sum_{k=0}^{n} \binom{n+1}{k} B_k = 0 \quad (n \geq 1). \tag{1995.2} \]
One can show that $B_n = 0$ for odd $n \geq 3$ and that
\[ \frac{t}{e^t - 1} = \sum_{n=0}^{\infty} \frac{B_n}{n!} t^n. \tag{1995.3} \]
The first few Bernoulli numbers of even index are
\[
B_2 = \frac{1}{6}, \quad B_4 = -\frac{1}{30}, \quad B_6 = \frac{1}{42}, \quad
B_8 = -\frac{1}{30}, \quad B_{10} = \frac{5}{66}, \quad B_{12} = -\frac{691}{2730}.
\]
These arise in the computation of the values of the Riemann zeta function at the
even positive integers (see the comments for the 1945 entry):
\[ \zeta(2n) = \frac{(-1)^{n+1} (2\pi)^{2n}}{2(2n)!} B_{2n}. \]
Although little is known about the Bernoulli numerators, an 1840 theorem of Karl
von Staudt (1798–1867) and Thomas Clausen (1801–1885) tells us a lot about the
denominators. They independently showed that
\[ B_{2n} + \sum_{\substack{p \text{ prime} \\ (p-1) \mid 2n}} \frac{1}{p} \in \mathbb{Z}, \]
and hence the denominator of $B_{2n}$ in lowest terms divides $\prod_{(p-1) \mid 2n} p$.
Kummer proved that an odd prime $p$ is regular if and only if $p$ does not divide
the numerator of $B_n$, written in lowest terms, for all even $n \leq p - 3$. Although
this does not solve Fermat’s problem outright, it does permit the rapid verification
of the conjecture for certain exponents. Indeed, $B_n$ can be computed readily from
either the recurrence (1995.2) or the generating function (1995.3). This permits
one to rapidly determine whether a given prime is regular or not. The first several
regular primes are [7]
3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 41, 43, 47, 53, 61, 71, 73, 79, 83,
89, 97, 107, 109, 113, 127, 137, 139, 151, 163, 167, 173, 179, 181,
191, 193, 197, 199, 211, 223, 227, 229, 239, 241, 251, 269, 277,
281, 313, 317, 331, 337, 349, 359, 367, 373, 383, 397, 419, 431.
Kummer’s theorem tells us that Fermat’s last theorem is true for these exponents.
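Kummer's criterion translates directly into a short program. The sketch below is illustrative: the function names are ours, the primality test is naive trial division, and the recurrence (1995.2) is applied for n ≥ 1. It computes Bernoulli numbers exactly and then tests each odd prime p against the numerators of B_2, B_4, ..., B_{p−3}.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(limit: int) -> list:
    """Return [B_0, ..., B_limit] from sum_{k=0}^{n} C(n+1, k) B_k = 0, n >= 1."""
    B = [Fraction(1)]
    for n in range(1, limit + 1):
        partial = sum(comb(n + 1, k) * B[k] for k in range(n))
        # The k = n term is C(n+1, n) B_n = (n + 1) B_n, so solve for B_n.
        B.append(-partial / (n + 1))
    return B


def is_prime(m: int) -> bool:
    """Trial division; adequate for the small range used here."""
    if m < 2:
        return False
    return all(m % d for d in range(2, int(m**0.5) + 1))


def regular_primes(limit: int) -> list:
    """Odd primes p <= limit not dividing the numerator of B_n, n even, n <= p - 3."""
    B = bernoulli_numbers(limit)
    return [p for p in range(3, limit + 1)
            if is_prime(p)
            and all(B[n].numerator % p for n in range(2, p - 2, 2))]


if __name__ == "__main__":
    print(regular_primes(200))
```

Exact `Fraction` arithmetic is essential, since the criterion concerns numerators in lowest terms. The primes absent from the output below 100, namely 37, 59, and 67, are the first irregular primes.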
Unfortunately, it is not known whether infinitely many regular primes exist,
although Carl Ludwig Siegel (1896–1981) conjectured that infinitely many exist
and, moreover, that they have density $e^{-1/2} \approx 0.60653$ as a subset of the primes
[8]. On the other hand, there are infinitely many irregular primes, that is, primes for
which Kummer’s approach to Fermat’s last theorem is not applicable. This seems
to have first been proved in 1915 by Johan Ludwig Jensen (1859–1925), although
many authors cite the 1954 paper of Leonard Carlitz (1907–1999) [2].
A result as monumental as Fermat’s last theorem deserves two problems. The
first problem below was originally from the 1995 entry, while the second was from
the 1949 entry (in the process of converting these entries to a book, we had the
opportunity to move and combine some material).

Proposed by the students in Frank Morgan’s “The Big Questions” class
at Williams College (Fall 2008) and Minh-Tam Trinh, Princeton Uni-
versity.
(a) The status of Fermat’s last theorem for rational exponents is known [1]. What
about real exponents? Are there positive integral solutions to $x^r + y^r = z^r$ for
$r$ real? If yes, can you give a nice example?
(b) The following is called Kummer’s congruence. If $p$ is prime, $h, k$ are positive
even integers not divisible by $p - 1$, and $h \equiv k \pmod{p - 1}$, then
\[ \frac{B_h}{h} \equiv \frac{B_k}{k} \pmod{p}. \]
Use Kummer’s congruence and the Clausen–von Staudt theorem to show that
if $n$ is a product of irregular primes and $2n < B_{2n}$, then there is an irregular
prime $p \nmid n$. With more work, one can build on this and prove there are
infinitely many irregular primes.
1995: Comments
Sophie Germain primes. A prime $p$ is a Sophie Germain prime if $2p + 1$
is prime. These are named after Sophie Germain (1776–1831), a remarkable
mathematician, physicist, and philosopher. She proved that if $p$ is such a prime,
then the only natural number solutions to $x^p + y^p = z^p$ have $p \mid xyz$ [9]. See [5,
Ch. 14] for a circle-method argument (see the 1923 entry) that suggests the number
of Sophie Germain primes at most $x$ is asymptotic to $2C_2 x/\log^2 x$, in which $C_2 =
0.660161815\ldots$ is the twin primes constant (1919.4).
Solution to (a). The Fermat equation with rational exponents is settled in
[1, Thm. 1]: the equation
\[ x^{n/m} + y^{n/m} = z^{n/m}, \]
in which $m, n$ are relatively prime natural numbers with $n > 2$, has solutions in
natural numbers $x, y, z$ if and only if $x = y = z$, $m$ is divisible by 6, and three
different sixth roots of unity are used. Let $f(t) = 4^t + 5^t - 6^t$. Since
\[ 4^2 + 5^2 > 6^2 \quad\text{and}\quad 4^3 + 5^3 < 6^3, \]
$f(2) > 0$ and $f(3) < 0$. The intermediate value theorem³ provides $r \in (2, 3)$ so that
$f(r) = 0$; that is, $4^r + 5^r = 6^r$. In fact, $r \approx 2.48794$ and, moreover, this exponent
is irrational in light of the theorem above. See [6] for more information about the
Fermat equation with real exponents.
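The intermediate value theorem argument is constructive enough to compute r. The following bisection sketch is ours: the function names and the tolerance are arbitrary choices, and floating-point arithmetic gives only an approximation.

```python
def f(t: float) -> float:
    """f(t) = 4^t + 5^t - 6^t, which is positive at t = 2 and negative at t = 3."""
    return 4.0**t + 5.0**t - 6.0**t


def bisect_root(lo: float, hi: float, tol: float = 1e-12) -> float:
    """Halve [lo, hi] repeatedly, keeping f(lo) > 0 > f(hi)."""
    if not (f(lo) > 0 > f(hi)):
        raise ValueError("f must change sign from positive to negative")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    r = bisect_root(2.0, 3.0)
    print(r, f(r))
```

The root is unique because f(t)/6^t = (2/3)^t + (5/6)^t − 1 is strictly decreasing in t.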
Solution to (b). Choose a prime $p$ that divides the numerator of $B_{2n}/(2n)$,
written in lowest terms. The Clausen–von Staudt theorem ensures that $(p - 1) \nmid 2n$,
so the division algorithm gives $2n = (p - 1)q + r$, in which $0 < r < p - 1$ must be
even. Kummer’s congruence implies that
\[ \frac{B_{2n}}{2n} \equiv \frac{B_r}{r} \pmod{p}. \]
Thus, $p$ divides the numerator of $B_r/r$ when it is written in lowest terms. Therefore,
$p$ divides the numerator of $B_r$ and hence $p$ is irregular.
Bibliography
[1] C. D. Bennett, A. M. W. Glass, and G. J. Székely, Fermat’s last theorem for rational expo-
nents, Amer. Math. Monthly 111 (2004), no. 4, 322–329, DOI 10.2307/4145241. MR2057186
[2] L. Carlitz, Note on irregular primes, Proc. Amer. Math. Soc. 5 (1954), 329–331, DOI
10.2307/2032249. MR0061124
[3] K. Devlin, F. Gouvêa, and A. Granville, Fermat’s last theorem, a theorem at last, FOCUS,
August 1993, 3–5. http://www.dms.umontreal.ca/~andrew/PDF/FLTatlast.pdf.
[4] F. Q. Gouvêa, “A marvelous proof ”, Amer. Math. Monthly 101 (1994), no. 3, 203–222, DOI
10.2307/2975598. MR1264001
[6] F. Morgan, Fermat’s last theorem for fractional and irrational exponents, College Math. J.
41 (2010), no. 3, 182–185, DOI 10.4169/074683410X488647. MR2656314
[7] Online Encyclopedia of Integer Sequences, A007703 (Regular primes), http://oeis.org/
A007703.
[8] C. L. Siegel, Zu zwei Bemerkungen Kummers (German), Nachr. Akad. Wiss. Göttingen
Math.-Phys. Kl. II 1964 (1964), 51–57. MR0163899
[9] A. van der Poorten, Notes on Fermat’s last theorem, Canadian Mathematical Society Series
of Monographs and Advanced Texts, A Wiley-Interscience Publication, John Wiley & Sons,
Inc., New York, 1996. MR1373197
[10] S. Singh, Fermat’s enigma: The epic quest to solve the world’s greatest mathematical problem,
with a foreword by John Lynch, Walker and Company, New York, 1997. MR1491363
[11] R. Taylor and A. Wiles, Ring-theoretic properties of certain Hecke algebras, Ann. of Math.
(2) 141 (1995), no. 3, 553–572, DOI 10.2307/2118560. MR1333036
[12] A. Wiles, Modular elliptic curves and Fermat’s last theorem, Ann. of Math. (2) 141 (1995),
no. 3, 443–551, DOI 10.2307/2118559. MR1333035
³ See the comments for the 1927 and 1944 entries for two more applications of this theorem.
1996
Great Internet Mersenne Prime Search (GIMPS)
Introduction
Cramér’s probabilistic model of the primes (see the comments to the 1989 entry)
predicts that a large natural number $n$ has roughly a $1/\log n$ chance of being prime.
This heuristic suggests that the expected number of primes in a set $A \subseteq \mathbb{N}$ is
\[ \sum_{a \in A} \frac{1}{\log a}. \]
Are there infinitely many primes among the Mersenne numbers $M_n = 2^n - 1$?
These are named after Marin Mersenne (1588–1648), who compiled a (somewhat
inaccurate) list of primes $M_n$ with $n \leq 257$. Since
\[
\sum_{n=1}^{\infty} \frac{1}{\log M_n}
= \sum_{n=1}^{\infty} \frac{1}{\log(2^n - 1)}
> \sum_{n=1}^{\infty} \frac{1}{\log 2^n}
= (\log 2)^{-1} \sum_{n=1}^{\infty} \frac{1}{n}
\]
diverges, there is cause for optimism. However, like Treebeard we must not be
hasty. A similar computation suggests that there are infinitely many primes of
the form $2^n$, which is absurd. Some sort of adjustment must be made in order to
fine-tune such predictions.
We must first address the fact that not all n are treated equally by our sequence.
If $n = ab$ and $1 < a, b < n$, then (1989.2) provides the factorization
\[ 2^n - 1 = 2^{ab} - 1 = (2^a)^b - 1 = (2^a - 1)\bigl( (2^a)^{b-1} + (2^a)^{b-2} + \cdots + 2^a + 1 \bigr). \]
Thus, we may restrict our attention to $M_p = 2^p - 1$, in which $p$ is prime. A prime
of this form is called a Mersenne prime. If we update our heuristic argument to
reflect this restriction, we obtain a sum over the primes
\[ \sum_{p} \frac{1}{\log M_p} > \frac{1}{\log 2} \sum_{p} \frac{1}{p}, \]
which diverges (a famous result of Euler; see the comments for the 1913 entry).
Perhaps there are infinitely many Mersenne primes? The values of p ≤ 1,000 that
produce Mersenne primes are
2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607
and there are currently (as of mid-2018) fifty known Mersenne primes. While the
search continues, it remains an open problem whether the number of Mersenne
primes is infinite. Even widely accepted conjectures, such as the Bateman–Horn
conjecture (see the comments for the 2005 entry), are not refined enough to handle
the distribution of primes arising from nonpolynomial functions, such as $2^n - 1$.
For many years,
\[ 2^{127} - 1 = 170,141,183,460,469,231,731,687,303,715,884,105,727 \]
was the largest known prime. It was shown to be prime by Édouard Lucas (1842–
1891) in 1876 and it will forever remain the largest prime ever found without the use
of a computer. The status of $M_{67}$, however, remained in doubt until 1903. Mersenne
claimed that it was prime, but Lucas proved that this is not the case. However, he
was unable to produce any of its factors. The following curious anecdote concerns
Frank Nelson Cole (1861–1926), after whom the prestigious Cole Prizes in algebra and
number theory are named:
At a mathematical meeting in New York in 1903, F. N. Cole walked
on to the platform and, without saying a single word, wrote two large
numbers on the blackboard. He multiplied them out in longhand, and
equated the result to $2^{67} - 1$. (Subsequently, in private, Cole said
that those few minutes at the blackboard had cost him three years of
Sundays.) So Mersenne was wrong about his ninth case: p = 67 does
not yield a prime number. . . . [5]
Perhaps Cole would be disappointed to know that his factorization
\[ M_{67} = 147,573,952,589,676,412,927 = 193,707,721 \times 761,838,257,287 \]
can be found on a late-2013 desktop computer in less than 0.002 seconds!
In 1996, George Woltman (1957– ) started the Great Internet Mersenne Prime
Search (GIMPS) project.1 This distributed computing project operates on thou-
sands of participating computers around the world. Since its inception, every new
Mersenne prime that has been discovered was discovered by GIMPS. As of mid-2018,
the largest known prime is $M_{77,232,917}$, which has 23,249,425 digits; it was
discovered in late 2017 by the GIMPS program (see below). Anyone with a computer can
join in the effort to find new Mersenne primes and there is a monetary reward for
doing so: the Electronic Frontier Foundation (EFF) offers prizes of
• $150,000 to the first individual or group who discovers a prime number with at
least 100,000,000 decimal digits;
• $250,000 to the first individual or group who discovers a prime number with at
least 1,000,000,000 decimal digits [4].
Why is so much focus placed on Mersenne primes? Special algorithms and
the binary nature of computer architecture make Mersenne numbers particularly
tractable. The factoring code works in three phases to determine whether $M_p = 2^p - 1$
is prime. First, one eliminates all possible small factors. This relies on the
fact that any factor of a Mersenne number must be of the form $f = 2kp + 1$ with
$f \equiv 1, 7 \pmod{8}$. This eliminates about 95% of the potential factors. Once any
potential small factors have been ruled out, GIMPS turns to the Lucas–Lehmer
primality test (see the comments below).
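The first phase can be sketched in a few lines of Python. This is only an illustration of the idea described above, not the GIMPS code: the function name `find_small_factor` and the search limit `max_k` are hypothetical, and the sketch forms $M_p$ explicitly for its stopping test, which is reasonable only for small exponents.

```python
def find_small_factor(p, max_k=2_000_000):
    """Look for a factor of M_p = 2^p - 1 among candidates f = 2kp + 1.

    p is assumed to be an odd prime. Returns the smallest factor found,
    or None if no candidate with k <= max_k divides M_p.
    """
    m = (1 << p) - 1  # M_p itself, used here only for the stopping test
    for k in range(1, max_k + 1):
        f = 2 * k * p + 1
        if f * f > m:
            return None  # no factor up to sqrt(M_p), so M_p is prime
        if f % 8 not in (1, 7):
            continue  # candidates must satisfy f = 1 or 7 (mod 8)
        # f divides 2^p - 1 exactly when 2^p = 1 (mod f); the three-argument
        # pow uses modular repeated squaring, so 2^p is never written out.
        if pow(2, p, f) == 1:
            return f
    return None  # search limit reached without a verdict


factor_11 = find_small_factor(11)
factor_67 = find_small_factor(67)
print(factor_11, factor_67)
```

For $p = 67$ the search needs about 1.4 million candidates, which takes roughly a second of interpreter time; multiplying the result by the cofactor quoted in Cole's factorization is a quick way to check it.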
1 Not to be confused with the popular GNU Image Manipulation Program (GIMP), an open-
source alternative to Photoshop.

We have far too much to say about Mersenne primes to fit in one entry; see the
comments for the 1997 entry for more information.

Proposed by Steven J. Miller and Pamela Mishkin, Williams College.
This was a particularly hard year to choose a single problem for. It was neck and
neck between GIMPS and Google’s PageRank algorithm, so here are two problems:
(a) See [1] for one of the earliest papers on PageRank. Fittingly, we leave it as an
exercise to navigate the internet and find out more about PageRank.
(b) Find a new Mersenne prime. One approach to this problem is to visit https://
www.mersenne.org/, download GIMPS, and let your computer search for
Mersenne primes in the background.
1996: Comments
Lucas–Lehmer primality test. Trial division is impractical for determining
whether a large number $n$ is prime. It suffices to check for prime factors at most
$\sqrt{n}$ since in any factorization $n = ab$, not both factors can be larger than $\sqrt{n}$. If
$n \approx 10^{500}$, then we would need to divide $n$ by every prime at most $10^{250}$. The prime
number theorem tells us that there are approximately $1.74 \cdot 10^{247}$ such primes. How
bad is this? To put this in perspective, there are about $10^{82}$ atoms in the observable
universe [10]. If each atom were a universe itself, each atom of which was actually
a supercomputer capable of $10^{20}$ divisions per second and running since the big
bang (13.82 billion years ago), we would have only completed
\[ 10^{82} \times 10^{82} \times 10^{20} \times (13.82 \times 10^{9}) \times 365 \times 24 \times 60 \times 60 \approx 4.36 \times 10^{201} \]
trial divisions. So how can we possibly know, with absolute certainty, that a given
Mersenne number is truly prime?
The Lucas–Lehmer primality test, developed by Lucas in the 1870s and subsequently
refined by Derrick Henry Lehmer (1905–1991), is an efficient way to test Mersenne
numbers for primality. If $p$ is an odd prime and
\[ s_i = \begin{cases} 4 & \text{if } i = 0, \\ s_{i-1}^2 - 2 & \text{if } i \ge 1, \end{cases} \]
then $M_p$ is prime if and only if $s_{p-2} \equiv 0 \pmod{M_p}$. Fortunately, repeated squaring
can be performed rapidly in modular arithmetic, especially when powers of 2 are
involved. The ability to exponentiate huge numbers quickly is one of the key reasons
why the RSA cryptosystem is practical; see the 1977 entry and [3].
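The test is short enough to try directly. The following Python sketch (the helper names `lucas_lehmer` and `is_prime` are ours, and $p = 2$ is handled separately because the recurrence applies to odd primes) reduces modulo $M_p$ after every squaring and recovers the exponents $p \le 1{,}000$ listed in the introduction to this entry.

```python
def is_prime(n):
    """Trial division; adequate for the small exponents tested here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def lucas_lehmer(p):
    """Return True if M_p = 2^p - 1 is prime, for a prime exponent p."""
    if p == 2:
        return True  # M_2 = 3; the recurrence below is for odd primes
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m  # s_i = s_{i-1}^2 - 2, reduced modulo M_p
    return s == 0


mersenne_exponents = [p for p in range(2, 1001) if is_prime(p) and lucas_lehmer(p)]
print(mersenne_exponents)
```

Reducing modulo $M_p$ at every step keeps each intermediate value below $M_p$, so the cost is about $p$ modular squarings of $p$-bit numbers.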
Mersenne almost primes. We know that $n$ must be prime for $M_n$ to be
prime. For example, $M_2$, $M_3$, $M_5$, and $M_7$ are prime. However, $M_{11} = 2,047 =
23 \cdot 89$ is the product of exactly two primes. This leads to a natural question: how
often is $M_n$ the product of exactly two primes? More generally, for a fixed $k$ how
often is $M_n$ the product of exactly $k$ primes? There has been some progress on
these and related questions. For example, one can show that if $M_n$ has exactly two
distinct prime factors, then $n = 4$ or $6$, or there is an odd prime $p$ such that $n = p$
or $n = p^2$. Try to prove this; see [6] for a proof, as well as a characterization for
three distinct prime factors.
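Before attempting a proof, it can help to look at small cases. The Python sketch below (an illustration only; the function name `distinct_prime_factors` and the cutoff $n \le 30$ are our choices) factors $M_n$ by trial division and lists the exponents for which $M_n$ has exactly two distinct prime factors.

```python
def distinct_prime_factors(n):
    """Return the set of distinct primes dividing n, by trial division."""
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)  # whatever remains is a prime factor
    return factors


two_factor_exponents = [
    n for n in range(2, 31) if len(distinct_prime_factors(2**n - 1)) == 2
]
print(two_factor_exponents)
```

Every exponent in the resulting list is 4, 6, an odd prime, or the square of an odd prime, as the statement above predicts.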
The Sokal affair. The year 1996 marks the publication of the landmark paper
Transgressing the Boundaries: Towards a Transformative Hermeneutics of Quan-
tum Gravity by physicist Alan Sokal [8]. With such a lofty title, one might expect
the article to have deep philosophical reflections about the potential unification of
quantum mechanics and gravity, long considered a “holy grail” in physics. It con-
tains nothing of the sort but is instead composed of rambling and largely nonsensical
passages such as:
More recently, Lacan’s topologie du sujet has been applied fruitfully to
cinema criticism and to the psychoanalysis of AIDS. In mathematical
terms, Lacan2 is here pointing out that the first homology group of the
sphere is trivial, while those of the other surfaces are profound; and this
homology is linked with the connectedness or disconnectedness of the
surface after one or more cuts. Furthermore, as Lacan suspected, there
is an intimate connection between the external structure of the physical
world and its inner psychological representation qua knot theory: this
hypothesis has recently been confirmed by Witten’s derivation of knot
invariants (in particular the Jones polynomial) from three-dimensional
Chern–Simons quantum field theory.
This load of fetid dingo’s kidneys was published by Social Text, a leading journal
in postmodern cultural studies.3 What was Sokal’s motivation for this prank? He
provides the following explanation on his website [9]:
For some years I’ve been troubled by an apparent decline in the stan-
dards of intellectual rigor in certain precincts of the American academic
humanities. . . . So, to test the prevailing intellectual standards, I de-
cided to try a modest (though admittedly uncontrolled) experiment:
Would a leading North American journal of cultural studies. . . publish
an article liberally salted with nonsense if (a) it sounded good and (b)
it flattered the editors’ ideological preconceptions?
The answer, unfortunately, is yes. . . .
Throughout the article, I employ scientific and mathematical con-
cepts in ways that few scientists or mathematicians could possibly take
seriously. . . . I assert that Lacan’s psychoanalytic speculations have
been confirmed by recent work in quantum field theory. Even nonsci-
entist readers might well wonder what in heavens’ [sic] name quantum
field theory has to do with psychoanalysis; certainly my article gives
no reasoned argument to support such a link. . . .
In sum, I intentionally wrote the article so that any competent
physicist or mathematician (or undergraduate physics or math major)
would realize that it is a spoof. Evidently the editors of Social Text felt
comfortable publishing an article on quantum physics without bother-
ing to consult anyone knowledgeable in the subject.
2 See the comments for the 1991 entry for more about Jacques Lacan.
3 Continuing the holy grail theme, one might say that the editors of the journal “chose poorly.”
Bibliography
[1] S. Brin and L. Page, The anatomy of a large-scale hypertextual Web search engine, Computer
Networks and ISDN Systems 30 (1998), 107–117. http://infolab.stanford.edu/~backrub/
google.html.
[2] C. K. Caldwell, Mersenne Primes: History, Theorems and Lists, http://primes.utm.edu/
mersenne/index.html.
MR3098499
[4] Electronic Frontier Foundation, EFF Cooperative Computing Awards, https://www.eff.org/
awards/coop.
[5] N. Gridgeman, The search for perfect numbers, New Scientist 334 (1963), 86–88.
[6] A. Lemos and A. Cambraia Junior, On the number of prime factors of Mersenne numbers,
http://arxiv.org/abs/1606.08690.
[7] GIMPS Homepage. http://www.mersenne.org/.
[8] A. D. Sokal, Transgressing the Boundaries: Towards a Transformative Hermeneutics of Quantum
Gravity, Social Text (1996), no. 46/47, 217–252.
[9] A. D. Sokal, A Physicist Experiments With Cultural Studies, http://www.physics.nyu.edu/
faculty/sokal/lingua_franca_v4/lingua_franca_v4.html.
[10] Universe Today, How many atoms are there in the universe?, https://www.universetoday.
com/36302/atoms-in-the-universe/.
[11] Wikipedia, Lucas-Lehmer primality test, https://en.wikipedia.org/wiki/Lucas-Lehmer_primality_test.
[12] Wikipedia, Mersenne prime, https://en.wikipedia.org/wiki/Mersenne_prime.
[13] Wikipedia, Sokal affair, https://en.wikipedia.org/wiki/Sokal_affair.
1997
The Nobel Prize of Merton and Scholes
Introduction
In addition to applications in the physical sciences, mathematics plays a key
role in many other fields, including economics and finance. While there is no true
Nobel Prize in Economics, since 1968 the Royal Swedish Academy of Sciences has
awarded the Bank of Sweden Prize in Economic Sciences in Memory of Alfred
Nobel. This is widely regarded as the “Nobel Prize in Economics” by the general
public. The award announcement from 1997 [2] begins:
Robert C. Merton [1944– ] and Myron S. Scholes [1941– ] have, in
collaboration with the late Fischer Black [1938–1995], developed a pi-
oneering formula for the valuation of stock options. Their methodology
has paved the way for economic valuations in many areas. It has also
generated new types of financial instruments and facilitated more effi-
cient risk management in society.
Sadly, Black passed away before the announcement and did not receive the award
since it is not given posthumously.
They begin with a stochastic model
\[ \frac{dS}{S} = \mu \, dt + \sigma \, dW, \]
Figure 1. Brownian motion of a particle in one spatial dimension (vertical axis). The horizontal axis represents time.
in which S is the stock price at time t and W is a Wiener process, that is, Brownian
motion (Figure 1). Intuitively, this says that the infinitesimal rate of return on S
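has expected value $\mu \, dt$ and variance $\sigma^2 \, dt$. From here, one makes a few reasonable
assumptions, performs a number of manipulations, and deduces that
\[ \frac{\partial V}{\partial t} + \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + rS \frac{\partial V}{\partial S} - rV = 0, \]
in which $V$ is the option price function and $r$ is the risk-free interest rate. This is the famed Black–Scholes equation,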
which can be solved numerically when given suitable boundary conditions [1].
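To get a feel for the stochastic model $dS/S = \mu\,dt + \sigma\,dW$, one can simulate a sample path. The sketch below uses the simplest possible discretization (Euler–Maruyama, which is our choice and not something taken from the text), with made-up parameter values; the function name `simulate_stock_path` is hypothetical.

```python
import math
import random


def simulate_stock_path(s0, mu, sigma, t_end, steps, seed=0):
    """Simulate one path of dS/S = mu dt + sigma dW on [0, t_end]."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    dt = t_end / steps
    path = [s0]
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment, N(0, dt)
        # Euler-Maruyama step: S grows by the factor 1 + mu dt + sigma dW
        path.append(path[-1] * (1.0 + mu * dt + sigma * dw))
    return path


# Illustrative (made-up) parameters: one year of 252 trading days.
path = simulate_stock_path(s0=100.0, mu=0.05, sigma=0.2, t_end=1.0, steps=252, seed=1)
print(len(path), path[-1])
```

Each step multiplies the current price by a random factor whose mean is $1 + \mu\,dt$ and whose variance is $\sigma^2\,dt$, matching the description of the infinitesimal rate of return.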
When one considers the trillions of dollars traded annually in the global econ-
omy, the impact and importance of such mathematics is clear. See the 1962 entry
and the notes below for another connection between mathematics and Nobel Prize
winning economics applications.
No note on applications of mathematics in finance would be complete without
a mention of the dangers of using formulas in regimes in which they are not known
to hold. Famed investor Warren Buffett (1930– ) said in 2008:
I believe the Black–Scholes formula, even though it is the standard for
establishing the dollar liability for options, produces strange results
when the long-term variety are being valued. . . . The Black–Scholes
formula has approached the status of holy writ in finance . . . . If the
formula is applied to extended time periods, however, it can produce
absurd results. In fairness, Black and Scholes almost certainly under-
stood this point well. But their devoted followers may be ignoring
whatever caveats the two men attached when they first unveiled the
formula.
For a description of the faulty mathematics and incorrect assumptions that helped
instigate the “great recession,” see [3].

The density of a normal random variable with mean $\mu$ and variance $\sigma^2$ is
\[ f_{\mu,\sigma}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \]
A key ingredient in applications of the Black–Scholes model is the corresponding
cumulative distribution function¹
\[ F_{\mu,\sigma}(x) = \int_{-\infty}^{x} f_{\mu,\sigma}(t) \, dt. \]
Unfortunately, there is no closed-form expression for this function. Find a rapidly
convergent series expansion for $F_{\mu,\sigma}(x)$.
1997: Comments
Solution to the problem. For simplicity, we assume that $\mu = 0$ and $\sigma = 1$.
Hence we must compute
\[ F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \, e^{-t^2/2} \, dt; \]
see Figure 2. A natural approach is to use the series expansion for the exponential
function:
\[ F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \sum_{n=0}^{\infty} \frac{(-1)^n t^{2n}}{2^n n!} \, dt = \sum_{n=0}^{\infty} \frac{(-1)^n}{2^n n! \sqrt{2\pi}} \int_{-\infty}^{x} t^{2n} \, dt. \]
However, this interchange of integration and summation is not permissible since
each integral $\int_{-\infty}^{x} t^{2n} \, dt$ is infinite! How do we work around this problem?
1 This is related to the error function by $\frac{1}{2} \bigl[ 1 + \operatorname{erf}\bigl( \frac{x-\mu}{\sigma\sqrt{2}} \bigr) \bigr]$.
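Figure 2. Graphs of $f = f_{0,1}$ and $F = F_{0,1}$. The density $f$ reaches its peak at $f(0) = 1/\sqrt{2\pi} = 0.3989\ldots$. The cumulative distribution function $F$ satisfies $F(0) = \frac{1}{2}$, which reflects the symmetry of $f$ about the origin.
The symmetry of $f$ about the origin ensures that $F(0) = \frac{1}{2}$. If we let
\[ G(x) = \int_{0}^{x} \frac{1}{\sqrt{2\pi}} \, e^{-t^2/2} \, dt, \]
then
\[ F(x) = \begin{cases} \frac{1}{2} + G(x) & \text{if } x \ge 0, \\ \frac{1}{2} - G(|x|) & \text{if } x \le 0. \end{cases} \]
We can write this more compactly as
\[ F(x) = \frac{1}{2} + \operatorname{sgn}(x) \, G(|x|), \]
in which
\[ \operatorname{sgn}(x) = \begin{cases} 1 & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -1 & \text{if } x < 0, \end{cases} \]
is the sign function. Since the terms $t^{2n}/(2^n n!)$ are nonnegative and sum to $e^{t^2/2}$,
which has finite integral over $[0, x]$, the Fubini–Tonelli theorem implies that, for $x \ge 0$,
\[ G(x) = \int_{0}^{x} \frac{1}{\sqrt{2\pi}} \sum_{n=0}^{\infty} \frac{(-1)^n t^{2n}}{2^n n!} \, dt
= \sum_{n=0}^{\infty} \frac{(-1)^n}{2^n n! \sqrt{2\pi}} \int_{0}^{x} t^{2n} \, dt
= \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{2^n (2n+1) \, n! \sqrt{2\pi}}. \]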
For any fixed x, the series converges rapidly due to the factorial in the denominator.
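As a sanity check, the series can be summed numerically and compared with the error-function expression for the same quantity. The Python sketch below is only an illustration: the function name `normal_cdf_series` and the default of 60 terms are our choices, and the comparison relies on the standard-library function `math.erf`.

```python
import math


def normal_cdf_series(x, terms=60):
    """Approximate F(x) = 1/2 + sgn(x) G(|x|) using the series for G."""
    a = abs(x)
    total = 0.0
    power = a  # holds a^(2n+1) / (2^n n!), starting at n = 0
    for n in range(terms):
        total += (-1) ** n * power / (2 * n + 1)
        power *= a * a / (2 * (n + 1))  # advance from n to n + 1
    g = total / math.sqrt(2 * math.pi)
    sign = (x > 0) - (x < 0)
    return 0.5 + sign * g


for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    reference = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    print(x, normal_cdf_series(x), reference)
```

Because the terms alternate in sign, floating-point cancellation eventually limits the accuracy for large $|x|$; for moderate arguments such as those above the two columns agree closely.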
The Leontief input-output model. A much simpler economic model that
also won a Nobel Prize in Economics (1973) was developed by Wassily Leontief
(1906–1999) around 1949. Consider an economy that consists of $n$ sectors,
$S_1, S_2, \ldots, S_n$. Each sector interacts with the others in complicated ways. This
can be quantified with a detailed economic study and assembled in a consumption
matrix, an $n \times n$ matrix $C = [c_1 \; c_2 \; \ldots \; c_n]$ whose $(i, j)$ entry $C_{i,j}$ is the amount
$S_j$ consumes from $S_i$ in order to produce one unit of output. The column vector
$c_j$ contains the demands of the $j$th sector required to produce one unit of output.
In addition to the sectors $S_1, S_2, \ldots, S_n$, suppose that there is another part
of the economy, the open sector, that only consumes. It might represent consumer
demand, government consumption, surplus production, exports, and so forth.
Let $d \in \mathbb{R}^n$ be the final demand vector, which lists the amounts demanded from
$S_1, S_2, \ldots, S_n$ by the open sector. Is there a production vector $x \in \mathbb{R}^n$ that lists
the outputs $x_1, x_2, \ldots, x_n$ of sectors $S_1, S_2, \ldots, S_n$ for one year so that the amounts
produced balance the total demand for that production?
Since the sectors all interact with each other every step of the way, the relationship
between the final demand and production targets is complicated:
\[ \underbrace{\text{amt. produced}}_{x} = \underbrace{\text{intermediate demand}}_{?} + \underbrace{\text{final demand}}_{d}. \]
The intermediate demands upon each sector are given by
\[ x_1 c_1 + x_2 c_2 + \cdots + x_n c_n = [c_1 \; c_2 \; \ldots \; c_n] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = Cx. \]
Thus,
\[ \underbrace{\text{amt. produced}}_{x} = \underbrace{\text{intermediate demand}}_{Cx} + \underbrace{\text{final demand}}_{d}, \]
or, equivalently, $x = Cx + d$. This is equivalent to the system of linear equations
\[ (I - C)x = d, \]
which can be solved by any number of well-known numerical methods.2
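For a small economy the whole computation fits in a few lines. The sketch below solves $(I - C)x = d$ by Gaussian elimination with partial pivoting in plain Python (a simple stand-in, not the method Leontief used); the three-sector consumption matrix and demand vector are invented for illustration, and the function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def leontief_production(c, d):
    """Return the production vector x satisfying x = C x + d."""
    n = len(d)
    i_minus_c = [
        [(1.0 if i == j else 0.0) - c[i][j] for j in range(n)] for i in range(n)
    ]
    return solve_linear_system(i_minus_c, d)


# Made-up data: column j lists what sector j consumes per unit of output.
C = [[0.2, 0.1, 0.1],
     [0.3, 0.2, 0.1],
     [0.1, 0.3, 0.2]]
d = [50.0, 30.0, 20.0]
x = leontief_production(C, d)
print(x)
```

Substituting the result back into $x = Cx + d$ is an easy way to confirm that production balances intermediate plus final demand.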
It may come as some surprise that Leontief was awarded the highest prize in
economics for setting up the type of problem covered during the first day of an
elementary linear algebra course (see the 1940 entry for a story in a similar vein).
Although the underlying idea is embarrassingly simple, Leontief’s application in-
volved 500 sectors and an enormous amount of data collected from the U.S. Bureau
of Labor Statistics. Back in 1949, the solution of a 500 × 500 system of linear
equations required cutting-edge technology.
Perfect numbers. Let us be honest. We like number theory more than math-
ematical economics. There was too much to be said about Mersenne primes in our
1996 entry, so we have appropriated some space here to continue the discussion.
The Pythagoreans regarded the number 6 as special because it equals the sum
of its proper divisors: 1 + 2 + 3 = 6. The next largest numbers with this property
are 28, 496, and 8,128 since
28 = 1 + 2 + 4 + 7 + 14,
496 = 1 + 2 + 4 + 8 + 16 + 31 + 62 + 124 + 248,
8,128 = 1 + 2 + 4 + 8 + 16 + 32 + 64 + 127
+ 254 + 508 + 1,016 + 2,032 + 4,064.
One of the cornerstones of Pythagorean philosophy was the assignment of mystical
qualities to numbers. They called numbers like 6, 28, 496, and 8,128 perfect
numbers. Later thinkers like Augustine of Hippo (354–430) and Alcuin of York
(ca. 735–804) celebrated the special nature of perfect numbers. In the City of God
(Part XI, Chapter 30), Augustine writes:
These works are recorded to have been completed in six days (the same
day being six times repeated), because six is a perfect number,—not
because God required a protracted time, as if He could not at once
create all things, which then should mark the course of time by the
movements proper to them, but because the perfection of the works
was signified by the number six. For the number six is the first which
is made up of its own parts, i.e., of its sixth, third, and half, which are
respectively one, two, and three, and which make a total of six. . . . And,
therefore, we must not despise the science of numbers, which, in many
passages of holy Scripture, is found to be of eminent service to the
careful interpreter.
The fact that it takes twenty-eight days for the moon to travel around the Earth
was also seen by many early thinkers to confirm the importance of perfect numbers.
2 The QR-decomposition (see the 1959 entry) is particularly effective here. Write $I - C = QR$,
in which $Q$ is an orthogonal matrix and $R$ is upper triangular. The given system $QRx = d$ is
equivalent to $Rx = Q^T d$, which has an upper-triangular coefficient matrix and hence can be solved
via back substitution. This approach is more stable than Gaussian elimination, which is typically
promoted in a first course on linear algebra.
In Book IX (Proposition 36) of the Elements, Euclid proved that if $2^k - 1$ is
prime, then $n = 2^{k-1}(2^k - 1)$ is a perfect number. For example,
\[ \begin{aligned}
2^2 - 1 = 3 \text{ is prime} &\implies 2^1(2^2 - 1) = 2 \cdot 3 = 6 \text{ is perfect}, \\
2^3 - 1 = 7 \text{ is prime} &\implies 2^2(2^3 - 1) = 4 \cdot 7 = 28 \text{ is perfect}, \\
2^5 - 1 = 31 \text{ is prime} &\implies 2^4(2^5 - 1) = 16 \cdot 31 = 496 \text{ is perfect}.
\end{aligned} \]
Over 2,000 years later, Euler proved the converse: every even perfect number is of
Euclid’s form. Thus, an even number $n$ is perfect if and only if
\[ n = 2^{k-1}(2^k - 1), \]
in which $2^k - 1$ is prime.
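The Euclid–Euler characterization is easy to test on small cases. The Python sketch below (helper names are ours) builds perfect numbers from Mersenne primes with $k \le 13$ and compares them with a brute-force search for even perfect numbers below 10,000.

```python
def is_prime(n):
    """Trial division primality check."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def sum_proper_divisors(n):
    """Sum of the divisors of n that are smaller than n."""
    total = 1 if n > 1 else 0
    d = 2
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d  # add the complementary divisor once
        d += 1
    return total


# Euclid's construction: 2^(k-1) (2^k - 1) whenever 2^k - 1 is prime.
euclid_perfect = [2 ** (k - 1) * (2**k - 1) for k in range(2, 14) if is_prime(2**k - 1)]

# Brute force: even n below 10,000 equal to the sum of their proper divisors.
brute_force_perfect = [n for n in range(2, 10_000, 2) if sum_proper_divisors(n) == n]

print(euclid_perfect)
print(brute_force_perfect)
```

The two lists agree on every entry below 10,000, and the construction also yields the fifth perfect number, which comes from $k = 13$.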
What about odd perfect numbers? It is known that an odd perfect number must
be larger than $10^{1500}$ and that it must have at least ten distinct prime factors. As
James Joseph Sylvester (1814–1897) noted:
. . . the existence of [an odd perfect number]—its escape, so to say, from
the complex web of conditions which hem it in on all sides—would be
little short of a miracle.
Since Sylvester’s time, many more obscure restrictions upon odd perfect numbers
have emerged. For example, they must be congruent to 1 (mod 12), 117 (mod 468),
or 81 (mod 324) and cannot be divisible by 105 [5, 6]. Most mathematicians believe
that odd perfect numbers do not exist, although we remain unable to prove it.
Bibliography
[1] J. Fogler, Options Pricing: Black–Scholes Model, https://www.investopedia.com/
university/options-pricing/black-scholes-model.asp.
[2] The Royal Swedish Academy of Sciences, Press Release (October 14, 1997), http://www.
nobelprize.org/nobel_prizes/economic-sciences/laureates/1997/press.html.
[3] F. Salmon, Recipe for disaster: the formula that killed Wall Street, Wired, February 23, 2009.
https://www.wired.com/2009/02/wp-quant/.
[4] Wikipedia, Black–Scholes model, https://en.wikipedia.org/wiki/Black-Scholes_model.
[5] Wikipedia, Perfect number, https://en.wikipedia.org/wiki/Perfect_number.
[6] Wolfram Mathworld, Odd perfect number, http://mathworld.wolfram.com/
OddPerfectNumber.html.
1998
The Kepler Conjecture
Introduction
What is the densest way to pack spheres into n-dimensional space? In one
dimension, each sphere is a line segment of length two and hence the densest packing
consists of infinitely many line segments placed end to end. Thus, the packing
density in one dimension is 1. In two dimensions the problem is somewhat harder.
Here the “spheres” are disks of radius one. Joseph-Louis Lagrange (1736–1813)
proved in 1773 that the hexagonal lattice packing (see Figure 1) is the densest
possible lattice-based sphere packing in the plane. Its density is
\[ \frac{\pi\sqrt{3}}{6} \approx 0.9069, \]
6
so about 90.7% of the plane is covered. Although Axel Thue had provided a flawed
proof back in 1890, a complete proof that the hexagonal lattice packing is the
densest of all possible packings, including irregular, non-lattice-based packings,
came only in 1940, when it was established by László Fejes Tóth (1915–2005) [14].
Figure 1. (left) The densest sphere packing in two dimensions is the hexagonal lattice (honeycomb) packing. It covers approximately 90.7% of the plane. (right) The square lattice packing has density $\pi/4 \approx 0.7854$, so only around 78.5% of the plane is covered.
In 1611, Johannes Kepler (1571–1630) conjectured that the densest packing of
identical spheres in three-dimensional space has density
\[ \frac{\pi}{3\sqrt{2}} \approx 0.74048; \tag{1998.1} \]
that is, the spheres occupy about 74.05% of the available space. This is the famed
Kepler conjecture. What made Kepler think of the number (1998.1)?
There are two familiar sphere packings in three dimensions: the hexagonal close
and cubic close packings; see Figure 2. Both of these packings have density equal to
(1998.1) and it seems impossible to do better.1 Kepler was aware of the cubic close
packing and conjectured that its density cannot be beaten [9, 10]. The hexagonal
close packing was only identified as a different packing by William Barlow (1845–
1934) in 1883 [2].
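One way to arrive at the number (1998.1) is to compute the density of the cubic close packing directly. The derivation below is a sketch that assumes the standard description of this packing as a face-centered cubic arrangement (a description not spelled out in the text): spheres of radius $r$ are centered at the corners and face centers of a cube of side $a$, they touch along a face diagonal, so $4r = a\sqrt{2}$, and each cube contains $8 \cdot \tfrac{1}{8} + 6 \cdot \tfrac{1}{2} = 4$ spheres' worth of volume.

```latex
\[
  \text{density}
  = \frac{4 \cdot \frac{4}{3}\pi r^3}{a^3}
  = \frac{\frac{16}{3}\pi r^3}{(2\sqrt{2}\,r)^3}
  = \frac{\frac{16}{3}\pi r^3}{16\sqrt{2}\,r^3}
  = \frac{\pi}{3\sqrt{2}}
  \approx 0.74048 .
\]
```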
The problem was brought to Kepler’s attention by Thomas Harriot (ca. 1560–
1621), who had been asked by Walter Raleigh (1554–1618) about the best way
to stack cannonballs; see Figure 3. The problem was posed earlier (1611) than
Fermat’s last theorem (1637) and was solved shortly after it (1998 versus 1995),
making it an open, active problem for a longer period of time.
A proof of the conjecture was announced by Thomas C. Hales and his stu-
dent Samuel P. Ferguson in 1998 (see [13, 15] for summaries of the key ideas).
Although it required a large number of computer-assisted computations, the proof
did not spark nearly the level of philosophical debate that the proof of the four
color theorem did over two decades earlier (see the 1976 entry).
[T]he proof was a 300-page monster that took 12 reviewers four years
to check for errors. Even when it was published in the journal Annals
of Mathematics in 2005, the reviewers could say only that they were
“99 per cent certain” the proof was correct. [1]
Although the final paper was eventually published in a top peer-reviewed journal [4],
the entire process prompted an important question. How does one referee
an argument when a significant part of it is the output of
tens of thousands of lines of code? To address this, Hales began a collaborative
project in 2003 to create a formal proof verifiable through automated proof check-
ing software. Called Project Flyspeck (the “F,” “P,” and “K” stand for a “Formal
Proof of Kepler”), it was successfully completed in 2014:
So in 2003, Hales started the Flyspeck project, an effort to vindicate
his2 proof through formal verification. His team used two formal proof
software assistants called Isabelle and HOL Light, both of which are
built on a small kernel of logic that has been intensely scrutinised for
any errors—this provides a foundation which ensures the computer
can check any series of logical statements to confirm they are true
. . . the Flyspeck team announced they had finally translated the dense
mathematics of Hales’s proof into computerised form, and verified that
it is indeed correct.
1 There are uncountably many packings that do just as well: study the key difference between
the two packings in Figure 2 and see if you can use it to build more packings of the same density.
2 Actually, the proof in the Flyspeck project involves a different local inequality based on
later work of Christian Marchal [12]. In converting the proof ideas to formal form, Hales took
advantage of this to get a local inequality that was cleaner and easier to prove by computer [11].
[Figure 2 panels: hexagonal close packing (above), hexagonal close packing (front), cubic close packing (above), cubic close packing (front).]
Figure 2. The cubic close and hexagonal close packings both have
density $\pi/(3\sqrt{2}) \approx 0.74048$. The difference between the two
packings is in the relative orientation of every other layer. The
spheres in the hexagonal packing lie directly above the spheres
two layers below. The spheres in the cubic close packing do not:
consider the relative orientation of the green and blue triangles
suggested by the top and bottom layers.
“This technology cuts the mathematical referees out of the verification
process,” says Hales. “Their opinion about the correctness of
the proof no longer matters.” [1]

For 1 ≤ n ≤ 20, determine the minimal side length R(n) of a cube in which
one can completely pack n unit-radius spheres. If you cannot get exact answers,
determine upper and lower bounds.
Figure 3. Packings of cannonballs and snowballs. (a) A cubic close packing of cannonballs at Fort Monroe in Hampton, Virginia, in 1861 (image public domain). (b) Snowballs packed in hexagonal close (front) and cubic close packings (rear) (image public domain).
Figure 4. What proportion of an n-dimensional cube with side length 2 is taken up by the n-dimensional unit sphere? (a) In three dimensions, the sphere occupies approximately 52.36% of the box that contains it. (b) How does the distance from the corner of the cube to the nearest point of the sphere change as the dimension increases? [Panel (b) labels the points $(0, 0)$, $(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}})$, and $(1, 1)$.]
1998: Comments
Cubes and spheres. What fraction of the n-dimensional cube (with sides of
length 2) is taken up by the n-dimensional unit sphere? In two dimensions the area
of the circle is π, giving a ratio of π/4 ≈ 0.785398, while in three dimensions the
volume of the sphere is 4π/3, giving a ratio of π/6 ≈ 0.523599; see Figure 4(a).
One can show that in n dimensions the sphere has volume
\[ V_n = \frac{\pi^{n/2}}{\Gamma(\frac{n}{2} + 1)}, \]
Table 1. The ratio of the volume of the n-dimensional sphere to
the n-dimensional cube tends to zero rapidly as n tends to infinity.

 n    r(n)              r(n) approx
 1    1                 1.
 2    π/4               0.785398
 3    π/6               0.523599
 4    π^2/32            0.308425
 5    π^2/60            0.164493
 6    π^3/384           0.0807455
 7    π^3/840           0.0369122
 8    π^4/6144          0.0158543
 9    π^4/15120         0.0064424
10    π^5/122880        0.00249039
11    π^5/332640        0.000919973
12    π^6/2949120       0.000325992
13    π^6/8648640       0.000111161
14    π^7/82575360      0.0000365762
15    π^7/259459200     0.0000116407

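in which
\[ \Gamma(s) = \int_{0}^{\infty} e^{-x} x^{s-1} \, dx, \qquad \operatorname{Re} s > 0, \]
is the gamma function. For positive integers $n$, we have
\[ \Gamma(x) = \begin{cases} (n-1)! & \text{if } x = n, \\[4pt] \sqrt{\pi} \, \dfrac{(2n-1)!!}{2^{n}} & \text{if } x = n + \frac{1}{2}, \end{cases} \]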
in which n!! denotes the product of every other term of the corresponding factorial.
For example, 6!! = 6 · 4 · 2 and 7!! = 7 · 5 · 3 · 1.
Using Stirling’s formula (see the comments for the 1934 entry)
\[ n! \approx n^n e^{-n} \sqrt{2\pi n}, \]
it follows that the ratio
\[ r(n) = \frac{\pi^{n/2} / \Gamma(\frac{n}{2} + 1)}{2^n} \]
of the volumes of the n-dimensional sphere and cube tends to zero rapidly; see
Table 1. Thus, in higher dimensions the sphere occupies very little of the cube.
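The decimal values in Table 1 can be reproduced with the standard-library gamma function. This is a minimal sketch; the function name `sphere_to_cube_ratio` is ours.

```python
import math


def sphere_to_cube_ratio(n):
    """Volume of the unit n-sphere divided by the volume of the cube of side 2."""
    sphere_volume = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    return sphere_volume / 2**n


for n in range(1, 16):
    print(n, sphere_to_cube_ratio(n))
```

The printed ratios decrease steadily with $n$, in line with the closed forms $\pi/4$, $\pi/6$, $\pi^2/32$, and so on.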
How can this be? Our low-dimensional intuition misleads us in higher dimensions.
For example, the point
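\[ \frac{1}{\sqrt{n}} (1, 1, \ldots, 1) \in \mathbb{R}^n \]
lies on the n-dimensional sphere. Its distance to the corner $(1, 1, \ldots, 1)$ of the
n-dimensional cube is
\[ \sqrt{\sum_{i=1}^{n} \Bigl( 1 - \frac{1}{\sqrt{n}} \Bigr)^{2}} = \sqrt{n} \Bigl( 1 - \frac{1}{\sqrt{n}} \Bigr) = \sqrt{n} - 1, \]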
which tends to infinity! This unexpected behavior is not evident in Figure 4(b).
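Remark on the problem. One can show that $R(1) = 2$, and we think that
$R(2) = 2 + \frac{2}{\sqrt{3}}$ (place the two spheres in opposite corners of the cube, so that
their centers are the ends of the diagonal of a smaller cube of side $R(2) - 2$). Then things rapidly
get tricky. There are some $n \le 20$ for which
the exact answer is unknown. Some records for $1 \le n \le 32$ are in [3, 8].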
Bibliography
[1] J. Aron, Proof confirmed of 400-year-old fruit-stacking problem, New Scientist (August 12,
2014), https://www.newscientist.com/article/dn26041-proof-confirmed-of-400-year-
old-fruit-stacking-problem.
[2] W. Barlow, Probable nature of the internal symmetry of crystals, Nature 29 (1883), 186–188.
[3] Th. Gensane, Dense packings of equal spheres in a cube, Electron. J. Combin. 11
(2004), no. 1, Research Paper 33, 17. http://www.combinatorics.org/ojs/index.php/eljc/
article/view/v11i1r33/pdf. MR2056085
[4] T. C. Hales, A proof of the Kepler conjecture, Ann. of Math. (2) 162 (2005), no. 3, 1065–
1185, DOI 10.4007/annals.2005.162.1065. http://annals.math.princeton.edu/2005/162-3/
p01. MR2179728
[5] T. C. Hales, Historical overview of the Kepler conjecture, Discrete Comput. Geom. 36 (2006),
no. 1, 5–20, DOI 10.1007/s00454-005-1210-2. http://link.springer.com/article/10.1007
%2Fs00454-005-1210-2. MR2229657
[6] T. C. Hales and S. P. Ferguson, A formulation of the Kepler conjecture, Discrete Comput.
Geom. 36 (2006), no. 1, 21–69, DOI 10.1007/s00454-005-1211-1. http://link.springer.com/
article/10.1007%2Fs00454-005-1211-1. MR2229658
[7] T. C. Hales, J. Harrison, S. McLaughlin, T. Nipkow, S. Obua, and R. Zumkeller, A revi-
sion of the proof of the Kepler conjecture, Discrete Comput. Geom. 44 (2010), no. 1, 1–34,
DOI 10.1007/s00454-009-9148-4. http://link.springer.com/article/10.1007%2Fs00454-
009-9148-4. MR2639816
[8] A. Joós, On the packing of fourteen congruent spheres in a cube, Geom. Dedicata 140
(2009), 49–80, DOI 10.1007/s10711-008-9308-3. http://link.springer.com/article/10.
1007%2Fs10711-008-9308-3. MR2504734
[9] C. Hardie, translation of J. Kepler’s Strena, seu de nive sexangula, Oxford University Press,
2014.
[10] J. Kepler, Strena, seu de nive sexangula, Francofurti ad Moenum apud Godfefridum Tam-
pach, 1611.
[11] J. C. Lagarias, Dense sphere packings: a blueprint for formal proofs [book review
of MR3012355], Bull. Amer. Math. Soc. (N.S.) 53 (2016), no. 1, 159–166, DOI
10.1090/bull/1502. MR3443950
[12] C. Marchal, Study of the Kepler’s conjecture: the problem of the closest packing, Math. Z.
267 (2011), no. 3-4, 737–765, DOI 10.1007/s00209-009-0644-2. MR2776056
[13] J. Lagarias (ed.), The Kepler Conjecture: The Hales-Ferguson Proof, Springer-Verlag, 2011.
[14] L. F. Tóth, Über die dichteste Kugellagerung, Math. Z. 48 (1940), 676–684.
1999
Baire Category Theorem
Introduction
The Baire category theorem, a seminal result in analysis, was published by the
French mathematician René-Louis Baire (1874–1932) in his 1899 doctoral thesis
Sur les fonctions de variables réelles. It is the main ingredient in
the proof of three fundamental theorems in functional analysis: the open mapping
theorem, the closed graph theorem, and the uniform boundedness principle [3].
Because of its numerous applications and continued use in modern mathematics,
its centennial merits special attention.
A few definitions are necessary in order to state this important theorem. A
subset A of a topological space (see the comments for the 1955 entry) is nowhere
dense if its closure $A^-$ has empty interior, that is, if $(A^-)^\circ = \emptyset$. Figure 1 shows
the closure and interior of a set in $R^2$. A subset A of a topological space is of the
first category if it can be written as the countable union of nowhere dense sets;
otherwise A is of the second category. The classical version of the Baire category
theorem says that a complete metric space is of the second category in itself [1–3].
Before proceeding, we should admit that Baire’s terminology is unenlightening
and dated. To add to the confusion, it has nothing to do with category theory,
an important branch of mathematics that originated in the latter half of the 20th
century. A more modern statement of Baire’s theorem has two parts:
(a) In a complete metric space, the countable intersection of open dense sets is dense.
(b) A complete metric space is not the countable union of nowhere dense sets.
The theorem also applies to topological spaces that are homeomorphic (see the
comments for the 1917 entry) to complete metric spaces.
Figure 1. A set A in $R^2$ (left), its closure $A^-$ (middle), and its interior $A^\circ$ (right).
Figure 2. A fat Cantor set F obtained by removing the middle
fifth of each successive interval. Like the standard middle-third
Cantor set, F is uncountable, compact, and nowhere dense. However,
F has Lebesgue measure 1/3.
What is the big deal about the Baire category theorem? As a warmup, here is a
one-line proof that R is uncountable (see the 1918 entry). If $R = \{a_1, a_2, \ldots\}$, then
$R = \bigcup_{n=1}^{\infty} \{a_n\}$ is the countable union of nowhere dense sets, which contradicts
(b) since R is complete. A similar argument shows that the Cantor set (see the
comments for the 1917 entry) is uncountable. Since the Cantor set is compact, it
is complete and hence it cannot be the countable union of singletons.
Here is another cute application. Let F be a fat Cantor set, that is, a Cantor-like
set with positive Lebesgue measure; see Figure 2. Then R is not the countable
union of translated copies of F . We cannot appeal to a measure-theoretic argument
here: since F has positive measure, a countable union of translates of F may well
have infinite Lebesgue measure. Baire’s theorem comes to the rescue. Like the
standard Cantor set, F is nowhere dense, and so is each of its translates, since
translation is a homeomorphism of R. Thus, (b) tells us that R is not the union
of countably many translates of F .
Why is the Baire category theorem so powerful? What is going on underneath
the hood? The proof of Baire’s theorem hinges in a crucial manner upon the axiom
of choice (see the comments below and in the 1964 entry).
Our problem for this year is a typical application of the Baire category theorem
to functional analysis. It may not be obvious how to apply the theorem to the
following problem. Here is a hint: look at finite-dimensional subspaces!

Proposed by Mihai Stoiciu, Williams College.
Let C[x] be the vector space of polynomials in one variable with complex coefficients
and let ‖·‖ : C[x] → [0, ∞) be a norm on C[x]. Use the Baire category
theorem to prove that C[x] is not complete with respect to the induced metric.
That is, prove that (C[x], ‖·‖) is not a Banach space.
1999: Comments
Axiom of choice. The proof of the Baire category theorem, which can be
found in most real analysis textbooks, involves the subtle use of the axiom of
choice (AC). See the comments for the 1964 entry for a statement of the axiom
and a few general comments. We are interested here in discussing a few equivalent
formulations of AC. To continue our discussion, we require a few definitions.
Figure 3. A Hasse diagram illustrating the poset P({1, 2, 3}), ordered by ⊆. From top to bottom, its levels are {1, 2, 3}; then {1, 2}, {1, 3}, {2, 3}; then {1}, {2}, {3}; then ∅.
A partial order on a set A is a relation ≤ on A that is

(a) (reflexive) a ≤ a,
(b) (antisymmetric) a ≤ b and b ≤ a imply a = b,
(c) (transitive) a ≤ b and b ≤ c imply a ≤ c.
A partially ordered set is called a poset. The symbols <, ≥, and > are defined in
terms of ≤ in the natural way.
In a poset, two elements need not be comparable; that is, there may exist
a, b ∈ A such that neither a ≤ b nor b ≤ a holds. A poset is totally ordered if for
every a, b ∈ A, either a ≤ b or b ≤ a. A chain is a totally ordered subset of a poset
(see the 1918 problem). A totally ordered poset is well-ordered if each nonempty
subset of A has a smallest element with respect to ≤.
The powerset P({1, 2, 3}), when endowed with the partial order ⊆, is a poset;
see Figure 3. This poset has a unique largest element, {1, 2, 3}, and a unique
smallest element, ∅. The elements {1} and {2} are not comparable; neither is
greater than or equal to the other. The set

{∅, {1}, {1, 2}, {1, 2, 3}}
is a chain in P({1, 2, 3}) that is well-ordered.
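The claims about P({1, 2, 3}) above are small enough to check by machine. The following Python sketch is only an illustration: the helper names `powerset`, `comparable`, and `is_chain` are our own and do not come from the text. It models subsets as frozensets, for which `<=` is the inclusion order ⊆.

```python
from itertools import combinations


def powerset(base):
    """Return all subsets of `base` as frozensets."""
    items = sorted(base)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def comparable(a, b):
    """Two subsets are comparable under inclusion if one contains the other."""
    return a <= b or b <= a


def is_chain(sets):
    """A chain is a family in which every pair of members is comparable."""
    return all(comparable(a, b) for a, b in combinations(sets, 2))


P = powerset({1, 2, 3})
chain = [frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3})]

print(len(P))                                      # 8 subsets in total
print(comparable(frozenset({1}), frozenset({2})))  # False: {1} and {2} are not comparable
print(is_chain(chain))                             # True: the displayed set is a chain
print(is_chain(P))                                 # False: the whole poset is not totally ordered
```

Since the chain is finite and totally ordered, it is automatically well-ordered, which is why no separate check for that property appears.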

Many useful results in various branches of mathematics are known to be equiv-
alent under the axioms of Zermelo–Fraenkel set theory:
(a) Axiom of choice. If $\{X_\alpha\}_{\alpha \in I}$ is a nonempty collection of nonempty sets,
then there is a function $f : I \to \bigcup_{\alpha \in I} X_\alpha$ such that $f(\alpha) \in X_\alpha$ for every $\alpha \in I$.
(b) Well-ordering principle. Every set can be well-ordered.1
(c) Cardinal comparability. If A, B are sets, then there is an injection f : A →
B or an injection g : B → A.
(d) Zorn’s lemma. Every nonempty partially ordered set in which every chain
has an upper bound contains at least one maximal element.
1 The order produced by the well-ordering principle need not correspond to any sort of natural
order structure that A possesses. The axiom of choice implies that R can be well-ordered, but the
order has no relation to the standard order on R.
(e) Hausdorff maximality principle. Every partially ordered set has a maxi-
mal totally ordered subset.
(f) The Cartesian product of nonempty sets is nonempty.
(g) Every vector space has a basis.
(h) Every poset has a maximal antichain.2
(i) Every connected graph has a spanning tree.3
The following common theorems require the axiom of choice or some weaker
variant of it such as the axiom of countable choice (in which countably many arbi-
trary choices can always be made):
• A countable union of countable sets is countable.
• Every infinite set has a countably infinite subset.
• Every field has an algebraic closure.4
• Nielsen–Schreier theorem. Every subgroup of a free group is free.
• Baire category theorem. In a complete metric space, the countable intersec-
tion of open, dense sets is dense.
The bizarre results that follow from the axiom of choice, coupled with its intu-
itive and useful consequences, spur one to ask if AC is true or false. This is, in a
precise sense, a question that cannot be answered: Gödel and Cohen proved that
AC is independent of Zermelo–Fraenkel set theory.
A set of axioms is consistent if there does not exist a statement S such that both
S and its negation ¬S are provable from the axioms; that is, the axioms are not self-
contradictory. Gödel’s second incompleteness theorem (see the 1929 entry) asserts
that no consistent, “sufficiently complicated” axiomatic system, including Zermelo–Fraenkel
set theory (ZF), can prove its own consistency.
Outside of logic and set theory, few working mathematicians concern themselves
with the consistency of ZF. Almost everyone believes that ZF is consistent, but
Gödel’s theorem tells us that we cannot hope to prove its consistency without
recourse to a more powerful axiom system; then we face the problem of proving
that that system is consistent!
Think of systems of axioms as “operating systems” for software. Most of mod-
ern mathematics “runs under” ZFC, the Zermelo–Fraenkel axioms augmented with
the axiom of choice. ZFC is sufficient to “run” the software that most average
“users” (mathematicians, statisticians, physicists, computer scientists, and so forth)
need. It has not “crashed” (been proven inconsistent) yet, but no one knows if ZFC
is “crash-proof” (consistent). There are other, more exotic operating systems out
2 An antichain is a subset of a poset with the property that any two distinct elements in the
subset are not comparable.
3 A spanning tree in a graph G is a connected subgraph that contains every vertex of G and
which contains no cycles.

4 A field F is algebraically closed if every nonconstant polynomial with coefficients in F has a root in F.
The standard example is the complex field C. That C is algebraically closed is the fundamental
theorem of algebra.
there, such as ZFC augmented by certain large cardinal axioms, but mostly these
are for “power users” such as set theorists and logicians. The average user is content
running on ZFC and rarely thinks about operating systems, if at all.
Bibliography
[1] G. B. Folland, Real analysis: Modern techniques and their applications, 2nd ed., Pure and
Applied Mathematics (New York), A Wiley-Interscience Publication, John Wiley & Sons,
Inc., New York, 1999. MR1681462
[2] S. H. Jones, Applications of the Baire category theorem, Real Anal. Exchange 23 (1997/98),
no. 2, 363–394. https://projecteuclid.org/euclid.rae/1337001353. MR1640007
[3] T. Tao, The Baire category theorem and its Banach space consequences, http://terrytao.
wordpress.com/2009/02/01/245b-notes-9-the-baire-category-theorem-and-its-banach-
space-consequences.
2000
R
Introduction
This is another entry for which there were at least two good options. The
Clay Millennium Problems were one natural candidate; we briefly discuss them
in the comments below. This year’s winner is one of the most popular statistical
programming languages and environments: R.
R was created in 1993 by Ross Ihaka (1954– ) and Robert Gentleman (1959– )
at the University of Auckland, New Zealand. Its name is both a reference to
the first names of its inventors and to the underlying S programming language
that was developed at Bell Labs in the 1970s [13]. R, which is open source and
freely available, is widely used in industry and academia to perform statistical
computations. There are numerous developers and thousands of useful packages
available online.
Version 1.0.0 of R was released on February 29, 2000. This was the first version
considered stable enough for general use [5]:
The release of a current major version indicates that we believe that R
has reached a level of stability and maturity that makes it suitable for
production use. Also, the release of 1.0.0 marks that the base language
and the API for extension writers will remain stable for the foreseeable
future. In addition we have taken the opportunity to tie up as many
loose ends as we could.
In the comments to the 1953 entry, we saw how Andrey Markov developed
Markov chains to analyze the writing of Alexander Pushkin. What about the cre-
ation of literature? A little probability theory ensures that an immortal monkey
who pounds away randomly at a typewriter for all eternity will almost surely pro-
duce the complete works of William Shakespeare1 , along with the true version of
his lost play Love’s Labour’s Won, along with many false versions2 . What about
more sensible applications of mathematics to literature? For example, we might
wish to determine if a certain passage was written by the purported author. Has
an author’s style changed over time? All of these questions involve culling large
sets of linguistic data, then parsing and analyzing it.
Maciej Eder, Jan Rybicki, and Mike Kestemont created an R package to per-
form such analyses [4]. The motivating examples they consider range from a
pseudonymously published work written by J. K. Rowling (1965– ) to the alleged
original version of To Kill a Mockingbird by Harper Lee (1926–2016). Their paper
1 “Ford!” he said, “there’s an infinite number of monkeys outside who want to talk to us
about this script for Hamlet they’ve worked out” [1].

2 Including the fanciful script of the 2007 Doctor Who episode The Shakespeare Code.
Figure 1. Bootstrap Consensus Tree for Harper Lee, from [4].
is full of code and detailed textual analyses (see Figure 1) and gives a small glimpse
of what one can do with R:
This software paper describes ‘Stylometry with R’ (stylo), a flexible
R package for the high-level analysis of writing style in stylometry.
Stylometry (computational stylistics) is concerned with the quantita-
tive study of writing style, e.g. authorship verification, an application
which has considerable potential in forensic contexts, as well as his-
torical research. In this paper we introduce the possibilities of stylo
for computational text analysis, via a number of dummy case studies
from English and French literature. We demonstrate how the package
is particularly useful in the exploratory statistical analysis of texts,
e.g. with respect to authorial writing style. Because stylo provides an
attractive graphical user interface for high-level exploratory analyses,
it is especially suited for an audience of novices, without programming
skills (e.g. from the Digital Humanities). More experienced users can
benefit from our implementation of a series of standard pipelines for
text processing, as well as a number of similarity metrics.

To be scientifically literate these days one must understand statistics and be
able to write simple programs to cull and analyze data. Download R and analyze
a real-world problem. For example, look at all batters in baseball with (a) bases
empty and (b) just a runner on first and no outs. Are the batting averages in the
two cases statistically different? To solve this problem you will need to find game
data online and reconstruct the games to get the game state of each at bat.
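Once the at bats have been sorted into the two game states, one simple way to compare the batting averages is a two-proportion z-test. The sketch below is written in Python rather than R purely for illustration (R's built-in `prop.test` performs a closely related test). The function name and the hit and at-bat counts are hypothetical placeholders, not real baseball data, and the test treats at bats as independent trials, which is itself an assumption worth questioning.

```python
import math


def two_proportion_z_test(hits_a, at_bats_a, hits_b, at_bats_b):
    """Two-sided z-test for the equality of two proportions.

    Returns the z statistic and the p-value under the normal approximation.
    """
    p_a = hits_a / at_bats_a
    p_b = hits_b / at_bats_b
    # Pooled estimate of the common proportion under the null hypothesis.
    pooled = (hits_a + hits_b) / (at_bats_a + at_bats_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / at_bats_a + 1 / at_bats_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only:
# (a) bases empty, (b) runner on first with no outs.
z, p = two_proportion_z_test(2600, 10000, 2750, 10000)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

A small p-value would suggest that the two averages differ by more than chance alone explains. With real data one would also want to account for the fact that the same batters appear in both groups.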
2000: Comments
Monkey business. On the theme of monkey-generated literature, we cannot
pass up the opportunity to recount the bizarre story of Pierre Brassau. In 1964,
tabloid journalist Åke Axelsson had a four-year-old chimpanzee produce a series
of paintings that were later exhibited in the Gallerie Christinae in Göteborg under
the pretense that they were the work of “Pierre Brassau,” an unheralded French
painter. One critic applauded the work: “Brassau paints with powerful strokes, but
also with clear determination. His brush strokes twist with furious fastidiousness.
Pierre is an artist who performs with the delicacy of a ballet dancer” [12]. Needless
to say, many in the Swedish art world were not amused by the hoax.
Maximum amusement. On the theme of hoaxes (make sure to check out the
comments for the 1996 entry), MIT students Jeremy Stribling, Maxwell Krohn, and
Daniel Aguayo wrote SCIgen, “a program that generates random Computer Science
research papers, including graphs, figures, and citations” [9]. It produced the now-
infamous paper Rooter: A Methodology for the Typical Unification of Access Points
and Redundancy [8], which opens with the immortal lines:
Many scholars would agree that, had it not been for active networks,
the simulation of Lamport clocks might never have occurred. The
notion that end-users synchronize with the investigation of Markov
models is rarely outdated. A theoretical grand challenge in theory
is the important unification of virtual machines and real-time theory.
To what extent can web browsers be constructed to achieve this pur-
pose? Certainly, the usual methods for the emulation of Smalltalk
that paved the way for the investigation of rasterization do not apply
in this area. In the opinions of many, despite the fact that conventional
wisdom states that this grand challenge is continuously answered by
the study of access points, we believe that a different solution is nec-
essary. It should be noted that Rooter runs in Ω(log log n). Certainly,
the shortcoming of this type of solution, however, is that compilers
and superpages are mostly incompatible. Despite the fact that similar
methodologies visualize XML, we surmount this issue without synthe-
sizing distributed archetypes.
This meaningless load of fetid dingo’s kidneys was accepted by the Ninth World
Multiconference on Systemics, Cybernetics and Informatics (WMSCI 2005). What
was the point of this exercise? The mischievous trio hoped to cause “maximum
amusement” and “test whether such meaningless manuscripts could pass the screen-
ing procedure for conferences that, they feel, exist simply to make money” [2].
Curiously, the statistical generation and analysis of research papers has come
full circle. In 2014, a study by computer scientist Cyril Labbé revealed that at
least 120 nonsense papers generated by SCIgen had been published in conference
proceedings between 2008 and 2013 [6]!
Clay Millennium Problems. The Clay Mathematics Institute was founded

in 1998 by Landon T. Clay (1926–2017), a successful venture capitalist with a
profound appreciation for mathematics. The institute is most well known for its
proposal of the Millennium Prize Problems, which were announced on May 24,
2000, in a series of lectures at the Collège de France by Timothy Gowers, Michael
Atiyah, and John Tate (1925– ). According to the institute [3]:
The Prizes were conceived to record some of the most difficult prob-
lems with which mathematicians were grappling at the turn of the
second millennium; to elevate in the consciousness of the general pub-
lic the fact that in mathematics, the frontier is still open and abounds
in important unsolved problems; to emphasize the importance of work-
ing towards a solution of the deepest, most difficult problems; and to
recognize achievement in mathematics of historical magnitude.
The millennium problems are a 21st-century analogue of David Hilbert’s celebrated

list from 1900; see the 1935, 1963, 1970, 1980, and 1983 entries. There is a modern
twist: a solution to a millennium problem earns the solver a million-dollar prize!
Some of the millennium problems are old favorites. For example, the Riemann
hypothesis, Hilbert’s eighth problem, is one of the seven Clay problems. Others
are more modern and would have been inconceivable in Hilbert’s time. The P
versus NP problem, for example, involves computational complexity theory, a field
that blossomed only after the advent of the computer. Several of the problems are
discussed elsewhere in this book; others we have not touched. We can hardly do
better than to quote the original summaries provided by the Clay foundation [3];
we do so frequently below.
• Yang–Mills existence and mass gap. The problem requires a proof that
any compact simple gauge group gives rise to a nontrivial quantum Yang–Mills
theory on $R^4$ with a positive mass gap [14]. What does this mean?
Quantum Yang–Mills theory is now the foundation of most of elementary
particle theory, and its predictions have been tested at many experimen-
tal laboratories, but its mathematical foundation is still unclear. The
successful use of Yang–Mills theory to describe the strong interactions of
elementary particles depends on a subtle quantum mechanical property
called the “mass gap”: the quantum particles have positive masses, even
though the classical waves travel at the speed of light. [3]
The existence of the mass gap has been confirmed by experimental physicists and
computer simulations, although a mathematical explanation is lacking.
• Riemann hypothesis. By now the reader is well versed on the Riemann hy-
pothesis, one of the most stubborn problems in mathematics. It is the subject of
our 1942, 1945, and 1987 entries. Also see the comments for the 1933 and 1948
entries.
Figure 2. Turbulent fluid flow at multiple scales. Photo by

Steven Mathey under Creative Commons Attribution-Share Alike
4.0 International license. https://commons.wikimedia.org/
wiki/File:Self_Similar_Turbulence.png
• P versus NP problem. Let P denote the class of decision problems that can be
solved in polynomial time (with respect to the length of the input) and let NP be
the class of problems for which a proposed solution can be verified in polynomial
time. Thus, P ⊆ NP. The million-dollar question is whether equality holds [11].
That is, does knowing how to quickly verify a solution to a problem automatically
mean that a fast algorithm for solving that problem exists? For example, one
multiplication verifies the correctness of an integer factorization. Does this imply
that a deterministic, polynomial-time integer factorization algorithm exists?
• Navier–Stokes equation. This complicated system of partial differential equations
with prescribed boundary conditions, named after Claude-Louis Navier
(1785–1836) and George Gabriel Stokes (1819–1903), governs three-dimensional
fluid flow. For example, the turbulent behavior of water and air seems to adhere
to these equations; see Figure 2. Under reasonable mathematical hypotheses, do
solutions exist? Are they unique? Or can solutions “break down” in finite time?
How well does Navier–Stokes model physical reality?
• Hodge conjecture. This conjecture concerns how much of the topology of the
solution set of a system of algebraic equations can be defined in terms of further
algebraic equations. Since this is a tough one to describe with any degree of
faithfulness, we refer the reader to [3, 10] for further information.
• Poincaré conjecture. Is every simply connected, closed, three-dimensional
manifold homeomorphic to the three-dimensional sphere? This conjecture has a
long and storied history and, by some accounts, it has resulted in two or three
Fields Medals! See the 2003 entry for more details.
• Birch and Swinnerton-Dyer conjecture. What is the relationship between
the number of points on an elliptic curve over finite fields of prime order and the
rank of the group of rational points on the curve? See the comments for the 1921
entry for a detailed discussion of this conjecture.
Of the seven millennium problems, only the Poincaré conjecture has been resolved;
see the 2003 entry.
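The contrast between verifying and finding a solution, raised in the P versus NP item above, can be illustrated with integer factorization. The Python sketch below uses illustrative function names of our own. Checking a proposed factorization needs only multiplication, whereas the naive search by trial division takes time roughly proportional to the square root of n, which is exponential in the number of digits of n. Nothing here bears on whether a polynomial-time factoring algorithm exists.

```python
def verify_factorization(n, factors):
    """Check a proposed factorization of n using multiplication only."""
    product = 1
    for f in factors:
        if f < 2:
            return False  # reject trivial or invalid factors
        product *= f
    return product == n


def trial_division(n):
    """Find the prime factorization of n >= 2 by brute-force search."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains is prime
    return factors


print(trial_division(8051))                  # [83, 97]
print(verify_factorization(8051, [83, 97]))  # True
print(verify_factorization(8051, [83, 96]))  # False
```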
Bibliography
[1] D. Adams, The Hitchhiker’s Guide to the Galaxy, Pan Books, 1979.
[2] P. Ball, Computer conference welcomes gobbledegook paper, Nature.com, https://www.
nature.com/articles/nature03653.
[3] Clay Mathematics Institute, The Millennium Prize Problems, http://www.claymath.org/
millennium-problems/millennium-prize-problems.
[4] M. Eder, J. Rybicki, and M. Kestemont, Stylometry with R: A Package for Computational
Text Analysis, The R Journal 8 (2016), no. 1, 107–121. https://journal.r-project.org/
archive/2016/RJ-2016-007/RJ-2016-007.pdf.
[5] R Developer Page, Statistical analysis environment “R” version 1.0.0 is released, http://
developer.r-project.org/R-release-1.0.0.txt.
[6] R. Van Noorden, Publishers withdraw more than 120 gibberish papers, Nature.com, https://
www.nature.com/news/publishers-withdraw-more-than-120-gibberish-papers-1.14763.
[7] The R Project for Statistical Computing, http://www.r-project.org/.
[8] J. Stribling, D. Aguayo, and M. Krohn, Rooter: a methodology for the typical unification of
access points and redundancy, https://pdos.csail.mit.edu/archive/scigen/rooter.pdf.
[9] J. Stribling, M. Krohn, and D. Aguayo, SCIgen—An Automatic CS Paper Generator,
https://pdos.csail.mit.edu/archive/scigen/.
[10] Wikipedia, Hodge conjecture, https://en.wikipedia.org/wiki/Hodge_conjecture.
[11] Wikipedia, P versus NP problem, https://en.wikipedia.org/wiki/P_versus_NP_problem.
[12] Wikipedia, Pierre Brassau, https://en.wikipedia.org/wiki/Pierre_Brassau.
[13] Wikipedia, R (programming language), http://en.wikipedia.org/wiki/R_(programming_language).
[14] Wikipedia, Yang–Mills existence and mass gap, https://en.wikipedia.org/wiki/Yang-
Mills_existence_and_mass_gap
2001
Colin Hughes Founds Project Euler
Introduction
Project Euler, created by Colin Hughes in 2001, is an outstanding website that
has provided countless hours of enjoyment to mathematicians, computer scientists,
and other computationally minded people. It describes itself as follows [1]:
Project Euler is a series of challenging mathematical/computer pro-
gramming problems that will require more than just mathematical in-
sights to solve. Although mathematics will help you arrive at elegant
and efficient methods, the use of a computer and programming skills
will be required to solve most problems. The motivation for starting
Project Euler, and its continuation, is to provide a platform for the
inquiring mind to delve into unfamiliar areas and learn new concepts
in a fun and recreational context.
For many of the problems, one can quickly come up with a program that will
eventually find the solution. However, this does not mean that the program will
run in a reasonable amount of time. As an extreme example of this phenomenon,
consider chess. Since there are only finitely many possible board configurations, an
analysis of chess can be reduced to a finite computation. Does the first player have
a winning strategy? Can the second player always force the game to end in a draw?
Unfortunately, the number of board configurations and possible moves is far too
large for humans or their computers to analyze by brute force. The same is true
in many of the Project Euler problems: although one can describe a brute-force
approach, the naive approach simply takes too long to run.
Project Euler problems illustrate several key points:
• Theory has a place in computational problems: a clever reformulation of the
problem may prove more tractable than the original approach.
• Implementation is nontrivial: different programming languages and environments
may be better suited to different tasks.
• Although brute force sometimes works, an elegant approach is often more illu-
minating.
We illustrate these principles with the following problem. Consider a large triangle
and several possible triangulations; see Figure 1(a). Assign colors (red, green, or
blue) to each vertex as follows:
(a) the bottom left vertex of the original triangle is red, the bottom right is green,
the top is blue;
(a) The initial triangle has vertices of three different colors.
(b) A refinement with one subtriangle with vertices of three different colors.
(c) A further refinement with three subtriangles with vertices of three different colors.
Figure 1. Sperner’s lemma ensures that each refinement contains
an odd number of subtriangles that each have three vertices of
distinct colors.
(b) any vertex on an outer edge of the original triangle receives one of the two
colors assigned to the endpoints of that edge;
(c) internal vertices may be colored red, green, or blue with no restrictions.
Does there exist a small triangle with red, green, and blue vertices?
Given a fixed subdivision, we can check all possible labelings by brute force.
However, this will not settle the general question since there are infinitely many
possible subdivisions that must be considered. An elegant approach to the prob-
lem is to prove that the number of triangles with distinctly labeled vertices is odd;
therefore at least one such triangle exists. This result, now known as Sperner’s
lemma, was discovered by Emanuel Sperner (1905–1980) in 1928; see [2, 4, 5]. Sur-
prisingly, it can be used to prove Brouwer’s fixed-point theorem, a seminal result
in topology; see the 2009 entry.
Here is a sketch of the proof. First, label the colors 1, 2, 3. Let $T_{abc}$, in which
a ≤ b ≤ c, denote the number of small triangles with vertices labeled a, b, and c
in some order. We want to show that $T_{123}$ is positive. Let $S_{12}$ denote the number
of 1−2 segments on the bottom of the original triangle. Then twice the number of
1−2 segments in the subdivision is
\[
T_{123} + 2T_{112} + 2T_{122} + S_{12}.
\]
This is because a 1−2−3 triangle has one 1−2 side, a 1−1−2 or 1−2−2 triangle has
two, and each 1−2 segment is a side of two small triangles unless it lies on the
bottom edge, in which case it is a side of only one. Thus, the
parity of $T_{123}$ is the same as that of $S_{12}$. We leave it as an exercise to show that
the number of 1−2 segments on the bottom edge of the original triangle is odd,
which proves the claim.
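The exercise at the end of the sketch can be sanity-checked by brute force before attempting a proof. The Python sketch below (function names are our own) fixes color 1 at the left end of the bottom edge and color 2 at the right end, lets each interior vertex take either color, and counts the 1−2 segments. It is a numerical check for small cases under these assumptions, not a proof.

```python
from itertools import product


def count_12_segments(colors):
    """Count adjacent pairs on the bottom edge whose colors differ."""
    return sum(1 for a, b in zip(colors, colors[1:]) if a != b)


def all_bottom_edges_odd(num_interior):
    """Check every coloring of a bottom edge with `num_interior` interior vertices."""
    for interior in product((1, 2), repeat=num_interior):
        colors = (1,) + interior + (2,)  # endpoints are fixed by rule (a)
        if count_12_segments(colors) % 2 == 0:
            return False
    return True


# Every coloring with up to 10 interior vertices has an odd number of 1-2 segments.
print(all(all_bottom_edges_odd(m) for m in range(11)))  # True
```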

There are now over 400 problems of various levels of difficulty on the Project
Euler website [1]. To solve these problems quickly requires a deep understanding
of both mathematics (which often has formulas to cut down on the computations)
and computer science (to efficiently code the problem). Form a group and see how
many of these problems you can solve.
2001: Comments
Fibonacci fun. The twenty-fifth problem on the Project Euler website con-
cerns the Fibonacci numbers, defined by
\[
F_0 = 0, \quad F_1 = 1, \quad\text{and}\quad F_{n+1} = F_n + F_{n-1}. \tag{2001.1}
\]
It asks:
What is the index of the first term in the Fibonacci sequence to contain
1,000 digits?
One can solve this by brute force. Here is a short Mathematica program to solve
the problem by searching among the first 100,000 Fibonacci numbers:
(* Stop at the first n <= 100000 for which Fibonacci[n] has at least 1,000 digits. *)
For[n = 1, n <= 100000, n++,
  If[Log[10, N[Fibonacci[n]]] >= 999, Print[n]; Break[]]
]
The computer provides the answer, n = 4,782, in a fraction of a second. However,
this is not terribly satisfactory since it does not suggest a general method. We
relied upon the “black box” command Fibonacci[] to do the work for us. What
would happen if instead of 1,000 digits we insisted upon a billion? Do we really
understand what is going on?
One of the goals of the Project Euler problems is to show the interplay between
theory and coding. Is there a more elegant approach to the problem above? Binet’s
formula is a beautiful closed-form expression for the nth Fibonacci number:
\[
F_n = \frac{1}{\sqrt{5}} \left[ \left( \frac{1+\sqrt{5}}{2} \right)^{n} - \left( \frac{1-\sqrt{5}}{2} \right)^{n} \right]. \tag{2001.2}
\]
Although it is named after Jacques Philippe Marie Binet (1786–1856), who found it
in 1843, the formula was already known to Abraham de Moivre (1667–1754). This
is a classic example of Stigler’s law of eponymy; see the comments for the 2010
entry. The comments below contain two derivations of Binet’s formula.
How does Binet’s formula help? First observe that
\[
\frac{1+\sqrt{5}}{2} \approx 1.61803398
\]
is the golden ratio from classical geometry. Its algebraic conjugate,1
\[
\frac{1-\sqrt{5}}{2} \approx -0.61803398,
\]
has absolute value less than one. Therefore, (2001.2) ensures that
\[
F_n \approx \frac{1}{\sqrt{5}} \left( \frac{1+\sqrt{5}}{2} \right)^{n}
\]
1 The golden ratio is a root of $z^2 - z - 1$, which is irreducible over Z (and hence, by Gauss’s
lemma, over Q). The other root of this polynomial is an algebraic conjugate of the golden ratio.
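with an error that tends to zero exponentially fast. If we want the first index n
such that $F_n$ has k + 1 digits, then we solve
\[
\frac{1}{\sqrt{5}} \left( \frac{1+\sqrt{5}}{2} \right)^{n} \approx 10^{k}
\quad\text{and deduce}\quad
n \approx \frac{k \log 10 + \log \sqrt{5}}{\log\left( \frac{1+\sqrt{5}}{2} \right)}.
\]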
Let k = 999 and conclude that the first Fibonacci number with 1,000 digits has index
approximately 4,781.86. Rounding up to 4,782 yields the answer. In addition to
estimating the critical index, we can also check our claim with Binet’s formula:
\[
F_{4,781} \approx 6.613373228392440 \times 10^{998}, \qquad
F_{4,782} \approx 1.070066266382759 \times 10^{999}.
\]
We are correct! For more on the Fibonacci numbers, see the 1938, 1957, 1970, and
1980 entries.
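The estimate can be checked against exact integer arithmetic. The following is a minimal sketch, not the authors' code: the function names `estimated_index` and `exact_index` are hypothetical, and the rounding-up step is a heuristic that happens to work here because the error term in Binet's formula is tiny.

```python
import math

def estimated_index(digits: int) -> int:
    """Index predicted by the logarithmic estimate derived from Binet's formula."""
    k = digits - 1
    phi = (1 + math.sqrt(5)) / 2
    return math.ceil((k * math.log(10) + 0.5 * math.log(5)) / math.log(phi))

def exact_index(digits: int) -> int:
    """First n with F_n >= 10**(digits - 1), using exact integers."""
    threshold = 10 ** (digits - 1)
    previous, current, n = 0, 1, 1      # previous = F_{n-1}, current = F_n
    while current < threshold:
        previous, current = current, previous + current
        n += 1
    return n

print(estimated_index(1000), exact_index(1000))
```

Both calls agree on 4,782 for 1,000 digits. Only the logarithmic estimate remains practical for a billion digits, since it never touches the Fibonacci numbers themselves.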
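Binet’s formula via linear algebra. Let
\[
A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}
\]
and use induction to confirm that$^{2}$
\[
A^{n} = \begin{bmatrix} F_{n+1} & F_{n} \\ F_{n} & F_{n-1} \end{bmatrix} \tag{2001.3}
\]
for $n = 1, 2, 3, \ldots$. The characteristic polynomial of $A$ is
\[
p_A(z) = z^2 - z - 1,
\]
which has roots
\[
\varphi = \frac{1+\sqrt{5}}{2} \quad\text{and}\quad \psi = \frac{1-\sqrt{5}}{2}. \tag{2001.4}
\]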
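Eigenvectors that correspond to the eigenvalues (2001.4) are $s_1 = [1 \;\; -\psi]^{T}$ and
$s_2 = [1 \;\; -\varphi]^{T}$. This yields the diagonalization $A = SDS^{-1}$, in which $S = [s_1 \; s_2]$
and $D = \operatorname{diag}(\varphi, \psi)$. Thus,
\[
A^{n} = (SDS^{-1})^{n} = \underbrace{(SDS^{-1})(SDS^{-1}) \cdots (SDS^{-1})}_{n \text{ times}} = SD^{n}S^{-1}
\]
and hence
\[
\underbrace{\begin{bmatrix} F_{n+1} & F_{n} \\ F_{n} & F_{n-1} \end{bmatrix}}_{A^{n}}
= \underbrace{\begin{bmatrix} 1 & 1 \\ -\psi & -\varphi \end{bmatrix}}_{S}
\underbrace{\begin{bmatrix} \varphi^{n} & 0 \\ 0 & \psi^{n} \end{bmatrix}}_{D^{n}}
\underbrace{\left(-\frac{1}{\sqrt{5}}\right)\begin{bmatrix} -\varphi & -1 \\ \psi & 1 \end{bmatrix}}_{S^{-1}}. \tag{2001.5}
\]
Then
\[
\begin{bmatrix} F_{n+1} & F_{n} \\ F_{n} & F_{n-1} \end{bmatrix}
= \frac{1}{\sqrt{5}} \begin{bmatrix} \ast & \varphi^{n} - \psi^{n} \\ \ast & \ast \end{bmatrix},
\]
in which $\ast$ denotes entries whose exact values are irrelevant. Compare the (1, 2)
entries on both sides and obtain $F_n = \frac{1}{\sqrt{5}}(\varphi^{n} - \psi^{n})$, which is Binet’s formula.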
2 Here are some nice consequences of (2001.3). Take determinants in (2001.3) and obtain
Simson’s formula (also known as Cassini’s identity): $F_{n+1}F_{n-1} - F_n^2 = (-1)^n$. Compare the lower-left entries of $A^{m+n} = A^m A^n$
and obtain $F_{m+n} = F_{m-1}F_n + F_m F_{n+1}$. Induction and the preceding formula can be used to
prove that $F_d \mid F_n$ whenever $d \mid n$. For example, $F_5 = 5$ divides $F_{10} = 55$.
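The matrix identity (2001.3) also gives a fast way to compute Fibonacci numbers with exact integers. The sketch below is illustrative only; the helper names `mat_mul`, `mat_pow`, and `fib` are hypothetical. It powers the matrix by repeated squaring and spot-checks the determinant and divisibility identities from the footnote.

```python
def mat_mul(X, Y):
    """Product of two 2x2 integer matrices stored as ((a, b), (c, d))."""
    (a, b), (c, d) = X
    (e, f), (g, h) = Y
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def mat_pow(X, n):
    """X**n for n >= 0 by repeated squaring."""
    result = ((1, 0), (0, 1))
    while n > 0:
        if n & 1:
            result = mat_mul(result, X)
        X = mat_mul(X, X)
        n >>= 1
    return result

def fib(n):
    """F_n, read off from the (1, 2) entry of A**n."""
    return mat_pow(((1, 1), (1, 0)), n)[0][1]

# Spot checks of the determinant identity and of F_5 | F_10.
for n in range(1, 30):
    (f_next, f_n), (_, f_prev) = mat_pow(((1, 1), (1, 0)), n)
    assert f_next * f_prev - f_n ** 2 == (-1) ** n
assert fib(10) % fib(5) == 0
```

Repeated squaring needs about $\log_2 n$ matrix multiplications, in contrast to the $n$ additions used by the recurrence.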
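Binet’s formula via calculus. Let
\[
f(z) = \sum_{n=0}^{\infty} F_n z^{n} \tag{2001.6}
\]
denote the generating function for the Fibonacci numbers. Then
\begin{align*}
f(z) &= F_0 + F_1 z + \sum_{n=0}^{\infty} F_{n+2} z^{n+2} \\
     &= z + \sum_{n=0}^{\infty} (F_{n+1} + F_n) z^{n+2} \\
     &= z + \sum_{n=0}^{\infty} F_{n+1} z^{n+2} + \sum_{n=0}^{\infty} F_n z^{n+2} \\
     &= z + z f(z) + z^{2} f(z),
\end{align*}
and hence
\[
f(z) = \frac{z}{1 - z - z^{2}}. \tag{2001.7}
\]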
The roots of the denominator are −φ and −ψ. A partial fraction expansion ex-
presses (2001.7) as a linear combination of two geometric series. Some tedious
calculations eventually yield
\[
f(z) = \sum_{n=0}^{\infty} \frac{1}{\sqrt{5}} (\varphi^{n} - \psi^{n}) z^{n}.
\]
Compare the coefficients in (2001.6) and the preceding to deduce Binet’s formula.
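The omitted calculations can be organized as follows. This is one standard route, sketched here under the assumption that $\varphi$ and $\psi$ denote the two roots in (2001.4), so that $\varphi + \psi = 1$, $\varphi\psi = -1$, and $\varphi - \psi = \sqrt{5}$.

```latex
\begin{align*}
1 - z - z^{2} &= (1 - \varphi z)(1 - \psi z), \\
f(z) = \frac{z}{(1 - \varphi z)(1 - \psi z)}
  &= \frac{1}{\varphi - \psi}\left(\frac{1}{1 - \varphi z} - \frac{1}{1 - \psi z}\right) \\
  &= \frac{1}{\sqrt{5}} \sum_{n=0}^{\infty} (\varphi^{n} - \psi^{n}) z^{n},
  \qquad |z| < 1/\varphi .
\end{align*}
```

The last step expands each fraction as a geometric series, which converges for $|z| < 1/\varphi$ because $|\psi| < \varphi$.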
Bibliography
[1] Project Euler.net, https://projecteuler.net/.
point theorems, Undergraduate Texts in Mathematics, Springer-Verlag, New York-Berlin, 1980.
MR602694
[3] T. Koshy, Fibonacci and Lucas numbers with applications, Pure and Applied Mathematics
(New York), Wiley-Interscience, New York, 2001. MR1855020
[5] E. Sperner, Neuer beweis für die invarianz der dimensionszahl und des gebietes (German),
Abh. Math. Sem. Univ. Hamburg 6 (1928), no. 1, 265–272, DOI 10.1007/BF02940617.
MR3069504
2002
PRIMES in P
Introduction
Given a large integer n, how quickly can one determine whether it is prime or
composite? The naive method is to divide $n$ by each prime 2, 3, 5, 7, . . .. If one
reaches $\sqrt{n}$ without finding a factor, then $n$ is prime. However, if $n$ has a few
hundred digits, this approach can take longer than the age of the universe!
A more efficient approach is based upon Fermat’s little theorem, which says
that if $p$ is prime and $p$ does not divide $a$, then$^{1}$
\[
a^{p-1} \equiv 1 \pmod{p}. \tag{2002.1}
\]
First, select an integer $a$ with $1 < a < n$. The Euclidean algorithm rapidly computes $\gcd(a, n)$
without the need to factor either number.$^{2}$ If $\gcd(a, n) \neq 1$, then $n$ is composite. If
$\gcd(a, n) = 1$ and $a^{n-1} \not\equiv 1 \pmod{n}$, then Fermat’s little theorem ensures that $n$ is
composite (although it does not provide a specific factor of $n$). If $a^{n-1} \equiv 1 \pmod{n}$,
then the test is inconclusive. In this case, repeat the test with another base $a$.
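One round of this test is only a few lines of code. The sketch below is illustrative; the function name `fermat_test` and its string return values are hypothetical, and it relies on Python's built-in three-argument `pow` for modular exponentiation.

```python
from math import gcd

def fermat_test(n: int, a: int) -> str:
    """One round of the Fermat test for n > 3 with base 1 < a < n."""
    if gcd(a, n) != 1:
        return "composite"        # a shares a proper factor with n
    if pow(a, n - 1, n) != 1:
        return "composite"        # Fermat's little theorem is violated
    return "inconclusive"         # n is prime or a pseudoprime for the base a

print(fermat_test(1763, 2))
print(fermat_test(101, 2))
```

The first call reports `composite` and the second `inconclusive`, since 101 is prime and no base can expose it.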
This can be implemented rapidly on a computer since $a^{n-1}$ need not be computed
directly. An example illustrates the approach. Suppose that we wish to
determine whether $n = 1763$ is prime or composite. We first write the exponent $n - 1 = 1762$ in binary.
Subtract from 1762 the largest power of 2 that is at most 1762 and repeat with the remainder:
\begin{align*}
1762 &= 1024 + 738, \\
 738 &= 512 + 226, \\
 226 &= 128 + 98, \\
  98 &= 64 + 34, \\
  34 &= 32 + 2.
\end{align*}
Thus,
\begin{align*}
1762 &= 1024 + 512 + 128 + 64 + 32 + 2 \\
     &= 2^{10} + 2^{9} + 2^{7} + 2^{6} + 2^{5} + 2^{1} \\
     &= (11011100010)_2 .
\end{align*}
1 To prove Fermat’s little theorem, first show that $a, 2a, 3a, \ldots, (p-1)a$ are distinct and
nonzero modulo $p$. Then $a, 2a, 3a, \ldots, (p-1)a$ are congruent modulo $p$ to $1, 2, 3, \ldots, (p-1)$, in
some order. Thus, $a \cdot 2a \cdot 3a \cdots (p-1)a \equiv 1 \cdot 2 \cdot 3 \cdots (p-1) \pmod{p}$, and hence $a^{p-1}(p-1)! \equiv
(p-1)! \pmod{p}$. Since $p$ does not divide $(p-1)!$, we obtain $a^{p-1} \equiv 1 \pmod{p}$.
2 A theorem of Gabriel Lamé says that the number of steps in the Euclidean algorithm is at
most five times the number of base-10 digits of min{a, n}.
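Repeated squaring and reduction modulo 1763 provide
\begin{align*}
2^{2^{0}} &= 2^{1} \equiv 2 \pmod{1763}, \\
2^{2^{1}} &= 2^{2} \equiv 4 \pmod{1763}, \\
2^{2^{2}} &= 2^{4} \equiv 16 \pmod{1763}, \\
2^{2^{3}} &= 2^{8} \equiv 256 \pmod{1763}, \\
2^{2^{4}} &= 2^{16} \equiv 305 \pmod{1763}, \\
2^{2^{5}} &= 2^{32} \equiv 1349 \pmod{1763}, \\
2^{2^{6}} &= 2^{64} \equiv 385 \pmod{1763}, \\
2^{2^{7}} &= 2^{128} \equiv 133 \pmod{1763}, \\
2^{2^{8}} &= 2^{256} \equiv 59 \pmod{1763}, \\
2^{2^{9}} &= 2^{512} \equiv 1718 \pmod{1763}, \\
2^{2^{10}} &= 2^{1024} \equiv 262 \pmod{1763}.
\end{align*}
Reducing modulo 1763 at each step, we obtain
\begin{align*}
2^{1762} &\equiv 2^{2^{10} + 2^{9} + 2^{7} + 2^{6} + 2^{5} + 2^{1}}
          \equiv 2^{2^{10}}\, 2^{2^{9}}\, 2^{2^{7}}\, 2^{2^{6}}\, 2^{2^{5}}\, 2^{2^{1}} \pmod{1763} \\
         &\equiv 262 \cdot 1718 \cdot 133 \cdot 385 \cdot 1349 \cdot 4 \pmod{1763} \\
         &\equiv 262 \cdot 1718 \cdot 133 \cdot 385 \cdot 107 \pmod{1763} \\
         &\equiv 262 \cdot 1718 \cdot 133 \cdot 646 \pmod{1763} \\
         &\equiv 262 \cdot 1718 \cdot 1294 \pmod{1763} \\
         &\equiv 262 \cdot 1712 \pmod{1763} \\
         &\equiv 742 \pmod{1763}.
\end{align*}
Since $2^{1762} \not\equiv 1 \pmod{1763}$, Fermat’s little theorem implies that 1763 is composite.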
There are several important points here.
• We have proved that n is composite without providing a factor of n (for those
dying of curiosity: 1763 = 41 · 43).
• Judicious reduction modulo n means that our computations do not involve num-
bers that are significantly larger than n.
• The number of steps is proportional to $\log_2 n$ and not $\sqrt{n}$, as in the naive method.
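The repeated-squaring computation above can be sketched in code. This is a minimal illustration with a hypothetical function name, `power_mod`; in practice Python's built-in `pow(base, exponent, modulus)` does the same job.

```python
def power_mod(base: int, exponent: int, modulus: int) -> int:
    """Compute base**exponent mod modulus (modulus > 1) by repeated squaring."""
    result = 1
    square = base % modulus               # holds base**(2**i) mod modulus
    while exponent > 0:
        if exponent & 1:                  # current binary digit of the exponent is 1
            result = result * square % modulus
        square = square * square % modulus
        exponent >>= 1
    return result

# The table of successive squares of 2 modulo 1763.
squares = [2]
for _ in range(10):
    squares.append(squares[-1] ** 2 % 1763)

print(squares)
print(power_mod(2, 1762, 1763))
```

Every intermediate value stays below $1763^2$, which is the point of reducing at each step.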
Although the Fermat-based algorithm is fast, it is not always conclusive. For
example,
\[
n = 341 = 11 \cdot 31 \quad\text{and}\quad 2^{340} \equiv 1 \pmod{341}.
\]
We say that 341 is a pseudoprime for the base 2. There are infinitely many such
numbers; see the comments below. The first few are
341, 561, 645, 1105, 1387, 1729, 1905, 2047, 2465, 2701, 2821,
3277, 4033, 4369, 4371, 4681, 5461, 6601, 7957, 8321, 8481, 8911,
10261, 10585, 11305, 12801, 13741, 13747, 13981, 14491, 15709,
15841, 16705, 18705, 18721, 19951.
Fortunately, $3^{340} \equiv 56 \pmod{341}$ and hence 3 is a witness to the fact that 341 is
composite; that is, 341 is not a pseudoprime for the base 3. By testing an integer
$n$ with several different bases, we can weed out more pseudoprimes. Unfortunately,
there are composite numbers $n$ that are pseudoprime for all bases $2 \leq k \leq n - 1$
with $\gcd(k, n) = 1$.$^{3}$ These Carmichael numbers always fool our Fermat-based
primality test; see the 2010 entry.
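A short search reproduces the start of the list of base-2 pseudoprimes and confirms that 561 passes the Fermat test for every coprime base. The sketch below is illustrative; all function names are hypothetical, and trial division is used only because the range is small.

```python
from math import gcd

def is_prime(n: int) -> bool:
    """Trial division; adequate for the small range used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def pseudoprimes(base: int, limit: int) -> list[int]:
    """Composite n < limit with base**(n-1) congruent to 1 modulo n."""
    return [n for n in range(3, limit)
            if not is_prime(n) and pow(base, n - 1, n) == 1]

def fools_every_coprime_base(n: int) -> bool:
    """True if composite n passes the Fermat test for every base coprime to n."""
    return (not is_prime(n)) and all(
        pow(b, n - 1, n) == 1 for b in range(2, n) if gcd(b, n) == 1)

print(pseudoprimes(2, 2000))
print(fools_every_coprime_base(561), fools_every_coprime_base(341))
```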
Is there a polynomial-time algorithm that distinguishes primes and composites?
By polynomial time we mean that there are constants $A, B > 0$ such that the
number of elementary steps performed by the algorithm on the input $n$ is at most
$A(\log n)^{B}$. The focus on $\log n$ is because the length of the decimal (or binary)
representation of $n$ is proportional to $\log n$.
There are algorithms that depend upon randomly selected parameters that can
do the job. One example is the Miller–Rabin test, named after Gary Lee Miller and
Michael Oser Rabin (1931– ). Let $n > 2$ and write $n - 1 = 2^{r} m$, in which $m \geq 1$ is
odd and $r \geq 0$. If
\[
b^{m} \equiv 1 \pmod{n} \quad\text{or}\quad b^{2^{j} m} \equiv -1 \pmod{n} \ \text{ for some } j \in \{0, 1, 2, \ldots, r - 1\},
\]
then $n$ passes Miller’s test for the base $b$. If $n$ fails the test for some base $b$, then it
is composite. It can be shown that if $n$ is an odd composite number, then $n$ passes
Miller’s test for at most $(n-1)/4$ bases $b$ with $1 \leq b \leq n-1$.$^{4}$ This yields the Miller–Rabin
probabilistic primality test: if $n$ passes Miller’s test for $k$ different bases, then
the probability that $n$ is composite is at most $1/4^{k}$. For example, if $n$ passes the
test for $k = 50$ bases, then this probability is $1/4^{50} \approx 7.89 \times 10^{-31}$. Although
we are not 100% certain that n is a prime, our level of confidence is sufficient for
most industrial applications. Sometimes speed is more important than absolute
certainty.
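Miller's test translates directly into code. The following is a sketch under stated assumptions: the function names `passes_miller` and `probably_prime` are hypothetical, bases are drawn with Python's `random` module, and no attempt is made at cryptographic-quality randomness.

```python
import random

def passes_miller(n: int, b: int) -> bool:
    """Miller's test for odd n > 2 and base b."""
    r, m = 0, n - 1
    while m % 2 == 0:                 # write n - 1 = 2**r * m with m odd
        r += 1
        m //= 2
    x = pow(b, m, n)
    if x == 1 or x == n - 1:          # b^m is 1, or the j = 0 case b^m = -1
        return True
    for _ in range(r - 1):            # the cases j = 1, ..., r - 1
        x = x * x % n
        if x == n - 1:
            return True
    return False

def probably_prime(n: int, rounds: int = 50) -> bool:
    """Miller-Rabin: run Miller's test with randomly chosen bases."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    return all(passes_miller(n, random.randrange(2, n - 1)) for _ in range(rounds))

print(passes_miller(2047, 2), passes_miller(2047, 3))
```

The composite number $2047 = 23 \cdot 89$ passes for the base 2 but fails for the base 3, which illustrates why several bases are used.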
It is conceivable, although highly unlikely, that n is composite but that we
continually pick from among the (n − 1)/4 “bad” bases. Thus, we cannot guarantee
that the Miller–Rabin test will work in polynomial time. On the other hand, the
Adleman–Huang test is a random procedure that is guaranteed to find a proof of
primality for a prime input in polynomial time [1].
What we really want is a deterministic, polynomial-time algorithm that dis-
tinguishes primes and composites. Over the years there were some close calls, but
it was not until an electrifying announcement from India in 2002 that we had an
answer. Manindra Agrawal (1966– ) and his two undergraduate honors students
Neeraj Kayal (1979– ) and Nitin Saxena (1981– ) gave a fairly simple deterministic,
polynomial-time algorithm that distinguishes primes from composites. It involves
a generalization of Fermat’s little theorem to the ring of polynomials over a finite
field of prime order modulo an irreducible polynomial.
We follow the description of the AKS primality test (named for Agrawal, Kayal,
and Saxena) in [3], which also contains a number of worked examples. We first
require some preliminaries. Recall that
\[
(\mathbb{Z}/n\mathbb{Z})^{\times} = \bigl\{ k \in \{1, 2, \ldots, n-1\} : \gcd(k, n) = 1 \bigr\}
\]
3 If $\gcd(k, n) \neq 1$, then we already know that $n$ is composite.
4 If the generalized Riemann hypothesis is true, then for every composite integer $n$, there is
a $b < 2(\log_2 n)^{2}$ for which $n$ fails Miller’s test for the base $b$.
is a group under multiplication modulo $n$. The order of $x \in (\mathbb{Z}/n\mathbb{Z})^{\times}$ is the smallest
natural number $k$ such that $x^{k} \equiv 1 \pmod{n}$. For example, $(\mathbb{Z}/12\mathbb{Z})^{\times} = \{1, 5, 7, 11\}$.
Each element other than 1 has order 2 since
\[
1^{2},\ 5^{2},\ 7^{2},\ 11^{2} \equiv 1 \pmod{12}.
\]
For polynomials $f(x)$, $g(x)$, and $m(x)$ with integer coefficients and $\deg m(x) \geq 1$,
we say that
\[
f(x) \equiv g(x) \pmod{m(x)} \iff m \mid (f - g),
\]
that is, if and only if there is a polynomial $h(x)$ with integer coefficients such that
$h(x)m(x) = f(x) - g(x)$.
For example,
\[
3x^{2} + 7x + 4 \equiv x^{2} + 2x + 1 \pmod{x + 1}
\]
since
\[
(3x^{2} + 7x + 4) - (x^{2} + 2x + 1) = (2x + 3)(x + 1).
\]
The great insight of Agrawal–Kayal–Saxena was to combine regular and polynomial
congruences. We say that
\[
f(x) \equiv g(x) \pmod{n,\, m(x)}
\]
if there is an $h(x)$ with
\[
f(x) - g(x) - h(x)m(x) \equiv 0 \pmod{n}.
\]
Although we can describe the AKS primality test, showing that it runs in polyno-
mial time would take us too far afield. See the original paper [2] or the exposition
in [3].
AKS primality test. Let $N > 1$ be a positive integer.
1. Test if $N$ is a perfect $k$th power for some $k \geq 2$. If it is, then $N$ is composite
and stop. Else proceed to step 2.
2. Find the smallest prime $r$ such that the order of $N$ modulo $r$ is greater than
$(\log_2 N)^{2}$.
3. If any of the numbers in $\{2, 3, \ldots, r\}$ share a common divisor other than 1
with $N$, then $N$ is composite and stop. Else proceed to step 4.
4. If $N \leq r$, then $N$ is prime and stop. Else proceed to step 5.
5. For each positive integer $a$ at most $\sqrt{\varphi(r)}\,\log_2 N$, check if
\[
(x + a)^{N} \equiv x^{N} + a \pmod{x^{r} - 1,\, N}.
\]
If there is an $a$ for which the congruence fails, then $N$ is composite; if the
congruence holds for all such $a$ at most $\sqrt{\varphi(r)}\,\log_2 N$, then $N$ is prime.
If the AKS primality test terminates in either step 1 or 3, then it produces a
factor of $N$. In step 3, this is done by applying the Euclidean algorithm to $N$ and
each of the numbers $2, 3, \ldots, r$ in turn until a common divisor larger than 1 appears.
If the program declares $N$ composite in step 5, then we do not obtain a factor.
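The five steps can be sketched in code for small inputs. This is an illustration under stated assumptions, not a polynomial-time implementation: all function names are hypothetical, polynomials are multiplied naively (quadratic in $r$), floating-point logarithms are used for the bounds, and step 3 is read as "some $k \leq r$ has $1 < \gcd(k, N) < N$" so that a prime $N \leq r$ is not rejected by comparing it with itself.

```python
from math import gcd, isqrt, log2

def is_perfect_power(N: int) -> bool:
    """True if N = m**k for integers m >= 2 and k >= 2 (binary search on m)."""
    for k in range(2, N.bit_length() + 1):
        lo, hi = 2, 1 << (N.bit_length() // k + 1)
        while lo <= hi:
            mid = (lo + hi) // 2
            power = mid ** k
            if power == N:
                return True
            if power < N:
                lo = mid + 1
            else:
                hi = mid - 1
    return False

def is_small_prime(r: int) -> bool:
    return r >= 2 and all(r % d for d in range(2, isqrt(r) + 1))

def order(N: int, r: int) -> int:
    """Multiplicative order of N modulo r; assumes gcd(N, r) == 1."""
    k, x = 1, N % r
    while x != 1:
        x = x * N % r
        k += 1
    return k

def poly_mul(p, q, r, N):
    """Multiply modulo (x**r - 1, N); coefficient lists have length r, low degree first."""
    out = [0] * r
    for i, a in enumerate(p):
        if a:
            for j, b in enumerate(q):
                if b:
                    out[(i + j) % r] = (out[(i + j) % r] + a * b) % N
    return out

def poly_pow(p, e, r, N):
    result = [1] + [0] * (r - 1)
    while e:
        if e & 1:
            result = poly_mul(result, p, r, N)
        p = poly_mul(p, p, r, N)
        e >>= 1
    return result

def aks_is_prime(N: int) -> bool:
    if N < 2 or is_perfect_power(N):                       # step 1
        return False
    bound = log2(N) ** 2
    r = 2                                                  # step 2
    while not (is_small_prime(r) and gcd(N, r) == 1 and order(N, r) > bound):
        r += 1
    if any(1 < gcd(k, N) < N for k in range(2, r + 1)):    # step 3 (assumed reading)
        return False
    if N <= r:                                             # step 4
        return True
    limit = int((r - 1) ** 0.5 * log2(N))                  # step 5; phi(r) = r - 1
    for a in range(1, limit + 1):
        lhs = poly_pow([a % N, 1] + [0] * (r - 2), N, r, N)
        rhs = [0] * r
        rhs[N % r] = 1
        rhs[0] = (rhs[0] + a) % N
        if lhs != rhs:
            return False
    return True

print([N for N in range(2, 50) if aks_is_prime(N)])
```

For inputs this small, most numbers are settled in steps 3 and 4; primes such as 31 and 37 reach step 5 with $r = 29$.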
Agrawal, Kayal, and Saxena were successful in de-randomizing the prime recog-
nition problem. Here is another problem for which there is a random polynomial-
time algorithm, yet for which we do not know if there is a deterministic polynomial-
time algorithm.

Proposed by Carl Pomerance, Dartmouth College.
An integer $a$ is a quadratic nonresidue modulo $p$ if $x^{2} \equiv a \pmod{p}$ has no
solutions. Exactly half of the nonzero residues modulo $p$ fit the bill. A candidate
can be checked (in polynomial time) via Euler’s criterion or the reciprocity law
for Jacobi symbols. Thus, randomly selecting nonzero residues a until you get a
quadratic nonresidue should succeed in around two tries!
A possible deterministic algorithm sequentially tries small a until a quadratic
nonresidue is found. This works well for a large proportion of the primes. For
example, one of −1, 2, 3, 5 is a quadratic nonresidue for an odd prime p unless
p ≡ 1 or 49 (mod 120). It is believed that this procedure works in polynomial time,
but this is only known under the extended Riemann hypothesis.
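Both strategies are easy to try out. The sketch below is illustrative, with hypothetical function names. It uses Euler's criterion, $a^{(p-1)/2} \equiv -1 \pmod{p}$ for a nonresidue $a$, and for simplicity the deterministic search runs through $a = 2, 3, 4, \ldots$ rather than starting with $-1$.

```python
import random

def is_nonresidue(a: int, p: int) -> bool:
    """Euler's criterion for an odd prime p."""
    return pow(a, (p - 1) // 2, p) == p - 1

def random_nonresidue(p: int) -> int:
    """Sample nonzero residues until a nonresidue appears."""
    while True:
        a = random.randrange(1, p)
        if is_nonresidue(a, p):
            return a

def least_nonresidue(p: int) -> int:
    """Deterministic search through a = 2, 3, 4, ..."""
    a = 2
    while not is_nonresidue(a, p):
        a += 1
    return a

print(least_nonresidue(7), least_nonresidue(241))
```

The prime 241 is congruent to 1 modulo 120, so $-1$, 2, 3, and 5 are all quadratic residues and the search continues to 7.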
Another possible strategy is to start with −1 and sequentially take square roots
modulo p until a nonsquare is found. Unfortunately, we know no method to take
modular square roots in deterministic polynomial time, unless one has an oracle
that provides quadratic nonresidues!
Is there a deterministic, polynomial-time algorithm to produce a quadratic
nonresidue modulo an odd prime p?
2002: Comments
Infinitude of base-2 pseudoprimes. To demonstrate that there are infinitely
many pseudoprimes for the base 2, it suffices to show that for each odd, base-2
pseudoprime, there is a larger odd one. We start our construction with $n = 341$.
Let $n$ be an odd pseudoprime for the base 2 and let
\[
M_n = 2^{n} - 1
\]
denote the $n$th Mersenne number, which is composite because $n$ is composite (see the 1996
entry). Because $2^{n-1} \equiv 1 \pmod{n}$, we have
\[
M_n - 1 = 2^{n} - 2 = 2(2^{n-1} - 1) = 2dn
\]
for some $d$. Thus,
\begin{align*}
2^{(M_n - 1)/2} - 1 &= 2^{dn} - 1 \\
  &= (2^{n} - 1)(2^{n(d-1)} + 2^{n(d-2)} + \cdots + 2^{n} + 1) \\
  &= M_n (2^{n(d-1)} + 2^{n(d-2)} + \cdots + 2^{n} + 1) \\
  &\equiv 0 \pmod{M_n}.
\end{align*}
Since $M_n > n$ is composite and
\[
2^{M_n - 1} \equiv (2^{(M_n - 1)/2})^{2} \equiv 1^{2} \equiv 1 \pmod{M_n},
\]
we conclude that $M_n$ is a pseudoprime for the base 2.
Although there are infinitely many pseudoprimes for the base 2, our construction
does not provide an efficient way to produce them. Indeed, $M_{341} = 2^{341} - 1$ is
far larger than 561, the smallest pseudoprime for the base 2 after 341. The number
561 is also the first Carmichael number; see the 2010 entry.
Carl Pomerance alerted us to a simpler proof of the infinitude of base-2 pseudoprimes.
We claim that if $p \geq 5$ is prime, then $n = (4^{p} - 1)/3$ is a base-2 pseudoprime.
First observe that $4^{p} \equiv 1 \pmod{3}$, so $n$ is indeed an integer. Moreover, $(2^{p} + 1)/3$
is an integer greater than 1 and hence $n = (2^{p} - 1)((2^{p} + 1)/3)$ is composite. Fermat’s theorem
ensures that
\[
n \equiv (2^{p} - 1)(2^{p} + 1)3^{-1} \equiv (2 - 1)(2 + 1)3^{-1} \equiv 1 \pmod{p}.
\]
Since $n - 1$ is even and $p$ is odd, we have $(n-1)/2 \equiv 0 \pmod{p}$ and hence $2^{n-1} - 1 = 4^{(n-1)/2} - 1$
is divisible by $4^{p} - 1$, which is a multiple of $n$. Thus, $n$ is a base-2 pseudoprime.
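Both constructions can be checked numerically. The sketch below is illustrative; the helper `is_base2_pseudoprime` is hypothetical and certifies compositeness from a proper factor that the caller supplies.

```python
def is_base2_pseudoprime(n: int, known_factor: int) -> bool:
    """Check that known_factor is a proper factor of n and 2**(n-1) is 1 modulo n."""
    composite = 1 < known_factor < n and n % known_factor == 0
    return composite and pow(2, n - 1, n) == 1

# Pomerance's family: n = (4^p - 1)/3 has the proper factor 2^p - 1.
for p in (5, 7, 11, 13, 17, 19):
    n = (4 ** p - 1) // 3
    assert is_base2_pseudoprime(n, 2 ** p - 1)

# Mersenne step: 341 = 11 * 31, so M_341 is divisible by 2^11 - 1.
M = 2 ** 341 - 1
assert is_base2_pseudoprime(M, 2 ** 11 - 1)
```

For $p = 5$ the family returns $n = 341$, the starting point of the first construction.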
Bibliography
[1] L. M. Adleman and M.-D. A. Huang, Primality testing and abelian varieties over finite fields,
Lecture Notes in Mathematics, vol. 1512, Springer-Verlag, Berlin, 1992. MR1176511
[2] M. Agrawal, N. Kayal, and N. Saxena, PRIMES is in P, Ann. of Math. (2) 160 (2004), no. 2,
781–793, DOI 10.4007/annals.2004.160.781. http://www.cse.iitk.ac.in/users/manindra/
algebra/primality_v6.pdf. MR2123939
MR3098499
[4] R. Crandall and C. Pomerance, Prime numbers: A computational perspective, 2nd ed.,
[5] A. Granville, It is easy to determine whether a given integer is prime, Bull. Amer. Math.
Soc. (N.S.) 42 (2005), no. 1, 3–38, DOI 10.1090/S0273-0979-04-01037-7. http://www.dms.
umontreal.ca/~andrew/PDF/Bulletin04.pdf. MR2115065
[6] C. Pomerance, Primality testing: variations on a theme of Lucas, Congr. Numer. 201 (2010),
301–312. https://math.dartmouth.edu/~carlp/PDF/lucastalk.pdf. MR2598366
2003
Poincaré Conjecture
Introduction
In 2003, Grigori Perelman, building upon seminal work of Richard S. Hamilton
(1943– ), proved the Poincaré conjecture, one of the million-dollar Clay Millennium
Problems (see the comments for the 2000 entry). The conjecture asserts that every
smooth, compact, simply connected, closed 3-manifold is homeomorphic to the 3-sphere
\[
\bigl\{ (x, y, z, w) \in \mathbb{R}^{4} : x^{2} + y^{2} + z^{2} + w^{2} = 1 \bigr\}.
\]
Two manifolds are homeomorphic if there is a continuous bijection between them
that has a continuous inverse. For example, a circle and the trefoil knot (see the 1985
entry) are homeomorphic 1-manifolds, even though they cannot be continuously
deformed into each other when embedded in R3 . On the other hand, the 2-sphere
and the Euclidean plane are not homeomorphic: one is compact and the other is
not.1
A particularly down-to-earth explanation of the main difficulty behind the
Poincaré conjecture was recalled by Gerry Myerson in 2012 [3]:
I once heard an expert “explain” the difficulty of the n = 3 case to a
general audience by saying something like this: when n ≤ 2, there isn’t
enough room for anything to go wrong, while for n ≥ 4, there’s enough
room to fix anything that goes wrong; for n = 3, there’s enough room
for something to go wrong, and. . . it’s not clear whether there’s enough
room to fix things when they go wrong.
The cases n = 1 and n = 2 are classical and date back to the foundations of
algebraic topology. Stephen Smale proved the conjecture for n ≥ 5 in 1961 and
Michael Freedman (1951– ) proved it for n = 4 in 1982. Since both of them
received Fields Medals for their work, one can claim that the Poincaré conjecture
resulted in either two or three medals, depending upon how one accounts for the
enigmatic Perelman (see the comments below).
Although the resolution of the Poincaré conjecture is hopelessly beyond the
level of this book and the expertise of its authors, we can discuss its analogue for 2-
manifolds. By a surface, we mean a smooth, connected, two-dimensional manifold.
Think of this as a nice topological space that locally resembles R2 and does not
consist of multiple disjoint pieces. For example, a microscopic observer on a torus or
Klein bottle will believe their local environment is flat and two dimensional, much
as we perceive the ground around us as flat. A surface is closed if it is compact
1 However, the 2-sphere with a point removed is homeomorphic to the plane via stereographic
projection.
Figure 1. Fundamental polygons for several 2-manifolds: (a) sphere, (b) torus, (c) Klein bottle, (d) projective plane. More sides may be necessary for more complicated manifolds, such as a sphere with several handles attached.
and has no boundary. For example, the sphere is closed, but the Möbius strip is
not since its boundary is a circle (see the 1958 entry). A closed surface can be
diagrammed using a fundamental polygon, an even-sided polygon with some of its
edges identified; see Figure 1 and the comments for the 1958 entry.
A surface is simply connected if every loop on the surface can be contracted to
a point without leaving the surface. For example, the sphere is simply connected
and the torus is not; see Figure 2. The analogue of the Poincaré conjecture for
2-manifolds asserts that every simply connected, closed surface is homeomorphic
to the sphere. This is a consequence of the classification of surfaces from alge-
braic topology, which says that every closed surface is homeomorphic to one of the
following:
(a) the sphere,
(b) the connected sum of tori, or
(c) the connected sum of real projective planes;
see the comments below for information about the connected sum of manifolds.
Every surface in the first two classes is orientable; every surface in the third class
is nonorientable. Of these, the only simply connected surface is the sphere; this
implies the Poincaré conjecture for 2-manifolds.
Figure 2. The sphere is simply connected and the torus is not. (a) Every path on the sphere is contractible to a point. Thus, the sphere is simply connected. (b) Neither of these two paths on the torus is contractible to a point.
The problem for this year was originally posed by Frank Morgan of Williams
College and it concerned 4-manifolds. However, he felt that the statement was too
imprecise to be included here. Instead, we present a simple combinatorial problem
with a visual twist that builds upon the comments to the 1980 entry. See below for
the solution.

We saw in the 1980 entry that it is impossible to tile, with nonoverlapping
2 × 1 black-and-white dominoes, a chessboard that has two corners removed (while
respecting the underlying black-and-white pattern). Is such a tiling possible if two
squares of different colors are removed instead (see Figure 3)?
2003: Comments
Perelman’s Fields Medal. Contrary to popular belief, Perelman did not
receive the prestigious Fields Medal for his resolution of the Poincaré conjecture.
He declined the award and did not even attend the award ceremony:
In May 2006, a committee of nine mathematicians voted to award
Perelman a Fields Medal for his work on the Poincaré conjecture. How-
ever, Perelman declined to accept the prize. Sir John Ball, president of
the International Mathematical Union, approached Perelman in Saint
Petersburg in June 2006 to persuade him to accept the prize. After 10
hours of attempted persuasion over two days, Ball gave up. Two weeks
later, Perelman summed up the conversation as follows: “He proposed
to me three alternatives: accept and come; accept and don’t come,
and we will send you the medal later; third, I don’t accept the prize.
From the very beginning, I told him I have chosen the third one. . . [the
prize] was completely irrelevant for me. Everybody understood that if
the proof is correct, then no other recognition is needed.” [9]
In 2010, Perelman also declined the million-dollar prize offered by the Clay foun-
dation (see the comments for the 2000 entry).
Figure 3. Is it possible to tile, with 2 × 1 black-and-white dominoes, a chessboard that has two squares of different colors removed? What if both squares marked “A” are removed? What if both squares marked “B” are removed?
A monoid of manifolds. A monoid is an algebraic structure similar to a
group, except that inverses need not exist. To be more specific, a monoid is a set
that is closed under an associative binary operation for which an identity element
exists. What is the relationship between monoids and surfaces?
Given two surfaces $M$ and $N$, their connected sum $M \# N$ is the surface obtained
by removing a disk from each of $M$ and $N$ and then gluing the resulting boundary
circles together [10]. One can show that the homeomorphism class of the resulting
surface is independent of the location of the excised disks.
Let $S$ denote the (two-dimensional) sphere, $K$ the Klein bottle, $T$ the torus, and
$P$ the (real) projective plane. The sphere is the identity element for the connected
sum operation, in the sense that $S \# M = M$ for all surfaces $M$. This is because if
we remove a disk from $S$, then the resulting surface can be deformed into a disk
that takes the place of the disk removed from $M$.
What about the connected sum of a surface with a torus? In visual terms,
$M \# T$ is “$M$ with a handle attached.” What does attaching a Klein bottle to
a surface mean? If $M$ is orientable, then $M \# K$ can be regarded as “$M$ with a
handle whose ends are attached to opposite sides of $M$.” The projective plane
$P$ is not orientable, so the notion of “side” is meaningless. This is reflected in
the algebraic relation $P \# T = P \# K$. One can also show that $P \# P = K$.
These computations can be summarized succinctly as follows. The
monoid of homeomorphism classes of surfaces is the commutative monoid with
identity $S$ that is generated by $P$ and $T$, modulo the single relation
\[
P \# P \# P = P \# T.
\]
This identity is called Dyck’s theorem, after Walther von Dyck.
The connected sum operation is compatible with the Euler characteristic (see
the comments for the 1976 entry) in the following sense:
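\[
\chi(M \# N) = \chi(M) + \chi(N) - 2.
\]
Since $\chi(S) = 2$, $\chi(P) = 1$, and $\chi(T) = 0$, it follows that
\[
\chi(\underbrace{T \# T \# \cdots \# T}_{k \text{ times}}) = 2 - 2k
\quad\text{and}\quad
\chi(\underbrace{P \# P \# \cdots \# P}_{k \text{ times}}) = 2 - k.
\]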
Putting this all together, we see that a closed surface is completely determined,
up to homeomorphism, by its Euler characteristic and orientability. If a surface is
nonorientable, then it is homeomorphic to a connected sum of projective planes.
On the other hand, an orientable surface is homeomorphic either to a sphere or a
connected sum of tori. The number of summands, in both cases, can be discerned
by computing the Euler characteristic of the given surface [10].
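The classification amounts to a small lookup. The sketch below is illustrative; the function names and the `(kind, count)` return format are hypothetical choices made for this example.

```python
def chi_connected_sum(chi_m: int, chi_n: int) -> int:
    """Euler characteristic of a connected sum of two closed surfaces."""
    return chi_m + chi_n - 2

def classify_closed_surface(chi: int, orientable: bool) -> tuple[str, int]:
    """Return (kind, number of summands) for a closed surface."""
    if orientable:
        if chi > 2 or chi % 2 != 0:
            raise ValueError("an orientable closed surface has chi = 2 - 2k with k >= 0")
        k = (2 - chi) // 2
        return ("sphere", 0) if k == 0 else ("tori", k)
    if chi > 1:
        raise ValueError("a nonorientable closed surface has chi = 2 - k with k >= 1")
    return ("projective planes", 2 - chi)

# Dyck's theorem seen through the invariants: P # T is nonorientable with chi = -1.
print(classify_closed_surface(chi_connected_sum(1, 0), orientable=False))
```

The call reports three projective planes, in agreement with the relation $P \# P \# P = P \# T$.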
Solution to the problem. The elegant solution to our problem is due to
Ralph E. Gomory (1929– ) [1]. Consider the path illustrated in Figure 4. Suppose
that two squares of different colors are removed from the board. Then they are
separated, in either direction along the path, by an even number of squares, half
of which are black and half of which are white. Thus, the desired tiling exists and,
moreover, Figure 4 suggests an algorithm to efficiently produce it.
Figure 4. The chessboard can be regarded as a cycle graph of length 64.
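Gomory's argument is constructive, and the construction can be sketched in code. The closed path used below is one possible choice and is an assumption of this example (the path in the figure may differ); the function names `gomory_cycle` and `tile` are hypothetical.

```python
def gomory_cycle():
    """A closed path through all 64 squares (row, col) of the 8 x 8 board."""
    path = [(0, c) for c in range(8)]                 # along the top row
    for r in range(1, 8):                             # snake through columns 1..7
        cols = range(7, 0, -1) if r % 2 == 1 else range(1, 8)
        path += [(r, c) for c in cols]
    path += [(r, 0) for r in range(7, 0, -1)]         # back up column 0
    return path

def tile(removed_a, removed_b):
    """Domino tiling of the board minus two squares of different colors."""
    path = gomory_cycle()
    i, j = sorted((path.index(removed_a), path.index(removed_b)))
    if (j - i) % 2 == 0:
        raise ValueError("the removed squares must have different colors")
    # The two runs between the holes have even length; pair consecutive squares.
    run1 = path[i + 1:j]
    run2 = path[j + 1:] + path[:i]
    return [(run[k], run[k + 1]) for run in (run1, run2)
            for k in range(0, len(run), 2)]

dominoes = tile((0, 0), (7, 0))
print(len(dominoes))
```

Consecutive squares on the path alternate in color, so two removed squares have different colors exactly when their positions along the path differ by an odd number.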
What happens if we replace the standard 8 × 8 board with an 8 × 9 board? A

9 × 9 board? More generally, when can we tile an m × n board that has two squares
of the same color removed?
Bibliography
[1] R. Honsberger, Mathematical Gems I, Mathematical Association of America, 1974.
[2] J. Milnor, Poincare Conjecture, http://www.claymath.org/millennium-problems/
poincare-conjecture.
[3] G. Myerson, Poincare conjecture for n = 2 (answer), https://math.stackexchange.com/
questions/103182/poincare-conjecture-for-n-2.
[4] S. Nasar and D. Gruber, Manifold Destiny: A legendary problem and the battle over who
solved it, The New Yorker, https://www.newyorker.com/magazine/2006/08/28/manifold-
destiny.
[5] G. Perelman, The entropy formula for the Ricci flow and its geometric applications, https://
[6] G. Perelman, Ricci flow with surgery on three-manifolds, https://arxiv.org/abs/math/
0303109.
[7] G. Perelman, Finite extinction time for the solutions to the Ricci flow on certain three-
manifolds, https://arxiv.org/abs/math/0307245.
[8] T. Tao, Perelman’s proof of the Poincaré conjecture: a nonlinear PDE perspective, https://
[9] Wikipedia, Grigori Perelman, https://en.wikipedia.org/wiki/Grigori_Perelman.
[10] Wikipedia, Surface (topology), https://en.wikipedia.org/wiki/Surface_(topology).
2004
Primes in Arithmetic Progression
Introduction
2004 is another year that witnessed the announcement of two major results,
each of which is worthy of a whole entry in its own right. One was the culmination
of decades of work by dozens of mathematicians: the classification of finite simple
groups. The other is the celebrated Green–Tao theorem, which guarantees the
existence of arbitrarily long arithmetic progressions in the primes [8, 17]. Alas, we
can choose only one to focus on. However, we do have a few words to say about
finite simple groups; see the comments below.
What does the Green–Tao theorem say? It asserts that for any length $\ell$, there
is an initial prime $p$ and a common difference $k$ so that the length-$\ell$ arithmetic
progression $p, p + k, p + 2k, \ldots, p + (\ell - 1)k$ consists entirely of primes. Ben Green and
Terence Tao proved this amazing result using a “relative” version of Szemerédi’s
theorem (see the 1975 entry). Szemerédi’s theorem tells us that a subset of the
natural numbers with positive upper density contains arbitrarily long arithmetic
progressions. Unfortunately, the prime numbers have density zero and hence Sze-
merédi’s theorem does not immediately apply. Green and Tao proved a version of
Szemerédi’s theorem that applies to sets of natural numbers that are pseudoran-
dom in a certain technical sense. The final step of their proof is the construction
of a pseudorandom subset of the natural numbers that contains the primes as a
relatively dense subset. A recent overview of the theorem and its proof is [3].
Can the Green–Tao theorem be used to find arithmetic progressions in the
primes? Yes and no. The proof provides numerical bounds that guarantee the
existence of such an arithmetic progression in a certain range. However, the num-
bers produced are so astronomically large that they are well beyond the limit of
modern computation. As of mid-2018, the longest known arithmetic progression in
the primes has length twenty-six. The first such example,
43142746595714191 + 5283234035979900k, k = 0, 1, 2, . . . , 25,
was discovered in 2010 by Benoît Perichon on a PlayStation 3 equipped with special
software produced for the purpose [11, 18].
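Verifying such a progression is far easier than finding one. The sketch below is illustrative; the function name `is_prime_64` is hypothetical. It uses a deterministic form of the Miller–Rabin test with the first twelve primes as bases, a set of bases known to be sufficient for all integers below $2^{64}$.

```python
def is_prime_64(n: int) -> bool:
    """Deterministic Miller-Rabin, valid for n < 2**64 with these twelve bases."""
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    if n < 2:
        return False
    for p in bases:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:                 # write n - 1 = 2**s * d with d odd
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False              # a is a witness that n is composite
    return True

START, STEP = 43142746595714191, 5283234035979900
terms = [START + STEP * k for k in range(26)]
print(all(is_prime_64(t) for t in terms))
```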
There are now many generalizations and extensions of the Green–Tao theorem
[7,9,13–15]. We focus here on one of them that has a particularly nice visual appeal
to it [13]. A Gaussian integer is a number of the form a + bi, in which a, b ∈ Z
and $i^{2} = -1$. The set of Gaussian integers forms a ring, denoted $\mathbb{Z}[i]$, under the
usual operations inherited from the complex number system. A Gaussian prime is
Figure 1. Gaussian primes $a + bi$ in the range $|a|, |b| \leq \rho$: (a) $\rho = 50$, (b) $\rho = 100$.
a prime in the ring $\mathbb{Z}[i]$. Thus, $z \in \mathbb{Z}[i]$ is a Gaussian prime if
\[
z = xy \text{ with } x, y \in \mathbb{Z}[i] \implies x \in \{1, -1, i, -i\} \text{ or } y \in \{1, -1, i, -i\}.
\]
For example, 2 is not a Gaussian prime since $2 = (1 + i)(1 - i)$. One can show
that a Gaussian integer is prime if and only if it is of the form $\pm p$ or $\pm pi$, in which
$p \equiv 3 \pmod{4}$ is prime in the usual sense, or if it is of the form $a + bi$, in which
$a^{2} + b^{2}$ is prime in the usual sense; see Figure 1. In 2005, Terence Tao showed
that given any distinct $v_0, v_1, \ldots, v_{k-1} \in \mathbb{Z}[i]$, there are infinitely many sets
$\{a + rv_0, a + rv_1, \ldots, a + rv_{k-1}\}$, in which $a \in \mathbb{Z}[i]$ and $r \in \mathbb{Z} \setminus \{0\}$, all of whose
elements are Gaussian primes.
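The characterization of Gaussian primes gives a simple test. The sketch below is illustrative; the function names are hypothetical, and the trial-division helper is meant only for small inputs such as those plotted in Figure 1.

```python
from math import isqrt

def is_rational_prime(n: int) -> bool:
    """Primality in the usual sense, by trial division."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def is_gaussian_prime(a: int, b: int) -> bool:
    """Test a + bi using the characterization stated above."""
    if a == 0 or b == 0:
        p = abs(a) + abs(b)           # the nonzero coordinate, up to sign
        return p % 4 == 3 and is_rational_prime(p)
    return is_rational_prime(a * a + b * b)

print(is_gaussian_prime(3, 0), is_gaussian_prime(2, 0), is_gaussian_prime(1, 1))
```

The output shows that 3 and $1 + i$ are Gaussian primes while $2 = (1 + i)(1 - i)$ is not.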
The Green–Tao theorem, along with many other famous theorems and difficult
conjectures, follows from the Bateman–Horn conjecture. See the comments for the
2005 entry for more information about this tantalizing conjecture.

The Green–Tao theorem implies that for each natural number N , there is an
even number 2m, in which m depends on N , such that there are at least N pairs
of primes whose common difference is 2m. Prove this without appealing to the
Green–Tao theorem.
2004: Comments
Four squares in arithmetic progression. The Green–Tao theorem ad-
dresses primes in arithmetic progressions. What about perfect squares? The
comments to the 1913 entry show how to construct three squares in arithmetic
progression. Mathematical folklore credits Fermat with the proof that there does
not exist an arithmetic progression of four distinct perfect squares [6], although the
observation is attributed to Leonhard Euler in 1780 [4]. A proof using Fermat’s method of
descent can be found in [16]. The more modern approach to the problem involves
elliptic curves. The crux of the matter is that the rational quadruples $(a, b, c, d)$
so that $a^{2}, b^{2}, c^{2}, d^{2}$ form an arithmetic progression can be parametrized by the
rational points on the elliptic curve
\[
y^{2} = (x - 1)(x - 2)(x + 2).
\]
One can show that the curve has only eight rational points, all of which give rise
to trivial solutions to the original problem. Consequently, there are no four distinct
rational perfect squares in arithmetic progression. The details can be found in [4].
Euclid’s theorem revisited. There is a lot that we do not understand about
prime numbers. Even Euclid’s theorem (see p. 4) still holds some mystery. Let
$a_1 = 2$, the first prime. Then $a_1 + 1 = 3$, which is also prime; set $a_2 = 3$. Next,
observe $a_1 a_2 + 1 = 7$, which is another prime; set $a_3 = 7$. In the next stage, we
see that $a_1 a_2 a_3 + 1 = 43$, another prime, which we denote by $a_4$. Now things get
interesting. Observe that
\[
a_1 a_2 a_3 a_4 + 1 = 1{,}807 = 13 \cdot 139;
\]
set $a_5 = 13$. In general, let $a_n$ be the smallest prime in the factorization of
$a_1 a_2 \cdots a_{n-1} + 1$ that is not among $a_1, a_2, \ldots, a_{n-1}$. This yields the Euclid–Mullin
sequence [5, 10]:
2, 3, 7, 43, 13, 53, 5, 6221671, 38709183810571, 139, 2801, 11, 17,

5471, 52662739, 23003, 30693651606209, 37, 1741, 1313797957,
887, 71, 7127, 109, 23, 97, 159227, 643679794963466223081509857,
103, 1079990819, 9539, 3143065813, 29, 3847, 89, 19, 577, 223,
139703, 457, 9649, 61, 4357,. . . .
Does this sequence contain every prime? Without a major breakthrough in our
understanding of prime numbers, this question will likely remain unanswered for
many years to come.
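The first several terms are easy to reproduce. The sketch below is illustrative, with hypothetical function names. It uses trial division, which is hopeless beyond the first handful of terms because the products grow so quickly. No prime factor of the product plus one can divide the product, so the smallest prime factor is automatically new.

```python
def smallest_prime_factor(n: int) -> int:
    """Smallest prime factor of n >= 2 by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

def euclid_mullin(count: int) -> list[int]:
    """First `count` terms of the Euclid-Mullin sequence."""
    terms, product = [], 1
    for _ in range(count):
        term = smallest_prime_factor(product + 1)
        terms.append(term)
        product *= term
    return terms

print(euclid_mullin(8))
```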
Classification of finite simple groups. The year 2004 witnessed the com-
pletion of the classification of finite simple groups, a decades-long quest. A group is
simple if it contains no normal subgroups other than itself and the trivial subgroup
(see the 1992 entry for more background). Consequently, a simple group cannot
be decomposed further using the quotient group construction. The finite simple
groups are the “atoms” from which more complicated finite groups, “molecules” if
you will, can be constructed. In contrast to atoms, which come in only a hundred
or so varieties, there are infinitely many finite simple groups.
In 1972, Daniel Gorenstein (1923–1992) proposed a sixteen-step program to
complete the classification, an odyssey first (unintentionally) begun by Évariste
Galois (1811–1832) with his discovery of groups and of two families of finite simple
groups. In 2004, Michael Aschbacher (1944– ) and Stephen D. Smith published a
massive two-volume book, over a thousand pages in total, that handled the classifi-
cation of “quasithin groups” [1,2]. This was the only missing piece in the Gorenstein
program and it finally completed the classification of finite simple groups.
The classification theorem states that every finite simple group is isomorphic
to one of the following:
(a) a cyclic group of prime order,
(b) an alternating group $A_n$ with n ≥ 5,
(c) a group of Lie type,
(d) one of the 26 sporadic groups.
There is a lot to unpack here and we can only sketch the details. The alternating
group $A_n$ is the subgroup of $S_n$, the group of permutations on n symbols, that
consists of all even permutations. The groups $A_n$ are simple if n ≥ 5 (the simplicity
of these groups is closely related to the fact that there is no analogue of the quadratic
formula for polynomial equations of degree five and higher; see the 1973 entry).
The technical definition of a “group of Lie type” would take us too far afield,
so we content ourselves with some broad strokes. Here “Lie” refers to Sophus Lie
(1842–1899), whose name is pronounced “Lee.” There are sixteen families of finite
simple groups of Lie type, most of which were discovered long ago. Many can be
realized as matrix groups over finite fields and several are closely related to exotic
Lie algebras. For the sake of illustration, here is one such example. Start with the
special linear group $SL_n(\mathbb{F}_q)$ of all n × n matrices with determinant 1 and entries
in the finite field $\mathbb{F}_q$ of q elements. The quotient of $SL_n(\mathbb{F}_q)$ by its subgroup of
scalar matrices (multiples of the identity) is the projective special linear group $PSL_n(\mathbb{F}_q)$.
If n ≥ 2 and q ≠ 2, 3, then one can show that $PSL_n(\mathbb{F}_q)$ is a finite simple group of
Lie type.
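To get a feel for the size of these groups, one can tabulate their orders. The sketch below relies on a standard counting formula that is not derived in the text: the general linear group has $\prod_{i=0}^{n-1}(q^n - q^i)$ elements. It also uses the facts that the determinant-1 matrices form a subgroup of index q − 1 and that exactly gcd(n, q − 1) scalar matrices have determinant 1. The function name is illustrative, and q is assumed to be a prime power.

```python
from math import gcd


def order_psl(n, q):
    """Order of PSL_n(F_q), assuming q is a prime power and n >= 2."""
    order_gl = 1
    for i in range(n):
        order_gl *= q**n - q**i        # choices for the (i+1)-st row of an invertible matrix
    order_sl = order_gl // (q - 1)     # matrices of determinant 1
    return order_sl // gcd(n, q - 1)   # quotient by the scalar matrices of determinant 1


for n, q in [(2, 2), (2, 3), (2, 4), (2, 5), (2, 7), (3, 2)]:
    print(n, q, order_psl(n, q))
```

The values for (n, q) = (2, 2) and (2, 3) are 6 and 12, which are too small to be orders of nonabelian simple groups (the smallest such order is 60). The pairs (2, 4) and (2, 5) both give 60, the order of the alternating group on five symbols.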
Most interesting are the 26 sporadic groups.¹ These are outliers that do not
fit neatly into any classification scheme. The sporadic groups are divided into two
broad classes: the pariahs and the happy family. The pariahs are not subquotients
of the monster group M (see the 1992 entry); that is, a pariah cannot be obtained
as a quotient group of some subgroup of M . These are the six vertices that do
not have upward paths toward the monster group M in Figure 2. In contrast, all
twenty members of the happy family are subquotients of M . They are divided into
three “generations,” with the monster group being of the third generation.
¹ There is another group, named after Jacques Tits (1930– ), that is occasionally regarded as
the “27th sporadic group.” However, it is usually considered an unusual group of Lie type.

Figure 2. Table of sporadic groups and their subquotient relationships (groups that are
maximal with respect to this relation are circled). The monster group M contains 20 of the
sporadic groups as subquotients. Image by user Drschawrz,
https://en.wikipedia.org/wiki/File:SporadicGroups.svg, used under the
Creative Commons Attribution-Share Alike 3.0 Unported license.

No single human in 2004 could comprehend the proof of the classification theorem
in its entirety (see the 1976 and 1998 entries for other instances of this phenomenon).
It was spread over hundreds of journal articles, written by many dozens
of authors, over the course of several decades. Moreover, the final piece of the
puzzle was the two-volume book of Aschbacher and Smith, which weighs in at well
over 1,000 pages. A massive effort to compile a complete and largely self-contained
proof of the classification theorem is well underway:
In 1981 the monumental project to classify all of the finite simple
groups appeared to be nearing its conclusion. Danny Gorenstein had
dubbed the project the “Thirty Years’ War” dating its inception from
an address by Richard Brauer at the International Congress of Math-
ematicians in 1954. He and Richard Lyons agreed that it would be
desirable to write a series of volumes that would contain the complete
proof of this Classification Theorem, modulo a short and clearly spec-
ified list of background results. As the existing proof was scattered
over hundreds of journal articles, some of which cited other articles
that were never published, there was a consensus that this was indeed
a worthwhile project. [12]
The project is expected to be completed in 2023. Perhaps one day soon the entire
proof will be verified by computer.
Solution to the problem. Although we could use the prime number theorem
to solve the problem, a weaker result due to Chebyshev suffices. He proved that
there are constants A ≈ 0.9212 and B ≈ 1.1055 so that
\[ \frac{Ax}{\log x} \le \pi(x) \le \frac{Bx}{\log x} \]
for sufficiently large x, in which π(x) denotes the prime-counting function. Suppose
that x is even and large enough for Chebyshev’s estimate to hold. Then the number
of distinct pairs of primes (p, q) with 2 < p < q ≤ x is
\[ \binom{\pi(x) - 1}{2} = \frac{(\pi(x) - 1)(\pi(x) - 2)}{2} > \frac{\pi(x)^2}{3} > \frac{A^2 x^2}{3 \log^2 x}. \]
Since the number of possible even differences between primes at most x is bounded
above by x/2, the average number of occurrences of each difference is
\[ \frac{\binom{\pi(x) - 1}{2}}{x/2} \ge \frac{A^2 x^2 / (3 \log^2 x)}{x/2} = \frac{2 A^2 x}{3 \log^2 x}, \tag{2004.1} \]
which tends to infinity. At least one of these differences occurs at least the average
number of times. Given N , let x be an even number that is large enough to ensure
that Chebyshev’s estimates are valid and that the right-hand side of (2004.1) is
larger than N . Then there is a common difference 2m for which at least N pairs
of primes (p, q) with q − p = 2m exist.
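The pigeonhole argument can be watched in action for a small value of x. The following Python sketch (our own illustration, with hypothetical function names) tallies the differences q − p over all pairs of odd primes p < q ≤ x and compares the most frequent difference with the average number of occurrences per available difference.

```python
from collections import Counter


def odd_primes_up_to(x):
    """Sieve of Eratosthenes, returning the odd primes p <= x (x >= 1 assumed)."""
    is_prime = [True] * (x + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, int(x**0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, x + 1, i):
                is_prime[j] = False
    return [p for p in range(3, x + 1) if is_prime[p]]


def difference_counts(x):
    """Count how often each difference q - p occurs among odd primes p < q <= x."""
    primes = odd_primes_up_to(x)
    return Counter(q - p for i, p in enumerate(primes) for q in primes[i + 1:])


x = 100
counts = difference_counts(x)
total_pairs = sum(counts.values())
average = total_pairs / (x / 2)          # at most x/2 even differences are possible
difference, occurrences = counts.most_common(1)[0]
print(total_pairs, average, difference, occurrences)
```

Whatever the winning difference turns out to be, it must occur at least as often as the average, which is the only fact the solution uses.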
Bibliography
[1] M. Aschbacher and S. D. Smith, The classification of quasithin groups. I, Structure of strongly
quasithin K-groups, Mathematical Surveys and Monographs, vol. 111, American Mathemat-
ical Society, Providence, RI, 2004. MR2097623
[2] M. Aschbacher and S. D. Smith, The classification of quasithin groups. II, Main theorems:
the classification of simple QTKE-groups, Mathematical Surveys and Monographs, vol. 112,
American Mathematical Society, Providence, RI, 2004. MR2097624
[3] D. Conlon, J. Fox, and Y. Zhao, The Green-Tao theorem: an exposition, EMS Surv. Math.
Sci. 1 (2014), no. 2, 249–282, DOI 10.4171/EMSS/6. https://arxiv.org/abs/1403.2957.
MR3285854
[4] K. Conrad, Arithmetic progressions of four squares,
http://www.math.uconn.edu/~kconrad/blurbs/ugradnumthy/4squarearithprog.pdf.
[5] R. Crandall and C. Pomerance, Prime numbers: A computational perspective, Springer-Verlag,
New York, 2001. MR1821158
[6] L. E. Dickson, History of the theory of numbers. Vol. II, Diophantine analysis, reprinted by
AMS, 1992.
[7] J. Fox and Y. Zhao, A short proof of the multidimensional Szemerédi theorem in the primes,
Amer. J. Math. 137 (2015), no. 4, 1139–1145, DOI 10.1353/ajm.2015.0028. MR3372317
[8] B. Green and T. Tao, The primes contain arbitrarily long arithmetic progressions, Ann. of
Math. (2) 167 (2008), no. 2, 481–547, DOI 10.4007/annals.2008.167.481.
http://arxiv.org/abs/math.NT/0404188. MR2415379
[9] B. Green and T. Tao, Linear equations in primes, Ann. of Math. (2) 171 (2010), no. 3,
1753–1850, DOI 10.4007/annals.2010.171.1753. MR2680398
[10] On-Line Encyclopedia of Integer Sequences, A000945 (Euclid-Mullin sequence: a(1) = 2,
a(n + 1) is smallest prime factor of $1 + \prod_{k=1}^{n} a(k)$), https://oeis.org/A000945.
[11] On-Line Encyclopedia of Integer Sequences, A204189 (Benoît Perichon’s 26 primes in
arithmetic progression), https://oeis.org/A204189.
[12] R. Solomon, The classification of finite simple groups: a progress report, Notices Amer.
Math. Soc. 65 (2018), no. 6, 646–651.
https://www.ams.org/journals/notices/201806/rnoti-p646.pdf. MR3792856
[13] T. Tao, The Gaussian primes contain arbitrarily shaped constellations, J. Anal. Math.
99 (2006), 109–176, DOI 10.1007/BF02789444. https://arxiv.org/abs/math/0501314.
MR2279549
[14] T. Tao and T. Ziegler, The primes contain arbitrarily long polynomial progressions, Acta
Math. 201 (2008), no. 2, 213–305, DOI 10.1007/s11511-008-0032-5. MR2461509
[15] T. Tao and T. Ziegler, A multi-dimensional Szemerédi theorem for the primes via a corre-
spondence principle, Israel J. Math. 207 (2015), no. 1, 203–228, DOI 10.1007/s11856-015-
1157-9. MR3358045
[16] A. van der Poorten, Fermat’s Four Squares Theorem, https://arxiv.org/abs/0712.3850v1.
[17] Wikipedia, Green–Tao theorem, https://en.wikipedia.org/wiki/Green-Tao_theorem.
[18] Wikipedia, Primes in arithmetic progression,
https://en.wikipedia.org/wiki/Primes_in_arithmetic_progression.
2005
William Stein Developed Sage
Introduction
Much mathematical software, such as Mathematica (see the 1988 entry)
and Maple, is closed source. This means that the actual nuts and bolts of the
algorithms and implementations are hidden from the user. For example, the
Mathematica command Fibonacci[n] almost instantly returns the nth Fibonacci number.
But what is going on under the hood? Is the program using the definition of
the Fibonacci numbers? Probably not; that would be painfully slow. Is it using
something along the lines of Binet’s formula (see the comments for the 2001 entry)?
Possibly. Perhaps Mathematica uses something altogether different and much more
clever. We simply do not know because the source code is not publicly available.
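To see why the choice of algorithm matters, compare two open implementations. The Python sketch below is our own illustration and makes no claim about what Mathematica actually does. The first function evaluates the recursive definition directly, which takes exponentially many steps. The second uses the well-known "fast doubling" identities $F_{2k} = F_k(2F_{k+1} - F_k)$ and $F_{2k+1} = F_k^2 + F_{k+1}^2$, which need only about $\log_2 n$ steps. We assume the indexing $F_0 = 0$, $F_1 = 1$.

```python
def fib_definition(n):
    """F(0) = 0, F(1) = 1, F(n) = F(n-1) + F(n-2), evaluated by naive recursion."""
    if n < 2:
        return n
    return fib_definition(n - 1) + fib_definition(n - 2)


def fib_fast_doubling(n):
    """Compute F(n) with the fast doubling identities."""

    def pair(k):
        # Returns the pair (F(k), F(k+1)).
        if k == 0:
            return (0, 1)
        a, b = pair(k // 2)          # a = F(m), b = F(m+1) with m = k // 2
        c = a * (2 * b - a)          # F(2m)
        d = a * a + b * b            # F(2m+1)
        if k % 2 == 0:
            return (c, d)
        return (d, c + d)

    return pair(n)[0]


print(fib_definition(20), fib_fast_doubling(20))
print(fib_fast_doubling(1000))
```

Because both functions are in plain view, anyone can check that they agree on small inputs and can then trust the fast version for large ones. This is precisely the kind of verification that closed-source software rules out.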
Without publicly available source code, it is difficult for a researcher to verify
that a program does exactly what it claims. Are the results accurate? Are the
algorithms correctly implemented? With closed-source programs, one must simply
trust that the programmers knew what they were doing and got things right.
In early 2005, William A. Stein (1974– ) released Sage (Software for Algebra
and Geometry Experimentation) in response to these issues; see Figure 1. Although
it is now called SageMath, the goal remains the same [4]:
The goal of the Sage project is to create a viable open source alter-
native to Magma, Maple, Mathematica, and MATLAB, which are all
closed source. This means that people have choice—they at least have
the option to use open source software for their math research and
teaching in all the academic areas represented by those software. Pro-
viding such a choice entails both implementing all relevant algorithms
in Sage (with competitive efficiency and correctness), and creating cor-
responding textbooks and documentation.
Figure 1. Sage logo. Image courtesy of Alex Clemesha and Harald Schilly.
Figure 2. Collection of four screenshots showing Sage in the following situations:
a command-line terminal (text-only), Jupyter notebook (interactive document),
“sage cell” (an online service to run a block of code), and CoCalc (virtual online
environment for computations, showing a “Sage Worksheet”). Image courtesy of
Harald Schilly.
SageMath features a web-based interface that lets the user harness the power of
dozens of open-source packages and perform computations across the spectrum
of pure and applied mathematics; see Figure 2. Computations can be performed
locally or remotely on a SageMath server.
Although SageMath is used by many mathematicians around the world, Stein
faced enormous difficulty obtaining funding. Unlike commercially available soft-
ware, SageMath does not bring in revenue and, in fact, it did not have a single
full-time developer until 2016 [11]. Most of the software development was carried
out by volunteers, chiefly students and working mathematicians or computer
scientists. In a 2018 interview, Stein said [4]:
My perspective with Sage has always been to try to make a tool that
people could use to compute mathematical objects more easily, with
minimal friction. They should not have to pay a lot of money, they
should have full access to readable source code, and have many good
code examples that definitely work.
Although Stein has stepped back a bit from development work on SageMath (he
is now the CEO of SageMath, Inc. and focuses mostly on its cloud-computing
platform, CoCalc), progress continues unabated [4]:
Sage development proceeds at a steady pace, with many Sage Days
workshops in both the US and Europe; for example IMA [Institute for
Mathematics and its Applications] in Minnesota is sponsoring many
workshops this year and OpenDreamKit in Europe too! Most work on
Sage is motivated by the needs of research mathematicians for their
own work. Releases keep happening, and around 100 people contribute
to each release.

Go to the SageMath homepage
http://www.sagemath.org/index.html,
download SageMath or sign up for an online account, and see what it can do!
2005: Comments
The Bateman–Horn conjecture. On the theme of numerical computation
and hot on the heels of last year’s entry (the Green–Tao theorem), we embark upon
one of the final running threads in this book: the Bateman–Horn conjecture. Like
the Riemann hypothesis (see the 1942 and 1987 entries) and the abc-conjecture (see
the 1981 entry), the Bateman–Horn conjecture has many far-reaching consequences
and remains unproven. The material below, and much more, can be found in the
recent expository article [1].
The conjecture stems from a 1962 summer undergraduate research project at
the University of Illinois at Urbana-Champaign. Paul T. Bateman (1919–2012), an
analytic number theorist who joined the university in 1950, sponsored the project
and employed a promising young student, Roger A. Horn (1942– ). In 1963, they
used the ILLIAC (Illinois Automatic Computer), the first computer built and owned
by a US-based academic institution, to run some computations concerning the
distribution of prime numbers.
Needless to say, they did not use Sage, Mathematica, or any other software that
the modern user might recognize. The programs were entered on paper tape and
fed into the machine by a dedicated operator via a noisy mechanism. An attached
printer could produce output at the modest rate of ten characters per second.
Among other computations, Bateman and Horn found the 776 primes p ≤
113,000 for which $p^2 + p + 1$ is also prime. This computation, which took 400
minutes on the state-of-the-art ILLIAC, was performed on the first named author’s
late-2013 iMac in a tenth of a second. How times have changed! These sorts
of computations, along with previous conjectures of Bunyakovsky (1854), Dickson
(1904), Landau (1912), Hardy and Littlewood (1923), and Schinzel (1958), pointed
toward a grand conjecture about the asymptotic distribution of primes generated
by families of polynomials [2, 3]:
Bateman–Horn conjecture. Let $f_1, f_2, \ldots, f_k \in \mathbb{Z}[x]$ be distinct irreducible
polynomials with positive leading coefficients and let
\[ Q(f_1, f_2, \ldots, f_k; x) = \#\{ n \le x : f_1(n), f_2(n), \ldots, f_k(n) \text{ are prime} \}. \tag{2005.1} \]
Suppose that $f = f_1 f_2 \cdots f_k$ does not vanish identically modulo any prime. Then
\[ Q(f_1, f_2, \ldots, f_k; x) \sim \frac{C(f_1, f_2, \ldots, f_k)}{\prod_{i=1}^{k} \deg f_i} \int_2^x \frac{dt}{(\log t)^k}, \tag{2005.2} \]
in which
\[ C(f_1, f_2, \ldots, f_k) = \prod_p \left(1 - \frac{1}{p}\right)^{-k} \left(1 - \frac{\omega_f(p)}{p}\right) \tag{2005.3} \]
and $\omega_f(p)$ is the number of solutions to $f(x) \equiv 0 \pmod{p}$.
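As an aside, the computation of Bateman and Horn mentioned above is easy to repeat today. The following Python sketch is our own illustration, not their program. It sieves the primes up to the bound and tests $p^2 + p + 1$ by trial division against the sieved primes, which suffices because $\sqrt{p^2 + p + 1} < p + 1$. The text reports 776 such primes for the bound 113,000; the sketch prints its own count for comparison and does not hard-code that value.

```python
from math import isqrt


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n (n >= 2 assumed)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def primes_with_prime_value(bound):
    """Primes p <= bound for which p^2 + p + 1 is also prime."""
    primes = primes_up_to(bound + 1)

    def is_prime(m):
        # Every m tested satisfies sqrt(m) < bound + 1, so the sieved primes suffice.
        for q in primes:
            if q * q > m:
                break
            if m % q == 0:
                return False
        return True

    return [p for p in primes if p <= bound and is_prime(p * p + p + 1)]


print(primes_with_prime_value(20))
print(len(primes_with_prime_value(113_000)))
```

In the notation of the conjecture, this is the count $Q(f_1, f_2; x)$ for $f_1(x) = x$ and $f_2(x) = x^2 + x + 1$ with x = 113,000.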

Consequences of the Bateman–Horn conjecture include the Green–Tao theorem,
the prime number theorem, Dirichlet’s theorem on primes in arithmetic progres-
sions, and the twin prime conjecture. It also explains Euler’s enigmatic “prime
producing” polynomial and the mysterious Ulam spiral (see the comments for the
2006, 2007, and 2009 entries). We will return to the Bateman–Horn conjecture
several times over the remaining entries and explain some of these exciting connec-
tions.
Why is this conjecture plausible?¹ First of all, the many hypotheses ensure
that there is no simple “obstruction” that prevents the polynomials $f_1, f_2, \ldots, f_k$
from simultaneously assuming prime values infinitely often. For example,
\[ x^2 - 1 = (x - 1)(x + 1) \]
is reducible and hence factors nontrivially if x ≥ 3. Another obstacle is illustrated
by $x^3 - x + 3$, which is irreducible but always divisible by 3 since
\[ x^3 - x + 3 \equiv x^3 - x \equiv x(x - 1)(x + 1) \equiv 0 \pmod{3}. \]
We expect that higher-degree polynomials assume prime values less frequently over
a given range. This tendency manifests itself in the denominator $\prod_{i=1}^{k} \deg f_i$ of
(2005.2). The integral in (2005.2) is reminiscent of the logarithmic integral that
we encountered in our study of the prime number theorem (see the 1933 entry).
The power of the logarithm reflects the fact that additional polynomials drive down
the frequency of arguments for which the polynomials simultaneously attain prime
values. Finally, the Bateman–Horn constant (2005.3) that appears in (2005.2) is a
correction factor that takes into account information about how the f1 , f2 , . . . , fk
behave modulo each prime. The fact that the infinite product (2005.3) converges
is not at all obvious. The proof is quite delicate and involves elements of both
algebraic and analytic number theory; see [1, Sect. 5] for the details.
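The quantity $\omega_f(p)$ and the individual factors of (2005.3) can be explored by brute force. In the Python sketch below (our own illustration, with hypothetical function names), the polynomial g is the obstructed example $x^3 - x + 3$ from above, and h is the product $x(x^2 + x + 1)$ that underlies the computation of Bateman and Horn.

```python
def omega(f, p):
    """Number of residues x mod p with f(x) ≡ 0 (mod p), found by brute force."""
    return sum(1 for x in range(p) if f(x) % p == 0)


def local_factor(f, p, k):
    """The factor (1 - 1/p)^(-k) (1 - omega_f(p)/p) of the product (2005.3)."""
    return (1 - 1 / p) ** (-k) * (1 - omega(f, p) / p)


def g(x):
    return x**3 - x + 3          # irreducible, yet always divisible by 3


def h(x):
    return x * (x * x + x + 1)   # f = f1 f2 with f1 = x and f2 = x^2 + x + 1


for p in (2, 3, 5, 7):
    print(p, omega(g, p), local_factor(g, p, 1), omega(h, p), local_factor(h, p, 2))
```

When f vanishes identically modulo p we have $\omega_f(p) = p$, so the corresponding factor is zero. This is how the formula itself encodes the obstruction that the hypothesis rules out.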
¹ See [1, Sect. 3] for a detailed heuristic derivation of the Bateman–Horn conjecture, based
upon the Cramér random model of the primes (see the comments for the 1987 entry).
Before moving on, we should say something about Roger A. Horn, a collaborator
of the first named author on a recent linear algebra textbook [5]. The following
passage is from [1, Sect. 5]:
Horn is known best for his long and storied career in matrix analy-
sis. Among his chief publications are the classic texts Matrix Analysis
[7] and Topics in Matrix Analysis [8], both coauthored with Charles
Johnson. Of his many papers, only two are on number theory; both of
these date from the early 1960s and concern the Bateman–Horn con-
jecture [2, 3]. Consequently, many of his close colleagues are unaware
of his connection to a famous conjecture in number theory.
Indeed, the first named author only became aware of Horn’s involvement in the
conjecture because of his recent work on primitive roots for twin primes [6].
Bibliography
[1] S. L. Aletheia-Zomlefer, L. Fukshansky, and S. R. Garcia, The Bateman–Horn Conjecture:
Heuristics, History, and Applications, to appear in Expositiones Mathematicae, https://
[2] P. T. Bateman and R. A. Horn, A heuristic asymptotic formula concerning the distribution
of prime numbers, Math. Comp. 16 (1962), 363–367, DOI 10.2307/2004056. MR0148632
[3] P. T. Bateman and R. A. Horn, Primes represented by irreducible polynomials in one variable,
Proc. Sympos. Pure Math., Vol. VIII, Amer. Math. Soc., Providence, R.I., 1965, pp. 119–132.
MR0176966
[4] A. Diaz-Lopez, William Stein interview, Notices Amer. Math. Soc. 65 (2018), no. 5, 540–543.
MR3753815
[6] S. R. Garcia, E. Kahoro, and F. Luca, Primitive root biases for twin primes, Experimental
Mathematics (in press), https://arxiv.org/abs/1705.02485.
[7] R. A. Horn and C. R. Johnson, Matrix analysis, 2nd ed., Cambridge University Press, Cam-
bridge, 2013. MR2978290
[8] R. A. Horn and C. R. Johnson, Topics in matrix analysis, corrected reprint of the 1991
original, Cambridge University Press, Cambridge, 1994. MR1288752
[9] Sage, http://www.sagemath.org/.
[10] W. Stein, Mathematical software and me: a very personal recollection, http://sagemath.
blogspot.com/2009/12/mathematical-software-and-me-very.html.
[11] Wikipedia, SageMath, https://en.wikipedia.org/wiki/SageMath.
2006
The Strong Perfect Graph Theorem
Introduction
Let G be a graph. The chromatic number χ(G) of G is the smallest number
of colors needed to paint the vertices of G so that no pair of adjacent vertices have
the same color. The clique number ω(G) of G is the size of the largest induced
complete subgraph in G, that is, the size of the largest subset of vertices of G, all
of which are connected to each other. Since a complete graph $K_n$ on n vertices
satisfies $\chi(K_n) = n$, it follows that χ(G) ≥ ω(G) for any graph; see Figure 1. In
general, finding either quantity is computationally intractable since both problems
are NP-hard. Nevertheless, many algorithms exist that can find χ(G) or ω(G) for
graphs of reasonable size or from certain special families.
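For very small graphs, both quantities can be found by exhaustive search. The Python sketch below is a naive illustration (the function names are ours) whose running time grows exponentially, in keeping with the hardness results just mentioned. It is applied to the complete graph and the cycle on five vertices.

```python
from itertools import combinations, product


def clique_number(vertices, edges):
    """Size of the largest set of pairwise adjacent vertices (exhaustive search)."""
    adjacent = {frozenset(e) for e in edges}
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if all(frozenset(pair) in adjacent for pair in combinations(subset, 2)):
                return size
    return 0


def chromatic_number(vertices, edges):
    """Smallest k for which some assignment of k colors is proper (exhaustive search)."""
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            coloring = dict(zip(vertices, colors))
            if all(coloring[u] != coloring[v] for u, v in edges):
                return k
    return 0


vertices = list(range(5))
k5_edges = list(combinations(vertices, 2))
c5_edges = [(i, (i + 1) % 5) for i in range(5)]

print(chromatic_number(vertices, k5_edges), clique_number(vertices, k5_edges))
print(chromatic_number(vertices, c5_edges), clique_number(vertices, c5_edges))
```

The five-cycle shows that the inequality χ(G) ≥ ω(G) can be strict.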
A graph G is perfect if every induced subgraph H has χ(H) = ω(H). For
example, every bipartite graph is perfect, as is every forest (disjoint union of trees).
In 1961, Claude Berge (1926–2002) proposed a deep conjecture: a graph is perfect
if and only if neither it nor its complement has an induced subgraph that is a cycle
of odd length five or greater [2]. The conjecture implies that perfect graphs are
closed under complementation. This weaker result (the perfect graph theorem) was
proved by László Lovász (1948– ) in 1972 via an elegant polyhedral argument [8, 9].
(a) $\chi(K_5) = \omega(K_5) = 5$. (b) $\chi(C_5) = 3$ and $\omega(C_5) = 2$.
Figure 1. Chromatic and clique numbers for $K_5$, the complete graph on
5 vertices, and $C_5$, the cycle graph on 5 vertices.
Figure 2. Graph with vertices 2, 3, . . . , 17 and edges between distinct
a and b if and only if gcd(a, b) ≥ 2.
The full proof of Berge’s conjecture, now called the strong perfect graph theorem,
was announced in 2002 by Maria Chudnovsky (1977– ), Neil Robertson
(1938– ), Paul Seymour (1950– ), and Robin Thomas, just one month before
Berge passed away; it was published in 2006 [3]. The foursome was awarded the
2009 Delbert Ray Fulkerson Prize for outstanding work in discrete mathematics [6]:
Claude Berge introduced the class of perfect graphs in 1960, together
with a possible characterization in terms of forbidden subgraphs. The
resolution of Berge’s strong perfect graph conjecture quickly became
one of the most sought-after goals in graph theory. . . . The long, dif-
ficult, and creative proof by Chudnovsky and her colleagues is one of
the great achievements in discrete mathematics.

Proposed by Matt DeVos, Simon Fraser University.
A class G of graphs is χ-bounded if there is a fixed function f so that χ(G) ≤
f (ω(G)) for all G ∈ G. An unsolved conjecture of András Gyárfás (1945– ) asserts
that the class of graphs with no induced subgraph that is an odd cycle of length
≥ 5 is also χ-bounded. Explore this conjecture, and examine what happens if there
are no induced subgraphs that are an odd cycle of length at most 4.
2006: Comments
Bonus problem. The problem proposed for 2006 was solved in 2017 by Maria
Chudnovsky, Alex Scott, Paul Seymour, and Sophie Spirkl [4]. So here is a
“bonus problem,” posed by Steven J. Miller. Let G be the graph with vertices
2, 3, 4, 5, . . . , 10,000 and with edges between a and b if and only if gcd(a, b) ≥ 2;
see Figure 2. Thus, 15 and 70 are connected by an edge, while 15 and 119 are not.
What is χ(G)? What about ω(G)?

(a) A seven-coloring of the plane with the Moser spindle superimposed. Image by David
Eppstein (public domain). (b) The Golomb graph.
Figure 3. The Moser spindle and the Golomb graph are unit-
distance graphs in the plane with chromatic number 4. Their exis-
tence implies that the answer to the Hadwiger–Nelson problem is
at least four.
Hadwiger–Nelson problem. What is the minimum number of colors needed
to color the plane so that no two points at a unit distance from each other share the
same color? This is the famed Hadwiger–Nelson problem, due to Hugo Hadwiger
(1908–1981) and Edward Nelson (1932–2014) [12]. The answer is at least three:
consider the graph formed by the vertices of an equilateral triangle with side length
one. On the other hand, one can tile the plane with hexagons of diameter slightly
less than one. These can be colored with seven colors in such a way that the
Hadwiger–Nelson condition is satisfied; see Figure 3(a). Thus, the answer is at most
seven. The lower bound can be improved: the
Moser spindle (Figure 3(a)), discovered by Leo and William Moser, and the Golomb
graph (Figure 3(b)), discovered by Solomon Golomb (1932–2016), are unit-distance
graphs that can be embedded in the plane. They both have chromatic number
four; the existence of either of them implies that the answer to the Hadwiger–
Nelson problem is at least four. In 2018, Aubrey de Grey (1963– ) found a unit
distance graph in the plane with 1,581 vertices that is not four-colorable [5]. This
shows that the answer to the Hadwiger–Nelson problem is either 5, 6, or 7. What is
the correct answer? We still do not know. In fact, it may depend upon the axioms
of set theory that one assumes [10]!
Solution to the bonus problem. The even numbers 2, 4, . . . , 10,000 form a
clique of size 5,000. Thus, χ(G) ≥ ω(G) ≥ 5,000. Assign each even number a different
color and then color 2n + 1 the same color as 2n. The only vertices that share a color
are 2n and 2n + 1, which are not adjacent since gcd(2n, 2n + 1) = 1, so this provides
a coloring of G with exactly 5,000 different colors. Therefore, χ(G) = 5,000, and the
inequalities 5,000 ≤ ω(G) ≤ χ(G) force ω(G) = 5,000 as well. Since the chromatic and
clique numbers of G are the same, what can you deduce about its induced subgraphs?
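The two ingredients of the solution, the clique of even numbers and the coloring that pairs 2n with 2n + 1, can be checked mechanically on a smaller copy of the graph. The Python sketch below is our own illustration; the function name and the choice of a smaller upper limit (to keep the double loop quick) are assumptions made for convenience.

```python
from math import gcd


def check_bonus_solution(n_max):
    """Check the clique and the coloring on the graph with vertices 2, ..., n_max."""
    vertices = range(2, n_max + 1)
    evens = [v for v in vertices if v % 2 == 0]

    # The even numbers are pairwise adjacent, since each pair shares the factor 2.
    clique_ok = all(gcd(a, b) >= 2 for i, a in enumerate(evens) for b in evens[i + 1:])

    # Color v with floor(v / 2), so that 2n and 2n + 1 receive the same color.
    color = {v: v // 2 for v in vertices}
    proper = all(
        color[a] != color[b]
        for a in vertices
        for b in range(a + 1, n_max + 1)
        if gcd(a, b) >= 2
    )
    return len(evens), len(set(color.values())), clique_ok, proper


print(check_bonus_solution(1000))
```

The first two entries of the result are the size of the clique and the number of colors used. When they agree and both checks succeed, the chromatic and clique numbers coincide for that value of the upper limit.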
Bateman–Horn and twin primes. We resume our discussion of the
Bateman–Horn conjecture from the 2005 entry. In 2005, Daniel Goldston (1954– ),
János Pintz, and Cem Yıldırım (1961– ) proved that
\[ \liminf_{n \to \infty} \frac{p_{n+1} - p_n}{\log n} = 0, \tag{2006.1} \]
in which $p_n$ denotes the nth prime number.¹ This major result set off a chain
reaction that led to the phenomenal work of Yitang Zhang, James Maynard, and
the Polymath8 project (see the 1919 entry and the comments for 1937).
As a first illustration of the power of the Bateman–Horn conjecture, we show
that it implies the twin prime conjecture. Before doing this, however, we pose a
few problems for the reader. Prove that the Bateman–Horn conjecture implies
(a) the prime number theorem,
(b) Dirichlet’s theorem on primes in arithmetic progressions, and
(c) Landau’s conjecture (there are infinitely many primes of the form $n^2 + 1$).
To derive these results, apply the conjecture to the polynomials x, ax + b, and
$x^2 + 1$, respectively; complete details can be found in [1].
Now for the twin prime conjecture. Let $f_1(x) = x$ and $f_2(x) = x + 2$. Then
$f_1(x)$ and $f_2(x)$ are prime if and only if x and x + 2 are twin primes. If $f = f_1 f_2$ and
$\omega_f(p)$ denotes the number of solutions to $f(x) \equiv 0 \pmod{p}$, then $f(x) \equiv 0 \pmod{p}$
if and only if $x(x + 2) \equiv 0 \pmod{p}$, and hence
\[ \omega_f(p) = \begin{cases} 1 & \text{if } p = 2, \\ 2 & \text{if } p \ge 3. \end{cases} \]
The Bateman–Horn constant (2005.3) is
\[ C(f_1, f_2) = \prod_p \left(1 - \frac{1}{p}\right)^{-2} \left(1 - \frac{\omega_f(p)}{p}\right) = 2 \prod_{p \ge 3} \frac{p(p - 2)}{(p - 1)^2} = 2C_2, \]
in which $C_2 \approx 0.660161815$ is the twin primes constant (see (1919.4) in the 1919
entry). The Bateman–Horn conjecture (2005.2) predicts that the number of n ≤ x
such that n and n + 2 are prime is
\[ Q(f_1, f_2; x) \sim 2C_2 \int_2^x \frac{dt}{(\log t)^2} \sim \frac{2C_2 x}{(\log x)^2}. \]
Moreover, $Q(f_1, f_2; x) \sim \pi_2(x)$, the counting function for the twin primes; see
Figure 4. This was first predicted by Hardy and Littlewood in 1923 [7].

¹ We admit that the connection with 2006 is tentative: the survey article [11] on the Goldston–
Pintz–Yıldırım theorem by Kannan Soundararajan appeared on the Bulletin of the American
Mathematical Society website on September 25, 2006.

Figure 4. Graph of $\pi_2(x)$ (orange) versus $2C_2 \int_2^x (\log t)^{-2}\,dt$
(blue) and $2C_2 x/(\log x)^2$ (green) for x ≤ 10,000. Although all
three functions are asymptotically equivalent, the more compli-
cated integral expression provides a better approximation to $\pi_2$
than does the more elementary expression.
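The comparison depicted in Figure 4 can be reproduced numerically. The Python sketch below is our own illustration: it approximates $C_2$ by a partial product over the primes up to a chosen bound, counts the n ≤ x for which n and n + 2 are both prime, and evaluates the two estimates. The integral is approximated with a simple midpoint rule, and all function names and parameter choices are assumptions made for this example.

```python
from math import isqrt, log


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n (n >= 2 assumed)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def twin_prime_constant(prime_bound):
    """Partial product of p(p - 2)/(p - 1)^2 over the primes 3 <= p <= prime_bound."""
    c2 = 1.0
    for p in primes_up_to(prime_bound):
        if p >= 3:
            c2 *= p * (p - 2) / (p - 1) ** 2
    return c2


def twin_count(x):
    """Number of n <= x such that n and n + 2 are both prime."""
    primes = set(primes_up_to(x + 2))
    return sum(1 for n in primes if n <= x and n + 2 in primes)


def integral_estimate(x, c2, steps=200_000):
    """Midpoint-rule approximation of 2 C_2 times the integral of (log t)^(-2) over [2, x]."""
    h = (x - 2) / steps
    total = sum(1 / log(2 + (i + 0.5) * h) ** 2 for i in range(steps))
    return 2 * c2 * h * total


def simple_estimate(x, c2):
    """The elementary estimate 2 C_2 x / (log x)^2."""
    return 2 * c2 * x / log(x) ** 2


c2 = twin_prime_constant(10**5)
x = 10**4
print(c2, twin_count(x), integral_estimate(x, c2), simple_estimate(x, c2))
```

Comparing the three printed quantities gives a numerical counterpart to the curves in the figure.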
Bibliography
[2] C. Berge, Färbung von Graphen, deren sämtliche bzw. deren ungerade Kreise starr sind,
Wiss. Z. Martin-Luther-Univ. Halle-Wittenberg Math.-Natur. Reihe 10 (1961), 114.
[3] M. Chudnovsky, N. Robertson, P. Seymour, and R. Thomas, The strong perfect graph theorem,
Ann. of Math. (2) 164 (2006), no. 1, 51–229, DOI 10.4007/annals.2006.164.51.
http://annals.math.princeton.edu/wp-content/uploads/annals-v164-n1-p02.pdf. MR2233847
[4] M. Chudnovsky, A. Scott, P. Seymour, and S. Spirkl, Induced subgraphs of graphs with large
chromatic number, VIII: Long odd holes, https://arxiv.org/abs/1701.07217.
[5] A. D. N. J. de Grey, The chromatic number of the plane is at least 5, Geombinatorics 28
(2018), no. 1, 18–31. https://arxiv.org/abs/1804.02385. MR3820926
[6] Fulkerson Prize Committee, 2009 Fulkerson Prizes, Notices of the American Mathematical
Society 57 (2011), no. 11, 1475–1476.
[7] G. H. Hardy and J. E. Littlewood, Some problems of ‘Partitio numerorum’; III: On the
expression of a number as a sum of primes, Acta Math. 44 (1923), no. 1, 1–70, DOI
10.1007/BF02403921. MR1555183
[8] L. Lovász, Normal hypergraphs and the perfect graph conjecture, Discrete Math. 2 (1972),
no. 3, 253–267, DOI 10.1016/0012-365X(72)90006-4. MR0302480
[9] L. Lovász, A characterization of perfect graphs, J. Combinatorial Theory Ser. B 13 (1972),
95–98. MR0309780
[10] A. Soifer, The mathematical coloring book: Mathematics of coloring and the colorful life of
its creators, with forewords by Branko Grünbaum, Peter D. Johnson, Jr. and Cecil Rousseau
[11] K. Soundararajan, Small gaps between prime numbers: the work of Goldston-Pintz-Yıldırım,
Bull. Amer. Math. Soc. (N.S.) 44 (2007), no. 1, 1–18, DOI 10.1090/S0273-0979-06-01142-6.
MR2265008
[12] Wikipedia, Hadwiger–Nelson problem,
https://en.wikipedia.org/wiki/Hadwiger-Nelson_problem.
2007
Flatland
Introduction
The year 2007 marked the latest attempt to capture on film the novella Flatland: A
Romance of Many Dimensions (more commonly known as Flatland), written in
1884 by Edwin Abbott Abbott (1838–1926). The movie is a creation of three
filmmakers from the University of Texas, Austin, who discovered that they had all
read and enjoyed the book when they were young students. Seth Caplan (1977– )
was the producer, Jeffrey Travis the director, and Dano Johnson the chief animator.
Their film is a hit with middle and high school geometry classes.
Although some of the social satire of the original book has been modified,
the central part of the story remains. The precocious hexagonal grandson in the
classic has been replaced by an equally precocious granddaughter, Hex. The major
innovation in the film is a mysterious artifact left in the two-dimensional world of
Flatland by a visitor from the third dimension, namely a cube that rotates about
a point so that the Flatlanders can see all of its various cross sections. One can
imagine what might happen. If a corner of the cube pierces the two-dimensional
world, local residents see a point appear and expand into an equilateral triangle
(Figure 1(a)). The triangle grows until its corners become blunted and form sides of
their own. The resulting hexagon slowly morphs until it is a large, regular hexagon
(Figure 1(b)). Once the cube’s center is firmly anchored in Flatland, it can spin
and rotate, revealing other shapes (Figures 1(c) and 1(d)).
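The shapes that the Flatlanders see can be computed directly. The Python sketch below is our own illustration (the function name is hypothetical). It intersects a plane with normal vector `normal` and offset `offset` with each edge of the cube whose vertices have coordinates ±1, and collects the distinct intersection points, which are the corners of the cross section. Exact rational arithmetic avoids rounding issues, and the same routine works unchanged for cubes of any dimension.

```python
from fractions import Fraction
from itertools import product


def slice_vertices(normal, offset):
    """Corners of the slice of the cube [-1, 1]^d by the hyperplane normal . x = offset.

    `normal` is a tuple of integers and `offset` is an integer.
    """
    dim = len(normal)

    def value(v):
        return sum(n * x for n, x in zip(normal, v)) - offset

    points = set()
    for a in product((-1, 1), repeat=dim):
        for i in range(dim):
            if a[i] != -1:
                continue
            b = a[:i] + (1,) + a[i + 1:]      # the edge from a to b changes coordinate i
            fa, fb = value(a), value(b)
            if fa == 0:
                points.add(tuple(Fraction(x) for x in a))
            if fb == 0:
                points.add(tuple(Fraction(x) for x in b))
            if fa * fb < 0:                   # the hyperplane crosses the edge's interior
                t = Fraction(fa, fa - fb)
                points.add(tuple(Fraction(x) + t * (y - x) for x, y in zip(a, b)))
    return points


for normal, offset in [((1, -1, 1), -1), ((1, -1, 1), 0), ((1, -1, 2), 0), ((0, 0, 1), 0)]:
    print(normal, offset, len(slice_vertices(normal, offset)))
```

The four planes are the ones used in Figure 1. The number of corners distinguishes the triangle from the hexagon and from the two four-sided slices.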
The challenge for the onlookers is to imagine what kind of object could produce
all of these slices. It is only when Arthur Square and Hex are taken up into the third
dimension by the three-dimensional visitor Spherius that they begin to appreciate
geometric phenomena in a dimension higher than their own; see Figure 2.
Abbott definitely wanted to challenge his readers to imagine the analogous sit-
uation in which we are confronted by phenomena originating in a fourth spatial
dimension. The film concludes with views of a four-dimensional cube that is pro-
jected into our space as it rotates in various ways in the fourth dimension. What
would we perceive if a visitor from a four-dimensional universe visited our own?

Proposed by Thomas Banchoff, Brown University.
A four-dimensional cube (a hypercube) has sixteen vertices, each with four
coordinates that are 1 or −1. Among the most symmetric slicing hyperplanes are
the ones perpendicular to the vectors (1, 0, 0, 0) or (1, 1, 0, 0) or (1, 1, 1, 0) and, most
interestingly, (1, 1, 1, 1), the long diagonal.
(a) Cube cut by the plane x − y + z = −1. (b) Cube cut by the plane x − y + z = 0.
(c) Cube cut by the plane x − y + 2z = 0. (d) Cube cut by the plane z = 0.
Figure 1. The cube with vertices (±1, ±1, ±1) cut by various planes.
(a) What are the three-dimensional slices through the origin of a hypercube?
(b) Which of the central slices of the three-dimensional cube has the greatest area?
(c) Which central slice of the hypercube has the greatest volume?
(d) Describe the structure of the central slice of the five-dimensional cube by a
four-dimensional hyperplane perpendicular to its long diagonal.
Figure 2. Spherius visits Flatland. Flat World Productions, www.flatlandthemovie.com.
2007: Comments
Nightfall. A classic science-fiction story that explores a theme similar to that
of Flatland is Isaac Asimov’s Nightfall [6]. Although it was written in 1941, well
before the Nebula Award for best science fiction short story was established in 1966,
the Science Fiction Writers of America voted it the best science fiction short story
from the era before the award. John Campbell, the influential editor of Astounding
Science Fiction, gave Asimov the following quote to use as inspiration for the story:
If the stars should appear one night in a thousand years, how would
men believe and adore, and preserve for many generations the remem-
brance of the city of God!
As in Flatland, a society must grapple with what is, to them, an inconceivable
concept. In this case, the planet Lagash is in a crowded system with six suns. The
planet is bathed in eternal day and its inhabitants never experience night or see
the distant stars. The story concerns what happens when they confront the truth
of the heavens.
Prime-generating polynomials. We are drawing near the end and it is time
to begin wrapping up some long-developing threads, among which is the Bateman–
Horn conjecture (see the comments for the 2005 and 2006 entries). We encountered
Euler’s polynomial $n^2 + n + 41$ in the comments for the 1930 entry. This polynomial
is prime for n = 0, 1, 2, . . . , 39, although it is composite for n = 40. Why is Euler’s
polynomial so good at producing primes? Can we find a quadratic polynomial that
beats Euler’s polynomial? Below, we follow the detailed exposition in [4].
First of all, no nonconstant polynomial $f(x)$ with integer coefficients can produce
only primes.¹ To see this, let $p = f(0)$, which is prime by assumption. For
n = 0, 1, 2, . . ., we have $f(pn) \equiv f(0) \equiv 0 \pmod{p}$, so the prime $f(pn)$ is
divisible by $p$ and hence $f(pn) = p$. Then the polynomial $f(px) - p$ has infinitely
many roots, so $f$ is the constant polynomial $p$.
¹ Although no single-variable polynomial assumes only prime values, there are explicit
multivariable polynomials that nearly do so; see the 1983 entry.
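The two claims above are easy to check numerically. The following Python sketch is only an illustration (the helper names `is_prime` and `euler` are ours, and trial division is chosen purely because the numbers are small): it confirms that Euler's polynomial is prime for n = 0, 1, ..., 39, that the value at n = 40 is composite, and that f(41m) is always divisible by f(0) = 41.

```python
def is_prime(m: int) -> bool:
    """Trial division; adequate for the small values used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def euler(n: int) -> int:
    """Euler's polynomial n^2 + n + 41."""
    return n * n + n + 41


# Primality of euler(n) for n = 0, 1, ..., 39.
prime_run = [is_prime(euler(n)) for n in range(40)]

# euler(41 * m) is divisible by euler(0) = 41, as in the argument above.
divisible = [euler(41 * m) % 41 == 0 for m in range(10)]

print(all(prime_run), is_prime(euler(40)), all(divisible))
```

The second list mirrors the proof: every value f(pn) is a multiple of p = f(0), so a nonconstant f cannot stay prime along that arithmetic progression.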
If we search among polynomials of the form $f(n) = n^2 + n + k$ and insist that
$f(n)$ is prime for a consecutive stretch starting with n = 0, then Euler has us beat.
Indeed, k = f(0) is prime and Georg Yuri Rainich (1886–1968) proved in 1913
that if p is prime, then $n^2 + n + p$ is prime for n = 0, 1, . . . , p − 2 if and only if
the imaginary quadratic field $\mathbb{Q}(\sqrt{1 - 4p})$ has class number one [11] (see the 1966
entry). The Baker–Heegner–Stark theorem provides a short list of possibilities.
Of these, p = 41 provides the best results. This corresponds to the quadratic
field $\mathbb{Q}(\sqrt{-163})$, which we encountered previously in connection to Ramanujan’s
constant; see the comments for the 1975 entry. Rainich published [11] under his
original birth name, Rabinowitsch, which sometimes resulted in confusion. For
example, Bruce Palka relates the following amusing story [10]:
Rainich was giving a lecture in which he made use of a clever trick
which he had discovered. Someone in the audience indignantly in-
terrupted him pointing out that this was the famous Rabinowitsch
trick and berating Rainich for claiming to have discovered it. Without
a word Rainich turned to the blackboard, picked up the chalk, and
wrote ‘RABINOWITSCH.’ He then put down the chalk, picked up an
eraser and began erasing letters. When he was done what remained
was ‘RA IN I CH.’ He then went on with his lecture.
If we change the game, then we can beat Euler asymptotically, at least if we
believe the Bateman–Horn conjecture. Let us first examine what the conjecture
says about $f(x) = x^2 + x + 41$. Recall that $\omega_f(p)$ denotes the number of solutions
to f(x) ≡ 0 (mod p). A quick computation confirms that $\omega_f(2) = 0$. Since
\[ 4a(ax^2 + bx + c) = (2ax + b)^2 - (b^2 - 4ac), \tag{2007.1} \]
we have
\[ x^2 + x + 41 \equiv 0 \pmod{p} \iff (2x + 1)^2 \equiv -163 \pmod{p} \]
for p ≥ 3 and hence
\[ \omega_f(p) = 1 + \left(\frac{-163}{p}\right), \]
in which
\[ \left(\frac{\ell}{p}\right) =
\begin{cases}
0 & \text{if } p \mid \ell, \\
1 & \text{if $\ell$ is a quadratic residue modulo $p$}, \\
-1 & \text{if $\ell$ is a quadratic nonresidue modulo $p$},
\end{cases} \]
is the Legendre symbol. Numerical computation confirms that $\left(\frac{-163}{p}\right) = -1$, and
hence $\omega_f(p) = 0$, for the first eleven odd primes. This makes the corresponding
Bateman–Horn constant (2005.3) unusually large:
\[ C(x^2 + x + 41) = \prod_p \left(1 - \frac{1}{p}\right)^{-1} \left(1 - \frac{\omega_f(p)}{p}\right) \approx 6.63985. \tag{2007.2} \]
The Bateman–Horn conjecture predicts that there are around 3.32 Li(x) values
n ≤ x for which $n^2 + n + 41$ is prime.
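The root counts and the size of the constant can be explored with a short computation. The Python sketch below is a hedged illustration, not the authors' computation: the function names are ours, the Legendre symbol is evaluated with Euler's criterion, and the product is truncated at an arbitrary bound, so it only approximates the value quoted in (2007.2).

```python
def odd_primes(limit: int) -> list:
    """Odd primes below `limit`, via a simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit, i)))
    return [p for p in range(3, limit) if sieve[p]]


def omega(p: int, disc: int = -163) -> int:
    """Number of roots of x^2 + x + 41 modulo an odd prime p.

    Uses omega_f(p) = 1 + (disc/p); the Legendre symbol is computed
    with Euler's criterion.
    """
    e = pow(disc % p, (p - 1) // 2, p)  # equals 0, 1, or p - 1
    legendre = -1 if e == p - 1 else e
    return 1 + legendre


def partial_constant(limit: int) -> float:
    """Product of the factors in (2007.2) over primes below `limit`."""
    c = 2.0  # factor for p = 2, where omega_f(2) = 0
    for p in odd_primes(limit):
        c *= (1 - omega(p) / p) / (1 - 1 / p)
    return c


print([omega(p) for p in odd_primes(40)])  # root counts for 3, 5, ..., 37
print(partial_constant(20000))             # truncated product
```

The truncated product converges slowly, so its trailing digits should not be compared with 6.63985.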
Can we find a k so that $C(x^2 + x + k) > C(x^2 + x + 41)$? That is, can we beat
Euler’s polynomial? Yes! For odd p, (2007.1) tells us that
\[ f(x) \equiv 0 \pmod{p} \iff (2x + 1)^2 \equiv 1 - 4k \pmod{p} \]
and hence we want an odd k such that $\left(\frac{1 - 4k}{p}\right) = -1$ for the first several dozen
primes. Let $r_1, r_2, \ldots, r_{100}$ be quadratic nonresidues modulo the first 100 odd
primes, respectively. The Chinese remainder theorem provides an odd k, namely,
3682528442873462645493394982418837604455310384084190749577
5453041420103519734083583186615204669729662489042369819157
7358565650719425670030967384568941667322171286195075149379
680113340447535104953498545635385597443028681,
so that $1 - 4k \equiv r_i \pmod{p_i}$ for each i, in which $p_i$ denotes the ith odd prime. Then
\[ C(x^2 + x + k) \approx 10.9945 \]
and hence Bateman–Horn predicts that there are around 5.5 Li(x) values n ≤ x for
which $n^2 + n + k$ is prime. This beats Euler’s polynomial, at least asymptotically.
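The Chinese remainder construction can be sketched on a smaller scale. The Python code below is an illustration under stated assumptions, not a reconstruction of the k printed above: it uses only the first 20 odd primes, it always picks the smallest quadratic nonresidue as $r_i$ (the text does not say which nonresidues were used), and all function names are ours. It relies on `pow(a, -1, p)` for modular inverses, which requires Python 3.8 or later.

```python
def first_odd_primes(count: int) -> list:
    """The first `count` odd primes, by trial division."""
    primes, m = [], 3
    while len(primes) < count:
        if all(m % q for q in primes if q * q <= m):
            primes.append(m)
        m += 2
    return primes


def smallest_nonresidue(p: int) -> int:
    """Smallest quadratic nonresidue modulo the odd prime p."""
    for r in range(2, p):
        if pow(r, (p - 1) // 2, p) == p - 1:  # Euler's criterion
            return r
    raise ValueError("no nonresidue found")


def build_k(primes: list) -> int:
    """Odd k with 1 - 4k congruent to a nonresidue modulo each prime."""
    k, modulus = 1, 2  # start from k = 1 (mod 2) so that k stays odd
    for p in primes:
        r = smallest_nonresidue(p)
        target = (1 - r) * pow(4, -1, p) % p  # k = (1 - r)/4 (mod p)
        # Choose t so that k + modulus * t = target (mod p).
        t = (target - k) * pow(modulus, -1, p) % p
        k += modulus * t
        modulus *= p
    return k


primes = first_odd_primes(20)
k = build_k(primes)
# For each prime p, n^2 + n + k has no root modulo p.
rootless = all((n * n + n + k) % p for p in primes for n in range(p))
print(k, rootless)
```

Each pass through the loop is one step of the incremental Chinese remainder theorem; the running modulus is even, so adding multiples of it never changes the parity of k.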
Bibliography
[1] E. A. Abbott, Flatland: A romance of many dimensions, reprint of the sixth (1953) edition,
with a new introduction by Thomas Banchoff, Princeton Science Library, Princeton University
Press, Princeton, NJ, 2005. MR2176823
[2] E. A. Abbott, Flatland: A romance of many dimensions, the movie edition, with a new
introduction by Thomas Banchoff and contributions by Seth Caplan, Dano Johnson and
Jeffrey Travis, Princeton University Press, Princeton, NJ, 2008. MR2381792
[3] E. A. Abbott, Flatland, an edition with notes and commentary by William F. Lindgren and
Thomas F. Banchoff, MAA Spectrum, Mathematical Association of America, Washington,
DC; Cambridge University Press, Cambridge, 2010. MR2573243
[5] M. Gardner, Mathematical games: The remarkable lore of the prime number, Scientific
American 210 (1964), 120–128.
[6] I. Asimov, Nightfall, Astounding Science Fiction, September 1941. https://www.uni.edu/
morgans/astro/course/nightfall.pdf.
[7] Banchoff and Strauss Productions, The Hypercube: Projections and Slicing, 1978. http://
www.math.brown.edu/~banchoff/video/hypercube.mp4.
[8] T. Banchoff, Additional notes on Flatland, 2014. http://www.math.brown.edu/~banchoff/
HexCentralSlices/HexCentralSlices4308.html.
[9] Flatland Homepage, Flatland the Movie. http://www.flatlandthemovie.com/.
[10] B. Palka, Editor’s endnotes [erratum to MR2001148], Amer. Math. Monthly 111 (2004),
no. 5, 456–460, DOI 10.1080/00029890.2004.11920101. MR2976697
[11] G. Rabinowitsch, Eindeutigkeit der Zerlegung in Primzahlfaktoren in quadratis-
chen Zahlkörpern (German), J. Reine Angew. Math. 142 (1913), 153–164, DOI
10.1515/crll.1913.142.153. MR1580865
2008
100th Anniversary of the t-Test
Introduction
The central limit theorem is one of the masterpieces of probability theory. It
allows us to look at sums or averages of random variables sampled from an unknown
distribution and make conclusions about the distribution of these expressions. This
has powerful applications in statistics. It allows us to compare the average of a data
set to a known distribution, the Gaussian, as long as we know the population’s
standard deviation. It also permits us to set hypotheses on the value of certain
key parameters. Unfortunately, in practice we often do not know the population’s
standard deviation. Using our sample’s standard error introduces extra uncertainty
into the model that must be taken into account.
In 1908, William Sealy Gosset (1876–1937), who was working for Guinness,
ran into this problem when trying to analyze data on the best barley and hops to
use in beer production. Gosset came up with a clever solution that revolutionized
statistics: he added the error from the approximated standard deviation into the
tails of the Gaussian model. This produced a new probability distribution that
gave accurate estimates for the probability of the observations yielding a mean at
least as extreme as the observed mean given the assumptions about the population
mean. Gosset published the model under the pseudonym “A. Student” due to com-
pany policies at Guinness designed to limit other brewers from benefiting from the
statistical research of its employees. The probability density function for Student’s
t-distribution is
\[ \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \]
in which
\[ \Gamma(x) = \int_0^\infty e^{-t} t^{x-1}\,dt \tag{2008.1} \]
is the gamma function (see the 1942 entry and the comments for the 1998 entry)
and ν is the number of degrees of freedom in the model, generally equal to the
number of observations in the data minus 1. In applications, a t-value, equal to the
difference of the sample mean and hypothesized mean times the square root of the
number of observations divided by the sample standard deviation, is calculated and
compared to the probabilities in this distribution.
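As a concrete illustration of the density and the t-value just described, here is a minimal Python sketch using only the standard library. The function names `t_pdf` and `t_value` and the sample data are ours; the density is evaluated through `math.lgamma` to avoid overflow for large ν.

```python
import math


def t_pdf(x: float, nu: int) -> float:
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))


def t_value(sample: list, mu0: float) -> float:
    """(sample mean - mu0) * sqrt(n) / s, with s the sample standard deviation."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return (mean - mu0) * math.sqrt(n) / s


# Hypothetical data: five observations, hypothesized mean 2.
t = t_value([1, 2, 3, 4, 5], 2.0)
print(t, t_pdf(t, nu=4))
```

For ν = 1 the density reduces to the Cauchy density 1/(π(1 + x²)), which gives a convenient sanity check on the normalization constant.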
One of the greatest challenges in doing statistics is making sure all the assumptions
are satisfied (see the references from the 1997 entry for examples of some
catastrophic consequences). We have already remarked that the t-test allows us to
consider situations in which the variances are unknown. Another great utility is
that even for modest values of ν, the t-distribution is close to a Gaussian; we see
this in Figure 1.
(a) ν ∈ {1, 5, 10} (b) ν = 30
Figure 1. Plots of t-distributions versus the standard normal. In both images the
standard normal is the one that is highest at x = 0; on the left the value of ν
increases as the value at x = 0 increases.
A popular application of Student’s t-test is to the correlation between two
quantitative variables. Generally, the assumption that errors from the model are
normally distributed is reasonable. However, since the regression might not be
over many points, estimating the standard deviation of the errors introduces extra
variance in the model, exactly what the Student’s t-model is designed to handle.
The model can also be used to compare the means of two populations and compare
the mean of a population to a specified value. After 100 years, the Student’s t-test
is still one of the most widespread and celebrated tools in statistics.
Since the problem below is far afield from the standard uses of the t-distribution,
we briefly remark on its inclusion here. First, it introduces several important ideas
in probability, especially the power of normalization constants (if we have a prob-
ability distribution, it must integrate to 1; this remarkably simple observation is
used numerous times in mathematics to attack difficult integrals). Second, it is
a wonderful example of how mathematics developed for one thing can find uses
in others. It also illustrates the value of being well read: problems that appear
intractable sometimes look that way only until a new perspective is found. Finally, it
provides a great opportunity to discuss the gamma function.

Proposed by David Burt and Steven J. Miller, Williams College.
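The density of the Student t-distribution with ν degrees of freedom is
\[ f_\nu(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \]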
in which ν is a natural number and t is any real number. Although the t-distribution
was developed to investigate statistical problems, it has interesting applications in
pure mathematics. For example, it can be used to prove the following infinite
product representation for π discovered by John Wallis (1616–1703) in 1655:

\[ \frac{\pi}{2} = \frac{2}{1} \cdot \frac{2}{3} \cdot \frac{4}{3} \cdot \frac{4}{5} \cdot \frac{6}{5} \cdots. \]
It is not uncommon to prove such an identity by showing that both sides equal the
same quantity. Prove Wallis’s formula by looking at the limit of the t-distribution
as ν → ∞ and using the fact that a probability distribution integrates to one,
and by integrating the limiting distribution using brute force and the functional
equation Γ(x + 1) = xΓ(x).
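Before attempting a proof, it can be reassuring to see both ingredients numerically. The Python sketch below is numerical evidence only, not a solution: it evaluates partial products of Wallis's formula and the normalization constant of $f_\nu$ for large ν, which should approach the Gaussian constant $1/\sqrt{2\pi}$. The function names and the truncation points are our own choices.

```python
import math


def wallis_partial(terms: int) -> float:
    """Product of (2k/(2k-1)) * (2k/(2k+1)) for k = 1, ..., terms."""
    product = 1.0
    for k in range(1, terms + 1):
        product *= (2 * k) / (2 * k - 1) * (2 * k) / (2 * k + 1)
    return product


def t_normalization(nu: int) -> float:
    """Gamma((nu+1)/2) / (sqrt(pi*nu) * Gamma(nu/2)), computed via lgamma."""
    return math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                    - 0.5 * math.log(math.pi * nu))


print(wallis_partial(100_000), math.pi / 2)
print(t_normalization(10 ** 6), 1 / math.sqrt(2 * math.pi))
```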
2008: Comments
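Stirling’s formula. How rapidly does n! grow? Stirling’s formula states that
\[ n! \approx \sqrt{2\pi n} \left(\frac{n}{e}\right)^n \]
(see the comments for the 1934 and 1998 entries) or, more accurately,
\[ n! = \sqrt{2\pi n} \left(\frac{n}{e}\right)^n \left(1 + \frac{1}{12n} + \frac{1}{288n^2} - \frac{139}{51840n^3} - \cdots\right). \]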
There are many proofs of Stirling’s formula. For example, it is a consequence of the
central limit theorem applied to sums of Poisson random variables [5, Sect. 18.7].
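The quality of these approximations is easy to see numerically. The following Python sketch (the function name `stirling` and the choice n = 10 are ours) compares n! with the leading term and with the series truncated after the three correction terms displayed above.

```python
import math


def stirling(n: int, terms: int = 0) -> float:
    """sqrt(2 pi n) (n/e)^n times 1 plus the first `terms` corrections."""
    corrections = [1 / (12 * n), 1 / (288 * n ** 2), -139 / (51840 * n ** 3)]
    series = 1.0 + sum(corrections[:terms])
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n * series


n = 10
exact = math.factorial(n)
ratio_leading = exact / stirling(n)         # roughly 1 + 1/(12 n)
ratio_corrected = exact / stirling(n, 3)    # much closer to 1
print(ratio_leading, ratio_corrected)
```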
The definition (2008.1) ensures that Γ(1) = 1. Integration by parts confirms
that Γ(n + 1) = nΓ(n) for all natural numbers n, from which it follows that
Γ(n + 1) = n!. Thus, the gamma function can be used to interpolate the values of
the factorial function. For example, the value $\Gamma(\frac{1}{2}) = \sqrt{\pi}$ arises in the definition of
the normal distribution (which we encounter below).
One way to prove Stirling’s formula involves the method of stationary phase,
also called Laplace’s method, which approximates integrals of the form
\[ \int_a^b e^{-s f(x)} g(x)\,dx \]
for certain pairs (f, g); see [6, App. A] and [5, Ch. 18]. The relevance of the
stationary phase approach stems from the fact that
\[ n! = \Gamma(n + 1) = \int_0^\infty e^{-x} x^n\,dx. \]
We sketch the argument below. It illustrates the value of embedding the quantity
we want (values of the factorial function) in a larger family where we have powerful
tools, such as calculus and analysis, at our disposal.
Suppose that $f''(x) > 0$ and $g(x) \geq 0$ on [a, b] and that there is an $x_0 \in (a, b)$
such that $f'(x_0) = 0$. We hope to convert the integral
\[ I(s) = \int_a^b e^{-s f(x)} g(x)\,dx \]
into a Gaussian plus a small error as s → ∞. Our assumptions imply that
\[ f(x) = f(x_0) + \frac{f''(x_0)}{2}(x - x_0)^2 + O\big((x - x_0)^3\big) \]
for x near $x_0$. Thus,
\[ e^{-s f(x)} \approx e^{-s f(x_0)} \cdot e^{-\frac{s f''(x_0)}{2}(x - x_0)^2}, \]
in which the second factor is proportional to the Gaussian density
\[ \frac{1}{\sqrt{2\pi\sigma_s^2}}\, e^{-\frac{1}{2\sigma_s^2}(x - x_0)^2} \]
with mean $x_0$ and whose variance $\sigma_s^2 = 1/(s f''(x_0))$ tends to zero as s → ∞. This
approach even suggests why there is a $\sqrt{2\pi}$ in Stirling’s formula: it comes from
integrating a Gaussian.
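As s → ∞, most of the contribution to I(s) comes from x near $x_0$. For any
fixed $\varepsilon > 0$, we have
\begin{align*}
I(s) &\approx e^{-s f(x_0)} g(x_0) \int_{x_0 - \varepsilon}^{x_0 + \varepsilon} e^{-s \frac{f''(x_0)}{2}(x - x_0)^2}\,dx \\
&= e^{-s f(x_0)} g(x_0) \sqrt{2\pi\sigma_s^2} \int_{x_0 - \varepsilon}^{x_0 + \varepsilon} \frac{1}{\sqrt{2\pi\sigma_s^2}}\, e^{-(x - x_0)^2 / 2\sigma_s^2}\,dx.
\end{align*}
Since the Gaussian integrates to one over R (it is a probability distribution) and
sharply peaks at $x_0$ as s → ∞, we obtain
\[ I(s) \approx e^{-s f(x_0)} g(x_0) \sqrt{\frac{2\pi}{s f''(x_0)}}. \]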
If we were more careful in our analysis, we could keep track of the lower-order terms
and bound how far we are from the true value.
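How does this lead to Stirling’s formula?
\begin{align*}
n! = \Gamma(n + 1) &= \int_0^\infty e^{-t} t^n\,dt \\
&= \int_0^\infty e^{-t + n \log t}\,dt \\
&= \int_0^\infty e^{-n(x - \log nx)}\, n\,dx \qquad (\text{let } t = nx) \\
&= n e^{n \log n} \int_0^\infty e^{-n(x - \log x)}\,dx.
\end{align*}
We apply the method of stationary phase as developed above, with s = n, $x_0 = 1$,
f(x) = x − log x, and g(x) = 1. Since $f(1) = 1$, $f''(1) = 1$, and g(1) = 1, we obtain
\begin{align*}
n! &\approx n e^{n \log n} e^{-n f(1)} g(1) \sqrt{\frac{2\pi}{s f''(1)}} \\
&= n^{n+1} e^{-n} \sqrt{\frac{2\pi}{n}} \\
&= \sqrt{2\pi n} \left(\frac{n}{e}\right)^n,
\end{align*}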
which is Stirling’s approximation. See the comments for the 1934 entry for another
derivation.
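The key approximation in this derivation can be tested directly. In the Python sketch below, which is only a numerical illustration, the integral of $e^{-n(x - \log x)}$ is rescaled by $e^{n}$ and evaluated with a plain midpoint rule; the result is compared with the leading-order prediction $\sqrt{2\pi/n}$. The truncation of the range at 10, the step count, and the function names are our assumptions.

```python
import math


def scaled_integral(n: int, upper: float = 10.0, steps: int = 200_000) -> float:
    """Midpoint-rule value of the integral of exp(-n (x - log x - 1)) on (0, upper)."""
    h = upper / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        total += math.exp(-n * (x - math.log(x) - 1.0))
    return total * h


def laplace_estimate(n: int) -> float:
    """Leading-order prediction sqrt(2 pi / n) for the rescaled integral."""
    return math.sqrt(2 * math.pi / n)


n = 50
ratio = scaled_integral(n) / laplace_estimate(n)
print(ratio)  # close to 1; the excess is roughly 1/(12 n)
```

The ratio equals n! divided by Stirling's approximation, so its distance from 1 reflects the lower-order terms that the sketch of the argument ignores.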
Catalan numbers and their growth. While we are on the subject of factorials
and asymptotics, we might as well wrap up with the Catalan numbers:
\[ C_n = \frac{1}{n+1}\binom{2n}{n} = \frac{(2n)!}{(n+1)!\,n!}. \]
See the 1960 entry for a variety of combinatorial interpretations of these numbers.
Although one can employ Stirling’s formula to get the leading-order asymptotics
for $C_n$, a more precise formula is [2, Ex. 9.8]
\[ C_n \sim \frac{4^n}{\sqrt{\pi}} \left(\frac{1}{n^{3/2}} - \frac{9}{8n^{5/2}} + \frac{145}{128n^{7/2}} + \cdots\right). \tag{2008.2} \]
For example, the first three terms of (2008.2) yield the approximation
\[ C_{47} \approx 33,869,142,691,002,085,695,452,443 \]
to
\[ C_{47} = 33,868,773,757,191,046,886,429,490. \]
It is off by a bit, but the order of magnitude and first several significant digits are
correct; see Figure 2.
Figure 2. A log plot of $C_n$ (orange) versus the approximation $\frac{4^n}{n^{3/2}\sqrt{\pi}}$ for small n.
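The comparison for n = 47 can be reproduced in a few lines. This Python sketch assumes Python 3.8 or later for `math.comb`; the function names are ours, and the approximation is evaluated in floating point, so only its leading digits are meaningful.

```python
import math


def catalan(n: int) -> int:
    """Exact Catalan number C_n = binom(2n, n) / (n + 1)."""
    return math.comb(2 * n, n) // (n + 1)


def catalan_asymptotic(n: int) -> float:
    """First three terms of the expansion (2008.2)."""
    bracket = n ** -1.5 - 9 / (8 * n ** 2.5) + 145 / (128 * n ** 3.5)
    return 4.0 ** n / math.sqrt(math.pi) * bracket


n = 47
relative_error = catalan_asymptotic(n) / catalan(n) - 1
print(catalan(n), catalan_asymptotic(n), relative_error)
```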
Bibliography
[1] J. F. Box, Guinness, Gosset, Fisher, and small samples, Statist. Sci. 2 (1987), no. 1, 45–52.
http://projecteuclid.org/euclid.ss/1177013437. MR896258
[2] R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete mathematics: A foundation for com-
puter science, 2nd ed., Addison-Wesley Publishing Company, Reading, MA, 1994. MR1397498
[3] H. Hotelling, British Statistics and Statisticians Today, Journal of the American Statistical
Association 25 (1930) 186–190.
[4] S. J. Miller, A probabilistic proof of Wallis’s formula for π, Amer. Math. Monthly 115 (2008),
no. 8, 740–745, DOI 10.1080/00029890.2008.11920586. Expanded version: http://arxiv.org/
pdf/0709.2181. MR2456095
[6] E. M. Stein and R. Shakarchi, Complex analysis, Princeton Lectures in Analysis, vol. 2, Prince-
ton University Press, Princeton, NJ, 2003. MR1976398
[7] A. Student, The probable error of a mean, Biometrika 6 (1908), no. 1, 1–25. http://www.
aliquote.org/cours/2012_biomed/biblio/Student1908.pdf.
2009
100th Anniversary of Brouwer’s Fixed-Point Theorem
Introduction
Whether one admires the elegance of a far-reaching theorem or its applications,
Luitzen Egbertus Jan Brouwer (1881–1966), commonly known as L. E. J. Brouwer,
proved a major theorem in 1909 that appeals to all tastes. Let $f : B^n \to B^n$ be a
continuous function on the n-dimensional unit ball
\[ B^n = \{x \in \mathbb{R}^n : \|x\| \leq 1\}. \]
Brouwer’s fixed-point theorem asserts that f has at least one fixed point. In other
words, there exists an $x \in B^n$ such that f(x) = x; see Figure 1.
Brouwer’s theorem has uses far beyond analysis and topology. Nash cemented
its foundational role in game theory (see the 1944 entry) with his seminal thesis [6].
Armed with Brouwer’s theorem, he proved the existence of equilibria for noncoop-
erative games. Nash equilibria, as they were later called, are equilibrium points
in an n-person noncooperative game, in which each of the n players with pure or
mixed strategies makes the best decision possible taking into account the best de-
cision that can be made by the other n − 1 players. In 1994, Nash received the
Nobel Prize in Economics for this contribution. This application, among others,
highlights the importance of fixed-point theorems in the past, present, and future
(see the comments for the 1944 entry).
Figure 1. An illustration of Brouwer’s fixed-point theorem for n = 2.
Proofs of Brouwer’s fixed-point theorem can be found in [5, 7]; see the
problem below for an unexpected combinatorial approach in the two-dimensional
case (the solution is given in the comments section).

The two-dimensional version of Sperner’s lemma concerns colorings of triangles
(see the 2001 entry). Use it to prove Brouwer’s fixed-point theorem for the
two-dimensional ball $B^2$. Hint: First show that the closed unit ball in $\mathbb{R}^2$ is
homeomorphic to a closed triangle.
2009: Comments
Brouwer and eigenvalues. Suppose that A is an n × n matrix with positive
entries. Intuition suggests that A has a positive eigenvalue and a corresponding
eigenvector with nonnegative entries. This is true, and it is crucial to the study of
Markov chains; see the 1953 entry.
(a) The set S for n = 3, a vector x, and its projection π(x) onto S.
(b) A homeomorphism between the two-dimensional ball and an equilateral triangle
that it circumscribes can be written in polar coordinates. Scale the ray
{(r cos θ, r sin θ) : 0 ≤ r ≤ 1} so that (cos θ, sin θ) is mapped to the intersection
of the ray and the boundary of the triangle.
Figure 2. Constructions used in (a) the proof that a matrix with positive entries
has a positive eigenvalue and in (b) the proof of Brouwer’s fixed-point theorem for n = 2.
Let $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ and define
\[ S = \{x \in [0, 1]^n : x_1 + x_2 + \cdots + x_n = 1\}; \]
see Figure 2(a). Since A has only positive entries, it maps S into $(0, \infty)^n$. If
\[ \pi(x) = \frac{x}{x_1 + x_2 + \cdots + x_n}, \]
then the composition f = π ◦ A is continuous¹ and maps S into S. Since S is
homeomorphic to $B^{n-1}$, the Brouwer fixed-point theorem ensures that f has a fixed
point p ∈ S. If f(p) = p, then p = π(Ap). However, π(Ap) is a positive scalar
multiple of Ap. Thus, Ap = λp for some λ > 0. Moreover, each entry of p is
nonnegative.
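Brouwer's theorem only asserts that a fixed point of f = π ◦ A exists; it does not say how to find one. For matrices with positive entries, however, simply iterating f happens to converge (a fact from Perron–Frobenius theory, not from the argument above). The Python sketch below uses that iteration on a hypothetical 2 × 2 matrix to exhibit a fixed point p and the eigenvalue λ; the function names and the iteration count are our own choices.

```python
def apply(matrix: list, vector: list) -> list:
    """Matrix-vector product A x for a matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def project(vector: list) -> list:
    """The map pi: divide a vector by the sum of its entries."""
    total = sum(vector)
    return [x / total for x in vector]


def fixed_point(matrix: list, iterations: int = 200) -> list:
    """Iterate f = pi o A starting from the barycenter of S."""
    n = len(matrix)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = project(apply(matrix, p))
    return p


A = [[2.0, 1.0], [1.0, 3.0]]  # hypothetical matrix with positive entries
p = fixed_point(A)
Ap = apply(A, p)
lam = sum(Ap)  # since the entries of p sum to 1 and Ap = lam * p
print(p, lam)
```

Because p lies in S, the scalar in p = π(Ap) is exactly the sum of the entries of Ap, which is how λ is recovered in the last step.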
Invariance of domain. If m ≠ n, are $\mathbb{R}^n$ and $\mathbb{R}^m$ homeomorphic (see the 1917
entry)? In other words, is there a continuous bijection $f : \mathbb{R}^n \to \mathbb{R}^m$ with continuous
inverse? This is a truly topological question, rather than a set-theoretic one. Indeed,
Cantor’s theory of cardinality implies that $\mathbb{R}^n$ and $\mathbb{R}^m$ are equinumerous; that is, a
bijection between them exists (see the 1918 entry). To see this, it suffices to show
that $[0, 1]^n$ and [0, 1] are equinumerous for any n. For n = 2,
\[ (0.a_1 a_3 a_5 \ldots,\ 0.a_2 a_4 a_6 \ldots) \mapsto 0.a_1 a_2 a_3 a_4 a_5 a_6 \ldots \]
is a bijection between $[0, 1]^2$ and [0, 1]; a similar approach works for n ≥ 3. However,
the preceding function is not continuous so it sheds no light on our question.
One consequence of Brouwer’s fixed-point theorem is the invariance of domain
theorem, which says that if $U \subseteq \mathbb{R}^n$ is open and $f : U \to \mathbb{R}^n$ is continuous and
injective, then f(U) is open and f is a homeomorphism between U and V = f(U)
[2, 9, 11]. We can use this to prove that $\mathbb{R}^n$ and $\mathbb{R}^m$ are not homeomorphic if
m ≠ n. Without loss of generality, suppose that m < n and $\varphi : \mathbb{R}^n \to \mathbb{R}^m$ is
continuous and injective. Define $\iota : \mathbb{R}^m \to \mathbb{R}^n$ by
\[ \iota(x_1, x_2, \ldots, x_m) = (x_1, x_2, \ldots, x_m, \underbrace{0, 0, \ldots, 0}_{n - m \text{ times}}) \]
and $f : \mathbb{R}^n \to \mathbb{R}^n$ by f = ι ◦ φ. Since f is continuous and injective, invariance of
domain ensures that $f(\mathbb{R}^n)$ is open in $\mathbb{R}^n$. However, $f(\mathbb{R}^n)$ is contained in the image
of ι, which contains no nonempty open ball. This contradiction shows that no continuous
injection $\varphi : \mathbb{R}^n \to \mathbb{R}^m$ exists. In particular, $\mathbb{R}^n$ and $\mathbb{R}^m$ are not homeomorphic.
Solution to the problem. A complete argument can be found in [8]; we
sketch only the main ideas. The set
\[ S = \{(x_1, x_2, x_3) \in [0, 1]^3 : x_1 + x_2 + x_3 = 1\} \]
is homeomorphic to $B^2$; see Figure 2(b). Thus, it suffices to prove that every
continuous function f : S → S has a fixed point.
Suppose toward a contradiction that f : S → S is continuous and has no fixed
points. Let T be a triangulation of S, which we will paint with the “colors” {1, 2, 3}
as in Sperner’s lemma; see the 2001 entry. Color each vertex $v = (v_1, v_2, v_3)$ of T
with the smallest i for which $f(v)_i < v_i$, in which $f(v)_i$ denotes the ith component
of f(v). Such an index exists since f(v) and v are unequal by hypothesis and the
entries of both vectors each sum to 1. In particular, $e_1 = (1, 0, 0)$, $e_2 = (0, 1, 0)$,
and $e_3 = (0, 0, 1)$ are painted 1, 2, and 3, respectively. If v lies along the edge
that connects $e_1$ and $e_2$, then $v_3 = 0$ and hence v must be colored either 1 or 2.
Similarly, all vertices on the edge that connects $e_2$ and $e_3$ are colored 2 or 3, and so
forth. Thus, T is a triangulation that is colored in accordance with the hypothesis of
Sperner’s lemma. Consequently, there is a small triangle whose vertices are colored
1, 2, 3, in some order.
¹ Every linear transformation on a finite-dimensional normed vector space is continuous.
Prove it. Hint: Show that the closed unit ball is compact. Then prove that $\sup_{\|x\| = 1} \|Ax\| < \infty$.
(a) The Ulam spiral is obtained by marking primes as the ordered sequence of natural
numbers spirals away from the origin.
(b) When the Ulam spiral is plotted on a 500 × 500 grid, “lines” begin to appear.
Figure 3. The Ulam spiral.
Now consider a sequence $T_n$ of triangulations, in which the maximum diameter $\delta_n$ of
the component triangles tends to zero. Each $T_n$ gives rise, via Sperner’s lemma,
to a small triangle that possesses vertices $v_{n,i}$ of each color i = 1, 2, 3. By the
compactness of S, we may pass to a subsequence and relabel so that $\lim_{n\to\infty} v_{n,i}$
exists for i = 1, 2, 3. Since $|v_{n,i} - v_{n,j}| \leq \delta_n \to 0$, the three limits are equal. Denote
this common limit $p = (p_1, p_2, p_3)$. Since f is continuous,
\[ f(p)_i = f\Big(\lim_{n\to\infty} v_{n,i}\Big)_i = \lim_{n\to\infty} f(v_{n,i})_i \leq \lim_{n\to\infty} (v_{n,i})_i = p_i \]
for i = 1, 2, 3. However, the entries of f(p) and p both sum to 1 and hence f(p) = p.
In other words, p is a fixed point of f, which contradicts our assumption.
Ulam spiral. We wrap up our discussion of the Bateman–Horn conjecture,
which has been an ongoing thread throughout the comments for the 2005, 2006,
and 2007 entries, with puzzling patterns in the primes that were first discovered in
1963 by Stanislaw Ulam as he doodled during a scientific meeting. He listed the
natural numbers in increasing order in a boxy spiral emanating from the origin; see
Figure 3(a). Ulam observed that primes tend to congregate along certain diagonal
lines while they avoid others. This phenomenon provides a distinctive cross-hatch
pattern when plotted on large scales; see Figure 3(b). If the primes are truly
“random,” then how can they conspire across vast distances to form these geometric
patterns? Martin Gardner (see the 1914 entry) popularized the finding in a 1964
Scientific American column [4]:

Last fall Stanislaw M. Ulam of the Los Alamos Scientific Laboratory
attended a scientific meeting at which he found himself listening to
what he describes as a “long and very boring paper.” To pass the
time he doodled a grid of horizontal and vertical lines on a sheet of
paper. His first impulse was to compose some chess problems, then he
changed his mind and began to number the intersections, starting near
the center with 1 and moving out in a counterclockwise spiral. With
no special end in view, he began circling all the prime numbers. To
his surprise the primes seemed to have an uncanny tendency to crowd
into straight lines.
The Bateman–Horn conjecture provides an explanation for Ulam’s phenomenon
[1]. For example, the diagonal line 7, 19, 23, 47, 67, 79 . . . seems to contain many
primes; see Figure 3(a). It proves to be more convenient to consider rays instead of
lines, so we focus on the diagonal ray 7, 23, 47, 79, 119, 167, 223, . . . (of the numbers
listed here, only 119 = 7 × 17 is composite). Induction confirms that the nth number
on the list is $f(n) = 4n^2 + 4n - 1$, which is an irreducible polynomial that does not
vanish identically modulo any prime.
More generally, if we agree to omit consecutive terms that appear at the begin-
ning of a ray (for example, we would drop 7, 8, 9 from the ray 7, 8, 9, 10, 27, 52, 85, . . .),
then there are integers b and c such that the nth number on the ray is $4n^2 + bn + c$.
The Bateman–Horn conjecture provides a precise asymptotic prediction for the
number of primes along each ray. For example, after some algebra, it suggests that
the number of n ≤ x such that f(n) is prime is asymptotic to $\frac{1}{2} C(f) \operatorname{Li}(x)$, in which
\[ C(f) = 2 \prod_{p \geq 3} \left(1 - \frac{(32/p)}{p - 1}\right) \approx 3.70, \]
Li(x) is the offset logarithmic integral function (1933.1), and (32/p) denotes a
Legendre symbol. Since
\[ \left(\frac{32}{p}\right) =
\begin{cases}
1 & \text{if } p = 7, 17, 23, 31, 41, 47, \\
-1 & \text{if } p = 3, 5, 11, 13, 19, 29, 37, 43, 53, 59, 61, 67,
\end{cases} \]
there is a substantial imbalance among the first few odd primes that makes C(f )
unusually large. This explains the particularly prime-rich ray that corresponds to
our polynomial. Analogous computations can be performed for other rays, most of
which have significantly smaller Bateman–Horn constants.
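Both the formula for the ray and the table of Legendre symbols can be verified mechanically. The Python sketch below is an illustration with names of our choosing; the Legendre symbol is computed with Euler's criterion rather than by any method described in the text.

```python
def ray(n: int) -> int:
    """The nth number on the diagonal ray: 4n^2 + 4n - 1."""
    return 4 * n * n + 4 * n - 1


def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    e = pow(a % p, (p - 1) // 2, p)  # equals 0, 1, or p - 1
    return -1 if e == p - 1 else e


ODD_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67]

ray_values = [ray(n) for n in range(1, 8)]
residues = [p for p in ODD_PRIMES if legendre(32, p) == 1]
nonresidues = [p for p in ODD_PRIMES if legendre(32, p) == -1]
print(ray_values, residues, nonresidues)
```

Since 32 = 2 · 4², the symbol (32/p) agrees with (2/p), so the split above is governed by p modulo 8.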
Bibliography
[2] L. E. J. Brouwer, Beweis der invarianz des n-dimensionalen gebiets (German), Math. Ann.
71 (1911), no. 3, 305–313, DOI 10.1007/BF01456846. MR1511658
[3] L. E. J. Brouwer, Über Abbildung von Mannigfaltigkeiten (German), Math. Ann. 71 (1911),
no. 1, 97–115, DOI 10.1007/BF01456931. MR1511644
[4] M. Gardner, Mathematical games: The remarkable lore of the prime number, Scientific Amer-
ican, 210 (1964), 120–128.
point theorems, Undergraduate Texts in Mathematics, Springer-Verlag, New York-Berlin,
1980. MR602694
[6] J. F. Nash Jr, Non-cooperative games, ProQuest LLC, Ann Arbor, MI, 1950. http://www.
princeton.edu/mudd/news/faq/topics/Non-Cooperative_Games_Nash.pdf. MR2938064
[8] J. Shapiro, Sperner’s lemma and Brouwer’s fixed-point theorem, http://joelshapiro.org/
Pubvit/Downloads/SpernerBrouwer/Sperner_Brouwer.pdf.
[9] T. Tao, Brouwer’s fixed point and invariance of domain theorems, and Hilbert’s fifth problem,
https://terrytao.wordpress.com/tag/invariance-of-domain/.
[10] Wikipedia, Brouwer’s fixed-point theorem, http://en.wikipedia.org/wiki/Brouwer_fixed_point_theorem.
[11] Wikipedia, Invariance of domain, https://en.wikipedia.org/wiki/Invariance_of_domain.
2010
Carmichael Numbers
Introduction
E-commerce would be impossible without the ability to securely transmit in-
formation. Since many modern cryptosystems, such as RSA (see the 1977 entry),
involve prime numbers, primality tests have been the focus of intense research in
recent years. A simple test is based on Fermat’s little theorem: if gcd(a, n) = 1 and
$a^{n-1} \not\equiv 1 \pmod{n}$, then n is composite. As we saw in the 2002 entry, this test is
not foolproof. For example, $2^{340} \equiv 1 \pmod{341}$ despite the fact that 341 = 11 · 31
is composite. That is, 341 is a pseudoprime for the base 2.
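The test and its failure for 341 can be seen in a few lines of Python. This is a minimal sketch: the function name is ours, and the built-in three-argument `pow` performs the modular exponentiation.

```python
from math import gcd


def fermat_proves_composite(n: int, a: int) -> bool:
    """True if the base a shows that n is composite via Fermat's little theorem."""
    return gcd(a, n) == 1 and pow(a, n - 1, n) != 1


# The base 2 fails to detect that 341 is composite, but the base 3 succeeds.
print(pow(2, 340, 341), fermat_proves_composite(341, 2), fermat_proves_composite(341, 3))
```

A return value of False is inconclusive: it means only that this particular base found no evidence of compositeness.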
In 1910, Robert Daniel Carmichael (1879–1967) observed a devious property
of the composite number 561 = 3 · 11 · 17 [3]. Fermat’s theorem ensures that
\[ a^2 \equiv 1 \pmod{3}, \quad a^{10} \equiv 1 \pmod{11}, \quad \text{and} \quad a^{16} \equiv 1 \pmod{17}, \]
whenever gcd(a, 561) = 1. Therefore,
\begin{align*}
a^{560} &\equiv (a^2)^{280} \equiv 1 \pmod{3}, \\
a^{560} &\equiv (a^{10})^{56} \equiv 1 \pmod{11}, \\
a^{560} &\equiv (a^{16})^{35} \equiv 1 \pmod{17},
\end{align*}
and hence $a^{560} \equiv 1 \pmod{561}$, whenever gcd(a, 561) = 1. Thus, 561 is a pseudoprime
for any base relatively prime to 561. In honor of this discovery, a composite
number n is called a Carmichael number if $a^{n-1} \equiv 1 \pmod{n}$ whenever gcd(a, n) = 1
(see the comments for more information about priority and nomenclature). The
existence of Carmichael numbers prevents us from making a primality test based
only on a direct application of Fermat’s little theorem.
Carmichael himself found several such numbers and he conjectured that infinitely
many exist. The first few are
561, 1105, 1729, 2465, 2821, 6601, 8911, 10585, 15841, 29341.
Extensive computations carried out over the years by Richard G. E. Pinch reveal
that there are 20,138,200 Carmichael numbers at most $10^{21}$ [10]. Moreover, these
computations suggest that infinitely many Carmichael numbers exist; see Figure 1.
In 1994, William Alford (1937–2003), Andrew Granville (1962– ), and Carl
Pomerance (1944– ) proved Carmichael’s conjecture [1]. To be more specific, they
showed that for large x there are at least $x^{2/7}$ Carmichael numbers at most x. An
important ingredient in the proof is Korselt’s criterion:
\[ n \text{ is a Carmichael number} \iff n \text{ is composite, square free, and } (p - 1) \mid (n - 1) \text{ for all primes } p \text{ that divide } n. \]
Figure 1. A log plot of the number of Carmichael numbers at most $10^n$ for
n = 3, 4, . . . , 21 suggests that there are infinitely many Carmichael numbers.
Although we cannot prove the Alford–Granville–Pomerance theorem here, we can
prove that Korselt’s condition indeed produces Carmichael numbers. Suppose that
n is composite, square free, and (p − 1)|(n − 1) for all primes p that divide n. If
p|n, then
\begin{align*}
\gcd(a, p) = 1 &\implies a^{p-1} \equiv 1 \pmod{p} && \text{(by Fermat’s little theorem)} \\
&\implies a^{n-1} \equiv 1 \pmod{p} && \text{(since $(p - 1) \mid (n - 1)$)}.
\end{align*}
Because n is square free, it is a product of distinct primes. Thus, the Chinese
remainder theorem ensures that $a^{n-1} \equiv 1 \pmod{n}$ whenever gcd(a, n) = 1 and hence
n is a Carmichael number.
Do the restrictive conditions in Korselt’s criterion ever occur? Yes! If
n = (6k + 1)(12k + 1)(18k + 1)
and if each of the three factors is prime, then
\[ n = 1296k^3 + 396k^2 + 36k + 1, \]
and hence
\[ n - 1 = 36k(36k^2 + 11k + 1). \]
Since 6k, 12k, and 18k each divide 36k, and hence n − 1, it follows that n is
a Carmichael number. For example, Ramanujan’s number 1,729 is a Carmichael
number since 1,729 = 7 · 13 · 19 corresponds to k = 1.
The next Carmichael number produced by our formula is 294,409 = 37 · 73 · 109,
which corresponds to k = 6; the next one after that is 56,052,361 = 211 · 421 ·
631, which corresponds to k = 35. The values k = 45, 51, 55, and 56 also yield
Carmichael numbers. It is unknown whether there are infinitely many Carmichael
numbers of the form (6k + 1)(12k + 1)(18k + 1).¹
¹ The Bateman–Horn conjecture (see the 2005, 2006, 2007, and 2009 entries) implies that
there are infinitely many such Carmichael numbers. To see this, apply the conjecture to the
polynomials 6x + 1, 12x + 1, and 18x + 1.
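A short search makes the formula concrete. The sketch below (Python; the helper names are ours, and trial division is used purely for simplicity) lists the values of k up to a bound for which 6k + 1, 12k + 1, and 18k + 1 are simultaneously prime, together with the resulting product.

```python
def is_prime(n):
    """Trial-division primality test; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def triple_prime_products(k_max):
    """Return (k, n) pairs with n = (6k+1)(12k+1)(18k+1) and all three factors prime."""
    found = []
    for k in range(1, k_max + 1):
        factors = (6 * k + 1, 12 * k + 1, 18 * k + 1)
        if all(is_prime(p) for p in factors):
            n = factors[0] * factors[1] * factors[2]
            # Korselt's divisibility condition, as derived in the text.
            assert all((n - 1) % (p - 1) == 0 for p in factors)
            found.append((k, n))
    return found


for k, n in triple_prime_products(56):
    print(k, n)
```

Raising the bound `k_max` gives further examples, although no finite search can settle whether the list is infinite.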

Proposed by Carl Pomerance, Dartmouth College.

(a) Let $b(n) = \#\{a \in \{1, 2, \ldots, n\} : a^n \equiv a \pmod{n}\}$. Show that $b(n) = n$ if and
only if n is 1, a prime, or a Carmichael number.
(b) Show that if n is a non-Carmichael composite number, then $b(n) \le \tfrac{2}{3} n$.
(c) We say n is a taxicab Carmichael number if n is composite and
$$a^{(n-1)/2} \equiv \pm 1 \pmod{n}$$
whenever gcd(a, n) = 1. The first example is 1,729. Show that if n is a taxicab
Carmichael number, then $a^{(n-1)/2} \equiv 1 \pmod{n}$ for all integers a relatively
prime to n; that is, the −1 in the definition never occurs.
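Before attempting parts (a) and (b), it may help to tabulate b(n) for small n. The following Python sketch computes b(n) by direct counting, reading b(n) as the number of a in {1, ..., n} with a^n ≡ a (mod n). It is numerical evidence only, not a proof, and the helper `exceptional_values` is a name of our own.

```python
def b(n):
    """Count the a in {1, ..., n} with a^n congruent to a modulo n."""
    return sum(1 for a in range(1, n + 1) if pow(a, n, n) == a % n)


def exceptional_values(limit):
    """Return the n <= limit with b(n) = n."""
    return [n for n in range(1, limit + 1) if b(n) == n]


print(b(6), b(7), b(561))
print(exceptional_values(30))
```

For instance, b(6) = 4, which shows that the constant 2/3 in part (b) cannot be improved.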
2010: Comments
Taxicab number. The word “taxicab” in the problem refers to a famous
anecdote about Ramanujan related by G. H. Hardy:
I remember once going to see him [Ramanujan] when he was ill at
Putney. I had ridden in taxi cab number 1729 and remarked that the
number seemed to me rather a dull one, and that I hoped it was not an
unfavorable omen. “No,” he replied, “it is a very interesting number;
it is the smallest number expressible as the sum of two cubes in two
different ways.”
Ramanujan had observed that
$$1{,}729 = 9^3 + 10^3 = 1^3 + 12^3.$$
Curiously, the Carmichael number
1,105 = 5 · 13 · 17
is the smallest number expressible as the sum of two squares in four different ways,
with order being irrelevant:
$$1{,}105 = 4^2 + 33^2 = 9^2 + 32^2 = 12^2 + 31^2 = 23^2 + 24^2.$$
Why does this occur? A theorem of Fermat (see the comments for the 1923 and
1966 entries) asserts that an odd prime is the sum of two squares if and only if it
is congruent to 1 (mod 4). The primes 5, 13, and 17 are the smallest primes of this
form. Since
$$5 = 1^2 + 2^2, \quad 13 = 2^2 + 3^2, \quad \text{and} \quad 17 = 1^2 + 4^2,$$
the identity
$$(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2$$
can be used to provide the four representations of 1,105 above. Rather than taking
this multivariable polynomial identity on faith, or suggesting an unenlightening,
brute-force proof, we should provide a conceptual explanation. The identity is a
restatement of the fact that the complex absolute value is multiplicative: |zw| =
|z||w| for complex z, w. Let z = a + bi and w = c + di, in which $i^2 = -1$. Then,
$$\begin{aligned}
(a^2 + b^2)(c^2 + d^2) &= |z|^2 |w|^2 = |zw|^2 \\
&= |(ac - bd) + i(ad + bc)|^2 \\
&= (ac - bd)^2 + (ad + bc)^2.
\end{aligned}$$
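The identity can be put to work mechanically. In the Python sketch below (an illustration with our own function names), Gaussian integers are stored as pairs of integers. Multiplying 1 + 2i, 2 + 3i, and 1 + 4i, with each factor optionally replaced by its conjugate, produces the representations of 1,105, and a brute-force search serves as an independent check.

```python
from itertools import product
from math import isqrt


def gaussian_mul(z, w):
    """Multiply a + bi and c + di, given as integer pairs (a, b) and (c, d)."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)


def conjugate(z):
    """Complex conjugate of a Gaussian integer stored as a pair."""
    return (z[0], -z[1])


def representations_from_factors(factors):
    """Pairs (x, y), x <= y, with x^2 + y^2 equal to the product of the norms of
    the given Gaussian integers, found by conjugating each factor independently."""
    reps = set()
    for flips in product((False, True), repeat=len(factors)):
        z = (1, 0)
        for w, flip in zip(factors, flips):
            z = gaussian_mul(z, conjugate(w) if flip else w)
        reps.add(tuple(sorted((abs(z[0]), abs(z[1])))))
    return sorted(reps)


def brute_force_representations(n):
    """All pairs (x, y) with 1 <= x <= y and x^2 + y^2 = n."""
    reps = []
    for x in range(1, isqrt(n // 2) + 1):
        y = isqrt(n - x * x)
        if y * y == n - x * x:
            reps.append((x, y))
    return reps


print(representations_from_factors([(1, 2), (2, 3), (1, 4)]))
print(brute_force_representations(1105))
```

Conjugating a factor leaves its norm unchanged, which is why the same three primes give rise to several different representations.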
Carmichael numbers and Stigler’s law. Stigler’s law of eponymy (1980)
asserts that no scientific discovery is named after its discoverer [14]. True to form,
Stigler’s law was not proposed by statistician Stephen Stigler (1941– ); he attrib-
uted it to sociologist Robert K. Merton (1910–2003). In fact, mathematician Carl
Benjamin Boyer (1906–1976) said more or less the same thing in 1972. The history
behind Carmichael numbers is just as murky [12]. Alwin Korselt (1864–1947) an-
ticipated their discovery when he proved his criterion in 1899 (eleven years before
Carmichael’s paper [3]), although naturally he did not use the term “Carmichael
number” [7]. Since he did not provide any examples, perhaps it cannot be said that
Korselt “discovered” Carmichael numbers. That honor goes to Václav Šimerka, who
found the first seven Carmichael numbers in 1885 [11].
Carmichael numbers in arithmetic progressions. Once we know there
are infinitely many Carmichael numbers, it is natural to ask about their distribution.
Since Carmichael numbers behave, as far as Fermat’s little theorem is concerned,
like primes, it is natural to investigate other properties of primes that might be
shared with Carmichael numbers.
Dirichlet’s theorem on primes in arithmetic progressions states that if gcd(a, m)
= 1, then there are infinitely many primes congruent to a (mod m); see the 1913 and
2004 entries along with [5, 8]. Is the same true for Carmichael numbers? William
D. Banks and Pomerance proved this is the case under a certain assumption about
how soon a prime appears in an arithmetic progression. This has since been proved
unconditionally [15].
For each modulus m and each a relatively prime to m, let $\varrho(a, m)$ be the
smallest prime congruent to a (mod m). Let
$$\varrho(m) = \max_{\gcd(a,m)=1} \varrho(a, m).$$
To prove their result, Banks and Pomerance needed to assume that
$$\varrho(d) \le C d^{\,1 + \xi/\log\log d}$$
for some constants C and ξ. This is a strong assumption indeed. Sarvadaman
Chowla (1907–1995) proved that the generalized Riemann hypothesis² only gives
$\varrho(d) \le C_\varepsilon d^{2+\varepsilon}$ for every ε > 0. He conjectured the stronger result $\varrho(d) \le C_\varepsilon d^{1+\varepsilon}$.
Roger Heath-Brown (1952– ), building on ideas of Harald Cramér, suggested that
$$\varrho(d) \le C d (\log d)^2$$
holds. The bound assumed by Banks and Pomerance is somewhere in between these
two conjectures.
² The generalized Riemann hypothesis asserts that the nontrivial zeros of Dirichlet L-functions
have real part 1/2. The 1942 entry concerns the Riemann hypothesis.
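The quantity bounded in these conjectures, the largest of the least primes over all admissible residue classes modulo m, is easy to compute for small moduli. Here is a minimal Python sketch; the function names are ours, and trial division limits it to small m. It prints the value next to m(log m)^2 purely for comparison, since no constant in the conjectures is specified.

```python
from math import gcd, log


def is_prime(n):
    """Trial-division primality test; adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def least_prime(a, m):
    """Smallest prime congruent to a modulo m; requires gcd(a, m) = 1."""
    if gcd(a, m) != 1:
        raise ValueError("a and m must be relatively prime")
    p = a % m
    while not is_prime(p):
        p += m
    return p


def max_least_prime(m):
    """Maximum of least_prime(a, m) over residues a relatively prime to m."""
    return max(least_prime(a, m) for a in range(1, m + 1) if gcd(a, m) == 1)


for m in (3, 4, 5, 7, 10, 100, 1000):
    print(m, max_least_prime(m), round(m * log(m) ** 2, 1))
```

The loop in `least_prime` terminates by Dirichlet's theorem, which guarantees a prime in every admissible class.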
Linnik’s theorem. While we are on the theme of Dirichlet’s theorem and
“how soon” primes begin to appear in arithmetic progressions, we cannot resist
another demonstration of the power of heuristic reasoning. Yuri Vladimirovich
Linnik (1915–1972) proved in 1944 that there are positive constants c and L such
that, whenever gcd(a, n) = 1, there is at least one prime congruent to a (mod n)
before we reach $cn^L$ [13]. Although Linnik did not provide an explicit value for
L, Chengdong Pan (1934–1997) showed in 1957 that we may take L ≤ 10,000.
This was reduced over the years by several authors; the record L ≤ 5 was achieved
by Triantafyllos Xylouris in 2011. The generalized Riemann hypothesis implies
something much stronger: the first prime occurs by $(1 + o(1))\varphi(n)^2 (\log n)^2$, in
which o(1) denotes a term that tends to zero as n → ∞.
Heuristic reasoning and some basic knowledge of probability theory [9] yield
more or less the same answer. Suppose that there are many cereal boxes, each of
which has exactly one of c prizes, with equal probability. Thus, each time a box is
opened, you have a 1/c chance of getting prize 1, a 1/c chance of getting prize 2,
and so forth. How many boxes do you expect to have to open before you have at
least one of each of the c prizes?
Let $Y_j$ be the random variable that denotes how many more boxes must be
opened to get the next new prize, given that we already have j distinct prizes from
among the c possible prizes. For each pick, the probability that we get one of the
j prizes we already possess is j/c. Thus, the probability that we get a new prize is
$$p_j = 1 - \frac{j}{c} = \frac{c - j}{c}.$$
The probability that $Y_j$ equals n is
$$(1 - p_j)^{n-1} p_j$$
for n ≥ 1 (the first n − 1 boxes give us prizes we already have, and the nth gives
us a new one). Thus, the expected wait for the new prize is
$$E[Y_j] = \sum_{n=1}^{\infty} n (1 - p_j)^{n-1} p_j = \frac{p_j}{1 - p_j} \sum_{n=0}^{\infty} n (1 - p_j)^n = \frac{1}{p_j} = \frac{c}{c - j},$$
in which we used the identity
$$\sum_{n=0}^{\infty} n r^n = r \frac{d}{dr} \sum_{n=0}^{\infty} r^n = r \frac{d}{dr} \frac{1}{1 - r} = \frac{r}{(1 - r)^2}.$$
The total expected wait time is the expected value of the sum of our random
variables, starting with $Y_0$, when we have no prizes, and ending with $Y_{c-1}$, when
we need only one more prize:
$$\begin{aligned}
E[Y_0 + \cdots + Y_{c-1}] &= E[Y_0] + E[Y_1] + \cdots + E[Y_{c-1}] \\
&= \frac{c}{c} + \frac{c}{c-1} + \cdots + \frac{c}{1} = c \left( 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{c} \right) \\
&\sim c \log c,
\end{aligned}$$
with the asymptotic equivalence following from the integral test.³ Thus, we expect
to need around c log c boxes to ensure we have one of each prize.
³ The number $1 + \frac{1}{2} + \cdots + \frac{1}{n}$ is the nth harmonic number; see the 1933 entry.
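The expected value just computed can be compared with a simulation. The following Python sketch is an illustration only: the function names, the seed, and the number of trials are arbitrary choices of ours. It evaluates c(1 + 1/2 + ... + 1/c) exactly with rational arithmetic and estimates the same quantity by repeatedly opening random boxes.

```python
import random
from fractions import Fraction


def expected_boxes(c):
    """Exact expected number of boxes needed: c * (1 + 1/2 + ... + 1/c)."""
    return c * sum(Fraction(1, k) for k in range(1, c + 1))


def simulate_boxes(c, rng):
    """Open boxes with uniformly random prizes until all c prizes have been seen."""
    seen = set()
    opened = 0
    while len(seen) < c:
        seen.add(rng.randrange(c))
        opened += 1
    return opened


def average_boxes(c, trials, seed=0):
    """Average of simulate_boxes over a number of independent trials."""
    rng = random.Random(seed)
    return sum(simulate_boxes(c, rng) for _ in range(trials)) / trials


c = 10
print(float(expected_boxes(c)), average_boxes(c, 20000))
```

The two printed values should be close; the simulated average fluctuates from seed to seed, while the exact value does not.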
We now perform the “change of variables” z → m and consider “primes” instead
of “prizes”! Let c = φ(n) denote the number of residue classes modulo n that can
possibly contain infinitely many primes. The argument above suggests that we need
around φ(n) log φ(n) primes to ensure we have at least one prime in each admissible
residue class. How high must we go to ensure we have this many primes? The
prime number theorem implies that the kth prime $p_k$ is approximately k log k (see the
comments for the 1987 entry). Thus,
$$p_{\varphi(n) \log \varphi(n)} \approx \varphi(n) \log \varphi(n) \cdot \log\big(\varphi(n) \log \varphi(n)\big) \approx \varphi(n) (\log \varphi(n))^2,$$
which is in close agreement with the prediction of the Cramér heuristic.
Bibliography
[1] W. R. Alford, A. Granville, and C. Pomerance, There are infinitely many Carmichael numbers, Ann. of Math. (2) 139 (1994), no. 3, 703–722, DOI 10.2307/2118576. http://www.math.dartmouth.edu/~carlp/PDF/paper95.pdf. MR1283874
[2] W. D. Banks and C. Pomerance, On Carmichael numbers in arithmetic progressions, J. Aust. Math. Soc. 88 (2010), no. 3, 313–321, DOI 10.1017/S1446788710000169. http://faculty.missouri.edu/~bankswd/papers/2010_Carmichael_APs_published_version.pdf. MR2661452
[3] R. D. Carmichael, Note on a new number theory function, Bull. Amer. Math. Soc. 16 (1910), no. 5, 232–238, DOI 10.1090/S0002-9904-1910-01892-9. MR1558896
[4] S. Chowla, On the least prime in an arithmetical progression, J. Indian Math. Soc. (N.S.) 1 (1934), 1–3.
[5] H. Davenport, Multiplicative number theory, 3rd ed., revised and with a preface by Hugh L. Montgomery, Graduate Texts in Mathematics, vol. 74, Springer-Verlag, New York, 2000. MR1790423
[6] D. R. Heath-Brown, Almost-primes in arithmetic progressions and short intervals, Math. Proc. Cambridge Philos. Soc. 83 (1978), no. 3, 357–375, DOI 10.1017/S0305004100054657. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=2079092&fileId=S0305004100054657. MR0491558
[7] A. R. Korselt, Problème chinois, L’Intermédiaire des Mathématiciens 6 (1899), 142–143.
[10] R. G. E. Pinch, The Carmichael numbers up to $10^{21}$, Proceedings of Conference on Algorithmic Number Theory 2007, Turku Centre for Computer Science 46 (2007), 129–131.
[11] V. Šimerka, Zbytky z arithmetické posloupnosti (On the remainders of an arithmetic progression), Časopis pro pěstování matematiky a fysiky 14 (1885), no. 5, 221–225.
[12] Wikipedia, Carmichael number, https://en.wikipedia.org/wiki/Carmichael_number.
[13] Wikipedia, Linnik’s theorem, https://en.wikipedia.org/wiki/Linnik’s_theorem.
[14] Wikipedia, Stigler’s law of eponymy, https://en.wikipedia.org/wiki/Stigler’s_law_of_eponymy.
[15] T. Wright, Infinitely many Carmichael numbers in arithmetic progressions, Bull. Lond. Math. Soc. 45 (2013), no. 5, 943–952, DOI 10.1112/blms/bdt013. MR3104986
2011
100th Anniversary of Egorov’s Theorem
Introduction
Although John Edensor Littlewood is best known for his work with G. H.
Hardy, he was a fine mathematician in his own right. Analysts frequently appeal
to Littlewood’s principles from measure theory [4]:
There are three principles, roughly expressible in the following terms:
Every (measurable) set is nearly a finite sum of intervals; every function
(of class $L^p$) is nearly continuous; every convergent sequence of
functions is nearly uniformly convergent.
Although these are not precise assertions, they suggest several important phenom-
ena. In what follows, let m denote Lebesgue measure on R.
Littlewood’s first principle means that a Lebesgue measurable subset A ⊂ R
of finite measure can be arbitrarily well-approximated by a finite union G of open
intervals in the sense that m(A\G ∪ G\A) can be made arbitrarily small.
In modern presentations, Littlewood’s third principle (Egorov’s theorem) usu-
ally comes before the second principle (Lusin’s theorem). Recall that a property
holds almost everywhere if the set of points at which it fails has measure zero. Here
is a formal statement of Egorov’s theorem [11]. Let $f_n : [a, b] \to \mathbb{R}$ be a sequence of
measurable functions that converges almost everywhere to f. For each ε > 0, there
is a measurable set E ⊆ [a, b] so that m(E) < ε and $f_n$ converges to f uniformly
on [a, b]\E. Thus, convergence almost everywhere implies uniform convergence on
subsets of large measure.
We cannot replace [a, b] in Egorov’s theorem with an unbounded interval because
of the “traveling wave” phenomenon. For example, the sequence of characteristic
functions $f_n = \chi_{[n,n+1]}$ converges to 0 everywhere on [0, ∞). However, $f_n$
does not converge to zero uniformly on any unbounded subset of [0, ∞) and hence
the analogue of Egorov’s theorem does not hold in this case.
One can even accomplish this with compactly supported, infinitely differen-
tiable functions. Since such functions are a staple in harmonic analysis and ad-
vanced partial differential equations, it is worth describing their construction. First
let
$$h(x) = \begin{cases} e^{-1/x} & \text{if } x > 0, \\ 0 & \text{if } x \le 0, \end{cases}$$
and verify, with the definition of the derivative, L’Hôpital’s rule, and induction,
that h is infinitely differentiable on R; see Figure 1(a). Then
$$r(x) = \frac{h(x)}{h(x) + h(1 - x)}$$
acts as a smooth “ramp” from elevation 0 to 1. Finally, we obtain the sequence
$$b_n(x) = 1 - r\big((x - n/2)^2\big)$$
of infinitely differentiable “bump functions” that travels in the positive x direction.
In particular, $b_n$ tends to zero on R, but not uniformly on any unbounded subset
of [0, ∞); see Figure 1.
Figure 1. Construction of a sequence of infinitely differentiable functions that tends to zero on [0, ∞), but not uniformly on any unbounded subset of [0, ∞).
(a) The “hill” function h(x) begins perfectly flat, then smoothly inclines on x > 0. It is infinitely differentiable.
(b) The “ramp” function r(x) begins perfectly flat, smoothly inclines on 0 < x < 1, and becomes perfectly flat again. It is infinitely differentiable.
(c) The function $b_2(x)$.
(d) The function $b_3(x)$.
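The construction is short enough to evaluate directly. The Python sketch below is a floating-point illustration in which the bump function is written `b(n, x)`, a naming choice of ours. It implements h, r, and the bumps, then shows the two competing behaviors: at any fixed point the values are eventually zero, yet every bump attains the value 1 at its center n/2.

```python
import math


def h(x):
    """The "hill": exp(-1/x) for x > 0 and 0 otherwise."""
    return math.exp(-1.0 / x) if x > 0 else 0.0


def r(x):
    """The "ramp": 0 for x <= 0, 1 for x >= 1, smooth in between."""
    return h(x) / (h(x) + h(1.0 - x))


def b(n, x):
    """The nth bump, centered at n/2 and vanishing when |x - n/2| >= 1."""
    return 1.0 - r((x - n / 2.0) ** 2)


# Pointwise convergence: at the fixed point x = 3, the values are eventually 0.
print([round(b(n, 3.0), 4) for n in range(1, 13)])

# No uniform convergence on the whole line: each bump still reaches height 1.
print([b(n, n / 2.0) for n in range(1, 13)])
```

The denominator in `r` is never zero, because at least one of x and 1 − x is positive for every real x.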
Although commonly credited to Dmitri Egorov (1869–1931) in 1911 [1], in ac-
cordance with Stigler’s law of eponymy (see the comments for the 2010 entry),
Egorov’s theorem was first discovered by Carlo Severini (1872–1951) [7], who pub-
lished the result several months earlier in an obscure Italian journal. Severini’s
contribution was not widely acknowledged until 1924 when the influential Leonida
Tonelli (1885–1946) made it known [9]. Severini’s and Egorov’s proofs are practi-
cally identical but were discovered independently.

Proposed by Francesco Cellarosi, University of Illinois Urbana-Champaign.
The Severini–Egorov theorem has been assumed to be true if $f_n$ is replaced by
a family of functions that depends on a real parameter [2, p. 79]. This problem
outlines a counterexample discovered in 1958 by J. D. Weston. Find a family of
functions $f_h : [0, 1) \to \mathbb{R}$ for h ∈ [0, 1) that satisfies the following:

(a) $f_h(x) = 0$ for each h ∈ [0, 1), except possibly at a single point x, at which $f_h$
assumes the value 1.
(b) $f_h(x) \to 0$ on [0, 1) as h → 0.
(c) The convergence is not uniform on any set of positive measure.
Hint: The axiom of choice can be used to construct a useful nonmeasurable set.
2011: Comments
Mathematical genealogy. It would be unfair to reduce Egorov’s mathemat-
ical contributions to his eponymous theorem. For example, he has 6,396 academic
descendants [5]! Mathematicians like to keep track of their intellectual lineages:
one’s doctoral advisor is a “parent” in the genealogy, fellow students of the same
advisor are “siblings,” and so forth.
The Mathematics Genealogy Project is an immense database that contains
detailed information about almost every mathematician [3, 5]. Before the advent of
the formal dissertation process in Europe, many of the “genetic” relations between
mathematicians were informal. For example, an older mathematician might mentor
a younger one, or one mathematician’s writing might have influenced another.
The database is endlessly fascinating. For instance, the record for most descen-
dants belongs to Sharaf al-Din al-Tusi (ca. 1135–1213). His influence eventually
leads to the Byzantine Gregory Chioniadis (ca. 1240–1320), who studied in Persia.
A few more steps lead through the Byzantine Empire to Renaissance Italy, where we
begin to see a few familiar names. A more detailed analysis revealed the following:
In July 2016, Cosmin Ionita and Pat Quillen of MathWorks used MATLAB
to analyze the Math Genealogy Project graph. At the time, the
genealogy graph contained 200,037 vertices. There were 7639 (3.8%)
isolated vertices and 1962 components of size two (advisor-advisee
pairs where we have no information about the advisor). The largest
component of the genealogy graph contained 180,094 vertices, account-
ing for 90% of all vertices in the graph. The main component has 7323
root vertices (individuals with no advisor) and 137,155 leaves (math-
ematicians with no students), accounting for 76.2% of the vertices in
this component. The next largest component sizes were 81, 50, 47, 34,
34, 33, 31, 31, and 30. [5]
A small sample of the database is illustrated in Figure 2. Posters depicting the
genealogy of individual mathematicians, and even entire departments, are sold for
a small fee by the Mathematics Genealogy Project.
Obviously. . . . One often encounters phrases like “it is easy to see that,”
“clearly. . . ,” and “obviously. . . ” in mathematical writing. Along similar lines,
Egorov concluded his paper [1] with:
On voit sans peine que ce théorème est susceptible d’un grand nombre
d’applications. [One sees with no effort that this theorem is prone to
a great number of applications.]
Figure 2. Mathematical genealogy of the first author. Among

his academic ancestors are Gauss, Dirichlet, Fourier, Poisson, La-
grange, Euler, Leibniz, Klein, and two Bernoullis. Students are
often impressed by such “family trees,” even though almost all
mathematicians can trace their academic lineage to such distin-
guished forbears.
Francis Su, former MAA president, wrote [8]:

Mathematicians are fond of using terms such as “trivial” and “obvious”
and “clear” to mean “straightforward for someone who has already
mastered the material.” But some students may think: “I must not
be good at math if I can’t see that it’s obvious.”
He adds “[t]hose who doubt themselves are most prone to feeling this way.” This
unfortunate behavior has become somewhat stereotypical for mathematicians. Su
relates the following joke, which is firmly embedded in mathematical folklore:
Professor: “We’ll skip the proof of the lemma, because it’s ob-
vious. We’ll now use the lemma to prove the theorem. . . .”
Student: “I’m sorry, I don’t think the proof of the lemma is

obvious.”
Professor: (stops to think about it and, after a long pause,
collects herself) “I was right! The proof of the lemma is obvious.
We’ll now use the lemma to prove the theorem. . . .”
We have attempted to avoid these sorts of phrases, which are falling out of favor
since they are now recognized as poor pedagogy.
Is mathematics inconsistent? On September 27, 2011, mathematical physi-

cist and blogger extraordinaire John Baez (1961– ) posted a stunning announcement
on the popular mathematics/physics/philosophy blog The n-Category Café [6]:
Edward Nelson [1932–2014], a math professor at Princeton, is writing
a book called Elements in which he claims to prove the inconsistency
of Peano arithmetic.
He reported that Nelson (see the comments for the 2006 entry) had posted a tan-
talizing message on the Foundations of Mathematics mailing list:
I am writing up a proof that Peano arithmetic (P), and even a small
fragment of primitive-recursive arithmetic (PRA), are inconsistent.
If true, this would mean that arithmetic as we know it is fatally flawed. Peano
arithmetic (see the comments for the 1929 entry), as its name implies, is an axiom
system that encompasses basic arithmetic. If it were to be proved inconsistent,
mathematics would fall. Monroe Eskew wrote [6]:
In my opinion, it [the inconsistency of Peano arithmetic] would destroy
core mathematical ideas going back to Euclid. Pure math would be
decimated. We would still have to keep the applied stuff and formulate
some new general theory that accounts for the applied math and fosters
progress in applied math. But a substantial reworking of everything
would be required.
Another perspective was provided by Paul Chang [6]:
If, for argument’s sake, PA was proven inconsistent, then math merely
becomes a defacto natural science like biology or chemistry, in the
sense that the “validity” of math no longer stems from axioms, but
rather validation against real world conditions and observations.
Nelson provided a link to an outline of his book and an overview of the proof.
Baez admitted that the details were beyond his expertise and asked, “Can anyone
take a stab at explaining some of these ideas?” He ended the opening post on the
thread in his characteristic informal style [6]:
I should admit that Nelson and I had the same Ph.D. thesis advisor
[Irving Segal (1918–1998)], so I probably take his ideas more seriously
than if he were some random unknown guy. On the other hand, he
turned me down when I asked him to supervise my undergraduate
thesis, so it’s not like we’re best buddies or anything.
Terence Tao responded via Google+ and gave a short explanation of the potential
flaw (having only seen an outline of the proof) [6].¹ Over the course of
a few days, an increasingly technical exchange ensued, after which Nelson
responded: “You are quite right, and my original response was wrong. Thank you
for spotting my error. . . . I withdraw my claim.” Peano arithmetic appears, at least
for the time being, to be free of contradictions.
¹ Daniel Tausk had communicated the same concerns to Nelson independently via e-mail.
Bibliography
[1] D. T. Egoroff, Sur les suites des fonctions mesurables, Comptes rendus hebdomadaires des
séances de l’Académie des sciences 152 (1911), 244–246.
[2] G. H. Hardy and W. W. Rogosinski, Fourier Series, Cambridge Tracts in Mathematics and
Mathematical Physics, no. 38, Cambridge University Press, 1944. MR0010206
[3] A. Jackson, A labor of love: the Mathematics Genealogy Project, Notices of the American
Mathematical Society 54 (2007), no. 8, 1002–1003.
[4] J. E. Littlewood, Lectures on the Theory of Functions, Oxford University Press, 1944.
MR0012121
[5] Mathematics Genealogy Project, https://genealogy.math.ndsu.nodak.edu/index.php.
[6] The n-Category Café: a group blog on math, physics, and philosophy, September 27,
2011: The Inconsistency of Arithmetic, https://golem.ph.utexas.edu/category/2011/09/
the_inconsistency_of_arithmeti.html.
[7] C. Severini, Sopra gli sviluppi in serie di funzioni ortogonali, Atti della Accademia Gioenia
di scienze naturali in Catania, Series V III (1910), 1–7.
[8] F. Su, Mathematical microaggressions, MAA Focus 35 (2015), no. 5, 36–37.
[9] L. Tonelli, Su una proposizione fondamentale dell’analisi, Bollettino della Unione Matematica
Italiana 2 (1924), no. 3, 103–104.
[10] J. D. Weston, A counter-example concerning Egoroff’s theorem, J. London Math. Soc. 34
(1959), 139–140, DOI 10.1112/jlms/s1-34.2.139. MR0103961
[11] Wikipedia, Egorov’s theorem, https://en.wikipedia.org/wiki/Egorov’s_theorem.
2012
National Museum of Mathematics
Introduction
We end our book, which has celebrated 100 years of mathematical milestones,
with the opening of a museum that shares a similar goal. On December 15, 2012,
The National Museum of Mathematics (MoMath) opened in New York City [4]:
The National Museum of Mathematics began in response to the closing
of a small museum of mathematics on Long Island, the Goudreau Mu-
seum. A group of interested parties (the “Working Group”) met in Au-
gust 2008 to explore the creation of a new museum of mathematics—
one that would go well beyond the Goudreau in both its scope and
methodology. The group quickly discovered that there was no mu-
seum of mathematics in the United States, and yet there was incredible
demand for hands-on math programming.
Do we need such a museum? Yes! Mathematics, unlike most other sciences, has
a terrible public-relations problem. Most college-educated Americans ended their
mathematical studies at calculus, or perhaps even earlier in “college algebra.” The
two people most closely associated with calculus, Newton and Leibniz, had their
heyday in the 1600s. This is largely where the public understanding of mathematics
stops: in the seventeenth century. So many people fear mathematics that “math
anxiety” has been studied by educators and psychologists since the 1950s [6]. No
other science¹ suffers this enormous disconnect between its practitioners and the
public. Physicists can talk about nuclear energy and black holes without receiving
puzzled looks. Biologists can speak of DNA, genes, and proteins without fear of
losing the audience. These topics from 20th-century science are firmly embedded
in the public consciousness. Mathematicians have a lot of ground to make up.
Although there have been numerous exhibits on the interplay between math
and art, or small wings in science museums, MoMath is something new. It is entirely
devoted to mathematics [4]:
Mathematics illuminates the patterns that abound in our world. The
National Museum of Mathematics strives to enhance public under-
standing and perception of mathematics. Its dynamic exhibits and
programs stimulate inquiry, spark curiosity, and reveal the wonders
of mathematics. The Museum’s activities lead a broad and diverse
audience to understand the evolving, creative, human, and aesthetic
nature of mathematics.
¹ There is some debate about whether mathematics is truly a “science,” or whether it is more
akin to philosophy or even religion. However, it has long been held among the liberal arts, being
well represented in the quadrivium of arithmetic, geometry, music, and astronomy.
On August 2, 2018, MoMath announced the first Distinguished Chair for the
Public Dissemination of Mathematics, a visiting professorship “dedicated to raising
public awareness of math.” The first recipient is Fields Medalist Manjul Bhargava,
whom we encountered several times already. Among the many activities associated
with this prestigious post is an eight-week-long minicourse, suitable for ages thirteen
and up, on mathematics and magic.
As for public awareness, mathematicians might never catch up to their col-
leagues in other sciences. We do not even have a Nobel Prize (usually good for
thirty seconds on the evening news) for the subject. However, we do our best. Like
the popular television show NUMB3RS (2005–2010) before it, MoMath is slowly
making mathematics more accessible to the general public.

There are a lot of beautiful exhibits at MoMath. One favorite is the bike with
square wheels (Figure 1), which rolls smoothly above a surface made of catenary
curves (the shape taken by a chain or wire hung between two posts). Just a few
feet away is another interesting mode of transport. The irregularly shaped objects
in Figure 2 share a property with spheres: their height remains the same no matter
how they lie. Although it is not surprising that one could roll smoothly over a set
of spheres, the fact that there are infinitely many other candidates is unexpected.
Moreover, the sled has constant width and can spin freely as it rolls.
Find shapes, other than the sphere, so that no matter how they lie, the distance
from their highest point to the ground is constant. Can you find such shapes in all
dimensions? Is there a simple way to extend a shape that works in d − 1 dimensions
to d dimensions? If so, can you find nontrivial examples in all dimensions?
2012: Comments
All good things. . . . It has been relatively easy to find subjects for the early
entries. Those have typically been concerned with significant events from times
long past. Sufficiently many years have passed so that we can determine which
results have stood the test of time. For more recent entries, the task has been more
difficult. The opinions and personal tastes of the authors have more often than not
been the deciding factors.
Can we look forward and predict what will be included in the sequel, dedicated
to the second hundred years of Pi Mu Epsilon? Some choices are obvious. For
instance, 2014 will probably focus on Maryam Mirzakhani (1977–2017), the first
female Fields Medalist (sadly, she passed away only a few years after receiving
the medal). Will she be the first of many? It is too early to tell, although we
will have to wait until at least 2022 to see a second female medalist. Which of the
2018 medalists, Caucher Birkar (1978– ), Alessio Figalli (1984– ), Peter Scholze
(1987– ), and Akshay Venkatesh (1981– ), will we devote entries to in one hundred
years? What new theories will develop and blossom in the coming century?
Certainly there are open problems that will deserve entries of their own if they
are dispatched. The theory of numbers, one of our favorite topics, offers plenty of
opportunities. Will we have an entry about the proof of the Riemann hypothesis?
What about the abc-conjecture, or the Bateman–Horn conjecture? Maybe the
Langlands program will be complete. On a less overarching theme, maybe the
troublesome 3x + 1 problem will fall?
Figure 1. A bike with square wheels can ride smoothly on a suitable surface. Image: National Museum of Mathematics.
Figure 2. A sled of constant width can glide smoothly over shapes of constant height. Image: National Museum of Mathematics.
We can do no better than to finish with an excerpt from David Hilbert’s address
to the International Congress of Mathematicians in 1900:
History teaches the continuity of the development of science. We know
that every age has its own problems, which the following age either
solves or casts aside as profitless and replaces by new ones. If we
would obtain an idea of the probable development of mathematical
knowledge in the immediate future, we must let the unsettled ques-
tions pass before our minds and look over the problems which the
science of today sets and whose solution we expect from the future.
To such a review of problems the present day, lying at the meeting
of the centuries, seems to me well adapted. For the close of a great
epoch not only invites us to look back into the past but also directs
our thoughts to the unknown future. . . . It is difficult and often impos-
sible to judge the value of a problem correctly in advance; for the final
award depends upon the gain which science obtains from the prob-
lem. . . while the creative power of pure reason is at work, the outer
world again comes into play, forces upon us new questions from actual
experience, opens up new branches of mathematics, and while we seek
to conquer these new fields of knowledge for the realm of pure thought,
we often find the answers to old unsolved problems and thus at the
same time advance most successfully the old theories. And it seems
to me that the numerous and surprising analogies and that apparently
prearranged harmony which the mathematician so often perceives in
the questions, methods and ideas of the various branches of his science,
have their origin in this ever-recurring interplay between thought and
experience. [3]
Bibliography
[1] C. G. Gray, Solids of constant breadth, Math. Gaz. 56 (1972), no. 398, 289–292, DOI
10.2307/3617832. MR0487786
[2] L. Hall and S. Wagon, Roads and wheels, Math. Mag. 65 (1992), no. 5, 283–301, DOI
10.2307/2691240. MR1191272
[3] D. Hilbert, Über das Unendliche (German), Math. Ann. 95 (1926), no. 1, 161–190,
DOI 10.1007/BF01206605. http://www.ams.org/journals/bull/1902-08-10/S0002-9904-
1902-00923-3/S0002-9904-1902-00923-3.pdf. MR1512272
[4] MoMath: National Museum of Mathematics, http://momath.org/.
[5] R. L. Tennison, Smooth curves of constant width, The Mathematical Gazette 60 (1976),
no. 414, 270–272.
[6] Wikipedia, Mathematical anxiety, https://en.wikipedia.org/wiki/Mathematical_anxiety.
Index of People
Abbott, Derek, 252 Bellaso, Giovan Battista, 212 Caesar, Julius, 125, 212
Abbott, Edwin Abbott, 531 Berge, Claude, 525 Caldwell, Chris K., 388
Ackermann, Wilhelm, 71 Bergelson, Vitaly, 341, 342 Campbell, John, 145, 533
Adams, Douglas, 27, 392 Bernoulli, Jacob, 381 Cantor, Georg, 21, 27, 71,
Adams, John Couch, 221 Bernstein, Sergei, 196 118, 269, 329
Adleman, Leonard, 351, 501 Bertrand, Joseph, 129 Caplan, Seth, 531
Agmon, Shmuel, 197 Beukers, Frits, 365 Carlitz, Leonard, 460
Agrawal, Manindra, 501 Beurling, Arne, 194 Carmichael, Robert Daniel,
Aguayo, Daniel, 489 Bhargava, Manjul, 47, 287, 549
Ahlin, Ashley, xi 446, 562 Carr, Avery T., xii, 222, 242,
Aigner, Martin, 3, 189 Bieberbach, Ludwig, 393 383, 386
Akhmedov, Azer, 63 Bigelow, Stephen, 63 Catalan, Eugène Charles,
Akiyama, Yo, xii Binet, Jacques Philippe 254
al-Din al-Tusi, Sharaf, 557 Marie, 495 Cauchy, Augustin-Louis, 448
Alcuin of York, 473 Birkar, Caucher, 562 Cellarosi, Francesco, 556
Alexander, James Waddell, Birkenmajer, Ludwik An- Chang, Alan, 334
396 toni, 370 Chang, Paul, 559
Alford, William, 549 Birkhoff, George, 21, 96 Chao-Haft, Max, xii
Anderson, Randy L., 435 Birman, Joan, 399 Châu, Ngô Bao, 294
Andrade, Julio, 170 Black, Fischer, 469 Chebyshev, Pafnuty, 129,
Andrews, James M., xii, 91, Blake, Katherine, xii 381, 515
231 Bohr, Niels, 14 Cheng, Christine, 265
Apéry, Roger, 171, 364 Boltzmann, Ludwig, 95 Cheng, Christine, 265
Appel, Kenneth, 346 Bolyai, János, 369 Cheng, Yuanyou, 388
Arnold, Vladimir, 221 Bolyai, Wolfgang Farkas, 369 Chioniadis, Gregory, 557
Arrow, Kenneth J., 199 Bombieri, Enrico, 170 Chowla, Sarvadaman, 552
Artin, Emil, 170 Borcherds, Richard, 441 Chudnovsky, Maria, 526
Aschbacher, Michael, 514 Bott, Raoul, 300 Church, Alonzo, 122
Asimov, Isaac, 14, 145, 533 Bourbaki, Nicolas, 234 Churchill, Winston, 158
Atiyah, Michael, 299, 490 Bourgain, Jean, 97, 326 Cipolla, Michele, 410
Augustine of Hippo, 473 Boyer, Carl Benjamin, 552 Cipra, Barry, 247
Axelsson, Åke, 489 Brassau, Pierre, 489 Civario, Gilles, 402
Brauer, Richard, 515 Clausen, Thomas, 459
Babbage, Charles, 212 Broad, Steven, 23 Clay, Landon T., 490
Bach, Johann Sebastian, 88 Bronstein, Manuel, 303 Cocks, Clifford, 351
Bacon, Kevin, 1, 306 Brooks, Robert W., 357 Cohen, Paul, 269, 307, 484
Baez, John, 559 Brouwer, Luitzen Egbertus Cole, Frank Nelson, 464
Baire, René-Louis, 481 Jan, 543 Collatz, Lothar, 101
Baker, Alan, 288 Brown, Gordon, 123 Condorcet, Nicolas de, 199
Baker, Roger C., 388 Brun, Viggo, 33, 58 Connes, Alain, 163
Banach, Stefan, 62 Buffett, Warren, 470 Conrad, Brian, 233
Banks, William D., 552 Bunyakovsky, Viktor Conway, John Horton, 402,
Banzhaf III, John F., 201 Yakovlevich, 521 441, 446
Barlow, William, 476 Burkhardt-Guim, Paula, xii Cooley, James William, 281
Bateman, Paul T., 521 Burt, David, 18 Cooper, Curtis, 306
Corsi, Craig, 2, 258 Figalli, Alessio, 562 Grossman, Jerrold, 72, 306
Cramér, Harald, 410, 552 Fippinger, Miles C., xii Grothendieck, Alexander,
Firk, Frank W. K., 13 233, 376
Dantzig, George, 137, 182 Focardi, Sergio M., 252 Gueganic, Alexandre, xii
Davids, Bob, 317 Fraenkel, Aviezri, 306 Guthrie, Francis, 345
Davis, Martin, 312, 385, 386 Francis, John G. F., 247 Gyárfás, András, 526
de Branges, Louis, 394 Franklin, Benjamin, 14
de Grey, Aubrey, 527 Franklin, Philip, 347 Hadamard, Jacques, 21, 187,
de Moivre, Abraham, 495 Freedman, Michael, 505 190, 249
Debrunner, Hans, 369 Freeman, Jesse, 118 Hadwiger, Hugo, 527
Dehn, Max, 369, 396 Frege, Gottlob, 85 Häggström, Olle, 383
Delaunay, Charles-Eugène, Frenkel, Igor, 441 Hales, Thomas C., 476
221 Frey, Gerhard, 234 Hales, Thomas C., 476
Diaconis, Persi, 381 Fried, David, 421 Hall, Monty, 427
Dickson, Leonard Eugene, Fry, John, 451 Halparin, Monte, 427
521 Fry, Roger, 39 Hamel, Georg, 278
Diop, Amina, xii Furstenberg, Hillel, 3, 230, Hamilton, Richard S., 505
Diophantus of Alexandria, 340 Hamming, Richard W., 252
457, 458 Hammond, Christopher N.
Dirichlet, Peter Gustav Leje-
Gale, David, 263 B., 306
une, 3, 172, 458
Galilei, Galileo, 27 Hanke, Jonathan P., 447
Duren, Peter, 308
Gallian, Joseph, 75 Hardy, Godfrey Harold, 14,
Dyson, Freeman, 81, 228,
Galois, Évariste, 145, 514 39, 57, 59, 141, 153, 187,
408
Gardner, Martin, 7, 342, 383, 195, 381, 521, 529, 551,
442, 546 555
Eder, Maciej, 487
Garfield, James A., 315 Harman, Glyn, 388
Egorov, Dmitri, 556
Gauss, Carl Friedrich, 11, 97, Harriot, Thomas, 476
Einstein, Albert, 11
143, 281, 287, 381, 395, Haselgrove, C. Brian, 93
Ekhad, Shalosh B., 403
423, 445, 448 Hasse, Helmut, 48, 170
Elga, Adam, 428
Gelfand, Israel, 148, 197, 252 Hawkins, David, 408
Elkies, Noam, 39
Gelfond, Alexandr, 117 Hay, Mark, xii
Eppstein, David, 527
Gentleman, Robert, 487 Heath-Brown, Roger, 552
Erdős, Paul, 1, 90, 102, 129,
Germain, Sophie, 460 Heaviside, Oliver, 11
187, 189, 305–307, 340
Gerwien, Paul, 369 Heawood, Percy J., 346
Escher, Maurits Cornelis, 88,
Geyer, Lukas, 362 Heegner, Kurt, 288, 454
145
Eskew, Monroe, 559 Gibbon, Edward, 145 Heeringa, Brent, 122
Eubulides of Miletus, 85 Gibbs, Josiah Willard, 11, 95 Helfgott, Harald Andrés, 127
Euclid of Alexandria, 4, 272, Ginsparg, Paul, 433 Hellegouarch, Yves, 234
275, 369, 474 Gladwell, Malcolm, 138 Hensel, Kurt, 17
Euler, Leonhard, 2, 39, 91, Glassman, Zachary, xii Hermite, Charles, 329, 342
117, 118, 171, 295, 378, Gödel, Kurt, 86, 122, 141, Hilbert, David, 40, 71, 86,
422, 423, 458, 463, 474, 269, 484 117, 269, 311, 369, 385,
513, 533 Goffman, Casper, 275, 305 457, 490, 564
Evans, Jonny, 399 Goldbach, Christian, 127 Hirzebruch, Friedrich, 300
Goldfeld, Dorian, 187 Hoover, Colleen, 23
Fabozzi, Frank J., 252 Goldston, Daniel, 528 Horn, Roger A., 521, 523
Faltings, Gerd, 378 Golomb, Solomon, 527 Horner, William George, 285
Fedi, Zolt, 306 Gomory, Ralph E., 509 Householder, Alston Scott,
Ferguson, Samuel P., 476 Gorenstein, Daniel, 514 247
Fermat, Clément-Samuel, Gosset, William Sealy, 537 Huang, Ming-Deh A., 501
458 Gowers, Timothy, 90, 490 Hughes, Colin, 493
Fermat, Pierre de, 145, 208, Graham, Ronald, 90, 431,
234, 311, 421, 448, 457, 442 Ihaka, Ross, 487
512 Granville, Andrew, 549 Ingham, Albert, 172, 388
Feynman, Richard, 76 Green, Ben, 4, 58, 511 Irons, Jeremy, 144
Jacobi, Carl Gustav Jacob, Lagarias, Jeffrey, xii, 103, Manasse, Mark, 355
445 108, 370, 477 Marchal, Christian, 476
James, Bill, 319 Lagrange, Joseph-Louis, 39, Markov, Andrey, 220, 381,
Jensen, Alexandra, 346 445, 475 487
Jensen, Johan Ludwig, 460 Lambert, Joel, 306 Mason, Richard C., 375
Johnson, Charles R., 523 Lamé, Gabriel, 458, 499 Masser, David, 376
Johnson, Dano, 531 Landau, Edmund, 172, 521 Matelski, J. Peter, 357
Jones, James P., 386 Lander, Leon J., 39 Mathey, Steven, 491
Jones, Michael, 200 Langlands, Robert, 293 Matiyasevich, Yuri, 312, 385,
Jones, Peter, 277 Laplace, Pierre-Simon, 381 386
Jones, Toby, 144 Le Gall, François, 283 Maynard, James, 33, 528
Jones, Vaughan F. R., 163, Le Verrier, Urbain, 221 Mazur, Barry, 47
396 Lebesgue, Henri, 458 McGarvey, Joey, 158
Lebesgue, Victor-Amédée, McGuire, Gary, 402
Kahoro, Elvis, 416 458 McKay, John, 440
Kanigel, Robert, 143 Leclerc, Georges-Louis, 176 Mercer, Idris D., 230
Kant, Immanuel, 273 Lee, David, xii Mersenne, Marin, 463
Karp, Richard, 402 Lee, Harper, 487 Mertens, Franz, 110, 189
Kasiski, Friedrich, 212 Lefschetz, Solomon, 300 Merton, Robert C., 469
Katz, Nick, 82 Legendre, Adrien-Marie, Merton, Robert K., 552
Kayal, Neeraj, 501 445, 458 Metropolis, Nicholas, 219
Kehle, Paul, 265 Lehmer, Derrick Henry, 465 Meurman, Arne, 441
Kempe, Alfred, 346 Lehr, Jessica, xi Milićević, Djordje, 77
Kepler, Johannes, 476 Leibniz, Gottfried Wilhelm, Miller, Gary Lee, 501
Kestemont, Mike, 487 147, 561 Miller, Stephen D., 365
Khavinson, Dmitry, 362 Lemke Oliver, Robert, 415 Mills, William H., 388
Khinchin, Aleksandr, 97, 113 Lenstra, Arjen K., 355 Milnor, John, 76
Khovanov, Mikhail, 400 Leontief, Wassily, 472 Mirzakhani, Maryam, 562
Kjos-Hanssen, Bjørn, xii Lepowsky, James, 441 Mishkin, Pamela, 346, 465
Klamkin, Murray S., 103 Levi-Civita, Tullio, 11 Mizgerd, Clayton, xii
Kleene, Stephen, 122 Levinson, Norman, 153 Möbius, August Ferdinand,
Klein, Felix, 243, 440 Lewis, Michael, 317 242
Klyachko, Alexander, 68 Lie, Sophus, 514 Mochizuki, Shinichi, 376
Knuth, Donald, 363, 443 Lindeberg, Jarl, 54 Molchanov, Stanislav, 253
Knutson, Allen, 68 Lindemann, Ferdinand von, Monaco, Jane J., 435
Kobayashi, Forest, xii 329 Montague, David, 207
Kodaira, Kunihiko, 234 Linnik, Yuri Vladimirovich, Montgomery, Hugh, 81, 408
Koebe, Paul, 394 553 Mordell, Louis, 46
Kolmogorov, Andrey, 221, Liouville, Joseph, 118, 228, Moreno, Samuel G., 35
381 329 Morgan, Frank, 306, 460, 507
Kominers, Scott Duke, xii, Listing, Johann Benedict, Morgenstern, Oskar, 163
447 242 Morin, Bernard, 241
Kontorovich, Alex, 306, 323, Littlewood, John Edensor, Morse, Marston, 21
326, 402 39, 57, 59, 108, 143, 383, Moser, Jürgen, 221
Korselt, Alwin, 552 521, 529, 555 Moser, Leo, 91, 527
Kraitchik, Maurice, 383 Logsdon, Ben, xii Moser, William, 527
Krantz, Steven G., 141 Lorenz, Edward, 257 Mumford, David, 76
Krohn, Maxwell, 489 Lovász, László, 525 Muñoz-López, José, xii
Kublanovskaya, Vera, 247 Lowell, Percival, 75 Munroe, M. E., 275
Kummer, Ernst, 459 Lowry, John, 369 Müntz, Herman, 197
Kurschak, Josef, 17 Luca, Florian, 306, 417 Murray, Francis, 398
Kuzmin, Rodion, 117 Lucas, Édouard, 423, 464 Murty, M. Ram, 306
Lyons, Richard, 515 Myerson, Gerry, 505
Labbé, Cyril, 490
Lacan, Jacques, 436, 466 Mackall, Blake, 441 Na, Giebien, xii
Nash Jr., John Forbes, 163, Quillen, Daniel, 76 Scott, Alex, 526
543 Segal, Irving, 559
Navier, Claude-Louis, 491 Rabin, Michael Oser, 501 Selberg, Atle, 153, 187
Nelson, Edward, 527, 559 Rainich, Georg Yuri, 534 Seldon, Hari, 145
Neuenschwander, Dwight E., Raleigh, Walter, 476 Selhorst-Jones, Vincent, 306,
251 Ramanujan, Srinivasa, 40, 307
Newton, Isaac, 67, 435, 561 59, 143, 172, 342, 365, Selvin, Steve, 427
Neyman, Jerzy, 137 381, 446, 448, 551 Serre, Jean-Pierre, 233, 299
Nicely, Thomas, 33 Ramsey, Frank Plumpton, 89 Severini, Carlo, 556
Nimitz, Chester W., 160 Reid, Constance, 385 Seymour, Paul, 526
Nishikado, Tomohiro, 359 Reidemeister, Kurt, 396 Shakespeare, William, 487
Nobel, Alfred, 469 Reiter, Harold, xi, 253 Shamir, Adi, 351
Norton, Simon P., 441 Rényi, Alfréd, 306 Shao, Lily, xii
Norwich, John Julius, 201 Reznick, Bruce, 76 Shapiro, Arnold, 241
Ribet, Ken, 234 Shapiro, Daniel, 235
O’Brien, Miles, 392 Ricci-Curbastro, Gregorio, Shapiro, Harold S., 306
O’Neill, Cathy, 267 11 Shapley, Lloyd, 263
Odlyzko, Andrew, 81, 407, Riemann, Bernhard, 11, 151, Shavgulidze, E. T., 63
408 188 Sheil-Small, Terence, 362
Oesterlé, Joseph, 376 Ringel, Gerhard, 347 Sheldon, Kathy, xii
Olbers, Wilhelm, 284 Risch, Robert Henry, 302 Sherman, David, 306
Ono, Ken, 60, 448 Rivest, Ronald, 351 Shiing-Shen, Chern, 293
Orwell, George, 391 Robertson, Neil, 526 Shor, Peter, 352
Ostrowski, Alexander, 17 Robinson, Julia, 312, 385, Siegel, Carl Ludwig, 228, 460
386 Siegel, Zachary, xii
Palka, Bruce, 534 Robinson, Raphael, 61, 71 Sierpiński, Waclaw, 271
Pan, Chengdong, 553 Rochefort, Joseph J., 160 Silva, Cesar E., 97
Parkin, Thomas R., 39 Rosenthal, Jeffrey, 219 Silverman, Joseph H., 376,
Patel, Dev, 144 Ross, Arnold, 235, 421 377
Penrose, Lionel, 201 Ross, W. Bruce, 91 Šimerka, Václav, 552
Perelman, Grigori, 433, 505 Ross, William T., 306 Simpson, Homer, 241
Perichon, Benoît, 511 Roth, Alvin, 263 Singer, Isadore, 299
Perpetua, Byron, 9 Roth, Klaus Friedrich, 228, Skewes, Stanley, 108
Picard, Charles Émile, 166 340 Smale, Stephen, 241, 505
Picard, Jean-Luc, 145, 166, Rowling, J. K., 487 Smith, Stephen D., 514
392 Royden, Halsey, 307 Smith, Winston, 391
Pinch, Richard G. E., 549 Rubik, Ernő, 333 Snow, Joanne, 23
Pinter, Mike, xi Rudin, Walter, 275 Snyder, Noah, 376
Pintz, János, 388, 528 Russell, Bertrand, 85, 86 Sokal, Alan, 436, 466
Poincaré, Henri, 21, 221, 257 Rybicki, Jan, 487 Sós, Vera, 306
Pólya, George, 91, 407 Soundararajan, Kannan,
Pomerance, Carl, xii, 503, Sally, Paul, 371 415, 448, 528
504, 549, 551 Sarason, Donald, 197 Spencer, Joel, 91, 93
Post, Emil Leon, 122 Sarnak, Peter, 82 Sperner, Emanuel, 494
Pratt, Kyle, 288 Sato, Daihachiro, 386 Spirkl, Sophie, 526
Punnett, Reginald Crundall, Savant, Marilyn vos, 427 Stark, Eberhard L., 35
142 Saxena, Nitin, 501 Stark, Harold, 288
Pushkin, Alexander, 220, Schilly, Harald, xii Stein, William A., xii, 519
487 Schinzel, Andrzej, 305, 521 Stepanov, Sergei Aleksan-
Putinar, Mihai, 306 Schneeberger, William, 446 drovich, 170
Putnam, Elizabeth Lowell, Schneider, Theodor, 117 Stevens, Glenn H., 421
75 Scholes, Myron S., 469 Stigler, Stephen, 552
Putnam, Hilary, 312, 385, Scholze, Peter, 562 Stoiciu, Mihai, 482
386 Schrödinger, Erwin, 67, 383 Stokes, George Gabriel, 491
Putnam, William Lowell, 75 Schultz, William Henry, 208 Stone, Daniel F., 164
Stone, Harlan F., 69 Trinh, Minh-Tam, 460 Wellens, Jake, 114

Stone, Marshall H., 69 Tripp, Samuel, 403 Weston, J. D., 556
Stothers, Walter Wilson, 375 Truman, Harry S., 209 Wetzel, John E., 307
Strassen, Volker, 283 Tugemann, Bastian, 402 Wheeler, Jeffery Paul, 376
Stribling, Jeremy, 489 Tukey, John, 281 Whitehead, Alfred North, 86
Su, Francis, 558 Tunnell, Jerrold Bates, 454 Whitehead, Ian, 158, 294
Suh, Hong, xii Turán, Pál, 340 Wieman, Hunter, xii
Summers, Alexander, xii Turing, Alan, 121, 157 Wien, Douglas, 386
Świątek, Grzegorz, 362 Wiener, Norbert, 146, 148
Sylvester, James Joseph, 474 Udell, Gabe, xii Wigner, Eugene, 79, 82, 251
Szász, Otto, 197 Ulam, Stanislaw, 101, 164, Wilcox, James, 364
Szekeres, George, 340 175, 546 Wiles, Andrew, 145, 208,
Szemerédi, Endre, 58, 340 234, 294, 457
Vallée-Poussin, Charles Jean Wiley, Chad, 397
de la, 187, 190 Wilf, Herbert, 276, 416
Tai, Mary M., 434
van der Waerden, Bartel
Tait, Peter Guthrie, 346 Wilmshurst, A. S., 362
Leendert, 90
Takagi, Teiji, 360 Wilson, Edward Osborne,
Vazsonyi, Andrew, 430
Tao, Terence, 4, 33, 58, 68, 252
Velupillai, K. Vela, 251
511, 512, 559 Wilson, Kenneth, 76
Venkatesh, Akshay, 562
Tarjan, Robert, 71 Wishart, John, 79
Vigenère, Blaise de, 212
Tarski, Alfred, 62 Witten, Edward, 400
Vinogradov, Ivan Matveye-
Tate, John, 490 Wolfram, Stephen, 413
vich, 127
Tausk, Daniel, 559 Woltman, George, 464
Vitali, Giuseppe, 65
Taussky-Todd, Olga, 407 von Dyck, Walther, 254, 509
Taylor, Richard, 234, 457 Xylouris, Triantafyllos, 553
von Mangoldt, Hans Carl
Tener, James, xii, 397 Friedrich, 409
Tennenbaum, Stanley, 205 von Mises, Richard, 138 Yeates, Bree, 282
Thai, Minh, 333 von Neumann, John, 63, 67, Yıldırım, Cem, 528
Thielman, H. P., 275 96, 163, 179, 398 Youngs, John W. T., 347
Thomas, Robin, 526 von Staudt, Karl, 459 Yule, Udny, 142
Thompson, John, 63 Vongsathorn, Xan, 334
Thue, Axel, 228, 442, 475 Zaremba, Stanislaw, 323
Thurston, William, 241 Wada, Hideo, 386 Zeckendorf, Edouard, 312
Tijdeman, Robert, 442 Wallace, William, 369 Zeilberger, Doron, 403, 416
Titchmarsh, Edward Wallis, John, 539 Zemdegs, Feliks, 333
Charles, 142 Wantzel, Pierre, 424 Zhang, Scott Sicong, 334
Tits, Jacques, 514 Waring, Edward, 39 Zhang, Yitang, 33, 128, 528
Tonelli, Leonida, 556 Weil, André, 76, 170, 293 Ziegler, Günter M., 3, 189
Tóth, László Fejes, 475 Weinberg, Wilhelm, 142 Zong, Chuanming, 371
Travis, Jeffrey, 531 Weiss, Gary, 306 Zuboff, Arnold, 428
Index
3x + 1 conjecture, 101 algebraic number, 117, 169, Apéry’s constant, 365

3x + 1 problem, 101, 563 295, 329, 332 Archimedean, 18
QR-decomposition, 473 algorithm, 335; 196-, 104; di- Aristotle, 370
γ, 172 vision, 461; Euclidean, arithmetic progression, 1,
φ, 223, 313 207, 499; fast Fourier 228, 339; arbitrarily long,
π, 227, 228 transform, 281; Gale– 2, 339; as a Diophantine
π, 98, 113, 115, 134, 222, 230, Shapley, 264; greedy, 312, set, 386; forming a topol-
290, 294, 329, 332, 478 364; Horner’s method, ogy, 230; monochromatic,
π², 189
√2, 205 tion (naive), 181, 499, 512; of primes, 2, 3, 58,
abc-conjecture, 376, 378, 521, 500; Markov chain Monte 511
563 Carlo, 219; matrix mul- Arrow’s theorem, 199
e, 98, 115, 134, 227, 258, 324, tiplication, 282; Me- Artin–Whaples product for-
329, 331, 332, 460 tropolis, 219; Newton’s mula, 17
e + π, 118, 330 method, 259; PageRank, arXiv, 433
eπ, 118, 330 465; Pollard’s p − 1, 353; ASCII, 123
j-invariant, 343, 440 polynomial-time, 501; asymptotically equivalent,
t-test, 538 Risch, 302, 414; RSA, 33
15-theorem, 446 352; Rubik’s Cube, 334;
Atiyah–Singer index theo-
196-algorithm, 104 Shor’s, 352; simplex, 182,
rem, 299
290-theorem, 447 185; Strassen, 283; to
automorphic forms, 293
compute square roots,
axiom of choice, 62, 65, 86,
277; Wilf–Zeilberger, 416
Abel Prize, 76, 163, 299, 300, 108, 141, 269, 277, 278,
almost everywhere, 555
340 482, 557
alternating group, 439, 514
Abel–Ruffini theorem, 329 axiom of foundation, 85
alternating harmonic series,
absolute differential calculus, axiom of infinity, 85, 87
110
11 alternating series test, 110 axiom of pairing, 85
absolute value, 17; p-adic, amenable, 63 axiomatic set theory, 85
17; standard, 17; trivial, American Institute of Math-
17 ematics, 451
absolutely convergent, 110, Bacon number, 306
American Mathematical So-
146, 148 badness, 364
ciety, 385
abstract nonsense, 295 Baire category theorem, 481
American Standard Code for
Ackermann’s function, 443 Information Interchange, Baker–Heegner–Stark theo-
additively large, 340 123 rem, 288, 534
Advanced Encryption Stan- analysis situs, 242 ballotino, 201
dard, 124 analytic continuation, 151, Banach algebra, 148
AES, 124 365, 407 Banach–Tarski paradox, 61,
AIM, 451 analytic function, 151, 393 241, 277, 369, 381
AKS primality test, 501 analytic index, 302 base for a topology, 230
Alexander polynomial, 396 analytic rank, 47, 48 baseball, 317–319
Alexander’s theorem, 399 Ann Arbor Problem Book, Basel problem, 35, 140, 294
algebraic conjugate, 495 307 basic construction, 398
algebraic irrational, 114 AP-rich, 339 basic solution, 182
Bateman–Horn conjecture, Cantor set, 22, 270, 271, 482 Collatz sequence, 101
34, 57, 129, 464, 512, 521, Cantor surjection theorem, companion matrix, 296
528, 533, 546, 550, 563 24 complete graph, 442
Bateman–Horn constant, Cantor’s powerset theorem, completeness theorem, 86
522, 534 31 conditionally convergent,
Battle of Midway, 160 cardinality, 27, 28, 269 110
bell curve, 52 Carmichael number, 501, Condorcet cycle, 200
Benford’s law, 54, 102, 131, 504, 549 Condorcet winner, 200
223 Catalan number, 253, 254, Condorcet winner criterion,
Bernoulli numbers, 459 540 200
Bernoulli random variable, category theory, 481 congruence obstruction, 57
55, 179 Cauchy functional equation, congruent number, 452, 453
Bernstein polynomial, 196 278 congruent number problem,
Bertrand’s postulate, 129 Cauchy product, 109 452
Beurling’s theorem, 195 Cauchy random variable, 53 conjecture; 3x + 1, 101;
Bieberbach conjecture, 393 Cauchy–Riemann equations, abc-, 376, 378, 521,
bijection, 28 8 563; Bateman–Horn, 34,
billiards, 258 central limit theorem, 51, 52, 57, 129, 464, 512, 521,
Binet’s formula, 313, 495, 54, 55, 79, 176, 179, 303, 528, 533, 546, 550, 563;
519 411, 537, 539 Bieberbach, 393; Birch
binomial random variable, 55 chain of subsets, 31 and Swinnerton-Dyer,
Birch and Swinnerton-Dyer chaos, 221 47, 454, 455; Conway–
conjecture, 47, 454, 455, character, 148 Norton, 441; epsilon, 234;
491 characteristic function, 54, Erdős–Turán, 340, 343;
Birkhoff ergodic theorem, 96 95 Erdős, 2, 343; Euler’s on
birthday attack, 139 characteristic polynomial, sums of powers, 39; Fer-
birthday paradox, 139 77, 247, 296, 496 mat’s, 422; Gauss’s class
birthday problem, 138 Chebyshev’s bias, 415 number, 288; Goldbach,
Black–Scholes model, 470 checksum, 124 57; Goldbach binary,
blancmange function, 360 chess, 493 127; Goldbach ternary,
Blaschke condition, 195, 197 Chinese remainder theorem, 127; Hardy–Littlewood
Bletchley Park, 157, 210 238, 535 k-tuple, 57, 58, 128;
Boneyard Book, 308 chromatic number, 525 Hardy–Littlewood (twin
Borsuk–Ulam theorem, 165 ciphertext, 124 primes), 34; Heawood,
Boston Red Sox, 317 circle method, 40, 57, 127, 346; Hilbert–Pólya, 407;
braid group, 399 441 Kepler, 346, 476; Lan-
Brouncker’s formula, 115 class field theory, 294 dau’s, 528; Mordell’s, 378;
Brouwer’s fixed-point theo- class number, 287, 446, 459 Poincaré, 433, 505, 507;
rem, 164, 494, 543 class number one problem, Polignac’s, 33; Pólya’s, 91;
Brownian motion, 470 287 Ramanujan, 294; Sato–
Brun’s constant, 33, 37 classification of finite simple Tate, 294; Taniyama–
Brunn–Minkowski theorem, groups, 300, 439, 513 Shimura, 48; Thwaites,
23 classification of surfaces, 506 101; twin prime, 33, 57,
Buffon’s needle problem, 176 Clausen–von Staudt theo- 408, 433, 522; Ulam’s,
Burali–Forti paradox, 88 rem, 459–461 101; Zaremba’s, 323, 326
busy beaver, 122 Clay Millennium Problems, connected sum, 508
busy beaver function, 122 xi, 47, 108, 153, 487, 505 consistent, 86, 270, 484
butterfly effect, 221, 257 clique number, 525 constant; Apéry’s, 365;
closed graph theorem, 481 Bateman–Horn, 522,
C*-algebra, 302 closed graph theorem, 481 Conway, 402; Euler’s,
Caesar cipher, 124, 212 CoCalc, 521 Conway, 402; Euler’s,
Calkin–Wilf sequence, 29 Cole Prize, 464 see also e; Euler–
canonical linear program- collaboration graph, 305 Mascheroni, 134, 152, 172;
ming problem, 183 Collatz function, 101 Gelfond–Schneider, 117;
Cantor dust, 24 Collatz graph, 101 Khinchin’s, 113–115, 134;
Liouville’s, 119, 227, 329; Diophantine equation, 311, 378, 461; functional (zeta
Meissel–Mertens, 189; 385, 386 function), 152; Orwell’s,
Mills’s, 388; Planck’s, 67; Diophantine set, 386 391; Schrödinger, 67
Ramanujan’s, 342, 534; Dirac delta functional, 80 equidecomposable, 369
twin primes, 34 Dirichlet divisor problem, equidistributed modulo 1,
constraint matrix, 181 172 96, 131, 133
constructible, 424 Dirichlet’s approximation equinumerous, 27, 270, 545
constructible polygon, 423 theorem, 223, 227 equivalence relation, 62, 64
consumption matrix, 472 Dirichlet’s box principle, 223 Eratosthenes of Cyrene, 408
continued fraction, 73, 97, Dirichlet’s theorem on Erdős–Turán conjecture,
113, 323, 324, 326 primes in arithmetic pro- 340, 343
continuum hypothesis, 86, gressions, 3, 58, 291, 354, Erdős conjecture, 2, 343
269, 272, 277, 307 415, 522, 528, 552, 553 Erdős number, 1
contraction mapping princi- discrete dynamical system, Erdős–Bacon number, 306
ple, 165 95 ergodic hypothesis, 95
Conway’s constant, 402 discrete Fourier transform, ergodicity, 96
Conway–Norton conjecture, 281 error function, 470
441 discriminant, 287; of an ellip- Euclid’s theorem, 3, 87, 111,
cookie problem, 313 tic curve, 45 189, 230, 295, 423, 513
Coq, 346 division algorithm, 461 Euclid–Mullin sequence, 513
cosmological theorem, 402 divisor function, 171 Euclidean algorithm, 207,
countable, 28 Doctor Who, 67, 85, 145, 499
Cramér model, 410, 463 267, 393, 487 Euclidean geometry, 272
critical line, 153, 388 dyadic filtration, 25 Euclidean norm, 193
critical strip, 152, 407, 409 Dyck path, 254 Euler characteristic, 347, 509
cryptography, 47, 124, 224, Dyck’s theorem, 509 Euler product formula, 110,
351 dynamical system, 221 140, 151, 188–190, 293,
cubic close packing, 476 294, 407, 409
cycle, 386 Earth, 68, 439 Euler totient function, 416
cyclic group, 514 Egorov’s theorem, 555 Euler’s constant, see also e
cyclotomic field, 459 eigenvalue trace lemma, 80 Euler’s formula, 8, 35, 40,
election procedure, 199 118
cyclotomic polynomial, 236
Electronic Frontier Founda- Euler’s power tower, 72
tion, 464 Euler–Lucas theorem, 422
Data Encryption Standard, elementary function, 302 Euler–Mascheroni constant,
124 ellipse, 146 134, 152
de Morgan’s law, 231 elliptic curve, 513; analytic existential proof, 117
degree, 31 rank, 47; and congruent expected value, 51
Dehn invariant, 369 numbers, 454; definition, extreme point, 194
Delbert Ray Fulkerson Prize, 45; discriminant, 45; Frey,
526 234; group operation, 46; Facebook, 1, 305
density, 339 Hasse–Weil L-function of, factor, 398
derangement, 158 48; largest known rank, fast Fourier transform, 281,
DES, 124 47; modular, 234; rank, 453
diagonal argument, 29 47; rational point, 46, 454; feasible, 182
diagonal matrix, 447 torsion subgroup, 47 Fenway Park, 57
diet problem, 182, 184 empirical spectral measure, Fermat equation, 311, 378,
differential equation, 8, 166, 80 461
221, 257, 299 energy-momentum invariant, Fermat number, 421–423
digit expansion; base B, 113; 11, 12 Fermat prime, 421, 422, 424
binary, 113; decimal, 113 Enigma machine, 122, 157 Fermat’s conjecture, 422
dihedral group, 439 epsilon conjecture, 234 Fermat’s last theorem, 39,
dimension theorem, 196 equation; Black–Scholes, 145, 169, 208, 234, 294,
Diophantine approximation, 470; Diophantine, 311, 311, 375, 376, 378, 452,
62, 207, 222, 229, 391 385, 386; Fermat, 311, 457, 460, 476
Fermat’s little theorem, 351, 539; generating, 448, 497; Gelfond–Schneider constant,
354, 375, 499, 501 inner, 195; iterated expo- 117
Fermat’s polygonal number nential, 72; Koebe, 393; Gelfond–Schneider theorem,
theorem, 448 L-, 3, 48; logarithmic in- 117, 119
fetid dingo’s kidneys, 27, tegral, 107, 153, 409, 411, general comprehension prin-
466, 489 547; matrix exponential, ciple, 85
FFT, 453 69; meromorphic, 233; general theory of relativity,
Fibonacci number, 131, 235, modular, 440; multiplica- 11
312, 313, 373, 495, 497, tive, 171; periodic, 281; generating function, 59, 448,
519 prime-counting, 107, 108, 497
Fields Medal, 76, 90, 145, 129, 151, 516; ramp, 555;
geodesic, 12, 21
170, 228, 269, 287, 288, rational, 233, 377; Rie-
geometric mean, 114
294, 300, 378, 397, 433, mann zeta, 3, 48, 79, 81,
geometric progression, 341
441, 446, 491, 505, 507, 110, 139, 151, 169–171,
187, 189, 190, 363, 364, geometric series, 19, 36, 59,
562
388, 407, 409, 410, 459; 78, 151
finite, 28
finite extension, 169 sawtooth, 360; square- Géométrie algébrique et
integrable, 97; square- géométrie analytique, 233
first category, 481
first incompleteness theorem, wave, 147; sum of divi- GIMPS, 464
86, 88 sors, 172; Takagi, 360; global function field, 169
fixed point, 164, 357, 543 transcendental, 233; von Global Median Matching,
Flint Hills series, 229 Mangoldt, 409 265
forcing, 269 function field, 169; global, Global Positioning System,
formula; Binet’s, 495, 519; 169; Riemann hypothesis, 14
Simson’s, 496; Stirling’s, 170; zeta function, 169 Goldbach conjecture, 57, 127
539; Wallis’s, 539 functional equation, 152; golden ratio, 223, 313, 495
forward orbit, 103 Cauchy, 278 Golden State Warriors, 317
four color theorem, 345, 346, fundamental group, 347
Golomb graph, 527
476 fundamental lemma, 294
Goodstein sequence, 86
four fours puzzle, 392 fundamental polygon, 506
Goodstein’s theorem, 87
four-square identity, 39, 41 fundamental theorem of alge-
GP-rich, 341
Fourier coefficients, 146 bra, 118, 358, 484
fundamental theorem of GPS, 14
Fourier matrix, 284
arithmetic, 29, 37 Graham’s number, 439, 442
Fourier series, 146, 221, 301
Gram–Schmidt process, 249
fractal, 18, 22, 260, 357
GAGA principle, 233 graph, 253, 386; collabora-
fractal dimension, 270, 271
Gale–Shapley algorithm, 264 tion, 305; complete, 442;
Franklin graph, 347, 349
Galileo’s paradox, 27 Franklin, 347, 349; friend-
frequency analysis, 125
Galois representation, 293 ship, 306; spanning tree in
frequency-wave number in-
Galois theory, 329 a, 484
variant, 11
game theory, 163, 543 gravitational lensing, 14, 362
Fresnel integral, 303
Frey curve, 234 gamma function, 152, 479, gravitational waves, 14
friendship graph, 306 537, 539 Great Internet Mersenne
Fubini–Tonelli theorem, 472 Gauss map, 98 Prime Search, 464
Fulkerson Prize, 526 Gauss measure, 98 greatest common divisor,
function; Ackermann’s, 443; Gauss sum, 284 207, 235
analytic, 393; blanc- Gauss’s class number conjec- greedy algorithm, 312
mange, 360; bump, 556; ture, 288 Green–Tao theorem, 2, 58,
busy beaver, 122; charac- Gauss’s lemma, 495 340, 511, 512, 521, 522
teristic, 95; Collatz, 101; Gauss–Kuzmin theorem, 97 group; alternating, 439; clas-
continuous and nowhere- Gauss–Wantzel theorem, 424 sification of finite simple
differentiable, 360; divi- Gaussian, 52, 54, 55 groups, 439; cyclic, 514;
sor, 171; elementary, 302; Gaussian integer, 511 dihedral, 439; fundamen-
Euler totient, 351, 416; Gaussian prime, 511 tal, 347; monster, 439;
gamma, 152, 479, 537, GCHQ, 351 of Lie type, 514; pariah,
514; quasithin, 514; Ru- Householder matrix, 248 Kakutani’s problem, 101
bik’s Cube, 333; simple, Householder reflections, 247 KAM theory, 221
439, 513; sporadic, 514 hyperbolic geometry, 272, Kasiski method, 212
Grundgesetze der Arith- 273 Kepler conjecture, 346, 476
metik, 85 hypercube, 531 Khinchin’s constant, 114,
115, 134
Hadamard conjecture, 250 impossibility theorem, 199 Kirby–Paris theorem, 87
Hadamard matrix, 249 incompleteness theorem; Klein bottle, 243, 346, 505,
Hadamard’s inequality, 249 first, 86; second, 86 508
Hadwiger–Nelson problem, indicator function, 95 knot; Stevedore, 398; trefoil,
527 inertial frames, 11 395, 505; unknot, 395
Hahn–Banach theorem, 197 infinite, 28 knot polynomial, 396
halting problem, 122 infinitesimal generator, 69 Kolmogorov–Arnold–Moser
Hamiltonian, 67, 69, 79, 221 infinity, 269 theorem, 221
Hamiltonian cycle, 386 injective, 27 Korselt’s criterion, 549
Hardy space, 195, 301 inner function, 195 Kronecker product, 296
Hardy–Littlewood k-tuple Institute for Advanced Kronecker’s approximation
conjecture, 57, 58, 128 Study, 408 theorem, 134
Hardy–Littlewood conjec- Intel, 33 Kronecker–Weyl theorem,
ture (twin primes), 34 interlace, 67 133
Hardy–Weinberg law, 142 intermediate value theorem, Kummer’s congruence, 460,
harmonic number, 108, 553 77, 78, 164, 461 461
harmonic series, 35, 40, 110, Internal Revenue Service, 54
301 International Congress of L’Hôpital’s rule, 555
Hasse–Minkowski local- Mathematicians, 117, 269, L-function, 3, 48, 293;
global principle, 20 311, 369, 515, 564 Hasse–Weil, 48; symmet-
Hasse–Weil L-function, 48 International Mathematics ric power, 294
Hausdorff maximality princi- Competition for Univer- Lagrange’s four-square theo-
ple, 484 sity Students, 76 rem, 445, 448
Hausdorff topology, 231 invariance of domain, 545 Landau’s conjecture, 528
Hawkins prime, 408 invariant, 371, 395 Langlands program, 293, 563
Heawood conjecture, 346 invariant set, 96 Laplace’s method, 539
Heegner number, 343 invariant subspace, 194 large cardinal, 397
heptadecagon, 423 invisible forest, 237 largest known prime, 464
hereditary base-b representa- irrational, 295 Laser Interferometer
tion, 86 irrational rotation, 95 Gravitational-Wave Ob-
heuristic reasoning, 103, 421, irrationality measure, 227, servatory, 14
422, 424, 463, 553 295 LaTeX, 363, 413
hexagonal close packing, 476 irrationality of √2, 205 lattice point, 237
hexagonal lattice packing, irrationality type, 133, 222 Laurelin the Golden, 254
475 irreducible representation, Laurent polynomial, 396
Hilbert space, 68, 79, 96, 398, 441 law of complementary prob-
407 IRS, 54 ability, 138
Hilbert’s problems, xi, 117, isologous, 211 law of large numbers, 96
153, 269, 288, 311, 369, iterated exponential func- Lebesgue measure, 24, 555
385, 451 tion, 72 Legendre symbol, 534
Hilbert–Pólya conjecture, iterated towers, 109 lemma; eigenvalue trace, 80;
407 Gauss’s, 495; Sperner’s,
Hodge conjecture, 491 Jacobi symbol, 503 494, 544–546; Zorn’s, 483
homeomorphism, 23, 347, Jacobi’s four-square theo- Leroy P. Steele Prize, 275
544, 545 rem, 445 liar’s paradox, 85, 86, 381
HOMFLY polynomial, 397 Jarník competition, 76 494, 544–546; Zorn’s, 483
Honors Class, 117 Jones polynomial, 395–397, linear programming, 181, 403
Horn conjecture, 68 399 Liouville lambda function, 91
Horner’s method, 284, 285 Jones tower, 398 Liouville number, 118
Liouville’s constant, 119, 227, 329
Liouville’s theorem, 118, 119, 329
Littlewood’s principles, 555
Local Median Matching, 265
local-global principle, 20
logarithmic derivative, 188
logarithmic integral, 107, 153, 409, 411, 522, 547
look and say sequence, 402
Lorentz transformation, 11
Lucas–Lehmer primality test, 464, 465
Lusin’s theorem, 555
Lyapunov central limit theorem, 411
Lychrel number, 104

Maass form, 294
MacArthur Fellow, 76
Magic Cube, 333
major arc, 41
Major League Baseball, 311, 317, 319
Mandelbrot set, 357
manifold, 241
Maple, 519
MapQuest, 57
Markov chain, 219, 220, 318, 544
Markov chain Monte Carlo algorithms, 219
Markov’s theorem, 399
Mars, 68, 75
Mason–Stothers theorem, 169, 375–377
Matching; Global Median, 265; Local Median, 265
Mathematica, 140, 413, 416, 429, 495, 519
mathematical induction, 8
MathOverflow, 399
matrix; characteristic polynomial, 296; companion, 296; constraint, 181;
    consumption, 472; diagonal, 447; exponential, 69; Fourier, 283;
    Hadamard, 249; Householder, 248; integral, 446; left and right
    inverses, 193, 195; multiplication, 282; orthogonal, 62, 69, 248, 249;
    permutation, 439; positive semidefinite, 67, 250; real orthogonal, 69;
    real symmetric, 80, 82; selfadjoint, 67
mean, 51
mean value theorem for integrals, 35
measure zero, 22, 96, 97, 118
Meissel–Mertens constant, 189
Mercury, 14
Mersenne number, 463, 503
Mersenne prime, 463, 473
Mertens’s theorem, 37, 109, 110
Mertens’s theorem (prime reciprocals), 189
method of stationary phase, 539
method of undetermined coefficients, 8
metric space, 22, 481
Metropolis algorithm, 219
middle square digits method, 179
Millennium Prize Problems, 454
Millennium Problems, 47
Miller–Rabin test, 501
Mills’s constant, 388
minor arc, 41
Möbius strip, 241, 243, 506
modular, 440
MoMath, 561
moment, 51
Moneyball, 317
monoid, 508
monotone sequence property, 37
monovariant, 371
monster group, 439, 442, 515
monstrous moonshine, 440
Monte Carlo method, 175, 318, 323
Monty Hall problem, 427, 428
moonshine module, 441
Moore–Kline theorem, 25
Mordell’s conjecture, 378
Morse theory, 21
Moser spindle, 527
Moser’s circle problem, 91
multiplicative function, 171
Müntz–Szász theorem, 197

naive measure theory, 95
naive set theory, 85
Nash equilibrium, 163, 164, 543
National Institute of Standards and Technology, 124
National Medal of Science, 363
National Museum of Mathematics, 561
National Resident Match Program, 263
National Science Foundation, 451
National Security Agency, 209
natural density, 339
natural number, 1
Navier–Stokes Equation, 491
negative curvature, 21, 23
Neptune, 221
Newcomb’s paradox, 383
Newton fractal, 260, 261
Newton’s method, 259, 277
Newton’s second law, 67
Nielsen–Schreier theorem, 484
NIST, 124
Nobel Prize, 76, 79, 163, 199, 251, 263, 435, 469, 472, 543, 562
non-Euclidean geometry, 273
nonorientable, 241, 243
nontrivial zeros of the zeta function, 153
norm; Euclidean, 193; on a vector space, 193
normal distribution, 51
normal random variable, 470
normal subgroup, 439
normal topology, 231
Norwegian Academy of Science and Letters, 299
nowhere dense, 22, 481
NP-complete problem, 385, 402
NRMP, 263
NSA, 209
NSA Cryptomathematics Institute, 210
number; algebraic, 30, 114, 117, 295, 329, 332; algebraic integer, 343;
    algebraic irrational, 133; Bacon, 306; Bernoulli, 459;
    Carmichael, 501, 504, 549; Catalan, 253, 254, 540; class, 446;
    congruent, 452, 453; Erdős, 305; Erdős–Bacon, 306;
    Fermat, 189, 421–423; Fibonacci, 131, 235, 312, 313, 373, 495, 497, 519;
    Gaussian integer, 511; Graham’s, 439, 442; harmonic, 553; Heegner, 343;
    irrational, 95, 114, 117, 205, 222, 227, 258, 295;
    irrational of type (K, ν), 222; Liouville, 118; Lychrel, 104;
    Mersenne, 189, 463, 503; natural, 1; ordinal, 87, 88; p-adic, 17, 18;
    perfect, 473; polygonal, 448; prime, 1, 57, 289, 382, 409, 414;
    quadratic irrational, 114; Ramsey, 90; rational, 17, 117, 222, 258;
    RSA challenge, 353; Skewes, 107, 108, 443, 444;
    square-free, 339, 342, 445, 549; taxicab, 551;
    transcendental, 31, 98, 114, 118, 169, 227–229, 329; triangular, 447;
    van der Waerden, 90
number field, 18, 169
number transcendental, 332
numbers; algebraic, 169

off-by-one error, 363
one-to-one, 27
onto, 27
open mapping theorem, 481
open sector, 472
open set, 230
Operation Fortitude, 160
operator; selfadjoint, 69; skew-symmetric, 69; unitary, 69
operator theory, 193
orbit, 101
order; partial, 483; total, 31, 483; well, 483
ordinal number, 87, 88
orthogonal matrix, 62, 248, 249
Ostrowski Prize, 394
Ostrowski’s theorem, 17

P versus NP problem, 183, 490, 491
p-adic absolute value, 17
p-adic number, 17, 18
PA, 87
packing density, 475
PageRank algorithm, 465
pair correlation problem, 81
pair of pants, 21
palindrome, 104
pan galactic gargle blaster, 31
paradox; Banach–Tarski, 61, 241, 369, 381; birthday, 139;
    Burali–Forti, 88; Galileo’s, 27; liar’s, 85, 86, 381; Newcomb’s, 383;
    nonexistence of length, 63; Russell’s, 85, 86, 381; Smale’s, 241
paradoxical decomposition, 62
parallel postulate, 273, 369
Pareto condition, 199
pariah group, 514
partial order, 483
partition function, 40, 59
Peano arithmetic, 87
Peano curves, 25
Penrose–Banzhaf power index, 201
pentagon diagram, 295
Pentium processor, 33
perfect, 525
perfect graph theorem, 525
perfect number, 473
periodic function, 281
permutation, 157, 158
permutation matrix, 439
perturbation theory, 221
Picard iteration, 166
pigeonhole principle, 223, 338, 346
plaintext, 124
Planck’s constant, 67
Platonic solid, 347, 348
Playfair’s axiom, 273
Pluto, 75
Poincaré conjecture, 433, 491, 505, 507
Poincaré disk model, 273
point-set topology, 230
Poisson random variable, 539
pole, 188
Polignac’s conjecture, 33
Polish Cipher Bureau, 157
Pollard’s p−1 algorithm, 353
Pólya’s conjecture, 91
polygonal number, 448
polyhedra, 369
Polymath8 project, 33
polynomial; Alexander, 396; Bernstein, 196;
    characteristic, 77, 247, 296, 496; cyclotomic, 236; Euler’s, 91;
    Fermat’s last theorem, 376; fixed points, 358; generating fractal, 261;
    harmonic, 361; HOMFLY, 397; indecomposable, 300;
    Jones, 395–397, 399; knot, 396; Laurent, 396;
    prime-generating, 91, 388; roots, 358
polynomial-time algorithm, 501
polytope, 185
poset, 483
positive semidefinite, 250
positive semidefinite matrix, 250
possibility theorem, 199
power index, 201
power tower, 72, 109
powerset, 25, 31
preference matrix, 263
primality test; Lucas–Lehmer, 464
primality testing, 47
prime; Fermat, 421, 424; Gaussian, 511; Hawkins, 408;
    largest known, 464; Mersenne, 463, 473; regular, 459;
    Sophie Germain, 460
prime number, 1, 4, 57, 382, 407, 409, 414; Cramér model, 410;
    Fermat, 422; of the form $x^2 + dy^2$, 289; twin, 416
prime number theorem, 34, 58, 107, 129, 153, 172, 181, 187, 189, 303, 339,
    366, 410, 424, 465, 515, 522, 528, 554
prime-counting function, 107, 108, 129, 151, 516
primitive recursive, 71
primitive root, 416
principle; contraction mapping, 165; Dirichlet’s box, 223; GAGA, 233;
    general comprehension, 85; Hausdorff maximality, 484;
    Littlewood, 555; local-global, 20; of equivalence, 12, 13;
    of inclusion-exclusion, 158; pigeonhole, 223, 338, 346;
    uniform boundedness, 481; well-ordering, 206, 483
principle of inclusion-exclusion, 158
probability vector, 215
problem; 3x + 1, 101; Basel, 140; birthday, 138; Buffon’s needle, 176;
    canonical linear programming, 183; class number one, 287;
    congruent number, 452; cookie, 313; diet, 182; Dirichlet divisor, 172;
    Hadwiger–Nelson, 527; halting, 122; Kakutani’s, 101;
    linear programming, 181; Monty Hall, 427, 428;
    NP-complete, 385, 402; of small denominators, 221;
    P versus NP, 183, 491; sleeping beauty, 428; stars and bars, 313;
    Syracuse, 101; traveling salesman, 183, 385, 386;
    two envelopes, 382; Wetzel’s, 307
Project Euler, 493
Project Flyspeck, 476
projective plane, 508
PROMYS, 421
proof by picture, 206
Proofs from The Book, 189
pseudoprime, 500, 549
Putnam Competition, 75
Putnam Fellow, 75
Pythagorean theorem, 311, 314, 319, 320
Pythagorean triple, 453
Pythagorean won-loss formula, 319

quadratic form, 20, 287, 446; class number, 287; discriminant, 287;
    equivalent, 287; Ramanujan’s ternary, 448; universal, 446
quadratic Gauss sum, 284
quadratic irrational, 114
quadratic reciprocity, 291
quadrivium, 561
quantum mechanics, 67, 68, 79
quaternion, 42

R, 487
radius of convergence, 152
Ramanujan conjecture, 294
Ramanujan sum, 173
Ramanujan’s constant, 342, 534
Ramsey number, 89, 90
Ramsey theory, 89, 341, 442, 444
random matrix theory, 79, 408
random number generator, 131, 179
random variable; Bernoulli, 55, 179; binomial, 55; Cauchy, 53;
    characteristic function of, 54; continuous, 51;
    convergence in probability, 96; density, 51; mean of, 51;
    moments of, 51, 80; normal, 470; normally distributed, 52;
    Poisson, 539; standard deviation of, 51; standardized, 52;
    supported on primes, 424; uniform, 52; variance of, 51; Weibull, 319
random variables; independent, identically distributed, 80
rank of an elliptic curve, 47
rank-nullity theorem, 299
rational point, 46
reciprocal sum, 2
recreational mathematics, 7
reflexive, 28
refractive index; of a material, 13; of space-time, 13
regular prime, 459
regular topology, 231
relatively prime, 3
relativity, 11
residue, 188
Riemann hypothesis, 108–110, 153, 154, 169, 190, 388, 407, 409, 433, 448,
    490, 501, 503, 521, 552, 553, 562
Riemann hypothesis for function fields, 170
Riemann sphere, 188, 377
Riemann zeta function, 3, 48, 79, 81, 110, 139, 151, 169–171, 187, 189, 190,
    293, 363, 364, 388, 407, 409, 410, 459
Riemann–Hurwitz formula, 377
Riemann–Roch theorem, 46, 170
Riemannian, 12
Riesz projection, 301
Riesz representation theorem, 197
Risch algorithm, 302, 414
Rogers–Ramanujan continued fraction, 115
root of unity, 236
Ross Mathematics Program, 235
Roth’s theorem, 133, 223, 228
Roth’s theorem (arithmetic progressions), 228, 340
RSA, 47, 181, 351, 353
RSA challenge number, 353
Rubik’s Cube, 333; solution, 334
Rubik’s Cube group, 333
Russell’s paradox, 85, 86, 381

sabermetrics, 317
Sage, 452, 453, 519
SageMath, 519
Sato–Tate conjecture, 294
schlicht, 393
Schrödinger equation, 67
Schrödinger operator, 407
Schwarzschild line element, 13
Schweitzer competition, 76
Scientific American, 7
SCIgen, 489
second category, 481
second incompleteness theorem, 86, 484
see and say sequence, 402
self-similar, 271
selfadjoint, 67
semicircle law, 80
sensitive dependence on initial conditions, 258
sequence; Euclid–Mullin, 513
series; finite geometric, 422; Flint Hills, 229; geometric, 151;
    Leibniz, 147; power series, 324; radius of convergence of a power, 152
Severini–Egorov theorem, 556
Shor’s algorithm, 352
Sierpiński triangle, 18, 271
sieve of Eratosthenes, 408
sigma function, 172
significand, 54
simple, 188
simple group, 439
simplex algorithm, 182, 185
simplex method, 264
simply connected, 506
Simpson’s formula, 496
Six Degrees of Kevin Bacon, 1
Skewes’s number, 107, 108, 443, 444
sleeping beauty problem, 428
Smale’s paradox, 241
Society for American Baseball Research, 317
Space Invaders, 359
span, 196
special theory of relativity, 11
spectral theorem, 67
speed of light, 11, 13
Sperner’s lemma, 494, 544–546
sphere eversion, 241
sphere packing, 475
sporadic group, 514
stable marriages, 264
stable matching, 263, 264
standard deviation, 51
standardize, 52
Star Trek, 85, 145, 391
stars and bars problem, 313
Steele Prize, 394
steradians, 371
Stevedore knot, 398
Stigler’s law of eponymy, 495, 552, 556
Stirling’s formula, 115, 479, 539
Stone’s theorem, 68
Strassen algorithm, 283
stress-energy tensor, 12
strong perfect graph theorem, 526
strong triangle inequality, 18
Student’s t-distribution, 537
stylo, 488
subfactor, 398
Sudoku, 401, 403
sum; of four squares, 445; of powers of the first n positive integers, 8;
    of reciprocals of numbers without a 9 in their decimal
    representation, 35; of reciprocals of perfect squares, 33;
    of reciprocals of prime powers, 190;
    of reciprocals of primes, 4, 189, 197;
    of reciprocals of twin primes, 33, 59; of three squares, 445, 448;
    Ramanujan, 173
sum of divisors function, 172
surface, 505
surjective, 27
symmetric, 28
symmetric group, 157
syndetic set, 342
Syracuse problem, 101
Szemerédi’s theorem, 90, 228, 340, 341, 511

Takagi function, 360
Taniyama–Shimura conjecture, 48
taxicab Carmichael number, 551
Taylor approximation, 139
Telperion the Silver, 254
Temperley–Lieb(–Jones) relations, 399
tensor, 11
tensor analysis, 11
tensor product, 295
term-by-term multiplication, 109, 171
ternary Goldbach conjecture, 57
TeX, 363
The Book, 3
theorem; 15-, 446; 290-, 447; Abel–Ruffini, 329; Alexander’s, 399;
    Arrow’s impossibility, 199; Atiyah–Singer index, 299;
    Baker–Heegner–Stark, 288, 534; Beurling’s, 195;
    Birkhoff ergodic, 96; Borsuk–Ulam, 165;
    Brouwer’s fixed-point, 164, 494, 543; Brunn–Minkowski, 23;
    Cantor surjection, 24; Cantor’s powerset, 31;
    central limit, 51, 52, 55, 79, 176, 179, 303, 411, 537, 539;
    Chinese remainder, 238, 535; Clausen–von Staudt, 459–461;
    closed graph, 481; cosmological, 402; dimension, 196;
    Dirichlet’s approximation, 223;
    Dirichlet’s on primes in arithmetic progressions, 3, 58, 291, 354,
    415, 522, 528, 552, 553; Egorov’s, 555;
    Euclid’s, 3, 87, 111, 189, 230, 423, 513; Euler–Lucas, 422;
    Fermat’s last, 39, 145, 169, 208, 234, 311, 375, 376, 378, 452, 457,
    460, 476; Fermat’s little, 351, 354, 375, 499, 501, 549;
    Fermat’s polygonal number, 448; four color, 345, 346, 476;
    four-square (Jacobi), 445; four-square (Lagrange), 445, 448;
    Fubini–Tonelli, 472; fundamental theorem of algebra, 118, 358, 484;
    fundamental theorem of arithmetic, 29, 37; Gauss–Kuzmin, 97;
    Gauss–Wantzel, 424; Gelfond–Schneider, 117, 119; Goodstein’s, 87;
    Green–Tao, 2, 58, 340, 511, 512, 521, 522; Hahn–Banach, 197;
    impossibility, 199; intermediate value, 77, 78, 164, 461; invariance
    of domain, 545; Kirby–Paris, 87; Kolmogorov–Arnold–Moser, 221;
    Kronecker’s approximation, 134; Kronecker–Weyl, 133;
    Liouville’s, 118, 119, 329; Lusin’s, 555; Markov’s, 399;
    Mason–Stothers, 169, 375–377; mean value theorem for integrals, 35;
    Mertens’s, 37, 109, 110; Mertens’s (prime reciprocals), 189;
    Moore–Kline, 25; Müntz–Szász, 197; Nielsen–Schreier, 484;
    open mapping, 481; Ostrowski’s, 17; perfect graph, 525;
    prime number, 34, 58, 107, 129, 153, 172, 181, 187, 189, 303, 339, 366,
    410, 424, 465, 515, 522, 528, 554; Pythagorean, 311, 314, 319, 320;
    rank-nullity, 299; Riemann–Roch, 170; Riesz representation, 197;
    Roth’s, 133, 227; Roth’s (arithmetic progressions), 228, 340;
    second incompleteness, 484; Severini–Egorov, 556; Stone’s, 68;
    strong perfect graph, 526; Szemerédi’s, 90, 228, 340, 341, 511;
    Thue’s on numbers with fixed prime factors, 442;
    Thue–Siegel–Roth, 228; Toeplitz index, 301; Tunnell’s, 454;
    van der Waerden’s, 341; Wallace–Bolyai–Gerwien, 369;
    Weierstrass approximation, 193, 196; Whitney–Graustein, 242;
    Wiener’s 1/f, 148; Zeckendorf’s, 312, 313, 373
Thompson group, 63
Thue’s theorem on numbers with fixed prime factors, 442
Thue–Siegel–Roth theorem, 228
Thurston’s corrugations, 241
Thwaites conjecture, 101
time average, 95
Toeplitz index theorem, 301
Toeplitz operator, 301
topological space, 230, 293, 481
topology, 230, 242; base for a, 230; definition, 230; Hausdorff, 231;
    noncommutative, 302; normal, 231; regular, 231
torsion subgroup, 47
torus, 243, 346, 505, 508
total order, 31, 483
totient function, 416
transcendence degree, 169
transcendental, 169, 295, 329
transcendental number, 31, 98, 227, 229, 329, 332
transition matrix, 215
transitive, 28
traveling salesman problem, 183, 385, 386
tree, 253
trefoil knot, 395, 505
triangular number, 447
trivial zeros of the zeta function, 152
Tunnell’s theorem, 454
TUNNY, 210
Turing machine, 121
twin prime conjecture, 33, 57, 408, 433, 522, 528
twin primes constant, 34, 460
Twitter, 305
two envelopes problem, 382

Ulam spiral, 522
Ulam’s conjecture, 101
Ultra, 157
ultrametric, 18
undecidable, 109, 307
uniform boundedness principle, 481
uniformly strict contraction, 165
unique factorization domain, 288
universal, 446
universal machine, 121
universal quadratic form, 446
up-arrow, 443
upper density, 340
upper multiplicative density, 341
Uranus, 221

van der Waerden number, 90
van der Waerden’s theorem, 341
variance, 51
Venice, 201
Venus, 68
Vigenère cipher, 212, 224
Vitali set, 65, 277
Volterra integration operator, 195, 197
von Mangoldt function, 409
von Neumann algebra, 163, 396, 397

Wallace–Bolyai–Gerwien theorem, 369
Wallis’s formula, 539
Waring’s problem, 39
wave function, 67
weak-field approximation, 13
Weierstrass M-test, 360
Weierstrass approximation theorem, 193, 196
well-ordered, 278
well-ordering principle, 206, 483
Wetzel’s problem, 307
Weyl’s uniform distribution property, 96
Whitney–Graustein theorem, 242
Wiener algebra, 148
Wiener process, 470
Wiener’s 1/f theorem, 148
Wigner’s semicircle law, 80, 253
Wilf–Zeilberger algorithm, 416
William Lowell Putnam Mathematical Competition, 75
winding number, 302
winning coalition, 201
Wolfram Alpha, 302, 413, 414
Wolfram Mathematica, see also Mathematica

Zaremba’s conjecture, 323, 326
Zeckendorf decomposition, 312, 373
Zeckendorf’s theorem, 312, 313, 373
Zermelo–Fraenkel axioms, 85, 141, 269
Zermelo–Fraenkel set theory, 85, 86, 108, 278, 484
zeta function, see also Riemann zeta function
ZF, see also Zermelo–Fraenkel set theory
ZFC, see also Zermelo–Fraenkel set theory
zombie infestation, 371
Zorn’s lemma, 483
This book is an outgrowth of a collection of 100 problems chosen to celebrate the
100th anniversary of the undergraduate math honor society Pi Mu Epsilon. Each
chapter describes a problem or event, the progress made, and connections to
entries from other years or other parts of mathematics. In places, some knowledge of
analysis or algebra, number theory or probability will be helpful. Put together, these
problems will be appealing and accessible to energetic and enthusiastic math majors
and aficionados of all stripes.
Stephan Ramon Garcia is WM Keck Distinguished Service
Professor and professor of mathematics at Pomona College.
He is the author of four books and over eighty research
articles in operator theory, complex analysis, matrix analysis,
number theory, discrete geometry, and other fields. He has
coauthored dozens of articles with students, including one
that appeared in The Best Writing on Mathematics: 2015. He
is on the editorial boards of Notices of the AMS, Proceedings
of the AMS, American Mathematical Monthly, Involve, and
Annals of Functional Analysis. He received four NSF research
grants as principal investigator and five teaching awards
from three different institutions. He is a fellow of the American Mathematical Society
and was the inaugural recipient of the Society’s Dolciani Prize for Excellence in
Research.
Steven J. Miller is professor of mathematics at Williams
College and a visiting assistant professor at Carnegie Mellon
University. He has published five books and over one hundred
research papers, most with students, in accounting, computer
science, economics, geophysics, marketing, mathematics,
operations research, physics, sabermetrics, and statistics.
He has served on numerous editorial boards, including the
Journal of Number Theory, Notices of the AMS, and the
Pi Mu Epsilon Journal. He is active in enrichment and supplemental
curricular initiatives for elementary and secondary
mathematics, from the Teachers as Scholars Program and
VCTAL (Value of Computational Thinking Across Grade Levels), to numerous math
camps (the Eureka Program, HCSSiM, the Mathematics League International
Summer Program, PROMYS, and the Ross Program). He is a fellow of the American
Mathematical Society, an at-large senator for Phi Beta Kappa, and a member of the
Mount Greylock Regional School Committee, where he sees firsthand the challenges
of applying mathematics.
Photo courtesy of Cesar Silva.
For additional information and updates on this book, visit
www.ams.org/bookpages/mbk-121
MBK/121
